Formalizations of Function & Literature Databases

Protein function prediction

• What is function?

• Various levels of description

What is function?

• Contextual / philosophical point

• An operational dichotomy I often use: biochemical function vs. biological role:

– Biochemical function: enolase (2-phospho-D-glycerate hydro-lyase) catalyses the interconversion of 2-phosphoglycerate and phosphoenolpyruvate

– Biological role: part of the glycolysis pathway

But ...

• α-enolase also functions as a structural lens protein, τ-crystallin, in ducks

• Protein multifunctionality

• Molecular function? Nice crystallization and refractive properties?

Gene Ontology

• Historically: nothing … except SWISS-PROT keywords and specific systems for metabolic enzymes (e.g. EC numbers)

• This is somewhat problematic for automated gene function prediction (e.g. BLAST and/or co-expression) and for the study of the evolution of gene function.

• And this despite everything we know, as written down in the public literature!?

• One (example) solution: Gene Ontology

Gene Ontology

• Computer science: an ontology is a data model that represents a domain and is used to reason about the objects in that domain and the relations between them.

• GO consists of three sub-ontologies, each with its own root term:

– GO:0008150 : biological_process

– GO:0005575 : cellular_component

– GO:0003674 : molecular_function

Gene Ontology: Molecular function

• Molecular function describes activities, such as catalytic or binding activities, at the molecular level. GO molecular function terms represent activities rather than the entities (molecules or complexes) that perform the actions, and do not specify where or when, or in what context, the action takes place. Molecular functions generally correspond to activities that can be performed by individual gene products, but some activities are performed by assembled complexes of gene products. Examples of broad functional terms are catalytic activity, transporter activity, or binding; examples of narrower functional terms are adenylate cyclase activity or Toll receptor binding.

• DNA-directed DNA polymerase activity
• Accession: GO:0003887
• Ontology: molecular_function
• Synonyms: alt_id: GO:0003888
• Definition:
  – Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template or primer.
• Comment: None
• Term lineage:
  all : all (228266)
  » GO:0003674 : molecular_function (172339)
  » GO:0003824 : catalytic activity (68591)
  » GO:0016740 : transferase activity (22363)
  » GO:0016772 : transferase activity, transferring phosphorus-containing groups (13535)
  » GO:0016779 : nucleotidyltransferase activity (3400)
  » GO:0003887 : DNA-directed DNA polymerase activity (519)
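
GO terms like the one above are distributed as stanzas in the OBO flat-file format (e.g. `go.obo`). Below is a minimal sketch of reading one `[Term]` stanza into a dictionary. It assumes the simple `tag: value` layout. The stanza is hand-written from the record above, not copied from a GO release: the `is_a` parent is taken from the lineage shown, and the definition's reference list is left empty. The parser is deliberately naive. It does not handle escaped quotes, trailing `{...}` qualifiers, or a literal " ! " inside a quoted definition.

```python
from collections import defaultdict

# Hand-written stanza modelled on the GO:0003887 record above (illustrative;
# a real go.obo release may differ in wording, parents and extra tags).
STANZA = """[Term]
id: GO:0003887
name: DNA-directed DNA polymerase activity
namespace: molecular_function
alt_id: GO:0003888
def: "Catalysis of the reaction: deoxynucleoside triphosphate + DNA(n) = diphosphate + DNA(n+1); the synthesis of DNA from deoxyribonucleotide triphosphates in the presence of a DNA template or primer." []
is_a: GO:0016779 ! nucleotidyltransferase activity
"""


def parse_term_stanza(text: str) -> dict[str, list[str]]:
    """Parse one OBO [Term] stanza into {tag: [values]}.

    Tags such as alt_id or is_a may repeat, so every tag maps to a list.
    A trailing '! label' comment (human-readable name) is stripped.
    """
    tags: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        if " ! " in value:
            value = value.split(" ! ", 1)[0]
        tags[tag.strip()].append(value)
    return dict(tags)


term = parse_term_stanza(STANZA)
print(term["id"], term["namespace"], term["is_a"])
# ['GO:0003887'] ['molecular_function'] ['GO:0016779']
```

Mapping every tag to a list keeps repeated tags (several `alt_id` or `is_a` lines) without special-casing them.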

Gene Ontology: Biological Process

• A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Examples of more specific terms are pyrimidine metabolism or alpha-glucoside transport. It can be difficult to distinguish between a biological process and a molecular function, but the general rule is that a process must have more than one distinct step.

• DNA replication
• Accession: GO:0006260
• Ontology: biological_process
• Synonyms:
  – related: DNA biosynthesis
  – related: DNA synthesis
• Definition:
  – The process whereby new strands of DNA are synthesized. The template for replication can either be DNA or RNA.
• Comment:
  – See also the biological process terms 'DNA-dependent DNA replication ; GO:0006261' and 'RNA-dependent DNA replication ; GO:0006278'.
• Term lineage:
  all : all (228266)
  » GO:0008150 : biological_process (166476)
  » GO:0009987 : cellular process (111929)
  » GO:0050875 : cellular physiological process (103960)
  » GO:0044237 : cellular metabolism (71681)
  » GO:0006139 : nucleobase, nucleoside, nucleotide and nucleic acid metabolism (27559)
  » GO:0006259 : DNA metabolism (8807)
  » GO:0006260 : DNA replication (3202)
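
A term lineage like the one above is what you get by following parent links upward until a root is reached. The root then also tells you which of the three sub-ontologies the term belongs to. The sketch below encodes the lineage exactly as shown on the slide, with one parent per term. Real GO terms can have several parents, because GO is a directed acyclic graph rather than a tree, and the current ontology may no longer contain some of these terms. The names `PARENT`, `lineage` and `namespace` are illustrative, not part of any GO tool.

```python
# Child -> parent links copied from the "Term lineage" of GO:0006260 above.
# One parent per term, as on the slide; real GO terms may have several.
PARENT = {
    "GO:0006260": "GO:0006259",  # DNA replication -> DNA metabolism
    "GO:0006259": "GO:0006139",  # -> nucleobase, ..., nucleic acid metabolism
    "GO:0006139": "GO:0044237",  # -> cellular metabolism
    "GO:0044237": "GO:0050875",  # -> cellular physiological process
    "GO:0050875": "GO:0009987",  # -> cellular process
    "GO:0009987": "GO:0008150",  # -> biological_process (root)
}

# The three sub-ontology roots.
ROOTS = {
    "GO:0008150": "biological_process",
    "GO:0005575": "cellular_component",
    "GO:0003674": "molecular_function",
}


def lineage(term: str) -> list[str]:
    """Return the path from the root down to `term` (root first)."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def namespace(term: str) -> str:
    """The sub-ontology of a term is given by the root its lineage reaches."""
    return ROOTS[lineage(term)[0]]


print(" » ".join(lineage("GO:0006260")))
print(namespace("GO:0006260"))  # biological_process
```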

Gene Ontology: Cellular Component

• A cellular component is just that, a component of a cell, but with the proviso that it is part of some larger object; this may be an anatomical structure (e.g. rough endoplasmic reticulum or nucleus) or a gene product group (e.g. ribosome, proteasome or a protein dimer).

• DNA-directed RNA polymerase II, core complex
• Accession: GO:0005665
• Ontology: cellular_component
• Synonyms: related: DNA-directed RNA polymerase II activity
• Definition:
  – RNA polymerase II, one of three eukaryotic nuclear RNA polymerases, is a multisubunit complex; it produces mRNAs, snoRNAs, and some of the snRNAs. Two large subunits comprise the most conserved portion including the catalytic site and share similarity with other eukaryotic and bacterial multisubunit RNA polymerases. The largest subunit of RNA polymerase II contains an essential carboxyl-terminal domain (CTD) composed of a variable number of heptapeptide repeats (YSPTSPS). The remainder of the complex is composed of smaller subunits (generally ten or more), some of which are also found in RNA polymerases I and III. Although the core is competent to mediate ribonucleic acid synthesis, it requires additional factors to select the appropriate template.
• Term lineage:
  GO:0005575 : cellular_component (116994)
  » GO:0005623 : cell (86438)
  » GO:0044464 : cell part (86397)
  » GO:0005622 : intracellular (70018)
  » GO:0044424 : intracellular part (69369)
  » GO:0043229 : intracellular organelle (63194)
  » GO:0043231 : intracellular membrane-bound organelle (58868)
  » GO:0005634 : nucleus (12609)
  » GO:0044428 : nuclear part (5000)
  » GO:0031981 : nuclear lumen (3017)
  » GO:0005654 : nucleoplasm (1990)
  » GO:0044451 : nucleoplasm part (1791)
  » GO:0016591 : DNA-directed RNA polymerase II, holoenzyme (462)
  » GO:0005665 : DNA-directed RNA polymerase II, core complex (85)
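
The numbers in parentheses never increase on the way down the lineage. The slide does not say what the numbers count, so the following is an assumption. One plausible reading is GO's "true path rule": a gene product annotated to a term is implicitly annotated to all of that term's ancestors, so a parent can never have fewer annotated gene products than its child. The sketch below propagates annotations over a short stretch of the lineage above. The gene product names and their direct annotations are made up for illustration.

```python
from collections import defaultdict

# Child -> parents for a short stretch of the lineage above. A list is used
# because in GO a term may have more than one parent.
PARENTS = {
    "GO:0005665": ["GO:0016591"],  # RNA pol II core complex -> holoenzyme
    "GO:0016591": ["GO:0044451"],  # -> nucleoplasm part
    "GO:0044451": ["GO:0005654"],  # -> nucleoplasm
    "GO:0005654": [],              # treated as the top of this excerpt
}

# Hypothetical direct annotations: gene product -> terms it is annotated to.
DIRECT = {
    "geneA": {"GO:0005665"},
    "geneB": {"GO:0016591"},
    "geneC": {"GO:0005654"},
}


def ancestors(term: str) -> set[str]:
    """All terms reachable upward from `term`, including itself."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, []))
    return seen


def propagated_counts(direct: dict[str, set[str]]) -> dict[str, int]:
    """Count gene products per term after propagating to all ancestors."""
    members: dict[str, set[str]] = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            for anc in ancestors(term):
                members[anc].add(gene)  # a set, so each gene counts once
    return {term: len(genes) for term, genes in members.items()}


counts = propagated_counts(DIRECT)
# geneA reaches all four terms, geneB three, geneC one:
# GO:0005665 -> 1, GO:0016591 -> 2, GO:0044451 -> 2, GO:0005654 -> 3
```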

go or no go

• Used frequently for questions such as: is there any functional pattern to my set of co-expressed genes? (overrepresentation of a particular process and/or complex)

• Better than nothing.

• How are the GO terms assigned? (e.g. TAS, traceable author statement, vs. IEA, inferred from electronic annotation)

• GO slim …

• A framework / starting point
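
The overrepresentation question in the first bullet is commonly answered with a one-sided hypergeometric test, the tail of a one-sided Fisher's exact test on a 2×2 table. Below is a minimal, dependency-free sketch. The helper name `hypergeom_upper_tail` and the gene counts are hypothetical. A real analysis would also correct for testing many GO terms at once and would have to decide whether IEA annotations belong in the background.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_set: int, n_term: int, n_total: int) -> float:
    """P(X >= k) when drawing n_set genes without replacement from n_total
    genes, of which n_term are annotated to the GO term.

    k       -- annotated genes observed in the co-expressed set
    n_set   -- size of the co-expressed set
    n_term  -- annotated genes in the whole background
    n_total -- background size (e.g. all genes measured)
    """
    denom = comb(n_total, n_set)
    upper = min(n_set, n_term)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    return sum(
        comb(n_term, i) * comb(n_total - n_term, n_set - i)
        for i in range(k, upper + 1)
    ) / denom


# Hypothetical numbers: 20 co-expressed genes, 8 of them annotated to some
# process term; 200 of the 6000 background genes carry that annotation.
p = hypergeom_upper_tail(k=8, n_set=20, n_term=200, n_total=6000)
expected = 20 * 200 / 6000  # about 0.67 annotated genes expected by chance
print(f"expected {expected:.2f}, observed 8, p = {p:.2e}")
```

If SciPy is available, `scipy.stats.hypergeom.sf(k - 1, n_total, n_term, n_set)` should give the same tail probability. Exact integer arithmetic via `math.comb` simply avoids the extra dependency.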

Use for questions like: what fraction of human proteins is devoted to transcription regulation? GO makes such questions answerable.

• Controlled vocabulary

• Conceptual framework for thinking about our knowledge of cellular mechanisms

E(nzyme) C(ommission) number: a hierarchical system to describe enzymatic function

• EC 1 Oxidoreductases

• EC 2 Transferases

• EC 3 Hydrolases

• EC 4 Lyases

• EC 5 Isomerases

• EC 6 Ligases

• EC 2.7 Transferring phosphorus-containing groups

• EC 2.7.7 Nucleotidyltransferases

• EC 2.7.7.6 DNA-directed RNA polymerase
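
Because an EC number is a dotted path from class to individual entry, its hierarchy can be read off by taking prefixes. The number of leading levels two enzymes share gives a crude measure of how similar their reactions are. The sketch below uses only the names listed above. The extra comparison number, EC 2.7.7.7 (DNA-directed DNA polymerase), is not on the slide. The helper names `ec_levels` and `shared_depth` are illustrative.

```python
# Names for the EC levels listed above (only the entries on the slide).
EC_NAMES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "2.7": "Transferring phosphorus-containing groups",
    "2.7.7": "Nucleotidyltransferases",
    "2.7.7.6": "DNA-directed RNA polymerase",
}


def ec_levels(ec: str) -> list[str]:
    """'2.7.7.6' -> ['2', '2.7', '2.7.7', '2.7.7.6'] (class down to entry)."""
    parts = ec.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def shared_depth(ec_a: str, ec_b: str) -> int:
    """Number of leading EC levels two enzymes have in common (0-4)."""
    depth = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth


for level in ec_levels("2.7.7.6"):
    print(f"EC {level:<8} {EC_NAMES.get(level, '?')}")

# RNA polymerase (2.7.7.6) vs DNA polymerase (2.7.7.7): same sub-subclass.
print(shared_depth("2.7.7.6", "2.7.7.7"))  # 3
```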

Homology ~ molecular function

• In other words, with regard to metabolic pathways: homologs are observed to catalyze similar reactions, but often in different pathways.

So if we do function prediction using sequence (e.g. BLAST, trees, etc.), then what?

• If we think we see an ortholog, we can transfer many aspects of function and role

• If we see only a homolog, we can transfer only some aspects of molecular function, but not process / role
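
The two rules above can be written as a crude annotation-transfer policy. This is a sketch only. Deciding whether a hit is an ortholog (e.g. from reciprocal best BLAST hits or a gene tree) is the hard part and is assumed to have been done already. The policy is also coarser than the slide. For an ortholog it transfers everything rather than "many aspects", and for a homolog it keeps every molecular_function term, whereas in practice one might keep only the more general ones. The function name `transfer_annotations` and the example annotations are hypothetical.

```python
from typing import Literal

Relation = Literal["ortholog", "homolog"]


def transfer_annotations(
    hit_annotations: list[tuple[str, str]], relation: Relation
) -> list[tuple[str, str]]:
    """Select which (GO id, namespace) pairs of a database hit to transfer.

    ortholog -> transfer all aspects (function, process, component);
                a simplification of "many aspects of function and role"
    homolog  -> transfer molecular_function terms only; process / role is
                not transferred, since homologs often act in other pathways
    """
    if relation == "ortholog":
        return list(hit_annotations)
    return [(go, ns) for go, ns in hit_annotations if ns == "molecular_function"]


# Hypothetical annotations of a database hit.
hit = [
    ("GO:0003887", "molecular_function"),  # DNA-directed DNA polymerase activity
    ("GO:0006260", "biological_process"),  # DNA replication
]
print(transfer_annotations(hit, "homolog"))
# [('GO:0003887', 'molecular_function')]
```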

Examples

• Fringe as glycosyltransferase

• ATPase family associated with various cellular activities (AAA): AAA family proteins often perform chaperone-like functions that assist in the assembly, operation, or disassembly of protein complexes

• … so how do we place query genes in a process/role, then?
