Computing with Pathway/Genome Databases

Page 1: Computing with Pathway/Genome Databases

SRI International Bioinformatics

Computing with Pathway/Genome Databases

Page 2: Computing with Pathway/Genome Databases

Approximate presentation time: 1.5 hours

Page 3: Computing with Pathway/Genome Databases

Overview

Summary of Pathway Tools data access mechanisms and formats

Pathway Tools APIs

Overview of Pathway Tools schema

Page 4: Computing with Pathway/Genome Databases

Motivations for Understanding the Schema

When writing complex queries to PGDBs, those queries must refer to classes and slots within the schema

Queries using the Lisp, Perl, and Java APIs
Queries using the Structured Advanced Query Form
Queries using BioVelo

Find all monomers whose genes are longer than 1,000 nucleotides

    (loop for g in (get-class-all-instances '|Genes|)
          when (< 1000 (abs (- (get-slot-value g 'left-end-position)
                               (get-slot-value g 'right-end-position))))
          collect (get-slot-value g 'product))

Page 5: Computing with Pathway/Genome Databases

More Information

Pathway Tools Web Site, Tutorial Slides: http://bioinformatics.ai.sri.com/ptools/
PTools APIs: http://brg.ai.sri.com/ptools/ptools-resources.html
Web services: http://biocyc.org/web-services.shtml

Guide to the Pathway Tools Schema http://biocyc.org/schema.shtml

Curator's Guide http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf

Page 6: Computing with Pathway/Genome Databases

References

Ontology Papers section of http://biocyc.org/publications.shtml

"An Evidence Ontology for use in Pathway/Genome Databases"

"An ontology for biological function based on molecular interactions"

"Representations of metabolic knowledge: Pathways"

"Representations of metabolic knowledge"

Page 7: Computing with Pathway/Genome Databases

Data Exchange

APIs: Lisp API, Java API, and Perl API
    Read and modify access
Web services
Cyclone
Export to files
    BioPAX export (biopax.org)
    Export PGDB genome to GenBank format
    Export entire PGDB as column-delimited and attribute-value file formats
    Export PGDB reactions as SBML (sbml.org)
Import/Export of pathways between PGDBs
Import/Export of selected frames, for spreadsheets
Import/Export of compounds as Molfile, CML
BioWarehouse: loader for flat files, SQL access
    http://bioinformatics.ai.sri.com/biowarehouse/
    BMC Bioinformatics 7:170, 2006

Page 8: Computing with Pathway/Genome Databases

Pathway Tools Ontology / Schema

Ontology classes: 1621
    Datatype classes: define objects from genomes to pathways
    Classification systems for pathways, chemical compounds, enzymatic reactions (EC system)
    Protein Feature ontology
    Controlled vocabularies:
        Cell Component Ontology
        Evidence codes

Comprehensive set of 279 attributes and relationships

Page 9: Computing with Pathway/Genome Databases

High-Level Classes in the Pathway Tools Ontology

Chemicals -- All molecules
Polymer-Segments -- Regions of polymers
Protein-Features -- Features on proteins
Organisms
Reactions -- Biochemical reactions
Enzymatic-Reactions -- Link enzymes to reactions they catalyze
Pathways -- Metabolic and signaling pathways
Regulation -- Regulatory interactions
CCO -- Cell Component Ontology
Evidence -- Evidence ontology
Gene-Ontology-Terms -- GO
Growth-Observations -- Observations of growth of organism
Notes -- Timestamped, person-stamped notes
Organizations, People
Publications

Page 10: Computing with Pathway/Genome Databases

Navigating the Schema

Page 11: Computing with Pathway/Genome Databases

Use GKB Editor to Inspect the Pathway Tools Ontology

GKB Editor = Generic Knowledge Base Editor
Type in Navigator window: (GKB), or [Right-Click] Edit -> Ontology Editor
View -> Browse Class Hierarchy
[Middle-Click] to expand hierarchy
To view classes or instances, select them and:
    Frame -> List Frame Contents
    Frame -> Edit Frame

Page 12: Computing with Pathway/Genome Databases

Use the SAQP to Inspect the Schema

Page 13: Computing with Pathway/Genome Databases

Pathway Tools Schema

Guide to the Pathway Tools Schema

Schema overview diagram

Page 14: Computing with Pathway/Genome Databases

Principal Classes

Class names are capitalized, plural, separated by dashes

Genetic-Elements, with subclasses:
    Chromosomes
    Plasmids
Genes
Transcription-Units
RNAs
    rRNAs, snRNAs, tRNAs, Charged-tRNAs
Proteins, with subclasses:
    Polypeptides
    Protein-Complexes

Page 15: Computing with Pathway/Genome Databases

Principal Classes

Reactions, with subclasses:
    Transport-Reactions

Enzymatic-Reactions

Pathways

Compounds-And-Elements

Page 16: Computing with Pathway/Genome Databases

Principal Classes

Regulation

Page 17: Computing with Pathway/Genome Databases

Slot Links

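[Diagram: slot links connecting frames. Genes sdhA, sdhB, sdhC, sdhD link through product to the polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; the polypeptides link through component-of to Succinate dehydrogenase; the complex links through catalyzes to an Enzymatic-reaction frame; that frame links through reaction to Succinate + FAD = fumarate + FADH2; the reaction links through in-pathway to TCA Cycle.]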

Page 18: Computing with Pathway/Genome Databases

Programmatic Access to BioCyc

Common LISP
    • Native language of Pathway Tools
    • Interactive & Mature Environment
    • Full Access to the Data & Many Utility Functions
    • Source code is available for academics

PerlCyc
    • API of Functions, Exposed to Perl
    • Communication through UNIX Socket

JavaCyc
    • API of Functions, Exposed to Java
    • Communication through UNIX Socket

Cyclone

Page 19: Computing with Pathway/Genome Databases

Cyclone

Developed by Schachter and colleagues from Genoscope

http://nemo-cyclone.sourceforge.net/archi.php

Cyclone is a Java-based system that:
    Extracts data from a Pathway Tools PGDB
    Converts it to an XML schema
    Maps the data to Java objects and to a relational database

Changes made to the data on the Java side can be committed back to a Pathway Tools PGDB

Page 20: Computing with Pathway/Genome Databases

Lisp API

Accessible whenever you start Pathway Tools with the -lisp argument

Lisp queries evaluate against the running Pathway Tools binary and execute very fast

Page 21: Computing with Pathway/Genome Databases

Ocelot Object Database

Page 22: Computing with Pathway/Genome Databases

Pathway Tools Implementation Details

Platforms: Macintosh, PC/Linux, and PC/Windows

Same binary can run as desktop app or Web server

Production-quality software
    Version control
    Two regular releases per year
    Extensive quality assurance
    Extensive documentation
    Auto-patch
    Automatic DB-upgrade

600,000 lines of Lisp code

Page 23: Computing with Pathway/Genome Databases

Pathway Tools Architecture

[Architecture diagram. Components shown: Ocelot DBMS with storage in Oracle, MySQL, or a disk file; GFP API; Pathway/Genome Navigator (Web mode, desktop mode); Protein Editor, Pathway Editor, Reaction Editor; Lisp, Perl, and Java APIs.]

Page 24: Computing with Pathway/Genome Databases

Ocelot Object Database

Frame data model
    Classes, instances, inheritance
    Frames have slots that define their properties, attributes, relationships
    A slot has one or more values
    Datatypes include numbers, strings, etc.

Slotunit frames define metadata about slots:
    Domain, range, inverse
    Collection type, number of values, value constraints
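
The frame model on this slide can be illustrated with a small in-memory sketch. This is not Ocelot code: the class `FrameKb`, its methods, and the way an inverse is registered are hypothetical names chosen for illustration. It treats a frame as a map from slot names to value lists, and a slot-unit entry as a record of which slot is the inverse of which, so that adding a value on one side also records the reverse link.

```java
import java.util.*;

// Hypothetical in-memory illustration of a frame data model; not the Ocelot API.
class FrameKb {
    // frame name -> (slot name -> list of values)
    private final Map<String, Map<String, List<String>>> frames = new HashMap<>();
    // slot-unit metadata (assumed shape): slot name -> name of its inverse slot
    private final Map<String, String> inverseOf = new HashMap<>();

    void declareInverse(String slot, String inverse) {
        inverseOf.put(slot, inverse);
        inverseOf.put(inverse, slot);
    }

    private void addLocal(String frame, String slot, String value) {
        List<String> values = frames
            .computeIfAbsent(frame, f -> new HashMap<>())
            .computeIfAbsent(slot, s -> new ArrayList<>());
        if (!values.contains(value)) values.add(value);
    }

    // Adds value to frame.slot and, if the slot has a declared inverse,
    // records the reverse link on the value frame as well.
    void addSlotValue(String frame, String slot, String value) {
        addLocal(frame, slot, value);
        String inverse = inverseOf.get(slot);
        if (inverse != null) addLocal(value, inverse, frame);
    }

    List<String> getSlotValues(String frame, String slot) {
        return frames.getOrDefault(frame, Map.of())
                     .getOrDefault(slot, List.of());
    }

    public static void main(String[] args) {
        FrameKb kb = new FrameKb();
        kb.declareInverse("product", "gene");
        kb.addSlotValue("sdhA", "product", "Sdh-flavo");
        System.out.println(kb.getSlotValues("Sdh-flavo", "gene"));
    }
}
```

In the sketch, asking the product frame for its `gene` slot returns the gene that was linked from the other side, which is the behaviour an inverse declaration is meant to provide.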

Page 25: Computing with Pathway/Genome Databases

Storage System Architecture

File KBs

Read-only applications can be distributed without a relational DBMS

Load all objects and code into Lisp memory Dump virtual memory to binary executable file

Page 26: Computing with Pathway/Genome Databases

Ocelot Storage System Architecture

Persistent storage via disk files, MySQL or Oracle DBMS
    Concurrent development: MySQL or Oracle
    Single-user development: disk files

Relational DBMS storage
    RDBMS is submerged within Ocelot, invisible to users
    Frames transferred from RDBMS to Ocelot
        On demand
        By background prefetcher
    Memory cache
    Persistent disk cache to speed performance via Internet

Page 27: Computing with Pathway/Genome Databases

Transaction Logging

Relational DBMS stores
    The latest version of each Ocelot frame
    A log of all GFP operations applied to KB

Transaction log enables:
    Reconstruction of earlier versions of KB
    View history of changes to an object
    Update replicates of a KB
    Detection of update conflicts during concurrency control
    Undo of updates

Page 28: Computing with Pathway/Genome Databases

Optimistic Concurrency Control

Locking approach: edits to one object can require locking all connected objects

No locking

User performs updates in local workspace

When user commits changes, storage system compares user changes against all other committed changes
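
A minimal sketch of the commit-time check described on this slide, assuming a per-frame version counter. The names `OptimisticStore`, `Workspace`, `edit`, and `commit` are hypothetical, and the real storage system compares logged operations rather than version numbers. The sketch is single-threaded and only illustrates the idea: edits are made without locks, and conflicts are detected when the user commits.

```java
import java.util.*;

// Hypothetical sketch of optimistic concurrency control; not Ocelot code.
class OptimisticStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Integer> versions = new HashMap<>();

    // A user's local workspace: remembers the version seen when editing began.
    static class Workspace {
        final Map<String, Integer> versionsRead = new HashMap<>();
        final Map<String, String> pendingWrites = new HashMap<>();
    }

    // Edits go into the local workspace; nothing in the store is locked.
    void edit(Workspace ws, String frame, String newValue) {
        ws.versionsRead.putIfAbsent(frame, versions.getOrDefault(frame, 0));
        ws.pendingWrites.put(frame, newValue);
    }

    // At commit, compare what the user saw with what has been committed since.
    // Returns the conflicting frames; writes are applied only if there are none.
    List<String> commit(Workspace ws) {
        List<String> conflicts = new ArrayList<>();
        for (Map.Entry<String, Integer> e : ws.versionsRead.entrySet()) {
            if (!versions.getOrDefault(e.getKey(), 0).equals(e.getValue())) {
                conflicts.add(e.getKey());
            }
        }
        if (conflicts.isEmpty()) {
            for (Map.Entry<String, String> w : ws.pendingWrites.entrySet()) {
                values.put(w.getKey(), w.getValue());
                versions.merge(w.getKey(), 1, Integer::sum);
            }
        }
        return conflicts;
    }

    String get(String frame) { return values.get(frame); }
}
```

If two workspaces edit the same frame, the first commit succeeds and the second one reports that frame as a conflict instead of silently overwriting it.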

Page 29: Computing with Pathway/Genome Databases

Ocelot Knowledge Server Schema Evolution

FRSs store and process class and instance information similarly

Application can query schema information as easily as it can query instances

Schema is stored within the DB
Schema is self-documenting
Schema evolution facilitated by
    Easy addition/removal of slots, or alteration of slot datatypes
    Flexible data formats that do not require dumping/reloading of data

Page 30: Computing with Pathway/Genome Databases

Generic Frame Protocol (GFP)

A library of procedures for accessing Ocelot DBs

GFP specification: http://www.ai.sri.com/~gfp/spec/paper/paper.html

A small number of GFP functions are sufficient for most complex queries

Page 31: Computing with Pathway/Genome Databases

Example of a Single GFP Call

The general pattern:
    gfp-function(frame slot value ...)

In Lisp:
    (gfp-function frame slot value ...)

    (get-slot-values 'TRYPSYN-RXN 'LEFT)
    ==> (INDOLE-3-GLYCEROL-P SER)

Page 32: Computing with Pathway/Genome Databases

Frame References

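At the GFP level, every Ocelot frame can be referred to using either its symbolic frame name or its frame object

Most GFP functions return frame objects

Use fequal for comparisons: a frame name and a frame object that denote the same frame do not compare equal under the default Lisp equality tests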
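
The point about comparisons can be sketched outside Lisp. The code below is an illustration only: `Frame`, `frameName`, and this `fequal` are hypothetical stand-ins written in Java, not the GFP implementation. It shows why a plain equality test between a name and a frame object fails, and how a comparison that first coerces both references to the same form succeeds.

```java
// Hypothetical illustration of comparing frame references; not the GFP API.
class FrameRefs {
    // A frame object; its name plays the role of the Lisp symbol naming the frame.
    static final class Frame {
        final String name;
        Frame(String name) { this.name = name; }
    }

    // Coerce a reference (frame name or frame object) to the frame's name.
    static String frameName(Object ref) {
        if (ref instanceof Frame) return ((Frame) ref).name;
        if (ref instanceof String) return (String) ref;
        throw new IllegalArgumentException("not a frame reference: " + ref);
    }

    // Analogue of fequal: true when both references denote the same frame.
    static boolean fequal(Object a, Object b) {
        return frameName(a).equals(frameName(b));
    }

    public static void main(String[] args) {
        Frame atp = new Frame("ATP");   // the kind of value a query returns
        Object byName = "ATP";          // the kind of value a query author writes
        System.out.println(byName.equals(atp));   // false: a name is not a frame object
        System.out.println(fequal(byName, atp));  // true
    }
}
```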

Page 33: Computing with Pathway/Genome Databases

Generic Frame Protocol

get-class-all-instances (Class)
    Returns direct and indirect instances of Class

coercible-to-frame-p (Thing)
    Is Thing a frame? Returns True if Thing is the name of a frame, or a frame object; else False
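
"Direct and indirect instances" means the instances of the class itself plus those of every subclass beneath it. A minimal sketch, assuming a toy hierarchy held in two maps; `ClassTree` and its methods are hypothetical names, and only the method name `getClassAllInstances` is borrowed from the JavaCyc naming style used later in these slides.

```java
import java.util.*;

// Hypothetical class hierarchy; illustrates "direct and indirect instances".
class ClassTree {
    private final Map<String, List<String>> subclasses = new HashMap<>();
    private final Map<String, List<String>> directInstances = new HashMap<>();

    void addSubclass(String parent, String child) {
        subclasses.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    void addInstance(String cls, String instance) {
        directInstances.computeIfAbsent(cls, k -> new ArrayList<>()).add(instance);
    }

    // Instances of cls itself plus the instances of every subclass, recursively.
    List<String> getClassAllInstances(String cls) {
        List<String> result = new ArrayList<>(directInstances.getOrDefault(cls, List.of()));
        for (String sub : subclasses.getOrDefault(cls, List.of())) {
            result.addAll(getClassAllInstances(sub));
        }
        return result;
    }
}
```

With Polypeptides and Protein-Complexes registered as subclasses of Proteins, a query on Proteins returns the instances of both subclasses.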

Page 34: Computing with Pathway/Genome Databases

Generic Frame Protocol

Notation: Frame.Slot means a specified slot of a specified frame. Note: Slot must be a symbol!

get-slot-value(Frame Slot) Returns first value of Frame.Slot

get-slot-values(Frame Slot) Returns all values of Frame.Slot as a list

slot-has-value-p(Frame Slot) Returns True if Frame.Slot has at least one value; else False

member-slot-value-p(Frame Slot Value) Returns True if Value is one of the values of Frame.Slot; else False

instance-all-instance-of-p(Instance Class) Returns True if Instance is an all-instance of Class

Page 35: Computing with Pathway/Genome Databases

Generic Frame Protocol

print-frame(Frame) Prints the contents of Frame

Page 36: Computing with Pathway/Genome Databases

Generic Frame Protocol – Update Operations

put-slot-value(Frame Slot Value) Replace the current value(s) of Frame.Slot with Value

put-slot-values(Frame Slot Value-List) Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values

add-slot-value(Frame Slot Value) Add Value to the current value(s) of Frame.Slot, if any

remove-slot-value(Frame Slot Value) Remove Value from the current value(s) of Frame.slot

replace-slot-value(Frame Slot Old-Value New-Value) In Frame.Slot, replace Old-Value with New-Value

remove-local-slot-values(Frame Slot) Remove all of the values of Frame.Slot
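
The differences between these update operations are easiest to see on a toy store. The sketch below is an in-memory model with hypothetical names (`SlotStore` and its methods, styled after the JavaCyc names); it is not the GFP implementation, and it makes no attempt to reproduce details such as how duplicate values are treated.

```java
import java.util.*;

// Hypothetical in-memory model of the update operations; not the GFP implementation.
class SlotStore {
    private final Map<String, List<String>> slots = new HashMap<>();

    private static String key(String frame, String slot) { return frame + "." + slot; }

    List<String> getSlotValues(String frame, String slot) {
        return new ArrayList<>(slots.getOrDefault(key(frame, slot), List.of()));
    }

    // Replace all current values with the given list.
    void putSlotValues(String frame, String slot, List<String> values) {
        slots.put(key(frame, slot), new ArrayList<>(values));
    }

    // Replace all current values with a single value.
    void putSlotValue(String frame, String slot, String value) {
        putSlotValues(frame, slot, List.of(value));
    }

    // Append one value, keeping the existing ones (duplicates are not checked here).
    void addSlotValue(String frame, String slot, String value) {
        slots.computeIfAbsent(key(frame, slot), k -> new ArrayList<>()).add(value);
    }

    // Remove one value, keeping the others.
    void removeSlotValue(String frame, String slot, String value) {
        List<String> v = slots.get(key(frame, slot));
        if (v != null) v.remove(value);
    }

    // Swap oldValue for newValue in place.
    void replaceSlotValue(String frame, String slot, String oldValue, String newValue) {
        List<String> v = slots.get(key(frame, slot));
        if (v != null) Collections.replaceAll(v, oldValue, newValue);
    }

    // Remove every value of the slot.
    void removeLocalSlotValues(String frame, String slot) {
        slots.remove(key(frame, slot));
    }
}
```

The design point to notice is that the put operations overwrite, while add, remove, and replace edit the existing value list.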

Page 37: Computing with Pathway/Genome Databases

Generic Frame Protocol – Update Operations

save-kb Saves the current KB

Page 38: Computing with Pathway/Genome Databases

Additional Pathway Tools Functions – Semantic Inference Layer

Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB

http://bioinformatics.ai.sri.com/ptools/ptools-fns.html

Page 39: Computing with Pathway/Genome Databases

PerlCyc and JavaCyc

Work on Unix (Solaris or Linux) only
Start up Pathway Tools with the -api argument
Pathway Tools listens on a Unix socket; the Perl program communicates through this socket
Supports both querying and editing PGDBs
Must run the Perl or Java program on the same machine that runs Pathway Tools
    This is a security measure, as the API server has no built-in security
Can only handle one connection at a time

Page 40: Computing with Pathway/Genome Databases

Obtaining PerlCyc and JavaCyc

Download from http://www.sgn.cornell.edu/downloads/

PerlCyc written and maintained by Lukas Mueller at Boyce Thompson Institute for Plant Research.

JavaCyc written by Thomas Yan at Carnegie Institute, maintained by Lukas Mueller.

Easy to extend…

Page 41: Computing with Pathway/Genome Databases

Examples of PerlCyc, JavaCyc Functions

GFP functions (require knowledge of Pathway Tools schema):
    PerlCyc: get_slot_values, get_class_all_instances, put_slot_values
    JavaCyc: getSlotValues, getClassAllInstances, putSlotValues

Pathway Tools functions (described at http://bioinformatics.ai.sri.com/ptools/ptools-fns.html):
    PerlCyc: genes_of_reaction, find_indexed_frame, pathways_of_gene, transport_p
    JavaCyc: genesOfReaction, findIndexedFrame, pathwaysOfGene, transportP

Page 42: Computing with Pathway/Genome Databases

Writing a PerlCyc or JavaCyc program

Create a PerlCyc or JavaCyc object:

    perlcyc -> new ("ORGID")
    new Javacyc ("ORGID")

Call PerlCyc or JavaCyc functions on this object:

    my $cyc = perlcyc -> new ("ECOLI");
    my @pathways = $cyc -> all_pathways ();

    Javacyc cyc = new Javacyc("ECOLI");
    ArrayList pathways = cyc.allPathways ();

Functions return object IDs, not objects. Must connect to server again to retrieve attributes of an object.

    foreach my $p (@pathways) {
        print $cyc -> get_slot_value ($p, "COMMON-NAME");
    }

    for (int i = 0; i < pathways.size(); i++) {
        String pwy = (String) pathways.get(i);
        System.out.println (cyc.getSlotValue (pwy, "COMMON-NAME"));
    }

Page 43: Computing with Pathway/Genome Databases

Sample PerlCyc Query

Number of proteins in E. coli

    use perlcyc;
    my $cyc = perlcyc -> new ("ECOLI");
    my @proteins = $cyc -> get_class_all_instances("|Proteins|");
    my $protein_count = scalar(@proteins);
    print "Protein count: $protein_count.\n";

Page 44: Computing with Pathway/Genome Databases

Sample PerlCyc Query

Print IDs of all proteins with molecular weight between 10 and 20 kD and pI between 4 and 5.

    use perlcyc;
    my $cyc = perlcyc -> new ("ECOLI");

    foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
        my $mw = $cyc->get_slot_value($p, "molecular-weight-kd");
        my $pI = $cyc->get_slot_value($p, "pi");
        if ($mw <= 20 && $mw >= 10 && $pI <= 5 && $pI >= 4) {
            print "$p\n";
        }
    }

Page 45: Computing with Pathway/Genome Databases

Sample PerlCyc Query

List all the transcription factors in E. coli, and the list of genes that each regulates:

    use perlcyc;
    my $cyc = perlcyc -> new ("ECOLI");

    foreach my $p ($cyc->get_class_all_instances("|Proteins|")) {
        if ($cyc->transcription_factor_p($p)) {
            my $name = $cyc->get_slot_value($p, "common-name");
            my %genes = ();
            foreach my $tu ($cyc->regulon_of_protein($p)) {
                foreach my $g ($cyc->transcription_unit_genes($tu)) {
                    $genes{$g} = $cyc->get_slot_value($g, "common-name");
                }
            }
            print "\n\n$name: ";
            print join " ", values %genes;
        }
    }

Page 46: Computing with Pathway/Genome Databases

Sample Editing Using PerlCyc

Add a link from each gene to the corresponding object in MY-DB (assume ID is same in both cases)

    use perlcyc;
    my $cyc = perlcyc -> new ("HPY");

    my @genes = $cyc->get_class_all_instances ("|Genes|");
    foreach my $g (@genes) {
        $cyc->add_slot_value ($g, "DBLINKS", "(MY-DB \"$g\")");
    }

    $cyc->save_kb();

Page 47: Computing with Pathway/Genome Databases

Sample JavaCyc Query: Enzymes for which ATP is a regulator

    import java.util.*;

    public class JavacycSample {
        public static void main(String[] args) {
            Javacyc cyc = new Javacyc("ECOLI");
            ArrayList regframes = cyc.getClassAllInstances("|Regulation-of-Enzyme-Activity|");
            for (int i = 0; i < regframes.size(); i++) {
                String reg = (String) regframes.get(i);
                boolean bool = cyc.memberSlotValueP(reg, "Regulator", "ATP");
                if (bool) {
                    String enzrxn = cyc.getSlotValue(reg, "Regulated-Entity");
                    String enzyme = cyc.getSlotValue(enzrxn, "Enzyme");
                    System.out.println(enzyme);
                }
            }
        }
    }

Page 48: Computing with Pathway/Genome Databases

Simple Lisp Query Example: Enzymes for which ATP is a regulator

    (defun atp-inhibits ()
      (loop for x in (get-class-all-instances '|Regulation-of-Enzyme-Activity|)
            ;; Does the Regulator slot contain the compound ATP, and the mode
            ;; of regulation is negative (inhibition)?
            when (and (member-slot-value-p x 'Regulator 'ATP)
                      (member-slot-value-p x 'Mode "-"))
            ;; Whenever the test is positive, we collect the value of the slot Enzyme
            ;; of the Regulated-Entity of the regulatory interaction frame.
            ;; The collected values are returned as a list, once the loop terminates.
            collect (get-slot-value (get-slot-value x 'Regulated-Entity) 'Enzyme)))

    ;;; invoking the query:
    (select-organism :org-id 'ECOLI)
    (atp-inhibits)

Page 49: Computing with Pathway/Genome Databases

Simple Perl Query Example: Enzymes for which ATP is a regulator

    use perlcyc;
    my $cyc = perlcyc -> new("ECOLI");
    my @regs = $cyc -> get_class_all_instances("|Regulation-of-Enzyme-Activity|");

    ## We check every instance of the class
    foreach my $reg (@regs) {
        ## We test whether the Regulator slot contains the compound frame ATP
        ## and whether the mode of regulation is negative (inhibition)
        my $bool1 = $cyc -> member_slot_value_p($reg, "Regulator", "Atp");
        my $bool2 = $cyc -> member_slot_value_p($reg, "Mode", "-");
        if ($bool1 && $bool2) {
            ## Whenever the test is positive, we collect the value of the slot ENZYME.
            ## The results are printed in the terminal.
            my $enzrxn = $cyc -> get_slot_value($reg, "Regulated-Entity");
            my $enz = $cyc -> get_slot_value($enzrxn, "Enzyme");
            print STDOUT "$enz\n";
        }
    }

Page 50: Computing with Pathway/Genome Databases

Getting started with Lisp

    pathway-tools -lisp
    (load "file")
    (compile-file "file.lisp")

Emacs is a useful editor
Pathway Tools source code is available: ask
Overview of Lisp information resources: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html
Documented Pathway Tools Lisp functions: http://brg.ai.sri.com/ptools/ptools-fns.html

Page 51: Computing with Pathway/Genome Databases

Viewing Results via the Answer List

    (loop for r in (get-class-all-instances '|Reactions|)
          when (< 3 (length (get-slot-values r 'left)))
          collect r)

    (setq answer *)
    (object-table answer)
    (replace-answer-list answer)
    (pt)

Then use Next Answer to step through the results

Page 52: Computing with Pathway/Genome Databases

Query Gotchas

Study schema carefully
:test #'fequal
Cascade of slot-values: check for NIL
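
The "cascade" gotcha arises when one slot lookup feeds the next, as in taking the Enzyme of the Regulated-Entity of a regulation frame: if an intermediate slot is empty, the next lookup is applied to nothing. A minimal sketch of a guarded cascade, using plain maps as a stand-in for a PGDB; `SlotCascade`, `cascade`, and the frame IDs in the usage note are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of a null-safe slot cascade; the maps stand in for a PGDB.
class SlotCascade {
    // Returns the value of frame.slot, or null when the frame or the slot is absent.
    static String getSlotValue(Map<String, Map<String, String>> kb, String frame, String slot) {
        Map<String, String> slots = kb.get(frame);
        return slots == null ? null : slots.get(slot);
    }

    // Follows frame.slot1.slot2... and stops with an empty result at the first missing value.
    static Optional<String> cascade(Map<String, Map<String, String>> kb, String frame, String... slots) {
        String current = frame;
        for (String slot : slots) {
            current = getSlotValue(kb, current, slot);
            if (current == null) return Optional.empty();
        }
        return Optional.of(current);
    }
}
```

For a regulation frame whose Regulated-Entity slot is empty, `cascade(kb, "REG-2", "Regulated-Entity", "Enzyme")` yields an empty result rather than failing on the second lookup.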

Page 53: Computing with Pathway/Genome Databases

Semantic Inference Layer

relationships.lisp
Library of functions that encapsulate common query building blocks and intricacies of navigating the schema

    enzymes-of-gene
    reactions-of-gene
    pathways-of-gene
    genes-of-pathway
    pathway-hole-p
    reactions-of-compound
    top-containers(protein)
    all-rxns(type)
        type: :metab-smm, :metab-all, :metab-pathways, :enzyme, :transport, etc.
        Example: (all-rxns :metab-pathways)

Page 54: Computing with Pathway/Genome Databases

Pathway Tools Schema and Semantic Inference Layer

Genes, Operons, and Replicons

Page 55: Computing with Pathway/Genome Databases

Representing a Genome

Classes:
    ORG is of class Organisms
    CHROM1 is of class Chromosomes
    PLASMID1 is of class Plasmids
    Gene1 is of class Genes
    Product1 is of class Polypeptides or RNAs

[Diagram: ORG links through genome to CHROM1, CHROM2, and PLASMID1; a genetic element links through components to Gene1, Gene2, and Gene3; Gene1 links through product to Product1.]

Page 56: Computing with Pathway/Genome Databases

Polynucleotides

Review slots of COLI and of COLI-K12

Page 57: Computing with Pathway/Genome Databases

Genetic-Elements

Sequence is stored in a separate file or database table

Page 58: Computing with Pathway/Genome Databases

Polymer-Segments

Review slots of Genes

Page 59: Computing with Pathway/Genome Databases

Complexities of Gene / Gene-Product Relationships

The Product of a gene can be an instance of Polypeptides or RNAs
An instance of Polypeptides can have more than one gene encoding it
Sequence position:
    Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position (usually greater, except at origin)
    Transcription-Direction + / -
Alternative splicing:
    Nucleotide positions of starting and ending codons specified in Left-End-Position and Right-End-Position
    Intron positions specified in Splice-Form-Introns of gene product (200 300) (350 400)
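
As a worked example of the coordinate arithmetic, the sketch below computes a gene's span and its spliced length from end positions and intron pairs such as (200 300) (350 400). It rests on assumptions not stated in the slides: positions are treated as inclusive, the left end is not greater than the right end (genes spanning the origin are not handled), and the gene coordinates 101 and 500 are invented for the example. `GeneSpan` and its methods are hypothetical helpers, not Pathway Tools functions.

```java
// Hypothetical helper for illustrating gene coordinates; not part of Pathway Tools.
class GeneSpan {
    // Length in nucleotides, assuming inclusive positions with leftEnd <= rightEnd.
    static int spanLength(int leftEnd, int rightEnd) {
        if (leftEnd > rightEnd) {
            throw new IllegalArgumentException("spans crossing the origin are not handled");
        }
        return rightEnd - leftEnd + 1;
    }

    // Spliced length: the full span minus each intron, given as {start, end} pairs.
    static int splicedLength(int leftEnd, int rightEnd, int[][] introns) {
        int length = spanLength(leftEnd, rightEnd);
        for (int[] intron : introns) {
            length -= spanLength(intron[0], intron[1]);
        }
        return length;
    }

    public static void main(String[] args) {
        int[][] introns = { {200, 300}, {350, 400} };
        // Span 101..500 is 400 nt; the introns remove 101 + 51 nt, leaving 248.
        System.out.println(splicedLength(101, 500, introns));
    }
}
```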

Page 60: Computing with Pathway/Genome Databases

Gene Reaction Schematic

Page 61: Computing with Pathway/Genome Databases

Exercises

Find all genes on a given chromosome

Find all ribosomal RNAs

Find the DNA sequence of a given gene

Find all proteins longer than 1,000 amino acids

Page 62: Computing with Pathway/Genome Databases

Exercises

Find all genes on a given chromosome

    (defun genes-of-chrom (chrom)
      (loop for x in (get-slot-values chrom 'components)
            when (instance-all-instance-of-p x '|Genes|)
            collect x))

Find all ribosomal RNAs

    (get-class-all-instances '|rRNAs|)

Find the DNA sequence of a given gene

    (get-gene-sequence gene)

Page 63: Computing with Pathway/Genome Databases

Exercises

Find all monomers whose genes are longer than 1,000 nucleotides

    (loop for g in (get-class-all-instances '|Genes|)
          for p = (get-slot-value g 'product)
          when (and (< 1000 (abs (- (get-slot-value g 'left-end-position)
                                    (get-slot-value g 'right-end-position))))
                    (instance-all-instance-of-p p '|Polypeptides|))
          collect p)

Page 64: Computing with Pathway/Genome Databases

Proteins

Page 65: Computing with Pathway/Genome Databases

Proteins and Protein Complexes

Polypeptide: the monomer protein product of a gene (may have multiple isoforms, as indicated at gene level)

Protein complex: proteins consisting of multiple polypeptides or protein complexes

Example: DNA pol III
    DnaE is a polypeptide
    pol III core is DnaE and two other polypeptides
    pol III holoenzyme is several protein complexes combined

Page 66: Computing with Pathway/Genome Databases

Protein Complex Relationships

Page 67: Computing with Pathway/Genome Databases

Slots of a protein (DnaE)

catalyzes
Is it an activator/reactant/etc.?
comments
component-of
dblinks
features (edited in feature editor)

Many other features possible

Page 68: Computing with Pathway/Genome Databases

A complex at the frame level (pol III)

Same features as polypeptide frame, different use

comment
component-of and components
    note coefficients

Page 69: Computing with Pathway/Genome Databases

Protein Complex Relationships

Page 70: Computing with Pathway/Genome Databases

Relationships are Defined in Many Places

component-of comes from creating a complex

appears-in-left-side-of comes from defining a reaction (as do modified forms)

inhibitor-of comes from an enzymatic reaction

can only edit dna-footprint if protein has been associated with a TU

Page 71: Computing with Pathway/Genome Databases

Semantic Inference Layer

Reactions-of-protein (prot)
    Returns a list of rxns this protein catalyzes

Transcription-units-of-proteins (prot)
    Returns a list of TUs activated/inhibited by the given protein

Transporter? (prot)
    Is this protein a transporter?

Polypeptide-or-homomultimer? (prot)

Transcription-factor? (prot)

Obtain-protein-stats
    Returns 5 values
    Length of: all-polypeptides, complexes, transporters, enzymes, etc.

Page 72: Computing with Pathway/Genome Databases

Example

Find all enzymes that use pyridoxal phosphate as a cofactor or prosthetic group

    (loop for protein in (get-class-all-instances '|Proteins|)
          for enzrxn = (get-slot-value protein 'enzymatic-reaction)
          when (and enzrxn
                    (or (member-slot-value-p enzrxn 'cofactors 'pyridoxal_phosphate)
                        (member-slot-value-p enzrxn 'prosthetic-groups 'pyridoxal_phosphate)))
          collect protein)

(member-slot-value-p frame slot value): T if Value is one of the values of Slot of Frame.

Page 73: Computing with Pathway/Genome Databases

Example Queries

Find all homomultimers

Find proteins whose pI > 10, and that reside on the negative strand of the first chromosome

Page 74: Computing with Pathway/Genome Databases

Sample

Find all proteins without a comment anywhere

Page 75: Computing with Pathway/Genome Databases

Compounds / Reactions / Pathways

Page 76: Computing with Pathway/Genome Databases

Compounds / Reactions / Pathways

Think of a three tiered structure: Reactions built on top of compounds Pathways built on top of reactions

Metabolic network defined by reactions alone; pathways are an additional “optional” structure

Some reactions not part of a pathway
Some reactions have no attached enzyme
Some enzymes have no attached gene

Page 77: Computing with Pathway/Genome Databases

Compounds

Page 78: Computing with Pathway/Genome Databases

Page 79: Computing with Pathway/Genome Databases

Compounds

Relatively few aspects of a compound defined within the compound editor

MW, formula calculated from edited structure

Most aspects defined in other editors
    “Pathway reactions” comes from reaction editing followed by pathway editing
    Activator, etc. come from the enzymatic reaction editor

Page 80: Computing with Pathway/Genome Databases

-- Instance TRP --- Types: |Amino-Acid|, |Aromatic-Amino-Acids|, |Non-polar-amino-acids|

APPEARS-IN-LEFT-SIDE-OF: RXN0-287, TRANS-RXN-76, TRYPTOPHAN-RXN, TRYPTOPHAN--TRNA-LIGASE-RXN

APPEARS-IN-RIGHT-SIDE-OF: RXN0-2382, RXN0-301, TRANS-RXN-76, TRYPSYN-RXN

CHEMICAL-FORMULA: (C 11), (H 12), (N 2), (O 2)

COMMON-NAME: "L-tryptophan"

DBLINKS: (LIGAND-CPD "C00078" NIL |kaipa| 3311532640 NIL NIL), (CAS "6912-86-3"), (CAS "73-22-3")

NAMES: "L-tryptophan", "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"

SMILES: "c1(c(CC(N)C(=O)O)c2(c([nH]1)cccc2))"

SYNONYMS: "W", "tryptacin", "trofan", "trp", "tryptophan", "2-amino-3-indolylpropanic acid"

____________________________________________

Page 81: Computing with Pathway/Genome Databases

Where is diphosphate in the ontology?

Page 82: Computing with Pathway/Genome Databases

Semantic Inference Layer

Reactions-of-compound (cpd)

Pathways-of-compound (cpd)

Is-substrate-an-autocatalytic-enzyme-p (cpd)

Activated/inhibited-by? (cpds slots)
    Returns a list of enzrxns for which a cpd in cpds is a modulator (example slots: activators-all, activators-allosteric)

All-substrates (rxns)
    All unique substrates specified in the given rxns

Has-structure-p (cpd)

Obtain-cpd-stats
    Returns two values: Length of: all-cpds, cpds with structures

Page 83: Computing with Pathway/Genome Databases

Miscellaneous things...

History List
    Back/Forward and History buttons
    Default list is 50 items

Show frame
    (print-frame 'frame)

Page 84: Computing with Pathway/Genome Databases

Page 85: Computing with Pathway/Genome Databases

Queries with Multiple Answers

Navigator queries:
    Example: Substring search for “pyruvate”
    Selected list is placed on the Answer list
    Use “Next Answer” button to view each one of them

Lisp queries:
    Example: Find reactions involving pyruvate as a substrate

    (get-class-all-instances '|Compounds|)

    (loop for rxn in (get-class-all-instances '|Reactions|)
          when (member 'pyruvate (get-slot-values rxn 'substrates) :test #'fequal)
          collect rxn)
    (replace-answer-list *)