Systems Biology through Pathway Statistics Chris Evelo BiGCaT Bioinformatics Group – BMT-TU/e & UM Diepenbeek; May 14 2004


Page 1:

Systems Biology through Pathway Statistics

Chris Evelo
BiGCaT Bioinformatics Group – BMT-TU/e & UM
Diepenbeek; May 14 2004

Page 2:

BiGCaT Bioinformatics

Where the cat hunts

Page 3:

BiGCaT Bioinformatics, bridge between two universities

Universiteit Maastricht: Patients, Experiments, Arrays and Loads of Data

TU/e: Ideas & Experience in Data Handling

BiGCaT

LUC Diepenbeek: Statistical Foundations

Page 4:

BiGCaT Bioinformatics, between two research fields

Cardiovascular Research

Nutritional & Environmental Research

BiGCaT

Page 5:

Our usual prey: gene expression arrays

Microarrays: relative fluorescence signals. Identification.

Macroarrays: absolute radioactive signal. Validation.

Page 6:

Transcriptomics:

The study of genome wide gene expression on the transcriptional level

Where genome wide means: >20K genes.

And transcriptional level means that somehow >20K mRNA sequences have to be analyzed

And >20K expression values have to be filtered, normalized, replicate treated, clustered and understood

Thus no transcriptomics without bioinformatics
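
The "filtered, normalized, replicate treated" steps above can be sketched roughly as follows. This is a minimal illustration, not the group's actual pipeline: the function names, the signal threshold of 100, the use of log2 sample/reference ratios, and median centering as the normalization are all assumptions made for the example.

```python
import math
from statistics import mean, median


def log_ratios(sample, reference, min_signal=100.0):
    """Filter weak spots, then return log2(sample / reference) per gene.

    The min_signal threshold is a hypothetical cut-off, not a value from the talk.
    """
    ratios = {}
    for gene, s in sample.items():
        r = reference[gene]
        if s >= min_signal and r >= min_signal:  # filtering step
            ratios[gene] = math.log2(s / r)
    return ratios


def median_center(ratios):
    """Normalize one array by subtracting its median log ratio."""
    m = median(ratios.values())
    return {gene: value - m for gene, value in ratios.items()}


def combine_replicates(arrays):
    """Average each gene over replicates, keeping only genes present on all arrays."""
    shared = set.intersection(*(set(a) for a in arrays))
    return {gene: mean(a[gene] for a in arrays) for gene in sorted(shared)}
```

Each array would pass through `log_ratios` and `median_center` first, and the resulting dictionaries would then go to `combine_replicates`. Clustering and interpretation are left out on purpose.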

Page 7:

No separate statistics?

Previous slide: “…have to be: filtered, normalized, replicate treated, clustered and understood”

Don’t we have to know which genes really changed?

Page 8:

Changed?

We need statistical proof of genes changing because…

Scientists ask for it.

Journals ask for it.

But do we really need it?

Page 9:

No we don't!

Biologists will double check anyway

The largest problem is false positives: 1 in 1000 means 20 on an array!

Replicate filtering gets rid of that, losing very little power

Of course that needed statistical proof

To understand we need pathways, not single genes (or proteins)
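
The false-positive arithmetic on this slide can be checked with a short sketch. It assumes an array of 20,000 genes (the ">20K" from the transcriptomics slide) and, as a simplification, that replicate filtering means a gene must pass the same test independently on every replicate. The function name is hypothetical.

```python
def expected_false_positives(n_genes, fp_rate, n_replicates=1):
    """Expected number of unchanged genes called 'changed' by chance alone.

    Assumes the replicates are independent and that a gene is kept only
    when it passes on every one of them.
    """
    return n_genes * fp_rate ** n_replicates


# One array: 20,000 genes at a 1-in-1000 false-positive rate, about 20 genes.
single = expected_false_positives(20_000, 1 / 1000)

# Two independent replicates that must agree: about 0.02 genes.
duplicate = expected_false_positives(20_000, 1 / 1000, n_replicates=2)
```

Real replicates are never fully independent, so the second figure is an optimistic lower bound rather than a prediction.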

Page 10:

Two types of arrays

Single longer (>60 mer) cDNA reporters
Agilent, Incyte, custom
1 value per reporter
Reference variability or multi array stats

Multi short (25 mer) oligo reporters
Affymetrix
16-20 values per reporter
Single array statistics

Page 11:

Systems Biology Triangle

Systems Biology

Transcriptomics: microarrays, 20 k (available)

Metabolomics: Large scale analytical chemistry (developing outside)

Proteomics: 2D-gels, antibody techniques (developing inside)

Page 12:

Proteomics would be:

The study of genome wide gene expression on the translational level

Where genome wide would mean: >20K proteins.

Then proteomics does not yet exist!

Page 13:

Protein variants derived from single genes

[Figure labels: Alternative splicing? Phosphorylation? Modification?]

Page 14:

Two types of omics

Transcriptomics
Microarrays
Values for 20 K genes
Annotation difficult

Proteomics
Currently only 2D+MS
Only 20-50 identified proteins
Annotation is identification
Plus modifications

Page 15:

Gene Ontology (GO) levels (I)

Amigo browser: http://www.godatabase.org/cgi-bin/go.cgi

GO consortium: http://www.geneontology.org

The Gene Ontology (GO) project gives consistent descriptions of gene products from different databases.

Page 16:

Gene Ontology (GO) levels (II)

Page 17:

Use of GO classification -GenMAPP-

GenMAPP = Gene MicroArray Pathway Profiler

Program to visualize Gene Expression Data on MAPPs representing biological pathways and grouping of genes

* Local MAPPs contain pathways made by specific research institutes

* Gene Ontology (GO) MAPPs contain pathways with functionally related genes from the public Gene Ontology Project

Page 18:

Example Local MAPP

Page 19:

Example GO MAPP

Page 20:

Local MAPP

Page 21:

GO MAPP

Page 22:

Understanding changes

Map changed genes/proteins (quantitatively or qualitatively) to known pathways.

Or use information from the Gene Ontology (GO) database

Steal and smartly adapt a transcriptomics tool: GenMAPP/MAPPFinder

Rachel will show some examples
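
One way to turn "map changed genes to known pathways" into a pathway statistic is an over-representation z-score, which compares the number of changed genes on a pathway with the number expected by chance. The sketch below is an illustration under assumptions, not the exact calculation of any particular tool: the function name and parameter names are hypothetical, and the variance term uses the standard correction for sampling without replacement.

```python
import math


def pathway_z_score(n_measured, n_changed, n_on_pathway, n_changed_on_pathway):
    """Over-representation z-score for a single pathway.

    n_measured           -- genes measured on the array (N)
    n_changed            -- genes meeting the 'changed' criterion overall (R)
    n_on_pathway         -- measured genes that belong to the pathway (n)
    n_changed_on_pathway -- changed genes that belong to the pathway (r)
    """
    if not (0 < n_changed < n_measured and 0 < n_on_pathway < n_measured):
        raise ValueError("counts must satisfy 0 < R < N and 0 < n < N")
    p = n_changed / n_measured                      # overall fraction changed
    expected = n_on_pathway * p                     # changed genes expected by chance
    finite_correction = 1 - (n_on_pathway - 1) / (n_measured - 1)
    variance = n_on_pathway * p * (1 - p) * finite_correction
    return (n_changed_on_pathway - expected) / math.sqrt(variance)


# Toy numbers (invented): 20,000 genes measured, 1,000 changed,
# a pathway of 50 measured genes of which 10 changed.
z = pathway_z_score(20_000, 1_000, 50, 10)
```

A large positive score marks a pathway with more changed genes than chance would give. Because many pathways are scored at once, such scores are better used for ranking than read as exact significance levels.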