Bioinformatics as an integrative science Jaap Heringa Faculty of Sciences Faculty of Earth and Life Sciences Integrative Bioinformatics Institute VU (IBIVU)

www.cs.vu.nl/~ibivu, Tel. +31-20-4447649

Gathering knowledge

• Anatomy, architecture

• Dynamics, mechanics

• Informatics (cybernetics: Wiener, 1948). Cybernetics has been defined as the science of control in machines and animals, and hence it applies to technological, animal and environmental systems.

• Genomics, bioinformatics

Rembrandt, 1632

Newton, 1726

Mathematics, Statistics

Computer Science, Informatics

Biology, Molecular biology

Medicine

Chemistry

Physics

Bioinformatics

Bioinformatics: “Studying informational processes in biological systems” (Hogeweg, Utrecht; early 1970s)

Applying algorithms and mathematical formalisms to biology (genomics): started in the USA, now everywhere

Taking care of the computational infrastructure and data management: everywhere

A supporting science: everywhere

“Information technology applied to the management and analysis of biological data” (Attwood and Parry-Smith)

The Human Genome -- 26 June 2000

Dinner discussion: Integrative Bioinformatics & Genomics VU

metabolome

proteome

genome

transcriptome

physiome

Genomics

A gene codes for a protein

DNA (4-letter alphabet): CCTGAGCCAACTATTGATGAA
↓ transcription
mRNA (4-letter alphabet): CCUGAGCCAACUAUUGAUGAA
↓ translation
Protein (20-letter alphabet): PEPTIDE
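The two steps above can be mimicked in a few lines. This is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and treating the DNA shown as the coding strand, so that transcription amounts to replacing T by U. The names `transcribe`, `translate` and `CODON_TABLE` are illustrative and do not come from any library.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in U, C, A, G order
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate complete codons one by one, stopping at the first stop codon ('*')."""
    protein = []
    for pos in range(0, len(mrna) - len(mrna) % 3, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

The 64-character string lists the amino acid for every codon in a fixed order, which keeps the table compact; a real pipeline would also handle alternative genetic codes and ambiguous bases.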

Humans have spliced genes…

DNA makes RNA makes Protein

Remarks

• Proteins can use different combinations of exons: alternative splicing

• The human factor VIII gene (whose mutations cause hemophilia A) is spread over ~186,000 bp. It consists of 26 exons ranging in size from 69 to 3,106 bp, and its 25 introns range in size from 207 to 32,400 bp. The complete gene is thus ~9 kb of exon and ~177 kb of intron.

• The biggest known human gene is the one for dystrophin. It has > 30 exons and is spread over 2.4 million bp.

• Single Nucleotide Polymorphism (SNP) data are important for health
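Alternative splicing can be pictured as choosing which exons end up in the mature transcript. The toy Python sketch below assumes a simple cassette-exon model in which the first and last exons are always kept and every internal exon may be included or skipped independently. Real splicing is far more constrained, and the exon strings and the function name `exon_skipping_isoforms` are made up for illustration.

```python
from itertools import product


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """Enumerate transcripts that keep the first and last exon and
    include or skip each internal exon independently."""
    if len(exons) < 2:
        return ["".join(exons)]
    first, *internal, last = exons
    isoforms = []
    for mask in product([True, False], repeat=len(internal)):
        kept = [exon for exon, keep in zip(internal, mask) if keep]
        isoforms.append(first + "".join(kept) + last)
    return isoforms


# Hypothetical four-exon gene: two optional internal exons give 2**2 = 4 isoforms
for isoform in exon_skipping_isoforms(["AUG", "CCC", "GGG", "UAA"]):
    print(isoform)
```

Under this model a gene with n internal exons can yield up to 2**n isoforms, which hints at how a gene with many exons, such as factor VIII, could in principle encode several protein variants.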

Microarray with about 20K genes…

Proteomics

• X-ray crystallography

• NMR

• Mass spectrometry data

• Structural genomics: solving and categorising all existing protein folds (3D structures)

• Protein-protein interactions

• Protein-ligand interactions (drug design)

Metabolic networks

Glycolysis and gluconeogenesis

KEGG database (Japan)

Physiome

• Metabolomics plus all the other small components of the cell

• Ions, protons, etc.

Algorithms in bioinformatics

• string algorithms

• dynamic programming

• machine learning (NN, k-NN, SVM, GA, ...)

• Markov chain models

• hidden Markov models

• Markov Chain Monte Carlo (MCMC) algorithms

• stochastic context-free grammars

• EM algorithms

• Gibbs sampling

• clustering

• tree algorithms

• text analysis

• hybrid/combinatorial techniques and more…

Free University initiatives

• Integrative Bioinformatics Institute VU (IBIVU)

• Centre for Research on BioComplex Systems (CRBCS): Systems Biology

• Centre for Neurobiology and Cognitive Research (CNCR)

• VU Medical Centre (microarray, CGH data)

IBIVU supporting Dutch initiatives

• BioRange: pan-Dutch bioinformatics proposal (65M Euro)

• Centre for Medical Systems Biology (Leiden, A’dam, R’dam)

• Ecogenomics (A’dam, Wageningen, Nat. Inst. for Health and Environment (RIVM))

• BioASP: streamline/stimulate bioinformatics teaching across the Netherlands

Dutch Centres of Excellence

•Cancer Genomics Consortium [DCGP]

•Center for Biosystem Genomics [CBSG], focuses on plant genomics (potato, tomato) 

•Kluyver Centre for Genomics of Industrial Fermentation [Kluyver]

•Center for Medical Systems Biology [CMSB], focuses on multifactorial disease

•Netherlands Proteomics Centre for proteomics as an emerging horizontal genomics discipline

Dutch academic/industrial initiatives

• Nutrigenomics: exploring the prevention and care of nutritional inroads in vascular disease, diabetes, hypertension and obesity

• Interaction between the immune system and food: a functional genomics approach to celiac disease

• Mechanisms of life-threatening virus disease and new leads for treatment and vaccines

• Genomics of host-respiratory virus interactions: towards novel intervention strategies

• Ecogenomics: functioning of ecosystems, targeted at sustainable, environmentally friendly and healthy products (ecology, toxicology and sustainable innovation)

Ecogenomics: Assessing the Living Soil

[Diagram labels: research questions; life-support functions of soil; eco-toxicology; technology development; in vitro biological response array; in situ metagenome array; bioinformatics technology platform]

Medical Systems Biology

Project / Coordinator / Task force

EPIDEMIOLOGY: Dorret Boomsma (VU/mc)*, Cornelia van Duyn (EMC)
Populations: EMC: van Duyn, Hofman, Oostra; VU/mc: Boomsma, Boers, Dijkmans, Heine, Hoogendijk, van der Knaap, Meier, Pena, Pinedo; LUMC: Slagboom, Bertina, Breedveld, Breuning, Cornelisse, Devilee, vDissel, Ferrari, Huizinga, Roosendaal, Roos, van der Velde, Westendorp, Zitman
Genotyping: LUMC: Slagboom, Sandkuijl, den Dunnen; EMC: Oostra, Heutink; VU/mc: Boomsma, (Heutink)

SYSTEMS BIOLOGY: Jan vd Greef (TNO/UL)*, Cor Verweij (VU/mc)
Arraying: LUMC: den Dunnen, Boer, Fodde; VU/mc: Verweij, Ylstra, Brakenhoff; EMC: Oostra
Proteomics: LUMC: Koning, Deelder, den Dunnen, van der Maarel; UL: Overkleeft, Abrahams; VU/mc: Smit, Li, van Kooyk
Metabolomics: UL: Verduijn Lunel, van de Geer, Verheij; TNO: van der Greef, Havekes, te Koppele; VU/mc: Jakobs

TECHNOLOGY: Huub de Groot (UL)
Molecular interactions: UL: Abrahams, Brouwer, IJzerman, van Boom; LUMC: Tanke, Raap, Deelder, den Dunnen; VU/mc: Leurs, Irth
In vivo imaging: UL: de Groot, Kok; LUMC: Reiber, van Buchem, de Roos, Poelmann, Lowick; VU/mc: Witter, Bal, Lammertsma; EMC: van Duyn, van Swieten

MODEL SYSTEMS: Rune Frants (LUMC)
Mouse/Rat, Zebrafish, Drosophila, Yeast: EMC: Oostra; LUMC: Verbeek, Fodde, de Kloet, Verrijzer, Noordermeer, Mullenders; TNO: Havekes; UL: Spaink, Brouwer, Schmidt; VU/mc: Verhage, Smit, Vandenbroucke-Grauls

CLINICAL APPLICATIONS: Cornelis Melief (LUMC)
Cells, vaccines: LUMC: Melief, Goulmy, Falkenburg, Ottenhoff; Spaan, de Vries; VU/mc: van Kooyk; Meijer, Pinedo
Viral: LUMC: Spaan, Wiertz, Hoeben; VU/mc: Gerritsen, Curiel
Methodologies, Pharmaceuticals: UL: IJzerman, Mulder, van Boom; LUMC: Huizinga, Breedveld, Breuning, van Deutekom, Ferrari, Fodde, Frants, Jukema, de Kloet, Ottenhoff, van der Velde, Zitman; VU/mc: Maassen, Dijkmans, Leurs, Meijer, Pinedo; EMC: Stricker

CENTRAL PROJECT (Coordinator / Elements)
DATA INTEGRATION, ANALYSIS AND LOGISTICS: NN
Central Information Management: TFBI / BIG-VU / EBB; Rosetta Resolver®; LIMS integration/interfacing
Biostatistics: van Houwelingen, Eijlers, Boer, Sandkuijl (LUMC); van der Vaart, de Gunst, Boers (VU/mc); Houwing-Duistermaat (EMC); van de Geer (UL)
Bioinformatics: Boer, Svensson, Gorbalenya (LUMC); Heringa, vBeek (VU/mc); Stijnen, van der Lei, Mons (EMC); Kok (UL)
BioASP interface, in collaboration with Vriend/Tellegen
GRID: Virtual Laboratory NWO-BMI FLEXwork: van Ommen, Boer, Svensson, in collaboration with Stiekema (Wag), Herzberger (Ams), Vriend (Nijm)

Integrative bioinformatics: data integration

• Integrate data sources

• Integrate methods

• Integrate data through method integration (biological model)

Bioinformatics tool: data, algorithm, biological interpretation (model)

“Nothing in Biology makes sense except in the light of evolution” (Theodosius Dobzhansky (1900-1975))

“Nothing in Bioinformatics makes sense except in the light of Biology”

Bioinformatics

Pairwise sequence alignment (more than just string matching)

Sequences: MDAGSTVILCFVG and MDAASTILCGS

Ingredients: amino acid exchange matrix (evolution); gap penalties (open, extension); search matrix; global dynamic programming

Alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
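The ingredients of global alignment by dynamic programming are an exchange matrix, gap-opening and gap-extension penalties, and a search matrix filled cell by cell. The Python sketch below computes only the optimal global score with affine gaps, using three matrices in the style of Gotoh's algorithm, and assumes that a gap of length k costs `gap_open + (k - 1) * gap_extend`. The toy `score` function (+2 for identical residues, -1 otherwise) is a placeholder for a real exchange matrix such as BLOSUM62, and the default penalties are arbitrary choices.

```python
NEG_INF = float("-inf")


def score(a: str, b: str) -> int:
    """Toy exchange score (placeholder for e.g. BLOSUM62): +2 identical, -1 otherwise."""
    return 2 if a == b else -1


def global_affine_score(s: str, t: str, gap_open: int = 4, gap_extend: int = 1) -> float:
    """Optimal global alignment score; a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(s), len(t)
    # M: s[i-1] aligned with t[j-1]; X: s[i-1] against a gap; Y: t[j-1] against a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # leading gap in t
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):  # leading gap in s
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(s[i - 1], t[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing one costs gap_extend
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_affine_score("MDAGSTVILCFVG", "MDAASTILCGS"))
```

Recovering the alignment itself (the rows of residues and dashes) requires a traceback through the three matrices, omitted here for brevity. Separating opening from extension is what makes one long gap cheaper than several short ones.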

Integrative bioinformatics: data integration

Tool: data, algorithm, biological interpretation (model)

Integrative bioinformatics: data integration

Data 1, Data 2, Data 3

Integrative bioinformatics: data integration

One tool combining:
Data 1, Algorithm 1, Biological interpretation (model) 1
Data 2, Algorithm 2, Biological interpretation (model) 2
Data 3, Algorithm 3, Biological interpretation (model) 3

“The solution includes an infrastructure or data pipeline involving:
• a general portal
• virtual lab technology (virtual LIMS)
• ‘petabase’ data handling facilities
• methods, software and ‘tools’ to integrate data and extract knowledge from data in the user domain.

This infrastructure calls for:
• a central facilitation unit providing large storage and computing facilities to run central software packages with user interfaces”

• Could Gridlab do this?

Integrative bioinformatics: data integration

Integrating Primary and Predicted Secondary Structure Data for Multiple Alignment

Using secondary structure in multiple alignment

“Structure more conserved than sequence”

• 10 years of secondary structure (SS) prediction method development: Q3 increased by 3%

• 10 years of multiple alignment (MA) method development: depending on the alignment used, the difference in Q3 can be >30%
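Q3 is the percentage of residues whose predicted three-state secondary structure (helix H, strand E, coil C) matches the observed one. Below is a minimal Python sketch, assuming that any symbol other than H or E (for example the blanks used for coil in PHD output) counts as coil. The function names are illustrative.

```python
def to_three_state(ss: str) -> str:
    """Map every symbol other than H (helix) or E (strand) to C (coil)."""
    return "".join(s if s in "HE" else "C" for s in ss)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    pred, obs = to_three_state(predicted), to_three_state(observed)
    correct = sum(p == o for p, o in zip(pred, obs))
    return 100.0 * correct / len(obs)


print(q3("CCHHHCCEEE", "CCHHHHCEEC"))  # 80.0
```

Because Q3 is a per-residue average, a gain of a few percentage points corresponds to many more residues being predicted correctly when summed over a large test set.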

Protein structure hierarchical levels

VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

PRIMARY STRUCTURE (amino acid sequence)

SECONDARY STRUCTURE (helices, strands)

TERTIARY STRUCTURE (fold)

QUATERNARY STRUCTURE (oligomers)

Flavodoxin-cheY: Praline alignment (prepro=1500)

1fx1 -PKALIVYGSTTGNT-EYTAETIARQLANAG-YEVDSRDAASVEAGGLFEGFDLVLLGCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACF

FLAV_DESDE MSKVLIVFGSSTGNT-ESIaQKLEELIAAGG-HEVTLLNAADASAENLADGYDAVLFgCSAWGMEDL------EMQDDFLSLF-EEFNRFGLAGRKVAAf

FLAV_DESVH MPKALIVYGSTTGNT-EYTaETIARELADAG-YEVDSRDAASVEAGGLFEGFDLVLLgCSTWGDDSI------ELQDDFIPLF-DSLEETGAQGRKVACf

FLAV_DESSA MSKSLIVYGSTTGNT-ETAaEYVAEAFENKE-IDVELKNVTDVSVADLGNGYDIVLFgCSTWGEEEI------ELQDDFIPLY-DSLENADLKGKKVSVf

FLAV_DESGI MPKALIVYGSTTGNT-EGVaEAIAKTLNSEG-METTVVNVADVTAPGLAEGYDVVLLgCSTWGDDEI------ELQEDFVPLY-EDLDRAGLKDKKVGVf

2fcr --KIGIFFSTSTGNT-TEVADFIGKTLGA---KADAPIDVDDVTDPQALKDYDLLFLGAPTWNTG----ADTERSGTSWDEFLYDKLPEVDMKDLPVAIF

FLAV_AZOVI -AKIGLFFGSNTGKT-RKVaKSIKKRFDDET-MSDA-LNVNRVS-AEDFAQYQFLILgTPTLGEGELPGLSSDCENESWEEFL-PKIEGLDFSGKTVALf

FLAV_ENTAG MATIGIFFGSDTGQT-RKVaKLIHQKLDG---IADAPLDVRRAT-REQFLSYPVLLLgTPTLGDGELPGVEAGSQYDSWQEFT-NTLSEADLTGKTVALf

FLAV_ANASP SKKIGLFYGTQTGKT-ESVaEIIRDEFGN---DVVTLHDVSQAE-VTDLNDYQYLIIgCPTWNIGEL--------QSDWEGLY-SELDDVDFNGKLVAYf

FLAV_ECOLI -AITGIFFGSDTGNT-ENIaKMIQKQLGK---DVADVHDIAKSS-KEDLEAYDILLLgIPTWYYGE--------AQCDWDDFF-PTLEEIDFNGKLVALf

4fxn -MK--IVYWSGTGNT-EKMAELIAKGIIESG-KDVNTINVSDVNIDELL-NEDILILGCSAMGDEVL-------EESEFEPFI-EEIS-TKISGKKVALF

FLAV_MEGEL MVE--IVYWSGTGNT-EAMaNEIEAAVKAAG-ADVESVRFEDTNVDDVA-SKDVILLgCPAMGSEEL-------EDSVVEPFF-TDLA-PKLKGKKVGLf

FLAV_CLOAB -MKISILYSSKTGKT-ERVaKLIEEGVKRSGNIEVKTMNLDAVD-KKFLQESEGIIFgTPTYYAN---------ISWEMKKWI-DESSEFNLEGKLGAAf

3chy ADKELKFLVVDDFSTMRRIVRNLLKELGFN--NVEEAEDGVDALNKLQAGGYGFVI---SDWNMPNM----------DGLELL-KTIRADGAMSALPVLM

1fx1 GCGDS-SY-EYFCGA-VDAIEEKLKNLGAEIVQD---------------------GLRIDGD--PRAARDDIVGWAHDVRGAI--------

FLAV_DESDE ASGDQ-EY-EHFCGA-VPAIEERAKELgATIIAE---------------------GLKMEGD--ASNDPEAVASfAEDVLKQL--------

FLAV_DESVH GCGDS-SY-EYFCGA-VDAIEEKLKNLgAEIVQD---------------------GLRIDGD--PRAARDDIVGwAHDVRGAI--------

FLAV_DESSA GCGDS-DY-TYFCGA-VDAIEEKLEKMgAVVIGD---------------------SLKIDGD--PE--RDEIVSwGSGIADKI--------

FLAV_DESGI GCGDS-SY-TYFCGA-VDVIEKKAEELgATLVAS---------------------SLKIDGE--PD--SAEVLDwAREVLARV--------

2fcr GLGDAEGYPDNFCDA-IEEIHDCFAKQGAKPVGFSNPDDYDYEESKS-VRDGKFLGLPLDMVNDQIPMEKRVAGWVEAVVSETGV------

FLAV_AZOVI GLGDQVGYPENYLDA-LGELYSFFKDRgAKIVGSWSTDGYEFESSEA-VVDGKFVGLALDLDNQSGKTDERVAAwLAQIAPEFGLS--L--

FLAV_ENTAG GLGDQLNYSKNFVSA-MRILYDLVIARgACVVGNWPREGYKFSFSAALLENNEFVGLPLDQENQYDLTEERIDSwLEKLKPAV-L------

FLAV_ANASP GTGDQIGYADNFQDA-IGILEEKISQRgGKTVGYWSTDGYDFNDSKA-LRNGKFVGLALDEDNQSDLTDDRIKSwVAQLKSEFGL------

FLAV_ECOLI GCGDQEDYAEYFCDA-LGTIRDIIEPRgATIVGHWPTAGYHFEASKGLADDDHFVGLAIDEDRQPELTAERVEKwVKQISEELHLDEILNA

4fxn G-----SY-GWGDGKWMRDFEERMNGYGCVVVET---------------------PLIVQNE--PDEAEQDCIEFGKKIANI---------

FLAV_MEGEL G-----SY-GWGSGEWMDAWKQRTEDTgATVIGT----------------------AIVNEM--PDNA-PECKElGEAAAKA---------

FLAV_CLOAB STANSIAGGSDIA---LLTILNHLMVKgMLVYSG----GVAFGKPKTHLGYVHINEIQENEDENARIfGERiANkVKQIF-----------

3chy VTAEAKK--ENIIAA---------AQAGAS-------------------------GYVV-----KPFTAATLEEKLNKIFEKLGM------

Iteration 0 SP= 136944.00 AvSP= 10.675 SId= 4009 AvSId= 0.313
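The summary line reports a sum-of-pairs (SP) score and a sequence identity measure for the alignment. Exactly how PRALINE computes and averages these values is not stated here, so the Python sketch below only illustrates the general idea. Every pair of sequences is scored column by column with a toy exchange score (+2 identical, -1 otherwise) and a fixed, assumed gap score, and gap-gap pairs are ignored. Identity is taken as the fraction of residue-residue pairs that are identical. Residues are compared case-insensitively because the alignment above mixes upper- and lower-case letters.

```python
from itertools import combinations


def pair_score(a: str, b: str) -> int:
    """Toy exchange score: +2 for identical residues, -1 otherwise (case-insensitive)."""
    return 2 if a.upper() == b.upper() else -1


def sum_of_pairs(alignment: list[str], gap_score: int = -1) -> tuple[int, float]:
    """Return (SP score, fraction of identical residue pairs) for a multiple alignment."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    sp = 0
    identical = residue_pairs = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are not scored
            if a == "-" or b == "-":
                sp += gap_score
                continue
            sp += pair_score(a, b)
            residue_pairs += 1
            identical += a.upper() == b.upper()
    identity = identical / residue_pairs if residue_pairs else 0.0
    return sp, identity


print(sum_of_pairs(["MDAGST", "MDAAST", "MEAG-T"]))
```

Dividing the SP score by the number of scored pairs would give an average comparable in spirit to AvSP, although the exact normalisation used by the tool is an assumption here.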

Flavodoxin-cheY NJ tree

Secondary structure-induced alignment iteration

3chy-AA SEQUENCE|| AA |ADKELKFLVVDDFSTMRRIVRNLLKELGFNNVEEAEDGVDALNKLQAGGYGFVISDWNMP|

3chy-ITERATION-0|| PHD | EEEEEEE HHHHHHHHHHHHHHHHH E HHHHHHHHHH HHHEEE |

3chy-ITERATION-1|| PHD | EEEEEEEE HHHHHHHHHHHHHHH HHHHHHHH EEEEEE |

3chy-ITERATION-2|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHH EEEEEE |

3chy-ITERATION-3|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE |

3chy-ITERATION-4|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEE |

3chy-ITERATION-5|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE |

3chy-ITERATION-6|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHH EEEEEE |

3chy-ITERATION-7|| PHD | EEEEEEEE HHHHHHHHHHHHHH EEE HHHHHH EEEEE |

3chy-ITERATION-8|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHH EEEEEE |

3chy-ITERATION-9|| PHD | EEEEEEEE HHHHHHHHHHHHHH HHHHHHHHHH EEEEE |

3chy-AA SEQUENCE|| AA |NMDGLELLKTIRADGAMSALPVLMVTAEAKKENIIAAAQAGASGYVVKPFTAATLEEKLNKIFEKLGM|

3chy-ITERATION-0|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHH HHHHHHHHHHHHHH |

3chy-ITERATION-1|| PHD | HHHHHHEEEEEE HHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-2|| PHD | HHHHHHEEEEEE HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-3|| PHD | HHHHHHHHHHHH HHHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-4|| PHD | HHHHH EEEEE HHHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-5|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-6|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |

3chy-ITERATION-7|| PHD | HHHHHHHH EEEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-8|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHHH EEE HHHHHHHHHHHHHH |

3chy-ITERATION-9|| PHD | HHHHHHHH EEEEE HHHHHHHHHHHHHHH EEEE HHHHHHHHHHHHHH |

Flavodoxin-cheY multiple alignment / secondary structure iteration

cheY SSEs (secondary structure elements)
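Reading off the secondary structure elements (SSEs) from a prediction string such as the PHD lines above amounts to finding runs of identical H or E symbols. Here is a small Python sketch, assuming 0-based positions with exclusive end coordinates and treating every other symbol as coil; `min_length` is a hypothetical filter for discarding very short elements.

```python
from itertools import groupby


def sse_segments(ss: str, min_length: int = 1) -> list[tuple[str, int, int]]:
    """Return (state, start, end) for each run of H or E; end is exclusive."""
    segments = []
    position = 0
    for state, run in groupby(ss):
        length = len(list(run))
        if state in "HE" and length >= min_length:
            segments.append((state, position, position + length))
        position += length
    return segments


print(sse_segments("  EEE HHHH "))  # [('E', 2, 5), ('H', 6, 10)]
```

Comparing the segment lists from successive iterations is one simple way to see whether a prediction has stabilised or, as in some of the iterations above, alternates between two states.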

Integrating secondary structure prediction and multiple alignment

• Low-key example

• But difficult

• How to scale up?

• Need new formalisms and technology

SnapDRAGON

Richard A. George

George R.A. and Heringa, J. (2002) J. Mol. Biol., 316, 839-851.

Integrating protein multiple alignment, secondary and tertiary structure prediction to predict structural domains in sequence data

The DEATH Domain

• Present in a variety of eukaryotic proteins involved in cell death.

• Six helices enclose a tightly packed hydrophobic core.

• Some DEATH domains form homotypic and heterotypic dimers.

http://www.mshri.on.ca/pawson

Pyruvate kinase (phosphotransferase)

• barrel regulatory domain

• barrel catalytic substrate binding domain

• nucleotide binding domain

1 continuous + 2 discontinuous domains

Structural domain organisation can be nasty…

Protein structure hierarchical levels

VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

PRIMARY STRUCTURE (amino acid sequence)

SECONDARY STRUCTURE (helices, strands)

TERTIARY STRUCTURE (fold)

QUATERNARY STRUCTURE

SnapDRAGON

Input data: multiple alignment; predicted secondary structure (e.g. CCHHHCCEEE)

100 randomised initial matrices; 100 predictions

• The Cα distance matrix is divided into smaller clusters.

• Separately, each cluster is embedded into a local centroid.

• The final predicted structure is generated from the full embedding of the multiple centroids and their corresponding local structures.

[Diagram labels: Cα distance matrix; target matrix]
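Turning a distance matrix into 3D coordinates is the core of the embedding steps described above. SnapDRAGON's own procedure (clustering, local centroids, randomised starting matrices) is not reproduced here. The Python/NumPy sketch below shows only classical multidimensional scaling, a standard metric embedding, assuming NumPy is available and the input distances are consistent with some point set in three dimensions. The example coordinates are random, not real Cα positions.

```python
import numpy as np


def embed_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Classical multidimensional scaling: coordinates whose pairwise
    Euclidean distances reproduce (or approximate) the given distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred Gram (metric) matrix from squared distances
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]              # indices of the `dim` largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # clip small negative noise to 0
    return eigvecs[:, top] * scale


# Round trip on hypothetical random 3D coordinates
rng = np.random.default_rng(0)
coords = rng.normal(scale=5.0, size=(10, 3))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
recovered = embed_distances(dist)
rec_dist = np.linalg.norm(recovered[:, None, :] - recovered[None, :, :], axis=-1)
print(np.allclose(dist, rec_dist))  # True
```

The recovered coordinates match the originals only up to rotation, reflection and translation, which is all a distance matrix can determine. For predicted (noisy, incomplete) distances the embedding is approximate, one reason methods like SnapDRAGON work from many randomised starts.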

SnapDRAGON

1. Domains in the predicted structures are assigned using the method of Taylor (1997)

2. Domain boundary positions of each model are plotted against the sequence

3. Boundaries are summed and smoothed (biased window protocol)
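Collecting the domain boundary positions found in each predicted model, then summing and smoothing them along the sequence, can be sketched as follows. The slide does not specify the biased window protocol, so this Python sketch substitutes a plain centred moving average. The window size, the example boundary positions and the function names are hypothetical.

```python
def boundary_profile(boundaries_per_model: list[list[int]], seq_length: int,
                     half_window: int = 2) -> list[float]:
    """Sum boundary positions over all models, then smooth with a centred moving average."""
    counts = [0] * seq_length
    for boundaries in boundaries_per_model:
        for pos in boundaries:
            counts[pos] += 1
    smoothed = []
    for i in range(seq_length):
        lo, hi = max(0, i - half_window), min(seq_length, i + half_window + 1)
        smoothed.append(sum(counts[lo:hi]) / (hi - lo))
    return smoothed


def predicted_boundary(profile: list[float]) -> int:
    """Position of the highest smoothed peak (the first one in case of ties)."""
    return max(range(len(profile)), key=profile.__getitem__)


# Hypothetical boundaries (0-based residue positions) from three models
models = [[50], [52], [50, 80]]
profile = boundary_profile(models, seq_length=100)
print(predicted_boundary(profile))  # 50
```

Smoothing lets nearby but not identical boundary calls (here 50 and 52) reinforce each other, while an isolated call (80) stays a minor peak.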

SnapDRAGON

• Predicting domain boundaries for a single average-size protein could take hours on a 128-node cluster computer with simplified significance testing.

• How to scale up to structural genomics? 30,000 human proteins at 1 hour each gives about 3.4 years (30,000 h / 8,760 h per year).

What we still cannot do well

• “Give us sequence, we do the rest” has failed so far; e.g., the number of human genes

• Gene prediction is poor; RNA genes are missed

• Protein structure/function prediction is unsolved; we have no clue about the function of 50% of human genes

• No theory of gene regulation

• Cannot predict post-translational modification well

• Many (database) solutions are not generic

• We have no E=mc², so we need to keep all data

• Integrating methods and data

• Understanding biologically

Future Bioinformatics Research Topics

• Integration of knowledge
– We have some formalisms (ontologies, distributed databases), but we need to develop many completely new formalisms and new technologies beyond what we have now

Conclusions

• Getting important integrative Bioinformatics/Systems Biology applications onto the Grid through Gridlab can be significant

• Bioinformatics and genomics are entering the clinic. Gridlab could play an important role

The end. Thanks

Future Bioinformatics Research Topics

Keywords morning session

• Integration of knowledge
– Information transfer from one object to another
– What are the rules?
– From genotype to phenotypes: current algorithms and ontologies are not sufficient
– Biological interpretation needs context
– DB maintenance is a dynamic process; most info is static
– Need resources
– Environment should allow a student to make a method in 3 hours

• Genomics
– Identifying genetic elements is still bad
– Collect easy primary biological facts
– Gene prediction, structure prediction, function prediction: all unsolved
– Genetic “parts” list is incomplete and scanty
– Many omics “unknowme”

• Genomics
– Hypothesis-driven versus systematic approaches
– Need databases, algorithms, biological knowledge
– Data structures not suitable for complexity
– Solutions such as Ensembl not generic
– Need technologies beyond ontologies
– Need new formalisms to be able to do “vertical genomics”

• Systems Biology
– Very promising area

• Health

• Pharmaceuticals

• Biotechnology

• Environment

• (Medical) Systems Biology
– Diego di Bernardo
– Ilias Jakovidis
– Very promising area

Summary

• How can Europe regain ground?

Hans Werner Mewes

• DNA contains all

• Identifying genetic elements still bad

• Collect easy primary biological facts

• Gene prediction, structure prediction, function prediction: all unsolved

• Genetic “parts” list is incomplete and scanty

• Many omics “unknowme”

• Hypothesis-driven versus systematic approaches

• Need databases, algorithms, biological knowledge

• Data structures not suitable for complexity

• Solutions such as Ensembl not generic

• Need technologies beyond ontologies

• Information transfer from one object to another

• What are the rules?

• From genotype to phenotypes: current algorithms not sufficient

• Biological interpretation needs context

• DB maintenance is a dynamic process; most info is static

• Need resources

• Environment should allow a student to make a method in 3 hours

Diego di Bernardo

• TIGEM: disease genes

• Bioinformatics and computational biology not on a par

• 81% of genome “genomics & databases” and 19% “genomics & algorithms”

• Important topics: regulation, network, digital signal processing, HMMs

• Problems: algorithms not biological and no experimental verification

• Bioinformatics helps design biological experiments

• Richard Durbin: value of physics and engineering

• Computational tools for discovery of novel objects

Ilias Jakovidis

• Medical informatics

• Health telematics

• eHealth

• Medical ontologies didn't help Paul Schofield at all (tried with NCBI: a big mess)

• Middleware includes ontologies and so covers biology (IBM!)

• Language engineering

• Natural language in medicine; computerize the medical community

• Biomedical informatics: applications in healthcare; how do we get to the clinic?

• Synergy between medical informatics and bioinformatics

• Alfonso: medicine will dominate; a lot of money goes to unclear methods

• Medical informatics has worked coherently; how can we do that? How can we change?

• Mewes: bioinformatics has achieved usage, medical informatics has not. Bioinformatics is entering the clinic.

Gunnar von Heijne

• Databases should be funded

• Start-up problem: funding for 5 years, and then what?

• With infrastructure this problem is smaller, so funding is relatively OK

• Technology development should not become dominant

• Most biologists work on a small scale and are hypothesis-driven

• Marketing problem

• Of 19 bioinformatics methods in genomics, 15 are European

• Validation is not always key (Alfonso)

• EMBOSS: a Europe-wide project for algorithm-driven research. EMBOSS is long-standing but could not get EC funding (no suitable funding category)

Alfonso Valencia

• Often, one bioinformatician for everything

• Need for integration/collaboration

– Social and technical barriers

• People should realise that bioinformatics and bioinformaticians are very different

• Integrated (medical) systems

– Underfunded (1 postdoc)

– Difficult to develop

– Lack of standards and repositories

– Difficult to interact with biologists

– All these things are essential

• 3-4 good bioinformatics groups in Spain

• Make a virtual institute for bioinformatics

• There are few large groups with national funding

• There are few large groups with European funding

• There are many small groups with weak institutional funding

• Create a framework valid for biology

• Interaction reduces overhead

• System access for biologists: point to the right expert

• Create new science beyond current needs

• This does not compete with basic needs

• Support strong European areas (e.g. protein interaction)

• Bioinformatics is a new discipline

• Who solves the problem? Whoever is interested in solving it, not always whoever is qualified to solve it (engineers, ...)

• Example, "information extraction in molecular biology": after years, no real progress has been made.

• Systems biology: what to do and how (no linear path), but we have the opportunity to develop knowledge

• Experimental validation: methods debug databases. Many proteins (90%?) have never been studied experimentally

• Should bioinformatics talk to biology, or vice versa?

Jean-Marie Claverie

• 1951: first protein sequence (insulin)

• The field has come of age, so outsiders shouldn't tell us what to do and how

• Bioinformatics is part of the foundation

• Clear difference between applying informatics and applying bioinformatics

• The future will be different

• "Give us the sequence, we'll do the rest" failed! The number of human genes is an example.

• Gene finding: standard genes are found well, RNA genes are missed, and there is no theory of gene transcription

• We have no E=mc², so we need to keep the data

• Computational biology is the same as systems biology

• Good integration: the E. coli bioinformatics project to find all genes in a small bacterium. An inclusive project, now a good consortium.

• Bioinformatics becomes invisible for biologists (BLAST).

Howard Bilofsky

• PRISM forum

• Provide challenges for (bio)informatics

• Drives bioinformatics, omics, ... techniques