Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Page 1: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenges for computer science as a part of Systems Biology

Benno Schwikowski
Institute for Systems Biology
Seattle, WA

Page 2: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Benno Schwikowski
Math and Computer Science

Challenges

Species

Conditions/time

Genes

Towards integrative models

Protein interaction
- Interaction partner
- Direct/indirect
- Affinity
- Effect

DNA
- Sequence
- Genomic locus
- Domain content
- Intron/exon structure
- Regulatory motifs
- Chemical modifications
- SNPs
- Splice variants
- Accessibility
- Variation

mRNA
- Abundance
- Regulatory information
- Initiation/termination signals

Protein
- Abundance
- State
- Localization
- 3D structure
- Functional characterization
- Half-life
- Active sites
- Biochemical function
- Cellular role

Page 3: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Integrative models

• …Across genes and proteins: many genes involved (e.g., multifactorial diseases)

• …Across model systems: lack of experimental platforms in the target system

• …Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation)

• …Across experiments: robustness against errors in mass spectrometry and mRNA measurements

• …Across timescales

Page 4: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

DNA
RNA
Proteins
Modules
Organelles
Cells
Organs
Individuals
Populations
Ecologies

Challenge: Capturing evolutionary constraints

“Nothing in biology makes sense except in the light of evolution.”
Theodosius Dobzhansky

Page 5: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Which tools and experiments to use

Page 6: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Choosing experiments

• Machine Learning: Determine the most likely classification/parameterization on the basis of a randomly sampled dataset.

• Active Learning: Allow an algorithm to query selected data points, using the results of previous queries.
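
The contrast between the two settings can be illustrated with a toy problem that is not from the slides: locating a hidden threshold on the interval [0, 1]. In this minimal sketch, `oracle` stands in for a (hypothetical) experiment, the threshold value 0.37 is arbitrary, and both estimator functions are illustrative names. The passive learner labels randomly sampled points; the active learner picks each query from the answers to the previous ones.

```python
import random

HIDDEN_THRESHOLD = 0.37  # arbitrary value, assumed only for this illustration


def oracle(x):
    """Hypothetical experiment: True if x is at or above the hidden threshold."""
    return x >= HIDDEN_THRESHOLD


def passive_estimate(n_queries, rng):
    """Random sampling: label random points, then take the midpoint of the
    interval that is still consistent with all observed labels."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = rng.random()
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2


def active_estimate(n_queries):
    """Active learning: each query bisects the interval that is still
    consistent with the results of the previous queries."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        x = (lo + hi) / 2
        if oracle(x):
            hi = x
        else:
            lo = x
    return (lo + hi) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    print("passive error:", abs(passive_estimate(20, rng) - HIDDEN_THRESHOLD))
    print("active error: ", abs(active_estimate(20) - HIDDEN_THRESHOLD))
```

With the same query budget, the active learner halves its uncertainty at every step, whereas the passive learner depends on where the random samples happen to fall.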

Page 7: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Relations between system variables can be quite complex

Yuh, Bolouri, Davidson, Science, 1998

Page 8: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Relations between system variables can be quite complex

Yuh, Bolouri, Davidson, Science, 1998

Page 9: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Develop models that allow extremely efficient algorithms

AGTCGTACGTGAC...

AGTAGACGTGCCG...

ACGTGAGATACGT...

GAACGGAGTACGT...

TCGTGACGGTGAT...

Page 10: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

CLUSTAL W (1.74) multiple sequence alignment

Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG

Page 11: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Developing models that allow extremely efficient algorithms

Parsimony score: 1

AGTCGTACGTGAC...

AGTAGACGTGCCG...

ACGTGAGATACGT...

GAACGGAGTACGT...

TCGTGACGGTGAT...

ACGG

ACGT

ACGT

ACGT

J. Comp Biol. 2002

Page 12: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

An Exact Algorithm (generalizing Sankoff and Rousseau 1975)

$W_u[s]$ = best parsimony score for the subtree rooted at node u, if u is labeled with string s.

AGTCGTACGTG

ACGGGACGTGC

ACGTGAGATAC

GAACGGAGTAC

TCGTGACGGTG

… ACGG: 2 ACGT: 1 ...

… ACGG: 0 ACGT: 2...

… ACGG: 1 ACGT: 1 ...

ACGG: +∞ ACGT: 0

...

… ACGG: 1 ACGT: 0 ...

$4^k$ entries

… ACGG: 0 ACGT: +∞ ...

… ACGG: ∞ ACGT: 0 ...

… ACGG: ∞ ACGT: 0 ...

… ACGG: ∞ ACGT: 0 ...

$W_u[s] = \sum_{v:\ \text{child of } u} \min_{t} \bigl( W_v[t] + d(s, t) \bigr)$

J. Comp Biol. 2002
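
A minimal sketch of the table computation on this slide, under several assumptions that are not stated in the slides: the tree is given as nested tuples with sequences at the leaves, d(s, t) is the Hamming distance, a leaf entry is 0 if the string occurs in the leaf's sequence and infinite otherwise, and the recurrence is read in the Sankoff style, summing over the children v of u the minimum over t of W_v[t] + d(s, t). All function names are illustrative. This is the naive computation, not the optimized algorithm from the cited paper.

```python
from itertools import product
from math import inf


def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))


def all_kmers(k):
    """All 4**k DNA strings of length k."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def table(node, k, kmers):
    """Return W_node as a dict mapping each k-mer s to its best score."""
    if isinstance(node, str):
        # Leaf: score 0 if s occurs in the sequence, infinity otherwise.
        present = {node[i:i + k] for i in range(len(node) - k + 1)}
        return {s: (0 if s in present else inf) for s in kmers}
    # Internal node: combine the tables of all children.
    child_tables = [table(child, k, kmers) for child in node]
    return {
        s: sum(min(w[t] + hamming(s, t) for t in kmers) for w in child_tables)
        for s in kmers
    }


def best_parsimony(tree, k):
    """Best score at the root and the root labels that achieve it."""
    kmers = all_kmers(k)
    w_root = table(tree, k, kmers)
    best = min(w_root.values())
    return best, sorted(s for s, score in w_root.items() if score == best)


if __name__ == "__main__":
    # Hypothetical three-leaf tree; the sequences are made up for illustration.
    tree = (("TTACGTAA", "GGACGTCC"), "CAACGGTT")
    print(best_parsimony(tree, 4))
```

Each internal node compares every candidate label s against every child label t, which is where a factor of 4^k times 4^k per edge comes from in this naive version.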

Page 13: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

What are good challenges to tackle?

• Biological/medical questions asked

• Experimental technologies to acquire a lot of relevant data

• Available datasets with a formalized notion of “data quality”

Page 14: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Memory complexity: $O(k \cdot 4^{2k})$ per node

Time complexity: total time $O(n k (4^{2k} + l))$

n: number of species
l: average sequence length
k: motif length

J. Comp Biol. 2002
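
To get a feel for how these bounds scale, the following sketch evaluates them numerically, reading the time bound as n · k · (4^(2k) + l) and the per-node bound as k · 4^(2k). The constants hidden by the O-notation are ignored, and the values n = 8 and l = 1000 are assumptions chosen only for illustration.

```python
def time_bound(n, k, l):
    """Evaluate n * k * (4**(2*k) + l), ignoring constant factors."""
    return n * k * (4 ** (2 * k) + l)


def per_node_bound(k):
    """Evaluate k * 4**(2*k), ignoring constant factors."""
    return k * 4 ** (2 * k)


if __name__ == "__main__":
    # n and l are illustrative values, not taken from the slides.
    for k in (4, 6, 8, 10):
        print(k, per_node_bound(k), time_bound(n=8, k=k, l=1000))
```

The bounds grow only linearly in the number of species and the sequence length, but exponentially in the motif length k, so the approach is practical for short motifs.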

Page 15: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Technology-based challenges: Universal DNA Tag Systems

Existing applications in high-throughput technologies

• Universal DNA arrays

• Padlock probes

• LYNX mRNA technology

Page 16: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Formalization

Define: weight(A/T) = 1, weight(C/G) = 2
weight(AACTTG) = 1+1+2+1+1+2 = 8
melting temperature(AACTTG) = 2·weight(AACTTG)

l-u code problem
Given two integers, l < u, find the largest set of tags such that

• Each tag has weight u
• Each string of weight l occurs at most once

J. Comp Biol. 2000 & 2003
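
The definitions above can be sketched in a few lines. The weight and melting-temperature functions follow the slide directly; the checker is only one possible reading of the two conditions, assuming that "string of weight l" means a contiguous substring of a tag with weight exactly l, and that "occurs at most once" is counted across the whole tag set. The function names are illustrative and not taken from the cited papers.

```python
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s):
    """Sum of base weights: A/T count 1, C/G count 2."""
    return sum(WEIGHT[base] for base in s)


def melting_temperature(s):
    """Melting temperature as defined on the slide: 2 * weight."""
    return 2 * weight(s)


def substrings_of_weight(tag, l):
    """Yield the contiguous substrings of tag whose weight is exactly l."""
    for i in range(len(tag)):
        total = 0
        for j in range(i, len(tag)):
            total += WEIGHT[tag[j]]
            if total == l:
                yield tag[i:j + 1]
            if total >= l:
                break


def is_valid_code(tags, l, u):
    """Check both conditions under the assumptions stated in the lead-in."""
    seen = set()
    for tag in tags:
        if weight(tag) != u:
            return False
        for sub in substrings_of_weight(tag, l):
            if sub in seen:
                return False
            seen.add(sub)
    return True


if __name__ == "__main__":
    print(weight("AACTTG"), melting_temperature("AACTTG"))
    print(is_valid_code(["AACTTG"], l=4, u=8))
```

This only verifies a given tag set; finding the largest valid set is the actual optimization problem posed on the slide.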

Page 17: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Visualization

Andrea Weston et al. @ ISB & Cytoscape

Page 18: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

Challenge: Visualization

Cytoscape, pre-release 2.0

Page 19: Challenges for computer science as a part of Systems Biology Benno Schwikowski Institute for Systems Biology Seattle, WA

A computer scientist’s perspective

“Biology is so digital, and incredibly complicated […] I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.”

Donald Knuth, 7 Dec 1993
