10/29/2004 Bioinformatics in Computer Science 1 Bioinformatics in Computer Science, the Virginia Bioinformatics Institute, and Opportunities for Engineering Lenwood S. Heath Department of Computer Science Blacksburg, VA 24061 College of Engineering Advisory Board Meeting October 29, 2004


10/29/2004 Bioinformatics in Computer Science 1

Bioinformatics in Computer Science, the Virginia Bioinformatics Institute, and Opportunities for Engineering

Lenwood S. Heath
Department of Computer Science

Blacksburg, VA 24061

College of Engineering Advisory Board Meeting

October 29, 2004

10/29/2004 Bioinformatics in Computer Science 2

Overview

• Computational biology and bioinformatics

• The players
  • Computer Science
  • Virginia Bioinformatics Institute (VBI)
  • Others at VT

• Opportunities for the College
  • Collaboration with VBI
  • SBES, Wake Forest School of Medicine
  • NIH and DHS funding
  • Scientific modeling

10/29/2004 Bioinformatics in Computer Science 3

Computational Biology and Bioinformatics

• Computational biology — computational research inspired by biology

• Bioinformatics — application of computational research (computer science, mathematics, statistics) to advance basic and applied research in the life sciences

  • Agriculture
  • Basic biological science
  • Medicine

• Both ideally done within multidisciplinary collaborations

10/29/2004 Bioinformatics in Computer Science 4

Bioinformatics at VT (Part I)

• Biological modeling (Tyson, Watson): > 20 years
• Computational biology, genome rearrangements (Heath): > 10 years
• Fralin Biotechnology sponsored faculty advisory committee centered on bioinformatics: 1998-2000
  • Biochemistry; biology; CALS; computer science (Heath, Watson); statistics; VetMed
  • Provost provided $1 million seed money
  • First VT bioinformatics hire (Gibas, biology, 1999)

10/29/2004 Bioinformatics in Computer Science 5

Bioinformatics at VT (Part II)

• Outside initiative submitted to VT for a campus bioinformatics center — 1998

• Discussions of bioinformatics advisory committee contributed to a proposal to the Gilmore administration — 1999

• Governor Gilmore puts plans and money for bioinformatics center in budget — 1999-2000

• Virginia Bioinformatics Institute (VBI) established July, 2000; housed in CRC

10/29/2004 Bioinformatics in Computer Science 6

Bioinformatics at VT (Part III)

• Bioinformatics course and curriculum development began with faculty subcommittee — 1999

• Courses supporting bioinformatics now in many life science and computational science departments, including:

  • Biology
  • Biochemistry
  • Computer Science
  • Plant Pathology, Physiology, and Weed Science (PPWS)
  • Mathematics
  • Statistics

10/29/2004 Bioinformatics in Computer Science 7

Bioinformatics Education at VT

• CS has been training CS graduate students in bioinformatics since 2000

• Graduate bioinformatics option established in a number of participating departments — 2003

• Ph.D. program in Genetics, Bioinformatics, and Computational Biology (GBCB) — 2003

• First GBCB students arrived, Fall, 2003; now in second year; completing core requirements

10/29/2004 Bioinformatics in Computer Science 8

Bioinformatics Spirit at VT

• Close collaboration between life scientists and computational scientists from the beginning

• Educational approach insists on adequate multidisciplinary background

• Multidisciplinary collaborators work closely on a regular basis

• Contributions to biology or medicine are essential outcomes

10/29/2004 Bioinformatics in Computer Science 9

The Players

• Computer Science
• Virginia Bioinformatics Institute (VBI)
• Others at VT

10/29/2004 Bioinformatics in Computer Science 10

CS Bioinformatics Faculty

1. Chris Barrett (VBI, CS)

2. Vicky Choi

3. Roger Ehrich

4. Edward A. Fox

5. Lenny Heath

6. T. M. Murali

7. Chris North

8. Alexey Onufriev

9. Naren Ramakrishnan

10. Adrian Sandu

11. Eunice Santos

12. João Setubal (VBI, CS)

13. Cliff Shaffer

14. Layne Watson

15. Liqing Zhang

10/29/2004 Bioinformatics in Computer Science 11

Relevant Expertise

• Algorithms: Choi, Heath, Santos, Setubal, Shaffer, Watson
• Computational structural biology: Onufriev, Sandu
• Computational systems biology: Murali
• Data mining: Ramakrishnan
• Genomics: Heath, Murali, Ramakrishnan, Setubal, Zhang
• Human-computer interaction, visualization: North
• Image processing: Ehrich, Watson
• Information retrieval: Ehrich
• High performance computing: Sandu, Santos, Watson
• Optimization: Watson
• Simulation: Barrett

10/29/2004 Bioinformatics in Computer Science 12

Established Bioinformatics Faculty

• Layne Watson
• Lenny Heath
• Cliff Shaffer
• Naren Ramakrishnan
• Eunice Santos

10/29/2004 Bioinformatics in Computer Science 13

Layne Watson

• Professor of Computer Science and Mathematics

• Expertise: algorithms; image processing; high performance computing; optimization; scientific computing

• Computational biology: has worked with John Tyson (biology) for over 20 years

• JigCell: cell-cycle modeling environment; with Tyson, Shaffer, Ramakrishnan, Pedro Mendes of VBI

• Expresso: microarray experimentation; with Heath, Ramakrishnan

10/29/2004 Bioinformatics in Computer Science 14

Lenny Heath

• Professor of Computer Science
• Expertise: algorithms; theoretical computer science; graph theory
• Computational biology: worked in genome rearrangements 10 years ago
• Bioinformatics: concentration in past 5 years
• Expresso: microarray experimentation; with Ramakrishnan, Watson
  – Multimodal networks
  – Computational models of gene silencing

10/29/2004 Bioinformatics in Computer Science 15

Cliff Shaffer

• Associate Professor of Computer Science

• Expertise: algorithms; problem solving environments; spatial data structures

• JigCell: cell-cycle modeling environment; with Ramakrishnan, Tyson, Watson

10/29/2004 Bioinformatics in Computer Science 16

Naren Ramakrishnan

• Associate Professor of Computer Science
• Expertise: data mining; machine learning; problem solving environments
• JigCell: cell-cycle modeling problem solving environment; with Shaffer, Watson
• Expresso: microarray experimentation; with Heath, Watson
  – Proteus: inductive logic programming system for biological applications
  – Computational models of gene silencing

10/29/2004 Bioinformatics in Computer Science 17

Eunice Santos

• Associate Professor of Computer Science

• Expertise: Algorithms; computational biology; computational complexity; parallel and distributed processing; scientific computing

• Relevant bioinformatics project: modeling progress of breast cancer

10/29/2004 Bioinformatics in Computer Science 18

New Bioinformatics Faculty

• T. M. Murali (2003) CS bioinformatics hire
• Alexey Onufriev (2003) CS bioinformatics hire
• Adrian Sandu (2004) CS hire
• João Setubal (Early 2004) VBI and CS
• Vicky Choi (2004) CS bioinformatics hire
• Liqing Zhang (2004) CS bioinformatics hire
• Chris Barrett (Fall 2004) VBI and CS
• One more bioinformatics position for Fall, 2005

10/29/2004 Bioinformatics in Computer Science 19

T. M. Murali

• Assistant Professor of Computer Science
• Hired in 2003 for bioinformatics group
• Expertise: algorithms; computational geometry; computational systems biology
• Projects:
  – Functional gene annotation
  – xMotif: find patterns of coexpression among subsets of genes
  – RankGene: rank genes according to predictive power for disease

10/29/2004 Bioinformatics in Computer Science 20

Alexey Onufriev

• Assistant Professor of Computer Science
• Hired in 2003 for bioinformatics group
• Expertise: computational and theoretical biophysics and chemistry; structural bioinformatics; numerical methods; scientific programming
• Projects:
  – Biomolecular electrostatics
  – Theory of cooperative ligand binding
  – Protein folding
  – Protein dynamics: how does myoglobin take up oxygen?
  – Computational models of gene silencing

10/29/2004 Bioinformatics in Computer Science 21

Adrian Sandu

• Associate Professor of Computer Science
• Hired in 2003
• Expertise: computational science; numerical methods; parallel computing; scientific and engineering applications
• Computational science:
  – New generation of air quality models
  – Computational tools for assimilation of atmospheric chemical and optical measurements into atmospheric chemical transport models

10/29/2004 Bioinformatics in Computer Science 22

João Setubal

• Research Associate Professor at VBI
• Associate Professor of Computer Science
• Joined in early 2004
• Expertise: algorithms; computational biology; bacterial genomes
• Comparative genomics

10/29/2004 Bioinformatics in Computer Science 23

Vicky Choi

• Assistant Professor of Computer Science
• Hired in 2004 for bioinformatics group
• Expertise: computational biology; algorithms
• Projects:
  – Algorithms for genome assembly
  – Protein docking
  – Biological pathways

10/29/2004 Bioinformatics in Computer Science 24

Liqing Zhang

• Assistant Professor of Computer Science
• Hired in 2004 for bioinformatics group
• Expertise: evolutionary biology; bioinformatics
• Research interests:
  – Comparative evolutionary genomics
  – Functional genomics
  – Multi-scale models of bacterial evolution

10/29/2004 Bioinformatics in Computer Science 25

Bioinformatics Research in CS

• Collaboration
• Funding
• Resources
• Overview of projects

10/29/2004 Bioinformatics in Computer Science 26

Selected Collaborations

• Virginia Tech: Biochemistry, Biology, Fralin Biotechnology Center, PPWS, Veterinary Medicine, VBI, Wood Science

• North Carolina State University: Forest Biotechnology Center

• Duke: Biology

• University of Illinois: Plant Biology

10/29/2004 Bioinformatics in Computer Science 27

Selected Funding (Watson/Tyson)

• NSF MCB-0083315: Biocomplexity - Incubation Activity: A Collaborative Problem Solving Environment for Computational Modeling of Eukaryotic Cell Cycle Controls. J. J. Tyson, L. T. Watson, N. Ramakrishnan, C. A. Shaffer, J. C. Sible. $99,965.

• NIH 1 R01 GM64339-01: Problem Solving Environment for Modeling the Cell Cycle. J. J. Tyson, J. Sible, K. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan, P. Mendes (VBI). $211,038.

• Air Force Research Laboratory F30602-01-2-0572: The Eukaryotic Cell Cycle as a Test Case for Modeling Cellular Regulation in a Collaborative Problem Solving Environment. J. J. Tyson, J. C. Sible, K. C. Chen, L. T. Watson, C. A. Shaffer, N. Ramakrishnan. $1,650,000.

10/29/2004 Bioinformatics in Computer Science 28

Selected Funding (Heath, et al.)

• NSF IBN 0219322: ITR: Understanding Stress Resistance Mechanisms in Plants: Multimodal Models Integrating Experimental Data, Databases, and the Literature. L. S. Heath; R. Grene, B. I. Chevone, N. Ramakrishnan, L. T. Watson. $499,973.

• NSF EIA-01903660: A Microarray Experiment Management System. N. Ramakrishnan, L. S. Heath, L. T. Watson, R. Grene, J. W. Weller (VBI). $600,000.

• DARPA N00014-01-1-0852: Dryophile Genes to Engineer Stasis-Recovery of Human Cells. M. Potts, L. S. Heath, R. F. Helm, N. Ramakrishnan, T. O. Sitz, F. Bloom, P. Price (Life Technologies), J. Battista (LSU). $4,532,622.

• NSF CCF 0428344: ITR-(NHS)-(sim): Computational Models for Gene Silencing: Elucidating a Pervasive Biological Defensive Response. L. S. Heath, R. F. Helm, A. Onufriev, M. Potts, N. Ramakrishnan. $1,500,000.

10/29/2004 Bioinformatics in Computer Science 29

Research Resources Available to CS Bioinformatics

System X
• Third fastest computer on the planet (2003)

Laboratory for Advanced Scientific Computing & Applications (LASCA)
• Parallel algorithms & math software
• Anantham Cluster
• Grid computing

Bioinformatics Research LAN
• Linux, Mac OS X
• Bioinformatics databases and analysis

10/29/2004 Bioinformatics in Computer Science 30

JigCell: A PSE for Eukaryotic Cell Cycle Controls

Marc Vass, Nick Allen, Jason Zwolak, Dan Moisa,

Clifford A. Shaffer, Layne T. Watson,

Naren Ramakrishnan, and John J. Tyson

Departments of Computer Science and Biology

10/29/2004 Bioinformatics in Computer Science 31

Cell Cycle of Budding Yeast

[Figure: wiring diagram of the budding yeast cell cycle. Labeled events: growth, budding, DNA synthesis, unaligned chromosomes, sister chromatid separation. Labeled components: Cln2, Cln3, Bck2, Clb2, Clb5, Sic1, SBF, MBF, Mcm1, Swi5, SCF, APC, Cdc20, Cdh1, Cdc14, Cdc15, Net1, RENT, Tem1, Bub2, Lte1, Mad2, Esp1, Pds1, PPX, CDKs.]

10/29/2004 Bioinformatics in Computer Science 32

JigCell Problem-Solving Environment

[Figure: components of the problem-solving environment: Experimental Database; Wiring Diagram; Differential Equations; Parameter Values; Analysis; Simulation; Visualization; Automatic Parameter Estimation.]
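As a rough illustration of how a wiring diagram, differential equations, parameter values, and simulation can fit together, the Python sketch below turns a list of reactions into rate equations and integrates them with forward Euler. It is not JigCell code and not the budding yeast model: the single species `X`, the two reactions, the rate constants, and every function name are assumptions made up for this example.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Params = Dict[str, float]
# A reaction pairs a stoichiometry map with a rate law.
Reaction = Tuple[Dict[str, int], Callable[[State, Params], float]]


def derivatives(reactions: List[Reaction], state: State, params: Params) -> State:
    """Build d(species)/dt by summing each reaction's contribution."""
    dxdt = {name: 0.0 for name in state}
    for stoichiometry, rate_law in reactions:
        rate = rate_law(state, params)
        for species, coefficient in stoichiometry.items():
            dxdt[species] += coefficient * rate
    return dxdt


def simulate(reactions: List[Reaction], initial: State, params: Params,
             dt: float, steps: int) -> List[State]:
    """Integrate the rate equations with fixed-step forward Euler."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        dxdt = derivatives(reactions, state, params)
        state = {s: state[s] + dt * dxdt[s] for s in state}
        trajectory.append(dict(state))
    return trajectory


# Hypothetical toy wiring diagram: X is synthesized at a constant rate
# and degraded in proportion to its current amount.
TOY_REACTIONS: List[Reaction] = [
    ({"X": +1}, lambda s, p: p["k_syn"]),
    ({"X": -1}, lambda s, p: p["k_deg"] * s["X"]),
]
TOY_PARAMS: Params = {"k_syn": 2.0, "k_deg": 0.5}  # made-up parameter values

trajectory = simulate(TOY_REACTIONS, {"X": 0.0}, TOY_PARAMS, dt=0.01, steps=5000)
final_x = trajectory[-1]["X"]
```

With these made-up constants the simulated amount of `X` approaches the steady state k_syn / k_deg = 4.0. A production environment would use an adaptive stiff solver rather than forward Euler.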

10/29/2004 Bioinformatics in Computer Science 33

Why do these calculations?

• Is the model “yeast-shaped”?

• Bioinformatics role: the model organizes experimental information.

• New science: prediction, insight

JigCell is part of the DARPA BioSPICE suite of software tools for computational cell biology.

10/29/2004 Bioinformatics in Computer Science 34

Expresso: A Next Generation Software System for Microarray Experiment Management and Data Analysis

10/29/2004 Bioinformatics in Computer Science 35

Scenarios for Effects of Abiotic Stress on Gene Expression in Plants

10/29/2004 Bioinformatics in Computer Science 36

The Expresso Pipeline

10/29/2004 Bioinformatics in Computer Science 37

Proteus — Data Mining with ILP

• ILP (inductive logic programming) — a data mining algorithm for inferring relationships or rules

• Proteus — efficient system for ILP in bioinformatics context

• Flexibly incorporates a priori biological knowledge (e.g., gene function) and experimental data (e.g., gene expression)

• Infers rules without explicit direction
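To make the idea of inferring rules from background knowledge plus experimental data concrete, here is a minimal Python sketch. It is not Proteus and not a full ILP system: it only searches conjunctions of one-variable literals of the form `predicate(G, constant)`, whereas real ILP systems search richer first-order clauses guided by heuristics. The gene names, predicates, and positive/negative labels are all hypothetical.

```python
from itertools import combinations
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]   # (predicate, gene, constant)
Literal = Tuple[str, str]     # (predicate, constant), read as predicate(G, constant)

# Hypothetical background knowledge (gene function) and
# hypothetical experimental data (expression conditions).
BACKGROUND: Set[Fact] = {
    ("function", "g1", "transport"),
    ("function", "g2", "transport"),
    ("function", "g3", "kinase"),
    ("function", "g4", "transport"),
    ("expressed_under", "g1", "drought"),
    ("expressed_under", "g2", "drought"),
    ("expressed_under", "g3", "drought"),
    ("expressed_under", "g4", "cold"),
}
POSITIVES = {"g1", "g2"}  # genes for which the target concept holds
NEGATIVES = {"g3", "g4"}  # genes for which it does not


def covers(body: Tuple[Literal, ...], gene: str, background: Set[Fact]) -> bool:
    """A clause body covers a gene if every literal is a known fact for it."""
    return all((pred, gene, const) in background for pred, const in body)


def induce_rule(positives: Set[str], negatives: Set[str],
                background: Set[Fact],
                max_literals: int = 2) -> Optional[Tuple[Literal, ...]]:
    """Return the shortest body covering all positives and no negatives."""
    candidates = sorted({(pred, const) for pred, _, const in background})
    for size in range(1, max_literals + 1):
        for body in combinations(candidates, size):
            if (all(covers(body, g, background) for g in positives)
                    and not any(covers(body, g, background) for g in negatives)):
                return body
    return None


rule = induce_rule(POSITIVES, NEGATIVES, BACKGROUND)
```

On this toy data the search returns the body `expressed_under(G, drought)` and `function(G, transport)`, because either literal alone also covers a negative example.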

10/29/2004 Bioinformatics in Computer Science 38

Networks in Bioinformatics

• Mathematical Model(s) for Biological Networks

• Representation: What biological entities and parameters to represent and at what level of granularity?

• Operations and Computations: What manipulations and transformations are supported?

• Presentation: How can biologists visualize and explore networks?

10/29/2004 Bioinformatics in Computer Science 39

Reconciling Networks

Munnik and Meijer, FEBS Letters, 2001

Shinozaki and Yamaguchi-Shinozaki, Current Opinion in Plant Biology, 2000

10/29/2004 Bioinformatics in Computer Science 40

Multimodal Networks

• Nodes and edges have flexible semantics to represent:

- Time

- Uncertainty

- Cellular decision making; process regulation

- Cell topology and compartmentalization

- Rate constants

- Phylogeny

• Hierarchical
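One way to picture nodes and edges with flexible semantics is a graph whose nodes and edges each carry an open-ended attribute dictionary, with an optional parent link for hierarchy. The Python sketch below is only an illustration under that assumption; the class names, attribute keys, and the example entities (`geneA`, `proteinB`) are hypothetical and are not taken from the Expresso or multimodal network software.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    parent: Optional[str] = None  # enclosing node, giving the hierarchy


@dataclass
class Edge:
    source: str
    target: str
    kind: str
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class MultimodalNetwork:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, name: str, parent: Optional[str] = None,
                 **attributes: Any) -> None:
        if parent is not None and parent not in self.nodes:
            raise KeyError(f"unknown parent: {parent}")
        self.nodes[name] = Node(name, attributes, parent)

    def add_edge(self, source: str, target: str, kind: str,
                 **attributes: Any) -> None:
        for endpoint in (source, target):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node: {endpoint}")
        self.edges.append(Edge(source, target, kind, attributes))

    def children(self, name: str) -> List[str]:
        """Names of nodes directly nested inside the given node."""
        return sorted(n.name for n in self.nodes.values() if n.parent == name)

    def edges_with(self, key: str) -> List[Edge]:
        """Edges that carry a particular kind of annotation."""
        return [e for e in self.edges if key in e.attributes]


# Hypothetical example: compartments as parent nodes, one annotated edge.
network = MultimodalNetwork()
network.add_node("cell", kind="compartment")
network.add_node("nucleus", parent="cell", kind="compartment")
network.add_node("geneA", parent="nucleus", kind="gene")
network.add_node("proteinB", parent="cell", kind="protein")
network.add_edge("geneA", "proteinB", "encodes",
                 time="early", confidence=0.8, rate_constant=0.3)
```

Keeping annotations such as time, uncertainty, or rate constants in per-edge dictionaries lets one structure hold several modes of information without a fixed schema, at the cost of weaker type checking.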

10/29/2004 Bioinformatics in Computer Science 41

Using Multimodal Networks

• Help biologists find new biological knowledge

• Visualize and explore

• Generate hypotheses and experiments

• Predict regulatory phenomena

• Predict responses to stress

• Incorporate into Expresso as part of closing the loop

10/29/2004 Bioinformatics in Computer Science 42

Fusion — Chris North

• “Snap together” visualization environment

• Interactively linked data from multiple sources

• Data mining in the background

10/29/2004 Bioinformatics in Computer Science 43

Virginia Bioinformatics Institute (VBI)

• Established by the state in July, 2000; high visibility
• Applies computational and information technology in biological research
• Research faculty (currently, about 18) expertise includes
  – Biochemistry
  – Comparative Genomics
  – Computer Science
  – Drug Discovery
  – Human and Plant Pathogens
  – Mathematics
  – Physics
  – Simulation
  – Statistics
• More than $43 million in funded research

10/29/2004 Bioinformatics in Computer Science 44

VBI Mission Statement

At the Virginia Bioinformatics Institute, we research biological systems and design, develop and disseminate technologies to make discoveries that improve the quality of human life.

We focus on understanding biology through systems that integrate the interaction between organisms and their environment for the benefit of science and society.

We also strive to collaborate with the scientific community by enabling transformation of information into useful knowledge and by providing scientific services.

10/29/2004 Bioinformatics in Computer Science 45

The Disease Triangle

10/29/2004 Bioinformatics in Computer Science 46

Specialized VBI Facilities

• Core lab facilities
  – DNA sequencing
  – Gene expression
  – Proteomics
  – Metabolomics

• Core computational facilities
  – Cluster computing dedicated to bioinformatics
  – Data storage
  – Visualization
  – Database administration

10/29/2004 Bioinformatics in Computer Science 47

VBI Integration into Main Campus

• Originally housed in Corporate Research Center
• Partially moved to campus last year (Bioinformatics I building)
• Final move to campus, December, 2004 (Bioinformatics II building)
• Total space in Bioinformatics I and II will be 130,560 square feet

10/29/2004 Bioinformatics in Computer Science 48

VBI Research Portfolio (by sponsor)

[Figure: pie chart of the VBI research portfolio by sponsor. Shares shown: 38%, 25%, 12%, 12%, 5%, 5%, 1%, 1%, 1%. Sponsors listed: National Institutes of Health; National Science Foundation; VT (JHU/ASPIRES/VTF); U.S. Dept of Defense; CTRF; Other Academic Institutions; Industry; U.S. Dept of Agriculture; Foundations.]

10/29/2004 Bioinformatics in Computer Science 49

Funded Partnerships with VT Departments

• Aerospace and Ocean Engineering
• Biochemistry
• Biology
• Biomedical Science and Pathobiology, VMRCVM
• Computer Science
• Crop and Soil Environmental Sciences
• Electrical and Computer Engineering
• Fisheries and Wildlife Science
• Horticulture
• Mathematics
• Plant Pathology, Physiology, and Weed Science
• Statistics

10/29/2004 Bioinformatics in Computer Science 50

Opportunities for CS and the College of Engineering

• Collaboration with VBI
• SBES, Wake Forest School of Medicine
• NIH and DHS funding
• Scientific modeling

10/29/2004 Bioinformatics in Computer Science 51

Collaboration with VBI

• Basic biological science — molecular biology, functional genomics, systems biology

• Computational methods to answer biological questions from vast stores of VBI data resources

• Computational models and simulation of biological systems, e.g., host-pathogen interaction

10/29/2004 Bioinformatics in Computer Science 52

SBES, Wake Forest

• Medical research includes significant computational challenges

• Much analysis can be done without additional lab biology

• Biomedical data analysis and mining

• Identification of genes responsible for complex traits

• More flexible and useful medical instrumentation

• Precise identification of disease

• Treatment suggestion

• Prognosis prediction

10/29/2004 Bioinformatics in Computer Science 53

NIH and DHS Funding

• Bioinformatics is one of the New Pathways to Discovery in the NIH Roadmap

• Computation is essential to advancing medical practice, from diagnosis to drug design

• Department of Homeland Security (DHS) is funding research to respond to bioterrorism

  • Detection and identification of agents
  • Rapid response to threats
  • Modeling crisis impact and response

10/29/2004 Bioinformatics in Computer Science 54

Scientific Modeling

• Protein folding
• Protein function
• Protein-protein interaction
• Cellular signaling and decision processes
• Heart, lung, neurological function
• System X is an essential component

10/29/2004 Bioinformatics in Computer Science 55

Conclusion

• Bioinformatics is an emerging area of opportunity, but challenging to enter

• Rapid developments the norm; flexibility essential

• Virginia Tech and the College are well-positioned to take advantage