Bioinformatics - Discovering the Bio Logic Of Nature

Bioinformatics will help us discover the hidden mathematics and bio-logic of nature.

Bioinformatics – Discovering the “Bio-Logic” of Nature

Robert Cormia, Foothill College

Transducing the Genome

• 50 years after Watson and Crick deduced the structure of DNA…

• The information molecules of nature now reside as data bits inside computers
  – But what does it all mean?
• We have ~15 GBytes of genomic data
  – And we are only just beginning to unravel it

‘Energy Systems’ Before ‘Life’

• “Life” arose on Earth almost 4 billion years ago, 1 billion years before cells
• Long chains of molecules harvested energy, probably deep below the sea
  – Before DNA, RNA, and the sophisticated proteins that we know today
• There were plenty of sources of energy, but no “choreographed metabolism”

Energy Metabolism

“In the Beginning”

• Rock, heat, and some water
• Early molecules of life
• Energy moved from rock into sea
• Molecular networks played in the path
• Capturing a memory of that process was probably the key to life today

Life on the Sea Floor?

RNA Busy Before Cellular Life

The RNA World

• There is no way to know how the molecules of life really formed…

• Amino acids and ribonucleotides have formed in “pre-biotic” experiments

• RNA molecules, which appear to be both catalysts and templates, are thought to have formed energy networks

RNA Codons and Catalysts

RNA and DNA

• A, T, C, G, and U
• A = Adenine
• T = Thymine
• C = Cytosine
• G = Guanine
• U = Uracil
• A-T and C-G in DNA
• A-U and C-G in RNA
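The pairing rules above can be sketched in a few lines of Python. This is only an illustration; the names `reverse_complement` and `transcribe` are made up for this example and do not come from any library.

```python
# Watson-Crick pairing in DNA: A-T and C-G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, read in its own 5' to 3' direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """Write the coding strand as RNA: U takes the place of T."""
    return coding_strand.upper().replace("T", "U")


print(reverse_complement("ATGC"))  # GCAT
print(transcribe("ATGC"))          # AUGC
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on the pairing table.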

Central Dogma of Life

The Genome

• DNA (deoxyribonucleic acid) is the principal molecule of the genome
• Genes are stretches of the DNA polymer that code for proteins
• Exons and introns exist in DNA
• Regulatory regions control transcription and the formation of every protein and enzyme; this control is the key to metabolism
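How a gene "codes for" a protein can be shown with a small translation sketch. It assumes an intron-free coding sequence read from its first base and uses the standard genetic code; the function name `translate` is hypothetical, and real genes need splicing and regulation that this ignores.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code, in the order TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for start in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[start:start + 3].upper()]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # MAW
```

The table is built from a 64-character string rather than typed out codon by codon, which keeps the sketch short and makes transcription errors less likely.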

DNA at Transcription

The Proteome

• Proteins form cellular structure and enzymes, which function in metabolism

• Over 100,000 proteins exist in humans
• DNA is not enough to run metabolism
• Proteins have a “run-time” knowledge
• Proteins control the transcription of DNA and DNA controls formation of proteins

Rubisco Protein – Photosynthesis

RAD Protein Complex

Number of Genes vs. Time

What is Bioinformatics?

• Molecular biology
  – Ability to sequence DNA
• Internet databases
  – To store and transmit data
• Mathematical algorithms
  – To model and solve biological problems

Analysis Using the I2I Technology Model

Internet Technologies: CPU, Networking, Data Storage
– Data Mining, Grid Computing, Storage Area Networks

Bioinformatics Technologies: Informatics, IT / Networking, Molecular Biology
– Data Modeling, Computational Biology, Genomic Databases

A Tool for Biotechnology

• Bioinformatics creates a set of tools for understanding the mountain of new data

• In biotechnology, these tools are used to discover how genes and proteins work

• Computers are used to both analyze and “mine” new data for hidden relationships

Discovering the “bio-logic” of nature

From Data to Knowledge

DNA Sequencing

DNA Sequencing

• Chemical sequencing
• Molecular sequencing
• Now about $0.01 per base
• The Human Genome Project took 10 years
  – Celera sequenced it in 3 years
• Moore’s law applies to biotechnology too
  – In 2010 a single human genome in ~7 days
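The figures on this slide can be combined into a back-of-the-envelope estimate. The sketch below assumes a genome of 3 billion base pairs (the figure quoted later in this deck) and a single pass over every base; real projects read each base many times, so treat the result as a lower bound.

```python
import math

GENOME_BASE_PAIRS = 3_000_000_000  # genome length quoted later in this deck
COST_PER_BASE_CENTS = 1            # "about $0.01 per base"

# Integer cents avoid floating-point rounding in the cost estimate.
one_pass_cost_dollars = GENOME_BASE_PAIRS * COST_PER_BASE_CENTS // 100

project_days = 10 * 365            # "took 10 years"
days_in_2010 = 7                   # "a single human genome in ~7 days"
speedup = project_days / days_in_2010
doublings = math.log2(speedup)     # how many times throughput must double

print(f"One pass over the genome: ${one_pass_cost_dollars:,}")
print(f"Speed-up: about {speedup:.0f}x, roughly {doublings:.1f} doublings")
```

At one cent per base, a single pass costs about $30 million, and going from 10 years to 7 days is a speed-up of roughly 500-fold, or about nine doublings.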

DNA Sequencing

http://www.accessexcellence.org

Gel Enhanced Staining

DNA Micro Arrays

• Used to monitor gene expression
  – Which genes are active?
  – What are the “co-expressed patterns”?
• Compare healthy and diseased tissue
  – Extract “expressed” mRNA in cytoplasm
  – Convert mRNA to cDNA
• Discover relationships of proteins to disease states, and function / location of genes
• Is becoming the first step in “drug-discovery”
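Comparing healthy and diseased tissue usually comes down to a ratio of expression levels per gene. The sketch below computes a log2 fold change; the gene names and intensities are invented for illustration, and the two-fold cutoff is a common convention assumed here, not something stated in the slides.

```python
import math

# Hypothetical spot intensities (arbitrary units) for the same genes in two samples.
healthy = {"geneA": 200.0, "geneB": 150.0, "geneC": 400.0}
diseased = {"geneA": 800.0, "geneB": 150.0, "geneC": 100.0}


def log2_fold_changes(reference: dict, sample: dict) -> dict:
    """log2(sample / reference) per gene; positive means more active in the sample."""
    return {gene: math.log2(sample[gene] / reference[gene]) for gene in reference}


changes = log2_fold_changes(healthy, diseased)
for gene, change in changes.items():
    # Assumed cutoff: a two-fold change (|log2| >= 1) counts as up or down.
    label = "up" if change >= 1 else "down" if change <= -1 else "unchanged"
    print(f"{gene}: {change:+.1f} ({label})")
```

Using a logarithm makes a four-fold increase (+2) and a four-fold decrease (-2) symmetric, which is why expression data is normally reported this way.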

Microarray Output Screen

Microarray Output

Partnering with Pharma

• Bioinformatics is an industry of tools– Biotech is a consumer / user of these tools

• Pharma needs more “innovation engines”
  – Fewer than 2 drugs per firm in the ‘pipeline’
  – Drug discovery creates a new value chain

bioinformatics > biotech > ‘big pharma’
Convergence is the modality of innovation

Pharma and Biotech

Drug Discovery

• Target discovery
• Target validation
• Protein interactions
• Rapid screening
• The long haul…
  – $800 million / year is spent on drug discovery
  – Over 75% of drug compounds will never work

Drug Development Process

Drug Discovery

“Pharmaco Genomics”

• Individualized medicine
• Looking at SNPs (single nucleotide polymorphisms) along drug targets
  – What makes each of us – us?
  – 1 million SNPs, about one per intron
• In the future, each of us will have our genome “in silico” (genome on a chip)
• Data mining against 6 billion genomes!
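A SNP is a single-base difference between two otherwise matching sequences, so finding candidates is a position-by-position comparison. The sketch below assumes the two sequences are already aligned and of equal length; the function name `find_snps` and the sequences are made up for this example.

```python
def find_snps(reference: str, individual: str) -> list:
    """Return (position, reference_base, individual_base) for each mismatch.

    Assumes both sequences are already aligned. Positions are 1-based.
    """
    if len(reference) != len(individual):
        raise ValueError("sequences must be the same length")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, individual), start=1)
        if ref != alt
    ]


print(find_snps("GATTACAGATTACA", "GATCACAGATTGCA"))
# [(4, 'T', 'C'), (12, 'A', 'G')]
```

At the density quoted above, 1 million SNPs spread over 3 billion base pairs works out to about one SNP every 3,000 bases.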

Pharmaco Genomics

One Genome

• There are three very different ways to look at genomic diversity – and all are equally valid!

• A “collective” human genome
  – 3 billion base pairs – called the ‘golden path’
• Each one of us is a unique genome
  – “I am a genome of one”, my SNPs make me - ‘me’
• The Genome on planet earth
  – A collective metabolic evolution and speciation

Terra Genoma

Molecular Networks

• Genome or Proteome?
• Proteome or Genome?
• Wait a minute…
• What if it’s both?
• Now what would that look like?

Gene Regulatory Networks

Pathway Kinetics

Gene Regulatory Network

Bioinformatics Tools

• NCBI
  – BLAST, 12 million records, SNP databases
• ExPASy
  – Swiss-Prot, EMBL, Swiss-Model
• PIR – Protein Information Resource
• PDB – Protein Data Bank
• Pfam – Protein families

NCBI

• National Center for Biotechnology Information, part of NIH and NLM

• Funded by US – open to all
• GenBank and GenPept
  – 13 million entries, 12 billion base pairs
  – Resources include oncology, retroviruses, SNP databases, and much more
• Sequin submission of raw sequence data

NCBI Resources

Retroviruses

BLAST

• Basic Local Alignment Search Tool
• Used as a “genomic search engine”
• Compare your target sequence to the “non-redundant” database of 13 billion base pairs
• Can search the genomes of species
  – Human, mouse, fly, E. coli etc.
• ‘Hits’ return links to GenBank and GenPept
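"Local alignment" means finding the best-matching stretch inside two sequences rather than aligning them end to end. The sketch below uses the Smith-Waterman dynamic programming method, which solves that problem exactly; BLAST itself uses a faster heuristic, so this is not how BLAST is implemented. The scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the table; a cell never drops below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGGG"))  # (6, 'ACG', 'ACG')
```

The table has one cell per pair of positions, so the cost grows with the product of the two lengths. That is why a search against billions of base pairs needs a heuristic rather than this exact method.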

Swiss-Prot

• Swiss annotated protein database
• Protein resource
  – Minimal redundancy, reasonably current
  – Protein annotated / integrated database
  – Links to protein structures and properties
• Links back into GenBank, EMBL, DDBJ
• Literature resources for submissions

ExPASy

• The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to analysis of protein sequences and structures
• Swiss-Prot and PROSITE
• Links to SWISS-MODEL

PROSITE - Database of Protein Families and Domains

Structure Analysis

Protein Data Bank

• SWISS-MODEL
• Protein Data Bank
• Archive of .pdb files
• Structures determined by X-ray, NMR
• Theoretical Structure Search
• Features a “Molecule of the Month”
• http://www.rcsb.org/pdb/

PIR

• Protein Information Resource
• iProClass and PIR-NREF
  – PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB
• http://pir.georgetown.edu/
• Integrated public resource of protein informatics
• Supports genomic and proteomic research and scientific discovery - iProClass and PIR-NREF

Pfam

• Protein family comparisons
  – Look at multiple alignments
  – View protein domain architectures
  – Examine species distribution
  – Follow links to other databases
  – View known protein structures

• Follow ‘conserved domains’ from BLASTp searches of protein databases
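What "conserved" means in a multiple alignment can be shown by counting residues column by column. The sketch below is a deliberately simple illustration with invented sequences; Pfam itself describes families with profile hidden Markov models, which this does not attempt to reproduce.

```python
from collections import Counter

# Hypothetical, already-aligned protein fragments ('-' marks a gap).
alignment = [
    "MKV-LS",
    "MKIALS",
    "MRV-LT",
]


def column_consensus(aligned: list) -> list:
    """For each column, return (most common symbol, fraction of sequences sharing it)."""
    result = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(aligned)))
    return result


consensus = column_consensus(alignment)
conserved = [i for i, (_, share) in enumerate(consensus, start=1) if share == 1.0]
print("".join(residue for residue, _ in consensus))  # MKV-LS
print("Fully conserved columns:", conserved)         # [1, 5]
```

Columns where every sequence agrees are the kind of positions that point to a shared domain.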

The Grand Challenge

The Technology Roadmap

• Genomics
  – 1995 to 2005
• Proteomics
  – 2000 to 2010
• Systems biology
  – 2005 to 2015
• Genetic remodeling / re-engineering
  – 2010 to 2020
• Generation Phi
  – Children born in 2025 may never know disease

Convergence of Biotech & Pharma

• Genomics
• Proteomics
• Systems biology
• Pharmaco genomics
• Genetic engineering

Mouse Genome

Gene Therapy

• Somatic Gene Therapy
• Therapeutic Gene Therapy
  – Incorporate “missing genes”
  – Remove cells from host organism
  – Amplify target cells
  – Insert gene using (viral) vector
  – Return target cells into host organism

• Insulin gene was one of the first trials

Labeling Active Genes Along Chromosomes

Transgenic Species

Designer Flies – Is Blue Cool?

Your Own Private Genome

Surfing the Genome

• Internet technologies
  – Connecting users, tools, and data
• Molecular biology
  – Racing forward atop Moore’s Law
• Informatics
  – Mathematical interrogation of nature’s secrets
• Surfing the Genome!
  – Discovering the “bio-logic” of Nature

http://www.SurfingTheGenome.us/ Spring 2003

Contact Information

• Robert D. Cormia
• Foothill College
• rdcormia@earthlink.net
• http://www.informaticus.org/
• 650 747 1588
• Surfing the Genome – Spring 2003
