Bioinformatics – Discovering the “Bio-Logic” of Nature, Robert Cormia, Foothill College


Bioinformatics will help us discover the hidden mathematics and bio-logic of nature.


Page 1: Bioinformatics - Discovering the Bio Logic Of Nature

Bioinformatics – Discovering the “Bio-Logic” of Nature

Robert Cormia
Foothill College

Page 2: Bioinformatics - Discovering the Bio Logic Of Nature

Transducing the Genome

• 50 years after Watson and Crick deduced the structure of DNA…

• The information molecules of nature now reside as data bits inside computers
– But what does it all mean?

• We have ~15 GBytes of genomic data
– And we are only just beginning to unravel it

Page 3: Bioinformatics - Discovering the Bio Logic Of Nature
Page 4: Bioinformatics - Discovering the Bio Logic Of Nature

‘Energy Systems’ Before ‘Life’

• “Life” arose on earth almost 4 billion years ago, 1 billion years before cells

• Long chains of molecules harvesting energy, probably deep below the sea
– Before DNA, RNA and the sophisticated proteins that we know today

• There were plenty of sources of energy, but no “choreographed metabolism”

Page 5: Bioinformatics - Discovering the Bio Logic Of Nature

Energy Metabolism

Page 6: Bioinformatics - Discovering the Bio Logic Of Nature

“In the Beginning”

• Rock, heat, and some water
• Early molecules of life
• Energy moved from rock into sea
• Molecular networks played in the path
• Capturing a memory of that process was probably the key to life today

Page 7: Bioinformatics - Discovering the Bio Logic Of Nature

Life on the Sea Floor?

Page 8: Bioinformatics - Discovering the Bio Logic Of Nature

RNA Busy Before Cellular Life

Page 9: Bioinformatics - Discovering the Bio Logic Of Nature

The RNA World

• There is no way to know how the molecules of life really formed…

• Amino acids and ribonucleotides have formed in “pre-biotic” experiments

• RNA molecules, which appear to be both catalysts and templates, are thought to have formed energy networks

Page 10: Bioinformatics - Discovering the Bio Logic Of Nature

RNA Codons and Catalysts

Page 11: Bioinformatics - Discovering the Bio Logic Of Nature

RNA and DNA

• A, T, C, G, and U
• A = Adenine
• T = Thymine
• C = Cytosine
• G = Guanine
• U = Uracil
• A-T and C-G in DNA
• A-U and C-G in RNA
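The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and strand direction (5' to 3') is ignored for simplicity.

```python
# Pairing rules from the slide: A-T and C-G in DNA, A-U and C-G in RNA.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
# When RNA is copied from a DNA template, A on the template pairs with U.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def dna_complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())


def transcribe(template: str) -> str:
    """Return the RNA strand that pairs with a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


if __name__ == "__main__":
    print(dna_complement("ATCG"))
    print(transcribe("ATCG"))
```

Any base outside A, T, C, G raises a `KeyError` here; a real tool would also need to handle ambiguity codes such as N.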

Page 12: Bioinformatics - Discovering the Bio Logic Of Nature

Central Dogma of Life

Page 13: Bioinformatics - Discovering the Bio Logic Of Nature

The Genome

• DNA – DeoxyriboNucleic Acid is the prominent molecule of the genome

• Genes are formed of lengths of DNA polymers which code for proteins

• Exons and introns exist in DNA
• Regulatory regions control transcription and the formation of every protein and enzyme. It is the key to metabolism.

Page 14: Bioinformatics - Discovering the Bio Logic Of Nature
Page 15: Bioinformatics - Discovering the Bio Logic Of Nature

DNA at Transcription

Page 16: Bioinformatics - Discovering the Bio Logic Of Nature

The Proteome

• Proteins form cellular structure and enzymes, which function in metabolism

• Over 100,000 proteins exist in humans
• DNA is not enough to run metabolism
• Proteins have a “run-time” knowledge
• Proteins control the transcription of DNA and DNA controls formation of proteins

Page 17: Bioinformatics - Discovering the Bio Logic Of Nature

Rubisco Protein – Photosynthesis

Page 18: Bioinformatics - Discovering the Bio Logic Of Nature

RAD Protein Complex

Page 19: Bioinformatics - Discovering the Bio Logic Of Nature

Number of Genes vs. Time

Page 20: Bioinformatics - Discovering the Bio Logic Of Nature

What is Bioinformatics?

• Molecular biology
– Ability to sequence DNA

• Internet databases
– To store and transmit data

• Mathematical algorithms
– To model and solve biological problems

Analysis Using the I2I Technology Model

Page 21: Bioinformatics - Discovering the Bio Logic Of Nature

Internet Technologies (diagram labels): CPU, Networking, Data Storage, Data Mining, Grid Computing, Storage Area Networks

Page 22: Bioinformatics - Discovering the Bio Logic Of Nature

Bioinformatics Technologies (diagram labels): Informatics, IT / Networking, Molecular Biology, Data Modeling, Computational Biology, Genomic Databases

Page 23: Bioinformatics - Discovering the Bio Logic Of Nature

A Tool for Biotechnology

• Bioinformatics creates a set of tools for understanding the mountain of new data

• In biotechnology, these tools are used to discover how genes and proteins work

• Computers are used to both analyze and “mine” new data for hidden relationships

Discovering the “bio-logic” of nature

Page 24: Bioinformatics - Discovering the Bio Logic Of Nature

From Data to Knowledge

Page 25: Bioinformatics - Discovering the Bio Logic Of Nature

DNA Sequencing

Page 26: Bioinformatics - Discovering the Bio Logic Of Nature

DNA Sequencing

• Chemical sequencing
• Molecular sequencing
• Now about $0.01 per base
• Human Genome took 10 years
– Celera sequenced in 3 years
• Moore’s law applies to biotechnology too
– In 2010 a single human genome in ~7 days
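As a rough back-of-the-envelope check on the per-base figure, the sketch below multiplies it by the ~3 billion base pairs quoted later in the deck for the human genome. The function name and the `coverage` parameter are assumptions for illustration; real projects read each base several times, so actual costs were higher.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs, as quoted later in the deck
COST_PER_BASE_USD = 0.01        # "about $0.01 per base" from the slide


def sequencing_cost(genome_size_bp: int, cost_per_base: float, coverage: int = 1) -> float:
    """Estimate raw sequencing cost; coverage is how many times each base is read."""
    return genome_size_bp * cost_per_base * coverage


if __name__ == "__main__":
    cost = sequencing_cost(GENOME_SIZE_BP, COST_PER_BASE_USD)
    print(f"Single-pass estimate: ${cost:,.0f}")
```

At one pass this comes to roughly $30 million, which gives a sense of why falling per-base costs matter so much.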

Page 27: Bioinformatics - Discovering the Bio Logic Of Nature

DNA Sequencing

http://www.accessexcellence.org

Page 28: Bioinformatics - Discovering the Bio Logic Of Nature

Gel Enhanced Staining

Page 29: Bioinformatics - Discovering the Bio Logic Of Nature

DNA Micro Arrays

• Used to monitor gene expression
– Which genes are active?
– What are the “co-expressed patterns”?

• Compare healthy and diseased tissue
– Extract “expressed” mRNA in cytoplasm
– Convert mRNA to cDNA

• Discover relationships of proteins to disease states, and function / location of genes

• Is becoming the first step in “drug-discovery”
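One common way to compare healthy and diseased tissue is a log2 fold change per gene. The sketch below assumes each sample has already been reduced to one intensity value per gene; the gene names, the pseudocount, and the threshold are hypothetical choices for this example, not values from the slides.

```python
import math


def log2_fold_changes(healthy: dict, diseased: dict, pseudocount: float = 1.0) -> dict:
    """Map each shared gene to log2(diseased / healthy) intensity."""
    changes = {}
    for gene in healthy.keys() & diseased.keys():
        # The pseudocount avoids division by zero for unexpressed genes.
        ratio = (diseased[gene] + pseudocount) / (healthy[gene] + pseudocount)
        changes[gene] = math.log2(ratio)
    return changes


def differentially_expressed(changes: dict, threshold: float = 1.0) -> list:
    """Return genes whose expression changed at least 2**threshold-fold, up or down."""
    return sorted(gene for gene, lfc in changes.items() if abs(lfc) >= threshold)


if __name__ == "__main__":
    healthy = {"geneA": 99, "geneB": 99, "geneC": 99}
    diseased = {"geneA": 399, "geneB": 99, "geneC": 24}
    changes = log2_fold_changes(healthy, diseased)
    print(differentially_expressed(changes))
```

Real microarray analysis also needs normalization and statistical testing across replicates, which this sketch leaves out.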

Page 30: Bioinformatics - Discovering the Bio Logic Of Nature

Microarray Output Screen

Page 31: Bioinformatics - Discovering the Bio Logic Of Nature

Microarray Output

Page 32: Bioinformatics - Discovering the Bio Logic Of Nature

Partnering with Pharma

• Bioinformatics is an industry of tools
– Biotech is a consumer / user of these tools

• Pharma needs more “innovation engines”
– Less than 2 drugs per firm in the ‘pipeline’
– Drug discovery creates a new value chain

bioinformatics > biotech > ‘big pharma’

Convergence is the modality of innovation

Page 33: Bioinformatics - Discovering the Bio Logic Of Nature

Pharma and Biotech

Page 34: Bioinformatics - Discovering the Bio Logic Of Nature

Drug Discovery

• Target discovery
• Target validation
• Protein interactions
• Rapid screening
• The long haul…
– $800 million / year is spent on drug discovery
– Over 75% of drug compounds will never work

Page 35: Bioinformatics - Discovering the Bio Logic Of Nature

Drug Development Process

Page 36: Bioinformatics - Discovering the Bio Logic Of Nature

Drug Discovery

Page 37: Bioinformatics - Discovering the Bio Logic Of Nature

“Pharmaco Genomics”

• Individualized medicine
• Looking at SNPs along drug targets
– What makes each of us – us?
– 1 million SNPs, about one per intron

• In the future, each of us will have our genome “in silico” (genome on a chip)

• Data mining against 6 billion genomes!
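A SNP is a single-base difference between an individual's sequence and a reference. The sketch below is a minimal illustration that assumes the two sequences are already aligned and of equal length; the function name and the sample sequences are made up for this example.

```python
def find_snps(reference: str, individual: str) -> list:
    """Return (position, reference_base, individual_base) for each mismatch.

    Positions are 0-based. Both sequences must be pre-aligned and equal in length.
    """
    if len(reference) != len(individual):
        raise ValueError("sequences must be pre-aligned and equal in length")
    return [
        (pos, ref_base, ind_base)
        for pos, (ref_base, ind_base) in enumerate(zip(reference, individual))
        if ref_base != ind_base
    ]


if __name__ == "__main__":
    print(find_snps("ATCGGA", "ATCAGA"))
```

Insertions and deletions shift the alignment and are not handled here; real variant calling works from many overlapping reads rather than two clean strings.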

Page 38: Bioinformatics - Discovering the Bio Logic Of Nature

Pharmaco Genomics

Page 39: Bioinformatics - Discovering the Bio Logic Of Nature

One Genome

• There are three very different ways to look at genomic diversity – and all are equally valid!

• A “collective” human genome
– 3 billion base pairs – called the ‘golden path’

• Each one of us is a unique genome
– “I am a genome of one”, my SNPs make me ‘me’

• The Genome on planet earth
– A collective metabolic evolution and speciation

Page 40: Bioinformatics - Discovering the Bio Logic Of Nature

Terra Genoma

Page 41: Bioinformatics - Discovering the Bio Logic Of Nature

Molecular Networks

• Genome or Proteome?
• Proteome of Genome?
• Wait a minute…
• What if it’s both?
• Now what would that look like?

Page 42: Bioinformatics - Discovering the Bio Logic Of Nature

Gene Regulatory Networks

Page 43: Bioinformatics - Discovering the Bio Logic Of Nature
Page 44: Bioinformatics - Discovering the Bio Logic Of Nature

Pathway Kinetics

Page 45: Bioinformatics - Discovering the Bio Logic Of Nature

Gene Regulatory Network

Page 46: Bioinformatics - Discovering the Bio Logic Of Nature

Bioinformatics Tools

• NCBI
– BLAST, 12 million records, SNP databases

• ExPASy
– Swiss-Prot, EMBL, Swiss-Model

• PIR – Protein Information Resource
• PDB – Protein Data Bank
• Pfam – Protein families

Page 47: Bioinformatics - Discovering the Bio Logic Of Nature

NCBI

• National Center for Biotechnology Information, part of NIH and NLM

• Funded by US – open to all
• GenBank and GenPept
– 13 million entries, 12 billion base pairs
– Resources include oncology, retroviruses, SNP databases, and much more
• Sequin submission of raw sequence data

Page 48: Bioinformatics - Discovering the Bio Logic Of Nature
Page 49: Bioinformatics - Discovering the Bio Logic Of Nature

NCBI Resources

Page 50: Bioinformatics - Discovering the Bio Logic Of Nature

Retroviruses

Page 51: Bioinformatics - Discovering the Bio Logic Of Nature

BLAST

• Basic Local Alignment Search Tool
• Used as a “genomic search engine”
• Compare your target sequence to the “non-redundant” database of 13 billion base pairs
• Can search the genomes of species
– Human, mouse, fly, E. coli, etc.
• ‘Hits’ return links to GenBank and GenPept
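To show what “local alignment” means, here is a minimal Smith-Waterman sketch. This is not how BLAST itself works: BLAST uses a faster seed-and-extend heuristic to approximate this kind of result across a whole database. The scoring values (match, mismatch, gap) are arbitrary assumptions for the example.

```python
def smith_waterman(query: str, target: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_query, aligned_target) for the best local alignment."""
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def substitution(i: int, j: int) -> int:
        return match if query[i - 1] == target[j - 1] else mismatch

    # Fill the matrix; scores never drop below zero, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    aligned_q, aligned_t = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(target[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


if __name__ == "__main__":
    print(smith_waterman("GATTACA", "TTTGATTACATTT"))
```

The matrix takes time and memory proportional to the product of the two sequence lengths, which is why database-scale searches rely on heuristics instead.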

Page 52: Bioinformatics - Discovering the Bio Logic Of Nature
Page 53: Bioinformatics - Discovering the Bio Logic Of Nature

Swiss-Prot

• Swiss - protein annotated database
• Protein resource
– Minimal redundancy, reasonably current
– Protein annotated / integrated database
– Links to protein structures and properties
• Links back into GenBank, EMBL, DDBJ
• Literature resources for submissions

Page 54: Bioinformatics - Discovering the Bio Logic Of Nature

ExPASy

• The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures

• Swiss-Prot and PROSITE
• Links to SWISS-MODEL

Page 55: Bioinformatics - Discovering the Bio Logic Of Nature

PROSITE - Database of Protein Families and Domains

Page 56: Bioinformatics - Discovering the Bio Logic Of Nature

Structure Analysis

Page 57: Bioinformatics - Discovering the Bio Logic Of Nature

Protein Data Bank

• SWISS-MODEL
• Protein Data Bank
• Archive of .pdb files
• Structures determined by X-ray, NMR
• Theoretical Structure Search
• Features a “Molecule of the Month”
• http://www.rcsb.org/pdb/
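The .pdb files in the archive are fixed-width text, with one ATOM or HETATM record per atom. The sketch below reads atom names and coordinates using the column positions from the PDB file format. The function names are made up for this example, and only a few of the record's fields are extracted.

```python
def parse_atom_line(line: str) -> dict:
    """Parse selected fields from one fixed-width ATOM/HETATM record."""
    return {
        "atom_name": line[12:16].strip(),   # columns 13-16
        "residue": line[17:20].strip(),     # columns 18-20
        "chain": line[21],                  # column 22
        "residue_number": int(line[22:26]), # columns 23-26
        "x": float(line[30:38]),            # columns 31-38
        "y": float(line[38:46]),            # columns 39-46
        "z": float(line[46:54]),            # columns 47-54
    }


def read_atoms(pdb_text: str) -> list:
    """Return parsed atoms for every ATOM or HETATM record in the text of a .pdb file."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM"))
    ]
```

For real work, an established parser (for example the one in Biopython) handles the many edge cases of the format that this sketch ignores.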

Page 58: Bioinformatics - Discovering the Bio Logic Of Nature
Page 59: Bioinformatics - Discovering the Bio Logic Of Nature

PIR

• Protein Information Resource
• iProClass and PIR-NREF
– PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB
• http://pir.georgetown.edu/
• Integrated public resource of protein informatics
• Supports genomic and proteomic research and scientific discovery - iProClass and PIR-NREF

Page 60: Bioinformatics - Discovering the Bio Logic Of Nature
Page 61: Bioinformatics - Discovering the Bio Logic Of Nature

Pfam

• Protein family comparisons
– Look at multiple alignments
– View protein domain architectures
– Examine species distribution
– Follow links to other databases
– View known protein structures

• Follow ‘conserved domains’ from BLASTp searches of protein databases

Page 62: Bioinformatics - Discovering the Bio Logic Of Nature
Page 63: Bioinformatics - Discovering the Bio Logic Of Nature

The Grand Challenge

Page 64: Bioinformatics - Discovering the Bio Logic Of Nature

The Technology Roadmap

• Genomics
– 1995 to 2005
• Proteomics
– 2000 to 2010
• Systems biology
– 2005 to 2015
• Genetic remodeling / re-engineering
– 2010 to 2020
• Generation Phi
– Children born in 2025 may never know disease

Page 65: Bioinformatics - Discovering the Bio Logic Of Nature

Convergence of Biotech & Pharma

• Genomics
• Proteomics
• Systems biology
• Pharmaco genomics
• Genetic engineering

Page 66: Bioinformatics - Discovering the Bio Logic Of Nature

Mouse Genome

Page 67: Bioinformatics - Discovering the Bio Logic Of Nature
Page 68: Bioinformatics - Discovering the Bio Logic Of Nature

Gene Therapy

• Somatic Gene Therapy
• Therapeutic Gene Therapy
– Incorporate “missing genes”
– Remove cells from host organism
– Amplify target cells
– Insert gene using (viral) vector
– Return target cells into host organism
• Insulin gene was one of the first trials

Page 69: Bioinformatics - Discovering the Bio Logic Of Nature
Page 70: Bioinformatics - Discovering the Bio Logic Of Nature

Labeling Active Genes Along Chromosomes

Page 71: Bioinformatics - Discovering the Bio Logic Of Nature

Transgenic Species

Page 72: Bioinformatics - Discovering the Bio Logic Of Nature

Designer Flies – Is Blue Cool?

Page 73: Bioinformatics - Discovering the Bio Logic Of Nature

Your Own Private Genome

Page 74: Bioinformatics - Discovering the Bio Logic Of Nature

Surfing the Genome

• Internet technologies
– Connecting users, tools, and data
• Molecular biology
– Racing forward atop Moore’s Law
• Informatics
– Mathematical interrogation of nature’s secrets
• Surfing the Genome!
– Discovering the “bio-logic” of Nature

http://www.SurfingTheGenome.us/ Spring 2003

Page 75: Bioinformatics - Discovering the Bio Logic Of Nature

Contact Information

• Robert D. Cormia
• Foothill College
• [email protected]
• http://www.informaticus.org/
• 650 747 1588
• Surfing the Genome – Spring 2003