DNA Learning Center July 15, 2003 W. Richard McCombie Professor Cold Spring Harbor Laboratory and The Watson School of Biological Sciences


Page 1: DNA Learning Center July 15, 2003

DNA Learning Center, July 15, 2003

W. Richard McCombie

Professor

Cold Spring Harbor Laboratory and

The Watson School of Biological Sciences

Page 2: DNA Learning Center July 15, 2003

Basic points

• Genome research is advancing very rapidly

• Technologies are driving the progress

• These technologies and the data that result from them will have a revolutionary effect on the way biological research is done and on our understanding of biology and medicine

Page 3: DNA Learning Center July 15, 2003

Major Topics

• What is genomics, and in particular the human genome program

• Introduction and historical perspective on sequencing

• Some information about genomes being sequenced

• Strategies to analyse genomes

• Comparative genomics

• How genomics has changed and will change biology and medicine

Page 4: DNA Learning Center July 15, 2003

What is an organism

• At ONE LEVEL, it is the result of the execution of the code that is its genome

• We do not know the degree to which environment alters this execution

• We do know that in addition to physical attributes, many complex processes such as behavior have an influence from the code

• We now know that in mammals, this code comprises only about 30,000-40,000 genes and their control units

Page 5: DNA Learning Center July 15, 2003

The Genome of an organism is:

• The complete set of inherited instructions for that organism - its complete DNA code

• When operating, it creates a set of proteins in an organized fashion

• These proteins act to cause growth, development and reproduction of the organism

Page 6: DNA Learning Center July 15, 2003

What is genomics

• Genomics is the analysis of the complete set of genetic instructions of an organism

• These genetic instructions consist of genes, which direct the production of proteins, and their control elements

• These genes consist of a series of DNA bases

• Previously we could only look at one or at most a few of these objects or parts at a time

• Technology now enables us to see them all

Page 7: DNA Learning Center July 15, 2003

Why will genomics have such an impact

• Important biological problems such as cancer and learning and memory are extraordinarily complex

• Genomics lets us integrate this complex information in a meaningful way

• Ultimately, much of biological research will be driven by computational analysis

Page 8: DNA Learning Center July 15, 2003

Sizes of some important genomes (base pairs)

• Virus 0.003 - 0.300 million

• Bacteria 0.8- 6 million

• Yeast 15 million

• C. elegans 100 million

• Rice 435 million

• Arabidopsis 130 million

• Fugu 800 million

• Mouse 2.5 billion

• Corn 2.5 billion

• Human 3 billion

• Wheat 16-20 billion

• Loblolly pine 20 billion

Page 9: DNA Learning Center July 15, 2003

Genome sequencing efficiencies per person

• 1980: 0.1-1 kb per year

• 1985: 1-5 kb per year

• 1990: 25-50 kb per year

• 1996: 100-200 kb per year

• 2000: 500-1000 kb per year

• 2002: 10,000 - 25,000 kb per year
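
The growth in per-person throughput can be put into numbers. The sketch below is only an illustration of the arithmetic: it takes the ranges exactly as listed on this slide and compares the low ends and the high ends between two years. The names `THROUGHPUT_KB` and `fold_change` are made up for this example.

```python
from fractions import Fraction

# kb sequenced per person per year, (low, high), as listed on the slide.
# Values are kept as strings so Fraction can parse them exactly.
THROUGHPUT_KB = {
    1980: ("0.1", "1"),
    1985: ("1", "5"),
    1990: ("25", "50"),
    1996: ("100", "200"),
    2000: ("500", "1000"),
    2002: ("10000", "25000"),
}


def fold_change(start_year, end_year):
    """Return the fold increase of the low ends and of the high ends."""
    low_start, high_start = (Fraction(v) for v in THROUGHPUT_KB[start_year])
    low_end, high_end = (Fraction(v) for v in THROUGHPUT_KB[end_year])
    return low_end / low_start, high_end / high_start


low_fold, high_fold = fold_change(1980, 2002)
print(f"1980 -> 2002, comparing low ends:  {int(low_fold):,}-fold")
print(f"1980 -> 2002, comparing high ends: {int(high_fold):,}-fold")
```

Exact fractions are used instead of floats so that a value such as 0.1 kb does not introduce rounding noise into the ratios.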

Page 10: DNA Learning Center July 15, 2003

Bases in GenBank

[Figure: bases in GenBank by year, 1982-1997; y-axis from 0 to 4,000,000,000 bases]

Page 11: DNA Learning Center July 15, 2003

Bases in GenBank 1982-1987

[Figure: bases in GenBank by year, 1982-1987; y-axis from 0 to 18,000,000 bases]

Page 12: DNA Learning Center July 15, 2003

Methods to analyse a complex genome

• Mapping

– Genetic

– Physical

• Expressed gene analysis

• Genome sequence analysis

– Complete sequence

– Skimming

– “Rough draft”

Page 13: DNA Learning Center July 15, 2003

Salient features of genome organization

• Higher organisms have large genomes with a considerable amount of repeat sequences

• Genes from higher organisms are interrupted by non-coding regions

• Only a small portion of a genome codes for genes

• Related organisms have related genomes

Page 14: DNA Learning Center July 15, 2003

Expressed Sequence Tags (sequencing parts of the processed genes)

• Advantages

• Inexpensive

• “Know” sequence is coding

• Information about tissue or developmental stage expression

• Disadvantages

• Coverage is incomplete

• Position of sequence in the genome is unknown

• Only partial information about each gene

• No information about structural elements

Page 15: DNA Learning Center July 15, 2003

Steps in genome sequencing

• Construction of a large-insert library

• Construction of a small-insert subclone library

• Isolation of DNA

• Sequencing of the DNA fragments (8-10x)

• Assembly of the data into contiguous regions

• Filling the gaps in the sequence and resolving discrepancies

• Confirmation of the sequence

• Analysis
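
The "8-10x" figure in the sequencing step is a coverage depth: the total number of bases read divided by the length of the target. The sketch below shows that arithmetic, plus one common idealization that the slide does not mention: if reads land uniformly at random (a Poisson model), the expected fraction of bases never sampled at coverage c is about e^(-c), which is one reason a gap-filling step is still listed. The clone size and read length in the example are assumed values chosen only for illustration.

```python
import math


def coverage(num_reads, read_length, target_size):
    """Average depth: total bases read divided by target length."""
    return num_reads * read_length / target_size


def reads_needed(target_coverage, read_length, target_size):
    """Smallest whole number of reads that reaches the target depth."""
    total_bases = target_coverage * target_size
    return -(-total_bases // read_length)  # integer ceiling division


def expected_uncovered_fraction(depth):
    """Idealized Poisson estimate of the fraction of bases with no read."""
    return math.exp(-depth)


# Assumed example: a 150,000 bp large-insert clone and 500 bp reads.
clone_size = 150_000
read_length = 500
for depth in (8, 10):
    n = reads_needed(depth, read_length, clone_size)
    missed = expected_uncovered_fraction(depth) * clone_size
    print(f"{depth}x: {n} reads, about {missed:.1f} bases expected uncovered")
```

Real shotgun data is not uniformly random (cloning bias and repeats both distort it), so the estimate is a lower bound on the finishing work rather than a prediction.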

Page 16: DNA Learning Center July 15, 2003

High Accuracy Genomic Sequencing (6-10x plus resolution of problems)

• Advantages

– Normalized coverage of all genes

– Information about gene structure

– Information about regulatory elements

– Genome organization

• Disadvantages

– Cost

– Time

– Difficult to determine if a sequence codes for a gene

Page 17: DNA Learning Center July 15, 2003

“Rough draft”

• Can be thought of as:

– High coverage skimming

– Low coverage complete sequencing

• Advantages and disadvantages are intermediate between skimming and complete sequencing - dependent on the coverage

Page 18: DNA Learning Center July 15, 2003

Cost of various types of sequencing (per base)

• “Base perfect” (uncomplicated) $0.3

• 8x shotgun - no finishing $0.1

• 4x shotgun - no finishing $0.05

• 3x shotgun - no finishing $0.04

• 1x shotgun - no finishing $0.01
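
To see what these per-base prices mean for a whole genome, they can be multiplied by the genome sizes given on the earlier slide (human about 3 billion bases, Arabidopsis about 130 million). The following is a rough sketch of that multiplication only; it ignores everything a real budget would include, and the names `COST_CENTS_PER_BASE` and `sequencing_cost_dollars` are invented for this example.

```python
# Per-base prices from the slide, stored in whole cents to keep the
# arithmetic exact.
COST_CENTS_PER_BASE = {
    "base perfect": 30,
    "8x shotgun": 10,
    "4x shotgun": 5,
    "3x shotgun": 4,
    "1x shotgun": 1,
}

# Approximate genome sizes in bases, from the genome-size slide.
GENOME_SIZES = {
    "human": 3_000_000_000,
    "Arabidopsis": 130_000_000,
}


def sequencing_cost_dollars(genome_size_bases, strategy):
    """Whole-dollar cost of one genome at the slide's per-base price."""
    return genome_size_bases * COST_CENTS_PER_BASE[strategy] // 100


for organism, size in GENOME_SIZES.items():
    for strategy in COST_CENTS_PER_BASE:
        cost = sequencing_cost_dollars(size, strategy)
        print(f"{organism:12s} {strategy:13s} ${cost:,}")
```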

Page 19: DNA Learning Center July 15, 2003

The Human Genome Project

• Human genome consists of three billion base pairs – Adenine, Cytosine, Guanine, Thymine

• Printing out the A,C,G,T would fill over 150,000 telephone book pages

• Disease is often caused by a single variation in the three billion bases - one different letter in 150,000 pages
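
The telephone-book comparison implies a page density that is easy to check. The short sketch below does that division, and adds one further illustration that is not on the slide: since there are only four letters, each base needs two bits, so the raw sequence would fit in well under a gigabyte.

```python
GENOME_BASES = 3_000_000_000  # three billion base pairs, as on the slide
PAGES = 150_000               # telephone book pages, as on the slide

# Letters that each printed page would have to hold.
letters_per_page = GENOME_BASES // PAGES

# Illustration beyond the slide: 4 possible letters -> 2 bits per base.
BITS_PER_BASE = 2
raw_bytes = GENOME_BASES * BITS_PER_BASE // 8

print(f"{letters_per_page:,} letters per page")
print(f"{raw_bytes:,} bytes at 2 bits per base")
```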

Page 20: DNA Learning Center July 15, 2003

The human genome project

• A concerted effort to build resources to unravel the human control code

• To develop map resources to link genetic elements (such as disease genes) to a physical representation of the genome

• To determine the sequence of all of the DNA that combines to make the human control code

Page 21: DNA Learning Center July 15, 2003

2-15-01

Page 22: DNA Learning Center July 15, 2003

Genome sequencing assignments

[Figure: chromosomes I, II, III, IV and V labeled with the sequencing groups assigned to them: CSHSC, ESSA, Kazusa, Genoscope, TIGR, SPP]

Page 23: DNA Learning Center July 15, 2003

The Arabidopsis genome - basic statistics

feature | Chr.1 | Chr.2 | Chr.3 | Chr.4 | Chr.5

length [Mbp] | 30.4 | 19.8 | 23.7 | 17.8 | 27.0

GC content | 33.4% | 35.5% | 36.1% | 35.5% | 35.9%

GC content in coding regions | 44.0% | 44.1% | 44.2% | 44.1% | 44.0%

GC content in non-coding regions | 32.4% | 33.3% | 32.4% | 32.8% | 32.5%

no. of genes | 7046 | 4036 | 5126 | 3825 | 5874

exon length | 247 | 259 | 250 | 256 | 242

gene density (kb / gene) | 4.3 | 4.9 | 4.5 | 4.6 | 4.6

EST matches (% genes with at least one EST above 90% similarity) | 60.6% | 56.8% | 59.7% | 59.6% | 61.2%

tRNAs | 105 | 73 | 41 | 81 | 140

Targeted to mitochondria | 445 (11%) | 425 (10.5%) | 446 (8.7%) | 377 (9.9%) | 627 (10.7%)

Targeted to chloroplast | 543 (15%) | 533 (13.2%) | 621 (12.1%) | 513 (13.4%) | 884 (15.1%)
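
The per-chromosome rows can be combined into whole-genome totals. The sketch below sums the lengths and gene counts from the table and derives an overall gene density; it is a plain recomputation from the listed numbers, so a per-chromosome density computed this way may differ slightly from the rounded values printed on the slide. The name `CHROMOSOMES` is just a label for this example.

```python
# (length in kb, number of genes) per chromosome, from the table above.
# Lengths are the Mbp values multiplied by 1000 to stay in integers.
CHROMOSOMES = {
    "Chr.1": (30_400, 7046),
    "Chr.2": (19_800, 4036),
    "Chr.3": (23_700, 5126),
    "Chr.4": (17_800, 3825),
    "Chr.5": (27_000, 5874),
}

total_kb = sum(length for length, _ in CHROMOSOMES.values())
total_genes = sum(genes for _, genes in CHROMOSOMES.values())
overall_density = total_kb / total_genes  # kb per gene

print(f"total length: {total_kb / 1000:.1f} Mbp")
print(f"total genes:  {total_genes}")
print(f"overall gene density: {overall_density:.1f} kb / gene")
```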

Page 24: DNA Learning Center July 15, 2003

Gene Families

(percentage columns: gene families containing the stated number of members)

organism | no. of singletons and distinct gene families | unique | 2 members | 3 members | 4 members | 5 members | >5 members

H. influenzae | 1587 | 88.8% | 6.8% | 2.3% | 0.7% | 0.0% | 1.4%

S. cerevisiae | 5105 | 71.4% | 13.8% | 3.5% | 2.2% | 0.7% | 8.4%

D. melanogaster | 10736 | 72.5% | 8.5% | 3.4% | 1.9% | 1.6% | 12.1%

C. elegans | 14177 | 55.2% | 12.0% | 4.5% | 2.7% | 1.6% | 24.0%

A. thaliana | 11601 | 35.0% | 12.5% | 7.0% | 4.4% | 3.6% | 37.4%

Page 25: DNA Learning Center July 15, 2003
Page 26: DNA Learning Center July 15, 2003

[Figure: bar chart, y-axis from 0 to 0.7, showing functional categories (metabolism; energy; cell growth, cell division and DNA synthesis; transcription; protein synthesis; protein destination; transport facilitation; intracellular transport; cellular biogenesis; cellular communication / signal transduction; cell rescue, defense, cell death, ageing; ionic homeostasis; classification not yet clear-cut; unclassified) for E. coli, Synechocystis, Saccharomyces c., C. elegans, Drosophila m. and human]

Page 27: DNA Learning Center July 15, 2003

Cytogenetic map of chromosome 4S

[Figure: cytogenetic map with labels NOR, knob and cen, and segment sizes 3 Mb, 2 Mb, 0.5 Mb, 2 Mb and 0.5 Mb; credit: Paul Fransz]

Page 28: DNA Learning Center July 15, 2003

Complete genomic sequencing reduces the genetics of an organism to a closed, finite system

Page 29: DNA Learning Center July 15, 2003

FRUITFULL Gene Function

The AGL8 gene was renamed FRUITFULL (ful1)

Page 30: DNA Learning Center July 15, 2003

Genetic Redundancy

• apetala1 cauliflower double mutants have proliferating floral meristems resembling cauliflowers

• ap1 cal ful triple mutants have flowers replaced by shoots

Page 31: DNA Learning Center July 15, 2003

The state of Arabidopsis research 200??

• Complete annotated sequence available

• Time to clone a gene has decreased from months or years to weeks in some cases

• People are beginning to look at global features of Arabidopsis

• Gene trap insertion in “every” gene

• Insertion site sequences known, linked to physical and genetic map

Page 32: DNA Learning Center July 15, 2003

Analysis of not the first, or the second, but subsequent genomes

• The information from the first few genomes will enable huge cost and time savings

• A major emphasis will be to determine the function of genes

Page 33: DNA Learning Center July 15, 2003

What are the genes and what do they do???

• Computational analysis

• Functional analysis

– Microarrays

– Transposons

– Various other methods

• Comparative analysis

Page 34: DNA Learning Center July 15, 2003

Comparative Genomics

Page 35: DNA Learning Center July 15, 2003

What can we learn from comparative analysis

• Evolutionary relationships

• Better annotation of genes, particularly of beginning and ends of genes

• Detection of conserved regulatory regions

• Functional evidence

Page 36: DNA Learning Center July 15, 2003

Benefits of having a model genome reference sequence with conserved local gene order to your plant of interest

• Requirements for sequence accuracy decrease for most of the genome

– you can fill in with high accuracy where needed

• The reference genome can be used as a scaffold allowing the anchoring of clones (allowing partial sequence coverage to infer complete clone coverage)

Page 37: DNA Learning Center July 15, 2003

Co-linearity among cereal genomes

Page 38: DNA Learning Center July 15, 2003

What types of comparisons are useful?

• Arabidopsis to very closely related species

– Annotate the Arabidopsis sequence

• Arabidopsis to related crop plants (soybean, tomato, Medicago truncatula)

– Determine the degree of locally conserved gene order between these crops and Arabidopsis

– Determine how the Arabidopsis sequence can be used in the analysis of these species

• Arabidopsis to distant plants (rice for instance)

– Gene discovery

– Systems analysis

– Gene order conservation???

• Arabidopsis to animals

– How plants and animals differ in carrying out basic biological processes

– How plants and animals organize and manage gene expression

Page 39: DNA Learning Center July 15, 2003

Mammalian Comparative Genomics

• Canine vs. Human Genome

• Sequence canine ESTs

• In collaboration with Elaine Ostrander (FHCRC), map to the dog genome

• Map computationally to the human genome

• Use to better annotate the human sequence

• Starting material for microarrays

• Use in gene discovery (behavior and cancer)

Page 40: DNA Learning Center July 15, 2003
Page 41: DNA Learning Center July 15, 2003

myosin, light polypeptide 4, alkali

Page 42: DNA Learning Center July 15, 2003

How will genomics affect the way we do biological research

Page 43: DNA Learning Center July 15, 2003

Rate at which genes can be identified

• Cloning - weeks to years

• Database searches - seconds to minutes
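
Part of why a database search takes seconds is that sequences can be indexed ahead of time, so a query becomes a lookup rather than a scan. The toy sketch below shows the idea with an exact-match index of short words (k-mers). It is only an illustration: the sequences are invented, the function names are hypothetical, and real search tools handle mismatches, gaps and scoring, which this does not attempt.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every k-letter word to the (name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def search(sequences, index, query, k):
    """Find exact occurrences of query (length >= k) using its first k-mer as a seed."""
    hits = []
    for name, pos in index.get(query[:k], []):
        # Verify that the whole query, not just the seed, matches here.
        if sequences[name].startswith(query, pos):
            hits.append((name, pos))
    return hits


# Invented toy "database"; these are not real gene sequences.
database = {
    "geneA": "ATGGCGTACGTTAGC",
    "geneB": "TTGACGTACGAAATG",
}
K = 4
kmer_index = build_kmer_index(database, K)

print(search(database, kmer_index, "CGTACG", K))
print(search(database, kmer_index, "GTTAGC", K))
```

The index is built once and reused for every query, which is the design choice that moves the cost from search time to preparation time.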

Page 44: DNA Learning Center July 15, 2003

What are the areas where genome technology will impact us

• Diagnostics

• Forensics

• Understanding of diseases such as cancer at the molecular level

• Treatments for diseases customized to the individual

Page 45: DNA Learning Center July 15, 2003

Genomic information allows us to look at the entire gene content of an organism simultaneously

Page 46: DNA Learning Center July 15, 2003

> 9 of the 10 Leading Causes of Mortality Have Genetic Components

• 1. Heart disease (29.5% of deaths in ‘00)

• 2. Cancer (22.9%)

• 3. Cerebrovascular diseases (6.9%)

• 4. Chronic lower respiratory dis. (5.1%)

• 5. Injury (3.9%)

• 6. Diabetes (2.9%)

• 7. Pneumonia/Influenza (2.8%)

• 8. Alzheimer disease (2.0%)

• 9. Kidney disease (1.6%)

• 10. Septicemia (1.3%)

Page 47: DNA Learning Center July 15, 2003

Genomic Health Care

• About conditions partly:

– Caused by mutation(s) in gene(s)

• e.g., breast cancer, colon cancer, autism, atherosclerosis, inflammatory bowel disease, diabetes, Alzheimer disease, mood disorders, etc., etc.

– Prevented by mutation(s) in gene(s)

• e.g., HIV (CCR5), ?atherosclerosis, ?cancers, ?diabetes, etc., etc.

Page 48: DNA Learning Center July 15, 2003

Genomic Health Care

• Will change health care by...

– Creating a fundamental understanding of the biology of many diseases (and disabilities), even many “non-genetic” ones

– Helping to redefine illnesses by etiology rather than by symptomatology

Page 49: DNA Learning Center July 15, 2003

Genomic Health Care

• Knowledge of individual genetic predispositions will allow:

– Individualized screening

– Individualized behavior changes

– Presymptomatic medical therapies, e.g., antihypertensive agents before hypertension develops, anti-mood disorder agents before mood disorder occurs

Page 50: DNA Learning Center July 15, 2003

Crystal Ball - 2010

• Predictive genetic tests for 10 - 25 conditions

• Intervention to reduce risk for many of them

• Gene therapy for a few conditions

• Primary care providers begin to practice genetic medicine

• Preimplantation diagnosis widely available, limits fiercely debated

• Effective legislative solutions to genetic discrimination & privacy in place in US

• Access remains inequitable, especially in developing world

Page 51: DNA Learning Center July 15, 2003

Crystal Ball - 2020

• Gene-based designer drugs for diabetes, hypertension, etc. coming on the market

• Cancer therapy precisely targets molecular fingerprint of tumor

• Pharmacogenomic approach is standard approach for many drugs

• Mental illness diagnosis transformed, new therapies arriving, societal views shifting

• Homologous recombination technology suggests germline gene therapy could be safe

Page 52: DNA Learning Center July 15, 2003

Crystal Ball - 2030

• Genes involved in aging fully cataloged

• Clinical trials underway to extend life span

• Full computer model of human cells replaces many laboratory experiments

• Complete genomic sequencing of an individual is routine, costs less than $100

• Major anti-technology movements active in US, elsewhere

• Worldwide inequities remain

Page 53: DNA Learning Center July 15, 2003

Genomics

• May also change society…

– Genetic stratification, e.g., in employment or marriage

– Genetic engineering against (and for) diseases and characteristics

– Cloning

– Increased opportunity for “private eugenics”

Page 54: DNA Learning Center July 15, 2003

Genomics

• If we are all mutants, what is the definition of normal?

Page 55: DNA Learning Center July 15, 2003

Conclusions

• Genomics will be the knowledge base or infrastructure for virtually all biology and medicine of the 21st century

• In silico biology will be a driving force in research and medicine

• Treatments for diseases will be radically improved by our understanding of complex diseases

Page 56: DNA Learning Center July 15, 2003

Collaborators and Funding

Rob Martienssen, Pablo Rabinowicz, Lincoln Stein

Susan McCouch, Steve Tanksley

Rick Wilson, Marco Marra, Elaine Mardis, John McPherson, Bob Waterston, The WUGSC

Special thanks to NHGRI for some of the slides used

Rod Wing and the CUGI Group

Doug Cook

Mike Bevan, Our ESSA-MIPS Collaborators

Daphne Preuss, The AGI

NSF, USDA, DOE, NIH (NHGRI) and NCI

Monsanto, Westvaco, David Luke III

Page 57: DNA Learning Center July 15, 2003
Page 58: DNA Learning Center July 15, 2003

“It is now conceivable that our children's children will know the term cancer only as a constellation of stars.”

–President Clinton at the White House, June 26, 2000 announcing completion of the human genome draft sequence