Page 1: “Cancer Genomics”

Richard K. Wilson, Ph.D.
Washington University School of Medicine
rwilson@watson.wustl.edu

R.K.Wilson 2007

Page 2:

Human Genome v1.0
Technology, software tools, infrastructure
Ancillary genomes: mouse, chimp, etc.
Discovery
Cancer, other diseases
Cancer Genomics
Next-generation sequencing technology

Page 3:

list of candidate genes
large collection of patient samples
PCR-based re-sequencing

Page 4: EGFR mutations in NSCLC

[Figure: EGFR protein domain diagram (EGF ligand binding, TM, tyrosine kinase, autophos) with motif labels GXGXXG, LREA, DFG and residue positions between 718 and 964]

Most TKI responders have EGFR mutations:
• Study 1: 8/9 (89%) vs. 0/7 controls
• Study 2: 5/5 (100%) vs. 0/4 controls
• Study 3: 19/24 (79%) vs. 0/20 controls

Page 5: Tumor Sequencing Project

~600 genes of interest
~200 lung adenocarcinoma samples

• Sequencing Centers: BCM-HGSC, BI, WUGSC
• Cancer Centers: MSKCC, DFCI, SCC, MDA

Page 6: TSP Target List

• Too expensive to sequence the whole genome; therefore, focus on “druggable” targets.
• For lung adenocarcinoma TSP: ~600 genes (exons only)
  – Receptor tyrosine kinases (e.g. EGFR)
  – Selected serine-threonine kinases
  – Known oncogenes
  – Known tumor suppressor genes
  – EGFR pathway genes
  – DNA repair genes
  – Etc.

Page 7: SNP Arrays

Page 8: SNP Arrays

Page 9: DNA Chips/SNP Arrays

Page 10: Lung Adeno Genomic Events

SNP Array Analysis
Weir et al. Nature (2007)

Page 11: Lung Adeno Genomic Events

Weir et al. Nature (2007)

Page 12: Lung Adeno Genomic Events

Weir et al. Nature (2007)

Page 13: Lung Adenocarcinoma Amplifications

Weir et al. Nature (2007)

Page 14: Mutations in lung adenocarcinoma

KRAS and TP53 are mutated in about 1/3 of tumor samples. Indels have not been included in the analysis.

[Figure: bar chart of # of mutations per gene (y-axis 0 to 70). Genes in axis order: KRAS, E2F4, TP53, GNAS, STK11, EGFR, LRRK2, CDKN2A, EPHA3, NF1, SCARF2, PTPRD, LMTK2, TYK2, RIN1, ROR2, MKNK2, ERBB4, LRP1B, NTRK1, MYO3B, PIK3CG, LZTR1, JAG2, CDC2L2, EPHA5, CDH11, PAK3, SLC38A3, PIK3C3, INSRR, NTRK3, ATM, PRKCG, BAGE4, KDR, PTEN, NRAS, ZMYND10, PDGFRA, INHBA, PFTK1, TP73L, FLT4, LTK, DOCK3, NTRK2, EPHB6, IRAK2, ITK, EPHB1, APC, EPHA7, BAGE3, MST1, LMTK3, PAK7, GATA1, TFDP1, PRKACB, TSHR, MINK1, FGFR4, RB1, FGFR1]

Page 15: Mutations in TP53, ERBB3, and AKT3 appear to correlate with tumor grade

[Figure: mutation chart; groups of N=24, N=85, N=71]

Page 16: Correlations between mutations and clinical features

• Mutations in PDGFRA, PTEN, NTRK1 and PRKDC show positive correlation with tumor stage.
• Mutations in LRP1B, PRKDC, TP53, and APC correlate with the solid tumor histological subtype of lung adenocarcinoma.
• High correlation of mutations in EGFR and MYO3B with never-smokers, and of mutations in KRAS and LRP1B with smokers.

Page 17: EGFR mutations in glioblastoma

Screen of kinase domains in glioblastoma: no recurrent mutations. But …

[Figure: EGFR exons 1 to 28 with domain labels EC (subdomains I to IV), TM, JM, KD (kinase). Mutations marked: L861Q, A289V/D/T, T263P, R108K, D46N, G63R, R324L, E330K, G598V, P596L; EGFRvIII (del AA 30-297). Color key: red = somatic, blue = germline, black = unknown]

• 18/132 glioblastoma (13.6%); + 1 KD
• 1/8 glioblastoma cell lines (12.5%)
• 0/11 lower grade gliomas
• 151 total samples
• 119 lung tumors: no EC mutations
• 270 HapMap normals: no EC mutations

Page 18: Genomic Studies of Cancer

• Hypothesis-driven (biased):
  - Gene sets with related functions: “kinome”, “phosphatome”
  - Genes mutated in other cancers
  - Closely related genes
  - Investigator-driven ideas
• Data-driven (unbiased):
  - Use genomic platforms to identify loci with recurrent somatic alterations
  - Array-based RNA profiling
  - Array CGH
  - Array-based SNP genotyping

Page 19: Acute myelogenous leukemia

• Project initiated in 2002.
• Primary tumors, matched normal tissue (i.e., germline variants vs. somatic mutations)
• “Discovery set” (46 tumors) + “Validation set” (94 tumors)
• Initial target list: 450 genes
• Orthogonal technologies (CGH arrays, expression profiling, etc.) for genome characterization and to detect additional sequencing targets.
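The tumor/matched-normal design on this slide can be sketched in a few lines: a variant seen in both tissues is treated as germline, and a variant seen only in the tumor is a somatic candidate. This is a minimal illustration, not the project's actual pipeline; the `Variant` layout, the function name, and the example positions are all hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a format from the slides)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_germline_somatic(
    tumor: Set[Variant], normal: Set[Variant]
) -> Tuple[Set[Variant], Set[Variant]]:
    """Return (germline, somatic_candidates) for one tumor/normal pair."""
    germline = tumor & normal            # present in both tissues
    somatic_candidates = tumor - normal  # present only in the tumor
    return germline, somatic_candidates


# Made-up example calls, for illustration only.
tumor_calls = {
    Variant("chr13", 1000, "C", "T"),
    Variant("chr5", 2000, "G", "A"),
}
normal_calls = {Variant("chr5", 2000, "G", "A")}

germline, somatic_candidates = split_germline_somatic(tumor_calls, normal_calls)
print("germline:", sorted(germline))
print("somatic candidates:", sorted(somatic_candidates))
```

In practice a variant missing from the normal sample may simply be under-covered there, so a candidate from a comparison like this would still need validation.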

Page 20: Acute myelogenous leukemia

- FLT3: 29%
- NPM1: 25%
- NRAS: 9.6%
- PTPN11: 4%
- RUNX1: 4%
- GCSFR: 4%
- Others: 2-3%

Page 21: Is there a better approach?

• What are we missing outside of the exons?
• PCR-based re-sequencing:
  - Relatively expensive
  - Diploid (at best) & low coverage

Page 22: Solexa/Illumina 1G Analyzer

Page 23: Solexa/Illumina 1G Analyzer

Illumina flow cell
• Acts as the microfluidic conduit for cluster generation and sequencing reagents.
• 8-lane flow cell configuration.
• Separate libraries can be sequenced in each lane, or the same library in all.
• ~60M clusters are sequenced per flow cell.

Page 24: Next Generation Sequencing Technologies

Genome size: 3000 Mb

                  3730           454 FLX        Solexa
Req'd coverage    6              12             20
bp/read           600            250            32
Reads/run         96             400,000        28,000,000
bp/run            57,600         100,000,000    896,000,000
# runs req'd      312,500        360            67
Cost per run      $48            $6,800         $9,300
Total cost        $15,000,000    $2,448,000     $622,768
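The arithmetic behind this comparison can be reproduced with a short script. It is a sketch built only from the figures on the slide (genome size, required coverage, read length, reads per run, cost per run); the dictionary layout and the function name `estimate` are illustrative choices, not anything from the original talk.

```python
# Genome size from the slide: 3000 Mb.
GENOME_SIZE_BP = 3000 * 10**6

# Per-platform figures copied from the slide.
PLATFORMS = {
    "3730": {"coverage": 6, "bp_per_read": 600,
             "reads_per_run": 96, "cost_per_run": 48},
    "454 FLX": {"coverage": 12, "bp_per_read": 250,
                "reads_per_run": 400_000, "cost_per_run": 6_800},
    "Solexa": {"coverage": 20, "bp_per_read": 32,
               "reads_per_run": 28_000_000, "cost_per_run": 9_300},
}


def estimate(platform):
    """Return (bp per run, runs required, total cost) for one platform."""
    bp_per_run = platform["bp_per_read"] * platform["reads_per_run"]
    runs_required = GENOME_SIZE_BP * platform["coverage"] / bp_per_run
    total_cost = runs_required * platform["cost_per_run"]
    return bp_per_run, runs_required, total_cost


for name, params in PLATFORMS.items():
    bp_per_run, runs, cost = estimate(params)
    print(f"{name:8s} bp/run={bp_per_run:>11,}  "
          f"runs={runs:>9,.0f}  total=${cost:>12,.0f}")
```

The Solexa total on the slide matches the unrounded run count (about 66.96 runs); multiplying the rounded 67 runs by $9,300 would give $623,100 instead.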

Page 25: AML: Whole Genome Sequencing

Data types:
• Whole genome sequence (tumor genome): Solexa
• FL cDNA normalized library: Solexa + 454
• Whole genome sequence (epidermal genome): Solexa

Analysis plans:
• Compare sequence to previously identified mutations.
• Compare increasing coverage levels to heterozygous SNPs from Affy/Illumina arrays for coverage evaluation.
• Devise strategic approaches to find novel variants; validate and characterize.

Page 26: “933124”

• 57 y/o Caucasian female
• De novo M1 AML
• 100% blasts in initial BM sample
• Relapsed and died at 11 months
• Normal cytogenetics
• No LOH on Affy 500K SNP array
• Informed consent for whole genome sequencing


Page 29: AML: Whole Genome Sequencing

• As of 1/28/08:
  - 75 Solexa runs completed (32 bp reads)
  - 62 billion bp (~22X haploid coverage)
  - 2,123,143 sequence variants detected (Q30)
  - 492,569 (23.2%) are previously undiscovered SNPs
• 46,320 heterozygous (informative) SNPs from Affy and Illumina SNP arrays.
• 77% of informative SNPs with both WT and variant alleles were detected in the genome sequence.
• 97.4% of informative SNPs of either allele were detected in the genome sequence.

Page 30: AML: Whole Genome Sequencing

“933124” genome sequence

2,123,143 variants
  dbSNP: 1,630,574
  Genic: 334,477
    Splice_site: 99
    Other: 329,322
    Coding: 5,056
      Synonymous: 1,222
      Missense: 3,402
      Nonsense: 320
      Nonstop: 9
  Intergenic: 145,092

* Only reporting Q30 variants
* Genic region = gene boundary +/- 50kb
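A few of the tallies on this slide can be cross-checked against each other. The sketch below uses only numbers shown on the slides. Treating splice-site, other, and coding as the three subdivisions of "Genic" is an assumption about how the flattened diagram was laid out, and the variable names are illustrative.

```python
# Counts copied from the slides (Q30 variants only).
TOTAL_VARIANTS = 2_123_143
IN_DBSNP = 1_630_574

# Variants not already in dbSNP, i.e. previously undiscovered.
novel = TOTAL_VARIANTS - IN_DBSNP
novel_fraction = novel / TOTAL_VARIANTS

# Assumed subdivision of the "Genic" category (see the note above).
GENIC_TOTAL = 334_477
GENIC_PARTS = {"splice_site": 99, "other": 329_322, "coding": 5_056}
genic_sum = sum(GENIC_PARTS.values())

print(f"novel variants: {novel:,} ({novel_fraction:.1%} of all variants)")
print(f"genic parts sum to {genic_sum:,} (slide total: {GENIC_TOTAL:,})")
```

The novel count and percentage agree with the 492,569 (23.2%) quoted for previously undiscovered SNPs, and the three assumed genic subcategories add up to the genic total.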

Page 31: AML: Transcriptome Sequencing

Various cDNA library construction procedures & normalization schemes

454 cDNA sequencing:
Number of mapped cDNA reads: 306,267

Solexa cDNA sequencing:
Number of mapped reads: 47,153,784

Page 32: AML: Transcriptome Sequencing

Expressed genes: variant:germline frequencies
– MYCBP2 1188:345
– HSP90B1 694:1347
– BCCIP 391:394
– NCOR1 256:268
– CHFR 230:52
– DNAJ 218:0
– PTPN11 198:1
– NUMA1 157:2
– CASPASE 7 145:147
– HOX C6 118:2
– PLEKHC1 112:14
– NTRK3 112:10
– CDC2 96:82
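One way to read the variant:germline pairs is as a variant allele fraction per gene. The sketch below assumes each pair is (reads supporting the variant allele):(reads supporting the germline allele), which the slide does not state explicitly; the function name is illustrative.

```python
# (variant, germline) counts copied from the slide; interpreting them as
# read counts is an assumption.
READ_COUNTS = {
    "MYCBP2": (1188, 345),
    "HSP90B1": (694, 1347),
    "BCCIP": (391, 394),
    "NCOR1": (256, 268),
    "CHFR": (230, 52),
    "DNAJ": (218, 0),
    "PTPN11": (198, 1),
    "NUMA1": (157, 2),
    "CASPASE 7": (145, 147),
    "HOX C6": (118, 2),
    "PLEKHC1": (112, 14),
    "NTRK3": (112, 10),
    "CDC2": (96, 82),
}


def variant_fraction(variant_reads, germline_reads):
    """Fraction of reads at a site that carry the variant allele."""
    total = variant_reads + germline_reads
    return variant_reads / total if total else float("nan")


# List genes from the most to the least variant-dominated.
for gene, (variant, germline) in sorted(
    READ_COUNTS.items(),
    key=lambda item: variant_fraction(*item[1]),
    reverse=True,
):
    print(f"{gene:10s} {variant:>5}:{germline:<5} "
          f"fraction={variant_fraction(variant, germline):.2f}")
```

Under that reading, a fraction near 0.5 (BCCIP, NCOR1, CASPASE 7) is what a heterozygous site expressed from both alleles would look like, while a fraction near 1.0 (DNAJ, PTPN11, NUMA1, HOX C6) means almost every read carries the variant.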

Page 33: V194M (C to T) in FLT3

[Figure: two panels, cDNA sequence and tumor genome sequence, each labeled C and T]

Page 34: AML: Whole Genome Sequencing

• Currently using SXOligoSearchG (Synamatix) to detect small (1-2 bp) indels.
• Evaluating software tools for detection of larger indels.

Page 35: AML: Current status

thirsty for knowledge?

Page 36: AML: Current status

• Diploid coverage was obtained for 77% of an AML M1 tumor genome with 22x haploid coverage.
• 2.1M sequence variants found (similar to other whole genomes already ‘finished’).
• ~495,000 novel variants: SNPs vs. somatic mutations
• 10x coverage of epidermis (“normal”) genome just completed; may identify >90% of variants as rare SNPs.
• Remaining 50,000 variants are being prioritized by detection in cDNA: should be <1,000
• Very rare somatic mutations in cDNA thus far (only 2 validated).
• No mutator (“driver”) phenotype is readily apparent for this AML case; “passenger” mutations appear to be rare.
• We continue to sift through the data…

Page 37: Cancer Genomics

• Exon-targeted sequencing (TSP, glioblastoma) is revealing useful & interesting findings; expensive & slow!
• Next Gen sequencing is here and will have a substantial near-term impact on the study of cancer genomes!
• Ancillary genome-based technologies (expression profiling, SNP arrays, cDNA sequencing) are crucial for understanding the target genome before considering WGS.
• The dream is not hype: a comprehensive understanding of the “cancer genome” is probable, and will change the way that you diagnose & treat your patients.

Page 38: Acknowledgments

• WU Genome Sequencing Center: Elaine Mardis, Li Ding, Dave Dooling, Tracy Miner, Mike McLellan, Ginger Fewell, Jim Eldred, Asif Chinwalla, Yumi Kasai, Lucinda Fulton, Vince Magrini, Matt Hickenbotham, Lisa Cook, Michael Wendl, Michael Province
• WU Siteman Cancer Center: Tim Ley, Mark Watson, Matt Walter, Rhonda Ries, Jackie Payton, John DiPersio, Dan Link, Michael Tomasson, Tim Graubert, Sharon Heath
• TSP/TCGA Colleagues: Baylor HGSC, Broad Institute, many others…
• Funding sources: NHGRI (Wilson), NCI (Ley), Alvin J. Siteman (AML WGS)

genome.wustl.edu