R.K.Wilson 2007
“Cancer Genomics”
Richard K. Wilson, Ph.D.
Washington University School of Medicine
Human Genome v1.0
Technology, software tools, infrastructure
Ancillary genomes: mouse, chimp, etc.
Discovery: cancer, other diseases
Cancer Genomics
Next-generation sequencing technology
list of candidate genes
large collection of patient samples
PCR-based re-sequencing
[Figure: EGFR domain diagram. Region labels as extracted: EGF ligand binding, TM, tyrosine kinase, autophos. Motif labels: GXGXXG, K, LREA, DFG. Residue numbers shown: 718, 745, 776, 835, 858, 869, 947, 964.]
EGFR mutations in NSCLC
Most TKI responders have EGFR mutations:
Study 1: 8/9 (89%) vs. 0/7 controls
Study 2: 5/5 (100%) vs. 0/4 controls
Study 3: 19/24 (79%) vs. 0/20 controls
~600 genes of interest
~200 lung adenocarcinoma samples
Tumor Sequencing Project
• Sequencing Centers: BCM-HGSC, BI, WUGSC
• Cancer Centers: MSKCC, DFCI, SCC, MDA
• Too expensive to sequence the whole genome; therefore, focus on “druggable” targets.
• For lung adenocarcinoma TSP: ~600 genes (exons only)
– Receptor tyrosine kinases (e.g. EGFR)
– Selected serine-threonine kinases
– Known oncogenes
– Known tumor suppressor genes
– EGFR pathway genes
– DNA repair genes
– Etc.
TSP Target List
SNP Arrays
DNA Chips/SNP Arrays
Lung Adeno Genomic Events
SNP Array Analysis
Weir et al. Nature (2007)
Lung Adenocarcinoma Amplifications
Weir et al. Nature (2007)
KRAS and TP53 are mutated in about 1/3 of tumor samples. Indels have not been included in the analysis.
[Figure: bar chart; y-axis "# of mutations" (0 to 70); x-axis gene labels as extracted: KRAS, E2F4, TP53, GNAS, STK11, EGFR, LRRK2, CDKN2A, EPHA3, NF1, SCARF2, PTPRD, LMTK2, TYK2, RIN1, ROR2, MKNK2, ERBB4, LRP1B, NTRK1, MYO3B, PIK3CG, LZTR1, JAG2, CDC2L2, EPHA5, CDH11, PAK3, SLC38A3, PIK3C3, INSRR, NTRK3, ATM, PRKCG, BAGE4, KDR, PTEN, NRAS, ZMYND10, PDGFRA, INHBA, PFTK1, TP73L, FLT4, LTK, DOCK3, NTRK2, EPHB6, IRAK2, ITK, EPHB1, APC, EPHA7, BAGE3, MST1, LMTK3, PAK7, GATA1, TFDP1, PRKACB, TSHR, MINK1, FGFR4, RB1, FGFR1.]
Mutations in lung adenocarcinoma
Mutations in TP53, ERBB3, and AKT3 appear to correlate with tumor grade
[Figure: mutations plotted against tumor grade; group sizes N=24, N=85, N=71.]
• Mutations in PDGFRA, PTEN, NTRK1 and PRKDC show positive correlation with tumor stage.
• Mutations in LRP1B, PRKDC, TP53, and APC correlate with the solid tumor histological subtype of lung adenocarcinoma.
• Mutations in EGFR and MYO3B correlate strongly with never-smoker status; mutations in KRAS and LRP1B correlate strongly with smoking.
Correlations between mutations and clinical features
Screen of kinase domains in glioblastoma: no recurrent mutations. But …
119 lung tumors: no EC mutations
270 HapMap normals: no EC mutations
18/132 glioblastoma (13.6%); + 1 KD
1/8 glioblastoma cell lines (12.5%)
0/11 lower grade gliomas
151 total samples
[Figure: EGFR exon map (exons 1-28) with domain labels EC, TM, JM, KD, subdomains I-IV and KINASE. Mutations shown: L861Q, A289V/D/T, T263P, R108K, D46N, G63R, R324L, E330K, G598V, P596L; EGFRvIII (del AA 30-297). Color key: red = somatic, blue = germline, black = unknown. Additional numeric labels as extracted: 7 8 15 212 3.]
EGFR mutations in glioblastoma
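The glioblastoma screen frequencies can be recomputed directly from the counts on the slide. A minimal sketch follows; the dictionary layout and function name are hypothetical, chosen only for illustration.

```python
# (samples with an EGFR mutation, samples screened), as quoted on the slide
SCREEN = {
    "glioblastoma": (18, 132),
    "glioblastoma cell lines": (1, 8),
    "lower grade gliomas": (0, 11),
}

def mutation_rate_pct(mutated, screened):
    """Mutation frequency as a percentage, rounded to one decimal place."""
    return round(100 * mutated / screened, 1)

total_samples = sum(screened for _, screened in SCREEN.values())

for group, (mutated, screened) in SCREEN.items():
    print(f"{group}: {mutated}/{screened} = {mutation_rate_pct(mutated, screened)}%")
print(f"total samples: {total_samples}")
```

The three cohorts sum to the 151 total samples stated on the slide.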
• Hypothesis-driven (biased):
- Gene sets with related functions: “kinome”, “phosphatome”
- Genes mutated in other cancers
- Closely related genes
- Investigator-driven ideas
• Data-driven (unbiased):
- Use genomic platforms to identify loci with recurrent somatic alterations
- Array-based RNA profiling
- Array CGH
- Array-based SNP genotyping
Genomic Studies of Cancer
• Project initiated in 2002.
• Primary tumors, matched normal tissue (i.e., germline variants vs. somatic mutations)
• “Discovery set” (46 tumors) + “Validation set” (94 tumors)
• Initial target list: 450 genes
• Orthogonal technologies (CGH arrays, expression profiling, etc.) for genome characterization and to detect additional sequencing targets.
Acute myelogenous leukemia
- FLT3: 29%
- NPM1: 25%
- NRAS: 9.6%
- PTPN11: 4%
- RUNX1: 4%
- GCSFR: 4%
- Others: 2-3%
Acute myelogenous leukemia
• What are we missing outside of the exons?
• PCR-based re-sequencing:
- Relatively expensive
- Diploid (at best) & low coverage
Is there a better approach?
Solexa/Illumina 1G Analyzer
Illumina flow cell
• Acts as the microfluidic conduit for cluster generation and sequencing reagents.
• 8-lane flow cell configuration.
• Separate libraries can be sequenced in each lane, or the same library in all.
• ~60M clusters are sequenced per flow cell.
Next Generation Sequencing Technologies
Genome size: 3,000 Mb

                  3730           454 FLX        Solexa
Req'd coverage    6              12             20
bp/read           600            250            32
Reads/run         96             400,000        28,000,000
bp/run            57,600         100,000,000    896,000,000
# runs req'd      312,500        360            67
Cost per run      $48            $6,800         $9,300
Total cost        $15,000,000    $2,448,000     $622,768
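The table's arithmetic can be reproduced as follows. This is a minimal sketch with hypothetical names, assuming bp/run = bp/read × reads/run and runs required = genome size × coverage ÷ bp/run. The Solexa total on the slide matches the fractional run count (about 66.96 runs); 67 whole runs at $9,300 would come to $623,100.

```python
GENOME_SIZE_BP = 3_000_000_000  # 3,000 Mb, as stated on the slide

# platform: (bp per read, reads per run, required coverage, cost per run in USD)
PLATFORMS = {
    "3730": (600, 96, 6, 48),
    "454 FLX": (250, 400_000, 12, 6_800),
    "Solexa": (32, 28_000_000, 20, 9_300),
}

def run_estimate(bp_per_read, reads_per_run, coverage, cost_per_run):
    """Return (bp per run, fractional runs required, total cost in USD)."""
    bp_per_run = bp_per_read * reads_per_run
    runs = GENOME_SIZE_BP * coverage / bp_per_run
    return bp_per_run, runs, runs * cost_per_run

for name, spec in PLATFORMS.items():
    bp_per_run, runs, cost = run_estimate(*spec)
    print(f"{name:8s} bp/run={bp_per_run:>13,}  runs={runs:>11,.2f}  total=${cost:,.0f}")
```

Keeping the run count fractional until the final rounding is a deliberate choice here, since that is what reproduces the slide's $622,768 figure.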
Data types:
• Whole genome sequence (tumor genome): Solexa
• FL cDNA normalized library: Solexa + 454
• Whole genome sequence (epidermal genome): Solexa
Analysis plans:
• Compare sequence to previously identified mutations.
• Compare increasing coverage levels to heterozygous SNPs from Affy/Illumina arrays for coverage evaluation.
• Devise strategic approaches to find novel variants; validate and characterize.
AML: Whole Genome Sequencing
“933124”
• 57 y/o Caucasian female
• De novo M1 AML
• 100% blasts in initial BM sample
• Relapsed and died at 11 months
• Normal cytogenetics
• No LOH on Affy 500K SNP array
• Informed consent for whole genome sequencing
• As of 1/28/08:
• 75 Solexa runs completed (32 bp reads)
• 62 billion bp (~22X haploid coverage)
• 2,123,143 sequence variants detected (Q30)
• 492,569 (23.2%) are previously undiscovered SNPs
• 46,320 heterozygous (informative) SNPs from Affy and Illumina SNP arrays.
• For 77% of informative SNPs, both the WT and variant alleles were detected in the genome sequence.
• For 97.4% of informative SNPs, at least one allele was detected in the genome sequence.
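The headline numbers above can be related to one another with simple arithmetic. The sketch below uses hypothetical names and makes one assumption explicit: the slides do not say which genome size underlies the "~22X" figure, so the code shows both the coverage at 3,000 Mb (the size used in the earlier platform comparison) and the genome size implied by 22X.

```python
TOTAL_BP = 62_000_000_000   # 62 billion bp sequenced
RUNS = 75                   # Solexa runs completed
VARIANTS = 2_123_143        # Q30 sequence variants
NOVEL = 492_569             # not previously reported
INFORMATIVE_SNPS = 46_320   # heterozygous SNPs from the arrays

def haploid_coverage(total_bp, genome_size_bp):
    """Average haploid depth: sequenced bases divided by genome size."""
    return total_bp / genome_size_bp

def pct(part, whole):
    """Percentage of `part` in `whole`."""
    return 100 * part / whole

print(f"bp per run: {TOTAL_BP / RUNS:,.0f}")
print(f"coverage at 3,000 Mb: {haploid_coverage(TOTAL_BP, 3_000_000_000):.1f}X")
print(f"genome size implied by 22X: {TOTAL_BP / 22 / 1e6:,.0f} Mb")
print(f"novel variants: {pct(NOVEL, VARIANTS):.1f}%")
print(f"SNPs with both alleles seen (77%): ~{0.77 * INFORMATIVE_SNPS:,.0f}")
```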
AML: Whole Genome Sequencing
“933124” genome sequence
2,123,143 variants
dbSNP: 1,630,574
Genic: 334,477
  Splice_site: 99
  Other: 329,322
  Coding: 5,056
    Synonymous: 1,222
    Missense: 3,402
    Nonsense: 320
    Nonstop: 9
Intergenic: 145,092
*Only reporting Q30 variants
*Genic region = gene boundary +/- 50kb
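A small tally helps check how the categories in this breakdown nest. The sketch below rests on an assumption: the parent/child grouping (Genic → Splice_site/Other/Coding, Coding → Synonymous/Missense/Nonsense/Nonstop) is inferred from the counts, since the original slide layout is not recoverable. All names are hypothetical.

```python
TOTAL_VARIANTS = 2_123_143
DBSNP = 1_630_574

GENIC = 334_477
INTERGENIC = 145_092
GENIC_CHILDREN = {"Splice_site": 99, "Other": 329_322, "Coding": 5_056}
CODING_CHILDREN = {"Synonymous": 1_222, "Missense": 3_402, "Nonsense": 320, "Nonstop": 9}

def unaccounted(parent_total, children):
    """Variants in the parent category not covered by the listed children."""
    return parent_total - sum(children.values())

novel = TOTAL_VARIANTS - DBSNP  # variants not found in dbSNP

print("novel variants:", novel)
print("genic remainder:", unaccounted(GENIC, GENIC_CHILDREN))
print("coding remainder:", unaccounted(GENIC_CHILDREN["Coding"], CODING_CHILDREN))
print("novel remainder:", unaccounted(novel, {"Genic": GENIC, "Intergenic": INTERGENIC}))
```

Under this assumed nesting the genic children sum exactly to the genic total, while the coding and novel levels leave small remainders; these may correspond to categories not shown on the slide.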
AML: Whole Genome Sequencing
454 cDNA sequencing: number of mapped cDNA reads: 306,267
Solexa cDNA sequencing: number of mapped reads: 47,153,784
AML: Transcriptome Sequencing
Various cDNA library construction procedures & normalization schemes
Expressed genes: variant:germline frequencies
– MYCBP2 1188:345
– HSP90B1 694:1347
– BCCIP 391:394
– NCOR1 256:268
– CHFR 230:52
– DNAJ 218:0
– PTPN11 198:1
– NUMA1 157:2
– CASPASE 7 145:147
– HOX C6 118:2
– PLEKHC1 112:14
– NTRK3 112:10
– CDC2 96:82
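One way to compare the genes listed above is to convert each variant:germline pair into a variant fraction. This is a minimal sketch under an assumption the slide does not confirm: that each pair gives the number of cDNA reads supporting the variant allele and the germline allele, respectively. The function name is hypothetical.

```python
# gene: (variant count, germline count), copied from the slide
COUNTS = {
    "MYCBP2": (1188, 345), "HSP90B1": (694, 1347), "BCCIP": (391, 394),
    "NCOR1": (256, 268), "CHFR": (230, 52), "DNAJ": (218, 0),
    "PTPN11": (198, 1), "NUMA1": (157, 2), "CASPASE 7": (145, 147),
    "HOX C6": (118, 2), "PLEKHC1": (112, 14), "NTRK3": (112, 10),
    "CDC2": (96, 82),
}

def variant_fraction(variant, germline):
    """Share of observations carrying the variant allele (NaN if no data)."""
    total = variant + germline
    return variant / total if total else float("nan")

# Rank genes from most to least variant-dominated expression
ranked = sorted(COUNTS.items(), key=lambda kv: variant_fraction(*kv[1]), reverse=True)
for gene, (variant, germline) in ranked:
    print(f"{gene:10s} {variant:>5}:{germline:<5} fraction={variant_fraction(variant, germline):.3f}")
```

Fractions near 0.5 would be consistent with balanced expression of both alleles, while fractions near 1.0 indicate that almost only the variant allele was observed.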
AML: Transcriptome Sequencing
V194M (C to T) in FLT3
[Figure: two panels, "cDNA sequence" and "Tumor genome sequence"; base labels as extracted: CTCT.]
• Currently using SXOligoSearchG (Synamatix) to detect small (1-2 bp) indels.
• Evaluating software tools for detection of larger indels.
AML: Whole Genome Sequencing
AML: Current status
thirsty for knowledge?
• Diploid coverage was obtained for 77% of an AML M1 tumor genome with 22x haploid coverage.
• 2.1M sequence variants found (similar to other whole genomes already ‘finished’).
• ~495,000 novel variants: SNPs vs. somatic mutations
• 10x coverage of epidermis (“normal”) genome just completed; may identify >90% of variants as rare SNPs.
• Remaining 50,000 variants are being prioritized by detection in cDNA: should be <1,000
• Very rare somatic mutations in cDNA thus far (only 2 validated).
• No mutator (“driver”) phenotype is readily apparent for this AML case; “passenger” mutations appear to be rare.
• We continue to sift through the data…
AML: Current status
• Exon-targeted sequencing (TSP, glioblastoma) is revealing useful & interesting findings; expensive & slow!
• Next Gen sequencing is here and will have a substantial near-term impact on the study of cancer genomes!
• Ancillary genome-based technologies (expression profiling, SNP arrays, cDNA sequencing) are crucial for understanding the target genome before considering WGS.
• The dream is not hype: a comprehensive understanding of the “cancer genome” is probable, and will change the way that you diagnose & treat your patients.
Cancer Genomics
Acknowledgments
• WU Genome Sequencing Center: Elaine Mardis, Li Ding, Dave Dooling, Tracy Miner, Mike McLellan, Ginger Fewell, Jim Eldred, Asif Chinwalla, Yumi Kasai, Lucinda Fulton, Vince Magrini, Matt Hickenbotham, Lisa Cook, Michael Wendl, Michael Province
• WU Siteman Cancer Center: Tim Ley, Mark Watson, Matt Walter, Rhonda Ries, Jackie Payton, John DiPersio, Dan Link, Michael Tomasson, Tim Graubert, Sharon Heath
• TSP/TCGA Colleagues: Baylor HGSC, Broad Institute, many others…
• Funding sources: NHGRI (Wilson), NCI (Ley), Alvin J. Siteman (AML WGS)
genome.wustl.edu