Applied research in human genetics Weibin Shi Michele Sale


Identification of genes that cause disease

The central focus of human genetics research:

Which polymorphisms in

Which genes in

Which individuals

Exposed to which environmental factors

Increase risk of developing disease?

Defining what to study

As in any biomedical study, need to precisely define the disease under study

Define primary phenotype and secondary phenotypes

Understanding risk factors

Genetic or Environmental?

• Ethnic differences
• Age/gender influence

Refining whether the disease under study is genetic

Family studies: Familial aggregation

Twin studies: Concordance rate of disorder for monozygotic (MZ) twins vs. the rate for dizygotic (DZ) twins

Adoption studies: disease frequency in adoptees’ biological vs. adoptive parents or siblings

Ethnic differences
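As a rough illustration of the twin-study comparison above, pairwise concordance is the fraction of affected twin pairs in which both twins have the disorder. The sketch below uses hypothetical pair counts (not data from any study) and only shows the arithmetic; a clearly higher MZ than DZ concordance is the pattern usually read as evidence for a genetic contribution.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Fraction of affected twin pairs in which both twins are affected."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("need at least one affected twin pair")
    return concordant_pairs / total


# Hypothetical counts, chosen only to illustrate the comparison.
mz_concordance = pairwise_concordance(concordant_pairs=30, discordant_pairs=70)
dz_concordance = pairwise_concordance(concordant_pairs=10, discordant_pairs=90)

print(f"MZ concordance: {mz_concordance:.2f}")
print(f"DZ concordance: {dz_concordance:.2f}")
```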

Best Proof of All?

Connect genetic variation to the disease!

But, how do we find the gene?

Linkage analysis and association analysis are effective in identifying Mendelian disorder genes but are less effective in identifying complex disease genes

Complex diseases are often caused by multiple genes and environmental factors

Difficulties of genetic studies of complex disease in humans

Heterogeneity of human populations

Several to many genes involved

modest effects for any single gene

Environmental influences

Mouse model of human genetic disease

Advantages over other mammals:
- Small size (<40 g), short generation time (8-9 wks), large litter size (5-10 pups)
- Numerous inbred and gene-targeted strains
- Easy control of environmental factors

Mouse genome shares great similarity with the human genome

Mouse-Human Comparison

2.5 vs. 3.2 billion bp long

> 99% of genes have homologs

> 95% of genome “syntenic” (relative gene-order conservation)

Variation among mouse strains in susceptibility to diet-induced atherosclerosis

Atherosclerotic Vascular Disease

Terminology

Discrete/qualitative trait - traits that are present or absent.

Continuous/quantitative trait - traits that have measurable characteristics across a range of values. This class includes the vast majority of diseases afflicting humans.

[Figure: diagram with labels Gene 1 through Gene 6]

Quantitative trait locus (QTL) analysis

[Figure: cross scheme with labels B6 x C3H, F1 x F1, F2…]

QTL analysis starts with selection of two phenotypically different strains

All F2s are analyzed for trait values

All F2s are typed for genetic markers spanning the whole genome

Statistical analysis

Map Manager QTXb20 (http://mapmgr.roswellpark.org/) and R/qtl (http://www.biostat.jhsph.edu/~kbroman/software) are available for testing the association of a phenotype with each marker.

The logarithm of the odds (LOD) score is used to define the significance of the association of a genetic marker with a trait.
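A minimal sketch of how a LOD score can be computed at a single marker, assuming a normally distributed trait and a simple marker regression with one mean per genotype class. Under that assumption, LOD = (n/2) log10(RSS0/RSS1), where RSS0 is the residual sum of squares around the overall mean (no QTL) and RSS1 is the residual sum of squares around the genotype-class means. The function name, genotype codes, and phenotype values are illustrative only; this is not the Map Manager QTX or R/qtl implementation.

```python
import math


def single_marker_lod(genotypes, phenotypes):
    """LOD score for one marker under a normal model with one mean per genotype."""
    n = len(phenotypes)
    if n == 0 or n != len(genotypes):
        raise ValueError("genotypes and phenotypes must be non-empty and equal length")

    # Null model: every animal shares one mean.
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Alternative model: each genotype class has its own mean.
    by_genotype = {}
    for g, y in zip(genotypes, phenotypes):
        by_genotype.setdefault(g, []).append(y)
    rss_alt = 0.0
    for values in by_genotype.values():
        class_mean = sum(values) / len(values)
        rss_alt += sum((y - class_mean) ** 2 for y in values)

    if rss_alt == 0:
        raise ValueError("no residual variation within genotype classes")
    return (n / 2) * math.log10(rss_null / rss_alt)


# Hypothetical F2 data: B = B6 allele, H = C3H allele.
genotypes = ["BB", "BB", "BH", "BH", "HH", "HH"]
lesion_sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lod = single_marker_lod(genotypes, lesion_sizes)
print(f"LOD = {lod:.2f}")
```

A marker with no effect on the trait gives equal residual sums of squares under both models and therefore a LOD of 0.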

Genome-wide scan for atherosclerotic lesions

Interval mapping provides the best estimate of the location of genes affecting atherosclerotic lesions

Dissect major QTL by construction and analysis of congenic strains

Congenic strain: identical to an inbred strain except for a differential chromosomal segment

Sequence Comparison

If crosses include those of sequenced strains, search database for polymorphisms of positional candidate genes in the QTL regions.

15 common inbred strains (B6, AJ, 129, DBA, C3H …) now available at MGI, NCBI, and Ensembl

Re-sequence coding and promoter regions of strong candidate genes.

Gene expression database

Where is your gene expressed?

http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=expressionQF

Is there microarray data for your gene?

http://www.ncbi.nlm.nih.gov/geo/

Conduct functional studies to prove the identity of promising candidate genes

Test the significance of QTL genes found in mouse by association analysis using human populations

Table 2 Genotyping results for genes in the human Chr 1 region homologous to the mouse Ath1 locus

                                                                              Rare alleles, % (total alleles)
Gene(a)         RefSNP ID(b)  SNP(c)  Position in gene (bp in Ensembl)        Affected      Control       P
PIGC            rs1063412     C/T     Exon 2 coding (169650343)               40.7 (684)    41.7 (734)    0.69
C1orf9(d)       rs1053381     A/G     3' UTR (169819913)                      6.3 (694)     8.0 (672)     0.22
TNFSF6 (FASL)   rs763110      C/T     687 bp upstream (169866874)             31.6 (728)    32.0 (744)    0.87
Intergenic      rs983514      A/T     (170111828)                             3.0 (708)     3.5 (714)     0.57
TNFSF18         rs1883477     A/G     Intron 1 (170258429)                    19.0 (694)    18.3 (706)    0.72
TNFSF4 (OX40L)  rs1234315     C/T     1,992 bp upstream (170417839)(f)        45.9 (754)    43.3 (778)    0.31
                rs3850641     A/G     Intron 1 (170415208)(e)                 15.5 (766)    12.1 (784)    0.05
                rs1234313     A/G     Intron 1 (170405623)(e)                 29.6 (766)    33.4 (784)    0.11
                rs3861950     C/T     Intron 2 (170395668)(e)                 33.4 (710)    30.4 (746)    0.23
                rs1234312     C/T     1,809 bp downstream (170390438)(f)      3.0 (766)     2.6 (772)     0.62

a
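The P values in Table 2 compare rare-allele frequencies between affected individuals and controls. As a hedged sketch of that kind of comparison, the code below runs a 2 x 2 allelic chi-square test (1 degree of freedom) on the TNFSF4 SNP rs3850641. The allele counts are reconstructed by rounding the reported percentages, and the test actually used in the original study may differ, so the result should only be expected to land near the reported P of 0.05.

```python
import math


def allelic_chi_square(rare_cases, total_cases, rare_controls, total_controls):
    """Pearson chi-square (1 df, no continuity correction) on allele counts."""
    a = rare_cases
    b = total_cases - rare_cases
    c = rare_controls
    d = total_controls - rare_controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the 2 x 2 table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# rs3850641: 15.5% of 766 alleles (affected) vs. 12.1% of 784 alleles (control).
# Counts are approximations recovered from the rounded percentages.
rare_cases = round(0.155 * 766)
rare_controls = round(0.121 * 784)
chi2, p_value = allelic_chi_square(rare_cases, 766, rare_controls, 784)
print(f"chi-square = {chi2:.2f}, P = {p_value:.3f}")
```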

Applied research in human genetics

Michèle Sale, Ph.D.

Center for Public Health Genomics

[email protected]@virginia.edu

Tel: 982-0368

National DNA Day!

April 25

Commemorates the discovery of the structure of DNA in 1953 and the sequencing of the human genome 50 years later

Genetic Information Non-Discrimination Act of 2007 (GINA)

A version first introduced in 1995

GINA would:

• Prohibit access to individuals' personal genetic information by insurance companies making health coverage plan enrollment decisions, and by employers making hiring decisions;
• Prohibit insurance companies from requesting that applicants for group or individual health coverage plans be subjected to genetic testing or screening, and prohibit them from discriminating against health plan applicants based on individual genetic information; and
• Prohibit employers from using genetic information to refuse employment, and prohibit them from collecting employees' personal genetic information without their explicit consent.

Nearly 40 states have had individual forms of the legislation in place

Passed by House:
• April 25, 2007 (420-3), and again
• March 7, 2008 (as part of the Paul Wellstone Mental Health and Addiction Equity Act, 268-148)

Senator Tom Coburn (R, Oklahoma) had placed a hold on the bill in the Senate

April 24, 2008: GINA passes in Senate (95-0)

Some examples from GWAS for type 2 diabetes

The first type 2 diabetes GWAS papers…

Sladek et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007 Feb 22; 445:881-5.

Frayling et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007 May 11; 316:889-94.

Steinthorsdottir et al. A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet. 2007 Jun; 39:770-5.

Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007 Jun 7; 447:661-78.

Saxena et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007 Jun 1; 316:1331-6.

Zeggini et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007 Jun 1; 316:1336-41.

Scott et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007 Jun 1; 316:1341-5.

Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research. Science. 2007 Jun 1; 316:1331-6.

Association results from WTCCC replication study

Zeggini, E. et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316, 1336–1341 (2007). Frayling TM. Nat Rev Genet 2007 Sep; 8:657-62

Transcription factor 7-like 2 (TCF7L2)

Major new diabetes gene

Identified as a diabetes gene by Grant et al. Nat Genet 2006 March; 38: 320-323

Not previously suspected to be involved in diabetes

Known to influence levels of at least 60 other genes!

Shown to have a role in insulin secretion (Lyssenko et al. J Clin Invest. 2007 Aug; 117:2155-63)

Replicated GWAS diabetes genes

Gene                   Chr   Reference

Previously known diabetes genes
TCF7L2                 10    Sladek, Steinthorsdottir, Scott
PPARG                  3     WTCCC, Scott
KCNJ11                 11    WTCCC, Scott, Saxena

Novel diabetes genes
SLC30A8                8     Sladek, Scott, Zeggini, Saxena
IGF2BP2                3     Saxena, WTCCC, Scott, Zeggini
CDKAL1                 6     Steinthorsdottir, Scott, Zeggini, Saxena
HHEX/IDE               10    Sladek, Scott, Zeggini, Saxena
CDKN2A/CDKN2B region   9     Saxena, WTCCC, Scott
FTO                    16    WTCCC, Scott, Zeggini

Frayling TM. Nat Rev Genet 2007 Sep; 8:657-62

Effect sizes of 11 confirmed diabetes variants

Frayling TM. Nat Rev Genet 2007 Sep; 8:657-62

TCF7L2 results

SNP        Population               Case frequency  Control frequency  P-value        Odds ratio
rs7903146  Iceland                  39%             30%                1.6 x 10^-9    1.50
           Denmark                  36%             27%                0.0018         1.46
           U.S. (Caucasians)        40%             28%                1.6 x 10^-7    1.71
           U.K. (Caucasians)        38%             31%                1.3 x 10^-11   1.35
           Finland                  22%             18%                0.00042        1.33
           France                   43%             31%                6.0 x 10^-35   1.69
           Netherlands              37%             29%                4.4 x 10^-5    1.41
           Europe (Caucasians)      36%             28%                <0.0001        1.54
           U.K. (Indian)            34%             27%                0.002          1.53
           U.S. (African American)  37%             28%                4.1 x 10^-6    1.51
           West Africa              41%             21%                0.0021         1.45

But this variant is rarer in East Asian and Native American populations

SNP        Population               Case frequency  Control frequency  P-value        Odds ratio
rs7903146  Mexico                   19%             16%                0.16           1.25
           Hong Kong (Chinese)      3%              2%                 0.42           1.27

• However, other variants in the same gene are associated with diabetes
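The odds ratios in the TCF7L2 table can be roughly related to the case and control allele frequencies. A minimal sketch, assuming a simple unadjusted allelic odds ratio, OR = [p_case / (1 - p_case)] / [p_control / (1 - p_control)]: the published values were presumably estimated from unrounded counts and possibly from adjusted models, so numbers computed from the rounded percentages agree only approximately.

```python
def allelic_odds_ratio(case_freq, control_freq):
    """Unadjusted odds ratio for carrying the risk allele, from allele frequencies."""
    if not (0 < case_freq < 1 and 0 < control_freq < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    case_odds = case_freq / (1 - case_freq)
    control_odds = control_freq / (1 - control_freq)
    return case_odds / control_odds


# Rounded frequencies taken from the table (rs7903146).
iceland_or = allelic_odds_ratio(0.39, 0.30)  # table reports 1.50
france_or = allelic_odds_ratio(0.43, 0.31)   # table reports 1.69

print(f"Iceland: {iceland_or:.2f}")
print(f"France:  {france_or:.2f}")
```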

Investigation of “European” diabetes alleles in African Americans

*Dominant model (<10 counts for minor allele homozygote)

Lewis et al. Diabetes 2008 (in press)

Gene            SNP         European reported  Admixture-adjusted  Admixture-adjusted
                            risk allele        additive P-value    OR (95% CI)
PKN2            rs6698181   T                  0.388               1.08 (0.91-1.29)
IGF2BP2         rs4402960   T                  0.803               0.98 (0.87-1.11)
FLJ39370        rs17044137  A                  0.747               0.98 (0.86-1.12)
CDKAL1          rs10946398  C                  0.110               1.11 (0.98-1.26)
CDKAL1          rs7754840   C                  0.136               1.10 (0.97-1.25)
SLC30A8         rs13266634  C                  0.543*              1.46 (0.43-4.89)
CDKN2B/CDKN2A   rs564398    T                  0.320*              2.99 (0.34-25.98)
CDKN2B/CDKN2A   rs10811661  T                  0.128*              0.18 (0.02-1.64)
IDE/KIF11/HHEX  rs1111875   C                  0.767               1.02 (0.88-1.19)
IDE/KIF11/HHEX  rs5015480   C                  0.400               0.95 (0.83-1.08)
IDE/KIF11/HHEX  rs7923837   G                  0.303*              1.87 (0.57-6.12)
Intragenic      rs9300039   C                  0.029*              0.42 (0.19-0.91)
LOC387761       rs7480010   G                  0.084               1.18 (0.98-1.44)
EXT2/ALX4       rs1113132   C                  0.221*              0.47 (0.14-1.57)
EXT2/ALX4       rs11037909  T                  0.511               0.94 (0.79-1.13)
EXT2/ALX4       rs3740878   A                  0.129*              0.46 (0.17-1.26)
FTO             rs8050136   A                  0.783               1.02 (0.90-1.15)
TCF7L2*         rs7903146   T                  1.59 x 10^-6        1.39 (1.21-1.60)

Allele frequencies differ

Lewis et al. Diabetes 2008 (in press)

                                                African American data   Reported European data  Power to detect association
                                                risk allele frequency   risk allele frequency   in African Americans
Gene            SNP         European reported   Controls   Cases        Controls   Cases        α=0.05   α=0.10
                            risk allele
PKN2            rs6698181   T                   0.153      0.156        0.290      0.320        0.237    0.345
IGF2BP2         rs4402960   T                   0.525      0.528        0.304      0.341        0.555    0.675
FLJ39370        rs17044137  A                   0.329      0.326        0.230      0.270        0.060    0.115
CDKAL1          rs10946398  C                   0.582      0.615        0.319      0.361        0.427    0.522
CDKAL1          rs7754840   C                   0.585      0.616        0.360      0.387        0.427    0.552
SLC30A8         rs13266634  C                   0.914      0.916        0.609      0.649        0.169    0.263
CDKN2B/CDKN2A   rs564398    T                   0.934      0.943        0.558      0.595        0.140    0.225
CDKN2B/CDKN2A   rs10811661  T                   0.933      0.927        0.850      0.872        0.304    0.422
IDE/KIF11/HHEX  rs1111875   C                   0.766      0.774        0.522      0.546        0.371    0.495
IDE/KIF11/HHEX  rs5015480   C                   0.633      0.621        0.425      0.379        0.470    0.595
IDE/KIF11/HHEX  rs7923837   G                   0.917      0.929        0.597      0.622        0.143    0.229
Intragenic      rs9300039   C                   0.889      0.884        0.892      0.924        0.584    0.701
LOC387761       rs7480010   G                   0.858      0.890        0.301      0.336        0.062    0.117
EXT2/ALX4       rs1113132   C                   0.915      0.920        0.733      0.763        0.475    0.600
EXT2/ALX4       rs11037909  T                   0.862      0.859        0.729      0.760        0.913    0.953
EXT2/ALX4       rs3740878   A                   0.907      0.914        0.728      0.760        0.760    0.846
FTO             rs8050136   A                   0.446      0.452        0.398      0.455        0.711    0.808
TCF7L2*         rs7903146   T                   0.284      0.354        0.181      0.227        0.997    0.999
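The power columns above depend on allele frequencies, effect size, sample size, and the significance threshold α. The sketch below is one simple way to approximate such power: a two-sided normal approximation for comparing allele frequencies between cases and controls. It is an assumption for illustration, not the method used by Lewis et al.; the allele counts (2,000 per group) are hypothetical, so the output is not expected to reproduce the table.

```python
from statistics import NormalDist


def power_two_proportions(p_cases, p_controls, n_case_alleles, n_control_alleles, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two allele frequencies."""
    for p in (p_cases, p_controls):
        if not 0 < p < 1:
            raise ValueError("frequencies must lie strictly between 0 and 1")
    # Standard error of the difference in allele frequencies.
    se = (p_cases * (1 - p_cases) / n_case_alleles
          + p_controls * (1 - p_controls) / n_control_alleles) ** 0.5
    z_effect = abs(p_cases - p_controls) / se
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


# Reported European frequencies for rs7903146 (cases 0.227, controls 0.181),
# with a hypothetical 2,000 alleles per group.
power_05 = power_two_proportions(0.227, 0.181, 2000, 2000, alpha=0.05)
power_10 = power_two_proportions(0.227, 0.181, 2000, 2000, alpha=0.10)
print(f"power at alpha=0.05: {power_05:.3f}")
print(f"power at alpha=0.10: {power_10:.3f}")
```

Relaxing α from 0.05 to 0.10 raises power, which is the pattern seen across the two power columns.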

Can genetic information change practice in the clinic?

Neonatal diabetes

Mutations of the ATP-sensitive inwardly-rectifying potassium channel subunit Kir6.2 (KCNJ11) gene cause 30-58% of cases of diabetes diagnosed in patients under six months of age

The majority of cases (80-90%) are de novo mutations, so won’t be identified on the basis of family history

Neonatal diabetes – KCNJ11 mutations

Pearson ER et al. N Engl J Med 2006, 355 (5), 467-477

In the beta-cell, glucose metabolism increases intracellular ATP production from ADP

This leads to the closure of ATP-sensitive potassium channels and membrane depolarization

Subsequent activation of voltage-dependent calcium channels and influx of calcium results in insulin granule exocytosis

Patients with KCNJ11 mutations have KATP channels with decreased sensitivity to ATP

• Channels remain open in the presence of glucose
• Reducing insulin secretion

Neonatal diabetes

Since patients present with hyperglycemia, undetectable C-peptide, and frequently have ketoacidosis (30%), they are often initially treated with insulin

A study of 49 patients showed that 90% could successfully be treated with sulfonylureas

Pearson ER et al. N Engl J Med 2006, 355 (5), 467-477

Pharmacogenetics

Cytochrome P450 table

Stamer and Stuber. Genetic factors in pain and its treatment. Curr Opin Anaesthesiol. 2007 Oct;20(5):478-84.

http://medicine.iupui.edu/flockhart/table.htm

Lanfear and McLeod. Pharmacogenetics: using DNA to optimize drug therapy. Am Fam Physician. 2007 Oct 15;76(8):1179-82.

Clinical trials

Genetic testing may allow selective recruitment of participants in whom drug is expected to be most efficacious

• Lower costs to bring drug to market
• Will it be approved for a select genetic group?

Ethical issues

• Privacy
• Insurance
  • Health
  • Life
  • Disability
• Employment

You can’t change your genes – Why does genetics matter?

Identify new pathways involved in disease predisposition
• New “druggable” targets

More specific diagnosis

Pharmacogenetics
• Identify genetic factors that influence an individual’s response to a particular therapy
• Selection of therapies
• Clinical trial design

Outcomes
• Recovery rates
• Long-term sequelae

Era of “personalized medicine”

You can’t change your genes – Why does genetics matter?

Better prediction of who is at greatest risk and targeted early intervention

PREVENTION

J. Craig Venter

Results from Venter’s Genome

After QC filtering, 4.1 million variants; 1.288M are novel to dbSNP (30%)
• SNPs, indels, inversions, segmental duplication, and more complex variation

78% of 4.1M are SNPs; the other 22% cover 9Mb of variant bases

62 Copy Number Variants = 10Mb

Total of variation = 0.5% of genome

Heterozygous Indels range from 1 - 321 bp

Levy et al, PLoS Biology, 2007

J. Craig Venter

Carries:
• A gene variant linked to moist ear wax production
• Genes linked to both heart disease (SORL1) and longevity
• Genes linked to
  • Alzheimer’s (APOE)
  • Macular degeneration
  • High cholesterol
• Carries up to seven gene types linked to tobacco addiction

‘Project Jim’

Bio-IT World June 2007

1.3 percent of Watson’s genome did not match the existing reference genome.
• > 600,000 novel SNPs
• < 68,000 insertions and deletions compared to the reference sequence, 3 bp - 7 kb

http://www.personalgenomes.org/

23andMe - Genetics Just Got Personal.

Navigenics Home
