PUBH 8445: Lecture 1 Saonli Basu, Ph.D. Division of Biostatistics School of Public Health University of Minnesota [email protected]

PUBH 8445: Lecture 1

Saonli Basu, Ph.D.

Division of Biostatistics, School of Public Health, University of Minnesota

[email protected]

Statistical Genetics

It can broadly be classified into three subcategories:

Mendelian Genetics: studies the transmission of alleles in pedigrees.

Population Genetics: the rules of how genes behave in populations.

Quantitative Genetics: the rules of transmission of complex quantitative traits, those with both a genetic and an environmental basis.

Saonli Basu PUBH 8445: Lecture 1

Genetic Terminologies

DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other organisms. Nearly every cell in a person’s body has the same DNA. Most DNA is located in the cell nucleus (where it is called nuclear DNA), but a small amount of DNA can also be found in the mitochondria (where it is called mitochondrial DNA or mtDNA).

The information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Human DNA consists of about 3 billion bases, and more than 99 percent of those bases are the same in all people. The order, or sequence, of these bases determines the information available for building and maintaining an organism.

DNA bases pair up with each other, A with T and C with G, to form units called base pairs. Each base is also attached to a sugar molecule and a phosphate molecule. Together, a base, sugar, and phosphate are called a nucleotide. Nucleotides are arranged in two long strands that form a spiral called a double helix.
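The pairing rule above (A with T, C with G) means that one strand determines the other. A minimal Python sketch of that idea follows; the function names and the example sequence are illustrative assumptions, not part of the lecture.

```python
# Pairing rules from the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner base for each position, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands of the helix run in opposite directions, so the
    partner strand is usually written reversed.
    """
    return complement(strand)[::-1]


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

Applying `complement` twice returns the original sequence, which is one way to see that each strand carries the full information.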

Genetic Terminologies

[Figure: DNA's double helix inside the cell nucleus, with labeled base pairs: adenine with thymine, and guanine with cytosine.]

DNA's Double Helix. DNA molecules are found inside the cell's nucleus, tightly packed into chromosomes. Scientists use the term "double helix" to describe DNA's winding, two-stranded chemical structure. Alternating sugar and phosphate groups form the helix's two strands, which run in opposite directions and act as its backbone. Nitrogen bases on the two strands chemically pair together to form the interior of the helix. The base adenine (A) always pairs with thymine (T), while guanine (G) always pairs with cytosine (C).

Terminologies (contd.)

Chromosome: The entire genome (the complete set of nuclear DNA) is arranged in pairs of chromosomes.

Humans have 22 pairs of autosomes and one pair of sex chromosomes.

For every pair of chromosomes, one is inherited from the individual's mother and one from the individual's father.

Chromosomes of the same pair carry the same set of genes and are called homologous.

Terminologies (contd.)

Locus: Each position in the genome is called a “locus” (“loci” for multiple positions). A locus could represent a single base position or a collection of bases.

Allele: The variants observed in the human population at a locus are called the “alleles” for that locus. If the locus represents a single base, there are usually only two variants in practice, although up to four (A, C, G, T) are possible. This type of locus is called a “Single Nucleotide Polymorphism” (SNP). There are also markers called microsatellites (Short Tandem Repeats: GTAGTAGTAGTAGTA...).

Gene: A gene is the basic physical and functional unit of heredity. Genes, which are made up of DNA, act as instructions to make molecules called proteins. In humans, genes vary in size from a few hundred DNA bases to more than 2 million bases. The Human Genome Project has estimated that humans have about 20,000 genes.
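For the short tandem repeat markers mentioned under Allele, alleles typically differ in how many times the motif is repeated. The sketch below counts back-to-back copies of a motif in a sequence; the function name and the flanking bases in the example are made up for illustration.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found in `sequence`."""
    # Each match is one uninterrupted run of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical read: the slide's GTA repeat with two flanking bases on each side.
print(longest_repeat_run("CCGTAGTAGTAGTAGTATT", "GTA"))  # 5
```

Two chromosomes carrying 5 and 7 copies of the motif at the same locus would then be recorded as two different alleles of that marker.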

Mendelian Inheritance

In cross-pollinating plants that produce either yellow or green pea seeds exclusively, Mendel found that the first offspring generation (F1) always has yellow seeds. However, the following generation (F2) consistently has a 3:1 ratio of yellow to green.

This 3:1 ratio occurs in later generations as well.

Mendelian Inheritance

Law of Segregation: for any particular trait, the pair of alleles of each parent separate and only one allele passes from each parent on to an offspring. Which allele in a parent’s pair of alleles is inherited is a matter of chance. We now know that this segregation of alleles occurs during the process of meiosis.

Terminologies (contd.)

Genotype: The specific combination of alleles for a given locus/gene. Going back to Mendel’s plants, we can now say that all of his true-breeding plants contained two of the same alleles at the gene location for seed color. Yellow plants in this P generation had two alleles for yellow color (YY), and green P generation plants had two alleles for green color (GG). When two alleles at a locus are identical, the individual is said to be homozygous at that locus.

On the other hand, crossing the two color plants to produce F1 hybrids created a generation of plants with one Y allele and one G allele (YG). An organism with two different alleles at a locus is said to be heterozygous.

Terminologies (contd.)

Phenotype: The genetic makeup of a certain trait (e.g., YY, YG and GG) is called its genotype, while the physical expression of that trait (e.g., yellow or green) is called a phenotype.

Dominant/Recessive: For the pea plants, if the Y allele is dominant and the G allele is recessive, only two phenotypes are possible. Plants with the YG and YY genotypes will both have the yellow color phenotype, while plants with the GG genotype will have the green color phenotype.

A trait is the general aspect of physiology being shown in the phenotype. So, for example, the trait here is the seed color of the pea plant. The phenotype can be either yellow or green, depending on the genotype.
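The 3:1 ratio in the F2 generation follows from segregation plus dominance. The sketch below enumerates the four equally likely allele combinations from a YG x YG cross; the function names are illustrative, and Y is assumed dominant over G as in the slide.

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Count equally likely offspring genotypes: one allele from each parent."""
    offspring = Counter()
    for a, b in product(parent1, parent2):
        # Put Y before G so that "GY" and "YG" count as the same genotype.
        offspring["".join(sorted(a + b, reverse=True))] += 1
    return offspring


def phenotype(genotype: str) -> str:
    """Y is assumed dominant over G, so one Y allele is enough for yellow."""
    return "yellow" if "Y" in genotype else "green"


f2 = cross("YG", "YG")  # both F1 parents are heterozygous
print(dict(f2))         # {'YY': 1, 'YG': 2, 'GG': 1}

colors = Counter()
for genotype, count in f2.items():
    colors[phenotype(genotype)] += count
print(dict(colors))     # {'yellow': 3, 'green': 1}
```

The same function gives the F1 result: `cross("YY", "GG")` yields only YG offspring, all of them yellow.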

Example: Genotype vs Phenotype

The ABO locus is on chromosome 9

The (main) alleles at the locus are A, B, and O.

The 6 genotypes are AA, AO, BB, BO, AB and OO

Homozygotes are AA, BB, OO; heterozygotes are AO, BO and AB.

The 4 phenotypes are blood types A, B, AB and O.

The O allele is recessive to A and to B; A and B are each dominant to O.

AO and AA are blood type A; BB and BO are blood type B.

A and B are codominant: AA, AB and BB are distinguishable.
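The genotype-to-phenotype rules for ABO listed above can be written as a small lookup. This is only a sketch of those rules for the three main alleles; the function name is an assumption.

```python
def abo_blood_type(genotype: str) -> str:
    """Map an ABO genotype such as "AO" or "BA" to its blood type."""
    alleles = set(genotype.upper())
    if len(genotype) != 2 or not alleles <= {"A", "B", "O"}:
        raise ValueError(f"not an ABO genotype: {genotype!r}")
    if alleles == {"A", "B"}:
        return "AB"  # A and B are codominant
    if "A" in alleles:
        return "A"   # AA or AO: O is recessive to A
    if "B" in alleles:
        return "B"   # BB or BO: O is recessive to B
    return "O"       # only OO gives blood type O


for g in ["AA", "AO", "BB", "BO", "AB", "OO"]:
    print(g, "->", abo_blood_type(g))
```

Six genotypes collapse to four phenotypes, which is why blood type alone cannot distinguish AA from AO or BB from BO.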

Mendel’s Principles of Genetic Inheritance

Law of Independent Assortment: In the gametes, alleles of one gene separate independently of those of another gene, and thus all possible combinations of alleles are equally probable.

Law of Dominance: Each trait is determined by two factors (alleles), inherited one from each parent. These factors each exhibit a characteristic dominant, co-dominant, or recessive expression, and those that are dominant will mask the expression of those that are recessive.
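As an illustration of the Law of Independent Assortment, the sketch below lists the gametes a parent can produce when each locus contributes one allele with probability 1/2, independently of the other loci. The function name and the genotype notation ("Aa", "Bb") are assumptions for this example, and the independence assumption does not hold for loci that lie close together on the same chromosome.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype_by_locus: list[str]) -> dict[str, Fraction]:
    """Gamete probabilities when loci assort independently.

    Each entry of `genotype_by_locus` is a two-allele genotype, e.g. "Aa".
    """
    weight = Fraction(1, 2) ** len(genotype_by_locus)
    probabilities: dict[str, Fraction] = {}
    # Choose one of the two alleles at every locus; each choice is equally likely.
    for choice in product(*genotype_by_locus):
        gamete = "".join(choice)
        probabilities[gamete] = probabilities.get(gamete, Fraction(0)) + weight
    return probabilities


for gamete, prob in gametes(["Aa", "Bb"]).items():
    print(gamete, prob)
```

For a double heterozygote the four gametes AB, Ab, aB and ab each come out with probability 1/4, which is the "equally probable" statement in the law.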

Transmission of Alleles

Basic Concepts

[Figure: haplotypes at two SNPs (alleles A/a and B/b) transmitted from Parent 1 and Parent 2. High LD -> no recombination (r2 = 1): only the haplotypes A B and a b are observed, so SNP1 “tags” SNP2. Low LD -> recombination: many possibilities (A B, A b, a B, a b, etc.).]
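The r2 value in the figure can be computed from haplotype frequencies as D^2 / (pA (1 - pA) pB (1 - pB)), where D = pAB - pA pB. The sketch below uses this standard definition, which the slide does not spell out; the function name and the two toy samples are illustrative assumptions.

```python
from collections import Counter


def r_squared(haplotypes: list[str]) -> float:
    """r^2 between two biallelic SNPs from two-character haplotypes like "AB" or "ab".

    Uppercase is treated as one allele and lowercase as the other at each SNP.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(c for h, c in counts.items() if h[0].isupper()) / n  # frequency of A
    p_b = sum(c for h, c in counts.items() if h[1].isupper()) / n  # frequency of B
    p_ab = sum(c for h, c in counts.items() if h[0].isupper() and h[1].isupper()) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# High LD: only A B and a b haplotypes, as in the left panel of the figure.
print(r_squared(["AB", "AB", "AB", "ab", "ab", "ab"]))  # 1.0
# Low LD: all four haplotypes equally common.
print(r_squared(["AB", "Ab", "aB", "ab"]))              # 0.0
```

When r2 = 1, knowing the allele at SNP1 determines the allele at SNP2, which is the sense in which SNP1 "tags" SNP2.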

Quantitative Genetics

Quantitative genetics is the study of polygenic traits. Quantitative genetic variation can be described in three ways:

Traits are influenced by multiple genes, i.e. they're polygenic.

They are usually influenced more easily by environmental factors than simple Mendelian traits are.

Both of the factors above usually lead to a continuous distribution of the particular trait. For example, the heights in a sample population follow a nearly normal distribution.
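To see why many genes lead toward a continuous, bell-shaped distribution, consider a deliberately simplified model: independent loci, equal allele frequencies, equal additive effects, and no environmental contribution. Under those assumptions (which are mine, not the lecture's) the number of trait-increasing alleles a diploid individual carries is binomial, and the sketch below prints its exact distribution.

```python
from math import comb


def allele_count_distribution(n_loci: int, freq: float = 0.5) -> list[float]:
    """P(k trait-increasing alleles) for a diploid individual, k = 0..2*n_loci.

    Assumes independent loci with the same allele frequency `freq`, so the
    count follows a Binomial(2 * n_loci, freq) distribution.
    """
    n = 2 * n_loci
    return [comb(n, k) * freq**k * (1 - freq) ** (n - k) for k in range(n + 1)]


for n_loci in (1, 10):
    print(f"number of loci: {n_loci}")
    for k, p in enumerate(allele_count_distribution(n_loci)):
        # One '#' per percentage point gives a rough text histogram.
        print(f"  {k:2d} {'#' * round(p * 100)}")
```

With one locus there are only three classes (a Mendelian pattern); with ten loci there are 21 classes in a bell shape, and environmental variation would blur the steps between them further.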

Allelic architecture and mapping strategy

Steps in Positional Cloning

Schuler (1996) Science

Broad Genetic Epidemiology Study Design Categories:

Linkage Analysis: follows meiotic events through families for co-segregation of disease and particular genetic variants.

Large families; sibling pairs (or other family pairs). Works VERY well for “Mendelian” diseases.

Association Studies: detect association between genetic variants and disease across families; they exploit linkage disequilibrium.

Case-control designs; cohort designs; trios of parents with an affected child (TDT). May be more appropriate for complex diseases.
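For the trio design, the transmission disequilibrium test (TDT) compares how often heterozygous parents transmit a candidate allele to an affected child versus how often they do not. The sketch below uses the usual McNemar-type statistic (b - c)^2 / (b + c) with one degree of freedom; the slide only names the test, so the formula, the function name and the counts are supplied here as assumptions for illustration.

```python
from math import erfc, sqrt


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """TDT chi-square statistic (1 df) and its p-value.

    `transmitted` / `untransmitted`: number of heterozygous parents who did /
    did not pass the candidate allele to their affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no heterozygous parents, so the test is undefined")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p = tdt(transmitted=60, untransmitted=40)  # hypothetical counts
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")         # chi2 = 4.00, p = 0.0455
```

Under the law of segregation each allele of a heterozygous parent is transmitted with probability 1/2, so a clear excess of transmissions to affected children is evidence of linkage and association.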

Allelic architecture and mapping strategy

[Figure: magnitude of effect (vertical axis) against frequency in population (horizontal axis). Labeled regions: Family-based linkage studies; Association studies in populations; Unlikely to exist; Fn. Studies.]

Slide thanks to D. Altshuler

Gene Discovery

[Figure: gene discoveries by year, 1980 to 2000, with two series: Human Mendelian Trait and Human Complex Traits. Two vertical scales appear, one with ticks from 200 to 1800 and one with ticks from 5 to 45.]

Source: Glazier AM, Nadeau JH, Aitman TJ. ‘Finding genes that underlie complex traits’. Science, 2002.

World-wide distribution of the IB (ABO) allele

HapMap Project

Scientists thought the mutations that caused common diseases would themselves be common.

They first identified the common mutations in the human population in a $100 million project called the HapMap. Then they compared patients’ genomes with those of healthy people. The comparisons relied on ingenious devices called SNP chips, which scan just a tiny portion of the genome.

These projects, called genome-wide association studies (GWAS), each cost several million dollars.

The results of this costly international exercise have been disappointing. About 2,000 sites on the human genome have been statistically linked with various diseases, but in many cases the sites are not inside working genes, suggesting there may be some conceptual flaw in the statistics.

HapMap Project

View variation patterns

Triangle plot shows LD values using r2 or D’/LOD scores in one or more HapMap populations

Phased haplotype track shows all 120 chromosomes with alleles colored yellow and blue

1000 Genomes Project


Michael Stravato for The New York Times

Dr. James R. Lupski, a medical geneticist with a nerve disease, had his whole genome decoded.


Disease Cause Is Pinpointed With Genome

By NICHOLAS WADE

Published: March 10, 2010

Two research teams have independently decoded the entire genome of patients to find the exact genetic cause of their diseases. The approach may offer a new start in the so far disappointing effort to identify the genetic roots of major killers like heart disease, diabetes and Alzheimer’s.

In the decade since the first full genetic code of a human was sequenced for some $500 million, less than a dozen genomes had been decoded, all of healthy people.

Geneticists said the new research showed it was now possible to sequence the entire genome of a patient at reasonable cost and with sufficient accuracy to be of practical use to medical researchers. One subject’s genome cost just $50,000 to decode.

“We are finally about to turn the corner, and I suspect that in the next few years human genetics will finally begin to systematically deliver clinically meaningful findings,” said


2/21/2011 http://www.nytimes.com/2010/03/11/health/research/11gene.html

Why Statistical Genetics?

Extremely interesting and fun projects.

Rewarding and gratifying to see the importance of learning statistical techniques.

Huge demand for statisticians and lots of money currently invested in developing statistical techniques to facilitate the discovery of personalised medicines.
