Chi-Square Analysis Goodness of Fit "Linkage Studies of the Tomato" (Trans. Royal Canad. Inst. (1931))

We will develop the use of the χ² distribution through an example from biology.

Consider two different characteristics of tomatoes: leaf shape and plant size. The leaf shape may be potato-leafed or cut-leafed, and the plant may be tall or dwarf.

If we cross a tall cut-leaf tomato with a dwarf potato-leaf tomato and examine the progeny, we will discover a uniform F1 generation.

The traits tall and cut-leaf are each dominant, while dwarf and potato-leaf are recessive. We use the letter T for height, and C for leaf shape, so the alleles are T, t, C, and c.

We will examine a Punnett square to illustrate this dihybrid cross.

                                Tall cut-leaf tomato
                      gametes   TC      TC
Dwarf potato-leaf     tc        TtCc    TtCc
tomato                tc        TtCc    TtCc

Notice the uniformity among the offspring: all are TtCc.

Now we cross the F1 among themselves to produce the F2:

gametes TC Tc tC tc

TC TTCC TTCc TtCC TtCc

Tc TTCc TTcc TtCc Ttcc

tC TtCC TtCc ttCC ttCc

tc TtCc Ttcc ttCc ttcc
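
The F2 square above can be rebuilt mechanically by pairing every gamete of one F1 parent with every gamete of the other. The following is a minimal sketch of that idea; the names `gametes`, `offspring`, and `square` are illustrative and do not come from the slides.

```python
# Gametes an F1 plant (TtCc) can produce: one height allele plus one leaf allele.
gametes = ["TC", "Tc", "tC", "tc"]


def offspring(g1, g2):
    """Combine two gametes into a genotype string such as 'TtCc'."""
    # sorted() puts the uppercase (dominant) allele first, e.g. 't' + 'T' -> 'Tt'.
    height = "".join(sorted(g1[0] + g2[0]))
    leaf = "".join(sorted(g1[1] + g2[1]))
    return height + leaf


# Rows are the gametes of one parent, columns the gametes of the other.
square = [[offspring(row, col) for col in gametes] for row in gametes]

print("gametes " + " ".join(f"{g:<4}" for g in gametes))
for g, row in zip(gametes, square):
    print(f"{g:<7} " + " ".join(row))
```

Each printed row should match the corresponding row of the Punnett square.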

Now we identify the tall cut-leaf tomatoes. These carry at least one T and at least one C (TTCC, TTCc, TtCC, or TtCc) and fill 9 of the 16 squares.

Now we identify the tall potato-leaf tomatoes. These carry at least one T but no C (TTcc or Ttcc) and fill 3 of the 16 squares.

Next we identify the dwarf cut-leaf tomatoes. These carry no T but at least one C (ttCC or ttCc) and fill 3 of the 16 squares.

Finally, the last type of tomato is dwarf potato-leaf. It carries only recessive alleles (ttcc) and fills 1 of the 16 squares.

So now we have four phenotypes (different physical forms) of tomatoes originating from the single phenotype of the F1 generation.

They are, along with their genotypes and expected frequencies:

Phenotype            Genotypes                 Expected frequency
Tall cut-leaf        TTCC, TTCc, TtCC, TtCc    9/16
Tall potato-leaf     TTcc, Ttcc                3/16
Dwarf cut-leaf       ttCC, ttCc                3/16
Dwarf potato-leaf    ttcc                      1/16
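
The expected frequencies can be checked by classifying all 16 gamete pairings by phenotype and tallying them. This is only a sketch under the dominance rules stated earlier (T and C dominant); the helper `phenotype` and the variable names are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

gametes = ["TC", "Tc", "tC", "tc"]


def phenotype(g1, g2):
    """One dominant allele (T or C) from either gamete is enough to show the trait."""
    height = "Tall" if "T" in (g1[0], g2[0]) else "Dwarf"
    leaf = "cut-leaf" if "C" in (g1[1], g2[1]) else "potato-leaf"
    return f"{height} {leaf}"


# Tally the phenotype of every cell in the 4 x 4 Punnett square.
counts = Counter(phenotype(a, b) for a, b in product(gametes, repeat=2))
total = sum(counts.values())  # 16 cells

# Exact fractions, so 9/16 stays 9/16 rather than 0.5625.
frequencies = {name: Fraction(n, total) for name, n in counts.items()}

for name in counts:
    print(f"{name:<18} {counts[name]}/{total}")
```

The tallies reproduce the 9 : 3 : 3 : 1 ratio listed in the table.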

If our understanding of genetics is correct and we have constructed the crosses we believe we have, we expect the proportions of the four phenotypes to fit our calculations.

With the χ² distribution, we are able to test whether groups of individuals are present in the proportions we expect.

This is rather like conducting multiple Z-tests for proportions, all at once.

In this example we carry out the dihybrid cross to produce an F1 generation, and, as expected, the F1 are all of the same phenotype, tall and cut-leaf.

Further, the F1 are crossed among themselves to produce the F2 generation. We record the numbers of individuals in each category.

The following table gives the observed numbers of each category.

Phenotype            Observed    Expected frequency
Tall cut-leaf        926         9/16
Tall potato-leaf     288         3/16
Dwarf cut-leaf       293         3/16
Dwarf potato-leaf    104         1/16

To make a χ² test for “goodness of fit” we start, as with all other tests of significance, with a null hypothesis.

Step 1: H0: The F2 generation is composed of four phenotypes in the proportions predicted by our calculations (based on Mendelian genetics).

Ha: The F2 generation is not composed of four phenotypes in the proportions predicted by our calculations.

Another way of saying this is that under the null hypothesis the population fits our expected pattern, and under the alternate hypothesis it does not.

Step 2: Assumptions: Our first assumption is that our data are counts. (We cannot use proportions or means.) With χ², we do not always have a sample of a population; sometimes we examine an entire population, as in this example. When we do work from a sample, we must ensure that it is representative.

In order to check assumptions for this goodness of fit test we must calculate the expected counts for each category. Then we must meet two criteria:

1. All expected counts must be one or more.

2. No more than 20% of the expected counts may be less than 5.

We calculate the expected counts by finding the total number of observations and multiplying that by each expected frequency.

Phenotype            Observed counts    Expected frequency    Expected counts
Tall cut-leaf        926                9/16                  (9/16)(1611) ≈ 906.188
Tall potato-leaf     288                3/16                  (3/16)(1611) ≈ 302.063
Dwarf cut-leaf       293                3/16                  (3/16)(1611) ≈ 302.063
Dwarf potato-leaf    104                1/16                  (1/16)(1611) ≈ 100.688
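
The expected counts and the two criteria from Step 2 can also be checked in a few lines of code. The sketch below uses the observed counts and expected frequencies from the table; the variable names are illustrative, not part of the original slides.

```python
from fractions import Fraction

observed = {
    "Tall cut-leaf": 926,
    "Tall potato-leaf": 288,
    "Dwarf cut-leaf": 293,
    "Dwarf potato-leaf": 104,
}
expected_freq = {
    "Tall cut-leaf": Fraction(9, 16),
    "Tall potato-leaf": Fraction(3, 16),
    "Dwarf cut-leaf": Fraction(3, 16),
    "Dwarf potato-leaf": Fraction(1, 16),
}

n = sum(observed.values())  # total number of F2 plants

# Expected count = total observations x expected frequency.
expected = {name: float(freq * n) for name, freq in expected_freq.items()}

# Criterion 1: every expected count is at least 1.
all_at_least_one = all(e >= 1 for e in expected.values())
# Criterion 2: no more than 20% of expected counts fall below 5.
share_below_five = sum(e < 5 for e in expected.values()) / len(expected)
assumptions_met = all_at_least_one and share_below_five <= 0.20

for name, e in expected.items():
    print(f"{name:<18} {e:.3f}")
print("n =", n, "| assumptions met:", assumptions_met)
```

Keeping the frequencies as exact fractions mirrors the advice to enter them into the calculator without rounding.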

As you can see, all expected counts are greater than 5, so all assumptions are met.

Step 3: The formula for the test statistic is:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where o = observed counts and e = expected counts.

This calculation can be made on a graphing calculator.

Enter the observed counts in L1. Enter the expected frequencies in L2, as exact numbers. (Enter numbers like 1/3 directly as fractions; never round to just .3 or .33.)

In L3 multiply L2 by 1611. This will give the expected counts. The sum of L1 can be found using 1-Var Stats.

Now in L4, enter (L1 − L3)² / L3. This will give you the χ² contribution for each category.

Finally, χ² is the sum of L4.

For this problem, the χ² statistic is 1.4687.

In χ² tests, we always need to know and report the degrees of freedom. The degrees of freedom are the number of categories minus one.

Here we have four categories, so 3 degrees of freedom.
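
For readers without a graphing calculator, the same list arithmetic can be sketched in code. The lists below stand in for L1 through L4; the Python names are assumptions chosen to mirror the calculator steps.

```python
observed = [926, 288, 293, 104]             # L1: observed counts
expected_freq = [9 / 16, 3 / 16, 3 / 16, 1 / 16]  # L2: expected frequencies

n = sum(observed)                           # 1611, the sum of L1
expected = [f * n for f in expected_freq]   # L3: expected counts

# L4: contribution of each category, (o - e)^2 / e.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

chi_square = sum(contributions)             # the test statistic
df = len(observed) - 1                      # categories minus one

print(round(chi_square, 4), df)  # prints: 1.4687 3
```

Printing `contributions` as well shows which category contributes most to the statistic; here it is tall potato-leaf.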

Step 4:

Step 5: P(χ² > 1.4687) = 0.6895

Step 6: Fail to reject H0; a test statistic this large or larger may occur by chance alone almost 70% of the time.

Step 7: We lack strong evidence that the pattern of tomato phenotypes is different from the expected. That is, the observed F2 counts are consistent with the expected proportions.

The area can also be found with χ²cdf(1.4687, 10^99, 3).
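
The same upper-tail area can be approximated without a calculator. The sketch below relies on a closed form that holds only for 3 degrees of freedom, P(χ² > x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2); the function name `chi2_upper_tail_df3` is hypothetical, and this is not how the calculator itself computes the area.

```python
import math


def chi2_upper_tail_df3(x):
    """Upper-tail area of the chi-square distribution with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_upper_tail_df3(1.4687)
print(round(p_value, 4))
```

For other degrees of freedom a general routine is needed, for example `scipy.stats.chi2.sf(x, df)` if SciPy is available.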

THE END