9.3 Part 1

PART 1: TESTING AND THE STANDARD ERROR OF ESTIMATE

9.3 Inferences for Correlation and Regression


In Sections 9.1 and 9.2, we learned how to compute the sample correlation coefficient r and the least-squares line ŷ = a + bx using data from a sample. However, r is only a sample correlation coefficient, and ŷ = a + bx is only a "sample-based" least-squares line. What if we used all possible data pairs?

In theory, if we had the population of all (x, y) pairs, then we could compute the population correlation coefficient ρ (Greek letter rho) and the least-squares line for the entire population, y = α + βx.


Note the following:

Sample Statistic      Population Parameter
r                     ρ
a                     α
b                     β
ŷ = a + bx            y = α + βx

Requirements for Statistical Inference

To make inferences regarding correlation and linear regression, we need to be sure that

a) The set (x, y) of ordered pairs is a random sample from the population of all possible such (x, y) pairs.

b) For each fixed value of x, the y values have a normal distribution. All of the y distributions have the same variance, and, for a given x value, the distribution of y values has a mean that lies on the least-squares line. We also assume that for a fixed y, each x has its own normal distribution. In most cases the results are still accurate if the distributions are simply mound-shaped and symmetric and the y variances are approximately equal.

We assume these conditions are met for all inferences presented in this section.

Testing the Correlation Coefficient

The first topic we want to study is the statistical significance of the sample correlation coefficient r.

To do this, we construct a statistical test of ρ, the population correlation coefficient.

How to Test the Population Correlation Coefficient ρ

Let r be the sample correlation coefficient computed using data pairs (x, y).

1. Use the null hypothesis H0: ρ = 0 (x and y have no linear correlation). Use the context of the application to state the alternate hypothesis (ρ > 0, ρ < 0, or ρ ≠ 0). State the level of significance α.

2. Obtain a sample of n ≥ 3 data pairs (x, y) and compute the sample test statistic

t = r √(n – 2) / √(1 – r²)

with degrees of freedom d.f. = n – 2.

3. Use the TI-83 or TI-84 to calculate the P-value: STAT, TESTS, LinRegTTest

4. Conclude the test. If the P-value is ≤ α, then reject H0. If the P-value is > α, then fail to reject H0.

5. Interpret the results in the context of your application

Example: Testing ρ

Do college graduates have an improved chance at a better income? Is there a trend in the general population to support the "learn more, earn more" statement? We suspect the population correlation is positive, so we test H0: ρ = 0 against H1: ρ > 0 using a 1% level of significance. Consider the following variables: x = percentage of the population 25 or older with at least four years of college and y = percentage growth in per capita income over the past seven years. A random sample of six communities in Ohio gave the information shown in Table 9.10.

Table 9.10

Education and Income Growth Percentages

Solution

The P-value is ≤ α = 0.01; therefore, we reject H0.

At the 1% level of significance, we conclude that the population correlation coefficient between x and y is positive.

Using the calculator:

Caution: Although we have shown that x and y are positively correlated, we have not shown that an increase in education causes an increase in earnings.

You Try It!

A medical research team is studying the effect of a new drug on red blood cells. Let x be a random variable representing milligrams of the drug given to a patient. Let y be a random variable representing red blood cells per cubic milliliter of whole blood. A random sample of volunteer patients gave the following results.

Use a 1% level of significance to test the claim that ρ ≠ 0.

x 9.2 10.1 9.0 12.5 8.8 9.1 9.5

y 5.0 4.8 4.5 5.7 5.1 4.6 4.2

Solution

The P-value is > α = 0.01; therefore, we fail to reject H0.

At the 1% level of significance, the evidence is not strong enough to indicate any correlation between the amount of drug administered and the red blood cell count.
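The test above can also be checked without a calculator. The following is a minimal sketch in Python, assuming a two-tailed alternative H1: ρ ≠ 0 and using only the x and y values listed in this exercise. The function names are illustrative, and the P-value is approximated by numerically integrating the Student's t density with Simpson's rule rather than by calling a statistics library.

```python
import math

# Data from the drug study in this exercise.
x = [9.2, 10.1, 9.0, 12.5, 8.8, 9.1, 9.5]
y = [5.0, 4.8, 4.5, 5.7, 5.1, 4.6, 4.2]

def sample_correlation(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def t_density(u, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=2000):
    """Approximate P(|T| >= |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3          # area under the density between 0 and |t|
    return 1 - 2 * area           # remove the central area, keep both tails

n = len(x)
df = n - 2
r = sample_correlation(x, y)
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # test statistic from step 2
p_value = two_tailed_p_value(t, df)

alpha = 0.01
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"r = {r:.3f}, t = {t:.3f}, d.f. = {df}, P-value = {p_value:.4f}: {decision}")
```

With these data, r is about 0.69 and t is about 2.13 with d.f. = 5, so the P-value is well above 0.01, in agreement with the conclusion above. For a one-tailed alternative, the P-value would be half the two-tailed value when r has the sign stated in H1.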

Using the calculator:

Standard Error of Estimate

Sometimes a scatter diagram clearly indicates the existence of a linear relationship between x and y, but it can happen that the points are widely scattered about the least-squares line. We need a method (besides just looking) for measuring the spread of a set of points about the least-squares line. There are three common methods of measuring the spread: the coefficient of correlation, the coefficient of determination, and the standard error of estimate.


For the standard error of estimate, we use a measure of spread that is in some ways like the standard deviation of measurements of a single variable. Let ŷ = a + bx be the predicted value of y from the least-squares line.

Then y – ŷ is the difference between the y value of the data point (x, y) shown on the scatter diagram (Figure 9.16) and the ŷ value of the point on the least-squares line with the same x value.

Figure 9.16

The Distance Between Points (x, y) and (x, ŷ)


The quantity y – ŷ is known as the residual. To avoid the difficulty of having some positive and some negative values, we square the quantity (y – ŷ).

Then we sum the squares and, for technical reasons, divide this sum by n – 2. Finally, we take the square root to obtain the standard error of estimate, denoted by Se.

Standard Error of Estimate = Se = √( Σ(y – ŷ)² / (n – 2) ), where ŷ = a + bx and n ≥ 3
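As a rough illustration of the computation, here is a minimal Python sketch that follows the verbal recipe directly: fit the least-squares line, square the residuals, sum them, divide by n – 2, and take the square root. The data are reused from the drug study in the "You Try It!" exercise earlier in this section, purely for illustration, and the function names are assumptions made for this sketch.

```python
import math

# Illustration data: the drug study pairs from earlier in this section.
x = [9.2, 10.1, 9.0, 12.5, 8.8, 9.1, 9.5]
y = [5.0, 4.8, 4.5, 5.7, 5.1, 4.6, 4.2]

def least_squares(xs, ys):
    """Return intercept a and slope b of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def standard_error_of_estimate(xs, ys):
    """Se: square root of the sum of squared residuals divided by n - 2."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    a, b = least_squares(xs, ys)
    residuals = [yi - (a + b * xi) for xi, yi in zip(xs, ys)]   # y - y-hat
    return math.sqrt(sum(e * e for e in residuals) / (n - 2))

a, b = least_squares(x, y)
se = standard_error_of_estimate(x, y)
print(f"y-hat = {a:.3f} + {b:.3f}x, Se = {se:.3f}")
```

For these seven pairs, Se comes out to roughly 0.39, in the same units as y. A smaller Se means the points lie closer to the least-squares line.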


Using the TI-83 & TI-84

1. STAT
2. TESTS
3. LinRegTTest

The value for Se is given as s.

Example

June and Jim are partners in the chemistry lab. Their assignment is to determine how much copper sulfate (CuSO4) will dissolve in water at 10, 20, 30, 40, 50, 60, and 70°C.

Their lab results are shown in Table 9.12, where y is the weight in grams of copper sulfate that will dissolve in 100 grams of water at x°C. Sketch a scatter diagram, find the equation of the least-squares line, and compute Se.

Table 9.12

Lab Results (x = °C, y = amount of CuSO4)

Solution

Using the TI-83 & TI-84

Figure 9.17

Scatter Diagram and Least-Squares Line for Chemistry Experiment
