Name: ____________________________________________

Unit 3 – Regression and Correlation
Lesson 2: Least Squares Regression & Correlation
PRACTICE PROBLEMS

I can describe the properties of a regression line and measure the strength of a linear relationship.

Investigation                                       Practice Problem Options   Max Possible Points   Total Points Earned
Investigation 1: How Good is the Fit?               #1, 2, 10, 11              14 points
Investigation 2: Behavior of the Regression Line    #3, 4                      10 points
Investigation 3: How Strong is the Association?     #5, 6                      8 points
Investigation 4: Association and Causation          #7, 8, 9                   8 points
                                                                                                     ________/40 points

** In order to earn credit for practice problems, ALL WORK must be shown. **


On Your Own

Applications

1 The following table and plot over time (years in this case) give the hippopotamus population on the Luangwa River in Zambia for various years between 1970 and 1984.

Year    Number of Hippos
1970    2,815
1972    2,919
1975    2,342
1976    4,501
1977    5,147
1978    4,765
1979    5,151
1981    4,884
1982    6,293
1983    6,544

[Plot over time: Number of Hippos (2,000 to 7,000) versus Year (1970 to 1984)]

Source: Lawrence C. Hamilton, Regression with Graphics, page 179.

a. Find the equation of the regression line and graph it on a copy of the scatterplot.

b. What is the slope of the regression line? Interpret this slope in the context of these data.

c. For which year is the residual largest in absolute value? Estimate this residual using the scatterplot. Then find the value of this residual using the regression equation. Finally, interpret this residual.

d. Use the regression equation to predict the hippopotamus population for the current year. How much faith do you have in this prediction?
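A calculator or spreadsheet will normally produce the regression line for Problem 1. As a rough check, the sketch below computes it in Python from the usual least squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function name `least_squares` and the variable names are illustrative choices, not part of the worksheet.

```python
# Hippo counts from the table in Problem 1.
years = [1970, 1972, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983]
hippos = [2815, 2919, 2342, 4501, 5147, 4765, 5151, 4884, 6293, 6544]


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = least_squares(years, hippos)

# residual = observed value - predicted value
residuals = [y - (slope * x + intercept) for x, y in zip(years, hippos)]
worst_year, worst_residual = max(zip(years, residuals), key=lambda p: abs(p[1]))

print(f"slope = {slope:.1f} hippos per year, intercept = {intercept:.1f}")
print(f"largest |residual|: {worst_residual:.1f} in {worst_year}")
```

The printed slope is the estimated change in the hippo count per year, and the year with the largest absolute residual is the point that lies farthest above or below the line.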

Points: /2, /2, /4, /2


2 The age of a tree can often be determined by counting its rings. However, in tropical forests, annual tree rings do not always exist. Researchers measured the diameter of 20 large trees from a central Amazon rain forest and found their ages using carbon-14 dating. The results appear in the following table and scatterplot. The regression equation for predicting the age of a tree from its diameter is y = 4.39x - 19.

Diameter (in cm)    Age (in years)
180                 1,372
120                 1,167
100                 895
225                 842
140                 722
142                 657
139                 582
150                 562
110                 562
150                 552
115                 512
140                 512
180                 455
112                 352
100                 352
118                 249
82                  249
130                 227
97                  227
110                 172

[Scatterplot: Age (in years), 0 to 1,600, versus Diameter (in cm), 80 to 240]

Source: Statistics for the Life Sciences, 3rd Ed., Myra L. Samuels and Jeffrey A. Witmer, pages 575–576, 2003. Their source: Jeffrey Q. Chambers, Niro Higuchi & Joshua P. Schimel. Ancient trees in Amazonia, Nature, 391 (1998) 135–136.

a. Interpret the slope of the regression line in the context of these data.

b. Use the regression equation to predict the age of a tree that is 125 cm in diameter.

c. For which tree diameter is the residual largest? Estimate the value of this residual from the scatterplot. Then find the value of this residual using the regression equation. Finally, interpret this residual.

d. Does it appear that the age of a tree can reasonably be predicted from measuring its diameter?
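The residual work in Problem 2 can be checked with a short Python sketch that uses the regression equation given in the problem, y = 4.39x - 19. The helper name `predict_age` is a hypothetical label introduced here for illustration.

```python
# (diameter in cm, age in years) pairs from the table in Problem 2.
trees = [
    (180, 1372), (120, 1167), (100, 895), (225, 842), (140, 722),
    (142, 657), (139, 582), (150, 562), (110, 562), (150, 552),
    (115, 512), (140, 512), (180, 455), (112, 352), (100, 352),
    (118, 249), (82, 249), (130, 227), (97, 227), (110, 172),
]


def predict_age(diameter_cm):
    """Predicted age from the regression equation stated in the problem."""
    return 4.39 * diameter_cm - 19


# residual = observed age - predicted age, kept next to its data point
residuals = [(d, age, age - predict_age(d)) for d, age in trees]
worst = max(residuals, key=lambda row: abs(row[2]))

print(f"predicted age at 125 cm: {predict_age(125):.2f} years")
print(f"largest |residual|: diameter {worst[0]} cm, residual {worst[2]:.1f} years")
```

A positive residual means the tree is older than the line predicts for its diameter; a negative residual means it is younger.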

Points: /2, /1, /4, /1


3 The following table and scatterplot show the average gestation periods (length of pregnancy) and average life spans of various mammals. The regression equation for predicting average longevity from gestation is y = 0.0425x + 6.2.

Gestation and Life Span of Some Mammals

Mammal              Gestation (in days)    Average Longevity (in years)
Baboon              187                    20
Black Bear          219                    18
Beaver              105                    5
Bison               285                    15
Cat                 63                     12
Chimpanzee          230                    20
Cow                 284                    15
Dog                 61                     12
African Elephant    660                    35
Fox (red)           52                     7
Goat                151                    8
Gorilla             258                    20
Horse               330                    20
Leopard             98                     12
Lion                100                    15
Moose               240                    12
Rabbit              31                     5
Sheep               154                    12
Squirrel            44                     10
Wolf                63                     5

Source: World Almanac and Book of Facts 2001. Mahwah, NJ: World Almanac, 2001.

[Scatterplot: Average Longevity (in years), 0 to 40, versus Gestation (in days), 0 to 700]

a. Does a line appear to be an appropriate model of this situation?

b. What is the slope of the regression line? What does the slope indicate in the context of these data?

c. Use the regression line to predict the average life span of elk that have a gestation time of 250 days. How much faith would you have in the prediction?

d. Domestic pigs have a 112-day gestation period and live for an average of 10 years. Find and interpret the error of prediction for the domestic pig.

e. Verify that the regression line contains the centroid (x̄, ȳ).

f. Identify a potential influential point in these data and determine how influential it is with respect to the regression equation.
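For parts e and f, one possible check is sketched below in Python. It refits the line from the table, confirms that the fitted line passes through the centroid, and then refits with one point left out to see how much the equation moves. Treating the African Elephant as the candidate influential point is an assumption made here for illustration (it has by far the longest gestation); `fit_line` is a hypothetical helper name.

```python
# (gestation in days, average longevity in years) from the table in Problem 3.
mammals = {
    "Baboon": (187, 20), "Black Bear": (219, 18), "Beaver": (105, 5),
    "Bison": (285, 15), "Cat": (63, 12), "Chimpanzee": (230, 20),
    "Cow": (284, 15), "Dog": (61, 12), "African Elephant": (660, 35),
    "Fox (red)": (52, 7), "Goat": (151, 8), "Gorilla": (258, 20),
    "Horse": (330, 20), "Leopard": (98, 12), "Lion": (100, 15),
    "Moose": (240, 12), "Rabbit": (31, 5), "Sheep": (154, 12),
    "Squirrel": (44, 10), "Wolf": (63, 5),
}


def fit_line(points):
    """Return (slope, intercept, x_bar, y_bar) for the least squares line."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, x_bar, y_bar


slope, intercept, x_bar, y_bar = fit_line(list(mammals.values()))

# The least squares line always passes through the centroid, so this gap is 0
# up to floating-point rounding.
centroid_gap = y_bar - (slope * x_bar + intercept)

# Refit without the assumed influential point and compare the two equations.
without = [p for name, p in mammals.items() if name != "African Elephant"]
slope_wo, intercept_wo, _, _ = fit_line(without)

print(f"all mammals:      y = {slope:.4f}x + {intercept:.2f}")
print(f"centroid:         ({x_bar:.2f}, {y_bar:.2f}), gap = {centroid_gap:.2e}")
print(f"without elephant: y = {slope_wo:.4f}x + {intercept_wo:.2f}")
```

Comparing the two printed equations gives a sense of how influential the removed point is: the closer the equations, the less the point controls the line.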

Points: /1, /2, /2, /2, /1, /2


4 The following table and plot over time give the federal minimum wage in dollars in the United States for the years when Congress passed an increase in the minimum wage. The regression equation for predicting the minimum wage given the year is y = 0.1027x - 200.26.

Year    Federal Minimum Wage (in dollars)
1955    0.75
1956    1.00
1961    1.15
1963    1.25
1967    1.40
1968    1.60
1974    2.00
1975    2.10
1978    2.65
1979    2.90
1980    3.10
1981    3.35
1990    3.80
1991    4.25
1996    4.75
1997    5.15
2007    5.85

[Plot over time: Federal Minimum Wage (in dollars), 0 to 6, versus Year (1950 to 2010)]

a. Is a line a reasonable model for these data?

b. Verify that the regression line contains the centroid (x̄, ȳ) ≈ (1977.53, 2.77).

c. What is the slope of the regression line? What does it mean in the context of these data?

d. Check that the sum of the residuals is 0. Then find the sum of the squared residuals.

e. Use the regression line to predict the minimum wage for the current year. What was your error in prediction?
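Part d can be checked with the Python sketch below. It fits the line from the table at full precision, since the coefficients printed in the problem (0.1027 and -200.26) are rounded and the rounding alone can leave a residual sum that is not exactly 0. The helper name `fit_line` and the variable names are illustrative assumptions.

```python
# (year, federal minimum wage in dollars) from the table in Problem 4.
wages = [
    (1955, 0.75), (1956, 1.00), (1961, 1.15), (1963, 1.25), (1967, 1.40),
    (1968, 1.60), (1974, 2.00), (1975, 2.10), (1978, 2.65), (1979, 2.90),
    (1980, 3.10), (1981, 3.35), (1990, 3.80), (1991, 4.25), (1996, 4.75),
    (1997, 5.15), (2007, 5.85),
]


def fit_line(points):
    """Return (slope, intercept) of the least squares line."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(wages)

# Residuals from the full-precision fit.
residuals = [w - (slope * yr + intercept) for yr, w in wages]
sum_res = sum(residuals)
sse = sum(r ** 2 for r in residuals)

# Residuals from the rounded equation printed in the problem.
rounded_residuals = [w - (0.1027 * yr - 200.26) for yr, w in wages]
sum_res_rounded = sum(rounded_residuals)
sse_rounded = sum(r ** 2 for r in rounded_residuals)

print(f"fitted line: y = {slope:.4f}x + {intercept:.2f}")
print(f"sum of residuals (fitted):  {sum_res:.6f}")
print(f"sum of residuals (rounded): {sum_res_rounded:.4f}")
print(f"sum of squared residuals (fitted): {sse:.4f}")
```

If a hand calculation with the rounded equation gives a residual sum somewhat away from 0, that is a rounding effect rather than an arithmetic mistake.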

Points: /1, /1, /2, /2, /2


5 A table and scatterplot showing the amount of fiber and the number of calories in one cup of various kinds of cereal are shown below.

Cereal Nutrition Information

Cereal                    Calories    Fiber (in gm)
Alpha-Bits                133.5       1.5
Apple Jacks               115.5       0.5
Cap’n Crunch              143         1
Cheerios                  109.5       2.5
Cocoa Puffs               119         0
Corn Chex                 113.5       0.5
Corn Flakes               102         0.5
Froot Loops               117.5       0.5
Frosted Mini-Wheats       186.5       6
Golden Grahams            154         1
Grape Nuts                389         11
Grape Nuts Flakes         144.5       4
Honey Graham Oh’s         149         1
Honey Nut Cheerios        114.5       1.5
Kix                       85.5        0.5
Lucky Charms              116         1
Product 19                110         1.5
Puffed Rice               53.5        0
Raisin Bran (Kelloggs)    196.5       8
Rice Krispies             99.5        0.5
Special K                 114.5       6.5
Total                     140.5       3.5
Trix                      122.5       0.5
Wheaties                  110         2

Source: www.cereal.com/nutrition/compare-cereals.html

[Scatterplot: Calories (0 to 400) versus Fiber (in g), 0 to 12]

a. Describe the relationship between the grams of fiber and the calories in a serving of cereal.

b. Which of the following do you estimate is closest to the correlation?

r = -0.8 r = -0.3 r = 0.5 r = 0.8

c. Which cereal is a potential influential point? What will happen to the slope of the regression line if it is removed from the data set?

Points: /2, /1, /2


6 The average length and weight of five different kinds of seals are given below.

Seal Sizes

Seals           Length (in ft)    Weight (in lbs)
Ribbon Seal     4.8               176
Bearded Seal    7.0               660
Hooded Seal     8.0               880
Common Seal     5.2               220
Baikal Seal     4.2               187

Source: Grzimek’s Encyclopedia, Mammals V4. New York: McGraw-Hill, 1990.

[Scatterplot: Weight (in pounds), 100 to 900, versus Length (in feet), 4.0 to 8.0]

a. Estimate the correlation between the average length and weight of the seals.

b. Calculate the correlation. How close is r to your estimate?

c. If you include the Northern Elephant seal at 14.4 feet long and 5,500 pounds, how do you think the correlation will be affected? Check your conjecture.

d. Do you think a line is a good model of the data? Why or why not?

e. Suppose in the table above, you converted each length to meters and each weight to kilograms. (A foot is 0.3048 meters, and a pound is about 0.454 kg.) What would be the correlation? Explain.
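Parts b, c, and e can be explored with the Python sketch below, which computes Pearson's r directly from its definition, r = Sxy / sqrt(Sxx · Syy). The function name `correlation` is an illustrative choice; the conversion factors are the ones given in part e.

```python
import math

# (length in ft, weight in lbs) from the Seal Sizes table in Problem 6.
seals = {
    "Ribbon Seal": (4.8, 176),
    "Bearded Seal": (7.0, 660),
    "Hooded Seal": (8.0, 880),
    "Common Seal": (5.2, 220),
    "Baikal Seal": (4.2, 187),
}


def correlation(xs, ys):
    """Pearson correlation r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


lengths_ft = [length for length, _ in seals.values()]
weights_lb = [weight for _, weight in seals.values()]
r_imperial = correlation(lengths_ft, weights_lb)

# Part e: rescale both variables by positive constants and recompute r.
lengths_m = [0.3048 * ft for ft in lengths_ft]
weights_kg = [0.454 * lb for lb in weights_lb]
r_metric = correlation(lengths_m, weights_kg)

# Part c: add the Northern Elephant seal (14.4 ft, 5,500 lbs) and recompute r.
r_with_elephant = correlation(lengths_ft + [14.4], weights_lb + [5500])

print(f"r (feet, pounds):       {r_imperial:.4f}")
print(f"r (meters, kilograms):  {r_metric:.4f}")
print(f"r with elephant seal:   {r_with_elephant:.4f}")
```

The first two printed values agree because multiplying a variable by a positive constant scales Sxy and the square root of Sxx · Syy by the same factor, which cancels in r.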

7 Consider the following two situations involving possible lurking variables.

a. Examine the following plots of mean earnings and years of schooling for men and women who are year-round, full-time workers, 25 years and older.

[Two plots: Male Mean Income (in thousands of dollars) versus Years of Schooling, and Female Mean Income (in thousands of dollars) versus Years of Schooling; income axis 0 to 120, schooling axis 6 to 22]

Source: U.S. Census Bureau, Current Population Survey, Annual Social and Economic Supplement. www.census.gov/hhes/www/income/histinc/p22.html

Points: /2, /3, /2, /2, /2, /2


i. As you can see, for people in the United States, there is a high correlation between number of years of schooling S and yearly income I. One theory is that the correlation is high because jobs that pay well tend to require many years of schooling. Model this theory by a directed graph.

ii. Some people have suggested that there is a lurking variable P, which is the economic status of the person’s parents. That is, a person whose parents have more money tends to have the opportunities to earn more money. He or she also tends to be able to stay in school longer. Model this theory by a directed graph.

b. Examine the following report of research in which some possible lurking variables have been controlled.

Schooling Pays Off on Payday

Workers earn more from their investment in education than had been thought, a new study says. Students can increase their future income by an average of 16% for each year they stay in school, the study reports. Researchers Alan Krueger and Orley Ashenfelter, both of Princeton, based their estimate on interviews with 250 sets of twins. They correlated differences in wages and years of schooling within sets of twins.

Source: Todd Wallack, USA Today, September 1993.

i. How have the researchers controlled for some lurking variables? Which lurking variables have been controlled by this method?

ii. What lurking variable(s) has not been controlled?

8 The following article appears to claim that unemployment allows people to live longer.

Study Links Job Loss, Longer Life

As the economy enters another year of expansion and low unemployment, new research suggests that loss of a job may actually contribute to a healthier, longer life for at least some Americans. Christopher Ruhm, a professor of economics at the University of North Carolina at Greensboro, has concluded in a study that higher unemployment may lead to lower overall mortality rates and reduce fatalities from several major causes of death. The new study, which looks at state-level data compiled between 1972 and 1992, suggests that a 1 percentage point rise in the unemployment rate lowers the total death rate by 0.5 percent.

Source: San Diego Union-Tribune, January 27, 1997. Reprinted by permission of Reuters.

a. What variable is said to be the explanatory variable? The response variable?

Points: /1, /1, /2, /2, /2


b. Suppose you were to graph these data on a scatterplot.

i. What would each point represent?

ii. What variable would go on the x-axis? On the y-axis?

iii. What would be the slope of the regression line?

c. Name a lurking variable that might explain the relationship between higher unemployment rates and lower death rates.

9 In each of the following news clips, a study is reported that revealed an association between two variables. Comment on the validity of the conclusion and whether or not you think there is a cause-and-effect relationship between the two variables.

a. USA Today (June 14, 2001) reported a study by researcher Lilia Cortina of the University of Michigan-Ann Arbor that rudeness in the workplace is damaging mental health and lowering productivity. “As encounters with uncivil behavior rose, so did symptoms of anxiety and depression. … Incidents of rude behavior were tied to less job satisfaction for the employee and lower productivity.”

b. Study Links Parental Bond to Teenage Well-Being

by Judy Foreman

A study published in the Journal of the American Medical Association finds that strong emotional connection to a parent is the factor most strongly associated with teenagers’ “well-being”, as measured by health, school performance, and avoidance of risky behavior. The correlations were found to hold regardless of family income, education, race, and the specific amount of time a parent spends with a child or family structure.

From an initial 1995 survey of 90,000 students in grades 7 through 12, the study focused on 12,000 teenagers, who were interviewed individually at home in 1995 and again in 1996. The study was praised for its breadth and depth, and the data are expected to be a continuing source of material for investigation.

Among the findings already reported here are the following. High parental expectations for school performance were associated with lower incidence of risky behavior. Feeling that at least one adult at school treats them fairly was associated with lower risk in every health category studied except for pregnancy. Students with easy access to guns, alcohol, tobacco at home were more likely to use them or to engage in violence.

Source: The Boston Globe, 10 September 1997, A1.

Points: /1, /2, /1, /1, /2, /2


c. Why Your Credit History Affects Your Insurance Rates

by Carrie Teegardin and Ann Hardie

By shuffling a customer’s debt and bill-paying records through a complicated computer program, insurers believed they could predict with amazing accuracy which customers were most likely to get into an auto accident and file a claim.

The computer program boiled each customer’s history down to a new version of a credit score and called it an “insurance score.” Customers with bad scores were bigger risks than customers with good scores, insurers said, so it was only fair that their policies cost more.

Like most people, Golick couldn’t then—and can’t now—explain the connection. Why would information about credit card bills and mortgage payments predict someone’s driving habits?

“I work in this business. It is not obvious to me,” said Golick, who in addition to his legislative job is an attorney for Allstate Insurance Co. “I do know that the data is conclusive that there is absolutely a correlation.”

The mysterious correlation was so strong that it prompted Golick, who handles regulatory matters for Allstate’s Southeast region, to take action. In 2003, he sponsored legislation that allows insurers to use credit information when pricing auto and homeowners insurance—but keeps the formulas they use secret from consumers.

Source: Atlanta Journal-Constitution, December 12, 2006, www.ajc.com/business/content/business/stories/2006/12/09/1210bizcreditmain.html

Connections

10 Make a scatterplot of the points (1, 1), (2, 2), and (3, 5). Plot the regression line y = 2x - 4/3. Draw line segments on your graph to show the residuals for each point. Illustrate the geometry of the term squared residuals by drawing on the graph an appropriate square for each residual.
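To know how long each residual segment should be before drawing it, the residuals for Problem 10 can be computed exactly. The Python sketch below uses the standard-library `fractions` module so that thirds are not rounded; the variable names are illustrative. Each squared residual is the area of the square whose side is that residual segment.

```python
from fractions import Fraction

# Points and regression line y = 2x - 4/3 from Problem 10.
points = [(1, 1), (2, 2), (3, 5)]
slope, intercept = Fraction(2), Fraction(-4, 3)

# residual = observed y - predicted y (the signed length of each segment)
residuals = [y - (slope * x + intercept) for x, y in points]

# Each squared residual is the area of the square drawn on that segment.
square_areas = [r ** 2 for r in residuals]

for (x, y), r, area in zip(points, residuals, square_areas):
    print(f"point ({x}, {y}): residual {r}, square area {area}")
print(f"total area of the squares: {sum(square_areas)}")
```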

11 Consider the set of points (1, 3), (2, 2), (3, 5), and (6, 5).

a. Using the equation y = x + 0.75, find the sum of the residuals and the sum of the squared residuals.

b. Using the equation y = 0.5x + 2.25, find the sum of the residuals and the sum of the squared residuals.

c. One of the two equations is the regression equation. Tell which one it is and how you know.
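The comparison in Problem 11 can be checked with the Python sketch below. It computes the sum of the residuals and the sum of the squared residuals for each candidate line; `residual_sums` is a hypothetical helper name. Any line through the centroid has residuals that sum to 0, so the deciding quantity is the sum of squared residuals, which the least squares regression line makes as small as possible.

```python
# Points from Problem 11.
points = [(1, 3), (2, 2), (3, 5), (6, 5)]


def residual_sums(slope, intercept, pts):
    """Return (sum of residuals, sum of squared residuals) for y = slope*x + intercept."""
    residuals = [y - (slope * x + intercept) for x, y in pts]
    return sum(residuals), sum(r * r for r in residuals)


sum_a, sse_a = residual_sums(1.0, 0.75, points)   # part a: y = x + 0.75
sum_b, sse_b = residual_sums(0.5, 2.25, points)   # part b: y = 0.5x + 2.25

print(f"y = x + 0.75:     sum of residuals = {sum_a}, sum of squares = {sse_a}")
print(f"y = 0.5x + 2.25:  sum of residuals = {sum_b}, sum of squares = {sse_b}")
```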

Points: /2, /3, /2, /2, /2