
Math 1040 Study Guide/Lecture Notes (Ch. 4.1–4.4)

Chapter 4 discusses _____________________________________________ (data collected when two variables are measured on each individual).

Section 4.1—Scatter Diagrams and Correlation

Definitions:

______________________________: the variable of interest in the study, which may be explained by the value of another variable; the variable we would like to be able to predict; notation: Y

______________________________: a variable that may explain the value of the response variable, also called the “predictor variable”; it is usually easier to measure or happens before the response variable; notation: X

______________________________: a graph that shows the relationship between two QUANTITATIVE variables measured on the same individual, where points are plotted using explanatory variable values on the horizontal axis (X) and response variable values on the vertical axis (Y)

Types of Relationships

______________________________: a type of relationship between two quantitative variables that follows a straight-line pattern

______________________________: a type of relationship between two quantitative variables that follows a curved pattern

______________________________: a type of relationship between two quantitative variables that shows a horizontal cloud pattern (no matter the value of X, Y is about the same)

______________________________: as one variable increases, the other also increases


Example: X = number of hours worked; Y = amount of pay received. Work MORE hours, receive MORE pay.

______________________________: as one variable increases, the other decreases. Example: X = vehicle weight; Y = gas mileage. Vehicle weighs MORE, it gets LESS mpg.

______________________________: a number that measures the strength and direction of a linear relationship between two QUANTITATIVE variables.

Properties of the Correlation Coefficient (r)

1. ________ ≤ r ≤ ________
2. r = ________ means a perfect positive linear relationship exists
3. r = ________ means a perfect negative linear relationship exists
4. the closer r is to +1, the stronger the _____________________ association
5. the closer r is to –1, the stronger the _____________________ association
6. r close to ________ means there is little or no evidence of a linear relationship
7. r is a ______________________ measure of association (r doesn’t change even if all values of either variable are converted to a different scale)
8. r is ____________ resistant (highly affected by outliers; always check the scatter diagram)

Note: There is a formula, but do NOT actually use it—use the automatic function in your calculator! Skip over the pages in the textbook that show how to compute by hand.

Example: Match the linear correlation coefficient to the scatter diagram.
[Scatter diagrams a) to d)]
Choices: r = 0.787, r = 0.523, r = 0.053, r = 0.946

Example: Match the linear correlation coefficient to the scatter diagram.
[Scatter diagrams a) to d)]
Choices: r = 0.969, r = 0.049, r = 1, r = 0.992

Types of Linear Relationships

r measures the strength of a LINEAR relationship


Example: For each of the following statements, determine if there will generally be a positive correlation, negative correlation, or no correlation. a) Interest rates on car loans and number of cars sold

b) Temperature outside and ice-cream sales

c) Number of hours per week on the treadmill and cholesterol level

d) Price of a Big Mac and number of McDonald’s French fries sold in a week

e) Shoe size and IQ

f) Movie ticket price and number of movie goers

g) Years of education and annual salary

Definition:

__________________________________: a variable that is related to both the explanatory and the response variables; because of these variables, we say:

A significant linear relation does NOT imply that one variable causes the other!!

Testing Whether a Linear Relation Exists

Step 1: Compute r and determine its absolute value.

Step 2: Find the critical value in Table II using the sample size n = number of points (x, y).

Step 3: If |r|, the absolute value of the correlation coefficient, is __________________ than the critical value, we say a linear relationship __________________ between the two variables.


Otherwise, if |r| < critical value, _________ linear relation exists.

Example: Create the scatter diagram and compute r:

x: 2  3  2  4  1
y: 1  4  3  5  1

r = ____________

Is there a significant linear relation?   n = ____________   CV = ____________   |r| ____ CV

r is ___________________, and |r| = ____________ is __________________ than the critical value ____________, so ________________________ linear relation exists.

Word bank: positive/negative; greater/not greater; a positive/a negative/no

Example: The Gallup Organization regularly surveys adult Americans regarding their commute time to work. They also administered a Well-Being Survey.

a) Which variable do you believe is the explanatory variable and which variable is the response variable?

b) Draw a scatter diagram on the TI calculator and compute the correlation coefficient.

r = ____________

c) Is there a significant linear relation?   n = ____________   CV = ____________   |r| ____ CV

Note: for n > 30, use CV = 0.361


r is _________________, and |r| = __________ is _________________ than the critical value ________, so ___________________ linear relation exists.

Example: A pediatrician wants to determine the relation that may exist between a child’s height and head circumference. She randomly selects eleven 3-year-old children and measures their heights and head circumferences.

a) If the pediatrician wants to use height to predict head circumference, determine which variable is the explanatory variable and which is the response variable.

b) Draw the scatter diagram on the TI calculator and compute the correlation coefficient between the height and head circumference of a child.

c) Is there a significant linear relation?   n = ____________   CV = ____________   |r| ____ CV

r is ________________, and |r| = __________ is ________________ than the critical value ________, so _______________________ linear relation exists.

Word bank (for the two examples above): positive/negative; greater/not greater; a positive/a negative/no

Section 4.2—Least Squares Regression

Consider the data below, which represent the club-head speed and the distance a golf ball travels for 8 swings of the club. We seek to determine whether there is a correlation between club-head speed and the distance the ball travels.

x = club-head speed (mph):     100   102   103   101   105   100    99   105
y = distance traveled (yds):   257   264   274   266   277   263   258   275


[Figure: scatter diagram axes, “Distance vs. Club Head Speed”; horizontal axis: Club Head Speed (mph), 98 to 106; vertical axis: Distance (yds), 250 to 285]

a. Make a scatter plot of the data. Does there appear to be a linear relation?

b. Determine the least-squares regression line and the correlation coefficient.

c. Is there a significant linear correlation? Why? If so, is it positive or negative?

d. What does the slope represent? Write your interpretation in a complete sentence.

e. Use your line of best fit to predict the distance a ball travels if the club-head speed is 104 miles per hour.


[Figure: scatter diagram axes, “Distance vs. Club Head Speed”; horizontal axis: Club Head Speed (mph), 98 to 106; vertical axis: Distance (yds), 245 to 280]

Definitions:

______________________________: observed y – predicted y; the vertical difference between the observed and predicted values of the response variable y

______________________________: the line through the data points in a scatter diagram that minimizes the sum of the squared residuals; it is used to predict the value of y by plugging a particular value of x into the equation:

$\hat{y} = b_1 x + b_0$   (Note: the line always passes through the point $(\bar{x}, \bar{y})$.)

NOTE: We can only interpret the ________________ value of __________ when ____________ if (1) x = 0 is a _______________________ value for the explanatory variable, and (2) observations near ____________________ exist in the data set.

CAUTIONS in Regression: when we should NOT use the regression equation to make predictions

DO NOT predict outside the scope of the model, meaning we should not use the regression model to make predictions for values of the explanatory variable that are much ___________________ or much ___________________ than those observed.

DO NOT use the regression model to predict when the correlation coefficient indicates no linear relation between the explanatory and response variables and the scatter diagram indicates no relation between the variables. In that case, use the mean of the response variable as the predicted value, so that $\hat{y}$ = _________.


Example: An engineer wants to determine how the weight of a car, x, affects gas mileage, y. The following data represent the weights of various domestic cars and their miles per gallon in the city for the 2015 model year.

a. Determine the least-squares regression line and the correlation coefficient.

b. Is there a significant linear correlation? Why? If so, is it positive or negative?

c. Interpret the slope and y-intercept, if appropriate. Write your interpretation in a complete sentence.

d. What would you predict the mean gas mileage to be for a 3000-pound car?

e. A particular 3000-pound car gets 34 miles per gallon. What is the residual? Is this car’s mileage above or below the average for all 3000-pound cars?

f. What would you predict for a 3000-pound car if actually r = –0.489?

Example: Recall the Gallup Organization’s survey of adult Americans regarding their commute times to work and level of Well-Being from Section 4.1.


a. Which variable is the explanatory variable? The response variable?

b. Find the least-squares regression line and the correlation coefficient

c. Is there a significant linear relation? Why?

d. Interpret the slope and y-intercept, if appropriate.

e. Predict the Well-Being index of a person whose commute time is 30 minutes.

f. Predict the Well-Being index of a person whose commute time is 180 minutes.

g. Suppose Barbara has a 20-minute commute time and scores 67.3 on the survey. Is Barbara more or less “well-off” than the typical individual who has a 20-minute commute?

Example: A pediatrician wants to determine the relationship between a child’s height and head circumference. She randomly selects 11 children and records each child’s height, x, and head circumference, y.

Height (inches)    Head Circumference (inches)
27.75              16.8
24.5               17.1
25.5               17.1
26                 17.3
25                 16.9
27.75              17.1
26.6               17.3
27                 17.5
26.75              17.3
26.75              17.5
27.5               17.5

a. Find the least-squares regression equation and correlation coefficient.

b. Is there a significant linear relation? Why?

c. What is the predicted head circumference for a child who is 26 inches tall? For a child who is 25 inches tall?

Section 4.3—The Coefficient of Determination

Definitions: ___________________________________________: _________ measures the proportion of variation in the response variable that is explained by the regression line (percent of variation in y explained by x).

Comments about the Coefficient of Determination:

$R^2$ is a number between _____ and _____ (it is never negative, since it is a square).

The closer the observed Y’s are to the regression line, the _________________ $R^2$ will be (the stronger the correlation, that is, the closer r is to +1 or –1, the higher $R^2$ will be).

To find $R^2$, square the ________________________________________; that is, $R^2$ = ______.

$R^2$ will be a decimal value between 0 and 1, but can be written as a percent by moving the decimal point two places to the right.


Example: Match each coefficient of determination to the appropriate scatter diagram.
[Scatter diagrams a) to d)]
Choices: $R^2$ = 0.58, $R^2$ = 0.90, $R^2$ = 1, $R^2$ = 0.12

Example: Car weight and MPG continued.

a. What is the coefficient of determination for vehicle weight and mpg?

r = _____________      $R^2$ = ______________

b. Interpret the coefficient of determination for vehicle weight and mpg

___________% of the variation in __________________ is explained by the least-squares regression line.

Example: Gallup Organization survey on Commute time to work and level of Well-Being continued.

a. What is the coefficient of determination for Commute time and level of Well-Being?

r = _____________      $R^2$ = ______________

b. Interpret the coefficient of determination for Commute time and level of Well-Being

___________% of the variation in __________________ is explained by the least-squares regression line.

Example: Pediatrician survey on a child’s height and head circumference continued.

a. What is the coefficient of determination for child’s height and circumference of their head?

r = _____________      $R^2$ = ______________

b. Interpret the coefficient of determination for child’s height and circumference of their head.

___________% of the variation in __________________ is explained by the least-squares regression line.

Section 4.4—Contingency Tables and Association

In Sections 4.1 to 4.3 we have been looking at techniques for summarizing relations between quantitative variables. We now look at techniques for summarizing relations between qualitative variables.

Definitions: __________________________________: a “two-way table” that relates two categorical variables of data (one categorical variable becomes the “row variable” and the other becomes the “column variable”)

__________________________________: a frequency or relative frequency distribution of either the row or the column variable in the contingency table

__________________________________: lists the relative frequency of each category of the response variable, GIVEN a specific value of the explanatory variable in the contingency table

Example: A recent survey asked questions about one’s level of happiness and their health. We want to investigate whether the two variables are associated. For example, are individuals who are more healthy also more happy?

                 Poor health   Good health
Not too happy         43            30
Pretty happy          61           189
Very happy            22           122

Construct frequency marginal distributions. This will remove the effect of the other variable. To construct the frequency marginal distribution find the sum of each row or column.

- The marginal distribution for Health removes the effect of level of happiness.
- The marginal distribution for Happiness removes the effect of level of health.

Construct relative frequency marginal distributions: The relative frequency marginal distribution is found by dividing the row total or the column totals by the table total.

How many people were surveyed?

How many very happy people were surveyed?

What proportion of people are in poor health?
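The marginal-distribution steps translate directly into code. A minimal Python sketch (standard library only; the variable names are illustrative) that answers the three questions above:

```python
# Happiness (rows) by health (columns: poor health, good health)
table = {
    "Not too happy": (43, 30),
    "Pretty happy": (61, 189),
    "Very happy": (22, 122),
}
health_levels = ("Poor health", "Good health")

# Frequency marginal distributions: row totals and column totals
row_totals = {happy: sum(counts) for happy, counts in table.items()}
col_totals = [sum(counts[j] for counts in table.values()) for j in range(len(health_levels))]
grand_total = sum(row_totals.values())

# Relative frequency marginal distributions: divide each total by the table total
happiness_marginal = {happy: total / grand_total for happy, total in row_totals.items()}
health_marginal = {level: total / grand_total for level, total in zip(health_levels, col_totals)}

print("people surveyed:", grand_total)                                   # 467
print("very happy:", row_totals["Very happy"])                           # 144
print("poor health:", round(health_marginal["Poor health"], 4))          # 0.2698
```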


Construct conditional distributions: conditional distributions are used to identify associations between categorical variables. To determine a conditional distribution, divide each cell value by either its row total or its column total, depending on which variable is the explanatory variable.
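For the happiness and health table, health is treated as the explanatory variable (matching the question “are individuals who are more healthy also more happy?”), so each cell is divided by its column total. A minimal sketch with illustrative names:

```python
# Happiness (rows) by health (columns), from the survey table
table = {
    "Not too happy": (43, 30),
    "Pretty happy": (61, 189),
    "Very happy": (22, 122),
}
health_levels = ("Poor health", "Good health")

# Health is the explanatory variable, so divide each cell by its column total
col_totals = [sum(counts[j] for counts in table.values()) for j in range(len(health_levels))]
conditional = {
    level: {happy: counts[j] / col_totals[j] for happy, counts in table.items()}
    for j, level in enumerate(health_levels)
}

for level, dist in conditional.items():
    print(level, {happy: round(p, 3) for happy, p in dist.items()})

print(round(conditional["Good health"]["Pretty happy"], 4))  # 0.5543
```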

Construct a side-by-side bar graph:

Is happiness associated with health? How?

What proportion of people in good health are pretty happy?

Example: The data below represent the employment status and level of education of all U.S. residents 25 years old or older in November 2014. We want to investigate whether the two variables are associated. For example, are individuals who have a higher level of education more likely to be employed?


Construct frequency marginal distributions. This will remove the effect of the other variable. To construct the frequency marginal distribution find the sum of each row or column.

Construct relative frequency marginal distributions:

What proportion of those surveyed had a Bachelor’s degree or higher?

What proportion of those surveyed were not in the labor force?

What proportion of those surveyed were unemployed?

Construct the conditional distribution of employment status given (by) level of education:

                          Level of Education
Employment Status     Did Not Finish    High School    Some College    Bachelor's Degree
                      High School       Graduate                       or Higher
Employed              10,179            33,624         35,407          49,534
Unemployed            945               2,012          1,823           1,615
Not in labor force    13,271            25,806         17,089          17,415

                          Level of Education
Employment Status     Did Not Finish    High School    Some College    Bachelor's Degree    Totals
                      High School       Graduate                       or Higher
Employed              10,179            33,624         35,407          49,534               ______
Unemployed            945               2,012          1,823           1,615                ______
Not in labor force    13,271            25,806         17,089          17,415               ______
Totals                ______            ______         ______          ______               ______

                          Level of Education
Employment Status     Did Not Finish    High School    Some College    Bachelor's Degree    Relative Frequency
                      High School       Graduate                       or Higher            Marginal Distribution
Employed              10,179            33,624         35,407          49,534               ______
Unemployed            945               2,012          1,823           1,615                ______
Not in labor force    13,271            25,806         17,089          17,415               ______
Relative Frequency
Marginal Distribution ______            ______         ______          ______               ______


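Employment Status     Did Not Finish    High School    Some College    Bachelor's Degree
                      High School       Graduate                       or Higher
Employed              ______            ______         ______          ______
Unemployed            ______            ______         ______          ______
Not in labor force    ______            ______         ______          ______
Totals                ______            ______         ______          ______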

What proportion of those surveyed were unemployed?

What proportion of those surveyed were unemployed given they have a bachelor’s degree?

What proportion of those surveyed were unemployed given they have a high school degree?
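A sketch of the three proportions above for the employment table (standard library only; names are illustrative). The first is a marginal proportion (divide by the table total); the other two are conditional on level of education (divide by the column total).

```python
levels = ("Did not finish HS", "HS graduate", "Some college", "Bachelor's or higher")
table = {
    "Employed": (10179, 33624, 35407, 49534),
    "Unemployed": (945, 2012, 1823, 1615),
    "Not in labor force": (13271, 25806, 17089, 17415),
}

row_totals = {status: sum(row) for status, row in table.items()}
col_totals = [sum(row[j] for row in table.values()) for j in range(len(levels))]
grand_total = sum(row_totals.values())

# Marginal: proportion unemployed among everyone surveyed
p_unemployed = row_totals["Unemployed"] / grand_total

# Conditional distribution of employment status given level of education
conditional = {
    level: {status: row[j] / col_totals[j] for status, row in table.items()}
    for j, level in enumerate(levels)
}

print(round(p_unemployed, 4))                                        # 0.0306
print(round(conditional["Bachelor's or higher"]["Unemployed"], 4))  # 0.0236
print(round(conditional["HS graduate"]["Unemployed"], 4))           # 0.0327
```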

Construct a triple bar graph of the conditional distribution:

What can we say about employment status and level of education?

[Figure: bar graph axes; horizontal axis: Level of Education (Did Not Finish High School, High School Graduate, Some College, Bachelor’s Degree or Higher)]