Chapter 10: Re-expressing Data by: Sai Machineni, Hang Ha AP STATISTICS

Chapter 10: Re-expressing Data

by: Sai Machineni, Hang Ha

AP STATISTICS

Re-express Data

● We re-express data by taking the logarithm, the square root, the reciprocal, or applying some other mathematical operation to all values in the data set.
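
A minimal Python sketch of what re-expression does to a variable. The data values are hypothetical, invented only for illustration; the three operations are the ones named above.

```python
import math

# Hypothetical right-skewed data, invented for illustration only.
values = [1.0, 4.0, 25.0, 100.0, 400.0]

# The same operation is applied to every value in the data set.
logs = [math.log10(v) for v in values]
roots = [math.sqrt(v) for v in values]
reciprocals = [1.0 / v for v in values]

print("log:       ", [round(v, 3) for v in logs])
print("sqrt:      ", roots)
print("reciprocal:", reciprocals)
```

The large values are pulled in much more strongly than the small ones, which is why these operations can make a right-skewed distribution more symmetric.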

Goals of Re-expression

● Goal 1: Make the distribution of a variable more symmetric. A symmetric distribution is easier to summarize: we can use the mean and SD, and if the distribution is also unimodal, we may be able to use the 68-95-99.7 rule.

● Goal 2: Make the spread of several groups more alike. Groups that share a common spread are easier to compare.

Goals of Re-expression

● Goal 3: Make the form of a scatterplot more nearly linear. The great value of re-expression is that we can fit a linear model once the relationship is straight.

● Goal 4: Make the scatter in a scatterplot spread out evenly rather than following a fan shape.

Ladder of Powers

● The Ladder of Powers places in order the effects that many re-expressions have on the data.
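
The ladder itself is not reproduced in the slide text, so the sketch below assumes the commonly used rungs (2, 1, 1/2, "0" for the logarithm, -1/2, -1). The function name `re_express`, the sign convention for negative powers, and the data are all illustrative assumptions rather than anything taken from the slides.

```python
import math

def re_express(values, power):
    """Re-express positive values by one rung of the ladder.

    Power 0 stands for the logarithm. Negative powers are negated so that
    larger original values still map to larger re-expressed values
    (an assumed convention, not taken from the slides).
    """
    if power == 0:
        return [math.log10(v) for v in values]
    sign = -1.0 if power < 0 else 1.0
    return [sign * v ** power for v in values]

# Assumed rungs, from the top of the ladder to the bottom.
LADDER = [2, 1, 0.5, 0, -0.5, -1]

# Hypothetical positive data, invented for illustration.
data = [1.0, 4.0, 16.0, 100.0]

for power in LADDER:
    print(power, [round(v, 3) for v in re_express(data, power)])
```

Moving down the ladder pulls in the large values more and more strongly, so one would typically try the neighboring rung when a re-expression does too little or too much.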

Attack of the Logarithms

● Use when none of the data values is zero or negative.

● Try taking the logs of both the x-variable and the y-variable.

● Then re-express the data using some combination of x or log(x) and y or log(y).

Model Name     X-axis    Y-axis    Comment
Exponential    x         log(y)    The “0” power in the ladder; useful when values grow by a percent increase.
Logarithmic    log(x)    y         Useful when a scatterplot descends rapidly at the left.
Power          log(x)    log(y)    Useful when one ladder power is too big and the next is too small.
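
The three models in the table are all ordinary straight-line fits, just on different pairs of axes. The sketch below illustrates this with hypothetical data that follow y = 3x^2 exactly; the helper `fit_line` and the data are assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data that follow y = 3 * x**2 exactly (a power relationship).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0 * v ** 2 for v in x]

log_x = [math.log10(v) for v in x]
log_y = [math.log10(v) for v in y]

# Each model is a straight-line fit on a different pair of axes.
models = {
    "exponential": fit_line(x, log_y),   # log(y) vs. x
    "logarithmic": fit_line(log_x, y),   # y vs. log(x)
    "power": fit_line(log_x, log_y),     # log(y) vs. log(x)
}

intercept, slope = models["power"]
print(f"log(y) = {intercept:.3f} + {slope:.3f} log(x)")
```

For this made-up data the power model recovers a slope of 2 and an intercept of log(3), so back-transforming gives y = 3x^2. With real data, a residual plot for each candidate is the way to choose among them.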

Why Not a Curve?

● We can find “curves of best fit” using the same approach that led us to linear models.

● For many reasons, it is usually better to re-express the data to straighten the plot.

What Can Go Wrong?

● Don’t expect your model to be perfect.

● Don’t choose a model based on R^2 alone.

● Beware of multiple modes.

● Watch out for scatterplots that turn around.

● Watch out for negative data values.

● Watch out for data far from 1.

● Don’t stray too far from the ladder.

Example 1 (#27)

Problem: Researchers studying how a car’s gas mileage varies with its speed drove a compact car 200 miles at various speeds on a test track. Their data are shown in the table.

Speed (mph) 35 40 45 50 55 60 65 70 75

Miles per gal 25.9 27.7 28.5 29.5 29.2 27.4 26.4 24.2 22.8

Create a linear model for this relationship and report any concerns you may have about the model.

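Answer: No re-expression from this chapter can straighten this relationship. Gas mileage rises until about 50 mph and then falls, so the scatterplot turns around and a linear model is not appropriate.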
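
To see the concern in numbers, here is a short Python sketch that fits a least-squares line to the table above and lists the residuals. The helper `fit_line` is an assumption made for the example; the data are the values from the table.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from the Example 1 table.
speed = [35, 40, 45, 50, 55, 60, 65, 70, 75]
mpg = [25.9, 27.7, 28.5, 29.5, 29.2, 27.4, 26.4, 24.2, 22.8]

intercept, slope = fit_line(speed, mpg)
residuals = [y - (intercept + slope * x) for x, y in zip(speed, mpg)]

print(f"mpg = {intercept:.2f} + {slope:.4f}(speed)")
for x, r in zip(speed, residuals):
    print(x, round(r, 2))
```

The residuals are negative at both ends of the speed range and positive in the middle. That curved pattern is the sign that a straight line does not describe the data, and because the curve turns around, the ladder re-expressions cannot fix it.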

Example 2 (#31)

Problem: It’s often difficult to find the ideal model for situations in which the data are strongly curved. The table below shows the rapid growth of the number of academic journals published on the Internet during the last decade.

Year (L1) 1991 1992 1993 1994 1995 1996 1997

Number of Journals (L2) 27 36 45 181 306 1093 2459

a. Create a good model to describe this growth.

Model: log(journals) = -686.76 + 0.346(year), where year is the calendar year. With 1991 entered as year 0, the same fit is log(journals) = 1.222 + 0.346(year).

Step 1: Enter the data in STAT > Edit: L1 = Year (0-6, with 1991 as year 0) and L2 = Journals.

Step 2: Fit a line to the original data and check the residuals: STAT > CALC > LinReg(a+bx) L1, L2.

Step 3: Start re-expressing by finding the log of journals. On the calculator, type log(L2) STO L3 (this stores the logs in L3).

Step 4: Check the scatterplot of the re-expressed data by changing the STAT PLOT settings to Xlist: L1 and Ylist: L3, then press ZOOM 9 (ZoomStat).

Step 5: Perform the regression for the log of journals vs. year with the command STAT > CALC > 8:LinReg(a+bx) L1, L3, Y1.

Step 6: Test the residuals: in STAT PLOT, change Ylist to RESID.
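
The calculator steps can also be sketched in Python. This is an illustrative equivalent, not the authors' method: `fit_line` is a hypothetical helper, the log is taken base 10 to match the calculator's log key, and calendar years are used on the x-axis.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from the Example 2 table.
years = [1991, 1992, 1993, 1994, 1995, 1996, 1997]
journals = [27, 36, 45, 181, 306, 1093, 2459]

# Re-express the response, then fit a line to log(journals) vs. year.
log_journals = [math.log10(j) for j in journals]
intercept, slope = fit_line(years, log_journals)

# Residuals on the log scale, for the residual check.
residuals = [ly - (intercept + slope * yr)
             for yr, ly in zip(years, log_journals)]

print(f"log(journals) = {intercept:.2f} + {slope:.3f}(year)")
for yr, r in zip(years, residuals):
    print(yr, round(r, 3))
```

The fitted coefficients agree with the model quoted in part a. The residuals are the values to inspect for any remaining pattern before trusting the model.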

Example 2 Continued

b. Use your model to estimate the number of electronic journals in the year 2000.

To estimate the number of journals in 2000, we must remember that in entering our data we designated 1991 as year 0. That means we’ll use 9 for the year 2000 and evaluate Y1(9).

Y1(9) is about 4.332 on the log scale, so the estimate is 10^4.332, or about 21,497 journals.

c. Comment on your faith in this estimate.

The estimate may be too high. Even though the number of journals grew rapidly throughout these years, predicting 2000 means extrapolating three years beyond the data, and the model may no longer be correct that far out.
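
A short Python sketch of the prediction, assuming the coding 1991 = year 0 and a base-10 log. The names `fit_line` and `predict_journals` are hypothetical helpers introduced for the example.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = a + b * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Data from the Example 2 table, with years coded 0 = 1991, ..., 6 = 1997.
journals = [27, 36, 45, 181, 306, 1093, 2459]
coded_years = list(range(len(journals)))

intercept, slope = fit_line(coded_years, [math.log10(j) for j in journals])

def predict_journals(calendar_year):
    """Back-transform the fitted log value into a journal count."""
    coded = calendar_year - 1991
    return 10 ** (intercept + slope * coded)

estimate_2000 = predict_journals(2000)
years_beyond_data = 2000 - 1997  # how far the prediction extrapolates

print(round(estimate_2000), years_beyond_data)
```

The sketch keeps the unrounded coefficients because the fitted value becomes an exponent of 10, so small rounding differences in the slope are magnified in the back-transformed estimate.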