linear regression: quality of the fit and automating the analysis in Excel
living with the lab
© 2011 David Hall and the LWTL faculty team
The Living with the Lab label, the Louisiana Tech Logo, and this copyright notice should not be removed when any part of this work is used by others. This work may not be used for commercial purposes. Inquiries should be addressed to [email protected]. This presentation on linear regression is based partially on class notes created by Dr. Mark Barker at Louisiana Tech University.
good, better and best aren’t very quantitative words to describe the “quality of the fit”
[Figure: three plots titled "heart rate versus exercise time", labeled "good fit", "better fit" and "best fit"; x-axis: cumulative exercise time (s), 0 to 45; y-axis: heart rate (bpm), 50 to 120]
DISCLAIMER
The content of this presentation is for informational purposes only and is intended only for students attending Louisiana Tech University.
The author of this information does not make any claims as to the validity or accuracy of the information or methods presented.
The procedures demonstrated here are potentially dangerous and could result in injury or damage.
Louisiana Tech University and the State of Louisiana, their officers, employees, agents or volunteers, are not liable or responsible for any injuries, illness, damage or losses which may result from your using the materials or ideas, or from your performing the experiments or procedures depicted in this presentation.
If you do not agree, then do not view this content.
Class Problem: Determine the best fit line of “recovery for recycling” versus “year” for 1960, 1970, 1980, 1990, 2000 and 2009.
a. Use Excel to set up a table to manually determine the slope m and the y-intercept b.
b. Plot the six raw data points versus the fit. Use markers only (with no lines) for the raw data and lines only (no markers) for the fit.
www.epa.gov
Table ES-3. Generation, materials recovery, composting, combustion with energy recovery, and discards of municipal solid waste, 1960-2009, in pounds per person per day
http://www.wastexchange.org/upload_publications/MSWintheU.S.2010.pdf
$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad b = \frac{\sum y_i - m\sum x_i}{n}$$
solution
$$m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} \qquad b = \frac{\sum y_i - m\sum x_i}{n}$$
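The table-based calculation of m and b can also be sketched outside Excel. The Python below is a minimal illustration of the two formulas above, not the spreadsheet from the class solution; the function name `fit_line` and the data values are made up for the example and are not the EPA recycling numbers.

```python
def fit_line(x, y):
    """Return slope m and y-intercept b of the least-squares line."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # m = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b = (sum(y) - m*sum(x)) / n
    b = (sum_y - m * sum_x) / n
    return m, b


# Made-up illustration data (NOT the EPA recycling values)
x = [0, 10, 20, 30, 40]
y = [1.0, 3.1, 4.9, 7.2, 8.8]

m, b = fit_line(x, y)
print(f"m = {m:.4f}, b = {b:.4f}")
```

Each intermediate sum corresponds to one column total in a hand-built spreadsheet table, so the same numbers can be checked cell by cell.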
the “coefficient of determination,” more commonly referred to as $r^2$, will be used to determine the “goodness of the fit”
coefficient of determination
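[Figure: a data point $(x_i, y_i)$ and the best fit line $y^{fit} = m \cdot x + b$ on $x$-$y$ axes; the vertical gap between the data value $y_i$ and the fitted value $y_i^{fit}$ at $x_i$ is marked]
$$y_i^{fit} = m \cdot x_i + b$$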
• the error at point $(x_i, y_i)$ is $y_i^{fit} - y_i$
• since some errors are negative (fit lies below data point) and some are positive (fit lies above data point), we square the errors before summing them: $\sum (y_i^{fit} - y_i)^2$
• if we simply reported the term above, the number would vary in size depending on the problem being solved
• we would like a number that varies between 0 (poor fit) and 1 (perfect fit), so we normalize the error
$$r^2 = 1 - \frac{\sum \left(y_i^{fit} - y_i\right)^2}{\sum \left(\bar{y} - y_i\right)^2} \qquad 0 \le r^2 \le 1$$
where $\bar{y}$ is the average value of $y_i$
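As a rough check of this definition, the sketch below computes $r^2$ as one minus the summed squared errors divided by the summed squared deviations from the mean. The function name `r_squared` and the data are made up for illustration; m and b are assumed to be the least-squares slope and intercept for that made-up data.

```python
def r_squared(x, y, m, b):
    """Coefficient of determination for the line y_fit = m*x + b."""
    y_fit = [m * xi + b for xi in x]
    y_bar = sum(y) / len(y)  # average value of y

    # sum of squared errors between the fit and the data
    ss_err = sum((yf - yi) ** 2 for yf, yi in zip(y_fit, y))
    # sum of squared deviations of the data from its average
    ss_tot = sum((y_bar - yi) ** 2 for yi in y)

    return 1 - ss_err / ss_tot


# Made-up illustration data (NOT the EPA recycling values)
x = [0, 10, 20, 30, 40]
y = [1.0, 3.1, 4.9, 7.2, 8.8]
m, b = 0.197, 1.06  # assumed: least-squares slope and intercept for this data

r2 = r_squared(x, y, m, b)
print(f"r^2 = {r2:.4f}")
```

If every data point lies exactly on the line, the squared errors sum to zero and the function returns 1.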
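alternate equation for $r^2$
instead of using the form for $r^2$ presented on the previous slide, we use the form below; this form does not rely on $y_i^{fit}$ and $\bar{y}$:
$$r^2 = \left[\frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2} \cdot \sqrt{n\left(\sum y_i^2\right) - \left(\sum y_i\right)^2}}\right]^2 \qquad 0 \le r^2 \le 1$$
Class Problem: Use Excel to compute $r^2$ for the recycling problem completed earlier.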
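solution: adding $r^2$ to the earlier spreadsheet
$$r^2 = \left[\frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\left(\sum x_i^2\right) - \left(\sum x_i\right)^2} \cdot \sqrt{n\left(\sum y_i^2\right) - \left(\sum y_i\right)^2}}\right]^2 \qquad 0 \le r^2 \le 1$$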
• if $r^2$ is 0, then there is no apparent relationship between x and y
• if $r^2$ is 1, then
  o x perfectly determines y
  o the variation in y is wholly due to x
  o y depends on x and there are no other variables that affect y
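The two limiting cases can be illustrated with the sums-only form of $r^2$. The Python below is a sketch under stated assumptions: the function name `r_squared_from_sums` and both data sets are invented for the example, chosen so that one set lies exactly on a line and the other shows no linear trend.

```python
from math import sqrt


def r_squared_from_sums(x, y):
    """r^2 computed only from sums of x, y, x*y, x^2 and y^2."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    numerator = n * sum_xy - sum_x * sum_y
    # Note: the denominator is zero if all x values or all y values are equal
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return (numerator / denominator) ** 2


# Made-up data lying exactly on y = 2x + 1: r^2 should be 1 (within rounding)
r2_perfect = r_squared_from_sums([0, 1, 2, 3], [1, 3, 5, 7])

# Made-up data with no linear trend: r^2 should be 0
r2_none = r_squared_from_sums([0, 1, 2, 3], [1, 2, 2, 1])

print(r2_perfect, r2_none)
```

Because this form needs only running sums, it maps directly onto the column totals already present in the slope and intercept table.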
repeat using built-in Excel tools
STEPS:
1. enter x and y data
2. create a scatter plot
3. right click on the markers and select “Add Trendline”
4. select “Linear”, “Display Equation on chart” and “Display R-squared value on chart”
[Figure: scatter plot "recovery for recycling versus year" with a linear trendline; x-axis: year, 1950 to 2020; y-axis: recovery for recycling (lbs/(person*day)), 0 to 1.2; trendline label: f(x) = 0.0212209701126899 x − 41.5367555120039, R² = 0.937803707371666]
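Once Excel displays the trendline equation, the fitted value for any year follows by substitution. The short Python sketch below uses the slope and intercept copied from the chart label; the function name `recovery_fit` and the choice of years to evaluate are assumptions made for illustration.

```python
# Coefficients copied from the trendline label on the Excel chart
SLOPE = 0.0212209701126899
INTERCEPT = -41.5367555120039


def recovery_fit(year):
    """Fitted recovery for recycling, in lbs/(person*day), for a given year."""
    return SLOPE * year + INTERCEPT


# Evaluate the fit at the first and last years used in the class problem
for year in (1960, 2009):
    print(year, round(recovery_fit(year), 3))
```

These are values of the fitted line, not the raw EPA data points, so they will differ somewhat from the table entries.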
NOTE: when studying for the next exam, be sure you can solve problems like today's both by hand and using Excel