Introduction to Regression
• Using Multiple Linear Regression
– Derived variables
– Many alternative models
• Which model to choose?
– Model Criticism
• Modelling Objective
• Model Details
• Data and Residuals
• Assumptions
20/01/2015 Cert in Statistics; Intro to Regression Week 3
Data Like This
• Values of coefficients
• Sampling Distributions
• Standard Errors
• 95% Confidence Intervals
• 95% Prediction Intervals
• ANOVA etc
Derived variables
• General
– Logs
– Proportions and Ratios
– Indicator variables – categorical data
• Time series applications
– Indicator variables – eg seasonal effects
– Lagged variables
– Differences
– Logs and Rate of Return
• Too many (derived) variables
– Redundancy
– Many versions of same model
Gas Consumption vs Temp
[Fitted line plot, Period 1: Gas vs Temperature. Gas = 6.854 - 0.3932 Temperature; S = 0.281334, R-Sq = 94.4%, R-Sq(adj) = 94.1%]
[Fitted line plot, Period 2: Gas vs Temperature. Gas = 4.724 - 0.2779 Temperature; S = 0.354848, R-Sq = 81.3%, R-Sq(adj) = 80.6%]
Weekly gas consumption (in 1000 cubic feet) and
the average outside temperature (in degrees
Celsius) at one house in south-east England for two
"heating seasons", one of 26 weeks before, and
one of 30 weeks after cavity-wall insulation was
installed. The object of the exercise was to assess
the effect of the insulation on gas consumption. The house thermostat was set at 20°C throughout.
Comparative
Objective
• Nominal focus on prediction
– Predict gas consumption in future for this house
– Knowing temp and whether or not insulated
• Actual interest
– Does insulation make a difference?
• At all temps?
• How much?
– Slope? Intercept?
– SEs?
Data Like This
Using an Indicator variable
Week Insulation Temperature Gas
22 0 7.6 3.5
23 0 8.0 4.0
24 0 8.5 3.6
25 0 9.1 3.1
26 0 10.2 2.6
27 1 -0.7 4.8
28 1 0.8 4.6
29 1 1.0 4.7
Mr Derek Whiteside of the UK Building Research Station recorded the weekly gas consumption (in 1000 cubic feet) and the average outside temperature (in degrees Celsius) at his own house in south-east England for two "heating seasons", one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption.
The house thermostat was set at 20°C throughout.
Insulated Week Temperature Gas Insulated Week Temperature Gas
0 1 -0.8 7.2 1 27 -0.7 4.8
0 2 -0.7 6.9 1 28 0.8 4.6
0 3 0.4 6.4 1 29 1.0 4.7
0 4 2.5 6.0 1 30 1.4 4.0
etc etc
Two parallel data sets
One stacked data set
Simple Regression & Indicator Variable
Gas vs Insulated
• Insulated = 0
– Avg Gas = 4.750
• Insulated = 1
– Avg Gas = 3.483
• Diff = -1.267
Temp vs Insulated
• Coeff: unit increase
• Random error
• Design implications
[Fitted line plot, Gas vs Insulated: Gas = 4.750 - 1.267 Insulated; S = 0.987577, R-Sq = 29.8%, R-Sq(adj) = 28.5%]
[Fitted line plot, Temperature vs Insulated: Temperature = 5.350 - 0.8867 Insulated; S = 2.73812, R-Sq = 2.6%, R-Sq(adj) = 0.8%]
SLR with indicator var & T-test
Two-sample T for Gas
Insulated N Mean StDev SE Mean
0 26 4.75 1.16 0.23
1 30 3.483 0.806 0.15
Difference = μ (0) - μ (1)
T-Value = 4.79
P-Value = 0.000 DF = 54
Using Pooled StDev = 0.9876
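The link between the two-sample comparison and a regression on a 0/1 indicator can be checked numerically. The sketch below is illustrative only: it uses just the eight weeks listed in the data excerpt (four uninsulated, four insulated), not the full 56 weeks, so its numbers differ from the Minitab output. The helper name `ols_line` is made up for this example.

```python
# Eight (Insulated, Gas) pairs copied from the data excerpt in these slides.
# This is NOT the full 56-week data set, so results differ from the output above.
insulated = [0, 0, 0, 0, 1, 1, 1, 1]
gas = [7.2, 6.9, 6.4, 6.0, 4.8, 4.6, 4.7, 4.0]


def ols_line(x, y):
    """Least-squares intercept and slope for one predictor (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = ols_line(insulated, gas)

# Group means computed directly, as a two-sample comparison would.
mean0 = sum(g for i, g in zip(insulated, gas) if i == 0) / insulated.count(0)
mean1 = sum(g for i, g in zip(insulated, gas) if i == 1) / insulated.count(1)

print("intercept:", intercept, "mean when Insulated = 0:", mean0)
print("slope:", slope, "difference in means:", mean1 - mean0)
```

With a 0/1 predictor, the fitted intercept is the mean of the Insulated = 0 group and the slope is the difference between the two group means.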
Regression Analysis: Gas versus Insulated
S R-sq R-sq(adj) R-sq(pred)
0.987577 29.79% 28.49% 24.35%
Coefficients
Term Coef SE Coef T-Value P-Value
Constant 4.750 0.194 24.53 0.000
Insulated -1.267 0.265 -4.79 0.000
Indicator Variables in Regression
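Response variable: $Y = \text{Gas}$
Predictors: $x_1 = \text{Temp}$, $x_2 = \text{Insulated}$ (0/1)
Statistical Model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon; \quad \varepsilon \sim N(0, \sigma^2)$
When $x_2 = 0$: $Y = \beta_0 + \beta_1 x_1 + \varepsilon$
When $x_2 = 1$: $Y = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$
$\beta_1$: Common Slopes
$\beta_2$: Diff bet Int'cpts
No interaction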
Binary Indicator Variable
Multiple Regression Output
Regression Analysis: Gas versus Temperature, Insulated
The regression equation is
Gas = 6.55 - 0.337 Temperature - 1.57 Insulated
Predictor    Coef     SE Coef
Constant     6.5513   0.1181
Temperature  -0.3367  0.0177
Insulated    -1.5652  0.0970
$\hat{\beta}_2 = -1.565$, $SE(\hat{\beta}_2) = 0.097$
Rough 95% CI: $-1.57 \pm 2(0.097) = (-1.76, -1.37)$
Prev: Mean Diff $\pm$ 2 SE $= -1.27 \pm 2(0.274)$
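The rough interval above is just "estimate plus or minus two standard errors". A minimal sketch of that arithmetic, using the coefficient and standard error from the output; the function name `rough_ci` is an assumption made for this example.

```python
def rough_ci(estimate, std_error, multiplier=2.0):
    """Rough 95% interval: estimate +/- multiplier * standard error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


# Insulated coefficient and its SE from the multiple regression output.
insulation_ci = rough_ci(-1.5652, 0.0970)
print("Rough 95% CI for the Insulated coefficient:", insulation_ci)
```

The multiplier 2 is the slide's rule of thumb; an exact interval would use a t quantile with the residual degrees of freedom.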
Parallel lines
Implementation: Categorical Variable
Regression Output: Categorical Var
Regression Analysis: Gas versus Temperature, Insulated
Categorical predictor coding (1, 0)
Model Summary
S R-sq
0.357412 90.97%
Coefficients
Term Coef SE Coef T-Value P-Value
Constant 6.551 0.118 55.48 0.000
Temperature -0.3367 0.0178 -18.95 0.000
Insulated
1 -1.5652 0.0971 -16.13 0.000
Regression Equation
Insulated
0 Gas = 6.551 - 0.3367 Temperature
1 Gas = 4.986 - 0.3367 Temperature
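The two parallel lines come from one fitted equation: the Insulated coefficient shifts the intercept and leaves the slope alone. A small sketch using the reported coefficients; the function name `predict_gas` and the example temperature of 5°C are assumptions for illustration.

```python
# Coefficients from the categorical-variable regression output.
CONSTANT = 6.551
TEMP_COEF = -0.3367
INSULATED_COEF = -1.5652


def predict_gas(temperature, insulated):
    """Fitted gas consumption (1000 cubic feet); insulated is 0 or 1."""
    return CONSTANT + TEMP_COEF * temperature + INSULATED_COEF * insulated


# Intercept of the insulated line = constant + Insulated coefficient.
insulated_intercept = predict_gas(0.0, 1)
print("Insulated intercept:", round(insulated_intercept, 3))

# At any one temperature the two lines differ by the Insulated coefficient.
gap_at_5 = predict_gas(5.0, 1) - predict_gas(5.0, 0)
print("Difference at 5 degrees:", round(gap_at_5, 4))
```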
Aside: Omitted predictors (Hidden/Lurking variables)
• Knowing insulation status
– Slopes negative: on avg, gas consumption decreases with temp
• Uninformed by insulation status (subset of data, used in exam)
– Slope positive: on avg, gas consumption increases with temp!
Interaction?
• Refine the question
• Different slopes as well?
Indicator Variables in Regression
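Response variable: $Y = \text{Gas}$
Predictors: $x_1 = \text{Temp}$, $x_2 = \text{Insulated}$ (0/1), $x_3 = \text{Temp} \times x_2$
Combined statistical model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon; \quad \varepsilon \sim N(0, \sigma^2)$
When $x_2 = 0$: $Y = \beta_0 + \beta_1 x_1 + \varepsilon$
When $x_2 = 1$: $Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \varepsilon$
$\beta_2$: diff in intercepts; $\beta_3$: diff in slopes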
New Derived Variable
Modelling two regression lines
Regression Analysis: Gas versus Temperature, Insulated, Ins X Temp
Gas = 6.85 - 0.393 Temperature - 2.13 Insulated + 0.115 Ins X Temp
Predictor    Coef      SE Coef
Constant     6.8538    0.1360
Temperature  -0.39324  0.02249
Insulated    -2.1300   0.1801
Ins X Temp   0.11530   0.03211
S = 0.323004 R-Sq = 92.8% R-Sq(adj) = 92.4%
Which coeff is most fundamental to the theory of heat loss?
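The interaction model contains two separate lines. A sketch of how the four reported coefficients give the intercept and slope for each insulation status; the variable names are chosen for this example only.

```python
# Coefficients from the interaction-model output.
b0 = 6.8538    # Constant
b1 = -0.39324  # Temperature
b2 = -2.1300   # Insulated
b3 = 0.11530   # Ins X Temp

# Insulated = 0: the indicator and interaction terms vanish.
intercept_not_insulated, slope_not_insulated = b0, b1

# Insulated = 1: b2 shifts the intercept, b3 shifts the slope.
intercept_insulated = b0 + b2
slope_insulated = b1 + b3

print("Not insulated: Gas = %.4f + (%.5f) Temperature"
      % (intercept_not_insulated, slope_not_insulated))
print("Insulated:     Gas = %.4f + (%.5f) Temperature"
      % (intercept_insulated, slope_insulated))
```

The implied insulated line, roughly 4.724 - 0.2779 Temperature, agrees with the separate Period 2 fitted line reported earlier, and the uninsulated line agrees with the Period 1 fit.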
Alt Models of two regression lines
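Nearly equivalent: two sep lin regs, Gas vs Temp
Exercise: compare coeff ests and 95% ints for
a) One model, w interaction
b) Two sep models
Response variable: $Y = \text{Gas}$
Predictors: $x_1 = \text{Temp}$, $x_2 = \text{Insulated}$ (0/1)
Two Statistical Models
$x_2 = 0$: $Y = \beta_0^{NoIns} + \beta_1^{NoIns} x_1 + \varepsilon; \quad \varepsilon \sim N(0, \sigma^2_{NoIns})$
$x_2 = 1$: $Y = \beta_0^{Ins} + \beta_1^{Ins} x_1 + \varepsilon; \quad \varepsilon \sim N(0, \sigma^2_{Ins})$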
Multiple indicator variables
Will also meet
• Redundancy
• Multiple formulations of same model
Housing Completions, quarterly, 1978 to 2000
Quarter 1978 1979 1980 1981 1982 1983 1984 1985
Q1 5777 7276 3538 6642 5981 4859 5129 4947
Q2 4772 4510 6001 4710 4883 5862 4671 5188
Q3 4579 4278 5879 5570 5354 4663 4947 3930
Q4 4243 4274 6383 6314 4894 4564 3195 3360
Quarter 1986 1987 1988 1989 1990 1991 1992 1993
Q1 5186 4144 3682 3554 4296 4692 4155 3684
Q2 3719 3363 3298 3985 4477 3898 5603 4487
Q3 4533 4391 3747 5277 5011 4600 5919 5121
Q4 3726 3478 3477 4484 4752 5282 5305 6009
Quarter 1994 1995 1996 1997 1998 1999 2000
Q1 4291 5770 6582 7434 8010 9930 10302
Q2 5266 6149 7203 8799 9506 10227 11590
Q3 6871 6806 7634 9140 10103 10788 11892
Q4 7160 7879 8713 10081 11474 12079 12873
Figure 1.30 Housing Completions, quarterly, 1978 to 2000
[Time series plot of Completions by Year and Quarter, 1978 to 2000; separate symbols for Q1, Q2, Q3, Q4]
Take objective: forecast one quarter ahead
Aside: Cubic/Quadratic Regression
[Fitted line plot, Comps vs time, with regression curve and 95% PI: Comps = -1.44E+10 + 21783340 time - 10988 time**2 + 1.848 time**3; S = 822.624, R-Sq = 88.3%, R-Sq(adj) = 87.9%]
Fitted Line Plot options: Log, Quadratic, Cubic
Modelling Options
• Focus on stable linear structure post 1993
– Assume this structure will continue
– Exploit structure: extension of Indicator Vars
– Disadvantage: smaller data set
• One model for entire data set
– Note: structure has changed; might change again
– Exploit weaker structure
• Use Lagged variables
– Advantage: use all data.
Comps, quarterly, 1993 to 2000
Option 1 – work since 1993
[Time series plot of Completions by Year and Quarter, 1993 to 2000; separate symbols for Q1, Q2, Q3, Q4]
Target is 2001 Q1
• Use Q1 data only? OR
• Use all 1993-2000 data?
– 4 parallel lines more efficient. Why? In what sense?
Completions Q1 only
[Fitted line plot, Completions (Q1 only) vs year, 1993 to 2000: Completions = -1945191 + 977.8 year; S = 316.477, R-Sq = 98.5%, R-Sq(adj) = 98.3%]
Pred = -1945191 + 977.8 × 2001.00 ± 2(316.5) = (9795, 11061)
Other Qs; 4 sep lines
Linear in Time plus Quarterly Ind Vars
Create set of binary variables Q1, Q2, Q3, Q4
$\text{Comps} = \alpha_1 Q_1 + \alpha_2 Q_2 + \alpha_3 Q_3 + \alpha_4 Q_4 + \beta\,\text{Time} + \varepsilon$
Later, use Time since 1978: changes intercept only
Year. Quarter  time     Time since 1978  Comps  Q1  Q2  Q3  Q4
1993 Q1        1993     15.00            3684   1   0   0   0
1993 Q2        1993.25  15.25            4487   0   1   0   0
1993 Q3        1993.5   15.50            5089   0   0   1   0
1993 Q4        1993.75  15.75            6041   0   0   0   1
1994 Q1        1994     16.00            4291   1   0   0   0
1994 Q2        1994.25  16.25            5266   0   1   0   0
1994 Q3        1994.5   16.50            6835   0   0   1   0
1994 Q4        1994.75  16.75            7196   0   0   0   1
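The indicator columns in the table can be generated from the quarter labels. The sketch below builds them in plain Python and checks that, in every row, the four indicators add up to 1. That is the same as a column of ones, which is why a constant plus all four indicators carries redundant information. The variable names are assumptions for this example.

```python
# Quarter labels for the eight rows shown in the table (1993 Q1 to 1994 Q4).
quarters = ["Q1", "Q2", "Q3", "Q4", "Q1", "Q2", "Q3", "Q4"]
levels = ["Q1", "Q2", "Q3", "Q4"]

# One 0/1 column per quarter.
indicators = {
    level: [1 if q == level else 0 for q in quarters]
    for level in levels
}

# Each row has exactly one indicator switched on.
row_sums = [
    sum(indicators[level][i] for level in levels)
    for i in range(len(quarters))
]

for level in levels:
    print(level, indicators[level])
print("row sums:", row_sums)
```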
Multiple Indicator Vars: Tech Issue
Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4
* Q4 is highly correlated with other X variables
* Q4 has been removed from the equation.
The regression equation is
Comps = - 9452 + 986 Time since 1978 - 1792 Q1 - 1139 Q2 - 758 Q3
Model: $Y = \alpha_0 + \alpha_1 Q_1 + \alpha_2 Q_2 + \alpha_3 Q_3 + \alpha_4 Q_4 + \beta t + \varepsilon$
Interp of the output above: $\alpha_0 \neq 0$ and $\alpha_4 = 0$ (Q4 removed)
Alternatives
• $\alpha_0 = 0$: No Constant
• Use 3 indicator variables only
• Enter "Quarter" as categorical variable (equiv)
Redundancy
Multiple Indicator Vars: Tech Issue
Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4
* Q4 is highly correlated with other X variables
* Q4 has been removed from the equation.
Comps = - 9452 + 986 Time since 1978 - 1792 Q1 - 1139 Q2 - 758 Q3
S = 297.382
OR
Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4
No constant option
Comps = 986 Time since 1978 - 11244 Q1 - 10592 Q2 - 10210 Q3 - 9452 Q4
S = 297.382
Note: -11244 = -9452 - 1792 (Q1); -9452 = -9452 + 0 (Q4); etc
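The two outputs are the same fitted model written two ways. A quick check that adding each quarter's indicator coefficient to the constant reproduces the no-constant coefficients, using the rounded values printed above (so agreement is only expected to within rounding). The dictionary names are made up for this example.

```python
# Fit with a constant; Q4 was removed, so its coefficient is effectively 0.
constant = -9452
indicator_coefs = {"Q1": -1792, "Q2": -1139, "Q3": -758, "Q4": 0}

# Fit with the "no constant" option, as printed in the output.
no_constant_coefs = {"Q1": -11244, "Q2": -10592, "Q3": -10210, "Q4": -9452}

# Quarter-specific intercepts implied by the first fit.
implied = {q: constant + c for q, c in indicator_coefs.items()}

# Largest disagreement between the two parametrisations.
max_gap = max(abs(implied[q] - no_constant_coefs[q]) for q in implied)

print("implied intercepts:", implied)
print("largest gap (rounding only):", max_gap)
```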
Categorical Variable approach
Model Summary
S R-sq
297.382 98.76%
Coefficients
Term Coef SE Coef
Constant -11244 437
time since 1978 986.5 22.9
Quarter
Q2 653 149
Q3 1034 149
Q4 1792 150
Regression Equations
Quarter
Q1 Comps = -11244 + 986.5 t
Q2 Comps = -10592 + 986.5 t
Q3 Comps = -10210 + 986.5 t
Q4 Comps = -9452 + 986.5 t
Consider Q2 – Q1 at t = 0
Derived variables and Transforms in Time Series
Lags
Differences
Rates of Return
Log scale
All Comps, quarterly, 1978 to 2000
Option 2 – use all data, but diff model
[Time series plot of Completions by Year and Quarter, 1978 to 2000; separate symbols for Q1, Q2, Q3, Q4]
[Fitted line plot, Comps vs Lag1Comp: Comps = 564.6 + 0.9171 Lag1Comp; S = 1167.61, R-Sq = 76.1%, R-Sq(adj) = 75.8%]
Auto-Regression for Time Series
Basic idea – next value ‘like’ last value (Lag1)
Auto Regression
$Y_t = \beta_0 + \beta_{lag1} Y_{t-1} + \varepsilon_t$
$Y_t = \beta_0 + \beta_{lag1} Y_{t-1} + \beta_{lag4} Y_{t-4} + \varepsilon_t$
Year. Quarter Comps Lag1Comp Lag4Comp
1978 Q1 5777
1978 Q2 4772 5777
1978 Q3 4588 4772
1978 Q4 4234 4588
1979 Q1 7276 4234 5777
1979 Q2 4513 7276 4772
1979 Q3 4284 4513 4588
1979 Q4 4257 4284 4234
1980 Q1 7738 4257 7276
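The lagged columns in the table are the original series shifted down by one or four rows. A minimal sketch of that construction, using the nine Comps values from the table; the helper name `lag` is an assumption, and missing values are represented by `None`.

```python
# Comps values from the table, 1978 Q1 to 1980 Q1.
comps = [5777, 4772, 4588, 4234, 7276, 4513, 4284, 4257, 7738]


def lag(series, k):
    """Shift a series down by k places; the first k entries have no lagged value."""
    return [None] * k + series[:-k]


lag1 = lag(comps, 1)  # previous quarter
lag4 = lag(comps, 4)  # same quarter last year

for row in zip(comps, lag1, lag4):
    print(row)
```

Rows with a `None` entry cannot be used in the regression, so the fit on Lag1Comp and Lag4Comp loses the first four quarters.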
Using two lagged variables
Regression Analysis: Comps versus Lag1Comp, Lag4Comp
The regression equation is
Comps = - 387 + 0.328 Lag1Comp + 0.782 Lag4Comp :
S = 780.7
Comp Q4 2000 = 12873, Comp Q1 2000 = 10302
95% Pred Int Comp Q1 2001 = 11892 ± 2(780.7)= (10330, 13453)
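The forecast and rough interval above can be reproduced from the fitted equation. A sketch using the rounded coefficients as printed; the function name `forecast_comps` is made up for this example.

```python
# Coefficients and residual standard deviation from the output above.
CONSTANT = -387
LAG1_COEF = 0.328
LAG4_COEF = 0.782
S = 780.7


def forecast_comps(lag1_value, lag4_value):
    """Point forecast from the two-lag regression equation."""
    return CONSTANT + LAG1_COEF * lag1_value + LAG4_COEF * lag4_value


# For 2001 Q1: Lag1 is Comp Q4 2000, Lag4 is Comp Q1 2000.
point = forecast_comps(12873, 10302)
lower, upper = point - 2 * S, point + 2 * S

print("Forecast 2001 Q1:", round(point))
print("Rough 95% prediction interval:", (round(lower), round(upper)))
```

The interval uses the slide's rough "plus or minus 2 S" rule, which ignores uncertainty in the estimated coefficients.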
Using Lagged Variables
Basic Idea: Current Quarter ‘like’
• prev quarter
• same Q last year
[Matrix plot of Completions, Lag1Comp, Lag4Comp]
Comparison
Year  Time   Qtr  Comps  Lag 1  Lag 4  Q1  Q2  Q3  Lin in time + Q inds  Lag 1 and lag 4
2000  22     Q1   10302  12079   9930   1   0   0  10451                 11340.17
2000  22.25  Q2   11590  10302  10227   0   1   0  11347.5               10989.57
2000  22.5   Q3   11892  11590  10788   0   0   1  11945                 11850.74
2000  22.75  Q4   12873  11892  12079   0   0   0  12979.5               12959.35
2001  23     Q1          12873  10302   1   0   0  11437                 11891.51
2001  23.25  Q2          ?      11590   0   1   0  12333.5               ?
2001  23.5   Q3          ?      11892   0   0   1  12931
2001  23.75  Q4          ?      12873   0   0   0  13965.5
2002  24     Q1          ?      ?       1   0   0  12423
Forecasting models
[Time series plot, 1994 to 2001, of three series: Comps; Linear in Time, quarter indicators; Lag1 and Lag 4]
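Modelling Options
1 Parallel Linear Regressions: $Y_t = \alpha_1 Q_1 + \alpha_2 Q_2 + \alpha_3 Q_3 + \alpha_4 Q_4 + \beta t + \varepsilon_t$
2 Seasonal Auto Regression: $Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_4 Y_{t-4} + \varepsilon_t$
• More efficient for prediction
• Fewer modelling assumptions
• Different modelling strategy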
Model Criticism
• Criticism
– Does it make sense?
– Are there outliers?
• Choice amongst alternatives
– R2
– SE
Extra: Logs lags and differences
Financial data IBM share price
Natural language “%age change”
MINITAB language logs
Financial Series- IBM Prices daily
Simple Reg on Time
Log IBM Prices
[Fitted line plot with 95% PI, Log(Yt) vs t: Logprice = 1.364 + 0.000561 t; S = 0.0393343, R-Sq = 94.4%, R-Sq(adj) = 94.4%]
[Fitted line plot with 95% PI, Log(Yt) vs Log(Yt-1): Logprice = 0.002264 + 0.9990 lag1logprice; S = 0.0080199, R-Sq = 99.8%, R-Sq(adj) = 99.8%]
Modeled in Log Scale, presented in original units
[Fitted line plot with 95% PI, price vs t: log10(price) = 1.364 + 0.000561 t; S = 0.0393343, R-Sq = 94.4%, R-Sq(adj) = 94.4%]
[Fitted line plot with 95% PI, price vs lag1price: log10(price) = 0.002264 + 0.9990 log10(lag1price); S = 0.0080199, R-Sq = 99.8%, R-Sq(adj) = 99.8%]
Differences/ Ratios
• First Differences: Today – Yesterday
• Seasonal Diffs: This Q – same Q last year
• Ratio: Y(t) / Y(t-1)
• Rate of Return: 100 × (Y(t) – Y(t-1)) / Y(t-1) = 100 × (Ratio – 1)
• Log(Ratio): Log(Y(t)) – Log(Y(t-1))
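The derived quantities in the list above are all simple functions of consecutive values. A sketch with three made-up prices (not IBM data), checking the two identities stated in the list; all names here are assumptions for illustration.

```python
import math

# Hypothetical prices for three consecutive days (invented for illustration).
prices = [50.0, 51.0, 49.98]

for yesterday, today in zip(prices, prices[1:]):
    first_diff = today - yesterday
    ratio = today / yesterday
    rate_of_return = 100 * (today - yesterday) / yesterday
    log_ratio = math.log10(ratio)
    diff_of_logs = math.log10(today) - math.log10(yesterday)
    print(first_diff, ratio, rate_of_return, log_ratio)

    # Rate of return is 100 x (Ratio - 1).
    assert abs(rate_of_return - 100 * (ratio - 1)) < 1e-9
    # Log of the ratio is the first difference of the logs.
    assert abs(log_ratio - diff_of_logs) < 1e-12
```

Base-10 logs are used here to match the log10 scale of the IBM plots; the identities hold for any base.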
Financial Series- IBM Prices daily
Simple Regression of Daily Diffs vs Time
[Fitted line plot with 95% PI, Lag1diff vs t: Lag1diff = 0.01424 + 0.000109 t; S = 0.951260, R-Sq = 0.1%, R-Sq(adj) = 0.0%]
Financial Series- IBM Prices daily
Simple Regression of First Diffs of LogPrice vs Time
[Fitted line plot with 95% PI, Lag1difflog vs t: Lag1difflog = 0.000568 + 0.000000 t; S = 0.0080216, R-Sq = 0.0%, R-Sq(adj) = 0.0%]
Interpretation
$\log P_t - \log P_{t-1} = 0.00057 + 0 \times \text{time}$
$\log P_t - \log P_{t-1} \approx 0.00057$, or in $(0.00057 - 0.016,\ 0.00057 + 0.016)$, i.e. in $(-0.0154,\ 0.0166)$
$P_t / P_{t-1} = 10^{0.00057}$, or in $(10^{-0.0154},\ 10^{0.0166})$
$P_t / P_{t-1} = 1.0013$, or in $(0.96,\ 1.04)$
In summary: Rate of return 0.13% per day ± 4%
Financial Series
Day to day changes most naturally expressed as % change
price tomorrow = price today × small change
Log(price t+1) = Log(price t) + Log(small change)
Average drift per day (for logs) is 0.00057, ie about 0.13% growth per day = 61% pa
Financial Series
Confidence in future prediction
                              pt est   lo       hi
Daily change in log10(price)  0.0006   -0.015   0.0166
10^ Factor                    1.0013   0.9652   1.0390
Eg initial capital 1000
Day   pt est   lo     hi
1     1001.3   965    1039
2     1002.6   932    1079
3     1003.9   899    1122
4     1005.3   868    1165
5     1006.6   838    1211
364   1612.4   0.0    infinity
365   1614.5   0.0    infinity
61% per annum ??
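The "0.13% per day" and "61% per annum" figures both follow from compounding the average daily drift on the log10 scale. A sketch of the point-estimate column only, assuming an initial capital of 1000 and 365 compounding days as in the table; the variable names are made up for this example.

```python
# Average daily drift of log10(price), from the fitted first-difference model.
daily_log10_drift = 0.00057
initial_capital = 1000

# Back-transform to a multiplicative daily factor.
daily_factor = 10 ** daily_log10_drift

# Compounding for n days multiplies the factor n times,
# which is the same as adding the drift n times on the log scale.
days = 365
annual_factor = 10 ** (daily_log10_drift * days)
capital_after_year = initial_capital * annual_factor

print("daily factor:", round(daily_factor, 4))
print("daily growth %:", round(100 * (daily_factor - 1), 2))
print("capital after", days, "days:", round(capital_after_year, 1))
print("annual growth %:", round(100 * (annual_factor - 1)))
```

The same compounding applied to the lo and hi daily factors is what makes the interval so wide after a year, hence the "??" on the 61% figure.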
Derived Variables
• Why use derived variables?
– Adding extra variables gives more options
– Challenge
• Is there a ‘cost’?
• Which is ‘best’
– “Scientific” insight can give powerful & simple analysis