Group members:
AINIL AFIQAH BT. MOHD HAMDAN
AUWATIF BT. U-UDEEB MAHFOUZ JAZOULY
NUR HAYATUL NISAB BT. MAT SARIP
Prepared for :
MR. MOHD NOOR AZAM B. NAFI
Group :
D2 CS221 5A
Introduction :
STA 610 SAS Programming
• Source of Dataset : Journal of Statistics Education - Data Archive. (http://www.amstat.org/publications/jse/jse_data_archive.htm)
• This data set contains information for individual residential properties sold in Ames, Iowa from 2006 to 2010.
• Description of Dataset : See table
Introduction
Methodology
Result & Analysis
Conclusion
Reference
Objectives :
In order to conduct this project, we have set several objectives:
a) To check model adequacy: the normality, homogeneity of variance and independence assumptions.
b) To check the significance of the model.
c) To identify whether there is a relationship between the dependent variable and the independent variables.
d) To check for the existence of multicollinearity.
e) To find the best model.
f) To compare the group means and the overall mean.
g) To study the correlation between the variables and the sale price.
Methodology :
I. ONE-WAY ANOVA
II. ONE SAMPLE T-TEST
III. TWO INDEPENDENT SAMPLE T-TEST
IV. LINEAR REGRESSION
V. PIE AND BAR CHART
VI. CORRELATION COEFFICIENT
Result & Analysis :
Descriptive Statistics :
• Residential Low Density (1): $6,161,600
• Floating Village Residential (2): $3,360,710
• Residential Medium Density (3): $3,675,339
• Commercial (4): $1,327,476
• Residential High Density (0): $1,189,400
• Single-family Detached (0): 62.72%
• Townhouse Inside Unit (2): 1.83%
Inferential Statistics :
• Aim: to study the effect of the Zoning, LotArea, FirstFloor, SecondFloor, GarageArea, LivArea, KitQual and BldgType variables on SalePrice.
• The CORR procedure is used to compute the Pearson correlation coefficients.
• See output.
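As a rough illustration of what a Pearson correlation coefficient measures, the sketch below computes it by hand in Python. The function name `pearson_r` and the five data points are made up for illustration; they are not taken from the Ames data set or from the CORR output.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative values, NOT taken from the Ames data set.
liv_area = [900, 1200, 1500, 1800, 2100]               # sqft
sale_price = [95000, 120000, 150000, 170000, 205000]   # dollars

r = pearson_r(liv_area, sale_price)
print(round(r, 4))
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates a weak one.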
• REG procedure: used to examine the relationship between the variables.
• REG procedure: also used to check for multicollinearity in the model.
• See output.
• REG procedure: used to check model adequacy.
Normality Assumption :
Homogeneity Assumption :
Independence Assumption :
• REG procedure: used to select the best predictor variables to include in the model.
• The main approach is backward selection.
Final model (best parameters):
\[ y = -13172 + 2.17109x_{2} + 55.71420x_{3} + 30.91042x_{5} + 39.99168x_{6} + 40049x_{11} + 55296x_{12} + 18631x_{13} + 26700x_{71} - 21846x_{80} - 34193x_{81} \]
• The regression model is significant (see the hypothesis test).
• Coefficient of multiple determination: $R^2 = 0.8396$.
• Model adequacy is checked again for the final model.
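A minimal sketch of how the fitted equation can be evaluated, written in Python rather than SAS. The coefficients are copied from the final model. By the variable table, x2, x3, x5 and x6 are LotArea, FirstFloor, GarageArea and LivArea. The double-index terms (x11, x12, x13, x71, x80, x81) are assumed here to be 0/1 indicator variables for levels of Zoning, KitQual and BldgType; the slides do not define them. The function name and the example house are hypothetical.

```python
# Coefficients copied from the final model; keys are the subscripts of x.
INTERCEPT = -13172
COEFFICIENTS = {
    "x2": 2.17109, "x3": 55.71420, "x5": 30.91042, "x6": 39.99168,
    "x11": 40049, "x12": 55296, "x13": 18631,
    "x71": 26700, "x80": -21846, "x81": -34193,
}

def predict_sale_price(values):
    """Evaluate the fitted equation; predictors missing from `values` count as 0."""
    unknown = set(values) - set(COEFFICIENTS)
    if unknown:
        raise KeyError(f"not in the final model: {sorted(unknown)}")
    return INTERCEPT + sum(COEFFICIENTS[k] * v for k, v in values.items())

# Hypothetical house (illustrative values only, not a row of the data set).
# ASSUMPTION: x11 = 1 is read as an indicator for one Zoning level.
house = {"x2": 10000, "x3": 1000, "x5": 400, "x6": 1500, "x11": 1}
print(round(predict_sale_price(house), 2))
```

With every predictor set to zero the function returns the intercept, -13172, which has no practical meaning for a real property.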
• ANOVA procedure: used to perform the analysis of variance for these data.
From the output given:
Hypothesis:
$H_0$: The regression model is not significant.
$H_1$: The regression model is significant.
Test statistic: p-value < 0.0001.
Decision: Since the p-value (< 0.0001) < $\alpha = 0.05$, reject $H_0$.
Conclusion: The regression model is significant.
• TTEST procedure: used to compare the mean sale price between the zonings.
• Two-sample t-test: compares the sample mean for Floating Village Residential (2) minus the sample mean for Commercial (4). See the table of the other comparisons.
From the output given:
Hypothesis: $H_0\colon \mu_2 = \mu_4$, $H_1\colon \mu_2 \neq \mu_4$ ($\alpha = 0.05$)
Test statistic: $t^* = 9.10$
Decision rule: reject $H_0$ if $|t^*| > t_{0.025,\,28.443} = 2.04667$.
Since $t^* = 9.10 > t_{0.025,\,28.443} = 2.04667$, reject $H_0$.
Conclusion: The mean sale prices of Floating Village Residential (2) and Commercial (4) are significantly different.
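The test statistic and the Satterthwaite degrees of freedom can be reproduced from the summary statistics in the TTEST output. The Python sketch below is only a cross-check, not the SAS program used. The group sizes are an assumption: 17 sales per zoning, inferred from the reported pooled DF of 32 and Cochran DF of 16.

```python
import math

# Summary statistics read from the TTEST output for SalePrice.
mean_2, sd_2 = 197689.0, 44560.9   # Floating Village Residential (2)
mean_4, sd_4 = 78086.8, 30792.8    # Commercial (4)
n_2 = n_4 = 17                     # ASSUMPTION: inferred from pooled DF = 32

diff = mean_2 - mean_4
v2, v4 = sd_2**2 / n_2, sd_4**2 / n_4   # squared standard errors of each mean

# Pooled (equal-variance) t statistic.
pooled_var = ((n_2 - 1) * sd_2**2 + (n_4 - 1) * sd_4**2) / (n_2 + n_4 - 2)
t_pooled = diff / math.sqrt(pooled_var * (1 / n_2 + 1 / n_4))

# Satterthwaite (unequal-variance) t statistic and degrees of freedom.
t_satt = diff / math.sqrt(v2 + v4)
df_satt = (v2 + v4) ** 2 / (v2**2 / (n_2 - 1) + v4**2 / (n_4 - 1))

print(round(t_pooled, 2), round(t_satt, 2), round(df_satt, 3))
```

Under the equal-group-size assumption the pooled and Satterthwaite statistics coincide, which agrees with the output reporting the same t value for both methods.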
• TTEST procedure: used to test the overall mean sale price across the zonings (one-sample t-test).
From the output given:
Hypothesis: $H_0\colon \mu = \$200{,}000$, $H_1\colon \mu \neq \$200{,}000$ ($\alpha = 0.05$)
Test statistic: $t^* = -18.97$, p-value < 0.0001
Decision rule: reject $H_0$ if $|t^*| > t_{0.025,\,228} \approx 1.970$, or equivalently if the p-value is below $\alpha$.
Since the p-value (< 0.0001) < $\alpha = 0.05$, reject $H_0$.
Conclusion: The mean sale price is significantly different from $200,000.
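The one-sample statistic can be checked from the summary line of the TTEST output (N, mean and standard deviation). This Python sketch is a cross-check only; the variable names are illustrative.

```python
import math

n, mean, sd = 229, 127594.0, 57758.0   # from the TTEST output
mu_0 = 200000.0                        # hypothesised mean under H0

std_err = sd / math.sqrt(n)            # standard error of the mean
t_stat = (mean - mu_0) / std_err       # one-sample t statistic
df = n - 1                             # degrees of freedom

print(round(std_err, 1), round(t_stat, 2), df)
```

The large negative t value reflects a sample mean far below the hypothesised $200,000.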
Conclusion :
• Since the data are normally distributed, parametric tests are used.
• The model adequacy assumptions are all satisfied.
• Several factors significantly affect the sale price.
• The best model is:
\[ y = -13172 + 2.17109x_{2} + 55.71420x_{3} + 30.91042x_{5} + 39.99168x_{6} + 40049x_{11} + 55296x_{12} + 18631x_{13} + 26700x_{71} - 21846x_{80} - 34193x_{81} \]
Reference :
• Buchecker, M., Calhoun, S. and Repole, W. (2004). SAS Programming I: Essentials. USA: SAS Institute Inc.
• Kutner, M. H., Nachtsheim, C. J., Neter, J. and Li, W. (2005). Applied Linear Statistical Models. New York: McGraw Hill.
• Daniel, W. W. (1990). Applied Nonparametric Statistics. USA: Brooks/Cole Cengage Learning.
Description of Dataset :

| Variable | Description | Unit / Coding |
|---|---|---|
| X1 = Zoning | The general zoning classification of the sale | 0 = Residential High Density; 1 = Residential Low Density; 2 = Floating Village Residential; 3 = Residential Medium Density; 4 = Commercial |
| X2 = LotArea | Lot size | Square feet (sqft) |
| X3 = FirstFloor | Area of first floor | Square feet (sqft) |
| X4 = SecondFloor | Area of second floor | Square feet (sqft) |
| X5 = GarageArea | Area of garage | Square feet (sqft) |
| X6 = LivArea | Size of living area | Square feet (sqft) |
| X7 = KitQual | Kitchen quality | 0 = Excellent; 1 = Good; 2 = Typical/Average |
| X8 = BldgType | Type of building | 0 = Single-family Detached; 1 = Townhouse End Unit; 2 = Townhouse Inside Unit; 3 = Duplex; 4 = Two-family Conversion |
| Y = SalePrice | Selling price | Dollar ($) |
• Hypothesis (significance of the regression model): $H_0\colon \beta_j = 0$ for all $j$; $H_1\colon \beta_j \neq 0$ for at least one $j$.
• Test statistic: p-value < 0.0001.
• Decision: Since the p-value (< 0.0001) < $\alpha = 0.05$, reject $H_0$.
• Conclusion: The regression model is significant.
The SAS System
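The TTEST Procedure
Variable: SalePrice (SalePrice)

| Zoning | Method | Mean | 95% CL Mean (lower) | 95% CL Mean (upper) | Std Dev | 95% CL Std Dev (lower) | 95% CL Std Dev (upper) |
|---|---|---|---|---|---|---|---|
| 2 | | 197689 | 174778 | 220600 | 44560.9 | 33187.6 | 67818.6 |
| 4 | | 78086.8 | 62254.6 | 93919.0 | 30792.8 | 22933.5 | 46864.4 |
| Diff (1-2) | Pooled | 119602 | 92842.8 | 146361 | 38300.6 | 30800.9 | 50659.9 |
| Diff (1-2) | Satterthwaite | 119602 | 92711.0 | 146493 | | | |

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Pooled | Equal | 32 | 9.10 | <.0001 |
| Satterthwaite | Unequal | 28.443 | 9.10 | <.0001 |
| Cochran | Unequal | 16 | 9.10 | <.0001 |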
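Two-sample comparisons of mean sale price between zonings (two-sided test; reject $H_0$ when the absolute t-value exceeds the tabulated value at $\alpha/2 = 0.025$):

| Zoning | t-value | df | t-tabulated ($\alpha/2 = 0.025$) | Decision | Conclusion |
|---|---|---|---|---|---|
| 0 and 1 | -4.44 | 17.86 | 2.10226 | Reject $H_0$ | Significant difference |
| 0 and 2 | -4.95 | 21.99 | 2.07406 | Reject $H_0$ | Significant difference |
| 0 and 3 | 0.03 | 15.642 | 2.12394 | Fail to reject $H_0$ | No significant difference |
| 0 and 4 | 2.95 | 16.359 | 2.11641 | Reject $H_0$ | Significant difference |
| 1 and 2 | -1.24 | 32.656 | 2.03642 | Fail to reject $H_0$ | No significant difference |
| 1 and 3 | 6.06 | 62.535 | 1.99916 | Reject $H_0$ | Significant difference |
| 1 and 4 | 9.56 | 44.307 | 2.01648 | Reject $H_0$ | Significant difference |
| 2 and 3 | 6.20 | 28.793 | 2.04562 | Reject $H_0$ | Significant difference |
| 2 and 4 | 9.10 | 28.443 | 2.04667 | Reject $H_0$ | Significant difference |
| 3 and 4 | 4.01 | 39.196 | 2.02269 | Reject $H_0$ | Significant difference |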
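As a sketch, the two-sided decision rule can be applied mechanically to each zoning pair by comparing the absolute t-value with the absolute tabulated value. The numbers below are copied from the comparison table; the helper name `decide` is illustrative.

```python
# (zoning pair, t value, df, tabulated t) copied from the comparison table.
rows = [
    ("0 and 1", -4.44, 17.86, -2.10226),
    ("0 and 2", -4.95, 21.99, -2.07406),
    ("0 and 3", 0.03, 15.642, 2.12394),
    ("0 and 4", 2.95, 16.359, 2.11641),
    ("1 and 2", -1.24, 32.656, -2.03642),
    ("1 and 3", 6.06, 62.535, 1.99916),
    ("1 and 4", 9.56, 44.307, 2.01648),
    ("2 and 3", 6.20, 28.793, 2.04562),
    ("2 and 4", 9.10, 28.443, 2.04667),
    ("3 and 4", 4.01, 39.196, 2.02269),
]

def decide(t_value, t_tab):
    """Two-sided rule: reject H0 when |t| exceeds the tabulated value."""
    return "reject H0" if abs(t_value) > abs(t_tab) else "fail to reject H0"

decisions = {pair: decide(t, crit) for pair, t, _df, crit in rows}
for pair, decision in decisions.items():
    print(pair, "->", decision)
```

Using absolute values matters for the pairs with negative t-values: comparing a negative statistic directly with a negative tabulated value would reverse the decision.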
One - Mean Comparison of Sale Price Given BY Zoning
The TTEST Procedure
Variable: SalePrice (SalePrice)
Frequency: Zoning Zoning
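| N | Mean | Std Dev | Std Err | Minimum | Maximum |
|---|---|---|---|---|---|
| 229 | 127594 | 57758.0 | 3816.8 | 12789.0 | 289000 |

| Mean | 95% CL Mean (lower) | 95% CL Mean (upper) | Std Dev | 95% CL Std Dev (lower) | 95% CL Std Dev (upper) |
|---|---|---|---|---|---|
| 127594 | 120073 | 135114 | 57758.0 | 52908.2 | 63594.2 |

| DF | t Value | Pr > \|t\| |
|---|---|---|
| 228 | -18.97 | <.0001 |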