sabyasachi-sahu
Report on Sarva Siksha Abhiyan through regression
Regression Analysis
Regression analysis is a statistical tool for establishing relationships among variables. It includes many techniques for modelling and analysing several variables when the focus is on the relationship between a dependent variable (or ‘response’) and one or more independent variables (or ‘predictors’). More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. It is also used for assessing the “statistical significance” of the estimated relationships, that is, the degree of confidence that the true relationship is close to the estimated relationship.
Typically, a regression analysis is used for one (or more) of three purposes:
1. Prediction of the target variable (forecasting).
2. Modelling the relationship between x and y.
3. Testing of hypotheses.
A typical linear regression model has n sets of observations {x1i, x2i, . . . , xki, yi}, i = 1, . . . , n, and satisfies the linear relationship
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
where Y is the dependent (response) variable, which varies with changes in the independent (explanatory) variables X1, …, Xk; β0, β1, …, βk are the coefficients estimated by the regression; and ε is the error term, which represents the combined effect of all factors not included in the model.
Tools for Analysing a Regression Model
R-square: This measures the goodness of fit of the regression model. It is equal to one minus the ratio of the sum of squared estimated errors (the deviations of the actual values of the dependent variable from the regression line) to the sum of squared deviations about the mean of the dependent variable. Hence, the R-square statistic measures the extent to which the total variation of the dependent variable is explained by the regression.
T statistic: The t statistic is the coefficient divided by its standard error. This statistic tests the null hypothesis that the actual value of the coefficient is zero. The larger the absolute value of t, the less likely it is that the actual value of the parameter could be zero. As a rule of thumb, the absolute value of t should be greater than 2.
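P-value: It is a measure of the statistical significance of the regression coefficient: the probability of obtaining a t statistic at least as extreme as the one observed if the true coefficient were zero. In other words, a predictor with a low p-value (conventionally below 0.05) is likely to be a meaningful addition to the model, because changes in the predictor's value are related to changes in the response variable.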
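The R-square and t statistic described above can be illustrated with a minimal sketch for a one-predictor regression. The data points are hypothetical and are not taken from the SSA data set, and the function name simple_ols is an assumption made for this illustration; the report's own figures come from SPSS.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares.

    Returns (b0, b1, r_square, t_b1), where t_b1 is the t statistic of the slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared estimated errors and total sum of squares about the mean
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - sse / sst
    # Standard error of the slope, using n - 2 degrees of freedom
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_b1 = b1 / se_b1
    return b0, b1, r_square, t_b1

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r_square, t_b1 = simple_ols(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R-square = {r_square:.4f}, t = {t_b1:.2f}")
```

The p-value is left out of the sketch because it requires the t distribution, which a statistics package would normally supply.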
Need for Regression Analysis on Sarva Siksha Abhiyan
The regression analysis is needed to establish a relation between the expenditure made by the government under SSA (independent variable) and the enrollment in the different states (dependent variable), which is the outcome the government ultimately cares about.
Descriptive analysis, hypothesis tests and the test for variance (across zones) helped us narrow our scope from the whole country to the state of Jammu and Kashmir.
We also check the correlation between enrollment and expenditure and other factors, viz. number of teachers and number of schools, in Jammu and Kashmir (least literate) and Delhi (most literate), and then compare the results with the pan-India model.
Expectations from the Regression Analyses
By looking at the t-ratios and the p-values, the regression analysis should be able to give strong evidence in support of including the explanatory variables in the model.
High values of R-square, indicating that the regression model explains most of the variation in enrolment.
Compare the β-weights of the explanatory variables and rank them in order of their explanatory significance.
Based on our secondary research, our regression analyses are divided into four categories:
1. Regression Analysis: Pan India
2. Regression Analysis for the least literate state (Jammu and Kashmir)
3. Regression Analysis for the most literate state (Delhi)
4. Regression Analysis for Punjab
Regression Analysis: Pan India
Figure : Regression Analysis: Pan India- Enrolment versus Expenditure
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound
1 (Constant) | -84457.918 | 503184.074 | | -.168 | .868 | -1108193.614 | 939277.777
1 Expenditure1011 | 103.097 | 5.621 | .954 | 18.342 | .000 | 91.662 | 114.533
Figure: P values and T stat for Pan India
Regression Equation: Pan India (before adjustments)
Yenrollment = -84457.918 + 103.097Xexpenditure + ε
Observations:
R2 = 0.911
1. The R-square value for the simple linear regression between enrolment and expenditure is about 91.1%. This means that the actual values lie close to the regression line, which indicates a good fit. The explanatory variable “expenditure” has a strong correlation with the response variable “enrolment”, without significant variance around the line.
2. The p value for the constant is 0.868, which means that β0 is statistically insignificant. The p value for expenditure, however, is reported as .000 (that is, below 0.0005), which means that the explanatory variable is highly significant and that it is very unlikely that the value of β1 was obtained by chance.
3. The t stat for the constant is -0.168, which does not let us reject the null hypothesis that β0 is equal to 0. For the explanatory variable “expenditure”, however, the t stat is 18.342, so we can reject the null hypothesis that β1 is equal to 0.
4. The estimate of β0 is negative and that of β1 is positive.
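The t statistics quoted in these observations can be cross-checked from the coefficient table, since t is the coefficient divided by its standard error. The sketch below uses only the values reported for Pan India; the helper names are assumptions introduced for illustration. It also recovers the critical t value implied by the 95% confidence interval, as the half-width of the interval divided by the standard error.

```python
# Values copied from the Pan India coefficient table:
# name -> (B, Std. Error, Lower Bound, Upper Bound)
reported = {
    "(Constant)": (-84457.918, 503184.074, -1108193.614, 939277.777),
    "Expenditure1011": (103.097, 5.621, 91.662, 114.533),
}

def t_statistic(b, std_error):
    """t statistic = coefficient divided by its standard error."""
    return b / std_error

def implied_critical_value(lower, upper, std_error):
    """Half-width of the confidence interval divided by the standard error."""
    return (upper - lower) / 2 / std_error

for name, (b, se, lower, upper) in reported.items():
    t = t_statistic(b, se)
    crit = implied_critical_value(lower, upper, se)
    print(f"{name}: t = {t:.3f}, implied critical t = {crit:.3f}")
```

Small differences in the last decimal place are expected, because the table values are themselves rounded.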
Regression Equation: Pan India (after adjustments)
Yenrollment = 103.097Xexpenditure + ε
Regression Analysis: Jammu and Kashmir
Proceeding to the regression analysis for Jammu and Kashmir, the least literate state in North India, we consider three factors that could affect the enrolment of students in the state, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools
The results from the regression are as follows:
Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-square value:
31.4% for enrolment vs. expenditure
76.3% for enrolment vs. teachers
86.6% for enrolment vs. schools
This means that the actual values are somewhat farther from the regression line for the explanatory variable “expenditure” and comparatively closer to it for the explanatory variables “schools” and “teachers”. As an indication of goodness of fit, “expenditure” has a weak correlation with the response variable “enrolment”, whereas “schools” and “teachers” have a strong correlation with it.
2. P-Value and t-statistics:
(a) For the explanatory variable “expenditure”, the p value of 0.191 and t statistic of 1.512 for β1 do not let us reject the null hypothesis that β1 is equal to zero; its value is statistically insignificant.
(b) For the explanatory variable “schools”, the p-value of β0 is 0.048 and that of β1 is 0.002. Both coefficients are therefore statistically significant at the 5% level.
(c) Similarly, the t statistics for the same variable for β0 and β1 are 2.603 and 5.679 respectively. These values let us reject the null hypothesis that β0 and β1 are equal to zero.
3. Sign of coefficients: The estimates of both β0 and β1 are positive for both explanatory variables, “expenditure” and “schools”.
Note: SPSS gives the β0 and β1 values only for the variables with the highest and the lowest R-square; hence we do not have the p-values, t-statistics or β0 and β1 values for the explanatory variable “teachers”.
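The Model Summary for enrolment vs. expenditure in J&K (see the appendix) reports R = .560, R Square = .314 and Adjusted R Square = .177. The sketch below shows how these three numbers relate, using the standard adjusted R-square formula. The number of observations is not stated in the report; n = 7 is an assumption, chosen because it reproduces the reported adjusted value.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R-square: penalises R-square for the number of predictors."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values reported in the J&K Model Summary
r = 0.560
reported_r_square = 0.314

# ASSUMPTION: the sample size is not given in the report; 7 is inferred here
n_obs = 7

# With one predictor, R-square is the square of the correlation R
print(f"R squared from R: {r ** 2:.3f}")
adj = adjusted_r_square(reported_r_square, n_obs, n_predictors=1)
print(f"Adjusted R-square with n = {n_obs}: {adj:.3f}")
```

The gap between 31.4% and 17.7% shows how strongly a small sample penalises the fit once the adjustment is applied.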
Regression Analysis: Delhi
Proceeding to analysis of regression done upon Delhi, which is the most literate state in North India, we come across three factors that could affect the enrolment of students in Delhi viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools
The results from the regression are as follows:
Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-square value:
6.8% for enrolment vs. expenditure
1.6% for enrolment vs. teachers
5.2% for enrolment vs. schools
This means that the actual values are far from the regression line for all the explanatory variables, viz. “expenditure”, “schools” and “teachers”. As an indication of goodness of fit, all the explanatory variables have a weak correlation with the response variable “enrolment”.
2. P-Value and t-statistics:
(a) For the explanatory variable “expenditure”, the p value of 0.573 and t statistic of -0.603 for β1 do not let us reject the null hypothesis that β1 is equal to zero; its value is statistically insignificant.
(b) For the explanatory variables “teachers” and “schools”, the p-values of the coefficients are 0.357 and 0.398 respectively. Both coefficients are statistically insignificant; changes in these variables have no statistically detectable effect on the response variable “enrolment”.
(c) For the explanatory variables “teachers” and “schools”, the t statistics are 1.039 and -0.945 respectively. With these values we fail to reject the null hypothesis that the coefficients of both variables are equal to zero.
3. Sign of coefficients: The coefficient is positive for “teachers” and negative for “schools”; the negative t statistic for “expenditure” shows that its coefficient is negative as well.
Regression Analysis: Punjab
Proceeding to the regression analysis for Punjab, a randomly selected state in North India, we consider three factors that could affect the enrolment of students in Punjab, viz.
(i) Expenditure
(ii) Number of teachers
(iii) Number of schools
The results from the regression are as follows:
Observations: Enrolment vs. (Expenditure, schools and teachers)
1. R-values: Since this is a multiple linear regression, instead of R-square values we report the Pearson correlation values, given below.
Pearson Correlation
Variable | Enrolment | Expenditure | Teachers | Schools
Enrolment | 1.000 | .703 | .844 | .666
Expenditure | .703 | 1.000 | .958 | .823
Teachers | .844 | .958 | 1.000 | .811
Schools | .666 | .823 | .811 | 1.000
The above table shows that, for Punjab, the response variable “enrolment” has a fair degree of correlation to the explanatory variables “expenditure”, “teachers” and “schools”. The R-square for each can be computed by just squaring the results obtained above. In short, it says that the goodness of fit of the regression for Punjab shows promising results.
2. P-Value and t-statistics:
(a) For the explanatory variable “expenditure”, the p value of 0.085 and t statistic of -2.140 for β1 do not let us reject the null hypothesis (at the 5% level) that β1 is equal to zero; its value is statistically insignificant.
(b) For the explanatory variable “schools”, the p value of 0.902 and t statistics of -0.129 for β2 does not let us reject the null hypothesis that value of β2 is equal to zero and that its value is statistically insignificant.
(c) For the explanatory variable “teachers”, the p value of 0.008 and t statistic of 3.854 for β3 let us reject the null hypothesis that β3 is equal to zero; its value is statistically significant, i.e. the effect of “teachers” on “enrolment” is very significant and a change in “teachers” affects enrolment directly.
3. Sign of coefficients: The sign of coefficient β3 is positive and that of β1 and β2 is negative.
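As noted in the observations, the R-square of each single-variable relationship with enrolment can be obtained by squaring the Pearson correlation. A minimal sketch using the correlations reported for Punjab follows; the variable names are chosen here for illustration.

```python
# Pearson correlations with enrolment, copied from the Punjab correlation table
pearson_r = {"expenditure": 0.703, "teachers": 0.844, "schools": 0.666}

# For a one-predictor regression, R-square equals the squared correlation
r_square = {name: r ** 2 for name, r in pearson_r.items()}

# Rank the explanatory variables by how much variation in enrolment each explains
ranked = sorted(r_square.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name}: R-square = {value:.3f}")
```

These are the R-square values of three separate simple regressions; they are not additive, because the explanatory variables are strongly correlated with one another.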
Appendix:
Figure: R-square of Enrolment vs. Expenditure for J&K
Model Summary (b)
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .560 (a) | .314 | .177 | 170705.83059
a. Predictors: (Constant), Expenditure
b. Dependent Variable: Enrollment
Figure: Model Summary of Enrolment vs. Expenditure for J&K
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound
1 (Constant) | 1536126.071 | 243313.512 | | 6.313 | .001 | 910668.777 | 2161583.364
1 Expenditure | 10.835 | 7.165 | .560 | 1.512 | .191 | -7.583 | 29.253
a. Dependent Variable: Enrollment
Figure: p-values, t-statistics and coefficients for Enrolment vs. Expenditure for J&K
Figure: R-square for enrolment vs. number of teachers for J&K
Figure: R-square for enrolment vs. number of schools for J&K
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound
1 (Constant) | 597537.003 | 229547.572 | | 2.603 | .048 | 7466.184 | 1187607.823
1 schools | 68.785 | 12.113 | .930 | 5.679 | .002 | 37.647 | 99.923
a. Dependent Variable: Enrollment
Figure: p-values, t-statistics and coefficients for Enrolment vs. schools for J&K
Figure: R-square of Enrolment vs. expenditure for Delhi
Figure: R-square of Enrolment vs. schools for Delhi
Figure: R-square of Enrolment vs. teachers for Delhi
Coefficients (a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound
1 (Constant) | 992840.056 | 119059.707 | | 8.339 | .001 | 662277.315 | 1323402.796
1 Teachers | 10.646 | 10.242 | 1.965 | 1.039 | .357 | -17.789 | 39.081
1 Schools | -173.551 | 183.640 | -1.787 | -.945 | .398 | -683.417 | 336.314
a. Dependent Variable: enrollment
Figure: p-values, t-statistics and coefficients for Enrolment vs. schools and teachers for Delhi
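Correlations
Statistic | Variable | Enrollment | Expenditure | teachers | Schools
Pearson Correlation | Enrollment | 1.000 | .703 | .844 | .666
Pearson Correlation | Expenditure | .703 | 1.000 | .958 | .823
Pearson Correlation | teachers | .844 | .958 | 1.000 | .811
Pearson Correlation | Schools | .666 | .823 | .811 | 1.000
Sig. (1-tailed) | Enrollment | | .026 | .004 | .036
Sig. (1-tailed) | Expenditure | .026 | | .000 | .006
Sig. (1-tailed) | teachers | .004 | .000 | | .007
Sig. (1-tailed) | Schools | .036 | .006 | .007 |
N | Enrollment | 8 | 8 | 8 | 8
N | Expenditure | 8 | 8 | 8 | 8
N | teachers | 8 | 8 | 8 | 8
N | Schools | 8 | 8 | 8 | 8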
Figure: Correlations for Enrolment vs. expenditure, schools and teachers for Punjab
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95.0% CI for B, Lower Bound | 95.0% CI for B, Upper Bound | Zero-order Correlation | Partial Correlation | Part Correlation
1 (Constant) | 788771.323 | 113848.449 | | 6.928 | .000 | 510194.205 | 1067348.442 | | |
1 teachers | 10.765 | 2.793 | .844 | 3.854 | .008 | 3.931 | 17.600 | .844 | .844 | .844
Figure: Coefficients, p values and t stats for Enrolment vs. teachers
Model | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics: Tolerance
1 Expenditure | -1.299 (b) | -2.140 | .085 | -.691 | .082
1 Schools | -.053 (b) | -.129 | .902 | -.058 | .343
Figure: p values and t stats for Enrolment vs. expenditure and schools
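The tolerance column in the table above can be related to the Punjab correlation table. Assuming that “teachers” is the only predictor already in the fitted model, as the coefficients table suggests, the tolerance of a candidate variable is one minus its squared correlation with “teachers”, and the variance inflation factor (VIF) is the reciprocal of the tolerance. The sketch below is an illustration under that assumption, with helper names introduced here.

```python
def tolerance(r_with_included_predictor):
    """Tolerance of a candidate variable when one predictor is already in the model."""
    return 1 - r_with_included_predictor ** 2

def variance_inflation_factor(tol):
    """VIF is the reciprocal of the tolerance."""
    return 1 / tol

# Correlations with "teachers", copied from the Punjab correlation table
r_with_teachers = {"Expenditure": 0.958, "Schools": 0.811}

for name, r in r_with_teachers.items():
    tol = tolerance(r)
    vif = variance_inflation_factor(tol)
    print(f"{name}: tolerance = {tol:.3f}, VIF = {vif:.1f}")
```

A tolerance as low as the one reported for expenditure means that expenditure carries little information beyond what “teachers” already provides, which is consistent with its coefficient being statistically insignificant.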