
SSA_Report.docx


DESCRIPTION

Report on Sarva Siksha Abhiyan through regression


Page 1: SSA_Report.docx

Regression Analysis

Regression analysis is a statistical tool for establishing the relationships among variables. It includes many techniques for modelling and analysing several variables when the focus is on the relationship between a dependent variable (or 'response') and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed. It is also used for assessing the “statistical significance” of the estimated relationships, that is, the degree of confidence that the true relationship is close to the estimated relationship.

Typically, a regression analysis is used for one (or more) of three purposes:

(i) Prediction of the target variable (forecasting)
(ii) Modelling the relationship between x and y
(iii) Testing of hypotheses

A typical linear regression model has n sets of observations {x1i, x2i, . . . , xki, yi}, i = 1, . . . , n, and satisfies the linear relationship

Y = β0 + β1X1 + β2X2 + … + βkXk + ε

where Y is the dependent (response) variable, which varies with changes in the independent (explanatory) variables X1, . . . , Xk; the βj are the coefficients estimated by the regression; and ε represents the combined effect of all other factors not included in the model.

Tools for Analysing a Regression Model

R-square: This states the goodness of fit of the regression model. It is equal to one minus the ratio of the sum of squared estimated errors (the deviation of the actual value of the dependent variable from the regression line) to the sum of squared deviations about the mean of the dependent variable. Hence, the R2 statistic is a measure of the extent to which the total variation of the dependent variable is explained by the regression.

T statistic: The t statistic is the coefficient divided by its standard error. It tests the null hypothesis that the actual value of the coefficient is zero. The larger the absolute value of t, the less likely it is that the actual value of the parameter is zero. As a rule of thumb, the absolute value of t should be greater than 2.

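P-value: It is a measure of the statistical significance of the regression coefficient: the probability of obtaining a t statistic at least as extreme as the one observed if the actual value of the coefficient were zero. In other words, a predictor that has a low p-value (conventionally below 0.05) is likely to be a meaningful addition to the model because changes in the predictor's value are related to changes in the response variable.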
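The diagnostics described above can be illustrated on a small made-up data set. The sketch below is not the SPSS computation used in this report; the data points and the helper name `simple_ols` are hypothetical and serve only to show how R-square and the t statistic of the slope follow from their definitions in a single-predictor regression.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares.

    Returns the intercept, the slope, R-square and the t statistic of the slope.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Sum of squared estimated errors (deviations from the regression line).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Sum of squared deviations about the mean of the dependent variable.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_square = 1 - sse / sst
    # Standard error of the slope, with n - 2 residual degrees of freedom.
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_b1 = b1 / se_b1
    return b0, b1, r_square, t_b1

# Hypothetical data, not taken from the SSA data set.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r_square, t_b1 = simple_ols(x, y)
print(f"b0={b0:.3f} b1={b1:.3f} R-square={r_square:.4f} t(b1)={t_b1:.2f}")
```

In this toy example the absolute value of t is far above 2, so the null hypothesis of a zero slope would be rejected.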


Need for Regression Analysis on Sarva Siksha Abhiyan

The regression analysis is needed in order to establish a relation between the expenditure (independent variable) made by the government under SSA and the enrolment (dependent variable) in the different states, which is the outcome the government eventually seeks.

Subsequently, the descriptive analysis, the hypothesis tests and the test for variance (across zones) helped us narrow our scope from the whole country to the state of Jammu and Kashmir.

We then check the correlation between enrolment and expenditure, as well as other factors, viz. number of teachers and number of schools, in Jammu and Kashmir (least literate) and Delhi (most literate), and compare the results with the pan India model.

Expectations from the Regression Analyses

By looking at the significance levels of the t-ratios and the p-values, the regression analysis should be able to give strong evidence in support of including the explanatory variables in the model.

The R-square values should be high, so that the goodness of fit of the regression model is established.

The β-weights of the explanatory variables should allow us to compare the variables and rank them in order of their explanatory significance.

With regard to our secondary research, our regression analysis is divided into four categories, as follows:

1. Regression Analysis: Pan India
2. Regression Analysis for the least literate state (Jammu and Kashmir)
3. Regression Analysis for the most literate state (Delhi)
4. Regression Analysis for Punjab


Regression Analysis: Pan India

Figure: Regression Analysis: Pan India - Enrolment versus Expenditure

Model 1           B            Std. Error   Beta   t        Sig.   95% CI Lower    95% CI Upper
(Constant)        -84457.918   503184.074   n/a    -.168    .868   -1108193.614    939277.777
Expenditure1011   103.097      5.621        .954   18.342   .000   91.662          114.533

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and the last two columns give the 95.0% confidence interval for B.

Figure: P values and T stat for Pan India
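The entries of the Pan India coefficients table can be cross-checked against one another. The sketch below uses only the figures reported in that table; the function name `check_row` is hypothetical. It recomputes the t statistic as B divided by its standard error, confirms that each confidence interval is centred on B, and backs out the multiplier of the standard error implied by the interval width. The number of observations is not stated in this report, so the multiplier is derived rather than assumed.

```python
# Values copied from the Pan India coefficients table.
rows = {
    "(Constant)": {"b": -84457.918, "se": 503184.074, "t": -0.168,
                   "lower": -1108193.614, "upper": 939277.777},
    "Expenditure1011": {"b": 103.097, "se": 5.621, "t": 18.342,
                        "lower": 91.662, "upper": 114.533},
}

def check_row(row):
    """Return (t ratio, interval midpoint, implied multiplier of the standard error)."""
    t_ratio = row["b"] / row["se"]
    midpoint = (row["lower"] + row["upper"]) / 2
    multiplier = (row["upper"] - row["lower"]) / (2 * row["se"])
    return t_ratio, midpoint, multiplier

results = {name: check_row(row) for name, row in rows.items()}
for name, (t_ratio, midpoint, multiplier) in results.items():
    print(f"{name}: t={t_ratio:.3f} midpoint={midpoint:.3f} multiplier={multiplier:.3f}")
```

Both rows give a multiplier of about 2.03, which is what one would expect if the intervals are B plus or minus a critical t value times the standard error.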

Regression Equation: Pan India (before adjustments)

Yenrollment = -84457.91 + 103.097Xexpenditure + ε

Observations:

R2 = 0.911


1. The R-square value for the simple linear regression between enrolment and expenditure is around 91.1%. This means that the actual values lie close to the regression line, which indicates a good fit of the regression. The explanatory variable “Expenditure” has a strong correlation with the response variable “Enrolment”, without significant variances.

2. The p-value for the constant is 0.868, which states that β0 is statistically insignificant. The p-value for expenditure, however, is reported as .000 (that is, below 0.0005), which states that the inclusion of the explanatory variable is highly significant and that it is very unlikely that the value of β1 was obtained by chance.

3. The t statistic for the constant is -0.168, which does not let us reject the null hypothesis that β0 is equal to 0. For the explanatory variable “expenditure”, however, the t statistic is 18.342, which lets us reject the null hypothesis that β1 is equal to 0.

4. The estimate of β0 is negative and that of β1 is positive.

Regression Equation: Pan India (after adjustments)

Yenrollment = 103.097Xexpenditure + ε
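Dropping the constant changes the model being fitted, so the slope of a line forced through the origin is in general not the same as the slope estimated together with an intercept. The report does not state whether 103.097 was re-estimated after the adjustment. The sketch below uses made-up data (not the SSA data) and hypothetical function names to show how the two slopes can differ.

```python
def slope_with_intercept(x, y):
    """Least-squares slope of y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_through_origin(x, y):
    """Least-squares slope of y = b1*x (no constant term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical data generated from y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b1_with_constant = slope_with_intercept(x, y)
b1_no_constant = slope_through_origin(x, y)
print(b1_with_constant, b1_no_constant)
```

Here the slope with a constant is 2, while the slope through the origin is 70/30, or about 2.33, because the line must also absorb the omitted intercept.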

Regression Analysis: Jammu and Kashmir

Proceeding to the regression analysis for Jammu and Kashmir, which is the least literate state in North India, we find that there are three factors that could affect the enrolment of students in Jammu and Kashmir, viz.

(i) Expenditure
(ii) Number of teachers
(iii) Number of schools

The results from the regression are as follows:

Observations: Enrolment vs. (Expenditure, schools and teachers)

1. R-square value:
31.4% for enrolment vs. expenditure
76.3% for enrolment vs. teachers
86.6% for enrolment vs. schools
This means that the actual values lie somewhat farther from the regression line for the explanatory variable “expenditure” and comparatively closer to the regression line for the explanatory variables “schools” and “teachers”. As an indication of the goodness of fit of the regression, the explanatory variable “expenditure” has a weak correlation, whereas “schools” and “teachers” have a strong correlation, with the response variable “enrolment.”

2. P-Value and t-statistics:
(a) For the explanatory variable “expenditure”, the p-value of 0.19 and the t statistic of 1.512 for β1 do not let us reject the null hypothesis that the value of β1 is equal to zero; its value is statistically insignificant.

(b) For the explanatory variable “schools”, the p-value of β0 is 0.048 and that of β1 is 0.002. This states that both these coefficients are statistically significant.

(c) Similarly, the t statistics for β0 and β1 of the same variable are 2.60 and 5.679 respectively. These values state that we can reject the null hypothesis that the values of β0 and β1 are equal to zero.

3. Sign of coefficients: Both β0 and β1 are positive for both the explanatory variables “expenditure” and “schools”.

1 SPSS gives the β0 and β1 values only for the explanatory variables with the highest and the lowest R-square; hence we do not have the p-values, the t-statistics or the β1 and β0 values for the explanatory variable “teachers”.

Regression Analysis: Delhi

Proceeding to the regression analysis for Delhi, which is the most literate state in North India, we come across three factors that could affect the enrolment of students in Delhi, viz.

(i) Expenditure
(ii) Number of teachers
(iii) Number of schools

The results from the regression are as follows:

Observations: Enrolment vs. (Expenditure, schools and teachers)

1. R-square value:
6.8% for enrolment vs. expenditure
1.6% for enrolment vs. teachers
5.2% for enrolment vs. schools
This means that the actual values lie far from the regression line for all the explanatory variables, viz. “expenditure”, “schools” and “teachers”. As an indication of the goodness of fit of the regression, all the explanatory variables have a weak correlation with the response variable “enrolment”.

2. P-Value and t-statistics:
(a) For the explanatory variable “expenditure”, the p-value of 0.573 and the t statistic of -0.603 for β1 do not let us reject the null hypothesis that the value of β1 is equal to zero; its value is statistically insignificant.

(b) For the explanatory variables “teachers” and “schools”, the p-values of the coefficients are 0.357 and 0.398 respectively. This states that both these coefficients are statistically insignificant and that changes in these variables have no statistically detectable effect on the response variable “enrolment.”

(c) For the explanatory variables “teachers” and “schools”, the t statistics of the coefficients are 1.039 and -0.945 respectively. These values state that we fail to reject the null hypothesis that the coefficients of both these variables are equal to zero.

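3. Sign of coefficients: According to the coefficients table in the appendix, the coefficient is positive for “teachers” (10.646) and negative for “schools” (-173.551). The negative t statistic of -0.603 for “expenditure” implies that its coefficient is also negative.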

Regression Analysis: Punjab

Proceeding to the regression analysis for Punjab, which is a randomly chosen state in North India, we come across three factors that could affect the enrolment of students in Punjab, viz.

(i) Expenditure
(ii) Number of teachers
(iii) Number of schools

The results from the regression are as follows:

Observations: Enrolment vs. (Expenditure, schools and teachers)

1. R-values: Since this is a multiple linear regression, instead of R-square values we report the Pearson correlation values, which are given below.

Pearson Correlation

              Enrolment   Expenditure   Teachers   Schools
Enrolment     1.000       .703          .844       .666
Expenditure   .703        1.000         .958       .823
Teachers      .844        .958          1.000      .811
Schools       .666        .823          .811       1.000


The above table shows that, for Punjab, the response variable “enrolment” has a fair degree of correlation with the explanatory variables “expenditure”, “teachers” and “schools”. The R-square of the regression of enrolment on each explanatory variable taken alone can be computed by squaring the corresponding correlation above. In short, the goodness of fit of the regression for Punjab shows promising results.

2. P-Value and t-statistics:
(a) For the explanatory variable “expenditure”, the p-value of 0.085 and the t statistic of -2.140 for β1 do not let us reject, at the 5% level, the null hypothesis that the value of β1 is equal to zero; its value is statistically insignificant.

(b) For the explanatory variable “schools”, the p-value of 0.902 and the t statistic of -0.129 for β2 do not let us reject the null hypothesis that the value of β2 is equal to zero; its value is statistically insignificant.

(c) For the explanatory variable “teachers”, the p-value of 0.008 and the t statistic of 3.854 for β3 let us reject the null hypothesis that the value of β3 is equal to zero; its value is statistically significant, i.e. the effect of “teachers” on “enrolment” is very significant and a change in “teachers” affects enrolment directly.

3. Sign of coefficients: The sign of coefficient β3 is positive and that of β1 and β2 is negative.
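The squaring step mentioned under the Punjab correlation table can be sketched as follows. The correlations are those reported in the table; the resulting values are the R-square of enrolment regressed on each explanatory variable taken alone, not the R-square of a regression on all three variables together.

```python
# Pearson correlations of each explanatory variable with enrolment (Punjab).
pearson_r = {"Expenditure": 0.703, "Teachers": 0.844, "Schools": 0.666}

# For a single-predictor regression, R-square is the squared correlation.
r_square = {name: r ** 2 for name, r in pearson_r.items()}

for name, value in r_square.items():
    print(f"{name}: R-square = {value:.3f}")
```

On these figures, “teachers” alone explains about 71% of the variation in enrolment, against about 49% for “expenditure” and about 44% for “schools”.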


Appendix:

Figure: R-square of Enrolment vs. Expenditure for J&K

Model Summary(b)

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .560(a)   .314       .177                170705.83059

a. Predictors: (Constant), Expenditure

b. Dependent Variable: Enrollment

Figure: Model Summary of Enrolment vs. Expenditure for J&K
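The adjusted R-square in the model summary penalises R-square for the number of predictors relative to the number of observations. The number of observations for J&K is not stated in this report, so the sketch below treats it as unknown and searches for the sample size at which the standard adjustment formula reproduces the reported .177 from the reported R-square of .314 with one predictor. The function name is hypothetical.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

reported_r_square = 0.314   # from the J&K model summary
reported_adjusted = 0.177   # from the J&K model summary

# Try plausible sample sizes and keep those that match to three decimals.
matching_n = [
    n for n in range(3, 13)
    if abs(adjusted_r_square(reported_r_square, n, 1) - reported_adjusted) < 5e-4
]
print(matching_n)
```

Only a sample size of 7 matches, which suggests (but does not prove) that the J&K regression was run on seven observations.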

Model 1       B             Std. Error   Beta   t       Sig.   95% CI Lower   95% CI Upper
(Constant)    1536126.071   243313.512   n/a    6.313   .001   910668.777     2161583.364
Expenditure   10.835        7.165        .560   1.512   .191   -7.583         29.253

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and the last two columns give the 95.0% confidence interval for B.

a. Dependent Variable: Enrollment

Figure: p-values, t-statistics and coefficients for Enrolment vs. Expenditure for J&K
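The Sig. column of the J&K table can be reproduced from B and its standard error. The sketch below is a stdlib-only approximation, not the SPSS routine: it integrates the Student t density numerically with Simpson's rule. It assumes 5 residual degrees of freedom, a value not stated in the report but consistent with the width of the reported confidence intervals; the function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area under the density on [-|t|, |t|].

    The area is computed with Simpson's rule; steps must be even.
    """
    a = abs(t_stat)
    h = 2 * a / steps
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-a + i * h, df)
    return 1 - total * h / 3

ASSUMED_DF = 5  # assumption: residual degrees of freedom, not given in the report

t_expenditure = 10.835 / 7.165          # B / Std. Error for Expenditure
t_constant = 1536126.071 / 243313.512   # B / Std. Error for the constant
p_expenditure = two_sided_p(t_expenditure, ASSUMED_DF)
p_constant = two_sided_p(t_constant, ASSUMED_DF)
print(f"Expenditure: t={t_expenditure:.3f} p={p_expenditure:.3f}")
print(f"(Constant):  t={t_constant:.3f} p={p_constant:.3f}")
```

Under that assumption the recomputed values agree with the reported t of 1.512 and Sig. of .191 for expenditure, which is why the coefficient is judged insignificant at the 5% level.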


Figure: R-square for enrolment vs. number of teachers for J&K

Figure: R-square for enrolment vs. number of schools for J&K

Model 1      B            Std. Error   Beta   t       Sig.   95% CI Lower   95% CI Upper
(Constant)   597537.003   229547.572   n/a    2.603   .048   7466.184       1187607.823
schools      68.785       12.113       .930   5.679   .002   37.647         99.923

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and the last two columns give the 95.0% confidence interval for B.

a. Dependent Variable: Enrollment

Figure: p-values, t-statistics and coefficients for Enrolment vs. schools for J&K


Figure: R-square of Enrolment vs. expenditure for Delhi

Figure: R-square of Enrolment vs. schools for Delhi

Figure: R-square of Enrolment vs. teachers for Delhi


Coefficients(a)

Model 1      B            Std. Error   Beta     t       Sig.   95% CI Lower   95% CI Upper
(Constant)   992840.056   119059.707   n/a      8.339   .001   662277.315     1323402.796
Teachers     10.646       10.242       1.965    1.039   .357   -17.789        39.081
Schools      -173.551     183.640      -1.787   -.945   .398   -683.417       336.314

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and the last two columns give the 95.0% confidence interval for B.

a. Dependent Variable: enrollment

Figure: p-values, t-statistics and coefficients for Enrolment vs. schools and teachers for Delhi

Correlations

                                    Enrollment   Expenditure   teachers   Schools
Pearson Correlation   Enrollment    1.000        .703          .844       .666
                      Expenditure   .703         1.000         .958       .823
                      teachers      .844         .958          1.000      .811
                      Schools       .666         .823          .811       1.000
Sig. (1-tailed)       Enrollment    n/a          .026          .004       .036
                      Expenditure   .026         n/a           .000       .006
                      teachers      .004         .000          n/a        .007
                      Schools       .036         .006          .007       n/a
N                     Enrollment    8            8             8          8
                      Expenditure   8            8             8          8
                      teachers      8            8             8          8
                      Schools       8            8             8          8

Figure: Correlations for Enrolment vs. expenditure, schools and teachers for Punjab

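Model 1      B            Std. Error   Beta   t       Sig.   95% CI Lower   95% CI Upper   Zero-order   Partial   Part
(Constant)   788771.323   113848.449   n/a    6.928   .000   510194.205     1067348.442    n/a          n/a       n/a
teachers     10.765       2.793        .844   3.854   .008   3.931          17.600         .844         .844      .844

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, the confidence interval columns give the 95.0% confidence interval for B, and the last three columns are the zero-order, partial and part correlations.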

Figure: Coefficients, p values and t stats for Enrolment vs. teachers

Model 1       Beta In     t        Sig.   Partial Correlation   Tolerance (Collinearity Statistics)
Expenditure   -1.299(b)   -2.140   .085   -.691                 .082
Schools       -.053(b)    -.129    .902   -.058                 .343

Figure: p values and t stats for Enrolment vs. expenditure and schools
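The tolerance and partial correlation columns for the variables left out of the Punjab model can be recovered from the Punjab correlation table. This assumes, as the preceding coefficients table suggests, that “teachers” is the only predictor already in the model. The sketch uses the standard first-order formulas with hypothetical function names; small differences from the SPSS output are expected because the correlations are rounded to three decimals.

```python
import math

# Pearson correlations from the Punjab correlation table.
r_enrol_teachers = 0.844
r_with_enrol = {"Expenditure": 0.703, "Schools": 0.666}
r_with_teachers = {"Expenditure": 0.958, "Schools": 0.811}

def tolerance(r_xz):
    """Share of a candidate's variance not explained by the included predictor."""
    return 1 - r_xz ** 2

def partial_correlation(r_yx, r_yz, r_xz):
    """Correlation of y and x after removing the linear effect of z from both."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

checks = {}
for name in ("Expenditure", "Schools"):
    tol = tolerance(r_with_teachers[name])
    partial = partial_correlation(r_with_enrol[name], r_enrol_teachers,
                                  r_with_teachers[name])
    checks[name] = (tol, partial)
    print(f"{name}: tolerance={tol:.3f} partial correlation={partial:.3f}")
```

The tolerance of about 0.08 for “expenditure” reflects its correlation of .958 with “teachers”: the two variables carry largely the same information, which helps explain why expenditure adds little once teachers is in the model.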