STATISTICAL RELATIONSHIP BETWEEN INCOME AND EXPENDITURES
(INCOME=DEPENDENT VARIABLE & EXPENDITURES=INDEPENDENT VARIABLE)

A Project Presented By Rehan Ehsan
Contact# +92 321 8880397
[email protected]

To Dr. Naheed Sultana

In partial fulfillment of the requirements for course completion of ECONOMETRICS
M.PHIL (FINANCE) (SEMESTER ONE)

LAHORE SCHOOL OF ACCOUNTING & FINANCE
The University of Lahore

Statistical relationship between income and expenditures


DESCRIPTION

This article shows the statistical relationship between income and expenditure (INCOME=DEPENDENT VARIABLE & EXPENDITURES=INDEPENDENT VARIABLE).



Acknowledgement

To say this project is “by Rehan Ehsan” overstates the case. Without the significant contributions made by other people this project would certainly not exist.

I would like to thank the members of the general public who completed questionnaires about their income and expenses. Thanks also to my colleagues, who helped me complete this project.


ABSTRACT

We found that monthly expenditure depends on total monthly income, and that the contribution of the population is very low in this regard. A person who earns also spends and saves the surplus, so total monthly income breaks down into expenditures and savings.


TABLE OF CONTENTS

Introduction

Data table

Descriptive statistics

Frequency table

Histogram

Simple linear regression function

Regression analysis

Problems of regression analysis

Ordinary least squares method

Test of regression estimates

F-Test

ANOVA

Reliability

Models of ANOVA

I. Fixed effect model

II. Random effect model

III. Mixed effect model

Assumptions

Means

Goodness of fit

Chi-square goodness of fit

Correlation

Correlation coefficient

Classical normal linear regression model

Assumptions of CNLRM

I. Critical assumptions

II. Detailed assumptions

T-Test

Uses of T-Test

Types of T-Test

Summary

Conclusion


INTRODUCTION:

I surveyed members of the general public and asked them about their income and expenses. From the data gathered, I rounded the income figures to values from 5,000 to 150,000 and recorded the corresponding expenditures to the nearest figure, as per my research.

This project is to show the relationship between monthly income and expenditures.

DATA TABLE:

Sr#     Income      Expenditure
1       5,000       5,000
2       10,000      9,500
3       15,000      14,500
4       20,000      18,500
5       25,000      19,000
6       30,000      27,000
7       35,000      30,500
8       40,000      35,000
9       45,000      39,000
10      50,000      45,500
11      55,000      49,500
12      60,000      52,000
13      65,000      55,000
14      70,000      59,000
15      75,000      64,000
16      80,000      69,500
17      85,000      73,000
18      90,000      78,500
19      95,000      81,000
20      100,000     84,700
21      105,000     90,000
22      110,000     90,000
23      115,000     90,500
24      120,000     93,000
25      125,000     94,800
26      130,000     95,750
27      135,000     98,000
28      140,000     100,000
29      145,000     104,590
30      150,000     110,000
Total   2,325,000   1,876,340


DESCRIPTIVE STATISTICS:

Descriptive Statistics

Statistic            INCOME            EXPENDITURE
N                    30                30
Range                145000            105000
Minimum              5000              5000
Maximum              150000            110000
Sum                  2325000           1876340
Mean                 77500.00          62544.67
Std. Error of Mean   8036.376          5852.542
Std. Deviation       44017.042         32055.690
Variance             1937500000.000    1027567267.126
Valid N (listwise)   30
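The figures in the Descriptive Statistics table can be checked directly from the data table. The following is a minimal sketch using only the Python standard library; the variable names `income` and `expenditure` and the helper `describe` are illustrative choices, not part of the original SPSS workflow.

```python
import math
import statistics

# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

def describe(values):
    """Return the summary measures shown in the Descriptive Statistics table."""
    n = len(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "n": n,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "std_error": std_dev / math.sqrt(n),  # standard error of the mean
        "std_dev": std_dev,
        "variance": statistics.variance(values),
    }

income_stats = describe(income)
expenditure_stats = describe(expenditure)
for name, stats in (("INCOME", income_stats), ("EXPENDITURE", expenditure_stats)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

The standard deviation and variance use the n - 1 divisor, which is what the reported values correspond to.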

Test Statistics

                   INCOME   EXPENDITURE
Chi-Square(a,b)    .000     .933
df                 29       28
Asymp. Sig.        1.000    1.000

a) 30 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.

b) 29 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.


INCOME                                          EXPENDITURE
Value    Observed N  Expected N  Residual       Value    Observed N  Expected N  Residual
5000     1           1           0              5000     1           1           0
10000    1           1           0              9500     1           1           0
15000    1           1           0              14500    1           1           0
20000    1           1           0              18500    1           1           0
25000    1           1           0              19000    1           1           0
30000    1           1           0              27000    1           1           0
35000    1           1           0              30500    1           1           0
40000    1           1           0              35000    1           1           0
45000    1           1           0              39000    1           1           0
50000    1           1           0              45500    1           1           0
55000    1           1           0              49500    1           1           0
60000    1           1           0              52000    1           1           0
65000    1           1           0              55000    1           1           0
70000    1           1           0              59000    1           1           0
75000    1           1           0              64000    1           1           0
80000    1           1           0              69500    1           1           0
85000    1           1           0              73000    1           1           0
90000    1           1           0              78500    1           1           0
95000    1           1           0              81000    1           1           0
100000   1           1           0              84700    1           1           0
105000   1           1           0              90000    2           1           1
110000   1           1           0              90500    1           1           0
115000   1           1           0              93000    1           1           0
120000   1           1           0              94800    1           1           0
125000   1           1           0              95750    1           1           0
130000   1           1           0              98000    1           1           0
135000   1           1           0              100000   1           1           0
140000   1           1           0              104590   1           1           0
145000   1           1           0              110000   1           1           0
150000   1           1           0              Total    30
Total    30

FREQUENCY TABLE:


INCOME                                                  EXPENDITURE
Value    Frequency  Percent  Valid %  Cumulative %      Value    Frequency  Percent  Valid %  Cumulative %
5000     1          3.3      3.3      3.3               5000     1          3.3      3.3      3.3
10000    1          3.3      3.3      6.7               9500     1          3.3      3.3      6.7
15000    1          3.3      3.3      10.0              14500    1          3.3      3.3      10.0
20000    1          3.3      3.3      13.3              18500    1          3.3      3.3      13.3
25000    1          3.3      3.3      16.7              19000    1          3.3      3.3      16.7
30000    1          3.3      3.3      20.0              27000    1          3.3      3.3      20.0
35000    1          3.3      3.3      23.3              30500    1          3.3      3.3      23.3
40000    1          3.3      3.3      26.7              35000    1          3.3      3.3      26.7
45000    1          3.3      3.3      30.0              39000    1          3.3      3.3      30.0
50000    1          3.3      3.3      33.3              45500    1          3.3      3.3      33.3
55000    1          3.3      3.3      36.7              49500    1          3.3      3.3      36.7
60000    1          3.3      3.3      40.0              52000    1          3.3      3.3      40.0
65000    1          3.3      3.3      43.3              55000    1          3.3      3.3      43.3
70000    1          3.3      3.3      46.7              59000    1          3.3      3.3      46.7
75000    1          3.3      3.3      50.0              64000    1          3.3      3.3      50.0
80000    1          3.3      3.3      53.3              69500    1          3.3      3.3      53.3
85000    1          3.3      3.3      56.7              73000    1          3.3      3.3      56.7
90000    1          3.3      3.3      60.0              78500    1          3.3      3.3      60.0
95000    1          3.3      3.3      63.3              81000    1          3.3      3.3      63.3
100000   1          3.3      3.3      66.7              84700    1          3.3      3.3      66.7
105000   1          3.3      3.3      70.0              90000    2          6.7      6.7      73.3
110000   1          3.3      3.3      73.3              90500    1          3.3      3.3      76.7
115000   1          3.3      3.3      76.7              93000    1          3.3      3.3      80.0
120000   1          3.3      3.3      80.0              94800    1          3.3      3.3      83.3
125000   1          3.3      3.3      83.3              95750    1          3.3      3.3      86.7
130000   1          3.3      3.3      86.7              98000    1          3.3      3.3      90.0
135000   1          3.3      3.3      90.0              100000   1          3.3      3.3      93.3
140000   1          3.3      3.3      93.3              104590   1          3.3      3.3      96.7
145000   1          3.3      3.3      96.7              110000   1          3.3      3.3      100.0
150000   1          3.3      3.3      100.0             Total    30         100      100
Total    30         100      100


Statistics

                          INCOME            EXPENDITURE
N Valid                   30                30
N Missing                 0                 0
Mean                      77500.00          62544.67
Std. Error of Mean        8036.376          5852.542
Median                    77500.00(a)       66750.00(a)
Mode                      5000(b)           90000
Std. Deviation            44017.042         32055.690
Variance                  1937500000.000    1027567267.126
Skewness                  .000              -.310
Std. Error of Skewness    .427              .427
Range                     145000            105000
Minimum                   5000              5000
Maximum                   150000            110000
Sum                       2325000           1876340
Percentile 25             40000.00(c)       35000.00(c)
Percentile 50             77500.00          66750.00
Percentile 75             115000.00         90500.00

a) Calculated from grouped data.

b) Multiple modes exist. The smallest value is shown.

c) Percentiles are calculated from grouped data.

Ratio Statistics for INCOME / EXPENDITURE

Mean                                              1.197
95% Confidence Interval for Mean                  Lower Bound 1.156, Upper Bound 1.238
Median                                            1.169
95% Confidence Interval for Median                Lower Bound 1.148, Upper Bound 1.222, Actual Coverage 95.7%
Weighted Mean                                     1.239
95% Confidence Interval for Weighted Mean         Lower Bound 1.196, Upper Bound 1.282
Minimum                                           1.000
Maximum                                           1.400
Std. Deviation                                    .110
Range                                             .400
Price Related Differential                        .966
Coefficient of Dispersion                         .071
Coefficient of Variation (Median Centered)        9.7%

a) The confidence interval for the median is constructed without any distribution assumptions. The actual coverage level may be greater than the specified level. Other confidence intervals are constructed by assuming a Normal distribution for the ratios.


HISTOGRAM WITH NORMAL CURVE:

[Figure: Histogram of INCOME with normal curve (x-axis: INCOME, y-axis: Frequency). Mean = 77500, Std. Dev. = 44017.042, N = 30]

[Figure: Histogram of EXPENDITURE with normal curve (x-axis: EXPENDITURE, y-axis: Frequency). Mean = 62544.67, Std. Dev. = 32055.69, N = 30]


SIMPLE REGRESSION FUNCTION:

In statistics, simple linear regression is the least squares estimator of a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model (that is, vertical distances between the points of the data set and the fitted line) as small as possible.
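As an illustration, the least-squares line for this project's data can be computed from the closed-form slope and intercept. This is a sketch in plain Python, assuming EXPENDITURE is regressed on INCOME as in the SPSS output reported later; the function name `fit_line` is hypothetical.

```python
# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

def fit_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(income, expenditure)
print(f"EXPENDITURE = {intercept:.2f} + {slope:.4f} * INCOME")

# Fitted value at the smallest and largest income in the sample.
predicted_min = intercept + slope * min(income)
predicted_max = intercept + slope * max(income)
print(round(predicted_min, 2), round(predicted_max, 2))
```

The two fitted values at the ends of the income range should agree with the minimum and maximum predicted values listed in the Residuals Statistics table (10252.15 and 114837.18).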

REGRESSION ANALYSIS:

Regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables — that is, the average value of the dependent variable when the independent variables are held fixed. Less commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.

PROBLEMS IN REGRESSION ANALYSIS:

MULTICOLLINEARITY

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model are highly correlated. In this situation the coefficient estimates may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors. That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others.

HETEROSCEDASTICITY

In statistics, a sequence of random variables is heteroscedastic, or heteroskedastic, if the random variables have different variances. The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion'). In contrast, a sequence of random variables is called homoscedastic if it has constant variance.

ORDINARY LEAST SQUARES METHOD

Ordinary least squares (OLS), or linear least squares, is a method for estimating the unknown parameters in a linear regression model. This method minimizes the sum of squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation. The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side.

The OLS estimator is consistent when the regressors are exogenous and there is no multicollinearity, and optimal in the class of linear unbiased estimators when the errors are homoscedastic and serially uncorrelated. Under these conditions, the method of OLS provides minimum-variance mean-unbiased estimation when the errors have finite variances. Under the additional assumption that the errors are normally distributed, OLS is the maximum likelihood estimator. OLS is used in economics (econometrics) and electrical engineering (control theory and signal processing), among many areas of application.

TEST OF REGRESSION ESTIMATES:

To test whether one variable significantly predicts another, we need only test whether the correlation between the two variables is significantly different from zero. In regression, a significant prediction means a significant proportion of the variability in the predicted variable can be accounted for by (or "attributed to", or "explained by", or "associated with") the predictor variable.
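The test described here can be sketched as follows: compute Pearson's r, convert it to a t statistic with n - 2 degrees of freedom, and compare it with a critical value. The critical value 2.048 is the standard two-tailed 5% table value for 28 degrees of freedom and is hard-coded as an assumption; the variable names are illustrative.

```python
import math

# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

n = len(income)
r = pearson_r(income, expenditure)

# t statistic for H0: the population correlation is zero (n - 2 degrees of freedom).
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

T_CRIT_28_DF = 2.048  # assumed: two-tailed 5% critical value of Student's t, 28 df
significant = abs(t_stat) > T_CRIT_28_DF
print(f"r = {r:.3f}, t = {t_stat:.2f}, significant at 5%: {significant}")
```

In a simple regression this t statistic is the square root of the F statistic in the regression ANOVA table, so the two tests lead to the same conclusion.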

Descriptive Statistics

                     N     Mean        Std. Deviation
INCOME               30    77500.00    44017.042
EXPENDITURE          30    62544.67    32055.690
Valid N (listwise)   30

Model Fit

Fit Statistic          Mean        SE    Minimum     Maximum
Stationary R-squared   .428        .     .428        .428
R-squared              .997        .     .997        .997
RMSE                   1882.231    .     1882.231    1882.231
MAPE                   3.282       .     3.282       3.282
MaxAPE                 16.348      .     16.348      16.348
MAE                    1439.577    .     1439.577    1439.577
MaxAE                  4395.076    .     4395.076    4395.076
Normalized BIC         15.307      .     15.307      15.307

For every fit statistic, the percentiles (5, 10, 25, 50, 75, 90 and 95) are identical to the mean shown above.


ANOVA (b)

Model 1      Sum of Squares     df   Mean Square        F          Sig.
Regression   29230939495.261    1    29230939495.261    1439.666   .000(a)
Residual     568511251.405      28   20303973.264
Total        29799450746.667    29

a) Predictors: (Constant), INCOME

b) Dependent Variable: EXPENDITURE

F-TEST

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fit to a data set, in order to identify the model that best fits the population from which the data were sampled.
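For the regression of EXPENDITURE on INCOME, the F statistic in the ANOVA table is the ratio of the regression mean square to the residual mean square. A minimal sketch of that decomposition in plain Python follows; the variable names are illustrative and the data are transcribed from the data table.

```python
# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(expenditure) / n

# Least-squares fit of EXPENDITURE on INCOME.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, expenditure))
s_xx = sum((x - mean_x) ** 2 for x in income)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in income]

# Partition the total sum of squares into regression and residual parts.
ss_total = sum((y - mean_y) ** 2 for y in expenditure)
ss_residual = sum((y - f) ** 2 for y, f in zip(expenditure, fitted))
ss_regression = ss_total - ss_residual

df_regression = 1      # one predictor
df_residual = n - 2    # observations minus estimated coefficients
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
r_squared = ss_regression / ss_total

print(f"SSR = {ss_regression:.3f}, SSE = {ss_residual:.3f}, SST = {ss_total:.3f}")
print(f"F({df_regression}, {df_residual}) = {f_stat:.3f}, R-squared = {r_squared:.3f}")
```

The ratio of the regression sum of squares to the total gives the share of the variation in EXPENDITURE associated with INCOME.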

ANOVA

Analysis of variance (ANOVA) is a collection of statistical models, and their associated procedures, in which the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVAs are useful in comparing two, three or more means.

RELIABILITY:

Case Processing Summary

Cases          N     %
Valid          30    100.0
Excluded(a)    0     .0
Total          30    100.0

a) Listwise deletion based on all variables in the procedure.

Reliability Statistics

Cronbach's Alpha                                  .970
Cronbach's Alpha Based on Standardized Items      .995
N of Items                                        2


Inter-Item Covariance Matrix

               INCOME            EXPENDITURE
INCOME         1937500000.000    1397472413.793
EXPENDITURE    1397472413.793    1027567267.126

Inter-Item Correlation Matrix

               INCOME    EXPENDITURE
INCOME         1.000     .990
EXPENDITURE    .990      1.000
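The two alpha values in the Reliability Statistics table can be recovered from the inter-item covariance matrix. The sketch below uses the textbook formula for Cronbach's alpha and its standardized variant; the function name `cronbach_alpha` is hypothetical, and the covariance entries are copied from the matrix above rather than recomputed from the raw data.

```python
import math

# Inter-item covariance matrix for (INCOME, EXPENDITURE), copied from the SPSS output.
cov = [
    [1937500000.000, 1397472413.793],
    [1397472413.793, 1027567267.126],
]

def cronbach_alpha(cov_matrix):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(cov_matrix)
    item_variances = sum(cov_matrix[i][i] for i in range(k))
    total_variance = sum(sum(row) for row in cov_matrix)  # variance of the summed scale
    return k / (k - 1) * (1 - item_variances / total_variance)

k = len(cov)
alpha = cronbach_alpha(cov)

# Standardized alpha uses the inter-item correlation (a single one when k = 2).
r = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
alpha_standardized = k * r / (1 + (k - 1) * r)

scale_variance = sum(sum(row) for row in cov)
print(f"alpha = {alpha:.3f}, standardized alpha = {alpha_standardized:.3f}")
print(f"scale variance = {scale_variance:.3f}")
```

The scale variance printed at the end is the sum of all entries of the covariance matrix, which is the variance of INCOME + EXPENDITURE.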

Summary Item Statistics

                          Mean              Minimum           Maximum           Range            Maximum / Minimum   Variance                 N of Items
Item Means                70022.333         62544.667         77500.000         14955.333        1.239               111830997.556            2
Item Variances            1482533633.563    1027567267.126    1937500000.000    909932732.874    1.886               413988789177375200.000   2
Inter-Item Covariances    1397472413.793    1397472413.793    1397472413.793    .000             1.000               .000                     2
Inter-Item Correlations   .990              .990              .990              .000             1.000               .000                     2

Scale Statistics

Mean         Variance          Std. Deviation   N of Items
140044.67    5760012094.713    75894.744        2


Item-Total Statistics

            Scale Mean if   Scale Variance if   Corrected Item-Total   Squared Multiple   Cronbach's Alpha if
            Item Deleted    Item Deleted        Correlation            Correlation        Item Deleted
VAR00001    1.1186          2.439               .302                   .091               .(a)
VAR00002    .6402           .176                .302                   .091               .(a)

a) The value is negative due to a negative average covariance among items. This violates reliability model assumptions. You may want to check item codings.


ANOVA

                                 Sum of Squares     df   Mean Square       F        Sig.
Between People                   83520175373.333    29   2880006047.356
Within People   Between Items    3354929926.667     1    3354929926.667    39.441   .000
                Residual         2466775373.333     29   85061219.770
                Total            5821705300.000     30   194056843.333
Total                            89341880673.333    59   1514269163.955

Grand Mean = 70022.33

MODELS:

(Model 1) FIXED EFFECTS MODEL

The fixed-effects model of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see if the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

(Model 2) RANDOM EFFECT MODEL

Random effects models are used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments differ from ANOVA model 1.

(Model 3) MIXED EFFECTS MODEL

A mixed-effects model contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

Most random-effects or mixed-effects models are not concerned with making inferences concerning the particular values of the random effects that happen to have been sampled. For example, consider a large manufacturing plant in which many machines produce the same product. The statistician studying this plant would have very little interest in comparing the particular machines that were sampled to each other. Rather, inferences that can be made for all machines are of interest, such as their variability and the mean. However, if one is interested in the realized value of the random effect, best linear unbiased prediction can be used to obtain a "prediction" for the value.

ASSUMPTIONS OF ANOVA

The analysis of variance has been studied from several approaches, the most common of which use a linear model that relates the response to the treatments and blocks. Even when the statistical model is nonlinear, it can be approximated by a linear model for which an analysis of variance may be appropriate.


Independence of cases – this is an assumption of the model that simplifies the statistical analysis.

Normality – the distributions of the residuals are normal.

Equality (or "homogeneity") of variances, called homoscedasticity — the variance of data in groups should be the same. Model-based approaches usually assume that the variance is constant. The constant-variance property also appears in the randomization (design-based) analysis of randomized experiments, where it is a necessary consequence of the randomized design.

MEANS:

Case Processing Summary

                        Included         Excluded        Total
                        N     Percent    N    Percent    N     Percent
EXPENDITURE * INCOME    30    100.0%     0    .0%        30    100.0%

Report

GOODNESS OF FIT:

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing.

CHI-SQUARE AS GOODNESS OF FIT

When an analyst attempts to fit a statistical model to observed data, he or she may wonder how well the model actually reflects the data. How "close" are the observed values to those which would be expected under the fitted model? One statistical test that addresses this issue is the chi-square goodness of fit test.

Test Statistics

                   INCOME   EXPENDITURE
Chi-Square(a,b)    .000     .933
df                 29       28
Asymp. Sig.        1.000    1.000

a) 30 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.

b) 29 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 1.0.
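The chi-square statistics in this table can be reproduced by treating each distinct value as a category and testing the observed counts against equal expected counts. This is a sketch in plain Python; the equal-expected-frequency setup is inferred from the Expected N column of the output, and the helper name `chi_square_equal_expected` is hypothetical.

```python
from collections import Counter

# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

def chi_square_equal_expected(values):
    """Chi-square goodness-of-fit statistic against equal expected counts per distinct value."""
    counts = Counter(values)
    categories = len(counts)
    expected = len(values) / categories  # same expected frequency in every cell
    statistic = sum((observed - expected) ** 2 / expected for observed in counts.values())
    return statistic, categories - 1     # (chi-square, degrees of freedom)

chi2_income, df_income = chi_square_equal_expected(income)
chi2_expenditure, df_expenditure = chi_square_equal_expected(expenditure)
print(f"INCOME: chi-square = {chi2_income:.3f}, df = {df_income}")
print(f"EXPENDITURE: chi-square = {chi2_expenditure:.3f}, df = {df_expenditure}")
```

Because almost every value occurs exactly once, every cell has an expected frequency of about 1, which is why the output carries the warning about expected frequencies below 5.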



CORRELATION:

Dependence refers to any statistical relationship between two random variables or two sets of data. Correlation refers to any of a broad class of statistical relationships involving dependence.

Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a product and its price. Correlations are useful because they can indicate a predictive relationship that can be exploited in practice. For example, an electrical utility may produce less power on a mild day based on the correlation between electricity demand and weather. In this example there is a causal relationship.

Descriptive Statistics

               Mean        Std. Deviation   N
INCOME         77500.00    44017.042        30
EXPENDITURE    62544.67    32055.690        30

Correlations

                                                     INCOME             EXPENDITURE
INCOME        Pearson Correlation                    1                  .990(**)
              Sig. (2-tailed)                                           .000
              Sum of Squares and Cross-products      56187500000.000    40526700000.000
              Covariance                             1937500000.000     1397472413.793
              N                                      30                 30
EXPENDITURE   Pearson Correlation                    .990(**)           1
              Sig. (2-tailed)                        .000
              Sum of Squares and Cross-products      40526700000.000    29799450746.667
              Covariance                             1397472413.793     1027567267.126
              N                                      30                 30

** Correlation is significant at the 0.01 level (2-tailed).

CORRELATION COEFFICIENT:

Correlation coefficient may refer to:

Pearson product-moment correlation coefficient, also known as r, R, or Pearson's r, a measure of the strength of the linear relationship between two variables that is defined in terms of the (sample) covariance of the variables divided by their (sample) standard deviations

Correlation and dependence, a broad class of statistical relationships between two or more random variables or observed data values

Goodness of fit, which refers to any of several measures that measure how well a statistical model fits observations by summarizing the discrepancy between observed values and the values expected under the model in question


Coefficient of determination, a measure of the proportion of variability in a data set that is accounted for by a statistical model; often called R²; equal in a single-variable linear regression to the square of Pearson's product-moment correlation coefficient.
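For the project's data, Pearson's r follows directly from the definition in the first item (sample covariance divided by the product of the sample standard deviations), and squaring it gives the coefficient of determination. A minimal sketch with illustrative variable names:

```python
import math

# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(expenditure) / n

# Sample covariance and sample standard deviations (n - 1 divisor).
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, expenditure)) / (n - 1)
sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in income) / (n - 1))
sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in expenditure) / (n - 1))

r = covariance / (sd_x * sd_y)   # Pearson product-moment correlation
r_squared = r ** 2               # coefficient of determination in a one-predictor regression

print(f"covariance = {covariance:.3f}")
print(f"r = {r:.3f}, R-squared = {r_squared:.3f}")
```

An R² of roughly 0.98 means that about 98% of the variation in expenditure in this sample is associated with variation in income.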

Coefficient Correlations (a)

Model 1                  INCOME
Correlations   INCOME    1.000
Covariances    INCOME    .000

a) Dependent Variable: EXPENDITURE

Collinearity Diagnostics (a)

                                                     Variance Proportions
Model   Dimension   Eigenvalue   Condition Index    (Constant)   INCOME
1       1           1.873        1.000              .06          .06
        2           .127         3.842              .94          .94

a) Dependent Variable: EXPENDITURE

RESIDUALS:

Residuals Statistics (a)

                       Minimum      Maximum      Mean        Std. Deviation   N
Predicted Value        10252.15     114837.18    62544.67    31748.440        30
Residual               -7624.422    7620.241     .000        4427.622         30
Std. Predicted Value   -1.647       1.647        .000        1.000            30
Std. Residual          -1.692       1.691        .000        .983             30

a) Dependent Variable: EXPENDITURE

CHARTS:


[Figure: Histogram of the regression standardized residual (y-axis: Frequency). Dependent Variable: EXPENDITURE. Mean = -1.04E-16, Std. Dev. = 0.983, N = 30]

[Figure: Normal P-P Plot of Regression Standardized Residual (x-axis: Observed Cum Prob, y-axis: Expected Cum Prob). Dependent Variable: EXPENDITURE]

[Figure: Normal P-P Plot of INCOME (x-axis: Observed Cum Prob, y-axis: Expected Cum Prob). Transforms: natural log]

[Figure: Plot of EXPENDITURE against INCOME. Dot/Lines show Modes]

CLASSICAL NORMAL LINEAR REGRESSION MODEL:


Econometrics is all about causality. Economics is full of theory of how one thing causes another: increases in prices cause demand to decrease, better education causes people to become richer, etc. So to be able to test this theory, economists find data (such as price and quantity of a good, or notes on a population's education and wealth levels).

Data always comes out looking like a cloud, and without using proper techniques, it is impossible to determine if this cloud gives any useful information. Econometrics is a tool to establish correlation and, hopefully later, causality, using collected data points. We do this by creating an explanatory function from the data. The function is a linear model and is estimated by minimizing the squared distance from the data to the line. The distance is considered an error term. This is the process of linear regression.

ASSUMPTIONS UNDERLYING CLASSICAL NORMAL LINEAR REGRESSION MODEL

There are 5 critical assumptions relating to CLRM. These assumptions are required to show that the estimation technique, Ordinary Least Squares (OLS), has a number of desirable properties, and also so that the hypothesis tests regarding the coefficient estimates could validly be conducted.

CRITICAL ASSUMPTIONS:

The errors have zero mean.

The variance of the errors is constant and finite over all values of X.

The errors are statistically independent of one another.

There is no relationship between the error and the corresponding X.

The error term is normally distributed.

DETAILED ASSUMPTIONS

The regression model is linear in parameters.

The values of the regressors, the X's (independent variables), are fixed in repeated samples.

For given values of the X's, the mean value of the errors equals zero.

For given values of the X's, the variance of the errors is constant.

For given values of the X's, there is no autocorrelation.

If the X's are stochastic, the errors and the X's are not correlated.

The number of observations is greater than the number of independent variables.

There is sufficient variability in the values of the X's.

The regression model is correctly specified.

There is no multicollinearity.

The error term is normally distributed.

T-TEST:

A t-test is any statistical hypothesis test in which the test statistic follows a Student's t distribution, if the null hypothesis is supported. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known.

One-Sample Statistics

               N     Mean        Std. Deviation   Std. Error Mean
INCOME         30    77500.00    44017.042        8036.376
EXPENDITURE    30    62544.67    32055.690        5852.542

One-Sample Test (Test Value = 0)

                                                                95% Confidence Interval of the Difference
               t        df   Sig. (2-tailed)   Mean Difference  Lower       Upper
INCOME         9.644    29   .000              77500.000        61063.77    93936.23
EXPENDITURE    10.687   29   .000              62544.667        50574.88    74514.46
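The one-sample test above divides each sample mean (minus the test value of 0) by its standard error. The sketch below reproduces the t statistics and approximate confidence limits in plain Python; the critical value 2.045 is the standard two-tailed 5% table value for 29 degrees of freedom and is hard-coded as an assumption, so the limits differ slightly from SPSS in the last digits. The function name `one_sample_t` is hypothetical.

```python
import math
import statistics

# Survey data transcribed from the data table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]

T_CRIT_29_DF = 2.045  # assumed: two-tailed 5% critical value of Student's t, 29 df

def one_sample_t(values, test_value=0.0):
    """Return (t, df, lower, upper) for H0: population mean equals test_value."""
    n = len(values)
    mean_difference = statistics.mean(values) - test_value
    std_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = mean_difference / std_error
    margin = T_CRIT_29_DF * std_error  # only valid here because n - 1 = 29
    return t_stat, n - 1, mean_difference - margin, mean_difference + margin

t_income, df_income, lower_income, upper_income = one_sample_t(income)
t_expenditure, df_expenditure, lower_exp, upper_exp = one_sample_t(expenditure)
print(f"INCOME: t({df_income}) = {t_income:.3f}, CI = ({lower_income:.2f}, {upper_income:.2f})")
print(f"EXPENDITURE: t({df_expenditure}) = {t_expenditure:.3f}, CI = ({lower_exp:.2f}, {upper_exp:.2f})")
```

A test value of 0 only asks whether mean income and mean expenditure differ from zero, which is why both t statistics are so large.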

ANOVA

EXPENDITURE

                                             Sum of Squares     df   Mean Square        F    Sig.
Between Groups   (Combined)                  29799450746.667    29   1027567267.126     .    .
                 Linear Term   Contrast      29230939495.261    1    29230939495.261    .
                               Deviation     568511251.405      28   20303973.264       .
Within Groups                                .000               0    .
Total                                        29799450746.667    29

USES:

Among the most frequently used t-tests are:


• A one-sample location test of whether the mean of a normally distributed population has a value specified in a null hypothesis.

• A two sample location test of the null hypothesis that the means of two normally distributed populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.

• A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.

• A test of whether the slope of a regression line differs significantly from 0.
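The last use in the list, testing whether a regression slope differs from zero, can be illustrated with the sums of squares reported in the ANOVA table for EXPENDITURE. This is only a sketch under a stated assumption: since every income level has a single observation, the "Linear Term Contrast" row is treated as the regression sum of squares and the "Deviation" row as the residual sum of squares. The function name `slope_test_from_anova` is hypothetical.

```python
import math

# Values copied from the ANOVA table (EXPENDITURE by INCOME).
SS_LINEAR = 29230939495.261     # assumed to be the regression sum of squares
SS_DEVIATION = 568511251.405    # assumed to be the residual sum of squares
DF_DEVIATION = 28               # residual degrees of freedom (n - 2)


def slope_test_from_anova(ss_reg, ss_res, df_res):
    """Return (F, |t|, R squared) for a simple linear regression.

    With one predictor, F = MS_regression / MS_residual and |t| = sqrt(F).
    """
    ms_res = ss_res / df_res
    f_stat = ss_reg / ms_res              # regression df is 1, so MS_reg = SS_reg
    t_abs = math.sqrt(f_stat)             # magnitude of the slope t statistic
    r_squared = ss_reg / (ss_reg + ss_res)
    return f_stat, t_abs, r_squared


f_stat, t_abs, r_squared = slope_test_from_anova(SS_LINEAR, SS_DEVIATION, DF_DEVIATION)
print(f"F = {f_stat:.2f}, |t| = {t_abs:.2f}, R^2 = {r_squared:.4f}")
```

Under this reading, the linear term accounts for roughly 98% of the variation in expenditure, and the slope t statistic is far beyond any conventional critical value for 28 degrees of freedom.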

TYPES:

UNPAIRED & PAIRED TWO SAMPLES T-Test

Two-sample t-tests for a difference in mean can be either unpaired or paired. Paired t-tests are a form of blocking, and have greater power than unpaired tests when the paired units are similar with respect to "noise factors" that are independent of membership in the two groups being compared. In a different context, paired t-tests can be used to reduce the effects of confounding factors in an observational study.

The unpaired, or "independent samples" t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here—if we contacted 100 people by phone and obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent samples t-test, even though the data are observational.

Dependent samples (or "paired") t-tests typically consist of a sample of matched pairs of similar units, or one group of units that has been tested twice (a "repeated measures" t-test). A typical example of the repeated measures t-test would be where subjects are tested prior to a treatment, say for high blood pressure, and the same subjects are tested again after treatment with a blood-pressure lowering medication.

A dependent t-test based on a "matched-pairs sample" results from an unpaired sample that is subsequently used to form a paired sample, by using additional variables that were measured along with the variable of interest. The matching is carried out by identifying pairs of values consisting of one observation from each of the two samples, where the pair is similar in terms of other measured variables. This approach is often used in observational studies to reduce or eliminate the effects of confounding factors.
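As an illustration of the paired form, income and expenditure in this project are reported by the same 30 respondents, so the two columns are naturally paired. The sketch below is not part of the original SPSS output: it applies a paired t-test to the difference between income and expenditure (that is, monthly savings), using the values listed in the Case Summaries table. The function name `paired_t` is hypothetical.

```python
import math
from statistics import mean, stdev

# Data from the Case Summaries table (30 respondents).
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def paired_t(first, second):
    """Return (mean difference, standard error, t) for a paired t-test."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / math.sqrt(n)   # sample sd of the differences
    return mean_diff, std_error, mean_diff / std_error


mean_saving, se_saving, t_saving = paired_t(income, expenditure)
print(f"mean saving = {mean_saving:.2f}, SE = {se_saving:.2f}, t = {t_saving:.2f}")
```

The statistic has n - 1 = 29 degrees of freedom. Pairing matters here because income and expenditure are strongly correlated, which makes the differences far less variable than either column on its own.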

SUMMARY:

Case Processing Summary (a)

                               Cases
              Included         Excluded         Total
              N    Percent     N    Percent     N    Percent
INCOME        30   100.0%      0    .0%         30   100.0%
EXPENDITURE   30   100.0%      0    .0%         30   100.0%

a) Limited to first 100 cases.

Case Summaries (a)

Case Number   INCOME           EXPENDITURE
1             5000             5000
2             10000            9500
3             15000            14500
4             20000            18500
5             25000            19000
6             30000            27000
7             35000            30500
8             40000            35000
9             45000            39000
10            50000            45500
11            55000            49500
12            60000            52000
13            65000            55000
14            70000            59000
15            75000            64000
16            80000            69500
17            85000            73000
18            90000            78500
19            95000            81000
20            100000           84700
21            105000           90000
22            110000           90000
23            115000           90500
24            120000           93000
25            125000           94800
26            130000           95750
27            135000           98000
28            140000           100000
29            145000           104590
30            150000           110000

Total
Mean          77500.00         62544.67
Minimum       5000             5000
Maximum       150000           110000
Range         145000           105000
Variance      1937500000.000   1027567267.126
N             30               30

a) Limited to first 100 cases.
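The summary rows of the table, and the regression named in the project title (income as the dependent variable, expenditure as the independent variable), can be recomputed from the 30 cases. The following is a minimal sketch using ordinary least squares written out by hand; the helper name `simple_ols` is hypothetical and the code is not the SPSS procedure used in the project.

```python
from statistics import mean, variance

# Data from the Case Summaries table.
income = list(range(5000, 150001, 5000))
expenditure = [
    5000, 9500, 14500, 18500, 19000, 27000, 30500, 35000, 39000, 45500,
    49500, 52000, 55000, 59000, 64000, 69500, 73000, 78500, 81000, 84700,
    90000, 90000, 90500, 93000, 94800, 95750, 98000, 100000, 104590, 110000,
]


def simple_ols(x, y):
    """Fit y = intercept + slope * x; return (intercept, slope, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, sxy ** 2 / (sxx * syy)


# Descriptive statistics (sample variance uses n - 1 in the denominator).
exp_mean = mean(expenditure)
exp_var = variance(expenditure)
inc_var = variance(income)
exp_range = max(expenditure) - min(expenditure)

# Income is the dependent variable, expenditure the independent variable.
intercept, slope, r_squared = simple_ols(expenditure, income)

print(f"mean={exp_mean:.2f} variance={exp_var:.3f} range={exp_range}")
print(f"INCOME = {intercept:.2f} + {slope:.4f} * EXPENDITURE, R^2 = {r_squared:.4f}")
```

The value of R squared is the same whichever variable is treated as dependent, so it agrees with the share of the total sum of squares attributed to the linear term in the ANOVA table.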

CONCLUSION: From the above discussion, we find that monthly expenditures depend on total monthly income, and that the contribution of population is very low in this regard. Because a person who earns both spends and saves the surplus, total monthly income breaks down into expenditures and savings.
