First Five Chapter Gujrati

7/27/2019 First Five Chapter Gujrati


BASIC ECONOMETRICS: FOURTH EDITION

Damodar N. Gujarati

PART I SINGLE-EQUATION REGRESSION MODELS

1 The Nature of Regression Analysis
2 Two-Variable Regression Analysis: Some Basic Ideas
3 Two-Variable Regression Model: The Problem of Estimation
4 Classical Normal Linear Regression Model (CNLRM)
5 Two-Variable Regression: Interval Estimation and Hypothesis Testing
6 Extensions of the Two-Variable Linear Regression Model
7 Multiple Regression Analysis: The Problem of Estimation
8 Multiple Regression Analysis: The Problem of Inference
9 Dummy Variable Regression Models

PART II RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL

10 Multicollinearity: What Happens if the Regressors Are Correlated?
11 Heteroscedasticity: What Happens if the Error Variance Is Nonconstant?
12 Autocorrelation: What Happens if the Error Terms Are Correlated?
13 Econometric Modeling: Model Specification and Diagnostic Testing

PART III TOPICS IN ECONOMETRICS

14 Nonlinear Regression Models
15 Qualitative Response Regression Models
16 Panel Data Regression Models
17 Dynamic Econometric Models: Autoregressive and Distributed-Lag Models

PART IV SIMULTANEOUS-EQUATION MODELS

18 Simultaneous-Equation Models
19 The Identification Problem
20 Simultaneous-Equation Methods
21 Time Series Econometrics: Some Basic Concepts
22 Time Series Econometrics: Forecasting

INTRODUCTION

I.1 WHAT IS ECONOMETRICS?

Literally interpreted, econometrics means "economic measurement." Although measurement is an important part of econometrics, the scope of econometrics is much broader, as can be seen from the following quotations:

"Econometrics, the result of a certain outlook on the role of economics, consists of the application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain numerical results."

". . . econometrics may be defined as the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference."

"Econometrics may be defined as the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena."

"Econometrics is concerned with the empirical determination of economic laws."

METHODOLOGY OF ECONOMETRICS

How do econometricians proceed in their analysis of an economic problem? That is, what is their methodology? Although there are several schools of thought on econometric methodology, we present here the traditional or classical methodology, which still dominates empirical research in economics and other social and behavioral sciences. Broadly speaking, traditional econometric methodology proceeds along the following lines:

1. Statement of theory or hypothesis
2. Specification of the mathematical model of the theory
3. Specification of the statistical, or econometric, model
4. Obtaining the data
5. Estimation of the parameters of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for control or policy purposes

FIGURE I.4 Anatomy of econometric modeling. Please see Page 10.

CHAPTER ONE: THE NATURE OF REGRESSION ANALYSIS

1.1 HISTORICAL ORIGIN OF THE TERM REGRESSION

1.2 THE MODERN INTERPRETATION OF REGRESSION

1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS

1.4 REGRESSION VERSUS CAUSATION

1.5 REGRESSION VERSUS CORRELATION

1.6 TERMINOLOGY AND NOTATION

Before we proceed to a formal analysis of regression theory, let us dwell briefly on the matter of

terminology and notation. In the literature the terms dependent variable and explanatory variable are

described variously. A representative list is:

Dependent variable        Explanatory variable
Explained variable        Independent variable
Predictand                Predictor
Regressand                Regressor
Response                  Stimulus
Endogenous                Exogenous
Outcome                   Covariate
Controlled variable       Control variable

1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS

The success of any econometric analysis ultimately depends on the availability of the appropriate data. It is therefore essential that we spend some time discussing the nature, sources, and limitations of the data that one may encounter in empirical analysis.

Types of Data

Three types of data may be available for empirical analysis: time series, cross-section, and pooled (i.e.,

combination of time series and cross-section) data.

Time series data: see Table I.2 (page 32).

Cross-section data: see Table 1.1 (page 27).

CHAPTER TWO: TWO-VARIABLE REGRESSION ANALYSIS: SOME BASIC IDEAS

2.1 A HYPOTHETICAL EXAMPLE

2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)

2.3 THE MEANING OF THE TERM LINEAR

Since this text is concerned primarily with linear models, it is essential to know what the term linear

really means, for it can be interpreted in two different ways.

Linearity in the Variables

Linearity in the Parameters
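The two meanings of "linear" can be contrasted with two conditional mean functions. These are standard illustrations, not examples taken from these notes. One quick check is to differentiate with respect to a parameter: if the derivative does not itself contain any parameter, the function is linear in that parameter.

```latex
% Linear in the parameters, nonlinear in the variable X
E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2,
\qquad \frac{\partial E(Y \mid X_i)}{\partial \beta_2} = X_i^2

% Linear in the variable X, nonlinear in the parameter \beta_2
E(Y \mid X_i) = \beta_1 + \beta_2^2 X_i,
\qquad \frac{\partial E(Y \mid X_i)}{\partial \beta_2} = 2\beta_2 X_i
```

In a regression context, "linear" means linear in the parameters. The first function therefore counts as a linear regression model, and the second does not.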

2.4 STOCHASTIC SPECIFICATION OF PRF
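Here is a minimal statement of the stochastic specification in the notation of these notes. The disturbance is the deviation of an individual $Y_i$ from its conditional mean. Taking conditional expectations then shows that $E(u_i \mid X_i) = 0$ whenever the conditional mean is correctly specified.

```latex
u_i = Y_i - E(Y \mid X_i)

Y_i = E(Y \mid X_i) + u_i = \beta_1 + \beta_2 X_i + u_i

E(Y_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i)
\;\Longrightarrow\; E(u_i \mid X_i) = 0
```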

2.5 THE SIGNIFICANCE OF THE STOCHASTIC DISTURBANCE TERM

As noted in Section 2.4, the disturbance term $u_i$ is a surrogate for all those variables that are omitted from the model but that collectively affect Y. The obvious question is: Why not introduce these variables into the model explicitly? Stated otherwise, why not develop a multiple regression model with as many variables as possible? The reasons are many.

1. Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often is, incomplete. We might know for certain that weekly income X influences weekly consumption expenditure Y, but we might be ignorant or unsure about the other variables affecting Y. Therefore, $u_i$ may be used as a substitute for all the excluded or omitted variables from the model.

2. Unavailability of data: Even if we know what some of the excluded variables are and therefore

consider a multiple regression rather than a simple regression, we may not have quantitative

information about these.

3. Core variables versus peripheral variables

4. Intrinsic randomness in human behavior

5. Poor proxy variables

6. Principle of parsimony

7. Wrong functional form

2.7 AN ILLUSTRATIVE EXAMPLE: Please see page 51.

CHAPTER THREE: TWO-VARIABLE REGRESSION MODEL: THE PROBLEM OF ESTIMATION

3.1 THE METHOD OF ORDINARY LEAST SQUARES
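In deviation form, the OLS estimators for the two-variable model are $\hat\beta_2 = \sum x_i y_i / \sum x_i^2$ and $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$, where $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$. The Python sketch below implements exactly these two formulas. The function name `ols_two_variable` and the five-point data set are hypothetical.

```python
def ols_two_variable(X, Y):
    """Return (b1, b2) for Y_i = b1 + b2*X_i + u_i using the deviation-form OLS formulas."""
    n = len(X)
    if n != len(Y) or n < 2:
        raise ValueError("X and Y must have the same length, with n >= 2")
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviations from sample means: x_i = X_i - X_bar, y_i = Y_i - Y_bar
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    if sxx == 0:
        raise ValueError("all X values are equal; the slope cannot be estimated")
    b2 = sxy / sxx            # slope: sum(x_i * y_i) / sum(x_i ** 2)
    b1 = y_bar - b2 * x_bar   # intercept: Y_bar - b2 * X_bar
    return b1, b2

b1, b2 = ols_two_variable([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b1, b2)  # approximately 2.2 and 0.6
```

The `sxx == 0` check mirrors the requirement that the X values must not all be the same: without variation in X, the slope is not defined.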

3.2 THE CLASSICAL LINEAR REGRESSION MODEL: THE ASSUMPTIONS UNDERLYING THE METHOD OF LEAST SQUARES

Assumption 1: Linear regression model. The regression model is linear in the parameters, as shown in

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

This assumption means that the dependent variable can be written as a linear function of the unknown coefficients $\beta_1$ and $\beta_2$ plus a disturbance term, so that $Y_i = \beta_1 + \beta_2 X_i + u_i$ for $i = 1, 2, 3, \ldots, n$. The regressor itself need not enter linearly; for example, $Y_i = \beta_1 + \beta_2 X_i^2 + u_i$ also satisfies this assumption.

Assumption 2: X values are fixed in repeated sampling. Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic.

Assumptions are needed about the manner in which the $Y_i$ are generated. To see why this requirement is needed, look at the PRF: $Y_i = \beta_1 + \beta_2 X_i + u_i$. It shows that $Y_i$ depends on both $X_i$ and $u_i$. Therefore, unless we are specific about how $X_i$ and $u_i$ are created or generated, there is no way we can make any statistical inference about the $Y_i$ and also, as we shall see, about $\beta_1$ and $\beta_2$. Thus, the assumptions made about the $X_i$ variable(s) and the error term are extremely critical to the valid interpretation of the regression estimates.

Assumption 3: Zero mean value of disturbance $u_i$. This assumption states that the mean value of $u_i$, conditional upon the given $X_i$, is zero. Symbolically,

$$E(u_i \mid X_i) = 0 \qquad (3.2.1)$$

Geometrically, this assumption can be pictured as in Figure 3.3, which shows a few values of the variable X and the Y populations associated with each of them. As shown, each Y population corresponding to a given X is distributed around its mean value (shown by the circled points on the PRF), with some Y values above the mean and some below it. The distances above and below the mean values are nothing but the $u_i$, and what (3.2.1) requires is that the average or mean value of these deviations corresponding to any given X should be zero.

This assumption should not be difficult to comprehend in view of the discussion in Section 2.4. All that this assumption says is that the factors not explicitly included in the model, and therefore subsumed in $u_i$, do not systematically affect the mean value of Y. So to speak, the positive $u_i$ values cancel out the negative $u_i$ values, so that their average or mean effect on Y is zero.

(Note: For illustration, we are assuming merely that the u's are distributed symmetrically as shown in Figure 3.3. But shortly we will assume that the u's are distributed normally.)

Assumption 4: Homoscedasticity or equal variance of $u_i$. Given the value of X, the variance of $u_i$ is the same for all observations. That is, the conditional variances of $u_i$ are identical. Symbolically, we have

$$\begin{aligned}
\operatorname{var}(u_i \mid X_i) &= E[u_i - E(u_i \mid X_i)]^2 \\
&= E(u_i^2 \mid X_i) \quad \text{because of Assumption 3} \\
&= \sigma^2 \qquad (3.2.2)
\end{aligned}$$

where var stands for variance.

In passing, note that the assumption $E(u_i \mid X_i) = 0$ implies that $E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i$. (Why?) Therefore, the two assumptions are equivalent.

Assumption 5: No autocorrelation between the disturbances. Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between any two $u_i$ and $u_j$ ($i \neq j$) is zero. Symbolically,

$$\begin{aligned}
\operatorname{cov}(u_i, u_j \mid X_i, X_j) &= E\{[u_i - E(u_i)] \mid X_i\}\{[u_j - E(u_j)] \mid X_j\} \\
&= E(u_i \mid X_i)(u_j \mid X_j) \quad \text{(why?)} \\
&= 0 \qquad (3.2.5)
\end{aligned}$$

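Assumption 6: Zero covariance between $u_i$ and $X_i$, or $E(u_i X_i) = 0$. Formally,

$$\begin{aligned}
\operatorname{cov}(u_i, X_i) &= E[u_i - E(u_i)][X_i - E(X_i)] \\
&= E[u_i(X_i - E(X_i))] \quad \text{since } E(u_i) = 0 \\
&= E(u_i X_i) - E(X_i)E(u_i) \quad \text{since } E(X_i) \text{ is nonstochastic} \\
&= E(u_i X_i) \quad \text{since } E(u_i) = 0 \\
&= 0 \quad \text{by assumption} \qquad (3.2.6)
\end{aligned}$$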

Assumption 6 states that the disturbance u and the explanatory variable X are uncorrelated. The rationale for this assumption is as follows. When we expressed the PRF as in (2.4.2), we assumed that X and u (which may represent the influence of all the omitted variables) have separate (and additive) influence on Y. But if X and u are correlated, it is not possible to assess their individual effects on Y. For instance, if X and u are positively correlated, X increases when u increases and decreases when u decreases, so the separate influence of X on Y cannot be isolated from that of u.

Assumption 7: The number of observations n must be greater than the number of parameters to be

estimated. Alternatively, the number of observations n must be greater than the number of explanatory

variables.

Assumption 8: Variability in X values. The X values in a given sample must not all be the same. Technically, var(X) must be a finite positive number.

Assumption 9: The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in empirical analysis.

Assumption 10: There is no perfect multicollinearity. That is, there are no perfect linear relationships

among the explanatory variables.



3.3 PRECISION OR STANDARD ERRORS OF LEAST-SQUARES ESTIMATES

Meaning of Degrees of Freedom: Please see page 77
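For reference, here are the standard variance and standard-error formulas for the two-variable model under the classical assumptions. They are textbook results, written with $x_i = X_i - \bar X$ and $\hat u_i$ for the OLS residuals. The divisor $n - 2$ in $\hat\sigma^2$ is the number of degrees of freedom: n observations minus the two estimated parameters $\hat\beta_1$ and $\hat\beta_2$. In practice, $\sigma$ is replaced by $\hat\sigma$ to obtain estimated standard errors.

```latex
\operatorname{var}(\hat\beta_2) = \frac{\sigma^2}{\sum x_i^2},
\qquad se(\hat\beta_2) = \frac{\sigma}{\sqrt{\sum x_i^2}}

\operatorname{var}(\hat\beta_1) = \frac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2,
\qquad se(\hat\beta_1) = \sqrt{\frac{\sum X_i^2}{n \sum x_i^2}}\;\sigma

\hat\sigma^2 = \frac{\sum \hat u_i^2}{n - 2}
```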

3.4 PROPERTIES OF LEAST-SQUARES ESTIMATORS: THE GAUSS-MARKOV THEOREM

As noted earlier, given the assumptions of the classical linear regression model, the least-squares estimates possess some ideal or optimum properties. These properties are contained in the well-known Gauss-Markov theorem. To understand this theorem, we need to consider the best linear unbiasedness property of an estimator. An estimator, say the OLS estimator $\hat\beta_2$, is a best linear unbiased estimator (BLUE) of $\beta_2$ if the following conditions hold:

1. It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model.

2. It is unbiased, that is, its average or expected value, $E(\hat\beta_2)$, is equal to the true value, $\beta_2$.

3. It has minimum variance in the class of all such linear unbiased estimators; an unbiased estimator with the least variance is known as an efficient estimator.

In the regression context it can be proved that the OLS estimators are BLUE. This is the gist of the famous Gauss-Markov theorem, which can be stated as follows:

Gauss-Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance, that is, they are BLUE. The proof of this theorem is sketched in Appendix 3A, Section 3A.6.

Question: Prove that OLS coefficient for the slope parameter in the simple linear regression model is

unbiased and efficient.

Answer: See Appendix A.
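A Monte Carlo experiment can illustrate the unbiasedness part of the question, though not prove it. The sketch below draws many samples with X held fixed (Assumption 2), estimates the slope by OLS in each, and compares the average estimate and the spread with the true $\beta_2$ and with $\sigma^2/\sum x_i^2$. The "true" PRF, the sample size, and the number of replications are assumptions. The experiment says nothing about the "best" (minimum-variance) part, which requires the proof in Appendix 3A.

```python
import random
import statistics

def ols_slope(X, Y):
    """Deviation-form OLS slope: sum(x_i * y_i) / sum(x_i ** 2)."""
    x_bar, y_bar = statistics.fmean(X), statistics.fmean(Y)
    sxx = sum((x - x_bar) ** 2 for x in X)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx

# Assumed "true" PRF for the experiment: Y = 3 + 0.8 X + u, with u ~ N(0, 1).
BETA1, BETA2, SIGMA = 3.0, 0.8, 1.0
X = list(range(1, 21))          # X held fixed in repeated sampling
random.seed(123)

slopes = []
for _ in range(5000):           # 5,000 repeated samples
    Y = [BETA1 + BETA2 * x + random.gauss(0, SIGMA) for x in X]
    slopes.append(ols_slope(X, Y))

sxx = sum((x - statistics.fmean(X)) ** 2 for x in X)
print(f"mean of slope estimates: {statistics.fmean(slopes):.4f} (true value {BETA2})")
print(f"simulated variance:      {statistics.variance(slopes):.6f}")
print(f"sigma^2 / sum(x_i^2):    {SIGMA ** 2 / sxx:.6f}")
```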

3.5 THE COEFFICIENT OF DETERMINATION $r^2$: A MEASURE OF GOODNESS OF FIT

The coefficient of determination, $R^2$ (written $r^2$ in the two-variable case), is a measure of the closeness of fit of a regression. However, $R^2$ cannot be used to compare two equations that contain different numbers of explanatory variables. This is because adding explanatory variables never decreases the proportion of the variation in Y (the dependent variable) explained by the X's (the independent variables); in practice $R^2$ almost always rises. Therefore, we will obtain a higher (or at least no lower) $R^2$ regardless of whether the additional regressor is important. For this reason we need a different measure that takes into account the number of explanatory variables included in each model. This measure is called the adjusted $R^2$, because it is adjusted for the number of regressors (or, equivalently, for the degrees of freedom).

Please see pages 81-87 for details.


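Adjusted $R^2$:

$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - (1 - R^2)\,\frac{n-1}{n-k}$$

where RSS = residual sum of squares, TSS = total sum of squares, n = number of observations, and k = number of parameters estimated, including the intercept. Also, TSS = RSS + ESS, where ESS = explained sum of squares.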
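Both measures can be computed directly from the RSS and TSS in the ANOVA table of the lnwage regression. The Python sketch below assumes that k counts every estimated parameter, including the intercept (k = 4 in that regression). The function names are illustrative. The computed values match the reported R Square (0.148647) and Adjusted R Square (0.145797).

```python
def r_squared(rss, tss):
    """Coefficient of determination: R^2 = 1 - RSS/TSS."""
    return 1.0 - rss / tss

def adjusted_r_squared(rss, tss, n, k):
    """Adjusted R^2 = 1 - [RSS/(n-k)] / [TSS/(n-1)]; k includes the intercept."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))

# Sums of squares from the ANOVA part of the lnwage regression (TABLE 1).
RSS, TSS, N, K = 135.2109838, 158.8189938, 900, 4
r2 = r_squared(RSS, TSS)
adj = adjusted_r_squared(RSS, TSS, N, K)
print(f"R^2          = {r2:.6f}")
print(f"adjusted R^2 = {adj:.6f}")
```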

General criteria for model selection:

Increasing the number of explanatory variables in a multiple regression model decreases (or at least does not increase) the RSS, and $R^2$ will therefore increase. However, this comes at the cost of a loss of degrees of freedom. Apart from the adjusted $R^2$, another way of allowing for the number of X's when assessing goodness of fit is to use information criteria for model comparison, such as:

Akaike Information Criterion (AIC); Schwarz Information Criterion (SIC), also known as the Schwarz Bayesian Criterion (SBC); and Hannan-Quinn Criterion (HQC).

The general guideline for selecting a model is this: when comparing two or more models, the model with the lowest AIC is preferred under the AIC criterion. Similarly, the model with the lowest SIC is preferred under the SIC criterion.

For details please see pages 530-539.
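The criteria can be computed from the RSS of an OLS fit. The sketch below assumes the per-observation, log-likelihood-based definitions (AIC = −2 log L/n + 2k/n, and so on). These appear to reproduce the "Akaike info criterion", "Schwarz criterion", and "Hannan-Quinn criter." values in Table 1.1 (0.951208, 0.972552, 0.959361).

Textbooks also use an RSS-based form, for example ln AIC = 2k/n + ln(RSS/n). That form differs from the one here only by the constant 1 + ln(2π), so for a given n both rank models identically. Function names are illustrative.

```python
import math

def gaussian_loglik(rss, n):
    """Maximized log-likelihood of an OLS regression with normal errors."""
    return -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(rss / n))

def info_criteria(rss, n, k):
    """Per-observation AIC, Schwarz (SIC/SBC) and Hannan-Quinn criteria; lower is better."""
    ll = gaussian_loglik(rss, n)
    aic = -2.0 * ll / n + 2.0 * k / n
    sic = -2.0 * ll / n + k * math.log(n) / n
    hqc = -2.0 * ll / n + 2.0 * k * math.log(math.log(n)) / n
    return ll, aic, sic, hqc

# lnwage regression: RSS = 135.2109838, n = 900, k = 4 parameters.
ll, aic, sic, hqc = info_criteria(rss=135.2109838, n=900, k=4)
print(f"log L = {ll:.4f}, AIC = {aic:.6f}, SIC = {sic:.6f}, HQC = {hqc:.6f}")
```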

3.6 A NUMERICAL EXAMPLE: Please see this example on page 89.

Illustrative Example: Please see page 90.

Some examples of Regression

Data Source: wage.wf1

SUMMARY OUTPUT: TABLE 1

Regression Statistics
Multiple R            0.385548014
R Square              0.148647271
Adjusted R Square     0.14579676
Standard Error        0.388465077
Observations          900

ANOVA
             df     SS             MS            F             Significance F
Regression   3      23.60801004    7.869336682   52.14758055   4.53E-31
Residual     896    135.2109838    0.150905116
Total        899    158.8189938

             Coefficients   Standard Error   t Stat        P-value       Lower 95%   Upper 95%
Intercept    5.528329355    0.112794574      49.01236961   8.6993E-256   5.306957    5.7497
EDUC         0.07311662     0.006635679      11.01871044   1.44273E-26   0.060093    0.0861
EXPER        0.015357835    0.003425312      4.483630929   8.2905E-06    0.008635    0.0221
TENURE       0.012964063    0.002630727      4.927939405   9.89741E-07   0.007801    0.0181

$$F = \frac{ESS/df_{ESS}}{RSS/df_{RSS}} = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{7.869336682}{0.150905116} = 52.14758055$$

or, equivalently,

$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

The p-value of obtaining an F value as large as 52.14758055 or larger is practically zero (Significance F = 4.53E-31). This leads to the rejection of the hypothesis that educ, exper, and tenure together have no effect on lnwage. At the conventional 5% level of significance, the critical F value for 3 df in the numerator and 896 df in the denominator is about 2.61. The observed F of 52.14758055 far exceeds this critical value, so we reject the null hypothesis $H_0: \beta_1 = \beta_2 = \beta_3 = 0$.
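Here is a minimal Python sketch of the two equivalent forms of the overall F statistic, fed with the ANOVA figures of the lnwage regression (n = 900, k = 4). The function names are illustrative. The 5% critical value of F(3, 896) is hard-coded as an assumption taken from F tables, because computing it needs a statistics library.

```python
def overall_f(ess, rss, n, k):
    """F = [ESS/(k-1)] / [RSS/(n-k)] for H0: all slope coefficients are zero."""
    return (ess / (k - 1)) / (rss / (n - k))

def overall_f_from_r2(r2, n, k):
    """Equivalent R^2 form: F = [R^2/(k-1)] / [(1-R^2)/(n-k)]."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

F1 = overall_f(ess=23.60801004, rss=135.2109838, n=900, k=4)
F2 = overall_f_from_r2(r2=0.148647271, n=900, k=4)
print(f"F from sums of squares: {F1:.4f}")
print(f"F from R-squared:       {F2:.4f}")

# ASSUMED 5% critical value for F(3, 896), read from F tables (about 2.61).
F_CRIT_5PCT = 2.61
print("reject H0" if F1 > F_CRIT_5PCT else "do not reject H0")
```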


Note that the t-testing procedure is based on the assumption that the error term $u_i$ follows the normal distribution. Although we cannot observe $u_i$ directly, we can observe its proxy, the residuals $\hat u_i$. For our lnwage regression, the histogram of the residuals is shown in the figure below, and it suggests that the residuals are not normally distributed. The Jarque-Bera (JB) statistic supports this: JB is about 35 with a p-value of practically zero, so we reject the hypothesis that the error term is normally distributed.

For our example, the skewness is -0.23 and the kurtosis is 3.84. Recall that for a normally distributed variable the skewness and kurtosis values are, respectively, 0 and 3.
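The JB statistic combines the two departures from normality: $JB = \frac{n}{6}\left[S^2 + \frac{(K-3)^2}{4}\right]$. Under the null of normality it is asymptotically chi-square with 2 df. For 2 df the upper-tail probability is exactly $e^{-JB/2}$, so no statistics library is needed. The sketch below recomputes JB from the skewness and kurtosis reported for the residuals and for EDUC. The function name is illustrative.

```python
import math

def jarque_bera(n, skewness, kurtosis):
    """JB = n/6 * [S^2 + (K - 3)^2 / 4], asymptotically chi-square(2) under normality."""
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)   # upper-tail probability of a chi-square with 2 df
    return jb, p_value

jb, p = jarque_bera(n=900, skewness=-0.236952, kurtosis=3.841932)
print(f"residuals: JB = {jb:.3f}, p-value = {p:.2e}")

jb_educ, p_educ = jarque_bera(n=900, skewness=0.541434, kurtosis=2.261056)
print(f"EDUC:      JB = {jb_educ:.3f}, p-value = {p_educ:.2e}")
```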

Figure: Histogram of the residuals (Series: RESID; Sample 1 900; Observations 900)
Mean           1.16e-15
Median         0.023735
Maximum        1.332397
Minimum        -1.837909
Std. Dev.      0.387816
Skewness       -0.236952
Kurtosis       3.841932
Jarque-Bera    35.00382
Probability    0.000000

Figure: Histogram of EDUC (Series: EDUC; Sample 1 900; Observations 900)
Mean           13.48000
Median         12.00000
Maximum        18.00000
Minimum        9.000000
Std. Dev.      2.200374
Skewness       0.541434
Kurtosis       2.261056
Jarque-Bera    64.44902
Probability    0.000000

$$\ln(wage_i) = \beta_0 + \beta_1\, educ_i + \beta_2\, exper_i + \beta_3\, tenure_i + u_i$$

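As may be seen from Table 1.1, all three slope coefficients are positive. Their t-values (11.02, 4.48, and 4.93) are all above the rule-of-thumb critical t-value of 2, so all three are statistically significant. Thus wages tend to rise as education, experience, and tenure increase. Despite the significance of these variables, the adjusted $R^2$ is quite low (0.146), probably because other variables that affect wages are not included in the model.

Because the dependent variable is the logarithm of wage, the slope coefficients measure relative (percentage) changes. For example, if education increases by 1 year, holding experience and tenure constant, the wage increases on average by about 7.3 percent (100 × 0.0731; the exact figure is 100[exp(0.0731) - 1], about 7.6 percent). The intercept, 5.53, is the predicted value of lnwage when education, experience, and tenure are all zero. Education never falls below 9 years in the sample, so the intercept has no useful economic interpretation here. The $R^2$ value of about 0.15 means that about 15% of the variation in lnwage is explained by educ, exper, and tenure.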

Testing hypotheses: Suppose we want to test the null hypothesis that there is no relationship between lnwage and education, that is, that the true slope coefficient $\beta_1 = 0$. The estimated value of $\beta_1$ is 0.073117. If the null hypothesis were true, what is the probability of obtaining an estimate as far from zero as 0.073117? Under the null hypothesis, the t-value is 11.01871, and the p-value of obtaining such a t-value is practically zero. In other words, we can reject the null hypothesis resoundingly.
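The t-ratio, the 95% confidence interval, and the percentage effect discussed above can all be reproduced from the EDUC coefficient and its standard error. The two-sided 5% critical t for 896 df is hard-coded as an assumption (about 1.9626, close to the normal value 1.96). The function names are illustrative. The lower bound matches the reported Lower 95% of 0.060093.

```python
import math

def t_ratio(coef, se, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (coef - hypothesized) / se

def conf_interval(coef, se, t_crit):
    """Two-sided interval: estimate -/+ t_crit * standard error."""
    return coef - t_crit * se, coef + t_crit * se

# EDUC estimate and standard error from the regression output (896 df).
b_educ, se_educ = 0.07311662, 0.006635679
T_CRIT = 1.9626   # ASSUMED two-sided 5% critical t for 896 df, from t tables

t = t_ratio(b_educ, se_educ)
lower, upper = conf_interval(b_educ, se_educ, T_CRIT)
pct = 100 * (math.exp(b_educ) - 1)   # implied % change in wage per extra year of education

print(f"t = {t:.4f}")
print(f"95% CI: ({lower:.6f}, {upper:.6f})")
print(f"implied percentage effect on wage: {pct:.2f}%")
```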

Table 1.1: Results from the wage equation

Dependent Variable: LNWAGE
Method: Least Squares
Date: 11/05/12   Time: 18:49
Sample: 1 900
Included observations: 900

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          5.528329      0.112795     49.01237      0.0000
EDUC       0.073117      0.006636     11.01871      0.0000
EXPER      0.015358      0.003425     4.483631      0.0000
TENURE     0.012964      0.002631     4.927939      0.0000

R-squared             0.148647    Mean dependent var       6.786164
Adjusted R-squared    0.145797    S.D. dependent var       0.420312
S.E. of regression    0.388465    Akaike info criterion    0.951208
Sum squared resid     135.2110    Schwarz criterion        0.972552
Log likelihood        -424.0434   Hannan-Quinn criter.     0.959361
F-statistic           52.14758    Durbin-Watson stat       1.750376
Prob(F-statistic)     0.000000


Descriptive statistics: Table 2

                EDUC       EXPER      WAGE       TENURE
Mean            13.48000   11.59222   964.2644   7.265556
Median          12.00000   11.00000   912.5000   7.000000
Maximum         18.00000   23.00000   3078.000   22.00000
Minimum         9.000000   1.000000   115.0000   0.000000
Std. Dev.       2.200374   4.379564   405.1624   5.080611
Skewness        0.541434   0.072955   1.203925   0.422778
Kurtosis        2.261056   2.437964   5.746906   2.199197
Jarque-Bera     64.44902   12.64404   500.3712   50.85935
Probability     0.000000   0.001796   0.000000   0.000000
Sum             12132.00   10433.00   867838.0   6539.000
Sum Sq. Dev.    4352.640   17243.35   1.48E+08   23205.53
Observations    900        900        900        900

Wald Test:
Equation: Untitled
Null Hypothesis: C(3)=C(4)

Test Statistic   Value      df         Probability
t-statistic      0.498655   896        0.6181
F-statistic      0.248656   (1, 896)   0.6181
Chi-square       0.248656   1          0.6180

Null Hypothesis Summary:
Normalized Restriction (= 0)   Value      Std. Err.
C(3) - C(4)                    0.002394   0.004800

Restrictions are linear in coefficients.

Here C(3) and C(4) are the coefficients of EXPER and TENURE in Table 1.1 (their difference, 0.015358 - 0.012964 = 0.002394, is the normalized restriction shown). With a p-value of 0.618, we cannot reject the hypothesis that experience and tenure have equal effects on lnwage.
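For a single linear restriction such as C(3) = C(4), the Wald t-statistic is the estimated difference divided by its standard error, and F = t². The sketch below uses the rounded coefficients from Table 1.1 and the reported Std. Err. of the difference, so it matches the output only to about three decimals. It also backs out the covariance between the two estimates implied by var(b3 − b4) = var(b3) + var(b4) − 2 cov(b3, b4). That covariance is a derived, approximate quantity and is not part of the printed output.

```python
# Rounded inputs from Table 1.1 and the Wald test output.
b_exper, se_exper = 0.015358, 0.003425
b_tenure, se_tenure = 0.012964, 0.002631
se_diff = 0.004800           # reported Std. Err. of C(3) - C(4)

diff = b_exper - b_tenure
t_stat = diff / se_diff
f_stat = t_stat ** 2         # one restriction: F = t^2

# var(b3 - b4) = var(b3) + var(b4) - 2 cov(b3, b4)  =>  implied covariance:
implied_cov = (se_exper ** 2 + se_tenure ** 2 - se_diff ** 2) / 2.0

print(f"difference = {diff:.6f}, t = {t_stat:.4f}, F = {f_stat:.4f}")
print(f"implied cov(b_exper, b_tenure) = {implied_cov:.3e}")
```

The implied covariance is negative. This is why the standard error of the difference (0.0048) is larger than $\sqrt{se_3^2 + se_4^2}$, about 0.0043.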

Redundant Variables Test
Equation: UNTITLED
Specification: LNWAGE C EDUC EXPER TENURE
Redundant Variables: TENURE

                     Value       df         Probability
t-statistic          4.927939    896        0.0000
F-statistic          24.28459    (1, 896)   0.0000
Likelihood ratio     24.06829    1          0.0000

F-test summary:
                     Sum of Sq.   df    Mean Squares
Test SSR             3.664668     1     3.664668
Restricted SSR       138.8757     897   0.154822
Unrestricted SSR     135.2110     896   0.150905

LR test summary:
                     Value        df
Restricted LogL      -436.0776    897
Unrestricted LogL    -424.0434    896

Restricted Test Equation:
Method: Least Squares   Date: 11/05/12   Time: 18:53
Sample: 1 900
Included observations: 900

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          5.537798      0.114233     48.47827      0.0000
EDUC       0.075865      0.006697     11.32741      0.0000
EXPER      0.019470      0.003365     5.786278      0.0000

Sum squared resid    138.8757     Schwarz criterion       0.991736
Log likelihood       -436.0776    Hannan-Quinn criter.    0.981843




Date: 11/05/12   Time: 18:55
Sample: 1 900

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          6.697589      0.040722     164.4699      0.0000
EXPER      -0.002011     0.003239     -0.621069     0.5347
TENURE     0.015400      0.002792     5.516228      0.0000

Omitted Variables Test
Equation: UNTITLED
Specification: LNWAGE C EXPER TENURE
Omitted Variables: EDUC

                     Value       df         Probability
t-statistic          11.01871    896        0.0000
F-statistic          121.4120    (1, 896)   0.0000
Likelihood ratio     114.3693    1          0.0000

F-test summary:
                     Sum of Sq.   df    Mean Squares
Test SSR             18.32169     1     18.32169
Restricted SSR       153.5327     897   0.171162

LR test summary:
                     Value        df
Restricted LogL      -481.2281    897
Unrestricted LogL    -424.0434    896

Unrestricted Test Equation:
Date: 11/05/12   Time: 18:56
Sample: 1 900

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          5.528329      0.112795     49.01237      0.0000
EXPER      0.015358      0.003425     4.483631      0.0000
TENURE     0.012964      0.002631     4.927939      0.0000
EDUC       0.073117      0.006636     11.01871      0.0000
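The redundant-variables and omitted-variables tests both compare a restricted model with the unrestricted model C EDUC EXPER TENURE. The Python sketch below assumes the textbook formulas F = [(RSS_R − RSS_UR)/m] / [RSS_UR/(n − k)] and LR = 2(log L_UR − log L_R). It recomputes the reported statistics from the SSR and log-likelihood values in the output. The function names are illustrative.

```python
def restriction_f(rss_restricted, rss_unrestricted, m, n, k):
    """F = [(RSS_R - RSS_UR)/m] / [RSS_UR/(n-k)]; m restrictions, k params in the unrestricted model."""
    return ((rss_restricted - rss_unrestricted) / m) / (rss_unrestricted / (n - k))

def likelihood_ratio(loglik_restricted, loglik_unrestricted):
    """LR = 2 (log L_UR - log L_R), asymptotically chi-square with m df."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

N, K = 900, 4
RSS_UR, LL_UR = 135.2110, -424.0434   # unrestricted model: C EDUC EXPER TENURE

# Dropping TENURE (redundant variables test) and dropping EDUC (omitted variables test).
for label, rss_r, ll_r in [("TENURE", 138.8757, -436.0776),
                           ("EDUC", 153.5327, -481.2281)]:
    F = restriction_f(rss_r, RSS_UR, m=1, n=N, k=K)
    LR = likelihood_ratio(ll_r, LL_UR)
    print(f"{label}: F = {F:.3f}, LR = {LR:.3f}")
```

With a single restriction (m = 1), F equals the square of the dropped variable's t-statistic: 4.927939² ≈ 24.28 for TENURE and 11.01871² ≈ 121.41 for EDUC.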







CHAPTER FOUR: CLASSICAL NORMAL LINEAR REGRESSION MODEL (CNLRM)

4.2 THE NORMALITY ASSUMPTION FOR $u_i$

The classical normal linear regression model assumes that each $u_i$ is distributed normally with

Mean: $E(u_i) = 0$   (4.2.1)

Variance: $E[u_i - E(u_i)]^2 = E(u_i^2) = \sigma^2$   (4.2.2)

$\operatorname{cov}(u_i, u_j)$: $E\{[u_i - E(u_i)][u_j - E(u_j)]\} = E(u_i u_j) = 0, \quad i \neq j$   (4.2.3)

The assumptions given above can be more compactly stated as

$u_i \sim N(0, \sigma^2)$   (4.2.4)

Why is the normality assumption required? Please see page 109.

4.3: Properties of the OLS estimators under the normality assumption:

See page: 110-112.



CHAPTER FIVE: TWO VARIABLE REGRESSION: INTERVAL ESTIMATION AND HYPOTHESIS TESTING

5.1: Statistical prerequisites

5.2 Interval Estimation: Some basic ideas

5.3 Confidence Intervals for regression coefficients

5.4 CONFIDENCE INTERVAL FOR $\sigma^2$

5.5 HYPOTHESIS TESTING: GENERAL COMMENTS

Having discussed the problem of point and interval estimation, we shall now consider the topic of

hypothesis testing. In this section we discuss briefly some general aspects of this topic; Appendix A gives

some additional details. The problem of statistical hypothesis testing may be stated simply as follows: Is

a given observation or finding compatible with some stated hypothesis or not? The word "compatible," as used here, means sufficiently close to the hypothesized value that we do not reject the stated hypothesis.

5.6 HYPOTHESIS TESTING: THE CONFIDENCE-INTERVAL APPROACH

Two-Sided or Two-Tail Test

To illustrate the confidence-interval approach, once again we revert to the consumption-income example. As we know, the estimated marginal propensity to consume (MPC), $\hat\beta_2$, is 0.5091. Suppose we postulate that

$H_0: \beta_2 = 0.3$
$H_1: \beta_2 \neq 0.3$

that is, the true MPC is 0.3 under the null hypothesis but it is less than or greater than 0.3 under the

alternative hypothesis. The null hypothesis is a simple hypothesis, whereas the alternative hypothesis is

composite; actually it is what is known as a two-sided hypothesis. Very often such a two-sided

alternative hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation

about the direction in which the alternative hypothesis should move from the null hypothesis.

Is the observed $\hat\beta_2$ compatible with $H_0$? To answer this question, let us refer to the confidence interval (5.3.9). We know that in the long run, intervals like (0.4268, 0.5914) will contain the true $\beta_2$ with 95 percent probability. Consequently, in the long run (i.e., in repeated sampling) such intervals provide a range or limits within which the true $\beta_2$ may lie with a confidence coefficient of, say, 95%. Thus, the confidence interval provides a set of plausible null hypotheses. Therefore, if $\beta_2$ under $H_0$ falls within the $100(1-\alpha)\%$ confidence interval, we do not reject the null hypothesis; if it lies outside the interval, we may reject it. This range is illustrated schematically in Figure 5.2. Always bear in mind that there is a $100\alpha$ percent chance that the confidence interval does not contain $\beta_2$ under $H_0$ even though the hypothesis is correct. In short, there is a $100\alpha$ percent chance of committing a Type I error. Thus, if $\alpha = 0.05$, there is a 5 percent chance that we could reject the null hypothesis even though it is true.

Following this rule, for our hypothetical example, $H_0: \beta_2 = 0.3$ clearly lies outside the 95% confidence interval given in (5.3.9). Therefore, we can reject the hypothesis that the MPC is 0.3 at the 5% level of significance.
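The confidence-interval approach can be sketched in a few lines of Python. The sketch assumes 8 degrees of freedom (10 observations, 2 parameters) for the consumption-income example, so the 5% two-sided critical t from tables is 2.306. With that value the sketch reproduces the interval (0.4268, 0.5914). The function name `ci_test` is illustrative.

```python
def ci_test(estimate, se, t_crit, h0_value):
    """Confidence-interval approach: reject H0 if h0_value lies outside estimate -/+ t_crit*se."""
    lower, upper = estimate - t_crit * se, estimate + t_crit * se
    reject = not (lower <= h0_value <= upper)
    return (lower, upper), reject

# MPC estimate 0.5091 with se 0.0357.
# ASSUMPTION: 8 df, so the 5% two-sided critical t value is 2.306.
interval, reject = ci_test(0.5091, 0.0357, t_crit=2.306, h0_value=0.3)
print(f"95% CI: ({interval[0]:.4f}, {interval[1]:.4f})")
print(f"reject H0: beta2 = 0.3? {reject}")
```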

One-Sided or One-Tail Test

Sometimes we have a strong a priori or theoretical expectation (or expectations based on some previous empirical work) that the alternative hypothesis is one-sided or unidirectional rather than two-sided, as just discussed. Thus, for our consumption-income example, one could postulate that

$H_0: \beta_2 \leq 0.3$ and $H_1: \beta_2 > 0.3$

Perhaps economic theory or prior empirical work suggests that the marginal propensity to consume is greater than 0.3. Although the procedure to test this hypothesis can be easily derived from (5.3.5), the actual mechanics are better explained in terms of the test-of-significance approach.

5.7 HYPOTHESIS TESTING: THE TEST-OF-SIGNIFICANCE APPROACH

Testing the Significance of Regression Coefficients: The t Test

An alternative but complementary approach to the confidence-interval method of testing statistical hypotheses is the test-of-significance approach, developed along independent lines by R. A. Fisher and jointly by Neyman and Pearson. Broadly speaking, a test of significance is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis. The key idea behind tests of significance is that of a test statistic (estimator) and the sampling distribution of such a statistic under the null hypothesis. The decision to accept or reject $H_0$ is made on the basis of the value of the test statistic obtained from the data at hand.

As an illustration, recall that under the normality assumption the variable

$$t = \frac{\hat\beta_2 - \beta_2}{se(\hat\beta_2)} = \frac{(\hat\beta_2 - \beta_2)\sqrt{\sum x_i^2}}{\hat\sigma}$$

follows the t distribution with n - 2 df. Please see pages 129-132.

5.8 HYPOTHESIS TESTING: SOME PRACTICAL ASPECTS

The Meaning of "Accepting" or "Rejecting" a Hypothesis

If on the basis of a test of significance, say, the t test, we decide to "accept" the null hypothesis, all we are saying is that on the basis of the sample evidence we have no reason to reject it; we are not saying that the null hypothesis is true beyond any doubt. Why? To answer this, let us revert to our consumption-income example and assume that $H_0: \beta_2$ (MPC) $= 0.50$. Now the estimated value of the MPC is $\hat\beta_2 = 0.5091$ with $se(\hat\beta_2) = 0.0357$. Then on the basis of the t test we find that $t = (0.5091 - 0.50)/0.0357 = 0.25$, which is insignificant at, say, $\alpha = 5\%$. Therefore, we say "accept" $H_0$. But now let us assume $H_0: \beta_2 = 0.48$. Applying the t test, we obtain $t = (0.5091 - 0.48)/0.0357 = 0.82$, which too is statistically insignificant. So now we say "accept" this $H_0$. Both null hypotheses cannot be true at the same time. "Accepting" a null hypothesis therefore means only that the data are compatible with it, not that it has been shown to be the true value.
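The point about "accepting" $H_0$ becomes concrete if the same t test is run against several hypothesized values. This Python sketch again assumes 8 df (critical t = 2.306 at 5%, two-sided) for the consumption-income example. The function name is illustrative.

```python
def t_test(estimate, se, h0_value, t_crit):
    """Two-sided test of significance: reject H0 when |t| exceeds t_crit."""
    t = (estimate - h0_value) / se
    return t, abs(t) > t_crit

T_CRIT = 2.306   # ASSUMPTION: 5% two-sided critical t with 8 df
results = {}
for h0 in (0.50, 0.48, 0.30):
    t, reject = t_test(0.5091, 0.0357, h0, T_CRIT)
    results[h0] = (t, reject)
    print(f"H0: beta2 = {h0:.2f}   t = {t:.2f}   {'reject' if reject else 'do not reject'}")
```

Both 0.50 and 0.48 survive the test, so neither can be singled out as "the" true MPC. The value 0.30, by contrast, is rejected, consistent with the confidence-interval result for that value.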

Please read page 139: The choice between the confidence-interval and test-of-significance approaches to hypothesis testing.

5.9 REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE

In this section we study regression analysis from the point of view of the analysis of variance and

introduce the reader to an illuminating and complementary way of looking at the statistical inference

problem.

5.12 EVALUATING THE RESULTS OF REGRESSION ANALYSIS

Please read page 147: Normality tests that include (1) Histogram of residuals; (2) Normal probability

plot (NPP), a graphical device; and (3) Jarque-Bera test.
