Naghshpour Chap One


  • 7/30/2019 Naghshpour Chap One


    Regression for Economics


    Shahdad Naghshpour


    Copyright Business Expert Press, 2012.

    All rights reserved. No part of this publication may be reproduced,
    stored in a retrieval system, or transmitted in any form or by any
    means (electronic, mechanical, photocopy, recording, or any other),
    except for brief quotations, not to exceed 400 words, without the prior
    permission of the publisher.

    First published in 2012 by

    Business Expert Press, LLC

    222 East 46th Street, New York, NY 10017
    www.businessexpertpress.com

    ISBN-13: 978-1-60649-405-9 (paperback)

    ISBN-13: 978-1-60649-406-6 (e-book)

    DOI 10.4128/9781606494066

    Business Expert Press Economics and Finance collection

    Collection ISSN: 2163-761X (print)

    Collection ISSN: 2163-7628 (electronic)

    Cover design by Jonathan Pennell

    Interior design by Exeter Premedia Services Private Ltd.,

    Chennai, India

    First edition: 2012

    10 9 8 7 6 5 4 3 2 1

    Printed in the United States of America.


    To Parisa

    SN


    Abstract

    The concept of regression was introduced by Sir Francis Galton, but
    R.A. Fisher provided the statistical theory and application for it for
    the first time. The 20th century witnessed the spread of regression
    analysis into every scientific branch. Regression analysis is the most
    commonly used statistical method in the world. It is used in economics
    and many other fields. Although few would characterize this technique
    as simple, regression is in fact both simple and elegant. The complexity
    that many attribute to regression analysis is often a reflection of their
    lack of familiarity with the language of mathematics. But regression
    analysis can be understood even without a mastery of sophisticated
    mathematical concepts. This book provides the foundation of
    regression analysis. All the examples are from economics, and in almost
    all the examples real data are used to show the applications of the
    method.

    This book seeks to demystify regression analysis. The concepts related
    to regression analysis are explained in a way that is comprehensible to
    those whose mathematical skills are not expert. There is a logic to
    regression analysis that resembles the intrinsic logic that we apply in
    comprehending the various events that fill our lives, which are
    probabilistic rather than deterministic in nature. What hinders people's
    comprehension of regression analysis is the difficulty many have in
    understanding mathematical symbols and derivations. By removing this
    obstacle, this book enables the logical reader to learn regression without
    possessing superior mathematical skills. Although this book is largely
    nonmathematical in its approach, it does not in any way give short shrift
    to the subject of regression. This book is targeted to all business
    students and executives who need to understand the concept of regression
    for practical and professional purposes.

    Regression analysis can be used to establish a causal relationship
    between factors and the response variable. However, in order to do so,
    economic theory must be used to provide the causal relationship, and
    regression analysis is then applied to verify the validity of the
    theory.


    This book utilizes Microsoft Excel to obtain regression results.
    Although spreadsheet software is not the software of choice for performing
    sophisticated regression analysis, it is widely available. Moreover, the
    use of Excel will preempt the need to buy and learn new software, in itself
    another impediment to learning and using regression analysis.

    Keywords

    regression, analysis, causality, inference


    Contents

    Foreword
    Acknowledgments
    Introduction
    Chapter 1 The Concept of Regression
    Chapter 2 The Method of Least Squares
    Chapter 3 Simple Linear Regression in Excel
    Chapter 4 Multiple Regression
    Chapter 5 Goodness of Fit
    Chapter 6 Regression Coefficients
    Chapter 7 Causality: Correlation Is Not Causality
    Chapter 8 Qualitative Variables in Regression
    Chapter 9 Pitfalls of Regression Analysis
    Appendix
    Glossary
    Notes
    References
    Index


    Foreword

    Statistics Is the Science of Finding Order in Chaos

    Regression analysis is by far the most commonly used statistical analysis
    tool in many areas of science, including economics. After you finish the
    book, I hope you will agree with me that if there were one tool tailor-made
    for economics, it must be regression analysis. There are many aspects of
    regression that perfectly match the needs of an economist.

    Often students of introductory statistics are overwhelmed because of
    the diversity of the material. There are too many new concepts and too
    many different topics, which may not seem related in any sensible way.
    In regression analysis, the focus is on one and only one topic, regression
    analysis. This narrow focus is due to several reasons. Reason one is that
    after having been exposed to introductory statistics, you are now ready to
    focus on a special topic. Reason two is that the topic is so vast that even
    dedicated books are not sufficient to cover all aspects of the topic. The
    present manuscript does not even scratch the surface of the vast topic of
    regression analysis. My hope is that you learn to see economics from an
    applied angle and manage to focus on specific outcomes and their magnitude.
    I want you to know that every claim in economics is a testable hypothesis,
    and every theorem in economics can be written as a regression model and
    thus tested for the magnitude of the expected outcome. Regression analysis
    or its broader subject area, statistics, is not a substitute for economic
    theory. Instead, it is a complementary tool that allows us to estimate the
    magnitude of the theoretically predicted outcome and to test the results
    against the claims of policy makers and planners.


    Acknowledgments

    I am indebted to my wife Donna, who has helped me in more ways
    than imaginable. I do not think I can thank her enough. I would like
    to thank Michael Webb for his relentless assistance in all aspects of the
    book. He has been my most reliable source and I could always count on
    him. I also want to thank my graduate assistants Issam Abu-Ghallous and
    Brian Carriere. They have provided many hours of help with all aspects
    of the process. Without the help of Mike, Issam, and Brian, the book
    would not have been completed. I also would like to thank Madeline
    Gillette, Anthony Calandrillo, and Matt Orzechowski, who read parts of
    the manuscript.


    Introduction

    Economics is a very interesting subject. The scope of the economic domain
    is vast. Economics deals with market structure, consumer behavior,
    investment, growth, fiscal policy, monetary policy, the roles of the bank,
    etc. The list can go on for quite some time. It also predicts how economic
    agents behave in response to changes in economic and noneconomic
    factors such as price, income, political party, stability, and so on.
    Economic theory, however, is not specific. For example, the theory proves
    that when the price of a good increases the quantity supplied increases,
    provided all the other pertinent factors remain constant, which is also
    known as ceteris paribus. What the theory does not and cannot state is
    how much the quantity increases for a given increase in price. The answer
    to this question seems to be more interesting to most people than the
    fact that the quantity will increase as a result of an increase in price.
    The truth is that the theory that explains the above relationship is
    important for economists. For the rest of the population, the knowledge of
    that relationship is worthless if the magnitude is unknown. Assume that for
    a 10% increase in price the quantity increases by 1%. This has very
    different consequences than if the quantity increases by 10%, and totally
    different consequences if the quantity increases by 20%. The knowledge
    of the magnitude of change is as important as, if not more important than,
    the knowledge of the direction of change. In other words, predictions
    are valuable when they are specific.
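
    The arithmetic behind the three scenarios above can be sketched in a few
    lines of Python. The starting quantity of 1,000 units and the function
    name are assumptions made only for this illustration; the percentages
    are the ones used in the text.

```python
# Hypothetical illustration: the same 10% price increase under three
# assumed quantity responses (1%, 10%, and 20%).
def new_quantity(quantity, pct_change):
    """Return the quantity after a percentage change."""
    return quantity * (1 + pct_change / 100)

base_quantity = 1000.0  # assumed starting quantity (arbitrary units)
responses = {pct: new_quantity(base_quantity, pct) for pct in (1, 10, 20)}

for pct, qty in responses.items():
    print(f"quantity response {pct:>2}% -> new quantity {qty:.0f}")
```

    The direction of change is the same in all three cases; only the
    magnitude differs, which is the point of the paragraph above.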

    Statistics is the science that can answer the specific issues raised above.
    The science of statistics provides the necessary theories that can provide
    the foundation for answering such specific questions. Statistical theory
    indicates the necessary conditions to set up the study and collect data.
    It provides the means to analyze and clarify the meaning of the findings.
    It also provides the foundation to explain the meaning of the findings
    using statistical inference.

    In order to be able to make an economic decision, it is necessary
    to know the economic conditions. This is true for all economic agents,
    from the smallest to the largest. The smallest economic agent might be


    an individual with little earning and disposable income, while the largest
    can be a multinational corporation with thousands of employees, not to
    mention governments. Briefly, we will discuss some of the main needs and
    uses of statistics in economics and then present some uses of regression
    analysis in economics as well.

    The first step in making any economic decision is to gain knowledge
    of the state of the economy. Economic conditions are always in a state of
    flux. Sometimes it seems that we are not very concerned with mundane
    economic basics. For example, we may not try to forecast what the price
    of a loaf of bread is or a pound of meat. We know the average prices for
    these items; we consume them on a regular basis and will continue doing
    so as long as nothing drastic happens. However, if you were to buy a
    new car you would most likely call around and check some showrooms
    to learn about available features and prices because we tend not to have
    up-to-date information on big-ticket items or goods and services that we
    do not purchase regularly. The process described above is a kind of
    sampling, and the information that you obtain is called a sample statistic,
    which you use to make an informed decision about the average price of
    an automobile. When the process is performed according to strict and
    formal statistical methods, it is called statistical inference. The specific
    sample statistic is called the sample mean. The mean is one of numerous
    statistical measures at the disposal of modern economists. Another useful
    measure is the median. The median is a value that divides observations
    into two equal halves, one with values less than the median and the
    other with values more than the median. Statistics explains when each
    measure should be used and what determines which one is the appropriate
    measure. The median is the appropriate measure when dealing with home
    prices or income. Applications of statistical analysis in economics are
    vast, and sometimes they reach to other disciplines that need economics
    for assistance. For example, when we need to build a bridge to meet
    economic, social, and even cultural needs of a community, it is important
    to find a reliable estimate of the necessary capacity of the bridge.
    Statistics indicates the appropriate measure to be used by teaching us
    whether we should use the median or the mode. It also provides insight
    on the role that variance plays in this problem. In addition to identifying
    the appropriate tools for the task on hand, statistics also provides the


    methods of obtaining suitable data and the procedure for performing
    analysis to deliver the necessary inference.

    One cannot imagine an economic problem that does not depend on
    statistical analysis. Every year, the Government Printing Office compiles
    the Economic Report of the President. Although the majority of the
    statistics in the report are fact-based information about different aspects
    of economics, many of the statistics are based on some statistical analysis,
    albeit descriptive statistics. Descriptive statistics provide simple yet
    powerful insight to economic agents and enable them to make more
    informed decisions.
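
    The earlier remark that the median suits home prices or income better
    than the mean can be illustrated with a small sketch. The seven home
    prices below are invented for the illustration, and the last one is an
    assumed outlier; only Python's standard statistics module is used.

```python
import statistics

# Hypothetical home prices (thousands of dollars); the last value is an
# assumed outlier, included only to show how it pulls the mean upward.
home_prices = [150, 160, 165, 170, 180, 185, 2000]

mean_price = statistics.mean(home_prices)
median_price = statistics.median(home_prices)

print(f"mean   = {mean_price:.1f}")
print(f"median = {median_price}")
```

    With these made-up numbers the mean lies above six of the seven prices,
    while the median still describes a typical home.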

    Another component of statistical analysis is inferential statistics.
    Inferential statistics allows economists and political leaders to test
    hypotheses about economic conditions. For example, in the presence of
    inflation, the Federal Reserve Board of Governors may choose to reduce
    the money supply to cool down the economy and slow down the pace of
    inflation. The knowledge of how much to reduce the supply of money is
    not only based on economic theory, but also depends on proper
    estimation of the final outcome.

    Another widely used application of statistical analysis is in policy
    decisions. We hear a lot about the erosion of the middle class or that the
    middle class pays a larger percentage of its income in taxes than the lower
    and upper classes. However, how do we know who is the middle class?
    A set dollar amount of income would be inadequate because of inflation,
    although we must admit even a single dollar amount must also
    be obtained using statistics. However, statistical analysis has a much
    more meaningful and more elegant solution. The concept of interquartile
    range identifies the middle 50% of the population for income. Although
    the interquartile range was not designed to identify the middle 50% and is
    not explained in these terms, the combination of economics and statistics
    is used to identify the middle 50% for economics and policy decision
    purposes.
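
    A minimal sketch of the interquartile-range idea, assuming a made-up
    list of twelve household incomes. It uses statistics.quantiles from the
    Python standard library (Python 3.8 or later) with its default method;
    other quartile conventions would give slightly different cut points.

```python
import statistics

# Hypothetical annual household incomes (thousands of dollars).
incomes = [12, 18, 25, 31, 38, 45, 52, 60, 75, 90, 120, 250]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(incomes, n=4)

# Households between the first and third quartiles form the middle 50%.
middle_half = [x for x in incomes if q1 <= x <= q3]

print(f"Q1 = {q1}, Q3 = {q3}, interquartile range = {q3 - q1}")
print("middle 50% of incomes:", middle_half)
```

    Because the cut points are quartiles of the data rather than fixed dollar
    amounts, they move with the income distribution, which is why a set
    dollar threshold is not needed.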

    The knowledge of statistics can also help to identify and comprehend
    daily news and events. Recently, a report indicated that the chance of
    accident for teenage drivers increases by 40% when there are passengers
    in the car who are under 21 years of age. This is a meaningless report.
    Few teenagers drive alone or have passengers over 21 years of age. Total


    miles driven by teenagers when there are passengers under 21 years of age
    far exceed any other type of teenage driving. Other things equal, the more
    you drive, the higher the probability of an accident. This example
    indicates that the knowledge of statistics is helpful in understanding
    everyday events and in making sound analysis.

    When an economic phenomenon is changed to produce a desirable
    outcome, we need more powerful tools than simple statistics. Regression
    analysis is one of the most widely used statistical tools at the disposal of
    economists.

    In regression analysis, the effect of one or more factors is measured to
    determine another factor. The first group is also known as explanatory
    variables, while the latter is known as endogenous variables. In economics
    it makes sense to refer to explanatory variables as policy instruments.
    Policy instruments are variables that economists and policy makers can
    change or control. The supply of money is a policy instrument controlled
    by the Federal Reserve. The Fed has to collect data first, which is done on
    a periodic basis. These statistics inform the Fed that there is a problem in
    the economy, such as inflation. The Fed decides to reduce the supply of
    money. It will wait for the economy to respond to the change in the supply
    of money. Then economic indicators are measured again and tested against
    the target set by the policy. If the policy objectives are not met, the action
    is repeated until the desirable outcome is obtained.

    When working with a regression model, one might wonder if it
    was designed to serve economists. Even some of the commonly used
    terminologies are the same in both fields. For example, both subjects use
    explanatory variables to measure the response variable. Typical regression
    models do not consist of one explanatory variable and one response
    variable. Instead, in addition to explanatory variables, the model has
    additional variables known as control variables. Control variables are
    actually the same thing as economics shifters. Shifters in economics refer
    to variables that are assumed to remain constant for the sake of identifying
    the impact of the explanatory variables on the response variables. In
    fact, every economic theory seems to have the famous ceteris paribus,
    which means other things being equal. When other things are not equal
    and change, they do not distort the relationship between explanatory
    and response variables. They simply shift the magnitude up or down,


    depending on the direction of the impact. Estimation of demand
    provides a good example. Economic theory states that an increase in price
    reduces the quantity demanded, ceteris paribus. The regression model for
    this economic theory can be written as

    $$Q_d = \beta_0 + \beta_1 P + \varepsilon \qquad \text{(I.1)}$$

    where $\varepsilon$ is the error term, which will be explained later. To
    complete the process, we need to test the hypothesis that the coefficient
    of price, which is also the slope of the demand curve, is negative. So we
    use statistics to test the following hypothesis:

    $$H_0: \beta_1 = 0 \qquad H_1: \beta_1 < 0$$
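
    One way such a one-sided test on the price coefficient might be carried
    out is sketched below in plain Python. The ten price and quantity pairs
    are invented for the illustration, and the use of a least-squares slope
    with a t statistic is an assumption about the procedure at this point in
    the text. The critical value 1.860 is the one-sided 5% value of
    Student's t with 8 degrees of freedom.

```python
import math

# Hypothetical price/quantity observations (assumed for illustration only).
prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
quantities = [95.0, 92.0, 88.0, 85.0, 79.0, 77.0, 70.0, 68.0, 63.0, 58.0]

n = len(prices)
mean_p = sum(prices) / n
mean_q = sum(quantities) / n

# Least-squares estimates of the slope (b1) and the intercept (b0).
s_pp = sum((p - mean_p) ** 2 for p in prices)
s_pq = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
b1 = s_pq / s_pp
b0 = mean_q - b1 * mean_p

# Standard error of the slope, from the residual sum of squares.
residuals = [q - (b0 + b1 * p) for p, q in zip(prices, quantities)]
sse = sum(r ** 2 for r in residuals)
se_b1 = math.sqrt(sse / (n - 2) / s_pp)
t_stat = b1 / se_b1

# One-sided 5% critical value of Student's t with n - 2 = 8 degrees of freedom.
T_CRITICAL = 1.860
reject_h0 = t_stat < -T_CRITICAL  # evidence for H1: slope < 0

print(f"b1 = {b1:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

    Rejecting the null hypothesis here would support the theoretical claim
    of a negative slope; it says nothing yet about the control variables
    discussed next.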

    The model, however, is not complete, because it is not subject to ceteris
    paribus, as it does not control anything. Simple control variables consist
    of the price of a complementary good, the price of a substitute good, and
    income, to name just a few important ones. The theory predicts that the
    effect of a change in the price of a complementary good is inverse, the
    effect of a change in the price of a substitute good is direct, and the
    effect of a change in income is direct. Thus, model (I.1) should be
    modified as below.

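    $$Q_d = \beta_0 + \beta_1 P + \beta_2 P_c + \beta_3 P_s + \beta_4 Y + \cdots + \varepsilon \qquad \text{(I.2)}$$

    The theoretical claims are written as

    $$H_0: \beta_1 = 0 \qquad H_1: \beta_1 < 0$$
    $$H_0: \beta_2 = 0 \qquad H_1: \beta_2 < 0$$
    $$H_0: \beta_3 = 0 \qquad H_1: \beta_3 > 0$$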

    where the subscripts use the first letters of complementary and
    substitute, and $Y$ represents income. The regression model clearly and
    perfectly matches the economic theory, from the expected effects of each
    variable to the concept of ceteris paribus.
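
    The ceteris paribus reading of a model such as (I.2) can be made
    concrete with a short sketch. All coefficient values and input values
    below are hypothetical; their signs follow the theory stated in the text
    (own price and complement price inverse, substitute price and income
    direct), and the error term is ignored.

```python
# Hypothetical coefficients for a demand model with control variables.
COEFFICIENTS = {"intercept": 100.0, "P": -4.0, "Pc": -1.5, "Ps": 2.0, "Y": 0.5}

def expected_quantity(P, Pc, Ps, Y, b=COEFFICIENTS):
    """Expected quantity demanded, ignoring the error term."""
    return b["intercept"] + b["P"] * P + b["Pc"] * Pc + b["Ps"] * Ps + b["Y"] * Y

baseline = expected_quantity(P=10, Pc=8, Ps=12, Y=50)
price_up = expected_quantity(P=11, Pc=8, Ps=12, Y=50)   # only own price changes
income_up = expected_quantity(P=10, Pc=8, Ps=12, Y=60)  # only income changes

print("effect of +1 in own price:", price_up - baseline)
print("effect of +10 in income:  ", income_up - baseline)
```

    Holding the other variables fixed, a one-unit price increase moves the
    expected quantity by exactly the price coefficient, while the income
    change shifts the whole relationship up without altering that slope.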


    CHAPTER 1

    The Concept of Regression

    Relationship Between Variables

    Often we are interested in explaining a phenomenon using other factors.
    There are numerous methods for accomplishing this objective. When the
    phenomenon is quantitatively measurable, the solution is much easier
    and the methods are well established. One such method is regression.
    In regression analysis, one variable (the dependent variable) is explained
    by one or more variables (independent variables). Before explaining a
    regression model, presenting an example of a simple model for explaining
    consumption using income is beneficial. But we first need to define the
    economic concept of marginal propensity to consume (MPC).

    Definition 1.1

    The marginal propensity to consume (MPC) is the additional amount
    one would consume if given an extra dollar of income.

    Consumption = subsistence consumption +
    (marginal propensity to consume) × (income)    (1.1)

    Conceptually, MPC is the same as the slope of the regression line when
    there is only one independent variable. In equation (1.1), consumption
    is the dependent variable and income is the independent variable.
    Although the term dependent variable is commonly used in economics
    literature, other names such as endogenous variable, Y variable,
    response variable, or even output are often used as well. Similarly, the
    term independent variable might be replaced by exogenous variable,
    X variable, regressor, input, factor, or predictor variable.
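
    Equation (1.1) translates directly into a one-line function. The
    subsistence level of 500 and the MPC of 0.8 used as defaults below are
    hypothetical values chosen only to make the sketch runnable.

```python
def consumption(income, subsistence=500.0, mpc=0.8):
    """Equation (1.1): subsistence consumption plus MPC times income."""
    return subsistence + mpc * income

for income in (0, 1000, 1001):
    print(f"income = {income:>5}, consumption = {consumption(income):.2f}")
```

    At zero income the function returns the subsistence level, and one extra
    dollar of income raises consumption by the MPC, which is why the MPC
    plays the role of the slope.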


    Equation (1.1) is a good example of the concept of regression, but it
    is not a regression model. The format for a regression model will be
    discussed shortly. You are more likely to be familiar with a mathematical
    function than a statistical function such as regression. A mathematical
    function represents a nonprobabilistic association between a dependent
    variable and one or more independent variables; the association is
    exact and fixed (Figure 1.1a). A regression model is a simplification
    of reality. It is actually a claim of a relationship and thus a testable
    hypothesis. The association between the dependent variable and the
    independent variable(s) is probabilistic and not deterministic. It is
    true on the average only. Figure 1.1b depicts pairs of (X, Y) observations
    relating the dependent variable (Y) to the independent variable (X).
    Many factors affect the actual value of Y and cause the observation to
    deviate from the expected values. A regression model represents the
    expected value.
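
    The contrast between an exact function and a relationship that holds
    only on average can be simulated. In the sketch below the intercept,
    slope, and error standard deviation are assumed values, and normally
    distributed errors are an assumption of the illustration, not a claim
    made in the text.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

B0, B1 = 500.0, 0.8  # assumed intercept and slope

def function_value(x):
    """Deterministic function: the same x always gives the same y."""
    return B0 + B1 * x

def observed_value(x, noise_sd=50.0):
    """Regression-style observation: expected value plus a random error."""
    return function_value(x) + random.gauss(0.0, noise_sd)

x = 1000.0
draws = [observed_value(x) for _ in range(10_000)]
average = sum(draws) / len(draws)

print(f"expected value = {function_value(x):.1f}")
print(f"average of {len(draws)} observations = {average:.1f}")
```

    Individual observations scatter around the line, but their average
    settles close to the expected value that the line represents.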

    Figure 1.1. Comparison of (a) a function with (b) a regression model.
    a. A function: $Y = \beta_0 + \beta_1 X$, plotted against axes X and Y.
    b. A regression line superimposed on observations:
    $Y = \beta_0 + \beta_1 X + \varepsilon$, plotted against axes X and Y.

    Equation (1.1) is the equation of a line except that it is not written
    in the customary form (used in geometry). It is also a function because
    it provides a specific outcome based on a linear rule, that is, as income
    changes, consumption changes by the magnitude of the MPC. If income
    becomes zero, consumption drops to the level of subsistence consumption,
    which is the level of consumption necessary to survive even if one
    does not have any income. Note that here we are not interested in
    answering how one manages to pay for subsistence consumption, which could
    be from savings, selling household furniture, or something else. That is
    not the purpose of this model. The purpose is to explain the level of
    consumption in response to changes in income. This model is a
    simplification of reality. For example, it does not take into account the
    role that wealth might play in explaining consumption. In a more elaborate
    model, additional independent variables could be included that might
    improve the model's ability to estimate the dependent variable more
    accurately and to more closely approximate reality.

    Although this model is a good starting point, it is not a precise
    replication of reality. Nevertheless, it is the same as a simple
    consumption function explained in many introductory macroeconomics
    textbooks. As such, it serves a similar purpose: it introduces the concept,
    clarifies application of the concept, and prepares for a more appropriate
    model.

    Definition 1.2

    A model is a simple representation of something real in life.

    The level of representativeness is determined by the purpose of the
    model and does not necessarily make a model more desirable, in part
    because the purposes of a study affect the desirability of the level of
    sophistication of the model.

    Models need restrictions on their parameters to make sense. For
    example, the MPC has to be positive and less than one. A negative MPC
    means that as income increases, consumption decreases and eventually
    drops below subsistence level, while an MPC greater than one means
    that consumption at some point becomes larger than income. MPC
    values below zero or above one contradict reality and defy common sense.
    Therefore, we restrict MPC to be between 0 and 1. In addition, negative
    values for the independent variable of income and the dependent variable
    of consumption are meaningless. Similarly, a negative subsistence level
    would be impossible. However, there are situations where the estimate for
    the subsistence level might turn out to be negative, but for the purpose of
    this example they can be ignored.
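
    The restrictions just described could be written as a simple check on
    candidate parameter values. The function name and the messages are
    hypothetical; the rules themselves (MPC strictly between 0 and 1,
    subsistence level not negative) are taken from the paragraph above.

```python
def check_consumption_parameters(subsistence, mpc):
    """Return a list of violated restrictions (empty if none)."""
    problems = []
    if not 0 < mpc < 1:
        problems.append("MPC must be positive and less than one")
    if subsistence < 0:
        problems.append("subsistence consumption cannot be negative")
    return problems

print(check_consumption_parameters(subsistence=500.0, mpc=0.8))
print(check_consumption_parameters(subsistence=-10.0, mpc=1.2))
```

    A check of this kind is useful after estimation as well: an estimate that
    violates a restriction signals a problem with the data or the model
    rather than a finding about the economy.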

    The four values of income, consumption, the MPC, and the subsistence
    level are very different from each other. Consumption and
    income, the dependent and independent variables, are observable data.
    This means we can gather data on actual income and consumption


    levels of a sample of people. The data are typically published and
    customarily represented in a column format. Subsistence consumption
    and MPC, however, are known as parameters. Parameters are almost
    always unknown and have to be estimated. Although every nation has
    an MPC at any given point in time, the actual value is unknown, as
    is the case with the subsistence level of consumption. The parameters
    are estimated by the model using regression analysis. In the jargon of
    regression, parameters are sometimes called coefficients or slopes. The
    interpretation of coefficients and their appropriate analyses are covered
    in Chapter 6.

    Definition 1.3

    A parameter is a characteristic of a population that is of interest.
    Parameters are constant and usually unknown.

    Examples of parameters include population mean, population variance,
    and regression coefficients. One of the main purposes of statistics
    is to obtain information from a sample that can be used to make
    inferences about population parameters. The estimated value obtained from
    a sample is called a statistic.

    Definition 1.4

    A statistic is a numerical value calculated from a sample that is variable
    and known.

    The word statistic has several meanings depending on the context:
    two of its meanings are presented in the previous paragraph. The first use
    of the word refers to the science and the discipline of statistics. The
    second use is more specific and is based on the above definition. In the
    science of statistics, we use statistics to make inferences about
    parameters.
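
    The difference between a parameter (constant, usually unknown) and a
    statistic (variable, known) can be seen in a small simulation. The
    population below is entirely hypothetical; in practice the full
    population is not observed, which is why its mean must be inferred from
    samples.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# A hypothetical population of 100,000 incomes.
population = [random.uniform(20_000, 80_000) for _ in range(100_000)]

# The parameter: one fixed number for this population.
population_mean = statistics.fmean(population)

# The statistic: each sample of 100 gives a different, but known, value.
sample_means = [
    statistics.fmean(random.sample(population, 100)) for _ in range(5)
]

print(f"population mean (parameter): {population_mean:,.0f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic):   {m:,.0f}")
```

    The sample means differ from one another yet cluster around the
    population mean, which is what makes inference about the parameter
    possible.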

    The slope and intercept terminologies used in geometry are also
    commonly used to refer to coefficients in regression analysis. In the
    consumption model, the corresponding analogy to geometry is that
    MPC is the slope and subsistence level is the intercept of the
    consumption line. According to this model, a dollar increase in income
    increases consumption by the magnitude of MPC, which by definition is the
    slope


    of the regression line. When income is zero, the amount of consumption is
    equal to the subsistence level and therefore indicates the intercept.

    The representative terms consumption and income used in
    equation (1.1) only apply to this particular problem, which renders
    them inapplicable when the problem is changed. Consider a model that
    explains quantity demanded as a function of the price of a good. If the
    price increases by one dollar, how much will the quantity demanded
    decrease? An attempt to write this question in the form of a model results
    in a stalemate for a typical economist wishing to stick to vocabulary that
    has economic meaning. In equation (1.2) below, the problematic value is
    designated by "?". The value that replaces "?" answers the question "if
    the price increases by $1, (how much) will the quantity demanded
    decrease?" The (how much) in parentheses does not have a defined economic
    name; thus, for the time being it is represented by a question mark.

    Quantity demanded =
    demand when the good is free + (?) × (price)    (1.2)

    The "?" can be replaced by "responsiveness of quantity demanded," or
    some other unfamiliar and arcane wording. Such arbitrary naming can only
    cause confusion and should be avoided. A reasonably good alternative
    for the (?), which would be close to the concept of MPC in equation (1.1),
    could be "coefficient of responsiveness of quantity demanded to changes
    in price." One advantage of this term is the use of the previously defined
    concept of coefficient. While this phrasing still has the shortcomings of
    the previous naming, it also has the added disadvantage of being long and
    wordy. Furthermore, an astute student would recall that it resembles the
    definition of elasticity. In fact, had the price and quantity been
    measured in units of natural logarithm, the question mark could be
    replaced by price elasticity, as demonstrated in equation (1.3).

    ln(quantity demanded) = intercept +
    (price elasticity of demand) × ln(price)    (1.3)

    where ln indicates natural logarithm, as is customary. Sometimes
    equations that involve natural logarithm on both sides of the equation are


    called log-log, but this is a poor and inappropriate terminology, as is the
    name double-log equation.

    Definition 1.5

    Price elasticity of demand is the percentage change in quantity demanded
    divided by the percentage change in price.

    By expressing the price and quantity in natural logarithm, the slope
    coefficient of the price variable becomes the same as the price elasticity
    of demand. This is due to properties of the slope of the regression line and
    mathematical properties of the natural logarithm. In Chapter 9, we use
    logarithms to address some modeling and data problems. In equation (1.3)
    there is no good explanation for the intercept, so for simplicity and brevity
    it can be called by its generic term, namely the intercept. Nevertheless, it
    is better to think of the model in economics terms as much as possible.
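
    The link between the log-log slope and the elasticity can be checked with
    one line of calculus. The sketch below assumes Q for quantity demanded and
    P for price, and uses \(\beta_0\) and \(\beta_1\) simply as labels for the
    intercept and slope; these symbols are chosen here for brevity and are not
    notation from the text above.

```latex
\ln Q = \beta_0 + \beta_1 \ln P
\quad\Longrightarrow\quad
\beta_1 = \frac{d \ln Q}{d \ln P} = \frac{dQ / Q}{dP / P}
\approx \frac{\%\,\Delta Q}{\%\,\Delta P}
```

    The last ratio is the percentage change in quantity demanded divided by the
    percentage change in price, which matches Definition 1.5.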

    Although writing models in their economics equivalent terms is
    extremely useful, it can also be a cumbersome process. At times, it is
    helpful to use symbols instead of words. For example, if we replace
    consumption with C, income with Y, and marginal propensity to consume
    with MPC in equation (1.1), as is customary, we obtain the following
    equation:

    C = subsistence level of consumption + (MPC) (Y)    (1.4)

    One might choose to represent subsistence level of consumption
    with SLC, but the acronym is not customary and thus does not help
    much. A more generic symbol might prove more pragmatic.

    Parameters are customarily represented by Greek letters, which make
    most people apprehensive. Consider the Greek letters as names for
    parameters, which are generic terms. Equation (1.4) can be written as

    \(C = \beta_0 + \beta_1 Y\)    (1.5)

    A novice mathematics student might be ill at ease with equation (1.4)
    or (1.5) because in mathematics it is customary to use the letter Y for the
    dependent variable, while here it is used to represent the independent
    variable. Economists customarily use the letter Y for income and are fairly
    comfortable with it. However, the following format is not only preferred
    but also more informative:

    Consumption = \(\beta_0\) + \(\beta_1\) income    (1.6)

    This indicates that if income changes by one unit, consumption
    changes by \(\beta_1\) units in the direction of the sign of \(\beta_1\),
    which according to consumption theory should be positive. This theoretical
    expectation of the outcome is the foundation for forming the alternative
    hypothesis. For more information, consult note 1. For example, if
    \(\beta_1\) is 0.8, then as income increases by $100, consumption will
    increase by $80. This expected outcome can be verified empirically, which
    makes it a testable hypothesis. In order to test the magnitude of the MPC,
    the slope parameter (\(\beta_1\)) must be estimated, as will be discussed
    later. The next step after estimating a parameter is to test the estimated
    value against the theoretical expectation. In this example, it makes sense
    to test whether the estimate of the parameter is equal to one, which
    indicates zero savings and zero borrowing. As will become clear later, it
    would also make sense to test the estimated slope against the value of zero.
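
    The arithmetic of the example above can be sketched in a few lines of
    Python. The function name and the intercept of 500 are hypothetical values
    chosen only for illustration; the slope of 0.8 and the $100 income change
    come from the text.

```python
def consumption(income, b0=500.0, b1=0.8):
    """Exact (non-random) consumption line: C = b0 + b1 * income.

    b0 = 500.0 is a made-up subsistence level used for illustration;
    b1 = 0.8 is the MPC from the example in the text.
    """
    return b0 + b1 * income


# Raise income by $100 and compare consumption before and after.
base = consumption(1000.0)
higher = consumption(1100.0)
change = higher - base

# The change is about 80, up to floating-point rounding.
print(f"Change in consumption: {change:.2f}")
```

    Because the line is exact, the change depends only on the slope and the
    size of the income change, not on the starting income or the intercept.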

    From a Mathematical Equation to a Regression Model

    None of the equations that have been presented thus far are actually
    regression models. They are mathematical functions and, more specifically,
    each is an equation of a line. Equations (1.1) and (1.4)–(1.6) are
    consumption lines, where consumption is a function of income, while
    equation (1.2) is a demand line or function. Equation (1.3) is a line
    representing the percentage change in quantity demanded as a function of
    percentage change in price. Its main parameter is the price elasticity of
    demand, which is the coefficient of the independent variable, percentage
    change in price.

    The reason none of these equations are models is that they are exact
    mathematical equations, as depicted in Figure 1.1a, and not a
    simplification of a real phenomenon in life. Things in real life occur with
    a degree of uncertainty or probability and thus they are random in nature.
    Adding a random component to these equations converts them into a
    regression model. The random component is called the error term, or random
    error, for reasons that will be explained shortly. The customary symbol
    is the Greek letter epsilon (\(\varepsilon\)), but (U) and (V) are also
    common. In Figure 1.1b, the vertical distances between the actual
    observations and the regression model are the error terms.

    C = subsistence level of consumption + (MPC) (Y) + \(\varepsilon\)    (1.7)

    Consumption = \(\beta_0\) + \(\beta_1\) income + \(\varepsilon\)    (1.8)

    \(C = \beta_0 + \beta_1 Y + \varepsilon\)    (1.9)

    The above three equations (1.7)–(1.9) are regression models and
    express exactly the same thing. They are models that state that, on
    average, consumption depends on income in a linear fashion. This is the
    same as claiming that income explains average consumption. Note that
    the term average here refers to the average outcome for a dependent
    variable, which because of random error is probabilistic in nature and has
    an average. It is different from the concept of average consumption, which
    is consumption divided by income.
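
    To make the role of the error term concrete, the following sketch generates
    consumption values from a line plus a random component. All numbers (the
    intercept 500, the slope 0.8, the error spread 50, the income grid) and the
    choice of a normal distribution for the error are assumptions made for
    illustration; the text has not yet stated any distributional requirement.

```python
import random


def simulate_consumption(incomes, b0, b1, sigma, seed=0):
    """Return consumption values from C = b0 + b1 * Y + e.

    The error e is drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption for this sketch only).
    """
    rng = random.Random(seed)
    return [b0 + b1 * y + rng.gauss(0.0, sigma) for y in incomes]


# Hypothetical income values and parameters.
incomes = [1000.0 + 100.0 * i for i in range(20)]
exact = [500.0 + 0.8 * y for y in incomes]            # the mathematical line
observed = simulate_consumption(incomes, 500.0, 0.8, sigma=50.0)

# The error terms are the vertical distances between observations and the line.
errors = [c - line for c, line in zip(observed, exact)]
mean_error = sum(errors) / len(errors)
print(f"Average error over {len(errors)} observations: {mean_error:.2f}")
```

    Setting sigma to zero removes the random component and returns the exact
    line, which is the distinction between an equation and a model drawn above.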

    Soon you will learn that having a model is not sufficient; a model
    must be useful, which is a concept that needs to be defined and clarified.
    For the sake of completeness, the dependent variable (C) represents
    consumption. For the slope, we use the acronym MPC. The independent
    variable (Y) represents income. Epsilon (\(\varepsilon\)) is the error
    term; \(\beta_0\) (beta zero) is the intercept, which represents the
    subsistence level, and \(\beta_1\) (beta one) is the slope, which in this
    case represents the MPC.

    Students and scholars should develop the habit of following the same
    procedure for regression models as is customary in the profession. The
    dependent variable, what is being explained, appears on the left-hand side
    of the equal sign. Examples from the above models include consumption,
    quantity demanded, and percentage change in quantity demanded. The
    term that is not related to the independent variable, the intercept, appears
    as the first term on the right-hand side of the equal sign. It represents
    the value of the dependent variable in the case where the independent
    variable fails to be significant, which is reflected by a zero value for its
    coefficient. The independent variable and its coefficient are next on the
    right-hand side of the equation. In the three examples above, there is one
    independent variable in each model. The independent variable for the
    consumption model is income, while for the quantity demanded model it
    is price. Finally, for the model estimating elasticity, the independent
    variable is the percentage change in price. If there were more than one
    independent variable, as will be the case soon, the variables follow the same
    pattern one after the other but not necessarily in any particular order. In
    fact, the order in which independent variables are listed in a model has no
    impact on the final output. The coefficient of the independent variable is
    also called the slope of the line; however, this only makes sense if there is
    only one independent variable, as has been the case with the examples so far.

    Customarily, the last term is the error term or \(\varepsilon\), which
    plays a very important role in a regression model. It converts a
    mathematical function into a regression model that can be estimated using
    statistics. For a regression analysis to be valid, the error term must
    comply with certain requirements, which are customarily called assumptions.
    The assumptions are placed in the appendix because of the theoretical
    nature of the discussion.

    The Meaning of Regression

    As noted earlier, equations (1.1) and (1.4)–(1.6) state the same thing,
    while models (1.7), (1.8), and (1.9) are exactly identical. We choose
    equation (1.6) and model (1.8) for comparison. The difference between
    an equation like (1.6) and model (1.8) seems to be that model (1.8)
    has one extra term, namely \(\varepsilon\), which we learned is called the
    error term. However, there are a number of major differences between the
    two. Some are simple, such as the fact that equation (1.6) is a
    mathematical function, while model (1.8) is a regression model. The
    other differences need more explanation, which should clarify the
    difference between an equation and a model. A mathematical function
    represents an exact relationship with exactly the same outcome each time it
    is evaluated. However, a model is a representation or simplification of
    reality and includes a random error term to indicate that the outcome
    is stochastic rather than deterministic. The term stochastic means that
    a model is probabilistic in nature; therefore, every time a new sample is
    obtained and the regression model is estimated, the results are slightly
    different, reflecting the random nature of the model.
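
    The statement that each new sample yields slightly different results can be
    illustrated with a small simulation. This is only a sketch: the function
    names, the parameter values, and the normal error are hypothetical, and
    fit_line uses the standard least squares formulas for one independent
    variable, which the chapter has not yet introduced at this point.

```python
import random


def fit_line(x, y):
    """Least squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def draw_sample(seed, n=30, b0=500.0, b1=0.8, sigma=50.0):
    """Draw one hypothetical sample of (income, consumption) pairs."""
    rng = random.Random(seed)
    income = [rng.uniform(1000.0, 3000.0) for _ in range(n)]
    consumption = [b0 + b1 * y + rng.gauss(0.0, sigma) for y in income]
    return income, consumption


# Estimate the same model on five different samples.
estimates = [fit_line(*draw_sample(seed)) for seed in range(5)]
for b0_hat, b1_hat in estimates:
    print(f"intercept = {b0_hat:8.2f}, slope = {b1_hat:.4f}")
```

    Every sample comes from the same underlying line, yet each printed pair of
    estimates differs a little from the others, which is what stochastic means
    in this context.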

    In equation (1.6), the parameters \(\beta_0\) and \(\beta_1\) are known.
    In contrast, in model (1.8) they are unknown and must be estimated. The
    customary use of equation (1.6) is to find the value of consumption with
    knowledge of known parameters \(\beta_0\) and \(\beta_1\) and a given value
    of income. The fact that \(\beta_0\) and \(\beta_1\) are known means anyone
    who chooses to insert a given value of the independent variable income in
    the equation would always get the same answer. No real data is necessary.
    If one chooses to use real data such as per capita income for a country for
    years 1973–2010, it is possible to obtain one value for consumption for
    each year. On the other hand, in model (1.8) the parameters \(\beta_0\) and
    \(\beta_1\) are unknown, which means it is impossible to obtain a value for
    consumption even with a known value for income until parameters
    \(\beta_0\) and \(\beta_1\) are estimated using regression analysis. In
    using model (1.8) the data for consumption and income are available. They
    are historical values that have been observed and cannot be changed or
    replaced arbitrarily. Using these observed values, the objective is to
    estimate the unknown parameters to obtain a line that best fits the data.
    The study of regression analysis deals with methods for obtaining estimates
    for \(\beta_0\) and \(\beta_1\) that meet certain criteria deemed desirable
    and also to determine if there is a set of estimates that is best, a
    concept that must be defined clearly and precisely and will be covered in
    Appendix A. Customarily, estimated parameters are represented by Greek
    letters with a ^, called a hat symbol, as \(\hat{\beta}_0\) and
    \(\hat{\beta}_1\). These are pronounced beta-hat-sub-zero and
    beta-hat-sub-one, respectively.
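
    For the case of one independent variable, the estimates that minimize the
    sum of squared vertical distances between the observations and the line
    have a well-known closed form. It is given here only as a preview under
    that least squares criterion; the observation index i, the sample size n,
    and the bars denoting sample means are notation assumed for this sketch
    rather than taken from the text.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(C_i - \bar{C})}
                     {\sum_{i=1}^{n} (Y_i - \bar{Y})^2},
\qquad
\hat{\beta}_0 = \bar{C} - \hat{\beta}_1 \bar{Y}
```

    Here Y is income and C is consumption, as in model (1.9). Both estimates
    are computed entirely from the observed data, so they change whenever the
    sample changes.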

    A model represents a claim about a real-life phenomenon. For example,
    model (1.8) claims that there is a cause and effect relationship between
    income and consumption, that is, as income increases consumption
    increases. One cannot include vice versa at the end of the last sentence
    because, based on economic theory, it is not true. In economics, income
    determines consumption while consumption does not determine income, at
    least not in an introductory discussion of the subject. The theory that
    states income determines consumption belongs to economics, not statistics.
    The fact that in macroeconomics income also depends on consumption, via a
    different mechanism, is addressed later in a much more sophisticated
    analysis in more advanced economics courses. A model, as a simplification
    of reality, is proposed to explain the causal relationship between income
    and consumption. Regression analysis, as a statistical tool, is used to
    provide a theory that determines if there is sufficient evidence in real
    life to support the claim presented in economics. The theories that justify
    inference based on evidence belong to statistics, not economics.

    Therefore, every research model involves two different types of theories,
    one from the discipline in which the research is conducted and the
    other from statistics. The starting point for every research project is the
    theoretical foundations of the discipline, which for us is economics. The
    estimation and inference of the research are governed by theories in
    statistics. The first set of theories originates in economics, which
    provides the foundation for raising the research question and establishing
    the claim(s) of the study. For research in other fields, the relevant
    subject provides the appropriate theory for this purpose. Statistical
    theories govern the procedures and assure that outcomes have desirable
    properties and can be generalized. Some of the desirable properties will be
    explained and verified in this manuscript. Lack of appropriate theories
    from either the field of economics or statistics invalidates the research
    outcome.

    A consumption model like model (1.8) is used to determine
    whether there is empirical evidence to refute economic theory. Note that
    economic theory does not make any assumption that parameters \(\beta_0\)
    and \(\beta_1\) are known, although it places restrictions on them, such as
    that \(\beta_1\) must be a value between 0 and 1 when \(\beta_1\)
    represents the MPC. Any number outside the range 0 to 1 violates one or
    more economic rules or principles. A slope greater than 1 means that a
    one-unit increase in income would increase consumption by more than 1 (for
    example, if \(\beta_1\) is 1.2, then a $1.00 increase in income would
    increase consumption by $1.20), which at least in this simplest of
    consumption models is impossible. Also, a negative MPC makes no economic
    sense. Theoretical properties of the coefficient can also be tested
    statistically, as will be seen in Chapter 6.
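
    As a rough preview of how such restrictions might be checked, the sketch
    below estimates the slope from a tiny data set and compares it with the
    theoretical values 0 and 1. The five observations are invented for
    illustration, and the standard error formula is the usual textbook result
    for a model with one independent variable, not something derived in this
    chapter. Turning the t ratios into formal conclusions requires the
    procedures the text defers to Chapter 6.

```python
import math

# Hypothetical observations, invented for illustration only.
income = [100.0, 200.0, 300.0, 400.0, 500.0]
consumption = [130.0, 200.0, 290.0, 370.0, 440.0]

n = len(income)
mean_y = sum(income) / n          # mean of income (Y)
mean_c = sum(consumption) / n     # mean of consumption (C)

# Least squares estimates of slope and intercept.
sxx = sum((y - mean_y) ** 2 for y in income)
sxy = sum((y - mean_y) * (c - mean_c) for y, c in zip(income, consumption))
b1_hat = sxy / sxx
b0_hat = mean_c - b1_hat * mean_y

# Residual variance with n - 2 degrees of freedom, then the slope's standard error.
residuals = [c - (b0_hat + b1_hat * y) for y, c in zip(income, consumption)]
s2 = sum(e ** 2 for e in residuals) / (n - 2)
se_b1 = math.sqrt(s2 / sxx)

# t ratios for the two theoretical benchmark values of the MPC.
t_against_zero = (b1_hat - 0.0) / se_b1
t_against_one = (b1_hat - 1.0) / se_b1
in_range = 0.0 < b1_hat < 1.0

print(f"slope = {b1_hat:.3f}, standard error = {se_b1:.4f}")
print(f"t against 0 = {t_against_zero:.2f}, t against 1 = {t_against_one:.2f}")
print(f"estimate lies between 0 and 1: {in_range}")
```

    A large positive ratio against 0 and a large negative ratio against 1 are
    both consistent with an MPC strictly between the two theoretical bounds.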

    In order to test any theory using a model there must be sufficient data.
    Because parameters of the proposed model (\(\beta_0\) and \(\beta_1\)) are
    unknown, a statistical method known as regression analysis is necessary.
    Regression analysis is also called the method of least squares. The
    simplest regression analysis uses a model that has only one independent
    variable, such as income, which means it has two parameters, \(\beta_0\)
    and \(\beta_1\). These parameters are also known as intercept and slope,
    respectively. This simple regression analysis requires one set of data,
    customarily arranged in two columns, one for the independent variable and
    another one for the dependent variable, which in this case are income and
    consumption, respectively. Estimated parameters depend on a particular
    observed set of data and are shown as \(\hat{\beta}_0\) and
    \(\hat{\beta}_1\).