View
65
Download
0
Category
Preview:
Citation preview
Lecture 1: Introduction to Econometrics University of San Francisco Department of Economics Prof. Jesse K. Anttila-Hughes January 21st, 2014
NYTimes, Jan 29th, 2013
Today well cover: Introduction to econometrics Review of probability
Reading: Wooldridge Appendixes B and C
Office Hours: Tuesdays 1-4pm or by appt
Problem set #1 On Blackboard Due next Monday
Computer assignment #1 Get Stata Download the data files from Blackboard Bring your laptop, with Stata installed and the data files
Goals for this class Emphasis on:
Formal econometrics : Basic statistics and probability Fundamentals of multivariate OLS regression
Regression execution and interpretation Hypothesis testing and standard errors
Simple connections between research questions and econometrics Including some common basic problems and their solutions / lack thereof
Formal foundation for Econometrics II and III Tacit knowledge
Basics of performing econometrics analysis in Stata Basics good habit for handling and managing data
By the end of class you should have: Simple, intuitive understandings of what econometrics is and what it can
and cant do A beginners proficiency with Stata Enough econometrics to start appreciating how much there is to learn
Econometrics involves a lot of tacit learning:
Administraterrata Class:
6:30 9:15 pm on Tuesdays 14 lectures total, one midterm
Blackboard primary resource Syllabus, lecture notes, hws, readings, etc.
Grades: 12 Problem sets (drop lowest 2): 35% Class participation: 10% Midterm exam: 25% Final exam: 30%
Book: Wooldridge 5th Edition
What were aiming for
Hsiang et al. 2013
The Nature of Econometrics and Economic Data
Econometrics What is econometrics?
The statistics used by economists
Why do we do econometrics? Estimating relationships between economic variables Testing economic theories and hypotheses Forecasting economic variables Evaluating or implementing policies
Etc.
In general, econometrics starts with an economic model (i.e., something youd learn in a theory class) and then generates testable predictions which come from that model The step of explicitly defining a model is often skipped
Economic model of crime (Becker (1968)) Derives equation for criminal activity based on utility maximization
Functional form of relationship not specified Equation could have been postulated without economic modeling
Hours spent in criminal activities
Returns to criminal activities Wage for legal
employment Other income
Probability of getting caught
Probability of conviction if caught
Expected sentence
Age
What kind of theory models?
Model of job training and worker productivity What is effect of additional training on worker productivity? Formal economic theory not really needed to derive equation (but may):
Other factors may be relevant, but these are the most important (?)
Hourly wage
Years of formal education Years of work-
force experience
Weeks spent in job training
What kind of theory models?
Econometric model of criminal activity The functional form has to be specified Variables may have to be approximated by other quantities
Measure of cri- minal activity
Wage for legal employment
Other income
Frequency of prior arrests
Frequency of conviction
Average sentence length after conviction
Age
Unobserved deter- minants of criminal activity
e.g. moral character, wage in criminal activity, family background
What kind of econometric models?
Econometric model of job training and worker productivity
Most of econometrics deals with the specification of the error Econometric models may be used for hypothesis testing
For example, the parameter represents effect of training on wage How large is this effect? Is it different from zero?
Hourly wage Years of formal education
Years of work- force experience
Weeks spent in job training
Unobserved deter- minants of the wage
e.g. innate ability, quality of education, family background
What kind of econometric models?
Types of data Econometric analysis requires data
And there are many, many different things that count as data
In general, we distinguish between the four major kinds of economic data in terms of how they interact between units of observation and time Cross-sectional data
Multiple units of obs., single time Time series data
Single unit of obs., multiple times Pooled cross sections
Multiple unit of obs., multiple times, but different obs. each time Panel/Longitudinal data
Multiple units of observation with multiple time observations for each
Econometric methods depend on the nature of the data used Use of inappropriate methods may lead to misleading results
This will be a big theme in later classes
Cross-sectional data
Sample of individuals, households, firms, cities, states, countries, or
other units of interest at a given point of time/in a given period
Cross-sectional observations must be more or less independent
For example, pure random sampling from a population
Sometimes pure random sampling is violated, e.g. units refuse to
respond in surveys, or if sampling is characterized by clustering
Types of data
Observation number Hourly wage
Indicator variables (1=yes, 0=no)
" Cross-sectional data set on wages and other characteristics
Cross sectional data: One observation per unit of obs
Cross-sectional data on growth rates and country characteristics
Adult secondary education rates
Government consumtion as percentage of GDP
Growth rate of real per capita GDP
Cross sectional data: One observation per unit of obs
Time series data Observations of a variable or several variables over time
For example, stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales,
Time series observations are typically serially correlated Ordering of observations conveys important information Data frequency: daily, weekly, monthly, quarterly, annually, Typical features of time series: trends and seasonality Typical applications: applied macroeconomics and finance
Types of data
Time series data on minimum wages and related variables
Unemployment rate
Average coverage rate
Average minimum wage for given year
Gross national product
Types of data
Pooled cross sections Two or more cross sections are combined in one data set Cross sections are drawn independently of each other Pooled cross sections often used to evaluate policy changes Example:
Evaluate effect of change in property taxes on house prices Random sample of house prices for the year 1993 A new random sample of house prices for the year 1995 Compare before/after (1993: before reform, 1995: after reform)
Types of data
Pooled cross sections on housing prices
Number of bathrooms
Size of house in square feet
Property tax
Before reform
After reform
Types of data
Panel or longitudinal data The same cross-sectional units are followed over time
Panel data have both cross-sectional and a time series dimensions
Hence, panel data can be used to account for time-invariant unobservables
Panel data can be used to model lagged responses
Example:
City crime statistics; each city is observed in two years Time-invariant unobserved city characteristics may be modeled
Effect of police on crime rates may exhibit time lag
Types of data
Two-year panel data on city crime statistics
Each city has two time series observations
Number of police in 1986
Number of police in 1990
Types of data
Causal Inference vs. Association One of the major aspects of econometrics that distinguishes it from
statistics is a very strong emphasis on understanding causal inference Causal inference: evaluating whether a change one variable (x) will lead
to a change in another variable (y) assuming nothing else changes (ceteris paribus)
Why do we care about causality? A lot of times as econometricians were explicitly trying to evaluate a
policy to enact Thus we want to know what will happen if we change one variable
More generally, the statistical tools we have can tell us a lot about how two variables covary But correlation doesnt imply causation, and to get to causal inference we
generally need to know about how the problem works in real life A particular concern is when our two variables x and y are
endogenous, or jointly determined Aka x and y influence each other, or theres a third variable Z that affects
both
Example: Medicine (Medieval)
Four Humors Theory (not falsifiable)
Four Humors Empirics (sub-optimal outcomes)
Medicine (Victorian)
Miasma Theory (not falsifiable)
Miasma Empirics (sub-optimal outcomes)
Medicine (Early Modern, 1850s)
Theory : Cholera is a vector-borne disease transmitted by water
(falsifiable)
Disease Theory Empirics (decent outcomes)
Randomization Ironically given its late arrival to the scientific
method, medicine first developed what we may now consider the be the ultimate tool in causal inference in the sciences: the randomized control trial (or RCT)
Endogeneity and random assignment
Why is random assignment important? Consider
health_outcome = A*(took_drug) + where is our error term, took_drug is a binary
variable indicating that a patient took a drug, and health_outcome is some measurement of health
Why do we care that took_drug is randomly assigned? If took_drug is randomly assigned, then is it
correlated with anything? If not, who cares?
Why is endogeneity an issue? Randomization allows a researcher to eliminate the
possibility that they are arguing for a causal, exogenous interpretation of an endogenous system
Endogenous originating from inside the system, in this case taken to mean co-influential Education and earnings Prices of substitute or complementary goods Development and the environment
Exogenous means originating outside the system Interpreting an endogenous relationship as exogenous
means risking interpreting a system with reverse causality as strictly causal
Endogeneity by example: Classroom size and educational achievement
How is the relationship endogenous?
Class size: ways of attacking endogeneity Randomly assign students to large or small
classes Tennessee STAR
Find a natural experiment that produces something akin to randomization in class size Maimonides Rule in Israel
But these dont always work the way we think Discontinuous class size cutoffs in Chile
Why all this concern about endogeneity? Endogeneity is particularly troublesome in the social
sciences because humans are self-aware Humans might sort on / select into treatment
Our understanding of how to deal with endogeneity is relatively new Natural scientists dont normally have intelligent,
reactive data points Medicine, which one might argue is halfway between
the natural and social sciences, needed to be concerned with endogeneity early on
Endogeneity is still a concern in the natural sciences, though correlation does not imply causation
Recommended