Applied Econometrics
Descriptive Statistics
Michael Ash
Econ 753
Descriptive Statistics – p.1/22
Review of Summers
Good econometrics
• Interesting
• Exploratory
• Robust
• Convincingly causal via natural experiments
• Identify regularities for theory to explain
Bad econometrics
• Critical test of deductive models
• Deep structural parameters
Descriptive Statistics and Quantitative Ease
• Conversation starters
• Examples
  ◦ Nurses’ unions and heart-attack mortality
  ◦ Environmental justice
• Good graphical practice (Tufte)
Descriptive Statistics
• Descriptive statistics should build a case; they are more than a pro forma presentation of means
• Develop stylized facts by separating the data into categories
• Generate a puzzle
• Multivariate methods then
  ◦ Elaborate the initial case by demonstrating robustness; or
  ◦ Unravel the puzzle in a convincing way.
Graphical Excellence
The Visual Display of Quantitative Information (Edward Tufte)
Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
Lessons of Tufte: Graphics should
• show the data
• induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
• avoid distorting what the data have to say
• present many numbers in a small space
• make large data sets coherent
• encourage the eye to compare different pieces of data
• reveal the data at several levels of detail, from a broad overview to the fine structure
• serve a reasonably clear purpose: description, exploration, tabulation, or decoration
• be closely integrated with statistical and verbal descriptions of a data set
Some Examples
• Cancer maps
• Early epidemiology: John Snow and Cholera
• HIV–effective time series
• Communication before the Challenger accident
• Phillips Curve
• Oil prices
• Detailed tables (Carter v. Reagan)
• War and Peace (in one page)
Cancer maps
• Cancer incidence by county.
• Cancer clusters (Civil Action, New Yorker)
• Shortcoming
  ◦ The visual importance of the county is mapped to its geographic area rather than its population.
Cholera
• Dr. John Snow and the London cholera epidemic of 1854
• Maps are effective where spatial relationship matters, i.e., where the proximity of two different places matters: proximity of the Vauxhall water company pump to the houses with cholera deaths.
• On the other hand, if you want to establish a within-place association, a scatterplot may be better, e.g., toxic emission rates and race might do better on a scatterplot than on two maps or on one map with two coding systems. See, for example, Ash and Fetter.
HIV and deaths among the young
• Easy to see the rise of HIV
• Criticisms
  ◦ Young people die at much lower rates than do old people
  ◦ Men’s and women’s scales are quite different: young men die at about twice the rate of young women.
Bad Communication & the Challenger Accident
• In hindsight, clear relationship between temperature and O-ring failures
(Breakdown of the) Phillips Curve
• Consider the time series alternative
• Criticisms
  ◦ Scale of each country is different.
Some practical data aesthetics
• Don’t waste data graphics to present trend lines without data; one or two numbers express a trend line perfectly well.
• Use scatterplots to imply causal relationships that you will assess with other methods, statistical and textual.
• Time series plots express periodicity and develop event studies or structural breaks. With trending data, overlaying two time series can be a way to cheat. Use scatterplots for related variables and label dates, e.g., the Phillips curve. Present real prices (unless the topic is price indexes). See oil prices.
• Avoid vertical lines in your tables. Columns of numbers divided by whitespace give plenty of division. Use horizontal lines sparingly. The eye does a good job reading a well-designed table without lines. Tables should be rich and detailed. (See Carter v. Reagan.)
• Go easy on pie charts; because there are relatively few numbers, their contents can almost always be presented better in a table.
• Avoid legends; they’re very distracting. Label series directly on the chart (arrows if necessary).
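The advice to present real prices can be made concrete with a small sketch. The prices and index values below are made up for illustration (they are not actual oil prices or CPI figures), and the function name real_price is hypothetical; the only point is the deflation step, nominal price times base-year index divided by current index.

```python
# Hypothetical numbers for illustration only; NOT actual oil prices or CPI values.
nominal_price = {1980: 30.0, 1990: 20.0, 2000: 27.0}  # dollars per barrel (made up)
price_index = {1980: 50.0, 1990: 80.0, 2000: 100.0}   # base year 2000 = 100 (made up)


def real_price(nominal, index, base_index=100.0):
    """Convert a nominal price into base-year dollars."""
    return nominal * base_index / index


# Deflate every year's nominal price into year-2000 dollars.
real = {
    year: real_price(nominal_price[year], price_index[year])
    for year in nominal_price
}

for year in sorted(real):
    print(year, nominal_price[year], real[year])
```

In this made-up series the nominal price falls by a third between 1980 and 1990, while the real price falls by more than half, which is the kind of difference a nominal plot would hide.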
Categories
• Race and Ethnicity
  ◦ Racial categorization: from 5 (white, black, Asian/PI, Native, Other) to 63 (white (y/n), black (y/n), Asian (y/n), PI (y/n), Native (y/n), Other (y/n))
  ◦ Hispanic (y/n)
  ◦ What do these categories mean?
• Profits, returns to capital, surplus value, managerialcompensation, returns to risk
• Describing and interpreting unemployment
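The jump from 5 to 63 racial categories follows from counting: six yes/no questions give 2^6 = 64 answer patterns, and dropping the pattern with every answer "no" leaves 63. A minimal sketch of that count, assuming (as the arithmetic implies) that each respondent reports at least one category:

```python
from itertools import product

# The six yes/no race questions listed on the slide.
races = ["white", "black", "Asian", "PI", "Native", "Other"]

# Every yes/no pattern across the six questions, excluding the
# pattern in which all six answers are "no".
combos = [
    answers
    for answers in product([False, True], repeat=len(races))
    if any(answers)
]

print(len(combos))
```

Crossing these categories with the separate Hispanic (y/n) question would double the number of cells again.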
Unemployment
• Why study unemployment? Business cycle, wage-setting (reserve army), spatial mismatch, structural change, skills mismatch, skills decay, poverty, inequality, health effects, gender, race.
• “Easy” to partition all adults:
Adult population = E + U + N (employed, unemployed, not in the labor force)
• Who is counted as unemployed? “Persons are classified as unemployed if they do not have a job, have actively looked for work in the prior 4 weeks, and are currently available for work.” Data source: Current Population Survey
• Who is not counted as unemployed?
  ◦ discouraged workers
  ◦ underemployed: part-time workers who would prefer full-time work (even 1 paid hour per week); college-educated workers in “high-school jobs”; contingent workers
Approaches
• Purpose? Cyclical, Mismatch, Gender, etc.
• U1–U6 alternative measures of labor underutilization (analogous to the M1, ..., Mn measures of the money supply)
Alternative measures of labor underutilization
U-1 Persons unemployed 15 weeks or longer, as a percent of the civilian labor force (2.3 percent in 2003)
U-2 Job losers and persons who completed temporary jobs, as a percent of the civilian labor force (3.3 percent)
U-3 Total unemployed, as a percent of the civilian labor force (official unemployment rate) (6.0 percent)
U-4 Total unemployed plus discouraged workers, as a percent of the civilian labor force plus discouraged workers (6.3 percent)
U-5 Total unemployed, plus discouraged workers, plus all other marginally attached workers, as a percent of the civilian labor force plus all marginally attached workers (7.0 percent)
U-6 Total unemployed, plus all marginally attached workers, plus total employed part time for economic reasons, as a percent of the civilian labor force plus all marginally attached workers (10.1 percent)
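The definitions of U-3 through U-6 differ only in what is added to the numerator and the denominator, which a short sketch makes explicit. The function and argument names are hypothetical, and the counts in the usage line are made up (they are not the 2003 figures). Discouraged workers are treated as a subset of the marginally attached, and workers who are part time for economic reasons are already counted among the employed, so they enter only the numerator of U-6.

```python
def rate(numerator, denominator):
    """Express a ratio as a percentage."""
    return 100.0 * numerator / denominator


def underutilization(employed, unemployed, discouraged,
                     other_marginal, part_time_econ):
    """Compute U-3 to U-6 from head counts (hypothetical helper)."""
    labor_force = employed + unemployed
    # All marginally attached = discouraged + other marginally attached.
    marginal = discouraged + other_marginal
    return {
        "U-3": rate(unemployed, labor_force),
        "U-4": rate(unemployed + discouraged, labor_force + discouraged),
        "U-5": rate(unemployed + marginal, labor_force + marginal),
        "U-6": rate(unemployed + marginal + part_time_econ,
                    labor_force + marginal),
    }


# Made-up counts (in thousands), for illustration only.
measures = underutilization(employed=940, unemployed=60, discouraged=5,
                            other_marginal=10, part_time_econ=30)
for name in sorted(measures):
    print(name, round(measures[name], 1))
```

Each successive measure widens the definition of underutilization, so with nonnegative counts the rates rise from U-3 to U-6.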
Alternative measures of slack
• EPOP (employment-to-population ratio)
  ◦ Source: Current Population Survey
  ◦ Does not include the intentionality implicit in measures of labor underutilization.
  ◦ Secular trends, typically segmented by sex
• Capacity Utilization (source: Federal Reserve Board survey of businesses)
Current Population Survey
• Approximately 50,000 households per month
• Partial panel structure (4–8–4)
• Monthly social, demographic, and labor force questions
• Supplements: smoking, school enrollment, voting, fertility, training
• January 1994 Redesign: (un)employment questions had been asked in an explicitly sexist fashion.
If the respondent “appeared to be a homemaker,” the manual instructed interviewers to ask “What were you doing most of last week - keeping house or something else?” . . . For . . . other respondents, interviewers were instructed to ask, “What were you doing most of last week - working or something else?”
The redesign affected the measured unemployment rate for women (raising it). It also affected the measurement of part-time workers voluntarily and involuntarily so employed.
• Representative of the U.S. population, rich questions, regular and large
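The 4–8–4 partial panel structure means a household is interviewed for four consecutive months, rotates out for eight months, and returns for four more. A minimal sketch of the resulting interview schedule, with months numbered from the household's first interview (the function name months_in_sample is hypothetical):

```python
def months_in_sample(first_month):
    """Months in which a household is interviewed under 4-8-4 rotation."""
    # First spell: four consecutive months in sample.
    first_spell = [first_month + k for k in range(4)]
    # Eight months out of sample, then a second four-month spell,
    # which therefore starts twelve months after the first interview.
    second_spell = [first_month + 12 + k for k in range(4)]
    return first_spell + second_spell


schedule = months_in_sample(0)
print(schedule)
```

Because the second spell starts exactly twelve months after the first, each household can be matched to itself in the same calendar month one year later, which is what makes the panel "partial".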
Environmental Justice (Ash and Fetter 2004)
• EJ: differential availability of environmental amenities or exposure to environmental disamenities on the basis of socioeconomic, ethnic, or racial differences.
• Industrial toxic exposure in the United States
• EPA Toxic Release Inventory and neighborhood-level U.S. Census data
• Toxic data adjusted for fate and dispersion and toxicity
• Key findings
  ◦ Blacks tend to live both in more polluted cities in the U.S. and in more polluted neighborhoods within cities.
  ◦ Hispanics live in less polluted cities on average, but they live in more polluted areas within cities.
  ◦ Strong income-pollution gradient, with lower income people significantly more exposed.
Descriptive statistics, plots, results
• Histogram
• City halves
• Milwaukee maps (dropped in final)
• Results
At the median, a 10,000 dollar increase in income is associated with a 7 percentage point decrease in the probability of being in the more polluted half of the city.
Nurses’ Unions and Heart-Attack Mortality
• Do unionized registered nurses achieve better patient outcomes?
  ◦ Why plausible (briefly)?
  ◦ Strategy: compare risk-adjusted heart-attack mortality in union and non-union hospitals in California (early 1990s)
  ◦ Strong bivariate relationship (prima facie evidence)
  ◦ But also substantive differences between union and non-union hospitals
  ◦ Multivariate and specification tests to buttress the causal claim.