28
Review: Types of Summary Statistics We’re often interested in describing the following characteristics of the distribution of a data series: Central tendency - where is the middle of the distribution? X Dispersion - how spread out is the data? X Skewness (asymmetry) - how symmetric (or assymetric) is the distribution? Peakedness - how fat are the tails, how tall is the peak? J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 1 / 28

Review: Types of Summary Statistics

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Review: Types of Summary Statistics

Review: Types of Summary Statistics

We’re often interested in describing the followingcharacteristics of the distribution of a data series:

Central tendency - where is the middle of thedistribution? X

Dispersion - how spread out is the data? X

Skewness (asymmetry) - how symmetric (orassymetric) is the distribution?

Peakedness - how fat are the tails, how tall is thepeak?

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 1 / 28

Page 2: Review: Types of Summary Statistics

Measuring Symmetry (or Asymmetry)

Typically use skewness to measure symmetry

Right-skewed: distribution has a long right tail and dataare concentrated to the left

Left-skewed: distribution has a long left tail and dataare concentrated to the right

One way to test for right- or left-skewed is to comparemedian to mean:

Symmetric: x̄ = median(x)Right-skewed: x̄ > median(x)Left-skewed: x̄ < median(x)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 2 / 28

Page 3: Review: Types of Summary Statistics

A Right-Skewed Distribution

July frequency250

100

150

200

250

y q y

July frequency

100

150

200

Number of flights

‐50

0

50

100

0 20 40 60 80 100

July frequency

0

50

‐30 ‐15 0 15 30 45 60 75 90 105 120

Arrival delay (minutes)

Distribution of arrival delays for Southwest flights into SMF,January 2010

Mean = 3.4 min , Median = -2 min , Skewness = 5.0

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 3 / 28

Page 4: Review: Types of Summary Statistics

A Left-Skewed Distribution

70

80

20

30

40

50

60

70Frequency

0

10

9.58

9.6

9.62

9.64

9.66

9.68

9.7

9.72

9.74

9.76

9.78

9.8

9.82

9.84

9.86

9.88

9.9

9.92

9.94

9.96

9.98

10

100m time (seconds)

Distribution of the 500 fastest 100m times as of December2010

Mean = 9.93 sec , Median = 9.95 sec, Skewness = -1.6

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 4 / 28

Page 5: Review: Types of Summary Statistics

Quantifying Skewness

The basic idea is to compare the mean with the median

How we actually do it:

n

(n − 1)(n − 2)

n∑i=1

(xi − x̄

s

)3

Interpretation of statistic: 0 if symmetric, greater than0 if right-skewed, less than zero if left skewed

Excel: use SKEW() function

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 5 / 28

Page 6: Review: Types of Summary Statistics

Measuring “Peakedness”

Peakedness is a question of how fat the tails of adistribution are

Formally, we use kurtosis:

n(n + 1)

(n − 1)(n − 2)(n − 3)

n∑i=1

(xi − x̄

s

)4

− 3(n − 1)2

(n − 2)(n − 3)

Excel: use KURT() function

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 6 / 28

Page 7: Review: Types of Summary Statistics

Interpreting Kurtosis

Kurtosis has no units (because xi − x̄ is divided by s)

If kurtosis is equal to 0, the distribution has the shapeof the normal distribution

If kurtosis is greater than 0, the distribution is peakedrelative to the normal distribution and has fat tails

If kurtosis is less than 0, the distribution is less peakedrelative to the normal distribution and has skinny tails

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 7 / 28

Page 8: Review: Types of Summary Statistics

Interpreting Kurtosis

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 8 / 28

Page 9: Review: Types of Summary Statistics

Excel Demonstration

To practice generating and interpreting summary statistics,we’ll use some flight delay data from SMF:

Data are for all Southwest flights departing SMF inJanuary and July of 2010

These are panel data (multiple observations for eachflight)

Data are available on Smartsite(southwest-flights-2010.xlsx)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 9 / 28

Page 10: Review: Types of Summary Statistics

Excel Demonstration

Before we switch over to Excel, a couple of quick notes:

Make certain that you have installed the data analysistoolpack for Excel (while not necessary for the summarystatistics, it will be necessary later in the course)

I’ll show you how to add it when we switch over to Excel

You can calculate summary statistics three ways:

Enter the formula as a functionUse the predefined function (AVERAGE, SKEW, etc.)Use the descriptive statistics function under dataanalysis

Now to Excel ...

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 10 / 28

Page 11: Review: Types of Summary Statistics

Graphical Representations of Univariate Data

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 11 / 28

Page 12: Review: Types of Summary Statistics

Graphical Representations of Univariate Data

With univariate data, we have a few different options forgraphing the data. The most common are:

Histograms - graphs showing the frequency ofoccurrence of different values

Pie charts, bar charts, column charts - various ways topresent observations that are measured in differentcategories

Line charts - plots of the variable value against theobservation number

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 12 / 28

Page 13: Review: Types of Summary Statistics

A Histogram Example Using Absolute Frequencies

Data are from the 2008 American Community Survey downloadedfrom usa.ipums.org

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 13 / 28

Page 14: Review: Types of Summary Statistics

A Histogram Example Using Relative Frequencies

Data are from the 2008 American Community Survey downloadedfrom usa.ipums.org

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 14 / 28

Page 15: Review: Types of Summary Statistics

Histograms

There are a few choices to make when constructing ahistogram.

Whether to use absolute frequency or relative frequencyfor the vertical axis

Absolute frequency - just the number of times aparticular value is observed in the dataRelative frequency - the number of times a value isobserved as a percentage of all observationsEither choice will lead to the same shape for thehistogram

How large to make the bin sizes

If the data take on many different values, you’ll want togroup data into binsIn general, the more observations you have, the morebins you use

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 15 / 28

Page 16: Review: Types of Summary Statistics

Constructing a Histogram in Excel

Choose ‘Data Analysis’ and then select ‘Histogram’

For input range, select the values you want to plot ahistogram of

Leave‘bin range’ blank to get automatic bins, or specifyyour own bin range

Select a cell with space below and to the right of it asthe ‘output range’

Click on ‘chart output’ and optionally ‘cumulativepercentage’

Now back to our flight data Excel...

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 16 / 28

Page 17: Review: Types of Summary Statistics

Pie and Bar/Column Charts

Histograms are good for representing numerical univariatedata. For categorical univariate data, we typically use piecharts or bar/column charts.

Pie charts are perhaps the easiest way for people tovisualize percentages

Bar/column charts have the advantage of being able toshow both relative and absolute frequencies

Bar/column charts will become more useful as we startadding more variables

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 17 / 28

Page 18: Review: Types of Summary Statistics

Creating Pie Charts in Excel

The first step is to get frequencies for the differentcategories

You can do this using the FREQUENCY() function inExcel (remember that this is an array function)

Once you have a column of category names and acolumn of frequencies, highlight the values then select‘Inset’, then ‘Pie Chart’ and choose your preferredoptions

It’s the same method for bar/column charts, justspecificy the appropriate chart type

Back to Excel and American Community Survey data ontravel to work (travel-to-work.csv)...

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 18 / 28

Page 19: Review: Types of Summary Statistics

Line Charts

When the observations in a univariate dataset have a naturalorder, it often makes sense to use a line chart

A line chart plots successive values of the data againstthe successive index values

This offers an easy way to visualize whether values aregetting larger or smaller

Line charts are most common with tme series data

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 19 / 28

Page 20: Review: Types of Summary Statistics

Constructing a Line Chart in Excel

To practice constructing a line chart, we’ll use time seriesdata on employment in California.

The data are available on Smartsite(ca-urate-2000-2010.csv)

They are monthly time series data from January of 2000to November of 2010

The data were downloaded from the Bureau of LaborStatistics (www.bls.gov)

To Excel...

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 20 / 28

Page 21: Review: Types of Summary Statistics

Constructing a Line Chart in Excel

Begin by selecting the data values that you want tograph

Select ‘Insert’ and then ‘Line’ and then whichever typeof line chart you prefer

To get the x-axis values you want, right click on thechart and choose ‘Select data...’

Click on the ‘Edit’ box under ‘Horizontal (category)Axis Label’ and select the cells containing your labels

If you have graphed multiple data series on the samegraph, be certain to include a legend

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 21 / 28

Page 22: Review: Types of Summary Statistics

Summary Statistics as a Graph: The Box Plot

Box plot of income by form of transportation used, 2008 AmericanCommunity Survey

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 22 / 28

Page 23: Review: Types of Summary Statistics

Some Other Examples of Visual Representations of Data

Google Trends data for the phrase “ice cream” (blue line) and theword “Santa” (red line).

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 23 / 28

Page 24: Review: Types of Summary Statistics

Some Other Examples of Visual Representations of Data

From visualizingeconomics.com

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 24 / 28

Page 25: Review: Types of Summary Statistics

Some Other Examples of Visual Representations of Data

From joeswainson.blogspot.com

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 25 / 28

Page 26: Review: Types of Summary Statistics

Some Other Examples of Visual Representations of Data

Map of Napoleon’s Russian campaign of 1812, Charles Joseph Minard (1861)

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 26 / 28

Page 27: Review: Types of Summary Statistics

Some Other Examples of Visual Representations of Data

Wordle generated from Bush’s 2002 State of the Union address(after 9/11).

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 27 / 28

Page 28: Review: Types of Summary Statistics

Some Other Examples of Visual Representations of Data

Wordle generated from Obama’s 2009 State of the Union address(after start of recession).

J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 January 6, 2011 28 / 28