Week 2 September 8-12 Five Mini-Lectures QMM 510 Fall 2014


Page 1: Week  2  September  8-12

Week 2 September 8-12

Five Mini-Lectures QMM 510 Fall 2014

Page 2: Week  2  September  8-12

4-2

Describing Data Numerically ML 2.1

Chapter Contents

4.1 Numerical Description

4.2 Measures of Center

4.3 Measures of Variability

4.4 Standardized Data

4.5 Percentiles, Quartiles, and Box Plots

4.6 Correlation and Covariance

4.7 Grouped Data

4.8 Skewness and Kurtosis

Chapter 4

So many topics, so little time …

Page 3: Week  2  September  8-12

4-3

Chapter 4

Center, Variability, Shape

Three key characteristics of numerical data:

Page 4: Week  2  September  8-12

4-4

Chapter 4

Visual Description

Page 5: Week  2  September  8-12

4-5

• A familiar measure of center

• Excel function =AVERAGE(Data) where Data is an array of data values.

Population Mean: μ = (1/N) Σ xi          Sample Mean: x̄ = (1/n) Σ xi

Mean

Chapter 4

Measures of Center

Page 6: Week  2  September  8-12

4-6

• The median (M) is the 50th percentile or midpoint of the sorted sample data.

• M separates the upper and lower halves of the sorted observations.

• If n is odd, the median is the middle observation in the data array.

• If n is even, the median is the average of the middle two observations in the data array.

Median

Chapter 4

Measures of Center

Page 7: Week  2  September  8-12

4-7

• The most frequently occurring data value.

• Familiar and easy to understand.

• But - data may have multiple modes or no mode.

• Most useful for discrete or categorical data with only a few values. Rarely useful for continuous data or data with a wide range.

Mode

Chapter 4

Example: Revenue growth in 32 bio-tech companies last year.

 0.57   1.57   1.71   1.71   1.86   2.14   2.43   2.86
 4.00   4.01   5.28   5.29   6.14   6.43   6.71   6.86
 8.29   8.43   9.14   9.29  10.00  10.29  10.43  10.43
11.00  11.57  11.57  11.86  12.43  13.43  13.57  14.14

Caution: In decimal data, some data values may occur more than once, but this is likely due to chance (not central tendency). Excel’s =MODE(Data) returns only the first mode (1.71 in this example).
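The caution above can be checked with a short Python sketch (Python is used here only for illustration; the slides themselves use Excel). It loads the 32 values from the example and compares a "first mode" function with one that lists every tied mode. The variable names are illustrative choices.

```python
import statistics

# Revenue growth values for the 32 bio-tech companies in the example.
growth = [
    0.57, 1.57, 1.71, 1.71, 1.86, 2.14, 2.43, 2.86,
    4.00, 4.01, 5.28, 5.29, 6.14, 6.43, 6.71, 6.86,
    8.29, 8.43, 9.14, 9.29, 10.00, 10.29, 10.43, 10.43,
    11.00, 11.57, 11.57, 11.86, 12.43, 13.43, 13.57, 14.14,
]

first_mode = statistics.mode(growth)      # first mode encountered (Python 3.8+)
all_modes = statistics.multimode(growth)  # every value tied for most frequent
median = statistics.median(growth)        # n is even: average of the two middle values

print(first_mode, all_modes, median)
```

Three values are tied as modes, each occurring only twice, which is why the mode says little about the center of decimal data like these.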

Measures of Center

Page 8: Week  2  September  8-12

4-8

• Compare mean and median or look at the histogram to determine degree of skewness.

• Figure 4.10 shows prototype population shapes showing varying degrees of skewness.

Chapter 4

Measures of Center

Page 9: Week  2  September  8-12

4-9

• The geometric mean (G) is a multiplicative average: G = (x1 × x2 × … × xn)^(1/n).

Geometric Mean

Chapter 4

Growth Rates: A variation on the geometric mean is used to find the average growth rate for a time series.

In Excel, use =GEOMEAN(Data) or, for the six values 2, 3, 7, 9, 10, 12, =(2*3*7*9*10*12)^(1/6)
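As a cross-check on the Excel formula, here is a minimal Python sketch that computes the geometric mean of the same six values two ways: directly as the n-th root of the product, and with the standard library function. It is an illustration only, not part of the course materials.

```python
import math
import statistics

data = [2, 3, 7, 9, 10, 12]  # the six values from the Excel formula above

# Direct definition: n-th root of the product of the n values.
g_direct = math.prod(data) ** (1 / len(data))

# Library equivalent (Python 3.8+), analogous to Excel's =GEOMEAN(Data).
g_library = statistics.geometric_mean(data)

print(round(g_direct, 3), round(g_library, 3))
```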

Measures of Center

Page 10: Week  2  September  8-12

4-10

• For example, from 2006 to 2010, JetBlue Airlines revenues are:

Year   Revenue (mil)
2006   2,361
2007   2,843
2008   3,392
2009   3,292
2010   3,779

Growth Rates

The average growth rate:

GR = (x_n / x_1)^(1/(n-1)) - 1 = (3,779 / 2,361)^(1/4) - 1 = 0.125

or 12.5% per year.
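The JetBlue calculation can be reproduced with a small Python sketch. It assumes the usual geometric growth-rate formula (last value over first value, raised to one over the number of periods, minus one); the function name is a hypothetical choice.

```python
# JetBlue revenues (millions) by year, from the example above.
revenue = {2006: 2361, 2007: 2843, 2008: 3392, 2009: 3292, 2010: 3779}

def average_growth_rate(values):
    """Geometric average growth rate over n values (n - 1 periods)."""
    periods = len(values) - 1
    return (values[-1] / values[0]) ** (1 / periods) - 1

series = [revenue[year] for year in sorted(revenue)]
rate = average_growth_rate(series)
print(f"{rate:.1%}")
```

Only the first and last values enter the formula, so the dip in 2009 does not affect the result.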

Chapter 4

Measures of Center

Page 11: Week  2  September  8-12

4-11

• The midrange is the point halfway between the lowest and highest values of X.

• Easy to use but sensitive to extreme data values.

• For the J.D. Power quality data, the midrange (126.5) is higher than the mean (114.70) or median (113).

Midrange

Chapter 4

Measures of Center

Page 12: Week  2  September  8-12

4-12

• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.

• For example, for the n = 33 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).

• To determine how many observations to trim from each end, multiply k by n: 0.05 × 33 = 1.65, which rounds to 2 observations.

• So, we would remove the two smallest and two largest observations before averaging the remaining values.
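The trimming steps above can be sketched in Python as follows. This is a minimal illustration that rounds k × n to the nearest integer, as the slide does; the sample data are hypothetical, and software packages may round the trim count differently.

```python
def trim_count(n, k):
    """Number of observations to drop from EACH end (k * n, rounded)."""
    return round(k * n)

def trimmed_mean(values, k):
    """Mean after removing the lowest and highest k fraction of the sorted values."""
    ordered = sorted(values)
    drop = trim_count(len(ordered), k)
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)

# Hypothetical data: 33 values, one of them extremely large.
sample = list(range(1, 33)) + [500]
print(trim_count(33, 0.05))        # observations trimmed from each end
print(trimmed_mean(sample, 0.05))  # the extreme value 500 is removed before averaging
```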

Trimmed Mean

Chapter 4

Measures of Center

Page 13: Week  2  September  8-12

4-13

• Here is a summary of all the measures of central tendency for the J.D. Power data, along with Excel functions.

• The trimmed mean mitigates the effects of very high values.

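Statistic         Value    Excel function
Mean              114.70   =AVERAGE(Data)
Median            113      =MEDIAN(Data)
Mode              111      =MODE.SNGL(Data)
Geometric Mean    113.35   =GEOMEAN(Data)
Midrange          126.5    =(MIN(Data)+MAX(Data))/2
5% Trimmed Mean   113.94   =TRIMMEAN(Data, 0.1)

Note: The second argument of =TRIMMEAN is the total fraction excluded, so 0.1 trims 5 percent from each end.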

Trimmed Mean

Chapter 4

Measures of Center

Page 14: Week  2  September  8-12

4-14

Variability is the “spread” of data points about the center of the distribution in a sample.

Statistic: Range
Formula: xmax – xmin
Excel: =MAX(Data) - MIN(Data)
Pro: Easy to calculate.
Con: Sensitive to extreme data values.

Statistic: Sample variance (s²)
Formula: s² = Σ (xi – x̄)² / (n – 1)
Excel: =VAR.S(Data)
Pro: Plays a key role in mathematical statistics.
Con: Nonintuitive meaning.

Measures of Variability

Chapter 4

Measures of Variability

Page 15: Week  2  September  8-12

4-15

Statistic: Sample standard deviation (s)
Formula: s = √[ Σ (xi – x̄)² / (n – 1) ]
Excel: =STDEV.S(Data)
Pro: Most common measure. Uses same units as the raw data ($, £, ¥, grams, etc.).
Con: Nonintuitive meaning.

Statistic: Sample coefficient of variation (CV)
Formula: CV = 100 × s / x̄
Excel: =100*STDEV.S(Data)/AVERAGE(Data)
Pro: Measures relative variation in percent so can compare data sets.
Con: Requires non-negative data.

Population variance: σ² = Σ (xi – μ)² / N          Population standard deviation: σ = √σ²

Chapter 4

Measures of Variability

Page 16: Week  2  September  8-12

4-16

Statistic Formula Excel Pro Con

Statistic: Mean absolute deviation (MAD)
Formula: MAD = Σ |xi – x̄| / n, summed over i = 1, …, n
Excel: =AVEDEV(Data)
Pro: Easy to understand.
Con: Lacks “nice” theoretical properties.
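The measures of variability in these tables can be computed side by side with a short Python sketch. It assumes the standard sample formulas (n – 1 in the variance), notes the matching Excel function in each comment, and uses hypothetical data.

```python
import math

def variability(data):
    """Return the variability measures from the tables above for one sample."""
    n = len(data)
    mean = sum(data) / n
    squared_dev = sum((x - mean) ** 2 for x in data)
    s = math.sqrt(squared_dev / (n - 1))
    return {
        "range": max(data) - min(data),               # =MAX(Data)-MIN(Data)
        "variance": squared_dev / (n - 1),            # =VAR.S(Data)
        "std_dev": s,                                 # =STDEV.S(Data)
        "cv_percent": 100 * s / mean,                 # =100*STDEV.S(Data)/AVERAGE(Data)
        "mad": sum(abs(x - mean) for x in data) / n,  # =AVEDEV(Data)
    }

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
stats = variability(scores)
for name, value in stats.items():
    print(name, round(value, 3))
```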

Chapter 4

Measures of Variability

Page 17: Week  2  September  8-12

4-17

• Useful for comparing variables measured in different units or with different means.

• A unit-free measure of dispersion.

• Expressed as a percent of the mean.

• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

Coefficient of Variation

Chapter 4

Measures of Variability

Page 18: Week  2  September  8-12

4-18

Chapter 4

Example: Class scores on 16-point quiz on first day of class and after students had an opportunity to review the material.

Caution: Only appropriate for nonnegative data. CV is undefined if the mean is zero or negative (this could happen, for example, if stocks in a portfolio had negative rates of return).

Measures of Variability

Page 19: Week  2  September  8-12

4-19

Standardized Data ML 2.2

Chapter 4

Topics

• sorting, standardizing, z-scores

• normal distribution as a benchmark

• Empirical Rule (MegaStat)

• outliers and unusual observations

• Excel functions (Appendix J)

• examples: birth weight, voting

• using MegaStat and Minitab

Page 20: Week  2  September  8-12

4-20

• The Empirical Rule states that for data from a normal distribution, we expect the interval μ ± kσ to contain a known percentage of observed data:

k = 1: 68.26% will lie within μ ± 1σ
k = 2: 95.44% will lie within μ ± 2σ
k = 3: 99.73% will lie within μ ± 3σ

• The normal distribution is symmetric and is also known as the bell-shaped curve.
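To compare a sample against the Empirical Rule, count the share of observations inside the mean plus or minus k standard deviations. The Python sketch below does this with the sample mean and sample standard deviation; the data are hypothetical and far too few to look normal, so the percentages are not expected to match the rule.

```python
import statistics

def percent_within(data, k):
    """Percent of observations inside mean ± k sample standard deviations."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if mean - k * s <= x <= mean + k * s]
    return 100 * len(inside) / len(data)

values = [1, 2, 3, 4, 5]  # hypothetical data
for k in (1, 2, 3):
    print(k, percent_within(values, k))
```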

Chapter 4

The Empirical Rule

Page 21: Week  2  September  8-12

4-21

Note: No upper bound is given.

Data values outside μ ± 3σ are rare.

The Empirical Rule

Chapter 4

Standardized Data

Page 22: Week  2  September  8-12

4-22

• A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.

A negative z value means the observation is to the left of the mean.

Positive z means the observation is to the right of the mean.

Chapter 4

Standardization formula for a population: zi = (xi – μ) / σ

Standardization formula for a sample (for n > 30): zi = (xi – x̄) / s

Standardized Data

Page 23: Week  2  September  8-12

4-23

Chapter 4

Standardized Data

Page 24: Week  2  September  8-12

4-24

Chapter 4

Standardized Data

Example: Birth Weights (n = 1429), weights in ounces (1 lb = 16 oz)

• 5 pound (80 oz) baby’s z-score: z = (80-116.14)/21.96 = -1.65

• 9 pound (144 oz) baby’s z-score: z = (144-116.14)/21.96 = 1.27

• 11 pound (176 oz) baby’s z-score: z = (176-116.14)/21.96 = 2.73
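The birth-weight z-scores can be reproduced with a few lines of Python. The mean (116.14 oz) and standard deviation (21.96 oz) are taken from the example; the function name and the pounds-to-ounces loop are illustrative choices.

```python
MEAN_OZ = 116.14  # mean birth weight in ounces (from the example)
SD_OZ = 21.96     # standard deviation in ounces (from the example)

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

for ounces in (80, 144, 176):
    pounds = ounces / 16  # 16 ounces per pound
    print(pounds, "lb:", round(z_score(ounces, MEAN_OZ, SD_OZ), 2))
```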

Resembles a normal except for the low tail (a few extremely tiny babies).

Source: Birth records from the North Carolina State Center for Health and Environmental Statistics and the Institute for Research in Social Science at University of North Carolina at Chapel Hill.

Page 25: Week  2  September  8-12

4-25

Chapter 4

Standardized Data

Example: Voting in 2004 Presidential Election

Only two states stand out as unusual

State          Voting%   z-Score
Hawaii         46.2      -2.35
California     49.1      -1.89
Texas          50.3      -1.71
Nevada         51.3      -1.55
Georgia        52.6      -1.35
…              …         …
Oregon         70.6      1.45
North Dakota   70.8      1.48
Maine          72.0      1.67
Wisconsin      73.0      1.82
Minnesota      76.7      2.40

Note: Sorting the data values allows you to see the extremes. Values within μ ± 1σ are less interesting.

Use Excel’s function =STANDARDIZE(x, μ, σ)

Mean     61.29
St Dev   6.43
n        50

Page 26: Week  2  September  8-12

4-26

Chapter 4

Excel

Voting%

Mean                  61.286
Standard Error        0.909788089
Median                61.5
Mode                  59.7
Standard Deviation    6.433173274
Sample Variance       41.38571837
Kurtosis              0.014949556
Skewness              0.00241464
Range                 30.5
Minimum               46.2
Maximum               76.7
Sum                   3064.3
Count                 50

Voting percent in 50 states

Note: In Excel’s Descriptive Statistics, you can’t choose the statistics displayed.

Page 27: Week  2  September  8-12

4-27

Chapter 4

MegaStat

Note: You can choose the statistics displayed (e.g.,Empirical Rule).

Statistic                     Voting%
count                         50
mean                          61.286
sample variance               41.386
sample standard deviation     6.433
minimum                       46.2
maximum                       76.7
range                         30.5
1st quartile                  57.450
median                        61.500
3rd quartile                  64.950
interquartile range           7.500
mode                          59.700

empirical rule
mean - 1s                     54.853
mean + 1s                     67.719
percent in interval (68.26%)  68.00%
mean - 2s                     48.420
mean + 2s                     74.152
percent in interval (95.44%)  96.00%
mean - 3s                     41.986
mean + 3s                     80.586
percent in interval (99.73%)  100.00%

low outliers                  0
high outliers                 1
high extremes                 0

Voting percent in 50 states

Page 28: Week  2  September  8-12

4-28

Chapter 4

Appendix J: Excel Functions

Page 29: Week  2  September  8-12

4-29

Chapter 4

Appendix J: Excel Functions

Page 30: Week  2  September  8-12

4-30

Quantiles ML 2.3

Chapter 4

Topics

• percentiles, quartiles, boxplots

• fences, another view of outliers

• examples: birth weight, city MPG

Page 31: Week  2  September  8-12

4-31

• Percentiles are data that have been divided into 100 groups.

For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.

• Deciles are data that have been divided into 10 groups.

• Quintiles are data that have been divided into 5 groups.

• Quartiles are data that have been divided into 4 groups.

Percentiles

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 32: Week  2  September  8-12

4-32

• Percentiles may be used to establish benchmarks for comparison purposes (e.g. health care, manufacturing, and banking industries use 5th, 25th, 50th, 75th and 90th percentiles).

• Quartiles (25, 50, and 75 percent) are commonly used to assess financial performance and stock portfolios.

• Percentiles can be used in employee merit evaluation and salary benchmarking.

Percentiles

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 33: Week  2  September  8-12

4-33

• Quartiles are scale points that divide the sorted data into four groups of approximately equal size.

The three values that separate the four groups are called Q1, Q2, and Q3.

Q1 Q2 Q3

Lower 25% | Second 25% | Third 25% | Upper 25%

Quartiles

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 34: Week  2  September  8-12

4-34

• The second quartile Q2 is the median, a measure of central tendency.

Q2

Lower 50% | Upper 50%

Quartiles

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 35: Week  2  September  8-12

4-35

• For small data sets, find quartiles using method of medians:

Step 1: Sort the observations.

Step 2: Find the median Q2.

Step 3: Find the median of the data values that lie below Q2.

Step 4: Find the median of the data values that lie above Q2.
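The four steps above can be sketched in Python as follows. One assumption is made explicit: the data are split by position, and when n is odd the middle observation is left out of both halves, which is a common reading of "values that lie below (above) Q2". The function name is hypothetical.

```python
import statistics

def quartiles_by_medians(data):
    """Return (Q1, Q2, Q3) using the method of medians."""
    ordered = sorted(data)           # Step 1: sort the observations
    n = len(ordered)
    q2 = statistics.median(ordered)  # Step 2: median of all values
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2:]   # values above the median position
    q1 = statistics.median(lower)    # Step 3: median of the lower half
    q3 = statistics.median(upper)    # Step 4: median of the upper half
    return q1, q2, q3

print(quartiles_by_medians([7, 1, 3, 5, 2, 6, 4, 8]))  # even n
print(quartiles_by_medians([7, 1, 3, 5, 2, 6, 4]))     # odd n
```

Excel and other packages interpolate quartiles in different ways, so their results can differ slightly from this method on small data sets.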

Method of Medians

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 36: Week  2  September  8-12

4-36

• The first quartile Q1 is the median of the data values below Q2.

• The third quartile Q3 is the median of the data values above Q2.

Q1 Q2 Q3

Lower 25% | Second 25% | Third 25% | Upper 25%

For first half of data, 50% above, 50% below Q1.

For second half of data, 50% above, 50% below Q3.

Quartiles – The method of medians

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 37: Week  2  September  8-12

4-37

Method of Medians

Chapter 4

Example:

Percentiles, Quartiles, and Box-Plots

Page 38: Week  2  September  8-12

4-38

• A useful tool of exploratory data analysis (EDA).

• Also called a box-and-whisker plot.

• Based on a five-number summary:

Xmin, Q1, Q2, Q3, Xmax

• For the previous P/E ratios example:

Xmin = 7, Q1 = 27, Q2 = 35.5, Q3 = 40.5, Xmax = 49

Chapter 4

Box Plots

Percentiles, Quartiles, and Box-Plots

Page 39: Week  2  September  8-12

4-39

• The box plot is displayed visually, like this.

Chapter 4

Box Plots

Percentiles, Quartiles, and Box-Plots

Page 40: Week  2  September  8-12

4-40

Chapter 4

Box Plots

Percentiles, Quartiles, and Box-Plots

Page 41: Week  2  September  8-12

4-41

• The average of the first and third quartiles.

The name midhinge derives from the idea that, if the “box” were folded in half, it would resemble a “hinge”.

Box Plots: Midhinge

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 42: Week  2  September  8-12

4-42

• Use quartiles to detect unusual data points.

• The cutoff points used for this purpose are called fences and can be found using the following formulas:

              Inner fences          Outer fences
Lower fence   Q1 – 1.5 (Q3 – Q1)    Q1 – 3.0 (Q3 – Q1)
Upper fence   Q3 + 1.5 (Q3 – Q1)    Q3 + 3.0 (Q3 – Q1)

• Values outside the inner fences are unusual while those outside the outer fences are outliers.
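The fence formulas can be applied with a short Python sketch. It uses the quartiles reported for the voting data in the MegaStat output (Q1 = 57.45, Q3 = 64.95) and the labels from this slide ("unusual" beyond the inner fences, "outlier" beyond the outer fences); the function names and the "typical" label are illustrative.

```python
def fences(q1, q3):
    """Inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(x, q1, q3):
    """Label a value using the slide's terminology."""
    f = fences(q1, q3)
    if x < f["outer"][0] or x > f["outer"][1]:
        return "outlier"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "unusual"
    return "typical"

Q1, Q3 = 57.45, 64.95              # voting data quartiles from the MegaStat output
print(fences(Q1, Q3))
print(classify(76.7, Q1, Q3))      # Minnesota, the highest voting percentage
```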

Box Plots: Fences and Unusual Data Values

Chapter 4

Percentiles, Quartiles, and Box-Plots

Page 43: Week  2  September  8-12

4-43

Chapter 4

Example: Birth Weights (n = 1429)

Box-Plots with Fences

Source: Birth records from the North Carolina State Center for Health and Environmental Statistics and the Institute for Research in Social Science at University of North Carolina at Chapel Hill.

Note: The middle 50% of birth weights lie within a small range (105 to 130 oz, or about 6.56 lb to 8.13 lb). But there are extremes on the low end.

Page 44: Week  2  September  8-12

4-44

Fences Visualized:

Chapter 4

Fences Example:

Interpretation: There are three outliers (beyond the inner upper fence). One is on the border of the upper outer fence, so is almost an extreme outlier. Lower fences are not displayed since they are irrelevant for this sample.

Box-Plots with Fences

Page 45: Week  2  September  8-12

4-45

Interpretation: Based on the fences, there is only one outlier and no extreme outliers. Lower fences are not displayed since they are not needed for this sample.

Chapter 4

Example: Fences and Unusual Data Values

Outlier

Box-Plots with Fences

Page 46: Week  2  September  8-12

4-46

Correlation, Grouped Data, Shape ML 2.4

Chapter 4

Topics

• scatter plots

• correlation coefficient

• covariance – population, sample

• mean from grouped data

• skewness, kurtosis (Excel)

Page 47: Week  2  September  8-12

4-47

The sample correlation coefficient is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y.

Correlation Coefficient

Note: -1 ≤ r ≤ +1

Chapter 4

Correlation and Covariance

Perfect negative correlation

Perfect positive correlation
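The sample correlation coefficient can be computed from sums of deviations, as in the Python sketch below. It assumes the standard Pearson formula; the data are hypothetical and chosen to show the two extreme cases, r = +1 and r = -1.

```python
import math

def correlation(x, y):
    """Sample (Pearson) correlation coefficient r for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4]                     # hypothetical data
print(correlation(x, [2, 4, 6, 8]))  # points on a rising line
print(correlation(x, [8, 6, 4, 2]))  # points on a falling line
```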

Page 48: Week  2  September  8-12

4-48

Illustration of Correlation Coefficients

Chapter 4

Correlation and Covariance

Page 49: Week  2  September  8-12

4-49

The sample correlation coefficient describes the degree of linearity between paired observations on two quantitative variables X and Y.

Correlation Coefficient: Examples Note: -1 ≤ r ≤ +1

Chapter 4

X = car weight (lbs), Y = city MPG

X = gestation (months), Y = birth weight (oz)

Correlation and Covariance

Page 50: Week  2  September  8-12

4-50

The sample correlation coefficient describes the degree of linearity between paired observations on two quantitative variables X and Y.

Correlation Coefficient: Example Note: -1 ≤ r ≤ +1

Chapter 4

Correlation and Covariance

Page 51: Week  2  September  8-12

4-51

The covariance of two random variables X and Y (denoted σXY ) measures the degree to which the values of X and Y change together.

Covariance

Chapter 4

Correlation and Covariance

Caution: The covariance is not easy to interpret because its units depend on the units of X and Y (e.g., dollars). That’s why we usually refer to the correlation coefficient (it is unit free).
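The caution about units can be illustrated with a small Python sketch. It assumes the usual sample covariance (dividing by n – 1) and uses hypothetical data: the same Y values are expressed in dollars and then in cents, which multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import statistics

def sample_covariance(x, y):
    """Sample covariance: sum of cross-products of deviations over (n - 1)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def correlation_from_covariance(x, y):
    """Correlation: covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4]                        # hypothetical data
y_dollars = [2.0, 4.5, 5.5, 9.0]        # hypothetical data, in dollars
y_cents = [100 * v for v in y_dollars]  # same values, in cents

print(sample_covariance(x, y_dollars), sample_covariance(x, y_cents))
print(correlation_from_covariance(x, y_dollars), correlation_from_covariance(x, y_cents))
```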

Page 52: Week  2  September  8-12

4-52

Group Mean

Chapter 4

Grouped Data

Weighted Mean

Page 53: Week  2  September  8-12

4-53

Group Mean

Chapter 4

Grouped Data

Note: You will rarely need this. If you are given only grouped data, you will have to make your own tables in Excel (like this).
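As an alternative to building the table by hand, the grouped mean can be sketched in Python. This assumes the usual approximation, in which each bin's midpoint is weighted by its frequency; the bins below are hypothetical.

```python
def grouped_mean(bins):
    """Approximate mean from (lower_limit, upper_limit, frequency) tuples."""
    n = sum(freq for _, _, freq in bins)
    weighted_sum = sum(freq * (lower + upper) / 2 for lower, upper, freq in bins)
    return weighted_sum / n

# Hypothetical frequency table: three bins with 2, 6, and 2 observations.
hypothetical_bins = [(0, 10, 2), (10, 20, 6), (20, 30, 2)]
print(grouped_mean(hypothetical_bins))
```

The result is only an approximation, because it treats every observation in a bin as if it sat at the bin midpoint.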

Page 54: Week  2  September  8-12

4-54

Skewness

Chapter 4

Skewness

To interpret Excel’s skewness coefficient, you need a table showing critical values for various sample sizes.

Note: You can assess skewness from the histogram or boxplot (usually revealed by outliers or a long tail). It’s usually not worth it to bother with the table.
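For readers who want to see what a skewness coefficient measures, here is a minimal Python sketch of the adjusted sample skewness formula. To my understanding this is the formula behind Excel's =SKEW, but treat that correspondence as an assumption; the data are hypothetical.

```python
import statistics

def sample_skewness(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) times the sum of cubed z-scores."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

print(sample_skewness([1, 2, 3, 4, 5]))      # symmetric data
print(sample_skewness([1, 1, 2, 2, 3, 10]))  # long right tail
```

Symmetric data give a coefficient near zero, and a long right tail gives a positive value.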

Page 55: Week  2  September  8-12

4-55

To interpret Excel’s kurtosis coefficient, you need a table showing critical values for various sample sizes.

Chapter 4

Kurtosis

Caution: You cannot reliably assess kurtosis from the histogram, because its x-axis scale affects its appearance. Maybe best to let statisticians worry about this topic.

Page 56: Week  2  September  8-12

0-56

Assignments ML 2.5

• Connect C-2 (covers chapter 4)
    • You get three attempts
    • Feedback is given if requested
    • Printable if you wish
    • Deadline is midnight each Monday

• Project P-1 (data, tasks, questions)
    • Review instructions
    • Look at the data
    • Your task is to write a nice, readable report (not a spreadsheet)
    • Length is up to you

Page 57: Week  2  September  8-12

0-57

Projects: General Instructions

General Instructions

For each team project, submit a short (5-10 page) report (using Microsoft Word or equivalent) that answers the questions posed. Strive for effective writing (see textbook Appendix I). Creativity and initiative will be rewarded. Avoid careless spelling and grammar. Paste graphs and computer tables or output into your written report (it may be easier to format tables in Excel and then use Paste Special > Picture to avoid weird formatting and permit sizing within Word). Allocate tasks among team members as you see fit, but all should review and proofread the report (submit only one report).

Page 58: Week  2  September  8-12

0-58

Project P-1

Random teams are assigned on Moodle (submit only one report).

Data: Download Big Dataset 02 - Crime in Major Cities from Moodle. Your team is assigned one crime category (but you can change it if you wish). Copy the city names and the chosen crime data column to a new spreadsheet. Delete lines (if any) with missing data.

Analysis:

(a) Sort the observations (with city names).

(b) List the top 10 and bottom 10 data values (with city names).

(c) For the entire data set, calculate the mean and median. What do they tell you about center? Would the mode be helpful for this type of data? Explain.

(d) Calculate the standard deviation.

(e) Calculate the standardized z-value for each observation.

(f) Are there outliers or unusual data values (see p. 137)? Discuss.

(g) Use MegaStat (or Minitab or Excel) to make a histogram. Describe its shape.

(h) Calculate the quartiles. Make a boxplot and describe it.

(i) Make a scatter plot of your kind of crime versus a different type of crime. What does it show?

(j) Ambitious students: Sort the database in random order (see bottom of page 36) using Excel’s function =RAND(). Copy and paste the first few sorted lines into your report to illustrate your sorting method.

Comment on anything unusual (or interesting things that you might find on the web).

Watch the video walkthrough using Voting, North Carolina Births, and CEO compensation as examples (posted on Moodle).

Page 59: Week  2  September  8-12

0-59

Project P-1

Your 2010 data will look like this (2005 and 2000 are also available).

Crime Rates in U.S. Metropolitan Areas, 2010 (n = 365)

Columns 2-6 are Violent Crimes Per 100,000; columns 7-10 are Property Crimes Per 100,000.

Metropolitan Statistical Area | All Violent | Murder | Rape | Robbery | Assault | All Property | Burglary | Larceny | Car Theft
Abilene, TX M.S.A. | 423.0 | 3.1 | 48.9 | 72.7 | 298.3 | 3617.3 | 1009.0 | 2459.8 | 148.5
Akron, OH M.S.A. | 304.7 | 3.7 | 40.9 | 105.1 | 155.0 | 3185.6 | 947.7 | 2074.5 | 163.3
Albany, GA M.S.A. | 566.0 | 8.7 | 24.9 | 150.4 | 382.1 | 4512.6 | 1417.8 | 2803.4 | 291.4
Albany-Schenectady-Troy, NY M.S.A. | 310.4 | 1.5 | 21.0 | 98.5 | 189.4 | 2693.6 | 512.1 | 2076.2 | 105.4
Albuquerque, NM M.S.A. | 670.4 | 5.8 | 44.8 | 124.3 | 495.6 | 3896.1 | 920.6 | 2586.2 | 389.4
Alexandria, LA M.S.A. | 638.0 | 5.8 | 23.1 | 132.3 | 476.7 | 4592.9 | 1203.3 | 3176.3 | 213.3
Allentown-Bethlehem-Easton, PA-NJ M.S.A. | 228.2 | 3.5 | 20.3 | 93.6 | 110.9 | 2298.0 | 432.2 | 1758.1 | 107.7
Altoona, PA M.S.A. | 243.6 | 0.8 | 38.0 | 49.8 | 155.0 | 1811.7 | 425.4 | 1318.2 | 68.0
Amarillo, TX M.S.A. | 513.1 | 5.7 | 40.8 | 98.9 | 367.8 | 4812.7 | 1137.2 | 3390.5 | 285.0
Ames, IA M.S.A. | 299.5 | 1.1 | 41.7 | 12.4 | 244.4 | 2528.1 | 478.6 | 1966.1 | 83.3
Anchorage, AK M.S.A. | 812.9 | 4.2 | 85.9 | 148.5 | 574.4 | 3506.3 | 416.1 | 2813.4 | 276.8
Anderson, IN M.S.A. | 205.8 | 2.3 | 33.4 | 70.6 | 99.5 | 3353.8 | 848.1 | 2294.6 | 211.1
Anderson, SC M.S.A. | 586.0 | 5.3 | 36.4 | 75.9 | 468.4 | 4707.8 | 1297.6 | 3041.7 | 368.4
Ann Arbor, MI M.S.A. | 338.5 | 1.4 | 43.2 | 69.8 | 224.0 | 2713.7 | 659.7 | 1879.5 | 174.4
Appleton, WI M.S.A. | 155.8 | 0.0 | 21.4 | 13.8 | 120.5 | 2136.7 | 378.5 | 1708.2 | 50.0
Asheville, NC M.S.A. | 229.7 | 1.9 | 21.8 | 59.9 | 146.1 | 2454.9 | 749.6 | 1534.9 | 170.3
Athens-Clarke County, GA M.S.A. | 374.9 | 4.2 | 19.6 | 70.5 | 280.5 | 3843.7 | 1018.0 | 2588.1 | 237.5
Atlanta-Sandy Springs-Marietta, GA M.S.A. | 413.8 | 6.1 | 20.9 | 149.7 | 237.1 | 3462.6 | 957.0 | 2135.7 | 370.0
Atlantic City-Hammonton, NJ M.S.A. | 529.8 | 8.0 | 18.9 | 245.5 | 257.5 | 3550.3 | 741.5 | 2685.7 | 123.1
Augusta-Richmond County, GA-SC M.S.A. | 412.9 | 10.2 | 37.4 | 156.6 | 208.7 | 4815.3 | 1355.1 | 3037.7 | 422.5
Austin-Round Rock-San Marcos, TX M.S.A. | 327.9 | 3.4 | 24.7 | 84.0 | 215.8 | 3792.0 | 754.3 | 2866.9 | 170.8
Bakersfield-Delano, CA M.S.A. | 593.0 | 9.0 | 19.9 | 148.4 | 415.7 | 3713.1 | 1148.0 | 1931.6 | 633.6
Baltimore-Towson, MD M.S.A. | 685.3 | 10.3 | 23.6 | 214.4 | 437.0 | 3090.7 | 649.5 | 2135.5 | 305.7
Bangor, ME M.S.A. | 68.4 | 2.0 | 12.6 | 27.2 | 26.6 | 3098.2 | 573.3 | 2429.3 | 95.7
Barnstable Town, MA M.S.A. | 434.6 | 0.5 | 36.1 | 57.6 | 340.3 | 2972.8 | 1116.6 | 1764.7 | 91.5
Battle Creek, MI M.S.A. | 697.6 | 4.5 | 75.3 | 109.6 | 508.3 | 3703.5 | 1145.6 | 2411.1 | 146.8
Bay City, MI M.S.A. | 335.2 | 0.9 | 78.1 | 50.8 | 205.2 | 2472.4 | 610.1 | 1776.6 | 85.7
Beaumont-Port Arthur, TX M.S.A. | 498.3 | 5.6 | 37.7 | 157.9 | 297.0 | 3865.3 | 1156.9 | 2488.4 | 220.1
Bellingham, WA M.S.A. | 267.0 | 2.5 | 44.7 | 50.6 | 169.1 | 3197.8 | 694.2 | 2372.7 | 130.8
Bend, OR M.S.A.2 | 304.9 | 4.3 | 29.0 | 30.9 | 240.7 | 2973.7 | 497.5 | 2360.2 | 116.0

Definitions (in column order): Violent crime; Murder and nonnegligent manslaughter; Forcible rape; Robbery; Aggravated assault; Property crime; Burglary; Larceny-theft; Motor vehicle theft.

Page 60: Week  2  September  8-12

0-60

Example: CEO Compensation

sorting is a good first step

Page 61: Week  2  September  8-12

0-61

Example: CEO Compensation

Highlight all data (including the headings) and use Custom Sort

Page 62: Week  2  September  8-12

0-62

Example: CEO Compensation

now you can clearly see the high and low data values (and comment on any weird data values)

Page 63: Week  2  September  8-12

0-63

Example: CEO Compensation

use MegaStat’s Descriptive Statistics to get your basic stats along with a nice boxplot

Page 64: Week  2  September  8-12

0-64

Example: CEO Compensation

use MegaStat’s Frequency Distributions to get a frequency table, histogram, etc.

severely skewed

annotated by user

normal if logs used?

Page 65: Week  2  September  8-12

0-65

Example: CEO Compensation

standardize the sorted list by subtracting the mean from each x value and then dividing by the standard deviation (or use the =STANDARDIZE function)

Page 66: Week  2  September  8-12

0-66

Example: CEO Compensation

after standardizing the sorted list, unusual z values can be seen

Page 67: Week  2  September  8-12

0-67

Example: CEO Compensation

to randomize the list, paste values of =RAND() beside data and custom sort on =RAND()
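The same randomizing idea (attach a random number to each row, then sort on it) can be sketched in Python for comparison with the Excel steps. The function name, the seed parameter, and the row labels are hypothetical; the seed is only there to make the illustration repeatable.

```python
import random

def randomize_rows(rows, seed=None):
    """Shuffle rows by sorting them on a freshly drawn random key."""
    rng = random.Random(seed)
    keyed = [(rng.random(), row) for row in rows]  # like pasting =RAND() beside each row
    keyed.sort(key=lambda pair: pair[0])           # like Custom Sort on the random column
    return [row for _, row in keyed]

ceos = ["CEO A", "CEO B", "CEO C", "CEO D", "CEO E"]  # hypothetical labels
shuffled = randomize_rows(ceos, seed=1)
print(shuffled)
```

In Excel the random numbers are pasted as values first because =RAND() recalculates on every change; here each key is drawn once, so no such step is needed.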