STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures

STAT 280: Elementary Applied Statistics

Describing Data Using Numerical Measures

After completing this chapter, you should be able to:

Compute and interpret the mean, median, and mode for a set of data

Compute the range, variance, and standard deviation and know what these values mean

Construct and interpret a box and whiskers plot

Compute and explain the coefficient of variation and z scores

Use numerical measures along with graphs, charts, and tables to describe data

Chapter Goals

Summary Measures

Center and Location

Median

Other Measures of Location

Weighted Mean

Describing Data Numerically

Variation

Variance

Standard Deviation

Coefficient of Variation

RangePercentiles

Interquartile RangeQuartiles

GeoMean

Measures of Center and Location

Center and Location

Mean Median Mode Weighted Mean

Overview

Geomean RMS

Mean (Arithmetic Average)

The Mean is the arithmetic average of data values Sample mean

Population mean

n = Sample Size

N = Population Size

Mean (Arithmetic Average)

The most common measure of central tendency Mean = sum of values divided by the number of values Affected by extreme values (outliers)

(continued)

0 1 2 3 4 5 6 7 8 9 10

Mean = 3

0 1 2 3 4 5 6 7 8 9 10

Mean = 4

104321

Median

Not affected by extreme values

In an ordered array, the median is the “middle” number If n or N is odd, the median is the middle number If n or N is even, the median is the average of the

two middle numbers

0 1 2 3 4 5 6 7 8 9 10

Median = 3

0 1 2 3 4 5 6 7 8 9 10

Median = 3

A measure of central tendency Value that occurs most often Not affected by extreme values Used for either numerical or categorical data There may may be no mode There may be several modes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Mode = 5

0 1 2 3 4 5 6

No Mode

Geometric Mean

A measure of central tendency Only used for positive, numerical data Same as IRR Always less than or equal to Mean

Root Mean Square (RMS)

A measure of central tendency Measures physical activity, not

affected by negative values There may may be no mode There may be an infinite number of

RMS ii

Weighted Mean

Used when values are grouped by frequency or relative importance

Days to Complete

Frequency

Example: Sample of 26 Repair Projects Weighted Mean Days

to Complete:

days 6.31 26

8)(27)(86)(125)(4

Central Measure Example

Summary Statistics

Mean: ($3,000,000/5)

= $600,000

Median: middle value of ranked data = $300,000

Mode: most frequent value = $100,000

House Prices:

$2,000,000 500,000 300,000 100,000 100,000

Sum 3,000,000

Mean is generally used, unless extreme values (outliers) exist

Then median is often used, since the median is not sensitive to extreme values. Example: Median home prices may be

reported for a region – less sensitive to outliers

Which measure of location is the “best”?

Shape of a Distribution

Describes how data is distributed Symmetric or skewed

Mean = Median = Mode

Mean < Median < Mode Mode < Median < Mean

Right-SkewedLeft-Skewed Symmetric

(Longer tail extends to left) (Longer tail extends to right)

Other Location Measures

Other Measures of Location

Percentiles Quartiles

1st quartile = 25th percentile

2nd quartile = 50th percentile

= median

3rd quartile = 75th percentile

The pth percentile in a data array: p% are less than or equal to this

value (100 – p)% are greater than or

equal to this value

(where 0 ≤ p ≤ 100)

Quartiles

Quartiles split the ranked data into 4 equal groups

25% 25% 25% 25%

Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

Example: Find the first quartile

(n = 9)

Q1 = 25th percentile, so find the (9+1) = 2.5 position

so use the value half way between the 2nd and 3rd values,

so Q1 = 12.5

Q1 Q2 Q3

Percentiles

The pth percentile in an ordered array of n values is the value in ith position, where

Example: The 60th percentile in an ordered array of 19 values is the value in 12th position:

1)(n100

121)(19100

601)(n

Box and Whisker Plot

A Graphical display of data using 5-number summary:

Minimum -- Q1 -- Median -- Q3 -- Maximum

Example:

Minimum 1st Median 3rd Maximum Quartile Quartile

25% 25% 25% 25%

Shape of Box and Whisker Plots

The Box and central line are centered between the endpoints if data is symmetric around the median

A Box and Whisker plot can be shown in either vertical or horizontal format

Distribution Shape and Box and Whisker Plot

Right-SkewedLeft-Skewed Symmetric

Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3

Box-and-Whisker Plot Example

Below is a Box-and-Whisker plot for the following data:

0 2 2 2 3 3 4 5 5 10 27

This data is very right skewed, as the plot depicts

0 2 3 5 270 2 3 5 27

Min Q1 Q2 Q3 Max

Measures of Variation

Variation

Variance Standard Deviation Coefficient of Variation

PopulationVariance

Sample Variance

PopulationStandardDeviation

Sample Standard Deviation

Interquartile Range

Measures of variation give information on the spread or variability of the data values.

Variation

Same center, different variation

Simplest measure of variation Difference between the largest and the smallest

observations:

Range = xmaximum – xminimum

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

Range = 14 - 1 = 13

Example:

Ignores the way in which data are distributed

Sensitive to outliers

7 8 9 10 11 12Range = 12 - 7 = 5

7 8 9 10 11 12 Range = 12 - 7 = 5

Disadvantages of the Range

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 5 - 1 = 4

Range = 120 - 1 = 119

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate some high-and low-valued observations and calculate the range from the remaining values.

Interquartile range = 3rd quartile – 1st quartile

Interquartile Range

Median(Q2)

XmaximumX

minimum Q1 Q3

Example:

25% 25% 25% 25%

12 30 45 57 70

Interquartile range = 57 – 30 = 27

“Outliers”

1.5 IQR Criterion IQR= Q3 – Q1 Q3 + 1.5IQR Q1 - 1.5IQR

“2-Sigma” Criterion (2 )

Average of squared deviations of values from the mean Sample variance:

Population variance:

Variance

μ)(xσ

Standard Deviation

Most commonly used measure of variation Shows variation about the mean Has the same units as the original data

Sample standard deviation:

Population standard deviation:

μ)(xσ

Calculation Example:Sample Standard

Deviation

Sample Data (Xi) : 10 12 14 15 17 18 18 24

n = 8 Mean = x = 16

4.24267

16)(2416)(1416)(1216)(10

)x(24)x(14)x(12)x(10s

Comparing Standard Deviations

Mean = 15.5 s = 3.338 11 12 13 14 15 16 17 18 19 20 21

11 12 13 14 15 16 17 18 19 20 21

Data B

Data A

Mean = 15.5 s = .9258

11 12 13 14 15 16 17 18 19 20 21

Mean = 15.5 s = 4.57

Data C

Coefficient of Variation

Measures relative variation Always in percentage (%) Shows variation relative to mean Is used to compare two or more sets of data

measured in different units

Population Sample

Comparing Coefficient of Variation

Stock A: Average price last year = $50 Standard deviation = $5

Stock B: Average price last year = $100 Standard deviation = $5

Both stocks have the same standard deviation, but stock B is less variable relative to its price

10%100%$50

$5100%

5%100%$100

$5100%

If the data distribution is bell-shaped, then the interval:

contains about 68% of the values in the population or the sample

The Empirical Rule

contains about 95% of the values in the population or the sample

contains about 99.7% of the values in the population or the sample

The Empirical Rule

99.7%95%

Regardless of how the data are distributed, at least (1 - 1/k2) of the values will fall within k standard deviations of the mean

Examples:

(1 - 1/12) = 0% ……..... k=1 (μ ± 1σ)(1 - 1/22) = 75% …........ k=2 (μ ± 2σ)(1 - 1/32) = 89% ………. k=3 (μ ± 3σ)

Tchebysheff’s Theorem

withinAt least

A standardized data value refers to the number of standard deviations a value is from the mean

Standardized data values are sometimes referred to as z-scores

Standardized Data Values

where: x = original data value μ = population mean σ = population standard deviation z = standard score

(number of standard deviations x is from μ)

Standardized Population Values

where: x = original data value x = sample mean s = sample standard deviation z = standard score

(number of standard deviations x is from μ)

Standardized Sample Values

Using Microsoft Excel

Descriptive Statistics are easy to obtain from Microsoft Excel

Use menu choice:

tools / data analysis / descriptive statistics

Enter details in dialog box

Excel output

Microsoft Excel

descriptive statistics output,

using the house price data:

House Prices:

$2,000,000 500,000 300,000 100,000 100,000

Chapter Summary

Described measures of center and location Mean, median, mode, geometric mean, midrange

Discussed percentiles and quartiles Described measure of variation

Range, interquartile range, variance,

standard deviation, coefficient of variation

Created Box and Whisker Plots

Chapter Summary

Illustrated distribution shapes Symmetric, skewed

Discussed Tchebysheff’s Theorem

Calculated standardized data values

(continued)

STAT 280: Elementary Applied Statistics Describing Data Using Numerical Measures

Documents

Chapter 14: Describing Relationships (Scatterplots and ...azimmer/Lect14_ch14.pdf · Chapter 14: Describing Relationships (Scatterplots and Correlation) Aaron Zimmerman STAT 220 -

FrameOne EN FR · 2014-06-24 · 6 7 B6232 B6256 B7415 280 280 80 280 280 80 B6254 B7416 280 280 80 280 280 80 C4001 280 280 80 FrameOne That’s what I want As individual as you

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Nathaniel Cannon Describing Data: Categorical Variables SECTIONS 2.1 One categorical variable Two

PUBLIC LAW 106-280—OCT. 6, 2000 114 STAT. 845 …uscode.house.gov/statutes/pl/106/280.pdfPUBLIC LAW 106-280—OCT. 6, 2000 114 STAT. 845 Public Law 106-280 106th Congress An Act

STAT Working papers STAT Documents de travail STAT … · 2014-06-10 · STAT Working papers STAT Documents de travail STAT Documentos de trabajo No. 95-4 INQUÉRITOS DE POPULAÇAO

José Patrício | 280 Dominoes · 2018-11-01 · 280 Dominoes, 2000 7.840 pieces of domino (resin) 312 x 312 cm/122.8 x 122.8 in 280 dominós [280 dominoes] 280 dominós [280 dominoes]

MOTORI GH 280 GH 355 - Mez CZ · GH 280 SK 4 GH 280 MK 6 GH 280 LK 8 GH 280 PK 10 Overall dimensions GH 280 IM1001-IP44-IC37 12 GH 280 IM1001-IP23-IC06 13 GH 280 IM1001-IP54-IC86W

STAT 280 Lab 99 Solutions and Thoughts Page 1 of 13

STAT Working Papers STAT Documents de travail STAT

IT/280 2014 IT280 IT 280 IT/280 TUTORIALS UOP TUTORIALS IT/280

· ¥410 ¥280 ¥420 ¥410 ¥280 ¥300 ¥340 ¥80 ¥80 ¥80 ¥80 ¥80 o) 10 ¥ —(kcd) ¥220 ¥380 ¥280 ¥280 ¥280 601.0 21.1 12.2 2B 19B 857.0

Chapter 3: Describing Relationships - SMHSWalsh - homesmhswalsh.cmswiki.wikispaces.net/file/view/AP Stat 3.2... · 2014-10-20 · Chapter 3: Describing Relationships ... regression

pacini18 pacini s.r.l. iderurgica ommerciale acini LARGHI PIATTI DESIGNAZ. mm PESO Kg/m 280 x 10 280 x 12 280 x 15 280 x 20 280 x 25 280 x 30 280 x 35 280 x 40 22,0 26,4 33,0 44,0

ÁKVÖRÐUN SAMEIGINLEGU EES-NEFNDARINNAR...NO 280 Ibestad NOIBE NO 280 Inderøy NOIND NO 280 X Jan Mayen SJ888 NO 280 Jondal NOJON NO 280 Karlsøy NOKAY NO 280 Kirkenes/Sør-Varanger

AP Stat Day 10 68 Days until AP Exam Describing Data with Numbers Box and Whisker Plots Variance and Standard Deviation

Stat 475 Life Contingencies I Chapter 2: Survival models · The future lifetime random variable | Notation We are interested in analyzing and describing the future lifetime of an

PUBLIC LAW 106-280—OCT. 6, 2000 114 STAT. 845 Public Law ... · PUBLIC LAW 106-280—OCT. 6, 2000 114 STAT. 845 Public Law 106-280 106th Congress An Act To amend the Foreign Assistance

Ciclos do apalpador TNC 426 TNC 430 - HEIDENHAIN...9/99 Manual do utilizador Software de NC 280 472xx 280 473xx 280 474xx 280 475xx 280 476xx 280 477xx Ciclos do apalpador TNC 426

Chapter 2 DESCRIBING DATA USING GRAPHSnlucas/Stat 145/145 Powerpoint Files/145 Chapter 2... · Chapter 2 DESCRIBING DATA –USING GRAPHS TOPIC SLIDE Introduction to using graphs 2

Describing Places Module 5 Module 5. Describing Places1