
Review of Basic Statistics

Parameters and Statistics

• Parameters are characteristics of populations, and are knowable only by taking a census.

• Statistics are estimates of parameters made from samples.

Descriptive Statistics Review

Measures of Location
• The mean
• The median
• The mode

Measures of Dispersion
• The variance
• The standard deviation

Mean

The mean (or average) is the basic measure of location, or “central tendency,” of the data.

• The sample mean is a sample statistic.

• The population mean is a population parameter.

Sample Mean

$$\bar{x} = \frac{\sum x_i}{n}$$

where the numerator is the sum of the values of the n observations:

$$\sum x_i = x_1 + x_2 + \cdots + x_n$$

The Greek letter Σ is the summation sign.

Example: College Class Size

We have the following sample of data for 5 college classes:

46 54 42 46 32

We use the notation x1, x2, x3, x4, and x5 to represent the number of students in each of the 5 classes:

x1 = 46, x2 = 54, x3 = 42, x4 = 46, x5 = 32

Thus we have:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{5} = \frac{46 + 54 + 42 + 46 + 32}{5} = \frac{220}{5} = 44$$

The average class size is 44 students.
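The same arithmetic can be sketched in Python. The function name `sample_mean` is illustrative, not something the slides define; it simply sums the observations and divides by n.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)

# Class sizes x1..x5 from the College Class Size example
class_sizes = [46, 54, 42, 46, 32]
print(sample_mean(class_sizes))  # 44.0
```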

Population Mean (μ)

$$\mu = \frac{\sum x_i}{N}$$

The number of observations in the population is denoted by the upper-case N.

The sample mean x̄ is a point estimator of the population mean μ.

Median

The median is the value in the middle when the data are arranged in ascending order (from smallest value to largest value).

a. For an odd number of observations the median is the middle value.

b. For an even number of observations the median is the average of the two middle values.

The College Class Size example

First, arrange the data in ascending order:

32 42 46 46 54

Notice that n = 5, an odd number. Thus the median is given by the middle value, the third of the five ordered observations.

The median class size is 46.

Median Starting Salary For a Sample of 12 Business School Graduates

A college placement office has obtained the following data for 12 recent graduates:

Graduate   Starting Salary   Graduate   Starting Salary
1          2850              7          2890
2          2950              8          3130
3          3050              9          2940
4          2880              10         3325
5          2755              11         2920
6          2710              12         2880

First we arrange the data in ascending order:

2710 2755 2850 2880 2880 2890 2920 2940 2950 3050 3130 3325

Notice that n = 12, an even number. Thus we take the average of the middle two observations, 2890 and 2920:

$$\text{Median} = \frac{2890 + 2920}{2} = 2905$$
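A short Python sketch of the median rule covers both cases from the slides: the middle value when n is odd, and the average of the two middle values when n is even. The function name `median` is illustrative; the data are the class sizes and starting salaries given above.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)              # arrange in ascending order first
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:                        # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle two

class_sizes = [46, 54, 42, 46, 32]
salaries = [2850, 2950, 3050, 2880, 2755, 2710, 2890, 3130, 2940, 3325, 2920, 2880]
print(median(class_sizes))  # 46
print(median(salaries))     # 2905.0
```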

Mode

The mode is the value that occurs with the greatest frequency.

Soft Drink Example

Soft Drink Frequency

Coke Classic 19

Diet Coke 8

Dr. Pepper 5

Pepsi Cola 13

Sprite 5

Total 50

The mode is Coke Classic. A mean or median is meaningless for qualitative data.
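For qualitative data like the soft drink purchases, finding the mode amounts to picking the category with the largest count. This sketch uses the standard library's `collections.Counter` with the frequencies from the table; the variable names are illustrative.

```python
from collections import Counter

# Frequencies from the Soft Drink Example (50 purchases in total)
frequency = Counter({
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi Cola": 13,
    "Sprite": 5,
})

# The mode is the category with the greatest frequency
mode, count = frequency.most_common(1)[0]
print(mode, count)               # Coke Classic 19
print(sum(frequency.values()))   # 50
```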

Using Excel to Compute the Mean, Median, and Mode

Enter the data for the starting salary example into cells A1:B13, with column labels in row 1 and the salaries in B2:B13.

• To compute the mean, activate an empty cell, enter the following in the formula bar, and click the green checkmark:

=AVERAGE(B2:B13)

• To compute the median, activate an empty cell, enter the following in the formula bar, and click the green checkmark:

=MEDIAN(B2:B13)

• To compute the mode, activate an empty cell, enter the following in the formula bar, and click the green checkmark:

=MODE(B2:B13)

The Starting Salary Example

Mean 2940

Median 2905

Mode 2880
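Outside Excel, Python's standard `statistics` module offers direct counterparts to the three worksheet functions. This is a sketch for comparison, not part of the original Excel procedure; the salary list is the data from the starting salary example.

```python
import statistics

# Starting salaries for the 12 graduates (the values entered in B2:B13)
salaries = [2850, 2950, 3050, 2880, 2755, 2710, 2890, 3130, 2940, 3325, 2920, 2880]

mean = statistics.mean(salaries)      # counterpart of =AVERAGE(B2:B13)
median = statistics.median(salaries)  # counterpart of =MEDIAN(B2:B13)
mode = statistics.mode(salaries)      # counterpart of =MODE(B2:B13)
print(mean, median, mode)             # mean 2940, median 2905, mode 2880
```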

Variance

• The variance is a measure of variability that uses all the data.

• The variance is based on the difference between each observation (xᵢ) and the mean (x̄ for the sample and μ for the population).

The variance is the average of the squared differences between the observations and the mean value. For a sample, the sum of squared differences is divided by n − 1 rather than n.

For the population:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

For the sample:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
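The two variance formulas differ only in the divisor. The sketch below applies both to the class-size data; treating those five classes as a whole population is purely for contrast, since the slides present them as a sample. The function name `variance` and its `sample` flag are illustrative.

```python
def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
    return ss / (n - 1) if sample else ss / n

class_sizes = [46, 54, 42, 46, 32]   # mean 44; deviations 2, 10, -2, 2, -12
print(variance(class_sizes))                # 64.0  (256 / 4)
print(variance(class_sizes, sample=False))  # 51.2  (256 / 5)
```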

Standard Deviation

• The Standard Deviation of a data set is the square root of the variance.

• The standard deviation is measured in the same units as the data, making it easy to interpret.

Computing a standard deviation

For the population:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

For the sample:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
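Because the standard deviation is just the square root of the variance, it comes back in the original units (students, for the class-size data). This sketch takes square roots of the `statistics` module's variances and checks them against its built-in `stdev` and `pstdev`; applying the population version to this sample is only for contrast.

```python
import math
import statistics

class_sizes = [46, 54, 42, 46, 32]

# Standard deviation = square root of the variance
s = math.sqrt(statistics.variance(class_sizes))       # sample: divides by n - 1
sigma = math.sqrt(statistics.pvariance(class_sizes))  # population: divides by N

# The statistics module can also compute these directly
assert math.isclose(s, statistics.stdev(class_sizes))
assert math.isclose(sigma, statistics.pstdev(class_sizes))
print(s)      # 8.0 students
print(sigma)  # about 7.16 students
```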

Measures of Association Between Two Variables

• Covariance

• Correlation coefficient

Covariance

• Covariance is a measure of linear association between variables.

• Positive values indicate a positive correlation between variables.

• Negative values indicate a negative correlation between variables.

To compute a covariance for variables x and y:

For populations:

$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$$

For samples:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
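The covariance formulas translate directly into a sum of cross-products of deviations. The paired data below are hypothetical, made up only to show one positive and one negative case; the function name `covariance` is illustrative.

```python
def covariance(x, y, sample=True):
    """Sum of cross-products of deviations, divided by n - 1 (sample) or N (population)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1) if sample else cross / n

# Hypothetical paired observations (not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y))                  # 1.5: positive, x and y tend to rise together
print(covariance(x, [10, 8, 6, 4, 2]))   # -5.0: negative, one falls as the other rises
```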

[Figure: Mortgage Interest Rates and Monthly Home Sales, 1980-2004. Scatter plot of Mortgage Interest Rate (Percent) against Monthly Home Sales (thousands), n = 299, divided into quadrants I to IV at the sample means.]

If the majority of the sample points are located in quadrants II and IV, the variables are negatively correlated, as they are in this case. Thus the covariance will have a negative sign.

The (Pearson) Correlation Coefficient

A covariance tells you whether two variables are positively or negatively correlated, but it does not tell you the degree of correlation. Moreover, the covariance is sensitive to the units of measurement. The correlation coefficient does not suffer from these defects.

For populations:

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

For samples:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

Note that:

$$-1 \le \rho_{xy} \le 1 \quad \text{and} \quad -1 \le r_{xy} \le 1$$
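The sample correlation divides the sample covariance by the product of the two sample standard deviations. This sketch uses the same hypothetical data as a covariance example would (not data from the slides) and then rescales y by 1000 to illustrate the claim that r, unlike the covariance, does not depend on the units of measurement. The function name `correlation` is illustrative.

```python
import math
import statistics

def correlation(x, y):
    """Sample correlation r_xy = s_xy / (s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    return s_xy / (statistics.stdev(x) * statistics.stdev(y))

x = [1, 2, 3, 4, 5]   # hypothetical data (not from the slides)
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))    # 0.7746

# Changing units (multiplying y by 1000) rescales the covariance but leaves r unchanged
y_scaled = [1000 * v for v in y]
assert math.isclose(correlation(x, y_scaled), r)
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.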

[Figure: Correlation Coefficient = 1. Scatter plot of Distance Traveled in 5 Hours (Miles) against Average Speed (MPH).]

[Figure: Correlation Coefficient = -1. Scatter plot of Time Spent Swimming (Hours) against Time Spent Jogging (Hours), captioned "I have 7 hours per week for exercise."]

Normal Probability Distribution

The normal distribution is by far the most important distribution for continuous random variables. It is widely used for making statistical inferences in both the natural and social sciences.

It has been used in a wide variety of applications:

• Heights of people

• Scientific measurements

• Amounts of rainfall

• Test scores

The Normal Distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$

Where:

μ is the mean

σ is the standard deviation

π = 3.14159

e = 2.71828

The distribution is symmetric and bell-shaped.
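The density formula can be evaluated directly. This sketch implements f(x) with `math` and cross-checks it against the standard library's `statistics.NormalDist.pdf`; the function name `normal_pdf` and the test points are illustrative. It also checks the symmetry of the curve about μ.

```python
import math
import statistics

def normal_pdf(x, mu, sigma):
    """Height of the normal curve f(x) with mean mu and standard deviation sigma."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Agrees with the standard library's implementation
assert math.isclose(normal_pdf(1.3, 0, 1), statistics.NormalDist(0, 1).pdf(1.3))

# Symmetric about the mean: equal heights at mu - d and mu + d
assert math.isclose(normal_pdf(10 - 3, 10, 2), normal_pdf(10 + 3, 10, 2))

print(normal_pdf(0, 0, 1))  # about 0.3989, the peak height 1 / sqrt(2 * pi)
```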

Normal Probability Distribution: Characteristics

• The entire family of normal probability distributions is defined by its mean μ and its standard deviation σ.

• The highest point on the normal curve is at the mean, which is also the median and mode.

• The mean can be any numerical value: negative, zero, or positive. [Figure: normal curves with means of -10, 0, and 20.]

• The standard deviation determines the width of the curve: larger values result in wider, flatter curves. [Figure: normal curves with σ = 15 and σ = 25.]

• Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (.5 to the left of the mean and .5 to the right).
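The area facts can be checked numerically. This sketch uses `statistics.NormalDist` with illustrative parameters (μ = 50, σ = 15, not taken from the slides): the cumulative probability at the mean is .5, and a simple midpoint-rule sum of the density over ±10σ comes out at 1 to six decimal places.

```python
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=15)   # illustrative parameters, not from the slides

# Area to the left of the mean is .5, so the area to the right is also .5
print(dist.cdf(dist.mean))           # 0.5

# Midpoint-rule approximation of the total area under the curve
a, b, steps = dist.mean - 10 * dist.stdev, dist.mean + 10 * dist.stdev, 10_000
width = (b - a) / steps
area = sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width
print(round(area, 6))                # 1.0
```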

The Standard Normal Distribution

The standard normal distribution is a normal distribution with the special properties that its mean is zero and its standard deviation is one (μ = 0, σ = 1).

The letter z is used to designate the standard normal random variable.

Standard Normal Probability Distribution: Cumulative Probability

P(z ≤ 1), the probability that z ≤ 1, is the area under the curve to the left of 1.

What is P(z ≤ 1)?

To find out, use the Cumulative Probabilities Table for the Standard Normal Distribution:

z      .00     .01     .02
.9     .8159   .8186   .8212
1.0    .8413   .8438   .8461
1.1    .8643   .8665   .8686
1.2    .8849   .8869   .8888

Reading across the 1.0 row to the .00 column gives P(z ≤ 1) = .8413, the area under the curve to the left of z = 1.
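The cumulative table can be regenerated rather than looked up. This sketch uses `statistics.NormalDist()` (mean 0, standard deviation 1 by default) and prints the same rows of the cumulative probabilities table, formatted to four places without the leading zero.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Rebuild rows .9 to 1.2 of the cumulative table, columns .00 to .02
for row in (0.9, 1.0, 1.1, 1.2):
    cells = [z.cdf(row + col) for col in (0.00, 0.01, 0.02)]
    print(f"{row:.1f}", " ".join(f"{p:.4f}"[1:] for p in cells))

print(round(z.cdf(1.0), 4))  # 0.8413 = P(z <= 1)
```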

• About 68.26 percent of the total area under the curve is within ±1 standard deviation of the mean.

• About 95.45 percent of the area under the curve is within ±2 standard deviations of the mean.
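These areas are differences of cumulative probabilities. Using the four-place table entry P(z ≤ 1) = .8413 gives 2(.8413) − 1 = .6826; the unrounded area computed below is about .6827, so the two agree up to table rounding.

```python
from statistics import NormalDist

z = NormalDist()
within_1 = z.cdf(1) - z.cdf(-1)   # area within +/- 1 standard deviation
within_2 = z.cdf(2) - z.cdf(-2)   # area within +/- 2 standard deviations
print(f"{within_1:.4f}")  # 0.6827
print(f"{within_2:.4f}")  # 0.9545
```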

Exercise 1

a) What is P(z ≤ 2.46)?

b) What is P(z > 2.46)?

Answer:

a) .9931

b) 1 - .9931 = .0069
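Exercise 1 can be checked with the standard normal CDF: part (a) is a left-tail area, and part (b) is its complement. This is a verification sketch using `statistics.NormalDist`, not the table-lookup method the exercise intends.

```python
from statistics import NormalDist

z = NormalDist()
p_a = z.cdf(2.46)   # (a) P(z <= 2.46): area to the left of 2.46
p_b = 1 - p_a       # (b) P(z > 2.46): the complement, area to the right
print(round(p_a, 4), round(p_b, 4))  # 0.9931 0.0069
```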

Exercise 2

a) What is P(z ≤ -1.29)?

b) What is P(z > -1.29)?

Answer:

a) 1 - .9015 = .0985

b) .9015

Note that, because of the symmetry, the area to the left of -1.29 is the same as the area to the right of 1.29.

[Figure: the red-shaded area to the left of -1.29 is equal to the green-shaded area to the right of 1.29.]

Note that:

$$P(z \le -1.29) = P(z > 1.29) = 1 - P(z \le 1.29)$$
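The symmetry argument in Exercise 2 can be confirmed numerically: the left-tail area at -1.29 matches one minus the cumulative probability at +1.29. A verification sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()
left = z.cdf(-1.29)   # (a) P(z <= -1.29)
right = 1 - left      # (b) P(z > -1.29)

# Symmetry: area left of -1.29 equals area right of +1.29
assert abs(left - (1 - z.cdf(1.29))) < 1e-12
print(round(left, 4), round(right, 4))  # 0.0985 0.9015
```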

Exercise 3

What is P(.00 ≤ z ≤ 1.00)?

$$P(.00 \le z \le 1.00) = P(z \le 1.00) - P(z \le .00) = .8413 - .5000 = .3413$$

P(.00 ≤ z ≤ 1.00) = .3413
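An interval probability is the difference of two cumulative probabilities, exactly as in the worked equation for Exercise 3. A verification sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()
# P(.00 <= z <= 1.00) = P(z <= 1.00) - P(z <= .00)
p = z.cdf(1.00) - z.cdf(0.00)
print(round(p, 4))  # 0.3413
```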
