Statistics Central Tendency


  • 8/3/2019 Statistics Cental Tendency


    Statistics

    Prof. Rushen Chahal

    Descriptive Statistics - Measures of Central Tendency

    Chapters 3 & 4

    TODAY'S GOALS

    TO CALCULATE THE ARITHMETIC MEAN, THE WEIGHTED MEAN, THE MEDIAN, THE MODE, AND THE GEOMETRIC MEAN.

    TO EXPLAIN THE CHARACTERISTICS, USE, ADVANTAGES, AND DISADVANTAGES OF EACH MEASURE OF CENTRAL TENDENCY.

    TO IDENTIFY THE POSITION OF THE ARITHMETIC MEAN, MEDIAN, AND MODE FOR BOTH A SYMMETRICAL AND A SKEWED DISTRIBUTION.

    POPULATION MEAN

    Definition: For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values. To compute the population mean, use the following formula:

    $\mu = \frac{\sum X}{N}$

    where $\mu$ (mu) is the population mean, $\sum$ (sigma) denotes summation, $X$ is an individual value, and $N$ is the population size.

    EXAMPLE

    Parameter: A measurable characteristic of a population. For example, the population mean.

    A racing team has a fleet of four cars. The following are the miles covered by each car over their lives: 23,000, 17,000, 9,000, and 13,000. Find the average miles covered by each car.

    Since this fleet is the population, the mean is (23,000 + 17,000 + 9,000 + 13,000)/4 = 15,500.
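    The fleet calculation can be checked with a few lines of Python. This is only a sketch: the function name `population_mean` is a hypothetical helper chosen for illustration, not something from the slides.

```python
def population_mean(values):
    """Return the sum of all population values divided by the population size N."""
    if not values:
        raise ValueError("population must contain at least one value")
    return sum(values) / len(values)


# Miles covered by each of the four cars in the racing team's fleet.
fleet_miles = [23_000, 17_000, 9_000, 13_000]
print(population_mean(fleet_miles))  # 15500.0
```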

    THE SAMPLE MEAN

    Definition: For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values. To compute the sample mean, use the following formula:

    $\bar{X} = \frac{\sum X}{n}$

    where $\bar{X}$ (X-bar) is the sample mean, $\sum$ (sigma) denotes summation, $X$ is an individual value, and $n$ is the sample size.

    WHAT'S THE DIFFERENCE?

    Population Mean $\mu$

    Sample Mean $\bar{X}$


    Sum of Deviations

    Consider the set of values: 3, 8, and 4. The mean is 5. So (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0.

    Symbolically we write:

    $\sum (X - \bar{X}) = 0$

    The mean is also known as the expected value or average.
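    The zero-sum property of deviations from the mean can be seen numerically. The sketch below uses a hypothetical helper, `deviations_from_mean`, on the three values from the slide. With floating-point data the sum may come out very close to zero rather than exactly zero.

```python
def deviations_from_mean(values):
    """Return each value minus the arithmetic mean of the values."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


devs = deviations_from_mean([3, 8, 4])
print(devs)       # [-2.0, 3.0, -1.0]
print(sum(devs))  # 0.0
```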

    THE MEDIAN

    Definition: The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. There are as many values above the median as below it in the data array.

    Note: For an odd number of values, the median will be the middle number in the ordered array.

    Note: For an even number of values, the median will be the arithmetic average of the two middle numbers.

    EXAMPLE

    Compute the median:

    The road life for a sample of five tires in miles is: 42,000 51,000 40,000 39,000 48,000

    Arranging the data in ascending order gives: 39,000 40,000 42,000 48,000 51,000. Thus the median is 42,000 miles.

    EXAMPLE

    Compute the median:

    The following values are years of service for a sample of six store managers: 16 12 8 15 7 23.

    Arranging in order gives 7 8 12 15 16 23. Thus the median is (12 + 15)/2 = 13.5 years.
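    Both median cases (odd and even number of values) fit in one short function. This is a minimal sketch, with `median` as an illustrative name. Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([42_000, 51_000, 40_000, 39_000, 48_000]))  # 42000
print(median([16, 12, 8, 15, 7, 23]))                    # 13.5
```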

    PROPERTIES OF THE MEDIAN

    There is a unique median for each data set. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.

    It can be computed for ratio-level, interval-level, and ordinal-level data.

    It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

    THE MODE

    Definition: The mode is the value of the observation that appears most frequently.

    The exam scores for ten students are: 81 93 75 68 87 81 75 81 87. What is the modal exam score?

    Since the score of 81 occurs most often, the modal score is 81.

    The next slide shows the histogram with six classes for the water consumption from our previous class. Observe that the modal class is the blue box with a midpoint of 15.
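    Counting frequencies makes the mode easy to find. The sketch below uses a hypothetical helper, `modes`, and returns a list because a data set can have more than one most-frequent value. The scores are entered exactly as they appear in the slide text.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


scores = [81, 93, 75, 68, 87, 81, 75, 81, 87]  # scores as listed on the slide
print(modes(scores))  # [81]
```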

    WATER CONSUMPTION IN 1,000 GALLONS (histogram)

    THE WEIGHTED MEAN

    Definition: The weighted mean of a set of numbers $X_1, X_2, \ldots, X_n$, with corresponding weights $w_1, w_2, \ldots, w_n$, is computed from the following formula:

    $\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$ or $\bar{X}_w = \frac{\sum (w X)}{\sum w}$

    Why would anyone want to weight observations?

    EXAMPLE

    During a one-hour period on a busy Friday night, fifty soft drinks were sold at the Kruzin Cafe. Compute the weighted mean of the price of the soft drinks. (Price ($), Number sold): (0.50, 5), (0.75, 15), (0.90, 15), and (1.10, 15).

    The weighted mean is

    $\bar{X}_w = \frac{0.50 \times 5 + 0.75 \times 15 + 0.90 \times 15 + 1.10 \times 15}{5 + 15 + 15 + 15} = \frac{\$43.75}{50} = \$0.875$
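    The soft-drink calculation can be reproduced as follows. This is a sketch in which `weighted_mean` is an illustrative helper: the prices are the values and the numbers sold are the weights.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


prices = [0.50, 0.75, 0.90, 1.10]  # price of each drink in dollars
sold = [5, 15, 15, 15]             # number sold at each price
print(round(weighted_mean(prices, sold), 3))  # 0.875
```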

    THE GEOMETRIC MEAN

    Definition: The geometric mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula for the geometric mean is given by:

    $GM = \sqrt[n]{(X_1)(X_2)(X_3) \cdots (X_n)}$

    One main use of the geometric mean is to average percents.

    How do you compute the nth root of a number?

    EXAMPLE

    The profits earned by ABC Construction on three projects were 6, 3, and 2 percent respectively. Compute the geometric mean profit and the arithmetic mean and compare.

    The geometric mean is

    $GM = \sqrt[3]{(6)(3)(2)} = 3.3019$

    The arithmetic mean profit = (6 + 3 + 2)/3 = 3.6667.

    The geometric mean of 3.3019 gives a more conservative profit figure than the arithmetic mean of 3.6667. This is because the GM is not heavily weighted by the profit of 6 percent.
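    One way to take an nth root in code is to raise the product to the power 1/n. The sketch below does that for the three profit figures. `geometric_mean` is an illustrative name, and `math.prod` assumes Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("geometric mean needs a non-empty set of positive numbers")
    return math.prod(values) ** (1 / len(values))


profits = [6, 3, 2]  # percent profit on each project
print(round(geometric_mean(profits), 4))      # 3.3019
print(round(sum(profits) / len(profits), 4))  # 3.6667
```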

    THE GEOMETRIC MEAN (continued)

    The other main use of the geometric mean is to determine the average percent increase in sales, production or other business or economic series from one time period to another.

    The formula for the geometric mean as applied to this type of problem is:

    $GM = \sqrt[n-1]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$

    Where did this come from?

    EXAMPLE

    The total enrollment at a large university increased from 18,246 in 1985 to 22,840 in 1995. Compute the geometric mean rate of increase over the period.

    Here n = 10, so n - 1 = 9 = (number of periods). The geometric mean rate of increase is given by

    $GM = \sqrt[9]{\frac{22{,}840}{18{,}246}} - 1 = 0.0253$

    That is, the geometric mean rate of increase is 2.53%.
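    The rate-of-increase formula can be sketched in Python, using the slide's figures and its count of 9 periods. `mean_rate_of_increase` is a hypothetical helper. The final line hints at where the formula comes from: compounding the starting value at the computed rate for 9 periods recovers the ending value.

```python
def mean_rate_of_increase(start_value, end_value, periods):
    """Constant per-period growth rate that turns start_value into end_value."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and number of periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


rate = mean_rate_of_increase(18_246, 22_840, 9)
print(f"{rate:.2%}")  # 2.53%

# Compounding the start value at this rate for 9 periods gives back the end value.
print(round(18_246 * (1 + rate) ** 9))  # 22840
```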

    EXAMPLE

    Do you prefer I use the arithmetic mean or the geometric mean to compute class score averages?

    THE MEAN OF GROUPED DATA

    The mean of a sample of data organized in a frequency distribution is computed by the following formula:

    $\bar{X} = \frac{\sum X f}{\sum f} = \frac{\sum X f}{n}$

    where $\bar{X}$ (X-bar) is the sample mean, $X$ is the class midpoint, $f$ is the class frequency, $\sum f$ is the sum of frequencies, and $n$ is the sample size.

    EXAMPLE

    A sample of twenty appliance stores in a large metropolitan area revealed the following number of VCRs sold last week. Compute the mean number sold. The formula and computation are shown below.

    $\bar{X} = \frac{\sum X f}{\sum f} = \frac{\sum X f}{n} = \frac{325}{20} = 16.25$ VCRs

    EXAMPLE (continued)

    The table also gives the necessary computations.

    SYMMETRIC DISTRIBUTION

    Zero skewness: Mode = Median = Mean

    RIGHT SKEWED DISTRIBUTION

    Positively skewed: Mean and median are to the RIGHT of the mode.

    LEFT SKEWED DISTRIBUTION

    Negatively skewed: Mean and median are to the LEFT of the mode.


    TODAY'S GOALS

    TO COMPARE (COMPUTE) VARIOUS MEASURES OF DISPERSION FOR GROUPED AND UNGROUPED DATA.

    TO EXPLAIN THE CHARACTERISTICS, USES, ADVANTAGES, AND DISADVANTAGES OF EACH MEASURE OF DISPERSION.

    TO EXPLAIN CHEBYSHEV'S THEOREM AND THE EMPIRICAL (NORMAL) RULE.

    TO COMPUTE THE COEFFICIENTS OF VARIATION AND SKEWNESS.

    TO DEMONSTRATE COMPUTING STATISTICS WITH EXCEL.

    MEASURES OF DISPERSION - UNGROUPED DATA

    Range: For ungrouped data, the range is the difference between the highest and lowest values in a set of data. To compute the range, use the following formula.

    RANGE = HIGHEST VALUE - LOWEST VALUE

    EXAMPLE: A sample of five recent accounting graduates revealed the following starting salaries (in $1000): 17 26 18 20 19. The range is thus $26,000 - $17,000 = $9,000.

    MEAN DEVIATION

    Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean. It is computed by the formula below:

    $MD = \frac{\sum |X - \bar{X}|}{n}$

    where $X$ is an individual value, $\bar{X}$ is the arithmetic mean, and $n$ is the sample size.
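    As a rough illustration of the mean deviation, the sketch below applies the definition to the starting-salary sample from the range example (17, 26, 18, 20, 19, in $1000). Reusing that sample here is an assumption made for illustration, and `mean_deviation` is a hypothetical helper name.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


salaries = [17, 26, 18, 20, 19]       # starting salaries in $1000
print(max(salaries) - min(salaries))  # range: 9
print(mean_deviation(salaries))       # 2.4
```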


    POPULATION VARIANCE

    Population Variance: The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean. It is computed from the formula below:

    $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

    where $\sigma^2$ (sigma squared) is the population variance, $X$ is an individual value, $\mu$ is the population mean, and $N$ is the population size.

    EXAMPLE

    The ages of all the patients in the isolation ward of Yellowstone Hospital are 38, 26, 13, 41, and 22 years. What is the population variance? The computations are given below.

    $\mu = (\sum X)/N = 140/5 = 28$

    $\sigma^2 = \sum (X - \mu)^2 / N = 534/5 = 106.8$
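    The hospital example follows directly from the definition. A minimal sketch, with `population_variance` as an illustrative helper name:

```python
def population_variance(values):
    """Mean of the squared deviations from the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


ages = [38, 26, 13, 41, 22]  # ages of all patients in the ward
print(population_variance(ages))  # 106.8
```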

    ALTERNATIVE FORMULA FOR THE POPULATION VARIANCE

    $\sigma^2 = \frac{\sum X^2}{N} - \left( \frac{\sum X}{N} \right)^2$

    Verify, using above formula, that the population variance is 106.8 for the previous example.

    Why would you use this formula?
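    The verification can be sketched as follows, assuming the alternative formula is the usual "mean of the squares minus the square of the mean". The helper name is hypothetical. One practical appeal is that it needs only the running totals of X and X squared, so the mean does not have to be computed first. With very large values and a small spread, though, the subtraction can lose floating-point precision.

```python
def population_variance_shortcut(values):
    """Population variance as mean of squares minus square of the mean."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2


ages = [38, 26, 13, 41, 22]
print(round(population_variance_shortcut(ages), 1))  # 106.8
```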

    THE POPULATION STANDARD DEVIATION

    Population Standard Deviation: The population standard deviation ($\sigma$) is the square root of the population variance.

    For the previous example, the population standard deviation is $\sigma$ = 10.3344 (square root of 106.8).

    Note: If you are given the population standard deviation, just square that number to get the population variance.

    SAMPLE VARIANCE

    Sample Variance: The formula for the sample variance for ungrouped data is:

    $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$ or $s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

    where $s^2$ is the sample variance. This sample variance is used to estimate the population variance.

    EXAMPLE

    A sample of five hourly wages for blue-collar jobs is: 17 26 18 20 19. Find the variance.

    $\bar{X} = 100/5 = 20$

    $s^2 = 50/(5 - 1) = 12.5$
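    The wage example can be cross-checked against Python's standard library. In this sketch `sample_variance` is a hypothetical hand-written helper that divides by n - 1, and `statistics.variance` is the library function that uses the same divisor.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


wages = [17, 26, 18, 20, 19]
print(sample_variance(wages))      # 12.5
print(statistics.variance(wages))  # 12.5
```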

    SAMPLE STANDARD DEVIATION

    Sample Standard Deviation: The sample standard deviation (s) is the square root of the sample variance.

    For the previous example, the sample standard deviation is s = 3.5355 (square root of 12.5).

    Note: If you are given the sample standard deviation, just square that number to get the sample variance.

    INTERPRETATION AND USES OF THE STANDARD DEVIATION

    Chebyshev's theorem: For any set of observations (sample or population), the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any constant greater than 1.

    Empirical Rule: For any symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within $\pm 1\sigma$ of the mean ($\mu$); approximately 95% within $\pm 2\sigma$ of the mean ($\mu$); and approximately 99.7% within $\pm 3\sigma$ of the mean ($\mu$).
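    To compare the two rules, the sketch below evaluates Chebyshev's lower bound next to the proportion an exactly normal distribution places within k standard deviations. Treating "bell-shaped" as exactly normal is an assumption made here, and both function names are illustrative.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum proportion within k standard deviations of the mean, for any data."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def normal_proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 4), round(normal_proportion_within(k), 4))
# 2 0.75 0.9545
# 3 0.8889 0.9973
```

    Chebyshev's bound is lower because it must hold for every distribution, not only bell-shaped ones.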

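    Bell-Shaped Curve showing the relationship between $\sigma$ and $\mu$.

    Horizontal axis: $\mu - 3\sigma$, $\mu - 2\sigma$, $\mu - 1\sigma$, $\mu$, $\mu + 1\sigma$, $\mu + 2\sigma$, $\mu + 3\sigma$

    Between:

    1. $\mu \pm 1\sigma$: 68.26%

    2. $\mu \pm 2\sigma$: 95.44%

    3. $\mu \pm 3\sigma$: 99.74%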


    RELATIVE DISPERSION

    Coefficient of Variation: The ratio of the standard deviation to the arithmetic mean, expressed as a percentage.

    $CV = \frac{s}{\bar{X}} (100\%)$

    For example, suppose the CVs for the yields of two different stocks are 10 and 25. The stock with the larger CV has more variation relative to the mean yield. That is, the yield for this stock is not as stable as the other.
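    As an illustration of the coefficient of variation, the sketch below applies the formula to the five hourly wages from the sample variance example (s = 3.5355, mean = 20). Choosing that sample is an assumption made for illustration, and `coefficient_of_variation` is a hypothetical helper name.

```python
import statistics


def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    mean = statistics.mean(sample)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(sample) / mean * 100


wages = [17, 26, 18, 20, 19]
print(round(coefficient_of_variation(wages), 2))  # 17.68
```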


    SYMMETRIC DISTRIBUTION

    Zero skewness: Mode = Median = Mean

    RIGHT SKEWED DISTRIBUTION

    Positively skewed: Mean and median are to the RIGHT of the mode.

    LEFT SKEWED DISTRIBUTION

    Negatively skewed: Mean and median are to the LEFT of the mode.

    EXCEL FUNCTIONS

    =AVERAGE(A1:A10)            Arithmetic Mean
    =MEDIAN(A1:A10)             Median Value
    =MODE(A1:A10)               Modal Value
    =GEOMEAN(A1:A10)            Geometric Mean
    =QUARTILE(A1:A10,Q)         Quartile Q Value
    =MAX(A1:A10)-MIN(A1:A10)    Range
    =PERCENTILE(A1:A10,P)       Percentile P Value

    MORE EXCEL FUNCTIONS

    =AVEDEV(A1:A10)                   Mean Absolute Deviation (MAD)
    =VAR(A1:A10)                      Sample Variance
    =STDEV(A1:A10)                    Sample Standard Deviation
    =VARP(A1:A10)                     Population Variance
    =STDEVP(A1:A10)                   Population Standard Deviation
    =STDEV(A1:A10)/AVERAGE(A1:A10)    Coefficient of Variation
    =SKEW(A1:A10)                     Coefficient of Skewness (not same as book)