Central Tendency (Stats)


  • 7/28/2019 Central Tendency (Stats)

    1/89

    MEASURES OF CENTRAL TENDENCY (Location)

    Average age of the top 50 powerful persons of 2010 in India decreased from 58 years to 54.

    Measures of Location or Central Tendency

    In a distribution, the observations cluster around a central value. This property of concentration of the observations around a central value is called central tendency.

    The central value around which the observations concentrate is called a measure of central tendency (measure of location, average).

    Ex: Mean marks scored by 1st PGDM is 65%.

    Objectives of Averaging:

    1. To get a single value that describes the characteristics of the entire data.

    2. To facilitate comparison.

    3. For computing various other statistical measures such as dispersion, skewness, kurtosis and various other basic characteristics of a mass of data.

    Requisites of a Good Average

    1. It should be simple to understand and easy to calculate.

    2. It should be based on all the items of the given data.

    3. It should be rigidly defined.

    4. It should be capable of further mathematical treatment.

    5. It should be affected as little as possible by fluctuations of sampling.

    6. It should not be affected by extreme observations (values).

    Various measures of Central Tendency

    1. Arithmetic Mean (A.M)

    2. Median (M)

    3. Mode (Z)

    4. Geometric Mean (G.M)

    5. Harmonic Mean (H.M)

    1. Arithmetic Mean (A.M)

    $$\text{A.M} = \frac{\text{Sum of observations}}{\text{Number of observations}}$$

    Calculation of A.M

    1. Ungrouped Data (Raw Data)

    2. Discrete Data

    3. Continuous Data

    Ungrouped Data (Raw Data):

    The weights of a sample of 30 students of a particular class are as follows.

    62 58 58 52 48 53 54 63 69 63

    57 56 46 48 53 56 57 59 58 53

    52 56 57 52 52 53 54 58 61 63

    Discrete Data

    Number of post graduates (x)    Frequency (f)
    0                               2
    1                               2
    2                               4
    3                               1
    4                               1

    Continuous Data

    Marks      No. of students
    20-30      5
    30-40      15
    40-50      25

    Exclusive method (overlapping)

    In this method, the upper limit of one class interval is the lower limit of the next class. This method maintains continuity of the data.

    A student whose mark is between 20 and 29.9 will be included in the 20-30 class.

    Marks      No. of students
    20-30      5
    30-40      15
    40-50      25

    Inclusive method (non-overlapping)

    A student whose mark is 29 is included in the 20-29 class interval, and a student whose mark is 39 is included in the 30-39 class interval.

    Marks      No. of students
    20-29      5
    30-39      15
    40-49      25

    Ungrouped Data (Raw Data)

    $$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{\sum X}{n}$$

    X = observations
    n = number of observations

    The following data gives the value of equity holdings of 20 of India's billionaires.

    Name                             Equity Holdings (Millions of Rs.)
    Kiran Mazumdar-Shaw              2717
    The Nilekani family              2796
    The Punj family                  3098
    Karsanbhai K. Patel & family     3144
    Shashi Ruia                      3527
    K.K. Birla                       3534
    B. Rama Linga Raju               3862
    Habil F. Khorakiwala             4187
    The Murthy family                4310
    Keshub Mahindra                  4506
    The Kirloskar family             4745
    M.V. Subbiah family              4784
    Ajay G. Piramal                  4923
    Uday Kotak                       5034
    S.P. Hinduja                     5071
    Subhash Chandra                  5424
    Adi Godrej                       5561
    Vijay Mallya                     6505
    V.N. Dhoot                       6707
    Naresh Goyal                     6874

    $$\bar{X} = \frac{\sum X}{n} = \frac{2717 + 2796 + \dots + 6874}{20} = \text{Rs. } 4565.45 \text{ million}$$
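The raw-data calculation can be sketched in a few lines of Python. This is only an illustration: the list `holdings` is copied from the equity holdings table, and the function name `arithmetic_mean` is a hypothetical helper, not part of the slides.

```python
# Equity holdings (millions of Rs.) of the 20 billionaires, copied from the table.
holdings = [
    2717, 2796, 3098, 3144, 3527, 3534, 3862, 4187, 4310, 4506,
    4745, 4784, 4923, 5034, 5071, 5424, 5561, 6505, 6707, 6874,
]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_holding = arithmetic_mean(holdings)
print(f"Mean equity holding = Rs. {mean_holding:.2f} million")
```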

    Discrete Data

    $$\bar{X} = \frac{\sum f x}{\sum f}$$

    x = observations
    f = frequency

    Problem on Discrete Data

    The following is the frequency distribution of the number of telephone calls received in 245 successive one-minute intervals at an exchange:

    No. of calls    0    1    2    3    4    5    6    7
    Frequency       14   21   25   43   51   40   39   12

    Obtain the mean number of calls per minute.

    No. of calls (x)    Frequency (f)    f x
    0                   14               0
    1                   21               21
    2                   25               50
    3                   43               129
    4                   51               204
    5                   40               200
    6                   39               234
    7                   12               84
    Total               Σf = 245         Σfx = 922

    $$\bar{X} = \frac{\sum f x}{\sum f} = \frac{922}{245} = 3.763$$
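As a rough check of the discrete-data calculation, the following sketch recomputes the mean number of calls from the frequency table. The names `calls`, `frequency` and `frequency_mean` are illustrative choices, not taken from the slides.

```python
# Frequency distribution of telephone calls per one-minute interval.
calls = [0, 1, 2, 3, 4, 5, 6, 7]
frequency = [14, 21, 25, 43, 51, 40, 39, 12]


def frequency_mean(x_values, frequencies):
    """Mean of a discrete frequency distribution: sum(f * x) / sum(f)."""
    total_f = sum(frequencies)
    total_fx = sum(f * x for x, f in zip(x_values, frequencies))
    return total_fx / total_f


mean_calls = frequency_mean(calls, frequency)
print(f"Mean calls per minute = {mean_calls:.3f}")
```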

    Continuous Series

    The calculation is illustrated with the data relating to the equity holdings of the group of 20 billionaires given earlier.

    Class Interval    Frequency
    2000-3000         2
    3000-4000         5
    4000-5000         6
    5000-6000         4
    6000-7000         3

    Class Interval    Frequency (f)    Mid value (x)    f x
    2000-3000         2                2500             5000
    3000-4000         5                3500             17500
    4000-5000         6                4500             27000
    5000-6000         4                5500             22000
    6000-7000         3                6500             19500
    Total             Σf = 20                           Σfx = 91000

    $$\bar{X} = \frac{\sum f x}{\sum f} = \frac{91000}{20} = 4550$$
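For a continuous series the same formula is applied to the class mid values. The sketch below assumes each class is stored as a hypothetical `(lower, upper, frequency)` tuple; that representation is chosen here only for illustration.

```python
# Grouped equity holdings: (lower limit, upper limit, frequency).
holding_classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_mean(classes):
    """Mean of a continuous series, using the mid value of each class as x."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f


grouped_holding_mean = grouped_mean(holding_classes)
print(f"Grouped mean = {grouped_holding_mean:.0f}")
```

The grouped mean differs slightly from the mean of the raw data because every observation in a class is treated as if it sat at the class mid value.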

    Properties of Arithmetic Mean

    1. The sum of the deviations of all the values x from their arithmetic mean is always zero.

    2. The product of the arithmetic mean and the number of items gives the total of all items.

    3. If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of two samples of sizes $n_1$ and $n_2$ respectively, then the arithmetic mean of the distribution combining the two can be calculated as

    $$\bar{X}_{12} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$

    4. The sum of squared deviations of the items from the mean is a minimum, when compared to the sum of squared deviations of the items from any other value.
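The four properties can be checked numerically. The sketch below uses small made-up samples (`sample_one`, `sample_two`); the data and helper names are assumptions for illustration only.

```python
# Two hypothetical samples used only to illustrate the properties.
sample_one = [2, 4, 6]
sample_two = [8, 10]


def mean(values):
    return sum(values) / len(values)


def sum_squared_deviations(values, centre):
    """Sum of squared deviations of the values from an arbitrary centre."""
    return sum((x - centre) ** 2 for x in values)


m1, m2 = mean(sample_one), mean(sample_two)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - m1 for x in sample_one)

# Property 2: mean times number of items gives the total.
total_from_mean = m1 * len(sample_one)

# Property 3: combined mean from the two sample means and sizes.
n1, n2 = len(sample_one), len(sample_two)
combined_mean = (n1 * m1 + n2 * m2) / (n1 + n2)

# Property 4: squared deviations are smallest when taken about the mean.
about_mean = sum_squared_deviations(sample_one, m1)
about_other = sum_squared_deviations(sample_one, m1 + 1)

print(deviation_sum, total_from_mean, combined_mean, about_mean, about_other)
```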

    Weighted Mean

    The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:

    $$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$

    EXAMPLE: Weighted Mean

    The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean hourly rate paid to the 26 employees?
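One way to work the example is to treat the number of employees at each rate as the weight. The sketch below does this; the names `rates`, `employees` and `weighted_mean` are illustrative.

```python
# Hourly rates (dollars) and the number of employees paid at each rate.
rates = [16.50, 19.00, 25.00]
employees = [14, 10, 2]


def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


mean_rate = weighted_mean(rates, employees)
print(f"Mean hourly rate = ${mean_rate:.2f}")
```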

    Merits:

    1. Mean is based on all the items of the given data.

    2. Mean is rigidly defined by a mathematical formula.

    3. Mean is capable of further algebraic treatment.

    4. Mean has good sampling stability.

    Demerits:

    1. Mean can be unduly affected by extreme values.

    2. Mean cannot be calculated for open-end classes, since midpoints cannot be found for such classes.

    3. Mean cannot be found graphically, unlike median and mode.

    Median (M)

    The median is that value of the variable which divides the group into two equal parts, one part comprising all the values greater than the median and the other all the values less than the median.


    Calculation of Median

    1. Ungrouped Data ( Raw Data)

    2. Discrete Data

    3. Continuous Data

    Raw Data

    Steps:

    1. Arrange the data in ascending order.

    2. Find the value of (n+1)/2.

    3. Apply the formula

    $$M = \text{size of the } \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item}$$

    The median is the middle value of data sorted in order of magnitude.

    Sales    Sorted Sales
    9        6
    6        9
    12       10
    10       12
    13       13
    15       14
    16       14
    14       15
    14       16
    16       16
    17       16
    16       17
    24       17
    21       18
    22       18
    18       19
    19       20
    18       21
    20       22
    17       24

    (20+1)/2 = 10.5, so the median lies midway between the 10th and 11th sorted values, both of which are 16.

    Median = 16
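The raw-data steps can be sketched as follows, using the 20 sales figures from the table. Averaging the two neighbouring items when (n+1)/2 is fractional is the usual convention and is assumed here; `median_raw` is a hypothetical helper name.

```python
# Sales figures in their original (unsorted) order.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def median_raw(values):
    """Median as the ((n + 1) / 2)th item of the sorted data."""
    ordered = sorted(values)                 # step 1: ascending order
    position = (len(ordered) + 1) / 2        # step 2: 1-based position
    lower = ordered[int(position) - 1]
    if position.is_integer():                # odd n: a single middle item
        return lower
    upper = ordered[int(position)]           # even n: average the two middle items
    return (lower + upper) / 2


sales_median = median_raw(sales)
print(f"Median sales = {sales_median}")
```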

    Discrete Data

    Steps:

    1. Find the cumulative frequencies (C.F).

    2. Find the value of N/2, where N = total frequency.

    3. Apply the formula M = size of the (N/2)th item. In other words, locate the cumulative frequency which is just more than N/2. (Note: this is not the median.)

    4. Read the corresponding x value. This gives the value of the median.

    Continuous data

    Steps:

    1. Find the cumulative frequencies (C.F).

    2. Find the value of N/2.

    3. Locate the value which is just more than N/2 in the cumulative frequency column.

    4. Read the corresponding class. This is the median class, i.e. the class where the median lies.

    5. Apply the formula,

    $$M = l + \frac{\frac{N}{2} - \text{c.f}}{f} \times c$$

    M = median
    l = lower limit of the median class
    N = total frequency
    c.f = cumulative frequency of the pre-median class
    f = frequency of the median class
    c = width of the median class
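The five steps for continuous data can be sketched as below, applied to the grouped equity holdings from the arithmetic mean section. The `(lower, upper, frequency)` tuples and the name `grouped_median` are assumptions made for this illustration; the median class is taken as the first class whose cumulative frequency reaches N/2.

```python
from itertools import accumulate

# Grouped equity holdings: (lower limit, upper limit, frequency).
median_classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_median(classes):
    """Median of a continuous series: l + ((N/2 - c.f) / f) * c."""
    cumulative = list(accumulate(f for _, _, f in classes))   # step 1
    half = cumulative[-1] / 2                                 # step 2
    for i, (lower, upper, f) in enumerate(classes):
        if cumulative[i] >= half:                             # steps 3 and 4
            cf_before = cumulative[i - 1] if i > 0 else 0
            return lower + (half - cf_before) / f * (upper - lower)  # step 5
    raise ValueError("no classes supplied")


holdings_median = grouped_median(median_classes)
print(f"Grouped median = {holdings_median:.0f}")
```

Here N/2 = 10, the median class is 4000-5000 (cumulative frequency 13), and the formula gives 4000 + (10 - 7)/6 × 1000 = 4500.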

    Merits:

    1. It is easy to understand and easy to calculate for a non-mathematical person.

    2. It is not affected by extreme observations.

    3. Median can be calculated when dealing with a distribution with open-end classes.

    4. Median can be represented graphically.

    5. Median is the only average to be used with qualitative data.

    Demerits:

    1. In the case of an even number of observations in ungrouped data, the median cannot be determined graphically.

    2. Median, being a positional average, is not based on each and every item of the distribution.

    3. Median is not suitable for further mathematical treatment.

    4. Median does not have sampling stability.

    Mode (Z)

    Mode is defined as the value which is repeated the maximum number of times in a data set.

    [Figure: dot plot over the values 6, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22 and 24, with the mode marked]
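Finding the mode by inspection amounts to counting how often each value occurs. The sketch below does this for the sales figures used in the median example (it is an assumption that the dot plot shows the same data); `modes` is a hypothetical helper that returns every value tied for the highest count.

```python
from collections import Counter

# Sales figures from the median example.
mode_sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
              17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def modes(values):
    """Return all values that occur the maximum number of times, sorted."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


sales_modes = modes(mode_sales)
print(f"Mode(s) = {sales_modes}")
```

Returning a list rather than a single number makes bimodal or multimodal data visible instead of silently picking one value.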

    Calculation of Mode

    1. Ungrouped Data (Raw Data)

    2. Discrete Data

    3. Continuous Data

    Ungrouped data

    Here, mode is calculated by mere inspection.

    Discrete Data

    Here, mode is calculated by mere inspection.

    Continuous data

    Steps:

    1. Locate the maximum frequency.

    2. Read the corresponding class. This is the modal class, i.e., the class where the mode lies.

    3. Apply the formula,

    $$z = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$

    z = mode
    l = lower limit of the modal class
    $\Delta_1 = f_1 - f_0$
    $\Delta_2 = f_1 - f_2$
    $f_1$ = frequency of the modal class
    $f_0$ = frequency of the pre-modal class
    $f_2$ = frequency of the post-modal class
    c = width of the class interval
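The modal-class formula can be sketched as follows, again using the grouped equity holdings as sample input. The tuple layout, the name `grouped_mode`, and the choice to treat a missing neighbouring class as having frequency 0 are assumptions for illustration.

```python
# Grouped equity holdings: (lower limit, upper limit, frequency).
mode_classes = [
    (2000, 3000, 2),
    (3000, 4000, 5),
    (4000, 5000, 6),
    (5000, 6000, 4),
    (6000, 7000, 3),
]


def grouped_mode(classes):
    """Mode of a continuous series: l + d1 / (d1 + d2) * c."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                    # modal class (first maximum)
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0              # pre-modal class frequency
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0 # post-modal class frequency
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("formula undefined: neighbouring frequencies equal the modal one")
    return lower + d1 / (d1 + d2) * (upper - lower)


holdings_mode = grouped_mode(mode_classes)
print(f"Grouped mode = {holdings_mode:.2f}")
```

For this data the modal class is 4000-5000 with f1 = 6, f0 = 5 and f2 = 4, so the mode is 4000 + 1/(1 + 2) × 1000, roughly 4333.33.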

    Merits:

    1. Its value can be easily ascertained without much calculation.

    2. It is an average which is commonly used in day-to-day life.

    3. It is not affected by extreme values.

    4. The data need not be arranged.

    5. Mode can be graphically determined.

    6. Mode can be calculated for data with open-end classes.

    Demerits:

    1. Mode is not based on each and every item of the data.

    2. Mode is not capable of further algebraic treatment.

    3. Mode is not rigidly defined.

    4. Modal value can be misleading.

    5. Mode is ill defined for bimodal or multimodal distributions.

    6. Mode doesn't have sampling stability.

    Relation between mean, median and mode (empirical, for moderately skewed distributions)

    Mode = Mean - 3 (Mean - Median)

    Mode = 3 Median - 2 Mean

    Median = Mode + (2/3) (Mean - Mode)


    Symmetrical Distribution


    NEGATIVELY OR LEFT SKEWED

    Mean < Median < Mode


    POSITIVE OR RIGHT SKEWED

    Mean > Median > Mode

    Geometric Mean

    It is defined as the nth root of the product of n positive values or items.

    Calculation of G.M

    Ungrouped data (Raw Data)

    $$\text{G.M} = \text{antilog}\left(\frac{\sum \log X}{n}\right)$$

    Calculation of G.M

    Grouped data (Discrete & Continuous Data)

    $$\text{G.M} = \text{antilog}\left(\frac{\sum f \log X}{N}\right)$$

    N = total frequency

    Example:

    Suppose you receive a 5 percent increase in salary this year and a 15 percent increase next year. The average annual percent increase is 9.886, not 10.0. Why is this so? We begin by calculating the geometric mean.
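A minimal sketch of the calculation: each percent increase is first turned into a growth factor (1.05 and 1.15), the geometric mean of the factors is taken, and 1 is subtracted at the end. The function names are illustrative, and both the nth-root form and the log/antilog form are shown so they can be compared.

```python
import math

# Growth factors for a 5 percent and a 15 percent salary increase.
growth_factors = [1.05, 1.15]


def geometric_mean(values):
    """nth root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_logs(values):
    """Same quantity via logarithms: antilog(sum(log X) / n)."""
    return 10 ** (sum(math.log10(x) for x in values) / len(values))


gm_factor = geometric_mean(growth_factors)
average_increase = (gm_factor - 1) * 100
print(f"Average annual increase = {average_increase:.3f} percent")
```

The arithmetic mean of 5 and 15 overstates the growth because the second increase is applied to a salary that has already grown by the first.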

    The Geometric Mean

    1. Useful in finding the average change of percentages, ratios, indexes, or growth rates over time.

    2. It has a wide application in business and economics because we are often interested in finding the percentage changes in sales, salaries, or economic figures, such as the GDP, which compound or build on each other.

    3. The geometric mean will always be less than or equal to the arithmetic mean.


    Combined Geometric Mean

    G = Antilog [(n1 log G1 + n2 log G2)/ (n1 + n2)]

    Geometric Mean

    Merits:

    1. Makes use of full data.

    2. Extreme large values have lesser impact.

    3. Useful for data relating to ratios and percentages.

    4. Useful for rate of change/growth.

    Demerits (G.M):

    1. Cannot be calculated if any observation has the value zero or is negative.

    2. Difficult to calculate and interpret.

    AM, GM, and HM satisfy these inequalities (for positive values):

    $$\text{AM} \ge \text{GM} \ge \text{HM}$$

    Equality holds only when all the elements of the given sample are equal.

    Harmonic Mean

    It is defined as the reciprocal of the mean of the reciprocals of the values.

    Calculation of H.M:

    Ungrouped Data:

    $$\text{H.M} = \frac{n}{\sum \frac{1}{x}}$$

    Grouped Data:

    $$\text{H.M} = \frac{N}{\sum \frac{f}{x}}$$
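The definition can be sketched directly, and the same sample can be used to compare the three means. The two speeds below are made-up values for illustration (for example, equal distances covered at 40 and 60 km/h), and the function names are hypothetical.

```python
import math

# Hypothetical speeds over two equal distances.
speeds = [40, 60]


def harmonic_mean(values):
    """Reciprocal of the mean of the reciprocals: n / sum(1 / x)."""
    return len(values) / sum(1 / x for x in values)


def geometric_mean_of(values):
    return math.prod(values) ** (1 / len(values))


def arithmetic_mean_of(values):
    return sum(values) / len(values)


hm = harmonic_mean(speeds)
gm = geometric_mean_of(speeds)
am = arithmetic_mean_of(speeds)
print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
```

For these values the ordering AM ≥ GM ≥ HM can be read off directly from the printed output.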

    H.M Merits:

    1. It is based on all the items of the given data.

    2. It gives the best results where time and rates are under study.

    3. It is rigidly defined.

    4. It can be calculated even if the series contains negative values.

    H.M Demerits:

    1. It is difficult for a layman to understand and interpret.

    2. It has limited practical application.

    3. It cannot be calculated if any of the values is zero.

    Sales      Sales Executive A    Sales Executive B    Sales Executive C
    March      14                   10                   6
    April      12                   10                   16
    May        6                    10                   7
    June       8                    10                   15
    July       13                   10                   10
    Aug        7                    10                   6
    Total      60                   60                   60
    Average    10                   10                   10


    MEASURES OF DISPERSION

    Why Study Dispersion?

    A measure of location, such as the mean or the median, only describes the center of the data. It is valuable from that standpoint, but it does not tell us anything about the spread of the data.

    For example, if your nature guide told you that the river ahead averaged 3 feet in depth, would you want to wade across on foot without additional information? Probably not. You would want to know something about the variation in the depth.

    A second reason for studying the dispersion in a set of data is to compare the spread in two or more distributions.

    The scatteredness of values from any measure of central tendency is called variation or dispersion.

    Characteristics of an ideal measure of dispersion

    1. It should be rigidly defined.

    2. It should be based on all the observations.

    3. It should be amenable to further mathematical treatment.

    4. It should not be affected by extreme observations.

    Measures of Dispersion:

    1. Range

    2. Quartile Deviation

    3. Mean Deviation

    4. Standard Deviation

    Range:

    Range is simply the difference between the highest and lowest values in the distribution of values.

    Example:

    Weekly income of 10 people:

    180 220 280 320 280 180 350 280 330 220

    Range is maximum income minus minimum income: 350 - 180 = 170.
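A short sketch of the range calculation for the weekly income data; `data_range` is an illustrative helper name.

```python
# Weekly income of 10 people, as listed in the example.
weekly_income = [180, 220, 280, 320, 280, 180, 350, 280, 330, 220]


def data_range(values):
    """Difference between the highest and the lowest value."""
    return max(values) - min(values)


income_range = data_range(weekly_income)
print(f"max = {max(weekly_income)}, min = {min(weekly_income)}, range = {income_range}")
```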

    Group A: 30, 40, 40, 40, 40, 50, 50

    Group B: 30, 30, 30, 40, 50, 50, 50

    Group C: 30, 35, 40, 40, 40, 45, 50

    Range: 20 (for each group)

    Let us take two sets of observations. Set A contains the marks of five students in Mathematics out of 25 marks, and Set B contains the marks of the same students in English out of 100 marks.

    Set A: 10, 15, 18, 20, 20

    Set B: 30, 35, 40, 45, 50

    The values of range and coefficient of range are calculated as:

             Range             Coefficient of Range
    Set A    20 - 10 = 10      (20 - 10)/(20 + 10) = 0.33
    Set B    50 - 30 = 20      (50 - 30)/(50 + 30) = 0.25

    Coefficient of Range:

    It is a relative measure of dispersion and is based on the value of the range. It is also called the range coefficient of dispersion. It is defined as

    $$\text{Coefficient of Range} = \frac{\text{Max} - \text{Min}}{\text{Max} + \text{Min}}$$
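The coefficient can be computed for the two sets of marks as a quick illustration. The variable and function names are assumptions; the data are Set A and Set B from the example.

```python
# Marks of the same five students in two subjects.
set_a = [10, 15, 18, 20, 20]   # Mathematics, out of 25
set_b = [30, 35, 40, 45, 50]   # English, out of 100


def coefficient_of_range(values):
    """Relative measure of dispersion: (max - min) / (max + min)."""
    highest, lowest = max(values), min(values)
    return (highest - lowest) / (highest + lowest)


coeff_a = coefficient_of_range(set_a)
coeff_b = coefficient_of_range(set_b)
print(f"Set A: {coeff_a:.2f}, Set B: {coeff_b:.2f}")
```

Set B has the larger range in absolute terms, yet its coefficient is smaller, which is why a relative measure is useful when the scales differ.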

    Merits:

    1. It is the simplest method of measuring variation.

    2. It can be calculated quickly since only two values are taken into consideration.

    Demerits:

    1. It is not based on each and every item of the given data.

    2. It can be unduly affected by extreme values, since only those values are considered.

    3. It cannot be calculated for data with open-end classes.

    4. Range does not have sampling stability.

    Semi-Interquartile Range

    (Quartile Deviation)

    Interquartile range (IQR) is another range measure, but this time it looks at the data in terms of quarters or percentiles. The range of the data is divided into four equal quarters (25% each).

    [Figure: number line from Min to Max marking Q1 (25th percentile), Q2 (median, 50th percentile) and Q3 (75th percentile), with the IQR and the Range indicated]

    Calculation of Q.D

    1. Ungrouped data (Raw data).

    2. Discrete Data.

    3. Continuous Data.

    Raw Data:

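    $$Q_1 = \text{size of the } \left(\frac{n+1}{4}\right)^{\text{th}} \text{ item}$$

    $$Q_3 = \text{size of the } \left(\frac{3(n+1)}{4}\right)^{\text{th}} \text{ item}$$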
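The positional formulas for raw data can be sketched as below, using the sorted sales figures from the median example as sample input. Two things are assumptions here: fractional positions are resolved by linear interpolation between neighbouring items, and the quartile deviation is taken as half the interquartile range, (Q3 - Q1)/2, as the name semi-interquartile range suggests. The helper names are hypothetical.

```python
# Sales figures from the median example.
quartile_sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
                  17, 16, 24, 21, 22, 18, 19, 18, 20, 17]


def item_at(ordered, position):
    """Value at a 1-based, possibly fractional position in sorted data."""
    index = int(position)
    fraction = position - index
    lower = ordered[max(index, 1) - 1]
    upper = ordered[min(index, len(ordered) - 1)]
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (Q1, Q3) using the (n+1)/4 and 3(n+1)/4 positions."""
    ordered = sorted(values)
    n = len(ordered)
    return item_at(ordered, (n + 1) / 4), item_at(ordered, 3 * (n + 1) / 4)


q1, q3 = quartiles(quartile_sales)
iqr = q3 - q1
quartile_deviation = iqr / 2
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, Q.D = {quartile_deviation}")
```

Other interpolation conventions exist and can give slightly different quartiles for the same data.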


    Discrete Data:

    Merits:

    1. It is simple to compute and easy to understand.

    2. It can be computed for data with open-end classes.

    3. It is not affected by extreme values.

    Demerits:

    1. It doesn't take all the values into consideration. It omits 50% of the items, i.e. the 25% of items below Q1 and the 25% of items above Q3.

    2. It is not much capable of further algebraic treatment.

    3. It doesn't have sampling stability.

    3. Mean Deviation

    It is defined as the mean of the absolute deviations of the various items from either the mean, the median or the mode.
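A minimal sketch of the definition, using the monthly sales of Sales Executive A from the earlier table as sample data. The function `mean_deviation` and its `centre` parameter are illustrative; the centre can be the mean, the median or the mode.

```python
import statistics

# Monthly sales of Sales Executive A (March to August).
executive_a = [14, 12, 6, 8, 13, 7]


def mean_deviation(values, centre):
    """Mean of the absolute deviations of the values from a chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)


md_about_mean = mean_deviation(executive_a, statistics.mean(executive_a))
md_about_median = mean_deviation(executive_a, statistics.median(executive_a))
print(f"M.D about mean = {md_about_mean}, about median = {md_about_median}")
```

For Sales Executive B, whose sales are 10 in every month, the same function returns 0, reflecting the absence of any spread.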

    Calculation of M.D

    1. Raw Data

    2. Discrete Data

    3. Continuous Data

    Merits of M.D:

    1. It is based on every item of the series.

    2. It is rigidly defined.

    3. It is not much affected by extreme values.

    Demerits of M.D :

    1. It ignores algebraic signs while taking deviations of the items.

    2. It is not much used for further algebraic treatment.

    3. It cannot be computed for data with open-end classes.

    4. Calculation of M.D becomes tedious when the values of mean, median or mode are in decimals.

    Variance

    Where the mean is a measure of the centre of a group of numbers, the variance is the measure of the spread.

    It involves measuring the distance between each of the values and the mean.

    To calculate the variance:

    1. Calculate the mean.

    2. For each value in the distribution, subtract the mean and then square the result (the squared difference).

    3. Calculate the average of those squared differences (the formula below divides by the total number of scores minus 1).

    Variance = Sum of (observed value - mean score)^2 / (Total number of scores - 1)

    $$s^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}$$

    The larger the variance value, the further the observed values of the data set are dispersed from the mean.

    A variance value of zero means all observed values are the same as the mean.

    4. Standard Deviation (S.D)

    The square root of variance is known as standard deviation.
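The variance steps and the standard deviation can be sketched as follows, using the monthly sales of Sales Executive A from the earlier table as sample data. The divisor N - 1 follows the formula given for the variance; the function names are illustrative, and Python's `statistics` module is used only as a cross-check.

```python
import math
import statistics

# Monthly sales of Sales Executive A (March to August).
sales_a = [14, 12, 6, 8, 13, 7]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    centre = sum(values) / len(values)                  # step 1: the mean
    squared = [(x - centre) ** 2 for x in values]       # step 2: squared differences
    return sum(squared) / (len(values) - 1)             # step 3: divide by N - 1


def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(sample_variance(values))


variance_a = sample_variance(sales_a)
sd_a = standard_deviation(sales_a)
print(f"Variance = {variance_a:.2f}, S.D = {sd_a:.3f}")
```

Applied to Sales Executive B (10 in every month) the variance is 0, so two executives with the same average of 10 are clearly separated by their spread.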