65
Chapter 3 Describing Data Using Numerical Measures` Business Statistics: A Decision-Making Approach 8 th Edition Copyright ©2011 Pearson Education, Inc. publishing as Prentice Hall 3-1

# 4 Pemusatan & Penyebaran Data (TM)

Embed Size (px)

DESCRIPTION

penyebaran data

Citation preview

  • Chapter 3

    Describing Data Using

    Numerical Measures`

    Business Statistics: A Decision-Making Approach

    8th Edition

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-1

  • Chapter Goals

    After completing this chapter, you should be able to:

    Compute and interpret the mean, median, and mode for a set

    of data

    Compute the range, variance, and standard deviation and

    know what these values mean

    Construct and interpret a box and whisker graph

    Compute and explain the coefficient of variation and z scores

    Compute and explain the covariance and coefficient of

    correlation

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-2

  • Descriptive Procedures

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-3

    Center and Location

    Mean

    Median

    Mode

    Other Measures of Location

    Weighted Mean

    Describing Data Numerically

    Variation

    Variance

    Standard Deviation

    Coefficient of Variation

    RangePercentiles

    Interquartile RangeQuartiles

  • Measures of Center and Location

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-4

    Center and Location

    Mean Median Mode Weighted Mean

    N

    x

    n

    x

    x

    N

    i

    i

    n

    i

    i

    1

    1

    i

    ii

    W

    i

    iiW

    w

    xw

    w

    xwX

    Balance point

    Midpoint of ranked values

    Most frequently observed value

  • Mean (Arithmetic Average)

    The most common measure of central tendency

    Mean = sum of values divided by the number of values

    Affected by extreme values (outliers)

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-5

    0 1 2 3 4 5 6 7 8 9 10

    Mean = 3

    0 1 2 3 4 5 6 7 8 9 10

    Mean = 4

    35

    15

    5

    54321

    4

    5

    20

    5

    104321

  • Mean (Arithmetic Average)

    The Mean is the arithmetic average of data values

    Population mean

    Sample mean

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-6

    n = Sample Size

    N = Population Size

    n

    xxx

    n

    x

    x n

    n

    i

    i

    211

    N

    xxx

    N

    xN

    N

    i

    i

    211

    (continued)

  • Median

    In an ordered array (lowest to highest), the median is the

    middle number, i.e., the number that splits the distribution

    in half numerically

    50% of the data is above the median, 50% is below

    Represented as Md

    The median is not affected by extreme values

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-7

    0 1 2 3 4 5 6 7 8 9 10

    Median = 3

    0 1 2 3 4 5 6 7 8 9 10

    Median = 3

  • Median

    To find the median, sort the n data values from low to high

    (sorted data is called a data array)

    Find the value in the i = (1/2)n position

    The ith position is the Median Index Point

    If i is not an integer, round up to next highest integer

    If i is an integer, the median is the average of the values in

    position i and i + 1

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-8

    (continued)

  • Median Example

    Note that n = 13

    Find the i = (1/2)n position:

    i = (1/2)(13) = 6.5

    Since 6.5 is not an integer, round up to 7

    The median is the value in the 7th position:

    Md = 12

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-9

    Data array:4, 4, 5, 5, 9, 11, 12, 14, 16, 19, 22, 23, 24

  • Finding the Median

    The location of the median:

    If the number of values is odd, the median is the middle

    number

    If the number of values is even, the median is the average of

    the two middle numbers

    Note that is not the value of the median, only the

    position of the median in the ranked data

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-10

    dataorderedtheinpositionn

    positionMedian2

    1

    2

    1n

  • Median Example

    n = 15 (odd)

    Median is the midle number = 85

    n = 10 (even)

    Median is (n + 1)/2 = 11/2 = 5,5

    Median value is (35 + 45)/ 2 = 40

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-11

    Data array:60 68 75 77 80 80 80 85 88 90 95 95 95 95 99

    Data array:23 25 25 34 35 45 46 47 52 54

  • Shape of a Distribution

    Describes how data is distributed

    Symmetric or skewed

    The greater the difference between the mean and the median,

    the more skewed the distribution

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-12

    Mean = MedianMean < Median Median < Mean

    Right-SkewedLeft-Skewed Symmetric

    (Longer tail extends to left) (Longer tail extends to right)

  • Mode

    A measure of location

    The value that occurs most often

    Not affected by extreme values

    Used for either numerical or categorical data

    There may be no mode

    There may be several modes (2 modes = bimodal)

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-13

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

    Mode = 5

    0 1 2 3 4 5 6

    No Mode

  • Weighted Mean

    Used when values are grouped by frequency or relative

    importance

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-14

    Days to Complete

    Frequency

    5 4

    6 12

    7 8

    8 2

    Example: Sample of 26 Repair Projects

    Weighted Mean Days to Complete:

    days 6.31 26

    164

    28124

    8)(27)(86)(125)(4

    w

    xwX

    i

    iiW

  • Review Example

    Five houses on a hill by the beach

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-15

    $2,000 K

    $500 K

    $300 K

    $100 K

    $100 K

    House Prices:

    $2,000,000500,000300,000100,000100,000

  • Summary Statistics

    Mean: ($3,000,000/5)

    m = $600,000

    Median: middle value of ranked data

    Md = $300,000

    Mode: most frequent value

    Mode = $100,000

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-16

    House Prices:

    $2,000,000500,000300,000100,000100,000

    Sum 3,000,000

  • Which measure of location is the best?

    Mean is generally used, unless extreme values (outliers) exist

    Then Median is often used, since the median is not sensitive to

    extreme values.

    Example: Median home prices may be reported for a region

    less sensitive to outliers

    Mode is good for determining most likely to occur

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-17

  • Other Location Measures

    The pth percentile in a data array:

    p% are less than or equal to

    this value

    (100 p)% are greater than or equal to this value

    (where 0 p 100)

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-18

    Percentiles

    1st quartile = 25th percentile

    2nd quartile = 50th percentileAlso the median

    3rd quartile = 75th percentile

    Quartiles

  • Percentiles

    The pth percentile in an ordered array of n values is

    the value in ith position, where

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-19

    Example: Find the 60th percentile in an ordered array of

    19 values.

    (n)100

    pi

    11.4(19)100

    60(n)

    100

    pi

    If i is not an integer, round up to the next higher integer value

    So use value in the i = 12th position

    Percentile Location

    Index

  • Quartiles

    Quartiles split the ranked data into 4 equal groups:

    Note that

    Q1 is the value for which 25% of the observations are

    smaller & 75% are larger.

    Q2 quartile (the 50th percentile) is the median

    Only 25% of the observations are greater than the Q3 IQR (interquartile range) = Q3 Q1

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-20

    25%

    Q1 Q2 Q3

    25% 25% 25%

  • Quartile Formulas

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-21

    Find a quartile by determining the value in the appropriate position in the ranked data, where

    First quartile position: Q1 = (n+1)/4

    Second quartile position: Q2 = (n+1)/2 (the median position)

    Third quartile position: Q3 = 3(n+1)/4

    where n is the number of observed values

  • Quartiles

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-22

    Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

    Example: Find the first quartile

    (n = 9)

    Q1 = 25th percentile, so find i : i = (9) = 2.25

    so round up and use the value in the 3rd position: Q1 = 13

    25100

    Round up to 3 since not an

    integer

    Interpretation: 25% of the data is below 13

  • Quartiles

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-23

    (n = 9)

    Q1 is in the (9+1)/4 = 2.5 position of the ranked data,

    so Q1 = 12.5

    Q2 is in the (9+1)/2 = 5th position of the ranked data,

    so Q2 = median = 16

    Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,

    so Q3 = 19.5

    Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22

    Example:

    (continued)

  • Box and Whisker Plot

    A graphical display of data using a central box and extended

    whiskers

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-24

    Example:

    25% 25% 25% 25%

    Outliers Lower 1st Median 3rd UpperLimit Quartile Quartile Limit

    **

  • Constructing the Box and Whisker Plot

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-25

    Outliers Lower 1st Median 3rd UpperLimit Quartile Quartile Limit

    **

    The lower limit is Q1 1.5 (Q3 Q1)

    The upper limit is Q3+ 1.5 (Q3 Q1)

    The center box extends from Q1 to Q3 The line within the box is the median

    The whiskers extend to the smallest and largest values within the calculated limits

    Outliers are plotted outside the calculated limits

  • Shape of Box and Whisker Plots

    The Box and central line are centered between the endpoints if data is symmetric around the median

    A Box and Whisker plot can be shown in either vertical or horizontal format

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-26

  • Distribution Shape and Box and Whisker Plot

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-27

    Right-SkewedLeft-Skewed Symmetric

    Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3

  • Constructing a Box and Whisker Plot

    1. Sort values from lowest to highest

    2. Find Q1, Q2, Q33. Draw the box so that the ends are at Q1 and Q34. Draw a vertical line through the median

    5. Calculate the interquartile range (Q3 Q1)

    6. Extend dashed lines from each end to the highest and

    lowest values within the limits

    7. Identify outliers with an asterisk (*)

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-28

  • Box-and-Whisker Plot Example

    Below is a Box-and-Whisker plot for the following data:

    0 2 2 2 3 3 4 5 6 11 27

    This data is right skewed, as the plot depicts

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-29

    0 2 3 6 12 27

    Min Q1 Q2 Q3 Max

    *

    Upper limit = Q3 + 1.5 (Q3 Q1)

    = 6 + 1.5 (6 2) = 12

    27 is above the upper limit so is shown as an outlier

  • Measures of Variation

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-30

    Variation

    Variance Standard Deviation Coefficient of Variation

    PopulationVariance

    Sample Variance

    PopulationStandardDeviation

    Sample Standard Deviation

    Range

    Interquartile Range

  • Variation

    Measures of variation give information on the spread or

    variability of the data values.

    Smaller value Less variation

    Larger value More variation

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-31

    Same center,

    different variation

  • Range

    Simplest measure of variation

    Difference between the largest and the smallest observations:

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-32

    Range = xmaximum xminimum

    0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

    Range = 14 - 1 = 13

    Example:

  • Disadvantages of the Range

    Ignores the way in which data are distributed

    Sensitive to outliers

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-33

    7 8 9 10 11 12

    Range = 12 - 7 = 5

    7 8 9 10 11 12

    Range = 12 - 7 = 5

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

    1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

    Range = 5 - 1 = 4

    Range = 120 - 1 = 119

  • Interquartile Range

    Can eliminate some outlier problems by using the interquartile range

    Eliminate some high-and low-valued observations and calculate the range from the remaining values.

    Interquartile range = Q3 Q1

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-34

  • Interquartile Range Example

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-35

    Median(Q2)

    XmaximumX minimum Q1 Q3

    Example:

    25% 25% 25% 25%

    12 30 45 57 70

    Interquartile range = 57 30 = 27

  • Variance

    Average of squared deviations of values from the

    mean (in squared units) Population variance:

    Sample variance:

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-36

    N

    )(x

    N

    1i

    2

    i2

    1- n

    )x(x

    s

    n

    1i

    2

    i2

  • Standard Deviation

    Most commonly used measure of variation

    Shows variation about the mean

    Has the same units as the original data

    Population standard deviation:

    Sample standard deviation:

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-37

    N

    )(x

    N

    1i

    2

    i

    1-n

    )x(x

    s

    n

    1i

    2

    i

  • Calculation Example : Sample Standard Deviation

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-38

    Sample Data (Xi) : 10 12 14 15 17 18 18 24

    n = 8 Mean = x = 16

    4.30957

    130

    18

    16)(2416)(1416)(1216)(10

    1n

    )x(24)x(14)x(12)x(10s

    2222

    2222

    A measure of the average scatter around the mean

  • Measuring variation

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-39

    Small standard deviation

    Large standard deviation

  • Comparing Standard Deviations

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-40

    Mean = 15.5s = 3.33811 12 13 14 15 16 17 18 19 20 21

    11 12 13 14 15 16 17 18 19 20 21

    Data B

    Data A

    Mean = 15.5

    s = .9258

    11 12 13 14 15 16 17 18 19 20 21

    Mean = 15.5

    s = 4.57

    Data C

    Same mean, but different standard deviations:

  • Coefficient of Variation

    Measures relative variation

    Always in percentage (%)

    Shows variation relative to mean

    Is used to compare two or more sets of data measured in

    different units

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-41

    100%x

    sCV

    100%

    CV

    Population Sample

  • Comparing Coefficients of Variation

    Stock A:

    Average price last year = $50

    Standard deviation = $5

    Stock B:

    Average price last year = $100

    Standard deviation = $5

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-42

    5% 100%*$100

    $5100%*

    x

    sCVB

    10% 100%*$50

    $5100%*

    x

    sCVA

    Both stocks have the same standard deviation, but stock B is less variable relative to its price

  • The Empirical Rule

    If the data distribution is bell-shaped, then the interval (mean

    = median thus not skewed):

    contains about 68% of the values in the

    population or the sample

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-43

    1

    68%

    1

  • The Empirical Rule

    contains about 95% of the values in the population

    or the sample

    contains about 99.7% of the values in the

    population or the sample

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-44

    2

    3

    3

    99.7%

    95%

    2

  • Tchebysheffs Theorem

    Regardless of how the data are distributed, at least (1 - 1/k2)

    of the values will fall within k standard deviations of the mean

    Examples:

    (1 - 1/12) = 0% ..... k=1 ( 1)

    (1 - 1/22) = 75% ........ k=2 ( 2)

    (1 - 1/32) = 89% . k=3 ( 3)

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-45

    withinAt least

  • Standardized Data Values

    A standardized data value refers to the number of standard deviations a value is from the mean

    Standardized data values are sometimes referred to as z-scores

    Can be used to compare datasets

    Will be addressed in more detail later in the text

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-46

  • Standardized Population Values

    where:

    x = original data value

    = population mean

    = population standard deviation

    z = standard score

    (number of standard deviations x is from )

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-47

    x z

  • Standardized Sample Values

    where:

    x = original data value

    x = sample mean

    s = sample standard deviation

    z = standard score

    (number of standard deviations x is from )

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-48

    s

    xx z

  • Standardized Value Example

    IQ scores in a large population have a bell-shaped

    distribution with mean = 100 and standard deviation

    = 15

    Find the standardized score (z-score) for

    a person with an IQ of 121.

    Someone with an IQ of 121 is 1.4 standard deviations above

    the mean

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-49

    1.415

    100121

    x z

    Answer:

  • Using Microsoft Excel

    Descriptive Statistics are easy to obtain from Microsoft Excel

    Use menu choice:

    Data / data analysis / descriptive statistics

    Enter details in dialog box

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-50

  • Using Excel

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-51

    Select:

    Data / data analysis / descriptive statistics

  • Using Excel

    Enter dialog box details

    Check box for summary statistics

    Click OK

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-52

    (continued)

  • Excel output

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-53

    Microsoft Excel

    descriptive statistics output

    using the house price data:

    House Prices:

    $2,000,000500,000300,000100,000100,000

  • The Sample Covariance

    The sample covariance measures the strength of the

    linear relationship between two variables (called

    bivariate data)

    The sample covariance:

    Only concerned with the strength of the relationship

    No causal effect is implied

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-54

    1n

    )YY)(XX(

    )Y,X(cov

    n

    1i

    ii

  • Interpreting Covariance

    Covariance between two random variables:

    cov(X,Y) > 0 X and Y tend to move in the same direction

    cov(X,Y) < 0 X and Y tend to move in opposite directions

    cov(X,Y) = 0 X and Y are independent

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-55

  • Coefficient of Correlation

    Measures the relative strength of the linear relationship

    between two variables

    Sample coefficient of correlation:

    where

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-56

    YXSS

    Y),(Xcovr

    1n

    )X(X

    S

    n

    1i

    2

    i

    X

    1n

    )Y)(YX(X

    Y),(Xcov

    n

    1i

    ii

    1n

    )Y(Y

    S

    n

    1i

    2

    i

    Y

  • Features of Correlation Coefficient, r

    Unit free

    Ranges between 1 and 1

    The closer to 1, the stronger the negative linear

    relationship

    The closer to 1, the stronger the positive linear

    relationship

    The closer to 0, the weaker the linear relationship

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-57

  • Scatter Plots of Data with Various Correlation Coefficients

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-58

    Y

    X

    Y

    X

    Y

    X

    Y

    X

    Y

    X

    r = -1 r = -.6 r = 0

    r = +.3r = +1

    Y

    Xr = 0

  • Using Excel to Find the Correlation Coefficient

    Select

    Tools/Data Analysis

    Choose Correlation

    from the selection

    menu

    Click OK . . .

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-59

  • Using Excel to Find the Correlation Coefficient

    Input data range and select appropriate options

    Click OK to get output

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-60

    (continued)

  • Interpreting the Result

    r = .733

    There is a relatively strong positive linear relationship between test score #1 and test score #2

    Students who scored high on the first test tended to score high on second test, and students who scored low on the first test tended to score low on the second test

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-61

    Scatter Plot of Test Scores

    70

    75

    80

    85

    90

    95

    100

    70 75 80 85 90 95 100

    Test #1 ScoreT

    est

    #2 S

    co

    re

  • Pitfalls in Numerical Descriptive Measures

    Data analysis is objective

    Should report the summary measures that best meet the

    assumptions about the data set

    Data interpretation is subjective

    Should be done in fair, neutral and clear manner

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-62

  • Ethical Considerations

    Numerical descriptive measures:

    Should document both good and bad results

    Should be presented in a fair, objective and neutral manner

    Should not use inappropriate summary measures to distort

    facts

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-63

  • Chapter Summary

    Described measures of center and location

    Mean, median, mode, weighted mean

    Discussed percentiles and quartiles

    Created Box and Whisker Plots

    Illustrated distribution shapes

    Symmetric, skewed

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-64

  • Chapter Summary

    Described measures of variation

    Range, interquartile range, variance,

    standard deviation, coefficient of variation

    Discussed Tchebysheffs Theorem

    Calculated standardized data values

    Copyright 2011 Pearson Education, Inc. publishing as Prentice Hall 3-65

    (continued)