Stat Simplify


  • 8/10/2019 Stat Simplify


    Averages

    In statistics, an average is defined as the number that measures the central tendency of a given set of numbers.

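    There are a number of different averages, including but not limited to the mean, median and mode. The range is often taught alongside them, although it measures the spread of the data rather than its central tendency.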

    Mean

    Mean is what most people commonly refer to as an average. The mean is the number you obtain when you sum up a given set of numbers and then divide this sum by the total number of elements in the set. The mean is also referred to, more precisely, as the arithmetic mean.

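    Given a set of $n$ elements, $a_1$ to $a_n$, the mean is found by adding up all the $a$'s and then dividing by the total number of elements, $n$.

    This can be generalized by the formula below:

    $$\text{mean} = \frac{a_1 + a_2 + \cdots + a_n}{n} = \frac{1}{n}\sum_{i=1}^{n} a_i$$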

    Mean Example Problems

    Example 1

    Find the mean of the set of numbers below

    Solution

    The first step is to count how many numbers there are in the set, which we shall call $n$.

    The next step is to add up all the numbers in the set.

    The last step is to find the actual mean by dividing the sum by $n$.
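    The three steps above can be written as a short Python function. This is a minimal sketch. The name `arithmetic_mean` is our own. The set of numbers for Example 1 is not reproduced here, so the usage line borrows the four test results that appear later in this document.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    n = len(values)          # step 1: count the numbers in the set
    if n == 0:
        raise ValueError("the mean of an empty set is undefined")
    total = sum(values)      # step 2: add up all the numbers
    return total / n         # step 3: divide the sum by n


print(arithmetic_mean([15, 18, 22, 20]))  # prints 18.75
```

    Python's standard library offers the same calculation as `statistics.mean`.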


    Mean can also be found for grouped data, but before we see an example of that, let us first define frequency.

    Frequency in statistics means the same as in everyday use of the word. The frequency of an element in a set is the number of times that element appears in the set. A frequency can be any whole number from 0 upward. If you're told that the frequency of an element $a$ is 3, that means there are three $a$'s in the set.

    Example 2

    Find the mean of the set of ages in the table below

    Age (years)    Frequency
    10             0
    11             8
    12             3
    13             2
    14             7

    Solution

    The first step is to find the total number of ages, which we shall call $n$. Since it would be tedious to count all the ages, we can find $n$ by adding up the frequencies:

    $$n = 0 + 8 + 3 + 2 + 7 = 20$$

    Next we need to find the sum of all the ages. We can do this in two ways: we can add up each individual age, which is a long and tedious process, or we can use the frequencies to make things faster.

    Since the frequency tells us how many of that particular age there are, we can multiply each age by its frequency and then add up all these products:

    $$\text{sum} = (10 \times 0) + (11 \times 8) + (12 \times 3) + (13 \times 2) + (14 \times 7) = 0 + 88 + 36 + 26 + 98 = 248$$

    The last step is to find the mean by dividing the sum by $n$:

    $$\text{mean} = \frac{248}{20} = 12.4$$
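    The frequency shortcut in this solution can be sketched in Python. This is an illustrative sketch only. The dictionary layout (each age mapped to its frequency) and the function name `grouped_mean` are our own assumptions, not part of the original lesson.

```python
def grouped_mean(freq_table):
    """Mean of data given as {value: frequency}: sum(value * frequency) / sum(frequency)."""
    n = sum(freq_table.values())  # total number of elements, found from the frequencies
    if n == 0:
        raise ValueError("the frequencies must not all be zero")
    total = sum(value * f for value, f in freq_table.items())  # each value times its frequency
    return total / n


ages = {10: 0, 11: 8, 12: 3, 13: 2, 14: 7}  # the Example 2 table
print(grouped_mean(ages))  # prints 12.4
```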

    Population Mean vs Sample Mean

    In the Introduction to Statistics section, we defined a population and a sample, where a sample is a part of a population.

    In statistics there are two kinds of means: population mean and sample mean. A population mean is the true mean

    of the entire population of the data set while a sample mean is the mean of a small sample of the population. These

    different means appear frequently in both statistics and probability and should not be confused with each other.

    Population mean is represented by the Greek letter $\mu$ (pronounced mu), while sample mean is represented by $\bar{x}$ (pronounced x bar). The total number of elements in a population is represented by $N$, while the number of elements in a sample is represented by $n$. This leads to an adjustment in the formula we gave above for calculating the mean. Writing the elements as $x_1, x_2, \ldots$:

    $$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

    The sample mean is commonly used to estimate the population mean when the population mean is unknown. This works because the expected value of the sample mean equals the population mean; in other words, the sample mean is an unbiased estimator of the population mean.

    Median

    The median is defined as the number in the middle of a given set of numbers arranged in order of increasing magnitude. When given a set of numbers, the median is the number positioned in the exact middle of the list when you arrange the numbers from the lowest to the highest. The median is also a measure of average (central tendency). In higher-level statistics, the median is also used to build measures of dispersion, such as the absolute deviation from the median. The median is important because it describes the middle of the entire set of numbers and, unlike the mean, it is not pulled toward a few extremely large or small values.

    Example 3

    Find the median in the set of numbers given below

    Solution

    From the definition of median, we should be able to tell that the first step is to rearrange the given set of numbers in order of increasing magnitude, i.e. from the lowest to the highest.

    Then we inspect the set to find the number which lies in the exact middle.

    Let's try another example to emphasize something interesting that often occurs when solving for the median.

    Example 4

    Find the median of the given data

    Solution

    As in the previous example, we start off by rearranging the data in order from the smallest to the largest.

    Next we inspect the data to find the number that lies in the exact middle.


    We can see from the above that we end up with two numbers (4 and 5) in the middle. We can solve for the median by finding the mean of these two numbers as follows:

    $$\text{median} = \frac{4 + 5}{2} = 4.5$$
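    The odd and even cases can be captured in a few lines of Python. This is a sketch with a hypothetical function name. The data for Examples 3 and 4 are not reproduced here, so the sample lists are the two worked through in the overview at the end of this document. The standard library's `statistics.median` behaves the same way.

```python
def median(values):
    """Middle value of the sorted data; the mean of the two middle values when n is even."""
    ordered = sorted(values)              # arrange from the lowest to the highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:                        # odd count: one element sits in the exact middle
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle elements


print(median([9, 3, 44, 17, 15]))      # prints 15
print(median([8, 3, 44, 17, 12, 6]))   # prints 10.0
```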

    Mode

    The mode is defined as the element that appears most frequently in a given set of elements. Using the definition of

    frequency given above, mode can also be defined as the element with the largest frequency in a given data set.

    For a given data set, there can be more than one mode. As long as those elements all have the same frequency and

    that frequency is the highest, they are all the modal elements of the data set.

    Example 5

    Find the Mode of the following data set.

    Solution

    Mode = 3 and 15
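    A data set can have more than one mode, so a sketch that returns every element tied for the highest frequency is a natural fit. The function name `modes` and the sample list are hypothetical, because the data for Example 5 are not reproduced here. The sample is chosen so that, as in Example 5, the answer is 3 and 15. Python 3.8 and later also provide `statistics.multimode` for the same job.

```python
from collections import Counter


def modes(values):
    """Return, in sorted order, every element whose frequency equals the highest frequency."""
    counts = Counter(values)              # frequency of each element
    if not counts:
        raise ValueError("the mode of an empty set is undefined")
    highest = max(counts.values())        # the largest frequency in the data set
    return sorted(v for v, f in counts.items() if f == highest)


print(modes([3, 15, 7, 3, 15, 9]))  # prints [3, 15]
```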

    Mode for Grouped Data

    As we saw in the section on data, grouped data is divided into classes. We have defined mode as the element which has the highest frequency in a given data set. In grouped data, we can find two kinds of mode: the modal class, which is the class with the highest frequency, and the mode itself, which we calculate from the modal class using the formula below.

    $$\text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times h$$

    where

    $L$ is the lower class limit of the modal class

    $f_1$ is the frequency of the modal class

    $f_0$ is the frequency of the class before the modal class in the frequency table

    $f_2$ is the frequency of the class after the modal class in the frequency table

    $h$ is the class interval of the modal class

    Example 6

    Find the modal class and the actual mode of the data set below

    Number    Frequency
    1 - 3     7
    4 - 6     6
    7 - 9     4
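    The grouped-data mode formula can be sketched in Python and applied to the Example 6 table. The original worked solution is not reproduced here, so treat the result as illustrative only. Two assumptions are built in and flagged in the comments:

    - $L$ is taken as the lower class limit (1), exactly as defined above.
    - $f_0 = 0$, because no class comes before the modal class.

    Some textbooks use the lower class boundary (0.5) for $L$ instead, which would give 3.125.

```python
def grouped_mode(L, h, f0, f1, f2):
    """Mode = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h for the modal class."""
    denominator = (f1 - f0) + (f1 - f2)
    if denominator == 0:
        raise ValueError("formula undefined: the neighbouring classes tie with the modal class")
    return L + (f1 - f0) / denominator * h


# Example 6 table: ((lower limit, upper limit), frequency)
classes = [((1, 3), 7), ((4, 6), 6), ((7, 9), 4)]
k = max(range(len(classes)), key=lambda i: classes[i][1])  # index of the modal class
(lower, upper), f1 = classes[k]
f0 = classes[k - 1][1] if k > 0 else 0                  # assumption: 0 when no class precedes
f2 = classes[k + 1][1] if k + 1 < len(classes) else 0   # assumption: 0 when no class follows
h = classes[1][0][0] - classes[0][0][0]                  # class interval: 4 - 1 = 3
print((lower, upper), grouped_mode(lower, h, f0, f1, f2))  # prints (1, 3) 3.625
```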


    In the section on averages, we learned how to calculate the mean for a given set of data. The data we looked at was ungrouped, and the total number of elements in each data set was not that large. That method is not always a realistic approach, especially if you're dealing with grouped data.

    That's where the assumed mean comes into play.

    Assumed mean, as the name suggests, is a guess or an assumption of the mean. Assumed mean is most commonly denoted by the letter $a$. It doesn't need to be correct or even close to the actual mean. The choice of assumed mean is at your discretion, except where the question explicitly asks you to use a certain assumed mean value.

    Assumed mean is used to calculate the actual mean as well as the variance and standard deviation, as we'll see later.

    The actual mean can be calculated from the assumed mean using the following formula:

    $$\bar{x} = a + h \times \frac{\sum f_i u_i}{\sum f_i}$$

    It's very important to remember that the above formula only applies to grouped data with equal class intervals.

    Now let us define each term used in the formula:

    $\bar{x}$ is the mean, which we're trying to find.

    $a$ is the assumed mean.

    $h$ is the class interval, which we looked at in the section on data.

    $f_i$ is the frequency of each class. We find the total frequency of all the classes in the data set ($\sum f_i$) by adding up all the $f_i$'s.

    Each $u_i$ is found from the following formula:

    $$u_i = \frac{d_i}{h}$$

    where $h$ is the class interval and each $d_i$ is the difference between the midpoint of a class and the assumed mean.

    $d_i$ is calculated from the following formula:

    $$d_i = x_i - a$$

    where $x_i$ is the midpoint of a given class.

    $x_i$ is obtained from the following:

    $$x_i = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$

    that is, $x_i$ is the number in the middle of a given class.

    Therefore $u_i$ becomes

    $$u_i = \frac{x_i - a}{h}$$


    Let's try an example to see how to apply the assumed mean method for finding mean.

    Example 1

    The student body of a certain school was polled to find out what their hobbies were. The number of hobbies each student had was then recorded, and the data obtained was grouped into the classes shown in the table below. Using an assumed mean of 17, find the mean number of hobbies of the students in the school.

    Number of hobbies    Frequency
    0 - 4                45
    5 - 9                58
    10 - 14              27
    15 - 19              30
    20 - 24              19
    25 - 29              11
    30 - 34              8
    35 - 39              2

    Solution

    We have been given the assumed mean $a$ as 17, and we know the formula for finding the mean from the assumed mean as

    $$\bar{x} = a + h \times \frac{\sum f_i u_i}{\sum f_i}$$

    We can find the class interval from the lower limits of two consecutive classes:

    $$h = 5 - 0 = 5$$

    We now have one component we need, and we're one step closer to finding the mean.

    We can solve the rest of this problem using a table in which we find each remaining component of the formula, and then substitute at the end:

    Hobbies    Frequency f_i    Midpoint x_i    d_i = x_i - a    u_i = d_i / h    f_i u_i
    0 - 4      45               2               -15              -3               -135
    5 - 9      58               7               -10              -2               -116
    10 - 14    27               12              -5               -1               -27
    15 - 19    30               17              0                0                0
    20 - 24    19               22              5                1                19
    25 - 29    11               27              10               2                22
    30 - 34    8                32              15               3                24
    35 - 39    2                37              20               4                8

    Totals: Σf_i = 200, Σf_i u_i = -205

    Substituting:

    $$\bar{x} = 17 + 5 \times \frac{-205}{200} = 17 - 5.125 = 11.875$$

    The mean number of hobbies is 11.875 (about 11.9).
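    The table-and-substitute procedure above can be checked with a short Python sketch. The function name and the list-of-classes layout are illustrative assumptions. The last class is read as 35 - 39, so every class interval equals 5; this matches its midpoint of 37 in the solution table. The sketch also computes the mean directly from the midpoints as a cross-check.

```python
def mean_from_assumed_mean(classes, a):
    """Grouped mean via x_bar = a + h * sum(f_i * u_i) / sum(f_i), where u_i = (x_i - a) / h.

    `classes` is a list of ((lower limit, upper limit), frequency) with equal class intervals.
    """
    h = classes[1][0][0] - classes[0][0][0]   # class interval from consecutive lower limits
    sum_f = 0
    sum_fu = 0.0
    for (lower, upper), f in classes:
        x = (lower + upper) / 2               # class midpoint x_i
        u = (x - a) / h                       # u_i = d_i / h, with d_i = x_i - a
        sum_f += f
        sum_fu += f * u
    return a + h * sum_fu / sum_f, sum_f, sum_fu


hobbies = [((0, 4), 45), ((5, 9), 58), ((10, 14), 27), ((15, 19), 30),
           ((20, 24), 19), ((25, 29), 11), ((30, 34), 8), ((35, 39), 2)]
mean, total_f, total_fu = mean_from_assumed_mean(hobbies, a=17)
print(total_f, total_fu, mean)  # prints 200 -205.0 11.875

# Cross-check: the direct grouped mean sum(f_i * x_i) / sum(f_i) gives the same value.
direct = sum(f * (lo + hi) / 2 for (lo, hi), f in hobbies) / total_f
print(direct)  # prints 11.875
```

    Any choice of `a` gives the same mean. The assumed mean only changes the size of the intermediate numbers.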

    Cumulative Frequency, Quartiles and Percentiles

    Cumulative Frequency

    Cumulative frequency is defined as a running total of frequencies. The frequency of an element in a set refers to how many of that element there are in the set. Cumulative frequency can also be defined as the sum of all frequencies up to and including the current point.

    The cumulative frequency is important when analyzing data, because its value at any point indicates the number of elements in the data set that lie at or below the current value. The cumulative frequency is also useful when representing data using diagrams like histograms.

    Cumulative Frequency Table

    The cumulative frequency is usually shown by constructing a cumulative frequency table, which takes the form shown in the example below.

    Example 1

    The set of data below shows the ages of participants in a certain summer camp. Draw a cumulative frequency table

    for the data.

    Age (years)    Frequency
    10             3
    11             18
    12             13
    13             12
    14             7
    15             27

    Solution:

    The cumulative frequency at a certain point is found by adding the frequency at the present point to the cumulative

    frequency of the previous point.

    The cumulative frequency for the first data point is the same as its frequency since there is no cumulative frequency

    before it.

    Age (years)    Frequency    Cumulative Frequency
    10             3            3
    11             18           3 + 18 = 21
    12             13           21 + 13 = 34
    13             12           34 + 12 = 46
    14             7            46 + 7 = 53
    15             27           53 + 27 = 80
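    A running total is exactly what Python's `itertools.accumulate` computes, so the cumulative frequency column can be sketched in a couple of lines. The function name `cumulative_frequencies` is a hypothetical choice. The data are the Example 1 table.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Each entry is the current frequency plus the cumulative frequency before it."""
    return list(accumulate(freqs))


ages = [10, 11, 12, 13, 14, 15]
freqs = [3, 18, 13, 12, 7, 27]
for age, f, cf in zip(ages, freqs, cumulative_frequencies(freqs)):
    print(age, f, cf)  # last row printed: 15 27 80
```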

    Cumulative Frequency Graph (Ogive)

    A cumulative frequency graph, also known as an Ogive, is a curve showing the cumulative frequency for a given set of data. For ungrouped data, the cumulative frequency is plotted on the y-axis against the data values on the x-axis. When dealing with grouped data, the Ogive is formed by plotting the cumulative frequency against the upper boundary of each class. An Ogive is used to study the growth rate of data, as it shows how the frequency accumulates.


    Example 2

    Plot the cumulative frequency curve for the data set below

    Age (years)    Frequency
    10             5
    11             10
    12             27
    13             18
    14             6
    15             16
    16             38
    17             9

    Solution:

    Age (years)    Frequency    Cumulative Frequency
    10             5            5
    11             10           5 + 10 = 15
    12             27           15 + 27 = 42
    13             18           42 + 18 = 60
    14             6            60 + 6 = 66
    15             16           66 + 16 = 82
    16             38           82 + 38 = 120
    17             9            120 + 9 = 129

    Percentiles

    A percentile is the value below which a given percentage of a set of data falls. Percentiles are used to observe how much of a given set of data falls within a certain percentage range. For example, the thirtieth percentile is the value below which 30% of the entire data set lies.

    Calculating Percentiles

    Let us designate a percentile as $P_m$, where $m$ represents the percentile we're finding; for example, for the tenth percentile, $m$ would be 10. Given that the total number of elements in the data set is $N$, the position of $P_m$ in the ordered data set is

    $$\text{position of } P_m = \frac{m}{100} \times N$$
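    If the position of $P_m$ is taken as $\frac{m}{100} \times N$, it is often not a whole number, so some rounding rule is needed. The sketch below assumes the common nearest-rank convention, which rounds the position up to the next whole number. This rule, the function name and the sample data are assumptions for illustration. Other textbooks interpolate between neighbouring values instead.

```python
import math


def percentile_nearest_rank(values, m):
    """P_m by the nearest-rank rule: the element at position ceil(m / 100 * N) of the sorted data."""
    if not 0 < m <= 100:
        raise ValueError("m must satisfy 0 < m <= 100")
    ordered = sorted(values)
    N = len(ordered)
    if N == 0:
        raise ValueError("percentiles of an empty set are undefined")
    position = math.ceil(m * N / 100)   # 1-based position, rounded up to a whole number
    return ordered[position - 1]


data = [15, 20, 35, 40, 50]               # hypothetical data, N = 5
print(percentile_nearest_rank(data, 30))  # position ceil(1.5) = 2, prints 20
```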

    Quartiles


    The term quartile is derived from the word quarter, which means one fourth of something. Thus a quartile marks off a certain fourth of a data set. When you arrange a data set in increasing order from the lowest to the highest and then divide it into four equal groups, the values at the boundaries between the groups are the quartiles. There are three quartiles that are studied in statistics.

    First Quartile (Q1)

    When you arrange a data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the lower fourth (1/4) mark of the data is referred to as the First Quartile.

    The First Quartile is equal to the data value at the 25th percentile. The first quartile can also be obtained from the Ogive by sectioning off the cumulative frequency into four parts; the data value at the first quarter mark is the first quartile.

    Second Quartile (Q2)

    When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the second fourth (2/4) mark of the data is referred to as the Second Quartile.

    This is equivalent to the data value at the halfway point of all the data, and is also equal to the data value at the 50th percentile.

    The Second Quartile can similarly be obtained from an Ogive by sectioning off the curve into four parts; the data value at the second quarter mark is the second quartile. In other words, the data value at the halfway line on the cumulative frequency curve is the second quartile. The second quartile is also equal to the median.

    Third Quartile (Q3)

    When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the third fourth (3/4) mark of the data is referred to as the Third Quartile.

    This is equivalent to the data value at the 75th percentile. The third quartile can be obtained from an Ogive by dividing the curve into four parts and then reading off the data value that lies at the 3/4 mark.

    Calculating the Different Quartiles

    The different quartiles can be calculated using the same method as for the median.

    First Quartile

    The first quartile can be calculated by first arranging the data in an ordered list and then dividing the data into two halves. If the total number of elements in the data set is odd, you exclude the median (the element in the middle).

    After this, you only look at the lower half of the data and find the median of this new subset, using the method for finding the median described in the section on averages.

    This median will be your First Quartile.

    Second Quartile


    The second quartile is the same as the median and can thus be found using the same

    methods for finding median described in the section on averages.

    Third Quartile

    The third quartile is found in a similar manner to the first quartile. The difference here is

    that after dividing the data into two groups, instead of considering the data in the lower

    half, you consider the data in the upper half and then you proceed to find the Median of

    this subset of data using the methods described in the section on Averages.

    This median will be your Third Quartile.
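    The halves method described above can be sketched in Python. The function names and the sample data are hypothetical illustrations. Note that textbooks and software packages use several slightly different quartile conventions, so other tools may return different values for the same data.

```python
def median(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3): Q2 is the median; Q1 and Q3 are the medians of the lower and upper halves.

    When the number of elements is odd, the middle element is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two elements")
    half = n // 2
    lower = ordered[:half]                                        # lower half
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]  # upper half, skipping the median when n is odd
    return median(lower), median(ordered), median(upper)


print(quartiles([9, 3, 44, 17, 15, 8, 6]))   # odd count, prints (6, 9, 17)
print(quartiles([8, 3, 44, 17, 12, 6]))      # even count, prints (6, 10.0, 17)
```

    The interquartile range is then simply `Q3 - Q1`, which is 11 for both samples.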

    Calculating Quartiles from Cumulative Frequency

    As mentioned above, we can obtain the different quartiles from the Ogive, which means that we use the cumulative frequency to locate each quartile.

    Given that the cumulative frequency for the last element in the data set is $f_c$, the positions of the quartiles can be calculated as follows:

    $$\text{position of } Q_1 = \frac{f_c}{4} \qquad \text{position of } Q_2 = \frac{f_c}{2} \qquad \text{position of } Q_3 = \frac{3 f_c}{4}$$

    The quartile is then located by matching up which element has the cumulative frequency corresponding to the position obtained above.

    Example 3

    Find the First, Second and Third Quartiles of the data set below using the cumulative frequency curve.

    Age (years)    Frequency
    10             5
    11             10
    12             27
    13             18
    14             6
    15             16
    16             38
    17             9

    Solution:

    Age (years)    Frequency    Cumulative Frequency
    10             5            5
    11             10           15
    12             27           42
    13             18           60
    14             6            66
    15             16           82
    16             38           120
    17             9            129

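    Here $f_c = 129$, so the quartiles lie at positions $129/4 = 32.25$ ($Q_1$), $129/2 = 64.5$ ($Q_2$) and $3 \times 129/4 = 96.75$ ($Q_3$). From the Ogive, we can see where these positions fall and read off the corresponding ages to approximate the quartiles.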
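    Reading quartiles off the cumulative frequency column can also be sketched in Python, using the Example 3 table. The sketch assumes the positions $f_c/4$, $f_c/2$ and $3f_c/4$ and picks the first age whose cumulative frequency reaches each position. It returns data values rather than the interpolated readings you would take from a smooth Ogive, so treat it as an approximation under that assumption. The function name is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def quartiles_from_cumulative(values, freqs):
    """Q1, Q2, Q3 as the first value whose cumulative frequency reaches fc/4, fc/2 and 3fc/4."""
    cum = list(accumulate(freqs))        # cumulative frequencies (non-decreasing)
    fc = cum[-1]                         # cumulative frequency of the last element
    result = []
    for k in (1, 2, 3):
        position = k * fc / 4            # quartile position
        i = bisect_left(cum, position)   # first index with cum[i] >= position
        result.append(values[i])
    return tuple(result)


ages = [10, 11, 12, 13, 14, 15, 16, 17]
freqs = [5, 10, 27, 18, 6, 16, 38, 9]
print(quartiles_from_cumulative(ages, freqs))  # positions 32.25, 64.5, 96.75; prints (12, 14, 16)
```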

    Interquartile Range

    The interquartile range is the difference between the third quartile and the first quartile:

    $$\text{IQR} = Q_3 - Q_1$$

    Dispersion - Deviation and Variance

    Dispersion measures how the various elements of a data set are spread out with respect to some measure of central tendency, usually the mean. Measures of dispersion include range, interquartile range, variance, standard deviation and absolute deviation.

    We've already looked at the first two in the Averages section, so let's move on to the other measures.

    Absolute Deviation


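    Absolute deviation for a given data set is defined as the average of the absolute differences between the elements of the set and either the mean (average deviation) or the median element (median absolute deviation).

    The average deviation is calculated as follows:

    $$\text{average deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$

    which means that the average deviation is the average of the absolute differences between each element of the data set and the mean.

    The absolute deviation from the median, $M$, is calculated in the same way:

    $$\frac{1}{n}\sum_{i=1}^{n} \lvert x_i - M \rvert$$

    Note that in many references the term median absolute deviation refers instead to the median, rather than the average, of the values $\lvert x_i - M \rvert$.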

    Example 1

    The heights of a group of 10 students randomly selected from a given school are as follows (in ft):

    5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5

    a) Find the absolute deviation from the mean.

    b) Find the absolute deviation from the median.

    Solution

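    a) To find the absolute deviation from the mean, we need to first find the mean of the heights.

    We know that the mean is given by:

    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

    Using the above, we calculate the mean as:

    $$\bar{x} = \frac{5.5 + 3.5 + 4.6 + 6.1 + 5.7 + 5.11 + 4.9 + 5.0 + 5.0 + 5.5}{10} = \frac{50.91}{10} = 5.091$$

    The mean height is 5.091 ft.

    The deviation from the mean for each of the elements in the data set is obtained by subtracting the mean from that element, as follows:

    For 5.5: $\lvert 5.5 - 5.091 \rvert = 0.409$

    We find all the deviations and then take their average (remember that we only consider their absolute values):

    $$\frac{0.409 + 1.591 + 0.491 + 1.009 + 0.609 + 0.019 + 0.191 + 0.091 + 0.091 + 0.409}{10} = \frac{4.910}{10} = 0.491$$

    The average deviation from the mean is 0.491 ft.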

    b) To find the absolute deviation from the median, we need to first find the median height for the data set.

    We know that to find the median value, we arrange the elements in the data set in ascending or descending order and then find the element that lies in the middle.

    Arranged in ascending order from the smallest to the largest:

    3.5, 4.6, 4.9, 5.0, 5.0, 5.11, 5.5, 5.5, 5.7, 6.1

    Finding the median: canceling out elements in pairs from both ends leaves the two middle elements, 5.0 and 5.11.

    Since we had an even number of elements in the data set, it comes as no surprise that we're unable to obtain a median by canceling out corresponding elements. We're left with two elements, so we find their mean, which then becomes our median:

    $$M = \frac{5.0 + 5.11}{2} = 5.055$$

    Having obtained our median as 5.055, we can proceed to find the average deviation from the median using the same steps as in the previous question:

    $$\frac{0.445 + 1.555 + 0.455 + 1.045 + 0.645 + 0.055 + 0.155 + 0.055 + 0.055 + 0.445}{10} = \frac{4.910}{10} = 0.491$$

    The average deviation from the median is also 0.491 ft.
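    Both parts of Example 1 can be checked with a short Python sketch using the standard `statistics` module. `mean` and `median` are real functions there; `average_absolute_deviation` is our own hypothetical helper. For comparison, the last line also shows the median of the absolute deviations from the median. Other references often use the term median absolute deviation for that quantity.

```python
from statistics import mean, median


def average_absolute_deviation(values, centre):
    """Average of |x - centre| over all elements of the data set."""
    return sum(abs(x - centre) for x in values) / len(values)


heights = [5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5]  # in ft, from Example 1
x_bar = mean(heights)   # 50.91 / 10
m = median(heights)     # (5.0 + 5.11) / 2
print(round(x_bar, 3), round(m, 3))  # prints 5.091 5.055
print(round(average_absolute_deviation(heights, x_bar), 3),
      round(average_absolute_deviation(heights, m), 3))  # prints 0.491 0.491

# For comparison: the median (not the average) of the absolute deviations from the median.
print(round(median(abs(x - m) for x in heights), 3))  # prints 0.445
```

    The two averages coincide here because both the mean and the median fall between the fifth and sixth ordered values (5.0 and 5.11). For any centre in that range, the sum of absolute deviations is the same.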


    Variance and Standard Deviation

    Variance, as the name suggests, is a measure of how different the elements in a given population are. Variance is used to indicate how spread out these elements are from the mean of the population. There are two kinds of variance: population variance and sample variance.

    Population variance is the variance of the entire population and is denoted by $\sigma^2$, while sample variance is the variance of a sample drawn from the population and is denoted by $S^2$.

    Standard deviation is the square root of variance. Standard deviation measures how spread out the elements of a population or sample are around the mean. A small standard deviation means that the elements lie close to the mean, while a large one means they are widely scattered.

    Just as we have a population and sample variance, we also have a population and sample standard deviation. Population standard deviation is denoted by $\sigma$, while the sample standard deviation is denoted by $S$.

    Although absolute deviation is also a measure of dispersion, variance and standard deviation are often preferred because of the way they're calculated. Calculating variance involves squaring the differences (deviations) between each element and the mean. Squaring makes every deviation positive and gives larger deviations proportionally more weight than smaller ones, so elements far from the mean stand out. Squared deviations are also easier to work with algebraically than absolute values.

    The population variance can be calculated from the following:

    $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$

    where $\mu$ is the population mean.

    The sample variance is given by

    $$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

    where $\bar{x}$ is the sample mean.

    Standard deviation is simply the square root of variance, so we can calculate it by taking the square root of the above variance formulae:

    Population standard deviation

    $$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$

    where $\mu$ is the population mean.

    Sample standard deviation

    $$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

    where $\bar{x}$ is the sample mean.

    The difference between calculating $\sigma^2$ and $S^2$ is the divisor used when averaging. For $\sigma^2$, we divide by the number of elements in the set, $N$. By contrast, for $S^2$ we divide by one less than the sample size, $n - 1$. Using $n - 1$ ensures that $S^2$ is an unbiased estimator of $\sigma^2$.
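    The population and sample formulas differ only in the divisor, which a short Python sketch makes explicit. The helper names and the data list are hypothetical. The results are compared against the standard library's `statistics.pvariance` and `statistics.pstdev` (population versions) and `statistics.variance` and `statistics.stdev` (sample versions).

```python
import math
from statistics import pstdev, pvariance, stdev, variance


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N."""
    N = len(values)
    if N == 0:
        raise ValueError("the variance of an empty set is undefined")
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """S^2 = sum((x - x_bar)^2) / (n - 1); dividing by n - 1 makes S^2 unbiased for sigma^2."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two elements")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(population_variance(data), math.sqrt(population_variance(data)))  # prints 4.0 2.0
print(sample_variance(data))  # prints 4.571428571428571 (that is, 32/7)

# The standard library agrees (population versions first, then sample versions).
print(pvariance(data), pstdev(data), variance(data), stdev(data))
```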

    Before you can begin to understand statistics, there are four terms you will need to fully understand.

    The first term, 'average', is something we have been familiar with from a very early age, when we started analyzing our marks on report cards. We add together all of our test results and then divide the sum by the total number of results. We often call this the average. However, statistically it's the Mean!

    The Mean

    Example:

    Four test results: 15, 18, 22, 20

    The sum is: 75

    Divide 75 by 4: 18.75

    The 'Mean' (Average) is 18.75

    (Often rounded to 19)

    The Median

    The Median is the 'middle value' in your list. When the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order. When the list has an even number of entries, the median is equal to the sum of the two middle numbers (after sorting the list into increasing order) divided by two. Thus, remember to line up your values: the middle number is the median! Be sure to remember the odd and even rule.

    Examples:

    Find the Median of: 9, 3, 44, 17, 15 (Odd amount of numbers)

    Line up your numbers: 3, 9, 15, 17, 44 (smallest to largest)

    The Median is: 15 (The number in the middle)

    Find the Median of: 8, 3, 44, 17, 12, 6 (Even amount of numbers)

    Line up your numbers: 3, 6, 8, 12, 17, 44

    Add the 2 middle numbers and divide by 2: 8 + 12 = 20, 20 ÷ 2 = 10

    The Median is 10.

    The Mode

    The mode in a list of numbers refers to the number or numbers that occur most frequently. A trick to remember this one is that mode starts with the same first two letters as most does. Most frequently: Mode. You'll never forget that one!

    Examples:

    Find the mode of:

    9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8

    Put the numbers in order for ease:
