Stat Simplify

8/10/2019 Stat Simplify


Averages

In statistics, an average is defined as the number that measures the central tendency of a given set of numbers.

There are a number of different averages, including the mean, median and mode. The range is often introduced alongside them, although it measures spread rather than central tendency.

Mean

Mean is what most people commonly refer to as an average. The mean refers to the number you obtain when you

sum up a given set of numbers and then divide this sum by the total number in the set. Mean is also referred to

more correctly as arithmetic mean.

Given a set of $n$ elements from $a_1$ to $a_n$

The mean is found by adding up all the $a$'s and then dividing by the total number, $n$

This can be generalized by the formula below:

$$\text{Mean} = \frac{a_1 + a_2 + \dots + a_n}{n} = \frac{1}{n}\sum_{i=1}^{n} a_i$$

Mean Example Problems

Example 1

Find the mean of the set of numbers below

Solution

The first step is to count how many numbers there are in the set, which we shall call n

The next step is to add up all the numbers in the set

The last step is to find the actual mean by dividing the sum by n
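The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the list `sample_values` are assumptions made for the sketch, and the numbers are not the set from the example.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    n = len(values)        # step 1: count the numbers in the set
    if n == 0:
        raise ValueError("the mean of an empty set is undefined")
    total = sum(values)    # step 2: add up all the numbers
    return total / n       # step 3: divide the sum by n


# Illustrative data (an assumption, not taken from the lesson).
sample_values = [3, 7, 8, 10, 12]
sample_mean = mean(sample_values)
print(sample_mean)  # 8.0
```

The explicit check for an empty list is a design choice: dividing by zero would otherwise raise a less helpful error.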

Mean can also be found for grouped data, but before we see an example on that, let us first define frequency.

Frequency in statistics means the same as in everyday use of the word. The frequency of an element in a set refers to how many of that element there are in the set. The frequency can be from 0 to as many as possible. If you're told that the frequency of an element $a$ is 3, that means that there are 3 $a$'s in the set.

Example 2

Find the mean of the set of ages in the table below

Age (years)    Frequency
10             0
11             8
12             3
13             2
14             7

Solution

The first step is to find the total number of ages, which we shall call $n$. Since it will be tedious to count all the ages, we can find $n$ by adding up the frequencies:

$$n = 0 + 8 + 3 + 2 + 7 = 20$$

Next we need to find the sum of all the ages. We can do this in two ways: we can add up each individual age, which will be a long and tedious process; or we can use the frequency to make things faster.

Since we know that the frequency represents how many of that particular age there are, we can just multiply each age by its frequency, and then add up all these products.

$$(10 \times 0) + (11 \times 8) + (12 \times 3) + (13 \times 2) + (14 \times 7) = 0 + 88 + 36 + 26 + 98 = 248$$

The last step is to find the mean by dividing the sum by $n$

$$\text{Mean} = \frac{248}{20} = 12.4$$
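The frequency shortcut can be checked with a short Python sketch. The function name `mean_from_frequencies` is an assumption made for illustration; the (age, frequency) pairs are the ones in the table for Example 2.

```python
def mean_from_frequencies(table):
    """Mean of data given as (value, frequency) pairs."""
    n = sum(freq for _, freq in table)                  # total number of elements
    if n == 0:
        raise ValueError("all frequencies are zero")
    total = sum(value * freq for value, freq in table)  # sum of value x frequency
    return total / n


# (age, frequency) pairs from the table in Example 2.
ages = [(10, 0), (11, 8), (12, 3), (13, 2), (14, 7)]
age_mean = mean_from_frequencies(ages)
print(age_mean)  # 12.4
```

Multiplying each age by its frequency gives the same total as writing out all 20 ages and adding them one by one.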

Population Mean vs Sample Mean

In the Introduction to Statistics section, we defined a population and a sample, whereby a sample is a part of a population.

In statistics there are two kinds of means: population mean and sample mean. A population mean is the true mean

of the entire population of the data set while a sample mean is the mean of a small sample of the population. These

different means appear frequently in both statistics and probability and should not be confused with each other.

The population mean is represented by the Greek letter $\mu$ (pronounced mu) while the sample mean is represented by $\bar{x}$ (pronounced x bar). The total number of elements in a population is represented by $N$ while the number of elements in a sample is represented by $n$. This leads to an adjustment in the formula we gave above for calculating the mean.

$$\mu = \frac{1}{N}\sum_{i=1}^{N} a_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} a_i$$

The sample mean is commonly used to estimate the population mean when the population mean is unknown. This is because the sample mean is an unbiased estimator: its expected value equals the population mean.

Median

The median is defined as the number in the middle of a given set of numbers arranged in order of increasing magnitude. When given a set of numbers, the median is the number positioned in the exact middle of the list when you arrange the numbers from the lowest to the highest. The median is also a measure of average. In higher-level statistics, the median also serves as the centre for measures of dispersion such as the median absolute deviation. The median is important because it splits the ordered set in half and is not pulled toward extreme values the way the mean is.

Example 3

Find the median in the set of numbers given below

Solution

From the definition of median, we should be able to tell that the first step is to rearrange the given set of numbers in

order of increasing magnitude, i.e. from the lowest to the highest

Then we inspect the set to find that number which lies in the exact middle.

Let's try another example to emphasize something interesting that often occurs when solving for the median.

Example 4

Find the median of the given data

Solution

As in the previous example, we start off by rearranging the data in order from the smallest to the largest.

Next we inspect the data to find the number that lies in the exact middle.



We can see from the above that we end up with two numbers (4 and 5) in the middle. We can solve for the median by finding the mean of these two numbers as follows:

$$\text{Median} = \frac{4 + 5}{2} = 4.5$$
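Both cases, a single middle number and two middle numbers, can be handled by one small Python function. This is a minimal sketch: the name `median` and the two lists are assumptions chosen for illustration, not the data sets from Examples 3 and 4.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    ordered = sorted(values)   # arrange from the lowest to the highest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # exactly one middle element
    return (ordered[mid - 1] + ordered[mid]) / 2   # two middle elements: take their mean


# Illustrative data (assumptions, not the sets from the examples).
odd_median = median([9, 3, 44, 17, 15])   # 15
even_median = median([1, 4, 5, 9])        # (4 + 5) / 2 = 4.5
print(odd_median, even_median)
```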

Mode

The mode is defined as the element that appears most frequently in a given set of elements. Using the definition of

frequency given above, mode can also be defined as the element with the largest frequency in a given data set.

For a given data set, there can be more than one mode. As long as those elements all have the same frequency and

that frequency is the highest, they are all the modal elements of the data set.

Example 5

Find the Mode of the following data set.

Solution

Mode = 3 and 15
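Finding every mode, including ties, amounts to counting each element and keeping those with the highest count. The sketch below uses `collections.Counter` from the Python standard library; the function name `modes` and the list `data` are assumptions, with the data chosen so that it has the same two modes as the example.

```python
from collections import Counter


def modes(values):
    """Return every element that shares the highest frequency, in increasing order."""
    counts = Counter(values)            # element -> frequency
    if not counts:
        return []
    highest = max(counts.values())      # the largest frequency in the data set
    return sorted(v for v, c in counts.items() if c == highest)


# Illustrative data (an assumption): 3 and 15 both appear three times.
data = [3, 3, 3, 7, 15, 15, 15, 20]
print(modes(data))  # [3, 15]
```

Returning a list rather than a single value keeps data sets with more than one mode from being reported as if they had only one.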

Mode for Grouped Data

As we saw in the section on data, grouped data is divided into classes. We have defined the mode as the element which has the highest frequency in a given data set. In grouped data, we can find two kinds of mode: the Modal Class, or class with the highest frequency, and the mode itself, which we calculate from the modal class using the formula below.

$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

where

$L$ is the lower class limit of the modal class

$f_1$ is the frequency of the modal class

$f_0$ is the frequency of the class before the modal class in the frequency table

$f_2$ is the frequency of the class after the modal class in the frequency table

$h$ is the class interval of the modal class
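As a rough sketch, the calculation can be written in Python assuming the usual textbook form of the grouped-mode formula, Mode = L + (f1 - f0) / (2 f1 - f0 - f2) x h, with the symbols defined above. The function name `grouped_mode` and the numbers passed to it are hypothetical and are not taken from the lesson.

```python
def grouped_mode(lower_limit, f0, f1, f2, class_interval):
    """Estimate the mode inside the modal class.

    Assumes the textbook form: L + (f1 - f0) / (2*f1 - f0 - f2) * h.
    """
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f1 equals f0 + f2")
    return lower_limit + (f1 - f0) / denominator * class_interval


# Hypothetical table: the modal class starts at 20 with frequency 12,
# its neighbours have frequencies 5 (before) and 8 (after), class interval 10.
estimate = grouped_mode(lower_limit=20, f0=5, f1=12, f2=8, class_interval=10)
print(round(estimate, 2))  # 26.36
```

When the modal class is the first or last class in the table, a common convention is to take the missing neighbouring frequency as 0.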

Example 6

Find the modal class and the actual mode of the data set below

Number Frequency

1 - 3 7

4 - 6 6

7 - 9 4



In the section on averages, we learned how to calculate the mean for a given set of data. The data we looked at was ungrouped data and the total number of elements in the data set was not that large. That method is not always a realistic approach, especially if you're dealing with grouped data.

That's where the assumed mean comes into play.

Assumed mean, like the name suggests, is a guess or an assumption of the mean. Assumed mean is most commonly

denoted by the letter a. It doesn't need to be correct or even close to the actual mean and choice of the assumed

mean is at your discretion except for where the question explicitly asks you to use a certain assumed mean value.

Assumed mean is used to calculate the actual mean as well as the variance and standard deviation as we'll see later.

The mean can be calculated from the assumed mean using the following formula:

$$\bar{x} = a + h\,\frac{\sum f_i u_i}{\sum f_i}$$

It's very important to remember that the above formula only applies to grouped data with equal class intervals.

Now let us define each term used in the formula:

$\bar{x}$ is the mean which we're trying to find.

$a$ is the assumed mean.

$h$ is the class interval, which we looked at in the section on data.

$f_i$ is the frequency of each class; we find the total frequency of all the classes in the data set ($\sum f_i$) by adding up all the $f_i$'s.

Each $u_i$ is found from the following formula:

$$u_i = \frac{d_i}{h}$$

where $h$ is the class interval and each $d_i$ is the difference between the midpoint of a class and the assumed mean.

$d_i$ is calculated from the following formula:

$$d_i = x_i - a$$

where $x_i$ is the midpoint of a given class.

$x_i$ is obtained from the following:

$$x_i = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$

$x_i$ is the number in the middle of a given class.

Therefore $u_i$ becomes

$$u_i = \frac{x_i - a}{h}$$

Let's try an example to see how to apply the assumed mean method for finding mean.

Example 1

The student body of a certain school were polled to find out what their hobbies were. The number of hobbies each

student had was then recorded and the data obtained was grouped into classes shown in the table below. Using an

assumed mean of 17, find the mean for the number of hobbies of the students in the school.

Number of hobbies    Frequency
0 - 4                45
5 - 9                58
10 - 14              27
15 - 19              30
20 - 24              19
25 - 29              11
30 - 34              8
35 - 39              2

Solution

We have been given the assumed mean $a$ as 17, and we know the formula for finding the mean from the assumed mean is

$$\bar{x} = a + h\,\frac{\sum f_i u_i}{\sum f_i}$$

We can find the class interval $h$ from the class limits, as the difference between the lower limits of two consecutive classes:

$$h = 5 - 0 = 5$$

We now have one component we need and we're one step closer to finding the mean.

We can solve the rest of this problem using a table in which we find each remaining component of the formula and then substitute at the end:

Hobbies    Frequency $f_i$    $x_i$    $d_i = x_i - a$    $u_i = d_i / h$    $f_i u_i$
0 - 4      45                 2        -15                -3                 -135
5 - 9      58                 7        -10                -2                 -116
10 - 14    27                 12       -5                 -1                 -27
15 - 19    30                 17       0                  0                  0
20 - 24    19                 22       5                  1                  19
25 - 29    11                 27       10                 2                  22
30 - 34    8                  32       15                 3                  24
35 - 39    2                  37       20                 4                  8

$\sum f_i = 200$    $\sum f_i u_i = -205$

Substituting:

$$\bar{x} = 17 + 5 \times \frac{-205}{200} = 17 - 5.125 = 11.875$$

The mean number of hobbies is 11.875.
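A hand-worked table like this is easy to get wrong, so it can help to check it in code. The Python sketch below assumes the step-deviation form mean = a + h x (sum of f u) / (sum of f), with u = (x - a) / h, and compares it with the direct grouped mean (sum of f x) / (sum of f). The function name `mean_by_assumed_mean` is an assumption; the midpoints and frequencies are those of the hobbies table.

```python
def mean_by_assumed_mean(classes, assumed_mean, class_interval):
    """classes is a list of (class midpoint x_i, frequency f_i) pairs."""
    total_f = sum(f for _, f in classes)
    # u_i = (x_i - a) / h, so each term below is f_i * u_i
    total_fu = sum(f * (x - assumed_mean) / class_interval for x, f in classes)
    return assumed_mean + class_interval * total_fu / total_f


# (midpoint, frequency) pairs from the hobbies table.
hobbies = [(2, 45), (7, 58), (12, 27), (17, 30),
           (22, 19), (27, 11), (32, 8), (37, 2)]

by_assumed = mean_by_assumed_mean(hobbies, assumed_mean=17, class_interval=5)
direct = sum(x * f for x, f in hobbies) / sum(f for _, f in hobbies)
print(by_assumed, direct)
```

The two numbers should agree whatever assumed mean is chosen, which makes the direct calculation a quick check on the table.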

Cumulative Frequency, Quartiles and Percentiles

Cumulative Frequency

Cumulative frequency is defined as a running total of frequencies. The frequency of an element in a set refers to how many of that element there are in the set. Cumulative frequency can also be defined as the sum of all previous frequencies up to the current point.

The cumulative frequency is important when analyzing data, where the value of the cumulative frequency indicates the number of elements in the data set that lie at or below the current value. The cumulative frequency is also useful when representing data using diagrams like histograms.

Cumulative Frequency Table

The cumulative frequency is usually observed by constructing a cumulative frequency table. The cumulative

frequency table takes the form as in the example below.

Example 1

The set of data below shows the ages of participants in a certain summer camp. Draw a cumulative frequency table

for the data.


Age (years)    Frequency
10             3
11             18
12             13
13             12
14             7
15             27

Solution:

The cumulative frequency at a certain point is found by adding the frequency at the present point to the cumulative

frequency of the previous point.

The cumulative frequency for the first data point is the same as its frequency since there is no cumulative frequency

before it.

Age (years)    Frequency    Cumulative Frequency
10             3            3
11             18           3 + 18 = 21
12             13           21 + 13 = 34
13             12           34 + 12 = 46
14             7            46 + 7 = 53
15             27           53 + 27 = 80
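The running total in the last column can be produced with `itertools.accumulate` from the Python standard library. In this short sketch the variable names are assumptions; the ages and frequencies are those of the summer camp example.

```python
from itertools import accumulate

# Ages and frequencies from the summer camp example.
ages = [10, 11, 12, 13, 14, 15]
frequencies = [3, 18, 13, 12, 7, 27]

# Each entry is the current frequency plus the cumulative frequency before it.
cumulative = list(accumulate(frequencies))

for age, freq, cum in zip(ages, frequencies, cumulative):
    print(f"{age:>3} {freq:>4} {cum:>4}")
```

The last cumulative frequency (80 here) always equals the total number of elements in the data set.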

Cumulative Frequency Graph (Ogive)

A cumulative frequency graph, also known as an Ogive, is a curve showing the cumulative frequency for a given set

of data. The cumulative frequency is plotted on the y-axis against the data which is on the x-axis for un-grouped

data. When dealing with grouped data, the Ogive is formed by plotting the cumulative frequency against the upper

boundary of the class. An Ogive is used to study the growth rate of data as it shows the accumulation of frequency

and hence its growth rate.



Example 2

Plot the cumulative frequency curve for the data set below


10 5

11 10

12 27

13 18

14 6

15 16

16 38

17 9

Solution:


10 5 5

11 10 5+10 = 15

12 27 15+27 = 42

13 18 42+18 = 60

14 6 60+6 = 66

15 16 66+16 = 82

16 38 82+38 = 120

17 9 120+9 = 129

Percentiles

A percentile is the value below which a certain percentage of a set of data falls. Percentiles are used to observe how much of a given set of data falls within a certain percentage range; for example, the thirtieth percentile indicates the data value that lies at the 30% mark of the entire data set.

Calculating Percentiles

Let us designate a percentile as $P_m$, where $m$ represents the percentile we're finding; for example, for the tenth percentile, $m$ would be 10. Given that the total number of elements in the data set is $N$

Quartiles



The term quartile is derived from the word quarter, which means one fourth of something. Thus a quartile is a certain fourth of a data set. When you arrange a data set in increasing order from the lowest to the highest and then divide this data into four equal groups, the values that separate the groups are the quartiles. There are three quartiles that are studied in statistics.

First Quartile (Q1)

When you arrange a data set in increasing order from the lowest to the highest, then you proceed to divide this data into four groups, the data value at the lower fourth ($\frac{1}{4}$) mark of the data is referred to as the First Quartile.

The First Quartile is equal to the data value at the 25th percentile of the data. The first quartile can also be obtained using the Ogive, whereby you section off the curve into four parts and then the data value that lies at the end of the first part is referred to as the first quartile.

Second Quartile (Q2)

When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the second fourth ($\frac{2}{4}$) mark of the data is referred to as the Second Quartile.

This is equivalent to the data value at the halfway point of all the data and is also equal to the data value at the 50th percentile.

The Second Quartile can similarly be obtained from an Ogive by sectioning off the curve into four, and the data value that lies at the second mark is then referred to as the second quartile. In other words, the data value at the halfway line on the cumulative frequency curve is the second quartile. The second quartile is also equal to the median.

Third Quartile (Q3)

When you arrange a given data set in increasing order from the lowest to the highest and then divide this data into four groups, the data value at the third fourth ($\frac{3}{4}$) mark of the data is referred to as the Third Quartile.

This is equivalent to the data value at the 75th percentile. The third quartile can be obtained from an Ogive by dividing the curve into four and then taking the data value that lies at the $\frac{3}{4}$ mark.

Calculating the Different Quartiles

The different quartiles can be calculated using the same method as with the median.

First Quartile

The first quartile can be calculated by first arranging the data in an ordered list, then dividing the data into two halves. If the total number of elements in the data set is odd, you exclude the median (the element in the middle).

After this you look only at the lower half of the data and find the median of this new subset using the method for finding the median described in the section on averages.

This median will be your First Quartile.

Second Quartile

The second quartile is the same as the median and can thus be found using the same

methods for finding median described in the section on averages.

Third Quartile

The third quartile is found in a similar manner to the first quartile. The difference here is

that after dividing the data into two groups, instead of considering the data in the lower

half, you consider the data in the upper half and then you proceed to find the Median of

this subset of data using the methods described in the section on Averages.

This median will be your Third Quartile.
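The median-of-halves procedure described above can be sketched in Python as follows. The function names and the sample data are assumptions made for illustration. The last line also computes the difference between the third and first quartiles, which this lesson calls the interquartile range.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to split the data into halves")
    half = n // 2
    lower = ordered[:half]
    # When n is odd, the middle element (the median) is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


# Illustrative data (an assumption, not from the lesson).
q1, q2, q3 = quartiles([1, 3, 4, 6, 7, 8, 9])
print(q1, q2, q3)  # 3 6 8
iqr = q3 - q1      # third quartile minus first quartile
```

Other conventions for locating quartiles exist (for example, interpolation between neighbouring values), so software packages may report slightly different numbers for the same data.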

Calculating Quartiles from Cumulative Frequency

As mentioned above, we can obtain the different quartiles from the Ogive, which means that we use the cumulative

frequency to calculate the quartile.

Given that the cumulative frequency for the last element in the data set is given as fc, the quartiles can be calculated

as follows:

The quartile is then located by matching up which element has the cumulative frequency corresponding to the

position obtained above.

Example 3

Find the First, Second and Third Quartiles of the data set below using the cumulative frequency curve.


10 5

11 10

12 27

13 18

14 6

15 16

16 38

17 9

Solution:


10 5 5

11 10 15

12 27 42

13 18 60

14 6 66

15 16 82

16 38 120



17 9 129

From the Ogive, we can see the positions where the quartiles lie and thus can approximate them as follows

Interquartile Range

The interquartile range is the difference between the third quartile and the first quartile.

Dispersion - Deviation and Variance

Dispersion measures how the various elements behave with regard to some sort of central tendency, usually the mean. Measures of dispersion include range, interquartile range, variance, standard deviation and absolute deviation. We've already looked at the first two in the Averages section, so let's move on to the other measures.

Absolute Deviation

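Absolute deviation for a given data set is defined as the average of the absolute differences between the elements of the set and the mean (average deviation) or the median element (median absolute deviation).

The average deviation is calculated as follows:

$$\text{Average deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$

which means that the average deviation is the average of the absolute differences between each element of the data set and the mean.

The median absolute deviation is calculated as follows:

$$\text{Median absolute deviation} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - m \rvert$$

where $m$ is the median of the data set.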

Example 1

The heights of a group of 10 students randomly selected from a given school are as follows (in ft):

5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5

a) Find the absolute deviation from the mean.

b) Find the absolute deviation from the median.

Solution

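a) To find the absolute deviation from the mean, we need to first find the mean of the heights.

We know that the mean is given by:

$$\bar{x} = \frac{\sum x_i}{n}$$

Using the above, we calculate the mean as:

$$\bar{x} = \frac{5.5 + 3.5 + 4.6 + 6.1 + 5.7 + 5.11 + 4.9 + 5.0 + 5.0 + 5.5}{10} = \frac{50.91}{10} = 5.091$$

The mean height is 5.091 ft.

The deviation from the mean for each of the elements in the data set is obtained by subtracting the mean from that element, as follows:

For 5.5:

$$5.5 - 5.091 = 0.409$$

We find all the deviations and then take their average (remember that we only consider their absolute values):

$$\frac{0.409 + 1.591 + 0.491 + 1.009 + 0.609 + 0.019 + 0.191 + 0.091 + 0.091 + 0.409}{10} = \frac{4.91}{10} = 0.491$$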

b) To find the absolute deviation from the median, we need to first find the median height for the data set.

We know that to find the median value, we arrange the elements in the data set in ascending or descending order and then find the element that lies in the middle.

Arranged in ascending order from the smallest to the largest:

3.5, 4.6, 4.9, 5.0, 5.0, 5.11, 5.5, 5.5, 5.7, 6.1

Finding the median:

Since we have an even number of elements in the data set, it comes as no surprise that we're unable to obtain a median by canceling out corresponding elements from each end. We're left with two elements, 5.0 and 5.11, and so we find their mean, which then becomes our median.

$$\text{Median} = \frac{5.0 + 5.11}{2} = 5.055$$

Having obtained our median as 5.055, we can proceed to find the average deviation from the median using the same steps as in the previous question.
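Both parts of this example can be checked with a short Python sketch. It follows this lesson's definition, in which both quantities are the average of the absolute deviations, taken about the mean in part a) and about the median in part b). The function names are assumptions; the heights are the ten values given in the example.

```python
def mean(values):
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def average_absolute_deviation(values, centre):
    """Average of |x - centre| over all elements of the data set."""
    return sum(abs(x - centre) for x in values) / len(values)


# Heights (in ft) from the example.
heights = [5.5, 3.5, 4.6, 6.1, 5.7, 5.11, 4.9, 5.0, 5.0, 5.5]

from_mean = average_absolute_deviation(heights, mean(heights))
from_median = average_absolute_deviation(heights, median(heights))
print(round(mean(heights), 3), round(median(heights), 3))
print(round(from_mean, 3), round(from_median, 3))
```

Results are rounded only when printed, because floating-point arithmetic can leave tiny errors in the last decimal places.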



Variance and Standard Deviation

Variance, as the name suggests, is a measure of how different the elements in a given population are. Variance is used to indicate how spread out these elements are from the mean of the population. There are two kinds of variance: population variance and sample variance.

Population variance is the variance of the entire population and is denoted by $\sigma^2$, while sample variance is the variance of a sample of the population and is denoted by $S^2$.

Standard deviation is the square root of variance. Standard deviation is a measure of how far the elements typically lie from the mean of a population or sample. It is used to indicate trends in the elements in a given data set with respect to the mean, i.e., the spread of these elements from the mean.

Just as we have a population and sample variance, we also have a population and sample standard deviation. Population standard deviation is denoted by $\sigma$ while the sample standard deviation is denoted by $S$.

Although absolute deviation is also a measure of dispersion, variance and standard deviation are often preferred because of the way they're calculated. Calculating variance involves squaring the differences (deviations) between each element and the mean. Squaring makes every deviation non-negative and gives large deviations proportionally more weight, which makes elements far from the mean easier to spot.

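The population variance can be calculated from the following:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ is the population mean.

The sample variance is given by

$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean.

Standard deviation is simply the square root of variance, so we can calculate it by taking the square root of the above variance formulae:

Population standard deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where $\mu$ is the population mean.

Sample standard deviation

$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where $\bar{x}$ is the sample mean.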

The difference in calculating $\sigma^2$ and $S^2$ is that the average is found using the number of elements in the set, $N$, for $\sigma^2$. By contrast, we use one less than the sample size, $n - 1$, for $S^2$. The reason for this is that by using $n - 1$ we ensure that $S^2$ is an unbiased estimator of $\sigma^2$.
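The only difference between the two calculations is the divisor, which a short Python sketch makes concrete. It assumes the standard definitions: divide the sum of squared deviations by the number of elements for a population, and by one less than that for a sample. The function names and the data are assumptions made for illustration.

```python
import math


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


# Illustrative data (an assumption): the mean is 5, the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(data)   # 32 / 8 = 4.0
samp_var = sample_variance(data)      # 32 / 7
pop_sd = math.sqrt(pop_var)           # 2.0
samp_sd = math.sqrt(samp_var)
print(pop_var, pop_sd, round(samp_var, 3), round(samp_sd, 3))
```

Python's standard `statistics` module offers `pvariance`, `variance`, `pstdev` and `stdev` for the same four quantities, which is usually preferable to hand-written code outside of learning exercises.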

Before you can begin to understand statistics, there are four terms you will need to fully understand.

The first term, 'average', is something we have been familiar with from a very early age, when we start analyzing our marks on report cards. We add together all of our test results and then divide the sum by the total number of marks there are. We often call it the average. However, statistically it's the Mean!

The Mean

Example:

Four test results: 15, 18, 22, 20

The sum is: 75

Divide 75 by 4: 18.75

The 'Mean' (Average) is 18.75

(Often rounded to 19)

The Median

The Median is the 'middle value' in your list. When the number of entries in the list is odd, the median is the middle entry in the list after sorting the list into increasing order. When the number of entries is even, the median is equal to the sum of the two middle numbers (after sorting the list into increasing order) divided by two. Thus, remember to line up your values; the middle number is the median! Be sure to remember the odd and even rule.

Examples:

Find the Median of: 9, 3, 44, 17, 15 (Odd amount of numbers)

Line up your numbers: 3, 9, 15, 17, 44 (smallest to largest)

The Median is: 15 (The number in the middle)

Find the Median of: 8, 3, 44, 17, 12, 6 (Even amount of numbers)

Line up your numbers: 3, 6, 8, 12, 17, 44

Add the 2 middle numbers and divide by 2: 8 + 12 = 20, 20 / 2 = 10

The Median is 10.

The Mode

The mode in a list of numbers refers to the number or numbers that occur most frequently. A trick to remember this one is to remember that mode starts with the same first two letters that most does. Most frequently - Mode. You'll never forget that one!

Examples:

Find the mode of:

9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8

Put the numbers in order for ease:

3, 3, 8, 9, 15, 15, 15, 17, 17, 27, 40, 44, 44

The mode is 15, which appears three times.

