Torture numbers, and they'll confess to anything. ~Gregg Easterbrook 98% of all statistics are made up. ~Author Unknown Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. ~Aaron Levenstein Statistics can be made to prove anything - even the truth. ~Author Unknown Lottery: A tax on people who are bad at math. ~Author Unknown He uses statistics as a drunken man uses lampposts - for support rather than for illumination. ~Andrew Lang The theory of probabilities is at bottom nothing but common sense reduced to calculus. ~Laplace, Théorie analytique des probabilités, 1820 I could prove God statistically. Take the human body alone - the chances that all the functions of an individual would just happen is a statistical Statistics – What is it?


Torture numbers, and they'll confess to anything. ~Gregg Easterbrook

98% of all statistics are made up. ~Author Unknown

Statistics are like bikinis. What they reveal is suggestive, but what they conceal is vital. ~Aaron Levenstein

Statistics can be made to prove anything - even the truth. ~Author Unknown

Lottery: A tax on people who are bad at math. ~Author Unknown

He uses statistics as a drunken man uses lampposts - for support rather than for illumination. ~Andrew Lang

The theory of probabilities is at bottom nothing but common sense reduced to calculus. ~Laplace, Théorie analytique des probabilités, 1820

I could prove God statistically. Take the human body alone - the chances that all the functions of an individual would just happen is a statistical monstrosity. ~George Gallup

Statistics are just a way for the mathematician to evangelize his faith. ~Hunter Brinkmeier

There are three kinds of lies: lies, damned lies, and statistics. ~Benjamin Disraeli

Statistics – What is it?


Statistics is the science of using mathematical tools to interpret data.


Lesson Objective
Understand the different ways of describing data
Understand the importance of different sampling techniques when collecting data


The Different Ways of Describing Data

Discrete data

Continuous data

Categorical data

Numerical data

Qualitative data

Quantitative data


The Different Ways of Describing Data

Categorical data: Data that falls into different labelled groups. If the labels are numbers, they carry no numerical worth, so calculating a mean is meaningless.

Discrete data: Data that is digital and takes specific values with gaps in between. A slight improvement in the accuracy of the measuring device does not alter the data.

Continuous data: Data that is analogue and takes a range of values. A slight improvement in the accuracy of the measuring device alters the data collected.

Numerical data: Data based on the size of numbers, where the size of each number has some meaning.

Qualitative data: Data collected on the basis of some quality or categorization, which in some cases may be 'informal' or may use relatively ill-defined characteristics such as warmth and flavour; data that can be observed but not measured.

Quantitative data: Data collected using a measuring scale; data measured or identified on a numerical scale.


Give 3 examples of each type of data:

Discrete data

Continuous data

Categorical data

Numerical data

Qualitative data

Quantitative data


The Different Ways of Describing Data

Categorical data: e.g. types of pet, house number, colour

Discrete data: e.g. shoe size, dice score, type of pet

Continuous data: e.g. time to run a mile, length of a hair

Numerical data: e.g. score on a dice, weight of a lemon

Qualitative data: e.g. "I feel happy", "The weather is good today"

Quantitative data: e.g. the score obtained in a test, the height of a tree


Decide whether each of the following sets of data is categorical or numerical, and if numerical whether it is discrete or continuous.

1) Cards drawn from a set of playing cards: {2 of diamonds, ace of spades, 3 of hearts, etc…}
2) Number of aces in a hand of 13 cards: {1, 2, 3, 4}
3) Time in seconds for 100 metre sprint: {10.05, 12.31, 11.20, 10.67, 11.56, …etc}
4) Fraction of coin tosses which were Heads after 1, 2, 3, … tosses for the following sequence: H T H T T T H H … {1, ½, 2/3, ½, 2/5, 1/3, 3/7, ½, …}
5) Number of spectators at a football match: {23 456, 40 132, 28 320, 18 214, …etc}
6) Day of week when people were born: {Wednesday, Monday, Sunday, Sunday, Saturday, etc…}
7) Times in seconds between ‘blips’ of a Geiger counter in a physics experiment: {0.23, 1.23, 3.03, 0.21, 4.51, …etc}
8) Percentages gained by students for a test out of 60: {20, 78.33, 80, 75, 53.33, …etc}
9) Number of weeds in a 1 m by 1 m square in a biology experiment: {2, 8, 12, 3, 5, 8, …}


Solution
1 and 6 are categorical data; all the others are numerical.
2 - discrete
3 - continuous
4 - discrete, as the possible fractions can be listed
5 - discrete
7 - continuous
8 - discrete, as there are only 61 possible percentage scores (one for each mark from 0 to 60)
9 - discrete, as there must be a whole number of weeds


Different Sampling Techniques

There are many different ways to generate a sample for data collection. Four of the most common are:

Random Sampling

Systematic Sampling

Stratified Sampling

Convenience Sampling

Look at the cards on the next slide and decide which sampling technique is being described. Think of an advantage and a disadvantage for the technique described.


A bag contains 100 names. It is shaken and 30 names are drawn from the bag without looking.

A pollster stands in Huntingdon market square and asks the first 30 people that will listen to her their opinions on a market revamp.

In a survey to assess opinions about Year 10 uniform, a school list is printed and every 10th pupil on the list is selected.

At a local club it is known that ¾ of the membership is female. A sample of 21 females and 7 males is drawn by randomly picking names from a hat.

To find out opinions about a web site, you ask the first 30 people to visit the site to complete a questionnaire using their browser.

To select a sample of 6 people from a class of 30 to do a maths test, the class are lined up in height order and every 5th pupil is selected.

In a class of 20 pupils each pupil is assigned a number and 4 members are selected for a competition by using the random number generator on a calculator.

A secondary school has 3 Key Stages with pupils split between them in the ratio 3:2:3. To survey opinions about the school canteen they interview 30 students from KS3, 20 from KS4 and 30 from KS5.

To investigate the health of whales, a marine biology charity decides to estimate the length of whales in the South Atlantic by measuring the first 10 whales they find.
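The four techniques named earlier can be sketched in Python. This is only an illustration under stated assumptions: the function names, the `pupil_…` labels, the club membership lists and the fixed seeds are all invented for the example, and the sample sizes simply mirror some of the cards.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Random sampling: every member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, step, start=0):
    """Systematic sampling: take every `step`-th member of an ordered list."""
    return population[start::step]

def stratified_sample(strata, total, seed=None):
    """Stratified sampling: sample each group in proportion to its size.

    Assumption: the proportional sizes round to whole numbers that add up
    to `total`; awkward proportions would need a rounding rule.
    """
    rng = random.Random(seed)
    size = sum(len(members) for members in strata.values())
    return {name: rng.sample(members, round(total * len(members) / size))
            for name, members in strata.items()}

def convenience_sample(population, k):
    """Convenience sampling: take whoever comes first."""
    return population[:k]

# Hypothetical school list of 100 pupils.
names = [f"pupil_{i}" for i in range(1, 101)]
random_30 = simple_random_sample(names, 30, seed=1)
every_10th = systematic_sample(names, 10, start=9)   # pupil_10, pupil_20, ...
first_30 = convenience_sample(names, 30)

# Hypothetical club in which three quarters of the members are female.
club = {"female": [f"F{i}" for i in range(75)],
        "male": [f"M{i}" for i in range(25)]}
club_sample = stratified_sample(club, 28, seed=1)
print(len(club_sample["female"]), len(club_sample["male"]))
```

Only the random and stratified versions involve chance; the systematic and convenience samples are fully determined by the order of the list, which is one source of their potential bias.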


Lesson Objective
Understand the three key things required to analyse data


In an experiment pupils were selected randomly from their maths lessons and asked to estimate the area of a triangle and a rectangle. The area of both shapes was 15 cm². The results are shown below (Rec = rectangle estimate, Tr = triangle estimate):

age  gender  Rec:15  Tr:15
11   f       12      11
11   f       10      50
11   m       15      10
11   f       15      16
11   f       18      64
11   f       30      5
11   m       16      25
11   f       18      15
11   f       16      16
11   m       3       4.5
11   f       15      20
11   m       8       12
11   m       8       9
11   f       14      13
11   f       15      11

age  gender  Rec:15  Tr:15
17   f       13      8
17   m       14      16
17   f       15      18
17   m       12      20
17   f       13      12
17   f       16      12
17   m       16      14
17   m       14      10
17   f       15      12
17   f       12      13
17   f       15      13
17   f       10      20
18   m       13      30
18   f       14      15
18   m       18      12
19   f       15      15
19   m       18      15

Analyse this data.


What things could we investigate?


Some nuggets of wisdom:

1) “This shows that the boys had a greater spread of data, meaning that the girls were more accurate.” So spread implies accuracy?

2) “I predict that the girls will be more accurate than the boys at estimating the area as there are more of them and so a greater chance that more will correctly estimate the area.” So the more people you have guessing, the more accurate they will be?

3) “I predict that the boys will be better at estimating as there are fewer, meaning that there is less chance for anomalous results.” So you get the best results by having a small sample size?

Mode: generally useless for this exercise.

Calculating how many got it exactly right is generally useless as the data is continuous. The fact that some people guessed it correctly has more to do with psychology than good estimating skills.

Averaging averages to get an all-embracing average is NEVER a good idea:
Data set 1: 1 and 8
Data set 2: 6
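The warning about averaging averages can be checked numerically. The sketch below assumes the slide's example means that data set 1 holds the values 1 and 8 and data set 2 holds the single value 6; the variable names are illustrative.

```python
from statistics import mean

set_1 = [1, 8]   # assumed reading of "Data set 1"
set_2 = [6]      # assumed reading of "Data set 2"

# Averaging the two averages treats both sets as equally important.
mean_of_means = mean([mean(set_1), mean(set_2)])

# The mean of all three values weights each value equally instead.
pooled_mean = mean(set_1 + set_2)

# Weighting each set's mean by its size recovers the pooled mean.
weighted = (len(set_1) * mean(set_1) + len(set_2) * mean(set_2)) / (len(set_1) + len(set_2))

print(mean_of_means, pooled_mean, weighted)   # 5.25 5 5.0
```

The two answers differ because the sets have different sizes; the mean of means is only safe when every group contains the same number of values.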


Things to consider:

1) Is what they have tried to analyse clearly stated? Is there a hypothesis or some alternative statement explaining what they are trying to achieve? (1 mark)

2) Have they attempted to find an average? Is it the most appropriate average for the task? Is the average calculated properly? (1 mark relevant average, 1 mark accuracy)

3) Have they attempted to look at the consistency of the data? Have they used an appropriate method to measure consistency? Is their measure of consistency (range, IQR) calculated properly? (1 mark relevant measure, 1 mark accuracy)

4) Have they drawn a graph or chart to help show the distribution of the data? (1 mark relevant graph, 1 mark accuracy)

5) Have they written a final comment that refers to their initial statement/hypothesis and that attempts to provide a conclusion? Does the final comment agree with their actual maths? Have they referred to/tied their maths to the conclusion? (E.g. the mean of … for boys was greater than the mean for girls … therefore …) Does the conclusion comment on both consistency and averages? Is there anything in the conclusion to suggest deeper analysis? Is there anything that makes you go "that's clever, I like that!"? (3 marks: you judge!)


When we are analysing numerical data we are interested in 3 things:

1)The Location (Size) of the data

2)The variation (Spread) of the data

3)The shape (Distribution) of the data


1) The Location (Size) of the data

We use averages for this purpose:

Mean

Mode

Median

Mid Range


2) The variation (Spread) of the data

Range

Inter-quartile Range

Standard deviation/Root Mean Squared Deviation


3) The shape (Distribution) of the data

We use graphs for this purpose:

Stem and Leaf diagrams

Box and Whisker Plots

Bar Charts

Histograms


Lesson Objective
Revise basic graph types and their uses
Focus on drawing and interpreting histograms


This data set is the heights of a group of 38 'A' level students.

1) How tall is the shortest person in the sample?
2) How many girls are in the sample?
3) What is the range of the boys' heights?
4) What is the median height of the girls?
5) What is the inter-quartile range of the boys' heights?

GIRLS BOYS


The Pie Charts show how Year 10 and 11 students travel to school.

From the Pie Charts:

a) Can you tell if more boys or girls walk to school?

b) If the angle for walking in the girls' section is 18 degrees and represents 10 pupils, how many girls were surveyed?


This histogram illustrates the time students in a form group take to get to school in the morning.

a) Find the number of students in the class.
b) Estimate the probability that a randomly chosen pupil takes between 10 and 20 minutes to get to school.


Question 1
The table below shows the heights, to the nearest centimetre, of a group of students.

height (cm)   frequency
110-119       2
120-129       4
130-134       3
135-139       5
140-149       6
150-159       5
160-179       5
180-189       1

a) Draw a histogram for this data.
b) Use your histogram to estimate the number of students taller than 153 cm.
c) Estimate the number of students between 127 and 143 cm tall.


The class width of the first bar would appear to be 9, but it is not. Because the heights are measured to the nearest centimetre, the first class embraces all heights between 109.5 cm and 119.5 cm. This is a class width of 10, and it also means labelling 109.5, 119.5 etc. on the horizontal axis of the histogram. Adding the frequency density row (frequency ÷ class width) to the table:

height (cm)   frequency   frequency density
110-119       2           0.2
120-129       4           0.4
130-134       3           0.6
135-139       5           1
140-149       6           0.6
150-159       5           0.5
160-179       5           0.25
180-189       1           0.1

b) To find how many students are above 153 cm in height, we add the frequencies of the last two bars to the correct proportion of the previous bar:

$\frac{6.5}{10} \times 5 + 5 + 1 = 9.25$

So there are approximately 9 students above 153 cm.

c) The number of students between 127 and 143 cm tall is given by:

$\frac{2.5}{10} \times 4 + 3 + 5 + \frac{3.5}{10} \times 6 = 11.1$

So there are approximately 11 students between 127 and 143 cm tall.
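The frequency density calculation and the area-based estimates can be reproduced with a short script. This is a sketch, not the slide's own method written out: the helper names `build_bars` and `estimate_count` are invented here, and the script assumes that students are spread evenly within each class, which is the same assumption made when reading a proportion of a bar.

```python
# (lowest recorded height, highest recorded height, frequency), heights to the nearest cm
classes = [(110, 119, 2), (120, 129, 4), (130, 134, 3), (135, 139, 5),
           (140, 149, 6), (150, 159, 5), (160, 179, 5), (180, 189, 1)]

def build_bars(classes):
    """Return (lower boundary, upper boundary, frequency density) for each class."""
    bars = []
    for low, high, freq in classes:
        lower, upper = low - 0.5, high + 0.5   # true class boundaries
        bars.append((lower, upper, freq / (upper - lower)))
    return bars

def estimate_count(bars, a, b):
    """Estimate how many values lie between a and b as the histogram area over [a, b]."""
    total = 0.0
    for lower, upper, density in bars:
        overlap = max(0.0, min(b, upper) - max(a, lower))
        total += overlap * density
    return total

bars = build_bars(classes)
densities = [round(d, 2) for _, _, d in bars]
taller_than_153 = estimate_count(bars, 153, 189.5)
between_127_and_143 = estimate_count(bars, 127, 143)

print(densities)
print(round(taller_than_153, 2), round(between_127_and_143, 2))
```

Working with areas rather than bar heights is what makes unequal class widths comparable: each bar's area equals its frequency.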


2) Complete the table and histogram below.

time (minutes)   frequency
0-15             90
15-20            40
20-25
25-35


time (minutes)   frequency   frequency density
0-15             90          6
15-20            40          8
20-25            80          16
25-35            100         10


For each chart type, give the most suitable data type(s) (discrete or continuous, numerical or categorical), the advantages and the disadvantages:

Bar Chart

Pie Chart

Stem and Leaf

Box and Whisker

Histogram


Bar Chart
Most suitable data type: categorical, discrete
Advantages: easy to see how many are in each category; shows shape well
Disadvantages: can't see proportions so easily

Pie Chart
Most suitable data type: categorical, discrete
Advantages: shows proportions clearly
Disadvantages: can't see how many are in each category; not good if there are too many categories

Stem and Leaf
Most suitable data type: numerical, small data sets, continuous or discrete
Advantages: keeps the raw data; shape of data is clear; ordered data helps with medians etc.
Disadvantages: not good for large data sets

Box and Whisker
Most suitable data type: numerical, continuous data
Advantages: good for showing/comparing the spread of data
Disadvantages: loses raw data

Histogram
Most suitable data type: numerical, continuous data
Advantages: good for showing the shape of the data and the proportions
Disadvantages: can't read actual frequencies for the groups easily


Lesson Objective
Be able to calculate measures of Location/Averages
Understand summation notation for the mean

What is an average and why do we have more than one way of calculating them?


These quotes might help you consider the answer to this question:

“Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable.” ~Bobby Bragan, 1963

“The average human has one breast and one testicle.” ~Des McHale

“I abhor averages. I like the individual case. A man may have six meals one day and none the next, making an average of three meals per day, but that is not a good way to live.” ~Louis D. Brandeis


Averages for raw/untabulated data

The data shows the number of chocolates gratefully provided to a particular maths teacher by his sixth form classes over a 3 week period:

Find the mean, mode, median and mid-range of the number of gifts received:

1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3


0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5

Median: 1

Mode: 1

Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{26}{15} = 1.73$
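The four averages for the chocolate data can be checked with Python's standard `statistics` module. The variable names are illustrative, and the mid-range is computed here as the midpoint of the smallest and largest values.

```python
from statistics import mean, median, multimode

gifts = [1, 2, 0, 3, 5, 1, 2, 0, 0, 4, 1, 1, 2, 1, 3]

gift_mean = mean(gifts)                         # sum of values / number of values
gift_median = median(gifts)                     # middle value of the ordered list
gift_modes = multimode(gifts)                   # most common value(s), as a list
gift_mid_range = (min(gifts) + max(gifts)) / 2  # halfway between the extremes

print(round(gift_mean, 2), gift_median, gift_modes, gift_mid_range)
```

`multimode` returns a list because a data set can have more than one mode; here it contains a single value.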


Averages for tabulated data

Find the mean, mode, median and mid-range for this data, showing shoe size:

shoe size (x)   frequency (f)
5               3
6               14
7               13
8               21
9               16
10              8
Total           75


shoe size (x)   frequency (f)   frequency × shoe size (fx)
5               3               15
6               14              84
7               13              91
8               21              168
9               16              144
10              8               80
Total           75              582

Median: there are 75 items of data, so the median is at the (75 + 1)/2 = 38th position. Counting through the list, the median shoe size is 8.

Mode: 8

Mean: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{582}{75} = 7.76$
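The same averages can be computed directly from the frequency table without writing out all 75 values. This is a sketch: the dictionary layout and the helper names `value_at` and `table_median` are assumptions made for the example.

```python
shoe_sizes = {5: 3, 6: 14, 7: 13, 8: 21, 9: 16, 10: 8}   # shoe size -> frequency

n = sum(shoe_sizes.values())
sum_fx = sum(x * f for x, f in shoe_sizes.items())
table_mean = sum_fx / n

# The mode is the value with the largest frequency.
table_mode = max(shoe_sizes, key=shoe_sizes.get)

def value_at(table, position):
    """Return the value at a 1-based position in the ordered data, using cumulative frequency."""
    running = 0
    for x in sorted(table):
        running += table[x]
        if running >= position:
            return x
    raise ValueError("position is beyond the end of the data")

def table_median(table):
    """Median at position (n + 1)/2; for even n, average the two middle values."""
    count = sum(table.values())
    lower = value_at(table, (count + 1) // 2)
    upper = value_at(table, count // 2 + 1)
    return (lower + upper) / 2

table_mid_range = (min(shoe_sizes) + max(shoe_sizes)) / 2

print(n, sum_fx, table_mean, table_mode, table_median(shoe_sizes), table_mid_range)
```

Counting through cumulative frequencies (3, 17, 30, 51, …) is the same step as "counting through the list" to the 38th position.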


Averages for tabulated data

Find the mean, mode, median and mid-range for this data, showing speeds of vehicles along a road:

speed, s (mph)   number of vehicles (f)
20 ≤ s < 25      7
25 ≤ s < 30      11
30 ≤ s < 35      31
35 ≤ s < 40      20
40 ≤ s < 45      14
45 ≤ s < 50      9
Total            92


speed, s (mph)   number of vehicles (f)   mid-point (x)   frequency × mid-point (fx)
20 ≤ s < 25      7                        22.5            157.5
25 ≤ s < 30      11                       27.5            302.5
30 ≤ s < 35      31                       32.5            1007.5
35 ≤ s < 40      20                       37.5            750
40 ≤ s < 45      14                       42.5            595
45 ≤ s < 50      9                        47.5            427.5
Total            92                                       3240

Median: there are 92 items of data, so the median is at the (92 + 1)/2 = 46.5th position. Counting through the list, this will be in the 30 to 35 interval. This is only an estimate; use a cumulative frequency curve instead for better accuracy.

Modal interval: 30 ≤ s < 35

Mean: this can only be estimated, as the raw data is not available: $\bar{x} = \frac{\sum fx}{\sum f} = \frac{3240}{92} \approx 35.2$ mph
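For grouped data the mean is estimated by treating every vehicle as if it travelled at the mid-point of its class. The sketch below follows that idea; the tuple layout and the function name `median_class` are assumptions made for illustration.

```python
# (lower bound, upper bound, number of vehicles) for each class 'lower <= s < upper'
speed_classes = [(20, 25, 7), (25, 30, 11), (30, 35, 31),
                 (35, 40, 20), (40, 45, 14), (45, 50, 9)]

total_f = sum(f for _, _, f in speed_classes)
total_fx = sum(f * (low + high) / 2 for low, high, f in speed_classes)
estimated_mean = total_fx / total_f

# All classes have the same width, so the modal class is simply the most frequent one.
modal_class = max(speed_classes, key=lambda c: c[2])[:2]

def median_class(classes):
    """Return the class containing the (n + 1)/2-th value, using cumulative frequency."""
    n = sum(f for _, _, f in classes)
    position = (n + 1) / 2
    running = 0
    for low, high, f in classes:
        running += f
        if running >= position:
            return (low, high)

print(total_f, total_fx, round(estimated_mean, 2))
print(modal_class, median_class(speed_classes))
```

With unequal class widths the modal class would have to be chosen by frequency density rather than raw frequency.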


Lesson Objective
Be able to calculate Interquartile Range for a list of data
Drawing and Interpreting Box and Whisker Plots
Understanding Skewness and identifying outliers


Two classes did a test (out of 100). Here are the results:

Class A: 50 82 40 51 45 50 48 49 47 10 43 58 56 52 39 16

Class B: 20 34 50 48 62 70 39 47 12 38 40

a) Find the median and interquartile range of the set of marks for each class.
b) Draw a box and whisker plot to compare the results for each class.


The results in order:

Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82

Class B: 12 20 34 38 39 40 47 48 50 62 70

[Figure: box and whisker plots for CLASS A and CLASS B on a common scale from 0 to 100]

Class A: Median = 48.5, IQ Range = 10, negatively skewed
Class B: Median = 40, IQ Range = 16, positively skewed


A piece of data is generally considered an outlier if it is:

more than 1.5 × IQR below the lower quartile

OR

more than 1.5 × IQR above the upper quartile

Class A: 10 16 39 40 43 45 47 48 49 50 50 51 52 56 58 82
Class B: 12 20 34 38 39 40 47 48 50 62 70

Are there any outliers in each class?
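The outlier rule can be applied with a few lines of Python. Quartile conventions vary, so this sketch makes an explicit assumption: each quartile is the median of the lower or upper half of the ordered data, leaving out the middle value when the number of items is odd. That choice reproduces the IQ ranges of 10 and 16 quoted for these classes; other conventions (including those built into calculators or libraries) can give slightly different values. The function names are illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves convention (needs at least 2 values)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]   # middle value excluded when n is odd
    return median(lower), median(ordered), median(upper)

def find_outliers(data):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]

class_a = [50, 82, 40, 51, 45, 50, 48, 49, 47, 10, 43, 58, 56, 52, 39, 16]
class_b = [20, 34, 50, 48, 62, 70, 39, 47, 12, 38, 40]

for name, marks in (("Class A", class_a), ("Class B", class_b)):
    q1, med, q3 = quartiles(marks)
    print(name, "Q1 =", q1, "median =", med, "Q3 =", q3,
          "IQR =", q3 - q1, "outliers =", find_outliers(marks))
```

Under this convention the fences are 26.5 and 66.5 for Class A and 10 and 74 for Class B.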


Design a data set for one of the box and whisker charts on the next page.
Swap with a partner. They must design a data set to recreate your graph as best as possible.
Compare at the end.


Lesson Objective
Be able to calculate the Standard Deviation for a set of data
Use a calculator to find the Standard Deviation for a set of data

Write down some statements to compare these two sets of data. Which features are the same and which are different?


Here is the actual data. How does this clash with your previous assumptions?


Consider the following set of numbers.
Find the Range, the Interquartile Range and the Mean.
What are the limitations of the Range and the Interquartile Range in measuring consistency in a data set?

4, 5, 9, 6, 6, 10, 10, 10, 11, 19


The Root Mean Squared Deviation (commonly called the Standard Deviation of a Sample)

R.M.S.D. = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

The value of this expression before you take the square root is referred to as the VARIANCE.

The Standard Deviation for a Population

It can be shown that the Root Mean Squared Deviation formula, when calculated on a sample taken from a population, generally produces a result that is lower than the actual Standard Deviation of the population (this is covered in S3 and S4). The formula can therefore be adjusted as follows to take this into account:

S.D. of Population = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

The value of this expression before you take the square root is still referred to as the VARIANCE.

NOTE: FOR OUR SYLLABUS IT IS EXPECTED THAT YOU WILL ALWAYS USE THE BOTTOM FORMULA WHEN ASKED TO CALCULATE STANDARD DEVIATION!!
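The difference between the two formulas (dividing by n or by n − 1) can be seen numerically. The sketch below uses the ten numbers from the earlier consistency question purely as example data; `pstdev` and `stdev` are the standard-library functions that divide by n and n − 1 respectively, and the other names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

data = [4, 5, 9, 6, 6, 10, 10, 10, 11, 19]

n = len(data)
x_bar = mean(data)
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations from the mean

rmsd = sqrt(sum_sq_dev / n)        # root mean squared deviation (divide by n)
sd = sqrt(sum_sq_dev / (n - 1))    # the 'bottom formula' (divide by n - 1)

print(x_bar, sum_sq_dev, round(rmsd, 3), round(sd, 3))

# The standard library gives the same two values.
print(round(pstdev(data), 3), round(stdev(data), 3))
```

Dividing by n − 1 always gives the larger of the two results, and the gap shrinks as the sample grows.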


Find the standard deviation for this set of data


Lesson Objective
Understand the concept of ‘Coding’
Be able to find the mean and standard deviation of ‘coded’ data and related data sets


Here is some data. We will call this data the ‘x’ data:

Find the mean and the standard deviation of this data. Check your results on your calculator.


Investigation

Suppose you multiply each of the data values you just used by 2 and add 3. Write down the new set of data. Call it the y-data.
Now calculate the mean and the standard deviation of the y-data.
What do you notice? How is it related to the original x-data?

What if you multiply by 2 and add 5?

What if you multiply by 3 and add 5?

Can you predict what will happen if you multiply by ‘a’ and add ‘b’? Can you justify your results?


Suppose you have a set of values (x-data) x1, x2, x3, x4, x5, …

Let the mean of the set of data be ‘m’ and the standard deviation ‘s’.

Let another set of values (y-data) be related to the x-data by a linear formula of the form yi = a × xi + b (‘a’ and ‘b’ are constants).

Then:

The mean of the y values = a × mean of ‘x-data’ + b
The standard deviation of the y values = a × standard deviation of ‘x-data’ (for positive ‘a’; in general the factor is the size of ‘a’, ignoring its sign)

We can use this to find the mean and standard deviation of related sets of data. This process is called ‘Coding’.

E.g. Consider the values 1002, 1004, 1006, 1008, 1010.

This data set is merely the data set 1, 2, 3, 4, 5 multiplied by 2 and with 1000 added. The mean of 1, 2, 3, 4, 5 is 3 and the sd of 1, 2, 3, 4, 5 is 1.58,

so the mean of the original data is 2 × 3 + 1000 = 1006 and the sd of the original data is 2 × 1.58 = 3.16.
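The coding rules can be verified on the worked example. The sketch uses the standard-library `stdev` (which divides by n − 1); the names `a`, `b`, `x` and `y` follow the notation of the linear formula, and everything else is illustrative.

```python
from math import isclose
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
a, b = 2, 1000
y = [a * xi + b for xi in x]      # 1002, 1004, 1006, 1008, 1010

# Adding b shifts the mean but not the spread; multiplying by a scales both.
predicted_mean = a * mean(x) + b
predicted_sd = abs(a) * stdev(x)  # abs() covers the case of a negative multiplier

print(mean(y), predicted_mean)
print(round(stdev(y), 2), round(predicted_sd, 2))
```

Adding a constant moves every value by the same amount, so the deviations from the mean are unchanged; that is why ‘b’ does not appear in the standard deviation rule.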


Ex 50 Book S1 Third Edition


Lesson Objective
Recognise and be able to use the alternative formula for standard deviation.

75 adults were asked for their shoe size. The results are recorded in the table below. Calculate the standard deviation of the shoe sizes using the formula:

$\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

shoe size (x)   frequency (f)
5               3
6               14
7               13
8               21
9               16
10              8
Total           75

Check your result using your calculator.


shoe size (x)   frequency (f)   x × f   $(x - \bar{x})^2 f$
5               3               15      22.8528
6               14              84      43.3664
7               13              91      7.5088
8               21              168     1.2096
9               16              144     24.6016
10              8               80      40.1408
Total           75              582     139.68

Mean = 582 ÷ 75 = 7.76

sd = √(139.68 ÷ 74) = 1.37


shoe size (x)    frequency (f)    x × f    x² × f
5                3                15       75
6                14               84       504
7                13               91       637
8                21               168      1344
9                16               144      1296
10               8                80       800
Total            75               582      4656

An alternative (rearrangement) of the formula:

$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

is:

$$sd = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$$

This gives the same answer but is slightly easier to use when the data is in a frequency table:

Mean = 582 ÷ 75 = 7.76

sd = √((4656 − 75 × 7.76²) ÷ 74) = 1.37
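As a check that the rearranged formula really does agree with the original one, the sketch below computes the standard deviation from the x² × f totals and compares it with `statistics.stdev` applied to the 75 individual values. Expanding the table into a raw list is an assumption made purely for the comparison.

```python
from math import sqrt
from statistics import stdev

shoe_sizes = [5, 6, 7, 8, 9, 10]
frequencies = [3, 14, 13, 21, 16, 8]

n = sum(frequencies)
mean = sum(x * f for x, f in zip(shoe_sizes, frequencies)) / n
sum_fx2 = sum(f * x ** 2 for x, f in zip(shoe_sizes, frequencies))  # total of the x² × f column

# Alternative formula: (sum of x² - n * mean²) / (n - 1), then square root
sd_alt = sqrt((sum_fx2 - n * mean ** 2) / (n - 1))

# Expand the frequency table into the raw data and use the library routine
raw = [x for x, f in zip(shoe_sizes, frequencies) for _ in range(f)]
sd_check = stdev(raw)

print(sum_fx2, round(sd_alt, 2), round(sd_check, 2))
```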


50 female students had their heights measured. The results were put into the table below. Find the mean height and the standard deviation of the heights:

Height, h (cm): mid-points    frequency (f)
158.5                         4
160.5                         11
162.5                         19
164.5                         8
166.5                         5
168.5                         3
Total                         50

Check your result using your calculator.


50 female students had their heights measured. The results were put into the table below. Find the mean height and the standard deviation of the heights:

Height, h (cm): mid-points    frequency (f)
158.5                         4
160.5                         11
162.5                         19
164.5                         8
166.5                         5
168.5                         3
Total                         50

Check your result using your calculator.

Mean = 8141 ÷ 50 = 162.82 cm

sd = 2.57 cm


Different style of exam question

Standard deviation formulae:

$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$$

Given the following information relating to data placed in a frequency distribution.

Find the mean and the standard deviation of the data.


Different style of exam question

Standard deviation formulae:

$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$$

Given the following information relating to data placed in a frequency distribution.

Find the mean and the standard deviation of the data.

Mean = 6.1

sd = 2.25 (3 sig fig)


Lesson Objective:
Understand what cumulative frequency curves represent.
Be able to draw a cumulative frequency curve.
Use a cumulative frequency curve to find medians, quartiles and percentiles.


Weight of the egg, w (grams)    Frequency
30 ≤ w < 40                     15
40 ≤ w < 50                     25
50 ≤ w < 60                     50
60 ≤ w < 70                     40
70 ≤ w < 80                     10

An egg farmer wants to grade his eggs in terms of size. Grade A will be the biggest size of egg, Grade B the next biggest, and so on, with Grade D the smallest. Each grade should contain the same proportion of eggs.

The table shows the weights of his first batch of eggs. What ‘boundaries’ should he choose for each egg grade?


Weight of the egg, w (grams)    Cum. freq.
0 ≤ w < 40                      15
0 ≤ w < 50                      40
0 ≤ w < 60                      90
0 ≤ w < 70                      130
0 ≤ w < 80                      140

Weight of the egg, w (grams)    Frequency
30 ≤ w < 40                     15
40 ≤ w < 50                     25
50 ≤ w < 60                     50
60 ≤ w < 70                     40
70 ≤ w < 80                     10

With 140 eggs, the quartiles sit at roughly the 35th (LQ), 70th (median) and 105th (UQ) values.

LQ could be found by saying 40 + 20/25 of 10 = 48

Median: 50 + 30/50 of 10 = 56

UQ: 60 + 15/40 of 10 = 63.75

But this approach assumes the eggs are spread evenly within each interval, so that the cumulative frequency grows linearly across it.
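The interpolation used for the three quartiles can be written as one small routine. The sketch below is an illustration under the same assumption of an even spread within each class; the function name `value_at` is hypothetical, and the positions n/4, n/2 and 3n/4 follow the slide's choice of the 35th, 70th and 105th values.

```python
from itertools import accumulate

# Egg weight classes (lower bound of each 10 g class) and their frequencies
lower_bounds = [30, 40, 50, 60, 70]
class_width = 10
frequencies = [15, 25, 50, 40, 10]

# Running totals give the cumulative frequency at the end of each class
cumulative = list(accumulate(frequencies))

def value_at(position):
    """Weight at a cumulative position, interpolating linearly inside the class."""
    below = 0  # cumulative frequency before the current class
    for lower, f, cum in zip(lower_bounds, frequencies, cumulative):
        if position <= cum:
            return lower + (position - below) / f * class_width
        below = cum
    raise ValueError("position exceeds the total frequency")

n = cumulative[-1]
lq, median, uq = (value_at(n * q) for q in (0.25, 0.5, 0.75))
print(cumulative, lq, median, uq)
```

These three values are candidate grade boundaries, since they split the batch into four groups of equal size.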


Weight of the egg, w (grams)    Frequency
30 ≤ w < 40                     15
40 ≤ w < 50                     25
50 ≤ w < 60                     50
60 ≤ w < 70                     40
70 ≤ w < 80                     10

[Blank axes for the cumulative frequency curve: weight (grams) from 30 to 80 on the horizontal axis, cumulative frequency from 0 to 140 on the vertical axis]

a) How many eggs did the farmer harvest on this particular day?

b) Estimate the median weight of the eggs collected.

c) Estimate the inter-quartile range of the weights of the eggs collected.

Weight of the egg, w (grams)    Cum. freq.
0 ≤ w < 40
0 ≤ w < 50
0 ≤ w < 60
0 ≤ w < 70
0 ≤ w < 80


[Cumulative frequency curve: time in mins from 30 to 60 on the horizontal axis, cumulative frequency from 0 to 100 on the vertical axis]

The graph shows how long people waited to be seen at an eye clinic. Key points:

Cumulative frequency goes up the side.

The horizontal axis has a continuous scale.

You plot cumulative frequency at the end of each interval: (35, 10), (40, 21), etc.

A cumulative frequency graph tells you how many items are below each value. Here 80 people waited for less than 53 mins. It is mainly used to estimate medians and percentiles for grouped data.

Waiting time, w (mins)    Cum. freq.
0 ≤ w < 35                10
0 ≤ w < 40                21
0 ≤ w < 45                46
0 ≤ w < 50                73
…etc.                     …etc.

There were 100 people. The median waiting time is the time read off the curve at the 50th person (half of 100), which is about 46 mins.

To find the upper quartile, read the time at 75. For the lower quartile, read the time at 25.
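Reading a value off the eye clinic curve can be approximated numerically. The sketch below is an assumption-laden illustration: it joins the four plotted points given in the table with straight lines (the slide's curve is smooth, so results may differ slightly), and the function name `time_at` is hypothetical. Points beyond 50 mins are not listed on the slide, so the upper quartile is not computed.

```python
# (time in mins, cumulative frequency) points listed on the slide
points = [(35, 10), (40, 21), (45, 46), (50, 73)]

def time_at(cum_freq):
    """Time at which the cumulative frequency reaches cum_freq (straight-line joins)."""
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if c0 <= cum_freq <= c1:
            return t0 + (cum_freq - c0) / (c1 - c0) * (t1 - t0)
    raise ValueError("cumulative frequency is outside the tabulated points")

median = time_at(50)          # 50th person out of 100
lower_quartile = time_at(25)  # 25th person out of 100
print(round(median), round(lower_quartile, 1))
```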


Can you find data sets to match these cumulative frequency curves?


Summary of what we have learned:

When comparing data we are interested in the location of the data (averages), the consistency of the data (measures of spread) and the shape of the data (graphs).

Averages: A single item of data that represents the whole data set. Mean, Mode, Median, Mid Range

Spread: Range, Interquartile Range, Root Mean Squared Deviation, Standard Deviation

Shape: Bar Charts, Frequency Charts, Histograms, Frequency Polygons

Can also draw:
Box and Whisker Plots (good for showing skewness and spread)
Pie Charts (good for showing proportions)
Cumulative Frequency Curves (good for finding the Interquartile Range for grouped data)

The formula for the variance is that for the standard deviation without the square root.

Outliers are defined as values that are either more than 1.5 × IQR above the UQ or below the LQ, or more than 2 standard deviations above or below the mean.

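Standard deviation formulae:

$$sd = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}}$$

Root mean squared deviation formulae:

$$rmsd = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$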
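The only difference between the two pairs of formulae is the divisor: n − 1 for the standard deviation and n for the root mean squared deviation. A minimal sketch, reusing the shoe-size data from earlier and assuming Python's `statistics` module (`stdev` divides by n − 1, `pstdev` divides by n):

```python
from math import sqrt
from statistics import pstdev, stdev

shoe_sizes = [5, 6, 7, 8, 9, 10]
frequencies = [3, 14, 13, 21, 16, 8]

# Expand the frequency table into the 75 individual values
raw = [x for x, f in zip(shoe_sizes, frequencies) for _ in range(f)]
n = len(raw)
mean = sum(raw) / n

sd = stdev(raw)      # divisor n - 1
rmsd = pstdev(raw)   # divisor n

# The rearranged rmsd formula: sqrt(sum of x² / n - mean²)
rmsd_alt = sqrt(sum(x ** 2 for x in raw) / n - mean ** 2)

print(round(sd, 2), round(rmsd, 2), round(rmsd_alt, 2))
```

For a sample of this size the two measures differ only in the second decimal place, and the gap shrinks further as n grows.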