marley-pitcock
HANDLING DATA COURSEWORK
Main Menu
What is Coursework?
Specify and Plan
Collect, Process & Represent
Interpret and Discuss
What You Should Do?
School Database
Planning the Investigation
Sample
Mean, Median, Mode and Range
Pie Charts
Bar Charts
Scatter Plots
Histograms and Freq Polygons
Stem and Leaf Plots
Cumulative Frequency
Box and Whisker Plots
Your Task
Your task is given in detail on the task sheet.
Basically, your task is to:
“investigate what influences the amount a student drinks.”
The database has been selected for you from Rondam Secondary school.
What Will Happen
A MIX OF THE FOLLOWING:
• Direct Teaching – statistics skills, ICT, investigation cycle
• Group Work – planning, discussing, plagiarism?
• Individual Time – writing up, working
Specify and Plan
[Diagram: Investigation cycle. Hypothesis → Specify and plan → Collect, process and represent → Interpret and discuss → How could you make it better?]
What to do in this section?
Examine the Writing Frame and what decisions you must make to fill it in.
Decide on the hypothesis you are going to test. Make sure it is well explained.
Write a clear and detailed description of the task and your plan to test the hypothesis.
Do a draft first. Your final write up will come later.
Collect, Process and Represent
What to do in this section?
Collect the data – fully explain your sampling technique and sample size.
Tabulate the data. Only include the information relevant to your hypothesis.
Use statistical and graphical methods to process and examine the data.
Interpret and Discuss
What to do in this section?
This is the big crunch section.
Draw conclusions from all of your calculations and relate these to your initial hypothesis.
Make sure you:
• Compare results to show differences/similarities.
• Use facts and statistics taken directly from your calculations.
• Evaluate your approach and explain any changes you would make if you were doing it again.
• Consider bias in your results.
Challenge
What will a good piece of maths investigative work look like?
You should consider:
• What will it contain?
• How will it be presented?
• How will it be marked?
• What will it look like?
15 mins in groups of 5 or 6
And Now …….
Formulating a hypothesis
The first step in planning a statistical enquiry is to decide what problem you want to explore.
This can be done by asking questions that you want your data to answer and by stating a hypothesis.
A hypothesis is a statement that you believe to be true but that you have not yet tested.
The plural of hypothesis is hypotheses.
For example,
“Year Eleven pupils with paid jobs don’t do as well in their exams.”
Forming a hypothesis
How could you find out if this statement is true?
Think about:
What data (information) would you need to collect?
How will you collect it?
Which Year Elevens does this statement cover? How could you ensure the data you collect represents all of these Year Elevens?
What would you do with the data?
What would you expect to find?
Key vocabulary
hypothesis – a statement that can be tested
population – the group (often of people) referred to in the hypothesis
sample – a selection from the population
biased sample – an unfair selection
representative sample – a fair selection
cross section – a selection that reflects all the subgroups within the population
objective data – information that is not affected by people’s opinions
Key vocabulary
subjective data – information that is affected by people’s opinions
primary data – information you collect yourself, by asking people, measuring, carrying out experiments, and so on
secondary data – information that has been collected already, that you get from books, the internet, and so on
ethical issues – problems to do with confidentiality and personal questions
reliable results – results that will be repeated if the experiment or survey is carried out again with a new sample
Extending a hypothesis
Once you have collected data and drawn conclusions about your hypothesis, you could ask further questions and pursue other lines of enquiry.
You will need to plan what these might be beforehand if you are carrying out a survey. For example,
How could you extend these hypotheses?
What extra information might it be worth collecting?
“People feel stressed when they have exams.”
“You get less work done when it is noisy.”
“Sleep deprivation affects concentration.”
“Coffee can help you revise better.”
“The more revision you do, the better your exam results.”
Sampling – Soap Wars
How are TV viewing figures compiled?
[Figure: "Viewing figures" chart comparing Westenders and Carnation Street by month (Jan to Aug); vertical axis in millions, 2 to 12]
Television viewing figures
When compiling television viewing figures, it is impractical to find out what everyone in the country is watching at a particular time.
Instead, the viewing habits of a sample of households are carefully monitored and the data collected is used to compile the figures.
To avoid bias, it is important that the sample is representative of all television viewing households across the country.
This is done by dividing households into categories and taking samples in proportion to the size of each category.
This is an example of a stratified sample.
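The idea of taking samples in proportion to the size of each category can be sketched in Python. This is a minimal illustration only: the function name `stratified_allocation`, the category names and the household counts are all made up for the example and are not real viewing data.

```python
def stratified_allocation(category_sizes, sample_size):
    """Share out sample_size across categories in proportion to their size."""
    total = sum(category_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in category_sizes.items()
    }

# Hypothetical household categories (invented numbers for illustration).
households = {"one person": 300, "couple": 500, "family": 200}
allocation = stratified_allocation(households, 50)
print(allocation)
```

With other numbers the rounded shares may not add up exactly to the sample size, so a real survey would need to adjust one or two categories by hand.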
Different sampling methods
Random sampling
People are chosen at random, e.g. names picked from a hat or using a random number generator on a calculator.
Every member of the population has an equal chance of being chosen.
Systematic sampling
Members of the population are chosen at regular intervals, such as every 100th person from a telephone directory.
Quota sampling
You keep asking until you have enough people from each category. An example would be a survey in the street where you stop when you have enough people from each age category.
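Random and systematic sampling can be tried out in a few lines of Python. The sketch below assumes a hypothetical population of 200 pupils identified by number; the function names and the choice of seed are illustrative only.

```python
import random

# Hypothetical population: 200 pupils numbered 1 to 200 (an assumption).
population = list(range(1, 201))

def random_sample(items, k, seed=None):
    """Random sampling: every member has an equal chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(items, k)

def systematic_sample(items, step, start=0):
    """Systematic sampling: take every step-th member, beginning at index start."""
    return items[start::step]

chosen_at_random = random_sample(population, 20, seed=1)
every_tenth = systematic_sample(population, 10, start=4)
print(sorted(chosen_at_random))
print(every_tenth)
```

Note that once `start` is fixed, the systematic sample is completely determined, which is why it is not strictly random.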
Evaluating different sampling methods
Random sampling
Every member of the population has an equal chance of being chosen, which makes it fair.
It can be very time consuming and usually impractical.
Systematic sampling
You are unlikely to get a biased sample.
It is not strictly random: some members of the population cannot be chosen once you have decided where to start on the list.
Evaluating different sampling methods
Quota sampling
This is easier to manage.
It could be biased. For example, if you are only asking people on the street or in a shop, the sample might not represent people at work all day.
Stratified sampling
It is the best way to reflect the population accurately.
It is time consuming and you have to limit the number of relevant variables to make it practical.
The three averages and range
There are three different types of average:
MEAN: sum of values ÷ number of values
MEDIAN: middle value
MODE: most common value
The range is not an average, but tells you how the data is spread out:
RANGE: largest value – smallest value
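The definitions of the three averages and the range can be written out directly in Python. This is a sketch for checking hand calculations; the data set is invented for illustration, and when two values tie for most common this `mode` simply returns the one that appears first.

```python
from collections import Counter

def mean(values):
    """Sum of values divided by number of values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data (halfway between the two middle values if even)."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def mode(values):
    """Most common value (first one seen if there is a tie)."""
    return Counter(values).most_common(1)[0][0]

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Hypothetical data chosen for illustration.
data = [2, 4, 4, 5, 10]
print(mean(data), median(data), mode(data), value_range(data))
```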
Comparing sets of data
Here is a summary of Chris and Rob’s performance in the 200 metres over a season. They each ran 10 races.
         Chris           Rob
Mean     24.8 seconds    25.0 seconds
Range    1.4 seconds     0.9 seconds
Which of these conclusions are correct?
• Robert is more reliable.
• Robert is better because his mean is higher.
• Chris is better because his range is higher.
• Chris must have run a better time for his quickest race.
• On average, Chris is faster but he is less consistent.
Pie charts
A pie chart is a circle divided up into sectors which are representative of the data.
In a pie chart, each category is shown as a fraction of the circle.
For example, in a survey half the people asked drove to work, a quarter walked and a quarter went by bus.
[Pie chart: Methods of travel to work, with sectors Car, Walk and Bus]
Pie charts
To convert raw data into angles for n data items:
360° ÷ n represents the number of degrees per data item.
For example, 40 people take part in a survey. What angle represents
one person? 360° ÷ 40 = 9°
There are 9° per person.
two people? 9° × 2 = 18°
eight people? 9° × 8 = 72°
How many people are represented by an angle of 36°?
36° ÷ 9° = 4 people.
Drawing pie charts
There are 30 people in the survey and 360º in a full pie chart. Each person is therefore represented by 360º ÷ 30 = 12º.
We can now calculate the angle for each category:
Newspaper        No. of people    Working     Angle
The Guardian     8                8 × 12º     96º
Daily Mirror     7                7 × 12º     84º
The Times        3                3 × 12º     36º
The Sun          6                6 × 12º     72º
Daily Express    6                6 × 12º     72º
Total            30                           360º
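The working in the table can be checked with a short Python sketch. The function name `pie_angles` is made up for this example; the newspaper counts are the ones from the survey above.

```python
def pie_angles(counts):
    """Convert frequencies into pie chart angles (360 degrees shared in proportion)."""
    total = sum(counts.values())
    degrees_per_item = 360 / total
    return {name: count * degrees_per_item for name, count in counts.items()}

newspapers = {
    "The Guardian": 8,
    "Daily Mirror": 7,
    "The Times": 3,
    "The Sun": 6,
    "Daily Express": 6,
}
angles = pie_angles(newspapers)
for name, angle in angles.items():
    print(name, angle)
print("Total:", sum(angles.values()))
```

A useful check on any pie chart calculation is that the angles add up to 360º.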
Drawing pie charts
Once the angles have been calculated you can draw the pie chart.
Start by drawing a circle using a compass.
Draw a radius.
Measure an angle of 96º from the radius using a protractor and label the sector (The Guardian).
Measure an angle of 84º from the last line you drew and label the sector (The Daily Mirror).
Repeat for each sector until the pie chart is complete: 36º (The Times), 72º (The Sun) and 72º (The Daily Express).
Drawing bar charts
When drawing a bar chart remember:
Give the bar chart a title.
Label both the axes.
Use equal intervals on the axes.
Leave a gap between each bar.
Drawing bar charts
Use the data in the frequency table to complete a bar chart showing the number of children absent from school from each year group on a particular day.
Year    Number of absences
7       74
8       53
9       32
10      11
11      10
Bar charts for two sets of data
Two or more sets of data can be shown on a bar chart.
For example, this bar chart shows favourite subjects for a group of boys and girls.
[Bar chart: Girls' and boys' favourite subjects. Horizontal axis: Favourite subject (Maths, Science, English, History, PE). Vertical axis: Number of pupils (0 to 8). Two bars per subject: Girls and Boys.]
Frequency diagrams
Frequency diagrams can be used to display grouped continuous data.
For example, this frequency diagram shows the distribution of heights for a group of students:
[Frequency diagram: Heights of students. Horizontal axis: Height (cm), 150 to 185 in steps of 5. Vertical axis: Frequency, 0 to 35.]
This type of frequency diagram is often called a histogram.
Drawing frequency diagrams
Use the data in the frequency table to complete the frequency diagram showing the time pupils spent watching TV on a particular evening:
Time spent (hours)    Number of people
0 ≤ h < 1             4
1 ≤ h < 2             6
2 ≤ h < 3             8
3 ≤ h < 4             5
4 ≤ h < 5             3
h ≥ 5                 1
Histograms and Frequency Polygons
We can show the trend of these graphs more clearly using a FREQUENCY POLYGON.
Using a previous example, you first need to draw a histogram.
[Histogram: Heights of Year 8 pupils. Horizontal axis: Height (cm), 140 to 175 in steps of 5. Vertical axis: Frequency, 0 to 35.]
Then join the midpoints of the tops of each column.
Scatter graphs
What does this scatter graph show?
[Scatter graph: Horizontal axis: Number of cigarettes smoked in a week (0 to 120). Vertical axis: Life expectancy (50 to 85).]
It shows that life expectancy decreases as the number of cigarettes smoked increases.
This is called a negative correlation.
Interpreting scatter graphs
Scatter graphs can show a relationship between two variables.
This relationship is called correlation.
Correlation is a general trend. Some data items will not fit this trend, as there are often exceptions to a rule. They are called outliers.
Scatter graphs can show:
positive correlation: as one variable increases, so does the other variable
negative correlation: as one variable increases, the other variable decreases
zero correlation: no linear relationship between the variables.
Correlation can be weak or strong.
[Three scatter graphs, each with both axes from 0 to 25: strong positive correlation, strong negative correlation, weak negative correlation]
The line of best fit
The line of best fit is drawn by eye so that there are roughly an equal number of points below and above the line.
Look at these examples:
[Scatter graph with both axes from 0 to 25: weak positive correlation]
Notice that the stronger the correlation, the closer the points are to the line.
If the gradient is positive, the correlation is positive and if the gradient is negative, then the correlation is also negative.
Line of best fit
When drawing the line of best fit remember the following points:
• The line does not have to pass through the origin.
• For an accurate line of best fit, find the mean for each variable. This forms a coordinate, which can be plotted. The line of best fit should pass through this point.
• The line of best fit can be used to predict one variable from another.
• It should not be used for predictions outside the range of data used.
• The equation of the line of best fit can be found using the gradient and intercept.
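A line drawn by eye can be compared with a calculated one. The sketch below uses the least-squares method, which is a swapped-in technique rather than the by-eye method described above; it is shown because the least-squares line always passes through the point formed by the two means. The data values and function names are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Least-squares line y = gradient * x + intercept through the mean point."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

def predict(x, gradient, intercept, xs):
    """Predict y from x, refusing to go outside the range of the data used."""
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the data")
    return gradient * x + intercept

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
gradient, intercept = line_of_best_fit(xs, ys)
print(gradient, intercept)
print(predict(3, gradient, intercept, xs))  # x = 3 is the mean of xs
```

A positive gradient here matches a positive correlation, and the prediction at the mean of `xs` returns the mean of `ys`.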
Constructing stem-and-leaf diagrams
The data below represents the numbers of cigarettes smoked in a week by regular smokers in Year 11.
7 38 41 22 20 7 5 24 17 15 13 23 45 7 11 17 30 19 5 10 30 20
Put this data into a stem-and-leaf diagram.
The stem should represent tens and the leaf should represent units.
Work out the mode, mean, median and range.
Stem (tens)    Leaf (units)
0              5 5 7 7 7
1              0 1 3 5 7 7 9
2              0 0 2 3 4
3              0 0 8
4              1 5
Calculations with stem-and-leaf diagrams
Mode
The mode is 7.
Mean
There are 22 people in the survey and they smoke a total of 426 cigarettes a week.
426 ÷ 22 ≈ 19
Median
The median is halfway between 17 and 19.
This is 18.
Range
45 – 5 = 40
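These calculations can be checked with Python's standard `statistics` module. The sketch below builds the stem-and-leaf rows from the 22 values listed for the Year 11 smokers and then works out the four measures; the variable names are illustrative. The listed values add up to 426, giving a mean of about 19.

```python
from collections import defaultdict
from statistics import mean, median, mode

# The 22 values listed for the Year 11 smokers.
cigarettes = [7, 38, 41, 22, 20, 7, 5, 24, 17, 15, 13,
              23, 45, 7, 11, 17, 30, 19, 5, 10, 30, 20]

# Stem = tens digit, leaf = units digit, leaves in ascending order.
stems = defaultdict(list)
for value in sorted(cigarettes):
    stems[value // 10].append(value % 10)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))

total = sum(cigarettes)
print("mode:", mode(cigarettes))
print("mean:", total, "/", len(cigarettes), "=", round(mean(cigarettes), 1))
print("median:", median(cigarettes))
print("range:", max(cigarettes) - min(cigarettes))
```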
Solving problems with stem-and-leaf diagrams
What fraction of the group smoke more than 20 cigarettes a week? What is this as a percentage?
The mean number smoked is 19. How many smoke less than the mean? What is this as a percentage?
What percentage smoke less than 10 cigarettes?
A packet of 20 cigarettes costs about £4. Work out the average amount spent on cigarettes using the median.
Stem (tens)    Leaf (units)
0              5 5 7 7 7
1              0 1 3 5 7 7 9
2              0 0 2 3 4
3              0 0 8
4              1 5
Cumulative Freq - Choosing class intervals
You are going to record how long each member of your class can keep their eyes open without blinking.
How could this information be recorded?
What practical issues might arise?
Time is an example of continuous data.
You will have to decide how accurately to measure the times,
to the nearest tenth of a second?
to the nearest second?
to the nearest five seconds?
You will also have to decide what size class intervals to use.
Holding Your Breath
When continuous data is grouped into class intervals it is important that no values are missed out and that there are no overlaps.
For example, you may decide to use class intervals with a width of 5 seconds.
If everyone holds their breath for more than 30 seconds the first class interval would be more than 30 seconds, up to and including 35 seconds.
This is usually written as 30 < t ≤ 35, where t is the time in seconds.
The next class interval would be 35 < t ≤ 40.
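Placing each time into a class interval of the form lower < t ≤ upper can be sketched as follows. The function name, the default start of 30 seconds and width of 5 seconds follow the example above, while the list of times is invented for illustration.

```python
import math
from collections import Counter

def class_interval(t, start=30, width=5):
    """Return (lower, upper) with lower < t <= upper, so no value is missed or counted twice."""
    if t <= start:
        raise ValueError("time must be more than the start of the first interval")
    upper = start + width * math.ceil((t - start) / width)
    return upper - width, upper

# Hypothetical times in seconds.
times = [31.2, 35.0, 35.1, 42.7, 40.0, 58.9]
tally = Counter(class_interval(t) for t in times)
for (lower, upper), frequency in sorted(tally.items()):
    print(f"{lower} < t <= {upper}: {frequency}")
```

A time of exactly 35 seconds falls in 30 < t ≤ 35, while 35.1 seconds falls in 35 < t ≤ 40, so the boundaries do not overlap.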
Cumulative frequency
Cumulative frequency is a running total. It is calculated by adding up the frequencies up to that point.
Here are the results of 100 people holding their breath:
Time in seconds    Frequency    Time in seconds    Cumulative frequency
30 < t ≤ 35        9            0 < t ≤ 35         9
35 < t ≤ 40        12           0 < t ≤ 40         9 + 12 = 21
40 < t ≤ 45        24           0 < t ≤ 45         21 + 24 = 45
45 < t ≤ 50        28           0 < t ≤ 50         45 + 28 = 73
50 < t ≤ 55        16           0 < t ≤ 55         73 + 16 = 89
55 < t ≤ 60        11           0 < t ≤ 60         89 + 11 = 100
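The running total can be reproduced with `itertools.accumulate` from Python's standard library. The frequencies are those of the 100 people holding their breath; the variable names are illustrative.

```python
from itertools import accumulate

# Upper boundary of each class interval and its frequency.
upper_bounds = [35, 40, 45, 50, 55, 60]
frequencies = [9, 12, 24, 28, 16, 11]

# Each cumulative frequency is the previous total plus the next frequency.
cumulative = list(accumulate(frequencies))

for upper, total in zip(upper_bounds, cumulative):
    print(f"0 < t <= {upper}: {total}")
```

The pairs of upper boundary and cumulative frequency are the points that would be plotted on a cumulative frequency graph.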
Plotting a cumulative frequency graph
[Cumulative frequency graph: Horizontal axis: Time in seconds (30 to 60). Vertical axis: Cumulative frequency (0 to 100).]
The upper boundary for each class interval is plotted against its cumulative frequency.
A smooth curve is then drawn through the points.
We can use the graph to estimate the median by finding the time for the 50th person.
This gives us a median time of 47 seconds.
The interquartile range
Remember, the range is a measure of spread. It is the difference between the highest value and the lowest value.
When the range is affected by outliers it is often more appropriate to use the interquartile range.
The interquartile range is the range of the middle 50% of the data.
The lower quartile is the data item ¼ of the way along the list.
The upper quartile is the data item ¾ of the way along the list.
interquartile range = upper quartile – lower quartile
Finding the interquartile range
The cumulative frequency graph can be used to locate the upper and lower quartiles and so find the interquartile range.
[Cumulative frequency graph: Horizontal axis: Time in seconds (30 to 60). Vertical axis: Cumulative frequency (0 to 100).]
The lower quartile is the time of the 25th person: 42 seconds.
The upper quartile is the time of the 75th person: 51 seconds.
The interquartile range is the difference between these two values.
51 – 42 = 9 seconds
A box-and-whisker diagram
A box-and-whisker diagram, or boxplot, can be used to illustrate the spread of the data in a given distribution using the median, the lower quartile and the upper quartile.
These values can be found from a cumulative frequency graph.
[Cumulative frequency graph: Horizontal axis: Time in seconds (30 to 60). Vertical axis: Cumulative frequency (0 to 100).]
For example, for this cumulative frequency graph showing the results of 100 people holding their breath,
Minimum value = 30
Lower quartile = 42
Median = 47
Upper quartile = 51
Maximum value = 60
A box-and-whisker diagram
The corresponding box-and-whisker diagram is as follows:
[Box-and-whisker diagram: minimum value 30, lower quartile 42, median 47, upper quartile 51, maximum value 60]
Lap times
James takes part in karting competitions and his Dad records his lap times on a spreadsheet.
The track is 1108 metres long. James’ fastest time in a race was 51.8 seconds.
One of the karting tracks is at Shenington. In 2004, 378 of James’ lap times were recorded.
In which position in the list would the median lap time be?
There are 378 lap times and so the median lap time will be the ((378 + 1) ÷ 2)th value ≈ 190th value.
Lap times
In which position in the list would the lower quartile be?
There are 378 lap times and so the lower quartile will be the ((378 + 1) ÷ 4)th value ≈ 95th value.
In which position in the list would the upper quartile be?
There are 378 lap times and so the upper quartile will be the (3 × (378 + 1) ÷ 4)th value ≈ 284th value.
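The three positions follow the same (n + 1) rule, so they can be worked out together. This is a small sketch with a made-up function name; it returns the exact positions, which are then rounded to the nearest whole value as in the working above.

```python
def quartile_positions(n):
    """Positions of the quartiles and median in an ordered list of n values."""
    return {
        "lower quartile": (n + 1) / 4,
        "median": (n + 1) / 2,
        "upper quartile": 3 * (n + 1) / 4,
    }

positions = quartile_positions(378)
for name, position in positions.items():
    print(name, position)
```

For 378 lap times this gives 94.75, 189.5 and 284.25, which correspond to roughly the 95th, 190th and 284th values.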
Lap times at Shenington karting circuit
James’ lap times are displayed in the following cumulative frequency graph.
[Cumulative frequency graph: Horizontal axis: Lap times in seconds (52 to 92). Vertical axis: Cumulative frequency (0 to 400).]
Box and whisker plot for James’ race times
[Box-and-whisker diagram: minimum value 52, lower quartile 53, median 54, upper quartile 58, maximum value 91]
What conclusions can you draw about James’ performance?