blake-yates
Statistics
Types of Data
• Categorical data is data which has worded categories, eg ‘Ways of getting to school’ might have the categories bus, car, walk, bike.
• Quantitative data is numerical data. It will either be discrete or continuous.
• Discrete data is countable, eg number of peas in a pod, number of pupils in a class.
• Continuous data is measurable, eg lengths of leaves, heights of people, weights of animals etc.
Definitions
• The population is the group about which you wish to collect information.
• A census involves collecting data from every person in the population.
• A sample involves collecting data from those in a part of the population.
• A sample should be unbiased and therefore representative of the entire population.
• A biased sample is one which has been unfairly influenced by the collection process.
Ways of Displaying Data
1. Bar Graph (for discrete data)
2. Histogram (for continuous data)
3. Dot Plot
4. Strip Graph
5. Pictogram
6. Pie Graph
7. Stem and Leaf
8. Box and Whisker
9. Line Graph
* = most important ones (six of the nine are starred on the slide)
Frequency Tables
Here are the number of questions the 27 students in 9HRT got correct out of the 8 ‘Do Now’ revision questions at the start of a lesson:
5, 5, 4, 6, 4, 4, 5, 5, 6, 4, 4, 3, 7, 4, 6, 5, 5, 5, 4, 6, 5, 6, 4, 5, 4, 5, 5
We illustrate this in a frequency table as follows:
Number of questions correct    Tally Marks      Frequency
3                              |                1
4                              ||||| ||||       9
5                              ||||| ||||| |    11
6                              |||||            5
7                              |                1
Total                                           27
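The tallying step can also be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is an assumption, not something from the slides, and the scores are copied from the list above.

```python
from collections import Counter

# Scores of the 27 students in 9HRT, copied from the slide.
scores = [5, 5, 4, 6, 4, 4, 5, 5, 6, 4, 4, 3, 7, 4, 6, 5, 5, 5,
          4, 6, 5, 6, 4, 5, 4, 5, 5]

def frequency_table(data):
    """Return (value, frequency) pairs sorted by value (hypothetical helper)."""
    counts = Counter(data)
    return sorted(counts.items())

table = frequency_table(scores)
for value, freq in table:
    # One stroke per student as a simple tally.
    print(f"{value:>2}  {'|' * freq:<11}  {freq}")
print("Total", sum(freq for _, freq in table))
```

The total row is a useful check: it should equal the number of students surveyed.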
[Bar graph: 'Do Now' Questions Correct. x-axis: Number Correct; y-axis: Frequency]
Discrete Data
For discrete data, we graph the information in a bar graph.
A bar graph must have:
• a title
• labels on both axes
• scale on both axes
• separate bars
• x-axis numbers in centre of the bar
• y-axis scale starting at 0 with evenly spaced numbers
• a good size – at least ⅓ page
Outliers
Outliers are data values that are either much larger or much smaller than the general body of data.
Outliers appear separated from the body of data on a graph.
[Graph: Number of peas in pods from my garden. y-axis: Frequency]
Grouped Discrete Data
A kindergarten was concerned about the number of cars passing by between 8.45 am and 9.00 am. Over 30 week days they recorded data.
The results were: 27, 30, 17, 13, 46, 23, 40, 28, 38, 24, 23, 22, 18, 29, 16, 35, 24, 8, 24, 44, 32, 52, 31, 39, 32, 19, 41, 38, 24, 32
In situations like this, it is necessary to group the data into class intervals.
Frequency Table: [table]
A histogram is then drawn as shown below.
[Histogram: Cars passing by kindergarten. x-axis: Number of cars (0 to 60); y-axis: Frequency]
Note the difference in the scale on the horizontal axis.
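As a rough sketch of the grouping step, the counts per class interval can be worked out as below. The class width of 10 (0-9, 10-19, and so on) is an assumption suggested by the axis scale, and `grouped_frequencies` is a hypothetical helper name.

```python
# Cars passing the kindergarten on each of the 30 week days (from the slide).
cars = [27, 30, 17, 13, 46, 23, 40, 28, 38, 24, 23, 22, 18, 29, 16,
        35, 24, 8, 24, 44, 32, 52, 31, 39, 32, 19, 41, 38, 24, 32]

def grouped_frequencies(data, width=10):
    """Count values in the class intervals 0-9, 10-19, ... (assumed width of 10)."""
    top = max(data) // width
    counts = [0] * (top + 1)
    for x in data:
        counts[x // width] += 1
    # Each row is (lowest value in class, highest value in class, frequency).
    return [(i * width, i * width + width - 1, c) for i, c in enumerate(counts)]

for low, high, count in grouped_frequencies(cars):
    print(f"{low:>2} - {high:<2}  {count}")
```

A different class width would give a different table, so the choice of interval should be stated alongside the histogram.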
Continuous Data
For continuous data, we graph the information in a histogram.
Example: 2 students lie in the 60 kg up to but not including 70 kg, 7 students lie in the 70 kg up to but not including 80 kg, 9 students lie in the 80 kg up to but not including 90 kg, 5 students lie in the 90 kg up to but not including 100 kg, 3 students lie in the 100 kg up to but not including 110 kg.
The frequency table would be:
Weight (kg)                 Frequency
60 up to (not incl.) 70     2
70 up to (not incl.) 80     7
80 up to (not incl.) 90     9
90 up to (not incl.) 100    5
100 up to (not incl.) 110   3
The graph would be:
[Histogram: Weights of boys in the rugby squad. y-axis: Frequency]
Note: The bars are JOINED
A histogram must have:
• a title
• labels on both axes
• scale on both axes
• bars joined
• x-axis numbers at the join of the bars
• y-axis scale starting at 0 with evenly spaced numbers
• a good size – at least ⅓ page
Averages
An average is a number that is typical of the data.
In maths we use three different types of average:
1. mean = total of all values ÷ number of values
2. median = middle value
3. mode = most common value
Example: the mean of 5, 0, 8, 1, 0, 4, 3, 0, 2, 2
mean = (5 + 0 + 8 + 1 + 0 + 4 + 3 + 0 + 2 + 2) ÷ 10 = 25 ÷ 10 = 2.5
The median is the middle value when they are all placed in order.
For an odd number of data, the median is the one in the middle.
For an even number of data, the median is between the two middle values.
Example: The median of: 4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4
In order, the data is: 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8
There are 13 bits of data so the median is the 7th bit
median = 5
Example: The median of 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8
There are 14 bits of data so the median is between the 7th and 8th bit
median is (5+6) ÷ 2 = 5.5
The mode is the value that occurs most often
Example: The mode of 4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4
The mode = 6
Example: The mode of 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8
This data set has two modes. The modes are 6 and 7. We say that the data is bimodal.
Pg 308 Ex 10F.1
If a data set has more than 2 modes, we do not use the mode as a measure of the middle.
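The worked examples for the three averages can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only, and `multimode` needs Python 3.8 or later.

```python
import statistics

# Data sets copied from the worked examples above.
mean_example = [5, 0, 8, 1, 0, 4, 3, 0, 2, 2]
odd_example = [4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4]      # 13 values
even_example = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8]  # 14 values

# mean = total of all values / number of values
print("mean:", statistics.mean(mean_example))

# median sorts the data itself, then takes the middle value
# (or the midpoint of the two middle values for an even count)
print("median (odd count):", statistics.median(odd_example))
print("median (even count):", statistics.median(even_example))

# multimode returns every value that shares the highest frequency
print("mode(s):", statistics.multimode(odd_example))
print("mode(s):", statistics.multimode(even_example))
```

`multimode` is used rather than `mode` because it reports all tied values, which makes a bimodal data set easy to spot.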
Measures of Spread
The range is the difference between the largest and the smallest value, ie range = highest value – lowest value.
eg: 2, 5, 8, 3, 5, 7, 1, 9, 4, 8, 4, 6, 7, 8, 4, 9
The range is 9 - 1 = 8
The inter-quartile range is the difference between the upper quartile and lower quartile, ie I.Q.R. = upper quartile – lower quartile
What’s a quartile? That’s easy. It’s just the median of each half of the data.
Quartiles
eg 1: 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9
18 values puts the median between the 9th and 10th values, ie median = 5
The lower quartile is ¼ of the way along the data, which is the middle of the left hand half, ie LQ = 2
The upper quartile is ¾ of the way along the data, which is the middle of the right hand half, ie UQ = 6
Inter-quartile range = UQ – LQ = 6 – 2 = 4
eg 2: 1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 12
17 bits of data, so median is the 9th bit of data, ie median = 6
Each half (not counting the median) has 8 bits of data, so the quartiles are between the 4th and 5th bit of data.
LQ = 2.5
UQ = 8.5
IQR = 6
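The 'median of each half' rule used in eg 1 and eg 2 can be sketched as follows. The function name `quartiles` is hypothetical, and this follows the method on the slides (the median is left out of both halves when the count is odd); other textbooks and software may define quartiles slightly differently.

```python
import statistics

def quartiles(data):
    """Return (LQ, median, UQ) using the median-of-each-half method.

    For an odd number of values the median itself is left out of both halves.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

eg1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9]   # 18 values
eg2 = [1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 12]    # 17 values

for data in (eg1, eg2):
    lq, med, uq = quartiles(data)
    print("LQ =", lq, "median =", med, "UQ =", uq,
          "IQR =", uq - lq, "range =", max(data) - min(data))
```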
Stem & Leaf Graph
The time spent (in minutes) by 20 people in a queue at a bank has been recorded as follows:
3.6  2.1  3.8  2.2  4.5  1.4  0
0    1.6  4.8  1.5  1.9  0    3.6
5.2  2.7  3.0  0.8  3.8  3.2

Time spent queueing at the bank

unsorted
5 | 2
4 | 5 8
3 | 6 8 6 0 8 2
2 | 1 2 7
1 | 4 6 5 9
0 | 0 0 0 8

sorted
5 | 2
4 | 5 8
3 | 0 2 6 6 8 8
2 | 1 2 7
1 | 4 5 6 9
0 | 0 0 0 8

Key: 2 | 7 = 2.7 minutes
median = (2.2 + 2.7) ÷ 2 = 2.45 mins
LQ = 1.45 mins
UQ = 3.7 mins
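Building the sorted plot can be sketched in Python as below. The helper name `stem_and_leaf` is an assumption; the stem is taken as the whole minutes and the leaf as the tenths, matching the key 2 | 7 = 2.7 minutes.

```python
from collections import defaultdict

# Queue times in minutes for the 20 people, copied from the slide.
times = [3.6, 2.1, 3.8, 2.2, 4.5, 1.4, 0, 0, 1.6, 4.8,
         1.5, 1.9, 0, 3.6, 5.2, 2.7, 3.0, 0.8, 3.8, 3.2]

def stem_and_leaf(data):
    """Map each stem (whole minutes) to its sorted leaves (tenths of a minute)."""
    plot = defaultdict(list)
    for x in data:
        tenths = round(x * 10)  # round to avoid floating-point slips
        plot[tenths // 10].append(tenths % 10)
    return {stem: sorted(leaves) for stem, leaves in plot.items()}

plot = stem_and_leaf(times)
for stem in sorted(plot, reverse=True):
    # No commas between the leaves, as the slide rules require.
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

A stem with no data would be skipped by this sketch; on paper such a stem is still written with an empty row.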
A stem and leaf graph must have:
• a title
• labels (if back-to-back)
• a key
• the leaf numbers in columns
• numerical order (sorted)
• no commas between numbers
This is an example of a back to back stem and leaf plot.
The numbers are sorted from smallest to largest - from the centre out.
Discus results for 10SKD
Key: | 9 | 4 = 9.4 m
girls boys 14 0 2 5 13 6 7 3 0 12 2 4 5 3 119 3 3 10 3 5 4 9 4 5 5 8 5 7
Box and Whisker plot
The box and whisker plot is a visual display of the five statistics:
Minimum, Lower Quartile, Median, Upper Quartile and Maximum.
Min LQ Median UQ Max
A box and whisker graph must have:
• an axis
• a scale on the axis
• a label on the axis (including the units)
• a title
• headings, if side by side
eg: Maths exam results
[Side-by-side box and whisker plots for boys and girls. Axis: mark (%), 0 to 100]
Comparing data
Often two (or more) box and whisker plots are put on the same set of axes, vertically or horizontally (‘side-by-side’ box plots). We can then compare the data, commenting on:
1) ‘on average’ (using the median)
2) ‘spread’ (using the range or IQR)
3) ‘shape’ (symmetrical or skewed)
4) and a general statement
Statistical Investigation
1) Pose the question.
2) Collect the data. This may involve preparing a questionnaire, then deciding on whether the data should be collected from the whole population or from just a sample. If from a sample, how do you choose your sample and how many should be in it?
3) Organise the data collected. Maybe in a frequency table or a stem and leaf plot. Calculate the relevant statistics (eg mean, median, LQ, UQ, range, IQR).
4) Illustrate the data collected. Maybe in a box and whisker plot, a pie graph, a bar graph etc.
5) Write an analysis, ending with a conclusion.
Analysis
When writing a report to compare two (or more) data sets, there are four things we need to mention:
1) On average which is heavier, longer, better etc and quote the values of the medians or means for each set of data.
2) The spread of the data by quoting the values of the range or IQR for each set of data.
3) The shape of the data (by looking at the bar graph, stem & leaf graph or box & whisker plot). We can also mention if there is an outlier, ie a value that is significantly bigger or smaller than the rest of the data.
4) A conclusion
Shape:
[Diagrams of data shapes: bi-modal, uni-modal, symmetrical, skewed to the lower values, skewed to the higher values]
Example: Here is a back-to-back stem and leaf graph comparing the lengths of leaves of sprayed fern plants with those that had not been sprayed.
1 1 2 7 8 9 90 3 3 5
7 7 5 5 0 0
0123456
5 9
2 3 3 50 0 4 4 5 6
1 9
4 4 1
9 8 8 5 3 2 2 07 7 5 3 1
5 3 2
sprayed unsprayed
Key: 5 | 1 = 51 cm
Sprayed:
minimum = 20 cm, LQ = 29 cm, median = 43 cm, UQ = 54 cm, maximum = 65 cm
range = 45 cm, IQR = 25 cm
Unsprayed:
minimum = 5 cm, LQ = 17.5 cm, median = 31 cm, UQ = 39.5 cm, maximum = 59 cm
range = 54 cm, IQR = 22 cm
[Side-by-side box and whisker plots: Lengths of leaves of fern plants, Sprayed and Unsprayed. Axis: Length (cm), 0 to 70]
Analysis:
On average, the sprayed leaves have grown longer than the unsprayed leaves as their median is 43 cm compared with the unsprayed leaves median of 31 cm.
The unsprayed leaves lengths have a greater spread as their range is 54 cm compared with the sprayed leaves range of 45 cm.
The shape of the unsprayed leaves data is uni-modal whereas for the sprayed leaves it is bi-modal (as seen from the stem & leaf).
In conclusion, the lengths of sprayed leaves are longer than the lengths of the non-sprayed leaves.
1) The following is a back-to-back stem and leaf graph of the heights of boys and girls in a year 9 class. Work out the relevant statistics, draw the box plot, then write an analysis by filling in the gaps.
            girls | stem | boys
            5 2 1 |  14  | 7 9
      9 8 8 6 3 2 |  15  | 3 6 7 7 9
8 7 5 3 1 1 1 0 0 |  16  | 1 3 4 5 6 6 7 9
      7 6 6 4 3 3 |  17  | 1 2 2 3 3 4 5 6 8 9
            9 7 4 |  18  | 1 1 2
Key: 16 | 3 = 163 cm
Girls:
minimum = 141 cm, LQ = 158 cm, median = 161 cm, UQ = 174 cm, maximum = 189 cm
range = 48 cm, IQR = 16 cm
Boys:
minimum = 147 cm, LQ = 160 cm, median = 168 cm, UQ = 174.5 cm, maximum = 182 cm
range = 35 cm, IQR = 14.5 cm
On average, the boys are taller than the girls as the median for the boys is 168 cm compared with the girls median of 161 cm.
The spread of heights is greater for the girls as their range is 48 cm compared to the boys range of 35 cm.
The shape of the data for the girls is skewed and for the boys is fairly symmetrical (as seen from the box plot).
[Side-by-side box and whisker plots: Heights of year 9 students, girls and boys. Axis: Height (cm), 140 to 190]
In conclusion, the boys are generally taller than the girls.
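The statistics quoted for exercise 1 can be checked with a short script. This is a sketch under two assumptions: the height lists are read off the back-to-back stem and leaf graph (girls' leaves on the left, boys' on the right), and `five_number_summary` is a hypothetical helper that uses the same median-of-each-half rule as the Quartiles slides.

```python
import statistics

# Heights in cm, read from the stem and leaf graph (assumed reading).
girls = [141, 142, 145, 152, 153, 156, 158, 158, 159,
         160, 160, 161, 161, 161, 163, 165, 167, 168,
         173, 173, 174, 176, 176, 177, 184, 187, 189]
boys = [147, 149, 153, 156, 157, 157, 159,
        161, 163, 164, 165, 166, 166, 167, 169,
        171, 172, 172, 173, 173, 174, 175, 176, 178, 179,
        181, 181, 182]

def five_number_summary(data):
    """Return (minimum, LQ, median, UQ, maximum) for a list of numbers."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Leave the median out of the upper half when the count is odd.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

for name, data in (("girls", girls), ("boys", boys)):
    lo, lq, med, uq, hi = five_number_summary(data)
    print(name, "min", lo, "LQ", lq, "median", med, "UQ", uq, "max", hi,
          "range", hi - lo, "IQR", uq - lq)
```

These five values are exactly what the box and whisker plot draws, so the same function can be reused for the other exercises.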
2) The following is a back-to-back stem and leaf graph comparing the weights of the students in 2 classes. Calculate the relevant statistics, draw the box plot, then write an analysis by filling in the gaps.
        Class J | stem | Class K
            1 0 |  7   | 0 5
        6 4 2 1 |  6   | 0 1 3 4 6
9 9 7 6 5 1 0 0 |  5   | 3 6 6 7 7 8 9
    7 5 4 2 2 0 |  4   | 2 2 3 4 7 7 8 9 9
        8 7 7 6 |  3   | 5 8
Key: 6 | 3 = 63 kg
Class J:
minimum = 36 kg, LQ = 42 kg, median = 50.5 kg, UQ = 60 kg, maximum = 71 kg
range = 35 kg, IQR = 18 kg
Class K:
minimum = 35 kg, LQ = 45.5 kg, median = 56 kg, UQ = 60.5 kg, maximum = 75 kg
range = 40 kg, IQR = 15 kg
On average class K is heavier than class J because the median for class K is 56 kg compared with the median for class J of 50.5 kg.
The spread of data for class K is greater as the range for class K is 40 kg compared with the range for class J of 35 kg.
The data shape for class J is fairly symmetrical. The data shape for class K is skewed.
[Side-by-side box and whisker plots: Weights of students, Class J and Class K. Axis: Weight (kg), 30 to 80]
In conclusion, class K is generally heavier than class J.
3) Here is a back-to-back stem and leaf graph showing the time in minutes for competitors to complete a cross country race. It compares the time of those competitors shorter than 165 cm with those taller than 165cm. It is using a split stem. Calculate the relevant statistics, draw a box plot and then write a report.
Shorter than 165cm | stem | 165cm or taller
               4 0 |  7   | 1 1
             9 8 7 |  6   | 5 7 8 9
           4 3 3 2 |  6   | 0 1 1 1 2 2 2 2 3 3
       9 8 7 7 6 5 |  5   | 8 8 9
   3 3 2 2 2 1 1 0 |  5   | 0 0 1 1 1 2 2 2 3 3 4
           9 8 6 5 |  4   | 7 8 8 9
             3 2 0 |  4   | 3 4 4
Key: 5 | 2 = 52 mins
Shorter:
minimum = 40 mins, LQ = 50 mins, median = 54 mins, UQ = 63 mins, maximum = 74 mins
range = 34 mins, IQR = 13 mins
Taller:
minimum = 43 mins, LQ = 50.5 mins, median = 58 mins, UQ = 62 mins, maximum = 71 mins
range = 28 mins, IQR = 11.5 mins
[Side-by-side box and whisker plots: Times to run a race, Shorter students and Taller students. Axis: Time (mins), 30 to 80]
Report:
On average the taller students were slower than the shorter students as the median for the taller students was 58 mins compared to the shorter students median of 54 mins.
The spread of results was greater for the shorter students as they had a range of 34 mins whereas the taller students had a range of 28 mins.
The data for the shorter students is uni-modal and slightly skewed. The data for the taller students is bi-modal (as shown by the stem & leaf graph).
In conclusion, the height of the students does not greatly affect the running speeds.
Pg 326 Problems 1 & 2
Pg 297 Opening Problem B
Pg 324 # 4
Misleading Graphs
Pg 329 Ex 10L
Pie Graphs
How students of Sancta Maria College travel to school
Transport   Number   Working            Angle
Walk        70       70 ÷ 315 x 360     80°
Car         105      105 ÷ 315 x 360    120°
Bus         84       84 ÷ 315 x 360     96°
Bicycle     56       56 ÷ 315 x 360     64°
Total       315                         360°
We multiply by 360 because there are 360° around a circle.
On Calculator: 70 ÷ 315 x 360 =
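The working column can be sketched in Python as below; `pie_angles` is a hypothetical helper name, and the counts are those in the table above.

```python
# Number of students using each form of transport (from the slide).
transport = {"Walk": 70, "Car": 105, "Bus": 84, "Bicycle": 56}

def pie_angles(counts):
    """Return the sector angle in degrees for each category."""
    total = sum(counts.values())
    # angle = number / total x 360; multiplying first keeps the arithmetic exact here
    return {name: n * 360 / total for name, n in counts.items()}

angles = pie_angles(transport)
for name, angle in angles.items():
    print(f"{name:<8} {angle:.0f} degrees")
print("Total", sum(angles.values()))
```

The angles should always add up to 360, which is a quick way to catch an arithmetic slip.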
Time Series Data
Data that is collected over time, at regular intervals, is often called ‘time series data’.
The data is usually presented on a line graph, with time on the horizontal axis.
eg. the following graph shows the number of visitors staying at a motel over a 5 year period.
[Line graph: visitors staying at a motel, 1998 to 2002]
A time series is used to identify trends and patterns in data over a period of time so as to predict future movements.
Long Term Trend: Whether the measurements are increasing, decreasing or staying fairly constant overall.
Seasonal Variations: These are the up and down patterns which recur over a year, month, week or day.
Short Term Features: These are irregular fluctuations, unexpected results, outliers.