abraham-fowler
Statistics
YEAR TEN MATHS FOR FURTHER
SEMESTER TWO
Categorical vs Numerical Data
Data can be divided into two major groups:
• Numerical Data
• Categorical Data
Categorical vs Numerical Data
Numerical Data is in the form of numbers and can be classed as either:
Discrete – Numbers counted in exact values, usually whole numbers.
eg. Goals scored in a footy match, Number of children in a family
Continuous – Numbers measured on a continuous decimal scale.
eg. Mass of an object, Time, Length, Temperature
Categorical vs Numerical Data
Categorical Data can be classed into two separate categories:
Nominal – Requires sub-groups (names) to complete the description
eg. Hair Colour (Brown, Blonde, Black etc.)
Ordinal – Requires sub-groups in terms of ranking to order the description
eg. Level of Achievement ( Excellent, Very Good, Good, Poor) Size of Pizza (Small, Medium, Large, Family)
What type of data is…..?
The number of goals kicked per match of footy.
Answer: Numerical – Discrete
The types of vehicles driving along a road.
Answer: Categorical – Nominal
The sizes of pizza available at a pizza shop.
Answer: Categorical – Ordinal
The varying temperature outside throughout the day.
Answer: Numerical – Continuous
Now Do
Introduction to Statistics Worksheet Question 1
Working with Categorical Data
Once data has been collected, it is important to be able to display it in a meaningful way, using a range of different charts, including:
• Frequency Tables
• Graphs – Column / Bar Chart (gap at the start of the plot and gaps between each bar)
• Dot Plots
Working with Numerical Data
• Frequency Tables
• Histograms
Ungrouped Data
Grouped Data
Gap at start, columns joined
Working with Numerical Data
• Stem and Leaf Plots
• Dot Plots
Working with Data
eg1. The chart below shows the marital status of 40 respondents to a survey.
a) What type of data is this?
Answer: Categorical – Nominal
b) What type of chart is this? Why?
Answer: Bar chart – gaps between columns
c) What is the most common marital status and how many respondents are in this category?
Answer: Married – 15 respondents
d) How many respondents are marked ‘never married’?
Answer: 12
Working with Data
eg. 60 packets of jellybeans were opened and the number of jellybeans within them counted.
a) What type of data is this?
Answer: Numerical – Discrete
b) How many packets had 51 jellybeans?
Answer: 9
c) Would we display this data on a histogram or a bar chart? Why?
Answer: Histogram – Numerical Data
d) Plot the data on the chart you chose in part c.
Working with Data
Class Hair Colour Survey
Gather data from the students in the classroom and use it to:
1. Summarise data using a frequency table
2. Represent data using a graph

1. Summarise data using a frequency distribution table

Hair Colour   Tally   Total
Brown
Blonde
Black
Red
Other

2. Represent data using a bar chart
Remember – In a bar chart the bars don’t touch. Leave gaps!
[Bar chart: Class Hair Colours, with Hair Colour (Brown, Blonde, Red, Black, Other) on the horizontal axis and Frequency on the vertical axis]
Now Do
Introduction to Statistics Worksheet Question 2
Frequency, Relative Frequency and Percentage Frequency
We can investigate how often a particular event occurs using the following:
Frequency – The number of times that a particular event has occurred
Relative Frequency – The number of times that a particular event has occurred ÷ The total number of samples recorded
% Frequency – The relative frequency × 100
Frequency, Relative Frequency and Percentage Frequency
eg1. The frequency table pictured shows the size of 30 pizzas ordered from Pizza Hut on Monday night.
a) Find the Frequency of a Medium Pizza being ordered
b) Find the Relative Frequency of a medium pizza being ordered
c) Find the % Frequency of a medium pizza being ordered
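a) Frequency = 12
b) $\text{Relative Frequency} = \frac{12}{30} = \frac{2}{5}$
c) $\text{Relative Frequency} \times 100 = \left(\frac{2}{5}\right) \times 100 = 40\%$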
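The three measures can also be checked with a short script. This is a minimal sketch, not part of the original slides; the function names are made up for illustration, and it uses exact fractions so that 12 out of 30 reduces to 2/5 as in eg1.

```python
from fractions import Fraction


def relative_frequency(frequency, total):
    """Frequency of the event divided by the total number of samples."""
    return Fraction(frequency, total)


def percentage_frequency(frequency, total):
    """Relative frequency multiplied by 100."""
    return relative_frequency(frequency, total) * 100


# eg1: 12 medium pizzas out of 30 pizzas ordered
print(relative_frequency(12, 30))    # 2/5
print(percentage_frequency(12, 30))  # 40
```

Using `Fraction` instead of ordinary division keeps the answer in the same form as the hand working; converting with `float(...)` gives the decimal if needed.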
Frequency, Relative Frequency and Percentage Frequency
eg2. A group of 20 people were asked how many times they attended the cinema this month. Results are shown on the histogram.
a) Find the Frequency of attending the cinema twice a month.
b) Find the Relative Frequency of attending the cinema twice a month
c) Find the % Frequency of attending the cinema twice a month
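a) Frequency = 4
b) $\text{Relative Frequency} = \frac{4}{20} = \frac{1}{5}$
c) $\text{Relative Frequency} \times 100 = \left(\frac{1}{5}\right) \times 100 = 20\%$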
Now Do
Introduction to Statistics Worksheet Question 3
Data Distribution
We can name data according to how it’s distributed.
Is it all crammed together or is there more data in certain areas?
We associate certain names with different shapes of distribution:
• Normal – Most common score in the centre of the data
• Skewed – Most common score is toward one end of the data
• Bimodal – More than one score that is most frequent
• Spread – Data is spread over a wide range
• Clustered – Most of the data is confined to a small range
Data Distribution
Normally Distributed Data
• The most common score is in the centre of the data.
• The graph is symmetrical.
Data Distribution
Skewed Data
• The most common score is toward one end of the data.
• Most data toward the left – Positively Skewed
• Most data toward the right – Negatively Skewed
Data Distribution
Bimodal Data
• More than one score that is most frequent
• This looks like two peaks on the graph
Data Distribution
Spread Data
• Data is rather evenly spread over a wide range
Data Distribution
Clustered Data
• Most of the data is confined to a small range
Grouping Data
For some sets of data it is appropriate to group the data before plotting it. When grouping data, we usually use a group size (also called the ‘class size’ or ‘class interval’) of 5 or 10.
eg1. Group the following 20 test scores using a class size of 10.
90, 77, 68, 72, 88, 83, 45, 51, 54, 41, 97, 78, 81, 61, 55, 93, 74, 71, 78, 64

Test Score   Frequency
40 -         2
50 -         3
60 -         3
70 -         6
80 -         3
90 -         3
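The grouping step can be sketched in code as a cross-check on the tally. This is an illustration only, assuming each class is labelled by its lower bound (so 40 means 40 up to, but not including, 50); the function name `group_scores` is made up for this example.

```python
from collections import Counter


def group_scores(scores, class_size):
    """Count how many scores fall in each class, keyed by the class's lower bound."""
    counts = Counter((score // class_size) * class_size for score in scores)
    return dict(sorted(counts.items()))


test_scores = [90, 77, 68, 72, 88, 83, 45, 51, 54, 41,
               97, 78, 81, 61, 55, 93, 74, 71, 78, 64]

for lower_bound, frequency in group_scores(test_scores, 10).items():
    print(f"{lower_bound} - : {frequency}")
```

Note that a class containing no scores is simply left out of the result, so an empty class would need to be added by hand before drawing a histogram.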
Grouping Data
eg1. Now represent the grouped data using a histogram.
[Histogram: Test score on the horizontal axis (40 to 100 in steps of 10), Frequency on the vertical axis (0 to 6)]
Grouping Data
eg1. Now represent the grouped data using a stem and leaf plot, with a class size of 10.
90, 77, 68, 72, 88, 83, 45, 51, 54, 41, 97, 78, 81, 61, 55, 93, 74, 71, 78, 64

Stem   Leaf
4      1, 5
5      1, 4, 5
6      1, 4, 8
7      1, 2, 4, 7, 8, 8
8      1, 3, 8
9      0, 3, 7

Key: 4 | 1 = 41
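A stem and leaf plot with a class size of 10 splits each score into its tens digit (the stem) and its units digit (the leaf). The sketch below shows one way to build it; the function name is hypothetical and it assumes whole-number scores.

```python
from collections import defaultdict


def stem_and_leaf(scores):
    """Map each stem (tens) to its leaves (units), both in increasing order."""
    plot = defaultdict(list)
    for score in sorted(scores):
        plot[score // 10].append(score % 10)
    return dict(plot)


test_scores = [90, 77, 68, 72, 88, 83, 45, 51, 54, 41,
               97, 78, 81, 61, 55, 93, 74, 71, 78, 64]

for stem, leaves in stem_and_leaf(test_scores).items():
    print(stem, "|", ", ".join(str(leaf) for leaf in leaves))
```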
Grouping Data
eg2. Group the following 20 scores using a class size of 5.
10, 4, 6, 13, 18, 9, 7, 14, 21, 23, 8, 15, 19, 22, 14, 15, 17, 3, 9, 11

Score   Frequency
0 -     2
5 -     5
10 -    5
15 -    5
20 -    3
Grouping Data
eg2. Now represent the grouped data using a histogram.
[Histogram: Score on the horizontal axis (0 to 25 in steps of 5), Frequency on the vertical axis (0 to 5)]
Grouping Data
eg2. Now represent the grouped data using a stem and leaf plot, with a class size of 5.
10, 4, 6, 13, 18, 9, 7, 14, 21, 23, 8, 15, 19, 22, 14, 15, 17, 3, 9, 11

Stem   Leaf
0      3, 4
0*     6, 7, 8, 9, 9
1      0, 1, 3, 4, 4
1*     5, 5, 7, 8, 9
2      1, 2, 3

Key: 0 | 3 = 3
     0* | 6 = 6
Now Do
Statistics Worksheet 2 Question 1 and 2
Measures of Centre
The measures that we use to find the ‘centre’ of our data are:
Mean, $\bar{x}$ – The average of the data
Median – Data is ordered from smallest to largest. The middle score is the median
Mode – The most commonly occurring number
Mean
Mean, $\bar{x}$ – The average of the data
eg. Find the mean of the data set:
4 2 6 7 10 3 7 3 6 7
Solution:
$\bar{x} = \frac{\text{sum of all values (scores added together)}}{\text{total number of scores}} = \frac{55}{10} = 5.5$
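Median
Median – The middle score of an ordered set of data with ‘n’ pieces of data
eg. Find the median of the data set:
4 2 6 7 10 3 7 3 6 7
Solution:
Write the scores in order smallest to largest: 2 3 3 4 6 6 7 7 7 10
$\text{Median position} = \frac{n+1}{2}\text{th score} = \frac{10+1}{2} = 5.5\text{th score}$
The 5.5th score lies halfway between the 5th score (6) and the 6th score (6).
$\text{Median} = \frac{6+6}{2} = 6$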
Mode
The Mode – The most commonly occurring number in the set of data
eg. Find the mode of the data set:
4 2 6 7 10 3 7 3 6 7
Solution: You may wish to write the scores in order to ensure all data is accounted for, but this is not necessary.
2 3 3 4 6 6 7 7 7 10
There can be more than one score that occurs most frequently.
In these cases each of them is a mode – list them all.
Mode = 7
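The three measures of centre for this data set can be checked with Python's built-in `statistics` module. This is a sketch for checking hand working, not a replacement for it; `multimode` needs Python 3.8 or later and returns every mode in a list, which matches the rule about listing all modes.

```python
import statistics

data = [4, 2, 6, 7, 10, 3, 7, 3, 6, 7]

mean = statistics.mean(data)        # sum of all values / number of scores
median = statistics.median(data)    # average of the two middle scores when n is even
modes = statistics.multimode(data)  # list of all most frequent scores

print("Mean =", mean)
print("Median =", median)
print("Mode(s) =", modes)
```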
Measures of Centre
Mean, $\bar{x}$ – The average of the data
Median – Data is ordered from smallest to largest and the middle score is the median
Mode – The most commonly occurring number
eg1. Given the following set of data, find: the mean, the median, the mode.
2, 3, 4, 4, 6, 7, 8, 9, 10
Mean: $\bar{x} = \frac{\text{sum of all values}}{\text{number of scores}} = \frac{53}{9} \approx 5.89$
Median: $\text{Median position} = \frac{n+1}{2}\text{th score} = \frac{9+1}{2} = \frac{10}{2} = 5\text{th score}$
2, 3, 4, 4, 6, 7, 8, 9, 10
Median = 6
Mode = 4
Using the ClassPad to solve
Let’s now try to solve the same problem using the ‘STATISTICS’ function on our calculators.
2, 3, 4, 4, 6, 7, 8, 9, 10
[Calculator screen with the Mean, Median and Mode labelled]
Now Do
Statistics Worksheet 2 Question 3
Then Do
Work Record
Exercise 9A pg 564 Questions 1, 2, 3, 4, 5, 6, 7, 8, 11
Quartiles
Another way to analyse a set of data is to create a 5-figure summary.
This summarises the data in terms of quartiles, ie. it divides the data set into quarters.
To create a 5-figure summary we find the following:
• Minimum Value (Min)
• Lower Quartile (Q1) – The number 25% (a quarter) through the data
• Median (Q2) – The number 50% (halfway/the centre) through the data
• Upper Quartile (Q3) – The number 75% (three quarters) through the data
• Maximum Value (Max)
What does a 5-figure summary look like?
eg. Find the 5-figure summary for the data set: 1, 1, 3, 4, 5, 6, 7, 7, 8
• Min = 1
• Q1 = 2
• Med (Q2) = 5
• Q3 = 7
• Max = 8
Min, Q1, Med, Q3, Max
1, 2, 5, 7, 8
What does a 5-figure summary look like?
eg. Find the 5-figure summary for the data set: 10, 11, 13, 13, 15, 16, 17, 19
• Min = 10
• Q1 = 12
• Med (Q2) = 14
• Q3 = 16.5
• Max = 19
Min, Q1, Med, Q3, Max
10, 12, 14, 16.5, 19
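The two worked answers are consistent with finding Q1 and Q3 as the medians of the lower and upper halves of the ordered data, leaving the middle score out of both halves when there is an odd number of scores. The sketch below assumes that convention; calculators and software may use other quartile rules and give slightly different values. The function names are made up for illustration.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_figure_summary(values):
    """Return (Min, Q1, Med, Q3, Max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # scores below the median position
    upper_half = ordered[(n + 1) // 2:]   # scores above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


print(five_figure_summary([1, 1, 3, 4, 5, 6, 7, 7, 8]))
print(five_figure_summary([10, 11, 13, 13, 15, 16, 17, 19]))
```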
What does a 5-figure summary look like?
Let’s confirm our result using the calculator:
10, 11, 13, 13, 15, 16, 17, 19
Min, Q1, Med, Q3, Max
10, 12, 14, 16.5, 19
Now Do
Worksheet 3 Question 1
Measures of Spread
We can use the following to determine how spread out our data set is.
Range = Maximum Value – Minimum Value
Interquartile Range (IQR) = Q3 – Q1
eg. Find the Range and the Interquartile Range for the data set: 3, 3, 4, 6, 7, 8, 10
Range = 10 – 3 = 7
IQR = 8 – 3 = 5
Measures of Spread
We can use the following to determine how spread out our data set is.
Range = Maximum Value – Minimum Value
Interquartile Range (IQR) = Q3 – Q1
eg. Find the Range and the Interquartile Range for the data set: 13, 14, 18, 20, 23, 28, 30
Range = 30 – 13 = 17
IQR = 28 – 14 = 14
IQR gives a good indication of spread when we have small or large values that may not best reflect our data set.
Now Do
Worksheet 3 Question 2
Outliers
Some data sets include large or small values that don’t match the rest of the data.
These values can give us measures of centre and measures of spread that aren’t the best representation of the data.
A value is considered an ‘outlier’ if it is:
• Less than Q1 – (1.5 × IQR)
• Greater than Q3 + (1.5 × IQR)
Outliers
eg. Decide if the following data set includes an outlier.
1, 5, 6, 7, 7, 9, 16
A value is considered an ‘outlier’ if it is less than Q1 – (1.5 × IQR) or greater than Q3 + (1.5 × IQR).
Step 1: Find Q1 and Q3: Q1 = 5, Q3 = 9
Step 2: Find the IQR: IQR = 9 – 5 = 4
Step 3: Check lower end: Q1 – (1.5 × IQR) = 5 – (1.5 × 4) = 5 – 6 = -1
Step 4: Check upper end: Q3 + (1.5 × IQR) = 9 + (1.5 × 4) = 9 + 6 = 15
The value 16 is an outlier as it is bigger than the upper end of our allowed values.
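The four steps can be sketched as a small function. It assumes the same quartile convention as the hand working (Q1 and Q3 are the medians of the lower and upper halves, with the middle score left out when there is an odd number of scores); the function names are hypothetical.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def find_outliers(values):
    """Return (outliers, lower end, upper end) using the 1.5 x IQR rule."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])        # Step 1: lower quartile
    q3 = median(ordered[(n + 1) // 2:])   # Step 1: upper quartile
    iqr = q3 - q1                         # Step 2: interquartile range
    lower_end = q1 - 1.5 * iqr            # Step 3: lower end
    upper_end = q3 + 1.5 * iqr            # Step 4: upper end
    outliers = [v for v in ordered if v < lower_end or v > upper_end]
    return outliers, lower_end, upper_end


print(find_outliers([1, 5, 6, 7, 7, 9, 16]))
```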
Now Do
Worksheet 3 Question 3
Then Exercise 9B – Q1, 2, 3, 4b, 4d, 5, 6, 7, 8, 9a, 9c, 10
Boxplots
The 5-figure summary (Min, Q1, Q2, Q3, Max) can be represented in graphical form using a boxplot.
If our data set includes outliers, we represent these using a cross on the plot.
The line of the boxplot then ends at the next largest (or smallest) value.
Boxplots are always drawn to scale with a ruled, labelled axis at the base of the plot.
Boxplots
[Boxplot diagram drawn above a Scale, with the following points labelled]
• Xmin (Lowest Score)
• Q1 (25% through the data)
• Median (Half way through the data)
• Q3 (75% through the data)
• Xmax (Highest Score)
Boxplots
eg. A set of data gives the 5-figure summary 2, 5, 9, 13, 18. Represent this using a boxplot.
[Boxplot with the values 2, 5, 9, 13 and 18 marked]
Boxplots
eg. Draw the boxplot for the data set: 3, 4, 4, 5, 6, 6, 7, 9, 11, 12, 15
[Boxplot with the values 3, 4, 6, 11 and 15 marked]
Are there any outliers?
$\text{Lower end: } Q1 - (1.5 \times IQR) = 4 - (1.5 \times 7) = 4 - 10.5 = -6.5$
$\text{Upper end: } Q3 + (1.5 \times IQR) = 11 + (1.5 \times 7) = 11 + 10.5 = 21.5$
No outliers
Boxplots
eg. Draw the boxplot for the data set: 2, 3, 5, 8, 9, 9, 10, 10, 13, 20
Are there any outliers?
$\text{Lower end: } Q1 - (1.5 \times IQR) = 5 - (1.5 \times 5) = 5 - 7.5 = -2.5$
$\text{Upper end: } Q3 + (1.5 \times IQR) = 10 + (1.5 \times 5) = 10 + 7.5 = 17.5$
Outlier = 20 (marked with a cross on the boxplot)
Let’s see how we can use a calculator to plot these.
Boxplots
eg. Draw the boxplot for the data set: 2, 3, 5, 8, 9, 9, 10, 10, 13, 20
Use zoom ‘Box’ to get a better view of the plot.
To see the points on the plot, use Analysis ‘Trace’. Use your arrow keys to move from point to point.
Parallel Boxplots
We can easily compare sets of data using parallel boxplots.
These consist of two or more boxplots drawn together using the same scale.
Given the parallel boxplots above:
What statistical measures do they have in common?
Answer: Same values for med (14) and Q3 (17)
Which group of data, A or B, is most spread out?
Answer: Group B – Largest Range and IQR
Which group has the largest Q1 value? What is it?
Answer: Group A – 13
Now Do
Exercise 9C Q1, 2, 3, 4a, 4c, 5a, 5c, 6, 7, 9, 11
Time Series Data
A time series is a sequence of data values that are recorded at regular time intervals.
The data is something meaningful that we monitor over a period of time, such as:
• Temperature monitored every hour throughout the day
• Monthly Average Temperature monitored throughout the year
• Share price fluctuations monitored hourly/daily/monthly etc.
The time component is drawn on the x-axis.
Data is plotted on the graph as dots, joined together with lines.
Describing Trends
• Linear – Straight (or almost straight) line
• Non-Linear (Curve) – Data forms a curve
• No Trend – Data fluctuates.
eg. The plot below shows the change in population of a country town from 1990 to 2005.
a) What is the population in the year 2000?
b) What is the lowest population recorded?
c) State the trend of the data.
The population declines steadily for the first 9 years, before rising and falling in the final 5 years, resulting in a slight upward trend.
Describing Trends
800
700
eg. A company’s share price over 12 months is recorded each month, given on the table below.
a) Plot the time series graph of the data (start your y-axis data at $1.20).
b) Describe the way the share price has changed over the year.
The share price generally increased from January to June (from $1.30 to a peak of $1.43), with a small drop of $0.01 in April. After June, the price declines steadily to a low of $1.22 before trending upward to $1.23 in December
Now Do
Exercise 9D Q1, 2, 3, 5, 6, 7
Bivariate Data
• Bivariate Data involves comparing data that includes two variables.
• We analyse the data by plotting it on a scatterplot.
• We look at the direction and shape of the data on the plot, and from this we can state the strength of the relationship between the two variables – we call this the ‘correlation’.
Positive Correlation
What does this look like?
[Scatterplots: Strong Positive, Weak Positive]
Negative Correlation
What does this look like?
[Scatterplots: Strong Negative, Weak Negative]
No Correlation
What does this look like?
Bivariate Data
eg. Draw the scatterplot for the data and comment on the correlation of the data.

x   1   1   2   3   4   5   6   7   8   8   9   10  11
y   10  9   11  13  15  17  18  18  20  19  22  24  25

We look at the pattern that the points have made. The dots lie close to a straight line in a positive direction, so we can say the data has a strong positive correlation.
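The slides judge correlation by eye from the scatterplot. As an optional cross-check that goes beyond the slides, the sketch below computes Pearson's correlation coefficient r for the same table; a value close to +1 agrees with "strong positive". The function name `pearson_r` is made up for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 1, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 11]
y = [10, 9, 11, 13, 15, 17, 18, 18, 20, 19, 22, 24, 25]

r = pearson_r(x, y)
print(round(r, 2))
```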
Now Do
Exercise 9E Q1, 2a, 3, 4, 5a, 6, 7, 9
Line of best fit
• When bivariate data has a strong linear correlation, we can model the data with a line of best fit.
• We fit the line ‘by eye’ to try and balance the data points above the line with points below the line.
• What does it look like?
Drawing the line of best fit
We fit the line ‘by eye’ to try and balance the data points above the line with points below the line.
Now Do
Worksheet – Part 1: Drawing a line of best fit
Writing the equation for line of best fit
• Look at the line of best fit. Could we find the equation that matches the line?
• Think back to linear graphs.
Find 2 points on the line: (100, 200) and (600, 700)
Use these to find the gradient of the line:
$\text{gradient} = \frac{700 - 200}{600 - 100} = \frac{500}{500} = 1$
Use the gradient and one point to find the y-intercept, c:
$200 = 1 \times 100 + c$, so $c = 100$
$y = x + 100$
Writing the equation for line of best fit
• Find the equation for the line of best fit given on the plot below.
Find 2 points on the line: (3, 8) and (9, 20)
Use these to find the gradient of the line:
$\text{gradient} = \frac{20 - 8}{9 - 3} = \frac{12}{6} = 2$
Use the gradient and one point to find the y-intercept, c:
$8 = 2 \times 3 + c$, so $c = 2$
$y = 2x + 2$
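The two-point method can be written as a short function, which is handy for checking an equation found by hand. This is a sketch only; the names `line_through` and `predict` are made up, and it assumes the two points have different x values.

```python
def line_through(point_1, point_2):
    """Return (gradient, y-intercept) of the line through two points."""
    (x1, y1), (x2, y2) = point_1, point_2
    gradient = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - gradient * x1     # rearranged from y1 = gradient * x1 + c
    return gradient, intercept


def predict(gradient, intercept, x):
    """Substitute x into y = gradient * x + intercept."""
    return gradient * x + intercept


print(line_through((100, 200), (600, 700)))  # gradient 1, intercept 100
print(line_through((3, 8), (9, 20)))         # gradient 2, intercept 2
```

Once the gradient and y-intercept are known, `predict` substitutes any x value into the equation, which is the same idea as reading a prediction off the line.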
Now Do
Worksheet – Part 2: Forming the equation for the line of best fit
Using the line of best fit to make predictions
• We can predict values using the line of best fit.
• If we have the line, we can match a value on one axis to its corresponding value on the other axis by reading from the line.
eg. Predict the value of y when x = 40
Answer: Approx. y = 42
If we knew the equation for the line, we could also predict the value by substituting x = 40 into it.
Using the line of best fit to make predictions
eg. Predict the value of y when x = 30
Answer: Approx. y = 37
If we know the equation for the line, we can also use it to predict the value of y when x = 30.
Now Do
Worksheet – Part 3: Making predictions using the line of best fit
Then Do
Exercise 9F Q1, 2, 4abc, 5, 6