Statistics YEAR TEN MATHS FOR FURTHER SEMESTER TWO


Categorical vs Numerical Data

Data can be divided into two major groups –

• Numerical Data

• Categorical Data

Categorical vs Numerical Data

Numerical Data is in the form of numbers and can be classed as either:

Discrete – Numbers counted in exact values, usually whole numbers. eg. Goals scored in a footy match, number of children in a family

Continuous – Numbers measured on a continuous decimal scale. eg. Mass of an object, time, length, temperature

Categorical vs Numerical Data

Categorical Data can be classed in two separate categories:

Nominal – Requires sub-groups (names) to complete the description

eg. Hair Colour (Brown, Blonde, Black etc.)

Ordinal – Requires sub-groups in terms of ranking to order the description

eg. Level of Achievement (Excellent, Very Good, Good, Poor), Size of Pizza (Small, Medium, Large, Family)

What type of data is…?

The number of goals kicked per match of footy. Answer: Numerical – Discrete

The types of vehicles driving along a road. Answer: Categorical – Nominal

The sizes of pizza available at a pizza shop. Answer: Categorical – Ordinal

The varying temperature outside throughout the day. Answer: Numerical – Continuous

Now Do

Introduction to Statistics Worksheet Question 1

Working with Categorical Data

Once data has been collected, it is important to be able to display it in a meaningful way, using a range of different charts, including:

• Frequency Tables

• Graphs – Column / Bar Chart (gap at the start of the plot and gaps between each bar)

• Dot Plots

Working with Numerical Data

• Frequency Tables

• Histograms (gap at start, columns joined)

Both can be drawn for ungrouped data and for grouped data.

Working with Numerical Data

• Stem and Leaf Plots

• Dot Plots

Working with Data

eg1. The chart below shows the marital status of 40 respondents to a survey.

a) What type of data is this? Answer: Categorical – Nominal

b) What type of chart is this? Why? Answer: Bar chart – there are gaps between the columns

c) What is the most common marital status and how many respondents are in this category? Answer: Married – 15 respondents

d) How many respondents are marked ‘never married’? Answer: 12

Working with Data

eg. 60 packets of jellybeans were opened and the number of jellybeans within them counted.

a) What type of data is this? Answer: Numerical – Discrete

b) How many packets had 51 jellybeans? Answer: 9

c) Would we display this data on a histogram or a bar chart? Why? Answer: Histogram – the data is numerical

d) Plot the data on the chart you chose in part c


Working with Data

Class Hair Colour Survey

Gather data from the students in the classroom and use it to:

1. Summarise data using a frequency table

2. Represent data using a graph

Working with Data

Class Hair Colour Survey

1. Summarise data using a frequency distribution table

Hair Colour    Tally    Total
Brown
Blonde
Black
Red
Other

Working with Data

Class Hair Colour Survey

2. Represent data using a bar chart

Remember – in a bar chart the bars don’t touch. Leave gaps!

[Blank bar chart titled ‘Class Hair Colours’, with axes Hair Colour (Brown, Blonde, Red, Black, Other) and Frequency]

Now Do

Introduction to Statistics Worksheet Question 2

Frequency, Relative Frequency and Percentage Frequency

We can investigate how often a particular event occurs using the following:

Frequency – The number of times that a particular event has occurred

Relative Frequency – $\frac{\text{the number of times that a particular event has occurred}}{\text{the total number of samples recorded}}$

% Frequency – The relative frequency × 100

Frequency, Relative Frequency and Percentage Frequency

eg1. The frequency table pictured shows the size of 30 pizzas ordered from Pizza Hut on Monday night.

a) Find the Frequency of a medium pizza being ordered. Answer: 12

b) Find the Relative Frequency of a medium pizza being ordered. Answer: $\frac{12}{30} = \frac{2}{5}$

c) Find the % Frequency of a medium pizza being ordered. Answer: Relative Frequency × 100 $= \frac{2}{5} \times 100 = 40\%$

Frequency, Relative Frequency and Percentage Frequency

eg2. A group of 20 people were asked how many times they attended the cinema this month. Results are shown on the histogram.

a) Find the Frequency of attending the cinema twice a month. Answer: 4

b) Find the Relative Frequency of attending the cinema twice a month. Answer: $\frac{4}{20} = \frac{1}{5}$

c) Find the % Frequency of attending the cinema twice a month. Answer: Relative Frequency × 100 $= \frac{1}{5} \times 100 = 20\%$
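The three calculations in eg2 can also be sketched in Python. This is only an illustration: the function name `frequency_summary` and the raw list of answers are assumptions, since the slide gives just the total (20 people) and the frequency for two visits (4).

```python
from collections import Counter

def frequency_summary(observations, event):
    """Return (frequency, relative frequency, % frequency) of `event`."""
    counts = Counter(observations)
    total = len(observations)
    frequency = counts[event]
    relative = frequency / total
    percentage = frequency * 100 / total
    return frequency, relative, percentage

# Hypothetical raw answers: 20 people, 4 of whom went to the cinema twice.
visits = [0] * 5 + [1] * 7 + [2] * 4 + [3] * 3 + [4] * 1

frequency, relative, percentage = frequency_summary(visits, 2)
print(frequency, relative, percentage)
```

Only the count for two visits and the total of 20 come from the example; the other counts were chosen so that the list has 20 entries.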

Now Do

Introduction to Statistics Worksheet Question 3

Data Distribution

We can name data according to how it’s distributed.

Is it all crammed together or is there more data in certain areas? We associate certain names with different shapes of distribution:

• Normal – Most common score is in the centre of the data

• Skewed – Most common score is toward one end of the data

• Bimodal – More than one score that is most frequent

• Spread – Data is spread over a wide range

• Clustered – Most of the data is confined to a small range

Data Distribution

Normally Distributed Data

• The most common score is in the centre of the data.

• The graph is symmetrical.

Data Distribution

Skewed Data

• The most common score is toward one end of the data.

• Most data toward the left – Positively Skewed

• Most data toward the right – Negatively Skewed

Data Distribution

Bimodal Data

• More than one score that is most frequent

• This looks like two peaks on the graph

Data Distribution

Spread Data

Data is rather evenly spread over a wide range

Data Distribution

Clustered Data

Most of the data is confined to a small range

Grouping Data

For some sets of data it is appropriate to group the data before plotting it. When grouping data, we usually use a group size (‘class size’ or ‘class interval’) of 5 or 10.

eg1. Group the following 20 test scores using a class size of 10.

90, 77, 68, 72, 88, 83, 45, 51, 54, 41, 97, 78, 81, 61, 55, 93, 74, 71, 78, 64

Test Score    Tally     Frequency
40 -          ||        2
50 -          |||       3
60 -          |||       3
70 -          ||||||    6
80 -          |||       3
90 -          |||       3
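The grouping step can be sketched in Python as a check on the tally. The function name `group_scores` is an assumption for illustration; each key is the lower bound of a class interval, so 40 stands for the class ‘40 -’.

```python
from collections import Counter

def group_scores(scores, class_size):
    """Count how many scores fall in each class interval.

    Each key is the lower bound of an interval, e.g. 40 means 40 to 49
    when class_size is 10.
    """
    counts = Counter((score // class_size) * class_size for score in scores)
    return dict(sorted(counts.items()))

test_scores = [90, 77, 68, 72, 88, 83, 45, 51, 54, 41,
               97, 78, 81, 61, 55, 93, 74, 71, 78, 64]

for lower_bound, frequency in group_scores(test_scores, 10).items():
    print(f"{lower_bound} - : {frequency}")
```

Integer division (`//`) drops the units digit, so every score in the 70s lands in the class starting at 70.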

Grouping Data

eg1. Now represent the grouped data using a histogram

[Histogram: Test score on the horizontal axis (0, then 40 to 100 in steps of 10), Frequency on the vertical axis (1 to 6)]

Grouping Data

eg1. Now represent the grouped data using a stem and leaf plot, with a class size of 10.

90, 77, 68, 72, 88, 83, 45, 51, 54, 41, 97, 78, 81, 61, 55, 93, 74, 71, 78, 64

Stem | Leaf
4 | 1, 5
5 | 1, 4, 5
6 | 1, 4, 8
7 | 1, 2, 4, 7, 8, 8
8 | 1, 3, 8
9 | 0, 3, 7

Key: 4 | 1 = 41
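A stem and leaf plot with a class size of 10 splits each score into its tens digit (stem) and units digit (leaf). A minimal Python sketch of that idea follows; the function name `stem_and_leaf` is an assumption, and it only handles whole numbers from 0 to 99.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for score in sorted(scores):
        plot[score // 10].append(score % 10)
    return dict(plot)

test_scores = [90, 77, 68, 72, 88, 83, 45, 51, 54, 41,
               97, 78, 81, 61, 55, 93, 74, 71, 78, 64]

for stem, leaves in stem_and_leaf(test_scores).items():
    print(f"{stem} | {', '.join(str(leaf) for leaf in leaves)}")
```

Sorting the scores first means the leaves on each stem come out in order, as they should on a finished plot.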

Grouping Data

eg2. Group the following 20 scores using a class size of 5.

10, 4, 6, 13, 18, 9, 7, 14, 21, 23, 8, 15, 19, 22, 14, 15, 17, 3, 9, 11

Score    Tally    Frequency
0 -      ||       2
5 -      |||||    5
10 -     |||||    5
15 -     |||||    5
20 -     |||      3

Grouping Data

eg2. Now represent the grouped data using a histogram

[Histogram: Score on the horizontal axis (0 to 25 in steps of 5), Frequency on the vertical axis (1 to 5)]

Grouping Data

eg2. Now represent the grouped data using a stem and leaf plot, with a class size of 5.

10, 4, 6, 13, 18, 9, 7, 14, 21, 23, 8, 15, 19, 22, 14, 15, 17, 3, 9, 11

Stem | Leaf
0 | 3, 4
0* | 6, 7, 8, 9, 9
1 | 0, 1, 3, 4, 4
1* | 5, 5, 7, 8, 9
2 | 1, 2, 3

Key: 0 | 3 = 3, 0* | 6 = 6
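With a class size of 5 each stem is split in two: leaves 0 to 4 stay on the plain stem and leaves 5 to 9 go on the starred stem. A short Python sketch of that rule, using the hypothetical function name `split_stem_and_leaf`:

```python
def split_stem_and_leaf(scores):
    """Stem and leaf plot with a class size of 5.

    Leaves 0-4 go on the plain stem ("1"); leaves 5-9 go on the
    starred stem ("1*").
    """
    plot = {}
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        label = f"{stem}*" if leaf >= 5 else str(stem)
        plot.setdefault(label, []).append(leaf)
    return plot

scores = [10, 4, 6, 13, 18, 9, 7, 14, 21, 23,
          8, 15, 19, 22, 14, 15, 17, 3, 9, 11]

for label, leaves in split_stem_and_leaf(scores).items():
    print(f"{label} | {', '.join(str(leaf) for leaf in leaves)}")
```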

Now Do

Statistics Worksheet 2 Question 1 and 2

Measures of Centre

The measures that we use to find the ‘centre’ of our data are:

Mean, $\bar{x}$ – The average of the data

Median – Data is ordered from smallest to largest. The middle score is the median

Mode – The most commonly occurring number

Mean

Mean, $\bar{x}$ – The average of the data

eg. Find the mean of the data set:

4 2 6 7 10 3 7 3 6 7

Solution:

$\bar{x} = \frac{\text{sum of all values (scores added together)}}{\text{total number of scores}} = \frac{55}{10} = 5.5$

Median

Median – The middle score of an ordered set of data with ‘n’ pieces of data

eg. Find the median of the data set:

4 2 6 7 10 3 7 3 6 7

Solution:

Write the scores in order, smallest to largest: 2 3 3 4 6 6 7 7 7 10

Median position $= \frac{n+1}{2}$th score $= \frac{10+1}{2} = 5.5$th score, so the median is halfway between the 5th score (6) and the 6th score (6).

Median = 6

Mode

The Mode – The most commonly occurring number in the set of data

eg. Find the mode of the data set:

4 2 6 7 10 3 7 3 6 7

Solution: You may wish to write the scores in order to ensure all data is accounted for but this is not necessary.

2 3 3 4 6 6 7 7 7 10

There can be more than one score which occurs most frequently.

In these cases each of them is a mode – list them all.

Mode = 7
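The three measures can be written out step by step in Python, following the same rules as the worked solutions. The function names are assumptions for illustration. `modes` returns a list so that data with more than one mode is handled too.

```python
from collections import Counter

def mean(scores):
    """Sum of all values divided by the total number of scores."""
    return sum(scores) / len(scores)

def median(scores):
    """Middle score of the ordered data, found from position (n + 1) / 2."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2                 # counted from 1, as on the slides
    if position.is_integer():
        return ordered[int(position) - 1]
    lower = ordered[int(position) - 1]     # score just before the position
    upper = ordered[int(position)]         # score just after the position
    return (lower + upper) / 2

def modes(scores):
    """All scores that occur most often (there may be more than one)."""
    counts = Counter(scores)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

data = [4, 2, 6, 7, 10, 3, 7, 3, 6, 7]
print(mean(data), median(data), modes(data))
```

For this data set the position is 5.5, so the function averages the 5th and 6th ordered scores, which are both 6.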

Measures of Centre

Mean, $\bar{x}$ – The average of the data: $\bar{x} = \frac{\text{sum of all values}}{\text{number of scores}}$

Median – Data is ordered from smallest to largest and the middle score is the median: Median position $= \frac{n+1}{2}$th score

Mode – The most commonly occurring number

eg1. Given the following set of data, find: the mean, the median, the mode.

2, 3, 4, 4, 6, 7, 8, 9, 10

Mean $= \frac{53}{9} \approx 5.89$

Median position $= \frac{n+1}{2}$th score $= \frac{9+1}{2} = \frac{10}{2} = 5$th score

Median = 6

Mode = 4
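As a cross-check on the hand working, the same three values can be found with Python’s built-in `statistics` module. This is offered only as an alternative check; it is not part of the slides.

```python
import statistics

data = [2, 3, 4, 4, 6, 7, 8, 9, 10]

mean_value = statistics.mean(data)      # sum of the scores divided by 9
median_value = statistics.median(data)  # middle (5th) score of the ordered data
mode_value = statistics.mode(data)      # most commonly occurring score

print(round(mean_value, 2), median_value, mode_value)
```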

Let’s now try to solve the same problem using the ‘STATISTICS’ function on our calculators.

2, 3, 4, 4, 6, 7, 8, 9, 10

Using the classpad to solve


[ClassPad screen with the mean, mode and median labelled]

Now Do

Statistics Worksheet 2 Question 3

Then Do

Work Record: Exercise 9A pg 564 Questions 1, 2, 3, 4, 5, 6, 7, 8, 11

Quartiles

Another way to analyse a set of data is to create a 5-figure summary.

This summarises the data in terms of quartiles – ie. it divides the data set into quarters.

To create a 5-figure summary we find the following:

• Minimum Value (Min)

• Lower Quartile (Q1) – The number 25% (a quarter) through the data

• Median (Q2) – The number 50% (halfway/the centre) through the data

• Upper Quartile (Q3) – The number 75% (three quarters) through the data

• Maximum Value (Max)

What does a 5-figure summary look like?

eg. Find the 5-figure summary for the data set: 1, 1, 3, 4, 5, 6, 7, 7, 8

• Min = 1

• Max = 8

• Med (Q2) = 5

• Q1 = 2 (the median of the lower half: 1, 1, 3, 4)

• Q3 = 7 (the median of the upper half: 6, 7, 7, 8)

Min, Q1, Med, Q3, Max

1, 2, 5, 7, 8

What does a 5-figure summary look like?

eg. Find the 5-figure summary for the data set: 10, 11, 13, 13, 15, 16, 17, 19

• Min = 10

• Max = 19

• Med (Q2) = 14

• Q1 = 12 (the median of the lower half: 10, 11, 13, 13)

• Q3 = 16.5 (the median of the upper half: 15, 16, 17, 19)

Min, Q1, Med, Q3, Max

10, 12, 14, 16.5, 19
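A Python sketch of the 5-figure summary. It assumes the quartile rule that matches both worked answers: Q1 is the median of the scores below the median position and Q3 is the median of the scores above it. Other software may use a different quartile rule and give slightly different values. The function names are assumptions for illustration.

```python
def median(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_figure_summary(data):
    """Return (Min, Q1, Median, Q3, Max).

    Assumed rule: Q1 and Q3 are the medians of the lower and upper
    halves; the middle score is left out of both halves when the
    number of scores is odd.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

print(five_figure_summary([1, 1, 3, 4, 5, 6, 7, 7, 8]))
print(five_figure_summary([10, 11, 13, 13, 15, 16, 17, 19]))
```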

What does a 5-figure summary look like?

Let’s confirm our result using the calculator:

10, 11, 13, 13, 15, 16, 17, 19

Min, Q1, Med, Q3, Max

10, 12, 14, 16.5, 19


Now Do

Worksheet 3 Question 1

Measures of Spread

We can use the following to determine how spread out our data set is.

Range = Maximum Value – Minimum Value

Interquartile Range (IQR) = Q3 – Q1

eg. Find the Range and the Interquartile Range for the data set: 3, 3, 4, 6, 7, 8, 10

Range = 10 – 3 = 7

IQR = 8 – 3 = 5

Measures of Spread

We can use the following to determine how spread out our data set is.

Range = Maximum Value – Minimum Value

Interquartile Range (IQR) = Q3 – Q1

eg. Find the Range and the Interquartile Range for the data set: 13, 14, 18, 20, 23, 28, 30

Range = 30 – 13 = 17

IQR = 28 – 14 = 14

IQR gives a good indication of spread when we have small or large values that may not best reflect our data set

Now Do

Worksheet 3 Question 2

Outliers

Some data sets include large or small values that don’t match the rest of the data.

This can sometimes give us values for measures of centre and measures of spread that aren’t the best representation of the data.

A value is considered an ‘outlier’ if:

• it is less than Q1 – (1.5 × IQR), or

• it is greater than Q3 + (1.5 × IQR)

Outliers

eg. Decide if the following data set includes an outlier.

1, 5, 6, 7, 7, 9, 16

A value is considered an ‘outlier’ if it is less than Q1 – 1.5 × IQR or greater than Q3 + 1.5 × IQR.

Step 1: Find Q1 and Q3. Q1 = 5, Q3 = 9

Step 2: Find the IQR. IQR = 9 – 5 = 4

Step 3: Check lower end: Q1 – 1.5 × IQR = 5 – 6 = –1. No value is less than –1.

Step 4: Check upper end: Q3 + 1.5 × IQR = 9 + 6 = 15. The value 16 is greater than 15.

The value 16 is an outlier as it is bigger than the upper end of our allowed values.
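The four steps can be sketched in Python. The function names are assumptions for illustration, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, which reproduces Q1 = 5 and Q3 = 9 for this example.

```python
def median(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def outlier_limits(data):
    """Return (lower end, upper end) of the allowed values."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # Step 1: Q1
    q3 = median(ordered[len(ordered) - half:])   # Step 1: Q3
    iqr = q3 - q1                                # Step 2: IQR
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr        # Steps 3 and 4

def find_outliers(data):
    """Values below the lower end or above the upper end."""
    lower_end, upper_end = outlier_limits(data)
    return [x for x in data if x < lower_end or x > upper_end]

data = [1, 5, 6, 7, 7, 9, 16]
print(outlier_limits(data))
print(find_outliers(data))
```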

Now Do

Worksheet 3 Question 3

Then Exercise 9B – Q1, 2, 3, 4b, 4d, 5, 6, 7, 8, 9a, 9c, 10

Boxplots

The 5-figure summary (Min, Q1, Q2, Q3, Max) can be represented in graphical form using a boxplot.

If our data set includes outliers, we represent these using a cross on the plot.

The line of the boxplot then ends at our next largest (or smallest) value.

Boxplots are always drawn to scale with a ruled, labelled axis at the base of the plot.

Boxplots

[Labelled boxplot drawn above a scale, marking Xmin, Q1, Median, Q3 and Xmax]

• Xmin (Lowest Score)

• Q1 (25% through the data)

• Median (Half way through data)

• Q3 (75% through the data)

• Xmax (Highest Score)

Boxplots

eg. A set of data gives the 5-figure summary 2, 5, 9, 13, 18.

Represent this using a boxplot.

[Boxplot with Min = 2, Q1 = 5, Median = 9, Q3 = 13 and Max = 18 marked]

Boxplots

eg. Draw the boxplot for the data set: 3, 4, 4, 5, 6, 6, 7, 9, 11, 12, 15

[Boxplot with Min = 3, Q1 = 4, Median = 6, Q3 = 11 and Max = 15 marked]

Are there any outliers?

Lower end: $Q_1 - (1.5 \times IQR) = 4 - (1.5 \times 7) = 4 - 10.5 = -6.5$

Upper end: $Q_3 + (1.5 \times IQR) = 11 + (1.5 \times 7) = 11 + 10.5 = 21.5$

No outliers

Boxplots

eg. Draw the boxplot for the data set: 2, 3, 5, 8, 9, 9, 10, 10, 13, 20

Are there any outliers?

Lower end: $Q_1 - (1.5 \times IQR) = 5 - (1.5 \times 5) = 5 - 7.5 = -2.5$

Upper end: $Q_3 + (1.5 \times IQR) = 10 + (1.5 \times 5) = 10 + 7.5 = 17.5$

Outlier = 20

[Boxplot with the outlier 20 marked by a cross (x)]
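The values needed to draw this boxplot by hand can be collected with a short Python sketch. The function name `boxplot_values` and the dictionary keys are assumptions for illustration. The quartiles are again taken as the medians of the lower and upper halves, and each line of the boxplot stops at the most extreme value that is not an outlier.

```python
def median(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def boxplot_values(data):
    """Values needed to draw a boxplot, with outliers listed separately."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[len(ordered) - half:])
    iqr = q3 - q1
    lower_end, upper_end = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_end <= x <= upper_end]
    return {
        "line_start": inside[0],   # smallest value that is not an outlier
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "line_end": inside[-1],    # largest value that is not an outlier
        "outliers": [x for x in ordered if x < lower_end or x > upper_end],
    }

values = boxplot_values([2, 3, 5, 8, 9, 9, 10, 10, 13, 20])
print(values)
```

For this data set the upper line ends at 13, the next largest value after the outlier 20.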

Let’s see how we can use a calculator to plot these.

Boxplots

eg. Draw the boxplot for the data set: 2, 3, 5, 8, 9, 9, 10, 10, 13, 20

Use zoom ‘Box’ to get a better view of the plot.

To see the points on the plot, use Analysis ‘Trace’. Use your arrow keys to move from point to point.

Parallel Boxplots

We can easily compare sets of data using parallel boxplots.

These consist of two or more boxplots drawn together using the same scale.

Given the parallel boxplots above:

What statistical measures do they have in common? Answer: Same values for med (14) and Q3 (17)

Which group of data, A or B, is most spread out? Answer: Group B – largest Range and IQR

Which group has the largest Q1 value? What is it? Answer: Group A – 13

Now Do

Exercise 9C Q1, 2, 3, 4a, 4c, 5a, 5c, 6, 7, 9, 11

Time Series Data

A time series is a sequence of data values that are recorded at regular time intervals.

The data is something meaningful that we monitor over a period of time, such as:

• Temperature monitored every hour throughout the day

• Monthly average temperature monitored throughout the year

• Share price fluctuations monitored hourly/daily/monthly etc.

The time component is drawn on the x-axis.

Data is plotted on the graph as dots, joined together with lines.

A time series plot can show one of the following trends:

• Linear – Straight (or almost straight) line

• Non-Linear (Curve) – Data forms a curve

• No Trend – Data fluctuates.

Describing Trends

eg. The plot below shows the change in population of a country town from 1990 to 2005.

a) What is the population in the year 2000?

b) What is the lowest population recorded?

c) State the trend of the data. Answer: The population declines steadily for the first 9 years, before rising and falling in the final 5 years, resulting in a slight upward trend.

Describing Trends

800

700

eg. A company’s share price over 12 months is recorded each month, as given in the table below.

a) Plot the time series graph of the data (start your y-axis data at $1.20).

b) Describe the way the share price has changed over the year.

The share price generally increased from January to June (from $1.30 to a peak of $1.43), with a small drop of $0.01 in April. After June, the price declined steadily to a low of $1.22 before trending upward to $1.23 in December.

Now Do

Exercise 9D Q1, 2, 3, 5, 6, 7

Bivariate Data

• Bivariate data involves comparing two variables.

• We analyse the data by plotting it on a scatterplot.

• We look at the direction and shape of the data on the plot, and from this we can state the strength of the relationship between the two variables – we call this the ‘correlation’.

Positive Correlation

What does this look like?

[Scatterplots: Strong Positive, Weak Positive]

Negative Correlation

What does this look like?

[Scatterplots: Strong Negative, Weak Negative]

No Correlation

What does this look like?

Bivariate Data

eg. Draw the scatterplot for the data and comment on the correlation of the data.

x    1    1    2    3    4    5    6    7    8    8    9    10   11
y    10   9    11   13   15   17   18   18   20   19   22   24   25

We look at the pattern that the points have made – the dots roughly form a straight line in a positive direction, so we can say the data has a strong positive correlation.

Now Do

Exercise 9E Q1, 2a, 3, 4, 5a, 6, 7, 9

Line of best fit

• When bivariate data has a strong linear correlation, we can model the data with a line of best fit.

• We fit the line ‘by eye’ to try and balance the data points above the line with points below the line.

• What does it look like?

Drawing the line of best fit

We fit the line ‘by eye’ to try and balance the data points above the line with points below the line.

Now Do

Worksheet – Part 1: Drawing a line of best fit

Writing the equation for line of best fit

• Look at the line of best fit. Could we find the equation that matches the line?

• Think back to linear graphs…

Find 2 points on the line: (100, 200) and (600, 700)

Use these to find the gradient of the line: $m = \frac{700 - 200}{600 - 100} = \frac{500}{500} = 1$

Use the gradient and one point to find the y-intercept: $200 = 1 \times 100 + c$, so $c = 100$

$y = x + 100$

Writing the equation for line of best fit

• Find the equation for the line of best fit given on the plot below.

Find 2 points on the line: (3, 8) and (9, 20)

Use these to find the gradient of the line: $m = \frac{20 - 8}{9 - 3} = \frac{12}{6} = 2$

Use the gradient and one point to find the y-intercept: $8 = 2 \times 3 + c$, so $c = 2$

$y = 2x + 2$
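The two-point method can be sketched in Python as a check on the working. The function name `line_through` is an assumption for illustration; it returns the gradient m and the y-intercept c of the line y = mx + c through two points read off the line of best fit.

```python
def line_through(point_1, point_2):
    """Gradient and y-intercept of the straight line through two points."""
    (x1, y1), (x2, y2) = point_1, point_2
    gradient = (y2 - y1) / (x2 - x1)    # rise over run
    intercept = y1 - gradient * x1      # rearranged from y1 = m * x1 + c
    return gradient, intercept

m, c = line_through((3, 8), (9, 20))
print(f"y = {m:g}x + {c:g}")
```

The two points must have different x-values, otherwise the run is zero and the gradient is undefined.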

Now Do

Worksheet – Part 2: Forming the equation for the line of best fit

Using the line of best fit to make predictions

• We can predict values using the line of best fit.

• If we have the line, we can read across from a value on one axis to the line, and then to the corresponding value on the other axis.

eg. Predict the value of y when x = 40. Answer: Approx. y = 42

If we knew the equation for the line, we could also predict the value by substituting x = 40.

Using the line of best fit to make predictions

eg. Predict the value of y when x = 30. Answer: Approx. y = 37

If we know the equation for the line, we can also predict the value of y when x = 30 by substitution.

Now Do

Worksheet – Part 3: Making predictions using the line of best fit

Then Do

Exercise 9F Q1, 2, 4abc, 5, 6
