
Statistics


Page 1: Statistics

Statistics

Page 2: Statistics

Types of Data

• Categorical data is data which has worded categories, eg ‘Ways of getting to school’ might have the categories bus, car, walk, bike.

• Quantitative data is numerical data. It will either be discrete or continuous.

• Discrete data is countable, eg number of peas in a pod, number of pupils in a class.

• Continuous data is measurable, eg lengths of leaves, heights of people, weights of animals etc.

Page 3: Statistics

Definitions

• The population is the group about which you wish to collect information.

• A census involves collecting data from every person in the population.

• A sample involves collecting data from those in a part of the population.

• A sample should be unbiased and therefore representative of the entire population.

• A biased sample is one which has been unfairly influenced by the collection process.

Page 4: Statistics

Ways of Displaying Data

1. Bar Graph (for discrete data)

2. Histogram (for continuous data)

3. Dot Plot

4. Strip Graph

5. Pictogram

6. Pie Graph

7. Stem and Leaf

8. Box and Whisker

9. Line Graph

* = most important ones (six of the nine graph types were starred on the original slide)

Page 5: Statistics

Frequency Tables

Here are the number of questions the 27 students in 9HRT got correct out of the 8 ‘Do Now’ revision questions at the start of a lesson:

5, 5, 4, 6, 4, 4, 5, 5, 6, 4, 4, 3, 7, 4, 6, 5, 5, 5, 4, 6, 5, 6, 4, 5, 4, 5, 5

We illustrate this in a frequency table as follows:

Number of questions correct    Tally Marks      Frequency
3                              |                1
4                              ||||| ||||       9
5                              ||||| ||||| |    11
6                              |||||            5
7                              |                1
Total                                           27
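As a rough check on the tally, the frequency table can also be built by counting each value. This is only a sketch; the names `scores` and `frequency` are made up for the illustration.

```python
from collections import Counter

# The 27 'Do Now' scores listed on the slide
scores = [5, 5, 4, 6, 4, 4, 5, 5, 6, 4, 4, 3, 7, 4, 6, 5, 5, 5,
          4, 6, 5, 6, 4, 5, 4, 5, 5]

# Count how many times each score occurs
frequency = Counter(scores)

# Print one row per score, smallest first, then the total
for value in sorted(frequency):
    print(value, frequency[value])
print("Total", sum(frequency.values()))
```

The total row is a useful check: it should equal the number of students.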

Page 6: Statistics

Discrete Data

For discrete data, we graph the information in a bar graph.

[Bar graph: 'Do Now' Questions Correct. Horizontal axis: Number Correct (1 to 8); vertical axis: Frequency (0 to 12)]

A bar graph must have:
• a title
• labels on both axes
• scale on both axes
• separate bars
• x-axis numbers in centre of the bar
• y-axis scale starting at 0 with evenly spaced numbers
• a good size – at least ⅓ page

Page 7: Statistics

Outliers

Outliers are data values that are either much larger or much smaller than the general body of data.

Outliers appear separated from the body of data on a graph.

[Graph: Number of peas in pods from my garden. Vertical axis: Frequency]

Page 8: Statistics

Grouped Discrete Data

A kindergarten was concerned about the number of cars passing by between 8.45 am and 9.00 am. Over 30 week days they recorded data.

The results were: 27, 30, 17, 13, 46, 23, 40, 28, 38, 24, 23, 22, 18, 29, 16, 35, 24, 8, 24, 44, 32, 52, 31, 39, 32, 19, 41, 38, 24, 32

In situations like this, it is necessary to group the data into class intervals.
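One possible way to do the grouping is sketched below. The class width of 10 (0-9, 10-19, and so on) is an assumption based on the histogram's axis scale, not something the slide states, and the names `cars`, `WIDTH` and `grouped` are hypothetical.

```python
from collections import Counter

# Number of cars recorded on each of the 30 week days
cars = [27, 30, 17, 13, 46, 23, 40, 28, 38, 24, 23, 22, 18, 29, 16,
        35, 24, 8, 24, 44, 32, 52, 31, 39, 32, 19, 41, 38, 24, 32]

WIDTH = 10  # assumed class width

# Key each value by the lower bound of its class interval
grouped = Counter((count // WIDTH) * WIDTH for count in cars)

for lower in sorted(grouped):
    print(f"{lower}-{lower + WIDTH - 1}: {grouped[lower]}")
```

A different class width would give a different table, so the choice should suit the spread of the data.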

Page 9: Statistics

Frequency Table: [not reproduced]

A histogram is then drawn as shown below.

Histogram:

[Histogram: Cars passing by kindergarten. Horizontal axis: Number of cars (0 to 60 in steps of 10); vertical axis: Frequency]

Note the difference in the scale on the horizontal axis.

Page 10: Statistics

Continuous Data

For continuous data, we graph the information in a histogram.

Example: Weights of boys in the rugby squad.
• 2 students lie in the 60 kg up to but not including 70 kg class
• 7 students lie in the 70 kg up to but not including 80 kg class
• 9 students lie in the 80 kg up to but not including 90 kg class
• 5 students lie in the 90 kg up to but not including 100 kg class
• 3 students lie in the 100 kg up to but not including 110 kg class

The frequency table would be:

Weight (kg)                  Frequency
60 up to (not incl.) 70      2
70 up to (not incl.) 80      7
80 up to (not incl.) 90      9
90 up to (not incl.) 100     5
100 up to (not incl.) 110    3

The graph would be:

[Histogram: Weights of boys in the rugby squad. Vertical axis: Frequency]

Note: The bars are JOINED

Page 11: Statistics

A histogram must have:
• a title
• labels on both axes
• scale on both axes
• bars joined (not separate)
• x-axis numbers at the join of the bars (not in the centre of the bar)
• y-axis scale starting at 0 with evenly spaced numbers
• a good size – at least ⅓ page

Page 12: Statistics

Averages

An average is a number that is typical of the data.

In maths we use three different types of average:

1. mean = total of all values ÷ number of values

2. median = middle value

3. mode = most common value

Example: the mean of 5, 0, 8, 1, 0, 4, 3, 0, 2, 2

mean = (5 + 0 + 8 + 1 + 0 + 4 + 3 + 0 + 2 + 2) ÷ 10 = 25 ÷ 10 = 2.5
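The mean calculation can be checked with a few lines of code. This is a small sketch using the example values; the variable names are made up for the illustration.

```python
# Example data from the slide
values = [5, 0, 8, 1, 0, 4, 3, 0, 2, 2]

total = sum(values)   # total of all values
count = len(values)   # number of values
mean = total / count  # mean = total of all values ÷ number of values

print(total, count, mean)
```

Note that the zeros add nothing to the total but still count towards the number of values.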

Page 13: Statistics

The median is the middle value when the data values are all placed in order.

For an odd number of data, the median is the one in the middle.

For an even number of data, the median is between the two middle values.

Example: The median of: 4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4

In order, the data is: 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8

There are 13 bits of data so the median is the 7th bit.

median = 5

Example: The median of 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8

There are 14 bits of data so the median is between the 7th and 8th bit.

median is (5+6) ÷ 2 = 5.5
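The two cases (odd and even number of data) can be sketched as a small function. The function name `median` and the variable names are chosen for this illustration only.

```python
def median(values):
    """Return the middle value of the data once it is placed in order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of data: the one in the middle
        return ordered[middle]
    # Even number of data: halfway between the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = [4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4]
even_example = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8]

print(median(odd_example))
print(median(even_example))
```

Sorting first is essential; taking the middle of the unsorted list would give the wrong answer.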

Page 14: Statistics

The mode is the value that occurs most often

Example: The mode of 4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4

The mode = 6

Example: The mode of 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8

This data set has two modes. The modes are 6 and 7. We say that the data is bimodal.

Pg 308 Ex 10F.1

If a data set has more than 2 modes, we do not use the mode as a measure of the middle.
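A short sketch of finding the mode (or modes) by counting how often each value occurs. The helper name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, smallest first."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

first_example = [4, 6, 3, 2, 7, 8, 3, 5, 5, 7, 6, 6, 4]
second_example = [2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7, 7, 8]

print(modes(first_example))
print(modes(second_example))
```

If the returned list has more than two entries, the rule above says not to use the mode as a measure of the middle.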

Page 15: Statistics

Measures of Spread

The range is the difference between the largest and the smallest value, ie range = highest value – lowest value.

The inter-quartile range is the difference between the upper quartile and lower quartile, ie I.Q.R. = upper quartile – lower quartile

What’s a quartile? That’s easy. It’s just the median of each half of the data.

eg: 2, 5, 8, 3, 5, 7, 1, 9, 4, 8, 4, 6, 7, 8, 4, 9

The range is 9 - 1 = 8

Page 16: Statistics

Quartiles

eg 1: 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9

18 values puts the median between the 9th and 10th values, ie median = 5

The lower quartile is ¼ of the way along the data, which is the middle of the left hand half, ie LQ = 2

The upper quartile is ¾ of the way along the data, which is the middle of the right hand half, ie UQ = 6

Inter-quartile range = UQ – LQ = 6 – 2 = 4

Page 17: Statistics

eg 2: 1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 12

17 bits of data, so median is the 9th bit of data, ie median = 6

Each half (not counting the median) has 8 bits of data, so the quartiles are between the 4th and 5th bit of data in each half.

LQ = 2.5

UQ = 8.5

IQR = 6
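The "median of each half" method used in both examples can be sketched as below. Note that software packages often use other quartile conventions, so their answers may differ slightly from this method. The function name `quartiles` is made up for the illustration.

```python
from statistics import median

def quartiles(values):
    """Return (LQ, median, UQ) using the median-of-each-half method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # left hand half
    upper = data[-half:]  # right hand half (skips the median when n is odd)
    return median(lower), median(data), median(upper)

eg1 = [1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 8, 9, 9]
eg2 = [1, 1, 1, 2, 3, 3, 4, 5, 6, 7, 7, 8, 8, 9, 9, 10, 12]

for data in (eg1, eg2):
    lq, med, uq = quartiles(data)
    print("LQ =", lq, "median =", med, "UQ =", uq, "IQR =", uq - lq)
```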

Page 18: Statistics

Stem & Leaf Graph

The time spent (in minutes) by 20 people in a queue at a bank has been recorded as follows:

3.6  2.1  3.8  2.2  4.5  1.4  0
0    1.6  4.8  1.5  1.9  0    3.6
5.2  2.7  3.0  0.8  3.8  3.2

Time spent queueing at the bank

unsorted:
5 | 2
4 | 5 8
3 | 6 8 6 0 8 2
2 | 1 2 7
1 | 4 6 5 9
0 | 0 0 0 8

sorted:
5 | 2
4 | 5 8
3 | 0 2 6 6 8 8
2 | 1 2 7
1 | 4 5 6 9
0 | 0 0 0 8

Key: 2 | 7 = 2.7 minutes

median = (2.2+2.7) ÷ 2 = 2.45 mins

LQ = 1.45 mins

UQ = 3.7 mins
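The sorted stem and leaf graph can be produced by splitting each time into a stem (whole minutes) and a leaf (tenths of a minute). This is only a sketch; the function name `stem_and_leaf` is hypothetical, and it assumes every value has at most one decimal place.

```python
from collections import defaultdict

# Queueing times in minutes, as listed on the slide
times = [3.6, 2.1, 3.8, 2.2, 4.5, 1.4, 0, 0, 1.6, 4.8,
         1.5, 1.9, 0, 3.6, 5.2, 2.7, 3.0, 0.8, 3.8, 3.2]

def stem_and_leaf(values):
    """Group sorted values into {stem: [leaves]} with leaves in order."""
    rows = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)       # work in tenths to avoid rounding slips
        stem, leaf = divmod(tenths, 10)  # eg 27 tenths -> stem 2, leaf 7
        rows[stem].append(leaf)
    return dict(rows)

plot = stem_and_leaf(times)

# Print with the largest stem at the top, as on the slide
for stem in sorted(plot, reverse=True):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
print("Key: 2 | 7 = 2.7 minutes")
```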

Page 19: Statistics

This is an example of a back to back stem and leaf plot.

The numbers are sorted from smallest to largest, from the centre out.

Discus results for 10SKD

Key: | 9 | 4 = 9.4 m

A stem and leaf graph must have:
• a title
• labels (if back-to-back)
• a key
• the leaf numbers in columns
• numerical order (sorted)
• no commas between numbers

girls boys 14 0 2 5 13 6 7 3 0 12 2 4 5 3 119 3 3 10 3 5 4 9 4 5 5 8 5 7

Page 20: Statistics

Box and Whisker plot

The box and whisker plot is a visual display of the five statistics: Minimum, Lower Quartile, Median, Upper Quartile and Maximum.

[Diagram: box and whisker plot with Min, LQ, Median, UQ and Max labelled]

Page 21: Statistics

A box and whisker graph must have:

• an axis

• a scale on the axis

• a label on the axis (including the units)

• a title

• headings, if side by side

eg: [Side-by-side box and whisker plots: Maths exam results. Axis: mark (%), 0 to 100; headings: boys, girls]

Page 22: Statistics

Comparing data

Often two (or more) box and whisker plots are put on the same set of axes, vertically or horizontally (‘side-by-side’ box plots). We can then compare the data, commenting on:

1) ‘on average’ (using the median)

2) ‘spread’ (using the range or IQR)

3) ‘shape’ (symmetrical or skewed)

4) and a general statement

Page 23: Statistics

Statistical Investigation

1) Pose the question.

2) Collect the data. This may involve preparing a questionnaire, then deciding on whether the data should be collected from the whole population or from just a sample. If from a sample, how do you choose your sample and how many should be in it?

3) Organise the data collected. Maybe in a frequency table or a stem and leaf plot. Calculate the relevant statistics (eg mean, median, LQ, UQ, range, IQR).

4) Illustrate the data collected. Maybe in a box and whisker plot, a pie graph, a bar graph etc.

5) Write an analysis, ending with a conclusion.

Page 24: Statistics

Analysis

When writing a report to compare two (or more) data sets, there are four things we need to mention:

1) On average which is heavier, longer, better etc and quote the values of the medians or means for each set of data.

2) The spread of the data by quoting the values of the range or IQR for each set of data.

3) The shape of the data (by looking at the bar graph, stem & leaf graph or box & whisker plot). We can also mention if there is an outlier, ie a value that is significantly bigger or smaller than the rest of the data.

4) A conclusion

Page 25: Statistics

Shape:

[Diagrams of distribution shapes, labelled: bi-modal; uni-modal; skewed to the lower values; skewed to the higher values; symmetrical]

Page 26: Statistics

Example: Here is a back-to-back stem and leaf graph comparing the lengths of leaves of sprayed fern plants with those that had not been sprayed.

1 1 2 7 8 9 90 3 3 5

7 7 5 5 0 0

0123456

5 9

2 3 3 50 0 4 4 5 6

1 9

4 4 1

9 8 8 5 3 2 2 07 7 5 3 1

5 3 2

sprayed unsprayed

Key: 5 | 1 = 51 cm

Sprayed:
median = 43 cm
range = 45 cm
minimum = 20 cm
LQ = 29 cm
UQ = 54 cm
maximum = 65 cm
IQR = 25 cm

Unsprayed:
median = 31 cm
range = 54 cm
minimum = 5 cm
LQ = 17.5 cm
UQ = 39.5 cm
maximum = 59 cm
IQR = 22 cm

Page 27: Statistics

[Side-by-side box and whisker plots: Lengths of leaves of fern plants. Axis: Length (cm), 0 to 70; headings: Sprayed, Unsprayed]

Page 28: Statistics

Analysis:

On average, the sprayed leaves have grown longer than the unsprayed leaves as their median is 43 cm compared with the unsprayed leaves' median of 31 cm.

The unsprayed leaves' lengths have a greater spread as their range is 54 cm compared with the sprayed leaves' range of 45 cm.

The shape of the unsprayed leaves data is uni-modal whereas for the sprayed leaves it is bi-modal (as seen from the stem & leaf).

In conclusion, the lengths of sprayed leaves are longer than the lengths of the non-sprayed leaves.

Page 29: Statistics

1) The following is a back-to-back stem and leaf graph of the heights of boys and girls in a year 9 class. Work out the relevant statistics, draw the box plot, then write an analysis by filling in the gaps.

            girls |    | boys
            5 2 1 | 14 | 7 9
      9 8 8 6 3 2 | 15 | 3 6 7 7 9
8 7 5 3 1 1 1 0 0 | 16 | 1 3 4 5 6 6 7 9
      7 6 6 4 3 3 | 17 | 1 2 2 3 3 4 5 6 8 9
            9 7 4 | 18 | 1 1 2

Key: 16 | 3 = 163 cm

Girls:
median = 161 cm
range = 48 cm
minimum = 141 cm
LQ = 158 cm
UQ = 174 cm
maximum = 189 cm
IQR = 16 cm

Boys:
median = 168 cm
range = 35 cm
minimum = 147 cm
LQ = 160 cm
UQ = 174.5 cm
maximum = 182 cm
IQR = 14.5 cm
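The statistics for this exercise can be checked by turning each stem and leaf row back into heights and then taking the five statistics. This is a sketch only: the helper names `decode` and `five_statistics` are made up, the leaves are typed in from the centre outwards (smallest first), and the quartiles use the median-of-each-half method from earlier in these notes.

```python
from statistics import median

# Leaves for each stem, read from the centre outwards
girls_rows = {14: [1, 2, 5], 15: [2, 3, 6, 8, 8, 9],
              16: [0, 0, 1, 1, 1, 3, 5, 7, 8],
              17: [3, 3, 4, 6, 6, 7], 18: [4, 7, 9]}
boys_rows = {14: [7, 9], 15: [3, 6, 7, 7, 9],
             16: [1, 3, 4, 5, 6, 6, 7, 9],
             17: [1, 2, 2, 3, 3, 4, 5, 6, 8, 9], 18: [1, 1, 2]}

def decode(rows):
    """Turn {stem: leaves} into a sorted list, eg 16 | 3 -> 163."""
    return sorted(stem * 10 + leaf
                  for stem, leaves in rows.items() for leaf in leaves)

def five_statistics(values):
    """Return (min, LQ, median, UQ, max)."""
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[-half:]), data[-1])

for name, rows in (("Girls", girls_rows), ("Boys", boys_rows)):
    low, lq, med, uq, high = five_statistics(decode(rows))
    print(name, low, lq, med, uq, high,
          "range =", high - low, "IQR =", uq - lq)
```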

Page 30: Statistics

On average, the boys are taller than the girls as the median for the boys is 168 cm compared with the girls' median of 161 cm.

The spread of heights is greater for the girls as their range is 48 cm compared to the boys' range of 35 cm.

The shape of the data for the girls is skewed and for the boys is fairly symmetrical (as seen from the box plot).

[Side-by-side box and whisker plots: Heights of year 9 students. Axis: Height (cm), 140 to 190; headings: girls, boys]

In conclusion, the boys are generally taller than the girls.

Page 31: Statistics

2) The following is a back-to-back stem and leaf graph comparing the weights of the students in 2 classes. Calculate the relevant statistics, draw the box plot, then write an analysis by filling in the gaps.

        Class J |   | Class K
            1 0 | 7 | 0 5
        6 4 2 1 | 6 | 0 1 3 4 6
9 9 7 6 5 1 0 0 | 5 | 3 6 6 7 7 8 9
    7 5 4 2 2 0 | 4 | 2 2 3 4 7 7 8 9 9
        8 7 7 6 | 3 | 5 8

Key: 6 | 3 = 63 kg

Class J:
median = 50.5 kg
range = 35 kg
minimum = 36 kg
LQ = 42 kg
UQ = 60 kg
maximum = 71 kg
IQR = 18 kg

Class K:
median = 56 kg
range = 40 kg
minimum = 35 kg
LQ = 45.5 kg
UQ = 60.5 kg
maximum = 75 kg
IQR = 15 kg

Page 32: Statistics

On average class K is heavier than class J because the median for class K is 56 kg compared with the median for class J of 50.5 kg.

The spread of data for class K is greater as the range for class K is 40 kg compared with the range for class J of 35 kg.

The data shape for class J is fairly symmetrical. The data shape for class K is skewed.

[Side-by-side box and whisker plots: Weights of students. Axis: Weight (kg), 30 to 80; headings: Class J, Class K]

In conclusion, class K is generally heavier than class J.

Page 33: Statistics

3) Here is a back-to-back stem and leaf graph showing the time in minutes for competitors to complete a cross country race. It compares the time of those competitors shorter than 165 cm with those 165 cm or taller. It is using a split stem. Calculate the relevant statistics, draw a box plot and then write a report.

Shorter than 165cm |   | 165cm or taller
               4 0 | 7 | 1 1
             9 8 7 | 6 | 5 7 8 9
           4 3 3 2 | 6 | 0 1 1 1 2 2 2 2 3 3
       9 8 7 7 6 5 | 5 | 8 8 9
   3 3 2 2 2 1 1 0 | 5 | 0 0 1 1 1 2 2 2 3 3 4
           9 8 6 5 | 4 | 7 8 8 9
             3 2 0 | 4 | 3 4 4

Key: 5 | 2 = 52 mins

Page 34: Statistics

Shorter:
median = 54 mins
range = 34 mins
minimum = 40 mins
LQ = 50 mins
UQ = 63 mins
maximum = 74 mins
IQR = 13 mins

Taller:
median = 58 mins
range = 28 mins
minimum = 43 mins
LQ = 50.5 mins
UQ = 62 mins
maximum = 71 mins
IQR = 11.5 mins

[Side-by-side box and whisker plots: Times to run a race. Axis: Time (mins), 30 to 80; headings: Shorter students, Taller students]

Page 35: Statistics

Report:

On average the taller students were slower than the shorter students as the median for the taller students was 58 mins compared to the shorter students median of 54 mins.

The spread of results was greater for the shorter students as they had a range of 34 mins whereas the taller students had a range of 28 mins.

The data for the shorter students is uni-modal and slightly skewed. The data for the taller students is bi-modal (as shown by the stem & leaf graph).

In conclusion, the height of the students does not greatly affect the running speeds.

Pg 326 Problems 1 & 2

Pg 297 Opening Problem B

Pg 324 # 4

Page 36: Statistics

Misleading Graphs

[Two example graphs, each annotated "should be" on the original slide]

Pg 329 Ex 10L

Page 37: Statistics

Pie Graphs

How students of Sancta Maria College travel to school

Transport    Number    Working            Angle
Walk         70        70 ÷ 315 × 360     80°
Car          105       105 ÷ 315 × 360    120°
Bus          84        84 ÷ 315 × 360     96°
Bicycle      56        56 ÷ 315 × 360     64°
Total        315                          360°

We multiply by 360 because there is 360° around a circle.

On Calculator: 70 ÷ 315 × 360 =
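The working column can be reproduced with a short calculation: each category's share of the total is multiplied by 360°. This is a sketch with made-up names (`travel`, `angles`); the multiplication is done before the division so the answers come out exact.

```python
# Number of students using each type of transport
travel = {"Walk": 70, "Car": 105, "Bus": 84, "Bicycle": 56}

total = sum(travel.values())

# angle = number ÷ total × 360
angles = {mode: count * 360 / total for mode, count in travel.items()}

for mode, angle in angles.items():
    print(f"{mode}: {angle:.0f}°")
print("Sum of angles:", sum(angles.values()))
```

Checking that the angles add to 360 is a quick way to catch an arithmetic slip.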

Page 38: Statistics

Time Series Data

Data that is collected over time, at regular intervals, is often called ‘time series data’.

The data is usually presented on a line graph, with time on the horizontal axis.

eg. the following graph shows the number of visitors staying at a motel over a 5 year period.

[Line graph: visitors staying at a motel. Horizontal axis: 1998 to 2002]

A time series is used to identify trends and patterns in data over a period of time so as to predict future movements.

Page 39: Statistics

Long Term Trend: Whether the measurements are increasing, decreasing or staying fairly constant overall.

Seasonal Variations: These are the up and down patterns which recur over a year, month, week or day.

Short Term Features: These are irregular fluctuations, unexpected results, outliers.
