Chapter 8 Statistics


Statistics

Statistics deals with the collection and analysis of data to solve real-world problems.

Era of information

Huge datasets

Transaction data

Customer behavior

Human genome

Satellite photographs

Demographics

…………………………..

Data mining

Data reduction

Multidisciplinary statistics methodology

Elements of Statistics

• Data Collection
– Survey Sampling
– Experimental Design
– Observational Study

• Data Analysis
– Descriptive Statistics and Statistical Graphics
– Statistical Inference (Estimation, Hypothesis Testing)

Descriptive Statistics

Year  Tutorial  HW    Quiz  Midterm  Final  Overall  Grade
3     A         9.76  9.7   94       88     91.66    A+
3     A         9.66  8.7   88       93     91.26    A+
1     B         9.62  9.2   76       94     88.62    A
2     A         9.50  9.0   69       93     85.70    A-
3     A         9.46  5.3   90       87     85.26    A-
1     B         9.80  8.2   70       92     85.00    A-
1     A         9.18  7.4   88       82     83.98    A-
1     A         9.54  8.4   85       79     82.94    B+
2     A         8.94  8.1   77       85     82.64    B+
3     A         9.74  7.5   75       85     82.24    B+
2     B         9.24  7.1   76       85     81.64    B+
1     A         9.40  7.9   64       90     81.50    B+
3     A         9.84  7.6   85       77     81.44    B+
2     B         8.98  7.5   66       88     80.28    B+
1     B         9.64  8.3   70       82     79.94    B+
3     A         9.72  7.7   55       91     79.42    B+
1     B         9.74  6.9   52       94     79.24    B+
1     B         9.82  8.0   73       79     79.22    B+
3     B         9.52  7.6   71       80     78.42    B
3     A         8.60  7.3   66       85     78.20    B
1     B         9.54  7.0   69       81     77.74    B
3     A         7.72  7.4   81       76     77.42    B
…………………………………..

Data: a set of numbers representing characteristics of observations

Different types of Data

• Categorical (data that can be classified into categories)
– Nominal (no ordering)
– Ordinal (ordered)
– Scale (ordered, and the numeric values are meaningful)

• Continuous

Another concern:

• Whether the data are dependent on or independent of each other, e.g. a time series.

Examples of different types of data

Example 1

If the class teacher conducts a survey on the favorite fruit in class and asks the following question:

Please select your favorite fruit among the following choices. (Select one only.)

Apple Orange Banana Mango Others

In this case, we will collect nominal categorical data, as the data are categorized and the ordering of the answers is not meaningful.


Example 2

In a questionnaire, the following question is stated:

Do you agree that all chickens should be killed in order to prevent the outbreak of H5N1 virus?

Agree Neutral Not agree

We will collect ordinal categorical data in this case, as the data are categorized and the ordering of the answers is meaningful.


Example 3

At the end of a questionnaire, there are usually questions about personal details, such as income:

Please select the range of your monthly income.

Below $5000    $5001 - $7000    $7001 - $9000    $9001 - $11000    Above $11001

Here we have scale categorical data, as the data are categorized and the values of the answers are meaningful.

Example 4

If the PE teacher conducts a survey on the physical status of students, the following question may be asked:

What is your height? Answer: cm

Here, we obtain continuous data. Furthermore, the data in this case are independent of each other.

Example 5

Here are the water consumptions of a household between April 1999 and May 2001:

Month     Water consumption (m³)
Apr 99    4.5
Aug 99    3.5
Dec 99    5.5
Apr 00    5.5
Aug 00    8.5
Dec 00    7.0
Apr 01    5.0

This is a time series: a particular type of continuous data in which the observations are recorded in sequence over time.

Frequency Distribution

Vehicles       Frequency   Percentage
Cars           45          59
Lorries        22          29
Motorcycles    6           8
Buses          3           4
Total          76          100

Table: Flow of vehicles

Summary Statistics for Discrete Variables

Grade   Count   CumCnt   Percent   CumPct
A+      2       2        2.25      2.25
A       1       3        1.12      3.37
A-      4       7        4.49      7.87
B+      11      18       12.36     20.22
B       8       26       8.99      29.21
B-      7       33       7.87      37.08
C+      16      49       17.98     55.06
C       14      63       15.73     70.79
C-      11      74       12.36     83.15
D       13      87       14.61     97.75
F       2       89       2.25      100.00
N = 89

Table: Grade of Students
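The cumulative count and percentage columns of the grade table can be rebuilt from the counts alone. The following is a minimal Python sketch, not the software that produced the original output; the function name `frequency_table` is a hypothetical helper introduced for illustration.

```python
# Grade counts copied from the "Grade of Students" table.
grade_counts = {
    "A+": 2, "A": 1, "A-": 4, "B+": 11, "B": 8, "B-": 7,
    "C+": 16, "C": 14, "C-": 11, "D": 13, "F": 2,
}


def frequency_table(counts):
    """Return rows of (label, count, cum_count, percent, cum_percent)."""
    total = sum(counts.values())
    rows = []
    running = 0  # cumulative count so far
    for label, count in counts.items():
        running += count
        rows.append((
            label,
            count,
            running,
            round(100 * count / total, 2),
            round(100 * running / total, 2),
        ))
    return rows


table = frequency_table(grade_counts)
for row in table:
    print("{:<4}{:>6}{:>8}{:>9.2f}{:>9.2f}".format(*row))
```

The same function applies to the vehicle counts; rounding the percentages to whole numbers would give the Percentage column of the vehicle table.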


Commonly used graphical displays

• Pictogram

• Bar chart

• Pie chart

• Histogram

• Broken line graph

Pictogram

Number of persons who enjoy each of the three entertainments

[Figure: pictogram with one row for each entertainment: Magic, Movie, Concert. Each figure represents 5 persons.]

Grouped Bar chart

Population of different age groups (in thousands)

Age      1991   2001
20-24    430    400
25-29    580    440
30-34    600    490
35-39    590    630
40-44    400    650
45-49    240    530

Stacked Bar Chart

Revenue and Expenditure in 2000-2001

Revenue            $ Million
Direct Taxes       13816
Indirect Taxes     8443
Other Revenue      14084

Expenditure           $ Million
Social Services       13661
Community Services    3902
General Services      13262
Economic Services     1218
Security Services     4859

Pie chart

Grade           Percentage
Grade A         2.70%
Grade B         6.80%
Grade C         14.30%
Grade D         22.10%
Grade E         21.60%
Grade F         16.90%
Unclassified    15.60%

Histogram

In a histogram, the area of each bar represents the frequency.

[Figure: histogram example marked INCORRECT]

Broken line graph

Sales of a record company

Month   Number of CDs sold
Jan     19500
Feb     23000
Mar     21000
Apr     20000
May     15000
June    9000

[Figure: broken line graph, "Revenue and Expenditure of HK Government from 1979-80 to 1984-85". Horizontal axis: Year (79-80 to 84-85). Vertical axis: $ Million (0 to 40000). Lines: Revenue, Expenditure.]

What graph(s) is/are suitable?

Example 1

If the class teacher conducts a survey on the favorite fruit in class and asks the following question:

Please select your favorite fruit among the following choices. (Select one only.)

Apple Orange Banana Mango Others

Pie chart / Bar chart / Pictogram


Example 2

In a questionnaire, the following question is stated:

Do you agree that all chickens should be killed in order to prevent the outbreak of H5N1 virus?

Agree Neutral Not agree

Bar chart

Example 3

At the end of a questionnaire, there are usually questions about personal details, such as income:

Please select the range of your monthly income.

Below $5000    $5001 - $7000    $7001 - $9000    $9001 - $11000    Above $11001

Bar chart

Example 4

If the PE teacher conducts a survey on the physical status of students, the following question may be asked:

What is your height? Answer: cm

Histogram

Example 5

Here are the water consumptions of a household between April 1999 and May 2001:

Month     Water consumption (m³)
Apr 99    4.5
Aug 99    3.5
Dec 99    5.5
Apr 00    5.5
Aug 00    8.5
Dec 00    7.0
Apr 01    5.0

Broken line graph

A. Frequency Distribution

Data that have not been organized in any way are called raw data.

When summarizing large masses of raw data, it is often useful to distribute the data into classes or categories and to determine the number of individuals belonging to each class, called the class frequency. A tabular arrangement of data by classes together with the corresponding class frequencies is called a “frequency distribution” or “frequency table”.

Frequency Table

Example: Heights of 100 Male Students at XYZ University

Height (inches)   Number of Students
60-62             5
63-65             18
66-68             42
69-71             27
72-74             8
Total             100
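A frequency table like this one is produced by tallying raw data into classes. The sketch below shows one way to do the tally in Python. The class limits are taken from the table, but the raw heights are hypothetical values invented for illustration (the original raw data are not given), and `class_frequencies` is an assumed helper name.

```python
# Class limits copied from the frequency table (heights in whole inches).
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

# Hypothetical raw data, invented only to demonstrate the tally.
raw_heights = [61, 64, 67, 66, 70, 73, 68, 65, 67, 69]


def class_frequencies(data, class_limits):
    """Count how many values fall inside each inclusive class interval."""
    freq = {limits: 0 for limits in class_limits}
    for value in data:
        for low, high in class_limits:
            if low <= value <= high:
                freq[(low, high)] += 1
                break  # each value belongs to exactly one class
    return freq


freq = class_frequencies(raw_heights, classes)
for (low, high), count in freq.items():
    print(f"{low}-{high}: {count}")
```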

B. Graphical Representation of Frequency Distribution

(1) Histogram
(2) Frequency Polygon and Curve
(3) Cumulative Frequency Polygon and Curve

C. Measures of Central Tendency

The Three Averages:

Arithmetic Mean

Median

Mode

1) Arithmetic Mean

(i) For ungrouped data

Let $x_1, x_2, x_3, \ldots, x_n$ be the n ungrouped data. Then the arithmetic mean ($\bar{x}$) is given by:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \]

Arithmetic Mean = (Sum of ALL data) / (No. of data)

(ii) For grouped data

\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_n x_n}{f_1 + f_2 + f_3 + \cdots + f_n} \]

E.g. Find the mean no. of potatoes per plant given the following frequencies of occurrence.

No. of potatoes (x)   2   3   4    5    6    7    8   9
No. of plants (f)     4   5   10   29   30   18   3   1

\[ \bar{x} = \frac{4(2) + 5(3) + 10(4) + 29(5) + 30(6) + 18(7) + 3(8) + 1(9)}{4 + 5 + 10 + 29 + 30 + 18 + 3 + 1} = 5.47 \]
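The grouped-data mean for the potato example can be checked with a few lines of Python. This is a small sketch; `grouped_mean` is a hypothetical helper name, and the two lists are copied from the table.

```python
# Values (x) and frequencies (f) copied from the potato table.
potatoes = [2, 3, 4, 5, 6, 7, 8, 9]
plants = [4, 5, 10, 29, 30, 18, 3, 1]


def grouped_mean(values, freqs):
    """Mean of grouped data: (sum of f*x) divided by (sum of f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    total_f = sum(freqs)
    return total_fx / total_f


mean_potatoes = grouped_mean(potatoes, plants)
print(round(mean_potatoes, 2))
```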

If class intervals are given, we have to assign the class mid-point to each of the intervals.

E.g. Find the mean of the following distribution.

Class           Frequency (f)   Class mid-point (x)
35 – 40         7               37.5
Over 40 – 45    18              42.5
Over 45 – 50    23              47.5
Over 50 – 55    24              52.5
Over 55 – 60    16              57.5
Over 60 – 65    12              62.5

\[ \bar{x} = \frac{7(37.5) + 18(42.5) + 23(47.5) + 24(52.5) + 16(57.5) + 12(62.5)}{7 + 18 + 23 + 24 + 16 + 12} = 50.5 \]
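The mid-point method can be sketched in Python as follows: each class is replaced by the average of its lower and upper limits, and the grouped-data mean is then computed as before. The variable names are illustrative assumptions; the class limits and frequencies come from the table.

```python
# Lower and upper limits of each class, and the class frequencies.
class_limits = [(35, 40), (40, 45), (45, 50), (50, 55), (55, 60), (60, 65)]
frequencies = [7, 18, 23, 24, 16, 12]

# The class mid-point is the average of the two class limits.
midpoints = [(low + high) / 2 for low, high in class_limits]

# Grouped mean: (sum of f*x) / (sum of f), with x the class mid-point.
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
total_f = sum(frequencies)
mean_value = total_fx / total_f

print(midpoints)
print(mean_value)
```

Because every datum in a class is treated as if it sat at the mid-point, the result is an approximation of the mean of the underlying raw data.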

2) Median

The median is the middle value of a group of data when they are arranged in order of magnitude.

E.g.

Xi:     10   12   7    30   15

Rank:   7    10   12   15   30

Median = 12
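The ranking step can be written as a short Python function. This is a sketch under one assumption that the slide does not state: for an even number of data, it takes the average of the two middle values, which is the usual convention.

```python
def median(data):
    """Return the middle value of the data after sorting."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of data: a single middle value exists.
        return ordered[mid]
    # Even number of data (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


x = [10, 12, 7, 30, 15]
print(sorted(x))
print(median(x))
```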

3) Mode

The mode is the datum that occurs most frequently in a set of data.

E.g.

No. of potatoes (x)   2   3   4    5    6    7    8   9
No. of plants (f)     4   5   10   29   30   18   3   1

Mode = 6
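For data given as a frequency table, the mode is the value with the largest frequency. The Python sketch below returns a list, since a distribution may have more than one mode; `modes` is a hypothetical helper name.

```python
# Values and frequencies copied from the potato table.
potatoes = [2, 3, 4, 5, 6, 7, 8, 9]
plants = [4, 5, 10, 29, 30, 18, 3, 1]


def modes(values, freqs):
    """Return every value whose frequency equals the highest frequency."""
    highest = max(freqs)
    return [x for x, f in zip(values, freqs) if f == highest]


potato_modes = modes(potatoes, plants)
print(potato_modes)
```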

E.g. The histogram below shows the results of an experiment in which 140 batteries are tested to determine their lifetimes.

[Figure: histogram, "Lifetimes of 140 batteries". Horizontal axis: Number of hours (class marks 11.5, 13.5, 15.5, 17.5, 19.5). Vertical axis: Frequency (0 to 50).]

(a) Find the median of the lifetimes of the batteries.

(b) Find, correct to 2 decimal places, the mean lifetime of the batteries.

(c) Find the probability that a battery chosen at random from these 140 batteries would have a lifetime of at least 18.5 hours.

(a) Find the median of the lifetimes of the batteries.

Since the total number of data is 140, we have to find the line that cuts the data at the middle position, i.e. between the 70th and the 71st data.

[Figure: histogram, "Lifetimes of 140 batteries", with the bar frequencies labelled 16, 20, 34, 42, 28 for class marks 11.5, 13.5, 15.5, 17.5, 19.5.]

The cumulative frequencies are 16, 36, 70, 112, 140, so the 70th datum is the last one in the class with mark 15.5 and the 71st datum is the first one in the class with mark 17.5.

Median = (15.5 + 17.5) / 2 = 16.5 hours

(b) Find, correct to 2 decimal places, the mean lifetime of the batteries.

Class mark (x)   Frequency (f)   fx
11.5             16              184
13.5             20              270
15.5             34              527
17.5             42              735
19.5             28              546
Sum              140             2262

Mean = (sum of fx) / (sum of f) = 2262 / 140 = 16.16 hours (2 d.p.)

(c) Find the probability that a battery chosen at random from these 140 batteries would have a lifetime of at least 18.5 hours.

Class mark (x)   Frequency (f)
11.5             16
13.5             20
15.5             34
17.5             42
19.5             28

The class with mark 19.5 has class interval 18.5 – 20.5.

Pr (lifetime of at least 18.5 hrs)

= (No. of batteries with a lifetime of at least 18.5 hrs) / (Total no. of batteries)

= 28 / 140 = 0.2
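All three parts of the battery question can be reproduced from the class marks and frequencies. The Python sketch below follows the slide's approach; it assumes a class width of 2 hours (adjacent class marks are 2 apart) and an even number of data, and `class_mark_at` is a hypothetical helper name.

```python
class_marks = [11.5, 13.5, 15.5, 17.5, 19.5]
frequencies = [16, 20, 34, 42, 28]
class_width = 2.0  # assumption: adjacent class marks are 2 hours apart

n = sum(frequencies)  # total number of batteries


def class_mark_at(position):
    """Return the class mark of the datum at a 1-based ordered position."""
    running = 0
    for mark, f in zip(class_marks, frequencies):
        running += f
        if position <= running:
            return mark
    raise ValueError("position exceeds the number of data")


# (a) Median: midway between the 70th and 71st data (n is even here).
median_hours = (class_mark_at(n // 2) + class_mark_at(n // 2 + 1)) / 2

# (b) Mean: (sum of f*x) / (sum of f).
mean_hours = sum(f * x for x, f in zip(class_marks, frequencies)) / n

# (c) Probability: share of batteries in classes starting at 18.5 or later.
long_lived = sum(
    f for x, f in zip(class_marks, frequencies)
    if x - class_width / 2 >= 18.5
)
probability = long_lived / n

print(median_hours, round(mean_hours, 2), probability)
```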

The Arithmetic Mean

Arithmetic Mean = (Sum of ALL data) / (No. of data)

1. It requires all the given data in its calculation.

2. It can be easily affected by extreme values, and thus may be very misleading.

3. It is often used in further statistical calculations.

Here, we have two sets of data:

Data 1: {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

Data 2: {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100000}

Then,

Arithmetic Mean of Data 1 = 11 / 11 = 1

Arithmetic Mean of Data 2 = 100010 / 11 ≈ 9091.82

The arithmetic mean has many nice properties, and hence many advanced statistical tools are built on it.

The Median

Median is the middle value of a group of data when they are arranged in order of magnitude.

1. It requires only the middle datum (or the two middle data) in its calculation.

2. It is not affected by extreme values.

3. It is seldom used in further statistical calculations.

Data 1: {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

Data 2: {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100000}

Then,

Median of Data 1 = 1

Median of Data 2 = 1

The topic of order statistics is very complicated.

The Mode

Mode is the value with the highest frequency.

1. It is easy to understand and convenient to use.

2. It is not affected by extreme values.

3. There may be more than one mode in a distribution.

4. It is rarely used in further statistical calculations.

Data 1: {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}

Data 2: {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100000}

Then,

Mode of Data 1 = 1

Mode of Data 2 = 1

Points to note when using the averages

1. The measures should be used sensibly (i.e., they should be easy to interpret and not misleading).

2. Usually the MEAN is useful and not misleading. However, when the distribution of the data is highly skewed (i.e., extreme values exist), it is better to use the MEDIAN.

3. We use MODE only when no further statistical analysis is needed.
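The effect of an extreme value on the three averages can be checked directly. The following Python sketch uses the standard library `statistics` module on the two data sets from the earlier slides (eleven 1s, and ten 1s followed by 100000); it is an illustration, not part of the original slides.

```python
from statistics import mean, median, mode

# Data 1: eleven 1s. Data 2: ten 1s and one extreme value.
data1 = [1] * 11
data2 = [1] * 10 + [100000]

summary = {}
for name, data in (("Data 1", data1), ("Data 2", data2)):
    summary[name] = (round(mean(data), 2), median(data), mode(data))
    print(name, summary[name])
```

Only the mean moves when the single extreme value is introduced; the median and the mode stay at 1, which is why the median is preferred for highly skewed data.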