Statistics: The POWER of Data

Statistics The POWER of Data. Statistics: Definition Statistics is the mathematics of the collection, organization, and interpretation of numerical data


Page 1

Statistics

The POWER of Data

Page 2

Statistics: Definition

Statistics is the mathematics of the collection, organization, and interpretation of numerical data.

Page 3

Data

• Data is information, in the form of facts or figures, used as a basis for making calculations or drawing conclusions

• Data can be quantitative or qualitative.

• Data can be obtained by a variety of methodologies:
– Random Sample
– Quadrat Study
– Questionnaires
– Experiments

Page 4

Types of Data: Qualitative

• Information that relates to characteristics or descriptions (observable qualities)

• Examples
– Species of plant
– Type of insect
– Shades of color
– Rank of flavor in taste testing

• Qualitative data can be “scored” and evaluated numerically

IB Junior Class

Qualitative data:
• Friendly demeanors
• Hard workers
• Environmentalists
• Positive school spirit

Page 5

Types of Data: Quantitative

• Quantitative – measured using a naturally occurring numerical scale

• Measurements are often displayed graphically

• Examples
– Chemical concentration
– Temperature
– Length
– Weight… etc.

Page 6

http://www.swiftchart.com/example.htm

Page 7

Error Analysis

• ALL measurements are subject to uncertainties and this must always be stated.

– Measure someone’s height. Are they slouching? Wearing shoes? Standing on even ground? Is it morning or night?

– Measuring multiple times and averaging REDUCES the random error!!

• The limits of the equipment used add some uncertainty to the data collected. All equipment has a certain magnitude of uncertainty. For example, is a ruler that is mass-produced a good measure of 1 cm? 1mm? 0.1mm?

• For quantitative testing, you must indicate the level of uncertainty of the tool that you are using for measurement!!

Page 8

Instrument Error

Most measurements in the lab are done with devices that have a marked scale. Here we are measuring the length of a pendulum. Even using this idealized, zoomed-in picture, we cannot tell for sure whether the length to the end of the mass is 128.89 cm or 128.88 cm. However, it is certainly closer to 128.9 cm than to 128.8 cm or 129.0 cm. Thus we can state with absolute confidence that the length L is 128.9 cm ± 0.1 cm.

Page 9

Error and Electronics

• ALL devices have an error potential; sometimes it is written on the instrument itself

• The error of an electronic device is usually half of the last precision digit.

Page 10

Error and Graphing

• Error bars are a graphical representation of the variability of data.

• Error bars may show confidence intervals, standard errors, standard deviations, range of data or other quantities.

• More trials = less error

Page 11

Significant Figures

• Be sure that the number of significant digits in the data table/graph reflects the precision of the instrument used. For example, if the manufacturer states that a balance is accurate to 0.1 g and your average mass is 2.06 g, be sure to round the average to 2.1 g. Your data must be consistent with your measurement tool regarding significant figures.
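The rounding rule on this slide can be sketched in a few lines of Python. This is only an illustration: the helper name `round_to_resolution` is made up for this example, and it assumes ordinary "round half up" rounding, which the slide does not specify.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_resolution(value: str, resolution: str) -> Decimal:
    """Round a measurement to the instrument's stated resolution.

    Both arguments are strings so that Decimal sees the digits exactly as
    written, not a binary floating-point approximation of them.
    The rounding mode (half up) is an assumption of this sketch.
    """
    return Decimal(value).quantize(Decimal(resolution), rounding=ROUND_HALF_UP)


# The balance example from the slide: average mass 2.06 g, balance good to 0.1 g.
rounded_mass = round_to_resolution("2.06", "0.1")
print(rounded_mass)  # 2.1
```

Passing the resolution explicitly keeps the reported value tied to the measuring tool rather than to however many digits the calculator happens to show.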

Page 12

Creating Error Bars on Graphs

• When we need an accurate measurement, we usually repeat the measurement several times and calculate an average value

• Then we do the same for the next measurement

• Once the averages are calculated for each set of data, the average values can be plotted together on a graph to visualize the relationship between the two!

Page 13

Comparing Averages

Page 14

Creating Error Bars

The simplest way to draw an error bar is to use the mean as the central point, and to use the distance of the measurement that is furthest from the average as the endpoints of the data bar.
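As a rough sketch of this "furthest measurement" rule, the snippet below computes the mean and the half-width of the error bar. The function name and the five trial values are hypothetical and used for illustration only.

```python
def mean_with_range_error(measurements):
    """Return (mean, half_width).

    half_width is the distance from the mean to the measurement that
    lies furthest from it, used as the length of the error bar on
    each side of the mean.
    """
    mean = sum(measurements) / len(measurements)
    half_width = max(abs(x - mean) for x in measurements)
    return mean, half_width


# Hypothetical repeated length measurements in cm (not from the slides).
trials = [12.1, 12.4, 11.9, 12.0, 12.6]
mean, half_width = mean_with_range_error(trials)
print(f"{mean:.1f} ± {half_width:.1f} cm")
```

The bar then runs from `mean - half_width` to `mean + half_width`. Plotting libraries that draw error bars usually accept such a half-width directly.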

Page 15

Error Bars

• Error bars that overlap can suggest that there is not a significant difference

Page 16

Other Data Calculations

• mode: value that appears most frequently

• median: When all data are listed from least to greatest, the value at which half of the observations are greater, and half are lesser.

• The most commonly used measure of central tendency is the mean, or arithmetic average (sum of data points divided by the number of points)   

• You should be able to find the mean, mode and quartiles of your data and this should be shown on your graphs or in your data charts.

• Example data: 13, 18, 13, 14, 13, 16, 14, 21, 13

• mean: 15; median: 14; mode: 13; range: 8
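The example values on this slide can be checked with Python's standard `statistics` module. This is a minimal sketch. The quartile call uses the module's default method, which may differ slightly from the method taught in class.

```python
import statistics

# Example data set from the slide.
data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

mean = statistics.mean(data)        # sum of values divided by their count
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value
data_range = max(data) - min(data)  # largest value minus smallest value

# Three cut points that split the sorted data into four equal groups;
# the middle cut point is the median.
quartiles = statistics.quantiles(data, n=4)

print(mean, median, mode, data_range)
```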

Page 17

Standard Deviation

• Standard deviation is used to summarize the spread of values around the mean

• For normally distributed data, about 68% of all values lie within ±1 standard deviation of the mean. This rises to about 95% for ±2 standard deviations.

• A small standard deviation indicates that the data is clustered closely around the mean value.

• Conversely, a large standard deviation indicates a wider spread around the mean.

• Standard deviation can also be used in drawing error bars
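The 68% and 95% figures quoted above can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library. The helper name `fraction_within` is an illustrative choice.

```python
from statistics import NormalDist

# A normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k: float) -> float:
    """Fraction of a normal population lying within ±k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1sd = fraction_within(1)  # roughly 0.68
within_2sd = fraction_within(2)  # roughly 0.95
print(f"{within_1sd:.2%} within ±1 SD, {within_2sd:.2%} within ±2 SD")
```

These fractions apply only when the data are approximately normally distributed, as the slide states.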

Page 18

Calculating Standard Deviation

• You can use the old formula for calculating standard deviation or the much easier calculator button!

• We will practice using the formula.

Page 19

Calculating Standard Deviation

• Standard Deviation can be calculated:
– On your calculator
– In Microsoft Excel: type the following code into the cell where you want the standard deviation result, using the “unbiased,” or “n-1” method: =STDEV(A1:A30) (substitute the cell name of the first value in your dataset for A1, and the cell name of the last value for A30.)
– On your computer: http://www.pages.drexel.edu/~jdf37/mean.htm
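For readers working in Python rather than Excel, the "n-1" method mentioned above corresponds to `statistics.stdev`. The sketch below compares it with the "n" method (`statistics.pstdev`), reusing the nine-value example data set from the "Other Data Calculations" slide.

```python
import statistics

# Example data set from the "Other Data Calculations" slide.
data = [13, 18, 13, 14, 13, 16, 14, 21, 13]

# "Unbiased" sample standard deviation: divides by n - 1.
sample_sd = statistics.stdev(data)

# Population standard deviation: divides by n, so it is slightly smaller.
population_sd = statistics.pstdev(data)

print(f"n-1 method: {sample_sd:.2f}   n method: {population_sd:.2f}")
```

The two results differ noticeably for small samples, so it is worth stating which method was used.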

Page 20

Let’s measure some people!!

Page 21

Is our data reliable enough to support a conclusion?

Page 22

Imagine we chose two children at random, one from each of two classrooms…

210 211

… and compare their heights …

Page 23

210 211

… we find that one pupil is taller than the other

WHY?

Page 24

REASON 1: There is a significant difference between the two groups, so pupils in 211 are taller than pupils in 210

210: Ninth grade

211: 11th grade

Page 25

REASON 2: By chance, we picked a short pupil from 210 and a tall one from 211

210 211

Heather

(11th grade)

Alex

(11th grade)

Page 26

How do we decide which reason is most likely?

MEASURE MORE STUDENTS!!!

Page 27

If there is a significant difference between the two groups…

210 211

… the average or mean height of the two groups should be very…

… DIFFERENT

Page 28

If there is no significant difference between the two groups…

210 211

… the average or mean height of the two groups should be very…

… SIMILAR

Page 29

Remember:

Living things normally show a lot of variation, so…

Page 30

It is VERY unlikely that the mean height of our two samples will be exactly the same

211 Sample

Average height = 162 cm

210 Sample

Average height = 168 cm

Is the difference in average height of the samples large enough to be significant?

Page 31

We can analyze the spread of the heights of the students in the samples by drawing histograms

Here, the ranges of the two samples have a small overlap, so…

… the difference between the means of the two samples IS probably significant.

[Figure: two histograms of Frequency (axis 2 to 16) against Height (cm) in classes 140-149, 150-159, 160-169, 170-179 and 180-189, one for the 211 Sample and one for the 210 Sample]

Page 32

Here, the ranges of the two samples have a large overlap, so…

… the difference between the two samples may NOT be significant.

The difference in means is possibly due to random sampling error

[Figure: two histograms of Frequency (axis 2 to 16) against Height (cm) in classes 140-149, 150-159, 160-169, 170-179 and 180-189, one for the 211 Sample and one for the 210 Sample]

Page 33

To decide if there is a significant difference between two samples we must compare the mean height for each sample…

… and the spread of heights in each sample.

Statisticians calculate the standard deviation of a sample as a measure of the spread of a sample

You can calculate standard deviation using the formula:

$$S_x = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

Where:
• $S_x$ is the standard deviation of the sample
• $\Sigma$ stands for ‘sum of’
• $x$ stands for the individual measurements in the sample
• $n$ is the number of individuals in the sample
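The formula above can be followed term by term in code. The sketch below is a direct transcription, not an optimized routine. The function name and the eight sample values are hypothetical and chosen only to keep the arithmetic easy to follow by hand.

```python
import math


def sample_standard_deviation(measurements):
    """Standard deviation from the sum and the sum of squares (n - 1 method)."""
    n = len(measurements)
    sum_x = sum(measurements)                         # Σx
    sum_x_squared = sum(x * x for x in measurements)  # Σx²
    variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)
    return math.sqrt(variance)


# Hypothetical measurements (not from the slides):
# Σx = 40, Σx² = 232, n = 8, so the variance is (232 - 200) / 7.
values = [2, 4, 4, 4, 5, 5, 7, 9]
s_x = sample_standard_deviation(values)
print(f"S_x = {s_x:.3f}")
```

This sum-of-squares form is convenient on a calculator because it needs only two running totals. With very large values or floating-point data it can lose precision, which is one reason library routines are usually preferred in practice.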

Page 34

Student’s t-test

The Student’s t-test compares the averages and standard deviations of two samples to see if there is a significant difference between them.

We start by calculating a number, t

t can be calculated using the equation:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}}$$

Where:
• $\bar{x}_1$ is the mean of sample 1
• $s_1$ is the standard deviation of sample 1
• $n_1$ is the number of individuals in sample 1
• $\bar{x}_2$ is the mean of sample 2
• $s_2$ is the standard deviation of sample 2
• $n_2$ is the number of individuals in sample 2
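As a small sketch of the equation above, the function below computes t from the six summary values it names. The function name and the input numbers are hypothetical and serve only to show the call.

```python
import math


def t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """t = (mean1 - mean2) / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical summary values for two samples of 16 individuals each.
t = t_statistic(mean1=10.0, sd1=2.0, n1=16, mean2=8.0, sd2=2.0, n2=16)
print(f"t = {t:.2f}")
```

The sign of t depends only on which sample is called sample 1. It is the size of t that is compared with the critical value.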

Page 35

Worked Example: Random samples were taken of pupils in 211 and 210

Their recorded heights are shown below…

Student heights (cm):

Students in 211: 145, 149, 152, 153, 154, 154, 158, 160, 166, 166, 166, 167, 175, 177, 182

Students in 210: 148, 153, 157, 161, 162, 162, 163, 167, 172, 172, 175, 177, 183, 185, 187

Step 1: Work out the mean height for each sample

211: $\bar{x}_1 = 161.60$

210: $\bar{x}_2 = 168.27$

Step 2: Work out the difference in means

$\bar{x}_2 - \bar{x}_1 = 168.27 - 161.60 = 6.67$

Page 36

Step 3: Work out the standard deviation for each sample

211: $s_1 = 10.86$

210: $s_2 = 11.74$

Step 4: Calculate $s^2/n$ for each sample

211: $\frac{(s_1)^2}{n_1} = 10.86^2 \div 15 = 7.86$

210: $\frac{(s_2)^2}{n_2} = 11.74^2 \div 15 = 9.19$

Page 37

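Step 5: Calculate $\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}$

$$\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}} = \sqrt{7.86 + 9.19} = 4.13$$

Step 6: Calculate t (Step 2 divided by Step 5)

$$t = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}} = \frac{6.67}{4.13} = 1.62$$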

Page 38

Step 7: Work out the number of degrees of freedom

d.f. = n1 + n2 – 2 = 15 + 15 – 2 = 28

Step 8: Find the critical value of t for the relevant number of degrees of freedom

Use the 95% (p=0.05) confidence limit

Critical value = 2.048

Our calculated value of t (1.62) is below the critical value for 28 d.f. (2.048); therefore, there is no significant difference between the heights of students in the samples from 211 and 210

http://www.graphpad.com/quickcalcs/ttest1.cfm
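The eight steps of the worked example can be re-run from the raw heights as a check. The sketch below assumes that the first five values in each row of the original table belong to 211 and the last five to 210, a split that reproduces the slide's means of 161.60 and 168.27. The critical value 2.048 is copied from the slide rather than computed.

```python
import math
import statistics

# Heights in cm, split between the two classes as described above.
heights_211 = [145, 149, 152, 153, 154, 154, 158, 160, 166, 166,
               166, 167, 175, 177, 182]
heights_210 = [148, 153, 157, 161, 162, 162, 163, 167, 172, 172,
               175, 177, 183, 185, 187]

# Steps 1-2: means and their difference.
mean_211 = statistics.mean(heights_211)
mean_210 = statistics.mean(heights_210)
difference = mean_210 - mean_211

# Step 3: sample standard deviations (n - 1 method).
sd_211 = statistics.stdev(heights_211)
sd_210 = statistics.stdev(heights_210)

# Steps 4-5: s^2 / n for each sample, then the square root of their sum.
standard_error = math.sqrt(sd_211 ** 2 / len(heights_211)
                           + sd_210 ** 2 / len(heights_210))

# Step 6: t is the difference in means divided by the result of Step 5.
t = difference / standard_error

# Step 7: degrees of freedom.
degrees_of_freedom = len(heights_211) + len(heights_210) - 2

# Step 8: compare with the critical value quoted on the slide (p = 0.05).
CRITICAL_T = 2.048
significant = t > CRITICAL_T

print(f"t = {t:.2f}, d.f. = {degrees_of_freedom}, significant: {significant}")
```

Without rounding the intermediate steps, t comes out at about 1.61 rather than 1.62. The conclusion is unchanged, because either value is well below 2.048.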

Page 39

Ethics in Statistics

• Statistics are funny things; they can be misused and used to prove almost anything.
– 50% of the population of the US has below average intelligence!
– 4 of 5 doctors recommend….

Page 40

Ways of Misusing Data

• Repeating an experiment until the numbers say what you want (for example, interviewing MANY groups of 5 dentists)

• NEVER IGNORE DATA THAT DOESN’T SEEM TO MATCH YOUR BELIEFS!!!!!
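To see why repeating a small survey until it "works" is misleading, the sketch below estimates how often at least one group of 5 would show 4 or more recommendations purely by chance. The 50% recommendation rate is an assumption made up for this illustration, as are the function names.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Assumption for illustration: each dentist recommends with probability 0.5.
P_RECOMMEND = 0.5

# Chance that a single group of 5 shows "4 of 5 recommend" or better.
p_single_group = prob_at_least(4, 5, P_RECOMMEND)


def prob_any_group(groups: int) -> float:
    """Chance that at least one of several independent groups shows it."""
    return 1 - (1 - p_single_group) ** groups


print(f"one group: {p_single_group:.4f}, ten groups: {prob_any_group(10):.4f}")
```

Under this assumption a single group gives the headline result less than one time in five, but with ten groups to choose from it turns up most of the time.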

Page 41

Misusing Data: Dos and Don’ts

• The amount of data matters! More data is more reliable

• Avoid bias! Don’t have a preconceived idea of what will happen (this is hard!)

• Accidents occur! At the 95% confidence level, about 5% of results can appear significant purely by chance (false alarm probability)

• Don’t discard some data; all data is important even if unexpected.

• Don’t overgeneralize (All apples are red)

• Don’t manipulate

• Don’t make up “better” data

Page 42

Misusing Data

• Correlations don’t equal Causation!!
– Experiments don’t PROVE; they merely suggest correlations

*People with more moles live longer.

Page 43

Data Misuse Examples

• Bell Curve
– There are substantial individual and group differences in intelligence; these differences profoundly influence the social structure and organization of work in modern industrial societies, and they defy easy remediation.

Page 44

FREAKONOMICS

• Schoolteachers and Sumo wrestlers have a lot in common.

• The Ku Klux Klan and Real Estate Agents work under the same principles

• Abortion lessens crime!

• Do names matter?

– Low education parents: Ricky, Terry, Larry, Jazmine, Misty, Mercedes

– High education parents: Marie-Claire, Glynnis, Aviva, Finnegan, MacGregor, Harper

– ???: Loser, Winner, Temptress, Sir John, Precious