Math 1231: Spring 2015 Chapter 1, Part One Course Introduction 1.2: Statistical Thinking 1.3: Types of Data

Math 1231: Spring 2015

Chapter 1, Part One

Course Introduction

1.2: Statistical Thinking

1.3: Types of Data

Recent Real-World Statistics

• 74% of “online adults” use a social networking site of some kind. Among 18-29 year olds, this rises to 89% [Pew Internet Project, Jan. 2014].

• U.S. adults living with at least one child under age 6 spent an average of 2 hrs/day providing primary care and 5.4 hrs/day on secondary care [Bureau of Labor Statistics, June 2014].

• In 2014, 43% of American adults self-identified as political independents [Gallup, Jan. 2015].

Links for Previous Examples

• BLS American Time Use Survey:

– http://www.bls.gov/news.release/atus.nr0.htm

• Pew Social Media User Demographics:

– http://www.pewinternet.org/data-trend/social-media/social-media-user-demographics/

• Gallup Political Identification:

– http://www.gallup.com/poll/180440/new-record-political-independents.aspx

Some Interesting Questions…

• Gallup’s political poll was based on phone interviews conducted during 2014. Were you contacted during this time?

• Gallup interviewed “only” 16,479 American adults, out of more than 230 million total!

• Question: Is Gallup justified in making such a claim about all American adults, using information from such a small percentage?

A Typical Problem

• The previous scenarios illustrate a central theme in real-world statistics:

– We have a VERY LARGE set of individuals (all American adults, for example).

– It is EFFECTIVELY IMPOSSIBLE to gather information from every single individual.

– Using information from a relatively small number of individuals, we can draw some conclusions about the VERY LARGE set.

Statistical Thinking

Section 1.2

*** Basic Terminology ***

• Population: The complete set of individuals that we intend to study (Gallup Poll: All American adults—more than 230 million individuals!).

• Sample: The set of individuals for which we have obtained actual data. The Sample is a subset of the Population. (Gallup: 16,479 adults, contacted by telephone during 2014).

• Data: Specific information about each individual (Gallup: Each individual’s political affiliation).

• Assume we have data only from those in the sample.

Population vs. Sample (not to scale!!)

Descriptive / Summary Statistics

• Goal: Take a set of data and organize or summarize it in a useful way.

• Example: Your current GPA summarizes your academic performance, without needing to see your entire transcript.

– Is this a fair/accurate summary? It doesn’t matter; many people will use it anyway.

• We will look at several different graphical and numerical summaries over the next few classes.
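
As a rough illustration of how a GPA compresses a transcript into one number, here is a minimal Python sketch. The transcript entries and the `gpa` helper are hypothetical, and a credit-weighted 4.0 scale is assumed; actual GPA rules vary by institution.

```python
# Hypothetical transcript: (course, credit hours, grade points on a 4.0 scale).
transcript = [
    ("Calculus I", 4, 3.0),
    ("English Composition", 3, 4.0),
    ("Intro Biology", 4, 2.0),
    ("World History", 3, 3.0),
]

def gpa(courses):
    """Credit-weighted average of grade points (assumed GPA definition)."""
    total_credits = sum(credits for _, credits, _ in courses)
    total_points = sum(credits * points for _, credits, points in courses)
    return total_points / total_credits

# One number now stands in for the whole transcript.
print(round(gpa(transcript), 2))
```

The single value hides which courses went well and which did not, which is exactly the trade-off any summary statistic makes.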

Statistical Inference

Goal: Use (known) data from a sample in order to draw conclusions about the entire population.

• Since we don’t have data for the entire population, these conclusions ALWAYS have some degree of uncertainty.

More carefully stated, Gallup concluded this:

• Based on sample data from 16,479 individuals, we claim that the percentage of American adults who self-identified as “independent” in 2014 was between 42% and 44%. There is a 95% chance this claim is correct.
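
To see roughly where a range like this comes from, the sketch below computes a textbook 95% margin of error for a sample proportion. It assumes a simple random sample and the usual multiplier z = 1.96; real polls typically involve weighting and design adjustments that this sketch ignores, so it is not a reconstruction of Gallup's own calculation.

```python
import math

n = 16479        # sample size from the Gallup example
p_hat = 0.43     # sample proportion identifying as independent

# Assumption: simple random sample, normal approximation, 95% level.
z = 1.96
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error

low, high = p_hat - margin, p_hat + margin
print(f"margin of error: {margin:.4f}")
print(f"interval: {low:.3f} to {high:.3f}")
```

Under these assumptions the margin comes out a little under one percentage point, which fits inside the 42% to 44% range quoted above.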

Statistical Inference

• Gallup’s conclusion is based on data from a very small percentage, less than 0.01%, of the intended population.

• It may surprise you that such a small amount of data can be used to make a conclusion about a large population. We’ll discuss the underlying methods (and how to properly interpret these results) later in the course.

• For now: It is MORE IMPORTANT that the sample data are collected in an “appropriate” way, otherwise our methods will give potentially inaccurate results.
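
A small simulation can make this less surprising. The Python sketch below pretends the true population proportion is 43% (an assumption made purely for illustration), draws 20 simulated polls of 16,479 randomly chosen respondents each, and reports how far the worst poll lands from the truth. The names `sample_proportion`, `TRUE_PROPORTION`, and `SAMPLE_SIZE` are hypothetical.

```python
import random

random.seed(1231)            # fixed seed so the run is repeatable

TRUE_PROPORTION = 0.43       # assumed population value, for illustration only
SAMPLE_SIZE = 16479          # sample size from the Gallup example

def sample_proportion(n, p):
    """Share of 'independent' answers among n simulated random respondents."""
    return sum(random.random() < p for _ in range(n)) / n

# Repeat the poll 20 times and see how far off the worst one is.
estimates = [sample_proportion(SAMPLE_SIZE, TRUE_PROPORTION) for _ in range(20)]
worst_miss = max(abs(e - TRUE_PROPORTION) for e in estimates)
print(f"largest miss across 20 simulated polls: {worst_miss:.4f}")
```

The simulation only behaves this well because every respondent is chosen at random. A sample of volunteers or of convenient individuals carries no such guarantee, no matter how large it is.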

How to think about statistical data and results

Data: Important Considerations

• Context: What do the data represent? The same numbers can have completely different meanings/interpretations:

GRADE            A  B  C  D  F  W
No. of Students  3  7  6  5  2  4

Day              M  T  W  R  F  S
Pieces of Mail   3  7  6  5  2  4

Data: Important Considerations

• Source: Where do the data come from?

– Who gathered the data?

– Who summarized or analyzed the data?

– Who sponsored or funded the research?

– Are those responsible for collecting/analyzing the data reliable?

– Is there any incentive to distort results and/or favor a particular type of result?

Data: Important Considerations

• Sampling Method: What process was used to choose the sample and collect data?

– Was sample selection limited to individuals who volunteered to provide data?

– Was sample selection limited to individuals who were convenient?

– Was data collection based on subjective judgment or ambiguous terminology?

• Example: Do you spend a lot of time studying?

Important Considerations

• Conclusions: What are the results of the statistical analysis/inference?

– What is the intended population? Are the results valid for the entire population?

– Can you restate results in a way that can be understood by someone with little or no knowledge of statistical terminology?

– Is there a cause-and-effect relationship, or merely a statistical relationship? (“Correlation does not imply causality”; see Chapter 10.)

Some Other Considerations

• Practical Implications: Are the conclusions useful or relevant in a real-world context?

– A “Statistically Significant” claim comes from analyzing the data using numerical methods, without any context (see the next slide).

– “Practically Significant” means useful or relevant to the real world.

– These are not necessarily the same thing!

*** Statistical Significance ***

• When doing statistical inference, there is always some degree of randomness in how we gather the sample data.

• If we wind up with results that are unlikely to occur by random chance, we say the results are statistically significant.

• Simple Example: How likely is it that a fair coin would come up heads in 95/100 flips?
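
The coin question can be answered exactly with the binomial distribution. The Python sketch below adds up the probabilities of 95, 96, ..., 100 heads in 100 flips of a fair coin; the helper name `prob_at_least` is hypothetical, and "95/100 flips" is read here as "at least 95 heads".

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) when X counts heads in n independent flips with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a fair coin gives 95 or more heads in 100 flips.
p_value = prob_at_least(95, 100)
print(p_value)
```

The result is on the order of 10^-23, so seeing 95 heads would be strong evidence that the coin is not fair. That is the sense in which such a result is called statistically significant.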

Example: Class Attendance

• In analyzing grade data among Math 1231 students from previous semesters, the average course grade for students with “many” absences was 15 points (out of 100) less than the average for students with “few or no” absences.

• My Claim: Students with “many” absences tend to have lower course grades than students with “few or no” absences.

• Is this statistically and/or practically significant?

• Is frequent absence the cause of lower grades?

Types of Data

Section 1.3

*** Parameters vs. Statistics ***

• The goal of statistical inference is to use sample data to draw conclusions about some VERY LARGE population.

• A parameter is a numerical value describing some aspect of the population.

• A statistic is a numerical value describing some aspect of a sample.

• The value of a statistic (computed from sample data) can be used to estimate the value of a parameter (almost always unknown).

Parameters vs. Statistics

• Example: I want to estimate the average height of all students currently in class. I choose four students “at random” and compute the average height for those four.

– Average class height: This is a parameter; its value is unknown to us (the population is the entire class).

– Average height for the group of four: This is a statistic (the sample consists of these four students).

• Question for later: Is it reasonable to claim that the sample average is “close” to the population average, based on our sample?
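
The height example can be sketched in a few lines of Python. The list of 20 heights is made up for illustration; in a real class the parameter would be unknown, and only the statistic could be computed.

```python
import random
import statistics

random.seed(2015)  # fixed seed so the example is repeatable

# Hypothetical heights (inches) for a class of 20 students: the population.
class_heights = [62, 63, 64, 64, 65, 66, 66, 67, 67, 68,
                 68, 69, 69, 70, 70, 71, 72, 72, 73, 75]

parameter = statistics.mean(class_heights)   # population average (normally unknown)
sample = random.sample(class_heights, 4)     # four students chosen at random
statistic = statistics.mean(sample)          # sample average (what we can compute)

print(f"parameter (class average): {parameter:.2f}")
print(f"statistic (sample of 4):   {statistic:.2f}")
```

Changing the seed gives a different sample and therefore a different statistic, while the parameter stays fixed. How close the two tend to be is the question taken up later in the course.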

Quantitative vs. Categorical

• Quantitative data consist of numbers that represent counts or measurements.

• All quantitative data is numerical, but not all numerical data is quantitative.

• Data with a unit of measurement (seconds, feet, pounds, dollars, etc.) is quantitative.

• Numerical data used as a label or range of values (Student ID Number, 20-25 years) is not quantitative.

Examples: Quantitative Data

• The University keeps the following quantitative data about each student:

– Grade Point Average

– Number of Credit Hours Completed

– Age

– Amount of money owed for tuition

– Other examples?

Categorical Data

• Data that are not quantitative are called categorical.

• Non-numerical data must be categorical.

• Numerical data that serve to label or identify individuals are categorical (Example: Social Security Number).

• A useful guide: Would it make sense to consider an average value? If not, treat the data as categorical.

Examples: Categorical Data

• The University keeps the following categorical data about each student:

– Name

– Laker ID Number

– Date of Birth

– Gender

– Residency (“in-state” or “out-of-state”)

– Other?

Discrete vs. Continuous

• Quantitative (number) data can be classified as:

– Discrete: Finitely many possible values, or infinitely many values with clearly-defined “next” and “previous” values. Discrete values can be put into a list.

– Continuous: Infinitely many values anywhere in a given range/interval, with no holes or gaps.

• A useful guide: Is it theoretically possible to make your measurements more accurate/precise? If so, then you probably have continuous data.

Examples: Discrete or Continuous?

• Number of siblings.

• Amount of time it takes to run one mile.

• Resting pulse rate (beats per minute).

• Distance you live from this building.

• Grade point average.

• Credit card balance (in dollars/cents).

Note: The answers may depend on how the data are measured and/or used.

Levels of Measurement

• An alternate way to classify data, based on what can be done to summarize/analyze it. There is some debate on how many levels are needed; these four are commonly used:

– Nominal (qualitative)

– Ordinal (ordering is meaningful)

– Interval (differences are meaningful)

– Ratio (ratios are meaningful)

Nominal Level

• Consists of names, labels, or well-defined categories. There is no meaningful way to order values (alphabetical is often used).

– Colors (Red, Green, Yellow, etc.)

– Gender (Female, Male)

– Party Affiliation (Democrat, Republican, Other)

– State of Residence

• Nominal data is always categorical.

Ordinal Level

• Data can be arranged in some meaningful order, but differences between values cannot be computed or are useless.

– Course Grades (A, B, C, D, F)

– Competitive Rankings (Gold > Silver > Bronze, but “Gold minus Silver” is useless, even if we represent these as numbers 1, 2, 3).

• Ordinal data is often categorical (notable exceptions are IQ and Body Mass Index).

Interval Level

• Numerical values that can be put in order, and the difference between two values has some useful meaning.

• However, there is no “natural zero” level and ratios do not have any practical meaning. Examples:

– Temperature (Fahrenheit or Celsius): 15 is colder than 30, but zero degrees does not mean an absence of temperature (unless you use Kelvin).

– Calendar Data: Aug. 7th < Aug. 21st, with a difference of 14 days (but the 21st is not “three times” the 7th).

• Interval data is the least common of the four levels.
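
The temperature example can be checked numerically. The Python sketch below converts 15 and 30 degrees Celsius to Kelvin and compares differences and ratios on both scales; the helper name `celsius_to_kelvin` is hypothetical.

```python
def celsius_to_kelvin(c):
    """Shift a Celsius reading onto the Kelvin scale, whose zero is not arbitrary."""
    return c + 273.15

cold_c, warm_c = 15.0, 30.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences survive the change of scale...
diff_c = warm_c - cold_c
diff_k = warm_k - cold_k

# ...but ratios do not, because 0 degrees Celsius is an arbitrary zero point.
ratio_c = warm_c / cold_c
ratio_k = warm_k / cold_k

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

The 15-degree difference is the same on both scales, but "twice as hot" appears only in Celsius. On the Kelvin scale the ratio is only about 1.05, so the Celsius ratio carries no physical meaning.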

Ratio Level

• Numerical values that can be put in order, the difference between two values has meaning, and there is a natural, non-arbitrary “zero level.”

• Ratio data measures “amount of stuff.” The zero level means that “no stuff” is present.

– Distance, amount of time, mass/weight, many other physical quantities.

– Price, Checking account balance, many other monetary quantities.

• If “twice as much” or “half as much” make sense, then you have ratio data. Ratio data is always quantitative.

Examples

Classify the following data (about students):

• Age

• Year of birth

• Academic major

• Weight

• Transfer student? (yes/no)

• Currently seated in which row?

• SAT score

Examples

Answers to the previous slide:

• Age: Quantitative, Discrete(?), Ratio

• Year of birth: Quantitative(?), Discrete, Interval

• Academic major: Categorical, Nominal

• Weight: Quantitative, Continuous, Ratio

• Transfer student?: Categorical, Nominal

• Current row?: Categorical, Ordinal

• SAT score: Quantitative, Discrete, Ordinal(?)

08/15/11