Upload
dylan-wilkins
View
218
Download
2
Embed Size (px)
Citation preview
1
1-1 Overview
1- 2 Types of Data
1- 3 Random Sampling
1- 4 Design of Experiments
1- 5 Abuses of Statistics
Chapter 1 Introduction to Statistics
2
Statistics (Definition)A collection of methods for planning experiments, obtaining data, and then organizing, summarizing,
presenting, analyzing, interpreting, and drawing conclusions based on the data
1-1 Overview
3
DefinitionsPopulation The complete collection of all
data to be studied.
Census The collection of data from every member of the population.
SampleThe collection of data from a subset
of the population.
4
ExampleIdentify the population and
sample in the studyA quality-control manager randomly selects 50 bottles of Coca-Cola to assess the calibration of the filing machine.
5
Statistics
Broken into 2 areas
Descriptive Statistics Inferencial Statistics
Definitions
6
DefinitionsDescriptive Statistics
Describes data usually through the use of graphs, charts and pictures. Simple calculations like mean, range, mode, etc., may also be used.
Inferencial Statistics Uses sample data to make inferences (draw
conclusions) about an entire population
Test QuestionTest Question
7
Parameter vs. Statistic
Variables
Quantitative Data vs. Qualitative Data
Nominal Data vs. Ordinal Data
Discrete Data vs. Continuous Data
Univariate Data vs. Bivariate Data
1-2 Types of Data
8
Parameter
a numerical measurement describing some characteristic of a population
population
parameter
Definitions
9
Statistic
a numerical measurement describing some characteristic of a sample
sample
statistic
Definitions
10
Examples
Parameter
51% of the entire population of the US is Female
Statistic
Based on a sample from the US population is was determined that 35% consider themselves overweight.
11
Variable
» characteristics of the individuals (data) being measured or observed
» represented as a symbol (x, Y, s, σ, µ, etc.).
Definitions
12
We further describe variables by
distinguishing between Qualitative and
Quantitative data (variables)
Definitions
VariablesVariables
QualitativeQualitative
QuantitativeQuantitative
13
Definitions
Quantitative data
Numbers representing counts or measurements
Qualitative (or categorical or
attribute) dataCan be separated into different categories that are distinguished by some nonnumeric characteristics
14
Examples
Quantitative data
x = The number of FLC students with blue eyes
Qualitative (or categorical or
attribute) datay = The eye color of FLC students
15
ExampleOf the adult U.S. population, 36% has an allergy. A sample of 1200 randomly selected adults resulted in 33.2% reporting an allergy.
1. Describe the variable and give its type
2. Describe the population
3. Describe the sample
4. What is the value of the parameter?
5. What is the value of the statistic?
6. Which statement is descriptive in nature?
7. Which statement is inferential in nature?
Quiz Question
16
We further describe qualitative data by
distinguishing between Nominal and
Ordinal data
Definitions
Qualitative Qualitative DataData
NominalNominal
OrdinalOrdinal
17
Nominal Nominal data are categorical data where the order
of the categories is arbitrary Example: race/ethnicity has values 1=White, 2=Hispanic,
3=American Indian, 4=Black, 5=Other. Note that the order of the categories is arbitrary.
Ordinal Ordinal data are categorical data where there is a
logical ordering to the categoriesExample: scale that you see on many surveys: 1=Strongly
disagree; 2=Disagree; 3=Neutral; 4=Agree; 5=Strongly agree.
Definitions
18
We further describe quantitative data by
distinguishing between discrete and
continuous data
Definitions
Quantitative Quantitative DataData
DiscreteDiscrete
ContinuousContinuous
19
Discrete data result when the number of possible values is
either a finite number or a ‘countable’ number of possible values
0, 1, 2, 3, . . .
Continuous (numerical) data result from infinitely many
possible values that correspond to some continuous scale or interval that covers a range of values without gaps, interruptions, or jumps
Definitions
2 3
20
Discrete
The number of eggs that hens lay; for
example, 3 eggs a day.
Continuous
The amounts of milk that cows produce;
for example, 2.343115 gallons a day.
Examples
21
Univariate Data » Involves the use of one variable (X)
» Does not deal with causes and relationship
Bivariate Data» Involves the use of two variables (X and Y)
» Deals with causes and relationships
Definitions
22
Univariate Data How many first year students attend FLC?
Bivariate DataIs there a relationship (association) between then
number of females in Computer Programming and their scores in Mathematics?
Example
23
1. Center: A representative or average value that indicates where the middle of the data set is located
2. Variation: A measure of the amount that the values vary among themselves or how data is dispersed
3. Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed)
4. Outliers: Sample values that lie very far away from the vast majority of other sample values
5. Time: Changing characteristics of the data over time
Important Characteristics of Data
24
Uses of Statistics
Almost all fields of study benefit from the application of statistical methodsSociology, Genetics, Insurance, Biology, Polling,
Retirement Planning, automobile fatality rates, and many more too numerous to mention.
25
1- 3
Design of Experiments
26
Designing an Experiment
Identify your objective
Collect sample data
Use a random procedure that avoids bias
Analyze the data and form conclusions
27
Observational Study measures the characteristics of a population by studying individuals in a sample, but does not attempt to manipulate or influence variables of interest
Experimentapplies treatments to experimental units or subjects and attempts to isolate the effects of the treatments on a response variable
Definition
28
Observational Study A poll is conducted in which 500 people are
asked whom they plan to vote for in the upcoming election
ExperimentTo determine the effect of type of fertilizers a
farmer might divide 20 tomato plants into two groups. Group 1 received fertilizer 1 and Group 2 receives fertilizer 2. All other factors for the two groups are kept the same (sunlight, water, etc).
Examples
29
Define the treatment, experimental unit and response variable in the following experiment.To determine the effect of type of fertilizers a
farmer might divide 20 tomato plants into two groups. Group 1 received fertilizer 1 and Group 2 receives fertilizer 2.
All other factors for the two groups are kept the same (sunlight, water, etc). i.e., Confounding does not occur
Experimental Design
30
Lurking variables: A variable that was not considered in a study but
may affect study.
Confounding: Occurs in a study when lurking variables affect the outcome.
Confounding
31
Example: Flu shots are associated with a lower risk of being
hospitalized or dying from influenza.
Possible Lurking Variables: • age
• health status
• mobility of the senior
Confounding
32
Experiment apply some treatment (Action)
Event (Response)observe its effects on the subject(s) (Observe)
Example: Experiment: Toss a coin
Event: Observe a tail
Probability Experiment
33
1- 4
Sampling
34
Random (type discussed in this class)
Systematic
Convenience
Stratified
Cluster
Methods of Sampling
35
TI-83 CalculatorUsing a random number generator
1. Press Math
2. Cursor over to PRB
3. Press “5” RandInt
4. Enter (low value, high value, sample size)
Example:
RandInt(1,30,5) will select 5 random numbers between 1 and 30
Note: if you get duplicate numbers you should draw more numbers than you need and ignore the duplicates.
36
Simple Random Sample members of the population are selected in such a way that each has an equal chance of being selected (if not then sample is biased)
Definitions
37
Random Sampling - selection so that
each has an equal chance of being selected
38
Systematic SamplingSelect some starting point and then
select every K th element in the population
39
Convenience Samplinguse results that are easy to get
40
Stratified Samplingsubdivide the population into at
least two different subgroups that share the same characteristics, then draw a sample from each
subgroup (or stratum)
41
Cluster Sampling - divide the population into sections (or clusters); randomly select some of those clusters; choose all members from selected clusters
42
Sampling Error the difference between a sample result and the true
population result; such an error results from chance sample fluctuations.
Nonsampling Error sample data that are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective instrument, or copying the data incorrectly).
Errors in Sampling
43
Sampling Error (Example) A recent poll showed potential voters favored the
proposition 52% to 48%. The margin of error for the poll was 3%.
Nonsampling Error (Example)During presidential election or 2000, early results from an Florida exit poll were skewed by a programming error.
Errors in Sampling
44
Bad Samples Small Samples Loaded Questions Misleading Graphs Pictographs Precise Numbers Distorted Percentages Partial Pictures Deliberate Distortions
1-5 Abuses of Statistics
45
Abuses of StatisticsBad Samples
Inappropriate methods to collect data. BIAS (on test) Example: using phone books to sample data.
Small Samples (will have example on exam)
We will talk about same size later in the course. Even large samples can be bad samples.
Loaded Questions
Survey questions can be worked to elicit a desired response
46
Bad Samples Small Samples Loaded Questions Misleading Graphs Pictographs Precise Numbers Distorted Percentages Partial Pictures Deliberate Distortions
Abuses of Statistics
47
Bachelor High SchoolDegree Diploma
Salaries of People with Bachelor’s Degrees and with High School Diplomas
$40,000
30,000
25,000
20,000
$40,500
$24,400
35,000
$40,000
20,000
10,000
0
$40,500
$24,40030,000
Bachelor High SchoolDegree Diploma
(a) (b)(test question)
48
We should analyze the numerical information given in the graph instead of being mislead by its general shape.
49
Bad Samples Small Samples Loaded Questions Misleading Graphs Pictographs Precise Numbers Distorted Percentages Partial Pictures Deliberate Distortions
Abuses of Statistics
50
Double the length, width, and height of a cube, and the volume increases by a factor of eight
What is actually What is actually intended here? 2 intended here? 2 times or 8 times?times or 8 times?
51
Bad Samples Small Samples Loaded Questions Misleading Graphs Pictographs Precise Numbers Distorted Percentages Partial Pictures Deliberate Distortions
Abuses of Statistics
52
Precise NumbersThere are 103,215,027 households in the US. This is actually an estimate and it would be best to say there are about 103 million households.
Distorted Percentages
100% improvement doesn’t mean perfect.
Deliberate Distortions
Lies, Lies, all Lies
Abuses of Statistics
53
Bad Samples Small Samples Loaded Questions Misleading Graphs Pictographs Precise Numbers Distorted Percentages Partial Pictures Deliberate Distortions
Abuses of Statistics
54
Abuses of StatisticsPartial Pictures“Ninety percent of all our cars sold in this country in the last 10 years are still on the road.”
Problem: What if the 90% were sold in the last 3 years?
55
Factorial Notation 8! = 8x7x6x5x4x3x2x1
Order of Operations 1. ( )2. POWERS3. MULT. & DIV.4. ADD & SUBT.5. READ LIKE A BOOK
Keep number in calculator as long a possible
Using Formulas