CHAPTER 1 Data Collection
Section 1.1 Introduction to the Practice of Statistics
Objectives
1. Define statistics and statistical thinking
2. Explain the process of statistics
3. Distinguish between qualitative and quantitative variables
4. Distinguish between discrete and continuous variables
5. Determine the level of measurement of a variable
Objective 1 Define statistics and statistical thinking
• Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
Data
• The information referred to in the definition is data. Data are a "fact or proposition used to draw a conclusion or make a decision." Data describe characteristics of an individual.
• A key aspect of data is that they vary.
• One goal of statistics is to describe and understand sources of variability.
Objective 2 Explain the Process of Statistics
• A population consists of the entire group of individuals to be studied.
• A sample is a subset of the population that is being studied.
• An individual is a person or object that is a member of the population being studied.
• Descriptive statistics consist of organizing and summarizing data. Descriptive statistics describe data through numerical summaries, tables, and graphs.
• A statistic is a numerical summary based on a sample.
• Inferential statistics uses methods that take results from a sample, extends them to the population, and measures the reliability of the result.
• A parameter is a numerical summary of a population.
Parameter versus Statistic
Example: Suppose the percentage of all students on our campus who have a job is 84.9%.
a) What is the population? All students on our campus
b) Does the value 84.9% represent a parameter or a statistic? Parameter
c) Suppose a sample of 250 students is obtained, and from this sample we find that 86.4% have a job. Does the value 86.4% represent a parameter or a statistic? Statistic
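The parameter/statistic distinction in the example above can be illustrated with a short Python sketch. The campus size of 10,000 students is an assumption made only so that 84.9% corresponds to a whole number of students; the source gives the percentage, not the headcount. The sample proportion will generally differ a little from the population proportion, just as 86.4% differs from 84.9%.

```python
import random

def proportion_employed(group):
    """Return the fraction of True values (students with a job) in group."""
    return sum(group) / len(group)

# Hypothetical campus of 10,000 students; True means the student has a job.
# 8,490 employed out of 10,000 reproduces the 84.9% figure from the example.
population = [True] * 8490 + [False] * 1510

rng = random.Random(1)                  # fixed seed so the run is repeatable
sample = rng.sample(population, 250)    # 250 students drawn without replacement

parameter = proportion_employed(population)   # summary of the whole population
statistic = proportion_employed(sample)       # summary of the sample only

print(f"parameter (population): {parameter:.1%}")
print(f"statistic (sample):     {statistic:.1%}")
```

Changing the seed changes the statistic but never the parameter, which is the point of the distinction.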
Illustrating the Process of Statistics
Step 1: Identify the research objective.
Step 2: Collect the information needed to answer the question.
Step 3: Describe the data. Organize and summarize the information.
Step 4: Draw conclusions from the data.
Objective 3 Distinguish between Qualitative and Quantitative Variables
• Variables are the characteristics of the individuals within the population.
• Key Point: Variables vary. Consider the variable height. If all individuals had the same height, then obtaining the height of one individual would be sufficient in knowing the heights of all individuals. Of course, this is not the case. As researchers, we wish to identify the factors that influence variability.
Variables and Types of Data
Variables can be classified as qualitative or quantitative.
• Qualitative variables allow for classification of individuals based on some attribute or characteristic.
• Quantitative variables provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values of the quantitative variable and provide meaningful results.
Objective 4 Distinguish between Discrete and Continuous Variables
Quantitative variables can be further classified into two groups.
• A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values. The term "countable" means the values result from counting such as 0, 1, 2, 3, and so on. (e.g. # of books, # of desks)
• A continuous variable is a quantitative variable that has an infinite number of possible values it can take on and can be measured to any desired level of accuracy.
Classification of Variables
Example: Classify each variable as qualitative or quantitative. If the variable is quantitative, further classify it as discrete or continuous.
a) The number of heads obtained after flipping a coin five times. Quantitative, discrete
b) Weights of newborn babies in a hospital. Quantitative, continuous
c) Eye colors of students in Math 227. Qualitative
Objective 5 Determine the Level of Measurement of a Variable
• Variables can also be classified by how they are categorized, counted, or measured.
• The level of measurement of the data is useful in deciding what procedure to take to apply statistics to real problems.
Four common types of measurement scales are used to classify variables:
• Nominal level of measurement - the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked, or specific, order.
• Ordinal level of measurement - it has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked, or specific, order.
• Interval level of measurement - it has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. A value of zero in the interval level of measurement does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.
• Ratio level of measurement - it has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero in the ratio level of measurement means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.
Example: Classify each as nominal-level, ordinal-level, interval-level, or ratio-level data.
a) Sizes of cars: Ordinal
b) Nationality of each student: Nominal
c) IQ of each student: Interval
d) Weight: Ratio
Section 1.2 Observational Studies Versus Designed Experiments
Objective
1. Distinguish between an observational study and an experiment
Objective 1 Distinguish between an Observational Study and an Experiment
• An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals in the study without trying to influence the outcome of the study.
• If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of the explanatory variable, and then records the value of the response variable for each group, the researcher is conducting a designed experiment.
Example: Cellular Phones and Brain Tumors (page 15)
In both studies, the goal of the research was to determine if radio frequencies from cell phones increase the risk of contracting brain tumors. Whether or not brain cancer was contracted is the response variable (dependent variable). The level of cell phone usage is the explanatory variable (independent variable). In research, we wish to determine how varying the amount of an explanatory variable affects the value of a response variable.
• Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory (independent) variable and the response (dependent) variable may be due to some other variable or variables not accounted for in the study.
• A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to any explanatory variables considered in the study.
Example:
Identify the explanatory variable and the response variable for the following studies.
a) Rats with cancer are divided into two groups. One group receives 5 mg of a medication that is used to fight cancer, and the other receives 10 mg. After 2 years, the spread of the cancer is measured.
Explanatory variable: the amount of the medication. Response variable: the spread of the cancer.
b) A researcher wants to determine whether couples who marry are more likely to gain weight than those who stay single.
Explanatory variable: marital status. Response variable: weight gain.
A census is a list of all individuals in a population along with certain characteristics of each individual.
Section 1.3 Simple Random Sampling
Objective
1. Obtain a Simple Random Sample
Objective 1 Random sampling
A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample.
Example: Illustrating Simple Random Sampling
Suppose a study group consists of 5 students: Bob, Patricia, Mike, Jan, and Maria. 2 of the students must go to the board to demonstrate a homework problem.
List all possible samples of size 2 (without replacement).
(Bob, Patricia), (Bob, Mike), (Bob, Jan), (Bob, Maria), (Patricia, Mike), (Patricia, Jan), (Patricia, Maria), (Mike, Jan), (Mike, Maria), (Jan, Maria)
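As a check on the study-group example, the following Python sketch enumerates every sample of size 2 and then draws one simple random sample. It uses only the standard library; the variable names are illustrative and not part of the notes.

```python
import itertools
import random

students = ["Bob", "Patricia", "Mike", "Jan", "Maria"]

# Every possible sample of size 2, drawn without replacement.
all_samples = list(itertools.combinations(students, 2))
for pair in all_samples:
    print(pair)
print(len(all_samples), "possible samples")   # 5 choose 2 = 10

# One simple random sample of size 2: each of the pairs above is equally likely.
rng = random.Random()
chosen = rng.sample(students, 2)
print("Go to the board:", chosen)
```

Because `random.sample` draws without replacement, the same student can never be selected twice.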
Section 1.4 Other Effective Sampling Methods
Objectives
1. Obtain a Stratified Sample
2. Obtain a Systematic Sample
3. Obtain a Cluster Sample
• A stratified sample is one obtained by separating the population into homogeneous, non-overlapping groups called strata, and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way.
• A systematic sample is obtained by selecting every kth individual from the population. The first individual selected is a random number between 1 and k.
• A cluster sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.
Section 1.5 Bias in Sampling
Objectives
1. Explain the Sources of Bias in Sampling
Sources of Bias in Sampling
• If the results of the sample are not representative of the population, then the sample has bias.
Three Sources of Bias
1. Sampling Bias - occurs when the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another.
2. Nonresponse Bias - exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do.
3. Response Bias - exists when the answers on a survey do not reflect the true feelings of the respondent.
Types of Response Bias: 1) Interviewer error; 2) Misrepresented answers; 3) Words used in survey question; 4) Order of the questions or words within the question.
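The three sampling methods defined in Section 1.4 (stratified, systematic, and cluster) can be sketched in Python as follows. The population of 40 numbered students grouped by class year is hypothetical, and the function names are illustrative; this is a minimal sketch of the definitions, not a survey-grade implementation.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of n_per_stratum individuals from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, k, rng):
    """Every k-th individual, starting at a random position among the first k."""
    start = rng.randrange(k)          # 0-based index of individual 1..k
    return population[start::k]

def cluster_sample(clusters, n_clusters, rng):
    """All individuals within n_clusters randomly selected clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(7)

# Hypothetical population: 40 students identified by number, 10 per class year.
groups = {
    "freshman":  list(range(1, 11)),
    "sophomore": list(range(11, 21)),
    "junior":    list(range(21, 31)),
    "senior":    list(range(31, 41)),
}
everyone = [s for members in groups.values() for s in members]

print("stratified:", stratified_sample(groups, 2, rng))
print("systematic:", systematic_sample(everyone, 8, rng))
print("cluster:   ", cluster_sample(groups, 1, rng))
```

Note the difference between the first and last calls: the stratified sample takes a few individuals from every group, while the cluster sample takes every individual from a few groups.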
Section 1.6 The Design of Experiments
Objectives
1. Describe the Characteristics of an Experiment 2. Explain the Steps in Designing an Experiment 3. Explain the Completely Randomized Design 4. Explain the Matched-Pairs Design 5. Explain the Randomized Block Design
Objective 1. Describe the Characteristics of an Experiment
An experiment is a controlled study conducted to determine the effect that varying one or more explanatory
variables, or factors, has on a response variable. Any combination of the values of the factors is called a
treatment.
The experimental unit (or subject) is a person, object or some other well-defined item upon which a treatment
is applied.
A control group serves as a baseline treatment that can be used to compare to other treatments.
A placebo is an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental
medication.
Blinding refers to nondisclosure of the treatment an experimental unit is receiving.
A single-blind experiment is one in which the experimental unit (or subject) does not know which
treatment he or she is receiving.
A double-blind experiment is one in which neither the experimental unit nor the researcher in contact
with the experimental unit knows which treatment the experimental unit is receiving.
EXAMPLE The Characteristics of an Experiment
The English Department of a community college is considering adopting an online version of the freshman English course. To compare the new online course to the traditional course, an English Department faculty member randomly splits a section of her course. Half of the students receive the traditional course and the other half is given an online version. At the end of the semester, both groups will be given a test to determine which performed better.
(a) Who are the experimental units? The students in the class
(b) What is the population for which this study applies? All students who enroll in the class
(c) What are the treatments? Traditional vs. online instruction
(d) What is the response variable? Exam score
(e) Why can’t this experiment be conducted with blinding? Both the students and instructor know which treatment they are receiving
Objective 2 Explain the Steps in Designing an Experiment
To design an experiment means to describe the overall plan in conducting the experiment.
Steps in Conducting an Experiment
Step 1: Identify the problem to be solved.
• Should be explicit
• Should provide the experimenter direction
• Should identify the response variable and the population to be studied.
• Often referred to as the claim.
Step 2: Determine the factors that affect the response variable.
• Once the factors are identified, it must be determined which factors are to be fixed at some
predetermined level (the control), which factors will be manipulated and which factors will be
uncontrolled.
Step 3: Determine the number of experimental units.
Step 4: Determine the level of the predictor variables.
1. Control: There are two ways to control the factors.
(a) Fix their level at one predetermined value throughout the experiment. These are variables whose effect on the response variable is not of interest.
(b) Set them at predetermined levels. These are the factors whose effect on the response variable interests us. The combinations of the levels of these factors represent the treatments in the experiment.
2. Randomize: Randomize the experimental units to various treatment groups so that the effects of variables whose levels cannot be controlled are minimized. The idea is that randomization “averages out” the effect of uncontrolled predictor variables.
Step 5: Conduct the Experiment
a) Replication occurs when each treatment is applied to more than one experimental unit. This helps to assure that the effect of a treatment is not due to some characteristic of a single experimental unit. It is recommended that each treatment group have the same number of experimental units.
b) Collect and process the data by measuring the value of the response variable for each replication. Any difference in the value of the response variable can then be attributed to differences in the level of the treatment.
Step 6: Test the claim.
• This is the subject of inferential statistics.
• Inferential statistics is a process in which generalizations about a population are made on the basis of results obtained from a sample. Provide a statement regarding the level of confidence in the generalization. Methods of inferential statistics are presented later in the text.
Objective 3 Explain the Completely Randomized Design
A completely randomized design is one in which each experimental unit is randomly assigned to a treatment.
EXAMPLE Designing an Experiment
The octane of fuel is a measure of its resistance to detonation, with a higher number indicating higher resistance. An engineer wants to know whether the level of octane in gasoline affects the gas mileage of an automobile. Assist the engineer in designing an experiment.
Step 1: The response variable is miles per gallon.
Step 2: Factors that affect miles per gallon: engine size, outside temperature, driving style, driving conditions, characteristics of car.
Step 3: We will use 12 cars, all of the same model and year.
Step 4: We list the variables and their level.
• Octane level - manipulated at 3 levels. Treatment A: 87 octane, Treatment B: 89 octane, Treatment C: 92 octane
• Engine size - fixed
• Temperature - uncontrolled, but will be the same for all 12 cars.
• Driving style/conditions - all 12 cars will be driven under the same conditions on a closed track - fixed.
• Other characteristics of car - all 12 cars will be the same model year; however, there is probably variation from car to car. To account for this, we randomly assign the cars to the octane level.
Step 5: Randomly assign 4 cars to the 87 octane, 4 cars to the 89 octane, and 4 cars to the 92 octane. Give each car 3 gallons of gasoline. Drive the cars until they run out of gas. Compute the miles per gallon.
Step 6: Determine whether any differences exist in miles per gallon.
[Figure: Completely Randomized Design]
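The random assignment in Step 5 of the octane example can be sketched in Python. The car labels `car1` to `car12` are hypothetical identifiers; the three octane levels and the group size of 4 come from the example. Shuffling the whole list and then dealing it into groups is one common way to carry out a completely randomized assignment, not the only one.

```python
import random

cars = [f"car{i}" for i in range(1, 13)]     # 12 hypothetical car labels
treatments = {"A": 87, "B": 89, "C": 92}     # octane levels from the example

rng = random.Random(2024)                    # fixed seed so the run is repeatable
shuffled = cars[:]
rng.shuffle(shuffled)

# Deal the shuffled cars into three groups of four, one group per treatment.
assignment = {
    label: shuffled[i * 4:(i + 1) * 4]
    for i, label in enumerate(treatments)
}

for label, group in assignment.items():
    print(f"Treatment {label} ({treatments[label]} octane): {group}")
```

Every car lands in exactly one group, and each group has 4 cars, which matches the recommendation that treatment groups be the same size.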
Objective 4. Explain the Matched-Pairs Design
A matched-pairs design is an experimental design in which the experimental units are paired up. The pairs are
matched up so that they are somehow related (that is, the same person before and after a treatment, twins,
husband and wife, same geographical location, and so on). There are only two levels of treatment in a matched-
pairs design.
EXAMPLE A Matched-Pairs Design
Xylitol has proven effective in preventing dental caries (cavities) when included in food or gum. A total of 75 Peruvian children were given milk with and without xylitol and were asked to evaluate the taste of each. The researchers measured the children’s ratings of the two types of milk. (Source: Castillo JL, et al (2005) Children's acceptance of milk with Xylitol or Sorbitol for dental caries prevention. BMC Oral Health (5)6.)
(a) What is the response variable in this experiment? Rating
(b) Think of some of the factors in the study. Which are controlled? Which factor is manipulated? Controlled: age and gender of the children. Manipulated: whether the milk contains xylitol.
(c) What are the treatments? How many treatments are there? Milk with xylitol and milk without xylitol; 2
(d) What type of experimental design is this? Matched-pairs design
(e) Identify the experimental units. 75 Peruvian children
(f) Why would it be a good idea to randomly assign whether the child drinks the milk with xylitol first or second? To remove any effect due to the order in which the milk is drunk.
(g) Do you think it would be a good idea to double-blind this experiment? Yes!
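The order randomization discussed in part (f) can be sketched as follows. Each child serves as their own matched pair, so the only thing randomized is which milk is tasted first. The ID numbers 1 to 75 are hypothetical, and a fair coin flip per child is an assumption; the source does not say how the study actually randomized the order.

```python
import random

rng = random.Random(5)          # fixed seed so the run is repeatable
children = range(1, 76)         # hypothetical ID numbers for the 75 children

# For each child, pick at random which milk is tasted first.
order = {}
for child in children:
    first = rng.choice(["with xylitol", "without xylitol"])
    second = "without xylitol" if first == "with xylitol" else "with xylitol"
    order[child] = (first, second)

xylitol_first = sum(1 for first, _ in order.values() if first == "with xylitol")
print(f"{xylitol_first} of {len(order)} children taste the xylitol milk first")
```

With independent coin flips the two orders will be roughly, not exactly, balanced; shuffling a pre-balanced list of orders would be an alternative if an even split were wanted.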
Objective 5 Explain the Randomized Block Design
Grouping similar (homogeneous) experimental units together and then randomizing the experimental units
within each group to a treatment is called blocking. Each group of homogeneous individuals is called a block.
Confounding occurs when the effect of two factors (explanatory variables) on the response variable cannot be
distinguished.
A randomized block design is used when the experimental units are divided into homogeneous groups called
blocks. Within each block, the experimental units are randomly assigned to treatments.
EXAMPLE A Randomized Block Design
Recall that the English Department is considering adopting an online version of the freshman English course. After
some deliberation, the department thinks that men and women may perform differently in the traditional and
online courses. To account for any such difference, they randomly assign half of the 60 men to each of the
two courses and do the same for the 70 women.
This is a randomized block design in which gender forms the blocks. This way, gender will not play a role in the
value of the response variable, test score. We do not compare test results across gender.
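The blocked assignment in this example can be sketched in Python: randomization happens separately inside each block, so each course receives 30 men and 35 women. The student labels (`M1`, `W1`, and so on) and the helper name `assign_within_block` are hypothetical; only the block sizes and the two courses come from the example.

```python
import random

def assign_within_block(units, treatments, rng):
    """Randomly split one block's units evenly across the treatments.

    Assumes the block size divides evenly by the number of treatments.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size]
            for i, t in enumerate(treatments)}

rng = random.Random(11)                          # fixed seed for repeatability
blocks = {
    "men":   [f"M{i}" for i in range(1, 61)],    # 60 men
    "women": [f"W{i}" for i in range(1, 71)],    # 70 women
}
courses = ["traditional", "online"]

# Randomize separately inside each block.
design = {block: assign_within_block(units, courses, rng)
          for block, units in blocks.items()}

for block, groups in design.items():
    for course, members in groups.items():
        print(block, course, len(members))
```

Comparisons are then made between courses within each block, in line with the note that test results are not compared across gender.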