Author: Brenda Gunderson, Ph.D., 2013

License: Unless otherwise noted, this material is made available under
the terms of the Creative Commons Attribution-NonCommercial-Share Alike
3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/

The University of Michigan Open.Michigan initiative has reviewed this
material in accordance with U.S. Copyright Law and has tried to maximize
your ability to use, share, and adapt it. The attribution key provides
information about how you may share and adapt this material.

Copyright holders of content included in this material should contact
open.michigan@umich.edu with any questions, corrections, or
clarification regarding the use of content.

For more information about how to attribute these materials visit:
http://open.umich.edu/education/about/terms-of-use

Some materials are used with permission from the copyright holders. You
may need to obtain new permission to use those materials for other uses.
This includes all content from:
Mind on Statistics, Utts/Heckard, 4th Edition, Cengage Learning, 2012
Text Only: ISBN 9781285135984
Bundled version: ISBN 9780538733489

SPSS and its associated programs are trademarks of SPSS Inc. for its
proprietary computer software. Other product names mentioned in this
resource are used for identification purposes only and may be trademarks
of their respective companies.

Attribution Key
For more information see: http://open.umich.edu/wiki/AttributionPolicy

Content the copyright holder, author, or law permits you to use, share
and adapt:
Creative Commons Attribution-NonCommercial-Share Alike License
Public Domain – Self Dedicated: Works that a copyright holder has
dedicated to the public domain.

Make Your Own Assessment
Content Open.Michigan believes can be used, shared, and adapted because
it is ineligible for copyright.
Public Domain – Ineligible: Works that are ineligible for copyright
protection in the U.S. (17 USC §102(b)) *laws in your jurisdiction may
differ.

Content Open.Michigan has used under a Fair Use determination
Fair Use: Use of works that is determined to be Fair consistent with the
U.S. Copyright Act (17 USC § 107) *laws in your jurisdiction may differ.

Our determination DOES NOT mean that all uses of this third-party
content are Fair Uses and we DO NOT guarantee that your use of the
content is Fair. To use this content you should conduct your own
independent analysis to determine whether or not your use will be Fair.
Statistics 250
Weekly Labs, In-Lab Projects, Supplements,
and Old Exams for Review
Used in all lab sections of Stat 250
Dr. Brenda Gunderson
Department of Statistics
University of Michigan
Table of Contents

Note to Students and Supplements
  Supplement 1: SPSS Commands Summary
  Supplement 2: Notation Sheet
  Supplement 3: Name That Scenario
  Supplement 4: Editing Charts in SPSS
  Supplement 5: Notes about SPSS t Procedures
  Supplement 6: Interpretation Examples
  Supplement 7: Summary of the Main t-Tests
  Supplement 8: Regression Output in SPSS
Lab 3: Time Plots and Q-Q Plots
Lab 4: Probability and Random Variables
Lab 12: Simple Linear Regression
Lab 13: Chi-Square Tests
Old Exams for Review
  Exam 1 Questions
  Exam 2 Questions
  Final Exam Questions
Note to Students

Welcome to Statistics 250 at the University of Michigan! This lab
workbook is designed for you to use in lab and as extra preparation
for exams. In the workbook you will find the following materials:

Supplemental Material – great summaries for reference throughout the
term:
1. SPSS Commands Reference
2. Notation Sheet
3. Name That Scenario
4. Editing Charts in SPSS
5. Important Notes for Hypothesis Testing
6. Interpretation Examples
7. Summary of T-tests and Name That Scenario Practice for Means
8. Regression Output in SPSS

Weekly Labs (numbered 1 to 13) – each lab contains the following parts:
o Lab Background – objective and brief overview material, which is
  good to take a couple of minutes to read before you come to lab each
  week
o Warm-Up Activity – quick questions for you to do before the In-Lab
  Project, usually a quick review of concepts you have seen in lecture
o ILP (In-Lab Project) – one or more activities that you will work on
  in lab in groups. A copy of the ILP will be provided to each group
  for turning in at the end of the lab period.
o Cool-Down Activity – questions for you to do after the ILP for
  further reflection and application of the concepts covered in the ILP
o Example Exam Questions – old exam questions on the lab topic(s) for
  additional practice

Old Exams – complete sets of actual old exams for studying. Be sure to
refer to CTools to see if any problems on these old exams are not
relevant for your particular upcoming exam (due to differences in the
semester schedule). This information, in addition to solutions, will be
posted on CTools in the "Review Info" folder under the "Resources" tab
closer to each exam date.

The labs are designed to be interactive and to provide you with a
complete example for each concept. Completing the corresponding PreLab
assignment (a link to video instructions for PreLabs will be on CTools
and the Stat 250 YouTube channel) and reading the upcoming lab
background overview before lab each week is a good way to prepare for
the various lab activities.

Good luck in Statistics 250! -- The Stat 250 Instructors and GSIs
Supplement 1: SPSS Commands Summary by Lab – For Quick Reference

Lab 2 – Descriptive Statistics
To open a data file after having SPSS already open:
  File > Open > Data
To produce a Histogram:
  Graphs > Legacy Dialogs > Histogram
To produce a Boxplot for a single variable with no groups:
  Graphs > Legacy Dialogs > Boxplot > Simple > Summaries for separate
  variables
To use Data Label Mode: from inside the Chart Editor,
  Elements > Data Label Mode
To generate Descriptive Statistics I:
  Analyze > Descriptive Statistics > Descriptives
To generate Descriptive Statistics II:
  Analyze > Descriptive Statistics > Frequencies
To produce a Bar Chart:
  Graphs > Legacy Dialogs > Bar > Simple > Summaries of separate
  variables
To Split (or unsplit) the data (get charts and statistics by group):
  Data > Split File
To produce Side-by-Side Boxplots:
  Graphs > Legacy Dialogs > Boxplot > Simple > Summaries for groups of
  cases

Lab 3 – Time Plots and Q-Q Plots
To produce a Sequence (Time) Plot:
  Analyze > Forecasting > Sequence Charts
To produce a Q-Q Plot:
  Analyze > Descriptive Statistics > Q-Q Plots

Lab 8 – One-Sample Confidence Intervals for a Population Mean
To produce a Confidence Interval for a population mean (Method I):
  Analyze > Descriptive Statistics > Explore > Statistics option
To produce a Confidence Interval for a population mean (Method II):
  Analyze > Compare Means > One-Sample T Test

Lab 8 – One-Sample t Procedures for a Population Mean
To perform a One-Sample T Test for a population mean:
  Analyze > Compare Means > One-Sample T Test

Lab 9 – Paired t Procedures
To calculate a confidence interval for μD:
  Analyze > Compare Means > Paired-Samples T Test
To perform a Paired T Test:
  Analyze > Compare Means > Paired-Samples T Test
To compute Differences:
  Transform > Compute

Lab 10 – Independent Samples t Procedures
To construct a confidence interval for μ1 - μ2:
  Analyze > Compare Means > Independent-Samples T Test
To perform a Two-Samples T Test:
  Analyze > Compare Means > Independent-Samples T Test

Lab 11 – One-Way Analysis of Variance (ANOVA)
To perform an ANOVA:
  Analyze > Compare Means > One-Way ANOVA

Lab 12 – Simple Linear Regression
To produce a Scatterplot:
  Graphs > Legacy Dialogs > Scatter/Dot
To perform a Linear Regression:
  Analyze > Regression > Linear
To produce a Residual plot:
  Graphs > Legacy Dialogs > Scatter/Dot

Lab 13 – Chi-Square Tests
To weight cases by Counts:
  Data > Weight Cases
To perform a Goodness of Fit Test:
  Analyze > Nonparametric Tests > Chi-Square
To perform a Test of Independence:
  Analyze > Descriptive Statistics > Crosstabs
To perform a Test of Homogeneity:
  Analyze > Descriptive Statistics > Crosstabs
Supplement 2: Notation Sheet

The table below defines important notations, including that used by
SPSS, which you will come across in the course. This is not an
exhaustive list, but it is a fairly comprehensive overview of the
"strange letters" used in the course.
Note: Blank cells mean there is no corresponding notation.

Name                  Population     Sample         SPSS
                      Notation       Notation       Notation
Proportion            p              p̂ (p-hat)
Standard deviation    σ (sigma)      s              Std. Deviation
Variance              σ²             s²             Variance
Sample size                          n              N

Confidence Intervals
  Margin of error

Hypothesis Testing
  Test statistics: z, t, F, χ² (chi-square)
  Note: t, F and χ² statistics have degrees of freedom (abbreviated df)
  associated with them. Look for these on your Formula Card.

  Sum of squares for groups: SSG
  Sum of squares for error: SSE
  Mean square for groups: look in the column labeled Mean Square

Regression
  Response (dependent) variable: y; predicted value ŷ (y-hat)
  Intercept: b0; in SPSS, B (look in the row labeled (Constant))
  Slope: β1 (beta-one); b1; in SPSS, B (look in the row labeled with
  the name of the x-variable)
  r²; in SPSS, R Square
  ε (error terms)
Supplement 3: Name That Scenario

The first thing to do in any research inference problem is determine
what type of inference problem it is. This will help in deciding what
procedure/formulas are appropriate to use. The following questions can
help you determine the data scenario you are working with.

Please note: when answering "How many variables are there?" do not
count the variable which defines the populations (if there is more than
one population).

How many populations are there?   One / Two / More than two
How many variables are there?     One / Two
Type of variable(s):              Categorical / Quantitative

Then use the following table to determine which type of inference would
be appropriate for this scenario. Note the corresponding parameter is
in parentheses where appropriate.

Number of Variables and Type, by Number of Populations

One variable, Categorical
  One population:
    1-sample inference for population proportion (p) (Labs 5 and 6)
    Chi-square: Goodness of Fit (Lab 13)
  Two populations:
    2 independent samples inference for the difference between 2
    population proportions (p1 – p2)

One variable, Quantitative
  One population:
    1-sample inference for population mean (μ) (Lab 8)
    Paired samples inference for a population mean difference (μD)
    (Lab 9)
  Two populations:
    2 independent samples inference for the difference between 2
    population means (μ1 - μ2) (Lab 10)
  More than two populations:
    ANOVA (μi – where there is one μi for each population) (Lab 11)

Two variables, Categorical
  One population:
    Chi-square: Independence (Lab 13)

Two variables, Quantitative (relationship)
Supplement 4: Editing Charts in SPSS

Once we have a histogram (or any chart) made, we may wish to edit the
chart (perhaps to change the color of the bars or change the number of
class intervals). To do this, double click on the chart displayed in
the output viewer window. This will open the chart in the SPSS Chart
Editor window.

Suppose we want to change the color of the histogram bars from tan to a
light green. To do this, double click on one of the tan bars and click
on the Fill & Border tab in the properties window. Change the fill
color to light green (click on the light green box color), click on
Apply, and then Close.

To change some aspect of the x-axis, such as the scaling, double click
on the x-axis and its corresponding properties box will appear with
many tab options. The Scale tab can be used to change the endpoints and
major increments of the x-axis value labels. You could adjust the
minimum to 10000 and the maximum to 150000. Leaving the Auto box
checked for the Major Increment option will let SPSS create the
increment size. The Histogram Options tab would allow you to add a
normal curve to a histogram as well as change the starting position and
size of the bins (classes or intervals) in the histogram. In general,
SPSS uses algorithms to produce a nice display of the data. These
options are helpful if you have multiple plots that you would like to
display using the same x-axis values so comparisons can be more easily
made. You can also change the gray background color to white under the
Properties window with the background highlighted. The Help button in
the lower right corner of the Properties box can be selected to provide
more details about any of the various options for that tab. Once you
have finished customizing your chart, you can close out the Chart
Editor.

There are alternate ways to get to the Properties box in order to
customize your chart. Once you have double-clicked on your chart to
open the Chart Editor, click once on the part of the chart that you
wish to customize (so that it is highlighted). Then click on the Show
Properties Window tab (it looks like a paint palette) in your menu.
This will open the Properties box. Another alternative is to simply
select Properties under the Edit menu in the Chart Editor. Also note
that if you do not close the Properties box and you continue
highlighting different parts of your chart, the Properties box updates
so that you can customize those parts as you go.
For boxplots, if there are any points denoted as outliers, you can
identify them by looking at their case label number in the default
output. The Chart Editor provides a special mode for identifying
individual cases whose data labels you want to display. This is the
data label mode, and when you are in data label mode you can't change
anything else in the chart. From the menus choose Elements > Data Label
Mode. The cursor changes shape to indicate that you are in data label
mode. Click the data element for which you want to display the case
label. If there are overlapping data elements in the spot that you
click, the Chart Editor displays the Select Data Element to Label
dialog box. This dialog allows you to select the specific data element
or elements for which you want to display data labels. The Chart Editor
displays the data label in a default position related to the data
element. When you are finished choosing data elements, choose Elements
> Data Label Mode from the menus again, and the cursor changes back to
the arrow to indicate that you are no longer in Data Label Mode.

The Options menu lets you customize your chart further. You may add a
title or text box from this menu. Text boxes can appear anywhere in a
chart. From the Chart Editor menus, select Options > Text Box or
Options > Title, depending on which you want. For titles, the Chart
Editor creates the title box and automatically positions it in the top
center of the chart. Type the text and press Enter when you are
finished typing. To enter line breaks, press Shift+Enter. If necessary,
use the Text tab to format the text. For text boxes, you can drag and
drop to reposition them. You may need to resize the graph so the text
box will not cover up part of the graph. You can also copy the plots
into MS Word or another text editor and then type in your name and
title within the document.
Saving Output Boxes and Graphs

Images and other output from SPSS can often be copied and then pasted
into a document by selecting the desired output, right-clicking, and
choosing Copy.

1. To save an output box, such as a table of descriptive statistics,
first have the location where you would like to store the output open.
Then right-click on the output table and select Copy. The table can
then be pasted into a document or text field (such as those in your
PreLab assignments or homework). If you are pasting into a Word
document and the output does not appear to format correctly, it may be
a good idea to choose Paste Special and paste as an image. Your Stats
250 GSIs prefer not to receive Word documents.

2. To save a graph, such as a boxplot or histogram, you will want to
Export the graph as an image. Select the graph you wish to export, then
select Export from the File menu. At the top, click the Selection
button, select None (Graphics only) under Document, and then choose the
file type at the bottom. For uploading to lecturebook.com the extension
"jpg" or "png" is required. Before completing the export command, be
sure to give your file an informative name and note the location where
the file will be saved.

3. To save an entire SPSS session, you can export output (all output in
the viewer, charts only, or text only) in many possible formats (html,
jpeg, bitmap, etc.). First make the Viewer the active window, then
select the Export command under the File menu.

You can also print the contents of your output viewer window (all
output text as well as charts) or any selected portion. Click on File
from the menu bar and then choose Print. You could also just save your
output file within SPSS by selecting the Save as option, giving a name
for your file, and clicking OK.
Supplement 5: Notes about SPSS t Procedures

1. The reported p-value under the column heading Sig. (2-tailed) is for
a 2-sided test. For a one-sided test, you first divide the reported
2-tailed p-value in half (p/2). If the t-statistic is positive and the
alternative hypothesis was upper-tailed (>), then p/2 is the p-value.
If the t-statistic is negative and the alternative hypothesis was
lower-tailed (<), then p/2 is the p-value. However, if the t-statistic
is positive and the alternative hypothesis was lower-tailed (<), or if
the t-statistic is negative and the alternative hypothesis was
upper-tailed (>), then the p-value is 1 - p/2.

                        Alternative is >          Alternative is <
t-statistic positive    p-value is sig/2          p-value is 1 – (sig/2)
t-statistic negative    p-value is 1 – (sig/2)    p-value is sig/2

For example, consider a 2-sided test with an observed t-statistic value
of 0.948 and a p-value of 0.364. This 0.364 is actually the sum of two
equal areas: one being the area to the right of 0.948 and the other
being the area to the left of -0.948 under the t-distribution with
df = 11 curve (see Figure 1 below).

If the alternative hypothesis had been upper-tailed (>), then the
p-value would be only the area to the right (in the direction of
extreme) of the observed t-statistic of 0.948, which is half of the
two-sided p-value, or 0.182. But if the alternative hypothesis had been
lower-tailed (<), the p-value would be the area to the left (in the
direction of extreme) of the observed t-statistic of 0.948. So the
p-value would be 1 – (0.364/2) = 0.818.

2. Sometimes SPSS will display a p-value of 0.000. Clearly, the
probability is not exactly zero. Rather, it is zero to 3 decimal
places. Thus it is correct to say that the p-value is less than 0.0005,
since anything greater, when rounded, would have resulted in a Sig.
(2-tailed) p-value of 0.001 or more.

3. One can use a one-sample t 100(1 - α)% confidence interval to test a
two-sided hypothesis at the α level by checking whether the
hypothesized or null test value is contained in the interval. If it is,
we cannot reject H0, but if not, we would reject H0. Recall that if you
want to produce a confidence interval for the population mean μ, you
must specify the test value to be 0 in the one-sample t-test dialog
box. For example, a test value of 30 that would result in a 95%
confidence interval of the difference given as (-0.25, 0.5) is actually
a confidence interval for μ – 30. See Lab 5 for more details.

4. Supplement 7 provides an overview of the assumptions for all three
t-tests that are presented in Labs 8, 9 and 10.
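Notes 1 and 3 are simple arithmetic on numbers that SPSS reports, so they can be sketched in a few lines of Python. This is only an illustration, not part of SPSS: the function names `one_sided_p_value`, `shift_interval`, and `rejects_null` are hypothetical, and the inputs are the example values quoted in the notes (Sig. = 0.364 with t = 0.948, and the interval (-0.25, 0.5) with test value 30).

```python
def one_sided_p_value(sig_two_tailed, t_stat, alternative):
    """Convert SPSS's Sig. (2-tailed) value into a one-sided p-value.

    alternative is ">" for an upper-tailed test, "<" for a lower-tailed test.
    """
    if alternative not in (">", "<"):
        raise ValueError("alternative must be '>' or '<'")
    half = sig_two_tailed / 2
    # The observed t agrees with the alternative when it is positive for ">"
    # or negative for "<"; only then is the p-value the half area.
    agrees = (t_stat > 0) if alternative == ">" else (t_stat < 0)
    return half if agrees else 1 - half


def shift_interval(interval, test_value):
    """Turn an interval for (mu - test_value) into an interval for mu."""
    low, high = interval
    return low + test_value, high + test_value


def rejects_null(interval_for_mu, null_value):
    """Two-sided decision: reject H0 only if the null value is outside."""
    low, high = interval_for_mu
    return not (low <= null_value <= high)


upper = one_sided_p_value(0.364, 0.948, ">")   # half of the two-sided value
lower = one_sided_p_value(0.364, 0.948, "<")   # one minus that half
ci_for_mu = shift_interval((-0.25, 0.5), 30)   # add the test value back
print(round(upper, 3), round(lower, 3), ci_for_mu, rejects_null(ci_for_mu, 30))
```

With these inputs the one-sided values come out as 0.182 and 0.818, matching the worked example in note 1, and the shifted interval contains 30, so a two-sided test of H0: μ = 30 at the matching level would not reject.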
Supplement 6: Interpretation Examples

In 1980, Bausch and Lomb Corporation developed a new type of
extended-life contact lens made of silicone, which it claimed had a
useful life of more than 4 years. During the research and development
period, a random sample of contact wearers was asked to wear the new
contact lenses and record how long they lasted. The average useful life
of the six pairs of lenses was 4.6 years with a standard deviation of
0.49 years.

a. Interpretation of the Standard Deviation: The average distance of
the observed useful lives of these lenses from their mean useful life
of 4.6 years is about 0.49 years.

b. Calculate the value of the standard error of the mean:
$\mathrm{SE}(\bar{X}) = s/\sqrt{n} = 0.49/\sqrt{6} = 0.200$
Interpretation: We would estimate the average distance of the possible
sample mean useful life values (obtained from repeated random samples
of size n = 6 pairs of such lenses) from the population mean useful
life μ to be roughly 0.20 years.

c. Construct a 90% confidence interval for the population mean life of
all such silicone-based lenses:
$4.6 \pm (2.015)(0.200) \Rightarrow (4.197,\ 5.003)$
Interpretation of the Interval: This interval provides a range of
reasonable values for the population mean useful life μ. We would
estimate the population mean useful life μ to be between 4.197 years
and 5.003 years with 90% confidence.
Interpretation of the 90% Confidence Level: This interval was made with
a method which, if repeated, would generate many 90% confidence
intervals. We would expect 90% of these resulting intervals to contain
the population mean life μ.

d. State the hypotheses to test the claim made by Bausch and Lomb about
their new contact lens; that is, test if the population mean useful
life is more than 4 years.
$H_0: \mu = 4$ versus $H_a: \mu > 4$
The p-value for this test is the probability of getting a t-test
statistic at least as extreme as the observed test statistic, assuming
the null hypothesis is true. So the p-value is $P(T \ge 3.00)$, found
under the t(5) distribution. This p-value turns out to be equal to
0.015.

Interpretation of the value of the test statistic t = 3.00 in terms of
a distance: The observed sample mean was 3 average distances (i.e., 3
standard errors) above the hypothesized mean of 4.

Interpretation of the resulting p-value of 0.015: If the null
hypothesis was true (the population mean useful life is just 4 years)
and this procedure (study) was repeated many times, we would expect to
see a t-test statistic value of 3.00 or larger in only 1.5% of the
repetitions. Thus our data are somewhat unusual under the null
hypothesis theory, providing evidence for the alternative theory that
the population mean useful life is greater than 4 years.

e. At a 10% significance level, what is the decision?
Reject H0 since the p-value is less than 0.10.

f. What is the conclusion? There is sufficient evidence to conclude
that the population mean useful life of the new lenses is greater than
4 years.

NOTE: These interpretations can be extended to any test and confidence
interval, adjusting for the different parameters, different directions
of extreme, different test statistics, etc.
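The numbers in parts b through d of the contact lens example can be reproduced from the summary statistics alone. The sketch below is an illustration in plain Python rather than SPSS output. It assumes the sample values n = 6, mean 4.6, and standard deviation 0.49, takes the multiplier t* = 2.015 as given, and approximates the upper-tail area of the t(5) distribution by numerical integration. The helper names `t_pdf` and `t_upper_tail` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, using Simpson's rule on [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the total area lies above 0; subtract the part between 0 and t.
    return 0.5 - total * h / 3


# Summary statistics from the contact lens example.
n, mean, sd = 6, 4.6, 0.49
null_value = 4.0
t_star = 2.015  # multiplier for 90% confidence with df = n - 1 = 5

se = sd / math.sqrt(n)                      # standard error of the mean
margin = t_star * se                        # margin of error
interval = (mean - margin, mean + margin)   # 90% confidence interval
t_stat = (mean - null_value) / se           # distance in standard errors
p_value = t_upper_tail(t_stat, n - 1)       # upper-tailed test (Ha: mu > 4)

print(f"SE = {se:.3f}")
print(f"90% CI = ({interval[0]:.3f}, {interval[1]:.3f})")
print(f"t = {t_stat:.2f}, p-value = {p_value:.3f}")
```

Rounded to the same precision as the text, this gives a standard error of 0.200, an interval of (4.197, 5.003), a test statistic of 3.00, and a p-value of 0.015.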
Supplement 7: Summary of the Main t-Tests

The three inference scenarios presented in Labs 8, 9 and 10 are:
one-sample t procedures, paired t procedures, and two independent
samples t procedures. It is important to look at the data to determine
if doing a particular t procedure is appropriate. That is, we need to
check the assumptions. (Recall that checking assumptions is the second
step in performing a hypothesis test.) The t procedures have the
following assumptions:

1. Each sample is a random sample (the observations can be viewed as
realizations of independent and identically distributed random
variables). In the paired t procedures, the differences are assumed to
be a random sample.

2. Each sample is drawn from a normal population; that is, the response
variable has a normal distribution for each population. In the paired t
procedures, the population of differences is assumed to have a normal
distribution. In the two-sample case, both populations of responses are
assumed to have normal distributions. You need normality of the
underlying population for the response in order to have normality for
the sample mean. In the case where you do not have a normal population,
you can still have normality of the sample mean if you have a large
enough sample size (most texts state at least 30). In Lab 7 (Sampling
Distributions and the CLT) you worked with an applet that demonstrated
the CLT using a large sample size of 25. Thus we will accept at least
25 as large enough for inference about means.

3. For the two independent samples t procedures, we also assume that
the two samples are independent. We also need to assess whether the two
population variances can be assumed equal in order to decide between
the pooled and the unpooled t tests.
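To see what the choice between the pooled and unpooled t tests in assumption 3 changes, the sketch below computes the standard error of the difference in sample means both ways. These are the standard textbook formulas, which this supplement does not spell out. The function names and the summary statistics are hypothetical and serve only as an illustration.

```python
import math


def unpooled_se(s1, n1, s2, n2):
    """Standard error of (xbar1 - xbar2) without assuming equal variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_se(s1, n1, s2, n2):
    """Standard error of (xbar1 - xbar2) assuming equal population variances."""
    # Pooled variance: a weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical case 1: similar spreads, equal sample sizes.
similar = (pooled_se(4.0, 10, 4.0, 10), unpooled_se(4.0, 10, 4.0, 10))

# Hypothetical case 2: very different spreads and sample sizes.
different = (pooled_se(2.0, 5, 6.0, 20), unpooled_se(2.0, 5, 6.0, 20))

print("similar spreads:   pooled %.3f, unpooled %.3f" % similar)
print("different spreads: pooled %.3f, unpooled %.3f" % different)
```

When the two sample standard deviations are alike, the two standard errors agree, so the choice matters little. When they differ sharply, the two versions can give noticeably different standard errors, which is why the equal-variances assumption is checked before pooling.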
Graphical tools can be used to check these assumptions (see Labs 2 and
3 for more details about these various graphs).

Time Plots (or Sequence Plots): If your quantitative data have been
gathered over time, then a time plot can be used to determine if the
underlying process that generated that time-dependent data appears to
be stable. So these plots can help us assess if the random sample
assumption is reasonable. For the paired design problems, we assume our
set of differences calculated from the paired observations (D1, D2,
..., Dn) are a random sample. So if these differences were obtained
over time, they should be plotted against their order to see if they
look like they came from one population of all differences (no changing
mean or variability over time).
Remember: Time or Sequence plots are useful for checking stability only
when the data are ordered in some sense. If there is no inherent order
to the data, a sequence plot should not be made.

Histograms: Histograms are especially useful for displaying the
distribution of a quantitative response variable. You could make a
histogram of the observations in a one-sample problem, of the
differences in a matched pairs design, and of each of the two samples
separately in the independent samples design. Examine the histogram for
evidence of strong departures from normality, such as bimodality or
extreme outliers. Since you are just plotting data (just a sample and
not the entire population of responses), your histogram may not look
perfectly bell-shaped or normal.

Q-Q plots: Q-Q plots (or quantile plots or normal probability plots)
are generally better than histograms for assessing if a normal model is
appropriate. If the points in a Q-Q plot fall approximately in a
straight line (with a positive slope), then the normal model assumption
is reasonable.

Boxplots: Boxplots are most useful for assessing the validity of the
assumption of equality of population variances in the two independent
samples design. We would see if the IQRs (shown graphically by the
length of the boxes) are comparable and also compare the overall
ranges. If they do have comparable lengths or sizes (they do not need
to be lined up), then we have support that the equality of population
variances assumption is reasonable. We would also want to compare the
two sample standard deviations themselves, and Levene's test of
equality of the two population variances may also be available.
Name that Scenario Practice for the Three Tests: Having just reviewed
the three main t-test inference scenarios, you should understand the
testing procedures and be able to interpret the results of a test.
However, it is important to know when each scenario applies. Read each
of the following inference scenarios and determine which of the three
t-test procedures would be most appropriate: the one-sample t-test, the
paired t-test, or the two-independent samples t-test.

1. A researcher is studying the effect of a new teaching technique for
middle school students. One class of 30 students is taught using the
new technique, and their mean score on a standardized test is compared
to the mean score of another class of 27 students who were taught using
the old technique.

2. A company claims that the economy size version of their product
contains 32 ounces. A consumer group decides to test the claim by
examining a random sample of 100 economy size boxes of the product,
since they have received reports that the boxes contain less than the
32 ounces claimed.

3. At some universities, athletic departments have come under fire for
low academic achievement among their athletes. An athletic director
decides to test whether or not athletes do in fact have lower GPAs. A
random sample of 200 student athletes and a random sample of 500
non-athlete students are taken and their GPAs are recorded.

4. As part of a biology project, some high school students compare
heart rates of 40 of their classmates before and after running a mile.
They want to see if the heart rate of students their age is faster
after running a mile than before, on average.

5. A hospital is studying patient costs; they decide to follow 500
surgery patients' hospital and medical bills for a year after surgery
and compare them to the estimated costs provided to the patients before
surgery. They want to see if the estimated and actual costs are
comparable on average.

6. A chemical process requires that no more than 23 grams of an
ingredient be added to a batch before the first hour of the process is
complete. An analyst feels that, due to current settings, more than 23
grams may actually be added. If the analyst is correct, the settings
need to be altered and recent batches recalled. A random sample of 25
batches is obtained from the machine that is supposed to add the
ingredient. The measurements are used to test the analyst's claim.
Supplement 8: Regression Output in SPSS

There are four parts to the default regression output. Use the scroll
bar at the right edge of the Output Window to scroll up to the top of
the regression output. The first section just reminds you which
variable was entered as the explanatory x variable; for this example
the explanatory variable is DNA.

The second section has the heading Model Summary. The Model Summary
starts with the correlation between the two variables, R, which is the
absolute value of the correlation coefficient r. You need to look at
the sign of the slope of the regression line to determine if you need
to put a minus sign in front of this value to correctly report the
correlation coefficient. (The actual value of the correlation
coefficient is also reported in the last section of regression output
under the column heading Beta.) The correlation coefficient measures
the strength of the linear association between the two variables. The
closer it is to 1 or -1, the stronger the linear association. The
square of the correlation, the R Square quantity, has a useful
interpretation in regression. It is often called the coefficient of
determination and measures the proportion of the variation in the
response that can be explained by the linear regression of y on x. Thus
it is a measure of how well the linear regression model fits the data.
The Std. Error of the Estimate gives the value of s, the estimate of
the population standard deviation σ.
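Because the Model Summary reports only the absolute value of the correlation, the sign has to be restored from the slope. A minimal sketch of that step follows. The function name `correlation_from_output` and the input values are hypothetical, chosen only to illustrate the rule.

```python
import math


def correlation_from_output(r_square, slope):
    """Recover r from R Square, taking its sign from the estimated slope."""
    r = math.sqrt(r_square)  # this is the R shown in the Model Summary
    return -r if slope < 0 else r


# Hypothetical outputs: the same R Square with slopes of opposite sign.
r_pos = correlation_from_output(0.64, 1.7)
r_neg = correlation_from_output(0.64, -1.7)
print(r_pos, r_neg)
```

Both cases share the same R and R Square, so both models explain the same share of the variation; only the direction of the association differs.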
Model Summary (SPSS output table)
a. Predictors: (Constant), DNA
The third part of the output contains the ANOVA table for regression,
used for assessing if the slope is significantly different from 0 via
an F test. The corresponding t-test will be discussed first, and we
return to this ANOVA part later.

ANOVA (SPSS output table)
188.228   8   23.528
The last portion of the output falls under the heading Coefficients. In this section, the least squares estimates for the regression line are given. These estimated regression coefficients are found under the column labeled B. The estimated slope is next to the independent variable name (in this example it is DNA) and the estimated intercept is next to (Constant). So b0 is the coefficient for the variable (Constant) and b1 is the coefficient for the independent variable x in the model. The next column heading is Std. Error, which provides the corresponding standard error of each of the least squares estimates. Also produced in this table are the t-test statistics, in the column labeled t, and Sig., which reports the two-sided p-values for these t-test statistics.
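The columns of the Coefficients table can be mimicked with a short Python sketch. The function name coefficients_table and the data are hypothetical, and the standard error formulas used are the usual textbook ones for simple linear regression; the Sig. column is left out because it requires a t distribution with n - 2 degrees of freedom, which the Python standard library does not provide.

```python
import math

def coefficients_table(x, y):
    """Return {row name: (estimate, std_error, t)} for intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                       # least squares slope
    b0 = y_bar - b1 * x_bar              # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))         # standard error of the estimate

    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {
        "(Constant)": (b0, se_b0, b0 / se_b0),
        "x": (b1, se_b1, b1 / se_b1),
    }

# Hypothetical data, for illustration only.
table = coefficients_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, (estimate, std_error, t) in table.items():
    print(f"{name:<11} B = {estimate:.3f}  Std. Error = {std_error:.3f}  t = {t:.3f}")
```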
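[Coefficients table (SPSS output). Surviving row of values, in column order B, Std. Error, Beta, t, Sig.: .17, .03, .85, 4.79, .002. Row labels: (Constant), DNA. a. Dependent Variable: PLAQUE]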
The t-statistic for the slope in the second row is a test of the significance of the model with x versus the model without x, that is, for testing H0: β1 = 0 versus Ha: β1 ≠ 0. The t-statistic for the y-intercept in the first row is a test of whether the y-intercept (β0) is different from zero. This test is not often of interest, unless a value of 0 for the y-intercept is meaningful and of interest. For example, if x = amount of soap used and y = height of the suds, then an intercept value of 0 is meaningful, as no soap would lead to no suds. The column labeled Sig. gives the two-sided p-value for the corresponding hypothesis test.
SPSS also provides the information to calculate confidence intervals for the parameter estimates. The column labeled Std. Error provides standard errors (estimated standard deviations) of the parameter estimates and is the quantity that is multiplied by the appropriate t* value in computing the half-width of the confidence interval. Recall that you can request SPSS to produce these confidence intervals for you using the Statistics button in the Regression dialog box.
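The half-width calculation can be sketched in a few lines of Python. The function name confidence_interval is hypothetical, and the numbers passed in are illustrative: an estimate of 0.17 with standard error 0.03, and t* = 2.306, which is the table value for 95% confidence under the assumption of 8 degrees of freedom.

```python
def confidence_interval(estimate, std_error, t_star):
    """Return (lower, upper) = estimate -/+ t_star * std_error."""
    half_width = t_star * std_error
    return estimate - half_width, estimate + half_width

# Illustrative inputs; t* must be looked up for the chosen confidence level
# and the degrees of freedom (n - 2 in simple linear regression).
lower, upper = confidence_interval(estimate=0.17, std_error=0.03, t_star=2.306)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

If the interval does not contain 0, that agrees with rejecting H0: β1 = 0 at the matching significance level.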
Interpretation of estimated slope b1: According to our regression model, we estimate that increasing DNA by one unit has the effect of increasing the predicted plaque by 0.17 units.

Interpretation of r²: According to our model, 73% of the variation in plaque levels can be accounted for by its linear relationship with DNA.

Decision for test of a significant linear relationship: Since the p-value (0.002) is less than the significance level α = 0.05, we can reject the null hypothesis that the population slope β1 equals 0.

Conclusion: There is sufficient evidence to conclude that, in the linear model for plaque based on DNA, the population slope β1 does not equal zero. Hence, it appears that DNA is a significant linear predictor of plaque.
Let's return to the ANOVA table in the middle of the regression output.

[ANOVA table (SPSS output). Surviving values: Sum of Squares 188.228, df 8, Mean Square 23.528. a. Predictors: (Constant), DNA. b. Dependent Variable: PLAQUE]
The Regression Sum of Squares corresponds to the portion of the total variation in the data that is accounted for by the regression line. Everything that is left over and not accounted for by the regression line is placed in the Residual Sum of Squares category. Then dividing the sums of squares by their respective df (degrees of freedom) yields the Mean Squares.

Finally, the ratio of the Mean Squares provides the F statistic, which tests if the slope is significantly different from zero (i.e., if there is a significant non-zero linear relationship between the two variables: H0: β1 = 0 versus Ha: β1 ≠ 0). The Sig. is the corresponding p-value for the F test of these hypotheses.
In simple linear regression, the t-test in the Coefficients output for the slope is equivalent to the ANOVA F-test. Notice that the square of the t-statistic for testing about the slope is equal to the F-statistic in the ANOVA table, and the corresponding p-values are the same.
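The relationship between the ANOVA F statistic and the slope t-statistic can be checked numerically. The sketch below uses a hypothetical function anova_and_t and made-up data; it builds the Regression and Residual sums of squares, forms the Mean Squares with 1 and n - 2 degrees of freedom, and compares F with the square of t.

```python
import math

def anova_and_t(x, y):
    """Return (ss_regression, ss_residual, f_stat, t_slope) for y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ss_regression = sum((f - y_bar) ** 2 for f in fitted)        # explained
    ss_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted))  # left over
    ms_regression = ss_regression / 1          # regression df = 1
    ms_residual = ss_residual / (n - 2)        # residual df = n - 2

    f_stat = ms_regression / ms_residual
    t_slope = b1 / math.sqrt(ms_residual / sxx)  # slope divided by its std. error
    return ss_regression, ss_residual, f_stat, t_slope

# Hypothetical data, for illustration only.
ss_reg, ss_res, f_stat, t_slope = anova_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"F = {f_stat:.4f}, t = {t_slope:.4f}, t squared = {t_slope ** 2:.4f}")
```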
Checking the Simple Linear Regression Assumptions

Here is a summary of some graphical procedures that are useful in detecting departures from the assumptions underlying the simple linear regression model.

1. LINEARITY: Do a scatter plot of y versus x. The plot should appear to be roughly linear.

2. STABILITY: Do a sequence plot of the residuals. The plot should show no pattern indicating any trend in the mean or in the variance of the residuals. An example series plot is shown below. Remember that it is only appropriate to make sequence plots when there is some ordering present in the data.

3. NORMALITY: Examine a Q-Q plot of the residuals to check on the assumption of normality for the population (true) error terms. An example Q-Q plot is shown below.

4. CONSTANT STANDARD DEVIATION of the population (true) error terms: Make a plot of the residuals versus x. This plot is called a residual plot. The residuals represent what is left over after the linear model has been fit. The residual plot should be a random scatter of points in roughly a horizontal band with no apparent pattern. An example residual plot is shown at the right. Sometimes this plot can also reveal departures from linearity (i.e., that the regression analysis is not appropriate due to lack of a linear relationship).
Lab 1: Scavenger Hunt, Mean and Median

Objective: In this lab you will visit several of the sites that will be used throughout the course of the semester and learn the locations of the important resources. In the second part of the lab, you will make use of an applet to explore ideas related to measures of center for a data set.

Overview: In this course you will be using various forms of technology, including applets that will be useful for exploring statistical concepts. There will be a lot of places to visit, and the links for all these sites are available on CTools. In the first portion of the In-Lab Project, you will complete a Scavenger Hunt that will allow you to become familiar with the resources available on CTools. In the second part of the In-Lab Project, you will use the first applet to enhance your understanding of the mean and median.
Measures of Center: Measures of center are numerical values that tend to report the middle of a set of data. The two that we will focus on are the mean and the median.

1. Mean: The mean of a set of n observations is simply the sum of the observations divided by the number of observations, n.

2. Median: The median of a set of observations, ordered from smallest to largest, is the middle value. It will be the value such that (at least) half of the observations are less than or equal to that value and (at least) half the observations are greater than or equal to that value.
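The two definitions above translate directly into code. The sketch below is a minimal Python version with hypothetical function names and made-up data; for an even number of observations it averages the two middle values, which is one common convention.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average the two middle values (a common convention).
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data: a symmetric set, then the same set with one large value added.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7]
with_outlier = data + [40]

print("mean:", mean(data), " median:", median(data))
print("mean:", mean(with_outlier), " median:", median(with_outlier))
```

Adding the single large value moves the mean noticeably while the median stays where it was.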
Measures of Variation or Spread: Measures of variation include the Interquartile Range (IQR) and standard deviation. These numerical summaries describe the amount of spread that is found among the data, with larger values indicating more variability or more spread.

1. Standard Deviation: Standard deviation is a measure of the spread of the observations from the mean. It is actually the square root of an average of the squared deviations of the observations from the mean. We can think of the standard deviation as approximately an average distance of the observations from the mean.

2. IQR: The IQR measures the spread of the middle 50% of the data. It is defined as the difference between the 3rd quartile (Q3) and the 1st quartile (Q1). These quartiles are also called the 75th and 25th percentiles, respectively. IQR = Q3 - Q1.
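Both measures of spread can be computed with a short Python sketch. Two assumptions are worth flagging: the "average" of squared deviations is taken with divisor n - 1 (the usual sample standard deviation), and the quartiles come from the standard library's statistics.quantiles with method="inclusive". Other software, including SPSS, may use a slightly different quartile rule, so small differences are possible. The function names and data are hypothetical.

```python
import math
import statistics

def standard_deviation(values):
    """Square root of the (n - 1)-averaged squared deviations from the mean."""
    n = len(values)
    centre = sum(values) / n
    squared_deviations = [(v - centre) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

def interquartile_range(values):
    """IQR = Q3 - Q1, using one of several common quartile conventions."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data, for illustration only.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7]
sd = standard_deviation(data)
iqr = interquartile_range(data)
print(f"standard deviation = {sd:.3f}, IQR = {iqr}")
```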
Warm-Up: Categorical and Quantitative Variables

Take a few minutes to recall the two types of variables that have been introduced in class: categorical and quantitative. For each of the following variables, determine whether it is a categorical or quantitative variable. Recall that numerical summaries such as mean and median can only be computed for quantitative variables.
Cell Phone Model (iPhone, Android, ...)      Categorical   Quantitative
Quantitative
Quantitative
Points scored in a basketball game           Categorical   Quantitative
ILP: Scavenger Hunt

The first activity in this In-Lab Project is a Scavenger Hunt that will allow you to become familiar with the locations for all your important resources on CTools.

Task / Answer

1. Go to the CTools Site and find the formula card page in Resources:
What is the topic header at the top of page / of the formula card?

2. Go to the CTools Site and find the Data Sets folder in Resources:
What is the name of the last data set (when sorted alphabetically, a-z)?

3. Go to the CTools Site and find the Lab Info folder in Resources:
Go to the folder for your lab section and find your lab syllabus.
Look at the Grading Policy section. The In-Lab Projects will count what % toward your final course grade?

4. Go to the CTools Site and find the Stats 250 Prelab link:
Click on the link that will take you to the Prelab Sitemaker Site.
Click on LESSON 1. How many short videos do you have to watch for your Lesson 1?

5. Go to the CTools Site and find the Assignments link in the left menu:
Find the PRELAB LESSON 1 assignment.
When is your PRELAB LESSON 1 assignment due?

6. Go to the CTools Site and find the Link to the Online HW site (called Lecturebook):
If you have your HW tool subscription, log in and select your GSI.
What is your lab section number?
Note: If you don't have your GSI selected (with correct section number), your GSI cannot see your homework and thus you will receive 0 points!

7. Go to the CTools Site and find the Lab Info Folder in Resources:
Go to the folder for Links to Lab Applets.
Click on LAB 1.
What is the title of the Applet?

For the second part of the ILP, you will work with this applet.
ILP: The Mean and the Median

The applet that you opened in the last step of the Scavenger Hunt portion of the ILP will now be used to help you discover how the shape of the distribution for a set of data can provide important details regarding the relationship between the mean and the median for that data. In this activity, you will observe the mean and the median for a variety of shapes of distributions.

Open the Lab 1 Descriptives Applet from the "Links to Lab Applets" folder on the Stat 250 CTools site (in the "Lab Info" folder under "Resources").

Alternatively, the original applet can be found at: http://onlinestatbook.com/stat_sim/descriptive/index.html
This web site contains a Java applet that will help you understand the relationship between the mean and the median.

1. Read the applet instructions.

2. Click Begin and you will see a histogram of nine numbers: 3, 4, 4, 5, 5, 5, 6, 6, and 7. This histogram shows a symmetric distribution. The summary in the upper left corner shows that the mean and the median are both equal to 5, the standard deviation is 1.15, and there is no skewness (note the skewness measure is 0).

3. Change the distribution so that it now has a positive skew by "painting" the histogram with the mouse. Does this correspond to a right or left skewed distribution? Which is bigger, the mean or the median?

4. Change the distribution so that it has a negative skew. Which direction is this distribution skewed? Now which is bigger, the mean or the median?

5. Try a few other distributions (uniform, u-shaped, etc.) and see how the mean and median compare. Comment on your findings here.

Summarize what you have learned about the relationship between the shape of a distribution and the mean and median.
Cool-Down #1: Which to Report

You have seen that the mean is more sensitive to outliers than the median. For a data set that contains several outliers, which measure of center would you choose to report? What measure of spread? Explain.

Cool-Down #2: Real World Example

You are the manager of a local grocery store who is put in charge of setting the prices for your stock. You will determine the prices for each product by examining the prices of your competitors in the neighborhood. Suppose your neighborhood consists mainly of chain store supermarkets along with 2 high-end grocery stores. You want to set your prices low enough to attract customers, but high enough so you will make a profit. How would you use these measures of center to help you determine the prices?
Example Exam Question on Mean and Median

Tattoos - The Pew Research Center took a survey of Millennials, that is, young adults between the ages of 18 and 29. The survey looked at characteristics that researchers thought described Millennials: social networking, life priorities, and aspects of the respondents' appearances. One question asked how many tattoos respondents had, and there were 48 respondents who had at least one tattoo. Data for these 48 respondents are shown in the histogram below.

a. How could this histogram be described? Choose all that apply.
RIGHT SKEWED     LEFT SKEWED
SYMMETRIC        STABLE
BIMODAL          UNIMODAL
DECREASING TREND

b. If the mean number of tattoos for these Millennials is 3.01 tattoos, which of the following is a reasonable value for the median?
2 34%$ B 6

c. Is the following statement true or false? Based on the histogram, we can be sure the range of the tattoo data is exactly 20.
TRUE     FALSE

d. The standard deviation is computed exactly to be 2.91. Write a sentence to interpret this standard deviation (in terms of an average distance) in the context of the problem.
Lab 2: Describing Data with Graphs and Numbers

Objective: In this lab you will use some graphical and numerical tools to summarize the distribution for a quantitative variable or response: a histogram, a boxplot, mean, median, standard deviation, and interquartile range (IQR). You will also be introduced to side-by-side boxplots for comparing two or more distributions and bar charts for summarizing categorical data. These techniques can be very useful at the start of data analysis to get a feel for the data.

Overview: Two graphs that can be used to summarize the distribution for a single quantitative variable or response are a histogram and a boxplot. Each graph provides different information about the distribution. When used properly, graphs can be a very effective way to summarize data. Data on a single quantitative variable should first be examined graphically. The overall shape of the distribution and existence of outliers can generally be used to assess if the data appear to be coming from a relatively homogeneous population. If so, then various numerical summaries may be used to characterize the center of the distribution (such as mean and median) and the spread of the distribution (such as the standard deviation and the IQR). For categorical variables, a bar chart can be used to display the number falling in each category (frequency distribution).
Histograms: A histogram displays the distribution of a quantitative variable by showing the frequency (count) or percent of the values that are in various classes. The classes are typically intervals of numbers that cover the full range of the variable. Histograms can be used to assess the symmetry and modality of a single distribution or for comparing the relative locations and shapes of several distributions.
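To make the idea of classes concrete, here is a small Python sketch that counts how many values fall in each equal-width class and prints a text version of the histogram. The function name histogram_counts, the class boundaries, and the data are all assumptions chosen for illustration; SPSS picks its own class intervals.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)   # index of the class containing v
        if 0 <= k < n_classes:          # values outside the classes are ignored
            counts[k] += 1
    return counts

# Hypothetical data, for illustration only.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7]
counts = histogram_counts(data, start=3, width=1, n_classes=5)

for k, count in enumerate(counts):
    lower = 3 + k
    print(f"[{lower}, {lower + 1})  {'#' * count}  ({count})")
```

The printed bars rise to a single peak in the middle class and fall off evenly on both sides, which is what a symmetric, unimodal histogram looks like.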
Boxplots: One plot that can detect extreme observations or outliers is the boxplot. A boxplot is a graphical representation of the five-number summary, namely the minimum, first quartile, median, third quartile, and maximum of the data. The centerline of the box marks the median, or the 50th percentile. The sides of the box show the first (lower) quartile Q1 and the third (upper) quartile Q3. Thus, a boxplot shows the overall range (maximum - minimum) and the interquartile range (IQR = Q3 - Q1). A modified boxplot uses a rule for identifying values that are extraordinary compared to the others (outliers or outside values). Circles (o) are used to denote outliers and asterisks (*) to denote extreme outliers, if any are present. Any point below Q1 - (1.5 x IQR) or above Q3 + (1.5 x IQR) is considered an outlier. Extreme outliers are points below Q1 - (2 x IQR) or above Q3 + (2 x IQR). Box plots cannot tell you the shape of the distribution.
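The 1.5 x IQR rule can be sketched in Python as follows. The function names and data are hypothetical, and the quartiles are computed with statistics.quantiles(method="inclusive"), which is an assumption: SPSS may compute the box edges with a slightly different quartile rule, so borderline points could be classified differently.

```python
import statistics

def outlier_fences(values, multiplier=1.5):
    """Return (lower_fence, upper_fence) = (Q1 - m*IQR, Q3 + m*IQR)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def find_outliers(values, multiplier=1.5):
    """Values falling outside the fences for the given multiplier."""
    low, high = outlier_fences(values, multiplier)
    return [v for v in values if v < low or v > high]

# Hypothetical data with one unusually large value.
data = [3, 4, 4, 5, 5, 5, 6, 6, 7, 12]
print("fences:", outlier_fences(data))
print("outliers:", find_outliers(data))
```

Passing a larger multiplier to the same functions gives the fences for extreme outliers.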
Side-by-side Boxplots: These plots are helpful for comparing two or more distributions with respect to the five-number summary. For example, suppose you are interested in comparing the distribution of a variable such as the salary of the employees of a certain company. If you have information on sex for the group, you might be interested in comparing the distribution of salary of females with respect to males. In this case, the side-by-side boxplot will be an important part of the descriptive analysis of the data set involved.
Bar Charts: One way to display the number or frequency distribution for a categorical variable is with a bar chart. A bar chart shows the number or percentage of items that fall into each category or value of a categorical variable. It displays a bar for each category, with the height of each bar equal to the number, the proportion, or the percentage of items in that category. If the categories have no inherent order, we could rearrange the bars in the graph in any way we like. In such cases, the shape of the bar graph would have no bearing on its interpretation.
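The counts and percentages behind a bar chart can be tabulated with a few lines of Python. The function name frequency_table and the category values below are made up for illustration.

```python
from collections import Counter

def frequency_table(categories):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(categories)
    total = len(categories)
    return {cat: (count, 100 * count / total) for cat, count in counts.items()}

# Hypothetical responses, for illustration only.
phones = ["iPhone", "Android", "iPhone", "Other",
          "Android", "iPhone", "iPhone", "Android"]
table = frequency_table(phones)

# One text "bar" per category; the order of the bars carries no meaning here.
for category, (count, percent) in table.items():
    print(f"{category:<8} {'#' * count}  {count} ({percent:.1f}%)")
```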
Warm-Up: Matching

Match the graph or descriptive statistic to one of its primary uses (some may have more than one, and you may use an answer more than once).

____ i. Histogram                  A. Measure of center, not sensitive to outliers
____ ii. Bar Chart                 B. Compare distributions (but not their shapes)
____ iii. Mean                     C. Examine distribution of a categorical variable
____ iv. Median                    D. Examine distribution of a quantitative variable
____ v. Side-by-side Boxplots      E. Measure of spread
____ vi. IQR                       F. Measure of center, sensitive to outliers
ILP: Visualizing and Exploring Data

In this In-Lab Project, you will learn how to create graphs and obtain descriptive statistics for a data set using SPSS.

Task: The data set employee data.sav contains information on employees at a company. Explore possible questions this data could be used to address. Create appropriate graphs and obtain descriptive statistics for current salary and discuss the results.
1. Log onto your computer. To obtain the data set, go to CTools and find "Datasets for Labs and HW" in the "Resources" folder. Select employee data.sav and save it to a directory of your choice (alternatively, you may open the data set directly, in which case you do not need to open SPSS after). Once you have saved the data set, go to Programs, followed by Statistics Packages, Math Programs, and then select SPSS.

2. To open the employee data.sav data set from within SPSS, select the option Open an existing data source from the dialog box with the More Files line highlighted and click on OK. Change the directory to where you saved the data set, select employee data.sav, and click on the Open button. The data set will open and you can view it (it works like an Excel spreadsheet).

3. The starting view of the data is the Data Editor window. Here you can see the variables in the data set and their values. The first variable you should see is ID.
What is the second variable present in the data set? What type of variable is it? What is the eighth variable present in the data set? What type of variable is it?

4. Brainstorm possible questions this data set may have been collected to address.
5. Focus on the variable current salary. What are some graphs that would be appropriate to make for this variable?

6. Create a histogram for current salary. Use the graphs menu: Graphs> Legacy Dialogs> Histogram, select (current) salary, and move it to the variable box. Editing details can be found in the Editing Charts in SPSS section (Supplement 4) of this workbook.
Note: All Statistics 250 homework and lab work will require that students provide an appropriate title and their name on each SPSS chart or output. For histograms, click on the Titles button, enter the information, and click on Continue.
Describe what the histogram shows about the distribution of current salary. A good description includes information regarding the shape, modality, and range of the data, along with possible outliers.
7. Obtain a boxplot for current salary. Use: Graphs> Legacy Dialogs> Boxplot> Simple> Summaries of separate variables. This is appropriate for one variable with no groups. Click on the button Define to open another dialog box that defines the variables for our analysis. Click once on salary to highlight it and then on the Boxes Represent arrow to select it.
Note: Boxplots do not have a Titles option. However, you may add a title via the Chart Editor. Double click the graph and, from the Chart Editor menus, select Options> Title. The Chart Editor creates the text box and automatically positions it in the top center of the chart. Type the text and press Enter when you are finished typing. To enter line breaks, press Shift+Enter.
Describe what the boxplot shows about the distribution of current salary. What do the various lines on the boxplot represent?
8. Numerical summaries may also be obtained for any quantitative variable. To obtain the five-number summary, do Analyze> Descriptive Statistics> Frequencies and then choose the summary measures you want under the Statistics button. Fill in the basic summary measures for current salary (some require hand calculation).
Mean:          Standard Deviation:
Min:           Max:           Range: Max - Min
9. To obtain numerical summaries or any graph (except boxplots) for current salary by minority status, we need to split the data file. Use Data> Split File and choose Organize output by groups. Doing this successfully does not produce any noticeable changes in the SPSS windows; there will only be two short lines of output in the Output window confirming the split. The grouping variable is minority classification. Obtain descriptive statistics for current salary by minority status (once the data is split, just generate descriptive statistics). List some of your findings below.
Minority:          Non-Minority:
10. Create histograms for both minorities and non-minorities. (Leave the data split and create histograms as before.) For each category, would it be appropriate to summarize the shape of the distribution of the current salary using descriptors such as skewed or symmetric? Why?
Important Note: When you are finished conducting analyses by group, you need to go back to the Split File dialog box and choose Analyze all cases, do not create groups. Again, the only change will be one line of output confirming the data has been "un-split".
11. Create side-by-side boxplots for current salary. The data file should NOT be split to create these. Use Graphs> Legacy Dialogs> Boxplot with Simple and Summaries for groups of cases. Minority Status is the variable for the category axis and current salary is the variable.
How does the distribution for current salary compare for minorities versus non-minorities (based on the side-by-side boxplots, histograms, and descriptives)?
Cool-Down: Check Your Understanding about Std Dev

Recall the definition of Standard Deviation from Lab 1: the standard deviation is a measure of the spread of the observations from the mean. It is actually the square root of an average of the squared deviations of the observations from the mean. We can think of the standard deviation as approximately an average distance of the observations from the mean.

a. Suppose we are interested in learning about heights of Michigan students. We take a simple random sample of 100 students and find that the average height for this sample is 66 inches with a standard deviation of 2 inches. Below are some interpretations of this standard deviation. For each one, evaluate if it is a correct interpretation or say why it is incorrect.

1. The average distance between the height values and the mean height is roughly 2 inches.
Correct     Incorrect, because ___________________________

2. The height values differ from the mean height by approximately 2 inches, on average.
Correct     Incorrect, because ___________________________

3. The average distance between the height values is roughly 2 inches.
Correct     Incorrect, because ___________________________

b. A student provided the following incorrect interpretation of standard deviation:
"68% of the height values are within 2 inches of the mean height."
Why is this interpretation incorrect in general? What graph of the height data would you make to check if the statement could be correct? What would you look for in the graph?
Example Exam Question on Boxplots

Fifty-five parents of grade-school children were recently interviewed regarding the breakfast habits in their family. One question asked was if their children take the time to eat a breakfast (recorded as breakfast status: Yes or No). The grades of the children in some core classes (e.g., reading, writing, math) were also recorded, and a standardized grade score (on a 10-point scale) was computed for each child. At the end of the study, it was discovered that the children who do take time to eat breakfast get higher grade scores than those who don't.

a. What type of study is this?     Experiment     Observational study

b. What is the response variable in this study? ___________________________

c. What is the explanatory variable in this study? ___________________________

d. What type of variable is the explanatory variable?     Categorical     Quantitative

Side-by-side boxplots of the children's standardized grade scores are provided.

[Figure: side-by-side boxplots of grade score, grouped by "Do you have breakfast?"]

e. What is (approx.) the lowest grade scored by a child who does have breakfast? ___________ points

f. What is (approx.) the IQR for the grade scores of children who do eat breakfast? ___________ points

g. Using one of the measures displayed in the boxplot, complete this sentence: The highest grade scored by one of the children not eating breakfast is (approx.) equal to the ______________________ for the children who do eat breakfast.

h. True or false: The symmetry in the boxplot for the children not eating breakfast implies that the histogram of the same data is also symmetric. Circle one: True     False     Explain:
Lab 3: Time Plots and Q-Q Plots

Objective: In this lab you will add to your set of graphical tools for examining data. The graphs you will examine include sequence (time) plots for data collected over time and Q-Q plots for checking whether a normal model is a reasonable distribution for a quantitative variable.

Overview: Lab 2 provided a summary of some graphical and numerical tools that can be used to summarize the distribution for quantitative and qualitative variables or responses. We may use those tools for the activities in this In-Lab Project, but we will also need to utilize the new tools described below. Note that these graphical tools are introduced solely in lab, not in lecture, so it will benefit you to read this overview thoroughly.
Time (Sequence) Plots: Data is often gathered over time. Employment rate, stock prices, and sales figures are just a few examples. When data is gathered over time, it is generally wise to examine the data plotted against time. Plots against time can reveal the main features of a time series: overall patterns and striking deviations from those patterns. Some overall patterns that may arise are:

A persistent, long-term rise or fall, called a trend (either increasing or decreasing).
A pattern that repeats itself at regular intervals of time, called seasonal variation.
A persistent, long-term increase or decrease in the variation of the observations, called a pattern in variation.

If data is collected over time, a time plot can be used to check the assumption of a random sample, which will be needed for inference procedures. As you have learned in your lecture notes on sampling, a random sample consists of independent and identically distributed (iid) observations. This means that the observations can be considered as all coming from the same parent population (with the same or identical distribution) and are independent of one another. With a sequence plot, you can check the identically distributed aspect of a random sample by looking for evidence of stability in the plot. Stability is supported when both the mean of the observations and the amount of variation among observations appear to be constant over time, and there does not appear to be any pattern in the resulting plot.
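Stability is judged by eye from the time plot, but a crude numerical companion can help build intuition. The Python sketch below splits a series into its first and second halves and compares the mean and standard deviation of each. This half-versus-half comparison is an assumption introduced here for illustration, not a procedure from the lab, and the function name and both series are made up.

```python
import statistics

def half_summaries(series):
    """Return [(mean, sd) of first half, (mean, sd) of second half]."""
    mid = len(series) // 2
    halves = (series[:mid], series[mid:])
    return [(statistics.mean(h), statistics.stdev(h)) for h in halves]

# Hypothetical series recorded in time order.
stable = [5, 6, 4, 5, 6, 4, 5, 6, 4, 5, 6, 4]
trending = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

for name, series in (("stable", stable), ("trending", trending)):
    (m1, s1), (m2, s2) = half_summaries(series)
    print(f"{name:<9} first half: mean {m1:.2f}, sd {s1:.2f} | "
          f"second half: mean {m2:.2f}, sd {s2:.2f}")
```

Similar means and standard deviations in both halves are consistent with stability, while a clear shift in the mean (as in the trending series) is not. The plot itself remains the primary tool, since this summary cannot detect seasonal patterns.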
Q-Q Plots: Later in this class we will see that the assumption of a normal model for a population of responses will be needed in order to perform certain inference procedures. Previously we have seen that a histogram can be used to get an idea of the shape of a distribution. However, there are more sensitive tools for checking whether the shape is close to a normal (bell-shaped) model.

The best plot that can be used to check for normality is called a Q-Q Plot, which is a plot of the percentiles (or quantiles) of a standard normal distribution against the corresponding percentiles of the observed data. If the observations follow an approximately normal distribution, the resulting plot should be roughly a straight line with a positive slope. Deviations from this indicate possible departures from a normal distribution.
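The points of a Q-Q plot can be generated with the Python standard library. In the sketch below, each ordered observation is paired with the standard normal quantile at probability (i - 0.5) / n. That plotting position is an assumption (statistical packages, including SPSS, may use a slightly different formula), and the function name and data are hypothetical.

```python
from statistics import NormalDist

def qq_points(values):
    """Pair each ordered observation with a standard normal quantile."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(ordered, start=1)
    ]

# Hypothetical data, for illustration only.
heights = [64.1, 65.0, 65.8, 66.0, 66.3, 67.1, 68.2]
for normal_quantile, observed in qq_points(heights):
    print(f"{normal_quantile:6.3f}  {observed}")
```

Plotting these pairs and checking whether they fall close to a straight line is the visual check described above.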
At the right is an example of a Q-Q Plot showing strong support to say that the data do seem to come from a population with an approximately normal distribution.
The Q-Q plot on the left indicates the existence of two clusters of observations. The Q-Q plot in the center shows an example where the shape of the distribution appears to be skewed right. The Q-Q plot on the right shows evidence of an underlying distribution that has shorter tails compared to those of a normal distribution.

Note: It is only important that you can see the departures in the above graphs, and not as important to know if the departure implies skewed left versus skewed right and so on. A histogram would allow you to see the shape and type of departure from normality.

Finally, we consider an example Q-Q plot (shown at the right) that appears normal with the exception of one data point.
In this case, we would say the Q-Q plot shows evidence of an underlying distribution which is approximately normal except for one large outlier that should be further investigated. Note that outliers could appear in either the upper or lower tail.
Warm-Up: Check Your Understanding

Before beginning the In-Lab Project, review your understanding of the key concepts related to time plots and QQ plots.

1. For each word pair, select the appropriate word(s) to complete the sentences.

We use sequence (or time) plots to check the (independent / identically distributed) part of the random sample assumption by looking to see if the data appear to be (stable / normal), that is, have a constant mean and constant variation over time.

If there is any pattern in the observations over time, we (should / should not) make a histogram of the observations for further analysis.

2. If the time plot supports that we can consider our observations to be a random sample (that is, shows our underlying process appears to be stable), we could make a histogram to help assess if the model for our response in the underlying population is normally distributed. What other graph could we make to help assess this normality assumption? What would we hope to see in that graph to support that normality is reasonable?
ILP: Time Plot and QQ Plot Examples

In this first part of the In-Lab Project, you will look at more examples of time plots to help learn how to better "read" such graphs for assessing whether our data appear to be stable and support the random sample condition.

Task 1: Go to the Stat 250 Prelab Site and find the Time Series tab along the top. Download the timeseries.rdata script file you will be using in this part of the ILP. When you double click on this script file, it should open up the R program (on all campus machines, and you can download R to your computer free too).

1. Begin the program by entering the following command:
timeseries()

2. Select your sample size by entering a number between 1 and 10000.

3. Select if you would like to see an example of stable or unstable time plots.

4. If unstable time plots are selected, you will be asked questions to determine the type of unstable pattern you'd like to see.

5. Once your time plots have been created, you will be asked if you want to save your plots to the desktop as an image. Answer, and then you will again be asked to select a next sample size. Try out the various options and explore the various patterns of time plots that were mentioned in the introduction.

Sketch below a time plot which indicates both an increasing mean and a decreasing variance.
Task 2: Go to the Stat 250 Prelab Site and find the QQ Plot tab along the top. Download the qqplot.rdata script file you will be using in this part of the ILP. When you double click on this script file, it should open up the R program (on all campus machines, and you can download R to your computer free too).

1. Begin the program by entering the following command:
qqplot()

2. Select your sample size by entering a number between 1 and 10000.

3. Select the type of distribution you would like an example QQ plot to be generated from.

4. Once your QQ plots and the corresponding histograms have been created, you will be asked if you want to save your plots to the desktop as an image. Answer, and then you will again be asked to select a next sample size. Try creating QQ plots for many different distributions and sample sizes.

5. Sketch below one of the resulting QQ plots and histograms for a sample of 1000 observations from a skewed right distribution.
QQ plot:               Histogram:
ILP: Time-Dependent Data
Background 1: The data set deathrate.sav contains
the death rate (number of deaths per 100 million miles driven)
taken at two-year intervals from 1960 to 2004.
Display and summarize this data in an appropriate and useful way.
What do you see? Would it make sense to make a histogram of the
death rates? The following steps will guide your thinking as you
complete this task.
2. Why should a sequence plot be made to display this data?
3. Make a sequence plot for the data using Analyze> Forecasting>
Sequence Charts. What does the graph show? Comment on if you see any
trend, seasonal variation, or pattern in variation in this
graph.
4. Does the plot appear to be stable? What would you conclude
if asked if the data were a random sample of death rates?
5. Would it make sense to make a histogram of the death rates? Why or
why not?
Background 2: The data set oldfaithful.sav contains
the date and duration of eruptions (in minutes) of the Old
Faithful geyser. The data was collected several times per day over 23
consecutive days. Display and summarize the data in an appropriate
and useful way. What do you see? Does there appear to be any pattern
to this process? The following steps will guide your thinking as you
complete this task.
1. Make a time plot for the data using Analyze> Forecasting>
Sequence Charts. What does the graph show? Are there any
patterns to the process?
2. Does the plot appear to be stable? What would you conclude
if asked if the data were a random sample of eruptions?
ILP: Assessing Normality
You have discussed using a
histogram to examine the shape of the distribution of a
quantitative variable. If the histogram shows a fairly homogeneous,
unimodal set of observations, we might like to assess whether a normal
distribution is a reasonable model for the response. A better graph
for assessing normality is a Q-Q plot. In this problem, we will
examine a few distributions and see what each corresponding Q-Q
plot looks like.
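To see what a Q-Q plot actually compares, the following minimal Python sketch pairs each sorted observation with the value a normal model would predict at the same position. It is an illustration only: the sample values are made up, the helper name normal_qq_points is hypothetical, and the plotting position (i - 0.5)/n is one common convention that may differ from the one SPSS or the R script uses. If the normal model is reasonable, the pairs fall close to a straight line.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Return (theoretical, observed) pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    # Normal model with the sample's own mean and standard deviation.
    model = NormalDist(mean(ordered), stdev(ordered))
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    return [(model.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

# Made-up sample values, for illustration only.
sample = [98, 102, 95, 110, 105, 99, 101, 93, 107, 100]

for theoretical, observed in normal_qq_points(sample):
    print(f"{theoretical:7.2f}  {observed:7.2f}")
```

Plotting the observed values against the theoretical values gives the Q-Q plot; strong curvature at the ends suggests skewness or heavy tails.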
Background 1: Suppose a study examined high school students and
the relationship between IQ and GPA. Use the iq.sav dataset and
examine the distribution of IQ.
1. Create a histogram and a Q-Q plot for the IQ values. Q-Q
plots are created via Analyze> Descriptive Statistics> Q-Q plots.
Provide rough sketches.
Histogram:
QQ Plot:
2. Describe the shape of the resulting histogram.
3. Is a normal distribution a reasonable model for IQ scores in the
population based on this Q-Q plot?
Background 2: We have previously explored the employee
data.sav dataset variable of salary. Now let's check to see if the
overall distribution of salary can be considered normal, and then to
see if the distribution of salary might be normal depending on
minority status.
1. Create a histogram for the variable salary and describe its
shape.
2. Create a Q-Q plot for salary using Analyze> Descriptive
Statistics> Q-Q Plots. Based on the evidence of these graphs, is a
normal distribution an appropriate model for current salary? Why or
why not?
3. Create histograms and Q-Q plots separately for minorities and
non-minorities (recall the Data> Split File command).
Does the distribution of salary appear to be different for either
group? Comment on both histograms and Q-Q plots.
4. For salary, is a normal model reasonable for either minorities or
non-minorities?
Cool-Down: Matching Graphs
Match the corresponding histograms, boxplots,
and Q-Q plots.
[Figure: the histograms (beginning with Histogram A), boxplots, and Q-Q plots to be matched]
Which type of graph best shows the shape of the underlying
distribution?
Which type of graph best shows if the underlying distribution
appears to be normal (bell-curve)?
Example Exam Question on Sequence Plots and Q-Q Plots
A new method of
measuring phosphorus levels in soil is under consideration. A sample
of 11 soil specimens is analyzed using the new method. The time
series (sequence plot) for the 11 observations is presented
below.
a. Comment on the overall stability of these data based on this
plot.
[Figure: sequence plot of the 11 observations; horizontal axis: Sequence number]
b. An assumption of many statistical inference methods is that the
data follow a normal distribution. In the space provided below, sketch
how the Q-Q plot would appear if a normal distribution was a good
model for phosphorus levels.
Lab 4: Probability and Random Variables
Objective: The objective of this lab is to become familiar with
using the models for random variables and to find the probabilities
associated with the models you have learned. The probabilities we
compute from these models (for example, p-values in testing
theories) will help us make reasonable decisions. You will work with
three random variables and the methods used to calculate
probability for each variable. You will also become familiar with
several concepts that allow for easier calculation of
probabilities.
Overview: In this lab you will be introduced to several
random variables and their models. These variables can be classified
as one of two types: a discrete random variable, which has a finite
(or countable) number of outcomes, and a continuous random variable,
which can take any value in an interval.
• Independent Events: Two events A, B are said to be independent
if knowing that one will occur (or has occurred) does not change
the probability that the other occurs. In probability notation, this
can be expressed as P(A|B) = P(A).
• Mutually Exclusive: Two events A, B are mutually exclusive (or
disjoint) if they do not contain any of the same outcomes. So
their intersection is empty.
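The two definitions above can be checked directly on a small sample space. The sketch below assumes one roll of a fair die (an illustrative setting, not one of the lab problems) and uses hypothetical helper names. Note that the two ideas are different: mutually exclusive events with positive probability are never independent.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair die, all outcomes equally likely.
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event & outcomes), len(outcomes))

def is_independent(a, b):
    """Check P(A|B) == P(A), where P(A|B) = P(A and B) / P(B); needs P(B) > 0."""
    return prob(a & b) / prob(b) == prob(a)

def is_mutually_exclusive(a, b):
    """Events are mutually exclusive when their intersection is empty."""
    return len(a & b) == 0

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
odd = {1, 3, 5}

print(is_independent(even, at_most_four))  # True: P(even | at most 4) = 1/2 = P(even)
print(is_mutually_exclusive(even, odd))    # True: no shared outcomes
print(is_independent(even, odd))           # False: P(even | odd) = 0, not 1/2
```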
Random Variables: A random variable assigns a number to each
outcome of a random circumstance, or equivalently, a random variable
assigns a number to each unit in a population. The distribution of a
random variable is a model that shows us what values are possible for
that particular random variable and how often those values are
expected to occur (i.e., their probabilities). The model can
be expressed as a function, or table, or picture, depending on the
type of variable it is.
We will consider two broad classes of random variables: discrete
random variables and continuous random variables.
Discrete Random Variable: A discrete random variable X is a
random variable with a finite or countable number of possible outcomes.
The probability distribution function (pdf) for a discrete
random variable X is a table or rule that assigns probabilities to
the possible values of X.
Two conditions that must always apply to the probabilities for
a discrete random variable are:
Condition 1: The sum of all of the individual probabilities must
equal 1.
Condition 2: The individual probabilities must be between 0 and
1.
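As a quick check of the two conditions, the sketch below tests whether a list of probabilities could be a valid discrete distribution. The function name and the example lists are illustrative assumptions, and a small tolerance is used because decimal probabilities are not stored exactly on a computer.

```python
import math

def is_valid_distribution(probabilities):
    """Check Condition 1 (sum is 1) and Condition 2 (each value in [0, 1])."""
    each_in_range = all(0 <= p <= 1 for p in probabilities)
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=1e-9)
    return each_in_range and sums_to_one

print(is_valid_distribution([0.2, 0.5, 0.3]))   # True: both conditions hold
print(is_valid_distribution([0.6, 0.6, -0.2]))  # False: a negative probability
print(is_valid_distribution([0.5, 0.4]))        # False: the sum is 0.9
```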
Binomial Random Variable: One type of a discrete random variable
is the binomial random variable, which counts the number of times a
certain event occurs out of a particular number of observations or
trials of a random experiment.
A binomial experiment is defined by the following conditions:
1. There are n "trials", where n is determined in advance, not a
random value.
2. There are two possible outcomes on each trial, called a
"success" (S) and a "failure" (F).
3. The outcomes are independent from one trial to the next.
4. The probability of a "success" remains the same from one
trial to the next, and this probability is denoted by p. The
probability of a "failure" is 1 - p for every trial.
A binomial random variable is defined as: X = number of successes
in the n trials of a binomial experiment.
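Under the four conditions above, the probability of exactly k successes follows the standard binomial formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is a minimal illustration with made-up values of n and p (four tosses of a fair coin); the function name is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative values: 4 tosses of a fair coin, X = number of heads.
n, p = 4, 0.5
print(binomial_pmf(2, n, p))  # 0.375, since C(4, 2) = 6 and 6 * 0.5**4 = 0.375

# The probabilities over all possible values 0, 1, ..., n add up to 1.
print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # 1.0
```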
Continuous Random Variable: A continuous random variable X takes on
all possible values in an interval (or a collection of
intervals). The way that we determine probabilities for continuous
random variables differs in one important respect from how we
determine probabilities for discrete random variables. For a
discrete random variable, we can find the probability that the variable
X exactly equals a specified value. We can't do this for a continuous
random variable. Instead, we are only able to find the probability that
X could take on values in an interval. We do this by determining the
corresponding area under a curve called the probability density
function of the random variable.
So the probability distribution of a continuous random variable is
described by a density curve. The probability of an event is the
area under the curve for the values of X that make up the
event.
The probability model for a continuous random variable
assigns probabilities to intervals.
Definition: A curve (or function) is called a Probability Density
Curve if:
1. It lies on or above the horizontal axis.
2. Total area under the curve is equal to 1.
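The simplest density curve is a flat one (a uniform model), where every area is a rectangle: width times height. The sketch below assumes a uniform model on the interval from 0 to 8, chosen only for illustration, and uses a hypothetical function name. It also shows why the probability of any single exact value is 0 for a continuous random variable.

```python
def uniform_interval_probability(low, high, a, b):
    """P(a <= X <= b) for X uniform on [low, high]: the area of a rectangle."""
    height = 1 / (high - low)             # flat density; total area is 1
    left, right = max(a, low), min(b, high)
    width = max(0.0, right - left)        # no overlap means zero area
    return width * height

# Illustrative model: X is uniform on [0, 8], so the density height is 1/8.
print(uniform_interval_probability(0, 8, 2, 4))  # 0.25
print(uniform_interval_probability(0, 8, 0, 8))  # 1.0 (total area under the curve)
print(uniform_interval_probability(0, 8, 5, 5))  # 0.0 (a single value has no area)
```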
Normal Random Variable: The family of normal distributions is very
important because many variables have this shape and form
approximately, and many statistics that we use in our inference
methods are based on sums or averages, which generally have
(approximately) a normal distribution.
A normal curve is symmetric, bell-shaped, centered at the mean, and
its spread is determined by the standard deviation. In fact, the
points of inflection on each side of the mean mark the values which
are one standard deviation away from the mean.
Standardized Scores: A normal distribution is indexed by its
population mean and its population standard deviation. Recall that
the standard deviation is a useful "yardstick" for measuring how far
an individual value falls from the mean. The standardized score or
z-score is the distance between the observed value and the mean,
measured in terms of number of standard deviations. Values that are
above the mean have positive z-scores, and values that are below the
mean have negative z-scores.
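In symbols, the description above amounts to z = (observed value - mean) / standard deviation. The short sketch below applies it to made-up numbers (a mean of 70 and a standard deviation of 8 are assumptions for illustration only).

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Made-up model: mean 70, standard deviation 8.
print(z_score(86, 70, 8))  # 2.0: two standard deviations above the mean
print(z_score(62, 70, 8))  # -1.0: one standard deviation below the mean
```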
Normal Approximation to the Binomial Distribution: Computing exact
binomial probabilities can be tedious when n is large. The easier
way involves using a normal distribution. The normal distribution can
be used to approximate probabilities for other types of random
variables, one being binomial random variables when the sample size n
is large.
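To get a feel for how close the approximation can be, the sketch below compares an exact binomial probability with the area under a normal curve that has the same mean, np, and standard deviation, sqrt(np(1 - p)). The values n = 100, p = 0.4, and the cutoff 45 are assumptions chosen for illustration, the function names are hypothetical, and no continuity correction is applied, so the two answers agree only roughly.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial random variable."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using a normal curve with matching mean and SD."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k)

# Illustrative values only (not taken from the lab problems).
n, p, k = 100, 0.4, 45
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Trying a small n with p far from 0.5 (for example n = 10, p = 0.05) shows the approximation breaking down, which is why the sample size needs to be large.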
Expected Value: The expected value of a random variable is the
mean value of the variable X in the sample space, or population,
of possible outcomes. Expected value, denoted by E(X), can
also be interpreted as the mean value that would be obtained from an
infinite number of observations on the random variable.
Standard Deviation: The standard deviation can be viewed as
approximately the average distance of the possible values of X from
its mean.
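For a discrete random variable, these two summaries can be computed from the distribution table with the standard formulas: E(X) is the sum of each value times its probability, and the standard deviation is the square root of the sum of squared distances from E(X), each weighted by its probability. The sketch below uses a made-up distribution (the number of heads in two tosses of a fair coin) and hypothetical function names.

```python
from math import sqrt

def expected_value(values, probs):
    """E(X): each possible value weighted by its probability."""
    return sum(x * p for x, p in zip(values, probs))

def standard_deviation(values, probs):
    """Square root of the probability-weighted squared distances from E(X)."""
    mu = expected_value(values, probs)
    return sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

# Made-up distribution: X = number of heads in two tosses of a fair coin.
values = [0, 1, 2]
probs = [0.25, 0.5, 0.25]

print(expected_value(values, probs))                # 1.0
print(round(standard_deviation(values, probs), 3))  # 0.707
```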
Warm-Up: Types of Variables
Today's typical undergraduate
student is often characterized as preferring teamwork, experiential
activities, and the use of technology. An ECAR (Educause Center
for Applied Research) study was published on technology use among
undergraduate students. The study used survey and interviewer data to
create a portrait of today's students' experiences with and skill using
information technology.
1. Listed below are some of the response variables that were measured
in this study. For each of these, determine whether it is categorical,
quantitative discrete, or quantitative continuous.
a. Technology ownership: Do you own a computer?
categorical    quantitative discrete    quantitative continuous
b. Time (per week in minutes) spent using a computer for writing
documents (word processing).
categorical    quantitative discrete    quantitative continuous
c. Which social networking site(s) are you a member? (facebook,
myspace, friendster, etc.)
categorical    quantitative discrete    quantitative continuous
2. Identify appropriate model for each variable. (Be
complete.)
X has a ___________________________ distribution.
b. Suppose that 45% of Michigan residents own dogs. Let X
represent the number of Michigan residents with a dog in a random
sample of 10 Michigan residents.
X has a ___________________ ________________ distribution.
c. Let X represent the score (in points out of 100) on a
standardized exam. The model for X is shown below.
X has a ____________________________ distribution.
Problem 1: Study on Smiling
In a recent study, people were observed for
about 10 seconds in public places (e.g., malls and restaurants) to
determine whether they smiled during the randomly chosen
10-second interval. The table shows the results for comparing
males (group 1) and females (group 2).
a. What is the probability that a randomly selected person smiled?
b. The researcher would like to assess if smiling status is independent
of gender.
i. To check for independence, the probability found in part (a)
should be compared to which of the following probabilities?
P(smiled and male)    P(smiled given male)
P(male given smiled)    P(male)
ii. Find the probability selected above and circle the appropriate
conclusion.
The probability __________________________________
Thus, it appears that smiling status    is    is not
independent of gender.
Problem 2: Summer Trip Length
Did high gas prices keep Americans from
hitting the road this past summer? In a nationwide survey of adults,
one variable measured was how many days vacationers spent driving on
the road on their longest trip. Consider the following (partial)
probability distribution for the random variable
X = the number of days for the longest car trip.

Value of X (days)    4       5       6       7       8
Probability          0.10    0.20    0.25

a. Suppose the probability of 7 days is twice as likely as the
probability of 8 days. Complete the probability distribution for
X. Show your work.
Problem 3: Surviving Trees
A landscaping company claims that 90% of
the trees they plant survive (defined as being still alive one year
from planting). If a tree does not survive, the company will
replace the tree with a new one.
a. A homeowner will have 5 trees planted in his yard by this
landscaping company. You can consider these 5 trees to be a random
sample of all trees planted by this company. If the company's claim
is correct, what is the probability at least 4 trees will
survive?
b. An Office Space Developer will have 200 trees planted all around a new
office space building complex by this landscaping company. You can
consider these 200 trees to be a random sample of all trees planted
by this company. If the company's claim is correct, what is the
probability that 30 or more trees will need to be replaced (i.e., not
survive)?
Problem 4: How Much Time Do You Spend Studying?
A Washington Post
article, "Is college too easy? As study time falls, debate rises"
(May 21, 2012), stated that the amount of time college students
actually study has dwindled from an average of 24 hours per week
to about 15 hours (based on a survey).
A professor of statistics decided to ask all of his current semester
students to report the number of hours per week they spend
studying his course material (on a regular non-exam week). The
mean for the female students was 10 hours and the standard deviation
was 3.5 hours.
a. Consider the following interpretations of the standard deviation
and clearly circle those that are correct.
On average, the number of hours spent studying statistics varied
from the mean by about 3.5 hours.
The average distance between the number of hours spent
studying statistics is roughly 3.5 hours.
The average number of hours spent studying statistics is
about 3.5 hours away from the mean.
c. The male students had a lower mean and a larger standard deviation.
Suppose Jake's response corresponds to a z-score of 2.1.
Complete the sentence to explain what this tells us about the number
of hours that Jake studies statistics per week. (Be as specific as you
can.)
d. The professor was interested in how the results of his students
compare to those taking a Chemistry class. The distribution of
hours spent studying (per week) for students in the Chemistry
class was reported as being approximately normal with a mean of 12
and a standard deviation of 3.
ii. Jing learns that she is in the top 30% of this distribution.
Based on the distribution, Jing must study at least how many hours
per week? Make a hand sketch of what you are trying to find to help show
your work.
Cool-Down: True or False
Decide whether the following questions
regarding probability and random variables are true or false.
1. If the time to wait for pharmacy help has a uniform distribution
from 0 minutes to 30 minutes, then 33% of the customers are expected
to wait more than 20 minutes.
True    False
2. If X has a Binomial(50, 0.7) distribution, then the criteria to
use the normal approximation are met.
True    False
3. 68% of test scores are always within one standard deviation of
the mean test score.
True    False
Example Exam Question on Probability and Random Variables
Suppose that the amount of time spent waiting for your bus to our
campus each day is a uniform random variable between 0 and 20
minutes.
a. Sketch a picture of the model for waiting time for the bus. Provide
labels for each axis and some values along each axis.
Let's define the following events:
• A is the event that you wait at least 10 minutes, that is, your
waiting time is in the interval [10, 20].
• B is the event that you wait at most 15 minutes, that is, your
waiting time is in the interval [0, 15].
• C is the event that you wait at most 10 minutes, that is, your
waiting time is in the interval [0, 10].
Answer the following questions based on the information given. Show all work.
b. What is P(A)?
Final answer: ________________
Final answer: ________________
Final answer: ________________
Final answer: ________________
f. Are the events A and C mutually exclusive? Circle one:
Yes    No
Explain briefly.
Lab 5: Confidence Intervals (CI) for a Population Proportion
Objective: This module will help you better understand the
ideas involved in confidence interval estimation, as well as how to
interpret both the confidence level and confidence interval for a
population proportion. You will construct one-sample
confidence intervals for a population proportion and check that the
conditions necessary for the interval are valid.
Overview: Since generally a population proportion is an
unknown number, we are interested