8/8/2019 Introduction to Statistics for Bio Medical Engineers - Kristina M. Ropella
1/102
Introduction to Statistics
for Biomedical Engineers
Copyright 2007 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.
Introduction to Statistics for Biomedical Engineers
Kristina M. Ropella
www.morganclaypool.com
ISBN: 1598291963 paperback
ISBN: 9781598291964 paperback
ISBN: 1598291971 ebook
ISBN: 9781598291971 ebook
DOI: 10.2200/S00095ED1V01Y200708BME014
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON BIOMEDICAL ENGINEERING #14
Lecture #14
Series Editor: John D. Enderle, University of Connecticut
Series ISSN
ISSN 1930-0328 print
ISSN 1930-0336 electronic
Introduction to Statistics for Biomedical Engineers
Kristina M. Ropella
Department of Biomedical Engineering
Marquette University
SYNTHESIS LECTURES ON BIOMEDICAL ENGINEERING #14
This text is dedicated to all the students who have completed my BIEN 084 statistics course for biomedical engineers and have taught me how to be more effective in communicating the subject matter and making statistics come alive for them. I also thank J. Claypool for his patience and for encouraging me to finally put this text together.
Finally, I thank my family for tolerating my time at home on the laptop.
ABSTRACT
There are many books written about statistics, some brief, some detailed, some humorous, some colorful, and some quite dry. Each of these texts is designed for a specific audience. Too often, texts about statistics have been rather theoretical and intimidating for those not practicing statistical analysis on a routine basis. Thus, many engineers and scientists, who need to use statistics much more frequently than calculus or differential equations, lack sufficient knowledge of the use of statistics. The audience that is addressed in this text is the university-level biomedical engineering student who needs a bare-bones coverage of the most basic statistical analysis frequently used in biomedical engineering practice. The text introduces students to the essential vocabulary and basic concepts of probability and statistics that are required to perform the numerical summary and statistical analysis used in the biomedical field. This text is considered a starting point for important issues to consider when designing experiments, summarizing data, assuming a probability model for the data, testing hypotheses, and drawing conclusions from sampled data.
A student who has completed this text should have sufficient vocabulary to read more advanced texts on statistics and further their knowledge about additional numerical analyses that are used in the biomedical engineering field but are beyond the scope of this text. This book is designed to supplement an undergraduate-level course in applied statistics, specifically in biomedical engineering. Practicing engineers who have not had formal instruction in statistics may also use this text as a simple, brief introduction to statistics used in biomedical engineering. The emphasis is on the application of statistics, the assumptions made in applying the statistical tests, the limitations of these elementary statistical methods, and the errors often committed in using statistical analysis. A number of examples from biomedical engineering research and industry practice are provided to assist the reader in understanding concepts and application. It is beneficial for the reader to have some background in the life sciences and physiology and to be familiar with basic biomedical instrumentation used in the clinical environment.
KEYWORDS
probability model, hypothesis testing, physiology, ANOVA, normal distribution, confidence interval, power test
Contents
1. Introduction
2. Collecting Data and Experimental Design
3. Data Summary and Descriptive Statistics
   3.1 Why Do We Collect Data?
   3.2 Why Do We Need Statistics?
   3.3 What Questions Do We Hope to Address With Our Statistical Analysis?
   3.4 How Do We Graphically Summarize Data?
      3.4.1 Scatterplots
      3.4.2 Time Series
      3.4.3 Box-and-Whisker Plots
      3.4.4 Histogram
   3.5 General Approach to Statistical Analysis
   3.6 Descriptive Statistics
      3.6.1 Measures of Central Tendency
      3.6.2 Measures of Variability
4. Assuming a Probability Model From the Sample Data
   4.1 The Standard Normal Distribution
   4.2 The Normal Distribution and Sample Mean
   4.3 Confidence Interval for the Sample Mean
   4.4 The t Distribution
   4.5 Confidence Interval Using t Distribution
5. Statistical Inference
   5.1 Comparison of Population Means
      5.1.1 The t Test
         5.1.1.1 Hypothesis Testing
         5.1.1.2 Applying the t Test
         5.1.1.3 Unpaired t Test
         5.1.1.4 Paired t Test
         5.1.1.5 Example of a Biomedical Engineering Challenge
   5.2 Comparison of Two Variances
   5.3 Comparison of Three or More Population Means
      5.3.1 One-Factor Experiments
         5.3.1.1 Example of Biomedical Engineering Challenge
      5.3.2 Two-Factor Experiments
      5.3.3 Tukey's Multiple Comparison Procedure
6. Linear Regression and Correlation Analysis
7. Power Analysis and Sample Size
   7.1 Power of a Test
   7.2 Power Tests to Determine Sample Size
8. Just the Beginning
Bibliography
Author Biography
C H A P T E R 1
Introduction
Biomedical engineers typically collect all sorts of data, from patients, animals, cell counters, microassays, imaging systems, pressure transducers, bedside monitors, manufacturing processes, material testing systems, and other measurement systems that support a broad spectrum of research, design, and manufacturing environments. Ultimately, the reason for collecting data is to make a decision. That decision may concern differentiating biological characteristics among different populations of people, determining whether a pharmacological treatment is effective, determining whether it is cost-effective to invest in multimillion-dollar medical imaging technology, determining whether a manufacturing process is under control, or selecting the best rehabilitative therapy for an individual patient.
The challenge in making such decisions often lies in the fact that all real-world data contain some element of uncertainty because of random processes that underlie most physical phenomena. These random elements prevent us from predicting the exact value of any physical quantity at any moment of time. In other words, when we collect a sample or data point, we usually cannot predict the exact value of that sample or experimental outcome. For example, although the average resting heart rate of normal adults is about 70 beats per minute, we cannot predict the exact arrival time of our next heartbeat. However, we can approximate the likelihood that the arrival time of the next heartbeat will fall in a specific time interval if we have a good probability model to describe the random phenomenon contributing to the time interval between heartbeats. The timing of heartbeats is influenced by a number of physiological variables [1], including the refractory period of the individual cells that make up the heart muscle, the leakiness of the cell membranes in the sinus node (the heart's natural pacemaker), and the activity of the autonomic nervous system, which may speed up or slow down the heart rate in response to the body's need for increased blood flow, oxygen, and nutrients. The sum of these biological processes produces a pattern of heartbeats that we may measure by counting the pulse rate from our wrist or carotid artery or by searching for specific QRS waveforms in the ECG [2]. Although this sum of events makes it difficult for us to predict exactly when the next heartbeat will arrive, we can guess, with a certain amount of confidence, when the next beat will arrive. In other words, we can assign a probability to the likelihood that the next heartbeat will arrive in a specified time interval. If we were to consider all possible arrival times and assigned a probability to those arrival times, we would have a probability model for the heartbeat intervals. If we can find a probability model to describe the likelihood of occurrence of a certain event or experimental outcome, we can use statistical methods to make decisions. The probability models describe characteristics of the population or phenomenon being studied. Statistical analysis then makes use of these models to help us make decisions about the population(s) or processes.
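To make the idea of assigning a probability to a time interval concrete, the following minimal Python sketch assumes, purely for illustration, that the interval between heartbeats follows a normal model. The mean is taken from the 70 beats per minute mentioned above; the standard deviation and the function name are hypothetical and are not values or methods from the text.

```python
from statistics import NormalDist

# Assumption for illustration only: beat-to-beat intervals follow a normal model.
# The mean corresponds to 70 beats per minute; the spread is a made-up value.
mean_interval_s = 60.0 / 70.0   # about 0.857 s between beats
sd_interval_s = 0.05            # hypothetical standard deviation, in seconds

rr_model = NormalDist(mu=mean_interval_s, sigma=sd_interval_s)


def prob_next_beat_between(t_low_s, t_high_s):
    """Probability, under the assumed model, that the next beat arrives
    between t_low_s and t_high_s seconds after the previous beat."""
    return rr_model.cdf(t_high_s) - rr_model.cdf(t_low_s)


p = prob_next_beat_between(0.80, 0.90)
print(f"P(0.80 s < interval < 0.90 s) = {p:.3f}")
```

A different choice of model or parameters would give different probabilities, which is why the quality of the assumed model matters for any decision based on it.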
The conclusions that one may draw from using statistical analysis are only as good as the underlying model that is used to describe the real-world phenomenon, such as the time interval between heartbeats. For example, a normally functioning heart exhibits considerable variability in beat-to-beat intervals (Figure 1.1). This variability reflects the body's continual effort to maintain homeostasis so that the body may continue to perform its most essential functions and supply the body with the oxygen and nutrients required to function normally. It has been demonstrated through biomedical research that there is a loss of heart rate variability associated with some diseases, such as diabetes and ischemic heart disease. Researchers seek to determine if this difference in variability between normal subjects and subjects with heart disease is significant (meaning, it is due to some underlying change in biology and not simply a result of chance) and whether it might be used to predict the progression of the disease [1]. One will note that the probability model changes as a consequence of changes in the underlying biological function or process. In the case of manufacturing, the probability model used to describe the output of the manufacturing process may change as a function of machine operation or changes in the surrounding manufacturing environment, such as temperature, humidity, or human operator.
FIGURE 1.1: Example of an ECG recording, where R-R interval is defined as the time interval between successive R waves of the QRS complex, the most prominent waveform of the ECG.
Besides helping us to describe the probability model associated with real-world phenomena, statistics help us to make decisions by giving us quantitative tools for testing hypotheses. We call this inferential statistics, whereby the outcome of a statistical test allows us to draw conclusions or make inferences about one or more populations from which samples are drawn. Most often, scientists and engineers are interested in comparing data from two or more different populations or from two or more different processes. Typically, the default hypothesis is that there is no difference in the distributions of two or more populations or processes, and we use statistical analysis to determine whether there are true differences in the distributions of the underlying populations to warrant different probability models be assigned to the individual processes.
In summary, biomedical engineers typically collect data or samples from various phenomena, which contain some element of randomness or unpredictable variability, for the purposes of making decisions. To make sound decisions in the context of the uncertainty with some level of confidence, we need to assume some probability model for the populations from which the samples have been collected. Once we have assumed an underlying model, we can select the appropriate statistical tests for comparing two or more populations and then use these tests to draw conclusions about our hypotheses for which we collected the data in the first place. Figure 1.2 outlines the steps of performing statistical analysis of data.
FIGURE 1.2: Steps in statistical analysis.
In the following chapters, we will describe methods for graphically and numerically summarizing collected data. We will then talk about fitting a probability model to the collected data by briefly describing a number of well-known probability models that are used to describe biological phenomena. Finally, once we have assumed a model for the populations from which we have collected our sample data, we will discuss the types of statistical tests that may be used to compare data from multiple populations and allow us to test hypotheses about the underlying populations.
C H A P T E R 2
Collecting Data and Experimental Design
Before we discuss any type of data summary and statistical analysis, it is important to recognize that the value of any statistical analysis is only as good as the data collected. Because we are using data or samples to draw conclusions about entire populations or processes, it is critical that the data collected (or samples collected) are representative of the larger, underlying population. In other words, if we are trying to determine whether men between the ages of 20 and 50 years respond positively to a drug that reduces cholesterol level, we need to carefully select the population of subjects to whom we administer the drug and from whom we take measurements. In other words, we have to have enough samples to represent the variability of the underlying population. There is a great deal of variety in the weight, height, genetic makeup, diet, exercise habits, and drug use in all men ages 20 to 50 years who may also have high cholesterol. If we are to test the effectiveness of a new drug in lowering cholesterol, we must collect enough data or samples to capture the variability of biological makeup and environment of the population that we are interested in treating with the new drug. Capturing this variability is often the greatest challenge that biomedical engineers face in collecting data and using statistics to draw meaningful conclusions. The experimentalist must ask questions such as the following:
What type of person, object, or phenomenon do I sample?
What variables that impact the measure or data can I control?
How many samples do I require to capture the population variability to apply the appropriate statistics and draw meaningful conclusions?
How do I avoid biasing the data with the experimental design?
Experimental design, although not the primary focus of this book, is the most critical step to support the statistical analysis that will lead to meaningful conclusions and hence sound decisions.
One of the most fundamental questions asked by biomedical researchers is, "What size sample do I need?" or "How many subjects will I need to make decisions with any level of confidence?" We will address these important questions at the end of this book when concepts such as variability, probability models, and hypothesis testing have already been covered. For example, power tests will be described as a means for predicting the sample size required to detect significant differences in two population means using a t test.
Two elements of experimental design that are critical to prevent biasing the data or selecting samples that do not fairly represent the underlying population are randomization and blocking.
Randomization refers to the process by which we randomly select samples or experimental units from the larger underlying population such that we maximize our chance of capturing the variability in the underlying population. In other words, we do not limit our samples such that only a fraction of the characteristics or behaviors of the underlying population are captured in the samples. More importantly, we do not bias the results by artificially limiting the variability in the samples such that we alter the probability model of the sample population with respect to the probability model of the underlying population.
In addition to randomizing our selection of experimental units from which to take samples, we might also randomize our assignment of treatments to our experimental units. Or, we may randomize the order in which we take data from the experimental units. For example, if we are testing the effectiveness of two different medical imaging methods in detecting brain tumor, we will randomly assign all subjects suspected of having brain tumor to one of the two imaging methods. Thus, if we have a mix of sex, age, and type of brain tumor participating in the study, we reduce the chance of having all one sex or one age group assigned to one imaging method and a very different type of population assigned to the second imaging method. If a difference is noted in the outcome of the two imaging methods, we will not artificially introduce sex or age as a factor influencing the imaging results.
As another example, if one is testing the strength of three different materials for use in hip implants using several strength measures from a materials testing machine, one might randomize the order in which samples of the three different test materials are submitted to the machine. Machine performance can vary with time because of wear, temperature, humidity, deformation, stress, and user characteristics. If the biomedical engineer were asked to find the strongest material for an artificial hip using specific strength criteria, he or she may conduct an experiment. Let us assume that the engineer is given three boxes, with each box containing five artificial hip implants made from one of three materials: titanium, steel, and plastic. For any one box, all five implant samples are made from the same material. To test the 15 different implants for material strength, the engineer might randomize the order in which each of the 15 implants is tested in the materials testing machine so that time-dependent changes in machine performance or machine-material interactions or time-varying environmental conditions do not bias the results for one or more of the materials. Thus, to fully randomize the implant testing, an engineer may literally place the numbers 1 to 15 in a hat and also assign the numbers 1 to 15 to each of the implants to be tested. The engineer will then blindly draw one of the 15 numbers from the hat and test the implant that corresponds to that number. This way the engineer is not testing all of one material in any particular order, and we avoid introducing order effects into the data.
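The "numbers in a hat" procedure can be sketched in a few lines of Python. This is only an illustration: the numbering of the implants and the fixed seed are assumptions made here so that the sketch is repeatable, and they are not part of the procedure described in the text.

```python
import random

# Number the 15 implants: five of each material, as in the example above.
materials = ["titanium", "steel", "plastic"]
implants = {}
number = 1
for material in materials:
    for _ in range(5):
        implants[number] = material
        number += 1

# Draw all 15 numbers "from the hat" without replacement.
# The fixed seed is an assumption used only to make the sketch repeatable.
rng = random.Random(2024)
test_order = rng.sample(range(1, 16), k=15)

for position, implant_number in enumerate(test_order, start=1):
    print(f"test {position:2d}: implant {implant_number:2d} ({implants[implant_number]})")
```

Each implant appears exactly once in `test_order`, so every implant is tested, but the material tested at any given time is left to chance.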
The second aspect of experimental design is blocking. In many experiments, we are interested in one or two specific factors or variables that may impact our measure or sample. However, there may be other factors that also influence our measure and confound our statistics. In good experimental design, we try to collect samples such that different treatments within the factor of interest are not biased by the differing values of the confounding factors. In other words, we should be certain that every treatment within our factor of interest is tested within each value of the confounding factor. We refer to this design as blocking by the confounding factor. For example, we may want to study weight loss as a function of three different diet pills. One confounding factor may be a person's starting weight. Thus, in testing the effectiveness of the three pills in reducing weight, we may want to block the subjects by starting weight. Thus, we may first group the subjects by their starting weight and then test each of the diet pills within each group of starting weights.
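One way to picture blocking by starting weight is the following Python sketch. The subject labels, weights, pill names, and block size are all hypothetical, chosen only to show the grouping step followed by random assignment within each group.

```python
import random

pills = ["pill A", "pill B", "pill C"]   # hypothetical treatment names
starting_weights_kg = {                   # hypothetical subjects and weights
    "S1": 72, "S2": 95, "S3": 118,
    "S4": 80, "S5": 101, "S6": 125,
    "S7": 76, "S8": 98, "S9": 121,
}

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Step 1: group (block) the subjects by starting weight.
ranked = sorted(starting_weights_kg, key=starting_weights_kg.get)
blocks = [ranked[i:i + len(pills)] for i in range(0, len(ranked), len(pills))]

# Step 2: within each weight block, assign every pill once, in random order.
assignment = {}
for block in blocks:
    shuffled = pills[:]
    rng.shuffle(shuffled)
    for subject, pill in zip(block, shuffled):
        assignment[subject] = pill

for block in blocks:
    print([(s, starting_weights_kg[s], assignment[s]) for s in block])
```

Because every pill is tested within every weight group, a difference between pills cannot be explained simply by one pill having been given to heavier subjects.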
In biomedical research, we often block by experimental unit. When this type of blocking is part of the experimental design, the experimentalist collects multiple samples of data, with each sample representing different experimental conditions, from each of the experimental units. Figure 2.1 provides a diagram of an experiment in which data are collected before and after patients receive therapy, and the experimental design uses blocking (left) or no blocking (right) by experimental unit. In the case of blocking, data are collected before and after therapy from the same set of human subjects. Thus, within an individual, the same biological factors that influence the biological response to the therapy are present before and after therapy. Each subject serves as his or her own control for factors that may randomly vary from subject to subject both before and after therapy. In essence, with blocking, we are eliminating biases in the differences between the two populations (before and after) that may result because we are using two different sets of experimental units. For example, if we used one set of subjects before therapy and then an entirely different set of subjects after therapy (Figure 2.1, right), there is a chance that the two sets of subjects may vary enough in sex, age, weight, race, or genetic makeup, which would lead to a difference in response to the therapy that has little to do with the underlying therapy. In other words, there may be confounding factors that contribute to the difference in the experimental outcome before and after therapy that are not only a factor of the therapy but really an artifact of differences in the distributions of the two different groups of subjects from which the two sample sets were chosen. Blocking will help to eliminate the effect of intersubject variability.
Block (repeated measures)                       No block (no repeated measures)
Subject   Measure before   Measure after        Subject   Measure before   Subject   Measure after
          treatment        treatment                      treatment                  treatment
1         M11              M12                  1         M1               K+1       M(K+1)
2         M21              M22                  2         M2               K+2       M(K+2)
3         M31              M32                  3         M3               K+3       M(K+3)
...       ...              ...                  ...       ...              ...       ...
K         MK1              MK2                  K         MK               K+K       M(K+K)
FIGURE 2.1: Samples are drawn from two populations (before and after treatment), and the experimental design uses block (left) or no block (right). In this case, the block is the experimental unit (subject) from which the measures are made.
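The effect of blocking by subject on intersubject variability can be illustrated with a small Python sketch. The measurements below are invented for illustration and do not come from the text; the point is only that subtracting each subject's "before" value from his or her own "after" value removes the large subject-to-subject spread.

```python
from statistics import mean, stdev

# Hypothetical measurements (arbitrary units) from the same five subjects,
# taken before and after therapy (the blocked design of Figure 2.1, left).
before = [210.0, 250.0, 190.0, 280.0, 230.0]
after = [198.0, 241.0, 180.0, 268.0, 221.0]

# Each subject serves as his or her own control: work with the
# within-subject change rather than with the two groups separately.
differences = [a - b for a, b in zip(after, before)]

print("spread between subjects (before):", round(stdev(before), 2))
print("spread of within-subject changes:", round(stdev(differences), 2))
print("mean within-subject change:      ", round(mean(differences), 2))
```

In this made-up data set, the subjects differ greatly from one another, yet their individual changes are similar, so the change stands out clearly once each subject is compared with himself or herself.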
However, blocking is not always possible, given the nature of some biomedical research studies. For example, if one wanted to study the effectiveness of two different chemotherapy drugs in reducing tumor size, it is impractical to test both drugs on the same tumor mass. Thus, the two drugs are tested on different groups of individuals. The same type of design would be necessary for testing the effectiveness of weight-loss regimens.
Thus, some important concepts and definitions to keep in mind when designing experiments include the following:
experimental unit: the item, object, or subject to which we apply the treatment and from which we take sample measurements;
randomization: allocating the treatments randomly to the experimental units;
blocking: assigning all treatments within a factor to every level of the blocking factor. Often, the blocking factor is the experimental unit. Note that in using blocking, we still randomize the order in which treatments are applied to each experimental unit to avoid ordering bias.
Finally, the experimentalist must always think about how representative the sample population is with respect to the greater underlying population. Because it is virtually impossible to test every member of a population or every product rolling down an assembly line, especially when destructive testing methods are used, the biomedical engineer must often collect data from a much smaller sample drawn from the larger population. It is important, if the statistics are going to lead to useful conclusions, that the sample population captures the variability of the underlying population. What is even more challenging is that we often do not have a good grasp of the variability of the underlying population, and because of expense and respect for life, we are typically limited in the number of samples we may collect in biomedical research and manufacturing. These limitations are not easy to address and require that the engineer always consider how fair the sample and data analysis is and how well it represents the underlying population(s) from which the samples are drawn.
C H A P T E R 3
Data Summary and Descriptive Statistics
We assume now that we have collected our data through the use of good experimental design. We now have a collection of numbers, observations, or descriptions to describe our data, and we would like to summarize the data to make decisions, test a hypothesis, or draw a conclusion.
3.1 WHY DO WE COLLECT DATA?
The world is full of uncertainty, in the sense that there are random or unpredictable factors that influence every experimental measure we make. The unpredictable aspects of the experimental outcomes also arise from the variability in biological systems (due to genetic and environmental factors) and manufacturing processes, human error in making measurements, and other underlying processes that influence the measures being made.
Despite the uncertainty regarding the exact outcome of an experiment or occurrence of a future event, we collect data to try to better understand the processes or populations that influence an experimental outcome so that we can make some predictions. Data provide information to reduce uncertainty and allow for decision making. When properly collected and analyzed, data help us solve problems. It cannot be stressed enough that the data must be properly collected and analyzed if the data analysis and subsequent conclusions are to have any value.
3.2 WHY DO WE NEED STATISTICS?
We have three major reasons for using statistical data summary and analysis:
1. The real world is full of random events that cannot be described by exact mathematical expressions.
2. Variability is a natural and normal characteristic of the natural world.
3. We like to make decisions with some confidence. This means that we need to find trends within the variability.
3.3 WHAT QUESTIONS DO WE HOPE TO ADDRESS WITH OUR STATISTICAL ANALYSIS?
There are several basic questions we hope to address when using numerical and graphical summary of data:
1. Can we differentiate between groups or populations?
2. Are there correlations between variables or populations?
3. Are processes under control?
Finding physiological differences between populations is probably the most frequent aim of biomedical research. For example, researchers may want to know if there is a difference in life expectancy between overweight and underweight people. Or, a pharmaceutical company may want to determine if one type of antibiotic is more effective in combating bacteria than another. Or, a physician wonders if diastolic blood pressure is reduced in a group of hypertensive subjects after the consumption of a pressure-reducing drug. Most often, biomedical researchers are comparing populations of people or animals that have been exposed to two or more different treatments or diagnostic tests, and they want to know if there is a difference between the responses of the populations that have received different treatments or tests. Sometimes, we are drawing multiple samples from the same group of subjects or experimental units. A common example is when the physiological data are taken before and after some treatment, such as drug intake or electronic therapy, from one group of patients. We call this type of data collection blocking in the experimental design. This concept of blocking is discussed more fully in Chapter 2.
Another question that is frequently the target of biomedical research is whether there is a correlation between two physiological variables. For example, is there a correlation between body build and mortality? Or, is there a correlation between fat intake and the occurrence of cancerous tumors? Or, is there a correlation between the size of the ventricular muscle of the heart and the frequency of abnormal heart rhythms? These types of questions involve collecting two sets of data and performing a correlation analysis to determine how well one set of data may be predicted from another. When we speak of correlation analysis, we are referring to the linear relation between two variables and the ability to predict one set of data by modeling the data as a linear function of the second set of data. Because correlation analysis only quantifies the linear relation between two processes or data sets, nonlinear relations between the two processes may not be evident. A more detailed description of correlation analysis may be found in Chapter 6.
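The remark that correlation analysis captures only linear relations can be checked with a short Python sketch. The sketch assumes the Pearson correlation coefficient as the measure of linear relation, and the data sets are invented for illustration: one is an exact linear function of x and the other an exact, but nonlinear, function of x.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical data for illustration only.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # perfectly linear in x
quadratic = [v * v for v in x]    # perfectly predictable from x, but not linear

print("linear relation:   r =", round(pearson_r(x, linear), 3))
print("quadratic relation: r =", round(pearson_r(x, quadratic), 3))
```

Here the quadratic data are completely determined by x, yet the coefficient is zero because the relation is symmetric rather than linear, which is one reason to look at a plot of the data before relying on a correlation value.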
Finally, a biomedical engineer, particularly the engineer involved in manufacturing, may be interested in knowing whether a manufacturing process is under control. Such a question may arise if there are tight controls on the manufacturing specifications for a medical device. For example, if the engineer is trying to ensure quality in producing intravascular catheters that must have diameters between 1 and 2 cm, the engineer may randomly collect samples of catheters from the assembly line at random intervals during the day, measure their diameters, determine how many of the catheters meet specifications, and determine whether there is a sudden change in the number of catheters that fail to meet specifications. If there is such a change, the engineers may look for elements of the manufacturing process that change over time, changes in environmental factors, or user errors. The engineer can use control charts to assess whether the processes are under control. These methods of statistical analysis are not covered in this text, but may be found in a number of references, including [3].
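The catheter check described above can be sketched in Python as a simple tally of out-of-specification parts at each sampling time. The sampling times and diameters are hypothetical, and this tally is not a control chart; it only shows the counting step that would precede any such analysis.

```python
LOWER_CM, UPPER_CM = 1.0, 2.0   # specification limits from the example above

# Hypothetical diameters (cm) of catheters sampled at three times of day.
samples_by_time = {
    "09:00": [1.4, 1.6, 1.5, 1.7, 1.3],
    "12:00": [1.5, 1.8, 1.2, 1.6, 1.4],
    "15:00": [1.9, 2.1, 2.2, 1.8, 2.3],
}


def fraction_out_of_spec(diameters_cm):
    """Fraction of sampled catheters whose diameter falls outside the limits."""
    failures = [d for d in diameters_cm if not (LOWER_CM <= d <= UPPER_CM)]
    return len(failures) / len(diameters_cm)


fractions = {t: fraction_out_of_spec(d) for t, d in samples_by_time.items()}
for time_of_day, fraction in fractions.items():
    print(f"{time_of_day}: {fraction:.0%} of sampled catheters out of specification")
```

A jump in the failing fraction from one sampling time to the next, as in the last batch of this made-up data, is the kind of sudden change that would prompt a closer look at the process.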
3.4 HOW DO WE GRAPHICALLY SUMMARIZE DATA?
We can summarize data in graphical or numerical form. The numerical form is what we refer to as statistics. Before blindly applying the statistical analysis, it is always good to look at the raw data, usually in a graphical form, and then use graphical methods to summarize the data in an easy to interpret format.
The types of graphical displays that are most frequently used by biomedical engineers include the following: scatterplots, time series, box-and-whisker plots, and histograms.
Details for creating these graphical summaries are described in [3-6], but we will briefly describe them here.
3.4.1 Scatterplots
The scatterplot simply graphs the occurrence of one variable with respect to another. In most cases,
one of the variables may be considered the independent variable (such as time or subject number),
and the second variable is considered the dependent variable. Figure 3.1 illustrates an example of a
scatterplot for two sets of data. In general, we are interested in whether there is a predictable
relationship that maps our independent variable (such as respiratory rate) into our dependent variable
(such as heart rate). If there is a linear relationship between the two variables, the data points should
fall close to a straight line.
3.4.2 Time Series
A time series is used to plot the changes in a variable as a function of time. The variable is usually
a physiological measure, such as electrical activation in the brain or hormone concentration in the
blood stream, that changes with time. Figure 3.2 illustrates an example of a time series plot. In this
figure, we are looking at a simple sinusoid function as it changes with time.
3.4.3 Box-and-Whisker Plots
These plots illustrate the first, second, and third quartiles as well as the minimum and maximum
values of the data collected. The second quartile (Q2) is also known as the median of the data. This
quantity, as defined later in this text, is the middle data point or sample value when the samples
are listed in descending order. The first quartile (Q1) can be thought of as the median value of the
samples that fall below the second quartile. Similarly, the third quartile (Q3) can be thought of as
the median value of the samples that fall above the second quartile. Box-and-whisker plots are
useful in that they highlight whether there is skew to the data or any unusual outliers in the samples
(Figure 3.3).
[Plot omitted. Horizontal axis: Time (msec); vertical axis: Amplitude.]
FIGURE 3.2: Example of a time series plot. The amplitude of the samples is plotted as a function of
time.
[Plot omitted. Horizontal axis: Independent Variable; vertical axis: Dependent Variable.]
FIGURE 3.1: Example of a scatterplot.
[Plot omitted. Title: Box and Whisker Plot; horizontal axis: Category; vertical axis: Dependent Variable; Q1, Q2, and Q3 are marked on the box.]
FIGURE 3.3: Illustration of a box-and-whisker plot for the data set listed. The first (Q1), second (Q2),
and third (Q3) quartiles are shown. In addition, the whiskers extend to the minimum and maximum
values of the sample set.
3.4.4 Histogram
The histogram is defined as a frequency distribution. Given $N$ samples or measurements, $x_i$, which
range from $X_{\min}$ to $X_{\max}$, the samples are grouped into nonoverlapping intervals (bins), usually of
equal width (Figure 3.4). Typically, the number of bins is on the order of 7–14, depending on the
nature of the data. In addition, we typically expect to have at least three samples per bin [7].
Sturges' rule [6] may also be used to estimate the number of bins and is given by
$k = 1 + 3.3 \log_{10}(n)$,
where $k$ is the number of bins and $n$ is the number of samples.
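As a quick illustration of Sturges' rule, the sketch below evaluates the formula for a few sample sizes. The function name `sturges_bins` is hypothetical, and rounding to the nearest integer is our own assumption (the rule itself does not say how to round).

```python
import math

def sturges_bins(n):
    """Suggested number of histogram bins for n samples (Sturges' rule).

    Rounding to the nearest integer is an assumption made for this sketch.
    """
    return round(1 + 3.3 * math.log10(n))

for n in (50, 1000, 3000):
    print(n, sturges_bins(n))
```

For sample sizes in this range, the rule gives bin counts that agree with the 7–14 guideline quoted above.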
Each bin of the histogram has a lower boundary, upper boundary, and midpoint. The histogram
is constructed by plotting the number of samples in each bin. Figure 3.5 illustrates a histogram
for 1000 samples drawn from a normal distribution with mean ($\mu$) = 0 and standard deviation ($\sigma$) =
1.0. On the horizontal axis, we have the sample value, and on the vertical axis, we have the number
of occurrences of samples that fall within a bin.
Two measures that we find useful in describing a histogram are the absolute frequency and
relative frequency in one or more bins. These quantities are defined as
a) $f_i$ = absolute frequency in the $i$th bin;
b) $f_i/n$ = relative frequency in the $i$th bin, where $n$ is the total number of samples being summarized
in the histogram.
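The binning procedure and the two frequency measures can be sketched as follows. The function name `histogram`, the choice of four bins, and the convention of placing the largest sample in the last bin are assumptions made for illustration only.

```python
def histogram(samples, num_bins):
    """Return (edges, absolute_freq, relative_freq) for equal-width bins.

    Assumes the samples are not all identical (the bin width must be > 0).
    """
    x_min, x_max = min(samples), max(samples)
    width = (x_max - x_min) / num_bins
    edges = [x_min + k * width for k in range(num_bins + 1)]
    counts = [0] * num_bins
    for x in samples:
        # The largest sample is placed in the last bin rather than a new one.
        k = min(int((x - x_min) / width), num_bins - 1)
        counts[k] += 1
    n = len(samples)
    relative = [c / n for c in counts]
    return edges, counts, relative

data = [1, 3, 3, 2, 5, 1, 1, 4, 3, 2]   # small illustrative sample
edges, f_abs, f_rel = histogram(data, 4)
print(edges)
print(f_abs)   # absolute frequency per bin
print(f_rel)   # relative frequency per bin
```

The relative frequencies always sum to one, which is what allows the histogram to serve as a rough estimate of a probability distribution.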
A number of algorithms used by biomedical instruments for diagnosing or detecting
abnormalities in biological function make use of the histogram of collected data and the associated
relative frequencies of selected bins [8]. Oftentimes, normal and abnormal physiological function
(breath sounds, heart rate variability, frequency content of electrophysiological signals) may be
differentiated by comparing the relative frequencies in targeted bins of the histograms of data
representing these biological processes.
[Diagram omitted. A single bin is labeled with its Lower Bound, Midpoint, and Upper Bound.]
FIGURE 3.4: One bin of a histogram plot. The bin is defined by a lower bound, a midpoint, and an
upper bound.
[Plot omitted. Horizontal axis: Normalized Value; vertical axis: Frequency.]
FIGURE 3.5: Example of a histogram plot. The value of the measure or sample is plotted on the
horizontal axis, whereas the frequency of occurrence of that measure or sample is plotted along the vertical
axis.
The histogram can exhibit several shapes. The shapes, illustrated in Figure 3.6, are referred
to as symmetric, skewed, or bimodal.
A skewed histogram may be attributed to the following [9]:
1. mechanisms of interest that generate the data (e.g., the physiological mechanisms that
determine the beat-to-beat intervals in the heart);
2. an artifact of the measurement process or a shift in the underlying mechanism over time
(e.g., there may be time-varying changes in a manufacturing process that lead to a change
in the statistics of the manufacturing process over time);
3. a mixing of populations from which samples are drawn (this is typically the source of a
bimodal histogram).
The histogram is important because it serves as a rough estimate of the true probability density
function or probability distribution of the underlying random process from which the samples
are being collected.
The probability density function or probability distribution is a function that quantifies the
probability of a random event, x, occurring. When the underlying random event is discrete in nature,
we refer to the probability density function as the probability mass function [10]. In either case, the
function describes the probabilistic nature of the underlying random variable or event and allows us
to predict the probability of observing a specific outcome, x (represented by the random variable),
of an experiment. The cumulative distribution function is simply the sum of the probabilities for a
group of outcomes, where the outcome is less than or equal to some value, x.
Let us consider a random variable for which the probability density function is well defined
(for most real-world phenomena, such a probability model is not known). The random variable is
the outcome of a single toss of a dice. Given a single fair dice with six sides, the probability of rolling
a six on the throw of a dice is 1 of 6. In fact, the probability of throwing a one is also 1 of 6. If we
consider all possible outcomes of the toss of a dice and plot the probability of observing any one of
those six outcomes in a single toss, we would have a plot such as that shown in Figure 3.7.
This plot shows the probability density or probability mass function for the toss of a dice.
This type of probability model is known as a uniform distribution because each outcome has the
exact same probability of occurring (1/6 in this case).
[Plots omitted. Three histogram panels titled Symmetric, Skewed, and Bimodal; horizontal axis: Measure; vertical axis: Frequency.]
FIGURE 3.6: Examples of a symmetric (top), skewed (middle), and bimodal (bottom) histogram. In
each case, 2000 samples were drawn from the underlying populations.
For the toss of a dice, we know the true probability distribution. However, for most
real-world random processes, especially biological processes, we do not know what the true probability
density or mass function looks like. As a consequence, we have to use the histogram, created from a
small sample, to try to estimate the best probability distribution or probability model to describe the
real-world phenomenon. If we return to the example of the toss of a dice, we can actually toss the
dice a number of times and see how closely the histogram, obtained from experimental data, matches
the true probability mass function for the ideal six-sided dice. Figure 3.8 illustrates the histograms
for the outcomes of 50 and 1000 tosses of a single dice. Note that even with 50 tosses or samples, it
is difficult to determine what the true probability distribution might look like. However, as we
approach 1000 samples, the histogram is approaching the true probability mass function (the uniform
distribution) for the toss of a dice. But, there is still some variability from bin to bin that does not
look as uniform as the ideal probability distribution illustrated in Figure 3.7. The message to take
away from this illustration is that most biomedical research reports the outcomes of a small number
of samples. It is clear from the dice example that the statistics of the underlying random process
are very difficult to discern from a small sample, yet most biomedical research relies on data from
small samples.
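The dice experiment is easy to repeat in simulation. The following is a minimal sketch, assuming a pseudorandom generator is an acceptable stand-in for a fair dice; the function name `relative_frequencies` and the seed value are arbitrary choices of ours.

```python
import random
from collections import Counter

def relative_frequencies(num_tosses, rng):
    """Toss a fair six-sided dice num_tosses times; return relative frequency per face."""
    counts = Counter(rng.randint(1, 6) for _ in range(num_tosses))
    return {face: counts[face] / num_tosses for face in range(1, 7)}

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
for n in (50, 1000):
    freqs = relative_frequencies(n, rng)
    # Largest departure of any bin from the ideal value of 1/6.
    worst = max(abs(f - 1 / 6) for f in freqs.values())
    print(n, {face: round(f, 3) for face, f in freqs.items()}, round(worst, 3))
```

Running the loop with different seeds gives a feel for how much the 50-toss histogram changes from one experiment to the next compared with the 1000-toss histogram.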
3.5 GENERAL APPROACH TO STATISTICAL ANALYSIS
We have now collected our data and looked at some graphical summaries of the data. Now we will
use numerical summary, also known as statistics, to try to describe the nature of the underlying
population or process from which we have taken our samples. From these descriptive statistics, we
assume a probability model or probability distribution for the underlying population or process and
then select the appropriate statistical tests to test hypotheses or make decisions. It is important to
note that the conclusions one may draw from a statistical test depend on how well the assumed
probability model fits the underlying population or process.
[Plot omitted. Title: Probability Mass Function; horizontal axis: Result of Toss of Single Dice (1 to 6); vertical axis: Relative Frequency, with each outcome at 1/6.]
FIGURE 3.7: The probability density function for a discrete random variable (probability mass
function). In this case, the random variable is the value of a toss of a single dice. Note that each of the six
possible outcomes has a probability of occurrence of 1 of 6. This probability density function is also known
as a uniform probability distribution.
[Plots omitted. Two panels titled Histogram of 50 Dice Tosses and Histogram of 2000 Dice Tosses; horizontal axis: Value of Dice Toss; vertical axis: Relative Frequency.]
FIGURE 3.8: Histograms representing the outcomes of experiments in which a single dice is tossed
50 (top) and 2000 times (lower), respectively. Note that as the sample size increases, the histogram
approaches the true probability distribution illustrated in Figure 3.7.
As stated in the Introduction, biomedical engineers are trying to make decisions about
populations or processes to which they have limited access. Thus, they design experiments and collect
samples that they think will fairly represent the underlying population or process. Regardless of
what type of statistical analysis will result from the investigation or study, all statistical analysis
should follow the same general approach:
1. Measure a limited number of representative samples from a larger population.
2. Estimate the true statistics of the larger population from the sample statistics.
Some important concepts need to be addressed here. The first concept is somewhat obvious. It is
often impossible or impractical to take measurements or observations from an entire population.
Thus, the biomedical engineer will typically select a smaller, more practical sample that represents
the underlying population and the extent of variability in the larger population. For example, we
cannot possibly measure the resting body temperature of every person on earth to get an estimate of
normal body temperature and normal range. We are interested in knowing what the normal body
temperature is, on average, of a healthy human being and the normal range of resting temperatures
as well as the likelihood or probability of measuring a specific body temperature under healthy,
resting conditions. In trying to determine the characteristics or underlying probability model for body
temperature for healthy, resting individuals, the researcher will select, at random, a sample of healthy,
resting individuals and measure their individual resting body temperatures with a thermometer. The
researchers will have to consider the composition and size of the sample population to adequately
represent the variability in the overall population. The researcher will have to define what
characterizes a normal, healthy individual, such as age, size, race, sex, and other traits. If a researcher were to
collect body temperature data from such a sample of 3000 individuals, he or she may plot a
histogram of temperatures measured from the 3000 subjects and end up with the following histogram
(Figure 3.9). The researcher may also calculate some basic descriptive statistics for the 3000 samples,
such as sample average (mean), median, and standard deviation.
[Plot omitted. Title: Body Temperature; horizontal axis: Temperature (F); vertical axis: Density.]
FIGURE 3.9: Histogram for 2000 internal body temperatures collected from a normally distributed
population.
Once the researcher has estimated the sample statistics from the sample population, he or she
will try to draw conclusions about the larger (true) population. The most important question to ask
when reviewing the statistics and conclusions drawn from the sample population is how well the
sample population represents the larger, underlying population.
Once the data have been collected, we use some basic descriptive statistics to summarize the
data. These basic descriptive statistics include the following general measures: central tendency,
variability, and correlation.
3.6 DESCRIPTIVE STATISTICS
There are a number of descriptive statistics that help us to picture the distribution of the underlying
population. In other words, our ultimate goal is to assume an underlying probability model for the
population and then select the statistical analyses that are appropriate for that probability model.
When we try to draw conclusions about the larger underlying population or process from our
smaller sample of data, we assume that the underlying model for any sample, event, or measure
(the outcome of the experiment) is as follows:
$X = \mu \pm \text{individual differences} \pm \text{situational factors} \pm \text{unknown variables}$,
where $X$ is our measure or sample value and is influenced by $\mu$, which is the true population mean;
individual differences such as genetics, training, motivation, and physical condition; situational factors
such as environmental factors; and unknown variables such as unidentified/nonquantified factors
that behave in an unpredictable fashion from moment to moment.
In other words, when we make a measurement or observation, the measured value represents
or is influenced by not only the statistics of the underlying population, such as the population
mean, but factors such as biological variability from individual to individual, environmental factors
(time, temperature, humidity, lighting, drugs, etc.), and random factors that cannot be predicted
exactly from moment to moment. All of these factors will give rise to a histogram for the sample
data, which may or may not reflect the true probability density function of the underlying
population. If we have done a good job with our experimental design and collected a sufficient number of
samples, the histogram and descriptive statistics for the sample population should closely reflect the
true probability density function and descriptive statistics for the true or underlying population. If
this is the case, then we can make conclusions about the larger population from the smaller sample
population. If the sample population does not reflect variability of the true population, then the
conclusions we draw from statistical analysis of the sample data may be of little value.
There are a number of probability models that are useful for describing biological and
manufacturing processes. These include the normal, Poisson, exponential, and gamma distributions [10].
In this book, we will focus on populations that follow a normal distribution because this is the most
frequently encountered probability distribution used in describing populations. Moreover, the most
frequently used methods of statistical analysis assume that the data are well modeled by a normal
(bell-curve) distribution. It is important to note that many biological processes are not well
modeled by a normal distribution (such as heart rate variability), and the statistics associated with the
normal distribution are not appropriate for such processes. In such cases, nonparametric statistics,
which do not assume a specific type of distribution for the data, may serve the researcher better in
understanding processes and making decisions. However, using the normal distribution and its
associated statistics is often adequate given the central limit theorem, which, loosely stated, says that the sum
of many independent random processes with arbitrary distributions tends toward a random variable with a
normal distribution. One can assume that most biological phenomena result from a sum of random processes.
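One way to get a feel for the central limit theorem is to watch the skew disappear as more random terms are added together. The sketch below is only an illustration under our own assumptions: it uses exponentially distributed terms, sums 30 of them, and measures asymmetry with the standardized third moment (a skewness measure that is not otherwise used in this text).

```python
import random

def skewness(samples):
    """Standardized third central moment; near 0 for a symmetric histogram."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)   # arbitrary seed for repeatability
num_samples = 20000

# A single exponential term is strongly skewed to the right.
single = [rng.expovariate(1.0) for _ in range(num_samples)]
# Each sample here is the sum of 30 independent exponential terms.
summed = [sum(rng.expovariate(1.0) for _ in range(30)) for _ in range(num_samples)]

skew_single = skewness(single)
skew_summed = skewness(summed)
print(round(skew_single, 2), round(skew_summed, 2))
```

The summed variable is noticeably more symmetric than any one of its terms, which is consistent with (though of course not a proof of) the tendency toward a normal distribution.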
3.6.1 Measures of Central Tendency
There are several measures that reflect the central tendency or concentration of a sample population:
sample mean (arithmetic average), sample median, and sample mode.
The sample mean may be estimated from a group of samples, $x_i$, where $i$ is the sample number,
using the formula below.
Given $n$ data points, $x_1, x_2, \ldots, x_n$:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .$$
In practice, we typically do not know the true mean, $\mu$, of the underlying population; instead, we
try to estimate the true mean, $\mu$, of the larger population. As the sample size becomes large, the sample
mean, $\bar{x}$, should approach the true mean, $\mu$, assuming that the statistics of the underlying population
or process do not change over time or space.
One of the problems with using the sample mean to represent the central tendency of a
population is that the sample mean is susceptible to outliers. This can be problematic and often
deceiving when reporting the average of a population that is heavily skewed. For example, when
reporting income for a group of new college graduates for which one is an NBA player who has just
signed a multimillion-dollar contract, the estimated mean income will be much greater than what
most graduates earn. The same misrepresentation is often evident when reporting mean value for
homes in a specific geographic region where a few homes valued on the order of a million dollars can hide
the fact that several hundred other homes are valued at less than $200,000.
Another useful measure for summarizing the central tendency of a population is the sample
median. The median value of a group of observations or samples, $x_i$, is the middle observation when
samples, $x_i$, are listed in descending order.
For example, if we have the following values for tidal volume of the lung:
2, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3,
we can find the median value by first ordering the data in descending order:
2.5, 2.2, 2.0, 1.8, 1.5, 1.4, 1.3, 1.3,
and then we cross off values on each end until we reach a middle value:
2.5, 2.2, 2.0, [1.8, 1.5], 1.4, 1.3, 1.3 (the bracketed values are those left after crossing off).
In this case, there are two middle values; thus, the median is the average of those two values, which
is 1.65.
Note that if the number of samples, $n$, is odd, the median will be the middle observation. If
the sample size, $n$, is even, then the median equals the average of the two middle observations.
Compared with the sample mean, the sample median is less susceptible to outliers. It ignores the skew in
a group of samples or in the probability density function of the underlying population. In general,
to fairly represent the central tendency of a collection of samples or the underlying population, we
use the following rule of thumb:
1. If the sample histogram or probability density function of the underlying population is
symmetric, use the mean as a central measure. For such populations, the mean and median
are about equal, and the mean estimate makes use of all the data.
2. If the sample histogram or probability density function of the underlying population is
skewed, the median is a more fair measure of the center of the distribution.
Another measure of central tendency is the mode, which is simply the most frequent observation in
a collection of samples. In the tidal volume example given above, 1.3 is the most frequently occurring
sample value. Mode is not used as frequently as mean or median in representing central tendency.
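The three measures of central tendency can be checked against the tidal volume example with Python's standard `statistics` module. The extra value of 10.0 appended below is a hypothetical outlier of our own, added only to show how differently the mean and median respond.

```python
import statistics

# Tidal volume values from the example in the text.
tidal_volume = [2, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]

mean = statistics.mean(tidal_volume)
median = statistics.median(tidal_volume)   # average of the two middle values
mode = statistics.mode(tidal_volume)       # most frequently occurring value
print(mean, median, mode)

# Hypothetical outlier (not from the text) to illustrate sensitivity.
with_outlier = tidal_volume + [10.0]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
print(mean_out, median_out)
```

With the outlier included, the mean moves by almost a full unit while the median shifts only to the next ordered value, which is the behavior the rule of thumb above relies on.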
3.6.2 Measures of Variability
Measures of central tendency alone are insufficient for representing the statistics of a population or
process. In fact, it is usually the variability in the population that makes things interesting and leads
to uncertainty in decision making. The variability from subject to subject, especially in physiological
function, is what makes finding fool-proof diagnosis and treatment often so difficult. What works
for one person often fails for another, and it is not the mean or median that picks up on those
subject-to-subject differences, but rather the variability, which is reflected in differences in the
probability models underlying those different populations.
When summarizing the variability of a population or process, we typically ask, "How far from
the center (sample mean) do the samples (data) lie?" To answer this question, we typically use the
following estimates that represent the spread of the sample data: interquartile ranges, sample
variance, and sample standard deviation.
The interquartile range is the difference between the first and third quartiles of the sample
data. For sampled data, the median is also known as the second quartile, Q2. Given Q2, we can find
the first quartile, Q1, by simply taking the median value of those samples that lie below the second
quartile. We can find the third quartile, Q3, by taking the median value of those samples that lie
above the second quartile. As an illustration, we have the following samples:
1, 3, 3, 2, 5, 1, 1, 4, 3, 2.
If we list these samples in descending order,
5, 4, 3, 3, 3, 2, 2, 1, 1, 1,
the median value and second quartile for these samples is 2.5. The first quartile, Q1, can be found
by taking the median of the following samples,
2.5, 2, 2, 1, 1, 1,
which is 1.5. In addition, the third quartile, Q3, may be found by taking the median value of the
following samples:
5, 4, 3, 3, 3, 2.5,
which is 3. Thus, the interquartile range is Q3 - Q1 = 3 - 1.5 = 1.5.
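The quartile calculation in the worked example can be sketched as follows. Note that the example includes the median value itself in each half before taking the median of that half; the hypothetical helper `quartiles` copies that convention, which is only one of several in common use (statistical packages may return slightly different quartiles).

```python
import statistics

def quartiles(samples):
    """Q1, Q2, Q3 following the worked example: Q2 is included in each half."""
    ordered = sorted(samples)
    n = len(ordered)
    half = n // 2
    q2 = statistics.median(ordered)
    lower = ordered[:half] + [q2]        # samples below the median, plus Q2
    upper = [q2] + ordered[n - half:]    # Q2, plus samples above the median
    return statistics.median(lower), q2, statistics.median(upper)

data = [1, 3, 3, 2, 5, 1, 1, 4, 3, 2]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```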
Sample variance, $s^2$, is defined as the "average" squared distance of the data from the mean, and the formula
for estimating $s^2$ from a collection of samples, $x_i$, is
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 .$$
Sample standard deviation, $s$, which is more commonly referred to in describing the variability of
the data, is
$$s = \sqrt{s^2}$$ (same units as original samples).
It is important to note that for normal distributions (symmetrical histograms), the sample mean
and sample standard deviation are the only parameters needed to describe the statistics of the underlying
phenomenon. Thus, if one were to compare two or more normally distributed populations, one would only
need to test the equivalence of the means and variances of those populations.
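The variance and standard deviation formulas translate directly into code. This is a minimal sketch; the function name `sample_variance` is our own, and the data are simply reused from the interquartile range illustration. The result is compared against `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

def sample_variance(samples):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)

data = [1, 3, 3, 2, 5, 1, 1, 4, 3, 2]
s2 = sample_variance(data)
s = math.sqrt(s2)   # standard deviation, in the same units as the samples
print(round(s2, 4), round(s, 4))
print(round(statistics.variance(data), 4))   # library value for comparison
```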
CHAPTER 4
Assuming a Probability Model From the Sample Data
Now that we have collected the data, graphed the histogram, and estimated measures of central
tendency and variability, such as mean, median, and standard deviation, we are ready to assume a
probability model for the underlying population or process from which we have obtained samples.
At this point, we will make a rough assumption using simple measures of mean, median, standard
deviation, and the histogram. But it is important to note that there are more rigorous tests, such as
the $\chi^2$ test for normality [7], to determine whether a particular probability model is appropriate to
assume from a collection of sample data.
Once we have assumed an appropriate probability model, we may select the appropriate
statistical tests that will allow us to test hypotheses and draw conclusions with some level of
confidence. The probability model will dictate what level of confidence we have when accepting or
rejecting a hypothesis.
There are two fundamental questions that we are trying to address when assuming a
probability model for our underlying population:
1. How confident are we that the sample statistics are representative of the entire
population?
2. Are the differences in the statistics between two populations significant, resulting from
factors other than chance alone?
To declare any level of confidence in making statistical inference, we need a mathematical model
that describes the probability that any data value might occur. These models are called probability
distributions.
There are a number of probability models that are frequently assumed to describe biological
processes. For example, when describing heart rate variability, the probability of observing a specific
time interval between consecutive heartbeats might be described by an exponential distribution [1, 8].
Figure 3.6 in Chapter 3 illustrates a histogram for samples drawn from an exponential distribution.
Note that this distribution is highly skewed to the right. For R-R intervals, such a probability
function makes sense physiologically because the individual heart cells have a refractory period that
prevents them from contracting in less than a minimum time interval. Yet, a very prolonged time interval
may occur between beats, giving rise to some long time intervals that occur infrequently.
The most frequently assumed probability model for most scientific and engineering
applications is the normal or Gaussian distribution. This distribution is illustrated by the solid black line in
Figure 4.1 and is often referred to as the bell curve because it looks like a musical bell.
The equation that gives the probability density, $f(x)$, of observing a specific value of $x$ from the
underlying normal population is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty,$$
where $\mu$ is the true mean of the underlying population or process and $\sigma$ is the standard deviation
of the same population or process. A graph of this equation is illustrated by the solid, smooth
curve in Figure 4.1. The area under the curve equals one.
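The statement that the area under the normal curve equals one can be checked numerically. The sketch below evaluates the density formula and integrates it with a simple trapezoid rule; the function names, the integration limits of ±8 standard deviations, and the step count are our own assumptions for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def trapezoid_area(f, lo, hi, steps=10000):
    """Approximate the area under f between lo and hi with the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * h) for k in range(1, steps))
    return total * h

total_area = trapezoid_area(normal_pdf, -8.0, 8.0)
area_within_1sd = trapezoid_area(normal_pdf, -1.0, 1.0)
print(round(total_area, 6), round(area_within_1sd, 4))
```

The same routine can be pointed at any interval, so it doubles as a rough way to estimate the probability of observing a value within a given range.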
Note the following about the normal distribution:
1. It is a symmetric, bell-shaped curve completely described by its mean, $\mu$, and standard
deviation, $\sigma$.
2. By changing $\mu$ and $\sigma$, we stretch and slide the distribution.
[Plot omitted. Title: Histogram of Measure, with Normal Curve; horizontal axis: Normalized Measure; vertical axis: Relative Frequency.]
FIGURE 4.1: A histogram of 1000 samples drawn from a normal distribution is illustrated.
Superimposed on the histogram is the ideal normal curve representing the normal probability distribution
function.
Figure 4.1 also illustrates a histogram that is obtained when we randomly select 1000 samples
from a population that is normally distributed and has a mean of 0 and a variance of 1. It is
important to recognize that as we increase the sample size n, the histogram approaches the ideal normal
distribution shown with the solid, smooth line. But, at small sample sizes, the histogram may look
very different from the normal curve. Thus, from small sample sizes, it may be difficult to determine
if the assumed model is appropriate for the underlying population or process, and any statistical
tests that we perform may not allow us to test hypotheses and draw conclusions with any real level
of confidence.
We can perform linear operations on our normally distributed random variable, $x$, to produce
another normally distributed random variable, $y$. These operations include multiplication of $x$ by a
constant and addition of a constant (offset) to $x$. Figure 4.2 illustrates histograms for samples drawn
from each of populations $x$ and $y$. We note that the distribution for $y$ is shifted (the mean is now
equal to 5) and the variance has increased with respect to $x$.
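The effect of a linear operation on the mean and variance can be seen in a short simulation. This sketch assumes the transformation y = 2x + 5 shown in Figure 4.2 and uses pseudorandom standard normal samples; the seed and sample count are arbitrary.

```python
import random
import statistics

rng = random.Random(7)                      # arbitrary seed for repeatability
x = [rng.gauss(0, 1) for _ in range(5000)]  # samples with mean 0, standard deviation 1
y = [2 * v + 5 for v in x]                  # linear function of x

mean_x, var_x = statistics.fmean(x), statistics.variance(x)
mean_y, var_y = statistics.fmean(y), statistics.variance(y)
print(round(mean_x, 3), round(var_x, 3))
print(round(mean_y, 3), round(var_y, 3))
```

The offset of 5 shifts the mean, and the multiplier of 2 scales the variance by a factor of 4 (the standard deviation by a factor of 2).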
[Plot omitted. Two histograms labeled x and y = 2x + 5; horizontal axis: x or y value; vertical axis: Frequency.]
FIGURE 4.2: Histograms are shown for samples drawn from populations x and y, where y is simply a
linear function of x. Note that the mean and variance of y differ from x, yet both are normal distributions.
One test that we may use to determine how well a normal probability model fits our data
is to count how many samples fall within ±1 and ±2 standard deviations of the mean. If the data
and underlying population or process are well modeled by a normal distribution, 68% of the samples
should lie within ±1 standard deviation from the mean and 95% of the samples should lie within
±2 standard deviations from the mean. These percentages are illustrated in Figure 4.3. It is
important to remember these few numbers, because we will frequently use this 95% interval when
drawing conclusions from our statistical analysis.
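The counting test described above is straightforward to automate. In the sketch below, the function name `fraction_within` and the two simulated data sets (one normal, one exponential) are our own assumptions, chosen to contrast a population that fits the model with one that does not.

```python
import random
import statistics

def fraction_within(samples, k):
    """Fraction of samples lying within k sample standard deviations of the sample mean."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    return sum(abs(x - mean) <= k * sd for x in samples) / len(samples)

rng = random.Random(3)   # arbitrary seed for repeatability
normal_like = [rng.gauss(0, 1) for _ in range(10000)]
skewed = [rng.expovariate(1.0) for _ in range(10000)]

for name, data in (("normal", normal_like), ("exponential", skewed)):
    print(name, round(fraction_within(data, 1), 3), round(fraction_within(data, 2), 3))
```

For the skewed data the fraction within ±1 standard deviation is well above 68%, so checking both intervals is more informative than checking the 95% interval alone.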
Another means for determining how well our sampled data, $x$, represent a normal distribution
is to estimate Pearson's coefficient of skew (PCS) [5]. The coefficient of skew is given by
$$\mathrm{PCS} = \frac{3\,(\bar{x} - x_{\mathrm{median}})}{s} .$$
If the PCS > 0.5, we assume that our samples were not drawn from a normally distributed population.
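Pearson's coefficient of skew is simple to compute from the sample mean, median, and standard deviation. The function name `pearson_skew` and both small data sets below are hypothetical, made up only to show one symmetric case and one skewed case.

```python
import statistics

def pearson_skew(samples):
    """Pearson's coefficient of skew: 3 * (mean - median) / standard deviation."""
    mean = statistics.fmean(samples)
    median = statistics.median(samples)
    s = statistics.stdev(samples)
    return 3 * (mean - median) / s

symmetric = [1, 2, 3, 4, 5]          # mean equals median
skewed = [1, 1, 1, 2, 2, 3, 10]      # one large value pulls the mean upward

pcs_symmetric = pearson_skew(symmetric)
pcs_skewed = pearson_skew(skewed)
print(round(pcs_symmetric, 3), round(pcs_skewed, 3))
```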
When we collect data, the data are typically collected in many different types of physical units
(volts, celsius, newtons, centimeters, grams, etc.). For us to use tables that have been developed for
probability models, we need to normalize the data so that the normalized data will have a mean of
0 and a standard deviation of 1. Such a normal distribution is called a standard normal distribution
and is illustrated in Figure 4.1.
[Plot omitted. Horizontal axis: Normalized value (Z score); vertical axis: Frequency; the central 68% and 95% regions are marked.]
FIGURE 4.3: Histogram for samples drawn from a normally distributed population. For a normal
distribution, 68% of the samples should lie within ±1 standard deviation from the mean (0 in this case) and
95% of the samples should lie within ±2 standard deviations (1.96 to be precise) of the mean.
The standard normal distribution has a bell-shaped, symmetric distribution with $\mu$ = 0 and
$\sigma$ = 1.
To convert normally distributed data to the standard normal value, we use the following
formulas,
$z = (x - \mu)/\sigma$ or $z = (x - \bar{x})/s$,
depending on whether we know the true mean, $\mu$, and standard deviation, $\sigma$, or we only have the sample
estimates, $\bar{x}$ and $s$.
For any individual sample or data point, $x_i$, from a sample with mean, $\bar{x}$, and standard
deviation, $s$, we can determine its z score from the following formula:
$$z_i = \frac{x_i - \bar{x}}{s} .$$
For an individual sample, the z score is a normalized or standardized value. We can use this value
with our equations for probability density function or our standardized probability tables [3] to
determine the probability of observing such a sample value from the underlying population.
The z score can also be thought of as a measure of the distance of the individual sample, $x_i$,
from the sample average, $\bar{x}$, in units of standard deviation. For example, if a sample point, $x_i$, has a z
score of $z_i$ = 2, it means that the data point, $x_i$, is 2 standard deviations from the sample mean.
We use normalized z scores instead of the original data when performing statistical analysis
because the tables for the normalized data are already worked out and available in most statistics
texts or statistical software packages. In addition, by using normalized values, we need not worry
about the absolute amplitude of the data or the units used to measure the data.
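The z score formula can be applied to a whole sample at once. In this sketch the function name `z_scores` and the five temperature readings are hypothetical; the point is only that the normalized values end up with a mean of 0 and a standard deviation of 1, regardless of the original units.

```python
import statistics

def z_scores(samples):
    """Normalize each sample using the sample mean and sample standard deviation."""
    mean = statistics.fmean(samples)
    s = statistics.stdev(samples)
    return [(x - mean) / s for x in samples]

temperatures = [97.8, 98.2, 98.6, 99.0, 99.4]   # hypothetical readings
z = z_scores(temperatures)
print([round(v, 3) for v in z])
print(round(statistics.fmean(z), 6), round(statistics.stdev(z), 6))
```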
4.1 THE STANDARD NORMAL DISTRIBUTION
The standard normal distribution is illustrated in Table 4.1.
The z table associated with this figure provides table entries that give the probability that $z \le a$,
which equals the area under the normal curve to the left of $z = a$. If our data come from a normal
distribution, the table tells us the probability or chance of our sample value or experimental
outcomes having a value less than or equal to $a$.
Thus, we can take any sample and compute its z score as described above and then use the z table to find the probability of observing a z value that is less than or equal to some normalized value, a. For example, the probability of observing a z value that is less than or equal to 1.96 is 97.5%. Thus, the probability of observing a z value greater than 1.96 is 2.5%. In addition, because of symmetry in the distribution, we know that the probability of observing a z value greater than −1.96 is also 97.5%, and the probability of observing a z value less than or equal to −1.96 is 2.5%. Finally,
[Figure: Z distribution. Horizontal axis: z (measure), from −4 to 3; vertical axis: frequency.]
Area to the left of $z_\alpha$ equals $\Pr(z < z_\alpha) = 1 - \alpha$; thus, the area in the tail to the right of $z_\alpha$ equals $\alpha$.
TABLE 4.1: Standard z distribution function: areas under standardized normal density function
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
the probability of observing a z value between −1.96 and 1.96 is 95%. The reader should study the z table and associated graph of the z distribution to verify that the probabilities (or areas under the probability density function) described above are correct.
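One way to check these areas without a printed table is sketched below. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the z distribution

p_le = std_normal.cdf(1.96)        # Pr(z <= 1.96)
p_gt = 1.0 - p_le                  # Pr(z > 1.96)
p_le_neg = std_normal.cdf(-1.96)   # Pr(z <= -1.96), equal to p_gt by symmetry
p_between = p_le - p_le_neg        # Pr(-1.96 <= z <= 1.96)

print(round(p_le, 4), round(p_gt, 4), round(p_le_neg, 4), round(p_between, 4))
```

The cumulative distribution function plays the role of the z table here: each call returns the area to the left of the given z value.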
Often, we need to determine the probability that an experimental outcome falls between two values, or that the outcome is greater than some value a or less than some value b. To find these areas, we can use the following important formulas, where Pr is the probability:
$$\Pr(a \le z \le b) = \Pr(z \le b) - \Pr(z \le a) = \text{area between } z = a \text{ and } z = b.$$
$$\Pr(z \ge a) = 1 - \Pr(z < a) = \text{area to the right of } z = a = \text{area in the right tail}.$$
Thus, for any observation or measurement, x, from any normal distribution:
$$\Pr(a \le x \le b) = \Pr\left(\frac{a - \mu}{\sigma} \le z \le \frac{b - \mu}{\sigma}\right),$$
where $\mu$ is the mean of the normal distribution and $\sigma$ is the standard deviation of the normal distribution.
In other words, we need to normalize or find the z values for each of our parameters, a and b, to find the area under the standard normal curve (z distribution) that represents the expression on the left side of the above equation.
Example 4.1 The mean intake of fat for males 6 to 9 years old is 28 g, with a standard deviation of 13.2 g. Assume that the intake is normally distributed. Steve's intake is 42 g and Ben's intake is 25 g.
Commonly used z values:
Area in right tail, $\alpha$    $z_\alpha$
0.10                            1.282
0.05                            1.645
0.025                           1.96
0.010                           2.326
0.005                           2.576
1. What is the proportion of area between Steve's daily intake and Ben's daily intake?
2. If we were to randomly select a male between the ages of 6 and 9 years, what is the probability that his fat intake would be 50 g or more?
Solution: x = fat intake.
1. The problem may be stated as: what is $\Pr(25 \le x \le 42)$?
Assuming a normal distribution, we convert to z scores:
$$\Pr\left(\frac{25 - 28}{13.2} \le z \le \frac{42 - 28}{13.2}\right) = \Pr(-0.227 \le z \le 1.06)$$
$$= \Pr(z \le 1.06) - \Pr(z \le -0.227) \quad \text{(using the formula for } \Pr(a \le z \le b)\text{)}$$
$$= \Pr(z \le 1.06) - [1 - \Pr(z \le 0.227)] = 0.8554 - [1 - 0.5910] = 0.4464,$$
or 44.6% of the area under the z curve.
2. The problem may be stated as: what is $\Pr(x > 50)$?
Normalizing to a z score, what is $\Pr(z > (50 - 28)/13.2)$?
$$\Pr(z > 1.67) = 1 - \Pr(z \le 1.67) = 1 - 0.9525 = 0.0475,$$
or 4.75% of the area under the z curve.
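The two parts of Example 4.1 can also be checked numerically. The sketch below assumes Python's `statistics.NormalDist`; the helper names `prob_between` and `prob_at_least` are hypothetical. Because the code does not round the z scores to two decimals before the lookup, its results differ slightly in the third decimal place from the table-based values.

```python
from statistics import NormalDist

# Mean and standard deviation of fat intake (grams), from Example 4.1.
fat_intake = NormalDist(mu=28.0, sigma=13.2)


def prob_between(dist, a, b):
    """Pr(a <= x <= b) = Pr(x <= b) - Pr(x <= a)."""
    return dist.cdf(b) - dist.cdf(a)


def prob_at_least(dist, a):
    """Pr(x >= a) = 1 - Pr(x < a), the area in the right tail."""
    return 1.0 - dist.cdf(a)


p1 = prob_between(fat_intake, 25.0, 42.0)  # between Ben's and Steve's intake
p2 = prob_at_least(fat_intake, 50.0)       # intake of 50 g or more
print(round(p1, 3), round(p2, 3))
```

Passing the mean and standard deviation to `NormalDist` performs the normalization to z scores implicitly, so no separate conversion step is needed.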
Example 4.2 Suppose that the specifications on the range of displacement for a limb indentor are 0.5 ± 0.001 mm. If these displacements are normally distributed, with mean = 0.47 and standard deviation = 0.002, what percentage of indentors are within specifications?
Solution: x = displacement.
The problem may be stated as: what is $\Pr(0.499 \le x \le 0.501)$?
Using z scores,
$$\Pr(0.499 \le x \le 0.501) = \Pr\left(\frac{0.499 - 0.47}{0.002} \le z \le \frac{0.501 - 0.47}{0.002}\right)$$
$$= \Pr(14.5 \le z \le 15.5) = \Pr(z \le 15.5) - \Pr(z \le 14.5) = 1 - 1 = 0.$$
That is, to the precision of the z table, 0% of indentors are within specifications.
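The result of 0 in Example 4.2 reflects the precision of the z table rather than a strictly zero probability. As a rough sketch, the right-tail area can be computed with the complementary error function from Python's `math` module, which stays accurate far out in the tail; the helper name `upper_tail` is an assumption for illustration.

```python
import math


def upper_tail(z):
    """Pr(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


mu, sigma = 0.47, 0.002        # mean and standard deviation from Example 4.2
lower, upper = 0.499, 0.501    # specification limits (mm)

z_lower = (lower - mu) / sigma  # about 14.5
z_upper = (upper - mu) / sigma  # about 15.5

# Pr(z_lower <= z <= z_upper) written as a difference of right-tail areas.
p_within = upper_tail(z_lower) - upper_tail(z_upper)
print(p_within)
```

The printed value is positive but vanishingly small, which agrees with the conclusion that essentially no indentors meet the specification.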
It is useful to note that if the distribution of the underlying population and the associated sample data are not normal (i.e., skewed), transformations may often be used to make the data normal, and the statistics covered in this text may then be used to perform statistical analysis on the transformed data. These transformations on the raw data include logs, square root, and reciprocal.
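As a small illustration of such a transformation, the sketch below generates right-skewed data and applies a log transform. The data are simulated (a lognormal population chosen only for illustration), and skewness is measured here as the average cubed z score, which is one common convention and not a definition taken from this text.

```python
import math
import random
from statistics import mean, pstdev


def skewness(values):
    """Average cubed z score; close to 0 for symmetric data."""
    m = mean(values)
    sd = pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed measurements (simulated, not real data).
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(v) for v in raw]

print("skewness of raw data:", round(skewness(raw), 2))
print("skewness after log transform:", round(skewness(logged), 2))
```

The raw data show a clearly positive skewness, whereas the log-transformed data are close to symmetric, so the normal-based methods are more reasonable after the transformation.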
4.2 THE NORMAL DISTRIBUTION AND SAMPLE MEAN
All statistical analysis follows the same general procedure:
1. Assume an underlying distribution for the data and associated parameters (e.g., the sample mean).
2. Scale the data or parameter to a standard distribution.
3. Estimate confidence intervals using a standard table for the assumed distribution. (The question we ask is, "What is the probability of observing the experimental outcome by chance alone?")
4. Perform hypothesis test (e.g., Student's t test).
We are beginning with and focusing most of this text on the normal distribution or probability model because of its prevalence in the biomedical field and something called the central limit theorem. One of the most basic statistical tests we perform is a comparison of the means from two or more populations. The sample mean is itself an estimate made from a finite number of samples. Thus, the sample mean, $\bar{x}$, is itself a random variable that is modeled with a normal distribution [4].
Is this model for $\bar{x}$ legitimate? The answer is yes, for large samples, because of the central limit theorem, which states [4, 10]:
If the individual data points or samples (each sample is a random variable), $x_i$, come from any arbitrary probability distribution, the sum (and hence, average) of those data points is normally distributed as the sample size, n, becomes large.
Thus, even if each sample comes from a nonnormal distribution (e.g., a uniform distribution, such as the toss of a die), the sum of those individual samples (such as the sum we use to estimate the sample mean, $\bar{x}$) will approach a normal distribution as the number of samples becomes large. One can easily assume that many of the biological or physiological processes that we measure are the sum of a number of random processes with various probability distributions; thus, the assumption that our samples come from a normal distribution is not unreasonable.
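The die-tossing illustration can be tried out with a short simulation. The sketch below is only a demonstration under assumed settings: 30 dice per group and 5000 groups are arbitrary choices, and the 68% and 95% figures are the normal-distribution benchmarks quoted earlier in this chapter.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable


def mean_of_dice(n_dice):
    """Average of n_dice tosses of a fair die (each toss is uniform on 1..6)."""
    return sum(random.randint(1, 6) for _ in range(n_dice)) / n_dice


n_dice = 30      # samples per group (assumed value)
n_trials = 5000  # number of groups (assumed value)
sample_means = [mean_of_dice(n_dice) for _ in range(n_trials)]

center = mean(sample_means)
spread = pstdev(sample_means)
within_1sd = sum(abs(m - center) <= spread for m in sample_means) / n_trials
within_2sd = sum(abs(m - center) <= 2 * spread for m in sample_means) / n_trials

print(round(center, 2), round(within_1sd, 2), round(within_2sd, 2))
```

Although a single toss is uniformly distributed, the fractions of sample means falling within 1 and 2 standard deviations of their center come out close to the 68% and 95% expected for a normal distribution.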
4.3 CONFIDENCE INTERVAL FOR THE SAMPLE MEAN
Every sample statistic is in itself a random variable with some sort of probability distribution. Thus, when we use samples to estimate the true statistics of a population (which in practice are usually not known and not obtainable), we want to have some level of confidence that our sample estimates are close to the true population statistics or are representative of the underlying population or process.
In estimating a confidence interval for the sample mean, we are asking the question: How close is our sample mean (estimated from a finite number of samples) to the true mean of the population?
To assign a level of confidence to our statistical estimates or statistical conclusions, we need to first assume a probability distribution or model for our samples and underlying population, and then we need to estimate a confidence interval using the assumed distribution.
The simplest confidence interval that we can begin with regarding our descriptive statistics is a confidence interval for the sample mean, $\bar{x}$. Our question is how close is the sample mean to the true population or process mean, $\mu$?
Before we can answer this, we need to assume an underlying probability model for the sample mean, $\bar{x}$, and true mean, $\mu$. As stated earlier, it may be shown that for a large sample size, the sample mean, $\bar{x}$, is well modeled by a normal distribution or probability model. Thus, we will use this model when estimating a confidence interval for our sample mean, $\bar{x}$.
Thus, $\bar{x}$ is estimated from the sample. We then ask, how close is $\bar{x}$ (sample mean) to the true population mean, $\mu$?
It may be shown that if we took many groups of n samples and estimated $\bar{x}$ for each group,
1. the average or sample mean of $\bar{x} = \mu$, and
2. the standard deviation of $\bar{x} = \sigma/\sqrt{n}$.
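Thus, as our sample size, n, gets large, the distribution for $\bar{x}$ approaches a normal distribution.
For large n, $\bar{x}$ follows a normal distribution, and the z score for $\bar{x}$ may be used to estimate the following:
$$\Pr(a \le \bar{x} \le b) = \Pr\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right).$$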
This expression assumes a large n and that we know $\sigma$.
Now we look at the case where we might have a large n, but we do not know $\sigma$. In such cases, we replace $\sigma$ with $s$ to get the following expression:
$$\Pr(a \le \bar{x} \le b) = \Pr\left(\frac{a - \mu}{s/\sqrt{n}} \le z \le \frac{b - \mu}{s/\sqrt{n}}\right),$$
where $s/\sqrt{n}$ is called the sample standard error and represents the standard deviation for $\bar{x}$.
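The claim that the sample mean has standard deviation $\sigma/\sqrt{n}$ can be checked with a simulation. In the sketch below, the population parameters, group size, and number of groups are all hypothetical values chosen for illustration.

```python
import math
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the sketch is repeatable

mu, sigma = 50.0, 10.0  # hypothetical population mean and standard deviation
n = 25                  # samples per group (assumed)
n_groups = 4000         # number of groups (assumed)

# Estimate the sample mean for each group of n normally distributed samples.
group_means = [
    sum(random.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(n_groups)
]

print("average of the group means:", round(mean(group_means), 2))
print("standard deviation of the group means:", round(pstdev(group_means), 2))
print("sigma / sqrt(n):", sigma / math.sqrt(n))
```

The average of the group means lands near the population mean, and their standard deviation lands near $\sigma/\sqrt{n}$, which is 2.0 for these assumed values.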
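Let us assume now that, for large n, we want to estimate the 95% confidence interval for the true mean, $\mu$, from the sample mean, $\bar{x}$. We first scale the sample mean, $\bar{x}$, to a z value (because the central limit theorem says that $\bar{x}$ is normally distributed):
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$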
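We recall that 95% of z values fall within ±1.96 (approximately 2) of the mean, and for the z distribution,
$$\Pr(-1.96 \le z \le 1.96) = 0.95.$$
Substituting for z,
$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$
we get
$$0.95 = \Pr\left(-1.96 \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le 1.96\right).$$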
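We use the following notation for the sample standard error:
$$\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}.$$
Rearranging terms in the expression above, we note that the probability that $\mu$ lies within ±1.96 (or approximately 2) standard deviations of $\bar{x}$ is 95%:
$$0.95 = \Pr\left(\bar{x} - 1.96\,\mathrm{SE}(\bar{x}) \le \mu \le \bar{x} + 1.96\,\mathrm{SE}(\bar{x})\right).$$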
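The rearrangement can be written out step by step. The following is a sketch of the algebra, using only the sample mean, the true mean, and the sample standard error $\mathrm{SE}(\bar{x}) = s/\sqrt{n}$: multiply through by the standard error, subtract $\bar{x}$, and then multiply by $-1$, which reverses the inequalities.

```latex
\begin{align*}
-1.96 \le \frac{\bar{x}-\mu}{\mathrm{SE}(\bar{x})} \le 1.96
&\iff -1.96\,\mathrm{SE}(\bar{x}) \le \bar{x}-\mu \le 1.96\,\mathrm{SE}(\bar{x}) \\
&\iff -\bar{x} - 1.96\,\mathrm{SE}(\bar{x}) \le -\mu \le -\bar{x} + 1.96\,\mathrm{SE}(\bar{x}) \\
&\iff \bar{x} - 1.96\,\mathrm{SE}(\bar{x}) \le \mu \le \bar{x} + 1.96\,\mathrm{SE}(\bar{x}).
\end{align*}
```

Because each step is an equivalence, the probability of the first statement (0.95) carries over unchanged to the last.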
Note that 1.96 is referred to as $z_{\alpha/2}$. This z value is the value of z for which the area in the right tail of the normal distribution is $\alpha/2$. If we were to estimate the 99% confidence interval, we would substitute $z_{0.01/2}$, which is 2.576, into the 1.96 position above.
Thus, for large n and any confidence level, $1 - \alpha$, the $1 - \alpha$ confidence interval for the true population mean, $\mu$, is given by:
$$\mu = \bar{x} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{x}).$$
This means that there is a $100(1 - \alpha)$ percent probability that the true mean lies within the above interval centered about $\bar{x}$.
Example 4.3 Estimate of confidence intervals
Problem: Given a collection of data with $\bar{x} = 505$ and $s = 100$. If the number of samples was 1000, what is the 95% confidence interval for the population mean, $\mu$?
Solution: If we assume a large sample size, we may use the z distribution to estimate the confidence interval for the sample mean using the following equation:
$$\mu = \bar{x} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{x}).$$
We plug in the following values:
$$\bar{x} = 505; \quad \mathrm{SE}(\bar{x}) = s/\sqrt{n} = 100/\sqrt{1000} \approx 3.16.$$
For the 95% confidence interval, $\alpha = 0.05$.
Using a z table to locate $z(0.05/2)$, we find that the value of z that gives an area of 0.025 in the right tail is 1.96.
Plugging $\bar{x}$, $\mathrm{SE}(\bar{x})$, and $z(\alpha/2)$ into the estimate for the confidence interval above, we find that the 95% confidence interval for $\mu$ = [498.80, 511.20].
Note that if we wanted to estimate the 99% confidence interval, we would simply use a different z value, $z(0.01/2)$, in the same equation. The z value associated with an area of 0.005 in the right tail is 2.576. If we use this z value, we estimate a confidence interval for $\mu$ of [496.85, 513.15]. We note that the confidence interval has widened as we increased our confidence level.
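The large-sample interval can be recomputed in code. The sketch below assumes Python's `statistics.NormalDist`, whose `inv_cdf` method is used here in place of a z table lookup; the function name `z_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, s, n, confidence):
    """Large-sample interval x_bar +/- z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - confidence
    # z value that leaves an area of alpha/2 in the right tail.
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from Example 4.3: sample mean 505, s = 100, n = 1000.
ci95 = z_confidence_interval(505.0, 100.0, 1000, 0.95)
ci99 = z_confidence_interval(505.0, 100.0, 1000, 0.99)
print([round(v, 2) for v in ci95])
print([round(v, 2) for v in ci99])
```

Both intervals are centered on the sample mean, and the second is wider because a higher confidence level requires a larger z value.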
4.4 THE t DISTRIBUTION
For small samples, $\bar{x}$ is no longer normally distributed. Therefore, we use Student's t distribution to estimate the true statistics of the population. The t distribution, as illustrated in Table 4.2, looks like a z distribution but with slower taper at the tails and flatter central region.
[Figure: t distribution. Horizontal axis: t (measure), from −4 to 3; vertical axis: frequency; the curve changes with df.]
Table entry = $t(\alpha; df)$, where $\alpha$ is the area in the tail to the right of $t(\alpha; df)$ and df is degrees of freedom.
TABLE 4.2: Percentage points for Student's t distribution
df      $\alpha$ = area to right of $t(\alpha; df)$
        0.10    0.05    0.025    0.01     0.005
1       3.078   6.314   12.706   31.821   63.657
2       1.886   2.920   4.303    6.965    9.925
3       1.638   2.353   3.182    4.541    5.841
4       1.533   2.132   2.776    3.747    4.604
5       1.476   2.015   2.571    3.365    4.032
10      1.372   1.812   2.228    2.764    3.169
11      1.363   1.796   2.201    2.718    3.106
12      1.356   1.782   2.179    2.681    3.055
13      1.350   1.771   2.160    2.650    3.012
14      1.345   1.761   2.145    2.624    2.977
15      1.341   1.753   2.131    2.602    2.947
30      1.310   1.697   2.042    2.457    2.750
40      1.303   1.684   2.021    2.423    2.704
60      1.296   1.671   2.000    2.390    2.660
120     1.289   1.658   1.980    2.358    2.617
∞       1.282   1.645   1.960    2.326    2.576
$z_\alpha$ (large sample)   1.282   1.645   1.960   2.326   2.576
We use the t distribution, with slower tapered tails, because with so few samples, we have less certainty about our underlying distribution (probability model).
Now our normalized value for $\bar{x}$ is given by
$$\frac{\bar{x} - \mu}{s/\sqrt{n}},$$
which is known to have a t distribution rather than the z distribution that we have discussed thus far. The t distribution was first invented by W.S. Gosset [4, 11], a chemist who worked for a brewery. Gosset decided to publish his t distribution under the alias of Student. Hence, we often refer to this distribution as Student's t distribution.
The t distribution is symmetric like the z distribution and generally has a bell shape. But the amount of spread of the distribution to the tails, or the width of the bell, depends on the sample size, n. Unlike the z distribution, which assumes an infinite sample size, the t distribution changes shape with sample size. The result is that the confidence intervals estimated with t values are more spread out than for the z distribution, especially for small sample sizes, because with such sample sizes, we are penalized for not having sufficient samples to represent the extent of variability of the underlying population or process. Thus, when we are estimating confidence intervals for the sample mean, $\bar{x}$, we do not have as much confidence in our estimate. Thus, the interval widens to reflect this decreased certainty with smaller sample sizes. In the next section, we estimate the confidence interval for the same example given previously, but using the t distribution instead of the z distribution.
4.5 CONFIDENCE INTERVAL USING t DISTRIBUTION
Like the z tables, there are t tables where the values of t that are associated with different areas under the probability curve are already calculated and may be used for statistical analysis without the need to recalculate the t values. The difference between the z table and t table is that now the t values are a function of the sample size or degrees of freedom. Table 4.2 gives a few lines of the t table from [3].
To use the table, one simply looks for the intersection of degrees of freedom, df (related to sample size), and the $\alpha$ value that one desires in the right-hand tail. The intersection provides the t value for which the area under the t curve in the right tail is $\alpha$. In other words, the probability that t will be less than or equal to a specific entry in the table is $1 - \alpha$. For a specific sample size, n, the degrees of freedom, df = n − 1.
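The table lookup described above can be sketched as a small dictionary keyed by degrees of freedom. Only a few rows of Table 4.2 are copied in, and the names `ALPHAS`, `T_TABLE`, and `t_value` are assumptions made for this illustration.

```python
# Right-tail areas (alpha) for the columns of Table 4.2.
ALPHAS = (0.10, 0.05, 0.025, 0.01, 0.005)

# A few rows copied from Table 4.2: degrees of freedom -> t values.
T_TABLE = {
    5: (1.476, 2.015, 2.571, 3.365, 4.032),
    10: (1.372, 1.812, 2.228, 2.764, 3.169),
    15: (1.341, 1.753, 2.131, 2.602, 2.947),
    30: (1.310, 1.697, 2.042, 2.457, 2.750),
}


def t_value(alpha, n):
    """Look up t(alpha; df) for sample size n, with df = n - 1."""
    df = n - 1
    return T_TABLE[df][ALPHAS.index(alpha)]


# 95% confidence with n = 11 samples: alpha/2 = 0.025 in the right tail, df = 10.
print(t_value(0.025, 11))
```

A sample size whose degrees of freedom are not among the copied rows raises a `KeyError` in this sketch; a full table or a statistics package would be needed in practice.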
Now we simply substitute t for z to find our confidence intervals. So, the confidence interval for the sample mean, $\bar{x}$, using the t distribution now becomes
$$\mu = \bar{x} \pm t\left(\frac{\alpha}{2};\, n - 1\right)\mathrm{SE}(\bar{x}),$$
where $\mu$ is the true mean of the underlying population or process from which we are drawing samples, $\mathrm{SE}(\bar{x})$ is the standard error of the sample mean, t is the t value for which there is an area of $\alpha/2$ in the right tail, and n is the sample size.
Example 4.4 Confidence interval using t distribution
Problem: We consider the same example used previously for estimating confidence intervals using z values. In this case, the sample size is small (n = 20), so we now use a t distribution.
Solution: Now, our estimate for the confidence interval for $\mu$ is
$$\mu = \bar{x} \pm t\left(\frac{\alpha}{2};\, n - 1\right)\mathrm{SE}(\bar{x}).$$
Again, we plug in the following values:
$$\bar{x} = 505; \quad \mathrm{SE}(\bar{x}) = s/\sqrt{n} = 100/\sqrt{20} \approx 22.36.$$
For the 95% confidence interval, $\alpha = 0.05$.
Using a t table to locate $t(0.05/2;\, 20 - 1)$, we find that for 19 df, the value of t that gives an area of 0.025 in the right tail is 2.093.
Plugging $\bar{x}$, $\mathrm{SE}(\bar{x})$, and $t(\alpha/2;\, n - 1)$ into the estimate for the confidence interval above, we find that the 95% confidence interval for $\mu$ = [458.20, 551.80]. We note that this confidence interval is widened compared with that previously estimated using the z values. This is expected because the t distribution is wider than the z distribution at small sample sizes, reflecting the fact that we have less confidence in our estimate of $\bar{x}$ and, hence, $\mu$ when the sample size is small.
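The calculation in Example 4.4 can be sketched as follows. The function name `t_confidence_interval` is hypothetical, and the t value is passed in as read from a t table rather than computed, because the Python standard library has no t distribution.

```python
import math


def t_confidence_interval(x_bar, s, n, t_crit):
    """Interval x_bar +/- t(alpha/2; n - 1) * s / sqrt(n); t_crit comes from a t table."""
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Values from Example 4.4; 2.093 is t(0.025; 19) as quoted in the example.
low, high = t_confidence_interval(505.0, 100.0, 20, 2.093)
print(round(low, 2), round(high, 2))
```

Both the larger critical value and the larger standard error for n = 20 contribute to the wider interval compared with the large-sample case.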
Confidence intervals may be estimated for most descriptive statistics, such as the sample mean, sample variance, and even the line of best fit determined through linear regression [3]. As noted above, the confidence interval reflects the variability in the parameter estimate and is influenced by the sample size, the variability of the population, and the level of confidence that we desire. The greater the desired confidence, the wider the interval. Likewise, the greater the variability in the samples, the wider the confidence interval.
Example 4.5 Review of probability concepts
Problem: Assembly times were measured for a sample of 15 glucose infusion pumps. The mean time
to assem