Introduction to Statistics for Bio Medical Engineers - Kristina M. Ropella


  • 8/8/2019 Introduction to Statistics for Bio Medical Engineers - Kristina M. Ropella


    Introduction to Statistics for Biomedical Engineers


    Copyright 2007 by Morgan & Claypool

    All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.

    Introduction to Statistics for Biomedical Engineers

    Kristina M. Ropella

    www.morganclaypool.com

    ISBN: 1598291963 paperback

    ISBN: 9781598291964 paperback

    ISBN: 1598291971 ebook

    ISBN: 9781598291971 ebook

    DOI: 10.2200/S00095ED1V01Y200708BME014

    A Publication in the Morgan & Claypool Publishers series

    SYNTHESIS LECTURES ON BIOMEDICAL ENGINEERING #14

    Lecture #14

    Series Editor: John D. Enderle, University of Connecticut

    Series ISSN

    ISSN 1930-0328 print

    ISSN 1930-0336 electronic


    Introduction to Statistics for Biomedical Engineers

    Kristina M. Ropella

    Department of Biomedical Engineering

    Marquette University

    SYNTHESIS LECTURES ON BIOMEDICAL ENGINEERING #14


    This text is dedicated to all the students who have completed my BIEN 084 statistics course for biomedical engineers and have taught me how to be more effective in communicating the subject matter and making statistics come alive for them. I also thank J. Claypool for his patience and for encouraging me to finally put this text together. Finally, I thank my family for tolerating my time at home on the laptop.


    ABSTRACT

    There are many books written about statistics, some brief, some detailed, some humorous, some colorful, and some quite dry. Each of these texts is designed for a specific audience. Too often, texts about statistics have been rather theoretical and intimidating for those not practicing statistical analysis on a routine basis. Thus, many engineers and scientists, who need to use statistics much more frequently than calculus or differential equations, lack sufficient knowledge of the use of statistics. The audience that is addressed in this text is the university-level biomedical engineering student who needs a bare-bones coverage of the most basic statistical analysis frequently used in biomedical engineering practice. The text introduces students to the essential vocabulary and basic concepts of probability and statistics that are required to perform the numerical summary and statistical analysis used in the biomedical field. This text is considered a starting point for important issues to consider when designing experiments, summarizing data, assuming a probability model for the data, testing hypotheses, and drawing conclusions from sampled data.

    A student who has completed this text should have sufficient vocabulary to read more advanced texts on statistics and further their knowledge about additional numerical analyses that are used in the biomedical engineering field but are beyond the scope of this text. This book is designed to supplement an undergraduate-level course in applied statistics, specifically in biomedical engineering. Practicing engineers who have not had formal instruction in statistics may also use this text as a simple, brief introduction to statistics used in biomedical engineering. The emphasis is on the application of statistics, the assumptions made in applying the statistical tests, the limitations of these elementary statistical methods, and the errors often committed in using statistical analysis. A number of examples from biomedical engineering research and industry practice are provided to assist the reader in understanding concepts and application. It is beneficial for the reader to have some background in the life sciences and physiology and to be familiar with basic biomedical instrumentation used in the clinical environment.

    KEYWORDS

    probability model, hypothesis testing, physiology, ANOVA, normal distribution, confidence interval, power test


    Contents

    1. Introduction
    2. Collecting Data and Experimental Design
    3. Data Summary and Descriptive Statistics
    3.1 Why Do We Collect Data?
    3.2 Why Do We Need Statistics?
    3.3 What Questions Do We Hope to Address With Our Statistical Analysis?
    3.4 How Do We Graphically Summarize Data?
    3.4.1 Scatterplots
    3.4.2 Time Series
    3.4.3 Box-and-Whisker Plots
    3.4.4 Histogram
    3.5 General Approach to Statistical Analysis
    3.6 Descriptive Statistics
    3.6.1 Measures of Central Tendency
    3.6.2 Measures of Variability
    4. Assuming a Probability Model From the Sample Data
    4.1 The Standard Normal Distribution
    4.2 The Normal Distribution and Sample Mean
    4.3 Confidence Interval for the Sample Mean
    4.4 The t Distribution
    4.5 Confidence Interval Using t Distribution
    5. Statistical Inference
    5.1 Comparison of Population Means
    5.1.1 The t Test
    5.1.1.1 Hypothesis Testing
    5.1.1.2 Applying the t Test
    5.1.1.3 Unpaired t Test
    5.1.1.4 Paired t Test
    5.1.1.5 Example of a Biomedical Engineering Challenge
    5.2 Comparison of Two Variances
    5.3 Comparison of Three or More Population Means
    5.3.1 One-Factor Experiments
    5.3.1.1 Example of Biomedical Engineering Challenge
    5.3.2 Two-Factor Experiments
    5.3.3 Tukey's Multiple Comparison Procedure
    6. Linear Regression and Correlation Analysis
    7. Power Analysis and Sample Size
    7.1 Power of a Test
    7.2 Power Tests to Determine Sample Size
    8. Just the Beginning
    Bibliography
    Author Biography


    CHAPTER 1

    Introduction

    Biomedical engineers typically collect all sorts of data, from patients, animals, cell counters, microassays, imaging systems, pressure transducers, bedside monitors, manufacturing processes, material testing systems, and other measurement systems that support a broad spectrum of research, design, and manufacturing environments. Ultimately, the reason for collecting data is to make a decision. That decision may concern differentiating biological characteristics among different populations of people, determining whether a pharmacological treatment is effective, determining whether it is cost-effective to invest in multimillion-dollar medical imaging technology, determining whether a manufacturing process is under control, or selecting the best rehabilitative therapy for an individual patient.

    The challenge in making such decisions often lies in the fact that all real-world data contain some element of uncertainty because of random processes that underlie most physical phenomena. These random elements prevent us from predicting the exact value of any physical quantity at any moment of time. In other words, when we collect a sample or data point, we usually cannot predict the exact value of that sample or experimental outcome. For example, although the average resting heart rate of normal adults is about 70 beats per minute, we cannot predict the exact arrival time of our next heartbeat. However, we can approximate the likelihood that the arrival time of the next heartbeat will fall in a specific time interval if we have a good probability model to describe the random phenomenon contributing to the time interval between heartbeats. The timing of heartbeats is influenced by a number of physiological variables [1], including the refractory period of the individual cells that make up the heart muscle, the leakiness of the cell membranes in the sinus node (the heart's natural pacemaker), and the activity of the autonomic nervous system, which may speed up or slow down the heart rate in response to the body's need for increased blood flow, oxygen, and nutrients. The sum of these biological processes produces a pattern of heartbeats that we may measure by counting the pulse rate from our wrist or carotid artery or by searching for specific QRS waveforms in the ECG [2]. Although this sum of events makes it difficult for us to predict exactly when the next heartbeat will arrive, we can guess, with a certain amount of confidence, when the next beat will arrive. In other words, we can assign a probability to the likelihood that the next heartbeat will arrive in a specified time interval. If we were to consider all possible arrival times and assign a probability to those arrival times, we would have a probability model for the heartbeat intervals. If we can find a probability model to describe the likelihood of occurrence of a certain event or experimental outcome, we can use statistical methods to make decisions. The probability models describe characteristics of the population or phenomenon being studied. Statistical analysis then makes use of these models to help us make decisions about the population(s) or processes.
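    The heartbeat example can be made concrete with a small sketch. It assumes, purely for illustration, that the interval between heartbeats follows a normal distribution whose mean matches the 70 beats per minute quoted above and whose standard deviation (0.05 s) is a made-up value. Neither the choice of a normal model nor the function name prob_next_beat_between comes from the text.

```python
from statistics import NormalDist

MEAN_HEART_RATE_BPM = 70.0                 # average resting rate quoted in the text
MEAN_INTERVAL_S = 60.0 / MEAN_HEART_RATE_BPM   # about 0.857 s between beats
ASSUMED_INTERVAL_SD_S = 0.05               # hypothetical spread, for illustration only

# Assumed probability model for the time between successive heartbeats.
interval_model = NormalDist(mu=MEAN_INTERVAL_S, sigma=ASSUMED_INTERVAL_SD_S)


def prob_next_beat_between(t_low_s, t_high_s, model=interval_model):
    """Probability that the next beat arrives between t_low_s and t_high_s
    seconds after the previous beat, under the assumed model."""
    if t_high_s < t_low_s:
        raise ValueError("t_high_s must be >= t_low_s")
    return model.cdf(t_high_s) - model.cdf(t_low_s)


p = prob_next_beat_between(0.80, 0.90)
print(f"P(0.80 s <= next interval <= 0.90 s) = {p:.3f}")
```

    The exact arrival time remains unpredictable, but once a model is assumed, any interval of interest receives a probability. A different assumed model would give different probabilities, which is why the quality of the model matters.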

    The conclusions that one may draw from using statistical analysis are only as good as the underlying model that is used to describe the real-world phenomenon, such as the time interval between heartbeats. For example, a normally functioning heart exhibits considerable variability in beat-to-beat intervals (Figure 1.1). This variability reflects the body's continual effort to maintain homeostasis so that the body may continue to perform its most essential functions and supply the body with the oxygen and nutrients required to function normally. It has been demonstrated through biomedical research that there is a loss of heart rate variability associated with some diseases, such as diabetes and ischemic heart disease. Researchers seek to determine if this difference in variability between normal subjects and subjects with heart disease is significant (meaning, it is due to some underlying change in biology and not simply a result of chance) and whether it might be used to predict the progression of the disease [1]. One will note that the probability model changes as a consequence of changes in the underlying biological function or process. In the case of manufacturing, the probability model used to describe the output of the manufacturing process may change as a function of machine operation or changes in the surrounding manufacturing environment, such as temperature, humidity, or human operator.

    FIGURE 1.1: Example of an ECG recording, where R-R interval is defined as the time interval between successive R waves of the QRS complex, the most prominent waveform of the ECG.
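    As a rough sketch of how the R-R interval defined in Figure 1.1 might be computed, the example below takes a list of R-wave times and returns the successive differences, then summarizes their mean and spread. The R-wave times are invented for illustration, and the function name rr_intervals is hypothetical.

```python
from statistics import mean, stdev


def rr_intervals(r_wave_times_s):
    """Return the time differences between successive R waves (in seconds)."""
    if len(r_wave_times_s) < 2:
        raise ValueError("need at least two R-wave times")
    return [later - earlier
            for earlier, later in zip(r_wave_times_s, r_wave_times_s[1:])]


# Hypothetical R-wave detection times from a short ECG strip, in seconds.
r_times = [0.00, 0.84, 1.70, 2.52, 3.41, 4.25]

intervals = rr_intervals(r_times)
print("R-R intervals (s):", [round(v, 2) for v in intervals])
print(f"mean = {mean(intervals):.3f} s, standard deviation = {stdev(intervals):.3f} s")
```

    The standard deviation of the intervals is one simple way to express beat-to-beat variability; a loss of heart rate variability would appear as a smaller spread.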

    Besides helping us to describe the probability model associated with real-world phenomena, statistics help us to make decisions by giving us quantitative tools for testing hypotheses. We call this inferential statistics, whereby the outcome of a statistical test allows us to draw conclusions or make inferences about one or more populations from which samples are drawn. Most often, scientists and engineers are interested in comparing data from two or more different populations or from two or more different processes. Typically, the default hypothesis is that there is no difference in the distributions of two or more populations or processes, and we use statistical analysis to determine whether there are true differences in the distributions of the underlying populations that warrant assigning different probability models to the individual processes.

    In summary, biomedical engineers typically collect data or samples from various phenomena, which contain some element of randomness or unpredictable variability, for the purposes of making decisions. To make sound decisions in the context of the uncertainty with some level of confidence, we need to assume some probability model for the populations from which the samples have been collected. Once we have assumed an underlying model, we can select the appropriate statistical tests for comparing two or more populations and then use these tests to draw conclusions about our hypotheses for which we collected the data in the first place. Figure 1.2 outlines the steps of performing statistical analysis of data.

    FIGURE 1.2: Steps in statistical analysis.

    In the following chapters, we will describe methods for graphically and numerically summarizing collected data. We will then talk about fitting a probability model to the collected data by briefly describing a number of well-known probability models that are used to describe biological phenomena. Finally, once we have assumed a model for the populations from which we have collected our sample data, we will discuss the types of statistical tests that may be used to compare data from multiple populations and allow us to test hypotheses about the underlying populations.


    CHAPTER 2

    Collecting Data and Experimental Design

    Before we discuss any type of data summary and statistical analysis, it is important to recognize that the value of any statistical analysis is only as good as the data collected. Because we are using data or samples to draw conclusions about entire populations or processes, it is critical that the data collected (or samples collected) are representative of the larger, underlying population. For example, if we are trying to determine whether men between the ages of 20 and 50 years respond positively to a drug that reduces cholesterol level, we need to carefully select the population of subjects to whom we administer the drug and from whom we take measurements. In other words, we have to have enough samples to represent the variability of the underlying population. There is a great deal of variety in the weight, height, genetic makeup, diet, exercise habits, and drug use in all men ages 20 to 50 years who may also have high cholesterol. If we are to test the effectiveness of a new drug in lowering cholesterol, we must collect enough data or samples to capture the variability of biological makeup and environment of the population that we are interested in treating with the new drug. Capturing this variability is often the greatest challenge that biomedical engineers face in collecting data and using statistics to draw meaningful conclusions. The experimentalist must ask questions such as the following:

    What type of person, object, or phenomenon do I sample?

    What variables that impact the measure or data can I control?

    How many samples do I require to capture the population variability to apply the appropriate statistics and draw meaningful conclusions?

    How do I avoid biasing the data with the experimental design?

    Experimental design, although not the primary focus of this book, is the most critical step to support the statistical analysis that will lead to meaningful conclusions and hence sound decisions.

    One of the most fundamental questions asked by biomedical researchers is, "What size sample do I need?" or "How many subjects will I need to make decisions with any level of confidence?" We will address these important questions at the end of this book when concepts such as variability, probability models, and hypothesis testing have already been covered. For example, power tests will be described as a means for predicting the sample size required to detect significant differences in two population means using a t test.

    Two elements of experimental design that are critical to prevent biasing the data or selecting samples that do not fairly represent the underlying population are randomization and blocking.

    Randomization refers to the process by which we randomly select samples or experimental units from the larger underlying population such that we maximize our chance of capturing the variability in the underlying population. In other words, we do not limit our samples such that only a fraction of the characteristics or behaviors of the underlying population are captured in the samples. More importantly, we do not bias the results by artificially limiting the variability in the samples such that we alter the probability model of the sample population with respect to the probability model of the underlying population.

    In addition to randomizing our selection of experimental units from which to take samples, we might also randomize our assignment of treatments to our experimental units. Or, we may randomize the order in which we take data from the experimental units. For example, if we are testing the effectiveness of two different medical imaging methods in detecting a brain tumor, we will randomly assign all subjects suspected of having a brain tumor to one of the two imaging methods. Thus, if we have a mix of sex, age, and type of brain tumor participating in the study, we reduce the chance of having all one sex or one age group assigned to one imaging method and a very different type of population assigned to the second imaging method. If a difference is noted in the outcome of the two imaging methods, we will not artificially introduce sex or age as a factor influencing the imaging results.

    As another example, if one is testing the strength of three different materials for use in hip implants using several strength measures from a materials testing machine, one might randomize the order in which samples of the three different test materials are submitted to the machine. Machine performance can vary with time because of wear, temperature, humidity, deformation, stress, and user characteristics. If the biomedical engineer were asked to find the strongest material for an artificial hip using specific strength criteria, he or she may conduct an experiment. Let us assume that the engineer is given three boxes, with each box containing five artificial hip implants made from one of three materials: titanium, steel, and plastic. For any one box, all five implant samples are made from the same material. To test the 15 different implants for material strength, the engineer might randomize the order in which each of the 15 implants is tested in the materials testing machine so that time-dependent changes in machine performance, machine-material interactions, or time-varying environmental conditions do not bias the results for one or more of the materials. Thus, to fully randomize the implant testing, an engineer may literally place the numbers 1 to 15 in a hat and also assign the numbers 1 to 15 to each of the implants to be tested. The engineer will then blindly draw one of the 15 numbers from the hat and test the implant that corresponds to that number. This way the engineer is not testing all of one material in any particular order, and we avoid introducing order effects into the data.
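    The numbers-in-a-hat procedure can be sketched in a few lines. The example below numbers the 15 implants, shuffles the numbers with a pseudorandom generator in place of a physical draw, and lists the resulting test order. The function name randomized_test_order and the optional seed are illustrative choices, not part of the text.

```python
import random

MATERIALS = ("titanium", "steel", "plastic")   # the three boxes in the example
IMPLANTS_PER_BOX = 5


def randomized_test_order(seed=None):
    """Return a list of (implant_number, material) in randomized test order."""
    rng = random.Random(seed)

    # Assign the numbers 1 to 15 to the implants, five per material.
    implants = {}
    number = 1
    for material in MATERIALS:
        for _ in range(IMPLANTS_PER_BOX):
            implants[number] = material
            number += 1

    # Shuffling the numbers is equivalent to drawing them blindly from a hat
    # one at a time without replacement.
    draw_order = list(implants)
    rng.shuffle(draw_order)
    return [(n, implants[n]) for n in draw_order]


for position, (number, material) in enumerate(randomized_test_order(seed=1), start=1):
    print(f"test {position:2d}: implant #{number:2d} ({material})")
```

    Fixing the seed makes the order reproducible for record keeping; leaving it as None gives a fresh order on each run.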

    The second aspect of experimental design is blocking. In many experiments, we are interested in one or two specific factors or variables that may impact our measure or sample. However, there may be other factors that also influence our measure and confound our statistics. In good experimental design, we try to collect samples such that different treatments within the factor of interest are not biased by the differing values of the confounding factors. In other words, we should be certain that every treatment within our factor of interest is tested within each value of the confounding factor. We refer to this design as blocking by the confounding factor. For example, we may want to study weight loss as a function of three different diet pills. One confounding factor may be a person's starting weight. Thus, in testing the effectiveness of the three pills in reducing weight, we may want to block the subjects by starting weight. Thus, we may first group the subjects by their starting weight and then test each of the diet pills within each group of starting weights.
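    One possible way to carry out the diet pill example is sketched below. It assumes that subjects are ranked by starting weight, grouped into blocks of three neighbors, and that the three pills are assigned in random order within each block. The block size, the pill labels, the subject weights, and the function name blocked_assignment are all hypothetical.

```python
import random

PILLS = ("pill A", "pill B", "pill C")   # placeholder labels for the three diet pills


def blocked_assignment(starting_weights_kg, seed=None):
    """Map each subject to (block_index, pill).

    Subjects are sorted by starting weight and grouped into blocks of
    len(PILLS) subjects; every pill appears exactly once in each block.
    """
    if len(starting_weights_kg) % len(PILLS) != 0:
        raise ValueError("number of subjects must be a multiple of the number of pills")
    rng = random.Random(seed)
    ranked = sorted(starting_weights_kg, key=starting_weights_kg.get)

    assignment = {}
    for start in range(0, len(ranked), len(PILLS)):
        block = ranked[start:start + len(PILLS)]
        pills = list(PILLS)
        rng.shuffle(pills)               # randomize treatment order within the block
        for subject, pill in zip(block, pills):
            assignment[subject] = (start // len(PILLS), pill)
    return assignment


# Hypothetical subjects and starting weights in kilograms.
weights = {"S1": 82.0, "S2": 118.5, "S3": 79.4, "S4": 121.0, "S5": 84.3, "S6": 115.2}
for subject, (block, pill) in sorted(blocked_assignment(weights, seed=7).items()):
    print(f"{subject}: block {block}, {pill}")
```

    Because every pill is tested within each weight group, a difference among pills is less likely to be an artifact of differing starting weights.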

    In biomedical research, we often block by experimental unit. When this type of blocking is part of the experimental design, the experimentalist collects multiple samples of data, with each sample representing different experimental conditions, from each of the experimental units. Figure 2.1 provides a diagram of an experiment in which data are collected before and after patients receive therapy, and the experimental design uses blocking (left) or no blocking (right) by experimental unit. In the case of blocking, data are collected before and after therapy from the same set of human subjects. Thus, within an individual, the same biological factors that influence the biological response to the therapy are present before and after therapy. Each subject serves as his or her own control for factors that may randomly vary from subject to subject both before and after therapy. In essence, with blocking, we are eliminating biases in the differences between the two populations (before and after) that may result because we are using two different sets of experimental units. For example, if we used one set of subjects before therapy and then an entirely different set of subjects after therapy (Figure 2.1, right), there is a chance that the two sets of subjects may vary enough in sex, age, weight, race, or genetic makeup to produce a difference in response that has little to do with the underlying therapy. In other words, there may be confounding factors that contribute to the difference in the experimental outcome before and after therapy, so that the difference is not only an effect of the therapy but also an artifact of differences in the distributions of the two different groups of subjects from which the two sample sets were chosen. Blocking will help to eliminate the effect of intersubject variability.

    Block (repeated measures)

    Subject   Measure before treatment   Measure after treatment
    1         M11                        M12
    2         M21                        M22
    3         M31                        M32
    ...       ...                        ...
    K         MK1                        MK2

    No block (no repeated measures)

    Subject   Measure before treatment   Subject   Measure after treatment
    1         M1                         K+1       M(K+1)
    2         M2                         K+2       M(K+2)
    3         M3                         K+3       M(K+3)
    ...       ...                        ...       ...
    K         MK                         K+K       M(K+K)

    FIGURE 2.1: Samples are drawn from two populations (before and after treatment), and the experimental design uses block (left) or no block (right). In this case, the block is the experimental unit (subject) from which the measures are made.
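    The idea that each subject serves as his or her own control can be illustrated with a short sketch. In the blocked design, the quantity of interest is the within-subject difference (after minus before), which removes much of the subject-to-subject spread. The measurements below are invented for illustration, and the function name within_subject_differences is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical measures from the same five subjects (blocked design).
before = [182.0, 240.0, 205.0, 260.0, 221.0]
after = [171.0, 229.0, 196.0, 248.0, 211.0]


def within_subject_differences(before_values, after_values):
    """Return after - before for each subject; the lists must be in the same subject order."""
    if len(before_values) != len(after_values):
        raise ValueError("each subject needs one before and one after measure")
    return [a - b for b, a in zip(before_values, after_values)]


diffs = within_subject_differences(before, after)
print("differences:", diffs)
print(f"mean difference = {mean(diffs):.1f}")
print(f"spread between subjects before therapy = {stdev(before):.1f}")
print(f"spread of within-subject differences   = {stdev(diffs):.1f}")
```

    With these made-up numbers, the subjects differ widely from one another, yet the change within each subject is consistent. Without blocking, the large intersubject spread would obscure that change.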

    However, blocking is not always possible, given the nature of some biomedical research studies. For example, if one wanted to study the effectiveness of two different chemotherapy drugs in reducing tumor size, it is impractical to test both drugs on the same tumor mass. Thus, the two drugs are tested on different groups of individuals. The same type of design would be necessary for testing the effectiveness of weight-loss regimens.

    Thus, some important concepts and definitions to keep in mind when designing experiments include the following:

    experimental unit: the item, object, or subject to which we apply the treatment and from which we take sample measurements;

    randomization: allocate the treatments randomly to the experimental units;

    blocking: assigning all treatments within a factor to every level of the blocking factor. Often, the blocking factor is the experimental unit. Note that in using blocking, we still randomize the order in which treatments are applied to each experimental unit to avoid ordering bias.

    Finally, the experimentalist must always think about how representative the sample population is with respect to the greater underlying population. Because it is virtually impossible to test every member of a population or every product rolling down an assembly line, especially when destructive testing methods are used, the biomedical engineer must often collect data from a much smaller sample drawn from the larger population. It is important, if the statistics are going to lead to useful conclusions, that the sample population captures the variability of the underlying population. What is even more challenging is that we often do not have a good grasp of the variability of the underlying population, and because of expense and respect for life, we are typically limited in the number of samples we may collect in biomedical research and manufacturing. These limitations are not easy to address and require that the engineer always consider how fair the sample and data analysis is and how well it represents the underlying population(s) from which the samples are drawn.


    CHAPTER 3

    Data Summary and Descriptive Statistics

    We assume now that we have collected our data through the use of good experimental design. We now have a collection of numbers, observations, or descriptions to describe our data, and we would like to summarize the data to make decisions, test a hypothesis, or draw a conclusion.

    3.1 WHY DO WE COLLECT DATA?

    The world is full of uncertainty, in the sense that there are random or unpredictable factors that influence every experimental measure we make. The unpredictable aspects of the experimental outcomes also arise from the variability in biological systems (due to genetic and environmental factors) and manufacturing processes, human error in making measurements, and other underlying processes that influence the measures being made.

    Despite the uncertainty regarding the exact outcome of an experiment or occurrence of a future event, we collect data to try to better understand the processes or populations that influence an experimental outcome so that we can make some predictions. Data provide information to reduce uncertainty and allow for decision making. When properly collected and analyzed, data help us solve problems. It cannot be stressed enough that the data must be properly collected and analyzed if the data analysis and subsequent conclusions are to have any value.

    3.2 WHY DO WE NEED STATISTICS?

    We have three major reasons for using statistical data summary and analysis:

    1. The real world is full of random events that cannot be described by exact mathematical expressions.

    2. Variability is a natural and normal characteristic of the natural world.

    3. We like to make decisions with some confidence. This means that we need to find trends within the variability.

    3.3 WHAT QUESTIONS DO WE HOPE TO ADDRESS WITH OUR STATISTICAL ANALYSIS?

    There are several basic questions we hope to address when using numerical and graphical summary of data:

    1. Can we differentiate between groups or populations?

    2. Are there correlations between variables or populations?

    3. Are processes under control?

    Finding physiological differences between populations is probably the most frequent aim of biomedical research. For example, researchers may want to know if there is a difference in life expectancy between overweight and underweight people. Or, a pharmaceutical company may want to determine if one type of antibiotic is more effective in combating bacteria than another. Or, a physician wonders if diastolic blood pressure is reduced in a group of hypertensive subjects after the consumption of a pressure-reducing drug. Most often, biomedical researchers are comparing populations of people or animals that have been exposed to two or more different treatments or diagnostic tests, and they want to know if there is a difference between the responses of the populations that have received different treatments or tests. Sometimes, we are drawing multiple samples from the same group of subjects or experimental units. A common example is when the physiological data are taken before and after some treatment, such as drug intake or electronic therapy, from one group of patients. We call this type of data collection blocking in the experimental design. This concept of blocking is discussed more fully in Chapter 2.

    Another question that is frequently the target of biomedical research is whether there is a correlation between two physiological variables. For example, is there a correlation between body build and mortality? Or, is there a correlation between fat intake and the occurrence of cancerous tumors? Or, is there a correlation between the size of the ventricular muscle of the heart and the frequency of abnormal heart rhythms? These types of questions involve collecting two sets of data and performing a correlation analysis to determine how well one set of data may be predicted from another. When we speak of correlation analysis, we are referring to the linear relation between two variables and the ability to predict one set of data by modeling the data as a linear function of the second set of data. Because correlation analysis only quantifies the linear relation between two processes or data sets, nonlinear relations between the two processes may not be evident. A more detailed description of correlation analysis may be found in Chapter 7.
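
    The point that correlation analysis captures only linear relations can be illustrated with a small sketch. The helper name pearson_r and the data values are assumptions made for this illustration; they do not come from the text. A perfectly linear relation gives a coefficient near 1, whereas a perfectly predictable but quadratic relation on symmetric data gives a coefficient near 0.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: x is symmetric about zero.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # y depends linearly on x
quadratic = [v ** 2 for v in x]   # y depends on x, but not linearly

r_linear = pearson_r(x, linear)        # close to 1
r_quadratic = pearson_r(x, quadratic)  # close to 0 despite a clear relation
```

    A coefficient near zero therefore does not rule out a relation between two variables; it only says that a straight line is a poor description of it.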

    Finally, a biomedical engineer, particularly the engineer involved in manufacturing, may be interested in knowing whether a manufacturing process is under control. Such a question may arise if there are tight controls on the manufacturing specifications for a medical device. For example, if the engineer is trying to ensure quality in producing intravascular catheters that must have diameters between 1 and 2 cm, the engineer may randomly collect samples of catheters from the assembly line at random intervals during the day, measure their diameters, determine how many of the catheters meet specifications, and determine whether there is a sudden change in the number of catheters that fail to meet specifications. If there is such a change, the engineers may look for elements of the manufacturing process that change over time, changes in environmental factors, or user errors. The engineer can use control charts to assess whether the processes are under control. These methods of statistical analysis are not covered in this text, but may be found in a number of references, including [3].

    3.4 HOW DO WE GRAPHICALLY SUMMARIZE DATA?

    We can summarize data in graphical or numerical form. The numerical form is what we refer to as statistics. Before blindly applying the statistical analysis, it is always good to look at the raw data, usually in a graphical form, and then use graphical methods to summarize the data in an easy-to-interpret format.

    The types of graphical displays that are most frequently used by biomedical engineers include the following: scatterplots, time series, box-and-whisker plots, and histograms.

    Details for creating these graphical summaries are described in [3-6], but we will briefly describe them here.

    3.4.1 Scatterplots

    The scatterplot simply graphs the occurrence of one variable with respect to another. In most cases, one of the variables may be considered the independent variable (such as time or subject number), and the second variable is considered the dependent variable. Figure 3.1 illustrates an example of a scatterplot for two sets of data. In general, we are interested in whether there is a predictable relationship that maps our independent variable (such as respiratory rate) into our dependent variable (such as heart rate). If there is a linear relationship between the two variables, the data points should fall close to a straight line.

    3.4.2 Time Series

    A time series is used to plot the changes in a variable as a function of time. The variable is usually a physiological measure, such as electrical activation in the brain or hormone concentration in the bloodstream, that changes with time. Figure 3.2 illustrates an example of a time series plot. In this figure, we are looking at a simple sinusoid function as it changes with time.


    3.4.3 Box-and-Whisker Plots

    These plots illustrate the first, second, and third quartiles as well as the minimum and maximum values of the data collected. The second quartile (Q2) is also known as the median of the data. This quantity, as defined later in this text, is the middle data point or sample value when the samples are listed in descending order. The first quartile (Q1) can be thought of as the median value of the samples that fall below the second quartile. Similarly, the third quartile (Q3) can be thought of as the median value of the samples that fall above the second quartile. Box-and-whisker plots are useful in that they highlight whether there is skew to the data or any unusual outliers in the samples (Figure 3.3).

    FIGURE 3.2: Example of a time series plot. The amplitude of the samples is plotted as a function of time. (Axes: Time (msec) versus Amplitude.)

    FIGURE 3.1: Example of a scatterplot. (Axes: Independent Variable versus Dependent Variable.)


    FIGURE 3.3: Illustration of a box-and-whisker plot for the data set listed. The first (Q1), second (Q2), and third (Q3) quartiles are shown. In addition, the whiskers extend to the minimum and maximum values of the sample set. (Panel title: Box and Whisker Plot. Axes: Category versus Dependent Variable.)

    3.4.4 Histogram

    The histogram is defined as a frequency distribution. Given N samples or measurements, x_i, which range from X_min to X_max, the samples are grouped into nonoverlapping intervals (bins), usually of equal width (Figure 3.4). Typically, the number of bins is on the order of 7 to 14, depending on the nature of the data. In addition, we typically expect to have at least three samples per bin [7]. Sturges' rule [6] may also be used to estimate the number of bins and is given by

    k = 1 + 3.3 log10(n),

    where k is the number of bins and n is the number of samples.
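
    As a quick check of Sturges' rule, the sketch below evaluates it for a couple of sample sizes. The function name sturges_bins and the decision to round up to a whole number of bins are assumptions for this illustration; the text does not say how a fractional k should be rounded.

```python
import math


def sturges_bins(n):
    """Number of bins suggested by k = 1 + 3.3 * log10(n), rounded up (an assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.ceil(1 + 3.3 * math.log10(n))


bins_for_100 = sturges_bins(100)    # 1 + 3.3 * 2 = 7.6, rounded up to 8
bins_for_1000 = sturges_bins(1000)  # 1 + 3.3 * 3 = 10.9, rounded up to 11
```

    Both values fall within the range of 7 to 14 bins mentioned above.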

    Each bin of the histogram has a lower boundary, upper boundary, and midpoint. The histogram is constructed by plotting the number of samples in each bin. Figure 3.5 illustrates a histogram for 1000 samples drawn from a normal distribution with mean (μ) = 0 and standard deviation (σ) = 1.0. On the horizontal axis, we have the sample value, and on the vertical axis, we have the number of occurrences of samples that fall within a bin.

    Two measures that we find useful in describing a histogram are the absolute frequency and relative frequency in one or more bins. These quantities are defined as

    a) f_i = absolute frequency in the ith bin;
    b) f_i/n = relative frequency in the ith bin, where n is the total number of samples being summarized in the histogram.
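
    The binning procedure and the two frequency measures can be sketched as follows. This is a minimal illustration, not the method of any particular plotting package: the function name histogram, the equal-width bins spanning the minimum to the maximum sample, and the sample values are all assumptions, and the code assumes the samples are not all identical.

```python
def histogram(samples, num_bins):
    """Return bin edges, absolute frequencies, and relative frequencies."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins  # equal-width bins; assumes hi > lo
    counts = [0] * num_bins
    for x in samples:
        # The maximum value is placed in the last bin rather than in a new one.
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    n = len(samples)
    edges = [lo + i * width for i in range(num_bins + 1)]
    relative = [c / n for c in counts]
    return edges, counts, relative


# Hypothetical measurements grouped into four bins of width 2.
samples = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, absolute, relative = histogram(samples, num_bins=4)
# absolute holds the counts per bin; relative holds the same counts divided by n.
```

    The relative frequencies always sum to one, which is what allows them to be read as a rough estimate of probability.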


    A number of algorithms used by biomedical instruments for diagnosing or detecting abnormalities in biological function make use of the histogram of collected data and the associated relative frequencies of selected bins [8]. Oftentimes, normal and abnormal physiological function (breath sounds, heart rate variability, frequency content of electrophysiological signals) may be differentiated by comparing the relative frequencies in targeted bins of the histograms of data representing these biological processes.

    FIGURE 3.4: One bin of a histogram plot. The bin is defined by a lower bound, a midpoint, and an upper bound.

    FIGURE 3.5: Example of a histogram plot. The value of the measure or sample is plotted on the horizontal axis, whereas the frequency of occurrence of that measure or sample is plotted along the vertical axis. (Axes: Normalized Value versus Frequency.)


    The histogram can exhibit several shapes. The shapes, illustrated in Figure 3.6, are referred to as symmetric, skewed, or bimodal.

    A skewed histogram may be attributed to the following [9]:

    1. mechanisms of interest that generate the data (e.g., the physiological mechanisms that determine the beat-to-beat intervals in the heart);
    2. an artifact of the measurement process or a shift in the underlying mechanism over time (e.g., there may be time-varying changes in a manufacturing process that lead to a change in the statistics of the manufacturing process over time);
    3. a mixing of populations from which samples are drawn (this is typically the source of a bimodal histogram).

    The histogram is important because it serves as a rough estimate of the true probability density function or probability distribution of the underlying random process from which the samples are being collected.

    The probability density function or probability distribution is a function that quantifies the probability of a random event, x, occurring. When the underlying random event is discrete in nature, we refer to the probability density function as the probability mass function [10]. In either case, the function describes the probabilistic nature of the underlying random variable or event and allows us to predict the probability of observing a specific outcome, x (represented by the random variable), of an experiment. The cumulative distribution function is simply the sum of the probabilities for a group of outcomes, where the outcome is less than or equal to some value, x.

    Let us consider a random variable for which the probability density function is well defined (for most real-world phenomena, such a probability model is not known). The random variable is the outcome of a single toss of a die. Given a single fair die with six sides, the probability of rolling a six on the throw of the die is 1 in 6. In fact, the probability of throwing a one is also 1 in 6. If we consider all possible outcomes of the toss of a die and plot the probability of observing any one of those six outcomes in a single toss, we would have a plot such as that shown in Figure 3.7.

    This plot shows the probability density or probability mass function for the toss of a die. This type of probability model is known as a uniform distribution because each outcome has the exact same probability of occurring (1/6 in this case).

    For the toss of a die, we know the true probability distribution. However, for most real-world random processes, especially biological processes, we do not know what the true probability density or mass function looks like. As a consequence, we have to use the histogram, created from a small sample, to try to estimate the best probability distribution or probability model to describe the real-world phenomenon. If we return to the example of the toss of a die, we can actually toss the die a number of times and see how closely the histogram, obtained from experimental data, matches the true probability mass function for the ideal six-sided die. Figure 3.8 illustrates the histograms for the outcomes of 50 and 1000 tosses of a single die. Note that with only 50 tosses or samples, it is difficult to determine what the true probability distribution might look like. However, as we approach 1000 samples, the histogram is approaching the true probability mass function (the uniform distribution) for the toss of a die. But, there is still some variability from bin to bin that does not look as uniform as the ideal probability distribution illustrated in Figure 3.7. The message to take away from this illustration is that most biomedical research reports the outcomes of a small number of samples. It is clear from the die example that the statistics of the underlying random process are very difficult to discern from a small sample, yet most biomedical research relies on data from small samples.

    FIGURE 3.6: Examples of a symmetric (top), skewed (middle), and bimodal (bottom) histogram. In each case, 2000 samples were drawn from the underlying populations. (Axes in each panel: Measure versus Frequency.)
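
    The die experiment described above can be imitated with a pseudorandom number generator. The sketch below is only an illustration: the function names, the fixed seed, and the use of Python's random module in place of a physical die are all assumptions. It computes the relative frequency of each face and the largest departure from the ideal value of 1/6.

```python
import random
from collections import Counter


def relative_frequencies(num_tosses, rng):
    """Simulate tosses of a fair six-sided die; return relative frequency per face."""
    tosses = [rng.randint(1, 6) for _ in range(num_tosses)]
    counts = Counter(tosses)
    return {face: counts[face] / num_tosses for face in range(1, 7)}


def max_deviation(freqs):
    """Largest absolute gap between an observed relative frequency and 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())


rng = random.Random(0)  # fixed seed so that the run is repeatable
small = relative_frequencies(50, rng)
large = relative_frequencies(1000, rng)

print("50 tosses:  ", small, "max deviation", round(max_deviation(small), 3))
print("1000 tosses:", large, "max deviation", round(max_deviation(large), 3))
```

    With larger numbers of tosses the deviations tend to shrink, although any single run of 50 tosses can look quite different from the uniform ideal.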

    3.5 GENERAL APPROACH TO STATISTICAL ANALYSIS

    We have now collected our data and looked at some graphical summaries of the data. Now we will use numerical summary, also known as statistics, to try to describe the nature of the underlying population or process from which we have taken our samples. From these descriptive statistics, we assume a probability model or probability distribution for the underlying population or process and then select the appropriate statistical tests to test hypotheses or make decisions. It is important to note that the conclusions one may draw from a statistical test depend on how well the assumed probability model fits the underlying population or process.

    FIGURE 3.7: The probability density function for a discrete random variable (probability mass function). In this case, the random variable is the value of a toss of a single die. Note that each of the six possible outcomes has a probability of occurrence of 1 in 6. This probability density function is also known as a uniform probability distribution. (Panel title: Probability Mass Function. Axes: Result of Toss of Single Dice versus Relative Frequency.)


    FIGURE 3.8: Histograms representing the outcomes of experiments in which a single die is tossed 50 (top) and 2000 times (lower), respectively. Note that as the sample size increases, the histogram approaches the true probability distribution illustrated in Figure 3.7. (Panel titles: Histogram of 50 Dice Tosses; Histogram of 2000 Dice Tosses. Axes: Value of Dice Toss versus Relative Frequency.)

    As stated in the Introduction, biomedical engineers are trying to make decisions about populations or processes to which they have limited access. Thus, they design experiments and collect samples that they think will fairly represent the underlying population or process. Regardless of what type of statistical analysis will result from the investigation or study, all statistical analysis should follow the same general approach:

    1. Measure a limited number of representative samples from a larger population.
    2. Estimate the true statistics of the larger population from the sample statistics.

    Some important concepts need to be addressed here. The first concept is somewhat obvious. It is often impossible or impractical to take measurements or observations from an entire population. Thus, the biomedical engineer will typically select a smaller, more practical sample that represents the underlying population and the extent of variability in the larger population. For example, we cannot possibly measure the resting body temperature of every person on earth to get an estimate of normal body temperature and normal range. We are interested in knowing what the normal body temperature is, on average, of a healthy human being and the normal range of resting temperatures as well as the likelihood or probability of measuring a specific body temperature under healthy, resting conditions. In trying to determine the characteristics or underlying probability model for body temperature for healthy, resting individuals, the researcher will select, at random, a sample of healthy, resting individuals and measure their individual resting body temperatures with a thermometer. The researcher will have to consider the composition and size of the sample population to adequately represent the variability in the overall population. The researcher will have to define what characterizes a normal, healthy individual, such as age, size, race, sex, and other traits. If a researcher were to collect body temperature data from such a sample of 3000 individuals, he or she may plot a histogram of temperatures measured from the 3000 subjects and end up with the following histogram (Figure 3.9). The researcher may also calculate some basic descriptive statistics for the 3000 samples, such as sample average (mean), median, and standard deviation.

    FIGURE 3.9: Histogram for 2000 internal body temperatures collected from a normally distributed population. (Panel title: Body Temperature. Axes: Temperature (F) versus Density.)


    Once the researcher has estimated the sample statistics from the sample population, he or she will try to draw conclusions about the larger (true) population. The most important question to ask when reviewing the statistics and conclusions drawn from the sample population is how well the sample population represents the larger, underlying population.

    Once the data have been collected, we use some basic descriptive statistics to summarize the data. These basic descriptive statistics include the following general measures: central tendency, variability, and correlation.

    3.6 DESCRIPTIVE STATISTICS

    There are a number of descriptive statistics that help us to picture the distribution of the underlying population. In other words, our ultimate goal is to assume an underlying probability model for the population and then select the statistical analyses that are appropriate for that probability model.

    When we try to draw conclusions about the larger underlying population or process from our smaller sample of data, we assume that the underlying model for any sample, event, or measure (the outcome of the experiment) is as follows:

    \[ X = \mu \pm \text{individual differences} \pm \text{situational factors} \pm \text{unknown variables}, \]

    where X is our measure or sample value and is influenced by μ, which is the true population mean; individual differences such as genetics, training, motivation, and physical condition; situational factors such as environmental factors; and unknown variables such as unidentified/nonquantified factors that behave in an unpredictable fashion from moment to moment.

    In other words, when we make a measurement or observation, the measured value represents or is influenced by not only the statistics of the underlying population, such as the population mean, but also factors such as biological variability from individual to individual, environmental factors (time, temperature, humidity, lighting, drugs, etc.), and random factors that cannot be predicted exactly from moment to moment. All of these factors will give rise to a histogram for the sample data, which may or may not reflect the true probability density function of the underlying population. If we have done a good job with our experimental design and collected a sufficient number of samples, the histogram and descriptive statistics for the sample population should closely reflect the true probability density function and descriptive statistics for the true or underlying population. If this is the case, then we can make conclusions about the larger population from the smaller sample population. If the sample population does not reflect the variability of the true population, then the conclusions we draw from statistical analysis of the sample data may be of little value.


    There are a number of probability models that are useful for describing biological and manufacturing processes. These include the normal, Poisson, exponential, and gamma distributions [10]. In this book, we will focus on populations that follow a normal distribution because this is the most frequently encountered probability distribution used in describing populations. Moreover, the most frequently used methods of statistical analysis assume that the data are well modeled by a normal (bell-curve) distribution. It is important to note that many biological processes are not well modeled by a normal distribution (such as heart rate variability), and the statistics associated with the normal distribution are not appropriate for such processes. In such cases, nonparametric statistics, which do not assume a specific type of distribution for the data, may serve the researcher better in understanding processes and making decisions. However, using the normal distribution and its associated statistics is often adequate given the central limit theorem, which, loosely stated, says that the sum of many independent random processes with arbitrary distributions tends toward a random variable with a normal distribution. One can assume that most biological phenomena result from a sum of random processes.

    3.6.1 Measures of Central Tendency

    There are several measures that reflect the central tendency or concentration of a sample population: sample mean (arithmetic average), sample median, and sample mode.

    The sample mean may be estimated from a group of samples, x_i, where i is the sample number, using the formula below. Given n data points, x_1, x_2, ..., x_n:

    \[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]

    In practice, we typically do not know the true mean, μ, of the underlying population; instead, we try to estimate the true mean, μ, of the larger population. As the sample size becomes large, the sample mean, x̄, should approach the true mean, μ, assuming that the statistics of the underlying population or process do not change over time or space.

    One of the problems with using the sample mean to represent the central tendency of a population is that the sample mean is susceptible to outliers. This can be problematic and often deceiving when reporting the average of a population that is heavily skewed. For example, when reporting income for a group of new college graduates for which one is an NBA player who has just signed a multimillion-dollar contract, the estimated mean income will be much greater than what most graduates earn. The same misrepresentation is often evident when reporting mean value for homes in a specific geographic region where a few homes valued on the order of a million dollars can hide the fact that several hundred other homes are valued at less than $200,000.
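
    The effect of a single outlier on the sample mean can be seen in a short sketch. The income figures are invented for this illustration and are not data from the text; the median, which is discussed next in this section, is included for comparison.

```python
import statistics

# Hypothetical annual incomes (in dollars) for nine graduates.
typical = [42000, 45000, 47000, 48000, 50000, 51000, 52000, 55000, 60000]
# The same group plus one graduate with a multimillion-dollar contract.
with_outlier = typical + [5000000]

mean_typical = statistics.mean(typical)                # 50000
mean_with_outlier = statistics.mean(with_outlier)      # 545000
median_with_outlier = statistics.median(with_outlier)  # 50500
```

    One extreme value moves the mean by more than a factor of ten, while the median barely changes.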


    Another useful measure for summarizing the central tendency of a population is the sample median. The median value of a group of observations or samples, x_i, is the middle observation when the samples, x_i, are listed in descending order.

    For example, suppose we have the following values for tidal volume of the lung:

    2, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3.

    We can find the median value by first ordering the data in descending order:

    2.5, 2.2, 2.0, 1.8, 1.5, 1.4, 1.3, 1.3,

    and then crossing off values on each end until we reach the middle, which leaves 1.8 and 1.5. In this case, there are two middle values; thus, the median is the average of those two values, which is 1.65.

    Note that if the number of samples, n, is odd, the median will be the middle observation. If the sample size, n, is even, then the median equals the average of the two middle observations. Compared with the sample mean, the sample median is less susceptible to outliers. It ignores the skew in a group of samples or in the probability density function of the underlying population. In general, to fairly represent the central tendency of a collection of samples or the underlying population, we use the following rule of thumb:

    1. If the sample histogram or probability density function of the underlying population is symmetric, use the mean as a central measure. For such populations, the mean and median are about equal, and the mean estimate makes use of all the data.
    2. If the sample histogram or probability density function of the underlying population is skewed, the median is a fairer measure of the center of the distribution.

    Another measure of central tendency is the mode, which is simply the most frequent observation in a collection of samples. In the tidal volume example given above, 1.3 is the most frequently occurring sample value. The mode is not used as frequently as the mean or median in representing central tendency.
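
    The three measures of central tendency can be checked against the tidal volume example with Python's standard statistics module. This is a small sketch; only the variable names are assumptions, and the data are the eight values listed in the text.

```python
import statistics

tidal_volume = [2, 1.5, 1.3, 1.8, 2.2, 2.5, 1.4, 1.3]

ordered = sorted(tidal_volume, reverse=True)     # descending order, as in the text
mean_value = statistics.mean(tidal_volume)       # arithmetic average
median_value = statistics.median(tidal_volume)   # average of the two middle values
mode_value = statistics.mode(tidal_volume)       # most frequent observation

print(ordered)
print(mean_value, median_value, mode_value)
```

    The median and mode agree with the values worked out by hand above (1.65 and 1.3).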

    3.6.2 Measures of Variability

    Measures of central tendency alone are insufficient for representing the statistics of a population or process. In fact, it is usually the variability in the population that makes things interesting and leads to uncertainty in decision making. The variability from subject to subject, especially in physiological function, is what often makes finding foolproof diagnosis and treatment so difficult. What works for one person often fails for another, and it is not the mean or median that picks up on those subject-to-subject differences, but rather the variability, which is reflected in differences in the probability models underlying those different populations.

    When summarizing the variability of a population or process, we typically ask, "How far from the center (sample mean) do the samples (data) lie?" To answer this question, we typically use the following estimates that represent the spread of the sample data: interquartile range, sample variance, and sample standard deviation.

    The interquartile range is the difference between the first and third quartiles of the sample data. For sampled data, the median is also known as the second quartile, Q2. Given Q2, we can find the first quartile, Q1, by simply taking the median value of those samples that lie below the second quartile. We can find the third quartile, Q3, by taking the median value of those samples that lie above the second quartile. As an illustration, we have the following samples:

    1, 3, 3, 2, 5, 1, 1, 4, 3, 2.

    If we list these samples in descending order,

    5, 4, 3, 3, 3, 2, 2, 1, 1, 1,

    the median value and second quartile for these samples is 2.5. The first quartile, Q1, can be found by taking the median of the following samples,

    2.5, 2, 2, 1, 1, 1,

    which is 1.5. In addition, the third quartile, Q3, may be found by taking the median value of the following samples:

    5, 4, 3, 3, 3, 2.5,

    which is 3. Thus, the interquartile range is Q3 - Q1 = 3 - 1.5 = 1.5.
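
    The quartile calculation can be reproduced with a short sketch. The function name is hypothetical, and the convention of including the median value in each half is inferred from the lists shown in the worked example; other quartile conventions, such as those used by statistical software, may give slightly different values.

```python
import statistics


def quartiles_as_in_example(samples):
    """Q1, Q2, Q3 computed from halves that each include the median value."""
    ordered = sorted(samples)
    n = len(ordered)
    half = n // 2
    q2 = statistics.median(ordered)
    lower = ordered[:half] + [q2]       # samples below the median, plus the median
    upper = [q2] + ordered[n - half:]   # the median, plus samples above it
    return statistics.median(lower), q2, statistics.median(upper)


samples = [1, 3, 3, 2, 5, 1, 1, 4, 3, 2]
q1, q2, q3 = quartiles_as_in_example(samples)
iqr = q3 - q1  # 3 - 1.5 = 1.5
```

    For these ten samples, the sketch gives Q1 = 1.5, Q2 = 2.5, and Q3 = 3, so the interquartile range is 1.5.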

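    Sample variance, s², is defined as the average squared distance of the data from the sample mean (with n - 1 rather than n in the denominator), and the formula for estimating s² from a collection of samples, x_i, is

    \[ s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^{2} . \]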


    Sample standard deviation, s, which is more commonly referred to in describing the variability of the data, is

    \[ s = \sqrt{s^{2}} \quad \text{(same units as original samples)}. \]

    It is important to note that for normal distributions (symmetrical histograms), sample mean and sample standard deviation are the only parameters needed to describe the statistics of the underlying phenomenon. Thus, if one were to compare two or more normally distributed populations, one would only need to test the equivalence of the means and variances of those populations.
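
    The variance and standard deviation formulas can be written out directly and compared with Python's standard statistics module, which also uses the n - 1 denominator for its sample variance. The function name sample_variance is an assumption, and the data are reused from the interquartile range example purely for illustration.

```python
import math
import statistics


def sample_variance(samples):
    """Sample variance with the n - 1 denominator."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


samples = [1, 3, 3, 2, 5, 1, 1, 4, 3, 2]
s2 = sample_variance(samples)  # 16.5 / 9, about 1.83
s = math.sqrt(s2)              # about 1.35, in the same units as the samples

# Cross-check against the library implementations.
library_s2 = statistics.variance(samples)
library_s = statistics.stdev(samples)
```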

    CHAPTER 4

    Assuming a Probability Model From the Sample Data

    Now that we have collected the data, graphed the histogram, and estimated measures of central tendency and variability, such as mean, median, and standard deviation, we are ready to assume a probability model for the underlying population or process from which we have obtained samples. At this point, we will make a rough assumption using simple measures of mean, median, standard deviation, and the histogram. But it is important to note that there are more rigorous tests, such as the χ² test for normality [7], to determine whether a particular probability model is appropriate to assume from a collection of sample data.

    Once we have assumed an appropriate probability model, we may select the appropriate statistical tests that will allow us to test hypotheses and draw conclusions with some level of confidence. The probability model will dictate what level of confidence we have when accepting or rejecting a hypothesis.

    There are two fundamental questions that we are trying to address when assuming a probability model for our underlying population:

    1. How confident are we that the sample statistics are representative of the entire population?
    2. Are the differences in the statistics between two populations significant, resulting from factors other than chance alone?

    To declare any level of confidence in making statistical inference, we need a mathematical model that describes the probability that any data value might occur. These models are called probability distributions.

    There are a number of probability models that are frequently assumed to describe biological processes. For example, when describing heart rate variability, the probability of observing a specific time interval between consecutive heartbeats might be described by an exponential distribution [1, 8]. Figure 3.6 in Chapter 3 illustrates a histogram for samples drawn from an exponential distribution. Note that this distribution is highly skewed to the right. For R-R intervals, such a probability function makes sense physiologically because the individual heart cells have a refractory period that prevents them from contracting in less than a minimum time interval. Yet, a very prolonged time interval may occur between beats, giving rise to some long time intervals that occur infrequently.

    The most frequently assumed probability model for most scientific and engineering applications is the normal or Gaussian distribution. This distribution is illustrated by the solid black line in Figure 4.1 and is often referred to as the bell curve because it looks like a musical bell.

    The equation that gives the probability density, f(x), of observing a specific value of x from the underlying normal population is

    \[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty, \]

    where μ is the true mean of the underlying population or process and σ is the standard deviation of the same population or process. A graph of this equation is illustrated by the solid, smooth curve in Figure 4.1. The area under the curve equals one.

    Note the following about the normal distribution:

    1. It is a symmetric, bell-shaped curve completely described by its mean, μ, and standard deviation, σ.
    2. By changing μ and σ, we stretch and slide the distribution.
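
    The normal density can be evaluated numerically as a check on its stated properties. The sketch below is an illustration under assumptions: the function names, the trapezoidal integration, and the choice of integrating over eight standard deviations on either side of the mean are not from the text.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal (Gaussian) probability density evaluated at x."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_curve(mu, sigma, num_steps=20000):
    """Trapezoidal estimate of the area from mu - 8*sigma to mu + 8*sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    step = (hi - lo) / num_steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    total += sum(normal_pdf(lo + i * step, mu, sigma) for i in range(1, num_steps))
    return total * step


peak = normal_pdf(0.0)                      # height of the standard normal at its mean
area = area_under_curve(mu=5.0, sigma=2.0)  # close to 1 for any mu and sigma
```

    Changing mu slides the curve along the horizontal axis and changing sigma stretches it, but the area under the curve remains one.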

    FIGURE 4.1: A histogram of 1000 samples drawn from a normal distribution is illustrated. Superimposed on the histogram is the ideal normal curve representing the normal probability distribution function. (Panel title: Histogram of Measure, with Normal Curve. Axes: Normalized Measure versus Relative Frequency.)


    Figure 4.1 also illustrates a histogram that is obtained when we randomly select 1000 samples from a population that is normally distributed and has a mean of 0 and a variance of 1. It is important to recognize that as we increase the sample size n, the histogram approaches the ideal normal distribution shown with the solid, smooth line. But, at small sample sizes, the histogram may look very different from the normal curve. Thus, from small sample sizes, it may be difficult to determine if the assumed model is appropriate for the underlying population or process, and any statistical tests that we perform may not allow us to test hypotheses and draw conclusions with any real level of confidence.

    We can perform linear operations on our normally distributed random variable, x, to produce another normally distributed random variable, y. These operations include multiplication of x by a constant and addition of a constant (offset) to x. Figure 4.2 illustrates histograms for samples drawn from each of populations x and y. We note that the distribution for y is shifted (the mean is now equal to 5) and the variance has increased with respect to x.
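
    The effect of a linear operation can be illustrated with a short simulation. This is a sketch only: the relation y = 2x + 5 is taken from the label in Figure 4.2, while the seed and the number of samples are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Samples from a standard normal population (mean 0, variance 1).
x = [rng.gauss(0.0, 1.0) for _ in range(10000)]

# Linear operation: multiply by a constant and add an offset.
y = [2.0 * xi + 5.0 for xi in x]

print(statistics.mean(x), statistics.variance(x))  # near 0 and 1
print(statistics.mean(y), statistics.variance(y))  # near 5 and 4
```

    The offset moves the mean to 5, and the multiplication by 2 scales the variance by 2 squared, that is, by 4.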

    FIGURE 4.2: Histograms are shown for samples drawn from populations x and y, where y is simply a linear function of x (y = 2x + 5). Note that the mean and variance of y differ from x, yet both are normal distributions. (Horizontal axis: x or y value; vertical axis: frequency.)

    One test that we may use to determine how well a normal probability model fits our data is to count how many samples fall within ±1 and ±2 standard deviations of the mean. If the data and underlying population or process are well modeled by a normal distribution, 68% of the samples should lie within ±1 standard deviation from the mean and 95% of the samples should lie within ±2 standard deviations from the mean. These percentages are illustrated in Figure 4.3. It is important to remember these few numbers, because we will frequently use this 95% interval when drawing conclusions from our statistical analysis.
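
    The counting test described above might be sketched as follows. The helper name `fraction_within` and the simulated data are assumptions made for illustration; real measurements would take the place of `data`.

```python
import random
import statistics


def fraction_within(samples, k):
    """Fraction of samples within k sample standard deviations of the sample mean."""
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    inside = sum(1 for v in samples if abs(v - m) <= k * s)
    return inside / len(samples)


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for measured data

print(fraction_within(data, 1))  # roughly 0.68 if a normal model fits
print(fraction_within(data, 2))  # roughly 0.95 if a normal model fits
```

    Fractions far from 0.68 and 0.95 suggest that a normal model may not describe the data well.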

    Another means for determining how well our sampled data, x, represent a normal distribution is to estimate Pearson's coefficient of skew (PCS) [5]. The coefficient of skew is given by

    $$\mathrm{PCS} = \frac{3\,(\bar{x} - x_{\mathrm{median}})}{s}.$$

    If |PCS| > 0.5, we assume that our samples were not drawn from a normally distributed population.
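
    A minimal sketch of the skew check follows. The two small data sets are hypothetical, and the function name `pearson_skew` is an assumption.

```python
import statistics


def pearson_skew(samples):
    """Pearson's coefficient of skew: 3 * (mean - median) / s."""
    mean = statistics.mean(samples)
    median = statistics.median(samples)
    s = statistics.stdev(samples)
    return 3.0 * (mean - median) / s


symmetric = [1, 2, 3, 4, 5, 6, 7]   # hypothetical, mean equals median
skewed = [1, 1, 1, 2, 2, 3, 20]     # hypothetical, one large value pulls the mean up

print(pearson_skew(symmetric))  # 0 for perfectly symmetric data
print(pearson_skew(skewed))     # magnitude above 0.5 flags the data as skewed
```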

    When we collect data, the data are typically collected in many different types of physical units (volts, degrees Celsius, newtons, centimeters, grams, etc.). For us to use tables that have been developed for probability models, we need to normalize the data so that the normalized data will have a mean of 0 and a standard deviation of 1. Such a normal distribution is called a standard normal distribution and is illustrated in Figure 4.1.

    FIGURE 4.3: Histogram for samples drawn from a normally distributed population. For a normal distribution, 68% of the samples should lie within ±1 standard deviation from the mean (0 in this case) and 95% of the samples should lie within ±2 standard deviations (1.96 to be precise) of the mean. (Horizontal axis: normalized value (z score); vertical axis: frequency.)


    The standard normal distribution has a bell-shaped, symmetric distribution with μ = 0 and σ = 1.

    To convert normally distributed data to the standard normal value, we use the following formulas,

    $$z = \frac{x - \mu}{\sigma} \quad \text{or} \quad z = \frac{x - \bar{x}}{s},$$

    depending on whether we know the true mean, μ, and standard deviation, σ, or we only have the sample estimates, x̄ and s.

    For any individual sample or data point, x_i, from a sample with mean, x̄, and standard deviation, s, we can determine its z score from the following formula:

    $$z_i = \frac{x_i - \bar{x}}{s}.$$

    For an individual sample, the z score is a normalized or standardized value. We can use this value with our equations for probability density function or our standardized probability tables [3] to determine the probability of observing such a sample value from the underlying population.

    The z score can also be thought of as a measure of the distance of the individual sample, x_i, from the sample average, x̄, in units of standard deviation. For example, if a sample point, x_i, has a z score of z_i = 2, it means that the data point, x_i, is 2 standard deviations from the sample mean.

    We use normalized z scores instead of the original data when performing statistical analysis because the tables for the normalized data are already worked out and available in most statistics texts or statistical software packages. In addition, by using normalized values, we need not worry about the absolute amplitude of the data or the units used to measure the data.
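
    To illustrate that z scores do not depend on the measurement units, the sketch below normalizes the same hypothetical heights expressed in centimeters and in meters. The data values and the function name `z_scores` are assumptions.

```python
import statistics


def z_scores(samples):
    """Normalize each sample: z_i = (x_i - sample mean) / sample standard deviation."""
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    return [(v - m) / s for v in samples]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical measurements
heights_m = [h / 100.0 for h in heights_cm]       # same data in different units

z_cm = z_scores(heights_cm)
z_m = z_scores(heights_m)

print([round(z, 3) for z in z_cm])
print([round(z, 3) for z in z_m])  # same values: the units cancel
```

    After normalization the scores have mean 0 and standard deviation 1, whatever the original units were.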

    4.1 THE STANDARD NORMAL DISTRIBUTION

    The standard normal distribution is illustrated in Table 4.1.

    The z table associated with this figure provides table entries that give the probability that z ≤ a, which equals the area under the normal curve to the left of z = a. If our data come from a normal distribution, the table tells us the probability or chance of our sample value or experimental outcomes having a value less than or equal to a.

    Thus, we can take any sample and compute its z score as described above and then use the z table to find the probability of observing a z value that is less than or equal to some normalized value, a. For example, the probability of observing a z value that is less than or equal to 1.96 is 97.5%. Thus, the probability of observing a z value greater than 1.96 is 2.5%. In addition, because of symmetry in the distribution, we know that the probability of observing a z value greater than −1.96 is also 97.5%, and the probability of observing a z value less than or equal to −1.96 is 2.5%. Finally, the probability of observing a z value between −1.96 and 1.96 is 95%. The reader should study the z table and associated graph of the z distribution to verify that the probabilities (or areas under the probability density function) described above are correct.

    TABLE 4.1: Standard z distribution function: areas under standardized normal density function

    [Figure: z distribution (horizontal axis: z; vertical axis: frequency). The area to the left of z_α equals Pr(z < z_α) = 1 − α; thus, the area in the tail to the right of z_α equals α.]

    z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
    0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
    0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
    0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
    1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
    1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
    1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
    2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
    2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
    2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
    2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
    3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
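
    The probabilities quoted for z = ±1.96 can also be checked in software rather than read from a printed table. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

p_le = std_normal.cdf(1.96)       # Pr(z <= 1.96), area to the left
p_gt = 1.0 - p_le                 # Pr(z > 1.96), area in the right tail
p_le_neg = std_normal.cdf(-1.96)  # Pr(z <= -1.96), equal to p_gt by symmetry
p_between = p_le - p_le_neg       # Pr(-1.96 <= z <= 1.96)

print(round(p_le, 4), round(p_gt, 4), round(p_le_neg, 4), round(p_between, 4))

# The inverse lookup: the z value that leaves an area of 0.025 in the right tail.
z_crit = std_normal.inv_cdf(1.0 - 0.025)
print(round(z_crit, 3))
```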

    Often, we need to determine the probability that an experimental outcome falls between two values or that the outcome is greater than some value a or less than some value b. To find these areas, we can use the following important formulas, where Pr is the probability:

    $$\Pr(a \le z \le b) = \Pr(z \le b) - \Pr(z \le a) = \text{area between } z = a \text{ and } z = b.$$

    $$\Pr(z \ge a) = 1 - \Pr(z < a) = \text{area to the right of } z = a = \text{area in the right tail}.$$

    Thus, for any observation or measurement, x, from any normal distribution:

    $$\Pr(a \le x \le b) = \Pr\left(\frac{a - \mu}{\sigma} \le z \le \frac{b - \mu}{\sigma}\right),$$

    where μ is the mean of the normal distribution and σ is the standard deviation of the normal distribution.

    In other words, we need to normalize or find the z values for each of our parameters, a and b, to find the area under the standard normal curve (z distribution) that represents the expression on the left side of the above equation.

    Commonly used z values:

    Area in right tail, α    z_α
    0.10                     1.282
    0.05                     1.645
    0.025                    1.96
    0.010                    2.326
    0.005                    2.576

    Example 4.1 The mean intake of fat for males 6 to 9 years old is 28 g, with a standard deviation of 13.2 g. Assume that the intake is normally distributed. Steve's intake is 42 g and Ben's intake is 25 g.

    1. What is the proportion of area between Steve's daily intake and Ben's daily intake?

    2. If we were to randomly select a male between the ages of 6 and 9 years, what is the probability that his fat intake would be 50 g or more?

    Solution: x = fat intake

    1. The problem may be stated as: what is Pr(25 ≤ x ≤ 42)? Assuming a normal distribution, we convert to z scores:

    What is Pr((25 − 28)/13.2 ≤ z ≤ (42 − 28)/13.2)?

    = Pr(−0.227 ≤ z ≤ 1.06) = Pr(z ≤ 1.06) − Pr(z ≤ −0.227) (using the formula for Pr(a ≤ z ≤ b))

    = Pr(z ≤ 1.06) − [1 − Pr(z ≤ 0.227)] = 0.8554 − [1 − 0.5910] = 0.4464, or 44.6% of the area under the z curve.

    2. The problem may be stated as, What is Pr(x > 50)?

    Normalizing to z score, what is Pr(z > (50 − 28)/13.2)?

    = Pr(z > 1.67) = 1 − Pr(z ≤ 1.67) = 1 − 0.9525 = 0.0475, or 4.75% of the area under the z curve.

    Example 4.2 Suppose that the specifications on the range of displacement for a limb indentor are 0.5 ± 0.001 mm. If these displacements are normally distributed, with mean = 0.47 and standard deviation = 0.002, what percentage of indentors are within specifications?

    Solution: x = displacement.

    The problem may be stated as, What is Pr(0.499 ≤ x ≤ 0.501)?

    Using z scores, Pr(0.499 ≤ x ≤ 0.501) = Pr((0.499 − 0.47)/0.002 ≤ z ≤ (0.501 − 0.47)/0.002)

    = Pr(14.5 ≤ z ≤ 15.5) = Pr(z ≤ 15.5) − Pr(z ≤ 14.5) = 1 − 1 = 0.
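
    The hand calculations in Examples 4.1 and 4.2 can be cross-checked with a short script. This is a sketch under the assumption that Python's `statistics.NormalDist` (Python 3.8 or later) stands in for the printed z table; the helper names are illustrative. Because the software does not round z to two decimals, its results differ slightly from the table-based answers.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # plays the role of the z table


def prob_between(a, b, mu, sigma):
    """Pr(a <= x <= b) for x normal with mean mu and standard deviation sigma."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return STD_NORMAL.cdf(z_b) - STD_NORMAL.cdf(z_a)


def prob_greater(a, mu, sigma):
    """Pr(x > a): area in the right tail beyond the z score of a."""
    return 1.0 - STD_NORMAL.cdf((a - mu) / sigma)


p_fat_between = prob_between(25, 42, 28, 13.2)        # Example 4.1, part 1
p_fat_over_50 = prob_greater(50, 28, 13.2)            # Example 4.1, part 2
p_in_spec = prob_between(0.499, 0.501, 0.47, 0.002)   # Example 4.2

print(round(p_fat_between, 3))
print(round(p_fat_over_50, 3))
print(p_in_spec)
```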

    It is useful to note that if the distribution of the underlying population and the associated sample data are not normal (i.e., skewed), transformations may often be used to make the data normal, and the statistics covered in this text may then be used to perform statistical analysis on the transformed data. These transformations on the raw data include logs, square root, and reciprocal.
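
    As a hedged illustration of a transformation, the sketch below draws right-skewed data from a lognormal population (an assumption chosen because its logarithm is exactly normal) and compares Pearson's coefficient of skew before and after taking logs. Real data will rarely transform this cleanly.

```python
import math
import random
import statistics


def pearson_skew(samples):
    """Pearson's coefficient of skew: 3 * (mean - median) / s."""
    return (3.0 * (statistics.mean(samples) - statistics.median(samples))
            / statistics.stdev(samples))


rng = random.Random(2)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]  # simulated skewed data
logged = [math.log(v) for v in raw]                        # log transformation

print(pearson_skew(raw))     # well above 0.5: not normal
print(pearson_skew(logged))  # near 0: consistent with a normal model
```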

    4.2 THE NORMAL DISTRIBUTION AND SAMPLE MEAN

    All statistical analysis follows the same general procedure:

    1. Assume an underlying distribution for the data and associated parameters (e.g., the sample mean).

    2. Scale the data or parameter to a standard distribution.

    3. Estimate confidence intervals using a standard table for the assumed distribution. (The question we ask is, What is the probability of observing the experimental outcome by chance alone?)

    4. Perform hypothesis test (e.g., Student's t test).

    We are beginning with and focusing most of this text on the normal distribution or probability model because of its prevalence in the biomedical field and something called the central limit theorem. One of the most basic statistical tests we perform is a comparison of the means from two or more populations. The sample mean is itself an estimate made from a finite number of samples. Thus, the sample mean, x̄, is itself a random variable that is modeled with a normal distribution [4].

    Is this model for x̄ legitimate? The answer is yes, for large samples, because of the central limit theorem, which states [4, 10]:

    If the individual data points or samples (each sample is a random variable), x_i, come from any arbitrary probability distribution, the sum (and hence, average) of those data points is normally distributed as the sample size, n, becomes large.

    Thus, even if each sample comes from a nonnormal distribution (e.g., a uniform distribution, such as the toss of a die), the sum of those individual samples (such as the sum we use to estimate the sample mean, x̄) will have an approximately normal distribution when n is large. One can easily assume that many of the biological or physiological processes that we measure are the sum of a number of random processes with various probability distributions; thus, the assumption that our samples come from a normal distribution is not unreasonable.
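
    The central limit theorem can be explored with simulated dice. In this sketch the seed, the number of trials, and the choice of 30 dice per average are arbitrary assumptions. A single die is uniform, so every toss lies within 2 standard deviations of the mean; the average of 30 dice behaves much more like a normal variable, with roughly 68% and 95% of averages within 1 and 2 standard deviations.

```python
import random
import statistics

rng = random.Random(3)


def mean_of_dice(n_dice):
    """Average face value of n_dice fair six-sided dice."""
    return sum(rng.randint(1, 6) for _ in range(n_dice)) / n_dice


def fraction_within(samples, k):
    """Fraction of samples within k sample standard deviations of the sample mean."""
    m = statistics.mean(samples)
    s = statistics.stdev(samples)
    return sum(1 for v in samples if abs(v - m) <= k * s) / len(samples)


single = [mean_of_dice(1) for _ in range(5000)]     # uniform, not normal
averaged = [mean_of_dice(30) for _ in range(5000)]  # sample means, n = 30

print(fraction_within(single, 2))    # 1.0: a uniform die has no tails
print(fraction_within(averaged, 1))  # roughly 0.68
print(fraction_within(averaged, 2))  # roughly 0.95
print(statistics.mean(averaged))     # close to 3.5, the mean of one die
```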

    4.3 CONFIDENCE INTERVAL FOR THE SAMPLE MEAN

    Every sample statistic is in itself a random variable with some sort of probability distribution. Thus, when we use samples to estimate the true statistics of a population (which in practice are usually not known and not obtainable), we want to have some level of confidence that our sample estimates are close to the true population statistics or are representative of the underlying population or process.

    In estimating a confidence interval for the sample mean, we are asking the question: How close is our sample mean (estimated from a finite number of samples) to the true mean of the population?

    To assign a level of confidence to our statistical estimates or statistical conclusions, we need to first assume a probability distribution or model for our samples and underlying population and then we need to estimate a confidence interval using the assumed distribution.


    The simplest confidence interval that we can begin with regarding our descriptive statistics is a confidence interval for the sample mean, x̄. Our question is how close is the sample mean to the true population or process mean, μ?

    Before we can answer this, we need to assume an underlying probability model for the sample mean, x̄, and true mean, μ. As stated earlier, it may be shown that for a large sample size, the sample mean, x̄, is well modeled by a normal distribution or probability model. Thus, we will use this model when estimating a confidence interval for our sample mean, x̄.

    Thus, x̄ is estimated from the sample. We then ask, how close is x̄ (sample mean) to the true population mean, μ?

    It may be shown that if we took many groups of n samples and estimated x̄ for each group,

    1. the average or sample mean of x̄ = μ, and

    2. the standard deviation of x̄ = σ/√n.

    Thus, as our sample size, n, gets large, the distribution for x̄ approaches a normal distribution.

    For large n, x̄ follows a normal distribution, and the z score for x̄ may be used to estimate the following:

    $$\Pr(a \le \bar{x} \le b) = \Pr\left(\frac{a - \mu}{\sigma/\sqrt{n}} \le z \le \frac{b - \mu}{\sigma/\sqrt{n}}\right).$$

    This expression assumes a large n and that we know σ.

    Now we look at the case where we might have a large n, but we do not know σ. In such cases, we replace σ with s to get the following expression:

    $$\Pr(a \le \bar{x} \le b) = \Pr\left(\frac{a - \mu}{s/\sqrt{n}} \le z \le \frac{b - \mu}{s/\sqrt{n}}\right),$$

    where s/√n is called the sample standard error and represents the standard deviation for x̄.
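
    The behavior of the sample mean can be checked by simulation. In this sketch the population parameters (mean 10, standard deviation 2), the group size of 25, and the number of groups are hypothetical values chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(4)
mu, sigma, n = 10.0, 2.0, 25  # hypothetical population parameters and group size

# Estimate the sample mean for many independent groups of n samples.
group_means = []
for _ in range(4000):
    group = [rng.gauss(mu, sigma) for _ in range(n)]
    group_means.append(statistics.mean(group))

print(statistics.mean(group_means))   # close to mu
print(statistics.stdev(group_means))  # close to sigma / sqrt(n)
print(sigma / math.sqrt(n))           # theoretical value, 0.4

# In practice only one group is available and sigma is unknown,
# so s replaces sigma to give the sample standard error.
one_group = [rng.gauss(mu, sigma) for _ in range(n)]
standard_error = statistics.stdev(one_group) / math.sqrt(n)
print(standard_error)
```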

    Let us assume now that, for large n, we want to estimate the 95% confidence interval for x̄. We first scale the sample mean, x̄, to a z value (because the central limit theorem says that x̄ is normally distributed):

    $$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}.$$


    We recall that 95% of z values fall within ±1.96 (approximately ±2) of the mean, and for the z distribution,

    $$\Pr(-1.96 \le z \le 1.96) = 0.95.$$

    Substituting for z,

    $$z = \frac{\bar{x} - \mu}{s/\sqrt{n}},$$

    we get

    $$0.95 = \Pr\left(-1.96 \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le 1.96\right).$$

    We use the following notation for the sample standard error:

    $$\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}.$$

    Rearranging terms in the expression above, we note that the probability that μ lies within ±1.96 (or approximately ±2) standard deviations of x̄ is 95%:

    $$0.95 = \Pr\left(\bar{x} - 1.96\,\mathrm{SE}(\bar{x}) \le \mu \le \bar{x} + 1.96\,\mathrm{SE}(\bar{x})\right).$$

    Note that 1.96 is referred to as z_{α/2}. This z value is the value of z for which the area in the right tail of the normal distribution is α/2. If we were to estimate the 99% confidence interval, we would substitute z_{0.01/2}, which is 2.576, into the 1.96 position above.

    Thus, for large n and any confidence level, 1 − α, the 1 − α confidence interval for the true population mean, μ, is given by:

    $$\mu = \bar{x} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{x}).$$

    This means that there is a (1 − α) × 100 percent probability that the true mean lies within the above interval centered about x̄.

    Example 4.3 Estimate of confidence intervals

    Problem: Given a collection of data with x̄ = 505 and s = 100. If the number of samples was 1000, what is the 95% confidence interval for the population mean, μ?

    Solution: If we assume a large sample size, we may use the z distribution to estimate the confidence interval for the sample mean using the following equation:

    $$\mu = \bar{x} \pm z_{\alpha/2}\,\mathrm{SE}(\bar{x}).$$

    We plug in the following values:

    $$\bar{x} = 505; \quad \mathrm{SE}(\bar{x}) = s/\sqrt{n} = 100/\sqrt{1000}.$$

    For 95% confidence interval, α = 0.05.

    Using a z table to locate z(0.05/2), we find that the value of z that gives an area of 0.025 in the right tail is 1.96.

    Plugging x̄, SE(x̄), and z(α/2) into the estimate for the confidence interval above, we find that the 95% confidence interval for μ = [498.80, 511.20].

    Note that if we wanted to estimate the 99% confidence interval, we would simply use a different z value, z(0.01/2), in the same equation. The z value associated with an area of 0.005 in the right tail is 2.576. If we use this z value, we estimate a confidence interval for μ of [496.85, 513.15]. We note that the confidence interval has widened as we increased our confidence level.
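
    The large-sample interval of Example 4.3 might be computed as in the sketch below. The function name `z_confidence_interval` is an assumption, and `statistics.NormalDist.inv_cdf` (Python 3.8 or later) is used in place of the z table, so the critical values carry a few more digits than 1.96 and 2.576.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, s, n, confidence=0.95):
    """Large-sample interval x_bar +/- z(alpha/2) * s / sqrt(n)."""
    alpha = 1.0 - confidence
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # right-tail area alpha/2
    half_width = z_crit * s / math.sqrt(n)
    return (x_bar - half_width, x_bar + half_width)


lo95, hi95 = z_confidence_interval(505, 100, 1000, 0.95)
lo99, hi99 = z_confidence_interval(505, 100, 1000, 0.99)

print(round(lo95, 2), round(hi95, 2))
print(round(lo99, 2), round(hi99, 2))  # wider than the 95% interval
```

    Both intervals are symmetric about x̄ = 505, which gives a quick sanity check on any hand calculation.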

    4.4 THE t DISTRIBUTION

    For small samples, x̄ is no longer normally distributed. Therefore, we use Student's t distribution to estimate the true statistics of the population. The t distribution, as illustrated in Table 4.2, looks like a z distribution but with slower taper at the tails and flatter central region.

    [Figure: t distribution (horizontal axis: t; vertical axis: frequency). The curve changes with df. Table entry = t(α; df), where α is the area in the tail to the right of t(α; df) and df is degrees of freedom.]


    TABLE 4.2: Percentage points for Student's t distribution

    df                    α = area to right of t(α; df)
                          0.10     0.05     0.025    0.01     0.005
    1                     3.078    6.314    12.706   31.821   63.657
    2                     1.886    2.920    4.303    6.965    9.925
    3                     1.638    2.353    3.182    4.541    5.841
    4                     1.533    2.132    2.776    3.747    4.604
    5                     1.476    2.015    2.571    3.365    4.032
    10                    1.372    1.812    2.228    2.764    3.169
    11                    1.363    1.796    2.201    2.718    3.106
    12                    1.356    1.782    2.179    2.681    3.055
    13                    1.350    1.771    2.160    2.650    3.012
    14                    1.345    1.761    2.145    2.624    2.977
    15                    1.341    1.753    2.131    2.602    2.947
    30                    1.310    1.697    2.042    2.457    2.750
    40                    1.303    1.684    2.021    2.423    2.704
    60                    1.296    1.671    2.000    2.390    2.660
    120                   1.289    1.658    1.980    2.358    2.617
    ∞                     1.282    1.645    1.960    2.326    2.576
    z_α (large sample)    1.282    1.645    1.960    2.326    2.576


    We use the t distribution, with slower tapered tails, because with so few samples, we have less certainty about our underlying distribution (probability model).

    Now our normalized value for x̄ is given by

    $$\frac{\bar{x} - \mu}{s/\sqrt{n}},$$

    which is known to have a t distribution rather than the z distribution that we have discussed thus far. The t distribution was first invented by W.S. Gosset [4, 11], a chemist who worked for a brewery. Gosset decided to publish his t distribution under the alias of Student. Hence, we often refer to this distribution as Student's t distribution.

    The t distribution is symmetric like the z distribution and generally has a bell shape. But the amount of spread of the distribution to the tails, or the width of the bell, depends on the sample size, n. Unlike the z distribution, which assumes an infinite sample size, the t distribution changes shape with sample size. The result is that the confidence intervals estimated with t values are more spread out than for the z distribution, especially for small sample sizes, because with such sample sizes, we are penalized for not having sufficient samples to represent the extent of variability of the underlying population or process. Thus, when we are estimating confidence intervals for the sample mean, x̄, we do not have as much confidence in our estimate. Thus, the interval widens to reflect this decreased certainty with smaller sample sizes. In the next section, we estimate the confidence interval for the same example given previously, but using the t distribution instead of the z distribution.
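
    The widening at small sample sizes can be quantified from the tabulated critical values. In the sketch below, the dictionary holds a subset of the α = 0.025 column of Table 4.2; the names `T_025`, `Z_025`, and `widening_factor` are illustrative assumptions.

```python
# Right-tail critical values t(0.025; df), copied from the 0.025 column of Table 4.2.
T_025 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         10: 2.228, 15: 2.131, 30: 2.042, 60: 2.000, 120: 1.980}

Z_025 = 1.960  # large-sample z value for the same tail area


def widening_factor(df):
    """How many times wider a 95% interval is with t than with z, for a tabulated df."""
    return T_025[df] / Z_025


for df in sorted(T_025):
    print(df, T_025[df], round(widening_factor(df), 3))
```

    The factor falls toward 1 as the degrees of freedom grow, which is why the z value is an acceptable stand-in for large samples.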

    4.5 CONFIDENCE INTERVAL USING t DISTRIBUTION

    Like the z tables, there are t tables where the values of t that are associated with different areas under the probability curve are already calculated and may be used for statistical analysis without the need to recalculate the t values. The difference between the z table and t table is that now the t values are a function of the sample size or degrees of freedom. Table 4.2 gives a few lines of the t table from [3].

    To use the table, one simply looks for the intersection of degrees of freedom, df (related to sample size), and the α value that one desires in the right-hand tail. The intersection provides the t value for which the area under the t curve in the right tail is α. In other words, the probability that t will be less than or equal to a specific entry in the table is 1 − α. For a specific sample size, n, the degrees of freedom, df = n − 1.

    Now we simply substitute t for z to find our confidence intervals. So, the confidence interval for the sample mean, x̄, using the t distribution now becomes

    $$\mu = \bar{x} \pm t(\alpha/2;\, n-1)\,\mathrm{SE}(\bar{x}),$$

    where μ is the true mean of the underlying population or process from which we are drawing samples, SE(x̄) = s/√n is the standard error of the sample mean, t is the t value for which there is an area of α/2 in the right tail, and n is the sample size.

    Example 4.4 Confidence interval using t distribution

    Problem: We consider the same example used previously for estimating confidence intervals using z values. In this case, the sample size is small (n = 20), so we now use a t distribution.

    Solution: Now, our estimate for the confidence interval is

    $$\mu = \bar{x} \pm t(\alpha/2;\, n-1)\,\mathrm{SE}(\bar{x}).$$

    Again, we plug in the following values:

    $$\bar{x} = 505; \quad \mathrm{SE}(\bar{x}) = s/\sqrt{n} = 100/\sqrt{20}.$$

    For 95% confidence interval, α = 0.05.

    Using a t table to locate t(0.05/2; 20 − 1), we find that for 19 df, the value of t that gives an area of 0.025 in the right tail is 2.093.

    Plugging x̄, SE(x̄), and t(α/2; n − 1) into the estimate for the confidence interval above, we find that the 95% confidence interval for μ = [458.20, 551.80]. We note that this confidence interval is widened compared with that previously estimated using the z values. This is expected because the smaller sample gives a larger standard error and because the t distribution is wider than the z distribution at small sample sizes, reflecting the fact that we have less confidence in our estimate of x̄ and, hence, μ when the sample size is small.
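
    Example 4.4 can be reproduced with a few lines of Python. This sketch takes the critical value 2.093 from the example itself rather than computing it, because the Python standard library has no t distribution; the function and constant names are assumptions. The same function with the z value 1.960 shows how much of the widening is due to the t distribution alone.

```python
import math


def confidence_interval(x_bar, s, n, critical_value):
    """Interval x_bar +/- critical_value * s / sqrt(n)."""
    half_width = critical_value * s / math.sqrt(n)
    return (x_bar - half_width, x_bar + half_width)


T_CRIT_19_DF = 2.093  # t(0.025; 19), the value quoted in Example 4.4
Z_CRIT = 1.960        # large-sample value, for comparison only

lo_t, hi_t = confidence_interval(505, 100, 20, T_CRIT_19_DF)
lo_z, hi_z = confidence_interval(505, 100, 20, Z_CRIT)

print(round(lo_t, 2), round(hi_t, 2))  # interval using the t value
print(round(lo_z, 2), round(hi_z, 2))  # narrower interval if z were (wrongly) used
```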

    Confidence intervals may be estimated for most descriptive statistics, such as the sample mean, sample variance, and even the line of best fit determined through linear regression [3]. As noted above, the confidence interval reflects the variability in the parameter estimate and is influenced by the sample size, the variability of the population, and the level of confidence that we desire. The greater the desired confidence, the wider the interval. Likewise, the greater the variability in the samples, the wider the confidence interval.

    Example 4.5 Review of probability concepts

    Problem: Assembly times were measured for a sample of 15 glucose infusion pumps. The mean time

    to assem