1995 Practical Experiments in Statistics


  • 8/13/2019 1995 Practical Experiments in Statistics

    1/7

    Practical Experiments in Statistics

    Craig A. Stone and Lorin D. Mumaw

    San Jose State University, San Jose, CA 95192-0101

    Gaining practical knowledge of statistics is important for undergraduates in the physical sciences. Degree programs in chemistry, physics, biology, and math generally require students to take a course in statistics. Other fields, such as psychology and business, also rely on a knowledge of statistics. Learning the concepts of statistics is essential if students are to understand the quality and especially the limitations of their data. Without this understanding it may be difficult to compare two different observations whose values suggest different conclusions. Statistics can help in designing those experiments by more clearly defining a property or leading to a more firmly established conclusion.

    Although students may be exposed to a thorough theoretical treatment of statistics, they often miss the benefit of reducing this theory to practice. Laboratory experiments are time-consuming, so the size of data sets is limited. Instruments that have a high throughput are expensive. Few are available in a classroom setting, and they are probably unavailable for classes in math, psychology, and business. It is thus difficult to generate the large data sets needed to study statistical concepts.

    The experiments described here can be applied to any field that requires a knowledge of statistics. They are easy to carry out, and they use inexpensive instrumentation. Sealed sources of radioactive nuclides are used to generate the data. Nuclear decay, a microscopic property, produces a natural statistical fluctuation on which the experiments are based. The sources are small, often with an intensity on the order of the ²⁴¹Am sources found in home smoke detectors. Thus, no special licenses or handling procedures are needed for either the sources or the instruments. Radiation-detection instruments are available through several manufacturers who market their equipment to high school and university science programs. An introduction to these concepts can be found in numerous books on statistics. The following are suggested references for various fields: general statistics (1, 2); general sciences, mathematics, and engineering (3, 4); nuclear science applications (5); biology (6, 7); psychology (8, 9); and business (10, 11).

    The primary goal is for students to become familiar with probability distributions. Experiments are designed to acquire a data set large enough to generate a series of frequency distributions. From them, students learn how the size of a data set (i.e., the number of measurements) affects

    Figure 1. Gaussian distributions with mean values of 25, 50, and 100. In most experiments in nuclear science, the width of the curve is defined as the square root of the mean value.
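The square-root rule in the Figure 1 caption can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical and do not come from the article, and the only assumption is the rule itself, that the standard deviation of a counting measurement equals the square root of its mean.

```python
import math


def poisson_sigma(mean_counts):
    """Standard deviation of a counting measurement: the square root of the mean."""
    return math.sqrt(mean_counts)


def relative_sigma_percent(mean_counts):
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * poisson_sigma(mean_counts) / mean_counts


if __name__ == "__main__":
    # 25, 50, and 100 are the means plotted in Figure 1; 1000 is added for contrast.
    for mean_counts in (25, 50, 100, 1000):
        print(mean_counts,
              round(poisson_sigma(mean_counts), 2),
              round(relative_sigma_percent(mean_counts), 1))
```

The relative width shrinks as the mean grows, which is why longer counting intervals give proportionally tighter distributions.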

    [Figure 2 plot: measured value versus measurement number, with the mean indicated.]

    Figure 2. Time distribution for a typical data set. The data set used to construct this figure contained 500 measurements.

    518 Journal of Chemical Education


    Figure 4. The effect that data binning has on the distribution curve. Distributions are shown for bin sizes 2, 5, 10, and 20. The width of graphs is constant.

    The variance of the distribution is $\sigma^2$, and it is equivalent to the mean, n. This provides a simple method of estimating the inherent scatter in the data. The standard deviation in an experiment is the square root of the mean for the distribution. A mean of 100 gives a standard deviation of 10%, 1000 gives 3.2%, and so forth. One thus intuitively knows the magnitude of the uncertainty. Figure 1 illustrates how the width of the distribution varies with the mean.

    Geiger-Muller Counting System

    Experiments are best carried out with radiation-detection equipment. These instruments are compact, simple to use, and have a high data throughput. The instruments are sensitive enough that individual decays can be detected. Measurements are carried out by counting the number of decays that occur within small time intervals. Data will have a natural statistical fluctuation, and this gives the known response used to understand statistical concepts.

    Laboratory courses can use almost any radiation-detection instrument to generate frequency distributions. Many gamma-ray detectors use single crystals of NaI(Tl), Ge, or Si(Li) to convert the radiation into an electrical signal. Liquid scintillation systems are used in many biochemistry programs, and they can be run in an automated mode. Gas-filled counters for alpha and beta spectroscopy can be used as well. Many instruments from physics or chemistry laboratories can be adapted to these experiments.

    Whatever instrument is used, it must have a short cycle time, allowing students to complete a measurement in about 10 s. An experiment with 250 data points may take as much as 50 min to carry out with such a cycle time (including time required to record the results). Some instruments can be operated in an automatic, repetitive mode. Although a system that automatically records the information is more efficient, the extra time spent manually carrying out the experiment has an advantage: Students notice the large scatter in the points when they see the data one measurement at a time.

    Results from the experiments should be displayed using graphics software applications. Some commercial applications have functions that generate a frequency distribution from raw data. If they are not available, a spreadsheet application should be used to process the information before graphing. Data are first sorted by increasing value. Students then count the number of occurrences for each value, plotting the number of occurrences versus value.

    Statistical Fluctuation in a Data Set

    In the first experiment students construct a large data set. Measurements are carried out by placing a radioactive source close to the end window of a Geiger-Muller tube. The time interval should be set so that at least 100 decays are detected in a measurement. If the time interval is too large, the experiment can be long. A set of conditions is chosen (i.e., the time interval and source-to-detector distance), which produces 250 measurements in a reasonable amount of time. Much of the data acquisition time is used to record the results. Students should work in pairs; one partner starts the count and calls the result to the second partner, who records it. A typical set of data is graphed as a time distribution in Figure 2, as the value versus the order in which the measurement was made. The data set used here contains 500 measurements.

    Number of Points and Appropriate Bin Size
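The sort-and-count procedure described earlier for spreadsheets, together with the binning shown in Figure 4, can be sketched in a few lines. This is a minimal illustration rather than the authors' software: the function names and the sample counts are hypothetical, and bins are assumed to start at multiples of the bin size.

```python
from collections import Counter


def frequency_distribution(counts):
    """Return sorted (value, occurrences) pairs for a list of measurements."""
    return sorted(Counter(counts).items())


def binned_distribution(counts, bin_size):
    """Group values into bins of width bin_size; each key is a lower bin edge."""
    binned = Counter((value // bin_size) * bin_size for value in counts)
    return sorted(binned.items())


if __name__ == "__main__":
    # Hypothetical counts standing in for a real data set.
    counts = [98, 101, 101, 103, 99, 101, 104, 98, 107, 95]
    print(frequency_distribution(counts))
    for bin_size in (2, 5, 10, 20):  # the bin sizes shown in Figure 4
        print(bin_size, binned_distribution(counts, bin_size))
```

With only a handful of points, small bins leave most values with one or two occurrences, so the shape of the distribution only emerges with wider bins or a larger data set.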

    Figure 3 shows how the number of points in a data set affects the frequency distribution. The distributions were generated using the same data that produced the time distribution


    Figure 8. Frequency distribution for two instruments. Part a is the frequency distribution for the instrument whose time distribution is shown in part a of Figure 7. Likewise, part b of this figure is the frequency distribution for the instrument whose time distribution is shown in part b of Figure 7.

    ments. If laboratory time is limited, groups work as a team, sharing data sets and independently analyzing the data. A second method is to work with one instrument. After assembling a counting system, students collect the first data set, change one experimental parameter, and then collect the second data set. Some parameters that can be changed: Students can switch to a different power supply or amplifier. They can also change the high voltage, the preamplifier capacitance, or the amplifier gain. Data sets are constructed for each set of conditions.

    Comparing the Stability

    Time distributions are used to compare the stability in the two data sets. These are generated by plotting measurements with their value on the vertical axis and time on the horizontal axis. The number of the measurement (e.g., 1, 2, 3) is taken as time, or $\Delta t$. A constant should be added to each measurement in one data set to vertically offset that distribution from the other. Figure 7 shows an example of such a graph.

    It is possible to measure the stability of an instrument by looking qualitatively at the time distributions. The graphs should look random. A noticeable slope suggests that an experimental parameter is drifting. Linear-regression programs can be used to calculate the slope of the data. Other features to look for include an oscillation that is of a lower frequency than the statistical fluctuation or a region that varies significantly from the mean. These features might suggest that one data set is better than the other or that one instrument performs better than the other. Part a of Figure 7 has a large drift and is obviously the poorer instrument of the two.
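The slope check mentioned above can be done without a dedicated regression program. The following is a minimal sketch using an ordinary least-squares fit of value against measurement number; the function name and the sample data are hypothetical, and the fit assumes the measurements are equally spaced in time.

```python
def drift_slope(values):
    """Least-squares slope of value versus measurement number (1, 2, 3, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    numbers = range(1, n + 1)
    mean_x = sum(numbers) / n
    mean_y = sum(values) / n
    # Sums of cross products and squared deviations for the fit.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(numbers, values))
    sxx = sum((x - mean_x) ** 2 for x in numbers)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical counts from an instrument that may be drifting upward.
    drifting = [598, 604, 611, 607, 619, 625, 622, 634]
    print(round(drift_slope(drifting), 2), "counts per measurement")
```

A slope near zero is what a stable instrument should give; how large a slope counts as significant depends on the scatter in the data and is left to the reader's judgment here.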

    Figure 9. Sampling curve for a homogeneous system (part a). Part b shows the time distribution for this system, and part c shows its frequency distribution.

    Comparing the Frequency Distributions

    Instrumental performance is also determined by comparing the frequency distributions. The distributions should be symmetric and as narrow as possible. Excess noise in one component of the instrument will increase the width of the distribution and can lead to skewing. Without noise the standard deviation is the square root of the mean. If instrumental noise has a Gaussian distribution, then it will combine with the statistical fluctuation by

    $$\sigma^2 = \sigma_d^2 + \sigma_i^2$$

    where $\sigma$ is the observed standard deviation; $\sigma_d$ is the natural statistical fluctuation of the data from nuclear decay; and $\sigma_i$ is the instrumental uncertainty. Students should calculate $\sigma$ to determine if the instrument significantly changes the assumed model (where $\sigma$ is the square root of the mean). The standard deviation $\sigma_i$ is a quantitative figure of merit for comparing the two instruments. Figure 8 shows that no such calculations are necessary to determine which is the optimum instrument.

    Sampling and Inhomogeneity
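As a numerical footnote to the instrument comparison that precedes this section, the instrumental uncertainty can be estimated from a set of repeated counts. The sketch below assumes the usual quadrature rule for independent Gaussian contributions, observed variance = decay variance + instrumental variance, with the decay variance taken as the mean count. The function name and sample data are hypothetical.

```python
import math
import statistics


def instrumental_sigma(values):
    """Estimate the instrumental uncertainty from repeated counts.

    Assumes sigma_observed**2 = sigma_decay**2 + sigma_instrument**2,
    with sigma_decay**2 equal to the mean count.
    """
    mean_counts = statistics.fmean(values)
    observed_variance = statistics.variance(values)  # sample variance (n - 1)
    excess = observed_variance - mean_counts
    # A non-positive excess means no instrumental noise is resolvable
    # above the counting statistics, so report zero.
    return math.sqrt(excess) if excess > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical counts with more scatter than counting statistics alone predict.
    counts = [90, 110, 100, 80, 120]
    print(round(instrumental_sigma(counts), 2))
```

With a real data set of several hundred points, the smaller of the two values obtained for two instruments identifies the one that adds less noise.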

    A common measurement problem is sampling some feature of a large system. What sample size provides representative results? In a fairly homogeneous system a measurement with a particular sample size can take on a Gaussian distribution. The degree of scatter decreases as the size of the sample increases until the sample includes the entire system under study.

    Nuclear-decay data sets can be used to illustrate this, assuming instrumental noise does not significantly distort the distribution. The system is the set of measurements, and the sample is the individual measurement, each collected with a time $\Delta t$. Sample sizes are increased by increasing the measurement time. Assume that the counting times are held to multiples of $\Delta t$. Counting for a longer time period is similar to summing every n measurements. Because the measurement is also its variance, summing the measurements properly propagates the uncertainties: The total uncertainty is still the square root of the sum. The original set of measurements can thus be used to explore variations in sample size.

    Sampling Curves

    A sampling curve is shown in Figure 9a. In this figure the value for a measurement is plotted versus sample size, which is in units of $\Delta t$. The data used to generate this figure are the same as those used to generate Figures 2 and 3. Summed measurements are normalized to 1 $\Delta t$. The shape of the sampling curve can be understood as a series of frequency distributions viewed from above. A homogeneous system will have a time distribution that is uniform, randomly varying about a mean value. This is shown as part b of Figure 9, along with the resulting frequency distribution (part c).

    Figure 10. Sampling curve (part a) for a system whose time distribution has a positive bias. A function was added to the homogeneous distribution giving a rise of 10% over the 500 points. Part b is the time distribution for this system, and part c is the frequency distribution.

    Figure 11. Sampling curve for a system with an exponential bias (part a). The homogeneous system was normalized using a function with an exponential form. Data were then normalized so that the sum of all values is equivalent to that for the homogeneous system. Part b is the time distribution for this system and part c is the frequency distribution.

    Volume 72 Number 6 June 1995 523

    Figure 12. Sampling curve for a two-component system (part a). A Gaussian distribution, centered at measurement number 200, was superimposed on the homogeneous distribution. The data were normalized so that the sum of all values is equivalent to that from the homogeneous system. Part b is the time distribution for this system, and part c is the frequency distribution.

    Inhomogeneous Systems
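Both the sampling curve and the biased systems discussed in this section are built from the same list of counts. The sketch below shows one way to do this: summing every n measurements and normalizing back to one counting interval, and adding a linear ramp followed by renormalization. The function names are hypothetical, and the exact ramp (rising from zero to a fraction of the mean) is an assumption chosen to match the 10% rise quoted for Figure 10, not the authors' stated function.

```python
def sum_every_n(values, n):
    """Sum consecutive groups of n measurements, dropping an incomplete final group."""
    usable = len(values) - len(values) % n
    return [sum(values[i:i + n]) for i in range(0, usable, n)]


def sampling_curve(values, max_n):
    """Map each sample size n to the summed measurements, normalized to one interval."""
    return {n: [total / n for total in sum_every_n(values, n)]
            for n in range(1, max_n + 1)}


def add_linear_bias(values, rise_fraction):
    """Add a ramp rising from 0 to rise_fraction * mean across the set (assumed form),
    then rescale so the sum of all points matches the original data set."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    mean = sum(values) / len(values)
    last = len(values) - 1
    ramped = [v + rise_fraction * mean * i / last for i, v in enumerate(values)]
    scale = sum(values) / sum(ramped)
    return [v * scale for v in ramped]


if __name__ == "__main__":
    # Hypothetical counts standing in for the 500-point data set.
    counts = [612, 598, 640, 605, 621, 633, 590, 617]
    for n, points in sampling_curve(counts, 4).items():
        print(n, [round(p, 1) for p in points])
    print([round(v, 1) for v in add_linear_bias(counts, 0.10)])
```

Plotting every normalized value against its sample size n gives the sampling curve; running the same construction on the biased list shows how the curve broadens.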

    Inhomogeneity skews the results. Spikes may be apparent in the time distribution, and it may have a slope or other nonstandard behavior. The frequency distribution becomes asymmetric, and a larger number of samples is needed to obtain a representative sampling of the system. Three inhomogeneous systems are shown in Figures 10-12. The assumed time distributions were generated by adding a function to the original data set and normalizing the entire data set so that the sum of all points is equivalent to that from the original data set. In this way, the systems are equivalent in total concentration or response but have different inhomogeneities. Figure 10 has a time distribution with a 10% (positive) slope over the 500 points. The frequency distribution is asymmetric and is distorted on the right side; the sampling curve is much broader at large sample sizes.

    In Figure 11 a system is shown that could describe an elemental concentration highly dependent on particle size, an exponential function. The frequency distribution does not appear as a Gaussian function and spreads over a wide range of counts. Likewise, the sampling curve is broadly distributed.

    A system with two components or phases is shown in Figure 12. A Gaussian peak was superimposed on the otherwise random distribution. This peak is evident in the frequency distribution as the region to the right of the primary peak. The sampling curve almost appears to have two components. A low-valued sampling curve, centered near a mean of about 620, is fairly well-defined, and a second weaker sampling curve is suggested near a mean value of about 750.

    Conclusion

    Several courses at San Jose State University have used these experiments for two years. Most of the experience has been in courses in nuclear science and health physics, courses that traditionally have a strong emphasis on statistics and instrumentation. The Chemistry Department teaches a course in scientific computing. As part of this course students use the experiments to learn about data processing and issues of instrument performance. Students in each course carry out these experiments on a variety of the radiation-detection instruments in the Nuclear Science Facility. During the upcoming year the experiments may be extended to courses within the Physics Department and later to other departments around campus.

    Literature Cited

    1. Witte, R. S. Statistics, 4th ed.; Harcourt Brace Jovanovich College: New York, 1993.
    2. Baird, D. C. Experimentation, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 1983.
    3. Larsen, R. J.; Marx, M. L. An Introduction to Mathematical Statistics and Its Applications, 2nd ed.; Prentice Hall: Englewood Cliffs, NJ, 1986.
    4. Hogg, R. V.; Ledolter, J. Applied Statistics for Engineers and Physical Scientists, 2nd ed.; Macmillan: New York, 1992.
    5. Knoll, G. Radiation Detection and Measurement, 2nd ed.; John Wiley and Sons: New York, 1989.
    6. Clarke, G. M. Statistics and Experimental Design, 2nd ed.; Edward Arnold Ltd: Lon
    and Hall: New York, 1985.
    8. Howell, D. C. Fundamental Statistics for the Behavioral Sciences, 2nd ed.; Duxbury: Belmont, CA, 1989.
    9. Ferguson, G. A. Statistical Analysis in Psychology and Education, 4th ed.; McGraw-Hill: New York, 1976.
    10. Winn, P. R.; Johnson, R. H. Business Statistics; Macmillan: New York, 1978.
    11. Mendenhall, W.; Reinmuth, J. E.; Beaver, R.; Duhan, D. Statistics for Management and Economics, 5th ed.; PWS: Boston, 1986.
