MATH600 - BIOSTATISTICS Reviewer.docx

  • 8/13/2019 MATH600 - BIOSTATISTICS Reviewer.docx

    BIOSTAT

    INTRODUCTION

    STATISTICS - A science whereby inferences are made about specific random phenomena on the basis of relatively limited sample material.

    Mathematical Statistics - concerns the development of new methods of statistical inference and requires detailed knowledge of abstract mathematics for its implementation.

    Applied Statistics - involves the application of mathematical statistical methods to specific subject areas such as economics, psychology, and public health.

    BIOSTATISTICS - A branch of applied statistics that applies statistical methods to medical and biological problems.

    Standard statistical methods may not necessarily be applicable to all studies.

    New biostatistical methods are developed by biostatisticians.

    ROLE OF BIOSTATISTICS IN MEDICAL RESEARCH

    Observation:

    Blood pressure readings of patient X obtained using:

    Automatic measuring device = 115 mmHg; highest reading = 130 mmHg

    Standard blood pressure cuff = 90 mmHg

    Why is there a difference in blood pressure readings between an automatic machine and a human observer?

    Are the two methods of determining blood pressure comparable?

    Study Questions:

    Are the methods of automatic vs. manual

    determination of blood pressure comparable?

    To address this question, we designed and carried

    out the following small-scale study of blood pressure

    monitoring machines.

    Q & A:

    1. No. of machines to be tested.

    - 4, since machines may or may not be comparable in quality.

    2. No. of participants for each machine to be tested.

    - 100 people at each test location, based on a sample size determination method.

    3. Order of taking measurements:

    Manual then automated, or vice versa (for our study, simultaneous readings were not logistically feasible)

    - To rule out any effects that the measurement method may have, the order of measurement was randomized (flipping a coin, using a table of random numbers, etc.)

    4. Critical data to be captured via questionnaire to aid in comparison between the methods.

    *Age

    *Sex

    *Previous Hypertension History

    *Body Size (since this variable was seen to influence accurate reading)

    5. Format of recording data to ease future data entry into computers

    *Each person assigned a unique

    identification number (ID)

    *Using a coding form that was keyed in and

    verified

    *Same coding form entered twice to ensure

    accuracy of records

    6. Checking accuracy of computerized data.

    *Using editing programs to check that all values of variables fell within specific ranges

    *Outliers or aberrant values were manually

    checked
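The randomization, double-entry, and range-check steps above can be sketched in a few lines of Python. This is only an illustration: the field names, the allowed ranges in `VALID_RANGES`, and the example records are assumptions, not values from the study.

```python
import random

# Hypothetical plausible ranges; the notes do not state the actual limits used.
VALID_RANGES = {
    "age": (18, 100),          # years (assumed)
    "systolic_bp": (60, 260),  # mmHg (assumed)
}

def assign_order(participant_ids, seed=0):
    """Randomly decide, per participant, which measurement is taken first."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    return {pid: rng.choice(["manual first", "automated first"])
            for pid in participant_ids}

def out_of_range(record, valid_ranges=VALID_RANGES):
    """Return the names of fields that are missing or outside their allowed range."""
    flagged = []
    for field, (low, high) in valid_ranges.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            flagged.append(field)
    return flagged

def entry_mismatches(first_pass, second_pass):
    """Compare two keyings of the same coding form; return the fields that differ."""
    return sorted(k for k in first_pass if first_pass[k] != second_pass.get(k))

record_a = {"id": 17, "age": 42, "systolic_bp": 115}
record_b = {"id": 17, "age": 42, "systolic_bp": 511}  # digits transposed on re-entry

orders = assign_order([1, 2, 3, 4])
flags_a = out_of_range(record_a)
flags_b = out_of_range(record_b)
differences = entry_mismatches(record_a, record_b)
print(orders, flags_a, flags_b, differences)
```

Values flagged by either check would then be looked up manually against the original coding form, as described in item 6.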

    NEXT STEPS:

    Data Collection -> Data Entry -> Data Editing -> Data Analysis

    DATA ANALYSIS - Data obtained from the study can be summarized using descriptive statistics.

    *Descriptive material can be Numeric or Graphic

    > If Numeric, data can be tabulated or presented as

    frequency distribution

    > If Graphic, data can be summarized pictorially

    Choice of numeric or graphic descriptive statistics is dependent on the type of distribution of the data.

    1. Continuous Data: Where there is an infinite number of possible values (e.g. blood pressure measurements)

    Means and standard deviations may be used

    2. Discrete Data: Where there are only a few possible values (e.g. sex)

    Percentages of people for each value may be considered
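As a small illustration of the two cases, the sketch below summarizes a continuous variable with its mean and standard deviation and a discrete variable with percentages. The five readings and the sex codes are made-up values for demonstration only.

```python
from collections import Counter
from statistics import mean, stdev

# Made-up example data (not from the study).
systolic_bp = [115, 130, 90, 122, 108]   # continuous: mmHg
sex = ["F", "M", "F", "F", "M"]          # discrete: only a few possible values

# Continuous data: mean and (sample) standard deviation.
bp_mean = mean(systolic_bp)
bp_sd = stdev(systolic_bp)

# Discrete data: percentage of people for each value.
counts = Counter(sex)
percentages = {value: 100 * n / len(sex) for value, n in counts.items()}

print(bp_mean, round(bp_sd, 2), percentages)
```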

    INFERENTIAL STATISTICS - determining whether the difference in blood pressure readings is real or due to chance

    Sample size = 98 people from the general population

    Estimated mean difference = 14 mmHg

    Error in estimated mean difference = ?

    True mean difference = d = ?

    Inferring the characteristics of a population from a

    sample is the central concern of statistical inference.

    To accomplish this aim, we need to develop a

    probability model, which would tell us how likely it

    is to obtain a 14-mmHg difference between the two

    methods in a sample of 98 people if there were no

    real difference between the two methods over the entire population of users of the machine.

    A small enough probability would indicate that the

    difference between the two methods is real.

    For our study, we used a probability model based on

    the t-distribution.
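One common way to apply a t-distribution to paired readings is the paired t statistic: the mean difference divided by its estimated standard error. The sketch below assumes that form; the notes do not say exactly which test was used, and the six pairs of readings are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Return (t, degrees of freedom) for paired measurements.

    t = mean difference / (SD of differences / sqrt(n)), with n - 1 degrees
    of freedom. A large |t| means a difference this big would be unlikely
    if there were no real difference between the two methods.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)          # estimated mean difference
    s_d = stdev(diffs)           # sample SD of the differences
    return d_bar / (s_d / sqrt(n)), n - 1

# Invented readings in mmHg (not the study data).
automated = [115, 128, 120, 134, 110, 125]
manual = [100, 118, 112, 115, 102, 109]

t_value, df = paired_t_statistic(automated, manual)
print(round(t_value, 2), df)
```

The probability itself is then read from a t table with the given degrees of freedom, or computed by a statistics library (for example, SciPy's `scipy.stats.ttest_rel`, if it is available).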

    The probability was found to be

    NEGATIVELY SKEWED DISTRIBUTIONS - the arithmetic mean tends to be smaller than the median.
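A quick numeric check of this tendency, using a made-up sample with a long tail toward small values:

```python
from statistics import mean, median

# Made-up negatively (left) skewed sample: most values are high, one is very low.
left_skewed = [2, 8, 9, 9, 10, 10, 10]

sample_mean = mean(left_skewed)      # pulled down by the low value
sample_median = median(left_skewed)  # middle value, unaffected by how low the 2 is

print(round(sample_mean, 2), sample_median)
```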

    MODE

    - The most frequently occurring value among all the observations in a sample.

    Data distributions may have one or more modes.

    UNIMODAL - One Mode

    BIMODAL - Two Modes

    TRIMODAL and so on - Three or More Modes
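The sketch below finds every mode of a sample and labels the distribution accordingly. It relies on `statistics.multimode` from the Python standard library (Python 3.8 or later); the helper name `describe_modes` is an illustrative choice.

```python
from statistics import multimode

def describe_modes(sample):
    """Return (list of modes, label) for a sample.

    Note: if every value occurs exactly once, multimode() returns all of
    them, so the label is only meaningful when some value repeats.
    """
    modes = multimode(sample)  # all values tied for the highest frequency
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "trimodal or more")

one_mode = describe_modes([1, 2, 2, 3])
two_modes = describe_modes([1, 1, 2, 2, 3])
print(one_mode, two_modes)
```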

    GEOMETRIC MEAN

    Many types of laboratory data can be expressed as multiples of 2 or a constant multiplied by a power of 2, that is, in the form $2^k c$ for some constant $c$ and exponent $k$.

    SOME PROPERTIES OF ARITHMETIC MEAN

    Original sample: $x_1, \ldots, x_n$

    Translated sample: $x_1 + c, \ldots, x_n + c$ (where $c$ is some constant)

    Let $y_i = x_i + c$, $i = 1, \ldots, n$; then $\bar{y} = \bar{x} + c$
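The translation property follows directly from the definition of the arithmetic mean. Writing $x_i$ for the original sample points, $y_i = x_i + c$ for the translated ones, and $n$ for the sample size:

```latex
\[
\bar{y}
  = \frac{1}{n}\sum_{i=1}^{n} y_i
  = \frac{1}{n}\sum_{i=1}^{n} (x_i + c)
  = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{nc}{n}
  = \bar{x} + c
\]
```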

    MEASURES OF SPREAD

    The mean obtained by the two methods is the same.

    However, the variability or spread of the

    Autoanalyzer method appears to be greater.

    RANGE OR VARIABILITY RANGE - is the difference between the largest and smallest observations in a sample

    Once the sample is ordered, it is very easy to compute the range.

    Range is very sensitive to extreme observations or outliers.

    The larger the sample size (n), the larger the range tends to be, and the more difficult the comparison between the ranges from data sets of varying sizes.
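A short sketch showing how a single extreme observation changes the range. The readings are made-up values in mmHg.

```python
def sample_range(sample):
    """Range = largest observation minus smallest observation."""
    return max(sample) - min(sample)

readings = [110, 112, 115, 118, 121]   # made-up values
with_outlier = readings + [190]        # same sample plus one extreme reading

range_plain = sample_range(readings)
range_outlier = sample_range(with_outlier)
print(range_plain, range_outlier)
```

Adding one observation took the range from 11 to 80, which is why the range is hard to compare across samples of different sizes.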

    *A better approach to quantifying the spread in data

    sets is percentiles or quantiles.

    *Percentiles are less sensitive to outliers and are not

    greatly affected by the sample size.

    The pth percentile is the value Vp such that p percent of the sample points are less than or equal to Vp.

    The pth percentile is defined by

    - The (k+1)th largest sample point if np/100 is not an integer (where k is the largest integer less than np/100)

    - The average of the (np/100)th and (np/100 + 1)th largest observations if np/100 is an integer
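The two-case rule above can be sketched as follows. One assumption is made explicit here: "kth largest" is read as the kth point of the sample sorted in ascending order, which is what makes the 50th percentile equal the median. The function name `percentile` is illustrative, and p is assumed to be a whole number so the integer test is exact.

```python
def percentile(sample, p):
    """pth percentile using the two-case rule from the notes.

    Assumes 0 < p < 100 and counts positions in ascending order.
    """
    ordered = sorted(sample)           # the sample points must be ordered first
    n = len(ordered)
    if n == 0 or not 0 < p < 100:
        raise ValueError("need a non-empty sample and 0 < p < 100")

    k, remainder = divmod(n * p, 100)  # k = whole part of np/100
    if remainder != 0:
        # np/100 is not an integer: take the (k+1)th point (index k, 0-based).
        return ordered[int(k)]
    # np/100 is an integer: average the (np/100)th and (np/100 + 1)th points.
    return (ordered[int(k) - 1] + ordered[int(k)]) / 2

data = [7, 1, 3, 9, 5]
p50 = percentile(data, 50)         # np/100 = 2.5, so the 3rd ordered point
p20 = percentile(data, 20)         # np/100 = 1, so average of 1st and 2nd points
p50_even = percentile([1, 2, 3, 4], 50)
print(p50, p20, p50_even)
```

Statistical packages often use slightly different interpolation rules, so their percentiles may not match this definition exactly on small samples.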

    Frequently used percentiles are

    Quartiles (25th, 50th, and 75th percentiles)

    Quintiles (20th, 40th, 60th, and 80th percentiles)

    Deciles (10th, 20th, ..., 90th percentiles)

    To compute percentiles, the sample points must be

    ordered.

    If n is large, a stem-and-leaf plot or a computer

    program may be used.

    VARIANCE AND STANDARD DEVIATION

    If the center of the sample is defined as the

    arithmetic mean, then the measure that can

    summarize the difference (or deviations) between

    the individual sample points and the arithmetic

    mean can be expressed as

    That is,

    The sum of the deviations of the individual

    observations of a sample about the sample mean is

    always zero.
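A short derivation shows why this sum is always zero, writing $x_i$ for the sample points, $\bar{x}$ for their arithmetic mean, and $n$ for the sample size:

```latex
\[
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
\]
```

Because positive and negative deviations cancel, the plain sum (or average) of the deviations cannot serve as a measure of spread.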

    Standard deviation (s) is a reasonable measure of

    spread if the distribution is bell-shaped.

    MEAN DEVIATION

    The difference d does not help distinguish the

    difference in spreads between two methods.

    Mean Deviation, expressed as $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$, may be used.

    Alternatively, the sample variance (or variance), which is essentially the average of the squares of the deviations from the sample mean, may be used.

    Another commonly used measure of spread is the

    sample standard deviation
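The measures in this section can be computed side by side as in the sketch below. Two assumptions are made: the mean deviation averages the absolute deviations over n, and the sample variance divides by n - 1, which is the usual convention for a sample (the notes say only "average"). The data are made-up.

```python
from math import sqrt

def spread_measures(sample):
    """Return deviation-based measures of spread for a sample."""
    n = len(sample)
    x_bar = sum(sample) / n
    deviations = [x - x_bar for x in sample]

    return {
        "sum_of_deviations": sum(deviations),                        # always 0
        "mean_deviation": sum(abs(d) for d in deviations) / n,       # average |deviation|
        "variance": sum(d * d for d in deviations) / (n - 1),        # sample variance
        "std_dev": sqrt(sum(d * d for d in deviations) / (n - 1)),   # square root of variance
    }

measures = spread_measures([2, 4, 4, 4, 5, 5, 7, 9])  # made-up sample, mean = 5
print(measures)
```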

    COEFFICIENT OF VARIATION (CV)

    Defined as $100\% \times (s / \bar{x})$

    Remains the same regardless of units used

    Useful in comparing variability of different samples

    with different arithmetic means

    Useful for comparing the reproducibility of different

    variables.
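The sketch below checks the unit-invariance claim for a change of scale (kilograms to grams). The weights are made-up, and the check applies to units that differ by a multiplicative factor; a conversion that also shifts the zero point would change the mean without changing s in the same proportion.

```python
from math import isclose
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """CV = 100% x (s / x-bar), using the sample standard deviation."""
    return 100 * stdev(sample) / mean(sample)

weights_kg = [60.0, 72.5, 81.0, 55.5, 90.0]   # made-up values
weights_g = [1000 * w for w in weights_kg]    # same data in different units

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)
print(round(cv_kg, 2), round(cv_g, 2))
```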

    GRAPHIC METHODS

    Graphic methods of displaying data give a quick

    overall impression of data. The following are some

    graphic methods:

    *BAR GRAPHS:

    > Used to display grouped data;

    > Difficult to contrast;

    > Identity of the sample points within the respective group is lost

    *STEM-AND-LEAF PLOTS:

    > Easy to compute the median and other quantiles

    > Each data point is converted into stem and leaf,

    e.g. 438 (stem: 43; leaf:8)

    *BOX PLOTS:

    > Uses the relationships among the median, upper quartile, and lower quartile to describe the skewness or symmetry of a distribution

    An outlying value is a value x such that either:

    x > upper quartile + 1.5 x (upper quartile - lower quartile)

    x < lower quartile - 1.5 x (upper quartile - lower quartile)

    An extreme outlying value is a value x such that either:

    x > upper quartile + 3.0 x (upper quartile - lower quartile)

    x < lower quartile - 3.0 x (upper quartile - lower quartile)

    A vertical bar connects the upper quartile to the largest nonoutlying value in the sample

    A vertical bar connects the lower quartile to the smallest nonoutlying value in the sample
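The outlier rules for box plots can be sketched as a small classifier. The quartiles are passed in as inputs; the values 100 and 120 mmHg used in the example are hypothetical, and the function name `classify_value` is illustrative.

```python
def classify_value(x, lower_q, upper_q):
    """Classify x using the 1.5 x and 3.0 x interquartile-range rules."""
    iqr = upper_q - lower_q  # upper quartile - lower quartile

    # Check the stricter (extreme) rule first, since it implies the weaker one.
    if x > upper_q + 3.0 * iqr or x < lower_q - 3.0 * iqr:
        return "extreme outlying"
    if x > upper_q + 1.5 * iqr or x < lower_q - 1.5 * iqr:
        return "outlying"
    return "not outlying"

# Hypothetical quartiles: lower = 100, upper = 120, so the cutoffs are
# 70 and 150 (outlying) and 40 and 180 (extreme outlying).
labels = {x: classify_value(x, 100, 120) for x in (65, 130, 155, 185)}
print(labels)
```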

    OBTAINING DESCRIPTIVE STATISTICS USING A COMPUTER

    Numerous statistical packages may be used.

    Excel may be used to compute Average (for the arithmetic mean), Median (for the median), StDev (for the standard deviation), Var (for the variance), GeoMean (for the geometric mean), and Percentile (for obtaining arbitrary percentiles from a sample).
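For readers without a spreadsheet, roughly equivalent summaries are available in Python's standard `statistics` module (Python 3.8 or later). This is a sketch on made-up data; percentile conventions differ between tools, so the quartiles here need not match a spreadsheet's output exactly.

```python
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

summary = {
    "average": statistics.mean(sample),              # arithmetic mean
    "median": statistics.median(sample),             # median
    "stdev": statistics.stdev(sample),               # sample standard deviation
    "var": statistics.variance(sample),              # sample variance
    "geomean": statistics.geometric_mean(sample),    # geometric mean
    "quartiles": statistics.quantiles(sample, n=4),  # three quartile cut points
}

for name, value in summary.items():
    print(name, value)
```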

    SUMMARY:

    Numeric or graphic methods for displaying data help

    in:

    Quickly summarizing a data set

    And/or presenting results to others

    A data set can be described numerically in terms of a measure of location and a measure of spread:

    Measure of Location

    Arithmetic Mean

    Median

    Mode

    Geometric Mean

    Measure of Spread

    Standard Deviation

    Quantiles

    Range

    Graphic methods include: Bar Graphs and more

    exploratory methods such as Stem-and-Leaf Plots

    and Box Plots.