8/13/2019 MATH600 - BIOSTATISTICS Reviewer.docx
BIOSTAT
INTRODUCTION
STATISTICS: A science whereby inferences are made about specific random phenomena on the
basis of relatively limited sample material.
Mathematical Statistics concerns the
development of new methods of statistical inference
and requires detailed knowledge of abstract
mathematics for its implementation.
Applied Statistics involves the application of mathematical statistical methods to specific subject
areas such as economics, psychology, and public
health.
BIOSTATISTICS: A branch of applied statistics that applies statistical methods to medical and biological
problems.
Standard statistical methods may not necessarily be applicable to all studies.
New biostatistical methods are developed by biostatisticians.
ROLE OF BIOSTATISTICS IN MEDICAL RESEARCH
Observation:
Blood pressure readings of patient X obtained using:
Automatic measuring device = 115 mmHg; highest reading = 130 mmHg
Standard blood pressure cuff = 90 mmHg
Why is there a difference in blood pressure readings
between an automatic machine and a human
observer?
Are the two methods of determining blood pressure
comparable?
Study Questions:
Are the methods of automatic vs. manual
determination of blood pressure comparable?
To address this question, we designed and carried
out the following small-scale study of blood pressure
monitoring machines.
Q & A:
1. No. of machines to be tested.
- 4, since machines may or may not be
comparable in quality.
2. No. of participants to be tested for each machine.
- 100 people at each test location, based on a sample size determination method.
3. Order of taking measurements:
Manual then automated, or vice versa (for our
study, simultaneous readings were
not logistically feasible)
- To rule out any effects that the order of measurement may have, the
order of measurement was randomized
(flipping a coin, using a table of random numbers, etc.)
4. Critical data to be captured via questionnaire to aid in comparison between
the methods.
*Age
*Sex
*Previous Hypertension History
*Body Size (since this variable was seen to
influence the accuracy of readings)
5. Format of recording data to ease future data entry into computers
*Each person assigned a unique
identification number (ID)
*Using a coding form that was keyed in and
verified
*Same coding form entered twice to ensure
accuracy of records
6. Checking accuracy of computerized data.
*Using editing programs to check that all
values of variables fell within specified ranges
*Outliers or aberrant values were manually
checked
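The range check described in step 6 can be sketched as follows. This is only an illustration: the field names, the allowed ranges, and the helper `find_out_of_range` are hypothetical and are not taken from the study.

```python
# Hypothetical plausible ranges per variable; real limits would come
# from the study protocol, not from this sketch.
ALLOWED_RANGES = {
    "age": (18, 100),
    "systolic_bp": (60, 260),
}


def find_out_of_range(records, allowed=ALLOWED_RANGES):
    """Return (id, field, value) for every value outside its allowed range."""
    flagged = []
    for record in records:
        for field, (low, high) in allowed.items():
            value = record[field]
            if not low <= value <= high:
                flagged.append((record["id"], field, value))
    return flagged


# Made-up records: the second one contains a likely keying error.
records = [
    {"id": 1, "age": 45, "systolic_bp": 118},
    {"id": 2, "age": 450, "systolic_bp": 121},
]
print(find_out_of_range(records))
```

Flagged values would then be checked manually against the original coding form rather than corrected automatically.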
NEXT STEPS:
Data Collection -> Data Entry -> Data Editing -> Data Analysis
DATA ANALYSIS: Data obtained from the study can
be summarized using descriptive statistics.
*Descriptive material can be numeric or graphic
> If numeric, data can be tabulated or presented as a
frequency distribution
> If graphic, data can be summarized pictorially
The choice of numeric or graphic descriptive statistics depends on the type of distribution of the data.
1. Continuous Data: Where there are an infinite number of possible
values (e.g. blood pressure measurements)
Means and standard deviations may be used
2. Discrete Data: Where there are only a few possible values
(e.g. sex)
Percentages of people for each value may be considered
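As a rough illustration of the two cases above, the sketch below summarizes a continuous variable by its mean and standard deviation and a discrete variable by percentages. The function names and the data are invented for the example.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_continuous(values):
    """Mean and sample standard deviation of a continuous variable."""
    return mean(values), stdev(values)


def summarize_discrete(values):
    """Percentage of observations falling in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}


# Made-up data for illustration only.
systolic = [110, 120, 130]
sex = ["F", "M", "F", "F"]

print(summarize_continuous(systolic))
print(summarize_discrete(sex))
```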
INFERENTIAL STATISTICS: determining whether the
difference in blood pressure readings is real or due to
chance
Sample size = 98 people from the general population
Estimated mean difference = 14 mmHg
Error in estimated mean difference = ?
True mean difference = d = ?
Inferring the characteristics of a population from a
sample is the central concern of statistical inference.
To accomplish this aim, we need to develop a
probability model, which would tell us how likely it
is to obtain a 14-mmHg difference between the two
methods in a sample of 98 people if there were no
real difference between the two methods over the entire population of users of the machine.
A small enough probability would indicate that the
difference between the two methods is real.
For our study, we used a probability model based on the
t-distribution.
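The notes do not say which t-based procedure was used. As a sketch only, assuming a paired (one-sample) t-test on each person's difference between the two methods, the test statistic could be computed as below. The function name and the data are hypothetical, and converting the statistic into a probability would additionally need a t-distribution table or a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(differences):
    """Return (t, degrees of freedom) for H0: true mean difference = 0.

    t = d_bar / (s_d / sqrt(n)), where d_bar and s_d are the mean and
    sample standard deviation of the per-person differences.
    """
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Made-up differences (machine reading minus observer reading, in mmHg).
t, df = paired_t_statistic([1, 2, 3])
print(t, df)
```

A large absolute value of t relative to the t-distribution with n - 1 degrees of freedom corresponds to a small probability under the no-difference assumption.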
The probability was found to be
NEGATIVELY SKEWED DISTRIBUTIONS - the arithmetic mean tends to be smaller than
the median.
MODE
- The most frequently occurring value among all the observations in a sample.
- Data distributions may have one or more modes.
UNIMODAL: One mode
BIMODAL: Two modes
TRIMODAL and so on: Three or more modes
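A minimal sketch of finding the mode or modes of a sample, using `statistics.multimode` from the Python standard library. The helper name `describe_modes` and the labels it returns are illustrative choices, not part of the notes.

```python
from statistics import multimode


def describe_modes(sample):
    """Return the most frequent value(s) and a label for how many there are."""
    modes = multimode(sample)  # all values tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "trimodal or higher"
    return modes, label


print(describe_modes([1, 2, 2, 3, 3, 4]))
print(describe_modes([5, 5, 1]))
```

Note that when every value occurs exactly once, `multimode` returns all of them, so the label is only meaningful for samples with repeated values.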
GEOMETRIC MEAN
Many types of laboratory data can be expressed as
multiples of 2 or a constant multiplied by a power of
2, that is, values of the form $2^k c$, $k = 0, 1, \ldots$, for some constant $c$.
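For data of this kind the geometric mean is usually computed on the log scale: take the mean of the logarithms, then take the antilog. The sketch below uses that standard definition; the function name and the sample values are made up.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the natural logs of the values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    log_mean = sum(math.log(v) for v in values) / len(values)
    return math.exp(log_mean)


# Made-up values that are powers of 2.
print(geometric_mean([2, 8]))
```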
SOME PROPERTIES OF ARITHMETIC MEAN
Original sample: $x_1, \ldots, x_n$
Translated sample: $x_1 + c, \ldots, x_n + c$ (where $c$ is some
constant)
Let $y_i = x_i + c$, $i = 1, \ldots, n$; then $\bar{y} = \bar{x} + c$.
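The translation property can be checked numerically. The sample and the constant below are arbitrary, and `translate` is just a helper name for the example.

```python
from statistics import mean


def translate(sample, c):
    """Add the constant c to every observation."""
    return [x + c for x in sample]


x = [4, 8, 15]  # arbitrary sample
c = 10          # arbitrary constant
y = translate(x, c)

# The mean of the translated sample equals the original mean plus c.
print(mean(x), mean(y))
```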
MEASURES OF SPREAD
The mean obtained by the two methods is the same.
However, the variability or spread of the
Autoanalyzer method appears to be greater.
RANGE OR VARIABILITY
RANGE is the difference between the
largest and smallest observations in a
sample
- Once the sample is ordered, it is very easy to compute the range.
- Range is very sensitive to extreme observations or outliers.
- The larger the sample size (n), the larger the range tends to be, and the more difficult the comparison between
the ranges from data sets of varying sizes.
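A short sketch of how one extreme observation changes the range. The readings are invented for the example.

```python
def sample_range(sample):
    """Difference between the largest and smallest observations."""
    return max(sample) - min(sample)


readings = [98, 102, 105, 110]     # made-up values
with_outlier = readings + [180]    # same sample plus one extreme value

print(sample_range(readings))
print(sample_range(with_outlier))
```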
*A better approach to quantifying the spread in data
sets is percentiles or quantiles.
*Percentiles are less sensitive to outliers and are not
greatly affected by the sample size.
The pth percentile is the value $V_p$ such that p percent
of the sample points are less than or equal to $V_p$.
The pth percentile is defined by
- The (k+1)th largest sample point if np/100 is not an integer (where k is the largest
integer less than np/100)
- The average of the (np/100)th and (np/100+1)th largest observations if np/100
is an integer
Here "kth largest" counts upward from the smallest value in the ordered sample.
Frequently used percentiles are
- Quartiles (25th, 50th, and 75th percentiles)
- Quintiles (20th, 40th, 60th, and 80th percentiles)
- Deciles (10th, 20th, ..., 90th percentiles)
To compute percentiles, the sample points must be
ordered.
If n is large, a stem-and-leaf plot or a computer
program may be used.
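The percentile rule given above can be sketched directly. This follows the two-case definition in these notes, counting positions from the smallest value of the ordered sample; other software may use different interpolation rules and give slightly different answers. The function name is illustrative.

```python
def percentile(sample, p):
    """pth percentile (0 < p < 100) using the np/100 rule from the notes."""
    if not sample:
        raise ValueError("sample must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(sample)
    n = len(xs)
    k, remainder = divmod(n * p, 100)
    k = int(k)
    if remainder != 0:
        # np/100 is not an integer: take the (k+1)th ordered value,
        # which is index k in 0-based indexing.
        return xs[k]
    # np/100 is an integer: average the (np/100)th and (np/100 + 1)th values.
    return (xs[k - 1] + xs[k]) / 2


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```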
VARIANCE AND STANDARD DEVIATION
If the center of the sample is defined as the
arithmetic mean, then the measure that can
summarize the difference (or deviations) between
the individual sample points and the arithmetic
mean can be expressed as
$d = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})$
That is, d is the average of the deviations. However,
the sum of the deviations of the individual
observations of a sample about the sample mean is
always zero, so d = 0 for every sample.
The standard deviation s is a reasonable measure of
spread if the distribution is bell-shaped.
MEAN DEVIATION
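The measure d does not help distinguish the
difference in spreads between two methods, since the deviations always sum to zero.
Mean Deviation, expressed as $\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$, may
be used.
Alternatively, the sample variance, or variance, which is
essentially the average of the squares of the deviations from
the sample mean, may be used: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Another commonly used measure of spread is the
sample standard deviation, $s = \sqrt{s^2}$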
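The measures of spread in this section can be compared on a small sample. This is a sketch with invented data; `variance` and `stdev` from the Python standard library use the n - 1 divisor of the usual sample variance, and the function name `spread_measures` is an illustrative choice.

```python
from statistics import mean, stdev, variance


def spread_measures(sample):
    """Sum of deviations, mean deviation, sample variance and sample SD."""
    x_bar = mean(sample)
    deviations = [x - x_bar for x in sample]
    return {
        "sum_of_deviations": sum(deviations),  # zero up to rounding error
        "mean_deviation": sum(abs(d) for d in deviations) / len(sample),
        "variance": variance(sample),          # divides by n - 1
        "standard_deviation": stdev(sample),   # square root of the variance
    }


print(spread_measures([2, 4, 4, 4, 5, 5, 7, 9]))
```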
COEFFICIENT OF VARIATION (CV)
Defined as $CV = 100\% \times (s / \bar{x})$
Remains the same regardless of units used
Useful in comparing variability of different samples
with different arithmetic means
Useful for comparing the reproducibility of different
variables.
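To illustrate that the CV does not depend on the units, the sketch below computes it for the same made-up weights expressed in kilograms and in grams. The function name is hypothetical, and the sample mean is assumed to be nonzero.

```python
from statistics import mean, stdev


def coefficient_of_variation(sample):
    """CV in percent: 100 * (sample standard deviation / sample mean)."""
    return 100 * stdev(sample) / mean(sample)


weights_kg = [50, 60, 70]                     # made-up weights
weights_g = [w * 1000 for w in weights_kg]    # same weights in grams

print(coefficient_of_variation(weights_kg))
print(coefficient_of_variation(weights_g))
```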
GRAPHIC METHODS
Graphic methods of displaying data give a quick
overall impression of data. The following are some
graphic methods:
*BAR GRAPHS:
> Used to display grouped data;
> Difficult to contrast;
> Identity of the sample points within the respective
group is lost
*STEM-AND-LEAF PLOTS:
> Easy to compute the median and other quantiles
> Each data point is converted into stem and leaf,
e.g. 438 (stem: 43; leaf:8)
*BOX PLOTS:
> Uses the relationships among the median, upper
quartile, and lower quartile to describe the
skewness or symmetry of a distribution
An outlying value is a value x such that either:
x > upper quartile + 1.5 × (upper quartile - lower quartile)
x < lower quartile - 1.5 × (upper quartile - lower quartile)
An extreme outlying value is a value x such that
either:
x > upper quartile + 3.0 × (upper quartile - lower quartile)
x < lower quartile - 3.0 × (upper quartile - lower quartile)
- A vertical bar connects the upper quartile to the largest nonoutlying value in the sample
- A vertical bar connects the lower quartile to the smallest nonoutlying value in the
sample
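The outlier rules for box plots can be sketched as below. The quartiles are passed in as arguments so the example does not depend on any particular percentile method; the function name, labels, and data are invented for illustration.

```python
def classify_outliers(sample, lower_quartile, upper_quartile):
    """Label each value as 'non-outlying', 'outlying' or 'extreme outlying'."""
    spread = upper_quartile - lower_quartile
    labels = []
    for x in sample:
        if x > upper_quartile + 3.0 * spread or x < lower_quartile - 3.0 * spread:
            labels.append((x, "extreme outlying"))
        elif x > upper_quartile + 1.5 * spread or x < lower_quartile - 1.5 * spread:
            labels.append((x, "outlying"))
        else:
            labels.append((x, "non-outlying"))
    return labels


# Made-up values with lower quartile 10 and upper quartile 20.
print(classify_outliers([12, 18, 36, 55, -6], 10, 20))
```

The extreme rule is checked first because every extreme outlying value also satisfies the ordinary outlying rule.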
OBTAINING DESCRIPTIVE STATISTICS USING A
COMPUTER
Numerous statistical packages may be used.
Excel may be used to compute Average (for
the arithmetic mean), Median (for the
median), StDev (for the standard deviation),
Var (for the variance), GeoMean (for the
geometric mean), and Percentile (for
obtaining arbitrary percentiles from a
sample).
SUMMARY:
Numeric or graphic methods for displaying data help
in:
- Quickly summarizing a data set
- And/or presenting results to others
A data set can be described numerically in terms of
a measure of location and a measure of spread:
Measure of Location
Arithmetic Mean
Median
Mode
Geometric Mean
Measure of Spread
Standard Deviation
Quantiles
Range
Graphic methods include: Bar Graphs and more
exploratory methods such as Stem-and-Leaf Plots
and Box Plots.