1
BIOSTATISTICS – A TOOL FOR RESEARCH AND DATA ANALYSIS
PRESENTED BY SABA BUTT
2
SIGNIFICANCE OF STATISTICS FOR ANALYSIS AND RESEARCH
3
STATISTICS IS NECESSARY FOR ALL FIELDS OF LIFE REQUIRING RESEARCH AND DATA ANALYSIS
In every field we have to analyze facts and interpret them to draw conclusions. That analysis needs statistics to compare qualities and quantities, so that we can reach conclusions that support decision making in business, government, industry, etc., and the development of theories in science.
4
BIOSTATISTICS
STATISTICS IN THE LIFE SCIENCES
5
BIOSTATISTICS IS A DISCIPLINE THAT IS CONCERNED WITH:
• designing experiments and other data collection,
• summarizing information to aid understanding,
• drawing conclusions from data, and
• estimating the present or predicting the future.
In making predictions, statistics uses the companion subject of probability, which models chance mathematically and enables calculations of chance in complicated cases.
6
SOME IMPORTANT DEFINITIONS
7
POPULATION AND SAMPLE
POPULATION: A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all males between the ages of 15 and 18.
SAMPLE: A sample is a subset of a population. Since it is usually impractical to test every member of a population, taking a sample from the population is typically the best approach available.
8
PARAMETER AND STATISTIC
PARAMETER: A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the mean is a measure of central tendency in a population.
STATISTIC: A statistic is a numerical quantity (such as the mean) calculated from a sample.
9
MEASURES OF CENTRAL TENDENCY
Mean (Arithmetic Mean)
Average value of a sample or population
Median
Middle value of sample or population
Mode
The value repeated most
10
The arithmetic mean, or mean, is what is commonly called the average. When the word "mean" is used without a modifier, it can be assumed to refer to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores.
Formula of calculating Population Mean is:
μ = ΣX/N,
where μ = population mean, and
N = number of scores.
If the scores are from a sample, then the symbol X̄ refers to the mean and n refers to the sample size, and the formula is written as:
X̄ = ΣX/n.
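The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` is an assumption made for the example, and the four scores are the small data set used later in this presentation.

```python
def arithmetic_mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty list is undefined")
    return sum(scores) / len(scores)


# Four scores: sum = 19, n = 4
scores = [2, 3, 4, 10]
print(arithmetic_mean(scores))  # 4.75
```

The same function applies to a population (N scores) or a sample (n scores); only the symbol used for the result differs.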
11
Median: The median is the middle of a distribution: half the scores are above the median and half are below it. The median is less sensitive to extreme scores than the mean, which makes it a better measure than the mean for highly skewed distributions.
Example: 5 3 4 2.5 6 (sorted: 2.5 3 4 5 6, so the median is 4)
Mode: The mode is the most frequently occurring score in a distribution and is used as a measure of central tendency. The advantage of the mode as a measure of central tendency is that its meaning is obvious.
Example: 5 3 4 5 6 (the mode is 5)
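As a quick check of the two small data sets on this slide, the median and mode can be computed with Python's standard `statistics` module. The variable names are chosen for this sketch only.

```python
import statistics

# Data sets shown on the slide
median_data = [5, 3, 4, 2.5, 6]
mode_data = [5, 3, 4, 5, 6]

# median() sorts the values and returns the middle one
print(statistics.median(median_data))  # 4

# mode() returns the most frequently occurring value
print(statistics.mode(mode_data))  # 5
```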
12
MEASURES OF DISPERSION
After measuring the central value (the mean), the next step is to find out how well this central value represents all the values, that is, to measure the scatter or dispersion of the data. Several measures give the dispersion. The most important and widely used in research are:
• Variance
• Standard Deviation
• Standard Error of Mean
13
HYPOTHESIS TESTING
Student’s t test
F test
ANOVA
Correlation
Regression
14
EXAMPLE OF DATA ANALYSIS
Comparison of the weight-to-height ratio, expressed as Body Mass Index (BMI), in a population. BMI is calculated as weight in kg / (height in m)².
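The BMI formula can be written as a small function. This is a minimal sketch: the function name `bmi` and the example weight and height are hypothetical and are not taken from the survey data.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical subject: 54 kg, 1.5 m tall
print(bmi(54, 1.5))  # 24.0
```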
General surveys in the USA and Europe have shown that the young population is overweight, which increases the risk of disease. We surveyed the young female population of Punjab University for BMI, measuring the BMI of 400 randomly selected students.
15
Subject No.   BMI      Subject No.   BMI
M-1           36.66    F-1           30.11
M-2           20.21    F-2           28.00
M-3           30.29    F-3           16.87
M-4           29.33    F-4           38.94
M-5           31.97    F-5           35.63
M-6           27.58    F-6           32.69
M-7           25.33    F-7           23.92
M-8           26.90    F-8           25.55
M-9           27.74    F-9           30.87
M-10          27.01    F-10          43.43
M-11          26.82    F-11          35.34
M-12          22.65    F-12          19.65
M-13          31.90    F-13          36.45
M-14          30.81    F-14          34.35
M-15          20.84    F-15          34.15
M-16          25.19    F-16          38.86
M-17          22.98    F-17          26.28
M-18          28.68    F-18          29.52
M-19          22.73    F-19          24.99
M-20          22.86    F-20          29.75
M-21          27.73    F-21          34.58
16
ARITHMETIC MEAN
• We have two tables of data: one giving the BMI of girls, the other the BMI of boys. These are long data tables.
• Now we have to analyze the data to draw conclusions from it. What do we need?
• We need a measure of central tendency to indicate the average BMI, so that we can compare it with other populations, between boys and girls, and with the normal range.
The most common and useful measure for this purpose is the arithmetic mean. It is calculated by taking the sum of all values and dividing it by the number of observations.
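As an illustration, the arithmetic mean can be computed for the BMI values listed in the table above. The lists below are transcribed from that table, and the variable names are assumptions of this sketch. Note that the table lists 21 female subjects, while the t-test output later in the presentation reports N = 30 for BMI-F, so the female mean computed here covers only the listed values.

```python
# BMI values transcribed from the table (M-1 to M-21 and F-1 to F-21)
bmi_male = [36.66, 20.21, 30.29, 29.33, 31.97, 27.58, 25.33,
            26.90, 27.74, 27.01, 26.82, 22.65, 31.90, 30.81,
            20.84, 25.19, 22.98, 28.68, 22.73, 22.86, 27.73]
bmi_female = [30.11, 28.00, 16.87, 38.94, 35.63, 32.69, 23.92,
              25.55, 30.87, 43.43, 35.34, 19.65, 36.45, 34.35,
              34.15, 38.86, 26.28, 29.52, 24.99, 29.75, 34.58]

# Arithmetic mean: sum of all values divided by the number of observations
mean_male = sum(bmi_male) / len(bmi_male)
mean_female = sum(bmi_female) / len(bmi_female)

print(round(mean_male, 2))    # 26.96
print(round(mean_female, 2))  # 30.95
```

The male mean agrees with the value of 26.96 reported for BMI-M in the t-test output.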
17
SAMPLING ERROR
We now have an average value, but is this average really representative of all the values? Some values may be very large and some very small, and in that case the mean does not represent the whole data set well. For example, some students in the sample may have a strong genetic tendency to be overweight, so their values differ from those typical of the population. When a sample happens to include such values, the sample mean can differ from the population mean. This difference is called sampling error, and it can make our result misleading.
18
EXAMPLE
We have four values: 2, 3, 4, 10
Mean = Sum of values / No. of observations
= (2 + 3 + 4 + 10) / 4
= 4.75
The mean is far from three of the four values. This is because of the one large value in the data, 10.
19
STANDARD DEVIATION
• Now we need a statistical measure that tells us how much the values vary, so that we can judge how well the mean represents the data.
• This is the standard deviation: a measure of how the individual values vary from the average value, i.e., the mean.
20
Standard Deviation of that Data
SD = s = √[ ∑(x – x̄)² / (n – 1) ]
where x̄ is the sample mean and n is the number of observations.
Descriptive Statistics from MINITAB
Variable   N   Mean   Median   StDev   SE Mean
C1         4   4.75   3.50     3.59    1.80
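The MINITAB figures for the four values 2, 3, 4, 10 can be reproduced with Python's standard library. This is a sketch, not the MINITAB procedure itself: it assumes the sample standard deviation (n - 1 in the denominator) and takes the standard error of the mean as the standard deviation divided by the square root of n.

```python
import math
import statistics

data = [2, 3, 4, 10]
n = len(data)

mean = statistics.mean(data)      # sum of values / n
median = statistics.median(data)  # average of the two middle values
sd = statistics.stdev(data)       # sample SD, divides by n - 1
se_mean = sd / math.sqrt(n)       # standard error of the mean

print(f"N={n} Mean={mean:.2f} Median={median:.2f} "
      f"StDev={sd:.2f} SE Mean={se_mean:.2f}")
# N=4 Mean=4.75 Median=3.50 StDev=3.59 SE Mean=1.80
```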
21
Student’s T Test
Two Sample T-Test and Confidence Interval
Two sample T for BMI-F vs BMI-M
         N    Mean    StDev   SE Mean
BMI-F    30   31.35   6.26    1.1
BMI-M    21   26.96   4.11    0.90
95% CI for mu BMI-F - mu BMI-M: (1.5, 7.31)
T-Test mu BMI-F = mu BMI-M (vs not =): T = 3.02  P = 0.0040  DF = 48
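The T statistic and degrees of freedom in this output can be checked from the summary statistics alone. The sketch below assumes the unpooled (Welch) form of the two-sample t test with the degrees of freedom rounded down, which is consistent with the reported DF of 48 (a pooled test would give 30 + 21 - 2 = 49). The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from the output: BMI-F vs BMI-M
t, df = welch_t(31.35, 6.26, 30, 26.96, 4.11, 21)
print(f"T = {t:.2f}, DF = {math.floor(df)}")  # T = 3.02, DF = 48
```

Since P = 0.0040 is well below 0.05 and the 95% confidence interval excludes zero, the output indicates a statistically significant difference between the mean BMI of the two groups.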