Statistical Analysis Of Data Final

1

BIOSTATISTICS – A TOOL FOR RESEARCH AND DATA ANALYSIS

PRESENTED BY SABA BUTT

2

SIGNIFICANCE OF STATISTICS FOR ANALYSIS AND RESEARCH

3

STATISTICS IS NECESSARY FOR ALL FIELDS OF LIFE REQUIRING RESEARCH AND DATA ANALYSIS

In every field we have to analyze facts and interpret them to reach conclusions. This analysis needs statistics to compare qualities and quantities. The resulting conclusions lead to decision making in business, government, industry, etc., and to the development of theories in science.

4

BIOSTATISTICS

THE STATISTICS IN LIFE SCIENCES

5

BIOSTATISTICS IS A DISCIPLINE THAT IS CONCERNED WITH:

• designing experiments and other data collection,

• summarizing information to aid understanding,

• drawing conclusions from data, and

• estimating the present or predicting the future.

In making predictions, Statistics uses the companion subject of Probability, which models chance mathematically and enables calculations of chance in complicated cases.

6

SOME IMPORTANT DEFINITIONS

7

POPULATION AND SAMPLE

POPULATION: A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all males between the ages of 15 and 18.

SAMPLE: A sample is a subset of a population. Since it is usually impractical to test every member of a population, a sample from the population is typically the best approach available.

8

PARAMETER AND STATISTIC

PARAMETER: A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the population mean is a parameter that measures central tendency.

STATISTIC: A statistic is a numerical quantity (such as the mean) calculated from a sample.

9

MEASURES OF CENTRAL TENDENCY

Mean (Arithmetic Mean)

Average value of a sample or population

Median

Middle value of sample or population

Mode

The value repeated most

10

The Arithmetic Mean, or Mean, is what is commonly called the average. When the word "mean" is used without a modifier, it can be assumed to refer to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores.

Formula of calculating Population Mean is:

μ = ΣX/N,

where μ = population mean, and

N = number of scores.

If the scores are from a sample, then the symbol X̄ refers to the mean and n refers to the sample size, and the formula is written as:

X̄ = ΣX/n.

11

Median: The median is the middle of a distribution: half the scores are above the median and half are below the median. The median is less sensitive to extreme scores than the mean and this makes it a better measure than the mean for highly skewed distributions.

Example: 5 3 4 2.5 6 (sorted: 2.5 3 4 5 6; the median is 4)

Mode: The mode is the most frequently occurring score in a distribution and is used as a measure of central tendency. The advantage of the mode as a measure of central tendency is that its meaning is obvious.

Example: 5 3 4 5 6 (the mode is 5)
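As a quick check of the two sets of values shown above, the following sketch uses Python's standard `statistics` module. The variable names are illustrative and not part of the original slides.

```python
import statistics

# Values shown next to the median definition.
median_values = [5, 3, 4, 2.5, 6]
# Values shown next to the mode definition.
mode_values = [5, 3, 4, 5, 6]

mean_value = statistics.mean(median_values)      # sum of scores / number of scores
median_value = statistics.median(median_values)  # middle score after sorting
mode_value = statistics.mode(mode_values)        # most frequently occurring score

print(mean_value, median_value, mode_value)
```

For the first set the mean and the median are close to each other because the values contain no extreme score.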

12

MEASURES OF DISPERSION

After measuring the central value, i.e., the mean, the next step is to find out to what extent this central value represents all values, that is, to measure the scattering or dispersion of the data. Several measures give values of dispersion. The most important and widely used of these in research are:

• Variance

• Standard Deviation

• Standard Error of Mean

13

HYPOTHESIS TESTING

Student’s t test

F test

ANOVA

Correlation

Regression

14

EXAMPLE OF DATA ANALYSIS

Comparison of the weight-to-height ratio, expressed as the Body Mass Index (BMI), of a population. BMI is calculated as weight in kg / (height in m)².
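The BMI calculation can be sketched as a small function. The function name and the example subject (65 kg, 1.70 m) are hypothetical and are not taken from the survey.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight in kg divided by the square of height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical subject, not one of the surveyed students.
example_bmi = body_mass_index(65.0, 1.70)
print(round(example_bmi, 2))
```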

General surveys in the USA and Europe showed that the young population is overweight, which increases the chances of disease. We surveyed the young female population of Punjab University for BMI. We measured the BMI of 400 randomly selected students.

15

Subject No.   BMI     Subject No.   BMI
M-1           36.66   F-1           30.11
M-2           20.21   F-2           28.00
M-3           30.29   F-3           16.87
M-4           29.33   F-4           38.94
M-5           31.97   F-5           35.63
M-6           27.58   F-6           32.69
M-7           25.33   F-7           23.92
M-8           26.90   F-8           25.55
M-9           27.74   F-9           30.87
M-10          27.01   F-10          43.43
M-11          26.82   F-11          35.34
M-12          22.65   F-12          19.65
M-13          31.90   F-13          36.45
M-14          30.81   F-14          34.35
M-15          20.84   F-15          34.15
M-16          25.19   F-16          38.86
M-17          22.98   F-17          26.28
M-18          28.68   F-18          29.52
M-19          22.73   F-19          24.99
M-20          22.86   F-20          29.75
M-21          27.73   F-21          34.58

16

ARITHMETIC MEAN

• We have two tables of data: one giving the BMI of girls, the other the BMI of boys. These are long data tables.

• Now we have to analyze the data to conclude something from it. What do we need now?

• We need a measure of central tendency to indicate the average BMI, so that we can compare it with other populations, between boys and girls, and with the normal range.

The most common and useful measure for this purpose is the Arithmetic Mean. It is calculated by taking the sum of all values and dividing it by the number of observations.
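The arithmetic mean of the listed BMI values can be computed as in the sketch below. It assumes that the first run of 21 values in the table belongs to subjects M-1 to M-21 and the second run to F-1 to F-21. This pairing is supported by the male mean, which agrees with the 26.96 reported in the MINITAB output later on. Only 21 female values are listed while that output reports N = 30 for BMI-F, so the female mean computed here need not match the reported 31.35.

```python
# BMI values as listed in the tables (21 subjects per group are shown).
bmi_male = [
    36.66, 20.21, 30.29, 29.33, 31.97, 27.58, 25.33,
    26.90, 27.74, 27.01, 26.82, 22.65, 31.90, 30.81,
    20.84, 25.19, 22.98, 28.68, 22.73, 22.86, 27.73,
]
bmi_female = [
    30.11, 28.00, 16.87, 38.94, 35.63, 32.69, 23.92,
    25.55, 30.87, 43.43, 35.34, 19.65, 36.45, 34.35,
    34.15, 38.86, 26.28, 29.52, 24.99, 29.75, 34.58,
]


def arithmetic_mean(values):
    """Sum of all values divided by the number of observations."""
    return sum(values) / len(values)


mean_male = arithmetic_mean(bmi_male)
mean_female = arithmetic_mean(bmi_female)
print(f"Mean BMI (M, listed values): {mean_male:.2f}")
print(f"Mean BMI (F, listed values): {mean_female:.2f}")
```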

17

SAMPLING ERROR

Next: we have an average value, but is this average really representative of all the values? It is possible that some values are very large and some very small. If so, the mean is not representative of the whole data set. This is called sampling error: some students may have a strong genetic tendency to be overweight, so their values differ somewhat from the rest of the population. This makes our result erroneous, i.e., our mean does not represent all the data.

18

EXAMPLE

We have four values: 2, 3, 4, 10

Mean = Sum of values / No. of observations

= (2 + 3 + 4 + 10) / 4

= 4.75

This is far from three of the four values in the data, because of the one large value in the data, i.e., 10.

19

STANDARD DEVIATION

• Now we need a statistical measure that tells us how to rule out sampling error.

• This is the standard deviation: a measure of how much the individual values vary from the average value, i.e., the mean.

20

Standard Deviation of that Data

$$ SD = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

Descriptive Statistics from MINITAB

Variable N Mean Median StDev SE Mean

C1 4 4.75 3.50 3.59 1.80
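The MINITAB row above can be reproduced for the four example values (2, 3, 4, 10) with a short sketch using Python's standard library. The standard error of the mean is taken here as the standard deviation divided by the square root of n, which is the usual definition and is consistent with the figures shown.

```python
import math
import statistics

values = [2, 3, 4, 10]

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance, divisor n - 1
std_dev = statistics.stdev(values)      # square root of the sample variance
se_mean = std_dev / math.sqrt(n)        # standard error of the mean

print(f"N={n} Mean={mean:.2f} Median={median:.2f} "
      f"StDev={std_dev:.2f} SE Mean={se_mean:.2f}")
```

The median (3.50) stays close to the three small values, while the mean (4.75) is pulled towards the single large value.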

21

Student’s T Test

Two Sample T-Test and Confidence Interval

Two sample T for BMI-F vs BMI-M

         N    Mean    StDev   SE Mean

BMI-F    30   31.35   6.26    1.1

BMI-M    21   26.96   4.11    0.90

95% CI for mu BMI-F - mu BMI-M: ( 1.5, 7.31)

T-Test mu BMI-F = mu BMI-M (vs not =): T= 3.02 P=0.0040 DF= 48
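The T value and degrees of freedom in this output can be checked from the summary statistics alone. The sketch below assumes the unequal-variance (Welch) form of the two-sample t-test; the output does not say which form was used, but DF = 48 rather than 30 + 21 - 2 = 49 suggests it. The function name is illustrative.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    var_of_mean1 = sd1 ** 2 / n1  # squared standard error of the first mean
    var_of_mean2 = sd2 ** 2 / n2  # squared standard error of the second mean
    se_diff = math.sqrt(var_of_mean1 + var_of_mean2)
    t_stat = (mean1 - mean2) / se_diff
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_of_mean1 + var_of_mean2) ** 2 / (
        var_of_mean1 ** 2 / (n1 - 1) + var_of_mean2 ** 2 / (n2 - 1)
    )
    return t_stat, df


# Summary statistics for BMI-F and BMI-M as reported in the output.
t_stat, df = welch_t_from_summary(31.35, 6.26, 30, 26.96, 4.11, 21)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The approximate degrees of freedom come out as a non-integer value between 48 and 49; the reported DF of 48 corresponds to that value truncated to a whole number.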
