
Biostatistics Khushbu Mishra


CONTENTS

• Introduction
• Definition
• Common statistical terms
• Sources and collection of data
• Presentation of data
• Analysis and interpretation: statistical averages, measures of dispersion
• Sampling and sampling methods
• Sampling errors
• Tests of significance
• Correlation and regression
• Limitations

Introduction

• As medical and dental students, we learn the best methods of diagnosis and therapy during our period of study.
• After graduation, we go through research papers presented at conferences and in current journals to learn about new methods of therapy and improvements in diagnosis and surgical techniques.
• It must be admitted that the essence of papers contributed to medical journals is largely statistical.

Training in statistics has been recognized as “indispensable” for students of medical science. For example:

• If we want to establish a cause-and-effect relationship, we need statistics.
• If we want to measure the state of health and the burden of disease in a community, we need statistics.

• Statistics are widely used in epidemiology, clinical trials of drugs and vaccines, program planning, community medicine, health management, health information systems, etc.
• Knowledge of medical statistics builds self-confidence and helps one become a good clinician and a good medical research worker who is knowledgeable in statistical thinking.

• Everything in medicine, be it research, diagnosis or treatment, depends on counting or measurement.
• According to Lord Kelvin, “When you can measure what you are speaking about and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.”

Biostatistics in various areas

• In public health or community health, it is called health statistics.
• In medicine, it is called medical statistics. Here we study defects, injuries, diseases, the efficacy of drugs and sera, lines of treatment, etc.
• In population-related studies, it is called vital statistics, e.g. the study of vital events such as births, marriages and deaths.

• Applications and uses of biostatistics as a science

In physiology:
a. To define what is normal or healthy in a population.
b. To find the limits of normality.
c. To find the difference between means and proportions of normal values at two places or in different periods.
d. To find the correlation between two variables X and Y, such as height and weight. For example, whether weight increases or decreases proportionately with height and, if so, by how much.

• In pharmacology:
a. To find the action of a drug.
b. To compare the action of two different drugs.
c. To find the relative potency of a new drug with respect to a standard drug.

• In medicine:
a. To compare the efficacy of a particular drug, operation or line of treatment.
b. To find the association between two attributes, e.g. oral cancer and smoking.
c. To identify the signs and symptoms of a disease or syndrome.

Common statistical terms

• Variable: a characteristic that takes on different values in different persons, places or things.
• Constant: a quantity that does not vary, such as π = 3.141 or e = 2.718. Constants do not require statistical study. In biostatistics, the mean, standard deviation, standard error, correlation coefficient and proportion of a particular population are considered constant.
• Observation: an event and its measurement, e.g. blood pressure (BP) and its measurement.

• Observational unit: the source that gives the observation, e.g. an object or a person. In medical statistics, terms such as individuals and subjects are used more often.
• Data: a set of values recorded on one or more observational units.
• Population: an entire group of people or study elements (persons, things or measurements) in which we have an interest at a particular time.
• Sampling unit: each member of a population.
• Sample: a part of a population.

• Parameter: a summary value or constant of a variable that describes the population, such as its mean, standard deviation, standard error or correlation coefficient.
• Parametric tests: tests in which population constants such as those described above (mean, variance, etc.) are used, and the data are assumed to follow an established distribution such as the normal, binomial or Poisson distribution.
• Non-parametric tests: tests, such as the chi-square test, in which no population constant is used. The data need not follow any specific distribution, and no distributional assumptions are made, e.g. data graded as good, better and best.

DEFINITION

American Heritage Dictionary® defines statistics as: "The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling."

The Merriam-Webster’s Collegiate Dictionary® definition is: "A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data."

A simple but concise definition by Croxton and Cowden: “Statistics is defined as the collection, presentation, analysis and interpretation of numerical data.”

In line with the definition of Croxton and Cowden, a comprehensive definition of statistics can be: “Statistics is defined as the science of collection, organisation, presentation, analysis and interpretation of numerical data.”

• Statistic/datum: a measured or counted fact or piece of information, such as the height of a person or the birth weight of a baby.
• Statistics/data: the plural of the same, such as the heights of 2 persons, the birth weights of 5 babies or the plaque scores of 3 persons.
• Biostatistics: the term used when the tools of statistics are applied to data derived from the biological sciences, such as medicine.

Types of Data

• Qualitative data: nominal, ordinal
• Quantitative data: discrete, continuous (measured on an interval or ratio scale)

COLLECTION OF DATA

Data can be collected through:

• Primary sources: the data are obtained by the investigator himself. This is first-hand information.
• Secondary sources: data already recorded are used to serve the objective of the study, e.g. OPD records of dental clinics.

• Main sources for the collection of medical statistics:
1. Experiments
2. Surveys
3. Records

• Experiments and surveys are used to generate data needed for specific purposes.
• Records provide ready-made data for routine and continuous information.


Methods of collection of data

• Method of direct observation:- clinical signs and symptoms and prognosis are collected by direct observation.

• Method of house to house visit:- vital statistics and morbidity statistics are usually collected by visiting house to house.

• Method of mailed questionnaire:- this method is followed in communities where the literacy level is very high. A prepaid postage stamp should be attached to the questionnaire.

Presentation of data

• Data are sorted and classified into groups.
• Objective: to make the data simple, concise, meaningful, interesting and helpful for further analysis.
• The 2 main methods are:
i. Tabulation
ii. Charts and diagrams

• Tabulation:
• Tables are devices for presenting data simply from masses of statistical data.
• A table can be simple or complex, depending upon the number or measurement of a single set or multiple sets of items.
• 3 types:
a. Master table: contains all the data obtained from a survey.
b. Simple table: a one-way table which supplies answers to questions about one characteristic only.
c. Frequency distribution table: the data are first split up into convenient groups, and the number of items which occur in each group is shown in adjacent columns.

Table 1

State             Population (1st March 2011)
Andhra Pradesh    8,46,65,533
Madhya Pradesh    7,25,97,565
Uttar Pradesh     19,95,81,477
Karnataka         7,14,83,435
Rajasthan         18,23,45,998
Kerala            6,43,35,772


Frequency distribution table

• The following figures are the ages of patients admitted to a hospital with poliomyelitis:

8, 24, 18, 5, 6, 12, 14, 3, 23, 9, 18, 16, 1, 2, 3, 5, 11, 13, 15, 9, 11, 11, 7, 10, 6, 9, 5, 16, 20, 4, 3, 3, 3, 10, 3, 2, 1, 6, 9, 3, 7, 14, 8, 1, 4, 6, 4, 15, 22, 2, 1, 4, 6, 4, 15, 22, 2, 1, 4, 7, 1, 12, 3, 23, 4, 19, 6, 2, 2, 4, 14, 2, 2, 21, 3, 2, 1, 7, 19.

Age (years)    Number of patients
0-4            35
5-9            18
10-14          11
15-19          8
20-24          6
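The grouping step behind a frequency distribution table can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the class width of 5 years are assumptions, and only the first ten ages from the list are used to keep the example short, so the counts are not those of the full table.

```python
from collections import Counter

def frequency_table(ages, width=5):
    """Count how many ages fall in each class interval of the given width."""
    # Map every age to the lower limit of its class, e.g. 8 -> 5, 24 -> 20.
    counts = Counter((age // width) * width for age in ages)
    # Return rows as (label, count), ordered by the lower class limit.
    return [(f"{low}-{low + width - 1}", counts[low]) for low in sorted(counts)]

# First ten ages from the list above (an illustrative subset only).
ages = [8, 24, 18, 5, 6, 12, 14, 3, 23, 9]
for label, count in frequency_table(ages):
    print(label, count)
```

Passing the complete list of ages to the same function would give the counts for the whole series.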

Charts and diagrams

Quantitative data:
1. Histogram
2. Frequency polygon
3. Frequency curve
4. Line chart or graph
5. Cumulative frequency diagram (ogive)
6. Scatter, dot or correlation diagram

Qualitative data:
1. Bar diagram
2. Pie or sector diagram
3. Pictogram or picture diagram
4. Map diagram or spot map

ANALYSIS & INTERPRETATION

Measures of central tendency/ statistical averages

• The word “average” implies a value in the distribution, around which other values are distributed.

• It gives a mental picture of the central value.
• Commonly used measures of central tendency:

a. The Arithmetic Mean

b. Median

c. Mode.

• Mean = (sum of all values) / (total number of values)
• Median = middle value (when the data are arranged in order)
• Mode = most common value

• For example, the daily incomes of 7 people, in rupees, are as follows:

5, 5, 5, 7, 10, 20, 102 (total = 154)

• Mean = 154/7 = 22
• Median = 7
• The median is therefore a better indicator of central tendency when one or more of the lowest or highest observations lie far from the rest.
• The mode is rarely used, as a series can have no mode, 1 mode or multiple modes.
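The three averages for the income example can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only.

```python
import statistics

# Daily incomes in rupees, taken from the example above.
incomes = [5, 5, 5, 7, 10, 20, 102]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the ordered series
modes = statistics.multimode(incomes)       # list of all most-common values

print(mean_income, median_income, modes)
```

The single extreme value of 102 pulls the mean up to 22, while the median stays at 7, which is why the median describes this series better.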

Measures of Dispersion

• Widely known measures of dispersion are:
a. The range
b. The mean or average deviation
c. The standard deviation

• Range: the simplest measure, the difference between the highest and lowest figures. For example, for the diastolic BP readings 83, 75, 81, 79, 71, 90, 75, 95, 77, 94, the range is expressed as 71 to 95, or by the actual difference of 24.

• Merit: simplest.
• Demerits: not of much practical importance; indicates nothing about the dispersion of values between the two extreme values.
• Mean deviation: the average of the absolute deviations from the arithmetic mean.

$\text{M.D.} = \dfrac{\sum |X - \bar{X}|}{n}$

• Standard deviation: the most frequently used measure, also called the “root mean square deviation”, denoted by the Greek letter σ or by the initials S.D.

$\text{S.D.} = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$

• If the sample size is less than 30, (n − 1) is used in the denominator.
• The S.D. gives us an idea of the spread of the values.
• The larger the standard deviation, the greater the dispersion of values about the mean.
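As a rough check of the three measures of dispersion, the diastolic BP readings from the range example can be worked through in Python. The variable names are illustrative, and both divisors (n and n − 1) are shown for the standard deviation.

```python
import math

# Diastolic BP readings from the range example.
bp = [83, 75, 81, 79, 71, 90, 75, 95, 77, 94]

n = len(bp)
mean = sum(bp) / n
value_range = max(bp) - min(bp)                       # highest minus lowest
mean_deviation = sum(abs(x - mean) for x in bp) / n   # average absolute deviation
sum_sq = sum((x - mean) ** 2 for x in bp)             # sum of squared deviations
sd_population = math.sqrt(sum_sq / n)                 # divisor n
sd_sample = math.sqrt(sum_sq / (n - 1))               # divisor n - 1 for small samples

print(value_range, mean_deviation, round(sd_population, 2), round(sd_sample, 2))
```

Because there are only 10 readings, the (n − 1) form is the one the rule above would call for here.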

Normal distribution

• Take a large number of observations of any variable characteristic.
• A frequency distribution table is prepared with narrow class intervals.
• Some observations are below the mean and some are above the mean.
• If they are arranged in order of their deviation from the mean, on the plus or minus side, the maximum number of frequencies will be seen in the middle around the mean and fewer at the extremes, decreasing smoothly on both sides.
• Normally, almost half the observations lie above and half below the mean, and all observations are symmetrically distributed on each side of the mean.
• A distribution of this nature or shape is called a normal or Gaussian distribution.

Standardized normal curve

• Devised to estimate easily the area under the normal curve between any two ordinates.
• Smooth
• Bell shaped
• Perfectly symmetrical curve
• Total area of the curve is 1, mean = 0, standard deviation = 1.
• Mean, median and mode all coincide.
• The probability of occurrence of any value of the variable can be calculated.

Estimation of probability (example)

• The mean pulse rate of a group of normal healthy males was 72, with a standard deviation of 2. What is the probability that a male chosen at random would be found to have a pulse of 80 or higher?

• The relative deviate is

$z = \dfrac{x - \bar{x}}{\sigma} = \dfrac{80 - 72}{2} = 4$

• The area of the normal curve corresponding to a deviate of 4 is 0.49997, so the probability = 0.5 − 0.49997 = 0.00003, i.e. 3 out of 1,00,000 individuals.
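The same tail probability can be computed without a printed table, using the complementary error function from Python's `math` module. This is a sketch only; the helper name `upper_tail` is an assumption made for the illustration.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of the deviate z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean_pulse, sd_pulse = 72, 2
z = (80 - mean_pulse) / sd_pulse   # relative deviate
p = upper_tail(z)                  # probability of a pulse of 80 or higher
area_from_middle = 0.5 - p         # the quantity listed in normal curve tables

print(z, round(area_from_middle, 5), round(p, 5))
```

To five decimal places this reproduces the table look-up used in the example.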

Areas of the standard normal curve with mean 0 and standard deviation 1

Relative deviate z = (x − x̄)/σ    Proportion of area from the middle of the curve to the designated deviate
0.00                              .0000
0.50                              .1915
1.00                              .3413
1.50                              .4332
2.00                              .4772
4.00                              .49997

Sampling

• When a large number of individuals or units has to be studied, we take a sample.
• It is easier.
• It is more economical.
• It is important to ensure that the group of people or items included in the sample is representative of the whole population to be studied.
• Sampling frame: once the universe has been defined, a sampling frame must be prepared. It is a listing of the members of the universe from which the sample is to be drawn.

• The accuracy and completeness of the sampling frame influence the quality of the sample drawn from it.

• Sampling methods

i. Simple random sampling

ii. Systematic random sampling

iii. Stratified random sampling
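The three sampling methods listed above can be sketched with Python's standard `random` module. Everything here is hypothetical: the sampling frame of 100 numbered units, the "urban" and "rural" strata, the function names and the fixed seed are assumptions made only for illustration.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has an equal chance of being selected."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit (k = frame size // n)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum."""
    return {name: rng.sample(units, round(len(units) * fraction))
            for name, units in strata.items()}

rng = random.Random(0)                                # fixed seed for repeatability
frame = list(range(1, 101))                           # hypothetical frame of 100 units
strata = {"urban": frame[:60], "rural": frame[60:]}   # hypothetical strata

print(simple_random_sample(frame, 10, rng))
print(systematic_sample(frame, 10, rng))
print(stratified_sample(strata, 0.1, rng))
```

Drawing the same fraction from every stratum keeps each group represented in proportion to its size in the frame.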

Sampling errors

• If repeated samples are taken from the same population, the results obtained will differ from sample to sample.
• This type of variation from one sample to another is called sampling error.
• Factors influencing sampling error are:
a. Size of the sample
b. Natural variability of the individual readings
• As the sample size increases, the sampling error decreases.

Non-sampling errors

• Errors may occur due to:
i. Inadequately calibrated instruments
ii. Observer variation
iii. Incomplete coverage achieved in examining the subjects
iv. Selection and conceptual errors

Standard error

• If we take a random sample of size n from the population, and similar samples over and over again, we will find that every sample has a different mean ($\bar{X}$).
• Make a frequency distribution of all the sample means.
• The distribution of the means is nearly a normal distribution.
• The mean of the sample means is practically the same as the population mean.
• The standard deviation of the means is a measure of sampling error and is given by the formula

$\text{S.E.} = \dfrac{\sigma}{\sqrt{n}}$

• Since the distribution of means follows the pattern of a normal distribution, it is not difficult to visualize that 95% of sample means fall within the limits of two standard errors of the population mean.

• Therefore, the standard error is a measure which enables us to judge whether the mean of a given sample is within the set confidence limits.

Tests of significance

• The standard error indicates how reliable an estimate of the mean is likely to be.

• The standard error is applied with appropriate formulae to all statistics, i.e. mean, standard deviation, etc.:

i. Standard error of Mean

ii. Standard error of Proportion

iii. Standard error of difference between means

iv. Standard error of difference between proportions

Standard error of Mean

• We take only one sample from the universe and calculate its mean and standard deviation.
• But how accurate is the mean of our sample?
• What can be said about the true mean of the universe?
• In order to answer these questions, we calculate the standard error of the mean and set up confidence limits within which the mean (μ) of the population (of which we have only one sample) is likely to lie.

Let us suppose we obtained a random sample of 25 males, aged 20-24 years, whose mean temperature was 98.14 °F with a standard deviation of 0.6. What can we say of the true mean of the universe from which the sample was drawn?

Standard error of the mean:

$\text{S.E.} = \dfrac{0.6}{\sqrt{25}} = 0.12$

Confidence limits on the basis of the normal curve distribution: 95% confidence limits = 98.14 ± (2 × 0.12)

Range = 97.90 to 98.38 °F
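The arithmetic of this example can be reproduced with a short Python sketch. The function name `confidence_limits` is an assumption, and the multiplier of 2 follows the slide's rule of thumb for 95% limits.

```python
import math

def confidence_limits(mean, sd, n, multiplier=2):
    """Return (standard error, lower limit, upper limit) for a sample mean."""
    se = sd / math.sqrt(n)   # standard error of the mean
    return se, mean - multiplier * se, mean + multiplier * se

# Figures from the temperature example: 25 males, mean 98.14, S.D. 0.6.
se, lower, upper = confidence_limits(mean=98.14, sd=0.6, n=25)
print(round(se, 2), round(lower, 2), round(upper, 2))
```

A larger sample with the same standard deviation would shrink the standard error and narrow the limits.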

Standard error of proportion

• $\text{S.E. of proportion} = \sqrt{\dfrac{pq}{n}}$, where p is the proportion and q = 1 − p.

Standard error of difference between two means

• $\text{S.E.}(d) = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

• For the difference to be considered significant, the actual difference between the two means should be more than twice the standard error of the difference between the two means.

• Parametric statistical tests, e.g. Z test, t test, F test

• Non-parametric statistical tests, e.g. chi-square test, sign test


Types of problems

I Comparison of sample mean with population mean

II Comparison of two sample means

III Comparison of sample proportion with the population proportion

IV Comparison of two sample proportions

Steps

• Finding out the type of problem and the question to be answered.
• Stating the null hypothesis (H0).
• Calculating the standard error.
• Calculating the critical ratio: difference between statistics / standard error.
• Comparing the value observed in the experiment with that at the predetermined significance level given by the table.
• Making inferences: if P < 0.05, the difference is significant and H0 is rejected; if P = 0.05 or P > 0.05, H0 is accepted.

Z Test

Prerequisites to apply the Z test:
• The sample or samples must be randomly selected.
• The data must be quantitative.
• The variable is assumed to follow a normal distribution in the population.
• The sample size must be larger than 30.

Two types:
• One-tailed Z test
• Two-tailed Z test

• The Z test has 2 applications:

i. To test the significance of the difference between a sample mean and a known value of the population mean.

$Z = \dfrac{\text{sample mean} - \text{population mean}}{\text{S.E. of sample mean}}$

ii. To test the significance of the difference between 2 sample means, or between an experiment sample mean and a control sample mean.

$Z = \dfrac{\text{observed difference between 2 sample means}}{\text{S.E. of difference between 2 sample means}}$
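The first application can be sketched in Python as follows. The figures (sample mean 52, population mean 50, S.D. 6, n = 36) are hypothetical and chosen only to make the arithmetic easy to follow; the function names are likewise assumptions.

```python
import math

def z_one_sample(sample_mean, population_mean, sd, n):
    """Critical ratio: (sample mean - population mean) / S.E. of sample mean."""
    return (sample_mean - population_mean) / (sd / math.sqrt(n))

def two_tailed_p(z):
    """Probability of a deviate at least as extreme as z in either tail."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical figures, for illustration only (not taken from the slides).
z = z_one_sample(sample_mean=52, population_mean=50, sd=6, n=36)
p = two_tailed_p(z)
print(z, round(p, 4))
```

With these made-up figures the deviate is 2 and the two-tailed P is a little under 0.05, so the difference would be judged significant at the 5% level.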

t-Test

Criteria for applying the t-test:
• Random samples
• Quantitative data
• Variable normally distributed
• Sample size less than 30

• Unpaired t-test: applied on unpaired data of independent observations made on individuals of two different or separate groups or samples drawn from two populations

• Paired t-test: applied to paired data of independent observations from one sample only

• It was designed by W. S. Gosset, whose pen name was ‘Student’.

• The formula used is

$t = \dfrac{\text{observed difference between two means of small samples}}{\text{S.E. of the difference}}$

F-test (analysis of variance test)

• Used for comparing the means of more than two samples drawn from corresponding normal populations.

Example: to find out whether occupation plays any part in the causation of BP, the systolic BP values of 4 occupations are given. Determine whether there is a significant difference in the mean BP of the 4 groups in order to assess the role of occupation in the causation of BP.

$F = \dfrac{\text{mean square between samples}}{\text{mean square within samples}}$

Chi-square test

Applications:

1. Proportion:

a) To compare the values of two binomial samples, even if the samples are smaller than 30, e.g. incidence of diabetes in 20 obese and 20 non-obese persons.

b) To compare the frequencies of two multinomial samples, e.g. number of diabetics and non-diabetics in groups weighing 40-50, 50-60 and >60 kg.

2. Association: it measures the probability of association between two discrete attributes. It has the added advantage that it can be applied to find the association or relationship between two discrete attributes when there are more than two classes or groups.

Example: trial of 2 whooping cough vaccines. The results of the field trial were as below.

Vaccine    Attacked    Not attacked    Total    Attack rate
A          22          68              90       24.4%
B          14          72              86       16.2%
Total      36          140             176      -

• Null hypothesis (H0): there is no difference between the effects of the two vaccines.

• Calculation of the expected number (E) in each cell of the table:

E = (column total × row total) / sample total

Vaccine    Attacked                               Not attacked
A          O = 22, E = 36 × 90 / 176 = 18.41      O = 68, E = 140 × 90 / 176 = 71.59
B          O = 14, E = 36 × 86 / 176 = 17.59      O = 72, E = 140 × 86 / 176 = 68.41

• Applying the χ² test:

$\chi^2 = \sum \dfrac{(O - E)^2}{E} = 0.70 + 0.18 + 0.73 + 0.19 = 1.80$

• Finding the degrees of freedom: d.f. = (c − 1)(r − 1) = (2 − 1)(2 − 1) = 1.

• Probability tables: at the 5% level with 1 degree of freedom, the table value of χ² is 3.84. The calculated value is smaller, so P > 0.05 and H0 is accepted.

• Inference: vaccine B is not superior to vaccine A.
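The whole calculation for the vaccine trial can be sketched in plain Python. The function name `chi_square` is an assumption; the observed counts are those of the field trial table, and 3.84 is the 5% table value for 1 degree of freedom quoted above.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total   # expected count for the cell
            stat += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof

# Rows: vaccine A, vaccine B; columns: attacked, not attacked.
observed = [[22, 68], [14, 72]]
stat, dof = chi_square(observed)
significant = stat > 3.84   # 5% table value for 1 degree of freedom

print(round(stat, 2), dof, significant)
```

The statistic comes out at about 1.80 with 1 degree of freedom, well below 3.84, so the null hypothesis is not rejected.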

Restrictions in application of the χ² test:

• It will not give a reliable result with one degree of freedom if the expected value in any cell is less than 5. In such cases, apply Yates' correction:

$\chi^2 = \sum \dfrac{\left(|O - E| - \tfrac{1}{2}\right)^2}{E}$

• Yates' correction cannot be applied to tables larger than 2×2.

• The test tells the presence or absence of an association but does not measure the strength of the association.

• A statistical finding of a relationship does not indicate cause and effect.

Correlation and Regression

• To find whether or not there is a significant association between two variables, we calculate the coefficient of correlation, which is represented by the symbol “r”.

• $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

• The correlation coefficient r always lies between −1.0 and +1.0.

Types of correlation:

Perfect positive correlation:
• r = +1, i.e. both variables rise or fall in the same proportion.

Perfect negative correlation:
• r = −1, i.e. the variables are inversely related: when one rises, the other falls in the same proportion.

Moderately positive correlation:
• r lies between 0 and +1 (0 < r < 1).

Moderately negative correlation:
• r lies between −1 and 0 (−1 < r < 0).

Absolutely no correlation:
• r = 0, indicating that no linear relationship exists between the 2 variables.
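The correlation coefficient can be computed directly from its definition. In this minimal sketch the function name `pearson_r` and the height and weight figures are hypothetical, chosen so that the two perfect cases are easy to see.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r from the sums of products and squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 155, 160, 165, 170]
rising = [50, 55, 60, 65, 70]    # weight rises in step with height
falling = [70, 65, 60, 55, 50]   # weight falls in step with height

print(pearson_r(heights, rising), pearson_r(heights, falling))
```

Real measurements would rarely line up this exactly, so r would normally fall somewhere strictly between −1 and +1.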

Conclusion

• Statistics is central to most medical research.
• The basic principles of statistical methods equip medical and dental students to appreciate the usefulness of statistics in medicine and other biosciences.
• Certain essential methods in biostatistics must be learnt to understand their application in the diagnosis, prognosis, prescription and management of diseases in individuals and the community.

References

• Park's Textbook of Preventive and Social Medicine, 22nd edition.

• B. K. Mahajan, Methods in Biostatistics, 7th edition.
