Kingsuk Sarkar, MD Asst. Prof. Dept. of Community Medicine , DSMCH FUNDAMENTALS OF BIOSTATISTICS

Fundamentals of biostatistics


Page 1: Fundamentals of biostatistics

Kingsuk Sarkar, MD

Asst. Prof., Dept. of Community Medicine, DSMCH

FUNDAMENTALS OF BIOSTATISTICS

Page 2: Fundamentals of biostatistics

COMMON STATISTICAL TERMS

Statistics:- The subject of scientific activity dealing with the theories and methods of collection, compilation, analysis and interpretation of data.

Biostatistics:- The art & science of collection, compilation, analysis and interpretation of data relating to living organisms, health and disease.

Data (sing. datum):- A set of observations, usually obtained by measurement or counting.

Page 3: Fundamentals of biostatistics

Classification of data:
- Qualitative / Attribute
- Quantitative / Variable: Continuous & Discrete

Qualitative Data:
- Cannot be expressed in numbers
- Not measurable
- Can only be classified into different categories & counted as frequencies
- E.g., Religion is an attribute; can be categorized into Hindu, Muslim, Christian
- Human Blood Group: A, B, AB or O
- Sex: M/F

Page 4: Fundamentals of biostatistics

Quantitative Data / Variable:
- In statistical language, any character, characteristic or quality that varies is called a variable
- It has magnitude

Continuous variable:
- It is expressed in numbers & can be measured
- Can take an infinite no. of values within a certain range
- E.g., weight, height, blood sugar

Page 5: Fundamentals of biostatistics

Discrete variable:
- Countable only
- Takes only some isolated values
- E.g., no. of family members, no. of workers in a factory, no. of persons suffering from a particular disease

According to source:
- Primary Data
- Secondary Data

Page 6: Fundamentals of biostatistics

Primary Data:
- Collected directly from the field of enquiry
- Original in nature
- E.g., measurement of BP, weight, height, blood sugar

Secondary Data:
- Collected previously by some other agency/organization
- Used afterwards by another
- E.g., hospital records, census data

Page 7: Fundamentals of biostatistics

Scales of measurement:
- Nominal Scales
- Ordinal Scales
- Interval Scales
- Ratio Scales

Nominal Scales:
- Used when data are classified by major categories or subgroups of the population
- Religion can be assigned to the following categories: Muslim, Hindu, Christian
- Outcome of treatment: cured or not cured; died or survived

Page 8: Fundamentals of biostatistics

Ordinal Scales:
- Assign rank order to categories placed in an order
- E.g., students' rank in a class; Grades A, B, C, D
- Literacy status: illiterate, just literate, primary, secondary, higher secondary, graduate, post graduate
- Disease condition: mild, moderate, severe

Interval Scale:
- The distance between two measurements is defined, but not their ratio
- E.g., intelligence score in IQ tests, temperature in Centigrade

Page 9: Fundamentals of biostatistics

Ratio Scale:
- Both the distance & the ratio between two measurements are defined
- E.g., length, weight, incidence of disease, no. of children in a family

Dichotomy / Binary Scale:
- A scale with only two categories
- E.g., disease → present/absent; sex → male/female

Population:
- An aggregate of objects, animate or inanimate, under study
- A group of units defined according to the aims & objectives of the study

Sample:
- A finite subset or part of the population
- Every member of the population should have an equal chance of being included in the sample

Page 10: Fundamentals of biostatistics

Parameter:
- A constant that describes a characteristic of the population

Statistic:
- A function of the observations that describes a sample

Measure | Statistic (sample) | Parameter (population)
Mean | x̄ (x bar) | µ (mu)
Standard Deviation | s | σ (sigma)
No. of Subjects | n | N
Proportion | p | P

Page 11: Fundamentals of biostatistics

SOURCES OF DATA
• Main sources for collection of medical statistics are:

1. Experiments:
- Performed in the laboratories of physiology, biochemistry, pharmacology, clinical pathology
- Hospital wards → for investigations & fundamental research
- Used in preparation of theses/dissertations & scientific papers for publication in scientific journals & books

2. Surveys:
- Carried out for epidemiological studies in the field by trained teams to find out the incidence or prevalence of health or disease situations in a community
- Used in operational research (OR) → assessment of existing conditions, how to follow a program, study of the merits of different methods adopted to control a disease
- Provide trends in health status, morbidity, mortality, nutritional status, health practices, environmental hazards
- Provide feedback needed to modify policy
- Provide timely warning of public health hazards

Page 12: Fundamentals of biostatistics

3. Records:
- Maintained as a routine in registers or books over a long period of time
- Used for keeping vital statistics: births, deaths, marriages, hospitalization following illness
- Used in demography & public health practice
- Collected data are qualitative

Page 13: Fundamentals of biostatistics

PRESENTATION OF DATA

DATA → Consolidation & summarization → INFORMATION

Statistical data are usually presented in tabular form through different types of tables, and in pictorial form through diagrams & charts.

Method of presentation:
A. Tabulation
B. Drawing

Page 14: Fundamentals of biostatistics

Tabular presentation:
- A way of presenting data drawn from a mass of statistical data
- At first a frequency distribution table is prepared
- A table can be simple or complex

• Frequency distribution table or frequency table:
- All frequencies considered together form the “frequency distribution”
- The no. of persons in each group is called the frequency of that group
- Frequency distributions of most biological variables follow a normal, binomial or Poisson distribution

Page 15: Fundamentals of biostatistics

For qualitative data:
- There is no notion of magnitude or size of the attribute

Page 16: Fundamentals of biostatistics

• Presentation of quantitative data is more cumbersome, as the characteristic has a measured magnitude as well as a frequency

- Table x: presentation of quantitative data of height with tally markings (each complete group of marks stands for five observations)

Height group (cm)   Tally markings   Frequency of each group

160-162 //// //// 10

162-164 //// //// //// 15

164-166 //// //// //// // 17

166-168 //// //// //// //// 19

168-170 //// //// //// //// 20

170-172 //// //// //// //// //// / 26

172-174 //// //// //// //// //// //// 29

174-176 //// //// //// //// //// //// 30

176-178 //// //// //// //// // 22

178-180 //// //// // 12

Total 200
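A minimal sketch of how such a frequency table could be built from raw measurements. The heights below are hypothetical (the 200 raw observations behind the table are not given), and the sketch assumes each class includes its lower limit and excludes its upper limit, which is one common way to resolve shared limits such as 162.

```python
from collections import Counter


def frequency_table(values, start, width, n_classes):
    """Count how many values fall in each class interval [lower, upper)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]


# Hypothetical heights in cm, for illustration only.
heights = [160.5, 161.2, 163.0, 163.9, 164.0, 165.5, 167.1, 168.8, 169.9, 170.0]
table = frequency_table(heights, start=160, width=2, n_classes=6)

for lower, upper, freq in table:
    # One stroke per observation, followed by the class frequency.
    print(f"{lower}-{upper}  {'/' * freq:<5} {freq}")
```

The frequencies always add up to the number of observations, which is a quick check on any hand-made tally.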

Page 17: Fundamentals of biostatistics

- Data needs consolidation by way of tabulation to express some meaning

- Tabulation → a process of summarizing raw data & displaying it in a compact form for further analysis

- Orderly management of data in columns & rows

Page 18: Fundamentals of biostatistics

• General principles in designing a table:
- The table should be numbered
- A brief & self-explanatory title should be given, mentioning time, place, person
- Headings of columns & rows should be clear & concise
- Data should be presented according to size or importance, chronologically, alphabetically or geographically
- Data must be presented meaningfully
- The table should not be too large
- Footnotes should be given, if necessary
- The total no. of observations, i.e., the denominator, should be written
- Information obtained should be summarized in the table

Page 19: Fundamentals of biostatistics

• Frequency distribution drawings:
- After class-wise or group-wise tabulation, the frequencies of a characteristic can be presented by two kinds of drawings: Graphs & Diagrams
- May be shown by lines, dots or figures
o Presentation of quantitative data is through graphs
o Presentation of qualitative, discrete, counted data is through diagrams

Page 20: Fundamentals of biostatistics

Presentation of Quantitative data:

1. Histogram
- Graphical presentation of a frequency distribution
- The variable's class intervals are marked on the horizontal line (x-axis), called the abscissa
- The no. of observations is marked on the vertical line (y-axis), called the ordinate
- The frequency of each group forms a rectangle (column)

Page 21: Fundamentals of biostatistics

2. Frequency Polygon:
- An area diagram of a frequency distribution developed over a histogram
- Mid points of the class intervals at the height of their frequencies are joined by straight lines
- It gives a polygon, i.e., a figure with many angles

Page 22: Fundamentals of biostatistics

3. Frequency Curve:
- If the no. of observations is very large & the group interval is reduced, the frequency polygon tends to lose its angulation
- This gives rise to a smooth curve → frequency curve

Page 23: Fundamentals of biostatistics

4. Line Chart or Graph:
- A frequency polygon presenting variation by a line
- Shows the trend of an event occurring over a period of time
- Shows rise, fall or periodic fluctuations
- The vertical (frequency) axis need not start from zero; it may start from some point above it

Page 24: Fundamentals of biostatistics

5. Cumulative Frequency Diagram or “Ogive”
- Graph of the cumulative frequency distribution
- An ordinary frequency distribution table is first converted into a cumulative (relative) frequency table
- Cumulative frequency: total no. of persons from the lowest value of the characteristic up to & including a given group
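As an illustration, the cumulative frequencies needed for an ogive can be computed from the height table on Page 16. This is a small sketch; the variable names are illustrative, and the relative column is simply each cumulative frequency expressed as a percentage of the total.

```python
from itertools import accumulate

# Class intervals and frequencies taken from the height table (Page 16).
class_intervals = ["160-162", "162-164", "164-166", "166-168", "168-170",
                   "170-172", "172-174", "174-176", "176-178", "178-180"]
frequencies = [10, 15, 17, 19, 20, 26, 29, 30, 22, 12]

# Running total up to and including each class.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
relative_cumulative = [round(100 * c / total, 1) for c in cumulative]

for interval, f, c, pct in zip(class_intervals, frequencies,
                               cumulative, relative_cumulative):
    print(f"{interval}  f={f:>2}  cumulative={c:>3}  ({pct}%)")
```

Plotting the cumulative values against the upper limit of each class would give the ogive itself.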

Page 25: Fundamentals of biostatistics

6. Scatter or Dot Diagram:
- Prepared after tabulation in which frequencies of at least two variables have been cross-classified
- Shows the nature of correlation between two variable characters in the same person(s) (e.g., height & weight)
- Also called a correlation diagram

Page 26: Fundamentals of biostatistics

Presentation / illustration of qualitative data

1. Bar Diagram:
- Graphically presents frequencies of different categories of qualitative data
- Vertical / horizontal
- May be in descending/ascending order
- Widths of bars should be equal
- Spacing between bars should also be equal

i. Simple Bar Diagram:
- Each bar represents the frequency of a single category, with a distinct gap from one another

Page 27: Fundamentals of biostatistics

ii. Multiple bar diagram:
- Used to show comparison of two or more sets of related statistical data

iii. Component / proportional bar diagram:
- Used to compare sizes of different component parts among themselves
- Also shows the relation between each part & the whole

Page 28: Fundamentals of biostatistics

2. Pie / Sector Diagram:
- A circle whose area is divided into segments by straight lines from centre to circumference
- Each segment expresses the proportional component of the attribute
- Angle (°) of a sector = class percentage × 3.6, or (class frequency / total frequency) × 360
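The sector-angle rule can be sketched in a few lines. The blood-group counts below are hypothetical, chosen only to illustrate the calculation; `sector_angles` is an illustrative helper name.

```python
def sector_angles(frequencies):
    """Return the angle in degrees of each pie sector: (f / total) x 360."""
    total = sum(frequencies.values())
    return {category: 360 * f / total for category, f in frequencies.items()}


# Hypothetical counts of blood groups in 100 persons.
blood_groups = {"A": 30, "B": 40, "AB": 10, "O": 20}
angles = sector_angles(blood_groups)

for group, angle in angles.items():
    print(f"{group}: {angle:.1f} degrees")
```

Because the total here is 100, each count is also a percentage, so multiplying by 3.6 gives the same angles.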

Page 29: Fundamentals of biostatistics

3. Pictogram / Picture Diagram:
- A popular method of showing the common man the frequency of occurrence of events in a population, such as attacks, deaths, numbers operated, admitted or discharged, accidents, etc.

Page 30: Fundamentals of biostatistics

4. Map Diagram / Spot Map:
- Prepared to visualize the geographic distribution of the frequency of a characteristic
- One point denotes the occurrence of one or more events

Page 31: Fundamentals of biostatistics

MEASURES OF CENTRAL TENDENCY
• When a series of observations has been tabulated in the form of a frequency distribution, it is often necessary to condense it into a single value that describes the characteristic of that distribution → called a Measure of Central Tendency
• The data values cluster around it
• These values enable comparisons to be made between one series of observations and another
• Individual values may overlap even when two distributions have different central tendencies
• E.g., the average incubation period of measles is 10 days and that of chicken pox is 15 days.

Page 32: Fundamentals of biostatistics

Types: Measures of Central Tendency
- Mean: Arithmetic Mean (AM), Geometric Mean (GM), Harmonic Mean (HM)
- Median
- Mode

Page 33: Fundamentals of biostatistics

• Arithmetic mean:
- Sum of all observations divided by the number of observations
- Mean (x̄) = Σx/n; x is a variable taking different observational values & n = no. of observations
- Example: ESR of 7 subjects are 8, 7, 9, 10, 7, 7 & 6 mm for 1st hr. Calculate the mean ESR.
- Mean (x̄) = (8+7+9+10+7+7+6)/7 = 54/7 = 7.7 mm

Page 34: Fundamentals of biostatistics

• Median: when observations are arranged in ascending or descending order of magnitude, the middle-most value is known as the Median.
• Problem:
- From the same example of ESR, observations are first arranged in ascending order: 6, 7, 7, 7, 8, 9, 10.
- Median = (7+1)/2 = 8/2 = 4th observation, i.e., 7
- When n is odd, Median = {(n+1)/2}th observation
- When n is even, Median = {(n/2)th + (n/2+1)th observation}/2
• Problem: suppose there are 8 observations of ESR: 5, 6, 7, 7, 7, 8, 9, 10
• Median = {(8/2)th + (8/2+1)th}/2 = {4th + 5th obs}/2 = (7+7)/2 = 7
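The odd/even rule for the median can be written out as a short sketch, using the two ESR series from the problems above. The function name is illustrative, and list positions are 0-based in the code while the slide counts from 1.

```python
def median(observations):
    """Middle value of the ordered observations (mean of the two middle ones if n is even)."""
    ordered = sorted(observations)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th observation, which sits at 0-based index n // 2.
        return ordered[middle]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


esr_odd = [8, 7, 9, 10, 7, 7, 6]
esr_even = [5, 6, 7, 7, 7, 8, 9, 10]

print(median(esr_odd))
print(median(esr_even))
```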

Page 35: Fundamentals of biostatistics

• Mode:
- The observation which occurs most frequently in a series
• Problem: ESR of 7 subjects are 8, 7, 9, 10, 7, 7 & 6 mm for 1st hr. Calculate the Mode.
- Mode is 7.
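For checking hand calculations, Python's standard `statistics` module provides the same measures. A brief sketch with the ESR series used in the problems:

```python
from statistics import mean, mode

# ESR (mm for 1st hr) of the 7 subjects from the worked problems.
esr = [8, 7, 9, 10, 7, 7, 6]

esr_mean = mean(esr)   # sum of observations / number of observations
esr_mode = mode(esr)   # most frequently occurring observation

print(f"Mean ESR = {esr_mean:.1f} mm")
print(f"Mode = {esr_mode} mm")
```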

Page 36: Fundamentals of biostatistics

• Calculation of weighted arithmetic mean:
- The following methods are used when the no. of observations is large

For Ungrouped Data:
- Suppose we have observations x₁, x₂, x₃, …, xₙ with corresponding frequencies f₁, f₂, f₃, …, fₙ
- Mean (x̄) = (f₁x₁ + f₂x₂ + … + fₙxₙ)/(f₁ + f₂ + … + fₙ) = Σfx/Σf

Page 37: Fundamentals of biostatistics

For Grouped Data:
- Data are arranged in groups & a frequency distribution table is prepared
- The mid value of each group is multiplied by its frequency
- The sum of the products is divided by the total no. of observations
- The mean so obtained is called the “weighted mean”
- Mean (x̄) = Σfx/Σf, where x = mid value of a group & f = its frequency

Page 38: Fundamentals of biostatistics

• Geometric mean:
- Used when data contain a few extremely large or small values
- It is the nth root of the product of n observations
• GM = ⁿ√(x₁·x₂·x₃·…·xₙ)
• Harmonic Mean:
- Reciprocal of the arithmetic mean of the reciprocals of the observations
- Arithmetic mean of reciprocals of observations = Σ(1/x)/n
- HM = n/Σ(1/x)
- Has limited use
- For positive observations that are not all equal, AM > GM > HM
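The three means can be compared on the same data. This sketch reuses the ESR series from the earlier problems and relies on `statistics.geometric_mean`, which requires Python 3.8 or later; the manual harmonic mean is included only to mirror the formula HM = n/Σ(1/x).

```python
from statistics import mean, geometric_mean, harmonic_mean

# ESR values from the earlier worked problems (all positive).
esr = [8, 7, 9, 10, 7, 7, 6]

am = mean(esr)
gm = geometric_mean(esr)
hm = harmonic_mean(esr)

# Harmonic mean written out from its definition, for comparison.
hm_manual = len(esr) / sum(1 / x for x in esr)

print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")
```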

Page 39: Fundamentals of biostatistics

Measures of dispersion

• Measures of central tendency do not provide information about the spread or scatter of values around them
• Measures of dispersion help us to find how individual observations are dispersed or scattered around the mean of a large series of data
• Different measures of dispersion are:
i. Range
ii. Mean deviation
iii. Standard deviation
iv. Variance
v. Coefficient of variation

Page 40: Fundamentals of biostatistics

• Range:
- Difference between the highest & lowest value
- Used to define the normal range of a biological characteristic
• Problem: Blood pressure (systolic/diastolic, mm of Hg) of 10 medical students is as follows: 140/70, 120/88, 160/90, 140/80, 110/70, 90/60, 124/64, 100/62, 110/70 & 154/90
• Range of Systolic BP of medical students = highest value − lowest value = 160 − 90 = 70 mm of Hg
• Range of Diastolic BP = 90 − 60 = 30 mm of Hg

Page 41: Fundamentals of biostatistics

• Mean deviation:
- Average of the deviations of the observations from the mean value, ignoring + or − signs
- Mean Deviation (MD) = Σ|x − x̄|/n, where x = observation, x̄ = mean, n = no. of observations

Page 42: Fundamentals of biostatistics

• Standard Deviation:
- Most frequently used measure of dispersion
- Square root of the arithmetic mean of the squares of deviations taken from the arithmetic mean
- In simple terms, “Root-Mean-Square-Deviation”
- SD = √{Σ(x − x̄)²/n}
- Where x = observation, x̄ = mean, n = no. of observations

Page 43: Fundamentals of biostatistics

• To estimate variability in a population from the values of a sample, degrees of freedom (n − 1) are used in place of the no. of observations (n)
• Standard deviation is calculated in the following stages:
- Calculate the mean
- Calculate the difference between each observation & the mean
- Square each difference
- Sum the squared values
- Divide the sum of squares by the no. of observations (n) to get the mean square deviation or variance (s²)
- Find the square root of the variance to get the “Root-Mean-Square-Deviation”
• Uses:
- Sample size calculation for a study
- Summarizes the deviation of a large series of observations around the mean in a single value
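The stages listed above translate directly into code. This is a sketch using the ESR series from the earlier problems; the `sample` flag (an illustrative name) switches the divisor from n to the degrees of freedom n − 1, and the results are compared with `pstdev` and `stdev` from the standard `statistics` module.

```python
from math import sqrt
from statistics import pstdev, stdev


def standard_deviation(observations, sample=False):
    """Root-mean-square deviation; divides by n - 1 when sample=True."""
    n = len(observations)
    mean = sum(observations) / n                      # stage 1: mean
    deviations = [x - mean for x in observations]     # stage 2: differences
    squared = [d ** 2 for d in deviations]            # stage 3: squares
    sum_of_squares = sum(squared)                     # stage 4: sum
    divisor = n - 1 if sample else n
    variance = sum_of_squares / divisor               # stage 5: variance
    return sqrt(variance)                             # stage 6: square root


esr = [8, 7, 9, 10, 7, 7, 6]
sd_n = standard_deviation(esr)
sd_n_minus_1 = standard_deviation(esr, sample=True)

print(f"SD (divide by n)     = {sd_n:.3f}")
print(f"SD (divide by n - 1) = {sd_n_minus_1:.3f}")
```

With small samples the two divisors give noticeably different values, which is why it matters which one a report has used.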

Page 44: Fundamentals of biostatistics

• Coefficient of Variation:
- Used to compare the variability of two or more different sets of observations
- Coefficient of Variation = (SD/Mean) × 100
- Coefficient of Variation indicates relative variability

Page 45: Fundamentals of biostatistics

NORMAL DISTRIBUTION
• The most important & useful distribution in theoretical statistics
• Quantitative data can be represented by a histogram & by joining the midpoints of each rectangle in the histogram we get a frequency polygon
• When the no. of observations becomes very large & class intervals are very much reduced → the frequency polygon loses its angulation → gives rise to a smooth curve known as a frequency curve
• Most biological variables, e.g., height, weight, blood cholesterol etc., follow a normal distribution, which can be graphically represented by the “normal curve”

Page 46: Fundamentals of biostatistics

• If a large no. of observations of any variable such as height, weight, blood pressure, pulse rate etc. are taken at random to make a representative sample of the world, and a frequency distribution table is made, it will show the following characteristics:
- Exactly half the observations will lie above & half below the mean, and all observations are symmetrically distributed on either side of the mean
- The maximum no. of frequencies will be seen in the middle around the mean and fewer at the extremities, decreasing smoothly on both sides

Page 47: Fundamentals of biostatistics

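• Mathematically, this can be expressed as follows:
- Mean ± 1 SD covers 68.27% of observations
- Mean ± 2 SD covers 95.45% of observations
- Mean ± 3 SD covers 99.73% of observations
- A distribution of this nature or shape is called a Normal Distribution or Gaussian Distribution
- This distribution pattern is usual for biological variables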
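The share of observations lying within 1, 2 and 3 standard deviations of the mean in a normal distribution can be checked numerically. This sketch uses `statistics.NormalDist` (Python 3.8 or later); `coverage` is an illustrative helper name.

```python
from statistics import NormalDist

# Normal distribution with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Mean ± {k} SD: {100 * coverage(k):.2f}%")
```

The same proportions hold for any mean and SD, since they depend only on the distance from the mean measured in SD units.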

Page 48: Fundamentals of biostatistics

• Normal Curve:
- Observations of a variable which are normally distributed in a population, when plotted as a frequency curve, give rise to the Normal Curve
• Characteristics of a Normal Curve:
- Smooth
- Bell shaped
- Bilaterally symmetrical
- Mean, Median & Mode coincide
- Distribution of observations under the normal curve follows the same pattern of normal distribution as already mentioned

Page 49: Fundamentals of biostatistics

• Standard Normal Curve:
- Each observation under a normal curve has a ‘Z’ value
- ‘Z’, or standard normal variate, or relative deviate, or critical ratio, is the measure of distance of the observation from the mean in terms of standard deviation
- If the ‘Z’ score is −2 → the observation is 2 S.D. away from the mean on the left hand side; if it is +2, it implies the observation is 2 S.D. away on the right hand side
- If all observations of normal curves are replaced by ‘Z’ scores, virtually all curves become identical
- This standardized curve with mean ‘0’ and variance 1 is known as the “Standard Normal Curve”

Page 50: Fundamentals of biostatistics

• It has all the properties of the Normal Curve
• Follows a normal distribution with mean ‘0’ & variance 1
• Area under the curve is 1
• Mean, Median & Mode coincide & they are 0
• Standard deviation is 1
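A short sketch of converting observations to ‘Z’ values, i.e., Z = (observation − mean)/SD. The heights are hypothetical, and the sketch assumes the SD is computed with divisor n (`pstdev`); after conversion the values have mean 0 and SD 1, as described above.

```python
from statistics import mean, pstdev


def z_scores(observations):
    """Distance of each observation from the mean, in units of SD."""
    m = mean(observations)
    sd = pstdev(observations)
    return [(x - m) / sd for x in observations]


# Hypothetical heights in cm, for illustration only.
heights = [160, 165, 170, 175, 180]
z = z_scores(heights)

for h, score in zip(heights, z):
    print(f"{h} cm -> Z = {score:+.2f}")
```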

Page 51: Fundamentals of biostatistics

SAMPLING TECHNIQUE

Universe / Population:
- Aggregate of units of observation about which certain information is required
- A population is a set of persons (or objects) having a common observable characteristic
- E.g., while recording the pulse rate of boys in a school, all boys in the school constitute the population/universe

Sample:
- A portion or part of the total population selected in some manner

Sampling Frame:
- A complete, non-overlapping list of all the sampling units (persons or objects) of the population from which the sample is to be drawn
- E.g., a telephone directory acts as a frame for conducting an opinion survey in a city

Page 52: Fundamentals of biostatistics

• Statistic:- A characteristic of a sample, whereas a
• Parameter:- a characteristic of a population

Types of sampling: non-probability & probability/random sampling

• Non-probability sampling:
- Easier, less expensive to perform
- Sampling is done by choice & not by chance
- Information collected cannot be presumed to be representative of the whole universe
- E.g., Quota Sampling, Convenience Sampling, Purposive Sampling, Snowball Sampling, Case Study

Page 53: Fundamentals of biostatistics

• Probability / Random Sampling:
- Samples are selected from the universe by a proper sampling technique
- Each member of the universe has an equal opportunity of being selected
- Composition of the sample from the universe occurs only by chance

Types:
o Simple Random Sampling

Page 54: Fundamentals of biostatistics

o Stratified Random Sampling
o Systematic Random Sampling
o Cluster Sampling
o Multistage Sampling
o Multiphase Sampling
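The slides only name these sampling methods. As a rough illustration, three of them can be sketched with Python's `random` module. The sampling frame of 100 numbered units, the two strata and the 10% sampling fraction are all hypothetical, and the function names are illustrative rather than standard.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every unit in the frame has an equal chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Random start, then every k-th unit, where k = frame size // n."""
    interval = len(frame) // n
    start = rng.randrange(interval)
    return [frame[start + i * interval] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Simple random sample of the same fraction from each stratum."""
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample


rng = random.Random(42)          # fixed seed so the sketch is repeatable
frame = list(range(1, 101))      # hypothetical frame: units numbered 1..100
strata = {"boys": list(range(1, 61)), "girls": list(range(61, 101))}

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(strata, 0.1, rng)

print(sorted(srs))
print(systematic)
print(sorted(stratified))
```

Cluster, multistage and multiphase sampling build on the same idea, selecting groups of units first and then sampling within or across them.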

Page 55: Fundamentals of biostatistics

Thank You

Page 56: Fundamentals of biostatistics

• Exercise no. 1
Following are the diastolic blood pressure values (in mmHg) of 10 male adults: 80, 60, 70, 80, 65, 74, 66, 80, 70, 55

Solution:
Mode = 80
Arranging in ascending order: 55, 60, 65, 66, 70, 70, 74, 80, 80, 80
Median = {(10/2)th + (10/2+1)th}/2 = {5th + 6th}/2 = (70+70)/2 = 70
Mean = 700/10 = 70

Page 57: Fundamentals of biostatistics

Exercise No. 5.
The following table shows the number of children per family in a village.

Calculate the measure of central tendency:

No. of children per family   No. of families

0 30

1 40

2 70

3 30

4 20

5 10

Page 58: Fundamentals of biostatistics

Solution: Table 1.1 showing number of children in families

• Average (x̄) no. of children = Σfx/Σf = 400/200 = 2

No. of children in a family (x)   No. of families (f)   Total no. of children (fx)

0 30 0x30=0

1 40 1x40=40

2 70 2x70=140

3 30 3x30=90

4 20 4x20=80

5 10 5x10=50

Total 200 400
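The same weighted-mean calculation, written as a small sketch using the figures from Table 1.1 (`weighted_mean` is an illustrative helper name):

```python
def weighted_mean(values, frequencies):
    """Sum of f*x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / sum(frequencies)


children = [0, 1, 2, 3, 4, 5]          # no. of children in a family (x)
families = [30, 40, 70, 30, 20, 10]    # no. of families (f)

average_children = weighted_mean(children, families)
print(f"Average no. of children per family = {average_children}")
```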

Page 59: Fundamentals of biostatistics

Exercise no. 8
Marks obtained by 50 students in community medicine in the final MBBS Part-I Exam are as follows:

Calculate central tendency.

Marks   No. of students

41-50 5

51-60 18

61-70 15

71-80 7

81-90 5

Page 60: Fundamentals of biostatistics

• Solution:

Average marks obtained by students = Σfx/Σf = 3165/50 = 63.3

Marks obtained   No. of students (f)   Mid value of marks group (x)   Total marks obtained by each group (fx)

41-50 5 45.5 227.5

51-60 18 55.5 999

61-70 15 65.5 982.5

71-80 7 75.5 528.5

81-90 5 85.5 427.5

Total 50 3165

Page 61: Fundamentals of biostatistics

Calculation of Median:

The median is located from the class frequencies (no. of students), not from the fx column.

N/2 = 50/2 = 25
Median class = 60.5-70.5 (the first class whose cumulative frequency reaches N/2)
Median = L + {(N/2 − cf) × h}/f
• where:
• L = lower boundary of the median class
• h = class width
• N = total frequency
• cf = cumulative frequency of the class previous to the median class
• f = frequency in the median class

Class boundary   Frequency   Cumulative frequency
40.5-50.5   5   5 < N/2
50.5-60.5   18   cf = 23 < N/2
60.5-70.5   f = 15   38 > N/2
70.5-80.5   7   45
80.5-90.5   5   50
Total   50

Page 62: Fundamentals of biostatistics

• Median = 60.5 + (25 − 23) × 10/15 = 60.5 + 20/15 = 60.5 + 1.33 = 61.83

*Modal class: the class having maximum frequency

Class boundary   Frequency
40.5-50.5   f1 = 5
50.5-60.5   fm = 18   Modal class
60.5-70.5   f2 = 15
70.5-80.5   7
80.5-90.5   5
Total   50

Page 63: Fundamentals of biostatistics

• Mode = L + {(fm − f1)/(2fm − f1 − f2)} × h

Where,
L = lower boundary of modal class
fm = frequency of modal class
f1 = frequency of pre-modal class
f2 = frequency of post-modal class
h = width of modal class

Mode = 50.5 + {(18 − 5)/(2 × 18 − 5 − 15)} × 10 = 50.5 + (13/16) × 10 = 50.5 + 8.13 = 58.63
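A sketch of the grouped median and mode formulas applied to the class frequencies of Exercise 8 (5, 18, 15, 7, 5 students in classes with boundaries 40.5 to 90.5). The function names are illustrative; the formulas are the ones stated above, Median = L + (N/2 − cf) × h/f and Mode = L + (fm − f1)/(2fm − f1 − f2) × h.

```python
boundaries = [40.5, 50.5, 60.5, 70.5, 80.5, 90.5]   # class boundaries
frequencies = [5, 18, 15, 7, 5]                     # no. of students per class


def grouped_median(boundaries, frequencies):
    """Median = L + (N/2 - cf) * h / f for the median class."""
    n = sum(frequencies)
    cumulative = 0   # cumulative frequency before the current class
    for i, f in enumerate(frequencies):
        if cumulative + f >= n / 2:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + (n / 2 - cumulative) * width / f
        cumulative += f


def grouped_mode(boundaries, frequencies):
    """Mode = L + (fm - f1) / (2*fm - f1 - f2) * h for the modal class."""
    m = frequencies.index(max(frequencies))
    fm = frequencies[m]
    f1 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    lower = boundaries[m]
    width = boundaries[m + 1] - boundaries[m]
    return lower + (fm - f1) / (2 * fm - f1 - f2) * width


median_marks = grouped_median(boundaries, frequencies)
mode_marks = grouped_mode(boundaries, frequencies)
print(f"Median = {median_marks:.2f}, Mode = {mode_marks:.2f}")
```

Both values fall below the mean of 63.3, which is what one would expect for a distribution with a longer tail towards higher marks.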

Page 64: Fundamentals of biostatistics

• Exercise no. 11
Calculate measures of dispersion from the following data: 15, 17, 19, 25, 30, 35, 48

Solution:
Range = 48 − 15 = 33
Mean deviation = Σ|x − x̄|/n

Observation (x)   Mean (x̄)   Deviation (x − x̄)
15   x̄ = Σx/n = 189/7 = 27   -12
17   -10
19   -8
25   -2
30   3
35   8
48   21
Σx = 189   Σ|x − x̄| = 64, ignoring − or + signs

Mean deviation = 64/7 = 9.14

Page 65: Fundamentals of biostatistics

• Standard deviation:

Observation (x)   Deviation (x − x̄)   (x − x̄)²
15   -12   144
17   -10   100
19   -8   64
25   -2   4
30   3   9
35   8   64
48   21   441
Σx = 189   Σ|x − x̄| = 64   Σ(x − x̄)² = 826

SD = √(826/7) = √118 = 10.86 (dividing by n − 1 = 6 instead gives √137.67 = 11.73)

Page 66: Fundamentals of biostatistics

• Coefficient of variation = (SD/Mean) × 100 = (10.86/27) × 100 = 40.2%
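The measures of dispersion for the seven observations of Exercise 11 can be recomputed directly as a check on the arithmetic. This sketch shows the SD with both divisors (n and n − 1) and bases the coefficient of variation on the divide-by-n value; that choice is an assumption made here for illustration.

```python
from math import sqrt

data = [15, 17, 19, 25, 30, 35, 48]
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
# Mean deviation: average of absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / n
# Sum of squared deviations from the mean.
sum_of_squares = sum((x - mean) ** 2 for x in data)

sd_n = sqrt(sum_of_squares / n)                 # divisor n
sd_n_minus_1 = sqrt(sum_of_squares / (n - 1))   # divisor n - 1
cv = sd_n / mean * 100                          # coefficient of variation, %

print(f"Range = {value_range}")
print(f"Mean deviation = {mean_deviation:.2f}")
print(f"SD = {sd_n:.2f} (n) or {sd_n_minus_1:.2f} (n - 1)")
print(f"CV = {cv:.1f}%")
```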

Page 67: Fundamentals of biostatistics

• Exercise no. 20
Data A (items) & B (frequencies) are given below:

Calculate mean deviation & standard deviation.

A - item   B - frequency

10-20 4

20-30 8

30-40 8

40-50 16

50-60 12

60-70 6

70-80 4

Page 68: Fundamentals of biostatistics

• Solution: a = assumed mean, h = class width, d1 = (x − a)/h, N = Σf

• x̄ = a + (Σfd1/N) × h

SD = h × √[{Σfd1² − (Σfd1)²/N}/(N − 1)], where fd1² means f × d1²

Class interval (Data A)   Frequency f (Data B)   Mid value (x)   d1 = (x − a)/h   fd1   fd1²
10-20   4   15   (15 − 35)/10 = -2   -8   16
20-30   8   25   -1   -8   8
30-40   8   a = 35   0   0   0
40-50   16   45   1   16   16
50-60   12   55   2   24   48
60-70   6   65   3   18   54
70-80   4   75   4   16   64
Total   N = 58   Σfd1 = 58   Σfd1² = 206

Page 69: Fundamentals of biostatistics

• x̄ = 35 + (58/58) × 10 = 45
• SD = 10 × √[{206 − (58)²/58}/(58 − 1)] = 10 × √(148/57) = 10 × 1.611 = 16.11
• Mean deviation = Σf|x − x̄|/N = 720/58 = 12.41
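A sketch of the assumed-mean (step-deviation) method applied to all seven classes of Exercise 20, with a = 35 and h = 10. Variable names are illustrative. The SD uses the divisor N − 1, as in the formula above, and the mean deviation is taken about the calculated mean.

```python
from math import sqrt

mid_values = [15, 25, 35, 45, 55, 65, 75]    # mid value of each class (x)
frequencies = [4, 8, 8, 16, 12, 6, 4]        # Data B (f)
a, h = 35, 10                                # assumed mean and class width

n = sum(frequencies)
d = [(x - a) / h for x in mid_values]        # step deviations d1

sum_fd = sum(f * di for f, di in zip(frequencies, d))
sum_fd2 = sum(f * di ** 2 for f, di in zip(frequencies, d))

mean = a + sum_fd / n * h
sd = h * sqrt((sum_fd2 - sum_fd ** 2 / n) / (n - 1))
mean_deviation = sum(f * abs(x - mean)
                     for f, x in zip(frequencies, mid_values)) / n

print(f"N = {n}, sum fd = {sum_fd}, sum fd^2 = {sum_fd2}")
print(f"Mean = {mean}, SD = {sd:.2f}, Mean deviation = {mean_deviation:.2f}")
```

Choosing a different assumed mean changes the intermediate sums but not the final mean or SD, which is a useful check on the table.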