Welcome to the

Biostatistics 1

Course instructor: Dr. JMA Hannan

Class hours: Monday 6:00 pm – 9:00 pm

Cell: 01199248989

E-mail: jmahannan@northsouth.edu

How to do well in this class?

1. Forget about your previous failure.

2. Attend lectures and take notes.

3. * Effort = Result.

4. Read the syllabus.

5. Read exam questions carefully.

6. Answer all parts of a given question.

7. Turn assignments in on time.

8. Ask if you have questions.

General Policy

Examination Marks

Midterm 1: 20%
Midterm 2: 20%
Final exam: 40%
Class tests: 10%
Assignment: 5%
Class participation: 5%

Total marks 100

Grading Policy

Numerical Score    Letter Grade
93 and above       A
90 – 92            A-
87 – 89            B+
83 – 86            B
80 – 82            B-
77 – 79            C+
73 – 76            C
70 – 72            C-
67 – 69            D+
60 – 66            D
< 60               F (Fail)

If you are absent from 3 consecutive classes, you will be given an “F”.

Topics

Lecture 1: Introduction to Biostatistics – scope of biostatistics in biology and medical sciences. Data and presentation; mean, median and mode; range, standard deviation, standard error and coefficient of variation.

Lecture 2: Normal distribution, test of hypothesis

Lecture 3: z-test, t-test

Lecture 4: One-way ANOVA

1st Midterm Exam (July 10 – 15, 2008)

Lecture 5: Post hoc tests (Bonferroni, Duncan, Dunnett, LSD, Tukey test), repeated measures ANOVA

Lecture 6: Mann-Whitney, Wilcoxon rank test & Kruskal-Wallis test (Tukey test)

Lecture 7: Chi-square test, relative risk, odds ratio

Lecture 8: Simple correlation & rank correlation

Lecture 9: Regression analysis

Lecture 10-12: Introduction to SPSS and analysis of data using SPSS

Lecture 13: Review class

Final Exam (September 10 – 15, 2008)

Objective of this course

• To develop and understand the fundamental concepts of statistics.

• To be knowledgeable about the different applications of statistical methods in the MPH context.

• To enable students to conduct statistical analyses with a user-friendly software package such as SPSS and to interpret the output correctly.

• To be able to analyze simple data sets correctly and to report the results precisely and concisely.

Textbook and reference books

• Text Book of Medical & Pharmaceutical Statistics – Dr JMA Hannan

• Biostatistics: A Foundation for Analysis in the Health Sciences, by Wayne W. Daniel.

• Medical Statistics by Michael J. Campbell, David Machin.

STATISTICS - HISTORICAL PERSPECTIVES

The word statistics seems to be derived from the Latin word ‘Status’, the Italian word ‘Statista’, the German word ‘Statistik’ or the French word ‘Statistique’, all of which mean ‘political state’.

In ancient times, kings used to collect information about the total population, land, wealth and soldiers of the country, and statistics thus served as an index of a country’s overall condition. In olden days, statistics was regarded as ‘the science of kings’.

STATISTICS - HISTORICAL PERSPECTIVES

In the mid-17th century, the theoretical development of modern statistics began with the introduction of the ‘Theory of Probability’ and the ‘Theory of Games and Chances’.

Gambling, in the form of games of chance, led the French mathematician Pascal (1623-1662) to originate this theory of probability.

STATISTICS - HISTORICAL PERSPECTIVES

Francis Galton (1822-1911) introduced the concept of the regression line.

Galton and his friend Karl Pearson later introduced correlation analysis and the chi-square test, which play an important role in the modern theory of statistics.

W.S. Gosset (1876-1937), a student of Karl Pearson, introduced the ‘Student's t-test’, a basic tool of statistical analysis.

STATISTICS - HISTORICAL PERSPECTIVES

Sir R.A. Fisher (1890-1962), known as the father of statistics, introduced a number of statistical procedures, such as Analysis of Variance (ANOVA) and the design of experiments.

DEFINITION OF STATISTICS

Croxton and Cowden have given a very simple and concise definition of statistics.

Statistics is the science of

• collection

• organization

• presentation

• analysis and

• interpretation of data

DEFINITION OF BIOSTATISTICS

BIOSTATISTICS is derived from the Greek words Bios (Life) & Metron (Measure).

Thus biostatistics is the term used when the tools of statistics are applied to data derived from the biological and medical sciences.

Biostatistics is the science of

• collection

• organization

• presentation

• analysis and

• interpretation of data derived from biological sciences such as medicine.

Why use Statistics?

• Simplifies complexity

• Helps to compare

APPLICATION OF BIOSTATISTICS

The concepts of statistics may be applied in a number of fields, including public health, the pharmaceutical industry, business, psychology and agriculture.

In Medicine

In the field of medicine, statistical methods are used to evaluate the effectiveness of a new drug or method of treatment. A drug is given to animals or humans, and statistical methods are used to explore whether the changes produced are due to the action of the drug or to chance, or to compare the action of two or more different drugs or different dosages of the same drug.

To find an association between a disease and a risk factor, such as myocardial infarction (MI) and alcohol intake, we need the help of statistics.

To define the normal range or limits of physiological and biochemical parameters. For example, the average systolic blood pressure is 120 mmHg and the random blood glucose level is 6.7 mmol/L, but the limits within which values on either side of the average may still be regarded as normal must be established with an appropriate statistical technique.

APPLICATION OF BIOSTATISTICS (continued)

In Community Medicine and Public Health

In epidemiological studies, the role of causative factors is statistically tested. For example, deficiency of iodine as an important cause of goiter in a community is confirmed only after comparing the incidence of goiter cases before and after giving iodized salt.

To test the usefulness of vaccines in the field, the percentage of attacks or deaths among the vaccinated subjects is compared with that among unvaccinated ones to find whether the difference observed is statistically significant.

Statistics plays an important role in many decision-making processes in public health, for example: what factors increase the risk that an individual will develop coronary heart disease?

To address these issues and others, we rely on the methods of biostatistics.

What is data?

• The raw material of statistics is data.

• We may define data as numbers or observations, usually obtained by some process of counting or measurement.

• Data are the outcome of facts (sex, occupation), events (birth, death, disease) and measurements (height, weight) about many individuals; that is, when these are recorded for a number of people they become data, e.g.

Sex: male/female

Birth: live birth/still birth

Death: cause/age/sex

Occupation: teacher/physician/labourer etc.

Types of data

1. Qualitative data

» Nominal data

» Rank data

2. Numerical or quantitative data

» Discrete data

» Continuous data

Qualitative Data

Nominal Data

• Nominal data are data that one can name.

• They are not measured but simply counted.

• They often consist of unordered ‘either-or’ type observations, for example: Dead or Alive; Male or Female; Cured or Not Cured; Pregnant or Not Pregnant.

Qualitative Data

Ranked Data

• If there are more than two categories of classification it may be possible to order them in some way.

• For example, after treatment a patient may be improved, the same or worse; a woman may never have conceived, may have conceived but spontaneously aborted, or may have given birth to a live infant.

• In some situations we have a group of observations that are first arranged from highest to lowest according to magnitude and then assigned numbers that correspond to each observation’s place in the sequence. This type of data is known as ranked data.

Numerical/Quantitative data

Discrete Data

• Such data consist of counts, which can take only isolated values.

• An example is the number of deaths in a hospital per year.

Continuous Data

• Such data are measurements that can, in theory at least, take any value within a given range.

• Example: diastolic blood pressure is continuous, although in practice it is often converted into categories such as hypertension and normotension.

Collection of data

1. Interviewing or enumeration

2. Questionnaire

3. Experiments: data from physiology, pharmacology and clinical pathology labs, hospital wards, fundamental research etc.

4. Surveys: data on the incidence or prevalence of health or disease situations in a community, such as the incidence of malaria or the prevalence of leprosy.

5. Records: records are maintained as a routine in registers or books over a long period of time for still births, deaths etc. Data are collected from these records.

Methods of presentation of data

1. Tabulation of data

2. Diagrammatic presentation

Every study or experiment yields a set of data. Its size can range from a few measurements to many thousands of observations.

The principal object of data presentation, whether tabular or graphical, is to convey the essential features of the study to any reader of the final publication.

Tabulation of data

Objectives of tabulation:

To clarify the object of investigation.

To simplify complex data.

To facilitate comparison.

A statistical table is a systematic organisation of data in columns and rows in accordance with some characteristics. Tabulation is the process of presenting data in tables.

Rules for Tabulation of data

Construction of a good statistical table is a specialized art and requires great skill, experience and common sense.

The table should be simple and compact

All titles, subtitles, captions etc. should be arranged in a systematic manner.

The unit of measurement should be clearly defined in the table.

A table should be complete and self-explanatory.

A table should be attractive to draw attention of readers.

Accurate statistical analysis should be done.

Abbreviations should be avoided.

If units of measurement are involved, such as mg/100 ml for serum cholesterol levels, they should be specified.

Parts of tabulation

• Table number

• Title

• Caption

• Headings of columns and rows

• Body of the table

• Foot-note

Layout of a table:

Number & Title of the table
Row Heading        | Caption                                     | Total
                   | Col. Heading | Col. Heading | Col. Heading  |
Row sub heading    |                                             |
Row sub heading    |                    Body                     |
Row sub heading    |                                             |

The following table shows cigarette consumption per person among adolescent boys.

Year    Number of cigarettes consumed per adolescent boy
1996    654
1997    700
1998    900
1999    1200
2000    1500
2001    1350

Example: Tabulation of data

Table 2: Effects of propranolol on blood pressure in humans

Group                          Blood pressure
                               Systolic BP    Diastolic BP    MAP
Control (n=10)                 210±40         100±24          74±29
Propranolol treated (n=20)     120±30         65±20           60±23
t/p value (unpaired t-test)
Control vs propranolol         4.23/0.01      3.12/0.02       2.13/0.05

Data are presented as mean±SD. An unpaired t-test was used as the test of significance. *p<0.05, **p<0.01.

Graphical presentation of data

Importance of diagrams:

They are attractive and impressive.

They save the time and labour needed to understand the data.

They make data simple.

They make comparison easy.

They provide more information than a table.

A diagram is a visual form of data presentation. Complicated data can be understood easily through a diagram or graph. It is convincing to the eye and mind.

Types of diagram

Line diagram

Bar diagram (simple & multiple)

Pie diagram

Histogram

Scatter diagram

Line diagram: number of cigarettes consumed per adolescent boy, 1996 to 2001 (y-axis: 0 to 2000).

Fig: AUC and Glycemic index of diabetic food in rats. (Line diagram; x-axis: 0 min to 120 min; series: Glucose, Diabetic food.)

Bar diagram showing the number of cigarettes consumed per adolescent boy, 1996 to 2001 (y-axis: 0 to 2000).

Fig: AUC and Glycemic index of diabetic food in rats. (Bar diagram; groups: AUC, Glycemic index; series: Glucose only, Diabetic food.)

Pie diagram for current contraceptive use in Bangladesh (BDHS 2004)

Pill: 45%
Injectable: 17%
Condom: 7%
Sterilization: 10%
IUD and Norplant: 2%
Traditional: 19%

SUMMARIZING DATA: Measures of location

A measure of location or central tendency or average is a single value used to represent a set of data.

Objective of average:

1. To get a single value that represents the entire data.

2. To facilitate comparison between groups of data of a similar nature.

Important measures of central tendency are:

1. Mean

2. Median

3. Mode

Mean

Mean = sum of all the observation values ÷ number of observations

The mean of n observations $x_1, x_2, x_3, \ldots, x_n$ is given by

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

where x stands for an observed value, n stands for the number of observations in the data set, $\sum x$ stands for the sum of all observed x values, and $\bar{x}$ stands for the mean value of x.

Example: the mean of 10, 20, 30, 25, 15 is (10 + 20 + 30 + 25 + 15)/5 = 20.
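
The definition of the mean can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` and the variable `observations` are chosen here for illustration, and the data are the five values from the slide's example.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


# The five observations used in the example on the slide.
observations = [10, 20, 30, 25, 15]
print(mean(observations))  # 20.0
```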

Mean

Merits:

• It is the most popular average, easy to understand and easy to calculate.

• It takes all the observations into account.

• The mean is used in computing other statistics (such as the variance, standard deviation etc.).

Limitation:

• The mean is affected by extremely high or low values.

• It is not a good measure of average in an extremely asymmetric distribution of observations.

Median

When all the observations of a set of data are arranged in either ascending or descending order, the middle observation is known as the median. If the number of observations is even, the mean of the two central values is taken as the median.

Median = the middle value of a set of data.
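
The rule for odd and even numbers of observations can be sketched as follows. The function name `median` and the two small data sets are illustrative assumptions, not taken from the lecture.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: the single middle observation
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: mean of the two central values


print(median([7, 3, 9, 5, 1]))  # 5
print(median([7, 3, 9, 5]))     # 6.0
```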

Median for grouped data

Example: Median of a grouped frequency distribution

Mark in test    Frequency    Cumulative frequency    Range of cumulative frequency
5 - 9           12           12                      < 12
9 - 13          8            20                      12 - 20
13 - 17         15           35                      20 - 35
17 - 21         19           54                      35 - 54
21 - 25         14           68                      54 - 68
25 - 29         7            75                      68 - 75
Total           75

$$\text{Median} = L + \frac{\frac{n}{2} - F}{f} \times c$$

where L = the lower limit of the median class (the median class is the class that contains the (n/2)th observation of the series), n = total number of observations, F = cumulative frequency of the class just preceding the median class, f = frequency of the median class, and c = the class interval (width) of the median class.

Here n = 75, therefore n/2 = 75/2 = 37.5. Looking at the cumulative range column in the table, we find that n/2 (37.5) falls in the range 35 – 54, which belongs to the class 17 – 21. This means that the median value lies between 17 and 21. L = 17, F = 35, f = 19, c = 21 – 17 = 4.

$$\text{Median} = 17 + \frac{37.5 - 35}{19} \times 4 = 17.53$$
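
The grouped-median formula, Median = L + ((n/2 − F)/f) × c, can be sketched in Python as below. The function name `grouped_median` and the tuple layout are assumptions made for this illustration, and the class width c is taken as the difference between the class limits (21 − 17 = 4 for the median class in the table).

```python
def grouped_median(classes):
    """Median of a grouped frequency distribution.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    """
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    cumulative = 0  # F: cumulative frequency of the classes before the current one
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            width = upper - lower  # c: class interval of the median class
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("no classes supplied")


# Marks table from the slide: (lower limit, upper limit, frequency).
marks = [(5, 9, 12), (9, 13, 8), (13, 17, 15),
         (17, 21, 19), (21, 25, 14), (25, 29, 7)]
print(round(grouped_median(marks), 2))  # 17.53
```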

Median

Merits:

• Median is easy to understand and easy to calculate.

• It is not affected by extremely high or low values.

Limitation:

• It is not based on all the observations. It is a positional average and thus is not determined by each and every observation.

• It is a less reliable average than the mean when the number of observations is small.

Mode

The mode is the value in a set of data that occurs most frequently. It is the typical or commonly observed value, the one that occurs the maximum number of times.

Example: the mode of the observations 3, 6, 7, 9, 6, 8, 6 is 6.
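
A short Python sketch of finding the mode by counting how often each value occurs. The helper name `modes` is an assumption made here; it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([3, 6, 7, 9, 6, 8, 6]))  # [6]
```

Because the function only counts occurrences, it also works for non-numeric data such as blood groups.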

Mode

Merits:

• The mode is easy to understand and easy to calculate.

• Like the median, the mode is not at all affected by extremely high or low values.

• When one value occurs with a large frequency in a distribution, the mode is meaningful as an average.

Limitation:

• It is not based on all the observations.

• It is a less reliable average than the mean when the number of observations is small.

Since an average is a single value representing a group of values, it must be properly interpreted; otherwise there is a possibility of reaching a wrong conclusion.

COMPARISON OF MEAN, MEDIAN & MODE

• The mode is useful for non-numeric data. It provides little information about the rest of the values in the data.

• The mean can be seriously affected by the presence of outliers, but the median is not. An outlier is an observation that is very different from all other observations in a data set, i.e. a very small or very large value (e.g. 200 in the second data set below).

5 7 8 8 12 15 19 21 23

median = 12, mean = 13.1

5 7 8 8 12 15 19 21 23 200

median = 13.5, mean = 31.8

• The median (a positional average) changes only slightly, from 12 to 13.5, because it depends only on the values of the middle observations. The mean changes markedly, from 13.1 to 31.8, because it depends on the value of every observation. So, in the above example, as the value of the last observation increases, so too does the mean.

• Outliers can sometimes occur as a result of error or deliberate misinformation. In these cases, the outliers should be excluded from the measure of central tendency. Other times, outliers just show how different one value is, and this can be a very useful piece of data.
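
The effect of the outlier on the two averages can be reproduced with Python's standard `statistics` module. The variable names are illustrative; the data are the two series from the slide.

```python
from statistics import mean, median

data = [5, 7, 8, 8, 12, 15, 19, 21, 23]
with_outlier = data + [200]  # the same series with one extreme value added

print(round(mean(data), 1), median(data))                  # 13.1 12
print(round(mean(with_outlier), 1), median(with_outlier))  # 31.8 13.5
```

One extreme value more than doubles the mean, while the median moves only from 12 to 13.5.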

COMPARISON OF MEAN, MEDIAN & MODE (cont’d)

3. Half of the data lie below the median and half lie above it. This will be approximately true for the mean when the data are symmetric. If the data are skewed, the median may differ significantly from the mean, and usually the median would be used.

4. By choosing a wrong measure of central tendency, one can mislead people with statistics. In fact, this is commonly done.

SUMMARIZING DATA: Measures of variation

A measure of dispersion (variation) measures the extent to which individual values deviate from the central value (average). It indicates how representative the central value is. Dispersion is small if the values are closely bunched about their mean and large if the values are scattered widely about their mean.

The median and mean marks for both tests are 20, but data A are more spread out than data B.

Important measures of dispersion are:

1. Range

2. Variance & standard deviation

3. Standard error of Mean

4. Co-efficient of variation.

Range

Range is the absolute difference between the highest value and the lowest value in a series of observations.

Range = largest value - smallest value

Example: the weights of 10 students are:

25, 28, 33, 36, 40, 45, 49, 52, 55, 57.

Range is 57 – 25 = 32.

• The range is the simplest measure of dispersion.

• It is a rough measure of dispersion, as it depends only on the extreme items and not on all the items.

• It does not tell us anything about the distribution of values in the series.
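
As a minimal sketch, the range of the ten weights in the example can be computed as below. The function name `value_range` is an assumption, chosen to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


weights = [25, 28, 33, 36, 40, 45, 49, 52, 55, 57]
print(value_range(weights))  # 32
```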

Range

Application:

Range is used in medical science to define the normal limits of biological characteristics.

Example: the normal ranges of systolic and diastolic blood pressure are 100 – 140 mmHg and 80 – 90 mmHg respectively. Ordinarily, observations falling within a particular range are considered normal and those falling outside the normal range are considered abnormal.

The range for a biological characteristic such as blood cholesterol, fasting blood sugar, hemoglobin or bilirubin is worked out after measuring the characteristic in a large number of healthy persons of the same age, sex, class etc.

Range

Merits:

• It is simple to compute and understand.

• It gives a rough but quick answer.

Limitation:

• It is not a satisfactory measure, as it is based only on the two extreme values and ignores the distribution of all other observations between the extremes. These extreme values vary from study to study, depending upon the size and nature of the sample and the type of study.

Variance & Standard deviation

• Karl Pearson introduced the concept of standard deviation in 1893.

• The standard deviation is a statistic that tells us how tightly the values in a set of data are clustered around the mean.

The variance is a measure of spread based on the squared deviations of every observation from their mean; for a sample, the sum of the squared deviations is divided by n − 1. The standard deviation is the square root of the variance.

$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$S.D. = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

For example, for the numbers 1, 2 and 3 the mean is 2 and the squared deviations are 1, 0 and 1, so the sum of squared deviations is 2. The variance is 2/(3 − 1) = 1 and the standard deviation is:

SD = √1 = 1

(Dividing by n = 3 instead of n − 1 gives a variance of 0.667 and SD = √0.667 = 0.816.)
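
The calculation for the numbers 1, 2 and 3 can be sketched in Python. The function names are illustrative assumptions; the sample versions divide by n − 1, and the population version (dividing by n) is included only to show where the value 0.667 comes from.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two observations")
    centre = sum(values) / n
    return sum((x - centre) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    n = len(values)
    centre = sum(values) / n
    return sum((x - centre) ** 2 for x in values) / n


numbers = [1, 2, 3]
print(sample_variance(numbers), sample_sd(numbers))   # 1.0 1.0
print(round(population_variance(numbers), 3))         # 0.667
print(round(math.sqrt(population_variance(numbers)), 3))  # 0.816
```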

Standard deviation

Merits:

• It is the most important and widely used measure of dispersion.

• It is based on all the observations, and the actual signs of the deviations are used.

• Standard deviation provides the unit of measurement for the normal distribution.

• It is the basis for measuring the coefficient of correlation, sampling and statistical inference.

Limitation:

• It is not easy to understand and is difficult to calculate.

• It is affected by the value of every item in the series.

Calculations of SD:

In these two groups (A and B), the means are the same (both 20), but their variation (SD) is different (SD of A = 8.2 and SD of B = 5.5).

Calculations of SD with alternative formulas:

The greater the SD, the greater the variation of the observations.

The mean is presented with the SD as mean ± SD.

Standard Error of Mean

The standard error of a sample mean is the sample standard deviation divided by the square root of the sample size.

$$SE = \frac{SD}{\sqrt{n}}$$

If we draw a series of samples from the same population and calculate the mean of the observations in each, we have a series of means. The series of means, like the series of observations in each sample, has a standard deviation. The SE of the mean of one sample is an estimate of the SD that would be obtained from the means of a large number of samples drawn from the population.

In addition, if we draw random samples from a population, their means will vary from one sample to another. This variation depends on the variation of the population and the size of the samples. We do not know the variation of the population, so we use the variation of the sample as an estimate of it. This is expressed as the SD, and if we divide the SD by the square root of the number of observations in the sample, we have an estimate of the SE of the mean: SEM = SD/√n.
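
A minimal Python sketch of SEM = SD/√n, using `statistics.stdev` (which divides by n − 1). The function name `standard_error` is an assumption, and the sample reuses the ten student weights from the range example purely for illustration.

```python
import math
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return stdev(values) / math.sqrt(len(values))


sample = [25, 28, 33, 36, 40, 45, 49, 52, 55, 57]
print(round(standard_error(sample), 2))  # 3.59
```

Because n appears under a square root, quadrupling the sample size only halves the standard error.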

Advantage of SE

• To determine whether the difference between two means is significant.

$$z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

where $\mu$ is the population mean and s is the sample standard deviation.

• To calculate the size of the sample, if the SD is known.

$$SE = \frac{SD}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{SD}{SE}\right)^2$$

The greater the SE, the greater the variation of the sample mean from one sample to another.

The mean is presented with the SE as mean ± SE.

Page 61: Welcome to the. Biostatistics 1 Course instructor: Dr. JMA Hannan Class hours: Monay 6:00 pm – 9.00 pm Cell: 01199248989 E-mail: jmahannan@northsouth.edu

Co-efficient of variation (C.V.)

The relative measure of variation is called the co-efficient of variation (C.V.). The C.V. is defined as the SD divided by the mean, times 100.

$$\mathrm{C.V.} = \frac{\mathrm{SD}}{\mathrm{Mean}} \times 100$$

It is useful in comparing distributions whose units or characters may be different, e.g. height in cm in one and in inches in the other.


Co-efficient of variation (C.V.)

Example: Heights (cm) of adults and children are given in the table.

            Mean      SD       CV
Adult       160 cm    10 cm    6.25%
Children    60 cm     5 cm     8.33%

Although height in adults has the larger SD, the C.V. shows that height in children has the greater relative variation.
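The two C.V. values in this example can be reproduced directly from the definition (SD divided by the mean, times 100). The function name below is a hypothetical helper, and the conversion to inches is an added illustration of why the C.V. does not depend on the unit of measurement.

```python
def coefficient_of_variation(sd, mean):
    """C.V. as a percentage: SD divided by the mean, times 100."""
    return sd / mean * 100


adult_cv = coefficient_of_variation(sd=10, mean=160)
children_cv = coefficient_of_variation(sd=5, mean=60)
print(round(adult_cv, 2))     # 6.25
print(round(children_cv, 2))  # 8.33

# Converting both the mean and the SD from cm to inches leaves the C.V. unchanged
CM_PER_INCH = 2.54
adult_cv_inches = coefficient_of_variation(sd=10 / CM_PER_INCH, mean=160 / CM_PER_INCH)
print(round(adult_cv_inches, 2))  # 6.25
```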


Population & Sample

Population

• All possible values of a variable, or all possible objects whose characteristics are of interest in a particular investigation or enquiry.

• If the income of the citizens of a country is of interest to us, the aggregate of all relevant incomes constitutes the population.

Sample

• A sample is a part of a population.

• Although we are primarily interested in the properties of a population or universe, it is often impracticable or even impossible to study the entire universe.

• Thus inferences about a population are usually drawn on the basis of a sample, which should represent the population.


Normal Distribution

• The normal distribution was first described by Abraham de Moivre in 1733 and was developed further by the French mathematician Laplace (1749-1827).

• It is highly useful in the field of statistics. The graph of this distribution is called the normal curve or bell-shaped curve.

• In a normal distribution, observations cluster around the mean. Half of the observations lie above the mean and half below it, and the observations are distributed symmetrically on each side of the mean.

• The normal distribution is symmetrical around a single peak, so the mean, median and mode coincide. Because it is such a well-defined and simple shape, a great deal is known about it. The mean and standard deviation are the only two values we need to know to be able to describe a normal curve completely.



Normal Distribution

Characteristics:

• The curve is symmetrical.

• It is a bell-shaped curve.

• The maximum value is at the center, and the curve decreases towards zero symmetrically on each side.

• Mean, median and mode coincide: Mean = Median = Mode.

• It is determined by the mean and the standard deviation.

Mean ± 1 SD limits include about 68% of all observations.

Mean ± 2 SD limits include about 95% of all observations.

Mean ± 3 SD limits include about 99.7% of all observations.
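The share of observations within 1, 2 and 3 SD of the mean can be computed from the normal curve itself. The sketch below uses the standard identity P(|Z| < k) = erf(k/√2) for a normal variable; the function name is a hypothetical helper chosen for this illustration.

```python
import math


def coverage_within_k_sd(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(coverage_within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular mean or SD, which is why the same limits apply to every normal curve.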


Normal Distribution

• Many common statistical tests (t-test, ANOVA, etc.) assume a normal distribution. These tests work very well even if the distribution is only approximately normal.

• Some tests (Mann-Whitney U test, Wilcoxon W test, etc.) work well even with very wide deviations from normality.
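One reason rank-based tests tolerate non-normal data is that they use only the ordering of the observations, not their actual sizes. The sketch below computes the Mann-Whitney U statistic by direct pair counting. It is a hand-written illustration with made-up data and a hypothetical function name, it computes only the statistic (no p-value), and it is not a replacement for a statistics package.

```python
import statistics


def mann_whitney_u(a, b):
    """U statistic for sample a: the number of pairs (x from a, y from b)
    with x > y, counting each tie as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


group_a = [12, 15, 11, 18]
group_b = [9, 10, 14, 8]
print(mann_whitney_u(group_a, group_b))  # 14.0

# Replace the largest value in group_a with an extreme outlier:
# the mean changes a great deal, but U is unchanged because the ordering is the same
group_a_outlier = [12, 15, 11, 180]
print(statistics.fmean(group_a), statistics.fmean(group_a_outlier))  # 14.0 54.5
print(mann_whitney_u(group_a_outlier, group_b))  # 14.0
```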


Normal Distribution

Group                  Distribution is normal          Distribution is not normal
                       (Parametric tests)              (Nonparametric tests)

One                    Mean±SD                         Median (Range)

Two                    t-test:                         Non-parametric t-test (or Rank Test):
                       Unpaired t-test                 The Mann-Whitney U test
                       Paired t-test                   The Wilcoxon Matched-Pairs Signed-Ranks Test

Three or more          One Way ANOVA                   The Kruskal-Wallis One-way ANOVA by Ranks
                       The Repeated Measures ANOVA     The Friedman’s Test

Relationship between   The Correlation Coefficient     The Spearman Rank Correlation Coefficient
two variables          Simple Linear Regression        Nonparametric Regression Analysis