1. Introduction to biostatistics

KNOWLEDGE FOR THE BENEFIT OF HUMANITYKNOWLEDGE FOR THE BENEFIT OF HUMANITY

BIOSTATISTICS (HFS3283)

INTRODUCTION TO BIOSTATISTICS

Dr.Dr. MohdMohd RazifRazif ShahrilShahril

School of Nutrition & Dietetics School of Nutrition & Dietetics

Faculty of Health SciencesFaculty of Health Sciences

UniversitiUniversiti Sultan Sultan ZainalZainal AbidinAbidin

1

S C H O O L O F N U T R I T I O N A N D D I E T E T I C S • U N I V E R S I T I S U L T A N Z A I N A L A B I D I N

Synopsis “This course exposes students to different types of data,

measures of central tendency and dispersion, probability,

normal distribution and inferential analysis in biostatistics. It

also covers various basic statistical analyses including

descriptive, parametric and nonparametric tests,

association and prediction models commonly used in

research. This course provides hands-on experience for

students to perform statistical analysis using SPSS and

interpret their outcomes in answering specific research

question”

2


Learning Outcomes At the end of this course, students should be able to;

• Describe concepts of descriptive, inferential, parametric

and non-parametric tests in biostatistics.

• Describe concepts of categorical data analysis,

association, prediction, reliability and validity in

biostatistics.

• Choose statistical analysis of data based on types of

variables and objective of analysis using SPSS and

interpret their outcomes.

3


Teaching & Learning Strategy Blended Mode Learning (Face to Face + e-Learning)

• Interactive Lecture – offline + online

• Hands-on Practical

• Project

• Self-Directed Learning (Literature, Video)

4


Brief Course Outline 1. Introduction to biostatistics

2. Inferential statistics

3. Independent t test

4. ANOVA & Paired t-test

5. Nonparametric analysis

6. Categorical data analysis: Chi square & Fisher exact test

7. Categorical data analysis: OR and RR

8. Correlation

9. Regression

10. Validity analysis

11. Reliability analysis

5


Course Schedule

6

WEEK / DATE ACTIVITIES

1 MONDAY

(25/01/2016)

Lecture (8.00am - 10.00am) – Dr. Mohd Razif Shahril

INTRODUCTION TO BIOSTATISTICS

Practical (10.00am - 12.00pm) –Dr. Mohd Razif Shahril

INTRODUCTION TO SPSS

2 MONDAY

(01/02/2016)

Lecture (8.00am - 10.00am) –Dr. Mohd Razif Shahril

INFERENTIAL STATISTICS


SPSS: DESCRIPTIVE ANALYSIS & NORMALITY

3 MONDAY

(08/02/2016)

CHINESE NEW YEAR (PUBLIC HOLIDAY)

4 MONDAY

(15/02/2016)

Lecture (8.00am - 10.00am) –Dr. Sharifah Wajihah Wafa

INDEPENDENT T-TEST

Practical (10.00am - 12.00pm) –Dr. Sharifah Wajihah Wafa

SPSS: INDEPENDENT T-TEST

5 MONDAY

(22/02/2016)

Lecture (8.00am - 10.00am) –Dr. Sharifah Wajihah Wafa

ANOVA & PAIRED T-TEST

Practical (10.00am - 12.00pm) –Dr. Sharifah Wajihah Wafa

SPSS: ANOVA & PAIRED T-TEST


Course Schedule (cont.)

7


6 MONDAY

(29/02/2016)


NONPARAMETRIC ANALYSIS


SPSS: NONPARAMETRIC ANALYSIS

7 MONDAY

(07/03/2016)


CATEGORICAL DATA ANALYSIS: CHI SQUARE & FISHER

EXACT TEST


SPSS: CHI SQUARE & FISHER EXACT TEST

THURSDAY

(10/03/2016)

TEST (8.30am - 10.00am) –Dr. Mohd Razif Shahril

8 MONDAY

(14/03/2016)


CATEGORICAL DATA ANALYSIS: OR AND RR


SPSS: OR AND RR

MID SEMESTER BREAK (20/03/2016 – 26/03/2016)



8


9 MONDAY

(28/03/2016)


CORRELATION


SPSS: CORRELATION

10 MONDAY

(04/04/2016)


REGRESSION


SPSS: REGRESSION

11 MONDAY

(11/04/2016)


RELIABILITY


SPSS: INTERNAL CONSISTENCY & INTRACLASS

CORRELATION

12 MONDAY

(18/04/2016)


VALIDITY


SPSS: FACTOR ANALYSIS



9


13 MONDAY

(25/04/2016)

Project (8.00am - 11.00am) –Dr. Mohd Razif Shahril

REPORT PREPARATION

14 MONDAY

(02/05/2016)

Project (8.00am - 11.00am) –Dr. Mohd Razif Shahril

REPORT SUBMISSION

STUDY WEEK (06/05/2016 – 09/05/2016)

END OF SEMESTER EXAMINATION (10/05/2016 – 26/05/2016)

Location Makmal Komputer Khadijah

(Khadijah Computer Lab)


Assessment • Continuous Assessment

– Test (30%)

– Project Report (1500 words) (30%)

• Summative Assessment

– End of Semester Examination (40%)

10


Main References

11


12

ATTENDANCE

LESS THAN 80% BARRED FROM EXAMINATION

[NO MC FROM PRIVATE CLINICS!]


Topic Learning Outcomes At the end of this lecture, students should be able to;

• define data and types of data.

• define descriptive statistics, variables and scales.

• explain types of central tendency and dispersion

measurements.

13


What is DATA?

14

• Definition: A collection of items of information

• Types of data

– Qualitative

– Quantitative


What is DATA? (cont)

15

• Qualitative data

– Observation or information characterized by

measurement on a categorical scale (dichotomous,

nominal or ordinal scale).

– Data that describe a quality of the subject studied.

• E.g. gender, ethnic, death or survival, nationality etc.

– Generally describes in terms of percentages or

proportions.

– Mostly displayed by using contingency table, pie

chart, bar charts.


What is DATA? (cont)

16

• Quantitative data

– Data in numerical quantities such as continuous

measurements or counts.

– Observation for which the differences between

numbers have meaning on a numerical scale.

– They measure the quantity of something

– Types of numerical scales;

• Continuous scale (e.g. Age, height)

• Discrete scale (e.g. Number of pregnancy)

– Described in terms of means and standard deviation.

– Frequency tables and histograms are most often used

to display this type of information.


How to analyse DATA? (cont)

17

• Using STATISTICS!

– A small representative ‘sample’ is used to study a big

‘population’

• Why?

– Expensive to conduct very large study

– Impossible to collect information from everyone in the

population

• Types of statistics

– Descriptive statistics

– Inferential statistics


How to analyse DATA? (cont)

18

Population: A complete collection of data on the group under study Sample: A collection of sampling units selected from the population Sampling unit: A member of the population

Sampling unitSampling unit n=1n=1

SampleSample n= 200n= 200

PopulationPopulation N=20,000N=20,000


Descriptive statistics

19

• Describe the frequency and distribution to

characterize data collected from a group of

sample to represent the population.

• E.g.

– Percentage of patients attending diabetes clinic

– Gender, age group, education level of the patients

– Patients waiting time for doctors consultation

– Patients fasting glucose and HbA1c level

– Etc.


Variable

20

• A variable is a characteristics that can take on

different values for different members of the

group under study

– E.g. a group of university students will be found to

differ in gender, height, attitudes, intelligence and may

ways. These characteristics are called variables.

• Categories of variable

– Continuous vs. Discrete

– Independent vs. Dependent


Variable (cont.)

21

• Continuous variable – can take on any values on the measurement scale under

study

– Do not fit into a finite number or categories

– Referred to as measurement data

– E.g. weight, height, age, blood pressure etc.

• Discrete variable – only designated values or integer values i.e. 1, 2, 3…

– Fit into limited categories

– Referred as count data (dichotomous/ multichotomous) • E.g. dichotomous

– Male-Female

– Yes-No

• E.g. multichotomous – Malay-Chinese-Indian

– Man Utd-Arsenal-Chelsea-Man City-Liverpool


Variable (cont.)

22

• Independent variable (IV)

– Manipulated in accordance with the purpose of the

investigation

– Set by researcher

• Dependent variable (DV)

– Consequence of the independent variable

– Effected by independent variable

– Outcome


Scales

23

• Type of scales

– Nominal – classify observation into categories that

cannot be numerically arranged (no order)

– Ordinal – assign order to categories so that one

category is higher than another

– Interval / ratio –sequential ranking of values (as

ordinal scales)


Measurement of central tendency

24

Mean

Mode

Median


Mode

25

• Is the most frequent occurring value in a set of

discrete data

• Can be more than one mode if two or more

values are equally common

• E.g.

– 1,3,4,7,2,5,9,4,6,7,8,9,3,4,9,6,4,5,2,1,6,6,7,4,3

– 1,1, 2,2,3,3,3,4,4,4,4,4,5,5,6,6,6,6,7,7,7,8,9,9,9

Mode=4


Median

26

• The value halfway through the ordered data set

• Generally a good descriptive measure of the

location which works well for skewed data or

data with outliers

• E.g. (n=25)

– 3,4,7,2,5,1, 9,4,6,7,8,9,3,4,9,6,4,5,2,1,6,6,7,4,3

– 1,1, 2,2,3,3,3,4,4,4,4,4,5,5,6,6,6,6,7,7,7,8,9,9,9

Median=5

Ordered data


Mean

27

• The sample mean is an estimator available for

estimating the population mean.

• Its value depends equally on all of the data

which may include outliers.

• E.g. (N=10) (3, 4, 7, 2, 5, 7, 5, 5, 1, 2)

= 3+3+4+7+2+5+7+5+5+1+2

10

= 4.1


Measurement of dispersion

28

• Used to describe the variability (spread and

dispersion) in a given sample

• Dispersion measurement;

– Range

– Percentiles

– Variance

– Standard deviation

– Standard error

– Interquartile range


Measurement of dispersion (cont.)

29

• Range – Difference between highest and lowest value

• Percentiles – Indicate the % of individuals who have equal to/below a given

value.

• Variance – Provides information about how individuals differ within sample.

• Standard deviation (SD) – Gives information about the spread/variability of scores around

the mean.

• Standard error (SE) – Indicates about the certainty of the mean itself.

• Interquartile Range (IQR) – the distance between 1st and 3rd quartile


Range

30

• Is the difference between the smallest and

largest value in a set of observation

• Range = (the largest value – the smallest value)

– E.g. 3,5,6,7,9,10

– Range = 10 – 3 = 7

• Uses only extreme values and ignores the other

values in the data set.


Variance

31

• Measure spread or dispersion within a set of

sample data.

• E.g. for N observation X1, X2, X3,.... Xn with

sample mean:

• Therefore, the sample variance is


Standard deviation (SD)

32

• Measure of spread or dispersion of a set of data

• Calculated by taking square root of the variance

• The more widely the values are spread out, the

larger the standard deviation


Standard Error (of the Mean)

33

• The SEM quantifies the precision of the mean.

• A small SEM indicates that the sample mean is

likely to be quite close to the true population

mean.

• A large SEM indicates that the sample mean is

likely to be far from the true population mean

• A small SEM can be due to a large sample size

rather than due to tight data.


Interquartile Range (IQR)

34

• IQR is the distance between 1st and 3rd quartile.

• It is not sensitive to extreme values (outliers).

• Thus, it is usually described together with the

median in skewed distribution of observation.

• Formula: IQR = (Q3 – Q1)


Histogram

35

• Display the frequency distributions of one

variable

• Very similar to bar chart that are used for

categorical data

• Consists of a set of columns with no space

between each of them

– On horizontal axis, the variables of consideration

– On vertical axis, the scale of frequency of occurrence

– The area under the each column represents the

frequency of each class and thus the total under all

columns equals the total frequency


Histogram (cont.)

36


Histogram (cont.)

37

• Unimodal - bell shaped curve • Symmetric about its mean –

Mirror image • Mean = Median = Mode


Histogram (cont.)

38


Histogram (cont.)

39


Boxplot (box and whisker plot)

40

• It is a graphical display using percentile in an

ordered data

• The plot provides information about central

tendency and the variability of the distribution

• It allows detection of outliers and symmetry of

data set

– Outliers : if distance is away > (1.5 times IQR) above

and below of Median, it is shown as circle

– Extreme outliers : if distance is away > (3 times IQR)

above and below of Median, it is shown as asterisks


Boxplot (cont.)

41


Note

42

• In descriptive presentation use Mean (SD) or

Median (IQR)

– Normal distribution data, use Mean (SD)

– Skewed distribution data, use Median (IQR)

• In inferential results use Mean (SE)

Thank YouThank You

43

Health & Medicine

1. Introduction to biostatistics