MEDICAL COMPUTING AND STATISTICS

PRESENTED BY: DR. RAJ KUMAR SINGH (JR-1), DEPTT. OF ORTHODONTICS AND DENTAL ANATOMY

SUPERVISOR: DR. SANJEEV KUMAR VERMA, CHAIRMAN, DEPTT. OF ORTHODONTICS AND DENTAL ANATOMY, DR. Z.A. DENTAL COLLEGE, AMU, ALIGARH

CO-SUPERVISOR: DR. MD. SAIF KHAN, LECTURER, DEPTT. OF PERIODONTICS, DR. Z.A. DENTAL COLLEGE, AMU, ALIGARH


Overview of seminar

Introduction to medical computing
Role of medical computing
Introduction to statistics
How to use statistics
Role of statistics
Conclusion

What do computers in medicine mean?

The Computer Meets Medicine and Biology: Emergence of a Discipline

After taking this course, you should know the answers to these questions:

Why is information management a central issue in biomedical research and clinical practice?

What are integrated information-management environments, and how might we expect them to affect the practice of medicine and biomedical research in coming years?

What do we mean by the terms medical computer science, medical computing, medical informatics, clinical informatics, nursing informatics, bioinformatics, and health informatics?

Why should health professionals and students of the health professions learn about medical-informatics concepts and informatics applications?

How has the development of mini-computers, microprocessors, and the Internet changed the nature of biomedical computing?

How is medical informatics related to clinical practice, biomedical engineering, molecular biology, decision science, information science, and computer science?

Role of computing

Medical decision making: probabilistic medical reasoning
Patient care systems
Patient monitoring systems
Computer aided surgery
Computer based patient record systems
Clinical decision support systems
The internet
Standards in medical informatics
Imaging modalities
Image management systems
Telemedicine
Bioinformatics

Conventional data collection for clinical trial

Medical records

Data sheets

Computer database

Analyses

Results

WHAT IS STATISTICS

Introduction

Statistics is a science that comprises data collection methods, the processing of data into useful information, and the use of this information to take decisions with the least error.

Medical statistics: a collection of statistical procedures particularly well suited to the analysis of healthcare-related data.

Medicine is an empirical science that depends on observations.

Medical data are necessary for any medical decision, be it for diagnosis, treatment planning or prognosis: each decision presupposes that some information is available for the patient.

Data are also needed for medico-legal or ethical reasons, to justify decisions.

Thus, decisions must be evidence based.

Uncertainties in medicine arise mainly due to

1) biological variability
2) environmental variability
3) sampling fluctuations
4) chance variability
5) instrument variability

To deal with the enormous uncertainties that pervade all aspects of medical practice, a separate science has developed, called biostatistics.

It provides methods to measure uncertainties by probabilities, and helps to control the impact of uncertainties on medical practice by laying down principles for choosing decisions that judiciously combine the probabilities with judgement.

How to use statistics

Develop an underlying question of interest
Generate a hypothesis
Design a study
Collect data
Analyze data: descriptive statistics, statistical inference

Hypothesis: a tentative assumption of the study, or the expected results of the study.

It should be very specific and limited to the piece of research in hand, because it has to be tested.

The role of the hypothesis is to guide the researcher by delimiting the area of research and to keep the researcher on the right track.

Develop study design

Research question
Study sample
Sample size
Enrollment/follow-up strategies
On-going monitoring

Sampling

A sample is that part of the target population which is actually enquired upon or investigated.

Types of sampling:

1) Simple random
2) Systematic random
3) Stratified random
4) Cluster random
5) Multistage random
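The first three sampling types listed above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 patient IDs, the male/female strata, the 10% sampling fraction and the function names are all assumptions made for the example, not part of the seminar.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Simple random: every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_random_sample(population, n, seed=0):
    """Systematic random: random start, then every k-th unit (k = N // n)."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_random_sample(strata, fraction, seed=0):
    """Stratified random: the same fraction drawn at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample

# Hypothetical sampling frame: patient IDs 1..100
patients = list(range(1, 101))
# Hypothetical strata: IDs 1-60 are male, 61-100 are female
strata = {"male": patients[:60], "female": patients[60:]}

srs = simple_random_sample(patients, 10)
systematic = systematic_random_sample(patients, 10)
stratified = stratified_random_sample(strata, 0.1)

print("simple random:    ", srs)
print("systematic random:", systematic)
print("stratified random:", stratified)
```

Cluster and multistage sampling follow the same idea, except that whole groups (for example hospitals) are drawn first and units are then taken within the selected groups.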

Existing data

Primary data are those which one elicits directly from individual patients, subjects or other units (such as hospitals or laboratories).

Secondary data are those that are elicited by others.

Secondary data sources include disease-specific databases on the web, medical literature, and records of surveys and registrations done by the government.

Generation of new data

Existing data may be incomplete and insufficient to provide answers to specific questions.

For these questions, data are specially generated through new surveys and experiments.

Basically there are two types of studies to generate new data: descriptive and analytical.

In either setup, it is necessary that a sample of subjects is studied.

Data collection designs

Objective: descriptive, analytical
Method: surveys, observational, experimental
Time frame: prospective (cohort, cause to effect), retrospective (effect to cause), cross-sectional (one point in time)
Setting: animal trial

Describing data with tables

1) frequency table
2) relative and cumulative frequency
3) grouped frequency
4) open-ended groups
5) cross-tabulation

Frequency table

Mortality (%) (variable)   Tally        No. of ICUs (frequency)
11.2-15.1                  |||||||||    9
15.2-20.1                  ||||||||     8
20.2-25.1                  |||||        5
25.2-30.1                  |||          3
30.2-35.1                  |            1

Relative and cumulative frequency

Parity   No. of women   Percentage (relative frequency)   Cumulative percentage
0        5              12.5                              12.5
1        6              15                                27.5
2        14             35                                62.5
3        10             25                                87.5
4        3              7.5                               95
7        1              2.5                               97.5
8        1              2.5                               100
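The percentage columns of the parity table can be recomputed from the counts alone. The sketch below assumes the counts read 5, 6, 14, 10, 3, 1, 1 for parities 0, 1, 2, 3, 4, 7, 8 (a reading of the flattened slide that reproduces the listed percentages); the variable names are illustrative.

```python
from itertools import accumulate

# Counts as read from the slide (assumed reading of the flattened table)
parity = [0, 1, 2, 3, 4, 7, 8]
women = [5, 6, 14, 10, 3, 1, 1]

total = sum(women)
# Relative frequency: each count as a percentage of the total
relative = [100 * n / total for n in women]
# Cumulative percentage: running sum of the relative frequencies
cumulative = list(accumulate(relative))

print(f"{'Parity':>6} {'Women':>6} {'%':>6} {'Cum %':>6}")
for p, n, r, c in zip(parity, women, relative, cumulative):
    print(f"{p:>6} {n:>6} {r:>6.1f} {c:>6.1f}")
```

The last cumulative percentage should always be 100, which is a quick check that no category has been dropped.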

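Cross-tabulation

Two variables within a single group of individuals

                 2 or fewer children
Caries           Yes              No               Totals
Occlusal         21 (84%) (66)    11 (73%) (34)    32 (100)
Proximal          4 (16%) (50)     4 (27%) (50)     8 (100)
Totals           25 (100%)        15 (100%)        40

In each cell the first percentage is the column percentage and the second bracketed figure is the row percentage.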
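The row and column percentages in the caries cross-tabulation can be checked with a short script. It assumes the flattened slide is read as rows occlusal/proximal and columns yes/no, with counts 21, 11, 4 and 4; the dictionary layout is purely illustrative.

```python
# Cell counts as read from the slide (assumed layout: rows = caries site,
# columns = yes / no)
counts = {
    "occlusal": {"yes": 21, "no": 11},
    "proximal": {"yes": 4, "no": 4},
}

row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {col: sum(counts[row][col] for row in counts) for col in ("yes", "no")}
grand_total = sum(row_totals.values())

# Row percentage: cell count as a share of its row total
row_pct = {
    row: {col: round(100 * n / row_totals[row]) for col, n in cells.items()}
    for row, cells in counts.items()
}
# Column percentage: cell count as a share of its column total
col_pct = {
    row: {col: round(100 * n / col_totals[col]) for col, n in cells.items()}
    for row, cells in counts.items()
}

print("row totals:", row_totals, "column totals:", col_totals, "n =", grand_total)
print("row %:", row_pct)
print("column %:", col_pct)
```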

Describing data with charts

1) Charting nominal data
(1) the pie chart
(2) the simple bar chart
(3) the cluster bar chart
(4) the stacked bar chart

2) Charting ordinal data
(1) the pie chart
(2) the bar chart

3) Charting discrete metric data

4) Charting continuous metric data
(1) the histogram

Pie chart

4-5 categories
One variable
Start at 0 in the same order as the table

[Figure: pie chart of hair color of children receiving d-phenothrin. Labels: dark, 21, 21%; blonde, 18, 18%; red, 4, 4%; brown, 55, 57%]

Simple bar diagram

Clustered bar diagram

[Figure: clustered percentage bar chart of hair color (blonde, brown, red, dark) for children receiving malathion and d-phenothrin]

Histogram

[Figure (Exercise 3-5): histogram of the percentage age distribution of pregnant women (thrombosis cases); age groups 19, 20-24, 25-29, 30-34, 35]

Step chart

[Figure (Exercise 3.8): step chart of the cumulative percentage of infants]

Charting cumulative ordinal or discrete metric data

Cumulative frequency curve

[Figure (Exercise 3.9, ogive): percentage cumulative frequency curves of age for male suicide attempters and later succeeders; age groups 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, > 85; series: attempting suicide, later successful]

Data collection, types and quality

Evidence-based decisions are only as good as the evidence itself.

Thus it is important that the data gathered for creating evidence are correct.

Methods such as interview, examination and investigations are available.

The investigator must decide which method is best for a particular piece of information.

Data can be either quantitative or qualitative.

Qualitative data can be on a nominal scale or an ordinal scale.

Quantitative data are on a metric scale.

Nominal scale data
Can be allocated to one of a number of categories.
Examples: blood type, sex (male/female).
No meaningful order.

Ordinal scale data
Can be allocated to one of a number of categories that can be put in a meaningful order.
Example: very satisfied, satisfied, neutral, unsatisfied, very unsatisfied.

Discrete metric data
Countable variables, in integer form.
Numbers of things.
Examples: age, number of men.

Continuous metric data
Measurable variables.
Round to the nearest integer.
Units: kg, m, mmHg, hours, years.

Quality of data is assessed in terms of the validity and reliability of the measurements, or of the tools used to obtain the data.

Validity: the ability of a tool to correctly measure the characteristic that it purports to measure.

For tests, validity is assessed in terms of sensitivity and specificity, and positive and negative predictivities.

Reliability: the ability to give the same result when used repeatedly in identical conditions.
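The four validity indices named above can be computed from a two-by-two table of test result against true disease status. The counts below are hypothetical and the function name is an assumption for this sketch.

```python
def diagnostic_indices(tp, fp, fn, tn):
    """Validity indices of a test from true/false positive and negative counts."""
    return {
        # Share of diseased subjects that the test detects
        "sensitivity": tp / (tp + fn),
        # Share of disease-free subjects that the test clears
        "specificity": tn / (tn + fp),
        # Share of positive results that are truly diseased
        "positive_predictivity": tp / (tp + fp),
        # Share of negative results that are truly disease-free
        "negative_predictivity": tn / (tn + fn),
    }

# Hypothetical study: 50 diseased and 50 disease-free subjects
indices = diagnostic_indices(tp=45, fp=10, fn=5, tn=40)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the predictivities also depend on how common the disease is in the group studied.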

Statistical analyses

Descriptive statistics: describe the sample
Inference: make inferences about the population
Inference is primarily performed in two ways:
Hypothesis testing
Estimation (more important!)

Prediction

Descriptive statistics

Descriptive statistics are a way of summarizing the complexity of the data with a single number.

A. For one variable ("univariate analysis"): measures of CENTRAL TENDENCY (averages) and of DISPERSION (variance around that average). Examples: means, modes, medians, standard deviation, quartiles.

B. Descriptive statistics for the strength of relationship between two variables (bivariate analysis) or among a set of variables (multivariate analysis) are measures of ASSOCIATION or correlation.
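As a small illustration of parts A and B, Python's standard `statistics` module gives the usual univariate summaries, and Pearson's correlation can be written out by hand for the bivariate case. The blood pressure and age values are hypothetical, and `pearson_r` is a helper defined only for this sketch.

```python
import statistics

# Hypothetical sample: systolic blood pressure (mmHg) and age (years) of 10 patients
bp = [118, 122, 125, 130, 122, 135, 140, 128, 122, 138]
age = [25, 31, 36, 44, 29, 52, 60, 41, 33, 57]

# A. Univariate: central tendency and dispersion
mean = statistics.mean(bp)
median = statistics.median(bp)
mode = statistics.mode(bp)
sd = statistics.stdev(bp)                   # sample standard deviation
q1, q2, q3 = statistics.quantiles(bp, n=4)  # quartiles

def pearson_r(x, y):
    """B. Bivariate: Pearson's correlation coefficient between x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r(age, bp)

print(f"mean={mean}, median={median}, mode={mode}, SD={sd:.2f}")
print(f"quartiles: {q1}, {q2}, {q3}")
print(f"Pearson r (age vs BP) = {r:.3f}")
```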

Measure of central tendency
Nominal & Ordinal: Frequencies, Percents, Modes, Medians
Interval & Ratio: all of the above, plus Means

Measure of dispersion
Nominal & Ordinal (qualitative): Range
Interval & Ratio (quantitative): Standard Deviation, Quartiles

Measure of association
Nominal & Ordinal: Cross-tabulation; non-parametric measures such as Phi, Gamma, Eta, Lambda, Tau-B etc.
Interval & Ratio: Pearson's R

Measure of significance
Nominal & Ordinal: Chi-square
Interval & Ratio: t-test, ANOVA (F-ratio)

Inferential statistics

These are measures of the SIGNIFICANCE of the relationship between two or more variables. Significance refers to the probability that the findings could be attributed to sampling error. The appropriate statistics depend on the LEVEL OF MEASUREMENT OF THE DEPENDENT VARIABLE (and of the independent variable).

Parameters

Summary measures, such as the mean and standard deviation, can be obtained for a sample as well as for the entire population.

Summary measures, when obtained for the entire target population, are called parameters.

The values of parameters are hardly ever known, because nobody has the time and resources to study the entire population.

When parameter values are unknown, as is almost invariably the case, it becomes necessary to fall back on samples to get some tangible lead regarding the characteristics of the population.

Measures such as the mean and SD, when obtained for sample subjects, are called statistics.

Standard deviation and normal distribution

[Figure: normal curve centred on the mean]

Tests of parametric significance

1) Student t-test: for comparison of means between 2 groups

2) ANOVA F-test: for comparison of means in three or more groups

(Both of the above tests require that the means follow a Gaussian distribution, and hence they are called parametric tests.)
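The two test statistics can be written out directly to show what each one compares. This is a sketch using only the standard library: the pain scores are hypothetical, the function names are assumptions, and the resulting t and F values would still have to be referred to tabulated critical values (or to a statistics package) to obtain a P-value.

```python
import math
import statistics

def student_t(a, b):
    """Pooled-variance (Student) t statistic for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, na + nb - 2  # statistic and degrees of freedom

def anova_f(*groups):
    """One-way ANOVA F ratio: between-group over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)  # statistic and degrees of freedom

# Hypothetical pain scores under three drugs
drug_a = [5, 6, 7, 6, 6]
drug_b = [3, 4, 5, 4, 4]
drug_c = [4, 5, 6, 5, 5]

t_stat, t_df = student_t(drug_a, drug_b)
f_ratio, f_df = anova_f(drug_a, drug_b, drug_c)
print(f"t = {t_stat:.3f} on {t_df} df")
print(f"F = {f_ratio:.3f} on {f_df} df")
```

With only two groups the F ratio equals the square of the t statistic, which is why ANOVA can be seen as the extension of the t-test to three or more groups.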

Nonparametric tests

When the sample size is very small and the distribution is skewed, parametric tests cannot be used.

In such cases, non-parametric tests (less powerful than parametric tests) are used.

For paired data, the non-parametric tests commonly used are the sign test and the Wilcoxon signed rank test.

For unpaired two-sample data, the non-parametric test is the Mann-Whitney test.

Another important non-parametric test is the Chi-square test (used for nominal data), a test of proportion.

It is used to test the significance of the association of two or more qualitative characteristics.
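As an illustration of the Chi-square test of association, the sketch below applies it to the two-by-two caries counts from the cross-tabulation slide, read as 21, 11, 4 and 4 (an assumed reading of the flattened table). No continuity correction is applied, and the closed-form P-value used here holds only for one degree of freedom; the function name is illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi-square statistic and P-value (1 df) for a 2x2 table."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two characteristics were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: occlusal, proximal caries; columns: yes, no (counts as read from the slide)
chi2, p_value = chi_square_2x2([[21, 11], [4, 4]])
print(f"Chi-square = {chi2:.3f}, P = {p_value:.3f}")
```

Two of the expected counts in this table are small (5 and 3), so in practice an exact test might be preferred; the example is meant only to show the arithmetic.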

Point estimation and standard error

It is a reality that samples in all likelihood will differ from one another. This is called sampling variation.

Even so, there is rarely a need for a second sample in scientific endeavours, provided the first is chosen with due precautions such as random selection and inclusion of a sufficient number of individuals.

In such cases, summary measures based on one sample alone are considered good estimates of the respective characteristics of the target population. These are called point estimates.

Although point estimates obtained from a carefully derived sample are fairly representative of population parameters, uncertainties arising out of sampling variation must be taken into account.

The standard error (SE) of the mean measures these uncertainties.

Point estimates are reliable only when the SE is small.

Confidence interval

When the SE is large, an interval estimate should be obtained. This is also called the confidence interval (CI).

It is the range that is very likely to contain the parameter value.

This likelihood is called the confidence level.

Generally a 95% confidence level is used.

The 95% CI is obtained as statistic ± 2 SE of that statistic.
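The rule "statistic ± 2 SE" can be applied to a sample mean as follows. The readings are hypothetical (hours to pain relief in eight patients), the function name is an assumption, and the multiplier 2 is the approximation quoted on the slide rather than an exact critical value.

```python
import math
import statistics

def mean_with_ci(sample, multiplier=2.0):
    """Point estimate (mean), its standard error, and mean ± multiplier * SE."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SE of the mean = SD / sqrt(n)
    return mean, se, (mean - multiplier * se, mean + multiplier * se)

# Hypothetical sample: hours to pain relief in 8 patients
hours = [2, 4, 4, 4, 5, 5, 7, 9]

mean, se, (lower, upper) = mean_with_ci(hours)
print(f"mean = {mean}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A larger sample shrinks the SE and therefore narrows the interval, which is the sense in which a point estimate becomes more reliable.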

Null hypothesis

It is the hypothesis that says that there is no difference, or that asserts the existing knowledge or claim, and is tested for refutation by the study.

For example: newer drug B is not better than existing drug A for relieving toothache.

A null hypothesis is sought to be refuted by conducting a study.

A null hypothesis is either rejected or not rejected; it is never accepted.

The alternative hypothesis is the assertion that is accepted when the null is rejected.

Note that the alternative is accepted when the null is rejected, but nothing is accepted when the null is not rejected.

Evidence against the null

In medical studies, evidence is provided in terms of the results of a trial conducted on some patients, or observations regarding natural occurrences in a group or many groups of people.

The evidence is considered sufficient against the null hypothesis if
1) the study is unbiased
2) there are no confounders that can affect the findings
3) the sample size is sufficient to inspire confidence in the results and sampling fluctuations are minimal

Type I error and P-values

Type I error: when a true null hypothesis is rejected due to the wrong evidence provided by the data.

This is a serious error.

The probability of a type I error, calculated from the data at hand, is called the P-value.

Thus, the P-value is the chance that the presence of a difference is concluded when actually there is none.

It is this type I error that later on forces a ban on some drugs after they are licensed for marketing.

The maximum threshold of tolerance for the probability of a type I error is called the significance level.

It is denoted by α and is fixed in advance, generally at 0.05 (5 percent).

The P-value is calculated on the basis of the data, but α is fixed in advance.
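The meaning of the significance level can be illustrated by simulation: if two groups are repeatedly drawn from the same population, so that the null hypothesis is true, a test at α = 0.05 should wrongly reject in about 5% of the studies. The sketch assumes a normal population with known standard deviation 1 and uses a simple z-test; the group size, number of trials and function names are arbitrary choices for the example.

```python
import math
import random

def two_sided_p(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def type_i_error_rate(trials=2000, n=20, alpha=0.05, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the same population: there is no real difference
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for the difference of means, known SD = 1 in each group
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        if two_sided_p(z) < alpha:
            rejections += 1  # a type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed type I error rate: {rate:.3f}")
```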

When P