The Application of Data Mining in Health Research Li Xiaosong, M.D., M.P.H., Ph.D. Prof. of Biostatistics School of Public Health Sichuan University


Page 1: The Application of Data Mining in Health Research

The Application of Data Mining

in Health Research

Li Xiaosong, M.D., M.P.H., Ph.D.

Prof. of Biostatistics

School of Public Health

Sichuan University

Page 2: The Application of Data Mining in Health Research

With the rapid development of the information industry, great advances have been made in data production and collection capacities; however, the conflict of “rich data but poor knowledge” is becoming increasingly evident.

It is “knowledge discovery in databases” that meets this demand!

Knowledge discovery in databases (KDD)

Page 3: The Application of Data Mining in Health Research

Data tomb → ? → Knowledge

Data mining

Page 4: The Application of Data Mining in Health Research

Data Mining (DM)

Faced with vast databases, discovering the hidden but useful knowledge in the data, so as to support decision-making by governments and enterprises and deliver greater benefit, has become an important problem to solve

Data → DM → Knowledge

Data mining is the process of extracting previously unknown but potentially valuable information and knowledge from large volumes of data that are incomplete, fuzzy, and random

Page 5: The Application of Data Mining in Health Research

Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process

Databases

Data Cleaning and Data Integration

Data Warehouse

Selection

Task-relevant Data

Data Mining

Pattern Evaluation
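
The stages named on this slide can be sketched as a chain of small functions. This is only an illustrative sketch: the records, field names, the choice of "disease rate per smoking group" as the mining step, and the 0.5 threshold are all invented for the example and do not come from the talk.

```python
# Hypothetical records from two sources; all names and values are invented.
clinic = [
    {"id": 1, "age": 52, "smoker": "Y"},
    {"id": 2, "age": None, "smoker": "N"},   # incomplete record
    {"id": 3, "age": 75, "smoker": "Y"},
    {"id": 4, "age": 66, "smoker": "N"},
]
lab = [
    {"id": 1, "disease": True},
    {"id": 3, "disease": True},
    {"id": 4, "disease": False},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(left, right):
    """Data integration: join the two sources on the shared id."""
    by_id = {r["id"]: r for r in right}
    return [{**l, **by_id[l["id"]]} for l in left if l["id"] in by_id]

def select(records, fields):
    """Selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records):
    """Data mining (toy version): disease rate within each smoking group."""
    groups = {}
    for r in records:
        groups.setdefault(r["smoker"], []).append(r["disease"])
    return {k: sum(v) / len(v) for k, v in groups.items()}

def evaluate(patterns, threshold=0.5):
    """Pattern evaluation: keep groups whose rate reaches the threshold."""
    return {k: rate for k, rate in patterns.items() if rate >= threshold}

warehouse = integrate(clean(clinic), lab)
relevant = select(warehouse, ["smoker", "disease"])
patterns = evaluate(mine(relevant))
print(patterns)
```

Each function corresponds to one box of the process, so a real project would replace the body of a stage (for example, a proper mining algorithm in `mine`) without changing the overall flow.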

Page 6: The Application of Data Mining in Health Research

Classification of Data Mining Techniques

Association Rule Mining

Classification and Prediction

Clustering Analysis

Trend Analysis

Pattern Analysis

… …

Page 7: The Application of Data Mining in Health Research

The Application of Data Mining

Banking

Telecom

Economy

Meteorology

Agriculture

Health care

Military

… …

Page 8: The Application of Data Mining in Health Research

Data Mining applied in health care

The application of data mining in medical and health research has proven effective and shows great potential for further development

Data mining has now become a key method for obtaining information in clinical medicine, biomedicine, pharmacy and public health

Page 9: The Application of Data Mining in Health Research

Data Mining applied in Clinical Research

Finding relationships among diseases

Identifying patterns of disease development and prevalence

Disease diagnosis and treatment

Summarizing therapeutic effects

e.g. using Bayesian classification and decision tree classification in disease diagnosis

Page 10: The Application of Data Mining in Health Research

Data Mining applied in Biomedicine

Using sequence pattern analysis and similarity search to find gene sequence patterns for certain kinds of diseases

Data cleaning and data integration are valuable for integrating gene data and building gene databases

Association rule analysis can help discover gene crossover and correlation in a genome

- A powerful tool in DNA analysis!

Page 11: The Application of Data Mining in Health Research

Data Mining applied in Public Health

Spatio-temporal data mining is used in infectious disease surveillance to identify the epidemic patterns and distribution characteristics of diseases

Time series analysis and neural networks are used to predict the incidence of infectious diseases and the infant mortality rate

Association rules are used to explore the factors influencing diseases and health-seeking behavior

Clustering and classification are now widely used in decision support systems for health insurance

Page 12: The Application of Data Mining in Health Research

e.g. Application of Association Rules in Medical Data Analysis

Association rule mining aims to find relationships among variables in a database, yielding different types of rules

LAD% - The percentage of heart disease caused by the left anterior descending coronary artery

RCA% - The percentage of heart disease caused by the right coronary artery

Table 1: Original data from a study on heart disease

Gender  Age  Smoker  LAD%  RCA%
F       52   Y       85    100
M       62   N       80    0
M       75   Y       70    80
M       73   Y       40    99
M       66   N       50    45
…       …    …       …     …

Page 13: The Application of Data Mining in Health Research

Results:

Table 2: Medical association rules, each followed by (support, confidence)

No.  Rule
1    Gender=M ∩ Age≥70 ∩ Smoker=Y ⇒ RCA%≥50 (40%, 100%)
2    Gender=F ∩ Age<70 ∩ Smoker=Y ⇒ LAD%≥70 (20%, 100%)

Rule 1 indicates: 40% of the cases are male smokers aged 70 or older, and among these cases the probability that RCA% ≥ 50 is 100%

Rule 2 indicates: 20% of the cases are female smokers under 70 years old, and among these cases the probability that LAD% ≥ 70 is 100%
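
The two percentages attached to each rule can be checked by hand. The sketch below computes support and confidence using only the five rows actually displayed in Table 1; the elided rows are not available, so this is an illustration of the calculation rather than a reproduction of the study. The function name `rule_stats` and the field names are assumptions made for the example.

```python
# The five rows displayed in Table 1 (the elided rows are not available).
rows = [
    {"gender": "F", "age": 52, "smoker": "Y", "lad": 85, "rca": 100},
    {"gender": "M", "age": 62, "smoker": "N", "lad": 80, "rca": 0},
    {"gender": "M", "age": 75, "smoker": "Y", "lad": 70, "rca": 80},
    {"gender": "M", "age": 73, "smoker": "Y", "lad": 40, "rca": 99},
    {"gender": "M", "age": 66, "smoker": "N", "lad": 50, "rca": 45},
]

def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    support    = share of all rows satisfying both sides of the rule
    confidence = share of antecedent rows that also satisfy the consequent
    """
    matched = [r for r in rows if antecedent(r)]
    both = [r for r in matched if consequent(r)]
    support = len(both) / len(rows)
    confidence = len(both) / len(matched) if matched else 0.0
    return support, confidence

# Rule 1: Gender=M and Age>=70 and Smoker=Y  =>  RCA% >= 50
rule1 = rule_stats(
    rows,
    lambda r: r["gender"] == "M" and r["age"] >= 70 and r["smoker"] == "Y",
    lambda r: r["rca"] >= 50,
)

# Rule 2: Gender=F and Age<70 and Smoker=Y  =>  LAD% >= 70
rule2 = rule_stats(
    rows,
    lambda r: r["gender"] == "F" and r["age"] < 70 and r["smoker"] == "Y",
    lambda r: r["lad"] >= 70,
)

print(rule1, rule2)
```

On these five rows the sketch gives a support of 40% and a confidence of 100% for rule 1, and 20% and 100% for rule 2. Because both confidences are 100%, the support of the full rule equals the share of cases matching the antecedent, which is how the slide words its interpretation.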

Page 14: The Application of Data Mining in Health Research

The future application of Data Mining in medical research

Data mining is based on a series of new data processing technologies

Wavelet Analysis

Neural Networks

Genetic Algorithms

Fuzzy Logic Reasoning

Page 15: The Application of Data Mining in Health Research

Challenges facing Data Mining

Medical research data are always complicated and unique in their types and structures

The specialized knowledge of both medical and data-processing staff must be integrated

Plenty of data and repeated practice are needed

Goal:

To build a truly useful data mining system for health research

Page 16: The Application of Data Mining in Health Research

Backpropagation Neural Networks (BPNNs)

A type of artificial neural network

A way to model highly complex, nonlinear solutions to

classification problems

Useful in classifying health-related phenomena
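
To make the idea concrete, here is a minimal sketch of a backpropagation network with one hidden layer, written in plain Python. It is not the network used in the smoking cessation study: the XOR data, the three hidden units, the learning rate, the epoch count and the seed are all assumptions chosen only to show how the error is propagated backwards on a small nonlinear problem.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bpnn(samples, hidden=3, epochs=5000, lr=0.5, seed=0):
    """Train a one-hidden-layer network with plain backpropagation.

    Returns (predict, losses); losses holds the mean squared error per epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each unit stores its input weights followed by a bias term.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1]) for ws in w_h]
        y = sigmoid(sum(w * v for w, v in zip(w_o[:-1], h)) + w_o[-1])
        return h, y

    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, t in samples:
            h, y = forward(x)
            total += (y - t) ** 2
            # Output error term: squared-error gradient through the sigmoid.
            d_o = (y - t) * y * (1 - y)
            # Hidden error terms: output error sent back through w_o.
            d_h = [d_o * w_o[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient-descent updates, applied after each sample.
            for j in range(hidden):
                w_o[j] -= lr * d_o * h[j]
            w_o[-1] -= lr * d_o
            for j in range(hidden):
                for i in range(n_in):
                    w_h[j][i] -= lr * d_h[j] * x[i]
                w_h[j][-1] -= lr * d_h[j]
        losses.append(total / len(samples))
    return (lambda x: forward(x)[1]), losses

# XOR: a small problem that no single linear boundary can separate.
xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
predict, losses = train_bpnn(xor)
print(losses[0], losses[-1])
```

The printed pair shows the mean squared error in the first and last epoch. How well the final network separates the four points depends on the random starting weights, so a different seed or more hidden units may be needed in practice.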

Page 17: The Application of Data Mining in Health Research

e.g. Classification of smoking cessation status with BPNN

Classifier performance estimates:

The confidence intervals of Az for both the BPNN and logistic classifiers are narrow, and both lie at least 25 percentage points above random chance (Az = 0.5)

This finding indicates that the performance of both classifiers exceeds that of random chance

* Az: area under receiver operating characteristic curve
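
For readers unfamiliar with Az, the sketch below computes an area under the ROC curve from labels and classifier scores. It uses the nonparametric (Mann-Whitney) estimate, not the binormal curve fitting behind the results reported in the talk, and the labels and scores are invented for illustration.

```python
def empirical_auc(labels, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs
    in which the positive case gets the higher score; ties count as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier output: 1 = quit smoking, 0 = did not quit.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

auc = empirical_auc(labels, scores)
print(round(auc, 3))
```

Here eight of the nine positive-negative pairs are ranked correctly, so the estimate is 8/9. A classifier that assigns the same score to every case would get 0.5, the random-chance level mentioned above.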

Page 18: The Application of Data Mining in Health Research

e.g. Classification of smoking cessation status with BPNN

The graph illustrates the estimated true positive fraction (TPF) at multiple values of the false positive fraction (FPF)

The areas under the two ROC curves differ at α = 0.05

Figure: Binormal conventional ROC curves for the BPNN and logistic regression classifiers

* ROC: receiver operating characteristic

Page 19: The Application of Data Mining in Health Research

Thank you!