View
1.280
Download
0
Tags:
Embed Size (px)
DESCRIPTION
Citation preview
The Application of Data Mining
in Health Research
Li Xiaosong, M.D., M.P.H., Ph.D.
Prof. of Biostatistics
School of Public Health
Sichuan University
With the rapid development of the Information In
dustry, great advances have been made in data pr
oduction and collection capacities,however, the co
nflict of “rich data but pool knowledge” is getting i
ncreasingly evident.
It is the “Knowledge discovery in databases” that
cater the demand!
Knowledge discovery in databases ( KDD)
Data tomb? Knowledge
Data mining
Data Mining (DM)
In the face of vast databases, how to discover the hidden but
useful knowledge from data, which can help in the government
and enterprises’ decision-making, so as to get more benefit had
become an important problem to solve
Data KnowledgeDM
Data Mining is the procedure of distilling the unknown but
potentially valuable information and knowledge from plentiful
data which is uncompleted, misty and stochastic
Data Mining: A KDD Process
Data mining--the core of knowledge discovery process
Data Cleaning
Data Integration
Databases
Data Warehouse
Task-relevant Data
Selection
Data Mining
Pattern Evaluation
Classification of Data Mining technology
Association Rule Mining
Classification and Predicting
Clustering Analysis
Trend Analysis
Patten Analysis
… …
The Application of Data Mining
Banking
Telecom
Economy
Meteorology
Agriculture
Health care
Military
… …
Data Mining applied in health care
The application of data mining in medical and health
researches had prove itself to be effective, showing
great development potentialities
Now data mining has become a key method in
obtaining information in clinical medicine, biomedicine,
pharmacy and public health
Data Mining applied in Clinical Research
Finding the relationship among diseases Searching the rule of disease development and prevalence Disease diagnosis and treatment Summarizing therapeutic effects
e.g. using Bayes classification & decision tree
classification in disease diagnosis
Data Mining applied in Biomedicine
Using Sequence Model Analysis & Similarity Retrieval
to find the gene sequence model for certain kind of diseases
Data Cleaning & Data integration is valuable in the data
integration and database building of gene
Association Rules Analysis can help to discover Gene
crossover and correlation in a Genome
- A powerful tool in DNA analysis!
Data Mining applied in Public Health
Spatio-temporal Data Mining used in infectious diseases
monitoring to search for the epidemic rules and distribution
characteristics of diseases
Using Time Series Analysis, Neural Networks to predict the
incidence and infant mortality rate of infectious diseases
Association Rules to discuss the influencing factors of
diseases and health seeking behavior
Clustering and classification are now widely used in the
decision support system for health insurance
Association Rules aims to find out the relationship among valuables in database, resulting in deferent types of rules
LAD% - The percentage of heat disease caused by left anterior descending coronary artery RCA% - The percentage of heat disease caused by right coronary artery
Table 1 : original data from a research on heart disease
Gender Age Smoker LAD% RCA%
F 52 Y 85 100
M 62 N 80 0
M 75 Y 70 80
M 73 Y 40 99
M 66 N 50 45
… … … … …
e.g. Application of Association Rules in Medical Data Analysis
Results:
Table.2 Medical Association Rules
NO. Rule
1 Gender=M∩Age≥ 70∩ Smoker=Y RCA%≥ 50(40%,100%)
2 Gender=F∩Age<70∩ Smoker=Y LAD%≥ 70(20%,100%)
Rule 1 indicates : 40% of the cases are male, over 70 years old and
have the habit of smoking, the possibility of RCA%≥50% is 100%
Rule 2 indicates : 20% of the cases are female, under 70 years old and
have the habit of smoking, the possibility of LAD%≥70% is 100%
The future application of Data Mining in medical research
Data Mining is based on a series of new data process
technology
Wavelets Analysis
Neural Networks
Genetic Algorithm
Fuzzy Logic Reasoning
Challenges facing Data Mining
The data of medical research are always complicated and
unique in types and structures To integrate the specialized knowledge of both medical and data-
processing staffs Plenty of data and repeatedly practices are needed
Targets:
To form a real useful data mining system for
health research
Backpropagation Neural Networks (BPNNs)
A type of artificial neural network
A way to model highly complex, nonlinear solutions to
classification problems
Useful in classifying health-related phenomena
e.g. Classification of smoking cessation status with BPNN
Classifier performance estimates:
The confidence intervals of Az for both the BPNN and logistic classifiers are narrow
And exceeds random chance (Az=0.5) by at least 25% points
The finding indicates the performance of both classifiers exceeds that of random
chance
* Az: area under receiver operating characteristic curve
e.g. Classification of smoking cessation status with BPNN
The graph illustrates the
estimated true positive fraction
(TPF) at multiple values of the
false positive fraction (FPF)
The areas under the ROC curve
differ at a=0.05
Binormal conventional ROC curves for
BPNN and logistic regression classifiers
* ROC:receiver operating characteristic
Thank you!