Introduction to Data Mining
by Yen-Hsien Lee
Department of Information Management
College of Management
National Sun Yat-Sen University
March 4, 2003
Outline
• What is Data Mining
• Data Mining Process
• Properties of Data Mining Applications
• Data Mining Techniques
What is Data Mining?
• Data mining is the process of extracting previously unknown, valid, and actionable patterns, knowledge, or high-level information from large databases.
Data Mining Process
• Selection: Data → Target Data
• Preprocessing: Target Data → Preprocessed Data
• Transformation: Preprocessed Data → Transformed Data
• Mining: Transformed Data → Patterns
• Interpretation/Evaluation: Patterns → Knowledge
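The five stages above can be sketched as a chain of small functions, each consuming the output of the previous one. This is only an illustrative toy, not a method from the slides: the record fields (`age`, `bought`, `note`), the cleaning rule, the age bands, and the use of simple frequency counting as the "mining" step are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw records; field names and values are invented for illustration.
raw_data = [
    {"id": 1, "age": 23, "bought": "yes", "note": "a"},
    {"id": 2, "age": None, "bought": "no", "note": "b"},
    {"id": 3, "age": 41, "bought": "yes", "note": "c"},
    {"id": 4, "age": 37, "bought": "no", "note": "d"},
    {"id": 5, "age": 29, "bought": "yes", "note": "e"},
]

def select(records):
    """Selection: keep only the fields relevant to the question (Data -> Target Data)."""
    return [{"age": r["age"], "bought": r["bought"]} for r in records]

def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Transformation: discretize age into two bands (the cut at 30 is an assumption)."""
    return [("under30" if r["age"] < 30 else "30plus", r["bought"]) for r in records]

def mine(rows):
    """Mining: count how often each (age band, outcome) pair occurs."""
    return Counter(rows)

def evaluate(patterns, min_count=2):
    """Interpretation/evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}

target = select(raw_data)
preprocessed = preprocess(target)
transformed = transform(preprocessed)
patterns = mine(transformed)
knowledge = evaluate(patterns)
print(knowledge)
```

In practice the stages are rarely run once in a straight line; results from evaluation typically send the analyst back to selection or preprocessing.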
Properties of Data Mining Applications
• Business-question-driven process
• Multiple data mining techniques are potentially appropriate for a single data mining task
• Hybrid approaches for better data mining results
• Importance of data prospecting (selection) and cleaning (preprocessing)
• Unavoidable knowledge post-processing
• etc.
Data Mining Techniques
• Classification
– Process that establishes classes with attributes from a set of instances (called training examples) in a database.
• Clustering Analysis
– Process of creating a partition so that all members of each cluster are similar according to some metric (e.g., distance between objects).
• Association Rule Analysis
– Discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data.
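To make "conditions that occur frequently together" concrete, the sketch below scores one candidate rule on a handful of transactions. The slides do not name any measures; support and confidence are used here because they are the standard ones for association rules, and the transactions and item names are hypothetical.

```python
# Hypothetical market-basket transactions; items are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Score the candidate rule {bread} -> {milk}.
s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
print(f"support={s:.2f} confidence={c:.2f}")
```

A rule is usually reported only when both values exceed user-chosen minimum thresholds; real miners also avoid enumerating every itemset, which this brute-force sketch does not attempt.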
Data Mining Techniques (Cont’d)
• Sequential Pattern Analysis
– Discovery of the sequential occurrence of items across ordered transactions over time.
• Time-series Similarity Analysis
– Finding the sequences that are similar to a query sequence Q (called whole matching), or identifying the sequences that contain subsequences similar to Q (called subsequence matching).
• Link Analysis
• Text Mining
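The difference between whole matching and subsequence matching in time-series similarity analysis can be shown with a brute-force sketch. The slides do not say how similarity is measured; Euclidean distance with a tolerance `eps` is assumed here, and the sequences and function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def whole_matches(query, sequences, eps):
    """Whole matching: indices of sequences of the query's length within eps of it."""
    return [i for i, s in enumerate(sequences)
            if len(s) == len(query) and euclidean(query, s) <= eps]

def subsequence_matches(query, sequence, eps):
    """Subsequence matching: start offsets where a window of sequence is within eps of query."""
    n = len(query)
    return [start for start in range(len(sequence) - n + 1)
            if euclidean(query, sequence[start:start + n]) <= eps]

# Hypothetical query sequence Q and data, invented for illustration.
Q = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.1, 2.9], [3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 4.0]]
long_series = [0.0, 1.0, 2.0, 3.0, 0.0, 1.1, 2.0, 3.1]

whole = whole_matches(Q, candidates, eps=0.5)
sub = subsequence_matches(Q, long_series, eps=0.5)
print(whole, sub)
```

Whole matching compares Q against entire sequences of the same length, while subsequence matching slides a window of length |Q| along a longer sequence. Scanning every window is quadratic in the worst case, so this is only a way to see the definitions at work, not an efficient search method.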