Introduction to Data Mining by Yen-Hsien Lee, Department of Information Management, College of Management, National Sun Yat-Sen University, March 4, 2003

Introduction to Data Mining

by Yen-Hsien Lee

Department of Information Management

College of Management

National Sun Yat-Sen University

March 4, 2003

Outline

• What is Data Mining
• Data Mining Process
• Properties of Data Mining Applications
• Data Mining Techniques

What is Data Mining?

• Data mining is the process of extracting previously unknown, valid, and actionable patterns, knowledge, or high-level information from large databases.

Data Mining Process

• Selection: Data → Target Data
• Preprocessing: Target Data → Preprocessed Data
• Transformation: Preprocessed Data → Transformed Data
• Mining: Transformed Data → Patterns
• Interpretation/Evaluation: Patterns → Knowledge
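As a rough illustration of how the five stages chain together, the following Python sketch runs them on a tiny made-up data set. The records, the field names, the age bands, and the deliberately trivial mining step (most frequent item per age band) are all assumptions chosen for illustration; they are not taken from the slides.

```python
from collections import Counter

# Hypothetical raw data: (customer_id, age, item); None marks a missing value.
RAW = [
    (1, 23, "book"), (2, 35, "laptop"), (3, None, "book"),
    (4, 41, "laptop"), (5, 29, "book"), (6, 38, "laptop"),
]

def select(data):
    """Selection: keep only the fields relevant to the question (age, item)."""
    return [(age, item) for _, age, item in data]

def preprocess(target):
    """Preprocessing: drop records with missing values."""
    return [(age, item) for age, item in target if age is not None]

def transform(clean):
    """Transformation: discretize age into two coarse bands."""
    return [("under_30" if age < 30 else "30_plus", item) for age, item in clean]

def mine(transformed):
    """Mining: find the most frequent item (and its count) in each age band."""
    counts = {}
    for band, item in transformed:
        counts.setdefault(band, Counter())[item] += 1
    return {band: c.most_common(1)[0] for band, c in counts.items()}

def interpret(patterns, min_count=2):
    """Interpretation/Evaluation: keep only patterns seen at least min_count times."""
    return {band: item for band, (item, n) in patterns.items() if n >= min_count}

knowledge = interpret(mine(transform(preprocess(select(RAW)))))
print(knowledge)
```

Each function consumes the output of the previous one, mirroring the Data → Target Data → Preprocessed Data → Transformed Data → Patterns → Knowledge flow. In practice the stages are usually revisited iteratively rather than run once in a straight line.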

Properties of Data Mining Applications

• Business-question-driven process
• Multiple data mining techniques potentially appropriate for a single data mining task
• Hybrid approaches for better data mining results
• Importance of data prospecting (selection) and cleaning (preprocessing)
• Unavoidable knowledge post-processing
• etc.

Data Mining Techniques

• Classification
  – Process that establishes classes with attributes from a set of instances (called training examples) in a database.

• Clustering Analysis
  – Process of creating a partition so that all members of each cluster are similar according to some metric (e.g., distance between objects).

• Association Rule Analysis
  – Discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data.
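To make "conditions that occur frequently together" concrete, here is a minimal Python sketch that scores rules between pairs of single items by support and confidence. The transactions, the thresholds, and the function names are hypothetical, and the brute-force enumeration of pairs is a simplification for illustration rather than a specific published algorithm.

```python
from itertools import combinations

# Hypothetical market-basket transactions (illustrative data only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # the pair does not occur together often enough
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            conf = s / support({lhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules

for lhs, rhs, s, conf in pair_rules(TRANSACTIONS):
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={conf:.2f}")
```

Support measures how often the items appear together; confidence measures how often the right-hand side appears given the left-hand side. A rule can pass in one direction and fail in the other, because the two directions divide by different supports.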

Data Mining Techniques (Cont’d)

• Sequential Pattern Analysis
  – Discovery of the sequential occurrence of items across ordered transactions over time.

• Time-series Similarity Analysis
  – Finding the sequences that are similar to a query sequence Q (called whole matching), or identifying the sequences that contain subsequences similar to Q (called subsequence matching).

• Link Analysis
• Text Mining
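The difference between whole matching and subsequence matching, described under Time-series Similarity Analysis, can be sketched in a few lines of Python. This sketch assumes Euclidean distance as the similarity measure and a threshold eps, neither of which is specified in the slides; the data and function names are hypothetical, and the linear scan is the simplest possible approach.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def whole_matching(sequences, query, eps):
    """Indices of sequences with the same length as query and within eps of it."""
    return [i for i, s in enumerate(sequences)
            if len(s) == len(query) and euclidean(s, query) <= eps]

def subsequence_matching(sequences, query, eps):
    """(sequence index, offset) pairs where a sliding window is within eps of query."""
    m = len(query)
    hits = []
    for i, s in enumerate(sequences):
        for off in range(len(s) - m + 1):
            if euclidean(s[off:off + m], query) <= eps:
                hits.append((i, off))
    return hits

# Hypothetical query sequence Q and candidate series (illustrative only).
Q = [1.0, 2.0, 3.0]
SERIES = [
    [1.0, 2.1, 2.9],             # close to Q as a whole
    [5.0, 5.0, 5.0],             # far from Q
    [0.0, 1.0, 2.0, 3.0, 9.0],   # contains Q starting at offset 1
]

print(whole_matching(SERIES, Q, eps=0.5))
print(subsequence_matching(SERIES, Q, eps=0.5))
```

Whole matching compares Q only against sequences of the same length, while subsequence matching slides a window of length len(Q) over every sequence, so it also reports where inside a longer sequence the match occurs.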