DBMiner
- A System for Mining Knowledge from Large Data Sources
Jin-sook Lee (이진숙), Artificial Intelligence Lab, 3rd-semester M.S. student
Wednesday, December 8, 1999
Contents
• Introduction
• Architecture
• Functionalities
• DMQL and Interactive Data Mining
• Implementation of DBMiner
• Demonstration
• Conclusion
• Future work
• Reference
Introduction (1/3) - Who?
Data Mining Research Group
Intelligent Database Systems Research Lab.
Simon Fraser University
British Columbia, Canada
http://db.cs.sfu.ca/
- Version: DBMiner 2.0 (Enterprise)
- A mini-version of DBMiner E1.1:
- Table size is limited to 1,000 rows (records) and cube size to 3 dimensions
Introduction (2/3) - Why? Background
-> Data warehousing
Integrating data from multiple sources into large warehouses
-> Data mining (knowledge discovery in databases)
Extraction of interesting knowledge
Motive: the data explosion problem
DBMiner: an OLAP mining system for relational databases and data warehouses
* Goal: developed for interactive mining of multiple-level knowledge in large relational databases
Introduction (3/3) - Overview
< Features of DBMiner >
• Integration of data mining and data warehousing technologies
• A generalization-based data mining tool for knowledge discovery from large relational data sources.
• Multiple data mining function modules
• A powerful data mining query language (DMQL)
• Mining knowledge at multiple concept levels.
• Data warehousing and OLAP capabilities.
• Integrated
• Interactive: provides a GUI
Architecture
[Figure: General architecture of DBMiner. Components: Graphical User Interface, Discovery Modules, SQL Server, Data, Concept Hierarchy]
Modules of DBMiner
Knowledge Discovery Modules of DBMiner
• Summarizer
• Comparator
• Classifier
• Association Rule Miner
• Time Series Analyzer
• Predictor
• Cluster Analyzer
• Meta-Rule Guided Miner
• Future Modules
Major modules (in E1.0)
• Data warehouse construction module -> for automatic dimension generation and data cube creation
• 3-D cube view of the data warehouse
• 3-D boxplot (statistical) view of the data warehouse
• OLAP-based data summarizer
• Associator -> for mining association rules
• Classifier -> for data classification and decision-tree generation
• Predictor -> for regression analysis and predictive modeling
Functionalities of modules (1/2)
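Characterizer (generalize)
Summarizes the general characteristics of a user-specified set of data
Comparator (mine)
Finds a set of discriminant rules that distinguish a target class from contrasting classes
Classifier (analyze)
Analyzes a set of training data and builds a model for each class
Associator (discover)
Finds a set of association rules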
Functionalities of modules (2/2)
Meta-pattern guided miner
A data mining mechanism that searches for rules matching a specified meta-rule form
Predictor (predict)
Predicts possible values
Cluster analyzer (grouping)
Groups a selected set of data
Time-series analyzer
Future modules
DMQL and Interactive Data Mining
Provides DMQL (Data Mining Query Language) and a graphical user interface
Purpose: interactive mining of multiple-level knowledge
Query processing
-> Collect the relevant data: relational query
-> Generalize the data: attribute-oriented induction
-> Output: generalized relations, generalized feature tables, multiple forms of generalized rules, pie charts, bar charts, curves, etc.
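The generalization step above can be sketched as a minimal attribute-oriented induction in Python. This is a sketch under stated assumptions, not DBMiner's implementation. The concept hierarchies, the `threshold` parameter and the sample rows are hypothetical. Each attribute climbs its hierarchy until it has at most `threshold` distinct values. An attribute with no hierarchy and too many values is removed. Identical generalized tuples are then merged, and their counts are accumulated.

```python
from collections import Counter

# Hypothetical concept hierarchies: each maps a value to its parent concept.
HIERARCHIES = {
    "major": {"Physics": "Science", "Chemistry": "Science",
              "History": "Arts", "Music": "Arts"},
    "city": {"Vancouver": "B.C.", "Victoria": "B.C.",
             "Toronto": "Ontario", "Ottawa": "Ontario"},
}

def generalize_attribute(values, hierarchy, threshold):
    """Climb the hierarchy until at most `threshold` distinct values remain."""
    while len(set(values)) > threshold:
        climbed = [hierarchy.get(v, v) for v in values]
        if climbed == values:  # no higher concept level is available
            break
        values = climbed
    return values

def attribute_oriented_induction(rows, attributes, threshold):
    """Return (kept attributes, generalized relation as tuple -> count)."""
    columns = {}
    for attr in attributes:
        values = [row[attr] for row in rows]
        hierarchy = HIERARCHIES.get(attr)
        if hierarchy is None:
            if len(set(values)) > threshold:
                continue  # attribute removal: too many values, nothing to climb
        else:
            values = generalize_attribute(values, hierarchy, threshold)
        columns[attr] = values
    kept = list(columns)
    # Merge identical generalized tuples; the count is the aggregate attribute.
    return kept, Counter(zip(*(columns[a] for a in kept)))

rows = [
    {"name": "Ann", "major": "Physics", "city": "Vancouver"},
    {"name": "Bob", "major": "Chemistry", "city": "Victoria"},
    {"name": "Cy", "major": "History", "city": "Toronto"},
    {"name": "Di", "major": "Music", "city": "Ottawa"},
]
kept, relation = attribute_oriented_induction(rows, ["name", "major", "city"], 2)
# kept == ["major", "city"]; relation counts ("Science", "B.C.") and ("Arts", "Ontario") twice each
```

The result is a small generalized relation: a set of generalized attributes plus an aggregate (count) attribute.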
Implementation of DBMiner (1/4)
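Data generalization: the core function of DBMiner (= summarization, characterization)
Two data structures are considered for data generalization: the generalized relation vs. the multidimensional data cube
Purpose: efficient implementation through reduced storage space, faster access, and lower cost
Data is generalized at multiple concept levels via roll-up, drill-down, etc.
* Generalized relation
- A relation consisting of a set of attributes and a set of aggregate attributes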
Implementation of DBMiner (2/4)
Multiple-level characterization
data characterization
- summarize and characterize
Purpose: multi-level knowledge mining
Techniques applied
- progressive deepening (drill-down)
- progressive generalization (roll-up)
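Progressive generalization (roll-up) and progressive deepening (drill-down) can be pictured as moving one level along a concept hierarchy and re-aggregating. The minimal sketch below uses a hypothetical location hierarchy (city < province < country) and made-up sales figures. It does not reproduce DBMiner's internal data structures.

```python
from collections import defaultdict

# Hypothetical hierarchy levels for a location dimension, most specific first.
LEVELS = ["city", "province", "country"]
PARENT = {
    "city": {"Vancouver": "B.C.", "Victoria": "B.C.", "Toronto": "Ontario"},
    "province": {"B.C.": "Canada", "Ontario": "Canada"},
}

def lift(city, level):
    """Map a city to its ancestor at the requested hierarchy level."""
    value = city
    for current in LEVELS[:LEVELS.index(level)]:
        value = PARENT[current][value]
    return value

def aggregate(sales, level):
    """Sum (city, amount) pairs at the given concept level."""
    totals = defaultdict(float)
    for city, amount in sales:
        totals[lift(city, level)] += amount
    return dict(totals)

def roll_up(level):
    """Progressive generalization: one level up (stays at the top level)."""
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]

def drill_down(level):
    """Progressive deepening: one level down (stays at the bottom level)."""
    return LEVELS[max(LEVELS.index(level) - 1, 0)]

sales = [("Vancouver", 100.0), ("Victoria", 50.0), ("Toronto", 70.0)]
by_province = aggregate(sales, "province")          # {"B.C.": 150.0, "Ontario": 70.0}
by_country = aggregate(sales, roll_up("province"))  # {"Canada": 220.0}
```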
Implementation of DBMiner (3/4)
Discovery of discriminant rules
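A discriminant rule contrasts a target class with a contrasting class over the same generalized tuples. The minimal sketch below scores each generalized tuple by a discrimination weight: the share of that tuple's count that comes from the target class. The weight definition, the `min_weight` cutoff and the sample counts are assumptions for illustration. They are not taken from DBMiner's comparator.

```python
from collections import Counter

def discriminant_rules(target, contrast, min_weight=0.8):
    """Return {generalized tuple: weight} for tuples that discriminate the target.

    target / contrast: Counter mapping a generalized tuple to its count.
    weight = target count / (target count + contrasting count)
    """
    rules = {}
    for tup, t_count in target.items():
        total = t_count + contrast.get(tup, 0)
        if total == 0:
            continue  # tuple never occurs in either class
        weight = t_count / total
        if weight >= min_weight:
            rules[tup] = weight
    return rules

# Hypothetical generalized relations for graduate vs. undergraduate students.
graduate = Counter({("Science", "B.C."): 90, ("Arts", "Ontario"): 30})
undergraduate = Counter({("Science", "B.C."): 10, ("Arts", "Ontario"): 70})
rules = discriminant_rules(graduate, undergraduate)
```

Here the tuple ("Science", "B.C.") gets weight 0.9. It reads as a rule that such students are graduates in 90% of the observed cases.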
Multiple-level association
-> inter-attribute association
-> intra-attribute association
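Multiple-level association (here the intra-attribute case: items of a single "bought" attribute) can be sketched by counting pair support at a low concept level. The same transactions are then mapped one level up a hierarchy and counted again. A common practice is to use a lower minimum support at the lower level. The item hierarchy, transactions and thresholds below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical item hierarchy: specific product -> general category.
CATEGORY = {"2% milk": "milk", "skim milk": "milk",
            "wheat bread": "bread", "white bread": "bread"}

def frequent_pairs(transactions, min_support):
    """Return {item pair: support} for pairs with support >= min_support."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

def generalize(transactions, mapping):
    """Replace every item by its higher-level concept."""
    return [[mapping.get(item, item) for item in t] for t in transactions]

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A and B) / support(A)."""
    with_a = [t for t in transactions if antecedent in t]
    if not with_a:
        return 0.0
    return sum(consequent in t for t in with_a) / len(with_a)

tx = [["2% milk", "wheat bread"], ["skim milk", "white bread"],
      ["2% milk", "white bread"], ["skim milk"]]
high_level = generalize(tx, CATEGORY)
```

At the category level, {bread, milk} has support 0.75 and milk -> bread has confidence 0.75. No specific product pair reaches a support of 0.5. This gap is why multiple-level mining is useful.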
Meta-rule guided mining
-> Meta-rule (meta-pattern): a specified constraint
-> Purpose: used to guide the mining of many kinds of rules
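Meta-rule guided mining can be sketched with a template such as P(X) ∧ Q(X) → buys(X, "software"), where P and Q are placeholders for attributes. The minimal sketch below instantiates P and Q with every pair of candidate attributes and their observed values. It keeps only the instantiations that meet minimum support and confidence. The meta-rule shape, attribute names and thresholds are hypothetical.

```python
from itertools import combinations

def mine_with_metarule(rows, attributes, consequent, min_support, min_conf):
    """Instantiate the meta-rule P(X) and Q(X) -> consequent(X).

    P and Q range over pairs of attributes; each concrete rule fixes one
    value for each. `consequent` is an (attribute, value) pair.
    Returns a list of (((attr1, val1), (attr2, val2)), support, confidence).
    """
    c_attr, c_val = consequent
    n = len(rows)
    results = []
    for a1, a2 in combinations(attributes, 2):
        for v1, v2 in sorted({(r[a1], r[a2]) for r in rows}):
            body = [r for r in rows if r[a1] == v1 and r[a2] == v2]
            hits = sum(r[c_attr] == c_val for r in body)
            support, conf = hits / n, hits / len(body)
            if support >= min_support and conf >= min_conf:
                results.append((((a1, v1), (a2, v2)), support, conf))
    return results

rows = [
    {"age": "20-29", "income": "high", "buys": "software"},
    {"age": "20-29", "income": "high", "buys": "software"},
    {"age": "30-39", "income": "low", "buys": "none"},
    {"age": "20-29", "income": "low", "buys": "none"},
]
found = mine_with_metarule(rows, ["age", "income"], ("buys", "software"), 0.25, 0.8)
```

The single surviving instantiation reads age(X, "20-29") ∧ income(X, "high") → buys(X, "software"), with support 0.5 and confidence 1.0.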
Implementation of DBMiner (4/4)
Classification
-> Purpose: to model or describe each class
-> Uses a decision-tree method (e.g., ID3, C4.5); other classification approaches include statistical methods, neural networks, and rough sets
-> Classifier
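ID3, named above, grows a decision tree by splitting on the attribute with the highest information gain. The sketch below computes that standard measure for a toy data set. It illustrates the technique only and is not DBMiner's classifier code. The attribute names and rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, label):
    """Entropy reduction from splitting `rows` (list of dicts) on `attribute`."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[label] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_split(rows, attributes, label):
    """ID3's choice: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, label))

rows = [
    {"income": "high", "student": "no", "buys": "no"},
    {"income": "high", "student": "yes", "buys": "yes"},
    {"income": "low", "student": "yes", "buys": "yes"},
    {"income": "low", "student": "no", "buys": "no"},
]
root = best_split(rows, ["income", "student"], "buys")  # "student"
```

C4.5 refines this criterion, for example by using the gain ratio. A full tree applies `best_split` recursively to each partition.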
Prediction
-> Predicts data values and value distributions
Clustering: each cluster shares common characteristics -> unsupervised learning
-> The process of partitioning a data set into a set of classes (clusters)
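The slides do not name DBMiner's clustering algorithm. As one common way to partition a data set without class labels, the sketch below shows k-means (Lloyd's algorithm). It initializes centroids to the first k points so the result is deterministic. The points and parameters are hypothetical.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Partition `points` (tuples of floats) into k clusters.

    Centroids start at the first k points for determinism.
    Returns (centroids, labels).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return centroids, labels

points = [(1.0, 1.0), (8.0, 8.0), (1.0, 2.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centroids, labels = k_means(points, 2)
```

Each resulting cluster groups points that share a common characteristic, in this case spatial proximity. No training labels are used.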
The Major System Components of DBMiner
The Warehouse Workspace (Browser)
– Building a data warehouse
– Browsing a data cube
The Mining Wizard
The Data Mining Modules
The Warehouse Workspace
Importing data
Table browsing
Dimension creation
Dimension browsing
Cube building
Cube browsing
Dimension Creation
Create dimensions or measurements by selecting appropriate table columns
Context-sensitive menus appear
Building a Data Cube
Adding dimensions to a cube
Adding measurements to a cube
Deleting/modifying cube elements
The “Build” command
Browsing a Data Cube
Powerful visualization
OLAP capabilities
Interactive manipulation
Dicing through to a “Subcube”
Double-click on a particular cell, e.g., cell = (Product: Environmental Line, Revenue: 0-2000, Location: Far East)
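Dicing selects a sub-cube by fixing values on one or more dimensions. Fixing all three dimensions, as in the double-click example, isolates a single cell. The minimal sketch below stores a sparse cube as a dict keyed by (Product, Revenue, Location). The dimension names and the cell (Environmental Line, 0-2000, Far East) come from the slide. The other values ("Outdoor Products", "Europe", "2000-4000") and all counts are hypothetical.

```python
# Hypothetical sparse cube: (product, revenue band, location) -> count.
CUBE = {
    ("Environmental Line", "0-2000", "Far East"): 12,
    ("Environmental Line", "0-2000", "Europe"): 7,
    ("Outdoor Products", "0-2000", "Far East"): 5,
    ("Environmental Line", "2000-4000", "Far East"): 3,
}
DIMENSIONS = ("Product", "Revenue", "Location")

def dice(cube, **selection):
    """Keep cells whose coordinates match every selected dimension value."""
    idx = {name: DIMENSIONS.index(name) for name in selection}
    return {cell: value for cell, value in cube.items()
            if all(cell[idx[name]] == val for name, val in selection.items())}

one_cell = dice(CUBE, Product="Environmental Line", Revenue="0-2000",
                Location="Far East")
subcube = dice(CUBE, Product="Environmental Line")
```

Selecting on fewer dimensions leaves a larger sub-cube. Here fixing only Product keeps three cells.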
Data Cube Aggregation for Summarization
[Figure: a data cube with dimensions Amount (0-20K, 20-40K, 40-60K, 60K-, sum), Province (B.C., Prairies, Ontario, ..., sum) and Discipline (Comp_Method, Database, ..., sum); highlighted cell: (All Amount, Comp_Method, B.C.)]
Each dimension contains a hierarchy of values for one attribute.
A cube cell stores aggregate values, e.g., count, sum, max, etc.
A “sum” cell stores dimension summation values.
Sparse-cube technology and MOLAP/ROLAP integration.
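The "sum" cells above are the aggregates obtained when one or more dimensions are summed over. The minimal sketch below builds every such cell from base rows. Each base row contributes to 2^d cells, where d is the number of dimensions. The dimension names Province and Discipline come from the figure. For brevity, Amount is treated as a numeric measure rather than a banded dimension, and the rows are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "sum"  # marks a dimension that has been summed over, as in the slide

def build_cube(rows, dimensions, measure):
    """Return {cell: (count, total)} for every cell, including "sum" cells.

    Each base row contributes to 2**len(dimensions) cells: every way of
    replacing some of its dimension values by ALL.
    """
    cube = defaultdict(lambda: [0, 0])
    for row in rows:
        base = [row[d] for d in dimensions]
        for mask in product((False, True), repeat=len(dimensions)):
            cell = tuple(ALL if summed else v for v, summed in zip(base, mask))
            cube[cell][0] += 1            # count aggregate
            cube[cell][1] += row[measure]  # sum aggregate
    return {cell: tuple(agg) for cell, agg in cube.items()}

rows = [
    {"Province": "B.C.", "Discipline": "Comp_Method", "Amount": 15000},
    {"Province": "B.C.", "Discipline": "Database", "Amount": 30000},
    {"Province": "Ontario", "Discipline": "Database", "Amount": 50000},
]
cube = build_cube(rows, ["Province", "Discipline"], "Amount")
```

For example, `cube[("B.C.", "sum")]` is (2, 45000). A real system would store only the non-empty cells of a sparse cube, as this dict already does.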
Demonstration
Configuration
- On Windows 95
- DBMiner Educational Demo version 1.1
- Using the sample DB and a local warehouse
Conclusion (1/3)
< Key technologies of DBMiner >
• OLAP technology
• Multi-level and multiple mining modules
• Interactive OLAP-based mining and visual graphical display
• A data mining query language DMQL and mining in both relational databases and data warehouses.
Conclusion (2/3)
< Applications of DBMiner >
• To query, report on, and analyze relational databases or data warehouses
• Ideally suited for
-> Profit and growth analysis
-> Strategic management
-> Customer relationship management
-> Asset management
-> Business management
-> Decision-support efforts (business process reengineering (BPR), total quality management (TQM))
Conclusion (3/3)
< Main features of DBMiner >
OLAP
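Attribute-oriented induction
Statistical analysis
Mining of multiple-level knowledge
Combines several interesting data mining techniques
Provides a user-friendly, interactive data mining environment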
Future Work
Mining new kinds of knowledge: evolution, deviation, pattern matching
GeoMiner : Spatial Data Mining
Library Miner
Multimedia Miner
WebMiner : WWW Data Mining
Reference
J. Han, J. Chiang, S. Chee, J. Chen, Q. Chen, S. Cheng, W. Gong, M. Kamber, K. Koperski, G. Liu, Y. Lu, N. Stefanovic, L. Winstone, B. Xia, O. R. Zaiane, S. Zhang, H. Zhu, "DBMiner: A System for Data Mining in Relational Databases and Data Warehouses," Proc. CASCON'97: Meeting of Minds, Toronto, Canada, November 1997.
J. Han, Y. Fu, W. Wang, J. Chiang, W. Gong, K. Koperski, D. Li, Y. Lu, A. Rajan, N. Stefanovic, B. Xia, O. R. Zaiane, "DBMiner: A System for Mining Knowledge in Large Relational Databases," Proc. 1996 Int'l Conf. on Data Mining and Knowledge Discovery (KDD'96), Portland, Oregon, August 1996, pp. 250-255.
The Data Mining Research Group, "Introduction to DBMiner and Data Mining and Warehousing Concepts" (Microsoft PowerPoint version), Boeing Workshop, Seattle, Washington, December 1997.
School of Computing Science, Simon Fraser University: http://db.cs.sfu.ca/
DBMiner Information and Demo: http://db.cs.sfu.ca/DBMiner