Data Mining and Virtual Observatory. Yanxia Zhang, National Astronomical Observatories, CAS. Dec. 2, 2004.
1
Data Mining and Virtual Observatory
Yanxia Zhang
National Astronomical Observatories, CAS
Dec. 2, 2004
2
Outline
Why
What
How
3
Astronomy is Facing a Major “Data Avalanche”:
Multi-Terabyte Sky Surveys and Archives (Soon: Multi-Petabyte), Billions of Detected Sources, Hundreds of Measured Attributes per Source …
4
Understanding of Complex Astrophysical Phenomena Requires Complex and Information-Rich Data Sets, and the Tools to Explore them …
… This Will Lead to a Change in the nature of the Astronomical Discovery Process …
… Which Requires A New Research Environment for Astronomy: VO
Necessity Is the Mother of Invention
5
DM: Confluence of Multiple Disciplines
Database systems, data warehouses, OLAP
Statistics
ML & AI
Visualization
Information science
Other disciplines
6
What is DM?
The search for interesting patterns,
in large databases,
that were collected for other applications,
using machine learning algorithms,
high-performance computers
and other methods
for science and society!
7
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process.
Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
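The stages of the KDD process can be sketched as a chain of small functions. This is only an illustrative sketch, not the author's system: the two toy catalogs, the field names (`id`, `mag`, `type`, `flux`) and all function names are assumptions made up for the example, and the "mining" step is reduced to simple counting.

```python
from collections import Counter

# Hypothetical toy catalogs; all field names and values are illustrative assumptions.
optical = [
    {"id": 1, "mag": 18.2, "type": "star"},
    {"id": 2, "mag": None, "type": "galaxy"},  # missing value, dropped in cleaning
    {"id": 3, "mag": 21.5, "type": "galaxy"},
]
infrared = [
    {"id": 1, "flux": 0.8},
    {"id": 3, "flux": 2.4},
]

def clean(rows):
    """Data cleaning: drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(left, right, key="id"):
    """Data integration: inner join of two sources on a shared key."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

def select(rows, fields):
    """Selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in rows]

def mine(rows, field):
    """Data mining (trivial stand-in): count how often each value occurs."""
    return Counter(r[field] for r in rows)

def evaluate(patterns, min_count=1):
    """Pattern evaluation: keep patterns that meet a minimum count."""
    return {k: v for k, v in patterns.items() if v >= min_count}

warehouse = integrate(clean(optical), clean(infrared))
task_data = select(warehouse, ["type", "flux"])
patterns = evaluate(mine(task_data, "type"))
print(patterns)
```

Each function corresponds to one box of the process, so a real mining algorithm could replace `mine` without touching the other stages.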
8
Data Mining
[Pyramid diagram, listed top to bottom; the potential to support decisions increases toward the top]
Knowledge Discovery
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts
Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)
Roles: End User, Scientist Analyst, Data Analyst, DBA
9
Architecture: Typical Data Mining System
[Diagram, listed top to bottom]
Graphical user interface
Pattern evaluation
Data mining engine
Database or data warehouse server
Data cleaning & data integration, Filtering
Databases, Data Warehouse
Knowledge-base
10
The ratio of every DM step
[Bar chart; vertical axis from 0 to 60; categories: Decide target, Data preparing, Data mining, Evaluation]
11
DM: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB systems and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW
12
Data Mining Functionality
Concept description
Association
Classification and Prediction
Clustering
Time-series analysis
Other pattern-directed or statistical analysis
13
Taking a Broader View: The Observable Parameter Space
[Diagram axes: RA, Dec, Wavelength, Time, Flux, Proper motion, Polarization, Morphology / Surface brightness, Non-EM …]
Along each axis the measurements are characterized by the position, extent, sampling and resolution. All astronomical measurements span some volume in this parameter space.
What is the coverage? Where are the gaps? Where do we go next?
14
How and Where are Discoveries Made?
Conceptual Discoveries: e.g., Relativity, QM, Brane World, Inflation …
Theoretical, may be inspired by observations
Phenomenological Discoveries: e.g., Dark Matter, QSOs, GRBs, CMBR, Extrasolar Planets, Obscured Universe …
Empirical, inspire theories, can be motivated by them
[Diagram: New Technical Capabilities, Observational Discoveries, Theory, IT/VO (VO)]
Phenomenological Discoveries:
Pushing along some parameter space axis: VO useful
Making new connections (e.g., multi-wavelength): VO critical!
Understanding of complex astrophysical phenomena requires complex, information-rich data (and simulations?)
15
Exploration of observable parameter spaces and searches for rare or new types of objects
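A search for rare objects amounts to looking for points that sit far from the bulk of the population in parameter space. The sketch below is a deliberately reduced, one-axis illustration using the modified z-score (median and median absolute deviation); the function name, the sample colour values and the 3.5 threshold are assumptions for the example, and a real search would work on many axes at once.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is robust to the outliers being sought.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread, so no meaningful score
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical colour indices for eight sources; one is far from the rest.
colours = [0.51, 0.48, 0.55, 0.50, 0.47, 0.53, 2.90, 0.49]
print(mad_outliers(colours))
```

The median-based score is used here instead of mean and standard deviation because a single extreme source would otherwise inflate the spread and hide itself.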
16
But Sometimes You Find a Surprise…
17
Precision Cosmology and LSS
Better matching of theory and observations
DPOSS Clusters (Gal et al.): Clustering on a clustered background
LSS Numerical Simulation (VIRGO): Clustering with a nontrivial topology
18
Exploration of the Time Domain: Optical Transients
A Possible Example of an “Orphan Afterglow” (GRB?) discovered in DPOSS: an 18th mag transient associated with a 24.5 mag galaxy. At an estimated z ~ 1, the observed brightness is ~ 100 times that of a SN at the peak.
Or, is it something else, new?
[Images: DPOSS, Keck]
19
Exploration of the Time Domain: Faint, Fast Transients (Tyson et al.)
20
Exploring the Low Surface Brightness (Low Contrast) Universe
Comparison between HI, Hα, and 100 μm Diffuse Emission
[Images: IRAS 100 Micron Image, DPOSS red image]
Brunner et al.
21
Background Enhancement Technique demonstrated on two known M31 dwarf spheroidals
(Brunner et al.)
22
Data Mining in the Image Domain: Can We Discover New Types of Phenomena Using Automated Pattern Recognition?
(Every object detection algorithm has its biases and limitations)
23
An OLAM Architecture
[Diagram, listed top to bottom]
Layer 4, User Interface: User GUI API (mining query in, mining result out)
Layer 3, OLAP/OLAM: OLAM Engine, OLAP Engine; Data Cube API
Layer 2, MDDB: MDDB, Meta Data; Database API; Filtering & Integration
Layer 1, Data Repository: Databases, Data Warehouse; Data cleaning, Data integration, Filtering
24
View of Warehouses and Hierarchies
Importing data
Table browsing
Dimension creation
Dimension browsing
Cube building
Cube browsing
25
Selecting a Data Mining Task
Major data mining functions:
Summary (Characterization)
Association
Classification
Prediction
Clustering
Time-Series Analysis
26
Mining Characteristic Rules
Characterization: Data generalization/summarization at high abstraction levels.
An example query: Find a characteristic rule for Cities from the database ‘CITYDATA' in relevance to location, capita_income, and the distribution of count% and amount%.
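The example query can be read as: generalize each city to a coarser description, group identical descriptions, and report what share of the rows (count%) and of the total amount (amount%) each group holds. The following is a rough sketch under stated assumptions: the rows standing in for ‘CITYDATA', the `amount` column, the income cut at 30000 and the function names are all invented for illustration and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical rows standing in for the 'CITYDATA' table.
cities = [
    {"city": "A", "location": "East", "capita_income": 31000, "amount": 100.0},
    {"city": "B", "location": "East", "capita_income": 35000, "amount": 200.0},
    {"city": "C", "location": "West", "capita_income": 20000, "amount": 50.0},
    {"city": "D", "location": "West", "capita_income": 22000, "amount": 50.0},
]

def income_level(value, cut=30000):
    """Generalize a numeric income to a higher abstraction level (assumed cut)."""
    return "high" if value >= cut else "low"

def characterize(rows):
    """Group generalized tuples and compute count% and amount% per group."""
    groups = defaultdict(lambda: [0, 0.0])
    for r in rows:
        key = (r["location"], income_level(r["capita_income"]))
        groups[key][0] += 1
        groups[key][1] += r["amount"]
    n = len(rows)
    total = sum(r["amount"] for r in rows)
    return {k: (100.0 * c / n, 100.0 * a / total) for k, (c, a) in groups.items()}

summary = characterize(cities)
for (location, level), (count_pct, amount_pct) in summary.items():
    print(location, level, f"count%={count_pct:.1f}", f"amount%={amount_pct:.1f}")
```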
27
Browsing a Data Cube
Powerful visualization
OLAP capabilities
Interactive manipulation
28
Visualization of Data Dispersion: Boxplot Analysis
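A boxplot is drawn from a handful of summary numbers: the quartiles, the whisker ends and any points flagged as outliers. As a small sketch of how those numbers can be obtained, the code below uses the common 1.5 × IQR fence convention; the convention, the function name and the sample values are assumptions, since the slide does not say which rule the tool applies.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Quartiles, whisker ends and outliers using the 1.5 * IQR convention."""
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical measurements with one extreme value.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```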
29
Mining Association Rules (Table Form)
30
Association Rule in Plane Form
31
Association Rule Graph
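Whatever form the rules are displayed in (table, plane or graph), each rule A ⇒ B is usually scored by its support (fraction of records containing both A and B) and its confidence (support of A and B divided by support of A). The sketch below enumerates single-item rules by brute force; the attribute names, the thresholds and the function names are illustrative assumptions, and a practical miner would use an algorithm such as Apriori instead of testing every pair.

```python
from itertools import combinations

# Hypothetical records: each is the set of properties observed for one source.
transactions = [
    {"radio_loud", "blue", "variable"},
    {"radio_loud", "blue"},
    {"blue", "variable"},
    {"radio_loud", "blue", "variable"},
    {"red"},
]

def support(itemset, txns):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in txns if itemset <= t) / len(txns)

def pair_rules(txns, min_support, min_confidence):
    """Brute-force search for rules lhs => rhs between single items."""
    items = sorted(set().union(*txns))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, txns)
            if s < min_support:
                continue
            c = s / support({lhs}, txns)
            if c >= min_confidence:
                found.append((lhs, rhs, s, c))
    return found

rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} => {rhs}  support={s:.2f} confidence={c:.2f}")
```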
32
Mining Classification Rules
33
Prediction: Numerical Data
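For numerical attributes, the simplest form of prediction is a least-squares line fitted to known pairs and then evaluated at a new input. The slide does not state which method the tool uses, so the following is only a minimal sketch of that one technique, with made-up data and hypothetical function names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Evaluate the fitted line at a new input value."""
    slope, intercept = model
    return slope * x + intercept

# Made-up training pairs that lie exactly on y = 2x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model, predict(model, 10))
```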
34
Prediction: Categorical Data
35
DMiner: Architecture
Graphic User Interface
Modules: Comparator, Characterizer, Classifier, Cluster Analyzer, Associator, Future Modules
Database and Cube Server
Databases: Optical DB, Infrared DB, ……. DB, Radio DB
36
A System Prototype for MultiMedia Data Mining
Simon Fraser University
[Diagram components: WWW, Image features, Keywords, Metadata, WordNet, Keyword Hierarchy, Internet Domain Hierarchy, Pre-built Concept Hierarchies for colour, texture, format, etc., Pre-processing, Data Cubes and Numeric Hierarchies, Real-time Interaction, Pattern discoveries]
37
[Diagram components: WWW, Media Descriptors, Database, Data Cube Dimensions, Mining Engine, Discoveries]
Simon Fraser University
38
WebLogMiner Architecture
Web log is filtered to generate a relational database
A data cube is generated from the database
OLAP is used to drill down and roll up in the cube
OLAM is used for mining interesting knowledge
Web log → (1) Data Cleaning → Database → (2) Data Cube Creation → Data Cube → (3) OLAP → Sliced and diced cube → (4) Data Mining → Knowledge
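The four WebLogMiner stages can be imitated on a few log lines. This is a toy sketch, not the WebLogMiner implementation: it assumes the log is in Common Log Format, the sample lines are invented, the cube dimensions (`day`, `host`, `section`) are chosen for illustration, and the "mining" step is reduced to picking the most requested section.

```python
import re
from collections import Counter

# Assumed input format: Common Log Format. The sample lines are made up.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
raw_log = [
    '10.0.0.1 - - [02/Dec/2004:10:00:01 +0800] "GET /catalog/a.fits HTTP/1.0" 200 5120',
    '10.0.0.2 - - [02/Dec/2004:10:05:09 +0800] "GET /catalog/b.fits HTTP/1.0" 200 2048',
    '10.0.0.1 - - [03/Dec/2004:09:12:44 +0800] "GET /images/m31.gif HTTP/1.0" 200 900',
    '10.0.0.3 - - [03/Dec/2004:09:13:02 +0800] "GET /missing.html HTTP/1.0" 404 0',
    'corrupted line',
]
DIMS = ("day", "host", "section")

def clean(lines):
    """1. Data cleaning: parse lines, drop malformed and unsuccessful requests."""
    rows = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "200":
            section = m.group("path").split("/")[1]  # top-level directory
            rows.append({"day": m.group("day"), "host": m.group("host"),
                         "section": section})
    return rows

def build_cube(rows):
    """2. Data cube creation: count requests per (day, host, section) cell."""
    return Counter(tuple(r[d] for d in DIMS) for r in rows)

def roll_up(cube, keep):
    """3. OLAP: aggregate away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = Counter()
    for key, n in cube.items():
        out[tuple(key[i] for i in idx)] += n
    return out

rows = clean(raw_log)
cube = build_cube(rows)
by_section = roll_up(cube, ["section"])
# 4. Data mining (trivial stand-in): the most requested section.
top_section = by_section.most_common(1)[0]
print(by_section, top_section)
```

Drilling down is the reverse of `roll_up`: passing more dimension names in `keep` returns a finer-grained view of the same cube.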
39
VO: Conceptual Architecture
[Diagram components: User, Gateway, Data Archives, Analysis tools, Discovery tools]
40
Conclusion
◆ Development and application of DM in astronomy;
◆ Automated DM, visualized DM and audio DM;
◆ Integrate VO and DM.
The next golden age of discovery in astronomy will come earlier!
41
Q&A?
Thank you!!!