Stream Hierarchy Data Mining for Sensor Data

Margaret H. Dunham, SMU, Dallas, Texas 75275, mhd@engr.smu.edu
Vijay Kumar, UMKC, Kansas City, Missouri 64110

11/26/07 – IRADSN’07




From Sensors to Streams – An Outline

• Data Stream Overview
• Data Stream Visualization
  • Temporal Heat Map
• Data Stream Modeling
  • Extensible Markov Model
• Data Stream Hierarchy




From Sensors to Streams

Data captured and sent by a set of sensors is usually referred to as “stream data”.

A stream is a real-time sequence of encoded signals that contain the desired information. It is a continuous, ordered sequence of items (ordered implicitly by arrival time, or explicitly by timestamp or geographic coordinates).

Stream data is infinite - the data keeps coming.


Data Stream Management Systems (DSMS)

Software to facilitate querying and managing stream data.

• Retrieve the most recent information from the stream
• Data aggregation facilitates merging together multiple streams
• Modeling stream data to “summarize” the stream
• Visualization needed to observe in real time the spatial and temporal patterns and trends hidden in the data
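The first two DSMS tasks above (retrieving the most recent information and merging multiple streams) can be illustrated with a small sketch. It assumes each reading is a `(timestamp, sensor_id, value)` tuple and that every input stream is already ordered by timestamp; the names `merge_streams` and `RecentWindow` are hypothetical and do not come from any particular DSMS.

```python
import heapq
from collections import deque


def merge_streams(*streams):
    """Merge timestamp-ordered streams of (timestamp, sensor_id, value) readings."""
    # heapq.merge is lazy, so it also works on unbounded generators.
    return heapq.merge(*streams, key=lambda reading: reading[0])


class RecentWindow:
    """Keep only the most recent `size` readings of an unbounded stream."""

    def __init__(self, size):
        self.readings = deque(maxlen=size)  # older readings fall off automatically

    def add(self, reading):
        self.readings.append(reading)

    def mean(self):
        """Average value over the readings currently in the window."""
        return sum(value for _, _, value in self.readings) / len(self.readings)


# Two hypothetical sensor streams, each ordered by timestamp.
stream_a = [(1, "s1", 10.0), (4, "s1", 12.0)]
stream_b = [(2, "s2", 20.0), (3, "s2", 22.0)]

window = RecentWindow(size=3)
for reading in merge_streams(stream_a, stream_b):
    window.add(reading)
```

Because the window is bounded, memory use stays constant no matter how long the stream runs, which is the property an infinite stream requires.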


DSMS Problems

• Stream management development is in a state similar to that of databases prior to the 1970s
  • Each system/researcher looks at a specific application or system
  • No standards concerning functionality
  • No standard query language
• It is unreasonable to expect that end users will access raw data, data in the DSMS, or even data in a summarized view
• Domain experts need to “see” a higher level of data


Our Proposal

A four-level data abstraction to facilitate the creation of actionable intelligence for domain experts evaluating sensor data.




Assumptions for Our Research

End User:
• May not be knowledgeable concerning sensors
• Probably a domain expert
• May not need to see exact sensor values
• Concerned with trends and approximate values
• Needs to see data from MANY sensors at one time
• Needs to see data continuously in a visualization of the stream


Suppose There Were MANY Sensors

• Traditional line graphs would be very difficult to read
• Requirements for a new visualization technique:
  • High-level summary of data
  • Handle multiple sensors at once
  • Continuous
  • Temporal
  • Spatial


Temporal Heat Map

• Also called Temporal Chaos Game Representation (TCGR)
• Temporal Heat Map (THM) is a visualization technique for streaming data derived from multiple sensors.
• It is a two-dimensional structure similar to an infinite table.
• Each row of the table is associated with one sensor value.
• Each column of the table is associated with a point in time.
• Each cell within the THM is a color representation of the sensor value.
• Colors normalized (in our examples):
  • 0 – White
  • 0.5 – Blue
  • 1.0 – Red
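The color scale above can be sketched in a few lines. The slide gives only the three anchor colors, so this sketch assumes linear interpolation between them and min-max normalization across the readings of one column; both are assumptions, and the names `thm_color`, `normalize`, and `thm_column` are hypothetical.

```python
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)


def _lerp(c0, c1, t):
    """Linearly interpolate between two RGB colors, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def thm_color(value):
    """Map a normalized value in [0, 1] to RGB: 0 = white, 0.5 = blue, 1.0 = red."""
    v = min(max(value, 0.0), 1.0)  # clamp out-of-range input
    if v <= 0.5:
        return _lerp(WHITE, BLUE, v / 0.5)
    return _lerp(BLUE, RED, (v - 0.5) / 0.5)


def normalize(readings):
    """Min-max normalize raw readings to [0, 1] (an assumed normalization)."""
    lo, hi = min(readings), max(readings)
    span = (hi - lo) or 1.0  # avoid division by zero when all readings are equal
    return [(r - lo) / span for r in readings]


def thm_column(readings):
    """One THM column: one color cell per sensor value at a single point in time."""
    return [thm_color(v) for v in normalize(readings)]


column = thm_column([10, 20, 30])
```

Appending one such column per time step yields the "infinite table" described above, with one row per sensor value.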


10/11/07 – NGDM'07

Cisco – Internal VoIP Traffic Data

[Figure: Cisco VoIP traffic stream; axes: Time →, Values →. Complete stream: CiscoEMM.png]

• VoIP traffic data was provided by Cisco Systems and represents logged VoIP traffic in their Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.


Derwent River (UK)

[Figure: Derwent Temporal Heat Map (derwentrotate.png); labels shown: 28043, 28011, 28048, 28010, 28023, 28117]




Data Stream Modeling Requirements

• Summarization (synopsis) of data
• Use data, NOT a SAMPLE
• Temporal and spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate - clustering


Extensible Markov Model

Extensible Markov Model (EMM): at any time t, the EMM consists of a Markov Chain (MC) with a designated current node, Nn, and algorithms to modify it, where the algorithms include:

• EMMCluster, which defines a technique for matching between input data at time t + 1 and existing states in the MC at time t.
• EMMIncrement, which updates the MC at time t + 1 given the MC at time t and the clustering measure result at time t + 1.
• EMMDecrement, which removes nodes from the EMM when needed.

In addition, the EMM has associated data mining functions such as Rare Event Detection and Prediction.

Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp. 371-374.
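The three algorithms can be sketched as follows. This is not the authors' implementation: it assumes Euclidean nearest-neighbor matching against a fixed threshold, uses the first vector assigned to a node as that node's representative, and estimates transition probabilities from plain link counts. The class name `EMMSketch`, its methods, and the threshold value are all hypothetical; the input vectors are the ones shown on the EMM Learning slide.

```python
import math
from collections import defaultdict


class EMMSketch:
    """Toy Markov chain that grows and shrinks as vectors arrive (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold           # hypothetical matching radius
        self.centroids = {}                  # node id -> representative vector
        self.link_counts = defaultdict(int)  # (from node, to node) -> transitions seen
        self.current = None                  # designated current node
        self._next_id = 1

    def cluster(self, vector):
        """EMMCluster-like step: nearest existing node within the threshold, else None."""
        best, best_dist = None, self.threshold
        for node, centroid in self.centroids.items():
            dist = math.dist(vector, centroid)
            if dist <= best_dist:
                best, best_dist = node, dist
        return best

    def increment(self, vector):
        """EMMIncrement-like step: add a node if nothing matches, then count the transition."""
        node = self.cluster(vector)
        if node is None:
            node = self._next_id
            self._next_id += 1
            self.centroids[node] = tuple(vector)
        if self.current is not None:
            self.link_counts[(self.current, node)] += 1
        self.current = node
        return node

    def decrement(self, node):
        """EMMDecrement-like step: drop a node and every link that touches it."""
        self.centroids.pop(node, None)
        for link in [l for l in self.link_counts if node in l]:
            del self.link_counts[link]
        if self.current == node:
            self.current = None

    def transition_probability(self, src, dst):
        """Share of the transitions leaving src that go to dst."""
        leaving = sum(c for (s, _), c in self.link_counts.items() if s == src)
        return self.link_counts.get((src, dst), 0) / leaving if leaving else 0.0


vectors = [
    (18, 10, 3, 3, 1, 0, 0),
    (17, 10, 2, 3, 1, 0, 0),
    (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 0, 0, 0),
    (18, 10, 3, 3, 1, 1, 0),
]
emm = EMMSketch(threshold=2.0)
states = [emm.increment(v) for v in vectors]
```

Because nearby vectors share a node, the number of states grows more slowly than the number of inputs, which is the intuition behind the sublinear growth requirement. The probabilities produced by this sketch depend on the assumed threshold and counting scheme, so they are not expected to match the figures on the slides.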


EMM Learning

Input vectors shown:
• <18,10,3,3,1,0,0>
• <17,10,2,3,1,0,0>
• <16,9,2,3,1,0,0>
• <14,8,2,3,1,0,0>
• <14,8,2,3,0,0,0>
• <18,10,3,3,1,1,0>

[Figure: successive snapshots of the Markov chain over nodes N1, N2, N3; edges are labeled with transition probabilities such as 1/1, 2/2, 1/2, 1/3, 2/3]


EMM Forgetting

[Figure: Markov chain shown with nodes N1, N2, N3, N5, N6 and again with nodes N1, N3, N5, N6 after N2 is removed; transition labels include 2/2, 1/2, 1/3, 1/6]


EMM Sublinear Growth Rate

Minnesota Department of Transportation (MnDot)




Traditional DBMS Data Abstraction

• Three levels of data abstraction: Physical, Logical, External
• Data is normally pulled to the user by a query


Proposed DSMS Data Abstraction

• Level 0 – Physical Level
  • Raw data from sensors
  • Cannot be stored
• Level 1 – DSMS
  • Sensor data is merged, aggregated, and cleansed.
  • DSMS queries may be processed against this data.
• Level 2 – Model
  • Summarization (synopsis) of data
• Level 3 – Domain Expert
  • Summary visualization

Data is normally pushed to the user
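A minimal sketch of how data could flow up the four levels and be pushed to the user. Everything here is an illustrative assumption: the function and class names are hypothetical, the Level 1 cleansing rule (drop missing values, average the rest) is invented for the example, and the Level 2 synopsis is a simple running mean standing in for a real model such as the EMM.

```python
from statistics import mean


def level0_sensors(raw):
    """Level 0: raw readings are yielded one at a time and never stored."""
    for reading in raw:
        yield reading


def level1_dsms(readings):
    """Level 1: cleanse (drop missing values) and aggregate each time step."""
    for timestamp, values in readings:
        clean = [v for v in values if v is not None]
        if clean:  # skip time steps where every sensor value is missing
            yield timestamp, mean(clean)


class Level2Model:
    """Level 2: a tiny synopsis (count and running mean), standing in for a real model."""

    def __init__(self):
        self.count = 0
        self.running_mean = 0.0

    def update(self, value):
        self.count += 1
        self.running_mean += (value - self.running_mean) / self.count
        return self.running_mean


def run_hierarchy(raw, subscribers):
    """Drive data up the levels and push each summary to Level 3 subscribers."""
    model = Level2Model()
    for timestamp, aggregate in level1_dsms(level0_sensors(raw)):
        summary = model.update(aggregate)
        for push in subscribers:
            push(timestamp, summary)  # Level 3 receives data without querying for it


# Hypothetical input: (timestamp, [value per sensor]), None marks a missing value.
received = []
raw = [(0, [1.0, 3.0]), (1, [None, 4.0]), (2, [None, None]), (3, [6.0, 6.0])]
run_hierarchy(raw, [lambda t, s: received.append((t, s))])
```

The subscriber callback is what distinguishes this from the pull model of a traditional DBMS: the domain expert registers interest once and summaries arrive as the stream advances.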


Hierarchy             Levels  Lowest Level       Highest Level Abstraction    Inter-level Data Migration
Memory Hierarchy      n       External Storage   Subset/Cache/Buffer          Fetch/Prefetch
DBMS Data Hierarchy   3       Physical Storage   External View                Fetch, Prefetch
Data Warehouse        n       Operational Data   Cube/Multidimensional View   Aggregation
Stream Hierarchy      4       Sensor Data        Visualization/Triggers       Automatic Push


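[Figure: the four-level stream hierarchy: LEVEL 0 Sensors, LEVEL 1 DSMS, LEVEL 2 Model, LEVEL 3 Domain Expert. Other labels in the figure: Streams; Data Stream Management System (DSMS); Query; Scratch Space; Temporal Dynamic Synopsis; a Markov chain with nodes N1 to N5 and transition probabilities P12, P15, P21, P24, P31, P34, P41, P53, P55; Data Mining Applications; Triggers, Lookmarks, Anomalies; Visualization; Actionable Intelligence]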


Stream Hierarchy Summary

Except for the inter-level functionality requirements, each level's functionality is independent of the others and may differ across implementations.

The model used must capture time and ordering of data, be able to both learn and forget, and use some variation of clustering.

Visualization at the domain expert level must capture both time and ordering. In addition, it should be easy to “read” for many sets of sensors.
