Transcript
Page 1: The Rensselaer IDEA: Data Exploration

Data Exploration

Jim Hendler

Director, Rensselaer Institute for Data Exploration and Applications

THE RENSSELAER IDEA

Rensselaer Polytechnic Institute, USA

http://www.cs.rpi.edu/~hendler

Page 2: The Rensselaer IDEA: Data Exploration

IDEA

Data-driven research areas at RPI

• Data-driven Medical and Healthcare Applications
• Predictive Models for Business and Economics
• “Biome” studies for Built and Natural Environments
• Question Answering from texts and data
• Resiliency Models for Population-Scale Problems and cyber-security domains
• Semantically-enabled Data Services for Science and Engineering Research
• Materials genome and nano-manufacturing informatics
• Platforms for testing Policy and Open Data issues
• …

Page 3: The Rensselaer IDEA: Data Exploration

The Rensselaer IDEA: empowering our researchers

Data discovery, integration, and interaction technologies

Application-specific data tools

Page 4: The Rensselaer IDEA: Data Exploration

The trunk: Shared Data Technologies

High Performance Modeling and Simulation
• Center for Computational Innovation

Cognitive Computing
• Watson at Rensselaer IBM Partnership

Perceptualization
• Experimental Multimedia Performing Arts Center

Data Science
• Data Science Research Center

Page 5: The Rensselaer IDEA: Data Exploration

Roots: Data Exploration

Discover

Integrate

Validate

Explain

Geekopedia: Data exploration helps a data consumer focus an information search on the pertinent aspect of relevant data before true analysis can be achieved. In large data sets, data is not gathered or controlled in a focused manner. Even in smaller data sets, data that are not gathered with a rigid and specific technique can end up disorganized, in a myriad of subsets each…

DATA

Page 6: The Rensselaer IDEA: Data Exploration

Data Exploration Challenges

Discover

Integrate

Validate

Explain

These needs live outside traditional data/info architectures

Page 7: The Rensselaer IDEA: Data Exploration

Discovery needs semantics

How do you find the Data you need?

Middle Eastern Terrorists for $800?

Page 8: The Rensselaer IDEA: Data Exploration

Discovery – there’s a lot out there

Page 9: The Rensselaer IDEA: Data Exploration

Discovery needs more than keywords

World Bank: Africa

US Data.gov: Crop

Africover: Agriculture

Kenya: Agricultural
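
A minimal sketch of why exact keywords fall short for the four catalog labels on this slide. The `RELATED_TERMS` thesaurus and both function names are assumptions made up for illustration; a real system would draw on a shared vocabulary or ontology rather than a hand-written set.

```python
# Hypothetical mini-thesaurus: terms that different catalogs might use for
# one concept. The grouping is an illustrative assumption.
RELATED_TERMS = {
    "agriculture": {"agriculture", "agricultural", "crop", "crops", "farming"},
}

# (source, label) pairs as they appear on the slide.
CATALOG = [
    ("World Bank", "Africa"),
    ("US Data.gov", "Crop"),
    ("Africover", "Agriculture"),
    ("Kenya", "Agricultural"),
]


def keyword_search(query, catalog):
    """Return sources whose label equals the query string (case-insensitive)."""
    q = query.lower()
    return [source for source, label in catalog if label.lower() == q]


def concept_search(query, catalog, related=RELATED_TERMS):
    """Return sources whose label is any term related to the query concept."""
    terms = related.get(query.lower(), {query.lower()})
    return [source for source, label in catalog if label.lower() in terms]


print(keyword_search("agriculture", CATALOG))   # exact label match only
print(concept_search("agriculture", CATALOG))   # match via related terms
```

The exact-match search finds only Africover, while the concept search also finds the US Data.gov and Kenya entries. The World Bank entry labeled "Africa" stays unmatched in both cases; relating it to the others would need further knowledge (for example, geography) that this sketch does not model.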

Page 10: The Rensselaer IDEA: Data Exploration

Integration needs Semantics

Person
• RIN: 660125137
• Address #: 1118
• Address St: Pinehurst
• Address zip: 12203
• Course topic: CSCI
• Course #: 4961

Campus Personnel
• RPI ID: 660125137
• Name: Hendler

Campus Classes
• CRN: 1118
• Name: Intro to Physics

YES

NO!!!!
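
One reading of this slide is that RIN and RPI ID share a value and really are the same thing (YES), while Address # and CRN share a value only by coincidence (NO!!!!). The sketch below illustrates that reading. The `meaning` labels are hypothetical annotations assigned by hand; they stand in for whatever semantic descriptions an integration system would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    table: str
    name: str
    meaning: str  # hypothetical semantic label, assigned by hand for this sketch
    value: str


# Field values copied from the slide; the meaning labels are assumptions.
FIELDS = [
    Field("Person", "RIN", "rpi_person_id", "660125137"),
    Field("Person", "Address #", "street_number", "1118"),
    Field("Campus Personnel", "RPI ID", "rpi_person_id", "660125137"),
    Field("Campus Classes", "CRN", "course_registration_number", "1118"),
]


def value_matches(fields):
    """Pairs of fields from different tables whose raw values are equal."""
    return [
        (a, b)
        for i, a in enumerate(fields)
        for b in fields[i + 1:]
        if a.table != b.table and a.value == b.value
    ]


def semantic_matches(fields):
    """Value matches that also agree on what the field means."""
    return [(a, b) for a, b in value_matches(fields) if a.meaning == b.meaning]


for a, b in value_matches(fields=FIELDS):
    verdict = "YES" if (a, b) in semantic_matches(FIELDS) else "NO"
    print(f"{a.name} = {b.name} ({a.value}): {verdict}")
```

Matching on values alone proposes both joins; adding the meaning check keeps RIN = RPI ID and rejects Address # = CRN.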

Page 11: The Rensselaer IDEA: Data Exploration

Semantic Web and Linked Data (UK)

County Council

Ordnance Survey

Royal Mail

IOGDC Open Data Tutorial 11

Page 12: The Rensselaer IDEA: Data Exploration

http://logd.tw.rpi.edu

Data Mashups

Page 13: The Rensselaer IDEA: Data Exploration

Validation needs semantics

Easy for us

Page 14: The Rensselaer IDEA: Data Exploration

Hard for machines…

Head to head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California

Page 15: The Rensselaer IDEA: Data Exploration

Data + everything else you know

Same or different?

Do the terms mean the same? Are they collected in the same way? Are they processed differently? …
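
The three questions above can be sketched as a metadata check that runs before any head-to-head comparison. Everything here is an assumption for illustration: the `SeriesMetadata` fields, the function name, and especially the metadata values, which are placeholders and not facts about either police force's statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMetadata:
    source: str
    definition: str   # what the term (e.g. "burglary") means in this source
    collection: str   # how the data were collected
    processing: str   # how the data were processed before publication


def comparability_issues(a, b):
    """List reasons why two series may not be directly comparable."""
    checks = [
        ("definition", "terms may not mean the same"),
        ("collection", "collected in different ways"),
        ("processing", "processed differently"),
    ]
    return [msg for attr, msg in checks if getattr(a, attr) != getattr(b, attr)]


# Placeholder metadata only; the strings are not real descriptions.
avon = SeriesMetadata(
    "Avon and Somerset (UK)",
    "definition A (placeholder)",
    "method A (placeholder)",
    "raw counts (placeholder)",
)
la = SeriesMetadata(
    "Los Angeles, California",
    "definition B (placeholder)",
    "method B (placeholder)",
    "raw counts (placeholder)",
)

issues = comparability_issues(avon, la)
print(issues)
```

A non-empty list is a signal to stop and reconcile the sources before drawing a conclusion like the burglary comparison on the previous slide. Filling in such metadata is the hard part, and it is where the "everything else you know" comes in.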

Page 16: The Rensselaer IDEA: Data Exploration

Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)

Validation/Explanation need knowledge

Statistical correlation needs explanation

Page 17: The Rensselaer IDEA: Data Exploration

Explanation also needs Semantics

Inference Web: McGuinness – various DoD/IC projects

Page 18: The Rensselaer IDEA: Data Exploration

Closing the loop: where do the semantics come from?

Data

Prediction

Model

Design

How do we go from the predictive analytics of Big Data to models/explanations that allow new understanding?

Page 19: The Rensselaer IDEA: Data Exploration

1. Better tools for Analytics, Agents and HPC

Make the tools and algorithms being developed by RPI researchers more “reusable” and multitask (including HPC data-analytic tools)

Page 20: The Rensselaer IDEA: Data Exploration

2. Next-Gen Visualization (at scale)

How can multi-modal, multi-user, large scale sensory (visualization, sonification, haptics) interaction change the way we understand data?

Page 21: The Rensselaer IDEA: Data Exploration

3. Include “agents” in the modeling

Develop technologies that enable researchers to work with “human-based” data at larger scales and in new ways
• Population-scale computing models for agent-based simulations

Page 22: The Rensselaer IDEA: Data Exploration

Approach

Platform: Research in using supercomputers for discrete modeling
• Carothers’ ROSS model

KR Model:
• Weaver’s restricted rules on graphs

Challenge problem:
• Classification algorithms at petaflop scale
• “Logical” (nonlinear, discontinuous) agents
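
To make "discrete modeling" with "logical" agents concrete, here is a minimal sequential discrete-event sketch. It is not the ROSS API and not Weaver's rule model; the threshold rule, the three-agent network, and all names are assumptions chosen only to show an agent whose behavior changes discontinuously.

```python
import heapq

THRESHOLD = 2  # hypothetical: an agent activates after hearing 2 signals
NEIGHBORS = {"a": ["b", "c"], "b": ["c"], "c": []}  # hypothetical network


def simulate(initial_signals, until=10.0):
    """Run the event queue in timestamp order; return the set of active agents."""
    state = {name: {"heard": 0, "active": False} for name in NEIGHBORS}
    # Events are (time, sequence number, target agent); the sequence number
    # breaks ties so that simultaneous events keep their insertion order.
    queue = [(t, i, agent) for i, (t, agent) in enumerate(initial_signals)]
    heapq.heapify(queue)
    seq = len(queue)
    while queue:
        time, _, agent = heapq.heappop(queue)
        if time > until:
            break
        s = state[agent]
        s["heard"] += 1
        # Discontinuous rule: nothing happens below the threshold, then the
        # agent switches on once and signals each neighbor one time unit later.
        if not s["active"] and s["heard"] >= THRESHOLD:
            s["active"] = True
            for neighbor in NEIGHBORS[agent]:
                seq += 1
                heapq.heappush(queue, (time + 1.0, seq, neighbor))
    return {name for name, s in state.items() if s["active"]}


print(simulate([(0.0, "a"), (0.5, "a"), (1.0, "b")]))
```

With the three starting signals shown, the activation cascades through all three agents; drop the signal to "b" and only "a" activates. Small input changes producing large output changes is what makes such agents hard to treat with smooth models, and scaling this loop to populations is where parallel simulators come in.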

Page 23: The Rensselaer IDEA: Data Exploration

4. Exploit Cognitive Computing

IDEA will be the hub of Rensselaer’s cognitive-computing research
• e.g., answer questions such as “Why” and “How” integrated with large-scale simulations

Page 24: The Rensselaer IDEA: Data Exploration

Watson’s parallel model

Distributed (coarse-grained) parallelism

© Making Watson Fast, IBM J Res and Dev, 3/4 2012

Page 25: The Rensselaer IDEA: Data Exploration

DeepQA type approach best on large clusters

(Physical) Simulation runs on supercomputers

Cognitive Computing at Scale

Page 26: The Rensselaer IDEA: Data Exploration

Approach: link these computational models

Surmise (unproven): Cognitive Computing on a fast (large) cluster can query computations run against data generated by simulations (physical or agent-based) on the supercomputer

Page 27: The Rensselaer IDEA: Data Exploration

5. Data services will provide synergy across disciplines

• Semantics is a key technology for common data services

Discovery, Integration, Validation, Curation, Citation, Archiving …

Page 28: The Rensselaer IDEA: Data Exploration

Conclusions

• The “warehouse” is only a small part of the data ecosystem
• Database technologies are only part of the story
• Discovery, Integration, … , validation, explanation are key to solving problems with data
• Closing the loop means “exploring” our data
• Humans are still a key player in this
• The Rensselaer IDEA will explore
• Data-driven applications and tools, but also…
• … multimodal visualization, multiscale and agent modeling, cognitive computing, and semantic data platforms

Page 29: The Rensselaer IDEA: Data Exploration

Rensselaer Institute for Data Exploration and Applications

