NIST BIG DATA WG Reference Architecture Subgroup Intermediate Report Co-chairs: Orit Levin (Microsoft) James Ketner (AT&T) Don Krapohl (Augmented Intelligence) July 24th, 2013

Page 1: NIST BIG DATA WG Reference Architecture Subgroup Intermediate Report

NIST BIG DATA WG Reference Architecture Subgroup

Intermediate Report

Co-chairs: Orit Levin (Microsoft), James Ketner (AT&T), Don Krapohl (Augmented Intelligence)

July 24th, 2013

Page 2

Reference Architecture Objectives

• Addresses a broad range of stakeholders (e.g., data owners, industries, academia, policy makers)
• Wide scope:
    • Encompasses the whole data life cycle or ecosystem
    • Can be applied to different use cases (including various verticals)
    • Represents different system architectures (e.g., an enterprise data warehouse, a distributed cloud-based system using multiple service providers)
• Focus
    • Potentially with initial focus on Big Data analytics and tools
    • Assists in identifying security and privacy issues
• Agnostic to any specific technologies

7/24/2013 NIST Big Data WG / Ref Arch Sub-group

Page 3

RA Diagram Independent Submissions

• Different styles and perspectives, but easy to map between them
    • Data centric (Wo Chang)
    • Data Flow centric (Orit Levin, Bob Marcus)
    • Technology Layers / Stack diagram (Gary Mazzaferro)
• The vocabulary used in these submissions and on the mailing list has been compiled and submitted as M-0057

Page 4

Abstract Reference Architecture by Wo Chang / NIST

Page 5

Independent RA Proposals: Big Data Sources, Usage, Transformation, and Infrastructure

Diagrams shown:
• Data Flow Diagram by Bob Marcus
• Technology Stack / Layers Diagram by G. Mazzaferro
• Data Flow Ecosystem Diagram by Orit Levin

Page 6

Data Sources and Usage

Diagrams shown:
• Data Flow Diagram by Bob Marcus
• Technology Stack / Layers Diagram by G. Mazzaferro
• Data Flow Ecosystem Diagram by Orit Levin

Page 7

Infrastructure: Storage, Security, and Management

Diagrams shown:
• Data Flow Diagram by Bob Marcus
• Technology Stack / Layers Diagram by G. Mazzaferro
• Data Flow Ecosystem Diagram by Orit Levin

Page 8

Data Transformation: Processing, Analytics, and Visualization

Diagrams shown:
• Data Flow Diagram by Bob Marcus
• Technology Stack / Layers Diagram by G. Mazzaferro
• Data Flow Ecosystem Diagram by Orit Levin

Page 9

Draft Agreement / Rough Consensus

• Transformation includes
    • Processing functions
    • Analytic functions
    • Visualization functions
• Data Infrastructure includes
    • Data stores
    • In-memory DBs
    • Analytic DBs

Diagram blocks: Sources, Transformation, Usage, Data Infrastructure, Security, Management, Cloud Computing, Network
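The block names and sub-components agreed on this slide can be written down as a small lookup structure. The sketch below is illustrative only: the `Block` enum, the `COMPONENTS` table, and the `block_of` helper are hypothetical names that do not come from the subgroup's submissions. Only the two blocks whose contents the slide enumerates are filled in.

```python
from enum import Enum


class Block(Enum):
    """Top-level blocks named on the consensus diagram."""
    SOURCES = "Sources"
    TRANSFORMATION = "Transformation"
    USAGE = "Usage"
    DATA_INFRASTRUCTURE = "Data Infrastructure"
    SECURITY = "Security"
    MANAGEMENT = "Management"
    CLOUD_COMPUTING = "Cloud Computing"
    NETWORK = "Network"


# Sub-components listed in the draft agreement. The other blocks are omitted
# because the slide does not enumerate their contents.
COMPONENTS = {
    Block.TRANSFORMATION: [
        "Processing functions",
        "Analytic functions",
        "Visualization functions",
    ],
    Block.DATA_INFRASTRUCTURE: [
        "Data stores",
        "In-memory DBs",
        "Analytic DBs",
    ],
}


def block_of(component):
    """Return the block that lists `component`, or None if the slide does not place it."""
    for block, items in COMPONENTS.items():
        if component in items:
            return block
    return None
```

For example, `block_of("In-memory DBs")` returns `Block.DATA_INFRASTRUCTURE`, while a name the slide does not mention returns `None`.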

Page 10

Next Steps and Action Items (AIs)

• Deliverable I: Write the White Paper draft showing one or more approaches (e.g., Data Flow and Stack) using the same or similar terminology
    • AI: Chairs will start the draft of the document incorporating the submissions to the Ref Arch subgroup
    • AI: Close cooperation between the “Ref Arch” and “Def&Tax” sub-groups. Output: taxonomy for the RA diagrams with definitions for major entities/blocks. Input: M-0057.
• Deliverable II: A draft of a single RA requires more discussion and inputs based on the work of all sub-groups
    • AI: Chairs will start the draft of the document incorporating the findings of the Ref Arch subgroup
    • AI: Review the latest contributions to the Ref Arch and incorporate their findings (see email from Yuri Demchenko / University of Amsterdam)
    • AI: Close cooperation with the “Use Cases” and “Security” sub-groups to identify the areas of focus for “zooming” into their architecture

Page 11

Backup Slides

Page 12

Submitted RAs

Page 13

Data Centric by Wo Chang / NIST

Page 14

Data Flow Diagram by Bob Marcus

Page 15

Data Flow Ecosystem Diagram by Orit Levin

Diagram labels:
• Data Sources
• Data Transformation
• Data Infrastructure: Storage & Retrieval
• Data Usage: Government (incl. health & financial institutions), Industries / Businesses, Network Operators / Telecom, Academia
• Data Objects
• Collection, Aggregation, Matching, Data Mining
• Management, Security, Conditioning
• Anonymized, Pseudo-anonymized, PII
• VOLUME, VARIETY, VELOCITY
• Individual Data Transfer, Big Data Transfer
• Selected Data Storage and Retrieval, Big Data Storage and Retrieval

Page 16

Technology Layers / Stack diagram by Gary Mazzaferro

Page 17

Mapping to Technologies and Use Cases

Prepared by the authors of the original RAs

Page 18

Page 19

An Example of Cloud Computing Usage in Big Data Ecosystem

Diagram labels:
• Data Sources
• Data Transformation
• Data Infrastructure
• Data Usage: Government (incl. health & financial institutions), Industries / Businesses, Network Operators / Telecom, Academia
• Data Objects
• Collection, Aggregation, Matching, Data Mining
• Data Warehouse
• Cloud Provider / Service Layer: SaaS, PaaS, IaaS
• VOLUME, VARIETY, VELOCITY
• Individual Data Transfer, Big Data Transfer
• Selected Data Storage and Retrieval, Big Data Storage and Retrieval

Page 20

Use Case: Advertising

Diagram labels:
• Data Subject / Person
• Online Sources, Offline Sources
• Public Records (commons, government, etc.)
• Internal Records
• Other devices (Smart Grid, surveillance, scientific, etc.)
• End User devices incl. OS (mobile phones, etc.)
• Applications (search, publishers, etc.)
• Appl. with customers (communications, social network, etc.)
• Web Browsers
• Networks
• Online Data Aggregator, Offline Data Aggregator
• Match/Bridge Service
• Data Management Platforms (DMPs)
• Users; SSP, DSP, AdNet, AdX, Agency, Publisher, Advertiser (Advertising Industry Ecosystem)
• Government, health, financial institutions, academia
• Industries / Businesses
• Network Operators
• Collection, Data Mining, Person Attribution
• Contextual Data Collection, Behavioral Data Creation
• UI: Do Not Track (DNT); HTTP: DNT
• Analytic Cookie, DMP Cookie, Match Cookie
• DPI
• DMP Container Tag or Pixel request; Match Container Tag or Pixel request
• Control
• Aggregated, De-identified, PII
• 1st Party, 2nd Party, 3rd Party
• Big Data Transfer, Individual Data Transfer

Page 21

Use Case: Enterprise Data Warehouse

Diagram labels:
• Data Sources
• Archives, Files, Online Transaction Processing (OLTP) Systems, MS Office Documents, Manual
• Data Transformation
• Extraction, Transformation, and Loading (ETL)
• Online Analytical Processing (OLAP)
• Data Mining / Knowledge Discovery in Databases (KDD)
• Data Infrastructure: Central Data Warehouse
• Staging Area, Operational Data Store
• Data Usage: Department Data Mart, Regional Data Mart, Subject Data Mart, Application Data Mart
• Functional Data Mart
• Managed Report Environment (MRE)
• Management, Security
• Data Objects
• Individual Data Transfer, Big Data Transfer
• Selected Data Storage and Retrieval, Big Data Storage and Retrieval
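The path this use case names, from source systems through ETL and a staging area into a central warehouse and then a data mart, can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the function names, the `region` and `amount` fields, and the use of plain Python lists in place of a real staging area and warehouse. None of it is taken from the slide.

```python
def extract(source_rows):
    """Copy raw rows from a source system into a staging area (here, a new list)."""
    return [dict(row) for row in source_rows]


def transform(staged_rows):
    """Normalize region names, convert amounts to float, drop rows with no amount."""
    return [
        {"region": row["region"].strip().upper(), "amount": float(row["amount"])}
        for row in staged_rows
        if row.get("amount") not in (None, "")
    ]


def load(warehouse, rows):
    """Append cleaned rows to the central warehouse (here, a list)."""
    warehouse.extend(rows)


def regional_mart(warehouse, region):
    """Aggregate warehouse rows for one region, as a regional data mart might."""
    total = sum(row["amount"] for row in warehouse if row["region"] == region)
    return {"region": region, "total": total}


# Hypothetical source rows standing in for an OLTP system export.
source = [
    {"region": " east ", "amount": "10.5"},
    {"region": "West", "amount": ""},
    {"region": "EAST", "amount": 4},
]

warehouse = []
load(warehouse, transform(extract(source)))
print(regional_mart(warehouse, "EAST"))  # {'region': 'EAST', 'total': 14.5}
```

Keeping extraction separate from transformation mirrors the staging area in the diagram: raw rows are copied unchanged first, so cleaning rules can change without re-reading the source systems.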

Page 22