© Copyright 2011 EMC Corporation. All rights reserved.
GREENPLUM CHORUS
Cloud Computing for Data Warehousing and Analytics
Manish Jiandani, Director, Product Management, EMC Data Computing Division

JULY 2010 - EMC ACQUIRES GREENPLUM

Greenplum Becomes the Foundation of EMC’s Data Computing Division

“For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant….” – Gartner

Big Data Size: The Volume of Data Continues to Explode

The Digital Universe 2009 - 2020
•  2009: 0.8 Zettabytes
•  2020: 35.2 Zettabytes
•  Data Volume Growing 44x

Source: IDC Digital Universe Study, sponsored by EMC, May 2010
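
The 44x figure follows directly from the two volumes quoted on the slide. As a quick check, the sketch below recomputes the growth factor and also derives the constant yearly growth rate those two data points imply. The yearly rate is not stated in the source; it is a derived illustration that assumes smooth compounding over the 11 years, and the function names are made up for this example.

```python
# Volumes quoted on the slide (IDC Digital Universe Study, May 2010), in zettabytes.
ZB_2009 = 0.8
ZB_2020 = 35.2


def growth_factor(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly rate that turns `start` into `end` over `years`.

    Assumption: growth compounds evenly each year, which the source does not claim.
    """
    return (end / start) ** (1 / years) - 1


factor = growth_factor(ZB_2009, ZB_2020)
cagr = compound_annual_growth(ZB_2009, ZB_2020, 2020 - 2009)

print(f"growth factor: {factor:.1f}x")
print(f"implied annual growth: {cagr:.1%}")
```

Under the even-compounding assumption, the implied rate works out to roughly 41% per year.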

Big Data Significance: Not Just For Google and Facebook…

“Just as search engines have transformed how we access information, other forms of big data computing can and will transform the activities of companies, scientific researchers, medical practitioners, and our nation's defense and intelligence operations.”

Randal E. Bryant Carnegie Mellon University

Randy H. Katz UC Berkeley

Edward D. Lazowska University of Washington

The Analytics Reality – Sound Familiar?
• Finding data is hard – data is inaccessible, undocumented
• Exploring and transferring data is slow
• Shadow IT systems due to lack of provisioning
• No sharing of code and data, no standardized best practices
• Lost insights

The World of the Analyst: It All Starts with a Question

Hey, analyst! I have a question!

Great, business colleague!

I’m ready…

STEP 1 The Hunt for Data

When can I get access to data?

STEP 2 Data Exploration and Transfer

What data do I need?

STEP 3 Store Data for Analysis

Where do I store the data?

STEP 4 Manage Insights, Code and Other Artifacts

How do I manage these insights, files?

THE WORLD OF THE ANALYST

No way to track how data turns into insight

No provisioning for analytical sandboxes

Exploring and transferring data is slow

Data is hard to find and often inaccessible

GREENPLUM CHORUS SOFTWARE
The World’s First Enterprise Data Cloud Platform

Greenplum Chorus: The World’s First Enterprise Data Cloud Platform

• World’s first Enterprise Data Cloud Platform (EDC), enabling:
–  Self-service provisioning
–  Data services
–  Collaborative analytics

• Customers deploy Chorus along with the Greenplum Database to create a self-service analytic infrastructure

• Chorus can significantly reduce the time and effort companies need to extract value and insight from their data

Greenplum Chorus - Enterprise Data Cloud Platform

[Diagram: platform components labeled Provisioning, Data Services, Workflow, Collaboration, Integrations, Insights, Source Control, Open Source, MADlib]

Greenplum Chorus: Aimed at Helping 3 Primary Users

•  Database Architect/Administrator
–  Responsible for providing database capacity and operations to the company
–  Oversees the flow of data into these databases

•  Power Analyst
–  Responsible for creating insight from the data
–  Interacts closely with the DBAs to get the data they need, and the required performance/capacity out of the infrastructure

•  Executive
–  Ultimately responsible for justifying the investment
–  Focused on making rapid progress on their data, and celebrating analytic insight

Greenplum Chorus: Core Design Philosophies

•  Secure
–  Provide comprehensive and granular access control over who is authorized to view and subscribe to data within Chorus

•  Collaborative
–  Facilitate the publishing, discovery, and sharing of data and insight using a social computing model that feels familiar and is easy to use

•  Data-centric
–  Focus on the necessary tooling to manage the flow and provenance of data sets as they are created/shared within a company

•  MAD Skills in Action
–  Build a platform capable of supporting the magnetic, agile, and deep principles of MAD Skills

Greenplum Chorus Features – Provisioning

•  Spin up new projects rapidly with self-service provisioning

–  Provision instances as new single-node instances

–  Provision sandboxes as new databases

–  Import data easily from anywhere in the data cloud
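
To make "provision sandboxes as new databases" concrete, here is a minimal sketch of what a self-service request could translate to on the database side. It is not Chorus's API, which the slides do not describe. The function name, the `sandbox_` prefix, and the naming rule are all hypothetical. The only thing taken as given is the standard `CREATE DATABASE ... OWNER ...` statement that PostgreSQL-derived databases such as Greenplum Database accept.

```python
import re

# Hypothetical naming rule for sandbox identifiers (not taken from Chorus):
# lowercase letters, digits, underscores, at most 63 characters.
_VALID_IDENT = re.compile(r"[a-z_][a-z0-9_]{0,62}")


def sandbox_ddl(project: str, owner: str) -> str:
    """Build the CREATE DATABASE statement for a project sandbox.

    Identifiers are validated before being placed in the statement, because
    database names cannot be passed as ordinary query parameters.
    """
    name = f"sandbox_{project}"
    for ident in (name, owner):
        if not _VALID_IDENT.fullmatch(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return f'CREATE DATABASE "{name}" OWNER "{owner}";'


print(sandbox_ddl("churn_model", "analyst1"))
```

In a real deployment the statement would be executed by a service account with the right privileges, with quotas and approvals handled by the platform. None of that is shown here.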

Greenplum Chorus Features – Data Services

•  Data is now discoverable, self-documenting, and shared

–  Browse schemas and explore data with powerful search and visualization tools.

–  Attach documents, ask questions, add comments, and build a living data dictionary.

–  Define data sets, share them with the team, and schedule imports

Greenplum Chorus Features – Workflows

•  Manage and execute analytical workflows

–  Execute workflows directly in the sandbox, and then track changes to work and results over time

–  Import workflows

–  Package and distribute workflows

Greenplum Chorus Features – Collaboration

•  Create a collaborative environment for deep analytics on big data

–  Create project workspaces with shared files, data, documentation and workflows.

–  Control permissions to protect private data.

–  Publish functions and documentation, to promote common standards and techniques.

–  Import functions from libraries of in-database analytics functions.

–  Collaborate within projects, share information across teams

Greenplum Chorus Features – Integrations

•  Leverage existing IT investments

•  Integration with 3rd party applications
–  BI tools, analytical packages, SQL editors etc.

•  Open, extensible, and pluggable framework

Greenplum Chorus Features – Insights

•  Drive business action through Insights
–  Define, publish and share new insights
–  Discover and learn from existing insights
–  Collaborate around Insights, post comments, ask questions
–  Build a living library of Insights

Greenplum Chorus Features – Source Control

•  Create a living repository of artifacts of analytics projects

–  Manage files (data, code, presentation, etc.)

–  Version files and track changes

–  Compare file versions
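
The "compare file versions" idea can be illustrated with a small sketch. The slides do not say how Chorus stores or diffs versions, so this is only one common approach: a unified diff produced with Python's standard `difflib` module. The file name and the two SQL snippets are made up for the example.

```python
import difflib


def compare_versions(old: str, new: str, name: str = "analysis.sql") -> str:
    """Return a unified diff between two versions of a text file.

    An empty string means the two versions are identical.
    """
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{name} (v1)",
        tofile=f"{name} (v2)",
    )
    return "".join(diff)


# Two hypothetical versions of an analyst's query.
v1 = "SELECT region, count(*)\nFROM orders\nGROUP BY region;\n"
v2 = "SELECT region, sum(amount)\nFROM orders\nGROUP BY region;\n"

print(compare_versions(v1, v2))
```

Lines prefixed with `-` come from the older version and lines prefixed with `+` from the newer one, which is enough to see what changed between two saved versions of a code artifact.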

In Summary
• Greenplum Chorus is the paradigm for how to successfully perform analytics on massive data
• Greenplum Chorus enables organizations to quickly and easily gain insights from their data by providing them with:
–  A self-service infrastructure to support rapid iterations and business-owned solutions
–  An open, extensible, and collaborative development platform for analytics

Q&A

Greenplum Breakout Sessions

DATES         TIMES              SESSION

Mon., May 9   10:00am – 11:00am  Choosing a Data Warehousing Platform: Leveraging Greenplum to Become a Data Driven Business
Wed., May 11  10:00am – 11:00am

Mon., May 9   3:30pm – 4:30pm    Best Practices for Greenplum in a Virtualized Environment
Thu., May 12  10:00am – 11:00am

Mon., May 9   5:00pm – 6:00pm    Improving Enterprise Security with Greenplum and RSA enVision
Thu., May 12  8:30am – 9:30am

Tue., May 10  10:00am – 11:00am  Look Inside the EMC Greenplum Data Computing Appliance
Thu., May 12  1:00pm – 2:00pm

Tue., May 10  2:00pm – 3:00pm    Data-Driven Businesses: Greenplum Case Studies from Zions Bank, HAVAS Digital, Silver Spring Network
Wed., May 11  2:45pm – 3:45pm

Tue., May 10  3:30pm – 4:30pm    Greenplum: Cloud Computing for Data Warehousing and Analytics
Wed., May 11  4:15pm – 5:15pm

Tue., May 10  5:00pm – 6:00pm    Building an Enterprise Data Warehouse Using EMC VMAX, RDF, Snapshots, Data Domain & Greenplum
Thu., May 12  11:30am – 12:30pm

Visit the EMC Greenplum Booth 211
• EMC Greenplum HD
–  Enterprise-ready Apache Hadoop
• Greenplum DCA
• Greenplum Database
• Greenplum Analytics Lab
• Eight Ecosystem Partners

Visit the EMC Greenplum Booth 211

GET YOUR HADOOP T-SHIRT IN BOOTH 211!

EMC Greenplum in the Solutions Pavilion

[Floor map of the Solutions Pavilion; labels in order of appearance: EMC Corporate 221, EMC Proven IT 100, Brocade 408, EMC PUE 631]

The World’s First Data Scientist Summit
May 11 – 12, 2011
The Venetian, Las Vegas

THANK YOU