1 © Copyright 2011 EMC Corporation. All rights reserved.
GREENPLUM CHORUS
Cloud Computing for Data Warehousing and Analytics
Manish Jiandani
Director, Product Management
EMC Data Computing Division
JULY 2010 - EMC ACQUIRES GREENPLUM
Greenplum Becomes the Foundation of EMC’s Data Computing Division
“For three years, Gartner has identified Greenplum as the most advanced vendor in the visionary quadrant of its data warehouse DBMS Magic Quadrant….” – Gartner
Big Data Size: The Volume of Data Continues to Explode
The Digital Universe 2009 - 2020
Data volume growing 44x
2009: 0.8 zettabytes
2020: 35.2 zettabytes
Source: IDC Digital Universe Study, sponsored by EMC, May 2010
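The 44x headline follows directly from the two endpoints quoted on the slide. As a quick sanity check, here is a minimal Python sketch. The compound annual growth rate is not stated in the source; it is derived here under the assumption that growth compounds evenly over the 11 years from 2009 to 2020.

```python
# Endpoints as quoted on the slide (IDC Digital Universe Study, May 2010).
ZB_2009 = 0.8
ZB_2020 = 35.2
YEARS = 2020 - 2009  # 11 years

# Overall growth factor: ratio of the two endpoints.
growth_factor = ZB_2020 / ZB_2009

# Assumption (not in the source): growth compounds evenly each year.
annual_rate = growth_factor ** (1 / YEARS) - 1

print(f"growth factor: {growth_factor:.0f}x")
print(f"implied annual growth: {annual_rate:.0%}")
```

Under that even-compounding assumption, a 44x increase over 11 years works out to roughly 41% growth per year.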
Big Data Significance: Not Just For Google and Facebook…
“Just as search engines have transformed how we access information, other forms of big data computing can and will transform the activities of companies, scientific researchers, medical practitioners, and our nation's defense and intelligence operations.”
Randal E. Bryant Carnegie Mellon University
Randy H. Katz UC Berkeley
Edward D. Lazowska University of Washington
The Analytics Reality – Sound Familiar?
• Finding data is hard – data is inaccessible, undocumented
• Exploring and transferring data is slow
• Shadow IT systems due to lack of provisioning
• No sharing of code and data, no standardized best practices
• Lost insights
The World of the Analyst: It All Starts with a Question
Hey, analyst! I have a question!
Great, business colleague!
I’m ready…
STEP 1 The Hunt for Data
When can I get access to data?
STEP 2 Data Exploration and Transfer
What data do I need?
STEP 3 Store Data for Analysis
Where do I store the data?
STEP 4 Manage Insights, Code and Other Artifacts
How do I manage these insights, files?
THE WORLD OF THE ANALYST
No way to track how data turns into insight
No provisioning for analytical sandboxes
Exploring and transferring data is slow
Data is hard to find and often inaccessible
GREENPLUM CHORUS SOFTWARE The World’s First Enterprise Data Cloud Platform
Greenplum Chorus: The World’s First Enterprise Data Cloud Platform
• World’s first Enterprise Data Cloud (EDC) Platform, enabling:
– Self-service provisioning
– Data services
– Collaborative analytics
• Customers deploy Chorus along with the Greenplum Database to create a self-service analytic infrastructure
• Chorus can significantly reduce the time and effort it takes companies to extract value and insight from their data
Greenplum Chorus - Enterprise Data Cloud Platform
[Diagram labels: MADlib, Provisioning, Data Services, Workflow, Collaboration, Integrations, Insights, Source Control, Open Source]
Greenplum Chorus: Aimed at Helping 3 Primary Users
• Database Architect/Administrator
– Responsible for providing database capacity and operations to the company
– Oversees the flow of data into these databases
• Power Analyst
– Responsible for creating insight from the data
– Interacts closely with the DBAs to get the data they need, and the required performance/capacity out of the infrastructure
• Executive
– Ultimately responsible for justifying the investment
– Focused on making rapid progress on their data, and celebrating analytic insight
Greenplum Chorus: Core Design Philosophies
• Secure
– Provide comprehensive and granular access control over who is authorized to view and subscribe to data within Chorus
• Collaborative
– Facilitate the publishing, discovery, and sharing of data and insight using a social computing model that appears familiar and easy to use
• Data-centric
– Focus on the necessary tooling to manage the flow and provenance of data sets as they are created/shared within a company
• MAD Skills in Action
– Build a platform capable of supporting the magnetic, agile, and deep principles of MAD Skills
Greenplum Chorus Features – Provisioning
• Spin up new projects rapidly with self-service provisioning
– Provision instances as new single-node instances
– Provision sandboxes as new databases
– Import data easily from anywhere in the data cloud
Greenplum Chorus Features – Data Services
• Data is now discoverable, self-documenting, and shared
– Browse schemas and explore data with powerful search and visualization tools.
– Attach documents, ask questions, add comments, and build a living data dictionary.
– Define data sets, share them with the team, and schedule imports
Greenplum Chorus Features – Workflows
• Manage and execute analytical workflows
– Execute workflows directly in the sandbox, and then track changes to work and results over time
– Import workflows
– Package and distribute workflows
Greenplum Chorus Features – Collaboration
• Create a collaborative environment for deep analytics on big data
– Create project workspaces with shared files, data, documentation and workflows.
– Control permissions to protect private data.
– Publish functions and documentation, to promote common standards and techniques.
– Import functions from libraries of in-database analytics functions.
– Collaborate within projects, share information across teams
Greenplum Chorus Features – Integrations
• Leverage existing IT investments
• Integration with 3rd party applications
– BI tools, analytical packages, SQL editors etc.
• Open, extensible, and pluggable framework
Greenplum Chorus Features – Insights
• Drive business action through Insights
– Define, publish and share new insights
– Discover and learn from existing insights
– Collaborate around Insights, post comments, ask questions
– Build a living library of Insights
Greenplum Chorus Features – Source Control
• Create a living repository of artifacts of analytics projects
– Manage files (data, code, presentation, etc.)
– Version files and track changes
– Compare file versions
In Summary
• Greenplum Chorus is the paradigm for how to successfully perform analytics on massive data
• Greenplum Chorus enables organizations to quickly and easily gain insights from their data by providing them with:
– A self-service infrastructure to support rapid iterations and business-owned solutions
– An open, extensible, and collaborative development platform for analytics
Greenplum Breakout Sessions
DATES | TIMES | SESSION
Mon., May 9 / Wed., May 11 | 10:00am – 11:00am / 10:00am – 11:00am | Choosing a Data Warehousing Platform: Leveraging Greenplum to Become a Data Driven Business
Mon., May 9 / Thu., May 12 | 3:30pm – 4:30pm / 10:00am – 11:00am | Best Practices for Greenplum in a Virtualized Environment
Mon., May 9 / Thu., May 12 | 5:00pm – 6:00pm / 8:30am – 9:30am | Improving Enterprise Security with Greenplum and RSA enVision
Tue., May 10 / Thu., May 12 | 10:00am – 11:00am / 1:00pm – 2:00pm | Look Inside the EMC Greenplum Data Computing Appliance
Tue., May 10 / Wed., May 11 | 2:00pm – 3:00pm / 2:45pm – 3:45pm | Data-Driven Businesses: Greenplum Case Studies from Zions Bank, HAVAS Digital, Silver Spring Network
Tue., May 10 / Wed., May 11 | 3:30pm – 4:30pm / 4:15pm – 5:15pm | Greenplum: Cloud Computing for Data Warehousing and Analytics
Tue., May 10 / Thu., May 12 | 5:00pm – 6:00pm / 11:30am – 12:30pm | Building an Enterprise Data Warehouse Using EMC VMAX, RDF, Snapshots, Data Domain & Greenplum
Visit the EMC Greenplum Booth 211
• EMC Greenplum HD
– Enterprise-ready Apache Hadoop
• Greenplum DCA
• Greenplum Database
• Greenplum Analytics Lab
• Eight Ecosystem Partners
Visit the EMC Greenplum Booth 211
GET YOUR HADOOP T-SHIRT IN BOOTH 211!
EMC Greenplum in the Solutions Pavilion
[Floor map labels: EMC Corporate 221, EMC Proven IT 100, Brocade 408, EMC PUE 631]
The World’s First Data Scientist Summit
May 11 – 12, 2011
The Venetian Las Vegas