1 © Copyright 2014 EMC Corporation. All rights reserved.
Data Science + Data Engineering: Secret Weapon of the Strategic Enterprise
Annika Jimenez
Agenda
Data Science: What is it and why do we do it?
The Importance of Data Engineering
An Example: Kaiser Code-a-thon
Transforming Your Enterprise with Pivotal
Pivotal Data Labs: Data Engineering + Data Science
Closing Advice
What Matters: Apps. Data. Analytics.
Apps power business, and those apps generate data.
Analytic insights from that data drive new app functionality, which in turn drives new data.
The faster you can move around the cycle, the faster you learn, innovate and pull away from the competition.
Primary Motions for Pivotal
Agile: data-driven apps and rapid time to value
Data Lake: store everything, analyze anything
Enterprise PaaS: revolutionary software development and speed; build the right thing
DATA SCIENCE
The use of statistical and machine learning techniques on big, multi-structured data, in a distributed computing environment, to identify correlations and causal relationships, classify and predict events, identify patterns and anomalies, and infer probabilities, interest, and sentiment.
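To make one item in this definition concrete (identifying anomalies), here is a minimal sketch that flags values lying more than a chosen number of standard deviations from the mean. The z-score rule, the function name flag_anomalies, and the threshold values are illustrative assumptions, not techniques named in the talk; on big, multi-structured data the same computation would run inside a distributed engine rather than in single-process Python.

```python
import statistics


def flag_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` population standard
    deviations away from the mean (a simple z-score rule)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


readings = [10, 10, 10, 10, 50]  # hypothetical sensor readings
print(flag_anomalies(readings, threshold=1.5))  # [4]
```

A fixed z-score cut-off is the simplest possible choice; real anomaly detection would usually model seasonality or use robust statistics, but the shape of the task (score each record, flag the outliers) is the same.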
But why do we use Data Science?
BI – show dashboard
Is the Goal Any of These Things?
A. Cool Visualizations
B. Custom Querying
C. Decision Enablement
D. Insights
E. All of the above… NO
DRIVE AUTOMATED LOW LATENCY ACTIONS
IN RESPONSE TO EVENTS OF INTEREST
YOUR DATA + DATA SCIENCE = MODELS
Drive Automated Low Latency Actions
Model Operationalization (“O16N”):
Production Data Feeds → Low Latency Model Scoring → API Availability or Push to Apps → Business Logic → Application Response → New Events (aka, Data) → back into Production Data Feeds
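The operationalization loop on this slide can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Pivotal's implementation: the logistic model, its coefficients (WEIGHTS, BIAS), the feature names, the ACTION_THRESHOLD cut-off, and the function names are all hypothetical. Scoring here is a plain in-memory function call; in production it would sit behind an API or push results into apps, and the emitted events would flow back into the production data feeds.

```python
import math
from dataclasses import dataclass

# Hypothetical coefficients standing in for a model built offline during model development.
WEIGHTS = {"recent_activity": 0.8, "risk_flags": 1.2}
BIAS = -3.0
ACTION_THRESHOLD = 0.5  # hypothetical business-logic cut-off


@dataclass
class Event:
    entity_id: str
    features: dict


def score(features: dict) -> float:
    """Low-latency scoring: evaluate a logistic model in memory (missing features count as 0)."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def business_logic(probability: float) -> str:
    """Map a model score to an automated action."""
    return "trigger_action" if probability >= ACTION_THRESHOLD else "no_action"


def process_feed(feed: list) -> list:
    """One pass around the loop: score each incoming event, decide, and emit
    the application's response as a new event (new data for the feeds)."""
    new_events = []
    for event in feed:
        p = score(event.features)
        new_events.append({"entity_id": event.entity_id, "score": p, "action": business_logic(p)})
    return new_events


feed = [Event("a", {"recent_activity": 0.0, "risk_flags": 0.0}),
        Event("b", {"recent_activity": 2.0, "risk_flags": 2.0})]
for e in process_feed(feed):
    print(e["entity_id"], e["action"])
```

Keeping the model as a small in-memory function is what makes the scoring step low latency; retraining happens offline, and only the coefficients change when a new model is deployed.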
Data Science Value Chain
Instrumentation: Product Engineer
Logs Capture: Data Engineer
Store: DBA
Transform & Prepare: Data Engineer
Access: Data Engineer
Model Dev.: Data Scientist
Deploy: Data Engineer
Apps: Application Developer
Process Change: PMO
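The value chain can be read as nine stages, each paired with the role that owns it; that pairing is our reconstruction of the slide layout and should be treated as an assumption. Under that reading, a short sketch tallies stages per role and lists the handoffs between roles (VALUE_CHAIN, stages_per_role and handoffs are hypothetical names):

```python
from collections import Counter

# (stage, owning role), in order; the pairing is a reading of the slide layout.
VALUE_CHAIN = [
    ("Instrumentation", "Product Engineer"),
    ("Logs Capture", "Data Engineer"),
    ("Store", "DBA"),
    ("Transform & Prepare", "Data Engineer"),
    ("Access", "Data Engineer"),
    ("Model Dev.", "Data Scientist"),
    ("Deploy", "Data Engineer"),
    ("Apps", "Application Developer"),
    ("Process Change", "PMO"),
]


def stages_per_role(chain):
    """Count how many value-chain stages each role owns."""
    return Counter(role for _, role in chain)


def handoffs(chain):
    """List (from_stage, to_stage, from_role, to_role) wherever ownership changes."""
    return [(a_stage, b_stage, a_role, b_role)
            for (a_stage, a_role), (b_stage, b_role) in zip(chain, chain[1:])
            if a_role != b_role]


print(stages_per_role(VALUE_CHAIN).most_common(1))  # [('Data Engineer', 4)]
print(len(handoffs(VALUE_CHAIN)))  # 7
```

Under this reading, data engineers own four of the nine stages and the chain changes hands seven times, which is one way to see why the talk stresses data engineering alongside data science.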
→ Kaiser Blog for Full Story
Code-a-Thon Details – Logistics
24-Hour Data Science Code-a-Thon
5 resources per vendor
Vendors were asked to be prepared for any use case in the domain
A 15-minute presentation to senior leaders, executives, doctors and pharmacists
Teams were required to use Tableau in their presentation
Code-a-Thon – Pivotal Team
Hulya Emir-Farinas Data Scientist
Noah Zimmerman Data Scientist
Jacque Istok Application Developer
Dillon Woods Application Developer
Randy Williard Big Data Engineer
Jemish Patel Big Data Engineer
Adam Shook Big Data Engineer
Roy Mims Coordinator
The Day of the Code-a-Thon…
Key Insight
Asthma Population Management Application
Asthma Management Application
What Did We Learn in 2013?
Pivotal has a world-class Data Science team, the best there is
Data Science alone is good, but Data Science + Expert Data Engineering and Architecture is great
Corollary: Data Science + Data Engineering + Apps trumps everything
– This is the path to rapid value creation and ROI
Data Science + Data Engineering + Pivotal Labs = The Magic in the Middle
What Is Pivotal Data Labs?
Data Science + Data Engineering
Pivotal Data Scientists are technical professionals with strong programming skills, anchored in vertical/horizontal domains or in specialized academic research, able to identify real-world problems requiring predictive analytics, formulate these mathematically, and solve them by applying machine learning and statistical algorithms to Big Data, using Pivotal and third-party technologies.
Pivotal Data Engineers are Big Data experts and industry veterans with a passion for applying these skills to drive business value for Pivotal customers. They possess expert knowledge and skills with the Pivotal data products and excel at architecting enterprise-scale solutions to the most demanding data and analytic challenges.
What is a “Data Scientist”?
Programming Skills
Mathematical/Statistical Skills
What is a “Data Engineer”?
Programming Skills
Architectural Skills
World’s Leading Experts: Pivotal Labs and Pivotal Data Labs
[Figure: batch, near-time and real-time processing tiers; products shown: Pivotal HD, HAWQ, Greenplum DB, GemFire XD, GemFire]
Building Towards Pivotal One
Pivotal Data Labs in Data Fabric
Pivotal One Solutions
• Data Lake Solutions (Security Analytics, Corp Comm Analytics, Business)
• RTI for Telco (RTI4T)
Pivotal One Services
• Pivotal HD: Hadoop + Query
• GPDB: MPP Analytics
• GemFire: In-Memory Grid
• Spring: App Framework
• RabbitMQ, Redis…
• MySQL
• Pivotal GemFire XD and Pivotal Data Dispatch (coming in 2014)
Elastic Runtime Services: Java, Spring, Ruby, Node.JS
Value-adds: Installation, Management & Monitoring (Core OSS)
Pivotal One Marketplace
Introducing Pivotal Data Labs
Our Charter: Pivotal Data Labs is Pivotal’s differentiated and highly opinionated data-centric service delivery organization.
Our Goals: Expedite customer time-to-value and ROI by driving business-aligned innovation and solutions assurance within Pivotal’s Data Fabric technologies.
Drive customer adoption and autonomy across the full spectrum of Pivotal Data technologies through best-in-class data science and data engineering services, with a deep emphasis on knowledge transfer.
Highly Opinionated & Differentiated
“Highly Opinionated”: highly prescriptive in our counsel on data best practices to customers and partners, drawing on best-in-class talent and deep experience operating on the Pivotal Data Fabric
“Differentiated”: an expert data services business that is unique in its class and unlike the data services available elsewhere
What Will PDL Deliver For Customers?
Accelerated time-to-value and real ROI for customers
How Do We Do This?
Best-in-class Data Science to drive value creation on the Pivotal stack and customer data
Best-in-class Data Engineering to drive pragmatic, well-designed, customized architecture for the end-to-end Pivotal stack
Assured solutions success in Pivotal data service delivery
Operationally optimized predictive models
Collaboration with Pivotal Labs to deliver data-driven applications
PIVOTAL ONBOARDING SERVICES
Getting Started with Pivotal Software
INSTALL: Our experts work with your staff to plan, install and fully configure the Pivotal software based on your environment and requirements.
VERIFY: We will verify the installation, making sure it’s fully operational in your environment and ready to give you the Pivotal advantage.
ENABLE: Lastly, we’ll train your people and conduct knowledge transfer to make sure you are comfortable using and supporting Pivotal software.
Engagement Management: site prep, project management, customer support overview
Hardware Validation / Installation
Software Installation / Verification
Training & Knowledge Transfer
PIVOTAL SOLUTION ASSURANCE
Leverage Pivotal Data Experts for Success
INCEPTION: Getting off the ground with a well-formed plan, a solid team and realistic expectations is fundamental to overall success. Our experts help with design, guidance, oversight and lessons from other customers to get your initiative going in the best direction from the start.
ADVISE: Leveraging expert services in Pivotal Data Labs, Pivotal Labs, and Certified Pivotal Partners, we’ll work with your architects and developers to ensure that your system design and development are aligned with Pivotal best practices.
SUPPORT: We’ll act as your conduit to Certified Professional Services, Customer Support and Pivotal R&D to make you successful, quickly determine answers and bring in specialized expertise where needed.
Engagement / Success Alignment: regular meetings for status and guidance, resource advice
Architecture Design, Implementation Design and Assistance
Resource Assistance, Expedited Response
DATA SCIENCE LABS
Value Creation With Predictive Insights
DISCOVERY: Getting the most from your data requires understanding what you have today and discovering what your data can do for you. Combining our data scientists with your data starts the path to value creation.
INSIGHTS: Once the data is understood, we set ourselves apart by making optimal use of Pivotal’s Data Fabric, our analytical tools and our data science experts to build models that create deep, actionable insights on key events of interest in any use case.
RESULTS: Turning insights into actionable results is enabled through data scoring and model code optimization, documentation and knowledge transfer.
Engagement / Business / Technical Alignment: regular meetings for status and validation
Data exploration, readiness assessment, and prep
Combining domain knowledge and data science for predictive modeling
Code documentation and knowledge sharing
DATA LAB
Deep Partnering to Maximize Value Creation
INCEPTION: Getting off the ground with a well-formed, holistic architectural, data science, and application plan is fundamental to overall success. Our experts will drive this process, leveraging deep experience in these areas to streamline the path to success for your initiative.
IMPLEMENT: Delivered by a dedicated team drawing from Pivotal Data Labs’ experts in architecture, data engineering, data science, and application development, implementations of Pivotal technologies will be targeted to maximize value creation against your specific technical and business goals.
EXCEL: Years of experience building successful data projects give us the know-how to work quickly and efficiently through the challenging phases of any project, including design, scaling, integration and production readiness.
Engagement / Business / Technical Alignment: regular meetings for status and guidance
Data platform architecting and strategic analytic use-case roadmaps
Data and Application Fabric deployments, Data Science modeling & App development
Knowledge sharing, training and expert assistance
Pivotal Data Labs + Pivotal Labs = The Magic in the Middle
RAPID VALUE!
My Advice to Enterprises
1. Know your data and its potential value
2. Get “vision” and question status quo
3. Understand the technical paradigm shift underway
4. Hire or grow your Data Dream Team: Data Scientists and Data Engineers
5. Clear the path to operationalization (aka, value)
6. Manage the disruption, don’t reject it