© 2015 IBM Corporation
IBM Smarter Analytics
Big Data Adoption
Adrian Turcu, Big Data Architect
IBM Client Innovation Centers RoCEB
Mobile
Social
Cloud
Analytics
The Mega Trends
Big Data: More than just volume
Volume: Terabytes to exabytes of existing data to process
Velocity: Streaming data, milliseconds to seconds to respond
Variety: Structured, unstructured, text & multimedia
Veracity: Uncertainty from inconsistency, ambiguities, etc.
Big Data & Analytics Value Proposition
The primary value from big data and analytics comes not from the data in its raw form, but from the processing and analysis of it and the
insights, decisions, products, and services that emerge from analysis.
IBM’s Commitment to Big Data
$16 Billion in Big Data acquisitions
35 new acquisitions in the last 5 years
More than 1000 developers focused on Big Data technology development
IBM joins Apple, Twitter, and the Weather Company in strategic partnerships
Largest patent portfolio in the industry
IBM has the largest commercial research organization on Earth
‒ 200+ mathematicians developing breakthrough analytics
IBM’s Big Data business grew over 150% in 2014
IBM CEO Says ‘Big Data’ Is Company’s Top Priority
Commitment to Big Data Means…Commitment to Hadoop and Spark
IBM is Committed to Open Source
Open source technologies are the base for IBM software and solutions
IBM’s long history of deep open source commitment
- Apache Software Foundation: Founding member in 1999
- Cloud Foundry: #1 contributor; Basis for Bluemix
- OpenStack: #4 contributor; Basis for IBM’s IaaS
- Linux: #3 contributor; IBM first enterprise backer of Linux
- Hadoop/Spark: Extensive investment in open source contribution; Integration with Analytics software
Infrastructure
Systems
Application
IBM Investing in Four Catalysts for Big Data Adoption
Familiar Interfaces & Integration with Established Tools
Technical Standards
New Analytics Capabilities
Open Source Innovation
Apache Hadoop Ecosystem: Rapid Innovation, Few Standards
Distributions include different projects at different version levels
“This proliferation of baskets [Hadoop distributions with different project versions] creates significant drag when it comes to building reliable applications ... makes it harder for customers to assess which basket of Hadoop that they need and harder for application developers to create solutions that work broadly.”
– Raymie Stata, CEO, Altiscale
Even though the project versions match, there are interface differences
“If the industry is truly committed to developing big data technologies and solutions …, it will require an ecosystem of providers … to create a consistent framework around which everyone can develop.”
– Siki Giunta, SVP, Verizon
The Hadoop ecosystem is evolving at a faster pace than is comfortable
“My personal speculation is that it comes from some who have been evaluating for a while seeing change occur so rapidly that they are dropping back for another look.”
– Merv Adrian, VP, Gartner
Certify a standard “ODP Core” set of open source Hadoop family projects with specific versions and patch levels
Develop tools and methods to help solution providers to test applications against the ODP Core.
Contribute changes and fixes in the ODP Core Hadoop family projects to the ASF using the ASF processes.
http://opendataplatform.org/
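The first goal above, certifying a core set of projects at specific versions and patch levels, amounts to checking a distribution against a manifest. A minimal sketch in Python, assuming a hypothetical manifest format: the project names and version numbers are placeholders rather than the actual ODP Core, and `check_against_core` is an illustrative helper, not part of any ODP tooling.

```python
from typing import Dict, List

# Hypothetical core manifest: project name -> required version string.
# Placeholder values for illustration only, not the real ODP Core list.
EXAMPLE_CORE: Dict[str, str] = {
    "hadoop": "2.6.0",
    "ambari": "1.7.0",
}


def check_against_core(distribution: Dict[str, str],
                       core: Dict[str, str]) -> List[str]:
    """Return one message per core project that is missing or at a different version."""
    problems: List[str] = []
    for project, required in sorted(core.items()):
        actual = distribution.get(project)
        if actual is None:
            problems.append(f"{project}: missing (core requires {required})")
        elif actual != required:
            problems.append(f"{project}: found {actual}, core requires {required}")
    return problems


if __name__ == "__main__":
    # A made-up distribution that ships a different Hadoop version and no Ambari.
    for line in check_against_core({"hadoop": "2.7.1", "spark": "1.3.1"}, EXAMPLE_CORE):
        print(line)
```

Projects outside the core (here `spark`) are ignored by the check, which mirrors the idea that a distribution may ship extra components on top of a shared core.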
Open Data Platform Initiative
Representation across the Hadoop ecosystem…
Hadoop distribution vendors
Software application providers
System integrators/consultants
Hardware vendors
Customers
… who all believe in the need for a community-based effort to standardize Hadoop, which will lead to
improved adoption
IBM Open Platform with Apache Hadoop (IOP)
100% open source code Apache Hadoop distribution
- Commitment to currency: “days, not months”
- Includes Spark
Free for production use
- Decoupled Apache Hadoop from IBM analytics and data science technologies
- Production support offering available
Apache Open Source Components
HDFS
YARN
MapReduce
Ambari
HBase
Spark
Flume
Hive
Pig
Sqoop
HCatalog
Solr/Lucene
IBM Open Platform with Apache Hadoop
Text Analytics
POSIX Distributed Filesystem
Multi-workload, multi-tenant scheduling
IBM BigInsights Enterprise Management
Machine Learning on Big R
Big R (R support)
IBM Open Platform with Apache Hadoop (HDFS, YARN, MapReduce, Ambari, HBase, Hive, Oozie, Parquet, Parquet Format, Pig, Snappy, Solr, Spark, Sqoop, Zookeeper, Open JDK, Knox, Slider)
IBM BigInsights Data Scientist
IBM BigInsights Analyst
Big SQL
BigSheets
Industry standard SQL (Big SQL)
Spreadsheet-style tool (BigSheets)
Overview of BigInsights (v4.x)
. . .
IBM Open Platform with Apache Hadoop adopts ODP Core
BigInsights will include ODP certified Apache packages
- ODP will initially target core packages of a Hadoop distribution
- Packages will expand over time
- First certification set expected this summer
Our goal for BigInsights on ODP
- Better compatibility and less testing against ecosystem software
- Enable IBM Hadoop capabilities to run on other ODP-certified Hadoop distributions
HDFS
YARN
MapReduce
Ambari
HBase
Spark
Flume
Hive
Pig
Sqoop
HCatalog
Solr/Lucene
ODP
* Candidate set of certified ODP modules – expected summer 2015
Apache Open Source Components
IBM Open Platform with Apache Hadoop
Apache Spark is ideal for:
- Machine Learning
- Interactive analytics
- Data Science
http://spark.apache.org
Spark is an open-source, in-memory compute engine that adapts to a wide range of environments, enabling you to quickly build models, iterate faster, and apply deep intelligence everywhere.
Apache Spark Overview
Apache Spark
Spark SQL
Spark Streaming
GraphX
MLlib(machine learning)
SparkR
IBM | Spark - The Start of Something Big in Data and Design
Together, creating the platform for Data Science
Understand Business Goal
Data Profiling and Exploration
Train Algorithms
Consult Experts
Prepare data
App Dev, Deploy, Validate
Go live. Refresh.
+
IBM Analytic Platform Capabilities
IBM Software Integrates and Extends Hadoop and Spark
Data Warehousing: PureData for Analytics, Operational Analytics
Entity Extraction and Matching: Big Match
Security and Compliance: Optim, Guardium Audit and Encryption
Data Integration and Governance: Information Server
Enterprise Search: Watson Explorer
Real-time Analytics: Streams
Predictive Modeling and Descriptive Statistics: SPSS, Big R and Scalable Algorithms
Analysis, Reporting, and Exploration: Watson Analytics, Cognos, BigSheets
Fast, ANSI SQL 2011, and Secure SQL: Big SQL
Enterprise File System: GPFS-FPO
Cluster Resource and Workload Management: Platform Symphony
Large Scale Text Extraction: Big Text
IBM Open Platform with Apache Hadoop
What is IBM’s perspective on Spark?
IBM opens Spark Technology Center in San Francisco to foster innovation in the heart of the Spark community
IBM is forging key partnerships and building relationships with the creators of Spark
- Big Data University
- Spark certification and Spark social badge
- Databricks partnership
- AMPLab partnership
IBM Analytics Platform will unify on and around Spark to ensure robust integration and ease of use for our clients
- BigInsights “Spark-Inside”
- Spark as a Service on IBM Bluemix (beta in June 2015)
- Streams and Spark integration
http://g01zcdwas002.ahe.pok.ibm.com/software/data/infosphere/hadoop/trials.html
Free Quick Start (non production):
• IBM Open Platform
• BigInsights Analyst, Data Scientist features
• Community support
IBM’s Investment in the Big Data Community
Over 250,000 benefit from free Big Data skills training
http://bigdatauniversity.com
Big Data ≠
Watson is creating a new
partnership between people
and computers that
enhances, scales and
accelerates human expertise.
Brief History of IBM Watson
R&D
Demonstration
Commercialization
Cross-industry Applications
IBM Research Project (2006 – )
Jeopardy! Grand Challenge (Feb 2011)
Watson for Healthcare (Aug 2011 – )
Watson Industry Solutions (2012 – )
Watson for Financial Services (Mar 2012 – )
Expansion
IBM Watson is cognitive computing
Watson understands me.
Watson engages me.
Watson learns and improves over time.
Watson helps me discover.
Watson establishes trust.
Watson has endless capacity for insight.
Watson operates in a timely fashion.
…built on a massively parallel Big Data scalable architecture
Many industries have a “discovery” challenge
Drug discovery: ~12-15 yrs, $B per drug, 90+% fallout rate
Lithium ion battery: ~20 years development time
Healthcare and Life Sciences
Drug Discovery
New Biology Science
New Bio-medical Research
Chemical and Petroleum
Oil Reservoir Discovery
Crop Sciences
New Energy Materials
Product formation: based on ad hoc manual trial & error
Water filtration: Billions still do not have clean water today
Consumer Goods and Products
Product Innovation
New Market Identification
New Partnerships
Semi-Conductor and Materials
Nano Materials
Energy Storage
Water Filtration
Existing Discovery is Slow, Expensive, Ad hoc and Manual
Bringing IBM Watson to market
Offerings: Watson Engagement Advisor, Watson Discovery Advisor, Watson Policy Advisor, Watson Decision Advisor
Applications: Watson for Wealth Management, Watson for Oncology, Chef Watson
Products: Watson Explorer, Watson Analytics, Watson Curator
Platform: Watson Zone on Bluemix, Watson Developer Cloud, Watson Tooling
IBM Watson Platform
WATSON DEVELOPER CLOUD
Synopsis: Delivers the tools, methodologies, software developer kits and API(s) for ISVs to build the next generation of cognitive applications
Offering:
• Cloud based sandbox
• Hosting Services
• Self-service portal – API / Tooling / SDK / Methodology
WATSON CONTENT STORE
Synopsis: Provides sources of free and fee based content including public, industry and enterprise content
Offering:
• Starter Content
• General content
• Domain content
• Taxonomies
• Third-Party Content
WATSON TALENT HUB
Synopsis: Bridges developer resource gaps by providing a marketplace for critical cognitive skills
Offering:
• IBM subject matter experts (500+)
• Third-party specialists
• Certification
• Individual and project work
IBM Watson Services on Bluemix
User Modeling: Personality profiling to help engage users on their own terms.
Language Identification: Identifies the language in which text is written.
Machine Translation: Globalize on the fly. Translate text from one language to another.
Concept Expansion: Maps euphemisms or colloquial terms to more commonly understood phrases.
Message Resonance: Communicate with people in a style and with words that suit them.
Question Answer: Direct responses to user inquiries fueled by primary document sources.
Relationship Extraction: Intelligently finds relationships between sentence components (nouns, verbs, subjects, objects, etc.).
Visualization Rendering: Graphical representations of data analysis for easier understanding.
The Watson Experience Manager (WEM)
With Watson Experience Manager:
• Developers use APIs to access and test “Powered by Watson” apps
• Data Scientists can manage their content used to enrich Watson
• User experience developers can customize or create user interaction models with Watson
• Domain experts can train and test their “Powered by Watson” apps
WEM provides a role based set of tools for SME, Watson administrators, and Domain Experts
Access Watson Developer Cloud using Watson Experience Manager
Develop app “Powered by Watson” using APIs
Enrich Watson with content
Train Watson using tools and experts
Test app functional and non-functional
Deploy application
Building your “Powered by Watson” app
Let’s Get Started
To partner with Watson, you need to:
• Be committed to training
• Have an accessible corpus of information
• Identify a clear problem to solve
Get a Bluemix account
Try the Watson services free for 30 days
Take the next step towards development or production deployment
Let’s Get Started with IBM Watson!
Investing and Educating
www.ibmbigdatahub.com
www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/
www.ibm.com/cloud-computing/bluemix/
Questions?