© 2015 IBM Corporation IBM SPSS Statistics prepared by: Dennis Buttera, Curriculum Advisor IBM Academic Partnerships

Our point of view on Hadoop

Every organization sees Hadoop as providing an open-source, rapidly-evolving platform that is capable of collecting and economically storing a very large corpus of highly variable types of data and making it available.

And yet most organizations are not yet fully realizing the value of Hadoop, either because they lack skilled data scientists and developers to extract the valuable insight or because of the complexity of scaling the Hadoop environment.

In order to drive Hadoop adoption, we see that organizations require advances:
• To have the most powerful analytics in their hands
• To distill experience and build skills to accelerate time to value
• To easily incorporate Hadoop into a broader data architecture

What we are announcing at Strata on Feb 17

• IBM Open Platform with Apache Hadoop

• IBM BigInsights for Apache Hadoop

• Sponsoring new global training program for data scientists

• Open Data Platform initiative

• Avnet Enabled Hadoop

IBM BigInsights for Apache Hadoop
Three new modules to get the most out of Hadoop

IBM BigInsights Analyst will include IBM’s SQL engine and IBM’s intuitive spreadsheet and visualizations to find data quickly and easily. On average, millions of SQL queries are run each year. With BigInsights Analyst, the efficiency of these queries has been shown in some cases to improve by approximately 2x to 4x on Apache Hadoop, depending on the shuffle size. Because the SQL is ANSI-compliant, queries can run unchanged against Hive, HBase and relational databases.

IBM BigInsights Data Scientist will deliver a new machine-learning engine that automatically tunes its performance over large-scale data to find interesting patterns, plus over a dozen industry-specific algorithms such as Decision Trees, PageRank and Clustering to help tackle complex problems out of the box. It will also provide native support for open-source R statistical computing, helping clients leverage their existing R algorithms or gain from the more than 4,500 freely available statistics packages from the R community.

IBM BigInsights Enterprise Management will introduce new management tools for clients to realize faster time to results. Designed to help allocate resources and optimize workflows, these tools will allow deployments that can scale to large numbers of users and clusters, and will help satisfy high workload demand. These tools will provide multi-tenancy and multi-instance support in a cluster.
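The portability claim for ANSI-compliant SQL made above can be illustrated with a minimal sketch. It does not use Big SQL, Hive or HBase: two separate in-memory SQLite databases from Python's standard library stand in for two backends, and the `sales` table, its columns and its rows are hypothetical. The only point shown is that one query string, written with standard clauses (GROUP BY, HAVING, ORDER BY), is passed unchanged to both connections.

```python
import sqlite3

# One query text, shared by every backend. It uses only standard SQL clauses.
# The "sales" table and its contents are hypothetical, for illustration only.
QUERY = """
SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region
HAVING SUM(amount) > 100
ORDER BY region
"""


def make_backend():
    """Create an in-memory SQLite database standing in for one SQL backend."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region VARCHAR(20), amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("east", 80), ("east", 50), ("west", 40), ("north", 200)],
    )
    conn.commit()
    return conn


def run_query(conn):
    """Run the shared query text, unchanged, on the given connection."""
    return conn.execute(QUERY).fetchall()


# Two independent databases; the same QUERY string is sent to each.
backend_a = make_backend()
backend_b = make_backend()
results_a = run_query(backend_a)
results_b = run_query(backend_b)
print(results_a)  # [('east', 130), ('north', 200)]
```

In a real deployment the two connections would point at different engines, and any vendor-specific syntax in the query would break this property; that is the reason for keeping the query text to standard SQL.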

STAC Report™ at http://www.stacresearch.com/node/15370

IBM BigInsights Data Scientist
Accelerates Data to Value with Less Code

[Figure: side-by-side code comparison of a RHIPE implementation, an RHadoop implementation, and a Big R implementation]

Coding in R like it was meant to be coded. Not embedding foreign code like Java.

ANSI Compatible SQL with 4X the Query Speed on Apache Hadoop

IBM BigInsights Analyst
Familiar worksheets for large-scale datasets

Web-based, simple-to-use UI for Big Data analytics

IBM BigInsights Enterprise Management
Multi-tenancy and optimized workflows in a Hadoop Cluster

“In jobs derived from production Hadoop traces at Facebook, IBM® Platform™ Symphony accelerated Hadoop by an average of 7.3x.”

Proof of Concept: IBM BigInsights on the Cloud

to discover insights around specific business concerns.

Objective
• Improved repeat-shopping conversion rates, greater customer engagement, and higher total revenue period-on-period.

Intended Benefits
• Transition to a customer interest-based marketing approach.
• Combine multiple data sets to build a holistic view of the customer.
• Model sales performance against environmental and multi-channel attribution.

Results
• 541% improvement in revenue
• Success in building a customer interest-based integrated data set
• Disproved some long-held beliefs about weather driving sales in the online channel, and established a dichotomy between retail and online receptivity to marketing

A Large Online Retailer gained a 541% improvement in revenue

IBM BigInsights for Apache Hadoop
Three new user-centric modules founded on an Open Data Platform

[Diagram: IBM BigInsights for Apache Hadoop, layered on IBM Open Platform with Apache Hadoop. Modules shown: IBM BigInsights Analyst, IBM BigInsights Data Scientist, IBM BigInsights Enterprise Management. Components shown: Big SQL, BigSheets, Text Analytics, System ML on Big R, Distributed R, POSIX Distributed Filesystem, Multi-workload and Multi-tenant scheduling. User roles shown: Business Analyst, Data Scientist, Developer, Administrator.]

IBM is a Founding Member of the Open Data Platform Initiative

The Open Data Platform Initiative (ODP) is a shared industry effort focused on promoting and advancing the state of Apache Hadoop and Big Data technologies for the enterprise. ODP aims to accelerate the delivery of Big Data solutions by providing a well-defined core platform to target.

Test, certify, and standardize the core components of a new “Open Data Platform” of select Apache Software Foundation (ASF) projects to provide a foundation upon which Big Data solution providers can build.

Initially Apache Hadoop (HDFS, YARN, MapReduce) and Apache Ambari (Provisioning, Management, and Monitoring)

Support for community development and outreach activities.

IBM Sponsors New Data Science Curriculum at Big Data University

A skills shortage is the major obstacle to adoption of big data & analytics technologies.

IBM is proud to sponsor Big Data University and its new curriculum for Programming for Analytics and Data Science.

Big Data University delivers free online courses to a community of over 230,000 registered participants around the world.

IBM sponsors and engages in the community to raise skills in the market.

IBM fosters and supports a rapidly growing number of enrolled participants.

Big Data University currently offers a number of free courses, including Hadoop Fundamentals, SQL Access on Hadoop, BigSheets, HBase for Real-Time Access, Hive, Streams, IBM BLU Acceleration, Data Mining with R, Application Development, Pig, and many more, online and in multiple languages.

How is IBM different?

Hadoop and the Analytics Ecosystem

Very few companies can truly say they were early contributors to Hadoop and are still innovating today. IBM is one of only a handful that have continued to advance Hadoop from its early days as a single Apache project to the more than a dozen projects that exist today. IBM’s core strength is its deep knowledge of the inner workings of Hadoop and of its strategic value in the enterprise. IBM continues its involvement with open source by joining the Open Data Consortium to ensure the stability of Hadoop as a foundation for Big Data & Analytics.

In-Hadoop Analytics

IBM invented SQL over 40 years ago, and it is still in use today as the lingua franca of data querying because of its simple syntax and its ubiquity across organizations.

Client Success with Hadoop

Organizations rely on IBM to solve their most difficult analytics problems based on our depth of expertise and domain leadership in software, hardware, analytics and research. IBM brings 100,000 trained analytics professionals, over 200 customers, and a $24 billion investment into the platform.

What sets our offering apart?

An Open Hadoop for an Expanding Ecosystem

We have decoupled our Hadoop distribution from the core value components.

Performance Improvements at Every Level

We have applied our deep knowledge of distributed computing, query optimization, and workflow controllers to increase performance 4X to 11X compared to our next nearest competitor.

IBM Hadoop is Production Ready

IBM provides an Analytics Platform that incorporates Hadoop as a first-class citizen in the broader analytic architecture, removing the barrier of querying across, or moving artifacts to and from, other environments.
