Page 1: Big Data Insurance

Big Data Insurance

Mike Johnson [email protected]

Page 2: Big Data Insurance

© 2016 Progress Software Corporation and/or its subsidiaries or affiliates. All rights reserved. 2

Big Data is Here to Stay – Forbes Sept. 2015

Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race.

By 2020, our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes.

Within five years there will be over 50 billion smart connected devices in the world, all developed to collect, analyze and share data.

The Hadoop … market is forecast to grow at a compound annual growth rate of 58%, surpassing $1 billion by 2020.

Page 3: Big Data Insurance

The Big Data Ecosystem Today

Page 4: Big Data Insurance

The Hadoop Ecosystem Continues to Grow Instead of Shrinking

Page 5: Big Data Insurance

The Number of Versions of all the Hadoop Components is Staggering!

Page 6: Big Data Insurance

Big Data Release Cadences Continue to Cause ISVs Difficulty

[Figure: Big Data products grouped by release cadence (Quarterly, Monthly or More, Yearly, Multiple Times a Year)]

Page 7: Big Data Insurance

To Make Things More Complicated…

§  There is real, valuable, important functionality in many of these releases

§  Examples include:

•  New data types in Hive (VARCHAR, DECIMAL, TIMESTAMP, BINARY, etc.)

•  Additional ability to push down queries in MongoDB

•  Metadata enhancements in newer versions of Hive

•  Cassandra is adding enhancements every other month

•  Etc.

Page 8: Big Data Insurance

This Amount of Change puts ISVs in a Difficult Position

Testing Nightmares

Inconsistencies of feature support

Keeping Up with the Industry

Page 9: Big Data Insurance

What do ISVs require today?

Page 10: Big Data Insurance

What ISVs Need Is a Vendor That Takes Care of All This for Them!

§  Progress|DataDirect has been writing connectivity software for over 25 years!

§  We have been working with Big Data sources since ????

§  Significant investment in testing infrastructure:

•  Over 150 Hadoop servers

•  More than 30 Spark servers

•  Over 250 Big Data servers!

§  Day 1 support policy for new versions

§  Dedicated team that configures new systems and performs certifications

Page 11: Big Data Insurance

Progress|DataDirect - Smoothing out the Rough Edges

§  Data types reported, and how they function, depend on the Hive version:

•  TIMESTAMP added in 0.8

•  DECIMAL added in 0.11

•  DATE and VARCHAR added in 0.12

•  CHAR added in 0.13

§  Syntax differences (HiveQL):

•  INSERT statements

•  Parameter arrays

§  Catalog metadata functionality:

•  Earlier versions of Hive didn't have metadata functions at all

•  Newer versions don't necessarily report metadata correctly
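The version-dependent type list above boils down to a lookup keyed on the Hive server version. Below is a minimal sketch of that idea, assuming the server reports a version string that starts with dotted numbers (a distribution build may append a suffix, for example something like `0.13.1-cdh5.3.0`). Only the version thresholds come from the slide; `TYPE_INTRODUCED_IN`, `parse_version`, and `version_gated_types` are hypothetical names, and this is not how any particular driver is implemented.

```python
import re

# Hypothetical sketch: report only the Hive column types that exist on the
# connected server. Thresholds come from the slide; all names are illustrative,
# not part of any real driver API.
TYPE_INTRODUCED_IN = [
    ((0, 8), "TIMESTAMP"),
    ((0, 11), "DECIMAL"),
    ((0, 12), "DATE"),
    ((0, 12), "VARCHAR"),
    ((0, 13), "CHAR"),
]


def parse_version(version: str) -> tuple[int, ...]:
    """Parse the leading dotted-number part, e.g. '0.13.1-cdh5.3.0' -> (0, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_gated_types(server_version: str) -> list[str]:
    """Return the version-dependent Hive types available on this server."""
    current = parse_version(server_version)
    # Tuples compare element by element, and a longer tuple with an equal
    # prefix compares greater: (0, 12, 0) >= (0, 12) but (0, 11, 0) < (0, 12).
    return [name for minimum, name in TYPE_INTRODUCED_IN if current >= minimum]


if __name__ == "__main__":
    for version in ("0.7.1", "0.11.0", "0.13.1-cdh5.3.0"):
        print(version, version_gated_types(version))
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.8"` sorts after `"0.13"`, which would wrongly hide TIMESTAMP on a 0.13 server.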

Page 12: Big Data Insurance

The DataDirect Support Matrix

Component: Supported Versions

Amazon Elastic MapReduce (Amazon EMR): 2.1.4 and Higher
Apache Hadoop Hive: 0.7.1 and Higher
Cloudera's Distribution Including Apache Hadoop (CDH): CDH3 Update 4 and Higher
Hortonworks Distribution for Apache Hadoop: 1.3 and Higher
IBM BigInsights: 3.0 and Higher
MapR Distribution for Apache Hadoop: 1.2 and Higher
Pivotal HD Enterprise (PHD): 2.0.1 and Higher
Cloudera Impala: 1.0 and Higher
Spark: 1.2 and Higher
Pivotal HAWQ: 1.1 and Higher
MongoDB: 2.2 and Higher
Cassandra: 1.2 and Higher

Page 13: Big Data Insurance

The DataDirect Certification Process

§  Relational DBs:

•  We run all tests on each supported version before announcing certification

•  Add full test-suite runs on all platforms to regular patch runs

•  Generally support 4-6 major versions of a relational DB

•  The number of tests that we run for a relational DB increases slowly over time

•  Occasionally phase out really old versions

§  Big Data:

•  Cloudera versions generally release before Apache

•  Always certify Apache

•  Ensure that the other distros' Hive versions have already been certified

•  Certify a given distro with a given Hive version

Page 14: Big Data Insurance

It’s Not ALL About Connecting to Big Data Systems

§  Most of these systems want to be the core system in your environment

§  There is usually a great need to help get data into these systems through tools such as:

•  Sqoop

•  Spark

•  Flume

§  The rest of the DataDirect portfolio of drivers plugs into these tools to broaden your reach

Page 15: Big Data Insurance

The DataDirect Support Matrix

Component: Supported Versions

Amazon Elastic MapReduce (Amazon EMR): 2.1.4 and Higher
Apache Hadoop Hive: 0.7.1 and Higher
Cloudera's Distribution Including Apache Hadoop (CDH): CDH3 Update 4 and Higher
Hortonworks Distribution for Apache Hadoop: 1.3 and Higher
IBM BigInsights: 3.0 and Higher
MapR Distribution for Apache Hadoop: 1.2 and Higher
Pivotal HD Enterprise (PHD): 2.0.1 and Higher
Cloudera Impala: 1.0 and Higher
Spark: 1.2 and Higher
Pivotal HAWQ: 1.1 and Higher
MongoDB: 2.2 and Higher
Cassandra: 1.2 and Higher
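A support matrix like this one is, in effect, a table of minimum versions. The following is a minimal sketch of checking a server version against a subset of its rows. The component keys are shortened labels chosen for this example, the CDH and Hive rows are left out (CDH's "CDH3 Update 4" is not a plain dotted version), and `MINIMUM_VERSIONS`, `parse_version`, and `is_supported` are hypothetical names rather than part of any DataDirect tool. Missing trailing components are treated as zero, an assumption that makes `3` and `3.0` compare as equal.

```python
import re

# Minimum versions for a subset of the matrix rows; keys are shortened,
# illustrative labels (hypothetical, not an official product list).
MINIMUM_VERSIONS = {
    "Amazon EMR": "2.1.4",
    "Hortonworks": "1.3",
    "IBM BigInsights": "3.0",
    "MapR": "1.2",
    "Pivotal HD": "2.0.1",
    "Cloudera Impala": "1.0",
    "Spark": "1.2",
    "Pivotal HAWQ": "1.1",
    "MongoDB": "2.2",
    "Cassandra": "1.2",
}


def parse_version(version: str) -> tuple[int, ...]:
    """Parse the leading dotted-number part of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_supported(component: str, server_version: str) -> bool:
    """True if server_version meets the matrix minimum for component."""
    minimum_text = MINIMUM_VERSIONS.get(component)
    if minimum_text is None:
        return False  # component not listed in this subset of the matrix
    actual = parse_version(server_version)
    minimum = parse_version(minimum_text)
    # Pad the shorter tuple with zeros so "3" and "3.0" compare as equal.
    width = max(len(actual), len(minimum))
    actual += (0,) * (width - len(actual))
    minimum += (0,) * (width - len(minimum))
    return actual >= minimum
```

Unlisted components return `False` instead of raising, which suits a quick pre-connection check; a stricter tool might prefer an exception.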

Page 17: Big Data Insurance

The DataDirect Certification Process

§  Relational DBs:

•  We run all tests on each supported version before announcing certification

•  Add full test-suite runs on all platforms to regular patch runs

•  Generally support 4-6 major versions of a relational DB

•  The number of tests that we run for a relational DB increases slowly over time

•  Occasionally phase out really old versions

§  Big Data:

•  Always certify Apache

•  Cloudera versions generally release before Apache and don’t strictly follow Apache

•  Ensure that the other distros' Hive versions have already been certified

•  Certify a given distro with a given Hive version

Page 18: Big Data Insurance

It’s Not ALL About Connecting to Big Data Systems

§  Most of these systems want to be the core system in your environment

§  There is a great need to quickly get data into these systems through tools such as:

•  Sqoop

•  Spark

•  Flume

Page 19: Big Data Insurance

Currently Supported Data Sources

Big Data / NoSQL
Ø  Apache Hadoop Hive
Ø  Cloudera
Ø  Hortonworks
Ø  MapR
Ø  Amazon EMR
Ø  Cloudera Impala
Ø  Pivotal HAWQ
Ø  MongoDB
Ø  IBM BigInsights
Ø  Oracle BDA
Ø  Cassandra
Ø  SAP HANA (Preview)

Relational
Ø  Microsoft SQL Server
Ø  Oracle DB
Ø  IBM DB2
Ø  Progress OpenEdge
Ø  SAP Sybase
Ø  MySQL
Ø  PostgreSQL
Ø  Pervasive SQL (Btrieve)
Ø  IBM Informix
Ø  Clipper
Ø  dBase
Ø  FoxPro
Ø  Paradox
Ø  Text Files
Ø  Excel

SaaS / Cloud
Ø  Salesforce.com
Ø  Database.com
Ø  FinancialForce
Ø  Veeva CRM
Ø  ServiceMax
Ø  Any Force.com App
Ø  Microsoft Dynamics CRM *
Ø  Microsoft SQL Azure
Ø  Oracle Eloqua *
Ø  Oracle Service Cloud
Ø  Marketo *
Ø  Google Analytics *
Ø  SugarCRM
Ø  HubSpot (Preview) *
Ø  Progress Rollbase *

EDI / XML / Text
Ø  EDIFACT
Ø  X12
Ø  IATA
Ø  Healthcare EDI: X12 (HIPAA), ICD-10, HL7
Ø  Flat Files: CSV, TXV, dBase
Ø  Text Files
Ø  EDIG@S
Ø  EANCOM

Data Warehouses
Ø  Teradata
Ø  Amazon Redshift
Ø  Pivotal Greenplum
Ø  SAP Sybase IQ

Any Data Source
Ø  SDK
Ø  SequeLink Socket Server
Ø  Custom Engineering

* Available exclusively for DataDirect Cloud

Page 20: Big Data Insurance