Concept to production Nationwide Insurance BigInsights Journey with Telematics

© 2015 IBM Corporation

From Concept to Production: Nationwide Insurance IBM BigInsights Journey with Telematics
#2404
Krish Rajaram & Rajesh Nandagiri – 10/26/2015

Big Data and Analytics Helps Nationwide Customers Become Better Drivers

Agenda

• Introduction
• Architecture
• Data Processing
• Data Access
• Business Benefits

[Figure: data volume in GB and elapsed time in minutes by cycle (1 Hr Batch, 12 Hrs Loop, 12 Hrs Batch), comparing First Iteration and After Redesign.]

Introduction

About Nationwide

SmartRide Program

SmartRide Data

About Nationwide

• 16+ million policies
• $25 million contributed to nonprofits and communities
• #1 insurer of farms and ranches
• 7th largest homeowner and auto insurance provider in the U.S.
• Gallup Great Place to Work award winner 3 years running
• Largest pet insurer in the U.S.
• 9th largest commercial insurer
• $23.9 billion in revenue for 2013
• #1 provider of public-sector retirement plans
• Founded in 1926 by members of the Ohio Farm Bureau
• 28th Computerworld Great Place to Work in IT

Nationwide has approximately 31,000 associates serving customers in nearly every state.

About SmartRide

• SmartRide is Nationwide's version of Telematics, offered to customers to help them improve their driving behavior and save on insurance premiums.

• Customers install a small device into their vehicle for 6 months which measures…

SmartRide Data Characteristics

• Multiple vendors
• Files of different layouts arriving at different frequencies:
  o Hourly
  o Every 4 hrs
• Four CSV files per vendor
• ~30 GB to ~60 GB of data per day
• Data challenges
  o Late arriving trips
  o Partial trips
  o Duplicate trips
  o Orphan trips

Trip Data Characteristics

• Missing Timestamp & Speed Spike

vin_nb  trip_nb  position_ts            Speed  engine_rpm
abc     123      2015-07-21 12:31:36.0  54     1600
abc     123      2015-07-21 12:31:39.0  55     1800
abc     123      2015-07-21 12:31:42.0  57     1500
abc     123      2015-07-21 12:31:43.0  82     1600
abc     123      2015-07-21 12:31:44.0  58     1500

• Acceleration Lag

vin_nb  trip_nb  position_ts            Speed  engine_rpm
abc     123      2015-06-30 21:25:05.0  0      700
abc     123      2015-06-30 21:25:06.0  0      700
abc     123      2015-06-30 21:25:07.0  0      1000
abc     123      2015-06-30 21:25:08.0  8      1800
abc     123      2015-06-30 21:25:09.0  15     2000
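The missing-timestamp and speed-spike pattern in the first sample can be checked mechanically. The following is a minimal sketch, not the production scrubbing rule: it assumes one reading per second is expected and treats a reading as a spike when it differs from both neighbours by at least 20 speed units. Both thresholds and the function names are illustrative assumptions.

```python
from datetime import datetime

# Sample points copied from the first table: (position_ts, speed).
POINTS = [
    ("2015-07-21 12:31:36.0", 54),
    ("2015-07-21 12:31:39.0", 55),
    ("2015-07-21 12:31:42.0", 57),
    ("2015-07-21 12:31:43.0", 82),
    ("2015-07-21 12:31:44.0", 58),
]

# Assumed thresholds, not taken from the slides.
EXPECTED_INTERVAL_S = 1   # one reading per second
SPIKE_DELTA = 20          # jump relative to both neighbours, in speed units


def parse_ts(text):
    """Parse a position_ts value such as '2015-07-21 12:31:36.0'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def find_gaps(points, expected=EXPECTED_INTERVAL_S):
    """Return (previous_ts, next_ts, seconds) for every interval longer than expected."""
    gaps = []
    for (prev_ts, _), (next_ts, _) in zip(points, points[1:]):
        seconds = (parse_ts(next_ts) - parse_ts(prev_ts)).total_seconds()
        if seconds > expected:
            gaps.append((prev_ts, next_ts, seconds))
    return gaps


def find_spikes(points, delta=SPIKE_DELTA):
    """Return readings that sit at least delta above or below both neighbours."""
    spikes = []
    for before, current, after in zip(points, points[1:], points[2:]):
        up = current[1] - before[1] >= delta and current[1] - after[1] >= delta
        down = before[1] - current[1] >= delta and after[1] - current[1] >= delta
        if up or down:
            spikes.append(current)
    return spikes


gaps = find_gaps(POINTS)
spikes = find_spikes(POINTS)
```

On the sample, this flags the two 3-second intervals as gaps and the reading of 82 at 12:31:43 as a spike.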

Architecture

Logical Data Flow

IBM® BigInsights™ Configuration

Decision Catalog

Job Orchestration

Logical Data Flow

IBM® BigInsights™ for Apache™ Hadoop Configuration

• Version 2.1.2
  o 6 Management Nodes and 16 Data Nodes
  o Each with 128 GB RAM and 18 TB of storage
  o Hadoop 2.2, BigSQL 1.0, Hive 0.12, HBase 0.96

• Three environments
  o Dev, Test, and Production
  o All same configuration

• Limitations
  o No workload management
  o No environment for DR
  o Used Test Cluster for HBase failover

Decision Catalog

Job Orchestration

Data Processing

Design Considerations

Phases of Data Movement

Batch Performance Metrics

Design Considerations

• One hour window for end to end processing
  o Handling data issues
  o Summarization
  o Multiple cycles per day

• Predictable run time for backlog processing when jobs fail

• Reloading incorrect batch

• Restart failed batch

[Figure: elapsed time in minutes by cycle (1 Hr Batch, 12 Hrs Loop, 12 Hrs Batch), broken down by step (raw2can, TripScrub, TripSecDetails, TripSum, State Median, HBase load, AuditEn), with input size in GB.]

Acquire Phase

• Raw trip files copied into HDFS using WebHDFS protocol
• Folders created by vendor, file, load event ID, and batch #
• Used Sqoop to transfer 4 TB of historical data from Data Warehouse
• Hive external tables for each file
• Partitioned by load event ID and batch #
• Used both BigSQL 1.0 and HiveQL
• Partitioned external tables helped in
  o Processing backlog data
  o Reprocessing incorrect batches
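The folder convention and the WebHDFS copy can be sketched as follows. This is an illustration under stated assumptions: the landing directory, host name, vendor and file names, and the `key=value` folder naming are all hypothetical; only the `/webhdfs/v1/<path>?op=CREATE` URL form comes from the WebHDFS REST API.

```python
from urllib.parse import quote, urlencode

# All values below are illustrative assumptions, not taken from the slides.
RAW_ROOT = "/data/smartride/raw"          # hypothetical HDFS landing directory
NAMENODE = "namenode.example.com:50070"   # hypothetical host; 50070 is the Hadoop 2.x default HTTP port


def raw_folder(vendor, file_type, load_event_id, batch_nb):
    """Build the HDFS folder for one delivered file: vendor / file / load event ID / batch #."""
    return "{}/{}/{}/load_event_id={}/batch_nb={}".format(
        RAW_ROOT, vendor, file_type, load_event_id, batch_nb
    )


def webhdfs_create_url(hdfs_path, overwrite=False):
    """Return the WebHDFS URL to use with an HTTP PUT (op=CREATE) for the given path."""
    query = urlencode({"op": "CREATE", "overwrite": str(overwrite).lower()})
    return "http://{}/webhdfs/v1{}?{}".format(NAMENODE, quote(hdfs_path, safe="/="), query)


folder = raw_folder("vendor_a", "trip_point", 1001, 3)
url = webhdfs_create_url(folder + "/trip_point.csv")
```

Keeping load event ID and batch # in the path means one batch maps to one folder, so a single partition of the external table can be pointed at it, dropped, or reloaded without touching its neighbours.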

Standardize Phase

• Select data from external tables based on load event ID
• Each load event ID can include one or more batches
• More than one load event ID can be processed in one cycle
• Data moved to next stage only from work tables
  o Helped in performance
• Dynamic partitions helped in loading multiple batches
  o Partitions get overwritten if they already exist
  o Helped in reprocessing an incorrect batch
• Work tables contain data for the current processing cycle
• Canonical tables partitioned by source and batch #
• Load using dynamic partitioning
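Why overwriting partitions makes reprocessing safe can be shown with a small in-memory model. This sketch only imitates the semantics of an overwrite load with dynamic partitions keyed by source and batch #; it is not Hive code, and the table contents and function name are made up for illustration.

```python
from collections import defaultdict


def load_dynamic_partitions(canonical, work_rows):
    """Model an overwrite load with dynamic partitions.

    canonical: dict mapping (source, batch_nb) -> list of rows
    work_rows: iterable of dicts that each carry 'source' and 'batch_nb'
    Partitions present in work_rows are replaced entirely; all others are untouched.
    """
    incoming = defaultdict(list)
    for row in work_rows:
        incoming[(row["source"], row["batch_nb"])].append(row)
    for partition, rows in incoming.items():
        canonical[partition] = rows   # overwrite, never append
    return sorted(incoming)


# Hypothetical canonical table: batch 1 was loaded with a bad value.
canonical = {
    ("vendor_a", 1): [{"source": "vendor_a", "batch_nb": 1, "trip_nb": 123, "miles": -5}],
    ("vendor_a", 2): [{"source": "vendor_a", "batch_nb": 2, "trip_nb": 124, "miles": 12}],
}
# Work table for the current cycle: corrected batch 1 plus new batch 3.
work = [
    {"source": "vendor_a", "batch_nb": 1, "trip_nb": 123, "miles": 5},
    {"source": "vendor_a", "batch_nb": 3, "trip_nb": 125, "miles": 8},
]
touched = load_dynamic_partitions(canonical, work)
```

Rerunning the same load leaves the table in the same state, which is what allows an incorrect batch to be reloaded without a separate delete step.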

Data Scrubbing & Event Calculation

[Figure: data flow. Trip summary and Trip point; map side join; single read, multi write; Orphan trips; Trip points (Work table); Events at seconds level (Work table).]

• Java M/R program for
  o Scrubbing
  o Events calculation
    - Night time driving
    - Hard brake
    - Fast acceleration
    - Miles driven

• Very good performance gain
  o Using Java for complex scrubbing rules
  o Single read multiple writes
  o Only required data points processed
  o No data persisted to corpse tables
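The event calculation can be illustrated with a single pass over per-second trip points. The production job was a Java M/R program; the sketch below is Python and only shows the shape of the computation. The thresholds, the night-time window, and the assumption that Speed is in miles per hour are all hypothetical, since the slides do not state the actual rules.

```python
from datetime import datetime

# Assumed thresholds and units; the slides do not give the actual rules.
HARD_BRAKE_MPH_PER_S = 7.0
FAST_ACCEL_MPH_PER_S = 7.0
NIGHT_START_HOUR, NIGHT_END_HOUR = 0, 5   # midnight up to 05:00


def summarize_trip(points):
    """Single pass over (timestamp, speed_mph) points sorted by time."""
    summary = {"hard_brake": 0, "fast_acceleration": 0, "night_seconds": 0.0, "miles": 0.0}
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt = (t1 - t0).total_seconds()
        if dt <= 0:
            continue  # duplicate or out-of-order reading
        rate = (v1 - v0) / dt
        if rate <= -HARD_BRAKE_MPH_PER_S:
            summary["hard_brake"] += 1
        elif rate >= FAST_ACCEL_MPH_PER_S:
            summary["fast_acceleration"] += 1
        if NIGHT_START_HOUR <= t0.hour < NIGHT_END_HOUR:
            summary["night_seconds"] += dt
        summary["miles"] += (v0 + v1) / 2.0 * dt / 3600.0   # trapezoidal distance
    return summary


# Points from the acceleration-lag sample (2015-06-30 21:25:05 to 21:25:09).
POINTS = [
    (datetime(2015, 6, 30, 21, 25, 5), 0),
    (datetime(2015, 6, 30, 21, 25, 6), 0),
    (datetime(2015, 6, 30, 21, 25, 7), 0),
    (datetime(2015, 6, 30, 21, 25, 8), 8),
    (datetime(2015, 6, 30, 21, 25, 9), 15),
]
result = summarize_trip(POINTS)
```

Because every counter is updated in the same loop, each trip point is read once, which mirrors the single-read idea listed above.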

Summarization Phase

[Figure: data flow. Events at seconds level (Work table); SRE summary (Work table); SRE summary partitioned by source; SRE summary in HBase.]

• Gather all trips related to devices from current trip and aggregate at various levels
• Union ALL
• UDF to store data points for trip graph
• Replace new summary info into final table

• Parallelized the Union All operation
• Partitioning by source enabled both vendors' data to be processed at the same time if overlap happens
• PUT from Hive to HBase, WAL disabled
  o Shorten column names
  o Changed to epoch time
  o Prefix salting key
  o Generate rowkey
  o Column family mapping

Batch Performance Metrics

1 Hr Batch SLA

[Figure: Avg Run Times for Hourly Cycles. Run time in mins and data size in GB by cycle schedule time.]

[Figure: Avg Run Times for 4 hr Cycles. Run time in mins and data size in GB by cycle schedule time (0000, 0400, 0800, 1200, 1600, 2000).]

Series: Acquire, Standardize, Trip Second Details, SRE Trip Summary Hive, SRE Trip Summary HBase, Audits, Size in GB

Data Access

SmartRide Web Page

Application Layer

Column Family and Row Key Design

Performance Metrics

SmartRide Web Page

SmartRide Web Page – Daily

Application Layer

[Figure: application layer. Single Page Web App; Restful Service; Data Access Layer (HBase API); HBase (HRegion Server, HRegion, HLog, Memstore, HFile) on HDFS; daily aggregates from BigSQL & Hive; ODS – DB2 ODBC.]

Column Family & RowKey Design

RowKey – Pfx_pgmId_pdflg_timestamp (sorted lexicographically)
  12_8798782_Tp_2015080912000000

Column Family – Summary Data
  SM:miles,1500001245,'15', SM:hb,1500001245,'2', SM:fa,1500001245,'5', SM:nt,1500001245,'Y'

Column Family – Trip-point Data
  TP:Trip,1500001245,'{JSON BLOB}'

• Column family (CF) helps in grouping the related columns depending on access pattern.

• Co-locating the keys related to one customer in one region to access data using filter from one region server.
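A salted row key of the form Pfx_pgmId_pdflg_timestamp can be sketched as below. How the prefix was actually derived is not stated in the slides; this sketch assumes it is a stable hash of the program ID modulo a bucket count, so that every key for one customer shares the same prefix. The bucket count, the hash choice, the `Dy` flag value, and the function names are illustrative assumptions.

```python
import zlib

SALT_BUCKETS = 16   # assumed number of salt buckets


def salt_prefix(pgm_id):
    """Stable two-digit salt derived only from the program ID."""
    return "{:02d}".format(zlib.crc32(str(pgm_id).encode("ascii")) % SALT_BUCKETS)


def row_key(pgm_id, pd_flag, timestamp):
    """Build a key shaped like Pfx_pgmId_pdflg_timestamp."""
    return "{}_{}_{}_{}".format(salt_prefix(pgm_id), pgm_id, pd_flag, timestamp)


def scan_prefix(pgm_id, pd_flag=None):
    """Key prefix selecting all rows of a program ID, or only one period flag."""
    prefix = "{}_{}_".format(salt_prefix(pgm_id), pgm_id)
    return prefix if pd_flag is None else prefix + pd_flag + "_"


# 'Tp' comes from the example key; 'Dy' and the second program ID are hypothetical.
keys = sorted([
    row_key("8798782", "Tp", "2015080912000000"),
    row_key("1234567", "Tp", "2015080912000000"),
    row_key("8798782", "Tp", "2015081008300000"),
    row_key("8798782", "Dy", "2015080900000000"),
])
customer_rows = [k for k in keys if k.startswith(scan_prefix("8798782"))]
```

Because the salt depends only on the program ID and keys are sorted lexicographically, one customer's rows form a contiguous range, so a prefix scan can be served from one region instead of fanning out across all salt buckets.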

Performance Metrics

• Scenarios – 1x, 2x, 3x concurrent users, ZooKeeper node going down, DataNode unavailable
• Tools used – Initial test using custom program, LoadRunner for final test, SiteScope for monitoring resource consumption
• SLA for aggregates – 5 sec
• # of concurrent users – 1200

HBase Data Distribution – Using Hannibal

SmartRide Data Distribution

Business Benefits

• Deeper Engagement with Members
  o Over 2 million website page views since the July launch. To put this in perspective, our vendor-hosted website would receive 100,000 views in a 12-month period.
  o Over 60K users have accessed the new site, and 90% of those are new users.

• Increase in bind ratios across all channels

• Improvement in loss ratios

• First enterprise "big data" implementation at Nationwide

Future scope – Personal and Commercial Fleet

Insights Give Nationwide Competitive Advantage

• Weather Data
• GPS Data
• Hourly Trip Data from Device
• Claims Data
• Other Public Records

Thank You


