© 2020 Verisk Analytics, Inc. All rights reserved.


What Machine Learning Can Do for You

Shane De Zilwa, Ph.D.

Talmor Meir, Ph.D.


44 Zettabytes of Data in 2020

That is 40 times more bytes than there are stars in the observable universe.


Outline

❑ Analytics of big data

❑ Translating data into business value

❑ Predictive models & Data science techniques


Data Analytics

• Cross Domain
• Forecasting
• Artificial Intelligence
• Telematics
• Neural Network
• Time Series
• Audio Analytics
• Classification
• Social Network Analysis
• Internal Analytics: Sales/Marketing, HR
• Deep Machine Learning
• Data Exploration
• Lifetime Value Analysis
• Trends & Patterns


The 3V’s of Big Data

01 Volume: How much data is there?
• Transactions
• Sensors
• Terabytes

02 Velocity: How quickly is data accessed?
• Batch
• Real time
• Periodic
• Stream

03 Variety: How many different types of sources are there?
• Structured
• Semi-structured
• Unstructured


Data Types

Structured
▪ Data having a defining structure
▪ e.g., databases

Semi-Structured
▪ Data with some defining pattern that does not conform to a table-like structure
▪ e.g., emails, smartphone photos

Unstructured
▪ Data that has no inherent structure
▪ e.g., video, social media text, images

Increasing growth: structured → semi-structured → unstructured
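The three categories can be made concrete with Python's standard library. A minimal sketch, assuming a one-row CSV for structured data and a short plain-text email (the addresses and message text are hypothetical, invented for illustration): the CSV is read by column name, the email exposes named header fields but no table schema, and its body is free prose that would need text mining to interpret.

```python
import csv
import io
from email import message_from_string

# Structured: every record follows a fixed, tabular schema.
structured = io.StringIO("customer_id,state,num_cards\nA12345,CA,2\n")
rows = list(csv.DictReader(structured))

# Semi-structured: an email has named header fields (a defining pattern)
# but no table-like schema. Addresses and text here are hypothetical.
raw_email = (
    "From: claims@example.com\n"
    "To: underwriting@example.com\n"
    "Subject: Claim follow-up\n"
    "\n"
    "The claimant reported a motor vehicle accident in Durham.\n"
)
message = message_from_string(raw_email)
headers = {name: message[name] for name in ("From", "To", "Subject")}

# Unstructured: the body is plain prose; pulling facts out of it needs
# text mining / NLP rather than a lookup by field name.
body = message.get_payload()

print(rows[0]["state"])    # CA
print(headers["Subject"])  # Claim follow-up
print(body.strip())
```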


Translating Data into Business Value

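Cross-Industry Standard Process for Data Mining (CRISP-DM)

A cycle around the data: Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Implementation (called Deployment in the CRISP-DM standard), then back to Business Understanding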
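The CRISP-DM cycle can be expressed as a simple phase sequence. A minimal sketch, assuming a simplified cycle: a failed evaluation and a completed implementation both return to Business Understanding. The two-way arrows in the full CRISP-DM diagram (for example, between Data Preparation and Modeling) are omitted, and `next_phase` is a hypothetical helper, not part of any standard API.

```python
# Phases as named on the slide; the CRISP-DM 1.0 guide calls the last one "Deployment".
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Modeling",
    "Evaluation",
    "Implementation",
]


def next_phase(current: str, evaluation_passed: bool = True) -> str:
    """Return the phase that follows `current` in a simplified CRISP-DM cycle."""
    if current == "Evaluation" and not evaluation_passed:
        # A model that fails evaluation sends the project back to the business question.
        return "Business Understanding"
    if current == "Implementation":
        # Outer loop: lessons from implementation start the next iteration.
        return "Business Understanding"
    return PHASES[PHASES.index(current) + 1]


phase = "Business Understanding"
path = [phase]
while phase != "Implementation":
    phase = next_phase(phase)
    path.append(phase)
print(" -> ".join(path))
```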


Policy Life-Cycle

Start → Data Collection → Data Validation → Rating → Issue Policy: 3 to 4 weeks

Enabled by: Automation · Data Science · Natural Language Processing · Electronic Medical Records · Connected Technology · Advanced Modeling

Start → Data Collection → Data Validation → Rating → Issue Policy: 1 to 2 weeks (shortened time to policy)


In the Beginning…

• Predictive Models were built on structured data

• Credit Scoring was one of the first commercial applications

• Analytic Approaches included Supervised Models

Verisk Life Insurance Analytics

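#  Customer_ID  State  Cards  Balance ($)  Limit ($)  Util. (%)  Oldest Card (mo)  Since Delinq. (mo)  60+ Day Delinq. (12 mo)  Performance
1  A12345       CA     2      8,000        10,000     80         60                5                   1                        Bad
2  B34567       UT     4      1,000        10,000     10         120               24                  0                        Good
3  B56789       NV     4      3,000        5,000      60         12                N/A                 0                        Bad
4  C12345       NV     3      6,000        12,000     50         60                36                  1                        Good
5  D45678       NY     1      3,000        10,000     30         72                48                  0                        Good
6  D67890       MA     2      5,000        20,000     25         180               N/A                 0                        Good
7  E23456       TX     6      6,000        8,000      75         120               24                  2                        Bad
…  …            …      …      …            …          …          …                 …                   …                        …

Column key: Cards is the number of cards; Balance and Limit are card balances and card limits; Util. is utilization (balance ÷ limit); Oldest Card is the age of the oldest card; Since Delinq. is the time since delinquency; 60+ Day Delinq. (12 mo) is the number of 60+ day delinquencies in the last 12 months.

Predictive variables: the account attributes · Target: Performance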
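A supervised credit-scoring model learns a mapping from the predictive variables to the target. A minimal sketch, assuming logistic regression fitted by batch gradient descent on just two of the table's columns (utilization and 60+ day delinquencies) and only the seven rows shown; the learning rate and epoch count are arbitrary illustrative choices, and real scorecards use far more data and variables. Time since delinquency is left out because its N/A values would need imputation first.

```python
import math

# (utilization %, # of 60+ day delinquencies in last 12 months, performance)
ROWS = [
    (80, 1, "Bad"), (10, 0, "Good"), (60, 0, "Bad"), (50, 1, "Good"),
    (30, 0, "Good"), (25, 0, "Good"), (75, 2, "Bad"),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Estimated probability that an account performs Bad."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(rows, lr=0.5, epochs=5000):
    # Scale utilization to [0, 1] so both features have similar magnitude.
    X = [(util / 100.0, delinq) for util, delinq, _ in rows]
    y = [1.0 if perf == "Bad" else 0.0 for _, _, perf in rows]
    weights, bias = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, target in zip(X, y):
            error = predict(weights, bias, features) - target  # gradient of log-loss
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train(ROWS)
print("P(Bad | 80% util, 0 delinquencies):", round(predict(weights, bias, (0.80, 0)), 3))
print("P(Bad | 10% util, 0 delinquencies):", round(predict(weights, bias, (0.10, 0)), 3))
```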


Natural Language Processing

• The natural first evolution beyond structured data was Natural Language Processing (NLP), which underpins text mining


Text Mining sources: Scientific Papers · Web Content · Social Media Posts · Emails · Regulatory Filings · Insurance Forms · Medical Records


Natural Language Processing – Bag of Words Approaches

• Initial approaches used a “Bag of Words” representation

– Disregarded grammar and word order

• Useful for document classification

• But, for information extraction, word order and grammar are important


I love you only

only I love you

I only love you

I love only you
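The four sentences above differ only in where "only" sits, yet they mean different things. A minimal sketch with Python's `collections.Counter`, assuming simple lowercase whitespace tokenization, shows that a bag-of-words representation maps all four to exactly the same counts, which is why it works for document classification but loses information needed for extraction.

```python
from collections import Counter

sentences = [
    "I love you only",
    "only I love you",
    "I only love you",
    "I love only you",
]


def bag_of_words(text):
    """Count word occurrences, discarding grammar and word order."""
    return Counter(text.lower().split())


bags = [bag_of_words(s) for s in sentences]
print(bags[0])
print(all(bag == bags[0] for bag in bags))  # True: the four meanings are indistinguishable
```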


Natural Language Processing – Rule Based Approaches

• Context

– Proximity to keywords

– Regular Expressions

– Journalistic practices


A Raleigh man, Robert Stevens, is charged with insurance fraud and attempting to obtain property by false pretense after investigators with the North Carolina Department of Insurance say he filed a fraudulent insurance claim. Department of Insurance criminal investigator, Paul Hernandez accuses Robert Stevens, 37, of 1480 Mountain Road, Raleigh, of filing a fraudulent insurance claim with Mutual Insurance following a motor vehicle accident in Durham on Nov. 15, 2016, according to a release from the department. Brian Smith, the attorney representing Stevens, could not be reached for comment.
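The rule types listed above can be sketched with Python's `re` module. A minimal sketch, assuming two illustrative rules (not the rules Verisk uses): a regular expression encoding the journalistic convention of introducing a subject as "First Last, age, of street, city", and a proximity rule that checks whether a keyword appears within a fixed window of tokens after a name. The pattern, helper names, and 10-token window are hypothetical choices.

```python
import re

TEXT = (
    "A Raleigh man, Robert Stevens, is charged with insurance fraud and attempting "
    "to obtain property by false pretense after investigators with the North Carolina "
    "Department of Insurance say he filed a fraudulent insurance claim. Department of "
    "Insurance criminal investigator, Paul Hernandez accuses Robert Stevens, 37, of "
    "1480 Mountain Road, Raleigh, of filing a fraudulent insurance claim with Mutual "
    "Insurance following a motor vehicle accident in Durham on Nov. 15, 2016, according "
    "to a release from the department. Brian Smith, the attorney representing Stevens, "
    "could not be reached for comment."
)

# Regular expression + journalistic practice: "First Last, age, of street, city".
PERSON_AGE_ADDRESS = re.compile(
    r"(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<age>\d{1,3}), "
    r"of (?P<street>[^,]+), (?P<city>[A-Z][a-z]+)"
)


def extract_subject(text):
    match = PERSON_AGE_ADDRESS.search(text)
    return match.groupdict() if match else None


def near_keyword(text, target, keyword, window=10):
    """Proximity rule: does `keyword` occur within `window` tokens after `target`?"""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    target, keyword = target.lower(), keyword.lower()
    return any(
        token == target and keyword in tokens[i + 1 : i + 1 + window]
        for i, token in enumerate(tokens)
    )


subject = extract_subject(TEXT)
print(subject)
surname = subject["name"].split()[-1]
print("Charged with fraud?", near_keyword(TEXT, surname, "fraud"))  # True
```

Rules like these are transparent, but each new phrasing needs a new rule, which is the limitation the machine-learning approach addresses.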


Natural Language Processing – Machine Learning

Instead of providing the machine with rules, you provide it with many labeled examples and let the machine learn the patterns itself.


Labeled example articles (stacked on the slide, each overlaid on a copy of the Raleigh article above):

• “FBI investigators charged Brad Philips with insurance related…”
• “Deepak Patel of Oakland California was arrested for…”
• “Tom Birch accuses his former lawyer, Jane Brown, of stealing the life…”
• “James Yang, 45, of Detroit, pleaded guilty to six counts…”
• “A Miami woman, Vanessa Smith, is in court for…”

A New York man pleaded guilty in federal court to scheming to defraud several insurance companies out of millions of dollars by taking out life insurance policies on a brother who had died. Jason Yang, 38, pleaded guilty to mail and wire fraud, the U.S. attorney said in a statement. Between May 2015 and June 2017, Yang took out at least 18 life insurance policies in his brother’s name, with combined coverage limits in excess of $10 million. But Yang’s brother had died in China in 2014, prosecutors said. Yang also took steps to make it appear as though his brother was alive by opening and using bank accounts in his brother’s name and renewing his brother’s driver’s license.
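Learning from labeled examples instead of hand-written rules can be illustrated with a tiny text classifier. A minimal sketch, assuming a multinomial naive Bayes model with add-one smoothing (one simple choice; the slides do not say which model is used). The three "fraud" sentences are shortened from the example articles above; the three "other" sentences are hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(doc):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = [
    "Robert Stevens is charged with insurance fraud",               # adapted from the slides
    "James Yang pleaded guilty to insurance fraud",                 # adapted from the slides
    "Deepak Patel was arrested for insurance fraud",                # adapted from the slides
    "The department released its annual report on premium rates",  # hypothetical
    "The insurer announced new premium rates for drivers",          # hypothetical
    "Regulators published a report on market conduct",             # hypothetical
]
labels = ["fraud", "fraud", "fraud", "other", "other", "other"]

model = NaiveBayes().fit(docs, labels)
print(model.predict("Vanessa Smith is in court for insurance fraud"))  # fraud
print(model.predict("The insurer published new premium rates"))        # other
```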


Definitions

Artificial Intelligence: Any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning (including deep learning).

Machine Learning: A subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience. The category includes deep learning.

Deep Learning: The subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data.


Deep Learning came to the forefront in Computer Vision

Trying to emulate human learning
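Deep learning models for computer vision are typically built from convolutional layers, which slide small filters over an image. A minimal sketch of that core operation (a "valid" 2-D cross-correlation, which is what most deep-learning libraries call convolution), assuming a hand-set vertical-edge filter for illustration; in a trained network the filter values are learned from data rather than chosen by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the operation a CNN layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[0, 0, 1, 1] for _ in range(4)]  # dark left half, bright right half
vertical_edge = [[-1, 1], [-1, 1]]        # responds where brightness rises left to right
print(conv2d(image, vertical_edge))       # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column marks the edge; stacking many such learned filters in layers lets a network build up from edges to shapes to objects.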


ImageNet Timeline


https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/


Speech Analytics

Automatic Speech Recognition (ASR): converting speech to text enables subsequent text mining

Voice Attributes: e.g., gender, age, emotion, health

Ensemble Model: combines ‘what was said’ with ‘how it was said’
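‘How it was said’ is captured through acoustic features of the voice signal. One commonly used feature is fundamental frequency (pitch). A minimal sketch, assuming a clean synthetic 200 Hz tone sampled at 8 kHz and a search range of 50 to 400 Hz (all illustrative choices): it estimates pitch by picking the lag at which the signal best matches a shifted copy of itself (autocorrelation). Real speech needs framing, windowing, and voicing detection on top of this.

```python
import math


def estimate_pitch(signal, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_corr = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and itself shifted by `lag` samples.
        corr = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
print(estimate_pitch(tone, sample_rate))  # 200.0
```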


Voice Analytics


Voice Signal → Pre-processing → Rules-based | Machine-learning-based | Computer-vision-based models → Ensemble
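The pipeline ends by combining the three model families. One simple way to combine them is soft voting, a weighted average of each model's probability. A minimal sketch, assuming hypothetical model names, scores, and weights; the slides do not specify how the ensemble combines its branches.

```python
def ensemble_score(model_scores, weights):
    """Soft voting: weighted average of per-model probabilities."""
    total_weight = sum(weights[name] for name in model_scores)
    return sum(weights[name] * p for name, p in model_scores.items()) / total_weight


# Hypothetical outputs of the three branches for one voice recording.
scores = {"rules": 0.2, "machine_learning": 0.7, "computer_vision": 0.6}
# Hypothetical weights, e.g. tuned on a validation set.
weights = {"rules": 1.0, "machine_learning": 2.0, "computer_vision": 1.0}
print(round(ensemble_score(scores, weights), 3))  # 0.55
```

Averaging lets a strong branch outvote a weak one without discarding it entirely; a learned combiner (stacking) is a common alternative when enough labeled data is available.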


What Can Machine Learning Do for You?


Inputs to Life Underwriting (UW):

• Structured: demographic, lifestyle, credit

• Unstructured:

– Text: medical records, Attending Physician Statements (APSs)

– Voice: recorded statements, tele-interviews

– Images: medical scans, social media feeds


Machine Learning Is Not a Magic Bullet

• Lots of data

• Representative datasets

• Explainability

• Palatability

• Test, test, test

• Beware the unexpected!!
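“Representative datasets” and “Test, test, test” both come down to evaluating a model on data it has not seen, drawn in the same proportions as the population it will be used on. A minimal sketch of a stratified hold-out split, assuming each record carries a class label; the function name, 30% test fraction, and 30% bad rate are illustrative choices, not from the slides.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split record indices into train/test sets, preserving each class's share."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["Bad"] * 30 + ["Good"] * 70  # hypothetical 30% bad rate
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))             # 70 30
print(sum(labels[i] == "Bad" for i in test_idx))  # 9, i.e. 30% of the test set
```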


Big Data · Deep Learning · Artificial Intelligence · Neural Networks · Computer Vision · Natural Language Processing · Supervised Learning · Speech Processing

CONCLUSION: Machine Learning can help you