Nelson Henwood - Finity Consulting · Conclusions . The gentle introduction . The ‘gentle’ introduction GBM stands for Gradient Boosting Machine 5 = Gradient Descent An optimisation


Nelson Henwood

Pricing elasticity and demand modelling using GBMs


Today’s Presentation

- (Gentle) Introduction to GBMs
- GBMs vs Logistic Regression
- Modelling Process Overview
- Some Results
- A Couple of Other Observations
- Conclusions


The gentle introduction


The ‘gentle’ introduction

GBM stands for Gradient Boosting Machine

GBM = Gradient Descent + Boosting

- Gradient Descent: an optimisation algorithm for finding a local minimum of a function
- Boosting: a machine learning ensemble algorithm that combines weak learners into a single strong learner


In English

Fit an ensemble model using an iterative process.

At each step, introduce a weak learner to compensate for the shortcomings of the existing model.

In Gradient Boosting:
- Shortcomings are identified by negative gradients, also called pseudo-residuals (effectively residuals with a view on the error distribution)
- Shortcomings tell us how to improve our model

Effectively, we are iteratively explaining the model errors and using this to improve our prediction.
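The link between negative gradients and residuals can be written out as a worked equation. The notation here (loss function $L$, current model $F_m$, observation index $i$) is assumed for illustration and follows the usual gradient boosting formulation rather than anything shown on the slide.

```latex
r_{im} = -\left[ \frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)} \right]_{F = F_m}
\qquad\text{and, for } L = \tfrac{1}{2}\big(y_i - F(x_i)\big)^2,\quad
r_{im} = y_i - F_m(x_i)
```

With squared-error loss the pseudo-residual is the ordinary residual. With a Bernoulli log-loss on the log-odds scale (a natural choice for lapse / no-lapse responses) the same derivative gives $y_i - p_i$, where $p_i$ is the current predicted probability. This is the sense in which pseudo-residuals carry "a view on the error distribution".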


GBM development timeline

1984 CART (Breiman, Friedman et al.)
1997 AdaBoost (Freund & Schapire)
1999 Greedy Function Approximation: A Gradient Boosting Machine; Stochastic Gradient Boosting
2000 TreeNet 1 released
2003 CART (Breiman, Friedman et al.)
2007 Extensions to GBM R package: quantile regression
2012 Further extensions to GBM R package: multinomial, t-distribution, pairwise
2014 XGBoost released


Key model parameters

- Base Learners
- Loss Function
- Training Fraction
- Base Learner Complexity
- Shrinkage
- Subsampling
- Stopping Criteria


Base learners

Decision tree most popular

Other possibilities:
- Linear Models: ordinary linear regression, ridge penalised linear regression, random effects
- Smooth Models: P-splines, radial basis functions
- Other Models: Markov random fields, wavelets, custom base-learner functions
- Mixed Models


Base Learner Complexity

Control the underlying complexity of base learners

Trade-off between overfitting and ability to explain underlying complexity

- Maximum Tree Depth: 2 or 3 to 8
- Minimum Observations per Node: rule of thumb, 2-5% of data


Shrinkage or learning rate

- “Shrink” the impact of each additional fitted base learner
- Reduces the size of incremental steps
- If one of the boosting iterations turns out to be “erroneous”, its negative impact can be easily corrected in subsequent steps
- The smaller (closer to 0) the shrinkage parameter, the better the model generalisation (but convergence takes longer)

It is better to improve a model by taking many small steps than by taking fewer larger steps
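The cost of a small shrinkage parameter can be made concrete with a little arithmetic. The sketch below is illustrative only and rests on an idealised assumption that is not in the slides: every base learner fits the current residual exactly, so each step leaves a share (1 - shrinkage) of the residual behind. The function name `iterations_needed` is hypothetical.

```python
import math


def iterations_needed(shrinkage: float, target_fraction: float = 0.01) -> int:
    """Boosting steps until the residual falls to target_fraction of its start.

    Idealised assumption: each base learner fits the current residual exactly,
    so the residual is multiplied by (1 - shrinkage) at every step.
    """
    if not 0.0 < shrinkage < 1.0:
        raise ValueError("shrinkage must be strictly between 0 and 1")
    return math.ceil(math.log(target_fraction) / math.log(1.0 - shrinkage))


# Smaller shrinkage means many more iterations to reach the same fit
for shrinkage in (0.5, 0.1, 0.01):
    print(shrinkage, iterations_needed(shrinkage))
```

Under that assumption, a tenfold reduction in shrinkage needs roughly ten times as many trees, which is why shrinkage and the number of trees are usually tuned together.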


Subsampling (Bagging)

- Introduce a bit of randomness into the fitting procedure
- At each learning iteration, only a random part of the training data is used to fit the next base learner
- Improves generalisation and computation time
- The size of this random part, as a share of the training data, is called the “bag fraction”
- A typical value is 50%
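Drawing the bag for one iteration might look like the sketch below. It assumes sampling without replacement, which the slide does not specify, and the helper name `draw_bag` is hypothetical.

```python
import random


def draw_bag(n_rows: int, bag_fraction: float, rng: random.Random) -> list[int]:
    """Return the row indices used to fit the next base learner (no replacement)."""
    bag_size = max(1, int(n_rows * bag_fraction))
    return rng.sample(range(n_rows), bag_size)


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
for iteration in range(3):
    bag = draw_bag(n_rows=10, bag_fraction=0.5, rng=rng)
    print(iteration, sorted(bag))
```

Each iteration sees a different half of the data, so no single base learner can fit the full training set's noise.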

Stopping criteria: How many boosting iterations?

Random Sample

Cross-fold Validation
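Whichever approach supplies the validation loss (a random hold-out sample or cross-fold validation), the stopping rule reduces to finding the iteration where that loss is lowest. The sketch below is one assumed way to do it; the `patience` rule and the loss values are made up for illustration.

```python
def best_iteration(validation_loss: list[float], patience: int = 3) -> int:
    """Return the 1-based iteration with the lowest validation loss.

    Scanning stops once the loss has failed to improve for `patience`
    consecutive iterations (an assumed rule, not taken from the slides).
    """
    best_index, best_loss, stale = 0, float("inf"), 0
    for index, loss in enumerate(validation_loss):
        if loss < best_loss:
            best_index, best_loss, stale = index, loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_index + 1


# Hypothetical validation loss after each boosting iteration
losses = [0.70, 0.61, 0.55, 0.52, 0.51, 0.515, 0.52, 0.53, 0.50]
print(best_iteration(losses))
```

Note the trade-off in the patience setting: a short patience stops quickly but can miss a later improvement, as it does with the final value in this made-up series.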

Logic sequence summary

1. Initial estimate: $F_0 = \text{constant}$
2. Calculate pseudo-residuals: $r_m(x) = y - F_m(x)$
3. Select random sample: random 50%
4. Build a decision tree $h_m(x)$ on the residuals, to approximate them
5. Update prediction: $F_{m+1}(x) = F_m(x) + \delta h_m(x)$, with learning rate $\delta$

Iterate until fit deteriorates on hold-out data
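The five steps can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: squared-error loss, a single predictor, depth-1 trees (stumps) as base learners, and a simple patience rule for the hold-out stop. All function names and the simulated data are hypothetical.

```python
import random


def fit_stump(xs, rs):
    """Depth-1 tree on one feature: best split threshold and the two leaf means."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= threshold]
        right = [r for x, r in zip(xs, rs) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values equal: fall back to a constant
        mean = sum(rs) / len(rs)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_gbm(train, holdout, shrinkage=0.1, bag_fraction=0.5,
            max_trees=200, patience=10, seed=0):
    rng = random.Random(seed)
    xs, ys = zip(*train)
    hx, hy = zip(*holdout)
    f0 = sum(ys) / len(ys)                                  # step 1
    preds, hold_preds = [f0] * len(xs), [f0] * len(hx)
    stumps, best_loss, best_size = [], mse(hy, hold_preds), 0
    for _ in range(max_trees):
        resid = [y - p for y, p in zip(ys, preds)]          # step 2
        bag = rng.sample(range(len(xs)),
                         max(2, int(len(xs) * bag_fraction)))  # step 3
        stump = fit_stump([xs[i] for i in bag],
                          [resid[i] for i in bag])          # step 4
        stumps.append(stump)
        preds = [p + shrinkage * predict_stump(stump, x)    # step 5
                 for p, x in zip(preds, xs)]
        hold_preds = [p + shrinkage * predict_stump(stump, x)
                      for p, x in zip(hold_preds, hx)]
        loss = mse(hy, hold_preds)
        if loss < best_loss:
            best_loss, best_size = loss, len(stumps)
        elif len(stumps) - best_size >= patience:           # hold-out fit deteriorating
            break
    return f0, stumps[:best_size], shrinkage


def predict_gbm(model, x):
    f0, stumps, shrinkage = model
    return f0 + shrinkage * sum(predict_stump(s, x) for s in stumps)


# Simulated data: a step at x = 0.5 plus noise
data_rng = random.Random(1)
data = [(x, (1.0 if x > 0.5 else 0.0) + data_rng.gauss(0, 0.1))
        for x in (data_rng.random() for _ in range(200))]
model = fit_gbm(data[:150], data[150:])
print(len(model[1]), predict_gbm(model, 0.2), predict_gbm(model, 0.8))
```

The model keeps only the trees up to the best hold-out iteration, so the returned ensemble is the one chosen by the stopping rule rather than the last one fitted.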

GBMs Vs Logistic Regression

Traditional approaches Vs Machine learning

Modelling process
- Traditional approach: Iterative. Manually propose model form, predictors etc. Gather feedback, augment hypothesis.
- Machine learning: Largely automated. Skill and imagination of modeller still relevant.

Hypothesis testing
- Traditional approach: Statistical inference
- Machine learning: Empirical validation (more intuitive)

Assumptions
- Traditional approach: Response distribution; linear predictor is “correct”
- Machine learning: On par (response distribution); not required for inference (linear predictor)

Predictive accuracy
- Traditional approach: Usually good with high effort expended
- Machine learning: Good with low effort

Efficiency
- Traditional approach: Low
- Machine learning: High

Data volumes required
- Traditional approach: More suitable with lower volumes or fewer predictors
- Machine learning: More suitable with high volumes and a large number of predictors

Common objections

“It’s a black box!”
- As transparent as GLMs
- Key drivers & interactions
- shapes (relativities)
- range of predictions
- high/low segments

“It’s expensive!”
- More efficient and cheaper than GLMs (typically more predictive)
- Cost is ~20% of the GLM cost for retention modelling and ~30% for competitor deconstructions

“The software’s expensive!”
- R is free
- On-line courses are free
- Many packages now offering R plug-ins

Common objections

“Prediction volatility (it’s not smooth)”
- For continuous variables use monotonic function to ensure smooth response
- Group levels of discrete variables (as with a GLM)

“Can’t implement the results!”
- Retention models typically once removed from the customer facing pricing
- Models can be scored using R / Python and used with SAS processes
- PMML execution in Radar / Earnix

“We lack internal knowledge!”
- Learn on the job
- Co-source a project with experts: get the models, scripts and knowledge transfer


Modelling process


Section outline

Pre-modelling
- Model segmentation
- Cancellations
- Feature engineering
- Other considerations

Modelling
- Technical model tuning
- Variable selection and ‘actuarial’ model tuning
- Time trends and scoring

Pre-modelling: Segmenting the modelling

- Class of business: Motor / Home
- Homeowners: Combined Vs Stand Alone products
- ‘Decrement’ type: Cancellations Vs Lapses
- Cancellations that are really lapses
- Coverage downgrades
- Payment frequency: Annual Vs Monthly
- Shoppers Vs Non-shoppers: if only we could segment the modelling this way. How can we approximate this?


Pre-modelling: Considering cancellations

Monthly ‘chunks’ for latest data
- Need a year to fully expose the policy to cancellation
- Don’t want to use “old” data in our model
- Cut exposure into monthly chunks
- Use policy month as an explanatory variable

Influenceable Vs non-influenceable
- Key drivers and price elasticity differ depending on cancellation reason
- How good are your cancellation codes?
- Split modelling if possible

Pricing Vs customer management models
- Information emerges through the policy year (e.g. claim, endorsement) but can’t use this for pricing
- For “customer management” models we can update the predictors each policy month
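Cutting exposure into monthly chunks can be sketched as expanding each policy into one row per policy month, with the cancellation flag set only on the last observed month. The record layout and field names below (`policy_id`, `policy_month`, `cancelled`) are hypothetical, chosen purely for illustration.

```python
def monthly_chunks(policy_id: str, months_in_force: int, cancelled: bool,
                   term_months: int = 12) -> list[dict]:
    """Expand one policy into one row per policy month of exposure.

    months_in_force counts the months observed so far (capped at term_months);
    the cancellation flag is set only on the final observed month.
    """
    observed = min(months_in_force, term_months)
    rows = []
    for month in range(1, observed + 1):
        rows.append({
            "policy_id": policy_id,
            "policy_month": month,                          # explanatory variable
            "cancelled": cancelled and month == observed,   # response
        })
    return rows


# A policy that cancelled in its fourth month contributes four rows
for row in monthly_chunks("P001", months_in_force=4, cancelled=True):
    print(row)
```

A recently written policy with only a few months of history still contributes rows under this layout, which is how the latest data can be used without waiting for a full year of exposure.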

Pre-modelling: Feature engineering

Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. (Wikipedia)

- External data
- Price data
- Customer features
- “Behavioural” features

Pre-modelling: Some other considerations

Peril affected policies for homes

Price change data

How much data?

Oversampling

Modelling: Tuning model parameters

1. Base learner complexity
2. Shrinkage
3. Bag fraction
4. Number of trees (stopping)

Criteria: best fit on validation data
- ROC
- Deviance

Modelling: ‘Actuarial’ model tuning

- Key drivers
- Shapes
- Highly correlated variables
- No ‘cheating’
- ‘Execution ability’

Modelling: Time trends and scoring

- Time parameters in model (sometimes multiple)
- Examine trends and interactions
- Align / challenge budget / forward forecast


Examples + Results

Finity GBM dashboard: Motor annual lapse model

Variable importance

- Relative contribution to the predictive power of the model, based on the number of times the variable appears in splits and the resulting model improvement
- Sums to 100 across all variables used in the underlying trees
- Main effects are not separated from interaction effects

Variable                                  Influence
CU: Customer Variable 1                        15.0
CU: Policy Tenure                              11.4
CU: Customer Variable 3                         6.7
PR: Premium Change (%)                          5.6
BE: Payment Delay                               4.1
PR: Premium Rate                                4.0
PO: ABS Region                                  3.6
PO: Vehicle Age                                 3.5
CP: Competitor 1 CPI                            3.1
PO: Insured Age                                 2.9
CP: Rank Insurer                                2.3
TI: Renewal Offer Month                         2.3
PO: Policy Variable 4                           1.9
PO: Policy Variable 5                           1.8
CU: Customer Variable 4                         1.7
CP: Competitor 2 CPI                            1.5
PR: Premium Change (%) Prior Renewal            1.4
BE: Behavioural 2                               1.1
PR: Premium 3                                   1.1
CP: Competitor 3 CPI                            1.0
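The "sums to 100" scaling can be illustrated with a short sketch. It assumes that each variable's raw score is the total model improvement from every split that uses it, which matches the description above; the raw numbers and variable names are invented for the example.

```python
def relative_influence(raw_improvement: dict[str, float]) -> dict[str, float]:
    """Rescale each variable's summed split improvement so the total is 100."""
    total = sum(raw_improvement.values())
    if total <= 0:
        raise ValueError("total improvement must be positive")
    ranked = sorted(raw_improvement.items(), key=lambda item: item[1], reverse=True)
    return {name: 100.0 * value / total for name, value in ranked}


# Hypothetical raw improvements, summed over every split using each variable
raw = {"vehicle_age": 1.5, "policy_tenure": 5.7, "premium_change_pct": 2.8}
for name, influence in relative_influence(raw).items():
    print(f"{name:<22}{influence:6.1f}")
```

Because the scores are shares of a fixed total, adding or removing a predictor changes every other variable's influence, so values are best compared within one model rather than across models.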

Variable importance

Total competitor related influence = 9.3
Total price related influence (inc. comp) = 21.4

Variable type importance


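Cumulative gains curve and Gini

[Figure: cumulative gains curve. x-axis: % of policies, ordered from highest prediction to lowest prediction. ROC Area = 78%]

- Measured on validation data
- Order observations by model score
- Plot % of observations against % of target
- The Gini index is derived from the area under the curve (for a ROC curve, Gini = 2 × area − 1)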
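The relationship between ROC area and Gini can be checked on a toy example. The sketch below computes the ROC area as the probability that a randomly chosen event scores higher than a randomly chosen non-event, then applies Gini = 2 × area − 1. The scores and outcomes are invented, and the function names are hypothetical.

```python
def roc_area(scores: list[float], targets: list[int]) -> float:
    """Probability that a random event outranks a random non-event (ties count half)."""
    events = [s for s, t in zip(scores, targets) if t == 1]
    non_events = [s for s, t in zip(scores, targets) if t == 0]
    wins = sum(1.0 if e > n else 0.5 if e == n else 0.0
               for e in events for n in non_events)
    return wins / (len(events) * len(non_events))


def gini(scores: list[float], targets: list[int]) -> float:
    """Gini coefficient derived from the ROC area."""
    return 2.0 * roc_area(scores, targets) - 1.0


# Hypothetical validation scores and observed lapses (1 = lapsed)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
lapsed = [1, 1, 0, 1, 0, 0]
print(roc_area(scores, lapsed), gini(scores, lapsed))
```

By the same formula, a ROC area of 78% corresponds to a Gini of 56%.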


Decile chart

[Figure: decile chart. y-axis: lapse rate; x-axis: deciles ordered from highest prediction to lowest prediction]

- Order observations by model score and create 10 equal sized groups
- Compare the actual and predicted outcome on validation data
- Big separation between high/low decile is desirable
- Actual and predicted should be close
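Building the numbers behind a decile chart might look like the following sketch. It assumes there are at least as many observations as groups; the function name, field names and toy data are hypothetical.

```python
def decile_table(predicted: list[float], actual: list[int], groups: int = 10) -> list[dict]:
    """Order by model score (highest first) and average actual vs predicted per group."""
    ranked = sorted(zip(predicted, actual), key=lambda pair: pair[0], reverse=True)
    table = []
    for g in range(groups):
        start = g * len(ranked) // groups
        stop = (g + 1) * len(ranked) // groups
        chunk = ranked[start:stop]
        table.append({
            "group": g + 1,
            "predicted_rate": sum(p for p, _ in chunk) / len(chunk),
            "actual_rate": sum(a for _, a in chunk) / len(chunk),
        })
    return table


# Toy validation data: 20 policies, with lapses concentrated in the high scores
predicted = [i / 20 for i in range(20)]
actual = [1 if p >= 0.5 else 0 for p in predicted]
for row in decile_table(predicted, actual):
    print(row)
```

Plotting `predicted_rate` and `actual_rate` by group gives the chart: a steep fall from group 1 to group 10 shows separation, and two lines that track each other show calibration.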


Partial dependence

[Figure: partial dependence plot. x-axis: X-Variable; y-axis: Relative Impact]

- Measures the impact on predicted lapse / cancellation from a change in a single predictor
- The impact of all other predictors is held constant
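One common way to compute a partial dependence curve is to force the chosen predictor to each grid value for every observation, leave the other predictors as observed, and average the predictions. The sketch below follows that approach, which is an assumption about the method rather than something the slides spell out. The `toy_model` scoring function and its coefficients are invented stand-ins for a fitted GBM.

```python
def partial_dependence(model, rows: list[dict], feature: str,
                       grid: list[float]) -> list[float]:
    """Average prediction when `feature` is forced to each grid value in turn."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(model(row) for row in modified) / len(modified))
    return curve


def toy_model(row: dict) -> float:
    """Hypothetical scoring function standing in for a fitted GBM."""
    return 0.10 + 0.02 * row["premium_change_pct"] - 0.005 * row["tenure_years"]


rows = [{"premium_change_pct": 5.0, "tenure_years": 2.0},
        {"premium_change_pct": 0.0, "tenure_years": 8.0}]
curve = partial_dependence(toy_model, rows, "premium_change_pct", [0.0, 5.0, 10.0])
print(curve)
```

Because the other predictors keep their observed values, the curve isolates the effect of the one predictor while still reflecting the mix of policies in the data.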

Some key partial dependencies

A few more key partial dependencies


Competitive position partial dependencies

- Competitor ratio measured as: Competitor Premium Vs Insurer Premium
- Strength of impact on retention varies considerably across competitors

Interaction strength

- Key interactions (to whatever depth desired) can also be identified
- Strength measure
- Interactions with price and competitive position give us information about price elasticity

Variable 1              Variable 2                  Strength
Primary Insured Age     Primary Driver Age              0.49
Customer Variable 3     Primary Insured Age             0.24
Customer Variable 3     Multi Product Holdings          0.14
Customer Variable 1     Premium Change (%)              0.11
Customer Variable 3     Insurer Competitive Rank        0.11
Customer Variable 1     Policy Tenure                   0.11
Premium Change (%)      Primary Driver Age              0.10
Policy Tenure           Premium Change (%)              0.08
Premium Change (%)      Multi Product Holdings          0.08
Competitor 1 CPI        Technical Vehicle Risk          0.08

Motor annual lapse elasticity example

[Figure panels: Primary driver age; Policy duration]

Motor monthly cancellation: Relative price sensitivity example (primary driver age)

[Figure: x-axis runs from competitor cheaper to competitor more expensive]

Distribution of price elasticity: Motor attrition


Motor attrition price elasticity: 5% increase

CPI = Competitor vs Client


Motor attrition price elasticity: 5% decrease

CPI = Competitor vs Client
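A price test of this kind can be sketched as re-scoring a policy with the offered premium moved up or down by 5% and comparing the predicted attrition. Everything below is an assumption made for illustration: the elasticity definition (relative change in attrition probability per 1% change in premium), the logistic `toy_lapse_probability` model and its coefficients are all hypothetical, and none of it is taken from the charts.

```python
import math


def toy_lapse_probability(premium_change_pct: float) -> float:
    """Hypothetical logistic lapse model; the coefficients are made up."""
    return 1.0 / (1.0 + math.exp(-(-2.0 + 0.08 * premium_change_pct)))


def price_elasticity(model, premium_change_pct: float, shock_pct: float) -> float:
    """Relative change in lapse probability per 1% change in offered premium."""
    base = model(premium_change_pct)
    # A shock of s% on the offered premium moves the year-on-year change to
    # (1 + c)(1 + s) - 1, with c and s expressed as fractions.
    shocked_change = ((1 + premium_change_pct / 100)
                      * (1 + shock_pct / 100) - 1) * 100
    shocked = model(shocked_change)
    return ((shocked - base) / base) / (shock_pct / 100)


# Policy currently offered a 3% increase; test a further +5% and -5%
up = price_elasticity(toy_lapse_probability, 3.0, +5.0)
down = price_elasticity(toy_lapse_probability, 3.0, -5.0)
print(up, down)
```

Running the increase and the decrease separately matters because the response need not be symmetric, which is presumably why the two cases are charted on separate slides.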

Example segmentation: Renewal price increase elasticity


A couple of other things

Some other observations

- Start with a simple model: a simple linear relationship is harder for a GBM
- ‘Offsets’: can include in model statement
- Exploring different or mixed base learners: ‘linear’ for continuous, trees for discrete?

Predictive Power


Conclusion

In summary

- Fast / efficient
- Good predictive power
- Not a black box
- Multiple execution options


Questions?


Contact

Nelson Henwood
Director
Tel: +61 2 8252 3460
Email: [email protected]


Distribution & use

This presentation has been prepared for the Finity Consulting Pricing & Analytics Seminar, held on 18 October 2016. It is not intended, nor necessarily suitable, for any other purpose.

Third parties should recognise that the furnishing of this presentation is not a substitute for their own due diligence and should place no reliance on this presentation or the data contained herein which would result in the creation of any duty or liability by Finity to the third party.

Reliances & limitations

Finity wishes it to be understood that the information presented at the Seminar is of a general nature and does not constitute actuarial advice or investment advice. While Finity has taken reasonable care in compiling the information presented, Finity does not warrant that the information provided is relevant to a particular reader’s situation, specific objectives or needs.

Finity does not have any responsibility to any attendee at the conference or to any other party arising from the content of this presentation. Before acting on any information provided by Finity in this presentation, readers should consider their own circumstances and their need for advice on the subject. Finity would be pleased to assist.