
Nelson Henwood

Pricing elasticity and demand modelling using GBMs

Today’s Presentation

- (Gentle) Introduction to GBMs
- GBMs vs Logistic Regression
- Modelling Process Overview
- Some Results
- A Couple of Other Observations
- Conclusions

The ‘gentle’ introduction

GBM stands for Gradient Boosting Machine:

- Gradient Descent: an optimisation algorithm for finding the local minimum of a function
- Boosting: a machine learning ensemble algorithm that combines weak learners into a single strong learner

In English

- Fit an ensemble model using an iterative process.
- At each step, introduce a weak learner to compensate for the shortcomings of the existing model.
- In gradient boosting, shortcomings are identified by negative gradients, also called pseudo-residuals (effectively residuals with a view on the error distribution).
- The shortcomings tell us how to improve our model: effectively, we are iteratively explaining the model errors and using this to improve our prediction.

GBM development timeline

- 1984: CART (Breiman, Friedman et al.)
- 1997: AdaBoost (Freund & Schapire)
- 1999: Greedy Function Approximation: A Gradient Boosting Machine and Stochastic Gradient Boosting (Friedman)
- 2000: TreeNet 1 released
- 2003: GBM R package released
- 2007: Extensions to the GBM R package: quantile regression
- 2012: Further extensions to the GBM R package: multinomial, t-distribution, pairwise
- 2014: XGBoost released

Key model parameters

- Base learner
- Base learner complexity
- Shrinkage
- Subsampling
- Stopping criteria
- Loss function
- Training fraction
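To make these concrete, here is a minimal sketch of how each parameter maps onto scikit-learn's GradientBoostingClassifier; the library choice and the specific values are illustrative assumptions, not the configuration used in the talk.

```python
from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(
    loss="log_loss",          # loss function: binomial deviance for a lapse yes/no target
    max_depth=3,              # base learner complexity: maximum tree depth
    min_samples_leaf=50,      # base learner complexity: minimum observations per node
    learning_rate=0.05,       # shrinkage
    subsample=0.5,            # subsampling: the "bag fraction"
    n_estimators=2000,        # upper bound on the number of boosting iterations
    validation_fraction=0.2,  # held out for stopping (the complement of the training fraction)
    n_iter_no_change=20,      # stopping criterion: halt once the validation score stalls
)
```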

Base learners

Decision trees are the most popular. Other possibilities:

- Linear models: ordinary linear regression, ridge-penalised linear regression, random effects
- Smooth models: P-splines, radial basis functions
- Other models: Markov random fields, wavelets, custom base-learner functions, mixed models

Base learner complexity

- Controls the underlying complexity of the base learners: a trade-off between overfitting and the ability to explain the underlying complexity
- Maximum tree depth: typically 2 or 3, up to 8
- Minimum observations per node: rule of thumb is 2-5% of the data

Shrinkage (learning rate)

- “Shrinks” the impact of each additional fitted base learner, reducing the size of the incremental steps
- If one of the boosting iterations turns out to be “erroneous”, its negative impact can easily be corrected in subsequent steps
- The smaller (closer to 0) the shrinkage parameter, the better the model generalises, but convergence takes longer
- It is better to improve a model by taking many small steps than by taking fewer, larger steps

Subsampling (bagging)

- At each learning iteration, only a random part of the training data is used to fit the next base learner, introducing a bit of randomness into the fitting procedure
- This random part is called the “bag fraction”; a typical value is 50%
- Improves generalisation and computation time

Stopping criteria: How many boosting iterations?

- Hold out a random sample
- k-fold cross-validation

Logic sequence summary

1. Initial estimate: $F_0(x) = \text{constant}$
2. Calculate pseudo-residuals: $r_m(x) = y - F_m(x)$
3. Select a random sample (e.g. a random 50%)
4. Build a decision tree $h_m(x)$ on the residuals, to approximate the residuals
5. Update the prediction: $F_{m+1}(x) = F_m(x) + \delta\,h_m(x)$, where $\delta$ is the learning rate

Iterate steps 2-5 until the fit deteriorates on hold-out data.
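As an illustration of this loop, here is a minimal from-scratch sketch in Python for a squared-error loss, where the pseudo-residuals are exactly $y - F_m(x)$ as in step 2 (the talk's lapse models use a binomial loss, and a fixed number of iterations stands in here for the hold-out stopping rule). All names are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_iter=100, learning_rate=0.1, bag_fraction=0.5, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    f0 = y.mean()                          # step 1: initial estimate F0 = constant
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_iter):
        residuals = y - pred               # step 2: pseudo-residuals r_m(x)
        idx = rng.choice(len(y), size=int(bag_fraction * len(y)), replace=False)  # step 3
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X[idx], residuals[idx])   # step 4: tree on the residuals
        trees.append(tree)
        pred += learning_rate * tree.predict(X)  # step 5: F_{m+1} = F_m + delta * h_m
    return f0, trees

def predict_gbm(f0, trees, X, learning_rate=0.1):
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```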

GBMs vs Logistic Regression

Traditional approaches vs machine learning

Modelling process
- Traditional: iterative; manually propose the model form, predictors etc., gather feedback, augment the hypothesis
- Machine learning: largely automated; the skill and imagination of the modeller is still relevant

Hypothesis testing
- Traditional: statistical inference
- Machine learning: empirical validation (more intuitive)

Assumptions
- Traditional: response distribution; the linear predictor is “correct”
- Machine learning: on par, but not required for inference

Predictive accuracy
- Traditional: usually good, with high effort expended
- Machine learning: good with low effort

Efficiency
- Traditional: low
- Machine learning: high

Data volumes required
- Traditional: more suitable with lower volumes or fewer predictors
- Machine learning: more suitable with high volumes and a large number of predictors

Common objections

“It’s a black box!”
- As transparent as GLMs: key drivers and interactions, shapes (relativities), range of predictions, high/low segments

“It’s expensive!” / “The software’s expensive!”
- More efficient and cheaper than GLMs (and typically more predictive): cost is ~20% of the GLM cost for retention modelling and ~30% for competitor deconstructions
- R is free, online courses are free, and many packages now offer R plug-ins

Common objections (continued)

“Prediction volatility (it’s not smooth)!”
- For continuous variables, use a monotonic function to ensure a smooth response (see the sketch after this list)
- Group levels of discrete variables (as with a GLM)

“Can’t implement the results!”
- Retention models are typically once removed from the customer-facing pricing
- Models can be scored using R / Python and used with SAS processes
- PMML execution in Radar / Earnix

“We lack internal knowledge!”
- Learn on the job
- Co-source a project with experts: get the models, scripts and knowledge transfer
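On the smoothness point, many GBM implementations let you impose monotonic constraints directly; a minimal sketch using scikit-learn's HistGradientBoostingClassifier on toy data (the library, feature names and data are assumptions, not from the talk):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Toy data with two hypothetical features: [premium_change_pct, vehicle_age]
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

# monotonic_cst: +1 forces the predicted lapse probability to be
# non-decreasing in premium change; 0 leaves vehicle age unconstrained.
model = HistGradientBoostingClassifier(monotonic_cst=[1, 0])
model.fit(X, y)
```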

Modelling process

Section outline

Pre-modelling
- Model segmentation
- Cancellations
- Feature engineering
- Other considerations

Modelling
- Technical model tuning
- Variable selection and ‘actuarial’ model tuning
- Time trends and scoring

Pre-modelling: Segmenting the modelling

- Class of business: Motor / Home; for homeowners, combined vs stand-alone products
- ‘Decrement’ type: cancellations vs lapses; cancellations that are really lapses; coverage downgrades
- Payment frequency: annual vs monthly
- Shoppers vs non-shoppers: if only we could segment the modelling this way; how can we approximate it?

Pre-modelling: Considering cancellations

Monthly ‘chunks’ for the latest data
- We need a year to fully expose a policy to cancellation, but we don’t want to use “old” data in our model
- Cut exposure into monthly chunks and use policy month as an explanatory variable

Influenceable vs non-influenceable
- Key drivers and price elasticity differ depending on the cancellation reason
- How good are your cancellation codes? Split the modelling if possible

Pricing vs customer management models
- Information emerges through the policy year (e.g. a claim or endorsement) but can’t be used for pricing
- For “customer management” models we can update the predictors each policy month

Pre-modelling: Feature engineering

“Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work.” (Wikipedia)

Feature groups:
- External data
- Price data
- Customer features
- “Behavioural” features

Pre-modelling: Some other considerations

- Peril-affected policies for homes
- Price change data
- How much data?
- Oversampling

Modelling: Tuning model parameters

Tune in turn:
1. Base learner complexity
2. Shrinkage
3. Bag fraction
4. Number of trees (stopping)

Criterion: best fit on validation data (ROC, deviance).
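A hedged sketch of this tuning step using cross-validated grid search in scikit-learn; the talk tunes the parameters in turn, but a single grid over toy data is shown here for brevity, and all values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the training data
X_train, y_train = make_classification(n_samples=2000, random_state=0)

param_grid = {
    "max_depth": [2, 3, 5, 8],           # 1. base learner complexity
    "learning_rate": [0.01, 0.05, 0.1],  # 2. shrinkage
    "subsample": [0.5, 0.8],             # 3. bag fraction
    "n_estimators": [200, 500],          # 4. number of trees
}
search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid,
    scoring="roc_auc",  # ROC as the validation criterion
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)
```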

Modelling: ‘Actuarial’ model tuning

- Key drivers
- Shapes
- Highly correlated variables
- No ‘cheating’
- ‘Execution ability’

Modelling: Time trends and scoring

- Time parameters in the model (sometimes multiple)
- Examine trends and interactions
- Align with, and challenge, the budget / forward forecast

Examples + Results

Finity GBM dashboard: Motor annual lapse model


Variable importance

- Relative contribution to the predictive power of the model, based on the number of times the variable appears in splits and the resulting model improvement
- Sums to 100 across all variables used in the underlying trees
- Main effects are not separated from interaction effects

Variable                                Influence
CU: Customer Variable 1                 15.0
CU: Policy Tenure                       11.4
CU: Customer Variable 3                  6.7
PR: Premium Change (%)                   5.6
BE: Payment Delay                        4.1
PR: Premium Rate                         4.0
PO: ABS Region                           3.6
PO: Vehicle Age                          3.5
CP: Competitor 1 CPI                     3.1
PO: Insured Age                          2.9
CP: Rank Insurer                         2.3
TI: Renewal Offer Month                  2.3
PO: Policy Variable 4                    1.9
PO: Policy Variable 5                    1.8
CU: Customer Variable 4                  1.7
CP: Competitor 2 CPI                     1.5
PR: Premium Change (%) Prior Renewal     1.4
BE: Behavioural 2                        1.1
PR: Premium 3                            1.1
CP: Competitor 3 CPI                     1.0

Variable importance (annotated)

From the same table: total competitor-related influence = 9.3; total price-related influence (including competitors) = 21.4.
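For reference, a sketch of how such a table can be pulled from a fitted scikit-learn GBM; `model` is assumed to be a fitted GradientBoostingClassifier (such as the one configured earlier, once fitted) and `feature_names` an assumed list of predictor names:

```python
import pandas as pd

# model: a fitted GradientBoostingClassifier; feature_names: predictor names
importance = (
    pd.Series(model.feature_importances_ * 100, index=feature_names)
      .sort_values(ascending=False)
)
print(importance.head(20))  # top 20; importances sum to 100 across all variables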

Variable type importance


Cumulative gains curve and Gini

[Figure: cumulative gains curve on % of policies, ordered from highest to lowest prediction; ROC area = 78%]

- Measured on validation data
- Order observations by model score
- Plot the % of observations against the % of the target
- The Gini index is the area under the curve
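A minimal sketch of the underlying calculation (scikit-learn assumed; `y_val` and `p_val` stand for the actual outcomes and predicted probabilities on the validation set):

```python
from sklearn.metrics import roc_auc_score

auc = roc_auc_score(y_val, p_val)
gini = 2 * auc - 1  # one common normalisation of the area measure
print(f"ROC area = {auc:.0%}, Gini = {gini:.0%}")
```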

Decile chart

[Figure: actual vs predicted lapse rates by model-score decile, from highest to lowest prediction]

- Order observations by model score and create 10 equal-sized groups
- Compare the actual and predicted outcome on validation data
- A big separation between the high and low deciles is desirable
- Actual and predicted should be close
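A sketch of how the decile comparison can be computed (pandas assumed; `p_val` and `y_val` as above):

```python
import pandas as pd

df = pd.DataFrame({"pred": p_val, "actual": y_val})
# Rank first so ties don't break the 10 equal-sized groups
df["decile"] = pd.qcut(df["pred"].rank(method="first"), 10, labels=False) + 1
decile_chart = df.groupby("decile")[["actual", "pred"]].mean()
print(decile_chart)  # actual vs predicted lapse rate in each decile
```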


Partial dependence

[Figure: partial dependence plots of relative impact against each x-variable]

- Measures the impact on the predicted lapse / cancellation rate of a change in a single predictor
- The impacts of all other predictors are held constant
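scikit-learn exposes this measure directly; a sketch, with `model` as before, `X_val` a validation DataFrame, and the column names below purely hypothetical:

```python
from sklearn.inspection import PartialDependenceDisplay

# Partial dependence of the predicted lapse rate on two assumed predictors
PartialDependenceDisplay.from_estimator(
    model, X_val, features=["premium_change_pct", "policy_tenure"]
)
```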

Some key partial dependencies


A few more key partial dependencies


Competitive position partial dependencies

- Competitor ratio measured as competitor premium vs insurer premium
- The strength of the impact on retention varies considerably across competitors

Interaction strength

- Key interactions (to whatever depth desired) can also be identified via a strength measure
- Interactions with price and competitive position give us information about price elasticity

Variable 1 Variable 2 Strength

Primary Insured Age Primary Driver Age 0.49

Customer Variable 3 Primary Insured Age 0.24

Customer Variable 3 Multi Product Holdings 0.14

Customer Variable 1 Premium Change (%) 0.11

Customer Variable 3 Insurer Competitive Rank 0.11

Customer Variable 1 Policy Tenure 0.11

Premium Change (%) Primary Driver Age 0.10

Policy Tenure Premium Change (%) 0.08

Premium Change (%) Multi Product Holdings 0.08

Competitor 1 CPI Technical Vehicle Risk 0.08

Motor annual lapse elasticity example

[Figures: elasticity panels by primary driver age and by policy duration]

Motor monthly cancellation: Relative price sensitivity example (primary driver age)

[Figure: sensitivity from competitor cheaper through to competitor more expensive]

Distribution of price elasticity: Motor attrition

[Figures: elasticity distributions for a 5% premium increase and a 5% decrease; CPI = competitor vs client]

Example segmentation: Renewal price increase elasticity
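The talk does not show how these elasticities are derived; a common approach, sketched here under that assumption, is to re-score the fitted model with the renewal premium change shifted and compare predicted lapse rates (`model` and `X_val` as before; the column name is hypothetical):

```python
def lapse_rate_at(model, X, shift):
    """Predicted lapse probabilities with the premium change shifted by `shift` points."""
    X_shifted = X.copy()
    X_shifted["premium_change_pct"] += shift
    return model.predict_proba(X_shifted)[:, 1]

base = lapse_rate_at(model, X_val, 0.0)
up5 = lapse_rate_at(model, X_val, 5.0)  # renewal premium 5 points higher
sensitivity = (up5 - base) / 5.0        # change in lapse probability per 1% of price
```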

A couple of other things

Some other observations

- Start with a simple model: a simple linear relationship is harder for a GBM
- ‘Offsets’: can be included in the model statement
- Exploring different or mixed base learners: ‘linear’ for continuous variables, trees for discrete?

Predictive power

Conclusion

In summary

- Fast / efficient
- Good predictive power
- Not a black box
- Multiple execution options

Questions?

Contact
Nelson Henwood, Director
Tel: +61 2 8252 3460
Email: nelson.henwood@finity.com.au

Distribution & use

This presentation has been prepared for the Finity Consulting Pricing & Analytics Seminar, held on 18 October 2016. It is not intended, nor necessarily suitable, for any other purpose.

Third parties should recognise that the furnishing of this presentation is not a substitute for their own due diligence and should place no reliance on this presentation or the data contained herein which would result in the creation of any duty or liability by Finity to the third party.

Reliances & limitations

Finity wishes it to be understood that the information presented at the Seminar is of a general nature and does not constitute actuarial advice or investment advice. While Finity has taken reasonable care in compiling the information presented, Finity does not warrant that the information provided is relevant to a particular reader’s situation, specific objectives or needs.

Finity does not have any responsibility to any attendee at the conference or to any other party arising from the content of this presentation. Before acting on any information provided by Finity in this presentation, readers should consider their own circumstances and their need for advice on the subject; Finity would be pleased to assist.
