Supervised multivariate discretization and levels merging for logistic regression Adrien Ehrhardt 1,2 Christophe Biernacki 2 Philippe Heinrich 3,4 Vincent Vandewalle 2,3 1 Crédit Agricole Consumer Finance 2 Inria Lille - Nord-Europe 3 Université de Lille 4 CNRS 31/08/2018 1/34


Supervised multivariate discretization and levels merging for logistic regression

Adrien Ehrhardt (1,2), Christophe Biernacki (2), Philippe Heinrich (3,4), Vincent Vandewalle (2,3)

(1) Crédit Agricole Consumer Finance
(2) Inria Lille - Nord-Europe
(3) Université de Lille
(4) CNRS

31/08/2018


Table of Contents

Context and basic notations

Supervised multivariate discretization and factor levels grouping

Interactions in logistic regression

Conclusion and future work


Context and basic notations


Current practice

Job                    Home       Time in job  Family status  Wages  Score  Repayment
Craftsman              Owner      20           Widower        2000   225    0
?                      Renter     10           Common-law     1700   190    0
Licensed professional  Starter    5            Divorced       4000   218    1
Executive              By work    8            Single         2700   202    1
Office employee        Renter     12           Married        1400   205    0
Worker                 By family  2            ?              1200   192    0

Table: Dataset with outliers and missing values.

1. Feature selection
2. Discretization / grouping
3. Interaction screening
4. Logistic regression fitting


Job                    Family status  Wages  Score  Repayment
Craftsman              Widower        2000   225    0
?                      Common-law     1700   190    0
Licensed professional  Divorced       4000   218    1
Executive              Single         2700   202    1
Office employee        Married        1400   205    0
Worker                 ?              1200   192    0

Table: Dataset with outliers and missing values.

Job                    Family status  Wages        Score  Repayment
Craftsman              Widower        ]1500;2000]  225    0
?                      Common-law     ]1500;2000]  190    0
Licensed professional  Divorced       ]2000;∞[     218    1
Executive              Single         ]2000;∞[     202    1
Office employee        Married        ]-∞;1500]    205    0
Worker                 ?              ]-∞;1500]    192    0

Table: Dataset with outliers and missing values.

Job              Family status  Wages        Score  Repayment
?+Low-qualified  ?+Alone        ]1500;2000]  225    0
?+Low-qualified  Union          ]1500;2000]  190    0
High-qualified   ?+Alone        ]2000;∞[     218    1
High-qualified   ?+Alone        ]2000;∞[     202    1
?+Low-qualified  Union          ]-∞;1500]    205    0
?+Low-qualified  ?+Alone        ]-∞;1500]    192    0

Table: Dataset with outliers and missing values.

Job              Family status x Wages  Score  Repayment
?+Low-qualified  ?+Alone x ]1500;2000]  225    0
?+Low-qualified  Union x ]1500;2000]    190    0
High-qualified   ?+Alone x ]2000;∞[     218    1
High-qualified   ?+Alone x ]2000;∞[     202    1
?+Low-qualified  Union x ]-∞;1500]      205    0
?+Low-qualified  ?+Alone x ]-∞;1500]    192    0

Table: Dataset with outliers and missing values.


Mathematical reinterpretation

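The whole process can be decomposed into two steps:

\[
\begin{array}{ccccc}
\mathcal{X} & \to & \mathcal{E} & \to & \mathcal{Y} \\
\boldsymbol{x} & \mapsto & \boldsymbol{e} = \boldsymbol{f}(\boldsymbol{x}) & \mapsto & y
\end{array}
\]

Selected features:

\[ \boldsymbol{x} = \Big( \underbrace{(x_j)_1^{d_1}}_{\in \mathbb{R}},\ \underbrace{(x_j)_{d_1+1}^{d}}_{\in \{1, \dots, o_j\}} \Big). \]

$\boldsymbol{f}$ must be “simple” and “component-wise”, i.e. $\boldsymbol{f} = (f_j)_1^d$.

We restrict to discretization and grouping of factor levels.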


Mathematical reinterpretation: Feature Engineering

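[Figure: the $x_j$ axis split into three consecutive intervals, labelled $f_j(x_j) = 1$, $f_j(x_j) = 2$ and $f_j(x_j) = 3$.]

Discretization ($1 \le j \le d_1$)

Into $m$ intervals with associated cutpoints $\boldsymbol{c} = (c_1, \dots, c_{m-1})$.

Discretization function

\[ f_j(\cdot\,; \boldsymbol{c}, m) : \mathbb{R} \to \{1, \dots, m\} \]

\[ x \mapsto \mathbf{1}_{]-\infty; c_1]}(x) + \sum_{k=1}^{m-2} (k+1)\, \mathbf{1}_{]c_k; c_{k+1}]}(x) + m\, \mathbf{1}_{]c_{m-1}; \infty[}(x) \]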
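As an illustration, the discretization function can be sketched in a few lines of Python. The function name `discretize` is hypothetical; the wages and the cutpoints 1500 and 2000 are taken from the "Current practice" table, and the intervals are right-closed as in the formula.

```python
from bisect import bisect_left

def discretize(x, cutpoints):
    """Return the interval index in {1, ..., m} of x, for sorted cutpoints c_1 < ... < c_{m-1}."""
    # bisect_left counts the cutpoints strictly below x, so x == c_k lands in
    # the right-closed interval ]c_{k-1}; c_k], as in the indicator sum.
    return bisect_left(cutpoints, x) + 1

wages = [2000, 1700, 4000, 2700, 1400, 1200]
cutpoints = [1500, 2000]  # m = 3 intervals
levels = [discretize(w, cutpoints) for w in wages]
print(levels)  # [2, 2, 3, 3, 1, 1]
```

Levels 1, 2 and 3 correspond to ]-∞;1500], ]1500;2000] and ]2000;∞[ respectively.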


Mathematical reinterpretation: Feature Engineering

[Figure: the levels $x_j = 1, 2, 3, 4, 5$ mapped onto the groups $f_j(x_j) = 1, 2$.]

Grouping ($d_1 < j \le d$)

Grouping $o$ values into $m$, $m \le o$.

Grouping function

\[ f_j : \{1, \dots, o\} \to \{1, \dots, m\} \]

$f_j$ surjective: it defines a partition of $\{1, \dots, o\}$ in $m$ elements.
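A grouping function can be represented as a plain lookup table. The sketch below is only illustrative: the helper names and the particular assignment of 5 levels to 2 groups are assumptions, not taken from the slides. It checks surjectivity and recovers the induced partition.

```python
def make_grouping(assignment, m):
    """Build f_j from a dict {level: group}; raise if it is not onto {1, ..., m}."""
    if set(assignment.values()) != set(range(1, m + 1)):
        raise ValueError("grouping must be surjective onto {1, ..., m}")
    return lambda level: assignment[level]

def partition(assignment):
    """Return the blocks of the partition induced by the grouping, keyed by group."""
    blocks = {}
    for level, group in sorted(assignment.items()):
        blocks.setdefault(group, []).append(level)
    return blocks

# Hypothetical grouping of o = 5 levels into m = 2 groups.
assignment = {1: 1, 2: 1, 3: 2, 4: 1, 5: 2}
f_j = make_grouping(assignment, m=2)
print(f_j(3))                 # 2
print(partition(assignment))  # {1: [1, 2, 4], 2: [3, 5]}
```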


Mathematical reinterpretation: Objective

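Target feature $y \in \{0, 1\}$ must be predicted given engineered features $\boldsymbol{f}(\boldsymbol{x}) = (f_j(x_j))_1^d$.

We restrict to binary logistic regression.

On “raw” data, logistic regression yields:

\[ \operatorname{logit}\big(p_{\boldsymbol{\theta}_{\text{raw}}}(1 \mid \boldsymbol{x})\big) = \theta_0 + \sum_{j=1}^{d_1} \theta_j x_j + \sum_{j=d_1+1}^{d} \theta_j^{x_j} \]

On discretized / grouped data, logistic regression yields:

\[ \operatorname{logit}\big(p_{\boldsymbol{\theta}_{\boldsymbol{f}}}(1 \mid \boldsymbol{f}(\boldsymbol{x}))\big) = \theta_0 + \sum_{j=1}^{d} \theta_j^{f_j(x_j)} \]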
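The second formula says that each feature contributes the coefficient attached to its current level. A minimal sketch, assuming hypothetical coefficient values and a list-of-lists layout where `theta[j][k-1]` stores $\theta_j^k$:

```python
import math

def logit_discretized(e, theta0, theta):
    """theta[j][k-1] holds the coefficient of level k of feature j; e[j] is f_j(x_j)."""
    return theta0 + sum(theta_j[e_j - 1] for theta_j, e_j in zip(theta, e))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical coefficients for d = 2 features with m = (3, 2) levels.
theta0 = -1.0
theta = [[0.0, 0.5, 1.5], [0.0, -0.7]]
e = (3, 2)  # f(x) for one observation
score = logit_discretized(e, theta0, theta)
print(round(score, 6))           # -0.2
print(round(sigmoid(score), 4))  # 0.4502
```

Setting the first coefficient of each feature to 0 is one common identifiability convention; the slides do not state which one is used.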


Example

True data

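\[ \operatorname{logit}\big(p_{\text{true}}(1 \mid x)\big) = \ln\left( \frac{p_{\text{true}}(1 \mid x)}{1 - p_{\text{true}}(1 \mid x)} \right) = \sin\big((x_1 - 0.7) \times 7\big) \]

Figure: True relationship between predictor and outcome ($x$ from 0 to 1 on the horizontal axis, $\operatorname{logit}(p(1 \mid x))$ from -1 to 1 on the vertical axis; curve: true distribution).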

Logistic regression on “raw” data:

\[ \operatorname{logit}\big(p_{\boldsymbol{\theta}_{\text{raw}}}(1 \mid x)\big) = \theta_0 + \theta_1 x_1 \]

Figure: Linear logistic regression fit ($x$ against $\operatorname{logit}(p(1 \mid x))$; curves: true distribution, linear logistic regression).

Logistic regression on discretized data, if $\boldsymbol{f}$ is not carefully chosen:

\[ \operatorname{logit}\big(p_{\boldsymbol{\theta}_{\boldsymbol{f}}}(1 \mid \boldsymbol{f}(x))\big) = \theta_0 + \underbrace{\theta_1^{f_1(x_1)}}_{\theta_1^1, \dots, \theta_1^{50}} \]

Figure: Bad (high variance) discretization ($x$ against $\operatorname{logit}(p(1 \mid x))$; curves: true distribution, bad discretization).

Logistic regression on discretized data, if $\boldsymbol{f}$ is carefully chosen:

\[ \operatorname{logit}\big(p_{\boldsymbol{\theta}_{\boldsymbol{f}}}(1 \mid \boldsymbol{f}(x))\big) = \theta_0 + \underbrace{\theta_1^{f_1(x_1)}}_{\theta_1^1, \dots, \theta_1^{3}} \]

Figure: Good (bias/variance tradeoff) discretization ($x$ against $\operatorname{logit}(p(1 \mid x))$; curves: true distribution, good discretization).


Criterion

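$\boldsymbol{\theta}$ can be estimated for each discretization $\boldsymbol{f}$, and $\boldsymbol{f}^\star$ can be chosen through our favorite model choice criterion: BIC, AIC, ...

A model selection problem

\[ (\boldsymbol{f}^\star, \boldsymbol{\theta}^\star) = \operatorname*{argmax}_{\boldsymbol{f} \in \mathcal{F},\ \boldsymbol{\theta} \in \Theta_{\boldsymbol{f}}} \sum_{i=1}^{n} \ln p_{\boldsymbol{\theta}}(y_i \mid \boldsymbol{f}(\boldsymbol{x}_i)) - \text{penalty}(n; \boldsymbol{\theta}) \]

How to efficiently explore $\mathcal{F}$?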
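To make the criterion concrete, here is a small sketch on data simulated from the sine example. Several choices are assumptions made for illustration: equal-width bins on [0, 1], a BIC-type penalty of (m/2) ln n, and the shortcut that, with a single discretized feature, the maximum likelihood fit in each bin is the empirical frequency of y = 1.

```python
import math
import random

def simulate(n, seed=0):
    """Draw x ~ U(0, 1) and y with logit p(1|x) = sin((x - 0.7) * 7)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [int(rng.random() < 1 / (1 + math.exp(-math.sin((x - 0.7) * 7)))) for x in xs]
    return xs, ys

def bic_equal_width(xs, ys, m):
    """Return (penalized log-likelihood, log-likelihood) for m equal-width bins; higher is better."""
    ones, total = [0] * m, [0] * m
    for x, y in zip(xs, ys):
        k = min(int(x * m), m - 1)
        ones[k] += y
        total[k] += 1
    loglik = 0.0
    for s, t in zip(ones, total):
        # Per-bin MLE is s / t; bins with s == 0 or s == t contribute 0 (0 * ln 0 := 0).
        if 0 < s < t:
            loglik += s * math.log(s / t) + (t - s) * math.log(1 - s / t)
    return loglik - 0.5 * m * math.log(len(xs)), loglik

xs, ys = simulate(2000)
for m in (1, 3, 50):
    crit, loglik = bic_equal_width(xs, ys, m)
    print(m, round(loglik, 1), round(crit, 1))
```

The log-likelihood can only increase when bins are refined, which is why the penalty is needed to compare discretizations with different numbers of intervals.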


Exploring F

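Example of discretization

The “functional” space $\mathcal{F}$ where $\boldsymbol{f}$ lives is continuous:

[Figure: the $x_j$ axis split into intervals labelled $f_j(x_j) = 1$, $f_j(x_j) = 2$, ..., $f_j(x_j) = m_j$.]

However, for a fixed design $\mathbf{x} = (\boldsymbol{x}_i)_1^n$ there is a countable space $\tilde{\mathcal{F}}$, obtained by identifying discretizations that agree on the sample: $\boldsymbol{f} \,\mathcal{R}\, \boldsymbol{g} \Leftrightarrow \forall i, j,\ f_j(x_{i,j}) = g_j(x_{i,j})$.

\[ (\boldsymbol{f}^\star, \boldsymbol{\theta}^\star) = \operatorname*{argmax}_{\boldsymbol{f} \in \tilde{\mathcal{F}},\ \boldsymbol{\theta} \in \Theta_{\boldsymbol{f}}} \sum_{i=1}^{n} \ln p_{\boldsymbol{\theta}}(y_i \mid \boldsymbol{f}(\boldsymbol{x}_i)) - \text{penalty}(n; \boldsymbol{\theta}) \]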
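The equivalence relation can be checked directly on a sample: two cutpoints falling in the same gap between consecutive observed values give identical labels, so one representative per gap suffices. The sketch below uses hypothetical helper names and a toy sample; taking midpoints as representatives is a convention chosen here, not something stated in the slides.

```python
def candidate_cutpoints(sample):
    """One representative cutpoint per gap between consecutive distinct observed values."""
    values = sorted(set(sample))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]

def labels(sample, cutpoint):
    """Two-interval discretization: 1 if x <= cutpoint, else 2."""
    return tuple(1 if x <= cutpoint else 2 for x in sample)

sample = [0.1, 0.4, 0.4, 0.9]
print(candidate_cutpoints(sample))
print(labels(sample, 0.5) == labels(sample, 0.8))  # True: same equivalence class
```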


[Figure: three successive overlays in which the $x_j$ axis is split into two intervals, first by $f_j$ ($f_j(x_j) = 1$, $f_j(x_j) = 2$), then by $g_j$, then by $h_j$.]


State of the art

Current academic methods:

A lot of existing heuristics, see [Ramírez-Gallego et al., 2016]:

Most of these methods are:

  • univariate,
  • based on test statistics that are more or less justified ($\chi^2$-based).


Supervised multivariate discretization and factor levels grouping


Mathematical formalization

Discretized / grouped $x_j$, denoted by $e_j$, has been seen up to now as the result of a function of $x_j$:

\[ e_j = f_j(x_j). \]

Discretization / grouping $e_j$ can be seen as a latent random variable for which

\[ p(e_j \mid x_j) = \underbrace{\mathbf{1}_{e_j}(f_j(x_j))}_{\text{Heaviside-like function, difficult to optimize}}. \]

Suppose for now that $\boldsymbol{m} = (m_j)_1^d$ is fixed.

\[ \boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}} = \{1, \dots, m_1\} \times \dots \times \{1, \dots, m_d\}. \]


Mathematical formalization

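Model selection criterion

We want the “best” model $p_{\boldsymbol{\theta}^\star}(y \mid \boldsymbol{e}^\star)$, where $\boldsymbol{\theta}^\star$ is the maximum likelihood estimator and $\boldsymbol{e}^\star$ is determined by AIC, BIC, ...

\[ (\boldsymbol{e}^\star, \boldsymbol{\theta}^\star) = \operatorname*{argmax}_{\boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}},\ \boldsymbol{\theta} \in \Theta_{\boldsymbol{m}}} \sum_{i=1}^{n} \ln p_{\boldsymbol{\theta}}(y_i \mid \boldsymbol{e}_i) - \text{penalty}(n; \boldsymbol{\theta}) \]

$\mathcal{E}_{\boldsymbol{m}}$ is still too big, so there is a need for a “path” in $\mathcal{E}_{\boldsymbol{m}}$.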
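A quick count shows how fast the Cartesian product grows with the number of features. The values of m below are hypothetical and the function name is chosen for illustration.

```python
import math

def size_of_Em(m):
    """Number of elements of E_m = {1..m_1} x ... x {1..m_d}."""
    return math.prod(m)

print(size_of_Em([3] * 10))  # 59049
print(size_of_Em([5] * 20))  # 95367431640625
```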


First set of hypotheses

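H1: implicit hypothesis of every discretization:

Predictive information about $y$ in $\boldsymbol{x}$ is “squeezed” in $\boldsymbol{e}$, i.e. $p_{\text{true}}(y \mid \boldsymbol{x}, \boldsymbol{e}) = p_{\text{true}}(y \mid \boldsymbol{e})$.

H2: conditional independence:

Conditional independence of $e_j \mid x_j$ with other features $x_k$, $k \neq j$.

Figure: Dependence structure between $x_j$, $e_j$ and $y$ (nodes $x_1, x_j, x_d$, $e_1, e_j, e_d$ and $y$; edges labelled $f_1$, $f_j$, $f_d$).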


Proposal: continuous relaxation

H3: link between $x_j$ and $e_j$:

Continuous relaxation of a discrete problem (cf. neural nets)

Continuous features: relaxation of the “hard” discretization

Link between $e_j$ and $x_j$ is supposed to be polytomous logistic:

\[ p_{\boldsymbol{\alpha}_j}(e_j \mid x_j). \]

Categorical features: relaxation of the grouping problem

A simple contingency table is used:

\[ p_{\boldsymbol{\alpha}_j}(e_j = k \mid x_j = \ell) = \alpha_j^{k,\ell}. \]
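For a continuous feature, a polytomous (multinomial) logistic link is a softmax over one linear score per level. The sketch below assumes an (intercept, slope) pair per level and made-up parameter values; the slides do not specify the parametrization.

```python
import math

def polytomous_probs(x, alpha):
    """p(e_j = k | x_j = x) for k = 1..m; alpha[k-1] = (intercept, slope) of level k."""
    scores = [a0 + a1 * x for a0, a1 in alpha]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters for m = 3 levels.
alpha = [(0.0, 0.0), (-2.0, 5.0), (-8.0, 12.0)]
for x in (0.1, 0.6, 0.95):
    probs = polytomous_probs(x, alpha)
    print(x, [round(p, 3) for p in probs])
```

With these values the most probable level moves from 1 to 2 to 3 as x grows, which is the soft counterpart of two cutpoints.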


Intuitions about how it works: model proposal

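\begin{align*}
p(y \mid \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\alpha}) &= \sum_{\boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}}} p(y \mid \boldsymbol{x}, \boldsymbol{e})\, p(\boldsymbol{e} \mid \boldsymbol{x}) \\
&= \sum_{\boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}}} p(y \mid \boldsymbol{e}) \prod_{j=1}^{d} p(e_j \mid x_j) \\
&= \sum_{\boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}}} \underbrace{p_{\boldsymbol{\theta}_{\boldsymbol{e}}}(y \mid \boldsymbol{e})}_{\text{logistic}} \prod_{j=1}^{d} \underbrace{p_{\boldsymbol{\alpha}_j}(e_j \mid x_j)}_{\text{logistic or table}} \\
&\approx p_{\boldsymbol{\theta}^\star}(y \mid \boldsymbol{e}^\star)
\end{align*}

Subsequently, it is equivalent to “optimize” $p(y \mid \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\alpha})$.

\[ \max_{\boldsymbol{\theta}, \boldsymbol{e}} p_{\boldsymbol{\theta}}(y \mid \boldsymbol{e}) \simeq \max_{\boldsymbol{\theta}, \boldsymbol{\alpha}} p(y \mid \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\alpha}) \]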
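For a tiny problem the sum over all latent configurations can be enumerated explicitly, which also shows why the mixture collapses to a single logistic term when the link probabilities are degenerate. All names and numbers below are hypothetical; `p_e_given_x[j][k-1]` stands for the link probability of level k of feature j, already evaluated at x.

```python
import itertools
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def p_y_given_e(y, e, theta0, theta):
    """Logistic model p_theta(y | e); theta[j][k-1] is the coefficient of level k of e_j."""
    p1 = sigmoid(theta0 + sum(t[k - 1] for t, k in zip(theta, e)))
    return p1 if y == 1 else 1.0 - p1

def marginal(y, p_e_given_x, theta0, theta):
    """Sum over e in E_m of p_theta(y | e) * prod_j p(e_j | x_j)."""
    levels = [range(1, len(p_j) + 1) for p_j in p_e_given_x]
    total = 0.0
    for e in itertools.product(*levels):
        weight = math.prod(p_j[k - 1] for p_j, k in zip(p_e_given_x, e))
        total += p_y_given_e(y, e, theta0, theta) * weight
    return total

theta0, theta = -1.0, [[0.0, 0.5, 1.5], [0.0, -0.7]]
soft = [[0.1, 0.8, 0.1], [0.3, 0.7]]   # relaxed link probabilities
hard = [[0.0, 1.0, 0.0], [0.0, 1.0]]   # degenerate (indicator) link
print(round(marginal(1, soft, theta0, theta), 4))
print(marginal(1, hard, theta0, theta) == p_y_given_e(1, (2, 2), theta0, theta))  # True
```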


Intuitions about how it works: estimation

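“Classical” estimation strategy with latent variables: EM algorithm.

There would still be a sum over $\mathcal{E}_{\boldsymbol{m}}$:

\[ p(y \mid \boldsymbol{x}, \boldsymbol{\theta}, \boldsymbol{\alpha}) = \sum_{\boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}}} p_{\boldsymbol{\theta}}(y \mid \boldsymbol{e}) \prod_{j=1}^{d} p_{\boldsymbol{\alpha}_j}(e_j \mid x_j) \]

Use a Stochastic-EM! Draw $\boldsymbol{e}$ knowing that:

\[ p(\boldsymbol{e} \mid \boldsymbol{x}, y) = \frac{p_{\boldsymbol{\theta}}(y \mid \boldsymbol{e}) \prod_{j=1}^{d} p_{\boldsymbol{\alpha}_j}(e_j \mid x_j)}{\underbrace{\sum_{\boldsymbol{e} \in \mathcal{E}_{\boldsymbol{m}}} p_{\boldsymbol{\theta}}(y \mid \boldsymbol{e}) \prod_{j=1}^{d} p_{\boldsymbol{\alpha}_j}(e_j \mid x_j)}_{\text{still difficult to calculate}}} \]

Gibbs-sampling step:

\[ p(e_j \mid \boldsymbol{x}, y, \boldsymbol{e}_{\{-j\}}) \propto p_{\boldsymbol{\theta}}(y \mid \boldsymbol{e})\, p_{\boldsymbol{\alpha}_j}(e_j \mid x_j) \]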
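The Gibbs step only needs the unnormalized weights for the m_j possible values of one coordinate, so the full sum over all configurations is never formed. A minimal sketch for a single observation, with hypothetical parameter values and function names:

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def gibbs_draw_ej(j, e, y, p_ej_given_xj, theta0, theta, rng):
    """Draw e_j with probability proportional to p_theta(y | e) * p_alpha_j(e_j | x_j)."""
    weights = []
    for k in range(1, len(p_ej_given_xj) + 1):
        candidate = list(e)
        candidate[j] = k  # all other coordinates e_{-j} stay fixed
        p1 = sigmoid(theta0 + sum(t[c - 1] for t, c in zip(theta, candidate)))
        weights.append((p1 if y == 1 else 1.0 - p1) * p_ej_given_xj[k - 1])
    # random.choices normalizes the weights itself.
    return rng.choices(range(1, len(weights) + 1), weights=weights)[0]

rng = random.Random(0)
theta0, theta = -1.0, [[0.0, 0.5, 1.5], [0.0, -0.7]]
e = [1, 2]
e[0] = gibbs_draw_ej(0, e, y=1, p_ej_given_xj=[0.1, 0.8, 0.1],
                     theta0=theta0, theta=theta, rng=rng)
print(e)
```

In the full procedure this draw would be repeated for every observation and every feature, alternating with refits of the logistic and link models.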


Algorithm

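Initialization

$$\begin{pmatrix} x_{1,1} & \cdots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,d} \end{pmatrix} \overset{\text{at random}}{\Longrightarrow} \begin{pmatrix} e_{1,1} & \cdots & e_{1,d} \\ \vdots & \ddots & \vdots \\ e_{n,1} & \cdots & e_{n,d} \end{pmatrix}$$

Loop

$$\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \overset{\text{logistic regression}}{\Longrightarrow} \begin{pmatrix} e_{1,1} & \cdots & e_{1,d} \\ \vdots & \ddots & \vdots \\ e_{n,1} & \cdots & e_{n,d} \end{pmatrix} \overset{\text{polytomous regression}}{\Longrightarrow} \begin{pmatrix} x_{1,1} & \cdots & x_{1,d} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,d} \end{pmatrix}$$

Updating $\boldsymbol{e}$

$$\begin{pmatrix} p(y_1, e_{1,j} = k \mid \boldsymbol{x}_1) \\ \vdots \\ p(y_n, e_{n,j} = k \mid \boldsymbol{x}_n) \end{pmatrix} \overset{\text{random sampling}}{\Longrightarrow} \begin{pmatrix} e_{1,j} \\ \vdots \\ e_{n,j} \end{pmatrix}$$

Calculating $\boldsymbol{e}_{\mathrm{MAP}}$

$$\begin{pmatrix} e_{\mathrm{MAP},1,j} \\ \vdots \\ e_{\mathrm{MAP},n,j} \end{pmatrix} \overset{\text{MAP estimate}}{=} \begin{pmatrix} \operatorname*{argmax}_{e_j} p_{\boldsymbol{\alpha}_j}(e_j \mid x_{1,j}) \\ \vdots \\ \operatorname*{argmax}_{e_j} p_{\boldsymbol{\alpha}_j}(e_j \mid x_{n,j}) \end{pmatrix}$$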


Go back to “hard” thresholding: MAP estimation

[Figure: the curves $p(1 \mid x_1)$, $p(2 \mid x_1)$ and $p(3 \mid x_1)$ plotted against $x_1$; the axis ticks at −0.7 and 1 delimit the regions $e_{\mathrm{MAP},1} = 1$, $e_{\mathrm{MAP},1} = 2$ and $e_{\mathrm{MAP},1} = 3$.]
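A small sketch of how the MAP rule turns "soft" level probabilities back into hard cut points. The softmax (polytomous logistic) form is the one named in the slides, but the intercepts and slopes below are hypothetical values, chosen only so that the cut points fall at −0.7 and 1 as in the figure.

```python
import math


def softmax_probs(x, intercepts, slopes):
    """p_alpha(e = k | x) for a polytomous logistic model with linear scores."""
    scores = [a + b * x for a, b in zip(intercepts, slopes)]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]


def e_map(x, intercepts, slopes):
    """MAP level (numbered from 1): argmax_k p_alpha(e = k | x)."""
    probs = softmax_probs(x, intercepts, slopes)
    return 1 + max(range(len(probs)), key=probs.__getitem__)


# Hypothetical parameters: level 1 wins below -0.7, level 3 wins above 1.
INTERCEPTS = [-1.4, 0.0, -2.0]
SLOPES = [-2.0, 0.0, 2.0]

levels = [e_map(x, INTERCEPTS, SLOPES) for x in (-2.0, 0.0, 3.0)]
print(levels)  # [1, 2, 3]
```

The cut points are where two scores are equal, so the MAP assignment is a step function of $x_1$ even though each $p(k \mid x_1)$ is smooth.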


In the end: the best discretization

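New model selection criterion

We have drastically restricted the search space to provably clever candidates $\boldsymbol{e}^{(1)}_{\mathrm{MAP}}, \ldots, \boldsymbol{e}^{(\mathrm{iter})}_{\mathrm{MAP}}$ resulting from the Gibbs sampling and MAP estimation.

$$(\boldsymbol{e}^\star, \boldsymbol{\theta}^\star) = \operatorname*{argmax}_{\boldsymbol{e} \in \{\boldsymbol{e}^{(1)}_{\mathrm{MAP}}, \ldots, \boldsymbol{e}^{(\mathrm{iter})}_{\mathrm{MAP}}\},\ \boldsymbol{\theta} \in \Theta_m} \sum_{i=1}^{n} \ln p_{\boldsymbol{\theta}_{\boldsymbol{e}}}(y_i \mid \boldsymbol{e}_i) - \mathrm{penalty}(n; \boldsymbol{\theta})$$

We would still need to loop over candidates $m$!

In practice, if $\forall i,\ p(e_{i,j} = 1 \mid x_{i,j}, y_i) \ll 1$, then the level $e_j = 1$ disappears...

Start with $m = (m_{\max})_1^d$ and “wait”... eventually until $m = 1$.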
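The selection among the MAP candidates can be sketched as below. The slides leave penalty(n; θ) unspecified; this example assumes a BIC-style penalty of half the number of parameters times ln n, and the candidate names, log-likelihoods and parameter counts are made-up illustration values, not results.

```python
import math


def penalized_loglik(loglik, n_params, n_obs):
    """Log-likelihood minus an assumed BIC-style penalty (n_params / 2) * ln(n)."""
    return loglik - 0.5 * n_params * math.log(n_obs)


def best_candidate(candidates, n_obs):
    """Pick the candidate discretization maximizing the penalized criterion."""
    return max(
        candidates,
        key=lambda c: penalized_loglik(c["loglik"], c["n_params"], n_obs),
    )


# Hypothetical candidates visited by the sampler (values are illustrative).
CANDIDATES = [
    {"name": "A", "loglik": -520.0, "n_params": 10},
    {"name": "B", "loglik": -510.0, "n_params": 14},
    {"name": "C", "loglik": -530.0, "n_params": 6},
]

print(best_candidate(CANDIDATES, n_obs=1000)["name"])  # C
```

Here the candidate with the lowest raw log-likelihood is selected because it uses fewer levels, which is the trade-off the penalty term encodes.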


Interactions in logistic regression


Notations

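$\boldsymbol{\delta}$: upper triangular matrix with $\delta_{k,\ell} = 1$ if $k < \ell$ and features $k$ and $\ell$ “interact” in the logistic regression.

$$\operatorname{logit}\big(p_{\boldsymbol{\theta}_f}(1 \mid f(\boldsymbol{x}))\big) = \theta_0 + \sum_{j=1}^{d} \theta_j^{f_j(x_j)} + \sum_{1 \leq k < \ell \leq d} \delta_{k,\ell}\, \theta_{k,\ell}^{f_k(x_k) f_\ell(x_\ell)}$$

Imagine for now that the discretization $\boldsymbol{e} = f(\boldsymbol{x})$ is fixed. The criterion becomes:

$$(\boldsymbol{\theta}^\star, \boldsymbol{\delta}^\star) = \operatorname*{argmax}_{\boldsymbol{\theta},\ \boldsymbol{\delta} \in \{0,1\}^{\frac{d(d-1)}{2}}} \sum_{i=1}^{n} \ln p_{\boldsymbol{\theta}}(y_i \mid \boldsymbol{e}_i, \boldsymbol{\delta}) - \mathrm{penalty}(n; \boldsymbol{\theta})$$

Analogous to previous problem: $2^{\frac{d(d-1)}{2}}$ models.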
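To make the size of this model space concrete, the short sketch below enumerates the couples $(k, \ell)$ with $k < \ell$ and counts the corresponding interaction matrices. The function names are illustrative only.

```python
from itertools import combinations


def interaction_pairs(d):
    """All couples (k, l) with 1 <= k < l <= d: the free entries of delta."""
    return list(combinations(range(1, d + 1), 2))


def n_interaction_models(d):
    """Number of binary upper-triangular matrices delta: 2^(d(d-1)/2)."""
    return 2 ** len(interaction_pairs(d))


for d in (3, 9, 15):
    print(d, len(interaction_pairs(d)), n_interaction_models(d))
```

For d = 15 there are 105 couples and 2^105 (about 4 × 10^31) candidate models, which rules out exhaustive search.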


Model proposal

δ is latent and hard to optimize over: use a stochastic algorithm!

Strategy used here: Metropolis-Hastings algorithm.

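$$p(y \mid \boldsymbol{e}) = \sum_{\boldsymbol{\delta} \in \{0,1\}^{\frac{d(d-1)}{2}}} p(y \mid \boldsymbol{e}, \boldsymbol{\delta})\, p(\boldsymbol{\delta})$$

$$p(\boldsymbol{\delta} \mid \boldsymbol{e}, y) \propto p(y \mid \boldsymbol{e}, \boldsymbol{\delta})\, p(\boldsymbol{\delta}) \approx \exp(-\mathrm{BIC}[\boldsymbol{\delta}]/2)\, p(\boldsymbol{\delta})$$

With the uniform prior $p(\delta_{k,\ell}) = \frac{1}{2}$, the factor $p(\boldsymbol{\delta})$ is constant and cancels out.

Which transition proposal $q : \big(\{0,1\}^{\frac{d(d-1)}{2}}, \{0,1\}^{\frac{d(d-1)}{2}}\big) \mapsto [0; 1]$?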


Model proposal

$2^{d(d-1)}$ probabilities to calculate...

We restrict changes to only one entry $\delta_{k,\ell}$.

Proposal: gain/loss in BIC between bivariate models with / without the interaction.

Trick: alternate one discretization / grouping step and one “interaction” step.
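The restriction to a single entry can be sketched as follows. This is an illustration under assumptions: the matrix is stored as a dictionary keyed by couples $(k, \ell)$ with $k < \ell$, and the proposal weights are supposed to have been derived beforehand from the bivariate BIC comparisons; neither the storage format nor the function name comes from the original.

```python
import random


def propose_flip(delta, weights, rng=random):
    """Propose a neighbour of delta that differs in exactly one entry.

    delta:   dict mapping each couple (k, l), k < l, to 0 or 1.
    weights: dict with the same keys giving non-negative proposal weights
             (assumed precomputed from bivariate BIC gains or losses).
    Returns the chosen couple and a new dict; delta itself is left untouched.
    """
    pairs = sorted(delta)
    chosen = rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]
    proposed = dict(delta)
    proposed[chosen] = 1 - proposed[chosen]  # switch the interaction on or off
    return chosen, proposed
```

Each move then only requires comparing two models, the current one and its single-entry neighbour, rather than scoring every pair of matrices.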


Results: several datasets

Performance assessed on simulated data. Good performance on real data:

Gini                              Current performance   glmdisc   Basic glm
Auto (n=50,000; d=15)             57.9                  64.84     58
Revolving (n=48,000; d=9)         58.57                 67.15     53.5
Prospects (n=5,000; d=25)         35.6                  47.18     32.7
Electronics (n=140,000; d=8)      57.5                  58        -10
Young (n=5,000; d=25)             ≈ 15                  30        12.2
Basel II (n=70,000; d=13)         70                    71.3      19

Relatively fast computing time: between 2 hours and a day on a laptop, depending on the number of observations, features, ...

“Non-existent” human time.


Conclusion and future work


Take-aways

Conclusion

- Reinterpretation as a latent variable problem,
- Resolution proposal relying on MCMC and “soft” discretization,
- Good empirical results and statistical guarantees (to some extent...),
- R implementation of glmdisc available on GitHub, to be submitted to CRAN,
- Python implementation of glmdisc available on GitHub and PyPI,
- Big gain for statisticians in the field of Credit Scoring.

Perspectives

- Tested for logistic regression and polytomous logistic links: can be adapted to other models $p_{\boldsymbol{\theta}}$ and $p_{\boldsymbol{\alpha}}$!
- The same model can be estimated with shallow neural networks.


Shallow neural nets as a substitute estimation procedure

[Figure: feed-forward network diagram with an input layer (Input #1, Input #2, Input #3), a hidden layer of sigmoid (σ) units and an output layer with a single sigmoid (σ) Output.]


$m_j - 1$ neurons with sigmoid activations $\approx p_{\boldsymbol{\alpha}_j}(e_j \mid x_j)$


Thanks!




Interaction discovery: proposal

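$$p(\delta_{k,\ell} = 1 \mid \boldsymbol{e}_k, \boldsymbol{e}_\ell, \boldsymbol{y}) = g\big(\mathrm{BIC}[\delta_{k,\ell} = 1] - \mathrm{BIC}[\delta_{k,\ell} = 0]\big) \approx \exp\Big(\tfrac{1}{2}\big(\mathrm{BIC}[p_{\boldsymbol{\theta}}(\boldsymbol{y} \mid \boldsymbol{e}_k, \boldsymbol{e}_\ell, \delta_{k,\ell} = 0)] - \mathrm{BIC}[p_{\boldsymbol{\theta}}(\boldsymbol{y} \mid \boldsymbol{e}_k, \boldsymbol{e}_\ell, \delta_{k,\ell} = 1)]\big)\Big)$$

$$q(\boldsymbol{\delta}, \boldsymbol{\delta}') = |\delta_{k,\ell} - p_{k,\ell}| \quad \text{for the unique couple } (k, \ell) \text{ s.t. } \delta^{(s)}_{k,\ell} \neq \delta'_{k,\ell}$$

$$\alpha = \min\left(1, \frac{p(\boldsymbol{\delta}' \mid \boldsymbol{e}, \boldsymbol{y})}{p(\boldsymbol{\delta} \mid \boldsymbol{e}, \boldsymbol{y})} \cdot \frac{1 - q(\boldsymbol{\delta}, \boldsymbol{\delta}')}{q(\boldsymbol{\delta}, \boldsymbol{\delta}')}\right) \approx \min\left(1, \exp\Big(\tfrac{1}{2}\big(\mathrm{BIC}[p_{\boldsymbol{\theta}}(\boldsymbol{y} \mid \boldsymbol{e}, \boldsymbol{\delta})] - \mathrm{BIC}[p_{\boldsymbol{\theta}}(\boldsymbol{y} \mid \boldsymbol{e}, \boldsymbol{\delta}')]\big)\Big) \cdot \frac{1 - q(\boldsymbol{\delta}, \boldsymbol{\delta}')}{q(\boldsymbol{\delta}, \boldsymbol{\delta}')}\right)$$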
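The acceptance probability α above can be evaluated as in this sketch. The function names are hypothetical, the BIC values are assumed to come from logistic regressions fitted elsewhere, and the ratio is computed on the log scale purely as a numerical precaution that is not part of the original formula.

```python
import math


def proposal_weight(delta_kl, p_kl):
    """q(delta, delta') = |delta_kl - p_kl| for the single entry being flipped."""
    return abs(delta_kl - p_kl)


def acceptance_probability(bic_current, bic_proposed, q):
    """alpha = min(1, exp((BIC[delta] - BIC[delta']) / 2) * (1 - q) / q)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    # Work with logarithms so that large BIC differences cannot overflow exp().
    log_ratio = 0.5 * (bic_current - bic_proposed) + math.log((1.0 - q) / q)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

A proposed matrix with a lower BIC than the current one is accepted with probability 1 when q = 1/2; a worse one is accepted with a probability that decays exponentially in the BIC difference.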