Ordinal Classification

Rob Potharst, Erasmus University Rotterdam
SIKS-Advanced Course on Computational Intelligence, October 2001


Page 1: Ordinal Classification


Ordinal Classification

Rob Potharst

Erasmus University Rotterdam

Page 2: Ordinal Classification


What is ordinal classification?

Page 3: Ordinal Classification


Company: catering service Swift

• total liabilities / total assets 1

• net income / net worth 3

• … …

• managers’ work experience 5

• market niche-position 3

• … …

bankruptcy risk + (acceptable)

Page 4: Ordinal Classification


2 2 2 2 1 3 5 3 5 4 2 4 +
4 5 2 3 3 3 5 4 5 5 4 5 +
3 5 1 1 2 2 5 3 5 5 3 5 +
2 3 2 1 2 4 5 2 5 4 3 4 +
3 4 3 2 2 2 5 3 5 5 3 5 +
3 5 3 3 3 2 5 3 4 4 3 4 +
3 5 2 3 4 4 5 4 4 5 3 5 +
1 1 4 1 2 3 5 2 4 4 1 4 +
3 4 3 3 2 4 4 2 4 3 1 3 +
3 4 2 1 2 2 4 2 4 4 1 4 +
2 5 1 1 3 4 4 3 4 4 3 4 +
3 3 4 4 3 4 4 2 4 4 1 3 +
1 1 2 1 1 3 4 2 4 4 1 4 +
2 1 1 1 4 3 4 2 4 4 3 3 +
2 3 2 1 1 2 4 4 4 4 2 5 +
2 3 4 3 1 5 4 2 4 3 2 3 +
2 2 2 1 1 4 4 4 4 4 2 4 +
2 1 3 1 1 3 5 2 4 2 1 3 +
2 1 2 1 1 3 4 2 4 4 2 4 +
2 1 2 1 1 5 4 2 4 4 2 4 +

2 1 1 1 1 3 2 2 4 4 2 3 ?
1 1 3 1 2 1 3 4 4 4 3 4 ?
2 1 2 1 1 2 4 3 3 2 1 2 ?
1 1 1 1 1 1 3 2 4 4 2 3 ?
2 2 2 1 1 3 3 2 4 4 2 3 ?
2 2 1 1 1 3 2 2 4 4 2 3 ?
2 1 2 1 1 3 2 2 4 4 2 4 ?
1 1 4 1 3 1 2 2 3 3 1 2 ?
3 4 4 3 2 3 3 4 4 4 3 4 ?
3 1 3 3 1 2 2 3 4 4 2 3 ?

1 1 2 1 1 1 3 3 4 4 2 3 -
3 5 2 1 1 1 3 2 3 4 1 3 -
2 2 1 1 1 1 3 3 3 4 3 4 -
2 1 1 1 1 1 2 2 3 4 3 4 -
1 1 2 1 1 1 3 1 4 3 1 2 -
1 1 3 1 2 1 2 1 3 3 2 3 -
1 1 1 1 1 1 2 2 4 4 2 3 -
1 1 3 1 1 1 1 1 4 3 1 3 -
2 1 1 1 1 1 1 1 2 1 1 2 -

Data set: 39 companies

20: + (acceptable)

9: - (unacceptable)

10: ? (uncertain)

from: Greco, Matarazzo, Slowinski (1996)

Page 5: Ordinal Classification


Possible classifier

if man.exp. > 4, then class = ‘+’
if man.exp. < 4 and net.inc/net.worth = 1, then class = ‘-’

all other cases: class = ‘?’

• when applied to the dataset of 39 companies: 3 mistakes
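For concreteness, here is a minimal Python sketch of such a rule classifier. The two column indices are assumptions: the slides do not say which of the 12 columns holds managers’ work experience or net income / net worth.

```python
# Sketch of the rule classifier above. The column indices are hypothetical:
# the slides do not state the column order of the 12 attributes.
MAN_EXP = 6            # assumed column of managers' work experience
NET_INC_NET_WORTH = 1  # assumed column of net income / net worth

def classify(company):
    """company: list of 12 ordinal attribute values, as in the dataset of 39."""
    if company[MAN_EXP] > 4:
        return '+'
    if company[MAN_EXP] < 4 and company[NET_INC_NET_WORTH] == 1:
        return '-'
    return '?'

# First '+' company from the previous slide; with the assumed indices this prints '+'.
print(classify([2, 2, 2, 2, 1, 3, 5, 3, 5, 4, 2, 4]))
```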

Page 6: Ordinal Classification


What is classification?

The act of assigning objects to classes, using the values of relevant features of those objects

So we need:

• objects (individuals, cases), all belonging to some domain

• classes, number and kind prescribed

• features (attributes, variables)

• a classifier (classification function) that assigns a class to any object
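A tiny, purely illustrative sketch of these ingredients as Python types (the names are mine, not from the slides):

```python
# Illustrative types only; the names are not from the slides.
from typing import Callable, Sequence

FeatureVector = Sequence[int]                    # values of the relevant features
Class = str                                      # e.g. '+', '?', '-'
Classifier = Callable[[FeatureVector], Class]    # assigns a class to any object
```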

Page 7: Ordinal Classification


Building classifiers

= induction from a training set of examples:

data without noise

data with noise

Page 8: Ordinal Classification


Induction methods (especially from the AI world)

• decision trees: C4.5, CART (from 1984 on)

• neural networks: backpropagation (from 1986, with a false start in 1974)

• rule induction algorithms: CN2 (1989)

• newer methods: rough sets, fuzzy methods, decision lists, pattern based methods, etc.

Page 9: Ordinal Classification


Decision tree: example

[Decision tree figure: tests man.exp. < 3, gen.exp./sales = 1 and tot.liab/cashfl = 1, each with yes/no branches; leaves labeled +, ?, ? and -.]

It classifies 37 out of 39 examples correctly.

Page 10: Ordinal Classification


Ordinal classification

• features have ordinal scale

• classes have ordinal scale

• the ordering must be preserved!

Page 11: Ordinal Classification


Preservation of ordering

         F1  F2  F3  F4
comp A    1   2   2   3
comp B    2   4   3   3

A classifier is monotone iff: if A ≤ B on every feature, then also class(A) ≤ class(B)
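A brute-force check of this condition, as a sketch (not part of the slides); it assumes comparable class labels (e.g. integers) and a small finite attribute space.

```python
# Sketch of a brute-force monotonicity check.
from itertools import product

def leq(a, b):
    """Componentwise order: a <= b on every feature."""
    return all(x <= y for x, y in zip(a, b))

def is_monotone(classifier, feature_values):
    """True iff A <= B always implies class(A) <= class(B) over the finite space."""
    X = list(product(*feature_values))
    return all(classifier(a) <= classifier(b)
               for a in X for b in X if leq(a, b))

# Example on a toy space with three attributes taking values 0..2:
print(is_monotone(lambda x: min(sum(x), 3), [range(3)] * 3))   # True
```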

Page 12: Ordinal Classification


Relevance of ordinal classification

• selection problems

• credit worthiness

• pricing (e.g. real estate)

• etc.

Page 13: Ordinal Classification


Induction of monotone decision trees

• using C4.5 or CART: non-monotone trees

• needed: an algorithm that guarantees to generate only monotone trees

• Makino, Ibaraki et al. (1996):

• only for 2-class problems, cumbersome

• Potharst & Bioch (2000)

• for k-class problems, fast and efficient

Page 14: Ordinal Classification


The algorithm

try to split subset T:

1) update D for subset T

2) if D ∩ T is homogeneous then
       assign class label to T and make T a leaf definitively
   else
       split T into two non-empty subsets TL and TR using entropy
       try to split subset TL
       try to split subset TR
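The same recursion as a Python skeleton, as a sketch only: T is taken to be a set of attribute vectors, D a dict from vectors to class labels, and the helpers update_for, is_homogeneous and best_split are assumptions standing for the steps defined on the following slides.

```python
# Skeleton only: update_for, is_homogeneous and best_split are assumed helpers
# corresponding to the update rule and entropy split on the next slides.
def try_to_split(T, D, update_for, is_homogeneous, best_split, tree=None):
    tree = {} if tree is None else tree
    update_for(D, T)                         # 1) update D for subset T
    if is_homogeneous(D, T):                 # 2) D restricted to T has one class
        label = next(c for x, c in D.items() if x in T)
        tree[frozenset(T)] = label           # make T a leaf, definitively
    else:
        split, TL, TR = best_split(D, T)     # split with the lowest entropy
        tree[frozenset(T)] = split
        try_to_split(TL, D, update_for, is_homogeneous, best_split, tree)
        try_to_split(TR, D, update_for, is_homogeneous, best_split, tree)
    return tree
```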

Page 15: Ordinal Classification


The update rule

update D for T:

1) if min(T) is not in D then

- add min(T) to D

- class ( min(T) ) = the maximal value allowed, given D

2) if max(T) is not in D then

- add max(T) to D

- class ( max(T) ) = the minimal value allowed, given D
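A sketch of this rule, assuming T is an interval given as the set of its elements (so min(T) and max(T) are its componentwise extremes) and that the two bounds min_allowed and max_allowed, defined on the next two slides and sketched there, are passed in.

```python
# Sketch; min_allowed / max_allowed are the bounds defined on the next slides.
def update_for(D, T, min_allowed, max_allowed):
    """Update D (a dict vector -> class) for the subset T."""
    lo = tuple(min(v) for v in zip(*T))   # min(T): componentwise minimum of T
    hi = tuple(max(v) for v in zip(*T))   # max(T): componentwise maximum of T
    if lo not in D:
        D[lo] = max_allowed(lo, D)        # the maximal value allowed, given D
    if hi not in D:
        D[hi] = min_allowed(hi, D)        # the minimal value allowed, given D
```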

Page 16: Ordinal Classification


The minimal value allowed given D

• For each x ∈ X \ D it is possible to calculate the minimal and the maximal class value possible, given D.

• Let ↓x be the downset { y ∈ X | y ≤ x } of x

• Let y* be an element of D ∩ ↓x with highest class value

• Then the minimal class value possible for x is class (y*).

Page 17: Ordinal Classification


The maximal value allowed given D

• Let ↑x be the upset { y ∈ X | y ≥ x } of x

• Let y* be an element of D ∩ ↑x with lowest class value

• Then the maximal class value possible for x is class (y*)

• if there is no such element then take the maximal class value (or the minimal, in the former case)
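Both bounds together, as a sketch in Python: D is assumed to be a dict from attribute vectors (tuples) to class labels, and the class range is assumed to run from 0 to 3 as in the example that follows.

```python
MIN_CLASS, MAX_CLASS = 0, 3     # class range assumed as in the example slide

def leq(a, b):
    """Componentwise order on attribute vectors."""
    return all(x <= y for x, y in zip(a, b))

def min_allowed(x, D):
    """Highest class among elements of D in the downset of x, or MIN_CLASS if none."""
    classes = [c for y, c in D.items() if leq(y, x)]
    return max(classes) if classes else MIN_CLASS

def max_allowed(x, D):
    """Lowest class among elements of D in the upset of x, or MAX_CLASS if none."""
    classes = [c for y, c in D.items() if leq(x, y)]
    return min(classes) if classes else MAX_CLASS
```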

Page 18: Ordinal Classification


Example

D: the dataset (three attribute values, followed by the class)

0 0 1 0
0 0 2 1
1 1 2 2
2 0 2 2
2 1 2 3

attr. 1: values 0, 1, 2
attr. 2: values 0, 1, 2
attr. 3: values 0, 1, 2
classes: 0, 1, 2, 3

X: the full attribute space, all 27 attribute vectors

0 0 0
1 0 0
0 1 0
….
2 2 2

Let us calculate the minimal and maximal possible class value for x = 022:

minvalue: D ∩ ↓x contains 001 (class 0) and 002 (class 1), so y* = 002 and the min value = 1

maxvalue: D ∩ ↑x is empty, so there is no y* and the max value = 3
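The same calculation can be reproduced by brute force with a short sketch (D copied from this slide):

```python
# Brute-force check of the min/max possible class values for x = 022.
D = {(0,0,1): 0, (0,0,2): 1, (1,1,2): 2, (2,0,2): 2, (2,1,2): 3}
leq = lambda a, b: all(p <= q for p, q in zip(a, b))

x = (0, 2, 2)
down = [y for y in D if leq(y, x)]                    # D intersected with the downset of x
up   = [y for y in D if leq(x, y)]                    # D intersected with the upset of x
min_value = max(D[y] for y in down) if down else 0    # -> 1 (from y* = 002)
max_value = min(D[y] for y in up) if up else 3        # -> 3 (no y* in the upset)
print(min_value, max_value)                           # 1 3
```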

Page 19: Ordinal Classification


Tracing the algorithm

Try to split subset T = X:

update D for X:

min(X) = 000 is not in D; maxvalue of 000 is 0

add 000 with class 0 to D

max(X) = 222 is not in D; minvalue of 222 is 3

add 222 with class 3 to D

D ∩ X is not homogeneous

so consider all the possible splits:

A1 ≤ 0; A1 ≤ 1; A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

The dataset D is now:

0 0 0 0
0 0 1 0
0 0 2 1
1 1 2 2
2 0 2 2
2 1 2 3
2 2 2 3

Page 20: Ordinal Classification


The entropy of each split

The split A1 ≤ 0 splits X into TL = [000,022] and TR = [100,222]

D ∩ TL:
0 0 0 0
0 0 1 0
0 0 2 1
Entropy = 0.92

D ∩ TR:
1 1 2 2
2 0 2 2
2 1 2 3
2 2 2 3
Entropy = 1

Average entropy of this split = 3/7 × 0.92 + 4/7 × 1 = 0.97
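These numbers can be checked with a short sketch: the class-label entropy on each side of A1 ≤ 0, then the weighted average.

```python
# Reproduces the entropies on this slide for the split A1 <= 0.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c/n * log2(c/n) for c in Counter(labels).values())

D = {(0,0,0): 0, (0,0,1): 0, (0,0,2): 1,               # these lie in TL = [000,022]
     (1,1,2): 2, (2,0,2): 2, (2,1,2): 3, (2,2,2): 3}   # these lie in TR = [100,222]

left  = [c for x, c in D.items() if x[0] <= 0]
right = [c for x, c in D.items() if x[0] > 0]
avg = len(left)/len(D)*entropy(left) + len(right)/len(D)*entropy(right)
# ~0.918, 1.0 and ~0.965 (the slide rounds the per-node entropy first: 0.92, 1, 0.97)
print(entropy(left), entropy(right), avg)
```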

Page 21: Ordinal Classification


Going on with the trace

The split with lowest entropy is A1 ≤ 0, so we go on with T = TL = [000,022]:

Try to split subset T = [000,022]:

update D for T:

min(T) = 000 is already in D

max(T) = 022 has minimal allowed value 1, so it is added to D with class 1

The dataset D is now:

0 0 0 0
0 0 1 0
0 0 2 1
0 2 2 1
1 1 2 2
2 0 2 2
2 1 2 3
2 2 2 3

D ∩ T is not homogeneous, so we go on to consider

the following splits: A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

Of these, A3 ≤ 1 has the lowest entropy.

Page 22: Ordinal Classification


We now have the following tree:

[Partial tree figure: the root tests A1 ≤ 0; its ‘yes’ branch continues with the test A3 ≤ 1; the remaining branches are still undetermined (?).]

Page 23: Ordinal Classification


Going on...

The split A3 ≤ 1 splits T into TL = [000,021] and TR = [002,022]

We go on with T = TL = [000,021]

Try to split subset T = [000,021]:

min(T) = 000 is already in D

max(T) = 021 has minimal allowed value 0, so it is added to D with class 0

D ∩ T is homogeneous, so we stop and make T into a leaf with class value 0

Next, we go on with T = TR = [002,022], etc.

Page 24: Ordinal Classification


Finally...

[Final tree figure: internal nodes A1 ≤ 0, A3 ≤ 1, A1 ≤ 1 and A2 ≤ 0; the five leaves carry the class labels 0, 1, 2, 2 and 3.]

Page 25: Ordinal Classification


A monotone tree for the Bankruptcy problem

• can be seen on p. 107 of the paper that was handed out with this course

• a tree with 6 leaves

• uses the same attributes as those that come up in an ordinal version of the rough set approach: see Viara Popova’s lecture

Page 26: Ordinal Classification


Conclusions and remaining problems

• We described an efficient algorithm for the induction of monotone decision trees, provided the dataset is monotone

• We also have an algorithm to repair a non-monotone decision tree, but it makes the tree larger

• What if we have noise in the dataset?

• Is it possible to repair by pruning?