Generalized Entropy and Decision Trees

Dan A. Simovici, Szymon Jaroszewicz

Univ. of Massachusetts Boston, Dept. of Computer Science, Boston, Massachusetts 02125 USA

{dsim,sj}@cs.umb.edu

ABSTRACT. We introduce an extension of the notion of Shannon conditional entropy to a more general form of conditional entropy that captures both the conditional Shannon entropy and a similar notion related to the Gini index. The proposed family of conditional entropies generates a collection of metrics over the set of partitions of finite sets, which can be used to construct decision trees. Experimental results suggest that by varying the parameter that defines the entropy it is possible to obtain smaller decision trees for certain databases without sacrificing accuracy.

KEYWORDS: Shannon entropy, Gini index, generalized conditional entropy, metric, partition, decision tree


1. Introduction

In [SIM 02] we introduced an axiomatization of a general notion of entropy for partitions of finite sets. The system of axioms that we proposed shows the common nature of Shannon entropy and of other measures of distribution concentration such as the Gini index.

Let $\mathrm{PART}(A)$ be the set of partitions of the nonempty set $A$. The class of all partitions of finite sets is denoted by $\mathrm{PART}$. The one-block partition of $A$ is denoted by $\omega_A$. The partition $\{\{a\} \mid a \in A\}$ is denoted by $\iota_A$. If $\pi, \pi' \in \mathrm{PART}(A)$, then $\pi \le \pi'$ if every block of $\pi$ is included in a block of $\pi'$. Clearly, for every $\pi \in \mathrm{PART}(A)$ we have $\iota_A \le \pi \le \omega_A$.

The partially ordered set $(\mathrm{PART}(A), \le)$ is a lattice (see, for example, the very lucid study of this lattice in [LER 81]). If $\sigma, \sigma' \in \mathrm{PART}(A)$, then $\sigma'$ covers $\sigma$ if $\sigma \le \sigma'$ and there is no partition $\sigma_1 \in \mathrm{PART}(A)$ such that $\sigma < \sigma_1 < \sigma'$. This is denoted by $\sigma \prec \sigma'$. It is easy to see that $\sigma \prec \sigma'$ if and only if $\sigma'$ can be obtained from $\sigma$ by fusing two of its blocks into a block of $\sigma'$. The largest element of $\mathrm{PART}(A)$ is the one-block partition $\omega_A$; the least element is the partition $\iota_A = \{\{a\} \mid a \in A\}$. The infimum of two partitions $\pi, \sigma \in \mathrm{PART}(A)$ will be denoted by $\pi \wedge \sigma$.
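The order relation and the infimum just described can be illustrated with a short Python sketch. The representation (a partition as a list of `frozenset` blocks) and the function names `meet` and `refines` are assumptions made for this illustration; they do not come from the paper.

```python
from itertools import product

def meet(pi, sigma):
    """Infimum of two partitions of the same set: all nonempty block intersections."""
    return [b & c for b, c in product(pi, sigma) if b & c]

def refines(pi, sigma):
    """True when pi <= sigma, i.e. every block of pi lies inside some block of sigma."""
    return all(any(b <= c for c in sigma) for b in pi)

# Two partitions of the set {1, 2, 3, 4} (hypothetical example data).
pi = [frozenset({1, 2}), frozenset({3, 4})]
sigma = [frozenset({1}), frozenset({2, 3}), frozenset({4})]

m = meet(pi, sigma)
print(sorted(sorted(block) for block in m))
print(refines(m, pi), refines(m, sigma), refines(pi, sigma))
```

In this example the infimum is the partition into singletons, which refines both arguments, while neither argument refines the other.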

If $A, B$ are two disjoint and nonempty sets, $\pi \in \mathrm{PART}(A)$, $\sigma \in \mathrm{PART}(B)$, where $\pi = \{A_1, \ldots, A_m\}$, $\sigma = \{B_1, \ldots, B_n\}$, then the partition $\pi + \sigma$ is the partition of $A \cup B$ given by $\pi + \sigma = \{A_1, \ldots, A_m, B_1, \ldots, B_n\}$.

Whenever the "$+$" operation is defined, it is easily seen to be associative. In other words, if $A, B, C$ are pairwise disjoint and nonempty sets, and $\pi \in \mathrm{PART}(A)$, $\sigma \in \mathrm{PART}(B)$, $\tau \in \mathrm{PART}(C)$, then $\pi + (\sigma + \tau) = (\pi + \sigma) + \tau$. Observe that if $A, B$ are disjoint, then $\iota_A + \iota_B = \iota_{A \cup B}$. Also, $\omega_A + \omega_B$ is the partition $\{A, B\}$ of the set $A \cup B$.

If $\pi = \{A_1, \ldots, A_m\}$, $\sigma = \{B_1, \ldots, B_n\}$ are partitions of two arbitrary sets $A$ and $B$, then we denote the partition $\{A_i \times B_j \mid 1 \le i \le m, 1 \le j \le n\}$ of $A \times B$ by $\pi \times \sigma$. Note that $\iota_A \times \iota_B = \iota_{A \times B}$ and $\omega_A \times \omega_B = \omega_{A \times B}$.

The axiomatization introduced in [SIM 02] consists of four axioms satisfied by several types of entropy-like characteristics of partitions.

Definition 1.1 Let $\beta \in \mathbb{R}$, $\beta > 0$, and let $\Phi : \mathbb{R}_{\ge 0}^2 \to \mathbb{R}_{\ge 0}$ be a continuous function such that $\Phi(x, y) = \Phi(y, x)$ and $\Phi(x, 0) = x$ for $x, y \in \mathbb{R}_{\ge 0}$. A $(\Phi, \beta)$-system of axioms for a partition entropy $H : \mathrm{PART}(A) \to \mathbb{R}_{\ge 0}$ consists of the following axioms:

(P1) If $\pi, \pi' \in \mathrm{PART}(A)$ are such that $\pi \le \pi'$, then $H(\pi') \le H(\pi)$.

(P2) If $A, B$ are two finite sets such that $|A| \le |B|$, then $H(\iota_A) \le H(\iota_B)$.

(P3) For every pair of disjoint sets $A, B$ and partitions $\pi \in \mathrm{PART}(A)$ and $\sigma \in \mathrm{PART}(B)$ we have:
\[ H(\pi + \sigma) = \left(\frac{|A|}{|A| + |B|}\right)^{\beta} H(\pi) + \left(\frac{|B|}{|A| + |B|}\right)^{\beta} H(\sigma) + H(\{A, B\}). \]

(P4) If $\pi \in \mathrm{PART}(A)$ and $\sigma \in \mathrm{PART}(B)$, then $H(\pi \times \sigma) = \Phi(H(\pi), H(\sigma))$.

Observe that we postulate that $H(\pi) \ge 0$ for any partition $\pi$, since the range of every function $H$ is $\mathbb{R}_{\ge 0}$.

For a choice of $\beta$ these axioms determine an entropy function $H_\beta$ up to a constant factor. The same choice also determines the function $\Phi$. The entropies defined for $\beta \ne 1$ were named non-Shannon entropies. In this case, for a partition $\pi = \{A_1, \ldots, A_n\} \in \mathrm{PART}(A)$ we have:
\[ H_\beta(\pi) = k \left( 1 - \sum_{j=1}^{n} \left( \frac{|A_j|}{|A|} \right)^{\beta} \right), \]
where $k$ is a constant that satisfies the inequality $k(\beta - 1) > 0$. Thus, for $\beta > 1$ we have
\[ H_\beta(\pi) = c \left( 1 - \sum_{j=1}^{n} \left( \frac{|A_j|}{|A|} \right)^{\beta} \right), \]
and for $\beta < 1$ we have
\[ H_\beta(\pi) = c \left( \sum_{j=1}^{n} \left( \frac{|A_j|}{|A|} \right)^{\beta} - 1 \right), \]
for some positive constant $c$, where $c = k$ if $\beta > 1$, and $c = -k$ when $\beta < 1$. In either case, we have $\Phi(x, y) = x + y - \frac{1}{k} x y$ for $x, y \in \mathbb{R}_{\ge 0}$.

The case $\beta = 1$ yields the Shannon entropy, that is
\[ H_1(\pi) = - \sum_{j=1}^{n} \frac{|A_j|}{|A|} \log_2 \frac{|A_j|}{|A|}. \]
Also, if $\beta = 1$, then $\Phi(x, y) = x + y$ for $x, y \in \mathbb{R}_{\ge 0}$.
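The three cases of the entropy formula can be sketched in Python as follows. This is a minimal illustration: the function name `generalized_entropy`, the choice to pass a partition as a list of block sizes, and the default constant `c = 1` are assumptions of the sketch, not part of the paper. With $\beta = 2$ and $c = 1$ the formula reduces to $1 - \sum_j p_j^2$, the usual Gini index.

```python
import math

def generalized_entropy(block_sizes, beta, c=1.0):
    """H_beta of a partition given by its block sizes (c is the positive constant)."""
    n = sum(block_sizes)
    probs = [s / n for s in block_sizes if s > 0]
    if beta == 1:
        # Shannon case.
        return -sum(p * math.log2(p) for p in probs)
    power_sum = sum(p ** beta for p in probs)
    # The sign flips at beta = 1 so that the value stays nonnegative.
    return c * (1 - power_sum) if beta > 1 else c * (power_sum - 1)

sizes = [2, 2]                                # a partition of a 4-element set
gini = generalized_entropy(sizes, beta=2)     # 1 - (1/4 + 1/4)
shannon = generalized_entropy(sizes, beta=1)  # one bit
low_beta = generalized_entropy(sizes, beta=0.5)
print(gini, shannon, low_beta)
```

For the one-block partition the function returns zero for every $\beta$, in line with $H(\omega_A) = 0$.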

2. Generalized Conditional Entropy

The generalized entropies previously introduced generate corresponding generalized conditional entropies. Let $\pi \in \mathrm{PART}(A)$ and let $C \subseteq A$. Denote by $\pi_C$ the "trace" of $\pi$ on $C$ given by $\pi_C = \{B \cap C \mid B \in \pi \text{ such that } B \cap C \ne \emptyset\}$. Clearly, $\pi_C \in \mathrm{PART}(C)$; also, if $C$ is a block of $\pi$, then $\pi_C = \omega_C$.

Definition 2.1 The conditional entropy defined by the $(\Phi, \beta)$-entropy $H$ is the function $H_\beta : \mathrm{PART}(A)^2 \to \mathbb{R}_{\ge 0}$ given by:
\[ H_\beta(\pi \mid \sigma) = \sum_{j=1}^{n} \frac{|C_j|}{|A|} \, H_\beta(\pi_{C_j}), \]
where $\pi, \sigma \in \mathrm{PART}(A)$ and $\sigma = \{C_1, \ldots, C_n\}$.

Observe that $H_\beta(\pi \mid \omega_A) = H_\beta(\pi)$.

A direct consequence of the axioms is that $H(\omega_A) = 0$ for any set $A$ (Lemma II.2 from [SIM 02]). The following reciprocal result also holds:

Lemma 2.2 Let $A$ be a finite set and let $\pi \in \mathrm{PART}(A)$ be such that $H_\beta(\pi) = 0$. Then, $\pi = \omega_A$.

Proof. Suppose that $H_\beta(\pi) = 0$ but $\pi < \omega_A$. Then, there exists a block $C$ of $\pi$ such that $\emptyset \subset C \subset A$. If $\theta = \{C, A - C\}$, then clearly we have $\pi \le \theta$, so $0 \le H_\beta(\theta) \le H_\beta(\pi)$, which implies $H_\beta(\theta) = 0$. If $\beta > 1$, then
\[ H_\beta(\theta) = c \left( 1 - \left( \frac{|C|}{|A|} \right)^{\beta} - \left( \frac{|A| - |C|}{|A|} \right)^{\beta} \right) = 0. \]
The strict convexity of the function $f(x) = x^{\beta} + (1 - x)^{\beta}$ on $[0, 1]$ (when $\beta > 1$) means that $f(x) = 1$ only for $x \in \{0, 1\}$, which implies either $C = A$ or $C = \emptyset$, a contradiction. Thus, $\pi = \omega_A$. A similar argument works for the other cases.

Theorem 2.3 Let $A$ be a finite set and let $\pi, \sigma \in \mathrm{PART}(A)$. We have $H_\beta(\pi \mid \sigma) = 0$ if and only if $\sigma \le \pi$.

Proof. Suppose that $\sigma = \{C_1, \ldots, C_n\}$. If $\sigma \le \pi$, then $\pi_{C_j} = \omega_{C_j}$ for $1 \le j \le n$, so $H_\beta(\pi \mid \sigma) = 0$. Conversely, suppose that
\[ H_\beta(\pi \mid \sigma) = \sum_{j=1}^{n} \frac{|C_j|}{|A|} \, H_\beta(\pi_{C_j}) = 0. \]
This implies $H_\beta(\pi_{C_j}) = 0$ for $1 \le j \le n$, so $\pi_{C_j} = \omega_{C_j}$ for $1 \le j \le n$ by Lemma 2.2. This means that every block $C_j$ of $\sigma$ is included in a block of $\pi$, which implies $\sigma \le \pi$.
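Definition 2.1 and Theorem 2.3 can be checked numerically with the sketch below. It assumes $\beta > 1$ and $c = 1$, and it encodes a partition of $\{0, \ldots, n-1\}$ as a list of block labels, one per element; the function names are hypothetical and chosen only for this illustration.

```python
from collections import Counter, defaultdict

def entropy(labels, beta=2.0):
    """H_beta (beta > 1, c = 1) of the partition induced by the block labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** beta for k in Counter(labels).values())

def conditional_entropy(pi, sigma, beta=2.0):
    """H_beta(pi | sigma): size-weighted entropy of the trace of pi on each block of sigma."""
    traces = defaultdict(list)
    for p, s in zip(pi, sigma):
        traces[s].append(p)          # labels of pi restricted to the block s of sigma
    n = len(pi)
    return sum(len(t) / n * entropy(t, beta) for t in traces.values())

pi = ["a", "a", "b", "b"]
sigma = ["x", "x", "x", "y"]
omega = ["w"] * 4                    # one-block partition
fine = [0, 1, 2, 3]                  # partition into singletons, so fine <= pi

h_cond = conditional_entropy(pi, sigma)
print(h_cond, conditional_entropy(pi, omega), conditional_entropy(pi, fine))
```

Conditioning on the one-block partition returns the unconditional entropy, and conditioning on a partition that refines `pi` returns zero, as Theorem 2.3 states.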

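Note that the partition $\pi \wedge \sigma$, whose blocks are the nonempty intersections of the blocks of $\pi = \{B_1, \ldots, B_m\}$ and $\sigma = \{C_1, \ldots, C_n\}$, can be written as
\[ \pi \wedge \sigma = \pi_{C_1} + \cdots + \pi_{C_n} = \sigma_{B_1} + \cdots + \sigma_{B_m}. \]
Therefore, by Corollary II.7 of [SIM 02], we have:
\[ H_\beta(\pi \wedge \sigma) = \sum_{j=1}^{n} \left( \frac{|C_j|}{|A|} \right)^{\beta} H_\beta(\pi_{C_j}) + H_\beta(\sigma). \]
For those entropies with $\beta > 1$ we have
\[ H_\beta(\pi \wedge \sigma) \le H_\beta(\pi \mid \sigma) + H_\beta(\sigma), \qquad (1) \]
while for those having $\beta < 1$, the reverse inequality holds. In the case of Shannon entropy, $\beta = 1$ and
\[ H_1(\pi \wedge \sigma) = H_1(\pi \mid \sigma) + H_1(\sigma) = H_1(\sigma \mid \pi) + H_1(\pi). \qquad (2) \]

Lemma 2.4 Let $a, b \in [0, 1]$ be such that $a + b = 1$. Then, for $\beta > 1$ we have:
\[ \sum_{i=1}^{n} (a x_i + b y_i)^{\beta} \le a \sum_{i=1}^{n} x_i^{\beta} + b \sum_{i=1}^{n} y_i^{\beta}, \]
for every $x_1, \ldots, x_n, y_1, \ldots, y_n \in [0, 1]$. For $\beta < 1$, the reverse inequality holds.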

Proof. The convexity of the function $f(x) = x^{\beta}$ for $\beta > 1$ on the interval $[0, 1]$ implies $(a x_i + b y_i)^{\beta} \le a x_i^{\beta} + b y_i^{\beta}$. Summing up these inequalities gives the desired inequality.

The second part of the lemma follows from the concavity of the function $g(x) = x^{\beta}$ on $[0, 1]$ when $\beta < 1$.

Theorems 2.5 and 2.8 extend well-known monotonicity properties of Shannon entropy.

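Theorem 2.5 If $\pi, \sigma, \sigma'$ are partitions of the finite set $A$ such that $\sigma \le \sigma'$, then $H_\beta(\pi \mid \sigma) \le H_\beta(\pi \mid \sigma')$ for $\beta > 0$.

Proof. To prove this statement it suffices to consider only the case when $\sigma \prec \sigma'$.

Suppose initially that $\beta > 1$. Let $\sigma, \sigma' \in \mathrm{PART}(A)$ be such that $\sigma \prec \sigma'$. Suppose that $D, E$ are blocks of $\sigma$ such that $C = D \cup E$, where $C$ is a block of $\sigma'$; the partition $\pi$ is $\{B_1, \ldots, B_n\}$. Define $x_i = \frac{|B_i \cap D|}{|D|}$ and $y_i = \frac{|B_i \cap E|}{|E|}$ for $1 \le i \le n$. If we choose $a = \frac{|D|}{|C|}$ and $b = \frac{|E|}{|C|}$, then $a x_i + b y_i = \frac{|B_i \cap C|}{|C|}$ and
\[ |C| \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} \le |D| \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} + |E| \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}}, \]
by Lemma 2.4. Consequently, we can write:
\[
\begin{aligned}
H_\beta(\pi \mid \sigma) &= \cdots + \frac{|D|}{|A|} H_\beta(\pi_D) + \frac{|E|}{|A|} H_\beta(\pi_E) + \cdots \\
&= \cdots + \frac{|D|}{|A|} \, c \left( 1 - \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} \right) + \frac{|E|}{|A|} \, c \left( 1 - \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}} \right) + \cdots \\
&\le \cdots + \frac{|C|}{|A|} \, c \left( 1 - \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} \right) + \cdots = H_\beta(\pi \mid \sigma').
\end{aligned}
\]
For $\beta < 1$ we have
\[ |C| \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} \ge |D| \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} + |E| \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}}, \]
by the second part of Lemma 2.4. Thus,
\[
\begin{aligned}
H_\beta(\pi \mid \sigma) &= \cdots + \frac{|D|}{|A|} H_\beta(\pi_D) + \frac{|E|}{|A|} H_\beta(\pi_E) + \cdots \\
&= \cdots + \frac{|D|}{|A|} \, c \left( \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} - 1 \right) + \frac{|E|}{|A|} \, c \left( \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}} - 1 \right) + \cdots \\
&\le \cdots + \frac{|C|}{|A|} \, c \left( \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} - 1 \right) + \cdots = H_\beta(\pi \mid \sigma').
\end{aligned}
\]
For $\beta = 1$ the inequality is a well-known property of Shannon entropy.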

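Corollary 2.6 For every $\pi, \sigma \in \mathrm{PART}(A)$ and $\beta > 0$, we have $H_\beta(\pi \mid \sigma) \le H_\beta(\pi)$.

Proof. Since $\sigma \le \omega_A$, by Theorem 2.5 we have $H_\beta(\pi \mid \sigma) \le H_\beta(\pi \mid \omega_A) = H_\beta(\pi)$.

Corollary 2.7 Let $A$ be a finite set. For $\beta \ge 1$ we have $H_\beta(\pi \wedge \sigma) \le H_\beta(\pi) + H_\beta(\sigma)$ for every $\pi, \sigma \in \mathrm{PART}(A)$.

Proof. By Inequality (1) and by Corollary 2.6 we have
\[ H_\beta(\pi \wedge \sigma) \le H_\beta(\pi \mid \sigma) + H_\beta(\sigma) \le H_\beta(\pi) + H_\beta(\sigma). \]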

Theorem 2.8 If $\pi, \pi', \sigma$ are partitions of the finite set $A$ such that $\pi \le \pi'$, then $H_\beta(\pi \mid \sigma) \ge H_\beta(\pi' \mid \sigma)$.

Proof. Suppose that $\sigma = \{C_1, \ldots, C_n\}$. Then, it is clear that $\pi_{C_j} \le \pi'_{C_j}$ for $1 \le j \le n$. Therefore, $H_\beta(\pi_{C_j}) \ge H_\beta(\pi'_{C_j})$ by Axiom (P1), which implies immediately the desired inequality.

Lemma 2.9 Let $A$ be a nonempty set and let $\{A', A''\}$ be a two-block partition of $A$. If $\pi \in \mathrm{PART}(A)$, $\sigma' \in \mathrm{PART}(A')$, and $\sigma'' \in \mathrm{PART}(A'')$, then
\[ H_\beta(\pi \mid \sigma' + \sigma'') = \frac{|A'|}{|A|} H_\beta(\pi' \mid \sigma') + \frac{|A''|}{|A|} H_\beta(\pi'' \mid \sigma''), \]
where $\pi' = \pi_{A'}$ and $\pi'' = \pi_{A''}$.

Proof. Note that $\sigma' + \sigma''$ is a partition of $A$. Thus, we can write:
\[
\begin{aligned}
H_\beta(\pi \mid \sigma' + \sigma'') &= \sum_{E \in \sigma'} \frac{|E|}{|A|} H_\beta(\pi_E) + \sum_{F \in \sigma''} \frac{|F|}{|A|} H_\beta(\pi_F) \\
&= \sum_{E \in \sigma'} \frac{|E|}{|A|} H_\beta(\pi'_E) + \sum_{F \in \sigma''} \frac{|F|}{|A|} H_\beta(\pi''_F) \\
&= \frac{|A'|}{|A|} H_\beta(\pi' \mid \sigma') + \frac{|A''|}{|A|} H_\beta(\pi'' \mid \sigma'').
\end{aligned}
\]

Theorem 2.10 Let $A$ be a nonempty set and let $\{A_1, \ldots, A_\ell\}$ be a partition of $A$. If $\pi \in \mathrm{PART}(A)$ and $\sigma_k \in \mathrm{PART}(A_k)$ for $1 \le k \le \ell$, then
\[ H_\beta(\pi \mid \sigma_1 + \cdots + \sigma_\ell) = \sum_{k=1}^{\ell} \frac{|A_k|}{|A|} H_\beta(\pi_k \mid \sigma_k), \]
where $\pi_k = \pi_{A_k}$ for $1 \le k \le \ell$.

Proof. The result follows immediately from Lemma 2.9 due to the associativity of the partial operation "$+$".

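Theorem 2.11 If $\beta > 1$, then for every three partitions $\pi, \sigma, \tau$ of a finite set $A$ we have
\[ H_\beta(\pi \mid \sigma \wedge \tau) + H_\beta(\sigma \mid \tau) \ge H_\beta(\pi \wedge \sigma \mid \tau). \]
If $\beta < 1$ we have the reverse inequality, and for $\beta = 1$ we have the equality
\[ H_\beta(\pi \mid \sigma \wedge \tau) + H_\beta(\sigma \mid \tau) = H_\beta(\pi \wedge \sigma \mid \tau). \]

Proof. Suppose that $\pi = \{B_1, \ldots, B_m\}$, $\sigma = \{C_1, \ldots, C_n\}$, and $\tau = \{D_1, \ldots, D_\ell\}$. We noted already that $\sigma \wedge \tau = \sigma_{D_1} + \cdots + \sigma_{D_\ell} = \tau_{C_1} + \cdots + \tau_{C_n}$. Consequently, by Theorem 2.10 we have
\[ H_\beta(\pi \mid \sigma \wedge \tau) = \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} H_\beta(\pi_{D_k} \mid \sigma_{D_k}). \]
Also, we have
\[ H_\beta(\sigma \mid \tau) = \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} H_\beta(\sigma_{D_k}). \]
If $\beta > 1$, we saw that $H_\beta(\pi_{D_k} \wedge \sigma_{D_k}) \le H_\beta(\pi_{D_k} \mid \sigma_{D_k}) + H_\beta(\sigma_{D_k})$ for every $k$, $1 \le k \le \ell$, which implies
\[
\begin{aligned}
H_\beta(\pi \mid \sigma \wedge \tau) + H_\beta(\sigma \mid \tau) &\ge \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} H_\beta(\pi_{D_k} \wedge \sigma_{D_k}) \\
&= \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} H_\beta((\pi \wedge \sigma)_{D_k}) \\
&= H_\beta(\pi \wedge \sigma \mid \tau).
\end{aligned}
\]
Using a similar argument we obtain the second inequality of the theorem. The equality for the Shannon case was obtained in [MÁN 91].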

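Corollary 2.12 Let $A$ be a finite set. For $\beta \ge 1$ and for $\pi, \sigma, \tau \in \mathrm{PART}(A)$ we have the inequality: $H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \tau) \ge H_\beta(\pi \mid \tau)$.

Proof. Note that by Theorem 2.5 we have:
\[ H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \tau) \ge H_\beta(\pi \mid \sigma \wedge \tau) + H_\beta(\sigma \mid \tau). \]
Therefore, for $\beta \ge 1$, by Theorems 2.11 and 2.8 we obtain:
\[ H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \tau) \ge H_\beta(\pi \wedge \sigma \mid \tau) \ge H_\beta(\pi \mid \tau). \]

The following result generalizes a result of López de Mántaras [MÁN 91]: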

Corollary 2.13 For $\beta \ge 1$ define the mapping $d_\beta : \mathrm{PART}(A)^2 \to \mathbb{R}_{\ge 0}$ by $d_\beta(\pi, \sigma) = H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi)$ for $\pi, \sigma \in \mathrm{PART}(A)$. Then, $d_\beta$ is a metric on $\mathrm{PART}(A)$.

Proof. If $d_\beta(\pi, \sigma) = 0$, then $H_\beta(\pi \mid \sigma) = H_\beta(\sigma \mid \pi) = 0$. Therefore, by Theorem 2.3 we have $\sigma \le \pi$ and $\pi \le \sigma$, so $\pi = \sigma$. The symmetry of $d_\beta$ is immediate. The triangular property is a direct consequence of Corollary 2.12.
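The metric property can be checked by brute force on a small set. The sketch below assumes $\beta = 2$ and $c = 1$, encodes each partition of $\{0, 1, 2, 3\}$ as a tuple of block labels, and enumerates all 15 partitions; the helper names (`distance`, `all_partitions`) are hypothetical and introduced only for this illustration.

```python
from collections import Counter, defaultdict
from itertools import product

def entropy(labels, beta=2.0):
    """H_beta (beta > 1, c = 1) of the partition induced by the block labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** beta for k in Counter(labels).values())

def conditional_entropy(pi, sigma, beta=2.0):
    """H_beta(pi | sigma) with linear weights |C_j| / |A|."""
    traces = defaultdict(list)
    for p, s in zip(pi, sigma):
        traces[s].append(p)
    return sum(len(t) / len(pi) * entropy(t, beta) for t in traces.values())

def distance(pi, sigma, beta=2.0):
    """d_beta(pi, sigma) = H_beta(pi | sigma) + H_beta(sigma | pi)."""
    return conditional_entropy(pi, sigma, beta) + conditional_entropy(sigma, pi, beta)

def all_partitions(n):
    """Every partition of range(n), as a tuple of labels in restricted-growth form."""
    def extend(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))
    return extend([], -1)

parts = list(all_partitions(4))
triangle_holds = all(
    distance(p, s) + distance(s, t) >= distance(p, t) - 1e-12
    for p, s, t in product(parts, repeat=3)
)
separates = all((distance(p, s) < 1e-12) == (p == s) for p, s in product(parts, repeat=2))
print(len(parts), triangle_holds, separates)
```

A small tolerance is used in the comparisons because the distances are computed in floating point.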

In [MÁN 91] it is shown that the mapping $e_1 : \mathrm{PART}(A)^2 \to \mathbb{R}_{\ge 0}$ that corresponds to Shannon entropy, defined by
\[ e_1(\pi, \sigma) = \frac{d_1(\pi, \sigma)}{H_1(\pi \wedge \sigma)} \]
for $\pi, \sigma \in \mathrm{PART}(A)$, is also a metric on $\mathrm{PART}(A)$. This result is extended next.

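Theorem 2.14 Let $A$ be a finite, non-empty set. For $\beta \ge 1$, the mapping $e_\beta : \mathrm{PART}(A)^2 \to \mathbb{R}_{\ge 0}$ defined by
\[ e_\beta(\pi, \sigma) = \frac{2 \, d_\beta(\pi, \sigma)}{d_\beta(\pi, \sigma) + H_\beta(\pi) + H_\beta(\sigma)} \]
for $\pi, \sigma \in \mathrm{PART}(A)$ is a metric on $\mathrm{PART}(A)$ such that $0 \le e_\beta(\pi, \sigma) \le 1$.

Proof. It is easy to see that $0 \le e_\beta(\pi, \sigma) \le 1$ since, by Corollary 2.6, $H_\beta(\pi) + H_\beta(\sigma) \ge H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) = d_\beta(\pi, \sigma)$. We need to show only that the triangular inequality is satisfied by $e_\beta$ for $\beta > 1$. We can write:
\[
e_\beta(\pi, \sigma) + e_\beta(\sigma, \tau) = \frac{2 \left( H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) \right)}{H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\pi) + H_\beta(\sigma)} + \frac{2 \left( H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) \right)}{H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) + H_\beta(\sigma) + H_\beta(\tau)}.
\]
Note that
\[
H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\pi) + H_\beta(\sigma) \le H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) + H_\beta(\pi) + H_\beta(\tau)
\]
because $H_\beta(\sigma) \le H_\beta(\sigma \mid \tau) + H_\beta(\tau)$ by Inequality (1) and Axiom (P1). Similarly,
\[
H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) + H_\beta(\sigma) + H_\beta(\tau) \le H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) + H_\beta(\pi) + H_\beta(\tau)
\]
because $H_\beta(\sigma) \le H_\beta(\sigma \mid \pi) + H_\beta(\pi)$. This yields the inequality:
\[
\begin{aligned}
e_\beta(\pi, \sigma) + e_\beta(\sigma, \tau) &\ge \frac{2 \left( H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) \right)}{H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma) + H_\beta(\pi) + H_\beta(\tau)} \\
&= \frac{2}{1 + \dfrac{H_\beta(\pi) + H_\beta(\tau)}{H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\sigma \mid \tau) + H_\beta(\tau \mid \sigma)}} \\
&\ge \frac{2}{1 + \dfrac{H_\beta(\pi) + H_\beta(\tau)}{H_\beta(\pi \mid \tau) + H_\beta(\tau \mid \pi)}} = e_\beta(\pi, \tau),
\end{aligned}
\]
where the last inequality follows from the triangular inequality for $d_\beta$ (Corollary 2.13).

For $\beta = 1$, $e_1(\pi, \sigma) = \frac{d_1(\pi, \sigma)}{H_1(\pi \wedge \sigma)}$, due to Equality (2), which coincides with the expression obtained in [MÁN 91] for the normalized distance.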
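The normalized distance and its Shannon special case can be illustrated numerically. The sketch below takes $c = 1$, represents partitions as label lists, and represents the infimum of two partitions by pairing their labels; returning 0 when the denominator vanishes (both partitions equal to the one-block partition) is an assumption of this sketch, since the text does not address that case.

```python
import math
from collections import Counter, defaultdict

def entropy(labels, beta):
    """H_beta for beta >= 1 with c = 1 (Shannon entropy when beta == 1)."""
    n = len(labels)
    probs = [k / n for k in Counter(labels).values()]
    if beta == 1:
        return -sum(p * math.log2(p) for p in probs)
    return 1.0 - sum(p ** beta for p in probs)

def conditional_entropy(pi, sigma, beta):
    """H_beta(pi | sigma) with linear weights |C_j| / |A|."""
    traces = defaultdict(list)
    for p, s in zip(pi, sigma):
        traces[s].append(p)
    return sum(len(t) / len(pi) * entropy(t, beta) for t in traces.values())

def distance(pi, sigma, beta):
    return conditional_entropy(pi, sigma, beta) + conditional_entropy(sigma, pi, beta)

def normalized_distance(pi, sigma, beta):
    """e_beta = 2 d / (d + H(pi) + H(sigma)); 0 if the denominator is 0 (assumed convention)."""
    d = distance(pi, sigma, beta)
    denom = d + entropy(pi, beta) + entropy(sigma, beta)
    return 0.0 if denom == 0 else 2.0 * d / denom

pi = [0, 0, 0, 1]
sigma = [0, 0, 1, 1]
meet = list(zip(pi, sigma))          # labels of the infimum of pi and sigma

e1 = normalized_distance(pi, sigma, beta=1)
mantaras = distance(pi, sigma, beta=1) / entropy(meet, beta=1)
e2 = normalized_distance(pi, sigma, beta=2)
print(e1, mantaras, e2)
```

For $\beta = 1$ the two printed Shannon values agree up to rounding, which is the coincidence with the normalized distance of [MÁN 91] noted above.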

3. A Monotonicity Property of Generalized Distances

We prove a monotonicity property of the generalized distance $d_\beta$ which shows that the selection of splitting attributes based on the minimal entropic distance $d_\beta$ does not favor attributes with a large number of values.

Theorem 3.1 Let $A$ be a finite set and let $\pi, \pi', \sigma \in \mathrm{PART}(A)$ be three partitions such that $\pi'$ is covered by $\pi$. In other words, $\pi = \{B_1, \ldots, B_m\}$ and $\pi' = \{B_1, \ldots, B_{m-1}, B'_m, B''_m\}$, where $B_m = B'_m \cup B''_m$. Suppose also that there exists a block $C$ of $\sigma$ such that $B_m \subseteq C$. Then, if $\beta \ge 1$, we have $d_\beta(\pi, \sigma) \le d_\beta(\pi', \sigma)$ and $e_\beta(\pi, \sigma) \le e_\beta(\pi', \sigma)$.

Proof. For the case of Shannon's entropy, $\beta = 1$, the inequalities were proven in [MÁN 91]. Therefore, we can assume that $\beta > 1$.

We claim that under the hypothesis of the theorem we have $H_\beta(\sigma \mid \pi) = H_\beta(\sigma \mid \pi')$. Note that $\sigma_{B_m} = \omega_{B_m}$, $\sigma_{B'_m} = \omega_{B'_m}$, and $\sigma_{B''_m} = \omega_{B''_m}$, since $B'_m, B''_m \subseteq B_m \subseteq C$. Therefore, $H_\beta(\sigma_{B_m}) = H_\beta(\sigma_{B'_m}) = H_\beta(\sigma_{B''_m}) = 0$, hence
\[ H_\beta(\sigma \mid \pi) = \sum_{i=1}^{m} \frac{|B_i|}{|A|} H_\beta(\sigma_{B_i}) = \sum_{i=1}^{m-1} \frac{|B_i|}{|A|} H_\beta(\sigma_{B_i}) = H_\beta(\sigma \mid \pi'). \]

Theorem 2.8 implies $H_\beta(\pi \mid \sigma) \le H_\beta(\pi' \mid \sigma)$, which gives the first inequality.

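Note that the second inequality of the theorem:
\[
e_\beta(\pi, \sigma) = \frac{2 \left( H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) \right)}{H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi) + H_\beta(\pi) + H_\beta(\sigma)} \le \frac{2 \left( H_\beta(\pi' \mid \sigma) + H_\beta(\sigma \mid \pi') \right)}{H_\beta(\pi' \mid \sigma) + H_\beta(\sigma \mid \pi') + H_\beta(\pi') + H_\beta(\sigma)} = e_\beta(\pi', \sigma)
\]
is equivalent to
\[ \frac{H_\beta(\sigma) + H_\beta(\pi)}{H_\beta(\pi \mid \sigma) + H_\beta(\sigma \mid \pi)} \ge \frac{H_\beta(\sigma) + H_\beta(\pi')}{H_\beta(\pi' \mid \sigma) + H_\beta(\sigma \mid \pi')}. \qquad (3) \]

Applying the definition of conditional entropy (with the constant $c = 1$) we can write:
\[ H_\beta(\pi \mid \sigma) - H_\beta(\pi' \mid \sigma) = \frac{|C|}{|A|} \left[ \frac{|B'_m|^{\beta}}{|C|^{\beta}} + \frac{|B''_m|^{\beta}}{|C|^{\beta}} - \frac{|B_m|^{\beta}}{|C|^{\beta}} \right] \]
and
\[ H_\beta(\pi) - H_\beta(\pi') = \frac{|B'_m|^{\beta} + |B''_m|^{\beta} - |B_m|^{\beta}}{|A|^{\beta}}, \]
which implies
\[ H_\beta(\pi \mid \sigma) - H_\beta(\pi' \mid \sigma) = \left( \frac{|A|}{|C|} \right)^{\beta - 1} \left[ H_\beta(\pi) - H_\beta(\pi') \right]. \qquad (4) \]
Since $|A| / |C| \ge 1$ and $H_\beta(\pi') \ge H_\beta(\pi)$, we obtain:
\[ H_\beta(\pi' \mid \sigma) - H_\beta(\pi \mid \sigma) \ge H_\beta(\pi') - H_\beta(\pi). \qquad (5) \]

By denoting $a = H_\beta(\sigma)$ and $b = H_\beta(\sigma \mid \pi) = H_\beta(\sigma \mid \pi')$, Inequality (3) can be written as
\[ \frac{a + H_\beta(\pi)}{H_\beta(\pi \mid \sigma) + b} \ge \frac{a + H_\beta(\pi')}{H_\beta(\pi' \mid \sigma) + b}, \]
which, after elementary transformations, can be written as
\[ H_\beta(\pi' \mid \sigma) - H_\beta(\pi \mid \sigma) \ge \frac{b + H_\beta(\pi \mid \sigma)}{a + H_\beta(\pi)} \left( H_\beta(\pi') - H_\beta(\pi) \right), \]
which is implied by Inequality (5) because
\[ \frac{b + H_\beta(\pi \mid \sigma)}{a + H_\beta(\pi)} \le 1. \]
This proves the second inequality of the theorem.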
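Theorem 3.1 can be illustrated on a concrete instance. The sketch below assumes $\beta = 2$ and $c = 1$, encodes partitions of a 6-element set as label lists, and splits one block of `pi` that lies inside a block of `sigma`; all names and the example data are hypothetical.

```python
from collections import Counter, defaultdict

def entropy(labels, beta=2.0):
    """H_beta (beta > 1, c = 1) of the partition induced by the block labels."""
    n = len(labels)
    return 1.0 - sum((k / n) ** beta for k in Counter(labels).values())

def conditional_entropy(pi, sigma, beta=2.0):
    """H_beta(pi | sigma) with linear weights |C_j| / |A|."""
    traces = defaultdict(list)
    for p, s in zip(pi, sigma):
        traces[s].append(p)
    return sum(len(t) / len(pi) * entropy(t, beta) for t in traces.values())

def d(pi, sigma, beta=2.0):
    return conditional_entropy(pi, sigma, beta) + conditional_entropy(sigma, pi, beta)

def e(pi, sigma, beta=2.0):
    dist = d(pi, sigma, beta)
    return 2.0 * dist / (dist + entropy(pi, beta) + entropy(sigma, beta))

sigma = [0, 0, 0, 1, 1, 1]        # target partition with blocks {0,1,2} and {3,4,5}
pi = [0, 0, 1, 2, 2, 2]           # the block {0,1} lies inside the block {0,1,2} of sigma
pi_split = [0, 3, 1, 2, 2, 2]     # same partition with {0,1} split into {0} and {1}

print(d(pi, sigma), d(pi_split, sigma))
print(e(pi, sigma), e(pi_split, sigma))
```

Splitting the block increases both distances to the target partition, so the finer attribute is not preferred when the split attribute is chosen by minimum distance.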

4. Experimental Results and Conclusions

The experiments have been conducted on 33 datasets from the UCI Machine Learning Repository. The J48 tree builder from the Weka package has been used, in its original form, as well as modified to support generalized entropy distances for different values of the $\beta$ parameter. The selection of the splitting attribute in the modified algorithm was based on the minimum entropic distance between the partition generated by the attribute and the target partition. Each experiment used 5-fold cross validation, the outcomes of the 5 runs were averaged, and each experiment was performed with and without pruning.
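The attribute selection step described above might look like the following sketch. It is not the modified J48 code used in the experiments: it assumes the unnormalized distance $d_\beta$ with $\beta = 2$ and $c = 1$, treats each attribute column and the target column as the partition of the rows induced by equal values, and uses hypothetical names (`select_split_attribute`, `columns`) and toy data.

```python
from collections import Counter, defaultdict

def entropy(labels, beta=2.0):
    """H_beta (beta > 1, c = 1) of the partition induced by the values."""
    n = len(labels)
    return 1.0 - sum((k / n) ** beta for k in Counter(labels).values())

def conditional_entropy(pi, sigma, beta=2.0):
    """H_beta(pi | sigma) with linear weights |C_j| / |A|."""
    traces = defaultdict(list)
    for p, s in zip(pi, sigma):
        traces[s].append(p)
    return sum(len(t) / len(pi) * entropy(t, beta) for t in traces.values())

def distance(pi, sigma, beta=2.0):
    return conditional_entropy(pi, sigma, beta) + conditional_entropy(sigma, pi, beta)

def select_split_attribute(columns, target, beta=2.0):
    """Return the attribute whose induced partition is closest to the target partition."""
    return min(columns, key=lambda name: distance(columns[name], target, beta))

# Toy data: four rows, three candidate attributes (hypothetical example).
target = ["yes", "yes", "no", "no"]
columns = {
    "a": [1, 1, 2, 2],     # matches the target partition exactly
    "b": [1, 2, 1, 2],     # carries no information about the target
    "id": [1, 2, 3, 4],    # many-valued attribute that identifies each row
}

best = select_split_attribute(columns, target)
print(best, {name: distance(col, target) for name, col in columns.items()})
```

The row-identifier attribute determines the target completely, yet its distance to the target partition is positive because the target does not determine it, so it is not preferred over the two-valued attribute `a`.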

The tree size and the number of leaves diminish for 18 of the 33 databases analysed and grow for the remaining 15. The best reduction in size was achieved for the primary-tumor database, where the size of the tree was reduced to 37% for $\beta = 2.5$ and the number of leaves was reduced to 38.8% compared to the standard J48 algorithm, which makes use of the gain ratio. On the other hand, the largest increase in size and number of leaves was recorded for the pima-diabetes database, where for $\beta = 1$ we have an increase to 260% in size and to 256% in the number of leaves, though such an increase occurs rarely among the 15 databases where increases occur.

In Figure 1 we show the comparative performance of the generalized entropy approach compared to the standard gain ratio for the databases which yielded the best results (audiology, hepatitis, and primary-tumor), in the case of the pruned trees. The 100% level refers in each case to the gain-ratio algorithm. It is interesting to observe that the accuracy diminishes slightly or improves slightly, as shown in Table 1, thus confirming previous results [MÁN 91, BRE 98, MIN 89] that accuracy is not affected substantially by the method used for attribute selection.

5. References

[BRE 98] BREIMAN L., FRIEDMAN J. H., OLSHEN R. A., STONE C. J., Classification and Regression Trees, Chapman and Hall, Boca Raton, 1998.

[LER 81] LERMAN I. C., Classification et analyse ordinale des données, Dunod, Paris, 1981.

[Figure 1: two rows of bar charts, "Tree size" and "Number of leaves", one chart per database (audiology, hepatitis, primary-tumor); the vertical axes run from 0 to 120 and the bars correspond to the $\beta$ factor values $\beta = 1$, $\beta = 1.5$, $\beta = 2$, and $\beta = 2.5$.]

Figure 1. Comparative Experimental Results

Database        J48      β = 1    β = 1.5  β = 2    β = 2.5
audiology       78.76%   73.42%   73.86%   73.86%   71.64%
hepatitis       78.06%   83.22%   83.22%   83.87%   83.87%
primary-tumor   40.99%   43.34%   41.87%   43.34%   43.05%

Table 1. Accuracy Results

[MÁN 91] DE MÁNTARAS R. L., "A Distance-Based Attribute Selection Measure for Decision Tree Induction", Machine Learning, vol. 6, 1991, p. 81–92.

[MIN 89] MINGERS J., "An Empirical Comparison of Selection Measures for Decision Tree Induction", Machine Learning, vol. 3, 1989, p. 319–342.

[SIM 02] SIMOVICI D. A., JAROSZEWICZ S., "An Axiomatization of Partition Entropy", IEEE Transactions on Information Theory, vol. 48, 2002, p. 2138–2142.