Generalized Entropy and Decision Trees
Dan A. Simovici, Szymon Jaroszewicz
Univ. of Massachusetts Boston, Dept. of Computer Science, Boston, Massachusetts 02125 USA
{dsim,sj}@cs.umb.edu
ABSTRACT. We introduce an extension of the notion of Shannon conditional entropy to a more general form of conditional entropy that captures both the conditional Shannon entropy and a similar notion related to the Gini index. The proposed family of conditional entropies generates a collection of metrics over the set of partitions of finite sets, which can be used to construct decision trees. Experimental results suggest that by varying the parameter that defines the entropy it is possible to obtain smaller decision trees for certain databases without sacrificing accuracy.
RÉSUMÉ. We present an extension of the notion of Shannon conditional entropy to a more general form of conditional entropy that formalizes both the Shannon conditional entropy and a similar notion related to the Gini index. The proposed family of conditional entropies produces a collection of metrics on the set of partitions of finite sets, which can be used to build decision trees. The experimental results suggest that, by changing the parameter that defines the entropy, it is possible to obtain smaller decision trees for certain databases without sacrificing classification accuracy.
KEYWORDS: Shannon entropy, Gini index, generalized conditional entropy, metric, partition, decision tree
MOTS-CLÉS: Shannon entropy, Gini index, generalized conditional entropy, metric, partition, decision tree
1. Introduction
In [SIM 02] we introduced an axiomatization of a general notion of entropy for partitions of finite sets. The system of axioms that we proposed shows the common nature of Shannon entropy and of other measures of distribution concentration such as the Gini index.
Let $\mathrm{PART}(A)$ be the set of partitions of the nonempty set $A$. The class of all partitions of finite sets is denoted by $\mathrm{PART}$. The one-block partition of $A$ is denoted by $\omega_A$. The partition $\{\{a\} \mid a \in A\}$ is denoted by $\iota_A$. If $\pi, \pi' \in \mathrm{PART}(A)$, then $\pi \le \pi'$ if every block of $\pi$ is included in a block of $\pi'$. Clearly, for every $\pi \in \mathrm{PART}(A)$ we have $\iota_A \le \pi \le \omega_A$.
The partially ordered set $(\mathrm{PART}(A), \le)$ is a lattice (see, for example, the very lucid study of this lattice in [LER 81]). If $\sigma, \sigma' \in \mathrm{PART}(A)$, then $\sigma'$ covers $\sigma$ if $\sigma \le \sigma'$ and there is no partition $\sigma_1 \in \mathrm{PART}(A)$ such that $\sigma < \sigma_1 < \sigma'$. This is denoted by $\sigma \prec \sigma'$. It is easy to see that $\sigma \prec \sigma'$ if and only if $\sigma'$ can be obtained from $\sigma$ by fusing two of its blocks into a block of $\sigma'$. The largest element of $\mathrm{PART}(A)$ is the one-block partition $\omega_A$; the least element is the partition $\iota_A = \{\{a\} \mid a \in A\}$. The infimum of two partitions $\pi, \sigma \in \mathrm{PART}(A)$ will be denoted by $\pi \wedge \sigma$.
If $A, B$ are two disjoint and nonempty sets, $\pi \in \mathrm{PART}(A)$, $\sigma \in \mathrm{PART}(B)$, where $\pi = \{A_1, \ldots, A_m\}$, $\sigma = \{B_1, \ldots, B_n\}$, then the partition $\pi + \sigma$ is the partition of $A \cup B$ given by $\pi + \sigma = \{A_1, \ldots, A_m, B_1, \ldots, B_n\}$.
Whenever the "$+$" operation is defined, it is easily seen to be associative. In other words, if $A, B, C$ are pairwise disjoint and nonempty sets, and $\pi \in \mathrm{PART}(A)$, $\sigma \in \mathrm{PART}(B)$, $\tau \in \mathrm{PART}(C)$, then $\pi + (\sigma + \tau) = (\pi + \sigma) + \tau$. Observe that if $A, B$ are disjoint, then $\iota_A + \iota_B = \iota_{A \cup B}$. Also, $\omega_A + \omega_B$ is the partition $\{A, B\}$ of the set $A \cup B$.
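If $\pi = \{A_1, \ldots, A_m\}$, $\sigma = \{B_1, \ldots, B_n\}$ are partitions of two arbitrary sets, then we denote the partition $\{A_i \times B_j \mid 1 \le i \le m, 1 \le j \le n\}$ of $A \times B$ by $\pi \times \sigma$. Note that $\iota_A \times \iota_B = \iota_{A \times B}$ and $\omega_A \times \omega_B = \omega_{A \times B}$.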
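The partition operations introduced so far can be sketched in a few lines of Python. This is only an illustrative sketch: the representation (a partition as a frozenset of frozensets) and the function names `canon`, `omega`, `iota`, `refines`, `meet`, `plus`, and `times` are assumptions made for this example and do not come from the paper.

```python
from itertools import product

def canon(blocks):
    """Represent a partition as a frozenset of frozensets (order-free)."""
    return frozenset(frozenset(b) for b in blocks)

def omega(A):
    """One-block partition of A."""
    return canon([A])

def iota(A):
    """Partition of A into singletons."""
    return canon([{a} for a in A])

def refines(pi, pi2):
    """pi <= pi2: every block of pi is included in a block of pi2."""
    return all(any(b <= b2 for b2 in pi2) for b in pi)

def meet(pi, sigma):
    """Infimum of two partitions of the same set: nonempty block intersections."""
    return canon(b & c for b in pi for c in sigma if b & c)

def plus(pi, sigma):
    """Sum of partitions of two disjoint sets (union of the block families)."""
    return pi | sigma

def times(pi, sigma):
    """Product partition of A x B, with blocks A_i x B_j."""
    return canon(set(product(b, c)) for b in pi for c in sigma)

A = {0, 1, 2, 3}
pi = canon([{0, 1, 2}, {3}])
sigma = canon([{0, 1}, {2, 3}])
print(refines(iota(A), pi), refines(pi, omega(A)))  # True True
print(meet(pi, sigma) == canon([{0, 1}, {2}, {3}]))  # True
```

Using frozensets makes equality of partitions independent of the order in which blocks are listed, which keeps the comparisons above simple.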
The axiomatization introduced in [SIM 02] consists of four axioms satisfied by several types of entropy-like characteristics of partitions.
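Definition 1.1 Let $\beta \in \mathbb{R}$, $\beta > 0$, and let $\Phi : \mathbb{R}_{\ge 0}^2 \longrightarrow \mathbb{R}_{\ge 0}$ be a continuous function such that $\Phi(x, y) = \Phi(y, x)$ and $\Phi(x, 0) = x$ for $x, y \in \mathbb{R}_{\ge 0}$.
A $(\Phi, \beta)$-system of axioms for a partition entropy $\mathcal{H} : \mathrm{PART}(A) \longrightarrow \mathbb{R}_{\ge 0}$ consists of the following axioms:
(P1) If $\pi, \pi' \in \mathrm{PART}(A)$ are such that $\pi \le \pi'$, then $\mathcal{H}(\pi') \le \mathcal{H}(\pi)$.
(P2) If $A, B$ are two finite sets such that $|A| \le |B|$, then $\mathcal{H}(\iota_A) \le \mathcal{H}(\iota_B)$.
(P3) For every disjoint sets $A, B$ and partitions $\pi \in \mathrm{PART}(A)$ and $\sigma \in \mathrm{PART}(B)$ we have:
\[ \mathcal{H}(\pi + \sigma) = \left(\frac{|A|}{|A| + |B|}\right)^{\beta} \mathcal{H}(\pi) + \left(\frac{|B|}{|A| + |B|}\right)^{\beta} \mathcal{H}(\sigma) + \mathcal{H}(\{A, B\}). \]
(P4) If $\pi \in \mathrm{PART}(A)$ and $\sigma \in \mathrm{PART}(B)$, then $\mathcal{H}(\pi \times \sigma) = \Phi(\mathcal{H}(\pi), \mathcal{H}(\sigma))$.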
Observe that we postulate that $\mathcal{H}(\pi) \ge 0$ for any partition $\pi$, since the range of every function $\mathcal{H}$ is $\mathbb{R}_{\ge 0}$.
For a choice of $\beta$ these axioms determine an entropy function $\mathcal{H}_\beta$ up to a constant factor. The same choice also determines the function $\Phi$. The entropies defined for $\beta \ne 1$ were named non-Shannon entropies. In this case, for a partition $\pi = \{A_1, \ldots, A_n\} \in \mathrm{PART}(A)$ we have:
\[ \mathcal{H}_\beta(\pi) = k \left( 1 - \sum_{j=1}^{n} \left( \frac{|A_j|}{|A|} \right)^{\beta} \right), \]
where $k$ is a constant that satisfies the inequality $k(\beta - 1) > 0$. Thus, for $\beta > 1$ we have
\[ \mathcal{H}_\beta(\pi) = c \left( 1 - \sum_{j=1}^{n} \left( \frac{|A_j|}{|A|} \right)^{\beta} \right), \]
and for $\beta < 1$ we have
\[ \mathcal{H}_\beta(\pi) = c \left( \sum_{j=1}^{n} \left( \frac{|A_j|}{|A|} \right)^{\beta} - 1 \right), \]
for some positive constant $c$, where $c = k$ if $\beta > 1$, and $c = -k$ when $\beta < 1$. In either case, we have $\Phi(x, y) = x + y - \frac{1}{k} x y$ for $x, y \in \mathbb{R}_{\ge 0}$.
The case $\beta = 1$ yields the Shannon entropy, that is
\[ \mathcal{H}_1(\pi) = -c \sum_{j=1}^{n} \frac{|A_j|}{|A|} \log_2 \frac{|A_j|}{|A|}. \]
Also, if $\beta = 1$, then $\Phi(x, y) = x + y$ for $x, y \in \mathbb{R}_{\ge 0}$.
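As a hedged illustration of the formulas above, the following Python sketch computes $\mathcal{H}_\beta$ from the block sizes of a partition and checks Axiom (P4) numerically on one product partition. The function names `generalized_entropy` and `phi`, and the default $c = 1$, are assumptions of this example, not notation from the paper.

```python
import math

def generalized_entropy(sizes, beta, c=1.0):
    """H_beta of a partition given its (positive) block sizes; c > 0 is the free constant."""
    n = sum(sizes)
    probs = [s / n for s in sizes]
    if beta == 1:
        # Shannon case
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    # c(1 - sum) for beta > 1, c(sum - 1) for beta < 1
    return c * (1 - total) if beta > 1 else c * (total - 1)

def phi(x, y, beta, c=1.0):
    """Phi(x, y) = x + y - xy/k, with k = c for beta > 1 and k = -c for beta < 1."""
    if beta == 1:
        return x + y
    k = c if beta > 1 else -c
    return x + y - x * y / k

# beta = 2 with c = 1 gives the Gini-type value 1 - sum of squared proportions
print(generalized_entropy([2, 2], 2.0))  # 0.5

# Axiom (P4): the block sizes of pi x sigma are the pairwise products of block sizes
s1, s2 = [2, 2], [1, 3]
prod_sizes = [a * b for a in s1 for b in s2]
for beta in (0.5, 1, 2.0):
    lhs = generalized_entropy(prod_sizes, beta)
    rhs = phi(generalized_entropy(s1, beta), generalized_entropy(s2, beta), beta)
    print(beta, abs(lhs - rhs) < 1e-12)
```

Working with block sizes alone is enough here because $\mathcal{H}_\beta$ depends only on the relative sizes $|A_j|/|A|$.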
2. Generalized Conditional Entropy
The generalized entropies previously introduced generate corresponding generalized conditional entropies. Let $\pi \in \mathrm{PART}(A)$ and let $C \subseteq A$. Denote by $\pi_C$ the "trace" of $\pi$ on $C$ given by $\pi_C = \{B \cap C \mid B \in \pi \text{ such that } B \cap C \ne \emptyset\}$. Clearly, $\pi_C \in \mathrm{PART}(C)$; also, if $C$ is a block of $\pi$, then $\pi_C = \omega_C$.
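Definition 2.1 The conditional entropy defined by the $(\Phi, \beta)$-entropy $\mathcal{H}_\beta$ is the function $\mathcal{H}_\beta : \mathrm{PART}(A)^2 \longrightarrow \mathbb{R}_{\ge 0}$ given by:
\[ \mathcal{H}_\beta(\pi | \sigma) = \sum_{j=1}^{n} \frac{|C_j|}{|A|} \, \mathcal{H}_\beta(\pi_{C_j}), \]
where $\pi, \sigma \in \mathrm{PART}(A)$ and $\sigma = \{C_1, \ldots, C_n\}$.
Observe that $\mathcal{H}_\beta(\pi | \omega_A) = \mathcal{H}_\beta(\pi)$.
A direct consequence of the Axioms is that $\mathcal{H}(\omega_A) = 0$ for any set $A$ (Lemma II.2 from [SIM 02]). The following reciprocal result also holds: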
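Lemma 2.2 Let $A$ be a finite set and let $\pi \in \mathrm{PART}(A)$ be such that $\mathcal{H}_\beta(\pi) = 0$. Then, $\pi = \omega_A$.

Proof. Suppose that $\mathcal{H}_\beta(\pi) = 0$ but $\pi < \omega_A$. Then, there exists a block $C$ of $\pi$ such that $\emptyset \subset C \subset A$. If $\theta = \{C, A - C\}$, then clearly we have $\pi \le \theta$, so $0 \le \mathcal{H}_\beta(\theta) \le \mathcal{H}_\beta(\pi)$, which implies $\mathcal{H}_\beta(\theta) = 0$. If $\beta > 1$, then
\[ \mathcal{H}_\beta(\theta) = c \left( 1 - \left( \frac{|C|}{|A|} \right)^{\beta} - \left( \frac{|A - C|}{|A|} \right)^{\beta} \right) = 0. \]
The function $f(x) = x^\beta + (1 - x)^\beta$ is strictly convex on $[0, 1]$ when $\beta > 1$ and takes the value 1 at both endpoints, so $f(x) = 1$ implies either $C = A$ or $C = \emptyset$, which is a contradiction. Thus, $\pi = \omega_A$. A similar argument works for the other cases.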
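Theorem 2.3 Let $A$ be a finite set and let $\pi, \sigma \in \mathrm{PART}(A)$. We have $\mathcal{H}_\beta(\pi | \sigma) = 0$ if and only if $\sigma \le \pi$.

Proof. Suppose that $\sigma = \{C_1, \ldots, C_n\}$. If $\sigma \le \pi$, then $\pi_{C_j} = \omega_{C_j}$ for $1 \le j \le n$, so $\mathcal{H}_\beta(\pi | \sigma) = 0$. Conversely, suppose that
\[ \mathcal{H}_\beta(\pi | \sigma) = \sum_{j=1}^{n} \frac{|C_j|}{|A|} \, \mathcal{H}_\beta(\pi_{C_j}) = 0. \]
This implies $\mathcal{H}_\beta(\pi_{C_j}) = 0$ for $1 \le j \le n$, so $\pi_{C_j} = \omega_{C_j}$ for $1 \le j \le n$ by Lemma 2.2. This means that every block $C_j$ of $\sigma$ is included in a block of $\pi$, which implies $\sigma \le \pi$.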
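The definition of the conditional entropy and the statement of Theorem 2.3 can be illustrated with a small Python sketch. Partitions are represented as lists of Python sets; the names `entropy`, `trace`, `cond_entropy`, and `refines`, and the choice $c = 1$, are assumptions made for this example only.

```python
import math

def entropy(blocks, beta, c=1.0):
    """H_beta of a partition given as a list of nonempty, disjoint sets."""
    n = sum(len(b) for b in blocks)
    probs = [len(b) / n for b in blocks]
    if beta == 1:
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    return c * (1 - total) if beta > 1 else c * (total - 1)

def trace(pi, C):
    """Trace of pi on C: the nonempty intersections of the blocks of pi with C."""
    return [B & C for B in pi if B & C]

def cond_entropy(pi, sigma, beta, c=1.0):
    """H_beta(pi | sigma) = sum over blocks C of sigma of |C|/|A| * H_beta(pi_C)."""
    n = sum(len(C) for C in sigma)
    return sum(len(C) / n * entropy(trace(pi, C), beta, c) for C in sigma)

def refines(sigma, pi):
    """sigma <= pi: every block of sigma is included in a block of pi."""
    return all(any(C <= B for B in pi) for C in sigma)

pi = [{0, 1, 2}, {3, 4, 5}]
sigma = [{0, 1}, {2}, {3, 4, 5}]

# sigma <= pi, so the conditional entropy of pi given sigma vanishes
print(refines(sigma, pi), cond_entropy(pi, sigma, 2.0))
# pi does not refine sigma, so the conditional entropy of sigma given pi is positive
print(refines(pi, sigma), cond_entropy(sigma, pi, 2.0))
```

In the second call only the block $\{0, 1, 2\}$ of $\pi$ contributes, with weight $3/6$ and trace entropy $1 - (4/9 + 1/9) = 4/9$ for $\beta = 2$.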
Note that the partition $\pi \wedge \sigma$, whose blocks consist of the nonempty intersections of the blocks of $\pi$ and $\sigma$, can be written as $\pi \wedge \sigma = \pi_{C_1} + \cdots + \pi_{C_n} = \sigma_{B_1} + \cdots + \sigma_{B_m}$. Therefore, by Corollary II.7 of [SIM 02], we have:
\[ \mathcal{H}_\beta(\pi \wedge \sigma) = \sum_{j=1}^{n} \left( \frac{|C_j|}{|A|} \right)^{\beta} \mathcal{H}_\beta(\pi_{C_j}) + \mathcal{H}_\beta(\sigma). \]
For those entropies with $\beta > 1$ we have
\[ \mathcal{H}_\beta(\pi \wedge \sigma) \le \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma), \qquad (1) \]
while for those having $\beta < 1$, the reverse inequality holds. In the case of Shannon entropy, $\beta = 1$ and
\[ \mathcal{H}_1(\pi \wedge \sigma) = \mathcal{H}_1(\pi | \sigma) + \mathcal{H}_1(\sigma) = \mathcal{H}_1(\sigma | \pi) + \mathcal{H}_1(\pi). \qquad (2) \]

Lemma 2.4 Let $a, b \in [0, 1]$ be such that $a + b = 1$. Then, for $\beta > 1$ we have:
\[ \sum_{i=1}^{n} (a x_i + b y_i)^{\beta} \le a \sum_{i=1}^{n} x_i^{\beta} + b \sum_{i=1}^{n} y_i^{\beta}, \]
for every $x_1, \ldots, x_n, y_1, \ldots, y_n \in [0, 1]$. For $\beta < 1$, the reverse inequality holds.

Proof. The convexity of the function $f(x) = x^\beta$ for $\beta > 1$ on the interval $[0, 1]$ implies $(a x_i + b y_i)^\beta \le a x_i^\beta + b y_i^\beta$. Summing up these inequalities gives the desired inequality.
The second part of the lemma follows from the concavity of the function $g(x) = x^\beta$ on $[0, 1]$ when $\beta < 1$.

Theorems 2.5 and 2.8 extend well-known monotonicity properties of Shannon entropy.
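Theorem 2.5 If $\pi, \sigma, \sigma'$ are partitions of the finite set $A$ such that $\sigma \le \sigma'$, then $\mathcal{H}_\beta(\pi | \sigma) \le \mathcal{H}_\beta(\pi | \sigma')$ for $\beta > 0$.

Proof. To prove this statement it suffices to consider only the case when $\sigma \prec \sigma'$.
Suppose initially that $\beta > 1$.
Let $\sigma, \sigma' \in \mathrm{PART}(A)$ be such that $\sigma \prec \sigma'$. Suppose that $D, E$ are blocks of $\sigma$ such that $C = D \cup E$, where $C$ is a block of $\sigma'$; the partition $\pi$ is $\{B_1, \ldots, B_n\}$.
Define $x_i = \frac{|B_i \cap D|}{|D|}$ and $y_i = \frac{|B_i \cap E|}{|E|}$ for $1 \le i \le n$. If we choose $a = \frac{|D|}{|C|}$ and $b = \frac{|E|}{|C|}$, then
\[ |C| \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} \le |D| \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} + |E| \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}}, \]
by Lemma 2.4. Consequently, we can write:
\[ \begin{aligned}
\mathcal{H}_\beta(\pi | \sigma) &= \cdots + \frac{|D|}{|A|} \mathcal{H}_\beta(\pi_D) + \frac{|E|}{|A|} \mathcal{H}_\beta(\pi_E) + \cdots \\
&= \cdots + \frac{|D|}{|A|} \, c \left( 1 - \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} \right) + \frac{|E|}{|A|} \, c \left( 1 - \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}} \right) + \cdots \\
&\le \cdots + \frac{|C|}{|A|} \, c \left( 1 - \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} \right) + \cdots = \mathcal{H}_\beta(\pi | \sigma').
\end{aligned} \]
For $\beta < 1$ we have
\[ |C| \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} \ge |D| \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} + |E| \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}}, \]
by the second part of Lemma 2.4. Thus,
\[ \begin{aligned}
\mathcal{H}_\beta(\pi | \sigma) &= \cdots + \frac{|D|}{|A|} \mathcal{H}_\beta(\pi_D) + \frac{|E|}{|A|} \mathcal{H}_\beta(\pi_E) + \cdots \\
&= \cdots + \frac{|D|}{|A|} \, c \left( \sum_{i=1}^{n} \frac{|B_i \cap D|^{\beta}}{|D|^{\beta}} - 1 \right) + \frac{|E|}{|A|} \, c \left( \sum_{i=1}^{n} \frac{|B_i \cap E|^{\beta}}{|E|^{\beta}} - 1 \right) + \cdots \\
&\le \cdots + \frac{|C|}{|A|} \, c \left( \sum_{i=1}^{n} \frac{|B_i \cap C|^{\beta}}{|C|^{\beta}} - 1 \right) + \cdots = \mathcal{H}_\beta(\pi | \sigma').
\end{aligned} \]
For $\beta = 1$ the inequality is a well-known property of Shannon entropy.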
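Corollary 2.6 For every $\pi, \sigma \in \mathrm{PART}(A)$ and $\beta > 0$, we have $\mathcal{H}_\beta(\pi | \sigma) \le \mathcal{H}_\beta(\pi)$.

Proof. Since $\sigma \le \omega_A$, by Theorem 2.5 we have $\mathcal{H}_\beta(\pi | \sigma) \le \mathcal{H}_\beta(\pi | \omega_A) = \mathcal{H}_\beta(\pi)$.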
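Corollary 2.7 Let $A$ be a finite set. For $\beta \ge 1$ we have $\mathcal{H}_\beta(\pi \wedge \sigma) \le \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma)$ for every $\pi, \sigma \in \mathrm{PART}(A)$.

Proof. By Inequality (1) and by Corollary 2.6 we have $\mathcal{H}_\beta(\pi \wedge \sigma) \le \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma) \le \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma)$.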
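Inequality (1), Corollary 2.6, and Corollary 2.7 can be checked by brute force on a small set. The sketch below enumerates all 15 partitions of a four-element set and tests every pair; it is a numerical sanity check under the assumption $c = 1$, not a proof, and all function names are choices made for this example.

```python
import math

def entropy(blocks, beta, c=1.0):
    """H_beta of a partition given as a list of nonempty, disjoint sets."""
    n = sum(len(b) for b in blocks)
    probs = [len(b) / n for b in blocks]
    if beta == 1:
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    return c * (1 - total) if beta > 1 else c * (total - 1)

def cond_entropy(pi, sigma, beta, c=1.0):
    """H_beta(pi | sigma), weighting each block C of sigma by |C|/|A|."""
    n = sum(len(C) for C in sigma)
    return sum(len(C) / n * entropy([B & C for B in pi if B & C], beta, c)
               for C in sigma)

def meet(pi, sigma):
    """Infimum of two partitions of the same set."""
    return [B & C for B in pi for C in sigma if B & C]

def set_partitions(items):
    """Yield every partition of the list `items` exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):          # add `first` to an existing block
            yield p[:i] + [p[i] | {first}] + p[i + 1:]
        yield p + [{first}]              # or open a new block for it

def check(beta, tol=1e-12):
    """Test (1), Corollary 2.6 and Corollary 2.7 on all pairs of partitions."""
    parts = list(set_partitions([0, 1, 2, 3]))
    for pi in parts:
        for sigma in parts:
            h_meet = entropy(meet(pi, sigma), beta)
            h_cond = cond_entropy(pi, sigma, beta)
            if h_meet > h_cond + entropy(sigma, beta) + tol:
                return False
            if h_cond > entropy(pi, beta) + tol:
                return False
            if h_meet > entropy(pi, beta) + entropy(sigma, beta) + tol:
                return False
    return True

print(check(2.0), check(1))
```

The small tolerance `tol` only absorbs floating-point rounding, which matters in the Shannon case where (1) holds with equality.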
Theorem 2.8 If $\pi, \pi', \sigma$ are partitions of the finite set $A$ such that $\pi \le \pi'$, then $\mathcal{H}_\beta(\pi | \sigma) \ge \mathcal{H}_\beta(\pi' | \sigma)$.

Proof. Suppose that $\sigma = \{C_1, \ldots, C_n\}$. Then, it is clear that $\pi_{C_j} \le \pi'_{C_j}$ for $1 \le j \le n$. Therefore, $\mathcal{H}_\beta(\pi_{C_j}) \ge \mathcal{H}_\beta(\pi'_{C_j})$ by Axiom (P1), which implies immediately the desired inequality.
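Lemma 2.9 Let $A$ be a nonempty set and let $\{A', A''\}$ be a two-block partition of $A$. If $\pi \in \mathrm{PART}(A)$, $\sigma' \in \mathrm{PART}(A')$, and $\sigma'' \in \mathrm{PART}(A'')$, then
\[ \mathcal{H}_\beta(\pi | \sigma' + \sigma'') = \frac{|A'|}{|A|} \mathcal{H}_\beta(\pi' | \sigma') + \frac{|A''|}{|A|} \mathcal{H}_\beta(\pi'' | \sigma''), \]
where $\pi' = \pi_{A'}$ and $\pi'' = \pi_{A''}$.

Proof. Note that $\sigma' + \sigma''$ is a partition of $A$. Thus, we can write:
\[ \begin{aligned}
\mathcal{H}_\beta(\pi | \sigma' + \sigma'') &= \sum_{E \in \sigma'} \frac{|E|}{|A|} \mathcal{H}_\beta(\pi_E) + \sum_{F \in \sigma''} \frac{|F|}{|A|} \mathcal{H}_\beta(\pi_F) \\
&= \sum_{E \in \sigma'} \frac{|E|}{|A|} \mathcal{H}_\beta(\pi'_E) + \sum_{F \in \sigma''} \frac{|F|}{|A|} \mathcal{H}_\beta(\pi''_F) \\
&= \frac{|A'|}{|A|} \mathcal{H}_\beta(\pi' | \sigma') + \frac{|A''|}{|A|} \mathcal{H}_\beta(\pi'' | \sigma'').
\end{aligned} \]

Theorem 2.10 Let $A$ be a nonempty set and let $\{A_1, \ldots, A_\ell\}$ be a partition of $A$. If $\pi \in \mathrm{PART}(A)$, $\sigma_k \in \mathrm{PART}(A_k)$ for $1 \le k \le \ell$, then
\[ \mathcal{H}_\beta(\pi | \sigma_1 + \cdots + \sigma_\ell) = \sum_{k=1}^{\ell} \frac{|A_k|}{|A|} \mathcal{H}_\beta(\pi_k | \sigma_k), \]
where $\pi_k = \pi_{A_k}$ for $1 \le k \le \ell$.

Proof. The result follows immediately from Lemma 2.9 due to the associativity of the partial operation "$+$".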
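Theorem 2.11 If $\beta > 1$, then for every three partitions $\pi, \sigma, \tau$ of a finite set $A$ we have
\[ \mathcal{H}_\beta(\pi | \sigma \wedge \tau) + \mathcal{H}_\beta(\sigma | \tau) \ge \mathcal{H}_\beta(\pi \wedge \sigma | \tau). \]
If $\beta < 1$ we have the reverse inequality, and for $\beta = 1$ we have the equality
\[ \mathcal{H}_\beta(\pi | \sigma \wedge \tau) + \mathcal{H}_\beta(\sigma | \tau) = \mathcal{H}_\beta(\pi \wedge \sigma | \tau). \]

Proof. Suppose that $\pi = \{B_1, \ldots, B_m\}$, $\sigma = \{C_1, \ldots, C_n\}$, and $\tau = \{D_1, \ldots, D_\ell\}$.
We noted already that $\sigma \wedge \tau = \sigma_{D_1} + \cdots + \sigma_{D_\ell} = \tau_{C_1} + \cdots + \tau_{C_n}$. Consequently, by Theorem 2.10 we have
\[ \mathcal{H}_\beta(\pi | \sigma \wedge \tau) = \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} \mathcal{H}_\beta(\pi_{D_k} | \sigma_{D_k}). \]
Also, we have
\[ \mathcal{H}_\beta(\sigma | \tau) = \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} \mathcal{H}_\beta(\sigma_{D_k}). \]
If $\beta > 1$ we saw that $\mathcal{H}_\beta(\pi_{D_k} \wedge \sigma_{D_k}) \le \mathcal{H}_\beta(\pi_{D_k} | \sigma_{D_k}) + \mathcal{H}_\beta(\sigma_{D_k})$ for every $k$, $1 \le k \le \ell$, which implies
\[ \begin{aligned}
\mathcal{H}_\beta(\pi | \sigma \wedge \tau) + \mathcal{H}_\beta(\sigma | \tau) &\ge \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} \mathcal{H}_\beta(\pi_{D_k} \wedge \sigma_{D_k}) \\
&= \sum_{k=1}^{\ell} \frac{|D_k|}{|A|} \mathcal{H}_\beta((\pi \wedge \sigma)_{D_k}) \\
&= \mathcal{H}_\beta(\pi \wedge \sigma | \tau).
\end{aligned} \]
Using a similar argument we obtain the second inequality of the theorem. The equality for the Shannon case was obtained in [MÁN 91].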
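Corollary 2.12 Let $A$ be a finite set. For $\beta \ge 1$ and for $\pi, \sigma, \tau \in \mathrm{PART}(A)$ we have the inequality:
\[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \tau) \ge \mathcal{H}_\beta(\pi | \tau). \]

Proof. Note that by Theorem 2.5 we have:
\[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \tau) \ge \mathcal{H}_\beta(\pi | \sigma \wedge \tau) + \mathcal{H}_\beta(\sigma | \tau). \]
Therefore, for $\beta \ge 1$, by Theorems 2.11 and 2.8 we obtain:
\[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \tau) \ge \mathcal{H}_\beta(\pi \wedge \sigma | \tau) \ge \mathcal{H}_\beta(\pi | \tau). \]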
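The following result generalizes a result of López de Mántaras [MÁN 91]:

Corollary 2.13 For $\beta \ge 1$ define the mapping $d_\beta : \mathrm{PART}(A)^2 \longrightarrow \mathbb{R}_{\ge 0}$ by $d_\beta(\pi, \sigma) = \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi)$ for $\pi, \sigma \in \mathrm{PART}(A)$. Then, $d_\beta$ is a metric on $\mathrm{PART}(A)$.

Proof. If $d_\beta(\pi, \sigma) = 0$, then $\mathcal{H}_\beta(\pi | \sigma) = \mathcal{H}_\beta(\sigma | \pi) = 0$. Therefore, by Theorem 2.3 we have $\sigma \le \pi$ and $\pi \le \sigma$, so $\pi = \sigma$. The symmetry of $d_\beta$ is immediate. The triangular property is a direct consequence of Corollary 2.12.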
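The metric properties of $d_\beta$ can be spot-checked exhaustively on a small set. The following sketch tests identity, symmetry, and the triangular inequality over all partitions of a four-element set; it assumes $c = 1$, uses names chosen for this example, and is a numerical check rather than a substitute for the proof.

```python
import math

def entropy(blocks, beta, c=1.0):
    """H_beta of a partition given as a list of nonempty, disjoint sets."""
    n = sum(len(b) for b in blocks)
    probs = [len(b) / n for b in blocks]
    if beta == 1:
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    return c * (1 - total) if beta > 1 else c * (total - 1)

def cond_entropy(pi, sigma, beta, c=1.0):
    """H_beta(pi | sigma), weighting each block C of sigma by |C|/|A|."""
    n = sum(len(C) for C in sigma)
    return sum(len(C) / n * entropy([B & C for B in pi if B & C], beta, c)
               for C in sigma)

def d(pi, sigma, beta):
    """d_beta(pi, sigma) = H_beta(pi | sigma) + H_beta(sigma | pi)."""
    return cond_entropy(pi, sigma, beta) + cond_entropy(sigma, pi, beta)

def set_partitions(items):
    """Yield every partition of the list `items` exactly once."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [p[i] | {first}] + p[i + 1:]
        yield p + [{first}]

def is_metric(beta, tol=1e-12):
    """Check the three metric axioms on all partitions of a 4-element set."""
    parts = list(set_partitions([0, 1, 2, 3]))
    n = len(parts)
    dist = [[d(p, q, beta) for q in parts] for p in parts]
    for i in range(n):
        for j in range(n):
            # distinct indices are distinct partitions, so d = 0 iff i == j
            if (dist[i][j] <= tol) != (i == j):
                return False
            if abs(dist[i][j] - dist[j][i]) > tol:
                return False
            for k in range(n):
                if dist[i][k] > dist[i][j] + dist[j][k] + tol:
                    return False
    return True

print(is_metric(1), is_metric(2.0))
```

Precomputing the distance matrix keeps the triple loop cheap: only 15 partitions are involved, so 3375 triples are examined per value of $\beta$.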
In [MÁN 91] it is shown that the mapping $e_1 : \mathrm{PART}(A)^2 \longrightarrow \mathbb{R}_{\ge 0}$ that corresponds to Shannon entropy, defined by
\[ e_1(\pi, \sigma) = \frac{d_1(\pi, \sigma)}{\mathcal{H}_1(\pi \wedge \sigma)} \]
for $\pi, \sigma \in \mathrm{PART}(A)$, is also a metric on $\mathrm{PART}(A)$. This result is extended next.
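Theorem 2.14 Let $A$ be a finite, nonempty set. For $\beta \ge 1$, the mapping $e_\beta : \mathrm{PART}(A)^2 \longrightarrow \mathbb{R}_{\ge 0}$ defined by
\[ e_\beta(\pi, \sigma) = \frac{2 d_\beta(\pi, \sigma)}{d_\beta(\pi, \sigma) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma)} \]
for $\pi, \sigma \in \mathrm{PART}(A)$ is a metric on $\mathrm{PART}(A)$ such that $0 \le e_\beta(\pi, \sigma) \le 1$.

Proof. It is easy to see that $0 \le e_\beta(\pi, \sigma) \le 1$ since, by Corollary 2.6, $\mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma) \ge \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) = d_\beta(\pi, \sigma)$. We need to show only that the triangular inequality is satisfied by $e_\beta$ for $\beta > 1$. We can write:
\[ e_\beta(\pi, \sigma) + e_\beta(\sigma, \tau) = \frac{2 \left[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) \right]}{\mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma)} + \frac{2 \left[ \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) \right]}{\mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) + \mathcal{H}_\beta(\sigma) + \mathcal{H}_\beta(\tau)}. \]
Note that
\[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma) \le \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\tau) \]
because $\mathcal{H}_\beta(\sigma) \le \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau)$ by Inequality (1) and Axiom (P1). Similarly,
\[ \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) + \mathcal{H}_\beta(\sigma) + \mathcal{H}_\beta(\tau) \le \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\tau) \]
because $\mathcal{H}_\beta(\sigma) \le \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\pi)$. This yields the inequality:
\[ \begin{aligned}
e_\beta(\pi, \sigma) + e_\beta(\sigma, \tau) &\ge \frac{2 \left[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) \right]}{\mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\tau)} \\
&= \frac{2}{1 + \dfrac{\mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\tau)}{\mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\sigma | \tau) + \mathcal{H}_\beta(\tau | \sigma)}} \\
&\ge \frac{2}{1 + \dfrac{\mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\tau)}{\mathcal{H}_\beta(\pi | \tau) + \mathcal{H}_\beta(\tau | \pi)}} = e_\beta(\pi, \tau).
\end{aligned} \]

For $\beta = 1$, $e_1(\pi, \sigma) = \frac{d_1(\pi, \sigma)}{\mathcal{H}_1(\pi \wedge \sigma)}$, due to equality (2), which coincides with the expression obtained in [MÁN 91] for the normalized distance.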
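A minimal Python sketch of the normalized distance $e_\beta$ follows, together with a check that for $\beta = 1$ it agrees with $d_1(\pi, \sigma) / \mathcal{H}_1(\pi \wedge \sigma)$. The function names and $c = 1$ are assumptions of the example; returning 0 when the denominator vanishes (both arguments equal to $\omega_A$) is also a convention adopted here, not one stated in the text.

```python
import math

def entropy(blocks, beta, c=1.0):
    """H_beta of a partition given as a list of nonempty, disjoint sets."""
    n = sum(len(b) for b in blocks)
    probs = [len(b) / n for b in blocks]
    if beta == 1:
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    return c * (1 - total) if beta > 1 else c * (total - 1)

def cond_entropy(pi, sigma, beta, c=1.0):
    """H_beta(pi | sigma), weighting each block C of sigma by |C|/|A|."""
    n = sum(len(C) for C in sigma)
    return sum(len(C) / n * entropy([B & C for B in pi if B & C], beta, c)
               for C in sigma)

def meet(pi, sigma):
    """Infimum of two partitions of the same set."""
    return [B & C for B in pi for C in sigma if B & C]

def d(pi, sigma, beta):
    """Unnormalized distance d_beta."""
    return cond_entropy(pi, sigma, beta) + cond_entropy(sigma, pi, beta)

def e(pi, sigma, beta):
    """Normalized distance e_beta = 2 d / (d + H(pi) + H(sigma))."""
    dist = d(pi, sigma, beta)
    denom = dist + entropy(pi, beta) + entropy(sigma, beta)
    # Assumed convention: both partitions are the one-block partition
    return 0.0 if denom == 0 else 2 * dist / denom

pi = [{0, 1, 2}, {3, 4, 5}]
sigma = [{0, 1}, {2, 3}, {4, 5}]
for beta in (1, 1.5, 2.0, 2.5):
    print(beta, e(pi, sigma, beta))

# Shannon case: e_1 coincides with d_1 / H_1(pi ^ sigma)
shannon_form = d(pi, sigma, 1) / entropy(meet(pi, sigma), 1)
print(abs(e(pi, sigma, 1) - shannon_form) < 1e-12)
```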
3. A Monotonicity Property of Generalized Distances
We prove a monotonicity property of the generalized distance $d_\beta$ that shows that the selection of splitting attributes based on the minimal entropic distance $d_\beta$ does not favor attributes with a large number of values.
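Theorem 3.1 Let $A$ be a finite set and let $\pi, \pi', \sigma \in \mathrm{PART}(A)$ be three partitions such that $\pi'$ is covered by $\pi$. In other words, $\pi = \{B_1, \ldots, B_m\}$ and $\pi' = \{B_1, \ldots, B_{m-1}, B'_m, B''_m\}$, where $B_m = B'_m \cup B''_m$. Suppose also that there exists a block $C$ of $\sigma$ such that $B_m \subseteq C$. Then, if $\beta \ge 1$, we have $d_\beta(\pi, \sigma) \le d_\beta(\pi', \sigma)$ and $e_\beta(\pi, \sigma) \le e_\beta(\pi', \sigma)$.

Proof. For the case of Shannon's entropy, $\beta = 1$, the inequalities were proven in [MÁN 91]. Therefore, we can assume that $\beta > 1$.
We claim that under the hypothesis of the theorem we have $\mathcal{H}_\beta(\sigma | \pi) = \mathcal{H}_\beta(\sigma | \pi')$. Note that $\sigma_{B_m} = \omega_{B_m}$, $\sigma_{B'_m} = \omega_{B'_m}$, and $\sigma_{B''_m} = \omega_{B''_m}$, since $B'_m, B''_m \subseteq B_m \subseteq C$. Therefore, $\mathcal{H}_\beta(\sigma_{B_m}) = \mathcal{H}_\beta(\sigma_{B'_m}) = \mathcal{H}_\beta(\sigma_{B''_m}) = 0$, hence
\[ \mathcal{H}_\beta(\sigma | \pi) = \sum_{i=1}^{m} \frac{|B_i|}{|A|} \mathcal{H}_\beta(\sigma_{B_i}) = \sum_{i=1}^{m-1} \frac{|B_i|}{|A|} \mathcal{H}_\beta(\sigma_{B_i}) = \mathcal{H}_\beta(\sigma | \pi'). \]
Theorem 2.8 implies $\mathcal{H}_\beta(\pi | \sigma) \le \mathcal{H}_\beta(\pi' | \sigma)$, which gives the first inequality.
Note that the second inequality of the theorem:
\[ e_\beta(\pi, \sigma) = \frac{2 \left[ \mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) \right]}{\mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi) + \mathcal{H}_\beta(\pi) + \mathcal{H}_\beta(\sigma)} \le \frac{2 \left[ \mathcal{H}_\beta(\pi' | \sigma) + \mathcal{H}_\beta(\sigma | \pi') \right]}{\mathcal{H}_\beta(\pi' | \sigma) + \mathcal{H}_\beta(\sigma | \pi') + \mathcal{H}_\beta(\pi') + \mathcal{H}_\beta(\sigma)} = e_\beta(\pi', \sigma) \]
is equivalent to
\[ \frac{\mathcal{H}_\beta(\sigma) + \mathcal{H}_\beta(\pi)}{\mathcal{H}_\beta(\pi | \sigma) + \mathcal{H}_\beta(\sigma | \pi)} \ge \frac{\mathcal{H}_\beta(\sigma) + \mathcal{H}_\beta(\pi')}{\mathcal{H}_\beta(\pi' | \sigma) + \mathcal{H}_\beta(\sigma | \pi')}. \qquad (3) \]
Applying the definition of conditional entropy we can write:
\[ \mathcal{H}_\beta(\pi | \sigma) - \mathcal{H}_\beta(\pi' | \sigma) = c \, \frac{|C|}{|A|} \left[ \frac{|B'_m|^{\beta}}{|C|^{\beta}} + \frac{|B''_m|^{\beta}}{|C|^{\beta}} - \frac{|B_m|^{\beta}}{|C|^{\beta}} \right] \]
and
\[ \mathcal{H}_\beta(\pi) - \mathcal{H}_\beta(\pi') = c \, \frac{|B'_m|^{\beta} + |B''_m|^{\beta} - |B_m|^{\beta}}{|A|^{\beta}}, \]
which implies
\[ \mathcal{H}_\beta(\pi | \sigma) - \mathcal{H}_\beta(\pi' | \sigma) = \left( \frac{|A|}{|C|} \right)^{\beta - 1} \left[ \mathcal{H}_\beta(\pi) - \mathcal{H}_\beta(\pi') \right]. \qquad (4) \]
Since $|A| / |C| \ge 1$ and $\mathcal{H}_\beta(\pi') \ge \mathcal{H}_\beta(\pi)$, we obtain:
\[ \mathcal{H}_\beta(\pi' | \sigma) - \mathcal{H}_\beta(\pi | \sigma) \ge \mathcal{H}_\beta(\pi') - \mathcal{H}_\beta(\pi). \qquad (5) \]
By denoting $a = \mathcal{H}_\beta(\sigma)$ and $b = \mathcal{H}_\beta(\sigma | \pi) = \mathcal{H}_\beta(\sigma | \pi')$, Inequality (3) can be written as
\[ \frac{a + \mathcal{H}_\beta(\pi)}{\mathcal{H}_\beta(\pi | \sigma) + b} \ge \frac{a + \mathcal{H}_\beta(\pi')}{\mathcal{H}_\beta(\pi' | \sigma) + b}, \]
which, after elementary transformations, can be written as
\[ \mathcal{H}_\beta(\pi' | \sigma) - \mathcal{H}_\beta(\pi | \sigma) \ge \frac{b + \mathcal{H}_\beta(\pi | \sigma)}{a + \mathcal{H}_\beta(\pi)} \left( \mathcal{H}_\beta(\pi') - \mathcal{H}_\beta(\pi) \right), \]
which is implied by Inequality (5) because
\[ \frac{b + \mathcal{H}_\beta(\pi | \sigma)}{a + \mathcal{H}_\beta(\pi)} \le 1. \]
This proves the second inequality of the theorem.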
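The statement of Theorem 3.1 and identity (4) can be illustrated on one concrete instance. In the sketch below, $\pi'$ is obtained from $\pi$ by splitting the block $\{0, 1, 2, 3\}$, which lies inside a block of $\sigma$; the partitions, the function names, and $c = 1$ are all assumptions chosen for the example.

```python
import math

def entropy(blocks, beta, c=1.0):
    """H_beta of a partition given as a list of nonempty, disjoint sets."""
    n = sum(len(b) for b in blocks)
    probs = [len(b) / n for b in blocks]
    if beta == 1:
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    return c * (1 - total) if beta > 1 else c * (total - 1)

def cond_entropy(pi, sigma, beta, c=1.0):
    """H_beta(pi | sigma), weighting each block C of sigma by |C|/|A|."""
    n = sum(len(C) for C in sigma)
    return sum(len(C) / n * entropy([B & C for B in pi if B & C], beta, c)
               for C in sigma)

def d(pi, sigma, beta):
    """Unnormalized distance d_beta."""
    return cond_entropy(pi, sigma, beta) + cond_entropy(sigma, pi, beta)

def e(pi, sigma, beta):
    """Normalized distance e_beta (denominator is nonzero in this example)."""
    dist = d(pi, sigma, beta)
    return 2 * dist / (dist + entropy(pi, beta) + entropy(sigma, beta))

sigma = [{0, 1, 2, 3, 4}, {5}]         # C = {0,1,2,3,4} contains the split block
pi = [{4, 5}, {0, 1, 2, 3}]            # B_m = {0,1,2,3}
pi_split = [{4, 5}, {0, 1}, {2, 3}]    # pi' : B_m split into two blocks

for beta in (1, 1.5, 2.0, 2.5):
    print(beta,
          d(pi, sigma, beta) <= d(pi_split, sigma, beta),
          e(pi, sigma, beta) <= e(pi_split, sigma, beta))

# Identity (4) for beta = 2, with |A| = 6 and |C| = 5
beta = 2.0
lhs = cond_entropy(pi, sigma, beta) - cond_entropy(pi_split, sigma, beta)
rhs = (6 / 5) ** (beta - 1) * (entropy(pi, beta) - entropy(pi_split, beta))
print(abs(lhs - rhs) < 1e-12)
```

For $\beta = 2$ both sides of (4) equal $-4/15$ here, and the two conditional entropies $\mathcal{H}_\beta(\sigma | \pi)$ and $\mathcal{H}_\beta(\sigma | \pi')$ coincide, as claimed in the proof.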
4. Experimental Results and Conclusions
The experiments have been conducted on 33 datasets from the UCI Machine Learning Repository. The J48 tree builder from the Weka package has been used, in its original form, as well as modified to support generalized entropy distances for different values of the $\beta$ parameter. The selection of the splitting attribute in the modified algorithm was based on the minimum entropic distance between the partition generated by the attribute and the target partition. Each experiment used 5-fold cross-validation; the outcomes of the 5 runs were averaged, and each experiment was performed with and without pruning.
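The splitting criterion described above can be sketched as follows. This is not the modified J48 code used in the experiments; it is a minimal, self-contained Python illustration in which the toy dataset, the function names, the use of the unnormalized distance $d_\beta$, and $c = 1$ are all assumptions.

```python
import math

def entropy(blocks, beta, c=1.0):
    """H_beta of a partition given as a list of nonempty, disjoint sets."""
    n = sum(len(b) for b in blocks)
    probs = [len(b) / n for b in blocks]
    if beta == 1:
        return -c * sum(p * math.log2(p) for p in probs)
    total = sum(p ** beta for p in probs)
    return c * (1 - total) if beta > 1 else c * (total - 1)

def cond_entropy(pi, sigma, beta, c=1.0):
    """H_beta(pi | sigma), weighting each block C of sigma by |C|/|A|."""
    n = sum(len(C) for C in sigma)
    return sum(len(C) / n * entropy([B & C for B in pi if B & C], beta, c)
               for C in sigma)

def distance(pi, sigma, beta):
    """Entropic distance d_beta between two partitions of the row indices."""
    return cond_entropy(pi, sigma, beta) + cond_entropy(sigma, pi, beta)

def partition_from_labels(labels):
    """Group row indices by label value; each group is one block."""
    blocks = {}
    for idx, lab in enumerate(labels):
        blocks.setdefault(lab, set()).add(idx)
    return list(blocks.values())

def best_attribute(rows, attributes, target, beta):
    """Pick the attribute whose partition is closest to the target partition."""
    target_part = partition_from_labels([r[target] for r in rows])
    def dist(attr):
        return distance(partition_from_labels([r[attr] for r in rows]),
                        target_part, beta)
    return min(attributes, key=dist)

# Hypothetical toy data: "windy" determines the class, "id" is unique per row
rows = [
    {"id": 0, "windy": "yes", "play": "no"},
    {"id": 1, "windy": "yes", "play": "no"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "no", "play": "yes"},
]
print(best_attribute(rows, ["id", "windy"], "play", beta=2.0))  # windy
```

In this toy example the many-valued `id` attribute also separates the classes perfectly, yet it is at a positive distance from the target partition, while `windy` is at distance 0; this is the behavior that the monotonicity property of Section 3 is meant to guarantee.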
The tree size and the number of leaves diminish for 18 of the 33 databases analysed and grow for the remaining 15. The best reduction in size was achieved for the primary-tumor database, where the size of the tree was reduced to 37% for $\beta = 2.5$ and the number of leaves was reduced to 38.8% compared to the standard J48 algorithm, which makes use of the gain ratio. On the other hand, the largest increase in size and number of leaves was recorded for the pima-diabetes database, where for $\beta = 1$ we have an increase to 260% in size and to 256% in the number of leaves, though such an increase occurs rarely among the 15 databases where increases occur.
In Figure 1 we show the performance of the generalized entropy approach compared to the standard gain ratio for the databases which yielded the best results (audiology, hepatitis, and primary-tumor), in the case of the pruned trees. The 100% level refers in each case to the gain-ratio algorithm. It is interesting to observe that the accuracy diminishes slightly or improves slightly, as shown in Table 1, thus confirming previous results [MÁN 91, BRE 98, MIN 89] that accuracy is not affected substantially by the method used for attribute selection.
5. References
[BRE 98] BREIMAN L., FRIEDMAN J. H., OLSHEN R. A., STONE C. J., Classification and Regression Trees, Chapman and Hall, Boca Raton, 1998.
[LER 81] LERMAN I. C., Classification et analyse ordinale des données, Dunod, Paris, 1981.
[Figure 1: two rows of plots, "Tree size" and "Number of leaves", for the audiology, hepatitis, and primary-tumor databases; vertical axes run from 0 to 120 (percent of the gain-ratio result); the $\beta$ factor takes the values $\beta = 1$, $\beta = 1.5$, $\beta = 2$, and $\beta = 2.5$.]
Figure 1. Comparative Experimental Results
Database        J48      β = 1    β = 1.5  β = 2    β = 2.5
audiology       78.76%   73.42%   73.86%   73.86%   71.64%
hepatitis       78.06%   83.22%   83.22%   83.87%   83.87%
primary-tumor   40.99%   43.34%   41.87%   43.34%   43.05%

Table 1. Accuracy Results
[MÁN 91] DE MÁNTARAS R. L., "A Distance-Based Attribute Selection Measure for Decision Tree Induction", Machine Learning, vol. 6, 1991, p. 81–92.
[MIN 89] MINGERS J., "An Empirical Comparison of Selection Measures for Decision Tree Induction", Machine Learning, vol. 3, 1989, p. 319–342.
[SIM 02] SIMOVICI D. A., JAROSZEWICZ S., "An Axiomatization of Partition Entropy", IEEE Transactions on Information Theory, vol. 48, 2002, p. 2138–2142.