University of Economics, Prague: MLNET related activities of Laboratory for Intelligent Systems and Dept. of Information and Knowledge Engineering, http://lisp.vse.cz/~berka/MLNet.html

Page 1: University of Economics, Prague

University of Economics, Prague

MLNET related activities of Laboratory for Intelligent Systems and Dept. of Information and Knowledge Engineering

http://lisp.vse.cz/~berka/MLNet.html

Page 2: University of Economics, Prague

(c) Petr Berka, LISp, 2000

Research

• probabilistic methods - decomposable probability models and Bayesian networks
• symbolic methods - generalized association rules and decision rules
• logical calculi for knowledge discovery in databases

Page 3: University of Economics, Prague

People

• Jiří Ivánek
• Radim Jiroušek
• Petr Berka
• Jan Rauch
• Tomáš Kočka
• Vojtěch Svátek

Page 4: University of Economics, Prague

Software

LISp-Miner

• two data mining procedures: 4FT Miner (generalised association rules) and KEX (decision rules),
• large preprocessing module including SQL,
• output of rules in database format enables the users to implement their own interpretation procedures.

Page 5: University of Economics, Prague

LISp-Miner procedures

4FT-Miner (GUHA procedure)

generalised association rules in the form

Ant ~ Suc / Cond

KEX

weighted decision rules in the form

Ant ==> C (weight)

Page 6: University of Economics, Prague

4FT-Miner

Data Matrix:

CLIENTS                                  LOANS
Id       Age   Sex   Salary   District   Amount   Payment   Months   Quality
1        45    F     28 000   Prague     48 000   1 000     48       good
...      ...   ...   ...      ...        ...      ...       ...      ...
70 000   18    M     12 000   Brno       36 000   2 000     18       bad

Problem: Are there segments of clients SC and segments of loans SL such that being in SC is at 90% equivalent to having a loan from SL, and there are at least 100 such clients?

"Ant is at 90% equivalent to Suc" is written Ant ~(0.90, 100) Suc, which is true iff

a/(a+b+c) ≥ 0.9 and a ≥ 100

where a, b, c, d are the frequencies of the four-fold table:

           Suc   ¬Suc
Ant        a     b
¬Ant       c     d

a - number of objects satisfying Ant and Suc
b - number of objects satisfying Ant and not satisfying Suc
c - number of objects not satisfying Ant and satisfying Suc
d - number of objects satisfying neither Ant nor Suc
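The four-fold table and the quantifier test can be sketched in Python. This is a minimal illustration, not LISp-Miner's implementation: the function names `four_fold_table` and `holds` and the toy records are assumptions introduced here. Only the counts a, b, c, d and the condition a/(a+b+c) ≥ p with a ≥ base come from the slide.

```python
from typing import Callable, Iterable, Tuple

Record = dict  # one row of the data matrix, e.g. {"Age": 45, "Quality": "good"}


def four_fold_table(
    rows: Iterable[Record],
    ant: Callable[[Record], bool],
    suc: Callable[[Record], bool],
) -> Tuple[int, int, int, int]:
    """Count the four-fold table (a, b, c, d) of Ant and Suc over the rows."""
    a = b = c = d = 0
    for row in rows:
        in_ant, in_suc = ant(row), suc(row)
        if in_ant and in_suc:
            a += 1
        elif in_ant:
            b += 1
        elif in_suc:
            c += 1
        else:
            d += 1
    return a, b, c, d


def holds(a: int, b: int, c: int, p: float = 0.9, base: int = 100) -> bool:
    """True iff a/(a+b+c) >= p and a >= base (d does not enter the test)."""
    if a + b + c == 0:
        return False
    return a / (a + b + c) >= p and a >= base


# Hypothetical toy data: three young clients with bad loans, one older client with a good loan.
rows = [
    {"Age": 22, "Quality": "bad"},
    {"Age": 25, "Quality": "bad"},
    {"Age": 28, "Quality": "bad"},
    {"Age": 45, "Quality": "good"},
]
table = four_fold_table(
    rows,
    ant=lambda r: 20 <= r["Age"] < 30,
    suc=lambda r: r["Quality"] == "bad",
)
print(table)                              # (3, 0, 0, 1)
print(holds(*table[:3], p=0.9, base=3))   # True
```

The count d is computed for completeness but is not used by this particular quantifier, which looks only at objects satisfying Ant or Suc.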

Page 7: University of Economics, Prague

4FT Miner

Input:

• Data matrix
• Quantifier ~(0.90, 100)
• Derived attributes for SC (possible Ant): Age (7 values), Sex (2 values), Salary (3 values), District (77 values)
• Derived attributes for SL (possible Suc): Amount (6 values), Duration (5 values), Quality (2 values)

Output:

All Ant ~(0.90, 100) Suc true in the data matrix (5 equivalences from about 5 million possible relations)

An example:

Age(20 - 30) ∧ Sex(F) ∧ Salary(low) ∧ District(Prague) ~(0.90, 100) Amount<20,50) ∧ Quality(Bad)

           Suc   ¬Suc
Ant        950   30
¬Ant       20    69000

a/(a+b+c) = 950/1000 = 0.95 ≥ 0.9
a = 950 ≥ 100
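The example table can be checked with a few lines of Python. The sketch below only redoes the arithmetic from the slide. The variable names are chosen here for illustration.

```python
# Four-fold table from the example above.
a, b, c, d = 950, 30, 20, 69000

p, base = 0.90, 100              # quantifier parameters from the input

ratio = a / (a + b + c)          # d is not used by this quantifier
print(ratio)                     # 0.95
print(ratio >= p and a >= base)  # True

# The four cells add up to the 70 000 rows of the data matrix.
print(a + b + c + d)             # 70000
```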

Page 8: University of Economics, Prague

KEX - classification

Forward chaining using a pseudo-Bayesian combining function

x ⊕ y = (x * y) / (x * y + (1 - x) * (1 - y))

For a particular case, all applicable rules are activated, and their contributions are combined into the final decision (class plus weight).

Example: Japanese Credit Data from the UCI Repository

For a case with years_at_comp < 3 (11a) AND age < 25 (7a), the rules

∅ ==> 1+ (0.6800)
11a ==> 1+ (0.2720)
7a ==> 1+ (0.3052)
7a AND 11a ==> 1+ (0.6926)

are combined into the final decision 1+ (0.4400) (loan not granted).
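The combination of the four rule weights can be reproduced with a short Python sketch. It assumes the combining function is x*y / (x*y + (1-x)*(1-y)), applied pairwise over all activated rules. The function name `combine` is introduced here for illustration and is not taken from KEX itself.

```python
from functools import reduce


def combine(x: float, y: float) -> float:
    """Pseudo-Bayesian combination of two rule weights in (0, 1)."""
    return (x * y) / (x * y + (1 - x) * (1 - y))


# Weights of the rules applicable to the example case, all for class 1+.
weights = [0.6800, 0.2720, 0.3052, 0.6926]

final = reduce(combine, weights)
print(f"{final:.4f}")   # 0.4400
```

The function is commutative and associative, so the order in which the rules fire does not change the result, and a weight of 0.5 is neutral (combining with it leaves the other weight unchanged). The combined weight of 0.4400 for class 1+ is the value the slide reports as "loan not granted".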

Page 9: University of Economics, Prague

KEX - learning

Page 10: University of Economics, Prague

LISp-Miner

Page 11: University of Economics, Prague

LISp-Miner

Page 12: University of Economics, Prague

LISp-Miner

Page 13: University of Economics, Prague

LISp-Miner

Page 14: University of Economics, Prague

4FT Miner and KEX

Applications

• truck reliability assessment
• quality control in a brewery
• segmentation of clients of a bank
• short-term electric load prediction

Page 15: University of Economics, Prague

LISp Miner

References:

• Berka, P. - Ivanek, J.: Automated Knowledge Acquisition for PROSPECTOR-like Expert Systems. In: (Bergadano, deRaedt eds.) Proc. ECML'94, Springer 1994, 339-342.
• Berka, P. - Rauch, J.: Data Mining using GUHA and KEX. In: (Callaos, Yang, Aguilar eds.) 4th Int. Conf. on Information Systems, Analysis and Synthesis ISAS'98, 1998, Vol 2, 238-244.
• Rauch, J.: Classes of Four Fold Table Quantifiers. In: (Zytkow, Quafafou eds.) Principles of Data Mining and Knowledge Discovery. Springer 1998, 203-211.

Page 16: University of Economics, Prague

Datasets

PKDD'99 Discovery Challenge data (http://lisp.vse.cz/pkdd99/chall.htm)

• financial data: clients of a bank, their accounts, transactions, loans etc.
• medical data: patients with collagen disease

Page 17: University of Economics, Prague

Financial data

Table             Key attributes                   Records
Disposition       disp_id, client_id, account_id   5369
Credit Card       disp_id                          892
Account           account_id, district_id          4500
Permanent order   account_id                       6471
Loan              account_id                       682
Person            client_id, district_id           5369
Transactions      account_id                       1056320
Demograph.        district_id                      77
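The shared key attributes suggest how the tables link together. The sketch below builds four of them in an in-memory SQLite database and follows the keys from a loan to the client who holds the account. It is an illustration under stated assumptions: the lowercase table names, the `amount` column, and the inserted rows are hypothetical, and only the key attribute names come from the slide.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person      (client_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE account     (account_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE disposition (disp_id INTEGER PRIMARY KEY, client_id INTEGER, account_id INTEGER);
CREATE TABLE loan        (account_id INTEGER, amount INTEGER);  -- amount is a hypothetical column
""")

# Hypothetical rows, only to make the join visible.
con.execute("INSERT INTO person VALUES (1, 10)")
con.execute("INSERT INTO account VALUES (100, 10)")
con.execute("INSERT INTO disposition VALUES (1000, 1, 100)")
con.execute("INSERT INTO loan VALUES (100, 48000)")

# Loan -> Account -> Disposition -> Person, following the shared key attributes.
rows = con.execute("""
    SELECT p.client_id, l.account_id, l.amount
    FROM loan l
    JOIN account a     ON a.account_id = l.account_id
    JOIN disposition d ON d.account_id = a.account_id
    JOIN person p      ON p.client_id = d.client_id
""").fetchall()
print(rows)   # [(1, 100, 48000)]
```

A join of this kind is one possible way to flatten the relational data into a single client-and-loan data matrix.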

Page 18: University of Economics, Prague

Medical data

Table             Key attribute   Records
patients          patient ID      1232
Lab. tests        patient ID      763
More Lab. tests   patient ID      57542

Page 19: University of Economics, Prague

Other activities

• Organized conferences
    http://lisp.vse.cz/ecml97/
    http://lisp.vse.cz/pkdd99/
• Teaching (in Czech)
    KDD
    KDD seminar
    ML

Page 20: University of Economics, Prague

New projects

SOL-EU-NET project "Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise" (supported by EU grant IST-1999-11.495)