University of Economics, Prague
MLNET related activities of Laboratory for Intelligent Systems and Dept. of Information and Knowledge Engineering
http://lisp.vse.cz/~berka/MLNet.html
(c) Petr Berka, LISp, 2000
Research
• probabilistic methods - decomposable probability models and Bayesian networks
• symbolic methods - generalized association rules and decision rules
• logical calculi for knowledge discovery in databases
People
• Jiří Ivánek
• Radim Jiroušek
• Petr Berka
• Jan Rauch
• Tomáš Kočka
• Vojtěch Svátek
Software
LISp-Miner
• two data mining procedures: 4FT Miner (generalised association rules) and KEX (decision rules),
• large preprocessing module including SQL,
• output of rules in database format enables users to implement their own interpretation procedures.
LISp-Miner procedures
4FT-Miner (GUHA procedure)
generalised association rules in the form
Ant ~ Suc / Cond
KEX
weighted decision rules in the form
Ant ==> C (weight)
4FT-Miner
Data Matrix:
CLIENTS                                 | LOANS
Id      Age  Sex  Salary  District      | Amount  Payment  Months  Quality
1       45   F    28 000  Prague        | 48 000  1 000    48      good
...     ...  ...  ...     ...           | ...     ...      ...     ...
70 000  18   M    12 000  Brno          | 36 000  2 000    18      bad
Problem: Are there segments of clients SC and segments of loans SL such that
being in SC is at 90% equivalent to having a loan from SL, and there are at least 100 such clients?
Ant is at 90% equivalent to Suc: Ant <=>(0.90, 100) Suc is true iff a/(a+b+c) ≥ 0.9 and a ≥ 100
Four-fold table:
          Suc   not Suc
Ant       a     b
not Ant   c     d
a - number of objects satisfying Ant and Suc
b - number of objects satisfying Ant and not satisfying Suc
c - number of objects not satisfying Ant and satisfying Suc
d - number of objects satisfying neither Ant nor Suc
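The frequencies a, b, c, d and the quantifier test can be sketched in a few lines of Python. This is only an illustration of the definition on this slide, not LISp-Miner code; the function names `four_fold_table` and `quantifier_holds` and the ten-object toy data are assumptions introduced here.

```python
from typing import Sequence, Tuple


def four_fold_table(ant: Sequence[bool], suc: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count a, b, c, d for two Boolean attributes over the same objects."""
    if len(ant) != len(suc):
        raise ValueError("ant and suc must describe the same objects")
    a = sum(1 for x, y in zip(ant, suc) if x and y)          # Ant and Suc
    b = sum(1 for x, y in zip(ant, suc) if x and not y)      # Ant, not Suc
    c = sum(1 for x, y in zip(ant, suc) if not x and y)      # not Ant, Suc
    d = sum(1 for x, y in zip(ant, suc) if not x and not y)  # neither
    return a, b, c, d


def quantifier_holds(a: int, b: int, c: int, p: float = 0.9, base: int = 100) -> bool:
    """True iff a/(a+b+c) >= p and a >= base (d plays no role)."""
    if a + b + c == 0:
        return False  # no object satisfies Ant or Suc, nothing to compare
    return a >= base and a / (a + b + c) >= p


# Hypothetical toy data: 10 objects, 6 satisfy Ant, 5 of those also satisfy Suc.
ant = [True] * 6 + [False] * 4
suc = [True] * 5 + [False] * 5
a, b, c, d = four_fold_table(ant, suc)
print(a, b, c, d)                              # 5 1 0 4
print(quantifier_holds(a, b, c, p=0.8, base=5))  # True  (5/6 is about 0.83)
print(quantifier_holds(a, b, c, p=0.9, base=5))  # False
```

Note that d never enters the test: objects satisfying neither Ant nor Suc do not affect whether the equivalence holds.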
4FT Miner
Input:
• Data matrix
• quantifier <=>(0.90, 100)
• Derived attributes for SC (possible Ant): Age (7 values), Sex (2 values), Salary (3 values), District (77 values)
• Derived attributes for SL (possible Suc): Amount (6 values), Duration (5 values), Quality (2 values)
Output:
All Ant <=>(0.90, 100) Suc true in the data matrix (5 equivalences from about 5 million possible relations)
an example:
Age(20 - 30) ∧ Sex(F) ∧ Salary(low) ∧ District(Prague) <=>(0.90, 100) Amount<20,50) ∧ Quality(Bad)
          Suc   not Suc
Ant       950   30
not Ant   20    69000
a/(a+b+c) = 950/1000 = 0.95 ≥ 0.9
a = 950 ≥ 100
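The search on this slide (try candidate Ant/Suc pairs and keep those for which the quantifier holds) can be illustrated with a naive exhaustive enumeration. The sketch below is an assumption-laden toy, not a description of how 4FT Miner is implemented: the helper names, the eight-row data set and the restriction to conjunctions of attribute = value literals are all invented for illustration.

```python
from itertools import combinations, product


def literals(rows, attributes):
    """All (attribute, value) pairs occurring in the data for the given attributes."""
    return sorted({(attr, row[attr]) for row in rows for attr in attributes})


def conjunctions(lits, max_len):
    """Conjunctions of up to max_len literals over pairwise distinct attributes."""
    for n in range(1, max_len + 1):
        for combo in combinations(lits, n):
            if len({attr for attr, _ in combo}) == n:
                yield combo


def satisfies(row, conj):
    return all(row[attr] == value for attr, value in conj)


def mine(rows, ant_attrs, suc_attrs, p, base, max_len=2):
    """Return (Ant, Suc, a, b, c) for every pair with a/(a+b+c) >= p and a >= base."""
    ants = list(conjunctions(literals(rows, ant_attrs), max_len))
    sucs = list(conjunctions(literals(rows, suc_attrs), max_len))
    results = []
    for ant, suc in product(ants, sucs):
        a = b = c = 0
        for row in rows:
            in_ant, in_suc = satisfies(row, ant), satisfies(row, suc)
            if in_ant and in_suc:
                a += 1
            elif in_ant:
                b += 1
            elif in_suc:
                c += 1
        if a >= base and a + b + c > 0 and a / (a + b + c) >= p:
            results.append((ant, suc, a, b, c))
    return results


# Hypothetical toy data matrix (8 clients).
rows = (
    [{"Sex": "F", "Salary": "low", "Quality": "bad"}] * 3
    + [{"Sex": "F", "Salary": "high", "Quality": "good"}] * 2
    + [{"Sex": "M", "Salary": "low", "Quality": "good"}] * 2
    + [{"Sex": "M", "Salary": "high", "Quality": "good"}] * 1
)
found = mine(rows, ["Sex", "Salary"], ["Quality"], p=0.9, base=3)
for ant, suc, a, b, c in found:
    print(ant, "<=>", suc, (a, b, c))
```

On this toy data only Salary(low) ∧ Sex(F) paired with Quality(bad) passes, with a = 3 and b = c = 0. Enumeration of this kind grows quickly with the number of attribute values, which is why the slide's "about 5 million possible relations" already arises from seven attributes.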
KEX - classification
Forward chaining using the pseudo-Bayesian combining function
x ⊕ y = x*y / (x*y + (1-x)*(1-y))
For a particular case, all applicable rules are activated and their contributions are combined into the final decision (class plus weight).
Example: Japanese Credit Data from the UCI Repository
for a case years_at_comp < 3 (11a) AND age < 25 (7a) the rules
(empty antecedent) ==> 1+ (0.6800)
11a ==> 1+ (0.2720)
7a ==> 1+ (0.3052)
7a 11a ==> 1+ (0.6926)
are combined into the final decision 1+ (0.4400) (loan not granted)
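The arithmetic of the example can be checked with a short Python sketch. It assumes the combining function x ⊕ y = x*y / (x*y + (1-x)*(1-y)) and simply folds it over the four rule weights from the slide; the function name `combine` is an assumption, and nothing here is taken from the KEX implementation.

```python
from functools import reduce


def combine(x: float, y: float) -> float:
    """Pseudo-Bayesian combination x (+) y = x*y / (x*y + (1-x)*(1-y)).

    Assumes weights strictly between 0 and 1; the pair (0, 1) would divide by zero.
    """
    return (x * y) / (x * y + (1.0 - x) * (1.0 - y))


# Weights of the four applicable rules from the Japanese Credit example.
weights = [0.6800, 0.2720, 0.3052, 0.6926]
final = reduce(combine, weights)
print(f"{final:.4f}")  # 0.4400
```

The function is commutative and associative, so the order in which rules fire does not change the result, and a weight of 0.5 is neutral (x ⊕ 0.5 = x). Weights below 0.5 pull the result down, which is how two rules with weights 0.2720 and 0.3052 push the final weight for class 1+ below 0.5.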
KEX - learning
LISp-Miner
4FT Miner and KEX
Applications
• truck reliability assessment
• quality control in a brewery
• segmentation of clients of a bank
• short-term electric load prediction
LISp Miner
References:
• Berka, P. - Ivanek, J.: Automated Knowledge Acquisition for PROSPECTOR-like Expert Systems. In: (Bergadano, deRaedt eds.) Proc. ECML'94, Springer 1994, 339-342.
• Berka, P. - Rauch, J.: Data Mining using GUHA and KEX. In: (Callaos, Yang, Aguilar eds.) 4th Int. Conf. on Information Systems, Analysis and Synthesis ISAS'98, 1998, Vol 2, 238-244.
• Rauch, J.: Classes of Four Fold Table Quantifiers. In: (Zytkow, Quafafou eds.) Principles of Data Mining and Knowledge Discovery. Springer 1998, 203-211.
Datasets
PKDD'99 Discovery Challenge data (http://lisp.vse.cz/pkdd99/chall.htm)
• financial data: clients of a bank, their accounts, transactions, loans etc.
• medical data: patients with collagen disease
Financial data
Table             Key attributes                     Records
Disposition       disp_id, client_id, account_id     5369
Credit Card       disp_id                            892
Account           account_id, district_id            4500
Permanent order   account_id                         6471
Loan              account_id                         682
Person            client_id, district_id             5369
Transactions      account_id                         1056320
Demograph.        district_id                        77
Medical data
Table             Key attribute   Records
patients          patient ID      1232
Lab. tests        patient ID      763
More Lab. tests   patient ID      57542
Other activities
Organized conferences
• http://lisp.vse.cz/ecml97/
• http://lisp.vse.cz/pkdd99/
Teaching (in Czech)
• KDD
• KDD seminar
• ML
New projects
SOL-EU-NET project "Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise"
(supported by EU grant IST-1999-11.495)