Intelligent Data Mining Techniques
(tutorial presented at ANNIE´2003)
Iveta Mrázová
Department of Software Engineering, Charles University Prague

Source: ktiml.ms.mff.cuni.cz/~mrazova/Tutor_IM.pdf



Iveta Mrázová, ANNIE´03 2

Content outline
- Intelligent Data Mining: introduction and overview of Intelligent Data Mining Techniques (20 min)
- Selected Data Mining Techniques: principles and examples
  – undirected DM-techniques:
    Market Basket Analysis (MBA) (20 min)
    Link Analysis and Scale-Free Networks (10 min)
    Automatic Cluster Detection and Fuzzy Systems: Clustering the World Bank Data (20 min)
  – directed DM-techniques:
    Internal Knowledge Representation in BP-Networks (20 min)
    Modular Networks, Sensitivity Analysis and Feature Selection (20 min)
    Neural Networks and Decision Trees: Students' Questionnaire (20 min)
    Genetic Algorithms and BP-networks: Generating Melodies (10 min)
- Conclusions, Questions + Answers (10 min)


Intelligent Data Mining: References
- M. J. A. Berry, G. Linoff: Data Mining Techniques for Marketing, Sales, and Customer Support, John Wiley & Sons, 1997
- M. J. A. Berry, G. Linoff: Mastering Data Mining, John Wiley & Sons, 2000
- J. Han, M. Kamber: Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001
- D. Hand, H. Mannila, P. Smyth: Principles of Data Mining, The MIT Press, 2001
- D. Pyle: Data Preparation for Data Mining, Morgan Kaufmann Publishers, 1999
- I. H. Witten, E. Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, 2000
- http://www.mkp.com/datamining
- http://www.cs.waikato.ac.nz/ml/weka


What is Data Mining?
- discovering patterns in data
  – discovered patterns should be meaningful
  – should lead to some advantage, e.g. economic
  – allows non-trivial predictions on new data
- the data is present in substantial quantities
- an automatic or semi-automatic process
- two extremes for the form of discovered patterns:
  – black box, e.g. neural networks
  – transparent box: more structured, captures the decision structure in an explicit way


Building models for the data
- Classification model: assigns an existing classification to new records
- Predictive model
  – Time-series model
- Clustering model

[Diagram: a model maps INPUTS to an Output together with a confidence level]


Data Analysis: Influence of other disciplines
- statistics: interpret observations
  – sampling: reduce the size of data
  – regression analysis: inter- and extrapolate observations (linear regression: fit a line to observed data)
  – correlation analysis: mutual occurrence of observations
- memory-based reasoning: directly from AI
- link analysis: graph theory
- genetic algorithms and neural networks: model biological processes


Intelligent DM-Techniques: an overview
- Market Basket Analysis (MBA)
- Memory-Based Reasoning (MBR)
- Automatic Cluster Detection
- Fuzzy Systems (FS)
- Link Analysis
- Decision Trees
- Artificial Neural Networks (ANN)
- Genetic Algorithms (GA)


Market Basket Analysis (MBA)
Analyses in the retail industry: What items occur together in a "basket"?
- Results:
  – expressed as rules
  – highly actionable
- Applications:
  – planning store layouts
  – offering coupons, limiting specials
  – bundling products


Memory-Based Reasoning (MBR)
Look for the nearest "known" neighbor to classify or predict a value!
- applicable to virtually any data
- new instances are learned by adding them to the data set
- the distance to the neighbors estimates the correctness of the results
- key elements in MBR:
  – distance function: to find the nearest neighbors
  – combination function: combines the values at the nearest neighbors to classify or predict
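The two key elements can be sketched as a minimal nearest-neighbor classifier (the function names and toy records are illustrative, not from the tutorial):

```python
from collections import Counter

# Memory-based reasoning: keep the known records, classify a new
# record from its k nearest neighbors.

def distance(a, b):
    # distance function: squared Euclidean distance between numeric records
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(known, new_record, k=3):
    # combination function: majority vote among the k nearest neighbors
    neighbors = sorted(known, key=lambda rec: distance(rec[0], new_record))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

known = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(classify(known, (0.5, 0.5)))  # the nearest neighbors are all "A"
```

Learning a new instance is just appending it to `known`; no model is rebuilt.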


Link Analysis
Goals:
- find patterns in relationships between records
- visualize the links
Application areas:
- telecommunications
- law enforcement: clues about crimes are linked together to solve them
- marketing: relationships between customers


Automatic Cluster Detection
Goal: Find previously unknown similarities in the data!
- build models that find data records similar to each other
- good as an initial analysis of the data
- undirected data mining


Decision Trees and Rule Induction
Divide the data into disjoint subsets characterized by simple rules!
- directed data mining (classification)
- explainable rules applicable directly to new records
- techniques:
  – Classification And Regression Trees (CART)
  – Chi-squared Automatic Interaction Detection (CHAID)
  – C4.5


Artificial Neural Networks (ANN)
Detect patterns in the data in a way "similar" to human thinking!
- directed data mining (classification and prediction)
- applicable also to undirected data mining (SOMs)
- two major drawbacks:
  – difficulty in understanding the models they produce
  – sensitivity to the format of the incoming data


Genetic Algorithms (GA)
Apply genetics and natural selection to find optimal parameters of a predictive function!
- GA use "genetic" operators to evolve successive generations of solutions:
  – selection
  – crossover
  – mutation
- the best candidates "survive" into further generations until convergence is achieved
- directed data mining


On-Line Analytic Processing (OLAP)
- an important tool for extracting and presenting information
- facilitates understanding of the data and the important patterns inside it
- a way of presenting relational data to users
- multi-dimensional databases (MDDs):
  – a representation of data
  – allow users to drill down into the data and understand various important summarizations


Market Basket Analysis (MBA)


Association rules
How do the products relate to each other?
Association rules should be:
- easy to understand: once the pattern is found, it is easy to justify it
- useful: contain actionable information leading to further interventions
Association rules should not be:
- trivial: the results are already known to anyone familiar with the business
- inexplicable: seem to have no explanation and do not suggest any action


MBA to compare stores
Virtual items:
- specify which group the transaction comes from
- do not correspond to a product or service
Comparison between new and existing stores:
1. Gather data for a specific period from store openings
2. Gather about the same amount of data from existing stores
3. Apply MBA to find association rules in each set
4. Consider especially the association rules containing the virtual items


MBA - how does it work?
- items: products or service offerings
- transactions contain one or more items
- co-occurrence table:
  – indicates the number of times any two items co-occur in a transaction (i.e. these products were purchased together)
  – values along the diagonal represent the number of transactions containing just that one item


MBA - example
Grocery transactions:

Customer  Items
1         bread, butter
2         milk, bread, butter
3         bread, coffee
4         bread, butter, coffee
5         coffee, butter

Co-occurrence of products:

         bread  butter  milk  coffee
bread      4      3      1      2
butter     3      4      1      2
milk       1      1      1      0
coffee     2      2      0      3

Sales patterns apparent from the co-occurrence table: milk is never purchased with coffee; bread and butter are likely to be purchased together.
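The co-occurrence counts above can be reproduced with a short script (a sketch; the variable names are illustrative):

```python
from itertools import combinations

# the five grocery transactions from the example above
transactions = [
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"bread", "butter", "coffee"},
    {"coffee", "butter"},
]

items = ["bread", "butter", "milk", "coffee"]
# co[a][b] = number of transactions containing both a and b;
# the diagonal co[a][a] counts the transactions containing a at all
co = {a: {b: 0 for b in items} for a in items}
for t in transactions:
    for a in t:
        co[a][a] += 1
    for a, b in combinations(t, 2):
        co[a][b] += 1
        co[b][a] += 1

print(co["bread"]["butter"], co["milk"]["coffee"])  # 3 0
```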


MBA - Association rules
Rule: IF Condition THEN Result. (Rule_r: IF Item_i THEN Item_j.)
Questions:
- How good are the found association rules?
  – support
  – confidence
  – improvement
- How can association rules be found automatically?


Support and confidence
Support: How frequently can the rule be applied?

    Support(Rule_r) = Nr_of_Transactions_containing_i_and_j / Number_of_all_Transactions · 100 %

Confidence: How much can we rely on the result of the rule?

    Confidence(Rule_r) = Nr_of_Transactions_containing_i_and_j / Nr_of_Transactions_containing_i · 100 %


Support and confidence - example
Rule 1: If a customer purchases bread then the customer also purchases butter.
Rule 2: If a customer purchases coffee then the customer also purchases butter.

Support(Rule_1) = 3 / 5 · 100 % = 60 %
Confidence(Rule_1) = 3 / 4 · 100 % = 75 %
Support(Rule_2) = 2 / 5 · 100 % = 40 %
Confidence(Rule_2) = 2 / 3 · 100 % ≈ 66.7 %
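The two measures can be computed directly from the transactions (a sketch reusing the grocery example; function names are illustrative):

```python
# the five grocery transactions from the running example
transactions = [
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"bread", "butter", "coffee"},
    {"coffee", "butter"},
]

def support(cond, result):
    # share of all transactions containing both condition and result
    both = sum(1 for t in transactions if cond in t and result in t)
    return 100.0 * both / len(transactions)

def confidence(cond, result):
    # share of the condition's transactions that also contain the result
    cond_t = [t for t in transactions if cond in t]
    both = sum(1 for t in cond_t if result in t)
    return 100.0 * both / len(cond_t)

print(support("bread", "butter"), confidence("bread", "butter"))  # 60.0 75.0
```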


Improvement of a rule
Improvement: How much better is the rule at predicting the result than just assuming it?

    Improvement(Rule_r) = p(i_and_j) / ( p(i) · p(j) )

If Improvement < 1:
- the rule is worse at predicting the result than random chance
- NEGATING the result might produce a better rule: IF Condition THEN NOT Result.


Improvement of a rule - example
Rule: If a customer purchases milk then the customer also purchases butter.

Support(Rule) = 1 / 5 · 100 % = 20 %
Confidence(Rule) = 1 / 1 · 100 % = 100 %
Improvement(Rule) = (1 / 5) / ( (1 / 5) · (4 / 5) ) = 5 / 4 = 1.25
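The same calculation as a sketch over the grocery transactions (the `improvement` helper is illustrative; the coffee→butter call shows a rule with improvement below 1):

```python
transactions = [
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"bread", "butter", "coffee"},
    {"coffee", "butter"},
]

def improvement(cond, result):
    # p(cond and result) / (p(cond) * p(result))
    n = len(transactions)
    p_both = sum(1 for t in transactions if cond in t and result in t) / n
    p_cond = sum(1 for t in transactions if cond in t) / n
    p_result = sum(1 for t in transactions if result in t) / n
    return p_both / (p_cond * p_result)

print(improvement("milk", "butter"))    # 1.25 -- better than chance
print(improvement("coffee", "butter"))  # below 1 -- negating the result might help
```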


Basic steps of MBA
- Choose the right set of items and the right level
- Generate rules by deciphering the co-occurrence matrix
  – calculate the probabilities and joint probabilities of items and their combinations in transactions
  – limit the search with thresholds set on support
- Analyze the probabilities to determine the best rules
  – overcome the limits imposed by the number of items and their combinations in "interesting" transactions


MBA - the choice of the right items
Gathering transaction data:
- often of bad quality, requiring extensive pre-processing
- items of interest may change over time
- the right level of detail:
  – a growing number of item combinations
  – actionable results (specific items)
  – rules with sufficient support (frequent occurrence in the data set)


Taxonomies: hierarchical categories
MBA - complexity of generated rules:
- use more general items initially
- then generate rules for more specific items, using only the transactions containing these items
MBA - actionable results: items should occur in roughly the same number of transactions:
- roll up rare items to higher levels in the taxonomy (to become more frequent)
- keep more common items at lower levels (to prevent rules from being dominated by the most common items)


Virtual items: go beyond the taxonomy
- cross product boundaries of the original items
  – e.g. designer labels - Calvin Klein
- may include information about the transaction itself
  – anonymous (day of week, time, etc.)
  – signed (information about customers and their behavior over time)
- might be a cause of redundant rules
  – items from the taxonomy are associated with just one virtual item ("If Coke product then Coke.")
  – virtual and generalized items appear together in a rule ("If Coke product and diet soda then pretzels" instead of "If diet coke then pretzels")


MBA - generating rules
Compute the co-occurrence table:
- provides the information about which combinations of items occur most commonly in the transactions
- applicable for evaluating the basic probabilities necessary to judge the importance of generated rules
Provide useful rules:
- the improvement should be greater than 1
  – a low improvement can be increased by negating the rule
  – negated rules might be less actionable than the original rules
- reduce the number of generated rules - PRUNING


Minimum support pruning
Eliminate less frequent items - actions should affect enough transactions.
Two possibilities:
- eliminate rare items from consideration (then eliminate their respective association rules)
- use the taxonomy to generalize items (the resulting generalized items should then meet the threshold criteria)
Variable minimum support - a cascading effect
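The first possibility can be sketched as a two-stage filter (reusing the grocery transactions; the 40 % threshold is an arbitrary illustration, not from the tutorial):

```python
from itertools import combinations

transactions = [
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"bread", "butter", "coffee"},
    {"coffee", "butter"},
]
min_support = 0.4  # keep only items/pairs in at least 40 % of transactions

n = len(transactions)
item_freq = {}
for t in transactions:
    for item in t:
        item_freq[item] = item_freq.get(item, 0) + 1
# first eliminate rare items from consideration ...
frequent = {i for i, f in item_freq.items() if f / n >= min_support}

# ... then keep only the item pairs that still meet the threshold
pair_count = {}
for t in transactions:
    for a, b in combinations(sorted(t & frequent), 2):
        pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
frequent_pairs = {p: c / n for p, c in pair_count.items() if c / n >= min_support}
print(frequent_pairs)
```

Milk (support 20 %) is pruned before any of its rules are ever generated.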


MBA - Dissociation rules
Rule: IF A AND NOT B THEN C.
- introduce new items inverse to the original ones
- each transaction will contain an inverse item if it does not contain the original one
Drawbacks:
- doubled number of items
- growing size of transactions
- inverse items tend to occur more frequently than the original ones (leading to less actionable rules with all items inverted: "IF NOT A AND NOT B THEN NOT C.")
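Introducing inverse items is a simple transaction transform (a sketch on the grocery example; the `not_` prefix is an arbitrary naming convention):

```python
# each transaction gains an inverse item "not_X" for every item X it lacks,
# so dissociation rules (IF A AND NOT B THEN C) become ordinary rules
transactions = [
    {"bread", "butter"},
    {"milk", "bread", "butter"},
    {"bread", "coffee"},
    {"bread", "butter", "coffee"},
    {"coffee", "butter"},
]
items = {"bread", "butter", "milk", "coffee"}

augmented = [t | {"not_" + i for i in items - t} for t in transactions]

# drawback made visible: every transaction now has exactly len(items) entries
print(sorted(augmented[0]))  # ['bread', 'butter', 'not_coffee', 'not_milk']
```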


Time-series analysis with MBA
Analyze causes and effects:
- time- or sequencing information to determine when transactions occurred relative to each other
- usually requires some way of identifying the customer
Conversions to an MBA problem:
- include in the transactions the items before the event of interest (for causes) or after the event of interest (for effects); then remove duplicate items from the transaction
- time window: a "snapshot" of all items that occur within a certain period (e.g. all transactions within a month) - trends for rare items


Strengths of MBA
- produces clear and understandable results: actionable IF-THEN rules
- supports undirected data mining: important when approaching large data sets with no prior knowledge
- works on variable-length data
- the computations are easy to understand - but the computational costs grow exponentially with the number of items!


Weaknesses of MBA
- exponentially growing computational costs: necessitates item taxonomies and virtual items
- limited support for attributes on the data: pruning of less actionable general items
- difficult to determine the right number of items: items should have approximately the same frequency
- discounts rare items: variable thresholds for minimum support pruning, higher levels in the item taxonomies


Link Analysis


Scale-Free Networks
- some nodes (hubs) have an extremely large number of links (edges) to other nodes
- most nodes have just a few links to other nodes
- robust against accidental failures
- vulnerable to coordinated attacks
- new application areas:
  – preventing computer viruses from spreading through the Internet
  – medicine (vaccinations)
  – business (marketing)


Scale-Free Networks

adapted from “A. L. Barabasi and E. Bonabeau: Scale-Free Networks, Scientific American, May 2003”

[Figure: a random graph vs. a scale-free network, each with its distribution of edges (number of nodes plotted against number of edges)]


Examples of Scale-Free Networks
- social networks
  – research collaboration (scientists, co-authorship of papers)
  – Hollywood (actors, appearance in the same movie)
- biological networks
  – cellular metabolism (molecules involved in energy production, participation in the same biological reaction)
  – protein regulatory network (proteins controlling cell activity, interactions among proteins)
- socio-technical networks
  – Internet (routers, optical or other connections)
  – World Wide Web (Web pages and URLs)


Scale-Free Networks: basic characteristics
Two basic mechanisms:
- growth
- preferential attachment
"The rich get richer" (hubs):
- new nodes tend to connect to the more connected sites
- "popular locations" acquire more links over time than their less connected neighbors
Reliability:
- accidental failures (80 % of randomly selected nodes can fail without fragmenting the cluster)
- coordinated attacks (eliminating 5-15 % of all hubs can crash the system)
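The growth/preferential-attachment mechanism can be sketched with a toy simulation (a Barabási-Albert-style sketch; the node counts and the random seed are arbitrary choices):

```python
import random

random.seed(1)

def scale_free(n, m=2):
    """Grow a network: each new node links to m distinct existing nodes
    chosen with probability proportional to their current degree."""
    edges = [(0, 1)]
    targets = [0, 1]  # each node appears here once per incident edge
    for new in range(2, n):
        chosen = set()
        while len(chosen) < min(m, new):
            chosen.add(random.choice(targets))  # degree-biased choice
        for t in chosen:
            edges.append((new, t))
            targets.extend([new, t])  # both endpoints re-enter the "urn"
    return edges

edges = scale_free(200)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

hub = max(degree.values())
median = sorted(degree.values())[len(degree) // 2]
print(hub, median)  # "the rich get richer": the biggest hub dwarfs the typical node
```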


Scale-Free Networks

adapted from “A. L. Barabasi and E. Bonabeau: Scale-Free Networks, Scientific American, May 2003”

[Figure: before/after comparison of a random network under accidental node failure, a scale-free network under accidental node failure, and a scale-free network under an attack on its hubs]


Implications of Scale-Free Networks
- computing: networks with scale-free architectures
- medicine: vaccination campaigns and new drugs
- business: cascading financial failures, marketing


Implications of Scale-Free Networks: Computing
Computer networks with scale-free architectures (e.g. the WWW) are:
- highly resistant to accidental failures
- very vulnerable to deliberate attacks and sabotage
Eradicating viruses from the Internet will be effectively impossible.


Implications of Scale-Free Networks: Medicine
- vaccination campaigns against serious viruses focused on hubs
  – people with many connections to others
  – such people are difficult to identify
- new drugs targeting the hub molecules involved in certain diseases
- control the side effects of drugs with maps of the networks within cells


Implications of Scale-Free Networks: Business
- financial failures
  – understand how companies, industries and economies are inter-linked
  – monitor and avoid cascading financial failures
- marketing
  – study the spread of a contagion on a scale-free network
  – more efficient ways of propagating consumer buzz about new products


Automatic Cluster Detection


Economies grouped according to their results

[Figure: economies clustered by purchasing power parity, gross national product and GDP growth rates]


Mining the World Bank Data:the Fuzzy c-means Clustering Approach

with Cihan H. Dagli,Engineering Management Department, University of Missouri - Rolla


FCM-clustering: introduction
World Development Indicators (WDI):
- published annually by the World Bank
- reflect the development process in the countries
- incomplete and imprecise data
Previously applied techniques:
- regression analysis - linear relationships
- US-based grouping of countries (G. Ip, Wall Street Journal)
- GDP-based grouping of economies (World Bank)
- self-organizing feature maps (T. Kohonen, S. Kaski, G. Deboeck)


Poverty maps - T. Kohonen
- more neurons than countries
- only local geometric relations are important
- countries mapped close to each other have a similar state of development

adapted from "T. Kohonen: Self-Organizing Maps, 3rd Edition, Springer-Verlag, 2001"


Poverty maps - T. Kohonen, S. Kaski
U-matrix:
- illustrates "boundaries" between clusters
- represents the average distances between neighboring neurons in a gray scale
  – small average distance ⇒ light shade
  – large average distance ⇒ dark shade

adapted from "T. Kohonen: Self-Organizing Maps, 3rd Edition, Springer-Verlag, 2001"



Our goal
- cluster efficiently imprecise data → fuzzy c-means clustering (FCM)
- estimate the number of clusters → cluster validity indicators
- visualize the results → spread-sheet-like form
- interpret the results → find "landmarks"


The objective function
corresponds to the weighted distance between the input patterns and the cluster centers:

    J_s(U, v) = \sum_{p=1}^{P} \sum_{i=1}^{c} (u_{ip})^s \sum_{j=1}^{n} (x_{pj} - v_{ij})^2

where u_{ip} is the membership degree, s the fuzziness parameter, x_p an input pattern and v_i a cluster center, subject to:
- membership degrees between 0 and 1:  0 \le u_{ip} \le 1
- total membership of a pattern equals 1:  \sum_{i=1}^{c} u_{ip} = 1  for all p
- no empty or full clusters:  0 < \sum_{p=1}^{P} u_{ip} < P  for all i


Fuzzy c-means Clustering (FCM)
Step 1: Initialize c, s, ε and t = 0. Choose U^{(0)} randomly.
Step 2: Determine the new fuzzy cluster centers:

    v_i^{(t)} = \sum_{p=1}^{P} (u_{ip}^{(t)})^s x_p / \sum_{p=1}^{P} (u_{ip}^{(t)})^s

Step 3: Calculate the new partition matrix U^{(t+1)}:

    u_{ip}^{(t+1)} = (1 / \|x_p - v_i^{(t)}\|^2)^{1/(s-1)} / \sum_{k=1}^{c} (1 / \|x_p - v_k^{(t)}\|^2)^{1/(s-1)}

Step 4: Evaluate Δ = \|U^{(t+1)} - U^{(t)}\| = max_{i,p} |u_{ip}^{(t+1)} - u_{ip}^{(t)}|.
If Δ > ε then set t = t + 1 and go to Step 2. If Δ ≤ ε then Stop.
END of FCM
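The four steps can be sketched in plain Python (a minimal illustration; the toy 2-D data, c = 2 and s = 2.0 are arbitrary choices, and degenerate zero distances are handled crudely):

```python
import random

random.seed(0)

def fcm(patterns, c=2, s=2.0, eps=1e-4, max_iter=100):
    """Fuzzy c-means: alternate center and membership updates until
    the partition matrix changes by less than eps."""
    P, n = len(patterns), len(patterns[0])
    # Step 1: random initial partition matrix, rows normalized to sum to 1
    U = []
    for _ in range(P):
        row = [random.random() for _ in range(c)]
        tot = sum(row)
        U.append([u / tot for u in row])
    for _ in range(max_iter):
        # Step 2: centers as membership-weighted means
        centers = []
        for i in range(c):
            w = [U[p][i] ** s for p in range(P)]
            centers.append([sum(w[p] * patterns[p][j] for p in range(P)) / sum(w)
                            for j in range(n)])
        # Step 3: new memberships from inverse squared distances
        newU = []
        for p in range(P):
            d2 = [sum((patterns[p][j] - centers[i][j]) ** 2 for j in range(n))
                  for i in range(c)]
            if any(d == 0 for d in d2):  # pattern sits exactly on a center
                newU.append([1.0 if d == 0 else 0.0 for d in d2])
                continue
            inv = [(1.0 / d) ** (1.0 / (s - 1.0)) for d in d2]
            tot = sum(inv)
            newU.append([v / tot for v in inv])
        # Step 4: stop when the largest membership change is below eps
        delta = max(abs(newU[p][i] - U[p][i]) for p in range(P) for i in range(c))
        U = newU
        if delta <= eps:
            break
    return U, centers

data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
U, centers = fcm(data)
```

On this well-separated data, the memberships of the first three patterns peak in one cluster and those of the last three in the other.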


Cluster validity criteria
Partition coefficient:

    F(U; c) = (1/P) \sum_{p=1}^{P} \sum_{i=1}^{c} (u_{ip})^2

Partition entropy:

    H(U; c) = -(1/P) \sum_{p=1}^{P} \sum_{i=1}^{c} u_{ip} \ln(u_{ip}),  with u_{ip} \ln(u_{ip}) = 0 for u_{ip} = 0

Windham's proportion exponent, with μ_p = max_{1 \le i \le c} u_{ip} (the largest membership degree of pattern p over the clusters):

    W(U; c) = -\ln \prod_{p=1}^{P} \left[ \sum_{j=1}^{\lfloor μ_p^{-1} \rfloor} (-1)^{j+1} \binom{c}{j} (1 - j μ_p)^{c-1} \right]


How many clusters?
The number of clusters c is chosen by optimizing the validity indicators over the candidate partitions U and over 2 ≤ c ≤ P - 1:

Partition coefficient:  max_{2 \le c \le P-1} { max_U [ F(U; c) ] }
Partition entropy:  min_{2 \le c \le P-1} { min_U [ H(U; c) ] }
Windham's proportion exponent:  max_{2 \le c \le P-1} { max_U [ W(U; c) ] }
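The first two indicators are easy to compute for a given partition matrix (a sketch; the two toy partitions below are illustrative - a crisp partition scores F = 1, H = 0, a maximally fuzzy one scores F = 1/c, H = ln c):

```python
import math

def partition_coefficient(U):
    # F(U; c) = (1/P) * sum_p sum_i u_ip^2   (1/c fuzzy ... 1 crisp)
    return sum(u * u for row in U for u in row) / len(U)

def partition_entropy(U):
    # H(U; c) = -(1/P) * sum_p sum_i u_ip * ln(u_ip), with 0 ln 0 = 0
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

crisp = [[1.0, 0.0], [0.0, 1.0]]          # every pattern fully in one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]          # no cluster structure at all
print(partition_coefficient(crisp), partition_coefficient(fuzzy))  # 1.0 0.5
```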


Supporting experiments - artificial dataCluster validity indicators for artificial data Fuzzy 4-partition of the data

21 input patterns, s = 1.4, ε = 0.05 ´×´ indicates cluster centers, patterns from the same clusters are labeled identically

Supporting experiments – artificial data

Fuzzy 6-partition and fuzzy 8-partition of the data; ´×´ indicates cluster centers, patterns from the same clusters are labeled identically.

Interpret the results!

Characteristic features for detected clusters:
– cluster centers are "fictive" patterns outside the data set
– "calibrate" clusters with the "most representative" patterns from the data set, based on just one pattern (fuzzy c-landmarks)

Determine outstanding properties for clusters:
– compared to other properties within the cluster
– compared to properties of other clusters
– exception: "border areas"

Automatic landmark selection

Fuzzy c-landmark (i^*, v^*_{j^*}) for cluster i^*:
– the "fuzzy distance" from its own cluster center should be small
– the "fuzzy distance" from all other cluster centers should be large

The landmark is chosen by an arg min over fuzzy-weighted distance ratios of the form

\frac{\sum_{p=1}^{P} u_{ip}^{s} \, \|x_p - v_j\|^{2}}{\sum_{p=1}^{P} u_{ip}^{s}}

which should be minimal for the own cluster center (j = i) and large for all other centers (j ≠ i). Here u_{ip} are membership degrees, v_j cluster centers, x_p input patterns; c clusters, n inputs.

Supporting experiments: The World Bank Data

– 99 state economies with the 16 (latest) indicators for each country
– economic and social potential of countries and their citizens
– all indicators are relative to population
– element-wise transformation to (0, 1) with

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad \text{and} \qquad x'' = \frac{1}{1 + e^{-k(x' - 1/2)}}

where x_{\max} and x_{\min} are the maximum and minimum over all patterns
– the choice of the other parameters: k = 4, s = 1.4, ε = 0.05
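The two-step transformation above (min-max scaling followed by logistic squashing) can be sketched as follows; the GNP figures are invented sample values, not World Bank data:

```python
import numpy as np

def wb_transform(x, k=4.0):
    """Element-wise rescaling to (0, 1): min-max scaling over all patterns,
    then a logistic squashing centered at 1/2 with steepness k."""
    x = np.asarray(x, dtype=float)
    x1 = (x - x.min()) / (x.max() - x.min())      # x' in [0, 1]
    return 1.0 / (1.0 + np.exp(-k * (x1 - 0.5)))  # x'' in (0, 1)

gnp = np.array([250.0, 1200.0, 5400.0, 29000.0])  # hypothetical per-capita values
out = wb_transform(gnp)
```

The logistic step keeps the ordering of the min-max scaled values while pushing them away from the interval endpoints.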

Used Development Indicators

– GNP per capita
– Purchasing Power Parity
– Growth rate of GDP per capita
– GDP implicit deflator
– External debt (% of GNP)
– Total debt service (% of export of goods and services)
– High technology exports (% of manufactured exports)
– Military expenditures (% of GNP)
– Expenditures for R&D (% of GNP)
– Total expenditures on health (% of GDP)
– Public expenditures on education (% of GNP)
– Male life expectancy at birth
– Fertility rates
– GINI-index (distribution of income/consumption)
– Internet hosts per 10,000 people
– Mobile phones per 1,000 people

Supporting experiments: the World Bank data

Cluster validity indicators for the WB-data: 99 countries with 16 indicators; shown for s = 1.1, ε = 0.05 and for s = 1.4, ε = 0.05.

Fuzzy 7-partition of the WB data

A part of the fuzzy 7-partition of the World Bank data: 99 countries with 16 indicators; s = 1.4, ε = 0.05.

Landmarks for the WB data

"Representative patterns" and fuzzy 7-landmarks for the World Bank data: 99 countries with 16 indicators; s = 1.4, ε = 0.05.

FCM-Clustering: conclusions

FCM-clustering
– efficiency and cluster validity
– choice of the fuzziness parameter
– grouping of country economies (World Bank, Ip, Kohonen, Deboeck)

Visualization
– membership degree
– topological relationships

Landmarks and interpretation of the results
– formulation of "class discriminating" criteria

From FCM towards Fuzzy Systems

Rule extraction: Characteristics → Economical_results
– fuzzy inference systems
– (feed-forward) neural networks: back-propagation, RBF-networks

Neuro-fuzzy systems with adaptive inputs
– detection of significant input patterns
– influence of internal knowledge representation
– speed-up of the training and recall process

Artificial Neural Networks (ANN)

Detect patterns in the data in a way "similar" to human thinking!

– Directed data mining (classification and prediction)
– Applicable also to undirected data mining (SOMs)
– Two major drawbacks:
  – difficulty in understanding the models they produce
  – sensitivity to the format of incoming data

Back-Propagation and GREN-networks

Introduction

Multi-layer feed-forward networks (BP-networks)
– one of the most often used models
– relatively simple training algorithm
– relatively good results

Limits of the considered model
– the speed of the training process
– convergence and local minima
– generalization and "over-training" → additional demands on the desired network behavior

Back-Propagation training algorithm

The error function corresponds to the difference between the actual and the desired network output:

E = \frac{1}{2} \sum_{p} \sum_{j} \left( y_{j,p} - d_{j,p} \right)^{2}

where y_{j,p} is the actual output and d_{j,p} the desired output of output neuron j for pattern p. The goal of the training process is to minimize this difference on the given training set.

The Back-Propagation training algorithm

– computes the actual output for a given training pattern
– compares the desired and the actual output
– adapts the weights and the thresholds
  – against the gradient of the error function
  – backwards from the output layer towards the input layer
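The three steps above can be sketched for a single-hidden-layer network with sigmoid units; the XOR task, layer sizes and learning rate are illustrative choices, not the tutorial's experiments:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])  # patterns + bias input
D = np.array([[0.], [1.], [1.], [0.]])                                   # desired outputs (XOR)
W1, W2 = rng.normal(0, 1, (3, 4)), rng.normal(0, 1, (4, 1))
alpha = 0.5                                                              # learning rate

def forward(X):
    H = sigmoid(X @ W1)           # hidden-layer activations
    return H, sigmoid(H @ W2)     # actual network output

H, Y = forward(X)
sse_start = ((Y - D) ** 2).sum() / 2
for _ in range(20000):
    H, Y = forward(X)
    dY = (Y - D) * Y * (1 - Y)        # error terms at the output layer
    dH = (dY @ W2.T) * H * (1 - H)    # terms back-propagated to the hidden layer
    W2 -= alpha * H.T @ dY            # adapt weights against the gradient of E
    W1 -= alpha * X.T @ dH
sse_end = ((Y - D) ** 2).sum() / 2
```

The loop body mirrors the slide: forward pass, comparison with the desired output, and weight adaptation propagated backwards from the output layer.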

Drawbacks of the standard BP-model

The error function
– correspondence to the desired behavior
– the form of the training set
  – requires the knowledge of desired network outputs
  – better performance for "larger" and "well-balanced" training sets

Generalization abilities
– ability to interpret and evaluate the "gained" experience
– retraining for modified and/or developing task domains

Desired properties of trained networks

– Robustness against small deviations of those input patterns lying "close to the separating hyper-plane"
– Transparent network structure with a suitable internal knowledge representation
– A possible reuse of already trained networks under changed conditions

Condensed internal representation

Interpret the activity of hidden neurons:
– 1 … active … YES
– 0 … passive … NO
– 1/2 … silent … "no decision possible"

"Clear" the inner network structure; detect superfluous neurons and prune.

How to force the condensed internal representation?

Formulate "the desired properties" in the form of an objective function combining the standard error function E with the representation error function F:

G = E + c_F F

where c_F controls the influence of F on G. Local minima of the representation error function correspond to the active, passive and silent states:

F = \sum_{h} \sum_{p} y_{h,p}^{s} \left( 1 - y_{h,p} \right)^{s} \left( y_{h,p} - 0.5 \right)^{2}

(h hidden neurons, p patterns; s determines the shape of F)

Influence of parameters

– slower forcing of the internal representation and the desired network function
– stability of the forced internal representation and an optimal network architecture
– the shape of the representation error function, the speed of the representation-forcing process and its form
– the time-overhead of the weight adjustment

w_{ij}(t+1) = w_{ij}(t) + \alpha \, \delta_j \, y_i + \alpha_r \, \rho_j \, y_i + \alpha_m \left( w_{ij}(t) - w_{ij}(t-1) \right)

(α learning rate, α_r representation-forcing rate with representation error term ρ_j, α_m momentum rate)

[Figure: representation error as a function of the actual neuron output.]

Shape of the representation function

F(y) = y^{s} \left( 1 - y \right)^{s} \left( y - 0.5 \right)^{t}

[Figure: plots of F(y) on [0, 1] for (s, t) = (1, 2), (4, 2), (8, 2) and (5, 4).]
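The four plotted shapes can be reproduced numerically; this is a quick illustration of how s and t flatten the function around its three minima:

```python
import numpy as np

def repr_error(y, s, t):
    # F(y) = y^s * (1 - y)^s * (y - 0.5)^t, with minima at the
    # active (1), passive (0) and silent (0.5) hidden-neuron states
    return y**s * (1 - y)**s * (y - 0.5)**t

y = np.linspace(0.0, 1.0, 101)
for s, t in [(1, 2), (4, 2), (8, 2), (5, 4)]:
    F = repr_error(y, s, t)
    print(f"s={s}, t={t}: max F = {F.max():.2e} at y = {y[F.argmax()]:.2f}")
```

Larger s and t shrink the peak values, so the parameters trade off how strongly intermediate activations are penalized.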

Further modifications of the representation function

Discrete internal representation (S allowed output values r_1, …, r_S for neurons from the last hidden layer):

F = \sum_{j} \sum_{p} \left( y_{j,p} - r_1 \right)^{2t} \cdots \left( y_{j,p} - r_S \right)^{2t} = \sum_{j} \sum_{p} \prod_{s=1}^{S} \left( y_{j,p} - r_s \right)^{2t}

Condensed internal representation for all hidden layers l:

F = \sum_{l} \sum_{j} \sum_{p} \left( y_{j,p}^{l} \right)^{s_l} \left( 1 - y_{j,p}^{l} \right)^{s_l} \left( y_{j,p}^{l} - 0.5 \right)^{2}

Unambiguous internal representation

Patterns with highly different outputs should form highly different internal representations. Formulate the requirements as a modified objective function:

G = E + F + H

Ambiguity criterion for the internal representation:

H = -\frac{1}{2} \sum_{p} \sum_{q \ne p} \sum_{j} \sum_{o} \left( d_{o,p} - d_{o,q} \right)^{2} \left( y_{j,p} - y_{j,q} \right)^{2}

(p, q patterns, j hidden neurons, o output neurons; the desired outputs d_{o,p} are constant for a given p)

Modular structure of BP-networks

– Decompose the task into the particular subtasks
– Propose and form the modular architecture
  – strategy for extracting ε-equivalent BP-modules
  – elimination of superfluous hidden and/or input neurons
  – suitable for "already trained" networks
  – a compromise between the desired accuracy of the extracted module and its optimal architecture
– Communication between the particular modules
  – serial and parallel composition of BP-networks

Extracting BP-modules – allowed potential deviations

– The potential change δ_r(ξ−) is in this case smaller than the potential change δ_r(ξ+)
– The potential should change "towards the separating hyper-plane"
– The changed potential should preserve the location of the input pattern in the same half-space
– The allowed potential changes should be independent of each particular input pattern (from S)

[Figure: transfer function f(ξ) with the tolerance band f(ξ) ± ε_r for ε_r = 0.2, and the allowed potential deviations ξ − δ_r(ξ) and ξ + δ_r(ξ).]

Notes on the construction of an ε-equivalent network

Possible improvements of network properties:
– "egalitarian" versus "differentiated" approach

The relationship of the construction to "more robust" networks:
– necessary knowledge of ε_r-boundary regions
– preserve the created internal representation

Desired properties of "experts" for training (modular) BP-networks

An "expert" should:
– evaluate the error connected with the actual response of a BP-network
– "explain" the BP-network its error during training
– not require the knowledge of the desired network output
– but should recognize a correct behavior
– "suggest" a "better" behavior


GREN-networks: Generalized relief error networks

– assign the error to the pairs [input pattern, actual output]
– trained e.g. by the standard BP-training algorithm
– should have good approximation and generalization abilities
– "approximate" the error function by

E = \sum_{p} \sum_{e} e_{e,p}^{GR_B}

(p patterns, e output neurons of the GREN-network, e_{e,p}^{GR_B} output values of the GREN-network)

A modular system for training BP-networks with GREN-networks

[Diagram: the input pattern is fed both to the adapted BP-network and to the GREN-network; the GREN-network receives the input pattern together with the actual output of the BP-network and produces the error values.]

Training with a GREN-network

– Applies the basic idea of Back-Propagation
– How to determine the error terms at the output of the trained BP-network?
  – Use error terms back-propagated from the GREN-network
– Weight adjustment rules similar to the standard Back-Propagation

Training with a GREN-network

Applies the basic idea of Back-Propagation. How to determine ∂E/∂y_j^B at the output layer of the BP-network B?

\Delta w_{ij}^{B} = -\frac{\partial E}{\partial w_{ij}^{B}} = -\frac{\partial E}{\partial y_{j}^{B}} \, \frac{\partial y_{j}^{B}}{\partial \xi_{j}^{B}} \, \frac{\partial \xi_{j}^{B}}{\partial w_{ij}^{B}}

(w_{ij}^B weight of the BP-network B, y_j^B actual output, ξ_j^B potential of neuron j; the error E is computed by the GREN-network)

Weight adjustment rules

Use error terms back-propagated from the GREN-network; the rules are similar to the standard Back-Propagation:

w_{ij}^{B}(\text{new}) = w_{ij}^{B}(\text{old}) + \alpha \, \delta_{j}^{B} \, y_{i}^{B}

(α learning rate, δ_j^B error term, y_i^B actual output of neuron i). For output neurons of B, compute δ_j^B by means of the error terms δ_k^{GR_B} propagated from the GREN-network GR_B:

\delta_{k}^{GR_B} = -\frac{\partial E}{\partial y_{k}^{GR_B}} \, \frac{\partial y_{k}^{GR_B}}{\partial \xi_{k}^{GR_B}}

(y_k^{GR_B} actual output and ξ_k^{GR_B} potential of neuron k, E the error)

Error terms for the trained BP-network

For E = \sum_{e} e_{e}, the back-propagated error terms δ_j^B correspond to:

\delta_{j}^{B} = -\sum_{e} f'(\xi_{e}^{GR_B}) \, w_{je}^{GR_B} \, f'(\xi_{j}^{B}) \qquad for an output neuron of B and GR_B with no hidden layer

\delta_{j}^{B} = \sum_{k} \delta_{k}^{GR_B} \, w_{jk}^{GR_B} \, f'(\xi_{j}^{B}) \qquad for an output neuron of B and GR_B with hidden layers

\delta_{j}^{B} = \sum_{k} \delta_{k}^{B} \, w_{jk}^{B} \, f'(\xi_{j}^{B}) \qquad for a hidden neuron of B

Is the GREN-network an "expert"?

– It does not have to "know the right answer"
– But it should "recognize the correct answer": for an input pattern, yield the minimum error only for one actual output, namely the right one

Simple test for "problematic" GREN-experts:
– zero weights from the actual output y^B
– zero "y-terms" of potentials in the first hidden layer
– "too many large negative weights": \sum_i w_i^{-} \gg \sum_i w_i^{+}

Find "better" input patterns!

Look for input patterns of a GREN-network
– "similar" to those presented to and recalled by the BP-network
– with a smaller error

Minimize the error at the output of the GREN-network, e.g. by back-propagation: adjust the input patterns against the gradient of the GREN-network error function.

Avoid "problematic" GREN-networks!

– Insensitive to the outputs of trained BP-networks
  – inadequately small error terms back-propagated by the GREN-network
– Incapable of training further BP-networks
  – small error terms even for large errors

Our goal: increase the sensitivity of GREN-networks to their inputs!

How to handle the sensitivity of BP-networks?

Increase their robustness:
– over-fitting leads to functions with a lot of structure and a relatively high curvature
– favor "smoother" network functions
– alternative formulation of the objective function:
  – penalizing large second-order derivatives of the network function
  – penalizing large second-order derivatives of the transfer function for hidden neurons
  – weight-decay regularizers

Controlled learning of GREN-networks

– Require GREN-networks sensitive to their inputs
  – non-zero error terms for incorrect BP-network outputs
– Favor larger values of the error terms
– Minimize during training:

E^{REG} = \sum_{q} E_{q}^{REG} = -\sum_{q} \sum_{s} \sum_{r > n} \left( \frac{\partial y_{s,q}}{\partial y_{r,q}} \right)^{2}

(q patterns, s output neurons, r controlled input neurons; y_{s,q} output values, y_{r,q} controlled input values)

Weight adjustment rules

Regularization by means of the sensitivity terms; the rules are similar to the standard Back-Propagation:

\Delta_{REG} w_{ij} = -\frac{\partial E^{REG}}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_{s} \sum_{r > n} \left( \frac{\partial y_{s}}{\partial y_{r}} \right)^{2}

(s output neurons, r controlled inputs). The combined weight update with momentum:

w_{ij}^{GR_B}(T+1) = w_{ij}^{GR_B}(T) + \alpha_{E_c} \Delta_{E} w_{ij} + \alpha_{REG} \Delta_{REG} w_{ij} + \alpha_{m} \left( w_{ij}^{GR_B}(T) - w_{ij}^{GR_B}(T-1) \right)

(Δ_E w_{ij} the standard BP-weight adjustment, Δ_REG w_{ij} the controlled weight adjustment, α_m the momentum rate)

Characteristics of the method

– Applicable to any BP-network and/or input neuron
– Quicker training of "actual" BP-networks
  – larger sensitivity terms ∂y_s/∂y_r transfer better the errors from the GREN-network
– Oscillations during training "actual" BP-networks
  – due to the "linear" nature of the GREN-specified error function E = \sum_{p} \sum_{e} e_{e,p}^{GR_B} (e output neurons, e_{e,p}^{GR_B} output values of the GREN-network)

Modification of the method

– Use "quadratic" GREN-specified error terms for training "actual" BP-networks:

\hat{E} = \sum_{p} \sum_{e} \left( e_{e,p}^{GR_B} \right)^{2}

– Considers both the GREN-network outputs e_{e,p}^{GR_B} and the "sensitivity" terms ∂e_{e,p}^{GR_B}/∂y_{p,j}^{B}
– Crucial for low sensitivity to erroneous training patterns

Supporting experiments

[Figure: output of the standard BP-network over the (x, y) unit square; 3000 cycles, SSE = 0.89.]
[Figure: output of the GREN-trained BP-network; 3000 cycles, SSE = 0.05.]

Supporting experiments

[Figure: BP-network output for a constant y = 0.25; errorbars correspond to the GREN-error.]
[Figure: GREN-adjusted input/output patterns for a constant y = 0.25; errorbars correspond to the GREN-error. Initial I/O patterns: [0, 0.25, 0.197], [0.5, 0.25, 0.388], [1, 0.25, 0.932].]

Supporting experiments

[Figure: output of the standard BP-network with 8 hidden neurons; 1500 cycles, SSE = 0.51.]
[Figure: output of the GREN-trained BP-network with 8 hidden neurons; 1500 cycles, SSE = 0.06, GREN-error = 1.2.]

Supporting experiments

[Figure: network sensitivity and error (SSE over training cycles) for a standard BP-trained GREN-network.]
[Figure: network sensitivity and error for a controlled-trained GREN-network (control rates = 0.2).]

Supporting experiments

[Figure: sensitivities and error for a controlled-trained GREN-network (control rates = 0.2).]
[Figure: sensitivities and error for an over-trained GREN-network (control rates = 0.2); the sensitivity to the BP-output is shown separately.]

GREN-networks: conclusions

– GREN-networks can train BP-networks without the knowledge of their desired outputs
– A simple detection of "problematic" GREN-experts
– GREN-networks can find "similar" input patterns with a lower error

Conclusions: Sensitivity of GREN-networks

– Increase the sensitivity of trained GREN-networks to their inputs
– Detect "over-training" in GREN-networks
– Train BP-networks more efficiently by minimizing squared GREN-network outputs instead of the "linear" ones
– Further research: simplified sensitivity control

Acoustic Emission and Feature Selection Based on Sensitivity Analysis

with M. Chlada and Z. Převorovský, Institute of Thermomechanics, Academy of Sciences

Acoustic Emission and Feature Selection Based on Sensitivity Analysis

BP-networks and sensitivity analysis
– larger "sensitivity terms" |∂y_j/∂x_i| indicate a higher importance of the feature i

Numerical experiments
– acoustic emission: classification of simulated AE data
– feature selection: reduction of the original input parameters (from 14 to 6); model dependence between parameters
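The sensitivity terms |∂y_j/∂x_i| above can be estimated numerically for any black-box network function; a sketch using central finite differences, with an invented toy "network" for illustration:

```python
import numpy as np

def sensitivities(f, x, h=1e-5):
    """Estimate |dy_j/dx_i| of a network function f at input x
    by central finite differences; rows = outputs, columns = inputs."""
    x = np.asarray(x, dtype=float)
    S = np.empty((len(f(x)), len(x)))
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        S[:, i] = (f(x + e) - f(x - e)) / (2 * h)   # dy/dx_i for all outputs
    return np.abs(S)

# toy "network": depends strongly on x0, weakly on x1, not at all on x2
f = lambda x: np.array([np.tanh(3 * x[0] + 0.1 * x[1])])
S = sensitivities(f, [0.2, 0.5, -0.3])
```

Features with uniformly small columns of S are candidates for removal, which is the idea behind the reduction from 14 to 6 input parameters.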

Simulation of AE-data

[Figure: three model pulses WAVE 1, WAVE 2 and WAVE 3, and their combination 0.3*WAVE1 + 0.2*WAVE2 + 0.5*WAVE3.]


Simulation of AE-data

[Figure: the Green function (140 mm) and the input signal (a=0.3, b=0.2, c=0.5) obtained by convolution with the Green function]


Original Features for AE-signals

– amplitude:  z_max = max { z(t) : t ∈ T }
– rise time
– effective value (RMS):  RMS = sqrt( (1/T) ∫_T z²(t) dt )
– energy moment:  t_E = ∫_T t · z²(t) dt
– mean value:  t_s = ( ∫_T t · z(t) dt ) / T
– deviation:  σ² = ( ∫_T (t − t_s)² z²(t) dt ) / T
– asymmetry:  η = ( ∫_T (t − t_s)³ z²(t) dt ) / σ³
– excess:  ξ = ( ∫_T (t − t_s)⁴ z²(t) dt ) / σ⁴
– 6 spectral parameters:  P_X = ( ∫_{G_X} f(k) dk ) / ( ∫_G f(k) dk ) ;  X ∈ { A, B, C, D, E, F }
  with  G_A = [0, 0.12] · f_N/2,  G_B = [0.12, 0.24] · f_N/2,  G_C = [0.24, 0.36] · f_N/2,
  G_D = [0.36, 0.48] · f_N/2,  G_E = [0.48, 0.6] · f_N/2,  G_F = [0.6, 1] · f_N/2,
  G = [0, 1] · f_N/2,  and f_N is the Nyquist frequency
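The six spectral parameters are band-energy ratios over fixed fractions of the Nyquist frequency and can be computed directly from the discrete amplitude spectrum. A sketch, assuming a one-sided FFT amplitude spectrum; the sampling rate and the test sine are illustrative choices:

```python
import numpy as np

def spectral_parameters(z, fs):
    """Band ratios P_X = (sum of |F(f)| over G_X) / (sum over G = [0, f_N]),
    with the bands G_A..G_F taken as the slide's fractions of f_N = fs/2."""
    spectrum = np.abs(np.fft.rfft(z))            # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(len(z), d=1.0 / fs)  # frequencies 0 .. fs/2
    f_nyq = fs / 2.0
    bands = {"A": (0.00, 0.12), "B": (0.12, 0.24), "C": (0.24, 0.36),
             "D": (0.36, 0.48), "E": (0.48, 0.60), "F": (0.60, 1.00)}
    total = spectrum.sum()
    return {name: spectrum[(freqs >= lo * f_nyq) & (freqs < hi * f_nyq)].sum() / total
            for name, (lo, hi) in bands.items()}

# a 20 Hz sine sampled at 1 kHz concentrates its energy in band A (0 .. 60 Hz)
t = np.arange(1000) / 1000.0
P = spectral_parameters(np.sin(2 * np.pi * 20 * t), fs=1000.0)
```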


Factor analysis for input parameters

– 9 factors selected
– they “explain” 98.4% of all variables, e.g.:
  – a higher energy of signals leads to higher amplitudes and RMS (parameters 1, 3, 4)
– they allow to reduce linearly dependent input parameters
  – in our case to: 2, 3, 5, 6, 7, 8, 11, 12 and 2 new spectral parameters

[Figure: loadings of the input parameters on the selected factors]


SENSITIVITY COEFFICIENTS

  INPUT    OUTPUT 1   OUTPUT 2   OUTPUT 3
    1       0.173      0.266      0.149
    2       0.093      0.068      0.047
    3       0.320      0.193      0.184
    4       0.301      0.178      0.196
    5       0.564      0.250      0.206
    6       0.196      0.322      0.158
    7       0.099      0.063      0.043
    8       0.065      0.015      0.030
    9       0.022      0.014      0.016
   10       0.053      0.020      0.012
   11       0.035      0.012      0.032
   12       0.039      0.050      0.022
   13       0.081      0.134      0.082
   14       0.260      0.172      0.109

Sensitivity analysis of trained BP-networks

– 2000 samples
  – 500 training samples
– architecture 14-27-19-3, 180 iterations
– selected inputs:
  – sensitivity analysis: 1, 3, 4, 5, 6, 13, 14
  – + factor analysis: 1, 3, 5, 6, 13, 14
– new architecture: 6-13-7-3 (even with slightly better MSE-results)
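The selection can be reproduced directly from the sensitivity coefficients above: keep an input whose sensitivity to at least one output is sufficiently large. A sketch; the cut-off of 0.1 is an assumed threshold that happens to reproduce the slide's choice:

```python
import numpy as np

# Sensitivity coefficients from the slide: rows = inputs 1..14, columns = outputs 1..3
S = np.array([
    [0.173, 0.266, 0.149], [0.093, 0.068, 0.047], [0.320, 0.193, 0.184],
    [0.301, 0.178, 0.196], [0.564, 0.250, 0.206], [0.196, 0.322, 0.158],
    [0.099, 0.063, 0.043], [0.065, 0.015, 0.030], [0.022, 0.014, 0.016],
    [0.053, 0.020, 0.012], [0.035, 0.012, 0.032], [0.039, 0.050, 0.022],
    [0.081, 0.134, 0.082], [0.260, 0.172, 0.109],
])

def select_inputs(S, threshold):
    """Keep an input (1-based index) if its largest sensitivity over all outputs
    exceeds the threshold."""
    return [i + 1 for i, row in enumerate(S) if row.max() > threshold]

selected = select_inputs(S, threshold=0.1)   # the 0.1 cut-off is an assumption
# selected == [1, 3, 4, 5, 6, 13, 14], matching the slide's sensitivity-based choice
```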


Model dependence

SENSITIVITY COEFFICIENTS … X4 = (X1)^4

  PARAMETER   TARGET 1   TARGET 2   TARGET 3   TARGET 4
      1         0.32       0.16       0.08       0.04
      2         0.04       0.92       0.03       0.04
      3         0.02       0.05       0.98       0.03
      4         0.16       0.14       0.05       0.35

[Figure: plot of the dependence X4 = (X1)^4 for X1 ∈ [-1, 1]]


Model dependence

SENSITIVITY COEFFICIENTS … X4 = sin(9*X1)

  PARAMETER   TARGET 1   TARGET 2   TARGET 3   TARGET 4
      1         0.77       0.13       0.06       0.11
      2         0.03       0.89       0.05       0.03
      3         0.04       0.02       0.97       0.03
      4         0.12       0.11       0.05       0.59

[Figure: plot of the dependence X4 = sin(9*X1) for X1 ∈ [-1, 1]]


Knowledge extraction in neural networks (students’ questionnaire)

with Eva Poučková, Department of Software Engineering, Charles University Prague


Knowledge representation in NN

– Distributed!
– Which inputs are the most important ones?
– What does the network do, and how?

[Diagram: network mapping INPUT to OUTPUT]


Knowledge extraction in NN

– Dimension reduction and sensitivity analysis for inputs
– Rule extraction from trained networks
  – Structural learning with forgetting
    – BP-networks
    – GREN-networks
  – Babsi-trees (B. Hammer et al.)
    – GRLVQ


Dimension reduction

– PCA: linear transformation of the input data
– Sensitivity analysis
– Feature Subset Selection (FSS)
– Correlation-based Feature Selection (CFS): select a group of features with a high average correlation input_feature - output, but with a low mutual correlation
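The CFS idea can be written as a merit function and searched greedily. A sketch: the merit k·r_cf / sqrt(k + k(k−1)·r_ff) follows the standard CFS heuristic, and the greedy forward search is one common way to maximize it; both are illustrative assumptions, not details from the tutorial:

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit = k*r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the mean
    absolute feature-output correlation and r_ff the mean absolute
    feature-feature correlation within the subset."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in subset])
    if k == 1:
        r_ff = 0.0
    else:
        r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                        for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward(X, y, n_features):
    """Greedy forward search: repeatedly add the feature that maximizes the merit."""
    selected = []
    while len(selected) < n_features:
        rest = [i for i in range(X.shape[1]) if i not in selected]
        best = max(rest, key=lambda i: cfs_merit(X, y, selected + [i]))
        selected.append(best)
    return selected
```

The merit rewards features correlated with the output and penalizes redundant ones, so a near-duplicate of an already selected feature is skipped in favor of a complementary one.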


Dimension reduction: results

– PCA: the method is not suitable for further processing (knowledge and rule extraction)
– 25 original features
– FSS: 7 features; CFS: 7 features
– 8 features selected as the union of the results for FSS and CFS


Features selected for the overall evaluation

Feature subset selection (FSS):
(1) understandable subject
(2) structured and prepared presentations
(3) interesting classes
(4) quality of education
(5) understandable classes
(6) start/end of class on time
(7) relationship to students

Correlation-based feature selection (CFS):
(1) understandable subject
(2) structured and prepared presentations
(3) interesting classes
(4) quality of education
(5) understandable classes
(6) start/end of class on time
(8) students prepare for classes


Methods for knowledge extraction

– SLF – Structural learning with forgetting
  • learning with forgetting
  • learning with forcing internal representations on hidden neurons
  • learning with selective forgetting
– Babsi-trees
  • form a tree from a neural network trained by means of the GRLVQ-method


Generalized relevance learning vector quantization (GRLVQ)

– a robust combination of GLVQ and RLVQ
– provides weighting factors (λ) for input features
  – a larger λ corresponds to a “more important” feature
– applicable to pruning of input features
– GLVQ: considers class representatives
  – separating surfaces approach the optimal Bayesian ones
– RLVQ: input features can have different importance
  – relatively unstable, sensitive to noise


Generalized LVQ - GLVQ

Select a fixed number of representatives w^1, …, w^L for all classes C_i, i = 1, …, q.

Receptive field of the class representative w^i:
  R^i = { x ∈ T | ∀ k ≠ i : ||x − w^i|| < ||x − w^k|| }

Receptive fields of class representatives should be as small as possible!
– minimize  E = Σ_{t=1}^{p} σ( η(x^(t)) ) ;  σ denotes the sigmoid
– with  η(x^(t)) = ( ||x^(t) − w^+|| − ||x^(t) − w^−|| ) / ( ||x^(t) − w^+|| + ||x^(t) − w^−|| ) ,
  where w^+ is the closest representative yielding a correct classification and w^− the closest one yielding a wrong classification


Generalized LVQ - GLVQ

Weight adjustment, with learning rates α^+ and α^−:
  Δw^+ =   α^+ · σ′( η(x^(t)) ) · ||x^(t) − w^−|| / ( ||x^(t) − w^+|| + ||x^(t) − w^−|| )² · ( x^(t) − w^+ )
  Δw^− = − α^− · σ′( η(x^(t)) ) · ||x^(t) − w^+|| / ( ||x^(t) − w^+|| + ||x^(t) − w^−|| )² · ( x^(t) − w^− )
with  σ′( η(x^(t)) ) = σ( η(x^(t)) ) · ( 1 − σ( η(x^(t)) ) )
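One on-line GLVQ step following the update above can be sketched as follows; the prototype layout, the single learning rate shared by both α's, and the toy test vectors are illustrative simplifications:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glvq_step(x, label, W, W_labels, alpha=0.1):
    """One GLVQ update: pull the closest correct prototype w+ toward x and
    push the closest wrong prototype w- away, both scaled by sigma'(eta)."""
    d = np.linalg.norm(W - x, axis=1)            # distances to all prototypes
    correct = W_labels == label
    i_plus = np.where(correct)[0][np.argmin(d[correct])]
    i_minus = np.where(~correct)[0][np.argmin(d[~correct])]
    dp, dm = d[i_plus], d[i_minus]
    eta = (dp - dm) / (dp + dm)
    s = sigmoid(eta) * (1.0 - sigmoid(eta))      # sigma'(eta)
    W[i_plus] += alpha * s * (dm / (dp + dm) ** 2) * (x - W[i_plus])
    W[i_minus] -= alpha * s * (dp / (dp + dm) ** 2) * (x - W[i_minus])
    return W
```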


Relevance LVQ - RLVQ

Input features can have a different importance λ_i:
  dist_λ(x, w) = Σ_{i=1}^{n} λ_i ( x_i − w_i )² ;  λ_i ≥ 0 ,  Σ_{i=1}^{n} λ_i = 1

Receptive field of the class representative w^i:
  R^i = { x ∈ T | ∀ j ≠ i : ||x − w^i||_λ < ||x − w^j||_λ }

Weight adjustment according to GLVQ, with adaptive importance factors λ_i for the input features ( 0 < ε < 1 ):
  λ_i^(t) = max( λ_i^(t−1) − ε ( x_i^(t) − w_{ij} )² , 0 )   if x^(t) is classified correctly (d^(t) = c_j)
  λ_i^(t) = λ_i^(t−1) + ε ( x_i^(t) − w_{ij} )²              otherwise


Generalized Relevance Learning Vector Quantization (GRLVQ)

Weight adjustment according to GLVQ, with adaptive importance factors λ_i for the input features:
  Δλ_i = − ε · σ′( η(x^(t)) ) · ( ||x^(t) − w^−||²_λ / ( ||x^(t) − w^+||²_λ + ||x^(t) − w^−||²_λ )² · ( x_i^(t) − w_i^+ )²
                                − ||x^(t) − w^+||²_λ / ( ||x^(t) − w^+||²_λ + ||x^(t) − w^−||²_λ )² · ( x_i^(t) − w_i^− )² )
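The relevance update can be sketched directly from this formula; the ε value, the clipping-plus-renormalization step, and the toy vectors in the usage are illustrative assumptions:

```python
import numpy as np

def grlvq_lambda_step(x, w_plus, w_minus, lam, eps=0.05):
    """One GRLVQ relevance update: decrease lambda_i where the correct
    prototype w+ differs from x, increase it where the wrong prototype w-
    differs, then clip at zero and renormalize so that sum(lambda) = 1."""
    dp = np.sum(lam * (x - w_plus) ** 2)         # weighted distance to w+
    dm = np.sum(lam * (x - w_minus) ** 2)        # weighted distance to w-
    eta = (dp - dm) / (dp + dm)
    sig = 1.0 / (1.0 + np.exp(-eta))
    s = sig * (1.0 - sig)                        # sigma'(eta)
    denom = (dp + dm) ** 2
    lam = lam - eps * s * (dm / denom * (x - w_plus) ** 2
                           - dp / denom * (x - w_minus) ** 2)
    lam = np.maximum(lam, 0.0)
    return lam / lam.sum()
```

In a small example with x = (1, 0), a correct prototype (1, 1) and a wrong prototype (0, 0), feature 1 separates x from the wrong prototype and therefore gains relevance, while feature 2 (where the correct prototype is far) loses it.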


Babsi-trees

Root-trees G = (V, E) satisfying the following conditions:
– all vertices v_i ∈ V can have an arbitrary number n_i of sons
– all leaves v_J are labeled with the corresponding classification class C_J
– all vertices v_i which are not leaves are labeled with I_{v_i}
  – I_{v_i} stands for the currently processed input dimension i
  – dimensions are “ordered” according to their importance (λ)
– all edges going from a vertex v_i to its sons are labeled with intervals [s_k, s_l)
  – interval boundaries are placed in the middle between two neighboring cluster representatives
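A possible construction in this spirit can be sketched as follows. This is an illustrative reconstruction, not the authors' exact algorithm: the grouping of prototypes that coincide in the current dimension and the dictionary representation of nodes are assumptions:

```python
import numpy as np

def build_babsi_tree(prototypes, classes, lam):
    """Sketch of a Babsi-tree: dimensions are processed in order of decreasing
    relevance lambda, and edge intervals have boundaries placed halfway
    between neighboring prototype coordinates in the current dimension."""
    prototypes = np.asarray(prototypes, dtype=float)
    order = np.argsort(-np.asarray(lam))          # most relevant dimension first

    def build(protos, depth):
        labels = sorted({classes[p] for p in protos})
        if len(labels) == 1 or depth == len(order):
            return {"leaf": labels}               # leaf labeled with its class(es)
        dim = int(order[depth])
        # group prototypes that coincide in the current dimension
        groups = []
        for p in sorted(protos, key=lambda q: prototypes[q, dim]):
            if groups and np.isclose(prototypes[p, dim],
                                     prototypes[groups[-1][0], dim]):
                groups[-1].append(p)
            else:
                groups.append([p])
        edges, lo = [], -np.inf
        for a, g in enumerate(groups):
            if a + 1 < len(groups):               # boundary halfway to the next group
                hi = (prototypes[g[0], dim] + prototypes[groups[a + 1][0], dim]) / 2
            else:
                hi = np.inf
            edges.append({"interval": (lo, hi), "child": build(g, depth + 1)})
            lo = hi
        return {"dim": dim, "edges": edges}

    return build(list(range(len(prototypes))), 0)
```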


SLF for layered networks

– feed-forward neural networks
– GREN-networks


Results for the SLF-method

Both BP-networks and GREN-trained networks lead to similar sets of rules:

[Figure: the extracted rule sets]


The resulting Babsi-tree

Relevant dimensions (features): 4, 8 and 7

[Figure: Babsi-tree for the overall evaluation – inner nodes test the features (4), (8) and (7); leaves are marked “many patterns”, “just one pattern” or “ambiguous classification”]

Values for the intervals:
  1 … (-∞, 1.5)
  2 … [1.5, 2.5)
  3 … [2.5, 3.5)
  4 … [3.5, 4.5)
  5 … [4.5, ∞)


Comparison of the results

SLF:
– few simple hierarchically ordered rules
– possibility to add rules after achieving the desired accuracy
– rule correctly applicable: 71% and 73%, resp.

Babsi-trees:
– many simple rules
– quick training of the network
– few training parameters
– rule correctly applicable: 67%


Knowledge extraction: Conclusions

Main results achieved:
• dimension reduction for the input space
• analysis of various models for knowledge extraction
• rule extraction from GREN-networks
• comparison with other neural network models

Further research:
• adjusting rules extracted from a neural network trained with the GRLVQ-algorithm
• (automatic) selection of training parameters for the SLF-algorithm


Genetic Algorithms (GA)

Apply genetics and natural selection to find the optimal parameters of a predictive function!

– GAs use “genetic” operators to evolve successive generations of solutions:
  – selection
  – crossover
  – mutation
– the best candidates “survive” into further generations until convergence is achieved
– directed data mining


The basic Genetic Algorithm

Step 1: Create an initial population of individuals
Step 2: Evaluate the fitness of all individuals in the population
Step 3: Select candidates for the next generation
Step 4: Create new individuals (use genetic operators - crossover and mutation)
Step 5: Form a new population by replacing (some) old individuals by new ones
GOTO Step 2
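The five steps above can be sketched on bitstring individuals; the one-max fitness, tournament selection, and all rates and sizes are illustrative choices, not details from the tutorial:

```python
import random

def genetic_algorithm(fitness, n_bits=10, pop_size=20, generations=50,
                      p_cross=0.8, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    # Step 1: initial population of random bitstrings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate the fitness of all individuals
        scores = [fitness(ind) for ind in pop]
        # Step 3: tournament selection of parents
        def pick():
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]
        # Step 4: create new individuals by crossover and mutation
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)    # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        # Step 5: replace the old population and go back to Step 2
        pop = children
    return max(pop, key=fitness)

best = genetic_algorithm(fitness=sum)   # "one-max": maximize the number of 1s
```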


ANTARES
STUDENT SOFTWARE PROJECT
supervised by I. Mrázová, F. Mráz

participating students:
D. Bělonožník, D. J. Květoň, M. Šubert, J. Tomaštík, J. Tupý

http://www.ms.mff.cuni.cz/~mraz/antares


Project ANTARES

– Generate melodies with genetic algorithms
  – the fitness of candidate solutions is evaluated by the cooperating feed-forward neural network
– Parallel implementation of genetic algorithms
  – an open system for the design and testing of genetic algorithms and neural networks
  – supports mutual cooperation between neural networks and genetic algorithms


Fitness evaluation with neural networks

For some problems, it might be difficult to define the fitness function explicitly
– e.g. “evaluate” the beauty of generated melodies

The fitness of candidate melodies (generated by the GA) is evaluated by the pre-trained NN:
– provide a set of positive and negative examples (supervised learning)
– train a feed-forward network to approximate the “unknown” fitness function (on the training set)
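This scheme can be sketched end-to-end. Here a single logistic unit stands in for the tutorial's feed-forward network, and the toy "melody" features (pitch-step sizes) are invented purely for illustration:

```python
import numpy as np

def train_surrogate(X_pos, X_neg, epochs=500, lr=0.5):
    """Train a minimal logistic unit on positive/negative examples; its output
    in [0, 1] then serves as the GA's fitness function."""
    X = np.vstack([X_pos, X_neg])
    y = np.array([1.0] * len(X_pos) + [0.0] * len(X_neg))
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # forward pass
        grad = p - y                             # cross-entropy gradient
        w -= lr * X.T @ grad / len(X)
        b -= lr * grad.mean()
    return lambda x: float(1.0 / (1.0 + np.exp(-(np.dot(x, w) + b))))

# toy "melodies": positives have small pitch steps, negatives large jumps
pos = np.array([[0.1, 0.2, 0.1], [0.2, 0.1, 0.2]])
neg = np.array([[0.9, 0.8, 0.9], [0.8, 0.9, 0.7]])
fitness = train_surrogate(pos, neg)
```

The returned `fitness` function can be plugged directly into the genetic algorithm from the previous section in place of an explicitly defined fitness.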


Generating melodies: positive training samples

[Figure: musical scores of the positive training samples]


Generating melodies

– Positive training samples
– Negative training samples
– Test samples (with a high fitness value)
– Generated melodies

[Figure: musical scores for each of the four groups]


Thank you for your attention!