Meta-Association Rules for Fusing Regular Association Rules from Different Databases. M.D. Ruiz, J. Gómez-Romero, M.J. Martin-Bautista, D. Sánchez, M. Delgado. 9th July 2014

Meta-Association Rules for Fusing Regular Association ...decsai.ugr.es/~mdruiz/2014_Fusion-pres.pdf · 9th July 2014. Motivation I Exponential growth of available data in Data Mining


Meta-Association Rules for Fusing Regular Association Rules from Different Databases

M.D. Ruiz, J. Gomez-Romero, M.J. Martin-Bautista, D. Sanchez, M. Delgado

9th July 2014


Motivation

• Exponential growth of available data in the Data Mining area.

• Datasets are often distributed.

• Datasets are processed separately: several mining processes are carried out over data with similar meaning coming from different sources.

⇒ The extracted information should be fused in order to provide a unified and not overwhelming view to the user.


Motivation

Several problems arise when using association rule algorithms in distributed databases:

1. Obtaining rules from very large datasets can be difficult and time-consuming.

    • Parallel versions of rule mining algorithms, e.g. MapReduce

2. Handling distributed databases with similar meaning but different descriptions, which cannot be directly merged.

Solution:

Data Mining + Information Fusion


Overview

1. Example in Crime Data Analysis

2. Proposal
    • Brief Introduction to Association Rules
    • Meta-Association Rules

3. Algorithm and Implementation Issues

4. Experimental Evaluation

5. Discussion and Future Research

6. References


Example in Crime Data Analysis

• We want to study the crime incidents that happened in the city of Chicago.

• Each district of Chicago has its own dataset, D1, D2, ..., Dk, some of them sharing some of their attributes.

• Association rule mining algorithms are executed separately in each district, obtaining different sets of rules: R1, R2, ..., Rk.

• There are several attributes describing some aspects of the districts: at1, at2, ..., atm.

Proposal:

Fusing this information by means of Meta-Association Rules


Proposal

[Figure: rule sets R1, R2, ..., Rk-1, Rk → meta-database (rules r1, r2, ..., rn | additional attributes at1, ..., atm) → meta-association rules]


Brief Introduction to Association Rules

• Data is usually stored in datasets D composed of transactions ti (rows) and attributes (columns).

• An item is a pair 〈attribute, value〉 or 〈attribute, interval〉.

D     i1    i2    ...   ij    ij+1   ...   im
t1    1     0     ...   0     1      ...   0
t2    0     1     ...   1     1      ...   1
...   ...   ...   ...   ...   ...    ...   ...
tn    1     1     ...   0     1      ...   1

• Association rules are expressions of the form A → B, where A and B are non-empty sets of items with no intersection (A ∩ B = ∅).

• An association rule represents a relation between the joint co-occurrence of A and B.


Brief Introduction to Association Rules

• The support of an itemset A is defined as the probability that a transaction contains the itemset:

$$\mathrm{supp}(A) = \frac{|\{t \in D : A \subseteq t\}|}{|D|}$$

• For assessing the validity of association rules (ARs), the most common measures are support (joint probability P(A ∪ B)) and confidence (conditional probability P(B|A)):

$$\mathrm{Supp}(A \to B) = \mathrm{supp}(A \cup B) = \frac{|\{t \in D : A \cup B \subseteq t\}|}{|D|}; \qquad \mathrm{Conf}(A \to B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}$$

These must be ≥ minsupp and ≥ minconf, respectively (thresholds imposed by the user); that is, the rule must be frequent and confident.
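The two measures above can be illustrated with a minimal Python sketch. The function names (`supp`, `rule_support`, `confidence`), the representation of transactions as frozensets of item labels, and the toy dataset are assumptions made for this example; they do not come from the slides.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def supp(itemset: Iterable[str], dataset: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in dataset if wanted <= t) / len(dataset)


def rule_support(a: Iterable[str], b: Iterable[str],
                 dataset: List[Transaction]) -> float:
    """Supp(A -> B) = supp(A ∪ B)."""
    return supp(frozenset(a) | frozenset(b), dataset)


def confidence(a: Iterable[str], b: Iterable[str],
               dataset: List[Transaction]) -> float:
    """Conf(A -> B) = supp(A ∪ B) / supp(A); 0.0 when A never occurs."""
    s_a = supp(a, dataset)
    return rule_support(a, b, dataset) / s_a if s_a > 0 else 0.0


# Hypothetical toy dataset with four transactions over items i1, i2, i3.
toy = [
    frozenset({"i1", "i2"}),
    frozenset({"i1", "i2", "i3"}),
    frozenset({"i1"}),
    frozenset({"i3"}),
]

print(supp({"i1"}, toy))                # 0.75
print(rule_support({"i1"}, {"i2"}, toy))  # 0.5
print(confidence({"i1"}, {"i2"}, toy))    # about 0.667
```

With minsupp = 0.5 and minconf = 0.6, the rule {i1} → {i2} would be both frequent and confident in this toy dataset.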


Brief Introduction to Association Rules

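• An alternative framework is to measure the accuracy by means of the certainty factor:

$$CF(A \to B) = \begin{cases} \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) > \mathrm{supp}(B) \\[2ex] \dfrac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{\mathrm{supp}(B)} & \text{if } \mathrm{Conf}(A \to B) < \mathrm{supp}(B) \\[2ex] 0 & \text{otherwise.} \end{cases}$$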

• CF measures how our belief that B is in a transaction changes when we are told that A is in that transaction.

• The certainty factor has better properties than confidence and other quality measures; in particular, it helps to reduce the number of rules obtained by filtering out those rules corresponding to statistical independence or negative dependence.

• When CF(A → B) ≥ minCF, the rule is called certain.
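The piecewise definition of the certainty factor translates directly into code. The sketch below is only illustrative: the function name `certainty_factor` and the sample values are assumptions for the example, not part of the original presentation.

```python
def certainty_factor(conf: float, supp_b: float) -> float:
    """Certainty factor of A -> B, given Conf(A -> B) and supp(B).

    Positive when knowing A raises the belief in B, negative when it
    lowers it, and 0 under statistical independence (Conf == supp(B)).
    """
    if conf > supp_b:
        # conf > supp_b implies supp_b < 1, so the denominator is non-zero.
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        # conf < supp_b implies supp_b > 0, so the denominator is non-zero.
        return (conf - supp_b) / supp_b
    return 0.0


print(certainty_factor(1.0, 0.5))   # 1.0  (A always implies B)
print(certainty_factor(0.75, 0.5))  # 0.5  (positive dependence)
print(certainty_factor(0.5, 0.5))   # 0.0  (independence)
print(certainty_factor(0.25, 0.5))  # -0.5 (negative dependence)
```

The last two cases show why a threshold minCF > 0 filters out rules that reflect independence or negative dependence, even when their confidence is not small.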


Meta-Association Rules

Meta-association rules are association rules where the antecedent or the consequent can contain regular rules that have been previously extracted with a high reliability in a high percentage of the source databases.




Algorithm and Implementation Issues

1. From each database Di, a set of rules Ri is obtained.

2. We compile these rules into a new database D, together with the attributes at1, ..., atm.

D     r1    r2    ...   rn    at1   ...   atm
D1    1     1     ...   0     1     ...   1
D2    0     1     ...   0     0     ...   1
...   ...   ...   ...   ...   ...   ...   ...
Dk    1     0     ...   1     1     ...   0

3. This information is fused by finding meta-association rules (involving the previously extracted rules r1, ..., rn and the attributes at1, ..., atm).
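Step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: rules and attribute items are represented as plain string labels, each source database is identified by a name, and the function name `build_meta_database` is hypothetical. The slides do not specify how the meta-database is actually stored.

```python
from typing import Dict, List, Set, Tuple


def build_meta_database(
    rule_sets: Dict[str, Set[str]],
    extra_attrs: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Build one boolean row per source database.

    Columns are all distinct rules followed by all distinct attribute items;
    a cell is 1 when the rule was mined from (or the attribute item holds
    for) that database, and 0 otherwise.
    """
    all_rules = sorted(set().union(*rule_sets.values()))
    all_attrs = sorted(set().union(*extra_attrs.values()))
    columns = all_rules + all_attrs
    rows = {}
    for name, rules in rule_sets.items():
        present = rules | extra_attrs.get(name, set())
        rows[name] = [1 if col in present else 0 for col in columns]
    return columns, rows


# Hypothetical example: three districts, three rules, two attribute items.
rule_sets = {"D1": {"r1", "r2"}, "D2": {"r2"}, "D3": {"r1", "r3"}}
extra_attrs = {"D1": {"at1"}, "D2": set(), "D3": {"at1", "at2"}}

columns, rows = build_meta_database(rule_sets, extra_attrs)
print(columns)     # ['r1', 'r2', 'r3', 'at1', 'at2']
print(rows["D1"])  # [1, 1, 0, 1, 0]
```

Each row of the result is a transaction of the meta-database, so the same mining procedure used on the source databases can then be applied to it.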


Meta-Association Rules

Formally, we will obtain three types of meta-association rules:

• ri → rj, where ri, rj can be rules or a conjunction of rules. For example: ri = ri1 ∧ · · · ∧ ris.

• ati → atj, where ati, atj can be attributes or a conjunction of attributes.

• ri → atj or atj → ri, where ri, atj can be a conjunction of rules and a conjunction of attributes, respectively, and they can be mixed.


Meta-Association Rule Mining Algorithm

Input: D1, ..., Dk, minsupp, minCF
Output: MR (set of meta-association rules)
 1: for all Di such that 1 ≤ i ≤ k do
 2:     # Di preprocessing
 3:     Read Di and store the items I
 4:     Transform Di into a boolean database
 5:     Store database into a vector of BitSets
 6:     # Mine very strong rules
 7:     Compute the candidate set C of frequent itemsets, Supp(X) ≥ minsupp
 8:     Store the BitSet vector indexes of X ∈ C and Supp(X)
 9:     Compose the rule with X, Y ∈ C
10:     if Supp(X ⇒ Y) ≥ minsupp and CF(X ⇒ Y) ≥ minCF then
11:         The rule is a very strong rule
12:     end if
13: end for
14: # D creation
15: Compile all different rules from R1, ..., Rk
16: Create D using compiled rules and additional attributes
17: # Mining meta-association rules
18: Repeat steps 1-13 to mine meta-association rules from D
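The mining loop (steps 1-13) can be sketched in Python, using integers as bit vectors in place of the BitSets mentioned in the algorithm. This is a simplified illustration, not the authors' implementation: it enumerates candidate itemsets by brute force up to a hypothetical `max_size` parameter instead of using a pruned search, and all function names are assumptions made for the example.

```python
from itertools import combinations


def popcount(x: int) -> int:
    """Number of set bits, i.e. number of rows covered by a mask."""
    return bin(x).count("1")


def certainty_factor(conf: float, supp_b: float) -> float:
    if conf > supp_b:
        return (conf - supp_b) / (1.0 - supp_b)
    if conf < supp_b:
        return (conf - supp_b) / supp_b
    return 0.0


def mine_rules(rows, minsupp, min_cf, max_size=2):
    """Return (X, Y, Supp, CF) for every rule X => Y passing both thresholds.

    `rows` is a list of sets of items (a boolean database in sparse form).
    """
    n = len(rows)
    items = sorted(set().union(*rows))
    # One bitmask per item: bit j is set when row j contains the item.
    masks = {it: sum(1 << j for j, r in enumerate(rows) if it in r)
             for it in items}
    # Frequent itemsets, found by brute-force enumeration (sketch only).
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            m = (1 << n) - 1
            for it in combo:
                m &= masks[it]
            if m and popcount(m) / n >= minsupp:
                frequent[frozenset(combo)] = m
    # Compose rules X => Y from disjoint frequent itemsets.
    rules = []
    for x, mx in frequent.items():
        for y, my in frequent.items():
            if x & y:
                continue
            s_xy = popcount(mx & my) / n
            if s_xy < minsupp:
                continue
            conf = popcount(mx & my) / popcount(mx)
            cf = certainty_factor(conf, popcount(my) / n)
            if cf >= min_cf:
                rules.append((x, y, s_xy, cf))
    return rules


# Hypothetical toy database: a and b always occur together.
toy_rows = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
for x, y, s, cf in mine_rules(toy_rows, minsupp=0.5, min_cf=0.8, max_size=1):
    print(sorted(x), "=>", sorted(y), s, cf)
```

Step 18 then corresponds to calling the same function on the rows of the meta-database, where each row is the set of rules and attribute items that hold for one source database.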


Experimental Evaluation: DataSet Description

• 22 databases about crime, corresponding to the districts of the city of Chicago.

• Number of transactions: min = 5694 and max = 22493.

• 6 types of attributes (around 300 items) in each database:

    • Quarter of the year in which the incident happened.
    • Day period: morning, afternoon, evening, night.
    • Crime description according to police standard protocols.
    • Location description: street, residence, etc.
    • Arrest, if there is an arrest associated with the crime.
    • Domestic, if the crime happened in a domestic environment.

• Additional attributes about the districts:

    • Number of students in the district: low, medium, high, very high.
    • Number of misconducts notified in the district: very low, low, medium, high, very high.
    • Perceived safety index, obtained by means of surveys: low, medium, high.


Experimental Evaluation: Some Results

Example of obtained meta-association rule:

“IF (Crime-Description=$500 under → Arrest=false)
 AND (Location-Description=RESIDENCE → Arrest=false)
 THEN Safety-Index=High”

with Supp = 0.136 and CF = 1.

That means that it is frequent to have a high perception of security when there are crimes of minor relevance without arrests in residential areas.
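As a reading aid, the two reported figures can be unpacked with a short calculation. It assumes that the support is measured over the 22 district rows of the meta-database, which the slide does not state explicitly; the CF step follows directly from the definition of the certainty factor given earlier.

```latex
% Assumption: support is a fraction of the 22 district rows.
\[
\mathrm{Supp} = 0.136 \approx \frac{3}{22}
\quad\text{(consistent with the rule holding in 3 of the 22 districts)}
\]
% From the first branch of the CF definition, with supp(B) < 1:
\[
CF(A \to B) = \frac{\mathrm{Conf}(A \to B) - \mathrm{supp}(B)}{1 - \mathrm{supp}(B)} = 1
\;\Longleftrightarrow\;
\mathrm{Conf}(A \to B) = 1
\]
```

Under that assumption, CF = 1 means that every district in which both regular rules were mined also has Safety-Index=High.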


Experimental Evaluation: Some Results

Another example of obtained meta-association rule:

“IF Safety-Index=Medium
 THEN (Location-Description=STREET → Domestic=false)
 AND Number-of-Students=Very High”

with Supp = 0.136 and CF = 0.511.

Interpretation: In some districts (13.6%), a medium safety perception is frequently associated with the fact that crimes are happening in the streets and the number of students in the district is very high.


Discussion and Future Research

We have identified several problems or deficiencies of our approach that can be improved.

• We have taken into account only the presence/absence of a rule in D.

    • It would be convenient to consider the degree of importance of the rule.

    Future: improvement taking into account fuzzy association rules.

• The databases considered have the same structure.

    • It would be convenient to address the problem of having datasets with different structure or different attribute descriptions but very similar meaning.

    Future: using a knowledge repository to assist the algorithm in matching items with the same meaning.


References

[Sanchez et al.] D. Sanchez, M.A. Vila, L. Cerda, and J.M. Serrano. Association rules applied to credit card fraud detection. Expert Systems with Applications, 36:3630-3640, 2009.

[Delgado et al.] M. Delgado, M.D. Ruiz, and D. Sanchez. Studying interest measures for association rules through a logical model. Int. J. of Uncertainty, Fuzziness and Knowledge-Based Systems, 18(1):87-106, 2010.

[Ruiz et al.] M.D. Ruiz, M.J. Martin-Bautista, D. Sanchez, M.A. Vila, and M. Delgado. Anomaly detection using fuzzy association rules. Int. J. Electronic Security and Digital Forensics, 6(1):25-37, 2014.


Thank you. Any questions?
