35
Introduction Association rule mining Artificial bee colony optimization AR mining with ABCO What’s next References 21 st INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS AUGUST 19-22, 2014 – Geneva, Switzerland An association rule miner for unbalanced data based on artificial bee colony optimization Goals and Methodology Emmanuel Rousseaux Gilbert Ritschard Institute of Service Science Institute for Demographic and Life Course Studies [email protected] [email protected] University of Geneva Switzerland

An association rule miner for unbalanced data based on artificial bee colony optimization

  • Upload
    gfri

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

21st INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICSAUGUST 19-22, 2014 – Geneva, Switzerland

An association rule miner for unbalanced databased on artificial bee colony optimization

Goals and Methodology

Emmanuel Rousseaux Gilbert RitschardInstitute of Service Science Institute for Demographic and Life Course Studies

[email protected] [email protected]

University of GenevaSwitzerland

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Outline

IntroductionAssociation rule miningArtificial bee colony optimizationAR mining with ABCOWhat’s next

Rousseaux and Ritschard – University of Geneva – 1/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Motivation

NCCR LIVES“Overcoming vulnerability: life course perspectives”

Methodological team“Measuring life sequences and the disorder of lives” ruled by

Gilbert Ritschard

My contributionData mining approaches for the discovery of critical events in life

courses

Rousseaux and Ritschard – University of Geneva – 2/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Temporal association rules

We want to discover rules: A ⇒ BI If I experience A, then I often experience BI If I’m older than 50 with a low educational level and I lose my job, then I may

fall in long-term unemployment.

I At1 ⇒ Bt2

Rousseaux and Ritschard – University of Geneva – 3/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Temporal association rules

We want to discover rules: A ⇒ BI If I experience A, then I often experience BI If I’m older than 50 with a low educational level and I lose my job, then I may

fall in long-term unemployment.

I At1 ⇒ Bt2

Rousseaux and Ritschard – University of Geneva – 3/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Exclusion rules

Then we may want to find exclusions to these rulesI At1 ∧ Z t2 ⇒ B̄t3

I If I experience A but I experience Z too; I won’texperience B

And look on the whole life courseI At1

work traj . ∧ Bt2family traj . ⇒ C t3

health traj .

Rousseaux and Ritschard – University of Geneva – 4/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Exclusion rules

Then we may want to find exclusions to these rulesI At1 ∧ Z t2 ⇒ B̄t3

I If I experience A but I experience Z too; I won’texperience B

And look on the whole life courseI At1

work traj . ∧ Bt2family traj . ⇒ C t3

health traj .

Rousseaux and Ritschard – University of Geneva – 4/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Exclusion rules

Then we may want to find exclusions to these rulesI At1 ∧ Z t2 ⇒ B̄t3

I If I experience A but I experience Z too; I won’texperience B

And look on the whole life courseI At1

work traj . ∧ Bt2family traj . ⇒ C t3

health traj .

Rousseaux and Ritschard – University of Geneva – 4/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Outline

IntroductionAssociation rule miningArtificial bee colony optimizationAR mining with ABCOWhat’s next

Rousseaux and Ritschard – University of Geneva – 5/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Association rule: Definition

Let:I I = {I1, I2, . . . , Ip} a set of binary attributesI T a database of n observations on I .

An association rule is given by (Agrawal, Imieliński, and Swami, 1993)

I X ⊂ I , Y ⊂ I , X ∩ Y = ∅I A direction: X ⇒ Y

Rousseaux and Ritschard – University of Geneva – 6/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Association rule: Quality assessment

More than 50 criterias...I Support: supp(A⇒ B) = nA∪B/n

I Confidence: conf(A⇒ B) = nA∪B/nA ≈ P(B|A)

I Lift: lift(A⇒ B) = supp(A⇒ B)/(supp(A) ∗ supp(B)) ≈ P(B|A)/P(B)

I Statistical criterias: chi-squared, implicative intensity, etc.

I ...

Rousseaux and Ritschard – University of Geneva – 7/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Application to life-course mining

Specific features

I Temporality (Srikant and Agrawal, 1996; Harms and Deogun, 2004)

I Gap threshold between eventsI Ordering

I Positive and negative rules (Swesi, Bakar, and Kadir, 2012)

Rousseaux and Ritschard – University of Geneva – 8/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Frequent pattern mining

Exp(p) search spaceI Limit exploration: support threshold

I Classical algorithms:

I Apriori (Agrawal, Imieliński, andSwami, 1993)

I FP-growth (Han, Pei, and Yin,2000)

I Extract too many rules, most of them

are useless

I Impossible to discover rare rules

Rousseaux and Ritschard – University of Geneva – 9/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Mining rare classes

Recent approachesI Multiple support thresholds (Liu, Hsu, and Ma, 1999)

I Particle swarm optimization (Sarath and Ravi, 2013)

I Genetic algorithm (Ghosh and Nath, 2004; Salleb-Aouissi, Vrain,

and Nortet, 2007)

I Proposal: Use an artificial bee colony algorithm

Rousseaux and Ritschard – University of Geneva – 10/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Mining rare classes

Recent approachesI Multiple support thresholds (Liu, Hsu, and Ma, 1999)

I Particle swarm optimization (Sarath and Ravi, 2013)

I Genetic algorithm (Ghosh and Nath, 2004; Salleb-Aouissi, Vrain,

and Nortet, 2007)

I Proposal: Use an artificial bee colony algorithm

Rousseaux and Ritschard – University of Geneva – 10/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Outline

IntroductionAssociation rule miningArtificial bee colony optimizationAR mining with ABCOWhat’s next

Rousseaux and Ritschard – University of Geneva – 11/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Artificial bee colony algorithm

Recent Evolutionary Population-based stochasticoptimization algorithm(Karaboga and Basturk, 2007; Karaboga and Basturk, 2008)

I Based on the foraging ability of beesI Originally design for optimization in Rd

I Adapted for different problems (scheduling, feature

selection, non-linear equation, clustering, classification)

(Karaboga, Gorkemli, et al., 2014)

Rousseaux and Ritschard – University of Geneva – 12/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Artificial bee colony algorithm

Recent Evolutionary Population-based stochasticoptimization algorithm(Karaboga and Basturk, 2007; Karaboga and Basturk, 2008)

I Based on the foraging ability of beesI Originally design for optimization in Rd

I Adapted for different problems (scheduling, feature

selection, non-linear equation, clustering, classification)

(Karaboga, Gorkemli, et al., 2014)

Rousseaux and Ritschard – University of Geneva – 12/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Example

Cost function: f (x) = x2

Search space: R

fitness(x) =1

1 + f (x) −2.5

0.0

2.5

5.0

7.5

10.0

−5.0 −2.5 0.0 2.5 5.0Search space

Util

ity

Figure : A (too much) easy functionto minimize

Rousseaux and Ritschard – University of Geneva – 13/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Methodology

Parameters

I Let S = (s1, c1; . . . ; sN , cN) N food sources

I N employed bees and N onlooker bees

I L limit to the number of try per source: ci < L

Initialization

I Each source is randomly initialized

I Counter ci are all equal to 0

−2.5

0.0

2.5

5.0

7.5

10.0

−5.0 −2.5 0.0 2.5 5.0Search space

Util

ity

Rousseaux and Ritschard – University of Geneva – 14/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Methodology

Employed bee phase

I Update solutions using Eq. 1

I Calculate fitness values of new solutions

I Keep a new solution when better, increment its

counter otherwise

vi = xi + ri(xi − xk), i 6= k (1)−2.5

0.0

2.5

5.0

7.5

10.0

−5.0 −2.5 0.0 2.5 5.0Search space

Util

ity

Rousseaux and Ritschard – University of Geneva – 15/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Methodology

Onlooker bee phase

I Calculate selection probability by using Eq. 2

I Select an employed bee and update its solution by

using Eq. 1

I Keep new solutions when better, increment counter

otherwise

pi =fitness(xi)∑N

k=1 fitness(xk)(2)

−2.5

0.0

2.5

5.0

7.5

10.0

−5.0 −2.5 0.0 2.5 5.0Search space

Util

ity

Rousseaux and Ritschard – University of Geneva – 16/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Methodology

Scout bee phase

I Select an abandonment counter having the highest

value

I If higher than L, generate a new source for the

employed bee by using Eq. 1−2.5

0.0

2.5

5.0

7.5

10.0

−5.0 −2.5 0.0 2.5 5.0Search space

Util

ity

Rousseaux and Ritschard – University of Geneva – 17/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Outline

IntroductionAssociation rule miningArtificial bee colony optimizationAR mining with ABCOWhat’s next

Rousseaux and Ritschard – University of Geneva – 18/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Data transformation

Let T be a database with p attributes

I Binary variable: do nothingI Categorical variable: one binary variable perclass

I Quantitative variable: discretization, thenbinarization

Result: each individual is a pattern in {0, 1}Np .

Rousseaux and Ritschard – University of Geneva – 19/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Coding of a rule

Let: A = 0010 and B = 0101, thenI A ∪ B =0111

Then a rule can be coded:I A⇒ B = A ∪ B + is.conclusion = 01110101

This ensure A ∩ B = ∅

Rousseaux and Ritschard – University of Geneva – 20/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Binary optimization with ABC

Initialization: For N sources

I We have to generate N patterns in {0, 1}2Np .I Si = Si1Si1S2Np

I We can use a Bernouilli process:

Sij realization of Xij ∼ Bernouilli(p0)

Rousseaux and Ritschard – University of Geneva – 21/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Binary optimization with ABCO

Very long sequences ⇒ Need a low complexity candidategeneration method

For each dimension (Kiran and Gunduz, 2013)

vi = xi +ri(xi−xk) becomes Vi = Xi⊕[Ri⊗(Xi⊕Xk)], i 6= k

with

I ⊗ = AND

I ⊕ = XOR

I Ri a realization of the logic NOT gate with 50% probability

Rousseaux and Ritschard – University of Geneva – 22/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Fitness functionThe fitness function has to be fast to compute

We use:I A support filter SFθ0 : 1 if support(A⇒ B) >= θ0, 0 otherwise

I lift(A⇒ B)

I conviction(A⇒ B) = 1/lift(A⇒ B̄)

fitness(A⇒ B) = (SFθ0 . lift . conviction)(A⇒ B)

Rousseaux and Ritschard – University of Geneva – 23/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Rule generation

Generation of a set of M rules

I Each run generate a single rule: the best solution of the run

I We iterate enough to get M different rules

I Ensemble scheme: repeat the generation ` times and keep the M most frequents

Rousseaux and Ritschard – University of Geneva – 24/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Outline

IntroductionAssociation rule miningArtificial bee colony optimizationAR mining with ABCOWhat’s next

Rousseaux and Ritschard – University of Geneva – 25/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

What’s next

I Better handling ofI TimeI Negative itemsI Quantitative covariates

I Development in processI Available in R, surely bundled in a packageI http://emmanuel.rousseaux.me/I Experimental assessmentI Visualization

Rousseaux and Ritschard – University of Geneva – 26/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Selected referencesAgrawal, Rakesh, Tomasz Imieliński, and Arun Swami (1993). “Mining association

rules between sets of items in large databases”. In: ACM SIGMOD Record.Vol. 22. 2. ACM, pp. 207–216.

Ghosh, Ashish and Bhabesh Nath (2004). “Multi-objective rule mining using geneticalgorithms”. In: Information Sciences 163.1, pp. 123–133.

Han, Jiawei, Jian Pei, and Yiwen Yin (2000). “Mining frequent patterns withoutcandidate generation”. In: ACM SIGMOD Record. Vol. 29. 2. ACM, pp. 1–12.

Harms, Sherri K and Jitender S Deogun (2004). “Sequential association rule miningwith time lags”. In: Journal of Intelligent Information Systems 22.1, pp. 7–22.

Karaboga, Dervis and Bahriye Basturk (2007). “A powerful and efficient algorithm fornumerical function optimization: artificial bee colony (ABC) algorithm”. In:Journal of global optimization 39.3, pp. 459–471.

— (2008). “On the performance of artificial bee colony (ABC) algorithm”. In:Applied soft computing 8.1, pp. 687–697.

Karaboga, Dervis, Beyza Gorkemli, et al. (2014). “A comprehensive survey: artificialbee colony (ABC) algorithm and applications”. In: Artificial Intelligence Review42.1, pp. 21–57.

Rousseaux and Ritschard – University of Geneva – 27/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Kiran, Mustafa Servet and Mesut Gunduz (2013). “XOR-based artificial bee colonyalgorithm for binary optimization”. In: Turkish Journal of Electrical Engineering &Computer Sciences 21.Sup. 2, pp. 2307–2328.

Liu, Bing, Wynne Hsu, and Yiming Ma (1999). “Mining association rules with multipleminimum supports”. In: Proceedings of the fifth ACM SIGKDD internationalconference on Knowledge discovery and data mining. ACM, pp. 337–341.

Salleb-Aouissi, Ansaf, Christel Vrain, and Cyril Nortet (2007). “QuantMiner: A GeneticAlgorithm for Mining Quantitative Association Rules.” In: IJCAI. Vol. 7.

Sarath, KNVD and Vadlamani Ravi (2013). “Association rule mining using binaryparticle swarm optimization”. In: Engineering Applications of Artificial Intelligence26.8, pp. 1832–1840.

Srikant, Ramakrishnan and Rakesh Agrawal (1996). Mining sequential patterns:Generalizations and performance improvements. Springer.

Swesi, Idheba Mohamad Ali O, Azuraliza Abu Bakar, and Anis Suhailis Abdul Kadir(2012). “Mining positive and Negative Association Rules from interesting frequentand infrequent itemsets”. In: Fuzzy Systems and Knowledge Discovery (FSKD),2012 9th International Conference on. IEEE, pp. 650–655.

Rousseaux and Ritschard – University of Geneva – 28/29

IntroductionAssociation rule mining

Artificial bee colony optimizationAR mining with ABCO

What’s nextReferences

Thank you for your attention

Questions/remarks?

Rousseaux and Ritschard – University of Geneva – 29/29