Artificial Immune Systems

Dr Uwe Aickelin

A Recommender System based on the Immune Network

• The Recommendation Problem

• The AIS Approach

• Algorithm Walkthrough

• Results and Discussion

The Recommendation Problem

“What movies would you predict/recommend?”

Prediction

What rating would I give this film?

Prediction quality can be assessed by absolute error

Recommendation

Give me a ‘top 10’ list of films I might like

Recommendation quality can be assessed by a ranking ‘discordance’ metric

The Biological Immune System

How do we protect the body against infection? (Antigens)

Innate vs Acquired

Cell Mediated vs Humoral

T Cell (CD-4, Helper): Binds to MHC-antigen complex. Secretes cytokines to help…

T Cell (CD-8, Killer): Kills cell (viruses)

B Cell: Secretes antibody which binds to antigen and recruits phagocytes (innate)

EachMovie database: user profiles (3M votes, 70k users)

User Profile: set of tuples {movie, rating}

Me: My user profile

Neighbour: User profile of someone else

Similarity metric: Correlation score between user profiles

Neighbourhood: Group of neighbours similar to me

Recommendations: generated from neighbourhood

The Recommendation Problem

The AIS Approach

EachMovie database: user profiles

User Profile: set of tuples {movie, rating}

Me (my user profile) → Antigen

Neighbour (user profile of someone else) → Antibody

Similarity metric (correlation score between user profiles) → Antibody – Antigen Binding (Stimulation) and Antibody – Antibody Binding (Suppression)

Neighbourhood (group of neighbours similar to me) → Group of antibodies similar to antigen and dissimilar to other antibodies

Recommendations: generated from neighbourhood

The AIS Algorithm

Start with empty AIS
Encode target user as an antigen Ag
WHILE (AIS not full) && (More users) DO
    Add next user as an antibody Ab
    IF (AIS at full size)
        Iterate AIS
    FI
OD
Generate recommendations from AIS

[Figure: the AIS, containing antigen Ag and antibodies Ab1 to Ab4]
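The control flow of the AIS algorithm can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: `Profile`, `build_ais` and `drop_weak_overlap` are hypothetical names, and the toy `iterate` step (which simply drops antibodies sharing fewer than two movies with the antigen) only stands in for the concentration update described in the walkthrough.

```python
from typing import Callable, Dict, List

Profile = Dict[str, float]  # movie id -> rating (hypothetical representation)


def build_ais(
    antigen: Profile,
    users: List[Profile],
    capacity: int,
    iterate: Callable[[Profile, List[Profile]], List[Profile]],
) -> List[Profile]:
    """Fill the AIS with antibodies, iterating whenever it reaches full size."""
    ais: List[Profile] = []
    remaining = list(users)
    # WHILE (AIS not full) && (more users)
    while len(ais) < capacity and remaining:
        ais.append(remaining.pop(0))     # add next user as an antibody
        if len(ais) == capacity:         # IF (AIS at full size)
            ais = iterate(antigen, ais)  # may eliminate some antibodies
    return ais


def drop_weak_overlap(antigen: Profile, ais: List[Profile]) -> List[Profile]:
    """Toy stand-in for 'Iterate AIS': keep antibodies sharing >= 2 movies with Ag."""
    return [ab for ab in ais if len(set(ab) & set(antigen)) >= 2]
```

Because an iteration can remove antibodies, the AIS may drop below full size again, in which case the loop carries on adding users until either the AIS stays full or the database is exhausted.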

Algorithm walkthrough: Encoding

Suppose we have 5 users and 4 movies

DATABASE

u1={(m1,v11),(m2,v12),(m3,v13)}

u2={(m1,v21),(m2,v22),(m3,v23),(m4,v24)}

u3={(m1,v31),(m2,v32),(m4,v34)}

u4={(m1,v41),(m4,v44)}

u5={(m1,v51),(m2,v52),(m3,v53),(m4,v54)}

• We do not have user votes for every film

• We want to predict the vote of user u4 on movie m3
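As an illustration, the five-user database could be held as a dictionary of dictionaries. The ratings below are made-up placeholders for v11, v12, … (the slides give no numbers), and `candidates` and `overlap` are hypothetical helper names.

```python
from typing import Dict, List

Profile = Dict[str, float]  # movie id -> vote

# Made-up ratings standing in for the symbolic votes v11, v12, ...
database: Dict[str, Profile] = {
    "u1": {"m1": 4, "m2": 2, "m3": 5},
    "u2": {"m1": 5, "m2": 1, "m3": 4, "m4": 5},
    "u3": {"m1": 2, "m2": 5, "m4": 1},
    "u4": {"m1": 5, "m4": 4},
    "u5": {"m1": 3, "m2": 3, "m3": 3, "m4": 2},
}


def candidates(db: Dict[str, Profile], target: str, movie: str) -> List[str]:
    """Users other than `target` who have voted on `movie`."""
    return sorted(u for u, votes in db.items() if u != target and movie in votes)


def overlap(a: Profile, b: Profile) -> List[str]:
    """Movies that both users have voted on."""
    return sorted(set(a) & set(b))
```

With this data only u1, u2 and u5 can contribute a vote on m3 for u4, and u1 shares just one movie (m1) with u4, which is the sparsity the first bullet refers to.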

Algorithm walkthrough (1)

Start with empty AIS

DATABASE: u1, u2, u3, u4, u5

AIS: (empty)

Encode user for whom to make predictions as an antigen Ag

DATABASE: u1, u2, u3, u4, u5

AIS: Ag (u4)

Algorithm walkthrough (2)

Add antibodies until AIS is full…

Add next user as an antibody Ab1

DATABASE: u1, u2, u3, u4, u5

AIS: Ag, Ab1 (u1)

Add users 2 and 3 …

DATABASE: u1, u2, u3, u4, u5

AIS: Ag, Ab1, Ab2 (u2), Ab3 (u3)

Algorithm walkthrough (3)

After some more iterations… the AIS has filled up:

Table of matching scores between Ab and Ag

MS14, MS24, MS34

Table of matching scores between antibodies

MS12 = CorrelCoef(Ab1, Ab2)

MS13 = CorrelCoef(Ab1, Ab3)

MS23 = CorrelCoef(Ab2, Ab3)

[Figure: the full AIS, containing Ag, Ab1, Ab2 and Ab3]

Algorithm walkthrough (4)

AIS is now at full size so begin iterations…

Calculate new CONCENTRATION for each Ab, considering interactions with Ag (STIMULATION) and other Ab (SUPPRESSION)

[Figure: the AIS before iteration (Ag, Ab1, Ab2, Ab3) and after iteration (Ag, with multiple copies of Ab1 and Ab2)]

Notice that antibody 3 has been eliminated.

Algorithm walkthrough (5)

If AIS not yet full and more users available, repeat.

Otherwise: GENERATE RECOMMENDATION from CONCENTRATION and ANTIGEN Correlation.

[Figure: the final AIS, containing Ag, Ab1 and Ab2, with Ab2 the most numerous]

The recommendation for user u4 on movie m3 will be based largely on the vote of user u2 on m3

Results

• Tested against EachMovie database (15000 users, 1628 films)

• Results compared to standard method (Pearson k-nearest neighbours)

• Prediction: Results of same quality

• Recommendation: Improved results, 4 out of 5 films correct versus 3 out of 5.

1. Stimulation and suppression affect neighbourhood size and number of users looked at

[Figure, four panels: neighbourhood size vs stimulation rate; number of users vs stimulation rate; neighbourhood size vs suppression rate (Rate 0.2, Rate 0.3, Rate 0.5); number of reviewers vs suppression rate (Rate 0.2, Rate 0.3, Rate 0.5)]

2. AIS matches Pearson for prediction

[Figure: mean absolute error vs stimulation rate; series AIS (av), SP (av), SP baseline]

3. AIS surpasses Pearson for Recommendation

[Figure, two panels: recommendation accuracy (Kendall's Tau) vs stimulation rate, series AIS (av), SP (av), SP Baseline; relative recommendation accuracy vs suppression rate (Rate 0.2, Rate 0.3, Rate 0.5)]
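Recommendation accuracy here is plotted as Kendall's Tau, a rank-agreement measure. The sketch below computes the plain tau-a form (concordant minus discordant pairs, divided by the number of pairs) and ignores ties; the exact variant and normalisation used for the plots are not stated on the slides, so treat this as an assumption. `kendall_tau` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict


def kendall_tau(actual: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Tau-a over the items present in both rankings; tied pairs count as zero."""
    items = [item for item in actual if item in predicted]
    concordant = discordant = 0
    for a, b in combinations(items, 2):
        sign = (actual[a] - actual[b]) * (predicted[a] - predicted[b])
        if sign > 0:
            concordant += 1   # both rankings order the pair the same way
        elif sign < 0:
            discordant += 1   # the rankings disagree on this pair
    pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0
```

A value of 1 means the recommended order agrees with the user's true order on every pair, and -1 means it is fully reversed.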

Evaluation

General purpose recommendation tool (e.g. Bookmarks)

Collaborative Filtering is a useful vehicle for examination of AIS dynamics:
- Idiotypic effect for more varied population
- Potential for distribution
- Smaller neighbourhoods (vs computational cost)

Wider applicability (e.g. online community formation)

Idiotypic effects alter nature of community

How important is diversity?

Are there other network effects that can be used? (hubs, routers etc)

Distribution: the snowball effect

What about interacting communities?

Application areas: ad-hoc community formation, knowledge management, P2P routing…

Speculation: online community formation

AIS for Security

Change detection (Checksums)

‘Self’: files, network traffic, system calls

Antibody creation: positive vs negative selection

Collaboration between different populations/sites

Representation: binary string or symbolic (rules)

Other IS features:
- activation thresholds (vs false positives)
- co-stimulation (vs false positives)
- memory detectors (secondary response)
- MHC masks to cover ‘holes’ (similar to self)
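To make negative selection over binary strings concrete, here is a minimal sketch. It assumes a simple matching rule (a detector matches a sample when at least `threshold` bit positions agree); published systems typically use other rules, and all function names here are hypothetical.

```python
import random
from typing import Iterable, List


def matches(detector: str, sample: str, threshold: int) -> bool:
    """Assumed rule: match when at least `threshold` bit positions agree."""
    return sum(d == s for d, s in zip(detector, sample)) >= threshold


def generate_detectors(
    self_set: Iterable[str],
    n_detectors: int,
    length: int,
    threshold: int,
    rng: random.Random,
    max_tries: int = 10_000,
) -> List[str]:
    """Negative selection: keep random candidates that match no 'self' string."""
    self_list = list(self_set)
    detectors: List[str] = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, threshold) for s in self_list):
            detectors.append(candidate)
    return detectors


def is_anomalous(sample: str, detectors: List[str], threshold: int) -> bool:
    """A sample is flagged when any surviving detector matches it."""
    return any(matches(d, sample, threshold) for d in detectors)
```

By construction no detector fires on the training ‘self’ strings; whether a given non-self string is caught depends on how well the random detectors cover the space.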

Example: Hofmeyr & Forrest 2000

Evaluation

Applied to network intrusion, virus detection…

Good results on test systems

BUT…

Negative Selection doesn’t scale

Inefficient to map entire non self universe

Changes over time

Appropriate representation of self

Appropriate matching

Primary response requires infection?

AIS for Security

Traditional Self - Non Self Distinction

• An immune response is triggered when the body encounters something foreign.

• The difference between self and non-self is learnt early in life.

• E.g. eliminate those T- and B-cells that react to self.

• Problems:

• No reaction to foreign bacteria in gut

• No reaction to food we eat

• The human body changes over its life

• Auto-immune diseases

• Tumours / Transplants

The Danger Theory

• Need for discrimination: What should be responded to?

• Respond to Danger not to “foreignness”.

• No need to attack everything that is foreign.

• Danger is measured by damage / distress signals.

Advantages:

• Can take care of non-self but harmless

• Can take care of self but harmful

Danger Model Conclusions

• Self-Nonself discrimination still useful.

• Nonself does not cause immune response.

• Danger Signals trigger immune response.

• A question of semantics?

• Can this model help us build an AIS for security applications?

• What would be ‘danger signals’?

Discussion

Uwe Aickelin: http://www.aickelin.com/

Steve Cayzer: http://www-uk.hpl.hp.co.uk/people/steve_cayzer/

Additional Slides

AIS Models - Idiotypic

Farmer et al 1986

[Figure: two antibodies and an antigen, with epitope and paratope labelled]

• Paratope/Epitopes

Lock and Key

Interchangeable?

• Behaviour

Matching

Idiotypic (Memory, auto-immune)

AIS Models - Idiotypic

Jerne’s Big Idea (1974)

Idiotype: specificity of antibody (epitopes to which it will bind)

Idiotope: An idiotypic epitope

Evidence: Antibodies produced against antibodies of same species (cf individual)

[Figure: Jerne's idiotypic network. Labels: Antigen; P1 I1; P2 I2; P3 I3; Idiotypic Set; Anti-Idiotypic Set; Internal Image of Antigen; + and - interactions]

In Words…

The idiotypic network hypothesis (Jerne 1974) builds on the recognition that antibodies can match other antibodies as well as antigens. A group of antibodies which match an antigen may be matched by other antibodies, which may in turn be matched by yet other antibodies. This stimulatory effect will set up activation chains or loops. Matched antibodies are suppressed, and this effect will encourage diversity.

In Formulae…

AIS Models - Idiotypic

Rate of change of antibody concentration, term by term:

dx_i/dt = c [ (antibodies recognised) - (I am recognised) + (antigens recognised) ] - (death rate)

$$\frac{dx_i}{dt} = c\left[\sum_{j=1}^{N} m_{ji}\,x_i x_j \;-\; k_1 \sum_{j=1}^{N} m_{ij}\,x_i x_j \;+\; \sum_{j=1}^{n} m_{ji}\,x_i y_j\right] - k_2\,x_i$$

$$m_{ij} = \sum_{k} G\left(\sum_{n} e_i(n+k) \wedge p_j(n) - s + 1\right)$$

• For N antibodies, n antigens.

• x_i is the concentration of antibody i; y_j is the concentration of antigen j

• k_1 weights the suppression ("I am recognised") term and k_2 is the death rate

• p and e stand for ‘paratope’ and ‘epitope’; in the matching formula n indexes positions along them and ∧ denotes the complementary match between positions

• s is the matching threshold.

• G is a rectifier function which outputs 0 for all negative input.

• k is the allowable overlap
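The concentration dynamics (antibodies recognised, minus being recognised, plus antigens recognised, minus death) can be illustrated with one explicit Euler step. This is a sketch under assumptions: the match matrices are passed in directly rather than computed from paratopes and epitopes, `dt` and the clamp at zero are choices made here, and `concentration_step` is a hypothetical name.

```python
from typing import List, Sequence


def concentration_step(
    x: Sequence[float],              # antibody concentrations x_i (N values)
    y: Sequence[float],              # antigen concentrations y_j (n values)
    m_ab: Sequence[Sequence[float]], # m_ab[j][i] = m_ji between antibodies
    m_ag: Sequence[Sequence[float]], # m_ag[j][i] = match of antigen j to antibody i
    c: float,
    k1: float,
    k2: float,
    dt: float,
) -> List[float]:
    """Return updated concentrations; all updates use the old values (synchronous)."""
    n_ab = len(x)
    new_x: List[float] = []
    for i in range(n_ab):
        recognised = sum(m_ab[j][i] * x[i] * x[j] for j in range(n_ab))
        am_recognised = sum(m_ab[i][j] * x[i] * x[j] for j in range(n_ab))
        antigens = sum(m_ag[j][i] * x[i] * y[j] for j in range(len(y)))
        dx = c * (recognised - k1 * am_recognised + antigens) - k2 * x[i]
        new_x.append(max(0.0, x[i] + dt * dx))  # assumed floor at zero
    return new_x
```

For example, with no antibody-antibody matching, an antibody that matches the antigen grows while one that does not decays at the death rate.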

Recommendation Approaches

• Simple user comparisons (Pearson, cosine, k-Nearest Neighbour)

• Problems: Sparsity, curse of dimensionality

• Memory vs Model based approaches

• Transformative and Transitive functions

• Default votes, Content based, Learning algorithms

• Challenge of distribution (vs centralization)

System Description: Encoding

Users are represented as a set of tuples which represent their votes:

$$User = \{\{id_1, score_1\}, \{id_2, score_2\}, \ldots, \{id_n, score_n\}\}$$

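System Description: Matching

We use the Pearson correlation measure

$$r = \frac{\sum_{i=1}^{n}(u_i-\bar{u})(v_i-\bar{v})}{\sqrt{\sum_{i=1}^{n}(u_i-\bar{u})^2\,\sum_{i=1}^{n}(v_i-\bar{v})^2}}$$

where u_i and v_i are the votes of users u and v on item i, and n is the number of overlapping items.

The measure is amended as follows

$$r = \begin{cases} \text{NO\_OVERLAP\_DEFAULT} & \text{if } n = 0 \\ \text{ZERO\_VARIANCE\_DEFAULT} & \text{if } \sum_{i=1}^{n}(u_i-\bar{u})^2\,\sum_{i=1}^{n}(v_i-\bar{v})^2 = 0 \\ \dfrac{n}{P}\,r & \text{if } n < P, \text{ where } P = \text{overlap penalty} \end{cases}$$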
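A minimal Python sketch of the amended Pearson measure follows. The function name `match_score` and its keyword arguments are hypothetical; the defaults mirror the values in the matching-function parameter table (minimum expected overlap 5, both defaults 0.0), and the means are taken over all of each user's votes, which corresponds to "Use mean of overlap" being False.

```python
from math import sqrt
from typing import Dict

Profile = Dict[str, float]  # item id -> vote


def match_score(
    u: Profile,
    v: Profile,
    overlap_penalty: int = 5,
    no_overlap_default: float = 0.0,
    zero_variance_default: float = 0.0,
) -> float:
    """Pearson correlation over overlapping items, with the three amendments."""
    common = [item for item in u if item in v]
    n = len(common)
    if n == 0:
        return no_overlap_default
    mean_u = sum(u.values()) / len(u)   # mean over all votes of the user
    mean_v = sum(v.values()) / len(v)
    numerator = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    denominator = sqrt(
        sum((u[i] - mean_u) ** 2 for i in common)
        * sum((v[i] - mean_v) ** 2 for i in common)
    )
    if denominator == 0:
        return zero_variance_default
    r = numerator / denominator
    if n < overlap_penalty:             # penalise small overlaps
        r *= n / overlap_penalty
    return r
```

Two users who agree perfectly on only three items therefore score 0.6 rather than 1.0, which stops tiny overlaps from dominating the neighbourhood.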

Parameters: Matching Function

Parameter | Value | Comments
Minimum expected overlap | 5 | Minimum expected overlap between 2 users (used to calculate penalty)
Zero Variance Default | 0.0 | Correlation score when users have overlaps with zero variance
Use minimum expected overlap | True | Use minimum overlap (i.e. penalise users with less than expected overlap)
No Overlap Default | 0.0 | Correlation score if 2 users have no overlapping items
Use mean of overlap | False | Use mean of overlapping votes only for correlation (otherwise use mean over all votes of user)

Parameters: AIS

Parameter | Value | Comments
Suppression Rate | 0.001 | Suppression constant (weighting on antibody-antibody suppression term)
Rate Constant | 0.25 for single, 1 for idiotypic | Rate constant, applied to each match calculation. Between 0 and 1
Stimulation Rate | 0 | Stimulation constant (weighting on antibody-antibody stimulation term)
Death Rate | 0.1 | Death rate of antibodies (i.e. % that dies off per unit time)
Maximum Concentration | 100.0 | Maximum concentration of antibody or antigen in this AIS
Minimum Concentration | 0 | Minimum concentration of antibody or antigen in this AIS
Initial Concentration | 1.0 | Initial concentration of antibody or antigen in this AIS
Use Concentration | True | Should we use concentration to weight stimulation and suppression?
Use Absolute | True | Should we use absolute match score for weighting (hence negative correlations are treated as valuable)
Synchronous | True | Should we apply concentration changes synchronously (in batch)

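System Description: Prediction

We predict a rating by using a weighted average over the neighbourhood of a user:

$$p_i = \bar{u} + \frac{\sum_{v \in N} w_{uv}\,(v_i - \bar{v})}{\sum_{v \in N} w_{uv}}$$

$$w_{uv} = r \quad \text{(note: relative, not absolute)}$$

$$v_i = \text{DEFAULT\_VOTE} \quad \text{if } v \text{ has not voted on } i$$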
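The weighted-average prediction can be sketched as below. The sketch assumes each neighbour arrives as a (weight, profile) pair with the weight already computed from the correlation score, normalises by the plain sum of weights, and falls back to the target user's mean when that sum is zero; `predict` is a hypothetical name and the default vote of 2.0 is taken from the prediction parameter table.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # item id -> vote


def predict(
    target: Profile,
    item: str,
    neighbours: List[Tuple[float, Profile]],
    default_vote: float = 2.0,
) -> float:
    """Target's mean vote plus the weighted mean deviation of the neighbours."""
    mean_u = sum(target.values()) / len(target)
    numerator = 0.0
    weight_sum = 0.0
    for weight, profile in neighbours:
        mean_v = sum(profile.values()) / len(profile)
        vote = profile.get(item, default_vote)  # default vote if not voted on
        numerator += weight * (vote - mean_v)
        weight_sum += weight
    if weight_sum == 0:
        return mean_u  # assumed fallback: no usable neighbourhood
    return mean_u + numerator / weight_sum
```

Working with deviations from each neighbour's own mean compensates for users who rate systematically high or low.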

Parameters: Prediction

Parameter | Value | Comments
Cluster Size | 50-100 | Cluster size (AIS size). Should be >= neighbourhood size
Build From Scratch | BOTH | Do we build AIS from scratch for each prediction or start with one massive AIS?
Use Default Votes | True | Should we use a default vote for prediction purposes?
Neighbourhood size | 30-50 | Neighbourhood size (k-NN parameter). Should be <= cluster size.
Use Idiotypic AIS | BOTH | Use idiotypic immune system (with antibody-antibody interactions)
Default Vote | 2.0 | Default vote (if used).
Use Correlation | True | Use correlation scores to weight prediction
Use category | False | Use category information to help make prediction
Max Iterations | 5 | Max iterations with no change in AIS
Use Concentration | False | Weight prediction by antibody concentration (as well as correlation)

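System Description: Evaluation

• Mean Absolute Error

$$MAE = \frac{\sum_{n} |actual - predicted|}{n}$$

• Variance

$$Variance = \frac{\sum_{n} (actual - predicted)^2}{n} - \left(\frac{\sum_{n} |actual - predicted|}{n}\right)^2$$

• Precision vs Recall

$$P = \frac{|R \cap U|}{|R|} \qquad C = \frac{|R \cap U|}{|U|}$$

where R = set of recommendations and U = items that user liked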
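The evaluation measures are straightforward to compute. A minimal sketch, with hypothetical function names, of mean absolute error and of precision and recall for a recommendation set R against the set U of items the user liked:

```python
from typing import Sequence, Set, Tuple


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall(recommended: Set[str], liked: Set[str]) -> Tuple[float, float]:
    """Precision = |R ∩ U| / |R|, recall = |R ∩ U| / |U| (0.0 for an empty set)."""
    hits = len(recommended & liked)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    return precision, recall
```

For instance, recommending four films of which two are among the three the user liked gives a precision of 0.5 and a recall of 2/3.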