Immune System Metaphors Applied to Intrusion Detection and Related Problems

Immune System Metaphors Applied to Intrusion Detection

and Related Problems

by

Ian Nunn, SCS, Carleton University

[email protected]

Overview of Presentation

• Review of immune system properties of most interest

• Algorithm design and the representation of application domains

• Examples of two recognition algorithms• Overview of application areas• Focus on intrusion detection systems (IDS)• Advantages of IS models and future research• The IS model as a swarm system

Immune System Characteristics of Interest

• The human immune system (IS) is a system of detectors (principally B and T cells) that:– After initial negative selection (tolerization), does not recognize

elements of the body (self)– Is adaptable in that it can recognize over time, any foreign element

(non-self) including those never before encountered– Remembers previous foreign element encounters– Dynamically regenerates its elements– Regulates the population size and diversity of its elements– Is robust to input signal noise (recognition region) and detector loss– Is distributed in nature with no central or hierarchical control– Is error tolerant in that self recognition does not halt the system– Is self-protecting since it is part of self

Representation of Self/Non-Self

• IS elements involved are cellular proteins and their peptide sequences

• Recognition is based on matching of structural regions called epitopes on antigens and paratopes on antibodies

• Shape space model: a parameterized representation (genotype) of the conformational form of self/non-self elements (phenotype)

IS Application Algorithm Design• Requires a deep understanding of the problem domain• Self/non-self discrimination the fundamental IS principle• Steps in designing an IS algorithm:

– Identification of features allowing correct and complete self/non-self discrimination*

– Representation or encoding of features, particularly of continuous real-valued parameters*. Ab and Ag feature strings of same length facilitate algorithm performance analysis

– Determination of a matching or fitness function. Important for evolution of Ab populations (affinity maturation)

– Selection of IS principles to apply, e.g. negative selection, costimulation, affinity maturation, etc.

* This is hard stuff and an important step in applying any modeling technique whether genetic algorithms or swarm simulations (recall for army ants the problem of deciding what parameters to assign to the ants and to the environment and what values to allow).

Approach to Feature Selection and Representation

• Antibodies and antigens represented by strings of features:– The set of actual values observed such as sensor readings,

voltages, ASCII text is called the application’s phenotype– The coded representation is called the application’s genotype

• A feature is encoded by symbols from a finite alphabet• Some application feature domains:

– Binary variables: digital signals in computer systems– Discrete real variables: ASCII character text– Continuous real variables: real world sensors

• Continuous domains must be mapped onto discrete domains since we work with finite alphabets to ensure finite Ab/Ag population spaces

Phenotype Representation: Change Detection Problem Domains

• OS (UNIX) processes: sequences of top level system calls• Program execution: alphabet symbols represent op codes• File system: reduction to ASCII or binary strings• User behavior and interface use: keystrokes, mouse clicks• Time series data representation of a physical (plant)

processes: x/y position of a milling machine tool• Memory accesses: memory address calls• Local network traffic: TCP/IP packets: addresses, ports• Network traffic through routers and gateways: TCP/IP

packets, addresses, volume

IS Phenotype Encoding and Matching Using a Binary Model1

• Genotype = Phenotype : 32 bit string on a binary alphabet

• Many matching (fitness) functions possible, e.g. for li a contiguous substring of l 1’s in the complementary match (Hamming distance)

example above in the 26),(0 i

ilAbAgM

Example: Use of a Binary Model with a GA for Clonal Selection1

• Start with randomly generated Ag and possibly incomplete Ab populations

• For each Ab in turn, compute its average match (fitness) with a random fixed-size Ag subpopulation

• Use a standard GA with mutation but no crossover to evolve successively better generations of antibodies

• Niches observed to develop in coverage space for genetic commonalities (bacterial polysaccaride coating) if the initial populations have a bias

• Self recognition minimized (without negative selection) by selecting for more Ag specific instead of more general antibodies – less likely to match self

Establishing Antibody FitnessRandom sub-

population

8)( 20 AbM

GA Evolution of Antibodies1

________________1011010111110111

Use of a Negative Selection Algorithm for Clonal Selection5,2

• Want explicit self-filtering (tolerization)• Algorithm:

1. Generate the set S of self (sub)strings

2. Generate a set R0 of random strings

3. Match each string from R0 against S :– Match (non-complementary) on at least r contiguous locations:

reject– No match: add string to detector set R

• How to generate detectors efficiently an issue• Match detector set against target strings to detect intruder• Strings can be on any alphabet

Negative Selection Algorithm2

Self string to be protected: 1011 0111 0011 0000, length of contiguous match substring r = 2

Match Ab1: 10xx

Match Ab4: xx00

The Problem of Holes6

• For a particular choice of matching rule and Ab repertoire, some non-self strings may not be found causing a hole in the coverage space

• Let s1 and s2 be two antibodies matching over r-1 contiguous bits and h1 and h2 be two antigens

• A detector that matches any r contiguous bits in h1 will also match either s1 or s2 for the same feature string. The same for h2. So h1 and h2 are undetectable.

Major Application Categories of Immune System Theory

• Machine learning and pattern recognition: limited but promising work done to date

• Associative memories: limited work done to date• Elimination of identified elements:

– IS model: use the B cell and Tk cell “kill” disable viruses– Use a phagocyte analogue for cleanup and garbage collection– IBM virus lab and Forrest’s group at UNM have looked at this

• Recovery, repair and augmentation of identified elements : – IS model: use the B cell and Tk cell analogue to deliver a positive

payload to an agent– Very little work done to date

Application Areas (cont.)

• Detection problems – where most of the work has occurred: – Fault: failure of a self element (industrial plant

systems)– Change: any change in self (tumors)– Anomaly: unusual presentation of a self element – Virus: presence of a non-self element– Intrusion: attempt to gain access by non-self element– Many of the classical issues of computer and network

security involve some element of detection or self/non-self discrimination

The Intrusion Detection Problem

• Two classical types of intrusion detection systems (IDS):– Host-based: domain is a single machine possibly on a

network– Network-based: domain is a network of hosts

• Two classes of problem:– Anomaly detection: deviations from normal local

resource use and network traffic– Misuse detection: usage identified with known system

vulnerabilities and security policies

Essential Requirements of a Network-based IDS

• Robustness to host failures and noisy signals (anomalous behavior)

• Easy (self-)configurability of hosts• Easy extendibility to new hosts• Scalability: extendible to large networks without

degradation of performance• Adaptability: dynamically able to recognize new

anomalies• Efficiency: simple and low overhead operation• Global analysis: able to correlate local events to form

global patterns

Network Representation

• Commonly represent problem as the connection events (not message content) between computers

• Kim and Bentley4: – Phenotype: 35 real-valued fields in four categories - connection

identifier, port vulnerabilities, TCP handshaking, traffic intensity– Genotype: 35 genes. A detector gene has three “nucleotides”

(cluster number in (0, 9), min offset, max offset). An antibody or antigen has a single real value.

– Cluster and offset tables established for each host at start– A matching function maps an Ag or Ab value to a cluster and

takes the distance to the nearest cluster bound as the measure of similarity

– Use positive detection events to evolve the offsets for clusters

Kim and Bentley Model4

New lower interval bound for cluster 2

New upper interval bound for cluster 2

Network Representation (cont.)

• Hofmeyr and Forrest3: – Phenotype: 3 integer fields (source IP address,

destination IP address, service or port number)

– Genotype: for a detector, 49 bit binary string + state

Algorithmic Refinements3

• Detectors may have a lifetime at the end of which they are replaced if they have not matched – maintain diversity

• Activation threshold and time decay on activation level to deter limited autoimmune reaction to rare self strings

• Local activation causes a message to be sent to other hosts decreasing their activation levels (cytokine costimulation)

• Matching rule may result in holes in coverage. A randomly assigned permutation mask to control packet presentation helps avoid this (MHC molecule host diversity) contributing to population diversity.

• Each host has a unique detector set contributing to diversity and self-protection across a population of hosts

The Hofmeyr and Forrest Model3

Host-based refinement fields

Detector state

Antigen pheotype

Problem Posed by Computer Applications

• The repertoire of human self proteins is fixed over a lifetime

• In networks, valid hosts are added and deleted without notice so what “self” is constantly changes

• Among a fixed set of hosts, valid usage patterns may change without notice

• One solution: costimulation by a trusted (human) authority both at start and subsequent operation

• Much work needs to be done

Advantages of IS Models

• Adaptability through the ability to recognize foreign patterns never before encountered

• Distributed detection contributes to:– Diversity (shape space coverage)– Robustness (failure of individual hosts)– Scalability and extendibility

• Quick response to new variants of old attacks• Ability to reproduce detectors of increasing fitness

while self-regulating the overall population

IS as a Swarm System

• The IS model has a number of characteristics in common with swarm systems:– Large populations of independent agents of characterizable classes– Each agent has at most a very few characteristic simple behaviors:

• Bind with another appropriate agent and activate (B and T cells)• Kill something (killer T cell)• Clone myself (B cell)• Secrete a signaling chemical or an antibody (T and B cells)• Live for a long time (memory B and T cells)

– Simple interactions with the environment:• Special things that happen in lymphoid organs• Secreting signal chemicals which alter environmental properties (cytokines and

inflammation)

– Self-organizing as an emergent property– No centralized control over the system

Areas for Additional Research

• Matching rules with good computational properties, perhaps application specific ones

• Self/non-self representation and encoding• Algorithms for generating detector sets• Other selection algorithms• Incorporation of additional IS characteristics• Detector set populations: evolution, dynamics and

emergent properties at the species level

References

1. Forrest, Smith, Javornik and Perelson. Using Genetic Algorithms to Explore Pattern Recognition in the Immune System. Evolutionary Computation, 1(3):191-211, 1993.

2. Forrest, Allen, Perelson and Cherukuri. Self-Nonself Discrimination in a Computer. Proceeding of the 1994 IEEE Symposium on Research in Security and Privacy, Los Alamitos CA, 1994.

3. Hofmeyr and Forrest. Immunity by Design: An Artificial Immune System. In Proceedings of 1999 GECCO Conference, 1999.

4. Kim and Bentley. Negative selection and niching by an artificial immune system for network intrusion detection. In Late Breaking Papers at the 1999 Genetic and Evolutionary Computation Conference, Orlando, Florida, 1999.

References (cont.)

5. Forrest, Allen, Perelson and Cherukuri. A Change-Detection Algorithm Inspired by the Immune System. Submitted to IEEE Transactions on Software Engineering, 1995.

6. D'haeseleer, Forrest and Helman. An Immunological Approach to Change Detection: Algorithms, Analysis, and Implications. Proceeding of the 1994 IEEE Symposium on Research in Security and Privacy, 1996.

Documents

Immune System Metaphors Applied to Intrusion Detection and Related Problems