Bayes_network.ppt



    BAYESIAN NETWORK


    References

    [1] Jiawei Han: Data Mining: Concepts and Techniques, ISBN 1-53860-489-8, Morgan Kaufmann Publishers.
    [2] Stuart Russell, Peter Norvig: Artificial Intelligence: A Modern Approach, Pearson Education.
    [3] Kandasamy, Thilagavati, Gunavati: Probability, Statistics and Queueing Theory, Sultan Chand Publishers.
    [4] D. Heckerman: A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, ed. M. I. Jordan, The MIT Press, 1998.
    [5] http://en.wikipedia.org/wiki/Bayesian_probability
    [6] http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
    [7] http://www.murrayc.com/learning/AI/bbn.shtml
    [8] http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
    [9] http://en.wikipedia.org/wiki/Bayesian_belief_network


    CONTENTS

    HISTORY
    CONDITIONAL PROBABILITY
    BAYES THEOREM
    NAÏVE BAYES CLASSIFIER
    BELIEF NETWORK
    APPLICATION OF BAYESIAN NETWORK
    PAPER ON CYBER CRIME DETECTION


    HISTORY

    Bayesian probability was named after Reverend Thomas Bayes (1702-1761). He proved a special case of what is currently known as Bayes' Theorem. The term "Bayesian" came into use around the 1950s. Pierre-Simon, Marquis de Laplace (1749-1827) independently proved a generalized version of Bayes' Theorem.

    http://en.wikipedia.org/wiki/Bayesian_probability


    HISTORY (Cont.)

    1950s: New knowledge in Artificial Intelligence
    1958: Genetic Algorithms by Friedberg (Holland and Goldberg ~1985)
    1965: Fuzzy Logic by Zadeh at UC Berkeley
    1970: Bayesian Belief Networks at Stanford University (Judea Pearl 1988)

    The ideas proposed above were not fully developed until later. BBNs became popular in the 1990s.

    http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf


    HISTORY (Cont.)

    Current uses of Bayesian networks:
    Microsoft's printer troubleshooter
    Diagnosing diseases (Mycin)
    Predicting oil and stock prices
    Controlling the space shuttle
    Risk analysis: schedule and cost overruns


    CONDITIONAL PROBABILITY

    Probability: How likely is it that an event will happen?
    Sample space S
    An element of S is an elementary event.
    An event A is a subset of S.
    P(A) is the probability of event A; P(S) = 1.

    For events A and B, P(A|B) is the probability that event A occurs given that event B has already occurred.

    Example: There are 2 baskets. B1 has 2 red balls and 5 blue balls. B2 has 4 red balls and 3 blue balls. Find the probability of picking a red ball from basket 1.


    CONDITIONAL PROBABILITY

    The question above asks for P(red ball | basket 1), i.e. the probability of a red ball within the sample space of basket 1 only. So the answer is 2/7.

    The equations used to solve it:
    P(A|B) = P(A,B) / P(B)        [Product Rule]
    P(A,B) = P(A) x P(B)          [if A and B are independent]

    How do you solve P(basket 2 | red ball)?
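    A quick worked answer, as a minimal Python sketch. It assumes the basket is chosen uniformly at random (a prior of 1/2 each), which the slide does not state explicitly:

    ```python
    # Bayes' rule for P(basket 2 | red ball), assuming each basket is
    # equally likely to be chosen (prior 1/2 each - an assumption).
    p_basket = {"B1": 0.5, "B2": 0.5}            # prior P(basket)
    p_red_given = {"B1": 2 / 7, "B2": 4 / 7}     # likelihood P(red | basket)

    p_red = sum(p_basket[b] * p_red_given[b] for b in p_basket)   # total probability of red
    p_b2_given_red = p_red_given["B2"] * p_basket["B2"] / p_red   # posterior
    print(p_b2_given_red)   # 0.666..., i.e. 2/3
    ```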


    BAYESIAN THEOREM

    A special case of Bayes' Theorem:

    P(A,B) = P(B) x P(A|B)
    P(B,A) = P(A) x P(B|A)

    Since P(A,B) = P(B,A):
    P(B) x P(A|B) = P(A) x P(B|A)

    => P(A|B) = [P(A) x P(B|A)] / P(B)
              = [P(A) x P(B|A)] / [P(A) x P(B|A) + P(-A) x P(B|-A)]


    BAYESIAN THEOREM

    Example 2: A medical cancer diagnosis problem.

    There are 2 possible outcomes of a diagnosis: +ve, -ve. We know 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.

    If a patient's test returns +ve, should we diagnose the patient as having cancer?


    BAYESIAN THEOREM

    P(cancer) = 0.008            P(-cancer) = 0.992
    P(+ve|cancer) = 0.98        P(-ve|cancer) = 0.02
    P(+ve|-cancer) = 0.03       P(-ve|-cancer) = 0.97

    Using Bayes' formula:
    P(cancer|+ve) = P(+ve|cancer) x P(cancer) / P(+ve) = 0.98 x 0.008 / P(+ve) = 0.0078 / P(+ve)
    P(-cancer|+ve) = P(+ve|-cancer) x P(-cancer) / P(+ve) = 0.03 x 0.992 / P(+ve) = 0.0298 / P(+ve)

    Since 0.0298 > 0.0078, the patient most likely does not have cancer.
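    A minimal sketch of the same calculation, normalizing by P(+ve) so the two posteriors sum to 1 (the numbers are those on the slide):

    ```python
    # Cancer-diagnosis example: normalize the two products by P(+ve).
    p_cancer, p_no_cancer = 0.008, 0.992
    p_pos_given_cancer, p_pos_given_no_cancer = 0.98, 0.03

    joint_cancer = p_pos_given_cancer * p_cancer            # 0.00784
    joint_no_cancer = p_pos_given_no_cancer * p_no_cancer   # 0.02976
    p_pos = joint_cancer + joint_no_cancer                  # P(+ve) = 0.0376

    print(joint_cancer / p_pos)      # P(cancer | +ve)  ~ 0.21
    print(joint_no_cancer / p_pos)   # P(-cancer | +ve) ~ 0.79
    ```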


    BAYESIAN THEOREM

    General Bayes' Theorem:

    Given E1, E2, ..., En are mutually disjoint events with P(Ei) ≠ 0 (i = 1, 2, ..., n),

    P(Ei|A) = [P(Ei) x P(A|Ei)] / Σi [P(Ei) x P(A|Ei)],   i = 1, 2, ..., n


    BAYESIAN THEOREM

    Example: There are 3 boxes. B1 has 2 white, 3 black and 4 red balls. B2 has 3 white, 2 black and 2 red balls. B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn; one is white and the other is red. What is the probability that they came from the first box?


    BAYESIAN THEOREM

    Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively. Let A be the event that the 2 balls selected are white and red.

    P(E1) = P(E2) = P(E3) = 1/3
    P(A|E1) = [2C1 x 4C1] / 9C2 = 2/9
    P(A|E2) = [3C1 x 2C1] / 7C2 = 2/7
    P(A|E3) = [4C1 x 3C1] / 8C2 = 3/7


    BAYESIAN THEOREM

    P(E1|A) = [P(E1) x P(A|E1)] / Σi [P(Ei) x P(A|Ei)] = 0.23727

    P(E2|A) = 0.30509

    P(E3|A) = 1 - (0.23727 + 0.30509) = 0.45764
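    The three posteriors can be checked with a short sketch; math.comb computes the nCr terms used above:

    ```python
    # Verifying the box example with the general Bayes' theorem.
    from math import comb

    priors = [1 / 3, 1 / 3, 1 / 3]                     # P(E1), P(E2), P(E3)
    likelihoods = [                                    # P(A | Ei): one white and one red
        comb(2, 1) * comb(4, 1) / comb(9, 2),          # box 1: 2 white, 4 red out of 9
        comb(3, 1) * comb(2, 1) / comb(7, 2),          # box 2: 3 white, 2 red out of 7
        comb(4, 1) * comb(3, 1) / comb(8, 2),          # box 3: 4 white, 3 red out of 8
    ]

    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
    print(posteriors)   # ~[0.2373, 0.3051, 0.4576]
    ```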


    BAYESIAN CLASSIFICATION

    Why use Bayesian classification?

    Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.

    Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.


    BAYESIAN CLASSIFICATION

    Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.

    Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.


    NAÏVE BAYES CLASSIFIER

    A simplifying assumption: attributes are conditionally independent given the class.

    This greatly reduces the computation cost; only the class distribution has to be counted.


    NAÏVE BAYES CLASSIFIER

    The probabilistic model of the NBC is to find the probability of a certain class given multiple disjoint (assumed) events.

    The naïve Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V. A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values <a1, a2, ..., an>. The learner is asked to predict the target value, or classification, for this new instance.


    NAÏVE BAYES CLASSIFIER

    Abstractly, the probability model for a classifier is a conditional model P(C|F1, F2, ..., Fn) over a dependent class variable C with a small number of outcomes, or classes, conditional on several feature variables F1, ..., Fn.

    Naïve Bayes formula:

    P(C|F1, F2, ..., Fn) = [P(C) x P(F1|C) x P(F2|C) x ... x P(Fn|C)] / P(F1, F2, ..., Fn)

    and the predicted class is the value c that maximizes the numerator.

    Since P(F1, F2, ..., Fn) is common to all classes, we do not need to evaluate the denominator for comparisons.


    NAÏVE BAYES CLASSIFIER

    Tennis-Example


    NAÏVE BAYES CLASSIFIER

    Problem:
    Use the training data from above to classify the following instances:
    a)
    b)


    NAÏVE BAYES CLASSIFIER

    Answer to (a):
    P(PlayTennis=yes) = 9/14 = 0.64
    P(PlayTennis=no) = 5/14 = 0.36
    P(Outlook=sunny|PlayTennis=yes) = 2/9 = 0.22
    P(Outlook=sunny|PlayTennis=no) = 3/5 = 0.60
    P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
    P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
    P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
    P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
    P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
    P(Wind=strong|PlayTennis=no) = 3/5 = 0.60
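    The final comparison for instance (a) did not survive extraction; a minimal sketch of that last step, assuming the instance is (Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong), which is what the listed conditional probabilities suggest:

    ```python
    # Finishing answer (a): multiply the class prior by the four conditionals per class.
    p_yes = (9 / 14) * (2 / 9) * (3 / 9) * (3 / 9) * (3 / 9)   # ~0.0053
    p_no = (5 / 14) * (3 / 5) * (1 / 5) * (4 / 5) * (3 / 5)    # ~0.0206
    print("PlayTennis =", "yes" if p_yes > p_no else "no")      # -> no
    ```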


    NAÏVE BAYES CLASSIFIER

    Answer to (b):
    P(PlayTennis=yes) = 9/14 = 0.64
    P(PlayTennis=no) = 5/14 = 0.36
    P(Outlook=overcast|PlayTennis=yes) = 4/9 = 0.44
    P(Outlook=overcast|PlayTennis=no) = 0/5 = 0
    P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
    P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
    P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
    P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
    P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
    P(Wind=strong|PlayTennis=no) = 3/5 = 0.60


    NAÏVE BAYES CLASSIFIER

    Estimating probabilities:

    In the previous example, P(overcast|no) = 0, which makes the product
    P(no) x P(overcast|no) x P(cool|no) x P(high|no) x P(strong|no) = 0.0.

    This causes problems in the comparison because the other probabilities are not considered. We can avoid this difficulty by using the m-estimate.


    NAÏVE BAYES CLASSIFIER

    M-estimate formula:

    (c + k) / (n + m), where c/n is the original probability estimate used before, k = 1, and m is the equivalent sample size (here, the number of possible values of the attribute).

    Using this method, our new probability values are given below.
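    A small sketch of how the smoothed values listed below arise, assuming k = 1 and m equal to the number of possible values of the attribute being estimated:

    ```python
    # m-estimate: (c + k) / (n + m), with k = 1 and m = number of attribute values.
    def m_estimate(c, n, num_values, k=1):
        return (c + k) / (n + num_values)

    print(m_estimate(0, 5, 3))   # P(Outlook=overcast | no)  -> 0.125 (~0.13)
    print(m_estimate(4, 9, 3))   # P(Outlook=overcast | yes) -> 0.417 (~0.42)
    print(m_estimate(3, 5, 2))   # P(Wind=strong | no)       -> 0.571 (~0.57)
    ```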


    NAÏVE BAYES CLASSIFIER

    New answer to (b):
    P(PlayTennis=yes) = 10/16 = 0.63
    P(PlayTennis=no) = 6/16 = 0.37
    P(Outlook=overcast|PlayTennis=yes) = 5/12 = 0.42
    P(Outlook=overcast|PlayTennis=no) = 1/8 = 0.13
    P(Temperature=cool|PlayTennis=yes) = 4/12 = 0.33
    P(Temperature=cool|PlayTennis=no) = 2/8 = 0.25
    P(Humidity=high|PlayTennis=yes) = 4/11 = 0.36
    P(Humidity=high|PlayTennis=no) = 5/7 = 0.71
    P(Wind=strong|PlayTennis=yes) = 4/11 = 0.36
    P(Wind=strong|PlayTennis=no) = 4/7 = 0.57


    NAÏVE BAYES CLASSIFIER

    P(yes) x P(overcast|yes) x P(cool|yes) x P(high|yes) x P(strong|yes) = 0.011
    P(no) x P(overcast|no) x P(cool|no) x P(high|no) x P(strong|no) = 0.00486

    So the class of this instance is yes.


    NAÏVE BAYES CLASSIFIER

    The conditional probability values of all the attributes with respect to the class are pre-computed and stored on disk. This prevents the classifier from computing the conditional probabilities every time it runs; the stored data can be reused to reduce the computation on subsequent runs.


    BAYESIAN BELIEF NETWORK

    In the naïve Bayes classifier we make the assumption of class conditional independence, that is, given the class label of a sample, the values of the attributes are conditionally independent of one another. However, there can be dependences between values of attributes. To handle this we use a Bayesian belief network, which provides a joint conditional probability distribution.

    A Bayesian network is a form of probabilistic graphical model. Specifically, a Bayesian network is a directed acyclic graph of nodes representing variables and arcs representing dependence relations among the variables.


    BAYESIAN BELIEF NETWORK

    A Bayesian network is a representation of the joint distribution over all the variables represented by nodes in the graph. Let the variables be X(1), ..., X(n), and let Parents(A) be the parents of node A. Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions P(Xi | Parents(Xi)) for i = 1 to n. If X has no parents, its probability distribution is said to be unconditional, otherwise it is conditional.


    BAYESIAN BELIEF NETWORK

    By the chain rule of probability, the joint probability of all the nodes in the graph above is:

    P(C, S, R, W) = P(C) x P(S|C) x P(R|C) x P(W|S,R)

    where W = Wet Grass, C = Cloudy, R = Rain, S = Sprinkler.

    Example: P(W, -R, S, C) = P(W|S,-R) x P(-R|C) x P(S|C) x P(C) = 0.9 x 0.2 x 0.1 x 0.5 = 0.009
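    A sketch of this chain-rule factorization in Python. The CPT values below are assumptions (the figure with the tables is not in the text); they are the values commonly used for this sprinkler example and they reproduce the slide's 0.009:

    ```python
    # Chain-rule factorization for the Cloudy/Sprinkler/Rain/WetGrass network.
    # CPT values are assumed (the slide's figure is missing).
    P_C = 0.5                                     # P(Cloudy = true)
    P_S = {True: 0.1, False: 0.5}                 # P(Sprinkler = true | Cloudy)
    P_R = {True: 0.8, False: 0.2}                 # P(Rain = true | Cloudy)
    P_W = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.00}  # P(WetGrass = true | Sprinkler, Rain)

    def joint(c, s, r, w):
        """P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)."""
        pc = P_C if c else 1 - P_C
        ps = P_S[c] if s else 1 - P_S[c]
        pr = P_R[c] if r else 1 - P_R[c]
        pw = P_W[(s, r)] if w else 1 - P_W[(s, r)]
        return pc * ps * pr * pw

    # The slide's example: P(W, -R, S, C) = 0.9 * 0.2 * 0.1 * 0.5 = 0.009
    print(joint(c=True, s=True, r=False, w=True))
    ```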


    BAYESIAN BELIEF NETWORK

    What is the probability of wet grass on a given day, P(W)?

    P(W) = P(W|S,R) x P(S) x P(R)
         + P(W|S,-R) x P(S) x P(-R)
         + P(W|-S,R) x P(-S) x P(R)
         + P(W|-S,-R) x P(-S) x P(-R)

    where P(S) = P(S|C) x P(C) + P(S|-C) x P(-C)
          P(R) = P(R|C) x P(C) + P(R|-C) x P(-C)

    P(W) = 0.5985
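    A sketch of the slide's marginalization, using the same assumed CPTs: with P(S|C) = 0.1, P(S|-C) = 0.5, P(R|C) = 0.8 and P(R|-C) = 0.2, the marginals come out as P(S) = 0.3 and P(R) = 0.5, and the sum reproduces 0.5985:

    ```python
    # The slide's marginalization for P(WetGrass), with assumed CPT values.
    P_C = 0.5
    P_S_given = {True: 0.1, False: 0.5}   # P(S | C), P(S | -C)
    P_R_given = {True: 0.8, False: 0.2}   # P(R | C), P(R | -C)
    P_W_given = {(True, True): 0.99, (True, False): 0.90,
                 (False, True): 0.90, (False, False): 0.00}

    P_S = P_S_given[True] * P_C + P_S_given[False] * (1 - P_C)   # 0.3
    P_R = P_R_given[True] * P_C + P_R_given[False] * (1 - P_C)   # 0.5

    P_W = sum(P_W_given[(s, r)]
              * (P_S if s else 1 - P_S)
              * (P_R if r else 1 - P_R)
              for s in (True, False) for r in (True, False))
    print(P_W)   # 0.5985
    ```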


    Problem???

    A real-world Bayesian network application: learning to classify text.

    Instances are text documents. We might wish to learn the target concept "electronic news articles that I find interesting" or "pages on the World Wide Web that discuss data mining topics". In both cases, if a computer could learn the target concept accurately, it could automatically filter the large volume of online text documents to present only the most relevant documents to the user.


    TECHNIQUE

    Learning how to classify text, based on the naive Bayes classifier. This is a probabilistic approach and is among the most effective algorithms currently known for learning to classify text documents. The instance space X consists of all possible text documents. Given training examples of some unknown target function f(x), which can take on any value from some finite set V, we will consider the target function of classifying documents as interesting or uninteresting to a particular person, using the target values "like" and "dislike" to indicate these two classes.


    Design issues

    how to represent an arbitrary text document in terms of attribute values

    decide how to estimate the probabilities required by the naive Bayes classifier


    ASSUMPTIONS

    Assume we are given a set of 700 training documents that a friend has classified as "dislike" and another 300 she has classified as "like". We are now given a new document and asked to classify it. Let us assume the new text document is the preceding paragraph.


    We know P(like) = 0.3 and P(dislike) = 0.7 in the current example.

    P(ai = wk | vj): here we introduce wk to indicate the k-th word in the English vocabulary.

    Estimating the class conditional probabilities (e.g., P(ai = "our" | dislike)) is more problematic because we must estimate one such probability term for each combination of text position, English word, and target value. There are approximately 50,000 distinct words in the English vocabulary, 2 possible target values, and 111 text positions in the current example, so we must estimate 2 x 111 x 50,000 ≈ 10 million such terms from the training data.


    Final Algorithm

    LEARN_NAIVE_BAYES_TEXT(Examples, V)
    Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).

    1. Collect all words, punctuation, and other tokens that occur in Examples:
       Vocabulary <- the set of all distinct words and tokens occurring in any text document from Examples

    2. Calculate the required P(vj) and P(wk|vj) probability terms. For each target value vj in V do:
       docs_j <- the subset of documents from Examples for which the target value is vj
       P(vj) <- |docs_j| / |Examples|
       Text_j <- a single document created by concatenating all members of docs_j
       n <- total number of distinct word positions in Text_j
       For each word wk in Vocabulary:
         n_k <- number of times word wk occurs in Text_j
         P(wk|vj) <- (n_k + 1) / (n + |Vocabulary|)

    CLASSIFY_NAIVE_BAYES_TEXT(Doc)
    Return the estimated target value for the document Doc; ai denotes the word found in the i-th position within Doc.
       positions <- all word positions in Doc that contain tokens found in Vocabulary
       Return v_NB, where v_NB = argmax over vj in V of P(vj) x Π (over i in positions) P(ai|vj)
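    A compact Python sketch of the two procedures described above; the function and variable names are illustrative, not from the slides:

    ```python
    # Minimal sketch of LEARN/CLASSIFY_NAIVE_BAYES_TEXT as described above.
    from collections import Counter
    from math import log

    def learn(examples):
        """examples: list of (list_of_words, label). Returns vocabulary, priors, word probs."""
        vocabulary = {w for words, _ in examples for w in words}
        labels = {label for _, label in examples}
        priors, word_probs = {}, {}
        for label in labels:
            docs = [words for words, l in examples if l == label]
            priors[label] = len(docs) / len(examples)
            text = [w for words in docs for w in words]          # concatenate the class's documents
            counts, n = Counter(text), len(text)
            word_probs[label] = {w: (counts[w] + 1) / (n + len(vocabulary))
                                 for w in vocabulary}            # (n_k + 1) / (n + |Vocabulary|)
        return vocabulary, priors, word_probs

    def classify(doc_words, vocabulary, priors, word_probs):
        """Return argmax_v P(v) * prod_i P(a_i | v), computed with log-probabilities."""
        def score(label):
            s = log(priors[label])
            for w in doc_words:
                if w in vocabulary:                              # unknown words are ignored
                    s += log(word_probs[label][w])
            return s
        return max(priors, key=score)

    docs = [("fun exciting great".split(), "like"),
            ("boring dull boring".split(), "dislike")]
    model = learn(docs)
    print(classify("great fun".split(), *model))                 # -> "like"
    ```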


    During learning, the procedure LEARN_NAIVE_BAYES_TEXT examines all training documents to extract the vocabulary of all words and tokens that appear in the text, then counts their frequencies among the different target classes to obtain the necessary probability estimates. Later, given a new document to be classified, the procedure CLASSIFY_NAIVE_BAYES_TEXT uses these probability estimates to calculate v_NB according to the equation above. Note that any words appearing in the new document that were not observed in the training set are simply ignored by CLASSIFY_NAIVE_BAYES_TEXT.


    Effectiveness of the Algorithm

    Problem: classifying Usenet news articles. The target classification for an article is the name of the Usenet newsgroup in which the article appeared. In the experiment described by Joachims (1996), 20 electronic newsgroups were considered. 1,000 articles were collected from each newsgroup, forming a data set of 20,000 documents. The naive Bayes algorithm was then applied using two-thirds of these 20,000 documents as training examples, and performance was measured over the remaining third. The 100 most frequent words were removed (these include words such as "the" and "of"), and any word occurring fewer than three times was also removed. The resulting vocabulary contained approximately 38,500 words. The accuracy achieved by the program was 89%.

    The 20 newsgroups:
    comp.graphics             misc.forsale        soc.religion.christian   alt.atheism
    comp.os.ms-windows.misc   rec.autos           talk.politics.guns       sci.space
    comp.sys.ibm.pc.hardware  rec.sport.baseball  talk.politics.mideast    sci.crypt
    comp.windows.x            rec.motorcycles     talk.politics.misc       sci.electronics
    comp.sys.mac.hardware     rec.sport.hockey    talk.religion.misc       sci.med


    APPLICATIONS

    A newsgroup posting service that learns to assign documents to the appropriate newsgroup.

    The NEWSWEEDER system: a program for reading netnews that allows the user to rate articles as he or she reads them. NEWSWEEDER then uses these rated articles (i.e., its learned profile of user interests) to suggest the most highly rated new articles each day.

    Naive Bayes spam filtering using word-position-based attributes.


    Thank you


    Bayesian Learning Networks Approach to Cybercrime Detection

    N S ABOUZAKHAR, A GANI and G MANSON
    The Centre for Mobile Communications Research (C4MCR),
    University of Sheffield, Sheffield
    Regent Court, 211 Portobello Street,
    Sheffield S1 4DP, UK
    [email protected]
    [email protected]
    [email protected]

    M ABUITBEL and D KING
    The Manchester School of Engineering,
    University of Manchester
    IT Building, Room IT 109,
    Oxford Road,
    Manchester M13 9PL, UK
    [email protected]
    [email protected]


    REFERENCES

    1. David J. Marchette, Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, 2001, Springer-Verlag, New York, Inc, USA.
    2. Heckerman, D. (1995), A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Corporation.
    3. Michael Berthold and David J. Hand, Intelligent Data Analysis: An Introduction, 1999, Springer, Italy.
    4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
    5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
    6. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000, Morgan Kaufmann, USA.
    7. http://www.bayesia.com, accessed on 20/12/2002.



    Motivation behind the paper

    Growing dependence of modern society on telecommunication and information networks. The increase in the number of networks interconnected to the Internet has led to an increase in security threats and cyber crimes.


    Structure of the paper

    In order to detect distributed network attacks as early as possible, a probabilistic approach based on Bayesian networks, currently under research and development, has been proposed.


    Where can this model be utilized?

    Learning agents which deploy the Bayesian network approach are considered to be a promising and useful tool in determining suspicious early events of Internet threats.


    Bayesian Networks

    Before we look at the details given in the paper, let us understand what Bayesian networks are and how they are constructed.


    Bayesian Networks

    A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions.

    Syntax:
    a set of nodes, one per variable
    a directed, acyclic graph (a link means "directly influences")
    a conditional distribution for each node given its parents: P(Xi | Parents(Xi))

    In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.


    Some conventions:

    Variables are depicted as nodes.
    Arcs represent probabilistic dependence between variables.
    Conditional probabilities encode the strength of the dependencies.
    Missing arcs imply conditional independence.


    Semantics

    The full joint distribution is defined as the product of the local conditional distributions:

    P(X1, ..., Xn) = Π (i = 1 to n) P(Xi | Parents(Xi))

    e.g., P(j, m, a, ¬b, ¬e) = P(j|a) x P(m|a) x P(a|¬b,¬e) x P(¬b) x P(¬e)
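    These variables are the burglary-alarm example from Russell & Norvig [2] (j = JohnCalls, m = MaryCalls, a = Alarm, b = Burglary, e = Earthquake). A small sketch with the textbook's usual CPT values, which are assumptions here since the slide does not list them:

    ```python
    # Evaluating P(j, m, a, not-b, not-e) for the burglary-alarm network.
    # CPT values are the usual textbook numbers, assumed here (not in the slides).
    P_b = 0.001                               # P(Burglary)
    P_e = 0.002                               # P(Earthquake)
    P_a_given = {(True, True): 0.95, (True, False): 0.94,
                 (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
    P_j_given_a = {True: 0.90, False: 0.05}   # P(JohnCalls | Alarm)
    P_m_given_a = {True: 0.70, False: 0.01}   # P(MaryCalls | Alarm)

    p = (P_j_given_a[True] * P_m_given_a[True]
         * P_a_given[(False, False)] * (1 - P_b) * (1 - P_e))
    print(p)   # ~0.000628
    ```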


    Example of Construction of a BN


    Back to the discussion of the paper.


    Description

    This paper shows how a Bayesian network probabilistically detects communication network attacks, allowing for generalization of Network Intrusion Detection Systems (NIDSs).


    Goal

    How well does our model detect or classify attacks and respond to them later on?

    The system requires the estimation of two quantities:
    The probability of detection (PD)
    The probability of false alarm (PFA)

    It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.


    Construction of the network

    The following figure shows the Bayesian network that has been automatically constructed by the learning algorithms of BayesiaLab. The target variable, activity_type, is directly connected to the variables that heavily contribute to its knowledge, such as service and protocol type.

    http://www.bayesia.com/GB/produits/bLab/BLabApprentissage.php

    Data Gathering

    MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains about a few million connection records.

    Mapping the simple Bayesian Network that we saw to the one used in the paper


    Observation 1:

    As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and an icmp protocol (53.21%).


    Observation 2:

    What would happen if the probability of receiving ICMP protocol packets is increased? Would the probability of having a smurf attack increase? Setting the protocol to its ICMP value increases the probability of having a smurf attack from 52.90% to 99.37%.


    Observation 3:

    Let's look at the problem from the opposite direction. If we set the probability of a portsweep attack to 100%, then the values of some associated variables would inevitably vary. We note from Figure 4 that the probabilities of the TCP protocol and private service have been increased from 38.10% to 97.49% and from 24.71% to 71.45% respectively. Also, we can notice an increase in the REJ and RSTR flags.



    Benefits of the Bayesian Model

    The benefit of using Bayesian IDSs is the ability to adjust our IDS's sensitivity. This allows us to trade off between accuracy and sensitivity. Furthermore, the automatic detection of network anomalies by learning allows distinguishing the normal activities from the abnormal ones. It also allows network security analysts to see the amount of information being contributed by each variable in the detection model to the knowledge of the target node.

    http://www.bayesia.com/GB/produits/bLab/BLabAnalyse.php

    Performance evaluation


    Thank you

    QUESTIONS OR QUERIES