
Distributed Adaptive Importance Sampling on Graphical Models using MapReduce

Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Charu Aggarwal+
* Department of Computer Science, University of Texas at Dallas
+ IBM T. J. Watson Research Center, Yorktown NY, USA

This material is based upon work supported by

Agenda
- Brief overview of inference techniques
- Problem
- Proposed Approaches
- Experiments
- Discussion

Graphical Models
A probabilistic graphical model G is a collection of functions over a set of random variables.

It is generally represented as a network of nodes:
- Each node denotes a random variable (e.g., a data feature).
- Each edge denotes a relationship between two random variables.

Two types of representations:
- A Bayesian network is represented by a directed graph.
- A Markov network is represented by an undirected graph.

Example Graphical Model
Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE), and Maximum a Posteriori (MAP) queries. Probability of Evidence needs to be evaluated in classification problems.

[Figure: an example network over variables A-F with factors (A,B), (A,C), (B,D), (C,D), (C,E), (D,F), (E,F).]

Sample factor φ(A,C):

A | C | φ(A,C)
0 | 0 | 5
0 | 1 | 10
1 | 0 | 15
1 | 1 | 20
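To make the Probability of Evidence query concrete, here is a minimal, framework-free Python sketch (ours, not the authors') that evaluates it on this example network by exhaustive enumeration; φ(A,C) is taken from the table above, while the remaining factor tables are made-up placeholders.

```python
from itertools import product

# phi_AC matches the sample factor above; the other six factors of the
# example network are hypothetical uniform placeholders, for illustration only.
phi_AC = {(0, 0): 5.0, (0, 1): 10.0, (1, 0): 15.0, (1, 1): 20.0}
uniform = {(a, b): 1.0 for a in (0, 1) for b in (0, 1)}
factors = {("A", "C"): phi_AC,
           ("A", "B"): uniform, ("B", "D"): uniform, ("C", "D"): uniform,
           ("C", "E"): uniform, ("D", "F"): uniform, ("E", "F"): uniform}
variables = ["A", "B", "C", "D", "E", "F"]

def total_mass(evidence):
    """Sum of the product of all factors over assignments consistent
    with the evidence (an exhaustive O(2^n) enumeration)."""
    s = 0.0
    for values in product((0, 1), repeat=len(variables)):
        x = dict(zip(variables, values))
        if any(x[v] != val for v, val in evidence.items()):
            continue
        p = 1.0
        for (u, v), phi in factors.items():
            p *= phi[(x[u], x[v])]
        s += p
    return s

# P(e) = (mass consistent with e) / (total mass Z)
print(total_mass({"A": 1}) / total_mass({}))
```

The enumeration over all 2^6 assignments is exactly the exponential blow-up that exact methods must manage and that motivates sampling on large networks.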

Exact Inference
Exact inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence.

Challenges:
- Exponential time and space complexity.
- Computationally intractable on large graphs.

Approximate inference algorithms are widely used in practice to evaluate queries within resource limits:
- Sampling based, e.g., Gibbs Sampling, Importance Sampling.
- Propagation based, e.g., Iterative Join Graph Propagation.

Adaptive Importance Sampling (AIS)
AIS draws samples from a proposal distribution Q, weights each sample by the ratio of the target distribution to Q, and uses the weighted samples both to estimate the query and to adapt Q toward the target.

RB-AIS
We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
In RB-AIS, a set of variables Xw ⊆ X \ Xe (called w-cutset variables) is sampled. Xw is chosen in such a way that exact inference over X \ (Xw ∪ Xe) is tractable.
- A large |Xw| results in quicker evaluation of the query but a more erroneous result.
- A small |Xw| results in a more accurate result but takes more time.
Trade-off!

V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI, AUAI Press, 2005, pp. 209-216.
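In standard importance-sampling notation (assumed here rather than copied from the slides, whose equations were not captured), the per-iteration quantities RB-AIS maintains are:

```latex
\[
  w(\mathbf{x}_s) \;=\; \frac{f(\mathbf{x}_s)}{Q_i(\mathbf{x}_s)},
  \qquad
  \hat{Z} \;=\; \frac{1}{n}\sum_{s=1}^{n} w(\mathbf{x}_s),
  \qquad
  \mathbf{x}_s \sim Q_i,
\]
```

where, in RB-AIS, f(xs) is computed by exact inference over X \ (Xw ∪ Xe) with the cutset fixed to xs.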

RB-AIS: Steps
[Flowchart:] Start → initialize Q on Xw → generate samples → calculate sample weights → update Q and Z → if not converged, generate samples again; otherwise, end.
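For reference, a minimal sequential sketch of this loop in Python (our illustration under simplifying assumptions: a Bernoulli product proposal over the cutset and a stand-in f for the exact-inference weight):

```python
import random

def rb_ais(f, cutset, n_samples=1000, max_iters=50, tol=1e-4):
    """Minimal sequential RB-AIS sketch. `f(x)` stands in for the exact-
    inference weight over the non-cutset variables; Q is modelled as a
    product of independent Bernoullis and adapted from weighted samples."""
    Q = {v: 0.5 for v in cutset}                     # initial Q on Xw
    Z_prev = None
    for _ in range(max_iters):
        samples, weights = [], []
        for _ in range(n_samples):                   # generate samples
            x = {v: int(random.random() < Q[v]) for v in cutset}
            q = 1.0
            for v in cutset:
                q *= Q[v] if x[v] else 1.0 - Q[v]
            samples.append(x)
            weights.append(f(x) / q)                 # calculate sample weights
        Z = sum(weights) / n_samples                 # update Z ...
        W = sum(weights)
        for v in cutset:                             # ... and Q_i -> Q_{i+1}
            p = sum(w for x, w in zip(samples, weights) if x[v]) / W
            Q[v] = min(max(p, 0.01), 0.99)           # keep Q strictly positive
        if Z_prev is not None and abs(Z - Z_prev) < tol * Z_prev:
            break                                    # converged
        Z_prev = Z
    return Z

print(rb_ais(lambda x: 1.0 + x["A"] + 2.0 * x["B"], cutset=["A", "B"]))
```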

Problem
Real-world applications require good-quality results within a time constraint. Typically, real-world networks are large and complex (i.e., have large treewidth). For instance, modeling Facebook users with a graphical model would require billions of nodes!
Even RB-AIS may run out of time to provide a quality estimate within the time limit. For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.

Challenges
To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed:
- RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on those at iteration i - 1, a proper synchronization mechanism is needed.
- The task of generating samples on Xw must be distributed over the worker nodes.

Proposed Approaches
We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS:

- Distributed Sampling in Mappers (DSM): parallel sampling, sequential weight calculation.

- Distributed Weight Calculation in Mappers (DWCM): sequential sampling, parallel weight calculation.
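Both approaches share the same outer structure: one MapReduce job per RB-AIS iteration, with the reducer's updated Q and Z read back before the next job is submitted, which addresses the synchronization challenge noted above. A hypothetical driver sketch (launch_job stands in for submitting a Hadoop job; this is not the paper's code):

```python
def run_distributed_rb_ais(launch_job, Q0, max_iters=50, tol=1e-4):
    """Iterative driver: job i consumes (Xw, Q_i) and its reducer emits
    (Q_{i+1}, Z_i); feeding that output to job i+1 synchronizes iterations."""
    Q, Z_prev = Q0, None
    for i in range(max_iters):
        Q, Z = launch_job(i, Q)          # blocks until MR job i completes
        if Z_prev is not None and abs(Z - Z_prev) < tol * abs(Z_prev):
            break                        # estimate of Z has stabilized
        Z_prev = Z
    return Z
```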

Distributed Sampling in Mappers (DSM)
[Dataflow diagram:] Input to the ith MR job: Xw and Qi. Each mapper m holds one cutset variable Xm with its current proposal Qi[Xm] and emits the tuples (Xm, xms, Qi[Xm]) for samples s = 1, ..., n. Shuffle and sort aggregate the values by sample index, combining x1s, x2s, ..., xms into the joint sample xs for each s in {1, ..., n}. The reducer then calculates the sample weights sequentially and updates Z and Qi to Qi+1.
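A framework-free Python sketch of one DSM job (our reading of the dataflow above; the names and the in-memory shuffle are illustrative):

```python
import random

def dsm_map(var, q_var, n):
    """One DSM mapper: draws n values of its cutset variable from Q_i[var]
    and emits (sample index s, (var, value, q_var)) pairs."""
    for s in range(n):
        yield s, (var, int(random.random() < q_var), q_var)

def dsm_reduce(grouped, f):
    """DSM reducer: stitches the per-variable values sharing a sample
    index into a joint sample x_s, then weights the samples sequentially."""
    weights = []
    for s in sorted(grouped):
        records = grouped[s]
        x = {var: value for var, value, _ in records}   # combine into x_s
        q = 1.0
        for _, value, q_var in records:
            q *= q_var if value else 1.0 - q_var        # Q_i(x_s)
        weights.append(f(x) / q)                        # w_s = f(x_s)/Q_i(x_s)
    return sum(weights) / len(weights)                  # Z (Q update omitted)

# shuffle-and-sort simulated by grouping mapper output on the sample index
Q = {"A": 0.5, "B": 0.5}
grouped = {}
for var, q_var in Q.items():                            # one "mapper" per variable
    for s, record in dsm_map(var, q_var, n=100):
        grouped.setdefault(s, []).append(record)
print(dsm_reduce(grouped, f=lambda x: 1.0 + x["A"] + 2.0 * x["B"]))
```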

Distributed Weight Calculation in Mappers (DWCM)
[Dataflow diagram:] Input to the ith MR job: Xw and List[x], the samples generated sequentially. Each mapper v takes one sample xv with Qi[Xw = xv] and computes its weight wv in parallel. Shuffle and sort aggregate the weights by key, and the reducer updates Z and Qi to Qi+1.
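And the mirror-image sketch for DWCM (again our illustration, with the same stand-in f): sampling happens once up front, and only the weight computation is farmed out.

```python
import random

def dwcm_map(x, q_x, f):
    """One DWCM mapper: weights a single pre-generated sample in parallel."""
    yield "w", f(x) / q_x                     # w_v = f(x_v) / Q_i(x_v)

def dwcm_reduce(weights):
    """DWCM reducer: aggregates the weights into Z (Q update omitted)."""
    return sum(weights) / len(weights)

# samples over a single binary cutset variable, drawn sequentially up front
p = 0.5                                       # toy proposal Q_i
samples = [int(random.random() < p) for _ in range(100)]
emitted = []
for x in samples:                             # one "mapper" per sample
    emitted += [w for _, w in dwcm_map(x, p if x else 1.0 - p,
                                       f=lambda x: 1.0 + x)]
print(dwcm_reduce(emitted))
```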

Setup
Performance metrics:
- Speedup = Tsq / Td, where Tsq is the execution time of the sequential approach and Td is the execution time of the distributed approach.
- Scaleup = Ts / Tp, where Ts is the execution time using a single mapper and Tp is the execution time using multiple mappers.
Hadoop version 1.2.1, with 8 data nodes and 1 name node; each machine has a 2.2 GHz processor and 4 GB of RAM.

Network | Number of Nodes | Number of Factors
54.wcsp [1] | 67 | 271
29.wcsp [1] | 82 | 462
404.wcsp [1] | 100 | 710

[1] The Probabilistic Inference Challenge (PIC2011), http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated 10.23.2014.

Speedup
[Chart: speedup results.]

Scaleup
[Chart: scaleup results.]

Discussion
- Both approaches achieve substantial speedup and scaleup compared with sequential execution.
- DWCM has better speedup and scalability than DSM: weight calculation is computationally more expensive than sample generation, and since DWCM parallelizes the weight calculation, it outperforms DSM.
- Asymptotically, both approaches show accuracy similar to that of sequential execution.

Questions?