Probabilistic Reasoning and Learning with Permutations
Thesis Defense, 7/29/2011
Jonathan Huang
Collaborators: Carlos Guestrin (CMU), Leonidas Guibas (Stanford), Xiaoye Jiang (Stanford), Ashish Kapoor (Microsoft)


Efficient Probabilistic Reasoning with Permutations


Political Elections in Ireland

"But Ireland's complicated [election] system of proportional representation could upset the front-runner and help the Fianna Fail candidate, running second in the polls, to snatch victory."

"Recent polling indicates Doherty [Sinn Fein Party] is leading the race."

(Speaker note: By allowing voters to express their choices in a richer way, candidates can campaign to appear high on someone's ballot instead of facing an all-or-nothing (0/1 loss) outcome.)

Proportional Representation

Pros:
  • Encourages coalition governments
  • Discourages negative campaigning
  • No wasted votes; empowers voters
  • Used by: Irish Parliament, Maltese Parliament, Australian Senate, Iceland Constitutional Assembly, Academy Awards, University of Cambridge, Scotland local governments, Cambridge (Mass.) local government, ...

Con:
  • Far more complex than plurality voting

2002 Irish Election Data

Ireland: 64,081 votes, 14 candidates [Gormley, Murphy, 2006]

  • Major parties: Fianna Fail (FF), Fine Gael (FG)
  • Minor parties: Independents (I), Green Party (GP), Christian Solidarity (CS), Labour (L), Sinn Fein (SF)

Statistical analysis of voting data can:
  • Predict winners
  • Identify voting blocs
  • Formulate campaign strategies
  • Engender an informed, effective democracy

Distributions over Permutations

Rankings of candidates
A  B  C  D    Probability
1  2  3  4    0
2  1  3  4    0
1  3  2  4    1/10
3  1  2  4    0
2  3  1  4    1/20
3  2  1  4    1/5
1  2  4  3    0

With probability 1/10: Candidate A ranked first, Candidate B ranked third, Candidate C ranked second, Candidate D ranked last.

(Speaker note: "Going beyond political elections, permutations arise in all sorts of applications...")

Permutations are Ubiquitous!

[Figure: example rankings from three application areas: Politics, Preferences, Multiobject Tracking]

Problem #1: Representation

n     n!             Storage requirements
9     362,880        3 megabytes
12    4.8 x 10^8     9.5 terabytes
15    1.31 x 10^12   1729 petabytes (!!)
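To get a feel for the growth in the n! column, the short sketch below recomputes it and sets it next to n^2, the size of a table with one entry per (item, rank) pair. The helper name `table_sizes` is hypothetical and used only for illustration; the sketch does not attempt to reproduce the storage column.

```python
import math

def table_sizes(n):
    """Return (n!, n*n): the number of entries in a full table over all
    rankings of n items versus a table with one entry per (item, rank) pair."""
    return math.factorial(n), n * n

for n in (9, 12, 15):
    full, per_pair = table_sizes(n)
    # Print n! in scientific notation so it can be compared with the table above.
    print(f"n={n:2d}  n!={full:.3g}  n^2={per_pair}")
```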

How can we tractably represent distributions over n! permutations in storage?

First-order summary [Shin et al, 03]

For each (j,i) pair, store P(candidate j is in rank i).

[Figure: 14 x 14 matrix of first-order probabilities (color scale 0.05 to 0.25); rows are the 14 candidates (FF, FF, FF, FG, FG, FG, I, I, I, I, GP, CS, SF, L), columns are the 14 ranks]

  • 25% of voters rank Sinn Fein last
  • 10% of voters rank Sinn Fein first

Pro: n^2 versus n! storage
Con: Really coarse representation; can't compute P(Sinn Fein candidate is first and Fianna Fail candidate is second)
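The first-order summary can be sketched as a direct accumulation over an explicit distribution. This is a minimal illustration only: the function name `first_order_marginals`, the 0-based encoding of rankings, and the three-candidate toy distribution are assumptions made here, not data or code from the talk.

```python
def first_order_marginals(dist, n):
    """dist maps a ranking tuple r (r[j] = rank of candidate j, 0-based)
    to its probability. Returns an n x n matrix M with
    M[j][i] = P(candidate j is in rank i)."""
    M = [[0.0] * n for _ in range(n)]
    for ranking, prob in dist.items():
        for candidate, rank in enumerate(ranking):
            M[candidate][rank] += prob
    return M

# Hypothetical toy distribution over 3 candidates (not data from the talk).
toy = {(0, 1, 2): 0.5, (1, 0, 2): 0.3, (2, 1, 0): 0.2}
M = first_order_marginals(toy, 3)
for row in M:
    print(row)
```

Every row and every column of the resulting matrix sums to one, since each candidate occupies exactly one rank and each rank holds exactly one candidate. The matrix cannot answer joint queries such as the one in the "Con" above, because pairs of (candidate, rank) events are not stored.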

Decomposable Distributions

  • Additive Decomposition: decompose functions on permutations into sums of simpler functions
  • Multiplicative Decomposition: decompose functions on permutations into products of simpler functions

Additive (Fourier) Decompositions

[Figure: f(x) written as a weighted sum of Fourier basis functions ordered from low frequency to high frequency, with example Fourier coefficients .6, .2, .1, .01; a second panel shows storing only the low frequency coefficients to approximate f]

Approximate distributions over permutations with low frequency basis functions ([Kondor2007, Huang2007, Huang2009]).

Fourier coefficients for permutations [Diaconis, 88]

  • Fourier coefficients for distributions on permutations are matrix-valued
  • Can exactly reconstruct all n! original probabilities
  • Can exactly reconstruct all first-order probabilities with first two matrices
  • Can exactly reconstruct all second-order probabilities with first three matrices

Second order summary (submatrix)

[Figure: second-order probabilities (scale 0 to 0.07); rank pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) against candidate pairs (FF,FG), (FF,FF), (FF,SF), (FG,FF), (FG,SF), (FF,SF)]

  • 7% of voters placed two Fianna Fail candidates consecutively in ranks 1 and 2
  • Capture higher order dependencies with O(n^4) storage

Accuracy/Storage Trade-off

Probability    Fourier interpretation
0th order      Lowest frequency Fourier coefficient
1st order      Reconstructible from O(n^2) lowest frequency coefficients
2nd order      Reconstructible from O(n^4) lowest frequency coefficients
3rd order      Reconstructible from O(n^6) lowest frequency coefficients
nth order      Requires all n! Fourier coefficients

Problem #1: Representation

Storing a low frequency Fourier approximation is equivalent to storing low-order probabilities (and can be done in polynomial space).
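For reference, the matrix-valued coefficients mentioned above follow the standard definition of the Fourier transform on a finite group, written here for the symmetric group S_n. The notation is an assumption of this note rather than something shown on the slides: rho ranges over the irreducible representations of S_n and d_rho is the dimension of rho.

```latex
\hat{f}_{\rho} \;=\; \sum_{\sigma \in S_n} f(\sigma)\,\rho(\sigma),
\qquad
f(\sigma) \;=\; \frac{1}{n!} \sum_{\rho} d_{\rho}\,
  \operatorname{Tr}\!\left[\hat{f}_{\rho}\,\rho(\sigma^{-1})\right]
```

Truncating the second sum to a few low-frequency representations gives the bandlimited approximation of f.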

Low-frequency Fourier approximations generalize the first-order summary!

Contributions (rows: Representation, Inference; columns: Additive (Fourier) Decomposition, Multiplicative Decomposition)

  • Representation, additive [NIPS07, JMLR09]: Polynomial storage for approximate distributions; low frequency = maintaining probabilities over small sets

Problem #2: Probabilistic Inference in Ranking

  • What are the odds that someone will rank Sinn Fein first if he ranks Fianna Fail second?
  • If a voter ranks Labour first, is he more likely to prefer Fine Gael over Fianna Fail?
  • If I prefer Titanic to Star Wars, am I likely to also prefer The English Patient to Jurassic Park?

Problem #2: Inference

How can we efficiently compute a posterior based on a new observation?

Bayes Rule (Rev. Bayes): posterior = likelihood x prior, computed pointwise over every ranking (ABCD, BACD, ACBD, CABD, ...) and then normalized. That is, P(ranking | z) is proportional to P(z | ranking) P(ranking), where the observation is, for example, z = "Fianna Fail ranked second".

Complexity: O(n!)

Inference with Fourier coefficients [Huang et al, NIPS 2007]

Given:
  • prior: P(ranking)
  • likelihood: P(Sinn Fein is first | ranking)

Compute:
  • posterior: P(ranking | Sinn Fein is first)

From signal processing: pointwise products correspond to convolutions of Fourier coefficients. For permutations, pointwise products correspond to (generalized) convolution in the Fourier domain.

Our algorithm applies to arbitrary distributions defined over arbitrary finite groups.

Bandlimiting

  • Discard high-frequency coefficients after conditioning
  • Equivalently, maintain low-order probabilities

Theorem. Given the rth order terms of the prior and an sth order likelihood, the (r-s)th order terms of the posterior can be exactly computed.

(Fourier methods work best on low-order observations) [Huang et al, NIPS 2009]

Dealing with the Impossible

Infeasible approximations (e.g. negative probabilities) can arise due to bandlimiting.

Solution [Huang, 2007]: Project to the space of coefficients corresponding to feasible probabilities.

[Figure: an infeasible approximation lying among the infeasible coefficients is mapped to the nearest feasible Fourier coefficients]

(Efficient projection (to a relaxed polytope) is possible using a quadratic program.)

Permutations in Tracking

[Figure: four tracks, Track 1 through Track 4]

Applications to:
- Monitoring for Assisted Living
- Video analysis for sports
- Video surveillance for crowds

Probabilistic Inference in Tracking

[Figure: Track 1 through Track 4, with mixing events marked at Tracks 1,2; Tracks 1,3; and Tracks 1,4]

Inference problem: Where is Alice?
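The "Where is Alice?" question can be illustrated with a brute-force sketch that keeps an explicit belief over identity-to-track assignments and applies one mixing event at a time. Everything beyond the track layout is an assumption made for illustration: the names other than Alice, the `swap_prob` parameter, the starting assignment, and the order of the three mixing events are hypothetical, and the explicit table is exactly the O(n!) representation that the Fourier approach is meant to avoid.

```python
def mix(belief, i, j, swap_prob=0.5):
    """Mixing event at tracks i and j: with probability swap_prob the two
    identities on those tracks trade places (swap_prob is hypothetical)."""
    out = {}
    for assign, p in belief.items():
        swapped = list(assign)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        swapped = tuple(swapped)
        out[assign] = out.get(assign, 0.0) + (1 - swap_prob) * p
        out[swapped] = out.get(swapped, 0.0) + swap_prob * p
    return out

def track_marginal(belief, person):
    """Return the list of P(person is on track t) for each track t."""
    n = len(next(iter(belief)))
    marginal = [0.0] * n
    for assign, p in belief.items():
        marginal[assign.index(person)] += p
    return marginal

# assign[t] = identity on track t; start fully certain (an assumption).
belief = {("Alice", "Bob", "Carol", "Dave"): 1.0}
for i, j in [(0, 1), (0, 2), (0, 3)]:  # mixing at Tracks 1,2; 1,3; 1,4
    belief = mix(belief, i, j)

alice = track_marginal(belief, "Alice")
print(alice)
```

Under these assumptions Alice is most likely to be on Track 2, because she could only leave Track 1 at the first mixing event with probability one half, and later events only split the remaining mass further.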

Simulated tracking data

Projection to the marginal polytope versus no projection (n=6)

[Figure: error (axis 0 to 0.12, lower is better) for 1st, 2nd and 3rd order approximations, with and without projection, compared against approximation by a uniform distribution]

Tracking with a camera network

Camera network data:
  • 8 cameras, multi-view, occlusion effects
  • 11 individuals in lab
  • Identity observations obtained from color histograms
  • Mixing events declared when people walk close to each other

[Figure: % of tracks correctly identified (axis 0 to 60, higher is better) for an omniscient tracker, time-independent classification, 2nd order without projection, and 2nd order with projection]

Problem #2: Inference

Inference can be formulated in the Fourier domain as (generalized) convolution and approximated via bandlimiting/projections; low-order observations = polytime/accurate inference.

Contributions (rows: Representation, Inference; columns: Additive (Fourier) Decomposition, Multiplicative Decomposition)

  • Representation, additive [NIPS07, JMLR09]: Polynomial storage for approximate distributions; low frequency = maintaining probabilities over small sets
  • Inference, additive [NIPS07, JMLR09]: Polytime Fourier domain conditioning algorithm for finite groups; approximation guarantee for low order observations

Even polynomial is too slow

Representation depth    # Fourier coefficients
1st order               O(n^2)
2nd order               O(n^4)
3rd order               O(n^6)
4th order               O(n^8)

[Figure: running time in seconds (axis 0 to 4, lower is better) versus n (4 to 8) for exact inference and for 1st, 2nd and 3rd order approximations]

Can we achieve more compact representations?

Idea: Assume a ranking is created by shuffling smaller, independent rankings [Huang, Guestrin, 2009]

  • Rank veggies: Artichoke > Broccoli
  • Rank fruits: Cherry > Dates
  • Interleave (riffle shuffle) the veggie/fruit rankings to form a complete ranking, e.g. Artichoke > Broccoli > Cherry > Dates

[Figure: several possible interleavings of Veggie and Fruit slots]

Riffle independent distributions can be represented with a reduced set of parameters!
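The generative idea above can be sketched as sampling an interleaving pattern and then filling the slots from the two smaller rankings in order. The function name `riffle` is hypothetical, and drawing the interleaving uniformly is an assumption made for brevity; a riffle independent model may place any distribution over interleavings.

```python
import random

def riffle(left, right, rng=random):
    """Interleave two rankings while preserving the relative order within
    each. The interleaving pattern is drawn uniformly (an assumption)."""
    slots = ["L"] * len(left) + ["R"] * len(right)
    rng.shuffle(slots)  # choose which positions go to which ranking
    left_items, right_items = iter(left), iter(right)
    return [next(left_items) if s == "L" else next(right_items) for s in slots]

veggies = ["Artichoke", "Broccoli"]  # Artichoke > Broccoli
fruits = ["Cherry", "Dates"]         # Cherry > Dates
full = riffle(veggies, fruits, random.Random(0))
print(" > ".join(full))
```

Whatever pattern is drawn, Artichoke stays ahead of Broccoli and Cherry stays ahead of Dates, which is the property that lets the two small rankings be modeled separately from the interleaving.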

Riffled Independence

American Psych. Assoc. (APA) Election (1980)

  • 5738 full ballots, 5 candidates (dataset from [Diaconis, 89])
  • Candidates: William Bevan, Ira Iscoe, Charles Kiesler, Max Siegle, Logan Wright
  • 5! = 120 possible rankings

[Figure: probability (axis 0 to 0.1) of each of the 120 permutations; blue line: candidate {2} riffle independent of candidates {1,3,4,5}]

Empirically, we can find approximate riffled independence in real datasets.

Parameter Counting

#(rankings): 5! = 120. Can we do better?

Item set decomposition: {1,2,3,4,5} splits into {1,3,4,5} and {2}

  • Relative ranking of candidates {1,3,4,5}: 4! = 24
  • Interleaving candidate {2} with remaining candidates: 5
  • Total # of model parameters: 30
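One accounting that reproduces the total of 30 is sketched below. It is an assumption of this note rather than something stated on the slide: it counts one table entry per relative ranking of each item set plus one per interleaving, so the singleton set {2} contributes 1! = 1 on top of the 24 and 5 listed above. The function name `riffle_param_count` is hypothetical.

```python
from math import comb, factorial

def riffle_param_count(p, q):
    """Table sizes for a riffle independent model on p + q items:
    p! relative rankings of the first set, q! of the second, and
    C(p+q, p) ways to interleave them (an assumed accounting)."""
    return factorial(p) + factorial(q) + comb(p + q, p)

full_table = factorial(5)            # one entry per ranking of 5 candidates
riffled = riffle_param_count(4, 1)   # split {1,3,4,5} versus {2}
print(full_table, riffled)
```

The gap widens quickly for larger item sets, because the factorial of the whole set is replaced by factorials of the two parts plus a binomial coefficient.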