
Hidden Permutation Model and

Location-Based Activity Recognition

Hung Bui

SRI International

Dinh Phung, Svetha Venkatesh, Hai Phan

Curtin University of Technology

Talk Outline

- Why model permutations?
- Distributions on random permutations
- Hidden Permutation Model (HPM)
- How to estimate HPM parameters?
- How to perform approximate inference?
- Experiments with location-based activity recognition

Why Model Permutations?

- Permutations arise in many real-world problems
  - Data association, information extraction from text, machine translation, activity recognition
- Usually, there is an unknown matching that needs to be recovered
  - Correspondence in data association
  - Field-to-value matching in IR
  - Word/phrase matching in machine translation
- A permutation is the simplest form of matching
- Brute-force computation is at least O(n!)

Permutations in Activity Recognition

- Many activities require carrying out a collection of sub-steps, each performed just once (or repeated a small number of times)
  - AAAI travel = (get_approval, book_hotel, book_air_ticket, register, prepare_slides, do_travel)
- The ordering of the steps is an unknown permutation that needs to be recovered
- Factors affecting the ordering between steps:
  - Strongly ordered: A enables B; A and B follow a timetable
  - Weakly ordered: A performed before B out of habit
  - Unordered: A performed before B by chance
- Learning these ordering constraints from data can lead to better recognition performance

Permutations and Markov Models

- Permutation constraints lead to awkward graphical models, since conditional independence is lost
- Need a more direct way of defining a distribution on permutations
- A standard HMM does not enforce permutation constraints

[Figure: HMM chain x_1 → x_2 → … → x_n; the Markov structure cannot rule out x_n repeating x_1, x_2, …]

Distributions on Permutations

- Let Per(n) = the set of permutations of {1, 2, …, n}
- Multinomial over Per(n): very general, but requires n! parameters (Kirshner et al., ICML 2003)
- Exponential family: few parameters
  - Feature function f : Per(n) → R^d
  - Natural parameters λ ∈ R^d
  - E.F. distribution on permutations:

    Pr(x | λ) = exp{⟨f(x), λ⟩ − A(λ)}

  - Log-partition function (expensive: a sum over all n! permutations):

    A(λ) = ln Σ_{x ∈ Per(n)} exp⟨f(x), λ⟩

Exponential Family on Permutations (cont.)

- What features to use? Recall the factors affecting the ordering between activity steps:
  - Strongly ordered: A enables B; A and B follow a timetable
  - Weakly ordered: A performed before B out of habit
  - Unordered: A performed before B by chance
- Pairwise precedence features: for i < j,

  f_ij(x) = I{x⁻¹(i) < x⁻¹(j)}

  i.e., does step i appear before step j in x?
- With no loss of information, keep only d = n(n−1)/2 features (and as many parameters)

Exponential Family on Permutations (cont.)

- Simplified density forms (the sum runs over all in-order pairs):

  Pr(x | λ) = exp( Σ_{l<k : x_l<x_k} λ_{x_l,x_k} − A(λ) )

  Pr(x | λ) = exp( Σ_{i<j : x⁻¹(i)<x⁻¹(j)} λ_{ij} − A(λ) )

- Example: x = (2 4 1 5 3) has unnormalized log-probability

  λ_{2,4} + λ_{2,5} + λ_{2,3} + λ_{4,5} + λ_{1,5} + λ_{1,3}
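The in-order-pair bookkeeping is easy to sketch in code. The following Python is my own illustration (function names are mine, not from the talk); normalization is by brute-force enumeration of Per(n), so it is only feasible for tiny n. It reproduces the example's in-order pairs.

```python
import itertools, math

def in_order_pairs(x):
    """All (i, j) with i < j such that step i appears before step j in x."""
    pos = {v: t for t, v in enumerate(x)}        # x^-1: step -> position
    return {(i, j) for i in pos for j in pos if i < j and pos[i] < pos[j]}

def score(x, lam):
    """<f(x), lam>: sum of lam[(i, j)] over in-order pairs (missing keys = 0)."""
    return sum(lam.get(p, 0.0) for p in in_order_pairs(x))

def prob(x, lam):
    """Pr(x | lam) = exp(score(x) - A(lam)), with A(lam) by full enumeration."""
    A = math.log(sum(math.exp(score(p, lam))
                     for p in itertools.permutations(sorted(x))))
    return math.exp(score(x, lam) - A)

# The slide's example: the in-order pairs of x = (2 4 1 5 3)
print(sorted(in_order_pairs((2, 4, 1, 5, 3))))
# -> [(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)]
```

With λ = 0 the density is uniform over Per(n), which gives a quick sanity check that the normalization is right.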

Some Properties

- Swapping adjacent elements x_i and x_{i+1}: x′ = (x_1, …, x_{i+1}, x_i, …, x_n)

  Pr(x′ | λ) / Pr(x | λ) = e^{−λ_{x_i, x_{i+1}}} if x_i < x_{i+1};  e^{λ_{x_{i+1}, x_i}} if x_i > x_{i+1}

  So the cost of switching adjacent steps (i, j), i < j, is e^{λ_ij}
- Reversing the permutation: x′ = (x_n, x_{n−1}, …, x_1)

  Pr(x′ | λ) · Pr(x | λ) = exp( Σ_{i<j} λ_ij − 2A(λ) ) = const(λ)
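Both identities can be confirmed numerically. The sketch below (my own, with brute-force normalization on n = 4, random λ) checks the adjacent-swap ratio and the fact that Pr(x)·Pr(reverse(x)) does not depend on x.

```python
import itertools, math, random

def score(x, lam):
    pos = {v: t for t, v in enumerate(x)}
    return sum(lam[(i, j)] for i in pos for j in pos
               if i < j and pos[i] < pos[j])

def prob(x, lam):
    perms = list(itertools.permutations(sorted(x)))
    Z = sum(math.exp(score(p, lam)) for p in perms)
    return math.exp(score(x, lam)) / Z

random.seed(0)
lam = {(i, j): random.uniform(-1, 1)
       for i in range(1, 5) for j in range(i + 1, 5)}

# Adjacent swap: x = (2 4 1 3) -> x' = (2 1 4 3) exchanges x_2 = 4, x_3 = 1.
# Since x_2 > x_3, the ratio Pr(x')/Pr(x) should equal exp(+lam_{1,4}).
x, xs = (2, 4, 1, 3), (2, 1, 4, 3)
assert abs(prob(xs, lam) / prob(x, lam) - math.exp(lam[(1, 4)])) < 1e-9

# Reverse: Pr(x) * Pr(reverse(x)) = exp(sum_{i<j} lam_ij - 2 A(lam)),
# the same constant for every x.
c = prob(x, lam) * prob(x[::-1], lam)
y = (1, 2, 3, 4)
assert abs(c - prob(y, lam) * prob(y[::-1], lam)) < 1e-12
```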

Hidden Permutation Model

- "Graphical model": a hidden permutation x = (x_1, …, x_n) with prior Pr(x | λ); each observation o_t depends only on x_t:

  Pr(o_t | x_t = i, η) = Mult(η_i)

- Joint distribution:

  Pr(x, o | λ, η) = Pr(x | λ) ∏_{t=1}^{n} Pr(o_t | x_t, η_{x_t})
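The joint transcribes directly into code. In this sketch (mine; the names are illustrative), `eta[i]` is the multinomial over observation symbols for activity i, and the prior over x is the pairwise exponential family normalized by enumeration.

```python
import itertools, math

def score(x, lam):
    pos = {v: t for t, v in enumerate(x)}
    return sum(lam.get((i, j), 0.0) for i in pos for j in pos
               if i < j and pos[i] < pos[j])

def prior(x, lam):
    """Pr(x | lam), normalized by brute force (tiny n only)."""
    Z = sum(math.exp(score(p, lam))
            for p in itertools.permutations(sorted(x)))
    return math.exp(score(x, lam)) / Z

def joint(x, o, lam, eta):
    """Pr(x, o | lam, eta) = Pr(x | lam) * prod_t Pr(o_t | x_t, eta)."""
    p = prior(x, lam)
    for t in range(len(o)):
        p *= eta[x[t]][o[t]]
    return p

# With lam = 0 (uniform prior) and uniform emissions over {'a', 'b'}:
eta = {i: {'a': 0.5, 'b': 0.5} for i in (1, 2, 3)}
print(joint((1, 2, 3), ('a', 'a', 'b'), {}, eta))   # 1/6 * (1/2)^3 = 1/48
```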

Max. Likelihood Estimation, Permutation Known

- Log-likelihood function:

  L(λ, η) = ln P(x | λ) + ln P(o | x, η)

- Optimizing η: trivial (count frequencies)
- Optimizing λ: a convex problem, with derivative

  ∇_{λ_ij} L = f_ij(x) − Σ_x f_ij(x) P(x | λ)

  ("does i appear before j?" minus Pr(i appears before j) under the model)
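This is the usual exponential-family moment-matching form: observed feature minus expected feature. A brute-force sketch (mine, exact only for tiny n):

```python
import itertools, math

def score(x, lam):
    pos = {v: t for t, v in enumerate(x)}
    return sum(lam.get((i, j), 0.0) for i in pos for j in pos
               if i < j and pos[i] < pos[j])

def grad_given_x(x, lam):
    """grad_{lam_ij} L = f_ij(x) - E_{x' ~ Pr(.|lam)}[f_ij(x')], exactly."""
    n = len(x)
    perms = list(itertools.permutations(range(1, n + 1)))
    w = [math.exp(score(p, lam)) for p in perms]
    Z = sum(w)
    g = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            observed = 1.0 if x.index(i) < x.index(j) else 0.0
            expected = sum(wk for p, wk in zip(perms, w)
                           if p.index(i) < p.index(j)) / Z
            g[(i, j)] = observed - expected
    return g

# At lam = 0 every pair is in order with probability 1/2, so for the fully
# ordered x = (1, 2, 3) each gradient component is 1 - 1/2 = 1/2.
print(grad_given_x((1, 2, 3), {}))
```

A gradient-ascent loop over these components is all the convex λ-step needs.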

Max. Likelihood Estimation, Permutation Unknown

- Log-likelihood function:

  l(λ, η) = Σ_{k=1}^{K} log Σ_x P(o^k, x | λ, η)

- Need to jointly optimize λ and η; a non-convex problem
- Can we use EM? The M-step for λ does not have a closed form
- Can try coordinate ascent:
  - Fix η and improve λ by one gradient step
  - Fix λ and improve η by EM (now in closed form)
  - Didn't work as well as simple gradient ascent

Max. Likelihood Estimation, Permutation Unknown

- Derivative for λ:

  ∇_{λ_ij} l = Σ_x f_ij(x) P(x | o, λ, η) − Σ_x f_ij(x) P(x | λ)

  (Pr(i appears before j given o) minus Pr(i appears before j))

- Derivative for η:

  ∇_{η_iv} l = Σ_x I{x⁻¹(i) ∈ o[v]} P(x | o, λ, η) − Pr(v | η_i)

  (Pr(i appears at one of v's positions given o) minus the model's emission probability)

- Avoid dealing with constraints by transforming to the natural parameters of the multinomial
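The λ-gradient with a latent permutation can again be checked exactly on tiny instances. In this sketch (my own; `grad_latent` and `eta` are illustrative names), the posterior term weights each candidate permutation by its emission likelihood:

```python
import itertools, math

def score(x, lam):
    pos = {v: t for t, v in enumerate(x)}
    return sum(lam.get((i, j), 0.0) for i in pos for j in pos
               if i < j and pos[i] < pos[j])

def grad_latent(o, lam, eta):
    """grad_{lam_ij} l = E[f_ij | o] - E[f_ij], both by enumeration."""
    n = len(o)
    perms = list(itertools.permutations(range(1, n + 1)))
    prior_w = [math.exp(score(p, lam)) for p in perms]
    post_w = [w * math.prod(eta[p[t]][o[t]] for t in range(n))
              for p, w in zip(perms, prior_w)]
    g = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            f = [1.0 if p.index(i) < p.index(j) else 0.0 for p in perms]
            g[(i, j)] = (sum(a * b for a, b in zip(f, post_w)) / sum(post_w)
                         - sum(a * b for a, b in zip(f, prior_w)) / sum(prior_w))
    return g

# Deterministic emissions identify the permutation: o = ('b', 'a', 'c')
# forces x = (2, 1, 3), so the posterior term collapses to f_ij(x).
eta = {1: {'a': 1.0, 'b': 0.0, 'c': 0.0},
       2: {'a': 0.0, 'b': 1.0, 'c': 0.0},
       3: {'a': 0.0, 'b': 0.0, 'c': 1.0}}
g = grad_latent(('b', 'a', 'c'), {}, eta)
```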

Approximate Inference via MCMC

- A typical "inference" problem requires calculating an expectation
- Expectations can be approximated if we can generate samples x ∼ Pr(x | λ)
- How to draw random permutations? Try a well-known MCMC idea:
  - Start with a random initial permutation
  - Randomly switch two positions to obtain x′
  - Accept the new permutation with probability min{ P(x′ | λ) / P(x | λ), 1 }
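A sketch of this Metropolis sampler (mine, not the authors' code). The acceptance ratio only needs the difference of unnormalized scores, so A(λ) never has to be computed; the test is done in the log domain to avoid overflow.

```python
import math, random

def score(x, lam):
    pos = {v: t for t, v in enumerate(x)}
    return sum(lam.get((i, j), 0.0) for i in pos for j in pos
               if i < j and pos[i] < pos[j])

def metropolis(lam, n, steps=5000, seed=0):
    rng = random.Random(seed)
    x = list(range(1, n + 1))
    rng.shuffle(x)                       # random initial permutation
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)   # propose: switch two positions
        y = x[:]
        y[a], y[b] = y[b], y[a]
        # accept with prob min{Pr(y)/Pr(x), 1}; A(lam) cancels in the ratio
        if math.log(rng.random()) < score(y, lam) - score(x, lam):
            x = y
    return tuple(x)

# A parameter strongly preferring i before j for all i < j drives the chain
# to the identity permutation.
lam = {(i, j): 50.0 for i in range(1, 5) for j in range(i + 1, 5)}
print(metropolis(lam, 4))   # -> (1, 2, 3, 4)
```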

Location-Based Activity Recognition on Campus

[Diagram: Student Activity Routines (permutations with partial-order constraints) over Atomic Activities, each mapped to Corresponding Locations and observed as GPS "Places"; the detection problem is to recover the routine from the place sequence]

Atomic activities    Physical locations
Banking              Bank
Lecture 1            Watson theater
Lecture 2            Hayman theater
Lecture 3            Davis theater
Lecture 4            Jones theater
Group meeting 1      Bookmark cafe, Library, CBS
Group meeting 2      Library, CBS, Psychology Bld
Group meeting 3      Angazi cafe, Psychology Bld
Coffee               TAV, Angazi cafe, Bookmark cafe
Breakfast            TAV, Angazi cafe, Bookmark cafe
Lunch                TAV, Bookmark cafe

"Places" from GPS

- Preprocessing
  - Removal of points above a speed threshold
  - Often missing precisely the samples we want! (e.g., near buildings)
  - Interpolation within a day and across days
- Points clustered into groups to find significant places using DBSCAN
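The clustering step can be illustrated with a toy, dependency-free re-implementation of DBSCAN (this is my sketch, not the authors' pipeline, which would run a full DBSCAN on real GPS fixes):

```python
import math

def dbscan(points, eps, min_pts):
    """Label 2-D points with cluster ids (0, 1, ...); -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                    # noise (may become border later)
            continue
        labels[i] = cluster                   # i is a core point
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster           # border point: claim, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:            # j is core: keep expanding
                queue.extend(nb)
        cluster += 1
    return labels

# Two tight clumps of GPS-like fixes plus one stray point:
pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(pts, eps=0.5, min_pts=2))   # -> [0, 0, 0, 1, 1, 1, -1]
```

Density-based clustering suits this task because "places" have arbitrary shapes and the stray in-transit fixes are naturally labelled as noise.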

Detection Performance

Simulated data, supervised (atomic activities given):

             TP    FP    Precision  Recall
Activity 1
  HMM       18.2  19.5    48.3%     91.0%
  KIR       18.5   2.0    90.2%     92.5%
  HPM       19.1   4.1    82.3%     95.5%
Activity 2
  HMM       17.9   4.4    80.3%     89.5%
  KIR       18.0   0.7    96.3%     90.5%
  HPM       18.8   0.4    97.9%     94.0%

Simulated data, unsupervised:

             TP    FP    Precision  Recall
Activity 1
  NBC       16.6  11.1    59.9%     80.3%
  HMM       18.3  19.8    48.0%     91.5%
  KIR       18.3   8.5    68.3%     91.5%
  HPM       19.1   5.1    78.9%     95.5%
Activity 2
  NBC       17.1  11.0    60.9%     85.5%
  HMM       17.7   3.8    82.3%     88.5%
  KIR       18.1   4.7    79.4%     90.5%
  HPM       18.5   0.5    97.4%     92.5%

Real data, unsupervised (detect occurrences of the activity routine in a long sequence of GPS "places"):

             TP    FP    Precision  Recall
  NBC        6     4      60%       60%
  HMM        8.5   5.3    61.6%     85%
  HPM        9.8   1.9    83.8%     98%

Conclusion

- Modelling permutations is hard, but not impossible
- A general way to parameterize distributions over permutations using the exponential family
- If the permutation is not observed, use the Hidden Permutation Model (HPM)
- Demonstrated better performance than models that do not exploit permutation constraints, as well as the naïve multinomial permutation model (Kirshner et al.)
- Future work
  - Generalize to permutations with repetitions
  - In supervised mode, a discriminative formulation similar to a CRF might work better