
Page 1: Inferring Mixtures of Markov Chains

Inferring Mixtures of Markov Chains

Tuğkan Batu Sudipto Guha Sampath Kannan

University of Pennsylvania

Page 2: Inferring Mixtures of Markov Chains

An Example: Browsing habits

• You read sports and cartoons. You’re equally likely to read both. You do not remember what you read last.

• You’d expect a “random” sequence:

SCSSCSSCSSCCSCCCSSSSCSC…

Page 3: Inferring Mixtures of Markov Chains

Suppose there were two readers
• I like health and entertainment.
• I always read the entertainment page first and then the health page.
• The sequence would be:

EHEHEHEHEHEHEH…

Page 4: Inferring Mixtures of Markov Chains

Two readers, one log file
• If there is one log file…
• Assume there is no correlation between us.

SECHSSECSHESCSSHCCESCHCCSESHESSHECSHCE…

Is there enough information to tell that there are two people browsing? What are they browsing? How are they browsing?

Page 5: Inferring Mixtures of Markov Chains

Clues in the stream?
• Yes, somewhat.
• H and E have a special relationship. They cannot belong to different (uncorrelated) people.
• Not clear about S and C. Suppose there were 3 uncorrelated persons…

SECHSSECSHESCSSHCCESCHCCSESHESSHECSHCE

Page 6: Inferring Mixtures of Markov Chains

Markov Chains as Stochastic Sources

[Figure: a seven-state Markov chain with labeled transition probabilities.]

Output sequence: 1 4 7 7 1 2 5 7 ...
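For concreteness, here is a minimal Python sketch of such a stochastic source; the transition probabilities and names (CHAIN, sample) are illustrative stand-ins, since the slide's diagram does not survive in this transcript.

import random

# A Markov chain as: state -> list of (next_state, probability).
# Hypothetical numbers; the slide's 7-state diagram is not recoverable.
CHAIN = {
    1: [(2, 0.4), (4, 0.6)],
    2: [(5, 0.7), (3, 0.3)],
    3: [(6, 1.0)],
    4: [(7, 0.9), (1, 0.1)],
    5: [(7, 1.0)],
    6: [(1, 0.5), (7, 0.5)],
    7: [(7, 0.2), (1, 0.8)],
}

def sample(chain, start, length):
    """Emit a state sequence such as: 1 4 7 7 1 2 5 7 ..."""
    seq, state = [start], start
    for _ in range(length - 1):
        nexts, probs = zip(*chain[state])
        state = random.choices(nexts, weights=probs)[0]
        seq.append(state)
    return seq

print(*sample(CHAIN, start=1, length=8))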

Page 7: Inferring Mixtures of Markov Chains

Markov chains on S, E, C, H

Modeled by…

[Figure: S and C form a two-state chain in which every transition has probability 1/2; H and E form a two-state chain that alternates deterministically (transition probability 1).]

Their interleaving cannot be Markovian.
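To make the mixture concrete, here is a minimal sketch that interleaves these two chains, assuming the i.i.d. coin-flip gate introduced a few slides later; each chain keeps its own current state, and the coin decides which chain advances and emits its new state.

import random

# The two chains above: S<->C is uniform; H and E alternate deterministically.
SC = {'S': {'S': 0.5, 'C': 0.5}, 'C': {'S': 0.5, 'C': 0.5}}
HE = {'H': {'E': 1.0}, 'E': {'H': 1.0}}

def interleave(chains, states, length):
    """At each step a fair coin (the gate) picks a chain; that chain
    takes one step and its new state is appended to the stream."""
    out = []
    for _ in range(length):
        i = random.randrange(len(chains))        # i.i.d. coin flip
        row = chains[i][states[i]]
        states[i] = random.choices(list(row), weights=list(row.values()))[0]
        out.append(states[i])
    return ''.join(out)

print(interleave([SC, HE], states=['S', 'H'], length=38))
# e.g. SECHSSECSHESCSSHCCESCHCCSESHESSHECSHCE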

Page 8: Inferring Mixtures of Markov Chains

Another example
• Consider network traffic logs…
• Malicious attacks were made.
• Can you tell apart the pattern of attack from the log?
• Intrusion detection, log validation, etc.

Page 9: Inferring Mixtures of Markov Chains

Yet another example
• Consider a genome sequence.
• Each genome sequence has “coding” regions and “non-coding” regions.
  – (Separate) Markov chains (usually of higher order) are used to model these two regions.
• Can we predict anything about such regions?

Page 10: Inferring Mixtures of Markov Chains

The origins of the problem
• Two or more probabilistic processes.
• We are observing interleaved behavior.
• We do not know which state belongs to which process – cold start.

Page 11: Inferring Mixtures of Markov Chains

The Problem

[Figure: MC1 emits … 1 3 2 5 1 4 and MC2 emits … 2 6 7 3 1; interleaved, they produce … 2 6 1 3 2 7 5 3 1 4 1.]

Observe … 2 6 1 3 2 7 5 3 1 4 1 …
Infer: MC1 & MC2.

Page 12: Inferring Mixtures of Markov Chains

How About?

[Figure: MC1 (… 1 3 2 5 1 4) and MC2 (… 2 6 7 3 1) feed a gate function that produces the observed stream.]

How powerful is this function? Clearly, a powerful function can produce arbitrary sequences…

Page 13: Inferring Mixtures of Markov Chains

Power of the Gate function
• A powerful gate function can encode powerful models: hidden or hierarchical Markov models…
• Assume a simple (k-way) coin flip for now.

Page 14: Inferring Mixtures of Markov Chains

Streaming Model(s)

... 10111010000110100111010010101101100111011100001101001010010...

Processor

• Processor memory is small (polylog?) compared to input size.
• One or more passes, but data is read left-to-right in each pass.
• Input order is adversarial or “natural”.

Page 15: Inferring Mixtures of Markov Chains

For our problem we assume:

• Stream is polynomially long in the number of states of each Markov chain (a long stream may be needed).

• Nonzero probabilities are bounded away from 0.

• Space available is some small polynomial in #states, O(n^6).

Page 16: Inferring Mixtures of Markov Chains

Related Work• [Freund & Ron] Considered gate function to be

a “special” Markov chain and individual processes as distribution.

• Mixture Analysis [Duda & Hart]• Mixture of Bayesian Networks, DAG models

[Thiesson et al.]• Mixture of Gaussians [Dasgupta, Arora &

Kannan] • [Abe & Warmuth] complexity of learning HMMs• Hierarchical Markov Models [Kervrann & Heitz]

Page 17: Inferring Mixtures of Markov Chains

The old example
• No “HH”.
• No “HSH”, but “HEH” occurs.
• The logic: if E were in a different chain than H, the E in “HEH” would be an interleaved symbol, so H’s chain would have gone H→H directly and we should also see “HH”.

SECHSSECSHEHSECSSHCCESCHCCSESHESSHECSH
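This check is easy to run on the log itself; a small sketch counting the relevant patterns (with overlaps):

log = "SECHSSECSHEHSECSSHCCESCHCCSESHESSHECSH"
for pat in ("HH", "HSH", "HEH"):
    count = sum(log[i:i + len(pat)] == pat for i in range(len(log)))
    print(pat, count)   # "HH" and "HSH" never occur; "HEH" does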

Page 18: Inferring Mixtures of Markov Chains

A few definitions
• T[u]: probability of …u…
• T[uv]: probability of …uv…
• T[uv]/T[u] = probability of v right after u
• S[u]: stationary probability of u (in its chain)
• α_u: mixing probability of the chain of u

Remark. We have approximations to T and S.
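A sketch of how the approximations to T could be computed as simple frequency counts over the stream (the paper's error analysis is omitted here; estimate_T is a hypothetical helper name):

from collections import Counter

def estimate_T(stream):
    """Empirical T[u] (single-state) and T[uv] (adjacent-pair) frequencies."""
    n = len(stream)
    T1 = {u: c / n for u, c in Counter(stream).items()}
    T2 = {p: c / (n - 1) for p, c in Counter(zip(stream, stream[1:])).items()}
    return T1, T2

T1, T2 = estimate_T("SECHSSECSHESCSSHCCESCHCCSESHESSHECSHCE")
print(T1['S'], T2.get(('H', 'E'), 0.0))   # T[S] and T[HE]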

Page 19: Inferring Mixtures of Markov Chains

Assumption

Assume that the stream is generated by Markov chains (their number unknown to us) that have disjoint state spaces.

Remark. Once we figure out the state spaces, the rest is simple.

Page 20: Inferring Mixtures of Markov Chains

Inference Idea 1

• Warm-up: if T[uv] = 0, then u and v are in the same chain.
• Idea: if u, v are in different chains, v will follow u with frequency α_v·S(v).

Lemma. If T[uv] ≠ T[u]·T[v], then u, v are in the same chain.

Proof. If u, v are in different chains, then T[uv] = T[u]·α_v·S(v) = T[u]·T[v].

• So, in the first phase, we grow components based on this rule.
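A sketch of this first phase in that spirit, with a tolerance eps standing in for the paper's error bounds; phase1 is a hypothetical helper, and a real run needs a much longer stream than the toy example.

from collections import Counter

def phase1(stream, eps=0.05):
    """Merge u and v whenever empirical T[uv] deviates from T[u]*T[v]
    (by the Lemma, such u and v must share a chain)."""
    n = len(stream)
    T1 = {u: c / n for u, c in Counter(stream).items()}
    T2 = {p: c / (n - 1) for p, c in Counter(zip(stream, stream[1:])).items()}

    parent = {u: u for u in T1}            # union-find over states
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u in T1:
        for v in T1:
            if abs(T2.get((u, v), 0.0) - T1[u] * T1[v]) > eps:
                parent[find(u)] = find(v)

    components = {}
    for u in T1:
        components.setdefault(find(u), set()).add(u)
    return list(components.values())

# Toy stream; far too short for reliable merging (see the assumptions slide).
print(phase1("SECHSSECSHESCSSHCCESCHCCSESHESSHECSHCE"))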

Page 21: Inferring Mixtures of Markov Chains

What do we have after Idea 1?

• If we have not “resolved” u & v,T[uv]=T[u] T[v].

• Either u,v in different chain, orMuv

= S(v)so thatT[uv]=T[u] vMuv=T[u] vS(v)=T[u]T[v].

Page 22: Inferring Mixtures of Markov Chains

End of Phase 1
• We have a set of component vertices.
• But further collapsing is possible.

[Figure: component vertices for the browsing example, e.g. {S, C} and {H, E}, with the S–C transition probabilities of 1/2.]

Page 23: Inferring Mixtures of Markov Chains

Inference Idea 2
• Consider u, v already in the same component, and z in a separate component. State z is in the same chain if and only if T[uzv] = T[u]·T[z]·T[v].

Now we can complete collapsing the components.
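A sketch of this second test; the triple frequency T[uzv] is estimated just like the pair counts, eps again hides the real error analysis, and z_joins is a hypothetical helper name.

from collections import Counter

def z_joins(stream, u, z, v, eps=0.01):
    """Idea 2: with u and v already in one component and z separate,
    z belongs to their chain iff T[uzv] ~ T[u]*T[z]*T[v]."""
    n = len(stream)
    T1 = {s: c / n for s, c in Counter(stream).items()}
    T3 = Counter(zip(stream, stream[1:], stream[2:]))
    t_uzv = T3.get((u, z, v), 0) / (n - 2)
    return abs(t_uzv - T1[u] * T1[z] * T1[v]) <= eps

# e.g. z_joins(long_stream, 'S', 'H', 'C') tests whether H joins {S, C}.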

Page 24: Inferring Mixtures of Markov Chains

At the end
• Either we will resolve all edges incident to all chains, or
• we have some singleton components such that, for each pair u, v: T[u]·T[v] = T[uv], equivalently M_uv = S(v).

Hence, the next-state distribution (for any state) is S.

Page 25: Inferring Mixtures of Markov Chains

The Old Example

[Figure: the S–C chain with transition probabilities 1/2, and the H–E chain.]

The components of S and C will be left unmerged.

This is no bug!

Page 26: Inferring Mixtures of Markov Chains

More Precisely
• If we have two competing hypotheses, then the likelihood of observing the string is exactly equal under both hypotheses.
• In other words, we have two competing models which are equivalent.

Page 27: Inferring Mixtures of Markov Chains

More General Mixing Processes

• Up to now: i.i.d. coin flips for mixing.
• We can handle more: the next chain may be chosen depending on the last output (i.e., each state has its own “next-chain” distribution); see the sketch below.
• E.g., Web logs: at some pages you click sooner; at others, you read before clicking.
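A minimal sketch of that generalization, reusing the browsing chains from before; the per-state gate weights in GATE are made up for illustration.

import random

SC = {'S': {'S': 0.5, 'C': 0.5}, 'C': {'S': 0.5, 'C': 0.5}}
HE = {'H': {'E': 1.0}, 'E': {'H': 1.0}}

# Hypothetical "next-chain" distributions indexed by the last output:
# e.g., after H (a page you read) the gate tends to stay with H's chain.
GATE = {'S': [0.6, 0.4], 'C': [0.5, 0.5], 'H': [0.2, 0.8], 'E': [0.5, 0.5]}

def interleave_gated(chains, states, length, gate):
    """Like the coin-flip mixture, but the chain advanced at each step is
    drawn from a distribution that depends on the last emitted state."""
    out, last = [], None
    for _ in range(length):
        weights = gate.get(last)        # None on the first step -> uniform
        i = random.choices(range(len(chains)), weights=weights)[0]
        row = chains[i][states[i]]
        states[i] = random.choices(list(row), weights=list(row.values()))[0]
        last = states[i]
        out.append(last)
    return ''.join(out)

print(interleave_gated([SC, HE], ['S', 'H'], 40, GATE))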

Page 28: Inferring Mixtures of Markov Chains

Intersecting State Sets

We need two assumptions:
1. There are two Markov chains.
2. There exists a state w that belongs to exactly one chain such that, for all v, M_wv > S(v) or M_wv = 0.

• Using analogous inference rules and state w as a reference point, we can infer the underlying Markov chains.

Page 29: Inferring Mixtures of Markov Chains

Open Questions
• Remove/relax the assumptions for intersecting state spaces.
• Hardness results?
• Reduce the stream length? Sampling more frequently loses independence of the samples… is there a more sophisticated argument?
• Some form of “hidden” Markov model? Rather than seeing a stream of states, we would see a stream of a function of the states. Difficulty: identical labels for distinct states.

CAUTION: inferring a single hidden Markov model is hard.