Activity recognition
through MLNs
Jason Filippou
CVSS Summer Seminar series
July 3rd 2013
Activity recognition through MLNs
• Joint work with Complex Event Recognition (CER) Lab in
NCSR “Demokritos”, Athens, Greece
Traditional logical reasoning for activity recognition
• Pros:
  • Can model events of arbitrary complexity (FOL expressiveness)
    • Complexity = multiple actors, temporal complexity and persistence (inertia)
  • Rules are easy to write
    • Facilitates interaction between developer & domain expert
  • Formal semantics
    • Satisfiability of a rule
    • Deterministic operators (⇒, ∃, …)
  • Popular implementations
    • Efficient Prolog systems (SWI, YAP, …)
• Cons:
  • Cannot handle uncertainty (facts / rules are 0-1)
Goal: Ameliorate
the cons without
losing the pros
Uncertainty in activity recognition
• Uncertainty in the input stream
• Human detection confidence
• Occlusions
• Identity maintenance / Tracking
• Uncertainty in the rules for recognizing events
• E.g. a rule that dictates that two people are “moving” along together
if they are moving in parallel might not always be true
• Traditional logic cannot handle those
• This paper deals with the second type of uncertainty
• [4] deals with the first, [5] deals with both
Today’s paper: DEC-MLN
• DEC-MLN: Discrete Event Calculus based on Markov Logic Networks.
• Format of the rest of the presentation:
1. The Event Calculus
2. The CAVIAR dataset
3. Markov Logic Networks
4. DEC-MLN approach
5. Pros / Cons of the approach & pointers for further discussion
Discrete? Event Calculus? Markov Logic Networks?
Will discuss all of those…
The Event Calculus
The Event Calculus
• A formal logical language for representing events and
their effects
• Introduced by Kowalski and Sergot [1]
• Core constructs:
• Fluents (F) which might take different values over time (F=V)
• Events E, considered instantaneous
• Time model T (Discrete? Continuous? Smooth?)
The “D” in
DEC-MLN
EC (contd.)
• Domain-independent axioms govern when a fluent F has
a value V
• Two axioms mainly studied in the literature
• holdsAt(F = V, T): fluent F has value V at time T
• holdsFor(F = V, I): I is the union of time intervals during which F has value V continuously.
EC (contd.)
• Domain-dependent predicates govern the initiation and termination of value assignments to fluents:
  initiatedAt(F = V, T) ← happens(E, T), Conditions[T]
• Conditions[T] is a set of further conditions that have to be satisfied by the actors involved in either F or E
  • Will provide concrete examples
• terminatedAt/2 rules are of the same form
Law of inertia
• In EC, a fluent holds its value V once initiated and until
termination.
• Concretely:
holdsAt(F = V, T + 1) ← initiatedAt(F = V, T)
holdsAt(F = V, T + 1) ← holdsAt(F = V, T), not terminatedAt(F = V, T)
• This temporal persistence (inertia) is at the core of EC
• Different approaches model it in different ways
Inertia!
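The two inertia axioms translate directly into a forward pass over discrete time. A minimal Python sketch (the helper names and the dict-of-sets encoding of `initiated` / `terminated` facts are illustrative, not from the paper):

```python
# Discrete EC inertia: a (fluent, value) pair holds at T+1 if it was initiated
# at T, or if it held at T and was not terminated at T.
# `initiated` / `terminated` map each time point to the set of (fluent, value)
# pairs whose initiation/termination conditions fired there.

def holds_at(initiated, terminated, horizon):
    """holds[t]: set of (fluent, value) pairs holding at time t (nothing holds at t=0)."""
    holds = {0: set()}
    for t in range(horizon):
        # holdsAt(F=V, T+1) <- initiatedAt(F=V, T)
        # holdsAt(F=V, T+1) <- holdsAt(F=V, T), not terminatedAt(F=V, T)
        holds[t + 1] = initiated.get(t, set()) | (holds[t] - terminated.get(t, set()))
    return holds

init = {1: {("fighting(p1,p2)", True)}}
term = {4: {("fighting(p1,p2)", True)}}
h = holds_at(init, term, horizon=6)
# the fluent holds from t=2 up to and including t=4, then drops at t=5
```

Note that a single initiation fact keeps the fluent alive indefinitely until an explicit termination fires: this is exactly the temporal persistence the slide calls inertia.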
EC take-home messages
• EC induces:
  1. A separation between (1) domain-independent axioms, encoded in terms of holdsAt, and (2) domain knowledge, encoded in terms of initiatedAt and terminatedAt
    • Vision: event hierarchy (low-level and high-level events)
  2. A natural characterization of temporal inertia
    • An event holds until evidence to the contrary suggests that it shouldn’t.
    • Vision: human detection, tracking…
CAVIAR
(Context-Aware Vision using Image-
based Activity Recognition)
The CAVIAR dataset
• Two sub-sets
• INRIA
• Portuguese shopping center
• This work focuses on INRIA
• 28 (staged) surveillance videos, 26419 frames.
• Annotations for coordinates, trajectories, events
• Even identity maintenance has been annotated
• One can always discard unwanted annotations depending on what
kind of inference / learning is to be performed
CAVIAR-INRIA
CAVIAR-INRIA (contd.)
[Sample frames, annotated with high-level (group) events and low-level (atomic) events]
CAVIAR & EC
• CAVIAR naturally induces an activity hierarchy
• Annotations for:
• LLE: walking, running, inactive, active, abrupt
• HLE: fighting, meeting, moving, leaving_object
• Very compatible with Event Calculus formalism!
• HLE → binary fluents (true/false)
• LLE → events (logical facts)
• In this work, the focus is on HLE recognition (group
activities) based on LLE + locations / orientations
• LLE recognition also possible
• Pose identification / learning…
EC rule examples in CAVIAR
• Fights are initiated by a person performing the “abrupt” activity near a non-inactive person:
  initiatedAt(fighting(P1, P2) = true, T) ←
    happens(abrupt(P1), T), close(P1, P2, T), ¬happens(inactive(P2), T)
• Fights are terminated when a person walks away from the other person:
  terminatedAt(fighting(P1, P2) = true, T) ←
    happens(walking(P2), T), ¬close(P1, P2, T)
• Rule heads are HLE (fluents); rule bodies consist of LLE (events) and Conditions[T].
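The two fighting rules can be evaluated frame by frame against the LLE stream. A sketch under an assumed encoding (sets of ground facts per frame; person ids and frame numbers are made up for illustration):

```python
# The fighting rules above, over toy per-frame evidence:
# happens_t = set of ground LLE facts at frame t, close_t = pairs of nearby person ids.

def initiated_fighting(happens_t, close_t, p1, p2):
    # happens(abrupt(P1), T), close(P1, P2, T), not happens(inactive(P2), T)
    return (("abrupt", p1) in happens_t
            and (p1, p2) in close_t
            and ("inactive", p2) not in happens_t)

def terminated_fighting(happens_t, close_t, p1, p2):
    # happens(walking(P2), T), not close(P1, P2, T)
    return ("walking", p2) in happens_t and (p1, p2) not in close_t

# p1 moves abruptly near an active p2 -> fight initiated
assert initiated_fighting({("abrupt", "p1"), ("active", "p2")}, {("p1", "p2")}, "p1", "p2")
# later, p2 walks and is no longer close -> fight terminated
assert terminated_fighting({("walking", "p2")}, set(), "p1", "p2")
```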
CAVIAR Pros / Cons
• Pros:
  • Suitable for modeling group activities (High-Level Events, HLE) in conjunction with atomic activities (Low-Level Events, LLE)
  • Annotations for very different elements
• Cons:
  • An “easy” dataset for many reasons:
    • Very few occlusions (person-person, person-object); NOT the case for the Portuguese shopping center sub-dataset
    • Staged activities
    • Controlled lighting, static camera / background (natural properties of indoor surveillance)
Markov Logic Networks
(in 5 slides…)
Markov Logic Networks
• Perhaps the most prevalent framework for merging FOL
with probabilistic inference
• From (binary) possible worlds to (soft) probable worlds
• Underlying representation: Ground Markov Random Field
(MRF)
• A set of FOL rules and constants are used to ground the MRF
• Nodes = binary ground facts (can be true or false)
• Every rule is translated to a clique in the network.
• Main idea: Rules are accompanied by weights
• Possible world = assignment to all variables (ground facts).
Grounding example
• Assume the following rule set drawn from a parking lot surveillance scenario:
1. A car almost certainly has someone driving it.
   ∀x (Car(x) ⇒ ∃y Drives(y, x))   (w1 = 4.0)
   Clausal form: ¬Car(x) ∨ Drives(y, x)
2. Vehicles parked next to cars are usually also cars.
   ∀x, y (Car(x) ∧ ParkedNextTo(x, y) ⇒ Car(y))   (w2 = 2.5)
   Clausal form: ¬(Car(x) ∧ ParkedNextTo(x, y)) ∨ Car(y)
• We need to translate the formulae into clausal form before grounding.
Grounding example (contd.)
• Combined with a constant set C = {John, Nissan}:
[Ground MRF over nodes Car(J), Car(N), PNT(J, N), PNT(N, J), Dr(J, J), Dr(J, N), Dr(N, J), Dr(N, N); highlighted: all ground facts participating in rule 2, ¬(Car(x) ∧ ParkedNextTo(x, y)) ∨ Car(y)]
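Grounding is mechanical: every predicate is instantiated with every combination of constants. A small sketch for the parking-lot signature above (atom encoding as tuples is my own choice):

```python
from itertools import product

constants = ["John", "Nissan"]

# Ground atoms of the signature Car/1, Drives/2, ParkedNextTo/2 -> network nodes
atoms = [("Car", c) for c in constants]
atoms += [(p, x, y) for p in ("Drives", "ParkedNextTo")
          for x, y in product(constants, repeat=2)]

# All groundings of rule 2 in clausal form:
#   not(Car(x) and ParkedNextTo(x, y)) or Car(y)   -> one clique each
rule2_groundings = [((("Car", x), ("ParkedNextTo", x, y)), ("Car", y))
                    for x, y in product(constants, repeat=2)]

print(len(atoms))             # 10 ground atoms (2 + 4 + 4)
print(len(rule2_groundings))  # 4 ground clauses
```

Even two constants already yield 10 nodes; the grounding blows up polynomially in the number of constants and exponentially in predicate arity, which is why MLN grounding is a practical bottleneck.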
Joint probability
• The joint probability distribution for X = x is equal to:
  P(X = x) = (1/Z) exp( Σ_{i=1..F} w_i · n_i(x) )
• w_i: real number representing a rule weight
  • Can set to “infinite” (very large) for rules that capture certainties of the environment
• n_i(x): #times rule i is true in x
  • Lots of groundings of a high-weight rule add to the probability of x.
  • This is the model feature associated with every clique
• Z = Σ_{x'∈X} exp( Σ_{i=1..F} w_i · n_i(x') ) is a normalization constant that ensures P is an actual probability distribution.
  • 2^|X| entries…
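The formula can be evaluated brute-force for a tiny network, which also makes the 2^|X| blow-up in Z concrete. A sketch with an illustrative 3-atom rule set (one grounding per rule; not the paper's model):

```python
from itertools import product
from math import exp

atoms = ["Car_J", "Car_N", "PNT_J_N"]
rules = [
    # w = 2.5: not (Car(J) and ParkedNextTo(J, N)) or Car(N)
    (2.5, lambda w: int(not (w["Car_J"] and w["PNT_J_N"]) or w["Car_N"])),
    # hypothetical unit clause nudging Car(J) towards true
    (1.0, lambda w: int(w["Car_J"])),
]

worlds = [dict(zip(atoms, bits)) for bits in product([False, True], repeat=len(atoms))]

def score(w):
    # exp(sum_i w_i * n_i(x)), the unnormalized weight of world x
    return exp(sum(wt * n(w) for wt, n in rules))

Z = sum(score(w) for w in worlds)  # sums over all 2^|X| worlds: the intractable part
prob = {tuple(w[a] for a in atoms): score(w) / Z for w in worlds}
assert abs(sum(prob.values()) - 1.0) < 1e-9  # P is a proper distribution
```

Worlds that satisfy more high-weight groundings get exponentially more mass, but no world has probability zero: that is the "soft" relaxation of hard FOL constraints.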
MLN Inference
• Direct computation of joint distribution P is intractable
  • Often resort to approximations such as sampling (Gibbs Sampling / MC-SAT)
• But sometimes the joint isn’t even what we want!
• Maximum A Posteriori (MAP) inference (argmax_x P(X = x))
  • Approximate (MaxWalkSAT)
  • Exact (AND/OR Branch & Bound)
• Marginals P(X = x) and conditionals P(X = x | E = e)
  • Sampling
  • The possible presence of evidence E prunes out areas of the network and makes sampling faster
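MAP inference only needs the argmax of the unnormalized score, so Z never has to be computed; for a tiny domain it can even be done exactly by enumeration, with evidence clamping showing the pruning effect. Atom and rule names below are illustrative, not from the paper:

```python
from itertools import product

# Exact MAP by enumeration (feasible only for tiny domains):
# argmax over worlds of sum_i w_i * n_i(x), with evidence atoms clamped.
atoms = ["Car_J", "Car_N", "PNT_J_N"]
rules = [(2.5, lambda w: int(not (w["Car_J"] and w["PNT_J_N"]) or w["Car_N"]))]
evidence = {"Car_J": True, "PNT_J_N": True}  # observed ground facts

def map_world():
    best, best_score = None, float("-inf")
    for bits in product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, bits))
        if any(w[a] != v for a, v in evidence.items()):
            continue  # evidence prunes inconsistent worlds, shrinking the search
        s = sum(wt * n(w) for wt, n in rules)
        if s > best_score:
            best, best_score = w, s
    return best

# the soft rule pushes Car(Nissan) to true in the most probable world
assert map_world()["Car_N"] is True
```

Real systems replace the exhaustive loop with MaxWalkSAT-style local search or branch-and-bound, but the objective is the same weighted-clause score.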
DEC-MLN
(Putting it all together)
EC-LP: The Baseline
• To evaluate their EC dialect, the authors compare against a Prolog-based EC dialect described in [2] (EC-LP).
• The axioms (holdsAt) and domain-dependent predicates (initiatedAt, terminatedAt) in that dialect have the form already discussed.
• As a Prolog-based implementation, that dialect employs the Closed World Assumption (CWA)
CWA???
Open vs Closed World Assumption
• MLNs employ First-Order Logic (FOL), which is more
general than Logic Programming (LP), in that it makes the
Open World Assumption (OWA) instead of the more strict
Closed World Assumption (CWA)
DEC-MLN ⇒ FOL ⇒ OWA
EC-LP ⇒ LP ⇒ CWA
OWA vs CWA (contd.)
• In the CWA, we only believe what is known to us, and everything else is perceived false.
• Example:

CWA:
Knowledge Base          Query                   Response
in(Room_4424, Jason)    ?- in(Room_4424, Paul)  NO

OWA:
Knowledge Base          Query                   Response
in(Room_4424, Jason)    ?- in(Room_4424, Paul)  UNKNOWN
OWA and EC
• We discussed earlier the law of inertia in EC
• In EC-LP, the law is straightforward, because Prolog employs the CWA
  • If the initiatedAt and terminatedAt pre-conditions are not satisfied by the evidence, then there is no fluent initiation / termination.
• However, DEC-MLN employs FOL (not LP), which follows the OWA!
  • Fluents may be initiated/terminated by irrelevant events, causing the loss of inertia
• The solution: circumscription via predicate completion
  • Intuitively: introduce equivalences (⇔) wherever we have implications (⇐)
  • “Inject” the CWA into FOL, by ensuring that pre-conditions have a 1-1 relationship with conclusions.
Predicate completion
Σ = {
  initiatedAt(meet(P1, P2) = true, T) ←
    happens(active(P1), T),
    ¬happens(running(P2), T), close(P1, P2, T)

  initiatedAt(meet(P1, P2) = true, T) ←
    happens(inactive(P1), T),
    ¬happens(running(P2), T),
    ¬happens(active(P2), T), close(P1, P2, T)
}
Σ' = {
  initiatedAt(meet(P1, P2) = true, T) →
    (entire body of first rule above) ∨ (entire body of second rule above)
}
• By splitting every equivalence (⇔) into sets Σ and Σ’, we can add weights to rules in either set and control how an HLE is recognized (with what probability)
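Predicate completion is a purely syntactic transformation: all Σ rules sharing a head induce one reverse implication in Σ’ whose body is the disjunction of the original bodies. A minimal sketch, with rules as plain strings for illustration only:

```python
# Sketch of predicate completion: "head <- body_i" rules in Sigma also induce
# "head -> body_1 v ... v body_n" in Sigma', injecting CWA-style semantics into FOL.
def complete(head, bodies):
    sigma = [f"{head} <- {body}" for body in bodies]
    sigma_prime = f"{head} -> " + " v ".join(f"({body})" for body in bodies)
    return sigma, sigma_prime

sigma, sigma_prime = complete(
    "initiatedAt(meet(P1,P2)=true, T)",
    [
        "happens(active(P1),T), not happens(running(P2),T), close(P1,P2,T)",
        "happens(inactive(P1),T), not happens(running(P2),T), "
        "not happens(active(P2),T), close(P1,P2,T)",
    ],
)
```

Together, Σ and Σ’ encode the equivalence head ⇔ (body_1 ∨ … ∨ body_n); keeping them as two separate weighted rule sets is what lets the next slides assign them different weights.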
Different cases
• Σ and Σ’ have “infinite” weights:
• HLEs are initiated and terminated with absolute certainty, and
inertia is retained.
[Plot: HLE probability vs. time — jumps between 0 and 1 at each initiation (t_init1, t_init2) and termination (t_term1, t_term2) point]
Different cases
• Σ has non-infinite weights, Σ’ has infinite weights.
• Given the preconditions for initiation and termination, the result
(head) may or may not hold!
• Hence HLEs are initiated and terminated with a degree of
uncertainty but inertia is retained.
[Plot: HLE probability vs. time — jumps to intermediate levels (e.g. 0.4, 0.6) at initiation/termination points; probability stays flat between them (inertia retained)]
Different cases
• Rules in Σ have infinite weights, Σ’ contains “soft” rules
• HLE initiated/terminated with certainty, but inertia is gradually lost
• By the implication in Σ’, initiation / termination conditions for the HLE might be fired in the presence of evidence that is irrelevant w.r.t. the rules
[Plot: HLE probability vs. time — certain initiation/termination (jumps between 0 and 1), but probability decays between events as inertia is gradually lost]
Different cases
• Both Σ and Σ’ are “soft” sets
• A combination of cases (2) and (3) occurs.
[Plot: HLE probability vs. time — uncertain initiation/termination (intermediate levels such as 0.4, 0.6) combined with gradual inertia loss]
Experimental results
• Input:
  • All 26419 frames of CAVIAR
  • LLE annotations in the form of happens ground facts
  • People’s coordinates are used to compute close-ness of people
  • People’s poses are encoded in ground orientation predicates
  • The frames at which a person enters or exits the scene, in the form of corresponding ground predicates enter and exit
• Output:
  • Sequence of ground holdsAt(F = V, T) predicates, indicating that F = V at time T
  • A detection probability of 0.5 or above signifies a positive.
• Comparison to HLE annotation makes evaluation possible
• Evaluation: Precision, Recall, F-measure
Experimental results (contd.)
• Compared against EC-LP for the HLE “meeting” over all 28 videos:
  • DEC-MLNa: Only rules in Σ are soft-constrained
  • DEC-MLNb: Both Σ and Σ’ are soft-constrained

Method      TP    FP    FN   Precision  Recall
EC-LP       3099  2258  525  0.578      0.855
DEC-MLNa    3048  1762  576  0.633      0.841
DEC-MLNb    3048  1154  576  0.725      0.841

• What does the improvement imply?
  • DEC-MLNa: Adding a small weight to the rarely fired 2nd initiation rule for meet reduces #FP (see paper for details)
  • DEC-MLNb: Allowing the inertia to decrease enables the meet HLE’s probability to decrease faster in cases of co-occurrence with move (see paper for details)
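The reported metrics follow directly from the TP/FP/FN counts in the table; recomputing them is a quick sanity check (values match the slide up to rounding):

```python
# Precision / Recall / F-measure from the TP/FP/FN counts in the table above.
def prf(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of detections that are correct
    recall = tp / (tp + fn)      # fraction of annotated positives detected
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

results = {
    "EC-LP":    (3099, 2258, 525),
    "DEC-MLNa": (3048, 1762, 576),
    "DEC-MLNb": (3048, 1154, 576),
}
for name, (tp, fp, fn) in results.items():
    p, r, f = prf(tp, fp, fn)
    print(f"{name}: P={p:.3f} R={r:.3f} F={f:.3f}")
```

Note that recall only drops slightly (a few TPs become FNs) while FP counts fall sharply, so the precision gain dominates the F-measure.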
Experimental results - Conclusions
• Rules are imperfect
• According to the human annotation, meeting may occur in cases
which are not captured by our rules
• In this system, the rules were found manually, i.e. there was no structure learning
• But even with learning, one cannot achieve perfect F-measure, because
that would overfit extremely.
• Therefore, rules will always be imperfect!
• DEC-MLN allows for a relaxation of how much we trust
our event modeling rules.
Pros
• First probabilistic dialect of the EC
• Good theoretical insights on how the Event Calculus can
be translated into MLNs; implications about inference
• Lift over EC-LP baseline
• Interesting explanation of how imperfect rules can be regularized to compensate for their own inadequacies
• Per-frame approach
• Applicable online
Cons
• Hard to motivate to Vision people
• Rules might appear simplistic
• The notion of rule uncertainty is prevalent in SRL, but not all activity
recognition in Vision is rule-based
• Lackluster experimental evaluation
• Compare to 3rd paper, which experiments on the same dataset and
against the same baseline
• Per-frame approach
• Errors accumulate
Input stream uncertainty
What about uncertainty in the input stream? Can you handle that with DEC-MLN?
• Yes!
  • Assigning weights to Σ only allows emulation of the approach of [4], which deals with input stream uncertainty through ProbLog
  • Also see [5] for an MLN-based approach which incorporates input stream uncertainty by using “observational variables” (dummy rules)
Discussion
• Logic-based expressiveness
• Integration with state-of-the-art probabilistic modeling systems
• Alchemy
• YAP Prolog / ProbLog
• Is it important enough?
• I think so, but LP has fallen out of favor probably because of the semantic
web’s inadequacies…
• Important directions: weight / structure learning from data
• Part of current work in NCSR
• Structure learning challenging because of OWA / CWA semantics
• What can be treated “inertially” in Vision?
• Identity maintenance?
• Activity recognition in the presence of occlusions?
References
1. R. Kowalski and M. Sergot: A Logic-based Calculus of Events. New Generation Computing 4: 67–95, 1986.
2. A. Artikis, M. Sergot and G. Paliouras: A Logic Programming Approach to Activity Recognition. ACM International Workshop on Events in Multimedia, 2010.
3. V. Shet, J. Neumann, R. Visvanathan and L. S. Davis: Bilattice-based Logical Reasoning for Human Detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
4. A. Skarlatidis, A. Artikis, J. Filippou and G. Paliouras: A Probabilistic Logic Programming Event Calculus. Theory and Practice of Logic Programming, Special Issue on Probability, Logic & Learning, 2013.
5. S. Tran and L. S. Davis: Event Modeling and Recognition using Markov Logic Networks. European Conference on Computer Vision (ECCV), 2008.
6. M. Richardson and P. Domingos: Markov Logic Networks. Machine Learning, 62: 107–136, 2006.