
Supplementary Material for Learning Triadic Belief Dynamics in Nonverbal Communication from Videos

Lifeng Fan*, Shuwen Qiu*, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu

UCLA Center for Vision, Cognition, Learning, and Autonomy
{lfan, s.qiu, z.zheng}@ucla.edu, {tao.gao, sczhu}@stat.ucla.edu, [email protected]

https://github.com/LifengFan/Triadic-Belief-Dynamics

1. Beam Search Algorithm

Algorithm 1 details the dynamic-programming beam search used to infer communication events.

2. Dataset

Fig. 1 showcases some snapshots from our dataset. Every three rows correspond to one long video: the first row is the third-person view, and the other two rows are the first-person views of the two agents. The first video is mainly about Joint Attention. The second video includes No Communication, Attention Following, and Joint Attention; it also involves a second-order false belief. The third video includes Attention Following. The fourth video includes No Communication.

3. Surveys for Human Studies

Below are the links to the questionnaires for the human subject studies in the keyframe-based video summary task.

• Group 1: https://5minds.typeform.com/to/dh782Z

• Group 2: https://5minds.typeform.com/to/T3hGhN

• Group 3: https://5minds.typeform.com/to/wovakS

• Group 4: https://5mind.typeform.com/to/SpOMu3

4. Additional Quantitative Results

4.1. ROC curve

Fig. 3 shows the ROC curves for all five minds in the belief dynamics prediction task. The numeric labels of the belief dynamics denote the following categories: 0–occur, 1–disappear, 2–update, and 3–null.
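The supplement does not state how the four-category curves in Fig. 3 are computed. One common reading is one-vs-rest: each category in turn is treated as the positive class, and a threshold is swept over that category's predicted probability. The plain-Python sketch below assumes that reading; the input layout `probs[i][c]` (predicted probability of category c for sample i) and the function names are hypothetical, and the computation would be repeated once per mind. scikit-learn's `roc_curve`, given binarized labels (`y == c`) and the class-c scores, yields an equivalent curve.

```python
from typing import Dict, List, Sequence, Tuple

# Label encoding of belief dynamics, as stated in the text.
BELIEF_DYNAMICS = {0: "occur", 1: "disappear", 2: "update", 3: "null"}


def roc_points(labels: Sequence[int], scores: Sequence[float], positive: int) -> List[Tuple[float, float]]:
    """One-vs-rest ROC points (FPR, TPR) for class `positive`.

    Assumes both positive and negative samples are present.
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    # Visit samples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for k, i in enumerate(order):
        if labels[i] == positive:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold: emit a point only after the last sample of a tie group.
        if k == len(order) - 1 or scores[order[k + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def one_vs_rest_auc(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Per-category AUC; probs[i][c] is the predicted probability of class c for sample i."""
    return {
        name: auc(roc_points(labels, [row[c] for row in probs], positive=c))
        for c, name in BELIEF_DYNAMICS.items()
    }
```

Under this reading, each panel of Fig. 3 would hold one such curve per category for a given mind.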

5. Additional Qualitative Results

Fig. 2 shows additional qualitative results for the keyframe-based video summary task.

*Lifeng Fan and Shuwen Qiu contributed equally.

Algorithm 1: Infer events via dynamic programming beam search

Input: Extracted feature set $\Phi$, constructed attention graph $G$, the set of interactive segment proposals $V_s$, and pre-trained likelihood $p(e_j \mid \Phi_{\Lambda_j}, G_{\Lambda_j})$.
Output: Communication events $V_e$.
Initialization: $V_e = \emptyset$, $B = \{V_e, p = 0\}$, $m$, $n$.

while True do
    $B' = \emptyset$
    for $\{V_e, p\} \in B$ do
        /* Propose the next m possible events (both the event segment and the event label). */
        $\{e_i\} = \mathrm{Next}(V_s, V_e, m)$
        if $\{e_i\}$ is not empty then
            for each proposed $e_i$ do
                /* Calculate the posterior probability of $V_e$ via dynamic programming. */
                $p(V_e \mid \Phi, G) = \mathrm{DP}(V_e, p, e_i, \Phi, G)$
                $V_e = V_e \cup \{e_i\}$
                $B' = B' \cup \{V_e, p\}$
            end
        else
            $B' = B' \cup \{V_e, p\}$
        end
    end
    if $B' == B$ then
        return $V_e = \mathrm{Best}(B, 1)$
    else
        /* Select the n best event parses with the highest posterior probability from all candidates. */
        $D = \mathrm{Best}(B', n)$
        $B = D$
    end
end
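A minimal Python sketch of the same beam-search loop, under loudly stated assumptions: events are hypothetical (start_frame, end_frame, label) triples; the hypothetical `next_events` stands in for Next by returning the first m proposals that start after the current parse ends; and the DP posterior update is replaced by summing per-event log-likelihoods from a caller-supplied `log_likelihood` function. The sketch also builds one new candidate per proposed event, which is our reading of the step that adds $e_i$ to $V_e$. This is an illustration, not the authors' implementation; the linked repository is the place to check their code.

```python
from typing import Callable, List, Tuple

# Hypothetical event layout: (start_frame, end_frame, label).
Event = Tuple[int, int, str]
# A beam entry pairs a partial parse V_e with its accumulated log-score p.
Entry = Tuple[Tuple[Event, ...], float]


def next_events(proposals: List[Event], parse: Tuple[Event, ...], m: int) -> List[Event]:
    """Hypothetical stand-in for Next(V_s, V_e, m).

    Returns up to m segment proposals that start after the current parse ends.
    """
    last_end = parse[-1][1] if parse else -1
    later = sorted((e for e in proposals if e[0] > last_end), key=lambda e: e[0])
    return later[:m]


def best(beam: List[Entry], n: int) -> List[Entry]:
    """Best(B, n): the n entries with the highest score."""
    return sorted(beam, key=lambda entry: entry[1], reverse=True)[:n]


def beam_search(
    proposals: List[Event],
    log_likelihood: Callable[[Event], float],
    m: int,
    n: int,
) -> Entry:
    """Beam search over event parses; assumes every proposal has end >= start."""
    beam: List[Entry] = [((), 0.0)]  # B = {V_e = empty, p = 0}
    while True:
        candidates: List[Entry] = []  # B'
        for parse, score in beam:
            proposed = next_events(proposals, parse, m)
            if proposed:
                for e in proposed:
                    # One new candidate per proposed event; summing log-likelihoods
                    # is a simplified stand-in for the DP posterior update.
                    candidates.append((parse + (e,), score + log_likelihood(e)))
            else:
                candidates.append((parse, score))  # parse cannot grow; carry it over
        if set(candidates) == set(beam):  # B' == B: nothing changed
            return best(beam, 1)[0]
        beam = best(candidates, n)
```

Because each extension starts strictly after the previous event ends, every parse eventually stops growing; at that point the candidate set equals the beam, which mirrors the $B' == B$ termination check.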


Figure 1: Sample snapshots of the Meditation dataset.


[Three panels plotting probability (0.0 to 1.0) against frame for Human, Ours, DDP-LSTM, FCSN, and DSN.]

Figure 2: Additional comparisons on video summarization.


Figure 3: ROC Curve