3
Learning Group Communication from Demonstration Navyata Sanghvi Carnegie Mellon University Pittsburgh, PA, USA Ryo Yonetani The University of Tokyo Tokyo, Japan Kris M. Kitani Carnegie Mellon University Pittsburgh, PA, USA Abstract—We consider the design of a communication policy for multi-group multi-agent communication, which takes as input the state of the world (e.g., history of communication, gaze direction, body pose of others) and outputs an optimal communi- cation mode (e.g., speaking, listening, responding) for appropriate social interaction. A key component of our communication policy design is a communication gating module, termed the Kinesic- Proxemic-Message Gate (KPM-Gate), that automatically infers group membership so that the actions generated by the com- munication policy depend only on the relevant group members. We pose the communication policy learning problem as a multi- agent imitation learning problem and we learn a single shared policy across all agents under the assumption of a decentralized Markov decision process. We term our entire policy network as the Multi-Agent Group Discovery and Communication Mode Network (MAGDAM network) as it learns social group structure as well as the dynamics of group communication. Our experimental validation on both synthetic and real world data shows that our model is able to discover social group structure in addition to learning an accurate multi-agent communication policy. I. I NTRODUCTION When engaged with other people, a large part of our commu- nication is non-verbal. In the context of group communication, kinesics factors such as body and head pose are influenced by group membership (e.g., we tend to look at people we are communicating with). Proxemics factors define our interper- sonal space (e.g., we tend to stand close to people we want to communicate with). These are indicative of constraints on in- formation we broadcast during social communication. We aim to design models of multi-agent communication that explicitly take into account the role of such non-verbal communication. The development of such models is important for understand- ing human-to-human interaction, and for developing artificial intelligence systems which interact with groups of people. We present a model of multi-group multi-agent communi- cation, termed Multi-Agent Group Discovery and Communica- tion Mode Network (MAGDAM network). MAGDAM is a deep neural network which takes as input neighboring agents’ rela- tive gazes and poses, and observable communication modes (Fig. 2(a)). Within the model, a Kinesic-Proxemic-Message Gate (KPM-Gate) provides a gating mechanism to select and advance only an agent’s most relevant encoded social cues, thus giving the model a natural way to discover communica- tion groups (Fig. 1). This, combined with an encoder of the agent’s own communication modes, finally outputs the agent’s next action to best advance intra-group communication. While there exists, separately, previous work on small-group conversational dynamics [9], as well as on group detection (a) KPM-Gate activation (b) Group configuration Fig. 1: Kinesic-Proxemic-Message Gating. Given poses, gaze directions, and communication modes of multiple parties, the KPM-Gate learns to select relevant messages. [10, 18, 19, 22], we are the first to present a model which simultaneously (1) effectively designs multi-group multi-agent communication protocols and (2) gives an automatic way to discover groups implicitly as a byproduct of the training process, performing comparably to state-of-the-art methods. Related Work: Much work has been done on multi-agent communication. Extensive surveys on how to adopt machine learning approaches are presented in [4, 20]. Recent work makes the use of deep reinforcement learning or imitation learning to discover policies which allow agents to communi- cate with each other, e.g., [7, 8, 12, 14, 16, 21, 15]. To the best of our knowledge, our work is the first to show how the modeling of physically constrained observations helps learn accurate communication policies, while simultaneously giving a natural way to infer social grouping in multi-group scenarios. II. GROUP COMMUNICATION POLICY NETWORK In the presence of multiple groups of multiple interacting agents, our goals are (1) to learn a policy of social interaction for each agent and (2) discover a social group structure to guide that social interaction. Formally, we pose the problem as a Decentralized Markov Decision Process (Dec-MDP) [3, 2]. Multi-group communications proceed as follows: (1) each agent observes a part of the current state of the environment (i.e., observes the state of nearby agents), (2) the agent takes an action based on the observation, and (3) the agents’ joint action induces a state transition of the environment and they receive a latent reward. Φ m = {1, ..., B} is a set of B groups that agent A m may be assigned to. φ m t Φ m is a particular assignment at time step t during interaction. We attempt to learn a decentralized policy π shared by all agents. This maps an agent’s observations Z into actions U . To ensure joint full observability, agents must implicitly learn group assignments φ from visually observed information while learning π. Note, (1) our model is shared by all agents under consid- eration, so that the number of agents in the scene may be

Learning Group Communication from Demonstration · design is a communication gating module, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that automatically infers group membership

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning Group Communication from Demonstration · design is a communication gating module, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that automatically infers group membership

Learning Group Communicationfrom Demonstration

Navyata SanghviCarnegie Mellon University

Pittsburgh, PA, USA

Ryo YonetaniThe University of Tokyo

Tokyo, Japan

Kris M. KitaniCarnegie Mellon University

Pittsburgh, PA, USA

Abstract—We consider the design of a communication policyfor multi-group multi-agent communication, which takes as inputthe state of the world (e.g., history of communication, gazedirection, body pose of others) and outputs an optimal communi-cation mode (e.g., speaking, listening, responding) for appropriatesocial interaction. A key component of our communication policydesign is a communication gating module, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that automatically infersgroup membership so that the actions generated by the com-munication policy depend only on the relevant group members.We pose the communication policy learning problem as a multi-agent imitation learning problem and we learn a single sharedpolicy across all agents under the assumption of a decentralizedMarkov decision process. We term our entire policy networkas the Multi-Agent Group Discovery and Communication ModeNetwork (MAGDAM network) as it learns social group structure aswell as the dynamics of group communication. Our experimentalvalidation on both synthetic and real world data shows that ourmodel is able to discover social group structure in addition tolearning an accurate multi-agent communication policy.

I. INTRODUCTION

When engaged with other people, a large part of our commu-nication is non-verbal. In the context of group communication,kinesics factors such as body and head pose are influenced bygroup membership (e.g., we tend to look at people we arecommunicating with). Proxemics factors define our interper-sonal space (e.g., we tend to stand close to people we want tocommunicate with). These are indicative of constraints on in-formation we broadcast during social communication. We aimto design models of multi-agent communication that explicitlytake into account the role of such non-verbal communication.The development of such models is important for understand-ing human-to-human interaction, and for developing artificialintelligence systems which interact with groups of people.

We present a model of multi-group multi-agent communi-cation, termed Multi-Agent Group Discovery and Communica-tion Mode Network (MAGDAM network). MAGDAM is a deepneural network which takes as input neighboring agents’ rela-tive gazes and poses, and observable communication modes(Fig. 2(a)). Within the model, a Kinesic-Proxemic-MessageGate (KPM-Gate) provides a gating mechanism to select andadvance only an agent’s most relevant encoded social cues,thus giving the model a natural way to discover communica-tion groups (Fig. 1). This, combined with an encoder of theagent’s own communication modes, finally outputs the agent’snext action to best advance intra-group communication.

While there exists, separately, previous work on small-groupconversational dynamics [9], as well as on group detection

(a) KPM-Gate activation (b) Group configuration

Fig. 1: Kinesic-Proxemic-Message Gating. Given poses, gazedirections, and communication modes of multiple parties, theKPM-Gate learns to select relevant messages.

[10, 18, 19, 22], we are the first to present a model whichsimultaneously (1) effectively designs multi-group multi-agentcommunication protocols and (2) gives an automatic wayto discover groups implicitly as a byproduct of the trainingprocess, performing comparably to state-of-the-art methods.Related Work: Much work has been done on multi-agentcommunication. Extensive surveys on how to adopt machinelearning approaches are presented in [4, 20]. Recent workmakes the use of deep reinforcement learning or imitationlearning to discover policies which allow agents to communi-cate with each other, e.g., [7, 8, 12, 14, 16, 21, 15]. To thebest of our knowledge, our work is the first to show how themodeling of physically constrained observations helps learnaccurate communication policies, while simultaneously givinga natural way to infer social grouping in multi-group scenarios.

II. GROUP COMMUNICATION POLICY NETWORK

In the presence of multiple groups of multiple interactingagents, our goals are (1) to learn a policy of social interactionfor each agent and (2) discover a social group structure toguide that social interaction. Formally, we pose the problemas a Decentralized Markov Decision Process (Dec-MDP) [3,2]. Multi-group communications proceed as follows: (1) eachagent observes a part of the current state of the environment(i.e., observes the state of nearby agents), (2) the agent takesan action based on the observation, and (3) the agents’ jointaction induces a state transition of the environment and theyreceive a latent reward. Φm = {1, ..., B} is a set of B groupsthat agent Am may be assigned to. φmt ∈ Φm is a particularassignment at time step t during interaction. We attempt tolearn a decentralized policy π shared by all agents. This mapsan agent’s observations Z into actions U . To ensure joint fullobservability, agents must implicitly learn group assignmentsφ from visually observed information while learning π.

Note, (1) our model is shared by all agents under consid-eration, so that the number of agents in the scene may be

Page 2: Learning Group Communication from Demonstration · design is a communication gating module, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that automatically infers group membership

(a) Group Communication PolicyNetwork: Components of MAGDAM

(b) Neighbor Module and Self-State Encoder (c) An Observation Encoder

Fig. 2: (a) Our network’s components. (b) Observations of pose, gaze, and self state history are converted by the NeighborModule and Self-State Encoder to inputs for the Policy Network (zactor and F ′). (c) The Observation Encoder within theNeighbor Module learns to score relevance (G) of encoded observations of neighbor state history (F ) and visual history (V ).

dynamic, and, furthermore, (2) agents learn a single modulefor their neighbors’ messages, and, thus, the number andorder of neighbors may be dynamic during deployment. TheMAGDAM network consists of three modules (Fig. 2): (1)Neighbor Module: This module encodes observations receivedfrom the environment; (2) Self-State Encoder Module: Thismodule encodes the agent’s own state history; (3) PolicyModule: This incorporates the others’ outputs and generates aprobability vector over the agent’s next action, i.e., communi-cation mode (speaking, listening, responding, etc.)Imitation Learning: By modeling MAGDAM’s encoders andmodules G, F , F ′, V , π as fully-differentiable functions, weensure that it is trainable end-to-end via back-propagation,using demonstrations of states and actions. There are similarapproaches in recent work [1, 17] on multi-agent behaviorprediction. We emphasize that training of MAGDAM requiresno supervision with respect to group assignments φ, as it isimpractical to fully annotate in dynamic situations. Moreover,we expect to discover them implicitly using the KPM-Gate G.

III. EXPERIMENTS WITH SIMULATED DATA

Existing datasets [5, 17, 23] provide a sequence of posesand gazes, while group assignments are annotated temporallysparsely and no communication modes are provided. Thismotivates us to simulate realistic multi-group communicationusing prior studies on the protocol of group communications.We use Synthetic Data [5] to provide 100 non-identical,independent physical layouts of multiple people (7 to 12 ineach) organized into several groups (2 or 3 members in each).Further, to design communication protocols, we refer to thestudy on multimodal small-group conversational dynamics [9].

We compare MAGDAM with the following baselines:Other-States-Only (OSO): We omit self-state encoder F ′.Self-State-Only (SSO): We omit the Neighbor Module. Av-erage Pooling (AveP): Prior work studies the role of messagebroadcasting from one to all agents [7, 15] or particularneighbors [21] by average pooling such messages. We imple-ment this by omitting the KPM-Gate G so that all neighborencoded ‘messages’ F are equally relevant. Social Pooling(SocP) [1, 13]:. We introduce a grid that divides the spaceinto non-overlapping regions and pools messages region-wise,preserving agents’ pose information, ignoring gaze directions.

TABLE I: Action Prediction Performance (mAP) with vary-ing numbers of neighbor agents J . History length = 15.

Model (mAP’s) J = 2 4 8 12

Other States Only (OSO) 0.67 0.69 0.70 0.68Self States Only (SSO) 0.76 0.76 0.76 0.76Average Pooling (AveP) 0.87 0.86 0.85 0.85Social Pooling (SocP) 0.85 0.84 0.84 0.84

MAGDAM Network 0.88 0.88 0.89 0.88

MAGDAM is trained to minimize categorical cross entropyloss between predicted and actual actions via Adam [11]. Weperform 2-fold cross-validation and evaluate mean averageprecision (mAP) over actions, averaged over the two test sub-sets. Table I shows communication action prediction perfor-mances, with a horizon length of 15, and various numbers ofneighbors J . MAGDAM clearly outperforms baselines, sinceit implicitly considers comparative relevance of observationsfrom different neighbors.

IV. EXPERIMENTS WITH REAL DATA

MAGDAM’s learned KPM-Gate by itself can be usedto discover communication groups in real-world data - theCoffee Break dataset [5] and the Cocktail Party dataset [23].For each frame, we define a distance D(m,ni) betweentwo people Am, Ani based on the output of learned gatingfunction G: D(m,ni) = 1 − 1

2 (G(g(ni←m), l(ni←m)) +

G(g(m←ni), l(m←ni))) and run the DBSCAN clustering al-gorithm [6] to cluster people into communication groups.

TABLE II: Performance on Group Detection: F1 scores:DS [10], HVFF-ms [18], GCFF [19], and GRUPO [22].

DS HVFF-ms GCFF GRUPO KPM-GateCoffee Break 0.39 0.30 0.63 N/A 0.63Cocktail Party N/A 0.39 0.64 0.64 0.60

Group detection performance is measured by the F1 scoreunder the |G| condition [5, 18]. As reported in Table II, ourapproach (KPM-Gate) demonstrates comparable performanceover state-of-the-art methods [10, 18, 19, 22]. This indicatesthat the physical constraints learned by MAGDAM are con-sistent with real-world human communications, even thoughtraining used simulated interactions.

Page 3: Learning Group Communication from Demonstration · design is a communication gating module, termed the Kinesic-Proxemic-Message Gate (KPM-Gate), that automatically infers group membership

REFERENCES

[1] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, L. Fei-Fei, and S. Savarese. Social lstm: Human trajectoryprediction in crowded spaces. In Proceeding of the IEEEConference on Computer Vision and Pattern Recognition,2016.

[2] R. Becker, S. Zilberstein, and V. Lesser. Decentralizedmarkov decision processes with event-driven interactions.In Proceedings of the Third International Joint Confer-ence on Autonomous Agents and Multiagent Systems-Volume 1, pages 302–309. IEEE Computer Society, 2004.

[3] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilber-stein. The complexity of decentralized control of markovdecision processes. Mathematics of operations research,27(4):819–840, 2002.

[4] L. Busoniu, R. Babuska, and B. D. Schutter. A com-prehensive survey of multiagent reinforcement learning.IEEE Transactions on Systems, Man, and Cybernet-ics, Part C (Applications and Reviews), 38(2):156–172,March 2008.

[5] M. Cristani, L. Bazzani, G. Paggetti, A. Fossati,D. Tosato, A. D. Bue, G. Menegaz, and V. Murino.Social interaction discovery by statistical analysis of f-formations. In British Machine Vision Conference, pages23.1–23.12, 2011.

[6] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters a density-based algorithm for discovering clusters in large spatialdatabases with noise. In Proceedings of the InternationalConference on Knowledge Discovery and Data Mining,pages 226–231, 1996.

[7] J. Foerster, Y. M. Assael, N. de Freitas, and S. White-son. Learning to communicate with deep multi-agentreinforcement learning. In Proceeding of the Advancesin Neural Information Processing Systems, pages 2137–2145, 2016.

[8] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, andS. Whiteson. Counterfactual multi-agent policy gradients.CoRR, abs/1705.08926, 2017.

[9] D. Gatica-Perez. Analyzing group interactions in con-versations: a review. In Proceedings of the InternationalConference on Multisensor Fusion and Integration forIntelligent Systems, pages 41–46, 2006.

[10] H. Hung and B. Krose. Detecting f-formations as domi-nant sets. In Proceedings of the International Conferenceon Multimodal Interfaces, pages 231–238, 2011.

[11] D. P. Kingma and J. Ba. Adam: A method for stochastic

optimization. CoRR, abs/1412.6980, 2014.[12] H. M. Le, Y. Yue, and P. Carr. Coordinated multi-agent

imitation learning. In Proceeding of the InternationalConference on Machine Learning, 2017.

[13] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. S. Torr,and M. Chandraker. Desire: Distant future prediction indynamic scenes with interacting agents. In Proceedingsof the IEEE Conference on Computer Vision and PatternRecognition, 2017.

[14] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel,and I. Mordatch. Multi-agent actor-critic formixed cooperative-competitive environments. CoRR,abs/1706.02275, 2017.

[15] I. Mordatch and P. Abbeel. Emergence of grounded com-positional language in multi-agent populations. CoRR,abs/1703.04908, 2017.

[16] P. Peng, Q. Yuan, Y. Wen, Y. Yang, Z. Tang, H. Long,and J. Wang. Multiagent bidirectionally-coordinated netsfor learning to play starcraft combat games. CoRR,abs/1703.10069, 2017.

[17] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese.Learning social etiquette: Human trajectory understand-ing in crowded scenes. In Proceeding of the EuropeanConference on Computer Vision, pages 549–565, 2016.

[18] F. Setti, O. Lanz, R. Ferrario, V. Murino, and M. Cristani.Multi-scale f-formation discovery for group detection. InProceedings of the International Conference on ImageProcessing, pages 3547–3551, 2013.

[19] F. Setti, C. Russell, C. Bassetti, and M. Cristani. F-formation detection: Individuating free-standing conver-sational groups in images. PLOS ONE, 10(5):1–26, 2015.

[20] P. Stone and M. Veloso. Multiagent systems: A sur-vey from a machine learning perspective. AutonomousRobots, 8(3):345–383, Jun 2000.

[21] S. Sukhbaatar, A. Szlam, and R. Fergus. Learningmultiagent communication with backpropagation. InProceeding of the Advances in Neural Information Pro-cessing Systems, pages 2244–2252, 2016.

[22] M. Vazquez, A. Steinfeld, and S. E. Hudson. Paral-lel detection of conversational groups of free-standingpeople and tracking of their lower-body orientation. InProceedings of the IEEE/RSJ International Conferenceon Intelligent Robots and Systems, 2015.

[23] G. Zen, B. Lepri, E. Ricci, and O. Lanz. Space speaks:Towards socially and personality aware visual surveil-lance. In Proceeding of the International Workshopon Multimodal Pervasive Video Analysis, pages 37–42,2010.