
"put that there"

Conversational Pointing Gestures for Virtual Reality Interaction: Implications from an Empirical Study

Thies Pfeiffer (tpfeiffe@techfak.uni-bielefeld.de)
Ipke Wachsmuth (ipke@techfak.uni-bielefeld.de)

Bielefeld University

Conversational interface agents facilitate natural interactions in Virtual Reality environments. Deictic expressions (such as "put that there") are fundamental in human communication to refer to entities in the environment. In situated contexts, deictic expressions often comprise pointing gestures directed at regions or objects. One of the primary tasks in Virtual Reality applications is the manipulation of visually perceivable objects. Thus VR research has focused on developing metaphors that optimize the tradeoff between a swift and a precise selection of objects. Prominent examples are ray casting, occlusion, or arm extension. These technologies are well suited for interacting directly with the system. When the interaction with the system is mediated, e.g., by an Embodied Conversational Agent (ECA), the primary focus lies on a smooth understanding of natural communication and natural gestures. It is thus recommended to improve the robustness and accuracy of the interpretation of natural pointing gestures, i.e., gestures without facilitation by visual aids or other auxiliaries. To attain these ends, we contribute results from a study on pointing and draw conclusions for the implementation of pointing-based conversational interactions in immersive Virtual Reality.


Research Background

Bibliography

Primary

Kranstedt, Lücking, Pfeiffer, Rieser & Staudacher. Measuring and Reconstructing Pointing in Visual Contexts. In Proceedings of the Brandial 2006.

Kranstedt, Lücking, Pfeiffer, Rieser & Wachsmuth. Deictic Object Reference in Task-oriented Dialogue. In Situated Communication. Mouton de Gruyter, Berlin, 2006.

Weiß, Pfeiffer, Eikmeyer & Rickheit. Processing Instructions. In Situated Communication. Mouton de Gruyter, Berlin, 2006.

Kranstedt, Lücking, Pfeiffer, Rieser & Wachsmuth. Deixis: How to Determine Demonstrated Objects Using a Pointing Cone. In 6th International Gesture Workshop. Springer-Verlag GmbH, Berlin Heidelberg, 2006.

Pfeiffer & Latoschik. Resolving Object References in Multimodal Dialogues for Immersive Virtual Environments. In Proceedings of the IEEE Virtual Reality 2004.

Pfeiffer, Voss & Latoschik. Resolution of Multimodal Object References Using Conceptual Short Term Memory. In Proceedings of the EuroCogSci03.

Secondary

D. A. Bowman, E. Kruijff, J. J. LaViola, and I. Poupyrev. 3D User Interfaces – Theory and Practice. Addison-Wesley, 2005.

E. Kaiser, A. Olwal, D. McGee, H. Benko, A. Corradini, X. Li, P. Cohen, and S. Feiner. Mutual Disambiguation of 3D Multimodal Interaction in Augmented and Virtual Reality. In Proceedings of the 5th International Conference on Multimodal Interfaces, pages 12–19. ACM Press, 2003.

M. E. Latoschik. A Gesture Processing Framework for Multimodal Interaction in Virtual Reality. In Proceedings of the 1st International Conference on Computer Graphics, Virtual Reality and Visualisation in Africa, AFRIGRAPH 2001, pages 95–100. ACM SIGGRAPH, 2001.

A. Olwal, H. Benko, and S. Feiner. SenseShapes: Using Statistical Geometry for Object Selection in a Multimodal Augmented Reality System. In Proceedings of The Second IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2003), pages 300–301, Tokyo, Japan, October 7–10, 2003.

I. Wachsmuth, B. Lenzmann, T. Jörding, B. Jung, M. Latoschik, and M. Fröhlich. A Virtual Interface Agent and its Agency. In Proceedings of the First International Conference on Autonomous Agents, pages 516–517, 1997.

C. A. Wingrave, D. A. Bowman, and N. Ramakrishnan. Towards Preferences in Virtual Environment Interfaces. In EGVE'02: Proceedings of the Workshop on Virtual Environments 2002, pages 63–72, Aire-la-Ville, Switzerland, 2002. Eurographics Association.

Acknowledgements

This work has been funded by the EC in the project PASION (FP6, IST program, reference number 27654) and by the Deutsche Forschungsgemeinschaft (DFG) in the Collaborative Research Center 360, "Situated Artificial Communicators".

Future Work

- Taking the direction of gaze into account does not always improve performance (contrary to the mainstream opinion). At least in the setting used in our study, with widely spaced objects (20 cm), it can be ignored when going for high overall success.
- Humans display a non-linear behavior at the borders of the domain.
- Confirm the results in a mixed setting with one human and one embodied conversational agent over virtual objects.

Figure: A combined model for the extension of proximal and distal pointing. The boundary between proximal and distal pointing is defined by the personal distance d.
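To make the combined model concrete, the following sketch shows how a two-cone interpreter could test whether an object is compatible with a pointing act: targets closer than the personal distance d are checked against a wider proximal cone, targets beyond d against a narrower distal cone. This is our own illustration, not the study's implementation; the function name and the default opening angles are assumptions chosen only for readability.

```python
import numpy as np

def in_pointing_cone(origin, direction, obj_pos, personal_distance=1.0,
                     proximal_angle_deg=100.0, distal_angle_deg=40.0):
    """Return True if obj_pos lies inside the combined pointing cone.

    The cone is anchored at `origin` and oriented along `direction`.
    Objects closer than `personal_distance` (the proximal/distal boundary d)
    are tested against the wide proximal cone, all other objects against the
    narrow distal cone.  Angles are full opening angles in degrees; the
    default values are illustrative, not results from the study.
    """
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    to_obj = np.asarray(obj_pos, dtype=float) - np.asarray(origin, dtype=float)
    dist = np.linalg.norm(to_obj)
    if dist == 0.0:
        return True  # object coincides with the cone apex
    opening = proximal_angle_deg if dist < personal_distance else distal_angle_deg
    deviation = np.degrees(np.arccos(np.clip(np.dot(direction, to_obj / dist), -1.0, 1.0)))
    return deviation <= opening / 2.0
```

Choosing the boundary d and the two opening angles is exactly the parameter estimation problem addressed in the Method and Simulations sections below.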

Motivation

How accurate are pointing gestures?

Aim

- Improved models for human pointing
- Advances for the interpretation and production of pointing gestures
- Contributing to more robust multimodal conversational interfaces

Applications

- Human-Computer Interaction: multimodal interfaces
- Human-Agent Interaction: multimodal conversational interfaces
- Human-Robot Interaction
- Assistive Technology
- Empirical research and Usability Studies: automatic multimodal annotation and automatic grounding of gestures based on a world model

Observations

- Pointing is fuzzy, even in short range distance
- Fuzziness increases with distance (as expected)
- Overshooting at the edge of the domain (intentional)
- Still, the human object identifier shows a good performance of 83.9% correct identifications

Modelling approach

- The ellipse shapes of the bagplots suggest a cone-based model of the extension of pointing (see the combined proximal/distal model above)

Marc E. Latoschik (marcl@techfak.uni-bielefeld.de)

AI Group, Faculty of Technology, Bielefeld University, Germany

PASION
Situated Artificial Communicators (SFB 360)

Method

How to determine the parameters of the pointing extension model?

- What is the opening angle?
- What defines the anchor of the model?

Results

Pragmatic Model

When the pointing extension is handled on the level of pragmatics, we can allow for inference mechanisms to disambiguate between several objects and, hence, use heuristics. For the simulation, we used a basic heuristic based on the angular distance between objects and the pointing ray. Table 2 below depicts the results of these simulation runs; for this model, IFP performs better than GFP. The opening angles in the proximal rows are rather large, while the angles in the more distal rows are much smaller. This motivates us to distinguish between proximal and distal pointing.

IFP is more precise than GFP.
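The heuristic described above can be sketched in a few lines: every object whose angular distance to the pointing ray falls inside the cone is a candidate, and the candidate closest to the ray wins. This is a minimal sketch under our reading of the description; the function names, the object dictionary, and the default opening angle are assumptions, not the study's code.

```python
import numpy as np

def angular_distance_deg(origin, direction, obj_pos):
    """Angle in degrees between the pointing ray and the direction to obj_pos."""
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    to_obj = np.asarray(obj_pos, dtype=float) - np.asarray(origin, dtype=float)
    to_obj /= np.linalg.norm(to_obj)
    return np.degrees(np.arccos(np.clip(np.dot(direction, to_obj), -1.0, 1.0)))

def resolve_reference(origin, direction, objects, opening_angle_deg=120.0):
    """Pragmatic resolution: among all objects inside the pointing cone, return
    the id of the one with the smallest angular distance to the ray, or None
    if the cone contains no object.  `objects` maps object ids to positions."""
    distances = {obj_id: angular_distance_deg(origin, direction, pos)
                 for obj_id, pos in objects.items()}
    inside = {obj_id: d for obj_id, d in distances.items()
              if d <= opening_angle_deg / 2.0}
    if not inside:
        return None
    return min(inside, key=inside.get)
```

With such a resolver, ties and near-ties could additionally be passed to dialogue-level inference, which is what handling the extension on the level of pragmatics allows.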

Conclusion

- Pointing is best interpreted at the level of pragmatics and not semantics
- Index-Finger-Pointing (IFP) is more precise
- Gaze-Finger-Pointing (GFP) is more accurate
- The results stated in the tables below and our qualitative observations using IADE suggest a dichotomy of proximal vs. distal pointing. This fits nicely with the dichotomy common in many languages (here vs. there).

Simulations

Table 1: The optimal opening angles α (in degrees) per row for a strict semantic pointing cone model of the pointing extension, together with the performance (perf.) in terms of correctly identified objects, in percent of all objects within the specified area, for both IFP and GFP. The row titled "all" shows the performance for rows 1-7; row 8 has been excluded because of the overshooting behavior.

row   IFP α   IFP perf.   GFP α   GFP perf.
1       84      70.27       86      68.92
2       80      61.84       68      75
3       71      71.43       69      81.82
4       60      53.95       38      65.79
5       36      43.84       24      57.53
6       24      31.15       25      42.62
7       14      23.26       17      23.26
8       10       7.14       10      14.29
all     71      38.54       61      48.12

Table 2: The optimal opening angles α (in degrees) per row for a pragmatic pointing cone model of the pointing extension, together with the performance (perf.) in terms of correctly identified objects, in percent of all objects within the specified area, for both IFP and GFP. The row titled "all" shows the performance for rows 1-7; row 8 has been excluded because of the overshooting behavior.

row   IFP α   IFP perf.   GFP α   GFP perf.
1      120      98.65      143      98.65
2      109     100         124     100
3       99      94.81       94      93.51
4      109      98.68       89      93.42
5       72      97.26       75      94.52
6       44      91.8        50      90.16
7       38      86.05       41      67.44
8       31      52.38       26      69.05
all    120      96.04      143      92.71

Exploring the data

Figure: A visualization of the intersections of the pointing ray (dots) for four different objects over all participants. The data is grouped via bagplots; the asterisk marks the mean, the darker area clusters 50 percent, the brighter area 75 percent of the demonstrations. In the depicted setting the person pointing was standing to the left, the person identifying the objects to the right.
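The dots in this figure are the intersection points of the pointing rays with a reference surface, presumably the tabletop. As a minimal sketch (assuming, for illustration only, that the tabletop is the horizontal plane z = 0 in the tracking coordinate system), such intersection points can be computed as follows:

```python
import numpy as np

def intersect_table_plane(origin, direction, table_height=0.0):
    """Intersect a pointing ray with the horizontal table plane z = table_height.

    Returns the (x, y) hit point on the table, or None if the ray is parallel
    to the plane or points away from it.
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    if abs(direction[2]) < 1e-9:      # ray (almost) parallel to the table
        return None
    t = (table_height - origin[2]) / direction[2]
    if t <= 0:                        # table lies behind the pointing hand
        return None
    hit = origin + t * direction
    return float(hit[0]), float(hit[1])
```

Collecting these 2D hit points per target object over all participants is what the bagplots summarize.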


Study on object pointing

- Interaction of two participants
- Two conditions: speech + gesture and gesture only
- Real objects
- Study with 62 participants
- Cooperative effort with linguists


Technology

- Audio + video recordings
- Motion capturing using an ART GmbH optical tracking system
- Automatic adaptation of a model of the user's posture
- Special hand-made soft gloves

Interaction Game

- Description Giver is presented with the object to demonstrate
- Description Giver utters a deictic expression (speech + gesture, or gesture only)
- Object Identifier tries to identify the object
- Description Giver gives feedback (yes/no)
- Proceed with the next object
- No corrections or repairs!

The study combines multimodal data comprising audio, video, motion capture and annotation data. A coherent synchronized view of all data sources is provided using the Interactive Augmented Data Explorer (IADE), developed at Bielefeld University.

The picture to the right shows a session with the Interactive Augmented Data Explorer. The scientist to the right interactively explores the recorded data for qualitative analysis. The model to the left shows the table with the objects and a stick figure driven by the motion capture data. The video taken from one camera perspective is displayed on the floating panel, together with the audio recordings. Information from specific annotation tiers is presented as floating text. The scientist can, e.g., take the perspective of the Description Giver or the Object Identifier. All elements are interactive, e.g. the video panels and annotations can be resized or positioned to allow for a comfortable investigation.

Strict Semantic Model

Several simulation runs have been conducted to test different approaches to model the pointing extension on the collected data. For a strict semantic model the pointing extension has to single out one and only one object. In the simulation runs we determined the optimal angle for such a pointing cone per row and for the overall area. The results are depicted in Table 1 above. For the strict semantic model, GFP offers better performance while having a narrower opening angle.

GFP is more accurate than IFP.
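The per-row search for the optimal opening angle can be pictured as a sweep over candidate angles, scoring each angle by how many recorded pointing acts end up with exactly one object in the cone, namely the intended target (the strict semantic criterion). The sketch below is our own illustration of such a procedure; the data structures, the angle grid, and the function names are assumptions, not the study's actual tooling.

```python
import numpy as np

def objects_in_cone(origin, direction, objects, opening_angle_deg):
    """Ids of all objects whose angular distance to the ray lies inside the cone."""
    direction = np.asarray(direction, dtype=float)
    direction /= np.linalg.norm(direction)
    inside = []
    for obj_id, pos in objects.items():
        to_obj = np.asarray(pos, dtype=float) - np.asarray(origin, dtype=float)
        to_obj /= np.linalg.norm(to_obj)
        deviation = np.degrees(np.arccos(np.clip(np.dot(direction, to_obj), -1.0, 1.0)))
        if deviation <= opening_angle_deg / 2.0:
            inside.append(obj_id)
    return inside

def optimal_strict_angle(trials, objects, angles=np.arange(1.0, 181.0, 1.0)):
    """Sweep candidate opening angles and return (best_angle, success_rate).

    Each trial is an (origin, direction, target_id) tuple taken from one
    recorded pointing act.  A trial counts as a success only if the cone
    contains the target object and nothing else.
    """
    best_angle, best_rate = None, -1.0
    for alpha in angles:
        hits = sum(1 for origin, direction, target in trials
                   if objects_in_cone(origin, direction, objects, alpha) == [target])
        rate = hits / len(trials)
        if rate > best_rate:
            best_angle, best_rate = float(alpha), rate
    return best_angle, best_rate
```

Replacing the success criterion with the pragmatic nearest-object heuristic would, in the same way, yield angles like those reported in Table 2.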

Do we aim with the index finger (IFP) or by gazing over the index finger (GFP)?
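One way to read these two hypotheses as geometry (our interpretation, stated as an assumption rather than the study's definition) is: Index-Finger-Pointing extends the index finger itself, whereas Gaze-Finger-Pointing runs from the eye over the fingertip. A sketch for constructing both rays from tracked positions:

```python
import numpy as np

def _normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def ifp_ray(index_base, index_tip):
    """Index-Finger-Pointing: ray anchored at the fingertip and oriented along
    the index finger, i.e. from the finger's base towards its tip."""
    origin = np.asarray(index_tip, dtype=float)
    direction = _normalize(origin - np.asarray(index_base, dtype=float))
    return origin, direction

def gfp_ray(eye, index_tip):
    """Gaze-Finger-Pointing: ray anchored at the eye and running over the
    fingertip, i.e. the visual line-up of eye and finger."""
    origin = np.asarray(eye, dtype=float)
    direction = _normalize(np.asarray(index_tip, dtype=float) - origin)
    return origin, direction
```

Either ray can then be fed into the cone tests and the reference resolver sketched above.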

Figure: Experimental setup with spotlight, camera, tracking system, and displays (M1: task; M2 and M3: system time).
