LINGTOUR: a PDA for tourists

Alain Goyé, Eric Lecolinet, Mutsuko Tomokiyo, Gérard Chollet
GET-ENST, 46, rue Barrault, 75634 Paris Cedex 13
goye | elc | lin | chollet @enst.fr

Catherine Pelachaud
IUT de Montreuil - Université Paris 8, 140, rue de la Nouvelle France, 93100 Montreuil, France
c.pelachaud@iut.univ-paris8.fr

Ding Xiaoqing, Mao Yuhang
Dept. of Electronic Engineering, Tsinghua University, Beijing, 100084, China
dingxq@tsinghua.edu.cn

Ni Yang
Institut National des Télécommunications, Département Electronique et Physique, 9, rue Charles Fourier, 91011 Evry Cedex, France
yang.ni@int-evry.fr


Page 2: LINGTOUR: a PDA for  tourists

Multimodal interfaces for a travel assistant

LINGTOUR: a history
• Collaboration with Tsinghua University:
– Memorandum of understanding (2000)
– Vocal French-Chinese dictionary with Le Robert
– Master's thesis of Dong Qingfu: « Realization of Intelligent Camera Capable of Character Recognition and Translation »

Page 3: LINGTOUR: a PDA for  tourists

The LINGTOUR project
• Multilingual management of information
• Initially, a PDA for travellers:
– Virtual guide: access to multilingual information for tourists (practical and cultural)
– Communication assistant: translation help, navigation within a lexicon and access to typical conversations
– Travel assistant: orientation and interpretation of the environment using local and positioning information
• A personal assistant (PDA or smartphone) with multimodal and ergonomic capabilities:
– inputs (text, speech, stylus, images)
– outputs (text, speech, images, video)

Page 4: LINGTOUR: a PDA for  tourists

Interactions PDA - server
[Diagram. Labels:]
• Multimodal navigation in maps and lexicon
• Sound capture
• Selection / extraction of text
• Refinement / correction of the image
• Images, sound
• Images, sound, text
• Character recognition, voice recognition, multilingual translation, speech synthesis
• Supervision
• Tsinghua University

Page 5: LINGTOUR: a PDA for  tourists

Exploit the specificities of the PDA
• Make the best use of the PDA's capabilities for multimodality:
– Use the touch screen, the microphone and the camera jointly as inputs, without any keyboard
– Use the graphic and sound outputs alternately or simultaneously, according to the context, to present the information
• The PDA is connected to the Internet whenever possible:
– to download news and up-to-date information
– to offload to a remote server the tasks that are too complex or too costly in memory
– to allow a human operator to intervene, if necessary
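
The offloading rule on this slide can be illustrated with a minimal sketch. The slides give no cost model, so the `Task` fields, the two limits and the "deferred" outcome for heavy tasks while offline are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_cost: float   # estimated processing cost, arbitrary units (assumption)
    memory_mb: float  # estimated memory footprint in MB (assumption)


# Hypothetical limits of the handheld device; not taken from the slides.
MAX_CPU_COST = 10.0
MAX_MEMORY_MB = 16.0


def choose_executor(task: Task, online: bool) -> str:
    """Return "pda", "server" or "deferred" for a given task."""
    too_heavy = task.cpu_cost > MAX_CPU_COST or task.memory_mb > MAX_MEMORY_MB
    if not too_heavy:
        return "pda"          # light tasks always run locally
    # Heavy tasks go to the remote server when a connection exists;
    # otherwise they wait (an assumption, the slides do not cover this case).
    return "server" if online else "deferred"
```

For example, `choose_executor(Task("ocr", 50.0, 8.0), online=True)` selects the server, while a light menu interaction stays on the PDA whether or not it is connected.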

Page 6: LINGTOUR: a PDA for  tourists

3 types of multimodal interface
• Gesture and voice: combination of control menus and vocal input
– Controlling zoomable interfaces, towards graphic or text inputs
• Intelligent camera: refinement of images
– Based on the correlation of a series of images
– to improve character recognition
• Cultural agents: animated conversational agents adapted to the culture
– Adding non-verbal behaviour to speech: face, eyes, gestures, depending on the culture

Page 7: LINGTOUR: a PDA for  tourists

ZUIs and 2D control menus
• Constraint of the PDA: screen size
• ZUIs: zoomable user interfaces
– Concept of semantic zoom: progressive revelation of levels of detail
• Control menus [1]:
– Selection and control of the action (movement, zoom) in a single gesture
– No change of context, no manipulation of multiple interactions for a single operation

Gesture and voice

[1] Pook, S., Lecolinet, E., Vaysseix, G. and Barillot, E., Control Menus: Execution and Control in a Single Interactor. Proc. ACM Conf. on Human Factors in Computing Systems (CHI) 2000, 263-264. ACM Press.
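
The idea of semantic zoom (more detail is revealed as the user zooms in) can be sketched as follows. The layer names and zoom thresholds are hypothetical; the slides do not specify what the LINGTOUR map shows at each level.

```python
# Hypothetical detail layers for a city map, ordered by the zoom factor
# at which each one becomes visible. Names and thresholds are assumptions.
DETAIL_LEVELS = [
    (1.0, "districts"),
    (2.0, "main streets"),
    (4.0, "monuments and bus stops"),
    (8.0, "opening hours and descriptions"),
]


def visible_layers(zoom: float) -> list:
    """Return the layers to draw at a given zoom factor.

    Unlike a purely geometric zoom, a semantic zoom changes *what* is
    shown, not only its size: each threshold crossed adds a layer.
    """
    return [name for threshold, name in DETAIL_LEVELS if zoom >= threshold]
```

With these values, `visible_layers(2.5)` returns the districts and main streets only, which keeps a small screen readable until the user zooms further.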

Page 8: LINGTOUR: a PDA for  tourists

Characteristics of control menus
– Combine the selection and the control of an operation in a single gesture
– Can integrate up to 2 bars of movement (vertical and horizontal)
– The user concentrates on the content
– Can have sub-menus
– Like Pie menus [2] and Marking menus [3], they offer a beginner mode and an expert mode
• The spatial layout of the menus helps memorization
• Quick gestures => the menus don't appear on the screen
• Implicit transition from one mode to the other

[2] Hopkins, D., The design and implementation of Pie menus. Dr Dobb's Journal of Software Tools, 1991, 16 (12), 16-26.
[3] Kurtenbach, G. et al., The Hotbox: efficient access to a large number of menu-items. Proc. ACM CHI, 1993, 231-327.

Gesture and voice
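
A rough sketch of how a single stylus gesture could both select and control an operation, with the menu shown only when the user hesitates. This is not the implementation of [1]: the menu layout, the activation radius and the display delay are assumptions chosen for illustration.

```python
import math

# Hypothetical menu layout and thresholds (assumptions, not from the slides).
MENU = {"east": "zoom", "west": "pan", "north": "select", "south": "back"}
ACTIVATION_RADIUS = 30.0  # pixels to travel before an item is selected
MENU_DELAY = 0.3          # seconds of hesitation before the menu is drawn


def direction(dx, dy):
    """Map a displacement to one of four sectors (screen y grows downward)."""
    angle = math.degrees(math.atan2(-dy, dx)) % 360
    if angle < 45 or angle >= 315:
        return "east"
    if angle < 135:
        return "north"
    if angle < 225:
        return "west"
    return "south"


def interpret(samples):
    """Interpret one gesture given as (time, x, y) samples from pen-down.

    Returns (command, menu_shown, values): the selected operation, whether
    the menu had to be displayed (beginner mode), and the successive
    control values produced by the rest of the same gesture.
    """
    t0, x0, y0 = samples[0]
    command, menu_shown, values = None, False, []
    for t, x, y in samples[1:]:
        dx, dy = x - x0, y - y0
        dist = math.hypot(dx, dy)
        if command is None and dist < ACTIVATION_RADIUS:
            # Still near the pen-down point: show the menu only on hesitation.
            menu_shown = menu_shown or (t - t0) >= MENU_DELAY
            continue
        if command is None:
            command = MENU[direction(dx, dy)]    # selection
        values.append(dist - ACTIVATION_RADIUS)  # control, same gesture
    return command, menu_shown, values
```

A quick stroke to the right selects "zoom" without the menu ever being drawn (expert mode). The same stroke after a pause displays the menu first, so the switch between modes needs no explicit command.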

Page 9: LINGTOUR: a PDA for  tourists

Application of control menus
• navigation in a city map
• navigation in a lexicon:
– Words and phrases useful to tourists,
– organized hierarchically in categories such as: accommodation > hotel > reservation...

Gesture and voice
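
The hierarchical lexicon can be pictured as a nested structure that the user descends one category at a time. In the sketch below only the path accommodation > hotel > reservation comes from the slide; the other categories and all phrases are invented placeholders.

```python
# Hypothetical lexicon content. Only the path
# accommodation > hotel > reservation appears in the slides.
LEXICON = {
    "accommodation": {
        "hotel": {
            "reservation": ["I would like to book a room."],
            "check-out": ["Could I have the bill, please?"],
        },
    },
    "transport": {
        "bus": ["Where is the bus stop?"],
    },
}


def browse(path):
    """Follow a list of category names and return what lies at the end.

    Returns the sorted sub-category names for an inner node, or the list
    of phrases for a leaf.
    """
    node = LEXICON
    for key in path:
        node = node[key]
    return sorted(node) if isinstance(node, dict) else node
```

For instance, `browse(["accommodation", "hotel"])` lists the sub-categories of that node, and adding `"reservation"` to the path returns the phrases to translate.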

Page 10: LINGTOUR: a PDA for  tourists

The voice: multilingual recognition
• Voice recognition engine:
– limited vocabulary, but
– speaker-independent,
– no learning phase.
• Recognition in different languages:
– sharing common acoustic models, which facilitates future extensions to new languages.
– Models adaptable to users and to usage conditions.

[Diagram: French and Chinese share common acoustic models, alongside models specific to each language]

Gesture and voice

Page 11: LINGTOUR: a PDA for  tourists

The voice is associated with gestures...
The vocal information is used differently according to the context:
• Navigation in the map: « tap and talk »: a vocal menu gives access to various information on the object pointed at.
• Navigation in the lexicon:
– as a shortcut to categories, then
– as a way to input words or phrases. The translation is displayed or synthesized in the target language.
• Possibly, improvement by using keywords ("word spotting").

Gesture and voice
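
As an illustration of the keyword idea, the sketch below spots known keywords in an utterance and maps them to lexicon categories. It is a simplification: real word spotting works on the audio signal, whereas this example assumes the utterance has already been transcribed. The keyword table is hypothetical.

```python
import re

# Hypothetical keyword table: spoken keyword -> lexicon category path.
KEYWORDS = {
    "hotel": ("accommodation", "hotel"),
    "room": ("accommodation", "hotel"),
    "bus": ("transport", "bus"),
}


def spot_keywords(utterance):
    """Return the category paths triggered by keywords in the utterance.

    Words outside the keyword table are ignored, so the speaker does not
    have to follow a fixed command syntax.
    """
    words = re.findall(r"\w+", utterance.lower())
    paths = []
    for word in words:
        path = KEYWORDS.get(word)
        if path is not None and path not in paths:
            paths.append(path)
    return paths
```

Here `spot_keywords("Where is the nearest hotel, please?")` yields the single path `("accommodation", "hotel")`, which could then serve as the shortcut into the lexicon.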

Page 12: LINGTOUR: a PDA for  tourists

The « intelligent » camera
• see, recognize and translate
Character recognition, Chinese in particular, now achieves high performance.
• To limit the computing cost:
– Recognition is performed on a sub-part of the image.
– This sub-part can be chosen semi-automatically during a preliminary delimitation and segmentation phase.
• Once recognized, the text can be translated:
– Locally: to facilitate the translation, a vocal menu lets the user choose the context: bus stop signs, street names, monuments, etc.
– Or by a remote server via a radiocommunication service.
• The text can also be read out by speech synthesis.

Intelligent camera

Page 13: LINGTOUR: a PDA for  tourists

Camera usage [4]
[Figure: capture, recognition, translation]

Intelligent camera

[4] Mao, Y., Dong, Q., Qi, Y. and Chollet, G., Realization of an Intelligent Camera capable of Character Recognition and Translation. Proc. of Sino-French Symp. on Speech and Language Processing, Beijing, October 2000.
Available at: http://www.tsi.enst.fr/~chollet/Projets/Chine/Lingtour/IntelCamera.doc

Page 14: LINGTOUR: a PDA for  tourists

Improve the image resolution
• Difficulty:
– image taken from a distance in the street
– cheap camera: quality / resolution insufficient for recognition
• Solution: image refinement
– correlation and reconstruction of a series of successive images.
– Exploitation of the small differences due to the natural movement of the hand holding the camera.
=> image with a higher resolution than any one of the captures.

Intelligent camera

Page 15: LINGTOUR: a PDA for  tourists

Principle of image refinement
[Diagram: processing chain]
• Camera on the PDA, vibration of the hand
• Acquisition of an image sequence
• Evaluation of movements (sub-pixel)
• Recomposition into a single image
• Image of better resolution

Intelligent camera
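
The recomposition step can be illustrated with a classic "shift-and-add" scheme. This is a stand-in, not the project's actual algorithm, and it is heavily idealized: the sub-pixel shifts are assumed to be already known (the motion evaluation step is not shown), they are exact multiples of half a pixel, and there is no blur or noise.

```python
def decimate(scene, dy, dx, factor=2):
    """Simulate one low-resolution capture of a fine-grid scene.

    (dy, dx) is the hand-induced offset expressed in fine-grid pixels,
    i.e. a sub-pixel shift at the resolution of the capture.
    """
    return [row[dx::factor] for row in scene[dy::factor]]


def shift_and_add(frames, shifts, factor=2):
    """Recompose one higher-resolution image from shifted captures.

    Each low-resolution sample is placed at its position on the fine
    grid; samples landing on the same cell are averaged. Cells that no
    frame covers are left as None.
    """
    height = len(frames[0]) * factor
    width = len(frames[0][0]) * factor
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for frame, (dy, dx) in zip(frames, shifts):
        for i, row in enumerate(frame):
            for j, value in enumerate(row):
                y, x = i * factor + dy, j * factor + dx
                total[y][x] += value
                count[y][x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else None
         for x in range(width)]
        for y in range(height)
    ]


# Usage: four 2x2 captures of a 4x4 scene, each offset by half a
# low-resolution pixel, are enough to recover the full 4x4 grid.
scene = [[4 * r + c for c in range(4)] for r in range(4)]
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [decimate(scene, dy, dx) for dy, dx in shifts]
restored = shift_and_add(frames, shifts)
```

With real hand movement the shifts are arbitrary fractions of a pixel and must first be estimated, for example by correlating the frames, and the samples then have to be interpolated onto the fine grid rather than dropped into exact cells.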

Page 16: LINGTOUR: a PDA for  tourists

Refinement of images: results
Notable improvement:
– of visual quality
– of the character recognition rate

Intelligent camera

Page 17: LINGTOUR: a PDA for  tourists

Conversational agents: interest
• They make it possible [5] to convey information in a more attractive and more user-friendly manner than simple speech synthesis.
• Nonverbal expressions make it possible:
– to disambiguate the meaning of an utterance,
– to emphasize certain words or utterance fragments...
• They supply information at different levels:
– syntactic
– semantic
– emotional
• In a multicultural context, a visual demonstration can also be a better vehicle for teaching certain customs.

Cultural agents

[5] Pelachaud, C., Carofiglio, V., De Carolis, B. and de Rosis, F., Embodied Contextual Agent in Information Delivering Application, First Intl. Joint Conf. on Autonomous Agents & Multi-Agent Systems, Bologna, July 2002.

Page 18: LINGTOUR: a PDA for  tourists

« Greta »: facial animation engine
• Objective: an animated model capable of simulating, quickly and realistically, the dynamic aspects of the human face.
• Realization: a facial animation engine whose 3D model takes the form of a young woman.
• Greta is:
– the core of an MPEG-4 decoder
– compliant with the "Simple Facial Animation Object Profile" specifications of the standard
– capable of generating the structure of an original model, animating it, and rendering it in real time.

Cultural agents

Page 19: LINGTOUR: a PDA for  tourists

Adapt the conversational agents
• Porting animated agents to the PDA:
– The power and the screen size of the device are limited
– The complexity and the level of detail of the animation have to be adapted.
• Adaptation of the behaviour to users:
– In spite of recent advances in realism, current agents know only one type of behaviour, which often reflects Western culture.
– Cultural and social adaptation to the context: the same information must be delivered differently, for example to a French person and to a Chinese person, or to a journalist and to a private individual.

Cultural agents

Page 20: LINGTOUR: a PDA for  tourists

Conversational and cultural agents: semantic representation
• Base: a language-independent semantic representation, based on the XML-XSD standard.
– description of the communicative function of gestures and of the signals composing the gestures.
• An overlay of culture-specific attributes, which influence:
– the choice of a gesture (smile, or shake/nod of the head),
– the duration of a look...
More generally, these influences can concern:
– the definition of a signal (hiding of a signal by another),
– sound intensity,
– sound duration, etc.
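
The layering described above (a culture-neutral XML description plus a culture-specific overlay) might look like the sketch below. The element names, attribute names, culture labels and values are all hypothetical; the slides do not show the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical culture-neutral description of a communicative act.
BASE_ACT = """
<act function="greeting">
  <signal type="gaze" duration="1.0"/>
  <signal type="smile" intensity="0.5"/>
</act>
"""

# Hypothetical overlay: per culture, attribute overrides per signal type.
CULTURE_OVERLAY = {
    "culture_a": {"gaze": {"duration": "2.0"}},
    "culture_b": {"gaze": {"duration": "0.5"}, "smile": {"intensity": "0.8"}},
}


def adapt(xml_text, culture):
    """Parse the base description and apply one culture's overrides.

    Signals without an override keep their culture-neutral attributes,
    so an unknown culture simply yields the base behaviour.
    """
    root = ET.fromstring(xml_text.strip())
    overlay = CULTURE_OVERLAY.get(culture, {})
    for signal in root.iter("signal"):
        signal.attrib.update(overlay.get(signal.get("type"), {}))
    return root
```

Calling `adapt(BASE_ACT, "culture_b")` shortens the gaze and strengthens the smile while leaving the communicative function ("greeting") untouched, which is the point of keeping the two layers separate.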

Page 21: LINGTOUR: a PDA for  tourists

Conversational and cultural agents...
In certain cultures, not looking at one's interlocutor can be interpreted as a lack of attention or interest...
In other cultures, looking straight into someone's eyes can be interpreted as a form of aggression...

Cultural agents

Page 22: LINGTOUR: a PDA for  tourists

Results and what follows...
At the end of the work that this project has made it possible to initiate, we hope to be in a position to demonstrate:
• 1) the possibility of integrating on a mobile terminal (PDA, smartphone...) the various interfaces presented here:
– 2D control menus,
– capture and recognition of text,
– conversational agents.
• 2) the benefits of the improvements that we recommend for each of these functionalities:
– integration of vocal commands in the menus,
– refinement of images by spatio-temporal correlation,
– enrichment of the agents with cultural attributes.

Gesture and voice
Intelligent camera
Cultural agents

Page 23: LINGTOUR: a PDA for  tourists

To evaluate this work within the EURO-CHINA programme...
• Collaboration started with Peer2Phone (voice over IP via WiFi)
• Presentation at the end of April in Beijing
• A proposal with our Chinese partners for the Olympics in Beijing