eNTERFACE ’08 Project 2 “Multimodal High Level Data Integration” Final Report August 29th, 2008


eNTERFACE ’08 Project 2

“Multimodal High Level Data Integration”

Final Report

August 29th, 2008


Application challenges

• 2 users in their home/office environment

• unrestricted natural language

• free human behavior


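Components integrated

[Diagram: data passed between components is shown in parentheses]

• Audio Stream → (Sound Waves) → Speech Recognizer → (Recognized String) → Syntactic Analyzer → (Syntactic Triple) → Semantic Analyzer → (Linguistic meanings) → Fusion Mechanism

• Video Stream → (Sequence of Images) → Video Analyzer → (Movements Coordinates) → Human Behavior Analyzer → (Movements Meanings) → Fusion Mechanism

• Knowledge Base

• Fusion Mechanism → Advise People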


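[Diagram: the same pipeline with the tools used for each component; data passed between components is shown in parentheses]

• Audio Stream → (Sound Waves) → Sphinx-4 → (Recognized String) → C&C Parser → (Syntax Analysis) → C&C Boxer → (Linguistic meanings) → Fusion Mechanism

• Video Stream → (Sequence of Images) → OpenCV → (Movements Coordinates) → Human Behavior Analyzer → (Movements Meanings) → Fusion Mechanism

• Protégé, Jena: Semantic Validation

• Fusion Mechanism → Advise People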


Example Scenario

[Ronald] I want to call Nick. Nick mentioned that he attended a wine tasting course.

[Beto] It sounds interesting, I like wine.

[Ronald] Actually I plan to join the next class. He also mentioned a book about French wines, but I cannot recall the name of the author.

[Beto] Why don't you send a mail to Nick?

[Ronald] Maybe I can find a book about it in the library.

[Beto] Yes, you are right.

[Beto] Did you find it?

[Ronald] Yes, I did.


Hints for plan recognition by speech

Alerts:

want, need, wish, require, going to, plan, look for, wonder, can, may, must, do you know, do we have, etc.

Stop-alerts:

- negation (I am not going to…)

- past tense (Yesterday I was going to…)
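The alert and stop-alert hints above could be checked with a simple keyword scan. The following is only a minimal sketch, not the project's implementation: the function name `has_plan_alert` and the negation and past-tense cue lists are assumptions made for illustration, and a real system would rely on the parser output rather than on keywords.

```python
import re

# Alert expressions taken from the slide (the slide's list ends with "etc.").
ALERTS = ["want", "need", "wish", "require", "going to", "plan",
          "look for", "wonder", "can", "may", "must",
          "do you know", "do we have"]

# Hypothetical stop-alert cues: negation words and a few past-tense markers.
NEGATIONS = ["not", "never", "cannot"]
PAST_CUES = ["yesterday", "was", "were", "did", "had"]


def _contains(text, phrase):
    """True if phrase occurs in text as a whole word or whole phrase."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def has_plan_alert(utterance):
    """True if the utterance holds an alert that no stop-alert cancels."""
    text = utterance.lower()
    if not any(_contains(text, alert) for alert in ALERTS):
        return False
    # "n't" is checked as a substring because it is glued to the verb.
    negated = "n't" in text or any(_contains(text, n) for n in NEGATIONS)
    past = any(_contains(text, cue) for cue in PAST_CUES)
    return not (negated or past)
```

For instance, `has_plan_alert("I want to call Nick.")` is true under these assumptions, while "I am not going to call Nick." and "Yesterday I was going to call Nick." are both rejected by the stop-alerts.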


Maybe I can find a book about it in the library

Ronald is moving towards the book shelves
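A movement description like the one above could be derived from tracked coordinates and fixed reference objects in the room. The sketch below is an illustration under stated assumptions, not the project's video analysis: the anchor positions, the `CLOSE_RADIUS` threshold, and the function `describe_movement` are all hypothetical.

```python
import math

# Hypothetical 2D positions (in metres) of fixed anchor objects in the room.
ANCHORS = {"the book shelves": (4.0, 1.0), "the computer": (0.5, 3.0)}
CLOSE_RADIUS = 0.8  # assumed distance threshold for "is close to"


def describe_movement(person, previous, current):
    """Return a (person, relation, anchor) triple, or None if nothing applies.

    previous and current are two successive (x, y) positions of the person.
    """
    best = None  # (distance gained, anchor name) of the most approached anchor
    for name, position in ANCHORS.items():
        before = math.dist(previous, position)
        after = math.dist(current, position)
        if after <= CLOSE_RADIUS:
            return (person, "is close to", name)
        approach = before - after
        if approach > 0 and (best is None or approach > best[0]):
            best = (approach, name)
    if best is None:
        return None
    return (person, "is moving to", best[1])
```

With these made-up coordinates, a step from (1.0, 1.0) to (2.0, 1.0) yields the triple for "Ronald is moving to the book shelves".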


Decision making

If (Ronald) [wants to send] {email to Nick} &

(Ronald [is moving to] {the computer} | He [is close to] {the computer}) then

open the mail client with the “to” field filled with [email protected] 

If (Ronald) [can] find {book} [about] {it} [in] {the library} &

(Ronald [is moving to] {the library}) then

There is a book about French wines on the first shelf. 

If (Ronald) [can] find {book} [about] {it} [in] {the library} &

(Ronald [is moving to] {the computer}) then

Open a web search website and put the keyword in the search field.
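The rules above pair one statement recovered from speech with one observation recovered from video. A minimal sketch of such a rule table follows. It is an assumption-laden simplification rather than the project's fusion mechanism: the `Triple` class, the `RULES` table, and the `decide` function are hypothetical, and the multi-part speech pattern of the book rules is flattened into a single object string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


# Each rule: the speech triple, the movement triples that confirm it
# (any one is enough), and the action the system then takes.
RULES = [
    (Triple("Ronald", "wants to send", "email to Nick"),
     [Triple("Ronald", "is moving to", "the computer"),
      Triple("Ronald", "is close to", "the computer")],
     "open the mail client with the 'to' field filled in"),
    (Triple("Ronald", "can find", "book in the library"),
     [Triple("Ronald", "is moving to", "the library")],
     "say that a book about French wines is on the first shelf"),
    (Triple("Ronald", "can find", "book in the library"),
     [Triple("Ronald", "is moving to", "the computer")],
     "open a web search website with the keyword in the search field"),
]


def decide(speech, movement):
    """Return the action of the first rule matching both inputs, else None."""
    for wanted, confirmations, action in RULES:
        if speech == wanted and movement in confirmations:
            return action
    return None
```

The same spoken intention leads to different actions depending on where the speaker walks, which is the point of the second and third rules.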


Achievements

• spatial relationships (based on the fixed “anchor” objects in the room)

• semantic fusion of events not coinciding in time

• good results in speaker identification: synchronisation between image and speech identification

• an open framework to manage fusion between two (our case) or more modalities was created during the project and will be enhanced further

• each component can run on a separate machine, thanks to a distribution mechanism that exchanges data over a TCP/IP network
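Fusing events that do not coincide in time, as listed among the achievements, implies matching a spoken statement with a movement observed somewhat later. The sketch below shows one possible way to do this and is not the project's framework: the `Event` class, the `pair_events` function, the 30-second default window, and the requirement that the movement follows the speech are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    modality: str  # "speech" or "video"
    actor: str
    content: str
    time: float    # seconds since the start of the session


def pair_events(events, window=30.0):
    """Pair each speech event with the earliest video event by the same
    actor that occurs within `window` seconds after it."""
    speech = [e for e in events if e.modality == "speech"]
    video = [e for e in events if e.modality == "video"]
    pairs = []
    for s in speech:
        candidates = [v for v in video
                      if v.actor == s.actor and 0 <= v.time - s.time <= window]
        if candidates:
            pairs.append((s, min(candidates, key=lambda v: v.time)))
    return pairs
```

A speech event at 10 s and a movement by the same person at 25 s would be paired with the default window, but not with a 5-second window.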


Future work

• implement effective learning

• efficient decision making even from information fragments

• spatial relationships relative to moving people

• 3D video analysis

• detection of orientation of the people in the scene

• eye gaze tracking

• recognition of various types of gestures

• dealing with natural language redundancy (repeating the same idea in different words)


Further development of results

• integration on the OpenInterface platform (openinterface.org)

• create an open-source community around the project to

- gain ideas and contributions from outside

- have new modalities to fuse

• create a website, a forum, a mailing list