Fabrizio Giacomelli CEO Mediavoice Powered by speech engines

Speaky™ Media Center

SpeechTEK 2007

New York

August 20-23, 2007

Mediavoice presents Speaky Media Center

Mediavoice, founded in 2000, has the mission of developing innovative solutions based on state-of-the-art speech technology

The company has already developed three patented solutions

The solution we present here, and are about to launch on the market, is

Speaky Media Center

…a completely novel way to interact with the PC, its services and content

Speaky Media Center

Speaky Media Center is a new home computer, a new system that interacts with people in a natural way, simply using voice. It listens to the user, receives his/her voice commands and requests for information and services, and speaks the results back to the user

An innovative system that brings us into a new home-automation dimension, one that really places the customer, the person, at the center of the digital world, making digital services and content easier to access and use

An intelligent speech-based personal assistant that achieves digital convergence through a simple interface for everyone; a system that helps bridge the Digital Divide

Speaky Media Center

The patent-pending solution that Mediavoice is presenting is an add-on for PCs running the new Windows Vista operating system. It consists of the Speaky intelligent interface software and a special remote control that adds speech features to the typical functions of a Windows Media Center remote control

With Speaky Media Center you can interact with the PC naturally by voice to watch television, use the video recorder, search for a song and listen to your own music, view your own videos and photos, and browse the Internet. You can manage your house and all its systems, call a person by saying his/her name and speaking into the remote control as if it were a telephone, or play games, alone or in a group (for example role-playing games, speech quizzes …)

…all of this simply by pressing a remote control button and speaking into its microphone!

Speaky Media Center: sections

MUSIC, VIDEO, PHOTOS, DVD

TV, TV PROGRAM GUIDE, TV VIDEO RECORDER

TELEPHONE

HOME AUTOMATION

HOROSCOPE, WEATHER, GAMES

Speaky Media Center: the new Vista-compliant, Radio-Frequency remote control for data and voice, with microphone, Push-To-Talk button and USB receiver

MICROPHONE

SPEAKY PUSH-TO-TALK BUTTON

RADIO-FREQUENCY USB RECEIVER

Radio-Frequency technology overcomes the ‘directionality’ of infrared and guarantees a larger range for the remote control

Speaky Media Center: who is the user and how he/she is facilitated by Speaky

Speaky’s mission is to make interaction with the digital world easier for everyone, so that users can access digital content easily and quickly

Users who want to use a personal computer and its content will no longer need to memorize user guides, complex syntaxes, lists of menu options and impenetrable technical information

Speaky is also useful for blind and impaired people

Speaky Media Center: issues and solutions during design and development

i. Far or close mic?

ii. Speaker dependent or independent?

iii. Continuous recognition or triggered recognition?

iv. Fixed mic sensitivity or Automatic Gain Control use?

v. Infrared, Bluetooth or radio frequency communication?

vi. Directed dialog or natural language?

vii. How to speak international texts with one voice: phonetic mapping and language guesser

Speaky Media Center: i) Far or close mic?

A very close mic was chosen to get the best ASR performance: we put a mic on the remote control so that the user can give voice commands from a distance of a few centimetres

Speaky Media Center: ii) Speaker dependent or independent?

Speaky Media Center is first of all a ‘social/family solution’, so it must be open to all users and must be easy and ready to use without any training time. We therefore chose speaker independence

Speaky Media Center: iii) Continuous recognition or triggered recognition?

We use the Speaky button as a push-to-talk button: when the user presses it, the system sets the audio volume to zero and starts the ASR session, which stops when the user releases the Speaky button

In this way we get the best ASR performance because:

The ASR starts recognizing just before the user starts speaking

The ASR listens only to the user, because all the media are silenced while he/she is speaking
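
The push-to-talk behaviour described on this slide can be sketched as a small state machine. This is only an illustrative model, not Speaky's actual code: the class name, the method names and the event log are hypothetical, and restoring the previous volume on release is an assumption the slide does not state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PushToTalkController:
    """Toy model of the Speaky button; every name here is hypothetical."""
    media_volume: int = 70            # current playback volume, 0-100
    asr_active: bool = False          # True while an ASR session is open
    saved_volume: int = 0             # volume remembered while muted
    log: List[str] = field(default_factory=list)

    def press(self) -> None:
        """Button down: silence the media, then open the ASR session."""
        if self.asr_active:
            return                    # ignore repeated press events
        self.saved_volume = self.media_volume
        self.media_volume = 0         # the ASR should hear only the user
        self.asr_active = True
        self.log.append("asr_start")

    def release(self) -> None:
        """Button up: close the ASR session."""
        if not self.asr_active:
            return
        self.asr_active = False
        self.log.append("asr_stop")
        # Assumption: playback volume is restored after recognition.
        self.media_volume = self.saved_volume


ptt = PushToTalkController(media_volume=70)
ptt.press()      # user holds the button and speaks
ptt.release()    # user lets go, recognition ends
print(ptt.log, ptt.media_volume)
```

Muting before the session opens means the recognizer never receives the media audio, which is the point the slide makes about listening only to the user.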

Speaky Media Center: iv) Fixed mic sensitivity or Automatic Gain Control use?

With a fixed mic sensitivity, the user can speak only from a restricted range of distances from the mic. Instead:

With Automatic Gain Control (AGC), the mic sensitivity follows the speaker’s behaviour: if the speaker talks too loudly or too close to the mic, the sensitivity decreases; if the user speaks softly or from farther away, the sensitivity increases

With AGC the user is freer and may speak from a wider range of distances, so using Speaky feels more natural and ASR performance improves
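
The AGC idea can be illustrated with a minimal sketch that nudges a software gain toward a target loudness, one audio frame at a time. It is a generic textbook-style AGC, not the one used in the Speaky remote control; the function names, the target level, the adaptation rate and the gain limits are all assumed values.

```python
import math
from typing import Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def update_gain(gain: float, frame: Sequence[float],
                target_rms: float = 0.1, rate: float = 0.5,
                min_gain: float = 0.1, max_gain: float = 10.0) -> float:
    """Return the gain to use for the next frame (hypothetical parameters)."""
    level = rms(frame) * gain            # loudness after the current gain
    if level == 0.0:
        return gain                      # silence: leave the gain unchanged
    desired = gain * target_rms / level  # gain that would hit the target
    new_gain = gain + rate * (desired - gain)   # move part of the way there
    return max(min_gain, min(max_gain, new_gain))


loud = [0.8, -0.8, 0.8, -0.8]      # speaker too close or too loud
quiet = [0.01, -0.01, 0.01, -0.01]  # speaker far away or speaking softly
print(update_gain(1.0, loud) < 1.0)    # sensitivity decreases
print(update_gain(1.0, quiet) > 1.0)   # sensitivity increases
```

Moving only part of the way toward the desired gain on each frame avoids abrupt level jumps, at the cost of a slower reaction to sudden changes in speaking distance.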

Speaky Media Center: v) Infrared, Bluetooth or radio frequency communication?

There were many ways to make the remote control communicate with the personal computer:

Infrared forces the user to point the remote control at the receiver and to ‘see’ it

Bluetooth has a heavy software stack and is an expensive proprietary protocol

Instead:

Radio frequency passes through walls; it is radial, so the user can hold the remote control in any position; it is a low-consumption transmitting technology; and it is open to anyone

Speaky Media Center: vi) Directed dialog or natural language?

This may be the most important question to face when designing a speech personal assistant:

Natural language is really rich and complex, and it is still very difficult to model

Speaky’s scope is very large and rich, so the user may say really anything when speaking to it

Moreover, the user may pause and mumble while speaking, and this behaviour is very harmful to ASR performance

So:

We chose an ‘enriched directed dialog’: the user may say anything he/she sees on the screen. In addition, the system recognizes the most commonly used prefixes for each area, to make its understanding as wide as possible. This covers the cases in which users, although they know they can just say the directed commands, say ‘something around’ those commands (for example: let me hear, let me see, I want to listen to…)
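
A minimal sketch of the ‘enriched directed dialog’ idea follows. It works on already-recognized text, whereas a real system would more likely encode the same rule in the ASR grammar; the function names and the matching strategy are assumptions, and only the three example prefixes come from the slide.

```python
from typing import Iterable, Optional

# Carrier phrases quoted on the slide; a real list would be set per area.
PREFIXES = ("i want to listen to", "let me hear", "let me see")


def normalize(text: str) -> str:
    """Lower-case the text and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def match_command(utterance: str, on_screen: Iterable[str]) -> Optional[str]:
    """Return the on-screen label the user asked for, or None."""
    spoken = normalize(utterance)
    # Strip at most one known prefix, e.g. "let me hear music" -> "music".
    for prefix in PREFIXES:
        if spoken.startswith(prefix + " "):
            spoken = spoken[len(prefix) + 1:]
            break
    # Directed dialog: only labels currently visible on screen are valid.
    for label in on_screen:
        if normalize(label) == spoken:
            return label
    return None


screen = ["Music", "Photos", "Weather"]
print(match_command("Let me hear music", screen))
print(match_command("photos", screen))
print(match_command("play something nice", screen))
```

Restricting the valid targets to what is on screen keeps the recognition vocabulary small, while the prefix list absorbs the most common ways users wrap a command in a sentence.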

Speaky Media Center: vii) How to speak international texts with one voice: phonetic mapping and language guesser

In many Speaky areas the user may interact with international titles, for example in music, videos and TV programs

To let the user pronounce titles in all the different languages of his/her collection, we used a particular feature of Loquendo ASR named ‘phonetic mapping and language guesser’

With this innovative feature the system can guess the native language of a title and translate its phonemes from the original language into the current ASR language, so that one ASR (using one language) can recognize titles in many different languages
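
The two steps named on this slide, guessing a title's language and then mapping its phonemes into the ASR language, can be illustrated with a toy sketch. This does not reflect how the Loquendo feature is implemented: the cue-word lists, the phoneme table and both function names are invented for illustration, and real systems use far richer models.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cue words per language, only for this illustration.
CUES: Dict[str, Set[str]] = {
    "en": {"the", "of", "and", "you", "love"},
    "it": {"il", "la", "di", "che", "amore"},
    "fr": {"le", "les", "des", "est", "amour"},
}


def guess_language(title: str, default: str = "en") -> str:
    """Pick the language whose cue words occur most often in the title."""
    words = set(title.lower().split())
    scores = {lang: len(words & cues) for lang, cues in CUES.items()}
    best = max(scores, key=lambda lang: scores[lang])
    return best if scores[best] > 0 else default


# Hypothetical table: (source language, phoneme) -> nearest phoneme
# available in the language of the running ASR (here imagined as Italian).
PHONE_MAP: Dict[Tuple[str, str], str] = {
    ("en", "θ"): "t",
    ("en", "ð"): "d",
}


def map_phonemes(phonemes: List[str], source: str) -> List[str]:
    """Replace phonemes missing in the ASR language; keep the others."""
    return [PHONE_MAP.get((source, p), p) for p in phonemes]


title = "The Dark Side of the Moon"
lang = guess_language(title)
print(lang, map_phonemes(["ð", "e"], lang))
```

The mapped phoneme sequence is what a single-language recognizer could then match against, which is the reason one ASR can cover titles from several languages.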

Speaky Media Center: OPEN Speaky

Speaky Media Center is an open programming environment with an easy-to-use API for integrating any content or software provider quickly and simply

Speaky Media Center: Partner and First Customer

Loquendo is Speaky’s speech engine partner: BOOTH # 509, where you can try Speaky Media Center

Olidata, the first Italian PC vendor, is Speaky’s first customer

…we are looking for US commercial and technological partners!

Speaky Media Center: next steps…

More Speaky news is coming soon… Stay with us!

www.mediavoice.it