Fabrizio Giacomelli CEO Mediavoice Powered by speech engines
Speaky™ Media Center
SpeechTEK 2007
New York
August 20-23, 2007
Mediavoice presents Speaky Media Center
Mediavoice is a company founded in 2000 with the mission of developing innovative solutions based on state-of-the-art speech technology
The company has already developed 3 patented solutions
The solution we present here, and are about to launch on the market, is
Speaky Media Center
…a completely novel way to interact with the PC, its services and content
Speaky Media Center
Speaky Media Center is a new home computer, a new system that interacts with people in a natural way, simply by voice. It listens to the user, receives his/her voice commands and requests for information and services, and speaks the results back to the user
An innovative system that takes us into a new dimension of home automation, one that truly places the customer, the person, at the center of the digital world, making digital services and content easier to access and use
An intelligent speech-based personal assistant that achieves digital convergence through a simple interface for everyone; a system that helps bridge the Digital Divide
Speaky Media Center
This patent-pending solution from Mediavoice is an add-on for PCs running the new Windows Vista operating system. It consists of the intelligent interface software Speaky and a special remote control that adds speech features to the typical functions of a Windows Media Center remote control
With Speaky Media Center you can interact with the PC naturally by voice to watch television, use the video recorder, search for a song and listen to your own music, view your own videos and photos, and browse the Internet. You can manage your house and all its systems, call a person by saying his/her name and speaking into the remote control as if it were a telephone, or play games alone or in a group (for example role-playing games, speech quizzes …)
…all of this simply by pressing a button on the remote control and speaking into its microphone!
Speaky Media Center: sections
MUSIC, VIDEO, PHOTOS, DVD
TV, TV PROGRAM GUIDE, TV VIDEO RECORDER
TELEPHONE
HOME AUTOMATION
HOROSCOPE, WEATHER, GAMES
Speaky Media Center: the new Vista-compliant, Radio-Frequency remote control for data and voice, with microphone, Push-To-Talk button and USB receiver
MICROPHONE
SPEAKY PUSH-TO-TALK BUTTON
RADIO-FREQUENCY USB RECEIVER
Radio-Frequency technology overcomes
‘the directionality’ of infrared and
guarantees a longer range for the remote
control
Speaky Media Center: who the user is and how Speaky helps him/her
Speaky’s mission is to make interaction with the digital world easier for everyone,
so that the user can access digital content quickly and easily
The user who wants to use a personal computer and its content will no
longer need to memorize user guides, complex syntaxes, lists of menu
options and impenetrable technical information
Speaky is also useful for blind and impaired people
Speaky Media Center: issues and solutions during design and development
i. Far or close mic?
ii. Speaker dependent or independent?
iii. Continuous recognition or triggered recognition?
iv. Fixed mic sensitivity or Automatic Gain Control use?
v. Infrared, Bluetooth or radio frequency communication?
vi. Directed dialog or natural language?
vii. How to handle international texts with one voice: phonetic mapping and
language guesser
Speaky Media Center: i) Far or close mic?
A very close mic was chosen to get the best ASR performance: we put
a mic on the remote control so that the user can give voice commands from a
few centimetres away
Speaky Media Center: ii) Speaker dependent or independent?
Speaky Media Center is first of all a ‘social/family solution’, so it must be
open to all users and must be easy and ready to use without any training
time; for this reason we chose speaker independence
Speaky Media Center: iii) Continuous recognition or triggered recognition?
We use the Speaky button as a push-to-talk button: when the user presses
it, the system sets the audio volume to zero and starts the ASR session, which
is stopped when the user releases the Speaky button
In this way we get the best ASR performance because:
The ASR starts recognizing just before the user starts speaking
The ASR hears only the user, because all media are
silenced while he/she is speaking
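The push-to-talk flow described above can be sketched as a small state holder. This is only an illustration: the class and method names are hypothetical, and restoring the previous volume on release is an assumption, since the slide only says that the volume is set to zero while the button is held.

```python
from dataclasses import dataclass, field


@dataclass
class PushToTalkSession:
    """Toy model of the push-to-talk flow; all names are hypothetical."""

    media_volume: int = 70      # current playback volume, 0-100
    listening: bool = False     # True while the button is held down
    _saved_volume: int = 0      # volume to restore on release (assumption)
    frames: list = field(default_factory=list)

    def press(self) -> None:
        # Mute playback first, then open the ASR session, so the recognizer
        # is already listening when the user starts to speak.
        if self.listening:
            return
        self._saved_volume = self.media_volume
        self.media_volume = 0
        self.frames.clear()
        self.listening = True

    def feed(self, frame: bytes) -> None:
        # Audio is collected only while the button is held down.
        if self.listening:
            self.frames.append(frame)

    def release(self) -> bytes:
        # Close the session, restore the volume and return the captured
        # audio, which a real system would hand to the ASR engine.
        if not self.listening:
            return b""
        self.listening = False
        self.media_volume = self._saved_volume
        return b"".join(self.frames)
```

Audio fed before `press()` or after `release()` is ignored, which mirrors the point that the ASR listens only while the user is speaking.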
Speaky Media Center: iv) Fixed mic sensitivity or Automatic Gain Control use?
with a fixed mic sensitivity the user can speak only within a restricted range of
distances from the mic; instead:
using the Automatic Gain Control (AGC) feature means that mic sensitivity
follows the speaker’s behaviour: if the speaker speaks too loudly or too close to the
mic, the sensitivity decreases; if the user speaks from far away or quietly,
the mic sensitivity increases
with AGC the user is freer and may speak from a wider range of distances,
so using Speaky is more natural and ASR performance is better
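The AGC behaviour described above can be illustrated with a minimal software sketch. It is not the algorithm used in the Speaky remote control, which the slides do not describe; the function name, the RMS target, the smoothing rate and the gain limits are all assumptions chosen for illustration.

```python
import math


def agc(frames, target_rms=0.1, rate=0.5, min_gain=0.1, max_gain=10.0):
    """Apply a simple feedback AGC to frames of float samples in [-1, 1].

    Returns the gain-adjusted frames and the final gain. All parameter
    values are illustrative assumptions.
    """
    gain = 1.0
    adjusted = []
    for frame in frames:
        if frame:
            rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        else:
            rms = 0.0
        if rms > 0:
            # Gain that would bring this frame exactly to the target level.
            desired = target_rms / rms
            # Move part of the way there, so the gain changes smoothly.
            gain += rate * (desired - gain)
            gain = max(min_gain, min(max_gain, gain))
        adjusted.append([x * gain for x in frame])
    return adjusted, gain
```

A loud or close speaker produces a high RMS level, so the gain settles below 1; a quiet or distant speaker produces a low RMS level, so the gain rises, up to `max_gain`.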
Speaky Media Center: v) Infrared, Bluetooth or radio frequency communication?
there were several ways to make the remote control communicate with the
Personal Computer:
infrared forces the user to point the remote control at the receiver and to ‘see’ it
Bluetooth has a heavy software stack and is an expensive proprietary protocol;
Instead:
radio frequency passes through walls, it is omnidirectional so that the user can hold the
remote control in any position, it is a low-consumption transmission technology and it is
open to anyone
Speaky Media Center: vi) Directed dialog or natural language?
This may be the most important question to face when designing a
speech personal assistant
natural language is really rich and complex, and it is still very difficult to model
Speaky’s scope is very large and rich, so the user may say almost anything when speaking to it
moreover, the user may pause and mumble while speaking, and this behaviour
severely harms ASR performance
so:
We chose an ‘enriched directed dialog’: the user may say anything he/she sees on the
screen. In addition, the system recognizes the most common prefixes for each area, in
order to make its understanding as wide as possible. This covers the cases in which users,
although they know that they can just say the directed commands, say ‘something
around’ those commands (for example: let me hear, let me see, I want to listen to…)
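The idea of an ‘enriched directed dialog’ can be sketched on already-recognized text: accept anything shown on the screen, optionally preceded by a known prefix. This is an illustration only. A real system would express the same thing as an ASR grammar, and the function name and matching rules here are assumptions; only the three prefixes come from the slide.

```python
# Prefix examples taken from the slide; a real list would be per area.
PREFIXES = ("let me hear", "let me see", "i want to listen to")


def match_command(utterance, on_screen, prefixes=PREFIXES):
    """Return the on-screen item the utterance refers to, or None.

    `utterance` is recognized text, `on_screen` is the list of items
    currently visible. Matching is case-insensitive and ignores extra
    whitespace.
    """
    text = " ".join(utterance.lower().split())
    # Candidate 1: the utterance is exactly a directed command.
    candidates = [text]
    # Candidate 2: the utterance is "<prefix> <directed command>".
    for prefix in prefixes:
        if text.startswith(prefix + " "):
            candidates.append(text[len(prefix) + 1:])
    options = {item.lower(): item for item in on_screen}
    for candidate in candidates:
        if candidate in options:
            return options[candidate]
    return None
```

For example, with `["Abbey Road", "Music"]` on screen, both `"Abbey Road"` and `"let me hear Abbey Road"` resolve to the same item, while an utterance unrelated to the screen returns `None`.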
Speaky Media Center: vii) How to handle international texts with one voice: phonetic mapping and
language guesser
In many Speaky areas the user may interact with international titles, for
example in music, videos and TV programs
to let the user pronounce the titles in his/her collection, whatever their language,
we used a particular feature of Loquendo ASR named ‘phonetic mapping and
language guesser’
with this innovative feature the system is able to guess the native language of the
title and to map its phonemes from the original language to the current ASR
language, so that one ASR (using one language) can recognize titles in many different
languages
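The two steps named above, guessing the language of a title and mapping its phonemes onto the ASR language, can be illustrated with a toy sketch. It does not use or reproduce the Loquendo feature, whose interface is not shown in the slides: the word hints, the substitution table and both function names are hypothetical, and Italian is assumed as the ASR language.

```python
# Hypothetical hint words per language; a real guesser would be statistical.
HINTS = {
    "en": {"the", "of", "and", "love"},
    "it": {"il", "la", "di", "che", "amore"},
    "fr": {"le", "les", "des", "et"},
}

# Hypothetical nearest-phoneme substitutions into Italian, keyed by
# (source language, phoneme). An empty string drops the phoneme.
TO_ITALIAN = {
    ("en", "θ"): "t",
    ("en", "ð"): "d",
    ("en", "h"): "",
    ("fr", "y"): "u",
}


def guess_language(title, default="it"):
    """Guess a title's language by counting hint words; fall back to default."""
    words = set(title.lower().split())
    scores = {lang: len(words & hints) for lang, hints in HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def map_phonemes(phonemes, source_lang, table=TO_ITALIAN):
    """Replace phonemes missing from the ASR language with close substitutes."""
    mapped = (table.get((source_lang, p), p) for p in phonemes)
    return [p for p in mapped if p]
```

With these tables, a title such as "The Wall" is guessed as English, and its phonemes that do not exist in Italian are replaced or dropped before being added to the single-language recognition vocabulary.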
Speaky Media Center: OPEN Speaky
Speaky Media Center is an open programming environment with an easy-to-use
API, which lets any content or software provider be integrated quickly and simply
Speaky Media Center: Partner and First Customer
Loquendo is Speaky’s speech engine partner: BOOTH # 509, where you
can try Speaky Media Center
Olidata, the first Italian PC vendor, is Speaky’s first customer
…we are looking for US commercial and technology partners!