Behrooz Chitsaz (Microsoft Research), Lorrie Apple Johnson (U.S. Department of Energy)


Speech Applications: speech as interface and speech as 1st class content. Topics: mobile access, directory services, automation, PC application, web service, text input, dictation, indexing, search, keyword extraction, transcription, meetings, voicemails, closed captions, translation, translating phone.


Page 2:

Multimedia Research

• Speech search
• Face identification
• Object recognition
• Video browsing
• Semantic extraction
• (3D) Segmentation
• (3D) Image search

Page 3:

Speech Applications

• Speech as interface
• Speech as 1st class content

Page 4:

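Speech recognition

• Example utterance: “Hello World”
• Spectral analysis produces the observations $o_1 \ldots o_T$
• Matching (decoding): time alignment, most likely hypothesis

$$\hat{W} = \arg\max_{w_1 \ldots w_N} \; p(o_1 \ldots o_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)$$

• Acoustic models: $p(o_1 \ldots o_T \mid \text{phoneme})$
• Dictionary: $P(\text{phonemes} \mid w)$
• Grammar (language model): $P(w_1 \ldots w_N)$
• Output: the most likely word sequence $\hat{W} = (w_1 \ldots w_N)$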
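The decoding rule on this slide can be illustrated with a toy example. The sketch below is not the recognizer used by MAVIS: the three candidate word sequences and their scores are invented for illustration, with `acoustic` standing in for p(o1..oT|w1..wN) and `language` standing in for P(w1..wN). It only shows how the argmax combines the two models.

```python
import math

# Hypothetical scores for one fixed audio clip; a real recognizer derives
# these from trained acoustic and language models.
# acoustic[words] stands in for p(o_1..o_T | w_1..w_N).
acoustic = {
    ("hello", "world"): 1e-4,
    ("hello", "word"): 2e-4,
    ("yellow", "world"): 5e-5,
}
# language[words] stands in for P(w_1..w_N).
language = {
    ("hello", "world"): 1e-2,
    ("hello", "word"): 1e-3,
    ("yellow", "world"): 1e-4,
}


def decode(acoustic, language):
    """Return the hypothesis maximizing p(o|w) * P(w), scored in log space."""
    def score(words):
        return math.log(acoustic[words]) + math.log(language[words])

    return max(acoustic, key=score)


best = decode(acoustic, language)
print(" ".join(best))
```

In this toy data "hello word" has the best acoustic score on its own, but the language model prior makes "hello world" the overall winner, which is the effect the product in the formula is meant to capture.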

Page 5:

MAVIS technology

• Indexing automatic transcripts as text
  – Automatic transcription accuracy is only 50-80%
• MAVIS techniques
  – Word-level lattice indexing
    • Index word alternatives: robust to recognizer errors
    • 50-140% accuracy improvement
    • Index timing: navigate to exact point in video
  – Vocabulary adaptation
    • Use NLP and Bing Search to expand word dictionary
  – Automatic keywords to expose to search engines
    • Enables discovery of speech content through search engines
    • By-product of vocabulary adaptation
  – See http://research.microsoft.com/mavis
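To make the idea of indexing word alternatives with timing concrete, here is a minimal sketch. It is not the MAVIS implementation: the lattice layout (a list of time slots, each with alternative words and posterior probabilities), the `min_posterior` threshold, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical lattice: each entry is (start_seconds, [(word, posterior), ...]).
# A 1-best transcript would keep only the first alternative in each slot.
lattice = [
    (12.0, [("nuclear", 0.55), ("unclear", 0.40)]),
    (12.6, [("fusion", 0.70), ("fission", 0.25)]),
    (13.1, [("reactor", 0.90)]),
]


def build_index(lattice, min_posterior=0.1):
    """Map every sufficiently likely word alternative to (time, posterior) hits."""
    index = defaultdict(list)
    for start, alternatives in lattice:
        for word, posterior in alternatives:
            if posterior >= min_posterior:
                index[word].append((start, posterior))
    return index


def search(index, term):
    """Return hits for a term, most confident first."""
    return sorted(index.get(term.lower(), []), key=lambda hit: -hit[1])


index = build_index(lattice)
for start, posterior in search(index, "fission"):
    print(f"'fission' at {start:.1f}s (posterior {posterior:.2f})")
```

A text index built from the 1-best transcript alone would contain "fusion" but not "fission", so a query for the second word would find nothing. Keeping the alternatives trades some precision for recall, and the stored start time is what lets a player jump to the matching point in the video.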

Page 6:

MAVIS Architecture

Components: SQL Server(s), Web server(s)

1. Submit audio/video RSS
2. Retrieve AIB
3. Import AIB in SQL
4. Search/Retrieve results

• Store content to be processed in temporary Azure storage
• Do vocabulary adaptation using Bing
• Run recognition engine on content
• Store results of recognition process (AIB)
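Step 1 of the workflow submits content as an RSS feed. The slides do not show the feed layout, so the sketch below assumes a plain RSS 2.0 feed in which each item carries its media file in an `enclosure` element; the feed text, URLs, and the `media_items` helper are hypothetical. It only illustrates how metadata and media URLs could be pulled from such a feed before processing.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; the real feeds used with MAVIS are not shown in the slides.
FEED = """<rss version="2.0">
  <channel>
    <title>Example lab videos</title>
    <item>
      <title>Talk on fusion energy</title>
      <description>Hypothetical lecture used for illustration.</description>
      <enclosure url="https://example.org/videos/fusion.mp4" type="video/mp4" length="0"/>
    </item>
    <item>
      <title>Item without media</title>
    </item>
  </channel>
</rss>"""


def media_items(feed_xml):
    """Return title, description, and media URL for each item that has an enclosure."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # nothing to transcribe for this item
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "url": enclosure.get("url"),
        })
    return items


for entry in media_items(FEED):
    print(entry["title"], "->", entry["url"])
```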

Page 7:

U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Mission

• DOE invests > $10 billion/year in basic sciences, clean energy technology, and nuclear research.

• The immediate output from this investment is Information…Knowledge… R&D results

• OSTI’s mission is to accelerate scientific progress by accelerating access to this information.

Page 8:

OSTI’s Core Products

• Information Bridge

• Science Accelerator

• Science.gov

Page 9:

WorldWideScience.org

Page 10:

Emerging Forms of Scientific Information Require New Tools

• Numeric data, multimedia, and social media are emerging forms of scientific information

• Each form presents special opportunities and challenges

Page 11:

Search and Retrieval Challenges with Multimedia Science Information

• Lack of written transcripts, i.e. no “full text” to search
• Metadata, if available, is often minimal
• Scientific, technical, and medical terminology/vocabulary
• Videos can be long, often up to an hour or more

Page 12:

OSTI and Microsoft Research Partnership

• Video files collected from DOE’s National Laboratories
• RSS feeds with metadata and URLs sent to Microsoft Research
• Audio indexing performed via MAVIS
• Audio index blob (AIB) returned to OSTI and integrated with SQL servers
• Users can search for a precise term within the video, and be directed to the exact point in the video where the term was spoken
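The last two points of this slide, importing the index into SQL and directing users to the exact point in a video, can be sketched as follows. The AIB format is not described in these slides, so the example assumes the index has already been decoded into rows of (video URL, word, start time, confidence). The table layout is hypothetical, and SQLite stands in for SQL Server so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical decoded index rows: (video_url, word, start_seconds, confidence).
ROWS = [
    ("https://example.org/videos/fusion.mp4", "plasma", 75.2, 0.91),
    ("https://example.org/videos/fusion.mp4", "tokamak", 312.8, 0.64),
    ("https://example.org/videos/solar.mp4", "plasma", 41.0, 0.33),
]


def load(rows):
    """Create an in-memory database and import the index rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits ("
        "video_url TEXT, word TEXT, start_seconds REAL, confidence REAL)"
    )
    conn.execute("CREATE INDEX hits_word ON hits (word)")
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    return conn


def find(conn, term):
    """Return (video_url, start_seconds) for a term, most confident hit first."""
    cursor = conn.execute(
        "SELECT video_url, start_seconds FROM hits "
        "WHERE word = ? ORDER BY confidence DESC",
        (term.lower(),),
    )
    return cursor.fetchall()


conn = load(ROWS)
for url, start in find(conn, "plasma"):
    print(f"{url} at {start:.1f}s")
```

A web front end could pass the returned start time to its video player so playback begins where the term was spoken.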

Page 13:

Demonstration of ScienceCinema

Page 14:

Looking to the Future

• Additional content from DOE researchers
• Integration of multimedia searches into WorldWideScience.org by June
• High quality automatic closed captions
• Multilingual translation capabilities