114
Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard

Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Speech and Music Retrieval

LBSC 796/INFM 718R

Session 11, April 20, 2011

Douglas W. Oard

Page 2: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Audio Indexing

• Music

• Sounds

• Speech

Page 3: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Encoding Audio

• Nyquist theorem: sampling rate

• Channel characteristics

– Bandpass

– Amplitude resolution

• Compression

– Lossless

– Perceptually-based lossy (e.g., .mp3)

Page 4: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

User-Centered Music Retrieval

• User types:

– Casual

• name that tune, playlist generation, query by humming, …

– Professional

• Instrument detection, timbre, …

– Scholar

• Content types:

– Score

– Performance

Page 5: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Shazam

• Spectrogram

– Time vs. frequency plot of intensity

• Peak filter

– Local maxima of intensity

• Difference coding

– Rate of frequency change over time

• Convolution

– Sliding correlation

Page 6: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Muscle Fish Audio Retrieval

• Compute 4 acoustic features for each time slice

– Pitch, amplitude, brightness, bandwidth

• Segment at major discontinuities

– Find average, variance, and smoothness of segments

• Store pointers to segments in 13 sorted lists

– Use a commercial database for proximity matching

• 4 features, 3 parameters for each, plus duration

– Then rank order using statistical classification

• Display file name and audio

Page 7: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

New Zealand Melody Index

• Index musical tunes as contour patterns

– Rising, descending, and repeated pitch

– Note duration as a measure of rhythm

• Users sing queries using words or la, da, …

– Pitch tracking accommodates off-key queries

• Rank order using approximate string match

– Insert, delete, substitute, consolidate, fragment

• Display title, sheet music, and audio

Page 8: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Contour Matching Example

• “Three Blind Mice” is indexed as:

– *DDUDDUDRDUDRD

• * represents the first note

• D represents a descending pitch (U is ascending)

• R represents a repetition (detectable split, same pitch)

• My singing produces:

– *DDUDDUDRRUDRR

• Approximate string match finds 2 substitutions

Page 9: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 10: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 11: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Spoken Word Collections

• Broadcast programming

– News, interview, talk radio, sports, entertainment

• Scripted stories

– Books on tape, poetry reading, theater

• Spontaneous storytelling

– Oral history, folklore

• Incidental recording

– Speeches, oral arguments, meetings, phone calls

Page 12: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Some Statistics

~250,000 hours of oral history in British Library

>4,000 radio stations webcasting

~35 million hours per year

~100 billion hours of phone calls each year

Page 13: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Economics of the WebText in 1995 Speech in 2003

Storage(words per $)

300K 1.5M

Internet Backbone(simultaneous users)

250K 30M

Modem Capacity(% utilization)

100% 20%

Display Capability(% US population)

10% 38%

Search Systems Lycos

Yahoo

SpeechBot

SingingFish

Page 14: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 15: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Speech Retrieval Approaches

• Controlled vocabulary indexing

• Ranked retrieval based on associated text

Automatic feature-based indexing

• Social filtering based on other users’ ratings

Page 16: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Supporting Information Access

Source

Selection

Search

Query

Selection

Ranked List

Examination

Recording

Delivery

Recording

Query

Formulation

Search System

Query Reformulation

and

Relevance Feedback

Source

Reselection

Page 17: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Description Strategies

• Transcription

– Manual transcription (with optional post-editing)

• Annotation

– Manually assign descriptors to points in a recording

– Recommender systems (ratings, link analysis, …)

• Associated materials

– Interviewer’s notes, speech scripts, producer’s logs

• Automatic

– Create access points with automatic speech processing

Page 18: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Detectable Speech Features

• Content

– Phonemes, one-best word recognition, n-best

• Identity

– Speaker identification, speaker segmentation

• Language

– Language, dialect, accent

• Other measurable parameters

– Time, duration, channel, environment

Page 19: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

How Speech Recognition Works

• Three stages

– What sounds were made?

• Convert from waveform to subword units (phonemes)

– How could the sounds be grouped into words?

• Identify the most probable word segmentation points

– Which of the possible words were spoken?

• Based on likelihood of possible multiword sequences

• These can be learned from “training data”

– Hill-climbing sequence models

• (a “Hidden Markov Model”)

Page 20: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Using Speech Recognition

Phone

Detection

Word

Construction

Word

Selection

Phone

n-grams

Phone

lattice

Words

Transcription

dictionary

Language

model

One-best

transcript

Word

lattice

Page 21: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Phone Lattice

Page 22: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Phoneme Trigrams

• Manage -> m ae n ih jh

– Dictionaries provide accurate transcriptions

• But valid only for a single accent and dialect

– Rule-base transcription handles unknown words

• Index every overlapping 3-phoneme sequence

– m ae n

– ae n ih

– n ih jh

Page 23: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Cambridge Video Mail Retrieval

• Translate queries to phonemes with dictionary

– Skip stopwords and words with 3 phonemes

• Find no-overlap matches in the lattice

– Queries take about 4 seconds per hour of material

• Vector space exact word match

– No morphological variations checked

– Normalize using most probable phoneme sequence

• Select from a ranked list of subject lines

Page 24: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 25: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 26: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Key Results from TREC/TDT

• Recognition and retrieval can be decomposed

– Word recognition/retrieval works well in English

• Retrieval is robust with recognition errors

– Up to 40% word error rate is tolerable

• Retrieval is robust with segmentation errors

– Vocabulary shift/pauses provide strong cues

Page 27: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Competing Demands on the Interface

• Query must result in a manageable set

– But users prefer simple query interfaces

• Selection interface must show several segments

– Representations must be compact, but informative

• Rapid examination should be possible

– But complete access to the recordings is desirable

Page 28: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

BBN Radio News Retrieval

Page 29: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

AT&T Radio News Retrieval

Page 30: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

MIT “Speech Skimmer”

Page 31: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Comparison with Text Retrieval

• Detection is harder

– Speech recognition errors

• Selection is harder

– Date and time are not very informative

• Examination is harder

– Linear medium is hard to browse

– Arbitrary segments produce unnatural breaks

Page 32: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

A Richer View of Speech

• Speaker identification

– Known speaker and “more like this” searches

– Gender detection for search and browsing

• Topic segmentation

– Vocabulary shift, cue words

– More natural breakpoints for browsing

• Speaker segmentation

– Visualize turn-taking behavior for browsing

– Classify turn-taking patterns for searching

Page 33: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Speaker Identification

• Gender

– Classify speakers as male or female

• Identity

– Detect speech samples from same speaker

– To assign a name, need a known training sample

• Speaker segmentation

– Identify speaker changes

– Count number of speakers

Page 34: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Visualizing Turn-Taking

Page 35: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Broadcast News Retrieval Study

• NPR Online

Manually prepared transcripts

Human cataloging

• SpeechBot

Automatic Speech Recognition

Automatic indexing

Page 36: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

NPR Online

Page 37: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 38: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

SpeechBot

Page 39: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 40: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Study Design

• Seminar on visual and sound materials

– Recruited 5 students

• After training, we provided 2 topics

– 3 searched NPR Online, 2 searched SpeechBot

• All then tried both systems with a 3rd topic

– Each choosing their own topic

• Rich data collection

– Observation, think aloud, semi-structured interview

• Model-guided inductive analysis

– Coded to the model with QSR NVivo

Page 41: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Criterion-Attribute Framework

Relevance

Criteria

Associated Attributes

NPR Online SpeechBot

Topicality

Story Type

Authority

Story title

Brief summary

Audio

Detailed summary

Speaker name

Audio

Detailed summary

Short summary

Story title

Program title

Speaker name

Speaker’s affiliation

Detailed summary

Brief summary

Audio

Highlighted terms

Audio

Program title

Page 42: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Lessons Learned from the

MALACH Project

Douglas W. Oard

College of Information Studies and

Institute for Advanced Computer Studies

Applying New Technologies to Improve Intellectual Access

to Large Oral History Collections

Page 43: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Outline

• The MALACH Project

• Learning from our users

• Evaluating new access technologies

• What’s left to do?

Page 44: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Indexing Options

• Transcript-based (e.g., NASA)

– Manual transcription, editing by interviewee

• Thesaurus-based (e.g., Shoah Foundation)

– Manually assign descriptors to points in interview

• Catalog-based (e.g., British Library)

– Catalog record created from interviewer’s notes

• Speech-based (MALACH)

– Create access points with speech processing

Page 45: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Supporting the Search Process

Source

Selection

Search

Query

Selection

Ranked List

Examination

Recording

Delivery

Recording

Query

Formulation

Search System

Query Reformulation

and

Relevance Feedback

Source

Reselection

• Speech Processing

• Computational Linguistics

• Information Retrieval

• Information Seeking

• Human-Computer Interaction

• Digital Archives

Page 46: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

The MALACH Project

Speech

Technology

English

Czech

Russian

Slovak

Language

Technology

Topic Segmentation

Categorization

Extraction

Translation

Interactive

Search

Systems

Interface Development

User Studies

Search

Technology

Test Collection

Page 47: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

The MALACH/CLEF Team

• USC (Shoah Foundation)

– Sam Gustman

• IBM TJ Watson

– Bhuvana Ramabhadran

– Martin Franz

• U. Maryland

– Doug Oard

– Dagobert Soergel

• Johns Hopkins

– Zak Schafran

• U. Cambridge (UK)

– Bill Byrne

• Charles University (CZ)

– Jan Hajic

– Pavel Pecina

• U. West Bohemia (CZ)

– Josef Psutka

– Pavel Ircing

• Dublin City University (IE)

– Gareth Jones

• Budapest U Tech+Econ (HU)

– Tibor Fegyo

– Peter Mihajlik

USA Europe

• IBM India

– Nanda Kambhatha

Asia

Page 48: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

• Spontaneous conversational speech

• Digitized

• Large

– 116,000 hours; 52,000 people; 32 languages

• Full-description segment indexing

– 10,000 hours, ~150,000 topical segments

• Real users

Page 49: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Interview Excerpt

• Content characteristics

– Domain-specific terms

– Named entities

• Audio characteristics

– Accented (this one is unusually clear)

– Two channels (interviewer / interviewee)

• Dialog structure

– Interviewers have different styles

Page 50: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Outline

• The MALACH Project

Learning from our users

• Evaluating new access technologies

• What’s left to do?

Page 51: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Who Uses the Collection?

• History

• Linguistics

• Journalism

• Material culture

• Education

• Psychology

• Political science

• Law enforcement

• Book

• Documentary film

• Research paper

• CDROM

• Study guide

• Obituary

• Evidence

• Personal use

Discipline Products

Based on analysis of 280 access requests

Page 52: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

2002-2003 Observational Studies

8 independent searchers

– Holocaust studies (2)

– German Studies

– History/Political Science

– Ethnography

– Sociology

– Documentary producer

– High school teacher

8 teamed searchers

– All high school teachers

Thesaurus-based search

Rich data collection

– Intermediary interaction

– Semi-structured interviews

– Observer’s notes

– Participant notes

– Screen capture

Qualitative analysis

– Theory-guided coding

– Abductive reasoning

Page 53: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music
Page 54: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Scholars: Relevance Criteria

Relevance Judgment Query Reformulation

Information

Need Query

S1

G1 A1

Situation n

Gap n Action n

Browse Only

- Accessibility (6%)

- Richness (6%)

- Emotion (3%)

- Duration (2%)

- Miscellaneous (3%)

Query & Browse

- Topicality (76%)

- Comprehensibility (2%)

- Novelty of content (1%)

- Acquaintance (1%)

Page 55: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Scholars: Topic Types

0 20 40 60 80 100 120 140

Object

Time Frame

Organization/Group

Subject

Event/Experience

Place

Person

Total mentions

6 Scholars, 1 teacher, 1 movie producer, working individually

Page 56: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Basis for Query Reformulation

• Searcher’s prior knowledge (31%)

• Viewing an interview or segment (20%)

• Help from an intermediary (15%)

• Thesaurus (13%)

• Assigned descriptors (12%)

• Pre-interview questionnaire (6%)

• Result set size (4%)

Total

Of 40%

From

Classification

Page 57: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Teachers: Search Strategies• Scenario-guided searching

– Look for experiences that would reflect themes

• Proxy searching

– Descriptors & interview questions

• Browsing

– Listened to testimonies for inspiration

– Used thesaurus for ideas and orientation

Page 58: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Iterating Searching & Planning“Yesterday in my search, I just felt like I was kind of going around in the dark. But that productive writing session really directed my search.”

Clarify themes &

define lessonsSearch & view

testimony

Group discussions clarify themes and

define activities, which hone teachers’

criteria

Testimonies give teachers ideas on

what to discuss in the classroom (topic)

and how to introduce it (activity).

“We actually looked at the testimonies that we found before we started writing the lesson ... We really started with the testimony and built our lesson around it.”

Page 59: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Teachers: Relevance Criteria

• Relevant to teaching content/ method

– A: Relationship to theme

– B: As part of broader curriculum

– C: Characteristics of the story

– D: Relationship of story to student

– E: Represents different populations

– F: Characteristics of Oral History

• Appropriateness

– Developmental

– Acceptability to stakeholders

• Length-to-contribution ratio

• Technical production quality

• B: Relates to other schoolwork

• B: Variety for the classroom

• B: Vocabulary

• C: Positive Message for Students

• C: Role of Interviewee in Holocaust Events

• D: Students connect with passage

• D: Students identify with interviewee

• D: Radical difference from students’ reality

• E: Age of interviewee at time of events

• E: Race

• F: Expressive power

• F: Language & verbal expression

• F: Nonverbal Communication

• F: Diction

• F: Flow of interview

Page 60: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Tie Into Curriculum

• Relates to other school work

• Suggests:

– Query by published example

–Using vocabulary lists to prime ASR

“I am dealing with the Holocaust through

literature. You know, Night, Anne Frank’s

diaries … I would very much like to see

[segments] that specifically relate to

various pieces of literature.”

Page 61: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Linking Students to a Story

• Students identify with interviewee

• Suggests

– Demographic search constraints

– Indexing characteristics, not just words

“Here she is, this older lady, but she

became a 16-year-old girl when she was

talking. I saw her in her school uniform

on the day of graduation. She was so

proud of the way she looked, and she felt

so sophisticated.”

Page 62: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

2006 Observational Study (Teachers)

Curriculum

& Standards

Objectives

Lesson Plan

Topics

Search

Topics

Segments for

Students

Develo

pin

g R

ele

vance C

rite

ria

Apply

Rele

vance

Crite

ria &

Sele

ct S

egm

ents

U.S. History – Civil Rights

“make the connection- there are survivors …

who went on to teach college classes in black

schools in the south”

Holocaust experience for Jews & other

victim groups; Racial and ethnic

discrimination in U.S.

African American

soldiers; Survivor

perspectives on Civil

Rights movement

Page 63: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

User Study Limitations

• Generalization

- Exploratory rather than comparative

- Results influenced by system and domain

• Limited number of study participants

- Insufficient to observe group differences

• Lack of prior experience with the system

- Experienced users may behave differently

Page 64: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Outline

• The MALACH Project

• Learning from our users

Evaluating new access technologies

• What’s left to do?

Page 65: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

0

10

20

30

40

50

60

70

80

90

100

Jan-

02

Jul-

02

Jan-

03

Jul-

03

Jan-

04

Jul-

04

Jan-

05

Jul-

05

Jan-

06

En

glis

h W

ord

Err

or

Ra

te (

%)

English Transcription Accuracy

Training: 200 hours from 800 speakers

ASR2003AASR2004A

ASR2006A

Page 66: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Speech Retrieval Evaluations

• 1996-1998: TREC SDR

– EN broadcast news / EN queries

• 1997-2004: TDT

– EN+CN+AR broadcast news / Query by example

• 2003-2004: CLEF CL-SDR

– EN broadcast news / Many query languages

• 2005-2007: CLEF CL-SR

– EN+CZ interviews / Many query languages

Page 67: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Automatic

SearchBoundary

Detection

Interactive

Selection

Content

Tagging

Speech

Recognition

Query

Formulation

English Test Collection Design

Page 68: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

English Test Collection Design

Automatic

SearchBoundary

Detection

Speech

Recognition

Query

Formulation

Topic Statements

Ranked Lists

EvaluationRelevance

Judgments

Mean Average Precision

Interviews

Content

Tagging

Manual: Topic boundaries

Manual: ~5 Thesaurus labels

Synonyms/broader terms

Person names

3-sentence summaries

Automatic: Thesaurus labels

Synonyms/broader terms

Automatic: 25% interview-tuned

40% domain-tuned

Training: 63 topics

Evaluation: 33 topics

Page 69: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

2006/07 CL-SR English Collection

• 8,104 topically-coherent segments

– 297 English interviews

– Known-boundary condition (“Segments”)

– Average 503 words/segment

• 96 topics

– Title / Description / Narrative

– 63 training + 33 evaluation

– 6 topic languages (CZ, DE, EN, FR, NL, SP)

• Relevance judgments

– Search-guided + “highly ranked” (pooled)

• Distributed to track participants by ELDA

Page 70: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

<DOC>

<DOCNO>

<INTERVIEWDATA>

<NAME>

<MANUALKEYWORD>

<SUMMARY>

<ASRTEXT2003A>

<ASRTEXT2004A>

<ASRTEXT2006A>

<ASRTEXT2006B>

<AUTOKEYWORD2004A1>

<AUTOKEYWORD2004A2>

Page 71: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Supplementary Resources

• Thesaurus (Included in the test collection)

~3,000 core concepts

• Plus alternate vocabulary + standard combinations

~30,000 location-time pairs, with lat/long

Is-a, part-whole, “entry vocabulary”

• Digitized speech

.mp2 or .mp3

• In-domain expansion collection (MALACH)

186,000 scratchpad + 3-sentence summaries

Page 72: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Manually Assigned Thesaurus Terms

Num

ber

of A

ssig

ned T

hesau

rus T

erm

s

Mean

End of

Interview

Mean

Page 73: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Sequence-Based ClassificationTemporal Label Weights (TLW)

(Based on Absolute Position)

ASR ASR ASR ASR

Seg 1: “I was born in Berlin …”

Seg 2: “My playmates included …”

Seg 3: “Later, we moved to Munich …”

Seg 4: “That’s when things started to …”

Seg 1 Seg 2 Seg 3 Seg 4

Thesaurus Terms

Time-Shifted Classification (TSC)

(Based on Relative Position)

Page 74: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Sequence Modeling Results

0.0

0.1

0.2

0.3

Geo-Time Geo-Time

"Term"

parent

Geo-Time

"Type"

parent

Concept Concepts

"Term"

parent

Concept

"Type"

parent

Clip

pe

d R

-Pre

cis

ion

kNN kNN+TSC kNN+TLW kNN+Both

(is-a)(part-whole)(is-a)(part-whole)

Page 75: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

An English Topic

Number: 1148

Title: Jewish resistance in Europe

Description:

Provide testimonies or describe actions of Jewish resistance in Europe

before and during the war.

Narrative:

The relevant material should describe actions of only- or mostly Jewish

resistance in Europe. Both individual and group-based actions are relevant.

Type of actions may include survival (fleeing, hiding, saving children),

testifying (alerting the outside world, writing, hiding testimonies), fighting

(partisans, uprising, political security) Information about undifferentiated

resistance groups is not relevant.

Page 76: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

The 118 CL-SR Topics

38

2005 English

Training

25

2005 English

Evaluation

33

2006/07 English

Evaluation

9

Unused

13

Czech

Safety

1

2006 Czech

Evaluation

+

3

2007 Czech

Evaluation

28

2006 Czech

Evaluation

+

41

2007 Czech

Evaluation

Page 77: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

5-level Relevance Judgments

• “Classic” relevance (to “food in Auschwitz”)

Direct Knew food was sometimes withheld

Indirect Saw undernourished people

• Additional relevance types

Context Intensity of manual labor

Comparison Food situation in a different camp

Pointer Mention of a study on the subject

Binary qrels

Page 78: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

English Assessment Process• Search-guided

– Iterate topic research/query formulation/judging

• Grad students in History, trained by librarians

– Essential for collection reuse with future ASR

– Done between topic release and submission

• Highly-ranked (=“Pooled”)

– Same assessors (usually same individual)

– Pools formed from two systems per team

• Top-ranked documents (typically 50)

• Chosen in order recommended by each team

• Omitting segments with search-guided judgments

Page 79: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Quality Assurance

• 14 topics independently assessed

– 0.63 topic-averaged kappa (over all judgments)

– 44% topic-averaged overlap (relevant judgments)

– Assessors later met to adjudicate

• 14 topics assessed and then reviewed

– Decisions of the reviewer were final

Page 80: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Assessor Agreement (2004)

0.0

0.2

0.4

0.6

0.8

1.0

Overall Direct Indirect Context Comparison Pointer

Relevance Type

Kap

pa

14 topics, 4 assessors in 6 pairings, 1806 judgments

(2122) (1592) (184) (775) (283) (235)

44% topic-averaged overlap for Direct+Indirect 2/3/4 judgments

Page 81: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

0

200

400

600

800

1000

1200

1400

16002

36

12

40

01

87

71

85

02

26

42

05

52

36

71

97

92

23

22

40

42

35

82

26

52

36

42

00

62

22

41

87

12

18

51

89

72

21

32

19

82

38

41

82

91

84

32

25

32

01

2

11

88

16

28

16

30

13

30

14

27

11

59

12

79

13

21

15

54

13

37

13

32

14

24

11

79

12

59

14

14

14

46

11

87

15

08

11

92

14

29

13

10

16

20

16

05

12

86

14

31

21

66

31

22

51

31

11

64

71

18

11

56

02

15

87

20

00

12

88

15

60

11

16

61

43

13

11

85

13

41

16

68

15

92

12

82

11

33

16

24

11

73

13

25

13

45

16

23

17

46

15

51

Non-relevant Relevant

English Relevance Judgments

2005 Training Topics (38) Unused (12)2005 Evaluation Topics (25)

Page 82: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

CLEF-2006 CL-SR Overview

• 2 tasks: English segments, Czech start times

– Max of 5 “official” runs per team per task

– Baseline English run: ASR / English TD topics

• 7 teams / 6 countries

– Canada: Ottawa (EN, CZ)

– Czech Republic: West Bohemia (CZ)

– Ireland: DCU (EN)

– Netherlands: Twente (EN)

– Spain: Alicante (EN), UNED (EN)

– USA: Maryland (EN, CZ)

Page 83: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

0.00

0.02

0.04

0.06

0.08

0.10

DCU Ottawa Maryland Twente UNED Alicante

Man

Un

inte

rpo

late

d A

vera

ge P

recis

ion

2006 English Results

Page 84: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

0

50

100

150

200

250

300

2185

2198

2253

2404

1829

1850

2400

2367

2265

1843

2055

2364

1877

1897

1871

2012

1979

2213

2224

2232

2264

2384

2358

2006

2361

Both Search-Guided Highly-Ranked

Contributions by Assessment Type

2005 Evaluation topics, Binary relevance

Page 85: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Search-Guided Rankings (2005)

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35

Search-Guided + Highly Ranked

Se

arc

h-G

uid

ed

On

ly

Mean uninterpolated average precision, 35 Official CLEF-2005 runs, 25 evaluation topics

Page 86: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35

Search-Guided + Highly Ranked

Hig

hly

Ran

ked

On

lyHighly-Ranked Rankings (2005)

Mean uninterpolated average precision, 35 Official CLEF-2005 runs, 25 evaluation topics

Page 87: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Comparing Index Terms (2003)

0.0

0.1

0.2

0.3

0.4

0.5

ASR Notes ThesTerm Summary Metadata Metadata

+ASR

Me

an

Av

era

ge

Pre

cis

ion

Full

Title

Topical relevance, adjudicated judgments, Inquery

Page 88: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Maryland English Results (2005)

0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35

metadata+syn.en.qe

autokey+asr.en.qe

asr.en.qe

Mean Average Precision

25 evaluation topics, English TD queries

Precision@10

0.28

0.27

0.48

Page 89: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Comparing ASR with Metadata (2005)

0

0.2

0.4

0.6

0.8

1

1188

1630

2400

2185

1628

1187

1337

1446

2264

1330

1850

1414

1620

2367

2232

2000

14313

2198

1829

1181

1225

14312

1192

2404

2055

1871

1427

2213

1877

2384

1605

1179

Avera

ge P

recis

ion

ASR Metadata Increase

CLEF-2005 training + test – (metadata < 0.2), ASR2004A only, Title queries, Inquery 3.1p1

Page 90: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

ASR of % Metadata

0 10 20 30 40 50 60 70 80

1188

1630

2400

2185

1628

1187

1337

1446

2264

1330

1850

1414

1620

2367

2232

2000

14313

2198

1829

1181

1225

14312

1192

2404

2055

1871

1427

2213

1877

2384

1605

1179

Somewhere in ASR Only in

Metadata

wallenberg (3/36)* rescue jews

wallenberg (3/36) eichmann

abusive female (8/81) personnel

minsko (21/71) ghetto underground

art auschwitz

labor camps ig farben

slave labor telefunken aeg

holocaust sinti roma

sobibor (5/13) death camp

witness eichmann

jews volkswagen

(ASR/Metadata)

Error Analysis

(2005)

CLEF-2005 training + test – (metadata < 0.2), ASR2004A only, Title queries, Inquery 3.1p1

Page 91: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

2004 Error Analysis

0% 20% 40% 60% 80% 100%

1179

1605

1623

1414

1551

1192

14312

1225

1181

1345

1330

1446

1628

1187

1188

1630

Somewhere in ASR Results

(bold occur in <35 segments)

in ASR

Lexicon

Only in Metadata

wit eichmann

jew volkswagen

labor camp ig farben

slave labor telefunken aeg

minsk ghetto underground

wallenberg eichmann

bomb birkeneneau

sonderkommando auschwicz

liber buchenwald dachau

jewish kapo

kindertransport

ghetto life

fort ontario refugee camp

jewish partisan poland

jew shanghai

bulgaria save jew

Title queries, adjudicated judgments, Inquery

ASR % of Metadata

Page 92: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Correcting Relevant Segments

0.0

0.2

0.4

0.6

0.8

1.0

1446 14312 1414

Topic

Un

inte

rpo

late

d A

vera

ge P

recis

ion

ASR Corrected Metadata

Title+Description+Narrative queries

Page 93: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Other Languages

10/01 4/02 10/02 4/03 10/03 4/04 10/04 4/05 10/05 4/06 10/06

30

40

50

60

70

WER [%]

Czech Russian Slovak Polish

45h + LMTr

84h + LMTr

+ LMTr+TC

+ standard.

57.92%

45.91%

41.15%

38.57%

35.51%

+ adapt.

20h + LMTr

66.07%

50.82%

34.49%

Hungarian

100h + LMTr

+ stand.+LMTr+TC

100h + LMTr

+ stand.+LMTr+TC

45.75%

40.69%

50%

25h

Page 94: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

2006: Czech Test Collection

• Two sets of speech recognition transcripts

– Each with both channels recognized

• Queries in Czech or (for CLIR) English

• Unknown-boundary evaluation measure

– Based only on passage start times

– Start and end time assessments available

Page 95: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Standard Resources for Czech

• Brain-dead automatic segmentation

– 400-word passages, with 67% overlap

– Start time used as DOCNO

• Prague Dependency Treebank (LDC)

– Czech/English translation lexicon

• Includes precompiled morphology and probabilities

– Czech morphology

• For both formal and colloquial

• Partially translated thesaurus

– Most commonly used terms, single translation

Page 96: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Interview Usage

0 30 60 90 120 150 180Minutes from interview start

……

……

Cze

ch

Englis

h 80

0

AS

R T

rain

29

7

IR E

va

l

335

Seen

22

Un

see

n

Unseen

Seen

ASR Train

Page 97: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Czech Relevance Assessment

• Czech subject experts assessors

– New assessment system for unsegmented audio

– 2 assessor training topics available for debugging

• Search-guided assessment run in advance

– Search based on onset marks + ASR results

– Summaries from translated thesaurus + ASR

• Highly-ranked after run submission failed

– Time-base misalignment made this less useful

Page 98: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Czech Assessment System

Page 99: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Czech Relevant Passages

0

20

40

60

80

100

1166

1288

1225

1431

1630

2198

1310

1181

3018

1187

1311

1321

1663

1620

3004

3033

1185

3015

1843

3025

4000

3020

3009

1508

3005

3017

3014

Rele

van

t P

assag

es

Search Guided Highly Ranked

Breakdown not available for topics 1286 (70 passages), 2253 (124 passages)

Page 100: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Czech Judgment Rate

0

2

4

6

8

10

12

14

1288

1431

4009

1321

1166

1187

1663

1225

1311

1310

2384

3009

3024

3018

1620

2185

3033

3015

1185

3027

1286

1181

3025

4000

3005

2253

3028

3017

3014

3020

1843

2198

1630

1508

(po

sit

ive)

Ju

dg

men

ts /

Ho

ur

Data not available for topic 3004

Page 101: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Topic-Specific Passages (CZ)

0

50

100

150

200

250

300

350

400

1173

4011

1286

1288

1187

3028

1166

1311

3025

3014

3033

3005

4000

1663

1321

1310

3009

3018

14312

3017

1225

2253

1181

1185

1630

1843

1620

3020

1508M

ed

ian

Passag

e D

ura

tio

n (

seco

nd

s)

Page 102: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Topic-Independent Passages (EN)

Page 103: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Generalized Average Precision

.1

.4

.3

GAP=0.1/1 + (0.1+0.4)/2 + (0.1+0.4+0.3)/3

= 0.46

.9

.8

1.0

GAP=0.9/2 + (0.9+0.8)/4 + (0.9+0.8+1.0)/3

= 0.46

Key Idea: Weight contribution to average

precision by f(absolute error)

Page 104: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

UWB Czech ASR-Only Results

0

0.01

0.02

0.03

0.04

0.05

0.06

Words Stems Lemmas

Gen

era

lized

Avera

ge p

recis

ion

TD

TDN

Page 105: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

CLEF CL-SR Legacy

• Test collection available from ELDA

– IR

– CLIR

– Topic classification

– Topic segmentation

• Baseline results

– Evaluation measures

– System descriptions

Page 106: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Density of Year Mentions in 35% WER ASR

Mean: 2 mentions in every 3 segments

Page 107: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

English Information Extraction

Page 108: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

English Information Extraction

• Entity detection

– In hand-built transcripts: F = 66%

– In speech recognition: F = 11%

• Relationship detection

– In hand-built transcripts: F = 28%

– In speech recognition: F = 0% (!)

Page 109: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Extractive Summarization

SUMMARY:

why don t we start with you saying anything in your about grandparents great

grandparents...well as a small child i remember only one of my grandfathers and his

wife...didn t even have a store he just had a few sacks of different flour and...chela but two

killers of flour we have to remember related times were there was...

SEGMENT ASR (35% WER):oh i ll you know are yeah yeah yeah yeah yeah yeah yeah the very why don t we start with you saying anything in your

about grandparents great grandparents well as a small child i remember only one of my grandfathers and his wife his

second wife he was selling flour and the type of business it was he didn t even have a store he just a few sacks of

different flour and the entrance of an apartment building and people would pass by everyday and buy a chela but two

killers of flour we have to remember related times were there was no already baked bread so people had to baked her

own bread all the time for some strange reason i do remember fresh rolls where everyone would buy every day but not the

bread so that was the business that s how he made a living where was this was the name of the town it wasn t shammay

dish he ours is we be and why i as i know in southern poland and alisa are close to her patient mountains it was rather

mid sized town and uhhuh i was and the only child and the family i had a governess who was with me all their long from

the time i got up until i went to sleep she washed me practice piano she took me to ballet lessons she took me skiing and

skating wherever there was else that I was doing being non reach higher out i needed other children to players and the

governors were always follow me and stay with me while ours twang that i was a rotten spoiled care from work to do

family the youngest and the large large family and everyone was door in the army

QUERY: grandparents

KEYWORDS:Grandfathers, socioeconomic status, cultural and social activities

Przemysl (Poland), Poland 1926 (May 12) - 1935 (May 12), Poland 1935 (May 13) - 1939 (August 31)

Page 110: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Outline

• The MALACH Project

• Learning from our users

• Evaluating new access technologies

What’s left to do?

Page 111: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

What We’ve Learned

• ASR-based search works

– Breakeven for hand transcription: ~1,000 hours

– Breakeven for thesaurus tagging: ~5,000 hours

– Inaccessible content clusters by speaker

• Supervised classification works

– Global+local sequence provide useful evidence

• Entity extraction is weak, relationships fail

• Segmentation should be topic-specific

Page 112: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

For More Information

• CLEF Cross-Language Speech Retrieval track

– http://clef-clsr.umiacs.umd.edu/

• The MALACH project

– http://malach.umiacs.umd.edu/

• NSF/DELOS Spoken Word Access Group

– http://www.dcs.shef.ac.uk/spandh/projects/swag

Page 113: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Other Possibly Useful Features

• Channel characteristics

– Cell phone, landline, studio mike, ...

• Accent

– Another way of grouping speakers

• Prosody

– Detecting emphasis could help search or browsing

• Non-speech audio

– Background sounds, audio cues

Page 114: Speech and Music Retrievaloard/teaching/796/spring11/... · Speech and Music Retrieval LBSC 796/INFM 718R Session 11, April 20, 2011 Douglas W. Oard. Audio Indexing • Music

Human History

Oral Tradition Writing

Human Future

Writing and Speech