
Toward Zero Resources (or how to get something from nothing)


Slide 1: Toward Zero Resources (or how to get something from nothing)
Towards Spoken Term Discovery at Scale with Zero Resources. Jansen, Church & Hermansky, Interspeech-2010.
NLP on Spoken Documents Without ASR. Dredze, Jansen, Coppersmith & Church, EMNLP-2010.

Slide 2: We Don't Need Speech Recognition To Process Speech
At least for some tasks.

Slide 3: Linking without Labeling
- ASR = Linking + Labeling. Linking: find repetitions. Labeling: assign text strings.
- BOW (Bag of Words) → BOP (Bag of Pseudo-Terms).
- Pseudo-terms: linking without labeling.
- BOP is sufficient for many NLP tasks.

Slide 4: This Talk
The speech processing chain: Speech Collection → Speech Recognition (manual transcripts → full transcripts) → Text Processing: Information Retrieval, Corpus Organization, Information Extraction, Sentiment Analysis.
- The bag-of-words representation is good enough for many tasks.
- The bottleneck: the speech recognition step.
[Speaker note: NLP person]

Slide 5: Our Goal
The full pipeline: Extract Features → Link Audio Segments → Label Segments with Text → Full Transcripts (speech recognition) → Text Processing.
This talk keeps Extract Features and Link Segments (find long, ~1 s repetitions; Interspeech-2010), drops Labeling, and replaces BOW with BOP feature vectors such as 0 0 1 0 0 1 1 1 (EMNLP-2010).
[Speaker note: I'm a text person: remove ASR from the speech pipeline.]

Slide 6: Definitions
- Towards: not there yet.
- Zero Resources: no nothing (no knowledge of the language or domain); no training data, no dictionaries, no models, no linguistics. "The next crisis will be where we are least prepared."
- Low Resources: a little more than zero.
- Spoken Term Detection (word spotting), the standard task: find instances of a spoken phrase in a spoken document. Input: spoken phrase + spoken document.
- Spoken Term Discovery (linking without labeling), a non-standard task. Input: a spoken document, without a spoken phrase. Output: spoken phrases (interesting repeated intervals in the document).


Slide 7: What makes an interval of speech interesting?
Cues from text processing:
- Long (~1 sec, such as "The Ed Sullivan Show").
- Repeated.
- Bursty (tf * IDF): tf, lots of repetitions within a particular document; IDF, relatively few repetitions across other documents.
Cues unique to speech processing:
- Given-New: the first mention is articulated more carefully than subsequent mentions.
- Dialog between two parties (A & B): A utters an important phrase; B: "what?"; A repeats the important phrase.
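The burstiness cue can be made concrete in a few lines. A toy sketch (my own, not from the talk; the term strings are hypothetical): score each term in each document by its within-document count times the log inverse of its document frequency, so a phrase repeated in one conversation but rare elsewhere scores high.

    import math
    from collections import Counter

    def burstiness(docs):
        """docs: list of documents, each a list of (pseudo-)terms."""
        n_docs = len(docs)
        df = Counter(term for doc in docs for term in set(doc))  # document frequency
        scores = {}
        for doc_id, doc in enumerate(docs):
            for term, tf in Counter(doc).items():
                scores[(doc_id, term)] = tf * math.log(n_docs / df[term])
        return scores

    docs = [["ed_sullivan", "you_know", "ed_sullivan"],
            ["you_know", "taxes"],
            ["you_know", "pets"]]
    scores = burstiness(docs)
    print(scores[(0, "ed_sullivan")])  # ~2.20: repeated here, rare elsewhere
    print(scores[(0, "you_know")])     # 0.0: appears in every document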


Slide 8: [figure]

Slide 9: Related Work (Mostly Speech Literature and Mostly from Boston)
Other approaches:
- Phone recognition (Lincoln Labs): use existing phone recognizers to create phone n-grams for topic classification (Hazen et al., 2007, 2008).
- Self-organizing units (BBN): unsupervised discovery of phone-like units for topic classification (Garcia and Gish, 2006; Siu et al., 2010).
- Finding recurring patterns of speech (MIT CSAIL): Park and Glass, 2006, 2008.
Similar goals:
- Audio summarization without ASR: finds similar regions to include in the summary (Zhu, 2009, ACL).
[Speaker notes: 3 of 4 are in Boston; say the name of the organization.]

Slides 10-17: [figures]

Slide 18: n² Time & Space
- But the constants are attractive.
- Sparsity: redesigned the algorithms to take advantage of sparsity:
  - Median filtering
  - Hough transform
  - Line segment search
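To make two of these techniques concrete, here is a minimal sketch (my own toy version, not the authors' implementation; it builds the dotplot densely, whereas the real system exploits sparsity throughout) of line segment search with median filtering over a thresholded cosine-similarity dotplot. All parameter values are illustrative; min_len=100 frames approximates the ~1 s repetition criterion at an assumed 10 ms frame step.

    import numpy as np
    from scipy.ndimage import median_filter

    def diagonal_segments(X, Y, sim_thresh=0.85, filt_width=11, min_len=100):
        """Return (row, col, length) for long diagonal runs of high similarity."""
        # Cosine-similarity dotplot; thresholding it is what makes it sparse.
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
        dot = (Xn @ Yn.T) >= sim_thresh
        n, m = dot.shape
        segments = []
        for d in range(-n + 1, m):                      # walk every diagonal
            diag = np.diagonal(dot, offset=d).astype(float)
            if len(diag) < min_len:
                continue
            smooth = median_filter(diag, size=filt_width) > 0.5  # bridge small gaps
            k = 0
            while k < len(smooth):
                if not smooth[k]:
                    k += 1
                    continue
                start = k
                while k < len(smooth) and smooth[k]:
                    k += 1
                if k - start >= min_len:                # keep only long runs
                    segments.append((max(0, -d) + start, max(0, d) + start, k - start))
        return segments

    # Demo on random frames (real inputs would be acoustic features, e.g. MFCCs);
    # the self-match shows up as a full-length run on the main diagonal.
    X = np.random.rand(300, 13)
    print(diagonal_segments(X, X, min_len=50)[:2])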

Slides 19-27: [figures]

Slide 28: Two Tasks
- Classification (supervised): learn how to organize speech documents into labels.
- Clustering (unsupervised): discover latent groupings of speech documents.

Slide 29: Representations for Learning
Back to NLP:
- Group matched audio segments into pseudo-terms.
- BOW (bag of words) → BOP (bag of pseudo-terms).
- Matched segments → feature vectors, e.g. 0 0 1 0 0 1 1 1.

Slide 30: Creating Pseudo-Terms
[Diagram: matched segments grouped into pseudo-terms P1, P2, P3.]

Slide 31: Example Pseudo-Terms
term_5: our_life_insurance
term_6: term
term_63: life_insurance
term_113: how_much_we
term_114: long_term
term_115: budget_for
term_116: our_life_insurance
term_117: budget
term_118: end_of_the_month
term_119: stay_within_a_certain
term_120: you_know
term_121: have_to
term_122: certain_budget

Slide 32: Graph-Based Clustering
- Nodes: one per matched audio segment.
- Edges: between two segments if their fractional overlap exceeds a threshold.
- Extract the connected components of the graph.
- This work: one pseudo-term for each connected component.
- Future work: better graph clustering algorithms.
[Example: "newspapers", "newspaper", "a paper" → Pseudo-term 1; "keep track of", "keep track" → Pseudo-term 2.]
[Speaker note: 10 min]
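To make the graph step concrete, here is a minimal sketch (my own toy version with hypothetical interval data, not the authors' code): link matched segments whose fractional overlap exceeds a threshold, then take connected components, via union-find, as the pseudo-terms.

    def overlap_fraction(a, b):
        """Fractional time overlap of two (start, end) intervals, in seconds."""
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        return inter / min(a[1] - a[0], b[1] - b[0])

    def pseudo_terms(segments, threshold=0.5):
        """Label each segment with the id of its connected component."""
        parent = list(range(len(segments)))

        def find(i):                     # union-find with path compression
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for i in range(len(segments)):
            for j in range(i + 1, len(segments)):
                if overlap_fraction(segments[i], segments[j]) > threshold:
                    parent[find(i)] = find(j)    # merge the two components

        return [find(i) for i in range(len(segments))]

    # Heavily overlapping matches ("newspapers" / "newspaper") fuse into one
    # pseudo-term; the disjoint third segment stays separate.
    print(pseudo_terms([(1.0, 2.0), (1.1, 1.9), (5.0, 6.0)]))  # [1, 1, 2]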

Slide 33: Tradeoff in Cluster Quality
- A smaller similarity threshold gives fewer, larger pseudo-terms (e.g. one merged term covering our_life_insurance, life_insurance, our_life_insurance); a larger threshold gives more, smaller pseudo-terms (term_5, term_63, term_116 kept distinct).
- We need to find the right tradeoff for our task; select it on dev data.
- Not perfect, but good enough?

Slide 34: Document Representation
Transcript: "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal."
Pseudo-terms: P4 P6 P9 P19 P2 P12

Slide 35: Feature Vectors: BOW → BOP
Example: "Four score and seven years is a lot of years."
- BOW counts: four: 1, score: 1, seven: 1, years: 2, ...
- BOP counts: term_12: 2, term_5: 1
- Either way, the document becomes a count vector (e.g. 0 0 1 0 1).
Question: are pseudo-terms good enough?
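As a tiny illustration (the term IDs and vocabulary here are hypothetical), a BOP vector is built exactly like a BOW vector, just over discovered pseudo-terms instead of words:

    from collections import Counter

    vocab = ["term_5", "term_12", "term_63"]   # pseudo-term vocabulary

    def bop_vector(doc_terms):
        """Pseudo-terms discovered in one document -> fixed-length count vector."""
        counts = Counter(doc_terms)
        return [counts[t] for t in vocab]

    # "Four score and seven years is a lot of years" -> term_12 twice, term_5 once.
    print(bop_vector(["term_12", "term_12", "term_5"]))  # [1, 2, 0]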

Slide 36: Evaluation: Data
- Switchboard telephone speech corpus: 600 conversation sides, 6 topics, 60+ hours of audio.
- Topics: family life, news media, public education, exercise, pets, taxes.
- Identify all pairs of matched regions; graph clustering produces the pseudo-terms.
- O(n²) on 60+ hours is a lot! With efficient algorithms and sparsity, it is not as bad as you think: a 500-terapixel dotplot from 60+ hours of speech; compute time: 100 cores, 5 hours.
[Speaker note: 13 min]
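The quoted dotplot size checks out under a standard frame-rate assumption (100 frames per second is my assumption; the talk does not state it):

    hours = 60
    frames = hours * 3600 * 100        # 21,600,000 frames
    pixels = frames ** 2               # about 4.7e14, i.e. roughly 500 terapixels
    print(f"{frames:,} frames -> {pixels / 1e12:.0f} terapixels")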

Slide 37: [figure] A screen that is 3.5 miles across, the distance from BBN to MIT.

Slide 38: Evaluation
Representations compared:
- Manual transcripts as bag of words: requires full speech recognition.
- Pseudo-terms: requires an acoustic model.

Slide 39: Two Evaluation Tasks
- Topic clustering (unsupervised): automatically discover latent topics in conversations; a standard clusterer is given the correct number of topics.
- Topic classification (supervised): learn topic labels from supervised data, with several classification algorithms: CW (Dredze et al., 2008) and MaxEnt; 10-fold cross-validation.

Slide 40: Clustering (Unsupervised) Results
Pseudo-terms: 95; manual transcripts: 99.

Slide 41: Classification (Supervised) Results
Pseudo-terms: 95; manual transcripts: 99.
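A hedged sketch of how such an evaluation can be run on BOP count vectors. The library and data below are stand-ins of my choosing (scikit-learn, random counts); the talk names only the algorithms (a standard clusterer, CW, MaxEnt) and 10-fold CV.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.poisson(1.0, size=(600, 1000)).astype(float)  # stand-in BOP counts
    y = rng.integers(0, 6, size=600)                      # 6 Switchboard topics

    # Unsupervised: a standard clusterer given the correct number of topics.
    clusters = KMeans(n_clusters=6, n_init=10).fit_predict(X)

    # Supervised: MaxEnt (logistic regression) with 10-fold cross-validation.
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
    print(clusters[:10], acc.mean())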

Slide 42: What We've Achieved
Speech Collection → Pseudo-Term Discovery → Text Processing: Information Retrieval, Corpus Organization, Information Extraction, Sentiment Analysis.
We can process languages that don't have ASR.

Slide 43: Future Directions (More something from nothing)
- Extend NLP of speech to new areas: languages, domains, and settings where we have little data for speech recognition.
- BOW (BOP) is sufficient for many NLP tasks; BOW (BOP) → TF*IDF!
- Lingering questions: What else can we do? Topic models? Information extraction? Information retrieval?