
MPhil in Machine Learning, Speech and Language Technology

2015-2016

MODULE COURSEWORK FEEDBACK

Student Name: Module Title: CRSiD: Module Code: College: Coursework Number: I confirm that this piece of work is my own unaided effort and adheres to the Department of Engineering’s guidelines on plagiarism

Date Marked: Marker's Name(s): Marker's Comments: This piece of work has been completed to the following standard (Please circle as appropriate):

Distinction Pass Fail (C+ - marginal fail)

Overall assessment (circle grade) Outstanding A+ A A- B+ B C+ C Unsatisfactory

Guideline mark (%) 90-100 80-89 75-79 70-74 65-69 60-64 55-59 50-54 0-49

Penalties 10% of mark for each day late (Sunday excluded)

The assignment grades are given for information only; results are provisional and are subject to confirmation at the Final Examiners Meeting and by the Department of Engineering Degree Committee.

Riashat Islam

ri258

St John's

Speech and Language Processing Applications

MLSALT5

1


MLSALT5: Speech and Language Processing Applications - Keyword Spotting

Riashat Islam
Department of Engineering, University of Cambridge

Trumpington Street, Cambridge, CB2 1PZ, [email protected]

I. INTRODUCTION

In this work, we consider the task of keyword spotting (KWS) for the Swahili language, as part of the Babel project. Keyword spotting is the task of automatically detecting query keywords or phrases in a stream of audio, or in the text produced from it by a continuous speech recognition system. We focus on KWS technology under low language resource conditions, inspired by IARPA's Babel program. The effectiveness of KWS technology rests on a trade-off between processing resource requirements and detection accuracy.

The report is outlined as follows. In Section II, we briefly explain the keyword spotting task and the infrastructure provided. Section III considers a basic KWS system for querying words and phrases from a 1-best list. Section IV then considers improvements in detection accuracy based on combining multiple systems. For each section, we outline the tasks, briefly describe the code, report experimental results and discuss them. Finally, Section V summarises the work and discusses overall keyword spotting system performance. Code snippets for 1-best output KWS and system combination are given in the appendix.

II. BACKGROUND

In this work, we query keywords from the intermediate representation of a two-stage keyword spotting system, i.e., from text rather than from speech directly. We take the text output of a speech recognition system as the intermediate representation and search it for queries, which allows fast and efficient on-line querying. First we consider the 1-best output of an LVCSR system for KWS and evaluate the performance of our system on the word decoding hypothesis. The results are then compared with the output obtained from morphological decomposition. Throughout, we use the 1-best output of the speech recognition system, rather than an n-best output or a lattice-based approach.

We further consider score normalisation techniques so that the scores of all keywords are comparable to each other, performing score normalisation at the query level before KWS. We also investigate how KWS detection accuracy can be improved by system combination, merging the queries from multiple systems. We use the Term-Weighted Value (TWV) as a performance metric to evaluate the usefulness of the system to the user: TWV is one minus the average value lost by the system per term. The higher the TWV, the better the system, and a value of 1 corresponds to a perfect score. We also investigate querying out-of-vocabulary (OOV) terms, which often cause a serious performance reduction in KWS systems, and analyse how the detection of OOV terms can be improved by score normalisation and system combination. The OOV keyword problem is particularly severe when developing KWS techniques under low language resource conditions. Hence, we want to build a system that achieves good KWS performance and can handle both in-vocabulary (IV) and out-of-vocabulary (OOV) keywords.
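The TWV computation described above can be sketched as follows. This is a minimal illustration, not the actual scoring script: the beta weight and the per-term trial counts are assumed values.

```python
def twv(terms, beta=999.9):
    """Term-Weighted Value: one minus the average per-term loss.

    terms: list of dicts with n_true (target occurrences), n_miss,
    n_fa and n_nontarget (non-target trials) for each query term.
    beta weights false alarms against misses (assumed value).
    """
    loss = 0.0
    for t in terms:
        p_miss = t["n_miss"] / t["n_true"] if t["n_true"] else 0.0
        p_fa = t["n_fa"] / t["n_nontarget"] if t["n_nontarget"] else 0.0
        loss += p_miss + beta * p_fa
    return 1.0 - loss / len(terms)
```

A system with no misses and no false alarms incurs zero loss and therefore scores TWV = 1.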

III. 1-BEST KEYWORD SPOTTING

A basic keyword spotting system consists of the following components: a speech-to-text engine, an indexer that handles the words and phrases from the text file, a detector, and a module that re-ranks the detections according to features and assigns a score.

A. Question 1: Creating an Indexer

We first write a basic indexer that can handle both words and phrases; Section III-A1 briefly explains the code and how the indexer works. The indexer converts a supplied word file into an index file, so that we can directly score the detected words and estimate the posterior probability that each detection is correct. The created index contains a list of candidate detection words, or hits, for each query word. We also want to use phonetic transcripts to be able to handle OOV search terms. The kws.py script is our indexer for word CTM files. Given the supplied reference.ctm file, run the command below:

$ python kws.py

This creates a reference.xml file which can then be supplied to the scoring script. For scoring, the detector receives a list of search terms and generates a scored list of detection candidates for each term. Scores are then re-assigned based on a learned model, and a decision function applies a threshold such that all detections with scores above the threshold receive a yes decision. The accuracy of the system also depends on whether the match of the term in the reference transcript lies within 0.5 seconds of the asserted time. TWV is used as the accuracy metric for our KWS system.
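The decision step just described can be sketched as below; the function and field names are hypothetical.

```python
def decide_and_check(hit, threshold, ref_times, tol=0.5):
    """Assign a yes/no decision to a hit and check it against the
    reference: a detection counts as correct only if the reference
    contains an occurrence of the term within tol seconds."""
    decision = hit["score"] >= threshold
    correct = decision and any(abs(hit["start"] - t) <= tol
                               for t in ref_times)
    return decision, correct
```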

1) Explanation of Code: Code snippets for the indexer kws.py are available in the appendix.
Script directory:

/home/ri258/Documents/MLSALT5/

For indexing the CTM file, we use the Python xml.etree.ElementTree module, which is used to parse the query.xml file and to create the reference.xml file. The query.xml file contains the keyword ids for each query, and the whole XML document is represented as a tree via tree.getroot(). Each query word is found with .find('kwtext').text on the query.xml file and added to a list of keywords. Another list holds all the entries from the supplied CTM file. For each line in the CTM file, the strings are separated, all words are converted to lowercase using the .lower() method, and each word is added to the set of words. For each term we keep its duration, file, channel and probability. For each query string in the kws-list maintained from queries.xml, the string is split using word.split() so that we obtain a sequence of individual words. Considering the time axis for each word and phrase, the start time and duration of each query are maintained. This is because, for phrase queries, the hits must be consecutive in time and must align with the information in the CTM file. It ensures that we can discard all other paths and only consider those for which the sequence of list entries also appears consecutively in time according to the CTM file. We also enforce a gap of at most 0.5 seconds, so that the end time of a word is always within 0.5 seconds of the start time of the next word. Once the hits for all queries are obtained, the results are written in the KWS output format to an .xml file. We obtain a set of hits for a set of queries, and the generated hits are then scored using the scoring script.
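The indexing and phrase-search logic just described can be sketched as follows. This is an illustrative reconstruction, not the actual kws.py: the names are hypothetical, and the CTM field order (file, channel, start, duration, word, score) follows the usual CTM convention.

```python
from collections import defaultdict

def build_index(ctm_lines):
    """Map each lowercased word to its list of candidate hits."""
    index = defaultdict(list)
    for line in ctm_lines:
        f, ch, start, dur, word, score = line.split()
        index[word.lower()].append(dict(
            file=f, chan=ch, start=float(start),
            dur=float(dur), score=float(score)))
    return index

def find_phrase(index, query, max_gap=0.5):
    """Return hits whose words are consecutive in time: each word must
    start within max_gap seconds of the end of the previous word."""
    words = query.lower().split()
    hits = []
    for first in index.get(words[0], []):
        path, ok = [first], True
        for w in words[1:]:
            prev_end = path[-1]["start"] + path[-1]["dur"]
            cands = [h for h in index.get(w, [])
                     if h["file"] == first["file"]
                     and h["chan"] == first["chan"]
                     and 0 <= h["start"] - prev_end <= max_gap]
            if not cands:
                ok = False
                break
            path.append(min(cands, key=lambda h: h["start"]))
        if ok:
            hits.append(path)
    return hits
```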

2) Experimental Results: Run the following command to check that the system is working correctly. The output contains a breakdown for every query, and we see that the number of correct detections equals the number of target query terms.

$ gedit \
./scoring/reference/Full-Occur_MITLLFA3-AppenWordSeg.bsum.txt

The performance of the system can be evaluated as:

$ ./scripts/termselect.sh \
lib/terms/ivoov.map output/reference.xml scoring all

The table below summarises the results. The system achieves a score of TWV = 1 for the reference.ctm file, since the index is built from the same reference file that it is scored against.

Reference.ctm
      TWV       Threshold   Number
All   1         1.000       488
IV    1         1.000       388
OOV   1         1.000       100

B. Question 2: Decoding Output

Our indexer for the KWS system is then applied to the decode.ctm file, and the results summarised in the table below were obtained.

Decode.ctm
      TWV       Threshold   Number
All   0.31916   0.167       488
IV    0.40141   0.167       388
OOV   0.00000   0.167       100

The table shows that the results are poor, especially for OOV words, when our KWS system is applied to the 1-best decoding output. This is mainly because the word-level decode.ctm file contains incorrect text from the speech recognition system, which causes wrong hits to be added to our index. Since decode.ctm is the word-based output of a speech system, it is also difficult to detect OOV words, because the indexer operates on words. Hence, for OOV words there are no hits at all, the OOV query words cannot be detected, and the score is 0.

C. Question 3: Morphological Decomposition

We study the impact of morphological decomposition in KWS systems, i.e., finding instances of a given keyword in the text. Morphological analysis is often useful for handling OOV words in KWS systems. We want to improve the TWV scores for OOV queries by decomposing both the query terms and the index into morphs. By decomposing into morphs, we increase the probability of scoring a hit, since there is a higher chance of finding a match for the decomposed query morphs.

1) Explanation of Code: Code snippets for the morphological decomposition script are available in the appendix. The script kwsToMorph.py performs the morph decomposition given a morph dictionary.
Script directory:

/home/ri258/Documents/MLSALT5/

Our script kwsToMorph.py takes the morph.dct and morph.kwlist.dct files. The CTM-entry decompositions are given in morph.dct, and the other file contains the dictionary for the queries. After reading both files with readlines(), we use .split() to split the query entries and CTM entries into sequences of morphs. A loop runs over each entry of the decode-morph.ctm file, splitting each entry into several morph entries, and each text is converted to lowercase. A dictionary list-of-words is maintained for the duration, file, channel, time, probability and text. The duration of each word is also split so as to assign equal durations to its morphs. Note that we do not score each morph separately but keep the scores of the corresponding entries, because we want to assign scores to the sequence of hits rather than to individual morphs. Our code decomposes the queries from query.xml in addition to decomposing the index, so every query is decomposed into morphs before the query operation is performed. Finally, once the queries are decomposed, a lattice of candidate paths is constructed and we again search for paths satisfying the 0.5-second rule (as before). The code generates the file morph.xml, which can then be scored. Note, however, that the output of the morph-based system is not a 1-best list.

$ python kwsToMorph.py

$ ./scripts/score.sh \
output/morph.xml scoring
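The decomposition of a single CTM entry can be sketched as below: the duration is shared equally among the morphs and the word-level score is kept for each morph, as described above. The dictionary entry shown is a hypothetical example.

```python
def decompose_entry(entry, morph_dict):
    """Split one CTM entry into morph entries with equal durations;
    the word-level score is retained for every morph."""
    morphs = morph_dict.get(entry["word"], [entry["word"]])
    dur = entry["dur"] / len(morphs)
    return [dict(entry, word=m, start=entry["start"] + i * dur, dur=dur)
            for i, m in enumerate(morphs)]
```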

2) Experimental Results: The table below summarises the results when the morph-based KWS system is applied to the morph CTM file, which is generated from the output of a morph-based speech recognition system. Our results show that the morph-based system improves performance for OOV words. The improvement arises because the OOV queries are now decomposed into morphs, and there is a higher chance that the index contains the required OOV morphs. Therefore, a set of hits can now be found for OOV queries, and these can be scored. Note, however, that the OOV morphs depend on the morphs obtained from decomposing the in-vocabulary words, so not all OOV words can be handled efficiently. The performance for IV words is much better than for OOV words.

Decode-Morph.ctm
      TWV       Threshold   Number
All   0.30833   0.301       488
IV    0.37221   0.301       388
OOV   0.06051   0.301       100

Furthermore, we note that although the OOV performance has improved, it comes at the cost of lower performance for in-vocabulary words (0.372 with morph decomposition compared to 0.402 with word decodes). The overall TWV of the entire system has also decreased.

These changes can be explained as follows. The TWV score depends on both the number of true positives and the number of false alarms. The table below summarises the number of hits, true positives (Corr) and false alarms (FA).

Summary
System   Targ   Corr   FA    Miss
Word     963    405    329   558
Morph    963    410    722   553

Note: We expected a much higher number of hits for the morph system than for the word-based system, since different words can share the same morph decomposition. Indeed, the table shows that the morph-based system produces far more false alarms (722, compared to 329 for the word-based system), as well as more true positives. Even though the number of misses has decreased for the morph system, the extra false alarms outweigh this reduction, leading to an overall decrease in performance for the morph system (0.3083) compared to the word system (0.3191).

Figure 1. DET Curve for Morph Decomposition based on our KWS System

D. Question 4: Score Normalisation

The script normalisation.py performs score normalisation and generates an output XML file with normalised scores.
Script directory:

/home/ri258/Documents/MLSALT5/

Score normalisation is applied so that the scores of different keywords represent the probability of being correct, rather than raw posteriors. It is a method for normalising the posterior scores so that all keywords have scores comparable with each other. We show that ranking the keyword detections by their normalised scores significantly improves TWV performance.

Previously, when comparing TWV scores, the TWV values placed a higher penalty on rarely occurring words, since a higher penalty is incurred for a miss, especially of a rare word. In this section, we perform sum-to-one (STO) normalisation so that the scores of the hits for each keyword become comparable, given by:


\[
\hat{s}_{ki} = \frac{s_{ki}^{\gamma}}{\sum_j s_{kj}^{\gamma}} \tag{1}
\]

where s_ki is the raw score of the ith hit for keyword k, ŝ_ki its normalised score, and γ a tunable parameter. We show how the TWV values change as γ varies and, for both the word and morph based systems, report results for the fine-tuned values.
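Equation (1) can be sketched in a few lines of Python; the flat (keyword, score) hit representation is an assumption made for illustration.

```python
from collections import defaultdict

def sto_normalise(hits, gamma=1.0):
    """Sum-to-one normalisation: each hit's score is raised to the
    power gamma and divided by the sum of the powered scores of all
    hits for the same keyword, as in Eq. (1)."""
    totals = defaultdict(float)
    for kw, s in hits:
        totals[kw] += s ** gamma
    return [(kw, s ** gamma / totals[kw]) for kw, s in hits]
```

After normalisation the scores of each keyword's hits sum to one, which is what makes scores comparable across rare and frequent keywords.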

1) Experimental Results: Normalisation was performed using the script normalisation.py. Below is an example of the set of commands used to perform sum-to-one (STO) score normalisation on the morph.xml file.

$ for gamma in 0 0.25 0.5 0.75 \
1 1.25 1.5 1.75 2 3 5 7 11; do
$ python normalisation.py output/morph.xml gamma \
output/normalise_morph_gamma.xml
$ end

Figure 2. Effect of the Gamma Parameter in Score Normalisation for KWS on the Decoding hypothesis and on Morphological Decomposition

The tables below summarise the results with the fine-tuned γ parameters for score normalisation.

Morph STO, Decode-Morph.ctm
      TWV       Threshold   Number
All   0.31688   0.052       488
IV    0.28288   0.052       388
OOV   0.06079   0.052       100

The table shows the results with γ = 1.75 for score normalisation of the morph system. Comparing this result with the one obtained previously for the morph decomposition system without normalisation, we see that normalisation improves the overall performance of the morph system from 0.308 to 0.317. We also obtain improved OOV performance of 0.06079, compared to 0.06051 previously. However, our results show that score normalisation significantly decreases the performance for IV words, from 0.372 to 0.283. Note also that the threshold values decrease significantly after applying normalisation.

Figure 3. DET Plot with optimal γ parameter for STO Morph decomposition

Word STO, Decode.ctm
      TWV       Threshold   Number
All   0.32079   0.010       488
IV    0.40347   0.010       388
OOV   0.0       0.010       100

Figure 4. DET Plot with optimal γ parameter for STO on the word (Decode) system

The table then shows the results with γ = 3 for score normalisation of the word-based system. As for the morph system, applying score normalisation improves the performance of the word-based system, from 0.319 to 0.321. Interestingly, however, score normalisation has no effect on OOV words, whose TWV remains 0 for the word-based system. In contrast to the morph system, the TWV for IV words increases after applying score normalisation. As for the morph system, the threshold value decreases significantly.

For the morph system, the γ parameter for STO was fine-tuned to 1.75; for the word system, it was fine-tuned to 3. The table below summarises the results after applying score normalisation. We see that score normalisation does not change the number of hits, but there is an overall improvement in TWV for both systems.

Summary after STO
System   Targ   Corr   FA    Miss
Word     963    405    329   558
Morph    963    410    722   553

Discussion of Results: The decrease in threshold values after normalisation is justified, because normalisation better approximates the true, comparable posterior probability, whereas a threshold applied to the raw scores does not; the raw scores over-approximate the true posterior probability. The TWV of the overall system usually improves after score normalisation, since the scores assigned to the hits are now better modelled.

IV. SYSTEM COMBINATION

Code snippets for the script combination.py used for system combination are available in the appendix.
Script directory:

/home/ri258/Documents/MLSALT5/

In this section, we investigate the effect of combining the detections of several systems to improve overall performance. By combining multiple systems, we want to generate a new list of hits that exceeds the performance of any individual system.

A. Question 1: Combination and Normalisation of Word and Morph Systems

1) Explanation of the Combination Implementation: Script directory:

/home/ri258/Documents/MLSALT5/

The script combination.py combines two sets of hits: it takes the query results of two systems and merges them. In doing so, we need to derive scores for the combined set of hits, and we also consider the time duration of each hit. Since the combination is performed on the raw scores of each system, score normalisation can further be applied to the raw combined scores generated by this script. The script takes two input files from two KWS systems and generates an output XML file as before.
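Since the exact merge rule is not spelled out above, the following is only a hedged sketch of one plausible implementation: matching detections (same keyword and file, with start times within a tolerance) have their raw scores added, and unmatched hits are carried over unchanged.

```python
def combine(hits_a, hits_b, tol=0.5):
    """Merge two lists of hit dicts (keys: kw, file, start, score)."""
    merged = [dict(h) for h in hits_a]
    for hb in hits_b:
        for hm in merged:
            if (hm["kw"] == hb["kw"] and hm["file"] == hb["file"]
                    and abs(hm["start"] - hb["start"]) <= tol):
                hm["score"] += hb["score"]  # add raw scores on a match
                break
        else:
            merged.append(dict(hb))  # no match found: keep the hit
    return merged
```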

2) Word + Morph Combination without Normalisation: First, we analyse how simply combining the two sets of hits from the word and morph based systems can improve the overall TWV. We combined the generated decode.xml and morph.xml files using the script combination.py explained above.

$ python combination.py \
output/decode.xml \
output/morph.xml \
output/combination_decode_morph.xml

$ ./scripts/score.sh \
output/combination_decode_morph.xml \
scoring_combination

The results of combining the two sets of hits from our word and morph based systems are given in the table below:

Word-Morph Combination
      TWV       Threshold   Number
All   0.32993   0.0167      488
IV    0.40231   0.067       388
OOV   0.04908   0.067       100

Figure 5. DET Plot for Combination of Decode+Morph

The combination of the word and morph systems shows that the overall performance of the combined system (0.3299) is better than either the individual word (0.3208) or morph (0.3169) system. However, the OOV performance of the combined system (0.0491) is lower than that of the normalised morph system (0.0608). This is because an individual morph system can better account for OOVs, since only morphs are considered; when the sets of hits are combined, the hits generated by the word system cause the OOV performance of the combined system to deteriorate, even though the overall performance improves.

3) Word + Morph Combination with Normalisation: In this section, normalisation is further applied to the combined word and morph based system. We again vary the γ parameter to fine-tune and optimise the overall system performance.

$ python combination.py \
output/decode.xml \
output/morph.xml \
output/combination_decode_morph.xml

$ python normalisation.py \
output/combination_decode_morph.xml 0.5 \
output/combination_decodemorph_normalise_0.5.xml

$ ./scripts/score.sh \
output/combination_decodemorph_normalise_0.5.xml \
scoring_combination_normalise

Figure 6 shows the effect of varying γ when normalising the combined system. The maximum overall TWV is achieved at γ = 1.75. The table below summarises these results.


Figure 6. Normalisation of the combined set of hits, and the effect on performance of varying the γ normalisation parameter

Word-Morph Combination + Normalisation
      TWV       Threshold   Number
All   0.34506   0.025       488
IV    0.42120   0.025       388
OOV   0.04963   0.025       100

Figure 7. DET Plot for Combination of Decode+Morph

After fine-tuning the γ parameter for normalisation, the overall performance of the combined system improves further, from 0.33 to 0.34. This is the effect of fine-tuning the gamma parameter in sum-to-one normalisation. It can still be noted, however, that the OOV performance is better for the normalised individual systems.

4) Discussion of Results: We can conclude that even though the combined and then normalised word and morph system achieves a better overall TWV, it does not necessarily achieve better OOV performance. It does, however, achieve much lower threshold values than the single morph decomposition system. This again confirms that when only a morph system is used, OOV query words are more likely to be found in the decomposed index, whereas system combination does not necessarily improve the detection scores of OOV query words.

Based on our experiments so far, we conclude that score normalisation and system combination of word and morph based systems can significantly improve the query detection accuracy of the overall system. However, while normalisation improves the OOV TWV, system combination does not necessarily bring an improvement in OOV TWV.

B. Question 2: IBM WFST Outputs

In this section, we investigate the effect of normalisation and system combination using three outputs obtained from the IBM WFST software. Three evaluation sets are provided: a morph-based system and two different word-based systems using the IBM WFST.

1) Each System Evaluated on the IBM WFST Outputs: First we consider the evaluation of the two word-based systems and the morph system on the IBM WFST outputs, and analyse how score normalisation affects the systems based on the WFST outputs. The table below summarises the results.

Each System, IBM WFST Outputs
      Word     Word Sys2   Morph
All   0.3994   0.4032      0.3602
IV    0.5012   0.5074      0.4308
OOV   0.0000   0.0000      0.0892

The table below further summarises the results after applying score normalisation to each of the systems based on the WFST outputs. Results are for γ = 1.

Normalised Systems on IBM WFST Outputs
      Word     Word Sys2   Morph
All   0.4602   0.4651      0.5205
IV    0.5793   0.5857      0.5567
OOV   0.0000   0.0000      0.3678

Comparing the two tables above, we see that TWV performance improves significantly after applying score normalisation to each of the supplied systems. For example, for the morph-based system, TWV improves from 0.36 to 0.52 after normalisation. This is again because, after normalisation, rare hits are given a higher score than before, which also improves the OOV TWV from 0.089 to 0.367 for the morph system. Applying our score normalisation implementation thus further validates that normalisation can improve both OOV and overall system performance. The improvement is most significant for the morph-system OOV terms, since even out-of-vocabulary words can be decomposed into the same sequences of morphs as IV words.

Note also that the evaluation systems based on the IBM WFST KWS software achieve much better performance than the KWS system we considered previously. The normalised morph system based on the WFST achieves a TWV of 0.52, much better than the Word + Morph combination of our system. Furthermore, the WFST-based outputs achieve significantly higher OOV performance.

Effect of Gamma in Normalising the Evaluation Systems based on the IBM WFST:
We analyse the effect of performing score normalisation on the unnormalised scores from the IBM WFST. As before, we need to fine-tune the γ parameter and then compare the normalised scores of the three different evaluation systems.

Figure 8. Normalisation of each of the IBM Language Evaluation Systems

Figure 8 shows the effect of varying the γ parameter for normalisation on each of the morph and word based WFST systems. We find that, for all three systems, γ = 0.75 usually achieves the maximum TWV performance. This is further summarised in the table below.

2) Evaluation System Combination based on the IBM WFST: System combination is then performed for each pair of the IBM language systems. Using the sets of hits from each system, we combine the query results and score the combined set of hits. The table below summarises the results of combining the systems.

WFST System Combination (no normalisation)
      Morph-Word   Morph-Word2   Word-Word2
All   0.39252      0.39548       0.40175
IV    0.46935      0.47400       0.50529
OOV   0.09442      0.09079       0

Figure 9. DET Plot for combination of Morph and Word System based on IBM WFST KWS Software

The results above show that by combining the WFST systems we can improve OOV query performance significantly, even if the overall system performance is not much higher.

3) Evaluation System Combination with Normalisation based on the IBM WFST: Following combination, score normalisation is applied to the combined systems. Figure 10 below shows how the performance changes with the γ parameter for score normalisation.

Figure 10. Combination followed by Normalisation on the IBM language evaluation systems

Figure 11 shows that the combination and normalisation of a morph based system significantly outperforms a combined system of words only.

WFST System Combination with normalisation, γ = 0.75
      Morph-Word   Morph-Word2   Word-Word2
All   0.54009      0.53589       0.46877
IV    0.58217      0.57669       0.58958
OOV   0.37680      0.37764       0

Figure 11. DET Plot for Normalised combination of IBM Word+Morph System

The table above summarises the results when the γ parameter is fine-tuned to 0.75. Again, the results show that for the morph-based WFST system the OOV TWV can be significantly improved, and we also achieve a very high overall score of 0.54009. In comparison, for the combined word systems the OOV measure remains 0, as expected.

Again, we find that system combination followed by normalisation significantly improves the performance of each system. In particular, the improvement is much greater when the IBM morph system is combined with either of the IBM word-based systems. For example, after normalisation, the overall performance of the Morph+Word system improved from 0.39 to 0.54. There is also a significant increase in OOV performance, from 0.094 to 0.377, the largest improvement in OOV performance seen so far. We concluded earlier that with morph systems the morph sequences can better account for OOV words. Comparing all the results, we see that the IBM WFST-based KWS software clearly demonstrates the improvements achievable through system combination and score normalisation.

4) Query Length: Our previous results show that the IBM WFST morph-based system with score normalisation outperforms all the other KWS systems considered in this work. We segmented the output of this system and analysed the performance for different query lengths. For this, we mapped each query in the queries.xml file to its corresponding length. Furthermore, since we are considering a morph-based system, mapping queries to lengths requires mapping each query to the number of decomposed morphs it contains. This mapping file is then used with termselect.sh to analyse the TWV performance for each query length.
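The mapping of queries to their length in morphs can be sketched as follows; the query ids, texts and dictionary entries are hypothetical examples.

```python
def query_length_map(queries, morph_dict):
    """Map each query id to the total number of morphs in its text;
    words absent from the dictionary count as a single morph."""
    return {qid: sum(len(morph_dict.get(w, [w]))
                     for w in text.lower().split())
            for qid, text in queries.items()}
```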

Figure 12 shows how the performance depends on the length of the query. Note that each length corresponds to a number of morphs, since we consider the IBM WFST normalised morph-based system.

Figure 12. TWV performance based on length of query

Figure 12 shows that when we consider only a single morpheme (length = 1), a single morph corresponds to a relatively high TWV score, since it generates a large number of hits. When the number of morphs is increased, it initially leads to even better TWV scores. This is because, for example, when considering 3 or 4 morphs, there will be a unique sequence of morphs that keeps the false-alarm rate low, leading to a higher TWV score. However, as we further increase the number of morphs, TWV scores decline. This is because such a long sequence of morphs is less likely to occur, and its low probability of occurrence leads to fewer identifications by the KWS system. Hence, for a long query, there will be a large number of misses by the KWS system, which leads to lower TWV scores.
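The miss/false-alarm trade-off described above can be made concrete with a small sketch of the term-weighted value computation; the beta = 999.9 weighting follows the standard NIST/Babel KWS definition, while the function itself is an illustrative sketch rather than the scoring tool used in this work.

```python
def term_weighted_value(n_correct, n_true, n_fa, audio_seconds, beta=999.9):
    """TWV for one keyword: 1 - (P_miss + beta * P_FA).

    P_miss is the fraction of true occurrences the system missed; P_FA
    treats each second of audio that is not a true occurrence as one
    non-target trial, following the usual KWS convention.
    """
    p_miss = 1.0 - n_correct / n_true
    p_fa = n_fa / (audio_seconds - n_true)
    return 1.0 - (p_miss + beta * p_fa)


# a long, rare query that is often missed scores poorly even with no false alarms
low = term_weighted_value(n_correct=2, n_true=10, n_fa=0, audio_seconds=36000)

# a shorter query found reliably, with one false alarm, still does much better
high = term_weighted_value(n_correct=9, n_true=10, n_fa=1, audio_seconds=36000)
```

The large beta is why a handful of false alarms costs far less than systematic misses on long queries, matching the trend in Figure 12.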

Figure 13. DET plot for overall best system based on system combination

Figure 14. DET plot for IBM Morph System

V. SUMMARY AND OVERALL BEST SYSTEM

Figures 13 and 14 show the DET plots obtained by scoring the best KWS systems, based on the IBM WFST KWS software.

We can conclude that morph-based systems achieve much better OOV accuracy than word-based systems. Even though the overall TWV performance may deteriorate slightly, normalised morph systems combined with the best word systems can outperform either alone. Considering the IBM WFST KWS software, we can also conclude that WFST-based KWS systems may perform significantly better in terms of both overall and OOV performance. We also showed that, for the best-performing KWS system, the length of the query, i.e. the number of morphs, can affect the TWV performance. Overall, we analysed the significance of score normalisation and of combining sets of hits from different systems, and how these impact keyword-spotting performance on real data.


APPENDIX: Keyword Spotting Code

March 13, 2016

1 Indexer for Word-Based KWS System

A code snippet of the word-based KWS system indexer is included here. The script kws.py is available in:

/home/ri258/Documents/MLSALT5/kws.py

input_filename = '/usr/local/teach/MLSALT5/Practical/lib/ctms/reference.ctm'
inputxml = '/usr/local/teach/MLSALT5/Practical/lib/kws/queries.xml'
output_filename = '/remote/mlsalt2015/ri258/Documents/MLSALT5/output/reference.xml'

'''
input_filename = '/usr/local/teach/MLSALT5/Practical/lib/ctms/decode.ctm'
inputxml = '/usr/local/teach/MLSALT5/Practical/lib/kws/queries.xml'
output_filename = '/remote/mlsalt2015/ri258/Documents/MLSALT5/output/decode.xml'
'''

# build, per audio file, a list of CTM entries and an index from each word
# to the positions at which it occurs
for line in readwords:
    line = line.split()
    line[4] = line[4].lower()
    set_words.add(line[4])

    if previous == line[0]:
        if line[4] in dictionary_words[i].keys():
            dictionary_words[i][line[4]] += [len(list_words[i])]
        else:
            dictionary_words[i][line[4]] = [len(list_words[i])]
        list_words[i] += [{'duration': line[3], 'file': line[0], 'channel': line[1],
                           'time': line[2], 'prob': line[5], 'text': line[4]}]
    else:
        i += 1
        list_words.append([])
        dictionary_words.append({})
        dictionary_words[i][line[4]] = [len(list_words[i])]
        list_words[i] += [{'duration': line[3], 'file': line[0], 'channel': line[1],
                           'time': line[2], 'prob': line[5], 'text': line[4]}]
    previous = line[0]

outfile = open(output_filename, 'w')

outfile.write('<kwslist kwlist_filename="IARPA-babel202b-v1.0d_conv-dev.kwlist.xml" '
              'language="swahili" system_id="">\n')

for word in kws_list:
    outfile.write('<detected_kwlist kwid="' + kws_dict[word] +
                  '" oov_count="0" search_time="0.0">\n')
    split_words = word.split()
    first = split_words[0]

    for z, list_of_words in enumerate(list_words):
        if first in dictionary_words[z].keys():
            occ = dictionary_words[z][first]

            for i in occ:
                counter = 1
                start_time = float(list_of_words[i]['time'])
                time = start_time + float(list_of_words[i]['duration'])
                dur = float(list_of_words[i]['duration'])
                score = 1.0  # the j == 0 pass below multiplies in the first word's probability

                for j, mword in enumerate(split_words):
                    if i + j >= len(list_of_words):
                        counter = 0
                        break

                    if mword == list_of_words[i + j]['text']:
                        gap = float(list_of_words[i + j]['time']) - time
                        dur = gap + dur + float(list_of_words[i + j]['duration'])
                        time = time + float(list_of_words[i + j]['duration']) + gap
                        score = score * float(list_of_words[i + j]['prob'])

                        # reject hits with more than 0.5 s between adjacent words
                        if gap > 0.5:
                            counter = 0
                    else:
                        counter = 0

                if counter == 1:
                    outfile.write('<kw file="' + list_of_words[i]['file'] +
                                  '" channel="' + list_of_words[i]['channel'] +
                                  '" tbeg="' + str("%.2f" % start_time) +
                                  '" dur="' + str("%.2f" % dur) +
                                  '" score="' + str("%.2f" % score) +
                                  '" decision="YES"/>\n')

Listing 1: Indexer - kws.py

2 Morphological Decomposition

The entire script is available in:

/home/ri258/Documents/MLSALT5/kwsToMorph.py

# decompose the morph.dct
for line in file_morph_dict:
    line = line.split()
    dictionary_morph[line[0]] = line[1:]

# decompose the dictionary of morph.kwslist.dct
for line in file_morph:
    line = line.split()
    dictionary_morph[line[0]] = line[1:]

# create the tree structure again for the query terms
tree = ET.parse(inputxml)
root = tree.getroot()

input_words = inputfile.readlines()

# dictionary to get the kwids from the query terms in queries.xml
kw_dictionary = {}
kw_list = []

for word in root:
    kw = word.find('kwtext').text
    kw_dictionary[kw] = word.get('kwid')
    kw_list += [kw]

words = {}
i = -1
previous_file = ''
list_of_words = []

# loop over each CTM entry from decode-morph.ctm
for line in input_words:
    # split up each decode-morph.ctm entry
    line = line.split()
    line[4] = line[4].lower()

    if previous_file == line[0]:
        list_of_words[i] += [{'duration': line[3], 'file': line[0], 'channel': line[1],
                              'time': line[2], 'prob': line[5], 'text': line[4]}]
    else:
        i += 1
        list_of_words.append([])
        list_of_words[i] += [{'duration': line[3], 'file': line[0], 'channel': line[1],
                              'time': line[2], 'prob': line[5], 'text': line[4]}]
    previous_file = line[0]

outfile = open(output_filename, 'w')

outfile.write('<kwslist kwlist_filename="IARPA-babel202b-v1.0d_conv-dev.kwlist.xml" '
              'language="swahili" system_id="">\n')

for kw in kw_list:
    outfile.write('<detected_kwlist kwid="' + kw_dictionary[kw] +
                  '" oov_count="0" search_time="0.0">\n')
    split_words = kw.split()
    kws_morphs = []

    # decompose every word of the query into its morph sequence
    for word in split_words:
        kws_morphs += dictionary_morph[word]

    first = kws_morphs[0]

    for wordlist in list_of_words:
        occ = [i for i, x in enumerate(wordlist) if x['text'] == first]

        for i in occ:
            count = 1
            start_time = float(wordlist[i]['time'])
            time = start_time + float(wordlist[i]['duration'])
            dur = 0.0

            for j, split_word in enumerate(kws_morphs):
                if i + j >= len(wordlist):
                    break

                if split_word == wordlist[i + j]['text']:
                    gap = float(wordlist[i + j]['time']) - time
                    dur = gap + dur + float(wordlist[i + j]['duration'])
                    time = time + float(wordlist[i + j]['duration']) + gap

                    # reject hits with more than 0.5 s between adjacent morphs
                    if gap > 0.5:
                        count = 0
                else:
                    count = 0

            if count == 1:
                outfile.write('<kw file="' + wordlist[i]['file'] +
                              '" channel="' + wordlist[i]['channel'] +
                              '" tbeg="' + str("%.2f" % start_time) +
                              '" dur="' + str("%.2f" % dur) +
                              '" score="' + wordlist[i]['prob'] +
                              '" decision="YES"/>\n')

Listing 2: Morphological Decomposition - kwsToMorph.py

3 System Combination

/home/ri258/Documents/MLSALT5/combination.py

for i, words_file1 in enumerate(root):
    kw = words_file1.get('kwid')
    l = len(words_file1)

    kws_detected = root2.find(".//detected_kwlist[@kwid='%s']" % kw)

    if kws_detected is None:
        continue

    # if the first system found no hits, take the second system's hits directly
    if list(words_file1) == [] and len(kws_detected) != 0:
        for word in kws_detected:
            words_file1.append(word)

    for k, word in enumerate(kws_detected):
        beg2 = float(word.get('tbeg'))
        end2 = beg2 + float(word.get('dur'))

        match = 0
        for j in range(0, l):
            hit = words_file1[j]
            beg = float(hit.get('tbeg'))
            end = beg + float(hit.get('dur'))

            # overlapping hits: merge with score-weighted start time and
            # duration, and sum the two systems' scores
            if (beg <= beg2 <= end) or (beg <= end2 <= end):
                score_hit1 = float(hit.get('score'))
                score_hit2 = float(word.get('score'))
                all_score = score_hit1 + score_hit2

                tbeg = (beg * score_hit1 + beg2 * score_hit2) / all_score
                duration = (end * score_hit1 + end2 * score_hit2) / all_score - tbeg

                hit.set('tbeg', "%.2f" % tbeg)
                hit.set('dur', "%.2f" % duration)
                hit.set('score', "%.6f" % all_score)
                match = 1

        # hits found only by the second system are carried over unchanged
        if match == 0:
            words_file1.append(word)

Listing 3: System Combination - combination.py
