
    A brief tutorial on HTK for sequence data

    Thies Gehrmann

    Leiden University

    HTK is a complicated toolkit, and getting used to it will take some time. It is normally used for speech recognition and incorporates a large number of highly complicated tools. This brief tutorial aims to familiarize you with using it for sequence data rather than the audio data for which it was intended. All materials presented here can be downloaded from medialab.liacs.nl/cmb2015/lab03.zip.

    Contents

    1 Introduction
      1.1 Using HTK at LIACS
      1.2 How to download HTK
      1.3 How to install HTK
    2 Basic HTK files
      2.1 HMM List
      2.2 HMM Structure
      2.3 HMM Network Lattice
      2.4 Word Dictionary
      2.5 Sequences
    3 Simple Dishonest Casino
      3.1 HMM List
      3.2 Fair HMM
      3.3 Unfair HMM
      3.4 Network lattice
      3.5 Dictionary
    4 Complex Dishonest Casino
      4.1 HMM List
      4.2 Die side HMM
      4.3 Network lattice
      4.4 Dictionary
    5 HTK Tools
      5.1 Posterior Decoding (MAP estimation)
      5.2 Training


    1 Introduction

    This guide presents only a small portion of the functionality of HTK; many features have been intentionally omitted for simplicity. HTK is a ferocious beast that can be hard to tame, but hopefully this guide will help you understand it in its simplest form. For a complete guide to HTK, consult the HTK Book [1], which explains every feature of HTK in painful detail. After reading this guide, however, you should have a working understanding of HTK.

    1.1 Using HTK at ssh.liacs.nl (Note: not available >2014)

    If you have access to ssh.liacs.nl, you can use HTK by adding /home/tgehrman/htk/bin/ to your PATH variable with the following command in tcsh:

    set path=($path /home/tgehrman/htk/bin)

    and with the following command in bash:

    export PATH=$PATH:/home/tgehrman/htk/bin

    Then all the commands in the HTK toolkit are available to you. This works only on ssh.liacs.nl.

    1.2 How to download HTK

    To download HTK, you must first register at http://htk.eng.cam.ac.uk/register.shtml and accept the license agreement; you will then receive an email with your password. Once you have the password, you can download the Windows executables or the Linux sources at http://htk.eng.cam.ac.uk/download.shtml.

    1.3 How to install HTK

    Linux: As long as you have the libX11 packages and the basic development packages, you should be able to compile HTK very easily for i686 CPUs. If you are running a 64-bit CPU, you may have to patch some of the makefiles. If you are running Arch Linux, you can download the HTK PKGBUILD from https://aur.archlinux.org/packages.php?ID=18982. Make sure to grab the one posted by dimigon and the associated patch files.

    2 Basic HTK files

    HTK needs a few files to operate, each of which is described here. The figure shows how these files fit together.

    • HMM List

    • HMM Structure

    • HMM Network

    • HMM Dictionary

    • HMM sequence data

    [1] HTK Book: http://htk.eng.cam.ac.uk/download.shtml



    2.1 HMM List

    This is the easiest part. You define a file which simply lists the filenames of all the HMMs you have created. For example, Listing 1 shows an HMM list with two HMMs whose filenames are 'HMM1' and 'HMM2'.

    HMM1
    HMM2

    Listing 1: "HTK HMM List file"

    2.2 HMM Structure

    In the HMM structure you define the interactions between states and the emission of symbols. This is the most complicated file in the whole HTK toolkit, so pay attention. Listing 2 shows a simple HMM. Do not panic, it will be explained line by line.

    ~o <DISCRETE> <STREAMINFO> 1 1
    ~h "fair"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 4250 4250 4250 4250 4250 4250
    <TRANSP> 3
    0.00 1.00 0.00
    0.00 0.00 1.00
    0.00 0.00 0.00
    <ENDHMM>

    Listing 2: "HTK HMM file"

    ~o <DISCRETE> <STREAMINFO> 1 1

    ~o specifies that HTK options follow. <DISCRETE> states that this HMM emits discrete symbols rather than continuous ones, i.e., we specify a discrete distribution rather than a continuous one. <STREAMINFO> 1 1 specifies how many streams this HMM deals with. In this case we have a 1-stream HMM with a width of 1 (i.e. each emission is a vector of size 1).

    ~h "fair"

    ~h indicates that this is an HMM whose name is given in the following string. In this case it is called "fair". The name of the HMM should be the same as the filename.

    <BEGINHMM> ... <ENDHMM>

    All state and transition probabilities are defined between these two symbols.

    <NUMSTATES> 3

    This defines that the HMM has three states. The count includes the start state, 1, and the end state, 3. In this case there is only one state, 2, for which we can define an emission distribution.

    <NUMMIXES> 6

    This indicates that this HMM deals with 6 symbols. It follows <STATE> 2, which opens the definition of state 2.

    <DPROB> 4250 4250 4250 4250 4250 4250

    This line defines the probabilities for emitting each of the 6 symbols that this HMM deals with. They are obviously not real probabilities, but they are discretized values which represent probabilities. Each number is between 0 and 32767, representing values between 1 and 0.000001. They can be converted back to a probability using equation 1, and calculated using the inverse in equation 2.

    $$p_{s,\mathrm{sym}} = \exp\left(-\frac{D_{s,\mathrm{sym}}}{2371.8}\right) \tag{1}$$

    $$D_{s,\mathrm{sym}} = -2371.8 \, \ln p_{s,\mathrm{sym}} \tag{2}$$

    In this case, 4250 represents a probability approximately equal to 1/6.
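The two conversions are easy to script. The following is a minimal Python sketch, assuming the constant 2371.8 and the range 0 to 32767 stated above; the function names are illustrative and are not part of HTK.

```python
import math

SCALE = 2371.8        # scaling constant from equations 1 and 2
MAX_DPROB = 32767     # largest discretized value mentioned in the text


def prob_to_dprob(p: float) -> int:
    """Equation 2: turn a probability into its discretized value."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return min(MAX_DPROB, round(-SCALE * math.log(p)))


def dprob_to_prob(d: int) -> float:
    """Equation 1: turn a discretized value back into a probability."""
    return math.exp(-d / SCALE)


for p in (1 / 6, 0.1, 0.5, 1.0):
    d = prob_to_dprob(p)
    print(f"p={p:.4f} -> D={d} -> p={dprob_to_prob(d):.4f}")
```

Because the values are rounded to integers, converting back and forth only recovers the probability approximately.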

    <TRANSP> 3

    This keyword and the matrix that follows define the transition matrix between the 3 states. The matrix given describes a very simple state sequence: start at state 1, go to state 2, emit a symbol, and end at state 3.
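A quick sanity check on a transition matrix can catch typing mistakes before HTK sees the file. The sketch below assumes the usual convention that every row except the last sums to 1 and that the last row (the end state) is all zeros; `check_transitions` is a hypothetical helper, not an HTK tool.

```python
def check_transitions(matrix, tol=1e-6):
    """Return a list of problems found in a square transition matrix."""
    n = len(matrix)
    problems = []
    for i, row in enumerate(matrix):
        if len(row) != n:
            problems.append(f"row {i + 1} has {len(row)} entries, expected {n}")
            continue
        total = sum(row)
        if i == n - 1:
            # Assumed convention: the end state has no outgoing transitions.
            if abs(total) > tol:
                problems.append("the end state must have no outgoing transitions")
        elif abs(total - 1.0) > tol:
            problems.append(f"row {i + 1} sums to {total}, expected 1.0")
    return problems


fair = [
    [0.0, 1.0, 0.0],   # state 1 (start) always moves to state 2
    [0.0, 0.0, 1.0],   # state 2 emits one symbol, then moves to state 3
    [0.0, 0.0, 0.0],   # state 3 (end) has no outgoing transitions
]
print(check_transitions(fair))
```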

    2.3 HMM Network Lattice

    This file defines a composite HMM structure based on several smaller HMMs. The structure is very simple, though perhaps a bit verbose. Listing 3 shows the structure of an HTK lattice file.

    N=5 L=2
    I=0
    I=1
    I=2 W=!NULL
    I=3 W=a
    I=4 W=b
    #start
    J=0 S=0 E=2 l=-0.6931471806
    J=1 S=0 E=4 l=-0.6931471806

    Listing 3: "HTK Network Lattice file"

    N=5 L=2

    This first line defines N, the number of nodes in the network, and L, the number of links in the network.

    I=i W=w

    Lines with this structure are node definitions. There must be one start node with no incoming links, and at least one stop node with no outgoing links. It is up to you which nodes are start and stop nodes, but for consistency it is easier to have 0 be the start node, and 1 be the stop node. I=i assigns a number i to a node. W=w assigns a word to a node. A node outputs a word. Words are defined later in the word dictionary. There are some predefined words, such as !NULL, an empty node which does not emit any symbol. NULL nodes serve only to reduce the number of links in a network, making it easier to define.

    J=j S=s E=e

    Lines with this structure are link definitions. Each link has an ID j and is a transition starting from node s and ending at node e. You can also attach a log probability to a transition with the optional field l. For example, l=-0.6931471806 indicates that the probability of this transition is 0.5, since ln 0.5 = -0.6931471806.
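Writing the node and link lines by hand gets tedious, so it can help to generate them. The following Python sketch renders a node list and a link list in the layout described above and computes the natural-log value for the l field; `format_lattice` and its argument layout are assumptions made for illustration.

```python
import math


def format_lattice(nodes, links):
    """Render nodes and links in the lattice layout described above.

    nodes: list of words indexed by node number (None for a node without W=).
    links: list of (start, end, probability) tuples; probability may be None.
    """
    lines = [f"N={len(nodes)} L={len(links)}"]
    for i, word in enumerate(nodes):
        lines.append(f"I={i}" if word is None else f"I={i} W={word}")
    for j, (start, end, prob) in enumerate(links):
        line = f"J={j} S={start} E={end}"
        if prob is not None:
            line += f" l={math.log(prob):.10f}"   # natural log of the probability
        lines.append(line)
    return "\n".join(lines)


nodes = [None, None, "!NULL", "a", "b"]
links = [(0, 2, 0.5), (0, 4, 0.5)]
lattice_text = format_lattice(nodes, links)
print(lattice_text)
```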


    2.4 Word Dictionary

    The term word comes from the speech recognition purpose of the HTK toolkit. A word is composed of several phones, basic elements of speech. In terms of HMMs, you would define an HMM for each phone, and compose your word out of a combination of several phones. In sequence processing, you may want to define fixed-length regions in your sequence which could be composed out of several HMMs. Remember we used some words in our network lattice file? This is where we define our words. Listing 4 gives an example structure of a word dictionary file.

    b [b] b
    a [a] a
    AAA [AAA] a a a

    Listing 4: "HTK Dictionary file"

    WORD [OUTSYM] HMMS

    Each line of this format defines a word called WORD, associated with an output symbol OUTSYM and built by chaining the HMMs named in the list HMMS in order. They are connected with a transition probability of 1.0. For example, the last entry in Listing 4 is a word AAA composed by linking an HMM with filename a into a chain 3 nodes long.

    When you perform a MAP estimation on your network, it will output the word sequence, not the state sequence.

    2.5 Sequences

    HTK was not designed for sequence data. However, this is what we want to use it for, so our data needs to be modified. Erwin Bakker has developed a tool, htkconvert, that converts sequence data in human-readable format to data in HTK format. Listing 5 shows the human-readable format.

    10 1 2 3 4 5 6 7 8 9 0

    Listing 5: "Human readable sequence file"

    The first number in the file indicates the length of the sequence to follow. All following numbers are the sequence characters. In this file, there are 10 characters and 10 distinct symbols.

    HTK only deals with numbers, so if you have discrete symbols such as A, C, G, or T, you must convert them to a numerical representation.
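As an illustration, a symbol sequence can be turned into the human-readable format (length first, then the numeric symbols) with a few lines of Python. The mapping A=1, C=2, G=3, T=4 is an assumption chosen for this sketch; any consistent mapping will do as long as it matches the symbols your HMMs emit and what htkconvert accepts.

```python
# Hypothetical symbol-to-number mapping; any consistent mapping works.
SYMBOLS = {"A": 1, "C": 2, "G": 3, "T": 4}


def to_human_readable(sequence: str) -> str:
    """Return the sequence length followed by the numeric symbols."""
    codes = [SYMBOLS[ch] for ch in sequence.upper()]
    return " ".join(str(n) for n in [len(codes)] + codes)


print(to_human_readable("ACGTTGCA"))
```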

    3 Simple Dishonest Casino

    Figure 1 shows a simple dishonest casino example with two HMMs. One HMM for the fair die, and a second for the unfair die.

    Let us take a closer look at the files involved.

    3.1 HMM List

    fair
    unfair

    Listing 6: "Two state hmmlist: htk tutorial/dishonest casino/two state/hmmlist"

    Listing 6 indicates that there are two HMMs, fair and unfair.

    Let us take a closer look at each of these HMMs.


    Figure 1: Dishonest casino - Two state

    3.2 Fair HMM

    Listing 7: "Two state fair HMM: htk tutorial/dishonest casino/two state/fair"

    ~o <DISCRETE> <STREAMINFO> 1 1
    ~h "fair"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 4250 4250 4250 4250 4250 4250
    <TRANSP> 3
    0.00 1.00 0.00
    0.00 0.00 1.00
    0.00 0.00 0.00
    <ENDHMM>

    Notice that this HMM outputs one of six symbols with equal probability, and then stops.

    3.3 Unfair HMM

    Listing 8: "Two state unfair HMM: htk tutorial/dishonest casino/two state/unfair"

    ~o <DISCRETE> <STREAMINFO> 1 1
    ~h "unfair"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 5461 5461 5461 5461 5461 1644
    <TRANSP> 3
    0.00 1.00 0.00
    0.00 0.00 1.00
    0.00 0.00 0.00
    <ENDHMM>

    Notice that this HMM outputs one of six symbols, with preference to the sixth symbol, and then stops.

    3.4 Network lattice

    Listing 9: "Two state network lattice: htk tutorial/dishonest casino/two state/network.hmm"

    N=4 L=8
    I=0
    I=1
    I=2 W=fair
    I=3 W=unfair
    J=0 S=0 E=2 l=-0.6931471806 # ln 0.5
    J=1 S=0 E=3 l=-0.6931471806 # ln 0.5
    J=2 S=2 E=2 l=-0.05229379472 # ln 0.95*0.999
    J=3 S=2 E=3 l=-2.996732774 # ln 0.05*0.999
    J=4 S=2 E=1 l=-6.907755279 # ln 0.001
    J=5 S=3 E=3 l=-0.10636101600 # ln 0.90*0.999
    J=6 S=3 E=2 l=-2.303585593 # ln 0.10*0.999
    J=7 S=3 E=1 l=-6.907755279 # ln 0.001

    Listing 9 is the network lattice file. This is where the magic happens. Four nodes are declared: a start node, an end node, and two nodes carrying the words fair and unfair. The prior probabilities of the fair and unfair states are both 0.5. We use a very small probability (0.001) to link the fair and unfair states to the end node. To remove all confusion, let us state the obvious:

    $$0.999\,(0.05 + 0.95) + 0.001 = 1$$

    $$0.999\,(0.10 + 0.90) + 0.001 = 1$$

    $$\forall c \in N:\quad \sum_{n \in N} p(c \to n) = 1.0$$
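The same check can be run numerically on the log values of a lattice file. The sketch below copies the l values of Listing 9 by hand (the node names are labels chosen for this example) and confirms that the links leaving each node sum to 1.

```python
import math

# Log probabilities of the links leaving each node, copied from Listing 9.
outgoing = {
    "start":  [-0.6931471806, -0.6931471806],
    "fair":   [-0.05229379472, -2.996732774, -6.907755279],
    "unfair": [-0.10636101600, -2.303585593, -6.907755279],
}

# exp() undoes the natural log, so each total should be very close to 1.
totals = {node: sum(math.exp(l) for l in logs) for node, logs in outgoing.items()}
for node, total in totals.items():
    print(f"{node}: {total:.6f}")
```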

    3.5 Dictionary

    Listing 10: "Two state dictionary: htk tutorial/dishonest casino/two state/dictionary"

    fair [f] fair
    unfair [u] unfair

    Listing 10 describes each of the words used in the network file.

    So, for example, we see that the word fair corresponds to the HMM with filename fair.

    4 Complex Dishonest Casino

    This example shows a more complicated version, with more HMMs. Figure 2 illustrates how it is done. In this case, each side of each die gets its own node in the network, and the same six die-side HMMs are re-used for both dice. Let us look at the defined HMMs.


    Figure 2: Dishonest casino - Multi state


    4.1 HMM List

    Listing 11: "Multi state hmmlist: htk tutorial/dishonest casino/multi state/hmmlist"

    S1
    S2
    S3
    S4
    S5
    S6

    Listing 11 indicates that there are six defined HMMs, one for each side of a die.

    Let us look at one of the HMMs.

    4.2 Die side HMM

    Listing 12: "Die side 1 HMM: htk tutorial/dishonest casino/multi state/S1"

    ~o <DISCRETE> <STREAMINFO> 1 1
    ~h "S1"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 0 32767 32767 32767 32767 32767
    <TRANSP> 3
    0.00 1.00 0.00
    0.00 0.00 1.00
    0.00 0.00 0.00
    <ENDHMM>

    Notice that this HMM outputs symbol 1 and then stops. The 5 other HMMs for the other sides of the die are defined similarly.
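Since the six die-side HMMs differ only in their name and in the position of the 0 entry, they can be generated rather than typed. The following Python sketch builds the definition text for each side; the keyword spellings (<DISCRETE>, <STREAMINFO>, <BEGINHMM>, and so on) are assumed from HTK's HMM definition language and should be checked against the HTK Book for your version, and `die_side_hmm` is a hypothetical helper.

```python
MAX_DPROB = 32767   # discretized value for the smallest probability (about 0.000001)


def die_side_hmm(side: int, n_sides: int = 6) -> str:
    """Return the definition text of the HMM that only emits `side`."""
    # 0 encodes probability 1 for the emitted side; every other side gets MAX_DPROB.
    dprob = [0 if s == side else MAX_DPROB for s in range(1, n_sides + 1)]
    return "\n".join([
        "~o <DISCRETE> <STREAMINFO> 1 1",   # assumed keyword spelling
        f'~h "S{side}"',
        "<BEGINHMM>",
        "<NUMSTATES> 3",
        "<STATE> 2",
        f"<NUMMIXES> {n_sides}",
        "<DPROB> " + " ".join(str(d) for d in dprob),
        "<TRANSP> 3",
        "0.00 1.00 0.00",
        "0.00 0.00 1.00",
        "0.00 0.00 0.00",
        "<ENDHMM>",
    ])


definitions = {f"S{side}": die_side_hmm(side) for side in range(1, 7)}
print(definitions["S3"])
```

Each entry of `definitions` could then be written to a file with the same name as its key, matching the names in the HMM list.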

    4.3 Network lattice

    Listing 13: "Complex network lattice: htk tutorial/dishonest casino/multi state/network.hmm"

    N=18 L=32
    I=0
    I=1
    I=2 W=!NULL
    I=3 W=!NULL
    I=4 W=!NULL
    I=5 W=!NULL
    I=6 W=f_1
    I=7 W=f_2
    I=8 W=f_3
    I=9 W=f_4
    I=10 W=f_5
    I=11 W=f_6
    I=12 W=u_1
    I=13 W=u_2
    I=14 W=u_3
    I=15 W=u_4
    I=16 W=u_5
    I=17 W=u_6
    #start
    J=0 S=0 E=2 l=-0.6931471806 # Fair
    J=1 S=0 E=4 l=-0.6931471806 # Unfair
    # Fair
    J=2 S=2 E=6 l=-1.791759469 # 1: ln 1/6
    J=3 S=2 E=7 l=-1.791759469 # 2: ln 1/6
    J=4 S=2 E=8 l=-1.791759469 # 3: ln 1/6
    J=5 S=2 E=9 l=-1.791759469 # 4: ln 1/6
    J=6 S=2 E=10 l=-1.791759469 # 5: ln 1/6
    J=7 S=2 E=11 l=-1.791759469 # 6: ln 1/6
    # Fair rolls -> end fair
    J=8 S=6 E=3
    J=9 S=7 E=3
    J=10 S=8 E=3
    J=11 S=9 E=3
    J=12 S=10 E=3
    J=13 S=11 E=3
    # End fair
    J=14 S=3 E=2 l=-0.05229379472 # Fair: ln 0.95*0.999
    J=15 S=3 E=4 l=-2.996732774 # Unfair: ln 0.05*0.999
    J=16 S=3 E=1 l=-6.907755279 # End: ln 0.001
    # Unfair
    J=17 S=4 E=12 l=-2.302585093 # 1: ln 0.1
    J=18 S=4 E=13 l=-2.302585093 # 2: ln 0.1
    J=19 S=4 E=14 l=-2.302585093 # 3: ln 0.1
    J=20 S=4 E=15 l=-2.302585093 # 4: ln 0.1
    J=21 S=4 E=16 l=-2.302585093 # 5: ln 0.1
    J=22 S=4 E=17 l=-0.6931471806 # 6: ln 0.5
    # Unfair rolls -> end unfair
    J=23 S=12 E=5
    J=24 S=13 E=5
    J=25 S=14 E=5
    J=26 S=15 E=5
    J=27 S=16 E=5
    J=28 S=17 E=5
    # End unfair
    J=29 S=5 E=2 l=-2.303585593 # Fair: ln 0.1*0.999
    J=30 S=5 E=4 l=-0.106361016 # Unfair: ln 0.9*0.999
    J=31 S=5 E=1 l=-6.907755279 # End: ln 0.001

    Listing 13 displays the network lattice file for the more complicated model, which includes NULL nodes. Notice that without the NULL nodes this would have required many more links. In this case we have defined 12 words which are represented by HMMs. Look at the dictionary to see how these words are defined.
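To get a feeling for how many links the NULL nodes save, the two layouts can be counted. This Python sketch assumes that, without NULL nodes, every word node would be wired directly to every word node it can be followed by, plus the start and end nodes; the function names are illustrative.

```python
def links_with_null_nodes(n_dice: int, n_sides: int) -> int:
    """Links in the layout of Listing 13: one entry and one exit NULL node per die."""
    start_links = n_dice                        # start -> entry node of each die
    # entry -> sides, sides -> exit, exit -> each entry node, exit -> end
    per_die = n_sides + n_sides + n_dice + 1
    return start_links + n_dice * per_die


def links_without_null_nodes(n_dice: int, n_sides: int) -> int:
    """Links if every word node connects directly to every possible successor."""
    words = n_dice * n_sides
    # start -> words, word -> word, word -> end
    return words + words * words + words


print(links_with_null_nodes(2, 6), links_without_null_nodes(2, 6))
```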


    4.4 Dictionary

    f_1 [f] S1
    f_2 [f] S2
    f_3 [f] S3
    f_4 [f] S4
    f_5 [f] S5
    f_6 [f] S6
    u_1 [u] S1
    u_2 [u] S2
    u_3 [u] S3
    u_4 [u] S4
    u_5 [u] S5
    u_6 [u] S6

    Listing 14: "Multi state dictionary: htk tutorial/dishonest casino/multi state/dictionary"

    Notice how we have defined several words using the same HMM. For example, f_1 and u_1 are both defined as using HMM S1.
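The regular pattern of this dictionary makes it a natural candidate for generation. A minimal Python sketch, with `casino_dictionary` as a hypothetical helper name:

```python
def casino_dictionary(n_sides: int = 6) -> list:
    """Build one dictionary line per (die, side) pair."""
    lines = []
    for die in ("f", "u"):                       # fair and unfair die
        for side in range(1, n_sides + 1):
            # WORD [OUTSYM] HMMS: both dice re-use the same die-side HMM.
            lines.append(f"{die}_{side} [{die}] S{side}")
    return lines


dictionary_lines = casino_dictionary()
print("\n".join(dictionary_lines))
```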

    5 HTK Tools

    In this section I will present the tools that solve the most relevant problems: posterior decoding, used to estimate the optimal label sequence given an emission sequence, and training, used to estimate the parameters of your HMM. HTK is of course composed of many other tools, but only the ones most relevant to your tasks are presented here.

    5.1 Posterior Decoding (MAP estimation)

    The tool to use for Viterbi decoding is HVite.

    $>HVite -w network.hmm dictionary hmmlist sequence.htk

    Running the above command will create a file sequence.rec which contains the label sequence. It is also possible to get a word level sequence with the -T 1 option.

    If you define a sequence list file which contains a list of sequence filenames, then you can run all of them with the -S parameter.

    As an example, let us run HVite for our simple die example.

    [htk tutorial/dishonest casino/two state]$ HVite -w network.hmm dictionary hmmlist sequence.htk

    Which produces the output:

    Listing 15: "Human readable sequence file"

    $>cat sequence.rec | awk '{print $3}' | tr -d '\n'
    ffffffffffffffffffffffffffffffffffffffffffffffffuuuuuuuuuuuuuuuuuuffffffffffff
    uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuffffffffffffffffffffffffffffffffffffffffffff
    fffffffffffffffffffffffuuuuuuuuuuuuuffffffffffffffffffffffffffffffffffffffffff
    ffffffffffffffffffffffffffffffffffffuuuuuuuuuuuuuuuuuuufffffffffff
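The same extraction can be done in Python, which also makes it easy to summarise the result as runs of fair and unfair rolls. This sketch assumes, as the awk command above does, that the label is the third whitespace-separated field of each line in the .rec file; the example content is made up and is not real HVite output.

```python
from itertools import groupby


def labels_from_rec(text: str) -> str:
    """Concatenate the third field of every line, like the awk/tr pipeline."""
    rows = (line.split() for line in text.splitlines())
    return "".join(fields[2] for fields in rows if len(fields) >= 3)


def runs(labels: str):
    """Collapse a label string into (label, run length) pairs."""
    return [(label, len(list(group))) for label, group in groupby(labels)]


# Made-up example content; the numbers are placeholders.
example = "0 1 f -1.8\n1 2 f -1.8\n2 3 u -0.7\n3 4 u -0.7\n4 5 u -0.7\n"
labels = labels_from_rec(example)
print(labels, runs(labels))
```

For a real file, pass the result of `open("sequence.rec").read()` to `labels_from_rec`.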

    5.2 Training

    You can train your HMM files with HRest. Note that this does not work for single-emission HMMs such as the ones we have seen so far. Furthermore, it is only possible to train HMMs, not HMM lattices. This limitation stems from HTK's speech recognition background, where the parameters that need estimating are found in the phones rather than in the construction of the sentences.


    The way to combat this is to move the probabilities out of the lattice and into the HMM. Let us look at an example.

    5.2.1 Training emission probabilities

    Remember the unfair die? We wish to estimate its parameters. We have rolled the die a few hundred times, and now we wish to estimate the emission probabilities. We create a file which lists all the training sequences, shown in Listing 16.

    Listing 16: "List of training sequences: htk tutorial/dishonest casino train/unfair train/sequences"

    unfair1.bin
    unfair2.bin
    unfair3.bin
    unfair4.bin
    unfair5.bin

    Recall the unfair die HMM shown in Listing 8. In order to estimate the parameters, we need to change it slightly, as you can see in Listing 17. The only change is that we have created a self-loop on state 2, which means that the HMM can emit several symbols in a row.

    Listing 17: "Modified unfair HMM file: htk tutorial/dishonest casino train/unfair train/unfair"

    ~o <DISCRETE> <STREAMINFO> 1 1
    ~h "unfair"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 5461 5461 5461 5461 5461 1644
    <TRANSP> 3
    0.00 1.00 0.00
    0.00 0.99 0.01
    0.00 0.00 0.00
    <ENDHMM>
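A self-loop probability implies a typical sequence length: with self-loop a, the number of symbols emitted before leaving the state is geometrically distributed with mean 1/(1 - a). The Python sketch below shows this relation in both directions; the function names are illustrative, and the value 200 is only an example length.

```python
def expected_emissions(self_loop: float) -> float:
    """Mean number of symbols emitted before leaving a state with this self-loop."""
    if not 0.0 <= self_loop < 1.0:
        raise ValueError("self-loop probability must be in [0, 1)")
    return 1.0 / (1.0 - self_loop)


def self_loop_for_length(mean_length: float) -> float:
    """Self-loop probability whose expected run length is `mean_length`."""
    if mean_length < 1.0:
        raise ValueError("a state emits at least one symbol")
    return 1.0 - 1.0 / mean_length


print(expected_emissions(0.99))       # starting value used in the modified HMM
print(self_loop_for_length(200))      # example: sequences of about 200 rolls
```

The starting value does not need to be exact, since training re-estimates the transition probabilities as well.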

    Now we train the HMM with the command:

    [htk tutorial/dishonest casino train/unfair train]$ HRest -w 1.0 -S sequences -M ../two state unfair

    This will produce an HMM in the directory ../two state:

    Listing 18: "Modified trained unfair HMM file: htk tutorial/dishonest casino train/two state/unfair"

    ~o
    <STREAMINFO> 1 1
    <VECSIZE> 1 <DISCRETE>
    ~h "unfair"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 5451 5400 5465 5518 5572 1624
    <TRANSP> 3
    0.000000e+00 1.000000e+00 0.000000e+00
    0.000000e+00 9.990000e-01 9.999999e-04
    0.000000e+00 0.000000e+00 0.000000e+00
    <ENDHMM>
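To see what the training has learned, the trained values can be converted back to probabilities with equation 1. A short Python sketch using the six values from Listing 18:

```python
import math

SCALE = 2371.8   # constant from equation 1

trained = [5451, 5400, 5465, 5518, 5572, 1624]   # values from Listing 18
probs = [math.exp(-d / SCALE) for d in trained]

for side, p in enumerate(probs, start=1):
    print(f"side {side}: {p:.3f}")
print(f"total: {sum(probs):.3f}")
```

The total is only approximately 1 because the stored values are rounded to integers.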


    Now all we need to do is set the transition probabilities back to the way they were, so that there is no self-loop on state 2:

    Listing 19: "De-modified trained unfair HMM file: htk tutorial/dishonest casino train/two state/unfair.fixed"

    ~o
    <STREAMINFO> 1 1
    <VECSIZE> 1 <DISCRETE>
    ~h "unfair"
    <BEGINHMM>
    <NUMSTATES> 3
    <STATE> 2
    <NUMMIXES> 6
    <DPROB> 5451 5400 5465 5518 5572 1624
    <TRANSP> 3
    0.000000e+00 1.000000e+00 0.000000e+00
    0.000000e+00 0.000000e+00 1.000000e+00
    0.000000e+00 0.000000e+00 0.000000e+00
    <ENDHMM>