User Manual of Mining Mouse Vocalizations Prepared by Jesin Zakaria and Eamonn Keogh





CREATE SPECTROGRAM

Run the code createSpectro.m to:
1. create a spectrogram from a .wav file;
2. idealize the spectrogram;
3. extract candidate syllables from the idealized spectrogram.

Try the following example. Set

rec = '..\031611KOKO02MATED.wav'; % path and name of the .wav file
D = '...\031611KOKO02MATEDspectro\'; % location of the folder that will contain the syllables

Set the range of the for loop according to the size of main memory and the length of the recording. In each iteration we create the spectrogram of two minutes of the recording; this value can be changed to create the spectrogram of a longer section of the recording.
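createSpectro.m itself is not reproduced in this manual. As a rough Python illustration of the chunking described above (not the authors' MATLAB code), the hypothetical helper chunk_ranges below lists the sample ranges a loop would visit when it processes two minutes per iteration. The 250 kHz sampling rate in the example is an assumption, chosen to match the sample indices (124000*250) used in the spectrogram snippet.

```python
def chunk_ranges(n_samples, fs, chunk_seconds=120):
    """Split a recording of n_samples into consecutive (start, end) sample ranges.

    Each range covers chunk_seconds of audio (two minutes by default); the last
    range is shorter when the recording length is not a multiple of the chunk.
    Ranges are 0-based and half-open: samples start..end-1.
    """
    step = int(round(chunk_seconds * fs))
    return [(start, min(start + step, n_samples)) for start in range(0, n_samples, step)]


# Example (assumed 250 kHz rate): a five-minute recording gives two full
# two-minute chunks followed by one one-minute chunk.
ranges = chunk_ranges(5 * 60 * 250_000, 250_000)
```

Raising chunk_seconds corresponds to creating the spectrogram of a longer section per iteration, at the cost of more memory.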

RUNNING TIME: Because processing is faster than real time, we did not include a running-time analysis in our paper. For example, it took on average (12.95 + 12.81 + 12.67)/3 = 12.81 seconds to create the spectrogram of a two-minute recording, roughly 9 times faster than real time.

It took 85.7 seconds to extract the connected components from the idealized spectrogram of a six-minute recording, roughly 4 times faster than real time.


CREATE SPECTROGRAM

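rec = 'C:\Users\Jesin\Desktop\temp\031611KOKO02MATED.wav';
t1 = 124000*250; t2 = 125000*250;   % samples at 124 s and 125 s, assuming a 250 kHz sampling rate
% wavread has been removed from recent MATLAB releases; audioread accepts the same [start, end] sample range
[Y, FS] = audioread(rec, [t1, t2]);
[S, F, T, P] = spectrogram(Y, 512, 256, 512, FS, 'yaxis');   % 512-sample Hamming window, 256 overlap, 512-point FFT
% idealize: keep bins whose -10*log10(power) lies in [35, 80], then binarize
C = -10*log10(P);
C(C<35) = 0; C(C>80) = 0; C(C~=0) = 1;
imshow(~C);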

Figure 1: Idealized spectrogram of a laboratory-mouse recording (time axis 124 to 125 s, frequency axis in kHz with ticks at 40 and 100), created with the code above.
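For readers without MATLAB, a minimal Python sketch of the same idealization using SciPy follows. It assumes a mono signal already loaded as a NumPy array (for example with scipy.io.wavfile.read, which returns the rate and the samples); the function name idealized_spectrogram is hypothetical. Window type, overlap and FFT length mirror the MATLAB call, but SciPy's periodic Hamming window and PSD scaling are not guaranteed to match MATLAB exactly, so the 35 and 80 thresholds may select slightly different bins.

```python
import numpy as np
from scipy.signal import spectrogram


def idealized_spectrogram(y, fs, low_db=35.0, high_db=80.0):
    """Return (freqs, times, mask) for a binary, idealized spectrogram.

    Mirrors the MATLAB call spectrogram(Y, 512, 256, 512, FS): a 512-sample
    Hamming window, 256 samples of overlap and a 512-point FFT. A bin is True
    when -10*log10(power) lies in [low_db, high_db], as in the MATLAB snippet.
    """
    freqs, times, p = spectrogram(y, fs=fs, window="hamming", nperseg=512,
                                  noverlap=256, nfft=512, detrend=False)
    with np.errstate(divide="ignore"):
        c = -10.0 * np.log10(p)  # zero power gives +inf, which lies outside the band
    mask = (c >= low_db) & (c <= high_db)
    return freqs, times, mask
```

Rows of the mask run from low to high frequency, so displaying ~mask with the origin at the bottom reproduces the orientation of Figure 1.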


EXTRACT CANDIDATE SYLLABLES

In createSpectro.m, we marked the part of the code that extracts candidate syllables.

The results of all filtering steps are included in the extractcandidatesyllable.zip archive.

The folder …\031611KOKO02MATEDspectro contains all connected components with duration >10 and <300 and within the frequency range 30 to 110 kHz.
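The duration and frequency filters can be sketched in Python with scipy.ndimage. This is an illustration under stated assumptions, not the code in createSpectro.m: duration is assumed to be measured in spectrogram columns (time frames), a component is assumed to be kept only if its whole frequency extent lies in 30 to 110 kHz, and candidate_components is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def candidate_components(mask, freqs_hz, min_len=10, max_len=300,
                         f_low=30e3, f_high=110e3):
    """Bounding slices of connected components that pass both filters.

    mask: 2-D boolean idealized spectrogram (rows = frequency bins in
    ascending order, columns = time frames). freqs_hz: frequency of each row.
    Assumptions: duration = number of time frames spanned, and the whole
    frequency extent of a component must lie in [f_low, f_high].
    """
    labels, _ = ndimage.label(mask)  # 4-connectivity by default
    kept = []
    for rows, cols in ndimage.find_objects(labels):
        width = cols.stop - cols.start           # duration in time frames
        f_min = freqs_hz[rows.start]             # lowest frequency bin touched
        f_max = freqs_hz[rows.stop - 1]          # highest frequency bin touched
        if min_len < width < max_len and f_low <= f_min and f_max <= f_high:
            kept.append((rows, cols))
    return kept
```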

The folder …\031611KOKO02MATED contains all candidate syllables after filtering out some noise and excluding all but one of the syllables that appear at the same time stamp.

The folder …\sametime contains the syllables that were excluded for appearing at the same time stamp.
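The manual does not say which syllable is kept when several appear at the same time stamp. The Python sketch below uses one hypothetical rule: components with the same start frame are treated as simultaneous, and the one with the most pixels is kept. The excluded components play the role of the …\sametime folder. The function name and the tuple layout are assumptions.

```python
def one_per_timestamp(components):
    """Keep a single component per start frame.

    components: list of (start_frame, n_pixels, component_id) tuples.
    Hypothetical rule: among components sharing a start frame, keep the one
    with the most pixels; every other one is returned as excluded.
    """
    best = {}
    for comp in components:
        start, size = comp[0], comp[1]
        if start not in best or size > best[start][1]:
            best[start] = comp
    kept = sorted(best.values(), key=lambda c: c[0])
    kept_ids = {c[2] for c in kept}
    excluded = [c for c in components if c[2] not in kept_ids]
    return kept, excluded
```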


CLASSIFY CANDIDATE SYLLABLES

Run the code classifySyllables.m. It requires:
1. labelGrndTruth.txt, which contains the labels of the ground truth;
2. theta.txt, which contains the thresholds for each class. Columns 1, 2, 4 and 5 of theta.txt hold the mean, sigma, mean+sigma and mean+2*sigma for each class of syllables in the ground truth;
3. the normalized ground truth;
4. the candidate syllable bitmaps;
5. the list of candidate syllables in sorted order.
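The Python sketch below illustrates the four documented columns of theta.txt. Several points are assumptions: which distances enter the statistics, and whether sigma is the sample or the population standard deviation. Column 3 is not described in this manual and is not modeled. class_thresholds is a hypothetical name.

```python
import statistics


def class_thresholds(distances_by_class):
    """Per-class statistics in the order of theta.txt columns 1, 2, 4 and 5.

    distances_by_class: dict mapping a class label to the list of distances
    computed for ground-truth syllables of that class (an assumption).
    Returns label -> (mean, sigma, mean + sigma, mean + 2*sigma), using the
    sample standard deviation.
    """
    out = {}
    for label, d in distances_by_class.items():
        mu = statistics.mean(d)
        sigma = statistics.stdev(d) if len(d) > 1 else 0.0
        out[label] = (mu, sigma, mu + sigma, mu + 2 * sigma)
    return out
```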

Result: for our sample example, ‘dis031611KOKO02MATED.txt’ contains the distances of the candidate syllables to the ground truth, and ‘label 031611KOKO02MATED.txt’ contains the labels of all the candidate syllables.

To see the class distribution, uncomment the class-distribution code in classifySyllables.m.
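The manual does not spell out how classifySyllables.m turns distances and thresholds into labels. One plausible reading is nearest-neighbor assignment with a per-class distance threshold, where rejected candidates are labeled as unknown ('q', the symbol used in the query figures). The Python sketch below implements that reading as an assumption only. classify and its parameters are hypothetical.

```python
def classify(dist_row, gt_labels, thresholds, unknown="q"):
    """Label one candidate syllable from its distances to the ground truth.

    dist_row[i]: distance from the candidate to ground-truth syllable i.
    gt_labels[i]: class of ground-truth syllable i.
    thresholds[label]: maximum accepted distance for that class (for example
    a mean + k*sigma value from theta.txt; which column is used is assumed).
    """
    nearest = min(range(len(dist_row)), key=dist_row.__getitem__)
    label = gt_labels[nearest]
    return label if dist_row[nearest] <= thresholds[label] else unknown
```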


CLASSIFY CANDIDATE SYLLABLES

Normalization method

In our paper we stated that all candidate syllables and the ground truth are normalized before computing the GHT distance between them. For brevity, however, we did not include details of our normalization method, nor did we validate it.

The next slides present the details of our normalization method.


CLASSIFY CANDIDATE SYLLABLES

Normalization method

Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes). Syllables that are not clustered correctly are marked with a red circle.

GHT is calculated without normalizing the syllables


CLASSIFY CANDIDATE SYLLABLES

Normalization method

Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes). Some syllables are still not clustered correctly, as the following figure shows.

GHT is calculated after normalizing the syllables by dividing x and y by the larger dimension (row or column)

Same set of syllables after normalization
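The Python sketch below shows this first normalization. It assumes a syllable is represented by the coordinates of its foreground pixels, and the function name is hypothetical. Both axes are divided by the same factor, so the aspect ratio of each syllable is preserved.

```python
import numpy as np


def normalize_by_larger_dim(bitmap):
    """Foreground pixel coordinates scaled by the bitmap's larger dimension.

    bitmap: 2-D boolean array (rows x columns). Returns an (n, 2) array of
    (x, y) pairs, where x = column index and y = row index are both divided
    by max(rows, columns).
    """
    ys, xs = np.nonzero(bitmap)
    scale = max(bitmap.shape)
    return np.column_stack([xs / scale, ys / scale])
```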


CLASSIFY CANDIDATE SYLLABLES

Normalization method (used in our paper)

Set: 16 syllables of classes 1, 3, 4 and 11 (non-confusing classes). All syllables except one (marked with an arrow) are clustered correctly, as the following figure shows.

GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively

Same set of syllables after normalization
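The normalization used in the paper can be sketched the same way. We read "size of row and column respectively" as the number of columns for x and the number of rows for y, so both coordinates fall in [0, 1). This interpretation and the function name are assumptions. Unlike division by the larger dimension, this stretches every syllable to a unit square.

```python
import numpy as np


def normalize_per_axis(bitmap):
    """Foreground pixel coordinates with x divided by the number of columns
    and y divided by the number of rows (assumed reading of the method).

    bitmap: 2-D boolean array (rows x columns). Returns an (n, 2) array of
    (x, y) pairs.
    """
    ys, xs = np.nonzero(bitmap)
    n_rows, n_cols = bitmap.shape
    return np.column_stack([xs / n_cols, ys / n_rows])
```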


CLASSIFY CANDIDATE SYLLABLES

Normalization method (used in our paper)

Set: 16 syllables of class 1 and 27 syllables of class 9 (confusing classes)

GHT is calculated after normalizing the syllables by dividing x and y by the size of row and column respectively

Same set of syllables after normalization


EDITING GROUND TRUTH

Figure: classification accuracy (0 to 1) versus the number of instances added (0 to 700), for the edited ground truth and for all the labeled syllables.

Run accuracyGrndTrth.m to generate the plot. It requires editMatrix.txt, dis692.txt and label692.txt.

DESCRIPTION OF THE FILES

In our paper we mentioned the 692 syllables annotated by the domain expert.

Instead of using those 692 syllables as ground truth, we applied a data editing technique, which resulted in a set of 108 syllables that we used as the GROUND TRUTH for our experiments.

1. editMatrix.txt contains the result of editing the 692 annotated syllables. Columns 2, 3, 4 and 5 give the number of syllables added to the ground truth, the class label of the syllable, the total number of syllables classified using the edited ground truth, and the accuracy rate.
2. dis692.txt contains the GHT distances of the 692 annotated syllables.
3. label692.txt contains the class labels of the 692 syllables.

groundtruth.zip contains the sets of 692 and 108 syllables mentioned in our paper.
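To regenerate the accuracy curve outside MATLAB, the relevant columns of editMatrix.txt can be read as in the Python sketch below. It assumes the file is whitespace-delimited numeric text. It uses the 1-based column meanings listed above: column 2 is the number of syllables added and column 5 is the accuracy rate. accuracy_curve is a hypothetical name. The two input rows are toy values, not results from the paper.

```python
import io

import numpy as np


def accuracy_curve(path_or_file):
    """Return (syllables added, accuracy rate) arrays from editMatrix.txt.

    Column 2 (index 1) = number of syllables added to the ground truth,
    column 5 (index 4) = accuracy rate, per the file description.
    """
    m = np.loadtxt(path_or_file, ndmin=2)
    return m[:, 1], m[:, 4]


# Toy input for illustration only (not data from the paper).
sample = io.StringIO("1 1 3 10 0.50\n2 2 5 20 0.75\n")
added, acc = accuracy_curve(sample)
```

Plotting acc against added (for example with matplotlib) gives a curve of the same kind as the figure above.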


MOTIF DISCOVERY

Run findMotif.m to find motifs from a vocalization

Figure: example motif occurrences at 944.7 to 945.2 s and at 194.8 to 195.2 s.

Instructions: in findMotif.m, change the locations of the folders that will contain the motifs, the .wav file, the list of syllables, and the labels of the syllables.

Also create the folders, e.g. …/motif/6 and …/motif/7, before running the code. These folders will contain motifs of length 6, 7, etc.

motif.zip contains motifs from the attached .wav file.
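The Python sketch below gives a simplified illustration of a motif in this setting. It counts repeated length-k substrings in a sequence of syllable labels, for example with k = 6 or 7 as in the motif folders. findMotif.m works on the labeled syllables and the .wav file, and it may define a motif differently (for instance, with time constraints between syllables). Treat this block as a sketch built on assumptions. repeated_substrings is a hypothetical name.

```python
from collections import Counter


def repeated_substrings(labels, k, min_count=2):
    """Count length-k substrings of a syllable-label sequence.

    labels: string with one character per syllable class (e.g. 'ddqd').
    Returns {substring: count} for substrings occurring at least min_count
    times; overlapping occurrences are counted.
    """
    counts = Counter(labels[i:i + k] for i in range(len(labels) - k + 1))
    return {s: c for s, c in counts.items() if c >= min_count}
```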


Clustering mice vocalizations

Run clusterMtf.m to cluster motifs from mice vocalizations

The folder ‘dendo_mice’ contains all the required files used to generate the dendrograms of figure 12 and figure 13.
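A dendrogram of motifs can be built in Python with SciPy's hierarchical clustering. The sketch below assumes a precomputed symmetric matrix of pairwise motif distances. This manual does not describe the distance measure or the linkage method used by clusterMtf.m, so average linkage and the name cluster_motifs are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def cluster_motifs(dist_matrix, method="average"):
    """Agglomerative clustering from a symmetric, zero-diagonal distance matrix.

    Returns the SciPy linkage matrix: one row per merge with the two merged
    cluster indices, the merge distance, and the new cluster size.
    """
    condensed = squareform(np.asarray(dist_matrix, dtype=float), checks=True)
    return linkage(condensed, method=method)
```

Passing the returned matrix to scipy.cluster.hierarchy.dendrogram draws the tree.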


QUERY

Similarity search / Query by content

Some additional results are attached here: the 10 nearest neighbors (10 NN) from four vocalizations are presented.

Figure: query result with syllable labels d d q d (‘q’ means unknown class).
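Retrieving the 10 nearest neighbors reduces to sorting the distances from the query to every candidate. The Python sketch below is a minimal version that assumes those distances (for example, GHT distances) are already computed. k_nearest is a hypothetical name.

```python
import numpy as np


def k_nearest(query_dists, k=10):
    """Indices of the k candidates closest to the query, nearest first.

    query_dists[i] is the distance from the query syllable or motif to
    candidate i. A stable sort keeps the original order among ties.
    """
    d = np.asarray(query_dists, dtype=float)
    return np.argsort(d, kind="stable")[:k]
```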


QUERY

Similarity search / Query by content

Figure: additional query result with syllable labels q a i a i a c i a (‘q’ means unknown class).



Motif Significance

Run mtfSgnfnc.m to assess the significance of motifs based on their z-scores.

The folder ‘../mtfSgnfcn’ contains all the required files used to generate the plot of figure 17.
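A z-score compares a motif's observed count with the counts expected by chance, z = (observed - mean of null counts) / standard deviation of null counts. The Python sketch below assumes a null model that shuffles the syllable labels (preserving their frequencies) and counts overlapping occurrences. mtfSgnfnc.m may use a different null model, and all function names here are hypothetical.

```python
import random
import statistics


def count_occurrences(seq, motif):
    """Number of (possibly overlapping) occurrences of motif in seq."""
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def z_score(observed, null_counts):
    """(observed - mean) / sample standard deviation of the null counts."""
    mu = statistics.mean(null_counts)
    sd = statistics.stdev(null_counts)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mu) / sd


def motif_z_score(seq, motif, n_shuffles=1000, seed=0):
    """z-score of a motif's count against label-shuffled versions of seq."""
    rng = random.Random(seed)
    labels = list(seq)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # assumed null model: random order, same label counts
        null.append(count_occurrences("".join(labels), motif))
    return z_score(count_occurrences(seq, motif), null)
```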


Contrast sets

createContrastset.m is used to create the contrast sets. contratset.m is used to extract the patterns in the contrast sets from a vocalization.

The folder ‘../contrastSet’ contains some examples of the contrast sets mentioned in our paper. It also contains the files needed by createContrastset.m.

‘contrastset.txt’ contains the list of substrings sorted in descending order of their information gain.
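The Python sketch below ranks substrings by information gain. It assumes two groups of label sequences (for example, vocalizations recorded under two conditions) and a binary "contains this substring" feature. This manual does not describe how createContrastset.m defines the groups and the candidate substrings, and the function names are hypothetical.

```python
import math


def entropy(pos, neg):
    """Binary entropy, in bits, of a set with pos and neg members."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(substring, group_a, group_b):
    """Information gain of the feature 'sequence contains substring'."""
    a_in = sum(substring in s for s in group_a)
    b_in = sum(substring in s for s in group_b)
    a_out, b_out = len(group_a) - a_in, len(group_b) - b_in
    n = len(group_a) + len(group_b)
    before = entropy(len(group_a), len(group_b))
    after = sum((x + y) / n * entropy(x, y)
                for x, y in ((a_in, b_in), (a_out, b_out)) if x + y)
    return before - after


def rank_substrings(candidates, group_a, group_b):
    """Candidate substrings sorted in descending order of information gain."""
    return sorted(candidates, key=lambda s: information_gain(s, group_a, group_b),
                  reverse=True)
```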


Questions or comments? Email: [email protected]