30
Segmentation, Classification, and Visualization of Orca Calls using Deep Learning Hendrik Schröter, Elmar Nöth, Andreas Maier, Rachael Cheng, Volker Barth, Christian Bergler Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nürnberg IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) May 12 th – 17 th 2019, Brighton, United Kingdom (UK)

Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Segmentation, Classification, and Visualization of

Orca Calls using Deep Learning

Hendrik Schröter, Elmar Nöth, Andreas Maier, Rachael Cheng, Volker Barth, Christian Bergler

Pattern Recognition Lab, Friedrich-Alexander University Erlangen-Nürnberg

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

May 12th – 17th 2019, Brighton, United Kingdom (UK)

Page 2: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Motivation: Killer whale research

The Killer Whale (Orcinus orca) [1]

Page 3: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Motivation: Killer whale research

OrcaLab [2]

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 1

Page 4: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Motivation: Killer whale research

N54°

N52°

N50°

N56°

N48°

VAN

CO

UVER ISLAND

HA

IDA

GW

AII

20182018 PRINCE RUPERTPRINCE RUPERT

BELLA BELLABELLA BELLA

PORT HARDYPORT HARDYHANSON ISLANDHANSON ISLAND

20172017

BIG BAY

CRACROFT POINT

THE CLIFFBOAT BAY

QCS

FLOWER ISLAND

LOCAL LEFT

LOCAL RIGHT

PARSON ISLAND

CRACROFT POINT

CRITICAL POINT

QUEEN CHARLOTTE STRAIGHT

SWAIN POINT

RUBBING BEACH

FI

LL

LR

PI

CP

CRPT

SP

RB

CP

PI

CRPT

SP

RB

LL

LR

QCS

FI

ORCALAB

Covered recording area by the DeepAL [1] expedition and the fixed installed OrcaLab [2] hydrophones

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 2

Page 5: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Motivation: Killer whale research

The Orchive [3]

• collected by the OrcaLab [2] and Steven Ness [3]

• 20,000 hours of underwater recordings by using 6 stationary hydrophones

(1985–2010)

• 23,511 digitized audio tapes each∼45 min.

• Orchive Annotation Catalog (OAC) [3] comprises 15,480 orca/noise labels

DeepAL Fieldwork Data (DLFD) 2017/2018 [1]

• collected via a 15-meter research trimaran

• 1,007 hours of multi-channel underwater recordings

• 89 hours video footage about behavioral data

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 3

Page 6: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Example killer whale vocalizations

0.0 0.2 0.4 0.6 0.8 1.0time [s]

2000

4000

6000

8000

10000

12000

frequ

ency

[Hz]

Whistle

0.0 0.2 0.4 0.6 0.8 1.0 1.2time [s]

2000

4000

6000

8000

10000

12000

frequ

ency

[Hz]

Pulsed Call

0.00 0.25 0.50 0.75 1.00 1.25time [s]

2000

4000

6000

8000

10000

12000

frequ

ency

[Hz]

Echolocation Click

Spectrograms from three characteristic killer whale sounds.

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 4

Page 7: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Outline

Data Corpora and Preprocessing

Segmentation – Network Architecture, Training, and Results

Call Type Classification – Network Architecture, Training, and Results

Visualization – Call Type Features

Conclusion

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 5

Page 8: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Data Corpora and Preprocessing

Page 9: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Data Corpora – Orca/Noise Segmentation

Corpora

dataset

split training validation test

samples % orca samples % orca samples % orca

OAC1 11,504 8,042 84.9 1,711 83.3 1,751 82.4

AEOTD2 17,995 14,424 8.9 1,787 15.4 1,784 5.7

DLFD3 31,928 23,891 14.2 4,125 30.1 3,912 28.3

SUM 61,427 46,357 24.8 7,623 38.6 7,447 35.6

1 Orchive Annotation Catalog (OAC) [2]2 Automatic Extracted Orchive tape data (AEOTD) [3]3 DeepAL Fieldwork Data (DLFD) [1]

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 6

Page 10: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Data Corpora – Call Type Classification

Corpora

dataset

split training validation test

samples % samples % samples %

CCS1 138 102 73.9 19 13.8 17 12.3

CCN2 286 198 69.2 41 14.4 47 16.4

EXT3 90 63 70.0 12 13.3 15 16.7

SUM 514 363 70.6 72 14.0 79 15.4

1 Call Catalog Symonds (CCS) [2]2 Call Catalog Ness (CCS) [3]3 Orchive Extension Catalog (EXT)

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 7

Page 11: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Data Preprocessing

Preprocessing and Augmentation

• Power-Spectrogram

• Augmentation

• Amplitude scaling

• Frequency shift

• Time stretch

• Addition of noise spectrograms

• Trimming / Padding to fixed length

• dB-Normalization

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 8

Page 12: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Segmentation – Network Architecture, Training,

and Results

Page 13: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Network Architecture and Training

Architecture

Ini�al Convolu�on

7x7 Kernels

Residual Layer 1

3x3 Kernels

Residual Layer 2

3x3 Kernels

Residual Layer 3

3x3 Kernels

Residual Layer 4

3x3 Kernels

Global

Avg-Pooling

Fully

connected

Inputs

1@256x128

Feature Maps

64@128x64

Feature Maps

128@64x32

Feature Maps

256@32x16

Feature Maps

512@16x8

Hidden Units

512

Outputs

2

Feature Maps

64@128x64

ResNet18-based Convolutional Neural Network (CNN) without max-pooling in the first residual layer for a binary

classification problem

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 9

Page 14: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Network Results

Results

• Test accuracy of 95.0 % (TPR = 93.8 %, FPR = 4.3 %)

0 10 20 30 400.7

0.8

0.9

1

0,95

Epoch

Accu

racy

Training

Validation

Training and validation accuracy of the segmentation model.

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 10

Page 15: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Call Type Classification – Network Architecture,

Training, and Results

Page 16: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Network Architecture and Training

Architecture

Ini�al

Convolu�on

7x7 Kernels

Residual

Layer 1

3x3 Kernels

Residual

Layer 2

3x3 Kernels

Residual

Layer 3

3x3 Kernels

Residual

Layer 4

3x3 Kernels

Global

Avg-Pooling

Fully

connected

Inputs

1@256x128

64@128x64

128@64x32

256@32x16

512@16x8

Hidden Units

512

Outputs

12

64@128x64

Feature Maps

ResNet18-based Convolutional Neural Network (CNN) without max-pooling in the first residual layer for a

12-class problem

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 11

Page 17: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Network Results

Results

• Mean test accuracy of 87.0 %re

fere

nce

noise

n1 n2 n3 n4 n5 n7 n9 n12

n47

whist

le

echo

noise

n1

n2

n3

n4

n5

n7

n9

n12

n47

whistle

echo0.0

0.2

0.4

0.6

0.8

1.0

hypothesis

Confusion matrix from the call type classifier.

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 12

Page 18: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Network Results

Misclassifications

Reference

0.0 0.5 1.0time [s]

2000

4000

6000

8000

10000

frequ

ency

[Hz]

N9 N2 as N9

— Wrong predictions —

N5 as N9 N9 as N7

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 13

Page 19: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Visualization – Call Type Features

Page 20: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Call Type Feature Visualization

(c)

(e)

(n)

(0) (1.1) (1.2) (2) (3)

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 14

Page 21: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Conclusion

Page 22: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Conclusion

• Two-stage approach for robust segmentation and classification

• Applicable on any semi-labeled database

• Real-time factor of 1/25 (NVIDIA GTX 1050) enables on-the-fly detection in

the field

• Automatically segment large data corpora followed by a subsequent call type

classification

• Direct comparison to other work is difficult (different data corpora and/or

approaches) (Steven Ness [3])

• Training call type classifier with only few call type labels

• Increase training data to be more robust against signal variety of real-world

data

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 15

Page 23: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Thank you for your attention.

Questions?

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 16

Page 24: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

References I

1C. Bergler, Deepal fieldwork data 2017/2018 (dlfd),

https://www5.cs.fau.de/research/data/ (April 2019).

2ORCALAB, Orcalab - a whale research station on hanson island,

http://orcalab.org (September 2018).

3S. Ness, “The orchive : a system for semi-automatic annotation and analysis of a

large collection of bioacoustic recordings”, PhD thesis (Department of Computer

Science, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia,

Canada, V8P 5C2, 2013), p. 228.

4A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin,

A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch”, in

Nips 2017 workshop (2017).

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 16

Page 25: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Data Distribution

Call Type Label Distribution

Orca Call Type/

CorpusN01 N02 N03 N04 N05 N07 N09 N12 N47 echo whistles noise SUM

CCS [2] 33 10 — 21 14 18 26 16 — — — — 138

CCN [3] 36 — 56 60 — 31 70 — 33 — — — 286

EXT — — — — — — — — — 30 30 30 90

SUM 69 10 56 81 14 49 96 16 33 30 30 30 514

Orca call type, echolocation, whistle, and noise label distribution of the CCS, CCN, and EXT data corpus

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 17

Page 26: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Data Preprocessing

Preprocessing and Augmentation

Data: Training Input AudioAinp

Result: Trainable Spectrogram Strain

1 Sinp← 10 · log10(|FFT (resamp(mono(Ainp),44.1kHz), ffts = 4096, hop = 441)|2)

2 Strain← scaleAmplitude(Sinp,αdB = sample([−6dB,3dB]))

3 Strain← shiftPitch(Strain,α = sample([0.5,1.5]))

4 Strain← stretchTime(Strain,α = sample([0.5,2]))

5 Strain← compressFrequencies(Strain, fmin = 500Hz, fmax = 10000Hz,bins = 256)

6 Strain← addNoise(Strain,sample(Snoise),SNR = sample([12dB,−3dB]))

7 Strain← normalize(Strain,dBmin =−100dB,dBref = 20dB)

8 Strain← trimPad(Strain, length = sample(128))

9 return Strain

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 18

Page 27: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Segmentation Model – Network Training

Training

• implemented and trained using PyTorch [4]

• Adam optimizer (lrinit = 10−5, β1 = 0.5, β2 = 0.999)

• learning rate decayed by a factor of 0.5 if there was no improvement on the

validation accuracy for 4 epochs

• training stopped if there was no improvement on the validation accuracy for

10 epochs

• batch size = 32

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 19

Page 28: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Classification Model – Network Training

Training

• implemented and trained using PyTorch [4]

• Adam optimizer (lrinit = 10−5, β1 = 0.5, β2 = 0.999)

• learning rate decayed by a factor of 0.5 if there was no improvement on the

validation accuracy for 4 epochs

• training stopped if there was no improvement on the validation accuracy for

10 epochs

• batch size = 4

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 20

Page 29: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Comparison with previous work: Segmentation

NameSegment. Dataset

Accuracy AUCtype size

Ness [3] Orca 11041 92.12 % –

Ours Orca 61427 94.97 % 98.17%

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 21

Page 30: Segmentation, Classification, and Visualization of Orca Calls … · 2019-11-13 · motivation: killer whale research n54° n52° n50° n56° n48° v a n c o u v e r is la nd h a

Comparison with previous work: Classification

Ness [3]

• Classification of 12 pulsed calls

• Mean accuracy of 76%

• Per class accuracies between 60%

to 92%

Ours

• Classification of 9 pulsed calls,

whistle, echolocation and noise

• Mean test accuracy of 87%

• Per class accuracy between 50%

to 100%

Christian Bergler | LME | Segmentation, Classification, and Visualization of Orca Calls using Deep Learning ICASSP 2019, Brighton, UK 22