Detecting and Labeling Folk Literature in Spoken Cultural Heritage Archives using Structural and Prosodic Features

Fabio Valente and Petr Motlicek (Idiap Research Institute)

CBMI 2012

http://fvalente.zxq.net/presentations/cbmi_2012.pdf

Outline of the talk

1 Introduction and Motivation

2 Data Description

3 System Description

4 Experiments

5 Conclusion

Introduction and Motivation

Audio archives of cultural heritage are large and heterogeneous.

Audio and speech processing technologies have been applied to automatically structure, index and access them.

SpeechFind project (2005)

MALACH project (2004)

CHoral project (2009)

Typically broadcast quality audio.

Data represent monolingual testimonies or interviews.

Tasks include segmentation, transcription and information retrieval.

Introduction and Motivation

Archives des parlers patois de la Suisse romande et des regions voisines

Radio Suisse Romande

Devoted to the preservation and the disclosure of Swiss French dialects from various regions.

Broadcast between 1950 and 1980 (varying and heterogeneous quality).

Contains interviews, presentations and discussions...

...but also folk literature (stories, poems, theatrical representations, tales) spoken in many dialects.

Folk literature accounts for almost 50% of the data.

Introduction and Motivation

Previous work focused mainly on segmenting/detecting/indexing in monolingual interviews and testimonies.

Previous work made extensive use of ASR-based transcriptions.

Can folk literature also be detected and labeled?

How well does this perform without any ASR/lexical information (the dialects are too sparse to adapt ASR)?

Data Description

The radio shows were broadcast between 1950 and 1980 for the preservation and disclosure of Swiss French dialects from various regions (Fribourg, Valais, Vaud, Jura) as well as neighboring regions of Italy and France.

The corpus includes over 1500 recordings.

A subset of them also includes the show scripts with the approximate “story” boundaries as well as a short description, the dialect spoken, the names of the participants and their roles (anchorman, guests, actors).

90 shows were uniformly sampled across the 30 years, thus covering very different acoustic quality in the audio.

The radio broadcast shows contain 60 different story labels including interviews, tales, poetry, recitals, theatrical representations, discussions, bibliographical references.

The story labels from the show scripts are clustered together into folk literature (tales, history, legends, recitals, anecdotes, poems, theatre) and conventional broadcast data (introductions, openings, interviews, debates, comments).

Data Description

Finer classification scheme: Story-telling, Poetry, Theatre, Functional, Interview, Other

Story Label      Number of Stories   Time Percentage
Story-telling    92                  16%
Poetry           67                  10%
Theatre          26                  15%
Functionals      230                 16%
Interview        128                 35%
Others           24                  8%

Almost 40% of the audio consists of folk literature.
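As a quick check of the figure above, the time percentages in the table can be summed per group. This is only a sketch: it assumes that Story-telling, Poetry and Theatre are the folk literature labels (as in the clustering described earlier) and it works from the rounded percentages shown on the slide, so the sum can differ slightly from the value computed on the raw durations.

```python
# Time percentages copied from the table above (rounded values from the slide).
time_share = {
    "Story-telling": 16,
    "Poetry": 10,
    "Theatre": 15,
    "Functionals": 16,
    "Interview": 35,
    "Others": 8,
}

# Assumption: these three labels form the "folk literature" group.
FOLK_LABELS = {"Story-telling", "Poetry", "Theatre"}

folk_share = sum(pct for label, pct in time_share.items() if label in FOLK_LABELS)
total_share = sum(time_share.values())

print(f"folk literature: {folk_share}% of {total_share}%")
```

With the rounded values the folk literature labels add up to about two fifths of the audio time, which is in line with the statement on the slide.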

Data Description

Summary of structural, stylistic and speaker properties for the different story labels. Speaking style properties as in [Llisterri2002].

Story Label      Story Structure                     Speakers                         Speaking style
Functionals      Monologues                          Professional                     Prompted, Scripted
Interview        Dialogues, Multiparty               Professional, non-Professional   Spontaneous
Story-telling    Monologues                          Professional                     Narrative, Expressive
Poetry           Monologues                          Professional, non-Professional   Narrative, Expressive
Theatre          Monologues, Dialogues, Multiparty   Professional                     Expressive

Structural and stylistic properties have already been used for broadcast segmentation into topics and stories, speaker role recognition and content summarization.

System Description

Assign a label to each speaker turn, in two different tasks:

Folk spoken literature versus conventional broadcast.

Storytelling, Poetry, Theatre, Functional, Interview and Other.

1 Automatic Speaker Turn Extraction/Diarization.

2 Structural Feature Extraction.

3 Acoustic/Prosodic Feature Extraction.

Automatic Speaker Turn Extraction

The raw audio is pre-processed using Wiener filter de-noising.

19 MFCC coefficients are extracted.

HMM/GMM speech/non-speech detection. GMM trained on broadcast speech.

A speaker diarization system is used to infer “who spoke when”.

A Turn is defined as a speech region from the same speaker uninterrupted by pauses longer than 500 ms, while a Sentence is a speech region from the same speaker uninterrupted by any silence/pause.
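The Turn and Sentence definitions above can be illustrated with a short sketch. It assumes the diarization output is available as a list of `(speaker, start, end)` speech regions in seconds, each already free of internal silence (so each region is one Sentence); the `Turn` class and `group_into_turns` function are hypothetical names used only for this illustration, not part of the system described in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str
    # Each sentence is a (start, end) pair in seconds with no internal pause.
    sentences: list = field(default_factory=list)

    @property
    def start(self):
        return self.sentences[0][0]

    @property
    def end(self):
        return self.sentences[-1][1]


def group_into_turns(segments, max_pause=0.5):
    """Merge same-speaker sentences separated by pauses of at most max_pause seconds."""
    turns = []
    for speaker, start, end in sorted(segments, key=lambda seg: seg[1]):
        last = turns[-1] if turns else None
        if last is not None and last.speaker == speaker and start - last.end <= max_pause:
            # Short pause, same speaker: the sentence extends the current turn.
            last.sentences.append((start, end))
        else:
            # Speaker change or pause longer than max_pause: a new turn begins.
            turns.append(Turn(speaker, [(start, end)]))
    return turns


# Hypothetical diarization output.
segments = [("A", 0.0, 2.0), ("A", 2.3, 4.0), ("A", 5.0, 6.0), ("B", 6.1, 7.0)]
turns = group_into_turns(segments)
```

In this toy input the first two regions of speaker A are joined into one turn of two sentences, while the third region starts a new turn because it follows a pause of one second.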

Structural Feature Extraction

Features related to the structure of the show or type of story.

Turn duration, the maximum and the minimum sentence duration in the turn, the number of sentences in the turn, the relative position of the turn in the show, the ratio between amount of speech and amount of silence in the turn.

To capture longer term dependencies, N-grams of quantized consecutive structural features are introduced.

Example: interviews are often a sequence of shorter and longer turns from journalist and interviewees.
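The structural features listed above, and the N-grams built from their quantized values, could be computed along the following lines. This is a sketch under stated assumptions: a turn is given as a list of `(start, end)` sentences, the relative position is taken as the turn start divided by the show duration, and the bin edges used for quantization are made up for the example; the slides do not specify any of these details.

```python
def structural_features(sentences, show_duration):
    """Structural features of one turn; sentences is a list of (start, end) in seconds."""
    start, end = sentences[0][0], sentences[-1][1]
    durations = [e - s for s, e in sentences]
    speech = sum(durations)
    silence = (end - start) - speech
    return {
        "turn_duration": end - start,
        "max_sentence": max(durations),
        "min_sentence": min(durations),
        "n_sentences": len(sentences),
        # Assumption: position measured at the start of the turn.
        "relative_position": start / show_duration,
        "speech_silence_ratio": speech / silence if silence > 0 else float("inf"),
    }


def quantize(value, edges):
    """Map a continuous value to a bin index given ascending bin edges."""
    return sum(value >= edge for edge in edges)


def ngrams(symbols, n):
    """All length-n windows over a sequence of quantized symbols."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


# Hypothetical example: one turn, then a short/long pattern over consecutive turns.
features = structural_features([(10.0, 12.0), (12.5, 15.5)], show_duration=100.0)
turn_durations = [1.5, 40.0, 2.0, 35.0]      # seconds, alternating short and long turns
edges = [5.0, 20.0]                          # illustrative bin edges only
symbols = [quantize(d, edges) for d in turn_durations]
bigrams = ngrams(symbols, 2)
```

An alternation of short and long turns, as in the interview example, shows up as a repeated pair of symbols in the bigram list, which is the kind of pattern a classifier can pick up from N-gram counts.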

Acoustic/Prosodic Feature Extraction

Features related to acoustic/prosodic speaker behavior are extracted at turn level.

The average speaking rate, the average articulation rate, various F0 statistics (mean, median, minimum, maximum, variance and slope), and minimum, maximum, and mean RMS energy (extracted using Praat), computed based on pseudo-syllable estimation.

To capture longer term dependencies, N-grams of quantized consecutive acoustic/prosodic features are introduced.

Example: interviews are often a sequence of professional and non-professional speaker turns.
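The turn-level statistics named above can be sketched in plain Python once an F0 track and a pseudo-syllable count are available (in the talk these come from Praat and a pseudo-syllable estimator, which are not reproduced here). The function names are hypothetical, and the rate definitions are the usual ones, assumed rather than taken from the slides: speaking rate counts syllables over the whole turn, articulation rate excludes pauses.

```python
import statistics


def f0_statistics(times, f0):
    """F0 statistics over the voiced frames of one turn (times in s, f0 in Hz)."""
    t_mean = statistics.fmean(times)
    f_mean = statistics.fmean(f0)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, f0))
    return {
        "mean": f_mean,
        "median": statistics.median(f0),
        "min": min(f0),
        "max": max(f0),
        "variance": statistics.pvariance(f0),
        # Least-squares slope of F0 over time, in Hz per second.
        "slope": sxy / sxx if sxx > 0 else 0.0,
    }


def speech_rates(n_syllables, turn_duration, pause_duration):
    """Return (speaking rate, articulation rate) in syllables per second."""
    speaking_rate = n_syllables / turn_duration
    articulation_rate = n_syllables / (turn_duration - pause_duration)
    return speaking_rate, articulation_rate


# Hypothetical values for one turn.
stats = f0_statistics([0.0, 1.0, 2.0, 3.0], [100.0, 110.0, 120.0, 130.0])
rates = speech_rates(n_syllables=20, turn_duration=10.0, pause_duration=2.0)
```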

Boosting Classifier

In order to integrate continuous and discrete features as well as N-gram counts into the same classifier, a boosting approach is used.

Multi-class Boosting is implemented.

The weak learners are one-level decision trees.

Results are reported in terms of F-measure (Precision/Recall).

Precision and Recall are computed as the percentage of time correctly labeled/retrieved in order to take into account possible mismatches between the manual boundaries and the speaker diarization output.
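To make the idea of boosting one-level decision trees concrete, here is a minimal sketch of discrete AdaBoost with decision stumps. It is deliberately simplified to the binary case (for instance folk literature versus conventional broadcast, with labels +1 and -1), whereas the talk uses a multi-class variant whose details are not given on the slides. All function names and the toy data are hypothetical.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """One-level decision tree: a single threshold test on a single feature."""
    return polarity if row[feature] >= threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(row, feature, threshold, polarity) != label)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost_train(X, y, rounds=5):
    """Train a weighted ensemble of stumps; y holds labels in {+1, -1}."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        error, feature, threshold, polarity = best_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(row, feature, threshold, polarity)
                for alpha, feature, threshold, polarity in model)
    return 1 if score >= 0 else -1


# Toy data: [turn duration in s, speech/silence-like score]; +1 = folk, -1 = broadcast.
X = [[2.0, 0.1], [3.0, 0.9], [4.0, 0.4], [30.0, 0.2], [45.0, 0.8], [60.0, 0.5]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost_train(X, y)
predictions = [adaboost_predict(model, row) for row in X]
```

Because each stump only thresholds one feature, continuous values, discrete values and N-gram counts can all sit in the same feature vector without rescaling, which is the property the slide highlights.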

Experiments

                         Prosodic           Structural         Prosodic+Structural
Folk literature          0.80 (0.84/0.77)   0.69 (0.73/0.66)   0.83 (0.87/0.80)
Conventional Broadcast   0.91 (0.89/0.93)   0.87 (0.89/0.85)   0.92 (0.94/0.90)

                 Prosodic           Structural         Prosodic+Structural
Storytelling     0.57 (0.58/0.56)   0.48 (0.47/0.48)   0.57 (0.59/0.55)
Poetry           0.37 (0.42/0.33)   0.28 (0.34/0.24)   0.45 (0.47/0.43)
Theatre          0.72 (0.81/0.65)   0.69 (0.74/0.64)   0.81 (0.84/0.77)
Functionals      0.79 (0.75/0.83)   0.79 (0.77/0.82)   0.83 (0.81/0.84)
Interview        0.83 (0.87/0.85)   0.72 (0.70/0.75)   0.87 (0.84/0.90)
Others           0.14 (0.32/0.09)   0.13 (0.22/0.09)   0.23 (0.35/0.17)
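The scores in these tables are F-measures with time-weighted precision and recall. A minimal sketch of such a computation is given below, assuming both the reference annotation and the system output are lists of non-overlapping `(start, end, label)` segments in seconds; the function names and the toy segments are hypothetical and the exact scoring tool used in the talk is not specified.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Duration shared by two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def time_weighted_prf(reference, hypothesis, label):
    """Precision, recall and F-measure for one label, measured in seconds of audio."""
    ref_time = sum(end - start for start, end, lab in reference if lab == label)
    hyp_time = sum(end - start for start, end, lab in hypothesis if lab == label)
    correct = sum(overlap(r_start, r_end, h_start, h_end)
                  for r_start, r_end, r_lab in reference if r_lab == label
                  for h_start, h_end, h_lab in hypothesis if h_lab == label)
    precision = correct / hyp_time if hyp_time else 0.0
    recall = correct / ref_time if ref_time else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example: the system places the boundary 10 s too early.
reference = [(0.0, 60.0, "folk"), (60.0, 100.0, "broadcast")]
hypothesis = [(0.0, 50.0, "folk"), (50.0, 100.0, "broadcast")]
folk_scores = time_weighted_prf(reference, hypothesis, "folk")
broadcast_scores = time_weighted_prf(reference, hypothesis, "broadcast")
```

Scoring by duration rather than by segment count means a boundary that is off by a few seconds costs only those seconds, instead of turning a whole segment into an error.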

Experiments

Prosodic features outperform Structural features.

Results well above chance classification.

Interview / Functionals / Theatre recognized above 0.8 F-measure.

Confusion between Storytelling / Poetry

[Figure: Confusion Matrix. Axes: Inferred Labels and Reference Labels, each over Stories, Poetry, Theatre, Functional, Interview and Other; scale from 0 to 1.]

Experiments

Storytelling and poetry are characterized by narrative/expressive speaking styles, howeverthe second is composed with more attention to rhythm and meters.

Linear regression on consecutive syllables can help in quantifying rhythm and meters[Kochanski2010].

Prosodic+Structural + Prediction FeaturesStorytelling 0.57 (0.59/0.55) 0.64 (0.65/0.63)

Poetry 0.45 (0.47/0.43) 0.52 (0.57/0.48)Theatre 0.81 (0.84/0.77) 0.85 (0.87/0.82)

Functionals 0.83 (0.81/0.84) 0.85 (0.83/0.87)Interview 0.87 (0.84/0.90) 0.88 (0.86/0.91)Others 0.23 (0.35/0.17) 0.23 (0.35/0.18)

Small improvements in disambiguating Storytelling and Poetry confusions.

Fabio Valente and Petr Motlicek (Idiap Research Institute)Detecting and Labeling Folk Literature CBMI 2012 21 / 23

Experiments

Storytelling and poetry are both characterized by narrative/expressive speaking styles; however, poetry is composed with more attention to rhythm and meter.

Linear regression on consecutive syllables can help quantify rhythm and meter [Kochanski2010].

Class          Prosodic+Structural    + Prediction Features
Storytelling   0.57 (0.59/0.55)       0.64 (0.65/0.63)
Poetry         0.45 (0.47/0.43)       0.52 (0.57/0.48)
Theatre        0.81 (0.84/0.77)       0.85 (0.87/0.82)
Functionals    0.83 (0.81/0.84)       0.85 (0.83/0.87)
Interview      0.87 (0.84/0.90)       0.88 (0.86/0.91)
Others         0.23 (0.35/0.17)       0.23 (0.35/0.18)
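The scores in this table can be sanity-checked under the assumption that each entry is an F-measure followed by precision and recall in parentheses. The slide does not state that format here, so treat it as an assumption; because the harmonic mean is symmetric, the order of the two values does not matter. Only a subset of rows is reproduced in this sketch.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (reported score, first value, second value), copied from the
# Prosodic+Structural column of the table.
reported = {
    "Storytelling": (0.57, 0.59, 0.55),
    "Poetry": (0.45, 0.47, 0.43),
    "Theatre": (0.81, 0.84, 0.77),
    "Functionals": (0.83, 0.81, 0.84),
    "Interview": (0.87, 0.84, 0.90),
    "Others": (0.23, 0.35, 0.17),
}

# Gap between the reported score and the harmonic mean of the pair.
gaps = {name: abs(score - f_measure(a, b)) for name, (score, a, b) in reported.items()}
for name, gap in gaps.items():
    print(f"{name}: gap {gap:.3f}")
```

Every gap stays below 0.01, which is consistent with the reported scores having been computed from unrounded precision and recall values.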

The prediction features bring small improvements in disambiguating Storytelling from Poetry.

There is still a lot of confusion between those two classes.
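The idea of using linear regression over consecutive syllables can be sketched as follows. This is an illustrative assumption, not the feature set used in the talk or in [Kochanski2010]: it predicts a per-syllable value (here a hypothetical duration) from the preceding syllables and uses the fraction of variance explained as a rough "predictability" score. The function name, the regression order, and the synthetic data are all invented for the example.

```python
import numpy as np

def predictability(values, order=2):
    """Fit x[t] ~ a_1*x[t-order] + ... + a_order*x[t-1] + b by least squares
    and return the fraction of variance explained (R^2)."""
    x = np.asarray(values, dtype=float)
    # Each row holds the `order` values preceding the target syllable.
    rows = [x[t - order:t] for t in range(order, len(x))]
    design = np.column_stack([np.array(rows), np.ones(len(rows))])
    target = x[order:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    total = target - target.mean()
    return 1.0 - residual.dot(residual) / total.dot(total)

# Synthetic syllable durations in seconds, invented for illustration.
rng = np.random.default_rng(0)
metered = np.tile([0.12, 0.24], 20) + rng.normal(0, 0.005, 40)  # strict short-long alternation
free = rng.uniform(0.10, 0.30, 40)                              # no regular pattern

score_metered = predictability(metered)
score_free = predictability(free)
print(score_metered, score_free)
```

A strictly alternating sequence is highly predictable from its neighbours, while an irregular one is not; a score of this kind is one plausible way to capture the extra attention to rhythm and meter that distinguishes poetry from storytelling.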

Conclusion

Most previous work on cultural heritage has focused on broadcast interviews/testimonies and made use of ASR words.

Feasibility study on the Archives des parlers patois de la Suisse romande et des regions voisines.

Half of those archives consist of storytelling, poetry, ...

Spoken in several French Swiss dialects, sparse and under-represented.

Can we detect and label folk literature in those archives without any word information?

Turn-based evaluation, with turns obtained using diarization.

Detection based on Structural features achieves an F-measure of 0.69.

Detection based on Prosodic features achieves an F-measure of 0.80.

Combining the two feature sets achieves an F-measure of 0.83.

Some data types are more easily recognized than others.

Future work: make use of a story/boundary detector and richer feature sets.
