Prosodic measurements and question types in the Spontal corpus of Swedish dialogues

Sofia Strömbergsson, Jens Edlund & David House
KTH Speech, Music and Hearing
Stockholm, Sweden

VariQ

Investigate and describe intonational variation in questions in spontaneous dialogue.

Is there a standard type of question intonation?

Question intonation

Declarative => Interrogative
"You like apples." vs. "You like apples?"

Different question types, different intonation
Y/N questions vs. Wh-questions

Where is the rise in conversational speech?

Declarative => Interrogative by intonation alone (often in lab settings).

In many languages, Y/N => RISE and Wh => LOW.

But in, e.g., Dutch conversational data, RISE is more common in Y/N than in Wh (Heuven et al., 1999).

English conversational data, Y/N: RISE not very common (Geluykens, 1988)

German conversational data, Wh: intonation not linked to syntactic sentence structure; rather an independent signalling system (Selting, 1992; Selting, 1996)

German conversational data, Y/N & Wh: rise and fall occur in both (Kohler, 2004)

Swedish computer-directed spontaneous speech, Wh: final rises in 22%; friendlier and more socially interested

Kohler and Selting (and House et al.): intonation is better related to semantic and pragmatic functions than to the syntactic structure of questions.

Spontal corpus

60 hours of dialogue / 120 half-hour sessions

Talk freely about anything

Audio, video and motion capture.

Balanced for the participants' gender and previous acquaintance.

Question extraction

24 dialogue subset (= 12 hours)
Talkspurts based on voice-activity/silence
Orthographic transcriptions

Two annotators tag questions
908 talkspurts with question tag
600 selected (Spontal balance, 2+ annotators agreeing)

Talkspurts were extracted automatically based on pauses (= Voice Activity Detection).
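The pause-based extraction of talkspurts can be sketched roughly as follows. This is only an illustration, not the Spontal pipeline: the function name `talkspurts`, the frame length, and the minimum-pause threshold are assumptions made for the example, since the slides do not state them.

```python
from typing import List, Tuple


def talkspurts(vad: List[bool], frame_s: float = 0.01,
               min_pause_s: float = 0.2) -> List[Tuple[float, float]]:
    """Group per-frame voice-activity decisions into talkspurts.

    A talkspurt is closed once silence has lasted at least `min_pause_s`;
    shorter silences are bridged. Both default values are assumed.
    Returns (start, end) times in seconds.
    """
    min_pause_frames = round(min_pause_s / frame_s)
    spurts = []
    start = None        # first frame of the talkspurt being built
    last_voiced = None  # most recent voiced frame seen
    for i, voiced in enumerate(vad):
        if voiced:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced >= min_pause_frames:
            # Silence is long enough: close the talkspurt at its last voiced frame.
            spurts.append((start * frame_s, (last_voiced + 1) * frame_s))
            start = None
    if start is not None:
        spurts.append((start * frame_s, (last_voiced + 1) * frame_s))
    return spurts
```

With one-second frames and a three-second pause threshold, a one-frame gap is bridged while a five-frame gap splits the signal into two talkspurts.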

Each dialogue was transcribed by one annotator and then checked by another. Both annotators looked for questions while annotating and labeled these with a question tag. Definition: anything that resembles, structurally or functionally, in whole or in part, a question.

Question markup

Q1: [Y/N] [Wh] [Alts] [Other]
Q2: [Required] [Optional] [Prohibited]
Q3: [Forward] [Backward]
Q4: [Reported] [Direct]

Ser du att det är hål i botten? (Do you see there's a hole in the bottom?)

Q1-Q4, kept simple => repeatable and objective categorization of questions.
Inspired by Stivers and Enfield (2010).
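The four-dimensional markup (Q1 to Q4) maps naturally onto a small record type. The sketch below is one possible representation; the class and field names are assumptions, and the example tagging at the end is hypothetical, not taken from the corpus.

```python
from dataclasses import dataclass
from enum import Enum


class Q1(Enum):
    YN = "Y/N"
    WH = "Wh"
    ALTS = "Alts"
    OTHER = "Other"


class Q2(Enum):
    REQUIRED = "Required"
    OPTIONAL = "Optional"
    PROHIBITED = "Prohibited"


class Q3(Enum):
    FORWARD = "Forward"
    BACKWARD = "Backward"


class Q4(Enum):
    REPORTED = "Reported"
    DIRECT = "Direct"


@dataclass(frozen=True)
class QuestionTag:
    """One value per dimension; the dimensions are tagged independently."""
    q1: Q1
    q2: Q2
    q3: Q3
    q4: Q4

    def label(self) -> str:
        return " ".join(f"[{d.value}]" for d in (self.q1, self.q2, self.q3, self.q4))


# Hypothetical tagging, for illustration only.
example = QuestionTag(Q1.YN, Q2.REQUIRED, Q3.FORWARD, Q4.DIRECT)
print(example.label())
```

Because the tag is frozen (and therefore hashable), it can be used directly as a dictionary key when grouping questions by category.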

Q1: Related to traditional question types
Q2: Degree of response elicitation
Q3: Does the person asking the question ask for something that has not already been said, or is it more a question of verifying or showing attitude towards what has already been stated?
Q4: Is the question a case of reported speech or not?

Question types

Q1: Y/N > Wh >> Alts & Other
Q2: Required >> Optional > Prohibited
Q3: Forward >> Backward
Q4: Direct >> Reported

The distribution of question types within the 600 questions.

Prosodic measures

PITCH

Variation

Rising/Falling intonation
DIFF = (avg. pitch of 2nd half) - (avg. pitch of 1st half)

TIME

Duration

Speech rate

We've looked at many different prosodic measures; I will focus on:

Pitch variation
Pitch tracked in semitones
VAR = pitch variation within a question

DIFF: A rough measure of gross pitch movement through the question
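The two pitch measures can be sketched in a few lines. Several details here are assumptions, because the slides do not specify them: the 100 Hz semitone reference, the use of the standard deviation as the variation statistic for VAR, and splitting the question into halves by frame count rather than by time. Unvoiced frames are assumed to have been removed beforehand.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def hz_to_semitones(f0_hz: Sequence[float], ref_hz: float = 100.0) -> List[float]:
    """Convert F0 values in Hz to semitones relative to ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def var_measure(semitones: Sequence[float]) -> float:
    """VAR: pitch variation within a question (standard deviation is an assumption)."""
    return pstdev(semitones)


def diff_measure(semitones: Sequence[float]) -> float:
    """DIFF: average pitch of the 2nd half minus average pitch of the 1st half.

    Negative values indicate falling pitch, positive values rising pitch.
    With an odd number of frames, the middle frame goes to the 2nd half.
    """
    if len(semitones) < 2:
        raise ValueError("need at least two voiced frames")
    half = len(semitones) // 2
    return mean(semitones[half:]) - mean(semitones[:half])


# Toy contour that jumps one octave halfway through.
st = hz_to_semitones([100.0, 100.0, 200.0, 200.0])
print(var_measure(st), diff_measure(st))
```

Working in semitones rather than Hz makes the two measures comparable across speakers with different pitch ranges, since an octave is 12 semitones regardless of where it starts.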

Duration

Significant differences between (almost) all Q1 categories
Who's that?
Is that Obama?
Is that Obama or Romney?

Forward-directed questions longer

Trivial: Alt questions are longer than other questions. However, Y/N are also longer than Wh. All three types contain 0 or more response alternatives.

Forward questions by definition request new information; what is requested has to be contained in the question.
Backward questions refer back to the previous dialogue, presumably with referential pronouns, which are typically short.

Speech rate

Other are slower
Optional are slower
Backward are slower

Is that true?!
Did you?!
Oh yeah?
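Speech rate on this slide is measured as words per second. A minimal sketch, assuming that words are the whitespace-separated tokens of the orthographic transcription and that the duration is that of the talkspurt (the function name is illustrative):

```python
def speech_rate(transcript: str, duration_s: float) -> float:
    """Words per second for one question.

    Words are counted as whitespace-separated tokens (an assumption
    about how the orthographic transcription is tokenized).
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) / duration_s


print(speech_rate("Is that true", 1.5))
```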

Speech rate was calculated as number_of_words/sec. (Also as c2v-transitions/sec, but the results are the same.)

Pitch variation

More varied pitch in questions to which answers are Optional
Is that true?!
Did you?!

More varied pitch in Backward-directed questions

Reflecting attitude?

Examples of Optional: Okay?, Did you?, Is that true?

Rising/Falling intonation

Y/N falling, Wh rising (or flat)
Forward falling, Backward rising (or flat), but with large variation

Measure comparing the average pitch of the first and the second half of the question. (Negative values = falling, positive values = rising.)

Rising/Falling intonation (cont.)

Rising intonation signalling non-understanding/non-acceptance?

Tycker du inte? (Don't you think?)
Är det sant? (Is that true?)
Vad sa du? (What'd you say?)
Jaha? (Oh yeah?)

To disentangle the variation within Backward questions, we clustered the questions by Q1-Q3.
1) Among RISING, all are Backward-directed.
2) Only one Backward question type has FALLING.
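Clustering by Q1-Q3 amounts to grouping the questions on the combination of their three tags and summarizing DIFF per group. The sketch below assumes each question is a dictionary with hypothetical keys `q1`, `q2`, `q3` and `diff`; the numbers are invented for illustration and are not corpus results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def cluster_mean_diff(questions: List[dict]) -> Dict[Tuple[str, str, str], float]:
    """Mean DIFF per (Q1, Q2, Q3) combination."""
    groups = defaultdict(list)
    for q in questions:
        groups[(q["q1"], q["q2"], q["q3"])].append(q["diff"])
    return {key: mean(values) for key, values in groups.items()}


# Invented values, for illustration only.
toy = [
    {"q1": "Y/N", "q2": "Optional", "q3": "Backward", "diff": 1.0},
    {"q1": "Y/N", "q2": "Optional", "q3": "Backward", "diff": 3.0},
    {"q1": "Wh", "q2": "Required", "q3": "Forward", "diff": -2.0},
]
for cluster, avg in cluster_mean_diff(toy).items():
    direction = "RISING" if avg > 0 else "FALLING"
    print(cluster, avg, direction)
```

Because the dimensions are tagged independently, any subset of them can serve as the grouping key without re-annotating the data.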

Qualitative inspection of the questions with rising intonation led us to suggest that the rise is a signal of non-understanding or non-acceptance of the preceding context. Obviously, our categorization scheme, which doesn't reveal much semantic/pragmatic detail, doesn't allow us to test this. More detailed analysis is needed to explore that.

Conclusions

Strengths of coding scheme:
General
Quick-and-easy
Orthogonal, allows clustering across different dimensions

Interplay across different dimensions:
Optional + Backward
Other + Optional + Backward
More pitch variation
Slower
Attitude? Uncertainty? Non-agreement?

We are quite happy with the coding scheme we've suggested.
General: applicable to spontaneous dialogue.
Quick-and-easy. (We plan to let more annotators try it.)
Orthogonal/Clustering: allows subcategorization that is valuable when exploring the function of prosody in questions.

On the downside, it does not give semantic/pragmatic detail; that would take longer and would also require that the annotators look at the context in which the questions occur.

Thank you for your attention!

Question markup

Q1: [Y/N] [Wh] [Alts] [Other]
Q2: [Required] [Optional] [Prohibited]
Q3: [Forward] [Backward]
Q4: [Reported] [Direct]

Vad betyder det? (What does that mean?)

Rising/Falling intonation (cont.)

DIFF: Y/N falling, Wh rising; Backward rising
But confusion with sentence focus?

PROS: No significant dependencies

So, to conclude what we found by measuring rising/falling intonation by comparing the average pitch of the first half of the question to the average pitch of the second half of the question:

Y/N falling, Wh rising
Backward rising

However, we realize that this is a rough measure of intonation movement through a question. For example, it is bound to be affected by where the focus is in a sentence. Early focus -> fall, late focus -> rise
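The sensitivity of DIFF to focus position can be illustrated with synthetic contours. The sketch assumes, purely for illustration, that focus is realized as a single bell-shaped pitch peak on a flat baseline; the peak height, width, and helper names are invented.

```python
import math
from statistics import mean
from typing import List, Sequence


def bump_contour(n: int, peak_pos: float, height_st: float = 4.0,
                 width: float = 0.15) -> List[float]:
    """Flat contour (in semitones) with one bell-shaped peak.

    peak_pos is the relative position of the peak, from 0 (start) to 1 (end).
    """
    return [
        height_st * math.exp(-((i / (n - 1) - peak_pos) ** 2) / (2 * width ** 2))
        for i in range(n)
    ]


def diff_measure(semitones: Sequence[float]) -> float:
    """Average pitch of the 2nd half minus average pitch of the 1st half."""
    half = len(semitones) // 2
    return mean(semitones[half:]) - mean(semitones[:half])


early = diff_measure(bump_contour(50, peak_pos=0.25))  # early focus
late = diff_measure(bump_contour(50, peak_pos=0.75))   # late focus
print(early, late)
```

The same peak yields a negative DIFF (apparent fall) when placed early and a positive DIFF (apparent rise) when placed late, even though nothing about the end of the contour differs in kind.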

As I mentioned earlier, we also used another measure to estimate the intonation movement within questions. This measure is based on Prosogram, a procedure available in Praat that estimates which pitch movements are perceptually relevant. This allowed us to inspect the perceptually relevant pitch movement at the very end of a question: movement within the last voiced segment, or between the second-to-last and the last segment.

When we examined whether the PROS measure of rising/falling pitch was dependent on any of the question types Q1-Q4, we found no significant dependencies. In other words, rises and falls (as measured by PROS) are, it seems, distributed randomly across the different question types.

Summary: Prosodic measures

Duration: Dependent on number of alts

Speech rate: Slow reflecting uncertainty?

Pitch variation: Reflecting attitude?

Rising/Falling intonation:
DIFF: Dependent on sentence focus?
PROS: Not dependent on question type