Sentence stress in presidential speeches

39th Annual Meeting of the DGfS

Workshop on Prosody in Syntactic Encoding

Saarbrücken, March 10, 2017

Arto Anttila, Timothy Dozat, Daniel Galbraith, and Naomi Shapiro

Why are sentences stressed the way they are?

We are gòing to begìn to áct, begìnning TODÁY.
(Ronald Reagan, Inaugural Address, January 20, 1981, Sentence 21)

Two kinds of sentence stress (Jespersen 1920: 212-222)

(a) Mechanical stress, rhythmic stress, “physiological” (rhythmischer Druck, Einheitsdruck)

(b) Meaningful stress, semantic stress, “psychological” (Wertdruck, Neuheitsdruck, Gegensatzdruck)

Semantic stress

In the Gilmore Girls universe, Luke and Lorelai seemed inevitable. He served the coffee; she needed the coffee. (Correction: NEEDED the coffee.)

Entertainment Weekly, November 17, 2016, http://www.ew.com/article/2016/11/17/gilmore-girls-luke-originally-woman

Mechanical stress

How much did they pay you for participating in the experiment?
Five francs.

(Ladd 1996: 166)

Semantic stress is related to new information

How is information packaged in the sentence?

(a) Evenly spread (Uniform Information Density) (Levy and Jaeger 2007; Jaeger 2010)

(b) Piles up towards the end (Communicative Dynamism) (Prague School, e.g., Firbas 1971)

(c) Seeks out stress peaks (Stress-Information Alignment) (Bolinger 1957, 1972; Calhoun 2010; Cohen Priva 2012)

(a) Uniform Information Density

(b) Communicative Dynamism

(c) Stress-Information Alignment

Bolinger (1957: 235)

“The recipe for reconciling the two functions [semantic and mechanical] is simple: the writer should make them coincide as nearly as he can by maneuvering the semantic heavy stress into the position of the mechanical loud stress; that is, toward the end.”

Stress-Information Alignment: A Proposal

(a) Phrasal stress is assigned by syntax. (Chomsky, Halle, and Lukoff 1956; Chomsky and Halle 1968; Liberman and Prince 1977; Cinque 1993)

(b) Information seeks out stress peaks, especially in good prose. (Bolinger 1957, 1972; Calhoun 2010; Cohen Priva 2012)

stress = metrical strength

Plan of work

1. Find a text performed by an individual (script + audio + video): the inaugural addresses of Carter (1977), Reagan (1981), Bush Sr. (1989), Clinton (1993), Bush Jr. (2001), and Obama (2009).

2. Assign mechanical stress to the text by computer. (MetricalTree, Dozat 2015-7)

3. Collect perceived stress judgments from native speakers. (MetricGold, Shapiro 2016-7)

4. Figure out to what extent perceived stress is explained by
(i) the mechanical stress contour
(ii) the distribution of information

Why is this interesting?

Sentence stress is difficult to pin down:

• not represented in writing
• hard to measure by phonometric methods

Yet it exists and is a hidden variable in many studies.

Understanding sentence stress may help solve other linguistic puzzles.

Preview of findings

1. Both kinds of stress matter, but not uniformly:
• Noun and adjective stresses tend to be loud and mechanical.
• Verb and function word stresses tend to be soft and semantic.

2. Stress levels vary significantly across parts of speech:

nouns > adjectives > verbs > function words

1. Predicting mechanical stress

Rules vs. variability

The Nuclear Stress Rule (NSR) and the Compound Stress Rule (CSR) (Chomsky and Halle 1968, Liberman and Prince 1977, Cinque 1993)

Sentence stress is variable. Why?
• Free will
• Ambiguity in lexical stress results in variation in phrasal stress:
  unstressed words (e.g., expletive it)
  stress-ambiguous words (e.g., in, into)
  stressed words (e.g., balloon)

Assumption: No variation in the phrasal stress rules themselves.

The stress rules (Chomsky and Halle 1968)

The Nuclear Stress Rule (NSR):
Assign [1 stress] to the rightmost vowel bearing the feature [1 stress]. Applies to phrases (NP, VP, AP, S).

The Compound Stress Rule (CSR):
Skip over the rightmost word and assign [1 stress] to the rightmost remaining [1 stress] vowel; if there is no [1 stress] to the left of the rightmost word, then try again without skipping the word. Applies to words (N, A, V).

Sample derivation

[[[John's] [[[black] [board]] [eraser]]] [was stolen]]

                  John's  black  board  eraser  was  stolen
Lexical stress      1       1      1      1            1
First cycle                 1      2
Second cycle                1      3      2
Third cycle         2       1      4      3
Final cycle         3       2      5      4            1

Liberman and Prince’s (1977) version

The rules are defined on local syntactic trees as follows:

In a configuration [A B],
• if the constituent is a phrase, B is strong (= NSR)
• if the constituent is a word, B is strong iff it branches (= CSR)

Syntax

• To assign phrasal stress we need a syntactic parse.
• We used the Stanford Parser (Chen and Manning 2014).
• http://nlp.stanford.edu/software/lex-parser.shtml

Lexical stress

(a) Unstressed words: it
Unstressed tags: CC, PRP$, TO, UH, DT
Unstressed deps: det, expl, cc, mark

(b) Ambiguous words: this, that, these, those
Ambiguous tags: MD, IN, PRP, WP$, PDT, WDT, WP, WRB
Ambiguous deps: cop, neg, aux, auxpass

(c) All other words, tags, and deps are stressed.
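A minimal sketch of this three-way classification follows. The word, tag, and dependency lists are copied from the slide; the order in which they are consulted is an assumption, since the slide does not state it. Here word-level entries take precedence over tags and dependencies (otherwise "this", tagged DT, could never be ambiguous), and unstressed is checked before ambiguous. The function name `classify` is hypothetical.

```python
UNSTRESSED_WORDS = {"it"}
UNSTRESSED_TAGS = {"CC", "PRP$", "TO", "UH", "DT"}
UNSTRESSED_DEPS = {"det", "expl", "cc", "mark"}

AMBIGUOUS_WORDS = {"this", "that", "these", "those"}
AMBIGUOUS_TAGS = {"MD", "IN", "PRP", "WP$", "PDT", "WDT", "WP", "WRB"}
AMBIGUOUS_DEPS = {"cop", "neg", "aux", "auxpass"}

def classify(word, tag, dep):
    """Return 'unstressed', 'ambiguous', or 'stressed' for one token.

    Assumed precedence: word lists first, then tags and dependencies.
    """
    w = word.lower()
    if w in UNSTRESSED_WORDS:
        return "unstressed"
    if w in AMBIGUOUS_WORDS:
        return "ambiguous"
    if tag in UNSTRESSED_TAGS or dep in UNSTRESSED_DEPS:
        return "unstressed"
    if tag in AMBIGUOUS_TAGS or dep in AMBIGUOUS_DEPS:
        return "ambiguous"
    return "stressed"

# The dependency labels below are illustrative, not parser output.
for token in [("the", "DT", "det"), ("this", "DT", "det"),
              ("with", "IN", "case"), ("majesty", "NN", "dobj")]:
    print(token[0], classify(*token))
```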

Phrasal stress

A sentence has 2^n stress paths, where n = the number of ambiguous words.

Example:

I ask you to share with me today the majesty of this moment
(Richard Nixon, Inaugural Address, January 20, 1969, Sentence 2)

Stress paths: 2^6 = 64
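The count can be checked by enumerating every way of resolving the ambiguous words. In the sketch below the per-word labels for the Nixon sentence are assigned by hand from the lexical stress lists (an assumption, not parser output), and `stress_paths` is a hypothetical helper name.

```python
from itertools import product

def stress_paths(labels):
    """Enumerate all stressed/unstressed assignments for a labeled sentence."""
    ambiguous = [i for i, label in enumerate(labels) if label == "ambiguous"]
    paths = []
    for choice in product([False, True], repeat=len(ambiguous)):
        path = [label == "stressed" for label in labels]
        for i, is_stressed in zip(ambiguous, choice):
            path[i] = is_stressed
        paths.append(path)
    return paths

# Hand-assigned labels: I, you, with, me, of, this are the six ambiguous words.
NIXON = [
    ("I", "ambiguous"), ("ask", "stressed"), ("you", "ambiguous"),
    ("to", "unstressed"), ("share", "stressed"), ("with", "ambiguous"),
    ("me", "ambiguous"), ("today", "stressed"), ("the", "unstressed"),
    ("majesty", "stressed"), ("of", "ambiguous"), ("this", "ambiguous"),
    ("moment", "stressed"),
]

paths = stress_paths([label for _, label in NIXON])
print(len(paths))
```

Because the number of paths doubles with each ambiguous word, exhaustive enumeration quickly becomes impractical for long sentences.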

Phrasal stress

Instead of examining all parses we limit ourselves to the following:

Model 1: All ambiguous words unstressed
Model 2: All monosyllabic ambiguous words unstressed; all polysyllabic ambiguous words stressed
Model 3: All ambiguous words stressed
Model 4: The ensemble model (= mean model)

the savings of many years in thousands of families are gone
(FDR, Inaugural Address, March 4, 1933, Sentence 19)
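The four models can be sketched as follows. Models 1 to 3 follow the definitions on the slide directly. For Model 4, the slide says only that it is the mean model; averaging per-word values across Models 1 to 3 is an assumption, and the 0/1 lexical decisions used here stand in for whatever stress contours the actual system averages. The function names and the toy word list are hypothetical.

```python
def lexically_stressed(word_class, n_syllables, model):
    """Resolve one word under Model 1, 2, or 3."""
    if word_class != "ambiguous":
        return word_class == "stressed"
    if model == 1:
        return False              # all ambiguous words unstressed
    if model == 2:
        return n_syllables > 1    # only polysyllabic ambiguous words stressed
    if model == 3:
        return True               # all ambiguous words stressed
    raise ValueError("model must be 1, 2, or 3")

def mean_contour(contours):
    """Model 4 (assumed): per-word mean across the other models' contours."""
    return [sum(values) / len(values) for values in zip(*contours)]

# Toy input: (word, class, syllable count), using the slide's example words.
words = [("into", "ambiguous", 2), ("in", "ambiguous", 1),
         ("balloon", "stressed", 2), ("it", "unstressed", 1)]

contours = [
    [int(lexically_stressed(cls, syl, model)) for _, cls, syl in words]
    for model in (1, 2, 3)
]
ensemble = mean_contour(contours)
print(ensemble)
```

Under this reading, a word that is stressed in only some models receives an intermediate value, which is one way to represent gradient stress without enumerating all 2^n paths.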

2. Perceived stress

What is perceived stress?

Perceived stress = syllable prominence felt by a native speaker

Syllable prominence is “for the large part the work of the perceiver, generating his internal accent pattern on the basis of a strategy by which he assigns structures to the utterances. These structures, however, are not fabrications of the mind only, for they can be related to sound cues.”

(van Katwijk 1974: 5, cited in Baart 1987: 4)

No attempt to eliminate variation

• Two native speakers may perceive the same prominence contour differently: transcriptions reflect the grammar of the annotators.

• Variation is not noise, but data. We did not attempt to eliminate variation from transcriptions, beyond loose annotation guidelines.

• Interannotator reliability is good (Cronbach’s alpha = 0.85).
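For readers unfamiliar with the reliability measure, here is a minimal sketch of Cronbach's alpha with annotators treated as items and words as cases. It uses the standard formula alpha = k/(k-1) * (1 - sum of per-annotator variances / variance of summed scores). The ratings below are made-up toy values, not data from the study, and the slides do not say how the reported figure was computed.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """ratings: one list of scores per annotator, aligned word by word."""
    k = len(ratings)
    item_variances = sum(variance(r) for r in ratings)
    totals = [sum(scores) for scores in zip(*ratings)]
    return k / (k - 1) * (1 - item_variances / variance(totals))

# Toy stress judgments for four words from two hypothetical annotators.
annotator_1 = [1, 2, 3, 4]
annotator_2 = [2, 1, 4, 3]

print(cronbach_alpha([annotator_1, annotator_2]))
```

Identical annotations give an alpha of 1; the value drops as the annotators' contours diverge.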

The Metric Gold annotation interface

Predicted stress (the mean model): “We are going to begin to act, beginning today” (Reagan 1981)

Perceived stress (Annotator 1): “We are going to begin to act, beginning today” (Reagan 1981)

Predicted vs. perceived stress (Annotator 1)

Predicted vs. perceived stress (Annotator 2)

The information-theoretic view

“The error of attributing to syntax what belongs to semantics comes from concentrating on the commonplace. In phrases like bóoks to write, wórk to do, clóthes to wear, fóod to eat, léssons to learn, gróceries to get - as they occur in most contexts - the verb is highly predictable: food is to eat, clothes are to wear, work is to do, lessons are to learn. Less predictable verbs are less likely to be de-accented - where one has léssons to learn, one will probably have pássages to mémorize. It is only incidental that the syntax favors one or the other accent pattern.”

(Bolinger 1972: 634)

Approximating the information of a word

doc.freq       Document lexical frequency
d.cp.1         Document conditional probability (unigram)
d.cp.2         Document conditional probability (bigram)
d.cp.3         Document conditional probability (trigram)
d.inform.2     Document informativity (bigram)
d.inform.3     Document informativity (trigram)

corpus.freq    Corpus lexical frequency
c.cp.1         Corpus conditional probability (unigram)
c.cp.2         Corpus conditional probability (bigram)
c.cp.3         Corpus conditional probability (trigram)
c.inform.2     Corpus informativity (bigram)
c.inform.3     Corpus informativity (trigram)
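The slides do not define these measures, so the following is a sketch under assumptions: bigram conditional probability is taken to be P(word | preceding word), and bigram informativity is taken to be the word's average surprisal, -log2 P(word | preceding word), over all the contexts it occurs in (the definition commonly associated with Cohen Priva's work). The function name and toy token list are hypothetical.

```python
import math
from collections import Counter, defaultdict

def bigram_informativity(tokens):
    """Average surprisal (in bits) of each word given the preceding word."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter(tokens[:-1])  # how often each word serves as context
    surprisal_sum = defaultdict(float)
    occurrences = Counter()
    for (prev, word), count in bigrams.items():
        p = count / context_totals[prev]           # P(word | prev)
        surprisal_sum[word] += count * -math.log2(p)
        occurrences[word] += count
    return {w: surprisal_sum[w] / occurrences[w] for w in occurrences}

# Toy token sequence, not corpus data.
tokens = "a b a b a c".split()
informativity = bigram_informativity(tokens)
print(informativity)
```

In the toy sequence, "a" is fully predictable after "b" and gets informativity 0, while "c" follows "a" only one time in three and gets log2(3) bits. The first token has no preceding context and is not counted.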

Perceived stress vs. corpus frequency (Annotator 1)

Perceived stress vs. corpus frequency (Annotator 2)

Perceived stress vs. bigram informativity (Annotator 1)

Perceived stress vs. bigram informativity (Annotator 2)

Informativity vs. linear position (Prague school)
(Pearson correlation = 0.02, p = 0.01643)

Information vs. predicted stress (Bolinger 1957)
(Pearson correlation = 0.40, p < 2.2e-16)
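For reference, a Pearson correlation of the kind reported above can be computed as in this sketch. The numbers are made-up toy values for illustration, not the study's data, and the p-values on the slide are not reproduced here.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Toy per-word values: informativity and a stress score.
toy_informativity = [1.0, 2.0, 3.0, 4.0]
toy_stress = [2.0, 1.0, 4.0, 3.0]

print(pearson(toy_informativity, toy_stress))
```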

Bolinger correlations (stress vs. information)

             Perceived (A1)   Perceived (A2)   Predicted
Bush, Jr.    -0.5989862       -0.5461876       -0.4721894
Bush, Sr.    -0.5623127       -0.5184216       -0.4511763
Carter       -0.5693488       -0.5368974       -0.4670454
Clinton      -0.5652638       -0.5078555       -0.4798756
Obama        -0.5259391       -0.5275555       -0.470289
Reagan       -0.5326005       -0.5143632       -0.4535813

(In the original slide, green marks the highest score and red the lowest.)

Regression modeling: Predicting perceived stress

Favors:
• high bigram informativity
• high mechanical stress
• being a noun
• late sentence position (but this depends on the annotator)

Disfavors:
• being a verb
• being a function word

Modeling perceived stress (Annotator 1)

lm(formula = annotator1.log ~ c.inform.2 + mmean + category + widx,
    data = all.presidents.data.core)

Residuals:

Min 1Q Median 3Q Max

-1.4781 -0.3108 -0.0468 0.2918 1.5699

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.7171844 0.0232962 30.786 < 2e-16 ***

c.inform.2 0.0647866 0.0021650 29.924 < 2e-16 ***

mmean 0.0634744 0.0049675 12.778 < 2e-16 ***

categoryFUNC -0.4661109 0.0168874 -27.601 < 2e-16 ***

categoryNOUN 0.1014131 0.0155456 6.524 7.17e-11 ***

categoryVERB -0.1791090 0.0162495 -11.022 < 2e-16 ***

widx 0.0015645 0.0004046 3.867 0.000111 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3911 on 10945 degrees of freedom

(30 observations deleted due to missingness)

Multiple R-squared: 0.5026, Adjusted R-squared: 0.5023

F-statistic: 1843 on 6 and 10945 DF, p-value: < 2.2e-16

Modeling perceived stress (Annotator 2)

lm(formula = annotator2.log ~ c.inform.2 + mmean + category + widx,
    data = all.presidents.data.core)

Residuals:

Min 1Q Median 3Q Max

-1.2963 -0.2193 -0.1133 0.2089 1.5097

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.5530634 0.0219427 25.205 < 2e-16 ***

c.inform.2 0.0695084 0.0020395 34.082 < 2e-16 ***

mmean 0.0346362 0.0046788 7.403 1.43e-13 ***

categoryFUNC -0.5283504 0.0158994 -33.231 < 2e-16 ***

categoryNOUN 0.0695119 0.0146388 4.748 2.08e-06 ***

categoryVERB -0.1949493 0.0153017 -12.740 < 2e-16 ***

widx 0.0004550 0.0003811 1.194 0.233

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3685 on 10946 degrees of freedom

(29 observations deleted due to missingness)

Multiple R-squared: 0.5382, Adjusted R-squared: 0.5379

F-statistic: 2126 on 6 and 10946 DF, p-value: < 2.2e-16

The relative importance of predictors of perceived stress

The POS effect: Perceived stress vs. informativity

“confronting” vs. “have”

We must show courage in a time of blessing by confronting problems instead of passing them on to future generations. (stress = 5)

Can we solve the problems confronting us? (stress = 3)

We don't have to talk late into the night about which form of government is better. (stress = 5)

We have every right to dream heroic dreams. (stress = 0) (There are 51 examples like this in the corpus.)

inform(confronting) = 6.054915
inform(have) = 3.809777

The POS effect: Perceived stress vs. corpus frequency

Mean stresses for different parts of speech

Partial effects: Perceived stress in nouns

Partial effects: Perceived stress in verbs

3. General implications

Lexical frequency effects

Observation: High-frequency words reduce, low-frequency words don’t.

Conjecture:
• Low-frequency words (especially nouns) are high in information and tend to occur in nuclear stress positions.
• Hence they get high levels of phrasal stress.
• Stress prevents reduction.
• Hence low-frequency words (especially nouns) resist reduction.

If this is correct, lexical frequency effects reflect crystallized phrasal stress (cf. Coetzee and Kawahara 2013).

Lexical category effects

There are accentual differences among parts of speech (Ladd 1996). Here’s one “accentability hierarchy”:

nouns > other lexical words > function words

Accent rule:
• Accent is placed on the most accentable element of the focused constituent.
• If two elements belong to the same category, the one further to the right in the sentence is more accentable.

Two accentability hierarchies (cited in Baart 1987)

• command verbs > quantifiers > nouns > sentence adverbs > adjectives > main verbs > negatives > pronouns > auxiliary verbs > copulatives > relatives > possessive determiners > prepositions > conjunctions > articles (Lea 1979)

• sentential adverbs > negatives > dummy auxiliaries in positive sentences > quantifiers > certain modals > adjectives > regular adverbs > nouns > negative contractions > verbs > demonstrative pronouns > prepositions > auxiliaries > articles (O'Shaughnessy and Allen 1983)

What on earth are these hierarchies?

• Are they primitives of grammar?

• Perhaps they reflect the typical distribution of mechanical stress: nuclear stress falls typically on nouns, less typically on verbs, and least typically on function words.

• Conjecture: Accentability hierarchies derive from sentence stress.

More lexical category effects

• A phonological “privilege scale” N > A > V manifests itself in segmental phonology and is near-universal (Smith 2011).

• Mean word lengths in the presidents corpus:

              SEGMENTS   SYLLABLES   FEET
Nouns:        5.641571   2.115994    1.098343
Adjectives:   5.153664   1.973995    1.056738
Verbs:        3.862823   1.40507     1.027336

• Smith (1997) proposes special “noun faithfulness” constraints.

Recall Bolinger’s proposal: Accent goes by information

(a) I have LESSONS to learn.
I have PASSAGES to MEMORIZE.

(b) Those are CRAWLING things.
Those are CRAWLING INSECTS.

(c) I've got to SEE a guy.
I've got to SEE a DOCTOR.

But the verbs also differ phonologically (monosyllable vs. polysyllable).

Number of segments: Nouns

Number of segments: Verbs

More lexical category effects: Foot structure

• Nouns tend to be exhaustively footed; adjectives and verbs variably so; function words tend to be extrametrical.

Finnish:

/tavara-i-ta/ (tá.va)(ròi.ta) ‘thing-PL-PAR’ noun: vowel and consonant preserved

/avara-i-ta/ (á.va)ri.a ‘wide-PL-PAR’ adjective: vowel and consonant deleted

4. Summary

Summary of results

1. Both mechanical and semantic stress are real:
• Noun and adjective stresses tend to be loud and mechanical.
• Verb and function word stresses tend to be soft and semantic.

2. Stress levels vary significantly across parts of speech:

nouns > adjectives > verbs > function words

3. Speculation: Lexical frequency and lexical category effects reflect the differential distribution of sentence stress.

References (1)

Baart, Joan. 1987. Focus, Syntax, and Accent Placement. Ph.D. dissertation, University of Leiden.

Bolinger, Dwight L. 1957. Maneuvering for Stress and Intonation. College Composition and Communication 8(4), 234-238.

Bolinger, Dwight L. 1972. Accent is predictable (if you are a mind reader). Language 48, 633-644.

Bybee, Joan L. 2001. Phonology and Language Use. Cambridge University Press, Cambridge, U.K.

Calhoun, Sasha. 2010. How does informativeness affect prosodic prominence? Language and Cognitive Processes 25(7-9), 1099-1140.

Chen, Danqi and Christopher D. Manning. 2014. A Fast and Accurate Dependency Parser using Neural Networks. Proceedings of EMNLP 2014.

Cinque, Guglielmo. 1993. A null-theory of phrase and compound stress. Linguistic Inquiry 24, 239-298.

References (2)

Chomsky, Noam, Morris Halle and Fred Lukoff. 1956. On accent and juncture in English. In M. Halle et al. (eds.), For Roman Jakobson: Essays on the occasion of his sixtieth birthday, Mouton & Co., The Hague, pp. 65-80.

Chomsky, Noam and Morris Halle. 1968. The Sound Pattern of English. Harper and Row, New York.

Coetzee, Andries and Shigeto Kawahara. 2013. Frequency biases in phonological variation. Natural Language and Linguistic Theory 31, 47-89.

Cohen Priva, Uriel. 2012. Sign and signal: Deriving linguistic generalizations from information utility. Unpublished doctoral dissertation, Stanford University.

Firbas, Jan. 1971. On the concept of communicative dynamism in the theory of functional sentence perspective. Sborník Prací Filosofické Fakulty Brněnské University (Studia Minora Facultatis Philosophicae Universitatis Brunensis) A-19, 135-144.

Gussenhoven, Carlos. 1983. Focus, mode and the nucleus. Journal of Linguistics 19(2), 377-417.

Jaeger, T. Florian. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology 61(1), 23-62.

References (3)

Jespersen, Otto. 1920. Lehrbuch der Phonetik: Mit 2 Tafeln. BG Teubner.

Ladd, D. Robert. 1996. Intonational Phonology. Cambridge University Press.

Levy, Roger P. and T. Florian Jaeger. 2007. Speakers optimize information density through syntactic reduction. In Proceedings of the 20th Conference on Advances in Neural Information Processing Systems (NIPS 2007), ed. J. C. Platt, D. Koller, Y. Singer, S. T. Roweis, pp. 849-56. Curran Assoc., Red Hook, N.Y.

Liberman, Mark, and Alan Prince. 1977. On stress and linguistic rhythm. Linguistic Inquiry 8(2), 249-336.

Smith, Jennifer L. 1997. Noun faithfulness: On the privileged behavior of nouns in phonology. Rutgers Optimality Archive, http://ruccs.rutgers.edu/roa.

Smith, Jennifer L. 2011. Category-specific effects. The Blackwell Companion to Phonology 4, 2439-2463.

Credits

Thanks for collaboration:
• Alex Wade

Thanks for funding:
• Stanford University, Office of the Vice-Provost for Undergraduate Education
• The Roberta Bowman Denning Initiative Committee, H&S Dean’s Office

Thanks for advice, comments, suggestions, and criticisms:
• Jared Bernstein, Joan Bresnan, Penny Eckert, Ryan Heuser, Paul Kiparsky, Mark Liberman
