


Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground, ACL 2010, pages 61–69, Uppsala, Sweden, 16 July 2010. © 2010 Association for Computational Linguistics

No sentence is too confusing to ignore

Paul Cook
Department of Computer Science
University of Toronto
Toronto, Canada
[email protected]

Suzanne Stevenson
Department of Computer Science
University of Toronto
Toronto, Canada
[email protected]

Abstract

We consider sentences of the form No X is too Y to Z, in which X is a noun phrase, Y is an adjective phrase, and Z is a verb phrase. Such constructions are ambiguous, with two possible (and opposite!) interpretations, roughly meaning either that “Every X Zs”, or that “No X Zs”. The interpretations have been noted to depend on semantic and pragmatic factors. We show here that automatic disambiguation of this pragmatically complex construction can be largely achieved by using features of the lexical semantic properties of the verb (i.e., Z) participating in the construction. We discuss our experimental findings in the context of construction grammar, which suggests a possible account of this phenomenon.

1 No noun is too adjective to verb

Consider the following two sentences:

(1) No interest is too narrow to deserve its own newsletter.

(2) No item is too minor to escape his attention.

Each of these sentences has the form of No X is too Y to Z, where X, Y, and Z are a noun phrase, adjective phrase, and verb phrase, respectively. Sentence (1) is generally taken to mean that every interest deserves its own newsletter, regardless of how narrow it is. On the other hand, (2) is typically interpreted as meaning that no item escapes his attention, regardless of how minor it is. That is, sentences with the identical form of No X is too Y to Z either can mean that “every X Zs”, or can mean the opposite—that “no X Zs”!1

1 Note that in examples (1) and (2), the nouns interest and item are the subjects of the verbs deserve and escape, respectively. In this construction the noun can also be the object of the verb, as in the title of this paper, which claims no sentence can/should be ignored.

This “verbal illusion” (Wason and Reich, 1979), so-called because there are two opposite interpretations for the very same structure, is of interest to us for two reasons. First, the contradictory nature of the possible meanings has been explained in terms of pragmatic factors concerning the relevant presuppositions of the sentences. According to Wason and Reich (1979) (as explained in more detail below), sentences such as (2) are actually nonsensical, but people coerce them into a sensible reading by reversing the interpretation. One of our goals in this work is to explore whether computational linguistic techniques—specifically automatic corpus analysis drawing on lexical resources—can help to elucidate the factors influencing interpretation of such sentences across a collection of actual usages.

The second reason for our interest in this construction is that it illustrates a complex ambiguity that can cause difficulty for natural language processing applications that seek to semantically interpret text. Faced with the above two sentences, a parsing system (in the absence of specific knowledge of this construction) will presumably find the exact same structure for each, giving no basis on which to determine the correct meaning from the parse. (Unsurprisingly, when we run the C&C Parser (Curran et al., 2007) on (1) and (2) it assigns the same structure to each sentence.) Our second goal in this work is thus to explore whether increased linguistic understanding of this phenomenon could be used to disambiguate such examples automatically. Specifically, we use this construction as an example of the kind of difficulties faced in semantic interpretation when meaning may be determined by pragmatic or other extra-syntactic factors, in order to explore whether lexical semantic features can be used as cues to resolving pragmatic ambiguity when a complex semantico-pragmatic model is not feasible.

In the remainder of this paper, we present the first computational study of the No X is too Y to Z phenomenon, which attempts to automatically determine the meaning of instances of this semantically and pragmatically complex construction. In Section 2 we present previous analyses of this construction, and our hypothesis. In Section 3, we describe the creation of a dataset of instances that verifies that both interpretations (“every” and “no”) indeed occur in corpora. We then analyze the human annotations in this dataset in more detail in Section 4. In Section 5, we present the feature model we use to describe the instances, which taps into the lexical semantics and polarity of the constituents. In Section 6, we describe machine learning experiments and classification results that support our hypothesis that the interpretation of this construction largely depends on the semantics of its component verb. In Section 7 we suggest that our results support an analysis of this phenomenon within construction grammar, and point to some future directions in our research in Section 8.

2 Background and our proposal

The No X is too Y to Z construction was investigated by Wason and Reich (1979), and discussed more recently by Pullum (2004) and Liberman (2009a,b). Here we highlight some of the most important properties of this complex phenomenon. Our presentation owes much to the lucid discussion and clarification of this topic, and of the work of Wason and Reich specifically, by Liberman.

Wason and Reich argue that the compositional interpretation of sentences of the form of (1) and (2) is “every X Zs”. Intuitively, this can be understood by considering a sentence identical to sentence (1), but without a negative subject: This interest is too narrow to deserve its own newsletter, which means that “this interest is so narrow that it does not deserve a newsletter”. This example indicates that the meaning of too narrow to deserve its own newsletter is “so narrow that it does not deserve a newsletter”. When this negative “too” assertion is compositionally combined with the No interest subject of sentence (1), it results in a meaning with two negatives: “No interest is so narrow that it does not deserve a newsletter”, or simply, “Every interest deserves a newsletter”. Wason and Reich note that in sentences such as (1), the compositional “every” interpretation is consistent with common beliefs about the world, and thus refer to such sentences as “pragmatic”.

By contrast, the compositional interpretation of sentences such as (2) does not correspond to our common sense beliefs. Consider an analogous (non-negative subject) sentence to sentence (2)—i.e., This item is too minor to escape his attention. It is nonsensical that “This item is so minor that it does not escape his attention”, since being more “minor” entails more likelihood of escaping attention, not less. The compositional interpretation of (2) is similarly nonsensical—i.e., that “No item is so minor that it does not escape his attention”. Such sentences are thus termed “non-pragmatic” by Wason and Reich, who argue that the complexity of the non-pragmatic sentences—arising in part due to the number of negations they contain—causes the listener or reader to misconstrue them. According to their reasoning, listeners choose an interpretation that is consistent with their beliefs about the world—namely that “no X Zs”, in this case that “No item escapes his attention”—instead of the compositional interpretation (“Every item escapes his attention”).

While Wason and Reich focus on the compositional semantics and pragmatics of these sentences, they also note that the non-pragmatic examples typically use a verb that itself has some aspect of negation, such as ignore, miss, and overlook. This property is also pointed out by Pullum (2004), who notes that avoid in his example of the construction means “manage to not do” something. Building on this observation, we hypothesize that lexical properties of the component constituents of this construction, particularly the verb, can be important cues to its semantico-pragmatic interpretation. Specifically, we hypothesize that the pragmatic (“every” interpretation) and non-pragmatic (“no” interpretation) sentences will tend to involve verbs with different semantics. Given that verbs of different semantic classes have different selectional preferences, we also expect to see the “every” and “no” sentences associated with semantically different nouns and adjectives.

3 Dataset

3.1 Extraction

To create a dataset of usages of the construction no NP is too AP to VP—referred to as the target construction—we use two corpora: the British National Corpus (Burnard, 2000), an approximately one hundred million word corpus of late-twentieth century British English, and The New York Times Annotated Corpus (Sandhaus, 2008), approximately one billion words of non-newswire text from the New York Times from the years 1987–2006. We extract all sentences in these corpora containing the sequence of strings no, is too, and to separated by one or more words. We then manually filter all sentences that do not have no NP as the subject of is too, or that do not have to VP as an argument of is too. After removing duplicates, this results in 170 sentences. We randomly select 20 of these sentences for development data, leaving 150 sentences for testing.
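The lexical pattern described above can be sketched as a regular expression (a hypothetical reconstruction, not the authors' code; the manual filtering step is what removes matches where no NP is not the subject of is too):

```python
import re

# Hypothetical reconstruction of the extraction pattern: the strings
# "no", "is too", and "to", each separated by one or more words.
PATTERN = re.compile(
    r"\bno\s+(?:\S+\s+)+is\s+too\s+(?:\S+\s+)+to\b",
    re.IGNORECASE,
)

def matches_pattern(sentence):
    """True if the sentence contains the loose lexical pattern; manual
    filtering for subjecthood of "no NP" would follow."""
    return PATTERN.search(sentence) is not None
```

Note that the pattern is deliberately loose; precision comes from the manual filtering stage rather than from the regular expression itself.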

Although we find only 170 examples of the target construction in 1.1 billion words of text, note that our extraction process is quite strict and misses some relevant usages. For example, we do not extract sentences of the form Nothing is too Y to Z in which the subject NP does not contain the word no. Nor do we extract usages of the related construction No X is too Y for Z, where Z is an NP related to a verb, as in No interest is too narrow for attention. (We would only extract the latter if there were an infinitive verb embedded in or following the NP.) In the present study we limit our consideration to sentences of the form discussed by Wason and Reich (1979), but intend to consider related constructions such as these—which appear to exhibit the same ambiguity as the target construction—in the future.

We next manually identify the noun, adjective, and verb that participate in the target construction in each sentence. Although this could be done automatically using a parser (e.g., Collins, 2003) or chunker (e.g., Abney, 1991), here we want to ensure error-free identification. We also note a number of sentences containing co-ordination, such as in the following example.

(3) These days, no topic is too recent or specialized to disqualify it from museum apotheosis.

This sentence contains two instances of the target construction: one corresponding to the noun-adjective-verb triple topic, recent, disqualify, and the other to the triple topic, specialized, disqualify. In general, we consider each unique noun-adjective-verb triple participating in the target construction as a separate instance.
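Instance creation from a coordinated sentence amounts to a Cartesian product over the coordinated constituents (a hypothetical helper; in the paper the constituents themselves are identified manually):

```python
from itertools import product

def expand_instances(nouns, adjectives, verbs):
    """Each unique noun-adjective-verb combination participating in the
    target construction counts as one instance (cf. example (3))."""
    return list(product(nouns, adjectives, verbs))
```

For example (3), expand_instances(["topic"], ["recent", "specialized"], ["disqualify"]) yields the two triples described above.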

3.2 Annotation

We used Amazon Mechanical Turk (AMT, https://www.mturk.com/) to obtain judgements as to the correct interpretation of each instance of the target construction in both the development and testing datasets. For each instance, we generated two paraphrases, one corresponding to each of the interpretations discussed in Section 1. We then presented the given instance of the target construction along with its two paraphrases to annotators through AMT, as shown in Table 1. In generating the paraphrases, one of the authors selected the most appropriate paraphrase, in their judgement, where can in the paraphrases in Table 1 was selected from can, should, will, and ∅. Note that the paraphrases do not contain the adjective from the target construction. In the case of multiple instances of the target construction with differing adjectives but the same noun and verb, we only solicited judgements for one instance, and used these judgements for the other instances. In our dataset we observe that all instances obtained from the same sentence which differ only with respect to their noun or verb have the same interpretation. We therefore believe that instances with the same noun and verb but a different adjective are unlikely to differ in their interpretation.

Instructions:

• Read the sentence below.

• Based on your interpretation of that sentence, select the answer that most closely matches your interpretation.

• Select “I don’t know” if neither answer is close to your interpretation, or if you are really unsure.

That success was accomplished in large part to tight control on costs, and no cost is too small to be scrutinized.

• Every cost can be scrutinized.

• No cost can be scrutinized.

• I don’t know.

Enter any feedback you have about this HIT. We greatly appreciate you taking the time to do so.

Table 1: A sample of the Amazon Mechanical Turk annotation task.


We also allowed the judges to optionally enter any feedback about the annotation task, which in some cases—discussed in the following section—was useful in determining whether the judges found a particular instance difficult to annotate.2

For each instance of the target construction we obtained three judgements from unique workers on AMT. For approximately 80% of the items, the judgements were unanimous. In the remaining cases we solicited four additional judgements, and used the majority judgement. We paid $0.05 per judgement; the average time spent on each annotation was approximately twenty seconds, resulting in an average hourly wage of about $10.
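The adjudication scheme just described can be sketched as follows (a hypothetical helper; the label names are illustrative):

```python
from collections import Counter

def adjudicate(first_round, extra_round):
    """Accept a unanimous first round of three judgements; otherwise pool
    the three original and four additional judgements and take the
    majority label."""
    if len(set(first_round)) == 1:
        return first_round[0]
    return Counter(first_round + extra_round).most_common(1)[0][0]
```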

The development data was also annotated by three native English speaking experts (computational linguists with extensive linguistic background, two of whom are also authors of this paper). The inter-annotator agreement among these judges is very high, with pairwise observed agreements of 1.00, 0.90, and 0.90, and corresponding unweighted Kappa scores of 1.00, 0.79, and 0.79. The majority judgements of these annotators are the same as those obtained from AMT on the development data, giving us confidence in the reliability of the AMT judgements. These findings are consistent with those of Snow et al. (2008) in showing that AMT judgements can be as reliable as those of expert judges.
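Unweighted (Cohen's) kappa corrects observed agreement for the chance agreement implied by each annotator's label distribution; a minimal sketch for one annotator pair:

```python
def cohens_kappa(pairs):
    """Unweighted Cohen's kappa over (annotator_a, annotator_b) label
    pairs: (p_o - p_e) / (1 - p_e), where p_e is the chance agreement
    implied by the two annotators' marginal label frequencies."""
    n = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n
    labels = {label for pair in pairs for label in pair}
    p_e = sum(
        (sum(a == l for a, _ in pairs) / n) * (sum(b == l for _, b in pairs) / n)
        for l in labels
    )
    return (p_o - p_e) / (1 - p_e)
```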

Finally, we remove a small number of items from the testing dataset which were difficult to paraphrase due to ellipsis of the verb participating in the target construction, or an extra negation in the verb phrase. We further remove one sentence because we believe the paraphrases we provided are in fact misleading. The number of sentences and of instances (i.e., noun-verb-adjective triples) of the target construction in the development and testing datasets is given in Table 2. 160 of the 199 testing instances (80%) have the “every” interpretation, with the remainder having the “no” interpretation.

4 Analysis of annotation

We now more closely examine the annotations obtained from AMT to better determine the extent to which they are reliable. We also consider specific instances of the target construction that are judged inconsistently to establish some of the causes of disagreement.

Dataset       # sentences   # instances
Development   20            33
Test          140           199

Table 2: The number of sentences containing the target construction, and the number of resulting instances.

2 In other cases the comments were more humorous. In response to the following sentence If you’ve ever yearned to live on Sesame Street, where no problem is too big to be solved by a not-too-big slice of strawberry-rhubarb pie, this is the spot for you, one judge told us her preferred types of pie.

One of the three experts who annotated the development items (discussed in Section 3.2) also annotated twenty items selected at random from the testing data. In this case two instances are judged differently than the majority judgement obtained from AMT. These instances are given below with the noun, adjective and verb in the target construction underlined.

(4) When it comes to the clash of candidates on national television, no detail, it seems, is too minor for negotiation, no risk too small to eliminate.

(5) Lectures by big-name Wall Street felons will show why no swindler is too big to beat the rap by peaching on small-timers.

For sentence (4), the AMT judgements were unanimously for the “no” interpretation whereas the expert annotator chose the “every” interpretation. We are uncertain as to the reason for this disagreement, but are convinced that the “every” interpretation is the intended one.

In the case of sentence (5), the AMT judgements were split four–three for the “every” and “no” interpretations, respectively, while the expert annotator chose the “no” interpretation. For this sentence the provided paraphrases were Every swindler can beat the rap and No swindler can beat the rap. If attention in the sentence is restricted to the target construction—i.e., no swindler is too big to beat the rap by peaching on small-timers—either of the “no” and “every” interpretations is possible. That is, this clause alone can mean that “no swindler is ‘big’ enough to be able to beat the rap” (the “no” interpretation), or that “no swindler is ‘big’ enough that they are above peaching on small-timers” (or in other words, “every swindler is able to beat the rap by peaching on small-timers”, the “every” interpretation). However, the intention of the sentence as the “no” interpretation is clear from the referral in the main clause to big-name Wall Street felons, which implies that “big” swindlers have not beaten the rap. Since the AMT annotators may not be devoting a large amount of attention to the task, they may focus only on the target construction and not the preliminary disambiguating material. In this event, they may be choosing between the “every” and “no” interpretations based on how cynical they are of the ability (or lack thereof) of the American legal system to punish Wall Street criminals.

We also examine a small number of examples in the testing set which do not receive a clear majority judgement from AMT. For this analysis we consider items for which the difference in the number of judgements for each of the “every” and the “no” interpretations is one or less. This gives four instances of the target construction, one of which we have already discussed above, example (5); the others are presented below, again with the noun, adjective, and verb participating in the target construction underlined:

(6) Where are our priorities when we so carefully weigh costs and medical efficacy in deciding to offer a medical lifeline to the elderly, yet no amount of money is too great to spend on the debatable paths we’ve taken in our war against terror?

(7) No neighborhood is too remote to diminish Mr. Levine’s determination to discover and announce some previously unheralded treat.

(8) No one is too remote anymore to be concerned about style, Ms. Hansen suggested.

In example (6) the author is using the target construction to express somebody else’s viewpoint that “any amount should be spent on the war against terror”. Therefore the literal reading of the target construction appears to be the “every” interpretation. However, this construction is being used rhetorically (as part of the overall sentence) to express the author’s belief that “too much money is being spent on the war against terror”, which is close in meaning to the “no” interpretation. It appears that the annotators are split between these two readings. For sentence (7) the atypicality of neighbourhood as the subject of diminish may make this instance particularly difficult for the judges. Sentence (8) appears to us to be a clear example of the “every” interpretation. The paraphrases for this usage are “Everyone should be concerned about style” and “No one should be concerned about style”. In this case it is possible that the judges are biased by their beliefs about whether one should be concerned about style, and that this is giving rise to the lack of agreement. These examples illustrate that some of these usages are clearly complex for people to annotate. Such complex examples may require more context to be annotated with confidence.

5 Model

To test our hypothesis that the interaction of the semantics of the noun, adjective, and verb in the target construction contributes to its pragmatic interpretation, we represent each instance in our dataset as a vector of features that capture aspects of the semantics of its component words.

WordNet: To tap into general lexical semantic properties of the words in the construction, we use features that draw on the semantic classes of words in WordNet (Fellbaum, 1998). These binary features each represent a synset in WordNet, and are turned on or off for the component words (the noun, adjective, and verb) in each instance of the target construction. A synset feature is on for a word if the synset occurs on the path from all senses of the word to the root, and off otherwise. We use WordNet version 3.0 accessed using NLTK version 2.0 (Bird et al., 2009).
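The binary synset features can be sketched with a toy hypernym hierarchy standing in for WordNet (the words, paths, and synset inventory below are illustrative, and we read “the path from all senses” as the union of the hypernym paths of the word’s senses):

```python
# Toy hypernym paths standing in for WordNet's hypernym_paths(); each
# inner list runs from a sense of the word up to a root synset.
HYPERNYM_PATHS = {
    "escape": [["escape.v.01", "get_away.v.01", "travel.v.01"]],
    "deserve": [["deserve.v.01", "be.v.01"]],
}

def synset_features(word, synset_inventory):
    """One binary feature per synset in the inventory: on iff the synset
    lies on a hypernym path of some sense of the word."""
    on = set()
    for path in HYPERNYM_PATHS.get(word, []):
        on.update(path)
    return {s: s in on for s in synset_inventory}
```

In the actual experiments these paths would come from WordNet via NLTK (wn.synsets(word) and Synset.hypernym_paths()) rather than a hand-built dictionary.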

Polarity: Because of the observation that the verb in the target construction, in particular, has some property of negativity in the “no” interpretation, we also use features representing the semantic polarity of the noun, adjective, and verb in each instance. The features are ternary, representing positive, neutral, or negative polarity. We obtain polarity information from the subjectivity lexicon provided by Wilson et al. (2005), and consider words to be neutral if they have both positive and negative polarity, or are not in the lexicon.
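The polarity features can be sketched as follows; the mini-lexicon here is a hypothetical stand-in for the Wilson et al. (2005) subjectivity lexicon:

```python
# Hypothetical entries standing in for the subjectivity lexicon; a word
# may carry one or both prior polarities.
LEXICON = {
    "ignore": {"negative"},
    "deserve": {"positive"},
    "great": {"positive", "negative"},  # both polarities -> neutral
}

def polarity_feature(word):
    """Ternary feature: out-of-lexicon words and words listed with both
    polarities are treated as neutral."""
    tags = LEXICON.get(word)
    if not tags or tags == {"positive", "negative"}:
        return "neutral"
    return next(iter(tags))
```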

6 Experimental results

6.1 Experimental setup

To evaluate our model we conduct a 5-fold cross-validation experiment using the items in the testing dataset. When partitioning the items in the testing dataset into the five parts necessary for the cross-validation experiment, we ensure that all the instances of the target construction from a single sentence are in the same part. This ensures that no instance used for training is from the same sentence as an instance used for testing. We further ensure that the proportion of items in each class is roughly the same in each split.
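The sentence-grouping constraint can be sketched as follows (a hypothetical helper, not the authors' code; their splits additionally kept the class proportions roughly equal across folds):

```python
from collections import defaultdict

def sentence_grouped_folds(instances, k=5):
    """Assign whole sentences to folds round-robin, so that no sentence
    contributes instances to both the training and test portions.
    Each instance is a (sentence_id, triple) pair."""
    by_sentence = defaultdict(list)
    for sentence_id, triple in instances:
        by_sentence[sentence_id].append((sentence_id, triple))
    folds = [[] for _ in range(k)]
    for i, sentence_id in enumerate(sorted(by_sentence)):
        folds[i % k].extend(by_sentence[sentence_id])
    return folds
```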

For each of the five runs, we linearly scale the training data to be in the range [−1, 1], and apply the same transformation to the testing data. We train a support vector machine (LIBSVM version 2.9, Chang and Lin, 2001) with a radial basis function kernel on the training portion in each run, setting the cost and gamma parameters using cross-validation on just the training portion, and then test the classifier on the testing portion for that run using the same parameter settings. We micro-average the accuracy obtained on each of the five runs. Finally, we repeat each 5-fold cross-validation experiment five times, with five random splits, and report the average accuracy over these trials.
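Micro-averaging pools the per-fold correct counts rather than averaging per-fold accuracies (which would over-weight small folds); a minimal sketch:

```python
def micro_average(fold_results):
    """Micro-averaged accuracy over folds, where each fold contributes
    (number_correct, number_of_instances)."""
    correct = sum(c for c, _ in fold_results)
    total = sum(n for _, n in fold_results)
    return correct / total
```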

6.2 Results

Results for experiments using various subsets of the features are presented in Table 3. We restrict the component word—the noun, adjective, or verb—for which we extract features to those listed in column “Word”, and extract only the features given in column “Features” (WordNet, polarity, or all). The majority baseline is 80%, corresponding to always selecting the “every” interpretation. Accuracies shown in boldface are significantly better than the majority class baseline using a paired t-test. (In all cases where the difference is significant, we obtain p ≪ 0.01.)

We first consider the results using features extracted only for the noun, adjective, or verb individually, using all features. The best accuracy in this group of experiments, 87%, is achieved using the verb features, and is significantly higher than the majority baseline. On the other hand, the classifiers trained on the noun and adjective features individually perform no better than the baseline. These results support our hypothesis that lexical semantic properties of the component verb in the No X is too Y to Z construction do indeed play an important role in determining its interpretation. Although we proposed that selectional constraints from the verb would also lead to differing semantics of the nouns and adjectives in the two interpretations, our WordNet features are likely too simplistic to capture this effect, if it does hold. Before ruling out the semantic contribution of these words to the interpretation, we need to explore whether a more sophisticated model of selectional preferences, as in Ciaramita and Johnson (2000) or Clark and Weir (2002), yields more informative features for the noun and adjective.

Experimental setup          % accuracy
Word          Features
Noun          All           80
Adjective     All           80
Verb          All           87
All           WordNet       88
All           Polarity      80
All           All           88
Majority baseline           80

Table 3: % accuracy on testing data for each experimental condition and the majority baseline. Accuracies in boldface are statistically significantly different from the baseline.

We now consider the results using the WordNet and polarity features individually, but extracted for all three component words. The WordNet features perform as well as the best results using all features for all three words, which gives further support to our hypothesis that the semantics of the components of the target construction are related to its interpretation. The polarity features perform poorly. This is perhaps unsurprising, as polarity is a poor approximation to the property of “negativity” that we are attempting to capture. Moreover, many of the nouns, adjectives, and verbs in our dataset either have neutral polarity or are not in the polarity lexicon, and therefore the polarity features are not very discriminative. In future work, we plan to examine the WordNet classes of the verbs that occur in the “no” interpretation to try to more precisely characterize the property of negativity that these verbs tend to have.

6.3 Error analysis

To better understand the errors our classifier is making, we examine the specific instances which are classified incorrectly. Here we focus on the experiment using all features for all three component words. There are 23 instances which are consistently mis-classified in all runs of the experiment. According to the AMT judgements, each of these instances corresponds to the “no” interpretation. These errors reflect the bias of the classifier towards the more frequent class, the “every” interpretation.

We further note that two of the instances discussed in Section 4—examples (4) and (6)—are among those instances consistently classified incorrectly. The majority judgement from AMT for both of these instances is the “no” interpretation, while in our assessment they are in fact the “every” interpretation. We are therefore not surprised to see these items “mis-classified” as “every”.

Example (8) was incorrectly classified in one trial. In this case we agree with the gold-standard label obtained from AMT in judging this instance as the “every” interpretation; nevertheless, this does appear to be a difficult instance given the low agreement observed for the AMT judgements.

It is interesting that no items with an “every” interpretation are consistently misclassified. In the context of our overall results showing the impact of the verb features on performance, we conclude that the “no” interpretation arises due to particular lexical semantic properties of certain verbs. We suspect then that the consistent errors on the 21 truly misclassified expressions (23 minus the 2 instances discussed above that we believe to be annotated incorrectly) are due to sparse data. That is, if it is indeed the verb that plays a major role in leading to a “no” interpretation, there may simply be insufficient numbers of such verbs for training a supervised model in a dataset with only 39 examples of those usages.

7 Discussion

We have presented the first computational study of the semantically and pragmatically complex construction No X is too Y to Z. We have developed a computational model that automatically disambiguates the construction with an accuracy of 88%, reducing the error-rate over the majority baseline by 40%. The model uses features that tap into the lexical semantics of the component words participating in the construction, particularly the verb. These results demonstrate that lexical properties can be successful in resolving an ambiguity previously thought to depend on complex pragmatic inference over presuppositions (as in Wason and Reich (1979)).
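
These figures are mutually consistent with the dataset statistics reported elsewhere in the paper (199 instances, of which 39 carry the "no" interpretation), as the following arithmetic check shows; the small discrepancy from an exact 40% reflects rounding of the reported accuracy.

```python
# Sanity check of the reported figures: with 199 instances, 39 of which
# have the "no" interpretation, the majority ("every") baseline gets
# the remaining 160 correct.
total_instances = 199
no_instances = 39

baseline_acc = (total_instances - no_instances) / total_instances  # ~0.804
baseline_err = 1 - baseline_acc                                    # ~0.196

model_acc = 0.88        # reported accuracy
model_err = 1 - model_acc

# Relative error reduction over the baseline: ~0.39, i.e. roughly the
# reported 40% (the exact value depends on the unrounded accuracy).
relative_reduction = (baseline_err - model_err) / baseline_err
```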

These results can be usefully situated within the context of linguistic and psycholinguistic work on semantic interpretation processing. Beginning around 20 years ago, work in modeling of human semantic preferences has focused on the extent to which properties of lexical items influence the interpretation of various linguistic ambiguities (e.g., Trueswell and Tanenhaus, 1994). While semantic context and plausibility are also proposed to play a role in human interpretation of ambiguous sentences (e.g., Crain and Steedman, 1985; Altmann and Steedman, 1988), it has been pointed out that it would be difficult to "operationalize" the complex interactions of presuppositional factors with real-world knowledge in a precise algorithm for disambiguation (Jurafsky, 1996). Although not intended as proposing a cognitive model, the work here can be seen as connected to these lines of research, in investigating the extent to which lexical factors can be used as proxies to more "hidden" features that underlie the appropriate interpretation of a pragmatically complex construction.

Moreover, as in the approach of Jurafsky (1996), the phenomenon we investigate here may be best considered within a constructional analysis (e.g., Langacker, 1987), in which both the syntactic construction and the particular lexical items contribute to the determination of the meaning of a usage. We suggest that a clause of the form No X is too Y to Z might be the (identical) surface expression of two underlying constructions—one with the "every" interpretation and one with the "no" interpretation—which place differing constraints on the semantics of the verb. (E.g., in the "no" interpretation, the verb typically has some "negative" semantic property, as noted in Section 2.) Looked at from the other perspective, the lexical semantic properties of the verb might determine which No X is too Y to Z construction (and associated interpretation) it is compatible with. Our results support this view, by showing that semantic classes of verbs have predictive value in selecting the correct interpretation.

Note that such a constructional analysis of this phenomenon assumes that both interpretations of these sentences are linguistically valid, given the appropriate lexical instantiation. This stands in contrast to the analysis of Wason and Reich (1979), which presumes that people are applying some higher-level reasoning to "correct" an ill-formed statement in the case of the "no" interpretation. While such extra-grammatical inference may play a role in support of language understanding when people are faced with noisy data, it seems unlikely to us that a construction that is used quite readily and with a predictable interpretation is nonsensical according to rules of grammar. Our results point to an alternative linguistic analysis, one whose further development may also help to improve automatic disambiguation of instances of No X is too Y to Z. In the next section, we discuss directions for future work that could elaborate on these preliminary findings.

8 Future Work

One limitation of this study is that the dataset used is rather small, consisting of just 199 instances of the target construction. As discussed in Section 3.1, the extraction process we use to obtain our experimental items has low recall; in particular, it misses variants of the target construction such as Nothing is too Y to Z and No X is too Y for Z. In the future we intend to expand our dataset by extracting such usages. Furthermore, the data used in the present study is primarily taken from news text. While we do not adopt the view of some that usages of the target construction having the "no" interpretation are errors, it could be the case that such usages are more frequent in less formal text. In the future we also intend to extract usages of the target construction from datasets of less formal text, such as blogs (e.g., Burton et al., 2009).
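
For illustration, a higher-recall surface pattern covering these variants might be sketched as follows. The regular expression is our own illustrative approximation, not the extraction procedure of Section 3.1; real extraction would operate over POS-tagged or parsed text, and a purely lexical pattern like this one will both over- and under-generate.

```python
import re

# A purely lexical sketch of a higher-recall pattern for the target
# construction and the variants mentioned above (Nothing is too Y to Z;
# No X is too Y for Z). Illustrative only.
PATTERN = re.compile(r"\b[Nn]o(?:thing| \w+(?: \w+)?) is too \w+ (?:to|for) \w+")

examples = [
    "No interest is too narrow to deserve its own newsletter.",  # target
    "Nothing is too small to matter.",                           # variant
    "No detail is too minor for discussion.",                    # variant
    "There is no time to waste.",                                # non-instance
]
hits = [bool(PATTERN.search(s)) for s in examples]
```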

Constructions other than No X is too Y to Z exhibit a similar ambiguity. For example, the construction X didn't wait to Y is ambiguous between "X did Y right away" and "X didn't do Y at all" (Karttunen, 2007). In the future we would like to extend our study to consider more such constructions which are ambiguous due to the interpretation of negation.

In Section 4 we note that for some instances the complexity of the sentences containing the target construction may make it difficult for the annotators to judge the meaning of the target. In the future we intend to present simplified versions of these sentences—which retain the noun, adjective, and verb from the target construction in the original sentence—to the judges to avoid this issue. Such an approach will also help us to focus more clearly on observable lexical semantic effects.

We are particularly interested in further exploring the hypothesis that it is the semantics of the component verb that gives rise to the meaning of the target construction. Recall Pullum's (2004) observation that the verb in the "no" interpretation involves explicitly not acting. Using this intuition, we have informally observed that it is largely possible to (manually) predict the interpretation of the target construction knowing only the component verb. We are interested in establishing the extent to which this observation holds, and precisely which aspects of a verb's meaning give rise to the interpretation of the target construction.
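
A minimal verb-only baseline along these lines could be sketched as a lexicon lookup. The verb list below is a hypothetical illustration of Pullum's "not acting" class, not drawn from our data.

```python
# Toy verb-only rule following Pullum's (2004) intuition: verbs of
# explicitly *not* acting signal the "no" interpretation. The verb
# list is a hypothetical illustration, not taken from our dataset.
NOT_ACTING_VERBS = {"ignore", "escape", "overlook", "miss", "neglect"}

def predict_interpretation(verb):
    """Predict the construction's reading from the verb alone."""
    return "no" if verb in NOT_ACTING_VERBS else "every"

predict_interpretation("escape")   # "no", as in example (2)
predict_interpretation("deserve")  # "every", as in example (1)
```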

Our current model of the semantics of the target construction does not capture Wason and Reich's (1979) observation that the compositional meaning of instances having the "no" interpretation is nonsensical. While we do not adopt their view that these usages are somehow "errors", we do think that their observation can indicate other possible lexical semantic properties that may help to identify the correct interpretation. Taking the classic example from Wason and Reich, no head injury is too trivial to ignore, one clue to the "no" interpretation is that generally a head injury is not something that is ignored. On the other hand, considering Wason and Reich's example no missile is too small to ban, it is widely believed that missiles should be banned. We would like to add features that capture this knowledge to our model.

In preliminary experiments we have used co-occurrence information as an approximation to this knowledge. (For example, we would expect that head injury would tend to co-occur less with ignore than with antonymous verbs such as treat or address.) Although our early results using co-occurrence features do not indicate that they are an improvement over the other features considered (WordNet and polarity), it may also be the case that our present formulation of these co-occurrence features does not effectively capture the intended knowledge. In the future we plan to further consider such features, especially those that model the selectional preferences of the verb participating in the target construction.
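
One natural formulation of such a co-occurrence feature is pointwise mutual information between the noun and the verb. The sketch below uses invented counts purely for illustration; it is not the formulation used in our preliminary experiments.

```python
import math

# Pointwise mutual information between a noun and a verb from corpus
# counts, as one possible co-occurrence feature. All counts below are
# invented for illustration.
def pmi(pair_count, noun_count, verb_count, total):
    p_pair = pair_count / total
    p_noun = noun_count / total
    p_verb = verb_count / total
    return math.log2(p_pair / (p_noun * p_verb))

TOTAL = 1_000_000
# Hypothetical counts: "head injury" co-occurs more strongly with
# "treat" than with "ignore", yielding a positive vs. negative PMI.
pmi_treat = pmi(50, 2000, 5000, TOTAL)
pmi_ignore = pmi(2, 2000, 8000, TOTAL)
```

A low or negative noun–verb PMI for an instance could then serve as a cue toward the "no" interpretation.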

These several strands of future work—increasing the size of the dataset, improving the quality of annotation, and exploring additional features in our computational model—will enable us to extend our linguistic analysis of this interesting phenomenon, as well as to improve performance on automatic disambiguation of this complex construction.


Acknowledgments

We thank Magali Boizot-Roche and Timothy Fowler for their help in preparing the data for this study. This research was financially supported by the Natural Sciences and Engineering Research Council of Canada and the University of Toronto.

References

Steven Abney. 1991. Parsing by chunks. In Robert Berwick, Steven Abney, and Carol Tenny, editors, Principle-Based Parsing: Computation and Psycholinguistics, pages 257–278. Kluwer Academic Publishers.

Gerry T. M. Altmann and Mark Steedman. 1988. Interaction with context during human sentence processing. Cognition, 30(3):191–238.

Steven Bird, Edward Loper, and Ewan Klein. 2009. Natural Language Processing with Python. O'Reilly Media Inc.

Lou Burnard. 2000. The British National Corpus Users Reference Guide. Oxford University Computing Services.

Kevin Burton, Akshay Java, and Ian Soboroff. 2009. The ICWSM 2009 Spinn3r Dataset. In Proc. of the Third International Conference on Weblogs and Social Media. San Jose, CA.

Chih-Chung Chang and Chih-Jen Lin. 2001. LIBSVM: A library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Massimiliano Ciaramita and Mark Johnson. 2000. Explaining away ambiguity: Learning verb selectional preference with Bayesian networks. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 187–193. Saarbrücken, Germany.

Stephen Clark and David Weir. 2002. Class-based probability estimation using a semantic hierarchy. Computational Linguistics, 28(2):187–206.

Michael Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637.

Stephen Crain and Mark Steedman. 1985. On not being led up the garden path: The use of context by the psychological syntax processor. In David R. Dowty, Lauri Karttunen, and Arnold M. Zwicky, editors, Natural language parsing: Psychological, computational, and theoretical perspectives, pages 320–358. Cambridge University Press, Cambridge.

James Curran, Stephen Clark, and Johan Bos. 2007. Linguistically motivated large-scale NLP with C&C and Boxer. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions, pages 33–36. Prague, Czech Republic.

Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. Bradford Books.

Daniel Jurafsky. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science, 20(2):137–194.

Lauri Karttunen. 2007. Word play. Computational Linguistics, 33(4):443–467.

Ronald W. Langacker. 1987. Foundations of Cognitive Grammar: Theoretical Prerequisites, volume 1. Stanford University Press, Stanford.

Mark Liberman. 2009a. No detail too small. Retrieved 9 February 2010 from http://languagelog.ldc.upenn.edu/nll/.

Mark Liberman. 2009b. No wug is too dax to be zonged. Retrieved 9 February 2010 from http://languagelog.ldc.upenn.edu/nll/.

Geoffrey K. Pullum. 2004. Too complex to avoid judgment? Retrieved 7 April 2010 from http://itre.cis.upenn.edu/~myl/languagelog/.

Evan Sandhaus. 2008. The New York Times Annotated Corpus. Linguistic Data Consortium, Philadelphia, PA.

Rion Snow, Brendan O'Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast — but is it good? Evaluating non-expert annotations for natural language tasks. In Proceedings of EMNLP-2008, pages 254–263. Honolulu, HI.

John Trueswell and Michael J. Tanenhaus. 1994. Toward a lexicalist framework for constraint-based syntactic ambiguity resolution. In Charles Clifton, Lyn Frazier, and Keith Rayner, editors, Perspectives on Sentence Processing, pages 155–179. Lawrence Erlbaum, Hillsdale, NJ.

Peter Wason and Shuli Reich. 1979. A verbal illusion. The Quarterly Journal of Experimental Psychology, 31(4):591–597.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of HLT/EMNLP-2005, pages 347–354. Vancouver, Canada.
