Toward the Linking of Text Annotations, the FrameNet Lexicon, and an Intended Future Constructicon
C. J. Fillmore
Berkeley
Change of Emphasis
• Departing slightly from promises made in the abstract, I’ll be adding some discussion on
– what it would take to discover and record the constructions found in a large English text that is also lexically annotated, in “Frame Semantics” terms, and
– how one could construct an Open Source online directory of partial descriptions of grammatical constructions for English,
• without ignoring the promised concern for
– indicating in Lexical Entries information about the constructions in which the words participate, and
– indicating in Construction Entries information about the lexical items that participate in them.
• This obviously requires constructing a single articulated database that includes text annotations, a frame-based Lexicon, and a register of constructions - a Constructicon.
“FrameNet”
• Our goal in FrameNet is to document the use and meaning of lexical units in English - especially “frame-bearing” words - by careful examination of attested examples taken from a very large text Corpus.
• This means we need to find good examples of each of the words we describe, and that requires some attention.
Criteria for choosing examples
• FrameNet lexicographers are told that when they choose examples for illustrating the meaning and use of lexical units,
– The example sentences should be structurally simple.
– Their lexical content should illustrate the semantic frames they realize.
– Enough examples should be collected to illustrate all of each word’s valences - its basic combinatorial affordances.
Why use simple examples?
Suppose we’re working on the verb accuse.
• Simple example:
– Their publisher accused me of plagiarism.
• Complex example:
– Plagiarism is something I would hate to be accused of.
Point: The second is a perfectly good example of an English sentence, but its complexity has nothing to do with relevant facts about the verb accuse. It would not be a good dictionary example of the verb.
Finding “Frame Elements”
• For words in the frame containing accuse, the annotations we produce recognize three main roles, and our job is to show how these are expressed in sentences headed by the verb. We can refer to these three roles as
– accuser [the person who does the accusing]
– accused [the person accused of wrongdoing]
– charge [the offense]
Their publisher accused me of plagiarism.
Plagiarism is something I would hate to be accused of. (unexpressed)
Why “frame relevant” contexts?
They accused me of it.
Pronouns don’t tell us much about what is going on in this sentence. Our examples are always single sentences, and even if we could find the antecedents of they and it in the surrounding text that would not tell us much about the verb itself.
Their publisher accused me of plagiarism.
This has more information about the context of an accusation and provides information about the charge.
Why representative?
• We want examples of each valence possibility we discover. The verb accuse has VPs of two types:
V + NP + PP[of NP]
– They accused me of theft.
• burglary, arson, perjury, murder
V + NP + PP[of VP-ing]
– They accused me of stealing their car.
• lying to the judge
• killing their dog
• insulting their mother
V + NP
Problems with the criteria
• Most “simple” sentences illustrating the use of a verb are not frame-revealing, since the arguments are mainly pronouns.
• Sentences in which all of the frame-relevant elements are expressed in a single clause are unnatural-sounding -- the kinds of sentences linguists and psycholinguists make up. (“The publisher accused the author of plagiarism.”)
• Many words do not occur often enough (even in a very large corpus) to provide simple and clear examples of all of their affordances.
Full Text Annotation
• In general, for our lexicographic work, we tried to steer clear of syntactically complex structures, while knowing that we were missing the possibilities of giving good explanations of certain lexical units.
• For reasons related to the interests of our later funders, FrameNet activities have moved from “mere” lexicon building, with the use of a vast Research Corpus, to the annotation of continuous texts, letting the examples found there provide material for lexical analysis.
• This means that we now have to deal with
– mistakes
– ambiguities
– sentence fragments
– repetitions
– and - especially - “non-core” grammatical constructions
Constructions?
• If we’re going to start dealing with constructions in our work, we need strategies and principles for
– recognizing a construction when we see one,
– discovering and recording its properties, and
– convincing ourselves and our colleagues that what we’ve found really does need the kind of description and explanation that requires the positing of a special construction.
• As grammarians, we feel the need to incorporate each new construction within a consistent and coherent generative construction grammar; but as text analysts, we can be (temporarily) satisfied with partial descriptions.
• This is normal linguistics: we’ve always been able to recognize (clear cases of), say, the “tough construction,” but it’s taking forever to come up with a satisfying account of it.
The Strategy
• If you find something that looks as if it can’t be described within the framework provided by the current state of your theory, keep trying to make it fit.
• If you have to give up, then try to see it, not as a lonely idiom, but as an instance of some general grammatical phenomenon, and explore such phenomena as thoroughly as you can.
• If nothing works, then call it an idiom and add it to the lexicon - at least for now.
Valence and Grammar
• Familiar FrameNet valences presuppose a portion of the basic grammar.
• That is, the information they provide about grammatical functions (subject, object, complement, head, modifier, determiner, etc.) is taken as meaning that we know how these words behave in sentences built up with such construction types as predication, complementation, modification, determination, and the like.
• [ILLUSTRATE WITH “accuse”]
• Comment on that word “core”.
THE PLAN OF THIS TALK
1. To examine a few construction types.
a) one that has fixed slots and fixed words
b) one that’s pure syntactic form
c) one whose properties are mostly hidden
2. To suggest ways of connecting lexical and constructional information.
3. To suggest ways of annotating texts for their constructions.
4. To propose cooperatively building a public online construction registry for English.
Case: next week
• My account will be a little fussy, since I want to illustrate the reasons for deciding that something is a construction, and the need to look for its “boundaries.” So, suppose you come upon the phrase next week in a sentence like
Let’s finish this job next week.
1a
Case: next week
• First impression: What’s the problem?
– This is a case of simple modification: adjective next + noun week
• But wait!
– why doesn’t next week have an article?
– why doesn’t it come with a preposition?
– why does it mean what it means?
1a
What does it mean?
• The phrase next week, by itself, refers to the calendar week which comes immediately after the calendar week which includes ‘now’, i.e., the moment of speaking.
• It is a deictically anchored time expression.
• Compare it to the next week. This phrasing is anaphorically anchored and is much more regular.
1a
Is it a simple idiom?
• If it’s an idiom, just add it to the lexicon and look for a more interesting problem.
• But wait! We find completely analogous interpretations with
– next month
– next year
– next semester
• So maybe it’s a construction that uses the word next followed by a noun naming a temporal period.
1a
Restrictions
• It works fine with week, month, year, and a few special words like semester, but
– it doesn’t work with day:
*next day
– and it doesn’t seem to work with calendric units that are too big to figure in the life experiences of a single individual:
*next millennium
• So we have to formulate all these restrictions too. (Maybe.)
1a
Wait! We’re not finished.
• There are semantically and formally analogous patterns that use, instead of next, the words this and last,
- and they too are deictically anchored expressions,
- and they too exclude day.
– this X: the X which contains ‘now’
this week, this month, this year, *this day
– last X: the X which precedes the X containing ‘now’
last week, last month, last year, *last day
1a
What have we got so far?
• Special use of this, next and last.
– notice: this is a demonstrative, next and last are adjectives
• combining, without prepositions or articles, with specific words that name calendric time periods
• forming meanings that relate these time periods as identical to, following, or preceding, the named period containing ‘now’.
1a
Descriptive Choices
• We could state the conditions for the construction as generally as possible,
– regarding the exclusion of the day unit as explained by a pre-emption: in order to express these meanings, the words today, yesterday and tomorrow are required,
1a
– regarding the exclusion of century and millennium by describing the function of the construction in terms of the practical limits of human planning, and
– regarding the inclusion of non-calendric terms like semester or hour (meaning ‘class hour’ in a school setting), as an exploitation of the system, something that might not need to be described in the grammar.
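The descriptive choices above can be sketched as a small lookup, offered only as an illustration and not as part of FrameNet. The function name `interpret`, the unit sets, and the returned fields are all hypothetical; the offsets encode "preceding, identical to, following" the period containing ‘now’.

```python
# Hypothetical sketch of the this/next/last + calendar-unit construction.
# All names and sets here are illustrative assumptions, not a FrameNet API.

CALENDRIC_UNITS = {"week", "month", "year"}
EXPLOITED_UNITS = {"semester", "hour"}        # non-calendric "exploitations"
TOO_LARGE_UNITS = {"century", "millennium"}   # beyond practical human planning

# Position of the named period relative to the period containing 'now'.
OFFSETS = {"last": -1, "this": 0, "next": 1}

# Pre-emption: for the day unit, dedicated words are required instead.
DAY_PREEMPTION = {"last": "yesterday", "this": "today", "next": "tomorrow"}


def interpret(modifier: str, unit: str) -> dict:
    """Return a rough description of how 'modifier + unit' is treated."""
    if modifier not in OFFSETS:
        raise ValueError(f"not a modifier of this construction: {modifier!r}")
    offset = OFFSETS[modifier]
    if unit == "day":
        return {"acceptable": False, "offset": offset,
                "use_instead": DAY_PREEMPTION[modifier]}
    if unit in TOO_LARGE_UNITS:
        return {"acceptable": False, "offset": offset,
                "reason": "unit too large for practical planning"}
    if unit in CALENDRIC_UNITS:
        return {"acceptable": True, "offset": offset, "status": "core"}
    if unit in EXPLOITED_UNITS:
        return {"acceptable": True, "offset": offset, "status": "exploitation"}
    return {"acceptable": False, "offset": offset, "reason": "unit not listed"}


if __name__ == "__main__":
    for phrase in [("next", "week"), ("this", "day"), ("next", "millennium")]:
        print(phrase, interpret(*phrase))
```

Keeping the pre-emption in its own table mirrors the choice of stating the construction generally and treating *next day as blocked by tomorrow, without writing the exclusion into the construction itself.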
1a
Are we there yet?
• No. Here are some more facts about these words:
• If we want to talk about the X that follows next X, or the X that precedes last X, we say:
– the X after next
– the X before last
• Notice that here the words next and last, by themselves, mean next X and last X
– the week after next, the week before last
– the month after next, the month before last
– COMPARE: the day before yesterday, the day after tomorrow
1a
And there’s still more.
• The words this, last and next also occur with the names of members of temporal cycles, like
– weekday names (Monday, etc.),
– month names (January, etc.),
– season names (summer, etc.), and
– day part names (morning, etc.)
1a
• And these have regular but complicated interpretations:
– last Friday is ‘the Friday of last week’;
– next summer is ‘the summer of next year’;
– this March is ‘the March of this year’,
– last night is ‘the night of yesterday [last day]’.
and there are various extensions, pre-emptions, exceptions.
1a
Conclusions so far
• We have here a family of constructions that make clear use of particular lexical items, in particular combinations, that combine with words of particular semantic types, and that have semantic interpretations which do not follow from anything else that we know about the grammar of English.
1a
– A lexicon of English has to show that these words can have these functions in these constructions.
– A constructicon of English has to show what words, or classes of words, can participate in each of its constructions for which lexical membership is specified.
– Text annotations for English should link each word to the relevant lexical entry, and each construction instance should be linked to the relevant entry in a constructicon.
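One way to picture the two-way linking just described is a pair of dictionaries kept in step by a single helper. This is a minimal sketch under stated assumptions: the class names, the `link` function, and the construction label `deictic_calendar_unit` are all invented for illustration and do not correspond to any existing FrameNet data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexicalEntry:
    """A lexicon entry that lists the constructions the word participates in."""
    lemma: str
    constructions: List[str] = field(default_factory=list)


@dataclass
class ConstructionEntry:
    """A constructicon entry that lists the words that participate in it."""
    name: str
    participants: List[str] = field(default_factory=list)


def link(lexicon: Dict[str, LexicalEntry],
         constructicon: Dict[str, ConstructionEntry],
         lemma: str, cx_name: str) -> None:
    """Record one word/construction pairing in both directions."""
    lex = lexicon.setdefault(lemma, LexicalEntry(lemma))
    cx = constructicon.setdefault(cx_name, ConstructionEntry(cx_name))
    if cx_name not in lex.constructions:
        lex.constructions.append(cx_name)
    if lemma not in cx.participants:
        cx.participants.append(lemma)


lexicon: Dict[str, LexicalEntry] = {}
constructicon: Dict[str, ConstructionEntry] = {}

# Hypothetical construction label for the this/next/last + period pattern.
for word in ("this", "next", "last", "week", "month", "year"):
    link(lexicon, constructicon, word, "deictic_calendar_unit")

if __name__ == "__main__":
    print(lexicon["next"])
    print(constructicon["deictic_calendar_unit"])
```

Routing every update through one function is just a way to keep the lexical entry and the construction entry from drifting apart; a real database would presumably enforce this with a join table.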
Intermediate Cases
• There are lots of constructions that people (some of them in this room) have described that have both lexical and grammatical-pattern requirements.
Traditional and Special
• Questions, imperatives, relative clauses, comparatives - each of these with many types.
• Serial verbs (Goldberg), WXDY (Kay +), Let_alone (Fillmore +), MadMagazine (Lambrecht), Presentatives (Lakoff), Nominal Extraposition (Michaelis +), Way Construction (Goldberg +), Away Construction (Jackendoff), Correlative Conditional (Lots of people), Tautologies (Wierzbicka), Just Because (Hirose, Bender & Kathol), and dozens more.
Adjective Negation with “no”
• It seems that the only adjectives that can be “negated” with no are fair, good, and different.
• And these seem to be different from the structure that has no modifying a comparative adjective:
– no bigger than a bug
– no taller than my baby sister
– *no older than Methuselah
– *no younger than Chuck
Presentatives
• Here comes Harry, wearing my shirt.
• Here he comes, wearing my shirt.
• First part: here or there
• Second part: V+NP or Pron+V
• Verb: go, come, be, sit, stand, lie, hang
• Third part (optional): secondary predicate
By contrast,
• There are some constructions that have no specified lexical components.
• One of these is “Right Node Raising”, so-called. We might want to call it the Shared Completion Construction.
• Description
– a final phrase “completes” each of two truncated phrases,
– these connected by some kind of conjoining or adjoining device
– associated with paired foci
1b
Slots: 0 = Preceding Context, 1 = Part-1, 2 = Part-2, 3 = Completion
• y or x…y is a conjoining (adjoining, subjoining) device
• form is 0+(x)+1+y+2+3
• 1 and 2 offer paired foci
• Interpretation: 0+1+3 {and} 0+2+3, where ‘{and}’ is the meaning of the conjoining device
Example: I wouldn’t touch let alone eat anything that ugly.
(here (x) is empty and y is let alone)
1b
preceding context I wouldn’t
pre-conjunction -
first trunc phrase touch
conjunction let alone
second trunc phrase eat
completion anything that ugly
1b
preceding context -
pre-conjunction -
first trunc phrase I cooked
conjunction and
second trunc phrase she ate
completion the soup
1b
preceding context Libya has
pre-conjunction -
first trunc phrase shown interest in
conjunction and
second trunc phrase taken steps to acquire
completion weapons of mass destruction
1b
preceding context thus
pre-conjunction -
first trunc phrase reducing
conjunction if not
second trunc phrase eliminating
completion the chances of detection
1b
preceding context to pursue
pre-conjunction not only
first trunc phrase a more cost-effective
conjunction but also
second trunc phrase possibly the only real
completion line of defense against these threats
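The slot analyses above follow the form 0+(x)+1+y+2+3, with the interpretation 0+1+3 {and} 0+2+3. A minimal sketch of that bookkeeping follows; the class and field names are hypothetical, and the two "readings" are simple string concatenations with no claim about the meaning contributed by the conjoining device.

```python
from dataclasses import dataclass
from typing import Tuple


def _join(*parts: str) -> str:
    """Join non-empty slot fillers with single spaces."""
    return " ".join(p for p in parts if p)


@dataclass
class SharedCompletion:
    preceding: str    # slot 0: preceding context
    pre_conj: str     # (x): optional pre-conjunction, "" if absent
    part1: str        # slot 1: first truncated phrase
    conj: str         # y: conjoining device
    part2: str        # slot 2: second truncated phrase
    completion: str   # slot 3: shared completion

    def surface(self) -> str:
        """Form: 0 + (x) + 1 + y + 2 + 3."""
        return _join(self.preceding, self.pre_conj, self.part1,
                     self.conj, self.part2, self.completion)

    def readings(self) -> Tuple[str, str]:
        """Interpretation: 0+1+3 and 0+2+3 (the device's meaning is left out)."""
        return (_join(self.preceding, self.part1, self.completion),
                _join(self.preceding, self.part2, self.completion))


touch_eat = SharedCompletion("I wouldn't", "", "touch", "let alone",
                             "eat", "anything that ugly")
libya = SharedCompletion("Libya has", "", "shown interest in", "and",
                         "taken steps to acquire",
                         "weapons of mass destruction")

if __name__ == "__main__":
    for ex in (touch_eat, libya):
        print(ex.surface())
        for reading in ex.readings():
            print("   ", reading)
```

The expansion is purely mechanical and knows nothing about let alone or and, in keeping with the observation that the construction has no specified lexical components.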
1b
Conjunctions
• “Conjunctions” observed participating in this construction include:
– and, or, but, both…and, either…or, not_only…but_also, if_not, but_not, if_even, rather_than, instead_of, let_alone
1b
Conclusions so far
• This last construction seems to operate very generally, interacting with almost any grammatical device that permits the expression of contrasting foci.
• There appears to be no reason to associate this construction with anything in the lexicon.
• And it doesn’t seem possible to blame the construction itself for the meaning of any of the expressions it inhabits. It’s pure syntax.
1b
A “hidden” construction?
• Now here’s something I think is a special construction, but it’ll be hard to convince most of my grammarian friends.
• Components:
A. a predicate with meaning related to ‘having’
B. the word “the”
C. a noun construable as the name of a resource
D. an infinitive complement controlled by whoever is interpreted as the subject of the ‘having’ relation, or alternatively a Purpose phrase with “for”
1c
Examples
• I don’t have the money to take a vacation.
• We lack the staff to take on such a project.
• Where can I find the cash to buy something that expensive?
• Do we have the resources to manage that?
• We don’t have the fuel to make it to the next town.
• Who’ll give us the funds to do that?
1c
(verb with ‘having’ semantics)
• I don’t have the money to take a vacation.
• We lack the staff to take on such a project.
• Where can I find the cash to buy something that expensive?
• Do we have the resources to manage that?
• We don’t have the fuel to make it to the next town.
• Who’ll give us the funds to do that?
1c
(noun construable as resource)
• I don’t have the money to take a vacation.
• We lack the staff to take on such a project.
• Where can I find the cash to buy something that expensive?
• Do we have the resources to manage that?
• We don’t have the fuel to make it to the next town.
• Who’ll give us the funds to do that?
1c
(complement controlled by ‘haver’)
• I don’t have the money to take a vacation.
• We lack the staff to take on such a project.
• Where can I find the cash to buy something that expensive?
• Do we have the resources to manage that?
• We don’t have the fuel to make it to the next town.
• Who’ll give us the funds to do that?
1c
Mystery
• The construction allows us to explain the fact that the sequence [the N to VP] is not a self-standing constituent, having a bounded meaning independent of its context.
• Evidence
*I lost [the money to take a vacation].
*We spilled [the fuel to get us to the next town].
*She just fired [the staff to complete the project].
1c
Observation
• The infinitive complement can be omitted under conditions of definite zero anaphora (“definite null instantiation” in FrameNet terms).
• Usually DNI is possible only when it corresponds to an argument of some lexical unit.
– We lost.
– I’ve got an explanation.
– Let me explain.
– Who’s the father?
– When did they arrive?
1c
DNI without a lexical host?
– Are you going to take on the new project?
-- No, can’t. We lack the staff.
– Can you drive me to Tokyo?
-- Sorry, I don’t have the fuel.
– Can you join us in the trip to Hawaii?
-- Where am I going to find the cash?
– Do you think he’s ready to face down the boss?
-- Nah, he doesn’t have the guts.
1c
Mystery
• This construction allows us to explain the definite NP in DNI-omitted cases. Consider the ambiguity of
I don’t have the cash.
– Situation 1: there is some contextually understood amount of cash
– Situation 2: complement omitted in reference to some contextually understood use to which the cash could be put
1c
Claims
• Many of the properties of this construction are shared by enough in place of the, though enough has possibilities not shared by the.
• Evidence
– We don’t have {enough/the} money to do that.
– We don’t have {enough/the} money for such a project.
– We don’t have {enough/the} money.
1c
Is there a lexical solution?
• Well, we could say that the, like enough, is a lexical item that participates in a discontinuous modifier of a noun:
– {enough/the} … to pay for a vacation,
– {enough/the} … for that project.
1c
How to study constructions in texts
• A kind of annotation currently used in FrameNet can be adapted for identifying the phrasal and lexical members of grammatical constructions, and new software can be created for linking such identifications to named entities in a growing constructicon.
There he was, standing in the snow with no clothes on.
LU stand: There [he] was, standing [in the snow] [with no clothes on].
Cx#373112: [There] [he] [was], [standing in the snow with no clothes on].
Cx#764632: There [he] was, standing in the snow [with] [no clothes] [on].
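The three bracketed lines above can be read as separate annotation layers over one sentence. As a rough sketch only (not FrameNet's actual annotation software), each layer can be turned into character spans over the same plain text. The function name `parse_brackets` and the `layers` dictionary are assumptions for illustration, and nested brackets are not handled.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def parse_brackets(annotated: str) -> Tuple[str, List[Span]]:
    """Strip [ ] from an annotated line; return plain text and bracketed spans."""
    chars: List[str] = []
    spans: List[Span] = []
    start = -1
    for ch in annotated:
        if ch == "[":
            start = len(chars)          # span opens at current plain-text offset
        elif ch == "]":
            spans.append((start, len(chars)))
            start = -1
        else:
            chars.append(ch)
    return "".join(chars), spans


# One layer per lexical unit or construction instance, as on the slide.
layers: Dict[str, str] = {
    "LU stand": "There [he] was, standing [in the snow] [with no clothes on].",
    "Cx#373112": "[There] [he] [was], [standing in the snow with no clothes on].",
    "Cx#764632": "There [he] was, standing in the snow [with] [no clothes] [on].",
}

parsed = {name: parse_brackets(line) for name, line in layers.items()}

if __name__ == "__main__":
    for name, (text, spans) in parsed.items():
        print(name, [text[a:b] for a, b in spans])
```

Storing spans as offsets into a single shared sentence lets any number of lexical and constructional layers coexist without interfering with one another, and each layer name can then serve as the key that points into the lexicon or the constructicon.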
Toward an Open Source Construction Inventory
• We’re going to seek funding for adding construction analyses to the texts we annotate, hoping to end up with an inventory of the kinds of constructions that can be found in the texts we’re working with.
• We want, of course, to be able to give names to the constructions, and to index examples from the annotations to the constructicon itself.
• The description of the constructions in the constructicon will indicate what words, or what semantic or morphological classes of words, are privileged participants in the construction, and we’ll want the lexicon to indicate for each such word which constructions it is available for, and what semantic or morphological class it belongs to.
• Since we’ll want to be doing this anyway, I’m hoping to include in the proposal the means of having people from outside the project contribute construction descriptions of their own, because otherwise there’s no chance that we’ll end up with a rich enough collection to be of use to the research community.
• The criteria for submission would include providing one or more clear examples, from attested and documented sources, together with the contributor’s observations about the details of the construction.
• This would require building an online tutorial to explain what we’re doing, providing a web form that would cover the things that are needed in a construction entry, including a place for prototypical examples.
• As a “wiki”-like resource, it will be possible for other (authorized) contributors to critique and supplement existing descriptions at any time, and there will be a forum for discussions of the individual constructions and their properties.
• This is all a dream, of course, but wouldn’t it be nice if we could pull it off?
Thank you