Artificial Intelligence Review (1987) 1, 139-157
What we say and what we mean
A. Ramsay, Cognitive Studies Program, University of Sussex, Falmer BN1 9QN, UK
Abstract. The general problem of natural language processing has not been solved, and may never be. Nonetheless, there are now a number of well-known techniques for certain aspects of the task; and there is a certain amount of agreement about what other problems need to be tackled, if not about how to tackle them. The current paper gives a survey of what we do know, and indicates the areas in which further progress remains to be made.
Linguistic and non-linguistic knowledge
Understanding and generating natural language seems to call upon two sorts of knowledge. It requires a substantial amount of knowledge about language itself - what strings of sounds and letters are in fact words of the given language, what sequences of words are well-formed sentences, how do different word-orders encode different messages, and so on. And it requires a similar amount of knowledge about the world in general - what sorts of things can a listener generally be expected to know in advance so they need not be mentioned explicitly, what can the partners in a conversation reasonably be expected to infer from what has been said already, what names do people use for referring to things and each other, and so on. These two sorts of knowledge might be seen as being concerned with what we say, and what we mean by it, respectively. The purely linguistic knowledge specifies how strings of sounds or characters encode messages; the general world knowledge is used for working out what actions are appropriate given the message encoded by what we have just heard or read.
They also correspond, to some extent, to a split between what we do and do not know how to make computers do. I would not want to suggest that the linguistic questions are solved, or anything like it, but there do certainly seem to be some theories about what needs to be done and how to do it. We have computer programs which, for non-trivial fragments of the lower levels of language processing, seem to work. As far as the use of world knowledge is concerned, we are in a much weaker position. We do know something about what needs to be done, about what sort of knowledge is required and when, but we have very few practical theories about how to deliver it. This will become apparent as we progress through our survey of the components of natural language processing systems (NLP systems).
The first thing we need to consider in the design of an NLP system is its architecture. There is a long tradition in linguistics that the best way to describe the rules of language is in terms of layers of rule sets. The particular layers are sometimes disputed, and in any case different layers are required for speech and text processing, but there is widespread agreement among linguists that some sort of layered approach is appropriate. A typical diagram of the layers would look something like Table 1.
Table 1. Linguistic levels

Level                     Subject matter
Lexical analysis          Words and word endings
Morphological analysis    The significance of word endings
Syntax                    Rules about word order
Semantics                 Relationships and identities
Discourse rules           What can you say when
World knowledge           What can you assume
Linguistics concerns itself with rules which operate within levels and ones which connect levels together. The first class would include rules which rule out strings as being possible words of English (there's no English word in which 'h' follows a vowel, for instance), or possible inflections of English words ('ing' can only be added as a suffix to verbs), or legal word orders ('the' must not be followed by a verb), or meaningful sentences ('eat' must have an animate subject), or legitimate things to say (you can't refer to 'the blue block' unless there is a unique blue block in the context), or reasonable assumptions to make. The second class uses the structures which are implicitly referred to in the first in order to show how choices about which words to use, or what inflections to give them, or what order to say them in, can be used to encode messages.
These levels of description have an obvious implication for the design of NLP systems. If people who are interested in describing what they see real language users doing find it convenient to talk in terms of a series of levels of analysis, is it not likely that this is going to be the easiest way to build artificial language users? Following this line of argument, we might expect to see NLP systems built up out of lexical, morphological, syntactic and so on components. Certainly all the tasks implied by this separation have to be performed: different words, or word endings, or word orders all encode different messages, and all these encodings must be decoded if we want to understand an utterance or a text. The decision to have rules which deal with different levels does not, however, entail any particular way of organizing the communications between the various components. It is clear that any implementation which assumes that there is a simple uni-directional flow of information from the top to the bottom of Table 1 (for comprehension), or from the
bottom to the top (for generation), is simply not going to work. At any level there will be alternative readings of the data which cannot be disambiguated (comprehension), or choices which cannot be settled (generation), for which the appropriate information is instantly available at some other level. There are abundant examples of low-level ambiguities which can easily be resolved by the use of high-level information, from the choices of meaning for 'bat' in 'Some bats are believed to suck blood' and 'I left my bat in the changing room' to the well-known problem of choosing a referent for 'they' in 'The councillors refused the women a permit because they feared violence' and 'The councillors refused the women a permit because they advocated violence'. Many NLP systems acknowledge that there is a problem here, but then ignore it and simply require the lower-level system to backtrack and offer alternative analyses when the higher levels request them. There are two reasonable alternatives. One is to use a neutral working area for components to store the results of their analysis. Such an area, often termed a blackboard after Erman and Lesser's (1975) work on speech processing, can be used for offering partial results and hypotheses to higher-level components, or even for asking them explicit questions about ambiguities. The blackboard is an appealing metaphor, but there are serious problems about implementing such a system, since decisions have to be made about the format of entries on the blackboard, its integrity as different components add and delete messages, and the control over the resources which should be available to each component. A major variant on the use of a neutral area is to embody all rules, of whatever level, as a single set of productions firing on states of a single database or working memory, all controlled by a single scheduling algorithm.
Systems of this sort, such as Riesbeck's (1978) conceptual analyser, can indeed mix rules at different levels, but it seems very difficult to provide them with rules of the required degree of sophistication. The alternative to using a neutral area is to try to carry ambiguities around in the form of constraints. This approach, which follows on from problem-solving work by Sussman and Steele (1980) and Stefik (1981), takes, for instance, the view that the definition of 'bat' as a wooden implement for striking balls is a constraint on the object being referred to by the word. When this constraint is added to other constraints on the object referred to in the text, it should help identify a referent - some particular wooden implement. Under this interpretation, a lexical ambiguity, such as the one between wooden implements and flying mice, need not be regarded as a problem at all. The description ((X is a flying mouse) or (X is a wooden implement)) and (X is something which I might leave in a changing room) is nearly always going to identify a single object, which will in all probability be a wooden implement. But we do not need to decide at the time when we recognize the word 'bat' which of its alternative meanings is going to be the one that fits the relevant object - we just need to know how to use disjunctive descriptions when trying to find referents.
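The use of a disjunctive description to pick out a referent can be illustrated with a small sketch. It is only an illustration under stated assumptions: the context of candidate objects, the property names and the helper functions (`has`, `any_of`, `all_of`, `referents`) are all hypothetical, and nothing here is claimed to be the mechanism used in the systems cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    properties: frozenset


# Hypothetical discourse context: the objects a hearer might consider.
CONTEXT = [
    Entity("cricket_bat", frozenset({"wooden_implement", "leavable_in_changing_room"})),
    Entity("pipistrelle", frozenset({"flying_mouse"})),
    Entity("towel", frozenset({"leavable_in_changing_room"})),
]


def has(prop):
    """Constraint: the entity carries the given property."""
    return lambda entity: prop in entity.properties


def any_of(*constraints):
    """Disjunction of constraints (used for an ambiguous word)."""
    return lambda entity: any(c(entity) for c in constraints)


def all_of(*constraints):
    """Conjunction of constraints (constraints accumulate through the text)."""
    return lambda entity: all(c(entity) for c in constraints)


def referents(constraint, context):
    """Names of all entities in the context that satisfy the constraint."""
    return [entity.name for entity in context if constraint(entity)]


# The word 'bat' contributes a disjunction; no sense is chosen at this point.
bat = any_of(has("flying_mouse"), has("wooden_implement"))

# The rest of the sentence contributes a further constraint.
description = all_of(bat, has("leavable_in_changing_room"))

print(referents(bat, CONTEXT))          # two candidates while still ambiguous
print(referents(description, CONTEXT))  # one candidate once constraints combine
```

The point of the sketch is that the lexical ambiguity is never resolved as a separate step: the disjunction is simply carried along until the other constraints leave a single candidate.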
None of these ways of organizing a system seems entirely satisfactory. Using constraints has considerable appeal if we can do it, since it leaves us with a simple architecture (just work straight through from top to bottom or bottom to top, as appropriate). We do, however, have to work out how to use lexical entries, parse
trees, semantic representations, and so on as constraints. This is not the usual way they are used, and it may be difficult to construct programs that do it. The blackboard suffers from being much more difficult to program than might be expected at first sight, especially in terms of designing a resource allocation algorithm. To require lower levels to backtrack when high levels fail is unacceptably inefficient. Of the three options, this author inclines towards the first, but any system designer will have to make his own choice.
Given some more or less satisfactory answer to the problem of organizing the system as a whole, we have to consider each of its parts in detail. There are various options about what components will be necessary, but a breakdown into components corresponding to the levels in Table 1 is as typical as any. It should be noted that this is offered as a reasonable architecture for a system for understanding and interpreting English. It is well known that different languages lay different emphasis on different levels: word order is less important in Finnish than in English, word structure is more important in German than in English, and so on. These changes may indicate that the levels in our diagram might need to be merged or split for different languages, though it seems unlikely that there will be very radical changes. In the remainder of the paper we will be concerned entirely with English, and hence will consider computational models of lexical, morphological, syntactic, semantic and pragmatic processing for English. We will not go deeply into the details of algorithms, but will rather consider the extent to which algorithms for comprehension and generation actually exist.
Lexical processing here means the mechanisms which add inflections to lexical items to produce acceptable surface forms, and which recognize surface forms to be instances of inflected lexical items. To put it more concretely, the task is to recognize that if you add the ending '-ing' to the word 'recognize', you should end up with 'recognizing' rather than 'recognizeing'; and that the form 'recognizing' is derived from the word 'recognize', not from some putative form 'recogniz'. English has a fairly simple set of rules which cover a very large proportion of cases, plus a number of words which are extremely irregular, even idiosyncratic, in the surface forms they show for various endings. Many of these are clearly inheritances from other languages which the words were imported from, and it might be possible to discover underlying regularities if we knew, for every such word, where it had come from. For example, '-ought' is clearly a regular past form, of which 'sought', 'bought', 'fought' and perhaps 'caught' are instances. There is, however, no way of predicting which words will follow this pattern rather than the usual one of adding '-ed', nor of discovering the form of the root ('seek', 'buy', 'fight', 'catch') to which the ending was added. It is thus usual to provide rules which capture all the standard cases, and to store all the idiosyncrasies explicitly.
The basic rules for recognizing inflections are very simple. The simplest way to write them down is as a set of productions such as the following:
IF the ending is -ing THEN replace it by -e and look in the dictionary
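A production of this kind might be sketched as follows. This is a minimal illustration, not the rule set of any published system: the tiny lexicon, the rule list beyond the '-ing' to '-e' rule quoted above, and the function name `analyse` are all assumptions made for the example.

```python
# Hypothetical dictionary of root forms.
LEXICON = {"recognize", "walk", "hope"}

# Each production is (ending, replacement): IF the word has this ending
# THEN replace it by the replacement and look the result up in the dictionary.
# Only the first rule comes from the text; the others are illustrative guesses.
RULES = [
    ("ing", "e"),
    ("ing", ""),
    ("ed", "e"),
    ("ed", ""),
    ("s", ""),
]


def analyse(surface, lexicon=LEXICON):
    """Return every (root, ending) pair licensed by the rules and the lexicon."""
    analyses = []
    if surface in lexicon:
        analyses.append((surface, ""))  # the uninflected form itself
    for ending, replacement in RULES:
        if surface.endswith(ending):
            root = surface[: -len(ending)] + replacement
            if root in lexicon:
                analyses.append((root, ending))
    return analyses


print(analyse("recognizing"))
print(analyse("walking"))
print(analyse("hoped"))
```

The dictionary lookup is what rejects putative roots such as 'recogniz': a rule may fire, but its result only counts as an analysis if the proposed root is a known word.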
Ramsay and Barrett (1987) give a set of about 15 such rules which cover a very large percentage of English word ending changes. The task for adding inflections is slightly harder, but is still very largely a matter of applying simple, regular rules. The way inflections are added often involves inspecting several of the terminal letters of the word being inflected to see whether they belong to particular classes such as vowels or consonants, for instance:
IF the last letter of the word is y, and the previous letter is a consonant, and the first letter of the suffix is e
THEN replace the y by i and add the suffix.
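The rule above, together with the e-dropping behaviour implied by the earlier 'recognize' example, can be sketched as a pair of productions. The function name `add_suffix`, the decision to treat only a, e, i, o, u as vowels, and the exact condition for dropping a final e are assumptions for illustration; irregular forms would still have to be stored explicitly.

```python
VOWELS = set("aeiou")


def add_suffix(word, suffix):
    """Attach a suffix to a root using two illustrative spelling productions."""
    ends_consonant_plus = len(word) >= 2 and word[-2] not in VOWELS

    # Rule from the text: final y after a consonant becomes i
    # when the suffix begins with e (carry + ed).
    if word.endswith("y") and ends_consonant_plus and suffix.startswith("e"):
        return word[:-1] + "i" + suffix

    # Assumed rule: a final e after a consonant is dropped
    # before a suffix beginning with a vowel (recognize + ing).
    if word.endswith("e") and ends_consonant_plus and suffix[:1] in VOWELS:
        return word[:-1] + suffix

    # Default: simple concatenation.
    return word + suffix


for root, ending in [("carry", "ed"), ("play", "ed"), ("recognize", "ing"), ("walk", "ing")]:
    print(root, "+", ending, "->", add_suffix(root, ending))
```

Both productions inspect the last two letters of the root and the first letter of the suffix, which is the kind of terminal-letter inspection the paragraph describes.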
Production rules seem rather clumsier as a way to specify how to add endings than as a way of recognizing them. It seems that we need rather more rules, though still not an outrageous number, and that many of the rules are in fact only minor variants of each other, each being included to cover some small set of cases. Kay (1983) and others have argued that a neater way to represent the knowledge required for addition of word endings is as a finite state transducer. This is a simple machine which can be in any of a finite set of states, and which changes its state depending on (i) the state it is currently in, (ii) the state of the word to which the ending is being added, and (iii) the state of the partially constructed surface form. Whatever the details of the implementation, it is clear that for the majority of cases a very simple set of rules will suffice; that for adding suffixes these rules can be used deterministically; and that for pathological cases such as 'caught' and 'were' no rules can possibly be developed.
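To make the idea of a finite state transducer concrete, here is a small deterministic sketch that reads a root and suffix one character at a time and emits the surface form. It is not Kay's formalism: the input convention ('+' marking the boundary between root and suffix), the state names and the two spelling changes covered (y to i before e, and dropping a final e before a vowel) are assumptions chosen for the example.

```python
VOWELS = set("aeiou")

# States: V (last letter a vowel, also the start state), C (last letter a
# consonant), Y (a 'y' after a consonant is being held back), E (an 'e' after
# a consonant is being held back), Y+ and E+ (held letter, boundary seen),
# SUFFIX (copying the suffix).


def step(state, ch):
    """One transition: return (next_state, output) for one input symbol."""
    if state == "SUFFIX":
        return "SUFFIX", ch
    if state == "Y+":  # held y: becomes i only before a suffix starting with e
        return "SUFFIX", ("i" if ch == "e" else "y") + ch
    if state == "E+":  # held e: dropped only before a vowel-initial suffix
        return "SUFFIX", ("" if ch in VOWELS else "e") + ch
    if ch == "+":  # morpheme boundary: keep holding y or e if there is one
        return {"Y": "Y+", "E": "E+"}.get(state, "SUFFIX"), ""
    held = {"Y": "y", "E": "e"}.get(state, "")  # letter to flush, if any
    after_consonant = state in ("C", "Y")
    if ch == "e" and after_consonant:
        return "E", held
    if ch == "y" and after_consonant:
        return "Y", held
    if ch in VOWELS:
        return "V", held + ch
    return "C", held + ch


# Output emitted if the input ends while a letter is still being held.
FINAL_OUTPUT = {"Y": "y", "E": "e", "Y+": "y", "E+": "e"}


def transduce(text):
    """Run the transducer over 'root+suffix' and return the surface form."""
    state, out = "V", []
    for ch in text:
        state, emitted = step(state, ch)
        out.append(emitted)
    out.append(FINAL_OUTPUT.get(state, ""))
    return "".join(out)


for form in ["carry+ed", "carry+ing", "recognize+ing", "play+ed"]:
    print(form, "->", transduce(form))
```

The machine never looks back at its input: a letter whose spelling depends on what follows is simply held in the state until the next symbol settles the matter, which is why a fixed, finite set of states is enough for these regular cases.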
Grammars. Syntax is the study of structural patterns in word order and structure. It is the area of language which has received the greatest level of attention in linguistics in the last 30 years. It does indeed seem to be special. It is the first place at which human language diverges in principle from the sign systems of other animals, and it is the main means by which human language can denote arbitrarily complex relationships between entities, rather than just referring to objects and homogeneous states of affairs. It is also the area in which most work has occurred in computer systems for language processing. This is unsurprising. The lower levels, though non-trivial, can be dealt with adequately by the sort of rules described above, at least for text. The higher levels cannot even be started on until the syntactic processing is completed, since the relationships that they are concerned with are denoted by structural properties of the input, sometimes by very subtle structural properties.
In both linguistics and work on NLP s...