CC 2007, 2011 attribution R.B. Allen Languages: Natural and Formal


Languages: Natural and Formal


Language

• Definition
  – In math and computer science:
    • A lexicon and rules for combining terms from the lexicon
  – In common use:
    • Structured verbal interaction between people
    • Any structured interaction, such as “The Language of Film”

• Are computer languages a model for human natural language?


Wide Variability among Natural Languages

• Sentence Structure
  – SVO (Subject-Verb-Object) (English, Chinese)
  – VSO (Gaelic/Celtic)
  – SOV (Hindi, Japanese, Hopi)

• Written
  – Logographic, often called ideographic (Chinese)
  – Syllabic (Japanese kana; Thai is closer to an alphasyllabary)
  – Alphabetic (English)

• Spoken
  – Tonal (Chinese)
  – Non-tonal (English)


Layers of Natural Language

• Words – Morphology, Orthography, Phonetics, Phonology

• Syntax – Phrase and sentence structure based on parts of speech

• Semantics – Literal meaning

• Pragmatics/Discourse – Uses beyond the literal meaning


Grammars

• Grammars are most often associated with modeling syntax, though semantic grammars are also possible. In the broadest sense, grammars are rules for languages.

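• The grammars most widely used in formal language work are “context-free”. That is, a rule can be applied to a symbol regardless of the context in which the symbol appears. Context-free grammars are not the most general class: context-sensitive and unrestricted grammars are more general.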

• The grammars used for natural language syntax are usually “constituent grammars”. That is, they identify the relationships among the components (constituents) of a phrase.

• Grammars taught in grade school are “prescriptive” grammars. Grammars in the formal analysis of language are “descriptive” and usually “generative”.

• Grammars are usually defined by rules, but statistical transition networks are also used to model the structure of language.


Modeling Natural Language Syntax with Grammars

• Rewrite (or production) rules (phrase-structure grammar)

• A very simple example of rewrite rules

S → NP+VP
NP → N, Adj+N
VP → V, V+NP
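As a minimal sketch, the rewrite rules above can be stored in a Python dictionary and expanded into every part-of-speech sequence they license. The names `RULES` and `expand` are illustrative and do not come from the slides, and reading the commas as "alternatives" is an assumption.

```python
from itertools import product

# Rewrite rules from the slide; each symbol maps to its alternative expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

def expand(symbol):
    """Return every sequence of part-of-speech categories derivable from symbol."""
    if symbol not in RULES:  # terminal category such as N, V, or Adj
        return [[symbol]]
    results = []
    for alternative in RULES[symbol]:
        # Expand each part of the alternative, then combine the parts in order.
        parts = [expand(s) for s in alternative]
        for combo in product(*parts):
            results.append([tag for part in combo for tag in part])
    return results

sentences = expand("S")
for tags in sentences:
    print(" ".join(tags))
```

Because none of these rules refers back to itself, the expansion terminates; a recursive rule would need a depth limit.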


Parsing

• Can we identify the grammatical structure of a given statement?
• Parsing is the basis of syntax checking for computer program compilers.
• A parse tree is the structure of a given statement, given
  – a lexicon with parts-of-speech
  – a grammar
• A very simple sample parse tree is shown at the right. This has a Verb Phrase with a Direct Object. This Direct Object is itself a Noun Phrase.
• Difficulties: Garden path sentences
  – “The man who hunts ducks out on weekends”
• Many algorithms have been developed for parsing.

[Figure: sample parse tree. S branches into NP and VP; node labels include N, V, NP, Adj, Adj, N]
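One way to see how a lexicon and a grammar together yield a parse tree is a small top-down parser with backtracking. This is only a sketch: the lexicon entries, the function names, and the choice of a top-down strategy are assumptions for illustration, and the grammar is the simple set of rewrite rules from the earlier slide.

```python
# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {"dogs": "N", "cats": "N", "chase": "V", "big": "Adj", "small": "Adj"}

# The simple rewrite rules from the earlier slide.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

def parse(symbol, words, start):
    """Yield (tree, next_position) for each way symbol matches words[start:]."""
    if symbol not in GRAMMAR:  # part-of-speech category: consult the lexicon
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for alternative in GRAMMAR[symbol]:
        yield from match_sequence(symbol, alternative, words, start, [])

def match_sequence(symbol, remaining, words, pos, children):
    """Match the symbols in remaining one after another, backtracking on failure."""
    if not remaining:
        yield (symbol, *children), pos
        return
    for child, next_pos in parse(remaining[0], words, pos):
        yield from match_sequence(symbol, remaining[1:], words, next_pos, children + [child])

def parse_sentence(text):
    """Return all parse trees that cover the whole sentence."""
    words = text.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

trees = parse_sentence("dogs chase small cats")
print(trees)
```

For this input the only tree has a Verb Phrase whose Direct Object is itself a Noun Phrase. A word order the grammar does not license, such as "chase dogs", produces no tree.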


Psycholinguistics

• How do people process and learn language?
• Chomsky’s claims for formal (discrete) grammars:
  • All natural languages are context free
  • Children have grammatical rules wired in:
    – “I goed to the store.”
  • Competence vs. performance
    • People know what is grammatically correct even if they make errors.

• Transformational grammars describe rules for rearranging structure, such as forming a question from a declarative sentence.

• An alternative to discrete (formal) grammars is statistical (approximate) grammars. These can be learned by association.


Modeling Syntax with Statistical Models

• While most grammars are rule-based representations, a statistical representation of language may capture structure more flexibly.

• In particular, Markov models can describe the transitions between different parts of speech. For instance, Nouns are often followed by Verbs, but Adjectives are rarely followed by Verbs.
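A first-order Markov model of this kind can be sketched by counting which tag follows which. The tagged sequences below are invented for illustration, not taken from a real corpus, and the function name is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sentences (part-of-speech tags only).
TAGGED_SENTENCES = [
    ["Adj", "N", "V", "N"],
    ["N", "V", "Adj", "N"],
    ["N", "V"],
    ["Adj", "Adj", "N", "V", "N"],
]

def transition_probabilities(sequences):
    """Estimate P(next tag | current tag) from counts of adjacent tag pairs."""
    counts = defaultdict(Counter)
    for tags in sequences:
        for current, following in zip(tags, tags[1:]):
            counts[current][following] += 1
    return {
        current: {tag: n / sum(followers.values()) for tag, n in followers.items()}
        for current, followers in counts.items()
    }

probs = transition_probabilities(TAGGED_SENTENCES)
print(probs["N"].get("V", 0.0))    # how often a noun is followed by a verb
print(probs["Adj"].get("V", 0.0))  # how often an adjective is followed by a verb
```

With realistic data the estimates would be smoothed so that unseen transitions do not receive a probability of exactly zero.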


Words

• What exactly is a word? (This also matters for the design of search engines.)
  – Sail-boat, Pennsylvania, 555-1212, F-16
• Definitions of words
  – Why aren’t the definitions of words in dictionaries all the same?
  – Are exact definitions of words possible?
• Across time, across groups
  – How do words evolve in meaning?
    • Sometimes by radial categories (that is, often by metaphor)
• What is the relationship between concepts and words?
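The question of what counts as a word becomes concrete when text is split into tokens, as a search engine must do. The sketch below applies two hypothetical tokenization rules to the slide's examples; both function names and both regular expressions are illustrative choices, not a standard.

```python
import re

TEXT = "Sail-boat, Pennsylvania, 555-1212, F-16"

def split_on_non_alphanumerics(text):
    """Treat every run of letters or digits as a word, so hyphens split words."""
    return re.findall(r"[A-Za-z0-9]+", text)

def keep_internal_hyphens(text):
    """Treat hyphen-joined runs such as F-16 as a single word."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

print(split_on_non_alphanumerics(TEXT))
print(keep_internal_hyphens(TEXT))
```

Under the first rule a query for "F-16" is indexed as the separate tokens "F" and "16"; under the second it stays whole. Neither rule is right for every case.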


Beyond Traditional Dictionaries: WordNet and FrameNet

• WordNet http://wordnet.princeton.edu/

– Shows hierarchical relationships for dictionary terms. Very loosely, this can be thought of as an ontology.

• FrameNet http://framenet.icsi.berkeley.edu/

– Shows the elements usually associated with a concept.

– For verbs, shows the relationship among concepts. For instance, “to give” implies that there is a gift, a gifter, and a giftee.
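The two ideas can be sketched as simple data structures: a chain of "is-a" links for the hierarchy, and a record with named roles for the frame. The entries below are hand-written toy data, not actual WordNet or FrameNet content, and the role names follow the slide's wording, not FrameNet's own labels.

```python
from dataclasses import dataclass

# Toy "is-a" links in the spirit of WordNet; hand-written, not real WordNet data.
HYPERNYM = {"poodle": "dog", "dog": "canine", "canine": "mammal", "mammal": "animal"}

def hypernym_chain(term):
    """Follow is-a links upward from term to the most general term."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

@dataclass
class GiveFrame:
    """Toy frame for "to give"; role names follow the slide, not FrameNet."""
    gifter: str
    giftee: str
    gift: str

print(hypernym_chain("poodle"))
print(GiveFrame(gifter="Alice", giftee="Bob", gift="a book"))
```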


Semantics

• Very different surface structures can have similar semantics.

• The semantics of natural language is often judged by the meaning and relationship of the components. Subjective and contextualized meaning is treated as pragmatics, which we will discuss later.

• The semantics of a program, that is, of statements in a computer programming language, can be determined from its behavior.
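Both points can be illustrated with two functions that look very different on the surface but behave identically on the inputs checked. This is a sketch with illustrative function names, and testing a finite range of inputs only suggests equivalence; it does not prove it.

```python
def sum_by_loop(n):
    """Add the integers 1..n one at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_by_formula(n):
    """Use the closed form n(n+1)/2."""
    return n * (n + 1) // 2

# Different surface structure, same observable behavior on these inputs.
same_behavior = all(sum_by_loop(n) == sum_by_formula(n) for n in range(100))
print(same_behavior)
```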


Representing Semantics

• Semantic grammars
  – Even with different surface structures, can we develop a standard representation for the meaning?
• Interlingua
  – A common representation for meaning across languages. This could be useful for translation.


Pragmatics: Social Uses of Language

• Pragmatics extends the literal semantics to consider other ways language is used (from R. Jakobson).
  – Referential
    • Conveys information about some real phenomenon
    • This is what we think of as normal language use
  – Expressive
    • Describes feelings of the speaker
  – Conative
    • Attempts to elicit some behavior from the addressee
  – Phatic
    • Builds a relationship between both parties in a conversation
  – Meta-lingual
    • Self-references
  – Poetic
    • Focuses on the text independent of reference


Discourse

• Sentences form macro-structures or super-structures of meaning. This includes structured language such as argumentation, negotiation, news, narrative, and explanations.

• What are the components (elements) and structure of discourse? For instance, messages can be structured to make them clear for listeners.

• Given-New: Bill (a person you know) went to the store (is in a new location)

• Theme-Rheme: When in Rome (theme), do as the Romans do (rheme)


Argumentation

• Toulmin has proposed a general structure for arguments

• There are many complex structured verbal interactions
  – Legal arguments
  – Design rationale
  – Negotiations

[Figure: Toulmin argument structure, with the elements Claim, Grounds, Rebuttal, and Evidence]


Explanations and Causation

• An explanation consists of
  – Two types of phenomena being explained
    • Causal antecedents
      – How do we explain the American Civil War?
    • Sub-processes
      – How does a gasoline engine work?
  – Background for the person receiving the explanation needs to be considered.


Stories and Narrative

• (Goals + Events + Resolution) + Characters
• Many stories seem highly structured
  – Some stories seem so structured that they have been described as “story grammars”. This is most notably true of Russian fairy tales.
• Many stories also reflect familiar human quandaries
  – “Romeo and Juliet”
• Interactive and dynamic narrative (useful in games)
  – Could we become a player in an interactive “Romeo and Juliet”?


Conversation

• Conversation adds a social and interactive component to language

• Conversational norms (Maxims)
  • Truthful, informative, relevant, clear
  • But these are routinely violated.
    • e.g., shaggy dog stories.
• Managing conversations
  – Opening / Closing
  – Turn taking


How close to Passing the Turing Test?

Chatterbots

IBM “Watson” plays Jeopardy.



Natural Language Processing (NLP)

We will revisit natural language in a few weeks when we look at the use of natural language in information systems.


Formal Languages

• Programming languages
  • High-level languages (e.g., C++) are built to simplify the use of low-level machine language

• Debugging tools typically check syntax but not semantics
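The gap between syntax and semantics can be shown with Python's own parser, which accepts any well-formed program whether or not it does what was intended. The `average` function and its bug are invented for illustration; `ast.parse` is the standard-library call that performs the syntax check.

```python
import ast

# Hypothetical function that is meant to return the average of a list.
SOURCE = '''
def average(values):
    return sum(values) / (len(values) + 1)   # semantic bug: wrong divisor
'''

# Malformed text: the colon after the parameter list is missing.
BROKEN_SOURCE = "def average(values) return sum(values)"

def is_syntactically_valid(source):
    """Return True if Python's parser accepts the source text."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(is_syntactically_valid(SOURCE))         # the parser accepts the buggy code
print(is_syntactically_valid(BROKEN_SOURCE))  # the parser rejects the malformed code

namespace = {}
exec(SOURCE, namespace)
print(namespace["average"]([2, 4, 6]))  # runs without error, but is not the intended 4.0
```

Catching the wrong divisor requires a statement of what the function should do, such as a test that compares its output with the expected average.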
