CC 2007, 2011 attribution R.B. Allen Languages: Natural and Formal

Languages: Natural and Formal

Language

• Definition
  – In math and computer science:
    • A lexicon & rules for combining terms from the lexicon
  – In common use:
    • Structured verbal interaction between people
    • Any structured interaction such as “The Language of Film”

• Are computer languages a model for human natural language?

Wide Variability among Natural Languages

• Sentence Structure
  – SVO (Subject-Verb-Object) (English, Chinese)
  – VSO (Gaelic/Celtic)
  – SOV (Hindi, Japanese, Hopi)

• Written
  – Ideographic (Chinese)
  – Syllabic (Thai)
  – Alphabetic (English)

• Spoken
  – Tonal (Chinese)
  – Non-tonal (English)

Layers of Natural Language

• Words
  – Morphology, Orthography, Phonetics, Phonology

• Syntax
  – Phrase and sentence structure based on parts of speech

• Semantics
  – Literal meaning

• Pragmatics/Discourse
  – Uses beyond the literal meaning

Grammars

• Grammars are most often associated with modeling syntax, though semantic grammars are also possible. In the broadest sense, grammars are rules for languages.

• The grammars most widely used in formal work are “context-free”. That is, the way a constituent is rewritten does not depend on its context.

• The grammars used for natural language syntax are usually “constituent grammars”. That is, they identify the relationship of the components (constituents) of the phrase.

• Grammars taught in grade school are “prescriptive” grammars. Grammars in the formal analysis of language are “descriptive” and usually “generative”.

• Grammars are usually defined by rules, but statistical transition networks are also used to model the structure of language.

Modeling Natural Language Syntax with Grammars

• Rewrite (or production) rules (phrase-structure grammar)

• A very simple example of rewrite rules

S → NP + VP
NP → N, Adj + N
VP → V, V + NP
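
The rewrite rules above can be run mechanically to generate sentences. The following is a minimal sketch, not part of the original slides: the dictionary names `RULES` and `WORDS` and the tiny vocabulary are assumptions chosen for illustration. Because this grammar has no recursive rules, the set of sentences is finite and can be listed in full.

```python
from itertools import product

# Rewrite rules from the slide: each symbol maps to its alternative expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

# Hypothetical lexicon (an assumption for this sketch), grouped by part of speech.
WORDS = {
    "N": ["dogs", "cats"],
    "V": ["chase"],
    "Adj": ["big"],
}


def expand(symbol):
    """Return every word sequence that can be derived from `symbol`."""
    if symbol in WORDS:
        # A part of speech rewrites to any single word in that class.
        return [[word] for word in WORDS[symbol]]
    results = []
    for right_hand_side in RULES[symbol]:
        # Combine every derivation of each symbol on the right-hand side.
        for parts in product(*(expand(s) for s in right_hand_side)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

With this vocabulary the grammar yields short sentences such as "dogs chase" as well as longer ones such as "big dogs chase big cats". Adding a recursive rule would make the language infinite, and this exhaustive listing would no longer terminate.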

Parsing

• Can we identify the grammatical structure of a given statement?

• Parsing is the basis of syntax checking for computer program compilers.

• A parse tree is the structure of a given statement given
  – a lexicon with parts of speech
  – a grammar

• A very simple sample parse tree is shown at the right. This has a Verb Phrase with a Direct Object. This Direct Object is itself a Noun Phrase.

• Difficulties: Garden path sentences
  – “The man who hunts ducks out on weekends”

• Many algorithms have been developed for parsing.

[Figure: sample parse tree. S branches into NP and VP; the remaining node labels are N, V, NP, Adj, Adj, N.]
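
To make the idea of a parse tree concrete, here is a small sketch of a top-down parser with backtracking for the simple phrase-structure grammar used earlier. It is only one of many possible parsing algorithms and is not taken from the slides; the names `GRAMMAR`, `LEXICON`, `parse`, and `parse_sentence`, and the example vocabulary, are assumptions made for illustration.

```python
# Simple phrase-structure grammar: S -> NP VP, NP -> Adj N | N, VP -> V NP | V.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Hypothetical lexicon mapping each word to one part of speech.
LEXICON = {
    "dogs": "N",
    "cats": "N",
    "chase": "V",
    "sleep": "V",
    "big": "Adj",
    "small": "Adj",
}


def parse(symbol, words, start):
    """Yield (tree, next_position) for each way `symbol` can match from `start`."""
    if symbol not in GRAMMAR:
        # `symbol` is a part of speech: it matches one word with that tag.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for expansion in GRAMMAR[symbol]:
        # Each partial result is (children found so far, position reached).
        partials = [([], start)]
        for child in expansion:
            partials = [
                (children + [tree], end)
                for children, position in partials
                for tree, end in parse(child, words, position)
            ]
        for children, end in partials:
            yield (symbol, *children), end


def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


print(parse_sentence("dogs chase small cats"))
```

The tree for "dogs chase small cats" has a Verb Phrase whose Direct Object is itself a Noun Phrase, as in the slide's example. A word sequence that the grammar cannot derive, such as "chase dogs", produces an empty list.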

Psycholinguistics

• How do people process and learn language?

• Chomsky’s claims for formal (discrete) grammars:
  • All natural languages are context free
  • Children have grammatical rules wired in:
    – “I goed to the store.”

• Competence vs. performance
  • People know what is grammatically correct even if they make errors.

• Transformational grammars describe rules for rearranging structure, such as forming a question from a declarative sentence.

• An alternative to discrete (formal) grammars is statistical (approximate) grammars. These can be learned by association.

Modeling Syntax with Statistical Models

• While most grammars are a rule-based representation, a statistical representation of language may capture structure more flexibly.

• In particular, Markov models can describe the transitions between different parts of speech. For instance, Nouns are often followed by Verbs, but Adjectives are rarely followed by Verbs.
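
A first-order Markov model of part-of-speech transitions can be estimated simply by counting which tag follows which. The sketch below assumes a tiny hand-made set of tagged sentences (`TAGGED`), which is invented for illustration and is not real corpus data; the function name `transition_probabilities` is likewise an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical part-of-speech sequences, one list per sentence.
TAGGED = [
    ["Adj", "N", "V", "N"],
    ["N", "V"],
    ["N", "V", "Adj", "N"],
    ["Adj", "Adj", "N", "V"],
]


def transition_probabilities(sequences):
    """Estimate P(next tag | current tag) from adjacent tag pairs."""
    counts = defaultdict(Counter)
    for tags in sequences:
        for current, following in zip(tags, tags[1:]):
            counts[current][following] += 1
    return {
        current: {
            tag: n / sum(followers.values()) for tag, n in followers.items()
        }
        for current, followers in counts.items()
    }


PROBS = transition_probabilities(TAGGED)
for current, followers in PROBS.items():
    print(current, followers)
```

In this toy data a Noun is always followed by a Verb, while an Adjective is never followed by a Verb, which mirrors the pattern described above. Real estimates would need a large tagged corpus and some smoothing for transitions that were never observed.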

Words

• What exactly is a word? (also matters for the design of search engines)
  – Sail-boat, Pennsylvania, 555-1212, F-16

• Definitions of words
  – Why aren’t the definitions of words in dictionaries all the same?
  – Are exact definitions of words possible?

• Across time, across groups
  – How do words evolve in meaning?
    • Sometimes by radial categories (that is, often by metaphor)

• What is the relationship between concepts and words?
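
The question of what counts as a word shows up directly when text is split into tokens, for example when building a search index. The sketch below compares two simple tokenization rules on the slide's examples. Both rules are illustrative assumptions, not the behavior of any particular search engine.

```python
import re

TEXT = "Sail-boat Pennsylvania 555-1212 F-16"

# Rule 1: a word is anything between spaces.
whitespace_tokens = TEXT.split()

# Rule 2: a word is a run of letters or digits, so hyphens split words.
alnum_tokens = re.findall(r"[A-Za-z0-9]+", TEXT)

print(whitespace_tokens)
print(alnum_tokens)
```

Under the first rule "F-16" is one word; under the second it becomes "F" and "16", so a search for "F-16" would behave differently depending on which rule the index used.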

Beyond Traditional Dictionaries: WordNet and FrameNet

• WordNet http://wordnet.princeton.edu/

– Shows hierarchical relationships for dictionary terms. Very loosely, this can be thought of as an ontology.

• FrameNet http://framenet.icsi.berkeley.edu/

– Shows the elements usually associated with a concept.

– For verbs, it shows the relationship among concepts. For instance, “to give” implies that there is a gift, a gifter, and a giftee.
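
The idea of a frame, a concept together with the elements usually associated with it, can be sketched as a small data structure. This is an illustration only: the `Frame` class and its methods are hypothetical and do not reflect FrameNet's actual data format, API, or role names. The role labels are taken from the "to give" example above.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A concept plus the roles that are usually associated with it."""

    name: str
    roles: tuple
    fillers: dict = field(default_factory=dict)

    def fill(self, role, value):
        """Record which phrase plays `role`; reject roles the frame lacks."""
        if role not in self.roles:
            raise KeyError(f"{self.name} has no role {role!r}")
        self.fillers[role] = value

    def missing_roles(self):
        """Return the roles that the sentence has not yet supplied."""
        return [role for role in self.roles if role not in self.fillers]


# "Mary gave a book": the giftee is implied by the frame but not stated.
giving = Frame("give", ("gifter", "gift", "giftee"))
giving.fill("gifter", "Mary")
giving.fill("gift", "a book")
print(giving.missing_roles())
```

Listing the unfilled roles shows how a frame lets a reader or a program notice what a sentence implies but leaves unsaid.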

Semantics

• Very different surface structures can have similar semantics.

• The semantics of natural language is often judged by the meaning and relationship of the components. Subjective and contextualized meaning is considered pragmatics, which we will discuss later.

• The semantics of statements in a computer programming language (i.e., a program) can be determined from its behavior.
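
For programs, the point that different surface structures can share the same semantics can be shown by comparing behavior. The two functions below are written differently but compute the same result. The helper `same_behavior` and the sample inputs are assumptions for this sketch, and agreement on a handful of inputs is evidence of equivalence, not a proof.

```python
def total_loop(numbers):
    """Add the numbers one at a time with an explicit loop."""
    total = 0
    for n in numbers:
        total += n
    return total


def total_builtin(numbers):
    """Add the numbers with Python's built-in sum."""
    return sum(numbers)


def same_behavior(f, g, inputs):
    """Check that f and g return equal results on every sample input."""
    return all(f(x) == g(x) for x in inputs)


SAMPLES = [[], [1], [1, 2, 3], [-5, 5, 10]]
print(same_behavior(total_loop, total_builtin, SAMPLES))
```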

Representing Semantics

• Semantic grammars
  – Even with different surface structure, can we develop a standard representation for the meaning?

• Interlingua
  – A common representation for meaning across languages. This could be useful for translation.

Pragmatics: Social Uses of Language

• Pragmatics extends the literal semantics to consider other ways language is used.
  – Referential
    • Conveys information about some real phenomenon
    • This is what we think about as normal language use
  – Expressive
    • Describes feelings of the speaker
  – Conative
    • Attempts to elicit some behavior from the addressee
  – Phatic
    • Builds a relationship between both parties in a conversation
  – Meta-lingual
    • Self-references
  – Poetic
    • Focuses on the text independent of reference

from R. Jakobson

Discourse

• Sentences form macro-structures or super-structures of meaning. This includes structured language such as argumentation, negotiation, news, narrative, and explanations.

• What are the components (elements) and structure of discourse? For instance, messages can be structured to make them clear for listeners.

• Given-New: Bill (a person you know) went to the store (is in a new location)

• Theme-Rheme: When in Rome (theme), do as the Romans do (rheme)

Argumentation

• Toulmin has proposed a general structure for arguments

• There are a lot of complex structured verbal interactions
  – Legal arguments
  – Design rationale
  – Negotiations

[Figure: argument structure with the elements Claim, Grounds, Rebuttal, and Evidence]

Explanations and Causation

• An explanation consists of
  – Two types of phenomena being explained
    • Causal antecedents
      – How do we explain the American Civil War?
    • Sub-processes
      – How does a gasoline engine work?
  – Background for the person receiving the explanation needs to be considered.

Stories and Narrative

• (Goals + Events + Resolution) + Characters

• Many stories seem highly structured
  – Some stories seem so structured that they have been described as “story grammars”. This is most notably true of Russian Fairy Tales.

• Many stories also reflect familiar human quandaries
  – “Romeo and Juliet”

• Interactive and dynamic narrative (useful in games)
  – Could we become a player in an interactive “Romeo and Juliet”?

Conversation

• Conversation adds a social and interactive component to language

• Conversational norms (Maxims)
  • Truthful, informative, relevant, clear
  • But these are routinely violated.
    • e.g. shaggy dog stories.

• Managing conversations
  – Opening / Closing
  – Turn taking

How close to passing the Turing Test?

• Chatterbots

• IBM “Watson” plays Jeopardy.

Natural Language Processing (NLP)

We will revisit natural language in a few weeks when we look at the use of natural language in information systems.

Formal Languages

• Programming languages
  • High-level languages (e.g., C++) are built to simplify the use of low-level machine language

• Debugging tools typically check syntax but not semantics
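
The gap between checking syntax and checking semantics can be seen with a small experiment. The sketch below uses Python's standard `ast` module to test whether source text is syntactically valid, then runs a function that passes that check but computes the wrong answer. The `average` function and its deliberate bug are invented for this illustration.

```python
import ast

# Syntactically valid, but the "+ 1" makes the result wrong (a semantic error).
SOURCE = "def average(values):\n    return sum(values) / (len(values) + 1)\n"


def has_valid_syntax(source):
    """Return True if `source` parses as Python, False on a syntax error."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(has_valid_syntax(SOURCE))          # the buggy code passes the syntax check
print(has_valid_syntax("def broken(:"))  # malformed code is rejected

# Running the code shows the semantic problem the syntax check cannot see.
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
result = namespace["average"]([2, 4, 6])
print(result)  # the intended mean of 2, 4, 6 is 4.0
```

Only a test against the intended meaning, here the known mean of the sample values, reveals that the program is wrong.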
