Martin Kay, Stanford University. Ling 138/238: Introduction to Computational Linguistics

Page 1: Martin Kay Stanford University

Martin Kay

Stanford University

Ling 138/238

Page 2: Martin Kay Stanford University

Sep 30  Introduction
Oct 1   Complexity; String search
    6   Knuth-Morris-Pratt; Boyer-Moore
    8   Suffix Trees
    13  Tagging; Alignment
    15
    20  Chomsky Hierarchy; Regular Expressions
    22
    27  Finite-state automata
    29

Page 3: Martin Kay Stanford University

Nov 3   Morphology
    5
    10  Context-free grammar
    12
    17  Unification, HPSG, LFG
    19
    24  Machine Translation
    26
Dec 1   Summary; Wrap-up
    3

Page 4: Martin Kay Stanford University

Martin Kay

[email protected]

740 3043

Margaret Jacks 124

Office hours: TuTh 4.15-5.45 p.m.

Linguistics 138/238

Page 5: Martin Kay Stanford University

Prerequisites and Expectations

• No prerequisites
• Classroom participation
• Occasional readings
• Learn Prolog
• Laboratory sessions
• Homework Problems
• Project

Page 6: Martin Kay Stanford University

Project

• Learn something new about language
• Significant programming
• Group work
• Modifying or amplifying existing code

An HMM-based tagger
A searcher for tagged text
Implementation of suffix trees
Morphological analysis
Named-entity recognition

Page 7: Martin Kay Stanford University

Intellectual Relations

Relation to
— Linguistics
— Psychology
— Artificial Intelligence
— Computer Science

Page 8: Martin Kay Stanford University

Computational Linguistics as Science

Page 9: Martin Kay Stanford University

Ideas from Computing

Search
Divide and Conquer
Guides and Oracles
Nondeterminism
Dynamic Programming
Scheduling, agendas
Compilation
Unification
Automata Theory
Co-routining and parallelism
Top-down vs. bottom-up
Complexity

Page 10: Martin Kay Stanford University

Ideas from Computing

Search

Nondeterminism

Dynamic Programming

Page 11: Martin Kay Stanford University

A Maze

Keep your right hand on the wall

Page 12: Martin Kay Stanford University

A Maze

[Figure: a maze annotated with three "Backup!" labels and a final "Out!"]
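The backing up in the maze can be sketched as a depth-first search with backtracking. The grid, the cell encoding and the function name `solve` are illustrative assumptions, not taken from the slides, and the path found is not necessarily the shortest one.

```python
# Hypothetical maze: '#' is a wall, ' ' is open, 'S' is the start, 'E' is the exit.
MAZE = [
    "#######",
    "#S  # #",
    "# # # #",
    "# #   E",
    "#######",
]

def solve(maze):
    """Return a list of (row, col) cells from 'S' to 'E', or None if there is no way out."""
    rows, cols = len(maze), len(maze[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if maze[r][c] == "S")
    visited = set()

    def search(cell):
        r, c = cell
        if not (0 <= r < rows and 0 <= c < cols):
            return None
        if maze[r][c] == "#" or cell in visited:
            return None
        visited.add(cell)
        if maze[r][c] == "E":
            return [cell]                      # Out!
        # Try the alternatives in a fixed order: down, right, up, left.
        for dr, dc in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            rest = search((r + dr, c + dc))
            if rest is not None:
                return [cell] + rest
        return None                            # dead end: back up

    return search(start)

path = solve(MAZE)
```

With this grid the search first walks down the left-hand corridor, hits a dead end, backs up, and only then finds the route along the top.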

Page 13: Martin Kay Stanford University

Nondeterminism

• A process is nondeterministic if there are points in it when a choice must be made, but the information necessary to make the choice is not available.

• Solution: Pick one of the alternatives. If it does not work out, come back and pick another one.

• Note: the information required to make the choice was available after all!
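A minimal sketch of "pick one, and come back if it does not work out", using word segmentation as the example. The lexicon and the function name `segment` are assumptions made for illustration; the lexicon is chosen so that the first choice for "mend" fails.

```python
# Hypothetical lexicon, chosen so that an early choice turns out to be wrong.
LEXICON = {"the", "them", "theme", "end", "men", "mend"}

def segment(text, lexicon):
    """Return one segmentation of text into lexicon words, or None if there is none."""
    if text == "":
        return []
    # Choice point: any prefix that is a word might be the right one,
    # and nothing seen so far says which.
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in lexicon:
            rest = segment(text[end:], lexicon)   # pick this alternative
            if rest is not None:
                return [word] + rest
            # It did not work out: fall through and pick another.
    return None

result = segment("themend", LEXICON)
```

After "the", the procedure first tries "men", finds that "d" is not a word, and comes back to try "mend". The information needed to choose was in the rest of the string all along.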

Page 14: Martin Kay Stanford University

Dynamic Programming

      p  o  u  r
  f   1  2  3  4
  o   2  1  2  3
  r   3  2  2  2
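The table for "for" and "pour" is consistent with edit distance under unit costs for insertion, deletion and substitution. That cost model is an assumption here, since the slide does not state it. A minimal sketch, with the hypothetical name `edit_distance_table`:

```python
def edit_distance_table(source, target):
    """Return d, where d[i][j] is the edit distance between source[:i] and target[:j]."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                      # delete i characters
    for j in range(cols):
        d[0][j] = j                      # insert j characters
    for i in range(1, rows):
        for j in range(1, cols):
            # Substitution costs 1 if the characters differ, 0 if they match.
            substitution = d[i - 1][j - 1] + (source[i - 1] != target[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, substitution)
    return d

table = edit_distance_table("for", "pour")
```

Each cell is computed once from three neighbours that are already filled in, so no sub-problem is solved twice. Dropping row 0 and column 0 of `table` leaves the three rows shown on the slide.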

[Figure: a road network linking Paris, Dijon, Mulhouse, Strasbourg, Chalons and Metz, with the numeric labels 266, 192, 161, 344, 276, 115, 234, 288, 458, 620 and 619]

Page 15: Martin Kay Stanford University

The CKY Chart

[Chart over the words "people like the French drink"; entries listed by row, column positions not recoverable]
people: np, np, np; s, s, s
like:   prep, pp, pp; v, vp, vp
the:    det, np, np
French: adj, n; n, n
drink:  n; vp

Context free: All phrases with the same
— Coverage, and
— Category
enter into larger phrases as a single item
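A minimal sketch of a CKY recognizer for the same five words. The lexicon and the binary rules are assumptions chosen to resemble the categories in the chart; they are not the grammar used in the course. Each chart cell is a set, so a category with a given coverage is stored once however many ways it can be derived.

```python
from collections import defaultdict

# Hypothetical lexicon and binary rules, chosen to resemble the chart's categories.
LEXICON = {
    "people": {"np"},
    "like": {"prep", "v"},
    "the": {"det"},
    "French": {"adj", "n"},
    "drink": {"n", "vp"},
}
RULES = {
    ("np", "vp"): "s",
    ("v", "np"): "vp",
    ("prep", "np"): "pp",
    ("det", "n"): "np",
    ("adj", "n"): "n",
    ("n", "n"): "n",
    ("np", "pp"): "np",
}

def cky(words):
    """Return chart, where chart[(i, j)] is the set of categories covering words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[(i, i + 1)] |= LEXICON[word]
    for length in range(2, n + 1):               # shorter spans before longer ones
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):            # every split point of the span
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        parent = RULES.get((left, right))
                        if parent:
                            chart[(i, j)].add(parent)
    return chart

chart = cky("people like the French drink".split())
```

Under these assumed rules the whole string is covered by both `np` and `s`, and "like the French" is covered by both `pp` and `vp`.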

Page 16: Martin Kay Stanford University

Ideas from Computing

Unification

Page 17: Martin Kay Stanford University

Unification

Attribute     Report 1          Report 2          Combined Report
eyes          blue              blue              blue
hair          black or brown    brown or red      brown
accent        Italian (one report only)           Italian
wife          see below         see below         see below
children      Ahmed & Angela    Rebecca & Angela  Ahmed, Angela & Rebecca
age           middle            48                middle

Wife
eyes          brown (one report only)             brown
weight        247 lbs           112 kg            247 lbs
disposition   surly (one report only)             surly

Page 18: Martin Kay Stanford University

Unification

Attribute     Report 1          Report 2          Combined Report
eyes          blue              blue              blue
hair          black or brown    brown or red      brown
accent        Italian (one report only)           Italian
wife          see below         see below         see below
children      Ahmed & Angela    Rebecca & Angela  Ahmed, Angela & Rebecca
age           middle            48                middle

Wife
eyes          brown             grey              FAIL
weight        247 lbs           112 kg            247 lbs
disposition   surly (one report only)             surly
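The way the two reports combine, and the way they fail, can be sketched as unification over nested attribute-value structures. The representation below (dicts for structures, sets for alternative atomic values) and the name `unify` are assumptions for illustration. Merging the lists of children and reconciling lbs with kg are outside this sketch.

```python
def unify(a, b):
    """Unify two descriptions; return the combined description, or None on failure.

    A description is either a dict mapping attributes to descriptions, or a set
    of alternative atomic values (a one-element set means the value is known).
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)                      # shallow copy, so a is not modified
        for key, b_value in b.items():
            if key in result:
                combined = unify(result[key], b_value)
                if combined is None:
                    return None               # one clash makes the whole thing fail
                result[key] = combined
            else:
                result[key] = b_value         # information present in b only
        return result
    if isinstance(a, (set, frozenset)) and isinstance(b, (set, frozenset)):
        common = a & b                        # keep only values both allow
        return common if common else None
    return None                               # a structure cannot unify with an atom

report1 = {"eyes": {"blue"}, "hair": {"black", "brown"}, "wife": {"eyes": {"brown"}}}
report2 = {"eyes": {"blue"}, "hair": {"brown", "red"}}
combined = unify(report1, report2)

# The second version of the table: Report 2 now says the wife's eyes are grey.
report2_grey = dict(report2, wife={"eyes": {"grey"}})
failed = unify(report1, report2_grey)
```

The failure on the wife's eyes propagates upward, so the two reports as a whole cannot describe the same person.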

Page 19: Martin Kay Stanford University

English Agreement

The dog sleeps

The dogs sleep

The dog slept

The dogs slept

The sheep sleeps

The sheep sleep

The sheep slept

The sheep that was in the barn slept

The sheep that were in the barn slept
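One way to read these examples is that some forms leave number unspecified ("sheep", "slept"), and agreement succeeds whenever the number features of subject and verb unify. A minimal sketch under that reading; the feature values and the names `unify_number` and `agrees` are assumptions:

```python
# Hypothetical number features; None means the form leaves number unspecified.
NOUNS = {"dog": "sg", "dogs": "pl", "sheep": None}
VERBS = {"sleeps": "sg", "sleep": "pl", "slept": None, "was": "sg", "were": "pl"}

def unify_number(a, b):
    """Return (ok, value): ok is False on a clash; value is the more specific number."""
    if a is None:
        return True, b
    if b is None or a == b:
        return True, a
    return False, None

def agrees(noun, verb):
    """True if the noun and the verb have compatible number."""
    ok, _ = unify_number(NOUNS[noun], VERBS[verb])
    return ok
```

Here `agrees("sheep", "sleeps")` and `agrees("sheep", "sleep")` both succeed, while `agrees("dogs", "sleeps")` fails. In the relative-clause examples, "was" or "were" fixes the number of "sheep" even though "slept" says nothing about it.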

Page 20: Martin Kay Stanford University

German Case

Der Junge sah den Lehrer
Den Lehrer sah der Junge
Das Mädchen sah der Junge
Der Junge sah das Mädchen
Die Lehrerin sah den Lehrer
Die Lehrerin sah das Mädchen

Page 21: Martin Kay Stanford University

Ideas from Computing

Finite-State Methods

Page 22: Martin Kay Stanford University

Finite-State Methods in Language Processing

The application of a branch of mathematics
— The regular branch of automata theory
to a branch of computational linguistics in which what is crucial is (or can be reduced to)
— Properties of string sets and string relations with
— A notion of bounded dependency

Page 23: Martin Kay Stanford University

Applications

• Finite Languages
  — Dictionaries
  — Compression
• Phenomena involving bounded dependency
  — Morphology
    • Spelling
    • Hyphenation
    • Tokenization
    • Morphological Analysis
  — Phonology
• Approximations to phenomena involving mostly bounded dependency
  — Syntax
• Phenomena that can be translated into the realm of strings with bounded dependency
  — Syntax
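For the first application, a dictionary is a finite language, and a trie over its words is a deterministic acyclic finite-state automaton that accepts exactly that language. A minimal sketch; the word list and the names `build_trie` and `accepts` are illustrative assumptions:

```python
def build_trie(words):
    """Build a trie: each state is a dict of transitions; the key "" marks a final state."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})   # follow or create the transition
        state[""] = True                       # a word ends in this state
    return root

def accepts(trie, word):
    """Run the automaton over word and report whether it stops in a final state."""
    state = trie
    for ch in word:
        if ch not in state:
            return False                       # no transition for this character
        state = state[ch]
    return "" in state

# Hypothetical dictionary.
trie = build_trie(["walk", "walks", "walked", "talk"])
```

Words that share a prefix share states, which is one source of the compression mentioned in the list. Sharing suffixes as well would require minimizing the automaton, which this sketch does not do.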

Page 24: Martin Kay Stanford University

Ideas from Computing

Complexity

Page 25: Martin Kay Stanford University

The Chomsky Hierarchy

Grammar                                   Language                      Automaton
Type 0                                    Recursively Enumerable Sets   Turing Machines
Context-sensitive                         Context-sensitive             Nondeterministic linear-space-bounded Turing Machines
Context-free                              Context-free                  Nondeterministic push-down automata
LR(k)                                     Deterministic Context-free    Deterministic push-down automata
Regular Expressions, Left (Right) Linear  Regular Sets                  Finite-state automata
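The difference between the two lowest rows can be illustrated with a standard pair of languages, which is an example added here rather than one from the slides: a*b* is regular, while a^n b^n is context-free but not regular, because matching the counts needs unbounded memory. A minimal sketch with hypothetical function names:

```python
import re

def in_a_star_b_star(s):
    """Regular language a*b*: a finite-state device (here a regular expression) suffices."""
    return re.fullmatch(r"a*b*", s) is not None

def in_an_bn(s):
    """Context-free language a^n b^n: a push-down store is used to match the counts."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("a")                 # push one symbol per 'a'
        i += 1
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()                       # pop one symbol per 'b'
        i += 1
    # Accept only if the input is used up and every 'a' was matched.
    return i == len(s) and not stack
```

For instance, `in_a_star_b_star("aab")` holds but `in_an_bn("aab")` does not, whereas "aabb" is in both languages.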

Page 26: Martin Kay Stanford University

Computation and Psychology

Sentence Processing

Page 27: Martin Kay Stanford University

Computational Linguistics as Engineering

Page 28: Martin Kay Stanford University

Tools for Linguists

• TLF, OED
• Corpus Linguistics
• Field Notes
• Grammar Testing

Page 29: Martin Kay Stanford University

Translation

• MT, Translator's Tools
• Alignment, Dictionaries, Term Banks
• Normalization and Tuning

Page 30: Martin Kay Stanford University

Other Applications

• Writer's Tools
  — Spelling
  — Dictionary, Thesaurus
  — Grammar
• Natural Language Interfaces
• Information Storage and Retrieval

Page 31: Martin Kay Stanford University

CL & AI

[Diagram labels: Text, Interpretation, Meaning, Linguistics ???]

• Text, Meaning, and Interpretation