
The Complexity Sources and their Compensation in Language Processing

Philippe Blache

Laboratoire Parole et Langage, CNRS & Aix-Marseille Université


Situation: different kinds of complexity

- System vs. structural complexity (Dahl, 2004)

  System complexity:
  - Number of categories in each domain
  - Number of features for each category
  - Grammar size for each domain
  - Lexicon size
  - Average local complexity

  Structural complexity:
  - Number of categories
  - Depth
  - Number of rules used to build the structure
  - Number of words

- Absolute vs. relative complexity (Miestamo, 2008)
  - Absolute complexity: theory-oriented
  - Relative complexity: user-dependent

- Our perspective: structural relative complexity

Dahl O. (2004) The Growth and Maintenance of Linguistic Complexity, John Benjamins.
Miestamo M. (2008) “Grammatical complexity in a cross-linguistic perspective”, in Language Complexity, John Benjamins.


Difficulty Models in Psycholinguistics

- Existing models
  - Incomplete Dependency Hypothesis
  - Dependency Locality Theory
  - Early Immediate Constituents Principle
  - Activation

- However, they fail at:
  - Describing language in its natural environment
  - Explaining the interaction between sources of information

Gibson E. (1998) “Linguistic complexity: Locality of syntactic dependencies”, Cognition, 68.
Gibson E. (2000) “The dependency locality theory: A distance-based theory of linguistic complexity”, in Image, language, brain, MIT Press.
Hawkins J. (2001) “Why are categories adjacent?”, Journal of Linguistics, 37.
Vasishth S. (2003) “Quantifying processing difficulty in human sentence parsing: The role of decay, activation, and similarity-based interference”, in Proceedings of the European Cognitive Science Conference 2003.


Language processing in the real world

- New challenges for Linguistics, Psycholinguistics and NLP
  - Dealing with natural data
  - Language in its context: spoken language, natural interaction

- Issues
  - Units cannot always be determined and are difficult to categorize (gradience)
  - Information can be sparse
  - Language processing relies on the interaction between domains


Overview

- Hypothesis: difficulty depends on the size of the search space
  The larger the search space, the greater the difficulty

- Question: how can the size of the search space be controlled?

- A framework: the Maximize On-line Processing principle (Hawkins, 2004)
  The more properties, the smaller the search space

- Questions:
  - How should properties be represented?
  - How do properties implement structural complexity?
  - How does structural complexity predict processing difficulty?


Outline

- Basis: a constraint-based representation for syntax

- Cohesion: a model for syntactic complexity
  - How to measure cohesion
  - Experimenting with cohesion in human language processing: an interplay between difficulty and facilitation


Part I

Representing syntactic information


Classical Generative Syntax

Language and grammar

- Derivation
- Language = set of derived strings
- Recursively enumerable

Parsing

- Finding a derivation
- Building a tree
- Consequences:
  - There exists a complete grammar of the language
  - The initial system is a complete grammar (acquisition)


Our Proposal: Property Grammars

- Describing the characteristics of an input (not building a structure)
- Linguistic statements as constraints
- Declarative approach: no mechanism other than constraint evaluation
- Basics:
  - Constraints are independent
    - No ranking (contra OT)
    - Separate evaluation (contra GEN)
  - No hierarchical structure (contra PSG)
  - Constraints are at the same level (contra DEP)


What kind of syntactic information?

- Linear precedence
- Mandatory cooccurrence between two categories
- Impossible cooccurrence between two categories
- No repetition of the same category within a construction
- Dependency between two categories


Linearity

$$\mathrm{Prec}(A,B) : (\forall x, y)[(A(x) \wedge B(y)) \rightarrow y \nprec x]$$

- Example: nominal construction

  Det ≺ Adj    Det ≺ N     Det ≺ ProR
  Adj ≺ N      N ≺ ProR    N ≺ Prep

[Figure: linearity (l) arcs drawn over "the very famous reporter who the senator attacked ..."]
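
As a rough illustration, the linearity constraint can be checked over a sequence of category labels. The function name `precedence_holds`, the list encoding of the phrase and the reordered counter-example are assumptions made for this sketch; they are not taken from the slides.

```python
def precedence_holds(categories, a, b):
    """Prec(a, b): no token of category b may precede a token of category a."""
    positions_a = [i for i, cat in enumerate(categories) if cat == a]
    positions_b = [i for i, cat in enumerate(categories) if cat == b]
    # The constraint holds vacuously when either category is absent.
    return all(not j < i for i in positions_a for j in positions_b)


# "the very famous reporter", encoded as a list of categories (assumed encoding)
well_ordered = ["Det", "Adv", "Adj", "N"]
# Hypothetical reordering in which the adjective precedes the determiner
reordered = ["Adv", "Adj", "Det", "N"]

print(precedence_holds(well_ordered, "Det", "Adj"))  # True
print(precedence_holds(reordered, "Det", "Adj"))     # False
```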


Requirement (cooccurrence)

$$\mathrm{Req}(A,B) : (\forall x, y)[A(x) \rightarrow B(y)]$$

- Example:

  V[trans] ⇒ N[obj]    Det ⇒ N[com]         Adj ⇒ N
  ProR ⇒ N             V[ditrans] ⇒ Prep    Prep ⇒ N

- Relations without government:

  (1) a. The most interesting book of the library
      b. *A most interesting book of the library

  Sup ⇒ Det[def]

[Figure: requirement (r) arcs drawn over "The most interesting book of the library"]
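
A minimal sketch of the requirement check, applied to examples (1a) and (1b). The function name and the category encodings are assumptions; in particular, `Det[indef]` is a label invented here for the indefinite article.

```python
def requirement_holds(categories, a, b):
    """Req(a, b): if category a occurs, category b must occur as well."""
    return a not in categories or b in categories


# Assumed category encodings of examples (1a) and (1b);
# "Det[indef]" is a hypothetical label for the indefinite article.
example_1a = ["Det[def]", "Sup", "Adj", "N"]    # The most interesting book
example_1b = ["Det[indef]", "Sup", "Adj", "N"]  # *A most interesting book

print(requirement_holds(example_1a, "Sup", "Det[def]"))  # True
print(requirement_holds(example_1b, "Sup", "Det[def]"))  # False
```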


Exclusion (cooccurrence restriction)

$$\mathrm{Excl}(A,B) : (\forall x)(\nexists y)[A(x) \wedge B(y)]$$

- Examples
  - Nominal construction:

    Pro ⊗ N    N[prop] ⊗ N[com]    N[prop] ⊗ Prep[inf]

  - Relative construction:

    ProR[subj] ⊗ N[subj]
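
The exclusion constraint can be sketched the same way. The function name and the two category lists are illustrative assumptions.

```python
def exclusion_holds(categories, a, b):
    """Excl(a, b): categories a and b may not occur in the same construction."""
    return not (a in categories and b in categories)


# Pro ⊗ N, evaluated on two hypothetical nominal constructions
print(exclusion_holds(["Det", "N"], "Pro", "N"))  # True: no pronoun present
print(exclusion_holds(["Pro", "N"], "Pro", "N"))  # False: pronoun and noun cooccur
```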


Uniqueness

$$\mathrm{Uniq}(A) : (\forall x, y)[(A(x) \wedge A(y)) \rightarrow x \approx y]$$

- Example (nominal construction):

  Uniq = {Det, ProR, Prep[inf], Adv}

[Figure: uniqueness (u) arcs drawn over "The book that I read"]
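
A short sketch of the uniqueness check, using the Uniq set given above. The function name, the category encoding of "The book that" and the doubled-determiner counter-example are assumptions.

```python
from collections import Counter


def uniqueness_holds(categories, unique_categories):
    """Uniq: each listed category occurs at most once in the construction."""
    counts = Counter(categories)
    return all(counts[cat] <= 1 for cat in unique_categories)


UNIQ = {"Det", "ProR", "Prep[inf]", "Adv"}

print(uniqueness_holds(["Det", "N", "ProR"], UNIQ))  # True: "The book that"
print(uniqueness_holds(["Det", "Det", "N"], UNIQ))   # False: hypothetical doubled determiner
```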


Dependency: the type hierarchy

[Figure: type hierarchy rooted at dep; second level: mod, comp, aux, conj; third level: subj, obj, iobj, xcomp]

- dep: generic dependency relation
- mod: modification (typically adjunction)
- spec: specification (typically Det-N)
- comp: head-complement
- subj: subject
- obj: direct object
- iobj: indirect object
- xcomp: other complements (e.g. N-Prep)
- aux: auxiliary-verb
- conj: conjunction


Dependency

- Example:

  Det $\leadsto_{spec}$ N[com]    Adj $\leadsto_{mod}$ N    ProR $\leadsto_{mod}$ N

[Figure: dependency arcs (spec, mod, comp) drawn over "The most interesting book of the library"]

- Syntactic role: N[subj] $\leadsto_{subj}$ V

- Agreement: Det[agr_i] $\leadsto_{spec}$ N[agr_i]; Adj[agr_i] $\leadsto_{mod}$ N[agr_i]


Example: the nominal construction

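- Linearity: Det ≺ {Det, Adj, ProR, Prep, N}; N ≺ {Prep, ProR}
- Exclusion: Pro ⊗ {Det, Adj, ProR, Prep, N}; N[prop] ⊗ Det
- Uniqueness: Uniq = {Pro, Det, N, ProR, Prep}
- Dependency: Det $\leadsto_{spec}$ N; Adj $\leadsto_{mod}$ N; ProR $\leadsto_{mod}$ N; Prep $\leadsto_{mod}$ N
- Requirement: Det ⇒ N[com]; {Adj, ProR, Prep} ⇒ N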


Syntactic Representation: Constraint Graph

[Figure: constraint graph over "The most interesting book of the library", with edges labelled l (linearity), r (requirement), u (uniqueness) and by dependency type (spec, mod, comp)]


Constraint violation

Constraint graph and characterization

"The very old book"

[Figure: constraint graph with l, d and r edges]

P+ = {Det ≺ Adj, Det ≺ N, Adv ≺ Adj, Adj ≺ N, Det ⇝ N, Adj ⇝ N, Adv ⇝ Adj, Det ⇒ N, Adv ⇒ Adj, Adj ⇒ N}

P− = ∅

"Very old the book"

[Figure: constraint graph with l, d, c and r edges]

P+ = {Det ≺ N, Adv ≺ Adj, Adj ≺ N, Det ⇝ N, Adj ⇝ N, Adv ⇝ Adj, Det ⇒ N, Adv ⇒ Adj, Adj ⇒ N}

P− = {Det ≺ Adj}
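
The two characterizations can be reproduced with a small evaluator. This is a sketch under stated assumptions: constraints are encoded as hypothetical `(kind, a, b)` tuples, only linearity and requirement are evaluated, and dependency constraints are left out for brevity.

```python
def evaluate(constraint, categories):
    """Return True if a ('prec' | 'req', a, b) constraint is satisfied."""
    kind, a, b = constraint
    if kind == "prec":
        pos_a = [i for i, c in enumerate(categories) if c == a]
        pos_b = [i for i, c in enumerate(categories) if c == b]
        return all(not j < i for i in pos_a for j in pos_b)
    if kind == "req":
        return a not in categories or b in categories
    raise ValueError(f"unknown constraint type: {kind}")


def characterize(categories, grammar):
    """Split the grammar into satisfied (P+) and violated (P-) constraints."""
    p_plus = [c for c in grammar if evaluate(c, categories)]
    p_minus = [c for c in grammar if not evaluate(c, categories)]
    return p_plus, p_minus


# Linearity and requirement constraints from the characterization above
GRAMMAR = [
    ("prec", "Det", "Adj"), ("prec", "Det", "N"),
    ("prec", "Adv", "Adj"), ("prec", "Adj", "N"),
    ("req", "Det", "N"), ("req", "Adv", "Adj"), ("req", "Adj", "N"),
]

_, violated = characterize(["Det", "Adv", "Adj", "N"], GRAMMAR)  # the very old book
print(violated)  # []
_, violated = characterize(["Adv", "Adj", "Det", "N"], GRAMMAR)  # very old the book
print(violated)  # [('prec', 'Det', 'Adj')]
```

Because every constraint is evaluated independently, an ill-formed input still receives a full characterization instead of a parse failure.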


Part II

Measuring cohesion


Graph-based measures

- Definition: degree = number of edges incident to the vertex

- Degree of a category in the grammar:

  [Figure: grammar constraint graph over the categories Adj, ProR, Det, N, Prep and Pro, with l, r and d edges]

  deg[gram](N) = 9
  deg[gram](ProR) = 2
  deg[gram](Adj) = 1

- Degree of a category in the sentence:

  [Figure: constraint graph over "The old book", with l, d and c edges]

  deg[sent](N) = 5
  deg[sent](Adj) = 1
  deg[sent](Det) = 1


Category Completeness

- The completeness level of a category depends on the number of relations in its description

- This measure also depends on the number of relations for the category in the grammar

- Completeness ratio: the more of the category's grammar relations are verified in the sentence, the higher the completeness value

$$\mathrm{completeness}(cat) = \frac{\mathrm{deg}_{[sent]}(cat)}{\mathrm{deg}_{[gram]}(cat)}$$
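
As a quick worked example, the ratio can be computed from the degrees quoted on the graph-based measures slide (N: 5 in the sentence, 9 in the grammar; Adj: 1 and 1). Pairing those figures is done here only for illustration, and returning 0 for a category without grammar relations is an assumption of this sketch.

```python
def completeness(deg_sent, deg_gram):
    """Ratio of relations verified in the sentence to relations in the grammar."""
    # Assumption: a category with no relation in the grammar scores 0
    # instead of raising a division-by-zero error.
    return deg_sent / deg_gram if deg_gram else 0.0


print(round(completeness(5, 9), 2))  # 0.56 for N
print(completeness(1, 1))            # 1.0 for Adj
```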


Sentence Density

- Measure based on 4 types of properties: uniqueness, requirement, dependency, linearity

- Density of a construction: the ratio between the evaluated properties and the possible properties:

$$\mathrm{density}(sent) = \frac{|\mathrm{properties}(sent)|}{|\mathrm{words}(sent)|}$$


Satisfaction ratio

- All constraints can be violated

- A characterization contains both satisfied and violated constraints

- The “quality” of a construction depends on the ratio satisfied/violated

- All constraints can be weighted. We write W+ (resp. W−) for the sum of the weights of the satisfied (resp. violated) constraints:

$$\mathrm{satisfaction}(sent) = \frac{W^{+} - W^{-}}{W^{+} + W^{-}}$$
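
A minimal sketch of the satisfaction ratio. Unit weights are an assumption (the slides do not give weight values), as is the choice to score an empty characterization as 0. The constraint counts are those of the two characterizations on the constraint-violation slide (10 satisfied and none violated, then 9 satisfied and 1 violated).

```python
def satisfaction(satisfied_weights, violated_weights):
    """(W+ - W-) / (W+ + W-), where W+ and W- are summed constraint weights."""
    w_plus = sum(satisfied_weights)
    w_minus = sum(violated_weights)
    total = w_plus + w_minus
    # Assumption: an empty characterization scores 0.
    return (w_plus - w_minus) / total if total else 0.0


# Unit weights (an assumption): 10 satisfied constraints, none violated
print(satisfaction([1] * 10, []))  # 1.0
# 9 satisfied and 1 violated, as for "very old the book"
print(satisfaction([1] * 9, [1]))  # 0.8
```

The ratio ranges from -1 (everything violated) to 1 (everything satisfied), so a single violation lowers the score without zeroing it.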


Cohesion function

Given a sentence S and w the set of its words:

$$\mathrm{cohesion}(S) = \sum_{i=1}^{|S|} \mathrm{completeness}(w_i) \ast \mathrm{density}(S) \ast \mathrm{satisfaction}(S)$$

Hypothesis: cohesion is correlated with difficulty
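
The formula can be sketched directly. Since density and satisfaction do not depend on the word index, it makes no difference whether they sit inside or outside the sum. The function signature and the three-word input are hypothetical, and the sketch assumes a non-empty sentence with at least one weighted constraint.

```python
def cohesion(completeness_values, n_properties, w_plus, w_minus):
    """Sum of word completeness values times density times satisfaction."""
    # Assumes a non-empty sentence and w_plus + w_minus > 0.
    n_words = len(completeness_values)
    density = n_properties / n_words
    satisfaction = (w_plus - w_minus) / (w_plus + w_minus)
    return sum(completeness_values) * density * satisfaction


# Hypothetical three-word sentence: 6 evaluated properties, unit weights
print(cohesion([0.5, 1.0, 0.5], 6, w_plus=6, w_minus=0))  # 4.0
```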


Example 1

le côté hystérique un peu de enfin c'est normal tu vois elle souffre et machin

"the hysterical side rather of well it's normal you see she suffers and that's it"

[Figure: constraint graph over the utterance, with l, r, d and c edges]


Example 2

le chien apparemment connaissait parfaitement –    "the dog apparently knew perfectly –"
– le coin –                                        "– the area"
et euh quand on est partis                         "and hmm when we left"
le chien a décidé de nous suivre                   "the dog decided to follow us"

[Figure: constraint graphs over the three segments of the utterance, with l, r and d edges]


Evaluation

Cat     Degree-gram
Det     1
N       34
Adj     11
Adv     17
Prep    31
Pro     4
Conj    0
Aux     8
V       7
Conj    21

Word          Deg-sent   Deg-gram   Completeness
le            0          1          0
côté          5          34         0.15
hystérique    3          11         0.27
un            0          1          0
peu           0          17         0
de            2          31         0.06
enfin         0          17         0
c'            1          4          0.25
est           3          7          0.43
normal        2          11         0.18
tu vois       0          0          0
elle          1          4          0.25
souffre       2          7          0.29
et            2          0          0
machin        0          17         0
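
The Completeness column can be recomputed from the two degree columns. In this sketch the rows are copied from the table and a zero grammar degree is mapped to 0, as the rows for "tu vois" and "et" suggest. The mean over the 15 rows rounds to 0.13, the Completeness value reported for Sent. 1 in the summary table. That suggests the summary uses an average, although the slides do not say so.

```python
# (word, deg_sent, deg_gram) rows copied from the table above
ROWS = [
    ("le", 0, 1), ("côté", 5, 34), ("hystérique", 3, 11), ("un", 0, 1),
    ("peu", 0, 17), ("de", 2, 31), ("enfin", 0, 17), ("c'", 1, 4),
    ("est", 3, 7), ("normal", 2, 11), ("tu vois", 0, 0), ("elle", 1, 4),
    ("souffre", 2, 7), ("et", 2, 0), ("machin", 0, 17),
]


def completeness(deg_sent, deg_gram):
    # A zero grammar degree yields 0, as in the rows for "tu vois" and "et".
    return deg_sent / deg_gram if deg_gram else 0.0


values = {word: completeness(s, g) for word, s, g in ROWS}
mean_completeness = sum(values.values()) / len(ROWS)

print(round(values["côté"], 2))     # 0.15
print(round(values["est"], 2))      # 0.43
print(round(mean_completeness, 2))  # 0.13
```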


Evaluation

[Figure: constraint graph over "le côté hystérique un peu de enfin" (beginning of sentence 1)]

           Words   Constraints   Completeness   Density   Cohesion
Sent. 1    15      19            0.13           1.18      0.15
Sent. 2    21      38            0.17           1.80      0.32

[Figure: constraint graph over "le chien apparemment connaissait parfaitement le coin" (beginning of sentence 2)]


Part III

Difficulty and Compensation: the Case of Idiom Processing


The situation

Idiom: multiword expression with a figurative meaning separate from the literal meaning

Examples

- Decomposable idioms (variable)
  "let the cat out of the bag"
- Non-decomposable idioms (opaque semantics, no variability)
  "spill the beans", "kick the bucket"

Experimental perspective

- Idioms are read faster
- Idioms are associated with specific brain activity
- Two families of models, which differ in how idioms are assumed to be processed


Compositional models (nonlexical models)

Main ideas

- Idiom comprehension uses normal language processing
- Idioms are represented as configurations of lexical items, with no separate representation in the lexicon

The Configurational Hypothesis
Cacciari & Tabossi (1988) “The comprehension of idioms”, Journal of Memory and Language

- A sufficient portion of an idiomatic expression must be processed literally before the idiom can be identified
- After the “Recognition Point”, the rest of the string is not processed literally


The Cohesion Model: interplay between difficulty and facilitation

- Our hypothesis: difficulty can be compensated by cohesion

- Experiment:
  - Idioms have high cohesion values
  - A difficulty introduced into an idiom (a syntactic violation) is compensated by the cohesion
  - We compare idiomatic vs. non-idiomatic sentences, with and without violation


Experimental design

Material

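- Idiom (IDNV)
  Paul a une idée derrière la tête depuis ce matin
  (literally "Paul has had an idea behind the head since this morning", i.e. he has had something in mind)
- Idiom with violation (IDV)
  Paul a une idée derrière le tête depuis ce matin
  (determiner "le" instead of "la")
- Control (CTRNV)
  Paul a une douleur derrière la nuque depuis ce matin
  ("Paul has had a pain in the back of the neck since this morning")
- Control with violation (CTRV)
  Paul a une douleur derrière le nuque depuis ce matin
  (determiner "le" instead of "la")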


Experimental design

Specific positions to study

- Recognition point (RP)
  Paul a une idée derrière la tête depuis ce matin
- Modified word, where the violation is introduced (MM)
  Paul a une idée derrière le tête depuis ce matin
- Detection word, where the violation is detected (MD)
  Paul a une idée derrière le tête depuis ce matin


Results: recognition point

- Idioms and controls are processed differently
  - More positive amplitude in the P300 and N400 windows for idioms: facilitation


Results: modified word

- More negative N400 for the violated idiom (IDV) than for the non-violated idiom (IDNV): surprisal at the unexpected (modified) word for idioms

- No significant P600: no repair


Results: detection word

- Small N400 and small P600 for the violated control

- Positive deflection for IDV (relative to IDNV) in the N400 and P600 windows: repair


Results

- Idioms are processed differently: more positive amplitude after the RP

- Modifying a word after the RP in idioms generates an N400: surprisal

- At the position where the syntactic violation is detected:
  - High negativity (difficulty) for control sentences
  - Earlier positivity (P300) for idioms: expectancy confirmation


Violation Compensation

[Figure: three constraint graphs, with l, d and c edges, over
 "Paul a une douleur derrière la nuque depuis ce matin" (control),
 "Paul a une idée derrière la tête depuis ce matin" (idiom) and
 "Paul a une idée derrière le tête depuis ce matin" (idiom with violation)]


Conclusion

What can be done with constraints

- Describing any input (including ill-formed input)
- Measuring structural complexity

Complexity Model: a Cognitive Perspective

- An interplay between difficulty and facilitation
- An interaction between different sources of information
- Complexity depends on the quantity of information available to reduce the search space
- It is necessary to take into account the cognitive matrix as well as the context
