Modelling Language Evolution Lecture 1: Introduction to Learning Simon Kirby University of Edinburgh Language Evolution & Computation Research Unit


Course Overview

Learning: introduction to neural nets; learning syntax
Evolution: syntax; learning bias and structure
Culture: iterated learning; the Talking Heads (practical)

Computers for modelling

Computers in linguistics:
Engineering (speech and language technologies)
Research tools (waveform analysis, psycholinguistic stimuli, etc.)
Recently: model building

Why build models?
Why use computers?
What is a model anyway?

What is a model?

One view:

We use models when we can’t be sure what our theories predict

Especially useful when dealing with complex systems

[Diagram: THEORY → MODEL → PREDICTION, which is compared with OBSERVATION]

A simple example

Vowels exist in a “space”

Only some patterns arise cross-linguistically.
E.g. the vowel space seems to be filled symmetrically.
Why?

Theory to Model

We need a theory to explain the vowel-space universal.
Possible theory:

Vowels tend to avoid being close to each other to maintain perceptual distinctiveness.

Use model to test theory (Liljencrants & Lindblom 1972)
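To make "use a model to test the theory" concrete, here is a toy sketch in the spirit of the dispersion idea, not a reconstruction of Liljencrants & Lindblom's model. Everything specific is an assumption made for illustration: vowels are points in a unit square standing in for vowel space, "crowding" is scored as the sum of 1/d² over vowel pairs, and the search is simple random hill-climbing. If the theory is right, even this crude model should push vowels apart towards the edges of the space.

```python
import random

def dispersion_energy(vowels):
    """Crowding penalty: sum of 1/d^2 over every pair of vowels (lower = better spread)."""
    energy = 0.0
    for i in range(len(vowels)):
        for j in range(i + 1, len(vowels)):
            (x1, y1), (x2, y2) = vowels[i], vowels[j]
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            energy += 1.0 / max(d2, 1e-9)  # guard against two vowels at the same point
    return energy

def disperse(n_vowels, steps=20000, step_size=0.05, seed=0):
    """Random hill-climbing in a unit square standing in for vowel space (an assumption)."""
    rng = random.Random(seed)
    vowels = [(rng.random(), rng.random()) for _ in range(n_vowels)]
    energy = dispersion_energy(vowels)
    for _ in range(steps):
        k = rng.randrange(n_vowels)
        x, y = vowels[k]
        # Propose a small random move for one vowel, clipped to stay inside the square.
        nx = min(1.0, max(0.0, x + rng.uniform(-step_size, step_size)))
        ny = min(1.0, max(0.0, y + rng.uniform(-step_size, step_size)))
        candidate = vowels[:k] + [(nx, ny)] + vowels[k + 1:]
        candidate_energy = dispersion_energy(candidate)
        if candidate_energy < energy:  # keep only moves that reduce crowding
            vowels, energy = candidate, candidate_energy
    return vowels, energy

vowels, energy = disperse(5)
for x, y in vowels:
    print(f"({x:.2f}, {y:.2f})")
print(f"final energy: {energy:.3f}")
```

The model's "prediction" is the final configuration; comparing it with attested vowel systems is the observation step.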

In general, computational models are useful when dealing with “complex systems”.

Is language a complex system?

Yes: evolution on many different timescales:
Individual learning
Cultural evolution
Biological evolution

Computational models will help us understand these interactions…

Learning

Language learning is crucial to language evolution.
What is learning?

Learning occurs when an organism changes its internal state on the basis of experience

What do we need to model learning?
1. A model of internal states
2. A model of experience
3. An algorithm to change 1 on the basis of 2

One approach: Neural nets

An approach to internal states based on the brain

An artificial neuron is a computational unit that sums inputs and uses them to decide whether to produce an output

Networks of neurons

Typically there will be many connected neurons

Information is stored in weights on the connections.
Weights multiply signals sent between nodes.
Signals into a node can be excitatory or inhibitory.

An artificial neuron

Add up all the inputs multiplied by their weights:

$$net_i = \sum_j w_{ij} a_j$$

f(net) is the “activation function” that scales the input, giving the unit's output $a_i = f(net_i)$.
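A minimal sketch of the weighted sum for one receiving unit i. The function names, the example weights, and the all-or-nothing `step` activation are hypothetical choices for illustration; any activation function f can be passed in.

```python
def net_input(weights, activations):
    """net_i = sum over j of w_ij * a_j, for a single receiving unit i."""
    if len(weights) != len(activations):
        raise ValueError("need exactly one weight per incoming activation")
    return sum(w * a for w, a in zip(weights, activations))

def neuron_output(weights, activations, f):
    """a_i = f(net_i): pass the weighted sum through an activation function f."""
    return f(net_input(weights, activations))

def step(net):
    """A hypothetical all-or-nothing activation function, used only for illustration."""
    return 1.0 if net > 0 else 0.0

# Three incoming signals: excitatory weights 0.5 and 2.0, inhibitory weight -1.0.
print(net_input([0.5, -1.0, 2.0], [1.0, 1.0, 0.5]))            # 0.5
print(neuron_output([0.5, -1.0, 2.0], [1.0, 1.0, 0.5], step))  # 1.0
```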

A useful activation function

All or nothing for big excitations or inhibitions…
…but more sensitive in between.

$$a_i = \frac{1}{1 + e^{-net_i}}$$
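The logistic (sigmoid) function a = 1 / (1 + e^(-net)) can be sketched as below. The rearrangement for negative inputs is an implementation detail added here only to avoid floating-point overflow; `sigmoid_slope` uses the standard derivative a(1 - a) to show where the function is "more sensitive": its slope peaks at net = 0 and vanishes for large excitations or inhibitions.

```python
import math

def sigmoid(net):
    """Logistic activation a = 1 / (1 + e^(-net)), written to avoid overflow for large |net|."""
    if net >= 0:
        return 1.0 / (1.0 + math.exp(-net))
    z = math.exp(net)  # for negative net, exp(net) is small, so this cannot overflow
    return z / (1.0 + z)

def sigmoid_slope(net):
    """Derivative of the sigmoid, a * (1 - a): largest at net = 0, near zero at the extremes."""
    a = sigmoid(net)
    return a * (1.0 - a)

for net in (-10, -2, 0, 2, 10):
    print(net, round(sigmoid(net), 4), round(sigmoid_slope(net), 4))
```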

AND: a very simple network

A network that works out if both inputs are activated:

[Diagram: INPUT 1 and INPUT 2 each connect to OUTPUT with weight 5; a BIAS NODE (always set to 1.0) connects to OUTPUT with weight -7.5]

Network gives an output over 0.5 only if both inputs are 1.
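A quick check of the AND weights, assuming the sigmoid activation function above and reading the diagram as weight 5 from each input and -7.5 from the bias node. The helper name `unit` is illustrative.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Weights read off the AND diagram: input 1, input 2, bias node (fixed at 1.0).
AND_WEIGHTS = (5.0, 5.0, -7.5)

def unit(weights, x1, x2):
    """A single sigmoid unit with two inputs and a bias node."""
    w1, w2, w_bias = weights
    return sigmoid(w1 * x1 + w2 * x2 + w_bias * 1.0)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(unit(AND_WEIGHTS, x1, x2), 3))
```

Only input (1, 1) gives net = 5 + 5 - 7.5 = 2.5 > 0 (output about 0.924); a single active input gives 5 - 7.5 = -2.5, so the output stays about 0.076.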

OR: another very simple network

A network that works out if either input is activated:

[Diagram: INPUT 1 and INPUT 2 each connect to OUTPUT with weight 10; the bias node connects to OUTPUT with weight -7.5]

Network gives an output over 0.5 if either input is 1.
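The OR weights can be checked the same way, again assuming a sigmoid unit whose output is read as "on" when it exceeds 0.5. The names `or_unit` and `truth_table` are hypothetical helpers.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def or_unit(x1, x2):
    # Weights read off the OR diagram: 10 from each input, -7.5 from the bias node (1.0).
    return sigmoid(10.0 * x1 + 10.0 * x2 - 7.5 * 1.0)

def truth_table(unit):
    """Threshold the unit's output at 0.5 for every pair of binary inputs."""
    return {(x1, x2): int(unit(x1, x2) > 0.5) for x1 in (0, 1) for x2 in (0, 1)}

print(truth_table(or_unit))
```

Compared with AND, the bias weight is unchanged; the larger input weights mean one active input already gives net = 10 - 7.5 = 2.5 > 0.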

XOR: a difficult challenge

A network that works out if only one input is activated:

[Diagram: INPUT 1 and INPUT 2 connect directly to OUTPUT; both input weights and the bias weight are unknown (?)]

The solution needs a more complex net with three layers. WHY?
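One way to explore the "WHY?" empirically: search a grid of weights for a single two-input unit that computes AND, then XOR. This is an illustration, not a proof. The underlying reason is that a single unit separates its inputs with one straight line (net = 0), and no straight line puts (0, 1) and (1, 0) on one side with (0, 0) and (1, 1) on the other. The grid range and step size are arbitrary assumptions.

```python
import itertools
import operator

def fires(w1, w2, bias, x1, x2):
    # sigmoid(net) > 0.5 exactly when net > 0, so we can test the net input directly
    return int(w1 * x1 + w2 * x2 + bias * 1.0 > 0)

def solves(target, w1, w2, bias):
    """True if this single unit reproduces target() on all four input pairs."""
    return all(fires(w1, w2, bias, x1, x2) == target(x1, x2)
               for x1, x2 in itertools.product((0, 1), repeat=2))

GRID = [i / 2 for i in range(-20, 21)]  # candidate weights: -10.0 to 10.0 in steps of 0.5

def count_solutions(target):
    return sum(solves(target, w1, w2, b)
               for w1 in GRID for w2 in GRID for b in GRID)

and_count = count_solutions(operator.and_)
xor_count = count_solutions(operator.xor)
print(and_count, xor_count)  # many AND solutions, no XOR solutions
```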

XOR network - step 1

XOR is the same as OR but not AND:
Calculate OR
Calculate NOT AND
AND the results

[Diagram: NOT AND and OR both feed into AND]

XOR network - step 2

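[Diagram: INPUT 1 and INPUT 2 feed HIDDEN 1 and HIDDEN 2, which feed OUTPUT; a BIAS NODE connects to both hidden units and to the output.
HIDDEN 1 (OR): weights 10 and 10 from the inputs, bias weight -7.5
HIDDEN 2 (NOT AND): weights -5 and -5 from the inputs, bias weight 7.5
OUTPUT (AND): weights 5 and 5 from the hidden units, bias weight -7.5]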
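A sketch of the three-layer XOR network, assuming sigmoid units and our reading of the extracted diagram: hidden unit 1 computes OR (weights 10, 10, bias -7.5), hidden unit 2 computes NOT AND (weights -5, -5, bias 7.5), and the output ANDs the two hidden units (weights 5, 5, bias -7.5). The function name `xor_net` is hypothetical.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def xor_net(x1, x2):
    h_or = sigmoid(10 * x1 + 10 * x2 - 7.5)      # hidden 1: OR
    h_nand = sigmoid(-5 * x1 - 5 * x2 + 7.5)     # hidden 2: NOT AND
    return sigmoid(5 * h_or + 5 * h_nand - 7.5)  # output: AND of the two hidden units

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xor_net(x1, x2), 3))
```

The outputs come out at approximately 0.08, 0.85, 0.85 and 0.11, so only the inputs with exactly one active unit cross 0.5. The hidden layer re-describes the inputs so that the final unit faces a problem (AND) that one straight line can solve.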

But what about learning?

We now have:
a model of internal states (connection weights)
a model of experience (inputs and outputs)

Learning: set the weights in response to experience

How?
Compare network behaviour with “correct” behaviour
Adjust the weights to reduce network error

Error-driven learning

1. Set weights to random values
2. Present an input pattern
3. Feed activation forward through the network to get an output
4. Calculate the difference between the output and the desired output (i.e. the error)
5. Adjust the weights so that the error is reduced
6. Repeat until the network is producing the desired results
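The six steps can be sketched for a single sigmoid unit learning OR. This swaps in one specific error-driven rule, the delta rule (gradient descent on squared error for one unit), as a concrete instance; the learning rate, number of epochs, random seed and helper names are arbitrary assumptions rather than settings from the lecture.

```python
import math
import random

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def predict(weights, x1, x2):
    """Feed activation forward through a single unit; the bias node is fixed at 1.0."""
    inputs = (x1, x2, 1.0)
    return sigmoid(sum(w * a for w, a in zip(weights, inputs)))

# Training data for OR: ((input 1, input 2), desired output)
OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def train(patterns, epochs=5000, rate=0.5, seed=1):
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # 1. random weights
    for _ in range(epochs):                                 # 6. repeat
        for (x1, x2), target in patterns:                   # 2. present a pattern
            out = predict(weights, x1, x2)                  # 3. feed forward
            error = target - out                            # 4. error
            # 5. delta rule: step each weight down the gradient of (error^2)/2;
            # out * (1 - out) is the slope of the sigmoid at this net input.
            delta = error * out * (1.0 - out)
            for k, a in enumerate((x1, x2, 1.0)):
                weights[k] += rate * delta * a
    return weights

weights = train(OR_PATTERNS)
for (x1, x2), target in OR_PATTERNS:
    print(x1, x2, target, round(predict(weights, x1, x2), 3))
```

A fixed number of epochs stands in for "until the network is producing the desired results"; a real run might instead stop once the total error falls below a chosen threshold.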

Gradient descent

Gradient descent is a form of error-driven learning:
Start at a random point on the “error surface”
Move across the surface in the direction of steepest downhill slope
Potential problems:
May overshoot the global minimum
Might get stuck in a local minimum
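Both problems can be seen on one-weight error surfaces, using the plain update w ← w - rate × dE/dw. The two surfaces below are made-up examples chosen for illustration: E(w) = w² is a single bowl, and E(w) = w⁴ - 3w² + w has a global minimum near w = -1.30 and a shallower local minimum near w = 1.13.

```python
def descend(grad, w, rate, steps):
    """Plain gradient descent: repeatedly step w <- w - rate * dE/dw."""
    for _ in range(steps):
        w = w - rate * grad(w)
    return w

# Surface 1: E(w) = w^2, minimum at w = 0, gradient 2w.
def bowl_grad(w):
    return 2 * w

print(descend(bowl_grad, 1.0, rate=0.1, steps=50))  # shrinks towards 0 (about 1.4e-05)
print(descend(bowl_grad, 1.0, rate=1.1, steps=10))  # overshoots: each step multiplies w by -1.2

# Surface 2: E(w) = w^4 - 3w^2 + w, with two minima of different depth.
def bumpy(w):
    return w**4 - 3 * w**2 + w

def bumpy_grad(w):
    return 4 * w**3 - 6 * w + 1

left = descend(bumpy_grad, -2.0, rate=0.01, steps=2000)   # reaches the global minimum
right = descend(bumpy_grad, 2.0, rate=0.01, steps=2000)   # stuck in the local minimum
print(round(left, 2), round(bumpy(left), 2))
print(round(right, 2), round(bumpy(right), 2))
```

Where descent ends up depends on the random starting point and the step size, which is why both problems appear in practice.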

Example: learning past tense of verbs

A network that takes the present tense form of a verb…
…and produces the past tense.

Uses examples to set its weights.
Generalises to add /-ed/ to verbs it has never seen before.
Has it learnt a linguistic rule?

Is this psychologically plausible?

We need an error signal.
Where does this error signal come from?
Possibilities:
A teacher
Reinforcement
The outcome of some prediction, e.g. what’s the next word? What’s the past tense of this verb?
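A minimal sketch of how prediction can supply its own error signal: in next-word prediction the desired output at each step is simply the word that actually follows, so training pairs come straight from the data without a teacher. The word-level encoding and the 0/1 error are illustrative assumptions; a network would instead compare its output activations with a target pattern.

```python
def prediction_pairs(words):
    """Turn a word sequence into (current word, next word) training pairs.

    The desired output at each step is just the word that actually comes next,
    so the data supplies its own targets: no separate teacher is needed.
    """
    return list(zip(words, words[1:]))

def prediction_error(predicted, actual):
    """A hypothetical 0/1 error signal: 0 if the guess was right, 1 if it was wrong."""
    return 0 if predicted == actual else 1

sentence = "the dog chased the cat".split()
pairs = prediction_pairs(sentence)
print(pairs)
# A learner that always guesses "the" gets an error signal for every wrong guess.
print([prediction_error("the", nxt) for _, nxt in pairs])
```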

Summary

Modelling tests theories
Computer modelling is appropriate for complex systems
Language evolution involves several complex systems
Neural nets are one approach to modelling learning
Networks can be made to adapt to data through error-driven learning

Next lecture: how to model acquisition of syntax
