
Grammatical Inference: Learning Automata and Grammars

Colin de la Higuera, 2016

Kyoto, 11th April 2016

I am

• Professor at Nantes University
• Former President of the Société informatique de France
• Researcher in Machine Learning
• Guest scholar at Akutsu Lab, Kyoto University (Jan-Jun 2016)

Starting point: Machine Learning

Colin de la Higuera, Nantes, 2016 3

What is machine learning? (1)

• What does a universal Turing machine do?
  – It takes the data and the code and runs the code on the data
  – The code is therefore also data
• Next step (as proposed by Turing in 1948):
  – The learning machine
  – Takes the code and the data and returns the code, transformed

Alan Turing, On Computable Numbers, with an Application to the Entscheidungsproblem, Proc. London Math. Soc., 2nd series, vol. 42, 1937, pp. 230-265
Alan Turing, Computing Machinery and Intelligence, Mind, Oxford University Press, vol. 59, no. 236, 1950

Alan Turing, 1912-1954

Turing’s dream

What is machine learning? (2)

• Let the data decide (not the algorithm)
• The algorithm can be used to organise, index, search
• Not more
• Typical (or extreme?) application of this idea: k-nearest neighbours
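The k-nearest-neighbours idea above, where the stored data itself does the deciding, can be sketched in a few lines; the points, labels and query are made up for illustration:

```python
from collections import Counter

# A minimal sketch of k-nearest neighbours on 2-D points: the "model"
# is just the stored data, and the data decides the label of a query.
DATA = [((0.0, 0.0), "red"), ((1.0, 0.0), "red"),
        ((5.0, 5.0), "blue"), ((6.0, 5.0), "blue"), ((5.0, 6.0), "blue")]

def knn(query, k=3):
    """Label a query point by majority vote among its k nearest points."""
    dist2 = lambda p: (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
    nearest = sorted(DATA, key=lambda pair: dist2(pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((5.5, 5.5)))  # blue
```

Note that nothing is "learned" here in the model-building sense: the algorithm only organises and searches the data, exactly as the slide says.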

The big data project

Image: https://commons.wikimedia.org/wiki/File:Big_Bang_Data_exhibit_at_CCCB_17.JPG

What is machine learning? (3)

• Use the data to build a model:
  – A model is a way to
    • Compress the data
    • Interpret the data
    • Forget the data

The pragmatic approach

My talk at KU-ICR on Thursday

How do we choose the model?

• This depends on the data
• When the data is made of points in a 2-dimensional space, this is easy
• The model can be a half-plane, a line, a polynomial


How do we choose the model?

• This depends on the data
• When the data is made of points in a 3-dimensional space, this is (still) easy
• The model can be a hyperplane, a separating plane, a polynomial

How do we choose the model?

• This depends on the data
• When the data is made of points in a high-dimensional space, this is still possible (with linear algebra)
• The model can be a hyperplane, a separating hyperplane, a function

Image: https://i.ytimg.com/vi/Kk6rd4_dAqA/maxresdefault.jpg
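As a minimal sketch of the 2-D case above, here is a perceptron learning a separating line; the toy data set is made up and linearly separable, and the parameters are illustrative, not from the talk:

```python
# A perceptron learning a separating line in the plane; a minimal
# sketch with made-up, linearly separable toy data.
DATA = [((0.0, 1.0), -1), ((1.0, 2.0), -1), ((3.0, 0.0), 1), ((4.0, 1.0), 1)]

def train(epochs=50, lr=1.0):
    """Perceptron rule: nudge (w, b) towards each misclassified point."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in DATA:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

w, b = train()

def sign(p):
    """Which side of the learned line w·p + b = 0 is p on?"""
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else -1

print(sign((3.5, 0.5)), sign((0.5, 1.5)))  # 1 -1
```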

Typical techniques today

• Support vector machines
• (deep) Neural networks

Images: https://upload.wikimedia.org/wikipedia/commons/1/10/Svm_10_perceptron.JPG
https://upload.wikimedia.org/wikipedia/commons/3/32/Single-layer_feedforward_artificial_neural_network.png

A research program for computer scientists

• We know how to manipulate strings, trees, graphs
• They are good for modelling
• They contain precious information about the interactions
• Why lose their power?
• The goal is therefore to learn from (such) structured data, and to learn models adapted to such data

A comparison

Vector space machine learning (AKA statistical pattern recognition)
  Pros: robust, many algorithms and methods; existence of a topology
  Cons: black box effect; difficult to understand

Rich data representations (AKA structural pattern recognition)
  Pros: richer representation; possibility of capturing the interactions; intelligibility
  Cons: often less noise resistant; often more expensive

The challenge

• When the input is a set of strings, why not learn an automaton, a formal grammar?
• I.e. a model designed to represent languages!

The data for grammatical inference


The data: examples of strings

A sentence in English and its translation to Japanese:

• What's that called in Japanese?
• あれは日本語で何といいますか。 (Arewa nihongo de nanto iimasu ka?)

Transducers can be used to translate

This: これは   is: λ   a: λ   cat: 猫です   computer: は、コンピュータ

Inversely, translating これは、ウサギです and これは、ウサギのですか also needs differing outputs
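The word-by-word mapping above can be sketched as a toy transducer; the rule table follows the slide (λ = empty output), while the function name and the rejection behaviour are illustrative choices:

```python
# A minimal sketch of the word-to-word transducer above:
# each English word maps to a Japanese output (λ = empty string).
RULES = {"this": "これは", "is": "", "a": "", "cat": "猫です"}

def transduce(sentence: str) -> str:
    """Translate word by word; unknown words are rejected."""
    out = []
    for word in sentence.lower().split():
        if word not in RULES:
            raise ValueError(f"no rule for {word!r}")
        out.append(RULES[word])
    return "".join(out)

print(transduce("This is a cat"))  # これは猫です
```

A real transducer would make the output depend on the state as well as the input symbol, which is exactly what the "inversely" example needs.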

The data: examples of strings

• Time series pose the problem of the alphabet:
  – An infinite alphabet?
  – Discretizing?
  – An ordered alphabet?

Sinus rhythm with acquired long QT, work found via Flickr, by Popfossa, CC BY 2.0

The data: examples of strings


Codis profile, Chemical Science & Technology Laboratory, National Institute of Standards and Technology, work found via Wikipedia, CC BY-SA 3.0

The data: examples of strings

>A BAC=41M14 LIBRARY=CITB_978_SKB

AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA

GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT

GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA

GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA

CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT

GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG

GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA

GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA

CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA

CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA

TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA

TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA

AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA

TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG

TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC

TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA

GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT

GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA

AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC

TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC

TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA

TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT


The data: examples of strings

https://upload.wikimedia.org/wikipedia/commons/3/36/Emperor_family_tree_0_en.png, CC BY-SA 3.0

The data: examples of strings

<book>
  <part>
    <chapter>
      <sect1/>
      <sect1>
        <orderedlist numeration="arabic">
          <listitem/>
          <f:fragbody/>
        </orderedlist>
      </sect1>
    </chapter>
  </part>
</book>

The data: examples of strings

<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
<!ELEMENT pagina (titulus?, poema)>
<!ELEMENT titulus (#PCDATA)>
<!ELEMENT auctor (praenomen, cognomen, nomen)>
<!ELEMENT praenomen (#PCDATA)>
<!ELEMENT nomen (#PCDATA)>
<!ELEMENT cognomen (#PCDATA)>
<!ELEMENT poema (versus+)>
<!ELEMENT versus (#PCDATA)>
]>
<pagina>
  <titulus>Catullus II</titulus>
  <auctor>
    <praenomen>Gaius</praenomen>
    <nomen>Valerius</nomen>
    <cognomen>Catullus</cognomen>
  </auctor>


A linguistic tree. (Courtesy of Mark Knauf and Etsuyo Yuasa, Department of East Asian Languages and Literatures (DEALL), Ohio State University.)

And also

• Business processes
• Bird songs
• Images (contours and shapes)
• Robot moves
• Observations of protocols, server exchanges
• Interactions between systems
• …

The models in grammatical inference


An HMM

• https://en.wikipedia.org/wiki/Hidden_Markov_model

Another HMM (proteins)

• http://www.cbs.dtu.dk/~kj/bioinfo_assign2.html
• And a more interesting example:
• http://www.cbs.dtu.dk/~kj/hmm-real-life-example.pdf

A finite state machine

• https://msdn.microsoft.com/en-us/library/aa478972.aspx

Another FSM (a transducer)

• The "3-state busy beaver" Turing machine in a finite-state representation. Each circle represents a "state" of the TABLE, an "m-configuration" or "instruction". The "direction" of a state transition is shown by an arrow. The label (e.g. 0/P,R) near the outgoing state (at the "tail" of the arrow) specifies the scanned symbol that causes a particular transition (e.g. 0), followed by a slash /, followed by the subsequent "behaviors" of the machine, e.g. "P print" then move tape "R right". No generally accepted format exists. The convention shown follows McClusky (1965), Booth (1965), Hill and Peterson (1974).
• https://commons.wikimedia.org/

A transducer

• Comparing nondeterministic and quasideterministic finite-state transducers built from morphological dictionaries
• Authors: Alicia Garrido-Alenda and Mikel L. Forcada
• https://commons.wikimedia.org/

Stress patterns transducer

• Example: penult; alt secondary

1,w0,s2
2,w0w0,s2s0
3,w0w0w0,s0s2s0
4,w0w0w0w0w0,s0s1s0s2s0
5,w0w0w0w0w0w0w0,s0s1s0s1s0s2s0
6,w0w0w0w0w0w0w0w0w0,s0s1s0s1s0s1s0s2s0

Adapted from http://st2.ullet.net/

A PCFG (so not only finite state machines)


Summarising

• Finite state models
  – DFA
  – NFA
  – PFA
  – HMM
  – Transducer
• Grammatical models
  – Context-free grammar
  – Probabilistic context-free grammar
• (many others)

Partial Conclusion

• If we have some strings and want to learn the models we have just seen… what do we need?

We need… to solve many problems related to the models themselves

PFA: Probabilistic Finite (state) Automaton

(figure: an example PFA, a four-state automaton with transitions labelled a and b)

A PFA

(figure: a PFA with transitions labelled by symbols and probabilities, e.g. a 0.7, b 0.4, a 0.1, a 0.35, and final probabilities 0.2 and 1)

Pr(aba) = 0.7 × 0.4 × 0.1 × 1 + 0.7 × 0.4 × 0.35 × 0.2 = 0.028 + 0.0196 = 0.0476
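The probability of a string under a PFA is a sum over all accepting paths, computed by the forward algorithm. The automaton encoded below is a hypothetical reconstruction, chosen only to be consistent with the Pr(aba) computation on the slide:

```python
from collections import defaultdict

# Hypothetical PFA consistent with the slide's computation of Pr(aba):
# transitions[state][symbol] = list of (next_state, probability).
transitions = {
    0: {"a": [(1, 0.7)]},
    1: {"b": [(2, 0.4)]},
    2: {"a": [(3, 0.1), (4, 0.35)]},
    3: {}, 4: {},
}
final = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 0.2}
initial = {0: 1.0}

def pfa_prob(word):
    """Forward algorithm: sum probabilities over all paths for word,
    then weigh each reached state by its final probability."""
    alpha = dict(initial)
    for sym in word:
        nxt = defaultdict(float)
        for q, p in alpha.items():
            for q2, t in transitions.get(q, {}).get(sym, []):
                nxt[q2] += p * t
        alpha = nxt
    return sum(p * final[q] for q, p in alpha.items())

print(pfa_prob("aba"))  # ≈ 0.0476
```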

Parsing with a PFA

(figure: a PFA with transitions a 0.3, a 0.7, a 0.9, a 0.3, b 0.1 and a final probability 1)

PrA(b) = 0.1
PrA(aaaaa) = 0.3 × 0.9 × 0.3² × 0.7² ≈ 0.0119

Most probable string is?

Most probable string: problems

Name: Most probable string (MPS)
• Instance: a probabilistic automaton A, a probability p > 0
• Question: is there in Σ* a string x such that PrA(x) > p?

Name: Consensus string (CS)
• Instance: a probabilistic automaton A
• Question: find in Σ* a string x such that ∀y ∈ Σ*, PrA(y) ≤ PrA(x)

Results (cdlh & Oncina 2013)

• Key lemma: if w has probability p, then it has length at most |A|²/p
• As a corollary, MPS is decidable!
• There exists an algorithm solving CS whose complexity is O(|Σ|·|A|²/p_opt²)
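A brute-force sketch of how the length bound makes CS solvable: take any string of nonzero probability, derive the bound |A|²/p from the lemma, and enumerate all strings up to that length. The one-state PFA here is a made-up toy (Pr(aⁿ) = 0.5ⁿ⁺¹, so the consensus string is λ), not the talk's example, and the sketch assumes λ has nonzero probability:

```python
from itertools import product

# Toy one-state PFA: a loop on 'a' with probability 0.5,
# final probability 0.5, so Pr(a^n) = 0.5^(n+1).
TRANS = {("q0", "a"): [("q0", 0.5)]}
FINAL = {"q0": 0.5}
START = "q0"
SIGMA = ["a"]
N_STATES = 1

def prob(word):
    """Forward algorithm for this toy PFA."""
    alpha = {START: 1.0}
    for sym in word:
        nxt = {}
        for q, p in alpha.items():
            for q2, t in TRANS.get((q, sym), []):
                nxt[q2] = nxt.get(q2, 0.0) + p * t
        alpha = nxt
    return sum(p * FINAL.get(q, 0.0) for q, p in alpha.items())

def consensus():
    """Enumerate all strings up to the lemma's length bound."""
    best, best_p = "", prob("")
    bound = int(N_STATES ** 2 / best_p)  # lemma: |w| <= |A|^2 / p
    for n in range(1, bound + 1):
        for tup in product(SIGMA, repeat=n):
            w = "".join(tup)
            if prob(w) > best_p:
                best, best_p = w, prob(w)
    return best, best_p

print(consensus())  # ('', 0.5)
```

The announced O(|Σ|·|A|²/p_opt²) algorithm is of course cleverer than this exhaustive enumeration; the sketch only illustrates why the lemma yields decidability.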

Results (recent)

• Suppose we are trying to find the median string, i.e. the string w minimizing Σ_{x∈Σ*} d_e(w,x)·Pr_D(x)
• Then how do we compute this value?
• Currently, we are at least able to compute Σ_{x∈Σ*} d_e(w,x)·Pr_D(x) for a given w
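For intuition, the quantity Σ_{x∈Σ*} d_e(w,x)·Pr_D(x) can at least be approximated by truncated enumeration; this sketch uses a made-up one-symbol distribution Pr(aⁿ) = 0.5ⁿ⁺¹ (the exact algorithm alluded to above is more involved and does not enumerate):

```python
# Approximate the expected edit distance from w to a string drawn from
# a made-up distribution Pr(a^n) = 0.5^(n+1), by truncated enumeration.
def edit_distance(u, v):
    """Standard Levenshtein distance with a rolling row."""
    d = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        prev, d[0] = d[0], i
        for j, cv in enumerate(v, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1,
                                   prev + (cu != cv))
    return d[-1]

def expected_distance(w, max_len=30):
    """Sum d_e(w, a^n) * Pr(a^n) for n up to a cutoff length."""
    total = 0.0
    for n in range(max_len + 1):
        total += edit_distance(w, "a" * n) * 0.5 ** (n + 1)
    return total

print(round(expected_distance("a"), 4))
```

For this toy distribution the exact value for w = "a" is 1 (a small geometric-series computation), which the truncation recovers to within the discarded tail.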

How do we define learning?


What are we hoping for? [the data]

• We are given some strings
• We are given some labelled strings
• We are not given any strings but can ask questions
• (instead of strings, you can think of graphs or trees)

What are we hoping for? [the result]

• Given some strings, perhaps some labels for these strings, build a FSM
• Possible extra tasks:
  – Be robust
  – Be fast
  – Be able to prove that the result is "good"

Learning models

• We can prove that algorithms "learn":
  – that they can identify correctly something
  – that they converge, decreasing the generalisation error

Just one complete example


The problem:

• An agent must take cooperative decisions in a multi-agent world
• Its decisions will depend:
  – on what it hopes to win or lose
  – on the actions of other agents

Hypothesis:

The opponent follows a rational strategy (given by a DFA/Moore machine)

ME: equations (e) or pictures (p)
YOU: listen (l) or doze (d)

(figure: an example of a rational strategy, a Moore machine over inputs e, p with outputs l, d)

Example:

• Each prisoner can admit (a) or stay silent (s)
  – If both admit: 3 years (prison) each
  – If A admits but not B: A = 0 years, B = 5 years
  – If B admits but not A: B = 0 years, A = 5 years
  – If neither admits: 1 year each

The prisoner's dilemma

Payoffs (A, B):

          B: a        B: s
A: a    (-3, -3)    (0, -5)
A: s    (-5, 0)     (-1, -1)

• In our version we study an iterated version against an opponent who follows a rational strategy
• Gain function: limit of means (average over a very long series of moves)
• For example, if we get into a recurrent situation where we both admit, the gain will be -3
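The gain function can be sketched as follows; the payoff table matches the slide, while the helper names are illustrative and the finite average only approximates the limit of means:

```python
# The prisoner's dilemma payoffs and the limit-of-means gain,
# approximated over a (long) finite series of joint moves.
PAYOFF = {          # (my move, opponent's move) -> my gain
    ("a", "a"): -3,
    ("a", "s"): 0,
    ("s", "a"): -5,
    ("s", "s"): -1,
}

def mean_gain(my_moves, their_moves):
    """Average gain over a finite series, approximating the limit of means."""
    gains = [PAYOFF[m, t] for m, t in zip(my_moves, their_moves)]
    return sum(gains) / len(gains)

# If we get into a recurrent situation where we both admit, the gain is -3:
print(mean_gain("a" * 1000, "a" * 1000))  # -3.0
```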

The general problem

• We suppose that the strategy of the opponent is given by a deterministic finite automaton (DFA)
• Can we imagine an optimal strategy?

Running example

(figure: the opponent's strategy, a Moore machine over moves a and s)

Running example

Suppose we know the opponent's strategy. Then (game theory):
• Consider the opponent's graph, in which we value the edges by our own gain, and find the best (infinite) path in the graph

Running example

• Find the cycle of maximum mean weight
• Find the best path leading to this cycle of maximum mean weight
• Follow the path and stay in the cycle

Running example

(figure: the same steps applied to the opponent's graph, with edges valued by our gains -3, 0, -5, -1; the best path reaches a cycle of mean weight -0.5)
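The first step above can be sketched by brute force on a tiny made-up gain graph (chosen here so that the best cycle also has mean -0.5); a real implementation would use, e.g., Karp's mean-cycle algorithm rather than enumerate cycles:

```python
from itertools import permutations

# Brute-force "find the cycle of maximum mean weight" on a made-up
# gain graph: adjacency maps state -> {successor: my gain}.
G = {0: {1: 0}, 1: {0: -1, 2: -5}, 2: {2: -3}}

def max_mean_cycle(g):
    """Try every ordered subset of nodes as a candidate cycle."""
    best = None  # (mean weight, cycle)
    nodes = list(g)
    for r in range(1, len(nodes) + 1):
        for cyc in permutations(nodes, r):
            weight, ok = 0, True
            for i, u in enumerate(cyc):
                v = cyc[(i + 1) % r]
                if v not in g[u]:
                    ok = False
                    break
                weight += g[u][v]
            if ok:
                mean = weight / r
                if best is None or mean > best[0]:
                    best = (mean, cyc)
    return best

print(max_mean_cycle(G))  # the 0 <-> 1 cycle, mean -0.5
```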

Question

Can we play a game against this opponent and… can we then reconstruct his strategy?

The data (him, me)

HIM: a a a s s a a a a s s s s s s a s a

ME        HIM
λ         a
a         a
as        s
asa       a
asaa      a
asaas     s
asaass    s

If I play asa, his move is a

The logic of the algorithm

• The goal is to be able to parse and to have a partial solution consistent with the data
• The algorithm is loosely inspired by a number of grammatical inference algorithms
• It is greedy
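A sketch of the starting point of such a greedy algorithm (not the talk's exact procedure): build a prefix tree from the observed pairs; the successive decisions then amount to merging states of this tree while staying consistent with the data. The node representation is an illustrative choice:

```python
# Build a prefix tree from observed pairs (my string -> his next move);
# greedy state-merging then folds this tree into a small Moore machine.
DATA = {"": "a", "a": "a", "as": "s", "asa": "a",
        "asaa": "a", "asaas": "s", "asaass": "s"}

def build_prefix_tree(data):
    """Each node is {'out': observed output or None, 'kids': {symbol: node}}."""
    root = {"out": None, "kids": {}}
    for word, out in data.items():
        node = root
        for sym in word:
            node = node["kids"].setdefault(sym, {"out": None, "kids": {}})
        node["out"] = out
    return root

tree = build_prefix_tree(DATA)
# After I play "as", the observed response is s:
print(tree["kids"]["a"]["kids"]["s"]["out"])  # s
```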

The algorithm

The first decision

(figure: sure so far: after λ he plays a; have to deal with: his move after a)

The algorithm

The candidates

(figure: the two candidate machines)

Occam's razor: Entia non sunt multiplicanda praeter necessitatem ("Entities should not be multiplied unnecessarily")

The algorithm

The second decision

(figure: sure so far: a → a; have to deal with: as → ?)

The algorithm

The third decision

(figure: one candidate is inconsistent with the data, the other is consistent with a → a, as → s; have to deal with: asa → ?)

The algorithm

The three candidates

(figure: the three candidate machines after the third decision)

The algorithm

The fourth decision

(figure: consistent with a → a, as → s, asa → a, asaa → a, asaas → s; have to deal with: asaass → ?)

The algorithm

The fifth decision

(figure: this candidate is inconsistent with the observed data)

The algorithm

The fifth decision

(figure: this candidate is consistent up to asaass → s; have to deal with: asaasss → ?)

The algorithm

The sixth decision

(figure: this candidate is inconsistent with the observed data)

The algorithm

The sixth decision

(figure: this candidate is consistent up to asaasss → s; have to deal with: asaasssa → ?)

The algorithm

The seventh decision

(figure: this candidate is inconsistent with the observed data)

The algorithm

The seventh decision

(figure: this candidate is consistent with all the observed data)

The algorithm

The result

(figure: the learned Moore machine, consistent with all the observed data)

How do we get hold of the learning data?

a) through observation (like here)

b) through exploration


An open problem

The strategy is probabilistic:

(figure: a Moore machine whose outputs are distributions over moves, e.g. a: 20% / s: 80%, a: 50% / s: 50%, a: 70% / s: 30%)

Tit for tat

(figure: the tit-for-tat strategy as a Moore machine over moves a and s)
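Tit for tat itself is a tiny Moore machine; here is a sketch (state names are illustrative): start silent, then replay the opponent's previous move:

```python
# Tit for tat as a two-state Moore machine: the output of the current
# state is our move; the opponent's move selects the next state.
TRANSITIONS = {("S", "a"): "A", ("S", "s"): "S",
               ("A", "a"): "A", ("A", "s"): "S"}
OUTPUT = {"S": "s", "A": "a"}

def play(opponent_moves, state="S"):
    """Return our moves against a sequence of opponent moves."""
    ours = []
    for move in opponent_moves:
        ours.append(OUTPUT[state])   # emit the current state's output
        state = TRANSITIONS[state, move]
    return "".join(ours)

print(play("aassa"))  # saass
```

Being a two-state DFA/Moore machine, this strategy is exactly the kind of opponent the learning algorithm above can reconstruct from play.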

Summarising and concluding


Time to say more about grammatical inference

• Machine learning where the data is strings and the models are finite state machines
• Many applications (and new ones!)
• Many open questions (in fact, applications direct the questions)
• Researchers in many countries, including Japan
  – Etsuji Tomita, Thomas Zeugmann, Yasubumi Sakakibara, Ryo Yoshinaka, Makoto Kanazawa, Takashi Yokomori
  – And many others!

Acknowledgements

• This presentation includes ideas that have appeared after working with or reading the works of many people.
• Any list is necessarily arbitrary and insufficient.
• But at least, thanks to:
  – Peter Flach (Machine Learning, Cambridge University Press)
  – D. Carmel and S. Markovitch. Model-based learning of interaction strategies in multi-agent systems. Journal of Experimental and Theoretical Artificial Intelligence, 10(3):309–332, 1998
  – D. Carmel and S. Markovitch. Exploration strategies for model-based learning in multiagent systems. Autonomous Agents and Multi-agent Systems, 2(2):141–172, 1999