www.univ-nantes.fr
Grammatical Inference:
Learning Automata and
Grammars
Colin de la Higuera, 2016
Kyoto, 11th April 2016
I am
• Professor at Nantes University
• Former President of the Société informatique de
France
• Researcher in Machine Learning
• Guest scholar at Akutsu Lab, Kyoto University (Jan-Jun 2016)
Starting point: Machine Learning
Colin de la Higuera, Nantes, 2016 3
What is machine learning? (1)
• What does a Universal Turing machine do?
– It takes the data and the code and runs the code on the data
– The code is therefore also data
• Next step (as proposed by Turing in 1948):
– The learning machine
– Takes the code and the data and returns the code transformed
Alan Turing, On Computable Numbers with an Application to the Entscheidungsproblem, Proc. London Math.
Soc., 2nd series, vol. 42, 1937, p. 230-265
Alan Turing, Computing machinery and intelligence, Mind, Oxford University Press, vol. 59, no 236, 1950
Alan Turing, 1912-1954
Turing’s dream
What is machine learning? (2)
• Let the data decide (not the algorithm)
• The algorithm can be used to organise, index,
search
• Nothing more
• Typical (or extreme?) application of this idea:
k-nearest neighbours
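As a toy illustration of letting the data decide, here is a minimal 1-nearest-neighbour classifier; the points and labels are made-up data, not from the talk:

```python
# Minimal 1-nearest-neighbour classifier: the "model" is just the stored data.
# The points and labels below are made-up toy data.

def nearest_neighbour(points, labels, query):
    """Return the label of the training point closest to `query`."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    best = min(range(len(points)), key=lambda i: sq_dist(points[i], query))
    return labels[best]

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.8)]
labels = ["blue", "blue", "red", "red"]

print(nearest_neighbour(points, labels, (0.3, 0.1)))  # a query near the "blue" cluster
```

The "model" here is nothing but the data itself: prediction is a lookup of the closest stored point.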
The big data project
Image: https://commons.wikimedia.org/wiki/File:Big_Bang_Data_exhibit_at_CCCB_17.JPG
What is machine learning? (3)
• Use the data to build a model:
– A model is a way to
• Compress the data
• Interpret the data
• Forget the data
The pragmatic approach
My talk at KU-ICR on Thursday
How do we choose the model?
• This depends on the data
• When the data is made of points in a two-dimensional space,
this is easy
• The model can be a half-plane, a line, a polynomial
How do we choose the model?
• This depends on the data
• When the data is made of points in a three-dimensional space,
this is (still) easy
• The model can be a hyperplane, a separating plane, a
polynomial
How do we choose the model?
• This depends on the data
• When the data is made of points in a high-dimensional
space, this is still possible (with linear algebra)
• The model can be a hyperplane, a separating hyperplane, a
function
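As one concrete way of finding a separating hyperplane, here is a sketch of the classic perceptron rule on hypothetical 2-D data (the data and the epoch limit are illustrative assumptions):

```python
# Perceptron: find a separating hyperplane w.x + b = 0 for linearly
# separable data. Toy 2-D points with labels in {-1, +1}.

def train_perceptron(data, epochs=100):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # misclassified point
                w[0] += y * x1
                w[1] += y * x2
                b += y
                errors += 1
        if errors == 0:  # converged: every point is correctly classified
            break
    return w, b

data = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-2.0, -1.0), -1), ((-1.0, -3.0), -1)]
w, b = train_perceptron(data)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
```

On separable data the perceptron is guaranteed to converge to some separating hyperplane, though not necessarily the maximum-margin one that an SVM would find.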
Image: https://i.ytimg.com/vi/Kk6rd4_dAqA/maxresdefault.jpg
Typical techniques today
• Support vector machines
• (deep) Neural networks
Images: https://upload.wikimedia.org/wikipedia/commons/1/10/Svm_10_perceptron.JPG
https://upload.wikimedia.org/wikipedia/commons/3/32/Single-layer_feedforward_artificial_neural_network.png
A research program for computer scientists
• We know how to manipulate strings, trees, graphs
• They are good for modelling
• They contain precious information about the interactions
• Why lose their power?
• The goal is therefore to learn from (such) structured data,
and to learn models adapted to such data
A comparison
• Vector space machine learning (AKA statistical pattern recognition)
– Pros: robust, many algorithms and methods; existence of a topology
– Cons: black box effect; difficult to understand
• Rich data representations (AKA structural pattern recognition)
– Pros: richer representation; possibility of capturing the interactions; intelligibility
– Cons: often less noise resistant; often more expensive
The challenge
• When input is a set of strings, why not learn an
automaton, a formal grammar?
• I.e. a model designed to represent languages!
The data for grammatical inference
The data: examples of strings
A sentence in English and its translation to Japanese:
• What's that called in Japanese?
• あれは日本語で何といいますか。
(Arewa nihongo de nanto iimasu ka?)
Transducers can be used to translate
Inversely, translating
これは、ウサギです ("This is a rabbit") and これは、ウサギのですか ("Is this the rabbit's?")
also needs differing rules
This → これは、
is → λ
a → λ
cat → 猫です
computer → コンピュータ
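The dictionary entries above can be read as a naive word-for-word transducer. The sketch below hardcodes a few hypothetical rules (real translation needs context-dependent rules, which is exactly why differing rules are required):

```python
# A naive word-for-word "transducer": each English word maps to a Japanese
# string, possibly the empty string λ. Toy rules only; a real transducer
# would condition on context.
RULES = {
    "this": "これは、",
    "is": "",   # λ: erased
    "a": "",    # λ: erased
    "cat": "猫です",
}

def translate(sentence):
    """Apply the rules word by word; unknown words pass through unchanged."""
    return "".join(RULES.get(w, w) for w in sentence.lower().split())

print(translate("This is a cat"))
```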
The data: examples of strings
• Time series pose the problem of the alphabet:
– An infinite alphabet?
– Discretizing?
– An ordered alphabet?
Sinus rhythm with acquired long QT, work found via Flickr, by Popfossa, CC BY 2.0
The data: examples of strings
Codis profile, Chemical Science & Technology Laboratory, National Institute of Standards and Technology, work found via Wikipedia, CC BY-SA 3.0
The data: examples of strings
>A BAC=41M14 LIBRARY=CITB_978_SKB
AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA
GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT
GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA
GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA
CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT
GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG
GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA
GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA
CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA
CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA
TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA
TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA
AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA
TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG
TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC
TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA
GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT
GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA
AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC
TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC
TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA
TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT
The data: examples of strings
Cancionero de Palacio, work found via Wikipedia, CC BY-SA 3.0
The data: examples of strings
https://upload.wikimedia.org/wikipedia/commons/3/36/Emperor_family_tree_0_en.png, CC BY-SA 3.0
The data: examples of strings
Phylogenetic Tree, Woese 1990, Maulucioni, work found via Wikipedia, CC BY-SA 3.0
The data: examples of strings
<book>
<part>
<chapter>
<sect1/>
<sect1>
<orderedlist numeration="arabic">
<listitem/>
<f:fragbody/>
</orderedlist>
</sect1>
</chapter>
</part>
</book>
The data: examples of strings
<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
<!ELEMENT pagina (titulus?, poema)>
<!ELEMENT titulus (#PCDATA)>
<!ELEMENT auctor (praenomen, cognomen, nomen)>
<!ELEMENT praenomen (#PCDATA)>
<!ELEMENT nomen (#PCDATA)>
<!ELEMENT cognomen (#PCDATA)>
<!ELEMENT poema (versus+)>
<!ELEMENT versus (#PCDATA)>
]>
<pagina>
<titulus>Catullus II</titulus>
<auctor>
<praenomen>Gaius</praenomen>
<nomen>Valerius</nomen>
<cognomen>Catullus</cognomen>
</auctor>
A linguistic tree. (Courtesy of Mark Knauf and Etsuyo
Yuasa, Department of East Asian Languages and
Literatures (DEALL), Ohio State University.)
Parse trees
And also
• Business processes
• Bird songs
• Images (contours and shapes)
• Robot moves
• Observations of protocols, server exchanges
• Interactions between systems
• …
The models in grammatical inference
An HMM
• https://en.wikipedia.org/wiki/Hidden_Markov_model
Another HMM (proteins)
• http://www.cbs.dtu.dk/~kj/bioinfo_assign2.html
• And a more interesting example:
• http://www.cbs.dtu.dk/~kj/hmm-real-life-example.pdf
A finite state machine
• https://msdn.microsoft.com/en-us/library/aa478972.aspx
Another FSM (a transducer)
• The "3-state busy beaver" Turing machine in a finite state representation. Each
circle represents a "state" of the TABLE, an "m-configuration" or "instruction".
The "direction" of a state transition is shown by an arrow. The label (e.g. 0/P,R) near
the outgoing state (at the "tail" of the arrow) specifies the scanned symbol that
causes a particular transition (e.g. 0), followed by a slash /, followed by the
subsequent "behaviors" of the machine, e.g. "P print" then move tape "R right".
No generally accepted format exists. The convention shown is after McClusky
(1965), Booth (1965), Hill and Peterson (1974).
• https://commons.wikimedia.org/
A transducer
• Comparing nondeterministic and quasideterministic finite-state transducers
built from morphological dictionaries
• Authors: Alicia Garrido-Alenda and Mikel L. Forcada
• https://commons.wikimedia.org/
Stress patterns transducer
• Example: penult; alt secondary
1,w0,s2
2,w0w0,s2s0
3,w0w0w0,s0s2s0
4,w0w0w0w0w0,s0s1s0s2s0
5,w0w0w0w0w0w0w0,s0s1s0s1s0s2s0
6,w0w0w0w0w0w0w0w0w0,s0s1s0s1s0s1s0s2s0
Adapted from http://st2.ullet.net/
A PCFG (so not only finite state machines)
Summarising
• Finite state models
– DFA
– NFA
– PFA
– HMM
– transducer
• Grammatical models
– Context Free Grammar
– Probabilistic Context-Free Grammar
• (many others)
Partial Conclusion
• If we have some strings and want to learn the
models we have just seen… what do we need?
We need… to solve many problems
related to the models themselves
Parsing with a PFA

[Figure: a PFA over {a, b}; transition probabilities include 0.7, 0.45, 0.4, 0.35, 0.3, 0.1, and final probabilities 0.2 and 1]

Pr(aba) = 0.7 × 0.4 × 0.1 × 1 + 0.7 × 0.4 × 0.35 × 0.2 = 0.028 + 0.0196 = 0.0476
[Figure: a PFA over {a, b}, with transition probabilities a 0.9, a 0.7, a 0.3, b 0.1, …]

PrA(b) = 0.1
PrA(aaaaa) = 3 × 0.9 × 0.3² × 0.7² ≈ 0.119

Most probable string is?
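Parsing a string with a PFA means summing the probabilities of all paths that read it, which the forward algorithm does in one left-to-right pass. The two-state PFA below is a made-up example, not the automaton of the figure:

```python
# Probability of a string under a PFA, summing over all paths
# (forward algorithm). The PFA below is a made-up two-state example.

def pfa_prob(initial, transitions, final, word):
    """initial: {state: prob}; transitions: {(state, symbol): [(next, prob)]};
    final: {state: stopping prob}."""
    alpha = dict(initial)                    # forward weights per state
    for sym in word:
        nxt = {}
        for q, p in alpha.items():
            for q2, tp in transitions.get((q, sym), []):
                nxt[q2] = nxt.get(q2, 0.0) + p * tp
        alpha = nxt
    return sum(p * final.get(q, 0.0) for q, p in alpha.items())

initial = {0: 1.0}
final = {0: 0.2, 1: 0.5}
transitions = {
    (0, "a"): [(0, 0.3), (1, 0.5)],  # two nondeterministic a-transitions
    (1, "b"): [(1, 0.5)],
}
print(pfa_prob(initial, transitions, final, "ab"))  # 1.0*0.5*0.5*0.5 = 0.125
```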
Most probable string: problems

Name: Most probable string (MPS)
• Instance: a probabilistic automaton A, a probability p > 0
• Question: is there in Σ* a string x such that PrA(x) > p?

Name: Consensus string (CS)
• Instance: a probabilistic automaton A
• Question: find in Σ* a string x such that ∀y ∈ Σ*: PrA(y) ≤ PrA(x)
Results (cdlh & Oncina 2013)
• Key lemma: if w has probability p, then it has length at most |A|²/p
• As a corollary, MPS is decidable!
• There exists an algorithm solving CS whose complexity is O(|Σ|·|A|²/popt²)
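The key lemma turns MPS into a finite search: only strings of length at most |A|²/p can have probability above p. A brute-force sketch on a hypothetical one-state PFA (a real implementation would parse a general automaton, e.g. with the forward algorithm):

```python
from itertools import product

# Decide MPS by brute force: by the key lemma, a string with probability > p
# has length at most |A|^2 / p, so only finitely many candidates exist.
# Toy one-state PFA over {a}: Pr(a^n) = 0.5^n * 0.5.

def prob(word):
    p = 1.0
    for sym in word:
        p *= 0.5 if sym == "a" else 0.0
    return p * 0.5  # stopping probability

def mps(alphabet, n_states, p):
    bound = int(n_states ** 2 / p)  # maximal length worth checking
    for length in range(bound + 1):
        for w in product(alphabet, repeat=length):
            if prob(w) > p:
                return True
    return False

print(mps(["a"], 1, 0.4))  # λ has probability 0.5 > 0.4
print(mps(["a"], 1, 0.6))  # no string exceeds 0.6
```

The search is exponential in the bound, so this is a decision procedure rather than a practical algorithm; the point is decidability.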
Results (recent)
• Suppose we are trying to find the median string, that is, the
string w minimizing Σx∈Σ* de(w,x)·PrD(x)
• Then how do we compute this value?
• Currently, we are at least able to compute
Σx∈Σ* de(w,x)·PrD(x) for a given w
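For a distribution with finite support, the quantity Σ de(w,x)·PrD(x) for a given w is a direct sum; the distribution below is a made-up example:

```python
# Expected edit distance of a string w under a distribution D:
# sum over x of d_e(w, x) * Pr_D(x). Toy finite-support distribution.

def edit_distance(u, v):
    """Classic Levenshtein distance by dynamic programming."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        cur = [i]
        for j, cv in enumerate(v, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cu != cv)))   # substitution
        prev = cur
    return prev[-1]

def expected_distance(w, dist):
    return sum(edit_distance(w, x) * p for x, p in dist.items())

D = {"ab": 0.5, "abc": 0.3, "b": 0.2}
print(expected_distance("ab", D))  # 0*0.5 + 1*0.3 + 1*0.2 = 0.5
```

For a distribution given by a probabilistic automaton the support is infinite, which is what makes the general computation non-trivial.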
How do we define learning?
What are we hoping for? [the data]
• We are given some strings
• We are given some labelled strings
• We are not given any strings but can ask
questions
• (instead of strings, you can think graphs or trees)
What are we hoping for? [the result]
• Given some strings, perhaps some labels for these
strings, build an FSM
• Possible extra tasks
– Be robust
– Be fast
– Be able to prove that the result is “good”
Learning models
• We can prove that algorithms "learn":
– that they can correctly identify something
– that they converge, decreasing the generalisation
error
Just one complete example
The problem:
• An agent must take cooperative decisions in a multi-agent world
• Its decisions will depend:
– on what it hopes to win or lose
– on the actions of the other agents
Hypothesis:

The opponent follows a rational strategy (given by a
DFA/Moore machine)

ME: equations (e) or pictures (p)
YOU: listen (l) or doze (d)

[Figure: an example of a rational strategy, drawn as a small Moore machine over the moves e, p, l, d]
The prisoner's dilemma

Example:
• Each prisoner can admit (a) or stay silent (s)
– If both admit: 3 years (prison) each
– If A admits but not B: A = 0 years, B = 5 years
– If B admits but not A: B = 0 years, A = 5 years
– If neither admits: 1 year each
Example:

Payoffs (A, B), with a = admit, s = stay silent:

         B: a       B: s
A: a   (-3, -3)   (0, -5)
A: s   (-5, 0)    (-1, -1)
• We study an iterated version of the game, against an
opponent who follows a rational strategy
• Gain Function: limit of means (average over a very long series
of moves)
• For example, if we get into a recurrent situation where we
both admit, the gain will be -3
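The payoffs and the limit-of-means gain can be sketched as follows; the finite average below approximates the limit:

```python
# Prisoner's dilemma payoffs: each player admits (a) or stays silent (s).
# PAYOFF[(my_move, his_move)] = my gain (negated years of prison).
PAYOFF = {("a", "a"): -3, ("a", "s"): 0, ("s", "a"): -5, ("s", "s"): -1}

def mean_gain(my_moves, his_moves):
    """Average gain over a series of rounds: a finite approximation of
    the limit-of-means gain."""
    gains = [PAYOFF[(m, h)] for m, h in zip(my_moves, his_moves)]
    return sum(gains) / len(gains)

# If we get into a recurrent situation where both always admit:
print(mean_gain("a" * 1000, "a" * 1000))  # -3.0
```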
The general problem
• We suppose that the strategy of the opponent is given by
a deterministic finite automaton (DFA)
• Can we imagine an optimal strategy?
Running example

[Figure: the opponent's strategy, a DFA over the moves a and s]
Running example

Suppose we know the opponent's strategy
• Then (game theory):
– Consider the opponent's graph, in which we value the edges by our own gain,
and find the best (infinite) path in the graph
Running example

Find the cycle of maximum mean weight
Find the best path leading to this cycle of maximum mean weight
Follow the path and stay in the cycle
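For the small machines considered here, the cycle of maximum mean weight can be found by brute-force enumeration of simple cycles (Karp's algorithm is the scalable alternative). The gain-valued graph below is hypothetical, chosen so that the best mean is -0.5 as in the example:

```python
# Find the maximum mean-weight cycle by enumerating simple cycles.
# Fine for tiny DFAs; Karp's O(nm) algorithm scales better.

def best_mean_cycle(nodes, edges):
    """edges: list of (u, v, weight). Returns the best mean weight, or None."""
    best = [None]

    def dfs(start, node, visited, weight, length):
        for u, v, w in edges:
            if u != node:
                continue
            if v == start:  # cycle closed: score its mean weight
                mean = (weight + w) / (length + 1)
                if best[0] is None or mean > best[0]:
                    best[0] = mean
            elif v not in visited:
                dfs(start, v, visited | {v}, weight + w, length + 1)

    for s in nodes:
        dfs(s, s, {s}, 0.0, 0)
    return best[0]

# Hypothetical gain-valued graph: the 2-cycle with mean (0 + -1)/2 = -0.5
# beats the self-loop of weight -3.
edges = [(0, 1, 0.0), (1, 0, -1.0), (0, 0, -3.0)]
print(best_mean_cycle([0, 1], edges))  # -0.5
```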
Running example

Find the cycle of maximum mean weight
Find the best path leading to this cycle of maximum mean weight
Follow the path and stay in the cycle

[Figure: the opponent's DFA with edges valued by our gains (0, -1, -3, -5); the best cycle has mean weight -0.5, reached by the best path]
Question
Can we play a game against this opponent and…
can we then reconstruct its strategy?
The data (him, me)

The game: a a a s s a a a a s s s s s s a s a

ME      HIM
λ       a
a       a
as      s
asa     a
asaa    a
asaas   s
asaass  s

If I play asa, his move is a
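Such observations are naturally stored as a map from my move history to his reply, from which a prefix-tree Moore machine follows immediately; a sketch (state names are simply the prefixes):

```python
# Observations: my move history -> his next move (from the game above).
OBS = {
    "": "a", "a": "a", "as": "s", "asa": "a",
    "asaa": "a", "asaas": "s", "asaass": "s",
}

def prefix_tree(obs):
    """Build a prefix-tree Moore machine: one state per observed prefix."""
    states = set(obs)
    delta = {}   # (state, my_move) -> state
    out = {}     # state -> his move
    for prefix, reply in obs.items():
        out[prefix] = reply
        if prefix:  # link from the parent prefix
            delta[(prefix[:-1], prefix[-1])] = prefix
    return states, delta, out

states, delta, out = prefix_tree(OBS)
print(len(states), out["asa"])  # one state per prefix; after asa he plays a
```

This tree is only the raw data in automaton form; learning consists of merging its states into a small machine.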
The logic of the algorithm
• The goal is to be able to parse and to have a partial solution consistent
with the data
• The algorithm is loosely inspired by a number of grammatical inference
algorithms
• It is greedy
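The central subroutine of such a greedy learner is the consistency check: run every observed history through a candidate Moore machine and compare its output with the recorded reply. The two candidate machines below are hypothetical, not the ones from the slides:

```python
# Check a candidate Moore machine against the observations.
# Machine: initial state q0, transitions delta[(state, my_move)] -> state,
# outputs out[state] = his move.

OBS = {
    "": "a", "a": "a", "as": "s", "asa": "a",
    "asaa": "a", "asaas": "s", "asaass": "s",
}

def consistent(delta, out, q0, obs):
    for history, reply in obs.items():
        q = q0
        for move in history:
            q = delta[(q, move)]
        if out[q] != reply:
            return False
    return True

# One state that always answers a: inconsistent (as -> s is observed).
one_state = ({("A", "a"): "A", ("A", "s"): "A"}, {"A": "a"})
print(consistent(*one_state, "A", OBS))

# Two states: answer a until I play s, then s until I play a again.
two_state = (
    {("A", "a"): "A", ("A", "s"): "S", ("S", "a"): "A", ("S", "s"): "S"},
    {"A": "a", "S": "s"},
)
print(consistent(*two_state, "A", OBS))
```

The greedy learner tries the smallest candidate first (Occam's razor) and only adds states when this check fails.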
The algorithm

The first decision
Sure: λ → a
Have to deal with: a → a

[Figure: the one-state hypothesis built so far, and the a-transition still to be placed]
The algorithm

The candidates

[Figure: the two candidate automata for the a-transition: a loop on the initial state, or a new state]

Occam's razor:
Entia non sunt multiplicanda praeter necessitatem
("Entities should not be multiplied unnecessarily")
The algorithm

The second decision
Sure: λ → a, a → a
Have to deal with: as → ?

[Figure: the current hypothesis, an a-loop on the initial state, with the s-transition still to be placed]
The algorithm

The third decision
Data so far: λ → a, a → a, as → s, asa → ?

[Figure: looping the s-transition on the single state is inconsistent; adding a second state, outputting s, is consistent]
The algorithm

The three candidates

[Figure: the three candidate automata over the two states outputting a and s]
The algorithm

The fourth decision
Consistent: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s
Have to deal with: asaass → ?

[Figure: the current candidate automaton]
The algorithm

The fifth decision
Inconsistent with the data: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s, asaass → s, asaasss → s, asaasssa → s

[Figure: the rejected candidate automaton]
The algorithm

The fifth decision
Consistent: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s, asaass → s
Have to deal with: asaasss → ?

[Figure: the current candidate automaton]
The algorithm

The sixth decision
Inconsistent with the data: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s, asaass → s, asaasss → s, asaasssa → s

[Figure: the rejected candidate automaton]
The algorithm

The sixth decision
Consistent: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s, asaass → s, asaasss → s
Have to deal with: asaasssa → ?

[Figure: the current candidate automaton]
The algorithm

The seventh decision
Inconsistent with the data: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s, asaass → s, asaasss → s, asaasssa → s

[Figure: the rejected candidate automaton]
The algorithm

The seventh decision
Consistent: λ → a, a → a, as → s, asa → a, asaa → a, asaas → s, asaass → s, asaasss → s, asaasssa → s

[Figure: the accepted candidate automaton]
The algorithm

The result

[Figure: the learned Moore machine over the moves a and s, consistent with all the observations]
How do we get hold of the learning data?
a) through observation (like here)
b) through exploration
An open problem

The strategy is probabilistic:

[Figure: a probabilistic Moore machine over the moves a and s, with state outputs a: 20% / s: 80%, a: 50% / s: 50%, a: 70% / s: 30%]
Tit for tat

[Figure: the tit-for-tat strategy as a Moore machine over the moves a and s]
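Tit for tat has a one-line operational reading: echo the opponent's previous move. A sketch, assuming the strategy opens by staying silent (the opening move is an assumption):

```python
# Tit for tat: open with a fixed move, then always repeat the
# opponent's previous move. The opening move "s" is an assumption.

def tit_for_tat(opponent_moves, first_move="s"):
    """Return our replies to a sequence of opponent moves."""
    replies = [first_move]
    for move in opponent_moves[:-1]:
        replies.append(move)  # echo the previous opponent move
    return "".join(replies)

print(tit_for_tat("aassa"))  # "saass": s first, then echo a, a, s, s
```

As a Moore machine this needs only one state per possible opponent move, which is why it is such a compact rational strategy.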
Summarising and concluding
Time to say more about grammatical inference
• Machine learning where the data is strings and the models
are finite state machines
• Many applications (and new ones!)
• Many open questions (in fact, applications direct the
questions)
• Researchers in many countries, including Japan
– Etsuji Tomita, Thomas Zeugmann, Yasubumi Sakakibara,
Ryo Yoshinaka, Makoto Kanazawa, Takashi Yokomori
– And many others!
Acknowledgements
• This presentation includes ideas that have appeared after working
with or reading the works of many people.
• Any list is necessarily arbitrary and insufficient.
• But at least, thanks to:
– Peter Flach (Machine Learning, Cambridge University Press)
– D. Carmel and S. Markovitch. Model-based learning of interaction strategies
in multi-agent systems. Journal of Experimental and Theoretical Artificial
Intelligence, 10(3):309–332, 1998
– D. Carmel and S. Markovitch. Exploration strategies for model-based
learning in multiagent systems. Autonomous Agents and Multi-agent
Systems, 2(2):141–172, 1999