Page 1:

Machine learning and the expert in the loop

Michèle Sebag

TAO

ECAI 2014, Frontiers of AI

Page 2:

Centennial + 2

Computing Machinery and Intelligence Turing 1950

... the problem is mainly one of programming. Brain storage estimates: 10^10 to 10^15 bits. I can produce about a thousand digits of program lines a day.

[Therefore] a more expeditious method seems desirable.

⇒ Machine Learning

Page 3:

ML envisioned by Alan Turing

The process of creating a mind

• Initial state [the innate]  (ML expert)

• Education [environment, teacher]  (domain expert)

• Other

The teaching process... We normally associate punishments and rewards with the teaching process...

One could carry through the organization of an intelligent machine with only two interfering inputs, one for pleasure or reward, and the other for pain or punishment.

This talk: formulating the Pleasure-and-Pain ML agenda

Page 4:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 5:

ML: All you need is logic

Perception → Symbols → Reasoning → Symbols → Actions

Let’s forget about perception and actions for a while...

Symbols → Reasoning → Symbols

Requisite

• Strong representation

• Strong background knowledge

• [Strong optimization tool] if numerical parameters are involved  (cf. F. Fages)

Page 6:

The Robot Scientist

King et al, 04, 11

Principle: generate hypotheses from background knowledge and experimental data; design experiments to confirm or refute the hypotheses.

Adam: drug screening, hit confirmation, and cycles of QSAR hypothesis learning and testing.

Eve: likewise, applied to orphan diseases.

Page 7:

ML: The logic era

So efficient

• Search: reuse constraint solving, graph pruning, ...

Requirements / Limitations

• Initial conditions: critical mass of high-order knowledge

• ... and a unified search space  (cf. A. Saffiotti)

• Symbol grounding, noise

Of primary value: intelligibility

• A means: for debugging

• An end: to keep the expert involved.

Page 8:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 9:

ML: All you need is data

Old times: datasets were rare

• Are we overfitting the Irvine repository?

• [Current: are we overfitting MNIST?]

The drosophila of AI

Page 10:

ML: All you need is data

Now

• Sky is the limit!

• Logic → Compression  (Marcus Hutter, 2004)

• Compression → symbols, distribution

Page 11:

Big data

IBM Watson defeats human champions at the quiz game Jeopardy

i        1     2     3     4     5     6     7      8
1000^i   kilo  mega  giga  tera  peta  exa   zetta  yotta  (bytes)

• Google: 24 petabytes/day

• Facebook: 10 terabytes/day; Twitter: 7 terabytes/day

• Large Hadron Collider: 40 terabytes/second

Page 12:

The Higgs boson ML Challenge

Balázs Kégl, Cécile Germain et al.

https://www.kaggle.com/c/higgs-boson  (deadline: September 15th, 2014)

Page 13:

The LHC in Geneva

ATLAS Experiment © 2014 CERN

Page 14:

The ATLAS detector

ATLAS Experiment © 2014 CERN

Page 15:

An event in the ATLAS detector

ATLAS Experiment © 2014 CERN

Page 16:

The data

• Hundreds of millions of proton-proton collisions per second

• hundreds of particles: decay products

• hundreds of thousands of sensors (but sparse)

• for each particle: type, energy, and direction are measured

• a fixed list of ∼30-40 extracted features: x ∈ R^d

• e.g., angles, energies, directions, number of particles

• discriminating between signal (the particle we are looking for) and background (known particles)

• Filtered down to 400 events per second, still petabytes per year

• real-time (budgeted) classification – a research theme on its own

• cascades, cost-sensitive sequential learning

Page 17:

The analysis

• Highly unbalanced data:

• in the H → ττ channel we expect to see < 100 Higgs bosons per year in 400 × 60 × 60 × 24 × 365 ≈ 10^10 events

• after pre-selection, we will have 500K background (negative) and 1K signal (positive) events

• The goal is not classification but discovery

• a classifier is used to define a (usually tiny) selection region in R^d

• a counting test is used to determine whether the number of observed events in the selection region significantly exceeds the expected number of events predicted by a background-only hypothesis
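The challenge's objective is significance of discovery rather than accuracy; a minimal sketch of such a counting test, using the Approximate Median Significance (AMS) from the Kaggle task (the b_reg = 10 regularization term is the challenge's; the toy counts below are made up):

```python
import numpy as np

def ams(s, b, b_reg=10.0):
    """Approximate Median Significance of the counting test:
    s, b = expected signal/background counts in the selection region."""
    return np.sqrt(2 * ((s + b + b_reg) * np.log(1 + s / (b + b_reg)) - s))

# A tiny, pure selection region beats a loose one despite keeping less signal:
print(ams(s=40, b=100))     # ~3.6 sigma
print(ams(s=500, b=50000))  # ~2.2 sigma
```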

Page 18:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 19:

ML: All you need is optimization

Old times

• Find the best hypothesis

• Find the best optimization criterion
  - statistically sound
  - such that it defines a well-posed optimization problem
  - tractable

Page 20:

SVMs and Deep Learning

Episode 1 Amari, 79; Rumelhart & McClelland 86; Le Cun, 86

• NNs are universal approximators, ...

• ... but their training yields non-convex optimization problems

• ... and some cannot reproduce the results of some others...

Page 21:

SVMs and Deep Learning

Episode 2

• At last, SVMs arrive!  (Vapnik 92; Cortes & Vapnik 95)

• Principle
  - Min ||h||^2
  - subject to constraints on h(x) (modelling the data):

    y_i · h(x_i) > 1,  |h(x_i) − y_i| < ε,  h(x_i) < h(x'_i),  h(x_i) > 1, ...
    (classification, regression, ranking, distribution, ...)

• Convex optimization! (well, except for hyper-parameters)

• More sophisticated optimization (alternate, upper bounds)...

Boyd & Vandenberghe 04; Bach 04; Nesterov 07; Friedman et al. 07; ...
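A toy illustration of this margin formulation, assuming scikit-learn is available (a large C approximates the hard-margin problem min ||h||^2 s.t. y_i · h(x_i) ≥ 1; the data is made up):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data.
X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., 3.], [4., 2.], [5., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # near hard-margin
print(clf.coef_, clf.intercept_)              # the learned linear h
print(y * clf.decision_function(X))           # all margins >= 1 (up to tolerance)
```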

Page 22:

SVMs and Deep Learning

Episode 3

• Did you forget our AI goal? (learning ↔ learning representation)

• At last, Deep Learning arrives!

Principle

• We always knew that many-layered NNs offered compact representations  (Håstad 87)

2^n neurons on 1 layer vs. n neurons on log n layers

Page 23:

SVMs and Deep Learning

Episode 3

• Did you forget our AI goal? (learning ↔ learning representation)

• At last, Deep Learning arrives!

Principle

• We always knew that many-layered NNs offered compact representations  (Håstad 87)

• But, so many poor local optima!

Page 24:

SVMs and Deep Learning

Episode 3

• Did you forget our AI goal? (learning ↔ learning representation)

• At last, Deep Learning arrives!

Principle

• We always knew that many-layered NNs offered compact representations  (Håstad 87)

• But, so many poor local optima!

• Breakthrough: unsupervised layer-wise learning  (Hinton 06; Bengio 06)

Page 25:

SVMs and Deep Learning

From prototypes to features

• n prototypes → n regions

• n features → 2^n regions

Tutorial: Bengio, ICML 2012

Page 26:

SVMs and Deep Learning

Last Deep news

• Supervised training works, after all  (Glorot & Bengio 10)

• Does not need to be deep, after all  (Ciresan et al. 13; Caruana 13)

Page 27:

SVMs and Deep Learning

Last Deep news

• Supervised training works, after all  (Glorot & Bengio 10)

• Does not need to be deep, after all  (Ciresan et al. 13; Caruana 13)

• Ciresan et al.: use prior knowledge (non-linear invariance operators) to generate new examples

• Caruana: use a deep NN to label hosts of examples; use them to train a shallow NN.

Page 28:

SVMs and Deep Learning

Last Deep news

• Supervised training works, after all  (Glorot & Bengio 10)

• Does not need to be deep, after all  (Ciresan et al. 13; Caruana 13)

• SVMers' view: the deep thing is linear learning complexity

Take-home message

• It works

• But why?

• Intelligibility?

Page 29:

SVMs and Deep Learning

Last Deep news

• Supervised training works, after all  (Glorot & Bengio 10)

• Does not need to be deep, after all  (Ciresan et al. 13; Caruana 13)

• SVMers' view: the deep thing is linear learning complexity

Take-home message

• It works

• But why?

• Intelligibility? No doubt you recognize a cat  (Le et al. 12)

Page 30:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 31:

Reinforcement Learning

Generalities

• An agent, spatially and temporally situated

• Stochastic and uncertain environment

• Goal: select an action in each time step,

• ... in order to maximize the expected cumulative reward over a time horizon

What is learned? A policy = strategy: state ↦ action

Page 32:

Reinforcement Learning, formal background

Notations

• State space S
• Action space A
• Transition model p(s, a, s′) ∈ [0, 1]
• Reward r(s)
• Discount factor 0 < γ < 1

Goal: a policy π mapping states onto actions,

π : S ↦ A

s.t.

Maximize E[π | s_0] = expected discounted cumulative reward
  = r(s_0) + Σ_t γ^{t+1} p(s_t, a = π(s_t), s_{t+1}) r(s_{t+1})

Page 33:

Find the treasure

Single reward: on the treasure.

Page 34:

Wandering robot

Nothing happens...

Page 35:

The robot finds it

Page 36:

Robot updates its value function

V(s, a) = "distance" to the treasure along the trajectory.

Page 37:

Reinforcement learning

• Robot most often selects a = argmax V(s, a)
• ... and sometimes explores (selects another action)
• Lucky exploration: finds the treasure again

Page 38:

Updates the value function

• The value function tells how far you are from the treasure, given the known trajectories.

Page 39:

Finally

• The value function tells how far you are from the treasure

Page 40:

Finally

Let's be greedy: select the action maximizing the value function.

Page 41:

Reinforcement learning

Three interdependent tasks

• Learn the value function

• Learn the transition model

• Explore the world

Issues

• Exploration / exploitation dilemma

• Representation, approximation, scaling up

• REWARDS  (the designer's duty)
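The treasure-hunt loop above (mostly act greedily, sometimes explore, propagate the reward backwards) is essentially tabular Q-learning; a minimal sketch on a hypothetical 1-D corridor with a single reward on the treasure:

```python
import numpy as np

n_states, moves = 10, (-1, +1)      # corridor; treasure in the last cell
gamma, alpha, eps = 0.9, 0.5, 0.1   # discount, learning rate, exploration rate
Q = np.zeros((n_states, len(moves)))
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # mostly greedy, sometimes explore
        a = rng.integers(len(moves)) if rng.random() < eps else int(np.argmax(Q[s]))
        s2 = min(max(s + moves[a], 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0   # single reward: on the treasure
        # propagate value ("distance to the treasure") backwards
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.max(axis=1))      # value decays with distance to the treasure
print(Q.argmax(axis=1))   # greedy policy: move right in every non-terminal state
```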

Page 42:

Reinforcement learning and the expert

Input needed from the expert

• A reward function: standard RL  (Sutton & Barto 08; Szepesvári 10)

• An expert demonstrating an "optimal" behavior: inverse RL  (Abbeel et al. 04-12; Billard et al. 05-13; Lagoudakis & Parr 03; Konidaris et al. 10)

• A reliable teacher: preference-based RL  (Akrour et al. 11; Wilson et al. 12; Knox et al. 13; Saxena et al. 13)

• A teacher  (Akrour et al. 14)

Page 43:

Strong expert jumps in and demonstrates the behavior
Inverse reinforcement learning

• From demonstrations to rewards: from (s_t, a_t, s_{t+1}), learn a reward function r s.t.

Q(s_i, a_i) ≥ Q(s_i, a) + 1,  ∀a ≠ a_i

(Ng & Russell 00; Abbeel & Ng 04; Kolter et al. 07)

• Then apply standard RL

Page 44:

Inverse Reinforcement Learning
Issues

• An informed representation (speed; bumping into a pedestrian)
• A strong expert  (Kolter et al. 07; Abbeel 08)

In some cases there is no expert

Swarm-bot (2001-2005); Swarm Foraging, UWE

Symbrion IP, 2008-2013; http://symbrion.org/

Page 46:

Expert jumps in and provides preferences...

(Cheng et al. 11; Fürnkranz et al. 12)

Context

• Medical prescription

• What is the cost of a death event?

Approach

a ≺_{π,s} a′ iff following π after (s, a) is better than after (s, a′)

Page 47:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 48:

The 20 Question game

• First a society game  (19th century)

• Then a radio game  (late 1940s-50s)

• Then an AI game  (Burgener, 88)

http://www.20q.net/

20 questions → 20 bits → discriminates among 2^20 ≈ 10^6 words.

Page 49:

Interactive optimization

Optimizing the coffee taste  (Herdy et al., 96)
Black-box optimization:

F : Ω → R    Find argmax F

The user in the loop replaces F

Optimizing visual rendering Brochu et al., 07

Optimal recommendation sets Viappiani & Boutilier, 10

Information retrieval Shivaswamy & Joachims, 12

Page 51:

Interactive optimization

Features

• Search space X ⊂ R^d  (recipe x: 33% arabica, 25% robusta, etc.)

• A non-computable objective

• The expert can (by tasting) emit preferences x ≺ x′.

Scheme

1. Algorithm generates candidates x, x′, x″, ...

2. Expert emits preferences

3. Goto 1.

Issues

• Asking as few questions as possible  (≠ active ranking)

• Modelling the expert's taste  (a surrogate model)

• Enforcing the exploration vs. exploitation trade-off

Page 52:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 53:

Programming by feedback

Akrour et al. 14

1. Computer presents the expert with a pair of behaviors y_{t,1}, y_{t,2}

2. Expert emits the preference y_{t,1} ≻ y_{t,2}

3. Computer learns expert’s utility function

4. Computer searches for behaviors with best utility

5. Goto 1
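A schematic of this loop under the formal setting introduced on the next slides (utility linear in a behavior descriptor); `sample_posterior`, `best_pair`, and `ask_expert` are hypothetical placeholders, not the paper's actual components:

```python
import numpy as np

def programming_by_feedback(sample_posterior, best_pair, ask_expert, T=20):
    """sample_posterior(prefs) -> array of w samples given preferences so far
    best_pair(ws)           -> two candidate behaviors (vectors) to compare
    ask_expert(y1, y2)      -> True iff the expert prefers y1"""
    prefs = []
    for t in range(T):
        ws = sample_posterior(prefs)    # 3. learn the utility (posterior on w)
        y1, y2 = best_pair(ws)          # 4. search behaviors with best utility
        if ask_expert(y1, y2):          # 1.-2. query the expert
            prefs.append((y1, y2))      # y1 preferred to y2
        else:
            prefs.append((y2, y1))
    ws = sample_posterior(prefs)
    seen = [y for pair in prefs for y in pair]
    return max(seen, key=lambda y: float(np.mean(ws @ y)))  # best behavior found
```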

Page 54:

Relaxing Expertise Requirements: The trend

Expert  (top to bottom: decreasing required expertise)

• Associates a reward with each state: RL

• Demonstrates a (nearly) optimal behavior: inverse RL

• Compares and revises agent demonstrations: co-active PL

• Compares demonstrations: preference PL, PF

Agent  (top to bottom: increasing autonomy)

• Computes the optimal policy based on rewards: RL

• Imitates the expert's demonstration verbatim: IRL

• Imitates and modifies: IRL

• Learns the expert's utility: IRL, CPL

• Learns, and selects demonstrations: CPL, PPL, PF

• Accounts for the expert's mistakes: PF

Page 55:

Programming by feedback

Critical issues

• Asks few preference queries. Not active preference learning: sequential model-based optimization  (cf. H. Hoos' talk)

• Accounts for preference noise:
  - the expert changes his mind
  - the expert makes mistakes
  - ... especially at the beginning

Page 56:

Formal setting

X: search space / solution space  (controllers, R^D)

Y: evaluation space / behavior space  (trajectories, R^d)

Φ : X → Y

Utility function

U* : Y → R,    U*(y) = ⟨w*, y⟩                      (behavior space)
U*_X : X → R,  U*_X(x) = E_{y∼Φ(x)}[U*(y)]          (search space)

Requisites

• Evaluation space: simple, to learn from few queries

• Search space: sufficiently expressive

Page 57:

Programming by Feedback

Ingredients

• Modelling the expert's competence

• Learning the expert's utility

• Selecting the next best behaviors
  - which optimization criterion
  - how to optimize it

Page 58:

Modelling the expert’s competence

Noise model: δ ∼ U[0, M]
Given the preference margin z = ⟨w*, y − y′⟩,

P(y ≻ y′ | w*, δ) =
  0            if z < −δ
  1            if z > δ
  (1 + z/δ)/2  otherwise

[Plot: probability of error as a function of the preference margin z; it is 1/2 at z = 0 and decays linearly to 0 at z = ±δ]

Page 59:

Learning the expert’s utility function

Data: U_t = {y_0, y_1, ...; (y_{i,1} ≻ y_{i,2}), i = 1...t}
• trajectories y_i
• preferences y_{i,1} ≻ y_{i,2}

Learning: find θ_t, the posterior on W  (W = linear functions on Y)

Proposition: Given U_t,

θ_t(w) ∝ Π_{i=1..t} P(y_{i,1} ≻ y_{i,2} | w)
       = Π_{i=1..t} ( 1/2 + (w_i / 2M) (1 + log(M / |w_i|)) )

with w_i = ⟨w, y_{i,1} − y_{i,2}⟩, capped to [−M, M].

U_t(y) = E_{w∼θ_t}[⟨w, y⟩]
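A numpy sketch of these two formulas, representing θ_t by reweighted prior samples of w (a Monte Carlo shortcut; the paper's actual sampler may differ):

```python
import numpy as np

M = 1.0  # upper bound of the noise model, delta ~ U[0, M]

def pref_likelihood(ws, y1, y2):
    """P(y1 > y2 | w) for each sample w, after marginalizing delta ~ U[0, M]."""
    z = np.clip(ws @ (y1 - y2), -M, M)   # w_i, capped to [-M, M]
    az = np.maximum(np.abs(z), 1e-12)    # guard the log at z = 0
    return 0.5 + z / (2 * M) * (1 + np.log(M / az))

def posterior_weights(ws, prefs):
    """theta_t over prior samples ws, given pairs (y1, y2) with y1 preferred."""
    p = np.ones(len(ws))
    for y1, y2 in prefs:
        p *= pref_likelihood(ws, y1, y2)
    return p / p.sum()

def utility(y, ws, weights):
    """U_t(y) = E_{w ~ theta_t}[<w, y>]."""
    return float(weights @ (ws @ y))
```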

Page 60:

Best demonstration pair (y, y′)  (inspiration: Viappiani & Boutilier, 10)

EUS: Expected utility of selection (greedy)

EUS(y, y′) = E_{θ_t}[⟨w, y − y′⟩ > 0] · U_{w∼θ_t, y≻y′}(y)
           + E_{θ_t}[⟨w, y − y′⟩ < 0] · U_{w∼θ_t, y≺y′}(y′)

EPU: Expected posterior utility (lookahead)

EPU(y, y′) = E_{θ_t}[⟨w, y − y′⟩ > 0] · max_{y″} U_{w∼θ_t, y≻y′}(y″)
           + E_{θ_t}[⟨w, y − y′⟩ < 0] · max_{y″} U_{w∼θ_t, y≺y′}(y″)
           = E_{θ_t}[⟨w, y − y′⟩ > 0] · U_{w∼θ_t, y≻y′}(y*)
           + E_{θ_t}[⟨w, y − y′⟩ < 0] · U_{w∼θ_t, y≺y′}(y′*)

Therefore
argmax EPU(y, y′) ≤ argmax EUS(y, y′)

Page 61:

Optimization in demonstration space

NL: noiseless; N: noisy

Proposition

EUS^NL(y, y′) − L ≤ EUS^N(y, y′) ≤ EUS^NL(y, y′)

Proposition

max EUS^NL_t(y, y′) − L ≤ max EPU^N_t(y, y′) ≤ max EUS^NL_t(y, y′) + L

Limited loss incurred (L ∼ M/20)

Page 62:

Optimization in solution space

1. Find the best (y, y′) → find the best y, to be compared with the best behavior so far, y*_t

The game of hot and cold

2. Expectation of behavior utility → utility of the expected behavior.
Given the mapping Φ from search to demonstration space,

E_Φ[EUS^NL(Φ(x), y*_t)] ≥ EUS^NL(E_Φ[Φ(x)], y*_t)

3. Iterative solution optimization

• Draw w_0 ∼ θ_t and let x_1 = argmax ⟨w_0, E_Φ[Φ(x)]⟩
• Iteratively, find x_{i+1} = argmax ⟨E_{θ_i}[w], E_Φ[Φ(x)]⟩, with θ_i the posterior given E_Φ[Φ(x_i)] ≻ y*_t.

Proposition. The sequence monotonically converges toward a local optimum of EUS^NL.

Page 63:

Experimental validation

• Sensitivity to expert competence: simulated expert, grid world

• Continuous case, no generative model: the cartpole

• Continuous case, generative model: the bicycle

• Training in situ: the Nao robot

Page 64:

Sensitivity to (simulated) expert incompetence

Grid world: discrete case, no generative model.
25 states, 5 actions, horizon 300; 50% of transitions leave the robot motionless.

M_E: expert incompetence
M_A ≥ M_E: computer estimate of the expert's incompetence

[Figure: true w* on the grid world; weights range from 1 on the goal cell down to 1/256]

Page 65:

Sensitivity to (simulated) expert incompetence, 2

[Plots: true utility of x_t and expert error rate vs. number of queries, for (M_E, M_A) ∈ {(.25, .25), (.25, .5), (.25, 1), (.5, .5), (.5, 1), (1, 1)}]

A cumulative (dis)advantage phenomenon: the number of the expert's mistakes increases as the computer underestimates the expert's competence. For low M_A, the computer learns faster and submits more relevant demonstrations to the expert, thus priming a virtuous educational process.

Page 66:

Continuous Case, no Generative Model

The cartpole
State space R^2, 3 actions. Demonstration space R^9, demonstration length 3,000.

[Plots: true utility of x_t (fraction of time in equilibrium) and estimated feature weights (a Gaussian centered on the equilibrium state) vs. number of queries, for the same (M_E, M_A) settings]

Two interactions are required on average to solve the cartpole problem. No sensitivity to noise.

Page 67:

Continuous Case, with Generative Model

The bicycle
Solution space R^210 (NN weight vector).
State space R^4, action space R^2, demonstration length ≤ 30,000.

[Plot: true utility vs. number of queries, for M_E = 1, M_A = 1]

Optimization component: CMA-ES  (Hansen et al., 2001)

15 interactions are required on average to solve the problem for low noise, versus 20 queries with discrete actions in the state of the art.
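A minimal ask/tell sketch of the CMA-ES component with the `cma` Python package (assumed installed; `utility_estimate` is a hypothetical stand-in for the learned utility of a controller, estimated from simulated rollouts):

```python
import cma  # pip install cma

def maximize_utility(utility_estimate, dim=210, sigma0=0.5):
    """Search the controller space (e.g. a 210-D NN weight vector)
    for the maximizer of the current utility surrogate."""
    es = cma.CMAEvolutionStrategy(dim * [0.0], sigma0)
    while not es.stop():
        xs = es.ask()                                    # candidate controllers
        es.tell(xs, [-utility_estimate(x) for x in xs])  # CMA-ES minimizes
    return es.result.xbest
```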

Page 68:

Training in-situ

The Nao

[Plot: true utility of x_t vs. number of queries, for 13 and 20 states]

Goal: reaching a given state.
Transition matrix estimated from 1,000 random (s, a, s′) triplets.
Demonstration length 10, fixed initial state.
12 interactions for 13 states; 25 interactions for 20 states.

Page 69:

Overview

Preamble

Machine Learning: All you need is...
  ...logic
  ...data
  ...optimization
  ...rewards

All you need is expert's feedback
  Interactive optimization
  Programming by Feedback

Programming, An AI Frontier

Page 70:

Partial Conclusion

Feasibility of Programming by Feedback for simple tasks

Back on track:

One could carry through the organization of anintelligent machine with only two interfering inputs,one for pleasure or reward, and the other for pain orpunishment.

Expert says: it's better → pleasure
Expert says: it's worse → pain

Page 71:

Revisiting the art of programming

1970s: Specifications  (languages & theorem proving)

1990s: Programming by Examples  (pattern recognition & ML)

2010s: Interactive Learning and Optimization
• Optimizing coffee taste  (Herdy, 96)
• Visual rendering  (Brochu et al., 10)
• Choice query  (Viappiani et al., 10)
• Information retrieval  (Joachims et al., 12)
• Robotics  (Akrour et al., 12; Wilson et al., 12; Knox et al., 13; Saxena et al., 13)

toward Programming by ML & Optimization

Page 72:

Programming by Feedback

About the numerical divide

C.A.: Once things can be done on desktops, they can be done by anyone.  (Anderson, 12)

??: Well... not everyone is a digital native.

About interaction

As designer: no need to debug if you can just say "No!" and the computer reacts (appropriately).

As user: I had a dream: a world where I don't need to read the manual...

Page 73:

Future: Tackling the Under-Specified

Knowledge-constrained; computation- and memory-constrained

Page 74:

Acknowledgments

Riad Akrour
Marc Schoenauer
Alexandre Constantin
Jean-Christophe Souplet