NUS-ISS Learning Day 2017 - How About a Game of Chess?


HOW ABOUT A NICE GAME OF CHESS?

Lee Chuk Munn, chukmunnlee@nus.edu.sg

http://bit.ly/learningday2017

What is this talk about?

An introduction to key ideas in reinforcement learning

Conceptual, fairly high level

Intuition about the maths without the equations (not quite: there are 4 equations)

Demos

What is Machine Learning?

“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed”

Arthur Samuel

Wrote the first program that learned to play checkers (1959)

Used the minimax algorithm

Programmed vs Learnt

Traditional program: Machine + Program + Data → Answer

Machine Learning (training phase): Machine + Data + Answers → Program

Machine Learning (after training): Machine + Program + New Data → New Answer

How Do Machines Learn?

Slides adapted from https://www.slideshare.net/aaalm/2016-01-27-a-gentle-and-structured-introduction-to-machine-learning-v105-for-release-58329371

By generalising: Supervised Learning

Give samples

Give answers to the samples

Infer rules from the samples and the answers

By comparing: Unsupervised Learning

Give samples

Do not give answers

Use some metric to infer similarity by grouping the samples

By reward: Reinforcement Learning

Do not give samples

Do not give answers

Infer the rules from positive or negative feedback

Reinforcement Learning circa 1977

[Diagram: the Agent takes an Action on the Environment, then observes the new State and a Reward]
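A minimal sketch of this loop in Python, assuming a hypothetical Environment with reset()/step() methods (an interface in the style of OpenAI Gym, not something defined in the talk):

```python
import random

class Environment:
    """Toy 1-D corridor: step LEFT or RIGHT; reaching position 3 ends the game."""
    def reset(self):
        self.position = 0
        return self.position                    # initial state

    def step(self, action):
        self.position += 1 if action == "RIGHT" else -1
        done = self.position == 3               # reached the goal?
        reward = 100 if done else -1            # end reward vs living reward
        return self.position, reward, done      # observe: state, reward, done

env = Environment()
state, done = env.reset(), False
while not done:
    action = random.choice(["LEFT", "RIGHT"])   # a (deliberately dumb) policy
    state, reward, done = env.step(action)      # act, then observe
```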

Reinforcement Learning

State

A ‘snapshot’ of the environment at a point in time

What are the possible actions to take?

State

[Diagram: a game state in which LEFT and UP are the possible actions]

State Transition

[Diagram: taking an Action from a State leads to one of several possible next States, e.g. via LEFT or UP]
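One simple way to hold such a transition model in code: a dict mapping (state, action) to a list of (probability, next state) pairs. The states and probabilities below are invented for illustration:

```python
# T[(state, action)] -> [(probability, next_state), ...]
T = {
    ("s0", "LEFT"): [(0.8, "s1"), (0.1, "s2"), (0.1, "s3")],
    ("s0", "UP"):   [(0.8, "s2"), (0.2, "s0")],
}

for prob, next_state in T[("s0", "UP")]:
    print(f"UP from s0 reaches {next_state} with probability {prob}")
```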

Which of these 2 states is the better one to be in?

State

Features of this state: straight ahead, no reduction in speed

Features of this state: proximity to the power pill, and around the corner (an obstacle), but reduced speed

Utility

[Diagram: of the possible next States, the Action leads to the state with the best features]

Utility is how we calculate the ‘goodness’ of a state

Using the utility function we can express the agent’s preference

u(state0) > u(state1): the agent prefers state0 over state1

To Win

Maximize our utility
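A toy sketch of one way to score a state: a weighted sum of its features. The feature names and weights are invented for illustration, not taken from the talk:

```python
def utility(features, weights):
    """Weighted sum of feature values: higher means a 'better' state."""
    return sum(weights[name] * value for name, value in features.items())

weights = {"speed": 1.0, "near_power_pill": 5.0, "near_obstacle": -2.0}

state0 = {"speed": 1.0, "near_power_pill": 0.0, "near_obstacle": 0.0}
state1 = {"speed": 0.5, "near_power_pill": 1.0, "near_obstacle": 1.0}

# The agent prefers whichever state has the higher utility.
print(utility(state0, weights), utility(state1, weights))   # 1.0 vs 3.5
```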

Utility, Value and Rewards - how do they differ?

What Actions to Take?

Image from http://ai.berkeley.edu/home.html

Rewards

- Can be either positive or negative
- Given at the end
- Given at every step - the living reward

Prefer now to later

- Discounting - earlier rewards have higher utility than later rewards

Image from http://ai.berkeley.edu/home.html
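A small sketch of discounting, assuming the standard discounted return G = r0 + γ·r1 + γ²·r2 + …:

```python
def discounted_return(rewards, gamma=0.9):
    """Earlier rewards count in full; later rewards are scaled down."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# The same +10 reward is worth less the later it arrives.
print(discounted_return([10, 0, 0]))   # 10.0
print(discounted_return([0, 0, 10]))   # 8.100000000000001 (10 x 0.9^2)
```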

How will the Agent Behave?

Scenario #1

Living reward is -1

End game reward is 100

Scenario #2

Living reward is 1

End game reward is 100
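A quick sketch of why the sign of the living reward changes behaviour: the total (undiscounted) reward for ending the game after n steps in each scenario:

```python
def total_reward(steps, living_reward, end_reward=100):
    return steps * living_reward + end_reward

# Scenario #1 (living reward -1): shorter games score higher,
# so the agent hurries to finish.
print(total_reward(5, -1), total_reward(50, -1))    # 95 50

# Scenario #2 (living reward +1): longer games score higher,
# so the agent dawdles and puts off finishing.
print(total_reward(5, +1), total_reward(50, +1))    # 105 150
```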

Uncertainty

Controller problem - actions are noisy. For UP:

- 80% move in the correct direction
- 10% go left
- 10% go right

What is the Value?

[Diagram: UP moves as intended 80% of the time; the other 20% the agent slips sideways]
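A sketch of sampling one outcome under the 80/10/10 controller model above:

```python
import random

def noisy_outcome(intended):
    """Where the agent actually goes when it tries the intended direction."""
    r = random.random()
    if r < 0.8:
        return intended          # 80%: moves in the correct direction
    elif r < 0.9:
        return "slip LEFT"       # 10%: goes left instead
    else:
        return "slip RIGHT"      # 10%: goes right instead

print([noisy_outcome("UP") for _ in range(5)])
```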

Time to Changi Airport

Normal traffic = 30 mins

Probability = 60%

Heavy traffic = 45 mins

Probability = 40%

Time to Changi Airport

Expected time = (30 ✕ 0.6) + (45 ✕ 0.4) = 18 + 18 = 36 mins

Bellman Equation

Richard Bellman

The Bellman Equation

[Diagram: from state0, choosing an action gives the Q-state (state0, action); each transition (state0, action, state1) leads to a successor state1 and its Value]
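The diagram corresponds to the standard Bellman optimality equations (the form used in the Berkeley AI course cited on these slides), where T is the transition probability, R the reward and γ the discount factor:

```latex
\begin{align}
  V^*(s)    &= \max_a Q^*(s, a) \\
  Q^*(s, a) &= \sum_{s'} T(s, a, s')\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]
\end{align}
```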

Policy Extraction or How to Win

Step 1 - start by being optimal

Step 2 - keep being optimal
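A compact sketch of "being optimal": value iteration to compute V, then policy extraction to read off the best action in every state. The model below is a toy, with invented states, rewards and discount:

```python
GAMMA = 0.9
STATES = ["s0", "s1", "s2"]
ACTIONS = ["LEFT", "UP"]

# Toy transition model: T[(state, action)] -> [(probability, next_state), ...]
T = {
    ("s0", "LEFT"): [(0.8, "s1"), (0.2, "s0")],
    ("s0", "UP"):   [(0.8, "s2"), (0.2, "s0")],
    ("s1", "LEFT"): [(1.0, "s1")], ("s1", "UP"): [(1.0, "s1")],
    ("s2", "LEFT"): [(1.0, "s2")], ("s2", "UP"): [(1.0, "s2")],
}
R = lambda s, a, s2: 10 if (s2 == "s2" and s != "s2") else 0  # exit reward

def q_value(V, s, a):
    """Expected utility of taking action a in state s (one Bellman backup)."""
    return sum(p * (R(s, a, s2) + GAMMA * V[s2]) for p, s2 in T[(s, a)])

# Step 1 - value iteration: repeat the Bellman update until values settle.
V = {s: 0.0 for s in STATES}
for _ in range(100):
    V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}

# Step 2 - policy extraction: in every state, pick the action with best Q.
policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
print(policy)    # {'s0': 'UP', ...}
```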

Demo

1. Prefer the closer exit (1), risking the cliff (-10)

2. Prefer the closer exit (1), avoiding the cliff (-10)

3. Prefer the distant exit (10), risking the cliff (-10)

4. Prefer the distant exit (10), avoiding the cliff (-10)

Model Free Learning

Trial and Error

Don’t know the transitions

Don’t know the rewards

Model Free Learning

Learn by trial and error

Eventually will approximate Bellman updates

Explore vs Exploit
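A sketch of the model-free idea in code: tabular Q-learning with an epsilon-greedy explore/exploit rule. The update below is the standard one; the hyperparameters and example experience are invented:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
ACTIONS = ["LEFT", "RIGHT"]
Q = defaultdict(float)                   # Q[(state, action)] -> estimated value

def choose_action(state):
    """Explore with probability epsilon, otherwise exploit the best known Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)                      # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])       # exploit

def q_update(state, action, reward, next_state):
    """One trial-and-error step: nudge Q towards the sampled Bellman target."""
    target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])

# One hypothetical experience: from s0, RIGHT landed in s1 with reward -1.
q_update("s0", "RIGHT", -1, "s1")
print(Q[("s0", "RIGHT")])    # -0.5: nudged towards the Bellman target
```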

Demo

Where to Learn?

Reinforcement Learning: An Introduction by Richard S Sutton and Andrew G Barto

Berkeley AI Course - http://ai.berkeley.edu/home.html

David Silver - http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html

Thank You. Have a great day

Maximize your utility
