1 Transfer Learning Site Visit August 4, 2006 Report of the ISLE Team Pat Langley Tom Fawcett Daniel Shapiro Institute for the Study of Learning and Expertise ICARUS


Transfer Learning Site Visit August 4, 2006

Report of the ISLE Team

Pat Langley
Tom Fawcett
Daniel Shapiro

Institute for the Study of Learning and Expertise

ICARUS


Transfer Learning Site Visit August 4, 2006

Results from Year 1 for the ISLE Team


ISLE: Transfer in ICARUS
PI: Pat Langley

ICARUS Architecture

Architecture Components

• Conceptual inference: Icarus performs bottom-up inference from relational ground state literals to higher level state concepts.

• Skill execution: Icarus retrieves relevant skills for goals and executes them reactively.

• Skill learning: Icarus acquires general hierarchical reactive skills that explain/generate successful solution paths.

• Value learning: Icarus employs reinforcement learning to acquire a value function over game states using a factored state representation (hierarchy of first-order predicates)

[Figure: ICARUS architecture diagram. Components: Long-Term Conceptual Memory, Short-Term Conceptual Memory, Short-Term Goal/Skill Memory, Long-Term Skill Memory, Conceptual Inference, Skill Retrieval, Skill Execution, Problem Solving / Skill Learning, Perception, Perceptual Buffer, Motor Buffer, Environment]

Diagram annotations:
• Contains relational and hierarchical knowledge about relevant concepts
• Generates beliefs using observed environment and long-term conceptual knowledge
• Creates internal description of the perceived environment
• Contains descriptions of the perceived objects
• Contains inferred beliefs about the environment
• Contains hierarchical knowledge about executable skills
• Finds novel solutions for achieving goals
• Acquires new skills based on successful problem solving traces
• Selects relevant skills based on beliefs and goals
• Contains goals and intentions
• Executes skills on the environment

Testbeds: Urban Combat and GGP

Urban Combat
• First-person real-time shooter game
• Goal: find and defuse IEDs
• Addressed by learning new skills

GGP
• Logically defined arbitrary rules of play
• Addressed by learning value function over game states

Results
• Urban Combat: Evaluation ongoing
• GGP: Transfer ratio of 1.3 on TL 7, jump start of 20..

TL Metrics (Reward)

Metric                          Score       P Value
Transfer ratio                  1.3009      0.2715
Transfer ratio (truncated)      1.4375      0.1660
Transfer ratio (smoothed)       -0.6753     0.8055
Jump start                      28.5714     0.0900
Jump start (smoothed)           19.2857     0.1735
ARR (narrow)                    0.6833      0.1900
ARR (wide)                      -INFINITY   0.4005
Asymptotic Advantage            -INFINITY   0.5885
Ratio (area under curves)       0.7687      0.6645
Transfer difference             464.2869    0.3335
Transfer difference (scaled)
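The slide lists the metrics but does not define them. As a rough illustration only, the sketch below computes three of them from a pair of learning curves under assumed definitions: jump start as the difference in initial performance, transfer difference as the difference in area under the two curves, and the area ratio as the quotient of those areas. The function names, the unit trial spacing, and the sample curves are all hypothetical and are not the program's official scoring code.

```python
from typing import Sequence


def area_under_curve(curve: Sequence[float]) -> float:
    """Trapezoidal area under a learning curve sampled at unit-spaced trials."""
    return sum((a + b) / 2.0 for a, b in zip(curve, curve[1:]))


def jump_start(transfer: Sequence[float], baseline: Sequence[float]) -> float:
    """Assumed definition: gain in performance on the very first trial."""
    return transfer[0] - baseline[0]


def transfer_difference(transfer: Sequence[float], baseline: Sequence[float]) -> float:
    """Assumed definition: area under the transfer curve minus area under the baseline."""
    return area_under_curve(transfer) - area_under_curve(baseline)


def area_ratio(transfer: Sequence[float], baseline: Sequence[float]) -> float:
    """Assumed definition: area under the transfer curve divided by the baseline area."""
    return area_under_curve(transfer) / area_under_curve(baseline)


# Made-up reward curves: one agent with prior training, one without.
baseline_curve = [0.0, 10.0, 20.0, 30.0]
transfer_curve = [20.0, 30.0, 30.0, 30.0]

print(jump_start(transfer_curve, baseline_curve))
print(transfer_difference(transfer_curve, baseline_curve))
print(area_ratio(transfer_curve, baseline_curve))
```

Note that the table reports "Transfer ratio" and "Ratio (area under curves)" as separate rows with different scores, so the transfer ratio used in the evaluation is evidently not the plain area ratio computed here.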


University of Michigan: Transfer in Soar
PI: John Laird

Payoff

Problem/Objective

Solution Approach/Accomplishments

• Study transfer learning using multiple online architectural learning mechanisms
• Chunking (EBL)
• Reinforcement Learning
• Semantic Learning
• Episodic Learning
• Determine strengths and weaknesses
• Develop reasoning strategies that maximize transfer

• Fair comparison of learning mechanisms
• All use same performance system
• Integration and synthesis of multiple learning mechanisms and reasoning strategies on same problem
• Not reliant on one mechanism
• Best technique used for given problem
• Positive interaction between methods

• Integrated Soar & Urban Combat Testbed
• Three learning approaches in UCT
• Levels 0-2
• Significant transfer

[Figure: Soar architecture diagram. Labels: Body, Perception, Action, Short-Term Memory, Decision Procedure, Long-Term Memories (Procedural, Semantic, Episodic), Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning]

Level   Memory   Search   RL
0       31.3     38.7     22.8
1       10.4     n/a      6.8
2       9.6      8.1      1.1


Northwestern University: Companions
PIs: Kenneth D. Forbus, Thomas Hinrichs

Payoff

Problem/Objective

Solution Approach/Accomplishments

• Extend Companion Cognitive Systems architecture to achieve transfer learning

• Advance analogical processing technology
• Develop techniques for learning self-models
• Test using ETS Physics testbed

• New techniques for robust near and far transfer learning
• Advances can be incorporated in other cognitive architectures, systems
• Near term: Analogy Servers
• Long-range: Companions architecture used in military/intelligence systems
• Today’s cluster is tomorrow’s multi-core laptop

• Analogy approach based on how humans seem to do transfer

• Study worked solutions to learn equations, modeling assumptions

• e.g., when could something be a point mass?

• Pilot experiment: Achieved transfer levels 1, 3, & 5

[Figure: Learning to solve problems by studying worked solutions. Labels: ETS-generated AP Physics test; Worked Solutions; Sketches included Y2-3; Learned strategies, encoding rules, and cases]


UT Arlington: Urban Combat Testbed (UCT)
PIs: L. Holder, M. Youngblood, D. Cook

• Develop Urban Combat Testbed (UCT), a simulated, real-time, urban combat domain

• Agent interface provides detailed, real-time perceptual information and command execution

• Human interface provides compelling video interface and keyboard/mouse command interface

• Develop scenarios for human and agent trials for each level of transfer
• Execute human and agent trials, compare transfer learning performance
• Investigate other approaches to transfer learning
  • Human transfer learning
  • Hierarchical reinforcement learning
  • Agent-based cognitive architectures

• UCT version 1.0 available
• Based on Quake 3 Arena first-person shooter (FPS) game
• Enhanced to include realistic urban combat environments
• Agent version provides interface to game percepts and commands
• Human-player version provides standard interface as in commercial FPS games
• Under development
• Set of scenarios to evaluate different levels of transfer learning
• Random generation of scenarios
• Ability to log game interaction

Technical Details

Highlights

Vision/Goals

• Develop Urban Combat Testbed (UCT) capable of generating tasks to evaluate transfer learning performance
• Conduct significant human trials to evaluate human transfer learning performance
• Disseminate UCT to community as a benchmarking tool for cognitive performance
• Investigate novel cognitive architectures for achieving transfer learning in Urban Combat and similar domains
• Achieve 70% of human transfer learning performance


UT Arlington: Reinforcement Learning
PI: M. Huber

Technical Approach Benefits and New Capabilities

Integration and Deliverables Example and Performance

Transfer of skill and concept hierarchies from training to transfer tasks

Transfer skills and concepts are found automatically and carry probability and value attributes
• Transfer skills are extracted based on local system characteristics in the task domain
  • Sub-skills are reward-independent
  • Transfer skills have an associated probabilistic model
• Hierarchical concepts capture capabilities of the skill set
  • Concepts capture probabilistic behavior of skills
  • Concepts capture value attributes of the task domain
• Generated representation hierarchy and refinement process have bounded optimality properties
  • Policies learned on the representation are within a bound of optimal

The approach provides skill and concept hierarchies for use as representations by reasoning systems
• Provides probabilistic and utility information to representation hierarchies in ICARUS
  • Explicit tie of reasoning structure and reinforcement learning
• Generates new, capability-specific concepts that could serve as new predicates in Markov Logic Networks (MLN)
  • Probabilistic attributes can facilitate fast integration into MLN

Integration and Delivery Milestones

               Year 1                       Year 2                    Year 3
Integration:   ICARUS int.                  MLN                       MLN/ICARUS
Development:   Skill utility                Skill generalization      Skill extension
Deliverables:  Prototype w. UCT interface   Prototype w. skill gen.   Final system

Urban Combat Testbed (UCT)
  • Training task: Go to flag
  • Transfer task: Retrieve different flag
• Transfer from training to transfer task
  • 29 sub-skills and associated concepts
  • Reduction from 20,000 to 81 states
• Transfer Performance (Transfer Ratio - TR)
  • TR 2.5 with skill transfer
  • TR 5 with skill and concept transfer

[Figure: Skill Hierarchy and Concept Hierarchy. Labels: Selective, task-specific state space construction; Hierarchical state representation; Task learning; Skill and concept extraction]

  • Extraction of sub-skills using subgoal discovery
  • Learning of concepts characterizing skill capabilities
• Transferred concepts and skills are used to construct a more abstract Bounded Parameter state representation
  • Learning on new, more compact representation leads to improved learning performance


University of Washington: Markov Logic
PI: Pedro Domingos

Payoff

Problem/Objective

Approach/Accomplishments

- Transfer learning requires:
  - Relational inference & learning
  - Uncertain inference & learning
- Markov logic provides this
  - Simple, general, unified framework
- Needs:
  - Scaling to large problems
  - Online, “lifelong” operation
  - Extension to continuous data
  - Extension to decision-making

- Key approaches:
  - Representation mapping
  - Statistical predicate invention
- Accomplishments to date:
  - LazySAT: Efficient use of memory (400,000 X less than WalkSAT on BibServ)
  - MC-SAT: Fast mixed inference (>1000 X faster than Gibbs, tempering)
  - Alchemy system
  - Collaborated on integration w/ Icarus, etc.

- Enables highest levels of transfer
  - Between relational structures, as opposed to surface descriptions
- Enables transfer “in the wild”
  - Noisy, rich, real-world domains
  - As opposed to shoehorning problems into standard machine learning form
- Broadly applicable AI technology
  - Greatly increases speed of adaptation

[Figure: Markov logic diagram. Labels: Markov logic, ILP, Weight Learning, WalkSAT, MCMC, Source Domain, Target Domain]

[Figure: plot of No. Clauses (million), axis 0 to 16, against No. Records, axis 0 to 500, with series WalkSAT and LazySAT]


Cycorp
PI: Michael Witbrock

Payoff

Problem/Objective: Knowledge-based transfer learning

• Supply background knowledge and well-encoded, logically meaningful domains and problem spaces

• Elaborate on background knowledge and knowledge gathered from source tasks and domains

• Informed by existing background knowledge in Cyc

[Figure: knowledge flow diagram. Labels: Source Testbed; Target Testbed; Situation, Status, & Queries; Advice & Support; Execution Agent(s); Cyc KB (background knowledge)]

Diagram annotations:
• Collect knowledge relevant to a task, domain, or problem
• Elaborate on knowledge:
  • Inferential expansion
  • Probabilistic weighting
  • Rule formation (ILP)
• Perform inference; supply advice, query results, background knowledge

• Information flow among complementary learning and transfer mechanisms and approaches

• Establish a well-founded, mutually compatible base of assumptions and facts – necessary for transfer

• Allow systems to communicate observations, conclusions, skills, memories and intentions

• Learning can take full advantage of existing background knowledge, knowledge from less-obviously related domains and problems

• New high-level, semantically connected knowledge, within a context of existing knowledge: understanding

Solution Approach/Accomplishments

• Representation of initial domains and solutions
• Existing knowledge relevant to domains identified
• Physics testing domain: encoding developed, first transfer level problems represented in Cyc
• Urban Combat (FPS) testbed: map space semantics defined; distribution being developed

• Initial integration of probabilistic reasoning
• System integrated, scalability testing underway
• Alchemy system extended

• Rule and skill learning underway
• First automatically-generated results from evaluation domains
• Application of work from BUTLER seedling

New Rules and Skills: Rule Induction

New Facts: Automated Knowledge Acquisition

Expanded Knowledge: Inference & Markov Logic


Maryland/Lehigh: Hierarchical Task Nets
PIs: Dana Nau, Héctor Muñoz-Avila

Problem/Objective

Solution Approach/Accomplishments

• Learn applicability conditions of HTN methods that tell how to decompose tasks into subtasks

• Input: plan traces produced by an expert problem-solver

• Reflects abstraction levels in the game
• Output: methods consistent with plan traces
• Can be transferred in different games

• HTNs represent knowledge of different granularity at different levels
• Facilitates transfer to different games
• Increasingly capable HTN learning algorithms
• Y1: transfer levels 1-3
• Y2: transfer levels 4-7
• Y3: transfer levels 8-10

• Approach: our new HDL algorithm
• Can start with no prior information
• Can start with info transferred from a previous learning session
• Accomplishments:
• Development of the HDL algorithm
• Theoretical conditions in which HDL achieves full convergence [paper at ICAPS-06]
• Experiments: even when only halfway to convergence, HDL solved > 3/4 of test set

[Figure: experimental setup. Labels: MadRTS real-time strategy game, TIELT, Scenario Generator, HDL++ Learning Agent, Statistical methods to compare learning curves]

Payoff for TL


Rutgers University: Relational Templates
PI: Michael Pazzani

Constraint     Clauses evaluated
None           320,968
Unique         195,489
Commutative    165,601
Both           88,230

Approach

Payoff

Problem/Objective

Solution Approach/Accomplishments

• Learn templates from Markov Logic Networks (MLNs)

• Learn Markov Logic Networks (MLNs) from templates

• Learning general concepts and strategies applicable across many domains
• Transitivity
• Thwarting, feigning

• Constraining learning of MLN clauses
• Creating template from MLN clauses by Least General Generalization

Speed Up

MLNs:
SameVenue(a1,a2) v !SameVenue(a2,a3) v !SameVenue(a3,a1)
SameTitle(a1,a2) v !SameTitle(a2,a3) v !SameTitle(a3,a1)

Template:
P(a1,a2) v !P(a2,a3) v !P(a3,a1)
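To make the template step concrete, here is a minimal sketch of generalizing two clauses that differ only in their predicate names, which is the simplest case of least general generalization. The clause syntax follows the slide; the helper names (`parse_clause`, `generalize`, `format_clause`) and the restriction to identical argument lists are assumptions made for illustration, not the Rutgers implementation.

```python
from typing import Dict, List, Tuple

# A literal is (is_positive, predicate_name, argument_tuple).
Literal = Tuple[bool, str, Tuple[str, ...]]


def parse_clause(text: str) -> List[Literal]:
    """Parse 'P(a,b) v !Q(b,c)' into a list of literals."""
    literals = []
    for part in text.split(" v "):
        part = part.strip()
        positive = not part.startswith("!")
        name, args = part.lstrip("!").rstrip(")").split("(")
        literals.append((positive, name, tuple(a.strip() for a in args.split(","))))
    return literals


def generalize(c1: List[Literal], c2: List[Literal]) -> List[Literal]:
    """Replace each pair of differing predicate names with a shared predicate variable."""
    if len(c1) != len(c2):
        raise ValueError("clauses must have the same number of literals")
    variables: Dict[Tuple[str, str], str] = {}
    result = []
    for (s1, p1, a1), (s2, p2, a2) in zip(c1, c2):
        if s1 != s2 or a1 != a2:
            raise ValueError("this sketch only handles predicate-name differences")
        if p1 == p2:
            pred = p1
        else:
            # The same pair of names always maps to the same variable (P, P2, ...).
            fresh = "P" if not variables else f"P{len(variables) + 1}"
            pred = variables.setdefault((p1, p2), fresh)
        result.append((s1, pred, a1))
    return result


def format_clause(clause: List[Literal]) -> str:
    return " v ".join(("" if s else "!") + f"{p}({','.join(a)})" for s, p, a in clause)


venue = parse_clause("SameVenue(a1,a2) v !SameVenue(a2,a3) v !SameVenue(a3,a1)")
title = parse_clause("SameTitle(a1,a2) v !SameTitle(a2,a3) v !SameTitle(a3,a1)")
print(format_clause(generalize(venue, title)))
```

A full least general generalization would also anti-unify differing arguments; that is omitted here to keep the sketch short.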


UT Austin: Theory Refinement
PI: Mooney

Summary

• Faster learning in target domain by efficiently transferring probabilistic relational knowledge using bottom-up theory refinement.
• Determine appropriate predicate mapping by searching possible mappings to find the most accurate for the target domain.
• Use relational path-finding to more effectively construct new clauses in the target domain.

Problem/Objective

Develop transfer learning methods for Markov Logic Networks (MLNs) that:
• Efficiently revise structure and parameters of learned knowledge from source domain to fit novel target domain.
• Automatically recover an effective mapping of the predicates in the source domain to those in the target domain.

Approach

1. Find an effective predicate mapping.
2. Determine which parts of the source structure are still valid in target domain and which need to be revised; annotate source MLN accordingly.
3. Specialize only overly-general clauses and generalize only overly-specific ones, leaving the good ones unchanged.
4. Search for additional clauses using relational path-finding.

Experimental Results (Mihalkova & Mooney, ICML06-TL workshop)

• Alchemy and our transfer algorithm equally improve accuracy over learning from scratch.
• Our approach decreases learning time and number of revision candidates significantly.


UT Austin: Reinforcement Learning
PI: Peter Stone

Inter-task mappings: β(A’)→A (actions), γ(S’)→S (states)

[Figure: diagram labeled I-TAC and SME-QDBN with board-game illustrations; remaining labels not recoverable]

Problem/Objective

• Develop core architecture-independent unified transfer learning technology for reinforcement learning
• Key technical idea: transfer via inter-task mapping
  – Generalization of value-function-based transfer
• Automatic discovery of inter-task mapping
  – I-TAC (inter-task action correlation)
  – SME-QDBN (structure mapping + qualitative dynamic Bayes networks)
• Value-function-based transfer and policy-based transfer
• Focus on results in many domains
• Transfer of knowledge among reinforcement learning tasks (within the same domain/testbed)
  – RoboCup Soccer, GGP
• Compare with Icarus GGP performance
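As an illustration of transfer via inter-task mapping, the sketch below initializes a table-based target value function by looking up each target state-action pair through the state mapping γ and the action mapping β. The function name `transfer_q_table`, the toy states, and the particular mappings are hypothetical; the slides do not give the actual procedure, and function-approximation variants would differ.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def transfer_q_table(
    q_source: QTable,
    target_states: Iterable[Hashable],
    target_actions: Iterable[Hashable],
    gamma: Callable[[Hashable], Hashable],
    beta: Callable[[Hashable], Hashable],
    default: float = 0.0,
) -> QTable:
    """Initialize Q_target(s', a') from Q_source(gamma(s'), beta(a'))."""
    actions = list(target_actions)
    return {
        (s, a): q_source.get((gamma(s), beta(a)), default)
        for s in target_states
        for a in actions
    }


# Toy source task: two abstract states, two actions (made-up values).
q_source = {
    ("near", "hold"): 0.2,
    ("near", "pass1"): 0.7,
    ("far", "hold"): 0.9,
    ("far", "pass1"): 0.4,
}


def gamma(target_state):
    """Hypothetical state mapping: drop the extra target-only feature."""
    return target_state[0]


def beta(target_action):
    """Hypothetical action mapping: the new action 'pass2' behaves like 'pass1'."""
    return {"hold": "hold", "pass1": "pass1", "pass2": "pass1"}[target_action]


target_states = [("near", 1), ("far", 2)]
target_actions = ["hold", "pass1", "pass2"]
q_target = transfer_q_table(q_source, target_states, target_actions, gamma, beta)
print(q_target[(("near", 1), "pass2")])
```

The transferred table only serves as a starting point; learning in the target task would then continue from these initial values.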

Technical Approach

• Automatic discovery of inter-task mapping
  – I-TAC (inter-task action correlation)
    • Data-centered approach
    • Train a classifier to map state transition pairs to actions in the source
    • Use the classifier and state mapping to obtain the action mapping
  – SME-QDBN (structure mapping + qualitative dynamic Bayes networks)
    • Knowledge/model-centered approach
    • Represent action model using qualitative DBNs
    • Specialized and optimized SME for QDBNs, using heuristic search
• RoboCup soccer
  – Value-function-based transfer: Sarsa learning, function approximators
  – Policy-based transfer: neuro-evolution (NEAT)
• GGP: value-function-based
  – Using symmetry to scale up the same type of games
  – Identifying game-tree features to transfer among different types of games

Results

RoboCup (I-TAC): counts of target actions (a’) classified as each source action (a)

Target action (a’)   3v2 Hold     3v2 Pass1    3v2 Pass2
3v2 Hold             382 (100%)   0 (0%)       0 (0%)
3v2 Pass1            0 (0%)       330 (93%)    26 (7%)
3v2 Pass2            2 (<1%)      25 (8%)      297 (92%)
4v3 Hold             227 (76%)    0 (0%)       71 (24%)
4v3 Pass1            1 (<1%)      174 (64%)    97 (36%)
4v3 Pass2            0 (0%)       133 (50%)    133 (50%)
4v3 Pass3            1 (<1%)      51 (24%)     163 (76%)

[Figure: result panels grouped under RoboCup and GGP, with method labels I-TAC and SME-QDBN and game labels Connect-3 (4x4, same opp), CaptureGo (3x3, same opp), and Minichess (5x5). Reported values, in extraction order: t.r. = 5.8, t.t.r. = 84%; t.r. = 5.6, t.t.r. = 83%; t.r. = 4.3, t.t.r. = 88%; t.r. = 4.3, t.t.r. = 73%]


• Model/knowledge oriented approach
• Using knowledge about
  – How actions affect state variables
  – How state variables relate to each other
• Use structure mapping to find similarities between source and target tasks
• Discover β and γ together

Objective

Technical Approach

Results Keepaway match scores

• Representation: qualitative dynamic Bayes networks
• Specialized and optimized SME for QDBNs
• SME-QDBN uses heuristic search to find the mapping of the maximal score
  1. Generate local matches and calculate the conflict set for each local match;
  2. Generate initial global mappings based on immediate relations of local matches;
  3. Merge global mappings with common structures;
  4. Search for a maximal global mapping with the highest score;

Structure Mapping

• Mapping value functions can be decomposed into two parts
  – Mappings of states (γ) and actions (β)
  – Transforming representation of value functions (table-based or function approximation)
• Current work focuses on automatic discovery of mappings of state variables and actions
  – Data oriented approach (I-TAC)
  – Model/knowledge oriented approach (structure mapping)

• Data oriented approach to automatic discovery of β
• Considers mappings of states (γ) and actions (β) separately

γ(S’)→S    β(A’)→A

• Assume that γ is given. How can we learn β?

Inter-Task Action Correlation (I-TAC)

Technical Approach

1. Collect transition data in source domain
2. Train a classifier C from state pairs to actions
3. Collect transition data in target domain, define β as

β(a’) = arg max_a #{all tuples (s’1, a’, s’2) with a’ | C(γ(s’1), γ(s’2)) = a}
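The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the "classifier" is a majority-vote lookup table over exact source state pairs standing in for whatever learned classifier the authors used, states are hashable, and the names `train_classifier` and `learn_action_mapping` as well as the toy one-dimensional task are hypothetical.

```python
from collections import Counter, defaultdict


def train_classifier(source_transitions):
    """Step 2: learn C(s1, s2) -> a from source tuples (s1, a, s2).

    Stand-in for a real classifier: majority vote per exact state pair.
    """
    votes = defaultdict(Counter)
    for s1, a, s2 in source_transitions:
        votes[(s1, s2)][a] += 1
    table = {pair: c.most_common(1)[0][0] for pair, c in votes.items()}
    return lambda s1, s2: table.get((s1, s2))


def learn_action_mapping(classifier, gamma, target_transitions):
    """Step 3: beta(a') = the source action predicted most often for tuples with a'."""
    counts = defaultdict(Counter)
    for s1, a_target, s2 in target_transitions:
        a_source = classifier(gamma(s1), gamma(s2))
        if a_source is not None:  # skip state pairs never seen in the source data
            counts[a_target][a_source] += 1
    return {a_t: c.most_common(1)[0][0] for a_t, c in counts.items()}


# Step 1: toy source data, positions on a line with actions "left" and "right".
source = [(0, "right", 1), (1, "right", 2), (2, "right", 3),
          (3, "left", 2), (2, "left", 1), (1, "left", 0)]

# Toy target data: states carry an extra feature, actions are renamed.
target = [((1, 0), "east", (2, 0)), ((2, 1), "east", (3, 1)),
          ((3, 0), "west", (2, 0)), ((2, 1), "west", (1, 1))]


def gamma(target_state):
    """Given state mapping: keep only the position."""
    return target_state[0]


C = train_classifier(source)
beta = learn_action_mapping(C, gamma, target)
print(beta)
```

The per-action counts collected in `learn_action_mapping` correspond to the rows of a confusion table between target and source actions.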

Results

                      Source Actions (a)
Target Actions (a’)   3v2 Hold     3v2 Pass1    3v2 Pass2
3v2 Hold              382 (100%)   0 (0%)       0 (0%)
3v2 Pass1             0 (0%)       330 (93%)    26 (7%)
3v2 Pass2             2 (<1%)      25 (8%)      297 (92%)
4v3 Hold              227 (76%)    0 (0%)       71 (24%)
4v3 Pass1             1 (<1%)      174 (64%)    97 (36%)
4v3 Pass2             0 (0%)       133 (50%)    133 (50%)
4v3 Pass3             1 (<1%)      51 (24%)     163 (76%)

UT Austin: Mapping Value Functions
PI: Stone


UT Austin: Feature Construction
PI: Stone

• Scale up from small to large version of same game
  – Simultaneous update of isomorphic states
  – Exploit symmetry to scale up RL in board games
• Transfer between different small games
  – Table-based learning but transfer in feature space
  – Automated discovery of state-features
  – Initialization by feature-matching
  – Two person, complete-information, turn-taking games

Technical Approach

Results (rand opp)

Objective

Connect-3, 4x4

t.r. = 4.3t.t.r. = 73%

Minichess (5x5)

• Verify presence of symmetries on smaller task (larger task => too much memory)

• Transfer knowledge to larger task (simultaneous backups for up to 8 transitions)
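One way to read "simultaneous backups for up to 8 transitions" is that each value update is applied to every rotation and reflection of the board at once. The sketch below illustrates that idea for a square board with a table-based Q function; the helper names, the `(row, col)` action encoding, and the simple update rule are assumptions for illustration and not the authors' code.

```python
def rotate(board):
    """Rotate a square board (tuple of row tuples) 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def reflect(board):
    """Mirror a board left to right."""
    return tuple(row[::-1] for row in board)


def symmetric_pairs(board, action):
    """All 8 rotations/reflections of a (board, action) pair; action is (row, col)."""
    n = len(board)
    pairs = []
    b, (i, j) = board, action
    for _ in range(4):
        pairs.append((b, (i, j)))
        pairs.append((reflect(b), (i, n - 1 - j)))
        # A clockwise rotation moves cell (i, j) to (j, n - 1 - i).
        b, (i, j) = rotate(b), (j, n - 1 - i)
    return pairs


def symmetric_backup(q, board, action, target, alpha=0.5):
    """Apply one Q-learning style update to every distinct symmetric pair."""
    for b, a in set(symmetric_pairs(board, action)):
        old = q.get((b, a), 0.0)
        q[(b, a)] = old + alpha * (target - old)


corner = (("X", ".", "."),
          (".", ".", "."),
          (".", ".", "."))
q = {}
symmetric_backup(q, corner, (0, 1), target=1.0)
print(len(q))  # number of table entries touched by a single backup
```

Duplicates are removed with `set`, so highly symmetric positions trigger fewer than 8 updates, which matches the "up to 8" wording.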

Othello, 4x4

Feature discovery Features discovered

• Feature extraction/matching based on abstract game-tree expansion up to 2 levels

Findings

Limited look-ahead based features are quick to extract and match, few (manageable knowledge base), highly common/reusable, and faster than minimax lookahead against suboptimal opponents

Future Plan: Abstraction Matching

[Figure: abstraction matching diagram. Labels: Source Game, Abstract MDP, Target Game, Abstraction Discovery (Jong & Stone, 05), Online Abstraction Matching (ongoing work)]

[Figure: result plots labeled "transfer" and "Minimax lookahead". Reported values, in extraction order: tr = 1.66, ttr = 56.7%; tr ~ 40, ttr ~ 99%]


Transfer Learning Site Visit August 4, 2006

Proposal for Year 2 from the ISLE Team


Changes from Initial Plans

Year 1

• Full integration did not happen in Y1
  – Component systems (Icarus, Soar, Companions, LUTA, CaMeL) were developed independently and did not emerge as a single system
  – Ideas and/or subsystems of component efforts to be integrated in later years.
• Little use of background knowledge in Y1
  – Still believe it is critical for taking full advantage of transfer opportunities, but…
  – Y1 concentrated on basic navigation and problem solving without exploiting deep semantic domain knowledge
• Markov logic not used in Y1 testbed evaluations
  – Initial integration with ICARUS is finished, but efficiency issues advised against its use for Y1 tasks
  – Improving efficiency is a top Y2 priority

Year 2

• Continuing with three main architectures
  – Development of component systems will continue.
  – Evaluation will focus on comparing and contrasting agent architectures.
• Focus on highest transfer levels in all three testbeds
  – Urban Combat, Physics, and GGP
• More interesting scientific results linked to key claims, but fewer total experimental conditions and less engineering
• Management structure for project will change to a matrix organization


Year 2 Matrix Management Structure

Technology Development
• ISLE (Langley): Oversight
• ISLE (Langley): ICARUS
• UW (Domingos): Markov logic
• UW (Domingos): Alchemy
• Rutgers (Pazzani): Rel. templates
• UT (Mooney): Theory Revision
• Michigan (Laird): Soar extension
• NU (Forbus): Companions
• UT (Stone): LUTA extension
• Maryland (Nau): HTN planning
• ISLE (Konik): Skill learning
• Cyc. (Witbrock): CYC integr.

Experimental Evaluations
• ISLE (Langley): Oversight
• ISLE (Shapiro): Urban Combat
• WSU (Holder): UC extensions
• ISLE (Stracuzzi): ICARUS on GGP
• NU (Forbus): Compns Physics
• UT (Stone): LUTA on GGP
• ISLE (Stracuzzi): GGP evaluation
• Cyc. (Matuszek): Physics eval.
• Michigan (Laird): UC evaluation
• ISLE (Choi): UC evaluation
• ISLE (Konik): ICARUS Physics
• WSU (Holder): Humans on UC

Technology work breaks down into extending Markov logic, integrating Markov logic and HTNs into ICARUS, and extending other agent architectures

Evaluation efforts focus on GGP (external), Urban Combat (internal), and ETS Physics (external), each used on two agent architectures


Expected Year 2 Products

Extended Alchemy software that includes:

Techniques for inventing new predicates that support mapping across domains

Methods for revising inference rules based on observed regularities (from UT Austin)

Methods for using relational templates to learn from few instances (from Rutgers)

Ability to access background knowledge from CYC (from Cycorp)

Extended ICARUS software that includes:

Techniques for learning goal-oriented mappings that support transfer

More flexible inference using Alchemy as a central module (from Washington)

Extended methods for learning skills in adversarial contexts (from Maryland)

Methods for combining skill learning with value learning (from UT Austin)

Extended versions of software for:

Soar (Michigan) that supports transfer by semantic learning and chunking

Companions (Northwestern) that supports transfer by deep structural analogy

LUTA (UT Austin) that achieves transfer by knowledge-based feature construction

Extended Urban Combat testbed that:

Includes a richer variety of objects, activities, and spatial settings

Supports multi-agent coordination and multi-agent competition

Allows tests of high-level transfer from urban military operations to search and rescue activities


Claims about Transfer Learning

Claim: Transfer that produces human rates of learning depends on reusing structures that are relational and composable

Test: Design source/target scenarios which involve shared relational structures that satisfy specified classes of transformations

Example: Draw source and target problems from branches of physics with established relations among statements and solutions

Claim: Deep transfer depends on the ability to discover mappings between superficially different representations

Test: Design source/target scenarios that use different predicates and distinct formulations of states, rules, and goals

Example: Define two games in GGP that are nearly equivalent but have no superficial relationship

Meta-Claim: These claims hold for domains that involve reactive execution, problem-solving search, and conceptual inference

Test: Demonstrate deep transfer in testbeds that need these aspects of cognitive systems

Example: Develop transfer learning agents for Urban Combat, GGP, and Physics

We will explore four paths to deep transfer:

• Predicate invention for representation mapping in Markov logic (Washington)
• Goal-directed solution analysis for hierarchical skill mapping (ISLE)
• Representation mapping through deep structural analogy (Northwestern)
• Semantic learning augmented with procedural chunking (Michigan)


ISLE Year 2 Plans for ICARUS

Combine rapid analytic creation of hierarchical skills with statistical estimation of their utilities

Learn relational concepts that characterize the conditions under which skills achieve goals

Retrieve relevant skills even when the goals that index them match only incompletely

Acquire mappings among domain representations based on analysis of problem solution traces

Use these capabilities to support deep transfer

We will demonstrate deep transfer in three separate testbeds with distinct characteristics.

ICARUS’ Unique Capabilities

Plans for Evaluation

Integration Plans

Mapping Concepts and Skills

[Figure: ICARUS architecture diagram. Components: Long-Term Conceptual Memory, Short-Term Conceptual Memory, Short-Term Goal/Skill Memory, Long-Term Skill Memory, Conceptual Inference, Skill Retrieval, Skill Execution, Problem Solving / Skill Learning, Perception, Perceptual Buffer, Motor Buffer, Environment]

Diagram annotations:
• Replace w/ Alchemy inference software (Washington)
• Augment with CYC knowledge base (Cycorp)
• Incorporate HTN planning methods (Maryland)
• Add methods for learning value fns (UT Austin)

[Figure: mapping diagram. Labels: source concepts, target concepts, source skills, target skills]

ICARUS will not only learn hierarchical skills and concepts, but also how they map across different settings

Testbeds: Urban Combat, Problem Solving in Physics, General Game Playing


New Technology: Concept Revision

1. Learn new domain-specific concepts

2. Generalize these concepts to expand possible transfer opportunities

3. Specialize again in target domain to increase utility

Details vary, but underlying structure unchanged

Y2 Plans for ICARUS on GGP

Challenges

• Domain independence
  – Remove assumption of “chess-like” games
  – Expand beyond common board games, consider puzzles or games with many players
• Concept learning and revision
  – Remove assumption that domain-specific concepts will be provided
  – Agent must discover new concepts or revise existing ones

Goals

Demonstrate discovery / transfer of structural domain knowledge
  – Build on Y1 success with first-order concepts
  – Learn relationships among concepts to capture domain structure
  – Expand learning of relative concept utility to revise concepts to improve utility
    • Generalize existing concepts to expand coverage
    • Specialize general concepts to improve utility
  – Derive new concepts from game description

[Figure: concept revision. Labels: Domain-specific concept (source), Generalized concept, Specialized concept (target)]

New Technology: Concept Derivation

1. Derive basic concepts from game description

2. Evaluate utility through experience

3. Construct more complex structures by combining concepts and expanding derivation

GGP Game Description → Simple derived concepts → Complex structures

Return to description and derived concepts for further expansion

Page 23

General Concept

Transform the current situation
• When expecting or searching for transfer

Retrieve a memory based on transformed situation
• Automatic (procedural) or
• Deliberate (semantic/episodic)

Use transfer memory to impact behavior
• Control selection of actions
• Decide on strategy or tactic

Perform target task
• Generate behavior
• Sense environment
• Create internal situational assessment

Michigan Year 2 Plans for Soar

[Figure: Soar architecture. Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory; Decision Procedure; Appraisal Detector; learning mechanisms (Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning); Body (Perception, Action). Transfer steps shown: Experience; Identify, Generalize, Abstract; Store. Source Problems and Target Problems.]

Soar provides
• Extreme flexibility in every phase of transfer
• Multiple performance methods
• Task-dependent knowledge for abstraction, transformation, instantiation
• Multiple learning mechanisms

Create general concept/skill/…
• Generalization based on multiple examples
• Abstraction based on prior semantic knowledge

Store in memory for later recall
• Different memories for different types of knowledge
• Procedural, semantic, episodic

Identify elements that might be useful
• Everything, but literal (episodic)
• Categories, structures (semantic)
• Results of processing (chunking)
• Explicit analysis (reflection)

Perform source task
• Generate behavior
• Sense environment
• Create internal situational assessment

Experience

Retrieve

Use

Transform Retrieve

Transform

Transform/map retrieved memory
• Explicitly map to current situation or
• Instantiate for current situation

Retrieve from memory a related concept
• For some memories this will be automatic
• For others, it will be deliberate

Page 24

Level 9 Transfer in Soar

[Figure: Soar architecture. Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory; Decision Procedure; Appraisal Detector; learning mechanisms (Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning); Body (Perception, Action). Transfer steps shown: Experience; Identify, Generalize, Abstract; Store. Source Problems and Target Problems.]

Source: Hunted dies after getting trapped in a dead end
• Learns spatial configuration of dead end
• Learns dead end is deadly to hunted

Target: Hunter tries to chase hunted to a location it has recognized as a dead end.

Perform source task
• Hunted dies after getting trapped in a dead end

Perform target task
• As hunter, tries to develop strategy for killing hunted

Use transfer memory to impact behavior
• Searches for dead ends
• Tries to “herd” hunted into dead ends

Identify elements that might be useful
• Death is feedback that it made a mistake

Create general concept/skill/…
• Uses episodic knowledge to recall behavior that led up to death
• Analyzes spatial configuration
• Causal knowledge determines critical features

Store in memory for later recall
• Stores dead-end concept in semantic memory and associates bad result

Transform the current situation
• Creates internal model of hunted

Retrieve a memory based on transformed situation
• Queries memory – what would be bad when I imagine myself as hunted?
• Retrieves memory of dead end

Page 25

Level 10 Transfer in Soar

[Figure: Soar architecture. Long-Term Memories (Procedural, Semantic, Episodic); Short-Term Memory; Decision Procedure; Appraisal Detector; learning mechanisms (Chunking, Reinforcement Learning, Semantic Learning, Episodic Learning); Body (Perception, Action). Transfer steps shown: Experience; Identify, Generalize, Abstract; Store. Source Problems and Target Problems.]

Source: 1v1
• Learns to pick up ammo to deny enemy

Target: Fire rescue
• Transforms to remove gasoline near fire

Perform source task
• Tries to kill enemy

Identify elements that might be useful
• Encounters a situation where it can pick up enemy ammo and realizes that doing so would deny the enemy ammo

Store in memory for later recall
• Stores general concept in semantic memory

Perform target task
• As fire rescuer, tries to search building (and avoid dying, flames, etc.)

Transform the current situation
• Analyzes situation
• Determines that fire is its enemy

Retrieve a memory based on transformed situation
• Queries memory for ways to defeat an enemy
• Retrieves general concept about resources

Create general concept/skill/…
• Uses background knowledge to generalize to a concept of “deny enemy its resources necessary to hurt me”

Use transfer memory to impact behavior
• Instantiates general concept in current situation [resources map to air and fuel – wood, gasoline, …]
• Takes actions to eliminate fuel

Page 26

Northwestern Year 2 Plans for Companions

Foundation: Analogical Processing
• Northwestern’s technology is based on how humans seem to do transfer – by analogy and similarity
• Based on Gentner’s (1983) Structure-Mapping theory
• Simulations of cognitive processes engineered into components in prior DARPA research
  – SME: Analogical matching, similarity estimation, comparison
  – SEQL: Generalization
  – MAC/FAC: Similarity-based retrieval

Approach
• Extend Companions Cognitive Systems architecture by
  – Creating and incorporating advances in analogical processing
  – Developing techniques to learn self-models to help formulate own knowledge goals
• Compare Companions and ICARUS in physics testbed
  – Help ISLE and Cycorp integrate our representations and support libraries into ICARUS
  – Extend as necessary (e.g., sketching support)

Metrics
• Coverage = Fraction of time an answer is generated
• Accuracy = Whether the answer is right (including partial credit)
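A minimal sketch of how these two metrics could be computed, assuming each problem attempt is recorded as a credit score in [0, 1], or None when no answer was generated. The record format and function names are illustrative assumptions, not part of the Companions evaluation; averaging accuracy over answered problems only is one possible reading of the definition.

```python
from typing import Optional, Sequence


def coverage(scores: Sequence[Optional[float]]) -> float:
    """Fraction of problems for which some answer was generated."""
    if not scores:
        return 0.0
    answered = [s for s in scores if s is not None]
    return len(answered) / len(scores)


def accuracy(scores: Sequence[Optional[float]]) -> float:
    """Mean credit (0..1, partial credit allowed) over answered problems."""
    answered = [s for s in scores if s is not None]
    if not answered:
        return 0.0
    return sum(answered) / len(answered)


# Hypothetical quiz: two full-credit answers, one half-credit, one unanswered.
quiz = [1.0, 0.5, None, 1.0]
print(coverage(quiz), accuracy(quiz))
```

Keeping the two measures separate makes it possible to tell an agent that answers rarely but correctly from one that answers often but poorly.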

Companions Cognitive Systems Architecture
• Structure-mapping operations appear to be heavily used throughout human reasoning and learning
• Hypothesis: Can achieve human-like reasoning and learning by making structure-mapping operations central in a cognitive architecture

Year | Coverage: Near Transfer | Coverage: Far Transfer | Accuracy: Near Transfer | Accuracy: Far Transfer
1    | 50%                     | 0%                     | 50%                     | 0%
2    | 80%                     | 50%                    | 90%                     | 80%
3    | 90%                     | 80%                    | 90%                     | 90%

Page 27

Northwestern Year 2 Plans for Far Transfer (7-10)

“A battery is like a pump”

Advice can be about appropriate analogs, mappings, analogical inferences

Analogical encoding will let Companions work with more abstract advice

Metamappings will guide cross-domain analogies by first matching general knowledge

KB

Persistent Mappings store ongoing understanding of cross-domain analogy

Expanded self-modeling capabilities to improve skills and knowledge

Need to study pulley problems more

Need to figure out trig inverses

Page 28

University of Washington Year 2 Plans

Integration

Evaluation

Unique Contribution

- Integrate into Icarus
- Apply to Physics and Urban Combat
- Transfer Level 8:
  - 60% of human performance
- Transfer Levels 9-10:
  - 30% of human performance
- Infrastructure:
  - Component-wise evaluation
  - “White box” evaluation

Technologies

Alchemy

ICARUS

Percepts Inferences

- Representation mapping
  - Entities
  - Attributes
  - Relations
  - Ontologies
  - Situations
  - Events
- Based on statistical predicate invention
  - Discover abstract relations, etc., & transfer
- Infrastructure
  - Efficient inference and learning
  - Online, “lifelong” operation
  - Extension to continuous data
  - Extension to decision-making

Predicate Invention

Infrastructure

Representation Mapping

Page 29

UT Austin Year 2 Plans (Mooney)

Theory Refinement

Improve system’s ability to revise the structure of the source Markov Logic Network (MLN) to fit the target domain.

• Improve efficiency of clause generalization and specialization procedures by using bottom-up search to directly identify productive changes rather than blindly searching the space of possible refinements.

• Improve generation of new clauses in the target domain by exploiting advanced ILP methods.

Predicate Mapping

Improve system’s ability to accurately map predicates from source to target domain.

• Use schema-mapping techniques from information integration to suggest predicate mappings by analyzing source and target data.

• Use lexical knowledge (e.g. WordNet) to guide matching of predicate names in source and target.

• Use heuristic search to improve efficiency of finding the best overall predicate mapping.

Integration with Alchemy and ICARUS

Incorporate MLN Transfer Learning Methods into Alchemy and Icarus

• Integrate predicate mapping and theory refinement methods into UW Alchemy MLN software package.

• Integrate our transfer learning methods for MLNs into Icarus+Alchemy to provide transfer of static inferential knowledge from the source to the target domain.

Evaluation on Testbeds

Evaluate MLN TL methods on ISLE Testbeds

• In Urban Combat and other ISLE testbeds, measure the accuracy (using AUC) of within-state inferences made by an MLN adapted from source to target tasks, compared to an MLN learned from scratch, for several levels of transfer.

• Measure training time of our system versus existing Alchemy to demonstrate improved efficiency.

• Compare ablated version of Icarus without MLN-transfer-learning to enhanced version on final testbed performance metrics and demonstrate improved performance.

[Figure labels: Source Training Data; MLN Learner (Alchemy); Source MLN; Predicate Mapping; MLN Revision; Target Training Data; Target MLN]

Page 30

Rutgers University Year 2 Plans

Integration

Evaluation

Unique Contributions

- Alchemy integration into Icarus
- Apply to Physics and Urban Combat
- Transfer Levels 9-10:
  - 50% of human performance
- Infrastructure:
  - Component-wise evaluation: Alchemy

Technologies
- Learning and Instantiating Templates
  - Based on second order learning
  - Discover general regularities & transfer
- Entailment Learning
  - Combining inductive and deductive learning
  - Discover simple rules and combine (e.g., OCCAM)
- Infrastructure
  - Template Learner for Markov Logic Networks
  - Deductive Learner for Markov Logic Networks

Templates

Alchemy

ICARUS

Percepts Inferences

MLNs

Template Learning and Instantiation

Learning by entailment

Page 31

Cycorp Year 2 Plans: Technologies and Capabilities

Problem/Objective: Knowledge-based transfer learning

• Supply formalized domain expertise and well-encoded, logically meaningful domains and problem spaces

• Elaborate on background knowledge via ILP and inference, provide advice, and extend knowledge gathered from source tasks and domains

• Informed by existing background knowledge in Cyc

New Rules and Skills: Rule Induction via ILP

New Facts: Domain & General Knowledge

Expanded Knowledge: Inference, Advice, & Probabilities

Tasks

• Technical Development
  • Provide domain knowledge for use by Urban Combat, Physics, and GGP performers
  • Provide inference capabilities, including query support, goal advice, and knowledge elaboration, for UCT, Physics and GGP performers
  • Pursue knowledge gathering and elaboration via ILP over domain and background knowledge
  • Pursue inference speedup and results improvement via Reinforcement Learning of inference pathways
• Integration & Coordination
  • Integrate Alchemy and other probabilistic reasoning approaches with Cyc’s inference capabilities
  • Responsibility for technical integration of Alchemy, Cyc, and other inference approaches

[Figure labels: Cyc background knowledge & inference capability; Source Testbed; Target Testbed; Execution Agent(s); Situation, Status, & Queries; Advice, Support, & Elaboration]

Collect knowledge relevant to a task, domain, or problem

Develop knowledge:
• Inferential expansion
• Probabilistic weighting
• Rule formation (ILP)

Perform inference; advice, query results, background, skills and memories

• Responsibility for technical coordination of groups developing on the Physics testbed

Payoff
• Information flow among complementary learning and transfer mechanisms and approaches
• Establish a well-founded, mutually compatible base of assumptions and facts – necessary for transfer
• Allow systems to communicate observations, conclusions, skills, memories and intentions
• Learning can take full advantage of existing background knowledge, knowledge from less-obviously related domains and problems
• High-level, semantically connected knowledge, within a context of existing knowledge = understanding

Page 32

Cycorp Year 2 Plans: Evaluation and Integration

Coordination & Integration

Representation
• Coverage: in each testbed,
  • How many problems are represented?
  • How many types of problem? What novel problem categories?
  • How many and what type of obstacles, goals, percepts, and actions?
  • What novel types of solution information?
• Accuracy:
  • Well-represented domains are critical for successful performance; accurate representations are demonstrated by successful agent evaluations.

[Figure labels: Cyc KB; Testbeds: Urban Combat, Physics, GGP; Inference Engines & Approaches; COMPANIONS; Soar; ICARUS. Panels: Reasoning Over Queries & Testbeds (sources S1, S2, S3 with inference steps IS1-1 … IS1-i, IS2-1 … IS2-j, IS3-1); Coordinating Inferences. Flows: Inferences (Queries, Goals, Search Paths, Elaborations); Queries & Inference Needs; Skills, Concepts, LTMs; Domain Knowledge; Coordination & Semantic Content; Background & Problem Representation; LTMs & Followup Queries; Large Scale ILP, Knowledge Seeking, Generalization, LTMs; Queries, Goals, Elaborations; Knowledge, Goals, Analysis, Advice]

• How many formal representations of problems and queries in different testbeds are shared by different architectures?

• How many inference requests, of how many types, go through a common interface?

• How effectively can knowledge be probabilistically qualified (as measured by crossfold validation)?

• Learning & Transfer:
  • What novel fact-level knowledge gathered for the source is reused in the target space?
  • How many facts, in what domains?
  • How many rules can be obtained via ILP over gathered and domain knowledge, in what domains?
  • What agent skills are obtainable by ILP within Cyc?
• Advice-giving and query results:
  • What appropriate, novel goals are presented?
  • What improvement on random search can be obtained through advice?
• Skills, abilities, and long-term memories:
  • What novel abilities can agents demonstrate with knowledge and inference support?
  • What new problems are solvable that could not be solved without that support?

[Figure labels: Inference & Learning; Knowledge & Inference; Soar, ICARUS, Companions; Urban Combat, GGP, Physics]

Page 33

Maryland/Lehigh Year 2 Plans

New Technologies

Solution and Evaluation

1. Mapping between Icarus’ hierarchical representations and HTNs

2. Techniques for systematically extending planners to work in adversarial domains (i.e., multiple possible responses from an adversary)

3. Extensions to Icarus to learn in such domains

• A mapping between Icarus’ hierarchical representations and Hierarchical Task Networks

• New algorithms will provide capabilities to reason about adversaries, i.e., to learn about them and to plan against them

• This will provide high-level transfer in adversarial environments via learning about abstract strategies/models of the behaviors of single or groups of adversaries in one scenario and transferring this knowledge to another scenario

Contributions
1. How: Generalize our planner-modification techniques to deal with adversaries
2. Work with ISLE to generalize Icarus’ learning to learn about adversaries
3. When:
   1. September-December 2006: develop the theory and implement the new algorithms
   2. January-April 2007: Work with the ISLE team to incorporate algorithms into Icarus
   3. May 2007: Evaluation: Use the GGP testbed for Year 2

• Icarus does plan abstraction by grouping actions

• The groups are analogous to Hierarchical Task Network (HTN) decomposition templates (e.g., as in SHOP2)

• “Planner-modification” techniques for systematically generalizing planners to work with nondeterministic actions (i.e., multiple possible outcomes)
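As a rough illustration of planning against multiple possible adversary responses, the sketch below picks the action whose worst-case response is least bad (a one-step minimax choice). This is a generic textbook technique used here as an assumption; it is not the Maryland/Lehigh planner-modification method, and the action names and values are hypothetical.

```python
from typing import Dict


def worst_case_choice(outcomes: Dict[str, Dict[str, float]]) -> str:
    """Pick the action whose worst adversary response has the highest value."""
    return max(outcomes, key=lambda action: min(outcomes[action].values()))


# Hypothetical state values for two of our actions under two adversary responses.
outcomes = {
    "advance": {"response_1": 10.0, "response_2": -5.0},
    "hold": {"response_1": 2.0, "response_2": 1.0},
}
print(worst_case_choice(outcomes))
```

Treating the adversary's responses like nondeterministic outcomes, but scoring each action by its worst one, is the simplest way to see the link between the two settings named above.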

Capabilities

[Figure labels: action hierarchy over a1 … a6; “action” leading from State 1 to State 2 with possible outcome 1 and possible outcome 2; “our action” leading from State 1 to State 2 with adversary response 1 and adversary response 2]

Page 34

UT Austin Year 2 Plans (Stone)

β(A’)→A

γ(S’)→S

I-TAC

SME-QDBN

:S A ' :S' A '

OXOXXOX

XOOXOXXOXOX

Evaluation

Unique Abilities
• Automatic discovery of inter-task mapping
  – I-TAC (inter-task action correlation)
    • Train a classifier to map state transition pairs to actions in the source
    • Use the classifier and state mapping to obtain the action mapping
  – SME-QDBN (structure mapping + qualitative dynamic Bayes nets)
    • Knowledge/model-centered approach
    • Represent action model using QDBNs
    • Specialized and optimized SME for QDBNs using heuristic search
• Policy-based transfer
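The two I-TAC steps listed above can be sketched roughly as follows. This is a toy illustration under stated assumptions: the classifier is a 1-nearest-neighbour lookup over concatenated (state, next state) vectors, the state mapping is given by hand, and all names and data are hypothetical. It is not the UT Austin implementation.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, ...]
Transition = Tuple[State, str, State]  # (state, action, next state)


def train_classifier(source: Sequence[Transition]) -> Callable[[State, State], str]:
    """Step 1: learn (state, next state) -> source action from source data."""
    examples = [(s + s_next, action) for s, action, s_next in source]

    def classify(s: State, s_next: State) -> str:
        query = s + s_next
        nearest = min(
            examples,
            key=lambda ex: sum((x - y) ** 2 for x, y in zip(ex[0], query)),
        )
        return nearest[1]

    return classify


def action_mapping(
    target: Sequence[Transition],
    state_map: Callable[[State], State],
    classify: Callable[[State, State], str],
) -> Dict[str, str]:
    """Step 2: map target transitions into source state space, classify them,
    and let each target action vote for the source action it resembles."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for s, action, s_next in target:
        votes[action][classify(state_map(s), state_map(s_next))] += 1
    return {a: counts.most_common(1)[0][0] for a, counts in votes.items()}


# Hypothetical tasks: a 1-D source world and a 2-D target world whose
# second coordinate is irrelevant to the shared dynamics.
source = [((0.0,), "right", (1.0,)), ((1.0,), "left", (0.0,)),
          ((2.0,), "right", (3.0,)), ((3.0,), "left", (2.0,))]
target = [((0.0, 5.0), "east", (1.0, 5.0)), ((2.0, 5.0), "west", (1.0, 5.0))]
drop_y = lambda s: (s[0],)  # assumed state mapping from target to source

mapping = action_mapping(target, drop_y, train_classifier(source))
print(mapping)
```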

Capabilities/Technologies

Integration
• Incorporate RL into Icarus and/or Soar
  – Focus on leveraging action-value functions into generalizable planning knowledge
  – Abstract learned RL knowledge to relational representations
• ISLE team comparisons: compare value function transfer vs. Icarus approach in GGP
• Evaluate same core algorithms in multiple domains
• GGP: value-function-based
  – Use symmetry to scale up within same game types
  – Game-tree features to transfer among different types of games
  – Automatic abstraction discovery
• RoboCup Soccer
  – Value-function-based transfer: Sarsa, function approximators
  – Policy-based transfer: neuroevolution (NEAT)
• Urban Combat -- continued evaluation of year 1 effort

Core architecture-independent TL for reinforcement learning

[Figure: QDBN for the Hold action over time slices t and t+1. Node labels: Dist(K1,C), Dist(K2,C), Dist(T1,C), Dist(K1,K2), Dist(K1,T1), Dist(K2,T1), Ang(K2,K1,T1), DAng(C,K1,K2), DAng(C,K1,T1), “-”, “min”]

13 inputs, 3 outputs

19 inputs, 4 outputs

Page 35

• Generalization of skills and concepts
  • Policy homomorphisms as general skills
  • Relational concepts and features from homomorphic mapping
• Relational learning
  • Reinforcement learning with relational concepts and generalized skills
• Estimation of skill/concept utility
  • Skill utility to regulate exploration
  • Concept utility to improve state hierarchy construction

UT Arlington Year 2 Plans (Huber)

Technical Approach Novel Capabilities

Integration Evaluation Plans

Generalization of skills and concepts, and estimation of skill and concept utilities to improve transfer

Automatic definition of relevant representational concepts and utility-based guidance for efficient hierarchy construction and skill exploration in RL.
• Learning of generalized, “parametric” skills
  • Generalized policies apply in novel situations and environments
  • Skills have operator descriptions with utilities and probabilities
• Automatic generation of useful representational concepts
  • Generation of task-relevant relations in the form of predicates
  • Discovery of relevant feature sets and object types
• Automatic derivation of skill and concept utilities
  • Concept utilities allow construction of appropriate representation
  • Skill utilities guide exploration or guide planning

Provides RL-based creation of hierarchical skills and concepts with symbolic representations, and skill and concept utilities to provide guidance on their use.
• RL-based skill learning component for use in ICARUS
  • Learned operator representations facilitate integration of skills
  • Learned features and concepts can augment concept hierarchy
• Skill and concept utilities for search and planning guidance
  • Skill utility estimates can guide operator selection
  • Concept utility can inform the representation investigated

Development and Integration Timeline

Skill generalization:

Skill/concept utility:

New capabilities will extend the set of transfer levels the Hierarchical RL system can address
• Evaluation within the Urban Combat Testbed (UCT)
  • Application of standalone system to transfer levels 1-6
  • Evaluation focus on tasks with significant change of the environment and of the task objective
• Evaluation of performance using Transfer Ratio (TR)
  • Target of TR values larger than 2
• Evaluation of use of capabilities by evaluation of frequency and task utility of generalized skills

Year 2 Year 3

Skill Hierarchy

Concept Hierarchy

Selective, task-specific state space construction

Hierarchical state representation

Task learning

Skill and concept extraction

Skill and concept generalization

Skill and concept utility

Development

Integration

Page 36

Year 1 Evaluation Plans

Comparison among architectures should reveal the conditions for successful transfer learning.

But implementing agents that can operate in multiple testbeds takes considerable time and resources.

Instead, we will develop agents within two architectures for each testbed, with only one (ICARUS) being applied to all three of them.

Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge.

Urban Combat

ETS Physics

GGP

Soar

Companions

LUTA

ICARUS

Page 37

Year 2 Evaluation Plans

Comparison among architectures should reveal the conditions for successful transfer learning.

But implementing agents that can operate in multiple testbeds takes considerable time and resources.

Instead, we will develop agents within two architectures for each testbed, with only one (ICARUS) being applied to all three of them.

Experiments will evaluate how well each pair of frameworks supports transfer involving quite different forms of knowledge.

Urban Combat

ETS Physics

GGP

Soar

Companions

LUTA

ICARUS

Page 38

Urban Combat Level 9 Transfer: Hunted to Hunter

Hunted Hunter

Learn that there is a path with very low visibility

Learn that getting caught in a dead end is deadly

Avoid path that goes near ambush places

Learn to check the hidden path periodically

Learn to try to trap hunted in dead end

Discover a place that makes a good ambush

Tactical reasoning and strategies; symbolic and spatial representations

Transfer

Scenarios in Urban Combat Testbed (UCT)

Page 39

Urban Combat Level 10 Transfer: 1v1 to Fire Rescue

1 vs. 1 Fire Rescue

Pick up enemy ammo

Avoid being seen or shot

Don’t get caught in dead end

Use doors, walls for protection

Always have multiple exits

Consume enemy resources

Take advantage of terrain

Always leave an out

Tactical reasoning and strategies; symbolic and spatial representations

Transfer

Backburn or backdraft

Remove wood from fire’s path

Scenarios in Urban Combat Testbed (UCT)

Page 40

Year 2 Plans for Urban Combat Evaluation

[Figure labels: UCT; TL0; TL1; Source (A) Problems; Target (B) Problems; Interface; Perf 0; Perf 1; axes Experience and Performance]

Transfer Ratio > 30% (Y2 Go/No-Go)

Source: 1v1, 2v2

Target: FireRescue

[Figure labels: Source BK; Target BK; Transferred Knowledge (TK); BK+TK; Transfer Learning. Knowledge types: Tactical, Terrain, Resource, Spatial. Examples: Ammunition, Combustible, Deadend, NoExit]

Performance:
• Transfer ratio (go/no-go)
• Demonstrate deep transfer
• Comparison to human trials

Page 41

Year 2 Plans for Urban Combat Evaluation

• Metrics
  – Hunter: Time it takes agent to kill opponent plus time penalties for health loss
  – Hunted: Inverse of time before opponent kills agent plus time penalties for health loss
  – 1v1: Time to kill N opponents plus time penalties for health loss and fewer than N kills
  – Fire Rescue: Time to rescue ally from fire plus time penalties for health loss

Page 42

Year 2 Plans for Urban Combat Evaluation

• Transferred knowledge
  – Hunter ↔ Hunted (TL level 9)
    • Spatial
      – Visibility, dead-ends, ambush places, terrain
    • Tactical
      – Check (hunter) / seek (hunted) low visibility areas
      – Trap in (hunter) / avoid (hunted) dead-ends
      – Seek (hunter) / avoid (hunted) ambush places
  – 1v1 ↔ Fire Rescue (TL level 10)
    • Spatial
      – Accessibility of resources (ammunition / combustibles)
      – Dead-ends, exits, terrain
    • Tactical
      – Consume enemy resources
      – Use terrain for protection
      – Always leave an out
  – Taxonomic
    • Types of terrain, spaces, resources and tactics

Page 43

Year 2 Plans for Urban Combat Evaluation

• Performance milestones
  – Based on TL levels 9 and 10
  – Go/No Go: Transfer ratio > 30%
  – Demonstrate achievement of specific deep transfer opportunities (e.g., ammunition → combustible)
  – Comparison to human trials
    • Transfer ratios
    • Deep transfer

Page 44

Level 10: Differing

Source / target game graphs share substructure corresponding to transferable strategy.

Year 2 Experimental Plans for GGP

GGP Terms
• Game: defines environment in which agent operates.
  – Includes initial state, terminal states, transition function, goals
  – Score associated with goal and terminal states
• Match: competition between two agents in a game
• Scenario: source / target pairing of games
  – Source / target may vary in one or more ways (initial state, terminal states, goals, transitions)
  – Typically exactly one source and one target game

Protocol
• Two transfer levels
  9. Reformulating
  10. Differing
• Seven scenarios per level
• Multiple consecutive matches in each game
• Fixed opponent (non-learning)
• Players receive score according to goal
• Domain performance metric: Score from satisfied goal
• Domain performance goal: Maximize score

Source Target

Structural Transfer

Level 9: Reformulating

Source / target game graphs are isomorphic.

• Source / target game descriptions fundamentally different

• Different axioms

• Different structural representation

• Equivalent meaning (same state graph structure)

• Transfer must occur at structural level
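To make “isomorphic game graphs” concrete, here is a small sketch that searches for a state renaming under which a source state graph and a target state graph have exactly the same transitions. It is a brute-force check suitable only for tiny graphs; the state names are hypothetical and nothing here reflects how the GGP agents actually detect reformulations.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def find_isomorphism(
    src_nodes: List[str], src_edges: Set[Edge],
    tgt_nodes: List[str], tgt_edges: Set[Edge],
) -> Optional[Dict[str, str]]:
    """Return a source-to-target state renaming that preserves every
    transition, or None if the two graphs are not isomorphic."""
    if len(src_nodes) != len(tgt_nodes) or len(src_edges) != len(tgt_edges):
        return None
    for perm in permutations(tgt_nodes):
        renaming = dict(zip(src_nodes, perm))
        if {(renaming[u], renaming[v]) for u, v in src_edges} == tgt_edges:
            return renaming
    return None


# Hypothetical games: same three-state path, described with different names.
source_states = ["s0", "s1", "s2"]
source_moves = {("s0", "s1"), ("s1", "s2")}
target_states = ["a", "b", "c"]
target_moves = {("c", "a"), ("a", "b")}

print(find_isomorphism(source_states, source_moves, target_states, target_moves))
```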

Example: Generalized Fork
• Current goal value is 50
• Opponent choices can lead to several possible successor states
• Regardless of the opponent’s move, agent can reach a state with a higher goal value

[Figure: game tree with current goal value 50 and successor goal values 75, 70, 65]
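The fork condition described on this slide can be written as a simple check: for every opponent move, the agent has at least one reply that reaches a goal value above the current one. The sketch below assumes a hypothetical table from opponent moves to the goal values the agent can reach afterwards; the move names and the pairing of values to moves are illustrative, with only the values 50, 75, 70 and 65 taken from the slide.

```python
from typing import Dict, List


def is_generalized_fork(current_value: int, replies: Dict[str, List[int]]) -> bool:
    """True if every opponent move leaves the agent some reply that reaches
    a goal value strictly above the current one."""
    if not replies:
        return False
    return all(
        any(value > current_value for value in reachable)
        for reachable in replies.values()
    )


# Current goal value 50; after each opponent move the agent can still
# reach a state worth more than 50.
example = {"opp_move_a": [75, 40], "opp_move_b": [70], "opp_move_c": [65, 50]}
print(is_generalized_fork(50, example))
```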

Page 45

Year 2 Plans for Physics Testbed

• All of Newtonian Dynamics
  – Requires calculus, higher-degree polynomials, graphs
• Two areas from Dynamical Analogies
  – Well-explored cross-domain analogies in various physical domains
  – Excellent venue for exploring distant transfer
  – Example: Domain A = linear motion; Domain B = rotational motion, thermal systems, hydraulics, electricity, …

Page 46

Physics Testbed: Dynamical Analogies in Detail

Page 47

BACKUP SLIDES

Page 48

Transfer Using Structure Mapping

• Model/knowledge oriented approach
• Using knowledge about
  – How actions affect state variables
  – How state variables relate to each other

• Use structure mapping to find similarities between source and target tasks

• Discover β and γ together

Objective

Technical Approach

Results Keepaway

Summary
• Works nicely for Keepaway
• Strong demand for domain knowledge
• Provides similarity measures for source and target
• Future work
  – Improve efficiency
  – Learn QDBN from data
  – Apply to GGP and Urban Combat

• Qualitative DBNs
  – Dynamic Bayes networks are a structured representation for actions: an action (directly) affects a small number of state variables
  – Probabilities are less relevant; qualitative properties matter more: no change, increase/decrease, etc.
• Specialized and optimized SME for QDBNs
  – Fixed types of entities
  – How do entities match?
  – How to evaluate mappings?
• SME-QDBN uses heuristic search to find the mapping of the maximal score
  – Prune with upper bounds

1. Generate local matches and calculate the conflict set for each local match;
2. Generate initial global mappings based on immediate relations of local matches;
3. Merge global mappings with common structures;
4. Search for a maximal global mapping with the highest score;

Step 3 Step 4

Algorithm

t t+1HoldDist(K1,C)

Dist(K2,C)

Dist(T1,C)

Dist(K1,K2)

Dist(K1,T1)

Dist(K2,T)

Ang(K2,K1,T)

DAng(C,K1,K2)

DAng(C,K1,T1)

Dist(K2,T1)

Ang(K2,K1,T1)

Dist(K1,C)

Dist(K2,C)

Dist(T1,C)

Dist(K1,K2)

Dist(K1,T1)

Dist(K2,T)

Ang(K2,K1,T)

DAng(C,K1,K2)

DAng(C,K1,T1)

Dist(K2,T1)

Ang(K2,K1,T1)

-

min

min
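The search for a maximal-score mapping with upper-bound pruning can be sketched as a small branch-and-bound procedure. This is a minimal sketch under strong assumptions, not the SME-QDBN implementation: local matches are plain (source, target) entity pairs, two matches conflict when they share a source or a target entity, and the score of a mapping is simply the sum of non-negative per-match scores. The names `conflict_sets` and `best_mapping` and the example scores are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A local match pairs one source entity with one target entity (assumed format).
Match = Tuple[str, str]


def conflict_sets(matches: List[Match]) -> Dict[Match, Set[Match]]:
    """Two local matches conflict if together they break a one-to-one mapping."""
    conflicts: Dict[Match, Set[Match]] = {m: set() for m in matches}
    for a in matches:
        for b in matches:
            if a != b and (a[0] == b[0] or a[1] == b[1]):
                conflicts[a].add(b)
    return conflicts


def best_mapping(scores: Dict[Match, float]) -> Tuple[float, FrozenSet[Match]]:
    """Branch-and-bound search for the conflict-free set of matches with maximal score.

    Assumes scores are non-negative and additive (a simplification).
    """
    matches = sorted(scores, key=scores.get, reverse=True)
    conflicts = conflict_sets(matches)
    best: Tuple[float, FrozenSet[Match]] = (0.0, frozenset())

    def search(i: int, chosen: FrozenSet[Match], total: float) -> None:
        nonlocal best
        if total > best[0]:
            best = (total, chosen)
        # Upper bound: pretend every remaining match could still be added.
        bound = total + sum(scores[m] for m in matches[i:])
        if i == len(matches) or bound <= best[0]:
            return  # nothing left, or this branch cannot beat the best so far
        m = matches[i]
        if not (conflicts[m] & chosen):
            search(i + 1, chosen | {m}, total + scores[m])
        search(i + 1, chosen, total)

    search(0, frozenset(), 0.0)
    return best


# Hypothetical scores for matching entities of a source task to a target task.
example_scores = {
    ("K1", "K1"): 3.0,
    ("K2", "K2"): 2.0,
    ("K2", "K3"): 1.5,
    ("T1", "T1"): 2.5,
    ("T1", "T2"): 1.0,
}
score, mapping = best_mapping(example_scores)
print(score, sorted(mapping))
```

Sorting matches by descending score lets the first greedy descent set a strong incumbent, which makes the upper-bound test prune most of the remaining branches.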

Page 49

Policy Transfer Using NEAT

Objective

• An alternative to value-function-based transfer
• Direct policy transfer based on the mappings of state variables (β) and actions (γ)
• NEAT (NeuroEvolution of Augmenting Topologies)
  – Uses genetic algorithms to evolve neural networks
  – Neural networks are used as action selectors

Results for Keepaway

• NEAT evolves 3v2 players (13 inputs, 3 outputs)
• Use a (from β & γ) to transform organisms
• NEAT evolves 4v3 with population from 3v2 (19 inputs, 4 outputs)

Results for Scheduling

• An autonomic computing task
• Task: determine in what order to process jobs
• Goal: maximize aggregate utility
• Source task: 2 job types (8 state variables, 8 actions)
• Target task: 4 job types (16 state variables, 16 actions)

[Figure: Cost vs. Episodes, Scratch vs. With Transfer; annotations: t.t.r. = 80%, t.r. = 35; t.t.r. = 84%, t.r. = 5.8]

Comparison of Sarsa and NEAT

• Taylor, Whiteson, & Stone (GECCO-06)
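One way to picture transforming a source organism with β and γ is to copy link weights from the source network into a larger target network. The sketch below is an assumption-laden simplification: it handles only direct input-to-output links (NEAT genomes also carry hidden nodes and evolving topology), it assumes β and γ map target indices back to source indices, and the function name `transform_weights` is hypothetical. The slides do not specify the actual transformation.

```python
from typing import Dict, List


def transform_weights(
    source_w: List[List[float]],
    beta: Dict[int, int],
    gamma: Dict[int, int],
) -> List[List[float]]:
    """Build target link weights from a source network.

    source_w[a][s] is the weight of the link from source input s to source output a.
    beta maps each target input index to a source input index (assumed direction);
    gamma maps each target output index to a source output index (assumed direction).
    """
    n_out, n_in = len(gamma), len(beta)
    return [
        [source_w[gamma[a]][beta[s]] for s in range(n_in)]
        for a in range(n_out)
    ]


# Toy example: a 2-input, 2-output source network seeds a 3-input, 3-output target.
source_w = [[0.1, 0.2], [0.3, 0.4]]
beta = {0: 0, 1: 1, 2: 1}   # the new target input reuses source input 1
gamma = {0: 0, 1: 1, 2: 1}  # the new target action reuses source action 1
target_w = transform_weights(source_w, beta, gamma)
print(target_w)
# prints: [[0.1, 0.2, 0.2], [0.3, 0.4, 0.4], [0.3, 0.4, 0.4]]
```

For Keepaway the same idea would take a 3-by-13 weight table to a 4-by-19 one, after which evolution continues in the target task from the transformed population.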

Page 50

Evaluation Process

• Year 1: Near-transfer (levels 1-6)
  – A = set of basic problems, B = transfer variations
  – Training runs include a quiz of four problems, followed by worked solutions
  – Experiment design worked out by ETS, NU
• ETS provided training + test examples for NU’s research needs
  – Novel problems from same templates were used for evaluation
• Tests were carried out on a sequestered NU cluster
  – 5 nodes to ETS for Physics
  – Scripting language developed to facilitate creation of experiments
  – Code frozen at start of evaluation
  – Efforts to make Companions usable by others have been an important step towards making the architecture into a robust product

Page 51

Good News from Evaluation

• With 2/3rds of the data in, likely to achieve our 50% near transfer goal
• Simple model works surprisingly well
  – Study worked solution = store it.
  – Extracted equations, modeling conditions, and sanity checks from prior problems
• Analogical mapping problems did arise
  – Some straightforward rerepresentation techniques may suffice for near transfer of concrete cases
• Most of the failures were due to limitations in parts of the system where learning currently does not take place
  – Provides clear examples for helping drive Y2 research

Page 52

Research Questions Raised by Evaluation so far

• Self-monitoring to learn self-models is important
  – 25% of restructuring problems failed because of hard-wired resource bounds.
    • A smarter system would figure out that it was making reasonable progress, and dynamically increasing the bounds would enable it to solve the problem.
    • A really smart system would look over the pattern of activity, see a lot of redundancy, and figure out how to change its strategies to be more efficient.
• Even black-box subsystems need to be extensible by learning
  – Example: ArcSineFn, ArcCosineFn left out of algebra system, causing 25% of the restructuring problems to fail.
  – In real world, such gaps are always possible
  – Cognitive systems must be adaptable enough to work around them.
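The idea of raising a resource bound while the solver is still making progress can be sketched as follows. This is a hypothetical illustration, not the Companions mechanism: it assumes the solver exposes a `step` callable that does one unit of work and returns a numeric estimate of remaining work (0 means solved), and the doubling policy, parameter names, and defaults are all assumptions.

```python
from typing import Callable, Optional


def solve_with_adaptive_bound(
    step: Callable[[], float],
    initial_bound: int = 100,
    max_bound: int = 10_000,
) -> Optional[int]:
    """Run `step` until it reports 0 remaining work, growing the step budget
    whenever the bound is reached but progress was made since the last check.

    Returns the number of steps used, or None if the solver stalls or the
    budget cannot grow any further.
    """
    bound = initial_bound
    steps = 0
    checkpoint = float("inf")  # remaining work seen at the previous bound check
    while True:
        remaining = step()
        steps += 1
        if remaining <= 0:
            return steps
        if steps >= bound:
            making_progress = remaining < checkpoint
            if not making_progress or bound >= max_bound:
                return None  # give up: stalled, or hard ceiling reached
            checkpoint = remaining
            bound = min(bound * 2, max_bound)


def make_countdown(n: int) -> Callable[[], float]:
    """Toy solver whose remaining work drops by one on each step."""
    state = {"remaining": n}

    def step() -> float:
        state["remaining"] -= 1
        return state["remaining"]

    return step


print(solve_with_adaptive_bound(make_countdown(250)))  # 250
print(solve_with_adaptive_bound(lambda: 5.0))          # None
```

With a fixed bound of 100 the countdown problem would fail; with the adaptive bound it finishes in 250 steps, while a solver that never improves is cut off at the second check.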

Page 53

Common materials developed for Physics Testbed

• Representational Infrastructure
  – Starting with ResearchCyc plus Northwestern’s representations
  – Extended to include representation of worked solutions (NU, Cycorp, ETS)
• Support Libraries
  – Algebra package for symbolic, numerical equation solving
    • With Johan de Kleer (PARC, Inc.)
  – Units package, tightly integrated with Cycorp representations
• Sketching tool for creating sketches associated with worked solutions and problems
  – Modification of Sketching Knowledge Entry Associate (sKEA) developed in earlier DARPA programs
  – Not deployed this year due to tight evaluation schedule

Page 54

Understanding Strengths and Weaknesses of Soar / ICARUS

• UCT stresses
  – Real time
  – Integration of reaction, decision making, and planning
  – Spatial reasoning
• Level 9 and 10 transfer stress
  – Flexibility in reasoning
  – Mixing task performance, deliberate reflection
  – Using multiple learning mechanisms
  – Multiple strategies for transfer
  – Knowledge-based transfer

Page 55

Michigan Year 2 Plans for Urban Combat

• What capabilities/technologies will you provide?
  – Transfer across variations at the highest levels: 9 – Reformulation, 10 – Differing
  – Transfer of tactical reasoning and strategies across symbolic and spatial representations
  – Use synergy across multiple learning mechanisms and general strategy discovery methods
  – Compare and contrast across two cognitive architectures (Soar and ICARUS)
• How do you plan to evaluate those capabilities?
  – In variations of UCT scenarios and maps across very different goals
    • 1v1 Hunter/hunted
    • 1v1, 2v2 combat engagements
    • Search/rescue
  – How fast and how safely can they perform their tasks?
• How and when will they be integrated into the larger system(s)?
  – From day 1 they will be integrated in Soar & ICARUS – full cognitive architectures
  – All scenarios require complete end-to-end behavior
• What unique ability will your technologies add?
  – Transfer across tasks requiring a combination of bottom-up, knowledge-based, and spatial reasoning
  – Real-time, on-line learning and transfer
  – Learning mechanisms that require small numbers of source trials

Page 56

Example Level 10 Transfer

Examples of general tactics learned from source: 1v1 Engagements

1. Consume opponent’s resources
  – Pick up enemy’s ammo
2. Divide and conquer
  – Attack one enemy at a time
3. Attack from a distance
  – Attack with rifle at distance
4. Minimize exposure
  – Take advantage of terrain
5. Always leave yourself an out
6. Sacrifice for ultimate goal

Examples of transfer to target (rescue from burning building):

1. Consume fire’s resources
  – Set a backburn/backfire
  – Remove fuel (word)
2. Divide and conquer
  – Put out one fire at a time
3. Avoid getting close to fire
  – Use ropes & tools to work at distance
4. Minimize exposure to fire
  – Use barriers/doors
5. Always have a safe exit available
6. One fights (to death) while other rescues

Page 57

Transfer of Spatial Knowledge

• Why Spatial Reasoning and Knowledge?
  – Ubiquitous across all military domains, inherent to military tactics and strategy
  – Requires integration of symbolic and metric data
• Examples:
  – Types of surfaces that facilitate/hinder travel
    • Not only speed travel but are likely to provide unobstructed paths
    • Roads and sidewalks vs. grassy areas and buildings
    • Bridges over water
  – Placement of IEDs relative to structures (fixed and dynamic)
    • Best places to search for IEDs; areas that can be ignored
  – Common structures and organization of buildings that aid searching (next slide)
    • Transfer basic structure (in two-story houses, bedrooms usually on second floor)
  – Locations for attacking/defending from enemies
    • Exposure to detection and fire
    • Location for setting ambush / being ambushed
    • Sniper positions
    • How to outflank opponent
  – Spatial organization of groups of agents
    • Too difficult for year 2 (maybe can get to in year 3)
    • Ability to provide cover to teammates
    • How to search as a group
    • How to attack and defend as a group

Page 58

Combinations of Spatial and Symbolic

• Common spatial layouts of specific types of buildings
  – Movie theaters
  – Schools
  – Restaurants
  – Office buildings
  – Homes
• Common spatial layouts of specific types of rooms
  – Bathrooms
  – Offices
  – Classrooms
  – Theaters
• Correlation of signs and spatial structures
  – Exit signs
  – Street signs
  – Restrooms

Will add graphics