An Application of Reinforcement Learning to Aerobatic Helicopter Flight
Greg McChesney
Texas Tech University
Greg.mcchesney@ttu.edu
Apr 08, 2009
CS5331: Autonomous Mobile Robots

Overview

Creating a robot that can fly autonomously

Software developed at Stanford's AI Lab

This paper is slightly outdated as many new maneuvers have been created.


Learning Approach

Apprenticeship
    Collect data from a human pilot trying the maneuver (multiple times)
    Learn a model from the data
    Find a controller that works in simulation based on the model
    Test on the helicopter (pray it doesn’t crash)
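The "learn a model from the data" step can be illustrated with a minimal sketch. It assumes a scalar linear system x[t+1] = a*x[t] + b*u[t] and fits a and b by least squares; the function names and the toy data are hypothetical, and the helicopter model in the paper is far richer than this.

```python
import random


def fit_scalar_linear_model(states, inputs):
    """Least-squares fit of x[t+1] = a*x[t] + b*u[t] from one trajectory.

    states has one more entry than inputs. This scalar form is an
    illustrative assumption, not the model used in the paper.
    """
    sxx = sxu = suu = sxy = suy = 0.0
    for t in range(len(inputs)):
        x, u, y = states[t], inputs[t], states[t + 1]
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * y
        suy += u * y
    # Solve the 2x2 normal equations [[sxx, sxu], [sxu, suu]] [a, b]^T = [sxy, suy]^T.
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


def make_demo_trajectory(a_true, b_true, steps, seed=0):
    """Generate noise-free toy data standing in for recorded pilot flights."""
    rng = random.Random(seed)
    states, inputs = [1.0], []
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0)
        inputs.append(u)
        states.append(a_true * states[-1] + b_true * u)
    return states, inputs


demo_states, demo_inputs = make_demo_trajectory(0.9, 0.5, steps=100)
a_hat, b_hat = fit_scalar_linear_model(demo_states, demo_inputs)
```

With noise-free data the fit recovers the generating parameters; with real flight logs the residual error is what the later control steps have to tolerate.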


Helicopter State

Position
Velocity
Angular velocity
Controlled with 4 dimensions
    Cyclic pitch
    Tail rotor
Take gravity out when calculating the model


Controller Design

Use a Markov decision process
Sextuple (S, A, T, H, s(0), R)
    S: set of states
    A: set of actions (inputs)
    T: dynamics model, a set of probability distributions for the next state
    H: horizon, or number of time steps of interest
    s(0): initial state
    R: reward function
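The sextuple can be written down directly as a data structure. The sketch below is a toy illustration only: the `MDP` class, the one-dimensional "move toward zero" problem, and every name in it are assumptions made for this example, not anything from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MDP:
    states: Sequence[int]                                   # S
    actions: Sequence[int]                                  # A
    transition: Callable[[int, int, random.Random], int]    # T: samples the next state
    horizon: int                                            # H
    initial_state: int                                      # s(0)
    reward: Callable[[int, int], float]                     # R


def make_toy_mdp(slip: float = 0.1) -> MDP:
    """Hypothetical 1-D problem: with probability `slip` the action has no effect."""
    def transition(s, a, rng):
        if rng.random() < slip:
            return s
        return max(-5, min(5, s + a))

    return MDP(
        states=range(-5, 6),
        actions=(-1, 0, 1),
        transition=transition,
        horizon=5,
        initial_state=3,
        reward=lambda s, a: -float(abs(s)),
    )


def toward_zero(s):
    """A hand-written policy that steps toward state 0."""
    return -1 if s > 0 else (1 if s < 0 else 0)


def rollout(mdp, policy, rng):
    """Sum of rewards collected over H steps starting from s(0)."""
    s, total = mdp.initial_state, 0.0
    for _ in range(mdp.horizon):
        a = policy(s)
        total += mdp.reward(s, a)
        s = mdp.transition(s, a, rng)
    return total
```

Calling `rollout(make_toy_mdp(), toward_zero, random.Random(0))` evaluates one policy on one sampled trajectory; the controller design problem is to find the policy that maximizes the expected value of that sum.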


Differential Dynamic Programming (DDP)

Compute the linear approximation of the dynamics
Compute the optimal solution to the resulting linear quadratic regulator
Must take into account the error state
Cost for change in input: needed in real testing
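The linear quadratic regulator step can be sketched for the simplest possible case. The code below assumes a scalar system x[t+1] = a*x[t] + b*u[t] with per-step cost q*x^2 + r*u^2 and runs the standard backward Riccati recursion; it is not the paper's implementation, which works with matrices, time-varying linearizations, and a much larger state.

```python
def lqr_gains(a, b, q, r, horizon):
    """Finite-horizon LQR feedback gains for a scalar linear system.

    Returns gains such that u[t] = -gains[t] * x[t] minimizes the sum of
    q*x^2 + r*u^2 over the horizon (scalar case assumed for illustration).
    """
    p = q  # cost-to-go weight at the final step
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time; gains[0] is for t = 0
    return gains


def simulate_closed_loop(a, b, gains, x0):
    """Apply u[t] = -gains[t] * x[t] and return the final state."""
    x = x0
    for k in gains:
        u = -k * x
        x = a * x + b * u
    return x


example_gains = lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, horizon=50)
final_state = simulate_closed_loop(1.0, 1.0, example_gains, x0=1.0)
```

DDP repeats this kind of solve: linearize around the current trajectory, solve the LQR problem, roll the new controller out, and linearize again.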


DDP-Continued

2 phases
    DDP to find the open-loop input sequence
    Use DDP again, refining the inputs as a deviation from the nominal open-loop input sequence
Integral control: takes into account wind and errors in the model
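Why an integral term helps against wind can be seen in a toy simulation. The sketch assumes a scalar system x[t+1] = x[t] + u[t] + d with a constant disturbance d standing in for wind, and compares proportional-only feedback with proportional-plus-integral feedback; the gains and the system are made up for illustration and are not the controller from the paper.

```python
def simulate_with_wind(kp, ki, disturbance, steps=200, x0=1.0):
    """Return the final tracking error under u = -kp*x - ki*z.

    z is the running sum of past errors (the integral term). Setting
    ki = 0 gives plain proportional control.
    """
    x = x0   # tracking error
    z = 0.0  # accumulated error
    for _ in range(steps):
        u = -kp * x - ki * z
        z += x                   # integrate the current error
        x = x + u + disturbance  # constant "wind" pushes the state every step
    return x


p_only_error = simulate_with_wind(kp=0.5, ki=0.0, disturbance=0.2)
pi_error = simulate_with_wind(kp=0.5, ki=0.1, disturbance=0.2)
```

The proportional-only loop settles at a nonzero offset, while the loop with the integral term drives the error to zero, because the accumulated error keeps growing until the input cancels the disturbance.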


Rewards

24 features
Used inverse reinforcement learning
Rewards from inverse reinforcement learning usually did not produce the correct result
Took the inverse reinforcement learning results and tuned them manually to get good results
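A reward built from weighted features can be sketched as follows. The example assumes the reward is a negative weighted sum of squared error terms and uses three hypothetical features instead of 24; the feature names and weight values are invented here to show what "manually tuning" the weights means in practice.

```python
def quadratic_features(position_error, velocity_error, input_change):
    """Three hypothetical features; the slides mention 24 but do not list them."""
    return [position_error ** 2, velocity_error ** 2, input_change ** 2]


def reward(features, weights):
    """Negative weighted sum: larger errors give a lower reward."""
    return -sum(w * f for w, f in zip(weights, features))


# Weights as they might come out of inverse reinforcement learning (made-up values).
learned_weights = [2.0, 1.0, 0.5]

# Manual tuning: penalize abrupt input changes more heavily.
tuned_weights = [2.0, 1.0, 5.0]

example_features = quadratic_features(0.5, 1.0, 0.2)
learned_reward = reward(example_features, learned_weights)
tuned_reward = reward(example_features, tuned_weights)
```

Changing a single weight shifts which trajectories the controller prefers, which is why the learned weights could be used as a starting point and then adjusted by hand.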


Helicopter

Xcell Tempest
54” long
19” high
13 lbs
Two-stroke engine
Orientation sensors
GPS: doesn’t work during flips


Flip


Roll


Tail-In Funnel


Nose-In Funnel


Questions

Motivations/Who pays for it?
    I can see applications in the defense sector
    DARPA
Could more maneuvers be done just by changing some parameters?
    Probably not, because the filter is learned based on a model, so you would need to create a new model


More Questions

What's the relationship between reinforcement learning and MDP?
    Not sure
Could a helicopter like this operate in the West Texas wind storms?


Fun Stuff

Videos:
    http://heli.stanford.edu/
    http://www.youtube.com/watch?v=VCdxqn0fcnE
Helicopter
    http://www.miniatureaircraftusa.com/helicopterkits/1025_Spectra_G/1025_kit_main.asp
