ICO Learning


  • ICO Learning
    Gerhard Neumann, Seminar A, SS06

  • Overview
    Short overview of different control methods
    Correlation Based Learning
    ISO Learning
    Comparison to other methods ([Wörgötter05]): TD Learning, STDP
    ICO Learning ([Porr06])
    Learning Receptive Fields ([Kulvicius06])

  • Comparison of ISO Learning to other Methods
    Comparison for classical conditioning learning problems (open-loop control)
    Relating RL to classical conditioning
    Classical conditioning: the pairing of two subsequent stimuli is learned, such that the presentation of the first stimulus is taken as a predictor of the second one
    RL: maximization of rewards; the learned output v is a predictor of future reward

  • RL for Classical Conditioning
    TD error: δ(t) = r(t) + γ v(t) − v(t − 1)
    The derivative term γ v(t) − v(t − 1) is a discrete derivative of the prediction v
    Weight change: Δw_i = α δ(t) x_i(t − 1)
    => Nothing new so far
    Goal: after learning, the output v should react to the onset of the CS x_n and remain active until the reward terminates
    Represent the CS internally by a chain of n + 1 delayed pulses x_i
    Replace the states of traditional RL by time steps

  • RL for Classical Conditioning
    The delayed pulses form a special kind of E-trace: the serial compound representation
    Learning produces a rectangular response of v (active from CS onset until the reward)
    No special treatment of the reward is necessary: x_0 can replace the reward when w_0 is set to 1 at the beginning
    (A minimal numerical sketch follows below.)
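
    A sketch of this construction, assuming illustrative trial timings and values for α and γ (none are given on the slides): the CS onset triggers the chain of delayed unit pulses, and after training the output v rises at CS onset and stays high until the reward arrives.

```python
import numpy as np

n_steps = 20          # length of one trial (assumed)
cs_onset = 2          # time step of the CS (assumed)
us_time = 12          # time step of the reward/US (assumed)
n_taps = us_time - cs_onset + 1   # delayed pulses spanning CS -> US
alpha, gamma = 0.2, 1.0           # assumed learning rate and discount

w = np.zeros(n_taps)

def run_trial(w):
    """One trial: returns the value trace v(t), updating w in place."""
    v_prev = 0.0
    v_trace = np.zeros(n_steps)
    for t in range(n_steps):
        # serial compound: pulse x_i is active exactly i steps after CS onset
        x = np.zeros(n_taps)
        if 0 <= t - cs_onset < n_taps:
            x[t - cs_onset] = 1.0
        r = 1.0 if t == us_time else 0.0
        v = float(w @ x)
        delta = r + gamma * v - v_prev    # TD error
        i_prev = t - 1 - cs_onset         # tap active on the previous step
        if 0 <= i_prev < n_taps:
            w[i_prev] += alpha * delta    # credit the previous pulse
        v_prev = v
        v_trace[t] = v
    return v_trace

for _ in range(300):
    v_trace = run_trial(w)

# v converges to the rectangular response described above:
# ~1 from CS onset until the reward, then back to 0.
print(np.round(v_trace, 2))
```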

  • Comparison for Classical Conditioning
    Correlation Based Learning: the reward x_0 enters the output sum like any other input; it is not an independent term as it is in TD learning
    TD-Learning: the reward r appears as a separate term in the TD error

  • Comparison for Classical Conditioning
    TD-Learning: uses the serial compound representation as its E-trace
    ISO-Learning: uses another form of E-trace (band-pass filters)
    These filters are used for all input pathways -> also for calculating the output
    (A sketch of such a filter bank follows below.)
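
    As a rough illustration of such a band-pass E-trace, the sketch below builds a small bank of damped resonators with impulse response h(t) = (1/b) e^(−at) sin(bt), the filter shape commonly used with ISO/ICO learning; the concrete frequencies and quality factor here are assumptions.

```python
import numpy as np

def resonator_impulse_response(f, Q, n):
    """h(t) = (1/b) * exp(-a t) * sin(b t), a damped resonator (needs Q > 0.5)."""
    a = np.pi * f / Q                           # damping
    b = np.sqrt((2 * np.pi * f) ** 2 - a ** 2)  # damped angular frequency
    t = np.arange(n)
    return (1.0 / b) * np.exp(-a * t) * np.sin(b * t)

def filter_bank(x, freqs, Q=0.6, n_taps=400):
    """One band-pass trace u_j per filter: the raw input, smeared out in time."""
    return np.stack([np.convolve(x, resonator_impulse_response(f, Q, n_taps))[:len(x)]
                     for f in freqs])

x = np.zeros(600)
x[50] = 1.0                                     # a single input pulse
u = filter_bank(x, freqs=[0.1, 0.05, 0.025, 0.0125, 0.00625])
print(u.shape)                                  # (5, 600): five smeared-out traces
```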

  • Comparison for the Closed Loop
    Closed loop: actions of the agent affect future sensory input
    The comparison is not so easy any more, because the behavior of the algorithms is now quite different
    Reward Based Architectures (Actor-Critic Architecture)
    Use evaluative feedback: reward maximization
    A good reward signal is very often hard to find (in nature: found by evolution)
    Can theoretically be applied to any learning problem
    Resolution in the state space: only applicable for low-dimensional state spaces -> curse of dimensionality!

  • Comparison for the Closed Loop
    Correlation Based Architectures
    Non-evaluative feedback, all signals are value-free: minimize the disturbance
    Valid regions are usually much bigger than for reward maximization -> better convergence!
    Restricted solutions: evaluations are implicitly built into the sign of the reaction behavior
    Actor and critic are the same architectural building block
    Only for a restricted set of learning problems; hard to apply to complex tasks
    Resolution in time: only looks at the temporal correlation of the input variables
    Can be applied to high-dimensional state spaces

  • Comparison of ISO Learning and STDP
    ISO learning generically produces a bimodal weight-change curve
    Similar to the STDP (spike-timing dependent plasticity) weight-change curve
    STDP rule: the potential from the synapse is a filtered version of a spike
    Gradient-dependent model
    A much faster time scale is used in STDP
    Different kinds of synapses can easily be modeled with different filters
    (A numerical sketch of the weight-change curve follows below.)
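
    The bimodal curve can be reproduced numerically: present two pulses with inter-stimulus interval T, apply the ISO rule with the predictive weight held at zero (the gradient-dependent open-loop case), and sum up the weight change. The filter parameters below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def resonator(f=0.01, Q=0.6, n=600):
    a = np.pi * f / Q
    b = np.sqrt((2 * np.pi * f) ** 2 - a ** 2)
    t = np.arange(n)
    return (1.0 / b) * np.exp(-a * t) * np.sin(b * t)

h = resonator()
mu, w0, n_steps, t0 = 0.01, 1.0, 1500, 400

def total_weight_change(T):
    """Total dw_1 for a pulse pair: x_1 at t0, x_0 at t0 + T (T may be < 0)."""
    x1 = np.zeros(n_steps); x1[t0] = 1.0
    x0 = np.zeros(n_steps); x0[t0 + T] = 1.0
    u1 = np.convolve(x1, h)[:n_steps]
    u0 = np.convolve(x0, h)[:n_steps]
    v = w0 * u0              # w_1 held at 0: pure open-loop curve, no feedback
    dv = np.gradient(v)
    return mu * np.sum(u1 * dv)   # ISO rule dw_1/dt = mu * u_1 * dv/dt, summed

intervals = range(-150, 151, 10)
curve = [total_weight_change(T) for T in intervals]
# The curve has one negative and one positive lobe as a function of T:
# the bimodal shape that resembles the STDP weight-change curve.
```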

  • Overview
    Short overview of different control methods
    Correlation Based Learning
    ISO Learning
    Comparison to other methods ([Wörgötter05]): TD Learning, STDP
    ICO Learning ([Porr06])
    Learning Receptive Fields ([Kulvicius06])

  • ICO (Input Correlation Only) Learning
    Drawback of Hebbian learning: auto-correlation can result in divergence even if x_0 = 0
    ISO learning relies on the filters of the different inputs being orthogonal (each filter orthogonal to its derivative)
    This only holds if a steady state is assumed: the auto-correlation no longer vanishes if the weights are changed during the impulse response of the filters
    -> cannot be applied with large learning rates
    => ISO learning can be used only with small learning rates; otherwise the auto-correlation causes the weights to diverge

  • ICO & ISO Learning
    ISO Learning: dw_j/dt = μ u_j dv/dt (inputs are correlated with the output v)
    ICO Learning: dw_j/dt = μ u_j du_0/dt (inputs are correlated only with the reflex input u_0)

  • ICO Learning
    A simple adaptation of the ISO learning rule: correlate only the inputs with each other
    No correlation with the output -> no auto-correlation
    Define one input as the reflex input x_0
    Drawback: loss of generality; the rule is not isotropic any more (not all inputs are treated equally)
    Advantages:
    Can use much higher learning rates (up to 100x faster)
    Can use almost arbitrary types of filters
    No divergence of the weights any more
    (A sketch contrasting the two update rules follows below.)
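
    A minimal sketch of the two update rules side by side, in discrete time; the previous value of v or u_0 serves as a crude discrete derivative, and all names here are assumptions.

```python
import numpy as np

def step_iso(w, u, v_prev, mu):
    """ISO: dw_j/dt = mu * u_j * dv/dt -- correlates inputs with the OUTPUT,
    so each w_j feeds back on itself through v (auto-correlation)."""
    v = float(w @ u)
    w += mu * u * (v - v_prev)
    return w, v

def step_ico(w, u, u0_prev, mu):
    """ICO: dw_j/dt = mu * u_j * du_0/dt -- correlates inputs only with the
    reflex input u_0; no auto-correlation, so large mu stays stable."""
    dw = mu * u * (u[0] - u0_prev)
    dw[0] = 0.0               # the reflex weight w_0 stays fixed (e.g. at 1)
    w += dw
    return w, float(w @ u)
```

    The only difference is which signal supplies the derivative; removing the output v from the rule is exactly what removes the auto-correlation term.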

  • ICO Learning
    Weight-change curve (open loop, just one filter bank): the same as for ISO learning
    Weight trace over time: ISO learning contains an exponential instability, even after setting x_0 to 0 after 100000 time steps

  • ICO Learning: Closing the Loop
    The output v of the learner feeds back to its inputs x_j after being modified by the environment
    Reactive pathway: fixed reactive feedback control
    Learning goal: learn an earlier reaction that keeps x_0 (the disturbance or error signal) at 0
    One can prove that, under simplified conditions (one filter bank, impulse signals, using the Z-transform), one-shot learning is possible
    (A toy closed-loop sketch follows below.)
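
    A toy closed loop can illustrate the learning goal. The plant, the delays, and the exponential E-trace here are invented for illustration and are much cruder than the filter banks of [Porr06]: a disturbance pulse reaches the predictive sensor x_1 exactly TAU steps before it reaches the reflex sensor x_0, and an action needs TAU steps to act on the plant, so the fixed reflex always comes too late; ICO learning grows w_1 until the predictive pathway cancels the disturbance and x_0 stays near 0.

```python
import numpy as np

TAU, mu, lam = 8, 5.0, 0.9                  # all constants assumed
w1 = 0.0                                    # predictive weight (reflex w_0 fixed)

for trial in range(25):
    pending = np.zeros(TAU)                 # actions in flight toward the plant
    u1, x0_prev, energy = 0.0, 0.0, 0.0
    for t in range(40):
        x1 = 1.0 if t == 10 else 0.0        # early warning at the predictive sensor
        u1 = lam * u1 + x1                  # crude exponential E-trace of x_1
        d = 1.0 if t == 10 + TAU else 0.0   # disturbance reaching the plant
        x0 = d - pending[t % TAU]           # an action arrives TAU steps after issue
        pending[t % TAU] = w1 * x1          # issue the predictive action
        w1 += mu * u1 * (x0 - x0_prev)      # ICO rule: correlate u_1 with dx_0/dt
        x0_prev = x0
        energy += x0 ** 2
    if trial % 5 == 0:
        # the residual error (sum of x_0^2) shrinks as w_1 approaches 1
        print(f"trial {trial:2d}: w1 = {w1:.2f}, residual error = {energy:.4f}")
```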

  • ICO Learning: Applications
    Simulated robot experiment: the robot has to find food (disks in the environment)
    Sensors for the unconditioned stimulus: 2 touch sensors (left + right)
    Reflex: the robot elicits a sharp turn as it touches a disk, which pulls the robot into the centre of the disk
    Sensors for the predictive stimulus: 2 sound (distance) sensors (left + right) that can measure the distance to the disks
    Stimulus: difference between the left and right sound signals
    Use 5 filters (resonators) in the filter bank
    Output v: steering angle of the robot

  • ICO Learning: Simulated Robot
    A single experience was sufficient to produce an adapted behavior
    This is only possible with ICO learning

  • Simulated Robot
    Comparison of ICO learning and ISO learning for different learning rates
    Learning was counted as successful if the success criterion held for a sequence of four contacts
    The two rules are equivalent for small learning rates (the auto-correlation term is then small)

  • Simulated Robot
    Two different learning rates
    Divergent behavior of ISO learning for high learning rates: the robot shows avoidance behavior toward the food disks

  • Applications Continued
    More complex task: three food disks simultaneously
    No simple relationship between the reflex input and the predictive input any more (superimposed sound fields)
    This task is only learned by ICO learning, not by ISO learning

  • ICO: Real Robot Application
    Real robot: target a white disk from a distance
    Reflex: pulls the robot into the white disk just at the moment the robot drives over the disk; achieved by analysing the bottom scanline of a camera
    Predictive input: analysing a scanline from the top of the image
    Filter bank: 5 FIR filters with different filter lengths, all coefficients set to 1 -> smear out the signal
    Narrow viewing angle of the camera: the robot has to start more or less in front of the disk

  • ICO: Real Robot Experiment
    Processing the input: calculate the deviation of the positions of all white points in a scanline from the center of the scanline -> a 1D signal
    Results: (A) before learning, (B & C) after learning (14 contacts)
    The weights oscillate around their best values, but do not diverge

  • ICO Learning: Other Applications
    Mechanical arm: the arm is always driven to a specified set point by a PI controller
    Input of the PI controller: the motor position; the PI controller is used as the reactive filter
    Disturbance: the pushing force of a second, small arm mounted on the main arm
    Fast-reacting touch sensors measure the disturbance
    Use 10 resonator filters in the filter bank
    (A sketch of the reactive pathway follows below.)
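
    A sketch of the reactive pathway as a PI controller; the gains and the interface are assumed. The deviation from the set point is the error/reflex signal x_0, and the learned predictive term is simply added on top of the PI output.

```python
class PI:
    """Textbook PI controller used as the fixed reactive filter (gains assumed)."""
    def __init__(self, kp, ki, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0
    def __call__(self, error):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral

pi = PI(kp=2.0, ki=0.5)

def motor_command(position, set_point, predictive_term):
    x0 = set_point - position        # deviation = error/reflex signal x_0
    return pi(x0) + predictive_term  # learning makes the predictive term act
                                     # before x_0 deviates at all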

  • ICO Learning: Other Applications
    Result: the control is shifted backwards in time; the error signal (deviation from the set point) almost vanishes
    Other example: temperature control, predicting temperature changes caused by another heater

  • Overview
    Short overview of different control methods
    Correlation Based Learning
    ISO Learning
    Comparison to other methods ([Wörgötter05]): TD Learning, STDP
    ICO Learning ([Porr06])
    Learning Receptive Fields ([Kulvicius06])

  • Development of Receptive Fields through Temporal Sequence Learning [Kulvicius06]
    Develop receptive fields by ICO learning
    Learn the behavior and the receptive fields simultaneously; usually these two learning processes are considered separately
    First approach where the receptive field and the behavior are trained simultaneously!
    Shows the application of ICO learning to high-dimensional input spaces

  • Line Following
    System: the robot should learn to follow a line painted on the ground more accurately
    Reactive input x_0: pixels at the bottom of the image
    Predictive input x_1: pixels in the middle of the image
    Use 10 different filters (resonators) in the filter bank
    Reflexive output: brings the robot back to the line; not a smooth behavior
    Motor output: constant speed S; v modifies the speed and steering of the robot
    Use left-right symmetry

  • Line Following
    Simple system: fixed sensor banks, all pixels are summed up
    Input x_1 predicts x_0

  • Line Following: Three Different Tracks
    Steep, shallow, sharp; for one learning experiment, always the same track is used
    The robot steers much more smoothly; usually 1 trial is enough for learning
    Videos: without learning, steep, sharp

  • Line Following: Receptive Fields
    Receptive fields: use 225 pixels for the far sensors
    Use an individual filter bank for each pixel, 10 filters per pixel
    Left-right symmetry: the left receptive field is a mirror of the right
    (A sketch of the weight layout follows below.)
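
    In terms of data layout, this setup amounts to one ICO weight per (pixel, filter) pair; a minimal sketch, with the learning rate and all names assumed:

```python
import numpy as np

N_PIX, N_FILT = 225, 10                 # far-field pixels, filters per pixel

w = np.zeros((N_PIX, N_FILT))           # one weight per (pixel, filter) pair

def ico_update(w, u, du0, mu=1e-4):
    """One ICO step for the whole field.
    u: filtered pixel inputs, shape (N_PIX, N_FILT); du0: derivative of the
    scalar reflex signal; mu: assumed learning rate."""
    w += mu * u * du0                   # the same rule, applied element-wise
    return w

# The receptive field shown in the plots: per-pixel sum over its 10 filters.
receptive_field = w.sum(axis=1)         # shape (225,)
```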

  • Line Following: Receptive Fields
    Results: lower learning rates have to be used, and more trials are needed (3 to 6)
    Different receptive fields are learned for different tracks
    (Steep and sharp track; the plots show the sum of all filter weights for each pixel)

  • Conclusion
    Correlation Based Learning
    Tries to minimize the influence of disturbances
    Easier to learn than Reinforcement Learning, but the framework is less general
    Questions:
    When should Correlation Based Learning be applied, and when Reinforcement Learning? How is it done by animals/humans?
    How can these two methods be combined? E.g. correlation learning in an early learning stage, RL for fine tuning
    ICO Learning
    Improvement of ISO learning