10/29/04

Acquisition of Control Knowledge of Nonholonomic System by Active Learning Method

Yoshitaka Sakurai, Nakaji Honda, Junji Nishino

Presented by: Pujan Ziaie

能動学習法を用いた非ホロノミック系の知識制御の獲得

Paper Information

Journal of Advanced Computational Intelligence and Intelligent Informatics
– Received August 28, 2002; accepted December 13, 2002

Proc. of 2003 IEEE International Conference on Systems
– pp. 2400--2405 (2003.10)

About author
– Yoshitaka Sakurai (Ph.D. student)
– University of Electro-Communications, Department of Systems Engineering, Honda Lab.

Introduction
ALM (Active Learning Method)
IDS (Ink Drop Spread)
Simulation for Gymnastic Bar Action
– Mathematical Model & Equations
– Active Learning Approach
Conclusion

Active Learning Method

Why ALM?
– No need to know the system's inner structure
– Improves its performance on its own
Characteristics
Construction
Modeling

ALM Characteristics

Using SISO systems
– Choosing the most effective data
Accumulation of knowledge by experience
– Reinforcement learning (reward or punishment)
Estimation of overall information from fragmentary information

ALM Construction

Similar to human learning

[Block diagram with labels: Knowledge Acquisition Part, Controller, System Under Control, Modeling, Data collection, Evaluation, Database, Control Rule, Sampling Rule, Storage of I/O data, IDS, Trial & Error]

ALM Modeling (1)

Dividing the MIMO system into SISO systems
Dividing the input domains into fuzzy regions
Extracting the continuous narrow path
Calculating the output as the sum of (adaptability of each region * region output)

[Diagram: a MIMO System divided into SISO blocks, joined by Combination Rules]

ALM Modeling (2)

Example: 2 inputs, 1 output

[Figure: membership functions VS, S, M, L, VL over X1, with memberships βM and βL at X1 = a; narrow-path curves yM and yL over X2, giving ZM and ZL at X2 = b]

General form: y = βVS * ZVS + βS * ZS + βM * ZM + βL * ZL + βVL * ZVL

For X1 = a & X2 = b (only βM and βL are nonzero): y = βM * ZM + βL * ZL
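The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the triangular membership shape, the X1 range of 0 to 4, and the linear narrow paths are all assumptions made for the example and do not come from the paper.

```python
def triangular(x, left, peak, right):
    """Membership value of x in a triangular fuzzy region (assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical partition of X1 over [0, 4] into five overlapping regions.
REGIONS = {
    "VS": (-1.0, 0.0, 1.0),
    "S": (0.0, 1.0, 2.0),
    "M": (1.0, 2.0, 3.0),
    "L": (2.0, 3.0, 4.0),
    "VL": (3.0, 4.0, 5.0),
}

# Hypothetical narrow paths: one function X2 -> Z per region of X1.
NARROW_PATHS = {
    "VS": lambda x2: 0.0,
    "S": lambda x2: 0.5 * x2,
    "M": lambda x2: 1.0 * x2,
    "L": lambda x2: 2.0 * x2,
    "VL": lambda x2: 3.0 * x2,
}


def model_output(x1, x2):
    """y = sum over regions of beta_region(x1) * Z_region(x2)."""
    y = 0.0
    for name, (left, peak, right) in REGIONS.items():
        beta = triangular(x1, left, peak, right)
        if beta > 0.0:  # only the active regions contribute
            y += beta * NARROW_PATHS[name](x2)
    return y


# X1 = 2.5 lies halfway between M and L, so only those two terms remain.
y_example = model_output(2.5, 2.0)
```

With these made-up numbers, X1 = 2.5 gives βM = βL = 0.5, so the general five-term sum collapses to the two-term form shown on the slide.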

Ink Drop Spread method

What is IDS?
• Extracts a narrow path by applying a fuzzy process to input-output data
Why use IDS?
• Creates a continuous narrow path
• Measures the spread of the data distribution (used to extract the most effective input)

IDS Algorithm (1)

Using an irradiation pyramid on the data plane

IDS Algorithm (2)

[Figure: data plane and projected plane; combining the lights yields the narrow path]
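A rough sketch of the idea, assuming a square grid as the data plane, a pyramid whose darkness falls off linearly with distance, and a narrow path taken as the ink-weighted mean of y in each column. The grid size, the drop radius, and the weighted-mean rule are choices made for this example and are not taken from the paper.

```python
def ink_drop_spread(samples, size=21, radius=2):
    """Accumulate a pyramid-shaped ink drop around each (x, y) sample.

    samples: iterable of (x, y) pairs, both normalized to [0, 1].
    Returns plane[ix][iy], the accumulated ink darkness per grid cell.
    """
    plane = [[0.0] * size for _ in range(size)]
    for x, y in samples:
        cx = int(x * (size - 1) + 0.5)  # nearest grid column
        cy = int(y * (size - 1) + 0.5)  # nearest grid row
        for ix in range(max(0, cx - radius), min(size, cx + radius + 1)):
            for iy in range(max(0, cy - radius), min(size, cy + radius + 1)):
                # Pyramid profile: darkest at the sample, fading with distance.
                distance = max(abs(ix - cx), abs(iy - cy))
                plane[ix][iy] += radius + 1 - distance
    return plane


def narrow_path(plane):
    """Ink-weighted mean y index for each x column (None where no ink fell)."""
    path = []
    for column in plane:
        total = sum(column)
        if total == 0:
            path.append(None)
        else:
            path.append(sum(iy * ink for iy, ink in enumerate(column)) / total)
    return path


samples = [(i / 10, i / 10) for i in range(11)]  # toy data on the line y = x
path = narrow_path(ink_drop_spread(samples))
```

Because neighbouring drops overlap, the path is defined even in columns where no sample fell, which is what makes it continuous.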

IDS Algorithm (3)

Sample of IDS

[Figure: IDS sample, refined by gathering more data (through feedback)]

Control process (1)

Defining the control structure
• Dividing inputs into regions according to their range
• Selecting the most efficient input for the required output (by human or controller)
• Defining an evaluation rule for selecting suitable data

Control cycle
1. Gather data by using the control rules
– First time: using random numbers
– After the first time: using the developed controller
2. Evaluate the gathered data
3. Improve the partial knowledge function (in case of proper data)
4. Repeat from step 1
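The four steps can be sketched as a simple trial-and-error loop. Everything concrete here is a stand-in chosen for illustration: `control_cycle`, the `simulate` callback, the noise level, and the "keep the trial only if it scores better" rule are hypothetical and are not the paper's evaluation or sampling rules.

```python
import random


def control_cycle(simulate, states, n_cycles=200, noise=0.2, seed=0):
    """Toy version of the gather / evaluate / improve / repeat loop."""
    rng = random.Random(seed)
    knowledge = {}  # partial knowledge: state -> control output
    best_score = float("-inf")
    for _ in range(n_cycles):
        # Step 1: gather data. Random numbers the first time, afterwards
        # the learned rule perturbed by a little exploration noise.
        trial = {}
        for s in states:
            if s in knowledge:
                trial[s] = knowledge[s] + rng.uniform(-noise, noise)
            else:
                trial[s] = rng.uniform(-1.0, 1.0)
        # Step 2: evaluate the gathered data.
        score = simulate(trial)
        # Step 3: improve the knowledge only when the data is "proper"
        # (here: the trial scored better than anything seen before).
        if score > best_score:
            best_score = score
            knowledge = trial
        # Step 4: the loop repeats from step 1.
    return knowledge, best_score


# Hypothetical system: the score is higher the closer the outputs are
# to a hidden target the controller never sees directly.
states = list(range(5))
target = {s: 0.1 * s for s in states}


def simulate(trial):
    return -sum((trial[s] - target[s]) ** 2 for s in states)


knowledge, best = control_cycle(simulate, states)
```

The controller never reads `target`; it only sees the score, which mirrors the point that the inner structure of the system does not need to be known.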

Output calculation method
1. Remove the most efficient input from the inputs
2. Build the input state tree according to the valid fuzzy regions
3. Extract a narrow path of the most efficient input and the output for each leaf of the tree
4. Calculate the final output value as the sum of the output of each node multiplied by the adaptability of that node

[Figure: membership functions VS, S, M, L, VL over Xn, with memberships βM and βL at Xn = a. The node output comes from the narrow path; the node adaptability is obtained by multiplying the membership values of the nodes from the root to the leaf]
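A minimal sketch of steps 2 to 4, assuming the tree is simply the Cartesian product of the active fuzzy regions of the remaining inputs. The function name `tree_output`, the region names, the membership values, and the linear partial knowledge functions are all invented for the example.

```python
from itertools import product


def tree_output(memberships, partial_knowledge, x_eff):
    """Sum over leaves of (adaptability of the leaf) * (leaf output).

    memberships: one dict per remaining input, mapping each active fuzzy
        region to its membership value.
    partial_knowledge: maps a tuple of regions (one leaf of the tree) to a
        function of the most efficient input, i.e. that leaf's narrow path.
    x_eff: current value of the most efficient input.
    """
    y = 0.0
    for leaf in product(*(m.items() for m in memberships)):
        regions = tuple(name for name, _ in leaf)
        adaptability = 1.0
        for _, beta in leaf:  # multiply memberships from root to leaf
            adaptability *= beta
        y += adaptability * partial_knowledge[regions](x_eff)
    return y


# Four inputs x0..x3 with x1 as the most efficient one, so the tree
# branches on x0, x2 and x3 (hypothetical regions a/b, c/d, e/f).
memberships = [
    {"a": 0.25, "b": 0.75},  # x0
    {"c": 0.5, "d": 0.5},    # x2
    {"e": 0.5, "f": 0.5},    # x3
]
# One made-up narrow path per leaf: f(x1) = k * x1 with k = 0..7.
partial_knowledge = {
    regions: (lambda x1, k=k: k * x1)
    for k, regions in enumerate(product("ab", "cd", "ef"))
}

y_tree = tree_output(memberships, partial_knowledge, x_eff=2.0)
```

Only regions with nonzero membership need to appear in `memberships`, which keeps the number of leaves visited per step small even when the full tree is large.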

Gymnastic Bar Action

Model of Bar Gymnast
– 4 joints & 5 links
– Link 0 is not driven
– θ0 depends on the position of the center of gravity of the model and the shape of the posture
– The mass of the head is assumed to be 0

GOAL: achieve the largest swing angle

Equations:
– θi: relative angle between link i-1 and link i at each joint i (i = 0..4)
– T: kinetic energy
– V: potential energy (gravity)
– L = T - V (Lagrangian equation)
– Ii: moment of inertia
– xi, yi: coordinates of the center of gravity of the i-th link
– Ni: torque applied on each joint i
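For reference, the symbols listed above fit the standard Lagrangian formulation for a planar link chain, written out below. The slide does not give the equations themselves, so this is the generic textbook form and not a transcription of the paper. The link masses m_i, the gravitational acceleration g, and the absolute angular velocity ω_i of link i (for a planar chain, the sum of the relative angular velocities up to joint i) are assumptions introduced here, as is taking N_0 = 0 to express that link 0 is not driven.

```latex
\begin{aligned}
T &= \sum_{i=0}^{4} \left[ \tfrac{1}{2}\, m_i \left( \dot{x}_i^{2} + \dot{y}_i^{2} \right)
      + \tfrac{1}{2}\, I_i\, \omega_i^{2} \right],
  \qquad \omega_i = \sum_{k=0}^{i} \dot{\theta}_k \\
V &= \sum_{i=0}^{4} m_i\, g\, y_i \\
L &= T - V \\
\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}_i}
  - \frac{\partial L}{\partial \theta_i} &= N_i,
  \qquad i = 0, \dots, 4, \qquad N_0 = 0
\end{aligned}
```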

Acquisition of knowledge

Does a little kid learn the gymnastic bar by solving the Lagrangian equation?!

NO! The kid learns from the environment by trial and error

ALM against Model of Bar Gymnast

[Block diagram with labels: Knowledge Acquisition Part, Controller, Simulator, Data collection, Evaluation, Database, Control Rule, Sampling Rule, IDS, Sequential Database, Modeling, IO Model, IDS Diagrams]

Annotations on the diagram:
– Evaluation: after some specified time, comparing with the largest swing angles so far
– Sampling Rule: probability based on distribution

Simulation properties

Sampling rate: every 1/1000 sec
Evaluation: every 2 minutes
Angle range & division:
– θ0: -180 to 180, 8 MFs
– θ1: 0 to 130, 5 MFs
– θ2: -180 to 0, 5 MFs
– θ3: -130 to 30, 5 MFs
– θ4: 0 to 130, 5 MFs
Most effective input of each output (joint torque): the angle of the same joint
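As a small worked calculation, the MF counts above give an upper bound on how many partial knowledge functions each joint torque could need, assuming (as in the output calculation method) that the tree branches on every input except the most effective one. The names `MF_COUNTS` and `leaf_states` are made up for this sketch.

```python
from math import prod

# Number of membership functions per input angle, from the slide.
MF_COUNTS = {"theta0": 8, "theta1": 5, "theta2": 5, "theta3": 5, "theta4": 5}


def leaf_states(most_effective):
    """Leaves of the input tree once the most effective input is removed."""
    return prod(n for name, n in MF_COUNTS.items() if name != most_effective)


# Torque of joint 1 uses theta1 as its most effective input, so the tree
# branches on theta0, theta2, theta3 and theta4: 8 * 5 * 5 * 5 leaves.
leaves_joint1 = leaf_states("theta1")
```

The same count applies to joints 2, 3 and 4, since each of them also drops one 5-MF input from the tree.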

Simulation result (1)

Simulation result (2)

Conclusion

ALM is a strong, flexible method for some complicated control problems

Mathematics is completely useless for many control problems

Advantages of this approach
– flexibility
– easiness

Disadvantages
– imperfect information-collecting rule
– still too crisp

Why did I choose this paper?

I liked it.
It was quite a challenge
It was brand new
I have some ideas to improve it
– using fuzzy approaches for output
– correcting membership functions instead of adding new data

Acknowledgment

Special thanks to
– Sakurai-san for giving me his time and answering my questions
– Yamazaki-san who helped me to write the Japanese translation of technical words
– Serata-san who set up an appointment for me with Sakurai-san

Thank you all for listening

Any easy questions?!

Declaration Slide

Sampling rule

• Probability based on distribution function

[Figure: probability function of the output y at a given Xn]
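One way to read the sampling rule is that the controller draws its output y at random, with candidates weighted by how much ink the IDS plane holds for them at the current input. The sketch below assumes exactly that; `sample_output`, the discrete candidate values, and the uniform fallback for an empty column are illustrative choices and are not taken from the paper.

```python
import random


def sample_output(ink_column, y_values, rng=random):
    """Draw one output value with probability proportional to its ink.

    ink_column: ink darkness for each candidate output at the current input.
    y_values: the candidate output values, in the same order.
    """
    if sum(ink_column) == 0:
        # No knowledge yet for this input: fall back to a uniform draw.
        return rng.choice(y_values)
    return rng.choices(y_values, weights=ink_column, k=1)[0]


rng = random.Random(1)
candidates = [-1.0, 0.0, 1.0, 2.0]
# All ink sits on the third candidate, so it is always selected.
chosen = sample_output([0, 0, 5, 0], candidates, rng)
# Empty column: any candidate may come back.
explored = sample_output([0, 0, 0, 0], candidates, rng)
```

Outputs that have worked before are tried more often, while less-inked candidates still get an occasional trial.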

Declaration Slide

Input tree, e.g.:
– four inputs (x0..x3)
– x1 is the most efficient

[Figure: membership functions VS, S, M, L, VL over Xn with memberships βM and βL at Xn = a; input state tree branching on x0 (memberships β0a, β0b), then x2 (β2c, β2d), then x3 (β3e, β3f); each leaf uses a partial knowledge function of x1]

Adaptability of state 1: β_L1_S1 = β0b * β2c * β3e

x1 output for state 1: f_L1_S1(x1)

y = β_L1_S1 * f_L1_S1(x1) + β_L1_S2 * f_L1_S2(x1) + β_L1_S3 * f_L1_S3(x1) + β_L1_S4 * f_L1_S4(x1)
