
IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 8, NO. 4, OCTOBER 2011 835

Model-Assisted Stochastic Learning for Robotic Applications

Jeremy A. Marvel, Member, IEEE, and Wyatt S. Newman, Senior Member, IEEE

Abstract—We present here a framework for the generation, application, and assessment of assistive models for the purpose of aiding automated robotic parameter optimization methods. Our approach represents an expansion of traditional machine learning implementations by employing models to predict the performances of input parameter sequences and then filter a potential population of inputs prior to evaluation on a physical system. We further provide a basis for numerically qualifying these models to determine whether or not they are of sufficient quality to be capable of fulfilling their predictive responsibilities. We demonstrate the effectiveness of this approach using an industrial robotic testbed on a variety of mechanical assemblies, each requiring a different strategy for completion.

Note to Practitioners—This paper was motivated by the problem of online parameter optimization for robotic assembly applications in which the mechanical joining of components must be completed both quickly and gently. Traditional approaches for such optimizations require offline modeling or user-specified tests involving trial-and-error or design of experiments due to the inherent risks of damaging the robots, tools, or components. We propose a method of automating and enhancing the optimization process by applying and hybridizing machine learning practices. Through the utilization of both unsupervised learning methods and dynamic numerical model building and analysis, a robotic system can tune its operational parameters and effectively prune a potentially infinite parameter space of inefficient or dangerous values. These models can be algorithmically analyzed in order to monitor their quality, and predict whether or not they stand to benefit the optimization process.

Index Terms—Intelligent robots, machine learning, model-based learning, parameter optimization.

I. INTRODUCTION

THE UTILIZATION of robotic manipulators in tasks requiring the physical contact of rigid parts has been limited due to difficulties in algorithmically describing the processes and restrictions in control paradigms. The automation of robotic tasks is not trivial, even when considering the powerful tools of compliant motion control and machine vision. The force control parameters necessary for the gentle acquisition of parts and the safe handling of collisions of rigid components require extensive manual tuning followed by lengthy trial-and-error testing whenever changes are introduced.

Manuscript received August 04, 2010; revised February 02, 2011; accepted March 21, 2011. Date of publication July 05, 2011; date of current version October 05, 2011. This paper was recommended for publication by Associate Editor L. Moench and Editor K. Goldberg upon evaluation of the reviewers’ comments.

J. A. Marvel is with the University of Maryland, Institute for Research in Electronics and Applied Physics, College Park, MD 20742-3511 USA, and also with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106-7071 USA (e-mail: [email protected]; [email protected]).

W. S. Newman is with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106-7071 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TASE.2011.2159708

Programming a robot to utilize force control for specific tasks such as the mechanical assembly of components requires an expert programmer to tune the program and reactive parameters. Even well-tuned programs, however, may be insufficiently robust against minor configuration variations, adversely affecting reliability, throughput, and product quality. Adaptability and self-monitoring become necessities in automated systems to maintain process quality.

In this paper, we present a framework for self-guided, self-diagnosing optimization. Section II provides a brief history of learning in robotic applications. Section III establishes a baseline for learning using unguided stochastic search. In Section IV, we propose a methodology for providing model-assisted guidance to stochastic searches, and in Section V, we propose a novel model assessment metric for self-analysis.

II. ROBOTIC LEARNING

There is no lack of interest in the automated tuning of parameters for process optimization. However, most efforts within the domain of mechanical assembly have focused on assembly sequencing and workspace configuration. Other domains within the field of robotics seek the optimization of path and trajectory generation, which are largely independent of the actual tasks being accomplished by the robots.

A. Task Parameter Tuning

Most modern tuning methods are still expected to be executed by adept hands. Experts employ statistical approaches for tuning processes in order to improve performance. Manual tuning methods based on Designs of Experiments (DOEs) are demonstrably faster than exhaustive parameter searches, but require expensive personnel to construct, implement, and test the solutions. Automated tools have been developed [1] to ease the labor burden, but human operators must still identify the parameters that are most likely to drive performance.

A self-tuning model was developed by Simon et al. [2] in which optimization takes place on motion control primitives at the program level. By applying a least squares fit to the data and associating cost functions, the system ran a gradient descent approach to parameter optimization. However, the approach focused exclusively on snap-fit insertions, and did not consider the complexities of more advanced assemblies unsolvable by means of gradient descent, particularly those showing variations in performance over time.

1545-5955/$26.00 © 2011 IEEE


A sensor-based robot adaptation approach was proposed by Deiterding and Henrich [3] to address the problems of acquiring complex sensor data and dealing with the necessary execution speed slowdown caused by using such sensors. Using existing optimization strategies, the system should react and adapt to cause-and-effect type changes in the configuration (i.e., the origins and results of changes in performance). The approach required human intervention to first identify the process components that can be optimized, however, which is largely a task of trial-and-error.

Automatically learning through exploration of a problem space to perform a given task has an attractive draw. Problems are reduced to sets of known operational states, and the actual learning determines when to transition from one state to the next. An example of this is PEGASUS [4], which transforms any given partially observable Markov Decision Process (POMDP) into an “equivalent” one with only deterministic transitions, reducing the policy search problem to a simplified hunt for policies with higher estimated values. This system was applied to reinforcement learning for autonomous helicopter flight [5]. In dangerous problem domains such as autonomous flight, however, such explorations are best accomplished by means of simulations, and the states must first be known and defined.

Vijayakumar et al. [6] developed Locally Weighted Projection Regression (LWPR) to handle the dimensionality problem of robot control. LWPR is based on their earlier work on locally weighted regression (LWR) to learn control of basic motor skills and handle devil sticks [7], and was later revisited [8] for real-time learning on the same problem. Their system was tested for online learning on a nonlinear cross function and control of a seven degree-of-freedom robot arm for discrete movement and rhythmic tasks [6]. It was also tested on a humanoid robot for the separate tasks of motion reproduction and gaze stabilization [9], and for autonomous airplane control [10]. LWPR has yet to be applied to optimization, and was limited to inverse kinematics problems.

Robot training models based on biological principles have also been extensively researched. Artificial Neural Networks (ANNs) have been used in an impressive array of tasks ranging from trajectory control of robot manipulators [11], [12] to autonomous vehicle control [13]. Newman et al. [14] demonstrated that ANNs can also be used successfully to train a robot to perform specific assembly tasks. However, the major drawback to these methods is that they are all highly specialized to specific problems, and must be reworked when modifications are introduced to the assembly setup or the robot. Additional machine learning methods such as Genetic Algorithms (GAs) [15] have also been explored extensively for assembly sequence and scheduling optimization [16]–[20], but their employment for the actual robotic tasks is relatively limited. What research has been done has shown that GAs can be used not only for parameter optimizations [21], but also for process learning.

To train robots by observing human operators perform tasks, Atkeson and Schaal [22] developed a method for robot training by filming a single instance of the task, building a model using linear regression of a priori knowledge of the task physics, and then learning to balance a pendulum using cost minimizations.

Billard and Schaal [23] developed a simulation of a humanoid and trained it with an ANN to learn arm trajectories from observations of human subjects, which was later expanded to extract relevant aspects for imitation in a given task [24]. Newman et al. [25] transferred process knowledge by applying an abstraction method to observed simple assembly tasks performed by human operators. They discovered human search strategies based on the actions of blindfolded test subjects to recreate the limited knowledge available to both the human and the robot observer.

B. Model-Assisted Robot Training

The framework proposed in this paper relies on the development, utilization, and analysis of models for parameter optimization, and continues an ongoing trend of research within manufacturing [26]. Most of these simulations and models, however, were used extensively for manufacturing process, component design, and configuration analysis and tuning. The desired application of inline model building, training, and parameter tuning seems to have been overlooked.

At the forefront of assembly models are the representations known as Assembly Features (AFs) that characterize the aspects of the construction process. Deneux [27] introduced AFs to address engineering design and assembly sequence solution embedding. He described them as solutions to using the representations of relationships between two or more groups of parts and their respective part features. Gayretli [28] developed a modularized system based on AFs to aid the evaluation and optimization of manufacturing processes by creating cost estimates and testing process plan practicality. Van Holland et al. [29], seeing that AFs as component relationships were insufficient for assembly sequencing, expanded the definition to be a duality description of component handling and component connections. The result was an object-oriented, integrated tool to juxtapose functional and geometric models. Ullah et al. [30] expanded the model to include assembly intent, and described assemblies numerically through set theory. In each of these cases, the optimizing capabilities of AFs were limited to operator-driven assembly planning and design, and did not address joining components.

Sanderson [31] described assemblies in terms of numerical representations, enumerated by cumulative entropies. The assembly task optimization was a matter of minimizing the total entropy of the system. The parts entropy method provided a common basis for comparing multiple assembly processes. Though his work focused mostly on the entropy of parts acquisition, orientation, and positioning, it poses a promising path for describing dynamically generated models for performance enhancement potential because of this comparative capacity.

The more direct method of topologically mapping the environment was central to the optimized localization strategy for physical assemblies devised by Chhatpar and Branicky [32]. Though these maps were built autonomously by the robot system, they were not built inline with the assembly process. Localization was based on matching observations to a C-space map captured prior to execution, either analytically or by means of empirical exploration of the environment.


Yamanobe et al. [33] devised a method of optimizing force control parameters for assembly by means of creating specialized software simulations for clutch assemblies. The results demonstrated the promise of simulation, but the simulations were computationally expensive, had to be manually developed, and the assemblies had to be simplified significantly for more reasonable simulation times. Because the parameters for physical assemblies could not be analytically derived, many of the physical attributes had to be empirically computed based on a number of physical trials. Similarly, Yuan and Yang [34] proposed a virtual reality simulation integrated with an ANN for assembly sequence planning and learning from human operators. By means of human-run demonstrations, the ANN model was trained to optimize the part sequence planning and path generations to avoid part collisions. All tests were performed in silico, however, and no actual assemblies were performed.

As an alternative to explicit simulators, numerical modeling methods fall under the umbrella of linear regression to estimate the outputs of unknown input sequences, and are adjusted based on empirical data. These models may also be used to describe the strengths and natures of the relationships between the input parameters and the output values. Smaller spreads of data through the regression, for instance, typically indicate that the system outputs are tightly coupled with their respective inputs. The most common method of generating linear regressions to fit the data uses the least squares method [35], though some have attempted to approach the problem of fitting data in complex linear spaces with less intuitive solutions such as integrable Hamiltonian systems [36].

Generalized reinforcement learning (RL) with model assistance, while not explicitly applied to industrial tasks, has been evaluated for various applications. When exact representations of operational functions do not exist, RL methods have employed models to guide exploration of a parameter space. Such methods have been applied to various robotic tasks such as navigation [37], adaptive behavior optimization [38], and control [39]. A shortcoming of such implementations, however, is the reliance on predefined states and application-specific transition functions for complex MDP domains, though research done by Taylor et al. [40] demonstrates that the transition functions can also be learned.

III. STOCHASTIC SEARCH AS A BASELINE FOR LEARNING

To establish a baseline for autonomous robot learning, the stochastic search properties of GAs are utilized for the process of exploration for parameter optimization. A robot running online searches of the parameter space can learn how to assemble a variety of components both quickly and gently using only a simplistic, easily integrated random search.

Wei et al. [21], [41] used a regionally guided GA derivative called Guided Evolutionary Simulated Annealing (GESA, first described by Yip and Pao [42]) for assembly optimization. Coevolving clans of gene sequences joined simulated annealing and simulated evolution to form a competition model within and between clans. GESA was designed to favor the evolutionary lineage that demonstrated the most proficiency and strongest likelihood of producing optimal results. However, different sets of search space vectors may result in highly competitive assembly performances, and prematurely stunting the progeny pool of a given clan could hinder its convergence toward viable solutions.

Fig. 1. In this GA, two competing clans are created based on a single grandparent gene sequence (A). Mutations are based on a fixed-length gene sequence, the values of which can be modified up or down (B) according to a mutation range vector. After each generation, the best-performing child is selected to be the parent of the next generation (C).

The research presented here uses a variation of the GESA algorithm [43] that removed the interclan competition. Each clan, instead, was treated as a separate entity that did not vie for population resources, but coevolved to accommodate the diverging genetic strains. Competition was focused on intraclan dynamics, with the parent gene actively competing against child genes for generational dominance (Fig. 1).

The GA can be illustrated as a two-component system. The driver generates the gene sequences and performs the evolutionary succession evaluations. The interface interprets the gene sequences, evaluates the results of running the sequences on the robotic platform, and then returns the performance results to the driver. This modularity ensures that the GA driver is independent of the problem domain.
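The driver/interface split and the intraclan parent-versus-child succession rule can be sketched as below. All function and variable names are illustrative assumptions, not identifiers from the paper; `evaluate` stands in for the interface that runs a sequence on the robot and returns its score.

```python
def run_generation(parent, sigma, n_children, mutate, evaluate):
    """One intraclan generation: the driver breeds children from the
    current parent, the interface (here `evaluate`) scores each on the
    physical system, and the parent keeps its status unless a child
    outperforms it. Purely a sketch of the structure described in the
    text, not the authors' implementation."""
    children = [mutate(parent, sigma) for _ in range(n_children)]
    scored = [(evaluate(c), c) for c in children]
    best_score, best_child = max(scored, key=lambda s: s[0])
    if best_score > evaluate(parent):
        return best_child, True      # child succession
    return parent, False             # parent retains dominance
```

Because `mutate` and `evaluate` are passed in, the driver loop itself stays independent of the problem domain, mirroring the modularity described above.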

Both the evolution and guidance toward convergence were driven by adaptive mutation vectors unique to each clan. For each generation, the bounds of offspring genetic variance are based on the mutation function described as

(1)

Here, the function generates a sequence of size of gene-length vectors of random delta values with normal distribution around the progenitor gene sequence, . These delta values are created with mutation variance, , to create the offspring of clan , . Parameter verification is handled by the robot controller, which reported invalid parameters to the GA as resulting in failed attempts.

To guide evolution, the values in the mutation variance vector are reduced for each successive generation based on the declination function

(2)

The learning rate causes the mutation rate to decrease over time, . It is a user-defined parameter that changes based on progeny succession. The value of is bimodal: , , where indicated child succession as parent, and meant that the parent of the previous generation retained its parent status. Its inclusion in (2) ensures that a clan will eventually converge on some local optimum, even if no discernable progress is seen. Larger values of allow for more exploration, but are likely to converge much slower in the absence of child succession. Smaller values cause to narrow faster, increasing the chance of convergence on suboptimal search parameters.

For smaller values of , the performance curve is smoother, but naturally takes longer to converge toward smaller errors. Conversely, larger values are subject to more varied performances, but also have the potential for better resulting performances given an equitable number of trials in which to explore the parameter space.
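The mutation and declination steps, (1) and (2), can be sketched as follows. The published symbols were lost in this copy, so the per-gene variance vector `sigma`, the decay constant `k`, and the bimodal split in `decline_variance` are assumed forms that only illustrate the behavior described in the text (normal perturbations around the parent; variance that shrinks each generation, faster without child succession).

```python
import numpy as np

def mutate_clan(parent, sigma, n_children, rng=None):
    """Sketch of mutation function (1): draw n_children offspring as
    normally distributed perturbations of the parent gene sequence,
    with one variance per gene."""
    rng = rng or np.random.default_rng()
    return parent + rng.normal(0.0, sigma, size=(n_children, parent.size))

def decline_variance(sigma, k, child_succeeded):
    """Sketch of declination function (2): shrink the mutation
    variance every generation so the clan eventually converges.
    The bimodal factor is an assumed form, not the published one."""
    factor = k if child_succeeded else 0.5 * k   # assumed bimodal values
    return sigma * factor
```

With `k < 1`, the variance vector contracts monotonically, reproducing the guaranteed convergence toward some local optimum noted above.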

A. The Task-Specific GA Implementation

Typical GAs use binary strings with mutations consisting of inserting and deleting digits, and randomly flipping bits. Here, however, the genes are represented as vectors of floating-point numbers. Mutations of the gene parameters were limited to modifying existing values, and were bound by a finite range that could be specifically set for each element.

The driver operated with the assumption that a higher score indicates a better gene sequence (i.e., a reward function is maximized). For mechanical assembly, the interface was designed to minimize the assembly time and contact forces. The scoring metric for the results, , was thus based on the reward function, , defined as

(3)

where for clan , the child’s score, , is a function of the reported assembly time, , and the average encountered force, , with respect to the maximum allowed time and force values, and . The scaling factor provided the basis for shifting the weight of the score with regard to where the process importance was focused.
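The exact published form of (3) did not survive extraction in this copy, so the following is only an illustrative stand-in: a weighted combination that rewards finishing below the time and force limits, with a scaling factor `w` shifting emphasis between speed and gentleness, as the text describes.

```python
def reward(t, f, t_max, f_max, w=0.5):
    """Hypothetical reward in the spirit of (3): the score rises as
    the assembly time t and the average contact force f fall below
    their allowed maxima t_max and f_max. This is an assumed form,
    not the authors' published equation."""
    if t > t_max or f > f_max:
        return 0.0          # exceeding a limit counts as failure
    return w * (1.0 - t / t_max) + (1.0 - w) * (1.0 - f / f_max)
```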

Robot motions are reduced to a set of primitive behaviors represented as independent, atomic actions that can be chained together to achieve complex actions. The system uses three basic search behaviors: spiral search, linear move, and radial search (Fig. 2). Combinations of these primitive types are sufficient for completing a variety of assembly tasks. Each search strategy was coded as a 20-element gene vector of floating-point numbers; however, at most only 15 were used. The remaining five were reserved for future features.
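The fixed-stride gene encoding above implies a simple controller-side interpretation step for chained, multi-stage sequences, which can be sketched as follows. The 20-element stride and 15 active slots come from the text; the function itself and the field layout are hypothetical.

```python
GENE_LENGTH = 20   # per-strategy vector: 15 active parameters + 5 reserved
ACTIVE = 15

def split_stages(chained, gene_length=GENE_LENGTH):
    """Split a chained multi-stage gene sequence into per-stage gene
    vectors and drop each stage's reserved tail, as the robot
    controller is described as doing. Hypothetical sketch only."""
    assert len(chained) % gene_length == 0, "malformed chained sequence"
    return [chained[i:i + gene_length][:ACTIVE]
            for i in range(0, len(chained), gene_length)]
```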

B. Trial Results

The GA assembly configuration consisted of an external PC linked to the robot controller via a fast Ethernet connection. The PC application performed the GA evolution, and transmitted gene sequence information to the robot, where it was interpreted by the robot controller and translated into primitive motion commands. Each search method had a series of end conditions that defined when the robot had either successfully completed a stage of the assembly or had failed completely. Multiple stages of assembly were represented as chained gene sequences, which were separated and interpreted by the robot controller. The gene sequences were conveyed to an ABB IRB-140 robot with an IRC5 controller. The robot and controller were kept in stock configurations with the ABB Force Control option installed, and an ATI Gamma force sensor affixed to the arm. The PC was used for GA parameter searches and logging, while kinematic control and force interpretation were handled by the IRC5.

Fig. 2. (A) Spiral search. (B) Linear move. (C) Radial search. Both the linear move and radial search support tool rotations, while the radial and spiral searches involve complex trajectories.

Fig. 3. The transmission valve body assembly consisted of the valve plug (left) being inserted into one of three receptor holes in the body base using a spiral search (right).

Initial trials performed an insertion of valve plugs into a transmission body assembly, which consisted of a single stage performing a spiral search for cylindrical insertion (Fig. 3). The GA population consisted of four clans, each with ten assigned children. The system was run for 20 generations, for a total of 4400 assembly attempts (parent and child gene sequences were run five times and the results averaged), with the applied force, search spiral radius, search speed, and the number of turns per spiral being evolved. The results of training (Fig. 4) show that two of the clans produced highly competitive results: one with s average assembly time, while the other performed dependably better with an average assembly time of s. In contrast, a week of manually tuning the same parameters had yielded an average assembly time of about 2.5 s.

With a properly constructed GA, viable assembly solutions can be revealed quickly and without necessitating detailed knowledge of the task space outside of assembly strategies. This is consistent with previously reported work, indicating that GA-based learning is an effective approach for improving the performance of assembly tasks.


Fig. 4. After 20 generations of the transmission valve body assembly, the time to complete the assembly saw marked downward trends.

IV. MODEL ASSISTANCE

While training by means of a stochastic search could be considered the baseline of learning, it has the limitation of being myopic. Genetic parental succession learns nothing from poor or even decent solutions. Attempts to minimize the waste of discarded gene sequences focus on learning rates [44], population sizes, mutation rates, rates of child succession, and competition metrics [42], but do not address the source of the waste. As GAs explore the parameter space, information is lost because there is no memory of any of the sequences that were not one of the top performers.

This lack of memory is problematic not only because useful information may be lost, but also because there is little to prevent those parameter values from being revisited. Gene evaluation has a nonzero cost associated with it. At best, one could hope for revisited gene sequences that were merely “okay” performers. At worst, the revisited genes may never complete the task, and the cost of wasted time could be prohibitively excessive.

Each gene sequence must be evaluated before it can be classified. This may be avoided, however, by predicting the performances of gene sequences. Using a black-box method of evaluating child genes, a GA can prune a population to a small handful of potential candidates for succession. Only these subsets of candidates are actually evaluated; thus, some level of trust must be placed on the black box.
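The black-box pruning step can be sketched as follows: rank all candidate gene sequences by a model's predicted score and pass only the projected best performers on to physical evaluation. The names here (`predict`, `n_keep`) are illustrative assumptions, not identifiers from the paper; `predict` is any callable mapping a gene vector to a scalar predicted reward.

```python
import numpy as np

def filter_candidates(children, predict, n_keep):
    """Model-assisted pruning: score every candidate with the model,
    then keep only the n_keep highest-predicted sequences for actual
    evaluation on the robot. Sketch of the concept described in the
    text, not the authors' implementation."""
    scores = np.array([predict(c) for c in children])
    best = np.argsort(scores)[::-1][:n_keep]   # highest predicted first
    return [children[i] for i in best]
```

The quality of the model therefore directly bounds the quality of the search: a poor predictor can discard the very sequences the GA needed to evaluate.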

These “models” shape the performance of a GA. Good-quality models stand to improve the optimization process, while bad models could hinder it. Simulators of task spaces are useful, but expensive to develop and validate. In contrast, dynamically generated models are initially unreliable, but may become more robust against operational changes over time. Either method stands to gain from system feedback. Knowing what did and did not work, the system may shape its own understanding of the relationships between input parameters and their respective output performances.

The earlier results demonstrated the autonomous system's capacity to self-optimize using a simple metric of success. The test case presented here improves on those performances by augmenting the GA with models that map the parameter-space inputs to their respective performance-space outputs. Such models aid convergence toward optimization, resulting in better solutions earlier in the process of GA training.

Convergence acceleration is positive motion toward some optimal solution faster (measured either chronologically or as a tally of unique trials) than it would take to achieve the same performance with the unassisted GA. Assuming that the models effectively predict the performance of the genes produced by the GA, and assuming that each trial has a constant evaluation cost $c$, then by evaluating only the $m$ projected best-performing sequences of the $n$ total genes otherwise necessary to achieve a given performance, an average learning cost savings of $c(n - m)$ can be expected.

Fig. 5. An aluminum pentagonal puzzle insert (left) was inserted by first engaging the circular lip (middle) and then rotating it to match the pentagon profile (right).

To gauge the efficacy of internal modeling on physical assemblies, trials consisted of an aluminum pentagonal puzzle (Fig. 5). The assembly consisted of a spiral search (stage 1) to engage the circular lip of the puzzle piece, followed by a rotational search (stage 2) that rotated the puzzle piece to engage the profile with the housing body. A standard feed-forward ANN trained via back-propagation was evaluated as the model-building paradigm due to familiarity, though any number of learning algorithms could be used instead [45].

Stage 1 was trained independently and its optimized values locked in place; only the evaluations of stage 2 are described here. Because the puzzle housing was fixtured, uncertainty was simulated by adding a bounded random rotational offset about the $z$ axis and bounded random lateral offsets on both the $x$ and $y$ axes to the tool starting location of each assembly attempt. An additional search (stage 0) was added to allow the robot to attempt to compensate for these positional and rotational offsets.

The ANN took as inputs the search parameters of all three stages (60 inputs). Twenty hidden-layer neurons and a single output neuron were used. All search parameters for the 60-dimensional network input remained constant with the exceptions of the stage 0 rotational offset and the stage 2 rotational search range, search speed, and hopping amplitude and frequency. The puzzle does not require excessive force to be assembled, so the applied downward force was fixed at 5 N, and the force term in (3) was eliminated by setting its weighting coefficient to zero.
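The forward pass of a network with the 60-input, 20-hidden-neuron, single-output topology described above can be sketched in plain Python. The tanh hidden activation, the linear output, and all names here are assumptions for illustration; the paper specifies only the layer sizes and back-propagation training.

```python
import math
import random

random.seed(0)

# Hypothetical randomly initialized weights for a 60-20-1 feed-forward net.
W1 = [[random.gauss(0.0, 0.1) for _ in range(60)] for _ in range(20)]
B1 = [0.0] * 20
W2 = [random.gauss(0.0, 0.1) for _ in range(20)]
B2 = 0.0

def predict(params):
    """Map a 60-element search-parameter vector to a single predicted
    performance value (e.g., a normalized assembly time)."""
    hidden = [math.tanh(sum(w * p for w, p in zip(row, params)) + b)
              for row, b in zip(W1, B1)]          # 20 hidden neurons
    return sum(w * h for w, h in zip(W2, hidden)) + B2  # 1 output neuron
```

In the experiments, only five of the 60 inputs vary, so the learned mapping collapses to a low-dimensional surface over those free parameters.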

Plotted in Fig. 6 are the performances of the GA evaluation with and without the assistive model for stage 2. The first three generations of tuning were evaluated without the internal model and were utilized as a common starting point for all further tests. Data from those generations were used as training sets. Later generations were run either with or without assistance. The unassisted GA produced ten child sequences for evaluation, while the model-assisted GA generated 1000 child gene sequences, of which only the top ten were evaluated. The individual assisted evaluations performed as well as or better than the average performance of the unassisted GA, and the average assisted performance showed a speedup over that of the unassisted GA (Table I).


840 IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 8, NO. 4, OCTOBER 2011

Fig. 6. The raw (light lines) and expected (dark lines) results for a high-dimensional physical assembly with (dashed) and without (solid) the benefit of internal modeling are shown. The expected performances for both are monotonically decreasing, though the actual trial-by-trial performances vary.

TABLE I
ASSISTED VERSUS UNASSISTED AVERAGE GA PERFORMANCE

V. PROPERTIES OF QUALITY MODELS

It has been shown that stochastic learning methods can be improved by using assistive models as filter methods. Until now, full faith has been put in the models developed, without allowing for the possibility of model error. When in doubt of the model, the only viable option available is to revert once again to pure stochastic search. But how does one assess whether or not the model is performing as expected, or decide how many children should be entrusted to the assistive model?

Metrics such as RMS error offer a possible solution for model-quality analysis. Such metrics, however, assess only how well the model fits what has been seen, and do little to judge the model's predictive capacity. Another option is to compare the top performers suggested by the model with a purely random selection of child sequences. This necessitates either automatically limiting the number of allocated child slots for the model or evaluating extraneous gene sequences, resulting in potentially wasteful tests.

Though the models stand to greatly benefit the system, the questions of when and by how much they actually assist the learning process remain. Identifying models that extract useful information requires a means of assessing a model's quality. One expectation is that a high-quality model can benefit a process by providing a reliable medium for offline optimization and simulation. A low-quality model, however, may actually be worse than no model at all, since it may present inaccurate or contrary information. Separating the two is problematic without a third, more accurate model for comparison. So, what is the quality of a model, and by what metric does one compare multiple models?

The latter is addressable by comparing the results of evaluating parameter sequences in the different models and on the physical system. The model that performs closer to the empirical results is naturally the better choice. However, such an approach is naïve, and does little to quantify how well a model captures the parameter-performance mapping, or even whether a given system is capable of being learned at all. There is little a priori indication of which models could assist in improving the performance of the search.

A. The Derivation of a Metric for Comparison

The most effective aspect available for comparison is the set of outputs produced by the model for a training set. When plotted, these values create a multidimensional surface. It is posited that the output surfaces of a good-quality model will possess the following traits: 1) they capture the empirical evidence; 2) they are not horizontal planes; and 3) they express low spatial frequency. Additional properties clearly exist, but currently only these three aspects are evaluated.

Capturing the empirical evidence is mandatory, since any model that cannot explain what has already been seen is of little use as a performance predictor. The quality of the fit to the data is defined by the RMS error, $\sigma_{\mathrm{fit}}$, between the predicted output, $\hat{y}_i$, and the actual average performance output, $\bar{y}_i$, computed for each of the $N$ previously evaluated samples as

$$\sigma_{\mathrm{fit}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - \bar{y}_i\right)^{2}}\qquad(4)$$

The RMS value computed in (4) can be thought of as the model fit standard deviation. The scatter of the repeated trial results is defined in terms of the empirical data by

$$\bar{\sigma} = \frac{1}{N}\sum_{i=1}^{N}\sigma_i\qquad(5)$$

The $\sigma_i$ values are computed as the standard deviations of the repeated trials run for each of the $N$ input parameter sequences.

The metric of model comparison can be used as a scoring function, with the score of the "better" model being higher than that of the "worse" model. To force the fitness metric to conform to a unitless scalar value in the range [0, 1], with 1 being perfect fitness, the surface fitness, $f$, is defined as a sigmoid computed from the ratio of the model-fit standard deviation to the trial-scatter standard deviation

$$f = \frac{2}{1 + e^{\sigma_{\mathrm{fit}}/\bar{\sigma}}}\qquad(6)$$

When plotted, the empirical data can be illustrated as a surface with thickness $\bar{\sigma}$, as shown in Fig. 7. The fit to the data is how well the model adheres to the center of this surface.

For the horizontal-plane disqualification, "horizontal" is defined as the absence of adjustments of values in the parameter space that result in a discernible performance difference in the evaluative process. If adjusting parameters does not result in changes in performance, there is little benefit to consulting the model; doing so would yield performance on par with an unassisted random search, meaning the resources expended to develop and reference the model were wasted. This does not preclude the possibility of flat planes in general, as transitions in any direction that result in improved performances are demonstrably optimizable. However, the range of outputs of the model is irrelevant for this value, since only whether or not the output surface plot is horizontal was tested.

Fig. 7. Illustration of the model fit to the 1D empirical data with reference to the natural variance $\bar{\sigma}$.

This requires the potential improvement, $I$, between the observed maximum and minimum projected outputs of the model ($\hat{y}_{\max}$ and $\hat{y}_{\min}$) to be significant in relation to the natural variations exhibited between trials, defined as

$$I = \hat{y}_{\max} - \hat{y}_{\min}\qquad(7)$$

If the projected performance improvement between the best and worst possible parameter sequences is dwarfed by the expected level of noise, there is arguably little to be gained by modeling the system.

To express this potential benefit relative to the natural variance between trials given identical parameter sequences, and to bound the value to the range [0, 1], the horizontal metric, $h$, is thus computed by (8)

$$h = \frac{2}{1 + e^{-I/\bar{\sigma}}} - 1\qquad(8)$$

The low spatial frequency constraint follows from the observation that gradient descent performs best when the surface function produces a smooth transition from a given coordinate to neighboring points over large areas. Surface plots with high spatial frequency provide little data about short performance trends. Optimization is thus left to random chance. Gradient trends over large search spaces may never be discovered due to local optima that distract shortsighted search algorithms. Although high-dimensional mesas (a range of parameters for which highly optimal performances are guaranteed, but around which successful assembly may be impossible) are learnable, they are rarely discovered by trend searches.

The smoothness error, $E_s$, of the $d$-dimensional model surface,

$$E_s = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(\hat{y}_j - \tilde{y}_j\right)^{2}}\qquad(9)$$

computes the summed output error between a multidimensional low-pass (running-average) surface filter, $\tilde{y}_j$, and the projected plotted outputs, $\hat{y}_j$, across the model surface. In short, it computes the running-average point difference for all $M$ model outputs. This value is distinct from $\sigma_{\mathrm{fit}}$ in that only the model surface is investigated while the trial results are ignored. While the number of positional samples is accounted for in the equation by the value of $M$, the step size between neighboring points along the surface is ultimately user-defined.

Fig. 8. One-dimensional plot showing the difference between the actual and running-average surfaces for high spatial frequency.

Fig. 8 shows a simulated 1D surface model generated with high spatial frequency. Here, the low-pass-filtered surface model does not fit the data well, and this is reflected in a large smoothness error $E_s$.

The frequency score value, $s$, computed from the ratio of the natural variance to the smoothness error, is defined as

$$s = \frac{2}{1 + e^{-\bar{\sigma}/E_s}} - 1\qquad(10)$$

The three qualities of the model output surface are all interrelated. A model cannot express high spatial frequency on its surface and also be a horizontal plane, for example. Similarly, if the model accurately captures all of the empirical data and is simultaneously a horizontal plane, no amount of modeling will benefit the optimization process. The quality metric, $Q$, is thus the product of the three scores

$$Q = f \cdot h \cdot s\qquad(11)$$
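The three scores and their product can be sketched as below. This is a sketch under stated assumptions: the text fixes only that each score lies in [0, 1] and depends on the ratios described above, so the particular sigmoid expressions used here are illustrative choices, not necessarily the paper's exact forms.

```python
import math

def surface_fitness(sigma_fit, sigma_bar):
    """Near 1 when the model-fit RMS error is small relative to the
    natural trial scatter sigma_bar (assumed sigmoid form)."""
    return 2.0 / (1.0 + math.exp(sigma_fit / sigma_bar))

def horizontal_score(improvement, sigma_bar):
    """Near 0 when the model's output range is dwarfed by trial noise,
    approaching 1 as the potential improvement grows (assumed form)."""
    return 2.0 / (1.0 + math.exp(-improvement / sigma_bar)) - 1.0

def frequency_score(smoothness_err, sigma_bar):
    """Near 1 for smooth, low-spatial-frequency surfaces (assumed form)."""
    if smoothness_err == 0.0:
        return 1.0  # a perfectly smooth surface scores perfectly
    return 2.0 / (1.0 + math.exp(-sigma_bar / smoothness_err)) - 1.0

def quality(sigma_fit, improvement, smoothness_err, sigma_bar):
    """Product of the three scores: the numerical 'and' described above."""
    return (surface_fitness(sigma_fit, sigma_bar)
            * horizontal_score(improvement, sigma_bar)
            * frequency_score(smoothness_err, sigma_bar))
```

The multiplicative combination means any single near-zero score drives the overall quality toward zero, which a linear sum of the three terms would not do.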

The nonlinearity of (11) is necessary for the quality metric. Though a linear sum would also peak as the three values approach 1.0, it would not permit any one term to declare the model to be of low quality over the other two. Thus, the numerical equivalent of the logical "and" is required.

The quality metric gives not only a means of qualifying a given model, but also a comparative means of assessing which models are more likely to produce better performance predictions. For any two given models, the one with the higher quality value is considered the better of the two.

B. Experimental Results

In order to validate the quality hypothesis, numerous physical trial configurations were set up to test the projected benefits of utilizing internal models for assembly. Here, those assemblies, their respective quality measurements, and their statistical analyses are described.

1) Case Study: Pentagonal Puzzle: Recall the results of the two-stage pentagonal puzzle assembly. It is hypothesized that the hopping parameters served little useful function in the assembly effort. The final rotational offset from the previous experiments was thus used and locked in place, and the unassisted and assisted trials were rerun such that only the rotational range and rotational search speed were searchable. As before, the assistive model was a feed-forward ANN trained by back-propagation, though the number of hidden-layer neurons was reduced to ten. With so few hidden-layer neurons, the surface plots would be inherently smooth, making the fitness and horizontal terms the most influential in determining the quality value.



Fig. 9. The average assembly performance results of the unassisted (solid) and assisted (dashed) GA for stage 2 of the puzzle assembly.

Fig. 10. Stage 2 puzzle assembly network outputs. The points are the normalized training samples, and the height-mapped surface represents the output of the network for normalized inputs in the range of [-1.0, 1.0].

Test Results: The first three generations of training consisted of the unassisted GA performing training to provide a common starting point for future generations of both the assisted and unassisted GAs. During those generations, the gene sequences and their performances were logged to train the assistive model before it was activated in generation 4. Five instances of training were performed over 12 generations for both the assisted and unassisted models, with each generation having ten child evaluations (10 of 10 children generated for the unassisted model, and 10 of 1000 for the assisted model).

The results of these tests closely followed those seen in Fig. 6, and are plotted here in Fig. 9. For both the unassisted and assisted GA, the expected results are monotonically decreasing as the number of trials increases. By the end of the training, the expected time to complete the stage 2 assembly for the assisted GA is 1.59 s, while the unassisted GA's expected completion time is 2.27 s, demonstrating the benefit of using the assistive model with the GA for this assembly.

Model Analysis: To assess the quality of a model, the logged gene and result vectors were passed into a feed-forward ANN constructed, trained, and evaluated identically to the one used for GA assistance. Because only two of the 60 total gene inputs were able to be mutated by the GA, the resulting mapping can be visualized as a 3D surface (Fig. 10). The spread of the data on each axis was used to normalize the plotted inputs. The vertical axis represents the normalized assembly times for the input pairs.

After the network was trained, the evaluation of quality was performed on the resulting model surface. The output values were scaled based on the variance measurement such that all elements were given in seconds. For the computation of the spatial frequency value, the filter parameter was set to 1, and surface samples of the model were taken at 0.1-unit intervals along the two input axes. The resulting scores are shown in Table II. By the quality metric, the model for the stage 2 puzzle assembly was of high quality, exhibiting close adherence to the empirical data and low surface frequency, and demonstrating that changes in the input parameters result in nonzero changes in the output.

TABLE II
PUZZLE MODEL SURFACE PROPERTIES

Fig. 11. The sun gear (left) was inserted into the gear base (as seen from above), where it meshed with the three freely spinning primary planet gears.

Fig. 12. The expected performance of the sun gear assembly demonstrates little difference between the unassisted and assisted GA.

2) Case Study: Sun Gear: The pentagonal puzzle possessed physical attributes that assisted the assembly process (i.e., a circular engagement lip). How does the quality metric stand up to an assembly that has little in the way of assistive features? Does such an assembly have anything that could be learned, or is it reliant on an exhaustive search of the configuration space?

To test this, a sun gear assembly (Fig. 11) was evaluated. It consisted of a 31-tooth central sun gear inserted into a circular gear hub, where it meshed with three freely spinning planet gears. Neither the sun gear insert nor any of the planetary gears had a chamfer, and all contact surfaces were level and coplanar. Thus, there was no physical "hint" either to guide the insert into place or to help the teeth align.

The assembly consisted of a single rotational search. The parameters being optimized were the $z$-axis rotational search speed and range, and the circular search radius and speed. The results of training are illustrated in Fig. 12, and the performance of the assisted and unassisted implementations is detailed in Table III.

No discernible performance improvement was gained by either training implementation. The assembly time averages were neither monotonically decreasing, nor did they demonstrate any observable downward trend. Given the large standard deviations, it is not clear whether the assisted and unassisted implementation performances were distinct. The low quality score was a result of both a disagreement between the model surface and the empirical test results, and a high level of natural variance in the data. The quality of the model was also low, which supports the empirical observations.

Fig. 13. Model surface plots for the: (A) speed versus radius, (B) speed versus turns, and (C) radius versus turns.

TABLE III
SUN GEAR MODEL SURFACE PROPERTIES

3) Case Study: Transmission Valve Body: Until now, all quality assessments have been performed post hoc, with the knowledge already in hand regarding whether or not a particular search would benefit from having an assistive model. Returning to the transmission valve body, three different parameter pairs were tested prior to GA optimization to provide a prediction of their effectiveness in aiding the learning rate. Though the downward applied force was kept constant at 5 N, the three defining parameters for the spiral search were divided into pairs for analysis: 1) the spiral search speed (speed) versus the spiral radius (radius); 2) speed versus the number of turns per spiral (turns); and 3) radius versus turns. For a given pair, the unevaluated parameter was locked at a pretuned value. Each of the three model environments was passed through the same model-building process.

Initial data for the ANNs were gathered by generating 200 random parameter samples for each parameter pairing, and then evaluating those sequences on the robot. Nontrivial noise in the form of position uncertainty was simulated for every evaluation by adding a bounded random lateral offset on both the $x$ and $y$ axes. No position offset was learned, however, in order to keep the number of search stages constant at one. In each model case, the ANN topology consisted of 20 input-layer nodes, 10 hidden-layer nodes, and a single output-layer node.

The ANNs were then trained for 5000 epochs. The surface plots generated by the networks are shown in Fig. 13, and the surface properties are listed in Table IV. The inputs and outputs were normalized to the range [-1.0, 1.0] in each plot. Instead of using the assembly time for the model surface height, the plots use the normalized scores. For the model quality analysis, however, these values were converted back to seconds for comparison with the model's natural variance.

TABLE IV
VALVE BODY MODEL SURFACE PROPERTIES

Fig. 14. Performance results of the pure stochastic search and model-assisted search for the transmission valve body assembly. Of the three trials, only the one testing the search speed and spiral radius (top) demonstrated performance improvement over time.

Based on the results of applying the quality metric to the three models, one would suspect that the speed-versus-radius model would outperform the other two. One might even question whether the speed-versus-turns and radius-versus-turns training sets would produce performance improvements at all.

Both the assisted and unassisted GA optimizations were tested. The evaluations of these tests are shown in Fig. 14. Unlike previous trials, the trained ANNs from the surface analysis tests were loaded for their respective parameter-pair tests, bypassing the practice of training for three generations prior to activating the assistive models. All evaluations started with the same initial search conditions. The assisted and unassisted implementation trials were each run five times for eight generations of training for each parameter pair.

As predicted by the quality metrics, use of the speed-versus-radius model demonstrated improvement over the unassisted search. In contrast, the speed-versus-turns and radius-versus-turns models exhibited highly variable performance, without any real indication that their application to stochastic searches performed better than the respective unassisted GA implementations. Of additional interest is the fact that even the unassisted instantiations of the latter test conditions failed to exhibit performance improvements over time.

VI. CONCLUDING REMARKS

We have presented an architecture by which a robotic system can automatically explore a complex parameter space, extract useful information from it, and then self-evaluate to determine the quality of its understanding of the world in which it operates. Our method provides a system for the autonomous optimization of parameters with minimal a priori information. Through the generation of internal models, the system can build a multidimensional representation of its task space, and then evaluate the quality of those models based on their surface properties.

The system can readily be expanded to allow for the simultaneous generation of multiple models, and can be used to automatically identify the parameter combinations likely to exert the most influence on task performance. Additional development of the quality metrics is similarly likely to help process engineers determine which model strategies to employ for a specific task configuration.

REFERENCES

[1] G. Zhang et al., “On-pendant robotic assembly parameter optimization,” in Proc. 7th World Congr. Intell. Control Autom., 2008, pp. 547–552.

[2] D. A. Simon, L. E. Weiss, and A. C. Sanderson, “Self-tuning of robot program primitives,” in Proc. IEEE Int. Conf. Robot. Autom., 1990, vol. 1, pp. 708–713.

[3] J. Deiterding and D. Henrich, “Automatic adaptation of sensor-based robots,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2007, pp. 1828–1835.

[4] A. Y. Ng and M. Jordan, “PEGASUS: A policy search method for large MDPs and POMDPs,” in Proc. 16th Conf. Uncertainty Artif. Intell., 2000, pp. 406–415.

[5] A. Y. Ng, H. J. Kim, M. I. Jordan, and S. Sastry, “Autonomous helicopter flight via reinforcement learning,” Adv. Neural Inf. Process. Syst., no. 16, pp. 799–806, 2004.

[6] S. Vijayakumar and S. Schaal, “Fast and efficient incremental learning for high-dimensional movement systems,” in Proc. IEEE Int. Conf. Robot. Autom., 2000, pp. 1894–1899.

[7] S. Schaal and C. G. Atkeson, “Robot juggling: Implementation of memory-based learning,” IEEE Control Syst. Mag., vol. 14, no. 1, pp. 57–71, 1994.

[8] S. Schaal, C. G. Atkeson, and S. Vijayakumar, “Real-time robot learning with locally weighted statistical learning,” in Proc. IEEE Int. Conf. Robot. Autom., 2000, pp. 288–293.

[9] S. Vijayakumar et al., “Statistical learning for humanoid robots,” Auton. Robot., vol. 12, no. 1, pp. 55–69, 2002.

[10] S. Vijayakumar, A. D’Souza, and S. Schaal, “Incremental online learning in high dimensions,” Neural Comput., vol. 17, no. 12, pp. 2602–2634, 2005.

[11] G. Josin, D. Charnay, and D. White, “Robot control using neural networks,” in Proc. IEEE Int. Conf. Neural Networks, 1988, pp. 625–631.

[12] F. L. Lewis, “Neural network control of robot manipulators,” IEEE Expert, vol. 11, no. 3, pp. 64–75, 1996.

[13] D. A. Pomerleau, “Knowledge-based training of artificial neural networks for autonomous robot driving,” Robot Learn., pp. 19–43, 1993.

[14] W. S. Newman, Y. Zhao, and Y.-H. Pao, “Interpretation of force and moment signals for compliant peg-in-hole assembly,” in Proc. IEEE Int. Conf. Robot. Autom., 2001, pp. 571–576.

[15] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[16] F. Bonneville, C. Perrard, and J. M. Henrioud, “A genetic algorithm to generate and evaluate assembly plans,” in Proc. INRIA/IEEE Symp. Emerging Technol. Factory Autom., 1995, vol. 2, pp. 231–239.

[17] J. Bautista et al., “Application of genetic algorithms to assembly sequence planning with limited resources,” in Proc. IEEE Int. Symp. Assembly Task Planning, 1999, pp. 411–416.

[18] P. De Lit, P. Latinne, B. Rekiek, and A. Delchambre, “Assembly planning with an ordering genetic algorithm,” Int. J. Prod. Res., vol. 39, no. 16, pp. 3623–3640, 2001.

[19] L. M. Galantucci, G. Percoco, and R. Spina, “Assembly and disassembly planning by using fuzzy logic and genetic algorithms,” Int. J. Adv. Robot. Syst., vol. 1, no. 2, pp. 67–74, 2004.

[20] C. Lu, Y. S. Wong, and J. Y. H. Fuh, “An enhanced assembly planning approach using a multi-objective genetic algorithm,” Proc. Inst. Mech. Eng., Part B: J. Eng. Manuf., vol. 220, no. 2, pp. 255–272, 2006.

[21] J. Wei, “Intelligent robotic learning using guided evolutionary simulated annealing,” M.S. thesis, Dept. Electr. Eng. Comput. Sci., Case Western Reserve Univ., Cleveland, OH, 2001.

[22] C. G. Atkeson and S. Schaal, “Learning tasks from a single demonstration,” in Proc. IEEE Int. Conf. Robot. Autom., 1997, pp. 1706–1712.

[23] A. Billard and S. Schaal, “Robust learning of arm trajectories through human demonstration,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2001, pp. 734–739.

[24] A. Billard, Y. Epars, G. Cheng, and S. Schaal, “Discovering imitation strategies through categorization of multi-dimensional data,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2003, pp. 2398–2403.

[25] W. S. Newman, C. Birkhimer, and R. Hebbar, “Towards automatic transfer of human skills for robotic assembly,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2003, pp. 2528–2533.

[26] T. I. Ören, “Artificial intelligence in simulation,” Ann. Oper. Res., vol. 53, no. 1, pp. 287–319, 1994.

[27] D. Deneux, “Introduction to assembly features: An illustrated synthesis methodology,” J. Intell. Manuf., vol. 10, pp. 29–39, 1999.

[28] A. Gayretli and H. S. Abdalla, “A feature-based prototype system for the evaluation and optimization of manufacturing processes,” in Proc. 24th Int. Conf. Comput. Ind. Eng., 1999, vol. 37, pp. 481–484.

[29] W. Van Holland and W. F. Bronsvoort, “Assembly features in modeling and planning,” Robot. Comput.-Integr. Manuf., vol. 16, pp. 277–294, 2000.

[30] M. Hamidullah, E. L. J. Bohez, and M. A. Irfan, “Assembly features: Definition, classification, and instantiation,” in Proc. IEEE Int. Conf. Emerging Technol., 2006, pp. 617–623.

[31] A. C. Sanderson, “Parts entropy methods for robotic assembly system design,” in Proc. IEEE Int. Conf. Robot. Autom., 1984, pp. 600–608.

[32] S. R. Chhatpar and M. S. Branicky, “Localization for robotic assemblies with position uncertainty,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2003, pp. 2534–2540.

[33] N. Yamanobe et al., “Optimization of damping control parameters for cycle time reduction in clutch assembly,” in Proc. IEEE/RSJ Int. Conf. Intell. Robot. Syst., 2005, pp. 3251–3256.

[34] X. Yuan and S. X. Yang, “Virtual assembly with biologically inspired intelligence,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 33, no. 2, pp. 159–167, 2003.

[35] D. C. Lay, Linear Algebra and Its Applications, 2nd ed. Reading, MA: Addison-Wesley, 1999.

[36] A. M. Bloch, “A completely integrable Hamiltonian system associated with line fitting in complex vector spaces,” Bull. (New Series) Amer. Math. Soc., vol. 12, no. 2, pp. 250–252, 1985.

[37] S. Ross, B. Chaib-draa, and J. Pineau, “Bayesian reinforcement learning in continuous POMDPs with application to robot navigation,” in Proc. IEEE Int. Conf. Robot. Autom., 2008, pp. 2845–2851.

[38] T. Hester, M. Quinlan, and P. Stone, “Generalized model learning for reinforcement learning on a humanoid robot,” in Proc. IEEE Int. Conf. Robot. Autom., 2010, pp. 2369–2374.

[39] P. Abbeel, A. Coates, T. Hunter, and A. Y. Ng, “Autonomous autorotation of an RC helicopter,” Exp. Robot., vol. 54, pp. 385–394, 2009.



[40] M. E. Taylor, G. Kuhlmann, and P. Stone, “Autonomous transfer for reinforcement learning,” in Proc. 7th Int. Joint Conf. Auton. Agents Multiagent Syst., 2008, vol. 1, pp. 283–290.

[41] J. Wei and W. S. Newman, “Improving robotic assembly performance through autonomous exploration,” in Proc. IEEE Int. Conf. Robot. Autom., 2002, pp. 3303–3308.

[42] P. P. C. Yip and Y.-H. Pao, “A guided evolutionary computation technique as function optimizer,” in Proc. IEEE World Congr. Comput. Intell., 1994, vol. 2, pp. 628–633.

[43] J. Marvel et al., “Automated learning for parameter optimization of robotic assembly tasks utilizing genetic algorithms,” in Proc. IEEE Int. Conf. Robot. Biomimetics, 2008, pp. 179–184.

[44] O. Gheorghies, H. Luchian, and A. Gheorghies, “A study of adaptation and random search in genetic algorithms,” in Proc. IEEE Congr. Evol. Comput., 2006, pp. 2103–2110.

[45] J. Marvel and W. Newman, “Accelerating robotic assembly parameter optimization through the generation of internal models,” in Proc. IEEE Int. Conf. Technol. Pract. Robot. Appl., 2009, pp. 42–47.

Jeremy A. Marvel (S’06–M’10) received the B.A. degree in computer science from Boston University, Boston, MA, in 2003, the M.A. degree in computer science from Brandeis University, Waltham, MA, in 2005, and the Ph.D. degree in computer engineering, specializing in robotics, from Case Western Reserve University, Cleveland, OH, in 2010.

He is a Research Associate with the Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, and works with the Intelligent Systems Division of the National Institute of Standards and Technology, Gaithersburg, MD, as a Guest Researcher. His research focuses on mechatronics, intelligent robotics, and adaptive systems.

Wyatt S. Newman (M’87–SM’01) received the S.B. degree in engineering science from Harvard College, Cambridge, MA, the S.M. degree in mechanical engineering, in thermal and fluid sciences, from the Massachusetts Institute of Technology (MIT), Cambridge, the M.S.E.E. degree in control theory and network theory from Columbia University, New York, and the Ph.D. degree in mechanical engineering, in design and control, from MIT.

He is a Professor in the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH. His research is in the areas of mechatronics, robotics, and computational intelligence, in which he holds 10 patents and has over 100 technical publications. He spent eight years in industrial research at Philips Laboratories, Briarcliff Manor, NY, engaged in electromechanical design and control. He joined Case Western Reserve University in 1988, and in 1992 he was named an NSF Young Investigator in Robotics. He also holds adjunct appointments at the Cleveland Clinic and the Cleveland V.A. Medical Center. Additional professional appointments and experience include Visiting Scientist at the Philips Natuurkundig Laboratorium, Eindhoven, The Netherlands; visiting faculty at the Sandia National Laboratories Intelligent Systems and Robotics Center, Albuquerque, NM; NASA Summer Faculty Fellow at the NASA Glenn Research Center; and Visiting Fellow at Princeton University.