Model-Assisted Stochastic Learning for Robotic Applications



<ul><li><p>IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, VOL. 8, NO. 4, OCTOBER 2011 835</p><p>Model-Assisted Stochastic Learning for Robotic Applications</p><p>Jeremy A. Marvel, Member, IEEE, and Wyatt S. Newman, Senior Member, IEEE</p><p>Abstract: We present here a framework for the generation, application, and assessment of assistive models for the purpose of aiding automated robotic parameter optimization methods. Our approach represents an expansion of traditional machine learning implementations by employing models to predict the performances of input parameter sequences and then filter a potential population of inputs prior to evaluation on a physical system. We further provide a basis for numerically qualifying these models to determine whether or not they are of sufficient quality to be capable of fulfilling their predictive responsibilities. We demonstrate the effectiveness of this approach using an industrial robotic testbed on a variety of mechanical assemblies, each requiring a different strategy for completion.</p><p>Note to Practitioners: This paper was motivated by the problem of online parameter optimization for robotic assembly applications in which the mechanical joining of components must be completed both quickly and gently. Traditional approaches for such optimizations require offline modeling or user-specified tests involving trial-and-error or design of experiments due to the inherent risks of damaging the robots, tools, or components. We propose a method of automating and enhancing the optimization process by applying and hybridizing machine learning practices. Through the utilization of both unsupervised learning methods and dynamic numerical model building and analysis, a robotic system can tune its operational parameters and effectively prune a potentially infinite parameter space of inefficient or dangerous values. 
These models can be algorithmically analyzed in order to monitor their quality, and predict whether or not they stand to benefit the optimization process.</p><p>Index Terms: Intelligent robots, machine learning, model-based learning, parameter optimization.</p><p>I. INTRODUCTION</p><p>THE UTILIZATION of robotic manipulators in tasks requiring the physical contact of rigid parts has been limited due to difficulties in algorithmically describing the processes and restrictions in control paradigms. The automation of robotic tasks is not trivial, even when considering the powerful tools of compliant motion control and machine vision. The force control parameters necessary for the gentle acquisition of parts and the safe handling of collisions of rigid components require extensive manual tuning followed by lengthy trial-and-error testing whenever changes are introduced.</p><p>Manuscript received August 04, 2010; revised February 02, 2011; accepted March 21, 2011. Date of publication July 05, 2011; date of current version October 05, 2011. This paper was recommended for publication by Associate Editor L. Moench and Editor K. Goldberg upon evaluation of the reviewers' comments.</p><p>J. A. Marvel is with the University of Maryland, Institute for Research in Electronics and Applied Physics, College Park, MD 20742-3511 USA, and also with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106-7071 USA (e-mail:;</p><p>W. S. Newman is with the Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106-7071 USA (e-mail:</p><p>Color versions of one or more of the figures in this paper are available online at</p><p>Digital Object Identifier 10.1109/TASE.2011.2159708</p><p>Programming a robot to utilize force control for specific tasks such as the mechanical assembly of components requires an expert programmer to tune the program and reactive parameters. Even well-tuned programs, however, may be insufficiently robust against minor configuration variations, adversely affecting reliability, throughput, and product quality. Adaptability and self-monitoring become necessities in automated systems to maintain process quality.</p><p>In this paper, we present a framework for self-guided, self-diagnosing optimization. Section II provides a brief history of learning in robotic applications. Section III establishes a baseline for learning using unguided stochastic search. In Section IV, we propose a methodology for providing model-assisted guidance to stochastic searches, and in Section V, we propose a novel model assessment metric for self-analysis.</p><p>II. ROBOTIC LEARNING</p><p>There is no lack of interest in the automated tuning of parameters for process optimization. However, most efforts within the domain of mechanical assembly have focused on assembly sequencing and workspace configuration. Other domains within the field of robotics seek the optimization of path and trajectory generation, which are largely independent of the actual tasks being accomplished by the robots.</p><p>A. Task Parameter Tuning</p><p>Much of what passes for modern tuning methods is expected to be executed by adept hands. 
Experts employ statistical approaches for tuning processes in order to improve performance. Manual tuning methods based on Designs of Experiments (DOEs) are demonstrably faster than exhaustive parameter searches, but require the use of expensive personnel to construct, implement, and test the solutions. Automated tools have been developed [1] to ease the labor burden, but human operators must still identify the parameters that are most likely to drive performance.</p><p>A self-tuning model was developed by Simon et al. [2] in which optimization takes place on motion control primitives at the program level. By applying a least squares fit to the data and associating cost functions, the system ran a gradient descent approach to parameter optimization. However, the approach focused exclusively on snap-fit insertions, and did not consider the complexities of more advanced assemblies unsolvable by means of gradient descent, particularly those showing variations in performance over time.</p><p>1545-5955/$26.00 © 2011 IEEE</p></li><li><p>A sensor-based robot adaptation approach was proposed by Deiterding and Henrich [3] to address the problems of acquiring complex sensor data and dealing with the necessary execution speed slowdown caused by using such sensors. Using existing optimization strategies, the system should react and adapt to cause-and-effect type changes in the configuration (i.e., the origins and results of changes in performance). However, the approach required human intervention to first identify the process components that can be optimized, which is largely a matter of trial and error.</p><p>Automatically learning to perform a given task through exploration of a problem space is an attractive prospect. Problems are reduced to sets of known operational states, and the actual learning determines when to transition from one state to the next. 
An example of this is PEGASUS [4], which transforms any given partially observable Markov Decision Process (POMDP) into an equivalent one with only deterministic transitions, reducing the policy search problem to a simplified hunt for policies with higher estimated values. This system was applied to reinforcement learning for autonomous helicopter flight [5]. In dangerous problem domains such as autonomous flight, however, such explorations are best accomplished by means of simulations, and the states must first be known and defined.</p><p>Vijayakumar et al. [6] developed the Locally Weighted Projection Regression (LWPR) to handle the dimensionality problem of robot control. LWPR is based on their earlier work of locally weighted regression (LWR) to learn control of basic motor skills and handle devil sticks [7], and was then later revisited [8] for real-time learning on the same problem. Their system was tested for online learning on a nonlinear cross function and control of a seven degree of freedom robot arm for discrete movement and rhythmic tasks [6]. It was also tested on a humanoid robot for the separate tasks of motion reproduction and gaze stabilization [9], and for autonomous airplane control [10]. LWPR has yet to be applied to optimization, and has been limited to inverse kinematics problems.</p><p>Robot training models based on biological principles have also been extensively researched. Artificial Neural Networks (ANNs) have been used in an impressive array of tasks ranging from trajectory control of robot manipulators [11], [12] to autonomous vehicle control [13]. Newman et al. [14] demonstrated that ANNs can also be used successfully to train a robot to perform specific assembly tasks. However, the major drawbacks to these methods are that they are all highly specialized to specific problems, and must be reworked when modifications are introduced to the assembly setup or the robot. 
Additional machine learning methods such as Genetic Algorithms (GAs) [15] have also been explored extensively for assembly sequence and scheduling optimization [16]–[20], but their employment for the actual robotic tasks is relatively limited. What research has been done has shown that GAs can be used not only for parameter optimizations [21], but also for process learning.</p><p>To train robots by observing human operators perform tasks, Atkeson and Schaal [22] developed a method for robot training by filming a single instance of the task, building a model using linear regression of a priori knowledge of the task physics, and then learning to balance a pendulum using cost minimizations.</p><p>Billard and Schaal [23] developed a simulation of a humanoid and trained it with an ANN to learn arm trajectories from observations of human subjects, which was later expanded to extract relevant aspects for imitation in a given task [24]. Newman et al. [25] transferred process knowledge by applying an abstraction method to observed simple assembly tasks performed by human operators. They discovered human search strategies based on the actions of blindfolded test subjects to recreate the limited knowledge available to both the human and the robot observer.</p><p>B. Model-Assisted Robot Training</p><p>The framework proposed in this paper relies on the development, utilization, and analysis of models for parameter optimization, and continues an ongoing trend of research within manufacturing [26]. Most of these simulations and models, however, were used extensively for manufacturing process, component design, and configuration analysis and tuning. The desired application of inline model building, training, and parameter tuning seems to have been overlooked.</p><p>At the forefront of assembly models are the representations known as Assembly Features (AFs) that characterize the aspects of the construction process. 
Deneux [27] introduced AFs to address engineering design and assembly sequence solution embedding. He described them as solutions that use representations of the relationships between two or more groups of parts and their respective part features. Gayretli [28] developed a modularized system based on AFs to aid the evaluation and optimization of manufacturing processes by creating cost estimates and testing process plan practicality. Van Holland et al. [29], seeing that AFs as component relationships were insufficient for assembly sequencing, expanded the definition to be a duality description of component handling and component connections. The result was an object-oriented, integrated tool to juxtapose functional and geometric models. Ullah et al. [30] expanded the model to include assembly intent, and described assemblies numerically through set theory. In each of these cases, the optimizing capabilities of AFs were limited to operator-driven assembly planning and design, and did not address joining components.</p><p>Sanderson [31] described assemblies in terms of numerical representations, enumerated by cumulative entropies. The assembly task optimization was a matter of minimizing the total entropy of the system. The parts entropy method provided a common basis for comparing multiple assembly processes. Though his work focused mostly on the entropy of parts acquisition, orientation, and positioning, this comparative capacity makes it a promising path for describing the performance enhancement potential of dynamically generated models.</p><p>The more direct method of topologically mapping the environment was central to the optimized localization strategy for physical assemblies devised by Chhatpar and Branicky [32]. 
Though these maps were built autonomously by the robot system, they were not built inline with the assembly process. Localization was based on matching observations to a C-space map captured prior to execution either analytically or by means of empirical exploration of the environment.</p></li><li><p>Yamanobe et al. [33] devised a method of optimizing force control parameters for assembly by means of creating specialized software simulations for clutch assemblies. The results demonstrated the promise of simulation, but the simulations were computationally expensive, had to be manually developed, and the assemblies had to be simplified significantly for more reasonable simulation times. Because the parameters for physical assemblies could not be analytically derived, many of the physical attributes had to be empirically computed based on a number of physical trials. Similarly, Yuan and Yang [34] proposed a virtual reality simulation integrated with an ANN for assembly sequence planning and learning from human operators. By means of human-run demonstrations, the ANN model was trained to optimize the part sequence planning and path generations to avoid part collisions. All tests were performed in silico, however, and no actual assemblies were performed.</p><p>As an alternative to explicit simulators, numerical modeling methods that fall under the umbrella of linear regression estimate the outputs of untested input sequences and are adjusted based on empirical data. These models may also be used to describe the strengths and natures of the relationships between the input parameters and the output values. Smaller spreads of data about the regression, for instance, typically indicate that the system outputs are tightly coupled with their respective inputs. 
The most common method of generating linear regressions to fit the data uses the least squares method [35], though some have attempted to approach the problem of fitting data in complex linear spaces with less intuitive solutions such as integrable Hamiltonian systems [36].</p><p>Generalized reinforcement learning (RL) with model assistance, while not explicitly applied to industrial tasks, has been evaluated for various applications. When exact representations of operational functions do not exist, RL methods have employed models to guide exploration of a parameter space. Such methods have been applied to various robotic tasks such as navigation [37], adaptive behavior optimization [38], and control [39...</p></li></ul>
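
The idea of using a least squares regression to predict performance and then filter candidate inputs before they reach the physical system can be illustrated with a minimal sketch. This is not the authors' implementation: the single tuned parameter, the trial data, the function names, and the keep-the-best-half rule are all assumptions made for illustration only.

```python
import random


def fit_least_squares(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def filter_candidates(candidates, model, keep_fraction=0.5):
    """Rank candidates by predicted cost and keep the lowest-cost fraction."""
    intercept, slope = model
    ranked = sorted(candidates, key=lambda x: intercept + slope * x)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]


# Hypothetical trial history: one force-control gain and the measured
# assembly time (seconds) for each physical trial. Values are invented.
observed_gain = [0.1, 0.2, 0.3, 0.4, 0.5]
observed_time = [9.0, 8.1, 6.9, 6.2, 5.0]

model = fit_least_squares(observed_gain, observed_time)

# Stochastic search step: draw random candidates, then let the model
# discard the half it predicts to be slowest before any physical trial.
rng = random.Random(0)
candidates = [rng.uniform(0.0, 1.0) for _ in range(20)]
survivors = filter_candidates(candidates, model)
```

Only the surviving candidates would be evaluated on hardware in such a scheme. A one-dimensional linear model is the simplest possible stand-in; a real parameter space is multidimensional, and the model would need to be checked for quality before being trusted to prune it.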

