JEDLIK LABORATORIES REPORTS

FACULTY OF INFORMATION TECHNOLOGY AND BIONICS AND

THE JEDLIK LABORATORIES


Faculty of Information Technology and Bionics and

The Jedlik Laboratories

JEDLIK LABORATORIES REPORTS

Vol. VII, No. 2

JLR – 2 / 2019

EDITOR:

Péter Szolgay

EDITORIAL BOARD:

Marco Gilli (Torino), György Karmos (Budapest), Maciej Ogorzalek (Krakow),

Sándor Pongor (Budapest), Gábor Prószéky (Budapest), Ronald Tetzlaff (Dresden)


RESEARCH IN HYBRID CONTROL OF

NEURO-PROSTHETICS

FRENCH-SWISS-HUNGARIAN WORKSHOP

April 29, 2019

JLR – 2 / 2019

PÁZMÁNY UNIVERSITY ePRESS

BUDAPEST, 2019


© PPKE Faculty of Information Technology and Bionics, 2019

ISSN 2064-3942

Published by Pázmány University ePress

Responsible publisher:

A. R. D. Szuromi Szabolcs Anzelm O. Praem.

Rector of Pázmány Péter Catholic University

Ministry of Human Capacities

17823/2018/FEKUTSTRAT


Contents

The Algorithms of Artificial Intelligence in the Deep Learning Framework for the Recognition of Objects in Ego-centered Visual Scenes . . . 7
Jenny Benois-Pineau, Iván González-Díaz, Miltiadis Poursanidis, Aymar de Rugy

Hybrid Sensorimotor Control of a Prosthetic Arm . . . 11
Aymar de Rugy, Sébastien Mick, Matthieu Guémann, Mathilde Couraud, Christophe Halgand, Florent Paclet and Daniel Cattaert

From Neuroprosthetics to Symbiotic Brain-Machine Interaction . . . 13
Ricardo Chavarriaga

Combining object recognition, gaze tracking and electromyography to guide prosthetic hands – experiences from two research projects . . . 15
Henning Müller, Manfredo Atzori

A comparison of different SIFT implementations for vision-guided prosthetic arms . . . 19
Attila Fejér, Zoltán Nagy, Jenny Benois-Pineau, Péter Szolgay, Aymar de Rugy, Jean-Philippe Domenger


THE ALGORITHMS OF ARTIFICIAL INTELLIGENCE IN THE DEEP LEARNING FRAMEWORK FOR THE RECOGNITION OF OBJECTS IN EGO-CENTERED VISUAL SCENES

Jenny Benois-Pineau 1, Iván González-Díaz 2, Miltiadis Poursanidis 1, Aymar de Rugy 3

1 LaBRI, University of Bordeaux, Talence, France
2 Universidad Carlos III de Madrid
3 INCIA, University of Bordeaux, Talence, France
[email protected], [email protected], [email protected], [email protected]

This work is supported by the Osez Interdisciplinarité CNRS research project RoBioVis.

ABSTRACT

With the miniaturization of video recording devices, new content has emerged, such as ego-centered video. This recording point of view allows for fine-grained analysis of human instrumental activities. Recognition of objects of everyday life in such videos is also required for assistance to upper-limb amputees with hybrid, EMG-guided and vision-guided neuro-prostheses. Object recognition in natural scenes is one of the key problems in computer vision. It has been successively addressed by various machine learning approaches within the supervised learning paradigm. Recently, artificial intelligence models such as deep learning approaches have shown strongly increased performance on this task.

Index Terms— Ego-centered vision, Deep Learning, Object recognition

I. INTRODUCTION

Our target application is assistance to upper-limb amputees wearing neuro-prostheses. Although there have been several recent attempts to use computer vision for prosthesis control, such as adjusting a robotic or prosthetic grasp to a recognized object [1], [2], they were typically conducted in too simple visual environments. A critical aspect of our research is to enable robust object identification in challenging and realistic visual scenarios containing clutter, occlusions and a multitude of objects. Addressing this task appropriately would also be relevant to an enormous range of scenarios where identifying the object of interest in a scene becomes a key step for subsequent visual indexing tasks such as activity recognition [3] or video summarization [4]. In what follows, we introduce some popular ego-centered datasets, present the main principles of object detectors based on deep neural networks, and report our recent research results.

II. EGO-CENTERED DATASETS IN NATURAL ENVIRONMENTS

Ego-centric video corpora have been introduced for the understanding of human activities since the early 2000s. The best known of them is the Georgia Tech (GTEA) dataset (http://www.cbi.gatech.edu/fpv/). Today, four datasets are available in this family (GTEA, GTEA Gaze, GTEA Gaze+, EGTEA Gaze+). The goal of recording these datasets was the development of methods for the recognition of the objects the subjects manipulate and of their activities. Three of them are accompanied by recorded gaze fixations of the subjects executing activities of everyday life. A second dataset, recorded for the same purpose of recognition of daily living activities, is the ADL dataset (https://www.csee.umbc.edu/~hpirsiav/papers/ADLdataset/) [5], comprising 20 video files with an overall volume of 43 GB. Recently, a new dataset has been recorded by a consortium of labs from the Universities of Bristol, Toronto and Catania: the EPIC Kitchens dataset (https://epic-kitchens.github.io/2018). This is the largest dataset annotated with recorded instrumental activities in kitchen environments; it contains annotations for both objects and actions [6] and comprises 39,594 action segments and nearly 11 Mframes. All these datasets have been recorded and made publicly available. Nevertheless, in real-world studies the availability of data is not guaranteed because of privacy issues, specifically when special groups of the population are considered, such as Alzheimer patients in the studies [7] and [8]. All these datasets contain recordings in which the subjects already execute instrumental activities and manipulate objects, whereas for the purpose of controlling a robotic neuro-prosthetic arm with vision we need to design methods for recognizing the object the subject only intends to grasp. Clearly, such datasets have to be accompanied by recorded gaze fixations, which express the intention of the subject to grasp an object. This is the case of the Grasping-in-the-Wild (GITW) dataset, openly available for research at https://www.nakala.fr/data/11280/24923973. It contains 404 ego-centered videos in which the subject wearing the camera is looking for an object and then grabbing it. The dataset was recorded in 7 natural kitchen environments, comprises 16 different daily-life object categories and was used for object recognition in [9]. It is important to say that, while recording a simple "lab" dataset is quite an easy task [10], the state of the art in ego-centered object recognition and advances in hybrid systems require natural complex environments such as in GITW.

III. DEEP CNNS: CONTENT SELECTION IN THE OBJECT RECOGNITION TASK

Deep Convolutional Neural Networks (CNN) have become the necessary tool for recognition of objects in complex scenes. Their general principle in a classification-recognition task consists in approximating an unknown classifier function f(x, α) parametrized by α. Trained on a set of pairs of data and labels {(x_i, y_i)}, the so-called "training dataset", it provides for each input data x its class label y from a given taxonomy. For the object recognition task, candidate windows in an image, the "object proposals", have to be generated and submitted to the subsequent convolutional layers of a deep CNN. In the first works on object recognition [11], the proposals were generated either arbitrarily or using methods seeking highly textured areas in the images, such as "selective search" [12]. In [13] we showed that selecting object proposals on the basis of the predicted saliency of regions in the image can improve the stability of training and drastically reduce the computation time at the generalization step. In the case of ego-centered video, when gaze fixations are available, the object proposals can be generated on the basis of gaze fixation density maps (Figure 1, a). But these measurements, coming from a live system such as a human, are noisy because of distractors and weakly repeatable. This is why temporal filtering, smoothing their geometric locus with kernel-based estimators after projection into a reference video frame (Figure 1, b), is appropriate. A set of object proposals can then be generated around the smoothed position in each frame (Figure 1, c) and submitted to the classifier.

Fig. 1. Selection of object proposals. Examples: a) Gaze fixation density map, b) Projections of fixations into the same frame, c) Object proposals generated.
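To make the content-selection step concrete, here is a minimal Python sketch (not the authors' implementation; the gaze coordinates, kernel width and proposal scales are hypothetical) that smooths projected gaze fixations into a density map with a Gaussian kernel and generates a few square object proposals around its mode:

```python
import numpy as np

def fixation_density(fixations, frame_shape, sigma=30.0):
    """Kernel-based smoothing of gaze fixations projected into one reference frame.

    fixations: iterable of (x, y) gaze points; frame_shape: (height, width).
    Returns a density map normalized to [0, 1]."""
    h, w = frame_shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w), dtype=np.float64)
    for fx, fy in fixations:
        density += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2.0 * sigma ** 2))
    return density / density.max()

def object_proposals(density, scales=(64, 128, 192)):
    """Square proposals (x1, y1, x2, y2) centred on the density-map mode.
    Boxes are not clipped to the frame in this sketch."""
    cy, cx = np.unravel_index(np.argmax(density), density.shape)
    return [(cx - s // 2, cy - s // 2, cx + s // 2, cy + s // 2) for s in scales]

# Hypothetical example: three noisy fixations around an object near (320, 240)
fixes = [(318, 242), (325, 238), (330, 250)]
print(object_proposals(fixation_density(fixes, (480, 640))))
```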

IV. GENERAL ARCHITECTURES OF DEEP NNS FOR OBJECT RECOGNITION

Nowadays, a variety of convolutional neural network architectures exist, GoogleNet [14] and VGG [15] to mention just a few. In the case of high-resolution video frames from wearable video cameras, these deep networks show good performance in classification tasks. The recently proposed ResNet CNN shows the best performance. It differs from previous networks in the sense that it does not learn the original mapping F(x, α) at each layer but the residual one H(x, α) = F(x, α) − x. The original mapping is then obtained as H(x, α) + x. The transmission of the input x of each layer to the final mapping is realized with shortcut connections in the network, which do not require training of weights. While the authors of ResNet [16] report networks with hundreds of layers, quite good results in our problem are achieved with a relatively limited number of them (ResNet50) [17]. Here a mean accuracy of 75.3 ± 3.3% was achieved with five-fold validation.

While CNNs are applied for object recognition in video on a per-frame basis, the temporal coherency of the visual scene can be taken into account by Long Short-Term Memory networks (LSTM). First introduced in [18], these networks differ from plain recurrent neural networks in that LSTMs keep information outside the normal flow of the recurrent network in a gated cell. In each cell there are three kinds of gates: input, output and forget. The forget gate controls the weights of the state loops. In the case of video processing, they allow filtering object detection information along time when predicting actions. In our work [9], using object detection scores and motion descriptors, the intention to grasp an object was recognized with a 0.6 F-score thanks to the LSTM.
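As an illustration of the residual formulation above, the following PyTorch sketch shows one residual block (the channel count and input size are illustrative assumptions, not the exact ResNet-50 configuration used in [17]): the stacked layers learn the residual H(x) = F(x) − x, and the shortcut adds x back without any trainable weights.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the body learns the residual mapping,
    and the output is body(x) + x through a parameter-free shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # shortcut connection carries x unchanged

x = torch.randn(1, 64, 56, 56)    # hypothetical feature map
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```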

V. CONCLUSION

In this short summary we have presented new and very quickly developing methods for the recognition of objects in ego-centered visual scenes. We first introduced the most popular open ego-centered datasets recorded for human sensing. Then we focused on the now unavoidable basis of these methods, Deep Neural Networks. Despite the growing success of these classifiers, much remains to be done in ego-centered object recognition tasks. First of all, attention mechanisms have to be better explored in end-to-end training. The incremental interaction scenario naturally calls for the development of incremental learning approaches in these paradigms, which is the current focus of our work.

VI. REFERENCES

[1] Jonathan Weisz, Peter Allen, Alexander G. Barszap, and Sanjay S. Joshi, "Assistive grasping with an augmented reality user interface," The International Journal of Robotics Research, vol. 36, 05 2017.

[2] Marko Markovic, Hemanth Karnal, Bernhard Graimann, Dario Farina, and Strahinja Dosen, "Glimpse: Google Glass interface for sensory feedback in myoelectric hand prostheses," Journal of Neural Engineering, vol. 14, no. 3, p. 036007, 2017.

[3] Iván González-Díaz, Vincent Buso, Jenny Benois-Pineau, Guillaume Bourmaud, and Rémi Mégret, "Modeling instrumental activities of daily living in egocentric vision as sequences of active objects and context for Alzheimer disease research," in ACM International Workshop on Multimedia Indexing and Information Retrieval for Healthcare, 2013, pp. 11–14.

[4] Yong Jae Lee and Kristen Grauman, "Predicting important objects for egocentric video summarization," Int. J. Comput. Vision, vol. 114, no. 1, pp. 38–55, Aug. 2015.

[5] Hamed Pirsiavash and Deva Ramanan, "Detecting activities of daily living in first-person camera views," in 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, June 16-21, 2012, pp. 2847–2854.

[6] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Sanja Fidler, Antonino Furnari, Evangelos Kazakos, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, and Michael Wray, "Scaling egocentric vision: The EPIC-Kitchens dataset," in European Conference on Computer Vision (ECCV), 2018.

[7] Svebor Karaman, Jenny Benois-Pineau, Vladislavs Dovgalecs, Rémi Mégret, Julien Pinquier, Régine André-Obrecht, Yann Gaëstel, and Jean-François Dartigues, "Hierarchical hidden Markov model in detecting activities of daily living in wearable videos for studies of dementia," Multimedia Tools Appl., vol. 69, no. 3, pp. 743–771, 2014.

[8] Georgios Meditskos, Pierre-Marie Plans, Thanos G. Stavropoulos, Jenny Benois-Pineau, Vincent Buso, and Ioannis Kompatsiaris, "Multi-modal activity recognition from egocentric vision, semantic enrichment and lifelogging applications for the care of dementia," J. Visual Communication and Image Representation, vol. 51, pp. 169–190, 2018.

[9] Iván González-Díaz, Jenny Benois-Pineau, Jean-Philippe Domenger, Daniel Cattaert, and Aymar de Rugy, "Perceptually-guided deep neural networks for ego-action prediction: Object grasping," Pattern Recognition, vol. 88, pp. 223–235, 2019.

[10] Philippe Pérez de San Roman, Jenny Benois-Pineau, Jean-Philippe Domenger, Florent Paclet, Daniel Cattaert, and Aymar de Rugy, "Saliency driven object recognition in egocentric videos with deep CNN: toward application in assistance to neuroprostheses," Computer Vision and Image Understanding, vol. 164, pp. 82–91, 2017.

[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," in NIPS, Lake Tahoe, Nevada, United States, 2012, pp. 1106–1114.

[12] Jasper R. R. Uijlings, Koen E. A. van de Sande, Theo Gevers, and Arnold W. M. Smeulders, "Selective search for object recognition," International Journal of Computer Vision, vol. 104, no. 2, pp. 154–171, 2013.

[13] Abraham Montoya Obeso, Jenny Benois-Pineau, Mireya Vázquez, and Alexandro Alvaro Ramírez Acosta, "Saliency-based selection of visual content for deep convolutional neural networks," Multimedia Tools Appl., pp. 1–24, 2018.

[14] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, "Going deeper with convolutions," in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pp. 1–9.

[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pp. 770–778.

[17] Iván González-Díaz, Jenny Benois-Pineau, Jean-Philippe Domenger, and Aymar de Rugy, "Perceptually-guided understanding of egocentric video content: Recognition of objects to grasp," in Proceedings of the 2018 ACM International Conference on Multimedia Retrieval, ICMR 2018, Yokohama, Japan, June 11-14, 2018, pp. 434–441.

[18] Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.


Hybrid Sensorimotor Control of a Prosthetic Arm

Aymar de Rugy, Sébastien Mick, Matthieu Guémann, Mathilde Couraud, Christophe Halgand, Florent Paclet and Daniel Cattaert

Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, Univ. Bordeaux & CNRS, Bordeaux, France

INTRODUCTION

By interfacing humans with artificial devices such as a robotic arm or prostheses, the main goal of our research team is twofold: (i) to increase our understanding of the fundamental mechanisms of sensorimotor control, and (ii) to exploit this knowledge to restore and optimize movement. In this talk I will present four research axes that our team develops toward these goals.

AXIS 1: NATURAL SENSORIMOTOR MAPPING

Myoelectric control, whereby recorded muscle activities are used to control a prosthesis, inevitably induces discrepancies between the prosthesis movements and the natural movements the same muscle activities would normally produce. Although these discrepancies could be reduced by simulating realistic biomechanical models, important regulations from spinal circuits are typically lacking in all existing control schemes. In this axis, we model and simulate these important spinal regulations in order to reintroduce them into myoelectric control. In a recent work, we also coupled myoelectric elbow control with vibrotactile feedback (Fig. 1) in order to supplement or replace visual feedback loops [1].

Fig. 1. A: Biomechanical arm model controlled by a spinal network to generate simple movements in the presence of perturbations. B: Myoelectric elbow control with vibrotactile feedback. Video available at https://youtu.be/FNYkaPUVi7M

AXIS 2: DEVELOPMENTAL ROBOTICS

In collaboration with the developmental robotics team FLOWERS (headed by P.-Y. Oudeyer) and the startup company Pollen Robotics, we developed the 3D-printed robotic arm REACHY for the specific purpose of our research and that of others aiming at developing and testing control principles on a flexible and easy-to-interface robotic platform [2], [3]. In a recent application, we used supervised learning with an artificial neural network to learn the mapping between the 7 joint angles of REACHY and the position of its hand in space, and used this mapping to resolve inverse kinematics and directly control the hand position (see the sketch below). When the movements used to learn this mapping are produced by humans during free reaching, this strategy encapsulates natural coordination within the network and exploits it for natural robot control. Fig. 2 and its associated video of REACHY tele-operated from a single marker on the hand of an operating subject illustrate that this method is effective at producing easy-to-control natural movements. Similar strategies are explored to reconstruct distal joints from proximal ones, in order to better exploit remaining stump kinematics for prosthesis control.

Fig. 2. REACHY teleoperated from a single marker on the subject's hand. Video available at https://youtu.be/Oa9mHMoDtYI
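A minimal sketch of the learned forward mapping and its use for inverse kinematics, as described for this axis, is given below (PyTorch; the network size, training settings and variable names are assumptions for illustration, and the data would be the 7 joint angles of REACHY paired with the recorded 3-D hand positions):

```python
import torch
import torch.nn as nn

# Hypothetical forward-kinematics model: 7 joint angles -> 3-D hand position.
fk = nn.Sequential(nn.Linear(7, 64), nn.Tanh(),
                   nn.Linear(64, 64), nn.Tanh(),
                   nn.Linear(64, 3))

def train_fk(angles, positions, epochs=200):
    """Supervised learning on (joint angles, hand position) pairs, e.g. recorded
    during free human reaching; angles: (N, 7) tensor, positions: (N, 3) tensor."""
    opt = torch.optim.Adam(fk.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(fk(angles), positions)
        loss.backward()
        opt.step()

def inverse_kinematics(target, q0, steps=100, lr=0.05):
    """Resolve IK by descending the Cartesian error through the learned model,
    starting from an initial joint configuration q0 (7 values)."""
    q = q0.clone().requires_grad_(True)
    opt = torch.optim.SGD([q], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        err = nn.functional.mse_loss(fk(q), target)
        err.backward()
        opt.step()
    return q.detach()

# Usage sketch: train_fk(angles, positions); q = inverse_kinematics(target_xyz, q0)
```

Because the mapping is trained on human reaching data, joint configurations found this way tend to stay close to natural coordination patterns.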

AXIS 3: CO-ADAPTATION

Instead of leaving subjects alone to learn a potentially difficult mapping between muscle activities and movements, recent attempts are being made whereby the decoder itself concurrently adapts as a function of ongoing errors (Fig. 3). In this research axis, we use a simplified myoelectric control with a perturbation for which human adaptation is well characterized and modeled (i.e., a visuomotor rotation), in order to explore co-adaptation settings in a principled manner. Experimental results and simulations revealed that a relatively low gain of decoder adaptation minimizes final errors but generates slow and incomplete adaptation, whereas higher gains increase the adaptation rate but also the errors, by amplifying noise. We show how a variable co-adaptation gain tuned to the error dynamics can cumulate the advantages of both without their drawbacks [4]. However, important work remains to be done to transfer these principles, established in a simplified myoelectric context, to more complex prosthesis control.

Fig. 3. Co-adaptation.
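The gain trade-off described above can be explored in a toy simulation such as the one below (an illustrative sandbox only, not the model of [4]; the scalar perturbation, learning rates and noise level are all assumptions): the user and the decoder both correct the same error signal, and the decoder gain sets how quickly, and how noisily, the combined system converges.

```python
import numpy as np

def simulate_coadaptation(decoder_gain, user_rate=0.2, noise_sd=0.05,
                          perturbation=0.5, trials=200, seed=0):
    """Toy co-adaptation loop: a fixed scalar perturbation must be compensated
    jointly by the user's internal correction u and the decoder's correction d."""
    rng = np.random.default_rng(seed)
    u, d, errors = 0.0, 0.0, []
    for _ in range(trials):
        error = perturbation - (u + d) + rng.normal(0.0, noise_sd)
        u += user_rate * error       # user adapts from the perceived error
        d += decoder_gain * error    # decoder adapts from the same error
        errors.append(error)
    return np.array(errors)

for gain in (0.0, 0.05, 0.5):
    e = simulate_coadaptation(gain)
    print(f"decoder gain {gain:.2f}: mean |error| over last 50 trials = {np.abs(e[-50:]).mean():.3f}")
```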

AXIS 4: COMPUTER VISION AND GAZE INFORMATION

In collaboration with Prof. Benois-Pineau from LaBRI, we explore ways to integrate computer vision and gaze information into prosthesis control. In [5], a deep convolutional neural network combined with a visual attention map computed over gaze fixations measured by a glasses-worn eye-tracker enabled fast recognition of the object of interest, which could be used to assist prosthesis control. In [6], a similar network was augmented with the predictive power of Long Short-Term Memory networks to analyze gaze and visual dynamics, and to predict an intended action toward an object even before the grasping action toward that object was actually launched by the user. Fig. 4 and its associated short video also illustrate a proof of concept whereby REACHY is simply controlled by gaze information and minimal muscle contraction.

Fig. 4. REACHY gaze-controlled. Video available at https://youtu.be/qloR67AaqQ4

REFERENCES

[1] M. Guémann et al., "Sensory and motor parameter estimation for elbow myoelectric control with vibrotactile feedback," in Congress of the International Society of Electrophysiology and Kinesiology, 2018.

[2] S. Mick et al., "Reachy, a 3D-printed human-like robotic arm as a test bed for prosthesis control strategies," in Congress of the International Society of Electrophysiology and Kinesiology, 2018.

[3] S. Mick, D. Cattaert, F. Paclet, P.-Y. Oudeyer, and A. de Rugy, "Performance and usability of various robotic arm control modes from human force signals," Frontiers in Neurorobotics, vol. 11, p. 55, 2017.

[4] M. Couraud, D. Cattaert, F. Paclet, P.-Y. Oudeyer, and A. de Rugy, "Model and experiments to optimize co-adaptation in a simplified myoelectric control system," Journal of Neural Engineering, vol. 15, no. 2, p. 026006, 2018.

[5] P. Pérez de San Roman, J. Benois-Pineau, J.-P. Domenger, F. Paclet, D. Cattaert, and A. de Rugy, "Saliency driven object recognition in egocentric videos with deep CNN: toward application in assistance to neuroprostheses," Computer Vision and Image Understanding, vol. 164, pp. 82–91, 2017.

[6] I. González-Díaz, J. Benois-Pineau, J.-P. Domenger, D. Cattaert, and A. de Rugy, "Perceptually-guided deep neural networks for ego-action prediction: Object grasping," Pattern Recognition, vol. 88, pp. 223–235, 2019.


From Neuroprosthetics to Symbiotic Brain-Machine Interaction

Ricardo Chavarriaga
CNBI, Center for Neuroprosthetics, School of Engineering
École Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland

[email protected]

Abstract—Brain-Machine Interfaces (BMI), also referred to as Neuroprosthetics, are systems that translate brain activity patterns into commands that can be executed by an artificial device. They enable the possibility of controlling devices such as a prosthetic arm or exoskeleton, a wheelchair, typewriting applications, or games directly by modulating our brain activity. For this purpose, BCI systems rely on signal processing and machine learning algorithms to decode the brain activity. Here we present a brief survey of the state of the art on these technologies, as well as their main application domains.

Index Terms—brain-computer interface, brain-machine interface, neuroprosthetics, machine learning, assistive technologies, rehabilitation, gaming

I. INTRODUCTION

A Brain-Machine Interface (BMI) is a system that translates brain activity patterns into commands that can be executed by an artificial device. This technology, also referred to as Neuroprosthetics, enables the possibility of controlling devices such as a prosthetic arm or exoskeleton, a wheelchair, typewriting applications, or games directly by modulating our brain activity.

Figure 1 shows the general architecture of a BMI; it consists of a closed loop where neural activity is measured either with non-invasive or implanted sensors. Among the different modalities that can be used, we can mention recordings of electrical brain activity using intra-cortical electrodes (single- and multi-unit activity, Local Field Potentials (LFP)), electrocorticography (ECoG), or electroencephalography (EEG) [1]; as well as magnetic and optical techniques, e.g., magnetoencephalography (MEG) [2] or functional near-infrared spectroscopy (fNIRS), respectively [3].

Acquired signals are then processed, and machine learning algorithms are used to identify the mental processes that generate them [4]. Afterwards, the output of the decoding process is translated into commands for the device to be controlled. Last but not least, feedback information on the selected actions is provided to the user. This last step is extremely important, since the perception of a causal relation between the mental task and the executed actions is a necessary element to acquire the skills required to control the neuroprosthetic device.

Fig. 1. Processing steps of a neuroprosthetic device. The acquired signal is pre-processed through both spatial and spectral filters. Then discriminant features are extracted and used as inputs to a decoder that yields an output command that is sent to an external device to be executed. By definition, the system is a closed loop, as executed actions are perceived by the subjects through their own senses or explicit feedback.
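A minimal sketch of such a decoding pipeline is given below, on synthetic data standing in for real recordings (the sampling rate, frequency band, feature choice and classifier are illustrative assumptions): band-pass filtering, log-variance band-power features, and a linear discriminant classifier.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 250  # sampling rate in Hz (hypothetical amplifier setting)

def bandpower_features(trials, low=8.0, high=30.0, fs=FS):
    """Band-pass each trial (n_trials, n_channels, n_samples) and use the
    log-variance per channel as a simple band-power feature."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, trials, axis=-1)
    return np.log(filtered.var(axis=-1) + 1e-12)

# Synthetic two-class data: class 1 has more power on one channel
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8, 2 * FS))   # 100 trials, 8 channels, 2 s each
X[50:, 2] *= 1.5
y = np.repeat([0, 1], 50)

features = bandpower_features(X)
clf = LinearDiscriminantAnalysis().fit(features[::2], y[::2])
print("held-out accuracy:", clf.score(features[1::2], y[1::2]))
```

In a real closed-loop system, the classifier output would then be mapped to a device command and fed back to the user, closing the loop described above.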

II. APPLICATIONS

Multiple applications have been proposed for neuroprosthetic devices. Initially, most of them were oriented to users with severe motor disabilities [5], in particular subjects with locked-in syndrome, as they have a very limited range of usable interfaces. In addition, applications for people without disabilities have also been proposed, in fields like gaming or to improve human-machine interaction. A brief account of these applications is given below.

A. Communication

One of the most studied applications, in particular for non-invasive approaches, is the use of text-entry systems as a means to restore communication capabilities. This capacity is reported as one of the highest priorities of end-users. The most common paradigm is the so-called P300-based BCI speller [6]. This paradigm requires subjects to focus their attention on flashing stimuli, typically the sequential highlighting of rows and columns in a matrix of characters, and the BMI focuses on detecting a positive waveform occurring approximately 300 ms after the infrequent task-relevant stimulus (e.g., the flashing of the intended character). Multiple studies are being performed to evaluate this type of interface in different user populations [7]. Other approaches are based on steady-state visual evoked potentials [8] or motor imagery [9].
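To illustrate the P300 principle, the sketch below (entirely synthetic data; the sampling rate, epoch length and waveform shape are assumptions) cuts fixed-length epochs after stimulus onsets and recovers the positive deflection around 300 ms by averaging the target epochs.

```python
import numpy as np

FS = 256  # sampling rate in Hz (hypothetical)

def extract_epochs(eeg, onsets, fs=FS, length_s=0.6):
    """Cut fixed-length windows (in samples) starting at each stimulus onset."""
    n = int(length_s * fs)
    return np.stack([eeg[:, o:o + n] for o in onsets])

# Synthetic recording: target stimuli carry a positive bump ~300 ms post-stimulus
rng = np.random.default_rng(1)
eeg = rng.standard_normal((1, 60 * FS)) * 5.0
target_onsets = np.arange(FS, 50 * FS, 5 * FS)
t = np.arange(int(0.6 * FS)) / FS
bump = 8.0 * np.exp(-0.5 * ((t - 0.3) / 0.05) ** 2)
for o in target_onsets:
    eeg[0, o:o + bump.size] += bump

avg = extract_epochs(eeg, target_onsets).mean(axis=0)[0]
print("latency of average peak:", 1000 * np.argmax(avg) / FS, "ms")  # close to 300 ms
```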

B. Prosthetic devices

Another application is the decoding of motor intentions or movement kinematics to control prosthetic devices, for both the upper and lower limbs. A large number of studies using intra-cortical recordings have used regression approaches to decode arm position, velocity or grasping force and control a robotic arm [10], [11]. Non-invasive approaches have also attempted to decode kinematics [12], as well as other correlates of motor intention including movement onset [13] and grasping onset and types [14].
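A minimal example of the regression approach is sketched below, with synthetic firing-rate features standing in for intra-cortical recordings (the number of units, the linear generative model and the noise level are assumptions): a ridge regression maps neural features to a 2-D hand velocity.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
n_samples, n_units = 2000, 96                     # hypothetical number of recorded units
rates = rng.poisson(5.0, (n_samples, n_units)).astype(float)
true_w = 0.1 * rng.standard_normal((n_units, 2))
velocity = rates @ true_w + 0.5 * rng.standard_normal((n_samples, 2))  # 2-D hand velocity

decoder = Ridge(alpha=1.0).fit(rates[:1500], velocity[:1500])
print("R^2 on held-out samples:", decoder.score(rates[1500:], velocity[1500:]))
```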

Complementarily, other studies have focused on decoding information about lower-limb movements to control exoskeletons or neuromuscular stimulation of the legs [15], while another line of work focuses on the restoration of mobility through the control of electric wheelchairs [16].

C. Motor neurorehabilitation

One of the most promising applications of BMI is the possibility of using them to steer neural plasticity in the frame of motor neurorehabilitation. In these cases, the decoding of motor imagery of the affected limb is coupled with congruent feedback using visual stimulation, electrical neurostimulation or passive mobilisation using exoskeletons. This is intended to promote beneficial neuroplasticity that can help in the rehabilitation process [17], [18].

D. Other applications

Besides the applications for people with disabilities, BMI can also be used to identify users' states and intentions in applications for the general public. This includes gaming applications [19], semi-autonomous driving [20] and neuroergonomics [21].

III. DISCUSSION

Neuroprosthetic systems provide a new way for humans to interact with their environment. Advances in sensing technology, machine learning, and robotics have yielded impressive achievements in the development of systems for communication, motor substitution or rehabilitation, as well as consumer-oriented applications.

Nonetheless, multiple challenges still need to be overcome. Most of these systems have only been tested in rather small populations, in particular those oriented toward clinical purposes. Moreover, these tests are often limited to short periods of time. In consequence, little information is available on the long-term effects of neuroprosthetic use, as well as on strategies to improve learning of the skills required to control the system.

REFERENCES

[1] J. d. R. Millán and J. M. Carmena, "Invasive or noninvasive: Understanding brain-machine interface technology [Conversations in BME]," IEEE Engineering in Medicine and Biology Magazine, vol. 29, pp. 16–22, Feb. 2010.

[2] S. Waldert, H. Preissl, E. Demandt, C. Braun, N. Birbaumer, A. Aertsen, and C. Mehring, "Hand movement direction decoded from MEG and EEG," J. Neurosci., vol. 28, no. 4, pp. 1000–1008, Jan. 2008.

[3] U. Chaudhary, B. Xia, S. Silvoni, L. G. Cohen, and N. Birbaumer, "Brain-computer interface-based communication in the completely locked-in state," PLoS Biology, vol. 15, no. 1, 2017.

[4] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, "A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update," Journal of Neural Engineering, vol. 15, no. 3, p. 031005, Jun. 2018.

[5] J. d. R. Millán, R. Rupp, G. R. Müller-Putz, R. Murray-Smith, C. Giugliemma, M. Tangermann, C. Vidaurre, F. Cincotti, A. Kübler, R. Leeb, C. Neuper, K. R. Müller, and D. Mattia, "Combining brain-computer interfaces and assistive technologies: State-of-the-art and challenges," Frontiers in Neuroscience, vol. 4, p. 161, 2010.

[6] L. A. Farwell and E. Donchin, "Talking off the top of your head: toward a mental prosthesis utilizing event-related brain potentials," Electroencephalography and Clinical Neurophysiology, vol. 70, no. 6, pp. 510–523, Dec. 1988.

[7] E. W. Sellers, D. B. Ryan, and C. K. Hauser, "Noninvasive brain-computer interface enables communication after brainstem stroke," Sci. Transl. Med., vol. 6, no. 257, p. 257re7, Oct. 2014.

[8] X. Chen, Y. Wang, M. Nakanishi, X. Gao, T.-P. Jung, and S. Gao, "High-speed spelling with a noninvasive brain-computer interface," Proc. Natl. Acad. Sci. U S A, vol. 112, no. 44, pp. E6058–E6067, Nov. 2015.

[9] S. Perdikis, R. Leeb, J. Williamson, A. Ramsay, M. Tavella, L. Desideri, E.-J. Hoogerwerf, A. Al-Khodairy, R. Murray-Smith, and J. del R. Millán, "Clinical evaluation of BrainTree, a motor imagery hybrid BCI speller," Journal of Neural Engineering, vol. 11, no. 3, p. 36003, Apr. 2014.

[10] A. B. Ajiboye, F. R. Willett, D. R. Young, W. D. Memberg, B. A. Murphy, J. P. Miller, B. L. Walter, J. A. Sweet, H. A. Hoyen, M. W. Keith, P. H. Peckham, J. D. Simeral, J. P. Donoghue, L. R. Hochberg, and R. F. Kirsch, "Restoration of reaching and grasping movements through brain-controlled muscle stimulation in a person with tetraplegia: a proof-of-concept demonstration," The Lancet, vol. 389, no. 10081, pp. 1821–1830, May 2017.

[11] J. L. Collinger, B. Wodlinger, J. E. Downey, W. Wang, E. C. Tyler-Kabara, D. J. Weber, A. J. McMorland, M. Velliste, M. L. Boninger, and A. B. Schwartz, "High-performance neuroprosthetic control by an individual with tetraplegia," The Lancet, vol. 381, no. 9866, pp. 557–564, Feb. 2013.

[12] A. Y. Paek, H. A. Agashe, and J. L. Contreras-Vidal, "Decoding repetitive finger movements with brain activity acquired via non-invasive electroencephalography," Front. Neuroeng., vol. 7, p. 3, 2014.

[13] E. Lew, R. Chavarriaga, S. Silvoni, and J. d. R. Millán, "Detection of self-paced reaching movement intention from EEG signals," Frontiers in Neuroengineering, vol. 5, p. 13, Jul. 2012.

[14] I. Iturrate, R. Chavarriaga, M. Pereira, H. Zhang, T. Corbet, R. Leeb, and J. d. R. Millán, "Human EEG reveals distinct neural correlates of power and precision grasping types," NeuroImage, vol. 181, pp. 635–644, Nov. 2018.

[15] K. Lee, D. Liu, L. Perroud, R. Chavarriaga, and J. d. R. Millán, "A brain-controlled exoskeleton with cascaded event-related desynchronization classifiers," Robotics and Autonomous Systems, vol. 90, pp. 15–23, 2017.

[16] A. Fernández-Rodríguez, F. Velasco-Álvarez, and R. Ron-Angevin, "Review of real brain-controlled wheelchairs," Journal of Neural Engineering, vol. 13, no. 6, p. 061001, Dec. 2016.

[17] A. Biasiucci, R. Leeb, I. Iturrate, S. Perdikis, A. Al-Khodairy, T. Corbet, A. Schnider, T. Schmidlin, H. Zhang, M. Bassolino, D. Viceic, P. Vuadens, A. G. Guggisberg, and J. d. R. Millán, "Brain-actuated functional electrical stimulation elicits lasting arm motor recovery after stroke," Nature Communications, vol. 9, no. 1, p. 2421, Dec. 2018. [Online]. Available: http://www.nature.com/articles/s41467-018-04673-z

[18] M. A. Cervera, S. R. Soekadar, J. Ushiba, J. d. R. Millán, M. Liu, N. Birbaumer, and G. Garipelli, "Brain-computer interfaces for post-stroke motor rehabilitation: a meta-analysis," Annals of Clinical and Translational Neurology, vol. 5, no. 5, pp. 651–663, May 2018.

[19] B. Kerous, F. Skola, and F. Liarokapis, "EEG-based BCI and video games: a progress report," Virtual Reality, vol. 22, no. 2, pp. 119–135, Jun. 2018.

[20] R. Chavarriaga, M. Uscumlic, H. Zhang, Z. Khaliliardali, R. Aydarkhanov, S. Saeedi, L. Gheorghe, and J. d. R. Millán, "Decoding neural correlates of cognitive states to enhance driving experience," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 4, pp. 288–297, Aug. 2018.

[21] K. Gramann, S. H. Fairclough, T. O. Zander, and H. Ayaz, "Editorial: Trends in neuroergonomics," Frontiers in Human Neuroscience, vol. 11, Apr. 2017.


Combining object recognition, gaze tracking and electromyography to guide prosthetic hands – experiences from two research projects*

Henning Müller
University of Geneva & HES-SO
Sierre, Switzerland
[email protected]

Manfredo Atzori
Information Systems Institute, HES-SO
Sierre, Switzerland
[email protected]

* This work was partly supported by the Swiss National Science Foundation (SNF) in the context of the MeganePro project.

Abstract—Hand amputation can completely change the life of the concerned persons in terms of everyday activities and personal independence, even more so if both hands are lost. Most hand prostheses are little accepted by amputees and only give basic functionalities back, such as simple opening and closing. Some modern prostheses allow for much more complex movements, but the control mechanisms really need to be improved for good and natural control (some prostheses are controlled with the arm/shoulder, and are thus body-powered and non-natural). On this topic several research projects exist with varying objectives, from invasive methods to using additional signals such as cameras in the prostheses, which is described in this text.

This article summarises several years of research executed in the MedGIFT research group in two projects funded by the Swiss National Science Foundation in collaboration with national and international partners. It starts with the description of a basic acquisition setup that uses mainly electromyography and acceleration sensors and finishes with the current multi-sensor integration that also includes a scene camera, object recognition and gaze tracking of the person using a prosthesis, combined with surface electromyography and acceleration sensors on the forearm. The text finishes with a short outlook on future research challenges for controlling hand prostheses.

Index Terms—multi-sensory information, gaze tracking, prosthesis control, electromyography

I. INTRODUCTION

A hand amputation is not one of the most frequent injuries, but it is one that can have a strong personal impact, as many daily activities can become difficult to perform. It was estimated that around 41,000 people were living with a major upper limb loss in the USA in 2005 [1]. There are cosmetic prostheses, for example a simple hook without any active functionality. Then there are body-powered prostheses, where a non-natural movement is used for opening and closing a hand; this is usually only possible for a single movement and not for more complex patterns. Surface electromyography (sEMG) allows measuring the electrical activity of the remnant muscles and constitutes the third large group of professional prostheses. Most often this is also only used for one movement or a very small number of movements, but more complex prostheses with single fingers and up to 50 movements exist as well. A review of hand prostheses can be found in [2].

In research environments, invasive methods that do not only use muscles but connect directly to the nerves have also been implemented, as well as brain-computer interfaces, both invasive and non-invasive, for example via EEG (electroencephalography) [3].

Unfortunately, commercial prostheses are frequently rejected by the users, and so only a minority of amputated persons uses prostheses. Critical points are for example linked to the heat and the weight of prostheses [4]–[6]. Most commercial prostheses are also very expensive [7], [8]. The addition of video streams for the analysis of grasps has also been explored in research work [9].

Comparing the quality of prosthesis control has been difficult, as the number of movements and subjects varies strongly from one publication to another.

II. METHODS EMPLOYED AND RESULTS OBTAINED IN THE PROJECTS

This section describes the main results obtained in the MeganePro and NinaPro (http://ninapro.hevs.ch/) projects, from the standardised acquisition setup developed to the classification results obtained in the tests with amputated persons and healthy controls. Ethics approval for the project was obtained both in Switzerland and in Italy (at the University Hospital of Padova, Italy), where most data acquisitions were done.

A. Acquisition setup

Figure 1 shows one example setup for the MeganePro and NinaPro projects. Usually sEMG electrodes are placed around the forearm; several types of electrodes have been used in the project, from Otto Bock electrodes to the Myo armband and Delsys Trigno wireless electrodes, which are all compared in [10]. After an analysis for an acquisition protocol described in [11], first tests were performed with an amputated subject [12] to make sure that good signal quality could be obtained and also to measure how difficult and stressful such tests are for amputees. Based on the first experiences, the protocol was slightly simplified and more breaks were included to limit the amount of stress and the impact of fatigue. The objective of the protocol was clearly to favour natural control of the prosthesis [13]. In several setups, tests were also run using a CyberGlove, echoing the movement from one hand on the other. Force measurements were also used in some of the tests, also to possibly synchronise movements. Several slightly different setups were used for the acquisitions, usually with a large number of around 50 movements, exceeding in complexity what was commonly used in the field and allowing for a maximum of possible uses.

Fig. 1. Acquisition setup in the MeganePro project, including the description of movements to execute on a screen and data acquisition with electromyography and acceleration tracking, and in addition the gaze tracker that includes a scene camera for analysing the field of view of the person.

Several of the data sets produced in the NinaPro project were published and are also publicly available for other researchers to use [14], [15]. In the setup of the MeganePro project [16], a gaze tracking device from Tobii (Tobii 2 glasses) was added, as can be seen in Figure 2. This adds information for making a decision on the movement to be taken, by recognising objects in the images [17] and also by analysing the gaze point prior to starting a new movement and adapting the movement to the object that was selected. An overview of such gaze trackers with more technical information on the devices can also be found in [18].

Fig. 2. The Tobii glasses that were used in our experiments.
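A minimal sketch of how the gaze point can be combined with object detections to pick the intended object is shown below (the detection format, labels and coordinates are hypothetical): the gaze sample taken just before movement onset selects the containing, or otherwise nearest, detected box.

```python
def select_target(gaze_xy, detections):
    """Pick the detected object the pre-movement gaze point most likely refers to.

    detections: list of (label, confidence, (x1, y1, x2, y2)) in scene-camera pixels.
    Falls back to the nearest box centre if no box contains the gaze point."""
    gx, gy = gaze_xy
    containing = [d for d in detections
                  if d[2][0] <= gx <= d[2][2] and d[2][1] <= gy <= d[2][3]]
    if containing:
        return max(containing, key=lambda d: d[1])   # most confident containing box

    def centre_distance(d):
        x1, y1, x2, y2 = d[2]
        return ((gx - (x1 + x2) / 2) ** 2 + (gy - (y1 + y2) / 2) ** 2) ** 0.5

    return min(detections, key=centre_distance)

detections = [("mug", 0.9, (100, 120, 180, 220)), ("bottle", 0.8, (300, 90, 360, 260))]
print(select_target((150, 170), detections)[0])      # -> "mug"
```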

One of the major challenges in recording data from several sources is the synchronisation of the data sources, from the sEMG stream of 12 electrodes, to 12 acceleration sensors in 3 directions, to the video stream and the gaze point in the recorded video. Participants were following a video showing the movements to perform, and the delay in starting the movement after it is shown on video can vary from one movement to another and also between repetitions of the same movement. The frequencies of the devices are not at all the same, and there are possibilities for up- or down-sampling the streams, which can both have an impact on the classification results. It is important to identify the onset of a movement well in order to be able to react quickly; in general, the time between the user initiating a movement and the prosthesis starting it has to be kept short for the control to feel responsive.
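As a small illustration of this synchronisation problem, the sketch below resamples timestamped streams recorded at different rates, and with a clock offset, onto one common analysis timeline by linear interpolation (the rates, offset and channel counts are hypothetical; a real pipeline may also need anti-aliasing and event-based alignment).

```python
import numpy as np

def resample_to(timeline, timestamps, values):
    """Linearly interpolate a (n_samples, n_channels) stream onto a common timeline."""
    values = np.atleast_2d(values.T).T            # ensure a channel dimension
    return np.column_stack([np.interp(timeline, timestamps, values[:, c])
                            for c in range(values.shape[1])])

rng = np.random.default_rng(3)
t_emg = np.arange(0, 5, 1 / 2000.0)               # sEMG at ~2 kHz
emg = rng.standard_normal((t_emg.size, 12))
t_gaze = np.arange(0, 5, 1 / 50.0) + 0.012        # gaze at ~50 Hz with a clock offset
gaze = rng.uniform(0, 1, (t_gaze.size, 2))

timeline = np.arange(0, 5, 1 / 100.0)             # common 100 Hz analysis rate
emg_rs = resample_to(timeline, t_emg, emg)
gaze_rs = resample_to(timeline, t_gaze, gaze)
print(emg_rs.shape, gaze_rs.shape)                # (500, 12) (500, 2)
```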

Most of the data acquisitions were performed without an actual prosthesis, but with the help of a 3D-printed prosthesis it was in the end possible to also have such visual feedback for the participants in the study [19], [20]. Such feedback allows the amputees to see the movements that are detected, so that the participants can possibly adapt their behaviour, leading to much better classification results.

Fig. 3. Several data sources need to be combined for the final decision making, including complex synchronisation of data acquired with varying frequencies.

B. Outcomes of the NinaPro and MeganePro projects

Based on the initial NinaPro data acquisitions [11] and several studies that were done later with the same protocol but different electrodes, for example [10], much research was made possible both in our research group and in several other research groups that obtained access to the data set. Many techniques have been applied for the data classification, for example deep learning approaches [21].
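As a toy illustration of movement classification from sEMG windows (not the deep learning pipeline of [21]; the data here are synthetic and the features and classifier are simple baseline choices), the sketch below extracts sliding-window RMS and mean-absolute-value features per channel and trains a classifier on two simulated movement classes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(emg, fs=2000, win_ms=200, step_ms=50):
    """Sliding-window RMS and mean-absolute-value features per channel."""
    win, step = int(fs * win_ms / 1000), int(fs * step_ms / 1000)
    feats = []
    for start in range(0, emg.shape[0] - win + 1, step):
        w = emg[start:start + win]
        feats.append(np.concatenate([np.sqrt((w ** 2).mean(axis=0)),
                                     np.abs(w).mean(axis=0)]))
    return np.array(feats)

# Synthetic 12-channel recordings standing in for two movements (rest vs. grasp)
rng = np.random.default_rng(5)
rest = 0.05 * rng.standard_normal((20000, 12))
grasp = rng.standard_normal((20000, 12)) * np.linspace(0.05, 0.3, 12)

f_rest, f_grasp = window_features(rest), window_features(grasp)
X, y = np.vstack([f_rest, f_grasp]), np.r_[np.zeros(len(f_rest)), np.ones(len(f_grasp))]
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X[::2], y[::2])
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```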

One outcome has been the important correlation of some of the clinical parameters with the performance of amputees in [22]. It was shown that the remaining forearm percentage is clearly correlated with performance. More surprisingly, personal, subjective phantom limb sensation was also correlated with performance, which so far has no clear explanation. The time since the amputation is also positively correlated with the classification accuracy of movements, and this independently of whether the persons use a prosthesis or not, which was surprising. A possible explanation is that natural re-innervation increases the quality of the signal that is available for movement recognition. Neuroscience experiments, on the other hand, show that the brain area responsible for a missing limb decreases over time. The general accuracy of movement recognition of persons who use or who do not use prostheses can be found in [23].

An important aspect of such experiments and data acquisitions is whether the results can be repeated, and for this a protocol was defined where the same ten persons were recorded for five straight days twice per day, once in the morning and once in the afternoon [24]. The results show that there are significant differences in the data and that it is hard to learn across sessions, even for the same persons. This can be linked to the exact electrode position but also to fatigue in the afternoon and external factors.

The availability of the large amount of recorded data of muscle activities for many movements also allowed further analyses. In [25], several synergies of the muscles were identified, and this can have an important impact on the analysis of neural diseases or rehabilitation beyond prosthesis control. It can also help to build better prostheses by using the detected synergies.


Another project that was made possible via all the acquired data is the creation of a new hand taxonomy that is not based on subjective human analysis but on experimental quantitative data [26]. Such a taxonomy can equally have an important impact on the domain of prosthesis control and, more generally, on rehabilitation involving the hand, for example after a stroke.

III. CONCLUSIONS AND FUTURE WORK

Within the NinaPro project, the foundations of a protocol for data acquisition with many amputees and also non-amputees were created, with a large number of movements that covers most movements for activities of daily living in a realistic scenario. This created a benchmark for performance analysis of movement control, and sharing the benchmark data with the community was valuable, as it allows comparing the many approaches on the same basis. Links between clinical data and movement quality were made, but several shortcomings were also found in the repeatability experiments, meaning that transfer learning was very difficult and might be impossible in the current setup. The difference between amputees and healthy subjects is significant, and when using many movements the quality of classifying movements fully correctly is somewhat limited in amputees. Some information, for example from the thumb muscles, is simply absent in amputees. Additional information can be obtained with the gaze tracker and a scene camera, which allows identifying objects in a scene and, via the gaze information, which of the objects is likely to be used. This has the potential to partly compensate for the missing information and might improve the classification accuracy, particularly for amputees. These differences can also be explained by other parameters. In most studies the amputees are patients with a large variety in age and socioeconomic status. The control group most often consists of volunteers from the university campus, who are usually of a higher socioeconomic class and thus healthier and usually quite young compared to the amputees. This can already make an important difference in acquisition quality, and in the MeganePro project a data set will soon be released that contains 20 amputees and a control group that is matched by age, gender and partly by education status. Another difference is that healthy persons have feedback on the movement with their hands, whereas amputees do not have any feedback in the experiments. A small test [20] with a real, 3D-printed prosthesis showed that amputees are able to adapt to the system, possibly leading to better results. Showing results in augmented reality for training can also help by providing such basic sensory feedback.

As a conclusion, both the NinaPro and MeganePro projects have created an open environment for research in hand prosthetics by making data and source code available and sharing them openly with the research community.

REFERENCES

[1] K. Ziegler-Graham, E. J. MacKenzie, P. L. Ephraim, T. G. Travison, and R. Brookmeyer, "Estimating the prevalence of limb loss in the United States: 2005 to 2050," Archives of Physical Medicine and Rehabilitation, vol. 89, no. 3, pp. 422–429, 2008.

[2] M. Atzori and H. Müller, "Control capabilities of myoelectric robotic prostheses by hand amputees: A scientific research and market overview," Frontiers in Systems Neuroscience, vol. 9, no. 162, 2015.

[3] I. Iturrate, R. Chavarriaga, M. Pereira, H. Zhang, T. Corbet, R. Leeb, and J. del R. Millán, "Human EEG reveals distinct neural correlates of power and precision grasping types," NeuroImage, vol. 181, pp. 635–644, 2018.

[4] S. Ritchie, S. Wiggins, and A. Sanford, "Perceptions of cosmesis and function in adults with upper limb prostheses: a systematic literature review," Prosthetics and Orthotics International, vol. 35, no. 4, pp. 332–341, 2011.

[5] F. Cordella, A. L. Ciancio, R. Sacchetti, A. Davalli, A. G. Cutti, E. Guglielmelli, and L. Zollo, "Literature review on needs of upper limb prosthesis users," Frontiers in Neuroscience, vol. 10, p. 209, May 2016.

[6] E. Biddiss, D. Beaton, and T. Chau, "Consumer design priorities for upper limb prosthetics," Disabil. Rehabil. Assist. Technol., vol. 2, no. 6, pp. 346–357, 2007.

[7] D. K. Blough, S. Hubbard, L. V. McFarland, D. G. Smith, J. M. Gambel, and G. E. Reiber, "Prosthetic cost projections for servicemembers with major limb loss from Vietnam and OIF/OEF," Journal of Rehabilitation Research and Development, vol. 47, no. 4, pp. 387–402, 2010.

[8] D. Van Der Riet, R. Stopforth, G. Bright, and O. Diegel, "An overview and comparison of upper limb prosthetics," IEEE AFRICON Conference, 2013.

[9] I. González-Díaz, J. Benois-Pineau, J. Domenger, D. Cattaert, and A. de Rugy, "Perceptually-guided deep neural networks for ego-action prediction: Object grasping," Pattern Recognition, vol. 88, pp. 223–235, 2019.

[10] S. Pizzolato, L. Tagliapietra, M. Cognolato, M. Reggiani, H. Müller, and M. Atzori, "Comparison of six electromyography acquisition setups on hand movement classification tasks," PLoS One, 2017.

[11] M. Atzori, A. Gijsberts, S. Heynen, A.-G. Mittaz-Hager, O. Deriaz, P. van der Smagt, C. Castellini, B. Caputo, and H. Müller, "Building the NINAPRO database: A resource for the biorobotics community," in Proceedings of the IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), 2012, pp. 1258–1265.

[12] M. Atzori, M. Baechler, and H. Müller, "Recognition of hand movements in a trans-radial amputated subject by sEMG," in Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR), 2013.

[13] M. Atzori, A. Gijsberts, B. Caputo, and H. Müller, "Natural control capabilities of robotic hands by hand amputated subjects," in Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2014.

[14] M. Atzori, A. Gijsberts, I. Kuzborskij, S. Elsig, A.-G. Mittaz Hager, O. Deriaz, C. Castellini, H. Müller, and B. Caputo, "Characterization of a benchmark database for myoelectric movement classification," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 23, no. 1, pp. 73–83, 2015.

[15] M. Atzori, A. Gijsberts, C. Castellini, B. Caputo, A.-G. M. Hager, S. Elsig, G. Giatsidis, F. Bassetto, and H. Müller, "Electromyography data for non-invasive naturally-controlled robotic hand prostheses," Scientific Data, vol. 1, 2014.

[16] F. Giordaniello, M. Cognolato, M. Graziani, A. Gijsberts, V. Gregori, G. Saetta, A.-G. M. Hager, C. Tiengo, F. Bassetto, P. Brugger, B. Caputo, H. Müller, and M. Atzori, "MeganePro: myo-electricity, visual and gaze tracking integration as a resource for dexterous hand prosthetics," in IEEE International Conference on Rehabilitation Robotics, 2017.

[17] M. Cognolato, M. Graziani, F. Giordaniello, G. Saetta, F. Bassetto, P. Brugger, B. Caputo, H. Müller, and M. Atzori, "Semi-automatic training of an object recognition system in scene camera data using gaze tracking and accelerometers," in International Conference on Computer Vision Systems (ICVS), Jul. 2017.

[18] M. Cognolato, M. Atzori, and H. Müller, "Head-mounted eye gaze tracking devices: An overview of modern devices and recent advances," Journal of Rehabilitation and Assistive Technologies Engineering, vol. 5, Jun. 2018. [Online]. Available: http://journals.sagepub.com/doi/10.1177/2055668318773991

[19] M. Cognolato, M. Atzori, C. Marchesini, S. Marangon, D. Faccio, C. Tiengo, F. Bassetto, R. Gassert, N. Petrone, and H. Müller, "Multifunctional control and usage of a 3D printed robotic hand prosthesis with the Myo armband by hand amputees," BioRxiv, 2018.

[20] M. Cognolato, M. Atzori, D. Faccio, C. Tiengo, F. Bassetto, R. Gassert, and H. Müller, "Hand gesture classification in transradial amputees using the Myo armband classifier," in 7th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), Aug. 2018, pp. 156–161.

[21] M. Atzori, M. Cognolato, and H. Müller, "Deep learning with convolutional neural networks applied to electromyography data: A resource for the classification of movements for prosthetic hands," Frontiers in Neurorobotics, vol. 10, 2016.

[22] M. Atzori, A. Gijsberts, C. Castellini, B. Caputo, A.-G. M. Hager, E. Simone, G. Giatsidis, F. Bassetto, and H. Müller, "Clinical parameter effect on the capability to control myoelectric robotic prosthetic hands," Journal of Rehabilitation Research and Development, vol. 53, no. 3, pp. 345–358, 2016.

[23] M. Atzori, A.-G. M. Hager, E. Simone, G. Giatsidis, F. Bassetto, and H. Müller, "Effects of prosthesis use on the capability to control myoelectric robotic prosthetic hands," in 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Aug. 2015.

[24] F. Palermo, M. Cognolato, A. Gijsberts, B. Caputo, H. Müller, and M. Atzori, "Analysis of the repeatability of grasp recognition for hand robotic prosthesis control based on sEMG data," in IEEE International Conference on Rehabilitation Robotics, 2017.

[25] A. Scano, A. Chiavenna, L. M. Tosatti, H. Müller, and M. Atzori, "Muscle synergy analysis of a hand-grasp dataset: a limited subset of motor modules may underlie a large variety of grasps," Frontiers in Neurorobotics, 2018.

[26] F. Stival, S. Michieletto, M. Cognolato, E. Pagello, H. Müller, and M. Atzori, "A quantitative taxonomy of human hand grasps," Journal of NeuroEngineering and Rehabilitation, vol. 16, no. 28, 2019.


A comparison of different SIFT implementations for vision-guided prosthetic arms

Attila Fejér∗†, Zoltán Nagy†, Jenny Benois-Pineau∗, Péter Szolgay†, Aymar de Rugy‡, Jean-Philippe Domenger∗

∗ Laboratoire Bordelais de Recherche en Informatique, University of Bordeaux, Bordeaux, France
† Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
‡ Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, University of Bordeaux, Bordeaux, France

Abstract—This paper compares hardware implementations of the SIFT algorithm. We implemented some of the most time-consuming parts of the SIFT algorithm needed for our analysis in C/C++ on the TUL PYNQ-Z2 FPGA board. This implementation keeps the power consumption of the programmable logic part of the system low, at 0.274 W. The processing capacity is 106 images per second on a small, wearable-sized device, which allows for a real-time implementation of the whole analysis in the future.

Index Terms—FPGA, SIFT, prosthetic arm, computer vision, image processing

I. INTRODUCTION

The primary goal of our research is to develop a wearable device for the control of robotic prosthetic arms. In this scenario, the amputee wears glasses with an eye-tracker [1], and a stereo camera system is fixed on the upper-limb prosthesis. The whole vision-based control algorithm requires the detection of characteristic Scale Invariant Feature Transform (SIFT) points for matching the different camera views.

We have built and tested a wearable computing framework in software. In the present research, we move to the implementation of its most computationally heavy parts on a wearable device. Hence we have implemented an emulated digital solution for SIFT detection and compare it with other implementations, namely the software, the emulated digital and the hybrid solutions.

The full software solution is the most flexible [2], but it is very time-consuming.

The analog/hybrid solution previously proposed in [3] exhibits very low dissipated power. The most attractive feature of this computing paradigm is parallel processing, thanks to which very high computing power is achieved in analog VLSI implementations. However, the size of the memory array is small [3]–[5]. For the SIFT algorithm, the basic building block is the Gaussian pyramid computation [3]. In [3], the analog sensor/processor implementation has 88×60 processing elements, each with 4 photodiodes. The computation unit is linked to the vision sensor unit of the camera. The vision sensor array has only 176×120 pixels, implemented in 0.18 µm CMOS technology. This solution is satisfactory from a computational point of view, but it is not flexible with regard to increasing video resolutions.

Many-core computers are the current approach to solving computationally intensive problems. A wider view of the many-core concept should include the idea of mixing different kinds of resources and processors. In this case, logic processors (configurable logic blocks), arithmetic processors (DSP blocks) and a general-purpose processor (e.g. MicroBlaze) reside in the same FPGA chip and compute together, forming a heterogeneous many-core system. Our solution for a wearable device follows this architecture. With this cellular arrangement of processors and memories, new kinds of parallel algorithms have to be developed. Computational cost is a multi-parameter quantity: any algorithm solving a problem has a speed-power-area-bandwidth-accuracy metric, and these parameters should be handled and optimized simultaneously. Furthermore, implementing vision algorithms on FPGA in higher-level languages like C or C++ is preferable, as these are easier to use for architecture synthesis than low-level hardware description languages such as VHDL or Verilog.
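To illustrate this design flow, the fragment below shows the kind of HLS-friendly C++ such a system is built from: a trivial streaming per-pixel kernel that Vivado HLS can synthesize into a pipeline accepting one pixel per clock cycle. The function name, stream types and pragmas are illustrative assumptions, not our exact project source.

// Illustrative HLS-style C++ kernel (assumed interface, not the exact
// project source). Vivado HLS can turn such a loop into a pipelined
// datapath that accepts one pixel per clock cycle once the pipeline fills.
#include <hls_stream.h>
#include <ap_int.h>

typedef ap_uint<8> pixel_t;

void invert_stream(hls::stream<pixel_t> &in,
                   hls::stream<pixel_t> &out,
                   int num_pixels) {
#pragma HLS INTERFACE axis port=in
#pragma HLS INTERFACE axis port=out
    for (int i = 0; i < num_pixels; ++i) {
#pragma HLS PIPELINE II=1          // initiation interval of one clock cycle
        pixel_t p = in.read();     // read one pixel from the input stream
        pixel_t q = 255 - p;       // trivial per-pixel operation
        out.write(q);              // emit the result in the same order
    }
}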

II. SIFT IMPLEMENTATION

We started our SIFT implementation from existing C [6] and C++ [7] codes. The overall SIFT detector algorithm was split into two parts: i) the Gaussian filtering of images and the Difference of Gaussians (DoG) computation (GFDG), and ii) the keypoint/extrema detection and filtering (EDF). Thus two types of units have been defined: the GFDG unit and the EDF unit. We focus on the GFDG unit in the following.
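For comparison, the full software baseline can be reproduced along the lines of the sketch below, using the OpenCV implementation [7] (version 4.4 or later). This is a minimal illustrative example rather than our exact evaluation code, and the default detector parameters are an assumption.

// Minimal software SIFT baseline using OpenCV (>= 4.4), for comparison only.
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    cv::Mat img = cv::imread(argv[1], cv::IMREAD_GRAYSCALE);
    if (img.empty()) return 1;

    // Default parameters: 3 scales per octave, matching our hardware setup.
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();
    std::vector<cv::KeyPoint> keypoints;
    cv::Mat descriptors;
    sift->detectAndCompute(img, cv::noArray(), keypoints, descriptors);

    std::printf("%zu keypoints detected\n", keypoints.size());
    return 0;
}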

A. GFDG computation unit

This unit computes the Gaussian-filtered images and their difference (the DoG), which is the first step of the SIFT computation. In this implementation we use a 3×3 2D Gaussian kernel.
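A behavioural sketch of this GFDG step is given below. The 3×3 kernel weights (a binomial approximation of the Gaussian) and the frame-based formulation are illustrative assumptions; the synthesized unit works on a streamed 3×3 window and never holds whole frames in memory.

// Behavioural model of one GFDG step: 3x3 Gaussian filtering followed by a
// Difference of Gaussians (DoG). Illustrative only.
#include <vector>
#include <cstddef>

using Image = std::vector<std::vector<float>>;

// Example 3x3 binomial approximation of a Gaussian kernel (assumed weights).
static const float K[3][3] = {
    {1.f/16, 2.f/16, 1.f/16},
    {2.f/16, 4.f/16, 2.f/16},
    {1.f/16, 2.f/16, 1.f/16}};

Image gauss3x3(const Image &in) {
    size_t h = in.size(), w = in[0].size();
    Image out(h, std::vector<float>(w, 0.f));
    for (size_t y = 1; y + 1 < h; ++y)
        for (size_t x = 1; x + 1 < w; ++x) {
            float acc = 0.f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += K[dy + 1][dx + 1] * in[y + dy][x + dx];
            out[y][x] = acc;  // border pixels are left at zero in this sketch
        }
    return out;
}

// DoG: pixel-wise difference of two consecutive Gaussian-filtered images.
Image dog(const Image &g1, const Image &g2) {
    Image out = g1;
    for (size_t y = 0; y < g1.size(); ++y)
        for (size_t x = 0; x < g1[y].size(); ++x)
            out[y][x] = g2[y][x] - g1[y][x];
    return out;
}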

B. GFDG computation module

This module computes the Gaussian-filtered images and the DoGs within an octave. The GFDG module contains a GFDG computation unit and delay arrays; the number of delay arrays depends on the scale in the octave.
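The delay arrays act as line buffers: they hold the previously received image rows so that a complete 3×3 neighbourhood is available for every incoming pixel. The sketch below illustrates this principle with an assumed line width and floating-point samples; boundary handling at line wrap-around is omitted.

// Principle of a line-buffer based sliding 3x3 window (assumed line width
// and sample type). Two row buffers delay the incoming raster-order pixel
// stream so that all nine window samples are available at once.
#include <array>

constexpr int WIDTH = 1920;            // assumed line length (full HD)

struct Window3x3 { float w[3][3]; };   // w[0] = oldest row, w[2] = newest

class LineBufferWindow {
    std::array<float, WIDTH> line1{};  // pixels delayed by one line
    std::array<float, WIDTH> line2{};  // pixels delayed by two lines
    float win[3][3] = {};
    int x = 0;

public:
    // Feed one pixel in raster order; the returned window becomes valid
    // after roughly two full lines plus two extra pixels have been consumed.
    Window3x3 push(float pixel) {
        for (int r = 0; r < 3; ++r) {  // shift the window left by one column
            win[r][0] = win[r][1];
            win[r][1] = win[r][2];
        }
        win[0][2] = line2[x];          // same column, two lines back
        win[1][2] = line1[x];          // same column, one line back
        win[2][2] = pixel;             // current line
        line2[x] = line1[x];           // advance the line buffers
        line1[x] = pixel;
        x = (x + 1) % WIDTH;

        Window3x3 out;
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c) out.w[r][c] = win[r][c];
        return out;
    }
};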

The SIFT points are detected as extrema in three consecutive DoGs within an octave, each computed by one module. To reduce computation time, Gaussian filtering, DoG computation and scale-space extrema (SSE) detection are all realised in one clock cycle and in parallel. The last Gaussian-filtered image is downsampled and fed into the input of the next octave, where processing is then carried out by the next GFDG module.
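For reference, the test performed by the SSE unit on each candidate position can be summarised by the behavioural sketch below: the centre sample of the middle DoG is accepted if it is a strict maximum or minimum over its 26 neighbours in the three consecutive DoGs. The data types are illustrative; in hardware all comparisons are evaluated in parallel within the clock cycle.

// Scale-space extremum test over three consecutive DoG images (behavioural
// sketch). d0, d1, d2 are 3x3 neighbourhoods taken at the same position in
// three consecutive DoGs; the candidate is the centre of d1.
bool is_scale_space_extremum(const float d0[3][3],
                             const float d1[3][3],
                             const float d2[3][3]) {
    const float c = d1[1][1];
    bool is_max = true, is_min = true;
    const float (*layers[3])[3] = {d0, d1, d2};
    for (int l = 0; l < 3; ++l)
        for (int r = 0; r < 3; ++r)
            for (int k = 0; k < 3; ++k) {
                if (l == 1 && r == 1 && k == 1) continue;  // skip the candidate
                const float v = layers[l][r][k];
                if (v >= c) is_max = false;  // not a strict maximum
                if (v <= c) is_min = false;  // not a strict minimum
            }
    return is_max || is_min;
}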

III. EXPERIMENTS AND RESULTS

Table I shows the resource usage of the complete system with the following parameters: input image size of 1920 × 1080, single-precision floating-point arithmetic, 3 scales and 1 octave (3 GFDG units and 1 SSE maximum unit).


TABLE I
ONE OCTAVE RESOURCE USAGE ON THE TUL PYNQ-Z2 FPGA BOARD (FROM VIVADO 2017.4)

Resource   Utilization with I/O   Utilization without I/O   Available
LUT        25822                  18873                     53200
LUTRAM     4402                   3602                      17400
FF         34176                  28492                     106400
BRAM       11                     11                        140
DSP        141                    141                       220
BUFG       1                      0                         32

TABLE II
COMPARISON OF GAUSSIAN OCTAVE CALCULATIONS ON DIFFERENT PLATFORMS

                   Intel Xeon E-2146G   TUL PYNQ-Z2   ARM Cortex-A9   Xilinx UltraScale+ ZCU104   CMOS Vision Sensor
pixels / second    300 million          220 million   1.8 million     450 million                 2.6 million
resolution         1920 x 1080          1920 x 1080   1920 x 1080     1920 x 1080                 176 x 120
frames / second    145                  106           0.86            217                         125
dissipated power   80 W                 0.274 W       2.5 W           0.695 W                     70 mW


The generated hardware uses ∼36% of the available resources. There are enough resources left to implement further SIFT steps and additional algorithms on the TUL PYNQ-Z2 FPGA board.

Table II shows how many frames per second can be processed on each hardware platform. The results show that the FPGAs, the Intel server CPU and the CMOS vision sensor can process more than 100 frames per second. However, the energy consumption of the Intel server CPU is higher than that of the FPGAs and the CMOS vision sensor, which rules it out in our case because it is not portable. The CMOS vision sensor solution only handles 176 × 120 pixel images, which is much smaller than what the FPGAs support.

The power dissipation of the system running the current partial SIFT module is low: 1.672 W in total. The embedded ARM processor dissipates 1.256 W, while the FPGA programmable logic consumes only 0.274 W; the current partial SIFT module runs entirely in the programmable logic. According to [8], the TUL PYNQ-Z2 has a maximum power dissipation of 2.5 W. The CMOS vision sensor has the lowest energy consumption; however, its input resolution and the number of processed pixels are lower than those of the FPGAs. Comparing power consumption, processing speed and input image size across the different architectures shows that the FPGA is the best choice for building a wearable device.

The current implementation runs at a 200 MHz clock frequency and can process 106 full HD (1920 × 1080) images per second, which is higher than the frame rate of the current input video. Therefore real-time processing is achievable.

IV. CONCLUSIONS AND FUTURE WORKS

The FPGA implementation of the key steps of the SIFT algorithm was presented. The results show that a wearable device with a processing rate of 106 images per second can be developed using an FPGA.

They also show that there are enough resources to implement further SIFT steps on the TUL PYNQ-Z2 FPGA board, although we also plan to use larger-capacity FPGA boards such as the ZCU104.

The power consumption of the FPGA programmable logic is 0.274 W, which is almost 300 times less than that of a high-end CPU. The FPGA board is therefore a better option than CPUs for accelerating the algorithms in this project. In the future, we will implement the whole SIFT matching process on the FPGA.

The FPGA-based solution is preferred because it provides sufficient computing power (more than 100 frames per second). FPGA-based solutions also support the higher image resolutions required in our case, and the dissipated power of the FPGA is low enough (below ∼1 W) for a wearable device.

V. ACKNOWLEDGEMENT

The support of the Széchenyi 2020 Program, of the Human Resource Development Operational Program, and of the Program of Integrated Territorial Investments in Central Hungary (project numbers EFOP-3.6.2-16-2017-00013 and 3.6.3-VEKOP-16-2017-00002), as well as of the European Structural and Investment Funds, CNRS and the Balaton PHC, is gratefully acknowledged.

REFERENCES

[1] I. González-Díaz, J. Benois-Pineau, J.-P. Domenger, D. Cattaert, and A. de Rugy, "Perceptually-guided deep neural networks for ego-action prediction: Object grasping," Pattern Recognition, vol. 88, pp. 223–235, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320318304011

[2] B. Buzasi, "Depth estimation for robotic prosthesis arm with object-to-grasp recognition from eye-tracker glasses, application to neuroprosthesis control," Master's thesis, UBx, PPCU, UAM, Jul. 2018.

[3] M. Suárez, V. M. Brea, J. Fernández-Berni, R. Carmona-Galán, D. Cabello, and A. Rodríguez-Vázquez, "Gaussian pyramid extraction with a CMOS vision sensor," in 2014 14th International Workshop on Cellular Nanoscale Networks and their Applications (CNNA), Jul. 2014, pp. 1–2.

[4] A. Rodríguez-Vázquez, R. Domínguez-Castro, F. Jiménez-Garrido, S. Morillas, A. García, C. Utrera, M. D. Pardo, J. Listan, and R. Romay, "A CMOS vision system on-chip with multi-core, cellular sensory-processing front-end," in Cellular Nanoscale Sensory Wave Computing, T. Roska, C. Baatar, and W. Porod, Eds. Springer US, Oct. 2009, pp. 129–146.

[5] Toshiba. (2019) Sos02. [Online]. Available: http://www.toshiba-teli.co.jp/en/products/industrial/sps/sps.htm

[6] R. Hess, "An open-source SIFT library," in Proceedings of the 18th ACM International Conference on Multimedia, ser. MM '10. New York, NY, USA: ACM, 2010, pp. 1493–1496. [Online]. Available: http://doi.acm.org/10.1145/1873951.1874256

[7] G. Bradski, "The OpenCV Library," Dr. Dobb's Journal of Software Tools, 2000.

[8] Berten DSP, "GPU vs FPGA performance comparison," Tech. Rep., 2016.
