
Mobile control interface for modular robots

Luc Girod

Student: Luc Girod, Supervisors: Stéphane Bonardi, Massimo Vespignani, Professor: Auke Ijspeert

June 6, 2014

Abstract

The goal of this project was to create an augmented reality (AR) interface for the modular robots called Roombots. A 3D simulation of Roombots already exists, but it does not use the capabilities of augmented reality to improve the user experience. We envision an application that allows the user to watch the simulated Roombots superimposed on the live camera view, from any point of view, by simply moving around the environment as if real Roombots were in the room. We considered many possibilities and chose to use a SLAM-like algorithm, because it relies on no sensors or external beacons other than the camera. I first began to implement such a tool myself, and finally decided to modify an augmented reality software from Oxford called PTAM (Parallel Tracking and Mapping) to match our needs.

1 Introduction

In this project we would like to create an application that displays an augmented reality (AR) simulation of modular robots called Roombots[1] in their real environment. Roombots are small modular robots that can attach themselves together like building blocks to create self-assembling and self-reconfigurable furniture. The solution should be able to run this simulation as an augmented scene that stays in place regardless of the user's position and viewing angle. Many solutions already exist, but we chose a system that needs neither extra devices nor prior knowledge of the environment. I first began to create such a system on my own, but we then found an open source program that we could modify to meet our requirements.

Figure 1: Two Roombots connected together. (http://biorob.epfl.ch/roombots)

[1] http://biorob.epfl.ch/

Figure 2: A 3D scene of Roombots in the existing non-AR application.

2 Considered solutions

Among the considered solutions, the first was to recognize an already known pattern that indicates our orientation; knowing the camera projection, the distance to the pattern can then be deduced. It is also possible to get this distance with a depth sensor[2] such as a Kinect[3]. The problem with this method is that the pattern must always be in the field of view. Using sensors mounted on the camera was another idea, but the position and orientation error increases with each movement of the user. We decided to use a SLAM-like[4] algorithm (SLAM stands for simultaneous localization and mapping).

[2] A depth sensor is a device that gives information about the distance of each object in the view.
[3] http://www.microsoft.com/en-us/kinectforwindows/
[4] https://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping


A SLAM algorithm is a set of techniques to estimate the camera localization and to reconstruct the environment in 3D while the camera is moving. The advantage of such an algorithm is that it builds a map of the room by itself, without markers: it needs no external sensors and no prior knowledge of the environment. While developing an augmented reality API, we found a good solution from Oxford[5] University called PTAM[6] (Parallel Tracking and Mapping), which I tested and modified to match our needs.

3 First work

3.1 Feature detection

In all SLAM algorithms we need to detect points of interest[7] in the image. Detecting points of interest is a kind of feature detection[8]. Feature detection refers to the techniques used to detect interesting portions of a picture such as edges, corners or curves. A point of interest is a point in the image that is stable under image perturbations such as those that occur when the camera moves. There are multiple ways to detect such points; the best known algorithms are SUSAN[9], Harris[10] and Shi-Tomasi[11]. I chose to use the Harris corner detector because this algorithm is easy to implement and there are many example implementations. To implement the Harris detector, we acquire an image from the camera and use OpenGL to draw it as a texture filling the window on which the scene is drawn. Then, in a shader[12], the neighbors of each pixel are read, a value is computed with the Harris detector and written to the frame-buffer[13], and the result is saved to a new texture. Once the Harris detector has produced this new texture with corners in white, the program loops over each pixel of the new picture and adds a point wherever there is a white spot (white pixels represent high values, black pixels low values).
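As an illustration, the CPU-side scan of this last step could look like the following minimal sketch; it is not the project's actual code, and the function name and the threshold are assumed. It reads back the Harris response image from the GPU and keeps every pixel whose value is above the threshold.

// Minimal sketch (assumed helper): scan the Harris response image read back
// from the GPU and keep the pixels above a chosen threshold ("white spots").
#include <cstdint>
#include <vector>

struct Corner { int x, y; };

std::vector<Corner> extractCorners(const std::vector<uint8_t>& response,
                                   int width, int height, uint8_t threshold)
{
    std::vector<Corner> corners;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (response[y * width + x] > threshold)   // bright pixel = strong corner response
                corners.push_back({x, y});
    return corners;
}

In practice one would also apply non-maximum suppression, so that a single corner does not produce a whole cluster of points.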

3.2 Motion blur

A big problem to handle is motion blur. Motion blur is the integral of the image over the exposure time, which is never zero for a real camera.

[5] http://www.ox.ac.uk/
[6] http://www.robots.ox.ac.uk/~gk/PTAM/
[7] https://en.wikipedia.org/wiki/Interest_point_detection
[8] https://en.wikipedia.org/wiki/Feature_detection
[9] http://users.fmrib.ox.ac.uk/~steve/susan/
[10] www.bmva.org/bmvc/1988/avc-88-023.pdf
[11] www.ai.mit.edu/courses/6.891/handouts/shi94good.pdf
[12] A shader is a program that runs on the graphics card (GPU).
[13] The place where OpenGL draws.

When an object moves, or when the camera moves, the object's color is spread along the direction of the motion. This effect makes many features disappear, because it decreases the value of the derivatives that the feature detector relies on. Motion blur can be represented by a function that spreads a point over the picture, the point spread function (PSF), and its effect on the image can be expressed as a convolution with this function.

Figure 3: The effect of a convolution with a point spread function on a picture containing an object. (Adapted from https://fr.wikipedia.org/wiki/Fichier:Convolution_Illustrated_eng.png)

4 Harris detector

The Harris[14] detector detects corners by assuming that at a border there is a strong change in color intensity, and that if a strong change occurs in two directions there is a corner. Mathematically, if we see the picture as a function of (x, y), a change in intensity in one direction is the derivative in that direction. These derivatives are simply calculated by taking differences between the pixels surrounding the current point. The derivatives at the red pixel of Figure 4 are then

Figure 4: A schema of a pixel and its neighbors (a, b, c, d, e, f, g, h), illustrating how the derivatives are computed.

d_x = 2 (I_e - I_d) + I_h - I_a + I_c - I_f
d_y = 2 (I_b - I_g) + I_a - I_f + I_c - I_h

[14] www.bmva.org/bmvc/1988/avc-88-023.pdf


where I represents the intensity of the corresponding pixel and d_x, d_y the derivatives in x and y.

To detect two large derivatives in different directions, we construct the following matrix from the derivatives around the current point:

A = \sum_{x,y} w(x, y) \begin{bmatrix} d_x^2 & d_x d_y \\ d_y d_x & d_y^2 \end{bmatrix}

where w(x, y) is a weighting function. If both eigenvalues of A are large and positive, then the point is probably a corner. The Harris detector determines whether this is the case by computing

det(A) - k \, \mathrm{trace}(A)^2
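As a concrete illustration, the response of a single pixel can be computed from the accumulated matrix entries as in the sketch below; this is an assumed helper, not the shader actually used in the project.

// Minimal sketch: Harris response for one pixel, given the accumulated
// structure-matrix entries sxx = sum(dx*dx), syy = sum(dy*dy), sxy = sum(dx*dy).
float harrisResponse(float sxx, float syy, float sxy, float k)
{
    float det   = sxx * syy - sxy * sxy;   // det(A)
    float trace = sxx + syy;               // trace(A)
    return det - k * trace * trace;        // large positive value means a corner
}

Values of k around 0.04-0.06 are commonly used for the Harris detector.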

Figure 5: The result image of the Harris detector. The intersections of a chessboard are clearly shown as white spots.

In the GLSL[15] (OpenGL Shading Language) part, which runs on the GPU, the matrix A is constructed like this:

mat2 m = mat2(0.0);
for (int i = 0; i < 8; i++) {
    m[0][0] += gx[i] * gx[i] * gw[i];   // sum of weighted dx^2
    m[1][1] += gy[i] * gy[i] * gw[i];   // sum of weighted dy^2
    m[0][1] += gx[i] * gy[i] * gw[i];   // sum of weighted dx*dy
}
m[1][0] = m[0][1];                      // the matrix is symmetric

where gx[i], gy[i] and gw[i] represent, respectively, the derivative in x, the derivative in y, and the weight, which is computed with a Gaussian[16] of the pixel position relative to the sub-image kernel[17] and an experimentally fixed parameter.

5 Handling motion blur

Since the exposure time of a camera is not instantaneous, the movement of the camera is integrated into the picture. This creates a blur. A trivial solution to avoid this problem is to decrease the exposure time.

[15] https://www.opengl.org/documentation/glsl/
[16] http://en.wikipedia.org/wiki/Gaussian_function
[17] http://en.wikipedia.org/wiki/Kernel_(image_processing)

But that is not always possible, and the exposure time can never be zero. Another solution is blind deconvolution, the process of inverting the effect of a convolution without prior knowledge of the convolution (here we want to invert the effect created by the motion blur), but there are faster methods. Sharpening the image to make corners more visible was tested, but this process adds noise: if the sharpening is strong enough to remove the blur on the corners, many spurious points are added because of the noise, and these points are not stable since the noise changes a lot between frames. Another option is to determine the average direction and intensity of the motion blur and then to compensate for it.

According to the paper by Xiaogang Chen[18], the effect of motion blur is that high frequencies decrease along the motion direction. In a picture, high frequencies are pixel values that change rapidly in space; the blur lowers them by blending colors along its direction. The authors of this paper indicate that the general motion blur direction is the one in which the squared directional derivative is smallest. Following that paper, the derivative of the image in the direction at angle k from the horizontal is

\begin{bmatrix} d_x & d_y \end{bmatrix} \begin{bmatrix} \cos(k) \\ \sin(k) \end{bmatrix}

Then, on a kernel of size M × N, the sum of the squared directional derivatives is

J(k) = \sum_{x=1}^{M} \sum_{y=1}^{N} \left( \begin{bmatrix} d_x & d_y \end{bmatrix} \begin{bmatrix} \cos(k) \\ \sin(k) \end{bmatrix} \right)^2

J(k) is minimal in the blur direction. In my implementation, I use sub-image kernels of 32×32 pixels.
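The minimum can be found by simply testing a set of candidate angles. The following sketch is an assumed helper, not the project's actual code; it takes the per-pixel derivatives of one 32×32 kernel and returns the angle that minimizes J(k).

// Minimal sketch: estimate the motion blur direction of one kernel by finding
// the angle k that minimizes J(k). dx and dy hold the per-pixel derivatives
// of the kernel (32*32 values each, row-major).
#include <cmath>
#include <vector>

double blurDirection(const std::vector<double>& dx, const std::vector<double>& dy)
{
    const double pi = 3.14159265358979323846;
    double bestAngle = 0.0;
    double bestJ = 1e300;

    for (int deg = 0; deg < 180; ++deg) {          // candidate directions, 1 degree apart
        double k = deg * pi / 180.0;
        double c = std::cos(k), s = std::sin(k);
        double J = 0.0;
        for (int i = 0, n = (int)dx.size(); i < n; ++i) {
            double d = dx[i] * c + dy[i] * s;      // directional derivative
            J += d * d;                            // accumulate J(k)
        }
        if (J < bestJ) { bestJ = J; bestAngle = k; }
    }
    return bestAngle;                              // direction of minimal J(k): the blur direction
}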

Figure 6: The direction of the motion blur at each 32×32 pixel sub-image kernel, indicated by a small red line.

[18] www.pami.sjtu.edu.cn/people/xgchen/chen_ICIP2010.pdf


6 How SLAM algorithms work

In every SLAM-like algorithm the first part is the initialization. The first step is to detect features that can be tracked from different points of view (Harris, Shi-Tomasi). These features must then be detected again from another point of view. Once we have several pictures, each with its own set of features (which are shared but appear at different positions because the camera has moved), the features are associated so that we know that a given feature in one picture corresponds to the same world position in another image. For this we use various kinds of feature descriptors[19]. In PTAM, for each detected point of interest, a small 8×8 patch of the picture is stored and warped with the estimated camera transformation before the patches are compared. Once a set of points matches the points in the other image, it becomes possible to find their positions in the world.
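As an illustration, an SSD comparison of two 8×8 patches could look like the sketch below. This is only illustrative: it is not PTAM's PatchFinder code and it omits the warping by the estimated camera transformation mentioned above.

// Minimal sketch: compare an 8x8 reference patch with a candidate patch using
// the sum of squared differences (SSD); the smaller the score, the better the match.
#include <cstdint>

int patchSSD(const uint8_t ref[8][8], const uint8_t cand[8][8])
{
    int ssd = 0;
    for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x) {
            int d = int(ref[y][x]) - int(cand[y][x]);
            ssd += d * d;
        }
    return ssd;   // a match is accepted if the score is below some threshold
}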

Figure 7: A scheme of the matching of points between two frames taken from different points of view, where the transformation between the two camera positions is represented by the matrix T.

If we know the transformation P that projects a point in the world to a point in the camera image, and the transformation T that describes the displacement of the camera between the two images, then we have a system of equations. The projection from a world point to a point in the image acquired by the camera can be obtained by calibrating the camera:

P (x, y, z)^T = (x_1, y_1)^T
P \cdot T (x, y, z)^T = (x_2, y_2)^T

From these equations we can recover the position (x, y, z) of the point relative to the camera (a stereo technique). A plane is then computed and the positions of the 3D points are expressed relative to this plane. Once the initialization is done, the position of the camera relative to the virtual plane is stored together with the points. In PTAM, the structure that stores this information is called a key-frame.

[19] www.cs.toronto.edu/~kyros/courses/2503/Handouts/features.pdf

The displacement of the camera is then estimated using an algorithm that finds the camera transformation between two camera poses (usually a 5-point or 8-point algorithm[20]). The error is then corrected using the previously generated key-frames. New key-frames are added once the precision is above a defined threshold, according to a heuristic (for example the number of points used for tracking). Adding new key-frames is done in the mapping part, while finding interest points and computing their positions is done in the tracking part. Generally both work together in one process, but in PTAM they run in parallel in two different threads. The points of interest of each key-frame are then reprojected and all key-frames are readjusted.

Figure 8: A schema representing the key-frames K and their associated sets of points.

The estimated camera position is not perfect and needs to be adjusted with respect to the previous key-frames. In PTAM this process is called bundle adjustment, and this is where PTAM uses the second order minimization. When the camera does not look at any of the known environment, this error stays high, and this is exactly what we use to determine whether the scene must be drawn or not.

Figure 9: This schema represents the correctionbetween the estimated transformation and the realtransformation.

This process keeps tracking points, updating the camera position and minimizing the error as the program runs.

[20] https://en.wikipedia.org/wiki/Eight-point_algorithm


7 PTAM

PTAM[21] is an augmented reality software developed at Oxford[22] University. It comes with an example of a 3D augmented scene.

7.1 Pros/Cons

To get an idea of how well it matches our needs, I tested it in a regular room with a low-end Logitech[23] C170 camera on a regular laptop. I tried different environments: looking at a table with nothing on it, at a table with many objects, and at the ground, and for each environment I moved the camera to many different orientations. I also covered the lens of the camera and pointed the camera somewhere else to see how PTAM handles these cases, and I viewed the augmented scene from a very close distance and then from a very far one. What we observed is that PTAM can have some difficulty deriving a precise estimate of distances. When the scene is watched from a distance of less than 10-15 cm, it does not stay where it should be and jumps to different places in the field of view. On the other hand, up to a distance of 3 m the scene is drawn without problem as long as the detected features are still visible in the picture taken by the camera. A good point is that when PTAM loses its "view", it is able to recover the landmarks instantly as soon as the camera looks back at the scenery.

PTAM needs to detect features in the room, so in a completely empty room, or one with little contrast, the scene will jump between multiple positions.

Another problem is that PTAM was created to run in a small area. This means that the points of interest lie only in a small area and, once it has a sufficient number of them, it will not add any new points. A solution to this problem is to use multiple maps or to increase the number of points that can be added. There is an implementation called PTAMM[24], a modification of PTAM that uses multiple maps.

As PTAM uses only a single camera, it cannot have precise information about the size of the objects. The factor that affects this is the displacement of the camera during the creation of the map by stereo in the initialization phase.

[21] http://www.robots.ox.ac.uk/~gk/PTAM/
[22] http://www.ox.ac.uk/
[23] http://www.logitech.com/de-ch/home
[24] http://www.robots.ox.ac.uk/~bob/research/research_ptamm.html

Figure 10: This is a screenshot of PTAM during theinitialization phase.

The last problem is that the scene is drawn even when the camera does not point toward the known map. For example, the user looks at the augmented scene, then looks behind himself and sees the scene he was watching before appear out of nowhere.

Figure 11: Picture taken when the camera is looking toward the scene.

Figure 12: Picture taken when the camera is no longer looking at the scene.


Multiple solutions were considered to tackle this problem. It is possible to detect the viewing angle with the transformation matrix provided by PTAM and then, when we know we are outside the scene, to search for a pattern and not draw the scenery until this pattern is found. Or we could use external sensors to acquire information about where we are looking. Note that it is not possible to simply rely on angles to determine whether the scene is visible, because the transformations are determined from the key-frames and the matrix comes from an iterative algorithm. The solution adopted is to draw only if the score returned by the ESM algorithm is high enough. ESM stands for Efficient Second-order Minimization, an algorithm for estimating a homography transformation by minimizing the sum of squared differences (SSD) between a reference template and the current template; in PTAM it is used to estimate the transformation parameters. To determine at which threshold we can draw the scene, I simply printed the score obtained by the ESM and checked experimentally which values were obtained when facing toward the scene and toward other directions. This gives good results, since the score differs by a factor of about 10 between the two cases.
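The resulting gating rule is conceptually a one-line test, as in the sketch below; the threshold value and the names are illustrative, not taken from the PTAM source.

// Minimal sketch of the adopted gating rule.
const double kDrawThreshold = 0.5;   // hypothetical value, tuned from the printed ESM scores

// Returns true when the ESM score indicates that the camera is looking at the
// mapped part of the environment, so the augmented scene may be drawn.
bool shouldDrawScene(double esmScore)
{
    return esmScore > kDrawThreshold;
}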

8 PTAM design

PTAM is programmed in C++ and relies on a library called libCVD, also from Oxford. libCVD[25] is a computer vision library but, unlike OpenCV[26], it is designed to be small and fast rather than a complete framework. PTAM has two threads: one does the tracking while the other updates the map.

While the tracking thread keeps detecting features, the mapping thread adds new key-frames, integrates new points when they can be associated between key-frames, and performs adjustment over the existing key-frames. A simplified sketch of this two-thread structure is given after Figure 13.

[25] http://www.edwardrosten.com/cvd/cvd/html/
[26] http://opencv.org/

Figure 13: The schema found in the paper by the developers of PTAM (http://www.robots.ox.ac.uk/~gk/publications/KleinMurray2007ISMAR.pdf). It clearly shows how the mapping works.
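A simplified sketch of the two-thread structure is shown below; the loop bodies are only comments and the names are illustrative, not PTAM's actual classes.

// Simplified sketch of PTAM's parallel tracking and mapping structure.
#include <atomic>
#include <thread>

std::atomic<bool> running{true};

void trackingLoop()
{
    while (running) {
        // grab a frame, detect features and estimate the camera pose
        // against the current map (runs for every frame)
    }
}

void mappingLoop()
{
    while (running) {
        // when tracking is good and the camera has moved enough, add a
        // key-frame, triangulate new points and run bundle adjustment
    }
}

int main()
{
    std::thread tracker(trackingLoop);   // tracking thread
    std::thread mapper(mappingLoop);     // mapping thread
    // ... run until the user quits ...
    running = false;
    tracker.join();
    mapper.join();
    return 0;
}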

9 Adaptation of PTAM for a custom scene

PTAM already provides a simple scene, EyeGame, but it is possible to replace it with another one. The provided scene has its own class, which is instantiated in ARDriver, where all of its functions are called. It is therefore easy to write a new class that uses OpenGL to draw our own AR scene; a sketch of such a class is given below. I created a small system that can easily be used to create a Roombots interface, and I added a mesh[27] loader to load a Roombots mesh component. The mesh loader reads meshes in the Obj[28] format.

[27] A mesh describes a 3D object with 3D vectors and texture coordinates to be applied to it.
[28] http://www.martinreddy.net/gfx/3d/OBJ.spec
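Below is a minimal sketch of what such a custom scene class can look like. The interface is modeled on how the provided scene is used from ARDriver, but the exact names and signatures here are assumptions, not copies of the PTAM code.

// Minimal sketch of a custom AR scene class (illustrative names and signatures).
#include <GL/gl.h>

class RoombotsScene
{
public:
    void Init()
    {
        // load the Roombots Obj meshes, create textures, etc.
    }

    // Called each frame with the inverse camera transformation so that the
    // virtual Roombots stay fixed in the real environment.
    void DrawScene(const double cameraFromWorld[16])
    {
        glMatrixMode(GL_MODELVIEW);
        glPushMatrix();
        glMultMatrixd(cameraFromWorld);   // place the scene in camera space
        // ... draw the loaded Roombots meshes here ...
        glPopMatrix();
    }
};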


Appendices

Details on the PTAM source code

ARDriver

• Draws a plane.

• Creates an instance of the AR scene, calls its drawing function with the inverse camera transformation as argument.

ATANCamera

• Calibrates the camera and computes the projection matrix.

Bundle

• Manages Keyframes.

• Minimizes the error in the transformation matrix between key-frames.

CalibCornerPatch, CalibImage

• Detects chessboard corners for the calibration phase.

CameraCalibrator

• It is the tool which permits the user to calibrate the camera.

EyeGame

• The provided initial AR scene.

GLWindow2

• HUD, input events.

GLWindowMenu

• Displays the application menus.

HomographyInit

• Computes an approximation of the camera transformation between two points of view.

KeyFrame

• Creates new key-frames.

• Stores the set of points associated with a camera position.

Map

• Stores MapPoints and deletes bad points.

• Stores key-frames.

MapMaker

• Creates the initial map with stereo initialization.

• Recomputes the key-frames' transformation matrices to correct the errors.

• Generates new key-frames.

• Looks for shared points between key-frames to add them to the current map.

MapPoint

• Stores the world position of a point.

• Stores a pointer to the position of the image patch where the point appears.

MapViewer

• Simply draws the map in map view mode.

MEstimator

• Implementation of an M-estimator.[29]

MiniPatch

• It is a small portion of the image.

PatchFinder

• A feature descriptor used to compare new points with previous ones.

Relocaliser

• Estimates a camera rotation.

ShiTomasi

• Detects corners with a modified Harris detector.

SmallBlurryImage

• Creates a small copy of the image for a key-frame. Used by the relocaliser.

SmallMatrixOpts

• Tools for basic operations on matrices, such as matrix inversion.

System

• Grabs the video, launches the threads and runs the main loop.

Tracker

• Manages the whole tracking procedure.

[29] https://research.microsoft.com/en-us/um/people/zhang/INRIA/Publis/Tutorial-Estim/node24.html

VideoSource


• Opens the video device and provides the video source.

Image acquisition

In my own implementation, the image from the camera is taken with Video4Linux in the yuv422 pixel format. This format stores 4 bytes per 2 pixels: gray-scale (luminance) information for each pixel and color information for the pair. The image is then converted to RGB for display and to 1-byte gray-scale for tracking; note that PTAM and many other computer vision systems do the same.
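The gray-scale extraction itself is a simple loop, as in the sketch below. This is illustrative code, not the project's exact implementation, and it assumes the common YUYV byte order (Y0 U Y1 V).

// Minimal sketch: extract the 1-byte gray-scale image used for tracking from a
// yuv422 (YUYV) buffer by keeping only the luminance bytes.
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<uint8_t> yuv422ToGray(const std::vector<uint8_t>& yuyv,
                                  int width, int height)
{
    std::vector<uint8_t> gray(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < gray.size(); ++i)
        gray[i] = yuyv[2 * i];   // every second byte is a Y (luminance) sample
    return gray;
}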
