
4º SBAI - Simpósio Brasileiro de Automação Inteligente, São Paulo, SP, 08-10 de Setembro de 1999

APPLICATIONS OF CELLULAR NEURAL NETWORKS TO IMAGE UNDERSTANDING

Mariofanna Milanova Paulo E. M. Almeida Marcelo Godoy Simões

Department of Mechanical Engineering - Escola Politécnica - USP
Av. Prof. Mello Moraes, 2231 - Cid. Universitária - CEP: 05.508-900 - São Paulo - SP - BRASIL
e-mails: [email protected], [email protected] - FAX: +55-(0)11-813-1886

Department of Research and Graduate Studies / CEFET-MG
Av. Amazonas, 7675 - Nova Gameleira - CEP: 30.510-000 - Belo Horizonte - MG - BRASIL
e-mail: [email protected] - FAX: +55-(0)31-319-5212

ABSTRACT

The Cellular Neural Networks (CNN) model is now a paradigm of cellular analog programmable multidimensional processor arrays with distributed local logic and memory. CNNs consist of many parallel analogue processors computing in real time. One desirable feature is that these processors, arranged in a two-dimensional grid, only have local connections, which lend themselves easily to VLSI implementations. The connections between these processors are determined by a cloning template, which describes the strength of nearest-neighbour interconnections. The cloning templates are space-invariant, meaning that all the processors have the same relative connections.

In this paper we first describe the architecture of CNN. Next, a new application of CNN to 3D scene analysis is studied.

KEYWORDS
Cellular neural networks, associative memory, object recognition, image sequences, optical flow.

1 INTRODUCTION

The contrast between artificial and natural vision systems is due to the inherent parallelism and continuous time and signal values of the latter. In particular, the cells of the natural retina combine photo transduction and collective parallel processing for the realization of low-level image processing operations (feature extraction, motion analysis, etc.) concurrently with the acquisition of the image. Having collected spatio-temporal information from the imagery, there exist spatial representations of this information that allow us to extract parameters necessary for 3D object recognition. The Cellular Neural Network paradigm is considered as a unifying model for spatio-temporal properties of the visual system [1], [2].

This paper is organised as follows. In Section 2 we briefly review the architecture of CNN. In Section 3 the application of CNN to low-level image processing is presented. In Section 4 a design approach of CNNs for associative memories is implemented; design in a CNN can be done in a systematic manner by a synthesis procedure which stores all desired memory patterns as reachable memory vectors. In Section 5 we show a new application of CNNs, using them for the recognition of optical flow field characteristics. Experimental results are presented in Section 6.

2 ARCHITECTURE OF CELLULAR NEURAL NETWORKS

Cellular Neural Networks (CNN) and the CNN universal machine (CNN-UM) were invented in 1988 and 1992, respectively [1]-[3]. The most general definition of such networks is that they are arrays of identical dynamical systems, the cells, that are only locally connected [2]. In the original Chua and Yang model each cell is a one-dimensional dynamical system; it is the basic unit of a CNN. Any cell is connected only to its neighbour cells, i.e. adjacent cells interact directly with each other. Cells not in the immediate neighbourhood have an indirect effect because of the propagation effects of the dynamics of the


network. The cell located in position (i,j) of a two-dimensional M x N array is denoted by $C_{ij}$, and its r-neighbourhood $N_r(i,j)$ is defined by

$$N_r(i,j) = \{ C_{kl} \mid \max\{|k-i|, |l-j|\} \le r;\; 1 \le k \le M,\; 1 \le l \le N \} \qquad (1)$$

where the size of the neighbourhood r is a positive integer number.

Each cell has a state x, a constant external input u, and output y. The equivalent block diagram of a continuous-time cell is shown in Figure 1. The first-order non-linear differential equation defining the dynamics of a cellular neural network can be written as follows:

$$C\,\frac{dx_{ij}(t)}{dt} = -\frac{1}{R}\,x_{ij}(t) + \sum_{C_{kl} \in N_r(i,j)} A(i,j;k,l)\, y_{kl}(t) + \sum_{C_{kl} \in N_r(i,j)} B(i,j;k,l)\, u_{kl} + I$$

$$y_{ij}(t) = \frac{1}{2}\left( |x_{ij}(t) + 1| - |x_{ij}(t) - 1| \right) \qquad (2)$$

where $x_{ij}$ is the state of cell $C_{ij}$, $x_{ij}(0)$ is the initial condition of the cell, C and R conform the integration time constant of the system, and I is an independent bias constant.

From [2], $y_{ij}(t) = f(x_{ij}(t))$, where f can be any convenient non-linear function.
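To make the dynamics (2) concrete, the following minimal NumPy sketch (ours, not from the paper) integrates a CNN array with forward-Euler steps. The template values, step size, and zero padding at the array boundary are illustrative assumptions, not parameters used later in the paper.

import numpy as np

def cnn_step(x, u, A, B, I, h=0.05, R=1.0, C=1.0):
    # One forward-Euler step of the Chua-Yang dynamics, eq. (2).
    # x: (M, N) cell states; u: (M, N) constant input image;
    # A, B: (2r+1, 2r+1) feedback / control cloning templates; I: scalar bias.
    y = 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))  # piecewise-linear output
    r = A.shape[0] // 2
    yp = np.pad(y, r)  # zero padding at the boundary (an assumption)
    up = np.pad(u, r)
    M, N = x.shape
    fb = np.zeros_like(x)  # feedback term: sum of A(i,j;k,l) * y_kl
    ct = np.zeros_like(x)  # control term:  sum of B(i,j;k,l) * u_kl
    for dk in range(-r, r + 1):
        for dl in range(-r, r + 1):
            fb += A[dk + r, dl + r] * yp[r + dk:r + dk + M, r + dl:r + dl + N]
            ct += B[dk + r, dl + r] * up[r + dk:r + dk + M, r + dl:r + dl + N]
    return x + h * (-x / R + fb + ct + I) / C

# Illustrative run with an assumed symmetric feedback template.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 0.0]])
B = np.zeros((3, 3))
u = np.zeros((32, 32))
x = np.random.uniform(-1.0, 1.0, (32, 32))  # initial state doubles as input image
for _ in range(200):
    x = cnn_step(x, u, A, B, I=0.0)
y = 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))  # final output image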

Figure 1: Block diagram of one cell.

The matrices A(.) and B(.) are known as cloning templates. A(.) acts on the output of neighbouring cells and is referred to as the feedback operator. B(.) in turn affects the input control and is referred to as the control operator. Of course, A(.) and B(.) are application dependent. A constant bias I and the cloning templates determine the transient behaviour of the cellular non-linear network. (In general, the cloning templates do not have to be space-invariant; they can be, but it is not a necessity.) A significant feature of CNN is that it has two independent input capabilities: the generic input and the initial state of the cells. Normally they are bounded by:

$$|u_{ij}(t)| \le 1 \quad \text{and} \quad |x_{ij}(0)| \le 1$$

Similarly, if $|x_{ij}(0)| \le 1$ then $|y_{ij}(t)| \le 1$.


When used as an array processing device, the CNN performs a mapping

$$F: \{\, x_{ij}(0),\; u_{ij}(t) \,\} \mapsto y_{ij}(t)$$

where F is a function of the cloning template (A, B, I). The functionality of the CNN array can be controlled by the cloning template A, B, I, where in a 2D cellular neural network A and B are (2r+1) x (2r+1) real matrices and I is a scalar number. In many applications A(i,j;k,l) and B(i,j;k,l) are space invariant. If A(i,j;k,l) = A(k,l;i,j), then the CNN is called symmetrical or reciprocal.

There are two main cases: continuous-time (CT-CNN) and discrete-time (DT-CNN) cellular neural networks. The equations for each cell of a DT-CNN are

$$x_{ij}(k) = \sum_{C_{kl} \in N_r(i,j)} A(i,j;k,l)\, y_{kl}(k) + \sum_{C_{kl} \in N_r(i,j)} B(i,j;k,l)\, u_{kl} + I_{ij}$$

$$y_{ij}(k) = f(x_{ij}(k-1)), \qquad f(x) = \operatorname{sgn}(x) \qquad (3)$$
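Under the same illustrative conventions as the previous sketch (zero padding at the boundary; templates are assumptions), the synchronous DT-CNN update (3) can be sketched as:

import numpy as np

def dtcnn_step(x_prev, u, A, B, I):
    # One synchronous DT-CNN update, eq. (3): y(k) = sgn(x(k-1)).
    y = np.sign(x_prev)  # note: np.sign(0) = 0, a boundary case (3) leaves open
    r = A.shape[0] // 2
    yp, up = np.pad(y, r), np.pad(u, r)
    M, N = x_prev.shape
    x = np.full(x_prev.shape, float(I))  # bias I_ij (taken spatially constant here)
    for dk in range(-r, r + 1):
        for dl in range(-r, r + 1):
            x += A[dk + r, dl + r] * yp[r + dk:r + dk + M, r + dl:r + dl + N]
            x += B[dk + r, dl + r] * up[r + dk:r + dk + M, r + dl:r + dl + N]
    return x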

A special class of two-dimensional cellular neural networks is described by ordinary differential equations of the form (see [2], [5]):

$$\frac{dx_{ij}(t)}{dt} = -a_{ij}\,x_{ij}(t) + \sum_{C_{kl} \in N_r(i,j)} T_{ij,kl}\, \operatorname{sat}(x_{kl}(t)) + I_{ij}$$

$$y_{ij}(t) = \operatorname{sat}(x_{ij}(t)) \qquad (4)$$

where $1 \le i \le M$, $1 \le j \le N$, $a_{ij} = 1/RC > 0$, and $x_{ij}$ and $y_{ij}$ are the states and the outputs of the network, respectively;

$$\operatorname{sat}(x_{ij}) = \begin{cases} 1 & x_{ij} \ge 1 \\ x_{ij} & -1 < x_{ij} < 1 \\ -1 & x_{ij} \le -1 \end{cases}$$

and sat(.) represents the activation function. We consider zero inputs ($u_{ij} \equiv 0$ for all i and j)

and a constant bias vector $I = [I_{11}, I_{12}, \ldots, I_{MN}]^T$. Under these circumstances, we will refer to (4) as a zero-input non-symmetric cellular neural network where the n neurones are arranged in an M x N array (if n = M x N) and the interconnection structure is confined to local neighbourhoods of radius r.

System (4) is a variant of the analog Hopfield model with activation function sat(.), perhaps the best known of the associative neural network memories. Hopfield networks, in general, are completely connected; therefore, the number of connections scales as the square of the number of units. This presents a serious problem in the VLSI implementation of large networks. This limitation has been overcome by adopting both CNN models -


continuous-time (CNNs) and discrete-time (DTCNNs) models - as associative memories [4], [5], [6].

3 CELLULAR NEURAL NETWORKS FOR LOW-LEVEL IMAGE PROCESSING

The most popular application for CNN has been in image processing, essentially because of their analog feature and sparse connections, which are conducive to real-time processing [1], [8], [9].

A two-dimensional CNN can be viewed as a parallel non-linear two-dimensional filter, and such filters have already been applied for noise removal, shape extraction, edge detection, etc.

Let us first approximate the differential equation (2) by a difference equation. Let t = nh, where h is a constant time step, and approximate the derivative of $x_{ij}(t)$ by its corresponding difference form [1]:

$$\frac{1}{h}\left[ x_{ij}((n+1)h) - x_{ij}(nh) \right] = -\frac{1}{R}\,x_{ij}(nh) + \sum_{C_{kl} \in N_r(i,j)} A(i,j;k,l)\, y_{kl}(nh) + \sum_{C_{kl} \in N_r(i,j)} B(i,j;k,l)\, u_{kl} + I \qquad (5a)$$

and

$$y_{ij}(nh) = 0.5\left( |x_{ij}(nh) + 1| - |x_{ij}(nh) - 1| \right) = f(x_{ij}(nh)) \qquad (5b)$$

Let

$$I_{ij} = \sum_{C_{kl} \in N_r(i,j)} B(i,j;k,l)\, u_{kl} + I, \qquad 1 \le i \le M;\; 1 \le j \le N \qquad (5c)$$

We can recast (5a) and (5b) into the form

$$x_{ij}(n+1) = x_{ij}(n) + \frac{h}{C}\left[ -\frac{1}{R}\,x_{ij}(n) + \sum_{C_{kl} \in N_r(i,j)} A(i,j;k,l)\, f(x_{kl}(n)) + I_{ij} \right] \qquad (6)$$

Equation (6) can be interpreted as a two-dimensional filter for transforming an image, represented by x(n), into another one, represented by x(n+1). The filter is non-linear because $f(x_{kl}(n))$ in (6) is a nonlinear function. Usually, the filter is space invariant for image processing. The property of the filter is determined by the parameters in (6). To find a set of parameters (coefficients, synaptic weights) so that a network performs according to a given task is one of the problems in CNN. For the one-step filter in (6), the pixel values $x_{ij}(n+1)$ of an image are determined directly from the pixel values $x_{kl}(n)$ in the corresponding neighborhood $N_r(i,j)$ (3 x 3).

Therefore, a one-step filter can only make use of the local properties of images. When the global properties of an image are important, the above one-step filter can be iterated n times to extract additional global information from the image. A well-known property of an iterative filter is the so-called propagation property. This property can be observed by substituting $x_{ij}(n)$ in (6) iteratively down to $x_{ij}(0)$, which coincides with the input image:

$$x_{ij}(n) = \sum_{C_{kl} \in N_r(i,j)} g^{n}_{ijkl}(x_{kl}(0)), \qquad 1 \le i \le M;\; 1 \le j \le N \qquad (7)$$

Therefore, the propagation property of iterative filters makes it possible to extract some global features in images. The image at time t depends on the initial image $x_{ij}(0)$ and the dynamic rules of the cellular neural network. Therefore, we can use a cellular neural network to obtain a dynamic transform of an initial image at any time t.

The template coefficients (weights) of a CNN which will give a desired performance can be found either by design, learning, or mapping. In this paper, one application of the global learning approach and one application of the design approach are presented. We investigated various visual tasks using the global learning approach for CNN. All variants of global learning algorithms are based on the idea that a cost function is defined which measures how well the network maps a set of input images onto the desired output images. In Section 4 we also show the design approach for CNN associative memory. Design with CNNs [10] can have many different faces, e.g. programming the network to have some desired fixed points, to evolve along a prescribed trajectory from a given initial condition to some desired fixed point under control of a given input, or to exhibit some desired local dynamics.

Many visual tasks are related to visual reconstruction and can be cast as optimization problems. Examples are shape from shading, edge detection, motion analysis, structure from motion, and surface interpolation. These ill-posed, inverse problems yield a solution through minimization techniques. From Poggio, 1985 [11] (Table 1), we see that various early and intermediate level computer vision tasks are obtained by energy minimization of various functionals. As shown by Koch [12], quadratic variational problems can be solved by linear, analog electrical, or chemical networks using regularization techniques, additive models, or Markov random fields (MRF). However, quadratic variational principles have limitations. The main problem is the degree of smoothness required for the unknown function that is to be recovered. For instance, a surface interpolation scheme of this kind smoothes over edges and thus fails to detect discontinuities.

Hopfield and Tank have shown that networks of nonlinear analog "neurons" can be effective in computing the solution of optimization problems (travelling salesman problem (TSP), stereo matching problem) [13], [14]. As shown by Bose and Liang [15], a CNN is an analog Hopfield network in which the connections are limited to units in a local neighborhood of individual units, with bi-directional signal paths.


Table 1: Problems and the Corresponding Functionals

Edge detection: $\int [(Sf - i)^2 + \lambda f_{xx}^2]\, dx$
Area-based optical flow: $\iint [(i_x u + i_y v + i_t)^2 + \lambda (u_x^2 + u_y^2 + v_x^2 + v_y^2)]\, dx\, dy$
Contour-based optical flow: $\int [(\mathbf{V} \cdot \mathbf{N} - V^N)^2 + \lambda (\partial \mathbf{V}/\partial s)^2]\, ds$
Surface reconstruction: $\iint [(S f - d)^2 + \lambda (f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2)]\, dx\, dy$
Spatio-temporal approximation: $\iiint [(S f - i)^2 + \lambda (\nabla f \cdot \mathbf{v} + f_t)^2]\, dx\, dy\, dt$
Color: $\| I^y - Az \|^2 + \lambda \| Pz \|^2$
Shape from shading: $\iint [(E - R(f,g))^2 + \lambda (f_x^2 + f_y^2 + g_x^2 + g_y^2)]\, dx\, dy$
Stereo: $\iint \{ [\nabla^2 G * (L(x,y) - R(x + d(x,y), y))]^2 + \lambda (\nabla d)^2 \}\, dx\, dy$
Contours: $\int E_{\mathrm{snake}}(v(s))\, ds$


Hopfield's idea was to solve combinatorial optimization problems by allowing the binary variables to vary continuously between 0 and 1 and by introducing terms in the energy function that force the final solution to one of the corners of the hypercube $[0,1]^N$. Briefly, let the output variable $y_i(t)$ for neuron i have the range $0 < y_i(t) < 1$ and be a continuous and monotonically increasing function of the internal state variable $x_i(t)$ of the neuron i: $y_i = f(x_i)$. The output is then given as (a sigmoid-like function):

$$y_i = \frac{1}{2}\left( 1 + \tanh\frac{x_i}{x_0} \right) = \frac{1}{1 + e^{-2 x_i / x_0}}$$

where $x_0$ determines the steepness of the gain function.

The dynamics of the CNN network are described by a system of nonlinear ordinary differential equations (4) and by an associated computational energy function (called the Lyapunov function) which is minimized during the computation process.

The resulting equation that determines the rate of change of $x_{ij}$ is


$$C\,\frac{dx_{ij}(t)}{dt} = -\frac{x_{ij}(t)}{R} + \sum_{C_{kl} \in N_r(i,j)} T_{ij,kl}\, y_{kl}(t) + I_{ij} \qquad (8)$$

where $y_{ij}(t) = f(x_{ij}(t))$.

We replace the sign-type non-linearity in (4) by a sigmoidal non-linearity. In this case the system becomes a continuously valued dynamical system where gradients are well defined and classical optimization algorithms can be applied.

The Lyapunov function, E(t), of the cellular neural network is, following [1],

$$E(t) = -\frac{1}{2} \sum_{(i,j)} \sum_{(k,l)} A(i,j;k,l)\, y_{ij}(t)\, y_{kl}(t) + \frac{1}{2R} \sum_{(i,j)} y_{ij}(t)^2 - \sum_{(i,j)} \sum_{(k,l)} B(i,j;k,l)\, y_{ij}(t)\, u_{kl} - \sum_{(i,j)} I\, y_{ij}(t) \qquad (9)$$

By using an appropriately defined energy function, stability of the CNN can be proved in the same way as for an analog or continuous Hopfield network. Hopfield and Tank [16] investigated the analogy between finding a solution to a given optimization problem and setting an appropriate Lyapunov function corresponding to the additive neural model.
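The monotone decrease of E(t) along trajectories can be checked numerically. The sketch below evaluates (9) under the zero-padding convention of the cnn_step sketch in Section 2 (which it assumes is in scope); the symmetric template and the tolerance absorbing forward-Euler discretization error are our assumptions.

import numpy as np

def cnn_energy(x, u, A, B, I, R=1.0):
    # Evaluate the Lyapunov function (9) for the current state.
    y = 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))
    r = A.shape[0] // 2
    yp, up = np.pad(y, r), np.pad(u, r)
    M, N = x.shape
    fb = np.zeros_like(x)  # sum of A(i,j;k,l) * y_kl
    ct = np.zeros_like(x)  # sum of B(i,j;k,l) * u_kl
    for dk in range(-r, r + 1):
        for dl in range(-r, r + 1):
            fb += A[dk + r, dl + r] * yp[r + dk:r + dk + M, r + dl:r + dl + N]
            ct += B[dk + r, dl + r] * up[r + dk:r + dk + M, r + dl:r + dl + N]
    return (-0.5 * np.sum(y * fb) + np.sum(y * y) / (2.0 * R)
            - np.sum(y * ct) - I * np.sum(y))

# E(t) should be non-increasing along a trajectory when A is symmetric.
A = np.array([[0.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 0.0]])  # symmetric, assumed
B = np.zeros((3, 3))
u = np.zeros((32, 32))
x = np.random.uniform(-1.0, 1.0, (32, 32))
energies = []
for _ in range(200):
    energies.append(cnn_energy(x, u, A, B, I=0.0))
    x = cnn_step(x, u, A, B, I=0.0)  # cnn_step from the sketch in Section 2
assert all(e2 <= e1 + 1e-6 for e1, e2 in zip(energies, energies[1:]))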


The analog network has two advantages. First, the function E is Lyapunov, while for a digital network it is not. Second, deeper minima of the energy generally have larger basins of attraction; a randomly selected starting state has a higher probability of falling inside the basin of attraction of a deeper minimum.

4 DESIGN OF CNNS FOR ASSOCIATIVE MEMORIES

The goal of associative memories is to store a set of desired patterns as stable memories such that a stored pattern can be retrieved when the input pattern contains sufficient information about that stored pattern. In a first phase, a number p of patterns $s^\mu$, $1 \le \mu \le p$, is stored during a learning process. In a second phase, the recall step, one of these p patterns (possibly corrupted by noise) is presented to the associative memory. Its output must then converge towards this pattern.

An associative memory can be implemented as a continuous-time or discrete-time dynamical system

$$\frac{dx}{dt} = F_W(x(t), u) \quad \text{or} \quad x(n+1) = F_W(x(n), u) \qquad (10)$$

parameterised by a matrix W encoding the patterns $s^\mu$. The input u can be presented as an initial condition x(0) and/or as an independent input.

The most popular way to store patterns $s^\mu$ is to store them as stable equilibrium points of the N-dimensional dynamical system (10).

A first attempt at developing a design method for associative memories using DT-CNNs was made in [17], where the well-known Hebbian rule was used to determine the connection weights. However, serious limitations were found relating to the kind of patterns that could be stored. The Hebbian training signal will not typically be optimal for learning-invariant object recognition, due to erroneous classifications made by the neuron for spatially similar images from different objects and spatially dissimilar images derived from the same object.

We use the synthesis procedure presented by Liu [5] for the design of a cloning template for CNN. He considers a class of two-dimensional cellular neural networks described by equations of the form

$$\frac{dx}{dt} = -Ax + T\,\operatorname{sat}(x) + I \qquad (11)$$

where

$A = \operatorname{diag}[a_1, \ldots, a_n]$,

$T = [T_{ij}]$ represents the feedback cloning template,

$I = [I_{11}, I_{12}, \ldots, I_{mn}]^T$ is the bias vector, and

$\operatorname{sat}(x) = [\operatorname{sat}(x_{11}), \ldots, \operatorname{sat}(x_{mn})]^T$, where sat(.) is the activation function defined in (4).

Among the synthesis techniques of CNNs for associative memories, the eigenstructure method appears to be especially effective. This method has successfully been applied to the synthesis of neural networks defined on hypercubes, the Hopfield model, and iterative algorithms. The key idea is to make a proper choice of the interconnection matrix T. Next, we present the synthesis problem and the synthesis procedure.

Suppose that $\beta$ is an asymptotically stable equilibrium point and $\alpha = \operatorname{sat}(\beta)$ is a memory vector of system (11) with parameters A, T, and I. The synthesis problem is as follows: given m vectors in $B^n$ (the desired memory patterns), say $\alpha^1, \ldots, \alpha^m$, how can we properly choose A, T, and I so that the resulting synthesised system (11) has the properties of an associative memory [5]?

4.1 Synthesis Procedure

We choose vectors $\beta^i$ for $i = 1, \ldots, m$ and a diagonal matrix A with positive diagonal elements such that $A\beta^i = \mu\alpha^i$, where $\mu > 0$; i.e. we choose $\beta^i = [\beta_1^i, \ldots, \beta_n^i]^T$ with $\beta_j^i \alpha_j^i > 1$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$, and $A = \operatorname{diag}[a_1, \ldots, a_n]$ with $a_j > 0$ for $j = 1, \ldots, n$ and $\mu > \max\{a_j\}$, such that $a_j \beta_j^i = \mu \alpha_j^i$. We use $A = \operatorname{diag}[1, \ldots, 1]$ and $\mu = 10$.

Compute the $n \times (m-1)$ matrix

$$Y = [y^1, \ldots, y^{m-1}] = [\alpha^1 - \alpha^m, \ldots, \alpha^{m-1} - \alpha^m] \qquad (12)$$

Perform a singular value decomposition of $Y = USV^T$, where U and V are unitary matrices and S is a diagonal matrix with the singular values of Y on its diagonal.

Compute

$$T^+ = \sum_{i=1}^{p} u^i (u^i)^T \qquad (13)$$

where $p = \operatorname{rank}(Y)$ and $u^i$ denotes the i-th column of U, and

$$T^- = \sum_{i=p+1}^{n} u^i (u^i)^T \qquad (14)$$



Choose a positive value for the parameter $\tau$ and compute

$$T = \mu T^+ - \tau T^-, \qquad I = \mu\alpha^m - T\alpha^m \qquad (15)$$

We use $\tau = 0.95$ (the complete procedure is sketched below).
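The whole procedure, including the completion (15) as reconstructed above, fits in a few lines of NumPy. This is a sketch under our assumptions: A is the identity, the patterns are +/-1 vectors, and, unlike Liu's design [5], no locality constraint is imposed on T (a true cloning-template design would additionally confine T to the r-neighbourhood structure).

import numpy as np

def synthesize_memory(patterns, mu=10.0, tau=0.95):
    # Eigenstructure synthesis for system (11) with A = identity,
    # so beta^i = mu * alpha^i. Sketch only; T is dense here.
    alphas = [np.asarray(a, dtype=float) for a in patterns]
    Y = np.stack([a - alphas[-1] for a in alphas[:-1]], axis=1)  # (12)
    U, s, Vt = np.linalg.svd(Y)          # full SVD: U is n x n
    p = int(np.sum(s > 1e-10))           # p = rank(Y)
    Tp = U[:, :p] @ U[:, :p].T           # (13): projector onto span(Y)
    Tm = U[:, p:] @ U[:, p:].T           # (14): projector onto its complement
    T = mu * Tp - tau * Tm               # (15), as reconstructed above
    I = mu * alphas[-1] - T @ alphas[-1]
    return T, I

def recall(T, I, x0, steps=200, dt=0.05):
    # Euler integration of (11) with A = identity: dx/dt = -x + T sat(x) + I.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * (-x + T @ np.clip(x, -1.0, 1.0) + I)
    return np.sign(x)

# Usage sketch: store two random 81-bit patterns, recall one from a corrupted copy.
rng = np.random.default_rng(0)
a1, a2 = (rng.choice([-1.0, 1.0], size=81) for _ in range(2))
T, I = synthesize_memory([a1, a2])
noisy = a1 * np.where(rng.random(81) < 0.1, -1.0, 1.0)  # flip about 10% of the bits
print((recall(T, I, 10.0 * noisy) == a1).mean())        # typically 1.0

A brief check of the design: every stored pattern satisfies $\mu\alpha^i = T\alpha^i + I$, so $\beta^i = \mu\alpha^i$ is an equilibrium of (11), while the $-\tau$ term contracts directions orthogonal to the stored pattern differences.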

Then $\alpha^1, \ldots, \alpha^m$ will be stored as memory vectors in the system (11). The states corresponding to $\alpha^i$, $i = 1, \ldots, m$, will be asymptotically stable equilibrium points of system (11). There are several advantages to using the eigenstructure method.

As is well known, the outer product design method does not guarantee that every desired memory pattern will be stored as an equilibrium point (memory point) of the synthesized system when the desired patterns are not mutually orthogonal. The network designed by the eigenstructure method is capable of storing equilibrium points which may by far outnumber the order of the network. (For an example of a neural network of dimension n = 81 which stores 151 vectors as equilibrium points, refer to [6].)

In a network designed by the eigenstructure method, all of the desired patterns are guaranteed to be stored as asymptotically stable equilibrium points. System (11) is a variant of the recurrent Hopfield model with activation function sat(.). There are also several differences from the Hopfield model: 1) the Hopfield model requires that T is symmetric, an assumption we do not make for T; 2) the Hopfield model is allowed to operate asynchronously, but the present model is required to operate in a synchronous mode; 3) in a Hopfield network used as an associative memory the weights are computed by a Hebb rule correlating the prototype vectors to be stored, while the connections in the Cellular Neural Network are only local. For example, a CNN of the form (11) with M = N = 9 and r = 3 has 2601 total interconnections, while a fully connected NN with n = 81 will have a total of 6561 interconnections (the counts are verified in the short sketch below).
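The interconnection counts quoted above can be checked directly; the only assumption in this small sketch is that neighbourhoods are truncated at the array boundary:

def cnn_connections(M, N, r):
    # Total interconnections of an M x N CNN with neighbourhood radius r,
    # truncating neighbourhoods at the array boundary.
    total = 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            rows = min(i + r, M) - max(i - r, 1) + 1
            cols = min(j + r, N) - max(j - r, 1) + 1
            total += rows * cols
    return total

print(cnn_connections(9, 9, 3))  # 2601, versus 81 * 81 = 6561 fully connected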

5 CELLULAR NEURAL NETWORKS APPLICATION IN A 3D OBJECT RECOGNITION SYSTEM

In this section we concentrate on an application of CNN associative memory to 3D object recognition, based on the following idea. While the robot is moving from one viewpoint to another to gather characteristic views of an object, an image sequence is taken and analysed on the way. Obviously, such a sequence contains a lot of additional information: it implicitly codes the 3D structure of the object, for the price of a huge amount of data. For the reduction of data and of processing time, we downsample the images in the sequence to a size of 32x32 pixels. The processing of the image sequence consists mainly of three steps:

1. computing the optical flow;
2. extracting features from the flow;
3. classification of these features.

5.1 Selecting Feature Vectors for Recognition

In any pattern recognition problem, feature detection is of utmost importance. The features extracted from different views must have a reasonable degree of invariance against shift, rotation, scale and other variations. When an object cannot be identified on the basis of information about shape, other types of information will play a critical role: motion, spatial properties (size, location), texture [18], [19], [20], [21], [22], [25].

Our work, in the spirit of Little and Boyd [23], [24], is a model-free approach making no attempt to recover a structural model of the 3D object. Instead, it describes the shape of the motion of the object with a set of features. We derive features from dense optical flow data (u(x,y), v(x,y)). In contrast to Little and Boyd, we aim at image sequences of a static complex object, taken by a moving camera. We determine a range of scale-independent scalar features of each flow image that characterise the spatial distribution of the flow. The features are invariant to scale and do not require identification of reference points on the moving camera. The flow diagram of the system that creates our motion features is presented in Figure 2.

Figure 2: The structure of the process for feature selection (image sequence -> optical flow -> features of flow field -> rearranged features of time series)

The steps in the system are, from top to bottom:

1. The system begins with a motion sequence of n+1 images (frames) of an object.



Figure 3: Some images of a sequence

Figure 4: The u data (x-flow) visualized as normalized gray value images

Figure 5: The v data (y-flow) visualized as normalized gray value images

2. The optical flow algorithm is sensitive to brightness changes caused by reflections, shadows, and changes of illumination, so we first filter the images with a Laplacian of Gaussian to reduce the additive effects.

3. We compute the optical flow of the motion sequence to get n frames of (u, v) data, where u is the x-direction flow and v is the y-direction flow. We use the method presented by Bülthoff, Little, and Poggio [23]. The dense optical flow is generated by minimizing the sum of absolute differences between image patches. We compute the flow only in a box surrounding the object. The result is a set of moving points. Let T(u, v) be defined as:

$$T(u,v) = \begin{cases} 1 & \text{if } |(u,v)| \ge 1 \\ 0 & \text{otherwise} \end{cases}$$

T(u, v) segments moving pixels from non-moving pixels.

4. For each frame of the flow, we compute a set of scalars that characterizes the shape of the flow in that frame. We use all the points in the flow and analyze their spatial distribution. The shape of motion is the distribution of flow, characterized by several sets of measures of the flow. Similar to Little and Boyd, we compute the following scalars:

centx: x coordinate of centroid of moving region
centy: y coordinate of centroid of moving region
wcentx: x coordinate of centroid of moving region weighted by |(u, v)|
wcenty: y coordinate of centroid of moving region weighted by |(u, v)|
dcentx = wcentx - centx
dcenty = wcenty - centy
aspct: aspect ratio of moving region
waspct: aspect ratio of moving region weighted by |(u, v)|
daspct = aspct - waspct

uwcentx: x coordinate of centroid of moving region weighted by |u|
uwcenty: y coordinate of centroid of moving region weighted by |u|
vwcentx: x coordinate of centroid of moving region weighted by |v|
vwcenty: y coordinate of centroid of moving region weighted by |v|

5. Each image $I_j$ in a sequence of n images generates m = 13 scalar values $s_{ij}$, where i varies from 1 to m, and j from 1 to n. We rearrange the scalars to form one time series for each scalar: $S_i = [s_{i1}, \ldots, s_{in}]$ (a sketch of steps 4 and 5 follows this list).
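A compact sketch of steps 4 and 5 follows. The magnitude threshold in T(u, v) and the aspect-ratio convention (ratio of weighted standard deviations in y and x) are our assumptions; only the list of 13 scalars above is taken from the shape-of-motion features of Little and Boyd.

import numpy as np

def shape_of_motion(u, v, thresh=1.0):
    # The 13 shape-of-motion scalars for one flow frame (u, v).
    mag = np.hypot(u, v)                 # |(u, v)|
    ys, xs = np.nonzero(mag >= thresh)   # T(u, v): keep moving pixels only
    w, wu, wv = mag[ys, xs], np.abs(u[ys, xs]), np.abs(v[ys, xs])

    def centroid(weights=None):
        return (np.average(xs, weights=weights), np.average(ys, weights=weights))

    def aspect(weights=None):            # assumed convention: std_y / std_x
        return (np.sqrt(np.cov(ys, aweights=weights)) /
                np.sqrt(np.cov(xs, aweights=weights)))

    centx, centy = centroid()
    wcentx, wcenty = centroid(w)
    uwcentx, uwcenty = centroid(wu)
    vwcentx, vwcenty = centroid(wv)
    aspct, waspct = aspect(), aspect(w)
    return np.array([centx, centy, wcentx, wcenty,
                     wcentx - centx, wcenty - centy,   # dcentx, dcenty
                     aspct, waspct, aspct - waspct,    # daspct
                     uwcentx, uwcenty, vwcentx, vwcenty])

# Step 5: stack the per-frame scalars; column i is then the time series S_i.
# flows = [(u_1, v_1), ..., (u_n, v_n)]
# S = np.stack([shape_of_motion(u, v) for u, v in flows])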

These time series of scalars are then used as feature vectors for the classification step, which is implemented by using a Cellular Neural Network. Figures 3-5 show some images of a sequence and some flow images. Figure 6 visualizes the feature vector of the whole sequence.

Figure 6: Visualization of the feature array, one column for each time series $S_i$

It can easily be seen in Figure 6 that several features are strongly correlated.



6 EXPERIMENTS

A series of experiments was done using a set of images from the Columbia image database. This database consists of image sequences of objects which were placed on a turntable, taking images every 5°. The background is uniform and the objects are centered. To speed up the flow computation and to handle the amount of data, we reduced the image resolution to 32x32 pixels and used only every second image. Thus, our image sequences consist of 36 low-resolution images taken every 10°. The features of the image sequences of ten different objects of the database were used. Fig. 7 shows the learned objects. Fig. 8 shows an image sequence of object number two.

Figure 7: Images of the ten objects

Figure 8: Image sequence of object number two

Figure 9: Features of five test objects (object 1 to object 5) shown as gray value images



The resulting feature vectors for the time series of the features of five objects are shown in Figure 9. For better visualization they are shown as normalized gray images, with the different features as described in Section 5 in the x-direction and time in the y-direction. One can see that they form characteristic patterns that are used for an unambiguous recognition of these objects. The associative memory is used to restore incomplete sequences and to classify them. Obviously, recognition of a previously learned object is not a problem at all, since the CNN is designed such that each stored feature vector $x_i$ is an equilibrium point of the CNN.

In our experiments we wanted to measure the influence of two important parameters: what happens if the object is not correctly centered in the images, and how sensitive is the system to noisy input images?

Thus, we modified the image sequences by translating the objects by t pixels in the x- and y-directions and by adding random uniformly distributed noise of maximal value n to each of the images. In our tests t varies from 0 to 6 and n varies from 0 to 30. Consider that the images are only of size 32x32 pixels; thus a translation of the object of up to 20% of the image's size is possible. For each value of the parameters (t, n) the experiments were repeated four times, as the noise was of random nature (the perturbation protocol is sketched below). Table 2 shows the achieved recognition rates for our test objects.
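A sketch of this test protocol; the wrap-around translation via np.roll and the names evaluate and classifier are our assumptions (the paper does not specify how border pixels are handled):

import numpy as np

def perturb_sequence(frames, t, n, rng):
    # Shift each 32x32 frame by t pixels in x and y, then add uniform
    # noise of maximal value n, as in the experiments described above.
    out = []
    for f in frames:
        g = np.roll(f, (t, t), axis=(0, 1))          # translation (wrap-around assumed)
        out.append(g + rng.uniform(0.0, n, f.shape))
    return out

# Four repetitions per (t, n) cell, matching Table 2:
# for t in (0, 2, 4, 6):
#     for n in (0, 5, 10, 15, 20, 25, 30):
#         rates = [evaluate(classifier, perturb_sequence(seq, t, n, rng))
#                  for _ in range(4)]  # `evaluate` and `classifier` are hypothetical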

Table 2: Recognition rates for varying translation and noise

Translation t (pixels)   Noise n   Recognition rate (%)
0                        0         100.00
0                        5         97.50
0                        10        92.50
0                        15        87.50
0                        20        65.00
0                        25        60.00
0                        30        62.50
2                        0         100.00
2                        5         100.00
2                        10        92.50
2                        15        87.50
2                        20        67.50
2                        25        87.50
2                        30        85.00
4                        0         100.00
4                        5         97.50
4                        10        87.50
4                        15        82.50
4                        20        72.50
4                        25        65.00
4                        30        65.00
6                        0         80.00
6                        5         97.50
6                        10        77.50
6                        15        72.50
6                        20        72.50
6                        25        55.00
6                        30        57.50

In addition, Fig. 10 shows the average recognition rates for t < t1 and n < n1.

Figure 10: Average recognition rates of the CNN

In addition, we compared the CNN associative memory to a nearest neighbor classifier. The Euclidean distance was used as the distance measure between the feature vectors. The main disadvantage of the nearest neighbor classifier compared with an associative memory is that it relies on a distance measure that holds no information on sub-patterns or on the kind of difference between two feature vectors.

Thus, the recognition rates of the nearest neighbor classifier are below those of the CNN associative memory. Fig. 11 shows the achieved recognition rates for the nearest neighbor classifier (a minimal sketch of this baseline follows below). Fig. 12 compares the recognition rates for a fixed value of t and variable noise n.
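For reference, the baseline amounts to a few lines; the names are ours, and the rows of prototypes are assumed to be the stored feature arrays, flattened:

import numpy as np

def nearest_neighbor_label(query, prototypes, labels):
    # Assign the label of the stored feature vector closest in Euclidean distance.
    d = np.linalg.norm(prototypes - query, axis=1)
    return labels[int(np.argmin(d))]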

Figure 11: Recognition rates of the nearest neighbor classifier


Figure 12: Recognition rates of CNN and NN for constant translation t and variable noise n

Further experiments will be done concerning different resolutions of the images and variable lengths of the image sequences.

7 CONCLUSION

We have presented applications of CNNs in image processing and 3D object recognition. The systematic steps towards design and learning with CNNs provide powerful techniques for finding the template coefficients (synaptic weights) to perform a desired task. The nearest-neighbour interaction property of CNNs makes them much more amenable to VLSI implementation.

REFERENCES

1. L.O. Chua and L. Yang, "Cellular Neural Networks: Theory and Applications", IEEE Trans. on Circuits and Systems (CAS), Vol. 35, pp. 1257-1290, 1988.

2. L.O. Chua and T. Roska, "The CNN Paradigm", IEEE Trans. on Circuits and Systems (Part I), CAS-40, No. 3, pp. 147-156, 1993.

3. T. Roska and J. Vandewalle, Cellular Neural Networks, John Wiley & Sons, 1993.

4. G. Grassi, "A New Approach to Design Cellular Neural Networks for Associative Memories", IEEE Trans. on Circuits and Systems - I: Fundamental Theory and Applications, Vol. 44, No. 9, pp. 362-366, 1997.

5. D. Liu and A. Michel, "Sparsely Interconnected Neural Networks for Associative Memories with Applications to Cellular Neural Networks", IEEE Trans. on Circuits and Systems - II: Analog and Digital Signal Processing, Vol. 41, No. 4, pp. 295-307, 1994.

6. D. Liu, "Cloning Template Design of Cellular Neural Networks for Associative Memories", IEEE Trans. on Circuits and Systems - I: Fundamental Theory and Applications, Vol. 44, No. 7, pp. 646-650, 1997.

7. P. Szolgay, I. Szatmari, and K. Laszlo, "A Fast Fixed Point Learning Method to Implement Associative Memory on CNNs", IEEE Trans. on Circuits and Systems - I: Fundamental Theory and Applications, Vol. 44, No. 4, pp. 362-366, 1997.

8. A. Radvanti, "Structural Analysis of Stereograms for CNN Depth Detection", IEEE Trans. Circuits Syst. I, Vol. 46, pp. 239-252, 1999.

9. F. Lithon and D. Dragomirescu, "A Cellular Analog Network for MRF-Based Motion Detection", IEEE Trans. Circuits Syst. I, Vol. 46, pp. 281-293, 1999.

10. J. Nossek, "Design and learning with cellular neural networks", Int. Journal of Circuit Theory and Applications, Vol. 24, pp. 15-24, 1996.

11. T. Poggio, V. Torre and C. Koch, "Computational Vision and Regularization Theory", Nature, Vol. 317, pp. 314-319, 1985.

12. C. Koch, J. Marroquin and A. Yuille, "Analog neural networks in early vision", Proc. Natl. Acad. Sci. USA, Vol. 83, pp. 4263-4267, 1986.

13. H. Wechsler, Computer Vision, Academic Press, 1990.

14. G. Pajares, J. Cruz and J. Aranda, "Relaxation by Hopfield network in stereo image matching", Pattern Recognition, Vol. 31, No. 5, pp. 561-574, 1998.

15. N.K. Bose and P. Liang, Neural Network Fundamentals with Graphs, Algorithms and Applications, McGraw-Hill Series in Electrical and Computer Engineering, 1996.

16. J. Hopfield and D. Tank, "Neural Computation of Decisions in Optimization Problems", Biological Cybernetics, Vol. 52, pp. 141-152, 1985.

H. Bülthoff, J. Little and T. Poggio, "A parallel algorithm for real-time computation of optical flow", Nature, Vol. 337, pp. 549-553, 1989.

17. S. Tan, J. Hao and J. Vandewalle, "CNN as a model of associative memories", Proc. CNNA-90, Budapest, 1990, IEEE, New York, pp. 26-35.

18. H. Bülthoff, S. Edelman and M. Tarr, "How are three-dimensional objects represented in the brain?", Technical Report A.I. Memo No. 1479, Massachusetts Institute of Technology, A.I. Lab, 1994.

19. G. Hartmann, U. Büker and S. Drüe, "A Hybrid Artificial Intelligence - Neuro Architecture", to appear in: B. Jähne et al. (Eds.), Handbook on Computer Vision and Applications, Academic Press, San Diego, 1999.

20. U. Büker, "Hybrid Object Models: Combining Symbolic and Subsymbolic Object Recognition Strategies", in: Proc. of the 4th Int. Conf. on Information Systems, Analysis, and Synthesis, Vol. 1, IIIS, Orlando, 1998, pp. 444-451.

21. M. Hebert et al. (Eds.), Object Recognition in Computer Vision, Springer, New York, 1995.

22. S.M. Kosslyn, Image and Brain, MIT Press, Cambridge, 1995.

23. J. Little and J. Boyd, "Describing motion for recognition", in: IEEE Symposium on Computer Vision, 1995, pp. 235-240.

24. J. Little and J. Boyd, "Recognizing People by Their Gait: the Shape of Motion", online document, http://www-vision.ucsd.edu/, 1997.

25. M. Milanova, "Recovering and representing three-dimensional objects for computer vision or computer graphics applications", in: Proceedings of the DSP'95, Cyprus, 1995, pp. 544-552.
