
Pattern Recognition Letters 7 (1988) 259-264, April 1988

North-Holland

Efficient systematic analysis of occlusion

Martin C. COOPER
Département d'Informatique, Université de Bordeaux I, 351 Cours de la Libération, 33405 Talence, Bordeaux, France

Received 30 September 1987

Revised 27 November 1987

Abstract." Occlusion generally makes multiobject pictures so highly ambiguous that enumerat ion o1 legitimate interpretations

is not practical. For occlusion this paper introduces a compact structural description which has the advantage that it can bc

determined in time that is a polynomial function of the number of objects and the number of pixels in the picture.

Key words: Image analysis, occlusion, ambiguity.

1. Introduction

Previous work on recognition of occluded objects in multi-object pictures (e.g. [1,2,3,4]) has been characterised by being heuristic rather than provably correct, and by not exploring all possible interpretations of a picture. The state of the art of the analysis of occlusion currently appears to be far inferior to the state of the art of syntax analysis of sentences in formal languages, for which there are well-known provably-correct algorithms that enumerate all legitimate parses of a given sentence. Indeed there have not been many published attempts to make occlusion analysis as rigorous as syntax analysis. The blocks-world investigations of Huffman [6] and Clowes [5] used heuristic rules concerning shapes of vertices.

Like syntax analysis, occlusion analysis turns out to be a minefield of combinatorial explosion. The output of occlusion analysis is unlikely to be useful if it is an enumeration of an explosively large number of alternative interpretations of a given multi-object picture. The present paper contributes a non-explosive representation for the results of occlusion analysis, and a polynomial-time algorithm for producing this representation from a given picture.

To make radical progress with an extremely complicated problem such as occlusion analysis of real-life cluttered scenes, it is surely sensible to start by taking a step back and making simplifications that facilitate mathematical accountability. Accordingly this paper is concerned with synthetic two-dimensional pictures composed of noise-free images of objects. A fixed set S of such object-images is stored explicitly in memory, and a picture is composed exclusively of object-images that belong to S. An object-image is stored simply as an array of pixels.

If, for example, pictures were to be composed of triangles and squares in any shift and any dilation, the set S would contain all possible shifted and dilated images of a triangle and a square. As another possibility, if pictures were to be orthographically composed of images of a cube in any three-dimensional attitude and at any distance, all such images would be stored in S explicitly. This is done so as to allow the present work to concentrate entirely on occlusion, leaving transformations between an object-representation and possible object-images to be dealt with elsewhere.

Our research strategy is to start by factorising out the occlusion problem from problems of noise, geometrical projective transformation and illumination. The residual occlusion problem is very far from trivial, and deserves more attention than it has received during the last twenty years.

0167-8655/88/$3.50 © 1988, Elsevier Science Publishers B.V. (North-Holland)


2. Picture generation

A formal language defines a class of sentences by formulating rules that generate exactly this class of sentences. Without such a definition, it would not be possible to prove the correctness of a syntax analysis algorithm. Analogously, it will be possible to prove the correctness of an occlusion analysis algorithm only if we have a rigorous definition of the class of pictures that are to be analysed. For this purpose, following [7], we borrow from formal language theory the technique of definition-by-generation. We now formulate a very simple algorithm that uses a given set S of object-images to generate pictures. We subsequently consider algorithms that can be proved to yield correct analysis of the class of pictures that can be generated by this algorithm.

The following simple generative algorithm builds up a synthetic picture in a rectangular array named Picture, in which the pixels are initially undefined.

Procedure PicGen;
begin
  repeat
    randomly choose an object-image from S and denote this by Image;
    for each pixel [i,j] in Image do
      if Picture[i,j] has not yet been assigned a value
        then Picture[i,j] := Image[i,j]
  until every pixel in Picture has been assigned a value
end
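
For concreteness, here is a minimal runnable sketch of PicGen in Python. The dictionary representation of object-images and the assumption that S is rich enough to cover every pixel are ours, not the paper's.

import random

UNDEFINED = None

def picgen(S, height, width, rng=random):
    """A minimal sketch of PicGen (not the paper's own code).

    Assumption: each object-image in S is a dict mapping (i, j)
    pixel coordinates to grey-level values, and S can cover the
    whole Picture so that the loop terminates."""
    picture = [[UNDEFINED] * width for _ in range(height)]
    chosen = []            # images in the order PicGen selected them
    while any(p is UNDEFINED for row in picture for p in row):
        image = rng.choice(S)
        chosen.append(image)
        for (i, j), value in image.items():
            # a pixel keeps the value of the FIRST image to reach it,
            # so earlier selections occlude later ones
            if picture[i][j] is UNDEFINED:
                picture[i][j] = value
    return picture, chosen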

The limitations of PicGen are similar to those of the very closely related painter's algorithm [10] in computer graphics.

By way of example, Figure 1(a) shows a collection of three objects, each having a uniform grey-level. The square and the triangle share the same uniform grey-level in this example, and the hatching indicates that the circle has a uniform grey-level that is different to that of the square and the triangle. In this example, the set S comprises all possible rotations and translations of these three objects. Figure 1(b) shows the contents of array Picture after five successive iterations of PicGen with this set S.

3. Generating lists

In the present work the input to the occlusion analysis algorithm is a picture generated by PicGen together with a detailed complete listing of all object-images in S. In that the contents of S are fully available to the analysis algorithm, the analysis is knowledge-based. What the analysis algorithm does not 'know', and has to figure out for itself, is how PicGen chose and put together a collection of object-images to make up a picture.

The input to the analysis process is now clear. One possibility for the output is the complete enumeration of generating lists for the given picture, defined as follows.

A generating list is an ordered subset of S such that the given picture would have been generated if PicGen had selected only the object-images in the generating list, and had selected them in the sequence in which they occur in the generating list, and each object-image in the generating list is visible (i.e. non-occluded) in Picture in at least one pixel [8].

Figure 1. (a) A set of objects, and (b) the generation of a picture using images of these objects.

The following recursive backtrack routine enumerates all generating lists for a given picture. The special symbols ≈ and - are explained later.

Procedure AllGenLists(Picture);
begin
  for each Image in S do
    if (Image ∩ Picture is not empty) and (Image ≈ Picture) then
    begin
      Enter Image into generating list;
      NewPicture := Picture - Image;
      AllGenLists(NewPicture)
    end
end

The notation is most readily understood in terms of a simple one-dimensional example in which Picture consists of the six pixels 885557 and two of the object-images in S are 8888** and **555*, where * denotes a pixel whose value is undefined because it does not lie within the object-image. The object 8888** does not match 885557 because of the mismatch between 8's and 5's. However the object **555* is found to match and is accordingly subtracted out of 885557, leaving 88###7, where # is essentially a don't-care value that can subsequently match any pixel value. We use the infix operator - to denote the subtracting-out operation. The result of Picture - Image is to change all pixels of Picture that match non-* pixels of Image to #. Thus, for example, 885557 - **555* = 88###7.

In our notation ≈ is a boolean infix operator such that the proposition P ≈ Q is true if and only if P and Q are identical within the area of overlap between P and Q. Pixels that are labelled * or # are regarded as not belonging to any area of overlap. For example 885557 ≈ **555* is true and 885557 ≈ 8888** is false, but 88###7 ≈ 8888** is true.
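
The two operators, together with the intersection test, are easy to sketch in Python on the one-dimensional pixel strings used above; the function names are our own illustrative choices, and object-images are assumed never to contain #.

def overlap_nonempty(p, q):
    # P ∩ Q is non-empty: some position carries a real pixel value in both
    return any(a not in '*#' and b not in '*#' for a, b in zip(p, q))

def matches(p, q):
    # P ≈ Q: identical within the area of overlap; '*' (outside the
    # image) and '#' (don't-care) pixels belong to no area of overlap
    return all(a == b for a, b in zip(p, q)
               if a not in '*#' and b not in '*#')

def subtract(p, q):
    # P - Q: every pixel of P matched by a non-* pixel of Q becomes '#'
    return ''.join('#' if b != '*' else a for a, b in zip(p, q))

assert matches('885557', '**555*') and not matches('885557', '8888**')
assert matches('88###7', '8888**')
assert subtract('885557', '**555*') == '88###7'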

When AllGenLists inserts an object-image into a generating list, the housekeeping is such that the insertions along every path from the root to any leaf of the search-tree constitute a generating list.
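
Using those helpers, the backtrack enumeration itself might be sketched as follows; treating a picture as fully explained when every pixel has become # is our reading of the routine's stopping condition, not something stated in the paper.

def all_gen_lists(picture, S, prefix=(), out=None):
    """Sketch of AllGenLists: every root-to-leaf path that fully
    explains the picture yields one generating list (front-to-back)."""
    if out is None:
        out = []
    if all(c == '#' for c in picture):
        out.append(list(prefix))     # a leaf: one complete generating list
        return out
    for image in S:
        # the image must be visible in at least one unexplained pixel,
        # and must agree with the picture wherever they overlap
        if overlap_nonempty(image, picture) and matches(image, picture):
            all_gen_lists(subtract(picture, image), S, prefix + (image,), out)
    return out

# e.g. all_gen_lists('885557', ['8888**', '**555*', '****57']) returns the
# three front-to-back orderings in which **555* precedes 8888**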

Unfortunately AllGenLists [8] is not a practical solution to our artificially simple occlusion analysis problem, primarily because the number of generating lists for a picture is generally explosively great, and any algorithm that enumerates them will inevitably suffer from the same combinatorial explosion. What is wrong is not the algorithm itself but, more fundamentally, the kind of analytical output that the algorithm is designed to produce.

One of the reasons for the explosiveness of the number of generating lists is that some of the ordering of object-images within a generating list is generally arbitrary. For example, at the top right of the picture generated in Figure 1(b) the triangle could be occluding the square or the square could be occluding the triangle, and this ambiguity is not resolvable. AllGenLists will produce at least one generating list in which the triangle precedes the square and at least one generating list in which the square precedes the triangle. Such proliferation of generating lists does not contribute meaningfully to the analysis of occlusion, and reinforces the view that generating lists are not what we should try to produce.

4. Parses

Instead of generating lists, we could enumerate all possible parses of a given picture. Here, by analogy with linguistics, a parse is defined to be a segmentation (i.e. partitioning) of a picture into non-overlapping areas, whose union is the picture, with an object-image label for each area, such that this labelling is consistent with a possible generation of the picture by PicGen.

To be more precise, let R(Image) denote the set of pixels of Picture that are assigned a value when Image is selected by PicGen. A parse P is a labelling of areas that is consistent with an execution of PicGen if, for each object-image Image selected by PicGen, the union of all areas labelled Image in P is R(Image). To simplify our work we do not worry if R(Image) is split into more than one area in P. Two parses are considered to be the same if they assign each pixel of the picture the same object-image label. We do not seek the coarsest partitioning of the picture that is consistent with an execution of PicGen.

Even if we could enumerate the coarsest-partitioning parses for a given picture this would generally be combinatorially explosive because of ambiguity. For example, for the top right of the picture generated in Figure 1(b) we would obtain one parse in which the triangle/square overlap area was labelled triangle and another separate parse with this same area labelled square. To avoid such duplication we should not produce entirely separate parses that incorporate different interpretations of an ambiguous area. Instead we should seek a single structural description that incorporates both interpretations of an ambiguous area. This leads to the idea of a parse structure.

5. Parse structure

A parse structure is a set of (area, object-image label) pairs such that each of these pairs belongs to a parse as defined above in Section 4. Thus a parse structure is a segmentation (i.e. partitioning) of a picture into areas each of which may have more than one object-image label.

Each pixel within any given area in a parse can be regarded as being implicitly labelled with that area's object-image label. Thus for any given parse there is an associated pixel labelling. Similarly a parse structure implicitly labels each pixel with a set of object-image labels. A parse structure is said to be complete if its pixel labelling includes the pixel labelling of every parse.

The main point of this paper is to introduce an efficient algorithm for finding a complete parse structure. This algorithm, named CoverParses, starts with a single area equal to the entire area of Picture. Successive iterations may split an area into smaller areas, each with associated parse labels. The parse structure is built up as a set of (area, object-image label) pairs in PS. At any given time PSA is the set of all areas such that the pair (Area, L) is in PS for some object-image label L. The algorithm is:

Procedure CoverParses;
begin
  PS := empty;
  PSA := a single area equal to the entire area of Picture;
  repeat
    for each Image in S do
      for each area A in PSA do
        if Image ∩ A is not empty then
          if (Image ∩ A) ≈ (Picture ∩ A) then
            if Valid(A, Image, PS) then
            begin
              if Image does not completely cover A then
              begin
                in PS and PSA split area A into two areas, which are
                A ∩ Image and the set difference between A and A ∩ Image;
                assign all the previous labels of A to both of these areas in PS
              end;
              insert (A ∩ Image, Image) into PS
            end
  until (PS is unchanged in an iteration)
end

Figure 2. The interpretation by CoverParses of the picture generated in Figure 1(b). [Successive snapshots of PS, from the single unlabelled area A1 to seven areas A1-A7 with labels such as t1(circle), t2(square), t3(triangle), t4(circle) and t5(square); the final ambiguous area A7 carries both square and triangle object-image labels.]

Figure 3. (a) A picture, and (b) its partially constructed parse structure, with areas A1: t1(triangle), A2: t2(circle), A3: t3(circle) and an unlabelled area A4.

The important check Valid(A, Image, PS) is explained later. Assuming, for the moment, that this check always yields true, Figure 2 shows the build-up of PS when CoverParses is run on the picture at the end of Figure 1. In Figure 2, ti(circle) and tj(circle) are different object-images of the circle object, and similar notation is used for other objects. To simplify this example, vertical translation of the square and triangle is restricted to units of half their height.

At the start of Figure 2, (Image ∩ A) ≈ (Picture ∩ A) is true for the object-image t1(circle), so we obtain two areas, A1 and A2. Subsequently (t2(square) ∩ A1) ≈ (Picture ∩ A1), so we obtain a further two areas, A1 and A3. The remainder of Figure 2 is obtained similarly.
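
The following Python sketch of CoverParses continues the one-dimensional representation of the earlier sketches, with an area stored as a frozenset of pixel positions; these representation choices are our assumptions. The valid parameter stands in for the check explained in Section 6, and passing valid=lambda A, im, PS: True reproduces the unconditional behaviour assumed for Figure 2.

def cover_parses(picture, S, valid):
    """Sketch of CoverParses: PS is a list of (area, image) pairs and
    PSA the list of current areas, each area a frozenset of positions."""
    support = lambda img: frozenset(k for k, c in enumerate(img) if c != '*')
    PS = []
    PSA = [frozenset(range(len(picture)))]
    changed = True
    while changed:                       # repeat ... until PS is unchanged
        changed = False
        for image in S:
            cov = support(image)
            for A in list(PSA):
                inter = A & cov          # A ∩ Image
                if not inter:
                    continue
                # (Image ∩ A) ≈ (Picture ∩ A): agreement on the overlap
                if any(image[k] != picture[k] for k in inter):
                    continue
                if not valid(A, image, PS) or (inter, image) in PS:
                    continue
                if inter != A:           # Image does not completely cover A
                    rest = A - cov
                    PSA.remove(A)
                    PSA.extend([inter, rest])
                    # split A in PS too, copying its previous labels to both halves
                    PS = [(a2, im) for a, im in PS
                                   for a2 in ((inter, rest) if a == A else (a,))]
                PS.append((inter, image))
                changed = True
    return PS, PSA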

6. Checking validity

(A ∩ Image, Image) should be inserted into PS only if there exists at least one parse, as defined in Section 4, in which the area A ∩ Image is labelled Image. The purpose of the check Valid(A, Image, PS) is to determine whether at least one such parse exists. The area labelled B in Figure 3(b) is an example of an area such that (Image ∩ B) ≈ (Picture ∩ B) is true for the square object-image that coincides with area A, but area B should not be given a square object-image label, because the square cannot be occluded by t3(circle) and at the same time occlude t3(circle), and therefore no parse exists in which area B has a square object-image label.

The predicate Valid returns true if and only if it succeeds in finding a list H of n object-images that could have been the first n object-images generated by PicGen, such that these object-images occlude all the disagreements between Image and Picture whilst not occluding any of the area A.

For example, in the analysis of Figure 3(a), the pair (A, t4(square)) is found to be valid because the generating list H = (t1(triangle), t2(circle), t3(circle)) comprises object-images that could have been the first three chosen by PicGen, and these object-images occlude all disagreements between the square and the picture, as indicated in Figure 3(b).

Like the previous routine AllGenLists, Valid works by subtracting out. Valid makes a copy Copicture of Picture and does the subtracting-out operation on Copicture so as not to spoil Picture. The routine is:

Function Valid(A, Im, PS);
begin
  Q := {J : (Area, J) is in PS for some Area, and J ∩ A is empty};
  Copicture := Picture;
  repeat
    for each object-image J in Q do
      if J ≈ Copicture then Copicture := Copicture - J
  until (no matching object-image is found in an iteration) or (Im ≈ Copicture);
  Valid := (Im ≈ Copicture)
end

In this routine the operation (J ≈ Copicture) is not implemented laboriously by pixel-to-pixel matching, but instead more efficiently by evaluating the clause:

(Area, J) is in PS for all Area in PSA such that (J ∩ Copicture) ∩ Area is not empty.

A detailed proof that, by requiring Valid to be true, CoverParses will find only valid labels, and furthermore that CoverParses will find a complete parse structure, can be found in [9]. Also in [9] there is a short-cut that speeds up CoverParses by doing a relatively cheap preliminary test of validity before calling Valid.
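
A Python rendering of Valid on the same one-dimensional representation might read as follows. Passing Picture explicitly rather than holding it globally is our adaptation, and the laborious pixel-to-pixel form is sketched here rather than the PSA-based clause above.

def valid(A, im, PS, picture):
    """Sketch of Valid: can already-recorded object-images that do not
    touch area A occlude every disagreement between im and the picture?"""
    support = lambda img: {k for k, c in enumerate(img) if c != '*'}
    # Q: object-images recorded in PS whose support misses A entirely
    Q = {j for _, j in PS if not (support(j) & A)}
    copicture = picture
    while True:
        progress = False
        for j in Q:
            # each image is subtracted out at most once: once subtracted,
            # its overlap with copicture is empty and the guard fails
            if overlap_nonempty(j, copicture) and matches(j, copicture):
                copicture = subtract(copicture, j)
                progress = True
        if matches(im, copicture) or not progress:
            return matches(im, copicture)

To wire this into the cover_parses sketch, bind the extra argument first, e.g. cover_parses(picture, S, functools.partial(valid, picture=picture)).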

7. Complexity

The number of iterations of the repeat ... until loop of CoverParses has an upper bound of 1 + m · N, where m is the cardinality of S and N is the cardinality of the final PSA. N cannot be greater than the number of pixels in Picture. The repeat ... until loop in Valid is executed no more than m times, because each object-image is subtracted out of Copicture at most once. Since all other loops are deterministic, and all these loops are over the m elements of S or over the N areas in PSA, the worst-case time-complexity of Valid and CoverParses is polynomial.
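
As a rough worked bound (our arithmetic; the paper itself claims only polynomiality): let p be the number of pixels in Picture. Each of the at most 1 + m · N outer iterations of CoverParses examines m · N (Image, A) pairs; each examination costs O(p) pixel operations plus at most one call to Valid, and Valid costs at most m passes over at most m object-images at O(p) each, i.e. O(m^2 · p). The total is therefore O((1 + m · N) · m · N · (p + m^2 · p)) = O(m^4 · N^2 · p), and since N <= p this is O(m^4 · p^3), a polynomial in m and p.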

8. Conclusion

A parse structure is not such an articulate structural description as a complete enumeration of parses, but it has the very important advantage that it can be determined in polynomial time.

A strategic comment is that this work has shown that non-heuristic occlusion analysis is not wholly intractable.

Acknowledgments

This work was funded by UK Science and Engineering Research Council Grant number GR/C74751.

References

[1] Ayache, N. and O.D. Faugeras (1984). A new method for the recognition and positioning of 2D objects. Proc. 7th International Conference on Pattern Recognition, Montreal, July, 1274-1277.

[2] Bolles, R.C. and R.A. Cain (1982). Recognising and locating partially visible objects: the local feature focus method. International Journal of Robotics Research 1(3), 57-82.

[3] Hattich, W., W. Schwerdmann and H. Tropf (1982). Experience with two hybrid systems for the recognition of overlapping workpieces. Proc. 6th International Conference on Pattern Recognition, Munich, 670-673.

[4] Segen, J. (1984). Locating randomly orientated objects from partial view. Proc. SPIE 3rd International Conference on Robot Vision and Sensory Control, Vol. 449, Part 2, 676-684.

[5] Clowes, M.B. (1971). On seeing things. Artificial Intelligence 2(1), 79-116.

[6] Huffman, D.A. (1971). Impossible objects as nonsense sentences. In: D. Michie, Ed., Machine Intelligence 6, Edinburgh University Press.

[7] Ullmann, J.R. (1983). An investigation of occlusion in one dimension. Computer Vision, Graphics and Image Processing 22, 194-203.

[8] Cooper, M.C. (1988). Accelerated analysis of occlusion. Accepted for publication in Image and Vision Computing, February 1988.

[9] Cooper, M.C. (1987). Visual occlusion and the interpretation of ambiguous figures. Technical Report No. 87/4, Department of Computer Science, Hull, UK.

[10] Newell, M.E., R.G. Newell and T.L. Sancha (1972). A solution to the hidden surface problem. Proc. 1972 ACM National Conference, 443-448.
