
A Tensor Voting Approach for the Hierarchical Segmentation of 3-D Acoustic Images

Linmi Tao 1, Vittorio Murino 1, and Gérard Medioni 2

1 Dipartimento di Informatica, University of Verona, 37134 Verona, Italy

2 Institute for Robotics and Intelligent Systems, University of Southern California, Los Angeles, CA 90089-0273, USA

tao,murino@sci.univr.it, medioni@iris.usc.edu

Abstract

We present a hierarchical and robust algorithm addressing the problem of filtering and segmentation of three-dimensional acoustic images. This algorithm is based on the tensor voting approach – a unified computational framework for the inference of multiple salient structures. Unlike most previous approaches, this algorithm uses no models or prior information about the underwater environment, nor the intensity information of the acoustic images. Salient structures and outlier noisy points are directly clustered in two steps according to both the density and the structural information of the input data. Our experimental trials show promising results, very robust despite the low computational complexity.

1. Introduction

This paper addresses the filtering and the segmentation of noisy three-dimensional (3-D) point sets acquired by a high-resolution acoustic camera [1]. Such a camera is formed by a two-dimensional (2-D) array of acoustic transducers sensitive to signals backscattered from the scene previously insonified by a high-frequency acoustic pulse. The whole set of acquired raw signals is processed in order to estimate signals coming from a 2-D set of fixed steering directions (called beam signals) and to attenuate those coming from other directions. In this way, the 3-D points can be computed by detecting the time instant at which the maximum peak occurs in each beam signal. In addition, the amplitude of the maximum peak of each signal can be associated with the 3-D point to get an intensity image registered with the 3-D image.
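To make this acquisition step concrete, the following Python sketch extracts one 3-D point and one intensity value per beam signal by locating the maximum peak, assuming the raw signals have already been beamformed into a 2-D grid of steering directions; the sampling rate, sound speed, and angle-to-Cartesian mapping are illustrative assumptions, not the camera's actual processing chain.

```python
import numpy as np

def beams_to_points(beam_signals, azimuths, elevations,
                    fs=500e3, sound_speed=1500.0):
    """Convert a 2-D set of beam signals into 3-D points and intensities.

    beam_signals: array (n_az, n_el, n_samples), one echo envelope per
                  steering direction (assumed already beamformed).
    azimuths, elevations: steering angles in radians for each beam.
    fs: sampling frequency of the beam signals (assumed value).
    sound_speed: speed of sound in water, in m/s (assumed value).
    """
    n_az, n_el, _ = beam_signals.shape
    points, intensities = [], []
    for i in range(n_az):
        for j in range(n_el):
            sig = beam_signals[i, j]
            k = np.argmax(sig)                   # time index of the maximum peak
            rng = sound_speed * (k / fs) / 2.0   # two-way travel time -> range
            az, el = azimuths[i], elevations[j]
            # Steering direction (spherical) -> Cartesian coordinates.
            x = rng * np.cos(el) * np.sin(az)
            y = rng * np.sin(el)
            z = rng * np.cos(el) * np.cos(az)
            points.append((x, y, z))
            intensities.append(sig[k])           # peak amplitude = intensity
    return np.array(points), np.array(intensities)
```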

Unfortunately, raw images obtained by an acoustic camera are typically of quite poor quality, due to strongly degrading speckle noise, the non-ideal characteristics of the sensor transfer function (i.e., side lobes), and the wavelength, which is much longer than that of visible light [2]. Besides these intrinsic characteristics of the camera, the quality of acoustic images is also affected by the environmental conditions and by the pose of the surfaces relative to the direction of sonar pulse transmission, which may lead to images with a variable density of points.

Although preliminary low-level signal processing is performed directly in the acoustic camera, it has proved useful to filter the acoustic images further. The simplest form of noise filtering is to determine a threshold level by assuming that echoes have a stronger response than other cluttering interferences, although different applications and sensing configurations face different problems and several similar techniques are used in general. In the case of measuring seabed topography, Okino [3] set a threshold to separate the seabed echoes from reverberation. Henriksen [4] addressed the same problem in real-time underwater object detection for an autonomous underwater vehicle by setting two thresholds to reduce the amount of data and to remove noise efficiently. In [15], a method for the segmentation of simulated 3-D sonar images using surface fitting procedures was proposed to recover 3-D surface structures from sparse data.

Traditional optical image processing techniques are also frequently used in 3-D acoustic image processing. Filters or masks, such as the square Gaussian weighted average filter, the square selective weighted average filter, etc., are convolved with the raw acoustic data to smooth the images or reduce the noise [5]. Statistical approaches, typically Markov Random Fields [6], are applied to model the physical data acquisition process and the properties of the underwater environment, and then to restore or to segment the acoustic image [7,8,9]. The intensity information of the points is used as reliability information during the estimation process of these statistical approaches [10,11].

A wide range of information is utilized in these approaches, but the intrinsic structural information in the 3-D data has not been exploited yet. In other words, the points captured by the acoustic camera, although at low resolution (3 cm at best) and very noisy, succeed in capturing the inherent structural information of the surfaces in the scene. In this paper, a new approach is proposed to extract the structural information in acoustic images by convolving the images with a well-defined kernel. This approach, called tensor voting, was developed by Guy and Medioni for segmentation purposes [12, 13]. The input of the algorithm is the position of the points, and no prior information about the physical characteristics of the acoustic camera or the underwater environment is employed in the process. The original point sets are segmented hierarchically, according to their inherent structural information, into three groups: high-density structured points, low-density structured points, and noisy points.

The rest of the paper is organized as follows. Section 2 presents a short overview of the tensor voting approach. In Section 3, the proposed method for the segmentation of 3-D acoustic data is described. Experimental results with real data and the robustness of the method to high percentages of noise are presented in Section 4, and, finally, conclusions are drawn in Section 5.

2. A brief review of tensor voting

Tensor voting is a unified computational framework developed by Guy and Medioni [12, 13] for inferring multiple salient structures from sparse noisy data in 2-D or 3-D space. The methodology is grounded on two elements: tensor calculus for representation, and linear voting for communication. Local structures are uniformly represented by a second-order symmetric tensor, which effectively encodes a preferred direction, while avoiding early decisions on normal orientations and the maintenance of global orientation consistency. Data communication is accomplished by a linear voting process, which simultaneously ignores outlier noise, corrects erroneous orientations (if present), and detects surface orientation discontinuities. The methodology is non-iterative and robust to a considerable amount of outlier noise. The only free parameter is the scale of the voting kernel.

2.1 Second-order symmetric tensor

In 3-D, the second-order symmetric tensor is an ellipsoid, which is fully described by its associated eigensystem, with three eigenvectors $\hat{e}_1$, $\hat{e}_2$, and $\hat{e}_3$ (Fig. 1) and the three corresponding eigenvalues $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq 0$, that is, the tensor T can be written as

$$ T = \big[\hat{e}_1\ \hat{e}_2\ \hat{e}_3\big] \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} \hat{e}_1^T \\ \hat{e}_2^T \\ \hat{e}_3^T \end{bmatrix} = \lambda_1\hat{e}_1\hat{e}_1^T + \lambda_2\hat{e}_2\hat{e}_2^T + \lambda_3\hat{e}_3\hat{e}_3^T \qquad (1) $$

Rearranging the eigensystem, the ellipsoid is given by

$$ T = (\lambda_1 - \lambda_2)\,S + (\lambda_2 - \lambda_3)\,P + \lambda_3\,B \qquad (2) $$

where S defines a stick tensor, P defines a plate tensor and B defines a ball tensor (Fig. 1):

$$ S = \hat{e}_1\hat{e}_1^T, \qquad P = \hat{e}_1\hat{e}_1^T + \hat{e}_2\hat{e}_2^T, \qquad B = \hat{e}_1\hat{e}_1^T + \hat{e}_2\hat{e}_2^T + \hat{e}_3\hat{e}_3^T \qquad (3) $$

These tensors define the three basis tensors for an y general second-order symmetric 3-D tensor. By Equat ion (1), a linear combination of these basis tensors de fines any second-order symmetric tensor and, on the contr ary, any second-order symmetric tensor can be decomposed into three independent elements: stick tensor, plat e tensor, and ball tensor. From Equation (2), some features can be estimated, specifically: (a) the normal direction o f a surface, estimated by the eigenvector 1e and the relative

eigenvalue ( )21 λλ − , is encoded by the stick tensor; (b)

the tangent direction of a curve, indicated by the

eigenvector 3e and the eigenvalue ( )32 λλ − , is encoded

by the plate tensor; (c) the relative confidence of the

junction of curves, estimated by the eigenvalue 3λ , is

encoded by the ball tensor (Fig. 1).
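As a concrete illustration of this decomposition, the minimal numpy sketch below (not the authors' code) diagonalizes a 3×3 second-order symmetric tensor and returns its stick, plate, and ball saliencies together with the associated directions.

```python
import numpy as np

def decompose_tensor(T):
    """Decompose a 3x3 second-order symmetric tensor.

    Returns the saliencies (lam1 - lam2, lam2 - lam3, lam3) and the
    eigenvectors e1 (surface normal) and e3 (curve tangent).
    """
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    vals, vecs = np.linalg.eigh(T)
    lam3, lam2, lam1 = vals                  # reorder so lam1 >= lam2 >= lam3
    e1, e2, e3 = vecs[:, 2], vecs[:, 1], vecs[:, 0]
    stick_saliency = lam1 - lam2             # surface-ness, normal along e1
    plate_saliency = lam2 - lam3             # curve-ness, tangent along e3
    ball_saliency = lam3                     # junction-ness, no preferred direction
    return stick_saliency, plate_saliency, ball_saliency, e1, e3

# Example: a tensor dominated by its stick component (normal along the z axis).
T = np.diag([0.1, 0.2, 1.0])
print(decompose_tensor(T))
```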

2.2 The voting kernel

Differently from classical convolution kernels, the voting kernel defines the most likely normal by selecting the most likely continuation curve between two points (O and P in Fig. 2). The length of the normal vector at P, representing the strength of the vote, is defined by the following equation in spherical coordinates:

$$ DF(r, \varphi, \sigma) = \exp\!\left(-\frac{r^2 + c\,\varphi^2}{\sigma^2}\right) \qquad (4) $$

where $r$ is the arc length OP, $c$ is a constant chosen a priori, $\varphi$ is the curvature, and $\sigma$ is the scale of analysis, which is the only free parameter in the basic formalism. In the vote collection stage of tensor voting, the second-order moment contributions from each vector vote are aggregated, and the result is a second-order symmetric tensor.
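A minimal sketch of this decay function is given below; the value of the constant c and the example arguments are assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def decay(r, phi, sigma, c=0.1):
    """Vote strength DF(r, phi, sigma) = exp(-(r^2 + c*phi^2) / sigma^2).

    r:     arc length of the most likely continuation curve between O and P
    phi:   curvature of that curve
    sigma: scale of analysis (the only free parameter)
    c:     constant weighting curvature against distance (assumed value)
    """
    return np.exp(-(r**2 + c * phi**2) / sigma**2)

# Votes fade with distance and with the curvature needed to reach the receiver.
print(decay(r=2.0, phi=0.0, sigma=10.0))    # close, straight continuation -> strong
print(decay(r=25.0, phi=0.5, sigma=10.0))   # far, curved continuation -> weak
```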

2.3 The tensor voting process

In 2-D or 3-D space, each input site can be a point, a point with an associated normal direction, or any combination of the above. This input is first encoded into second-order symmetric tensors, namely the tokens. If the input sites carry only positional data, this information is transformed into an isotropic tensor with unit radius, i.e., a ball.


This encoding procedure is followed by the first voting stage, in which tokens communicate their information with each other within a neighborhood, according to a predefined voting kernel (see Eq. 4 and Fig. 2), and refine the information they carry. This stage is formally the same as a classical convolution. The difference between voting and convolution is that the voting kernel defines the relationship between the tensors based not only on their distances, but also on their relative normal directions. After this process, each token is a generic second-order symmetric tensor, encoding both the strength and the orientation of the input points.

Figure 1. A second-order symmetric tensor, its eigensystem, and its decomposition.

Figure 2. The relationship between two tokens, O and P. N is the normal at O.

The refined tokens vote again in the second stage. These generic tensor tokens propagate their information to all of their neighbors, regardless of whether tokens are present at those positions, based on the predefined voting kernel. This procedure leads to a dense tensor map, which covers every point in the domain. This processing is built on the assumption that some data points are lost in the sampling. In practice, the domain space is voxelized into a uniform array of voxels, and this second voting procedure is very time consuming.

The resulting dense tensor map is then decomposed into stick tensors, plate tensors, and ball tensors. Points that have a local maximum of strength along a direction of the stick tensors are selected as points belonging to a surface. Points that have a local maximum of strength along a direction of the plate tensors are labelled as points located on curves. Junctions are identified as the points having a local maximum of strength of the ball tensors, with no preferential direction. The remaining points are labelled as noisy points. The final output is the set of points located on surfaces, curves, and their junctions.

It is very difficult to describe the tensor voting procedure thoroughly in very few pages; hopefully, the basic idea can be conveyed by the following example.

An acoustic image is represented by a set of points in the three-dimensional space Ω. The first step of the tensor voting procedure is to voxelize the 3-D space Ω into small cubic volumes. Some cubes contain original 3-D points, but others may be empty. Suppose that A, B, and C are three of these cubes, and that one point is contained in A, one point is contained in B, whereas no points are present in C.
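A minimal sketch of this voxelization step is shown below (the cube size is an assumed value); occupied cubes are indexed by their integer coordinates, so empty cubes such as C simply never appear as keys.

```python
import numpy as np
from collections import defaultdict

def voxelize(points, cube_size=0.05):
    """Group 3-D points into cubic volumes of side cube_size (assumed 5 cm).

    Returns a dict mapping integer cube indices (i, j, k) to the list of
    points falling inside that cube; cubes containing no points are absent.
    """
    cubes = defaultdict(list)
    for p in np.asarray(points, dtype=float):
        idx = tuple(np.floor(p / cube_size).astype(int))
        cubes[idx].append(p)
    return cubes

# Two occupied cubes (like A and B); any cube with no points (like C) is missing.
cubes = voxelize([[0.01, 0.02, 0.00], [0.30, 0.31, 0.29]])
print(list(cubes.keys()))
```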

Subsequently, the points are encoded as ball tensors, since only position information is associated with the points. In the first voting stage, A and B exchange their information (or votes) with the other points and, as a result of the tensor voting process, the primary ball tensors are refined to generic second-order symmetric tensors.

In the second voting stage, A and B exchange their information with the other points again. The difference here is that cube C also collects information from its neighbors, even if it contains no points. The result of this stage of voting is that all of the cubes in the space Ω have been encoded by general tensors, which constitutes the so-called dense tensor map.

Finally, all of the tensors are decomposed, and the various structures, i.e., surfaces, curves, and junctions, are extracted from the stick, plate, and ball tensors.

3. Hierarchical filtering and segmentation

The proposed hierarchical filtering and segmentation procedure based on tensor voting has been developed trying to limit the computational complexity of the original technique, i.e., avoiding the time-consuming second voting stage. This technique has been named tensor convolution to indicate its relationship with the classical convolution operation, while differentiating it from the standard tensor voting.

3.1 The concept of tensor convolution

In the terminology of convolution, points collect information from their neighbors according to a predefined convolution kernel. Figure 3 shows a simple example of this new convolution process acting on both the points' strength (intensity) and their directions.

The convolution process considers a local area around each image point, carrying out an average of the points in this local area weighted by the values contained in the convolution kernel (e.g., see Fig. 3a).

A general convolution mask can be expressed as:

$$ P(\cdot) = \beta \sum_{i=1}^{n} x_i + \alpha\, y \qquad (5) $$

where $x_i$ is one of the $n$ neighbors of a point $y$. $x_i = 1$ or $0$ indicates whether or not a point is present at the corresponding neighboring location, while $y = 1$ since a point is always present at the location being convolved. For the convolution kernel of Fig. 3a, $\alpha = -8$, $\beta = 1$, and $n = 8$.

With reference to Figs. 3b and 3c, it is assumed that the sites marked by zero contain no points and the others, marked by 1 or P(), contain points. Suppose that the points at the sites P() are convolved with the convolution kernel shown in Fig. 3a; the difference between the convolution results of Fig. 3(b) and Fig. 3(c) is that P(b) > P(a), since the points in Fig. 3(c) are denser than those in Fig. 3(b).
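This density effect can be reproduced in a few lines of Python (a toy example with made-up occupancy patterns rather than the actual grids of Fig. 3): applying Eq. (5) with α = -8, β = 1, n = 8 to a sparse and to a dense 8-neighborhood shows that the denser neighborhood yields the larger response.

```python
import numpy as np

def mask_response(neighbors, alpha=-8.0, beta=1.0):
    """Eq. (5): P = beta * sum(x_i) + alpha * y, with y = 1 at the center."""
    return beta * np.sum(neighbors) + alpha * 1.0

sparse = np.array([1, 0, 1, 0, 0, 1, 0, 0])   # 3 of 8 neighbor sites occupied
dense  = np.array([1, 1, 1, 1, 1, 1, 1, 0])   # 7 of 8 neighbor sites occupied
print(mask_response(sparse), mask_response(dense))   # -5.0 < -1.0
```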

Further, suppose that the points have associated directions and that this information is utilized in the convolution process; Equation (5) then becomes:

$$ P(\cdot) = \beta \sum_{i=1}^{n} \vec{x}_i + \alpha\, \vec{y} \qquad (6) $$

where $\vec{x}_i$ and $\vec{y}$ are vectors.

Simulation examples of this new convolution algorithm are presented in Figs. 3(d) and 3(e), where the points have the same density but different associated normal directions. To easily compare the results of the two different convolution algorithms, suppose that:

$$ \|\vec{x}_i\| = \begin{cases} 0 & \text{no point} \\ 1 & \text{else} \end{cases} \qquad (7) $$

$$ \|\vec{y}\| = 1 \qquad (8) $$

where $\|\cdot\|$ is the length of a vector. Let the center points of Figs. 3d and 3e be convolved by the algorithm of Eq. (6); the results are shown by the dotted arrows in the same figures. These results suggest that both the structure information (the normal directions) and the density information play important roles in determining the polarity and the strength of the resulting vector.

In a general convolution, the number of neighbors is not restricted to eight and the length of a vector is not always one. In this case, the distance between $y$ and $x_i$ should be taken into account in the convolution algorithm, since neighbors at different distances contribute different values to the convolution process. This general convolution algorithm is described as:

$$ P(\cdot) = \sum_{i=1}^{n} \beta\big(d(x_i, y)\big)\, \vec{x}_i + \alpha\, \vec{y} \qquad (9) $$

where d(xi, y) is the distance between the points xi and y.
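A minimal numpy sketch of Eq. (9) is given below, using the decay of Eq. (4) with zero curvature as the distance weight β(d), as prescribed later in the text; the neighbor positions, the vectors, and the value α = 1 are made up for illustration.

```python
import numpy as np

def beta(d, sigma=10.0):
    """Distance-dependent weight: the decay of Eq. (4) with phi = 0."""
    return np.exp(-d**2 / sigma**2)

def vector_convolution(y_pos, y_vec, neighbor_pos, neighbor_vec,
                       alpha=1.0, sigma=10.0):
    """Eq. (9): P = sum_i beta(d(x_i, y)) * x_i + alpha * y (vector form).

    alpha = 1 is an assumed value for this illustration.
    """
    result = alpha * np.asarray(y_vec, dtype=float)
    for p, v in zip(neighbor_pos, neighbor_vec):
        d = np.linalg.norm(np.asarray(p, float) - np.asarray(y_pos, float))
        result += beta(d, sigma) * np.asarray(v, dtype=float)
    return result

# Aligned neighbor directions reinforce each other; mixed ones tend to cancel.
y_pos, y_vec = [0, 0, 0], [0, 0, 1]
positions = [[1, 0, 0], [0, 1, 0], [-1, 0, 0]]
aligned = vector_convolution(y_pos, y_vec, positions,
                             [[0, 0, 1], [0, 0, 1], [0, 0, 1]])
mixed = vector_convolution(y_pos, y_vec, positions,
                           [[1, 0, 0], [0, -1, 0], [0, 0, -1]])
print(np.linalg.norm(aligned), np.linalg.norm(mixed))   # aligned is larger
```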

Figure 3. The density and structural information in a convolution process (P(b) > P(a)). Panels (a)–(e); see text for explanation.

One more problem is that this definition of the convolution only works with points that already have associated normal directions. For points that have only position information, a method should be defined for inducing a preferred direction during the convolution process. This method is provided by tensor voting, by setting up the most likely continuation curves between two points (Fig. 2).

Finally, the convolution kernel of the tensor convolution is defined by Equation (9) and Figure 2, while $\beta(d(x_i, y))$ is defined in detail by Equation (4). By this convolution technique, some of the inherent structural information hidden in the clouds of points can be uncovered.

In the case of acoustic images, the input is only the location information of the points in 3-D space, and it is encoded by a bundle of unit vectors spanning all possible directions. This bundle of unit vectors is represented by a unit ball tensor. However, the encoding process does not produce any prior directional information on the points that could be utilized in the convolution process.

According to the definition of the convolution kernel, the most likely continuation curves are used to connect any two points involved in the convolution (e.g., O and P in Fig. 4). In this example (Fig. 4), the information generated by O at P is expressed by vectors, whose directions are defined by the normals of the most likely continuation curves and whose magnitudes can be calculated by the convolution algorithm of Eq. (9). Thus, the convolution process derives the preferred orientation at P based on the relative position between O and P.

For acoustic images, the induced direction represents the normal direction of a point, embedding the inherent structure of the surface from which the acoustic pulses are reflected. In other words, the inherent structural information is exploited in this process.
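The following sketch approximates this induction of normals for position-only points: the vote cast at P by a voter at O is taken as the decay-weighted projector onto the plane perpendicular to the O→P direction, a simplification of the full ball voting field of [12, 13] rather than the authors' exact kernel; accumulating such votes and taking the dominant eigenvector of the result yields the induced normal.

```python
import numpy as np

def ball_vote(voter, receiver, sigma=10.0):
    """Approximate second-order vote cast at `receiver` by a position-only
    token at `voter`.

    Simplifying assumption: the vote is the decay-weighted projector onto the
    plane perpendicular to the voter->receiver direction, so that neighbors
    lying on a common surface keep reinforcing that surface's normal.
    """
    d = np.asarray(receiver, float) - np.asarray(voter, float)
    r = np.linalg.norm(d)
    if r == 0.0:
        return np.eye(3)                      # coincident points: a unit ball
    v = d / r
    w = np.exp(-r**2 / sigma**2)              # decay of Eq. (4), straight continuation
    return w * (np.eye(3) - np.outer(v, v))

def induced_normal(receiver, neighbors, sigma=10.0):
    """Accumulate votes at `receiver`; return (stick saliency, induced normal)."""
    T = sum(ball_vote(p, receiver, sigma) for p in neighbors)
    vals, vecs = np.linalg.eigh(T)            # ascending eigenvalues
    return vals[2] - vals[1], vecs[:, 2]      # lambda1 - lambda2, e1

# Coplanar neighbors (z = 0 plane) induce a normal close to the z axis.
neighbors = [[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0], [2, 1, 0]]
print(induced_normal([0, 0, 0], neighbors))
```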

Figure 4. The convolution between O and P in 3-D space. The dotted lines are the most likely continuation curves between O and P, and the vectors are their normals at P.

3.2 Hierarchical segmentation and filtering

The framework of the proposed hierarchical acoustic image segmentation approach is shown in Fig. 5. In this framework, the acoustic images are encoded as ball tensors, convolved, and decomposed twice into the three elements of the eigensystem of the second-order symmetric tensors. Each convolution and decomposition is followed by a segmentation using a different threshold based on the eigenvalues of the decomposed stick tensors. The result of the two segmentation processes is that the high-density structured points are separated from the low-density structured points, and the noisy points are filtered out from the low-density structured points. This technique is robust to several levels of noise.

First of all, the points in an acoustic image are transformed into isotropic tensors of unit radius (ball tensors), since only the three-dimensional position information of the points is utilized in this approach.

Figure 5. The framework of the hierarchical segmentation based on tensor voting.

In the first convolution stage, every point is processed using the predefined convolution kernel (Eq. 9, Fig. 2). This procedure is the same as the general convolution operation except for the convolution kernel. The result of this process is that each point is now encoded by a generic second-order symmetric tensor (ellipsoid tensor) instead of the primary ball tensor.

As mentioned above, the structural information in acoustic images indicates the surfaces that backscatter the sonar pulses. The underlying assumption is that the structural information in acoustic images is encoded by the similarity of the normal directions of the structured points.

Indeed, the normal directions of points located on the same surface change smoothly. This is the well-known smoothness constraint proposed by Marr [17]. This information, the similarity of the normal directions of the structured points, can be extracted by the tensor convolution and, as simulated in Fig. 3(d) and Fig. 3(e), similarity in orientations tends to generate much larger values than randomly distributed directions.

After the first tensor convolution, the resulting generic tensors are decomposed into the three elements: stick, plate, and ball tensors (Fig. 1). In this paper, we are only interested in extracting the set of structured points, or, in other words, points lying on surfaces, from the randomly distributed points, and this information is encoded in the stick tensors. Thus, based only on the eigenvalues of the stick tensors, $(\lambda_1 - \lambda_2)$, the acoustic image is segmented into two images: high-density structured points (HDS Points, Fig. 5) and mixed points (MIX Points, Fig. 5), which include the low-density structured points and the noisy points.

In order to filter the MIX Points, the points in this image are encoded as ball tensors again. In other words, only the positional information of the points is employed to encode them, while all information from the first tensor convolution stage is discarded.

Similarly to the first tensor convolution stage, the ball tensors are convolved and the result is decomposed again. The segmentation is still based on the eigenvalues of the stick tensors and, finally, the noisy points are filtered out.
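The whole two-pass procedure of Fig. 5 can be summarized in the following Python sketch, a schematic re-implementation that uses the simplified ball vote discussed above and threshold fractions standing in for T1 and T2; it is not the authors' code, and the real kernel also accounts for curvature (Eq. 4).

```python
import numpy as np
from scipy.spatial import cKDTree

def stick_saliency(points, sigma=10.0):
    """Eigen-gap (lam1 - lam2) of the accumulated vote tensor at each point."""
    pts = np.asarray(points, float)
    tree = cKDTree(pts)
    sal = np.zeros(len(pts))
    for i, p in enumerate(pts):
        T = np.zeros((3, 3))
        for j in tree.query_ball_point(p, 3.0 * sigma):  # neighbors within ~3 sigma
            if j == i:
                continue
            d = pts[j] - p
            r = np.linalg.norm(d)
            if r == 0.0:
                continue
            v = d / r
            w = np.exp(-r**2 / sigma**2)                 # decay of Eq. (4), phi = 0
            T += w * (np.eye(3) - np.outer(v, v))        # simplified ball vote
        vals = np.linalg.eigvalsh(T)                     # ascending eigenvalues
        sal[i] = vals[2] - vals[1]                       # lam1 - lam2
    return sal

def hierarchical_segment(points, sigma=10.0, t1_frac=0.10, t2_frac=0.20):
    """Two-pass segmentation into HDS, LDS, and noisy points (sketch)."""
    pts = np.asarray(points, float)
    # First convolution + segmentation: HDS points vs MIX points.
    sal1 = stick_saliency(pts, sigma)
    hds_mask = sal1 >= t1_frac * sal1.max()
    hds, mix = pts[hds_mask], pts[~hds_mask]
    # Re-encode MIX as plain positions, convolve and segment again: LDS vs noise.
    sal2 = stick_saliency(mix, sigma)
    lds_mask = sal2 >= t2_frac * sal2.max()
    return hds, mix[lds_mask], mix[~lds_mask]
```

With σ = 10 and the 10% and 20% threshold fractions reported in Section 4, this sketch mirrors the processing chain of Fig. 5: one convolution and segmentation to isolate the HDS points, followed by a second, independent convolution and segmentation of the remaining points.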

3.3 The segmentation thresholds

The threshold used in the first segmentation depends on the difference between the distribution densities of the high-density and the low-density point sets. The threshold used in the second segmentation is mainly based on the density of the noisy point set.

In our experiments, the thresholds are heuristically set to 10 to 20 percent of the largest eigenvalue gap $(\lambda_1 - \lambda_2)$ of the stick tensors, since the histograms of the eigenvalues show a very sharp separation between the structured points and the noisy points in our experiments (Fig. 6).

Figure 6. The distribution of $V^{\lambda}_{G_x}$ (VALUE) after the first (a) and second (b) convolution stage. (a) Reference image: Fig. 7a, 3rd image from left (T1 = 18.5, σ = 10, total points = 2459). (b) Reference image: Fig. 7c, 3rd image from left (T2 = 0.41, σ = 10, total points = 1773).

According to Equation (4), the resulting generic tensors are sensitive to the distances between the points and, as indicated in [2], 3-D acoustic images typically have a very variable point density due to the varying relationship between the surface normals and the acoustic pulse directions. A high-density point distribution occurs when the acoustic pulse direction is nearly parallel to the surface normal, whereas a low-density point distribution occurs when the angle between these directions is large. In short, a threshold, T1, is necessary to separate high-density structured (HDS) points from low-density structured (LDS) points, and another threshold, T2, should also be set to segment LDS points from non-structured noisy points.

Unfortunately, these two thresholds (T1 > T2) cannot be directly applied to the convolution results. Actually, the points in the acoustic images can be grouped into four groups: G1) points in high-density structured sets, G2) noisy points near high-density structured sets, G3) points in low-density structured sets, G4) noisy points. After the first convolution, the variation trend of the eigenvalue gaps of the decomposed stick tensors, $V^{\lambda}_{G_i} = (\lambda_1 - \lambda_2)$, along the groups $G_i$ (i = 1,...,4) is the following:

$$ V^{\lambda}_{G_1} \geq V^{\lambda}_{G_2} \quad \text{and} \quad V^{\lambda}_{G_1} \geq V^{\lambda}_{G_3} \qquad (10) $$

$$ V^{\lambda}_{G_2} \geq V^{\lambda}_{G_4} \quad \text{and} \quad V^{\lambda}_{G_3} \geq V^{\lambda}_{G_4} \qquad (11) $$

A typical example is shown in Fig. 6a, in which $V^{\lambda}_{G_2}$ and $V^{\lambda}_{G_3}$ are both close to zero (less than 20). Obviously, there is no good way to set an appropriate threshold (T2) for segmenting G2 from G3. Indeed, $V^{\lambda}_{G_2}$ is much larger than $V^{\lambda}_{G_3}$ in this example (Figs. 6a and 6b). The values $(V^{\lambda}_{G_2}, V^{\lambda}_{G_3})$ depend on both the density of G3 and the distance between G2 and G1.

This problem can be solved by the hierarchical convolution and segmentation approach. A threshold (T1) is set heuristically to separate G1 from the other groups. In this experiment T1 is simply set to 10% of the largest value, since the histogram is very sharp in the area near zero. Note that there are some points whose values are larger than 2 in Fig. 6a, but there are no points whose values are larger than 2 in Fig. 6b.

The only difference between the two convolutions is that the HDS points (G1) are removed before the second one. That means that, after the first segmentation, G2 reverts to "normal" noise points, since there are no longer any HDS points near G2, and the associated values of these points (e.g., $V^{\lambda}_{G_2} \in (2, T_1)$) go back to "normal" values ($V^{\lambda}_{G_2} \in (0, 0.41)$), which are less than $V^{\lambda}_{G_3}$. This result can be verified by comparing T1 in Fig. 6a with the $V^{\lambda}_{G_x}$ distribution in Fig. 6b.

As a result, a hierarchical convolution technique is proposed for segmenting HDS points, LDS points, and noisy points in acoustic images. The first segmentation separates the original image into two images: the HDS image (Fig. 7(b)), which includes the HDS points (G1), and the MIX image (Fig. 7(c)), which contains the other groups (G2, G3, G4).

The points in the MIX image are encoded as ball tensors again. The second convolution is performed with the same convolution kernel, and the results are again decomposed into their three elements accordingly.

The second segmentation is still based on the strength values of the stick tensors resulting from the decomposition. Heuristically, the second threshold (T2) is set to separate the LDS points (G3) from the noise points (G2, G4). Finally, the LDS image (Fig. 7(d)), which includes the LDS points, is segregated from the MIX image.

3.4 The difference between hierarchical segmentation and tensor voting

It should be emphasized that the second tensor convolution in the hierarchical segmentation differs from the second stage of tensor voting, in which the generic tensors propagate their information to all of their neighbors, in three respects. First, the second tensor convolution is still based on the sparse input points, whereas the second stage of tensor voting is dense. Second, the three-dimensional space is not voxelized into unit cubes in the tensor convolution, whereas it is voxelized in practice for generating the dense tensor map in the second stage of tensor voting. Third, the information generated by the first convolution is not carried into the second convolution process, except for separating the HDS and MIX point sets, whereas the second voting stage utilizes the refined tensors.

4. Experimental results

We demonstrate the general usefulness of the hierarchical segmentation and filtering approach with one real 3-D acoustic image and three noise-corrupted versions of it (Fig. 7). The original image (Fig. 7a, no noise) was acquired by an acoustic camera (Echoscope 1600, Omnitech A/S). The objects in the image constitute an offshore rig made of pipelines, and a piece of seabed. The other images (Fig. 7a, 50% Noise, 100% Noise, and 200% Noise) are synthesized from the original image by adding different percentages of clutter points. The noisy points are uniformly distributed in the 3-D space and are added with reference to the original number of points; the percentages are 50%, 100%, and 200%.

In these images, the HDS points represent the pipeline structure nearly perpendicular to the image plane of the acoustic camera, while the LDS clouds are divided into two parts: two tubes above the HDS cloud and a piece of seabed below it.

In these experiments, the scale parameter of Equation (4) is always set to 10 (σ = 10), since the segmentation results are not sensitive to this parameter within a reasonable range. The threshold for the first segmentation, as discussed above in detail, is 10% of the maximum of $(\lambda_1 - \lambda_2)$, and the threshold for the second segmentation (after HDS point removal) is set to 20% of the maximum of the eigenvalue gaps of the stick tensors. The segmentation results are not sensitive to a range of these thresholds either, because the convolution process leads to a large difference between the eigenvalues of the structured points and those of the noisy points.

The acoustic images are segmented into two parts in the first segmentation (Fig. 7a, X% Noise → Fig. 7b X% Noise + Fig. 7c X% Noise). Evidently, these results are hardly affected by the different percentages of noise points. Indeed, the segmentation is still stable up to 1000% noise in our experiments, a case in which the HDS point set is almost indistinguishable by eye in the image.

The points in the images of Fig. 7c are encoded as ball tensors and convolved again. The resulting tensors are decomposed, and the LDS clouds are extracted from the noisy points based on the $(\lambda_1 - \lambda_2)$ values. The outcome is illustrated in Fig. 7d. All of the low-density structures are well separated from the noise. This result demonstrates that the algorithm is robust at over 200% noise for the LDS points.

5. Conclusions

A tensor convolution approach based on the computational framework of tensor voting has been proposed to filter and to segment noisy 3-D acoustic images. This approach extracts both the point distribution information and the inherent structural information directly from the input data by using a hierarchical convolution and segmentation scheme. Unlike traditional convolution methods, the new convolution kernel, based on tensor voting, is introduced for 3-D data filtering and segmentation, exploiting the inherent structural information. Experiments demonstrate that this new approach can extract low-density structured points from noisy images, which is usually a difficult task for three-dimensional data filtering algorithms. The experiments also show that the tensor convolution is robust with respect to different noise levels.

Figure 7. Filtering and segmentation experiments (each panel shows the No Noise, 50% Noise, 100% Noise, and 200% Noise cases). (a) Noisy acoustic images. (b),(c) Results of the first segmentation stage: the HDS point set (b) and the MIX point set (c) are extracted. (d) Results of the second segmentation stage: the noisy points are filtered out and the LDS point set is extracted.

ACKNOWLEDGMENTS

This work is supported by the European Commission under project no. GRD1-2000-25409, named ARROV – Augmented Reality for Remotely Operated Vehicles based on 3D acoustical and optical sensors for underwater inspection and survey. Acoustic images have been acquired by the Echoscope 1600 acoustic camera and are courtesy of Dr. R.K. Hansen, Omnitech A/S (Norway).

6. References

[1] R. K. Hansen and P. A. Andersen, "A 3-D Underwater Acoustic Camera – Properties and Applications", in P. Tortoli and L. Masotti (eds.), Acoustical Imaging, Plenum Press, London, 1996, pp. 607-611.

[2] V. Murino, A. Trucco, "Three-dimensional Image Generation and Processing in Underwater Acoustic Vision", Proceedings of the IEEE, 2000, 88(12), pp. 1903-1946.

[3] M. Okino, Y. Higashi, "Measurement of Seabed Topography by Multibeam Sonar Using CFFT", IEEE J. Oceanic Engineering, 1986, 11(4), pp. 474-479.

[4] L. Henriksen, "Real-time Underwater Object Detection Based on an Electrically Scanned High-resolution Sonar", Proc. IEEE Symposium on Autonomous Underwater Vehicle Technology, Cambridge, 1994.

[5] D. Sauter, L. Parson, "Spatial Filtering for Speckle Reduction, Contrast Enhancement, and Texture Analysis of GLORIA Images", IEEE J. Oceanic Engineering, 1994, 19(4), pp. 563-576.

[6] S. Z. Li, Markov Random Field Modeling in Computer Vision, Springer-Verlag, Tokyo, 1995.

[7] V. Murino, A. Trucco, "A Markov-based Methodology for the Restoration of Underwater Acoustic Images", International J. of Imaging Systems and Technology, 1997, 8(4), pp. 386-395.

[8] M. Mignotte, C. Collet, P. Pérez, P. Bouthemy, "Markov Random Field Model and Fuzzy Formalism-based Data Modeling for the Sea-bed Classification in Sonar Imagery", SPIE Conference on Mathematical Modeling, Bayesian Estimation and Inverse Problems, Colorado, 1999, 3816(29), pp. 229-240.

[9] S. Dugelay, C. Graffigne, J. M. Augustin, "Segmentation of Multibeam Acoustic Imagery in the Exploration of the Deep Sea Bottom", Proceedings of the 13th International Conference on Pattern Recognition, Vienna, 1996, pp. 437-445.

[10] V. Murino, A. Trucco, "Confidence-based Approach to Enhancing Underwater Acoustic Image Formation", IEEE Transactions on Image Processing, 1999, 8(2), pp. 270-285.

[11] V. Murino, A. Trucco, C. S. Regazzoni, "A Probabilistic Approach to the Coupled Reconstruction and Restoration of Underwater Acoustic Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(1), pp. 9-22.

[12] G. Guy and G. Medioni, "Inference of Surfaces, 3-D Curves and Junctions from Sparse, Noisy 3-D Data", IEEE Trans. Patt. Analy. Machine Intell., 1997, 19(11), pp. 1265-1277.

[13] G. Medioni, M.-S. Lee, C.-K. Tang, A Computational Framework for Segmentation and Grouping, Elsevier, Oxford, 2000.

[14] D. A. Danielson, Vectors and Tensors in Engineering and Physics, Addison-Wesley Publishing Company, London, 1996.

[15] L. V. Subramaniam, R. Bahl, "Segmentation and Surface Fitting of Sonar Images for 3D Visualization", Proc. 8th Int. Symp. on Unmanned Untethered Submersible Technology, Durham (NH, USA), 1995, pp. 290-298.

[16] R. K. Hansen, P. A. Andersen, "A 3-D Underwater Acoustic Camera – Properties and Applications", in P. Tortoli and L. Masotti (eds.), Acoustical Imaging, Plenum Press, 1996.

[17] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, W. H. Freeman and Co., San Francisco, 1982.