Point Cloud Transport - Osaka Universitymakihara/pdf/icpr2012_nakajima.pdf · Point Cloud Transport Hozuma Nakajima, Yasushi Makihara, Hsu Hsu, Ikuhisa Mitsugami, Mitsuru Nakazawa,

Point Cloud Transport

Hozuma Nakajima, Yasushi Makihara, Hsu Hsu, Ikuhisa Mitsugami, Mitsuru Nakazawa,Hirotake Yamazoe, Hitoshi Habe, Yasushi Yagi

ISIR, Osaka University, Japan{nakajima, makihara, hsu, mitsugami, nakazawa, yamazoe, habe, yagi}@am.sanken.osaka-u.ac.jp

Abstract

In this paper we propose a method for temporal in-terpolation of a point cloud undergoing occlusions andtopological changes. The point cloud is first mergedinto fine clusters, which are then further merged intocoarse clusters for each source and target shape. Inconjunction with trash box bins to cope with occlusions,a coarse correspondence between a source and a targetshape is found that minimizes the transportation costin the earth mover’s distance framework. Subsequently,a fine correspondence is found in a similar way basedon the coarse correspondence constraint to suppress lo-cally isolated motion. Finally, the source and targetpoint clouds are transported based on the fine corre-spondence. Experiments with point cloud sequencescaptured by a Kinect range finder show promising re-sults.

1. IntroductionRecently, range data have been used in a wide range

of fields including 3-D face recognition [3], 3-D facemodeling [1], human pose estimation [12], and culturalheritage modeling [9]. To capture dynamic objects (e.g.,facial expressions and human poses), range data withhigher frame-rates are preferable in several applicationsfor better visualization and performance.

One of the ways of generating such range data istemporal interpolation by shape morphing techniques,which fall mainly into two categories: (1) surface-basedapproaches, and (2) volume-based approaches.

The surface-based approaches usually start by find-ing correspondences between the source and target sur-faces, which is also an important process for non-rigidsurface registration, otherwise known as deformable ob-ject registration [11, 6]. Most of the surface-based ap-proaches, however, cannot successfully cope with topo-logically different shapes. As an exception, Bronsteinet al. [5] proposed a topology-invariant similarity com-prising both intrinsic and extrinsic similarities. In the

method, a shape is modeled as a metric space witha two-dimensional smooth compact connected surfaceembedded in the three-dimensional Euclidean space [4].Acquiring such smooth compact connected surfaces,however, could be difficult if the range data contain sig-nificant noise and also discontinuities due to occlusions.

The volume-based approaches treat topologicallydifferent shapes in a more natural way by interpolat-ing signed distance fields for source and target shapes(e.g., a positive value for the inside shape and a nega-tive value for the outside shape) [10] or by applying alevel-set method to the signed distance fields [2]. How-ever, they naturally require volume data with a closedsurface and hence cannot be applied to range data withan open surface observed from a single view.

In this paper, we propose an interpolation methodfor range data with open surfaces containing noise, dis-continuities due to occlusions, and topological changes.Because it is difficult to reconstruct smooth surfacesfrom such range data, we formulate this interpolation aspoint cloud transport rather than non-rigid surface de-formation. In this formulation, a point cloud in a sourceshape is directly warped into that of a target shape be-yond topological changes.

The proposed method was inspired by topology-freevolume-based 2D shape morphing [8], which, however,suffers from incorrect correspondences due to occlu-sions, as well as from locally isolated motion due to theabsence of constraints on the smoothness of adjacentmovements.

To overcome such problems, we introduce (1) trashbox bins, and (2) a coarse-to-fine framework. Trash boxbins realize appearing and disappearing processes forpoints with no plausible correspondences due to occlu-sions. Such points in the source and target shapes go toor come from the trash box bins equipped with the targetand source shapes, respectively. In the coarse-to-fineframework, the coarse correspondence is employed asa global constraint for the fine correspondence, therebysuppressing undesirable locally isolated motion.

21st International Conference on Pattern Recognition (ICPR 2012)November 11-15, 2012. Tsukuba, Japan

978-4-9906441-1-6 ©2012 IAPR 3803

Figure 1. Overview. In (b) and (c), eachpoint is depicted by an alpha-blendedcolor based on membership to each clus-ter. In (d) and (e), flows from source totarget clusters are depicted by a colortransition from yellow to cyan, while acolor transition from/to red indicates theappearance/disappearance by trash boxbins. In (f), interpolated point clouds areshown with a transition rate of 0.25, 0.5,and 0.75 from left to right.

2. Proposed framework2.1. Overview

An overview of the proposed framework is given inthis section (see Fig. 1). A point cloud is first mergedinto fine clusters, which are then further merged intocoarse clusters for each source and target shape. Acoarse correspondence between a source and a targetshape is found in conjunction with the trash box bins tocope with occlusions. Subsequently, a fine correspon-dence is found in conjunction with the coarse corre-spondence constraint to suppress locally isolated mo-tion. Finally, source and target point clouds are trans-ported based on the fine correspondence. Details ofeach of these steps are given in the following subsec-tions.

2.2. ClusteringGiven a point cloud composed of N 3D data points

{x i} (i = 0, . . . , N − 1), NF fine clusters are obtainedby fuzzy C-means (FCM) clustering [7]. We define thej-th fine cluster’s mean x̄Fj and weight wFj as

x̄Fj =

∑N−1i=0 mFijx i∑N−1i=0 mFij

, wFj =N−1∑i=0

mFij , (1)

where mFij is the membership of the i-th 3D data pointto the j-th fine cluster, which holds

∑NF−1j=0 mFij=1 ∀i.

Furthermore, NF fine clusters are merged into NC

coarse clusters in the same way. We define the l-thcoarse cluster’s mean x̄Cl and weight wCl as

x̄Cl=

∑NF−1j=0 mCjlwFj x̄Fj∑NF−1

j=0 mCjlwFj

, wCl=

NF−1∑j=0

mCjlwFj , (2)

where mCjl is the membership of the j-th fine cluster tothe l-th coarse cluster, which satisfies

∑NC−1l=0 mCjl =

1 ∀j. Figures 1(b) and (c) illustrate examples of themembership of the fine and coarse clusters.

2.3. Coarse correspondenceThe second step involves acquiring a coarse corre-

spondence so as to minimize the transportation costfrom the source to target shapes in a similar way tothat given in [8]. Let the sets of means and weights forthe source coarse clusters be X̄s

C = {x̄ sCl} and ws

C ={ws

Cl} (l = 0, . . . , NsC − 1) and those for the target

coarse clusters be X̄tC = {x̄ t

Cm} and w tC = {wt

Cm}(m = 0, . . . , N t

C − 1), where NsC and N t

C are the num-ber of source and target coarse clusters, respectively.Note that superscripts s and t denote source and target,respectively.

Moreover, an extra cluster for the trash box bin isadded for each source and target shape. As a result, thenumbers of source and target coarse clusters are incre-mented to Ns

C+1 and N tC+1, respectively. For notation

convenience, we assign such extra clusters to the NsC-

th source and N tC-th target coarse clusters, respectively.

Weights wsCNs

Cfor the Ns

C-th source coarse cluster andwt

CNtC

for the N tC-th target coarse cluster are then set

to the sum of the weights of the opposite-side coarseclusters excluding the trash box bins; more specifically,ws

CNsC

=∑Nt

C−1m=0 wt

Cm and wtCNt

C=

∑NsC−1

l=0 wsCl,

respectively. Note that this trash box weight setting en-ables all the clusters to be accommodated by the trashbox bins in the transport stage; that is, entirely appear-ing or disappearing in an extreme case.

The transportation cost and flow (transportationamount) from the l-th coarse cluster of the source shapeto the m-th coarse cluster of the target shape includ-ing the trash box bins are denoted as dClm and fClm

(l = 0, . . . , NsC ,m = 0, . . . , N t

C), respectively. Thetransportation cost is then defined as

dClm =

∥x̄ t

Cm−x̄ sCl−̄v∥2 (l<Ns

C and m<N tC)

0 (l=NsC and m=N t

C)

dCtb2 otherwise

, (3)

where v̄ is a translation vector between the gravity cen-ters of the source and target point clouds, and dCtb is thetrash box cost for coarse clusters, which is equivalent tothe threshold for appearing or disappearing in the caseof occlusion. Therefore, the trash box cost must be setso that it is larger than the maximum Euclidean distancefor plausible point cloud transportation, but smaller thanthe Euclidean distance by incorrect correspondence.

Finally, the earth mover’s distance (EMD) flows areoptimized so as to minimize the transportation cost bythe Hungarian algorithm.

3804

{f∗Clm} = arg min

{fClm}

NsC∑

l=0

NtC∑

m=0

fClmdClm, (4)

which is subject to∑Ns

C

l=0 fClm = wtCm ∀m,∑Nt

Cm=0 fClm = ws

Cl ∀l, and fClm ≥ 0 ∀l,m. Now,we can regard the obtained {f∗

Clm} as the cluster-basedmany-to-many correspondence weight coefficients be-tween the source and target shapes. Figure 1(d) showsan example of the coarse flows.

2.4. Fine correspondenceThe third step involves acquiring a fine correspon-

dence. To suppress locally isolated motion, we considerthe constraint that the fine clusters’ flows are close tothe dominant coarse flows. For this purpose, the domi-nant flow vs

Cl from the l-th source coarse cluster is firstobtained as

vsCl=

{̄x tCm′−x̄ s

Cl (m′<N t

C)v̄ (m′=N t

C), m′=argmax

mf∗Clm. (5)

Note that a translation vector v̄ between the gravity cen-ters of the source and target point clouds is substitutedas above when the flow to the trash box is dominant(m′ = N t

C).The dominant flow vs

Fj from the j-th source finecluster is then interpolated based on the coarse domi-nant flow {vs

Cl} and membership {msCjl} of the source

fine clusters to the coarse clusters is given as

vsFj =

∑NsC−1

l=0 msCjlv

sCl∑Ns

C−1

l=0 msCjl

. (6)

The dominant flow v tFk to the k-th target fine cluster is

obtained in a similar way.The transportation cost and flow from the j-th source

fine cluster to the k-th target fine cluster are denoted asdFjk and fFjk (j = 0,. . . ,Ns

F , k = 0,. . . ,N tF ), respec-

tively. The total transportation cost is then defined as

dFjk=

min{∥x̄ t

Fk−x̄ sFj−vs

Fj∥2,∥x̄ t

Fk−x̄ sFj−v t

Fk∥2} (j<NsF and k<N t

F )

0 (j=NsF and k=N t

F )

dFtb2 otherwise

,

(7)where dFtb is the trash box cost for fine clusters. Notethat the above transportation cost is defined so that it issmall as the correspondence approaches either of the in-terpolated dominant flows vs

Fj and v tFk from the source

and to the target, respectively. In addition, the trash boxcost dFtb for fine clusters should be smaller than that forcoarse clusters dCtb to avoid locally isolated motion.

Finally, the EMD flows are optimized so as to min-imize the transportation cost in the same way as forcoarse clusters and the obtained flows are denoted as{f∗

Fjk}. Figure 1(e) illustrates an example of the fineflows.

(a) Ball handling (b) SittingFigure 2. Range data. Left and right im-ages show the source and target.

2.5. TransportPoint clouds of the source and target shapes are

transported according to a transition rate α. From atransport point of view, an intermediate point cloud isexpressed as a set of multiple weighted data points, be-cause there are multiple point clouds that come fromboth the source and target shapes, and which are derivedfrom multiple memberships of each 3D data point tothe clusters as well as many-to-many correspondencesas a result of the EMD framework. In fact, the i-th 3Ddata point x s

i in the source point cloud is transported tox sijk (j = 0, . . . , Ns

F − 1) by the flow f∗Fjk from the

j-th source fine cluster to the k-th target fine cluster byweight ws

ijk, expressed as

x sijk =

{x si+α(x̄ t

Fk−x̄ sFj) (k < N t

F )

x si+αvs

Fj otherwise(8)

wsijk =(1− α)ms

Fijf∗Fjk, (9)

The i-th 3D data point in the target point cloud is trans-ported to x t

ijk (k = 0, . . . , N tF − 1) by weight wt

ijk inthe same way, expressed as

x tijk =

{x ti−(1−α)(x̄ t

Fk−x̄ sFj) (j < Ns

F )

x ti−(1−α)v t

Fk otherwise(10)

wtijk =αmt

F ijf∗Fjk. (11)

Figure 1(f) shows examples of the transition of pointcloud transport.

3. Experiments3.1. Setup

To evaluate the proposed method, range data for sev-eral human actions were captured by a KinectTM rangefinder, as shown in Fig. 2. Foreground range data wereextracted by background subtraction and a point cloudwas acquired by projecting the foreground range datainto 3D space based on the intrinsic camera parameters,calibrated in advance. The numbers of fine and coarseclusters were set to NF = 200 and NC = 20, respec-tively. The trash box costs for fine and coarse corre-spondence were, respectively, set to dFtb = 100 [mm]and dCtb = 200 [mm] experimentally. The proposedmethod was compared with topology-free shape mor-phing [8] as a benchmark.

3.2. ResultsResults of the point cloud transport transitions are

shown in Fig. 3. Regarding the ball-handling scene, be-cause the face is partially occluded by the ball in the

3805

target shape (green circle in the first row of Fig. 3),the face in the source and the ball in the target partiallycorrespond with each other in the benchmark method,which results in undesirable artifacts (red circles in thefirst row of Fig. 3).

On the other hand, the proposed method can appro-priately handle occlusions as appearing or disappearingprocesses using the trash box bins and also suppresslocally isolated motion with the coarse-to-fine frame-work. Therefore, the proposed method successfullysuppresses such artifacts and realizes more natural pointcloud transitions as shown from the second to fourthrows of Fig. 3.

4. ConclusionThis paper described a point cloud transport method

for temporal interpolation of range data. To overcomeincorrect correspondence due to occlusions, we intro-duced appearing and disappearing processes using trashbox bins. In addition, because locally isolated mo-tion often occurs due to the lack of a constraint for thesmoothness of adjacent motion, a coarse-to-fine frame-work was included in the proposed method. Both coarseand fine correspondences were found so as to mini-mize the transportation cost in the earth mover’s dis-tance framework. The experimental results show thatthe proposed method works well with occlusions andtopological changes.

AcknowledgmentThis work was supported by CREST (Core Research

for Evaluation Science and Technology) of JST (JapanScience and Technology).

References

[1] V. Blanz and T. Vetter. A morphable model for the syn-thesis of 3d faces. In Proc. of the 26th Annual Conf. onComputer Graphics and Interactive Techniques, 1999.

[2] D. Breen and R. Whitaker. A level-set approach for themetamorphosis of solid models. IEEE Trans. on Visu-alization and Computer Graphics, 7(2):172–192, Apr.2001.

[3] A. M. Bronstein, M. M. Bronstein, and R. Kimmel.Three-dimensional face recognition. Int. Journal ofComputer Vision, 64(1):5–30, 2005.

[4] A. M. Bronstein, M. M. Bronstein, and R. Kimmel. Nu-merical Geometry of Non-Rigid Shapes. Springer, 2008.

[5] A. M. Bronstein, M. M. Bronstein, and R. Kimmel.Topology-invariant similarity of nonrigid shapes. Int.Jorunal of Computer Vision, 81(3):281–301, 2009.

[6] K. Fujiwara, K. Nishino, J. Takamatsu, B. Zheng, andK. Ikeuchi. Locally rigid globally non-rigid surface reg-istration. In Proc. the 13th IEEE Int. Conf. on ComputerVision, pages 1527 –1534, Nov. 2011.

[7] F. Hoppner, F. Klawonn, R. Kruse, and T. Runkler.Fuzzy Cluster Analysis. John Wiley and Sons, 1999.

[8] Y. Makihara and Y. Yagi. Earth mover’s morphing:Topology-free shape morphing using cluster-based emdflows. In Proc. of the 10th Asian Conf. on ComputerVision, pages 2302–2315, Queenstown, New Zealand,Nov. 2010.

[9] D. Miyazaki, T. Ooishi, T. Nishikawa, R. Sagawa,K. Nishino, T. Tomomatsu, Y. Takase, and K. Ikeuchi.The great buddha project: Modelling cultural heritagethrough observation. In Proc. the 6th Int. Conf. onVirtual Systems and MultiMedia, pages 138–145, Gifu,2000.

[10] B. Payne and A. Toga. Distance field manipulation ofsurface models. IEEE Computer Graphics and Appli-cations, 12(1):65–71, 1992.

[11] R. Sagawa, K. Akasaka, Y. Yagi, H. Hamer, and L. V.Gool. Elastic convolved icp for the registration of de-formable objects. In Proc. the 2009 IEEE Int. Workshopon 3-D Digital Imaging and Modeling, pages 1558–1565, Kyoto, Japan, Oct. 2009.

[12] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finoc-chio, R. Moore, A. Kipman, and A. Blake. Real-timehuman pose recognition in parts from single depth im-ages. In Prof. the 22th IEEE Conf. on Computer Visionand Pattern Recognition, pages 1297 –1304, Jun. 2011.

Figure 3. Results of point cloud transport.Transition rates α are 0.0, 0.25, 0.5, 0.75,and 1.0 from left to right, respectively.The first and second rows: benchmarkmethod [8] and the proposed method forball handling. The third and fourth rows:reconstructed surface mesh for ball han-dling and sitting by the proposed method.

3806

Documents

Point Cloud Transport - Osaka Universitymakihara/pdf/icpr2012_nakajima.pdf · Point Cloud Transport Hozuma Nakajima, Yasushi Makihara, Hsu Hsu, Ikuhisa Mitsugami, Mitsuru Nakazawa,