Efficient Extraction of Robust Image Features on Mobile Devices

Wei-Chao Chen Yingen Xiong Jiang Gao Natasha Gelfand Radek Grzeszczuk∗

Nokia Research Center, Palo Alto

Recent convergence of imaging sensors and general-purpose processors on mobile phones creates an opportunity for a new class of augmented reality applications. Robust image feature extraction is a crucial enabler of such systems. In this article, we discuss an efficient mobile phone implementation of a state-of-the-art algorithm for computing robust image features called SURF. We implement several improvements to the basic algorithm that significantly improve its performance and reduce its memory footprint, making the use of this algorithm on a mobile phone practical. Our prototype implementation has been applied to several practical applications such as image search, object recognition and augmented reality.

1 Introduction

Integration of cameras and general-purpose processors in mobile phones signals the dawn of a new class of mobile computation devices. In the field of computer vision and graphics, we witness a variety of applications utilizing mobile phone platforms, including MARA [2] and Lincoln [7].

One particularly appealing, but challenging, aspect of mobile augmented reality is feature tracking for augmentation of real-world imagery with overlay information, similar to Skrypnyk and Lowe [6]. An important component of such applications is the computation of robust, scale-invariant feature descriptors [4, 1]. Figure 1 shows a typical image matching pipeline using feature descriptors.

These algorithms tend to be very computationally intensive even for modern desktop PCs. Mobile devices, despite their rapid advances, will likely not match desktop PC performance in the near future. Naturally, we have the option of sending a query image to a server and computing its features there, but wireless network latency and slow uplink bandwidth pose severe constraints on real-time AR applications. It is therefore an interesting challenge to optimize these algorithms for both space and performance to make their execution on a mobile platform efficient.

Our Contributions. We have chosen the SURF algorithm [1] as the basis of our implementation because of its favorable computational characteristics and its state-of-the-art matching performance. We then implemented and optimized this algorithm on a mobile phone. Our implementation is on average 30% faster and uses half as much memory.

2 Platform Considerations

We target a mobile phone platform that uses Texas Instruments' OMAP 2 application processor architecture, which integrates an ARM11 core, an image/video accelerator, a DSP and a PowerVR

∗{wei-chao.chen, yingen.xiong, jiang.gao, natasha.gelfand, radek.grzeszczuk}@nokia.com

[Figure 1 pipeline: Input Camera Image → SURF Algorithm (Interest-Point Extraction → Repeatable Angle Computation → Descriptor Computation) → Approximate Nearest Neighbor Matching against a Feature Database (Local/Remote) → Geometric Consistency Check → Matched Features/Objects.]

Figure 1: Image matching algorithm overview.

graphics core, among others. The operating system of choice for our mobile device is Symbian OS. The amount of memory available to each thread running on a mobile device is typically fairly limited to ensure system stability. Hence, careful consideration must be given to a mobile phone implementation of an algorithm to ensure good performance.

3 Implementation

The SURF algorithm [1] consists of three major steps: interest point extraction, repeatable angle computation and descriptor computation. The interest point extraction step starts with computing the determinant of the Hessian matrix and extracting local maxima. The Hessian matrix computation is approximated with a combination of Haar basis filters at successively larger scales. Therefore this step is O(mn log2(max(m, n))) for an m × n image. Each extracted interest point is further refined by quadratic localization.
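The box-filter approximation of the Hessian determinant can be sketched as follows. This is an illustrative simplification, not the exact 9×9 (and larger) SURF filter layouts; the masks here are plain three-box second-derivative patterns, and only the 0.9 weight balancing Dxy against Dxx·Dyy follows [1].

```python
import numpy as np

def integral(img):
    """Summed-area table: entry (y, x) holds the sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box(ii, y0, x0, y1, x1):
    """Sum over the inclusive rectangle [y0, y1] x [x0, x1] in four lookups.
    Assumes 0 <= y0 <= y1 and 0 <= x0 <= x1 inside the image."""
    s = ii[y1, x1]
    if y0 > 0:
        s -= ii[y0 - 1, x1]
    if x0 > 0:
        s -= ii[y1, x0 - 1]
    if y0 > 0 and x0 > 0:
        s += ii[y0 - 1, x0 - 1]
    return s

def hessian_det(ii, y, x, l=3):
    """Box-filter Hessian determinant at (y, x) with lobe size l (odd).

    Simplified three-box / four-box masks for illustration; the real SURF
    filters use a specific layout. The 0.9 Dxy weight is the one from [1].
    Requires y and x to be at least l + l//2 pixels from the border.
    """
    h = l // 2
    # d^2/dx^2: [+1, -2, +1] pattern of three l x l boxes side by side
    dxx = (box(ii, y - h, x - l - h, y + h, x - h - 1)
           - 2 * box(ii, y - h, x - h, y + h, x + h)
           + box(ii, y - h, x + h + 1, y + h, x + l + h))
    # d^2/dy^2: same pattern, stacked vertically
    dyy = (box(ii, y - l - h, x - h, y - h - 1, x + h)
           - 2 * box(ii, y - h, x - h, y + h, x + h)
           + box(ii, y + h + 1, x - h, y + l + h, x + h))
    # d^2/dxdy: four l x l quadrant boxes around the centre
    dxy = (box(ii, y - l, x - l, y - 1, x - 1)
           + box(ii, y + 1, x + 1, y + l, x + l)
           - box(ii, y - l, x + 1, y - 1, x + l)
           - box(ii, y + 1, x - l, y + l, x - 1))
    return dxx * dyy - (0.9 * dxy) ** 2
```

Because every box sum costs four lookups regardless of lobe size, the same code evaluates arbitrarily large filter scales at constant cost per pixel, which is what makes the scale pyramid cheap.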

After the interest points and their scales are obtained, a repeatable angle is extracted for each interest point prior to computing the feature descriptor. This step computes the angles of the gradients surrounding the interest point, and the direction of maximum angular response is chosen as the direction of the feature. This direction is then used to create a rotated square around the interest point, and regularly sampled gradients within this template are combined per grid location to form the final descriptor. Because both of these steps process image footprints proportional to the interest point scale, an efficient sampling algorithm can significantly speed up these two steps. Finally, we evaluated the quality of our implementation and adjusted the algorithm's parameters using the benchmarks from the paper by Mikolajczyk et al. [5].

3.1 Our Implementation

In the first two steps of the algorithm we use an integral image for efficient Haar transform computation, similar to [1]. It takes only two floating-point operations per pixel to transform a regular image into an integral image in place, so we store only either the original or the integral image, converting back and forth as needed.
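The in-place conversion can be sketched as follows; this is a NumPy illustration of the idea, not the phone code. Each direction costs one addition per pixel, giving the two floating-point operations per pixel mentioned above.

```python
import numpy as np

def to_integral_inplace(img):
    """Turn a float image into its integral image using the same buffer:
    one addition per pixel per pass, two passes in total."""
    h, w = img.shape
    for y in range(h):                      # prefix sums along each row
        for x in range(1, w):
            img[y, x] += img[y, x - 1]
    for y in range(1, h):                   # then accumulate down the rows
        img[y, :] += img[y - 1, :]
    return img

def from_integral_inplace(ii):
    """Invert the integral image back to pixel values, also in place.
    Both passes walk backwards so each subtraction sees unmodified values."""
    h, w = ii.shape
    for y in range(h - 1, 0, -1):           # undo the row accumulation first
        ii[y, :] -= ii[y - 1, :]
    for y in range(h):                      # then undo each row's prefix sums
        for x in range(w - 1, 0, -1):
            ii[y, x] -= ii[y, x - 1]
    return ii
```

Once the buffer holds the integral image, the sum over any axis-aligned rectangle needs only four table lookups, which is what the Haar/box filters exploit.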

Interest point detection. This step involves computing the Hessian determinant value at every location (x, y) in the scale space s,

ISMAR ’07: Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, Washington, DC, USA, 2007, pp. 1–2, IEEE Computer Society.

[Figure 2 plot: feature descriptor runtime (ms, y-axis, 0 to 1400) versus feature density (features/pixel, ×10⁻³, x-axis) for SURF 1.0.9 and our implementation.]

Figure 2: Actual runtimes of our implementation and of the implementation from [1], running at different detection thresholds.

followed by a 3×3×3 local maximum extraction and quadratic localization. Because the minimum scale uses a Gaussian at σ = 1.2, setting the cutoff frequency at 50% of the maximum amplitude makes the Nyquist sampling rate equivalent to 1/3.2 of the original image resolution. We choose a 2× sub-sampling rate at the minimum scale level, and store only the three scale levels necessary for the local maximum extraction. This consumes 3 × 1/4 = 75% of the input image's memory.
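The 3×3×3 extraction over a three-level window can be sketched as follows; a minimal illustration that assumes the three adjacent scale levels have already been resampled to a common resolution, and omits quadratic localization and the sliding of the window up the scale stack.

```python
import numpy as np

def local_maxima_3x3x3(levels, threshold):
    """Extract 3x3x3 local maxima from three adjacent scale levels.

    `levels` is a (3, h, w) array of det-of-Hessian responses. Only the
    middle level can yield maxima, which is why only three scale levels
    need to be resident in memory at any time.
    """
    mid = levels[1]
    h, w = mid.shape
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = mid[y, x]
            if v <= threshold:              # cheap rejection first
                continue
            cube = levels[:, y - 1:y + 2, x - 1:x + 2]  # 3x3x3 neighbourhood
            # strict maximum: v tops the cube and occurs in it exactly once
            if v >= cube.max() and (cube == v).sum() == 1:
                peaks.append((y, x))
    return peaks
```

The uniqueness check rejects plateaus, so a flat region never produces a spray of duplicate interest points.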

Repeatable Angle Computation. We generate a Gaussian filter lookup table to reduce the floating-point computation requirement. We also use an efficient arctan approximation to speed up the angle computation. This approximation is used only in the angle binning process and, as a result, yields almost no change to the final extracted angle. Overall, this stage is extremely efficient and incurs only minor overhead (< 10%) compared to upright SURF.
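The paper does not specify which arctan approximation is used; one common choice that fits the description, shown here purely as an assumption, is a first-order polynomial with octant reduction. Its maximum error (roughly 0.005 rad) is far smaller than the width of an angle bin, so binning results are effectively unchanged.

```python
import math

def fast_atan2(y, x):
    """Cheap polynomial atan2 approximation (max error ~0.005 rad).

    Hypothetical stand-in for the approximation mentioned in the text:
    reduce to a ratio z in [0, 1], evaluate a short polynomial, then fold
    the result back into the correct octant and quadrant.
    """
    if x == 0.0 and y == 0.0:
        return 0.0
    ax, ay = abs(x), abs(y)
    if ax >= ay:
        z = ay / ax
        angle = z * (math.pi / 4 + 0.273 * (1.0 - z))
    else:
        z = ax / ay
        angle = math.pi / 2 - z * (math.pi / 4 + 0.273 * (1.0 - z))
    if x < 0.0:                 # reflect into the left half-plane
        angle = math.pi - angle
    if y < 0.0:                 # mirror into the lower half-plane
        angle = -angle
    return angle
```

The division and two multiplies per call replace a library atan2, which on an embedded FPU without a hardware arctan is a substantial saving in the inner gradient loop.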

Descriptor Extraction. In this step, we need to compute image gradients regularly sampled near the interest point. The sampling grid is rotated to the angle computed in the previous step. To speed up the resampling process, we pre-compute mipmaps using the round-up algorithm in [3]. One of the main advantages is that we can resample at the proper scale prior to computing the feature descriptors; as a result, each (dx, dy) pair of Haar transform computations operates on a downsampled 2 × 2 pixel grid regardless of the scale of the interest point. This approach, however, requires 33% memory overhead in addition to the original image. It also uses the original input image, which is reverted in place from the integral image.
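A round-up mipmap chain in the spirit of [3] can be sketched as follows. This simplified version pads odd dimensions by replicating the last row or column before 2×2 averaging, rather than using the fractional-weight filtering of the actual algorithm.

```python
import numpy as np

def build_mipmaps(img):
    """Round-up mipmap chain: each level halves both dimensions, rounding
    odd sizes up, so non-power-of-two images are handled cleanly.

    Simplified relative to [3]: odd borders are padded by replication
    instead of fractionally weighting the edge texels.
    """
    levels = [np.asarray(img, dtype=np.float64)]
    while max(levels[-1].shape) > 1:
        cur = levels[-1]
        h, w = cur.shape
        # pad odd dimensions so the 2x2 blocks tile exactly
        padded = np.pad(cur, ((0, h % 2), (0, w % 2)), mode="edge")
        # 2x2 box average; output size is (ceil(h/2), ceil(w/2))
        down = (padded[0::2, 0::2] + padded[1::2, 0::2]
                + padded[0::2, 1::2] + padded[1::2, 1::2]) / 4.0
        levels.append(down)
    return levels
```

The levels beyond the base sum to roughly 1/4 + 1/16 + … ≈ 1/3 of the original image, consistent with the 33% memory overhead quoted above.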

4 Results

First we compare our PC feature descriptor implementation against the published implementation from [1]. Figure 2 shows the runtime on a laptop with an Intel Core Duo T2500 processor operating at 1.8 GHz. We use the test images from the paper by Mikolajczyk et al. [5] for our experiments. Each point in this figure represents the execution time over four scale octaves with 2× initial subsampling during interest point localization. Because the descriptor computation time is fairly proportional to the number of detected features, while interest point extraction depends on the image resolution, we run each test image over a range of detection thresholds and plot the feature density (namely, detected features per image pixel) on the x-axis. On average, we achieve approximately a 30% speedup over [1], using double-precision floating point numbers in both cases.

[Figure 3 plot: performance ratio (phone/PC, y-axis, 0 to 30) across 12 test cases.]

Figure 3: SURF performance ratio between PC and phone.

Since we are not aware of any other implementation of the SURF algorithm for a mobile phone, we compare the runtime of our algorithm on the phone against its runtime on the PC, as shown in Figure 3. The test images are randomly chosen from [5]. The phone version, running on the Nokia N95 smartphone, shows on average a 22× slowdown compared to the PC. Since this ratio is relatively high, we believe a further improvement in mobile phone performance is possible by taking advantage of the special image processing instructions available on the embedded CPU and by vectorizing parts of the code for the built-in floating point SIMD unit. We leave this as future work. We are also looking at developing a new, robust feature extraction algorithm that would be even more efficient than the current implementation thanks to a more direct mapping to the mobile platform.

5 Conclusion

We have described a mobile phone implementation of a state-of-the-art robust feature descriptor algorithm. We have achieved a significant improvement in performance over the original implementation, which allows us to explore many interesting new research directions that were previously only possible with bulky PC/camera systems and can now be pursued on a small mobile device.

To our knowledge, our implementation is the first of its kind on a mobile phone. We are excited about the new opportunities this creates for mobile augmented reality applications and are planning to continue improving the results published here. We hope to release the code to the public in the near future.

References

[1] H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. In ECCV (1), pages 404–417, 2006.

[2] K. Greene. Hyperlinking reality via phones. MIT Technology Review, 11-12 2006. Nokia MARA project by M. Kahari and D. Murphy.

[3] S. Guthe and P. Heckbert. Non-power-of-two mipmap creation. Technical report, NVIDIA, 2005.

[4] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[5] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. Int. J. Comput. Vision, 65(1-2):43–72, 2005.

[6] I. Skrypnyk and D. G. Lowe. Scene Modelling, Recognition and Tracking with Invariant Image Features. In ISMAR '04, pages 110–119, 2004.

[7] C. L. Zitnick, J. Sun, R. Szeliski, and S. Winder. Object instance recognition using triplets of feature symbols. Technical Report MSR-TR-2007-53, Microsoft Research, 2007.