DEPARTMENT OF COMPUTER SCIENCE, NATIONAL UNIVERSITY OF IRELAND MAYNOOTH
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning
Guillaume GALES, Eric MCCLEAN, John MCDONALD




The emergence of smartphones equipped with Internet access, high resolution cameras, and positioning sensors opens up great opportunities for visualising geospatial information within augmented reality applications. While smartphones are able to provide geolocalisation, the inherent uncertainty in the estimated position, especially indoors, does not allow for completely accurate and robust alignment of the data with the camera images. In this paper we present a system that exploits computer vision techniques in conjunction with GPS and inertial sensors to create a seamless indoor/outdoor vision-based positioning platform. The vision-based approach estimates the pose of the camera relative to the façade of a building and recognises the façade from a georeferenced image database. This permits the insertion of 3D widgets into the user's view with a known orientation relative to the façade. For example, in Figure 1 (a) we show how this feature can be used to overlay directional information on the input image. Furthermore, we provide an easy and intuitive interface for non-expert users to add their own georeferenced content to the system, encouraging volunteered GI. Indeed, to achieve this users only need to drag and drop predefined 3D widgets into a reference view of the façade, see Figure 1 (b). The infrastructure is flexible in that we can add different layers of content on top of the façades and hence opens many possibilities for different applications. Furthermore, the system provides a representation suitable for both manual and automatic content authoring.


Page 1: A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning

DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF IRELAND MAYNOOTH

A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning

Guillaume GALES
Eric MCCLEAN
John MCDONALD

Page 2

Introduction


Camera

Internet access

GPS / Compass

Accelerometer

Page 3

Introduction


Internet access

Vision-based platform for positioning

Position / Camera pose

Page 4

Introduction


Vision-based platform for positioning

Georeferenced façade database

Façade recognition system

Client applications

Page 5

Introduction


Vision-based platform for positioning

Client applications

Page 6

Example of AR application


Mobile client side

Server side

Authoring client

Façade extraction

Façade Matching

Database

Widget retrieval

Rendering

Page 7

Example of application


Page 8

Outline

Vision-based platform
• Façade extraction
• Georeferenced façade database
• Façade recognition system
• Camera pose estimation

Applications
• Augmented reality
• Navigation
• Positioning

Conclusion and perspectives

Page 9

Vision-based platform

Page 10

Façade Extraction

Key to the vision-based platform:
• Georeferenced façades are the frames of reference for 3D content

Input: image of a façade
Output: homography between the façade and its image

Advantages:
• Robust matching (invariant to rotation and perspective changes)
• Façade normalization (used to build a representation of the environment)
• Camera pose estimation (used by the visualization system)
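The homography between a façade and its image can be estimated from point correspondences. The following is a minimal sketch of the standard DLT (direct linear transform) algorithm, not the authors' implementation; it assumes clean correspondences (in practice the putative matches would first be filtered with RANSAC).

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H such that dst ~ H @ src (DLT).

    src, dst: sequences of at least four (x, y) point correspondences.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on h.
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalise so that H[2, 2] == 1
```

With four exact correspondences the solution is unique up to scale; with more, the SVD gives the least-squares estimate.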

Page 11

Façade Extraction


Page 12

Georeferenced façade database

For each street, take pictures of the façades.
Automatic façade extraction, matching and stitching:
• The geometrical constraint makes matching robust
• Invariance to rotation and perspective changes

Page 13

Georeferenced façade database


Page 14

Georeferenced façade database


Page 15

Façade recognition system

Candidate selection:
• GPS coordinates
• Bag-of-words description for selecting candidates

Similarity constraint
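The GPS part of the candidate selection can be sketched as a simple radius filter over the georeferenced façades; the surviving candidates are then ranked by bag-of-words similarity. This is an illustrative sketch: the record fields and the 100 m radius are assumptions, not values from the paper.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    R = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * R * asin(sqrt(a))

def preselect(query_lat, query_lon, facades, radius_m=100.0):
    """Keep only database façades whose stored GPS position lies within
    radius_m of the query position."""
    return [f for f in facades
            if haversine_m(query_lat, query_lon, f["lat"], f["lon"]) <= radius_m]
```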

Figure 2: Database infrastructure for computing planar façade mosaics. The individual façades are stitched together into planes to build a frame of reference for authoring.

x2 = T x1   (1)

where x1 are the homogeneous coordinates of a pixel within the first planelet and x2 are the homogeneous coordinates of a pixel within the second planelet.

Here, we want to match a planelet against a plane. We have already computed the feature points for our planelet. Let Xplanelet be the matrix of their homogeneous coordinates in the coordinate system of the planelet. We have also selected subsections of the plane, the strips, where the planelet is more likely to match. For each strip, we start by retrieving the matrix of the homogeneous coordinates of the feature points in the coordinate system of the plane. The origin of a plane is given by the origin of the first planelet used to build this plane. We have:

Xplane = T Xplanelet   (2)

where we need to estimate the parameters of T.

For each feature point from the planelet, we measure the Hamming distance between its ORB descriptor and the feature points from the strip. If this distance is close enough, i.e. below a threshold, we add this match to the set of putative matches.
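The putative-matching step above can be sketched as follows. This is an illustrative sketch only: the descriptors are represented as Python integers (real ORB descriptors are 256-bit binary strings) and the 64-bit threshold is an assumption, not the paper's value.

```python
def hamming(d1, d2):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(d1 ^ d2).count("1")

def putative_matches(planelet_desc, strip_desc, max_dist=64):
    """Pair each planelet feature with every strip feature whose descriptor
    lies within max_dist bits.  The resulting putative matches are then
    filtered with RANSAC against the geometric constraint."""
    matches = []
    for i, d1 in enumerate(planelet_desc):
        for j, d2 in enumerate(strip_desc):
            if hamming(d1, d2) <= max_dist:
                matches.append((i, j))
    return matches
```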

Next, we apply the RANSAC algorithm to get the largest consensus set of feature point matches that satisfies the geometric constraint (2). If the number of matches from this best consensus set is greater than some threshold, we assume that our planelet matches the current plane with a transformation T.

If a match is found for different strips from the same plane, we only keep the one found with the largest consensus set. When a match is validated, the planelet is saved to the database and marked as belonging to the plane with the geometric relationship T. Following this, the feature points of the planelet are transformed into the frame of reference of the plane (by transforming their coordinates by T) and added to the database.

Figure 3: Merging. When a planelet matches two planes (with transformations T1 and T2), it is first matched to the best one, i.e. the one giving the largest consensus set satisfying the geometric constraint (2). Then, all the planelets from the second-best plane are merged into the first one.

Finally, the strips for that plane are updated (or created) by updating (or calculating) the bag-of-words to take into account the newly added feature points.

3.3 Merging

If a planelet matches more than one plane, and if the number of feature point matches satisfying (2) is greater than a threshold, we consider that the planelet matches both planes. In this case, we match the planelet with the first-best result as described in the previous section and combine the whole second plane with the first, as shown in Figure 3. The planelets defining the second plane are marked as belonging to the other plane, where the relationship is calculated as follows:

Tplane1 = T1 T2^-1 Tplane2   (3)

where Tplane2 is the geometric relationship of a given planelet to the second-best plane. The coordinates of their feature points and strips are also updated.
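Equation (3) is a straightforward composition of transformation matrices. A minimal NumPy sketch (the function name is ours, not the paper's):

```python
import numpy as np

def reregister(T1, T2, T_plane2):
    """Re-express a planelet's pose in the frame of the surviving plane.

    T1: bridging planelet -> plane 1,  T2: bridging planelet -> plane 2,
    T_plane2: some planelet of plane 2 -> plane 2.
    Returns Tplane1 = T1 @ inv(T2) @ T_plane2, that planelet's pose in
    the frame of plane 1 (equation (3))."""
    return T1 @ np.linalg.inv(T2) @ T_plane2
```

As a sanity check, the bridging planelet itself (T_plane2 = T2) maps back to its plane-1 pose T1, since T1 T2^-1 T2 = T1.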

4 DISCUSSION ON POSSIBILITIES AND CHALLENGES FOR AUTHORING SOLUTIONS

The infrastructure presented in this paper provides an intuitive and flexible platform for augmented reality applications. In particular we distinguish between two levels of authoring:

• Base level authoring (populating the image database) – As detailed in Section 3, the creation of the image database is automatic due to the facade extraction algorithm. However, we assume that the facades from one street can be approximated by a planar surface and that they exhibit discriminant features. This planar representation has a number of advantages. For one street, only the descriptors of the feature points from the plane image need to be stored in the database, which leads to a "light" database. Furthermore, once a street has been created, the planar representation can be reused and shared by many different applications that can easily extend it to a 3D space to add augmented content to that street.

• Content authoring (adding 3D content) – Different applications can add their own augmented content on top of the base layer given by the planar representation of the facades of a street. For example, we can imagine an application showing the opening hours or the promotions of a store, virtual exhibits, social tagging, etc. One of the advantages of using facades is that we can also orientate 3D objects within the user's view and are therefore not restricted to the 2D facades. For example, Figure 5 shows how the system can be used in a 3D navigation context. As detailed in [9], the facade-based model also provides an easy and intuitive interface for manual


Page 16

Camera pose estimation
• Intrinsics are known
• Extrinsics are given by the façade extraction algorithm (the homography between the plane and its image is decomposed into a rotation and a translation)

Camera ↔ Façade: related by the homography H and its inverse H^{-1}.

x = K\,[R \mid t] \begin{bmatrix} 0 & 0 & sh & sh \\ 0 & h & h & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} = H \begin{bmatrix} 0 & 0 & sh & sh \\ 0 & h & h & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}

where the matrix columns are the four corners (0, 0), (0, h), (sh, h), (sh, 0) of the façade rectangle of height h and width sh, lying in the plane z = 0.
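Because the façade points lie in the plane z = 0, the homography factors as H ~ K [r1 r2 t], from which the extrinsics can be recovered. The following is a sketch of this standard decomposition, not the authors' exact code; it ignores the sign ambiguity that arises for an arbitrarily scaled H.

```python
import numpy as np

def pose_from_homography(K, H):
    """Recover rotation R and translation t from a homography H mapping
    façade-plane coordinates (z = 0) into the image, given intrinsics K."""
    A = np.linalg.inv(K) @ H          # A ~ [r1 r2 t] up to scale
    lam = 1.0 / np.linalg.norm(A[:, 0])  # scale so r1 is a unit vector
    r1 = lam * A[:, 0]
    r2 = lam * A[:, 1]
    r3 = np.cross(r1, r2)             # complete the rotation basis
    t = lam * A[:, 2]
    R = np.column_stack([r1, r2, r3])
    # Project onto SO(3) to correct for noise in r1, r2.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```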

Page 17

Applications

Page 18

An Image Database Infrastructure for Authoring, Storing and Retrieval in Augmented Reality Mobile Applications

Guillaume GALES, Eric MCCLEAN, John MCDONALD
National University of Ireland Maynooth, Ireland

ABSTRACT

Content authoring is an important stage in the workflow of creating rich augmented reality applications. In this paper we describe a facade-based database infrastructure for authoring and storing 3D content for use in urban environments. It provides frames of reference for the environment as well as a mechanism to match new images with the facades and thus retrieve associated 3D content. The infrastructure is flexible in that we can add different 3D "layers" of content on top of the facades and hence opens many possibilities for augmented reality applications in urban environments. Furthermore, the system provides a representation suitable for both manual and automatic content authoring.

Keywords: Augmented Reality, Infrastructure, Authoring, Facade-based Database, Content Storing and Retrieving.

1 INTRODUCTION

Mobile augmented reality applications provide rich and useful information to their users about their surrounding environment. To create usable augmented reality applications, an efficient infrastructure is required. Such infrastructures involve:

1. building a map of the environment;

2. adding content;

3. retrieving content.

In this paper we propose an infrastructure that makes authoring easy, intuitive and flexible. Our goal is to create a platform for mobile applications to be used in an urban environment. Users take an image of a building facade with their mobile phone, then 3D widgets providing information about the building viewed in the scene are displayed and correctly oriented relative to the scene. The authoring of such information is made easy by the infrastructure design, which provides an image of the facade in a viewpoint-normalized space. This space is used as a frame of reference to store 3D content.

We use images of these facades to build a map of the environment, as well as a frame of reference to link 3D content. Our system is based on the fact that, in an urban environment, many of the facades of a building can be approximated by planes.

We start by creating an image database where the images of the facades of the same street are stitched together to provide a 2D frame of reference for that particular street. This frame of reference is then extended to 3D by using the normal of the plane as the third dimension. Next, given any particular location within that frame of reference, we can easily add 3D widgets providing information about that location. Finally, when a user submits an image of a facade from her/his mobile, this


Figure 1: Overall system. The infrastructure of the image database provides a 3D frame of reference to position and store 3D content.

image is matched against the database to recognise the facade. Since there is a homography between the image of the input facade and the image stored in the database, we can easily retrieve the coordinates of the augmented content associated with the user's view. A video showing these steps is available at http://www.cs.nuim.ie/research/vision/data/ismar2012/
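Retrieving the content coordinates through the homography amounts to a projective point transform. A minimal sketch (the function name is ours):

```python
def warp_point(H, x, y):
    """Map a 2D point through a 3x3 homography given as nested lists,
    e.g. from the database façade image into the user's view."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # Homogeneous coordinates: divide out the projective scale w.
    return u / w, v / w
```

Note that H is only defined up to scale: multiplying every entry by a constant leaves the warped point unchanged.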

The infrastructure described in this paper has many advantages:

• Easy and intuitive authoring – It makes authoring easy and intuitive by providing each building facade as a reference frame for 3D content. For example, we can position a 3D cup of coffee in front of the doors of a building to indicate there is a coffee shop inside this building.

• Flexible and expandable – The process of adding new streets to the database is automatic. The system only requires the images of the facades from new streets. Furthermore, once a street has been added, it can be reused for many different applications by using different layers of 3D content.

• Scalable content model – When the database is queried, a quick preselection of potential facade matches based on GPS coordinates and bag-of-words makes our solution fast and suitable for a large amount of data. Furthermore, we do not need

Augmented reality

Extension of façade to 3D: frame of reference


Page 19

Augmented reality

Desktop application
• Easy and intuitive interface for non-expert users
• Predefined list of 3D models


Page 20

Augmented reality

Façade extraction → Pose estimation → Façade matching → Widget retrieval



Page 21

Results


Page 22

Navigation

Vision-based platform for positioning
+
Path-finding algorithm

Outdoor/Indoor navigation application

Page 23

Positioning

Accurate positioning (ongoing work)


Page 24

Conclusion and perspectives

Page 25

Conclusion and perspectives

Vision-based platform for positioning
• Georeferenced façade database
• Façade recognition system

Mobile applications
• Augmented reality
  - Authoring solution
  - User-generated content
  - Collaborative GI (HTML5)
• Navigation (ongoing)
• Optimisations


Page 26

Acknowledgment

Research presented in this paper was funded by a Strategic Research Cluster grant (07/SRC/I1169) by Science Foundation Ireland under the National Development Plan. The authors gratefully acknowledge this support.

Thank you for your attention.
