
Noname manuscript No. (will be inserted by the editor)

Efficient Indexing and Retrieval of Large-scale Geo-tagged Video Databases

Ying Lu · Cyrus Shahabi · Seon Ho Kim

the date of receipt and acceptance should be inserted later

Abstract We are witnessing a significant growth in the number of smartphone users and advances in phone hardware and sensor technology. In conjunction with the popularity of video applications such as YouTube, an unprecedented number of user-generated videos (UGVs) are being generated and consumed by the public, which leads to a Big Data challenge in social media. In a very large video repository, it is difficult to index and search videos in their unstructured form. However, due to recent developments, videos can be geo-tagged (e.g., locations from a GPS receiver and viewing directions from a digital compass) at acquisition time, which provides the potential for efficient management of video data. Ideally, each video frame can be tagged by the spatial extent of its coverage area, termed its Field-Of-View (FOV). This effectively converts a challenging video management problem into a spatial database problem.

This paper attacks the challenges of large-scale video data management using spatial indexing and querying of FOVs, especially by maximally harnessing the geographical properties of FOVs. Since FOVs are shaped like slices of pie and contain both location and orientation information, conventional spatial indexes, such as the R-tree, cannot index them efficiently. The distribution of UGVs' locations is non-uniform (e.g., more FOVs in popular locations). Consequently, even multilevel grid-based indexes, which can handle both location and orientation, have limitations in managing the skewed distribution. Additionally, since UGVs are usually captured in a casual way with diverse setups and movements, no a priori assumption can be made to condense them in an index structure. To overcome these challenges, we propose a class of new R-tree-based index structures that effectively harness FOVs' camera locations, orientations and view-distances, in tandem, for both filtering and optimization. We also present novel search strategies and algorithms for efficient range and directional queries on our indexes. Our experiments using both real-world and large

Ying Lu, Cyrus Shahabi · Seon Ho Kim
Integrated Media Systems Center, University of Southern California, Los Angeles, CA, 90089 USA
E-mail: {ylu720,shahabi,seonkim}@usc.edu


2 Ying Lu et al.

synthetic video datasets (over 30 years' worth of videos) demonstrate the scalability and efficiency of our proposed indexes and search algorithms.

Keywords Geo-tag, index, video retrieval, scalability, user-generated video

1 Introduction

Driven by the advances in video technologies and mobile devices (e.g., smartphones, Google Glasses), a large number of User Generated Videos (UGVs) are being produced and consumed. According to a study by Cisco [1], the overall mobile data traffic reached 1.5 exabytes per month at the end of 2013, 53% of which was mobile video data. It is forecasted that mobile video traffic will reach 15.9 exabytes per month by 2018. It is obvious that mobile videos will play a critical role in daily life; however, it is still very challenging to organize and search such a huge amount of user-generated mobile videos.

To overcome this challenge, we leverage smartphone sensors while capturing videos to model video content with its geospatial properties at the fine granularity of the frame. We can model a video frame as a geometric figure, i.e., a Field-Of-View [2], using the camera location, orientation and lens angle. The FOV model has proven to be very useful for various media applications, as demonstrated by the online mobile media management system MediaQ [8]. In the presence of such geospatial metadata, we propose a new efficient index for a large-scale geo-tagged video database.

Fig. 1 MediaQ [8] interface: (a) range query; (b) directional query

Unlike traditional spatial objects (e.g., points, rectangles), FOVs are spatial objects with orientation (i.e., the viewing direction of the camera). For example, Figure 1 shows the FOV (blue pie-slice) of the video frame currently being displayed in MediaQ. Then, a video clip can be modeled as a series of spatial objects, i.e., FOVs. With FOV metadata, there can be two typical spatial queries on geo-tagged videos [14]: range and directional queries. A range query finds all the FOVs that overlap with a user-specified circular query area. Figure 1(a) illustrates a circular range query in MediaQ, searching for videos covering an area at the University of Southern California,


Efficient Indexing and Retrieval of Large-scale Geo-tagged Video Databases 3

where the markers on the map show the locations of the video results of the range query. A directional query finds all the FOVs whose orientations overlap with the user-specified direction within a range. Figure 1(b) shows the results of a directional query when the input direction is north.

Note that the “direction” discussed in this paper is an inherent attribute of a spatial object (i.e., an FOV). This is different from how direction has been treated in the past in the spatial database field, where direction is only a component of a query. For example, the goal of an object-based directional query in [12] is to find objects that satisfy the specified direction of the query object (e.g., “finding restaurants to the north of an apartment”), while the goal of a directional query in this study is to find all the objects pointing towards the given direction. To distinguish these two characteristics, we will use the term “orientation” when referring to the direction attribute of FOV objects and “direction” when we refer to the query component.

Indexing FOVs poses challenges to existing spatial indexes due to their inability to incorporate the orientation property of FOVs and the way UGVs are collected in the real world. Let us elaborate each case below. First, the FOVs of geo-tagged videos are spatial objects with both locations and orientations. Existing indexes cannot efficiently support this type of data. For example, one straightforward approach to indexing FOVs with a typical spatial index such as the R-tree [6] is to enclose the area of each FOV with its Minimum Bounding Rectangle (MBR). In this case, the R-tree suffers from unnecessarily large MBRs and consequently large “dead spaces” (i.e., empty areas that are covered by an index node but do not overlap with any objects in the node). Also, it can perform neither orientation filtering nor orientation optimization, resulting in many false positives. The state of the art in indexing FOVs is a three-level grid-based index (Grid) [14]. Grid stores the location and direction information at different grid levels, which is not efficient for video querying since video queries (e.g., range queries) involve both the location and orientation information of FOVs at the same time. Second, in real life, the FOVs of user-generated videos are recorded in a casual way with various shooting directions, moving directions, moving speeds, zoom levels and camera lenses. Grid suffers from efficiency problems when indexing FOVs with different zoom levels and camera lens properties. In addition, FOVs are not uniformly distributed in the real world. Certain areas contain a significantly larger number of FOVs due to the high frequency and long duration of videos uploaded for those locations. Grid performs poorly for non-uniformly distributed FOVs since the occupancy of grid files rises quickly under skewed distributions.

To overcome the drawbacks of the existing approaches, we propose a class of new index structures using both location and orientation information, called OR-trees, building on the premises of the R-tree. Our first, straightforward approach uses an R-tree to index only the camera locations of FOVs as points and then augments the index nodes to store their orientations. This variation of the OR-tree is expected to generate smaller MBRs and reduce their dead spaces while supporting orientation filtering. To enhance it further, we devise a second variation by adding an optimization technique that uses orientation information during node split and merge operations. Finally, in our third and last variation, we add the FOVs' viewable distances into consideration during both the filtering and optimization processes.


Our extensive experiments using a real-world dataset and big synthetically generated datasets (more than 30 years' worth of videos) demonstrate how each variation of the OR-tree performs compared to the competitors, i.e., R-tree and Grid. Contrary to our original intuition, the first variation, which simply augments the nodes with orientations, did not produce better results than R-tree or Grid. However, when we utilized orientation and view distance information for the merge and split operations in the second and third variations, the index performance in supporting range and directional queries improved significantly, almost by a factor of two compared to R-tree and Grid. This implies that naively adding extra orientation information when augmenting the R-tree does not necessarily enhance indexing performance. Rather, the results demonstrate that our optimization techniques for augmenting the R-tree with extra orientation and view distance information are critical to the enhancement of index performance, which is the main contribution of this paper. Another major contribution of this paper is the new search algorithms to efficiently process range and directional queries with OR-trees. For example, we devise a new method to identify an index node of an OR-tree as a “total hit” (i.e., all the child nodes are in the result set) without accessing its child nodes, which results in a significant reduction in processing cost. Finally, we develop an analytical model to compute the bound of the maximum possible improvement of OR-trees over R-tree.

This article extends our previous study [13] with detailed algorithms for index construction and video query processing, and with new sections presenting an extensive experimental study and a theoretical analysis of the maximum improvement over R-trees.

The remainder of this paper is organized as follows. In Section 2, we present the FOV spatial model for geo-tagged videos and formally define the video queries. Section 3 reviews the related work. In Section 4, we present two baseline indexes. We propose our index structures in Section 5 and describe search algorithms for the proposed indexes in Section 6. We analyze the maximum improvement over R-trees in Section 7. Section 8 reports our experimental results. Finally, Section 9 concludes the paper and discusses some of our future directions.

2 Preliminaries

2.1 Video Spatial Model

In this paper, videos are represented as a sequence of video frames, and each video frame is modeled as a Field Of View (FOV) [2], as shown in Figure 2. An FOV f is denoted as (p, R, Θ), in which p is the camera location, R is the visible distance, and Θ is the orientation of the FOV in the form of a tuple <θb, θe>, where, in a clockwise direction, θb and θe are the beginning and ending view directions (rays), respectively. We store Θ as a tuple of two numbers Θ<θb, θe>, where θb (respectively θe) is the angle, measured clockwise, from the north N to the beginning (respectively ending) view direction. During video recording using sensor-rich camera devices, for each video frame f, we can capture the camera view direction θ of the video frame f with respect to the north from the compass


sensor automatically. Further, according to the camera lens properties and zoom level, we can calculate the viewable angle α of the video frame f [2]. Consequently, we can derive the starting and ending view directions: θb = (θ − α/2 + 360) mod 360, and θe = (θ + α/2 + 360) mod 360, where θ is the center direction between the starting and ending view directions.
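These two formulas can be sketched directly (a minimal illustration; the function name is ours):

```python
def view_bounds(theta, alpha):
    """Starting and ending view directions of an FOV.

    theta: center view direction in degrees, clockwise from north.
    alpha: viewable angle in degrees.
    Returns (theta_b, theta_e) per the two formulas above.
    """
    theta_b = (theta - alpha / 2 + 360) % 360
    theta_e = (theta + alpha / 2 + 360) % 360
    return theta_b, theta_e
```

The `+ 360` before the modulo keeps the result non-negative; e.g., a camera pointing at θ = 350° with a 60° viewable angle yields the wrap-around orientation <320°, 20°>.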

In this paper, we represent the coverage of each geo-video as a series of FOV objects. FOVs in two dimensions are sector-shaped, considering the camera azimuth. Three-dimensional FOVs are cone-shaped, additionally considering two other camera rotation types: pitch and roll. Though our indexes and search algorithms for 2-dimensional FOVs can be extended to 3-dimensional FOVs, this paper focuses on 2-dimensional FOVs. Additionally, we mainly focus on videos taken outdoors in open areas, assuming that there are no obstructions within FOVs. Let V be a video dataset. For a video vi ∈ V, let Fvi be the set of FOVs of vi. Hence, the video database V can be represented as an FOV database F = {Fvi | ∀vi ∈ V}.

Fig. 2 FOV model (camera location p, view direction θ, viewable angle α, visible distance R, with respect to north)

2.2 Queries on Geo-Videos

As we represent a video database V as an FOV spatial database F, the problem of geo-video search is transformed into spatial queries on the FOV database F. Next, we formally define two typical spatial queries on geo-videos [14]: range queries and directional queries. Given a query circle Qr(q, r) with center point q and radius r, a range query finds the FOVs that overlap with Qr. It is formally defined as:

RangeQ(Qr, F) ⟺ {f ∈ F | f ∩ Qr ≠ ∅}    (1)

Given a query direction interval Qd(θb, θe) and a query circle Qr(q, r), a directional query finds the FOVs whose orientations overlap with Qd within the range Qr, and is formally defined as:

DirectionalQ(Qd, Qr, F) ⟺ {f ∈ F | f.Θ ∩ Qd ≠ ∅ and f ∩ Qr ≠ ∅},    (2)

where f.Θ denotes the orientation of FOV f.
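Note that the test f.Θ ∩ Qd ≠ ∅ must handle orientations that wrap past north (e.g., an FOV oriented <330°, 7°>). A minimal sketch of this angular part of the predicate (function names are ours):

```python
def contains_angle(interval, angle):
    """True if the clockwise angular interval <b, e> (degrees) contains angle."""
    b, e = interval
    if b <= e:
        return b <= angle <= e
    return angle >= b or angle <= e  # interval wraps past north (0 degrees)

def orientations_overlap(a, b):
    """True if two clockwise angular intervals overlap (either contains
    the other's starting direction)."""
    return contains_angle(a, b[0]) or contains_angle(b, a[0])
```

For instance, an FOV oriented <330°, 7°> overlaps a query direction interval <0°, 90°>, while one oriented <170°, 215°> does not.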


3 Related Work

Geo-referenced videos have been of wide interest to the research communities of multimedia and computer vision. For example, Zhu et al. [23] generated panoramas from videos based on the geo-metadata of the videos. Shen et al. [18] studied automatic tag annotation for geo-videos and detected obstructions within FOVs for 3D visibility query processing. In the spatial database field, some studies [2, 17] focused on geo-video modeling and representation. Other studies [4, 9, 14, 15, 21, 22] mainly focused on geo-video indexing and queries. Navarrete et al. [15] utilized the R-tree [6] to index the camera locations of videos. Toyama et al. [21] used grid files to index camera location and time information. These two studies [15, 21] treat videos / video frames as points. In addition, others [4, 9, 14] focused on indexing and query processing of geo-videos represented as FOV objects. Ay et al. [4] indexed FOV objects with an R-tree, and Ma et al. [14] proposed a grid-based index for FOV objects. However, neither of them is efficient; their drawbacks are analyzed in Section 4.2. Kim et al. [9] presented an R-tree-based index called GeoTree for FOV objects. The difference between GeoTree and the R-tree is that GeoTree stores Minimum Bounding Tilted Rectangles (MBTRs) in the leaf nodes. An MBTR is a long tilted rectangle parallel to the moving direction of the FOV stream enclosed in the MBTR. GeoTree is only suitable for indexing mobile videos whose moving directions do not change frequently. However, in real life, the moving directions of mobile videos change frequently, e.g., with Google Glass or a mobile phone. Further, GeoTree considers the moving direction of FOV objects instead of their orientations, and it does not store orientation information in the index nodes for filtering. Flora et al. [5] discuss how to use existing spatial databases (e.g., Informix) to answer range queries efficiently by representing the coverage of video segments as polygons and carefully designing the DB schema. However, that study [5] focuses on range queries only. Furthermore, the polygon representation of video frames and the query processing algorithms do not consider the orientation information of videos, which would result in poor performance for directional queries. Therefore, the existing work is not efficient and effective for indexing and querying geo-videos with both location and orientation information. To this end, this paper focuses on indexing such geo-video objects with both location and orientation information. Indexing the temporal information of videos is beyond the scope of this work.

Since FOV objects are spatial objects with orientations, studies on directions are related to our work. Directional relationships between spatial objects have been widely researched [16, 20], including absolute directions (e.g., north) and relative directions (e.g., left). Some works mainly studied direction-aware spatial queries [10–12]. Other studies focused on the moving directions of moving objects [9, 19]. In our paper, the directions of FOV objects are inherent attributes of the objects rather than of the queries, and hence differ from the directions discussed in the above studies.


4 Baseline Methods

4.1 R-tree

One baseline for indexing FOV objects is the R-tree [6], one of the basic and most widely used index structures for spatial objects. To index FOVs using an R-tree, we index each FOV object based on the MBR of its visible scene. Consider the example in Figure 3, in which f1, . . . , f8 are FOV objects. The locations, orientations and visible distances of the FOVs are also given in Figure 3. We will use Figure 3 to explain the baseline R-tree in Section 4, and our proposed indexes OAR-tree and O2R-tree in Section 5. The R-tree for these FOVs is illustrated in Figure 4. Since the R-tree is built by minimizing the area of the MBRs of the FOV objects, the MBRs of the leaf nodes of the R-tree are the dashed rectangles in Figure 3 (assuming the fanout is 2). Following the same optimization criteria, the MBR of the non-leaf node R5 (respectively R6, R7) can be built up from the MBRs of the nodes R1 and R2 (respectively R3 and R4, R5 and R6). For clarity, we did not plot these MBRs in Figure 3.
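The MBR of a sector-shaped FOV is determined by the camera location, the two arc endpoints, and any cardinal directions (N, E, S, W) falling inside the viewing angle, where the arc reaches its axis-aligned extremes. A minimal sketch, assuming x points east and y points north with angles measured clockwise from north (function names are ours):

```python
import math

def fov_mbr(p, R, theta_b, theta_e):
    """Axis-aligned MBR of a 2-D sector-shaped FOV.

    p: camera location (x, y); R: visible distance;
    theta_b, theta_e: orientation in degrees, clockwise from north.
    Returns ((min_x, min_y), (max_x, max_y)).
    """
    def arc_point(theta):
        rad = math.radians(theta)
        return (p[0] + R * math.sin(rad), p[1] + R * math.cos(rad))

    def in_view(angle):
        return (angle - theta_b) % 360 <= (theta_e - theta_b) % 360

    pts = [p, arc_point(theta_b), arc_point(theta_e)]
    for cardinal in (0, 90, 180, 270):  # arc extremes at N, E, S, W
        if in_view(cardinal):
            pts.append(arc_point(cardinal))
    xs, ys = zip(*pts)
    return (min(xs), min(ys)), (max(xs), max(ys))
```

For example, a unit-radius FOV at the origin spanning <0°, 90°> has an MBR from (0, 0) to (1, 1), matching the intuition that large viewing angles and visible distances inflate the MBR well beyond the camera point.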

Range and directional queries based on the R-tree. For the range query Qr in Figure 3, we need to access all the index nodes (R1 ∼ R7) of the R-tree, since all of their MBRs overlap Qr. However, only two FOV objects, f1 and f2, are results. For the directional query with the query direction interval Qd (0 - 90) and the query range Qr, we also need to access all of the R-tree nodes, since this R-tree cannot support orientation filtering.

 f   f.p          f.θ        f.R
 f1  (4.3, 1.1)   330 - 7    8.5
 f2  (5.2, 0.4)   340 - 20   8.4
 f3  (7.0, 1.6)   20 - 50    8.0
 f4  (8.0, 1.1)   24 - 58    8.5
 f5  (4.6, 1.8)   260 - 315  3.0
 f6  (5.8, 1.0)   263 - 320  2.7
 f7  (11.0, 8.0)  170 - 215  6.0
 f8  (12.6, 8.0)  170 - 215  6.0

Fig. 3 Sample dataset of FOV objects (with range query Qr, direction interval Qd, and leaf MBRs R1 ∼ R4)

Hence, R-tree has the following drawbacks for indexing FOVs:

1. Dead space. Figure 5 illustrates the “dead spaces” (or empty area: the area that is covered by the MBR of an R-tree node but does not overlap with any objects in the subtree of the node [6]) of FOV f1 and R-tree node R1 in Figure 3. Dead spaces will cause false positives for range queries, and thus increase both index


Fig. 4 R-tree

Fig. 5 Dead spaces of object and index node of R-tree: (a) FOV object; (b) index node. Dashed area denotes the dead spaces.

node accesses and CPU computation cost. Taking the range query in Figure 3 as an example: due to the dead spaces of index nodes R3 and R4, the query must access R3 and R4, even though these accesses are unnecessary since no FOV in R3 or R4 is a result.

2. Large MBRs. The MBR of an R-tree node can be large due to the large visible scenes of the FOV objects enclosed in the node. With the R-tree, large MBRs increase the number of accessed nodes for a given range query, since the decision whether to visit a node depends on whether its MBR overlaps the query area [6].

3. No orientation filtering. With the regular R-tree, there is no orientation information in the index nodes.

4. No orientation optimization. The R-tree is constructed by minimizing the covering area of the FOV objects, without considering their directions.

4.2 Grid-based Index

Another approach that considers the directions of FOVs is the Grid-based Index, termed Grid [14], a three-level grid-based index structure based on viewable scenes, camera locations and view directions. The first level indexes FOVs in a coarse grid, where each grid cell maintains the FOVs that overlap with the cell. At the second level, each first-level cell is divided into a set of subcells. Each subcell maintains the list of FOVs


whose camera locations are inside the subcell. At the third level, it divides 360° into x direction intervals. Each direction interval maintains a list of FOVs whose orientations overlap with the interval. Grid uses the first and second levels for range filtering to process range queries and uses the third level for orientation filtering to process directional queries.
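A minimal sketch of such a three-level structure (the class, parameter and field names are ours; the first level is approximated here by the coarse cells overlapping the FOV's bounding box p ± R):

```python
from collections import defaultdict

class GridIndex:
    """Sketch of a three-level grid over FOVs (f_id, p, R, <theta_b, theta_e>)."""

    def __init__(self, cell=10.0, subcell=2.0, n_dir=8):
        self.cell, self.subcell, self.n_dir = cell, subcell, n_dir
        self.level1 = defaultdict(list)  # coarse cell -> FOVs overlapping it
        self.level2 = defaultdict(list)  # subcell -> FOVs with camera inside
        self.level3 = defaultdict(list)  # direction interval -> FOVs

    def insert(self, f_id, p, R, theta_b, theta_e):
        x, y = p
        # Level 1: coarse cells the viewable scene may overlap
        # (approximated by the bounding box p +/- R).
        c = self.cell
        for i in range(int((x - R) // c), int((x + R) // c) + 1):
            for j in range(int((y - R) // c), int((y + R) // c) + 1):
                self.level1[(i, j)].append(f_id)
        # Level 2: the subcell containing the camera location.
        s = self.subcell
        self.level2[(int(x // s), int(y // s))].append(f_id)
        # Level 3: every direction interval the orientation overlaps,
        # walking clockwise from theta_b's interval to theta_e's.
        w = 360.0 / self.n_dir
        k, end = int(theta_b // w), int(theta_e // w)
        self.level3[k % self.n_dir].append(f_id)
        while k % self.n_dir != end % self.n_dir:
            k += 1
            self.level3[k % self.n_dir].append(f_id)
```

With eight 45° direction intervals, FOV f1 of Figure 3 (orientation 330° - 7°) lands in intervals 7 and 0, illustrating how an orientation that wraps past north spreads over multiple third-level lists.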

However, the Grid-based Index has the following drawbacks:

1. It stores the location and orientation information at different levels, which is not efficient since video queries usually involve both the location and orientation information of FOVs at the same time during query processing.

2. It is not suitable for indexing FOVs with different zoom levels and camera lens properties, since those FOVs have different viewable distances [2], which results in a large number of candidate second-level cells.

3. It performs poorly for skewed distributions of FOVs, since the bucket occupancy of grid files rises very steeply under skewed distributions [7].

5 The class of OR-trees

To overcome the drawbacks of R-tree [6] and Grid [14], we devise a class of new index structures, called OR-trees, incorporating the camera locations, orientations and viewable distances of videos. The class of OR-trees includes three new indexes, each of which is built on the premise of its previous version.

5.1 Orientation Augmented R-tree: OAR-tree

Recall that with the R-tree, using MBRs to approximate FOVs results in large MBRs, large “dead spaces” and the loss of orientation information. In this section, we introduce a new index called the Orientation Augmented R-tree (OAR-tree), which achieves smaller MBRs, reduces “dead spaces” and incorporates orientation information into the index nodes to accelerate query processing.

In particular, based on R-trees, we add two new entries storing the orientation and viewable distance information of FOV objects in both the leaf nodes and internal nodes of OAR-trees. In the leaf index nodes of an OAR-tree, instead of the MBRs of FOV objects, we store three values and a pointer to the actual FOV objects. Based on this, we can avoid the “dead spaces” of FOV objects and reduce false positives. Specifically, each leaf index node N of an OAR-tree contains a set of entries in the form (Oid, p, R, Θ), where Oid is the pointer to an FOV in the database; p is the camera location of the FOV object; R is its visible distance; and Θ is its view orientation.

For internal index nodes, we replace 1) Oid with a pointer to the child node, 2) p with the MBR of all camera points in the child node, 3) R with an aggregate value representing all visible distances in the child node, and 4) Θ with an aggregate value representing all orientations in the child node. Specifically, each non-leaf index node N of an OAR-tree contains a set of entries in the form (Ptr, MBRp, MinMaxR, MBO), where

– Ptr is the pointer to a child node of N;


– MBRp is the MBR of the camera locations of the FOVs in the subtree rooted at Ptr; as shown in Figure 6, MBRp is obviously much smaller than the MBR of the FOVs in an R-tree;

– MinMaxR is a tuple <MinR, MaxR>, where MinR (respectively MaxR) is the minimum (respectively maximum) visible distance of the FOVs in the subtree rooted at Ptr;

– MBO is the Minimum Bounding Orientation, defined in Definition 1 below, of the orientations of the FOVs in the subtree rooted at Ptr. From Definition 1, we can see that MBO is a tuple <θb, θe>, which has the same form as the view orientations Θ of FOV objects. For example, the minimum bounding orientation of FOVs f3 and f4 in the example in Figure 3 is MBO(f3, f4), as shown in Figure 7.
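The two entry layouts can be sketched as plain records (field names are ours; the paper specifies only the tuples (Oid, p, R, Θ) and (Ptr, MBRp, MinMaxR, MBO)):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class LeafEntry:
    """OAR-tree leaf entry (Oid, p, R, Theta)."""
    oid: int                                  # pointer to the FOV in the database
    p: Tuple[float, float]                    # camera location
    R: float                                  # visible distance
    theta: Tuple[float, float]                # view orientation <theta_b, theta_e>

@dataclass
class InternalEntry:
    """OAR-tree internal entry (Ptr, MBRp, MinMaxR, MBO)."""
    ptr: Optional[object]                     # child node
    mbr_p: Tuple[float, float, float, float]  # camera-location MBR [min_x, max_x, min_y, max_y]
    min_max_r: Tuple[float, float]            # <MinR, MaxR>
    mbo: Tuple[float, float]                  # minimum bounding orientation
```

For instance, node N1 of the running example could be summarized (child pointer elided) as `InternalEntry(None, (4.3, 4.6, 1.1, 1.8), (3.0, 8.5), (260, 7))`.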

Definition 1 (Minimum Bounding Orientation (MBO)) Given a set of FOVs' orientations Ω = {Θi<θbi, θei>}, 1 ≤ i ≤ n, where n is the number of orientations in Ω, the Minimum Bounding Orientation of Ω is the minimum angle in a clockwise direction that covers all the orientations in Ω, i.e., MBO(Ω) = <θb, θe> such that ∠(θb, θe) = min over θbi ∈ Ω of max over θej ∈ Ω of ∠(θbi, θej), where ∠(·, ·) denotes the clockwise angle from the first direction to the second.
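Definition 1 suggests a direct computation: try each interval's starting direction as the MBO start and keep the start that needs the smallest clockwise extent to cover every interval. A brute-force sketch (function names are ours):

```python
def cw_span(a, b):
    """Clockwise angle from direction a to direction b, in [0, 360)."""
    return (b - a) % 360

def mbo(orientations):
    """Minimum Bounding Orientation of clockwise intervals <theta_b, theta_e>."""
    best_ext, best_start = None, None
    for start, _ in orientations:
        # Clockwise extent from `start` needed to cover every interval entirely.
        ext = max(cw_span(start, b) + cw_span(b, e) for b, e in orientations)
        if best_ext is None or ext < best_ext:
            best_ext, best_start = ext, start
    return best_start, (best_start + best_ext) % 360
```

This reproduces the values of the running example: for f1 <330°, 7°> and f5 <260°, 315°>, `mbo` returns <260°, 7°>, matching N1.MBO in Figure 8.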

Figure 8 illustrates the OAR-tree for the objects in Figure 3. Taking the first entry E of node N5 as an example, entry E points to node N1, the child node of N5. N1 contains two FOVs, f1 and f5. N1.MBRp is [4.3, 4.6, 1.1, 1.8] (in the form [minx, maxx, miny, maxy]), which is the MBR of the camera locations f1.p = (4.3, 1.1) and f5.p = (4.6, 1.8). N1.MBO = <260, 7> is the MBO of f1.Θ = <330, 7> and f5.Θ = <260, 315>. Furthermore, N1.MinMaxR = [3.0, 8.5] holds the minimum and maximum of the viewable distances f5.R = 3.0 and f1.R = 8.5, respectively.

Fig. 6 MBR comparison between R-tree and OAR-tree (MBRp of f3, f4 vs. their full MBR).

Fig. 7 An MBO. f′4 is the translation of f4.

Like the traditional R-tree [6], the OAR-tree is constructed using the optimization heuristic of minimizing the enlarged area of the minimum bounding rectangles of the camera locations in the index nodes; i.e., the OAR-tree aims to group into the same higher-level node those index nodes whose subtrees contain FOVs with nearby camera locations. For example, based on this optimization, as shown in the OAR-tree in Figure 8, FOVs f3 and f4 are grouped into node N3, and f7 and f8 into N4. Subsequently, for the example in Figure 3, comparing


Fig. 8 An OAR-tree example

with the baseline R-tree, the OAR-tree visits two fewer index nodes for the range query Qr (both N3 and N4 can be pruned), and one fewer index node for the directional query Qd (N4 can be pruned and all the FOVs in the subtree of N3 can be reported as results). Processing range and directional queries efficiently with the OAR-tree is non-trivial; we discuss the algorithms in Section 6.

The OAR-tree stores the MBRs of camera locations and incorporates the aggregate orientation and viewable-distance information of all the children nodes to achieve smaller MBRs and orientation filtering. However, the OAR-tree is based only on minimizing the covering area of the camera locations, which may result in many false positives for both range and directional queries. Similar to the "dead space" of an R-tree node, we formally define the "virtual dead space" of an OAR-tree node in Definition 2. Unlike the dead space of an R-tree node, whose coverage (i.e., its MBR) is stored, the virtual coverage of an OAR-tree node is not stored; however, both produce false positives for range queries. Figure 9 shows the virtual dead spaces of the OAR-tree node containing f1 and f5, and of the node containing f1 and f2.

Definition 2 (OAR-tree node virtual dead space) Given an OAR-tree index node N(MBRp, MBO, MaxMinR), the virtual dead space of N is the area that is virtually covered by N but does not overlap with any FOV in the subtree of N. The virtual coverage of N is a convex region such that any point in the region can be covered by some FOV (p, Θ, R), ∀p ∈ N.MBRp, ∀Θ ∈ N.MBO, ∀R ∈ N.MaxMinR.

Consider Figure 9(a) again for the example in Figure 3: FOV f1 is grouped together with f5 in the OAR-tree based on the camera-point optimization. However, if f1 is instead grouped together with f2, additionally considering orientation information, then the virtual dead space of the resulting OAR-tree node (containing f1 and f2, Figure 9(b)) is significantly reduced, and so is the number of false positives.



Based on this observation, we next discuss how to enhance the OAR-tree by considering orientation optimization during index construction.

Fig. 9 Virtual dead spaces of OAR-tree nodes based on different optimizations: (a) camera point optimization (node containing f1 and f5); (b) camera point and orientation optimization (node containing f1 and f2). Dashed area indicates virtual dead space.

5.2 Orientation Optimized R-tree: O2R-tree

In this section, we propose a new index called the Orientation Optimized R-tree (O2R-tree) that optimizes based on both the covering area of the camera locations and the similarity in orientation.

The information stored in O2R-tree index nodes is the same as that in the OAR-tree. The main difference between the O2R-tree and the OAR-tree lies in the optimization criteria during the merging and splitting of index nodes.

While the framework of the O2R-tree construction algorithm (see Algorithm 1) is similar to that of the OAR-tree, the procedures ChooseLeaf and Split differ because they consider additional orientation information; they are presented as follows.

The procedure ChooseLeaf chooses a leaf node to store a newly inserted FOV object. ChooseLeaf traverses the O2R-tree from the root to a leaf node. When it visits an internal node N, it chooses the entry E of N with the least Waste, given in Eqn(5), which combines the camera-location and orientation information of the FOV objects. The procedure Split splits an index node N into two nodes when N overflows. We use the standard Quadratic Split algorithm [6] based on our proposed Waste function.

We proceed to compute Waste as a combination of the wastes of the camera locations and the view orientations.

Given an O2R-tree entry E(MBRp, MaxMinR, MBO) and an FOV f(p, R, Θ), let ∆Area(E, f) be the empty area (or dead space) introduced by enclosing the camera location of f in the MBR of the camera locations of the FOVs in E. ∆Area(E, f) is formulated in Eqn(3):

∆Area(E, f) = Area(MBR(E, f)) − Area(E)   (3)


where Area(MBR(E, f)) is the area of the minimum bounding rectangle enclosing E.MBRp and f.p, and Area(E) is the area of the minimum bounding rectangle E.MBRp. The angle waste for the view orientation is computed by Eqn(4):

∆Angle(E, f) = ∠MBO(E.MBO, f.Θ) − ∠E.MBO − ∠f.Θ   (4)

where ∠MBO(E.MBO, f.Θ) is the clockwise cover angle of the minimum bounding orientation enclosing E.MBO and f.Θ, and ∠E.MBO and ∠f.Θ are the clockwise cover angles of E.MBO and f.Θ, respectively.

Algorithm 1: Insert (R: an O2R-tree, E: a new leaf-node entry)
Output: The updated O2R-tree R
 1  N ← ChooseLeaf(R, E);
 2  add E to node N;
 3  if N needs to be split then
 4      N, NN ← Split(N, E);
 5      if N.isroot() then
 6          initialize a new node M;
 7          M.append(N); M.append(NN);
 8          store nodes M and NN; // N is already stored
 9          R.RootNode ← M;
10      else
11          AdjustTree(N.ParentNode, N, NN);
12  else
13      store node N;
14      if ¬N.isroot() then
15          AdjustTree(N.ParentNode, N, null);

Procedure ChooseLeaf(R, E)
16  N ← R.RootNode;
17  while N is not a leaf do
18      E′ ← argmin_{Ei ∈ N} Wastelo(Ei, E); // Eqn(5): the entry with the least waste
19      N ← E′.Ptr;
20  return N;

Procedure Split(N, E)
21  E1, E2 ← argmax_{Ei, Ej ∈ N ∪ {E}} Wastelo(Ei, Ej); // Eqn(5): the most wasteful pair become seeds
22  for each entry E′ in N ∪ {E}, E′ ≠ E1, E′ ≠ E2 do
23      if Wastelo(E′, E1) ≤ Wastelo(E′, E2) then
24          classify E′ as Group 1;
25      else
26          classify E′ as Group 2;
27  return Group 1 and Group 2;

Combining Eqn(3) and Eqn(4) through normalization and a weighted linear combination, we compute the overall waste cost in Eqn(5).

Wastelo(E, f) = βl · ∆Area(E, f)/max∆Area + βo · ∆Angle(E, f)/max∆Angle   (5)


In Eqn(5), max∆Area (respectively max∆Angle) is the maximum of ∆Area(Ei, Ej) (respectively ∆Angle(Ei, Ej)) over all pairs of entries Ei and Ej, used to normalize the camera-location (respectively orientation) waste. Parameters βl and βo, 0 ≤ βl, βo ≤ 1, βl + βo = 1, are used to strike a balance between the area and angle wastes. A smaller Wastelo(E, f) indicates that the entry E is more likely to be chosen for insertion of object f. Note that Eqn(5) can be naturally extended to compute the similarity between two entries of the O2R-tree.
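The waste terms of Eqns (3)-(5) amount to a few lines of arithmetic. A sketch (our names; MBRs as (minx, maxx, miny, maxy) tuples, orientations as clockwise degree pairs):

```python
def area(mbr):
    # mbr = (minx, maxx, miny, maxy)
    return (mbr[1] - mbr[0]) * (mbr[3] - mbr[2])

def enlarge(mbr, p):
    # smallest MBR enclosing mbr and the point p
    return (min(mbr[0], p[0]), max(mbr[1], p[0]),
            min(mbr[2], p[1]), max(mbr[3], p[1]))

def delta_area(mbr, p):
    # Eqn(3): dead space introduced by admitting p into the MBR
    return area(enlarge(mbr, p)) - area(mbr)

def span(s, e):
    # clockwise angular extent of the orientation interval <s, e>, degrees
    return (e - s) % 360

def delta_angle(mbo, theta, merged):
    # Eqn(4): cover angle of the merged MBO minus the two input cover angles
    return span(*merged) - span(*mbo) - span(*theta)

def waste_lo(d_area, d_angle, max_d_area, max_d_angle, beta_l=0.5, beta_o=0.5):
    # Eqn(5): normalized, weighted combination of the two wastes
    return beta_l * d_area / max_d_area + beta_o * d_angle / max_d_angle

# Admitting f2's camera point (5.2, 0.4) into N1's MBRp from Figure 8:
print(round(delta_area((4.3, 4.6, 1.1, 1.8), (5.2, 0.4)), 2))  # 1.05
```

Note that ∆Angle can be negative when the two orientation intervals already overlap, which the merged cover angle accounts for.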

Fig. 10 Leaf nodes of the O2R-tree for the example in Figure 3.

For the example in Figure 3, as shown in Figure 10, the O2R-tree groups f1 and f2 into a node N1 and groups f5 and f6 into a node N2, whereas for the same example the OAR-tree, as shown in Figure 8, groups f1 and f5 into a node N1 and f2 and f6 into a node N2. Hence, compared to the OAR-tree, the O2R-tree visits one fewer index node for the range query Qr (node N2 can be pruned) and one fewer index node for the directional query Qd (node N2 can be pruned).

5.3 View Distance and Orientation Optimized R-tree: DO2R-tree

Considering only the camera location and orientation for optimization may still be insufficient. To illustrate this, consider Figure 11: FOV f1 is packed with f2 in node N1 (Figure 11(a)) under the O2R-tree optimization, as their camera locations and orientations are the same. When the view distances are additionally considered, f1 is instead packed with f3 in N1 (Figure 11(b)), due to the high dissimilarity between the view distances of f1 and f2 and the similarity between those of f1 and f3. Therefore, the range query Qr needs to visit two index nodes (N1 and N2 in Figure 11(a)) under the O2R-tree optimization, but only one node (N1 in Figure 11(b)) when the view distances are part of the optimization. Hence, we discuss how to construct an index whose optimization criterion incorporates the view distance information of FOV objects; we call the new index the View Distance and Orientation Optimized R-tree (DO2R-tree).

Again, the information stored in DO2R-tree index nodes and the index construction framework are the same as those of the O2R-tree. The difference between the DO2R-tree and the O2R-tree is the optimization criteria. Hence, unlike the waste function in Eqn(5), the new waste function incorporates the view distance differences, as given in Eqn(7).

Fig. 11 Illustration of optimization criteria with and without considering view distance: (a) without view distance; (b) with view distance. Suppose the fanout is 2.

Given a DO2R-tree entry E(MBRp, MaxMinR, MBO) and an FOV f(p, R, Θ), the waste of viewable distance, ∆Diff(E, f), is defined in Eqn(6).

∆Diff(E, f) = Diff(Ef) − Diff(E)   (6)

where Diff(E) is the difference between the maximum and minimum viewable distances of entry E, i.e., Diff(E) = E.MaxR − E.MinR, and Diff(Ef) is the corresponding difference for the node enclosing the viewable distances of both E and f.

Combining the wastes of the camera location area and the orientation covering angle with the waste of the view distance differences, we compute the overall waste cost in Eqn(7).

Wastelod(E, f) = βl · ∆Area(E, f)/max∆Area + βo · ∆Angle(E, f)/max∆Angle + βd · ∆Diff(E, f)/max∆Diff   (7)

In Eqn(7), max∆Diff is the maximum of ∆Diff(Ei, Ej) over all pairs of entries Ei and Ej, used to normalize the view distance waste. Parameters βl, βo and βd, 0 ≤ βl, βo, βd ≤ 1, βl + βo + βd = 1, are used to tune the impact of the three wastes. In particular, if βd = 0, the DO2R-tree reduces to the O2R-tree, and if additionally βo = 0, it becomes the OAR-tree.
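The view-distance waste of Eqn(6) and the reduction behavior of Eqn(7) can be checked in a few lines (names ours):

```python
def delta_diff(e_min_r, e_max_r, f_r):
    # Eqn(6): growth of the viewable-distance spread after admitting f
    return (max(e_max_r, f_r) - min(e_min_r, f_r)) - (e_max_r - e_min_r)

def waste_lod(d_area, d_angle, d_diff, m_area, m_angle, m_diff,
              beta_l, beta_o, beta_d):
    # Eqn(7): weighted, normalized sum of the three wastes
    return (beta_l * d_area / m_area + beta_o * d_angle / m_angle
            + beta_d * d_diff / m_diff)

# Admitting an FOV with R = 10.0 into an entry with MinMaxR = [3.0, 8.5]:
print(delta_diff(3.0, 8.5, 10.0))                                 # 1.5
# With beta_d = 0, Eqn(7) reduces to the O2R-tree waste of Eqn(5);
# with beta_d = beta_o = 0, to the OAR-tree's pure area criterion:
print(waste_lod(1.0, 30.0, 1.5, 4.0, 120.0, 5.0, 1.0, 0.0, 0.0))  # 0.25
```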

6 Query processing

We proceed to present the query algorithms for range queries and directional queries, respectively, based on the DO2R-tree, which is the generalization of the three indexes discussed in Section 5.


6.1 Range queries

In this section, we develop an efficient algorithm to answer range queries. At a high level, the algorithm descends the DO2R-tree in a branch-and-bound manner, progressively checking whether each visited FOV object or index node overlaps with the range query object. Subsequently, the algorithm decides whether to prune an object or index node, or to report the FOV object (or all the FOV objects in the index node) as result(s). Before presenting the algorithm, we first describe an exact approach to identify whether an FOV object overlaps with the range query object, and then exploit it to decide whether a DO2R-tree index node should be accessed, through two newly defined strategies: 1) a pruning strategy and 2) a total hit strategy.

6.1.1 Search strategies for range queries

Fig. 12 Overlap identification for an FOV object: (a) Case 1; (b) Case 2; (c) Case 3; (d) Case 4.

Let Qr(q, r) be the range query circle with center point q and radius r, and let f(p, R, Θ<θb, θe>) be an FOV object. We explain the exact approach to determine whether FOV f overlaps with the query Qr. As shown in Figure 12, there are four overlapping cases. Case 1) The FOV camera location f.p is within the query circle Qr; the formal condition is given in Eqn(8). Obviously, FOVs in this case must overlap with Qr. Case 2) f.p is outside of Qr, and the ray −→pq is within the view orientation f.Θ. In this case, f overlaps with Qr iff Qr intersects the circle with center point p and radius R, as formally defined in Eqn(9). Case 3)


f.p is outside of Qr, and the ray −→θb lies between the rays −→pq and −→pg. In this case, f overlaps with Qr iff the segment pb intersects the arc fg, as formally defined in Eqn(10). Case 4) Analogously, f.p is outside of Qr, and the ray −→θe lies between the rays −→pf and −→pq. In this case, f overlaps with Qr iff the segment pe intersects the arc fg, as formally defined in Eqn(11). Therefore we can derive the lemma below to identify whether an FOV object overlaps with a range query circle.

Lemma 1 (Overlap identification for an object) Given an FOV f(p, R, Θ<θb, θe>) and a range query Qr(q, r), f overlaps with Qr iff it satisfies Eqn(8), Eqn(9), Eqn(10), or Eqn(11).

|pq| ≤ r   (8)

|pq| ≤ r + R  and  ∠(−→θb, −→pq) + ∠(−→pq, −→θe) = ∠−→Θ   (9)

|pq| cos∠(−→pq, −→θb) − √(r² − (|pq| sin∠(−→pq, −→θb))²) ≤ R  and  ∠(−→θb, −→pq) + ∠(−→pq, −→θe) = ∠−→Θ   (10)

|pq| cos∠(−→pq, −→θe) − √(r² − (|pq| sin∠(−→pq, −→θe))²) ≤ R  and  ∠(−→θb, −→pq) + ∠(−→pq, −→θe) = ∠−→Θ   (11)

where ∠(−→x, −→y) denotes the clockwise angle from ray −→x to ray −→y, and ∠−→Θ denotes the cover angle of f.Θ.

Based on Lemma 1, we can develop our first strategy, the pruning strategy, to examine whether a DO2R-tree index node N can be pruned. To prune N without accessing the objects in the subtree of N, we first introduce two approximations in Definition 3.

Let N(MBRp, <MinR, MaxR>, MBO<θb, θe>) be an index node in the DO2R-tree. As shown in Figure 13, let <−→pbq, −→peq>, where pb, pe ∈ MBRp, be a ray tuple such that ∀p ∈ MBRp, the ray −→pq lies between −→pbq and −→peq in the clockwise direction. Subsequently, we have the following definition and lemmas.

Definition 3 (MaxA and MinA) The maximum (respectively minimum) cover angle in the clockwise direction from the MBO of the DO2R-tree index node N to a ray −→pq, denoted MaxA(MBO, MBRp, q) (respectively MinA(MBO, MBRp, q)), is defined as:

MaxA(MBO, MBRp, q) = Max{∠(−→θb, −→peq), ∠(−→pbq, −→θe)}   (12)

MinA(MBO, MBRp, q) = 0, if the MBO overlaps with <−→pbq, −→peq>; Min{∠(−→θe, −→pbq), ∠(−→peq, −→θb)}, otherwise   (13)

Lemma 2 ∀f(p, Θ) ∈ N(MBRp, MBO) and ∀θ ∈ f.Θ: MinA(MBO, MBRp, q) ≤ ∠(−→θ, −→pq) ≤ MaxA(MBO, MBRp, q).

Proof Since there are only two cases for the relationship between MBO and <−→pbq, −→peq> (see Figure 13), Lemma 2 obviously holds.
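Eqns (12)-(13) are mechanical once the MBO <θb, θe> and the bounding rays <−→pbq, −→peq> are known (all clockwise bearings in degrees; the sample numbers below are ours and merely illustrate that the results bound the angle of Lemma 2):

```python
def cw(a, b):
    # clockwise angle from bearing a to bearing b, in [0, 360)
    return (b - a) % 360

def intervals_overlap(mbo, lo, hi):
    # circular-interval intersection: one start lies inside the other interval
    tb, te = mbo
    return cw(lo, tb) <= cw(lo, hi) or cw(tb, lo) <= cw(tb, te)

def max_a(mbo, pbq, peq):
    # Eqn(12)
    tb, te = mbo
    return max(cw(tb, peq), cw(pbq, te))

def min_a(mbo, pbq, peq):
    # Eqn(13)
    tb, te = mbo
    if intervals_overlap(mbo, pbq, peq):
        return 0
    return min(cw(te, pbq), cw(peq, tb))

# MBO <200°, 220°>, rays toward q spanning bearings <90°, 110°>:
print(min_a((200, 220), 90, 110), max_a((200, 220), 90, 110))  # 90 270
```

For any θ in <200°, 220°> and any ray bearing in <90°, 110°>, the clockwise angle from θ to the ray lies in [230°, 270°], which is indeed contained in the [MinA, MaxA] = [90°, 270°] bound of Lemma 2.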


Fig. 13 Illustration of MinA and MaxA: (a) Case 1; (b) Case 2.

Lemma 3 (Pruning strategy) Index node N can be pruned if it satisfies Eqn(14), Eqn(15), or Eqn(16):

MinD(q, MBRp) ≥ r + MaxR   (14)

MinA(MBO, MBRp, q) ≥ arcsin( r / MinD(MBRp, q) )   (15)

MinD(q, MBRp) cos(MaxA(MBO, MBRp, q)) − √(r² − MinD²(MBRp, q) sin²(MinA(MBO, MBRp, q))) ≥ MaxR   (16)

where MinD(MBRp, q) is the minimum distance from q to MBRp.

Proof ∀f ∈ N , 1) If Eqn(14) is true then ∀R ∈ [MinR,MaxR], |pq| > r +R, i.e.,f does not satisfy Eqn(8) in Lemma 1. Obviously, f does not satisfy Equations 9, 10and 11, either. 2) If Eqn(15) is true, then, according to Lemma 2, for any viewabledistance R of f , the orientation of f does not point to any point in the query circle, andthus it does not satisfy Eqn(10) and Eqn(11). Obviously, f does not satisfy Eqn(8)nor Eqn(9). Additionally, 3) if Eqn(16) is true then f does not satisfy Equations 10and 11 nor Equations 8 and 9. Therefore Lemma 3 is true.

We are now ready to discuss our second strategy. We call an index node N a "total hit" iff all the objects in N overlap with the query circle (i.e., they all belong to the results). This is a new concept that does not exist for regular R-trees. If an index node N is a total hit, then it is not necessary to exhaustively check all the objects in N one by one, so the processing cost can be significantly reduced.

Hence, based on the two approximations above, we propose a novel search strategy, the total hit strategy, to examine whether an index node N is a total hit without accessing the FOV objects in the subtree of N.

Lemma 4 (Total hit strategy) All the FOVs in the subtree of N can be reported as results if N satisfies Eqn(17), or all of Equations (18), (19) and (20):

MaxD(q, MBRp) ≤ r   (17)

MaxD(q, MBRp) ≤ r + MinR   (18)

MaxA(MBO, MBRp, q) ≤ arcsin( r / MaxD(MBRp, q) )   (19)

MaxD(q, MBRp) cos(MinA(MBO, MBRp, q)) − √(r² − MaxD²(MBRp, q) sin²(MaxA(MBO, MBRp, q))) ≤ MinR   (20)

where MaxD(MBRp, q) is the maximum distance from q to MBRp.
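Once MinD/MaxD and MinA/MaxA are available, the node-level tests reduce to a handful of comparisons. A sketch of Eqns (14)-(15) and (17)-(20) (Eqn(16) is omitted, so pruning here is conservative; names and parameter shapes are ours):

```python
import math

def can_prune(min_d, min_a_deg, max_r, r):
    """Lemma 3, partial: Eqn(14) distance test and Eqn(15) angle test."""
    if min_d >= r + max_r:                       # Eqn(14): node too far from Qr
        return True
    return min_d > r and min_a_deg >= math.degrees(math.asin(r / min_d))  # Eqn(15)

def total_hit(max_d, min_a_deg, max_a_deg, min_r, r):
    """Lemma 4: Eqn(17), or Eqns (18)-(20) together."""
    if max_d <= r:                               # Eqn(17): every camera inside Qr
        return True
    if max_d > r + min_r:                        # Eqn(18) violated
        return False
    if max_a_deg > math.degrees(math.asin(min(1.0, r / max_d))):  # Eqn(19) violated
        return False
    x = max_d * math.sin(math.radians(max_a_deg))
    if x > r:
        return False
    # Eqn(20): even the farthest camera's ray reaches into the circle
    return max_d * math.cos(math.radians(min_a_deg)) - math.sqrt(r * r - x * x) <= min_r

print(can_prune(min_d=100, min_a_deg=0, max_r=50, r=10))            # True  (Eqn 14)
print(total_hit(max_d=8, min_a_deg=0, max_a_deg=0, min_r=0, r=10))  # True  (Eqn 17)
```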


Proof ∀f ∈ N , if Eqn(17) is true then Eqn(8) in Lemma 1 is obviously true. ifEqn(18), Eqn(19) and Eqn(20) are true then Eqn(10) (or Eqn(11) ) in Lemma 1 istrue. Therefore Lemma 4 is true.

6.1.2 Search algorithm

Based on the two new strategies discussed above, we proceed to develop an efficient algorithm to answer range queries (see Algorithm 2). The algorithm descends the DO2R-tree in a branch-and-bound manner, progressively applying each of the strategies to answer the range query.

6.2 Directional queries

We next explain an efficient algorithm (see Algorithm 3) for processing directional queries with the DO2R-tree. Given a direction interval Qd, we can easily decide whether the orientation of a DO2R-tree index node overlaps with Qd using the orientation information in the node. Like the range query algorithm, Algorithm 3 proceeds in a branch-and-bound manner, progressively applying the search strategies to answer the directional query. Note that we apply the range search strategies and the orientation filtering method at the same time to decide whether a DO2R-tree index node should be pruned or is a "total hit" (Line 5 and Line 7).

Algorithm 2: Range Query (R: DO2R-tree root, Qr: range circle)
Output: All objects f ∈ results
 1  initialize a stack S with the root of the DO2R-tree R;
 2  while ¬S.isEmpty() do
 3      N ← S.top(); S.pop();
 4      if N satisfies Eqn(14) ∨ Eqn(15) ∨ Eqn(16) then
 5          prune N; // Lemma 3
 6      else
 7          if N satisfies Eqn(17) ∨ (Eqn(18) ∧ Eqn(19) ∧ Eqn(20)) then
 8              results.add(N.subtree()); // Lemma 4
 9          else
10              for each child node ChildN of N do
11                  S.push(ChildN);
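The branch-and-bound loop itself is index-agnostic; given per-node prune and total-hit predicates it is only a few lines. A Python sketch with a toy node type (names are ours):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    fovs: list = field(default_factory=list)      # FOV ids stored at this node
    children: list = field(default_factory=list)

    def subtree(self):
        """All FOV ids in this node's subtree."""
        out = list(self.fovs)
        for c in self.children:
            out += c.subtree()
        return out

def range_query(root, prune, hit):
    """Algorithm 2's skeleton: DFS with pruning (Lemma 3) and
    total-hit (Lemma 4) shortcuts supplied as predicates."""
    results, stack = [], [root]
    while stack:
        n = stack.pop()
        if prune(n):
            continue                               # Lemma 3: skip the whole subtree
        if hit(n):
            results += n.subtree()                 # Lemma 4: no per-object checks
        else:
            stack.extend(n.children)
    return results

# Toy tree: the left subtree lies fully inside Qr, the right fully outside.
left, right = Node(fovs=["f1", "f2"]), Node(fovs=["f3"])
root = Node(children=[left, right])
print(range_query(root, prune=lambda n: n is right, hit=lambda n: n is left))
# → ['f1', 'f2']
```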

7 Analysis

We analyze the maximum improvement space of FOV queries with R-trees for both range and directional queries.


Algorithm 3: Directional Query (R: DO2R-tree root, Qd: directional interval, Qr: range circle)
Output: All objects f ∈ results
 1  initialize a stack S with the root of the DO2R-tree R;
 2  while ¬S.isEmpty() do
 3      N ← S.top(); S.pop();
 4      if N satisfies Eqn(14) ∨ Eqn(15) ∨ Eqn(16), or N.MBO does not overlap Qd then
 5          prune N; // Lemma 3 and orientation filtering
 6      else
 7          if N satisfies Eqn(17) ∨ (Eqn(18) ∧ Eqn(19) ∧ Eqn(20)), and N.MBO is contained in Qd then
 8              results.add(N.subtree()); // Lemma 4
 9          else
10              for each child node ChildN of N do
11                  S.push(ChildN);

Lemma 5 Assume that the camera locations and orientations of N FOV objects are uniformly distributed in an area and in 0°−360°, respectively, and that the viewable distances and viewable angles of the FOVs are the same (i.e., the FOV area is constant). Compared with the range query algorithm with R-tree in Section 4.1, the I/O cost of the optimal algorithm with the optimal index for range queries is at most 66.7% less than that of the approach with R-tree.

Proof Given a range query Qr(q, r) and an arbitrary FOV f, if the MBR of f, f.mbr, overlaps with Qr, then there is at most a 50% probability that f is a false positive, since the area of the "dead space", as shown in Figure 5, can be proved to be at most half of that of f.mbr. Let M be the number of FOVs whose MBRs overlap with Qr. Then the number of R-tree nodes that need to be accessed is (M/f)(1 + Σ_{i=1}^{h} 1/f^i) ≤ 3M/(2f), where f denotes the fanout and h the height of the tree. The optimal search algorithm with an optimal index (i.e., accessing only results) thus visits at most (3M/(2f) − M/(2f)) / (3M/(2f)) < 66.7% fewer nodes than the approach with R-tree. Hence, Lemma 5 holds.

Lemma 5 implies that the maximum improvement over the R-tree for range FOV queries is 66.7%.
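The 66.7% figure is simply the ratio of the two node counts; a quick numeric check (the fanout value is taken from the experimental setup in Section 8):

```python
# Sanity check of Lemma 5's bound: an R-tree-based scan touches about
# 3M/(2f) nodes while an optimal index touches about M/(2f), so the
# saving is at most their ratio, 2/3.
M, fanout = 1_000_000, 170     # FOVs overlapping Qr; R-tree fanout (Section 8)
rtree_nodes = 3 * M / (2 * fanout)
optimal_nodes = M / (2 * fanout)
print(round((rtree_nodes - optimal_nodes) / rtree_nodes, 4))  # 0.6667
```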

Lemma 6 Assume that the camera locations and orientations of N FOV objects are uniformly distributed in an area and in 0°−360°, respectively, and that the viewable distances and viewable angles of the FOVs are the same. Given a query direction interval Qd(θb, θe), compared with the directional query algorithm with R-tree in Section 4.1, the I/O cost of the optimal algorithm with the optimal index for directional queries is at most (1 − 33.3% · ∠θbθe/360°) less than that of the approach with R-tree, where ∠θbθe is the coverage angle from θb to θe in the clockwise direction.

Proof Given a directional query with direction interval Qd(θb, θe) and range Qr(q, r), for an arbitrary FOV f, the probability that f is a false positive for the directional query is 50%(1 − ∠θbθe/360°), since there is a 50% probability that f


overlaps with Qr, according to Lemma 5, and there is a probability ∠θbθe/360° that f is within the direction interval Qd. Let M be the number of FOVs whose MBRs overlap with Qr and whose orientations overlap with Qd. Then the number of R-tree nodes that need to be accessed is (M/f)(1 + Σ_{i=1}^{h} 1/f^i) ≤ 3M/(2f). The optimal search algorithm with an optimal index (i.e., accessing only results) visits at most (3M/(2f) − (M/(2f)) · ∠θbθe/360°) / (3M/(2f)) < 1 − 33.3% · ∠θbθe/360° fewer nodes than the approach with R-tree. Hence, Lemma 6 holds.

Let IRopt (= 66.7%) be the maximum percentage by which the range query algorithm with R-tree can be improved, and let IDopt (= 1 − 33.3% · ∠θbθe/360°) be the maximum percentage by which the directional query algorithm with R-tree can be improved. Since 0 ≤ ∠θbθe/360° ≤ 1, obviously IDopt ≥ IRopt. Lemma 6 implies that the maximum improvement over R-tree for directional FOV queries is at least 66.7%; further, the improvement space decreases as the query direction interval increases.

8 Performance Evaluation

We conducted experimental studies to evaluate the efficiency of our methods using two fundamental queries: range and directional.

8.1 Experimental methodology

Implemented Indexes. We implemented our proposed indexes and search algorithms: the OAR-tree, O2R-tree and DO2R-tree for range and directional queries. In addition, we implemented two baselines for comparison: R-tree and Grid-based Index (Grid).

Datasets and Queries. We used two types of datasets: Real World (RW) and Synthetically Generated (Gen), as shown in Table 1. RW was collected by more than 100 people: around 50% of the videos at the University of Southern California (USC), 30% at Singapore downtown and the National University of Singapore (NUS), and 20% at 18 other cities all over the world, e.g., Chicago and London. The distribution of FOVs in RW was very skewed. To evaluate the scalability of our solutions, we synthetically generated five Gen datasets with different data sizes in log scale from 0.1M (million) to 1B (billion) FOVs using the mobile video generation algorithm presented in [3]. In Gen, FOVs were uniformly distributed across the 10km × 10km area around USC, and their view distances were randomly distributed in 100∼500 meters, assuming a group of persons initially distributed in the 10km × 10km area and then walking randomly at a speed of around 4.86 km/h while taking videos with their smartphones; see Table 2. Unless specified otherwise, the dataset size of 1M FOVs was assumed by default in the reported experimental results. To evaluate the performance of our solutions under skewed distributions, we additionally generated five Gen datasets from 0.1M to 1B FOVs where the FOVs are non-uniformly (Gaussian) distributed in the 10km × 10km area.
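For intuition, a much-simplified stand-in for the Gen generation (the paper uses the algorithm of [3]; the uniform sampling below only mirrors the stated ranges, and the function name is ours):

```python
import random

def gen_fovs(n, extent=10_000, seed=42):
    """Simplified stand-in for the Gen data: n FOVs with uniform camera
    locations in an extent x extent meter area, uniform headings, and
    view distances in 100-500 m; the viewable angle of 60 degrees is the
    Gen average from Table 1."""
    rng = random.Random(seed)
    fovs = []
    for _ in range(n):
        p = (rng.uniform(0, extent), rng.uniform(0, extent))
        heading = rng.uniform(0, 360)
        alpha = 60                                   # avg viewable cover angle
        theta = ((heading - alpha / 2) % 360, (heading + alpha / 2) % 360)
        fovs.append((p, theta, rng.uniform(100, 500)))
    return fovs

sample = gen_fovs(3)
print(len(sample), all(100 <= f[2] <= 500 for f in sample))  # 3 True
```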


For each dataset (RW and Gen), we generated 5 query sets for range queries by increasing the query radius from 100 to 500 meters in increments of 100 meters, which is a reasonable variation range for video queries since users are usually interested in videos in a small specified area. Each range query set contained 10,000 queries with different query locations but the same query radius. For Gen, query points were uniformly distributed in the 10km × 10km area around USC. For RW, half of the range queries were uniformly distributed in the 10km × 10km area around USC while the other half were distributed in a 10km × 10km area in Singapore. Again, unless specified otherwise, a query radius of 200 meters was assumed by default in the rest of the discussion. Additionally, we generated query sets for directional queries, varying the query direction interval from 45° to 315° in increments of 45°.

Table 1 Datasets for the experiments

Statistics                                   RW        Gen
total # of FOVs                              0.2 M     0.1 M∼1 B
total # of videos                            1276      100∼1 M
FOV# per second                              1         1
average time per video                       3 mins    0.28 hours
total time                                   62 hours  27.8 hours∼31.71 years
average camera moving speed (km/h)           4.50      4.86
average camera rotation speed (degrees/s)    10        10
average viewable distance R (meters)         100       250
average viewable cover angle α (degrees)     51        60

Table 2 Synthetically Generated Dataset

total FOV#    0.1 M       1 M        10 M        100 M      1 B
total video#  100         1 K        10 K        100 K      1 M
total time    27.8 hours  11.6 days  115.7 days  3.2 years  31.7 years

Setup and metrics. We implemented all the indexes on a server with an Intel Core 2 Duo CPU E8400 @ 3.00GHz and 4GB of RAM, and used a page size of 4KB. We evaluated their performance on disk-resident data. The fanouts of the R-tree and of the OAR-tree (respectively O2R-tree, DO2R-tree) were 170 and 102, respectively. For the Grid-based Index baseline, following the setting reported in [14], we set the first-level cell size to 250 meters, the second-level cell size to 62.5 meters, and the view direction interval to 45°. As the metrics for our evaluation, we report the average query processing time and the average number of page accesses per query after executing the 10,000 queries of each set.

Space requirement. The space usage of the index structures for the different datasets is reported in Table 3. The space requirements of the OAR-tree, O2R-tree and DO2R-tree were almost identical, so we report only the space usage of the DO2R-tree. The DO2R-tree requires a little more space than the R-tree since it needs extra space to store the orientations and viewable distances in each index node. However, the space usage of


the DO2R-tree was significantly less (about 5 times less) than that of Grid because Grid redundantly stores each FOV at each level.

Table 3 Sizes of indexing structures (Megabytes)

Index       RW     0.1M   1M     10M    100M    1B
R-tree      8.66   3.41   32.69  289    2,536   23,978
Grid        42.02  16.70  163    1,576  16,519  163,420
DO2R-tree   9.75   5.51   60.25  379    3,784   38,299
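The reported fanouts (170 vs. 102) are consistent with the 4KB page under one plausible entry layout; the byte sizes below are our assumption, not given in the paper:

```python
PAGE = 4096                   # page size used in the experiments (Section 8)
PTR = FLOAT = 4               # assumed 4-byte pointers and 4-byte floats
rtree_entry = 4 * FLOAT + 2 * PTR                 # MBR + child pointer + record id (assumed)
oar_entry = rtree_entry + 2 * FLOAT + 2 * FLOAT   # + MBO + MinMaxR aggregates
print(PAGE // rtree_entry, PAGE // oar_entry)     # 170 102
```

Storing the two extra aggregates per entry reduces the fanout by roughly 40%, which is the price the OAR/O2R/DO2R-trees pay for their orientation and view-distance filtering.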

8.2 Evaluation of range query

In this set of experiments, we evaluated the performance of the indexes for range queries using the Gen datasets. First, Figure 14 reports the average number of page accesses of all indexes; Figure 14(b) shows the same but focuses on data sizes from 0.1M to 10M. Note that the data size is shown in log scale while the page access number is shown in linear scale. The most important observation is that both the O2R-tree and the DO2R-tree significantly outperformed the other indexes, which demonstrates the superiority of our optimization approaches. The O2R-tree (respectively DO2R-tree) accessed around 40% (respectively 50%) fewer pages than Grid, and around 50% (respectively 60%) fewer than the R-tree. The improvement of the DO2R-tree is very close to the theoretical maximum improvement (66.7%, as analyzed in Section 7), clearly showing the effectiveness of the orientation and view distance optimizations and the search algorithms of the O2R-tree and DO2R-tree. Another observation is that the OAR-tree incurred slightly more page accesses than the R-tree. This is expected because the OAR-tree is based only on minimizing the covering area of camera locations; the dead space of an OAR-tree node can be larger than that of an R-tree node, so it may produce more false positives than the R-tree, even though it incorporates view orientation and distance information into the index nodes for filtering. This demonstrates that it is not the mere presence of orientation information but the optimization criteria considering orientation that significantly reduce the dead spaces of tree nodes and, subsequently, the false positives. In addition, the DO2R-tree was marginally better than the O2R-tree for range queries, since incorporating the viewable distance in the optimization helped accelerate range queries. Overall, the number of page accesses increased linearly as the dataset size, i.e., the number of FOVs, increased.

Second, Figure 15 reports the average query processing time in the above experiments. The overall performance improvement showed a similar trend as in Figure 14, but one can observe that the O2R-tree and DO2R-tree provided a better improvement in percentage reduction over the R-tree and Grid than in the case of page accesses. This is because the "total hit" search strategy applied to our indexes reduces processing time, while the R-tree (respectively Grid) needs to check all the FOV objects in a node (respectively cell) even when all of them are results.


Finally, Figure 16 reports the impact of the query radius on the average page accesses while varying the query radius from 100 to 500 meters. Other than for Grid, the performance trend held for all indexes as the query radius increased. One can clearly see that the number of page accesses for Grid increases rapidly and is significantly greater than that of the other indexes for radii greater than 200 meters. This is because the first-level cell size of Grid was set to 250 meters in our experiments, so Grid needs to access more first-level cells, each of which contains more objects. Grid is not flexible for queries with various radii.

8.3 Evaluation of directional query

In this set of experiments, we evaluated the performance of the indexes for directional queries using the same datasets. We used the default query radius of 200 meters and set the query direction interval to 90° in these experiments.

As shown in Figure 17, the overall performance of the O2R-tree and DO2R-tree was significant and similar to the results for range queries. Specifically, the O2R-tree (respectively DO2R-tree) accessed about 70% (respectively 65%) fewer pages than Grid and about 67% (respectively 63%) fewer than the R-tree. This demonstrates that the orientation optimization in building the O2R-tree and DO2R-tree was, as expected, more effective in supporting directional queries. The O2R-tree is based on the optimization of locations and orientations, while the DO2R-tree considers all the optimization criteria, i.e., locations, orientations and view distances. The DO2R-tree performs slightly worse for directional queries than the O2R-tree since it applies a lower extent of optimization on orientation.

Next, we evaluated the impact of the query direction interval. In Figure 21, as expected, the number of page accesses for R-tree was not influenced by the query direction interval. The number of page accesses for the other indexes slightly increased as the angle increased. Clearly, O2R-tree and DO2R-tree demonstrated at least two times better performance than the others. The number of page accesses for Grid grows much faster than for the other indexes because Grid needs to visit its third level for orientation filtering for each candidate second-level cell, and the number of candidate cells at the third level increases linearly as the query direction interval grows.
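The per-node orientation filtering that makes direction intervals cheap on O2R-tree and DO2R-tree can be sketched as follows. This assumes each index node stores the min/max azimuth of the FOVs below it; all names are illustrative rather than the paper's API. The one subtlety is that azimuth intervals may wrap past 360°, so each interval is split into at most two ordinary segments before testing overlap.

```python
def norm(a):
    """Normalize an azimuth to [0, 360)."""
    return a % 360.0

def interval_overlaps(lo1, hi1, lo2, hi2):
    """Overlap test for two azimuth intervals, each possibly wrapping past 360.
    (A full-circle 360-degree interval would need a special case; the widest
    query interval in the experiments is 315 degrees.)"""
    def segments(lo, hi):
        lo, hi = norm(lo), norm(hi)
        if lo <= hi:
            return [(lo, hi)]
        return [(lo, 360.0), (0.0, hi)]   # wrapped interval splits in two
    return any(a1 <= b2 and a2 <= b1
               for a1, b1 in segments(lo1, hi1)
               for a2, b2 in segments(lo2, hi2))

def node_survives(node_min, node_max, query_dir, query_width):
    """Keep a node only if its stored azimuth range can intersect the query's
    direction interval [query_dir - width/2, query_dir + width/2]."""
    half = query_width / 2.0
    return interval_overlaps(node_min, node_max,
                             query_dir - half, query_dir + half)
```

A node whose azimuth range falls entirely outside the query's direction interval is pruned in one comparison, whereas a location-only index must descend to the objects to discover this.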

8.4 Impact of weight parameters

This set of experiments aimed to evaluate how each optimization criterion (i.e., location, orientation and view distance) impacts the index performance. First, we built O2R-trees with different weights for the orientation difference βo (in Eqn 5), from 0 to 1 (BetaO in Figure 18). Figure 18 shows the number of page accesses of O2R-trees, built with different βo, for range queries on the 1M FOV dataset. Note that the case of βo = 0 is actually an OAR-tree, which does not use any orientation information for optimization, and hence its performance is the worst. This shows that simply augmenting nodes with orientation, without optimization, does not help. As we increase βo, i.e., apply more optimization using orientation, the performance of O2R-tree becomes significantly better in the mid range of βo and is best around βo = 0.4.


Fig. 14 Page accesses of range queries. (a) 0.1M ∼ 1B; (b) 0.1M ∼ 100M. [Plots: dataset size vs. # of page accesses for R-tree, Grid, OAR-tree, O2R-tree, DO2R-tree.]

Fig. 15 Query processing time of range queries. [Plot: dataset size (0.1M ∼ 1B) vs. query time in seconds for R-tree, Grid, OAR-tree, O2R-tree, DO2R-tree.]

Fig. 16 Impact of radius in range queries. [Plot: query radius (100 ∼ 500 meters) vs. # of page accesses for R-tree, Grid, OAR-tree, O2R-tree, DO2R-tree.]

Fig. 17 Varying dataset size for directional queries. (a) 0.1M ∼ 1B; (b) 0.1M ∼ 10M. [Plots: dataset size vs. # of page accesses for R-tree, Grid, OAR-tree, O2R-tree, DO2R-tree.]

Fig. 18 Vary βo in O2R-tree. [Plot: BetaO (0 ∼ 1) vs. # of page accesses.]

Fig. 19 Vary βd in DO2R-tree. [Plot: BetaD (0 ∼ 1) vs. # of page accesses.]

Fig. 20 Comparison of skewed Gen dataset. (a) 0.1M ∼ 1B (range queries); (b) 0.1M ∼ 1B (directional queries). [Plots: dataset size vs. # of page accesses for R-tree, Grid, OAR-tree, O2R-tree, DO2R-tree.]


Fig. 21 Vary query direction interval for directional queries. [Plot: query direction interval (45° ∼ 315°) vs. # of page accesses for R-tree, Grid, OAR-tree, O2R-tree, DO2R-tree.]

Fig. 22 Comparison on RW dataset. [Bar chart: # of page accesses per index (R-tree, Grid, OAR, O2R, DO2R) for range and directional queries.]

However, close to the other extreme case of βo = 1 (i.e., βl = 0), only the orientation difference is considered in building the O2R-tree, so range query performance suffers. Hence, a good balance between area and angle wastes produces the best result.

Next, we built DO2R-trees with different weights for the viewable distance difference βd (in Eqn 7), from 0 to 1 (BetaD in Figure 19). Based on the previous result, we set βl = 0.6 ∗ (1 − βd) and βo = 0.4 ∗ (1 − βd) to preserve their best ratio. When βd = 0, the tree is actually an O2R-tree with βo = 0.4. When βd = 1, only the viewable distance difference is considered. Figure 19 shows that the performance of DO2R-tree was comparable (best when βd = 0.2) for βd ≤ 0.6, but it suffered as βd approached 1. In our dataset, the distance differences among FOVs were not large, so the impact of βd was minimal. For a specific application, appropriate parameter settings can be found empirically.
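The weight scheme above can be made concrete with a small sketch. The exact forms of Eqns 5 and 7 are not reproduced in this section; this assumes the insertion cost is a weighted sum of normalized enlargements of spatial area, azimuth range and view-distance range, with βl and βo derived from βd in the 0.6/0.4 ratio used in the experiments. Function names are illustrative.

```python
def do2r_weights(beta_d, location_share=0.6, orientation_share=0.4):
    """Split the non-distance weight (1 - beta_d) between location and
    orientation using the empirically best 0.6/0.4 ratio from the text."""
    beta_l = location_share * (1.0 - beta_d)
    beta_o = orientation_share * (1.0 - beta_d)
    return beta_l, beta_o, beta_d

def insertion_penalty(d_area, d_angle, d_dist, beta_d):
    """Assumed weighted enlargement cost (smaller = better subtree choice).
    d_area, d_angle, d_dist are normalized enlargements of a node's spatial
    area, azimuth range, and view-distance range when absorbing a new FOV."""
    beta_l, beta_o, bd = do2r_weights(beta_d)
    return beta_l * d_area + beta_o * d_angle + bd * d_dist
```

Setting beta_d = 0 recovers an O2R-tree-style cost (βl = 0.6, βo = 0.4, no distance term), matching the βd = 0 case discussed above.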

8.5 Evaluation using skewed Gen dataset

This set of experiments evaluates the performance of our indexes and search algorithms for FOV objects with a skewed distribution.

As shown in Figure 20(a), O2R-tree and DO2R-tree significantly outperform R-tree and Grid for range queries. The improvements of O2R-tree and DO2R-tree over R-tree show the same trend as in the experiments with the uniform Gen dataset, while those over Grid become more pronounced. Specifically, O2R-tree (respectively DO2R-tree) accessed about 65% (respectively 60%) fewer pages than Grid. This is because the occupancy of grid files rises very steeply for a skewed distribution.

Figure 20(b) plots the results for directional queries. We can see that O2R-tree and DO2R-tree show even greater improvements over Grid. This is expected since the orientation optimization in building O2R-tree and DO2R-tree was more effective in supporting directional queries than range queries.


8.6 Evaluation using RW dataset

To evaluate the effectiveness of our indexes and search algorithms in a real-world setting, we also conducted a set of experiments using the real-world (RW) dataset, which follows a skewed distribution. As shown in Figure 22, the results show the same trends as in the previous experiments using the skewed Gen dataset, for both range and directional queries.

To summarize, the experimental results demonstrate that our proposed indexes, O2R-tree and DO2R-tree, and the corresponding search algorithms outperform the two baseline indexes for both range and directional queries.

9 Conclusion and Future Work

This paper represented video data as a series of spatial objects with orientations, i.e., FOV objects, and proposed a class of R-tree-based indexes that can index the location, orientation and distance information of FOVs for both filtering and optimization. In addition, our indexes can flexibly support user-generated videos that contain ad-hoc FOVs with potentially skewed distributions. Further, two novel search strategies were proposed for fast video range and directional queries on top of our index structures. Our indexes are not specific to sector-shaped FOVs and can be used to index spatial objects with orientation in other shapes, such as vectors, triangles and parallelograms. This is because, like the sector-shaped FOV objects, objects shaped as vectors, triangles and parallelograms can also be defined as objects comprising a position p, a direction tuple Θ⃗, and a radius tuple MinMaxR. Additionally, our indexes and search algorithms can be easily extended to 3-dimensional FOVs. We intend to extend this work in two directions. First, we intend to extend our indexes to the cloud for even larger sets of video data. Second, we would like to study the insertion and update costs of our indexes and study techniques for batch insertion of video.
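The generic representation referred to above, a position p, a direction tuple, and a radius tuple MinMaxR, can be sketched as a small data type; field names and the constructor helpers are assumptions for illustration, not the paper's definitions.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class OrientedObject:
    x: float                 # position p
    y: float
    theta_min: float         # direction tuple: azimuth range in degrees
    theta_max: float
    r_min: float             # MinMaxR: nearest and farthest extent from p
    r_max: float

def from_fov(x, y, heading, viewable_angle, max_dist):
    """A sector-shaped FOV as a special case: the azimuth range is centered
    on the camera heading and the near radius is zero."""
    half = viewable_angle / 2.0
    return OrientedObject(x, y, heading - half, heading + half, 0.0, max_dist)

def from_vector(x1, y1, x2, y2):
    """A line segment (vector) as another special case: a single direction
    and a degenerate radius range equal to its length."""
    ang = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0
    length = math.hypot(x2 - x1, y2 - y1)
    return OrientedObject(x1, y1, ang, ang, length, length)
```

Both constructors produce the same (p, direction tuple, MinMaxR) shape, which is why one index family can serve sectors, vectors and, by analogous constructions, triangles and parallelograms.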
