
Mobile and Ubiquitous Multimedia

Picture Context Capturing for Mobile Databases

Stavros Christodoulakis and Michalis Foukarakis, Technical University of Crete
Lemonia Ragia, Advanced Systems Group
Hiroaki Uchiyama and Takuya Imai, Ricoh

A sensor-based camera system associates picture contents with the captured environment to enable semantic content retrieval, interaction, and visualization.

Mobile device manufacturers today are embedding sensors for GPS measurements, compass data, and time information in their cameras, mobile phones, and PDAs, opening up a wide range of opportunities for better management of, and interaction with, pictures in databases. In this article, we describe a software environment that uses the information from these sensors to provide rich picture-management functionality. The software environment offers several services, including semantic map personalization, spatial picture registration, and identification of semantic objects.

The system we describe takes advantage of position and direction sensors that associate picture contents with the captured environment. The semantic maps associate and visualize geometric representations of semantic objects (like medieval forts) with regions on a map. These maps can be personalized by representing concepts or items of interest. The semantic maps also include geographic semantic objects, such as mountains, oceans, and so on. The system identifies properties of the geographic semantic objects, such as color and shape, to find such objects in the picture, guided by knowledge of the picture's location and direction and by the spatial context provided by the semantic maps. The system can register the objects of the semantic maps on top of the picture, which allows us to develop advanced database functionalities for semantic content retrieval, interaction with the semantic objects in pictures, and visualization of the database contents on top of maps. The system also supports user event modeling and capturing, and automatically associates the events with the pictures using contextual data.

Approach and applications

In comparison to previous work (see the "Related Work" sidebar), our emphasis is on capturing and exploiting the contextual parameters at the time of picture taking and using camera-integrated sensors and algorithms for precise picture registration in the captured spatial context. Our objective is to exploit picture taking through deep geospatial semantics related to the picture content, to provide a complete value chain for offering rich functionality to end users.

We have integrated this functionality in the SPatial Image Management (SPIM) software. A particular application of this software environment is for tourists, who can choose to view semantic maps of the places they are going to visit and get detailed semantic information about objects of interest (such as parks, temples, villages, and so on). The system can process the pictures taken during the trip and provide a living memory of the trip. The semantic objects depicted in them can be shown on top of the pictures themselves, allowing interactive exploration of the picture contents and linking with other information sources. The combination of GPS and compass sensors with the camera and the semantic maps is also useful in many other applications, such as mobile learning, damage registration in disaster areas, and archaeological site or outdoor zoo touring.

The software environment enables the creation and use of a knowledge base containing objects that might be of interest to the user.

The knowledge base might include a number of domain ontologies, such as Greek archeological monuments, medieval churches, modern cultural buildings, and so on. The domain ontologies consist of hierarchies of semantic concepts and types with attributes; each semantic object (individual) belongs to one of these concepts.


Related Work

Much research in the past has focused on the automatic classification of pictures using low-level features. Scene classification approaches exploit domain semantics and global image features to give general descriptions of pictures and their content (streets, buildings, and so on), or to classify them as indoor or outdoor.1,2 Image metadata such as exposure time and aperture have also been used for classification.3 Our work focuses on the detailed annotation of the parts of the images that contain significant objects, not classification of the image as a whole. Significant parts of our work focus on landscape pictures. Important research has been done in the identification of parts of a picture, for example, a blue sky.4 We exploit GPS and compass information, as well as additional geographic context, to improve the existing algorithms for picture registration.

Several authors have discussed the use of ontologies as a means of image and video annotation.5-7 An advantage of these approaches is that they systematically manage the knowledge in a domain, including concept type hierarchies, concept properties, and individuals, unlike tags found in social networks. The plethora of images found in folksonomies and photo-sharing sites such as Flickr has been exploited to extract activity, event, and place semantics from user-tagged photos for annotating picture contents.8,9 The quality of the tags, however, is often questionable.

A problem with the ontology-based approaches is that they often rely on extensive manual user annotation during database insertion, a task that's unlikely to occur due to the time required. Capturing context at the time a picture is taken can provide the means for automatic semantic annotation and powerful semantic retrieval functionality.10,11

Another research project aimed to assist in organizing collections of georeferenced pictures, combining location and time parameters with minimal user annotation to derive some of the pictures' semantic content.12 In addition, the World Wide Media eXchange (WWMX) project provides another important option for organizing georeferenced images.13 Pictures are indexed by the WWMX database according to time and location. The WWMX browser visualizes them using a map interface and provides retrieval functionality. This work presents different approaches to acquiring location tags, browsing images, and visualizing them on a map. There are many important early applications in this area, notably in culture and tourism, that have used functionalities similar to what we describe in this article.14

References

1. A. Vailaya et al., "Image Classification for Content-Based Indexing," IEEE Trans. Image Processing, vol. 10, no. 1, 2001, pp. 117-129.
2. A. Yavlinsky, E. Schofield, and S. Rüger, "Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation," Image and Video Retrieval, W.K. Leow et al., eds., LNCS 3568, Springer, 2005, pp. 507-517.
3. M. Boutell and J. Luo, "Beyond Pixels: Exploiting Camera Metadata for Photo Classification," Pattern Recognition, vol. 38, no. 7, Elsevier, 2005, pp. 935-946.
4. A.C. Gallagher, J. Luo, and W. Hao, "Improved Blue Sky Detection Using Polynomial Model Fit," Proc. IEEE Int'l Conf. Image Processing, IEEE Press, 2004, pp. 2367-2370.
5. L. Hollink, "Adding Spatial Semantics to Image Annotations," Proc. 4th Int'l Workshop on Knowledge Markup and Semantic Annotation, 2004, pp. 31-40; http://www.few.vu.nl/~guus/papers/Hollink04c.pdf.
6. C. Tsinaraki and S. Christodoulakis, "An MPEG-7 Query Language and a User Preference Model that Allow Semantic Retrieval and Filtering of Multimedia Content," Multimedia Systems J., special issue on semantic multimedia adaptation and personalization, vol. 13, no. 2, 2007, pp. 131-153.
7. C. Tsinaraki, P. Polydoros, and S. Christodoulakis, "Interoperability Support between MPEG-7/21 and OWL in DS-MIRF," IEEE Trans. Knowledge and Data Engineering, special issue on the Semantic Web era, vol. 19, no. 2, 2007, pp. 219-232.
8. D. Joshi and J. Luo, "Inferring Generic Activities and Events from Image Content and Bags of Geo-Tags," Proc. Conf. Image and Video Retrieval, ACM Press, 2008, pp. 37-46.
9. T. Rattenbury, N. Good, and M. Naaman, "Towards Automatic Extraction of Event and Place Semantics from Flickr Tags," Proc. Ann. ACM Conf. Research and Development in Information Retrieval, ACM Press, 2007, pp. 103-110.
10. S. Christodoulakis et al., "Semantic Maps and Mobile Context Capturing for Picture Content Visualization and Management of Picture Databases," Proc. 7th Int'l Conf. Mobile and Ubiquitous Multimedia (MUM), ACM Press, 2008, pp. 130-136.
11. J. Li et al., "New Challenges in Multimedia Research for the Increasingly Connected and Fast Growing Digital Society," Proc. ACM Int'l Conf. Multimedia Information Retrieval (MIR), ACM Press, 2007, pp. 3-10.
12. M. Naaman, Leveraging Geo-Referenced Digital Photographs, doctoral dissertation, Stanford Univ., 2005.
13. K. Toyama, R. Logan, and A. Roseway, "Geographic Location Tags on Digital Images," Proc. 11th Int'l Conf. Multimedia, ACM Press, 2003, pp. 156-166.
14. S. Christodoulakis et al., "A Distributed Multimedia Tourism Information System," Proc. Int'l Conf. Information and Communication Technologies in Tourism (Enter), 1997, pp. 295-306; http://195.130.87.21:8080/dspace/bitstream/123456789/604/1/Minotaurus%20a%20distributed%20multimedia%20tourism%20information%20system.pdf.


A special case of the supported ontologies is a semantic geographic ontology that contains concepts such as lakes, oceans, mountains, islands, villages, and so on.

The software environment helps create semantic maps that associate polygon representations (also called footprints) with semantic individuals. The software associates each semantic individual with a set of GPS positions that describe its enclosing polygon on the land. The footprint can then be visualized on top of any calibrated map, making the knowledge base independent of map information and allowing reuse of the same semantic objects in different map environments.

Because the number of domain ontologies and semantic individuals contained in semantic maps can be large, users are provided with services to personalize the content of each map to suit their interests. They can specify conditions on the ontologies they want represented, the types from each ontology, as well as specific semantic individuals. The services then construct a personalized semantic map. Personalized semantic maps contain fewer objects, and only objects that are of interest to the user. This reduces the chance of information overload and improves the visualization of such maps on screens.
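
To make the personalization services concrete, here is a minimal sketch of such a filter. The data model (class and field names) and the or-semantics of the conditions are illustrative assumptions, not SPIM's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed data model: each semantic individual carries its ontology, its
# concept type, and a footprint given as GPS positions (polygon vertices).
@dataclass
class SemanticIndividual:
    name: str                               # e.g., "Frangokastello fort"
    ontology: str                           # e.g., "medieval monuments"
    concept_type: str                       # e.g., "fort"
    footprint: List[Tuple[float, float]]    # (lat, lon) vertices

def personalize(individuals, ontologies=None, types=None, names=None):
    """Keep individuals matching any requested ontology, type, or name."""
    selected = []
    for ind in individuals:
        if ((ontologies and ind.ontology in ontologies) or
                (types and ind.concept_type in types) or
                (names and ind.name in names)):
            selected.append(ind)
    return selected
```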

Managing pictures and their semantic content

This section describes the integration of the camera with the GPS and compass sensors, the capturing of the Exif1 metadata, and the use of the Exif metadata for associating the picture with the spatial context it captures. We used a Ricoh Caplio 500 SE digital camera that communicates over a Bluetooth interface with a GPS receiver that has an integrated digital compass. Recent camera models already integrate these sensors. The additional position and direction parameters captured by the sensors are automatically stored in the Exif header of the produced image, along with image-capturing parameters (such as focal length and aperture) and other metadata.

We use the information captured in the Exif data at the time of picture taking to calculate contextual parameters that let us associate the digital picture's segments with the semantic spatial objects that the picture captures. Our objective is to be able to describe as accurately as possible the spatial content of the digital image. To do that, we use standard camera parameters such as the sensor size, picture-taking parameters such as the focal length, GPS parameters such as location and altitude, and compass parameters such as the angle with respect to magnetic north. These parameters allow us to calculate the location and direction of the picture with respect to geographic north, as well as the camera's angle of view.
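
As a concrete illustration, the sketch below derives two of these parameters. The sensor-width constant is an assumption (it is camera-model-specific), and the Exif values are taken as already parsed; this is not the system's code.

```python
import math

# Assumed sensor width in millimeters; depends on the camera model.
SENSOR_WIDTH_MM = 7.18

def angle_of_view(focal_length_mm, sensor_width_mm=SENSOR_WIDTH_MM):
    """Horizontal angle of view in degrees from focal length and sensor size."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

def true_heading(magnetic_heading_deg, declination_deg):
    """Convert a compass (magnetic) heading to a geographic (true) heading.

    declination_deg is the local magnetic declination, positive east.
    """
    return (magnetic_heading_deg + declination_deg) % 360.0

# Example: 7.3 mm focal length; compass reads 112 degrees, declination +4.1.
print(angle_of_view(7.3))        # roughly 52 degrees
print(true_heading(112.0, 4.1))  # 116.1
```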

Taking into account the camera location and direction, and associating them with the spatial information and the semantic geographical individuals contained in semantic maps, we can automatically predict the geographic objects that appear in the direction of the picture. Associating the contextual metadata about a picture with the semantic maps lets us similarly predict the semantic objects described by the semantic map ontologies that are within the picture. When more than one semantic object is predicted to be within a picture, the objects are ranked according to their distance and their relative location within the picture's angle of view and focusing area, if available. The association of a picture with the semantic objects can be used for metadata generation related to the picture's contents, or for understanding and visualizing the content as a way to more effectively understand the real world.
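
A minimal version of this prediction step could look as follows. The great-circle formulas are standard; the function names and the centroid simplification (testing one representative point per footprint rather than the full polygon) are illustrative assumptions.

```python
import math

EARTH_R = 6371000.0  # mean Earth radius in meters

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_R * math.asin(math.sqrt(a))

def predict_objects(cam_lat, cam_lon, heading, aov, objects):
    """objects: iterable of (name, lat, lon) footprint centroids."""
    hits = []
    for name, lat, lon in objects:
        # Signed angular offset from the optical axis, in (-180, 180].
        off = (bearing_deg(cam_lat, cam_lon, lat, lon) - heading + 180) % 360 - 180
        if abs(off) <= aov / 2:  # inside the view cone
            hits.append((name, distance_m(cam_lat, cam_lon, lat, lon), off))
    # Rank: nearer objects and objects closer to the optical axis first.
    return sorted(hits, key=lambda h: (h[1], abs(h[2])))
```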

For the purpose of picture annotation, we calculate the 2D model of the picture contents by taking into account the land formations and semantic objects in the direction of the picture (see Figure 1). There is no exact correspondence between the picture contents and the 2D model of the picture contents (or between the visible horizon in the picture and the visible horizon in the 2D representation) because of the additional degrees of freedom of the camera (tilt and rotation). This might not be crucial for the purpose of picture annotation (some false drops might result if the camera's tilt results in the cropping of some of the predicted objects). However, exact correspondence becomes important when the precise location of semantic objects within the pictures is used in applications that allow user interaction with the semantic objects as a kind of virtual window to the world. This functionality requires more precise registration of the picture with the 2D representation of the spatial contents, so that when the user points the mouse cursor at a specific picture location, the system can infer which spatial real-world objects are at that position. This kind of accuracy can't be obtained directly from just the GPS and compass data.

Figure 1. Construction of the 2D representation of the spatial view. (a) A picture showing mountains, land, and ocean. (b) Part of the semantic map that contains the polygon-shaped semantic objects, the picture's direction and angle of view, and the visible land formations along that direction (dark areas inside the cone). A rectangular semantic object on the right side of the angle of view isn't visible to the user due to the hill on the right; hence it doesn't appear in the 2D representation. (c) The 2D representation of the image containing visible land formations and semantic objects present on the semantic map, calculated from the camera's position and direction parameters.

Spatial context registration

To obtain the additional accuracy needed for user interaction and visualization with pictures, we developed algorithms that match picture contents with the 2D view of the spatial environment, obtained from the camera location and the picture direction, both recorded by GPS and compass in the Exif data. To calculate this 2D view, which includes land formations and the semantic objects, an algorithm traces the rays that start from the camera and move along the camera direction within the angle of view until they reach geographic formations that stop them (forming a picture cone).2 The algorithm can detect discontinuities that come from ground formations (for example, a hill followed by a valley followed by a mountain creates a discontinuity in the visible boundaries of the hill). The semantic objects themselves (including geographic objects such as islands) or the visible horizon might create other discontinuities.
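
The following sketch illustrates the ray-tracing idea under simplifying assumptions (a flat-earth approximation over short ranges, and an elevation(x, y) terrain function supplied by the caller); it is not the system's implementation.

```python
import math

def trace_ray(elevation, cam_x, cam_y, cam_h, heading_deg,
              step=30.0, max_range=20000.0):
    """Walk one ray from the camera and return visible (distance, elevation)
    samples. A sample is visible when its elevation angle exceeds the maximum
    angle seen so far; each new maximum marks a visible boundary, a potential
    discontinuity in the 2D view."""
    dx = math.sin(math.radians(heading_deg))   # east component
    dy = math.cos(math.radians(heading_deg))   # north component
    cam_z = elevation(cam_x, cam_y) + cam_h    # camera height above ground
    visible = []
    max_angle = -math.inf
    d = step
    while d <= max_range:
        x, y = cam_x + d * dx, cam_y + d * dy
        z = elevation(x, y)
        angle = math.atan2(z - cam_z, d)       # elevation angle of the sample
        if angle > max_angle:                  # not occluded by nearer terrain
            visible.append((d, z))
            max_angle = angle
        d += step
    return visible
```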

To match the objects of the picture with the objects in the 2D view, we segment the picture using a modified statistical region-merging method3 to obtain important segment boundaries and other characteristics, such as mountain peaks, that can be matched with the corresponding 2D shapes. Because the semantic maps include geographic objects and their footprints, we can exploit characteristics of any type of visible geographic object (such as sky and ocean color) to find the location of those objects within the picture. For the current system, we have concentrated on extracting the boundaries of the skyline, high mountains, and the ocean area, but we plan to investigate additional possibilities in the future. The boundaries of those objects are calculated from both the segmented picture and the 2D representation, to be used later by the registration algorithm.
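
As an illustration of the boundary-extraction step, the sketch below recovers a skyline curve from a boolean sky mask that a segmentation stage is assumed to have produced; it stands in for, and is much simpler than, the modified statistical region-merging method the system actually uses.

```python
import numpy as np

def skyline_curve(sky_mask):
    """sky_mask: 2D bool array (rows x cols), True where a pixel was labeled
    sky. Returns one skyline row index per column (the topmost non-sky pixel),
    or the image height if the whole column is sky."""
    rows, cols = sky_mask.shape
    curve = np.full(cols, rows, dtype=int)
    for c in range(cols):
        ground = np.nonzero(~sky_mask[:, c])[0]   # row indices of non-sky pixels
        if ground.size:
            curve[c] = ground[0]                  # first non-sky row from the top
    return curve
```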

Because the picture objects might not match the 2D objects (due to errors in direction, tilt, rotation, and so on), we want to transform the 2D representation so that it can be superimposed correctly on top of the picture. We use an error metric to quantify the quality of the matching. The basic algorithm for matching is a variation of line-matching algorithms.4 A successful match enables the interactive exploration of the contents of a picture in real time, and the association of the picture locations and cones of view with semantic maps for visualization and browsing of the database contents.
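
A deliberately reduced version of the matching step is sketched below: it searches only a horizontal shift (a direction error) and a vertical shift (tilt) between the picture skyline and the model skyline, scoring each candidate with a mean-absolute-distance error metric. The real algorithm is a line-matching variant, so treat this as an analogy rather than the method itself.

```python
import numpy as np

def register(picture_skyline, model_skyline, max_dx=80, max_dy=40):
    """Both inputs: 1D arrays of skyline heights, one value per column.
    Returns (error, horizontal shift, vertical shift) of the best match."""
    best = (np.inf, 0, 0)
    for dx in range(-max_dx, max_dx + 1):
        # Shift the model skyline horizontally (wrap-around at the edges is
        # ignored here for brevity).
        shifted = np.roll(model_skyline, dx)
        # Median residual gives a robust vertical offset for this dx.
        dy = int(np.median(picture_skyline - shifted))
        dy = max(-max_dy, min(max_dy, dy))
        err = np.mean(np.abs(picture_skyline - (shifted + dy)))
        if err < best[0]:
            best = (err, dx, dy)
    return best
```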

Experimentation

We have performed experiments to understand the sensitivity of the algorithms we use with respect to errors in the model parameters and to the lack of sensor tilt and rotation. In addition, we wanted to determine the relative value of the types of semantic geographic objects in the precise registration of the pictures.

Because we had observed that errors in the GPS determination have little impact on accuracy, we experimented with errors produced by inaccuracies in the direction determination. We examined and compared the results produced when the direction captured by the camera system deviated from the true direction. Although we have developed an error metric to evaluate the quality of matching, the error metric sometimes produces results that don't match those expected by a human. Thus, we used human evaluations of the matching quality. A user visually categorized the results of registering the picture to the 2D representation of the spatial view in the actual direction of the picture. We categorized the results using a scale from 1 to 7 and considered results satisfactory when they achieved a grade higher than 3.

Table 1 shows the results of the experimentation. With no error in the captured direction of the picture, the algorithms achieve satisfactory results in 91.3 percent of the cases. For errors in the direction of about 4 degrees, the performance of the algorithms remains solid, with 81.2 percent satisfactory results. When the error in the compass measurements is 7 degrees or more, the algorithms often don't have enough information from the 2D picture representation to produce an accurate match, resulting in a lower percentage and quality of successful matches. Although the sample is relatively small, it demonstrates that the use of a compass and semantic maps greatly improves the picture registration results, that the picture registration quality achieved is good, and that deviations in the direction determination might result in significant deterioration of picture registration quality.

Table 1. Results from the experimentation on 69 pictures with three error categories in the compass measurements: no error, 4-degree deviation, and 7-degree deviation.

Distinction        True heading   4-degree deviation   7-degree deviation
Perfect (7)              7               10                    3
Good (6)                26               16                   13
Acceptable (5)          19               18                   12
Average (4)             11               12                   14
Bad (3)                  3                7                   11
Awful (2)                1                4                   10
Failed (1)               2                2                    6
Number passed           63               56                   42
Number failed            6               13                   27
Pass percentage       91.3%            81.2%                60.9%
Fail percentage        8.7%            18.8%                39.1%

In the experiments, we observed that the boundaries of the blue sky and the mountains are useful for accurate picture registration. We expected this because the location of experimentation (Crete) had clear blue skies and mountains. However, it's conceivable that other geographic features could also be useful in enhancing the performance results, or in achieving results where a clear separation between mountains and blue sky doesn't exist. We also tested the ocean as a geographic object and examined its capability to improve the results of registration achieved by the blue-sky separation. The results showed that the combined use of the two geographic features for picture registration was better than the sole use of blue-sky separation in about 30 percent of the cases. Our experiments indicate that this area of research is promising; we intend to pursue it further in the future.

Time, user location, and event metadata

Events are meaningful ways of modeling the content of pictures. For example, the MPEG-7 Semantic Model is based on event modeling.5,6 In SPIM, we use a part of MPEG-7 for event modeling and capturing. We characterize events by name, location, and time, and might use actors that participate in the events. Events are also organized in semantic event hierarchies. For example, a wedding event might be composed of smaller events like the ceremony, the wedding dinner, and so on. A summer vacation in Crete in 2009 could be an event that is subdivided into smaller events of visiting various places within Crete. Summer vacations in Crete in 2009 are of the same type as other summer vacations.

Our system allows specification and browsing of event hierarchies in a simple manner. Additional retrieval and browsing functionalities might allow users to specify events at various levels of the hierarchy. In the current system, the elementary event associated with a picture is automatically determined by the time the picture was taken. The time the picture was taken uniquely determines a leaf in the event hierarchy, which the system uses to associate the picture with all the event-instance-related information. A sophisticated retrieval system would be able to exploit the event hierarchies or the event instance data.
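
The leaf-selection step can be illustrated with a small sketch; the Event class and the interval semantics below are assumptions for illustration, not SPIM's MPEG-7-based model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Event:
    name: str
    start: datetime
    end: datetime
    children: List["Event"] = field(default_factory=list)

def leaf_event_for(event: Event, taken: datetime) -> Optional[Event]:
    """Return the deepest event whose time interval contains the timestamp."""
    if not (event.start <= taken < event.end):
        return None
    for child in event.children:
        hit = leaf_event_for(child, taken)
        if hit is not None:
            return hit
    return event  # no child matched: this node is the elementary event

# Example: a vacation event with two sub-events.
trip = Event("Crete vacation 2009",
             datetime(2009, 7, 1), datetime(2009, 7, 15),
             [Event("Chania visit", datetime(2009, 7, 2), datetime(2009, 7, 5)),
              Event("Samaria hike", datetime(2009, 7, 6), datetime(2009, 7, 7))])
print(leaf_event_for(trip, datetime(2009, 7, 3, 11, 0)).name)  # Chania visit
```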

We explicitly model and associate with the events the user's location at the time of picture taking (as opposed to the location of the objects that appear within the picture). The user location is captured by the GPS parameters of the Exif file and is automatically converted to the location name using the organization of information in the semantic maps (geographic hierarchies). In addition, the automatic assignment of location names can be exploited in the retrieval interfaces.
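
One plausible implementation of this conversion is a point-in-polygon lookup against the footprints stored in the semantic maps, sketched below with a standard ray-casting test; the data layout is an assumption, and the vertices are treated as planar coordinates, which is adequate for small footprints.

```python
def point_in_polygon(x, y, polygon):
    """polygon: list of (x, y) vertices. Standard ray-casting test."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Toggle on each polygon edge that a horizontal ray from (x, y) crosses.
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def place_name(lat, lon, places):
    """places: iterable of (name, footprint) with (lat, lon) vertex pairs.
    Returns the name of the first footprint containing the GPS position."""
    for name, footprint in places:
        if point_in_polygon(lat, lon, footprint):
            return name
    return None
```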

SPIM software environment and services

The SPIM software offers client-server services to create personalized maps. The server includes a map database and a database containing domain ontologies and individuals, as well as services that can create personalized maps according to the user's interests (domain ontologies, types of concepts, specific individuals). The picture management software acts as a client program to the semantic map server; it accesses and stores the delivered personalized semantic maps. The software manages the pictures for the user and includes the services for picture capturing, registration, storage, indexing, and object annotation. The software also provides the retrieval functionality, as well as the user interfaces for visualization and object interaction.

Figure 2 shows an example of SPIM functionality. The user can specify ontologies, types from each ontology, or even individuals, and the system will decide which individuals satisfy the constraints. SPIM then emphasizes the locations of the semantic objects that satisfy the constraints and lists the pictures associated with those semantic objects. The user can ask to see a picture's footprint (the geometric representation of its location and cone of view) on top of the map, or select a semantic object from the map and ask to see all the picture footprints from a particular database. The user can see the pictures themselves by selecting a picture footprint or by clicking on a picture thumbnail, and can list the semantic individuals that appear in the picture (villages, churches, and so on).

Figure 2. An example of the SPIM user interface. Semantic objects are selected and shown as polygons on a map, and the footprints of selected semantic objects are displayed. The user can select footprints and see the corresponding pictures and their information. Locations of pictures are shown as small circles. The user is able to see what semantic entities are on top of the map and view information about them. In the figure, a semantic individual describing a mountainous area has been selected.

Figure 3 shows the user interface that allows interactive exploration of the spatial information associated with a picture. The user can point the mouse at a location in the picture. If the mouse is above certain semantic objects, they are highlighted, and the name of the semantic object and relevant information are displayed when the user clicks on them. The user can choose to hide the semantic individuals and their boundaries to view the original picture.

Figure 3. Interactive exploration of image contents. The user is able to select semantic objects depicted in the picture and obtain relevant information about them.

Conclusions

The research in this article has emphasized the importance of detailed registration of remote scenes on pictures, so that the user can point to rather small objects visible in the picture and interact with them. A wide range of visualization and interactive functionality services for personalized information management systems can be supported using this functionality. Because the accuracy of capturing the location of remote objects is critical for such interactions, we are currently performing more research on integrating additional, readily available contextual information (such as the time of day and year with respect to the current location, the camera azimuth, and so on) into the algorithms that perform picture registration. We are also exploring alternative functionalities for personalized information management systems that are enabled by the detailed picture registration to the 3D scenes. MM

References

1. Exif Version 2.2: Digital Still Camera Image File Format Standard, Japan Electronics and Information Technology Industries Assoc., 2002; http://www.exif.org/Exif2-2.pdf.
2. R. Franklin and C.K. Ray, "Higher Isn't Necessarily Better: Visibility Algorithms and Experiments," Advances in GIS Research: Proc. 6th Int'l Symp. Spatial Data Handling, T.C. Waugh and R.G. Healey, eds., Taylor & Francis, 1994, pp. 751-770.
3. R. Nock and F. Nielsen, "Statistical Region Merging," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 11, 2004, pp. 1452-1458.
4. J.R. Beveridge and E.M. Riseman, "How Easy Is Matching 2D Line Models Using Local Search?" IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 6, 1997, pp. 564-579.
5. C. Tsinaraki and S. Christodoulakis, "An MPEG-7 Query Language and a User Preference Model that Allow Semantic Retrieval and Filtering of Multimedia Content," Multimedia Systems J., special issue on semantic multimedia adaptation and personalization, vol. 13, no. 2, 2007, pp. 131-153.
6. C. Tsinaraki, P. Polydoros, and S. Christodoulakis, "Interoperability Support between MPEG-7/21 and OWL in DS-MIRF," IEEE Trans. Knowledge and Data Engineering, special issue on the Semantic Web era, vol. 19, no. 2, 2007, pp. 219-232.

Stavros Christodoulakis is a professor and director of the MUSIC/TUC laboratory at the Department of Electronic and Computer Engineering, Technical University of Crete. His research interests include information systems, multimedia, semantics, and interoperability. Christodoulakis has a PhD in computer science from the University of Toronto, Canada. Contact him at [email protected].

Michalis Foukarakis is a graduate student in electronic and computer engineering at the Technical University of Crete, where he works as a research assistant at the Laboratory of Distributed Multimedia Information Systems and Applications. His research interests include semantic spatial image management and ontologies. Foukarakis has an MS in electronic and computer engineering from the Technical University of Crete. Contact him at [email protected].

Lemonia Ragia is a research assistant at the University of Geneva. Her research interests include data mining, spatial databases and data, model management and schema matching, and high-performance visualization of spatial information. Ragia has a PhD in photogrammetry from the Institute of Photogrammetry, Bonn, Germany. Contact her at lemonia.[email protected].

Hiroaki Uchiyama is a software engineer at Ricoh. His research interests include Bluetooth technology and developing business-oriented digital cameras using Bluetooth, WiFi, GPS, and barcode functionalities. Uchiyama has an MS in electrical and electronics engineering from Sophia University, Tokyo. Contact him at [email protected].

Takuya Imai is a software engineer at Ricoh. His research interests include implementing Bluetooth technology for digital cameras and innovative application research for Bluetooth-equipped devices. Imai has a BS in mechanical engineering from Meiji University, Tokyo. Contact him at [email protected].
