Online Image Metadata Collection for Geographical Analysis Vlad Coman, Stuart Dunn, Austin Taylor Instructor: Dr. Serpen Adviser: Dr. Thomas


Contents
Background Information
System Overview
Platform Requirements
Project Components
    Web Crawler
    EXIF Extractor
    Database
    Front-End Application
Conclusion


Project Goal & Background

EXIF Specification:

Developed by the Japan Electronic Industries Development Association (JEIDA); version 2.1 of the specification dates from 1998

Metadata is embedded in the image file itself (JPEG and TIFF)
Supported by virtually all modern cameras and photo software

Embedded GPS latitude and longitude tags associate each image with the geographical location where it was taken

Correlating camera make/model with GPS data enables market research for commercial applications

For personal use, analyzing a private image collection lets photographers find their most frequently used camera settings
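Because the metadata is stored inside the file, an extractor first has to find it. In a JPEG, the Exif block sits in an APP1 marker segment whose payload begins with the bytes "Exif" and two zeros, followed by a TIFF-structured directory of tags. The Java 8 sketch below walks the marker segments to locate that block; the class and method names are hypothetical, it handles JPEG only (not TIFF files), and the slides do not say whether the project parsed files this way or relied on an existing metadata library.

```java
/**
 * Sketch: locate the Exif block inside a JPEG byte array by walking its marker segments.
 * Hypothetical helper for illustration; JPEG only, not TIFF files.
 */
final class ExifLocator {

    // APP1 payloads holding Exif data start with "Exif" followed by two zero bytes.
    private static final byte[] EXIF_ID = {0x45, 0x78, 0x69, 0x66, 0x00, 0x00};

    /** Returns the offset of the TIFF header that follows "Exif\0\0", or -1 if none is found. */
    static int findExifTiffHeader(byte[] jpeg) {
        // A JPEG file starts with the SOI marker 0xFF 0xD8.
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return -1;
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                return -1; // expected a marker; the stream is malformed or unsupported
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) {
                return -1; // start of scan or end of image: no metadata segments remain
            }
            // The big-endian segment length counts its own two bytes but not the marker.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2) {
                return -1; // corrupt length; stop instead of looping forever
            }
            int payload = pos + 4;
            if (marker == 0xE1 && startsWith(jpeg, payload, EXIF_ID)) {
                return payload + EXIF_ID.length; // "II" or "MM" byte-order mark starts here
            }
            pos += 2 + length; // skip the marker (2 bytes) plus the segment body
        }
        return -1;
    }

    private static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset + prefix.length > data.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The returned offset is where tag parsing would begin; the two bytes there ("II" or "MM") give the byte order for every value that follows.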


System Overview


Platform Requirements

Software Specifications:
Java Runtime Environment 5.0 or higher
Linux, or Windows with Cygwin

Hardware Requirements:
Minimum 500 MHz CPU and 128 MB of RAM for World Wind
Minimum 2 GB of hard drive space for the World Wind cache
Nutch hard disk requirements scale with crawl time


Nutch 1.2
Part of the Apache Software Foundation
Requires a Tomcat installation, as well as a Cygwin environment on Windows
Preferred over other open-source crawling options
Configuration uses regular expressions to limit the URLs accepted by the fetcher
Command-line arguments:
    -threads: number of concurrent fetcher threads
    -depth: maximum crawl depth from the seed URLs
Performance on a desktop computer:
    2 images per second on average
    47 MB per hour on a 12 Mbit/s connection
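Nutch's regex URL filters are configured with rule lines of the form "+regex" (accept) or "-regex" (reject), as in conf/regex-urlfilter.txt. The Java 8 sketch below re-implements that idea for illustration, assuming the semantics described in Nutch's default filter file: the first rule whose pattern is found in the URL decides, and a URL matching no rule is rejected. It is not Nutch's own filter class, and the class name and example rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Sketch of "+regex" / "-regex" URL rules in the style of Nutch's regex URL filter file.
 * Illustrative re-implementation only, not Nutch's own class.
 */
final class UrlRuleFilter {

    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    UrlRuleFilter(List<String> lines) {
        for (String line : lines) {
            String rule = line.trim();
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue; // blank lines and comments are ignored
            }
            char sign = rule.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(rule.substring(1))));
        }
    }

    /** Returns true if the fetcher should accept this URL. */
    boolean accept(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept; // the first matching rule wins
            }
        }
        return false; // no rule matched: reject
    }
}
```

Rule order matters: a rejecting rule for style sheets and scripts placed before an accepting rule such as +(?i)\.jpe?g$ keeps the fetcher focused on image files.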


Image Aggregator
Extracts images from Nutch segments
Rebuilds each image from its raw bytes in the content segments and writes it to a separate directory
Imports the Hadoop and Nutch libraries
Images are numbered incrementally and fed to the EXIF Extractor's input directory
Performance: 150 images per second
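The aggregator's final step, writing each recovered image into the EXIF Extractor's input directory under an incrementing number, might look like the Java 8 sketch below. Reading records out of Nutch segments with the Hadoop and Nutch libraries is not shown; FetchedContent is a hypothetical stand-in for one fetched record, and keeping only the "image/jpeg" and "image/tiff" content types is an assumption based on EXIF being a JPEG/TIFF format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Sketch: write raw image bytes into an input directory as 1.jpg, 2.jpg, ... */
final class ImageWriter {

    /** Hypothetical stand-in for one record read from a Nutch content segment. */
    static final class FetchedContent {
        final String contentType;
        final byte[] bytes;

        FetchedContent(String contentType, byte[] bytes) {
            this.contentType = contentType;
            this.bytes = bytes;
        }
    }

    /** Writes JPEG/TIFF records starting at nextIndex and returns the next unused index. */
    static int writeImages(List<FetchedContent> records, Path outputDir, int nextIndex) {
        try {
            Files.createDirectories(outputDir);
            for (FetchedContent record : records) {
                String extension = extensionFor(record.contentType);
                if (extension == null) {
                    continue; // HTML pages, scripts and other formats are skipped
                }
                Files.write(outputDir.resolve(nextIndex + extension), record.bytes);
                nextIndex++;
            }
            return nextIndex;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String extensionFor(String contentType) {
        if ("image/jpeg".equalsIgnoreCase(contentType)) {
            return ".jpg";
        }
        if ("image/tiff".equalsIgnoreCase(contentType)) {
            return ".tif";
        }
        return null;
    }
}
```

Returning the next unused index lets successive segments continue the numbering instead of overwriting earlier files.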


Web Crawler & Image Aggregator


EXIF Extractor
Accepts a set of input directories containing images
Monitors these directories for new files
Uses a multi-threaded design to process all images efficiently and extract their embedded metadata
Scales to as much parallelism as the multi-core hardware can afford
Collected EXIF data is grouped into batches for database insertion
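A minimal Java 8 sketch of the extractor's threading and batching, assuming one worker thread per available core and a caller-chosen batch size. The class name is hypothetical, and the extract and insertBatch callbacks are placeholders for the EXIF parsing and the database insert. Watching the input directories for new files could be layered on top with java.nio.file.WatchService (Java 7 and later) and is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Sketch: extract metadata in parallel and hand results to the database in batches. */
final class ParallelExtractor {

    static <I, R> void process(List<I> inputs, Function<I, R> extract, int batchSize,
                               Consumer<List<R>> insertBatch)
            throws InterruptedException, ExecutionException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // One worker per available core, so parallelism scales with the hardware.
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (I input : inputs) {
                Callable<R> task = () -> extract.apply(input);
                pending.add(pool.submit(task));
            }
            List<R> batch = new ArrayList<>(batchSize);
            for (Future<R> future : pending) {
                batch.add(future.get()); // results are collected in input order
                if (batch.size() == batchSize) {
                    insertBatch.accept(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                insertBatch.accept(batch); // flush the final, partially filled batch
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real pipeline, insertBatch would typically wrap a JDBC PreparedStatement, calling addBatch() for each row and executeBatch() once per batch, so the per-statement overhead is paid once per batch rather than once per image.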


EXIF Extractor


Database
Uses the HSQLDB Java engine
Supports an embedded mode in which the engine runs in the application's memory space
No inter-process or network communication overhead between the application and the database engine
Increases the overall memory footprint, but not excessively
The engine has been in development since 2001 and is used as a database and persistence engine in many open-source software projects[2]
It performs on par with or better than many of its Java-based competitors[7]


Database Fields

Field Name            Type       Example
Hash                  String     “008b004d62c68fb64”
File Name             String     “IMAG0074.jpg”
Longitude (degrees)   Decimal    101.567474
Latitude (degrees)    Decimal    -56.948134
Map Datum             String     “WGS84”
Image Height (px)     Integer    600
Image Width (px)      Integer    800
Make                  String     “HTC”
Model                 String     “PC36100”
Date & Time           Timestamp  2002:08:24 13:59:08
Metering Mode         String     “Multi-segment”
ISO Speed Ratings     Integer    125
Shutter Speed (s)     Double     0.0015625
F-Number              Double     2.8
Aperture Value        Double     2.8
Max Aperture Value    Double     2.8
Focal Length (mm)     Double     24.9
Brightness Value      Double     6.12
Exposure Bias Value   Double     0.0
Exposure Time (s)     Double     0.0015625
Flash                 String     “Flash fired”
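EXIF does not store decimal Longitude and Latitude values like those in the Database Fields table directly: the GPS tags hold degrees, minutes and seconds as three RATIONAL values, plus a reference letter (N or S in GPSLatitudeRef, E or W in GPSLongitudeRef). The conversion is decimal = degrees + minutes/60 + seconds/3600, negated for S and W. A minimal Java sketch of that conversion; the class and method names are hypothetical.

```java
/** Sketch: convert EXIF GPS degrees/minutes/seconds into signed decimal degrees. */
final class GpsCoordinates {

    /** EXIF stores each GPS component as a RATIONAL (numerator / denominator). */
    static double rational(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    /**
     * Converts degrees, minutes and seconds (already divided out from their RATIONALs)
     * plus the "N", "S", "E" or "W" reference into decimal degrees.
     */
    static double toDecimal(double degrees, double minutes, double seconds, String ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        if ("S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref)) {
            value = -value; // southern latitudes and western longitudes are negative
        }
        return value;
    }
}
```

For example, 56° 56' 53.2824" S converts to -56.948134 and 101° 34' 2.9064" E converts to 101.567474, the example values in the table.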


Front-end Application
Uses both Swing and AWT to generate the graphical user interface
Filters database queries based on user preferences:
    Date and time constraints
    Make and model
    Camera attributes (aperture, flash, shutter speed, etc.)
Displays GPS coordinates, when available, on a virtual globe powered by NASA World Wind
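Filtering by user preferences amounts to adding one SQL condition for each preference the user actually set. A Java 8 sketch that builds a parameterized query from optional filters; the class name, the table name images and all column names are hypothetical, since the slides do not give the actual schema names.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch: turn the user's optional filters into a parameterized SQL query. */
final class ImageQueryBuilder {
    private final List<String> conditions = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    ImageQueryBuilder make(String make) { return where("make = ?", make); }
    ImageQueryBuilder model(String model) { return where("model = ?", model); }
    ImageQueryBuilder takenAfter(Timestamp from) { return where("date_time >= ?", from); }
    ImageQueryBuilder takenBefore(Timestamp to) { return where("date_time <= ?", to); }
    ImageQueryBuilder flash(String flash) { return where("flash = ?", flash); }

    /** Restricts results to images that can be placed on the globe. */
    ImageQueryBuilder withGpsOnly() {
        conditions.add("latitude IS NOT NULL AND longitude IS NOT NULL");
        return this;
    }

    // A null value means "no preference", so the condition is left out entirely.
    private ImageQueryBuilder where(String condition, Object value) {
        if (value != null) {
            conditions.add(condition);
            parameters.add(value);
        }
        return this;
    }

    String sql() {
        String base = "SELECT * FROM images";
        return conditions.isEmpty() ? base : base + " WHERE " + String.join(" AND ", conditions);
    }

    /** Values to bind in order, e.g. with PreparedStatement.setObject(i + 1, value). */
    List<Object> parameters() {
        return Collections.unmodifiableList(parameters);
    }
}
```

Binding values through PreparedStatement placeholders, rather than concatenating them into the SQL string, keeps user input from altering the query itself.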


Front-end Application Data Flow


NASA World Wind
Open-source 3D interactive world viewer
SDK allows World Wind to be embedded in an application
Updates to the latest terrain data available from NASA
Compared to Google Earth, World Wind is free and open source[5]
    Google Earth licensing issues: costs from $20 to $400 annually to use
    Better terrain images


Front-end Application


Ethical & Societal Issues
The issue of privacy and intellectual property
    Stripping the artistic value of images[6]
    The extractor deletes all processed images and disconnects the image data from the original EXIF data
    It is the responsibility of the person or company running the crawler and extraction applications to take ethical considerations into account
Geotagging: adding positional data to images
    Linking images to GPS locations can be considered an ethical issue, but it is not applicable in the scope of this project
    No personal or identifiable information is being kept
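Deleting images after processing while keeping a Hash field suggests storing only a digest of each image. The slides do not say how the hash is computed; the Java 8 sketch below assumes SHA-256 over the file's bytes (the example value in the Database Fields table is shorter than a SHA-256 digest, so the original scheme likely differs), and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Sketch: keep only a digest of a processed image, then delete the image itself. */
final class ImageDisposal {

    /** Hashes the file's bytes (assumed SHA-256), deletes the file, and returns the hex digest. */
    static String hashAndDelete(Path image) {
        try {
            byte[] bytes = Files.readAllBytes(image);
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
            Files.delete(image); // only the metadata row, keyed by this digest, is kept
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

A digest still lets the system recognize the same image seen twice across crawls without retaining the image itself.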


Conclusion
Designed a system for collecting, processing, and representing a large set of EXIF metadata
Can be used with extensive web-collected data sets as well as with personal image collections
The flexible nature of the system allows for multiple possible usage scenarios:
    Representing trends in the location and time at which images are taken
    Pavement quality scenario


GPS Coordinate Heatmap

source: http://www.openheatmap.com/
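A GPS coordinate heatmap aggregates points into cells before coloring them. A minimal Java 8 sketch of that aggregation step, assuming a fixed grid measured in degrees; the class name, the cell size and the "row,col" key format are illustrative choices, not how OpenHeatMap works internally.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: count GPS points per cell of a regular latitude/longitude grid. */
final class GpsGrid {

    /**
     * Each point is {latitude, longitude}. Keys are "row,col" cell indices, where
     * row = floor(latitude / cellSize) and col = floor(longitude / cellSize).
     */
    static Map<String, Integer> bin(double[][] latLon, double cellSize) {
        if (cellSize <= 0) {
            throw new IllegalArgumentException("cellSize must be positive");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (double[] point : latLon) {
            long row = (long) Math.floor(point[0] / cellSize); // latitude band
            long col = (long) Math.floor(point[1] / cellSize); // longitude band
            counts.merge(row + "," + col, 1, Integer::sum);
        }
        return counts;
    }
}
```

Using floor rather than a cast keeps negative coordinates (southern and western hemispheres) in the correct cell.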


Q & A


Bibliography

[1] Wikipedia. (2011, Apr. 19) Exchangeable Image File Format. [Online] Available: http://en.wikipedia.org/wiki/Exchangeable_image_file_format

[2] HyperSQL. (2011, Apr. 25) HSQLDB - 100% Java Database. [Online] Available: http://hsqldb.org/

[3] Nutch. (2011, Mar. 27) Nutch Frontpage. [Online] Available: http://wiki.apache.org/nutch/

[4] JEIDA. (2002, Apr.) Digital Still Camera Image File Format Standard. [Online] Available: http://www.exif.org/Exif2-2.PDF

[5] World Wind Central. (2011, Apr. 25) Google Earth Comparison. [Online] Available: http://worldwindcentral.com/wiki/Google_Earth_comparison

[6] U.S. Copyright Office. (2006, July 12) Copyright in General (FAQ). [Online] Available: http://www.copyright.gov/help/faq/faq-general.html

[7] JPA Performance Benchmark. (2011, Apr. 24) HSQLDB Performance Summary. [Online] Available: http://www.jpab.org/HSQLDB.html