Online Image Metadata Collection for Geographical Analysis
Vlad Coman, Stuart Dunn, Austin Taylor
Instructor: Dr. Serpen
Adviser: Dr. Thomas
Contents
  Background Information
  System Overview
  Platform Requirements
  Project Components
    Web Crawler
    EXIF Extractor
    Database
    Front-End Application
  Conclusion
Project Goal & Background
EXIF Specification:
  Defined by the Japan Electronic Industries Development Association (JEIDA); version 2.1 of the specification dates from 1998
  Metadata is embedded in the image file itself (TIFF & JPEG)
  Supported by virtually all modern camera software
Embedded GPS metadata (latitude and longitude) ties each image to the geographical location where it was taken
Correlating camera make/model with GPS data supports market research for commercial applications
For private image collections, photographers can find their most frequently used camera settings
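EXIF stores a GPS position as degrees, minutes, and seconds plus a hemisphere reference (N/S or E/W), whereas the database fields shown later hold signed decimal values. A minimal sketch of that conversion follows; the class and method names are hypothetical and are not taken from the project.

```java
/** Illustrative helper (hypothetical name): EXIF-style GPS values to signed decimal degrees. */
final class GpsCoordinate {
    private GpsCoordinate() {}

    /**
     * @param ref hemisphere reference as stored in EXIF: "N", "S", "E" or "W"
     */
    static double toDecimalDegrees(double degrees, double minutes, double seconds, String ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        // South latitudes and west longitudes are represented as negative numbers.
        boolean negative = "S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref);
        return negative ? -value : value;
    }
}
```

For example, 56 degrees 30 minutes south becomes -56.5.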
System Overview
Platform Requirements
Software Specifications:
  Java Runtime Environment 5.0 or higher
  Linux, or Windows with Cygwin
Hardware Requirements:
  Minimum 500 MHz CPU and 128 MB of RAM for World Wind
  Minimum 2 GB of hard drive space for the World Wind cache
  Nutch hard disk requirements scale with crawl time
Nutch 1.2
  Part of the Apache Software Foundation
  Requires a Tomcat installation as well as a Cygwin environment (on Windows)
  Selected over other open source crawling options
  Configuration uses regular expressions to limit the URLs accepted by the fetcher
  Command line arguments
    -threads: specifies the number of concurrent fetcher threads
    -depth: specifies the maximum crawl depth from the original URL
  Performance on a desktop computer
    2 images per second on average
    47 MB per hour on a 12 Mbit connection
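Nutch reads its URL rules from a configuration file in which each regular expression is prefixed with + (accept) or - (reject), and the first matching rule decides. The sketch below mimics that first-match behaviour in plain Java to show how such rules limit a crawl. It is an illustration, not Nutch's own code, and the class name and the example.com patterns are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative first-match URL filter (hypothetical class, not part of Nutch). */
final class UrlRuleFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Adds a rule written as "+regex" (accept) or "-regex" (reject). */
    UrlRuleFilter add(String line) {
        if (line.isEmpty() || (line.charAt(0) != '+' && line.charAt(0) != '-')) {
            throw new IllegalArgumentException("Rule must start with + or -: " + line);
        }
        rules.add(new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1))));
        return this;
    }

    /** The first rule whose pattern is found in the URL decides; no match means reject. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }
}
```

With the rules `-\.(css|js|zip|exe)$` followed by `+^https?://([a-z0-9-]+\.)*example\.com/`, a JPEG on photos.example.com is accepted, while a stylesheet on the same host and any URL on another host are rejected.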
Image Aggregator
  Extracts images from Nutch segments
  Rebuilds each image from its raw bytes in the content segments and writes it to a separate directory
  Imports Hadoop and Nutch libraries
  Images are named with incrementing numbers and fed to the EXIF Extractor input directory
  Performance: 150 images per second
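The numbering step can be sketched without the Hadoop and Nutch libraries. The example below assumes the raw bytes of each image have already been read from a segment and only shows how they might be written out under incrementing names. The class name, the .jpg extension, and the JPEG signature check are assumptions, not details from the project.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative writer (hypothetical name): stores image bytes as 1.jpg, 2.jpg, ... */
final class NumberedImageWriter {
    private final Path outputDir;
    private long counter = 0;

    NumberedImageWriter(Path outputDir) throws IOException {
        this.outputDir = Files.createDirectories(outputDir);
    }

    /** JPEG files begin with the bytes FF D8; used here as a cheap sanity check. */
    static boolean looksLikeJpeg(byte[] content) {
        return content.length >= 2
                && (content[0] & 0xFF) == 0xFF
                && (content[1] & 0xFF) == 0xD8;
    }

    /** Writes the bytes to the next numbered file and returns its path. */
    synchronized Path write(byte[] content) throws IOException {
        counter++;
        return Files.write(outputDir.resolve(counter + ".jpg"), content);
    }
}
```

Pointing `outputDir` at one of the EXIF Extractor's input directories would hand the files over to the next stage.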
Web Crawler & Image Aggregator
EXIF Extractor
  Accepts a set of input directories containing images
  Monitors these directories for new files
  Uses a multi-threaded design to process images and extract embedded metadata efficiently
  Scales to as much parallelism as the multi-core hardware affords
  Collected EXIF data is grouped into batches for database insertion
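One way to combine parallel extraction with batching is a fixed thread pool sized to the number of cores, with results collected into fixed-size groups. The sketch below is written under that assumption and uses Java 8 syntax. `MetadataReader` is a hypothetical stand-in for the real EXIF parsing code, and the metadata is reduced to a string for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative extractor (hypothetical names): parallel reads, results grouped in batches. */
final class BatchingExtractor {
    /** Stand-in for the real EXIF reader; returns metadata for one image path. */
    interface MetadataReader {
        String read(String imagePath);
    }

    private BatchingExtractor() {}

    static List<List<String>> extractInBatches(List<String> imagePaths,
                                               MetadataReader reader,
                                               int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        // One worker per available core, so parallelism follows the hardware.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String path : imagePaths) {
                Callable<String> task = () -> reader.read(path);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order and cut them into batches.
            List<List<String>> batches = new ArrayList<>();
            List<String> current = new ArrayList<>();
            for (Future<String> future : pending) {
                current.add(future.get());
                if (current.size() == batchSize) {
                    batches.add(current);
                    current = new ArrayList<>();
                }
            }
            if (!current.isEmpty()) {
                batches.add(current);
            }
            return batches;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while extracting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Extraction failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each returned batch can then be written to the database in a single round of inserts.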
EXIF Extractor
Database
  Uses the HSQLDB Java engine
  Allows an embedded mode in which the engine runs in the application's memory space
    No I/O overhead in communication between application and database engine
    Increases the overall memory footprint, but not excessively
  Engine has been in development since 2001 and is currently used as a database and persistence engine in many open source software projects [2]
  Performs on par with or better than many of its Java-based competitors [7]
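In embedded mode the application opens the database through plain JDBC with a `jdbc:hsqldb:file:` URL, so no separate server process is involved. The sketch below shows that connection plus a batched insert. It assumes the HSQLDB jar is on the classpath and that a table named `exif_data` with `hash` and `file_name` columns already exists; those names are hypothetical, chosen only for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Illustrative embedded-database helper (hypothetical class, table and column names). */
final class EmbeddedStore {
    private EmbeddedStore() {}

    /** Builds an HSQLDB URL for a file-backed database running inside this JVM. */
    static String fileUrl(String databasePath) {
        return "jdbc:hsqldb:file:" + databasePath;
    }

    /** Opens the database with HSQLDB's default user "SA" and an empty password. */
    static Connection open(String databasePath) throws SQLException {
        return DriverManager.getConnection(fileUrl(databasePath), "SA", "");
    }

    /** Inserts all rows ({hash, fileName} pairs) in one JDBC batch and one transaction. */
    static int insertBatch(Connection conn, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO exif_data (hash, file_name) VALUES (?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            conn.commit();
            return counts.length;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Grouping inserts this way matches the extractor's batching: one transaction per batch rather than one per image.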
Database Fields
Field Name           Type       Example
Hash                 String     “008b004d62c68fb64”
File Name            String     “IMAG0074.jpg”
Longitude            Decimal    101.567474
Latitude             Decimal    -56.948134
Map Datum            String     “WGS84”
Image Height         Integer    600
Image Width          Integer    800
Make                 String     “HTC”
Model                String     “PC36100”
Date & Time          Timestamp  2002:08:24 13:59:08
Metering Mode        String     “Multi-segment”
ISO Speed Ratings    Integer    125
Shutter Speed        Double     0.0015625
F-Number             Double     2.8
Aperture Value       Double     2.8
Max Aperture Value   Double     2.8
Focal Length         Double     24.9
Brightness Value     Double     6.12
Exposure Bias Value  Double     0.0
Exposure Time        Double     0.0015625
Flash                String     “Flash fired”
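The Date & Time example uses the EXIF layout, with colons between year, month, and day, so it has to be parsed before it can be stored as a timestamp. A minimal sketch, assuming Java 8 or later; the class name is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Illustrative parser (hypothetical name) for EXIF date strings such as "2002:08:24 13:59:08". */
final class ExifDate {
    private static final DateTimeFormatter EXIF_FORMAT =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    private ExifDate() {}

    /** Throws java.time.format.DateTimeParseException if the text does not match the layout. */
    static LocalDateTime parse(String raw) {
        return LocalDateTime.parse(raw.trim(), EXIF_FORMAT);
    }
}
```

The result can be converted with `java.sql.Timestamp.valueOf(LocalDateTime)` when binding it to a JDBC statement.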
Front-end Application
  Uses both Swing and AWT to generate the graphical user interface
  Filters database queries based on user preferences
    Date and time constraints
    Make and model
    Camera attributes (aperture, flash, shutter speed, etc.)
  Displays GPS coordinates, when available, on a virtual globe powered by NASA World Wind
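Filtering by user preferences amounts to assembling a WHERE clause from whichever constraints the user has set. The sketch below builds a parameterized query so that user input is bound through placeholders rather than concatenated into the SQL. The table and column names (`exif_data`, `make`, `model`, `date_time`, `latitude`, `longitude`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative query builder (hypothetical class, table and column names). */
final class ExifQuery {
    private final List<String> clauses = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    ExifQuery make(String make) {
        return add("make = ?", make);
    }

    ExifQuery model(String model) {
        return add("model = ?", model);
    }

    /** Restricts results to images taken within [from, to], given as timestamp values. */
    ExifQuery takenBetween(Object from, Object to) {
        clauses.add("date_time BETWEEN ? AND ?");
        parameters.add(from);
        parameters.add(to);
        return this;
    }

    /** Keeps only rows that can be placed on the globe. */
    ExifQuery withGpsOnly() {
        clauses.add("latitude IS NOT NULL AND longitude IS NOT NULL");
        return this;
    }

    private ExifQuery add(String clause, Object value) {
        clauses.add(clause);
        parameters.add(value);
        return this;
    }

    /** SQL text with ? placeholders, in the order the filters were added. */
    String sql() {
        StringBuilder sb = new StringBuilder("SELECT * FROM exif_data");
        if (!clauses.isEmpty()) {
            sb.append(" WHERE ").append(String.join(" AND ", clauses));
        }
        return sb.toString();
    }

    /** Values to bind to the placeholders, in order. */
    List<Object> parameters() {
        return Collections.unmodifiableList(parameters);
    }
}
```

For instance, `new ExifQuery().make("HTC").withGpsOnly()` yields a query with one placeholder for the make and a fixed condition on the coordinates.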
Front-end Application Data Flow
NASA World Wind
  Open source 3D interactive world viewer
  SDK allows World Wind to be embedded in an application
  Updates to the latest terrain data available from NASA
  Compared to Google Earth, World Wind is free and open source [5]
Google Earth
  Licensing issues
  Costs from $20 to $400 annually to use
  Better terrain images
Front-end Application
Ethical & Societal Issues
  Privacy and intellectual property
    Stripping the artistic value of images [6]
    The extractor deletes all processed images and disconnects the image data from the original EXIF data
    The person or company running the crawler and extraction applications is responsible for following ethical considerations
  Geotagging: adding positional data to images
    Linking images to GPS locations can be considered an ethical issue but is not applicable within the scope of this project
    No personal or identifiable information is kept
Conclusion
  Designed a system for collecting, processing, and representing a large set of EXIF metadata
  Can be used with extensive web-collected data sets as well as personal image collections
  Flexible nature of the system allows for multiple usage scenarios
    Representing trends in the location and time of images being taken
    Pavement quality scenario
GPS Coordinate Heatmap
source: http://www.openheatmap.com/
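A heatmap of this kind can be derived from the stored coordinates by counting how many images fall into each cell of a latitude/longitude grid. The sketch below shows one simple binning scheme; the class name, the cell-key format, and the fixed cell size in degrees are assumptions, and this is not how openheatmap.com itself works.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative grid binning (hypothetical names) for a coordinate heatmap. */
final class HeatmapGrid {
    private HeatmapGrid() {}

    /** Maps a coordinate to a "row:col" key on a grid of cellDegrees x cellDegrees cells. */
    static String cellKey(double latitude, double longitude, double cellDegrees) {
        // Shift so that rows and columns start at 0 at the south-west corner.
        long row = (long) Math.floor((latitude + 90.0) / cellDegrees);
        long col = (long) Math.floor((longitude + 180.0) / cellDegrees);
        return row + ":" + col;
    }

    /** Counts points per cell; each point is {latitude, longitude}. */
    static Map<String, Integer> count(double[][] points, double cellDegrees) {
        Map<String, Integer> counts = new HashMap<>();
        for (double[] point : points) {
            counts.merge(cellKey(point[0], point[1], cellDegrees), 1, Integer::sum);
        }
        return counts;
    }
}
```

Cells with higher counts would be drawn in warmer colours; smaller cells give finer detail at the cost of sparser counts.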
Q & A
Bibliography
[1] Wikipedia. (2011, Apr. 19) Exchangeable Image File Format. [Online] Available: http://en.wikipedia.org/wiki/Exchangeable_image_file_format
[2] HyperSQL. (2011, Apr. 25) HSQLDB - 100% Java Database. [Online] Available: http://hsqldb.org/
[3] Nutch. (2011, Mar. 27) Nutch Frontpage. [Online] Available: http://wiki.apache.org/nutch/
[4] JEIDA. (2002, Apr.) Digital Still Camera Image File Format Standard. [Online] Available: http://www.exif.org/Exif2-2.PDF
[5] World Wind Central. (2011, Apr. 25) Google Earth Comparison. [Online] Available: http://worldwindcentral.com/wiki/Google_Earth_comparison
[6] U.S. Copyright Office. (2006, July 12) Copyright in General (FAQ). [Online] Available: http://www.copyright.gov/help/faq/faq-general.html
[7] JPA Performance Benchmark. (2011, Apr. 24) HSQLDB Performance Summary. [Online] Available: http://www.jpab.org/HSQLDB.html