Use of c-squares spatial indexing and mapping in the 2004 release of OBIS, the Ocean Biogeographic Information System

Tony Rees, Divisional Data Centre, CSIRO Marine Research, Australia ([email protected])



OBIS Concept

• Intention of OBIS is to be an on line “marine species atlas” by providing electronic access to data from multiple sources via a single gateway or portal

• Current marine species data are scattered (hundreds, thousands of potential data sources) and require harmonisation of data formats, species names, etc.; potentially 200,000+ species

• OBIS has made a start with connections to 10-15 data sources, holding data on some 30,000 species (approx. 2.5m records); intention is to pass 10m records by 2010

• Initial OBIS portal has been operational since 01/2002 (located at Rutgers Univ., NJ.): www.iobis.org (upgraded March 2004).

OBIS Concept - diagrammatic

[Diagram: OBIS shown as three layers: distributed data sources; native Portal functions, including data retrieval and integration; visualisation / analysis tools]

OBIS Architecture – initial implementation

[Diagram, initial version (Jan 2002 – Mar 2004). Components shown: www users 1, 2, 3 (etc.); the OBIS Portal with its mapping tools; real-time data queries passed through custom DB wrappers; data providers 1, 2, 3 (etc.)]

Strengths / weaknesses

Good things about this approach...

• Source data stays with the providers – no versioning problems, good for distributed “ownership” of the OBIS concept, no IP issues, content is always up-to-date

• OBIS portal can be structurally very simple (simply relays requests and responses, and provides access to on-line mapping tools)

• Portal does not have to be a data manager (with associated resourcing, ongoing data integrity issues), or have any intelligent understanding of the data content (simply does matching on text strings)

Strengths / weaknesses

Less good things about this approach...

• System is only as fast as its slowest link, i.e. performance is dependent on factors outside the Portal’s control; can wait minutes to return data on one species from all providers

• One or more providers may be off line at time of query – will never know whether or not they have data of interest (and mapping potentially incomplete)

• Many searches may return no data (only approx. 10-15% of the marine biota covered at the present time, plus spatial coverage is very patchy), also user has to spell name/s correctly (a common problem)

• No ability to query by common terms e.g. “all fishes”, “all whales”, as this information not held at provider level (typically just the scientific names)

• Some species are known by multiple / variant names, which the user may not be aware of. Also, some bad / irrelevant data in the provider input are not filtered out, and will show up in any search that matches them

... Portal is basically “dumb”, cannot provide user with any pre-search information about what content is available or unavailable.

Semi-equivalent situation at author’s agency

• (Component 1): “Data Warehouse” data repository, with 0.25m marine species distribution records, for 3000 species ... can be slow / cumbersome to query, many queries return no data

• (Component 2): Separate “CAAB” master names list – all possible species which occur in the region (c.20,000 names)

• CAAB upgraded to show which species on the master names list have data in the Warehouse

• Also, parsed all the 0.25m species records and built a spatial index – list of squares in which each species has been recorded; this table then stored as part of the CAAB database

• Now ... can do name and spatial queries on the (smaller) CAAB database (= Index) – show all names for which there are data, what species occur in any square (0.1 x 0.1 degrees in this instance), and distribution of any species, direct from the Index, without needing to establish a connection to the full “Warehouse” database

• Can then support full Warehouse queries as “stage 2” if needed.

C-squares spatial indexing ...

- Doesn’t store the point data, just a list of the squares in which data are present, for each species

- Efficient for data reduction, where multiple points occur in the same square

- Easy to store and query

... choice of square size is a design decision (CAAB index uses 0.1 x 0.1 deg. squares, =~ 10 km)
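The data reduction described above can be illustrated with a minimal Python sketch. It assumes each point record has already been assigned the code of the square it falls in; the `records` list, the species names and the codes are invented for illustration and are not CAAB data.

```python
from collections import defaultdict

# Hypothetical point records: (species name, code of the square containing the point).
records = [
    ("Species A", "3414:100:3"),
    ("Species A", "3414:100:3"),   # a second point in the same square
    ("Species A", "3414:100:4"),
    ("Species B", "3414:100:3"),
]

def build_spatial_index(records):
    """Return {species: set of square codes}; the points themselves are not stored."""
    index = defaultdict(set)
    for species, square in records:
        index[species].add(square)
    return dict(index)

def species_in_square(index, square):
    """List the species recorded in one square."""
    return sorted(name for name, squares in index.items() if square in squares)

index = build_spatial_index(records)
print(index["Species A"])
print(species_in_square(index, "3414:100:3"))
```

Three points for "Species A" reduce to two index entries here; the saving grows with the number of points that share a square.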

Hierarchical nomenclature for the c-squares codes

Lat 40.5 S, long 140.2 E is in ...

10 x 10° square “3414”

5 x 5° square “3414:1”

1 x 1° square “3414:100”

0.5 x 0.5° square “3414:100:3”

(etc.)
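The hierarchy above can be reproduced with a short Python sketch. The function name `csquare` and its `resolution` argument are assumptions made for illustration. The digit rules follow the published c-squares notation as I understand it (leading digit 1 = NE, 3 = SE, 5 = SW, 7 = NW; quadrant digits 1 to 4 within each 10 or 1 degree square) and have been checked here only against the example on this slide.

```python
def csquare(lat, lon, resolution=0.5):
    """Encode a position as a c-squares code at 10, 5, 1 or 0.5 degree resolution."""
    if resolution not in (10, 5, 1, 0.5):
        raise ValueError("resolution must be 10, 5, 1 or 0.5 degrees")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("position out of range")
    # Leading digit is the global quadrant: 1 = NE, 3 = SE, 5 = SW, 7 = NW.
    quadrant = {(True, True): 1, (False, True): 3,
                (False, False): 5, (True, False): 7}[(lat >= 0, lon >= 0)]
    # Keep the limiting values (90 and 180 degrees) inside the last square.
    alat = min(abs(lat), 90 - 1e-9)
    alon = min(abs(lon), 180 - 1e-9)
    lat_deg, lon_deg = int(alat), int(alon)
    # 10 degree square: quadrant, tens of latitude, tens of longitude (two digits).
    code = f"{quadrant}{lat_deg // 10}{lon_deg // 10:02d}"
    if resolution == 10:
        return code
    # 5 degree quadrant within the 10 degree square, from the units digits.
    lat_unit, lon_unit = lat_deg % 10, lon_deg % 10
    q5 = 1 + 2 * (lat_unit >= 5) + (lon_unit >= 5)
    if resolution == 5:
        return f"{code}:{q5}"
    code = f"{code}:{q5}{lat_unit}{lon_unit}"
    if resolution == 1:
        return code
    # 0.5 degree quadrant within the 1 degree square, from the fractional parts.
    q_half = 1 + 2 * (alat - lat_deg >= 0.5) + (alon - lon_deg >= 0.5)
    return f"{code}:{q_half}"

for res in (10, 5, 1, 0.5):
    print(res, csquare(-40.5, 140.2, res))
```

Each finer code extends the coarser one, which is what allows a query at any level of the hierarchy to be expressed as a match on the leading part of the code.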

Behind the scenes, spatial index looks like this ...

• Index must be refreshed when new data are added to the Warehouse (or records deleted / modified)

• Spatial query logic is very simple (standard text match, on part or all of a “word”)
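As a rough illustration of the text-match query, the Python sketch below treats the index as rows of (species, square code) and selects rows whose code begins with the code of the query square. The rows and species names are invented, and a production index would normally live in a database table rather than a list.

```python
# Hypothetical index rows: one (species, square code) pair per row.
rows = [
    ("Species A", "3414:100:3"),
    ("Species A", "3414:227:1"),
    ("Species B", "3415:100:3"),
    ("Species C", "3414:100:3"),
]

def species_matching(rows, square):
    """Species with at least one indexed square inside `square` (any level of the hierarchy)."""
    return sorted({name for name, code in rows if code.startswith(square)})

print(species_matching(rows, "3414"))        # 10 x 10 degree query
print(species_matching(rows, "3414:2"))      # 5 x 5 degree query
print(species_matching(rows, "3414:100:3"))  # 0.5 x 0.5 degree query
```

The query string should be a complete code at one level of the hierarchy ("3414", "3414:1", "3414:100", ...); a truncated fragment such as "3414:10" would still match text, but would not correspond to a single square.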

OBIS New version ...

• For OBIS – similar approach taken, i.e. introduce name index and spatial index (= new “metadata layer” – the OBIS Index), this time using 0.5 x 0.5 degree squares (~50 km global resolution)

• Name index also enhanced with additional metadata and value-adding...

– how many records of each species (0-40,000+)
– which sources have the data
– date range (start, end year)
– what group a species belongs to (fishes, whales, barnacles...)
– common name for the species, where available (plus more)

• Can now do many queries – including name lists / metadata, spatial queries and “quick maps” – direct from the Index (smaller, rapid to query, plus everything runs locally)

• Only need to query the remote data sources for “stage 2” (= get data) queries. In production version, local data cache of key fields introduced as well, for further performance benefits, and guaranteed data availability.
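The split between index-only queries and “stage 2” queries can be sketched in Python as follows. The `Portal` class, its method names, the index layout and the `fake_fetch` function are all hypothetical and stand in for the real OBIS components, whose interfaces are not described in this presentation.

```python
class Portal:
    """Toy portal: stage 1 reads a local index, stage 2 contacts the data sources."""

    def __init__(self, index, fetch_from_source):
        self.index = index                        # {scientific name: metadata dict}
        self.fetch_from_source = fetch_from_source
        self.remote_calls = 0                     # counts connections to data sources

    def stage1(self, group=None, square=None):
        """Name list from the index only; no remote connection is made."""
        hits = []
        for name, meta in self.index.items():
            if group is not None and meta["group"] != group:
                continue
            if square is not None and not any(
                    code.startswith(square) for code in meta["squares"]):
                continue
            hits.append(name)
        return sorted(hits)

    def stage2(self, name):
        """Get the records, contacting only the sources the index lists for this name."""
        records = []
        for source in self.index[name]["sources"]:
            self.remote_calls += 1
            records.extend(self.fetch_from_source(source, name))
        return records

# Invented index content and a stand-in for a remote provider.
index = {
    "Species A": {"group": "whales", "sources": ["provider 1"],
                  "squares": {"3414:100:3"}},
    "Species B": {"group": "fishes", "sources": ["provider 1", "provider 2"],
                  "squares": {"3415:100:3"}},
}

def fake_fetch(source, name):
    return [f"{name} record from {source}"]

portal = Portal(index, fake_fetch)
whales = portal.stage1(group="whales")
calls_after_stage1 = portal.remote_calls
data = portal.stage2("Species B")
```

In this sketch the name list and the squares needed for a quick map come back without any remote call; the providers are contacted only when the user asks for the underlying records.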

OBIS Architecture – initial implementation (reprise)

[Diagram repeated from the initial implementation (2002-2004). Components shown: www users 1, 2, 3 (etc.); the OBIS Portal with its mapping tools; real-time data queries passed through custom DB wrappers; data providers 1, 2, 3 (etc.)]

OBIS Architecture – 2004 version

[Diagram, new version (Mar 2004 onwards). Components shown: www users 1, 2, 3 (etc.); the OBIS Portal with its mapping tools; the OBIS Index, including a (partial) global names list, serving “Stage 1” queries; the Data Cache, serving “Stage 2” queries; provider crawling and index building (refreshed on a regular cycle); DiGIR translation software; data providers 1, 2, 3 (etc.)]

OBIS User Interface – 2004 version

• Click-on-a-map spatial search (all categories, or single category)

• Name search (scientific name, common name, partial match, “soundalike” search)

• Browse a list of names – all categories (alphabetic), or subset by category

• Show only names with data, or all names (shows status of content building, also confirms that user has entered a valid name – whether data held or not)
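A name search of this kind can be sketched in Python against a small names index. The presentation does not say which “soundalike” algorithm OBIS uses; the sketch below substitutes string similarity from the standard `difflib` module as a stand-in, and the function names and the three-entry `names` table are assumptions for illustration.

```python
import difflib

# Hypothetical slice of a names index: scientific name -> common name.
names = {
    "Lutjanus sebae": "red emperor",
    "Lutjanus kasmira": "bluestripe snapper",
    "Balaenoptera musculus": "blue whale",
}

def partial_match(query, names):
    """Names whose scientific or common name contains the query (case-insensitive)."""
    q = query.lower()
    return sorted(sci for sci, common in names.items()
                  if q in sci.lower() or q in common.lower())

def close_match(query, names, cutoff=0.8):
    """Tolerate misspellings by string similarity (a stand-in for a soundalike search)."""
    lookup = {sci.lower(): sci for sci in names}
    hits = difflib.get_close_matches(query.lower(), list(lookup), n=5, cutoff=cutoff)
    return [lookup[h] for h in hits]

print(partial_match("lutj", names))
print(close_match("Balenoptera musculus", names))   # misspelt genus
```

Because the search runs against the names index rather than the providers, a misspelt or unknown name can be reported to the user before any data request is made.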

Result in practice...

• Can now generate lists of names matching search criteria extremely rapidly – e.g.

– “all whales” (35 spp.) ... <4 secs, including c-squares distribution data (up to 1000 squares per species)

– all whales in a 10 x 10 deg. square ... <3 secs, including distribution data

– all fishes beginning with “lu..” (115 spp) ... <10 secs, including distribution data

(Compare with previous situation, of 2 mins+ per species, and numerous “no data returned” messages)

• “Quick maps” available directly from search results page (require no connection to the source data)

• Also can hold summary statistics (nos. of names per category, overall category distribution maps, etc.) as “meta-metadata”, for presentation to user.

Result of query for “all whales” ... 35 spp., < 4 secs

Search OBIS for “Lutjanus” ... 64 spp., < 6 secs

Note, presence of common names, other metadata, “Quick Map” buttons, plus “Get OBIS Data” (= Stage 2) hyperlinks

HTML results page has all the c-squares for “quick maps” already loaded – e.g. for 1 species (portion of 1 row of the HTML table) ...

• User can choose from a range of available base maps, at variety of sizes / scales

• List of squares returned in the HTML code along with every map as a new form, for re-submission to the mapper if needed (e.g. if user requests a different base map)

• Clicking any point on the map triggers a “Stage 2” request for the source (point) data (implemented as a 5 x 5 degree search on the cache, for the species in question).

C-squares mapper output

Some limitations ...

• Whole world at 0.5 x 0.5 degrees requires 259,200 codes (720 x 360) – may exceed the present mapper limit (around 60,000 codes), and may also be a problem for storage. One solution: multiple contiguous codes can be “collapsed” into the next larger step of the hierarchy (e.g. 648 10 x 10 degree squares cover the world), giving quadtree-like efficiencies

• Spatial queries are fastest when constrained to a single square (potentially at any of a range of scales). Multiple-square queries are also possible (basically, a Boolean “OR” search), but will be slower to execute

• System becomes somewhat less efficient towards the poles (the ground area covered by each square becomes smaller)

• Searching on complex polygons (e.g. country boundaries) not really supported – would require a true GIS or spatial database environment to implement (although can come close).

Australia using variable-resolution encoding
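One way to produce a variable-resolution encoding is to replace every complete set of sibling squares by its parent. The Python sketch below assumes codes at 10, 5, 1 and 0.5 degree resolution written as in the examples earlier in this presentation (so that dropping the last two characters of a code gives its parent); the function name `collapse` is hypothetical, and this is not necessarily how the OBIS mapper does it.

```python
from collections import defaultdict

# Children per parent, keyed by the length of the parent's code:
# "3414" (10 deg) holds 4 x 5 deg squares, "3414:1" (5 deg) holds 25 x 1 deg
# squares, and "3414:100" (1 deg) holds 4 x 0.5 deg squares.
CHILDREN_PER_PARENT = {4: 4, 6: 25, 8: 4}

def collapse(codes):
    """Replace every complete set of sibling squares by its parent, finest level first."""
    result = set(codes)
    for child_len in (10, 8, 6):
        siblings = defaultdict(set)
        for code in result:
            if len(code) == child_len:
                siblings[code[:-2]].add(code)   # dropping two characters gives the parent
        for parent, kids in siblings.items():
            if len(kids) == CHILDREN_PER_PARENT[len(parent)]:
                result -= kids
                result.add(parent)
    return result

# Square counts quoted in the text.
half_degree_squares = (360 * 2) * (180 * 2)       # whole world at 0.5 degrees
ten_degree_squares = (360 // 10) * (180 // 10)    # whole world at 10 degrees

print(collapse({"3414:100:1", "3414:100:2", "3414:100:3", "3414:100:4", "3414:101:1"}))
```

A species recorded everywhere in a region then needs only a handful of coarse codes, while patchy records keep their fine-resolution codes.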

Concept for multiple-square searches
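A multiple-square search can be sketched in Python as a Boolean “OR” over the text matches for each square. The rows, species names and the function name `species_in_any` are invented for illustration; a database implementation would express the same idea as several `LIKE` conditions joined by `OR`, at a corresponding cost in speed.

```python
# Hypothetical index rows: one (species, square code) pair per row.
rows = [
    ("Species A", "3414:100:3"),
    ("Species A", "3414:227:1"),
    ("Species B", "3415:100:3"),
    ("Species C", "3414:100:3"),
]

def species_in_any(rows, squares):
    """Boolean OR over several squares: a row matches if its code falls in any of them."""
    prefixes = tuple(squares)   # str.startswith accepts a tuple of alternatives
    return sorted({name for name, code in rows if code.startswith(prefixes)})

# Search region made up of one 5 x 5 degree square and one 10 x 10 degree square.
print(species_in_any(rows, ["3414:2", "3415"]))
```

The squares in the search region may be of mixed sizes, so an irregular area can be approximated with a few large squares plus smaller ones along its edges.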

Summary

• Metadata-driven approach provides orders-of-magnitude improvements in application functionality, user interactivity, and response times for OBIS

• C-squares spatial indexing supports both spatial searching and provision of “quick maps” directly from the Index – faster, efficient for data storage

• Index / front end can be run as standalone system (decoupled from source data), and requires no GIS environment for implementation (these aspects will be important for future move to a system of replicated OBIS nodes)

• “Quick maps” form a set of custom GUIs which can be used as direct data access points in an intuitive manner

• C-squares system is available for use in other applications as desired (e.g. see satellite data search presentation, this workshop); OBIS is a demonstration of performance / implementation of a large scale c-squares enabled system in practice.

More information:

• C-squares description: “Oceanography”, March 2003 (vol. 16 no. 1)
• C-squares website: http://www.marine.csiro.au/csquares/
• OBIS website: http://www.iobis.org/