International Journal of Geographical Information Science
Vol. 24, No. 4, April 2010, 487-505
DOI: 10.1080/13658810902881903
Development of an interoperable tool to facilitate spatial data integration in the context of SDI

Hossein Mohammadi, Abbas Rajabifard* and Ian P. Williamson

Department of Geomatics, Centre for Spatial Data Infrastructure and Land Administration, University of Melbourne, Victoria, Australia

(Received 18 August 2008; final version received 9 March 2009)

The integration of multisource heterogeneous spatial data is one of the major challenges for many spatial data users. To facilitate multisource spatial data integration, many initiatives including federated databases, feature manipulation engines (FMEs), ontology-driven data integration and spatial mediators have been proposed. The major aim of these initiatives is to harmonize data sets and establish interoperability between different data sources.

However, spatial data integration and interoperability is not a purely technical exercise; other nontechnical issues, including institutional, policy, legal and social issues, are involved. The Spatial Data Infrastructure (SDI) framework aims to address both the technical and nontechnical issues and facilitate data integration. SDIs aim to provide a holistic platform for users to interact with spatial data through technical and nontechnical tools.

This article discusses the complexity of the challenges associated with data integration and proposes a tool that facilitates data harmonization through the assessment of multisource spatial data sets against a set of measures. The measures represent harmonization criteria and are defined based on the requirements of the respective jurisdiction. Information on the technical and nontechnical characteristics of spatial data sets is extracted from metadata and the actual data. The tool then evaluates these characteristics against the measures and identifies the items of inconsistency. The tool also proposes available manipulation tools or guidelines to overcome inconsistencies among data sets. The tool can assist practitioners and organizations to avoid the time-consuming and costly process of validating data sets for effective data integration.

Keywords: multisource spatial data; heterogeneous spatial data; SDI; spatial data integration

1. Introduction

Spatial data integration is an essential component of many spatial services, especially those services that rely on multisource spatial data. This includes framework data, locational data and infrastructure/socio-economic data. Fundamental data sets are mostly owned and managed by governments, whereas infrastructure and socio-economic data are in most cases owned and coordinated by businesses. The diversity and sensitivity of each data group for the public and business sectors lead to different considerations in data management. Fundamental data sets are more publicly available and more interoperable, have richer metadata, more consistent data models, clear pricing and privacy policies, and fewer involving organizations, which makes it easier for users to discover and find data (ANZLIC 2004). Business-type data, including infrastructure, utilities and socio-economic data, are less publicly available, and each business sets its own data standards and frameworks. These data sets have poor or no metadata and restrictions on use. To respond to emergency situations, these inconsistent and multisource spatial data sets need to be integrated effectively.

The diversity of data providers and their priorities in data coordination cause difficulties in data integration. One of the major challenges in this regard is to identify the items of incompliancy in a standard way. This is quite crucial for jurisdictions that aim to establish an effective SDI in which users can easily access and integrate data sets. In this regard, a validation tool can assist data repositories to assess their data sets based on the standards and rules defined within the SDI context.

This article responds to the need for a data validation tool for the harmonization of data sets within a jurisdictional context. The article capitalizes on many case studies to identify the issues and obstacles of effective spatial data integration. Based on the observations and findings of the case studies, a tool has been designed that evaluates data sets against rules. These rules represent the measures that assess the compliancy of data sets with the standards and guidelines adopted by the respective jurisdiction or agency.

1.1. Scope of the article

This article identifies many technical and nontechnical issues that hinder spatial data integration. In this regard, the most significant technical and nontechnical issues of effective spatial data integration are highlighted based on many case studies. A tool is also proposed that evaluates some technical and nontechnical characteristics of spatial data sets and can assist practitioners to identify the bottlenecks and existing inconsistencies among data sets. The tool assigns specifications, guidelines and standards to the issues. Some of these issues, including metadata availability, pricing and the restrictions on data, will be incorporated in the technical tool, but as there are issues that are not measurable, the proposed tool refers to guidelines and standards in these cases.

2. Data integration background

Because data sets differ in nature, they present different challenges. To illustrate the incompliancy and challenges among different data sets, Table 1 summarizes, as an example, several characteristics of state-level cadastre and topography data sets in the states of NSW and Victoria, Australia.

Table 1. Incompliancy of sample spatial data in the states of NSW and Victoria, Australia (ASDD 2007).

                        Cadastre (1:500)                      Topography (1:25,000)
                        NSW              Victoria             NSW                    Victoria
Source                  Local councils   Local councils       Department of Lands    Department of Sustainability and Environment
Stored format           DXF, MapInfo     ESRI SDE format      Arc/Info, DXF, DGN     ESRI SDE format
Positional accuracy     20 mm–5 m        ±0.5 m               N/A                    ±8.3 m

Many applications require the integration of these two data sets, which historically have very different technical and institutional legacies.

In both cases (the NSW and Victorian states), large-scale cadastral data are produced and maintained by local councils. Topographical data are also produced within these jurisdictions by different organizations. The structure of cadastral and topographical data is different in these jurisdictions. The source of the data, format and accuracies are a few of the many other issues that hinder data integration. Diversity in data providers/custodians not only results in some technical incompliancy among data sets, including format, accuracy, currency, datum and access network, but also results in many nontechnical issues such as pricing and licensing (Syafi'i 2006, Mohammadi et al. 2007).

To overcome the challenges among diverse data sets, many approaches have been proposed that aim to effectively harmonize and integrate multisource spatial data sets. Federated databases, ontology-driven data integration, spatial mediators and feature manipulation engines (FMEs) are a number of them. A federated database is a logical (virtual) collection of diverse stand-alone databases in a single unified database (Buch 2002) that can be viewed through a single user interface (Burgess 1999). Federated databases aim to access and integrate the data and specialized computational capabilities of a wide range of data sources (Haas and Lin 2002).

People, organizations and technical tools need to communicate effectively. However, because of the diversity of their requirements, purposes and backgrounds, there is a widely varying understanding and set of assumptions among them. This leads to difficulty in identifying requirements and defining specifications. The way to address these problems is to create a unifying framework for the different viewpoints through the basic description of the entities (Uschold and Gruninger 1996).

Much research has been conducted on applying ontology to spatial data integration. Fonseca (2001) has introduced a framework for the integration of multisource spatial data based on ontologies. He created a mechanism that integrates spatial data sets based on their meanings. Uitermark (2001) has studied ontology for update propagation and introduced ontology-driven spatial data integration as a necessary condition for this purpose. Hakimpour (2003) has utilized ontologies to handle the semantic heterogeneity among diverse databases.

Mediation has been developed as a means of providing tools for the integration of multisource heterogeneous data sets. Spatial mediation has been studied as a means of using these techniques to integrate spatial data (Miller and Nussar 2003). It also provides data access for distributed and heterogeneous data sources. The mediation technique capitalizes on one global component (a mediator) and multiple local components (wrappers) (Pinto et al. 2003). This architecture provides the binding of heterogeneous data to its domain application. Hence, for the identification of the corresponding data sets and their appropriate domain application, each data set is associated with metadata (Gupta et al. 1999). This approach promotes data integration and mechanisms to translate data requests across ontologies.

Some organizations utilize spatial Extract, Transform and Load (ETL) tools to overcome the incompliancy among different data formats, data models and standards. FME is an integrated collection of spatial ETL tools. FME has been introduced as a complete interoperability solution that eliminates barriers to spatial data integration and transformation and enhances data-sharing capabilities between diverse applications (Axmann 2008). FME also allows the integration of different data sets, possibly of different types and different coordinate systems, into one logical data set. Although this capability is most often used to merge data from adjacent data sets into a single data set, it is also used to integrate data from several different sources through semantic translators (Visser et al. 2002, Gerasimtchouk and Moyaert 2007).

3. Spatial data integration and interoperability

The above-mentioned initiatives try to address the incompliancy among multisource data sets in different ways; however, the main objective of these approaches is to harmonize spatial data sets and to facilitate interoperability. As the central issue of geographic information science, spatial data interoperability has received considerable attention in recent years. Numerous spatial interoperability research projects and initiatives have addressed the heterogeneity among multisource spatial data sets. The focus of interoperability research is moving towards the models and technologies that facilitate spatial data integration (Zaslavsky et al. 2005); hence interoperability can be considered a part of integration (Sen 2005).

Three levels of interoperability have been defined by researchers, including syntactical, structural and semantic levels:

• Syntactical: Syntactic interoperability helps to overcome the challenge of information reuse, including the integration of data sets for visualization, query and analysis. The need for syntactic interoperability in multisource and multi-vendor distributed software architecture is addressed with standards-based Web services and vector formats such as the Geography Markup Language (GML). But this is only the first step towards real information integration.

• Structural: Structural interoperability provides a basic level of conversion from one schema to another (Peedell et al. 2005). Any spatial data are structured according to a certain conceptual view of the corresponding phenomena. This structure is strongly affected by the relevant organization's vision; hence it differs from one data set to another. As a consequence, the data model (database schema) is influenced. Structural interoperability deals with incompliancy in names, specifications, attribute names, granularity (many object types with few attributes, or few object types with many attributes) and domain values (de Vries 2005).

• Semantic: Semantic interoperability is the ability of one user/system to understand the meaning of data from another user/system (Goodchild et al. 2005). Apart from differences in data structure, differences in information semantics will also stand in the way of unproblematic multisource data integration. Semantics concerns the specification, definitions and meaning of the terms within a domain (Miller 2006). The users within the source organization understand the context, so these meanings are implicit and seldom published. The major problem emerges when the data are reused by other organizations and this 'inside knowledge' is not there (de Vries 2005).

Sen (2005) has addressed the different levels of interoperability needed to achieve effective integration. He proposes the fulfilment of all three levels of interoperability as the ideal situation (Figure 1).

Figure 1. Interoperability hierarchy for effective spatial data harmonization (Sen 2005).

In many ways, syntactical interoperability at the technical level (data and services) is the most straightforward aspect of maintaining interoperability. Consideration of technical issues includes ensuring involvement in the continued development of communication, transport, storage and representation standards such as Z39.50 and the work of the World Wide Web Consortium (W3C), the Open GIS Consortium (OGC) and the International Standards Organization (ISO). However, semantic and structural interoperability are the concern of systems that try to resolve the heterogeneity among data and services at the meaning (ontology-based) and structure (schema) levels.

It is clear that there is far more to ensuring interoperability than using compatible data, software and hardware. Rather, assurance of effective interoperability will often require radical changes to the ways in which organizations work, and especially in their attitudes to information (Miller 2006). Hence, besides interoperability at the technical level, interoperability at the institutional, policy, legal and social levels needs to be established to ensure effective integration (Sen 2005) and interoperation among organizations' data and systems (Mohammadi et al. 2007).

For these services, minimizing the time and cost of data harmonization is a priority. Most applications that rely on integrated data suffer from the lack of an automatic mechanism to identify the heterogeneity among data sets and a mechanism to assign available solutions to overcome the incompliancy (Edwards and Simpson 2002, Chen et al. 2003, Knoblock et al. 2003). The identification of incompliancy items and the provision of necessary guidelines and standards can help facilitate the process of spatial data integration. The integration tool can investigate and validate data against standards and measures, including items of inconsistency or compliance with standards. This can be done through the investigation of data and metadata.

4. Spatial data integration challenges

Many technical and nontechnical issues hinder effective data integration, and these should be identified and overcome. To better understand and identify the complexity of issues involved in data integration, many case studies have been conducted in the Asia-Pacific region (Australia, New Zealand, Malaysia, Indonesia, Brunei Darussalam, Japan and Thailand) (Mohammadi 2006). In the case study phase, participating countries were asked to provide information on the challenges and issues they usually encounter when harmonizing and integrating diverse data sets within their jurisdictions. Then, many data sets from Australian federal and state mapping agencies were collected and integrated based on an approach that tried to test the technical and nontechnical characteristics of the data sets and their accommodating jurisdictions. Findings and outcomes of the case studies identified numerous technical and nontechnical issues associated with effective spatial data integration, as summarized in Table 2.

The technical and nontechnical issues and obstacles of effective data integration need to be addressed in the context of SDIs. This also includes the development of guidelines and technical tools. These issues have been elaborated as follows:

4.1. Technical issues

Many technical issues hinder effective spatial data integration. Data specifications contain invaluable information on the conceptual definition of spatial features, the spatial and aspatial content of data sets and the relationships between features. This provides information on the conceptual data model and also the relationships between data features. A lack of rich and consistent data specifications can lead to incompliancy in data model design and also in the spatial and aspatial content of data sets. The format and data structure are not a major problem, as many advanced data conversion tools have been developed for this purpose, but the conversion of data formats may cause the loss of data content.

The integration of spatial data sets at different resolutions and scales requires special tools and techniques, including data conversion, data extraction and generalization. This is also a time-consuming and costly process.

Table 2. Technical and nontechnical issues associated with spatial data integration.

Technical issues: inconsistent data specification; multiple raster and vector formats; variety of spatial resolution; different scales; differences in datum, projections and coordinate systems; data models; currency and accuracy; logical inconsistency.

Nontechnical issues
  Institutional: access, retrieval and display mechanisms; sharing data among organizations; different coordination and maintenance arrangements; high degree of duplication; weak collaboration; uncoordinated specifications and standards across spatial stakeholders; lack of a central access gateway (single point of access); building awareness.
  Policy: different base maps; pricing models; access policies; use restrictions.
  Legal: different license conditions; IP and licensing; liability regimes.
  Social: silo mentality without effective mergers among silos; aversion to data sharing and integration.

Integration of data models facilitates a greater degree of cross-data set analysis (Mills and Paull 2007) from both spatial and aspatial perspectives. The integration of data models requires resolving the diversity of feature classes, relations and attributes. This is a quite difficult task, as different data models may conceptualize features, relations and attributes differently. An example of this problem is the number of classes of a single feature in different data models. The problem can be more complex if some features are not modeled by the respective data models, whereas the same features are modeled in other data models. The quality of data sets in terms of currency, accuracy and logical consistency is another issue that should be considered in the integration of data sets. Multisource data sets with different qualities can produce incorrect results. For example, the integration of a current and an outdated data set may miss certain spatial phenomena.

Metadata and its content can play a key role in data integration, as it can provide information on the consistency of data sets. Information on some of the above-mentioned issues, including accuracy, geographical extent and spatial reference systems, is included in most common metadata standards. Resolving these issues helps to establish interoperability.

4.2. Nontechnical issues

The nontechnical issues of multisource spatial data integration, including institutional, policy, legal and social issues, are also the source of major problems. This has been highlighted by the case study and also stated in the studies by Syafi'i (2006) and Mohammadi et al. (2006). Custodians of spatial data sets utilize diverse policies, standards and arrangements to maintain and coordinate spatial data sets. This includes the diversity of database management and design approaches, data structures, and policies for data maintenance and distribution. Data providers also utilize different approaches to sharing data sets, which are mainly enforced by their organizational arrangements, business structure, funding models and priorities. In some cases, organizations have a silo mentality and store and manage data sets within a single and isolated environment. The diversity of standards and technical tools that organizations utilize to share data sets also hampers data integration. In many cases organizations do not participate in joint collaborations to produce and share data; therefore, much data duplication and resource wasting occurs.

Effective spatial data integration requires different components. These include the identification of potential technical and nontechnical issues and problems, and also the enablers, tools, guidelines and solutions to overcome the problems. Without building capacity among spatial community stakeholders about the importance of spatial data integration and the above-mentioned components, effective spatial data integration cannot be achieved.

There are many items that affect the decision of spatial data users to utilize data sets, including the policies and restrictions on data, fitness for purpose and availability of data, to name a few. Tough restrictions, as well as restricted access and privacy policies, limit the use and sharing of spatial data sets.

Many of the above-mentioned issues cannot be overcome only by setting a regime or legislation; they also require capacity building and raising awareness. Some stakeholders believe in a silo approach to data coordination and maintenance, which prevents easy access to and sharing of data sets. Practitioners put much effort and time into investigating the data, metadata and accompanying documents (e.g. data specifications that sometimes contain feature-level and attribute information) to find out the heterogeneity among data sets. They sometimes also investigate the actual data to identify the characteristics of the data, including spatial and aspatial accuracies. Besides this, every jurisdiction requires a certain combination of guidelines, standards and specifications, based on which they modify and maintain the data. The assignment of the best solution in the form of guidelines or standards is another challenge, which should be addressed as shown in Figure 2.

In most cases, this process is carried out based on individuals' knowledge and in a manual manner. The automation of this task can greatly decrease time, cost and effort. It can also provide a standardized pattern for investigating the items of heterogeneity. To automate the process, a tool has been proposed and developed to harmonize different spatial data sets. The main aim of the tool is to identify the incompliancy of the data sets with a set of measures. The measures are technical and nontechnical criteria, including datum, format, restrictions, prices and geographical extent, which can be set by every single jurisdiction or organization that wishes to use the tool. Some of the rules, including datum, geographic extent and attribute content, can be extracted from the data itself. Many spatial files (e.g. shapefiles and GML files) encapsulate information on the datum. The geographic extent and attribute content are also extractable from the spatial content and attribute tables.

Metadata and other accompanying documents, including privacy and pricing policy documents, include information on the technical and nontechnical characteristics of data sets. However, the jurisdiction that intends to utilize the tool can make configurations and also set rules in the form of restrictions on data use, distribution and manipulation, or in the form of restrictions on spatial and aspatial components. A certain geographic extent, spatial accuracy and scale are some examples of restrictions on spatial components. Restrictions on the value and type of attributes are also configurable as default settings for aspatial restrictions.
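
To make the idea of jurisdiction-defined measures concrete, the sketch below represents a small rule set (datum, geographical extent, restriction level and a non-null attribute constraint) as plain Java objects. The class name, field names and the bounding-box values are illustrative assumptions rather than part of the tool described in this article; only GDA94, the 'jurisdiction' attribute and the restriction wording are taken from the use test in Section 5.2.5.

```java
import java.util.List;

/** A minimal, hypothetical representation of jurisdiction-defined harmonization measures. */
public class HarmonizationRules {

    /** Datum the jurisdiction expects, e.g. "GDA94". */
    public final String requiredDatum;

    /** Bounding box (minLon, minLat, maxLon, maxLat) the data must fall within. */
    public final double[] allowedExtent;

    /** Most restrictive access level that is still acceptable. */
    public final String acceptedRestriction;

    /** Attributes that must not contain null values. */
    public final List<String> nonNullAttributes;

    public HarmonizationRules(String requiredDatum, double[] allowedExtent,
                              String acceptedRestriction, List<String> nonNullAttributes) {
        this.requiredDatum = requiredDatum;
        this.allowedExtent = allowedExtent;
        this.acceptedRestriction = acceptedRestriction;
        this.nonNullAttributes = nonNullAttributes;
    }

    /** Example configuration loosely based on the Victorian use test described later. */
    public static HarmonizationRules victorianExample() {
        return new HarmonizationRules(
                "GDA94",
                new double[] {140.9, -39.2, 150.0, -33.9},   // approximate extent of Victoria
                "completely restricted",
                List.of("jurisdiction"));
    }
}
```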

The tool also accommodates instructions in the form of the guidelines and standards of the jurisdiction. Upon any incompliancy with the rules of the tool (measures), an available instruction is assigned to the item (Figure 3).

Figure 3. Proposed process for data harmonization/integration.

Figure 2. Steps for spatial data integration.

The proposed process tries to validate and harmonize spatial data sets before they are used. This facilitates the time-consuming and costly data integration initiatives by providing the users with more integrable data sets. In this process, users can also set out their own rules for data validation and harmonization, which best meet their requirements. This includes user-defined rules on the spatial characteristics and aspatial content of data sets.

The measures used to assess data sets are customizable based on the context of the organizations; therefore, any spatial data repository within the jurisdiction that aims to provide users with harmonized data sets needs to set measures and criteria for harmonization. The repository stores the data sets that are compliant with the measures, and therefore any set of data collected from the repository is homogeneous. Based on this approach, the tool provides access to multisource data, identifies the items of heterogeneity against the measures of integration and facilitates effective data integration.

The implementation of this scenario requires several abilities and functionalities. The tool should be able to access different data sources and extract information from data sets and metadata. In this regard, the tool needs functions to read data sets. It also needs a parsing function to extract data from metadata. The tool also requires a solid comparison module to run a set of comparing functions and queries to assess the compatibility of data sets with the measures.

A report function should accompany the process to provide the user with the information on data sets and items of incompliancy. The report is linked to the set of instructions and is able to assign the most appropriate available instruction that is valid in the context of the user's jurisdiction.

The result of the data integration in the form of a visual map can also give the users a better insight into some of the visual compliancy of the data sets, including geometry, coordinates and geographical extent.

5. Spatial data integration tool

As outlined above, an effective spatial data integration tool requires many characteristics. The ability to access remote and local databases and different formats, together with compliance with open systems and languages, is necessary for a tool that aims to facilitate the integration of multisource heterogeneous spatial data. The diversity of formats and modeling languages and the distribution of spatial data across the Internet demand an open web-based tool. To meet these requirements, a web-service architecture proved to be an appropriate solution (Newcomer 2002). Geo-web services (GWS) are Web services with spatial interests (Aditya 2003), which are designed to support spatial interoperability across the Web (W3C 2007).

Capitalizing on the GWS concepts, the tool proposes an approach in which the three levels of interoperability (syntactic, schematic and semantic) can be addressed and ensured. Many of the spatial and aspatial aspects of spatial data sets represent syntactic, schematic and semantic aspects of geographic phenomena. Syntactic specifications define the storage forms and structure of spatial data sets. The schematic aspect of spatial data sets specifies the logical and physical structure of the data content and spatial databases; it also defines the aspatial information content and the relations between spatial features. The conceptual design and definition of data sets, and the definition of conceptual entities and relations, are influenced by the semantic specification of data sets.

To design and develop the tool, GeoTools (2006) has been utilized for the development of the necessary functions. GeoTools is a set of libraries with spatial capabilities. uDig (2007) has also been chosen to provide a visualization environment with some GIS capabilities. uDig is an open-source spatial data viewing and editing environment, with special emphasis on the OpenGIS standards for Internet GIS, the Web Map Server and Web Feature Server standards. uDig collaborates with GeoTools for Web Map Service (WMS) and Web Feature Service (WFS) support.

5.1. System architecture and design

uDig has been developed with a strong emphasis on supporting the public standards being developed by the OGC (OpenGIS Consortium), with a special focus on the WMS and WFS standards. One of the most remarkable strengths of the uDig project has been the attempt to leverage as much existing technology as possible in the development process. As a result, uDig has become a case study in many of the latest modular software development tools in the business community, including the Eclipse RCP (2007). uDig inherits its extreme modularity from Eclipse (Garnett 2005). The functionalities of GeoTools are also utilized within Eclipse. GeoTools' functionalities allow developers to focus on the creation of a GIS user interface, especially utilizing WMS and WFS.

The Eclipse RCP (Rich Client Platform) is structured around the concept of plug-ins that can only be extended and customized through extending existing plug-ins (Garnett 2004). To facilitate this process, the Eclipse RCP provides many core plug-ins on which a uDig application can be built. The integration tool has been developed by extending the uDig user interface plug-in. In the execution of the integration tool, uDig is responsible for the provision of the interface features and running environment, whereas Eclipse is the development environment (Figure 4).

Figure 4. The interaction of development and user interface environments.

The interaction between Eclipse and uDig is illustrated in Figure 4, and Figure 5 outlines the main components of the integration tool as a whole.

Figure 5. Components of integration tool.

The tool evaluates multisource spatial data sets against many measures. The measures can be defined based on the technical and nontechnical requirements of the jurisdiction. The aim can be to compare many data sets or to validate data sets against predefined standard measures. In this case, many technical and nontechnical issues, including format, metadata availability, spatial extent, accuracy and restrictions on data, are selected. The major sources of information on these issues are the metadata and the data set itself. A Java class has been developed to parse XML metadata and also to extract many different types of information from data sets. If there is no incompliancy among data sets and there is no restriction on data use, a report is provided to the user with the characteristics of the data sets; otherwise, the tool identifies the items of incompliancy and proposes an available solution for the incompliancy in terms of guidelines or standards. If this is the case, users can amend the data and attempt the test again.

To implement the above-mentioned structure, five major classes have been developed within the integration plug-in: data access (data-accessMethod), metadata parser (metadata-parserMethod), comparison (data-compareMethod), reporting (reportMethod) and display (displayMethod) classes (Figure 6).

Figure 6. Main classes of integration tool.

The data access class is responsible for connecting to the data source and obtaining data and metadata if available. This class saves information about the data. It also sends metadata information, including its source address, to the metadata parser class. Information on the data format and source is also stored and is passed to the display class if data display is needed. The metadata parser class parses the metadata and, based on the list of measures, extracts the corresponding information from it. The extracted information is then passed to the data-comparing class. This class acquires information from the data access class and measures the data sets based on the criteria. The outcomes of this class provide information for the reporting class to prepare a report on items of incompliancy. The reporting class also proposes guidelines to the user for further amendments of the data. This class invokes the display method if there is no incompliancy between data sets. The display class obtains data set source information from the data access class and displays the data in the uDig display environment.

The data access, reporting and display classes contain user interfaces in the uDig environment, whereas all classes contain background code running in the Eclipse environment.
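
The division of responsibilities described above can be summarized as Java skeletons. The five class names mirror those given in the text (data access, metadata parser, comparison, reporting and display), but the method signatures, parameter types and return types are assumptions made for illustration; the article does not document the tool's actual API.

```java
import java.util.List;
import java.util.Map;

/** Skeletons of the five classes of the integration plug-in; signatures are illustrative only. */
class DataAccessMethod {
    /** Connects to a local or remote source and returns a handle on the data set. */
    Map<String, Object> loadDataset(String format, String location) { /* ... */ return Map.of(); }

    /** Returns the address of the accompanying XML metadata, if any. */
    String metadataLocation(String location) { /* ... */ return location + ".xml"; }
}

class MetadataParserMethod {
    /** Parses XML metadata and extracts the values needed for the configured measures. */
    Map<String, String> parse(String metadataLocation, List<String> measures) { /* ... */ return Map.of(); }
}

class DataCompareMethod {
    /** Compares extracted characteristics against the measures and lists items of incompliancy. */
    List<String> compare(Map<String, Object> data, Map<String, String> metadata,
                         Map<String, String> measures) { /* ... */ return List.of(); }
}

class ReportMethod {
    /** Builds a report of characteristics and incompliancy items, with assigned guidelines. */
    String buildReport(List<String> incompliancyItems, Map<String, String> guidelines) {
        /* ... */ return "";
    }
}

class DisplayMethod {
    /** Displays homogeneous data sets in the uDig map view. */
    void display(List<Map<String, Object>> datasets) { /* ... */ }
}
```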

5.2. User interfaces

uDig and Eclipse are highly compatible: any code written in Eclipse against the uDig user interface plug-in extension point is executed in the uDig environment. The integration tool implements several steps to collect information on data sets, compare the data sets, provide a report to the user and finally collate the data sets.

5.2.1. Collecting information

The very first user interface of the integration tool that is run from uDig is a window which collects information on data sets. The window asks the user for information on the format, the source of the data, the metadata standard and its location if applicable, and finally any restriction that the user is aware of. At this stage the user is able to enter as many data sets as required. This preliminary information is saved for each data set and retrieved as required.

The system has been developed to support many different formats, including ESRI's shapefile and TIFF images, and services including WFS and WMS. Based on the format entry, the system provides the user with tools to browse data in the local or remote database.
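
For a local shapefile, the kind of information the data access step needs (coordinate reference system/datum, geographical extent and attribute schema) can be read with the GeoTools library along the following lines. This is a minimal sketch assuming a reasonably recent GeoTools release (package names and factory parameters vary between versions), and the file name is hypothetical; it is not the code of the tool itself.

```java
import java.io.File;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.simple.SimpleFeatureSource;
import org.geotools.geometry.jts.ReferencedEnvelope;
import org.geotools.referencing.CRS;
import org.opengis.referencing.crs.CoordinateReferenceSystem;

public class ShapefileInspector {
    public static void main(String[] args) throws Exception {
        // Hypothetical local data set; any shapefile path could be used instead.
        Map<String, Serializable> params = new HashMap<>();
        params.put("url", new File("lga_boundaries.shp").toURI().toURL());

        DataStore store = DataStoreFinder.getDataStore(params);
        String typeName = store.getTypeNames()[0];
        SimpleFeatureSource source = store.getFeatureSource(typeName);

        // Datum / coordinate reference system of the data set.
        CoordinateReferenceSystem crs = source.getSchema().getCoordinateReferenceSystem();
        Integer epsg = CRS.lookupEpsgCode(crs, true);
        System.out.println("CRS: " + crs.getName() + " (EPSG:" + epsg + ")");

        // Geographical extent, usable for the extent measure.
        ReferencedEnvelope bounds = source.getBounds();
        System.out.println("Extent: " + bounds);

        // Attribute schema, usable for aspatial (attribute) measures.
        source.getSchema().getAttributeDescriptors()
              .forEach(a -> System.out.println("Attribute: " + a.getLocalName()));

        store.dispose();
    }
}
```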

Another part of the system asks about the metadata standard for the XML metadata. At this stage, only XML metadata is readable by the system, and other formats such as text and HTML are not compliant with the integration tool. Five major metadata standards, including ISO/TC211 (ISO Technical Committee 211), CEN/TC287, Australia and New Zealand's ANZLIC (Australia and New Zealand Land Information Council), Europe's DCMI (Dublin Core Metadata Initiative) and the US CSDGM (Content Standard for Digital Geospatial Metadata), have been fed into the system, and the system is able to match any of these metadata standards with the corresponding schema. The schemas have been developed and customized based on the information required for integration purposes. These steps are handled by the data-accessMethod class. The information collected at this step is saved for the next steps.
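
One simple way to realize the matching of a declared metadata standard to its parsing schema is a lookup table, as sketched below. Apart from anzmeta1.3.dtd, which is named in the use test, the schema file names are placeholders rather than the files actually shipped with the tool.

```java
import java.util.Map;

/** Hypothetical mapping from supported metadata standards to customized parsing schemas. */
public final class MetadataSchemas {

    public static final Map<String, String> SCHEMA_BY_STANDARD = Map.of(
            "ISO/TC211", "iso19115-integration.xsd",   // placeholder file name
            "CEN/TC287", "centc287-integration.xsd",   // placeholder file name
            "ANZLIC",    "anzmeta1.3.dtd",
            "DCMI",      "dublincore-integration.xsd", // placeholder file name
            "CSDGM",     "csdgm-integration.dtd");     // placeholder file name

    /** Returns the schema used to parse metadata written against the given standard. */
    public static String schemaFor(String standard) {
        String schema = SCHEMA_BY_STANDARD.get(standard);
        if (schema == null) {
            throw new IllegalArgumentException("Unsupported metadata standard: " + standard);
        }
        return schema;
    }

    private MetadataSchemas() { }
}
```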

5.2.2. Comparing data sets

Both the actual data and the metadata contain information required for the data integration assessment. Information including geographical extent, format, attribution and datum can be collected from the actual data. Depending on its standard, metadata also contains much information on the data, including custodianship, jurisdiction of the data, completeness, currency, any restrictions on the data, pricing, scale, and attribute and spatial accuracy (Figure 7). These characteristics are essential for data integration; therefore, comparing these characteristics determines the ability of the data sets to be integrated.

The metadata-parserMethod class parses the XML metadata and extracts the necessary information. This task is carried out by utilizing XML schemas that have been developed based on the required information and metadata standards. The information is then input to the data-compareMethod class. This class also extracts information from the actual data and compares them to identify the items of incompliancy. The result of this step is passed to the reporting class to form the report.

Figure 7. Integration measures.
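
The parsing step itself can be carried out with the standard Java XML APIs. The sketch below pulls a restriction statement and a bounding box out of an XML metadata record using XPath; the element names are assumptions chosen for illustration, since the customized schemas used by the tool are not reproduced in the article.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.w3c.dom.Document;

public class MetadataParser {

    /** Extracts a few measure-related values from an XML metadata record. */
    public static Map<String, String> extract(File metadataFile) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(metadataFile);
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> values = new HashMap<>();
        // Element names below are illustrative; the real schema-specific paths would be
        // configured per metadata standard (ANZLIC, ISO/TC211, CSDGM, ...).
        values.put("restrictions", xpath.evaluate("//accconst", doc));
        values.put("westBoundLongitude", xpath.evaluate("//westbc", doc));
        values.put("eastBoundLongitude", xpath.evaluate("//eastbc", doc));
        values.put("northBoundLatitude", xpath.evaluate("//northbc", doc));
        values.put("southBoundLatitude", xpath.evaluate("//southbc", doc));
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical metadata file accompanying the data set.
        extract(new File("lga_boundaries.xml"))
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```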

5.2.3. Reporting to the user

The results of the comparison form a report with the list of data set characteristics. The items of incompliancy are also highlighted in the report. The report also provides the user with some guidelines or documents to overcome the incompliancy. The reportMethod class is responsible for forming the report (Figure 8).

Figure 8. Integration assessment report.

The report comprises two sections: it contains detailed information on the data sets and their characteristics, as well as a list of incompliancy items. The report assigns an available instruction to the user to overcome the incompliancy items. If there is no incompliancy left and there is no restriction on data use, the data sets can be collated and displayed.
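
A possible shape for the reportMethod logic is sketched below: characteristics are listed first, then each incompliancy item is paired with an available guideline or standard. The guideline titles (other than the GDA technical manual, which the use test mentions) and the report layout are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class IntegrationReport {

    /** Hypothetical mapping of incompliancy items to available guidelines or standards. */
    private static final Map<String, String> GUIDELINES = Map.of(
            "datum", "GDA technical manual (ICSM 2002)",
            "attribute-null", "Jurisdictional attribute completion guideline (placeholder)",
            "format", "Accepted formats specification (placeholder)");

    /** Builds a two-part report: data set characteristics, then incompliancy items with guidance. */
    public static String build(Map<String, String> characteristics, List<String> incompliancy) {
        StringBuilder report = new StringBuilder("Data set characteristics\n");
        characteristics.forEach(
                (k, v) -> report.append("  ").append(k).append(": ").append(v).append('\n'));

        report.append("Items of incompliancy\n");
        if (incompliancy.isEmpty()) {
            report.append("  none - data sets can be collated and displayed\n");
        } else {
            for (String item : incompliancy) {
                report.append("  ").append(item)
                      .append(" -> ").append(GUIDELINES.getOrDefault(item, "no guideline available"))
                      .append('\n');
            }
        }
        return report.toString();
    }
}
```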

5.2.4. Data collation

The displayMethod class receives the formats and sources of the homogeneous data. This class possesses methods to read and display data in different formats. uDig provides the display and basic GIS tools. Figure 9 shows an example of data collation as the final step of the integration tool.

Figure 9. Graphical data integration (uDig interface).

5.2.5. Use test

To provide a realistic and practical example of the capabilities of the system, the tool has been applied to data sets from the state of Victoria, Australia. These data sets have been selected from fundamental data sets, which are less heterogeneous; utilizing business-oriented data sets may show more inconsistency with the rules and measures, as businesses follow their own standards and tools for data management.

The user points to a data set (local council administrative boundaries) and provides a metadata source based on the ANZLIC metadata profile. The system extracts the information from the data and metadata and cross-checks it against the jurisdiction's (State of Victoria) specifications. Based on the predefined specifications (for testing the tool), the datum of compliant data should be GDA94 (Geocentric Datum of Australia) within the geographical extent of Victoria, the restriction on data was set to 'completely restricted', and a specific attribute's value (jurisdiction) could not be 'null'.

The data set was tested and the outcome showed that the geographical extent is the same as the tool's configuration, but the datum was WGS84, the restriction was 'subject to license' and many features had a null value for the attribute. For the datum, the GDA technical manual (ICSM 2002) was proposed as the relevant specification, and the null values were reported. The restriction, as it was at a lower level of restriction, was accepted. Table 3 summarizes the rules that were set as default rules and the characteristics of the data set, and highlights the items of incompliancy with the defined rules.
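
The essence of this test can be written as a small, self-contained check in which the jurisdiction-defined values are compared with the values extracted from the data set, so that only the datum and the null attribute values are flagged. The literal values follow the test description and Table 3; the ordering of restriction levels and the count of null attributes are illustrative assumptions.

```java
import java.util.List;

public class VictorianUseTest {

    // Assumed ordering of restriction levels: lower index means less restrictive.
    private static final List<String> RESTRICTION_LEVELS =
            List.of("unrestricted", "subject to license", "completely restricted");

    public static void main(String[] args) {
        // Jurisdiction-defined rules (taken from the test description and Table 3).
        String requiredDatum = "GDA94";
        String acceptedRestriction = "completely restricted";   // upper bound on restrictiveness
        String nonNullAttribute = "jurisdiction";

        // Characteristics extracted from the tested data set.
        String datum = "WGS84";
        String restriction = "subject to license";
        long nullAttributeCount = 37;   // hypothetical count of features with a null 'jurisdiction'

        // Datum check: incompliant, so the GDA technical manual (ICSM 2002) would be proposed.
        if (!datum.equals(requiredDatum)) {
            System.out.println("Datum incompliancy: " + datum + " instead of " + requiredDatum);
        }

        // Restriction check: accepted, because the data set sits at a lower restriction level.
        if (RESTRICTION_LEVELS.indexOf(restriction)
                <= RESTRICTION_LEVELS.indexOf(acceptedRestriction)) {
            System.out.println("Restriction accepted: " + restriction);
        }

        // Attribute check: null values in 'jurisdiction' are reported.
        if (nullAttributeCount > 0) {
            System.out.println(nullAttributeCount + " features have a null '"
                    + nonNullAttribute + "' attribute");
        }
    }
}
```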

Table 3. The test result of validating a data set.

Measures                         Jurisdiction-defined                        Data set                           Incompliancy
Accepted formats                 ESRI shapefile, ESRI coverage,              MapInfo TAB                        X
                                 WFS/WMS, ERMapper ECW
Metadata standard and            ANZLIC's profile, anzmeta1.3.dtd            ANZLIC's profile, anzmeta1.3.dtd   ✓
DTD/schema
Policy agreement/restrictions    Privacy policy                              Subject to license                 ✓
Geographical extent              Australia's boundaries                      Australia, Victoria                ✓
Datum                            GDA_1994                                    WGS84                              X
Currency                         Not older than one month                    Two weeks old                      ✓
Minimum and maximum scale        1:1000 to 1:1,000,000                       1:25,000                           ✓
Completeness                     Complete                                    Complete                           ✓
User-defined rule                No 'null' accepted for attributes           Some 'null' attributes             X

The result of the use test showed that, given the same list of measures and rules to evaluate the data sets, users spend much more time when they have to manually investigate data content, run queries and extract information from metadata and data. The use test also indicated that in some cases of the manual test, participants took different approaches, caused by different interpretations of the rules. For example, the geographical extent can be defined by the maximum and minimum coordinates of the data features, but in some cases participants used the jurisdiction that the data belong to in order to specify the geographical extent. The use test identified two major advantages: the time consumed for data evaluation (and, as a consequence, the money spent) and the comprehensiveness of the assessment.

6. Discussion

The tool test showed that the process of harmonization can be further automated and facilitated with the integration tool. The tool reduces the manual, resource-intensive, time-consuming and cumbersome process of data investigation and validation. This is especially crucial for agencies and users who try to overcome the inconsistency and heterogeneity among multisource data sets.

Many of the technical and nontechnical issues require the investigation of the data content together with the metadata and data specification, which takes time. It is also not always practical because of the difficulty of accessing the documents. With a central tool that is customized for the jurisdiction-defined rules and documentation, the process of validating data for integration can be automated much more easily. The standardized and routine process proposed by the tool also provides a consistent approach to evaluating different data sets.

In the process of development, the availability of resources, including rich metadata, remains a challenge in some jurisdictions. In some cases, information such as spatial and aspatial accuracies is not accessible and requires logical and statistical analyses of the data, which should be developed for the tool. Also, to be widely used by different jurisdictions, a metadata schema for each jurisdiction (where one is not available) and a metadata schema conversion tool should be developed. In this regard, the data access module also requires more development to be able to access and obtain data in more formats, including raster formats, and services, including Web Coverage Services (WCS).

Many of the technical and nontechnical issues, including geographical extent and datum, can be measured utilizing analysis and query tools; however, some of them, including logical consistency and restrictions, are not easily measurable. To automate the evaluation of these items, machine-readable documents are highly helpful. Another issue raised during the implementation phase was the measurability of the metadata content, which supports the assessment process. Some metadata content, including accuracies, privacy policy and restrictions, is kept in the form of descriptive text, which is not easily comparable with another value. If the metadata content were measurable, this would help the many analyses and assessments that require the evaluation and measurement of metadata content.

The content and availability of metadata is another issue, as data providers do not necessarily provide appropriate metadata content. Within the SDI context, this has been addressed by the establishment of custodianship arrangements. Custodians are data provider agencies, or agencies that have been assigned by data providers to coordinate and maintain data and metadata. An effective custodianship arrangement that considers these issues could ensure the provision of appropriate metadata. From an institutional perspective, a rigorous custodianship agreement between data providers and owners that obliges them to provide a certain data content will also assist effective data integration. Privacy, restrictions, metadata and pricing documents (in the form of XML or other machine-readable structures) can greatly facilitate not only data evaluation but many other processes, including data discovery, data use and sharing. Comprehensive data validation and harmonization requires information on different technical and nontechnical characteristics, barriers and enablers. This cannot be fully achieved unless this information is available in the form of documents and specifications.

One of the major problems that the users of multisource spatial data sets encounter is the lack of an automatic, structured approach to investigate the heterogeneity among data sets. The tool can play a significant role in the integration of multisource spatial data sets. It can also help users and agencies that rely on different sources of spatial data to harmonize heterogeneous spatial data sets through structured data validation and harmonization.

7. Conclusion

Spatial data integration and harmonization is an important process for many data users and providers. The complex process of data integration can be greatly facilitated if part or all of the process is implemented automatically. This also assists in establishing interoperability among different services and data sets.

In this regard, a data integration tool has been proposed that incorporates guidelines and instructions for incompliancy items and provides a holistic system for effective data evaluation that facilitates data integration. The tool utilizes many rules and measures to validate the data sets.

The tool also provides functionality to evaluate the harmonization of the data sets; this considerably helps practitioners to implement interoperability among data, services and jurisdictions. The tool provides a set of rules that evaluate data sets against many measures representing various aspects of the data sets. These aspects affect spatial data integration and are quite significant for effective data integration. By utilizing this tool, practitioners are able to automate the process of data validation. The tool is not limited to technical issues but can also adopt jurisdiction contexts, standards and specifications; it is a further step in the aim of SDIs to facilitate the use of data sets to their maximum potential.

The introduced approach can be improved if more rules and measures for other aspects of the data sets are formulated and added to the tool. This can be done partly through the supporting documents, including metadata.

References
Aditya, T., 2003. Semantic and interoperability in GeoWeb services. Master's Thesis. ITC, Enschede.
ANZLIC, 2004. ANZLIC strategic plan 2005–2010 – milestone 5: national framework data themes [online]. ANZLIC. Available from: http://www.anzlic.org.au/get/2442847451.pdf [Accessed 27 September 2007].
ASDD, 2007. Australian spatial data dictionary, GeoScience Australia [online]. Available from: http://asdd.ga.gov.au/ [Accessed 4 May 2007].
Axmann, 2008. Feature manipulation engine, Vienna [online]. Available from: http://www.axmann.at/downloads/fme_documents/fme_intro.pdf [Accessed 12 May 2008].
Buch, V., 2002. Database architecture: federated vs. clustered [online]. Oracle Corporation. Available from: http://www.oracle.com/technology/tech/windows/rdbms/ClusterComp.pdf [Accessed 2 May 2008].

Burgess, W.S., 1999. Vertical integration of spatial data. In: M.D.O. N., William S. Burgess, ed. Resources. The Maryland State Government Geographic Information Coordinating Committee (MSGIC) and Maryland Local Government GIS Committee (MLOGIC).
Chen, C.-C., et al., 2003. Automatically annotating and integrating spatial datasets. International symposium on spatial and temporal databases, Santorini Island, Greece.
de Vries, M., 2005. Recycling geospatial information in emergency management situations: OGC standards play an important role, but more work is needed [online]. Available from: http://www.directionsmag.com/article.php?article_id=2019 [Accessed 22 May 2006].
Eclipse RCP, 2007. Eclipse rich client platform, the Eclipse Foundation [online]. Available from: http://www.eclipse.org/community/rcp.php [Accessed 21 June 2007].
Edwards, D. and Simpson, J., 2002. Integration and access of multi-source vector data. Symposium on geospatial theory, processing and application, Ottawa, Canada.
Fonseca, F.T., 2001. Ontology-driven geographic information systems. Thesis (PhD). The University of Maine.
Garnett, J., 2004. Data access developer's guide for uDig. Victoria, BC, Canada: Refraction Research Inc.
Garnett, J., 2005. User-friendly desktop internet GIS. Miles virtual seminar – free software, geoinformatics and environmental management information systems at the local level [online]. Colombo, Sri Lanka, November 2005. Available from: http://udig.refractions.net/docs/VSpaperuDig.pdf [Accessed 20 June 2007].
GeoTools, 2006. GeoTools – the open source Java GIS toolkit [online]. Codehaus Foundation. Available from: http://geotools.codehaus.org/ [Accessed 16 June 2007].
Gerasimtchouk, R. and Moyaert, L., 2007. Integrating CADD and GIS data. BAAMA, Mineta San José International Airport.
Goodchild, M.F., et al., 2005. Uncertainty and interoperability: the areal interpolation problem. In: L. Wu, et al., eds. Fourth international symposium on spatial data quality (ISSDQ 05), Beijing.
Gupta, A., et al., 1999. Integrating GIS and imagery through XML-based information mediation (Lecture notes in computer science). Berlin/Heidelberg: Springer.
Haas, L. and Lin, E., 2002. IBM federated database technology [online]. IBM. Available from: http://www-128.Ibm.Com/Developerworks/Db2/Library/Techarticle/0203haas/0203haas.Html [Accessed 23 February 2008].
Hakimpour, F., 2003. Using ontologies to resolve semantic heterogeneity for integrating spatial database schemata. Thesis (PhD). Zurich University, Zurich.
ICSM, 2002. GDA technical manual (Version 2.2 – 11 February 2002), Intergovernmental Committee on Surveying & Mapping (ICSM) [online]. Australia. Available from: http://www.icsm.gov.au/gda/gdatm/index.html [Accessed 10 July 2006].
Knoblock, C., Shahabi, C., and Wilson, J., 2003. Rapid integration of online and geospatial data sources for knowledge discovery. Geospatial visualization and knowledge discovery workshop, Lansdowne, VA.
Miller, P., 2006. Interoperability [online]. Available from: http://www.ariadne.ac.uk/issue24/interoperability/intro.html [Accessed 18 May 2006].
Miller, L.L. and Nussar, S., 2003. An infrastructure for supporting spatial data integration. Federal conference on statistical methodology, Washington, DC.
Mills, L. and Paull, D., 2007. PSMA Australia's spatial data warehouse, Canberra, Australia, PSMA [online]. Available from: http://www.psma.com.au/resources/psma-australias-spatial-data-warehouse-information-paper [Accessed 1 June 2007].
Mohammadi, H., 2006. Integration of built and natural environmental data: international case study country reports [online]. Melbourne, Geomatics Department, The University of Melbourne. Available from: http://www.geom.unimelb.edu.au/research/SDI_research/Integrated/publications.html [Accessed 12 May 2008].
Mohammadi, H., et al., 2006. Bridging SDI design gaps to facilitate multi-source data integration. Coordinates, 2, 26–29.
Mohammadi, H., et al., 2007. Spatial data integration challenges: Australian case studies. Spatial Sciences Conference 2007, Hobart, Australia, 14–18 May 2007.
Newcomer, E., 2002. Understanding web services (Independent technology guides). Boston: Addison-Wesley, 332.
Peedell, S., Friis-Christensen, A., and Schade, S., 2005. Approaches to solve schema heterogeneity at the European level. 11th EC GI and GIS 2005.

Pinto, G.D.R.B., et al., 2003. Spatial data integration in a collaborative design framework. Communications of the ACM, 46, 86–90.
Sen, S., 2005. Semantic interoperability of geographic information. GIS Development, 9, 18–21.
Syafi'i, M.A., 2006. The integration of land and marine spatial dataset as part of Indonesian SDI development. 17th UNRCC-AP, Bangkok, Thailand, 18–22 September 2006.
uDig, 2007. User-friendly desktop internet GIS (uDig), Refraction Research [online]. Available from: http://udig.refractions.net/confluence/display/UDIG/Home [Accessed 06 July 2007].
Uitermark, H.T.J.A., 2001. Ontology-based geographic data set integration. Deventer: Twente University.
Uschold, M. and Gruninger, M., 1996. Ontologies: principles, methods and applications. Knowledge Engineering Review, 11, 93–156.
Visser, U., et al., 2002. Ontologies for geographic information processing. Computers and Geosciences, 28, 103–117.
W3C, 2007. World Wide Web Consortium [online]. Available from: http://www.w3.org/ [Accessed 17 October 2007].
Zaslavsky, I., Memon, A., and Memon, G., 2005. Data integration within cyberinfrastructure project. GIS Development, 9, 28–31.
