


DATA INTEGRATION ACROSS BORDERS: A CASE STUDY OF THE ABBOTSFORD-SUMAS AQUIFER (BRITISH COLUMBIA/WASHINGTON STATE)1

Nadine Schuurman, Aparna Deshpande, and Diana M. Allen2

ABSTRACT: Integrating spatial datasets from diverse sources is essential for cross-border environmental investigations and decision-making. This little-investigated topic has profound implications for the availability and reliability of spatial data. At present, ground-water hydrostratigraphic models exist separately for the Canadian and the United States (U.S.) portions of the aquifer, but few are integrated across the border. In this paper, we describe the challenges of integrating large, multiple-source datasets to develop a ground-water hydrostratigraphic model for the Abbotsford-Sumas Aquifer. Growing concerns in Canada about excessive withdrawal south of the border, and in the U.S. about nitrate contamination originating north of the border, make this aquifer one of international interest. While much emphasis in GIScience is on theoretical solutions to data integration, such as current ontology research, this study addresses pragmatic ways of integrating data across borders. Numerous interoperability challenges, including the availability of data, metadata, data formats and quality, database structure, semantics, policies, and cooperation, are identified as inhibitors of data integration for cross-border studies. The final section of the paper outlines two possible solutions for standardizing classification schemes for ground-water models once data heterogeneity has been addressed.

(KEY TERMS: interoperability; data integration; cross-border studies; ground water.)

Schuurman, Nadine, Aparna Deshpande, and Diana M. Allen, 2008. Data Integration Across Borders: A Case Study of the Abbotsford-Sumas Aquifer (British Columbia/Washington State). Journal of the American Water Resources Association (JAWRA) 44(4):921-934. DOI: 10.1111/j.1752-1688.2008.00192.x

INTRODUCTION

Environmental phenomena do not follow political boundaries, and managing them requires data from multiple sources, often from multiple jurisdictions. Consequently, there is a pressing need for integrated datasets that can be used to model potential impacts and develop strategies for mitigation or adaptation. Although such cross-jurisdictional studies depend on the availability of integrated datasets, little literature exists on the experiences and challenges of low-budget organizations that integrate datasets for cross-border studies.

Data integration refers to the process of taking data from disparate sources, which may have been collected under very different circumstances, and combining them into a unified database that can be used to answer specific questions about a phenomenon or to create a model. Although data integration appears straightforward, it can be very complex, because it entails making decisions about meaning (semantics), classification hierarchy (schematics), and the relationship of the data field to others in the database (syntax). Each of these issues is described in this paper. We also describe emerging research on ontologies of data (the interpretation of data meaning in relation to context) as well as issues of cognition.

1Paper No. JAWRA-07-0014-P of the Journal of the American Water Resources Association (JAWRA). Received January 24, 2007; accepted November 6, 2007. © 2008 American Water Resources Association. Discussions are open until February 1, 2009.

2Respectively, Associate Professor, Department of Geography, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia V5A 1S6; MSc Geography, Department of Geography, Simon Fraser University, Burnaby, British Columbia V5A 1S6; and Associate Professor, Department of Earth Sciences, Simon Fraser University, Burnaby, British Columbia V5A 1S6 (E-Mail/Schuurman: [email protected]).

JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION 921 JAWRA

JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION

Vol. 44, No. 4 AMERICAN WATER RESOURCES ASSOCIATION August 2008

Current data integration research focuses on an object-oriented computing paradigm and offers possible technological solutions. It does not, however, address the needs of organizations whose data are maintained in relational database formats. Data heterogeneity is unresolved even within single jurisdictions, and it is further exacerbated in cross-border studies, where different data cultures and collection strategies exist.

In this paper, the transnational Abbotsford-Sumas Aquifer serves as an example for studying the data integration challenges encountered while gathering and interpreting the lithologic data needed to construct a hydrostratigraphic model of the aquifer. (Lithologic refers to the mineral composition and grain-size characteristics of rocks and sediments. A hydrostratigraphic model of an aquifer represents the spatial distribution of geologic units, categorized by their potential for transmitting and storing ground water.) Developing the hydrostratigraphic model (Allen et al., 2007) is an important step toward understanding the ground-water resource and designing management strategies. The paper addresses the multifaceted challenges that organizations face when integrating datasets, ranging from technical to institutional problems. Much current emphasis in GIScience is on theoretical issues such as ontologies (Fonseca et al., 2000; Kokla and Kavouras, 2001; Visser et al., 2002a). This paper instead emphasizes pragmatic solutions for data integration across borders.

The paper starts with an overview of data integration research and a description of the Abbotsford-Sumas Aquifer study area. The data and methodologies are then described in detail. Finally, we briefly describe two standardization solutions, developed in the province of British Columbia (BC), for standardizing cleaned data as a final step in data integration.

DATA INTEGRATION: A BRIEF OUTLINE

Data integration is at the core of geographic information systems (GIS) technology. Data fuel the GIS industry, yet they are also the source of most interoperability problems. Integration falls under the broader rubric of interoperability, which means finding ways to make computing systems compatible. Data interoperability research has focused on issues such as standardization, spatial data infrastructures (SDIs), database integration, and semantics. Data interoperability issues can be traced back to the 1970s and 1980s, when data were maintained in proprietary formats (Guptill, 1991; Bishr, 1998; Sondheim et al., 1999) and were the purview of government organizations. Multiple agencies collected and maintained data to meet specific departmental needs, which produced redundant datasets. To reduce waste, government organizations around the world initiated standardization activities and developed national data transfer standards (Moellering, 1991; Salge, 1999). National standards had limitations for transnational issues. These limitations prompted application-specific international groups to develop de jure standards (Salge, 1999), such as the Geographic Data File, the Digital Geographic Information Exchange Standard, the OpenGIS® specifications, and ISO standards (ISO 19115). Although these standards provided a method for exchanging information, spatial data transfer standards formalized only the lexicon and syntax (Kuhn, 1994). They lacked semantic translation, so users could share information but not meaning (Kuhn, 1994). Despite these deficiencies, standards were regarded as key to the integration process (Guptill, 1999), and they provided the building blocks of national SDIs (Hogan and Sondheim, 1996; Taylor, 1996).

There has been sustained academic research on data integration for two decades, yet no consensus on an approach has emerged (Widom, 1996). Researchers are studying sophisticated techniques based on the object-oriented paradigm, such as the mediator approach, the federated approach, ontologies, and linguistic approaches (Sheth and Larson, 1990; Kashyap and Sheth, 1996; Fonseca et al., 2000; Smith and Mark, 2001; Kokla and Kavouras, 2001; Visser et al., 2002b). However, these studies offer futuristic solutions that are difficult to sell to on-the-ground organizations. Most organizations still maintain datasets in relational format (Schuurman, 2002, 2005, 2006) and therefore continue to grapple with the problems of data integration. The complexities of data integration have also made organizations more reluctant to pursue interoperability unless it is built into off-the-shelf software (Schuurman, 2002). These object-oriented methodologies may resolve issues that require data from multiple jurisdictions, but it is not yet known when they will be available in off-the-shelf software.


Researchers have identified various types of data heterogeneity. This research adopts Bishr's (1998) classification of data heterogeneity, which distinguishes three types.

Syntactic Heterogeneity

Syntactic heterogeneity stems from the use of different data models to represent database elements, such as relational or object-oriented models, or from the use of raster or vector models (Bishr, 1998). Elevation information, for example, can be represented in raster format as Digital Elevation Models (DEMs) or as contours in vector format.

Schematic Heterogeneity

Schematic heterogeneities result from the different classification schemes used in the component databases, or from the way database elements are structured in those databases. For example, in this research the geological description in the well logs is represented by a single attribute in the BC database and by three attributes in the Washington State database. A detailed classification of schematic heterogeneities can be found in Kim and Seo (1991).
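The sketch below makes the schematic mismatch concrete. It collapses a three-attribute description into a single text field comparable to the BC attribute. It is a minimal illustration under stated assumptions. The paper does not name the three Washington State attributes, so `primary_material`, `secondary_material`, and `descriptor` are hypothetical placeholders, and so is the comma-separated output format.

```python
from dataclasses import dataclass


# Hypothetical attribute names: the paper states only that Washington State
# splits the geological description across three attributes.
@dataclass
class WaLayerDescription:
    primary_material: str
    secondary_material: str
    descriptor: str


def to_single_description(layer: WaLayerDescription) -> str:
    """Collapse the three attributes into one BC-style description field."""
    parts = (layer.primary_material, layer.secondary_material, layer.descriptor)
    # Drop empty or whitespace-only parts so the merged text has no stray commas.
    return ", ".join(p.strip().lower() for p in parts if p and p.strip())


print(to_single_description(WaLayerDescription("Sand", "Gravel", "")))  # sand, gravel
```

Merging in this direction is lossy but simple. Splitting BC's single free-text field into three attributes would instead require parsing driller vocabulary, which runs into the semantic problems described under semantic heterogeneity.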

Semantic Heterogeneity

Semantic heterogeneity occurs when there is disagreement about the meaning, interpretation, or intended use of the same or related data (Sheth and Larson, 1990). It results from the different categorizations that individuals use when conceptualizing real-world objects. Semantic heterogeneity is the most difficult data integration problem to resolve, because language is interpreted differently in different contexts. The word “range,” for instance, can refer to a “stove-top,” a “spread of numbers,” or an “animal habitat” (Schuurman, 2003). Such semantic heterogeneities have been identified as the main cause of data-sharing problems and are the most difficult to reconcile (Bishr, 1998; Vckovski, 1998; Kottam, 1999; Schuurman and Leszczynski, 2006).
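The “range” example can be expressed as a lookup keyed on both context and term rather than on the term alone. This is a minimal sketch of that idea, not a resolution method from the paper. The three senses come from the text. The context labels (`kitchen`, `statistics`, `ecology`) are hypothetical.

```python
# Senses of "range" from the example in the text; the context labels
# ("kitchen", "statistics", "ecology") are hypothetical.
SENSES = {
    ("kitchen", "range"): "stove-top",
    ("statistics", "range"): "spread of numbers",
    ("ecology", "range"): "animal habitat",
}


def resolve(term: str, context: str) -> str:
    """Look up a term's meaning within an explicitly stated context."""
    key = (context.lower(), term.lower())
    if key not in SENSES:
        # Refuse to guess: an unmapped term needs a human decision, not a default.
        raise KeyError(f"no agreed meaning for {term!r} in context {context!r}")
    return SENSES[key]


print(resolve("range", "ecology"))  # animal habitat
```

Raising an error for an unmapped pair, instead of falling back to some default sense, reflects the ad hoc, human-in-the-loop resolution the paper describes.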

Until recently, syntactic and schematic heterogeneities were the primary focus of interoperability research, which concentrated on issues related to data models and database structures (Sheth, 1999; Visser et al., 2002b). Semantic heterogeneities were, however, identified as the main cause of data-sharing problems (Bishr, 1998; Sheth, 1999). Because semantics originate in human conceptualizations of space, Harvey et al. (1999) proposed resolving semantic issues holistically, drawing on domains such as computer science, social science, cognitive science, and linguistics. Semantic challenges certainly remain the most difficult aspect of data integration (Schuurman, 2002; Schuurman and Leszczynski, 2006).

Ontologies

Mediators/wrappers and the federated approach use context to achieve semantic translation, and they have often done so through ontologies. Researchers are exploring ontologies for a variety of applications, including communication, information integration, systems engineering, and database theory (Uschold and Gruninger, 1996). Although ontologies are widely studied, there is no consensus on how to define an ontology. Its meaning instead depends on the context in which it is used (Winter, 2001). In a philosophical context, ontology concerns the existence of entities or the nature of being (Merriam-Webster Dictionary; Guarino et al., 1999). In the computer science community, an ontology is defined as “an explicit specification of a conceptualization” (Gruber, 1995) or a “shared understanding of some domain of interest” (Uschold and Gruninger, 1996).

Ontologies provide a shared understanding in the form of a vocabulary of terms (Kashyap and Sheth, 1996). For this reason, GIScience has explored them mainly as a way to resolve integration issues (Agarwal, 2005). Such work includes research by Bishr et al. (1999), Fonseca et al. (2000), Smith and Mark (2001), Kokla and Kavouras (2001), Visser et al. (2002b), Fonseca et al. (2003), and Brodeur et al. (2003, 2004). However, there is still no consensual geo-ontology (Agarwal, 2005) and no agreement on the terminology that should be used (Fonseca et al., 2000).
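A shared vocabulary can be approximated pragmatically with a crosswalk. Each jurisdiction maps its local terms to shared concept identifiers, and two records are comparable only when they map to the same concept. The sketch below is illustrative only. The local lithology terms, the concept IDs, and the pairing of "hardpan" with till are assumptions, not mappings from the study.

```python
from typing import Optional, Tuple

# Illustrative only: local lithology terms from each jurisdiction mapped to
# shared concept identifiers. Neither the terms nor the IDs come from the study.
SHARED_VOCABULARY = {"SAND", "GRAVEL", "CLAY", "TILL"}

CROSSWALK = {
    "BC": {"sand": "SAND", "gravel": "GRAVEL", "clay": "CLAY", "hardpan": "TILL"},
    "WA": {"sand": "SAND", "gravel": "GRAVEL", "clay": "CLAY", "glacial till": "TILL"},
}

# Fail fast if a crosswalk points outside the shared vocabulary.
for source_name, mapping in CROSSWALK.items():
    undefined = set(mapping.values()) - SHARED_VOCABULARY
    if undefined:
        raise ValueError(f"{source_name} maps to undefined concepts: {sorted(undefined)}")


def to_concept(source: str, local_term: str) -> Optional[str]:
    """Return the shared concept for a local term, or None if it is unmapped."""
    return CROSSWALK[source].get(local_term.strip().lower())


def same_concept(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True when two (source, term) pairs map to the same shared concept."""
    ca, cb = to_concept(*a), to_concept(*b)
    return ca is not None and ca == cb


print(same_concept(("BC", "Hardpan"), ("WA", "glacial till")))  # True
```

Unmapped terms return `None` and never match, even when two sources use the same unmapped word. This keeps agreement explicit instead of assuming that identical strings mean the same thing.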

Cognitive Approaches

Resolving semantic issues has not been the sole purview of computer scientists. Today, cognitive, linguistic, and social scientists are also investigating methods for better understanding semantic issues (Harvey et al., 1999). The cognitive approach to semantic interoperability involves understanding how people develop categorizations of space. Semantic conflicts originate in the varied categorizations that humans construct, so researchers have drawn on cognitive science theories to resolve semantic issues (Frank and Raubal, 1999; Stock and Pullar, 1999; Brodeur et al., 2003, 2004).


Interoperability research has also been carried out at two other levels. At the interface level, researchers write Application Programming Interfaces that plug and play with other applications adhering to the same specifications. At the encoding level, they use Geography Markup Language (Reed, 2004), an extension of Extensible Markup Language. Little research has examined, from a data interoperability perspective, the technical and institutional challenges that organizations face in an international setting.

Sieber (2003) pointed out the lack of research on cross-border and multinational applications in the Public Participation GIS arena. Some literature may exist on cross-border projects (Sieber, 2003), but it addresses issues other than the challenges of data integration. Organizations such as the Open Geospatial Consortium have made significant contributions to resolving interoperability issues at a conceptual level for a technically sophisticated audience. However, they offer few solutions for agencies that have limited budgets and are wedded to the relational data format. At present, the lack of support for resolving semantic heterogeneity forces users to resolve integration on an ad hoc basis. Information exchange for cross-border spatial data projects is therefore doubly complex.

AN OVERVIEW OF THE STUDY AREA: THE ABBOTSFORD-SUMAS AQUIFER

The Abbotsford-Sumas Aquifer (Figure 1) straddles the Canada-United States (U.S.) border between BC and Washington State. It serves here as a case study for exploring the interoperability challenges encountered while constructing a hydrostratigraphic model of the aquifer. It is the largest aquifer in the region (~160 km²) and is shared equally by Canada and the U.S. The aquifer supports the activities of ~200,000 people who live in the area. Ground water is used not only for drinking but also for industrial, farming, and agricultural activities (Kohut, 1987; Cox and Khale, 1999). However, these same activities have threatened the integrity of the aquifer (Ricketts, 1999). Agricultural practices and the poultry industry have led to nitrate contamination that exceeds the levels permitted by the U.S. Environmental Protection Agency (USEPA) and Health Canada (Cox and Khale, 1999). The aquifer has also been exploited extensively on both sides of the border (Cox and Khale, 1999). Canada is concerned with excessive ground-water withdrawal south of the border (Kohut, 1987), and the U.S. is concerned with ground-water contamination that may originate north of the border (Cox and Khale, 1999).

Because ground water is the primary source of water, there is a pressing need for ground-water management strategies. Strategies that ensure a sustainable source of good-quality water depend on an understanding of the ground-water resource. That understanding can be gained by developing a hydrostratigraphic conceptual model of the aquifer. The conceptual model in turn depends on the depth-specific lithological information in the water well databases on both sides of the border. However, three factors constrain the use of this lithological information: inconsistent geological descriptions (Russel et al., 1998), questionable data quality, and the lack of a standardized, integrated database.

To study integration challenges in an international setting, data from several sources in Canada and the U.S. were integrated. The integration process raised issues of data sources, metadata review, data quality assessment, data conversion, and database integration. Each issue is discussed in the context of the steps used to assemble and interpret the data needed to develop the model of the Abbotsford-Sumas Aquifer, and pragmatic solutions to each integration problem are demonstrated.

DATA SOURCES

Spatial data infrastructures maintained by various organizations in Canada and the U.S. were the primary source of datasets. Yet organizations, especially local ones, seem generally reluctant to submit information about their spatial data products and services to these SDIs. In Canada, for example, only 13 municipalities have submitted information about their data products and services. As a result, local knowledge is still required to acquire datasets in Canada, despite the establishment of the Canadian Geospatial Data Infrastructure. Appendix Table A1 lists the data sources used in this study.

Lithological information for the study area was obtained directly from the government ministries responsible for ground-water management in Canada and the U.S. In Canada, ground water and its management are a provincial, rather than federal, responsibility. Because there is no national ground-water policy, provincial governments have developed divergent ground-water policies (Piteau Associates and Turner Groundwater Consultants, 1993). In Ontario, Alberta, and Newfoundland and Labrador, for example, drillers must submit water well information to the provincial government, whereas in BC submission is currently voluntary. In the U.S., unlike in Canada, the federal government has some rights to ground water, and influence over its management is therefore delegated to various federal government organizations (Piteau Associates and Turner Groundwater Consultants, 1993). In Washington State, ground-water information is maintained by the Department of Ecology and the U.S. Geological Survey (USGS).

Acquiring datasets may seem simple. For cross-border studies, however, it can be challenging, because the political environment and organizational and institutional priorities shape data management and the subsequent integration process. For example, soil and surficial geology datasets for BC were available in paper format, whereas the Department of Ecology maintained digital copies for Washington State.

Several further factors led the USGS and the Department of Ecology to maintain duplicate lithological databases without cross-referencing. These factors were a lack of awareness of activities within the same department or across levels of government, institutional reluctance, and institutional priorities. Such obstacles are not unique to this project, and they illustrate the challenges associated with cross-border projects.

METADATA

Metadata, defined as data about data, is a key factor in enabling data sharing, because it facilitates the discovery and reuse of datasets. Metadata records, together with data catalogs, form the basis for clearinghouses, which provide an online mechanism for accessing metadata records. Without metadata, data discovery through spatial data clearinghouses may be limited.

FIGURE 1. Abbotsford-Sumas Aquifer Situated in Southwestern British Columbia, Canada, and Northwestern Washington State.

In Canada, the GeoConnections Discovery Portal is responsible for developing and maintaining the geospatial data clearinghouse. Compared with its U.S. counterpart, very few organizations have submitted information to it. To date, for example, only four ministries in BC and 13 municipalities in the entire country have submitted any information about their spatial data products. Even these organizations may lack metadata, because they are concentrating on data discovery rather than fitness for use (Tom Fulton, Ministry of Forest and Range, BC, 2004, personal communication).

Many spatial datasets that may be important for site-specific studies are maintained at the municipal level. However, many municipalities have been reluctant to submit information through the clearinghouse, which further hinders data discovery. In addition, these datasets store metadata in a variety of forms: a simple text file, an ArcGIS format that complies with the Federal Geographic Data Committee standard, a self-explanatory format, or word of mouth (Robert Regier, Township of Langley, BC, 2002, personal communication). Although metadata is an important component of GIS analysis, much of it is usually missing. Most metadata contain information on geometry but not on fitness for use (Guptill, 1999) or on attributes. In BC, for example, base maps from the Ministry of Agriculture and Lands (MAL, formerly the Ministry of Sustainable Resource Management) contain geometric information. Their attribute information, however, is stored elsewhere, and the metadata conspicuously lack direct links to it. Similarly, cadastral data maintained by the provincial government lack accuracy information and parcel information, because parcel information is maintained by the municipal government. Until recently, various levels of government maintained separate datasets, which shows how difficult it is to work across levels of government.

The absence of metadata records, which form the basis for data catalogs, is therefore a major obstacle to data sharing. First, the user is unaware that the dataset exists. Second, the user is unaware of its purpose or semantics. These metadata problems could be addressed in two ways: by studying what prevents organizations from submitting information to the clearinghouses, or by increasing political support and mandating certain aspects of metadata.
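One modest way to mandate metadata content is an automated completeness check run before a dataset is accepted. The sketch below assumes a flat dictionary per metadata record. The element names are hypothetical and simply mirror the gaps described above: geometry is present, but attribute, fitness-for-use, and accuracy details are missing. A production check would instead follow a published schema such as the FGDC Content Standard for Digital Geospatial Metadata.

```python
from typing import List, Mapping, Optional

# Hypothetical element names reflecting the gaps described in the text;
# a production check would follow a published standard such as the FGDC CSDGM.
REQUIRED_ELEMENTS = ("geometry", "attributes", "fitness_for_use", "accuracy")


def missing_elements(record: Mapping[str, Optional[str]]) -> List[str]:
    """List required metadata elements that are absent, None, or blank in a record."""
    return [name for name in REQUIRED_ELEMENTS if not (record.get(name) or "").strip()]


base_map = {"geometry": "polygon", "attributes": ""}
print(missing_elements(base_map))  # ['attributes', 'fitness_for_use', 'accuracy']
```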

DATA QUALITY

Data quality is usually measured by how well data satisfy the needs of users (Strong et al., 1997; Frank, 1998). It is also defined by fitness for use, which differs from person to person. Poor data quality is one of many issues that hinder integration, because unusable formats and a lack of detailed information can lead to misuse of datasets and erroneous conclusions. Data quality is generally assessed in terms of accuracy, precision, completeness, and consistency. Errors present in a database are frequently ignored, yet they play an important role in GIS analysis and can limit the usability of datasets. For example, drillers and consultants have traditionally used the ground-water databases only to identify successful and unsuccessful wells (Bob Symington, Gandalf Consulting Limited, BC, 2002; Robert Dickin, Gartner Lee Limited, BC, 2002, personal communication), even though the databases contain valuable lithological information.

Numerous data errors were identified and resolved before the cross-border datasets were integrated. Such errors are not unique to this case study, but they must be addressed before analysis nonetheless. They produce data inconsistencies, which cause problems during querying and analysis. Because the datasets were of extremely poor quality, repeated data quality checks were necessary.

Database Design, Data Structuring, and Their Effect on Integration

Neither dataset has data integrity rules to check data quality. In both the BC and Washington State databases, lithological, general, and locational information is distributed across different tables. Logically, all tables should store the same number of well records. However, the tables stored inconsistent numbers of well records (see Appendix, Table A2). This issue was resolved after all other errors had been corrected. Database scripts identified the wells common to the three tables and deleted the records that were not common to all of them.
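The "keep only wells present in every table" step can be sketched with Python's built-in `sqlite3` module. The original scripts, database engine, and table names are not given in the paper. Here `general`, `lithology`, `location`, and the `well_id` column are hypothetical names. The sketch computes the set of common IDs once with `INTERSECT` and then deletes every row outside it, including rows with a null ID.

```python
import sqlite3
from typing import List


def keep_common_wells(conn: sqlite3.Connection, tables: List[str],
                      id_col: str = "well_id") -> int:
    """Delete rows whose well ID is not present in every table; return rows removed."""
    # Table and column names cannot be bound as SQL parameters, so they are
    # interpolated here; pass only trusted, hard-coded names.
    intersect = " INTERSECT ".join(
        f"SELECT {id_col} FROM {t} WHERE {id_col} IS NOT NULL" for t in tables
    )
    conn.execute("DROP TABLE IF EXISTS temp.common_wells")
    conn.execute(f"CREATE TEMP TABLE common_wells AS {intersect}")
    removed = 0
    for t in tables:
        # NULL IDs never match NOT IN, so they are removed explicitly.
        cur = conn.execute(
            f"DELETE FROM {t} WHERE {id_col} IS NULL "
            f"OR {id_col} NOT IN (SELECT {id_col} FROM common_wells)"
        )
        removed += cur.rowcount
    conn.commit()
    return removed
```

For example, if `general` holds wells 1, 2, 3, `lithology` holds 1, 2, and `location` holds 2, 3, only well 2 survives in all three tables and four rows are removed. Computing the intersection first means deletions from one table cannot change which wells count as common.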

Similarly, no data integrity rules prevent null values from being entered in important fields. Locational information in the form of coordinates, for example, is central to any GIS, and without it spatial data are useless for GIS analysis. The BC database contains a table for locational information, but this crucial information is often missing for the well logs. (The Ministry of Environment [MoE] has an ongoing effort to add locational data to the water well database.) Close to 100% of the records examined in this project lacked locational information. This information was therefore obtained separately from a shapefile of digitized well locations provided by the MoE. (A shapefile is an Environmental Systems Research Institute [ESRI] vector data format.) Approximately 250 well locations had not been digitized in the original shapefile from the Ministry. Locational information for these wells was obtained from the GIS Innovations Street Network file.

Layer depth information is recorded as the depth below ground surface of the top and bottom of each geologic layer. It contained null values in ~20-30% of the well records (see Appendix, Figures A2 and A3). These wells were deleted, because they provide no useful information about geologic layering and allow only a general, ambiguous representation of it. A related problem was the lack of geologic descriptions for recorded layers (see Appendix, Figure A3), and these wells were also deleted. Missing depth information significantly reduced the usability of the database.
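The deletion rule above can be sketched as a filter over wells. It assumes that each well is a list of `(top_depth, bottom_depth, description)` tuples with `None` marking a null field. A well is dropped if any layer lacks a depth or has a blank description. Dropping wells that have no layers at all is an additional assumption not stated in the paper.

```python
from typing import Dict, List, Optional, Tuple

# One layer = (top depth m, bottom depth m, geologic description); None marks a null field.
Layer = Tuple[Optional[float], Optional[float], Optional[str]]


def is_complete(layers: List[Layer]) -> bool:
    """A well is usable only if every layer has both depths and a non-blank description."""
    return bool(layers) and all(
        top is not None and bottom is not None and bool(desc and desc.strip())
        for top, bottom, desc in layers
    )


def drop_incomplete_wells(wells: Dict[str, List[Layer]]) -> Tuple[Dict[str, List[Layer]], int]:
    """Return the usable wells and the number of wells removed."""
    kept = {wid: layers for wid, layers in wells.items() if is_complete(layers)}
    return kept, len(wells) - len(kept)
```

Returning the removal count alongside the kept wells makes it easy to report how much of the database a cleaning step discards.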

Subsurface Topography

The depth of each layer encountered during drilling is recorded from the youngest layer (top) to the oldest layer (bottom). Surface elevation is not taken into account. In both databases, the surface elevations of well logs were always represented by a null value (zero elevation) (see Appendix, Figure A4). Yet surface elevation is critical for calculating the actual elevations of the well layers and, in turn, for constructing the hydrostratigraphic model. In this study, the surface elevation of each well was extracted from the DEM with a script called Sp3dPntzVal.ave, developed by ESRI and available for download from the ESRI website. After the surface elevations (in metres above sea level) had been obtained, a Visual Basic script automatically calculated the top and bottom elevations of each unit.
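The two steps above can be sketched in a few lines: sample a surface elevation from a DEM, then convert depths to elevations. The interpolation used by Sp3dPntzVal.ave is not described here, so this sketch assumes a simple lookup of the grid cell that contains the well. The grid layout (upper-left origin, square cells, rows counted downward) and the sample values are hypothetical. The conversion itself is elevation = surface elevation - depth below ground.

```python
from typing import List, Sequence, Tuple


def sample_dem(grid: Sequence[Sequence[float]], x0: float, y0: float,
               cell: float, x: float, y: float) -> float:
    """Return the value of the DEM cell containing (x, y); (x0, y0) is the grid's upper-left corner."""
    col = int((x - x0) // cell)
    row = int((y0 - y) // cell)  # rows count downward from the top edge
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("point lies outside the DEM")
    return grid[row][col]


def layer_elevations(surface: float, depths: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert (top, bottom) depths below ground into elevations above sea level."""
    return [(surface - top, surface - bottom) for top, bottom in depths]


dem = [[50.0, 52.0],
       [48.0, 47.0]]  # hypothetical 2 x 2 grid of 10 m cells, values in metres
z = sample_dem(dem, x0=0.0, y0=20.0, cell=10.0, x=15.0, y=5.0)  # 47.0
print(layer_elevations(z, [(0.0, 3.5), (3.5, 12.0)]))  # [(47.0, 43.5), (43.5, 35.0)]
```

Bilinear interpolation between neighbouring cells would give smoother surface elevations on a coarse DEM. The cell lookup is used here only to keep the sketch short.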

Data quality parameters are an important consideration and a key component of the interoperability process. Apart from conventional data quality parameters, database errors should be considered in any environmental management project. In this study, such errors reduced the number of usable well records in the lithological databases by roughly 40% for some map sheets (see Appendix, Table A3). This highlights the poor quality of a dataset that, ironically, remains the chief source of depth-specific information. As seen in this study, data quality issues can limit the use of such databases and can cause unnecessary duplication of datasets.

DATA CONVERSION

Data from the two countries were distributed based on the level of government, jurisdictional mandates, and organizational requirements. These datasets, which were based on different datums, projections, and formats, had to be converted to a common format to facilitate data integration.

The Washington State spatial datasets were obtained from the USGS, a federal organization mandated to distribute data in Spatial Data Transfer Standard (SDTS) format. Canada follows a more liberal approach, in which the format for disseminating spatial datasets is based on jurisdictional constraints, although more sophisticated standards such as the Spatial Archive and Interchange Format (SAIF) exist (Paul Quackenbash, Ministry of Agriculture and Lands, BC, 2002, personal communication). The BC spatial datasets were obtained from MAL and distributed in Ministry of Environment and Parks format, ESRI Shape format, or SAIF format. These datasets differ not only in transfer format but also in spatial reference system. All spatial datasets were converted to ESRI Shape format with a Universal Transverse Mercator (Zone 10) reference system using Feature Manipulation Engine (FME) version 2002. Appendix Table A4 describes the inconsistencies in the spatial datasets.
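The projection step of such a conversion can be illustrated with the standard Transverse Mercator series formulas. The sketch below was written from scratch and is not taken from FME. It projects geographic coordinates onto UTM Zone 10 (central meridian 123°W) using the GRS 80 ellipsoid, which underlies NAD 83. It deliberately omits the datum shift: moving NAD 27 coordinates (Table A4) to NAD 83 requires a grid-based transformation such as NADCON, which is beyond a short example. The function name `utm_forward` is hypothetical, and in practice an established projection library or the conversion software itself is preferable.

```python
import math

# GRS 80 ellipsoid (the ellipsoid underlying NAD 83)
SEMI_MAJOR = 6378137.0
FLATTENING = 1 / 298.257222101
K0 = 0.9996              # UTM scale factor on the central meridian
FALSE_EASTING = 500000.0


def utm_forward(lat_deg: float, lon_deg: float, zone: int = 10) -> tuple:
    """Project latitude/longitude (degrees) to UTM (easting, northing) in metres.
    Northern hemisphere only; no datum shift is applied."""
    e2 = FLATTENING * (2 - FLATTENING)           # first eccentricity squared
    ep2 = e2 / (1 - e2)                          # second eccentricity squared
    lon0 = math.radians(-183 + 6 * zone)         # central meridian: -123 deg for zone 10
    phi = math.radians(lat_deg)
    lam = math.radians(lon_deg)

    sin_phi, cos_phi, tan_phi = math.sin(phi), math.cos(phi), math.tan(phi)
    n = SEMI_MAJOR / math.sqrt(1 - e2 * sin_phi ** 2)  # prime-vertical radius of curvature
    t = tan_phi ** 2
    c = ep2 * cos_phi ** 2
    a = (lam - lon0) * cos_phi

    # Meridian arc length from the equator to latitude phi
    m = SEMI_MAJOR * (
        (1 - e2 / 4 - 3 * e2 ** 2 / 64 - 5 * e2 ** 3 / 256) * phi
        - (3 * e2 / 8 + 3 * e2 ** 2 / 32 + 45 * e2 ** 3 / 1024) * math.sin(2 * phi)
        + (15 * e2 ** 2 / 256 + 45 * e2 ** 3 / 1024) * math.sin(4 * phi)
        - (35 * e2 ** 3 / 3072) * math.sin(6 * phi)
    )

    x = K0 * n * (a + (1 - t + c) * a ** 3 / 6
                  + (5 - 18 * t + t ** 2 + 72 * c - 58 * ep2) * a ** 5 / 120)
    y = K0 * (m + n * tan_phi * (a ** 2 / 2
                                 + (5 - t + 9 * c + 4 * c ** 2) * a ** 4 / 24
                                 + (61 - 58 * t + t ** 2 + 600 * c - 330 * ep2) * a ** 6 / 720))
    return x + FALSE_EASTING, y
```

On the central meridian the easting is exactly 500,000 m. Points equally far east and west of it are symmetric about that value, which gives a quick sanity check on any implementation.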

DATABASE STRUCTURE

Although the Washington State and BC databases both store lithological information, they differ in structure. The Washington State lithological database stores data in four tables, whereas the BC lithological database stores information in three tables. Table name conflicts (semantically similar tables assigned different names, or semantically different tables assigned similar names) were also observed in the databases (Table 1).

TABLE 1. Table Name Conflicts.

Washington State Database                         British Columbia Database
Tblmaterial (geology)                             Lithology (geology)
TblwellData (location + general information)      Location
TblRecovery                                       General (general information)
TblWellTest



ATTRIBUTE NAME CONFLICTS

Attribute name conflicts arise when semantically similar attributes in databases are named differently. In Figure 2, for example, the unique identifier is expressed as WELLID in the Washington State database and as WELLTAGNUM in the BC database. Similarly, the coordinate information is expressed as XUTM and YUTM in the Washington State database and as UTMEast and UTMNorth in the BC database.
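Resolving these attribute name conflicts amounts to applying a rename map before the merge. The minimal sketch below uses only the three correspondences named above; a full map would come from comparing both schemas. The constant and function names are hypothetical. A collision check keeps a record that already contains the target name from being silently overwritten.

```python
# Washington State attribute names mapped to their BC equivalents (Figure 2 examples only)
WA_TO_BC_ATTRIBUTES = {
    "WELLID": "WELLTAGNUM",
    "XUTM": "UTMEast",
    "YUTM": "UTMNorth",
}


def rename_attributes(record: dict, mapping: dict) -> dict:
    """Return a copy of record with keys renamed per mapping; unmapped keys are kept."""
    out = {}
    for key, value in record.items():
        new_key = mapping.get(key, key)
        if new_key in out:
            raise KeyError(f"renaming {key!r} collides with existing field {new_key!r}")
        out[new_key] = value
    return out
```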

Another example of naming conflicts was observed in the spatial datasets. Here, merging thematic datasets can result in information loss because the GIS software cannot resolve attribute naming conflicts. In the Washington State datasets, features were identified by a feature code (e.g., 1700200), whereas in the BC datasets features were identified by their names and subclasses (e.g., paved single lane). The feature codes are absent from the attribute table and are available from the MAL website. Resolving such problems requires finding semantically similar elements, renaming attributes, and then performing the merge. This is further complicated by the lack of semantically similar elements in the same thematic datasets, which is discussed in the semantic heterogeneity section.
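The rename-then-merge sequence can be sketched for feature tables held as lists of dictionaries. Several details are hypothetical: the code field name, the `NAME` target field and the lookup contents. The real code list would come from the published feature code table. Codes missing from the lookup are flagged rather than dropped, so the merge loses no features. The merge also refuses to run while the attribute names still differ.

```python
from typing import Dict, List


def translate_codes(features: List[dict], code_field: str,
                    lookup: Dict[str, str], name_field: str = "NAME") -> List[dict]:
    """Replace a feature-code attribute with a descriptive name attribute.
    Codes missing from the lookup are kept and flagged as UNMAPPED, not dropped."""
    out = []
    for feature in features:
        f = dict(feature)                      # copy so the input is not modified
        code = str(f.pop(code_field))
        f[name_field] = lookup.get(code, f"UNMAPPED:{code}")
        out.append(f)
    return out


def merge_layers(a: List[dict], b: List[dict]) -> List[dict]:
    """Concatenate two feature lists, refusing to merge while their attribute
    names still differ (which would otherwise lose or misalign data)."""
    schemas = {frozenset(f) for f in a} | {frozenset(f) for f in b}
    if len(schemas) > 1:
        raise ValueError(f"attribute names still differ: {sorted(map(sorted, schemas))}")
    return a + b
```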

MANY TO MANY ATTRIBUTE CONFLICTS

Many to many attribute conflicts occur when semantically similar attributes are expressed using different numbers of fields. In the Washington State database, for example, geological descriptions are expressed in three fields (Material1, Material2, and Material3), whereas the BC database uses a single field called Description (Figure 2). This was resolved by concatenating the Material1 and Material2 fields. Material3 was not included for two reasons: only the first two terms were considered representative of the dominant material types, and few records contained an entry in the Material3 field.
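The concatenation rule can be written as a short transformation. The sketch assumes that a single space separates the two terms and that empty or missing Material fields are skipped; the text specifies neither detail. The function name is hypothetical.

```python
def combine_materials(row: dict) -> str:
    """Build a BC-style Description from the WA Material1 and Material2 fields;
    Material3 is deliberately ignored, as in the text."""
    parts = [str(row.get(field) or "").strip() for field in ("Material1", "Material2")]
    return " ".join(p for p in parts if p)
```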

TABLE STRUCTURE CONFLICTS

Table structure conflicts arise when the number of attributes in the tables differs. Although the TblwellData table in the Washington State database and the Location table in the BC database are semantically similar, they store different numbers of attributes: TblwellData stores 109 attributes, while the Location table stores 19. Table conflicts also arise when similar information is stored in different numbers of tables. For example, the table TblWellMaterial in the Washington State database stores information that is distributed between two tables in the BC database (Figure 3).
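When information held in one table in one database is spread over two tables in the other, a relational join can reassemble a single table before mapping. The sketch below uses Python's built-in sqlite3 module with two hypothetical BC tables, `bc_location` and `bc_lithology`. The field names follow Figure A2 and the attribute names above, but Figure 3 may involve a different pair of tables, and the sample values are made up. A LEFT JOIN keeps lithology records that have no location row, so missing locations appear as NULL instead of silently disappearing.

```python
import sqlite3

# Hypothetical BC tables with made-up sample rows
SAMPLE_SQL = """
CREATE TABLE bc_location (WELLTAGNUM TEXT PRIMARY KEY, UTMEast REAL, UTMNorth REAL);
CREATE TABLE bc_lithology (WELLTAGNUM TEXT, UpperBound REAL, LowerBound REAL, Description TEXT);
INSERT INTO bc_location VALUES ('12345', 550000.0, 5430000.0);
INSERT INTO bc_lithology VALUES ('12345', 0, 3, 'sand'), ('12345', 3, 10, 'clay'), ('99999', 0, 4, 'gravel');
"""


def flatten_bc_tables(conn: sqlite3.Connection) -> list:
    """Join lithology and location on the well tag number, one row per layer."""
    cur = conn.execute(
        """
        SELECT lith.WELLTAGNUM, loc.UTMEast, loc.UTMNorth,
               lith.UpperBound, lith.LowerBound, lith.Description
        FROM bc_lithology AS lith
        LEFT JOIN bc_location AS loc ON loc.WELLTAGNUM = lith.WELLTAGNUM
        ORDER BY lith.WELLTAGNUM, lith.UpperBound
        """
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SAMPLE_SQL)
    for row in flatten_bc_tables(conn):
        print(row)
```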

DATA CONFLICT

Data conflicts may arise for any number of reasons. The data conflicts encountered in this research were:

• Unique identifiers were stored as a numeric data type in the Washington State database and as a string type in the BC database.

• The same data had multiple representations. Sand may be expressed as sd, sand, or snad, while clay may be expressed as cl or caly.

FIGURE 2. Attribute Conflicts.
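Both data conflicts can be handled by normalization functions applied during mapping. This sketch rests on assumptions. The synonym table contains only the variants listed above; a real table would be built by reviewing the distinct terms in the data. IDs are normalized to strings because converting string IDs to numbers could drop leading zeros, whereas any numeric ID can be represented as a string. All names are hypothetical.

```python
# Hypothetical synonym table containing only the variants reported in the text
LITHOLOGY_SYNONYMS = {
    "sd": "sand", "snad": "sand", "sand": "sand",
    "cl": "clay", "caly": "clay", "clay": "clay",
}


def normalize_well_id(value) -> str:
    """Represent well IDs as strings (the BC convention) so numeric WA IDs compare equal."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # 12345.0 -> 12345 before conversion to text
    return str(value).strip()


def normalize_term(term: str) -> str:
    """Map a spelling variant to its canonical term; unknown terms pass through lower-cased."""
    key = term.strip().lower()
    return LITHOLOGY_SYNONYMS.get(key, key)
```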

Although schematic heterogeneities are not difficult to resolve, resolving them is time-consuming. To date, no general framework has been created for enumerating and systematically classifying resolution techniques for schematic conflicts (Bishr, 1998). Various authors have suggested possible resolutions. These include a unified schema (Bishr, 1998) and object-oriented data models, which include concepts like generalization, aggregation, inheritance, and methods (Kim and Seo, 1991). As the data were maintained in relational format, these heterogeneities were resolved by mapping the Washington State database to the BC database.

Semantic heterogeneities result from differences in meaning and classifications employed in the databases and are the cause of most interoperability problems (Bishr, 1998; Sheth, 1999). Although this is an active field of research, semantic interoperability remains unresolved.

In this study, the geological descriptions used in the various lithological databases were the source of semantic interoperability issues. Reference geological information for the study area was obtained from four sources: BC Ministry of Transportation bridge construction reports, drill core records from an independent research study (Cameron, 1989), stratigraphic information from Geological Survey of Canada (GSC) reports, and well drillers' logs. The geological classification in the bridge construction reports was based on the Unified Soil Classification System. This standard, maintained by the American Society for Testing and Materials, is used for engineering purposes and is based on particle size, liquid limit, and plasticity index. The drill core record descriptions were based on the Wentworth Scale, the GSC geological descriptions were based on stratigraphy and environment of deposition, and the drillers' descriptions were based on experience or education. These semantic differences, together with the lack of standardized descriptions, resulted in 6,000 unique lithological categories for the study area. Only once the data heterogeneities have been addressed can these lithological terms be standardized.

FIGURE 3. Table Structure Conflicts.

The Final Step: Standardizing Unique Lithological Terms

This study has described in detail the challenges facing researchers who want to study cross-jurisdiction hydrological phenomena or build models of them. These problems are common to a wide range of data users. Only once these broad data heterogeneities have been addressed can a standard classification system be imposed on the data. Standardization (the imposition of a unified classification system on heterogeneous data sources) is superficially an attractive option, but it has a number of drawbacks (Bowker, 2000; Bibby and Shepherd, 2000). First, it necessarily glosses over profound and perhaps important differences between superficially similar terms (Russel et al., 1998; Schuurman, 2002, 2005). Second, it creates equivalences between terms that may be hierarchically different in their respective databases. Third, it removes a field from the context of its surrounding data terms, a context that provides important information (Bowker, 2005).

Despite these limitations, standardization is one means of forcing heterogeneous records into a merged dataset. One method, described by Schuurman (2002), is to build a semi-automated interface. In this case, a Visual Basic interface was created to merge well-log data according to a choice of two sets of rules. The Flexible Standardization for Spatial Data method starts with a "clean" dataset in which the most common errors outlined above have been addressed. It offers a choice of standardization using two different classification systems. The first was developed at Simon Fraser University as a joint project between Dr. Diana Allen and Dr. Nadine Schuurman. The second uses a scheme developed by hydrogeologists at the GSC for a very large database of more than 250,000 boreholes in the Greater Toronto Area of Ontario; it comprises 12 subsurface categories for well-log data (Logan et al., 2001; Russell et al., 1996). This classification is particularly well suited to the unconsolidated materials of the area but is not ideal in a hard rock environment. The Flexible Standardization for Spatial Data system is rule-based and therefore not as flexible as might be desired when dealing with complex, context-dependent classification. An advantage of this system, however, is that it can easily be adapted to incorporate new classification rules. The system was developed in Visual Basic and is therefore very portable (Schuurman, 2002).
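A rule-based classifier of the general kind described above can be sketched with ordered regular-expression rules. The rules and category names below are hypothetical illustrations, not the Simon Fraser University or GSC schemes. The sketch shows both properties noted in the text. Adding a rule only requires appending a pattern. However, because the first matching rule wins, a description such as "brown clay, some sand" is classed by rule order rather than by its context.

```python
import re
from typing import List, Tuple

# Hypothetical ordered rules: the first pattern that matches decides the class.
RULES: List[Tuple[str, str]] = [
    (r"\btill\b", "diamicton"),
    (r"\bclay\b", "clay"),
    (r"\bsilt\b", "silt"),
    (r"\bgravel\b", "gravel"),
    (r"\bsand\b", "sand"),
    (r"\b(bedrock|granite|sandstone|shale)\b", "bedrock"),
]


def classify(description: str, rules: List[Tuple[str, str]] = RULES,
             default: str = "unclassified") -> str:
    """Return the category of the first rule whose pattern occurs in the description."""
    text = description.lower()
    for pattern, category in rules:
        if re.search(pattern, text):
            return category
    return default
```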

A similar rule-based standardization methodology was developed by Allen et al. (2007). This system is specific to BC and uses a series of rules to reduce the number of lithological terms in BC from over 6,000 to 16. It should be noted, however, that the schematic and syntactic problems associated with cross-border data must be addressed first.

CONCLUSIONS

Anthropogenic activities are progressively stressing the environment. Research to protect the environment has necessitated cross-border investigations; however, these studies are dependent on the availability of integrated datasets. Although integrated data are a prerequisite for such studies, little is known about the challenges and experiences of organizations sharing and integrating information across borders.

Interoperability research has concentrated on methods to resolve either technical or institutional interoperability issues and has rarely addressed all factors or challenges of integrating datasets. This research has explored interoperability issues from a technical and institutional perspective, providing guidelines for the varied problems associated with integrating cross-border datasets.

Data issues, such as poor data quality, can limit the use of environmental datasets even though they may be a source of valuable information and, in many cases, the only source of information. For example, lithological databases associated with drillers' logs offer valuable geological information but are difficult to use because of questionable data quality. Changes to the database structure can significantly reduce data errors and improve the quality of the data. Databases designed to follow standard database design rules would avoid most of these errors. Similarly, data entry integrity rules would prevent inconsistent records and missing information.
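Data entry integrity rules of this kind can be declared in the database schema. Incomplete or inconsistent records are then rejected on entry rather than discovered later. The sketch below uses SQLite through Python's sqlite3 module. The table and column names are hypothetical, modelled on the fields discussed above. The specific rules are examples only, not rules taken from either agency's database: non-null coordinates, a bottom depth greater than the top depth, a non-blank description, and a foreign key to the well table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE well (
    WELLTAGNUM TEXT PRIMARY KEY NOT NULL,
    UTMEast    REAL NOT NULL,   -- a well cannot be entered without a location
    UTMNorth   REAL NOT NULL
);
CREATE TABLE lithology (
    WELLTAGNUM  TEXT NOT NULL REFERENCES well (WELLTAGNUM),
    UpperBound  REAL NOT NULL CHECK (UpperBound >= 0),
    LowerBound  REAL NOT NULL,
    Description TEXT NOT NULL CHECK (length(trim(Description)) > 0),
    CHECK (LowerBound > UpperBound)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Attempt one insert; return False (and keep the row out) if a rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```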

Schematic and semantic heterogeneities are other data issues that inhibit interoperability. Semantic research has concentrated on object-oriented technology and is still in the prototyping stage. Although this research offers solutions for the future, it does not resolve the immediate integration problems faced by organizations whose data are maintained in a relational format. In the meantime, this study provides pragmatic solutions for integrating cross-border datasets.

Apart from data issues, institutional issues were observed to deter data integration in an international setting. Data dissemination policies, copyright, and data costs have often been cited as facilitators of, or impediments to, a booming GIS industry. Academic inquiry into these issues in Canada has also shown that such policies have stifled the GIS industry (Sears, 2001; Klinkenberg, 2003). Although research has addressed these issues from a policy perspective, the effects of metadata on data discovery, access, and dissemination are rarely addressed. A lack of metadata means that the user cannot know how the database semantics should be interpreted. Institutional reluctance and organizational dynamics are also important parameters to consider in cross-border projects. Institutional inertia may take the form of reluctance to provide information or of a lack of awareness of activities in the same department or at other levels of government. Either can result in duplicate datasets that lack cross-referencing.

Finally, we outlined two standardization systems that have been used with similar data. Their use depends, of course, on the data having been "cleaned" and made ready for use. Much of the work outlined in this paper constitutes the data cleansing process; only at the end of this stage are data ready for rule-based classification. As the paper describes, data are the chief impediment to cross-border studies. Investigation of the technical and institutional dimensions of data integration should therefore be considered a starting point for cross-border studies.

APPENDIX

TABLE A1. List of Datasets Obtained for This Research.

Lithological data
  British Columbia (Canada): 1. BC Ministry of Environment (formerly BC Ministry of Water, Land and Air Protection); 2. Ministry of Transportation; 3. Drill records (Cameron, 1989)
  Washington State: 1. USGS; 2. Washington State, Department of Ecology

Digital Elevation Model (DEM)
  British Columbia (Canada): BC Ministry of Agriculture and Lands (formerly BC Ministry of Sustainable Resource Management, MSRM)
  Washington State: United States Geological Survey

Topographic data
  British Columbia (Canada): Formerly MSRM (map sheets 92G008, 92G009, and 92G010)
  Washington State: USGS (7.5-minute quadrangles Bertrand Creek, Kendall, Sumas, Lynden)

Surficial geology maps
  British Columbia (Canada): Geological Survey of Canada (paper format)
  Washington State: (Department of Ecology, Washington State)

FIGURE A1. CGDI Design. BCGS 092G010214: British Columbia Geological Survey Map Sheet Coordinates; From 38 to 39 ft: Upper and Lower Bounds of Geologic Unit; Geologic Description; Lith Seq#: Sequence Number for Each Line Entry; WTN: Unique Well Tag Number.



TABLE A2. Inconsistent Well Records in the BC and Washington State Databases.

Map Sheet           Lithology   General   Location
92G008 (BC)         2,310       2,347     2,302
92G009 (BC)         1,418       1,517     1,517
92G010 (BC)         309         320       314
Washington State    937         1,261

TABLE A3. Total Reduction of Well Records.

Map Sheet           Original   Final   Percentage Reduced
92G008              2,319      1,882   19
92G009              1,505      878     42
92G010              314        127     41
Washington State    1,261      937     26

TABLE A4. Interoperability Issues.

Interoperability Issue   British Columbia (Canada)    Washington State
Data format              SAIF / ESRI                  SDTS (mandatory for federal agencies)
Projection               BC Albers / UTM              UTM / SPCS
Datum                    NAD 83                       NAD 27
Metadata                 FGDC compliant (voluntary)   FGDC compliant (mandatory)
Scale                    1:20,000                     1:24,000
Vertical datum           CGVD 28                      NGVD

LITERATURE CITED

Agarwal, P., 2005. Ontological Considerations in GIScience. International Journal of Geographical Information Science 19(5):501-536.

Allen, D., N. Schuurman, A. Deshpande, and J. Scibek, 2007. Data Integration and Standardization in Cross-Border Hydrogeological Studies: A Novel Approach to Hydrostratigraphic Model Development. Environmental Geology, Epub ahead of print. doi:10.1007/s00254-007-0753-3.

Bibby, P. and J. Shepherd, 2000. GIS, Land Use, and Representation. Environment and Planning B: Planning and Design 27:583-598.

Bishr, Y., 1998. Overcoming the Semantic and Other Barriers to GIS Interoperability. International Journal of Geographical Information Science 12(4):299-314.

Bishr, Y., H. Pundt, W. Kuhn, and M. Radwan, 1999. Probing the Concept of Information Communities: A First Step Towards Semantic Interoperability. In: Interoperating Geographic Information Systems, M.F. Goodchild, M. Egenhofer, R. Fegeas, and C. Kottman (Editors). Kluwer Academic Publishers, Boston, Massachusetts, pp. 39-54.

Bowker, G., 2000. Mapping Biodiversity. International Journal of Geographical Information Science 14(8):739-754.

Bowker, G., 2005. Memory Practices in the Sciences. The MIT Press, Cambridge, Massachusetts.

Brodeur, J., B. Yvan, E. Geoffrey, and M. Bernard, 2003. Revisiting the Concept of Geospatial Data Interoperability Within the Scope of the Human Communication Processes. Transactions in GIS 7(2):243-265.

Brodeur, J., B. Yvan, E. Geoffrey, and M. Bernard, 2004. A Geosemantic Proximity-Based Prototype for the Interoperability of Geospatial Data. Computers, Environment and Urban Systems 29(2005):669-698.

Cameron, V.J., 1989. The Late Quaternary Geomorphic History of the Sumas Valley. MA Dissertation, Simon Fraser University, Burnaby, British Columbia.

Cox, S.E. and S.C. Khale, 1999. Hydrogeology, Ground-Water Quality, and Sources of Nitrate in Lowland Glacial Aquifers of Whatcom County, Washington, and British Columbia, Canada. U.S. Geological Survey Water-Resources Investigations Report.

Fonseca, F.T., C. Davis, and G. Camara, 2003. Bridging Ontologies and Conceptual Schemas in Geographic Information Integration. Geoinformatica 7(4):355-378.

FIGURE A2. Missing Elevations (wells composed of a single record). Output From Custom Software LDBuilder. The BCGS number is omitted in the LDBuilder output. WELLTAGNUM: corresponds to WTN; UpperBound: corresponds to From # (depth); LowerBound: corresponds to To # (depth); LayerOrderNum: a new field that stores the sequence number for the geologic layers (although the Lith Seq # serves the same purpose, it was not used because of inconsistent Lith Seq numbering); LayerDepth: a new field that stores the depth of the layers; Description: stores the geologic description.

FIGURE A3. Missing Elevations (wells composed of two layers).

FIGURE A4. Surface Elevation Represented as a Null Value.

FIGURE A5. Inconsistent Records.



Fonseca, F.T., M.J. Egenhofer, C.A. Davis, Jr., and K.A.V. Borges, 2000. Ontologies and Knowledge Sharing in Urban GIS. Computers, Environment and Urban Systems 24:251-271.

Frank, A.U., 1998. Metamodels for Data Quality Description. In: Data Quality in Geographic Information: From Error to Uncertainty, R. Jeansoulin and M.F. Goodchild (Editors). Editions Hermes, Paris, France, pp. 15-29.

Frank, A.U. and M. Raubal, 1999. Formal Specification of Image Schemata: A Step Towards Interoperability in Geographic Information Systems. Spatial Cognition and Computation 1:67-101.

Gruber, T., 1995. Towards Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal of Human-Computer Studies 43:907-928.

Guarino, N., C. Masola, and G. Vetere, 1999. OntoSeek: Content-Based Access to the Web. IEEE Intelligent Systems 1094(7167):70-80.

Guptill, S.C., 1991. Spatial Data Exchange and Standardization. In: Geographic Information Systems: Principles and Applications (Vol. 1), M. Goodchild, D. Maguire, and D. Rhind (Editors). Longman Scientific & Technical, Essex, pp. 515-530.

Guptill, S.C., 1999. Metadata and Data Catalogues. In: Geographical Information Systems: Management Issues and Application, M. Goodchild, P. Longley, D. Maguire, and D. Rhind (Editors). John Wiley & Sons Inc., New York, New York, pp. 677-692.

Harvey, F., W. Kuhn, H. Pundt, and Y. Bishr, 1999. Semantic Interoperability: A Central Issue for Sharing Geographic Information. Annals of Regional Science 33:213-232.

Hogan, R. and M. Sondheim, 1996. Spatial Data Standards Activities in North America. In: Spatial Database Transfer Standards 2: Characteristics for Assessing Standards for the Transfer of Spatial Data and Full Description of the National and International Standards in the World, H. Moellering (Editor). Elsevier Science on behalf of the International Cartographic Association, Oxford, United Kingdom, pp. 31-38.

Kashyap, V. and A. Sheth, 1996. Semantic and Schematic Similarities Between Database Objects: A Context-Based Approach. The VLDB Journal 5:276-304.

Kim, W. and J. Seo, 1991. Classifying Schematic and Data Heterogeneities in Multidatabase Systems. IEEE Computer 24(12):12-18.

Klinkenberg, B., 2003. The True Cost of Spatial Data in Canada. The Canadian Geographer 47(1):37-49.

Kohut, A.P., 1987. Groundwater Supply Capability Abbotsford Upland, Province of British Columbia. Water Management Branch, Ministry of Environment and Parks, Victoria, British Columbia.

Kokla, M. and M. Kavouras, 2001. Fusion of Top-Level and Geographical Domain Ontologies Based on Context Formation and Complementarity. International Journal of Geographical Information Science 15(7):679-687.

Kottman, C.A., 1999. The Open GIS Consortium and Progress Towards Interoperability in GIS. In: Interoperating Geographic Information Systems, M. Egenhofer, M. Goodchild, R. Fegeas, and C. Kottman (Editors). Kluwer Academic Publishers, Boston, Massachusetts, pp. 39-54.

Kuhn, W., 1994. Defining Semantics for Spatial Data Transfers. In: Advances in GIS Research: Proceedings of the Sixth International Symposium on Spatial Data Handling, T.C. Waugh and R.G. Healey (Editors). Department of Geoinformation, Tech University Vienna, Vienna, pp. 973-987.

Logan, C., H.A.J. Russell, and D.R. Sharpe, 2001. Regional Three-Dimensional Stratigraphic Modelling of the Oak Ridges Moraine Areas, Southern Ontario 2001 [GSC Report: September 18, 2005, 2001-D1].

Moellering, H., 1991. Approaches to Spatial Database Transfer Standards: Introduction. In: Spatial Database Transfer Standards: Current International Status, H. Moellering (Editor). Elsevier Science on behalf of the International Cartographic Association, London, United Kingdom, pp. 1-27.

Piteau Associates and Turner Groundwater Consultants, 1993. Groundwater Mapping and Assessment in British Columbia: Volume 1: Review and Recommendations. Environment Canada, North Vancouver, British Columbia.

Reed, P.M., T.R. Ellsworth, and B.S. Minsker, 2004. Spatial Interpolation Methods for Nonstationary Plume Data. Ground Water 42(2):190-202.

Ricketts, B.D., 1999. The Fraser Lowland Hydrogeology Project: An Overview (No. Open File D3828). Geological Survey of Canada, Vancouver, British Columbia.

Russell, H.A.J., C. Logan, T.A. Brennand, M.J. Hinton, and D.R. Sharpe, 1996. Regional Geoscience Database for the Oak Ridges Moraine Project (South Ontario). Current Research 1998-E, Geological Survey of Canada, pp. 191-200.

Russel, H.A.J., T.A. Brennand, C. Logan, and D.R. Sharpe, 1998. Standardization and Assessment of Geological Descriptions From Waterwell Records: Greater Toronto and Oak Ridges Moraine Areas, Southern Ontario (No. Current Research 1998-E). Geological Survey of Canada, Ottawa, Ontario.

Salge, F., 1999. National and International Data Standards. In: Geographical Information Systems: Management Issues and Application (Vol. 2), M. Goodchild, P. Longley, D. Maguire, and D. Rhind (Editors). John Wiley & Sons Inc., New York, New York, pp. 693-706.

Schuurman, N., 2002. Flexible Standardization: Making Interoperability Accessible to Agencies With Limited Resources. Cartography and Geographic Information Science 29(4):343-353.

Schuurman, N., 2003. The Ghost in the Machine: Spatial Data, Information and Knowledge in GIS. Canadian Geographer 47(1):1-4.

Schuurman, N., 2005. Social Perspectives on Semantic Interoperability: Constraints on Geographical Knowledge From a Data Perspective. Cartographica 40(4):47.

Schuurman, N., 2006. Formalization Matters: Critical GIS and Ontology Research. Annals of the Association of American Geographers 96(4):726-739.

Schuurman, N. and A. Leszczynski, 2006. Ontology-Based Metadata. Transactions in Geographic Information Science 10(5):709-726.

Sears, G., 2001. Geospatial Data Policy. Prepared for Geoconnections Policy Advisory Node, KPMG Consulting, Ottawa, Ontario.

Sheth, A.P., 1999. Changing the Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics. In: Interoperating Geographic Information Systems, M. Egenhofer, M. Goodchild, R. Fegeas, and C. Kottman (Editors). Kluwer Academic Publishers, Boston, Massachusetts, pp. 5-29.

Sheth, A. and J.A. Larson, 1990. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Computing Surveys (CSUR) 22(3):183-236.

Sieber, R.E., 2003. Public Participation Geographic Information Systems Across Borders. The Canadian Geographer 47(1):50-61.

Smith, B. and D.M. Mark, 2001. Geographical Categories: An Ontological Investigation. International Journal of Geographical Information Science 15(7):591-612.

Sondheim, M., K. Gardels, and K. Buehler, 1999. GIS Interoperability. In: Geographical Information Systems (Vol. 1), M. Goodchild, P. Longley, D. Maguire, and D. Rhind (Editors). John Wiley & Sons Inc., New York, New York, pp. 359-369.

Stock, K. and D. Pullar, 1999. Identifying Semantically Similar Elements in Heterogeneous Spatial Databases Using Predicate Logic. In: Interoperating Geographic Information Systems: Second International Conference, INTEROP'99, Zurich, Switzerland, 1999, A. Vckovski, K. Brassel, and H. Schek (Editors). Springer-Verlag, Berlin, pp. 231-252.

Strong, D.M., Y. Lee, and R. Wang, 1997. Data Quality in Context. Communications of the ACM 40(5):103-110.



Taylor, D.R.F., 1996. Presidential Foreword. In: Spatial Database Transfer Standards 2: Characteristics for Assessing Standards for the Transfer of Spatial Data and Full Description of the National and International Standards in the World, H. Moellering (Editor). Elsevier Science on behalf of the International Cartographic Association, Oxford, United Kingdom.

Uschold, M. and M. Gruninger, 1996. Ontologies: Principles, Methods and Applications. Knowledge Engineering Review 11(2):93-155.

Vckovski, A., 1998. Special Issue: Interoperability in GIS (Guest Editorial). International Journal of Geographical Information Science 12(4):297-298.

Visser, U., H. Stuckenschmidt, and C. Schlieder, 2002a. Interoperability in GIS: Enabling Technologies. 5th AGILE Conference on Geographic Information Science, Palma (Balearic Islands, Spain).

Visser, U., H. Stuckenschmidt, G. Schuster, and T. Vogele, 2002b. Ontologies for Geographic Information Processing. Computers & Geosciences 28:103-117.

Widom, J., 1996. Integrating Heterogeneous Databases: Lazy or Eager? ACM Computing Surveys (CSUR) 28(4es): Article No. 91.

Winter, S., 2001. Ontology: Buzzword or Paradigm Shift in GI Science? International Journal of Geographical Information Science 15(7):587-590.

