

Spatial characterization of the meltwater field from icebergs in the Weddell Sea

John J. Helly a,b,1, Ronald S. Kaufmann c, Maria Vernet b, and Gordon R. Stephenson b

a San Diego Supercomputer Center and b Scripps Institution of Oceanography, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093; and c Marine Science and Environmental Studies Department, University of San Diego, 5998 Alcalá Park, San Diego, CA 92110

Edited by Michael Goodchild, University of California, Santa Barbara, CA, and approved November 22, 2010 (received for review August 20, 2009)

We describe the results from a spatial cyberinfrastructure developed to characterize the meltwater field around individual icebergs and integrate the results with regional- and global-scale data. During the course of the cyberinfrastructure development, it became clear that we were also building an integrated sampling planning capability across multidisciplinary teams that provided greater agility in allocating expedition resources, resulting in new scientific insights. The cyberinfrastructure-enabled method is a complement to the conventional methods of hydrographic sampling in which the ship provides a static platform on a station-by-station basis. We adapted a sea-floor mapping method to more rapidly characterize the sea surface geophysically and biologically. By jointly analyzing the multisource, continuously sampled biological, chemical, and physical parameters, using Global Positioning System time as the data fusion key, this surface-mapping method enables us to examine the relationship between the meltwater field of the iceberg and the larger-scale marine ecosystem of the Southern Ocean. Through geospatial data fusion, we are able to combine very fine-scale maps of dynamic processes with more synoptic but lower-resolution data from satellite systems. Our results illustrate the importance of spatial cyberinfrastructure in the overall scientific enterprise and identify key interfaces and sources of error that require improved controls for the development of future Earth observing systems as we move into an era of peta- and exascale, data-intensive computing.

Antarctica ∣ remote sensing ∣ hydrography ∣ surface water

This paper describes the results from a geospatial cyberinfrastructure developed to support the characterization of the meltwater field around free-drifting icebergs and to combine it with synoptic, regional-scale data to investigate the role of icebergs in controlling biological productivity in the Weddell Sea. The cyberinfrastructure was formed through the time-based correlation of data sources, computing, and communication channels across multiple scales of space and time to produce synthetic results (cf. SI Text). The work was carried out over the course of 2005–2009 and included three cruises onboard National Science Foundation (NSF) ice-capable ships in the vicinity of the Antarctic Peninsula [Fig. 1 (1)]. These results are of interest because the role of icebergs and their meltwater in controlling the freshwater and carbon fluxes in the Antarctic is unclear (2, 3). For example, for a given iceberg, what volume of meltwater, with its terrestrially derived inorganic load from rock scour and atmospheric deposition, is introduced into what volume of seawater, and how is it distributed in time and space? An accurate and precise answer to this question requires the spatial estimation of the meltwater field around an iceberg instantaneously as well as integrated over time, with local- and regional-scale implications (4). A key goal of this research was to get as close to an instantaneous estimate as possible within the limits of our shipboard capabilities and to do so over spatial extents large enough to correlate the fine-scale spatial analysis with more synoptic, regional-scale analyses, albeit with coarser spatial and temporal resolution.

The characterization of the surface of the ocean with physical, chemical, and biological data in the vicinity of a free-drifting iceberg poses a number of challenges. Besides being large, icebergs are affected by geophysical forces that alter their structure and movement: solar radiation, Earth's rotation, tides and currents, and winds and storms (5). As free-drifting, tabular icebergs proceed through the ocean, they change size, shape, and mass distribution and melt into the surrounding ambient seawater. The relative contributions of geophysical forces to iceberg motion change as ice mass is lost to the surface waters of the ocean, and the losses change the composition of the waters the icebergs move through. It is a dynamic and complex system that encompasses a wide range of measurement scales. Our focus has been on small-to-medium-sized icebergs (<250 km²) because they are the most abundant and widely distributed in the Weddell Sea.

Initially, in 2005, we attempted to characterize the meltwater field using hydrocasts distributed around the icebergs and transects across the near- and far-field normal to the ice face. However, due to the length of time required for these operations, and the dynamic motions of the icebergs surveyed, it was impossible to develop a clear and convincing picture of where the meltwater was, especially whether there was a plume or wake and in which direction it was tending.

Fig. 1. Regional map of the study area in the Weddell Sea and Scotia Sea showing the GPS-based ship track for the three research cruises, LMG0514a (red), NBP0806 (yellow), and NBP0902 (green), as well as two representative RADARSAT-1 image frame boundaries (dashed yellow, wideband mode). Bathymetry is from Global Topography 11.1 (1).

Author contributions: J.J.H., R.S.K., and M.V. designed research; J.J.H., R.S.K., M.V., and G.R.S. performed research; J.J.H. contributed new reagents/analytic tools; J.J.H. analyzed data; and J.J.H. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: Data have been deposited with the Antarctic and Southern Ocean Data Portal (http://www.marine-geo.org/portals/antarctic/) as Expeditions LMG0514A, NBP0806, and NBP0902.

1 To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.0909306108/-/DCSupplemental.



Hydrocasts typically require 1 h per 500 m of depth sampled, and the ship can move at 18.5 km h⁻¹ between stations. A representative near-to-far, radial transect out to 20 km from an iceberg, with a hydrocast every 2 km to 1,500-m depth, requires about 31 h. This is a major commitment of ship time, which is the scarcest resource during a field campaign, with many competing demands for it.
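As a rough check on that figure, the sketch below reproduces the ship-time estimate under stated assumptions (10 casts at 2-km spacing out to 20 km, 1 h per 500 m of cast depth, transit at 18.5 km h⁻¹); the station count and the rounding are our assumptions for illustration, not values taken from the cruise plans.

```python
# Rough ship-time estimate for a radial hydrocast transect (assumptions noted inline).
CAST_DEPTH_M = 1_500           # target cast depth
CAST_RATE_M_PER_H = 500        # ~1 h per 500 m of depth sampled
STATION_SPACING_KM = 2.0       # one hydrocast every 2 km
TRANSECT_LENGTH_KM = 20.0      # out to 20 km from the ice face
SHIP_SPEED_KM_PER_H = 18.5     # transit speed between stations

n_stations = int(TRANSECT_LENGTH_KM / STATION_SPACING_KM)    # assumed 10 casts
cast_hours = n_stations * CAST_DEPTH_M / CAST_RATE_M_PER_H   # 10 casts x 3 h = 30 h
transit_hours = TRANSECT_LENGTH_KM / SHIP_SPEED_KM_PER_H     # ~1.1 h of steaming
print(f"~{cast_hours + transit_hours:.0f} h of ship time")   # ~31 h
```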

An additional limitation on obtaining an estimate of the meltwater field is the difficulty of separating the individual forces of wind, tide, and current and their relative contributions to the motions of a given iceberg and the water it was traveling through, because these become integrated as sampling time is extended. We needed a faster sampling method that would enable us to better localize the meltwater field and differentiate the meltwater signal from background ambient conditions, as well as to help us choose where to conduct conventional hydrocast transects. We also needed something that would minimize the sampling time and get us as close as possible to an instantaneous estimate. This led to the surface-mapping method (cf. Materials and Methods).

The surface mapping of the meltwater field requires a geospatial cyberinfrastructure with which to organize, combine, and display analytic results coming from a variety of physical and biological measurements. There is no standard definition of spatial cyberinfrastructure, so, for the purposes of this paper, we generalize the definition suggested by ref. 6 as follows: Spatial cyberinfrastructure is a complex of information technology and data with four essential elements: (i) computational methods and (ii) communication channels that produce and transport (iii) digital information describing (iv) locations on the Earth. The first three elements are common to most other cyberinfrastructure settings, but it is the fourth, the spatial element, that adds the complexity of an Earth-centric coordinate system.

The additional complexity poses unique challenges in maintaining a reliable quantitative framework for the integration of data while minimizing and managing the propagation of errors in precision and accuracy. This challenge pertains not only to positional information but also to any derived measurements such as area, volume, distance, speed, and orientation. The need to maintain and verify positional accuracy and precision is well established in geodesy and cartography (7) but not directly addressed by existing geographic information system (GIS) technology (8–10). The dynamic character of Earth systems further complicates spatial measurements, as exemplified by the long-standing unsolved problem of defining the coastline due to the variability in topography, tides, and waves (11–13).

Estimation and management of spatial errors are of central concern in conducting scientific research with spatial data, especially as we progress to higher-resolution data and models. Often, errors in measurement appear only when data from multiple sources are combined through data fusion. Mismatches in positional information, for example, can arise through limitations in measurement methods and sampling as well as through computational errors. These are frequently revealed when data about a common position on the Earth, but from different sources, are combined and found to be inconsistent. A rational spatial cyberinfrastructure should account for these major sources of error and facilitate their identification and reconciliation.

In spatial analysis, perhaps the most significant purely cyberinfrastructure challenges emerge when large volumes of multisource, multiscale digital data are combined into data fusion products for quantitative analysis. As we continue to expand the use of derivative, data fusion products, it is increasingly important to have a means to establish the provenance and correctness of the information products. The multistep nature of data fusion, along with the difficulty in simply moving and storing large quantities of data of any kind, challenges current methods of verification of data correctness and provenance. It is useful to consider an example of observational data across local and regional scales and use this as a basis to extrapolate to the long-term cyberinfrastructure requirements for future Earth-observing systems. Our results from surface mapping the meltwater field of icebergs provide a good case study.

Results and Discussion

Ship-based sampling of the Weddell and Scotia seas was conducted during 2005–2009 across a range of spatial and temporal scales, from meters to hundreds of kilometers (Fig. 1) and from minutes to weeks, using a wide range of physical, chemical, and biological data sources. Through a combination of conventional ship-based sampling methods and a fast, surface-mapping method (Fig. 2, Right), we were able to detect and characterize the meltwater field from free-drifting icebergs on an unprecedented spatial scale (1–10³ m) that enables the connection to local- and regional-scale, space-based measurements (10²–10⁴ m). The results of the survey for salinity, a conservative property of seawater, shown in Fig. 2 (Right), indicate that the meltwater field occupies a large fraction (>75%) of the area sampled, with salinity values below that of a control site 74 km away (Fig. 3). The surface-mapping grid has 20 × 1 km spacing on the grid pattern. This provides a basis for estimating the structure of the meltwater field geometrically using the percentage area distribution of Fig. 3. In ongoing research, this result will be combined with expendable bathythermograph and Acoustic Doppler Current Profiler (ADCP) data (cf. SI Text) to form an estimate of the volume of ambient seawater affected by the meltwater. Those results are beyond the scope of this report and will be presented elsewhere.
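As an illustration of the kind of areal bookkeeping behind Fig. 3, the sketch below computes, for a gridded surface-salinity map, the area and the percentage of the surveyed area falling within 0.05-psu bins below a control value; the grid, cell size, and salinity values are synthetic stand-ins, not the cruise data.

```python
import numpy as np

# Synthetic surface-salinity grid (psu) standing in for a surface-mapping survey product.
rng = np.random.default_rng(42)
salinity = rng.normal(34.05, 0.08, size=(200, 200))
cell_area_km2 = 0.01                 # assume 100 m x 100 m grid cells
control_salinity = 34.15             # salinity at a distant control site (illustrative)

total_area_km2 = salinity.size * cell_area_km2
edges = np.arange(control_salinity, salinity.min() - 0.05, -0.05)  # 0.05-psu steps downward
for hi, lo in zip(edges[:-1], edges[1:]):
    in_bin = (salinity >= lo) & (salinity < hi)
    area = in_bin.sum() * cell_area_km2
    print(f"{lo:.2f}-{hi:.2f} psu: {area:6.2f} km^2 ({100 * area / total_area_km2:4.1f}% of survey)")
```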

These results provide direct evidence of the spatial characteristics of the meltwater field from free-drifting icebergs and its effects on the ambient seawater. Changes in measures of biomass, as well as physical and chemical properties of the seawater, are clearly and consistently revealed by surface mapping across the biogeochemical domains (Fig. 2, salinity and chlorophyll-a), providing confidence that the surface maps captured real features and not sampling artifacts. Consequently, we now have a sound basis for more effectively sampling the influence of the meltwater on carbon flux from the surface waters to the sea-floor sediments using more time-consuming but versatile sampling methods such as hydrocasts. Current measurements made during the surface-mapping survey (Fig. 2, Bottom Right) were combined at a series of discrete depths, resulting in a set of maps around the iceberg to a depth of approximately 250 m. These were used to produce animations showing the vertical structure of the circulation and wake turbulence in the near and far field relative to the location of an iceberg. Repeated sampling of the same geographic region revealed that the meltwater field persisted for at least 10 d after the iceberg moved through the area (4).

Because icebergs and their environs are so dynamic and the areas involved are so large (e.g., 10²–10³ km² in area, 1–20 km in longest dimension) compared to the scale of ship-based sampling techniques, a static sampling strategy, such as a Cartesian grid of sample locations using hydrocasts, is impractical. Using the fast surface-mapping method, we were able to focus our sampling by mapping the areas influenced by the iceberg meltwater first, followed by targeted sampling of the water column and biota, stopping or operating the ship at slow speeds (e.g., 3–4 km h⁻¹) as required. Surface mapping enables us to define the important length scales and regimes of interest around an iceberg and helps to direct the location and spacing of water column sampling for physical and biological variables using the surface expressions of water mass properties detected by surface mapping.

The rapidity of surface mapping also enables extensive surveys that provide overlap with regional-scale space-based measurements across multiple pixels. This provides a sound basis for data fusion across local and regional scales of measurement (cf. SI Text).




The overlap provides an essential means of connecting iceberg-scale processes to synoptic, regional-scale oceanic processes that cannot be observed effectively from a ship. The local-scale maps provide subpixel information within the coarser, regional-scale maps and densify the measurements within the extent of the local-scale maps. This cross-mapping spans 2–3 orders of magnitude, resulting in a dataset that is both locally high resolution and regionally synoptic.

The fused spatial products also span interdisciplinary domains (chemical, biological, and physical), each with unique sampling and scale characteristics that must be considered in the context of a cyberinfrastructure for spatial data exploitation. A key focus of this Antarctic research is to connect the fine-grained local measurements in the vicinity of icebergs (e.g., plant productivity, animal abundance and distribution, and seawater chemistry and physics) to the coarser-grained, regional measurements of ocean surface features (e.g., sea-surface temperature, chlorophyll, iceberg positions, and track history). This connection is made by using Global Positioning System (GPS) time to index the position of the ship and to combine continuously recorded physical oceanographic data with discretely sampled chemical and biological measurements (Fig. S1). By fusing these data, we are able to develop a spatially detailed picture of the interactions of oceanic water masses, iceberg meltwater, iceberg position and track history, and the surrounding distribution of flora and fauna.
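The following sketch illustrates this kind of time-keyed fusion, assuming one table of continuously logged GPS navigation records and one of discretely sampled bottle measurements; the column names, tolerance, and values are hypothetical stand-ins for the cruise data streams, not the actual processing code.

```python
import pandas as pd

# Hypothetical 1-Hz navigation log (GPS time, lat, lon) and a discrete bottle sample,
# both time-stamped; names, values, and the 1-s tolerance are illustrative only.
nav = pd.DataFrame({
    "gps_time": pd.to_datetime(["2009-03-15 12:00:00", "2009-03-15 12:00:01",
                                "2009-03-15 12:00:02"]),
    "lat": [-61.0001, -61.0002, -61.0003],
    "lon": [-44.5001, -44.5002, -44.5003],
})
samples = pd.DataFrame({
    "sample_time": pd.to_datetime(["2009-03-15 12:00:01.4"]),
    "chlorophyll_ug_l": [1.8],
})

# Nearest-in-time join: GPS time acts as the fusion key that georeferences each sample.
fused = pd.merge_asof(samples.sort_values("sample_time"),
                      nav.sort_values("gps_time"),
                      left_on="sample_time", right_on="gps_time",
                      direction="nearest",
                      tolerance=pd.Timedelta("1s"))
print(fused[["sample_time", "lat", "lon", "chlorophyll_ug_l"]])
```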

The combination of physical, chemical, and biological indicators enabled us to direct time-consuming biological sampling efforts, such as trawling, by replanning the sampling of locations with surface-mapping features suggestive of biological activity (e.g., fluorescence due to chlorophyll a, available continuously and also measured in discrete water samples) (Fig. S1A).

Fig. 2. Correspondence of regional and local data products across temporal scales and spatial extent and resolution. The local-scale measurements on the right reveal subpixel processes within the regional-scale raster data on the left. (A) Phytoplankton biomass estimated from Moderate Resolution Imaging Spectroradiometer (MODIS) chlorophyll-a data products, compared to fluorescence measurements from shipboard surface mapping. (B) MODIS sea-surface temperature (SST) is used to investigate thermal fronts versus near-field iceberg-melt effects on SST from the shipboard thermosalinograph. (C) JASON-2 sea-surface elevation data are correlated with shipboard ADCP current data to investigate the geostrophic forces on iceberg motion and the relationship to local currents, tides, and wind. The label C18A identifies the location of the large tabular iceberg discussed in this paper. A partial western perimeter of the iceberg can be seen in the surface-mapping images in the right column.



Surface mapping also supported research on the state of biological communities (e.g., plankton biomass and community composition) with a sampling effort of one to several hours per sample. This approach saves time, provides more meaningful data, and demonstrates the importance of spatial information in the joint interpretation of physical, chemical, and biological data, as well as in focusing the sampling strategy itself.

In the results presented here, positional accuracy and precision are controlled by the underlying accuracy and precision of the clocks used within the computers of the onboard sensor systems, the temporal and positional errors in GPS measurements at these high latitudes (also clock-based), and the temporal precision of the computational method of "binning" the data from multiple sensor systems with the navigational data (cf. SI Text). Time is the controlling factor because it is used as the selection and merge key for associating sensor data with positional data (Fig. S1B). In principle, these sources of error are knowable, but in practice they are generally unknown and not explicitly accounted for. In the data fusion products combining local-scale maps with regional-scale maps, the positional uncertainties in the field data are confounded with those inherent in the data products from the satellite sensor systems (Fig. S1A), for which the spatial errors are also poorly represented and accounted for at the pixel level. This is the level at which data fusion is done in gridded raster data of the type represented by this work (cf. SI Text), as well as in most other geospatially intensive science. In lieu of practical methods of explicit error estimation and propagation, it is common practice to quality control the data by ad hoc, heuristic cross-checking with other data sources of better-known quality.
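As a simple illustration of why these clock terms matter, the sketch below converts a hypothetical timestamp offset between a sensor clock and the GPS/navigation clock into a georeferencing offset along the ship track; the speed and offset values are assumptions for illustration, not measured biases from these cruises.

```python
# Positional error introduced by a clock offset between a sensor and the navigation clock
# while the ship is underway; the offsets are illustrative assumptions only.
ship_speed_km_h = 18.5
ship_speed_m_s = ship_speed_km_h * 1000 / 3600        # ~5.1 m/s

for clock_offset_s in (0.1, 0.5, 1.0, 2.0):           # hypothetical sensor clock biases
    along_track_error_m = ship_speed_m_s * clock_offset_s
    print(f"{clock_offset_s:4.1f} s offset -> ~{along_track_error_m:.1f} m along-track error")
```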

This issue is not directly addressed in any of the commonly used spatial analysis tools, and this is a significant limitation to the proper characterization of spatial errors at the pixel level. However, the problem is approachable by standardization of instrument interfaces to data systems to integrate global system clocks (some manufacturers already do this), standardization of data formats, and through the open-source user and developer communities that build and use spatial tools for research, such as the Generic Mapping Tools (GMT, http://gmt.soest.hawaii.edu/) (14), for general-purpose mapping and analysis of spatial data, and MB-System (http://www.ldeo.columbia.edu/res/pi/MB-System/) (15), for swath-mapping of multibeam sonar data. Although not widely known outside the geophysics community, these are both designed specifically for scientifically sound, quantitative spatial analysis and are already adapted to open, standard geophysical and geospatial data formats (e.g., netCDF and Arcgrid) and conventions. Interestingly, although the Arcgrid format started as a proprietary format, it has evolved into an essentially open standard because of the need for greater data interoperability.

The open-source software development model exposes the programming language source code to continuous challenges by domain-expert user communities and allows careful analysis of each data transformation and fusion step. Very importantly, these codes typically run on a wide range of computational platforms and can be embedded in scalable computations and data analyses. This provides the opportunity for ongoing development of increasingly robust, automated error management schemes with explicit representations of sensor system measurement time and global system time, at the pixel level. Many instrument manufacturers already provide external time source interfaces by supporting the National Marine Electronics Association standard messages from external GPS receivers to enable GPS time to be embedded in their sensor outputs. Ideally, all time references would be derived from a common global clock, but in lieu of that there is a need for capturing the local measurement time and the global system time simultaneously in a geospatial cyberinfrastructure. This will make it possible to postprocess the data for spatial errors using whatever is chosen as the global system clock and navigational reference system. This is GPS for many current research applications but could be other systems such as the Russian GLONASS or the future European Galileo system. Such pathways to improvement are not available in proprietary GIS software, but fortunately alternatives to proprietary GIS exist and can be found in systems such as GRASS (http://grass.itc.it/) and QGIS (http://www.qgis.org/).
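To make the time-embedding idea concrete, here is a minimal sketch that pulls the UTC time and decimal-degree position out of an NMEA GGA sentence of the kind a GPS receiver feeds to instrument systems; the example sentence is fabricated for illustration and the parser does no checksum validation.

```python
def parse_gga(sentence: str):
    """Extract UTC time and decimal-degree position from an NMEA GGA sentence (no checksum check)."""
    fields = sentence.split(",")
    hhmmss = fields[1]
    utc = f"{hhmmss[0:2]}:{hhmmss[2:4]}:{hhmmss[4:]}"

    def to_decimal(value: str, hemisphere: str, deg_digits: int) -> float:
        # NMEA encodes latitude as ddmm.mmmm and longitude as dddmm.mmmm.
        degrees = float(value[:deg_digits])
        minutes = float(value[deg_digits:])
        decimal = degrees + minutes / 60.0
        return -decimal if hemisphere in ("S", "W") else decimal

    lat = to_decimal(fields[2], fields[3], 2)
    lon = to_decimal(fields[4], fields[5], 3)
    return utc, lat, lon

# Illustrative sentence (not a real fix from the cruises):
print(parse_gga("$GPGGA,120001.00,6100.0120,S,04430.0450,W,1,09,1.0,5.0,M,10.0,M,,*47"))
```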

We also need improvements in data formats to represent positional uncertainty as part of the data and metadata structures in such a way that the error ellipse for any given georeferenced pixel can be easily accessed and correctly propagated in computations. This requires the representation of both time and position, and the errors in each, to be expressed explicitly in a data structure. An analogy that suggests itself is the method of computation with complex numbers, where the real and the imaginary parts of the numbers are explicitly carried through the computation. This requires both specialized data types and operators to carry out this class of computations. Another interesting example is the use of orbital elements to precisely specify the position of an orbiting spacecraft. Although these "two-line" element sets may seem burdensome with respect to earth science data, they do provide a clear example of how space and time information are encoded to enable highly precise and accurate position determinations to be made.
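One possible shape such a data type could take is sketched below: a value carried together with its error, with operators that propagate the error through arithmetic. It assumes independent errors propagated to first order and is an illustration of the complex-number analogy, not the representation used in the authors' system.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class Uncertain:
    """A value carried together with its 1-sigma error (assumed independent, first-order)."""
    value: float
    sigma: float

    def __add__(self, other: "Uncertain") -> "Uncertain":
        # Errors of independent quantities add in quadrature under addition.
        return Uncertain(self.value + other.value,
                         math.hypot(self.sigma, other.sigma))

    def __mul__(self, other: "Uncertain") -> "Uncertain":
        v = self.value * other.value
        # First-order relative errors add in quadrature under multiplication.
        rel = math.hypot(self.sigma / self.value, other.sigma / other.value)
        return Uncertain(v, abs(v) * rel)

# e.g., an area derived from two georeferenced lengths, each with positional uncertainty:
length = Uncertain(12_000.0, 15.0)   # meters
width = Uncertain(4_000.0, 15.0)     # meters
print(length * width)                # area with propagated error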

There are related problems having to do with the georeferencing of place-names as well, for example, the Weddell Sea. When we talk about the Weddell Sea, it is generally understood to be defined by both the physical oceanographic properties of the seawater and the bathymetry and boundaries of the region, but not by a well-defined polygon described in geographic coordinates. One approach to addressing this would be to carry along with the place-name a polygon with explicit geospatial error terms, as measures of uncertainty due to ambiguity in definition rather than measurement, or physical properties of the water mass along a given polygon edge. This way researchers would be better equipped to know whether they are talking about the same place.
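As one possible shape for such a record, the snippet below pairs a place-name with an explicit polygon and a per-edge uncertainty term; the field names, vertices, and uncertainty values are purely illustrative, not a proposed standard or an agreed definition of the region.

```python
# Illustrative place-name record: a named region carried with an explicit polygon
# (rough lon, lat vertices) and an uncertainty, in km, attached to each boundary edge.
weddell_sea_record = {
    "name": "Weddell Sea",
    "crs": "EPSG:4326",
    "polygon_lon_lat": [(-60.0, -78.0), (-20.0, -78.0), (-20.0, -60.0), (-60.0, -60.0)],
    "edge_uncertainty_km": [50.0, 200.0, 200.0, 50.0],  # ambiguity of each boundary segment
    "basis": "illustrative only; not an agreed definition of the region",
}
```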

In our ongoing research, we are examining methods of properly representing the error budget in our surface-mapping calculations and generalizing their representation, but we have no general solution to these problems at present. Our current practice is to constrain the mapping using ship tracklines and to design sampling plans with this explicit purpose in mind. However, more work needs to be done on methods for mapping and statistically summarizing both navigational and observational errors.

Fig. 3. The area within 0.05-psu contours of the surface map of salinity was analyzed for absolute area (km²) and also as a percentage of the total area sampled (black solid line). The results are plotted here and show the size of the meltwater field on an areal and percentage basis. The control value is also plotted (black dashed line).




The interdisciplinary nature of this research and the associated wide variety of sensors and sampling methods result in a large number of data files: about 20,000 files for one research cruise with a moderate volume of data. Using the data from these three expeditions as a basis, we find that the minimum data volume (i.e., onboard data acquisition system, satellite data, ADCP, and derived analytic data) for each cruise is ∼200 gigabytes (10⁹ bytes) (Table S1). A typical field research program could have four of these cruises over a 3-y program, resulting in a data load of ∼3 terabytes (TB, 10¹² bytes) per research program, including both field and laboratory data. If this were extended to a 50-y monitoring project, a single copy of the complete data record would be ∼40 TB. This is not an extraordinary amount of data in contrast to the amounts generated by climate models, but it is enough to pose a significant challenge to the individual researcher and is well beyond the capability of many GISs to handle, for reasons such as limited memory, sequential processing, and 32-bit addressing. Some of these limitations will be gradually removed through technology evolution and commercial development, but that is driven by business needs, not scientific needs. Focused efforts are needed within the scientific community to ensure we are prepared to handle these types and quantities of data.

Manipulating this data record using data fusion methods requires that multiple sources of data come together within a computer at some point, regardless of whether the data are stored in a centralized or decentralized manner for archival purposes. This is because computation requires that data operands be held within primary computer memory (random access memory) at the same time. The implication is that the data must be contained in file systems accessible by a given processing unit for the duration of the computation, however those file systems are deployed and connected to the processors. As our ability to produce high volumes of georeferenced data has grown, the demands on communication and data transfer capacities have become a rate-limiting step in scientific progress. On land, in the laboratory, this is a problem, and at sea it is a bigger problem due to much more limited shipboard resources and options.

Actual computation with data typically requires at least twice the disk space required to store it, and more typically 3×, including copying from one system to another and work space required for the processing of intermediate computational results. At present, and for the short-term future such as the next 5 to 10 y, there is no reasonable expectation of sufficient network communication infrastructure to enable the virtualization of this processing except in special cases. We are only just beginning to provision this type of computing within the research community through programs such as the National Science Foundation (NSF) TeraGrid XD (for extreme data). Despite these efforts, tera-scale data analysis resources will be scarce for the next 10 y. This scarcity will be exacerbated by the need within the geospatial cyberinfrastructure for improvements in spatially enabled data representations. A rational, coherent, spatial cyberinfrastructure for multidisciplinary research should be planned into Earth-observing systems from the outset, and these should be consciously organized around community standards and best practices. This will facilitate the development of better methods for error representation and propagation management as well as a broader understanding of them. Until we can achieve these goals, the capability for routine spatial and temporal fusion and visualization of data, in the range of terabytes, is needed to provide observing systems with the analytic capability to make scientific progress using the current ad hoc methods in the face of poorly characterized systematic and nonsystematic errors in observational and model data.

Other key features that should be factored into future observing systems include network interfaces that provide on-demand data access from within scientific intranets to host-platform data acquisition systems. For example, some existing policies of network management on the ships make it difficult to obtain access to real-time and near-real-time data. This can be overcome by designing the data acquisition systems with "demilitarized zones" in the network architectures, where open access is encouraged rather than grudgingly permitted. Increasingly, security requirements in multiuser computing systems limit the ability of scientists to access and control critical parts of an operating system to verify the integrity of measurements or to rapidly access the most recent measurements as part of decision-support analyses. Networks should be designed to service science requirements without sacrificing security. The present models suffer from the retrofitting of post hoc approaches that often succeed only in keeping out the real users. We need new approaches to provide reliable and convenient access to data.

System services such as timekeeping and file-system interoperability must be reliable and explicit in order to enable internal and external consistency checking, especially when dealing with a multiplicity of data types, large numbers of files, and large volumes of data. Connecting natural phenomena across spatial scales requires careful attention to time and position and demands an understanding of the limitations of timekeeping and data interoperability in computing systems. For example, most common computing systems do not provide easy access to time at a precision of less than 1 s and have limited means for preserving accuracy by correcting time bias across clocks. Loss of accuracy and precision in time or position (e.g., temporal binning, truncating decimal degrees of longitude or latitude in data conversions) throws away important location information and introduces hard-to-discover ambiguities in position that are often revealed during data fusion in a trial-and-error fashion.
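To give a sense of scale for the truncation example, the sketch below converts the loss of decimal places in a latitude value into meters on the ground, using the approximation that one degree of latitude spans about 111 km; the sample latitude and the precisions shown are illustrative.

```python
# Approximate position ambiguity introduced by truncating decimal degrees of latitude.
# Uses ~111 km per degree of latitude; the sample latitude and precisions are illustrative.
METERS_PER_DEGREE_LAT = 111_000.0
latitude = -61.234567

for decimals in (5, 4, 3, 2):
    truncated = int(latitude * 10**decimals) / 10**decimals
    worst_case_m = METERS_PER_DEGREE_LAT / 10**decimals   # up to one unit in the last kept digit
    print(f"{decimals} decimals: {truncated:>12} deg, ambiguity up to ~{worst_case_m:,.0f} m")
```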

Finally, sharing data within a large, multidisciplinary science team requires a systematic method of storing, cataloguing, archiving, and distributing data to ensure that each member can unambiguously identify the data they are working with. The international community has recognized this problem and reached a multinational consensus on the need for change in the management of long-term scientific data (16). We have the same need within observational systems as the data are being produced. To address these issues, we have developed the Digital Library Framework (DLF) and have been using it to support a number of scientific research projects through the California Coastal Atlas (http://CaliforniaCoastalAtlas.net) as well as other NSF-supported projects. The DLF is a data publication system that is designed to support the scientific enterprise as illustrated in Fig. S1 and employs digital object identifiers to integrate the data holdings with the publication industry's standard cross-referencing systems. The system is portable; it is used onboard the ships to manage the onboard system data for analysis and interoperates by design with the shore-based system to provide the basis for an editorially controlled publication process for all types of scientific data, but especially geospatial data. Spatial cyberinfrastructure efforts such as these are at the frontier in the discovery, development, and implementation of effective solutions to these long-standing and ongoing scientific needs.

Materials and Methods

In November–December 2005, during a research expedition (designated LMG0514a) on the R/V (Research Vessel) Laurence M. Gould, we made ship-based observations that demonstrated the existence of a near-field to far-field gradient in several biological and geochemical parameters around a free-drifting iceberg (3). In an effort to better understand the distribution of meltwater around icebergs, a second expedition (NBP0806), on the RVIB (Research Vessel/Ice Breaker) Nathaniel B. Palmer, in June 2008, was devoted to developing improved sampling methods and exploring approaches to characterizing the environment around icebergs. This focus led to a method of sampling the surface waters that we dubbed surface mapping because of its similarity to seafloor swath mapping and the commonly used "mowing-the-lawn" sampling pattern used with onboard multibeam sonar (Fig. 2, Right). The third expedition (NBP0902), in March–April 2009, also on the RVIB Nathaniel B. Palmer, was the first operational use of surface mapping as a routine part of our iceberg characterization.



Using the surface-mapping method in combination with surface-water measurements from a flow-through thermosalinograph instrument, we were able to characterize large areas (10²–10³ km²) near and around a large tabular iceberg, designated C18A, at full speed (18.5 km h⁻¹) without stopping.
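To make the "mowing-the-lawn" pattern concrete, here is a minimal sketch that generates waypoints for a back-and-forth survey grid; the line count, line length, and line spacing are illustrative parameters in the spirit of the 20 × 1 km grid described above, not the actual cruise track design.

```python
def lawnmower_waypoints(origin_x_km, origin_y_km, line_length_km, line_spacing_km, n_lines):
    """Generate (x, y) waypoints in km for a back-and-forth ('mowing-the-lawn') survey."""
    waypoints = []
    for i in range(n_lines):
        y = origin_y_km + i * line_spacing_km
        x_start, x_end = origin_x_km, origin_x_km + line_length_km
        if i % 2 == 1:                       # reverse every other line
            x_start, x_end = x_end, x_start
        waypoints.append((x_start, y))
        waypoints.append((x_end, y))
    return waypoints

# e.g., 20 lines, 1-km spacing, 20-km lines (illustrative values only)
for wp in lawnmower_waypoints(0.0, 0.0, 20.0, 1.0, 20)[:6]:
    print(wp)
```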

Conclusions

The ability to correlate multisource measurements accurately and precisely across spatial and temporal scales requires careful management of the georeferencing and georegistration of the data and a clear understanding and control of the sources of error inherent in data processing. Because timekeeping is fundamental to geolocation systems, the relationships between measurement instrumentation and the sources of time used in geolocation must be accounted for more effectively. This can be achieved by integrating geolocation-service system time into computing and instrumentation systems and by adding interfaces between them for synchronization. This level of sophistication in timekeeping is found in the geodetic community but not broadly in other scientific disciplines that need it. Improvements to the production of metadata, now widely recognized as important in scientific data management, need to move forward more rapidly so that metadata are automatically produced according to well-defined standards. Metadata are not only important in documentation and archiving, but they are increasingly important in machine-to-machine interoperation and in our ability to automate the highly labor-intensive data processing necessary for high-resolution, high-volume data from future observing systems. The specification of spatial reference system metadata has enabled the geospatial community to achieve a high level of data interoperability. It needs to be expanded to explicitly provide for error management. In addition, temporal reference system metadata are needed to provide the equivalent for the time domain.

The determination and control of the propagation of temporal and geospatial errors must be improved. We need new models of these errors, and we can learn from our colleagues in the astronomical community. Models of errors in telescope performance have been used to make progress in observing system design and performance for decades. End-to-end modeling of measurements not only provides insight into the parameterization and analysis of error propagation, but it can also be used as an aid in planning data collection (17). Because the resultant models simulate subgrid (subpixel) processes, they can be integrated with geophysical models to make important connections between the geophysical, chemical, and biological domains at local scales of measurement.

As global and regional geophysical models approach the ability to resolve processes at local spatial and temporal scales, new observing systems should plan for the data interoperability needed to correlate observations with model predictions. This is essential to the effective assessment of the predictive skill of models and to advancing the use of data assimilation with geophysical models (18). There is a need for this integration to be accounted for as part of future modeling and instrumentation programs, and this will require new interactions and roles between and within the modeling and observing communities.

ACKNOWLEDGMENTS. We thank all the shipboard scientific personnel and crew on the R/V Laurence M. Gould (LMG0514a) and the RVIB Nathaniel B. Palmer (NBP0806, NBP0902) for excellent support. We especially thank Chief Scientist Kenneth L. Smith, who ensured that we had sufficient ship time to develop and employ these methods during each of the cruises. We also thank two anonymous reviewers whose comments significantly contributed to improvements to the manuscript. This research was supported by NSF Grants ANT-0636723, ANT-0636809, ANT-0636440, ANT-0636543, ANT-0636319, ANT-0636813, ANT-0636730, ANT-0529815, ANT-0650034, and OCE-0327294, and by the David and Lucile Packard Foundation. RADARSAT-1 and PALSAR images were provided by the National Aeronautics and Space Administration (NASA) through the Alaska Satellite Facility and a NASA investigator grant (to J.J.H.). NASA MODIS data were provided by the Goddard Space Flight Center Ocean Color system. Support for the Digital Library Framework was provided by NSF Grant OCE-0607372.

1. Smith WHF, Sandwell DT (1997) Global sea floor topography from satellite altimetry and ship depth soundings. Science 277:1956–1962.
2. Schodlok MP, Hellmer HH, Rohardt O, Fahrbach E (2006) Weddell Sea iceberg drift: Five years of observations. J Geophys Res 111:C06018, 10.1029/2004JC002661.
3. Smith K, et al. (2007) Free-drifting icebergs: Hot spots of chemical and biological enrichment in the Weddell Sea. Science 317:478–482.
4. Helly JJ, Kaufmann RS, Stephenson GR, Jr, Vernet M (2011) Cooling, dilution and mixing of ocean water by free-drifting icebergs in the Weddell Sea. Deep-Sea Res Pt II, 10.1016/j.dsr2.2010.11.010.
5. Wadhams P (2000) Ice in the Ocean (Gordon and Breach, New York).
6. Wright DJ, Wang S (2011) The emergence of spatial cyberinfrastructure. Proc Natl Acad Sci USA 108:5488–5491.
7. Snyder JP (1987) Map Projections—A Working Manual, Professional Paper 1395 (US Geological Survey, Washington, DC).
8. Burrough PA, McDonnell R (1998) Principles of Geographical Information Systems (Oxford Univ Press, Oxford, UK).
9. Goodchild M (1995) GIS and Geographic Research. Ground Truth: The Social Implications of Geographic Information Systems (Guilford Press, New York).
10. Heuvelink GBM (1998) Error Propagation in Environmental Modeling with GIS, Research Monographs in Geographic Information Systems (Taylor & Francis, London).
11. Beneviste J, Vignudelli S (2009) Challenges in coastal satellite altimetry. EOS Trans AGU 90:225, 10.1029/2009EO260007.
12. Cartwright DE (1999) Tides: A Scientific History (Cambridge Univ Press, New York).
13. Monmonier MS (2008) Coastlines: How Mapmakers Frame the World and Chart Environmental Change (Univ Chicago Press, Chicago).
14. Wessel P, Smith WHF (1991) Free software helps map and display data. EOS Trans AGU 72:441.
15. Caress DW, Chayes DN (1996) Improved processing of Hydrosweep DS multibeam data on the R/V Maurice Ewing. Mar Geophys Res 18:631–650.
16. Mokhele K (2004) Scientific data and information: A report of the CSPR Assessment Panel (International Council for Science (ICSU)). Available at http://www.icsu.org/Gestion/img/ICSU_DOC_DOWNLOAD/551_DD_FILE_PAA_Data_and_Information.pdf.
17. Giacconi R (2008) Secrets of the Hoary Deep: A Personal History of Modern Astronomy (Johns Hopkins Univ Press, Baltimore, MD).
18. Edwards PN (2010) A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming (Massachusetts Inst Technology Press, Cambridge, MA).

