
Efficient algorithms for finding sills in digital topographic maps


Computers & Geosciences 33 (2007) 678–684

www.elsevier.com/locate/cageo


Thomas D. Otto (a,*), Andreas M. Thurnherr (b)

(a) Laboratório de Genômica Funcional e Bioinformática, DBBM, Oswaldo Cruz Institute, Av. Brasil, 4365, 21040-900 Rio de Janeiro, Brazil

(b) DOCP, LDEO, Columbia University, Palisades, NY 10964-8000, USA

Received 31 March 2006; received in revised form 3 October 2006; accepted 5 October 2006

Abstract

In stratified geophysical flows, the energetically optimal exchange of dense fluid across a topographic barrier generally takes place at the deepest unblocked connection, which is typically a saddle point (sill). The flow at or near a sill is often hydraulically controlled, in which case the sill is called a controlling sill. Oceanographic examples include overflows of newly formed dense water at high latitudes as well as sills in channels connecting major ocean basins, such as the Strait of Gibraltar. Controlling sills are usually associated with strong flows, making them ideal sites for monitoring transport and hydrographic variability. The locations and depths of controlling sills also provide strong constraints for the downstream hydrographic properties below sill depth. Here, two algorithms for finding sills in digital topographic maps are presented. The first approximates the sill height to arbitrary precision in $O(k)$ steps, where $k$ is the number of data points in the map. The second algorithm, which requires $O(k \log k)$ steps, additionally returns the sill location. Several tests carried out with realistic problems from physical oceanography reveal that the second algorithm runs faster in practice, even though its worst case behavior is worse.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Digital elevation model; Topography; Sill; Priority queue

1. Introduction

Inspection of any large-scale bathymetric map of the ocean (e.g. Fig. 1) reveals that the seafloor is characterized by deep basins separated by shallower topography. For example, the Mediterranean is separated from the North Atlantic by the African and European continents, with the Strait of Gibraltar providing the sole pathway for exchange of water (Fig. 1). Similarly, the deep basins of the western and eastern Atlantic are separated by the Mid-Atlantic Ridge (MAR). In regions of complex topography, such as on the MAR, the separation of the seafloor into individual basins occurs on comparatively small scales (e.g. Fig. 2).

The deepest point along a topographic barrier separating two deeper regions is usually a saddle point, and is called a sill. In order to answer a variety of oceanographic questions the locations and depths of the sills connecting deep basins must be determined. For example, in the absence of deep convection, the bottom-water properties in a deep basin are limited by the properties of the densest inflowing water (e.g. Saunders and Francis, 1985). In case of multiple possible pathways between two regions, the densest inflow tends to take place across the deepest sill, where the flow is often hydraulically controlled (e.g. Whitehead, 1998). Therefore, the shallowest sill in the deepest passage between two basins is called the controlling sill (CS). Because of its controlling aspect, a CS is often the ideal location for monitoring transport and hydrographic change (e.g. Hogg and Zenk, 1997).

As an example, consider the southern MAR between 20°S and 23°S (Fig. 2): in their analysis of data from the WOCE A14 section along 9°W, Mercier et al. (2000) describe signatures of western-basin water in the eastern South Atlantic near 22°S. Because of its proximity, they infer that the Rio de Janeiro Fracture Zone is the most likely pathway for deep-water exchange across the MAR in this region. Visual inspection of the regional topography indicates several sills with similar saddle depths (labeled "X", "Y", "Z", and "C" in the figure). Traditionally, sill locations and depths are estimated visually from bathymetric charts. In addition to being tedious, the visual method is prone to uncertainties and errors. Computers, on the other hand, are ideally suited for this task, because topographic data are usually available as digital $\langle x, y, z \rangle$ grids (digital elevation models, or DEMs). The number of data points in DEMs and, consequently, the number of possible pathways between two given locations can be large; the near-global topography of Smith and Sandwell (1997), for example, consists of nearly $7 \times 10^7$ elevation data. (Since the space requirement of a DEM grows quadratically with resolution, future data sets are expected to be much larger than that.) Therefore, it is important that sill-finding algorithms are efficient in terms of CPU and memory requirements.

Fig. 1. Topography of Atlantic Ocean, from Smith and Sandwell (1997) global data set. Contour interval is 1000 m, shading increases with depth. Box centered near 20°S, 10°W indicates location of Rio de Janeiro Fracture Zone (Fig. 2).

Fig. 2. Topography near Rio de Janeiro Fracture Zone on MAR, from Smith and Sandwell (1997) global data set. Diameters of WOCE A14 station symbols along 9°W are proportional to mean oxygen concentration below 3000 m, with high values near 22°S indicating western-basin water that has crossed the ridge (Mercier et al., 2000). Locations of four sills in deep passages across ridge are circled and labeled; sill depths are 3360 m (X), 3365 m (Y), 3435 m (Z), and 3460 m (C).

0098-3004/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.cageo.2006.10.003

* Corresponding author. Tel.: +55 21 3865 8159; fax: +55 21 2590 3495. E-mail addresses: otto@fiocruz.br (T.D. Otto), [email protected] (A.M. Thurnherr).


In the computing literature, a solution to a problem is called efficient if the runtime is polynomial in the size of the input (e.g. Cobham, 1964; Edmonds, 1965). An alternative definition is given by Cormen et al. (2001): an algorithm is considered efficient if it correctly solves the problem using a minimal amount of processing steps or runtime. Below, we describe two efficient algorithms for solving the sill-finding problem in linear and log-linear time, respectively. In practical terms, our algorithms determine the location and depth of the CS for exchange between the eastern and western South Atlantic between 20°S and 23°S (sill "C" in Fig. 2) in a few seconds on an average workstation. Our reference implementation, which is compatible with the DEM format used by the GMT software package (http://gmt.soest.hawaii.edu), is available at www.dbbm.fiocruz.br/sill.

This paper is organized as follows: In Section 2 two algorithms for solving the sill problem are described. The first one (Section 2.2) approximates the sill depth in linear time to arbitrary precision. The second algorithm (Section 2.3) uses a priority queue to determine the exact height and the location of the CS in log-linear time. The run-time and memory requirements of the two algorithms applied to realistically sized DEMs are investigated in Section 3. This is followed by a discussion of the main results (Section 4).

2. Finding Controlling Sills

2.1. Notation

Let $E$ be a DEM with $k = n \times m$ elements ($e_1, \ldots, e_k \in E$). In addition to a geographic location, each element is associated with an elevation $z(e_i)$. Furthermore, each element has eight neighbors, unless it lies along the edge of the map. Given start and target points $s, t \in E$, as well as an elevation $z$, $p(s,t,z) \subseteq E$ denotes an arbitrary continuous path between $s$ and $t$ that is unblocked at level $z$ (i.e. $\forall e_i \in p(s,t,z) : z(e_i) \le z$). Because a path that is unblocked at level $z$ is also unblocked at every higher level, the blocking depth between the points $s$ and $t$ is the lowest level at which such a path exists:

$$z_b(s,t) = \min\,[\,z : \exists\, p(s,t,z)\,]. \qquad (1)$$
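The notation above can be made concrete with a small Python sketch. It assumes the DEM is stored as a list of rows of elevations and that grid points are `(row, col)` tuples; the helper names `neighbors` and `is_unblocked` are hypothetical and not part of the reference implementation.

```python
def neighbors(row, col, n_rows, n_cols):
    """Yield the up-to-eight grid neighbors of (row, col).

    Points on the edge of the map have fewer than eight neighbors,
    because cells outside the grid are skipped.
    """
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # a point is not its own neighbor
            r, c = row + d_row, col + d_col
            if 0 <= r < n_rows and 0 <= c < n_cols:
                yield r, c


def is_unblocked(dem, path, z):
    """Return True if every element of `path` lies at or below level z."""
    return all(dem[r][c] <= z for r, c in path)


# Illustrative one-row DEM: two deep cells separated by a shallower one.
dem = [[-5.0, -2.0, -6.0]]
path = [(0, 0), (0, 1), (0, 2)]
```

In this toy profile the only path from the first to the last cell crosses the cell at elevation -2, so the path is unblocked at level -2 but blocked at level -3, and the blocking depth is -2.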

2.2. Approximation algorithm

The first algorithm presented here approximates the blocking depth $z_b(s,t)$ to arbitrary precision $\epsilon$ by consecutively shrinking the interval $[z_-, z_+]$, where $z_- \le z_b(s,t) \le z_+$, until $z_+ - z_- \le \epsilon$. The algorithm can be initialized in linear time ($O(k)$) by setting $z_- = z_{\min}$ and $z_+ = z_{\max}$, where $z_{\min}$ and $z_{\max}$ are the lowest and highest points in the DEM, respectively. An efficient way to decrease the interval size consists in letting $z_{mid} = (z_- + z_+)/2$ become the new upper bound $z_+$ if $\exists\, p(s,t,z_{mid})$, or the new lower bound $z_-$ otherwise. To check for the existence of an unblocked path at level $z$, the map can be traversed recursively with the algorithm shown in Fig. 3.

After initializing all fields of the DEM higher than $z$ as visited (and all others as not visited), which can be done in $O(k)$ steps, the return value of the function points_are_connected(s,t) indicates the existence of an unblocked path from $s$ to $t$ at level $z$. The runtime of the function points_are_connected() is linear in the number of elements of $E$, because each field of the DEM has to be visited at most once. If $\Delta z = z_{\max} - z_{\min}$, the interval must be halved $\log_2(\Delta z/\epsilon)$ times. Since both $\Delta z$ and $\epsilon$ are constants the overall runtime of the approximation algorithm is $O(k)$.
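As a rough worked example of the halving count, assume a DEM with a depth span of $\Delta z = 8000$ m (an illustrative value, not taken from the test cases). For the two target precisions $\epsilon = 10$ m and $\epsilon = 0.001$ m the number of traversals is

$$\left\lceil \log_2 \frac{8000}{10} \right\rceil = \lceil \log_2 800 \rceil = 10, \qquad \left\lceil \log_2 \frac{8000}{0.001} \right\rceil = \lceil \log_2 (8 \times 10^{6}) \rceil = 23.$$

Each traversal costs $O(k)$, so the constant hidden in the $O(k)$ bound roughly doubles when the precision is tightened from 10 m to 1 mm.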

In addition to the memory required to store the DEM, the approximation algorithm requires a "visited" flag for each field (a minimum of $k$ bits). In the worst case the stack size required for the recursive traversals is proportional to $k$. In total up to $3k$ space is needed.
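The approximation algorithm of this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the DEM is a list of rows with no undefined values, points are `(row, col)` tuples, and an explicit stack replaces the recursive traversal so that large maps do not hit Python's recursion limit. The name `approximate_blocking_depth` is hypothetical. A path existing at `z_mid` means the blocking depth is at most `z_mid`, so the upper bound is tightened in that case.

```python
def points_are_connected(dem, s, t, z):
    """Return True if s and t are joined by an eight-connected path
    whose elements all lie at or below level z."""
    n_rows, n_cols = len(dem), len(dem[0])
    if dem[s[0]][s[1]] > z or dem[t[0]][t[1]] > z:
        return False
    # Fields higher than z are pre-marked as visited, i.e. blocked.
    visited = [[dem[r][c] > z for c in range(n_cols)] for r in range(n_rows)]
    visited[s[0]][s[1]] = True
    stack = [s]
    while stack:
        r, c = stack.pop()
        if (r, c) == t:
            return True
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < n_rows and 0 <= nc < n_cols and not visited[nr][nc]:
                    visited[nr][nc] = True  # each field is visited at most once
                    stack.append((nr, nc))
    return False


def approximate_blocking_depth(dem, s, t, eps):
    """Bracket the blocking depth between s and t to within eps.

    Returns (z_lo, z_hi) with z_lo <= z_b <= z_hi and z_hi - z_lo <= eps,
    assuming s and t are connected at the highest level of the DEM.
    """
    z_lo = min(min(row) for row in dem)
    z_hi = max(max(row) for row in dem)
    while z_hi - z_lo > eps:
        z_mid = (z_lo + z_hi) / 2
        if points_are_connected(dem, s, t, z_mid):
            z_hi = z_mid  # a path exists, so z_b <= z_mid
        else:
            z_lo = z_mid  # no path, so z_b > z_mid
    return z_lo, z_hi


# Illustrative DEM: two basins separated by a ridge with a saddle at -2.
dem = [
    [-5, -5, 0, -6, -6],
    [-5, -5, -2, -6, -6],
    [-5, -5, 0, -6, -6],
]
z_lo, z_hi = approximate_blocking_depth(dem, (1, 0), (1, 4), 0.01)
```

For this toy map the returned interval brackets the saddle elevation of -2. The sketch allocates a fresh `visited` grid on every halving, which keeps the code short at the cost of extra allocation.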

2.3. The spill algorithm

For some oceanographic problems it is sufficient to determine the blocking depth between two locations (e.g. Thurnherr et al., 2005), but in many other cases the locations of the CS are of interest as well. In order to determine the CS locations as well as their depths, a second algorithm, called spill, is introduced here. It mimics the flooding of terrain: starting at point $s$, all points that are reachable via unblocked paths at level $z(s)$ (the current "water level") are marked as visited using a recursive-traversal algorithm similar to the one shown in Fig. 3. Whenever a point that is to be visited is at the same level or higher than $z(s)$, it is stored in a list (containing the "current coastline") for later processing. When there are no neighbors left that can be reached at level $z(s)$ (when the local depression containing $s$ is flooded to $z(s)$), $s$ is replaced by the lowest point from the "current coastline" list and the algorithm is re-started from that position. Note that this implies an increase in the "water level". If the new $s$ happens to be a topographic saddle point connecting the already flooded terrain to deeper topography that has not yet been flooded, the recursive algorithm will automatically fill the "downstream" basin. This process is repeated until target point $t$ is reached, in which case the last point that was removed from the "current coastline" is a CS. Three special cases are possible:

The starting point $s$ is returned as a CS: This happens if there is an unblocked connection between $s$ and $t$ at level $z(s)$, implying that $z(t) < z(s)$.

The target point $t$ is returned as a CS: This happens if $z(t)$ is greater than the height of the last spill point, indicating that there is an unblocked path between $s$ and $t$ at level $z(t)$.

No CS can be found: This can only happen if all paths between $s$ and $t$ are blocked by undefined values in the DEM.

Fig. 3. Recursive algorithm to determine whether points s and t are connected; see text for additional details.

In the spill algorithm, each point in the DEM has to be visited at most once, requiring $O(k)$ steps. Each visited point may have to be added to the "current coastline" list, which therefore contains at most $k$ elements. The sequence of insertion and removal operations is random, i.e. if the "current coastline" were to be organized as a linear list, either insertion (in case of a sorted list) or removal of the element with the smallest elevation (in case of an unsorted list) would require $O(k)$ operations, yielding a total run-time requirement for spill of $O(k^2)$.

A more efficient solution is possible when a priority queue (Floyd, 1964) is used to store the "current coastline." The priority queue is a special case of a binary tree (Williams, 1964), where the key values (here: elevations) of a node's children are greater than or equal to that of the node itself (Fig. 4). Given $k$ nodes in the tree, the path length between the root and any of the leaves is $\le \log k$. Along each such path the key values increase monotonically. As a consequence, the root always holds the element with the smallest key (here: the lowest point in the "current coastline"). After removing the root node from the queue, the last (in a breadth-first sense) leaf of the tree is moved to the root and the tree is "heapified" as follows: unless the key value of the new node is smaller than that of both its children, its position is exchanged with the smaller of the children. This operation is repeated recursively until the correct position for the new node is found (Fig. 4b), requiring $O(\log k)$ operations. In order to insert a new element into the tree, a new leaf is allocated in a breadth-first sense, ensuring that the depth of the tree remains bounded by $\log k$. The tree is "heapified" by re-sorting the path between the new leaf and the root, which again requires $O(\log k)$ operations (Fig. 4c).

Table 1
Test DEMs, sorted according to size

| DEM | Starting point (s) | z(s) | Target point (t) | z(t) | Position of sill | z_CS | Size (k) |
| RdJ (a) | 19W/19S | −4586 | 8.7W/24S | −4343 | 12.9W/22.2S | −3454 | 6.5 × 10^4 |
| Rainbow (b) | 33.6W/36.4N | −2452 | 34W/36.1N | −2548 | 33.7W/36.4N | −2452 | 1.3 × 10^5 |
| AMAR (c) | 34.6W/35.6N | −2666 | 33.3W/36.6N | −3082 | 33.5W/36.6N | −2091 | 2.1 × 10^6 |
| NATL (d) | 66W/20N | −7534 | 20E/35N | −3233 | 6.1W/35.8N | −337 | 8.9 × 10^6 |

z is elevation in meters. (a) Fig. 2 of this paper. (b) Fig. 2 of Thurnherr and Richards (2001). (c) Fig. 1 of Thurnherr et al. (2002). (d) Fig. 1 of this paper.

Fig. 4. Illustration of priority queue; key values are elevations of DEM points in "current coastline"; see text for additional details. (a) Initial tree, "current coastline". (b) After removal of root element (elevation 1), replacement of root by DEM point with elevation 9 (last node in tree), and "heapification" of tree. (c) After insertion of a new DEM point with elevation 3: first, new element is inserted at last position in the tree and then the tree is "heapified", resulting in new element ending up in the root, because it is associated with smallest elevation.
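The two heap operations described above can be sketched as a small array-backed binary min-heap in Python. This is an illustrative sketch only: the class name `MinHeap` and the key values are hypothetical, and loops replace the recursive formulation used in the text. The children of the node at index `i` are stored at indices `2*i + 1` and `2*i + 2`, so the "last leaf in a breadth-first sense" is simply the last list element.

```python
class MinHeap:
    """Array-backed binary min-heap keyed on elevation."""

    def __init__(self):
        self.items = []

    def push(self, key):
        """Insert key as the last leaf, then re-sort the path to the root."""
        items = self.items
        items.append(key)
        i = len(items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if items[parent] <= items[i]:
                break  # heap property holds along the path
            items[parent], items[i] = items[i], items[parent]
            i = parent

    def pop(self):
        """Remove and return the smallest key (the root)."""
        items = self.items
        root = items[0]
        last = items.pop()  # last leaf in breadth-first order
        if items:
            items[0] = last  # move the last leaf to the root
            i, n = 0, len(items)
            while True:
                child = 2 * i + 1
                if child >= n:
                    break  # node i is a leaf
                if child + 1 < n and items[child + 1] < items[child]:
                    child += 1  # pick the smaller of the two children
                if items[i] <= items[child]:
                    break  # correct position found
                items[i], items[child] = items[child], items[i]
                i = child
        return root


heap = MinHeap()
for elevation in [5, 1, 9, 3, 7]:
    heap.push(elevation)
lowest_first = [heap.pop() for _ in range(5)]
```

Both `push` and `pop` walk a single root-to-leaf path, which is where the $O(\log k)$ bound per operation comes from. Python's standard `heapq` module provides the same operations on a plain list.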

When the "current coastline" is implemented as a priority queue, the spill algorithm requires $O(k \log k)$ operations. In principle it is possible that the CS is not unique, implying that not all unblocked paths between $s$ and $t$ pass through a single point. In order to find additional CSs at the same level, $z(CS)$ is increased by an arbitrary amount. When the spill algorithm is re-started with the modified DEM it will find an alternative sill at $z(CS)$ if one exists. Because it is known, in this case, that there is no sill below $z(CS)$, $z(s)$ can be set to $z(CS)$. While this improves performance in practice, the worst case scenario does not change, i.e. the algorithm still requires $O(k \log k)$ operations.

In addition to the memory required to store the DEM, the spill algorithm requires a "visited" flag for each field ($k$ bits). In the worst case the stack size required for the recursive traversal of the DEM and for the recursive tree operations is proportional to $k + \log k$. The overall memory requirement is therefore up to $5k$.
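The spill procedure of this section can be sketched in Python with the standard `heapq` module as the priority queue. The sketch rests on several assumptions that are not in the original: the DEM is a list of rows, undefined values are stored as `None`, points are `(row, col)` tuples, and an explicit stack replaces the recursive traversal. The function name `spill` follows the text, but the signature is hypothetical.

```python
import heapq


def spill(dem, s, t):
    """Flood the DEM from s until t is reached.

    Returns (sill_point, sill_elevation), or None if every path between
    s and t is blocked by undefined (None) values.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    visited = [[False] * n_cols for _ in range(n_rows)]
    visited[s[0]][s[1]] = True
    # The "current coastline": a heap of (elevation, point) pairs.
    coastline = [(dem[s[0]][s[1]], s)]
    while coastline:
        # The lowest coastline point sets the new water level.
        level, sill = heapq.heappop(coastline)
        if sill == t:
            return sill, level  # special case: t itself is the CS
        stack = [sill]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                        continue
                    if visited[nr][nc] or dem[nr][nc] is None:
                        continue
                    visited[nr][nc] = True
                    if dem[nr][nc] < level:
                        # Below the water level: flooded immediately.
                        if (nr, nc) == t:
                            return sill, level
                        stack.append((nr, nc))
                    else:
                        # At or above the water level: joins the coastline.
                        heapq.heappush(coastline, (dem[nr][nc], (nr, nc)))
    return None  # special case: no CS can be found


# Illustrative DEM: two basins separated by a ridge with a saddle at -2.
dem = [
    [-5, -5, 0, -6, -6],
    [-5, -5, -2, -6, -6],
    [-5, -5, 0, -6, -6],
]
result = spill(dem, (1, 0), (1, 4))
```

For this toy map the call returns the saddle point `(1, 2)` with elevation -2. The three special cases listed above correspond to the first popped point being `s`, to `t` being popped from the coastline, and to the heap running empty.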

3. Oceanographic examples

In order to test the run-time behavior of the two algorithms introduced in Sections 2.2 and 2.3 they were applied to four different DEMs of oceanic topography (Table 1). For comparison, the "current coastline" data structure of the spill algorithm was implemented both as a linear list ($O(k^2)$ runtime) and as a priority queue ($O(k \log k)$ runtime). The algorithms were tested on a Pentium III 1 GHz processor with 512 MB of RAM running the Linux operating system, resulting in the runtimes shown in Table 2. Of particular interest is the fact that spill is faster in practice than the approximation algorithm, in spite of its worse theoretical worst case behavior ($O(k \log k)$ vs. $O(k)$). There is a clear relationship between runtime and map size, but the topology of the DEM also influences the results.

Table 2
Runtime in seconds required for finding first CS on a typical workstation; see text for description of algorithms

| DEM | Approximation, ε = 10 | Approximation, ε = 0.001 | Approximation, ε = 0.001 (s↔t) | spill O(k^2) | spill O(k^2) (s↔t) | spill O(k log k) | spill O(k log k) (s↔t) |
| RdJ | 2 | 13 | 8 | 15 | 6 | 4 (3255) | 2 (1692) |
| Rainbow | 6 | 14 | 4 | 28 | 10 | 1 (2509) | <1 (1506) |
| AMAR | 57 | 136 | 53 | 5021 | 239 | 21 (14902) | 10 (6994) |
| NATL | 1333 | 2793 | 755 | 42 h | 2988 | 535 (82618) | 27 (31466) |

ε is precision of the approximation algorithm (Section 2.2), s↔t indicates exchanged start and target points; maximum number of points in "current coastline" data structure is given in parentheses in rightmost two columns.

4. Discussion

In order to determine the deepest unblocked connection between two deep regions separated by shallower topography automatically and efficiently, we implemented and tested two different algorithms. The first approximates the sill depth in $O(k)$ steps by successive shrinking of an interval bracketing the correct depth, while the second one mimics the flooding of terrain, which requires $O(k \log k)$ steps. To the best of our knowledge, these are the first published algorithms solving the sill problem, which is important in physical oceanography and, presumably, in some other fields as well. In principle, it is possible to further decrease the run-time requirement of the spill algorithm by using an $O(\log \log k)$ implementation of the priority queue (van Emde Boas et al., 1977). Because this implementation is restricted to integer keys and because our current implementation is fast enough for realistic problems, we have not implemented this possibility.

In spite of the fact that the theoretical runtime of the approximation algorithm grows slower than that of the spill algorithm, the latter is faster in practice for the maps considered here (up to $\sim 10^7$ points in the DEM). The primary reason for this is the large magnitude of the constant factor $\log_2(\Delta z/\epsilon)$ (set by the ratio between the depth span of the DEM and the target precision) that is hidden in the O-term. A second reason for the observation that spill performs better than the approximation algorithm in our examples is the fact that spill does not approach the worst case scenario. While the size of the "current coastline" data structure of the spill algorithm is $O(k)$, topological considerations suggest that its size should be much smaller than $k$ in practice. This is confirmed in the case of our tests, which show that the maximum number of elements in the "current coastline" structure typically lies between 1% and 5% of $k$ (see Table 2). The practical runtimes of both algorithms depend on the height of the starting point $z(s)$. Elements of the DEM that are deeper than the starting point do not enter in the time-consuming parts of the algorithms (they are marked as visited before starting the approximation algorithm and they cannot enter the "current coastline" data structure of spill). Therefore, it is generally advantageous to swap the starting and target points if the original starting point is lower than the original target.

If the size of the DEMs is further increased (e.g. due to higher spatial resolution of future data sets), the approximation algorithm will eventually become faster than spill. Therefore, for very large problems it may be advantageous to first find the height of the CS using the approximation algorithm and then use $z(CS)$ to initialize spill, in which case the runtime is reduced to $O(k \log c)$, where $c$ is the number of points at the same height as the CS.

References

Cobham, A., 1964. The intrinsic computational difficulty of functions. In: Proceedings of the 1964 Congress for Logic, Mathematics and the Philosophy of Science. Amsterdam, North Holland, pp. 24–30.

Cormen, T.H., Leiserson, C.E., Rivest, R.L., 2001. Introduction to Algorithms, second ed. MIT Press, Cambridge, MA, 1180pp.

Edmonds, J., 1965. Paths, trees and flowers. Canadian Journal of Mathematics 17, 449–467.

Floyd, R.W., 1964. Algorithm 245 (treesort). Communications of the ACM 7, 701.

Hogg, N.G., Zenk, W., 1997. Long period changes in the bottom water flowing through Vema Channel. Journal of Geophysical Research 102, 15639–15646.

Mercier, H., Weatherly, G.L., Arhan, M., 2000. Bottom water through flows at the Rio de Janeiro and Rio Grande fracture zones. Geophysical Research Letters 27, 1503–1506.

Saunders, P.M., Francis, T.J., 1985. The search for hydrothermal sources on the Mid-Atlantic Ridge. Progress in Oceanography 14, 527–536.

Smith, W.H.F., Sandwell, D.T., 1997. Global seafloor topography from satellite altimetry and ship depth soundings. Science 277, 1956–1962.

Thurnherr, A.M., Richards, K.J., 2001. Hydrography and high-temperature heat flux of the Rainbow hydrothermal site (36°14′N, Mid-Atlantic Ridge). Journal of Geophysical Research 106, 9411–9426.

Thurnherr, A.M., Richards, K.J., German, C.R., Lane-Serff, G.F., Speer, K.G., 2002. Flow and mixing in the rift valley of the Mid-Atlantic Ridge. Journal of Physical Oceanography 32, 1763–1778.

Thurnherr, A.M., St. Laurent, L.C., Speer, K.G., Toole, J.M., Ledwell, J.R., 2005. Mixing associated with sills in a canyon on the Mid-Atlantic Ridge flank. Journal of Physical Oceanography 35, 1370–1381.

van Emde Boas, P., Kaas, R., Zijlstra, E., 1977. Design and implementation of an efficient priority queue. Mathematical Systems Theory 10, 99–127.

Whitehead, J.A., 1998. Topographic control of oceanic flows in deep passages and straits. Reviews of Geophysics 36, 423ff.

Williams, J.W.J., 1964. Algorithm 232 (heapsort). Communications of the ACM 7, 347–348.