14
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000. Digital Object Identifier 10.1109/ACCESS.2017.DOI Stability and Statistical Inferences in the Space of Topological Spatial Relationships PADRAIG CORCORAN, CHRISTOPHER B. JONES. School of Computer Science & Informatics, Cardiff University, Wales, UK. Corresponding author: Padraig Corcoran (e-mail: [email protected]). ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship, represents a fundamental research problem in many domains including Artificial Intelligence (AI) and Geographical Information Science (GIS). Real world data is generally finite and exhibits uncertainty. Therefore, when attempting to model topological relationships from such data it is useful to do so in a manner which is both stable and facilitates statistical inferences. Current models of the topological relationships do not exhibit either of these properties. We propose a novel model of topological relationships between objects in the Euclidean plane which encodes topological information regarding connected components and holes. Specifically, a representation of the persistent homology, known as a persistence scale space, is used. This representation forms a Banach space that is stable and, as a consequence of the fact that it obeys the strong law of large numbers and the central limit theorem, facilitates statistical inferences. The utility of this model is demonstrated through a number of experiments. INDEX TERMS Spatial Relationships, Topology, Stable, Statistical Inference I. INTRODUCTION There are many real world scenarios where one is required to model the spatial relationship between objects [1]. A meteo- rologist may wish to model the spatial relationship between different environmental variables toward understaning cli- mate change [2]. A city planner may wish to model the spa- tial relationship between noise pollution produced by heavy traffic and residential areas. Models of spatial relationships have also been employed when designing intelligent robotic systems [3]. There exist many models of spatial relationships and these models typically focus exclusively on modelling metric, order or topological properties of the relationship in question [4]. Metric properties of a spatial relationship relate to things such as distance, size and orientation. Order properties of a spatial relationship relate to things such as the partial and total order of objects as described by prepositions such as in front of, behind, above, and below [5]. Finally, topological properties of a spatial relationship relate to things which are invariant under continuous transformations of the ambient space such as a homotopy. An indicator of whether or not two objects intersect is an example of a topological property. In this article we use the term topological relation- ship when referring to the topological properties of a given spatial relationship. Real world data are generally finite and exhibit uncertainty. Therefore, when attempting to model topological relation- ships from such data it is useful to do so in a manner which is both stable and facilitates statistical inferences. Informally, a model is stable if a small change in the input data produces at most a small change in the resulting model [6]; a formal definition of Lipschitz stability, which is a type of stability, is provided in Appendix VI-A. Given the presence of data uncertainty, model stability is necessary for robustness. Statistics is an effective paradigm for making inferences given data regarding phenomena which cannot be directly observed. A model which facilitates statistical inferences with respect to topological relationships is useful in many contexts. For example, consider a sensor network with a small number of sensors where each is constantly moving and capable of detecting the presence or absence of different objects whose locations remain constant over time. Given the small size of such a network, at a given time, object locations and in turn topological relationships cannot be precisely modelled from sensor measurements. A solution to this problem would be to perform a statistical inference whereby the topological relationships in question VOLUME 4, 2016 1

Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.

Digital Object Identifier 10.1109/ACCESS.2017.DOI

Stability and Statistical Inferences in theSpace of Topological SpatialRelationshipsPADRAIG CORCORAN, CHRISTOPHER B. JONES.School of Computer Science & Informatics, Cardiff University, Wales, UK.

Corresponding author: Padraig Corcoran (e-mail: [email protected]).

ABSTRACT Modelling topological properties of the spatial relationship between objects, known as thetopological relationship, represents a fundamental research problem in many domains including ArtificialIntelligence (AI) and Geographical Information Science (GIS). Real world data is generally finite andexhibits uncertainty. Therefore, when attempting to model topological relationships from such data it isuseful to do so in a manner which is both stable and facilitates statistical inferences. Current modelsof the topological relationships do not exhibit either of these properties. We propose a novel model oftopological relationships between objects in the Euclidean plane which encodes topological informationregarding connected components and holes. Specifically, a representation of the persistent homology, knownas a persistence scale space, is used. This representation forms a Banach space that is stable and, as aconsequence of the fact that it obeys the strong law of large numbers and the central limit theorem, facilitatesstatistical inferences. The utility of this model is demonstrated through a number of experiments.

INDEX TERMS Spatial Relationships, Topology, Stable, Statistical Inference

I. INTRODUCTION

There are many real world scenarios where one is required tomodel the spatial relationship between objects [1]. A meteo-rologist may wish to model the spatial relationship betweendifferent environmental variables toward understaning cli-mate change [2]. A city planner may wish to model the spa-tial relationship between noise pollution produced by heavytraffic and residential areas. Models of spatial relationshipshave also been employed when designing intelligent roboticsystems [3]. There exist many models of spatial relationshipsand these models typically focus exclusively on modellingmetric, order or topological properties of the relationshipin question [4]. Metric properties of a spatial relationshiprelate to things such as distance, size and orientation. Orderproperties of a spatial relationship relate to things such as thepartial and total order of objects as described by prepositionssuch as in front of, behind, above, and below [5]. Finally,topological properties of a spatial relationship relate to thingswhich are invariant under continuous transformations of theambient space such as a homotopy. An indicator of whetheror not two objects intersect is an example of a topologicalproperty. In this article we use the term topological relation-ship when referring to the topological properties of a given

spatial relationship.

Real world data are generally finite and exhibit uncertainty.Therefore, when attempting to model topological relation-ships from such data it is useful to do so in a mannerwhich is both stable and facilitates statistical inferences.Informally, a model is stable if a small change in the inputdata produces at most a small change in the resulting model[6]; a formal definition of Lipschitz stability, which is atype of stability, is provided in Appendix VI-A. Given thepresence of data uncertainty, model stability is necessary forrobustness. Statistics is an effective paradigm for makinginferences given data regarding phenomena which cannotbe directly observed. A model which facilitates statisticalinferences with respect to topological relationships is usefulin many contexts. For example, consider a sensor networkwith a small number of sensors where each is constantlymoving and capable of detecting the presence or absenceof different objects whose locations remain constant overtime. Given the small size of such a network, at a giventime, object locations and in turn topological relationshipscannot be precisely modelled from sensor measurements. Asolution to this problem would be to perform a statisticalinference whereby the topological relationships in question

VOLUME 4, 2016 1

Page 2: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

are modelled at n distinct time steps, where each of these nmodels is considered to be an independent sample from thesampling distribution of the model, and the expected value ofthese models is approximated. Such an approximation couldbe made using the sample mean if the model in questionexhibited the statistical property of the strong law of largenumbers where by the sample mean approaches the expectedvalue as the number of samples increases. Current models ofthe topological relationships are not stable and do not exhibitthose statistical properties necessary for performing manyuseful statistical inferences. Proposing a model of topologicalrelationships which overcomes these limitations representsthe theme of this article.

In this article we consider the problem of modelling topo-logical relationships between objects in the ambient space R2

given a finite set of points S in that space and the ability todetect the presence or absence of different objects at each ofthese points. The finite size of S results in uncertainty withrespect to object locations and the degree of this uncertaintyis a function of the size of S. This is equivalent to the sensornetwork problem discussed above where S corresponds tothe set of sensors. We propose a novel model of topolog-ical relationships between objects which encodes topologi-cal information regarding connected components and holes.Specifically, a representation of the persistent homology,known as a persistence scale space, is used. This modelforms a Banach space which is stable whereby a smallchange in the set S, as measured by the Hausdorff distance,produces at most a small change in the resulting model. Thismodel also exhibits the following statistical properties whichfacilitate statistical inferences. It exhibits the strong law oflarge numbers described above. It also exhibits the centrallimit theorem whereby the distribution of the sample meanconverges to a normal distribution centred at the expectedvalue as the number of samples increases.

The layout of this article is as follows. In section IIwe review important works on modelling topological rela-tionships. In section III the proposed model of topologicalrelationships is presented. In section IV the accuracy andutility of this model is demonstrated through a number ofexperiments. Finally in section V conclusions from this workand possible future research directions are discussed.

II. RELATED WORKSThere exist many models of topological relationships withtwo of the most cited by a significant margin being the Inter-section Model (IM) [7] and the Region Connection Calculus(RCC) [8]. These models assume object locations are knownprecisely and these are modelled as subsets of R2. In theIM, a number of subsets of the ambient space are consideredwhere each is a binary set relation of object interiors, bound-aries and exteriors. For example, one subset considered isthe intersection of object interiors. Having determined thesesubsets the spatial relationship in question is modelled byevaluating whether or not these subsets equal the null setor their dimension. The success of the IM can be attributed

in part to the fact that it models important topological fea-tures in a simple and interpretable manner. In fact, manyinstances of the model correspond to spatial relationshipswhich can be described using the natural language termssuch as contains or disjoint; here a natural language termis an English language description which does not refer tobinary set relations and mathematical topology concepts. Forexample, if the intersection of the interiors and boundaries oftwo objects is the null set the spatial relationship in questioncan be described using the natural language term disjoint.The IM is described in greater detail in Appendix VI-A wherewe also prove this model to be unstable. The RCC modelstopological relationships between objects as one of five oreight topological relationships such as disconnected (DC)and externally connected (EC). The IM and RCC modelsare in some sense equivalent; after accounting for physicalconstraints, there are exactly eight feasible instances of theIM and these correspond to the five or eight RCC topologicalrelationships [9]. In their original form, both the RCC andIM assume the objects in question equal single connectedcomponents. A number of generalisations of the IM and RCChave been proposed which consider objects equalling one ormore connected components [10].

As stated previously, when attempting to model topolog-ical relationships from data which exhibits uncertainty it isimportant to do so in a manner which is both stable andfacilitates statistical inferences. Worboys [11] defined thefollowing five factors which cause uncertainty in spatial data.Incompleteness due to lack of information; Inconsistencyarising from conflicts in information; Vagueness resultingfrom objects not having crisp or sharp boundaries; Impre-cision resulting from limits in the resolution at which mea-surements are made or stored; Errors which are a conse-quence of deviation from true values. If a spatial relationshipis represented using natural language terms this may alsoresult in uncertainty; for example, if a spatial relationship isrepresented using the terms near or far there is uncertaintywith respect to the distance between the objects in question[12]. These latter terms correspond to vague qualifications ofthe topological relationship of disjoint.

Many models of topological relationships have been pro-posed which model uncertainty. Generally this is achievedby generalising the IM or RCC models in some way. Forexample [13]–[15] generalise these models using fuzzy settheory to offer robustness to vagueness. A number of gen-eralisations of the IM and RCC have been proposed whichmodel spatial uncertainty with respect to object locations.Tøssebro et al. [16] proposed a probabilistic model where aprobability distribution is defined over the true location ofeach object. Clementini et al. [17]–[19] proposed to modelobjects using broad boundaries where an object boundaryis represented by a region corresponding to all its possiblelocations. Bejaoui et al. [20] proposed to model each objectusing two components; one corresponding to the minimaland one corresponding to the maximal possible extent ofthe object. In each of these works the authors generalised

2 VOLUME 4, 2016

Page 3: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

(a) (b) (c) (d)

FIGURE 1: Objects A and B are illustrated in (a) using the colours red and blue respectively. For a set S containing 6,000points, the subset of the set of points contained in object B are illustrated in (b). The subsets of the points S contained in theintersection and union of A and B are illustrated in (c) and (d) respectively.

the IM to model topological relationships between objectsmodelled in the manner in question. Cohn et al. [21] proposeda similar ‘egg-yolk’ model for objects and generalised theRCC model [8] to model topological relationships betweenobjects modelled in this manner.

Although the models described above model spatial un-certainty with respect to object locations, they are not neces-sarily stable and do not consider the problem of performingstatistical inferences. This does not mean that these modelscould not potentially be generalised to perform such infer-ences. For example, the metric refinements proposed by [22]could be used to measure the degree to which a property of atopological relationship exists.

A number of generalisations of the IM and RCC have beenproposed which model topological relationships between ob-jects whose locations, and in turn topological relationships,change as a function of time [23]–[25]. We do not considerthis aspect of modelling topological relationships.

III. MODEL OF TOPOLOGICAL RELATIONSHIPSIn this article we consider the following instance of theproblem of modelling topological relationships. We assumethe existence of two objectsA andB in the ambient space R2

for which we wish to model the corresponding topologicalrelationship. Furthermore, we assume the maximum of thelength and width of the axis aligned minimum bounding boxcontaining both objects is equal to 10. The spatial locationsof the objects is unknown and cannot be directly observed.Instead we assume a finite set S of points contained in thebounding box is known. Furthermore, we assume the abilityto detect the presence or absence of the objects A and B ateach of those points and, in turn, determine those subsets ofS contained in A and B. This corresponds to performingrejection sampling of finite sets of points contained in Aand B. Rejection sampling is a commonly used method fordrawing samples from one space given the ability to drawsamples from another [26]. To illustrate, consider the objectsA and B illustrated in Figure 1(a) which form a runningexample in this article. For a given set S of size 6,000,the subset of S contained in the object B of Figure 1(a) isillustrated in Figure 1(b). The finite size of the set S resultsin uncertainty with respect to the locations of A and B. Thedegree of such uncertainty varies as a function of the size of

S. If the size of S is small the size of the subsets contained inA and B is also small and there is in turn a greater degree ofuncertainty with respect to the locations of these objects. Onthe other hand, if the size of S is larger these object locationscan be more precisely determined. Given this uncertainty,when attempting to model topological relationships fromsuch data it is important to do so in a manner which is bothstable and facilitates statistical inferences.

In this article we propose a model of topological rela-tionships which goes some way toward achieving the abovegoal. The construction of the proposed model consists ofthe following three steps. In the first step a number ofsubsets of the set S are considered where each is a binaryset relation of the objects A and B. This step is similarto the IM model although the subsets considered equal afinite number of points as opposed to subsets of R2 whichcontain an infinite number of points. Unlike the IM model,the proposed model does not model the spatial relationshipby evaluating whether or not these subsets equal the nullset or their dimension. Instead, in the second step of theproposed model, the persistent homology of each subset iscomputed to give a corresponding persistence diagram. Thisis a representation which describes the topology of the subsetin terms of the number of k dimensional holes it containsplus the range of scales across which these holes persist(note that a zero dimensional hole corresponds to a pathconnected component). In the third step, this information isin turn mapped to a function space representation known asa persistence scale space which forms a Banach space thatis stable, obeys the strong law of large numbers and thecentral limit theorem. These properties are a consequenceof the fact that this representation exploits the insight thatthose k dimensional holes which do not persist over a largerange of scales are not statistically significant and can beconsidered topological noise. Fasy et al. [27] presents aformal definition of statistical significance in the context ofpersistence diagrams. Given these properties, the proposedmodel represents a more suitable platform for performingstatistical inferences than existing models of topological re-lationships. Each of these steps is described in turn in thefollowing three subsections.

VOLUME 4, 2016 3

Page 4: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

A. SUBSETSWhen modelling the topological relationship between objectsA and B we do not wish to model global topological proper-ties such as the number of k dimensional holes both objectscontain. Instead, we wish to model topological propertiesof the spatial relationship in question. Toward this goal, wepropose to model the topological properties of subsets ofthe set S where each subset is a binary set relation of theobjects in question. This corresponds to sampling from thesubsets in question by performing rejection sampling. Forexample the intersection and union of those points containedin the objects of Figure 1(a) are illustrated in Figure 1(c) andFigure 1(d) respectively. The consideration of such subsetsis motivated by the insight that the topological propertiesof these subsets model distinct topological properties of thespatial relationship in question. For example, if that subsetcorresponding to the binary set relation of intersection con-tains zero connected components, the spatial relationship inquestion may be described using the natural language termdisjoint.

The specific subsets to consider depend on the topologicalproperties one wishes to model and in turn the problem onewishes to solve. For example, if one wishes to perform ageneral clustering of topological relationships, a reasonablesolution would be to consider a large number of subsetswhich capture a variety of topological properties and sub-sequently perform a clustering based on these properties. Onthe other hand, if one wishes to determine if a given topo-logical relationship equals that corresponding to the naturallanguage term intersect, a reasonable solution would be toconsider the single subset corresponding to the binary setrelation of intersection and determine whether or not thissubset contains zero 0 dimensional holes. Similarly, if onewishes to determine if a given topological relationship equalsthat corresponding to the natural language term contains, areasonable solution would be to consider the single subsetcorresponding to the binary set relation of exclusive or anddetermine whether or not this subset contains one 0 dimen-sional hole and one 1 dimensional hole. Determining neces-sary and sufficient conditions for instances of the proposedmodel to correspond to various natural language terms isbeyond the scope of this work.

B. PERSISTENCE DIAGRAMIn this step the persistent homology of each subset is com-puted to give a corresponding set of persistence diagrams.This section only briefly introduces the concept of persistenthomology and its computation. More technical details arecontained in Appendix VI-B.

Each persistence diagram is a representation which de-scribes the topology of the subset in terms of the number of kdimensional holes it contains plus the range of scales acrosswhich these holes persist. In this context scale corresponds tothe radius of a set of balls centred at each point in the subset.As one increases the value of this radius holes may bothappear and disappear. More formally, a persistence diagram

for k dimensional holes is a multiset of points (i,j) in thespace (i,j) ∈ R2, i ≤ j where a point (i,j) indicates thata k dimensional hole appeared at scale i and disappearedat scale j. The disappearance of a k dimensional hole maybe the consequence of its size becoming zero or it mergingwith another k dimensional hole. The persistence of thek dimensional hole in question is the value j − i. If a kdimensional hole appears at scale i but does not disappear,it is represented in the persistence diagram by a point (i,u)where u is a upper bound on the scale at which k dimensionalholes may disappear. Since we assume the maximum of thelength and width of the axis aligned minimum bounding boxcontaining both objects is equal to 10, we set the value of uequal to 7.6 for all k dimensional holes. This is a valid upperbound given the fact that the objects in question are scaledto be contained in a specified bounding box and a unionof balls centered at the points is considered (see AppendixVI-B for details). It is important to note that some authorsomit from persistence diagrams those points correspondingto k dimensional holes which do not disappear [28]. Inthe context of the current problem, it is important not toomit such points because doing so would remove the abilityto differentiate between a number of important cases. Thisincludes differentiating between a subset containing zero 0dimensional holes and a subset containing one 0 dimensionalholes.

The persistent homology computation is performed in twosteps. In the first step the subset in question is representedusing a combinatorial representation known as a filtration.The persistence diagrams are subsequently computed as afunction of this filtration. The technical details of this compu-tation are provided in Appendix VI-B. Since we assume theambient space is R2 it is only necessary to consider 0 and 1dimensional holes where a 0 dimensional hole corresponds toa path connected component. That is, for each subset speci-fied in section III-A two corresponding persistence diagramsare computed; one for 0 dimensional holes and one for 1dimensional holes.

The persistence diagram corresponding to the 0 dimen-sional holes of that subset in Figure 1(c) is illustrated inFigure 2(a). In this diagram the x-axis represents the scale atwhich 0 dimensional holes appear while the y-axis representsthe scale at which 0 dimensional holes disappear. Each pointin the persistence diagram is represented by a red pointabove the diagonal which is in turn represented by a blueline. Note that points lying closer to the diagonal have lowerpersistence. This persistence diagram contains two pointswhere the corresponding persistence values are relativelylarge. All other points in this persistence diagram lie closeto the diagonal and therefore their corresponding persistencevalues are relatively small (note that, there are multiple pointsin this category but they all have similar coordinates andtherefore appear to be a single point in the figure). Thesetwo significant points correspond to the two clusters in Figure1(c). Note that, the point at location (0,1) disappears when thecluster it corresponds to disappears when it merges with the

4 VOLUME 4, 2016

Page 5: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

other cluster following increase in the scale value. Similarly,the persistence diagram corresponding to the 1 dimensionalholes of that subset in Figure 1(d) is illustrated in Figure2(b). This persistence diagram contains one point where thecorresponding persistence value is relatively large. This pointcorresponds to the single large hole in Figure 1(d).

It is important to consider the persistence of k dimensionalholes for two reasons. Firstly, the number of k dimensionalholes and their persistence encodes important topologicalinformation regarding the subset in question and in turn thetopological relationship in question. Consider the persistencediagram in Figure 2(a) corresponding to the 0 dimensionalholes of that subset in Figure 1(c). This persistence diagramcontains two points where their persistence values are rela-tively large. This is clearly important information regardingthe topological relationship in question. It is important hereto recall that the IM model only encodes if subsets equal thenull set or their dimension. Secondly, the persistence diagramrepresentation is stable whereby a small change in a givensubset, as measured by the Hausdorff distance, produces atmost a small change in the resulting persistence diagram, asmeasured by the bottleneck distance [6], [29].

C. FUNCTION SPACE REPRESENTATIONAs stated in the introduction of this article, the goal ofthis research is the development of a model of topologicalrelationships which is both stable and facilitates statistical in-ferences. The bottleneck and Wasserstein distance functionsmay be used to compute the distance between two persistencediagrams [30]. Both these distance functions are stable withrespect to the locations of objects. However they do notprovide a way of computing a mean persistence diagram. Wetherefore convert each persistence diagram into an alternativefunction space representation which facilitates this. Note thata function space is a space where the objects in that space arefunctions [31].

Let D be the space of persistence diagrams. Let p = (b,d)denote a point in a persistence diagram F ∈ D and p = (d,b)denote its mirror image across the diagonal [32], [33]. Fur-thermore, let Ω be the space x = (x1,x2) ∈ R2, x2 ≥ x1.In this work we employ the map Φσ : D → L2(Ω) definedin Equation 1. Here L2(Ω) is a Banach Space consistingof real-valued functions on Ω [31]; that is, a vector spaceof real-valued functions on Ω which is equipped with anorm. The norm in question is the L2-norm which is denoted‖.‖2. Conceptually this map places a Gaussian function ateach point in the corresponding persistence diagram whilesuppressing those Gaussian functions which lie closer to thediagonal. The scale of the Gaussian functions in question isequal to the parameter σ which we set equal to the value0.3. Figures 2(c) and 2(d) illustrate the result of applying themap in Equation 1 to those persistence diagrams illustratedin Figures 2(a) and 2(b) respectively.

Although the persistence diagrams in Figures 2(a) and 2(b)encode the same information as the function space represen-tations of Figures 2(c) and 2(d) respectively, they are different

types of mathematical objects. A persistence diagram is amultiset of points while a function space representation is afunction.

Φσ(F ) : Ω→ R, x 7→ 1

4πσ

∑p∈F

e−‖x−p‖2

4σ − e−‖x−p‖2

(1)The map in Equation 1 was originally proposed by [32],

[33]. The authors proved that the function space producedby this map facilitates statistical inferences. Specifically theyproved the following three facts. This space is stable wherebya small change in the set S as measured by the Hausdorffdistance produces at most a small change in the resultingmodel. It obeys the strong law of large numbers; that is, asample mean converges almost surely to the expected value.Furthermore, this convergence obeys the central limit theo-rem; a reader unfamiliar with concepts relating to probabilityin a Banach space may consult [34] for details.

IV. EXPERIMENTSIn this section we present four experiments which demon-strate the accuracy of the proposed model of topologicalrelationships and its ability to facilitate a number of importantstatistical inferences. In a first experiment we demonstrate theaccuracy of the proposed model. In a second experiment wedemonstrate the model can correctly infer a topological rela-tionship given a set of samples from the sampling distributionof the model. In a third experiment we demonstrate the modelcan perform a statistical test with the null hypothesis thattwo topological relationships are equal against the alternativehypothesis that they are different. In a final experiment wedemonstrate the model can perform the data mining tasks ofclustering and retrieval of similar topological relationships.These four experiments are described in turn in the followingfour subsections.

A. MODEL ACCURACYThis section demonstrates the accuracy of the proposedmodel with respect to representing the topology of a givenset of points. This is achieved by considering sets of pointsfor which accurate ground truth persistence diagrams can beinferred and comparing the persistence diagram computed bythe proposed model to this ground truth.

Consider the set of points illustrated in Figure 3(a) whichequals 1,000 points uniformly sampled on a circle centredat the point (1,1) with radius equal to 1. The correspondingpersistence diagrams for 0 and 1 dimensional holes areillustrated in Figures 3(b) and 3(c) respectively. The persis-tence diagram for 0 dimensional holes accurately contains asingle significant point at the location (0,7.6) correspondingto the single significant connected component which neverdisappears. Here a point is determined significant if it doesnot lie close to the diagonal and, in turn, its persistencevalue is not close to zero. Note that, 7.6 is the value ofan upper bound described in Section III-B. The persistence

VOLUME 4, 2016 5

Page 6: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

(a) (b) (c) (d)

FIGURE 2: The persistence diagram corresponding to the 0 dimensional holes (connected components) of that subset in Figure1(c) is illustrated in (a). The persistence diagram corresponding to the 1 dimensional holes of that subset in Figure 1(d) isillustrated in (b). The function space representations of the persistence diagrams in (a) and (b) are illustrated in (c) and (d)respectively. In these figures the x-axis represents the scale at which k dimensional holes appear while the y-axis represents thescale at which k dimensional holes disappear.

diagram for 1 dimensional holes accurately contains a singlesignificant point at the location (0,1). The value of 1 in thesecond coordinate of this point accurately indicates that thehole formed by the circle disappears when the radius of theunion of balls centred at the points is greater than or equal tothe value 1.

Next, consider the set of points illustrated in Figure 4(a)which equals 2,000 points uniformly sampled on two cir-cles centred at the points (1,1) and (5,1) with radius equalto 1. The corresponding persistence diagrams for 0 and 1dimensional holes are illustrated in Figures 4(b) and 4(c)respectively. The persistence diagram for 0 dimensional holesaccurately contains two significant points at the locations(0,1) and (0,7.6) corresponding to the two significant con-nected components. The value of 1 in the second coordinateof the point (0,1) accurately indicates that the two connectedcomponents merge when the radius of balls centred at thepoints is greater than or equal to the value 1. The persistencediagram for 1 dimensional holes accurately contains twosignificant points at the location (0,1) corresponding to thetwo significant 1 dimensional holes of radius 1.

B. INFERRING EXPECTED VALUE

Consider the situation where one is attempting to model atopological relationship given a set of samples from the sam-pling distribution of the model. In this situation a reasonablesolution is to approximate the expected value of the modelfrom the samples. In the proposed model samples from thesampling distribution correspond to function space represen-tations. Owing to the fact that the proposed model obeysthe strong law of large numbers, the mean of such samplesconverges to its expected value. To illustrate this conceptof a mean consider the two function space representationsdisplayed in Figures 5(a) and 5(b). Note that both share asingle peak in the same location. The mean of these functionspace representations is displayed in Figures 5(c). In this the

height of the shared peak is maintained while the heights ofthe other peaks are suppressed.

To demonstrate that the proposed model facilitates theabove inference of estimating the expected value consideragain the topological relationship illustrated in Figure 1(a).For this relationship 10 independent sets S1, . . . , S10 ofpoints were sampled from the ambient space where each setcontains 5,000 points.

For each Si the persistence diagram Fi corresponding tothe 1-homology group of the union of points contained ineach object was computed. The mapping of Equation 1 wasin turn applied to each Fi to give the corresponding functionspace representations (Φσ(F1), . . . ,Φσ(F10)). Figures 6(a),6(b) and 6(c) illustrate three of these representations.

The function space representations (Φσ(F1), . . . ,Φσ(F10))correspond to samples from the sampling distribution ofthe model. Computing the mean of these samples reducesto applying the map of Equation 1 to the union of pointsin the persistence diagrams F1, . . . , F10 and normalizing;that is, 1

10Φσ(F1 ∪ · · · ∪ F10). The result of this mapping isillustrated in Figure 6(d). Owing to the fact that the proposedmodel obeys the strong law of large numbers, this mappingconverges to its expected value. Figure 6(d) implies that thereexists a single one dimensional hole where the correspondingpersistence value is relatively large. Examining the originaltopological relationship in Figure 1(a) demonstrates this tobe correct; that is, the union of both objects contains a singlelarge hole.

C. TWO-SAMPLE HYPOTHESIS TESTGiven the fact that our model forms a Banach space, thenorm in this space induces a metric. This metric may beused to measure the distance between samples from thesampling distributions of different topological relationships.[35] proposed a method for computing the distance betweentwo instances of the IM model. However this distance func-

6 VOLUME 4, 2016

Page 7: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

(a) (b) (c)

FIGURE 3: For the set of points illustrated in (a), the corresponding persistence diagrams for 0 and 1 dimensional holes areillustrated in (b) and (c) respectively.

(a) (b) (c)

FIGURE 4: For the set of points illustrated in (a), the corresponding persistence diagrams for 0 and 1 dimensional holes areillustrated in (b) and (c) respectively.

(a) (b) (c)

FIGURE 5: Two function space representations are displayed in (a) and (b). The mean of these is displayed in (c). In thesefigures the x-axis represents the scale at which k dimensional holes appear while the y-axis represents the scale at which kdimensional holes disappear.

VOLUME 4, 2016 7

Page 8: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

(a) (b) (c) (d)

FIGURE 6: The function space representations of three samples drawn from a topological relationship are illustrated in (a)-(c).The function space representation of the mean of ten such samples is illustrated in (d). In these figures the x-axis represents thescale at which 1 dimensional holes appear while the y-axis represents the scale at which 1 dimensional holes disappear.

tion returns integers in the interval [0,8] and therefore onlyprovides a very coarse measure of distance.

Given the distance between two samples from the sam-pling distributions of two distinct topological relationships,one can perform a statistical test with the null hypothesis thatthe topological relationships are equal against the alternativehypothesis that they are different. This is known as the two-sample problem [34], [36]. There exist many contexts forwhich it is necessary to perform such a statistical test. Forexample, given two distinct meteorological phenomena suchas two hurricanes, a meteorologist may want to perform ahypothesis test to determine if the topological relationshipbetween the spatial extent of the hurricane and that of someother potentially related environmental phenomenon was thesame in both cases.

In order to demonstrate such a statistical test we consideredthe topological relationships in Figures 7(a) and 7(b) and em-ployed the bootstrap hypothesis test [37]. These topologicalrelationships are different in the sense that the union of theobjects in Figure 7(a) forms a single connected componentwith a single 1 dimensional hole while the union of theobjects in Figure 7(b) forms a single connected componentwith zero 1 dimensional holes. Note that the IM model doesnot distinguish between these topological relationships.

Let Sa and Sb be single samples from the sampling dis-tributions of the topological relationships in Figures 7(a) and7(b) respectively where the cardinality of both sets is n. ForSa and Sb we computed the function space representationsof their corresponding 1-homology groups. Recall that, the1-homology group encodes the number of 1 dimensionalholes present plus their persistence values. This particularhomology group is appropriate for distinguishing betweenthe topological relationships in question but may not beappropriate in other contexts. Using the metric induced bythe norm of the function space representation, the distancebetween the function space representations in question wascomputed. Toward determining if this distance could be usedto accept the null hypothesis (i.e. that the relationships are

(a) (b)

FIGURE 7: A statistical test may be used to determine if thetopological relationships illustrated are equal given a samplefrom each topological relationship.

equal) we computed the bootstrap distribution under the nullhypothesis [37]. That is, we sampled with replacement twosets S1

a and S2b from Sa and computed the distance between

the corresponding function space representations. This stepwas repeated 1000 times to form the bootstrap distribution.The null hypothesis was then rejected with p-value equal tothe number of distances in the bootstrap distribution greaterthan the distance between Sa and Sb.

This hypothesis test was repeated for varying values ofn; that is, the cardinality of Sa and Sb. For n equal to500, 1000, 2000 and 4000, the null hypothesis was rejectedwith a p-value of 0.21, 0.03, 0.01 and 0.00 respectively.This result demonstrates that, given sufficient points in theambient space, the proposed topological relation may be usedto test the hypothesis that two topological relationships areequal.

D. DATA MINING

Given that the proposed model allows distances betweendifferent topological relationships to be computed, one canuse this model to perform a number of data mining tasks.In this section we describe how the proposed model may beused to perform clustering and retrieval of similar topologicalrelationships. There are many contexts where performingsuch tasks would be useful. For example, consider a pair of

8 VOLUME 4, 2016

Page 9: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

environmental variables measured daily over an entire year.Toward summarisation, a meteorologist may be interested indetecting clusters of topological relationships which occurreddaily between these variables. These clusters could in turnbe used to explain broad weather conditions which occurredduring the year in question.

Toward performing clustering and retrieval we generated200 sets S corresponding to distinct topological relationshipsusing the following approach. First a pair of simple polygonscorresponding to A and B as defined in section III wererandomly generated using the 2-opt Moves method of [38].These polygons each contained between four and ten pointsand were initially centred at the coordinates (0,0). The loca-tion of the polygon B was translated by adding a constantto the second coordinate of each of its points. Figure 8 dis-plays four pairs of polygons generated using this approach.For each of these pairs of polygons, a corresponding set Swas generated by uniformly sampling 6000 points from thebounding box containing A and B.

For each of the sets S generated the topological relation-ship in question was modelled using the proposed model.Specifically, we considered the binary set relations of union(A ∪ B), intersection (A ∩ B), symmetric difference orexclusive or ((A \ B) ∪ (B \ A)) and both relative com-plements ((A \B) and (B \A)). Next, for each of these fivesubsets we computed the function space representations oftheir corresponding 0- and 1-homology groups. This gave tenindividual function space representations for each topologi-cal relationship where each describes a different aspect of thetopological relationship in question. To combine these spacesinto a single space we formed the direct sum of the spaceswith the direct sum norm [39]. Using this norm we com-puted the pair-wise distances between the 200 topologicalrelationships and performed clustering using the k-medoidsalgorithm [40]. This algorithm is an iterative method whichtakes as input a single parameter k corresponding to thenumber of clusters and returns the cluster centres found. Herea cluster centre corresponds to a single representative, i.e. apair of objects, of an entire cluster.

Figure 8 displays the pairs of polygons corresponding tothe cluster centres found when the k-medoids algorithm wasrun with parameter k equal to 4. We can understand whyour model determined those topological relationships to besignificantly different and in turn belonging to different clus-ters by examining the corresponding persistence diagrams.The persistence diagrams corresponding to the connectedcomponents (i.e. the 0 dimensional holes) of the intersectionsubsets of the pairs of polygons in Figure 8(a) and 8(b)are illustrated in Figure 9(a) and 9(b) respectively. Thesepersistence diagrams respectively contain one and zero pointswhere the corresponding persistence values are relativelylarge. This is because the intersection of the pairs of polygonsin Figure 8(a) and Figure 8(b) contain one and zero connectedcomponents respectively.

The persistence diagrams corresponding to the 1 dimen-sional holes of the union subsets of the pairs of polygons in

Figure 8(c) and 8(d) are illustrated in Figure 9(c) and 9(d)respectively. These persistence diagrams respectively containone and zero points where the corresponding persistencevalues are relatively large. This is because the union of thepairs of polygons in Figure 8(c) and Figure 8(d) contain oneand zero holes respectively. Note that the hole in the union ofthe pair of polygons in Figure 8(c) appears once the scale ofthe radius of balls centred at each point is increased slightly.

The topological relationship in Figure 8(a) could be de-scribed as partial overlap. That in Figure 8(b) could bedescribed as disjoint. That in Figure 8(c) could be describedas partial overlap with the formation of a one dimensionalhole. Finally the topological relationship in Figure 8(d) couldbe described as partial overlap with each object splitting theother into two connected components.

Using the pair-wise distances between the 200 topologicalrelationships described above, we performed retrieval of sim-ilar topological relationships as follows. For a given querytopological relationship, that topological relationship withthe smallest distance to the query was retrieved. The top rowof Figure 10 displays four query topological relationshipswhile the bottom row displays the corresponding retrievedtopological relationships. It is evident that in each case thequery and retrieved topological relationships are similar. Thequery and retrieved topological relationships in Figures 10(a)and 10(e) respectively could both be described as partialoverlap. The query and retrieved topological relationships inFigures 10(b) and 10(f) respectively could both be describedas partial overlap with each object splitting the other into twoconnected components. The query and retrieved topologicalrelationships in Figures 10(c) and 10(g) respectively couldboth be described as disjoint. The query and retrieved topo-logical relationships in Figures 10(d) and 10(h) respectivelycould both be described as partial overlap with the formationof a one dimensional hole.

V. CONCLUSIONSThis article proposes a novel model of topological relation-ships which is stable and exhibits a number of propertieswhich facilitate statistical inferences. Existing models oftopological relationships are not stable and do not considerthe problem of performing statistical inferences. However,it must be noted that these models could potentially begeneralised.

The proposed model formulates the problem of modellingtopological relationships in terms of algebraic topology. Thisrepresents a novel formulation of the problem and conse-quently it presents many opportunities for further researchand development. In the experiments section of this articlewe only consider objects corresponding to simple polygons.Without any adjustments to the model it can be appliedto more general polygons such as polygons with multiplecomponents and polygons with holes. With a slight generali-sation, the model could also be applied to objects correspond-ing to lines and points. In this work we assume the ambientspace to be a subset of R2. However the model proposed

VOLUME 4, 2016 9

Page 10: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

(a) (b) (c) (d)

FIGURE 8: Four pairs of simple polygons A and B generated using the method described in section IV-D are displayed. Thesepairs correspond to the 4 cluster centres found using the k-medoids algorithm.

(a) (b) (c) (d)

FIGURE 9: The persistence diagrams corresponding to the connected components (0 dimensional holes) of the intersectionsubsets of the pairs of polygons in Figure 8(a) and 8(b) are illustrated in (a) and (b) respectively. The persistence diagramscorresponding to the 1 dimensional holes of the union subsets of the pairs of polygons in Figure 8(c) and 8(d) are illustrated inFigure (c) and (d) respectively. In these figures the x-axis represents the scale at which k dimensional holes appear while they-axis represents the scale at which k dimensional holes disappear.

generalises to higher dimensional real coordinate spaces andmore abstract spaces which can be embedded in Rn.

The proposed model currently does not have any meansof generating natural language descriptions of topologicalrelationships in an automated manner. The current modeloutputs the topological relationships between pairs of objectsin the form of persistence diagrams and function space repre-sentations that record the degree of significance of connectedcomponents and holes. A challenge for future work is todevelop automated approaches to natural language summari-sation of the characteristics of the relationships such as thetype (e.g. containment or overlap) and degree of applicabilityof a relationship [41].

Although in this article we have only applied the proposedmodel to synthetic data it could also be applied to real worlddata such as that from sensor networks. In future work theauthors hope to pursue research on such applications.

VI. APPENDIXA. INTERSECTION MODELThis section describes the Intersection Model (IM) by [7]and presents some analysis of this model. The IM assumes

the existence of two objects A and B in the ambient spaceR2 for which we wish to model the corresponding topo-logical relationship. Furthermore, this model assumes objectlocations are known precisely and these are modelled assubsets of R2. A number of subsets of the ambient spaceare considered where each is a binary set relation of objectinteriors, boundaries and exteriors. The subsets in questionare defined in Equation 2 where Ao, Ae and ∂A equal theinterior, exterior and boundary of the set A respectively.Given the above subsets, the spatial relationship in questionis modelled by evaluating whether or not these subsets equalthe null set or their dimension. Egenhofer [7] presented an in-depth analysis of which instances of this model correspondto natural language terms. In this section we consider theversion of the IM which evaluates whether or not thesesubsets equal the null set. We refer to these Boolean valuedfunctions as the features of the IM.

10 VOLUME 4, 2016

Page 11: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

(a) (b) (c) (d)

(e) (f) (g) (h)

FIGURE 10: For each of the query topological relationships displayed in (a), (b), (c) and (d), the corresponding retrievedtopological relationship is displayed in (e), (f), (g) and (h) respectively.

Ao ∩Bo (2a)Ao ∩ ∂B (2b)Ao ∩Be (2c)∂A ∩Bo (2d)∂A ∩ ∂B (2e)∂A ∩Be (2f)Ae ∩Bo (2g)Ae ∩ ∂B (2h)Ae ∩Be (2i)

Let (X, dX) and (Y, dY ) be metric spaces where X andY are sets while dX and dY are metrics on these setsrespectively. A function f : X → Y is Lipschitz stablewith constant K if for all x1 and x2 in X the inequalityof Equation 3 is satisfied [6]. Broadly speaking, a functionis Lipschitz stable if a small change in the function inputproduces a small change in the function output.

dY (f(x1), f(x2)) ≤ KdX(x1, x2) (3)

We now set about proving the IM is not Lipschitz stable.LetX be the space of 2-tuples of subsets of R2. Let dX be themetric on X defined in Equation 4. Each of the terms beingsubtracted on the right side of this equality is a non-negativereal valued function and, given this, it is straight forward toprove that dX is a metric.

dX((A1, B1), (A2, B2)) =

| infa∈Ao1,b∈Bo1

‖a− b‖2 − infa∈Ao2,b∈Bo2

‖a− b‖2| (4)

Let Y be the space of Boolean values and let f be themapping from X to Y defined in Equation 5. Note that,infa∈Ao,b∈Bo‖a − b‖2 = 0 if and only if Ao ∩ Bo 6= ∅.Furthermore, infa∈Ao,b∈Bo‖a − b‖2 > 0 if and only ifAo ∩ Bo = ∅. Let dY be the discrete metric on Y defined inEquation 6. The mapping f in Equation 5 is Lipschitz stableif there exists a real constant K which satisfies the inequalitydefined in Equation 7 for all (A1, B1), (A2, B2) in X .

f : (A,B)→ Ao ∩Bo 6= ∅ (5)

dY (f((A1, B1)), f((A2, B2))) =0, if f((A1, B1)) = f((A2, B2))

1, otherwise

(6)

dY (f((A1, B1)), f((A2, B2))) ≤ KdX((A1, B1), (A2, B2))(7)

Theorem 1. The mapping f in Equation 5 is not Lipschitzstable.

Proof. We prove this theorem using proof by contradiction.Assume there exists a real constant K which satisfies theinequality defined in Equation 7 for all (A1, B1), (A2, B2)in X . Let (A1, B1) be two sets such that (Ao1 ∩ Bo1 6= ∅)

VOLUME 4, 2016 11

Page 12: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

and in turn that infa∈Ao1,b∈Bo1‖a − b‖2 = 0. Let (A2, B2)be two sets such that (Ao2 ∩ Bo2 = ∅) and in turn thatinfa∈Ao2,b∈Bo2‖a − b‖2 > 0. In this case Equation 7 canbe written as Equation 8a where the expressions on the leftand right sides of this inequality follow from evaluatingEquations 6 and 4 respectively. Equation 8a is simplified inEquations 8b and 8c where the first simplification followsfrom the fact that the term infa∈Ao2,b∈Bo2‖a− b‖2 is boundedbelow by 0.

1 ≤ K|(0− infa∈Ao2,b∈Bo2

‖a− b‖2| (8a)

1 ≤ K infa∈Ao2,b∈Bo2

‖a− b‖2 (8b)

1

K≤ infa∈Ao2,b∈Bo2

‖a− b‖2 (8c)

Equation 8c gives a lower bound for the term on the rightside of this inequality. One can construct an example wherethis term lies in the open interval (0,1/K) which in turn im-plies that Ao2 ∩Bo2 = ∅. This contradicts the assumption thatthere exists a real constant K which satisfies the inequalitydefined in Equation 7 for all (A1, B1), (A2, B2) in X .

Corollary 1. The Intersection Model (IM) is not Lipschitzstable.

Proof. The mapping f in Equation 5 is one of the features ofthe IM. This mapping was proven to not be Lipschitz stablein Theorem 1. Therefore, the IM is not Lipschitz stable.

B. PERSISTENT HOMOLOGYFor a given subset considered in section III-A the correspond-ing persistent homology computation in performed in twosteps. In the first step the subset in question is representedusing a combinatorial representation known as a filtration.The persistence diagrams are subsequently computed as afunction of this filtration. We now describe each of these stepsin turn in the following subsections.

1) FiltrationLet X be a given subset considered in section III-A forwhich we wish to compute the persistent homology. Let‖.‖ denote the Euclidean norm and for each x ∈ X letBr(x) = y ∈ Rn|‖x − y‖ < r for r ≥ 0; that is, aclosed ball of radius r centred at x. The r-neighbourhoodXr, as defined by Equation 9, represents an intuitive meansof representing the subset X . However this correspondsto an abstract mathematical representation of a continuousobject upon which computation is difficult. Therefore, onegenerally instead uses a combinatorial representation knownas a simplicial complex upon which computations may beperformed.

Xr =

m⋃i=1

Br(xi) (9)

An (abstract) simplicial complex K is a finite collectionof sets such that for each σ ∈ K all subsets of σ are alsocontained inK. Each element σ ∈ K is called a simplex or k-simplex where |σ| = k+1 and is referred to as the dimensionof the simplex. The faces of a simplex σ correspond to allsimplices τ where τ ⊂ σ. There exists a number of differentsimplicial complexes which may be used to represent a set ofpoints [30]. For the purposes of this work we use a specificsimplicial complex which is known has an alpha complex andis now described.

For x ∈ X let Vx be the Voronoi cell of x; that is, Vx =y ∈ Rn|‖y − x‖ ≤ ‖y − z‖, z ∈ X. Furthermore, letRx(r) be the intersection of each Voronoi cell with a ballcentred at the point in question; that is,Rx(r) = Br(x)∩Vx.The alpha complex is isomorphic to the nerve of this coverand is defined by Equation 10. It is homotopy equivalent tothe r-neighbourhood Xr where homotopy equivalence is anequivalence relation on the class of topological spaces (seechapter 7 in [42]).

Alpha(r) = σ ⊆ X|⋂y∈σ

Ry(r) 6= ∅ (10)

The parameter r in Equation 10 may be varied to givealpha complexes of different scales and in turn a 1-parameterfamily of nested alpha complexes. However only finitelymany of these are distinct and are described by the sequencein Equation 11 which is called a filtration [30].

∅ = K0 ⊆ K1 ⊆ · · · ⊆ Kl (11)

2) Persistent HomologyLet K be a simplicial complex. The formal sum c defined byEquation 12 is called a k-chain where each σi ∈ K is a k-simplex and each λi is an element of a given field. For thepurposes of this work we consider the field Z2 [43].

c =∑

λiσi (12)

The vector space of all k-chains is denoted Ck(K). Theboundary of a k-simplex σ = [v1, . . . , vk+1] is a sum ofits (k − 1)-dimensional faces and is defined by Equation13 where vi indicates the deletion of vi from the sequence.The boundary of a k-chain is obtained by extending this maplinearly.

∂kσ =

k+1∑i=1

[v1, . . . , vi, . . . , vk+1] (13)

A k-chain c is a k-boundary if there exists some k+1-chaind such that c = ∂d and a k-cycle if ∂c = 0. The set of allk-boundaries and k-cycles are denoted by Bk(K) and Zk(K)respectively. The fact that ∂k+1∂k = 0 implies thatBk(K) ⊆Zk(K). The quotient group Hk(K) = Zk(K)/Bk(K) is thek-homology group of K. Intuitively an element of the k-homology group corresponds to a k-dimensional hole in K.That is, an element of the 0-homology group corresponds to

12 VOLUME 4, 2016

Page 13: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

a path connected component in K while an element of the1-homology group corresponds to a one dimensional hole inK. The rank of Hk(K) is called the k-th Betti number and isdenoted βk(K).

For a given filtration, for every i ≤ j there exists an inclu-sion map from Ki to Kj and in turn an induced homomor-phism from Hk(Ki) to Hk(Kj) for each dimension k. Whenmoving fromKi toKi+1 the corresponding homology groupsmay change in the following two ways. A homology groupelement appears atKi+1 if it exists inHk(Ki+1) but does notexist inHk(Ki); that is βk(Ki+1) = βk(Ki)+1. A homologygroup element disappears at Ki+1 if it exists in Hk(Ki) butdoes not exist in Hk(Ki+1); that is βk(Ki+1) = βk(Ki)− 1.If an element of a homology group never disappears, weassign its disappearance to be at Ku where u is an upperbound. In most abstract mathematical publications a valueof∞ is used instead of an upper bound. However in terms ofalgorithm implementation it is not feasibly to consider sucha value.

If a homology group element appears at Ki and disappearsat Kj we represent it as a point (i,j) in the space (i,j) ∈R2, i ≤ j with corresponding persistence value of j − i.The magnitude of a persistence value is important becausehomology group elements which persist over a larger range ina given filtration are considered to be of greater significance.In the context of this work the range in question is overthe scale parameter r in the Alpha complex. The multisetof points corresponding to a k-homology group is called apersistence diagram. In this work the method described in[44] was used to compute all persistence diagrams.

REFERENCES[1] P. Corcoran, P. Mooney, and M. Bertolotto, “Spatial relations using high

level concepts,” ISPRS International Journal of Geo-Information, vol. 1,no. 3, pp. 333–350, 2012.

[2] R. W. Portmann, S. Solomon, and G. C. Hegerl, “Spatial and seasonalpatterns in climate change, temperatures, and precipitation across theunited states,” Proceedings of the National Academy of Sciences, vol. 106,no. 18, pp. 7324–7329, 2009.

[3] A. Boularias, F. Duvallet, J. Oh, and A. Stentz, “Grounding spatial rela-tions for outdoor robot navigation,” in IEEE international conference onrobotics and automation, 2015, pp. 1976–1982.

[4] M. J. Egenhofer and R. D. Franzosa, “Point-set topological spatial rela-tions,” International Journal of Geographical Information System, vol. 5,no. 2, pp. 161–174, 1991.

[5] D. Hernandez, Qualitative representation of spatial knowledge. SpringerScience & Business Media, 1994, vol. 804.

[6] D. Cohen-Steiner, H. Edelsbrunner, and J. Harer, “Stability of persistencediagrams,” Discrete & Computational Geometry, vol. 37, no. 1, pp. 103–120, 2007.

[7] M. J. Egenhofer, “Reasoning about binary topological relations,” in Sym-posium on Spatial Databases. Springer, 1991, pp. 141–160.

[8] D. A. Randell, Z. Cui, and A. G. Cohn, “A spatial logic based onregions and connection,” in Proceedings 3rd international conference onknowledge representation and reasoning, 1992.

[9] J. Chen, A. G. Cohn, D. Liu, S. Wang, J. Ouyang, and Q. Yu, “A survey ofqualitative spatial representations,” The Knowledge Engineering Review,vol. 30, no. 1, pp. 106–136, 2015.

[10] M. Schneider and T. Behr, “Topological relationships between complexspatial objects,” ACM Transactions on Database Systems (TODS), vol. 31,no. 1, pp. 39–81, 2006.

[11] M. Worboys, “Imprecision in finite resolution spatial data,” GeoInformat-ica, vol. 2, no. 3, pp. 257–279, 1998.

[12] X. Yao and J.-C. Thill, “Spatial queries with qualitative locations inspatial information systems,” Computers, environment and urban systems,vol. 30, no. 4, pp. 485–502, 2006.

[13] J. T. Bjørke, “Topological relations between fuzzy regions: derivation ofverbal terms,” Fuzzy sets and systems, vol. 141, no. 3, pp. 449–467, 2004.

[14] S. Schockaert, M. De Cock, and E. E. Kerre, “Spatial reasoning in a fuzzyregion connection calculus,” Artificial Intelligence, vol. 173, no. 2, pp.258–298, 2009.

[15] A. C. Carniel, M. Schneider, R. R. Ciferri, and C. D. de Aguiar Ciferri,“Modeling fuzzy topological predicates for fuzzy regions,” in Proceedingsof the 22nd ACM SIGSPATIAL International Conference on Advances inGeographic Information Systems, 2014, pp. 529–532.

[16] E. Tøssebro and M. Nygård, “An advanced discrete model for uncertainspatial data,” in International Conference on Web-Age Information Man-agement. Springer, 2002, pp. 37–51.

[17] E. Clementini and P. Di Felice, “Approximate topological relations,”International journal of approximate reasoning, vol. 16, no. 2, pp. 173–204, 1997.

[18] E. Clementini and P. Di Felice, “A spatial model for complex objects with abroad boundary supporting queries on uncertain data,” Data & KnowledgeEngineering, vol. 37, no. 3, pp. 285–305, 2001.

[19] E. Clementini, “A model for uncertain lines,” Journal of Visual Languages& Computing, vol. 16, no. 4, pp. 271–288, 2005.

[20] L. Bejaoui, F. Pinet, Y. Bédard, and M. Schneider, “Qualified topologicalrelations between spatial objects with possible vague shape,” InternationalJournal of Geographical Information Science, vol. 23, no. 7, pp. 877–921,2009.

[21] A. G. Cohn and N. M. Gotts, “The ‘egg-yolk’ representation of regionswith indeterminate boundaries,” Geographic objects with indeterminateboundaries, vol. 2, pp. 171–187, 1996.

[22] M. J. Egenhofer and M. P. Dube, “Topological relations from metricrefinements,” in Proceedings of the 17th ACM SIGSPATIAL InternationalConference on Advances in Geographic Information Systems, 2009, pp.158–167.

[23] H. Liu and M. Schneider, “Tracking continuous topological changes ofcomplex moving regions,” in Proceedings of the 2011 ACM Symposiumon Applied Computing, 2011, pp. 833–838.

[24] M.-H. Jeong and M. Duckham, “Decentralized querying of topologicalrelations between regions monitored by a coordinate-free geosensor net-work,” GeoInformatica, vol. 17, no. 4, pp. 669–696, 2013.

[25] P. Corcoran and C. B. Jones, “Modelling topological features of swarmbehaviour in space and time with persistence landscapes,” IEEE Access,vol. 5, pp. 18 534–18 544, 2017.

[26] D. J. MacKay, Information theory, inference and learning algorithms.Cambridge university press, 2003.

[27] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, A. Singhet al., “Confidence sets for persistence diagrams,” The Annals of Statistics,vol. 42, no. 6, pp. 2301–2339, 2014.

[28] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman,S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier, “Persistenceimages: a stable vector representation of persistent homology,” Journal ofMachine Learning Research, vol. 18, no. 8, pp. 1–35, 2017.

[29] H. Edelsbrunner and D. Morozov, “Persistent homology: theory and prac-tice,” Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley,CA (US), Tech. Rep., 2012.

[30] H. Edelsbrunner and J. Harer, Computational topology: an introduction.American Mathematical Soc., 2010.

[31] O. Christensen, Functions, spaces, and expansions: mathematical tools inphysics and engineering. Springer Science & Business Media, 2010.

[32] J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt, “A stable multi-scalekernel for topological machine learning,” in Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2015, pp. 4741–4748.

[33] R. Kwitt, S. Huber, M. Niethammer, W. Lin, and U. Bauer, “Statisticaltopological data analysis-a kernel perspective,” in Advances in neuralinformation processing systems, 2015, pp. 3070–3078.

[34] P. Bubenik, “Statistical topological data analysis using persistence land-scapes.” Journal of Machine Learning Research, vol. 16, no. 1, pp. 77–102,2015.

[35] M. Egenhofer and K. Al-Taha, “Reasoning about gradual changes of topo-logical relationships,” Theories and methods of spatio-temporal reasoningin geographic space, pp. 196–219, 1992.

VOLUME 4, 2016 13

Page 14: Stability and Statistical Inferences in the Space of ...ABSTRACT Modelling topological properties of the spatial relationship between objects, known as the topological relationship,

Author et al.: Preparation of Papers for IEEE TRANSACTIONS and JOURNALS

[36] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “Akernel two-sample test,” Journal of Machine Learning Research, vol. 13,no. Mar, pp. 723–773, 2012.

[37] B. Efron and R. J. Tibshirani, An introduction to the bootstrap. CRCpress, 1994.

[38] M. Heidi, “Heuristics for the generation of random polygons,” in CanadianConference on Computational Geometry, vol. 5. McGill-Queen’s Press-MQUP, 1996, p. 38.

[39] R. E. Megginson, An introduction to Banach space theory. SpringerScience & Business Media, 2012, vol. 183.

[40] P. Corcoran and C. B. Jones, “Spatio-temporal modeling of the topology ofswarm behavior with persistence landscapes,” in Proceedings of the 24thACM SIGSPATIAL International Conference on Advances in GeographicInformation Systems, 2016, pp. 65:1–4.

[41] A. R. B. Shariff, “Natural-language spatial relations between linear andareal objects: the topology and metric of english-language terms,” Inter-national journal of geographical information science, vol. 12, no. 3, pp.215–245, 1998.

[42] J. Lee, Introduction to topological manifolds. Springer Science &Business Media, 2010, vol. 940.

[43] H. Edelsbrunner and J. Harer, “Persistent homology-a survey,” Contempo-rary mathematics, vol. 453, pp. 257–282, 2008.

[44] A. Zomorodian and G. Carlsson, “Computing persistent homology,” Dis-crete & Computational Geometry, vol. 33, no. 2, pp. 249–274, 2005.

Padraig Corcoran is a lecturer in the Schoolof Computer Science and Informatics at CardiffUniversity. Prior to this, he completed a EuropeanMarie Curie International Outgoing Fellowship.During the outgoing phase of this fellowship heworked at Massachusetts Institute of Technology(MIT) while during the incoming phase he workedat University College Dublin (UCD).

Christopher B. Jones took up his current po-sition as Professor of Geographical InformationSystems in the School of Computer Science &Informatics at Cardiff University in 2000. He hasalso held academic positions, in computer science,at the University of South Wales and, in geog-raphy, at the University of Cambridge and hasworked in software development at the British Ge-ological Survey and at BP Exploration. Researchat the University of South Wales on cartographic

label placement led to the development of the Maplex software which isnow an ESRI product. He graduated in geology, from Bristol University, andreceived a PhD, from the School of Physics, University of Newcastle uponTyne, for research on periodicities in fossil growth rhythms.

14 VOLUME 4, 2016