Bivand Paper

    Implementing spatial data analysis software tools in R

    Roger Bivand

    Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Bergen, Norway

    Roger.Bivand@nhh.no

    16th April 2002

    1 Introduction

    This contribution has two equal threads: doing spatial data analysis in the R project and environment, and learning from the R project about how an analytic and infrastructural open source community has achieved critical mass to enable mutually beneficial sharing of knowledge and tools. The challenge is to see whether, and if so how far, we can contribute to the next meeting of the community nurturing R (and other projects) at the Distributed Statistical Computing workshop in 2003. It is fair to say that the statistical and data analytic interests of the community are catholic, rigorous, and enthusiastic, and challenge the perceived barriers between commercial and open source software in the interests of better, more timely, and more professional analysis in the proper sense of the word.

    R is an implementation of the S language, as is S-Plus, and is often able to execute the same interpreted code; it was initially written by Ross Ihaka and Robert Gentleman (1996). R follows most of the Brown and Blue Books (Becker, Chambers and Wilks, 1988; Chambers and Hastie, 1992), and also implements parts of the Green Book (Chambers, 1998). R is associated with the Omegahat project: it is here that much progress on inter-operation is being made, for instance embedding R in Perl, Python, Java, PostgreSQL or Gnumeric. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs out of the box on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux). It also compiles and runs on Windows systems and MacOS, and as such provides a functional cross-platform distribution standard for software (and from 1.5.0 for data).

    Paper for CSISS specialist meeting on spatial data analysis software tools, Santa Barbara CA, 10-11 May 2002.


    2 Spatial data analysis in R: status

    At the time of writing, searching the R site for "spatial" yielded 447 hits. As Ripley (2001) comments, some of the hesitancy that was previously observable in contributions of new packages has been due to the existence of the S-Plus SpatialStats module: duplicating existing work (including GIS integration) has not seemed fruitful. Over recent months, however, a number of packages have been released on CRAN in all three areas of spatial data analysis (point patterns, continuous surfaces, and lattice data).

    Ripley is not only very familiar with spatial statistics as an academic statistician (Ripley, 1981 among other publications), but also contributed an early package to R for point pattern analysis and continuous surface analysis, included in Venables and Ripley (1999, third edition). Descriptions of some of the packages available are given in notes in R News (Ripley, 2001; Bivand, 2001b), while a more dated survey was made by Bivand and Gebhardt (2000), reflecting the situation about three years ago. Rather than duplicate these surveys, this section will be concerned with highlighting features of the R implementation of S that are of potential value for spatial data analysis.

    First, though, some basic remarks may help to provide a context. In the terminology used in the R project, the programming environment is provided as a program interpreting the language, managing memory, providing services, and running the user interface (command line, history mechanisms and graphics windows). In passing, it is worth noting that the language supports the use of expressions as function arguments, allowing considerable fluency in command line interaction and function writing. There is a clearly defined interface to this program, permitting additional compiled functions to be dynamically loaded, and interpreted functions to be introduced into the list of known objects. By default on program startup, only the base package is loaded, and other packages are loaded at will. R distributions are accompanied by a small set of packages, available by default, and a larger recommended collection. Source packages provide a powerful vehicle for distributing additional code, but their structure encourages a much richer formulation.

    A minimal source package contains a directory of files of interpreted functions, and a directory of files of documentation of this code. All functions should be fully documented, and must be if the package is to be distributed through the Comprehensive R Archive Network. It is customary for the documentation to include an example section that can be executed to demonstrate what the function does; typically use is made of data sets distributed with the base package if possible. If one chooses to use domain-specific data sets, then the package will contain a further directory with the necessary data files, which in turn are documented in the help file directory. Interpreted R code can of course be read within the context of the user interface, and functions may be edited, saved to user files, and sourced back into the program.

    In some circumstances, it is desirable to move the internal computations of a function to a compiled language, although, as we will see, this is not an absolute requirement because the built-in internal functions themselves interface to highly optimized and heavily debugged compiled code. In this case, a source directory will also be present, with C, C++, or Fortran 77 source files. In the next section below we will see how these are converted into an installed package that is ready for use in the program. Here it is sufficient to mention the possibility of dynamically loading user-compiled shared object code, and the usefulness of header files in C in particular, giving direct access to internal R data objects and memory allocation mechanisms from such user-compiled functions.

    For instance, R provides a factor object definition for categorical variables, with a character vector of level labels and an integer vector of observation values pointing to the vector of levels. In the GRASS/R compiled interface using GRASS library functions, moving categorical raster data between R and GRASS is accomplished fast and fully preserving labels by operating on R factor objects in C functions. Within R, functions are written to use object classes, for example the factor class, to test for object suitability, or in many modelling situations to convert factors into appropriate dummy variables.
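The factor structure described above can be inspected directly from the command line. The following is a minimal sketch in base R; the land-use values are invented for illustration and do not come from GRASS or from any data set used in this paper.

```r
# A factor stores integer codes plus a character vector of level labels.
landuse <- factor(c("forest", "urban", "forest", "water"))  # hypothetical data

labels <- levels(landuse)      # the character vector of level labels
codes  <- as.integer(landuse)  # integer codes pointing into the levels vector

# The observations can be recovered from the codes and the labels alone.
recovered <- labels[codes]

# Functions can test for the class, and modelling code expands a factor
# into dummy (indicator) variables.
dummies <- model.matrix(~ landuse)
print(is.factor(landuse))
print(colnames(dummies))
```

Because only the integer codes and the short vector of labels need to be moved, a compiled interface can transfer categorical data without copying a label for every observation.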

    Finally, users can at will create new classes for which the class method dispatch mechanism can be invoked. The summary() function appears to be a single function, but in fact calls appropriate summary functions based on the class of the first argument; the same applies to the plot() and print() functions. The extent to which the extant spatial data analysis packages use class and method based functions varies, mostly depending on the age of the code and on the potential value of such revisions. The main packages at present published on CRAN specifically for spatial data analysis are spatial for point pattern and continuous surface data (in the VR bundle), fields, geoR, geoRglm, RandomFields, sgeostat for continuous surface data, spatstat, splancs for point pattern data, and spdep for spatial lattice data.
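A short sketch may make the dispatch mechanism concrete. The class name "ptpattern" and its constructor are hypothetical, chosen only for this illustration; they are not taken from any of the packages listed above.

```r
# Constructor for a hypothetical class "ptpattern" (illustrative only).
ptpattern <- function(x, y) {
  structure(list(x = x, y = y), class = "ptpattern")
}

# summary() is generic: it dispatches on the class of its first argument,
# so this method is called for objects of class "ptpattern".
summary.ptpattern <- function(object, ...) {
  res <- list(n = length(object$x),
              xrange = range(object$x),
              yrange = range(object$y))
  class(res) <- "summary.ptpattern"
  res
}

# print() dispatches in the same way on the summary object.
print.summary.ptpattern <- function(x, ...) {
  cat("Point pattern with", x$n, "points\n")
  invisible(x)
}

pp <- ptpattern(c(0.1, 0.5, 0.9), c(0.2, 0.8, 0.4))
s <- summary(pp)
print(s)
```

The user only ever calls summary() and print(); which function actually runs is decided by the class attribute of the object.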

    On the graphics side, R does not provide dynamic linked visualization, since the graphics model is based on drawing on one of a number of graphics devices. R does provide the more important tools for graphical data analysis, although no mapping is present as yet in general terms. Work is progressing on the provision of panelled graphics in the grid and lattice packages, and R can be loosely linked with Ggobi. Graphics have in part been kept fairly simple because of cross-platform difficulties; they are extensible at the user level in many ways, but are more for viewing than for interaction.

    2.1 Implementation examples

    In this section, we will use some of the implementation details of the spdep package to exemplify the internal workings of an R package; this package is a workbench for exploring alternative implementation issues, and bug reports, contributions and criticisms are very welcome. The illustrations will draw on the canonical data sets for areal or lattice data, many of which are included in the package to provide clear comparisons with the underlying literature.

    Most of the world as seen by data analysis software still looks like a flat table, and the most characteristic object class in R at first seems to be the data frame. But a data frame, a rectangular flat table with row and column names, made up of columns of types including numeric, integer, character and logical, and with other attributes, is in fact a list. While point pattern data can exist happily within flat tables, as indeed can point locations with attributes as used in the analysis of continuous surfaces, as well as time series, the specific structuring data object of lattice data analysis, describing the neighbourhood relations between observations, cannot. When the weights matrix is represented as such, analysis may be impeded for moderate to larger data sets.

    This provides one reason for supplementing the existing S-PLUS spatial statistics module for lattice data. In that case, weights are represented sparsely in a data frame, with four columns: weights matrix element row index, column index, value, and (optionally) the order of the weights matrix when multiple matrices are stored in the same data frame. While this provides a direct route to sparse matrix functions for finding the Jacobian, and for matrix multiplication, it makes the retrieval of neighbourhood relations awkward for other needs. Here, it was found simpler to create a hierarchy of class objects leading to the same target, but also open to many operations at earlier stages.
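The contrast between the representations discussed here can be sketched in a few lines of base R. The four-unit neighbour structure below is invented for illustration, and the column names of the triplet data frame are assumptions rather than those used by the S-PLUS module.

```r
# Hypothetical neighbour structure for four units on a line: 1-2-3-4.
nbs <- list(2L, c(1L, 3L), c(2L, 4L), 3L)
n <- length(nbs)

# Dense representation: an n by n matrix, mostly zeros once n grows.
W <- matrix(0, n, n)
for (i in seq_len(n)) W[i, nbs[[i]]] <- 1

# Sparse triplet representation held in a data frame: row index,
# column index and value of each nonzero element.
triplets <- data.frame(row = rep(seq_len(n), lengths(nbs)),
                       col = unlist(nbs),
                       value = 1)

# A data frame is itself a list of equal-length columns.
print(is.list(triplets))
print(triplets)
```

Finding the neighbours of unit i is a single list access, nbs[[i]], in the list form, whereas the triplet form requires a search over the row index column.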

    The basic building block is a simple list, with each list element an integer vector containing the indices of the neighbours of the corresponding unit in the present definition. The list is of class nb, and has a character region ID attribute, to provide a mapping between the region names and indices. These lists may be read in as legacy GAL-format files, or generated from lists of polygon perimeter coordinates, or matrices of coordinates representing the regions under analysis. Nicholas Lewin-Koh has contributed a number of useful graph object derived functions, so that there is now quite a choice with regard to creating lists of neighbours. Class nb has summary() and plot() functions:

        > data(columbus)
        > summary(col.gal.nb, coords)
        Connectivity of col.gal.nb with the following attributes:
        List of 5
         $ class    : chr "nb"
         $ region.id: num [1:49] 1005 1001 1006 1002 1007 ...
         $ gal      : logi TRUE
         $ call     : logi TRUE
         $ sym      : logi TRUE
        NULL
        Number of regions: 49
        Number of nonzero links: 230
        Percentage nonzero weights: 9.579342
        Average number of links: 4.693878
        Link number distribution:

         2  3  4  5  6  7  8  9 10
         7  7 13  4  9  6  1  1  1
        7 least connected regions:
        1005 1008 1045 1047 1049 1048 1015 with 2 links
        1 most connected region:
        1017 with 10 links
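The quantities reported in such a summary follow directly from the list structure. The sketch below recomputes them for a small invented neighbours list in base R; the class name "nblist" is illustrative and is deliberately not the class used by spdep.

```r
# Hypothetical neighbours list for five regions; each element holds the
# indices of that region's neighbours.
nbl <- list(c(2L, 3L), c(1L, 3L, 4L), c(1L, 2L, 4L, 5L),
            c(2L, 3L, 5L), c(3L, 4L))
attr(nbl, "region.id") <- c("A", "B", "C", "D", "E")
class(nbl) <- "nblist"   # illustrative class name, not spdep's own

n <- length(nbl)
card <- vapply(nbl, length, integer(1))   # number of links per region

n_links <- sum(card)                # number of nonzero links
pct_nonzero <- 100 * n_links / n^2  # percentage of nonzero weights
avg_links <- n_links / n            # average number of links

cat("Number of regions:", n, "\n")
cat("Number of nonzero links:", n_links, "\n")
cat("Percentage nonzero weights:", pct_nonzero, "\n")
cat("Average number of links:", avg_links, "\n")
print(table(card))                  # link number distribution
```

Applied to the Columbus figures above, the same arithmetic gives 100 * 230 / 49^2 = 9.579342 and 230 / 49 = 4.693878.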

    Figure 1: Canonical Columbus OH. spatial units.

        > plotpolys(polys, bbs, border="grey")
        > plot(col.gal.nb, coords, add=TRUE)
        > col.knn <- knearneigh(coords, k=1)
        > plot(knn2nb(col.knn), coords, add=TRUE, col="red", lty=2)

        > plotpolys(polys, bbs, border="grey")
        > plot(col.gal.nb, coords, add=TRUE)
        > to.be.dropped <- which(coords[,1] <= coords[21,1])
        > text(coords[to.be.dropped,1], coords[to.be.dropped,2],
        +   labels=to.be.dropped, pos=2, offset=0.3)
        > sub.col.gal.nb <- subset(col.gal.nb, !(1:length(col.gal.nb) %in%
        +   to.be.dropped))
        > plot(sub.col.gal.nb, coords[-to.be.dropped,], col="red",
        +   add=TRUE)
        > which(!(attr(col.gal.nb, "region.id") %in% attr(sub.col.gal.nb,
        +   "region.id")))
        [1] 31 34 36 39 42 46

    Subsetting is perhaps more interesting, because it involves more work on the neighbours list. The code snippet above and figure 2 show how the neighbours list for Columbus may be subsetted to retain spatial units east of the river.

    Figure 2: Subsetting Columbus OH. on the Scioto River.
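The extra work involved in subsetting is that the surviving units must be renumbered and links to dropped units removed. The following is a minimal base R sketch of that idea under simplifying assumptions; the function name subset_nbl and the five-unit data are hypothetical, and this is not the code of the package's own subset method.

```r
# Subset a neighbours list, dropping units and renumbering the survivors.
# 'nbl' is a plain list of integer index vectors; 'keep' is a logical vector.
subset_nbl <- function(nbl, keep) {
  old <- which(keep)
  new_index <- match(seq_along(nbl), old)   # NA for dropped units
  out <- lapply(nbl[old], function(nbrs) {
    kept <- new_index[nbrs]                 # translate to new numbering
    sort(kept[!is.na(kept)])                # discard links to dropped units
  })
  attr(out, "region.id") <- attr(nbl, "region.id")[old]
  out
}

nbl <- list(c(2L, 3L), c(1L, 3L, 4L), c(1L, 2L, 4L, 5L),
            c(2L, 3L, 5L), c(3L, 4L))
attr(nbl, "region.id") <- c("A", "B", "C", "D", "E")

# Drop the third unit and keep the rest.
sub_nbl <- subset_nbl(nbl, keep = c(TRUE, TRUE, FALSE, TRUE, TRUE))
print(sub_nbl)
```

A unit whose neighbours are all dropped ends up with an empty integer vector in this sketch; a real implementation has to decide how to represent and report such units.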

    Since this is written in interpreted code, it may be instructive to profile 1000 repetitions of the subsetting operation using the convenient profiler compiled by default into R under Unix and Linux. Total time was 12.06 seconds, of which 43% was spent in the calling function. Standard functions such as sort() and unique.default() do spend extra time checking argument characteristics, but turn out to be very effective, themselves calling internal compiled functions.

    A more demanding task is to find lists of higher order neighbours, in this case the second, third and fourth order neighbours for Columbus, once again profiling for 1000 calls of the nblag() function. Of the total time of 136.28 seconds for 1000 calls, one third is within the calling function, but as much as 17% is in the double left bracket function for accessing list elements, and a good deal in which and match. The nblag() function returns a list of neighbours lists, and is now just interpreted code. This also permits more confident debugging, especially in relation to possible special cases, allowing extra conditions to be imposed, for example with regard to units with no neighbours. In other cases, a call to a compiled function, say for counting numbers of neighbours per unit, may be a helpful simplification.
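To show what finding higher order neighbours involves, here is a small sketch in interpreted base R. It assumes that the k-th order neighbours of a unit are those first reached in exactly k steps along the first order links; the function name lag_nbl and the example data are hypothetical, and the code is not that of nblag().

```r
# Higher order neighbours by repeated expansion, in interpreted code only.
lag_nbl <- function(nbl, maxlag) {
  n <- length(nbl)
  lags <- vector("list", maxlag)
  lags[[1]] <- nbl
  # 'seen' records every unit already reached from i at a lower order,
  # including i itself.
  seen <- lapply(seq_len(n), function(i) c(i, nbl[[i]]))
  for (k in seq_len(maxlag)[-1]) {
    lags[[k]] <- vector("list", n)
    for (i in seq_len(n)) {
      frontier <- lags[[k - 1]][[i]]
      reached <- unique(as.integer(unlist(nbl[frontier])))
      new <- sort(setdiff(reached, seen[[i]]))
      lags[[k]][[i]] <- new
      seen[[i]] <- c(seen[[i]], new)
    }
  }
  lags
}

# Five units on a line: 1-2-3-4-5.
nbl <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), 4L)
lags <- lag_nbl(nbl, maxlag = 3)
print(lags[[2]])   # second order neighbours of each unit
```

The inner loop is dominated by list element access and set operations, which is consistent with the profile reported above.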

    Similar considerations apply to the function for creating weights lists from neighbours lists, nb2listw(), which is now interpreted, having been partly compiled in the past.

    Figure 3: Seconds elapsed by function for 1000 subset function calls on Columbus neighbours list. (Functions shown: which, names.default, is.factor, !, is.na, inherits, names, unique, unique.default, sort, subset.nb; horizontal axis 0 to 5 seconds.)

    Figure 4: Seconds elapsed by function for 1000 nblag function calls on Columbus neighbours list returning four orders of neighbours. (Functions shown: unique, match, inherits, names, unique.default, seq.default, sort, which, [[, nblag; horizontal axis 10 to 40 seconds.)

    In particular the conversion of the standard lapply() function from interpreted to compiled for applying a function to list elements has meant that less code needs to be compiled for acceptable response times. The listw object is a list with a neighbours list member and a corresponding weights member, as well as attributes providing some metadata. The function can use general as well as binary weights, and can use the weighting schemes described in Tiefelsdorf, Griffith and Boots (1999).
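The pairing of a neighbours list with a weights list can be sketched with lapply() alone. The example assumes row-standardised weights and a four-unit structure invented for illustration; the member names and the class name "weightslist" are assumptions and do not reproduce the listw object exactly.

```r
# Hypothetical neighbours list for four units on a line: 1-2-3-4.
nbl <- list(2L, c(1L, 3L), c(2L, 4L), 3L)

# Binary weights: one for every link.
w_bin <- lapply(nbl, function(nbrs) rep(1, length(nbrs)))

# Row-standardised weights: the weights of each unit sum to one.
w_row <- lapply(nbl, function(nbrs) rep(1 / length(nbrs), length(nbrs)))

# A weights list object: neighbours and matching weights side by side.
lw <- list(style = "W", neighbours = nbl, weights = w_row)
class(lw) <- "weightslist"   # illustrative class name only

# With this structure a spatial lag is a weighted sum over list elements.
x <- c(10, 20, 30, 40)
lagx <- vapply(seq_along(lw$neighbours), function(i) {
  sum(lw$weights[[i]] * x[lw$neighbours[[i]]])
}, numeric(1))
print(lagx)
```

No n by n matrix is ever formed: storage and computation grow with the number of links rather than with the square of the number of units.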

    A final consideration is that the availability of classes does encourage conformity, for example in using the htest class to report the results of hypothesis tests. The following example shows the output of three Moran's I tests on the Freeman-Tukey square root transformation of SIDS occurrences in counties in North Carolina 1974-8, the first flagging the presence of two counties without neighbouring county seats within 30 miles of their own, the second subsetting to remove the zero-neighbour counties, and the third using the Saddlepoint approximation (Tiefelsdorf, forthcoming):

        > moran.test(ft.SID74*sqrt(nc.sids$BIR74), nb2listw(sidsorig.nb,
        +   zero.policy=TRUE), zero.policy=TRUE, alternative="two.sided")

                Moran's I test under randomisation

        data:  ft.SID74 * sqrt(nc.sids$BIR74)
        weights: nb2listw(sidsorig.nb, zero.policy = TRUE)

        Moran I statistic standard deviate = 3.305, p-value = 0.0009498
        alternative hypothesis: two.sided
        sample estimates:
        Moran I statistic       Expectation          Variance
              0.235886191      -0.010101010       0.005539641

        > drop.no.neighs <- !(1:length(sidsorig.nb) %in%
        +   which(card(sidsorig.nb) == 0))
        > sub.sidsorig.nb <- subset(sidsorig.nb, drop.no.neighs)
        > sub.x <- subset(ft.SID74*sqrt(nc.sids$BIR74), drop.no.neighs)
        > moran.test(sub.x, nb2listw(sub.sidsorig.nb),
        +   alternative="two.sided")

                Moran's I test under randomisation

        data:  sub.x
        weights: nb2listw(sub.sidsorig.nb)

        Moran I statistic standard deviate = 3.3674, p-value = 0.0007587
        alternative hypothesis: two.sided
        sample estimates:
        Moran I statistic       Expectation          Variance
              0.240178814      -0.010309278       0.005533231

        > lm.morantest.sad(lm(sub.x ~ 1), nb2listw(sub.sidsorig.nb),
        +   alternative="two.sided")

                Saddlepoint approximation for global Moran's I
                (Barndorff-Nielsen formula)

        data:
        model: lm(formula = sub.x ~ 1)
        weights: nb2listw(sub.sidsorig.nb)

        Saddlepoint approximation = 3.1917, p-value = 0.001414
        alternative hypothesis: two.sided
        sample estimates:
        Observed Moran's I
                 0.2401788
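To illustrate how a test result can be reported through the htest class, the sketch below computes Moran's I for a small invented data set and wraps the result in an htest object. It substitutes a simple permutation test for the analytical randomisation and Saddlepoint approaches used above, and the function names moran_i and perm_moran_test, the data and the number of permutations are all assumptions made for this example.

```r
# Moran's I for a numeric vector x, given neighbours and matching weights.
moran_i <- function(x, nbl, wts) {
  z <- x - mean(x)
  n <- length(x)
  s0 <- sum(unlist(wts))                    # sum of all weights
  cross <- vapply(seq_len(n), function(i) {
    z[i] * sum(wts[[i]] * z[nbl[[i]]])
  }, numeric(1))
  (n / s0) * sum(cross) / sum(z^2)
}

# Permutation test returning an object of class "htest" (sketch only).
perm_moran_test <- function(x, nbl, wts, nsim = 199) {
  obs <- moran_i(x, nbl, wts)
  sims <- replicate(nsim, moran_i(sample(x), nbl, wts))
  centre <- mean(sims)
  p <- (sum(abs(sims - centre) >= abs(obs - centre)) + 1) / (nsim + 1)
  res <- list(statistic = c("Moran I" = obs),
              parameter = c(nsim = nsim),
              p.value = p,
              alternative = "two.sided",
              method = "Permutation test for Moran's I (illustrative sketch)",
              data.name = deparse(substitute(x)))
  class(res) <- "htest"
  res
}

# Six units on a line with a steady trend, row-standardised weights.
nbl <- list(2L, c(1L, 3L), c(2L, 4L), c(3L, 5L), c(4L, 6L), 5L)
wts <- lapply(nbl, function(nbrs) rep(1 / length(nbrs), length(nbrs)))
trend <- c(1, 2, 3, 4, 5, 6)

set.seed(1)
res <- perm_moran_test(trend, nbl, wts)
print(res)   # printed by the standard print method for class "htest"
```

Because the result carries the htest class, it is printed in the same layout as the output of any other hypothesis test in R, without the author of the function writing a print method.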

    Concluding this section, it is worth recapping that the structures of the R language are evolving, and that many issues that even 18 months ago seemed potentially forbidding have been resolved. For example, reading images into R was handled by using compiled code from an external library not available on all platforms. Now the same task is accomplished using standard connections functions, encapsulated for pnm images, in the pixmap package. The same connections functions are used for reading and writing legacy GAL files to and from neighbours lists. These advances are brought about by continuing interaction between the core developers and users contributing packages to CRAN, and because R provides researchers working on the S language with a flexible environment to prototype functionality.

    3 What R offers as a programming environment and a project

    Above, CRAN (Comprehensive R Archive Network) and packages were mentioned. While R provides a rich language and environment for data analysis and visualization, it is also extendible, not just because the user can write new or customised interpreted functions, and dynamically load compiled C, Fortran or C++ code, but because the project provides tools for checking, building, archiving and distributing user-contributed packages. Each such package is required to document functions, and to provide examples which should run without error if the package is correctly installed.

    This effectively reduces barriers between users (with a certain insight into the language and their own problem areas) and core developers, and seems to be a good example of the beneficial consequences of an open source development model. It has been important to maintain a certain conservatism, meaning that hard-won experience (and legacy C and Fortran code) is central, while experimentation continues in parallel, and in part in the Omegahat project. It is also worth stressing that the R project is an open community, with multiple commitments to varying data analysis communities, and a clear willingness to adapt within the possibilities offered by open source development, in particular through inter-operation with other visualization software, databases, languages, and so on (even including R as an Excel Addin).


    As has already been indicated indirectly, much of the added value of the R project extends beyond the standard functionality of the language and programming environment. The archive network is such an extension, as are the package checking mechanisms (in the tools package). Together with the test suites, they have been developed to facilitate quality control of the core system rather than user contributed packages, but because the same standards are used, the contributed packages also benefit in terms of organisation and coherence. The use of profiling was demonstrated above, and is a typical side effect of the spillovers from the core team to users.

    The dynamics of the r-help mailing list further provide feedback about areas which might be given higher priority; currently providing threading is such an area, as are name space mechanisms for user contributed code. Of course, such mechanisms are relatively common in open source communities, but do need care and a willingness to contribute and participate. Maybe for analysts of spatial data, some of the more detailed statistical themes may seem marginal, usually until those in dispute have clarified their positions. Since R is in general well documented, and introductions now exist in a number of languages, at least some questions may be superfluous, or a product more of misunderstanding than real difficulty.

    Consequently, the R project provides a number of helpful ideas for the organisation of similar kinds of actions, particularly about the internal dynamics of encouraging many people with little time and no funding to collaborate fruitfully and enjoyably. What seems to happen (from observant reading of list traffic and occasional contact with other user-space contributors and core members) is that adaptation to helpful signals is stronger than responses to (sometimes justified) negative signals, I feel mostly because a majority of the people most of the time find that their own work, be it teaching, research, consulting or production, benefits from their participation. This is also related to disciplinary culture, where statisticians and scientists in different knowledge domains have differing traditions for working collaboratively. Naturally, a sense of humour helps, and a willingness to sense when positive feedback in words is needed, and when one actually needs to devote the hours it takes to attack a problem.

    It is also worth mentioning that R functions on effectively all Unix and Linux systems, Windows systems, and MacOS. There is a framework GUI under Windows and MacOS, which does not permit user interaction with data objects in iconic form, but permits the system to be managed. For current Windows releases, this includes a menu item for online downloading and installation of binary packages from CRAN. This effectively puts spatial data analysis in R just a few clicks away, at least for the basic functions. Beyond this, the lack of a GUI does constitute an important hindrance, but as has been said on the list many times, developing and maintaining GUIs on many platforms is not a priority for anyone in the core group, not least because they see R as an engine somewhat removed from direct contact with users not motivated to take the system as it stands. In projects and production, use has been made of Tcl/Tk to build custom interfaces, though not all platforms can be relied on to have the necessary libraries.


    4 Opportunities for advancing spatial data analysis in R

    While a good deal is already going on, there are some clear gaps that need to be filled, over and above making more modern spatial data analysis tools and knowledge available. One is the wish that Ross Ihaka made after the last DSC meeting (at which I had talked about GIS integration; Bivand, 2001a) for mapping capability in R. There is some code around, including topology code, other libraries are available (particularly from Frank Warmerdam's work), and all the current spatial data analysis packages try to solve visualization problems in their own ways. GRASS is also moving to positions from which the use of its vector libraries, also GPL and written in C, is likely to be possible.

    Underlying the following example is the use of county boundary polygons, downloaded from the Spacestat data archive, and projected to UTM zone 18, measured in km, using proj in a system() call:

        > sids.phat <- sum(nc.sids$SID74) / sum(nc.sids$BIR74)
        > pm <- ppois(nc.sids$SID74, sids.phat*nc.sids$BIR74)
        > pm.c <- choynowski(nc.sids$SID74, sids.phat*nc.sids$BIR74)
        > pm.f <- as.ordered(cut(pm, breaks=c(0.0, 0.01, 0.05, 0.1,
        +   0.9, 0.95, 0.99, 1), include.lowest=TRUE))
        > pm.c.f <- as.ordered(cut(pm.c, breaks=c(0.0, 0.01, 0.05, 0.1,
        +   0.9, 0.95, 0.99, 1), include.lowest=TRUE))
        > cols <- cm.colors(length(levels(pm.f)))
        > par(mfrow=c(2,1))
        > plotpolys(nc.utm.polys, nc.utm, col=cols[codes(pm.f)])
        > legend(c(-280,-70), c(3700, 3900), legend=paste("prob.",
        +   levels(pm.f)), fill=cols, bty="n")
        > plotpolys(nc.utm.polys, nc.utm, col=cols[codes(pm.c.f)])
        > legend(c(-280,-70), c(3700, 3900), legend=paste("prob.",
        +   levels(pm.c.f)), fill=cols, bty="n")

    Note that using ppois() does not fold together spatial units where observed counts greatly exceed and greatly fall below expectations as in the standard definition of probability maps. The small choynowski() function gives the same values where the observed count is less than the expected value, but folds back the others, as can be seen in figure 5.

        choynowski <- function(x, E) {
          n <- length(x)
          res <- numeric(n)
          for (i in 1:n) {
            if (x[i] < E[i]) {
              # lower tail: sum Poisson probabilities for counts 0..x[i]
              for (j in 0:x[i]) {
                xx <- (E[i]^j * exp(-E[i])) / gamma(j + 1)
                res[i] <- res[i] + xx
              }
            } else {
              # upper tail: sum from x[i] upwards until terms are negligible
              xx <- 1
              y <- x[i]
              while (xx > .Machine$double.eps^0.25) {
                xx <- (E[i]^y * exp(-E[i])) / gamma(y + 1)
                res[i] <- res[i] + xx
                y <- y + 1
              }
            }
          }
          res
        }
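The difference between the two probability maps can be checked on a handful of invented counts. The sketch below reads "folding back" as taking the upper tail probability P(X >= observed) whenever the observed count is not below its expectation, which is an assumption about the intended definition; it uses the vectorised ppois() function rather than explicit loops, and the function name choynowski_vec and the data are hypothetical.

```r
# Vectorised sketch of folded (Choynowski-style) Poisson probabilities.
choynowski_vec <- function(obs, expd) {
  ifelse(obs < expd,
         ppois(obs, expd),                          # P(X <= obs)
         ppois(obs - 1, expd, lower.tail = FALSE))  # P(X >= obs)
}

obs  <- c(0, 2, 5, 12)   # hypothetical observed counts
expd <- c(3, 3, 5, 4)    # hypothetical expected counts

pm   <- ppois(obs, expd)            # cumulative probabilities
pm_c <- choynowski_vec(obs, expd)   # folded probabilities

# Class intervals as used for the maps in this section.
brks <- c(0, 0.01, 0.05, 0.1, 0.9, 0.95, 0.99, 1)
print(data.frame(obs, expd, pm = round(pm, 4), pm_c = round(pm_c, 4),
                 class_pm = cut(pm, breaks = brks, include.lowest = TRUE),
                 class_pm_c = cut(pm_c, breaks = brks, include.lowest = TRUE)))
```

In the last row the cumulative probability falls in the top class while the folded probability falls in the bottom class, so with the folded version both unusually low and unusually high counts appear as small probabilities.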

    Showing this without a map function is cumbersome, and finding display class intervals equally so. Here there is something that can be contributed very practically!

    A further area is that of inter-operation, using XML and/or Green Book connections methods, or simple programs writing programs. This could also involve plugging R data computation services into other front ends, say R in PostGIS given that R can already be embedded (experimentally) in PostgreSQL. This is more speculative, but Omegahat seems to be progressing vigorously, and highlights inter-system interfaces. It would however build on any pre-existing spatial data analysis functions in R, which would become available in the environment within which R is embedded if so selected. In fact, it seems that embedding R into Python is now quite practical, but thought would need to be given to the transfer of data structures needed for spatial analysis between systems.

    References

    R. A. Becker, J. M. Chambers, and A. R. Wilks. 1988. The New S Language. Chapman & Hall, London.

    R. S. Bivand. 2001a. R and geographical information systems, especially GRASS, Proceedings of the 2nd International Workshop on Distributed Statistical Computing, Technische Universität Wien, Vienna, Austria.

    R. S. Bivand. 2001b. More on Spatial Data Analysis, R News, 1 (3) 13-17.

    R. S. Bivand and A. Gebhardt. 2000. Implementing functions for spatial statistical analysis using the R language, Journal of Geographical Systems, 2 (3) 307-317.

    J. M. Chambers. 1998. Programming with Data. Springer, New York.

    J. M. Chambers and T. J. Hastie. 1992. Statistical Models in S. Chapman & Hall, London.

    R. Ihaka and R. Gentleman. 1996. R: A Language for Data Analysis and Graphics, Journal of Computational and Graphical Statistics, 5, 299-314.

    B. D. Ripley. 1981. Spatial Statistics. Wiley, New York.

    Figure 5: Probability map of North Carolina SID counts, 1974-8; upper map cumulative probabilities, lower map Choynowski probabilities.

    B. D. Ripley. 2001. Spatial Statistics in R, R News, 1 (2) 14-15.

    M. Tiefelsdorf. forthcoming. The Saddlepoint approximation of Moran's I and local Moran's I_i reference distributions and their numerical evaluation. Geographical Analysis.

    M. Tiefelsdorf, D. A. Griffith, and B. Boots. 1999. A variance-stabilizing coding scheme for spatial link matrices. Environment and Planning A, 31, 165-180.

    W. N. Venables and B. D. Ripley. 1999. Modern Applied Statistics with S-Plus. Springer, New York (book website).
