Fundamentals of GIS Data

Work Site Alliance – Community Based GIS 1 2000

Chapter 2. Fundamentals of GIS Data

This training manual uses the ArcView GIS software, which can also process geographic data from a variety of sources, to illustrate the concepts discussed. These concepts, however, are fairly universal within the field of GIS and are not limited to ArcView.

1. Spatial Data

Spatial data is at the heart of every GIS application. Spatial data stores the geographic location of particular features, along with information describing what these features represent. The location is usually specified according to some geographic referencing system (e.g., latitude and longitude) or simply by an address. Spatial data may define a physical characteristic, such as location or position, or a property such as the area of a forest (which results from defining the positions of its boundaries) (Davis, 1996).

1.1 Types of GIS Spatial Data

In GIS, spatial data is classified into three main types: point, line, and polygon.

• A point is shown with a convenient visual symbol (an X, dot, or other graphic) that does not reflect the real dimensions of the feature. Points may indicate specific locations (such as a given address or the occurrence of an event) or features that are too small to depict properly at the chosen scale (such as a building), as in Figure 2.1.

Figure 2.1. Points on a GIS feature representation.

• A line is a one-dimensional feature with a starting and an ending point. Lines represent linear features, either real (e.g., roads or streams), as in Figure 2.2, or fictitious (e.g., administrative boundaries).

Figure 2.2. Lines on a GIS feature representation.

• A polygon is an enclosed area, a two-dimensional feature with at least three sides (and therefore with an area). For example, it may represent a parcel of land, an agricultural field, or a political district, as in Figure 2.3 below.

Figure 2.3. Polygon in a GIS feature representation.

1.2 GIS Spatial Data Formats

GIS spatial data can be organized in three different formats. Essentially, if the data can be converted into one of the following formats, it can be used in ArcView (ArcView Online Help).

• Map, the paper version (hard copy) of a project.

• Coverage, the digital form or version of a map (the map in the computer). Usually, GIS coverages have a single major theme.

• Data Structure, the way of organizing the data inside an information system. In our case, it depicts the way the GIS data are built, stored, and displayed. It is the data format that the GIS software understands and uses.

1.3 Scales of Measurement

Since the data collected and stored in the database determine the kind of questions that can be asked of the data, it is necessary to understand the scales of measurement in which data are recorded. The measurement scales normally used are nominal, ordinal, interval, and ratio.

Nominal Scale – The nominal scale is the lowest level of measurement; it is used to distinguish among features. Nominal data could be a name or a description of features. For instance, a lake could be differentiated from a sand dune. In a tropical area, there could be regions identified as sugar cane fields or rice paddy fields. Basically, each name or description is distinct.

Ordinal Scale – The ordinal scale allows data to be ranked in either ascending or descending order. A hierarchy of rank can be established depending on the features under consideration. For example, a country could have cities ranked as small, medium, and large. In addition, the country may have parks that are ranked as minor, intermediate, and major. Although the ordinal scale permits differentiation on the basis of rank, it does not show or specify the magnitude of difference.

Interval Scale – With the interval scale of measurement, the distance between the ranks is known. An interval scale employs an arbitrary starting point. The Celsius temperature scale is a widely used example. It cannot be said that 38 degrees Celsius is twice as hot as 19 degrees Celsius, because 0 degrees Celsius is an arbitrary starting point.

Ratio Scale – A ratio scale is more advanced than the interval scale because it has an absolute starting point. For example, 78 miles is twice as far as 39 miles. (Lakhan, 1996)
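The difference between the interval and ratio scales can be checked with a little arithmetic. The Python sketch below is illustrative only: the function name and the conversion constants (273.15 for Kelvin, 1.609344 kilometers per mile) are assumptions that do not come from the manual. It shows that the ratio of two Celsius readings changes when the zero point moves, while the ratio of two distances survives a change of units.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading to the Kelvin scale, whose zero is absolute."""
    return celsius + 273.15


KM_PER_MILE = 1.609344

# Interval scale: the apparent 2:1 ratio depends on the arbitrary zero point.
ratio_celsius = 38 / 19
ratio_kelvin = celsius_to_kelvin(38) / celsius_to_kelvin(19)

# Ratio scale: the 2:1 ratio is the same in miles and in kilometers.
ratio_miles = 78 / 39
ratio_km = (78 * KM_PER_MILE) / (39 * KM_PER_MILE)

print(f"Celsius ratio: {ratio_celsius:.3f}, Kelvin ratio: {ratio_kelvin:.3f}")
print(f"Miles ratio: {ratio_miles:.3f}, kilometers ratio: {ratio_km:.3f}")
```

The two temperature ratios disagree, which is why "twice as hot" is not meaningful on the Celsius scale, whereas "twice as far" holds in any distance unit.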

2. Spatial Data Structures in GIS

There are two different data structures in GIS: raster and vector. Choosing between them is among the important early decisions in designing a GIS project. The type of data structure affects both data storage volume and processing speed, so it is important to have a fundamental understanding of the characteristics of each.

The discussion in the following sections focuses on the ways in which information is connected to spatial data in a geographic information system. This connection is what makes GIS different from other types of database management software.

2.1 The Simple Raster Data Structure

A simple raster data set is a regular grid of cells divided into rows and columns. In a raster data set, data values for a given parameter are stored in each cell. These values may represent an elevation in meters above sea level, a land use class, a plant biomass in grams per square meter, and so forth. The spatial resolution of the raster data set is determined by the size of the cell. For example, Landsat TM satellite imagery data are raster data that are corrected to have a cell size of approximately 30 meters on a side. However, spatial resolution can be much finer or much coarser than 30 meters. In general, spatial resolution is a function of the data collection techniques used and the desired outcomes. Figure 2.4 is an example of a simple raster data set.

0 0 1 1
1 0 1 1
0 1 1 1
1 1 0 1

Figure 2.4. Simple raster data set.

Note that each cell in the raster is assigned a single data value. In this case we are using simple binary data values, meaning that the possibilities are limited to two digits, either 0 or 1. This is an example of a 1-bit raster data file (2^1 = 2); mathematically, there are only two possibilities for each pixel, 0 or 1. By contrast, in an 8-bit data file there are 2^8, or 256, possible data values for each pixel.

In our example, the computer “sees” the cells that contain 0 as “turned off” and the cells that contain 1 as “turned on”. In Figure 2.5 below, the data values have been transformed into a raster image or grid file.
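As a minimal sketch of the ideas above, the grid of Figure 2.4 can be written as a list of rows in Python. The variable and function names are illustrative assumptions, not part of ArcView or of the manual. The sketch counts the cells that are "turned on" and "turned off" and computes how many distinct values a cell can hold at a given bit depth.

```python
# The 4 x 4 grid of Figure 2.4, one list per row, top row first.
grid = [
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]


def values_per_cell(bits):
    """Number of distinct values a cell can hold at a given bit depth."""
    return 2 ** bits


# Count the cells holding each of the two possible 1-bit values.
cells_on = sum(value == 1 for row in grid for value in row)
cells_off = sum(value == 0 for row in grid for value in row)

print("1-bit values:", values_per_cell(1), "8-bit values:", values_per_cell(8))
print("cells on:", cells_on, "cells off:", cells_off)
```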

Figure 2.5. One-bit raster image.

The horizontal dimension of raster data is often oriented parallel to the east-west direction. Following image processing convention, raster cells are numbered beginning on the left margin of the raster. Further, the positions of cells in the vertical dimension are numbered starting from the top, or northern, boundary. Thus, the origin of the raster is in the upper left corner. This location is most often referenced as (1,1). It is important to note that this referencing system is different from more traditional geo-referencing systems that are based on Cartesian geometry, where the origin is in the lower left corner and is typically referenced as (0,0).

1,1  1,2  1,3  1,4        0,3  1,3  2,3  3,3
2,1  2,2  2,3  2,4        0,2  1,2  2,2  3,2
3,1  3,2  3,3  3,4        0,1  1,1  2,1  3,1
4,1  4,2  4,3  4,4        0,0  1,0  2,0  3,0

Figure 2.6. Referencing convention for simple raster (left) and for Cartesian geometry (right).
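The two conventions in Figure 2.6 can be related by a simple conversion. The sketch below assumes, as the figure suggests, that the raster reference is a 1-based (row, column) pair counted from the upper left and that the Cartesian reference is a 0-based (X, Y) pair counted from the lower left. The function names are hypothetical.

```python
def raster_to_cartesian(row, col, n_rows):
    """Convert a 1-based (row, col) counted from the upper left
    into a 0-based (x, y) counted from the lower left."""
    return col - 1, n_rows - row


def cartesian_to_raster(x, y, n_rows):
    """Inverse conversion: 0-based (x, y) back to 1-based (row, col)."""
    return n_rows - y, x + 1


# The upper left cell of the 4 x 4 raster in Figure 2.6.
print(raster_to_cartesian(1, 1, n_rows=4))
# The lower right cell of the same raster.
print(raster_to_cartesian(4, 4, n_rows=4))
```

Only the number of rows is needed, because the horizontal direction is the same in both conventions and only the vertical axis is flipped.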

2.3 Vector Data Structure: Topological Relationships

Standing on a street corner looking at a map is a very easy way to identify intersecting streets and adjacent properties. But the computer ‘sees’ these relationships by means of topology. Topology is one of the most useful relationships maintained in many spatial databases. It is defined as the mathematical procedure for explicitly defining spatial relationships between the data (connectivity or adjacency of points or lines in a GIS). The topological data structure logically determines exactly how and where points and lines connect on a map by means of nodes (topological junctions); for example, highway I-94 is connected with US-23, or Strong Hall (building, point) is located within Eastern Michigan University (campus, polygon). The computer stores such information in various tables of the database structure.

In digital maps or GIS, topological data structures provide additional intelligence for manipulating, analyzing, and using the information stored in a database. The order of connectivity defines the shape of an arc or polygon. Also, by storing information in a logical and ordered relationship, missing information (e.g., a line segment of a polygon) is readily apparent.

2.3.1 How Topology Works

Vector data structure

The vector data model represents geographic features similar to the way maps do. Points represent geographic features too small to be depicted as lines or areas; lines represent geographic features too narrow to be depicted as areas; and areas represent homogeneous geographic features. An X, Y (Cartesian) coordinate system references real-world locations. In a vector data model, each location is recorded as a single X, Y coordinate. With X, Y coordinates, you can represent points, lines, and polygons as a list of coordinates instead of a picture or graph. Points are recorded as a single coordinate pair. Lines are recorded as a series of ordered X, Y coordinates. Areas are recorded as a series of X, Y coordinates defining line segments that enclose an area, hence the term polygon, meaning ‘many-sided figure’.

In Figure 2.7, for example, the coordinate pair 3,2 represents a point location (building); the coordinate pairs 1,5 3,5 5,7 8,8 and 11,7 represent a line (road); and the coordinate pairs 6,5 7,4 9,5 11,3 8,2 5,3 and 6,5 represent a polygon (lake). The first and last coordinates of the polygon are the same because a polygon always closes. These coordinate lists represent how geographic features are stored in a computer as sets of X, Y coordinates.

Figure 2.7. Vector data -- point, line, and polygon features created by sets of coordinates.

To keep track of many features, each is assigned a unique identification number or tag. Then, the list of coordinates for each feature is associated with the feature’s tag (Figure 2.8).

Figure 2.8. Coordinates associated with the feature tag.
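The coordinate lists of Figure 2.7 and the tagging idea of Figure 2.8 can be sketched as a small Python dictionary. The coordinates come from the text; the tag numbers 1 to 3, the dictionary layout, and the helper function are assumptions made for illustration and do not reflect how ArcView stores features internally.

```python
# Feature tag -> feature type and ordered list of (X, Y) coordinate pairs.
features = {
    1: {"type": "point", "name": "building", "coords": [(3, 2)]},
    2: {"type": "line", "name": "road",
        "coords": [(1, 5), (3, 5), (5, 7), (8, 8), (11, 7)]},
    3: {"type": "polygon", "name": "lake",
        "coords": [(6, 5), (7, 4), (9, 5), (11, 3), (8, 2), (5, 3), (6, 5)]},
}


def is_closed(coords):
    """A polygon ring closes: its first and last coordinate pairs match."""
    return len(coords) >= 4 and coords[0] == coords[-1]


for tag, feature in features.items():
    print(tag, feature["name"], feature["type"],
          len(feature["coords"]), "pairs, closed:", is_closed(feature["coords"]))
```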

Arc-Node Data Structure

To draw the boundaries of two adjacent land parcels on a map sheet, you do not want to redraw a common boundary; doing so is inefficient. The same applies to storing a common boundary in the computer’s database files, where possible duplication is to be avoided. Repeating the coordinates for a point shared by a number of lines is inefficient, because the point would be stored more than once. Storing each polygon as a closed loop of coordinates is also inefficient, because the sides between adjacent polygons would be stored twice. A more efficient way to store vector data is the arc-node data structure.

The arc-node data structure is made up of points called nodes and lines called arcs. Nodes define the two endpoints of an arc; they may or may not connect two or more arcs. An arc is the line segment between two nodes. An arc is composed of its two nodes and an ordered series of points, called vertices, which define its shape. Nodes and vertices are represented as X, Y coordinates. In Figure 2.9, polygons A and B in the left diagram are represented by a series of connected coordinates. In the diagram on the right, nodes are created where the lines intersect, arcs are created between the nodes, with vertices providing shape, and polygons A and B are constructed from the arcs. Below the two polygons in Figure 2.9 are tables that contain data about each of the polygons. These data describe the arcs and polygons (i.e., the start and end node, vertices, etc.).

Figure 2.9. Vector data – points, lines, and polygons and corresponding tables.
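A minimal sketch of an arc-node structure follows. Because the tables of Figure 2.9 are not reproduced in the text, every coordinate and identifier below is hypothetical: two unit squares A and B share one boundary, which is stored once as arc 1. The use of a negative arc number to mean "traverse this arc backwards" is also an assumption made for this sketch, not a convention stated in the manual.

```python
# Nodes: node number -> (X, Y). Both nodes lie on the shared boundary.
nodes = {1: (1.0, 0.0), 2: (1.0, 1.0)}

# Arcs: arc number -> from-node, to-node, and shape-defining vertices.
arcs = {
    1: {"from": 1, "to": 2, "vertices": []},                      # shared side
    2: {"from": 2, "to": 1, "vertices": [(0.0, 1.0), (0.0, 0.0)]},  # rest of A
    3: {"from": 1, "to": 2, "vertices": [(2.0, 0.0), (2.0, 1.0)]},  # rest of B
}

# Polygons: ordered lists of arc numbers (negative = reversed direction).
polygons = {"A": [1, 2], "B": [3, -1]}


def arc_coords(signed_id):
    """Coordinates of an arc, reversed when the arc number is negative."""
    arc = arcs[abs(signed_id)]
    coords = [nodes[arc["from"]], *arc["vertices"], nodes[arc["to"]]]
    return coords if signed_id > 0 else coords[::-1]


def polygon_ring(name):
    """Rebuild the closed coordinate loop of a polygon from its arcs."""
    ring = []
    for signed_id in polygons[name]:
        coords = arc_coords(signed_id)
        # Skip the first pair of later arcs; it repeats the previous end node.
        ring.extend(coords if not ring else coords[1:])
    return ring


# Coordinate pairs needed by each storage scheme.
loop_pairs = sum(len(polygon_ring(name)) for name in polygons)
arc_node_pairs = len(nodes) + sum(len(arc["vertices"]) for arc in arcs.values())

print(polygon_ring("A"))
print(polygon_ring("B"))
print("closed loops:", loop_pairs, "pairs; arc-node:", arc_node_pairs, "pairs")
```

In this small case the closed-loop representation stores 10 coordinate pairs and the arc-node representation stores 6, because the shared side and its endpoints are kept only once.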

2.3.2 Practical Applications

2.3.2.1 Connectivity

Connectivity allows you to identify a route to the airport, connect streams to rivers, or follow a path from the water treatment plant to a house. Here’s how it works. Recall the arc-node data structure. An arc is defined by two endpoints: a from-node indicating where the arc begins and a to-node indicating where it ends. This is called arc-node topology. Arc-node topology is supported through an arc-node list. The list identifies the “From” and “To” nodes for each arc.

Connected arcs are determined by searching through the list for common node numbers. In Figure 2.10, it is possible to determine that arcs 1, 2, and 3 all intersect because they share node 11. The computer can determine that it is possible to travel along arc 1 and turn onto arc 3 because they share a common node (11), but it is not possible to turn directly from arc 1 onto arc 5 because they do not share a common node.

Figure 2.10. Arc-node connectivity.
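The search for common node numbers can be sketched as follows. The arc-node list of Figure 2.10 is not reproduced in the text, so the node numbers below are hypothetical, chosen only so that arcs 1, 2, and 3 share node 11 and arc 5 shares no node with arc 1, as the text describes. The function names are likewise illustrative.

```python
# Arc-node list: arc number -> (from-node, to-node). Hypothetical values.
arc_node_list = {
    1: (10, 11),
    2: (11, 12),
    3: (11, 13),
    4: (13, 14),
    5: (14, 15),
}


def shared_nodes(arc_a, arc_b):
    """Node numbers that appear in the entries of both arcs."""
    return set(arc_node_list[arc_a]) & set(arc_node_list[arc_b])


def can_turn(arc_a, arc_b):
    """Two arcs connect directly when they share at least one node."""
    return bool(shared_nodes(arc_a, arc_b))


print("arc 1 -> arc 3:", can_turn(1, 3), shared_nodes(1, 3))
print("arc 1 -> arc 5:", can_turn(1, 5), shared_nodes(1, 5))
```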

2.3.2.2 Area definition

Many of the geographic features we wish to represent cover a distinguishable area on the surface of the earth, such as lakes, parcels of land, and census tracts. An area is represented in the vector model by one or more boundaries defining a polygon. To see why more than one boundary may be needed, consider a lake with an island in the middle. The lake actually has two boundaries: one that defines its outer edge, and the island, which defines its inner edge. In the terminology of the vector model, an island defines an inner boundary (or hole) of a polygon. Here is how topology is used to define areas.

Recall that the arc-node structure represents polygons as an ordered list of arcs rather than a closed loop of X, Y coordinates. In Figure 2.11, polygon F is made up of arcs 8, 9, 10 and 7; the 0 before the 7 indicates that this arc creates an island in the polygon.

Figure 2.11. Polygon-arc topology.
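One way to read such a polygon-arc list is sketched below. It assumes that the list for polygon F is stored as 8, 9, 10, 0, 7 and that each 0 acts as a separator in front of the arcs of an island; the figure itself is not reproduced here, so both points are assumptions, and the function name is hypothetical.

```python
def split_rings(arc_list):
    """Split a polygon-arc list at each 0 into an outer ring and islands."""
    rings = [[]]
    for arc_id in arc_list:
        if arc_id == 0:
            rings.append([])        # a 0 starts the arc list of a new island
        else:
            rings[-1].append(arc_id)
    outer, islands = rings[0], rings[1:]
    return outer, islands


polygon_f = [8, 9, 10, 0, 7]
outer, islands = split_rings(polygon_f)
print("outer boundary arcs:", outer)
print("island arcs:", islands)
```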

2.3.2.3 Contiguity

Two geographic features that share a boundary are called adjacent. Contiguity is the topological concept which allows the vector data model to determine adjacency. Recall that the From-node and To-node define an arc. This indicates the direction of an arc, so that the polygons on the left and right sides of the arc can be determined. Left-right topology refers to the polygons on the left and right sides of an arc. In Figure 2.12, polygon B is on the left of arc 6, and polygon C is on the right. Thus, we know that polygons B and C are adjacent.

Figure 2.12. Contiguity.

Notice that the label for polygon A is outside the boundary of the area. This polygon is called the external or universe polygon, and represents the world outside the study area. The universe polygon ensures that each arc always has its left and right side defined.
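Left-right topology lends itself to a simple lookup table. In the sketch below, only the entry for arc 6 (polygon B on the left, polygon C on the right) and the role of polygon A as the universe polygon come from the text. The other arcs and all names in the code are hypothetical.

```python
UNIVERSE = "A"  # the external polygon described in the text

# Arc number -> (left polygon, right polygon).
left_right = {
    6: ("B", "C"),   # from the text
    4: ("A", "B"),   # hypothetical outer boundary of B
    5: ("C", "A"),   # hypothetical outer boundary of C
}


def adjacent_pairs(table, include_universe=False):
    """Pairs of polygons that lie on opposite sides of at least one arc."""
    pairs = set()
    for left, right in table.values():
        if left == right:
            continue
        if not include_universe and UNIVERSE in (left, right):
            continue
        pairs.add(frozenset((left, right)))
    return pairs


def are_adjacent(poly_a, poly_b):
    """True when some arc has the two polygons on its left and right sides."""
    pairs = adjacent_pairs(left_right, include_universe=True)
    return frozenset((poly_a, poly_b)) in pairs


print("B and C adjacent:", are_adjacent("B", "C"))
print("pairs inside the study area:", adjacent_pairs(left_right))
```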

2.3.3 Benefits of Topology

Creating and storing topological relationships has a number of advantages. First, data are stored efficiently, so large data sets can be processed quickly. Usually, the vector arc-node data structure maintains the arc as the basic unit, storing with each arc definition its related polygons (left, right) and nodes (starting and ending nodes, “From” and “To” nodes) for testing. This means that each arc is stored only once and that only endpoints (nodes) are duplicated. Second, the topological data structure provides a useful tool for data error detection, especially node errors and label errors. The final advantage of topology is that it facilitates analytical functions, such as modeling flow through the connecting lines in a network, combining adjacent polygons with similar characteristics, identifying adjacent features, and overlaying geographic features.

2.4 Summary Table

The following table summarizes different properties of the two spatial data structures that we have discussed. It also allows quick comparison of these properties.

FEATURE: Foundation of the data structure

RASTER
• Smallest unit is a single grid cell. A grid cell represents an area of a given dimension.
• Grid cells are called ‘pixels’ (picture elements).
• Grid cells are arranged in rows and columns.
• Each grid cell is assigned a particular value.

VECTOR
• Arc: The segment between two nodes. Not necessarily a straight line, but a series of connected points.
• Node: An endpoint of an arc, often a junction where two or more arcs join.
• Polygon: A series of nodes and arcs defining an area.
• Vertex: The directional turning point on a chain; a point on a chain given a coordinate label.
• Spatial coordinates: Coordinates for arcs, nodes, and vertices, which must be stored in a database.

FEATURE: Representation of a point feature

RASTER
• A point takes up an entire grid cell. The point has the dimensions of that cell.

VECTOR
• A point has no dimensions.
• The point is just an (X, Y) coordinate pair.

FEATURE: Measurement of areas

RASTER
• Very easy.
• Count the number of cells and multiply by the area of the cell.

VECTOR
• Much more difficult.
• A problem in geometry (divide into rectangles or triangles, calculate the area of each, sum).

Table 2.1. Features of raster and vector data structures.
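The two approaches to area measurement can be compared in a short sketch. The raster part counts the cells with value 1 in the grid of Figure 2.4 and assumes a 30-meter cell, a size borrowed from the Landsat TM example purely for illustration. The vector part uses the lake polygon of Figure 2.7 and the shoelace formula, which is a compact equivalent of summing signed triangle areas and is swapped in here in place of an explicit division into rectangles or triangles. The function names are hypothetical.

```python
def raster_area(grid, target, cell_size):
    """Count the cells equal to target and multiply by the area of one cell."""
    count = sum(value == target for row in grid for value in row)
    return count * cell_size ** 2


def polygon_area(ring):
    """Shoelace formula for a closed ring of (X, Y) pairs (first == last)."""
    twice_area = 0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


grid = [
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
lake = [(6, 5), (7, 4), (9, 5), (11, 3), (8, 2), (5, 3), (6, 5)]

print("raster area (square meters):", raster_area(grid, target=1, cell_size=30))
print("lake area (square map units):", polygon_area(lake))
```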

3. Examples

Raster Data
* ARC/INFO grids
* Image data (remotely sensed or scanned, aerial photos)
* TIFF
* TIFF/LZW compressed image data
* ERDAS .lan and .gis files
* ERDAS IMAGINE files
* BSQ, BIL, and BIP
* Sun raster files
* ARC/INFO GRID data
* JPEG
* Run-length compressed files

Vector Data
* ArcView shapefiles
* ARC/INFO coverages
* Data layers in ARC/INFO Map Libraries or ArcStorm databases
* CAD drawings
* Data managed by ESRI Spatial Database Engine (SDE)

Table 2.2. Examples of raster and vector data file types.

ArcView can create vector data and use several other software vector data types. While ArcView can create and utilize certain types of raster data, it does not have any raster functionality unless the Spatial Analyst extension is added to the software. Because several different data structures exist and not all of them are compatible with ArcView, it is often necessary to convert data sets from one form to another. This is the case if you are trying to work with several different data sets at one time (e.g., imagery and line data, or importing a data set into an existing system). Some of these conversion algorithms can change the information slightly during the conversion process. Thus, it is very important to understand how these conversions affect the underlying data (Star and Estes, 1990) before conversions are performed.

4. Comparisons Between GIS Data Structures

The following table (Table 2.3) summarizes the advantages and disadvantages of the two data structures that we have just defined.

Table 2.3 (below). Advantages and disadvantages of raster and vector data formats.

ADVANTAGES

RASTER
• It is a relatively simple data structure: a grid with a single number (representing a code) in each cell. It is easy to understand and use, even by beginners.
• The simple, coded grid structure makes analysis easier; computers are good at comparing numbers. Even if there is a “stack” of data files to be manipulated for some complex analysis, the computer reads each cell position one by one and does the analysis on that cell for each data file. (For example, determining which data file has the highest-value number in each grid cell position is a simple matter of comparing numbers.)

VECTOR
• It is more map-like. Vector displays are more pleasing to the eye.
• Because of the nature of vector data, high resolution is the norm. Vector data can be more detailed than products from a computer monitor or mapping device.
• The high resolution supports high spatial accuracy.
• Vector formats take less storage space. This is because vector features are defined and stored only as nodes and vertices, whereas raster coverages have every cell coded. This means that vector data files can be smaller, and thus processed faster, than raster files.

DISADVANTAGES

RASTER
• Spatial inaccuracies are common with raster data sets. It is usually hoped that losses are compensated by gains and that, overall, inaccuracies are therefore canceled out. Projects that need high accuracy either have to use more cells (greater resolution) or convert to vector format.
• Because each cell tends to generalize a landscape, the result is relatively low resolution compared to the vector format. Only the use of a very high number of cells (which makes files much larger and slows down computation and display) can guarantee better resolution.
• Each cell must have a code, even where nothing else exists. That is, even “nothing” must be coded (usually 0).
• Therefore, every cell is coded, increasing the needs for computer storage, especially for high-resolution grid cell formats, used when higher accuracy is desired.

VECTOR
• Vector data formats may be more difficult to manage than raster formats.
• They are usually stored in a long list of coordinates for nodes and vertices, which is easy for the computer to understand but difficult for the user to edit.
• Whereas very basic and “low-end” computers can operate raster-based GISs, vector formats require more powerful or high-tech machines.
• The use of better computers, increased management needs, and other considerations often make the vector format more expensive to use for projects.
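The cell-by-cell comparison across a "stack" of raster data files, mentioned among the raster advantages, can be sketched as follows. The three 2 x 2 layers, their names, and their values are invented for illustration; the function name is hypothetical.

```python
# A hypothetical stack of three small raster layers with the same shape.
layers = {
    "layer_a": [[1, 5], [3, 2]],
    "layer_b": [[4, 2], [0, 7]],
    "layer_c": [[0, 6], [1, 1]],
}


def highest_layer_per_cell(stack):
    """For each cell position, name the layer holding the highest value."""
    names = list(stack)
    n_rows = len(stack[names[0]])
    n_cols = len(stack[names[0]][0])
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Compare the value at (r, c) in every layer of the stack.
            row.append(max(names, key=lambda name: stack[name][r][c]))
        result.append(row)
    return result


for row in highest_layer_per_cell(layers):
    print(row)
```

The analysis visits each cell position once and only compares numbers, which is why the raster structure handles this kind of overlay so simply.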

While a simple raster structure is very popular, there are at least two limitations that need to be mentioned (Ibid).

1. The raster structure limits how precisely the location of a feature can be specified at a particular scale. A location is either in one cell or in another; there is no way to be in between, since the line separating one cell from another is infinitely narrow.

2. Adjacent cells may not be evenly spaced (Ibid).

Let us consider Figure 2.13.

1.4 units    1 unit        1.4 units
1 unit       Seed pixel    1 unit
1.4 units    1 unit        1.4 units

Figure 2.13. Seed pixel with 8-connected neighborhood.

The cells directly above, below, and on the sides of any “seed pixel” are exactly one unit of distance away from that pixel. The ones on the diagonal are approximately 1.41 (the square root of 2) units of distance away.
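These distances follow directly from the row and column offsets of each neighbor. The short sketch below (with a hypothetical function name) computes the center-to-center distance from a seed pixel to each of its eight neighbors, measured in cell units.

```python
import math


def neighbor_distances():
    """Distance from a seed pixel to each of its 8 neighbors, in cell units."""
    distances = {}
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if (d_row, d_col) != (0, 0):    # skip the seed pixel itself
                distances[(d_row, d_col)] = math.hypot(d_row, d_col)
    return distances


for offset, distance in neighbor_distances().items():
    print(offset, round(distance, 2))
```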

These two limitations become relevant when we try to vectorize a raster data set, that is, when we try to draw lines around raster cells that contain common ranges of data values. The models most typically employed in this process “look” at cells next to a seed pixel. The models then determine whether the neighboring cells (or pixels) are within the specified tolerances to be grouped together or not. If the model “looks” in only the four directions from the seed pixel (defined as a 4-connected neighborhood), then it has to “look” one unit away to check the data value of the neighboring pixels. Moreover, these pixels share an edge. The 4-connected neighborhood can be seen in Figure 2.14 below.

             1 unit
1 unit       Seed pixel    1 unit
             1 unit

Figure 2.14. Seed pixel with 4-connected neighborhood.

On the other hand, if we include pixels on the diagonal (Figure 2.13, defined as an 8-connected neighborhood), the cells are not evenly spaced and the algorithm needed to perform the function becomes more complex. Also, in this latter case, some cells share an edge while others only share a vertex (Ibid).
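The effect of the neighborhood choice can be seen by growing a region from a seed pixel on the grid of Figure 2.4. The sketch below is a simplified illustration, not the algorithm of any particular GIS package: it groups cells whose value equals the seed value exactly (a tolerance of zero), and all names are hypothetical.

```python
from collections import deque

FOUR = [(-1, 0), (1, 0), (0, -1), (0, 1)]              # cells sharing an edge
EIGHT = FOUR + [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # plus the diagonals


def grow_region(grid, seed, offsets):
    """Collect the cells connected to seed that hold the same value."""
    seed_value = grid[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in offsets:
            r, c = row + d_row, col + d_col
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and (r, c) not in region and grid[r][c] == seed_value:
                region.add((r, c))
                queue.append((r, c))
    return region


grid = [
    [0, 0, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
seed = (0, 2)  # 0-based (row, column) of a cell that holds 1

print("4-connected region size:", len(grow_region(grid, seed, FOUR)))
print("8-connected region size:", len(grow_region(grid, seed, EIGHT)))
```

With the 4-connected neighborhood the cell in the second row, first column is left out, because it touches the rest of the region only at a corner. The 8-connected neighborhood includes it.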

5. How to Obtain Data for Use in ArcView

5.1 Data included with ArcView

ArcView comes with a full set of ready-to-use general-purpose data. In addition, there are some very useful data included on the ESRI Schools bundle CD. For many applications, you will find that this is the only data set you need. You can use these data by themselves to create maps in ArcView. You can also use these data as a base to which you add your own data. For example, you can make a map of the world and then add your own tabular data about the cities and countries of the world (ArcView Online Help).

5.2 Other third party data suppliers

An increasing number of third parties are making data available in ARC/INFO and ArcView format. These include not only commercial data suppliers, but also local, state, and national government agencies and academic institutions. Many users of GIS are willing to share their data with other interested users.

5.3 Finding data on the Internet

Data that can be used in ArcView are being made available on the Internet by many organizations and agencies. An excellent starting point is the United States Federal Geographic Data Committee web page at http://www.fgdc.gov. On this page, the National Geospatial Data Clearinghouse contains links to hundreds of organizations that maintain and provide GIS data, both in the US and around the world. In some cases, the data can be downloaded directly and used in ArcView. ESRI's web page (http://www.esri.com) also contains links to organizations that maintain and provide GIS data (Ibid).

5.4 Tabular data

• Data from database servers such as Oracle, Ingres, Sybase, Informix, etc.
• dBASE III files
• dBASE IV files
• INFO tables
• Text (tab or comma delimited) files

Tabular data for use in ArcView can be created and edited using almost any spreadsheet, database, word processor, or text editor that can save files in the supported formats (Ibid).

Tabular data can include almost any data set, whether or not it contains geographic data. What you can do with a table in ArcView depends on what it contains. Some tables can be displayed in a view directly, while others provide additional attributes that can be joined to your existing spatial data. You can add dBASE, INFO, and text files into ArcView as tables.

There are two types of text files that can be used in ArcView: comma-delimited and tab-delimited.

Comma-delimited text consists of values separated by commas, as seen in the list below. The first line of the list contains the field names, and the lines below it are the actual records of data.

Xcoord,Ycoord,type,rating
494612,3764140,GASOLINE,1
494671,3763942,GASOLINE,2
495612,3765256,GASOLINE,2
497403,3764664,GASOLINE,3
499460,3761745,GASOLINE,3
496054,3769584,GASOLINE,2
497089,3761839,GASOLINE,2
494612,3761199,CRUDE OIL,1
494671,3760955,CRUDE OIL,2
495612,3760000,CRUDE OIL,2

This same file could be tab-delimited. A tab-delimited file would replace all the commas with tabs:

Xcoord	Ycoord	type	rating
494612	3764140	GASOLINE	1
494671	3763943	GASOLINE	2
495612	3765256	GASOLINE	2
497403	3764664	GASOLINE	3
499460	3761745	GASOLINE	3
496054	3759584	GASOLINE	2
498989	3761839	GASOLINE	2
494612	3761199	CRUDE OIL	1
494671	3760955	CRUDE OIL	2
495612	3760000	CRUDE OIL	2
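Outside ArcView, the conversion from comma-delimited to tab-delimited text can be sketched with Python's standard csv module. This is a minimal illustration under stated assumptions: the function name comma_to_tab is hypothetical, and the sample holds only the header and two records taken from the comma-delimited list above.

```python
import csv
import io


def comma_to_tab(comma_text):
    """Convert comma-delimited text to tab-delimited text."""
    reader = csv.reader(io.StringIO(comma_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Header plus two records from the comma-delimited list.
sample = (
    "Xcoord,Ycoord,type,rating\n"
    "494612,3764140,GASOLINE,1\n"
    "494612,3761199,CRUDE OIL,1\n"
)
tab_text = comma_to_tab(sample)
print(tab_text)
```

Because the fields are separated by tabs rather than spaces, a value such as CRUDE OIL stays in a single field.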

5.5 Hot linking to other data sources
The ArcView Hot Link tool is used to access virtually any other data source or application, simply by clicking on a view. For example, you might click on a building to display a drawing of its floor plan, access a document describing it, or even play a video showing it. In this way ArcView lets you organize and access diverse sources of data geographically.

Exercises
Shapefiles are ArcView's native file format. Each shapefile is stored as a collection of files, the main ones being:
• .shp – stores the feature geometry (shape and location information)
• .shx – stores the index of the feature geometry (spatial data index)
• .dbf – a dBASE file that stores the characteristics of the features (attribute table)
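Since a shapefile is a set of files sharing one base name, a quick check that the three components listed above are all present can be sketched in Python. The function name missing_components is hypothetical, and the check only tests that the files exist; it does not validate their contents.

```python
from pathlib import Path

# The three components named in the list above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_components(shp_path):
    """Return the required extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [
        ext for ext in REQUIRED_EXTENSIONS
        if not base.with_suffix(ext).exists()
    ]


# An empty list means .shp, .shx and .dbf were all found.
print(missing_components("Children.shp"))
```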


2A. Adding files containing tabular data to your project

Step 1: Copy a file to your personal directory

Because the operations in this exercise will change the source data file, copy the file to your personal directory, and make sure that all changes are made only to your own copy.

You can use any Windows copy operation to copy Children_plus.dbf from the Your Drive:\Chapter 2\data directory to your personal directory (on your machine).
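If you prefer to script the copy rather than use a Windows copy operation, a minimal Python sketch follows. The function name copy_to_personal_dir is hypothetical, and the paths in the commented usage line are placeholders to be replaced with your own drive and directory.

```python
import shutil
from pathlib import Path


def copy_to_personal_dir(source_file, personal_dir):
    """Copy source_file into personal_dir and return the path of the copy."""
    target_dir = Path(personal_dir)
    # Create the personal directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source_file, target_dir))


# Hypothetical usage; substitute your own drive letter and directory:
# copy_to_personal_dir(r"D:\Chapter 2\data\Children_plus.dbf", r"C:\temp")
```

The original file is left untouched, so later edits affect only the copy.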

Step 2: Start ArcView and open a project

If necessary, start ArcView. From the File menu, choose Open Project. Move to Your Drive:\Chapter 2\ and open ex02a.apr. When the project opens, you see a view that contains a polygon theme called Detroit_zip.shp and a point theme called Children.shp, in which each point stands for a child and the location of his/her family house.

Step 3: Add a table to the project

You can add an INFO file, a dBASE III or dBASE IV file, or a tab-delimited or comma-delimited text file to your current project as a table.
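Before adding a dBASE file as a table, it can be useful to see what it contains. The sketch below reads the fixed 32-byte header found at the start of dBASE III and dBASE IV files and reports the version byte, the record count, and the header and record sizes. The function name read_dbf_header is hypothetical, and the sketch does not read field definitions or records.

```python
import struct


def read_dbf_header(path):
    """Return basic facts from the fixed 32-byte header of a .dbf file."""
    with open(path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to be a dBASE table")
    # Byte 0 is the version; bytes 4-7 hold the record count, bytes 8-9 the
    # header size and bytes 10-11 the record size, all little-endian.
    num_records, header_size, record_size = struct.unpack("<IHH", header[4:12])
    return {
        "version": header[0],
        "records": num_records,
        "header_size": header_size,
        "record_size": record_size,
    }


# Hypothetical usage with the exercise file:
# print(read_dbf_header("Children_plus.dbf"))
```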


Suppose now that you want to get an additional set of tabular data from an external source (e.g., the Children_plus.dbf file that you copied to your selected directory in Step 1).

To add an INFO, dBASE, or delimited text file to your project:

• Make the Project window active.
• From the Project menu, choose Add Table.
• From the File Type list, choose INFO, dBASE (.dbf), or Delimited Text (.txt).
• Navigate to the directory that contains the file you want to add.
• Double-click the file you want to add, or choose the file and press OK.

ArcView then adds the table to your project.

For this exercise, make the Project window (ex02a.apr) active. From the Project menu, select Add Table.

In the Add Table dialog box, navigate to your personal directory (the one to which you copied the Children_plus.dbf table in Step 1).

NOTE: In our example, we get the file from the C:\temp directory, which is therefore the external source for the added table. The table to add may be stored anywhere; it does not have to be in the same location as the project itself.


Click on Children_plus.dbf to highlight it. (If you do not see Children_plus.dbf, make sure dBASE is selected in List Files of Type.)

Click OK to add the table to the project.

Step 4: Closing the project

From the File menu, choose Close Project. Click on No when prompted to save the project.


3.1 BIBLIOGRAPHY

Desai, Bipin C., 1997. An Introduction to Database Systems. West Publishing Company, Los Angeles, 820 pp.

ESRI, Inc., 1997. Getting to Know ArcView GIS. GeoInformation International, Cambridge, 29-25 pp.

Hutchinson, Scott, and Larry Daniel, 1997. INSIDE ArcView GIS. OnWord Press, Santa Fe, 474 pp.

Star, Jeffrey, and John Estes, 1990. Geographic Information Systems: An Introduction. Prentice Hall, 303 pp.