Geodatabases - SE-Wiki

Chapter 1

Geodatabases

Shubham Bansal

This seminar paper introduces the basics of Geodatabases and explains their overall structure with some examples. It also briefly discusses the different types of Geodatabases, their architectures, and their advantages and disadvantages.

The later sections give technical details on the data types used in Geodatabases, the operations used to process and analyze geographical data, the underlying data structures, and the algorithms used for indexing Geodatabases.

The final part presents some SQL queries that deal with the geographical data stored in Geodatabases.

The motivation behind this paper is to get a clear picture of Geodatabases and how they manage spatial data internally. The name makes a Geodatabase sound simple, but the real challenge is how it supports the storage and management of geographical data. This paper explains several topics that are closely related to Geodatabases: their data types, their datasets (the kinds of data they support), their relation to Relational Database Management Systems (RDBMS), and how indexing algorithms are adapted to handle queries on the geographical data stored in Geodatabases.

This paper also covers usage, applications and the fields in which Geodatabases play an important role, and it names a few organizations that are key players in GIS and Geodatabase systems.

1.1 Definition of Geodatabases

A Geodatabase is the part of a Geographical Information System (GIS) that stores and manipulates geographical data. The term combines two words: Geo + Database [1].

Geo refers to spatial data, that is, data that identifies locations and boundaries on earth. In other words, spatial data stores the coordinates and topology of a location that can be mapped. This spatial data is later processed and analyzed with GIS systems [2].

Geodatabases are the underlying data structure of a GIS and are used for editing and data management tasks. They are built on a wide range of database management systems (DBMS) and ordinary file systems. These DBMS and file systems come in many sizes and support different numbers of users [3].

1.2 Generation of Geodatabases

The generations of Geodatabases are as follows:

First Generation:

• In the first generation of Geodatabases, the spatial data is stored outside the DBMS in separate files.

• Each file is mapped with a unique ID, as shown in Figure 1.1.

Figure 1.1: 1st Generation Geodatabases

Second Generation: In the second generation of Geodatabases, the spatial data is stored inside the DBMS in a separate column called GEOMETRY. In addition, the DBMS engine is enhanced to support SQL queries on spatial data.

Overall, the spatial data and the attribute data are linked and stored in the same place in tabular form in the DBMS, as shown in Figure 1.2.

Figure 1.2: 2nd Generation Geodatabases
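The difference between the two generations can be sketched in a few lines of Python. This is only an illustration under assumptions: an in-memory SQLite database stands in for the DBMS, a dictionary stands in for the external files, the geometry is kept as plain WKT text, and all table and column names (parcels_gen1, parcels_gen2, owner) are hypothetical.

```python
import sqlite3

# Hypothetical sample geometry in well-known text form.
PARCEL_WKT = "POLYGON((0 0, 4 0, 4 3, 0 3, 0 0))"

conn = sqlite3.connect(":memory:")

# First generation: the DBMS holds attributes only. The geometry lives in
# external files (simulated by a dict) that are linked through a unique ID.
conn.execute("CREATE TABLE parcels_gen1 (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO parcels_gen1 VALUES (?, ?)", (1, "Smith"))
external_files = {1: PARCEL_WKT}


def load_gen1(parcel_id):
    """Fetch the attributes from the DBMS, then the geometry from the file store."""
    owner = conn.execute(
        "SELECT owner FROM parcels_gen1 WHERE id = ?", (parcel_id,)
    ).fetchone()[0]
    return owner, external_files[parcel_id]


# Second generation: the geometry sits in a GEOMETRY column next to the attributes.
conn.execute(
    "CREATE TABLE parcels_gen2 (id INTEGER PRIMARY KEY, owner TEXT, geometry TEXT)"
)
conn.execute("INSERT INTO parcels_gen2 VALUES (?, ?, ?)", (1, "Smith", PARCEL_WKT))


def load_gen2(parcel_id):
    """A single query returns attributes and geometry together."""
    return conn.execute(
        "SELECT owner, geometry FROM parcels_gen2 WHERE id = ?", (parcel_id,)
    ).fetchone()
```

Both functions return the same pair of owner and geometry, but the first-generation variant needs two lookups in two different stores, while the second-generation variant needs one SQL query.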

1.3 Type of Geodatabases

There are three categories of Geodatabases: personal, file and ArcSDE. Each category is discussed in the following subsections [4]:

1.3.1 Personal Geodatabases

A personal Geodatabase is a single-user database. It uses a Microsoft Access database file (".mdb") to store the spatial information. This type has the following limitations:

• The maximum storage size is limited to 2 GB.

• It is supported only on the Windows platform.

1.3.2 File Geodatabases

This type of Geodatabase also supports single-user editing. It uses an ordinary file system folder to store the spatial information. The folder name carries the ".gdb" extension (the ".mdb" extension belongs to the Access-based personal Geodatabase).

This type has the following limitation:

• The maximum storage size is 1 TB per table.

1.3.3 ArcSDE Geodatabases

An ArcSDE Geodatabase is built on top of an RDBMS (Figure 1.3). It provides a multi-user environment through a central storage location for spatial data. It offers access control for individual users as well as backup and recovery options. These features make this type more scalable than the other two. Despite these advantages, it has some limitations: it is not platform independent, and different software versions support different platforms (Windows, Linux, Mac).

Figure 1.3: High Level Architecture

Geodatabases are divided into two categories according to the number of users. The type can be selected on the basis of the following advantages and disadvantages:

Single User Geodatabases

A Geodatabase that supports only one user at a time is called a single-user Geodatabase. File Geodatabases allow multiple concurrent editors, whereas personal Geodatabases do not. Single-user Geodatabases have limited capacity for storing geospatial data [5].

Multi User Geodatabases

Multi-user Geodatabases allow multiple users to work in parallel and to perform different geospatial operations concurrently. They suit organizations of all sizes (small, medium, large) because the data storage capacity depends only on the size of the server.

This type supports multi-user editing and all spatial data types [6].

1.4 Architecture of a Geodatabase

The architecture of a Geodatabase comprises the following four aspects [7][8]:

• Physical storage: the basic function of a Geodatabase is to physically store geographic information using an underlying DBMS or file system. Beyond physical storage, a Geodatabase has the further key aspects listed below.

• Information model: this model represents and manages geographic information. It is based on a collection of data tables holding the different geographic datasets (feature classes, raster datasets and attributes).

• Application logic: this layer provides access to geographic data and works with it across different files and formats.

• Transaction model: this model manages the GIS data flow in the GIS system.

1.5 Geodatabases Datasets

The Geodatabase has three kinds of datasets to manage geographic information: tables, feature classes and raster datasets. Creating these datasets is the first step in designing and building a new Geodatabase. Once these dataset designs exist, users can add advanced features to the Geodatabase, such as topology and network design. Geodatabase storage contains both the schema and the set of rules for every dataset, together with table-like storage for spatial and attribute data [9].

1.5.1 Table Basics

The following types hold and manage attribute information in Geodatabases [10]:

• Numbers: numeric values such as short integers, long integers, floats and doubles.

• Text: a collection of alphanumeric values.

• Date: time and date related values.

• BLOBs: binary large objects, used to store data such as images.

• Global Identifiers: used to manage relationships for data management, versioning, updates and replication. A global identifier is a registry-style string of 36 characters enclosed in curly brackets that uniquely identifies a feature or a table row within and across Geodatabases.
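The format of a global identifier can be illustrated with Python's standard uuid module. This is a minimal sketch, not the mechanism a particular Geodatabase product uses: the helper name new_global_id and the choice of upper-case letters are assumptions made for the example.

```python
import uuid


def new_global_id():
    """Return a registry-style identifier: 36 characters inside curly brackets."""
    # str(uuid.uuid4()) yields 32 hex digits plus 4 hyphens, i.e. 36 characters.
    return "{" + str(uuid.uuid4()).upper() + "}"


gid = new_global_id()
print(gid)
```

Because the identifier is generated randomly from a very large value space, two rows created in different Geodatabases are extremely unlikely to receive the same value, which is what makes it usable for replication.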

1.5.2 Feature class basics

A feature class is a homogeneous collection of features that share the same spatial representation (points, lines or polygons) and a common set of attribute columns [11]. Examples are a point feature class representing specific locations and a line feature class representing roads.

Points, lines, polygons and annotations are the most commonly used feature classes in Geodatabases. Vector features, which represent geographic objects with vector geometry, are frequently used for objects with discrete boundaries such as walls, streets and rivers.

Put simply, a feature is an object with a graphical representation, typically a point, a line or a polygon.

The following feature classes hold the graphical representation of an object:

• Points: represent features that are very small (such as a GPS observation).

• Lines: represent the shape and location of geographic objects that have length but no area, e.g. street lines and streams.

• Polygons: represent the shape and location of geographic objects that cover an area, e.g. states, countries and land use zones.

• Annotation: shows descriptive properties of geographic objects. This class is responsible for rendering text for graphical objects.

• Dimensions: show the length and distance of graphical objects, e.g. the distance between two entities.

• Multipoints: represent features that are composed of more than one point.

• Multipatches: represent the outer surface of geographic objects that occupy an area or volume in 3-D space, from simple objects (triangles and cubes) to complex objects (isosurfaces and buildings).
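The homogeneity of a feature class can be sketched as a small Python container that rejects features of a different geometry type. The class names Feature and FeatureClass and the sample road are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    geometry_type: str                    # "point", "line" or "polygon"
    coordinates: list                     # list of (x, y) tuples
    attributes: dict = field(default_factory=dict)


@dataclass
class FeatureClass:
    name: str
    geometry_type: str                    # the single type this class accepts
    features: list = field(default_factory=list)

    def add(self, feature):
        # A feature class is homogeneous: refuse any other geometry type.
        if feature.geometry_type != self.geometry_type:
            raise ValueError(
                f"{self.name} only accepts {self.geometry_type} features"
            )
        self.features.append(feature)


roads = FeatureClass("roads", "line")
roads.add(Feature("line", [(0, 0), (5, 0)], {"name": "Main St"}))
```

Adding a point feature to roads raises a ValueError, so every row of the class keeps the same spatial representation and the same kind of attribute set.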

1.5.3 Raster basics

Raster datasets represent geographic elements by dividing space into discrete square or rectangular cells laid out in a grid. Every cell has a value that represents some characteristic of that location, as shown in Figure 1.4.

Raster datasets are commonly used to represent imagery, digital elevation models and other phenomena. Rasters are also often used to represent point, line and polygon features. Figure 1.4 shows how a series of polygons is represented as a raster dataset.

Rasters are interesting for at least two reasons: first, they can represent all kinds of geographic information (features, images and surfaces), and second, they have a rich set of analytic geoprocessing operators. Therefore, in addition to being a universal data type for holding imagery in GIS, rasters are also heavily used to represent features, which enables all geographic objects to be used in raster-based modeling and analysis [12].

Figure 1.4: Raster Representation [12]
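How a polygon turns into raster cells can be sketched as follows. The sketch assumes a simple rule that is not taken from the cited source: a cell receives the polygon's value when the centre of the cell lies inside the polygon, which is decided with a ray-casting test. The function names and the sample square are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a ray going right from (x, y) crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height, cell_size=1.0, value=1):
    """Return a grid (list of rows, lowest y first) marking cells inside the polygon."""
    grid = []
    for row in range(height):
        y = (row + 0.5) * cell_size       # y coordinate of the cell centres
        grid.append([
            value if point_in_polygon((col + 0.5) * cell_size, y, polygon) else 0
            for col in range(width)
        ])
    return grid


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid = rasterize(square, width=4, height=4)
for row in grid:
    print(row)
```

For the 2 x 2 square placed in a 4 x 4 grid, only the four central cells receive the value 1. A smaller cell size would follow the polygon outline more closely at the cost of more cells.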

1.6 Geodatabases storage in Relational Database

The storage model behind Geodatabases is a basic DBMS. This backbone gives the Geodatabase a simple and effective data model for storing and working with GIS data. In this data model [13]:

• Data is stored in tabular form.

• A table consists of multiple rows, and every row has the same number of columns.

• Every column has a data type, i.e. the kind of value stored in the column, such as int, long, float, date, time, char and the new data types for spatial information.

• Relations between tables map rows of one table to rows of another table through a common column in the related tables.

Geodatabase storage includes both the database schema and rules for geographic datasets and the tabular storage of spatial and attribute data. The schema consists of definitions, behavior and integrity rules for every object, as shown in Figure 1.5.

Figure 1.5: RDBMS Representation

Spatial objects, on the other hand, are most commonly stored as raster datasets or as vector features in tabular form along with their other attributes. A feature class can be stored as a table in which each row represents a feature. A column named SHAPE holds the geometry or shape of the corresponding feature. The SHAPE column can be of two types:

• BLOB (the geometry is stored as a binary large object)

• Spatial column type

A feature class is a set of features with a common spatial representation (point, line or polygon) and a common set of attributes, each stored in a separate column. It is stored and managed in a single table [14].
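A feature table with a binary SHAPE column can be sketched with SQLite and Python's struct module. The encoding used here is the well-known binary (WKB) layout for a 2-D point, chosen as a stand-in; the binary format of a real ArcSDE installation may differ. The table hospitals, its columns and the sample coordinates are hypothetical.

```python
import sqlite3
import struct


def point_to_wkb(x, y):
    """Encode a 2-D point as WKB: byte order flag, geometry type 1, then x and y."""
    # "<" = little endian without padding, B = 1 byte, I = 4 bytes, d = 8 bytes.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob):
    """Decode a little-endian WKB point back into an (x, y) tuple."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian WKB points are handled in this sketch")
    return x, y


conn = sqlite3.connect(":memory:")
# Each row is one feature; SHAPE holds the geometry, the rest are attributes.
conn.execute(
    "CREATE TABLE hospitals (objectid INTEGER PRIMARY KEY, name TEXT, shape BLOB)"
)
conn.execute(
    "INSERT INTO hospitals VALUES (?, ?, ?)",
    (1, "City Hospital", point_to_wkb(6.08, 50.77)),
)
name, shape = conn.execute(
    "SELECT name, shape FROM hospitals WHERE objectid = 1"
).fetchone()
print(name, wkb_to_point(shape))
```

With a BLOB column the DBMS sees only opaque bytes, so spatial functions have to run in the application layer. A spatial column type lets the database engine itself interpret the geometry.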

Raster data can also be stored in tables, but because of its large size a separate block table is maintained. The raster data is cut into small chunks, and each chunk is stored in a separate row of the table.
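The block table idea can be sketched in Python as follows. The sketch assumes square blocks keyed by block row and block column; the function names split_into_blocks and assemble and the 4 x 4 sample raster are hypothetical.

```python
def split_into_blocks(raster, block_size):
    """Cut a 2-D raster (list of rows) into tiles, one block-table row per tile."""
    block_rows = []
    for r0 in range(0, len(raster), block_size):
        for c0 in range(0, len(raster[0]), block_size):
            tile = [row[c0:c0 + block_size] for row in raster[r0:r0 + block_size]]
            block_rows.append((r0 // block_size, c0 // block_size, tile))
    return block_rows


def assemble(block_rows, height, width, block_size):
    """Rebuild the full raster from the rows of the block table."""
    raster = [[None] * width for _ in range(height)]
    for block_row, block_col, tile in block_rows:
        for dr, tile_row in enumerate(tile):
            for dc, value in enumerate(tile_row):
                raster[block_row * block_size + dr][block_col * block_size + dc] = value
    return raster


raster = [[r * 4 + c for c in range(4)] for r in range(4)]
block_table = split_into_blocks(raster, block_size=2)
for entry in block_table:
    print(entry)
```

A query that needs only a small window of the raster then has to read just the block rows that cover this window instead of the whole image.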

The examples below show how different databases store vector and raster data. Most databases have already added spatial data storage and the corresponding SQL query processing support.

Different databases use different column types to hold vector and raster geometry:

• Oracle uses either its own spatial data types or the data types defined by ArcSDE [14].

• IBM DB2 uses the Spatial Extender geometry object [14].

• Informix uses the Spatial DataBlade geometry object [14].

• PostgreSQL uses the ArcSDE spatial type (geometry) or PostGIS geometries [14].

• Microsoft SQL Server uses the Microsoft spatial types geometry and geography [14].

1.7 Geodatabase field data types

GEOMETRY is an SQL data type used by Geodatabase storage in Oracle, IBM DB2 and PostgreSQL. This SQL data type was built by ESRI. GEOMETRY is the default data type for storing object geometry in a PostgreSQL database [15].

This section discusses the Geodatabase data types and their subclasses for storing spatial information. Before turning to the data types in detail, it first takes a brief look at the common storage mechanism of a Geodatabase.

In general, spatial data is stored in relational database tables that follow the feature storage model. A feature table can have multiple rows, where each row holds one feature and the geometry of the object is stored in a column called SHAPE. This SHAPE column holds polygon, line and point geometry. Figure 1.6 shows the hierarchy of data types supported for geographical data.

The feature class is a widely used storage model for Geodatabases because it fits very well with the SQL processing engine. It also has a number of further advantages:

• One column holds the entire geometry of a feature.

• The data structure of the physical schema is fast, scalable and simple.

• It is easy for programmers to write an interface to it.

• It is interoperable, i.e. data is easy to move in and out.

The following paragraphs look at some important data types and the functions they support.

POINT [17]

This data type is zero-dimensional (0-D). It stores the position of an object in space and is used to define features such as landmarks, hospitals and any other location specified by the user.

POLYGON [17]

This data type stores a sequence of points that represents a two-dimensional surface. The stored points define the exterior boundary of the polygon. It is used to define areas of land, water bodies and other clustered objects.

Figure 1.6: Spatial Data Types [16]

This data type supports the following functions:

• Area - returns the area of the polygon.

• ExteriorRing - returns the exterior ring of the polygon.

• NumInteriorRing - returns the number of interior rings.

• InteriorRingN - returns the N-th interior ring.

• Centroid - returns the geometric center of the polygon.

• PointOnSurface - returns a point that is guaranteed to lie on the surface of the polygon.
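What Area and Centroid compute can be sketched in Python with the shoelace formula. The sketch assumes a simple polygon without interior rings, given as a list of exterior ring points; it is not the implementation of any particular database, and the function names and the sample rectangle are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counter-clockwise rings, negative otherwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def area(ring):
    """Area of the polygon enclosed by the ring, independent of orientation."""
    return abs(signed_area(ring))


def centroid(ring):
    """Geometric center of the area enclosed by the ring."""
    a = signed_area(ring)
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (6.0 * a), cy / (6.0 * a)


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(area(rectangle), centroid(rectangle))
```

For the 4 x 3 rectangle the area is 12 and the centroid is (2.0, 1.5). For a concave polygon the centroid can fall outside the polygon, which is why a separate PointOnSurface function exists.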

LineString

This data type stores a sequence of points that defines a linearly interpolated path. It has a length attribute. A LineString is closed if its start and end points are the same, and it is a ring if it is closed and does not intersect itself. This type is used to define roads, tunnels, rivers and power lines.

This data type supports the following functions:

• StartPoint - returns the first point.

• EndPoint - returns the last point.

• PointN - returns the N-th point.

• Length - returns a double precision value.

• NumPoints - returns the number of points.

• IsRing - returns a boolean value.

• IsClosed - returns a boolean value.
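Most of these functions are straightforward to sketch in Python. The class below is a hypothetical illustration, not a database API. IsRing is left out on purpose, because it additionally requires a self-intersection test. The 1-based index of point_n follows the SQL convention.

```python
import math


class LineString:
    """A path of straight segments through a sequence of (x, y) points."""

    def __init__(self, points):
        self.points = list(points)

    def num_points(self):
        return len(self.points)

    def start_point(self):
        return self.points[0]

    def end_point(self):
        return self.points[-1]

    def point_n(self, n):
        """Return the n-th point, counting from 1."""
        return self.points[n - 1]

    def length(self):
        """Sum of the Euclidean lengths of all segments."""
        return sum(
            math.dist(p, q) for p, q in zip(self.points, self.points[1:])
        )

    def is_closed(self):
        return self.points[0] == self.points[-1]


path = LineString([(0, 0), (3, 4), (3, 0), (0, 0)])
print(path.length(), path.is_closed())
```

The sample path consists of segments of length 5, 4 and 3, so its length is 12, and it is closed because the first and last points coincide.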

The data types discussed above are the most frequently used data types in a Geodatabase. Other data types are MultiPoint, MultiLineString and MultiPolygon, which are collections of the corresponding basic types.

SQL query execution examples with spatial data. The following SQL statements work with a PostgreSQL database that has the PostGIS extension installed; table_name, field_1 to field_4, object1 and object2 are placeholders.

Table creation [17]

CREATE TABLE table_name (field_1 integer, field_2 varchar(100),
field_3 float, field_4 geometry);

Spatial data insertion [17]

INSERT INTO table_name VALUES (11, 'this is just an example',
445.63, ST_GeomFromText('POLYGON((1 2, 3 4, 5 6, 7 8, 1 2))', 1));

Query for spatial data [17]

SELECT field_1 FROM table_name
WHERE ST_Overlaps(object1, object2) = 't';

All these operations are performed using standard SQL together with functions defined specifically for spatial data types.

The names and functionality of some of the functions that support spatial operations in SQL are listed below:

Point

This function returns a Point for the given x, y and spatial reference ID.

Difference

This function returns the difference of two geometric objects.

Intersects

This function returns 't' if the two objects intersect, otherwise it returns 'f'.

Equals

This function checks whether two geometries are identical.

Contains

This function checks whether a given object lies completely inside another object.
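The meaning of Intersects, Equals and Contains can be sketched for the simplest case of axis-aligned rectangles. This is a simplification: real spatial functions work on arbitrary geometries and treat boundaries more carefully. The Rect type and the sample rectangles are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def contains(a, b):
    """True if rectangle b lies completely inside rectangle a."""
    return (a.xmin <= b.xmin and a.ymin <= b.ymin
            and b.xmax <= a.xmax and b.ymax <= a.ymax)


def equals(a, b):
    """True if both rectangles cover exactly the same area."""
    return a == b


city = Rect(0, 0, 10, 10)
park = Rect(2, 2, 4, 4)
lake = Rect(8, 8, 12, 12)
print(contains(city, park), intersects(city, lake), contains(city, lake))
```

The park lies completely inside the city, whereas the lake only intersects it, so Contains is the stricter of the two predicates.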

1.8 Spatial Indexing

Spatial indexing supports the selection of spatial data and operations on spatial data (for example spatial joins and nearest-object searches). The advantage of a spatial index is that it organizes the object space in such a way that a query has to consider only a subset of the objects. The B-tree [18] and the R-tree [19] are the main data structures used for spatial indexing; spatial index structures are designed to store either points or rectangles [20] [17].

A spatial index structure organizes objects into buckets, where each bucket has an associated region, i.e. the part of space containing all objects in that bucket, as shown in Figure 1.7.

Figure 1.7: Spatial Index structure of rectangles
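The effect of buckets and their regions on a query can be sketched in Python. The sketch assumes a flat list of two buckets rather than a tree, and it takes the bucket region to be the minimum bounding rectangle of the bucket's objects; all names and sample rectangles are hypothetical.

```python
def mbr(rects):
    """Minimum bounding rectangle of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def rects_intersect(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Bucket:
    def __init__(self, objects):
        self.objects = objects                      # dict: name -> rectangle
        self.region = mbr(list(objects.values()))   # region covering all objects


def window_query(buckets, window):
    """Return the names of objects intersecting the window and the buckets visited."""
    hits, visited = [], 0
    for bucket in buckets:
        if not rects_intersect(bucket.region, window):
            continue                                # the whole bucket is skipped
        visited += 1
        hits += [name for name, rect in bucket.objects.items()
                 if rects_intersect(rect, window)]
    return sorted(hits), visited


west = Bucket({"a": (0, 0, 2, 2), "b": (1, 3, 3, 5)})
east = Bucket({"c": (10, 0, 12, 2), "d": (11, 4, 13, 6)})
print(window_query([west, east], (0, 0, 1.5, 1.5)))
```

For the small window in the lower left corner only the west bucket is opened, so objects c and d are never compared with the window. A tree-structured index applies the same test recursively on every level.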

Bucket regions can be organized in two ways:

• Overlapping (Figure 1.8): the strict partitioning of space is abandoned and bucket regions may overlap. The R-tree is an example. The advantage of this approach is that each spatial object remains in a single bucket; the disadvantage is that overlapping bucket regions lead to multiple search paths.

• Clipping (Figure 1.9): bucket regions are disjoint, but a data rectangle may be cut into multiple pieces. The R+-tree [20] is an example. The advantage of this approach is less branching during search; the disadvantage is that a single spatial object may have multiple entries.

Figure 1.8: Overlapping Example [17]

Figure 1.9: Clipping Example
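The trade-off between the two approaches can be sketched for one data rectangle that straddles two bucket regions. This is a toy illustration with two fixed regions, not an R-tree or R+-tree implementation; the function names, the regions and the rectangle road are hypothetical.

```python
def clip(rect, region):
    """Return the part of rect inside region, or None if they share no area."""
    xmin, ymin = max(rect[0], region[0]), max(rect[1], region[1])
    xmax, ymax = min(rect[2], region[2]), min(rect[3], region[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def insert_clipping(regions, rect):
    """Clipping: regions stay disjoint, the rectangle is cut into pieces."""
    pieces = {}
    for name, region in regions.items():
        piece = clip(rect, region)
        if piece is not None:
            pieces[name] = piece
    return pieces


def insert_overlapping(regions, rect, target):
    """Overlapping: one entry only, the target region grows to cover the rectangle."""
    r = regions[target]
    grown = dict(regions)
    grown[target] = (min(r[0], rect[0]), min(r[1], rect[1]),
                     max(r[2], rect[2]), max(r[3], rect[3]))
    return grown


regions = {"left": (0, 0, 5, 10), "right": (5, 0, 10, 10)}
road = (4, 2, 7, 4)

print(insert_clipping(regions, road))
print(insert_overlapping(regions, road, "left"))
```

With clipping the road produces two entries, one per region. With overlapping it stays a single entry, but the left region grows to x = 7 and now overlaps the right region, so a search in that strip has to follow both buckets.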

1.9 Benefit of Geodatabases

Geodatabases help to create a central repository of geospatial data that can easily be accessed from mobile, web and desktop clients. They also provide functionality for applying different operations to the data and for setting up relationships among data.

Figure 1.10: Geodatabase Benefits

1.10 Different Areas for Geodatabases

Spatial databases have been an active area of research for more than two decades. The results of this research, e.g. spatial multidimensional indexes, are being used in a number of areas such as [21]:

• Geographic Information Systems (GIS).

• Computer Aided Design (CAD).

• Multimedia Information Systems (MMIS).

• Data Warehousing (DWH).

• NASA's Earth Observation System (EOS).

Spatial databases can also be used to store information about the physical world in fields such as geography, urban planning and astronomy.

1.11 Conclusion

This paper gives a general overview of Geodatabases: the need for them, their applications, the geo-relational storage concept, Geo-SQL query processing, the data structures behind Geodatabases and their benefits.

It also introduces basic Geodatabase terminology, such as how Geodatabases are linked to the underlying relational database management systems and which basic datasets they use to support geospatial information. In addition, it gives a few examples of the data types that Geodatabases support for storing spatial information, the functions that help to resolve geospatial SQL queries, and some sample SQL statements for creating tables, inserting values into a table and querying geo-specific information.

Bibliography

[1] http://www.esri.com/news/arcnews/winter0809articles/the-geodatabase.html.

[2] http://www.coastalwiki.org/coastalwiki/GIS/#_ref-Cox_1.

[3] http://www.esri.com/software/arcgis/geodatabase/index.html.

[4] http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?topicname=types_of_geodatabases.

[5] http://www.esri.com/software/arcgis/geodatabase/single-user-geodatabase.

[6] http://www.esri.com/software/arcgis/geodatabase/multi-user-geodatabase.

[7] http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Architecture_of_a_geodatabase.

[8] http://www.srnr.arizona.edu/rnr/rnr420/gdb_architecture.html.

[9] http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An_overview_of_the_Geodatabase.

[10] http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Table_basics.

[11] http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Feature_class_basics.

[12] http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Raster_basics.

[13] http://www.esri.com/software/arcgis/geodatabase/data-storage.

[14] http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Geodatabase_storage_in_relational_databases.

[15] http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Geodatabase_field_data_types.

[16] http://webhelp.esri.com/arcgisserver/9.3/dotNet/geodatabases/st_geometry.gif.

[17] http://workshops.opengeo.org/postgis-spatialdbtips/introduction.html.

[18] http://en.wikipedia.org/wiki/B-tree.

[19] http://en.wikipedia.org/wiki/R-tree.

[20] Philippe Rigaux, Michel Scholl, Agnès Voisard. Spatial Databases: With Application to GIS.

[21] S. Shekhar. Spatial databases: accomplishments and research needs. IEEE, 1999.

Typeset August 13, 2012