Enviromatics 2008 - Data capture and data storage

Data capture and data storage

Assoc. Prof. Dr. Aleksandar Markoski

Technical Faculty, Bitola

2008


Introduction

• Because of political, legal and administrative developments, both the amount and the quality of the environmental data being collected have increased considerably over the past 35 years. This development was influenced by improvements in the collection, management and utilisation of environmental information.

• Sensor networks have been installed and upgraded to monitor the quality of water, air, and soil. Satellite data and remote sensing data are used increasingly to obtain environmental information.

• Environmental data sets are large and complex. Their administration requires powerful processors and efficient storage technologies. The central question is how to handle and process these large, unstructured data sets in order to obtain efficient decision support.


Object taxonomies

• The term data capture denotes the process of deriving environmental data objects from environmental objects, where any real world object can be regarded as an environmental object. Living and non-living environmental objects will be grouped into a number of classes with typical attributes (e.g. taxonomy of species).

• A simpler taxonomy of the environment is given by the media soil, water and air. This taxonomy is commonly used by governmental environmental agencies. When the environment is understood as an integrated and complex system, this taxonomy leads to interdisciplinary tasks.

• Therefore, interdisciplinary task forces and environmental network organisations are becoming increasingly common.


General examples of object taxonomies

• atmosphere, which includes all objects above the surface of the earth;

• hydrosphere, which contains all water-related objects;

• lithosphere, which relates to soil, sediments and rocks;

• biosphere, which comprises all living matter;

• technosphere, which denotes man-made objects;

• sociosphere, which denotes social and economic interrelationships within human society.


Object taxonomy of ecology

Object taxonomy of ecology is given by:

• Autecology (interrelationships between species and their interrelations with the abiotic environment). Ecological processes take place at ecosystem scale;

• Synecology (interrelationships between communities and their environments, and between populations within a community). Ecological processes take place on a community scale;

• Demecology (population ecology) (interrelationships of individuals within a population, and interrelationships of populations with the biotic and abiotic environment). The processes take place on a population scale.


Mapping the environment

• The main question is which environmental objects should be monitored, and what data should be collected on them.

• There are many ways to obtain environmental data objects from environmental objects.

• The results are obtained as time series of measurements.

• The incoming raw data has to be subjected to some domain-specific and device-specific processing.

• Depending on the source of the data, this may include manipulations such as optical rectification, noise suppression, filtering, or contrast enhancement.
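As an illustration of the noise suppression step, the following is a minimal sketch of a moving median filter applied to a time series of measurements. The function name `median_filter`, the window size, and the sample readings are assumptions made for this example; the slides do not prescribe a particular filter.

```python
from statistics import median


def median_filter(series, window=3):
    """Replace each value by the median of its neighbourhood.

    Isolated spikes (e.g. a single faulty sensor reading) are suppressed,
    while slow trends are preserved. The window is truncated at both ends
    of the series.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(median(series[lo:hi]))
    return smoothed


# Made-up hourly readings with one obvious spike at index 3.
raw = [7.1, 7.2, 7.0, 19.5, 7.3, 7.2, 7.4]
print(median_filter(raw))
```

A median is used here rather than a mean because a single extreme value does not shift it; a device-specific pipeline might well choose a different filter.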


Raw data processing

• Raw data processing procedures include complex analytical techniques (laboratory methods), which may be used to survey toxicants in the environment.

• Aerial and satellite imagery is increasingly being used to monitor remote areas and to recognise long-term environmental loads. The raw imagery is usually processed and represented as a thematic map in order to visualise the load distribution.

• For forest and wildlife management, manual counts of animals and plants are often the most reliable source of data.

• For objects from the technosphere or from the sociosphere it is useful to study printed documentation in order to extract and condense the required environmental data objects.


Data validation procedures

Data validation procedures are given by:

1. Temporal validation: Recent measurements are compared to previous measurements and to some reference data obtained under similar conditions.

2. Geographic validation: Data that do not fit the usual patterns are subjected to cross-validation with measurements from other equipment in the same area that measures the same parameter.

3. Space-time validation: Data are compared with previous measurements from the same equipment.

4. Parameter validation: Data that do not fit the norm are forwarded to cross-validation with equipment that measures different parameters.
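The first two procedures can be sketched as a two-stage check: a new reading is first compared with previous measurements, and only if it does not fit is it cross-validated against other equipment in the same area. This is a minimal sketch under stated assumptions: the function names, the three-standard-deviation rule, and the tolerance of 0.5 are hypothetical choices for illustration, not values given in the slides.

```python
from statistics import mean, median, stdev


def fits_history(value, history, k=3.0):
    """Temporal check: is the value within k standard deviations of
    the previous measurements? (k is an assumed threshold.)"""
    if len(history) < 2:
        return True  # too little history to judge
    spread = max(stdev(history), 1e-9)  # guard against zero spread
    return abs(value - mean(history)) <= k * spread


def confirmed_by_neighbours(value, neighbour_values, tolerance):
    """Geographic check: does the value agree with the median reading of
    other equipment measuring the same parameter in the same area?"""
    if not neighbour_values:
        return False
    return abs(value - median(neighbour_values)) <= tolerance


def validate(value, history, neighbour_values, k=3.0, tolerance=0.5):
    """Return 'ok', 'confirmed' (unusual but corroborated) or 'suspect'."""
    if fits_history(value, history, k):
        return "ok"
    if confirmed_by_neighbours(value, neighbour_values, tolerance):
        return "confirmed"
    return "suspect"


# Made-up readings for illustration.
history = [12.1, 11.8, 12.4, 12.0, 11.9]
print(validate(12.2, history, []))
print(validate(15.0, history, [14.7, 15.2, 14.9]))
print(validate(15.0, history, [12.0, 12.2, 11.9]))
```

A reading flagged as suspect would, in the scheme above, be a candidate for parameter validation against equipment that measures different parameters.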


Advanced techniques

• For the processing and initial evaluation of raw environmental data objects, knowledge-based systems have considerable potential. With regard to knowledge representation, the requirements of environmental applications can be met by standard database and artificial intelligence techniques.

• Knowledge representation

• Data merging

• Bayesian probability theory and uncertain information

• Data storage and data security

• Data base management systems

• Databases

• Geographic Information Systems


Knowledge representation

• Static knowledge is stored in specialised file systems or in relational or object-oriented databases.

• Object-oriented databases give users the ability to group similar objects into classes and to connect those classes in an inheritance hierarchy.

• The objects in each class all share a set of attributes and possibly a number of methods, i.e., special procedures that take one or more objects of the class as arguments.

• The notion of inheritance is that attributes and methods that are defined for some class C higher up in the inheritance hierarchy are also valid for all classes in the sub-tree below C.
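The inheritance idea can be sketched with ordinary classes. The class names, attributes, and sample values below are hypothetical and chosen only for illustration; an object-oriented database would persist such objects, which this sketch does not attempt.

```python
class EnvironmentalObject:
    """Root class C: attributes and methods defined here are valid
    for every class in the sub-tree below it."""

    def __init__(self, name, location):
        self.name = name
        self.location = location

    def describe(self):
        return f"{self.name} at {self.location}"


class WaterBody(EnvironmentalObject):
    """Hydrosphere objects add a pH attribute and a method using it."""

    def __init__(self, name, location, ph):
        super().__init__(name, location)
        self.ph = ph

    def is_acidic(self):
        return self.ph < 7.0


class Lake(WaterBody):
    """A lake inherits from WaterBody and, transitively, from the root."""

    def __init__(self, name, location, ph, area_km2):
        super().__init__(name, location, ph)
        self.area_km2 = area_km2


lake = Lake("Example Lake", (41.0, 21.0), ph=8.2, area_km2=250.0)
print(lake.describe())    # method inherited from EnvironmentalObject
print(lake.is_acidic())   # method inherited from WaterBody
```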


Dynamic knowledge

• Dynamic knowledge in environmental information systems (EIS) is represented by IF-THEN rules. The concept of a rule-based knowledge system is to encode the available information on environmental objects by a possibly large number of relatively simple rules rather than by a complex procedural program.

• Each rule consists of an IF part and a THEN part. Starting from some initial state, the system checks which of the rules are currently applicable.

• If more than one rule can be applied, the system picks one of the rules according to a given priority scheme.
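The rule cycle described above can be sketched as a small loop: collect the applicable rules, pick the one with the highest priority, apply its THEN part, and repeat until no rule applies. The `Rule` structure, the numeric priorities, and the ozone example are assumptions made for this illustration and do not come from any particular EIS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    priority: int                       # higher value wins (assumed scheme)
    condition: Callable[[dict], bool]   # IF part
    action: Callable[[dict], None]      # THEN part, modifies the state


def run(rules, state, max_steps=100):
    """Fire rules until none is applicable; return the firing order."""
    fired = []
    for _ in range(max_steps):
        applicable = [r for r in rules if r.condition(state)]
        if not applicable:
            break
        rule = max(applicable, key=lambda r: r.priority)
        rule.action(state)
        fired.append(rule.name)
    return fired


rules = [
    Rule("log", 0,
         lambda s: not s["logged"],
         lambda s: s.update(logged=True)),
    Rule("notify", 1,
         lambda s: s["alert"] is not None and not s["notified"],
         lambda s: s.update(notified=True)),
    Rule("raise_alert", 2,
         lambda s: s["ozone"] > 180 and s["alert"] is None,
         lambda s: s.update(alert="high")),
]

state = {"ozone": 190, "alert": None, "notified": False, "logged": False}
print(run(rules, state))
print(state)
```

Each rule's action makes its own condition false, so the loop terminates; `max_steps` is only a safety limit for rule sets that do not have this property.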


Data merging

• Environmental data capture can be performed with techniques that are standard in the areas of statistical classification, database management, and artificial intelligence. When raw data are aggregated and evaluated, the input data are only one part of the information.

• Other circumstantial information is also taken into account in order to extract those environmental data objects that the user is interested in. Human experts always take such information into account when evaluating a sample.

• A promising strategy is to form a working hypothesis, and to support this hypothesis based on the information available. This has to include the possibility that pieces of input information may partly contradict each other.


Bayesian probability theory and uncertain information

• Environmental data are often inaccurate and uncertain. From such data, only statements of probability can be derived.

• Bayesian statistics requires that events are independent of each other.

• This assumption is rarely true in an environmental context.

• Uncertainties within raw data sets and data bases can be evaluated by the Dempster-Shafer approach. It is used widely for environmental data capture.


Dempster-Shafer approach

• The key idea is that one should logically separate the arguments for and against a given hypothesis H. This separation is managed by distinguishing between belief B(H) and plausibility Pl(H).

• Both concepts are represented by a number between zero and one.

• The belief represents the weight of the facts which support the working hypothesis.

• In contrast, the plausibility is one minus the weight of the facts speaking against H.


Degree of uncertainty

• Therefore Pl(H) = 1 - B(H*), where H* denotes the hypothesis that H is false.

• The belief in the counterhypothesis, B(H*), is sometimes referred to as the doubt D(H) with respect to the working hypothesis H.

• Therefore Pl(H) = 1 - D(H).

• In Bayesian probability theory belief and plausibility coincide: p(H) = B(H) = Pl(H) = 1 - p(H*).

• For Dempster-Shafer theory: B(H) ≤ Pl(H).

• The difference between B(H) and Pl(H) represents the degree of uncertainty U(H) about the hypothesis: U(H) = Pl(H) - B(H).
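The relations above can be checked with a short numerical sketch. It assumes the simplest setting, in which the evidence is split into a weight supporting H, a weight speaking against H, and an uncommitted remainder; the function name `belief_measures` and the example weights are hypothetical.

```python
def belief_measures(weight_for, weight_against):
    """Compute B(H), D(H), Pl(H) and U(H) from the weight of the facts
    supporting H and the weight of the facts speaking against H.

    Both weights lie in [0, 1] and must not sum to more than 1; whatever
    is left over is evidence committed to neither H nor H*.
    """
    if weight_for < 0 or weight_against < 0:
        raise ValueError("weights must be non-negative")
    if weight_for + weight_against > 1:
        raise ValueError("weights must not sum to more than 1")
    belief = weight_for                   # B(H)
    doubt = weight_against                # D(H) = B(H*)
    plausibility = 1.0 - doubt            # Pl(H) = 1 - D(H)
    uncertainty = plausibility - belief   # U(H) = Pl(H) - B(H)
    return {"B": belief, "D": doubt, "Pl": plausibility, "U": uncertainty}


# Half of the evidence supports H, a quarter speaks against it.
print(belief_measures(0.5, 0.25))

# Bayesian special case: no uncommitted evidence, so B(H) = Pl(H).
print(belief_measures(0.75, 0.25))
```

In the first call the belief is 0.5 and the plausibility 0.75, leaving an uncertainty of 0.25; in the second the two coincide and the uncertainty vanishes.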


Data storage and data security

• In former years, most environmentally relevant data were only available in analogue form. This concerns historical data records but also a large number of more recent thematic maps, images, and documents.

• Those historical data sets that are of relevance in current and future applications are rapidly being digitised. This process is supported by the continuous progress in scanning technologies.

• New data is almost always captured in some digital format, and it is mainly a question of logistics to make the data available.

• There are essentially two options for storing a given digital data set.


Data storage and data security (2)

1. A data base management system (DBMS) with a well-defined data model, typically relational, object-relational, or object-oriented;

2. An application-specific file system, as is still used by many geographic information systems (GIS).

• Environmental data place special demands on databases and data storage. In most cases, environmental data consist of three parts of information: matter- or substance-based information, time information, and space information.

• An environmental data base is characterised by the type of data stored in the data base, by the management system used for data storage and by the type of information available from the data base.


Data storage and data security (3)

• Operations between applications and inquiries are organised by interfaces. While in the past data storage and data processing were tightly coupled, more recent systems make a clear distinction between those tasks.

• This trend results from the general tendency towards open systems. As users demand convenient interfaces between different hardware and software tools across heterogeneous computer platforms, vendors have been forced to decompose their products along the lines of more narrowly defined functionalities.

• GIS in particular have been used for data storage, data querying and data visualisation of geographic information in a tightly integrated manner.


Data base management systems

• A DBMS serves as a complete pool of data languages, whose parts are:

– data definition language (DDL),

– query language (QL),

– data manipulation language (DML).

• Links to higher-level programming languages are provided. The Structured Query Language (SQL) is mainly used.

• All data operations within the data base are performed by transactions, which should allow multi-user operation. Most commercial DBMS are the result of application-oriented developments.
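The three language parts and the role of transactions can be sketched with SQLite through Python's standard `sqlite3` module. The table layout (substance, time, and space columns), the function names, and the sample rows are assumptions for this illustration; the slides do not name a specific DBMS or schema.

```python
import sqlite3


def create_schema(conn):
    # Data definition (DDL): one row per measurement, carrying
    # substance, time and space information.
    conn.execute(
        """
        CREATE TABLE measurement (
            id          INTEGER PRIMARY KEY,
            substance   TEXT NOT NULL,
            measured_at TEXT NOT NULL,
            latitude    REAL NOT NULL,
            longitude   REAL NOT NULL,
            value       REAL NOT NULL
        )
        """
    )


def insert_measurements(conn, rows):
    # Data manipulation (DML) inside a transaction: "with conn" commits
    # if the block succeeds and rolls back if it raises.
    with conn:
        conn.executemany(
            "INSERT INTO measurement "
            "(substance, measured_at, latitude, longitude, value) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )


def average_by_substance(conn):
    # Query (QL): mean value per substance.
    cursor = conn.execute(
        "SELECT substance, AVG(value) FROM measurement "
        "GROUP BY substance ORDER BY substance"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Made-up sample rows for illustration only.
insert_measurements(conn, [
    ("NO2", "2008-03-01T10:00", 41.03, 21.33, 40.0),
    ("NO2", "2008-03-01T11:00", 41.03, 21.33, 60.0),
    ("O3", "2008-03-01T10:00", 41.03, 21.33, 80.0),
])
print(average_by_substance(conn))
```

Keeping the schema, the inserts, and the query in separate functions mirrors the DDL, DML, and QL split described above.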


Geographic Information Systems

• Geographic information systems (GIS) are essential environmental informatics tools for the management of the environment, including decision support and visualisation of large amounts of environmental data. The original idea for GIS was to computerise the metaphor of a thematic map.

• In general, GIS are computer-based tools to capture, manipulate, process, and display spatial or georeferenced data. The spatial data is still mostly held in proprietary file systems. Therefore, most of the underlying data models are layer-based. The information is encoded in a number of thematic maps, such as vegetation maps, soil maps, or topographic maps.

• With regard to geometry, each map corresponds to a partition of the universe into disjoint polygons. Each polygon represents a region that is sufficiently homogeneous with respect to the theme of the map. Maps may be enhanced by lines and points to represent specific features, such as roads or cities.
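The layer model can be sketched as a list of (theme, polygon) pairs per map, with a point query that reports which polygon of the partition contains a given location. This is a minimal sketch: the ray-casting test, the function names, and the two-region soil map are assumptions for illustration, and real GIS use far more elaborate geometry handling.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: count how often a horizontal ray starting at
    the point crosses the polygon boundary; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def theme_at(point, layer):
    """Return the theme of the polygon containing the point, or None
    if the point lies outside the mapped area."""
    for theme, polygon in layer:
        if point_in_polygon(point, polygon):
            return theme
    return None


# A made-up soil map: two disjoint rectangles covering a 10 x 10 area.
soil_map = [
    ("clay", [(0, 0), (5, 0), (5, 10), (0, 10)]),
    ("sand", [(5, 0), (10, 0), (10, 10), (5, 10)]),
]

print(theme_at((2, 3), soil_map))
print(theme_at((7, 3), soil_map))
print(theme_at((20, 3), soil_map))
```

Querying several such layers at the same point (soil, vegetation, topography) corresponds to overlaying the thematic maps.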


Questions?