The History of Database Management & Database Models


Exploration of the origins and development of databases.


Task

What are data, information, knowledge and wisdom? How do they differ? Give an example of a context where you would apply each concept.

Working Example: The UK Census

Data, information & knowledge

• Data: information without context

• Information: data within a context

• Knowledge: information with added value

Section 1: Database Management Development

6 Generations of Data Management

• G1: 1900-1955
• G2: 1955-1970
• G3: 1965-1980
• G4: 1980-1995
• G5: 1995-2012
• G6: 2012-?

Zeroth Generation: Record Managers 4000 BC-1900

• The first known writing describes the royal assets and taxes in Sumeria.

• The next six thousand years saw a technological evolution from clay tablets to papyrus to parchment and then to paper.

• There were many innovations in data representation: phonetic alphabets, novels, ledgers, libraries, paper and the printing press.

Task

Create a timeline which shows the dates of origin of the:

1. International Phonetic Alphabet (IPA)
2. First novel in English
3. First library
4. Printing press
5. First computational device

Time: 10 mins

Answers

• c. 650 BC: The Royal Library of Ashurbanipal at Nineveh
• 1439: Johannes Gutenberg invented the printing press
• 1470: The first novel in English, Le Morte D'Arthur, was written by Thomas Malory
• 1801: Jacquard invented his loom, considered to be the first computational device
• 1886: The International Phonetic Alphabet was invented

First Generation: Record Managers 1900-1955

• The first practical automated information processing began circa 1800 with the Jacquard Loom that produced fabric from patterns represented by punched cards.

• Each data record was represented as binary patterns on a punched card

• By 1955, many companies had entire floors dedicated to storing punched cards, much as the Sumerian archives had stored clay tablets.

Second Generation: Programmed Unit Record Equipment 1955-1970

• Stored-program electronic computers had been developed in the 1940s for scientific and numerical calculations. At about the same time, Univac had developed magnetic tape storage.

• Software was a key component of this new technology: it made the machines relatively easy to program and use. It was much easier to sort, analyse, and process the data with languages like COBOL.

• The software of the day provided a file-oriented record-processing model: typical programs sequentially read several input files and produced new files as output.

Third Generation: Online Network Databases 1965-1980

• Teleprocessing monitors provided the specialized software to multiplex thousands of terminals onto the modest server computers of the day

• Online transaction processing augmented the batch transaction processing that performed background reporting tasks.

• Simple indexed-sequential record organizations soon evolved to a more powerful set-oriented record model. Applications often want to relate two or more records.

• The end product was, in essence, a network data model

Fourth Generation: Relational Databases 1980-1995

• Despite the success of the network data model, many software designers felt that a navigational programming interface was too low-level

• The idea of the relational model is to represent both entities and relationships in a uniform way.

• The relational model had some unexpected benefits beyond programmer productivity and ease-of-use. The relational model was well suited to client-server computing, to parallel processing, and to graphical user interfaces.

Fifth Generation: Multimedia Databases 1995-?

• Relational systems offered huge improvements in ease-of-use, graphical interfaces, client-server applications, distributed databases, parallel data search, and data mining. Nonetheless, in about 1985, the research community began to look beyond the relational model.

• People coming from the object-oriented programming community saw the problem clearly: datatype design requires a good data model and a unification of procedures and data.

Sixth Generation: The Future

• Defining the data models for new types and integrating them with the traditional database systems.

• Scaling databases in size (to petabytes), space (distributed), and diversity (heterogeneous).

• Automatically discovering data trends, patterns, and anomalies (data mining, data analysis).

Section 2: Database Models

What is a database model?

A database model is the theoretical foundation of a database: it fundamentally determines the manner in which data can be stored, organised, and manipulated in a database system, and thereby defines the infrastructure offered by a particular database system.

Flat File Model

The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
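The flat model can be illustrated with a minimal sketch in Python; the column names and sample rows here are invented for illustration. A table is just a list of rows, each row a tuple whose positions line up with the columns.

```python
# A flat (table) model: one two-dimensional array of data elements.
# Every row has the same columns; every column holds similar values.
# Column names and sample data are hypothetical.
columns = ("id", "name", "age")

rows = [
    (1, "Alice", 34),
    (2, "Bob", 28),
    (3, "Carol", 41),
]

# All members of a column are similar values: collect the "age" column.
age_index = columns.index("age")
ages = [row[age_index] for row in rows]

print(ages)     # → [34, 28, 41]  (the age column)
print(rows[1])  # → (2, 'Bob', 28)  (one related row: all facts about Bob)
```

There is no linking between tables here; that limitation is what the later models address.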

Hierarchical Model

In a hierarchical model, data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
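The hierarchical idea can be sketched in a few lines of Python; the record keys and sort values are invented for illustration. Each record carries exactly one upward (parent) link, and a sort field orders the records within each same-level list.

```python
# Hierarchical model sketch: each record has a single upward link to its
# parent, plus a sort field ordering siblings. Data is hypothetical.
records = [
    # (key, parent_key, sort_field)
    ("company", None,      0),
    ("sales",   "company", 2),
    ("it",      "company", 1),
    ("alice",   "sales",   1),
    ("bob",     "it",      1),
]

def children(parent_key):
    """Return the same-level list under one parent, in sort-field order."""
    kids = [r for r in records if r[1] == parent_key]
    return sorted(kids, key=lambda r: r[2])

print([k for k, _, _ in children("company")])  # → ['it', 'sales']
```

Note that a record can have only one parent, which is exactly the restriction the network model relaxes on the next slide.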

Network Model

The network model (defined by the CODASYL specification) organises data using two fundamental concepts, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.

CODASYL stands for the COnference on DAta SYstems Languages; its DataBase Task Group defined this data model. The CODASYL group originally formed in 1959 to create the standards for COBOL. After successfully developing the COBOL specifications, the group's charter was extended to create a set of database standards.
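The two CODASYL concepts can be sketched in Python; the set names and records below are invented for illustration. Each set has one owner and many members, and the same record may appear as a member in several different sets.

```python
# Network model sketch: records plus named sets, each set being a
# one-to-many link from one owner record to many member records.
# Record and set names are hypothetical.
records = {
    "acme":   {"type": "customer"},
    "o1":     {"type": "order"},
    "o2":     {"type": "order"},
    "widget": {"type": "product"},
}

# set name -> owner record -> list of member records
sets = {
    "placed_by": {"acme":   ["o1", "o2"]},  # a customer owns its orders
    "contains":  {"widget": ["o1", "o2"]},  # a product owns orders citing it
}

# One record ("o1") is a member in any number of sets:
memberships = [name for name, owners in sets.items()
               for members in owners.values() if "o1" in members]
print(memberships)  # → ['placed_by', 'contains']
```

Queries against this structure are navigational: a program follows owner/member links record by record, which is the low-level interface the relational model later replaced.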

Relational Model

The relational model was introduced by E.F. Codd in 1970 as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory.

Strengths of the Relational Model

• The data model and access to it are simple to understand and use, even for those who are not experienced programmers.
• The model of data represented in tables is simple.
• There are straightforward database design procedures.
• Efficient implementation techniques are well known and widely used.
• Standards exist for query languages, such as SQL.

Object-Oriented Model

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program.
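A minimal sketch of the idea in Python, with an invented class and a toy in-memory store: the "database" keeps objects of the same types the application already uses, so no translation between rows and the program's type system is needed.

```python
# Object database sketch: the store holds application objects directly,
# so database and program share one type system. Names are hypothetical.
class Customer:
    def __init__(self, name, orders):
        self.name = name
        self.orders = orders          # plain object state, no row mapping

    def total_spent(self):
        return sum(self.orders)       # behaviour lives with the data

store = {}                            # a toy in-memory "object store"

def persist(oid, obj):
    store[oid] = obj                  # the object is stored as-is

persist("c1", Customer("Alice", [25.0, 40.0]))
print(store["c1"].total_spent())      # → 65.0
```

Real object databases add persistence, transactions and queries on top of this idea, but the unification of procedures and data mentioned on the fifth-generation slide is already visible here.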

Section 3: Data Warehousing

What is a Data Warehouse?

A data warehouse is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the data warehouse for reporting.

A data-processing database? Wholesaling Data?

Benefits of a Data Warehouse

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

• Maintain data history, even if the source transaction systems do not.
• Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
• Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
• Present the organization's information consistently.
• Provide a single common data model for all data of interest regardless of the data's source.
• Restructure the data so that it makes sense to the business users.
• Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
• Add value to operational business applications, notably customer relationship management (CRM) systems.

Dimensional v Normalised

There are two leading approaches to storing data in a data warehouse: the dimensional approach and the normalised approach.

• The dimensional approach, whose supporters are referred to as "Kimballites", follows Ralph Kimball's view that the data warehouse should be modelled using a Dimensional Model (DM).
• The normalised approach, also called the 3NF model, whose supporters are referred to as "Inmonites", follows Bill Inmon's view that the data warehouse should be modelled using an Entity-Relationship (ER) model.
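The dimensional (Kimball-style) approach can be sketched with sqlite3: a central fact table of measures surrounded by dimension tables of descriptive context. The sales schema below is a hypothetical example, not a prescribed design.

```python
import sqlite3

# Star schema sketch: one fact table (measures + foreign keys) joined to
# dimension tables (descriptive context). Tables and rows are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (date_key INTEGER REFERENCES dim_date(date_key),
                              product_key INTEGER REFERENCES dim_product(product_key),
                              amount REAL);
    INSERT INTO dim_date    VALUES (1, 2023), (2, 2024);
    INSERT INTO dim_product VALUES (1, 'Cheese');
    INSERT INTO fact_sales  VALUES (1, 1, 10.0), (2, 1, 30.0), (2, 1, 5.0);
""")

# Analytic queries slice the facts by any dimension attribute.
rows = db.execute("""
    SELECT d.year, SUM(f.amount)
    FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.year ORDER BY d.year
""").fetchall()
print(rows)  # → [(2023, 10.0), (2024, 35.0)]
```

An Inmon-style warehouse would instead normalise this data into 3NF entity tables and build dimensional marts on top of them.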

Section 4: Data Mining

What is Data Mining?

Data mining, the analysis step of the Knowledge Discovery in Databases (KDD) process, is a relatively young and interdisciplinary field of computer science: the process of discovering new patterns from large data sets using methods at the intersection of artificial intelligence, machine learning, statistics and database systems.

Classes of Task: Examples 1-3

Data mining involves common classes of tasks, for example:

1. Classification: the task of generalising known structure to apply to new data, e.g. an email program might attempt to classify an email as legitimate or spam.

2. Clustering: the task of discovering groups and structures in the data that are in some way similar, without using known structures in the data, e.g. market basket analysis: Age x Income x Type of Cheese.

3. Summarisation: providing a more compact representation of the data set, including visualisation and report generation, e.g. charts and graphs.
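The clustering task can be illustrated with a tiny one-dimensional k-means sketch in pure Python; the ages and the choice of two clusters are invented for illustration, and real market-basket work would use many dimensions and a proper library.

```python
# 1-D k-means sketch of the clustering task: group values by similarity
# without any known labels. Data and number of clusters are hypothetical.
ages = [19, 21, 22, 45, 48, 51]
centres = [19.0, 51.0]                 # simple initial guesses

for _ in range(10):                    # a few refinement passes
    clusters = [[], []]
    for a in ages:                     # assign each point to nearest centre
        nearest = min(range(2), key=lambda i: abs(a - centres[i]))
        clusters[nearest].append(a)
    centres = [sum(c) / len(c) for c in clusters]  # recompute centres

print(clusters)  # → [[19, 21, 22], [45, 48, 51]]
```

No known structure was supplied: the two age groups emerge purely from similarity, which is what distinguishes clustering from classification.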