28
1 Information Retrieval and Use Data Analysis & Data Modeling, Relational Data Analysis and Logical Data Modeling Geoff Leese September 2009

1 Information Retrieval and Use Data Analysis & Data Modeling, Relational Data Analysis and Logical Data Modeling Geoff Leese September 2009

Embed Size (px)

Citation preview

1

Information Retrieval and Use

Data Analysis & Data Modeling, Relational Data Analysis and

Logical Data Modeling

Geoff Leese September 2009

2

Relational Data Analysis Captures the detailed knowledge of the

meaning of the data. Ensures that the data is logically easy to

maintain and extend.Data inter-dependencies have been

identifiedAmbiguities have been resolved.Eliminate unnecessary duplication of data.Forms the data into optimum groups.Validates the Logical Data Model (LDM).

3

Logical Data Modelling

Basic Rules for converting 3NF to a LDM Create an entity type for each data relation Mark qualifying foreign keys Check compound key relations Make foreign/primary key relations

4

Guidelines for logical modelling

Entity type names are singular nouns, descriptive, concise and organisation specific.

Attribute names are unique descriptive nouns of standard format.

Relationship names are descriptive, precise verb phrases.

5

Simple Master-Detail relationships

Where a single foreign key of a relation corresponds to the primary key of another relation

See next slide for example.

6

Simple Master-Detail relationships

Shows SINGLE primary key at MASTER entity (Organisation) connected to SINGLE foreign key at DETAIL entity (Contact people)

7

Multiple level Master-Detail Relationships

Example: five entities

8

Identifying Recursive (Unary) Relationships

Is a relation where a foreign key references the same relation.

Example: Employee Employee-number

Employee-name

Employee-manager-number

Employee

9

Relationships: Student/Module

At this point we need to identify the data items that describe or identify each entity

Entity attributes are also known as data items

What are the data items associated with the following LDS diagram?

TakesStudent Module

Is taken by

10

The Student

Entity Type Attribute Name Attribute Student Student Name Jones

Street Address Leek Road

Town Stoke-on-Trent

Post Code ST4 2DE

Telephone 294303

TakesStudent Module

Is taken by

11

The Module

Entity Type Attribute Type AttributeModule Module Number CM5111-1

Module Name SSAT Module Leader A LecturerLevel 1Cats Points 10

TakesStudent Module

Is taken by student

12

The Data Items

TakesStudent Module

Is taken by student

Module NumberModule NameModule LeaderLevelCats Points

Student NameStreet AddressTownPost CodeST4 2DETelephone

13

Identifying occurrences of entities

Each occurrence of an entity must be uniquely identified in some way

Imagine the British Gas data base that used only surnames to identify account holders

There would be 100,000 account holders called Jones in this country

Even if we used the given names there would still be considerable duplication

It would be impossible to find the right account by name alone

14

Adding a Primary Key

TakesStudent Module

Is taken by student

Module NumberModule NameModule LeaderLevelCats Points

Student NumberStudent NameStreet AddressTownPost CodeST4 2DETelephone

Primary key added

15

Relationships: Getting it right

TakesStudent Module

Is taken by student

TakesStudent Module

Is taken by student

Is this right?

The real situation is surely

16

Putting it right: Intersection entity

Student Number Module number

Student Module

Module NumberModule NameModule LeaderLevelCats Points

Student Number Student NameStreet AddressTownPost CodeST4 2DETelephone

Stud/Mod

We need a link entity - less ambiguity

17

Normalisation - steps

Start with a set of un-normalised tablesEntity/attribute list

Step 1 - remove ambiguity and repeating data

Step 2 - remove shared data

18

Normalisation - step 1 Break down ALL attributes into smallest

meaningful parts EG student name becomes student surname,

student firstname, student title

Remove REPEATED information to form a new table EG a course may be composed of MANY

modules (but assume that each module is only on one course!) - so form a MODULE table

19

Normalisation - step 2

Remove SHARED data to form new tablesEG modules may share tutors - so form

a TUTORS table.

20

Normalisation

FIRST NORMAL FORM - a relation (table) is in 1NF if it contains atomic values and all repeating groups have been removed

21

Normalisation

SECOND NORMAL FORM - a relation(table) is in 2NF if it is in 1NF and every non-key attribute is fully dependent on the primary key

22

Normalisation

THIRD NORMAL FORM - a relation(table) is in 3NF if it is in 2NF and every non-key attribute is not dependent on any other non-key attribute

23

Relational Data Analysis Form

Validates the LDM against the relations. Consists of:

Unnormalised Form– attributes

First Normal Form (1NF) Second Normal Form (2NF) Third Normal Form (3NF)

– Relations– Attributes

24

RDA Form

Name Date

UNF 1NF 2NF 3NF Resultrelation attributeattributes

25

Data Dictionary

lists, for every field in every tableTablenameFieldnameField TypeField size (if variable)Decimal places (if applicable)Description (if required)Other significant field properties

26

Data Dictionary example

Tablename Fieldname Fieldtype Length DecPlaces

Description

Students Student ID Counter N/A N/AStudents Student firstname Text 20 N/A Full firstname(capitalised)Students Student other

initialsText 5 N/A Other initials, Capitals,

Space separatedStudents Student Surname Text 25 N/A Surname, CapitalisedStudents Fee paid Number

(currency)N/A 2 Fee paid

Students Date of Birth Date/Time N/A N/A Input mask Short date,format Medium Date

Students Full Time? Yes/No N/A N/AEtc

27

The domain Is the “set” of items, and the definition

thereof to which an attribute belongs Define domain once, saves time when

defining attributes belonging to it. For example - Date of Birth, Course Start

Date and Enrolment Date all belong to the DATE domain - data type is date/time, format dd/mm/yyyy, non-unique, non-null.

28

Further reading

Rolland chapters 3 and 4 Hoffer chapters 10 and 12 Kendall & Kendall chapter 17