1
Information Retrieval and Use
Data Analysis & Data Modeling, Relational Data Analysis and
Logical Data Modeling
Geoff Leese September 2009
2
Relational Data Analysis
Captures the detailed knowledge of the meaning of the data.
Ensures that the data is logically easy to maintain and extend.
Data inter-dependencies have been identified.
Ambiguities have been resolved.
Eliminates unnecessary duplication of data.
Forms the data into optimum groups.
Validates the Logical Data Model (LDM).
3
Logical Data Modelling
Basic Rules for converting 3NF to a LDM
Create an entity type for each data relation
Mark qualifying foreign keys
Check compound key relations
Make foreign/primary key relations
4
Guidelines for logical modelling
Entity type names are singular nouns, descriptive, concise and organisation specific.
Attribute names are unique descriptive nouns of standard format.
Relationship names are descriptive, precise verb phrases.
5
Simple Master-Detail relationships
Where a single foreign key of a relation corresponds to the primary key of another relation
See next slide for example.
6
Simple Master-Detail relationships
Shows SINGLE primary key at MASTER entity (Organisation) connected to SINGLE foreign key at DETAIL entity (Contact people)
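A minimal sketch of the master-detail pattern, using Python's built-in sqlite3 module. The table names, column names and sample rows are assumptions made for illustration; only the Organisation/Contact people pairing comes from the slide.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# MASTER entity: a single primary key.
conn.execute(
    "CREATE TABLE organisation (org_id INTEGER PRIMARY KEY, org_name TEXT NOT NULL)"
)
# DETAIL entity: a single foreign key that matches the master's primary key.
conn.execute(
    "CREATE TABLE contact_person ("
    " contact_id INTEGER PRIMARY KEY,"
    " contact_name TEXT NOT NULL,"
    " org_id INTEGER NOT NULL REFERENCES organisation(org_id))"
)

conn.execute("INSERT INTO organisation VALUES (1, 'Example Org')")
conn.executemany(
    "INSERT INTO contact_person VALUES (?, ?, ?)",
    [(1, "A Contact", 1), (2, "B Contact", 1)],
)

# One master row can own many detail rows.
contacts = conn.execute(
    "SELECT contact_name FROM contact_person WHERE org_id = 1 ORDER BY contact_id"
).fetchall()
print(contacts)

# A detail row that points at a non-existent master is rejected.
try:
    conn.execute("INSERT INTO contact_person VALUES (3, 'Orphan', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The foreign key sits on the detail side because each contact person belongs to one organisation, while an organisation may have many contact people.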
8
Identifying Recursive (Unary) Relationships
A relation where a foreign key references the same relation.
Example: Employee
Employee-number
Employee-name
Employee-manager-number
[Diagram: Employee entity with a relationship back to itself]
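The Employee example can be sketched in Python as follows. The sample rows and the management_chain helper are invented for illustration; the point is only that Employee-manager-number holds another Employee-number from the same relation.

```python
# Each Employee row carries a foreign key (manager_number) that points back
# into the same Employee relation. The sample rows are invented for illustration.
employees = {
    1: {"name": "Director", "manager_number": None},  # top of the hierarchy
    2: {"name": "Team Lead", "manager_number": 1},
    3: {"name": "Analyst", "manager_number": 2},
}


def management_chain(employee_number):
    """Follow the self-referencing foreign key upwards and return the manager names."""
    chain = []
    current = employees[employee_number]["manager_number"]
    while current is not None:
        chain.append(employees[current]["name"])
        current = employees[current]["manager_number"]
    return chain


print(management_chain(3))  # ['Team Lead', 'Director']
```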
9
Relationships: Student/Module
At this point we need to identify the data items that describe or identify each entity
Entity attributes are also known as data items
What are the data items associated with the following LDS diagram?
[Diagram: Student "Takes" Module; Module "Is taken by" Student]
10
The Student
Entity Type   Attribute Name   Attribute
Student       Student Name     Jones
              Street Address   Leek Road
              Town             Stoke-on-Trent
              Post Code        ST4 2DE
              Telephone        294303
[Diagram: Student "Takes" Module; Module "Is taken by" Student]
11
The Module
Entity Type   Attribute Name   Attribute
Module        Module Number    CM5111-1
              Module Name      SSAT
              Module Leader    A Lecturer
              Level            1
              Cats Points      10
[Diagram: Student "Takes" Module; Module "Is taken by student"]
12
The Data Items
[Diagram: Student "Takes" Module; Module "Is taken by student"]
Module: Module Number, Module Name, Module Leader, Level, Cats Points
Student: Student Name, Street Address, Town, Post Code, Telephone
13
Identifying occurrences of entities
Each occurrence of an entity must be uniquely identified in some way
Imagine a British Gas database that used only surnames to identify account holders
There would be 100,000 account holders called Jones in this country
Even if we used the given names there would still be considerable duplication
It would be impossible to find the right account by name alone
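A small sketch of the problem described above. The accounts and the account_number field are invented for illustration; the lookup by name returns more than one row, while the lookup by a unique identifier returns exactly one.

```python
# Invented sample accounts; account_number is a hypothetical unique identifier.
accounts = [
    {"account_number": 1001, "surname": "Jones", "given_name": "Alice"},
    {"account_number": 1002, "surname": "Jones", "given_name": "Alice"},
    {"account_number": 1003, "surname": "Smith", "given_name": "Bob"},
]

# Searching by surname and given name cannot single out one account.
by_name = [
    a for a in accounts
    if (a["surname"], a["given_name"]) == ("Jones", "Alice")
]
# Searching by the unique identifier can.
by_number = [a for a in accounts if a["account_number"] == 1002]

print(len(by_name))    # 2
print(len(by_number))  # 1
```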
14
Adding a Primary Key
[Diagram: Student "Takes" Module; Module "Is taken by student"]
Module: Module Number, Module Name, Module Leader, Level, Cats Points
Student: Student Number, Student Name, Street Address, Town, Post Code, Telephone
Primary key added: Student Number
15
Relationships: Getting it right
[Diagram: Student "Takes" Module; Module "Is taken by student"]
Is this right?
The real situation is surely:
[Diagram: Student "Takes" Module; Module "Is taken by student"]
16
Putting it right: Intersection entity
Student: Student Number, Student Name, Street Address, Town, Post Code, Telephone
Stud/Mod: Student Number, Module Number
Module: Module Number, Module Name, Module Leader, Level, Cats Points
[Diagram: Student linked to Module through the Stud/Mod entity]
We need a link entity - less ambiguity
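A minimal sketch of the link entity using Python's built-in sqlite3 module, assuming the Student/Module relationship is many-to-many. Table and column names are illustrative; the second student and the second module are invented placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE student (student_number INTEGER PRIMARY KEY, student_name TEXT);
CREATE TABLE module  (module_number TEXT PRIMARY KEY, module_name TEXT);
-- Link entity: its compound primary key is made of the two foreign keys.
CREATE TABLE stud_mod (
    student_number INTEGER REFERENCES student(student_number),
    module_number  TEXT    REFERENCES module(module_number),
    PRIMARY KEY (student_number, module_number)
);
""")

# "Smith" and "XX0000-0" are invented placeholders, not taken from the slides.
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Jones"), (2, "Smith")])
conn.executemany(
    "INSERT INTO module VALUES (?, ?)",
    [("CM5111-1", "SSAT"), ("XX0000-0", "Placeholder module")],
)
conn.executemany(
    "INSERT INTO stud_mod VALUES (?, ?)",
    [(1, "CM5111-1"), (1, "XX0000-0"), (2, "CM5111-1")],
)

# One student takes many modules ...
modules_for_jones = [row[0] for row in conn.execute(
    "SELECT module_number FROM stud_mod WHERE student_number = 1 ORDER BY module_number"
)]
# ... and one module is taken by many students.
students_on_ssat = [row[0] for row in conn.execute(
    "SELECT student_number FROM stud_mod WHERE module_number = 'CM5111-1' "
    "ORDER BY student_number"
)]
print(modules_for_jones)
print(students_on_ssat)
```

Each row of stud_mod records one student taking one module, so the many-to-many relationship becomes two master-detail relationships.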
17
Normalisation - steps
Start with a set of un-normalised tables
Entity/attribute list
Step 1 - remove ambiguity and repeating data
Step 2 - remove shared data
18
Normalisation - step 1
Break down ALL attributes into smallest meaningful parts
EG student name becomes student surname, student firstname, student title
Remove REPEATED information to form a new table
EG a course may be composed of MANY modules (but assume that each module is only on one course!) - so form a MODULE table
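Step 1 can be sketched in Python as below. The sample course, module names and student name are invented, and the name split assumes a simple "title firstname surname" layout, which real names often do not follow.

```python
# Un-normalised course record (invented sample values): the modules repeat
# inside the course row.
unnormalised = {
    "course_code": "C1",
    "course_name": "Computing",
    "modules": [("M1", "Databases"), ("M2", "Programming")],  # repeating group
}
student_name = "Mr Alex Jones"

# Break the name into its smallest meaningful parts.
# Assumes exactly three space-separated parts: title, firstname, surname.
title, firstname, surname = student_name.split(" ")
student = {
    "student_title": title,
    "student_firstname": firstname,
    "student_surname": surname,
}

# Move the repeating group into its own MODULE table. Each module row keeps
# the course key, since each module is assumed to be on only one course.
course_table = [
    {"course_code": unnormalised["course_code"],
     "course_name": unnormalised["course_name"]}
]
module_table = [
    {"module_number": number, "module_name": name,
     "course_code": unnormalised["course_code"]}
    for number, name in unnormalised["modules"]
]

print(student)
print(module_table)
```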
19
Normalisation - step 2
Remove SHARED data to form new tables
EG modules may share tutors - so form a TUTORS table.
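Step 2 can be sketched in the same way. The module rows, tutor names, rooms and the tutor_id key are all invented for illustration; the shared tutor details are stored once in a TUTORS table and each module keeps only a foreign key.

```python
# Module rows that repeat the same tutor details (invented sample values).
modules = [
    {"module_number": "M1", "tutor_name": "A Lecturer", "tutor_room": "R1"},
    {"module_number": "M2", "tutor_name": "A Lecturer", "tutor_room": "R1"},
    {"module_number": "M3", "tutor_name": "B Lecturer", "tutor_room": "R2"},
]

tutors = {}        # maps (tutor_name, tutor_room) -> tutor_id
module_table = []  # modules with the shared data replaced by a foreign key
for row in modules:
    details = (row["tutor_name"], row["tutor_room"])
    # Allocate a new tutor_id only the first time these details are seen.
    tutor_id = tutors.setdefault(details, len(tutors) + 1)
    module_table.append({"module_number": row["module_number"], "tutor_id": tutor_id})

tutor_table = [
    {"tutor_id": tutor_id, "tutor_name": name, "tutor_room": room}
    for (name, room), tutor_id in tutors.items()
]

print(tutor_table)
print(module_table)
```

A change to a tutor's details now needs one update in the TUTORS table instead of one update per module.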
20
Normalisation
FIRST NORMAL FORM - a relation (table) is in 1NF if it contains atomic values and all repeating groups have been removed
21
Normalisation
SECOND NORMAL FORM - a relation (table) is in 2NF if it is in 1NF and every non-key attribute is fully dependent on the whole primary key, not just on part of a compound key
22
Normalisation
THIRD NORMAL FORM - a relation (table) is in 3NF if it is in 2NF and no non-key attribute is dependent on any other non-key attribute
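The 2NF and 3NF definitions can be explored with a small dependency check. The determines helper and the sample rows are assumptions made for illustration. Sample data can only show that a dependency does not hold; a result of True suggests a dependency worth checking against the real business rules.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always go with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in lhs)
        value = tuple(row[column] for column in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Invented sample relation with compound key (student_number, module_number).
results = [
    {"student_number": 1, "module_number": "M1", "module_name": "Databases",
     "tutor_id": 7, "tutor_name": "A Lecturer", "grade": "A"},
    {"student_number": 2, "module_number": "M1", "module_name": "Databases",
     "tutor_id": 7, "tutor_name": "A Lecturer", "grade": "B"},
    {"student_number": 1, "module_number": "M2", "module_name": "Programming",
     "tutor_id": 7, "tutor_name": "A Lecturer", "grade": "C"},
]

# module_name appears to depend on only part of the key: a 2NF problem.
partial = determines(results, ["module_number"], ["module_name"])
# grade needs the whole key, so it does not depend on module_number alone.
grade_on_part = determines(results, ["module_number"], ["grade"])
# tutor_name appears to depend on the non-key attribute tutor_id: a 3NF problem.
transitive = determines(results, ["tutor_id"], ["tutor_name"])

print(partial, grade_on_part, transitive)  # True False True
```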
23
Relational Data Analysis Form
Validates the LDM against the relations.
Consists of:
Unnormalised Form
- Attributes
First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF)
- Relations
- Attributes
25
Data Dictionary
Lists, for every field in every table:
Tablename
Fieldname
Field Type
Field size (if variable)
Decimal places (if applicable)
Description (if required)
Other significant field properties
26
Data Dictionary example
Tablename  Fieldname               Fieldtype          Length  DecPlaces  Description
Students   Student ID              Counter            N/A     N/A
Students   Student firstname       Text               20      N/A        Full firstname (capitalised)
Students   Student other initials  Text               5       N/A        Other initials, Capitals, Space separated
Students   Student Surname         Text               25      N/A        Surname, Capitalised
Students   Fee paid                Number (currency)  N/A     2          Fee paid
Students   Date of Birth           Date/Time          N/A     N/A        Input mask Short date, format Medium Date
Students   Full Time?              Yes/No             N/A     N/A
Etc
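A data dictionary like the one above can also be held as structured data. This is a sketch only: the DictionaryEntry class, its field names and the fields_for helper are assumptions, and None stands in for the N/A entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictionaryEntry:
    """One row of the data dictionary; None stands in for N/A."""
    tablename: str
    fieldname: str
    fieldtype: str
    length: Optional[int] = None
    dec_places: Optional[int] = None
    description: str = ""


# A few of the rows from the example table.
data_dictionary = [
    DictionaryEntry("Students", "Student ID", "Counter"),
    DictionaryEntry("Students", "Student firstname", "Text", length=20,
                    description="Full firstname (capitalised)"),
    DictionaryEntry("Students", "Student Surname", "Text", length=25,
                    description="Surname, Capitalised"),
    DictionaryEntry("Students", "Fee paid", "Number (currency)", dec_places=2,
                    description="Fee paid"),
]


def fields_for(tablename):
    """List the field names recorded for one table."""
    return [entry.fieldname for entry in data_dictionary if entry.tablename == tablename]


print(fields_for("Students"))
```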
27
The domain
Is the “set” of items, and the definition thereof, to which an attribute belongs
Define domain once, saves time when defining attributes belonging to it.
For example - Date of Birth, Course Start Date and Enrolment Date all belong to the DATE domain - data type is date/time, format dd/mm/yyyy, non-unique, non-null.
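The idea of defining a domain once and reusing it can be sketched as follows. The Domain class and the is_valid helper are assumptions made for illustration, and the validation only handles the DATE domain described above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Domain:
    name: str
    data_type: str
    format: str
    unique: bool
    nullable: bool


# Defined once: date/time, dd/mm/yyyy, non-unique, non-null.
DATE = Domain("DATE", "date/time", "dd/mm/yyyy", unique=False, nullable=False)

# Each attribute simply names the domain it belongs to.
attribute_domains = {
    "Date of Birth": DATE,
    "Course Start Date": DATE,
    "Enrolment Date": DATE,
}


def is_valid(attribute, value):
    """Check a value against its attribute's domain (DATE domain only in this sketch)."""
    domain = attribute_domains[attribute]
    if value is None:
        return domain.nullable
    try:
        datetime.strptime(value, "%d/%m/%Y")  # dd/mm/yyyy
        return True
    except ValueError:
        return False


print(is_valid("Date of Birth", "31/12/1990"))   # True
print(is_valid("Enrolment Date", "1990-12-31"))  # False
print(is_valid("Course Start Date", None))       # False
```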