Database structure & file organization

LIS 670 Bair-Mundy


Electronic databases

Governmental databases, e.g. ERIC
Commercial databases, e.g. EBSCOhost
OPAC (Online Public Access Catalog)
Specialty databases, e.g. Trust Territory

The online catalog

Bibliographic data: title, publisher, date of publication, extent of item
Name authority data: variant forms of a name, e.g. Caroll, Lewis; Karol, Luis; Kerroll, L.; Dodgson, Charles
Holdings data: copies of a book; issues of a journal
Circulation data: who checked out what; for how long; patron data

Entity-relationship diagram

Entities (unique ID before the semicolon, other attributes after it; FK = foreign key):
NAME: NAME_ID; PERS_NAM, CRP_NAM, CNF_NAM, AD_P_NAM
SUBJECT: SUBJ_ID; SUBJECT, USE_FOR, REL_TERM, NAR_TERM
BIBL-ENTITY: LCN; ISBN, PERS_NAM (FK), TITLE, PLACE_PUBL, PUBLISHER, SUBJECT (FK)
CHKED_OUT_ITEM: TRANS_ID; TR_PA_ID (FK), TR_DATE, DUE_DATE, TR_HLD_ID (FK)
PATRON: PATRON_ID; PAT_NAME, STREET_1, STREET_2, CITY, STATE
HOLDING: CP_NO, HLD_LCN (FK), LOCATION, CALL_NO, HLD_DTE (FK)

Bibliographic Entity (Resource)

BIBL-ENTITY: an abstraction of a book, journal, video, photograph, or artifact
Unique ID (primary key): LCN
Attributes: ISBN, PERS_NAM (FK), TITLE, EDITION, PLACE_PUBL, PUBLISHER, DATE_PUBL, EXTENT_OF_ITEM, SERIES, SUBJECT (FK)

Name Entity

NAME
Unique ID: NAME_ID
Attributes: PERS_NAM, CRP_NAM, CNF_NAM, AD_P_NAM

The diagram repeats BIBL-ENTITY (LCN; ISBN, PERS_NAM (FK), TITLE, EDITION, PLACE_PUBL, PUBLISHER, DATE_PUBL, EXTENT_OF_ITEM, SERIES, SUBJECT (FK)) next to NAME. Its PERS_NAM (FK) attribute is the link to the NAME entity.

Subject Entity

SUBJECT
Unique ID: SUBJ_ID
Attributes: SUBJECT, USE_FOR, REL_TERM, NAR_TERM

The diagram repeats NAME and BIBL-ENTITY next to SUBJECT. BIBL-ENTITY's SUBJECT (FK) attribute is the link to the SUBJECT entity.

Additional entities

[Diagram: the entity-relationship diagram with three entities added to NAME, SUBJECT, and BIBL-ENTITY: CHKED_OUT_ITEM (TRANS_ID; TR_PA_ID (FK), TR_DATE, DUE_DATE, TR_HLD_ID (FK)), PATRON (PATRON_ID; PAT_NAME, STREET_1, STREET_2, CITY, STATE), and HOLDING (CP_NO, HLD_LCN (FK), LOCATION, CALL_NO, HLD_DTE (FK)).]

Entity Relationships

[Diagram: the same six entities (NAME, SUBJECT, BIBL-ENTITY, CHKED_OUT_ITEM, PATRON, HOLDING), showing the relationships among them.]

Foreign Keys

[Diagram: the same six entities.] Each foreign key refers to a record in another entity:
BIBL-ENTITY: PERS_NAM (FK), SUBJECT (FK)
CHKED_OUT_ITEM: TR_PA_ID (FK), TR_HLD_ID (FK)
HOLDING: HLD_LCN (FK), HLD_DTE (FK)
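To make the foreign-key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It covers only NAME, SUBJECT, and BIBL-ENTITY. Several details are our own assumptions for illustration: the table name BIBL_ENTITY (SQL identifiers cannot contain a hyphen without quoting), the column types, the choice that PERS_NAM and SUBJECT hold NAME_ID and SUBJ_ID values, and the "Orphan record" row. The sketch shows that the database refuses a record pointing at a name that does not exist, and that joins along the foreign keys rebuild the display.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE NAME (
    NAME_ID  INTEGER PRIMARY KEY,
    PERS_NAM TEXT
);
CREATE TABLE SUBJECT (
    SUBJ_ID INTEGER PRIMARY KEY,
    SUBJECT TEXT
);
-- Assumed mapping: PERS_NAM and SUBJECT store the keys of NAME and SUBJECT rows.
CREATE TABLE BIBL_ENTITY (
    LCN      INTEGER PRIMARY KEY,
    TITLE    TEXT NOT NULL,
    PERS_NAM INTEGER REFERENCES NAME(NAME_ID),
    SUBJECT  INTEGER REFERENCES SUBJECT(SUBJ_ID)
);
INSERT INTO NAME VALUES (19568, 'Eco, Umberto');
INSERT INTO SUBJECT VALUES (12340005, 'Historical fiction');
INSERT INTO BIBL_ENTITY VALUES (1, 'The name of the rose', 19568, 12340005);
""")

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# NAME_ID 99999 does not exist, so the foreign key check rejects this row.
rejected = not try_insert("INSERT INTO BIBL_ENTITY VALUES (2, 'Orphan record', 99999, NULL)")

# Following the foreign keys back to NAME and SUBJECT rebuilds the display.
row = conn.execute("""
    SELECT b.TITLE, n.PERS_NAM, s.SUBJECT
    FROM BIBL_ENTITY b
    JOIN NAME n ON b.PERS_NAM = n.NAME_ID
    JOIN SUBJECT s ON b.SUBJECT = s.SUBJ_ID
""").fetchone()
print(rejected, row)  # True ('The name of the rose', 'Eco, Umberto', 'Historical fiction')
```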

Relational databases

The logical view: data and data relationships in the database.

Main Author: Eco, Umberto.
Uniform Title: Nome della rosa. English
Title: The name of the rose / Umberto Eco ; translated from the Italian by William Weaver.
Publisher: San Diego : Harcourt Brace, 1994.
Description: 1st Harvest ed. 536 p. ill. ; 21 cm.
Series: Harvest in translation
A Harvest book
Subject(s): Historical fiction
Detective and mystery stories
Call Number: PQ4865 .C6 N613 1994
Status: Not Checked Out

Relational databases

Main Author: Eco, Umberto.
Uniform Title: Nome della rosa. English
Title: The name of the rose / Umberto Eco ; translated from the Italian by William Weaver.
Publisher: San Diego : Harcourt Brace, 1994.
…

Name authority (NAF) records:
NAF 13789 Norton, Peter
NAF 29563 Frost, Robert
NAF 19568 Eco, Umberto

The bibliographic record stores the key NAF 19568 in place of the author's name. The name that is displayed comes from the linked authority record.

Relational databases

Three bibliographic records share one name authority record, NAF 25793:
Author: Dalyrimple, Jen. Title: Try your best / by Jen Dalyrimple. …
Author: Dalyrimple, Jen. Title: The name of my nose / by Jen Dalyrimple. …
Author: Dalyrimple, Jen. Title: My name and fame / by Jen Dalyrimple. …

NAF 25793 was entered as "Dalyrimple, Jan", so all three records displayed "Dalyrimple, Jan". Oops, typo! Correcting that one authority record to "Dalyrimple, Jen" corrects all three records at once. A later change to the heading, e.g. to "Dalyrimple, J.S.", would also show up in every linked record.
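The typo fix above works because each record stores only the authority key. Here is a minimal sketch of that idea using plain Python dictionaries. The dictionary layout and the field names `author_key` and `title` are hypothetical and are not taken from any real catalog system.

```python
# Name authority file: key -> heading. NAF 25793 was entered with a typo.
name_authority = {"NAF 25793": "Dalyrimple, Jan"}

# Each bibliographic record stores only the authority key, not the name itself.
records = [
    {"title": "Try your best", "author_key": "NAF 25793"},
    {"title": "The name of my nose", "author_key": "NAF 25793"},
    {"title": "My name and fame", "author_key": "NAF 25793"},
]

def display_author(record):
    """Look up the heading to display for a record's author."""
    return name_authority[record["author_key"]]

print([display_author(r) for r in records])  # three times 'Dalyrimple, Jan'

# Fix the typo once, in the authority record only.
name_authority["NAF 25793"] = "Dalyrimple, Jen"
print([display_author(r) for r in records])  # three times 'Dalyrimple, Jen'
```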

Cataloging record

Main Author: Eco, Umberto.
Uniform Title: Nome della rosa. English
Title: The name of the rose / Umberto Eco ; translated from the Italian by William Weaver.
Publisher: San Diego : Harcourt Brace, 1994.
Description: 1st Harvest ed. 536 p. ill. ; 21 cm.
Series: Harvest in translation
A Harvest book
Subject(s): Historical fiction
Detective and mystery stories
Call Number: PQ4865 .C6 N613 1994
Status: Not Checked Out

SUBJECT records:
12340005 Historical fiction
12341117 Detective and mystery stories

The bibliographic record does not store the heading text. It stores the SUBJECT record numbers 12340005 and 12341117, and the headings shown in the display come from those SUBJECT records.

Subject heading change

Title: This is how we flow : rhythm in Black cultures / edited by Angela M.S. Nelson.
Publisher: Columbia, S.C. : University of South Carolina Press, c1999.
Description: vi, 160 p. : ill., maps, music ; 24 cm.
Subject(s): Afro-Americans.

SUBJECT record 12342001: Afro-Americans, changed to African Americans

The bibliographic record stores only the key 12342001. Changing that one SUBJECT record therefore makes this record, and every other record linked to 12342001, display "African Americans".
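The same heading change can be sketched as one SQL UPDATE, here with Python's built-in sqlite3 module. This is a sketch under stated assumptions. The BIBL_SUBJECT linking table is our own choice, because one record can carry several subject headings (as in the Eco record). The second bibliographic record is hypothetical and is included only to show that every linked record changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SUBJECT (SUBJ_ID INTEGER PRIMARY KEY, SUBJECT TEXT NOT NULL);
CREATE TABLE BIBL_ENTITY (LCN INTEGER PRIMARY KEY, TITLE TEXT NOT NULL);
-- Assumed linking table: one row per (bibliographic record, subject) pair.
CREATE TABLE BIBL_SUBJECT (
    LCN     INTEGER REFERENCES BIBL_ENTITY(LCN),
    SUBJ_ID INTEGER REFERENCES SUBJECT(SUBJ_ID)
);
INSERT INTO SUBJECT VALUES (12342001, 'Afro-Americans');
INSERT INTO BIBL_ENTITY VALUES (1, 'This is how we flow');
INSERT INTO BIBL_ENTITY VALUES (2, 'Hypothetical second title');
INSERT INTO BIBL_SUBJECT VALUES (1, 12342001);
INSERT INTO BIBL_SUBJECT VALUES (2, 12342001);
""")

def headings(lcn):
    """Return the subject headings displayed for one bibliographic record."""
    rows = conn.execute(
        """SELECT s.SUBJECT
           FROM BIBL_SUBJECT bs JOIN SUBJECT s ON s.SUBJ_ID = bs.SUBJ_ID
           WHERE bs.LCN = ?""",
        (lcn,),
    )
    return [r[0] for r in rows]

print(headings(1), headings(2))  # ['Afro-Americans'] ['Afro-Americans']

# One UPDATE to the SUBJECT record; the bibliographic rows are not touched.
conn.execute("UPDATE SUBJECT SET SUBJECT = 'African Americans' WHERE SUBJ_ID = 12342001")
print(headings(1), headings(2))  # ['African Americans'] ['African Americans']
```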

Tuples

Each row of the table is a tuple (one record), and each column is an attribute.

BIBL-ENTITY
LCN    TITLE                       PLACE_PUBL    PUBLISHER
00001  My life of crime / …        London        Bugsy Press
00002  Gone with the wind / …      Athens, Ga.   Tara Pub. Co.
00003  Life after library school   Paris         Beau Gens
00004  Drudgery made fun /…        San Diego     Bowring Press
00005  Mudpies…                    Fresno        Earth Press

Data descriptions for a table

BIBL-ENTITY
Attribute name  Req'd  Type     Size  Data updateable  Field attribute
LCN             Yes    Integer  5     False            Counter
ISBN            No     Text     20    True             Fixed length
PERS_NAM (FK)   No     Integer  75    True             Variable length
TITLE           Yes    Text     200   True             Variable length
PLACE_PUBL      No     Text     100   True             Variable length
PUBLISHER       No     Text     100   True             Variable length
PUB_DATE        Yes    Text     100   True             Variable length
SUB_HD (FK)     No     Integer  10    True             Fixed length
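A data description like this can drive automatic checks on incoming records. The following is a minimal sketch, assuming that "Size" means a maximum length in characters or digits and that Integer and Text map to Python's `int` and `str`. The class `FieldSpec`, the functions `validate` and `frozen_changes`, and the sample record values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    name: str
    required: bool
    kind: type        # int for Integer, str for Text
    size: int         # assumed: maximum length in characters/digits
    updateable: bool

# BIBL-ENTITY data description, with "(FK)" dropped from the names.
SCHEMA = [
    FieldSpec("LCN", True, int, 5, False),
    FieldSpec("ISBN", False, str, 20, True),
    FieldSpec("PERS_NAM", False, int, 75, True),
    FieldSpec("TITLE", True, str, 200, True),
    FieldSpec("PLACE_PUBL", False, str, 100, True),
    FieldSpec("PUBLISHER", False, str, 100, True),
    FieldSpec("PUB_DATE", True, str, 100, True),
    FieldSpec("SUB_HD", False, int, 10, True),
]

def validate(record):
    """Return a list of problems for a record given as {attribute name: value}."""
    problems = []
    for spec in SCHEMA:
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"{spec.name} is required")
        elif not isinstance(value, spec.kind):
            problems.append(f"{spec.name} must be {spec.kind.__name__}")
        elif len(str(value)) > spec.size:
            problems.append(f"{spec.name} exceeds {spec.size} characters")
    return problems

def frozen_changes(old, new):
    """Return the non-updateable attributes that an edit would change."""
    return [s.name for s in SCHEMA if not s.updateable and old.get(s.name) != new.get(s.name)]

record = {"LCN": 1, "TITLE": "My life of crime", "PUB_DATE": "1994"}
print(validate(record))                              # []
print(validate({"TITLE": "Mudpies"}))                # ['LCN is required', 'PUB_DATE is required']
print(frozen_changes(record, {**record, "LCN": 2}))  # ['LCN']
```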

Telephone numbers

Room no.  Extension no.
101       67321
102       69518
103       65835
104       69112
105       69345
106       68123
107       67721

Fixed-length fields

The same amount of space is allocated for every instance of the field.

[Diagram: each extension number (6 7 3 2 1, 6 9 5 1 8, 6 5 8 3 5) fills exactly five boxes, and each room number (1 0 1, 1 0 2, 1 0 3) fills exactly three.]

Records with fixed-length fields

Phone directory: office number (3 digits), telephone extension (5 digits)

[Beginning of file] 10167321$10269518$10365835$ [End of file]

Each record is a 3-digit office no. followed by a 5-digit extension no. (e.g. 101 + 67321). Each $ is an end-of-record marker.
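Because every record has the same length, a program can cut the file into records by position alone. It can also jump straight to the nth record without reading the ones before it. The following is a minimal sketch, assuming the 3 + 5 digit layout and the "$" marker shown above. The function names are hypothetical.

```python
OFFICE_LEN = 3           # office number: 3 digits
EXT_LEN = 5              # telephone extension: 5 digits
END_OF_RECORD = "$"
RECORD_LEN = OFFICE_LEN + EXT_LEN + len(END_OF_RECORD)   # 9 characters

def split_record(rec):
    """Cut one fixed-length record into (office no., extension no.)."""
    if rec[-1] != END_OF_RECORD:
        raise ValueError(f"missing end-of-record marker in {rec!r}")
    return rec[:OFFICE_LEN], rec[OFFICE_LEN:OFFICE_LEN + EXT_LEN]

def read_all(data):
    """Read the file from beginning to end, one record-length slice at a time."""
    return [split_record(data[i:i + RECORD_LEN]) for i in range(0, len(data), RECORD_LEN)]

def read_nth(data, n):
    """Jump straight to record n (0-based): it starts at n * RECORD_LEN."""
    start = n * RECORD_LEN
    return split_record(data[start:start + RECORD_LEN])

data = "10167321$10269518$10365835$"
print(read_all(data))     # [('101', '67321'), ('102', '69518'), ('103', '65835')]
print(read_nth(data, 2))  # ('103', '65835')
```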


Titles

Godzilla / by Simian Amicus

Voyage around the world in the vessel La Perouse under Captain Swashbuckler during the years 1887, 1888, and 1889 with the full blessings of Her Majesty the Queen of Elbonia / by A. Hoy Maytees

Variable-length fields

The length of the field varies according to the amount of data stored in it.

[Diagram: one box per character. "Godzilla" takes 8 boxes, while "Voyage around the world in the vessel La" is only the start of a title that takes far more.]

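Records with variable-length fields

Header: 008000110001392450154 …
Data: …Maytees, A. HoyVoyage around the world in the vessel La Perouse under Captain Swashbuckler during the years 1887, 1888, and 1889 with the full blessings of Her Majesty the Queen of Elbonia / by A. Hoy Maytees…

The header acts as a directory of 7-character entries. Each entry is a field tag followed by that field's starting position: 008 at 0001, 100 at 0139, 245 at 0154. The author field (tag 100, "Maytees, A. Hoy", 15 characters) therefore begins at position 139, and the title field (tag 245) begins right after it at position 154. The fields run together with no padding.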
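The directory reading above can be checked in code. This is a minimal sketch, assuming each header entry is a 3-character tag plus a 4-digit, 1-based start position, and that a field ends where the next one begins. That is our interpretation of the slide, not a full MARC implementation. The 138-character filler for field 008 is hypothetical, since the slide does not show that field's content. Rebuilding the record from its fields reproduces the slide's header exactly.

```python
ENTRY_LEN = 7   # assumed directory entry: 3-character tag + 4-digit start position

def build_record(fields):
    """Concatenate (tag, value) fields; return (header, data)."""
    header, data = "", ""
    for tag, value in fields:
        header += f"{tag}{len(data) + 1:04d}"   # 1-based start, like "Pos. 139"
        data += value
    return header, data

def parse_record(header, data):
    """Use the header to slice each variable-length field out of the data."""
    starts = [(header[i:i + 3], int(header[i + 3:i + ENTRY_LEN]))
              for i in range(0, len(header), ENTRY_LEN)]
    fields = {}
    for k, (tag, start) in enumerate(starts):
        # A field ends where the next one begins (the last runs to the end).
        end = starts[k + 1][1] if k + 1 < len(starts) else len(data) + 1
        fields[tag] = data[start - 1:end - 1]
    return fields

title = ("Voyage around the world in the vessel La Perouse under Captain Swashbuckler "
         "during the years 1887, 1888, and 1889 with the full blessings of Her Majesty "
         "the Queen of Elbonia / by A. Hoy Maytees")
filler_008 = "#" * 138   # hypothetical stand-in for the 138 characters of field 008
header, data = build_record([("008", filler_008), ("100", "Maytees, A. Hoy"), ("245", title)])
print(header)                              # 008000110001392450154
print(parse_record(header, data)["100"])   # Maytees, A. Hoy
```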


File structures

Physical views of data


Methods for organizing records

Sequential files

Indexed files

Lists

Balanced trees

Direct access structures

Sequential files (1)

Records stored contiguously in order on a sort key (here, the publication date).

Record no.  Publ. date (sort key)  Title
1           1902                   Prospect for a new century
2           1928                   My life as a flapper
3           1937                   Why the market crashed
4           1978                   Polyester pantsuit revolution
5           1984                   Where is Big Brother?
6           1999                   The end is near

Sequential files (2)

Slow: when a new record is added, the file must be re-sorted.
[Diagram: records 1-3: 1902 Prospect for a new century; 1928 My life as a flapper; 1937 Why the market crashed.]

Good for a high ratio of searches to record additions

Requires less space than indexed files
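The cost of adding a record shows up even in a small in-memory sketch. Python's `bisect` module finds the new record's place, and every record after that place must move down one slot to keep the file contiguous and in order. Using `bisect` in place of a full re-sort is our simplification; the end result is the same sorted file. The title "Hypothetical new title" is invented for the example.

```python
import bisect

# A sequential file: records kept contiguously in order on the sort key (publ. date).
seq_file = [
    (1902, "Prospect for a new century"),
    (1928, "My life as a flapper"),
    (1937, "Why the market crashed"),
]

def add_record(records, record):
    """Insert record at its sorted position; return how many records had to move."""
    pos = bisect.bisect_right(records, record)   # binary search for the slot
    records.insert(pos, record)                  # every later record shifts down one
    return len(records) - 1 - pos

moved = add_record(seq_file, (1920, "Hypothetical new title"))
print(moved)                           # 2
print([date for date, _ in seq_file])  # [1902, 1920, 1928, 1937]
```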


Searching sequential files

Examine each record in sequence

Binary search

Examine each record in sequence

Records 1-15, in alphabetical order: Au, Baker, Chou, Dietrich, Doi, Ing, Kawamoto, Liebowitz, Marcuse, Rowling, Seuss, Smith, Tanaka, Torrance, Zeus

Searching for Rowling:
Accession 1: Does Au = Rowling?
Accession 2: Does Baker = Rowling?
…
Accession 10: Does Rowling = Rowling?
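Examining each record in sequence is simple to express in code. This minimal sketch counts one accession per record examined, using the 15 names from the slide. The function name is hypothetical.

```python
NAMES = ["Au", "Baker", "Chou", "Dietrich", "Doi", "Ing", "Kawamoto", "Liebowitz",
         "Marcuse", "Rowling", "Seuss", "Smith", "Tanaka", "Torrance", "Zeus"]

def sequential_search(records, target):
    """Return (1-based record number, accessions); record number is None if absent."""
    accessions = 0
    for number, name in enumerate(records, start=1):
        accessions += 1            # one accession per record examined
        if name == target:
            return number, accessions
    return None, accessions

print(sequential_search(NAMES, "Rowling"))  # (10, 10)
```

An unsuccessful search examines all n records, so its cost grows in direct proportion to the size of the file.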

Binary search: step one

Records 1-15: Au, Baker, Chou, Dietrich, Doi, Ing, Kawamoto, Liebowitz, Marcuse, Rowling, Seuss, Smith, Tanaka, Torrance, Zeus

Searching for Rowling
Accession 1: Does Liebowitz (record 8, the middle record) = Rowling?
Is Rowling below Liebowitz in the alphabet? Yes, so only records 9-15 remain.

Binary search: step two

Records 9-15 remain: Marcuse, Rowling, Seuss, Smith, Tanaka, Torrance, Zeus

Searching for Rowling
Accession 2: Does Smith (record 12, the middle of records 9-15) = Rowling?
Is Rowling below Smith in the alphabet? No, so only records 9-11 remain.

Binary search: step three

Records 9-11 remain: Marcuse, Rowling, Seuss

Searching for Rowling
Accession 3: Does Rowling (record 10) = Rowling? Yes. Found in 3 accessions, compared with 10 when examining each record in sequence.

Maximum no. of accessions to find a record = $\lfloor \log_2 n \rfloor + 1$ (roughly $\log_2 n$), where $n$ is the number of records in the file. For the 15 records here, that is at most 4.
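The three steps above follow a standard binary search. This minimal sketch counts accessions and, with integer midpoints, examines the same records as the slides: Liebowitz, then Smith, then Rowling. The function name is hypothetical.

```python
NAMES = ["Au", "Baker", "Chou", "Dietrich", "Doi", "Ing", "Kawamoto", "Liebowitz",
         "Marcuse", "Rowling", "Seuss", "Smith", "Tanaka", "Torrance", "Zeus"]

def binary_search(records, target):
    """Return (1-based record number, accessions); record number is None if absent."""
    lo, hi = 0, len(records) - 1
    accessions = 0
    while lo <= hi:
        mid = (lo + hi) // 2       # middle record of the part still in play
        accessions += 1
        if records[mid] == target:
            return mid + 1, accessions
        if target > records[mid]:  # target comes later in the alphabet
            lo = mid + 1
        else:
            hi = mid - 1
    return None, accessions

print(binary_search(NAMES, "Rowling"))  # (10, 3)
```

Over all 15 names, the worst case is 4 accessions, which matches $\lfloor \log_2 15 \rfloor + 1 = 4$.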

Binary search: our bead game

[Diagram: each question splits the remaining cups into Left and Right halves (L R).]

No. of cups (records, n)   Questions (accessions, $\log_2 n$)
1                          0
2                          1
4                          2
8                          3

Index files

Main file (BIBL-ENTITY):
LCN    ISBN        TITLE                       PLACE_PUBL    PUBLISHER
00001  7534678945  My life of crime / …        London        Bugsy Press
00002  5675849246  Gone with the wind / …      Athens, Ga.   Tara Pub. Co.
00003  1234567890  Life after library school   Paris         Beau Gens
00004  4378159721  Drudgery made fun /…        San Diego     Bowring Press
00005  4678591357  Mudpies…                    Fresno        Earth Press

ISBN index (sorted by ISBN):
ISBN        LCN
1234567890  00003
4378159721  00004
4678591357  00005
5675849246  00002
7534678945  00001
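An ISBN index of this kind can be built and searched in a few lines. The following is a minimal sketch, assuming the main file is keyed by LCN. The index is a sorted list of (ISBN, LCN) pairs, so Python's `bisect` can binary-search it before a single lookup in the main file. The variable and function names are hypothetical.

```python
import bisect

# Main file: LCN -> (ISBN, title), in LCN order as on the slide.
main_file = {
    "00001": ("7534678945", "My life of crime / …"),
    "00002": ("5675849246", "Gone with the wind / …"),
    "00003": ("1234567890", "Life after library school"),
    "00004": ("4378159721", "Drudgery made fun /…"),
    "00005": ("4678591357", "Mudpies…"),
}

# The ISBN index: (ISBN, LCN) pairs sorted by ISBN, much smaller than the main file.
isbn_index = sorted((isbn, lcn) for lcn, (isbn, _title) in main_file.items())
index_keys = [isbn for isbn, _lcn in isbn_index]

def find_by_isbn(isbn):
    """Binary-search the index, then fetch the main-file record by its LCN."""
    i = bisect.bisect_left(index_keys, isbn)
    if i < len(index_keys) and index_keys[i] == isbn:
        lcn = isbn_index[i][1]
        return lcn, main_file[lcn]
    return None

print([lcn for _isbn, lcn in isbn_index])  # ['00003', '00004', '00005', '00002', '00001']
print(find_by_isbn("4378159721")[0])       # 00004
```

Adding a record to `main_file` leaves `isbn_index` stale until it is rebuilt. That rebuild is the re-indexing cost listed under the disadvantages of index files.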

Multiple indexes to main file

[Diagram: the bibliographic records file, reached through six separate indexes: ISBN index, browse title index, keyword index, call no. index, publisher index, and series index.]

Index files - advantages

Fast searches:
  Index file smaller than main file
  Index file sorted, so can use sequential or binary search
Good for a system with a high volume of searches

Index files - disadvantages

Use additional storage space

When new records are added, the indexes must be updated (re-indexed)

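Lists

Record #  Name       Forward pointer  Backward pointer
0         Baker      5                9
1         Doi        3                5
2         Rowling    eol              6
3         Drew       4                1
4         Ing        7                3
5         Chou       1                0
6         Marcuse    2                8
7         Kawamoto   8                4
8         Liebowitz  6                7
9         Au         0                bol

The records are stored in no particular order. The pointers tell the computer where to find the next record (forward) or the previous record (backward) in alphabetical order. eol marks the end of the list, and bol marks the beginning of the list.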

Searching lists

Record #  Name       Forward pointer  Backward pointer
1         Baker      6                10
2         Doi        4                6
3         Rowling    eol              7
4         Drew       5                2
5         Ing        8                4
6         Chou       2                1
7         Marcuse    3                9
8         Kawamoto   9                5
9         Liebowitz  7                8
10        Au         1                bol

Follow the pointers. Searching for Doi, starting at record 1:
Accession 1: Does Baker = Doi? Doi after Baker? Use forward pointer (to record 6).
Accession 2: Does Chou = Doi? Doi after Chou? Use forward pointer (to record 2).
Accession 3: Does Doi = Doi?
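The pointer-following search can be sketched as follows, using the slide's 1-based table. Two details are our own assumptions: `None` stands in for eol/bol, and the search fixes its direction at the first record so that it stops instead of looping when the name is absent. Starting at record 1, it reproduces the three accessions for Doi.

```python
# Slide's list: record number -> (name, forward pointer, backward pointer).
# None stands for eol (end of list) and bol (beginning of list).
LIST = {
    1: ("Baker", 6, 10),
    2: ("Doi", 4, 6),
    3: ("Rowling", None, 7),
    4: ("Drew", 5, 2),
    5: ("Ing", 8, 4),
    6: ("Chou", 2, 1),
    7: ("Marcuse", 3, 9),
    8: ("Kawamoto", 9, 5),
    9: ("Liebowitz", 7, 8),
    10: ("Au", 1, None),
}

def search_list(lst, start, target):
    """Follow pointers from record `start`; return (record number, accessions)."""
    current, accessions, direction = start, 0, None
    while current is not None:
        name, forward, backward = lst[current]
        accessions += 1
        if name == target:
            return current, accessions
        step = "forward" if target > name else "backward"
        if direction is None:
            direction = step          # decide once which way to walk
        elif step != direction:
            return None, accessions   # walked past where target would be
        current = forward if direction == "forward" else backward
    return None, accessions

def in_order(lst, head):
    """Walk the forward pointers from the head of the list."""
    names, current = [], head
    while current is not None:
        name, forward, _backward = lst[current]
        names.append(name)
        current = forward
    return names

print(search_list(LIST, 1, "Doi"))  # (2, 3)
print(in_order(LIST, 10))           # Au, Baker, Chou, Doi, ... Rowling
```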

Balanced trees

Implement binary search logic in list form.

root: Goo
internal nodes: Baker (left of Goo), Rowling (right of Goo)
leaves: Au and Chou (under Baker); Ing and Tanaka (under Rowling)
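A balanced tree like the one above can be built from a sorted list by choosing the middle key as the root at every level. This minimal sketch reproduces the slide's tree and then searches it with the same left/right logic as binary search. Database systems usually use B-trees, which hold many keys per node; this sketch uses a balanced binary tree only to match the slide. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_balanced(keys):
    """Build a balanced binary search tree from keys already in sorted order."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build_balanced(keys[:mid]), build_balanced(keys[mid + 1:]))

def search(node, target):
    """Return (found, comparisons), going left or right as in binary search."""
    comparisons = 0
    while node is not None:
        comparisons += 1
        if target == node.key:
            return True, comparisons
        node = node.left if target < node.key else node.right
    return False, comparisons

root = build_balanced(["Au", "Baker", "Chou", "Goo", "Ing", "Rowling", "Tanaka"])
print(root.key, root.left.key, root.right.key)  # Goo Baker Rowling
print(search(root, "Tanaka"))                   # (True, 3)
```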

Direct-access structures

Do not go through an index or follow a list. Instead, use an algorithm to compute the address where the record is stored.

Example: divide the sort key value by the prime number 11 and use the remainder as the address (addresses 0-10).
ISBN 1234567892 / 11: remainder 8
ISBN 4834567891 / 11: remainder 10
ISBN 5489234831 / 11: remainder 3
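The division-remainder calculation above fits in one line of code. This minimal sketch treats each ISBN as an integer and checks the three example addresses. The function name `address` is hypothetical.

```python
TABLE_SIZE = 11  # the prime divisor from the example; addresses run 0-10

def address(key):
    """Direct-access address: remainder of the key divided by 11."""
    return key % TABLE_SIZE

for isbn in (1234567892, 4834567891, 5489234831):
    print(isbn, address(isbn))
# 1234567892 8
# 4834567891 10
# 5489234831 3
```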

Direct access pros & cons

Advantage: fast
  Do not go through indexes or follow the sequence of a list
  Computing the algorithm is faster than multiple disk accessions
Disadvantage: two keys may hash to the same address (a collision)
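The slides do not say how collisions are resolved. One common technique is linear probing: if the computed address is taken, try the next address, wrapping around at the end. The following is a minimal sketch of that technique with the same mod-11 address. The key 1234567903 is hypothetical, chosen because it is 11 more than 1234567892 and so hashes to the same address, 8.

```python
TABLE_SIZE = 11

def insert(table, key):
    """Store key at key % 11, stepping to the next free slot on a collision."""
    home = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        slot = (home + step) % TABLE_SIZE
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table full")

def lookup(table, key):
    """Probe the same sequence as insert; return the slot holding key, or None."""
    home = key % TABLE_SIZE
    for step in range(TABLE_SIZE):
        slot = (home + step) % TABLE_SIZE
        if table[slot] is None:
            return None          # an empty slot ends the probe sequence
        if table[slot] == key:
            return slot
    return None

table = [None] * TABLE_SIZE
print(insert(table, 1234567892))  # 8: its home address
print(insert(table, 1234567903))  # 9: also hashes to 8, so it moves to the next slot
print(lookup(table, 1234567903))  # 9
```

Every collision adds probes, so direct access stays fast only while the table has plenty of free addresses.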