Final Exam Review
SIMS 202
Profs. Hearst & Larson
UC Berkeley SIMS
Fall 2000
Final Exam
Monday Dec 11
– 9:30-12:30
– Room 202 and 205
Bring
– Pens/pencils
– Calculator
– Notes/Books (optional)
Final Exam
Topics
– Comprehensive, but
– Emphasis on materials since the midterm
Types of questions
– Similar to those on homeworks and the midterm, but less time-consuming
– Probably a design problem.
Relationships among Language, Concepts, and Categories
Symbols and Language
Abstract concepts are difficult to express in a computer.
Combinations of abstract concepts are even more difficult to express:
– time
– shades of meaning
– social and psychological concepts
– causal relationships
Symbols and Language
As the man walks the cavorting dog, thoughts arrive unbidden of the previous spring, so unlike this one, in which walking was marching and dogs were baleful sentinels outside unjust halls.
What is the relation between the symbols and the meaning?
Symbols and Language
Language only hints at meaning.
Most meaning of text lies within our minds and common understanding.
– “How much is that doggy in the window?”
» how much: social system of barter and trade (not the size of the dog)
» “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own
» “in the window” implies behind a store window, not really inside a window, requires notion of window shopping
Lexical Relations
Conceptual relations link concepts
Lexical relations link words
How do they differ? How are they similar?
Major Lexical Relations
Synonymy
Polysemy
Metonymy
Hyponymy/Hyperonymy
Meronymy
Antonymy
Relationships among Meanings
Homonymy: same word, different meanings
– bank (river bank) vs bank (financial institution)
Polysemy: same word, different senses of meaning
– slightly different concepts expressed similarly
– bank (institution vs building)
Synonyms: different words, related senses of meanings
– different ways to express similar concepts
– jail, prison, penitentiary
Defining Category Membership
Necessary and Sufficient Conditions:
– (This used to be a very influential definition of category membership; it is OK for math and logic but out of date for human categories.)
– Every condition must be met.
– No other conditions can be required.
Can category membership be crisply defined?
What are the necessary and sufficient conditions for something to be a game?
Properties of Categorization
Family Resemblance
– Members of a category may be related to one another without all members having any property in common.
» Instead, they may share a large subset of traits.
» Some attributes are more likely given that others have been seen.
– Example: feathers, wings, twittering, ...
» Likely to be a bird, but not all features apply to “emu”
» Unlikely to see an association with “barks”
Properties of Categorization
Centrality
– Some members of a category may be “better examples” than others.
» Example: robins vs. chickens vs. emus
» Example: soccer vs. gambling vs. hopscotch
Properties of Categorization
Characteristic Features
– Perceived degree of category membership has to do with which features define the category.
– Members usually do not have ALL the necessary features, but have some subset.
– Those members that have more of the central features are seen as more central members.
– People have conceptions of typical members.
Three Psychologically Primary Levels
SUPERORDINATE   animal    furniture
BASIC LEVEL     dog       chair
SUBORDINATE     terrier   rocker
Children take longer to learn superordinate categories
Superordinate categories are not associated with mental images or motor actions
How are these levels related to
– Hyponymy
– Hyperonymy
Characteristics of Basic-level Categories
Language
– People name things more readily at basic level.
– Name learned earliest in childhood.
– Languages have simpler names at basic level.
– Sounds like the “real name”.
– Name used more frequently.
» Strange to call a dime a coin, a metal object
– Names used in neutral context.
» There’s a dog on the porch.
» There’s a terrier on the porch.
Characteristics of Basic-level Categories
Concepts
– Things perceived more holistically at the basic level (rather than by parts).
– People interact with basic and more specific levels similarly.
– Things are remembered more readily at basic level.
– Folk biological categories correspond accurately to scientific biological categories only at the basic level.
Metadata
Metadata Topics
What is metadata?
Controlled vocabularies / indexing languages
Metadata standards
– Dublin Core
– XML
– etc.
Thesaurus creation and use
Classification structure
– Descriptors vs subject headings
– Hierarchies vs facets
Metadata
Metadata is:
– “data about data” (term usage in database systems)
– Information about Information
– Structures and Languages for the Description of Information Resources and their elements (components or features)
– “Metadata is information on the organization of the data, the various data domains, and the relationship between them” (Baeza-Yates p. 142)
Types of Metadata Systems and Standards
Naming and ID systems – URLs, ISBNs
Bibliographic description – MARC, Dublin Core, TEI, etc.
Music – SMDL
Images and objects – CIMI, VRA Core Categories
Numeric Data – DDI, SDSM
Geospatial Data – FGDC
Collections – EAD
Types of Indexing Languages
Uncontrolled Keyword Indexing
Indexing Languages
– Controlled, but not structured
Thesauri
– Controlled and Structured
Classification Systems
– Controlled, Structured, and Coded
Faceted Classification Systems
Controlled Vocabularies
Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.
What is a “Controlled Vocabulary”
“The greatest problem of today is how to teach people to ignore the irrelevant, how to refuse to know things, before they are suffocated. For too many facts are as bad as none at all.” (W.H. Auden)
Similarly, there are too many ways of expressing or explaining the topic of a document.
Controlled vocabularies are sets of rules for topic identification and indexing, together with a THESAURUS, which consists of a “lead-in vocabulary” and a limited and selective “Indexing Language”, sometimes with special coding or structures.
Uses of Controlled Vocabularies
Library Subject Headings, Classification and Authority Files.
Commercial Journal Indexing Services and databases
Yahoo, and other Web classification schemes
Online and Manual Systems within organizations
– SunSolve
– MacArthur
Indexing Languages
An index is a systematic guide designed to indicate topics or features of documents in order to facilitate retrieval of documents or parts of documents.
An Indexing language is the set of terms used in an index to represent topics or features of documents, and the rules for combining or using those terms.
The Indexing Process
Concept identification
Term selection (via thesaurus)
Term assignment
Application: The Indexing Process (Manual)
[Flowchart of the manual indexing process, from Start to End.]
Steps: Examine Document and Identify Significant Concepts; Consider First Concept; Consider Preferred Term; Select Preferred Term; Consider any associated terms in Thesaurus (NT, BT); Consider Each of These Terms; Prefer Alternative Term(s); Select Alternative term to represent Concept; Establish Term Denoting Concept; Admit New Term Into Thesaurus; Assign Terms to Document.
Decision points (YES/NO): Does Thesaurus contain term for Concept? Preferred Term? Is Term suitable? Would Concept be better represented by one of these terms? Can Concept be expressed combining terms? Is There Another Concept?
Adapted from ISO 5963, p.5
Metadata Standards
The problem
Proliferation of the forms of names
– Different names for the same person
– Different people with the same names
Bibliographic Description
MARC (Machine Readable Cataloging)
DUBLIN CORE
– Warwick Framework for Dublin Core Metadata
GILS (Government Information Locator Service)
RFC 1807 (Format for Bibliographic Records)
RDF (Resource Description Framework)
Images and Objects
Categories for the Description of Works of Art (Getty Art Institute)
Consortium for the Computer Interchange of Museum Information (CIMI)
RLG REACH Element Set (for Shared Description of Museum Objects)
VRA Core Categories (Visual Resources Association)
Collection Level Descriptors
EAD (Encoded Archival Description)
Z39.50 Profile for Access to Digital Collections
RSLP Collection Description (Research Support Libraries Programme)
Dublin Core
Simple metadata for describing internet resources.
For “Document-Like Objects”
15 Elements.
Dublin Core Elements
Title
Creator
Subject
Description
Publisher
Other Contributors
Date
Resource Type
Format
Resource Identifier
Source
Language
Relation
Coverage
Rights Management
Source
Label: SOURCE
The work, either print or electronic, from which this resource is derived, if applicable. For example, an HTML encoding of a Shakespearean sonnet might identify the paper version of the sonnet from which the electronic version was transcribed.
The Same Item in Different Metadata Systems
ISBD
Dublin Core
RFC 1807
TEI Header
MARC Record
ISBD Punctuation
Title Proper (GMD) = Parallel title : other title info / First statement of responsibility ; others. -- Edition information. -- Material. -- Place of Publication : Publisher Name, Date. -- Material designation and extent ; Dimensions of item. -- (Title of Series / Statement of responsibility). -- Notes. -- Standard numbers: terms of availability (qualifications).
Bibliographic Record
Introduction to cataloging and classification / Bohdan S. Wynar. -- 8th ed. / Arlene G. Taylor. -- Englewood, Colo. : Libraries Unlimited, 1992. -- (Library science text series).
MARC Record (display)
ID:DCLC9124851-B RTYP:c ST:p FRN: MS:c EL: AD:06-20-91 CC:9110 BLT:am DCF:a CSC: MOD: SNR: ATC: UD:04-11-92 CP:cou L:eng INT: GPC: BIO: FIC:0 CON:b PC:s PD:1992/ REP: CPI:0 FSI:0 ILC:a II:1 MMD: OR: POL: DM: RR: COL: EML: GEN: BSE:
010 9124851
020 0872878112 (cloth)
020 0872879674 (paper)
040 DLC$cDLC$dDLC
050 00 Z693$b.W94 1991
082 00 025.3$220
100 1 Wynar, Bohdan S.
245 10 Introduction to cataloging and classification /$cBohdan S. Wynar.
250 8th ed. /$bArlene G. Taylor.
260 Englewood, Colo. :$bLibraries Unlimited,$c1992.
300 xvii, 633 p. :$bill. ;$c24 cm.
440 0 Library science text series
504 Includes bibliographical references (p. 591-599) and index.
650 0 Cataloging.
650 0 Subject cataloging.
650 0 Classification$xBooks.
630 00 Anglo-American cataloguing rules.
700 10 Taylor, Arlene G.,$d1941-
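To make the "same item, different metadata system" idea concrete, here is a minimal Python sketch of how the Wynar record might look as Dublin Core elements embedded in HTML. The values come from the ISBD and MARC displays above; the choice of which MARC field feeds which element, the short element names (Contributor, Identifier), and the `DC.` prefix convention for `<meta>` tags are assumptions made for illustration, not an official crosswalk.

```python
from html import escape

# Element/value pairs. Values are copied from the record shown above;
# the MARC-to-Dublin-Core mapping is an illustrative assumption.
record = [
    ("Title", "Introduction to cataloging and classification"),
    ("Creator", "Wynar, Bohdan S."),
    ("Contributor", "Taylor, Arlene G."),
    ("Subject", "Cataloging"),
    ("Subject", "Subject cataloging"),
    ("Subject", "Classification -- Books"),
    ("Publisher", "Libraries Unlimited"),
    ("Date", "1992"),
    ("Language", "eng"),
    ("Identifier", "ISBN 0872878112"),
]


def to_meta_tags(pairs):
    """Render (element, value) pairs as HTML <meta> tags, one per line."""
    lines = []
    for element, value in pairs:
        # escape() protects quotes and angle brackets inside attribute values.
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)


tags = to_meta_tags(record)
print(tags)
```

Note that repeatable elements such as Subject simply appear more than once, which is why the record is a list of pairs rather than a dictionary.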
Conditions of Authorship?
Single person or single corporate entity
Unknown or anonymous authors
– Fictitiously ascribed works
Shared responsibility
Collections or editorially assembled works
Works of mixed responsibility (e.g. translations)
Related Works
Name Authority Files
ID:NAFL8057230 ST:p EL:n STH:a MS:c UIP:a TD:19910821174242 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:05-14-80 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-21-91 Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
053 PR6005.R517
100 10 Creasey, John
400 10 Cooke, M. E.
400 10 Cooke, Margaret,$d1908-1973
400 10 Cooper, Henry St. John,$d1908-1973
400 00 Credo,$d1908-1973
400 10 Fecamps, Elise
400 10 Gill, Patrick,$d1908-1973
400 10 Hope, Brian,$d1908-1973
400 10 Hughes, Colin,$d1908-1973
400 10 Marsden, James
400 10 Matheson, Rodney
400 10 Ranger, Ken
400 20 St. John, Henry,$d1908-1973
400 10 Wilde, Jimmy
500 10 $wnnnc$aAshe, Gordon,$d1908-1973
Different names for the same person
Name Authority Files
ID:NAFO9114111 ST:p EL:n STH:a MS:n UIP:a TD:19910817053048 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:06-03-91 RFE:a CSC:c SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 08-19-91
040 OCoLC$cOCoLC
100 10 Marric, J. J.,$d1908-1973
500 10 $wnnnc$aCreasey, John
663 Works by this author are entered under the name used in the item. For a listing of other names used by this author, search also under$bCreasey, John
670 OCLC 13441825: His Gideon's day, 1955$b(hdg.: Creasey, John; usage: J.J. Marric)
670 LC data base, 6/10/91$b(hdg.: Creasey, John; usage: J.J. Marric)
670 Pseuds. and nicknames dict., c1987$b(Creasey, John, 1908-1973; British author; pseud.: Marric, J. J.)
Name authority files
ID:NAFL8166762 ST:p EL:n STH:a MS:c UIP:a TD:19910604053124 KRC:a NMU:a CRC:c UPN:a SBU:a SBC:a DID:n DF:08-20-81 RFE:a CSC: SRU:b SRT:n SRN:n TSS: TGA:? ROM:? MOD: VST:d 06-06-91 Other Versions: earlier
040 DLC$cDLC$dDLC$dOCoLC
100 10 Butler, William Vivian,$d1927-
400 10 Butler, W. V.$q(William Vivian),$d1927-
400 10 Marric, J. J.,$d1927-
670 His The durable desperadoes, 1973.
670 His The young detective's handbook, c1981:$bt.p. (W.V. Butler)
670 His Gideon's way, 1986:$bCIP t.p. (William Vivian Butler writing as J.J. Marric)
Different people writing with the same name
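The two problems illustrated by these authority records (one person under many names, one name used by more than one person) can be sketched as a small lookup table. The Python below is only an illustration: the headings and references are copied from the records above (with subfield codes written out as plain punctuation and only a few of Creasey's references included), while the dictionary layout, the `bare_name` date-stripping rule, and the function names are assumptions.

```python
import re
from collections import defaultdict

# Established headings (100) with their see-from references (400).
RECORDS = [
    {"heading": "Creasey, John",
     "see_from": ["Cooke, M. E.", "Gill, Patrick, 1908-1973", "Hope, Brian, 1908-1973"]},
    {"heading": "Marric, J. J., 1908-1973",
     "see_from": []},
    {"heading": "Butler, William Vivian, 1927-",
     "see_from": ["Butler, W. V. (William Vivian), 1927-", "Marric, J. J., 1927-"]},
]


def bare_name(heading):
    """Drop a trailing date qualifier such as ', 1908-1973' or ', 1927-'."""
    return re.sub(r",\s*\d{4}-(\d{4})?$", "", heading)


def build_index(records):
    """Map each name form, without dates, to the headings it may refer to."""
    index = defaultdict(set)
    for rec in records:
        for form in [rec["heading"], *rec["see_from"]]:
            index[bare_name(form)].add(rec["heading"])
    return index


index = build_index(RECORDS)
print(sorted(index["Cooke, M. E."]))   # one variant name, one established heading
print(sorted(index["Marric, J. J."]))  # one name, two different headings
```

Without the date qualifier, "Marric, J. J." leads to two distinct headings, which is exactly the ambiguity the authority file exists to resolve.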
Other Types of Controlled Vocabularies
Gazetteers (Geographic Names)
Code lists (e.g. LC Language Codes)
Subject Heading Lists
Classification Schemes
Thesauri
What is SGML/XML?
A. SGML stands for Standard Generalized Markup Language
– XML stands for Extensible Markup Language
B. What it is NOT:
– Not a visual document description
– Not an application-specific markup
– Not proprietary
What is SGML/XML?
What it is:
– An international standard (SGML: ISO 8879:1986)
– A generic language for describing the structure of documents, and markup that can be used for those documents
– Intended for generating markup for content rather than form elements
XML is a simplified subset of SGML (being established by W3C)
XML
Extensible Markup Language
– a simplification of SGML, the Standard Generalized Markup Language
– instead of a fixed set of format-oriented tags like HTML, XML allows you to create the schema (whatever set of tags is needed) for your information type or application
– this makes any XML instance “self-describing” and easily understood by computers and people
Version 1.0 ratified by W3C in 2/98; backed by Microsoft, Sun, Netscape, many others
Source Dr. Robert J Glushko
HTML Airline Schedule Seen “By Computer”
<Title>Airline Schedule</Title>
<Body>
<H2>Flight Information</H2>
<H3>United Airlines #200</H3>
<UL>
<LI>San Francisco
<LI>9:30 AM
<LI>Honolulu
<LI>12:30 PM
<LI>$368.50
</UL>
</Body>
Source Dr. Robert J Glushko
Airline Schedule in XML
<TransportSchedule Type="Airline">
  <Segment Id="United Airlines #200">
    <Origin>San Francisco</Origin>
    <DepartTime>9:30 AM</DepartTime>
    <Destination>Honolulu</Destination>
    <ArriveTime>12:30 PM</ArriveTime>
    <Price Currency="USD">368.50</Price>
  </Segment>
</TransportSchedule>
Source Dr. Robert J Glushko
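The practical difference between the two encodings shows up when a program reads them. As a rough sketch, the Python below parses the XML airline schedule with the standard library's `xml.etree.ElementTree`. The element and attribute names are those of the example above; the function name and the output dictionary keys are illustrative choices.

```python
import xml.etree.ElementTree as ET

# The airline schedule from the slide, with plain ASCII quotes so it parses.
SCHEDULE_XML = """\
<TransportSchedule Type="Airline">
  <Segment Id="United Airlines #200">
    <Origin>San Francisco</Origin>
    <DepartTime>9:30 AM</DepartTime>
    <Destination>Honolulu</Destination>
    <ArriveTime>12:30 PM</ArriveTime>
    <Price Currency="USD">368.50</Price>
  </Segment>
</TransportSchedule>
"""


def read_segments(xml_text):
    """Return one dictionary per <Segment>, keyed by what each value means."""
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.findall("Segment"):
        price = seg.find("Price")
        segments.append({
            "id": seg.get("Id"),
            "origin": seg.findtext("Origin"),
            "depart": seg.findtext("DepartTime"),
            "destination": seg.findtext("Destination"),
            "arrive": seg.findtext("ArriveTime"),
            "price": float(price.text),
            "currency": price.get("Currency"),
        })
    return segments


segments = read_segments(SCHEDULE_XML)
print(segments[0]["origin"], "to", segments[0]["destination"])
```

With the HTML version a program would have to guess that the first list item is the origin and the fifth is the price; here each value is found by name.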
SGML/XML Structure
An SGML document consists of three parts:
– The SGML Declaration
– The Document Type Definition (DTD)
– The Document Instance
An XML document requires only the document instance, but for effective processing a DTD is important.
Document Type Definitions
The DTD describes the structural elements and "shorthand" markup for a particular document type. It defines:
– Names of "legal" elements
– How many times elements can appear
– The order of elements in a document
– Whether markup can be omitted (SGML only)
– Contents of elements (i.e., nested structures)
– Attributes associated with elements
– Names of "entities"
– Short-hand conventions for element tags (SGML only)
DTD Components
The major components of a DTD are:
– Entity Declarations
– Element Declarations
– Attribute Declarations
Thesauri
A Thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among Synonymous, Equivalent, Broader, Narrower and other Related Terms
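One way to picture this definition is as a small data structure in which each preferred term carries its own links. The Python below is a minimal sketch, not a description of any real thesaurus software: the class and method names are invented for illustration, and the sample terms reuse the animal / dog / terrier levels and the word "doggy" from earlier in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term (descriptor) and its links to other terms."""
    name: str
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT
    used_for: set = field(default_factory=set)   # UF: non-preferred synonyms


class Thesaurus:
    def __init__(self):
        self.terms = {}     # preferred term -> Term
        self.lead_in = {}   # non-preferred term -> preferred term

    def term(self, name):
        return self.terms.setdefault(name, Term(name))

    def add_broader(self, narrow, broad):
        # BT and NT are reciprocal, so record the link in both directions.
        self.term(narrow).broader.add(broad)
        self.term(broad).narrower.add(narrow)

    def add_synonym(self, preferred, variant):
        self.term(preferred).used_for.add(variant)
        self.lead_in[variant] = preferred

    def resolve(self, word):
        """Lead from any known word to its preferred term (None if unknown)."""
        return word if word in self.terms else self.lead_in.get(word)

    def ancestors(self, name):
        """All broader terms, nearest first."""
        found, frontier = [], sorted(self.terms[name].broader)
        while frontier:
            current = frontier.pop(0)
            if current not in found:
                found.append(current)
                frontier.extend(sorted(self.terms[current].broader))
        return found


thesaurus = Thesaurus()
thesaurus.add_broader("terrier", "dog")
thesaurus.add_broader("dog", "animal")
thesaurus.add_synonym("dog", "doggy")
print(thesaurus.resolve("doggy"), thesaurus.ancestors("terrier"))
```

The `lead_in` table plays the role of the lead-in vocabulary: searchers may start from "doggy", but indexing and retrieval happen under "dog".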
Thesauri (cont.)
Examples:
– The ERIC Thesaurus of Descriptors
– The Art and Architecture Thesaurus
– The Medical Subject Headings (MeSH) of the National Library of Medicine
Why develop a thesaurus?
To provide a conceptual structure or “space” for a body of information
– To make it possible to adequately describe the topical contents of informational objects at an appropriate level of generality or specificity
– To provide enhanced search capabilities and to improve the effectiveness of searching (i.e., to retrieve most of the relevant material without too much irrelevant material).
Why develop a thesaurus?
To provide vocabulary (or terminological) control.
– When there are several possible terms designating a single concept, the thesaurus should lead the indexer or searcher to the appropriate concept, regardless of the terms they start with.
Preliminary considerations
What is used now?
– Continue using an existing thesaurus?
– Ad hoc modification of existing thesaurus?
– Develop a new well-structured thesaurus?
What is the scope and complexity of the subject field?
What kind of retrieval objects or data will be dealt with?
How exhaustive and specific is the desired description of objects?
Preliminary Considerations
The scope and complexity of the field will provide some indication of the scope and complexity of the thesaurus.
– It is better to plan for a larger and more comprehensive system than a smaller system that rapidly will become inadequate as the database grows.
Development of a good thesaurus requires a major intellectual effort as well as clerical operations like data entry and production of sorted lists.
Development of a Thesaurus
1. Term Selection.
2. Merging and Development of Concept Classes.
3. Definition of Broad Subject Fields and Subfields.
4. Development of Classificatory Structure.
5. Review, Testing, Application, Revision.
Flow of Work in Thesaurus Construction
[Flowchart, based on Soergel, pp 327-333.]
Steps: Select Sources; Assign codes; Select Terms; Record Selected Terms; Sort Terms; Merge identical Terms; Define Broad Subject Fields; Sort Terms into Broad Subject Fields; Define Subfields within one Subject Field; Work out detailed structure of the Subject Field; Merge Terms in Same Concept class; Select Preferred Terms; Print Classified Index and review; Discuss with Experts and Users; Improve Class Structure; Select descriptors and checklist items; Assign Notation; Produce Full Thesaurus and Check references; Review and Test; Revise as needed.
Decision points (Yes/No): All Subfields of Broad Subject finished? All Broad Subjects finished? Many Modifications?
2. Merging and Development of Concept Classes
Sort Term DB into alphabetical order.
First Round: Merge information for Identical terms -- possibly pulling info from additional sources.
Second Round: Merge synonyms or terms in the same concept class.
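The two merging rounds can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: the working term database is a list of (term, source code) pairs, and the concept-class assignments are supplied by hand, as a thesaurus builder would. The sample terms reuse the jail / prison / penitentiary synonyms from earlier in these notes.

```python
from collections import defaultdict

# Hypothetical working term database: (term as recorded, source code).
raw_terms = [
    ("prison", "A"), ("Jail", "B"), ("prison", "B"),
    ("penitentiary", "A"), ("jail", "A"),
]

# Hypothetical concept classes decided by the thesaurus builder:
# each term points at the label chosen for its class.
concept_class = {"jail": "prison", "penitentiary": "prison", "prison": "prison"}


def merge_identical(terms):
    """First round: sort alphabetically and pool sources for identical terms."""
    merged = defaultdict(set)
    for term, source in sorted(terms, key=lambda pair: pair[0].lower()):
        merged[term.lower()].add(source)
    return dict(merged)


def merge_concept_classes(merged, classes):
    """Second round: pool synonyms that belong to the same concept class."""
    result = defaultdict(lambda: {"terms": set(), "sources": set()})
    for term, sources in merged.items():
        label = classes.get(term, term)   # unclassified terms stand alone
        result[label]["terms"].add(term)
        result[label]["sources"] |= sources
    return dict(result)


first_round = merge_identical(raw_terms)
second_round = merge_concept_classes(first_round, concept_class)
print(first_round)
print(second_round)
```

Keeping the source codes attached through both rounds preserves the evidence for each term, which helps later when preferred terms are selected.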
3. Definition of Broad Subject Fields and Subfields
Define Broad Subject fields and sort terms into these broad fields
Define subfields within each broad field and sort terms into these subfields.
Work out the detailed structure
– Select Preferred Terms
– Merge information for terms in the same concept class
Repeat these steps
– for each subfield within a broad field
– and for each broad field
– until all terms have been consolidated and preferred terms selected
4. Development of Classificatory Structure
Produce preliminary version of classified index and update the working database.
Improve classificatory structure
Reality check: produce and distribute a version of the classified index. Distribute to users/experts.
5. Final Stages
Review
Testing
Application
Revision
Thesaurus Revision and Updates
There will always be new concepts, products, or expressions that need to be added to the thesaurus.
– Set a regular schedule of reviews and revisions.
– Collect complaints, problems, etc. and fold into revision of the thesaurus
Hierarchical vs. Faceted (Subject Heading vs. Descriptor)
Category Systems
Assigning Headings vs. Descriptors
Subject headings
– assign one (or a few) complex heading(s) to the document
Descriptors
– Mix and match
How would we describe recipes using each technique?
Subject Heading vs. Descriptor
WILSONLINE
– Athletes
– Athletes--Health & Hygiene
– Athletes--Nutrition
– Athletes--Physical Exams
– …
– Athletics
– Athletics -- Administration
– Athletics -- Equipment -- Catalogs
– …
– Sports -- Accidents and injuries
– Sports -- Accidents and injuries -- prevention
ERIC
– Athletes
– Athletic Coaches
– Athletic Equipment
– Athletic Fields
– Athletics
– …
– Sports psychology
– Sportsmanship
Subject Headings vs. Descriptors
Subject headings:
– Describe the contents of an entire document
– Designed to be looked up in an alphabetical index
» Look up document under its heading
– Few (1-5) headings per document
Descriptors:
– Describe one concept within a document
– Designed to be used in Boolean searching
» Combine to describe the desired document
– Many (5-25) descriptors per document
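Since descriptors are designed to be combined in Boolean searches, a small Python sketch may help show what "mix and match" means in practice. The descriptor names are borrowed from the ERIC list above; the three documents and their descriptor assignments are made up for the example, and the function names are illustrative.

```python
# Hypothetical collection: document id -> descriptors assigned by an indexer.
DOCS = {
    "doc1": {"Athletes", "Sports psychology"},
    "doc2": {"Athletic Equipment", "Athletic Fields"},
    "doc3": {"Athletes", "Athletic Coaches", "Sportsmanship"},
}


def build_postings(docs):
    """Invert the collection: descriptor -> set of document ids."""
    postings = {}
    for doc_id, descriptors in docs.items():
        for descriptor in descriptors:
            postings.setdefault(descriptor, set()).add(doc_id)
    return postings


def search_and(postings, *descriptors):
    """Documents indexed under every one of the descriptors."""
    sets = [postings.get(d, set()) for d in descriptors]
    return set.intersection(*sets) if sets else set()


def search_or(postings, *descriptors):
    """Documents indexed under at least one of the descriptors."""
    return set().union(*(postings.get(d, set()) for d in descriptors))


postings = build_postings(DOCS)
print(search_and(postings, "Athletes", "Sportsmanship"))
print(search_or(postings, "Athletic Fields", "Sports psychology"))
```

A subject-heading system would instead pre-combine the concepts into a single string such as "Athletes--Nutrition", to be found by looking it up alphabetically.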
Hierarchical Classification
– Each category is successively broken down into smaller and smaller subdivisions
– No item occurs in more than one subdivision
– Each level divided out by a “character of division”. Also known as a feature.
» Example: distinguish Literature based on:
Language
Genre
Time Period
Hierarchical Classification
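[Tree diagram: Literature divides by language (English, French, Spanish); each language divides by genre (Prose, Poetry, Drama); each genre divides by period (16th, 17th, 18th, 19th); remaining branches are elided with "...".]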
Labeled Categories for Hierarchical Classification
LITERATURE
– 100 English Literature
» 110 English Prose
English Prose 16th Century
English Prose 17th Century
English Prose 18th Century
...
» 111 English Poetry
121 English Poetry 16th Century
122 English Poetry 17th Century
...
» 112 English Drama
130 English Drama 16th Century
…
– 200 French Literature
Faceted Classification
Create a separate, free-standing list for each characteristic of division (feature).
Combine features to create a classification.
Faceted Classification along with Labeled Categories
A Language
– a English
– b French
– c Spanish
B Genre
– a Prose
– b Poetry
– c Drama
C Period
– a 16th Century
– b 17th Century
– c 18th Century
– d 19th Century
Aa English Literature
AaBa English Prose
AaBaCa English Prose 16th Century
AbBbCd French Poetry 19th Century
BcCd Drama 19th Century
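Because each facet is a free-standing list, a notation such as AaBaCa can be decoded mechanically, one facet at a time. The Python below is a minimal sketch using the three facet tables from this slide; the function names and the error handling are illustrative assumptions.

```python
import re

# Facet tables from the slide: facet letter -> (facet name, value letter -> label).
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode(notation):
    """Split a notation like 'AaBaCa' into {facet name: value label}."""
    pairs = re.findall(r"([A-Z])([a-z])", notation)
    # Reject anything that is not a clean sequence of facet/value pairs.
    if "".join(facet + value for facet, value in pairs) != notation:
        raise ValueError(f"malformed notation: {notation!r}")
    return {FACETS[facet][0]: FACETS[facet][1][value] for facet, value in pairs}


def label(notation):
    """Readable label built by joining the decoded facet values in order."""
    return " ".join(decode(notation).values())


print(label("AaBaCa"))
print(label("AbBbCd"))
print(decode("BcCd"))
```

Any facet may be left out, so a notation with only Genre and Period describes a class that cuts across all languages, something the hierarchical scheme cannot express without repeating the class under every language.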
Important Question: How to use both types of classification structures?
How to look through them? How to use them in search?
Design of Information Architecture
Web Site Design Issues
[Diagram: cycle of Design, Prototype, Evaluate]
Iteration earlier in the design process is more cost-effective
Iteration is the Key to UI Design
Design Process: Discovery
[Diagram: design process stages: Discovery, Conceptualization, Preliminary Design, Design, Implementation]
Assess needs
– understand client’s expectations
– determine scope of project
– characteristics of users
Slide by Mark Newman
Design Process: Conceptualization
Begin defining site
– Take results from discovery and visualize solutions
– Early information design
Slide by Mark Newman
Design Process: Preliminary Design
Generate multiple (3-5) designs
– one will be selected for development
– navigation design
– early graphic design
Slide by Mark Newman
Design Process: Preliminary Design
Activities
– Sketching designs
– Creating mock-ups
– Quick and rough
Deliverables
– Schematics (a.k.a. templates)
– Site maps
– Mock-ups
– Presentations
Slide by Mark Newman
Design Process: Design
Iteration
[Diagram: cycle of Design, Prototype, Evaluate]
• Iteration at the level of the development process
• And within the design stage
Slide by Mark Newman
Design Process: Implementation
Prepare design for handoff
– Create final deliverable
– Specifications and prototypes
– As much detail as possible
Why Do We Prototype?
Get feedback on our design faster
– saves money
Experiment with alternative designs
Fix problems before code is written
Keep the design centered on the user
Slide by James Landay
Fidelity in Prototyping
Fidelity refers to the level of detail
High fidelity?
– prototypes look like the final product
Low fidelity?
– artist's renditions with many details missing
Slide by James Landay
Low-fidelity Sketches
Slide by James Landay
Low-fidelity Sketches
Database Systems
Terms and Concepts
Database:
– A collection of similar records with relationships between the records. (Rowley)
– A Database is a collection of stored operational data used by the application systems of some particular enterprise. (C.J. Date)
DBMS Benefits
Minimal Data Redundancy
Consistency of Data
Integration of Data
Sharing of Data
Ease of Application Development
Uniform Security, Privacy, and Integrity Controls
Data Accessibility and Responsiveness
Data Independence
Reduced Program Maintenance
Database Components
[Diagram, from Kroenke, Database Processing]
User Interface and Applications
– Application Programs
DBMS design tools
– Table Creation
– Form Creation
– Query Creation
– Report Creation
– Procedural language compiler (4GL)
DBMS run time
– Form processor
– Query processor
– Report Writer
– Language Run time
Database contains:
– User’s Data
– Metadata
– Indexes
– Application Metadata
Terms and Concepts
Records
– The set of values for all attributes of a particular entity
– AKA “tuples” or “rows” in relational DBMS
File
– Collection of records
– Usually a physical file on OS
– May also be a “logical file” like a “Relation” or “Table” in relational DBMS
Terms and Concepts
Key
– an attribute or set of attributes used to identify or locate records in a file
Primary Key
– an attribute or set of attributes that uniquely identifies each record in a file
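The difference between a key and a primary key can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumed names: the `employee` table, its two columns, and the placeholder SSN values are invented for the example. `ssn` is declared as the primary key, while `last_name` serves as an ordinary (non-unique) key for locating records.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one table keyed on ssn."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (ssn TEXT PRIMARY KEY, last_name TEXT NOT NULL)"
    )
    return conn


def add_employee(conn, ssn, last_name):
    """Insert a record; return False if the primary key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (ssn, last_name) VALUES (?, ?)",
                (ssn, last_name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def find_by_last_name(conn, last_name):
    """Locate records by a non-unique key; several rows may match."""
    rows = conn.execute(
        "SELECT ssn FROM employee WHERE last_name = ? ORDER BY ssn", (last_name,)
    )
    return [row[0] for row in rows]


conn = make_db()
results = [
    add_employee(conn, "000-00-0001", "Smith"),
    add_employee(conn, "000-00-0002", "Smith"),   # same last name is allowed
    add_employee(conn, "000-00-0001", "Jones"),   # duplicate primary key is refused
]
print(results, find_by_last_name(conn, "Smith"))
```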
Terms and Concepts
Data Independence
– Physical representation and location of data and the use of that data are separated
» The application doesn’t need to know how or where the database has stored the data, but just how to ask for it.
» Moving a database from one DBMS to another should not have a material effect on application programs
» Recoding, adding fields, etc. in the database should not affect applications
Terms and Concepts
Metadata
– Data about data
» In DBMS means all of the characteristics describing the attributes of an entity, e.g.:
name of attribute
data type of attribute
size of the attribute
format or special characteristics
– Characteristics of files or relations
» name, content, notes, etc.
Design
Determination of the needs of the organization
Development of the Conceptual Model of the database
– Typically using Entity-Relationship diagramming techniques
Construction of a Data Dictionary
Development of the Logical Model
Entity
An Entity is an object in the real world (or even imaginary worlds) about which we want or need to maintain information
– Persons (e.g.: customers in a business, employees, authors)
– Things (e.g.: purchase orders, meetings, parts, companies)
[Diagram: entity box labeled Employee]
Attributes
Attributes are the significant properties or characteristics of an entity that help identify it and provide the information needed to interact with it or use it. (This is the Metadata for the entities.)
[Diagram: Employee entity with attributes Name (First, Middle, Last), SSN, Age, Birthdate, Projects]
Relationships
Relationships are the associations between entities. They can involve one or more entities and belong to particular relationship types
Relationships
[Diagrams: Student Attends Class; Supplier Supplies project parts, relating Supplier, Part, and Project]
Mapping to a Relational Model
Each entity in the ER Diagram becomes a relation.
A properly normalized ER diagram will indicate where intersection relations for many-to-many mappings are needed.
Relationships are indicated by common columns (or domains) in tables that are related.
We will examine the tables for the Acme Widget Company derived from the ER diagram
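As a small illustration of these mapping rules (not the Acme Widget Company tables, which are not reproduced here), the Python sketch below maps the Student Attends Class relationship onto three SQLite tables. The table names, column names, and sample rows are all assumptions made for the example. The `attends` table is the intersection relation for the many-to-many relationship, and the shared `student_id` and `class_id` columns are the common columns that tie the tables together.

```python
import sqlite3

SCHEMA = """
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE classes  (class_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Intersection relation: one row per (student, class) pairing.
CREATE TABLE attends (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    PRIMARY KEY (student_id, class_id)
);
"""


def build_demo():
    """Create the three tables in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO students VALUES (?, ?)",
                         [(1, "Ana"), (2, "Ben")])
        conn.executemany("INSERT INTO classes VALUES (?, ?)",
                         [(10, "SIMS 202"), (11, "Databases")])
        conn.executemany("INSERT INTO attends VALUES (?, ?)",
                         [(1, 10), (1, 11), (2, 10)])
    return conn


def classes_for(conn, student_name):
    """Follow the common columns from students through attends to classes."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM students s
        JOIN attends a ON a.student_id = s.student_id
        JOIN classes c ON c.class_id = a.class_id
        WHERE s.name = ?
        ORDER BY c.title
        """,
        (student_name,),
    )
    return [row[0] for row in rows]


demo = build_demo()
print(classes_for(demo, "Ana"), classes_for(demo, "Ben"))
```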
Normalization
Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data
Normalization is a multi-step process beginning with an “unnormalized” relation
– Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.
Normalization
[Diagram: stages of normalization]
First normal form: functional dependency of nonkey attributes on the primary key; atomic values only
Second normal form: full functional dependency of nonkey attributes on the primary key
Third normal form: no transitive dependency between nonkey attributes
Boyce-Codd and higher: all determinants are candidate keys (Boyce-Codd); single multivalued dependency (fourth normal form)
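The notion of functional dependency that underlies these stages can be tested directly on sample data. The Python below is a rough sketch: the rows are hypothetical (they are not the hospital example from Atre), and the check only shows whether a dependency holds in the data at hand, not whether it is true of the real world.

```python
def holds(rows, determinant, dependent):
    """True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in determinant)
        value = tuple(row[attr] for attr in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical enrollment relation with primary key (student, class).
rows = [
    {"student": 1, "class": "A", "name": "Ana", "grade": "B+"},
    {"student": 1, "class": "B", "name": "Ana", "grade": "A"},
    {"student": 2, "class": "A", "name": "Ben", "grade": "A-"},
]

full_key_to_grade = holds(rows, ["student", "class"], ["grade"])
part_key_to_name = holds(rows, ["student"], ["name"])
part_key_to_grade = holds(rows, ["student"], ["grade"])
print(full_key_to_grade, part_key_to_name, part_key_to_grade)
```

Here `name` depends on `student` alone, which is only part of the primary key, so the dependency on the key is not full; moving `name` into a separate student relation is the kind of step that normalization prescribes.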
Relational Algebra Operations
Select Project Product Union Intersect Difference Join Divide
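Three of these operations (select, project, and join) can be sketched in Python by treating a relation as a list of dictionaries. This is an illustration of the ideas only, not how a DBMS implements them, and the supplier and part data are made up.

```python
def select(relation, predicate):
    """Select: keep the rows that satisfy a condition."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Project: keep only the named columns, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


def natural_join(left, right):
    """Join: pair rows that agree on every column the two relations share."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result


# Hypothetical relations.
suppliers = [{"sno": 1, "city": "Berkeley"}, {"sno": 2, "city": "Oakland"}]
supplies = [{"sno": 1, "pno": "P1"}, {"sno": 1, "pno": "P2"}, {"sno": 2, "pno": "P1"}]

# Cities of suppliers that supply part P1.
cities = project(
    select(natural_join(suppliers, supplies), lambda row: row["pno"] == "P1"),
    ["city"],
)
print(cities)
```

Each operation takes relations and returns a relation, which is what lets them be nested as in the last expression.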
Effectiveness and Efficiency Issues for DBMS
Focus on the relational model
Any column in a relational database can be searched for values.
To improve efficiency, indexes using storage structures such as B-trees and hashing are used
But many useful functions are not indexable and require complete scans of the database
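The contrast between an indexed lookup and a complete scan can be imitated in Python, with a dictionary standing in for a hash index. This is a toy sketch with made-up rows; real systems keep their indexes on disk and choose between plans automatically.

```python
from collections import defaultdict


def build_hash_index(rows, column):
    """Map each value of one column to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def lookup(rows, index, value):
    """Exact-match search: goes straight to the matching rows."""
    return [rows[position] for position in index.get(value, [])]


def scan(rows, predicate):
    """Arbitrary condition: every row has to be examined."""
    return [row for row in rows if predicate(row)]


rows = [
    {"id": 1, "title": "Cataloging basics"},
    {"id": 2, "title": "Subject cataloging"},
    {"id": 3, "title": "Cataloging basics"},
]
title_index = build_hash_index(rows, "title")

exact = lookup(rows, title_index, "Cataloging basics")
substring = scan(rows, lambda row: "cataloging" in row["title"].lower())
print([r["id"] for r in exact], [r["id"] for r in substring])
```

The exact-match query is answered from the index, but the case-insensitive substring condition cannot use it and falls back to looking at every row.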
Advantages of RDBMS
Possible to design complex data storage and retrieval systems with ease (and without conventional programming).
Support for ACID transactions
– Atomic
– Consistent
– Isolated
– Durable
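Atomicity, the first of these properties, is easy to demonstrate with Python's built-in `sqlite3` module. The sketch below uses an invented `account` table with a rule that balances may not go negative. A transfer consists of two updates, and if the second one fails the first is undone as well.

```python
import sqlite3


def make_bank():
    """In-memory database with two accounts and a no-overdraft rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the starting balances before any transfers
        conn.executemany("INSERT INTO account VALUES (?, ?)",
                         [("a", 100), ("b", 0)])
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit: both happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


bank = make_bank()
too_much = transfer(bank, "a", "b", 150)   # debit would overdraw account a
after_failure = balances(bank)
ok = transfer(bank, "a", "b", 40)
after_success = balances(bank)
print(too_much, after_failure, ok, after_success)
```

In the failed transfer the credit to account b has already been applied when the debit is rejected; the rollback removes it, so no money appears from nowhere.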
Advantages of RDBMS
Support for very large databases
Automatic optimization of searching (when possible)
RDBMS have a simple view of the database that conforms to much of the data used in businesses.
Standard query language (SQL)
Disadvantages of RDBMS
Until recently, no support for complex objects such as documents, video, images, spatial or time-series data. (ORDBMS are adding support for these.)
Often poor support for storage of complex objects. (Disassembling the car to park it in the garage)
Still no efficient and effective integrated support for things like text searching within fields.
Study hard, and good luck!
Thank you for all the great work!