Data Storage and Access Methods Min Song IS698. Database Design Process Conceptual Model Logical...

Data Storage and Access Methods

Min Song, IS698

Database Design Process

[Figure: the database design process. Conceptual requirements from Applications 1-4 feed the Conceptual Model, followed by the Logical Model, the Internal Model and Physical Design; External Models serve the individual applications.]

Physical Database Design

• Many physical database design decisions are implicit in the technology adopted.
• Also, organizations may have standards or an “information architecture” that specifies operating systems, DBMS, and data access languages, thus constraining the range of possible physical implementations.
• We will be concerned with some of the possible physical implementation issues.

Physical Database Design

The primary goal of physical database design is data processing efficiency

We will concentrate on choices often available to optimize performance of database services

Physical Database Design requires information gathered during earlier stages of the design process

Physical Design Information

Information needed for physical file and database design includes:
• Normalized relations plus size estimates for them
• Definitions of each attribute
• Descriptions of where and when data are used: entered, retrieved, deleted, updated, and how often
• Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
• Descriptions of the technologies used to implement the database

Physical Design Decisions

There are several critical decisions that will affect the integrity and performance of the system:
• Storage format
• Physical record composition
• Data arrangement
• Indexes
• Query optimization and performance tuning

Storage Format

Choosing the storage format of each field (attribute). The DBMS provides some set of data types that can be used for the physical storage of fields in the database

Data Type (format) is chosen to minimize storage space and maximize data integrity

Objectives of data type selection:
• Minimize storage space
• Represent all possible values
• Improve data integrity
• Support all data manipulations

The correct data type should, in minimal space, represent every possible value (but eliminate illegal values) for the associated attribute and support the required data manipulations (e.g. numerical or string operations).

Access Data Types

• Numeric (1, 2, 4, 8 bytes, fixed or float)
• Text (255 max)
• Memo (64000 max)
• Date/Time (8 bytes)
• Currency (8 bytes, 15 digits + 4 digits decimal)
• Autonumber (4 bytes)
• Yes/No (1 bit)
• OLE (limited only by disk space)
• Hyperlinks (up to 64000 chars)

Access Numeric Types

• Byte: stores numbers from 0 to 255 (no fractions). 1 byte.
• Integer: stores numbers from -32,768 to 32,767 (no fractions). 2 bytes.
• Long Integer (default): stores numbers from -2,147,483,648 to 2,147,483,647 (no fractions). 4 bytes.
• Single: stores numbers from -3.402823E38 to -1.401298E-45 for negative values and from 1.401298E-45 to 3.402823E38 for positive values. 4 bytes.
• Double: stores numbers from -1.79769313486231E308 to -4.94065645841247E-324 for negative values and from 4.94065645841247E-324 to 1.79769313486231E308 for positive values. Decimal precision 15. 8 bytes.
• Replication ID: globally unique identifier (GUID). Decimal precision N/A. 16 bytes.
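As an illustration of "minimal space, every possible value", the sketch below picks the smallest of the integer field sizes listed above for a given value range. The function name and the table layout are assumptions made for this example; they are not part of Access.

```python
# Integer ranges as listed above: (name, minimum, maximum, storage bytes).
INTEGER_TYPES = [
    ("Byte", 0, 255, 1),
    ("Integer", -32_768, 32_767, 2),
    ("Long Integer", -2_147_483_648, 2_147_483_647, 4),
]


def smallest_integer_type(lowest, highest):
    """Return (type name, bytes) of the smallest listed type covering [lowest, highest]."""
    for name, type_min, type_max, size in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name, size
    raise ValueError("range does not fit any of the listed integer types")


if __name__ == "__main__":
    # Hypothetical attributes: an age, a signed balance, a row counter.
    for low, high in [(0, 120), (-500, 30_000), (0, 1_000_000)]:
        print((low, high), smallest_integer_type(low, high))
```

Choosing Byte for a 0-120 attribute also rejects illegal values such as -1 or 300, which is the integrity side of the same decision.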

Designing Physical Records

A physical record is a group of fields stored in adjacent memory locations and retrieved together as a unit

Fixed-length and variable-length fields

Data Storage

Storing Data:
• Disks
• Buffer manager
• Representing relational data on a disk

The Memory Hierarchy

[Figure: the memory hierarchy, fastest to slowest]

• Processor Cache: access time 10 ns; 512K
• Main Memory = Disk Cache: volatile; 256M-1G; access time 10-100 nanoseconds
• Disk: persistent; 10-100 GB storage; speed: rate = 5-10 MB/s, access time = 10-15 msecs
• Tape: 1.5 MB/s transfer rate; 280 GB typical capacity; only sequential access; not for operational data

Main Memory

• Fastest, most expensive (excluding cache)
• Today: 512MB are common even on PCs
• Many databases could fit in memory
• New industry trend: Main Memory Database, e.g. TimesTen
• Main issue is volatility

Secondary Storage

Disks
• Slower, cheaper than main memory
• Persistent!
• The unit of disk I/O = block; typically 1 block = 4k
• A disk block is also called a disk page or simply a page
• Used with a main memory buffer

Block

Blocking factor (bfr) for a file is the average number of records stored in a disk block.

Suppose the block size of a database system is 2000 bytes. Customer table has an average record length of 190 bytes. Assume the overhead of a block for the data is 100 bytes. What is the blocking factor?
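The question above can be worked through with a short sketch. It assumes unspanned records (a record is never split across blocks), so the blocking factor is the usable space divided by the record length, rounded down. The function names and the 5,000-record file size are hypothetical.

```python
def blocking_factor(block_size, block_overhead, record_length):
    """Records per block, assuming records are not spanned across blocks."""
    usable_bytes = block_size - block_overhead
    return usable_bytes // record_length


def blocks_needed(num_records, bfr):
    """Number of blocks required to hold num_records (ceiling division)."""
    return -(-num_records // bfr)


if __name__ == "__main__":
    bfr = blocking_factor(block_size=2000, block_overhead=100, record_length=190)
    print("blocking factor:", bfr)
    # Hypothetical file size, only to show how bfr is used afterwards.
    print("blocks for 5000 customers:", blocks_needed(5000, bfr))
```

With the slide's numbers, (2000 - 100) / 190 = 10 exactly, so each block holds 10 Customer records.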

The Mechanics of Disk

Mechanical characteristics:
• Rotation speed (5400 RPM)
• Number of platters (1-30)
• Number of tracks (<= 10000)
• Number of sectors (256/track)
• Number of bytes/sector (2^9 = 512)
• Block size (2^12 = 4096)

[Figure: disk anatomy, showing platters, spindle, disk head, arm assembly, arm movement, tracks, sector and cylinder.]

Important Disk Access Characteristics

• Block access time = disk latency + transfer time
• Disk latency = seek time + rotational latency
• Seek time = time for the head to reach the right track: 10ms - 40ms
• Rotational latency = rotation time to get to the right sector
  Time for one rotation = 10ms
  Average rotational latency = 10ms/2
• Transfer rate = typically 5-10 MB/s (transfer time = block size / transfer rate)
• Disks read/write one block at a time (typically 4kB)
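The formulas above can be combined into a small calculation. The sketch below uses the slide's figures (10 ms seek, 10 ms per rotation, 4096-byte block, 5 MB/s); the function name is an assumption, and 1 MB is taken as 1,000,000 bytes.

```python
def block_access_time_ms(seek_ms, rotation_ms, block_bytes, transfer_mb_per_s):
    """Block access time = seek time + average rotational latency + transfer time."""
    rotational_latency_ms = rotation_ms / 2  # on average, half a rotation
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms + transfer_ms


if __name__ == "__main__":
    t = block_access_time_ms(seek_ms=10, rotation_ms=10,
                             block_bytes=4096, transfer_mb_per_s=5)
    print(f"one random block read: about {t:.2f} ms")
```

Under these assumptions the transfer itself takes under 1 ms; seek time and rotational latency dominate, which is why reducing the number of random block accesses matters so much.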

Representing Data Elements

Relational database elements:

CREATE TABLE Product (
    pid INT PRIMARY KEY,
    name CHAR(20),
    description VARCHAR(200),
    maker CHAR(10) REFERENCES Company(name)
)

A tuple is represented as a record

Record Formats: Fixed Length

Information about field types same for all records in a file; stored in system catalogs.

Finding the i’th field does not require a scan of the record: its address is the base address plus the lengths of the preceding fields. Note the importance of schema information!

[Figure: a record starting at base address B with fields F1-F4 of lengths L1-L4. The address of F3 is B + L1 + L2.]
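The address rule B + L1 + L2 can be tried out with Python's `struct` module. This is a minimal sketch under stated assumptions: only the fixed-length columns of the Product table are used (pid, name, maker), the variable-length description column is left out, and the little-endian unpadded layout is a choice made for the example.

```python
import struct

# Field names and struct formats; "<" means little-endian with no padding.
FIELDS = [("pid", "i"), ("name", "20s"), ("maker", "10s")]
RECORD = struct.Struct("<" + "".join(fmt for _, fmt in FIELDS))


def field_offset(i):
    """Offset of the i'th field (0-based): the sum of the preceding field lengths."""
    return sum(struct.calcsize("<" + fmt) for _, fmt in FIELDS[:i])


def read_field(buf, base, i):
    """Read field i of the record that starts at byte `base`, without scanning it."""
    _, fmt = FIELDS[i]
    (value,) = struct.unpack_from("<" + fmt, buf, base + field_offset(i))
    if isinstance(value, bytes):
        return value.rstrip(b"\x00").decode()
    return value


if __name__ == "__main__":
    # Two hypothetical records stored back to back in one buffer.
    buf = RECORD.pack(1, b"gizmo", b"Initech") + RECORD.pack(2, b"widget", b"Acme")
    print(RECORD.size, field_offset(2), read_field(buf, RECORD.size, 2))
```

Because every record has the same layout, the offsets are computed once from the schema and reused for every record in the file.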

Record Header

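[Figure: a record with a header followed by fields F1-F4 of lengths L1-L4. The header holds a pointer to the schema, the record length and a timestamp.]

Need the header because:
• The schema may change; for a while new and old may coexist
• Records from different relations may coexist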

Variable Length Records

[Figure: a record with a header (record length plus other header information) followed by fields F1-F4 of lengths L1-L4.]

• Place the fixed fields first: F1, F2
• Then the variable-length fields: F3, F4
• Null values take 2 bytes only
• Sometimes they take 0 bytes (when at the end)
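One possible encoding of such a record is sketched below. The layout is an assumption made for illustration, not the format of any particular DBMS: a 2-byte record length as the header, two fixed fields (an integer and a 20-byte string), then two variable-length strings, each preceded by a 2-byte length. A null is stored as a reserved 2-byte length marker with no data, which is one way a null can "take 2 bytes only".

```python
import struct

NULL = 0xFFFF  # assumed marker: a null variable-length field is just this length
FIXED = "<i20s"  # F1: integer, F2: CHAR(20)


def encode_record(f1, f2, f3, f4):
    """f1: int, f2: str (fixed length); f3, f4: str or None (variable length)."""
    body = struct.pack(FIXED, f1, f2.encode())
    for value in (f3, f4):
        if value is None:
            body += struct.pack("<H", NULL)
        else:
            data = value.encode()
            body += struct.pack("<H", len(data)) + data
    # Header: total record length in bytes, header included.
    return struct.pack("<H", len(body) + 2) + body


def decode_record(buf):
    (length,) = struct.unpack_from("<H", buf, 0)
    f1, f2 = struct.unpack_from(FIXED, buf, 2)
    pos = 2 + struct.calcsize(FIXED)
    variable = []
    for _ in range(2):
        (n,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        if n == NULL:
            variable.append(None)
        else:
            variable.append(buf[pos:pos + n].decode())
            pos += n
    if pos != length:
        raise ValueError("record length in header does not match contents")
    return f1, f2.rstrip(b"\x00").decode(), variable[0], variable[1]


if __name__ == "__main__":
    rec = encode_record(7, "widget", "a small part", None)
    print(len(rec), decode_record(rec))
```

Putting the fixed fields first means their offsets are still known from the schema; only the variable-length tail has to be walked.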

Records With Referencing Fields

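[Figure: a record with a header (record length plus other header information) followed by fields F1-F3 of lengths L1-L3, where a field references another record.]

E.g. to represent one-many or many-many relationships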

Storing Records in Blocks

Blocks have fixed size (typically 4k)

[Figure: a block holding records R1, R2 and R3.]

Spanning Records Across Blocks

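• When records are very large
• Or even medium size: saves space in blocks

[Figure: two blocks, each with a block header. R1 and the first part of R2 are in the first block; the rest of R2 and R3 are in the second.]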

Binary large objects

• Supported by modern database systems
• E.g. images, sounds, etc.
• Storage: attempt to cluster blocks together

Modifications: Insertion

• File is unsorted: add it to the end
• File is sorted:
  Is there space in the right block? Yes: we are lucky, store it there
  Is there space in a neighboring block? Look 1-2 blocks to the left/right, shift records
  If everything else fails, create an overflow block
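The decision sequence for a sorted file can be sketched as follows. This is a simplified illustration under stated assumptions: blocks are Python lists of keys with a hypothetical capacity of 3, only the immediate left and right neighbors are checked (the slide allows 1-2 blocks), and overflow blocks are modeled as one list per home block.

```python
import bisect

CAPACITY = 3  # records per block (hypothetical)


def insert_sorted(blocks, overflow, key):
    """Insert key into a sorted file; return a description of what happened.

    blocks: list of sorted lists, ordered across blocks.
    overflow: dict mapping a block index to its overflow records.
    """
    # The "right" block is the first whose largest key is >= key, else the last block.
    i = next((b for b, blk in enumerate(blocks) if blk and blk[-1] >= key),
             len(blocks) - 1)
    if len(blocks[i]) < CAPACITY:
        bisect.insort(blocks[i], key)
        return "stored in the right block"
    if i + 1 < len(blocks) and len(blocks[i + 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i + 1].insert(0, blocks[i].pop())   # shift largest record right
        return "shifted a record to the right neighbor"
    if i > 0 and len(blocks[i - 1]) < CAPACITY:
        bisect.insort(blocks[i], key)
        blocks[i - 1].append(blocks[i].pop(0))     # shift smallest record left
        return "shifted a record to the left neighbor"
    overflow.setdefault(i, []).append(key)
    return "created or extended an overflow block"


if __name__ == "__main__":
    blocks = [[10, 20, 30], [40, 50], [60, 70, 80]]
    overflow = {}
    for key in (25, 65):
        print(key, "->", insert_sorted(blocks, overflow, key))
    print(blocks, overflow)
```

Shifting keeps the main file in key order at the cost of touching a second block; the overflow block avoids that cost but makes later lookups slower.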

Overflow Blocks

After a while the file starts being dominated by overflow blocks: time to reorganize

[Figure: consecutive blocks n-1, n and n+1, with an overflow block attached to block n.]

Modifications: Deletions

• Free space in block, shift records
• May be able to eliminate an overflow block

Modifications: Updates

If new record is shorter than previous, easy

If it is longer, need to shift records, create overflow blocks

Physical Addresses

Each block and each record have a physical address that consists of:
• The host
• The disk
• The cylinder number
• The track number
• The block within the track
• For records: an offset in the block (sometimes this is in the block’s header)

Logical Addresses

Logical address: a string of bytes (10-16)

More flexible: can move blocks/records around

But need translation table:

Logical address | Physical address
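A translation table of this kind can be sketched as a dictionary from logical to physical addresses. The class name, the method names and the sample address values below are hypothetical; the physical address fields follow the list given under Physical Addresses.

```python
from collections import namedtuple

# Components of a physical record address, as listed under Physical Addresses.
PhysicalAddress = namedtuple(
    "PhysicalAddress", "host disk cylinder track block offset")


class TranslationTable:
    """Maps logical addresses (opaque byte strings) to physical addresses."""

    def __init__(self):
        self._table = {}

    def register(self, logical, physical):
        self._table[logical] = physical

    def resolve(self, logical):
        return self._table[logical]

    def move(self, logical, new_physical):
        # Only the table entry changes; references that use the logical
        # address stay valid after the record has been moved.
        self._table[logical] = new_physical


if __name__ == "__main__":
    table = TranslationTable()
    rid = b"rec-0000000042"  # hypothetical logical address
    table.register(rid, PhysicalAddress("db1", 0, 17, 3, 5, 128))
    table.move(rid, PhysicalAddress("db1", 1, 4, 0, 2, 0))
    print(table.resolve(rid))
```

The extra lookup is the price of flexibility: every access through a logical address costs one translation.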

Main Memory Address

When the block is read in main memory, it receives a main memory address

Buffer manager has another translation table:

Memory address | Logical address

Designing Physical/Internal Model

• Overview
• Terminology
• Access methods

Physical Design

[Figure: Internal Model/Physical Model. A user request reaches the DBMS through the External Model (Interface 1); the DBMS uses Internal Model Access Methods (Interface 2); these and the Operating System Access Methods reach the Data Base (Interface 3).]

Physical Design

Interface 1: User request to the DBMS. The user presents a query, and the DBMS determines which physical DBs are needed to resolve the query.

Interface 2: The DBMS uses an internal model access method to access the data stored in a logical database.

Interface 3: The internal model access methods and OS access methods access the physical records of the database.

Physical File Design

A physical file is a portion of secondary storage (disk space) allocated for the purpose of storing physical records

Pointers - a field of data that can be used to locate a related field or record of data

Access Methods - An operating system algorithm for storing and locating data in secondary storage

Pages - The amount of data read or written in one disk input or output operation

Internal Model Access Methods

Many types of access methods:
• Physical Sequential
• Indexed Sequential
• Indexed Random
• Inverted
• Direct
• Hashed

Differences in:
• Access efficiency
• Storage efficiency

Physical Sequential

• Key values of the physical records are in logical sequence
• Main use is for “dump” and “restore”
• Access method may be used for storage as well as retrieval
• Storage efficiency is near 100%
• Access efficiency is poor (unless fixed-size physical records)

Indexed Sequential

• Key values of the physical records are in logical sequence
• Access method may be used for storage and retrieval
• Index of key values is maintained with entries for the highest key values per block(s)
• Access efficiency depends on the levels of index, storage allocated for index, number of database records, and amount of overflow
• Storage efficiency depends on size of index and volatility of database

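[Figure: indexed sequential data file. The index (columns: Actual Value, Address Block Number) includes entries such as Dumpling and Texaci. Data file: Block 1 holds Adams, Becker, Dumpling; Block 2 holds Getta, Harty; Block 3 holds Mobile, Sunoci, Texaci.]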
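A single-level indexed sequential lookup can be sketched as a binary search over the highest key of each block, followed by a scan of one block. The names come from the figure; the assignment of names to blocks and the function name are assumptions made for this example.

```python
import bisect

# Data blocks with keys in logical sequence (block contents are an assumption).
BLOCKS = [
    ["Adams", "Becker", "Dumpling"],
    ["Getta", "Harty"],
    ["Mobile", "Sunoci", "Texaci"],
]
# The index keeps one entry per block: its highest key value.
INDEX = [block[-1] for block in BLOCKS]


def lookup(key):
    """Return (block number, found). Block numbers start at 1 as in the figure."""
    pos = bisect.bisect_left(INDEX, key)  # first block whose highest key >= key
    if pos == len(INDEX):
        return None, False                # key is larger than every indexed key
    return pos + 1, key in BLOCKS[pos]


if __name__ == "__main__":
    for name in ("Becker", "Harty", "Cole", "Zed"):
        print(name, lookup(name))
```

Only one data block is read per lookup, which is why access efficiency depends mainly on the number of index levels and the amount of overflow.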

Indexed Sequential: Two Levels

[Figure: two-level index. Each index block lists (Key Value, Address) pairs; key values shown in the figure include 001, 003, 455, 480, 605, 610, 705 and 710.]

Indexed Random

• Key values of the physical records are not necessarily in logical sequence
• Index may be stored and accessed with Indexed Sequential Access Method
• Index has an entry for every database record. These are in ascending order. The index keys are in logical sequence. Database records are not necessarily in ascending sequence.
• Access method may be used for storage and retrieval

Indexed Random

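[Figure: indexed random file. The index (columns: Actual Value, Address Block Number) lists keys in ascending order, e.g. Becker, Dumpling. The data blocks are not in key order: one holds Becker and Harty, another Adams and Getta, another Dumpling.]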

Btree

[Figure: B-tree. Root node: F | P | Z. Second-level nodes: B | D | F, H | L | P, and R | S | Z. Data records: Aces, Boilers, Cars, Devils, Flyers, Hawkeyes, Hoosiers, Minors, Panthers, Seminoles.]

Inverted

• Key values of the physical records are not necessarily in logical sequence
• Access method is better used for retrieval
• An index for every field to be inverted may be built
• Access efficiency depends on number of database records, levels of index, and storage allocated for index

Inverted

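[Figure: inverted file. Index on Course Number (columns: Actual Value, Address Block Number): CH 145: 101, 103, 104; CS 201: 102; CS 623: 105, 106; PH 345 (addresses not legible). Data records hold Student name and Course Number; names shown include Becker, Dumpling and Mobile.]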
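Building an inverted index over one field can be sketched as below. The course-to-address lists match the figure; the pairing of student names with addresses is hypothetical, as are the function and variable names.

```python
from collections import defaultdict

# Record address -> record. Student/address pairings are made up for illustration.
RECORDS = {
    101: {"student": "Becker", "course": "CH 145"},
    102: {"student": "Becker", "course": "CS 201"},
    103: {"student": "Dumpling", "course": "CH 145"},
    104: {"student": "Mobile", "course": "CH 145"},
    105: {"student": "Dumpling", "course": "CS 623"},
    106: {"student": "Mobile", "course": "CS 623"},
}


def build_inverted_index(records, field):
    """Map each value of `field` to the sorted list of record addresses holding it."""
    index = defaultdict(list)
    for address in sorted(records):
        index[records[address][field]].append(address)
    return dict(index)


if __name__ == "__main__":
    by_course = build_inverted_index(RECORDS, "course")
    by_student = build_inverted_index(RECORDS, "student")
    print(by_course["CH 145"])
    # Multiple-key retrieval: intersect the address lists of two indexes.
    print(sorted(set(by_course["CS 623"]) & set(by_student["Mobile"])))
```

One such index can be built for every field to be inverted, which speeds up retrieval but means every insert or update must maintain all of them.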

Direct

• Key values of the physical records are not necessarily in logical sequence
• There is a one-to-one correspondence between a record key and the physical address of the record
• May be used for storage and retrieval
• Access efficiency always 1
• Storage efficiency depends on density of keys
• No duplicate keys permitted

Hashing

• Key values of the physical records are not necessarily in logical sequence
• Many key values may share the same physical address (block)
• May be used for storage and retrieval
• Access efficiency depends on distribution of keys, algorithm for key transformation and space allocated
• Storage efficiency depends on distribution of keys and algorithm used for key transformation
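The dependence on key distribution and allocated space can be seen in a small sketch of a hashed file. The key transformation used here is division-remainder, chosen only as a common example; the class name, block count, capacity and the single overflow list per home block are all assumptions.

```python
class HashedFile:
    """Hashed file with fixed-capacity home blocks and one overflow list per block."""

    def __init__(self, num_blocks, capacity):
        self.capacity = capacity
        self.blocks = [[] for _ in range(num_blocks)]
        self.overflow = [[] for _ in range(num_blocks)]

    def address(self, key):
        # Key transformation: division-remainder (an assumed choice).
        return key % len(self.blocks)

    def store(self, key):
        a = self.address(key)
        home = self.blocks[a]
        target = home if len(home) < self.capacity else self.overflow[a]
        target.append(key)
        return a

    def find(self, key):
        """Return the number of block reads needed (1 or 2), or None if absent."""
        a = self.address(key)
        if key in self.blocks[a]:
            return 1
        if key in self.overflow[a]:
            return 2
        return None


if __name__ == "__main__":
    f = HashedFile(num_blocks=5, capacity=2)
    for k in (12, 27, 42, 33):   # 12, 27 and 42 all map to block 2
        f.store(k)
    print([(k, f.find(k)) for k in (12, 42, 33, 99)])
```

Keys that spread evenly across blocks are found in one read; clustered keys spill into overflow and cost more, while allocating more blocks trades storage efficiency for fewer collisions.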

Comparative Access Methods

Factor | Sequential | Indexed | Hashed
Storage space | No wasted space | No wasted space for data but extra space for index | More space needed for addition and deletion of records after initial load
Sequential retrieval on primary key | Very fast | Moderately fast | Impractical
Random retrieval | Impractical | Moderately fast | Very fast
Multiple key retrieval | Possible but needs a full scan | Very fast with multiple indexes | Not possible
Deleting records | Can create wasted space | OK if dynamic | Very easy
Adding records | Requires rewriting file | OK if dynamic | Very easy
Updating records | Usually requires rewriting file | Easy but requires maintenance of indexes | Very easy
