
Managing the Database

Objectives of the Lecture:

• To consider the roles of the Database Administrator.
• To consider the involvement of the DBMS in the storage and handling of physical data.
• To appreciate different kinds of file organisation and access method.
• To appreciate the need for meta data.

Database Administrator(s)

This can vary, from 1 person part-time to a full-time team.

Depends on the nature, size and usage of the DB:
• Large DB shared by many users/applications - traditional.
• Relatively small DB for one person/team/application.
• Very large data warehouse DB for data mining.
Different DBMSs may be used for different purposes, and different levels of technical support are required.

Depends on how work is allocated w.r.t. other computer staff. The DBA may or may not be involved in:
• application development/support;
• computer system/network support.

Why is DB Management Needed?

A DB is a coherent, integrated collection of data. The DB needs to be managed because:
• whether the coherence applies to data for one application or many, the coherence must be created by design, and maintained as the DB evolves;
• data is now accepted as a valuable organisational asset that must be cared for;
• physical data independence implies the performance tuning of the DB’s physical storage.

These needs give rise to the DB management tasks.

DBA’s ‘Coherence’ Tasks

Creation of the DB (some or all of the following, as required):
• Obtaining a suitable DBMS.
• Design of the Logical Schema.
• Design of Sub Schema(s).
• Design of Physical Schema.
• Provision of suitable hardware.
• Implementation of the DB design.
• Insertion/loading of valid data into the DB.

Maintenance & Extension of the DB:
• A few/some/all of the ‘creation’ tasks as appropriate.

Liaising with End Users, Application Developers and Systems Staff, to meet their aims & enforce realistic constraints.

DBA’s ‘Caring’ Tasks

• Maintaining security of the DB. Protection against unauthorised access: ensure that a requested operation on a requested object by a requesting user is acceptable. Defence in depth is needed, e.g. audit trails, data encryption.
• Protecting the DB against loss or damage. A backup copy of the DB plus a Transaction Log is needed. From the latest valid copy of the DB, roll forward through the transaction log, repeating transactions till the current DB state is restored.
• Maintaining Standards. These are needed for procedures, software, documentation, etc. to support other DB activities and ensure their effectiveness.
• Day-to-day maintenance operations. Managing DB restarts, keeping backup data, correcting DB errors, investigating problems, updating users’ authorisation, etc.
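The roll-forward recovery described under protecting the DB against loss or damage can be sketched as follows. This is a minimal illustration only: the DB is modelled as a Python dict, and the log format (`"put"` / `"delete"` tuples) and the name `roll_forward` are assumptions made for the example, not any real DBMS’s log format.

```python
# Hypothetical model: the DB is a dict of account balances, and the
# transaction log is an ordered list of updates committed after the backup.

def roll_forward(backup, log):
    """Rebuild the current DB state from the latest valid backup copy."""
    db = dict(backup)                  # work on a copy; the backup stays intact
    for op, key, value in log:         # repeat transactions in their original order
        if op == "put":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {op}")
    return db

backup = {"123456": 100, "000212": 50}
log = [
    ("put", "123456", 80),        # balance changed after the backup was taken
    ("put", "434565", 10),        # new account opened
    ("delete", "000212", None),   # account closed
]
restored = roll_forward(backup, log)
print(restored)
```

The order of the log matters: replaying the same entries in a different order could give a different final state, which is why the log is kept as a sequence.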

DBA’s ‘Performance Tuning’ Tasks (1)

The purpose of Physical Data Independence is to allow a relation’s data to be physically stored in many different ways without this affecting what a user/programmer writes in their (SQL) statements to use that relation.

Thus if a relation’s data is moved from one physical storage arrangement to another, the user/programmer is unaware of it.

DBA can and should change a relation’s physical storage if performance can be improved.
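As an illustration of physical data independence, the sketch below uses Python’s built-in `sqlite3` module. The table `customer`, its columns and the index name are invented for the example. The same query text is run before and after an index is added; only the DBMS’s execution plan changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (acc_no TEXT, name TEXT, balance INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("123456", "Smith", 100), ("434565", "Jones", 250), ("000212", "Patel", 75)],
)

QUERY = "SELECT * FROM customer WHERE acc_no = '123456'"

def plan(sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)   # last column holds the detail text

rows_before, plan_before = conn.execute(QUERY).fetchall(), plan(QUERY)

# The 'DBA' changes the physical design; the query text is untouched.
conn.execute("CREATE INDEX customer_acc_no ON customer (acc_no)")

rows_after, plan_after = conn.execute(QUERY).fetchall(), plan(QUERY)

print(plan_before)   # a full scan of the table
print(plan_after)    # a search using the new index
```

The exact wording of the plan text varies between SQLite versions, so the example prints it rather than relying on a fixed string.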

DBA’s ‘Performance Tuning’ Tasks (2)

Purpose - optimise the trade-off between:
• User level - update & retrieval, different users’/applications’ needs;
• Hardware level - hard disc, RAM, CPU and network usage.

Design the initial Physical Schema:
• Map base relations & views to physical files.
• Decide file locations w.r.t. discs and network nodes.
• Decide record formats, file organisation & access.

Monitor usage and performance of the DB.

Amend the Physical Schema when altered usage &/or requirements demand it, and when the DB is extended/altered.

Consideration of Physical File Storage

In order to choose optimal physical file designs, the DBA must understand:
• How the DBMS handles statements input to it for execution.
• How a DBMS uses a computer’s memory for the storage and handling of a DB.
• How files are organised and accessed in physical storage.
• What happens at the physical level to execute a DB statement.
• The performance characteristics of different physical file types.

These topics are now reviewed. The intent is not to show how a DBA can optimise a DB’s physical file design - this is a very large subject on its own - but to give sufficient background information to appreciate the nature of the problem.

DBMS’s Execution of a Statement

On receipt of a (SQL) statement, the DBMS:
1. Determines what must be done to execute it; i.e. tokenises and parses it.
2. Follows the mappings between Sub, Logical and Physical Schemas to determine what data to physically read/write from/to disc.
3. Optimises the method of execution.
4. Executes the statement.

The DBMS must output data in the form of relations, even when a relation’s data is physically stored in a quite different way.

Successful optimisation depends on the DBA’s choice of physical file designs for data storage, as well as the DBMS’s optimiser.

Example Retrieval

Example SQL query:

    SELECT *
    FROM Customer
    WHERE ACC_NO = '123456';

Let ‘Customer’ be a view.

The DBMS:
• Determines what the statement means (tokenise & parse).
• Gets the definition of the view ‘Customer’.
• Translates the query into a logical equivalent using base table(s).
• Gets the location of the file(s) holding the base table data, and their organisation(s) & access method(s).
• Determines the optimum query method.
• Executes the query.

The steps that fetch the view definition and the file details use schema data.
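The view-handling steps of the example retrieval can be tried out with Python’s built-in `sqlite3` module. This is a sketch under assumptions: the slide does not say what ‘Customer’ is defined over, so the base table `account_base` and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table underlying the 'Customer' view.
conn.execute("CREATE TABLE account_base (ACC_NO TEXT, NAME TEXT, CLOSED INTEGER)")
conn.executemany(
    "INSERT INTO account_base VALUES (?, ?, ?)",
    [("123456", "Smith", 0), ("434565", "Jones", 0), ("000212", "Patel", 1)],
)
conn.execute(
    "CREATE VIEW Customer AS "
    "SELECT ACC_NO, NAME FROM account_base WHERE CLOSED = 0"
)

# What the user writes: a query against the view.
via_view = conn.execute(
    "SELECT * FROM Customer WHERE ACC_NO = '123456'"
).fetchall()

# A logically equivalent query on the base table, of the kind the DBMS
# could derive by substituting the view definition into the user's query.
via_base = conn.execute(
    "SELECT ACC_NO, NAME FROM account_base "
    "WHERE CLOSED = 0 AND ACC_NO = '123456'"
).fetchall()

# The view definition itself is schema data held by the DBMS.
view_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'Customer'"
).fetchone()[0]

print(via_view, via_base)
print(view_sql)
```

The user never needs to know that `account_base` exists; both queries return the same relation.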

Executing the Example Retrieval

[Figure: the request passes down through three layers, and the result passes back up.]

• DBMS: “Get the record with customer account number 123456”
• File Manager (may be part of the DBMS or the Operating System): “That’s on the 27th page of the file called ‘customer’”
• Disk Manager (part of the Operating System): “That’s on page 14 of cylinder 127”

Replies passed back up, in order:
• “Here’s the page you wanted”
• “Here’s the page of the file you wanted”
• “Here’s the record you wanted”
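The layering in this figure can be mimicked in a few lines of Python. Everything here is a hypothetical model: the class and method names, the dict-based ‘disc’ and the lookup tables are invented for illustration, and real file and disk managers are far more involved.

```python
class DiskManager:
    """Maps a page of a file to a physical address and fetches that page."""
    def __init__(self, page_map, disc):
        self.page_map = page_map   # (file name, file page) -> (cylinder, page)
        self.disc = disc           # (cylinder, page) -> list of records

    def get_page(self, file_name, file_page):
        cylinder, page = self.page_map[(file_name, file_page)]
        return self.disc[(cylinder, page)]

class FileManager:
    """Maps a record request to the file page that holds the record."""
    def __init__(self, disk_manager, locator):
        self.disk_manager = disk_manager
        self.locator = locator     # (file name, key) -> file page

    def get_record(self, file_name, key):
        file_page = self.locator[(file_name, key)]
        for record in self.disk_manager.get_page(file_name, file_page):
            if record["acc_no"] == key:
                return record
        return None

class DBMS:
    """Turns a user-level request into a record request on a named file."""
    def __init__(self, file_manager):
        self.file_manager = file_manager

    def get_customer(self, acc_no):
        return self.file_manager.get_record("customer", acc_no)

# Numbers taken from the figure: file page 27 lives at cylinder 127, page 14.
disc = {(127, 14): [{"acc_no": "434565", "name": "Jones"},
                    {"acc_no": "123456", "name": "Smith"}]}
disk_manager = DiskManager({("customer", 27): (127, 14)}, disc)
file_manager = FileManager(disk_manager, {("customer", "123456"): 27})
dbms = DBMS(file_manager)

record = dbms.get_customer("123456")
print(record)
```

Each layer only knows the mapping for its own level, which is what lets the physical layout change without the layers above noticing.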

Simplified Computer Architecture

[Figure: CPU and Random Access Memory (main), linked via I/O Control to Backing Storage (secondary), typically hard discs.]

Primary vs. Secondary Storage

Primary:
• Fast
• For volatile data
• Expensive
• Limited in size

Secondary:
• Slow
• For ‘permanent’ data
• Cheap
• Removable / Expandable

Magnetic/Hard Disks

• Rotating disc. Each surface comprises concentric tracks. Each track is split into blocks.
• Read/write head for each surface. The head moves across to the required track, waits till the required block comes underneath, then reads/writes data from/to the block.
• Discs may be ‘stacked’ onto one spindle, with each surface accessed simultaneously. Cylinder ≡ corresponding tracks on each surface, so 1 cylinder can be read/written in parallel.
• Operating systems read/write one page at a time. 1 page ≡ 1 / 2 / 4 / 8 / ... block(s).

Buffers & Cache Memory

Computers read/write data from/to secondary memory via Buffers. They are a special part of RAM whose purpose is to handle disc I/O. They are required for efficiency.

Buffer use strategies: Double Buffering (alternate filling & emptying), Read Ahead, etc.

[Figure: Buffer 1 and Buffer 2, with steps numbered 1 to 4.]

Disc Cache is a special kind of buffer - it holds frequently read data from disc, to minimise re-reading it from disc.

The DBMS must handle buffers.
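To show how a disc cache avoids re-reading pages, here is a minimal sketch. The replacement policy (least recently used) and the names `PageCache` and `disc_reads` are assumptions for the example; the slide does not say which policy a given DBMS or operating system uses.

```python
from collections import OrderedDict

class PageCache:
    """A tiny least-recently-used page cache in front of a slow 'disc'."""
    def __init__(self, disc, capacity):
        self.disc = disc               # page number -> page contents
        self.capacity = capacity       # how many pages fit in the cache
        self.cache = OrderedDict()     # oldest entry first, newest last
        self.disc_reads = 0

    def read(self, page_no):
        if page_no in self.cache:
            self.cache.move_to_end(page_no)    # mark as most recently used
            return self.cache[page_no]
        self.disc_reads += 1                   # cache miss: go to the disc
        page = self.disc[page_no]
        self.cache[page_no] = page
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used page
        return page

disc = {n: f"contents of page {n}" for n in range(1, 6)}
cache = PageCache(disc, capacity=2)
for page_no in [1, 2, 1, 1, 3, 1]:
    cache.read(page_no)
print(cache.disc_reads)   # six page requests, but fewer disc reads
```

A frequently requested page such as page 1 stays in the cache, so repeated requests for it cost no disc access.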

DB Use of Memory

DBs typically:
• need to be kept for a long time;
• are large, and so need a lot of memory.
So store them on Secondary Storage.

DB users typically require significant processing of data: picking out parts of relations, merging relations, doing calculations on stored data, sorting data, etc.
So read data into Primary Storage, and process the data there.

The DBMS handles all this for the user.

File Structure and Content

A file consists of a sequence of records. Records in a file may have:
• a fixed or variable structure,
• a fixed or variable length.

A record consists of a sequence of fields. Each field holds a value of a certain data type.

(A record often holds the values of one tuple.)

A disc block normally holds several records:

[Figure: a block holding three records, each a row of five data items.]

Records accumulate in a block till there is no further room in it.
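The way fixed-length records accumulate in a block can be sketched with Python’s standard `struct` module. The block size of 512 bytes and the three-field record layout are assumed figures chosen for the example.

```python
import struct

BLOCK_SIZE = 512                      # bytes per block: an assumed figure
RECORD = struct.Struct("<6s20si")     # acc_no, name, balance: 6 + 20 + 4 = 30 bytes

RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD.size

def pack_block(rows):
    """Fill one block with as many rows as fit; return (block, rows left over)."""
    fitting, left_over = rows[:RECORDS_PER_BLOCK], rows[RECORDS_PER_BLOCK:]
    body = b"".join(
        RECORD.pack(acc_no.encode(), name.encode(), balance)
        for acc_no, name, balance in fitting
    )
    return body.ljust(BLOCK_SIZE, b"\x00"), left_over   # pad unused space

def read_record(block, slot):
    """Unpack the record held in the given slot of a block."""
    acc_no, name, balance = RECORD.unpack_from(block, slot * RECORD.size)
    return acc_no.decode(), name.rstrip(b"\x00").decode(), balance

rows = [(f"{i:06d}", f"Customer {i}", i * 10) for i in range(20)]
block, left_over = pack_block(rows)
print(RECORDS_PER_BLOCK, len(block), len(left_over))
print(read_record(block, 16))
```

With fixed-length records the position of any record in a block is a simple multiplication, which is one reason fixed formats are cheap to access.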

A File

[Figure: a file as a sequence of pages (Page 1, Page 2, Page 3, etc.), each holding records, with free space and an end-of-file (EOF) marker.]

Disc Access Time

Access ≡ Read From OR Write To.

Disc access takes 10,000 - 1,000,000 times longer than RAM access.

Actual speeds continually improve. Despite variations due to the technology used, the relative speeds of RAM and disc are always hugely different, so it is always extremely important to minimise disc access times.

The time taken for disc access depends on:
• whether the access is Read or Write;
• precisely what data is to be accessed, e.g. 1 specific record, a certain range of records, all the file;
• file organisation ≡ how the records are laid out in the file;
• file access method ≡ how the required record(s) are found in the file.

Minimising Disc Access Time (1)

Problem:
• The file type (= organisation & method) that is best for one kind of user access (= read/write & data to be accessed) is worst for another.
• DB users have very varying access needs.

Solution:
• The DBA chooses suitable file types and clustering of files on pages.
• The DBMS automatically optimises within these parameters.

The DBA may also have to set other parameters, e.g. buffer space, page size, that are then used by the DBMS. Which parameters are possible and worthwhile depends on the individual DBMS.

Minimising Disc Access Time (2)

Actual strategies used:

Minimise data transfers to / from disc:
• Minimise the “search path” (= no. of pages read).
• Once in main memory, maximise use of each page.
• Keep “high hit” areas (e.g. index pages) in data cache.

Minimise disc handling:
• Minimise head movement: read whole cylinders.
• Minimise latency: read whole tracks.
• Choose the best compromise page size.
• Minimise CPU waiting: buffer I/O, read ahead.

The strategies available depend on the individual DBMS.

File Types

File Organisation:
• Serial File - new records added to the end of the file; records not in sequence; may mark a deleted record with a ‘tombstone’.
• Sequential File - all records maintained in order of the value(s) in one or more fields. (Could correspond to candidate key values.)

File Access Method:
• Sequential Access - go through the file’s pages in some sequence.
• Indexed Access - use an index to go straight to the required page. Many types of index: B-tree, Secondary, Bitmap, etc.
• Hashed Access - calculate the page location with a ‘hash algorithm’.
• Pointer Chain - a pointer in a retrieved record is used to access the next page.

Different combinations of organisation & access method give very different performance characteristics, so choose appropriately to serve the required access to the data.
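Of the access methods listed, hashed access is perhaps the least obvious, so a minimal sketch follows. The ‘hash algorithm’ here (key value modulo the number of pages) and the page count of 4 are deliberately simple assumptions; real DBMSs use more careful hash functions and must handle pages that overflow.

```python
N_PAGES = 4   # assumed number of pages in the hashed file

def page_for(key):
    """A deliberately simple 'hash algorithm': key value modulo page count."""
    return int(key) % N_PAGES

# Storing: the hash decides which page each record goes on.
pages = [[] for _ in range(N_PAGES)]
for key in ["434565", "123232", "675484", "329545", "976545", "000212"]:
    pages[page_for(key)].append(key)

def hashed_lookup(key):
    """Read exactly one page: the one the hash says must hold the key."""
    return key in pages[page_for(key)]

print(hashed_lookup("675484"), hashed_lookup("111111"))
```

A lookup for one specific key touches a single page however large the file is, but the same file gives no help with a range query, since neighbouring keys are scattered across pages.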

Serial Organisation, Sequential Access

[Figure: Page 1, Page 2, Page 3, etc. up to EOF, holding records in arrival order. Keys shown: 434565, 123232, 675484, 329545, 976545, 000212, 000002, 654737, 845655.]
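Sequential access to a serial file amounts to reading pages in file order until the record turns up. The sketch below uses the key values from the figure; grouping them three to a page is an assumption, since the extracted figure does not preserve the page layout.

```python
# Keys in arrival order; three per page is an assumed layout.
pages = [
    ["434565", "123232", "675484"],
    ["329545", "976545", "000212"],
    ["000002", "654737", "845655"],
]

def sequential_search(pages, key):
    """Read pages in file order until the key is found.

    Returns (found, number of pages read).
    """
    pages_read = 0
    for page in pages:
        pages_read += 1
        if key in page:
            return True, pages_read
    return False, pages_read

print(sequential_search(pages, "434565"))   # found on the first page
print(sequential_search(pages, "845655"))   # found only on the last page
print(sequential_search(pages, "111111"))   # absent: every page is read
```

Because the records are in no particular order, a search for a missing key cannot stop before the end of the file.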

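Sequential Organisation & Access

[Figure: Page 1, Page 2, Page 3, etc. up to EOF, holding records in key order. Keys shown (listed here in ascending order): 000002, 000212, 029545, 123232, 434565, 654737, 675484, 845655, 976545.]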
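When the file is kept in key order, a sequential search can stop as soon as it has passed the place where the key would be. The sketch below uses the figure’s keys sorted into ascending order; three keys per page is again an assumed layout, and the function name is invented for the example.

```python
# The figure's keys in ascending order; three per page is an assumed layout.
pages = [
    ["000002", "000212", "029545"],
    ["123232", "434565", "654737"],
    ["675484", "845655", "976545"],
]

def ordered_search(pages, key):
    """Sequential access over a sequentially organised file.

    Returns (found, number of pages read).
    """
    pages_read = 0
    for page in pages:
        pages_read += 1
        if key in page:
            return True, pages_read
        if page[-1] > key:          # all later keys are larger still: stop early
            return False, pages_read
    return False, pages_read

print(ordered_search(pages, "976545"))   # present, on the last page
print(ordered_search(pages, "100000"))   # absent, detected before the last page
```

The keys are fixed-width, zero-padded strings, so ordinary string comparison matches numeric order.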

‘Indexed (Access) Sequential File’

[Figure: an index of (key, address) entries, each pointing to a data page.]

Index (key - address): 147545 - 1; 160545 - 2; 189000 - 3; etc.

Data pages:
Page 1: 134565, 123232, 145484, 000545, 147545, 000212, 000002, 144737, 146655
Page 2: 155565, 154345, 157484, 148898, 160545, 148789, 148000, 156737, 158655
Page 3: 171345, 161345, 171978, 161344, 189000, 161343, 161222, 171543, 180987
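The index in this figure can be used as follows. The sketch reads each index entry as the highest key held on the page it points to, which is consistent with the figure’s data but is an interpretation; the function name is invented for the example.

```python
from bisect import bisect_left

# (highest key on the page, page address), as read from the figure.
index = [("147545", 1), ("160545", 2), ("189000", 3)]

data_pages = {
    1: ["134565", "123232", "145484", "000545", "147545",
        "000212", "000002", "144737", "146655"],
    2: ["155565", "154345", "157484", "148898", "160545",
        "148789", "148000", "156737", "158655"],
    3: ["171345", "161345", "171978", "161344", "189000",
        "161343", "161222", "171543", "180987"],
}

def indexed_lookup(key):
    """Use the index to pick the one page that could hold the key.

    Returns the page number if the key is found there, else None.
    """
    index_keys = [k for k, _ in index]
    pos = bisect_left(index_keys, key)   # first entry whose key is >= the search key
    if pos == len(index):
        return None                      # larger than every key in the file
    page_no = index[pos][1]
    return page_no if key in data_pages[page_no] else None

print(indexed_lookup("156737"), indexed_lookup("161222"), indexed_lookup("150000"))
```

Only one data page is read per lookup, at the cost of storing and maintaining the index.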

DBMS’s Need for Meta Data

Meta data = data about data.

In addition to the DB’s data, the DBMS needs considerable data about the DB data in order to function. For example, in order to operate, the DBMS needs to know:
• Names of DB relations, & whether base or view.
• Names & data types of attributes in relations.
• Mapping between relations and physical files.
• Names and locations of physical files.
• Organisation & access methods of files.
• Buffering available.
• etc.

Meta data is stored in a Data Dictionary / SQL Catalog. This is another DB. It is stored & used in the same way as the main DB. The DBMS automatically updates it when relations, etc. are created, and retrieves from it to execute statements.
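As a concrete example of a catalog that is itself queried like a DB, the sketch below uses Python’s built-in `sqlite3` module, where the catalog is the `sqlite_master` table and column details come from `PRAGMA table_info`. The relation names are invented for the example, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_base (acc_no TEXT, name TEXT, balance INTEGER)")
conn.execute("CREATE VIEW customer AS SELECT acc_no, name FROM customer_base")

# Names of relations, and whether each is a base table or a view.
relations = conn.execute(
    "SELECT name, type FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

# Names and declared data types of the attributes of one relation.
# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
attributes = [
    (row[1], row[2])
    for row in conn.execute("PRAGMA table_info(customer_base)")
]

print(relations)
print(attributes)
```

The catalog entries appeared as a side effect of the `CREATE` statements; nothing was inserted into the catalog by hand.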

DBA’s Need for Meta Data

The DBA needs similar data about the DB to carry out their functions :

Coherence tasks : e.g. check on relations (attributes & data types, integrity constraints), schemas, files.

Caring tasks : e.g. check on authorised users and their access privileges, state of backups and logs.

Performance Tuning tasks : e.g. check on usage, file sizes, file types.

The meta data is often used to look up information for mundane, everyday tasks, since a DB is often too big for the DBA to remember everything.

Note that most of this meta data is also used by the DBMS.