martina-norman
Managing the Database
Objectives of the Lecture :
• To consider the roles of the Database Administrator.
• To consider the involvement of the DBMS in the storage and handling of physical data.
• To appreciate different kinds of file organisation and access method.
• To appreciate the need for meta data.
Database Administrator(s)
This can vary, from 1 person part-time to a full-time team.
Depends on the nature, size and usage of the DB :
• Large DB shared by many users/applications - traditional.
• Relatively small DB for one person/team/application.
• Very large data warehouse DB for data mining.
Different DBMSs may be used for different purposes; different levels of technical support are required.
Depends on how work is allocated w.r.t. other computer staff. DBA may/may not be involved in :
• application development/support;
• computer system/network support.
Why is DB Management Needed ?
A DB is a coherent, integrated collection of data. The DB needs to be managed because :
• whether the coherence applies to data for one application or many, the coherence must be created by design, and maintained as the DB evolves;
• data is now accepted as a valuable organisational asset, that must be cared for;
• physical data independence implies the performance tuning of the DB’s physical storage.
These reasons give rise to the DB management tasks.
DBA’s ‘Coherence’ Tasks
Creation of the DB (some or all of the following, as required) :
• Obtaining a suitable DBMS.
• Design of the Logical Schema.
• Design of Sub Schema(s).
• Design of Physical Schema.
• Provision of suitable hardware.
• Implementation of the DB design.
• Insertion/loading of valid data into the DB.
Maintenance & Extension of the DB : a few/some/all of the ‘creation’ tasks as appropriate.
Liaising with End Users, Application Developers and Systems Staff, to meet their aims & enforce realistic constraints.
DBA’s ‘Caring’ Tasks
Maintaining security of DB.
• Protection against unauthorised access.
• Ensure a requested operation on a requested object by a requesting user is acceptable.
• Need defence in depth; e.g. audit trails, data encryption.
Protecting DB against loss or damage.
• Need backup copy of DB + Transaction Log.
• From latest valid copy of DB, roll forward through transaction log, repeating transactions till current DB state is restored.
Maintaining Standards.
• Needed for procedures, software, documentation, etc. to support other DB activities and ensure their effectiveness.
Day-to-day maintenance operations.
• Managing DB restarts, keeping backup data, correcting DB errors, investigating problems, updating users’ authorisation, etc.
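The roll-forward recovery mentioned under ‘Protecting DB against loss or damage’ can be sketched as follows. This is only an illustrative sketch in Python, not how any particular DBMS implements recovery : the dictionary standing in for the DB, the (operation, key, value) log entries and the function name roll_forward are all assumptions made for the example.

```python
import copy


def roll_forward(backup, transaction_log):
    """Rebuild the current DB state from a backup copy plus a transaction log.

    Hypothetical model: the DB is a dict keyed by account number, and each
    log entry is a tuple (operation, key, value) recorded after the backup.
    """
    db = copy.deepcopy(backup)           # start from the latest valid copy
    for operation, key, value in transaction_log:
        if operation == "put":           # insert or update a record
            db[key] = value
        elif operation == "delete":      # remove a record if present
            db.pop(key, None)
        else:
            raise ValueError(f"unknown log operation: {operation}")
    return db


backup = {"123456": 100, "222222": 50}
log = [
    ("put", "123456", 80),       # update an existing record
    ("put", "333333", 10),       # insert a new record
    ("delete", "222222", None),  # delete a record
]
restored = roll_forward(backup, log)
```

The backup itself is left untouched, so the same backup and a longer log can be replayed again later.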
DBA’s ‘Performance Tuning’ Tasks (1)
Purpose of Physical Data Independence is to allow a relation’s data to be physically stored in many different ways without this affecting what a user/programmer writes in their (SQL) statements to use that relation.
Thus if a relation’s data is moved from one physical storage arrangement to another, the user/programmer is unaware of it.
DBA can and should change a relation’s physical storage if performance can be improved.
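As an illustration of physical data independence, the sketch below uses Python's built-in sqlite3 module. SQLite is an assumption here (the lecture names no particular DBMS), and the table contents and the index name idx_customer_acc_no are hypothetical. The same SQL text is run before and after the physical storage is changed by adding an index; the answer is unchanged, but the DBMS's chosen execution plan is not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (ACC_NO TEXT, NAME TEXT)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("123456", "Smith"), ("654321", "Jones")],
)

# The statement the user/programmer writes; it never changes.
query = "SELECT NAME FROM Customer WHERE ACC_NO = '123456'"

before = conn.execute(query).fetchall()
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# The 'DBA' alters the physical storage arrangement by adding an index.
conn.execute("CREATE INDEX idx_customer_acc_no ON Customer (ACC_NO)")

after = conn.execute(query).fetchall()
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before == after)   # same result for the user
print(plan_before)       # how the DBMS ran the query without the index
print(plan_after)        # how it runs the query with the index
```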
DBA’s ‘Performance Tuning’ Tasks (2)
Purpose - optimise trade-off between :
• User level - update & retrieval, different users’/applications’ needs;
• Hardware level - hard disc, RAM, CPU and network usage.
Design the initial Physical Schema.
• Map base relations & views to physical files.
• Decide file locations w.r.t. discs and network nodes.
• Decide record formats, file organisation & access.
Monitor usage and performance of DB.
Amend Physical Schema when altered usage &/or requirements demand it; and when DB is extended/altered.
Consideration of Physical File Storage
In order to choose optimal physical file designs, the DBA must understand :
• How the DBMS handles statements input to it for execution.
• How a DBMS uses a computer’s memory for the storage and handling of a DB.
• How files are organised and accessed in physical storage.
• What happens at the physical level to execute a DB statement.
• The performance characteristics of different physical file types.
These topics are now reviewed. The intent is not to show how a DBA can optimise a DB’s physical file design - this is a very large subject on its own - but to give sufficient background information to appreciate the nature of the problem.
DBMS’s Execution of a Statement
On receipt of a (SQL) statement, the DBMS :
1. Determines what must be done to execute it; i.e. tokenises and parses it.
2. Follows the mappings between Sub, Logical and Physical Schemas to determine what data to physically read/write from/to disc.
3. Optimises the method of execution.
4. Executes the statement.
The DBMS must output data in the form of relations, even when a relation’s data is physically stored in a quite different way.
Successful optimisation depends on the DBA’s choice of physical file designs for data storage, as well as the DBMS’s optimiser.
Example Retrieval
Example SQL query :-
    SELECT *
    FROM Customer
    WHERE ACC_NO = '123456' ;
Let ‘Customer’ be a view.
DBMS :
• Determines what the statement means (tokenise & parse).
• Gets definition of view ‘Customer’ (use schema data).
• Translates query into a logical equivalent using base table(s).
• Gets location of file(s) holding the base table data, and their organisation(s) & access method(s) (use schema data).
• Determines optimum query method.
• Executes query.
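The view-handling steps above can be tried out with a small sketch. It assumes SQLite via Python's sqlite3 module, and the base table name customer_base and its INTERNAL_CODE column are hypothetical; the lecture does not say what lies behind the ‘Customer’ view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table; the view 'Customer' hides its name and one column.
conn.execute(
    "CREATE TABLE customer_base (ACC_NO TEXT PRIMARY KEY, NAME TEXT, INTERNAL_CODE TEXT)"
)
conn.execute("INSERT INTO customer_base VALUES ('123456', 'Smith', 'X1')")
conn.execute("CREATE VIEW Customer AS SELECT ACC_NO, NAME FROM customer_base")

# 'Gets definition of view': the definition is held as schema data.
view_definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'Customer'"
).fetchone()[0]

# The user's query names only the view.
via_view = conn.execute(
    "SELECT * FROM Customer WHERE ACC_NO = '123456'"
).fetchall()

# A logically equivalent query written directly against the base table.
via_base = conn.execute(
    "SELECT ACC_NO, NAME FROM customer_base WHERE ACC_NO = '123456'"
).fetchall()

print(view_definition)
print(via_view == via_base)
```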
Executing the Example Retrieval
User to DBMS : “Get the record with customer account number 123456”
DBMS to File Manager : “That’s on the 27th page of the file called ‘customer’”
File Manager to Disk Manager : “That’s on page 14 of cylinder 127”
Disk Manager to File Manager : “Here’s the page you wanted”
File Manager to DBMS : “Here’s the page of the file you wanted”
DBMS to User : “Here’s the record you wanted”
File Manager : may be part of the DBMS or the Operating System.
Disk Manager : part of the Operating System.
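The division of labour between DBMS, File Manager and Disk Manager can be sketched as three small Python classes. This is a toy model under stated assumptions : the class and method names, the in-memory dictionary standing in for the disc, and the lookup tables are all hypothetical; only the page and cylinder numbers are taken from the example retrieval.

```python
class DiskManager:
    """Knows physical addresses: (cylinder, page) -> stored page."""

    def __init__(self, disc):
        self.disc = disc  # hypothetical in-memory stand-in for the disc

    def read_page(self, cylinder, page):
        return self.disc[(cylinder, page)]


class FileManager:
    """Knows files: (file name, page number) -> physical address."""

    def __init__(self, disk_manager, page_map):
        self.disk_manager = disk_manager
        self.page_map = page_map

    def read_file_page(self, file_name, page_no):
        cylinder, page = self.page_map[(file_name, page_no)]
        return self.disk_manager.read_page(cylinder, page)


class Dbms:
    """Knows records: finds the file page holding a record, then extracts it."""

    def __init__(self, file_manager, record_locations):
        self.file_manager = file_manager
        self.record_locations = record_locations

    def get_record(self, file_name, acc_no):
        page_no = self.record_locations[(file_name, acc_no)]
        page = self.file_manager.read_file_page(file_name, page_no)
        return next(rec for rec in page if rec["ACC_NO"] == acc_no)


# Page 27 of file 'customer' is stored on page 14 of cylinder 127.
disc = {(127, 14): [{"ACC_NO": "123456", "NAME": "Smith"},
                    {"ACC_NO": "123457", "NAME": "Jones"}]}
page_map = {("customer", 27): (127, 14)}
record_locations = {("customer", "123456"): 27}

dbms = Dbms(FileManager(DiskManager(disc), page_map), record_locations)
record = dbms.get_record("customer", "123456")
```

Each layer only knows its own kind of address, which is what lets the File Manager live in either the DBMS or the Operating System.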
Simplified Computer Architecture
[Figure: CPU, Random Access Memory (main), I/O Control, and Backing Storage (secondary), typically hard discs.]
Primary vs. Secondary Storage
Primary :
• Fast
• For volatile data
• Expensive
• Limited in size
Secondary :
• Slow
• For ‘permanent’ data
• Cheap
• Removable / Expandable
Magnetic/Hard Disks
Rotating disc. Each surface comprises concentric tracks. Each track split into blocks.
Read/write head for each surface. Head moves across to required track, waits till required block comes underneath, reads/writes data from/to block.
Discs may be ‘stacked’ onto one spindle; each surface accessed simultaneously. Cylinder ≡ corresponding tracks on each surface. This allows parallel read/write of 1 cylinder.
Operating systems read/write one page at a time. 1 page ≡ 1 / 2 / 4 / 8 / ... block(s).
Buffers & Cache Memory
Computers read/write data from/to secondary memory via Buffers. They are a special part of RAM whose purpose is to handle disc I/O. Required for efficiency.
Buffer use strategies : Double Buffering (alternate filling & emptying), Read Ahead, etc.
[Figure: double buffering between Buffer 1 and Buffer 2, steps 1 to 4.]
Disc Cache is a special kind of buffer - holds frequently read data from disc, to minimise re-reading it from disc.
DBMS must handle buffers.
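A disc cache of the kind described above can be sketched in a few lines of Python. The least-recently-used replacement policy, the class name BufferPool and the read_from_disc callback are assumptions chosen for the example; the lecture does not say which policy a given DBMS uses.

```python
from collections import OrderedDict


class BufferPool:
    """Keeps up to `capacity` pages in RAM, evicting the least recently used."""

    def __init__(self, read_from_disc, capacity):
        self.read_from_disc = read_from_disc  # hypothetical disc-read callback
        self.capacity = capacity
        self.pages = OrderedDict()            # page_no -> page contents
        self.disc_reads = 0

    def get_page(self, page_no):
        if page_no in self.pages:             # cache hit: no disc access
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        self.disc_reads += 1                  # cache miss: read from disc
        page = self.read_from_disc(page_no)
        self.pages[page_no] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used page
        return page


pool = BufferPool(read_from_disc=lambda n: f"contents of page {n}", capacity=2)
for page_no in [1, 2, 1, 1, 3, 1]:
    pool.get_page(page_no)

print(pool.disc_reads)  # six page requests, but only three disc reads
```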
DB Use of Memory
DBs typically :
• need to be kept for a long time;
• are large, and so need a lot of memory.
So store them on Secondary Storage.
DB users typically require significant processing of data : picking out parts of relations, merging relations, doing calculations on stored data, sorting data, etc.
So read data into Primary Storage, and process the data there.
DBMS handles all this for the user.
File Structure and Content
A file consists of a sequence of records. Records in a file may have :
• a fixed or variable structure,
• a fixed or variable length.
A record consists of a sequence of fields. Each field holds a value of a certain data type.
(A record often holds the values of one tuple).
A disc block normally holds several records :-
[Figure: a block holding three records, each shown as a row of five data items.]
Records accumulate in a block till there is no further room in it.
A File
[Figure: a file as a sequence of pages (Page 1, Page 2, Page 3, etc.), each holding several records, with free space where a page is not full, ending at EOF.]
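How fixed-length records accumulate in pages can be sketched with Python's struct module. The record layout (a 6-byte account number and a 20-byte name), the deliberately tiny 64-byte page size and the function names are all assumptions for illustration; real page sizes are far larger.

```python
import struct

RECORD_FORMAT = "<6s20s"                       # hypothetical fixed-length layout
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)   # 26 bytes per record
PAGE_SIZE = 64                                 # tiny page, for illustration only
RECORDS_PER_PAGE = PAGE_SIZE // RECORD_SIZE    # 2 records fit; the rest is free space


def pack_pages(records):
    """Fill pages with records until each page has no further room."""
    pages = []
    for start in range(0, len(records), RECORDS_PER_PAGE):
        chunk = records[start:start + RECORDS_PER_PAGE]
        data = b"".join(
            struct.pack(RECORD_FORMAT, acc_no.encode(), name.encode())
            for acc_no, name in chunk
        )
        pages.append(data.ljust(PAGE_SIZE, b"\x00"))  # pad with free space
    return pages


def read_record(pages, page_no, slot):
    """Fetch one record by page number and slot within the page (both from 0)."""
    offset = slot * RECORD_SIZE
    acc_no, name = struct.unpack(
        RECORD_FORMAT, pages[page_no][offset:offset + RECORD_SIZE]
    )
    return acc_no.decode(), name.rstrip(b"\x00").decode()


pages = pack_pages([("123456", "Smith"), ("434565", "Jones"), ("675484", "Patel")])
third = read_record(pages, 1, 0)
```

With fixed-length records the position of a record within its page is simple arithmetic; variable-length records would need extra bookkeeping inside each page.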
Disc Access Time
Access ≡ Read From OR Write To.
Disc access is 10,000 - 1,000,000 times longer than RAM access.
Actual speeds continually improve. Despite variations due to the technology used, the relative speeds of RAM and disc are always hugely different. So it is always extremely important to minimise disc access times.
Time taken for disc access depends on :
• whether access is Read or Write.
• precisely what data is to be accessed; e.g. 1 specific record, a certain range of records, all the file.
• file organisation ≡ how the records are laid out in the file.
• file access method ≡ how the required record(s) are found in the file.
Minimising Disc Access Time (1)
Problem : File type (= organisation & method) best for one kind of user access (= read/write & data to be accessed) is worst for another. DB users have very varying access needs.
Solution : DBA chooses suitable file types and clustering of files on pages. DBMS automatically optimises within these parameters.
DBA may also have to set other parameters, e.g. buffer space, page size, that are then used by the DBMS. Possible and worthwhile parameters depend on the individual DBMS.
Minimising Disc Access Time (2)
Actual strategies used :
Minimise data transfers to / from disc :
• Minimise “search path” (= no. of pages read).
• Once in main memory, maximise use of each page.
• Keep “high hit” areas (e.g. index pages) in data cache.
Minimise disc handling :
• Minimise head movement : read whole cylinders.
• Minimise latency : read whole tracks.
• Choose best compromise page size.
• Minimise CPU waiting : buffer I/O, read ahead.
Possible strategies available depend on the individual DBMS.
File Types
File Organisation :
• Serial File - new records added to the end of a file; records not in sequence; may mark a deleted record with a ‘tombstone’.
• Sequential File - all records maintained in order of value(s) in one or more fields. (Could correspond to candidate key values).
File Access Method :
• Sequential Access - go through file’s pages in some sequence.
• Indexed Access - use an index to go straight to the required page. Many types of index : B-tree, Secondary, Bitmap, etc.
• Hashed Access - calculate page location with a ‘hash algorithm’.
• Pointer Chain - pointer in a retrieved record is used to access the next page.
Different combinations of organisation & access method give very different performance characteristics. So choose appropriately to serve the required access of data.
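Hashed access, where the page location is calculated from the key, can be sketched as below. The ‘hash algorithm’ used here (key value modulo the number of pages), the page count of 4 and the function names are assumptions picked for simplicity; a real DBMS would use its own algorithm and must also handle pages that overflow.

```python
NUM_PAGES = 4  # hypothetical number of pages in the hashed file


def page_for(key):
    """Hypothetical hash algorithm: numeric key value modulo the page count."""
    return int(key) % NUM_PAGES


# Store each record (here just its key) on the page its key hashes to.
pages = [[] for _ in range(NUM_PAGES)]
for key in ["434565", "123232", "675484", "329545", "976545"]:
    pages[page_for(key)].append(key)


def find(key):
    """Look for a key by reading only the one page it can be on."""
    page_no = page_for(key)
    return page_no, key in pages[page_no]


print(find("675484"))  # page number and whether the key was found there
print(find("000212"))  # an absent key still costs only one page read
```

The trade-off is that hashing gives no help with range queries, since neighbouring key values land on unrelated pages.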
Serial Organisation, Sequential Access
[Figure: Page 1, Page 2, Page 3, etc. up to EOF, holding records in no particular key order; example keys shown are 434565, 123232, 675484, 329545, 976545, 000212, 000002, 654737, 845655.]
Sequential Organisation & Access
[Figure: Page 1, Page 2, Page 3, etc. up to EOF, holding records in key order; example keys shown are 000002, 000212, 029545, 123232, 434565, 654737, 675484, 845655, 976545.]
‘Indexed (Access) Sequential File’
Index (key, address) :
    147545  1
    160545  2
    189000  3
    etc.
Data pages :
    Page 1 : 000002, 000212, 000545, 123232, 134565, 144737, 145484, 146655, 147545
    Page 2 : 148000, 148789, 148898, 154345, 155565, 156737, 157484, 158655, 160545
    Page 3 : 161222, 161343, 161344, 161345, 171345, 171543, 171978, 180987, 189000
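A lookup through the index of the ‘Indexed (Access) Sequential File’ figure can be sketched in Python using the key values shown there. The sketch assumes that each index entry holds the highest key stored on its page, which is consistent with the figure's values; the function name lookup and the use of binary search over the index are illustrative choices.

```python
from bisect import bisect_left

# Index entries from the figure: (highest key on the page, page address).
index_keys = ["147545", "160545", "189000"]
index_pages = [1, 2, 3]

data_pages = {
    1: ["000002", "000212", "000545", "123232", "134565",
        "144737", "145484", "146655", "147545"],
    2: ["148000", "148789", "148898", "154345", "155565",
        "156737", "157484", "158655", "160545"],
    3: ["161222", "161343", "161344", "161345", "171345",
        "171543", "171978", "180987", "189000"],
}


def lookup(key):
    """Return the page holding `key`, or None, reading at most one data page."""
    # Keys are fixed-width digit strings, so string order matches numeric order.
    pos = bisect_left(index_keys, key)   # first index entry with key >= target
    if pos == len(index_keys):
        return None                      # larger than every key in the file
    page_no = index_pages[pos]
    return page_no if key in data_pages[page_no] else None


print(lookup("156737"))  # found via the second index entry
print(lookup("150000"))  # would be on page 2, but is not stored
```

Only the small index and one data page are examined, whereas sequential access might have to read every page before the one required.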
DBMS’s Need for Meta Data
Meta data = data about data.
In addition to the DB’s data, the DBMS needs considerable data about the DB data in order to function. For example, in order to operate, the DBMS needs to know :
• Names of DB relations, & whether base or view.
• Names & data types of attributes in relations.
• Mapping between relations and physical files.
• Names and locations of physical files.
• Organisation & access methods of files.
• Buffering available.
• etc.
Meta data is stored in a Data Dictionary / SQL Catalog. This is another DB. It is stored & used in the same way as the main DB. The DBMS automatically updates it when relations, etc. are created, and retrieves from it to execute statements.
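The automatic upkeep of the catalog can be observed in a small sketch. It assumes SQLite via Python's sqlite3 module, where the catalog is exposed as the sqlite_master table and column details are available through PRAGMA table_info; other DBMSs expose their catalogs differently. The table name Account and the helper catalog_names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def catalog_names(connection):
    """List the relations (tables and views) currently recorded in the catalog."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
    )
    return sorted(row[0] for row in rows)


before = catalog_names(conn)

# Creating a relation is all the user does; the DBMS updates the catalog itself.
conn.execute("CREATE TABLE Account (ACC_NO TEXT, BALANCE INTEGER)")

after = catalog_names(conn)

# Names & data types of the attributes, as held in the meta data.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Account)")]

print(before, after, columns)
```

The catalog is queried with ordinary SQL, which matches the point that it is stored and used in the same way as the main DB.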
DBA’s Need for Meta Data
The DBA needs similar data about the DB to carry out their functions :
Coherence tasks : e.g. check on relations (attributes & data types, integrity constraints), schemas, files.
Caring tasks : e.g. check on authorised users and their access privileges, state of backups and logs.
Performance Tuning tasks : e.g. check on usage, file sizes, file types.
Meta data is often used to look up details for mundane, everyday tasks, since a DB is often too big for the DBA to remember everything.
Note that most of this meta data is also used by the DBMS.