24
CERN/IT/DB Database Environments Pfleeger’s overview – Ch 6 CSc 250 – Part 1

Database Environments Pfleeger’s overview – Ch 6

Embed Size (px)

DESCRIPTION

Database Environments Pfleeger’s overview – Ch 6. CSc 250 – Part 1. Database Environments Pfleeger’s overview – Ch 6. Many differences between OS and database (DB) environments Modern DB systems have comprehensive development and processing, and - PowerPoint PPT Presentation

Citation preview

Page 1: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Database EnvironmentsPfleeger’s overview – Ch 6

CSc 250 – Part 1

Page 2: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DBDatabase Environments Pfleeger’s overview – Ch 6

Many differences between OS and database (DB) environments

Modern DB systems have comprehensive development and processing, and the comprehensive set of services + environment is a “DBMS”

• Stores, manages and processes data/info based on a data model,Most common one now is relational model (assume that here)

• DB model specifies: a) representation of data b) operations c) integrities (aka constraints)where operations must preserve existing/defined integrities-- in OS, users can represent + process arbitrary data/info, if desired

• For our purposes, DBMS is software only – no special-purpose hardware-- think of DBMS as a huge application running on OS

• Each DBMS user accesses a “schema” – logical description of the data entities and data values to be manipulated

- entity occurrences are physically stored in table rows (aka “records” in 3GL)- each row is identified by a unique key value, and a row contains a fixed number of column values (field values)

Page 3: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Database Environments Pfleeger’s overview – Ch 6

• Like OS, administration done by DataBase Administration (DBA) For a large DB, this can be a complex job, done by >= 1 administrator

-- Administration can be very involved at a large site, just as big a job as OS sys admin

DBA responsibility can be divided and categorized, unlike absolute unix root

• MAC and DAC are supported in situations where they are appropriate:

Ex: - must authenticate a client database connection

- user U can grant access rights on table T, that is owned by U

to other user(s)

Several prominent relational DBMS vendors, some platform specific (Access, SQL Server, DB2) and others are platform independent (Oracle, CA, etc.), open: mySQL

Will use Oracle as representative, since it is among most comprehensive, and scales up well for huge DBs

Page 4: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Database Environments Pfleeger’s overview – Ch 6

• DBMS implements model operations with a query language, specially-suited for data retrieval and manipulation; international standard SQL dialects have evolved (92/99)

• Here, will only consider elementary SQL queries, with form:

SELECT ItemList Column Names or expressions (to be read)

FROM TableList Table Names

WHERE condition condition is arbitrary combination of boolean- valued expressions that a result row must

satisfy

• When a user creates a table (Table Name a legal DB (not OS) identifier),

every row stored will occupy physical storage that is managed by DBMS, not OS

(a table is analogous to an OS file containing records)

• Specially-designed, and application-specific sections of >=1 table(s) can be collected together into a view

-- Modeling user’s viewpoints and data access needs best done using views

Page 5: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Database Environments Pfleeger’s overview – Ch 6

• A view is just a definition of a combination of data in >=1 table(s)

No physical storage associated with a view

Normally, views are not updated, just SELECT applied to a given view

• Given the power and relative simplicity of DB query languages, there are some unique security problems in DB that do not have solutions:

* Inference problem – in a DB with sensitive + public data: it might be possible to issue a combination of individually permitted queries whose final result

reveals sensitive data

* Multilevel DB with differentiated security – the security of a single column value might be different from the security of other column values of the same row or

from the same column in other rows

In general, security should be implemented for every row/column data value

Page 6: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DBDatabase Environments Pfleeger’s overview – Ch 6

Pfleeger (pp. 314-15) Many DBMS security issues similar to OS

• Physical, Logical and Value integrity

Outages, failures should not corrupt DB structure or its data

DB queries and accesses must not violate any enabled model constraints

Ex: disallow INSERT of a row with a duplicate key value into a table

Values entered into a DB must always (except during an actual create/update) obey range + format constraints; almost all values managed by the DB are typed values

(Except for special objects like LOBs, images, etc. that might be referenceable from DB, but not managed by the DBMS)

• Access to items can be specified with very high granularity, unlike

OS objects; e.g. simple, rigid unix file permissions apply to whole file

Views into tables provide user access flexibility, because any designated DB user or collection of DB users can be provided with customized access to a view

Also, table T can be private, but a view VT on T can be made SELECTed by user(s)

Page 7: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DBDatabase Environments Pfleeger’s overview – Ch 6

• Can audit any significant accesses and operations on a DB, as with an OS

and identity of user whose actions are audited is based on DB user authentication

(more on DB user authentication in next slides set)

• Along with flexible, data access, via queries, data value integrity is

enforced regardless of the kind of query

Define an object type “x” by creating a DB table “X”:

in Oracle by: create table XName … command

Occurrences of t will be rows generated and stored in T via:

SQL INSERT adds a row to T

Bulk load DBMS tool used to translate OS file records into T rows

(Even during bulk load, the DBMS should enforce constraints/integrities)

Page 8: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DBDatabase Environments Pfleeger’s overview – Ch 6

• Protection features of the OS for DBMSDBMS ultimately relies on underlying OS for security of I/OEx: Oracle DBWR process calls OS platform I/O services when making actual DB changes; any OS flaws for I/O, storage become DBMS vulnerabilities

• Recovery from any kind of failure has always been a functional requirement for DBMSs; unlike OSs, historically, that accepted possible loss of data

• The unit of OS processing is a process The unit of DB processing is a transaction

In general, a transaction “t” has the form:begin_t

q1;q2; Each qi is an arbitrary SQL query or appl. statement:qn;

end_t Each t has atomic execution: either all qj applied OK or t fails and is recovered

Page 9: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DBDatabase Environments Pfleeger’s overview – Ch 6

• Pfleeger’s description (pp. 321-323) of t implementation sketches one way (it is not the only way) to recover failed t

- There are 2 “phases” associated with processing each t:INTENTION phase – gather and save/store info about what the t doesUPDATE phase - t changes eventually applied, whether t fails or not

Recovery is designed to be as automatic as possible, GOAL: to guarantee no loss of data updates except for those t that were

“early” in their execution

Recovery is a defense for the non-availability threat associated with t failure

There is also much in DBMS that relates to efficient performance, because in large-scale DBs, there can be many concurrent users, so query response time is another aspect of availability

Permanent changes phase – DBMS eventually writes the changes specified by the tto the physical DB, whether t failed or not

Page 10: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

DBA functions relevant to security - summary

• Setting up a database server

• User accounts

• DB “objects”

• Backup and Recovery

Page 11: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Setting up a Database server

• DB Server setup is the responsibility of the DBA team

• Crucial to have a good platform (cheap does not pay!!! For performance, security reasons …)– Hardware selection (cluster vs. single host), HA

configuration ?– Raw partitions & RAID 0+1 (stripping and mirroring)

+ Logical Volume manager (SAN?)– GbEthernet or higher, how many interfaces?

• A properly tuned database is ONLY CPU bound

Page 12: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Oracle (Ora) - Setting up a Database server• You have to plan for the database and for the instance –

can run n>=1 instances simultaneously on same server• Database

– Sizing the tablespaces (TS) is VERY importantTablespace is logical space, and

each tablespace has n>=1 physical OS files to store objects associated with this tablespace

• General versus Dedicated• Data growing rate (TS can be created/dropped online)• Locally managed vs. Dictionary managed

– Size and rate/duration of the transactions• redo logs and rollback segments (for the INTENSION phase)

– Size of the sorting operations• size of the temporary tablespaces

Page 13: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Setting up a Database Server

• Instance – can have n>=1 instances of DB simultaneously– Dedicated vs. Multi Threaded Server connections– Number of processes– Store procedures (java, pin pl/sql code in memory)– Sort area size– Many more…

• Backup policy– Has to be agreed with the users– Several options (exp/imp, hot/cold backups)– But do not assume that all critical services backed updo not assume that all critical services backed up

SQL> CREATE DATABASE DEVDBCHARACTER SET "WE8ISO8859P9“DATAFILE '/ORADB/dbs1/devdb/system01.dbf' size 250MLOGFILE group 1 ( '/ORADB/dbs1/devdb/log1.dbf' ) size 100M, group 2 ( '/ORADB/dbs2/devdb/log2.dbf' ) size 100M ;@$ORACLE_HOME/rdbms/admin/catalog@$ORACLE_HOME/rdbms/admin/catproc@$ORACLE_HOME/rdbms/admin/catrep

Page 14: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

User accounts• Any user can request an account in the central

development and production DB handled by IT– But for physics use, there will be dedicated servers

• Accounts have space quota on one or more tablespaces

• In this account, the user will create new database applications (tables, indexes, types …)

• Users can also see/play with data that is on remote databases from the local account– Performance issues must be taken into account due to

network access

SQL> create user cms_anode_board identified by yupyupdefault tablespace data01 temporary tablespace temp01 quota 100M on data01

quota unlimited on indx01;

SQL> grant designer to cms_anode_board;

Page 15: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Security(1/2)DBMS architecture has catalog (aka data dictionary “DD”) that is a central repository of “data about the DB data”

• The DD (in tablespace SYSTEM) can not be modified directly

• For everything that you can do in the database there is an associated database privilege – definitely Least Privilege– A DBA can group system privileges into (named) roles and

assign them to user accounts– A user can also create roles to assign access privileges (on his/her

objects) to another users• By default, you can not see/modify other user’s data

Page 16: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Security(2/2)

• Security can be enhanced in several ways– Data itself can be stored encrypted

– Password can expire (like in the normal OS)

– Secure the communication with the DB using DES, Triple-DES, RSA RC4...

– User authentication with Kerberos, SSL, Cybersafe, Biometrix…

• Although the default is everything in clear– We can switch to security mode immediately

Page 17: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DBFrom here on: selected topics …

• Vast libraries of scripts for many purposes –- tuning- DB admin- securitycareful about Trojan Horse, etc. possibility

• Callouts and externals –- access/reference non-Oracle managed items: sockets, BLOBs, etc

• Distributed Oracle- at least the possibility of security issues is here to consider:

i.e. n>1 cpu, server interconnect, local + shared mass storage(Not covered in this course: potential distributed DB security issues)

Page 18: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

DB Tuning – client(1/2)

• The ability to make your queries faster– 90% of the times all the tuning is done at the user code level

• Most common tuning cases are due to – Missing index or wrong Query Plan chosen by the Optimizer

Tuning can be critical for large DB sites because of the concurrency environment;

Hopefully, DBA install of DB used proper configuration parameters

Page 19: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

DB Tuning – client(2/2)

• You can always see the Query Plan before you run the query – If the optimizer chooses the wrong options you can force a

different choice via Hints

• Detailed statistics about execution of your queries for more advanced tuning – Can easily identify hotspots

• Remedies– Knowing your data & query patterns is essential– Review/test your application carefully before declaring it

production

Reducing query times from hours to minutes (or even seconds) is not unusual

Page 20: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

DB Tuning - server

• The database itself has many parameters that can be tweaked to improve the performance

• The role of the DBA is to understand the impact of those parameters in the general performance of the db.– In the past, you could not modified them without

stopping the database– Now many can be modified online

• VERY IMPORTANT: A vast library of tuning scripts (and 3rd party tools) is available

Page 21: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Useful features (1/4)

• Partitioning– Data is separated physically according to a key although

logically, we see it as single unit – Increased performance, improved data management

• Bitmap index– Very useful for low cardinality data– Even for high cardinality if we apply a function that

reduces it

• Function based index– Index is not created on the data but on the result of

applying a function (native or external)

Page 22: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Useful features (2/4)

• Server side processing (pl/sql, java) i.e.: DB applications– Closer to data, code reusability– Database can do many more things for you than just

storing/retrieving data

• Resumable statements– Transaction freezes while the error condition is fixed– User code does not need to be modified to profit– Limited set of errors are trapped in 9i (initial version)

• Java VM inside the database kernel – Java compiler to speed up the performance– JSP,Servlets,EJB, Java stored procedures

Page 23: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Useful features (3/4)

• External procedures– Extend Oracle function libraries with your own

– Using C for the moment

• External table api allows any external source (ex a socket) to look like a normal table

• XML– XSQL, XSL, XML Parsers for C/C++/Java/PLSQL

– XML native data type

Page 24: Database Environments Pfleeger’s overview – Ch 6

CERN/IT/DB

Useful features (4/4)

• Parallel Query– Query/analyze large amounts of data– Inter parallelism (SMP machines – tested)– Intra parallelism (for clusters - not tested)

• Transportable tablespaces– Allows you to move data from one database to

another very easily and fast (almost plug&play)