PRESENTATION ON DBMS & SQL
SCHOOL OF MANAGEMENT NIT-W
CONTENTS
1) HISTORY
2) FOUNDER, VERSION & FEATURES
3) ADVANTAGES OF DBMS AND SQL
4) DATABASE & DBMS
5) KEYS, NORMALISATION
6) ER MODELLING
7) DATA WAREHOUSING CONCEPTS
8) SQL BASICS
9) SQL FORMAT AND QUERIES
10) DATA WAREHOUSING TOOLS
HISTORY OF DBMS & SQL
• Ancient Times: Human beings began to store information very long ago. In the ancient times, elaborate database systems were developed by government offices, libraries, hospitals, and business organizations, and some of the basic principles of these systems are still being used today.
• 1960s: Computerized databases started in the 1960s, when the use of computers became a more cost-effective option for private organizations. There were two popular data models in this decade: a network model called CODASYL and a hierarchical model called IMS. One database system that proved to be a commercial success was the SABRE system, which IBM developed to help American Airlines manage its reservations data.
• 1970 to 1972: E.F. Codd published an important paper to propose the use of a relational database model, and his ideas changed the way people thought about databases. In his model, the database’s schema, or logical organization, is disconnected from physical information storage, and this became the standard principle for database systems.
• 1970s: Two major relational database system prototypes were created between 1974 and 1977: Ingres, which was developed at UCB (the University of California, Berkeley), and System R, created at IBM San Jose. Ingres used a query language known as QUEL, and it led to the creation of systems such as Ingres Corp., MS SQL Server, Sybase, Wang’s PACE, and Britton-Lee. System R, on the other hand, used the SEQUEL query language, and it contributed to the development of SQL/DS, DB2, Allbase, Oracle, and Non-Stop SQL.
• 1976: A new database model called Entity-Relationship, or ER, was proposed by P. Chen this year. This model made it possible for designers to focus on data application, instead of logical table structure.
• 1980s: Structured Query Language, or SQL, became the standard query language. DB2 became the flagship database product for IBM, and the introduction of the IBM PC resulted in the establishment of many new database companies and the development of products such as PARADOX, RBASE 5000, RIM, Dbase III and IV, OS/2 Database Manager, and Watcom SQL.
• Early 1990s: Around this time, new client tools for application development were released, including Oracle Developer, PowerBuilder, VB, and others. A number of tools for personal productivity, such as ODBC and Excel/Access, were also developed. Prototypes for Object Database Management Systems, or ODBMS, were created in the early 1990s.
• Late 1990s: Increased investment in online businesses resulted in a rise in demand for Internet database connectors, such as FrontPage, Active Server Pages, Java Servlets, Dreamweaver, ColdFusion, Enterprise Java Beans, and Oracle Developer 2000. The use of cgi, gcc, MySQL, Apache, and other systems brought open source solutions to the Internet. With the increased use of point-of-sale technology, online transaction processing and online analytic processing began to come of age.
• 2000s: Although the Internet industry experienced a decline in the early 2000s, database applications continued to grow. New interactive applications were developed for PDAs, point-of-sale transactions, and consolidation of vendors. Presently, the three leading database companies in the western world are Microsoft, IBM, and Oracle.
FOUNDERS OF DBMS & SQL
DBMS – The relational model, first proposed in 1970 by Edgar F. Codd, departed from the earlier navigational tradition by insisting that applications should search for data by content, rather than by following links.
SQL – SQL was initially developed at IBM by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s. This version, initially called SEQUEL (Structured English QUEry Language), was designed to manipulate and retrieve data stored in IBM's original quasi-relational database management system, System R, which a group at IBM San Jose Research Laboratory had developed during the 1970s. The acronym SEQUEL was later changed to SQL because "SEQUEL" was a trademark of the UK-based Hawker Siddeley aircraft company.
VERSION OF SQL
SPECIAL FEATURES OF DBMS
SPECIAL FEATURES OF SQL
• SQL is an ANSI and ISO standard computer language for creating and manipulating databases.
• SQL allows the user to create, update, delete, and retrieve data from a database.
• SQL is very simple and easy to learn.
• SQL works with database programs like DB2, Oracle, MS Access, Sybase, MS SQL Server etc.
ADVANTAGES OF DBMS
• Concurrent Use
• Structured and Described Data
• Separation of Data and Applications
• Data Integrity
• Transactions
• Data Persistence
• Data Views
ADVANTAGES OF SQL
• High Speed: SQL queries can be used to retrieve large numbers of records from a database quickly and efficiently.
• Well-Defined Standards Exist: SQL databases use a long-established standard, which has been adopted by ANSI and ISO. Non-SQL databases do not adhere to a single common standard.
• No Coding Required: Using standard SQL, it is easier to manage database systems without having to write a substantial amount of code.
• Emergence of ORDBMS: Previously, SQL databases were synonymous with relational databases. With the emergence of object-oriented DBMS, object storage capabilities have been extended to relational databases.
DATABASE
A database is a shared collection of logically related data designed to meet the information needs of an organization. "Logically related" means that all the data are related to one another.
Example - The database of a hospital contains records of patients, doctors, medicines, etc.
In short, a database is an organized collection of data.
DBMS
A database management system (DBMS) is the software that allows a computer to perform database functions of storing, retrieving, adding, deleting, and modifying data.
EXAMPLES OF DBMS
1. Microsoft Access
2. MySQL
3. Microsoft SQL Server
4. Oracle
5. FileMaker Pro
TYPES OF DBMS
1. Hierarchical DBMS
2. Network DBMS
3. Relational DBMS
4. Flat file DBMS
5. Object Oriented DBMS
HIERARCHICAL DBMS
A hierarchical database is a design that uses a one-to-many relationship for data elements. Hierarchical database models use a tree structure that links a number of disparate elements to one "owner" or "parent".
NETWORK DBMS
Network databases are similar to hierarchical databases in that they also have a hierarchical structure. There are a few key differences, however. Instead of looking like a tree, a network database looks more like a cobweb or interconnected network of records. In network databases, children are called members and parents are called owners. The most important difference is that each child or member can have more than one parent (or owner).
RELATIONAL DBMS
The RDBMS is the most widely used type of database system in the software industry today. It is used to establish relationships between two database objects or two database tables.
The relationship may be:
– One - One
– One - Many
– Many - One
– Many - Many
WHAT IS A TABLE IN RDBMS ?
In a relational database, a table is a collection of data elements organised in terms of rows and columns. A table is also considered a convenient representation of a relation. However, a table can have duplicate tuples, while a true relation cannot. The table is the simplest form of data storage.
WHAT IS A RECORD IN RDBMS ?
A single entry in a table is called a Record or Row. A Record in a table represents a set of related data.
WHAT IS A FIELD IN RDBMS ?
A table consists of several records (rows); each record can be broken into several smaller entities known as Fields.
WHAT IS A COLUMN IN RDBMS ?
In a relational table, a column is a set of values of a particular type. The term Attribute is also used to represent a column.
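The terms above can be seen in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the patient table and its sample values are hypothetical and are not taken from the slides.

```python
import sqlite3

# Hypothetical "patient" table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER, name TEXT, ward TEXT)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, "Asha", "A1"), (2, "Ravi", "B2")],
)

cursor = conn.execute("SELECT * FROM patient ORDER BY patient_id")
columns = [d[0] for d in cursor.description]  # column (attribute) names
records = cursor.fetchall()                   # each tuple is one record (row)
first_record = records[0]
name_field = first_record[columns.index("name")]  # one field of one record

print(columns)
print(records)
print(name_field)
```

Here the table has three columns and two records, and name_field is a single field taken from the first record.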
FLAT FILE DBMS
In flat file database management system the user specifies the data attributes for one table at a time, storing data independently from application.
OBJECT-ORIENTED DBMS
An object database (also object-oriented database management system) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases, which are table-oriented.
CONCEPT OF PRIMARY KEY & FOREIGN KEY
A primary key uniquely identifies a record in the table. We can have only one primary key in a table.
A foreign key is a field in the table that is the primary key in another table. We can have more than one foreign key in a table.
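A minimal sketch of both kinds of key, assuming SQLite through Python's sqlite3 module. The doctor and patient tables and the helper try_insert are hypothetical names chosen for illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; other database systems behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE doctor (doctor_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE patient (
           patient_id INTEGER PRIMARY KEY,
           name TEXT,
           doctor_id INTEGER REFERENCES doctor(doctor_id)  -- foreign key
       )"""
)
conn.execute("INSERT INTO doctor VALUES (1, 'Dr. Rao')")
conn.execute("INSERT INTO patient VALUES (10, 'Asha', 1)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second doctor with the same primary key value is rejected.
duplicate_pk_ok = try_insert("INSERT INTO doctor VALUES (1, 'Dr. Sen')")
# A patient pointing at a doctor that does not exist is rejected.
dangling_fk_ok = try_insert("INSERT INTO patient VALUES (11, 'Ravi', 99)")
print(duplicate_pk_ok, dangling_fk_ok)
```

Both inserts fail: the first violates the primary key's uniqueness, the second references a doctor_id that is not present in the doctor table.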
DATABASE NORMALISATION
Database normalisation is a technique for organizing the data in the database. It is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
TYPES -
• First Normal Form
• Second Normal Form
• Third Normal Form
• BCNF
1NF REQUIREMENTS
The requirements to satisfy the 1st NF:
– Each table has a primary key: minimal set of attributes which can uniquely identify a record
– The values in each column of a table are atomic (No multi-value attributes allowed).
– There are no repeating groups: two columns do not store similar information in the same table.
1st NORMAL FORM EXAMPLE
Un-normalized Students table:
Student#  AdvID  AdvName  AdvRoom  Class1  Class2
123       123A   James    555      102-8   104-9
124       123B   Smith    467      209-0   102-8
Normalized Students table:
Student#  AdvID  AdvName  AdvRoom  Class#
123       123A   James    555      102-8
123       123A   James    555      104-9
124       123B   Smith    467      209-0
124       123B   Smith    467      102-8
2nd NORMAL FORM REQUIREMENTS
The requirements to satisfy the 2nd NF:
– All requirements for 1st NF must be met.
– Redundant data across multiple rows of a table must be moved to a separate table.
The resulting tables must be related to each other by use of a foreign key.
2nd NORMAL FORM EXAMPLE
Students table:
Student#  AdvID  AdvName  AdvRoom
123       123A   James    555
124       123B   Smith    467
Registration table:
Student#  Class#
123       102-8
123       104-9
124       209-0
124       102-8
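The decomposition can be checked with a short sketch, assuming SQLite through Python's sqlite3 module. The column names (student_no, adv_id and so on) are adapted from the slide headings because "#" is awkward in identifiers. Joining the two tables on the foreign key should give back the four rows of the 1st NF table, which suggests that no information was lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE students ("
    "student_no INTEGER PRIMARY KEY, adv_id TEXT, adv_name TEXT, adv_room TEXT)"
)
conn.execute(
    """CREATE TABLE registration (
           student_no INTEGER REFERENCES students(student_no),
           class_no TEXT,
           PRIMARY KEY (student_no, class_no)
       )"""
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [(123, "123A", "James", "555"), (124, "123B", "Smith", "467")],
)
conn.executemany(
    "INSERT INTO registration VALUES (?, ?)",
    [(123, "102-8"), (123, "104-9"), (124, "209-0"), (124, "102-8")],
)

# Join on the foreign key to rebuild the 1st NF view of the data.
rows = conn.execute(
    """SELECT s.student_no, s.adv_id, s.adv_name, s.adv_room, r.class_no
       FROM students AS s
       JOIN registration AS r ON r.student_no = s.student_no
       ORDER BY s.student_no, r.class_no"""
).fetchall()
for row in rows:
    print(row)
```

Each adviser's details are now stored once per student instead of once per class registration.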
ER MODELLING
The entity-relationship model (or ER model) is a way of graphically representing the logical relationships of entities (or objects) in order to create a database. The ER model was first proposed by Peter Pin-Shan Chen of Massachusetts Institute of Technology (MIT) in the 1970s.
COMPONENTS OF ERD
An ERD typically consists of four different graphical components:
1. Entity
2. Relationship
3. Cardinality
4. Attribute
CARDINALITY CONSTRAINTS
Express the number of entities to which another entity can be associated via a relationship set.
• Cardinality Constraints - the number of instances of one entity that can or must be associated with each instance of another entity.
• Minimum Cardinality
  – If zero, then optional
  – If one or more, then mandatory
• Maximum Cardinality
  – The maximum number
CARDINALITY CONSTRAINTS (CONTD.)
For a binary relationship set the mapping cardinality must be one of the following types:
– One to one
  • A Manager heads one Department and vice versa
– One to many (or many to one)
  • An Employee works in one Department, or one Department has many Employees
– Many to many
  • A Teacher teaches many Students and a Student is taught by many Teachers
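One common way to store a many-to-many relationship in relational tables is a separate linking (junction) table. The sketch below assumes SQLite through Python's sqlite3 module; the table names teacher, student and teaches and the sample names are hypothetical, and the slides do not prescribe this design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE teacher (teacher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    -- Junction table: one row per (teacher, student) pair.
    CREATE TABLE teaches (
        teacher_id INTEGER REFERENCES teacher(teacher_id),
        student_id INTEGER REFERENCES student(student_id),
        PRIMARY KEY (teacher_id, student_id)
    );
    INSERT INTO teacher VALUES (1, 'Mehta'), (2, 'Iyer');
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO teaches VALUES (1, 1), (1, 2), (2, 1);
    """
)

# One teacher is linked to many students ...
students_of_mehta = [
    r[0]
    for r in conn.execute(
        """SELECT s.name FROM teaches AS t
           JOIN student AS s ON s.student_id = t.student_id
           WHERE t.teacher_id = 1 ORDER BY s.name"""
    )
]
# ... and one student is linked to many teachers.
teachers_of_asha = [
    r[0]
    for r in conn.execute(
        """SELECT te.name FROM teaches AS t
           JOIN teacher AS te ON te.teacher_id = t.teacher_id
           WHERE t.student_id = 1 ORDER BY te.name"""
    )
]
print(students_of_mehta, teachers_of_asha)
```

One-to-many relationships usually need no junction table; a foreign key on the "many" side is enough.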
CARDINALITY CONSTRAINTS (CONTD.)
STEPS TO CREATE AN ERD
• Identify the entity
• Identify the entity's attributes
• Identify the Primary Keys
• Identify the relation between entities
• Identify the Cardinality constraint
• Draw the ERD
• Check the ERD
STEPS IN BUILDING ERD
DEVELOPING AN ERD
The process has ten steps:
1. Identify Entities
2. Find Relationships
3. Draw Rough ERD
4. Fill in Cardinality
5. Define Primary Keys
6. Draw Key-Based ERD
7. Identify Attributes
8. Map Attributes
9. Draw fully attributed ERD
10. Check Results
A SIMPLE EXAMPLE
A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.
IDENTIFY ENTITIES
• One approach to this is to work through the information and highlight those words which you think correspond to entities.
• A company has several departments. Each department has a supervisor and at least one employee. Employees must be assigned to at least one, but possibly more departments. At least one employee is assigned to a project, but an employee may be on vacation and not assigned to any projects. The important data fields are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee number and a unique project number.
• A true entity should have more than one instance
FIND RELATIONSHIPS
• The aim is to identify the associations, the connections between pairs of entities.
• A simple approach to do this is using a relationship matrix (table) that has rows and columns for each of the identified entities.
FIND RELATIONSHIPS (Contd.)
Go through each cell and decide whether or not there is an association. For example, the first cell on the second row is used to indicate if there is a relationship between the entity "Employee" and the entity "Department".
IDENTIFIED RELATIONSHIPS
Names placed in the cells are meant to capture/describe the relationships. So you can use them like this:
• A Department is assigned an employee
• A Department is run by a supervisor
• An employee belongs to a department
• An employee works on a project
• A supervisor runs a department
• A project uses an employee
DRAW ROUGH ERD
Draw a diagram and:
• Place all the entities in rectangles
• Use diamonds and lines to represent the relationships between entities.
• General Examples
DRAWING ROUGH ERD (Contd.)
FILL IN CARDINALITY
• Supervisor
  – Each department has one supervisor.
• Department
  – Each supervisor has one department.
  – Each employee can belong to one or more departments.
• Employee
  – Each department must have one or more employees.
  – Each project must have one or more employees.
• Project
  – Each employee can have 0 or more projects.
FILL IN CARDINALITY (Contd.)
The cardinality of a relationship can only have the following values:
– One and only one
– One or more
– Zero or more
– Zero or one
CARDINALITY NOTATION
CARDINALITY EXAMPLES
(Figure: four diagrams, each connecting an entity A to an entity B.)
• Each instance of A is related to a minimum of zero and a maximum of one instance of B
• Each instance of B is related to a minimum of one and a maximum of one instance of A
• Each instance of A is related to a minimum of one and a maximum of many instances of B
• Each instance of B is related to a minimum of zero and a maximum of many instances of A
ERD WITH CARDINALITY
EXAMPLES
ERD FOR COURSE ENROLLMENT
ERD FOR COURSE REGISTRATION
ROUGH ERD WITH PRIMARY KEYS
IDENTIFY ATTRIBUTES
• In this step we try to identify and name all the attributes essential to the system we are studying without trying to match them to particular entities.
• The best way to do this is to study the forms, files and reports currently kept by the users of the system and circle each data item on the paper copy.
• Cross out those which will not be transferred to the new system, extraneous items such as signatures, and constant information which is the same for all instances of the form (e.g. your company name and address). The remaining circled items should represent the attributes you need. You should always verify these with your system users. (Sometimes forms or reports are out of date.)
• The only attributes indicated are the names of the departments, projects, supervisors and employees, as well as the supervisor and employee NUMBER and a unique project number.
MAP ATTRIBUTES
• For each attribute we need to match it with exactly one entity. Often it seems like an attribute should go with more than one entity (e.g. Name). In this case you need to add a modifier to the attribute name to make it unique (e.g. Customer Name, Employee Name, etc.) or determine which entity an attribute "best" describes.
• If you have attributes left over without corresponding entities, you may have missed an entity and its corresponding relationships. Identify these missed entities and add them to the relationship matrix now.
MAP ATTRIBUTES (Contd.)
DRAW FINAL ERD
CHECK ERD RESULTS
• Look at your diagram from the point of view of a system owner or user. Is everything clear?
• Check through the Cardinality pairs.
• Also, look over the list of attributes associated with each entity to see if anything has been omitted.
DATA WAREHOUSING
An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.
ETL TOOL
DATA FLOW DIAGRAM
OLTP (DATABASE) V/S DATA WAREHOUSE
OLTP
• Application oriented, like CRM, ERP
• Used to run business
• Detailed data
• Current, up to date
• Isolated data
• Clerical user
• Few records accessed at a time (tens)
• Read/update access
• No data redundancy
• Database size 100 MB - 100 GB
• Transaction throughput is the performance metric
• Thousands of users
• Managed in entirety
• EXAMPLE - Bank accounts
Warehouse (DSS)
• Subject oriented, like MIS, ESS, DSS
• Used to analyze business
• Summarized and refined
• Snapshot data
• Integrated data
• Knowledge user (manager)
• Large volumes accessed at a time (millions)
• Mostly read (batch update)
• Redundancy present
• Database size 100 GB - few terabytes
• Query throughput is the performance metric
• Hundreds of users
• Managed by subsets
• EXAMPLE - Company financial report of year 2001
TO SUMMARIZE ...
• OLTP Systems are used to “run” a business
• The Data Warehouse helps to “optimize” the business
BUSINESS INTELLIGENCE & DATA WAREHOUSING
DATA MINING
Data mining is the process of extracting information from the company's various databases and re-organizing it for other purposes.
For example, for targeted marketing, data mining can use data on past promotional mailings to identify the targets most likely to maximize the return on the company’s investment in future mailings.
DATA MART
• Small
• Flexible
• Customized by Department
• OLAP
• Source is departmentally structured data warehouse
(Figure labels: Data mart, Data warehouse)
SQL LANGUAGE
WHAT IS SQL ?
• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard
TYPES OF SQL LANGUAGES
• DDL (Data Definition Language)
• DML (Data Manipulation Language)
• DQL (Data Query Language)
• DCL (Data Control Language)
• Data administration commands
• Transactional control commands
DML
DML is the abbreviation of Data Manipulation Language. It is used to retrieve, store, modify, delete, insert and update data in a database.
Examples: SELECT, UPDATE, INSERT statements
DDL
DDL is the abbreviation of Data Definition Language. It is used to create and modify the structure of database objects in a database.
Examples: CREATE, ALTER, DROP statements
DCL
DCL is the abbreviation of Data Control Language. It is used to create roles, permissions, and referential integrity, and to control access to the database by securing it.
Examples: GRANT, REVOKE statements
TCL
TCL is the abbreviation of Transactional Control Language. It is used to manage different transactions occurring within a database.
Examples: COMMIT, ROLLBACK statements
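A short sketch of DDL, DML and TCL working together, assuming SQLite through Python's sqlite3 module. The account table is hypothetical. DCL is left out because SQLite does not support GRANT and REVOKE. In sqlite3, COMMIT and ROLLBACK are issued through conn.commit() and conn.rollback().

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a table.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

# DML: add data, then make it permanent with COMMIT (TCL).
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# DML inside a transaction that is then undone with ROLLBACK (TCL).
conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
conn.rollback()

# DQL: read the data back.
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)
```

The update is discarded by the rollback, so the committed balance is what the final query reads.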
WHAT CAN SQL DO ?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views
IMPORTANT SQL COMMANDS
• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database
• CREATE DATABASE - creates a new database
• ALTER DATABASE - modifies a database
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• DROP TABLE - deletes a table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index
SOME BASIC SQL SYNTAX
1) SQL SELECT Statement
SELECT column_name, column_name
FROM table_name;
2) SQL DELETE Statement
DELETE FROM table_name
WHERE some_column = some_value;
3) SQL UPDATE Statement
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE some_column = some_value;
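The three statement forms above can be tried out in a small sketch, assuming SQLite through Python's sqlite3 module. The medicine table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medicine (name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO medicine VALUES (?, ?)",
    [("Aspirin", 10), ("Ibuprofen", 20), ("Placebo", 5)],
)

# UPDATE ... SET ... WHERE: change the price of one medicine.
conn.execute("UPDATE medicine SET price = 12 WHERE name = 'Aspirin'")
# DELETE FROM ... WHERE: remove one medicine.
conn.execute("DELETE FROM medicine WHERE name = 'Placebo'")
# SELECT ... FROM: read back what is left.
remaining = conn.execute(
    "SELECT name, price FROM medicine ORDER BY name"
).fetchall()
print(remaining)
```

Note that an UPDATE or DELETE without a WHERE clause affects every row in the table.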
DATABASE BASICS
(Figure: a database containing tables, which are made up of columns and rows.)
• Tables: In a relational database, data is stored in tables.
• Columns: Each table consists of columns and rows. Each column is a field in a record, and there is a column name associated with each column.
• Rows: Each row represents one record. When we say how many records we have, we are referring to the number of rows.
SELECT … FROM
SQL is structured similarly to the English language. The basic command for retrieving data from a database table is to SELECT data FROM a table. Not surprisingly, the keywords "SELECT" and "FROM" make up the core of a SQL statement.
The syntax for "SELECT … FROM" is:
SELECT "COLUMN_NAME"
FROM "TABLE_NAME"
SELECT … FROM
Different ways of selecting data:
Select more than 1 column:
SELECT "COLUMN_NAME_1", "COLUMN_NAME_2"
FROM "TABLE_NAME"
Select all columns:
SELECT *
FROM "TABLE_NAME"
Select unique values:
SELECT DISTINCT "COLUMN_NAME"
FROM "TABLE_NAME"
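The three ways of selecting data can be compared side by side in a sketch, assuming SQLite through Python's sqlite3 module. The store_sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?, ?)",
    [("North", "Delhi", 1500), ("South", "Chennai", 250), ("North", "Delhi", 300)],
)

# More than one column: each result row has exactly the columns listed.
two_columns = conn.execute("SELECT store, amount FROM store_sales").fetchall()
# All columns: * expands to every column of the table.
all_columns = conn.execute("SELECT * FROM store_sales").fetchall()
# Unique values: DISTINCT removes repeated store names.
unique_stores = conn.execute(
    "SELECT DISTINCT store FROM store_sales ORDER BY store"
).fetchall()

print(two_columns)
print(all_columns)
print(unique_stores)
```

The first two queries return one row per stored record, while the DISTINCT query returns each store name only once.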
WHERE
Sometimes we want to retrieve only a subset of the data. In those cases, we use the "WHERE" keyword.
The syntax for "WHERE" is:
SELECT "COLUMN_NAME"
FROM "TABLE_NAME"
WHERE "CONDITION"
CONDITION represents how we want the data to be filtered.
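A minimal sketch of filtering with WHERE, assuming SQLite through Python's sqlite3 module and a hypothetical store_sales table. The threshold is passed as a query parameter (the ? placeholder), which is a sqlite3 convention rather than part of the SQL syntax shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?)",
    [("North", 1500), ("South", 250), ("West", 300)],
)

# Only rows for which the condition is true are returned.
high_sales = conn.execute(
    "SELECT store, amount FROM store_sales WHERE amount > ?", (1000,)
).fetchall()
print(high_sales)
```

Only the North row satisfies the condition amount > 1000.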
ORDER BY
When we want to list the results in a particular order (ascending or descending), we use the ORDER BY keyword at the end of the SQL statement.
The syntax for "ORDER BY" is:
SELECT "COLUMN_NAME"
FROM "TABLE_NAME"
WHERE "CONDITION"
ORDER BY "COLUMN_NAME" [ASC | DESC]
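A minimal sketch of sorting with ORDER BY, assuming SQLite through Python's sqlite3 module and a hypothetical store_sales table. ASC is the default direction when neither keyword is given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?)",
    [("North", 1500), ("South", 250), ("West", 300), ("East", 100)],
)

# Filter first, then sort the remaining rows from largest to smallest amount.
descending = conn.execute(
    "SELECT store, amount FROM store_sales WHERE amount > 200 ORDER BY amount DESC"
).fetchall()
# Same filter, sorted in ascending order.
ascending = conn.execute(
    "SELECT store, amount FROM store_sales WHERE amount > 200 ORDER BY amount ASC"
).fetchall()
print(descending)
print(ascending)
```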
MATHEMATICAL FUNCTIONS
SQL has built-in mathematical (aggregate) functions that let us perform calculations over the values in a column. Common functions include:
• SUM
• AVG
• COUNT
• MAX
• MIN
GROUP BY
To summarize, the syntax for GROUP BY is as follows:
SELECT "COLUMN_NAME_1", FUNCTION("COLUMN_NAME_2")
FROM "TABLE_NAME"
WHERE "CONDITION"
GROUP BY "COLUMN_NAME_1"
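A sketch of the aggregate functions and GROUP BY, assuming SQLite through Python's sqlite3 module and a hypothetical store_sales table. The ORDER BY in the second query is an addition to the syntax above, used only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?)",
    [("North", 1500), ("South", 250), ("North", 300)],
)

# Without GROUP BY, each function summarises the whole table in one row.
overall = conn.execute(
    "SELECT COUNT(*), SUM(amount), MIN(amount), MAX(amount) FROM store_sales"
).fetchone()

# With GROUP BY, the function is applied once per distinct store.
per_store = conn.execute(
    "SELECT store, SUM(amount) FROM store_sales GROUP BY store ORDER BY store"
).fetchall()

print(overall)
print(per_store)
```

The two North rows are combined into a single result row whose total is the sum of their amounts.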
ORDER OF SQL COMMANDS
A SELECT statement has the following order:
• SELECT … FROM
• WHERE
• GROUP BY
• HAVING
• ORDER BY
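The clause order can be seen in one query that uses every clause in the list. This is a sketch assuming SQLite through Python's sqlite3 module, with a hypothetical store_sales table. WHERE filters individual rows before grouping, while HAVING filters the groups after the aggregate has been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store_sales (store TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO store_sales VALUES (?, ?)",
    [("North", 1500), ("South", 250), ("North", 300), ("West", 700), ("West", 50)],
)

result = conn.execute(
    """SELECT store, SUM(amount) AS total
       FROM store_sales
       WHERE amount >= 100          -- drop small individual sales
       GROUP BY store               -- one group per store
       HAVING SUM(amount) > 500     -- keep only groups with a large total
       ORDER BY total DESC          -- largest total first"""
).fetchall()
print(result)
```

The West sale of 50 is removed by WHERE before grouping, and the South group is removed by HAVING because its total does not exceed 500.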
INFORMATICA – DATA WAREHOUSING TOOL
Informatica is a powerful ETL tool that is easy to work with and supports all the steps of the Extraction, Transformation and Load process. It has a simple visual interface, like forms in Visual Basic. You just need to drag and drop different objects (known as transformations) and design the process flow for data extraction, transformation and load.
COMPONENTS
Main Components of Informatica PowerCenter
1. Repository: The Repository is the heart of the Informatica tool. It is a kind of data inventory where all the data related to mappings, sources, targets, etc. is kept. All the client tools and the Informatica Server fetch data from the Repository.
2. Informatica PowerCenter Server: All executions take place on the server. It makes physical connections to sources and targets, fetches data, applies the transformations mentioned in the mapping, and loads the data into the target system.
3. Informatica PowerCenter Client Tools: These tools enable a developer to define the transformation process (known as a mapping) in Designer, define run-time properties for a mapping (known as sessions) in Workflow Manager, monitor execution of sessions in Workflow Monitor, and manage the repository in Repository Manager, which is useful for administrators.
FINAL BO REPORTS(BI REPORTS)