cs3431
Advanced Topics: Indexes & Transactions
Instructor: Mohamed Eltabakh [email protected]


Indexes

Why Indexes?

With or without indexes, the query answer should be the same.
Indexes are needed for efficiency and fast access to data.

SELECT *
FROM Student
WHERE sNumber = 76544357;

Assume we have 10,000 students.
Without an index, we check all 10,000 students.
With an index, we can reach that student directly.

Direct Access vs. Sequential Access

SELECT *
FROM Student
WHERE sNumber = 76544357;

Without an index, we check all 10,000 students (sequential access).
With an index, we can reach that student directly (direct access).
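The following Python sketch models the two access paths. It assumes a hypothetical table of 10,000 distinct sNumbers stored in arbitrary order. The index is modeled as a sorted list of (sNumber, row position) pairs. A sequential scan may check every row, but a binary search over 10,000 index entries needs at most 14 probes.

```python
import random

def sequential_scan(table, key):
    """Sequential access: check rows one by one; return (row_position, rows_checked)."""
    for checked, value in enumerate(table, start=1):
        if value == key:
            return checked - 1, checked
    return None, len(table)

def index_lookup(index, key):
    """Direct access: binary search a sorted index of (key, row_position) pairs.

    Returns (row_position, probes)."""
    lo, hi, probes = 0, len(index) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if index[mid][0] == key:
            return index[mid][1], probes
        if index[mid][0] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, probes

# Hypothetical table: 10,000 distinct sNumbers (including 76544357) in arbitrary order.
table = list(range(9_999)) + [76544357]
random.Random(0).shuffle(table)
index = sorted((value, pos) for pos, value in enumerate(table))

pos_scan, checked = sequential_scan(table, 76544357)
pos_idx, probes = index_lookup(index, 76544357)
print(pos_scan == pos_idx, checked, probes)  # same row; probes is at most 14
```

If the student were absent, the scan would check all 10,000 rows before giving up.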

What is an Index?

An index is an auxiliary file that makes it more efficient to search for a record in the data file.
The index is usually specified on one field of the file, although it could be specified on several fields.
The index is stored separately from the base table.
Each table may have multiple indexes.

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
3        Matt   320FL    2

We can create an index on sNumber, and a second index on sName.

Example: Index on sNumber

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   …        …
4        John   …        …
3        Dave   …        …

Index on sNumber (each entry points to its record in Student): 1, 2, 3, 4, 10, 100

The index file is always sorted.
The index is much smaller than the table.
Now any query (equality or range) on sNumber can be answered efficiently by a binary search on the index.
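Here is a minimal sketch of the sNumber index above. It models the index, as an assumption, as a sorted Python list of (sNumber, row position) pairs kept apart from the table. The `bisect` module performs the binary search for both equality and range queries.

```python
import bisect

# Student rows from the slide as (sNumber, sName); the other columns are omitted.
student = [(1, "Dave"), (2, "Greg"), (100, "Matt"), (10, "Matt"), (4, "John"), (3, "Dave")]

# Index on sNumber: sorted (key, row_position) pairs, stored apart from the table.
snumber_index = sorted((row[0], pos) for pos, row in enumerate(student))
keys = [key for key, _ in snumber_index]  # sorted keys, used for the binary search

def range_lookup(lo, hi):
    """Rows with lo <= sNumber <= hi, located by binary search on the index."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return [student[pos] for _, pos in snumber_index[start:end]]

print(range_lookup(4, 4))   # equality: [(4, 'John')]
print(range_lookup(3, 10))  # range: [(3, 'Dave'), (4, 'John'), (10, 'Matt')]
```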

Example: Index on sName

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   …        …
4        John   …        …
3        Dave   …        …

Index on sName: Dave, Dave, Greg, John, Matt, Matt

Duplicate values have duplicate entries in the index.
Now any query (equality or range) on sName can be answered efficiently by a binary search on the index.
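The same idea applies to the sName index. This toy sketch again models the index as a sorted list. Because duplicate entries sit next to each other, one binary search for the first matching entry and one for the last return all matching rows.

```python
import bisect

student = [(1, "Dave"), (2, "Greg"), (100, "Matt"), (10, "Matt"), (4, "John"), (3, "Dave")]

# Secondary index on sName: one entry per row, so duplicate names appear more than once.
sname_index = sorted((row[1], pos) for pos, row in enumerate(student))
names = [name for name, _ in sname_index]
print(names)  # ['Dave', 'Dave', 'Greg', 'John', 'Matt', 'Matt']

def lookup_name(name):
    """All rows with the given sName; their index entries are adjacent."""
    start = bisect.bisect_left(names, name)
    end = bisect.bisect_right(names, name)
    return [student[pos] for _, pos in sname_index[start:end]]

print(lookup_name("Matt"))  # [(100, 'Matt'), (10, 'Matt')]
```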

Creating an Index

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   …        …
4        John   …        …
3        Dave   …        …

Create Index <name> On <tablename>(<colNames>);

Create Index sNumberIndex On Student(sNumber);
Create Index sNameIndex On Student(sName);

The DB system knows how to:
1- create the index
2- decide when and how to use it

Multiple Predicates

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   50WA     …
4        John   50WA     …
3        Dave   200LA    …

Create Index addressIndex On Student(address);

SELECT *
FROM Student
WHERE address = '320FL'
AND sName = 'Dave';

1- The best the DBMS can do is use addressIndex to find the tuples with address = '320FL'
2- From those tuples, check sName = 'Dave'
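This sketch shows the two steps above. It models addressIndex as a Python dictionary from address to row positions. That is a hash-style simplification, not how any particular DBMS stores the index.

```python
from collections import defaultdict

# Student rows from the slide as (sNumber, sName, address); pNum omitted.
student = [(1, "Dave", "320FL"), (2, "Greg", "320FL"), (100, "Matt", "320FL"),
           (10, "Matt", "50WA"), (4, "John", "50WA"), (3, "Dave", "200LA")]

# addressIndex, simplified to a map: address -> positions of the rows having it.
address_index = defaultdict(list)
for pos, row in enumerate(student):
    address_index[row[2]].append(pos)

# Step 1: the index yields only the rows with address = '320FL'.
candidates = [student[pos] for pos in address_index["320FL"]]
# Step 2: the remaining predicate sName = 'Dave' is checked on those rows alone.
result = [row for row in candidates if row[1] == "Dave"]

print(len(candidates), result)  # 3 [(1, 'Dave', '320FL')]
```

Step 2 still reads three candidate rows to return one, which is what the multi-column index on the next slide avoids.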

Multi-Column Indexes

If columns X and Y are frequently queried together (with AND),
and each column has many duplicates,
then consider creating a multi-column index on (X, Y).

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   50WA     …
4        John   50WA     …
3        Dave   200LA    …

Create Index nameAdd On Student(sName, address);

SELECT *
FROM Student
WHERE address = '320FL'
AND sName = 'Dave';

The index directly returns the only matching record (sNumber = 1).
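This sketch models the nameAdd index as a sorted list of ((sName, address), row position) pairs. Python compares tuples column by column, so the list is sorted on sName first and then on address. It is an illustrative model, not a DBMS's actual storage layout.

```python
import bisect

student = [(1, "Dave", "320FL"), (2, "Greg", "320FL"), (100, "Matt", "320FL"),
           (10, "Matt", "50WA"), (4, "John", "50WA"), (3, "Dave", "200LA")]

# nameAdd: sorted ((sName, address), row position) pairs.
name_add = sorted(((row[1], row[2]), pos) for pos, row in enumerate(student))
keys = [key for key, _ in name_add]

def lookup(name, address):
    """Binary search on the combined key (sName, address)."""
    key = (name, address)
    start = bisect.bisect_left(keys, key)
    end = bisect.bisect_right(keys, key)
    return [student[pos] for _, pos in name_add[start:end]]

print(lookup("Dave", "320FL"))  # [(1, 'Dave', '320FL')]
```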

Using an Index

The DBMS automatically figures out which index to use based on the query.

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   …        …
4        John   …        …
3        Dave   …        …

Create Index sNumberIndex On Student(sNumber);
Create Index sNameIndex On Student(sName);

SELECT *
FROM Student
WHERE sNumber = 76544357;

The DBMS automatically uses sNumberIndex.
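One way to watch a real DBMS choose the index is SQLite's EXPLAIN QUERY PLAN, run here through Python's built-in sqlite3 module. Other systems use different commands (for example, EXPLAIN in MySQL or PostgreSQL), and the exact plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sNumber INTEGER, sName TEXT, address TEXT, pNum INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [(1, "Dave", "320FL", 1), (2, "Greg", "320FL", 1), (100, "Matt", "320FL", 2)],
)
conn.execute("CREATE INDEX sNumberIndex ON Student(sNumber)")
conn.execute("CREATE INDEX sNameIndex ON Student(sName)")

# The query never names an index; the planner picks one.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Student WHERE sNumber = 76544357"
).fetchall()
for row in plan:
    print(row[-1])  # the plan detail text; it should mention sNumberIndex
```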

How Do Indexes Work?

Types of Indexes

Primary vs. Secondary
Single-Level vs. Multi-Level (Tree Structure)
Clustered vs. Non-Clustered

Primary vs. Secondary Indexes

An index on the primary key of a relation is called a primary index (only one per relation).
An index on any other column is called a secondary index (there can be many).
In a primary index, all values are unique.
In a secondary index, values may have duplicates.

Student
SSN    sNumber  sName  address  pNum
11111  1        Dave   320FL    1
22222  2        Greg   320FL    1
33333  100      Matt   320FL    2
44444  10       Matt   …        …
55555  4        John   …        …
66666  3        Dave   …        …

The index on SSN is a primary index.
The index on sNumber is a secondary index.
The index on sName is a secondary index.

Single-Level Indexes

The index is a one-level sorted list.
Given a value v to query:
1- Perform a binary search in the index to find v (fast)
2- Follow the link to reach the actual record

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   …        …
4        John   …        …
3        Dave   …        …

Index on sNumber: 1, 2, 3, 4, 10, 100

Multi-Level Index

Build an index on top of the index (this can go multiple levels).
When searching for a value v: find the largest entry ≤ v, and follow its pointer.

Student
sNumber  sName  address  pNum
1        Dave   320FL    1
2        Greg   320FL    1
100      Matt   320FL    2
10       Matt   …        …
4        John   …        …
3        Dave   …        …

1st level (index on sNumber): 1, 2, 3, 4, 10, 100
2nd level: 1, 4 (entry 1 points to the 1st-level entries 1, 2, 3; entry 4 points to 4, 10, 100)
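This sketch implements the search rule for the two-level index above. Grouping the 1st level into blocks of three is an assumption made to match the 2nd-level entries 1 and 4. In a real index, the 1st-level entry found would then point to its record.

```python
import bisect

# 1st level: the sorted sNumber index, split into blocks (block size 3 is assumed).
level1_blocks = [[1, 2, 3], [4, 10, 100]]
# 2nd level: the first key of each 1st-level block.
level2 = [block[0] for block in level1_blocks]  # [1, 4]

def search(v):
    """Return (block number, position in block) for v, or None if v is not indexed."""
    # Find the largest 2nd-level entry <= v and follow its pointer to a 1st-level block.
    b = bisect.bisect_right(level2, v) - 1
    if b < 0:
        return None  # v is smaller than every indexed key
    block = level1_blocks[b]
    i = bisect.bisect_left(block, v)
    if i < len(block) and block[i] == v:
        return b, i
    return None

print(search(10))  # (1, 1)
print(search(5))   # None
```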

Clustered vs. Non-Clustered

Assume there is an index X on column C.
If the records in the table are stored sorted on C, X is a clustered index.
Otherwise, X is a non-clustered index.
The primary index is a clustered index (here, Student is stored sorted on SSN).

Student
SSN    sNumber  sName  address
11111  1        Dave   320FL
22222  2        Greg   320FL
33333  100      Matt   320FL
44444  10       Matt   …
55555  4        John   …
66666  3        Dave   …

Index on SSN (clustered index): 11111, 22222, 33333, 44444, 55555, 66666
Index on sNumber (non-clustered index): 1, 2, 3, 4, 10, 100
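Clustering matters most for range queries. This rough sketch counts how many data pages hold the matching records once the index has located them. The table size (1,000 rows) and page capacity (10 rows per page) are hypothetical values chosen for illustration.

```python
import random

ROWS = 1_000         # hypothetical table size
ROWS_PER_PAGE = 10   # hypothetical page capacity

def pages_for_range(keys_in_storage_order, lo, hi):
    """Page numbers that hold at least one record whose key C is in [lo, hi]."""
    return {pos // ROWS_PER_PAGE
            for pos, key in enumerate(keys_in_storage_order)
            if lo <= key <= hi}

# Clustered: records are stored sorted on C.
clustered = list(range(ROWS))
# Non-clustered: the same keys, stored in an unrelated order.
non_clustered = clustered[:]
random.Random(42).shuffle(non_clustered)

# A range query on C matching 100 records.
print(len(pages_for_range(clustered, 200, 299)))      # 10: the matches are contiguous
print(len(pages_for_range(non_clustered, 200, 299)))  # many more: the matches are scattered
```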

Index Maintenance

Indexes are used in queries, but they need to be maintained when the data change (insert, update, delete).

The DBMS automatically handles index maintenance:
When new records are inserted, their indexed field values are added to the index.
When records are deleted, their values are deleted from the index.
When an indexed value is updated, the old value is deleted from the index and the new value is inserted.

There is a cost for maintaining an index; however, its benefit usually outweighs that cost if the index is used a lot.
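This toy sketch applies the three maintenance rules to an index kept as a sorted list of (key, row id) pairs. The class and method names (SortedIndex, on_insert, on_delete, on_update) are hypothetical and do not come from any DBMS.

```python
import bisect

class SortedIndex:
    """Toy single-column index: a sorted list of (key, row_id) pairs."""

    def __init__(self):
        self.entries = []

    def on_insert(self, key, row_id):
        bisect.insort(self.entries, (key, row_id))  # keeps the list sorted

    def on_delete(self, key, row_id):
        i = bisect.bisect_left(self.entries, (key, row_id))
        if i < len(self.entries) and self.entries[i] == (key, row_id):
            del self.entries[i]

    def on_update(self, old_key, new_key, row_id):
        # Updating an indexed value = delete the old entry, then insert the new one.
        self.on_delete(old_key, row_id)
        self.on_insert(new_key, row_id)

idx = SortedIndex()
for row_id, s_number in enumerate([1, 2, 100, 10, 4, 3]):
    idx.on_insert(s_number, row_id)
idx.on_delete(100, 2)   # the record with sNumber 100 is deleted
idx.on_update(4, 7, 4)  # John's sNumber changes from 4 to 7
print([key for key, _ in idx.entries])  # [1, 2, 3, 7, 10]
```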

Summary of Indexes

Indexes are auxiliary structures for efficient searching and querying.
The query answer is the same with or without an index.
What to index depends on which columns are frequently queried (in the WHERE clause).

Main operations:
Create Index <name> On <tablename>(<colNames>);
Drop Index <name>;

Transactions

What is a Transaction?

A transaction is a set of operations on a database that are treated as one unit: execute all or none.

Transactions have semantics at the application level, for example:
Reserve two seats on a flight
Transfer money from account A to account B
…

What if two users are reserving the same flight seat at the same time?
Transactions solve these problems.

Transactions

By default, each SQL statement is a transaction.
We can change the default behavior:

SQL> Start transaction;
SQL> Insert …
SQL> Update …
SQL> Delete …
SQL> Select …
SQL> Commit | Rollback;

Commit ends the transaction successfully; Rollback cancels the transaction.
All of these statements are now one unit (either all succeed or all fail).
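This sketch runs the money-transfer example as one unit of work, using Python's sqlite3 module with explicit BEGIN / COMMIT / ROLLBACK. The Account table and its CHECK constraint are assumptions for illustration. When a transfer fails, the whole transaction is rolled back, so the credit already made to B is undone too.

```python
import sqlite3

# isolation_level=None turns off the module's implicit transactions,
# so BEGIN / COMMIT / ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO Account VALUES ('A', 100), ('B', 50)")

def transfer(amount):
    """Move `amount` from A to B: both updates take effect, or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = 'B'", (amount,))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = 'A'", (amount,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:  # A's balance would drop below 0
        conn.execute("ROLLBACK")    # also undoes the credit already made to B
        return False

print(transfer(30))   # True
print(transfer(500))  # False
print(conn.execute("SELECT id, balance FROM Account ORDER BY id").fetchall())
# [('A', 70), ('B', 80)]
```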

Transaction Properties

Four main properties:
Atomicity: a transaction is one atomic unit.
Consistency: a transaction ensures the DB is consistent.
Isolation: a transaction executes as if no other transaction were executing simultaneously.
Durability: changes made by a committed transaction must persist.

ACID: Atomicity, Consistency, Isolation, Durability
ACID properties are enforced by the DBMS.

Consistency Issue

Many users may update the data at the same time. How do we ensure the result is consistent?

Update T
Set x = x + 2;

Update T
Set x = x * 3;

x before:                          2, 3, 4, 10, 100
x after one concurrent execution:  12, 15, 14, 32, 302

Wrong, inconsistent data: the first two rows were computed as (x + 2) * 3, but the other three as x * 3 + 2.
What is the right answer?
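One interleaving reproduces the inconsistent result above; it is assumed here for illustration. T1 (x + 2) updates the first two rows, then T2 (x * 3) updates every row, and finally T1 updates the last three.

```python
def add_two(x):      # T1: Update T Set x = x + 2;
    return x + 2

def times_three(x):  # T2: Update T Set x = x * 3;
    return x * 3

x = [2, 3, 4, 10, 100]

# T1 starts: rows 0 and 1 only.
for i in range(0, 2):
    x[i] = add_two(x[i])
# T2 runs completely in between.
for i in range(len(x)):
    x[i] = times_three(x[i])
# T1 finishes: rows 2, 3, 4.
for i in range(2, len(x)):
    x[i] = add_two(x[i])

print(x)  # [12, 15, 14, 32, 302]
```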

Serial Order of Transactions

Given N concurrent transactions T1, T2, …, TN:
A serial order is any permutation of these transactions (N! orders), e.g.
T1, T2, T3, …, TN
T2, T3, T1, …, TN
…

The DBMS ensures that the end result of executing the N transactions concurrently matches one of the serial-order executions.
That is called serializability: the result is as if the transactions were executed in serial order.

Serializable Execution

Given N concurrent transactions T1, T2, …, TN:
The DBMS executes them concurrently (at the same time), but the final effect matches one of the serial-order executions.

Update T
Set x = x + 2;

Update T
Set x = x * 3;

x before:                              2, 3, 4, 10, 100
Serial order "x + 2" then "x * 3":     12, 15, 18, 36, 306
Serial order "x * 3" then "x + 2":     8, 11, 14, 32, 302

Either result is correct.
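This sketch implements the serializability check as these slides define it. It enumerates every serial order (the N! permutations) and accepts a concurrent result only if it matches one of them. Real DBMSs enforce serializability through scheduling rules, such as locking, rather than by comparing outcomes after the fact.

```python
from itertools import permutations

def t1(x):
    return x + 2  # Update T Set x = x + 2;

def t2(x):
    return x * 3  # Update T Set x = x * 3;

initial = [2, 3, 4, 10, 100]

def run_serial(order, rows):
    """Run whole transactions one after another, in the given order."""
    for txn in order:
        rows = [txn(v) for v in rows]
    return rows

# N transactions have N! serial orders; with N = 2 there are two.
serial_results = {tuple(run_serial(order, initial)) for order in permutations([t1, t2])}
print(sorted(serial_results))  # [(8, 11, 14, 32, 302), (12, 15, 18, 36, 306)]

def matches_serial_order(final_rows):
    """True if a concurrent execution's end result equals some serial order's result."""
    return tuple(final_rows) in serial_results

print(matches_serial_order([12, 15, 18, 36, 306]))  # True
print(matches_serial_order([12, 15, 14, 32, 302]))  # False: the inconsistent result
```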

Isolation Levels

Read Uncommitted
Read Committed
Repeatable Read
Serializable

Going down the list, each level gets stronger and avoids more problems.

That is the default in DBMS

1- READ UNCOMMITTED

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   -> color = red
Session 1: BEGIN TRANSACTION
Session 1: update cust set color='blue' where id=500;
Session 2: select color from cust where id=500;   -> color = blue   (dirty read, bad)
Session 1: COMMIT
Session 2: select color from cust where id=500;   -> color = blue
Session 2: COMMIT

Session 2 also reads different values for the same row within one transaction (non-repeatable read, bad).

2- READ COMMITTED

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   -> color = red
Session 1: BEGIN TRANSACTION
Session 1: update cust set color='blue' where id=500;
Session 2: select color from cust where id=500;   -> color = red    (dirty read solved)
Session 1: COMMIT
Session 2: select color from cust where id=500;   -> color = blue   (non-repeatable read, bad)
Session 2: COMMIT

2- READ COMMITTED

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   -> color = red
Session 1: BEGIN TRANSACTION
Session 1: delete from cust where id=500;
Session 2: select color from cust where id=500;   -> color = red
Session 1: COMMIT
Session 2: select color from cust where id=500;   -> no row returned   (phantom, bad)
Session 2: COMMIT

3- REPEATABLE READ

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   -> color = red
Session 1: BEGIN TRANSACTION
Session 1: update cust set color='blue' where id=500;
Session 2: select color from cust where id=500;   -> color = red
Session 1: COMMIT
Session 2: select color from cust where id=500;   -> color = red   (non-repeatable read solved)
Session 2: COMMIT
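This toy model replays the update timeline used for READ UNCOMMITTED, READ COMMITTED and REPEATABLE READ. It returns what session 2 reads at each of its three selects. The model is an assumption for illustration, not how any DBMS implements these levels, and it does not model phantoms.

```python
def visible_value(level, committed, pending, first_read):
    """What session 2's select returns under a given level (toy rules)."""
    if level == "REPEATABLE READ" and first_read is not None:
        return first_read  # keeps returning what was read the first time
    if level == "READ UNCOMMITTED" and pending is not None:
        return pending     # sees session 1's uncommitted write
    return committed       # only committed data is visible

def session2_reads(level):
    """Replay: read, session 1 updates, read, session 1 commits, read."""
    committed, pending, first_read = "red", None, None
    reads = []
    for step in ("before update", "after update", "after commit"):
        if step == "after update":
            pending = "blue"                    # update cust set color='blue' where id=500;
        elif step == "after commit":
            committed, pending = pending, None  # session 1: COMMIT
        value = visible_value(level, committed, pending, first_read)
        if first_read is None:
            first_read = value
        reads.append(value)
    return reads

for level in ("READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ"):
    print(level, session2_reads(level))
# READ UNCOMMITTED ['red', 'blue', 'blue']
# READ COMMITTED ['red', 'red', 'blue']
# REPEATABLE READ ['red', 'red', 'red']
```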

3- REPEATABLE READ

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select color from cust where id=500;   -> color = red
Session 1: BEGIN TRANSACTION
Session 1: delete from cust where id=500;
Session 2: select color from cust where id=500;   -> color = red
Session 1: COMMIT
Session 2: select color from cust where id=500;   -> color = red   (phantom for delete solved)
Session 2: COMMIT

3- REPEATABLE READ

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select id from cust where color='blue';   -> no rows
Session 1: BEGIN TRANSACTION
Session 1: insert into cust(id, color) values (500, 'blue');
Session 2: select id from cust where color='blue';   -> no rows
Session 1: COMMIT
Session 2: select id from cust where color='blue';   -> id = 500   (phantom insert, bad)
Session 2: COMMIT

4- SERIALIZABLE

Time flows downward:
Session 2: BEGIN TRANSACTION
Session 2: select id from cust where color='blue';   -> no rows
Session 1: BEGIN TRANSACTION
Session 1: insert into cust(id, color) values (500, 'blue');
Session 2: select id from cust where color='blue';   -> no rows
Session 1: COMMIT
Session 2: select id from cust where color='blue';   -> no rows   (phantom solved)
Session 2: COMMIT

Summary of Transactions

A transaction is a unit of work in the DBMS.
It is executed all or none.
The DBMS ensures consistency among many concurrent transactions.
The DBMS ensures data persist once committed (using recovery techniques).
Main ACID properties: Atomicity, Consistency, Isolation, Durability

END !!!

Final Exam

Dec. 13, at 8:15am – 9:30am (75 mins)
Closed book, open sheet
Answer in the same exam sheet

Material included:
ERD
SQL (Select, Insert, Update, Delete)
Views, Triggers, Assertions
Cursors, Stored Procedures/Functions

Material excluded:
Relational Model & Algebra
Normalization Theory
ODBC/JDBC
Indexes and Transactions

Friday’s Lecture (Revision + short Quiz)