1 Data Structures and DBMS ( Data Base Management System ) Wen-Nung Tsai [email protected] CSIE Department, NCTU

1

Data Structures and DBMS ( Data Base Management System)

Wen-Nung Tsai

[email protected]

CSIE Department, NCTU

2

Data Representation

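All data is 1s and 0s
How are text characters represented? Recall the yes/no number-guessing game
How are integers represented?
Negative numbers? Sign and magnitude? One's complement? Two's complement?
How are real numbers represented? Normalize first, then split into exponent and fraction (IEEE 754/854)
Double-precision real numbers? (IEEE 754/854)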
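The representations named on this slide can be inspected directly. The following is a minimal Python sketch, not part of the original slides; the helper names `twos_complement`, `ones_complement`, and `double_fields` and the 8-bit width are assumptions chosen for illustration.

```python
import struct


def twos_complement(value, bits=8):
    """Bit pattern of value in two's complement with the given width."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def ones_complement(value, bits=8):
    """Bit pattern of value in one's complement: flip every bit of |value| if negative."""
    mask = (1 << bits) - 1
    pattern = abs(value) & mask
    if value < 0:
        pattern = ~pattern & mask
    return format(pattern, f"0{bits}b")


def double_fields(x):
    """Split an IEEE 754 double into (sign, 11-bit exponent, 52-bit fraction) strings."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


print(twos_complement(-5))   # 11111011
print(ones_complement(-5))   # 11111010
print(double_fields(1.0))    # sign 0, biased exponent 01111111111, fraction all zeros
```

The exponent field is stored with a bias of 1023, which is why 1.0 (exponent 0 after normalization) shows `01111111111`.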

3

Data Structures

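A higher-level view
How do we represent a Stack? Queue? List?
How do we represent a family tree? The organizational structure of a company or school? Tree?
Binary tree, general tree, AVL balanced tree, B-Tree, …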

4

Sorting

Take a set of items, order unknown Set: Linked list, array, file on disk, …

Return ordered set of the items

For instance: Sorting names alphabetically Sorting by height Sorting by weight

5

Sorting Algorithms

Issues of interest: Running time in worst case, other cases Space requirements

In-place algorithms: require constant space The importance of empirical testing

Often Critical to Optimize Sorting

6

Short Example: Bubble Sort

Key: “large unsorted elements bubble up”

Make several sequential passes over the set Every pass, fix local pairs that are not in order

Considered inefficient, but useful as first example

7

(Naïve)BubbleSort(array A, length n)

for i ← n to 2                  // note: going down
    for j ← 2 to i              // loop does swaps in [1..i]
        if A[j-1] > A[j]
            swap(A[j-1], A[j])
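The naive bubble sort pseudocode can be sketched in Python as follows. This is an illustrative translation to 0-based indexing, not the author's code; the comparison counter is an addition so that the count can be checked against the analysis on the later slides.

```python
def naive_bubble_sort(a):
    """Sort list a in place; return the number of comparisons performed."""
    comparisons = 0
    # i is the 0-based index of the last unsorted position, going down.
    for i in range(len(a) - 1, 0, -1):
        for j in range(1, i + 1):
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return comparisons


data = [25, 57, 48, 37, 12, 92, 86, 33]
count = naive_bubble_sort(data)
print(data)   # [12, 25, 33, 37, 48, 57, 86, 92]
print(count)  # 28 = 7 + 6 + ... + 1
```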

8

Bubble sort example(1/3)

Pass 1: 25 57 48 37 12 92 86 33

25 48 57 37 12 92 86 33

25 48 37 57 12 92 86 33

25 48 37 12 57 92 86 33

25 48 37 12 57 86 92 33

25 48 37 12 57 86 33 92

9

Bubble Sort example(2/3)

Pass 2: 25 48 37 12 57 86 33 92

25 37 48 12 57 86 33 92

25 37 12 48 57 86 33 92

25 37 12 48 57 33 86 92

Pass 3: 25 37 12 48 57 33 86 92

25 12 37 48 57 33 86 92

25 12 37 48 33 57 86 92

10

Bubble Sort example(3/3)

Pass 4: 25 12 37 48 33 57 86 92

12 25 37 48 33 57 86 92

12 25 37 33 48 57 86 92

Pass 5: 12 25 37 33 48 57 86 92

12 25 33 37 48 57 86 92

Pass 6: 12 25 33 37 48 57 86 92

Pass 7: 12 25 33 37 48 57 86 92

11

Bubble Sort Features

Worst case: input in reverse order
Passes: n-1
Comparisons in pass k: (n-k)
Total number of comparisons: (n-1)+(n-2)+(n-3)+…+1 = n^2/2 - n/2 = O(n^2)
In-place: no auxiliary storage
Best case (already sorted): still O(n^2)
Many redundant passes with no swaps

12

Big O, Big Ω, Big Θ

Big O: describes an upper bound on complexity

Big Ω: describes a lower bound on complexity

Big Θ ?

See a data structures or algorithms textbook

13

Improved bubble sort: BubbleSort(array A, length n)

i ← n
quit ← false
while (i > 1 AND NOT quit)      // note: going down
    quit ← true
    for j ← 2 to i              // loop does swaps in [1..i]
        if A[j-1] > A[j]
            swap(A[j-1], A[j])  // put max in i
            quit ← false
    i ← i-1
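A minimal Python sketch of the improved bubble sort follows, using 0-based indexing. The pass counter and the name `quit_early` are additions for illustration; they make the early exit visible.

```python
def bubble_sort(a):
    """Sort list a in place; stop as soon as a pass makes no swap. Return passes used."""
    passes = 0
    i = len(a) - 1
    quit_early = False
    while i > 0 and not quit_early:
        quit_early = True          # assume sorted until a swap proves otherwise
        passes += 1
        for j in range(1, i + 1):
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                quit_early = False
        i -= 1
    return passes


example = [25, 57, 48, 37, 12, 92, 86, 33]
print(bubble_sort(example))             # 6: the sixth pass makes no swap and ends the loop
print(bubble_sort([1, 2, 3, 4, 5]))     # 1: one verifying pass on sorted input
```

On the eight-element example the loop ends after six passes instead of seven, and on already sorted input a single pass is enough.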

14

Bubble Sort Features

Best case: already sorted, O(n): one pass over the set, verifying that it is sorted
Total number of exchanges
Best case: none
Worst case: O(n^2), proportional to n squared
Lots of exchanges: a problem with large items

15

Selection Sort

Observation: Bubble sort uses lots of exchanges
These always float the largest unsorted element up
We can save exchanges: move the largest item up only after it has been identified
More passes, but fewer total operations
Same number of comparisons, many fewer exchanges

16

SelectSort(array A, length n)

for i ← n to 2                          // note we are going down
    largest ← A[1]
    largest_index ← 1
    for j ← 1 to i                      // loop finds max in [1..i]
        if A[j] > A[largest_index]
            largest_index ← j
    swap(A[i], A[largest_index])        // put max in i
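The selection sort pseudocode can be sketched in Python like this (0-based indexing). The swap counter is an addition for illustration; it shows that exactly n-1 exchanges are made.

```python
def selection_sort(a):
    """Sort list a in place; return the number of exchanges (always len(a) - 1)."""
    swaps = 0
    for i in range(len(a) - 1, 0, -1):      # i is the last unsorted position
        largest_index = 0
        for j in range(1, i + 1):           # find the max in a[0..i]
            if a[j] > a[largest_index]:
                largest_index = j
        a[i], a[largest_index] = a[largest_index], a[i]   # put max in position i
        swaps += 1
    return swaps


items = [25, 57, 48, 37, 12, 92, 86, 33]
print(selection_sort(items))  # 7
print(items)                  # [12, 25, 33, 37, 48, 57, 86, 92]
```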

17

Selection Sort Example

Initial: 25 57 48 37 12 92 86 33

Pass 1: 25 57 48 37 12 33 86 | 92

Pass 2: 25 57 48 37 12 33 | 86 92

Pass 3: 25 33 48 37 12 | 57 86 92

Pass 4: 25 33 12 37 | 48 57 86 92

Pass 5: 25 33 12 | 37 48 57 86 92

Pass 6: 25 12 | 33 37 48 57 86 92

Pass 7: 12 | 25 33 37 48 57 86 92

18

Selection Sort Summary

Best case: already sorted
Passes: n-1
Comparisons in pass k: (n-k)
Number of comparisons: (n-1)+(n-2)+…+1 = O(n^2)
Worst case: same
In-place: no external storage
Very few exchanges: always n-1 (better than Bubble Sort)

19

Selection Sort vs. Bubble Sort

Selection sort:
More comparisons than bubble sort in the best case: O(n^2)
But fewer exchanges: O(n)
Good for small sets, cheap comparisons, and large items

Bubble sort:
Many exchanges: O(n^2) in the worst case
O(n) on sorted input

20

Insertion Sort

Improve on # of comparisons Key idea: Keep part of array always sorted

As in selection sort, put items in final place As in bubble sort, “bubble” them into place

21

InsertSort(array A, length n)

for i ← 2 to n                  // A[1] is sorted
    y ← A[i]
    j ← i-1
    while (j > 0 AND y < A[j])
        A[j+1] ← A[j]           // shift things up
        j ← j-1
    A[j+1] ← y                  // put A[i] in right place
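As an illustration, the insertion sort pseudocode might look like this in Python, with 0-based indexing. The shift counter is an addition, not part of the slides; it makes the O(n) behaviour on sorted input easy to observe.

```python
def insertion_sort(a):
    """Sort list a in place; return how many elements were shifted up."""
    shifts = 0
    for i in range(1, len(a)):          # a[0] alone is already sorted
        y = a[i]
        j = i - 1
        while j >= 0 and y < a[j]:
            a[j + 1] = a[j]             # shift things up
            j -= 1
            shifts += 1
        a[j + 1] = y                    # put the saved element in its place
    return shifts


values = [25, 57, 48, 37, 12, 92, 86, 33]
insertion_sort(values)
print(values)                            # [12, 25, 33, 37, 48, 57, 86, 92]
print(insertion_sort([1, 2, 3, 4]))      # 0 shifts on sorted input
```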

22

InsertSort example(1/4)

Initial: 25 57 48 37 12 92 86 33

Pass 1: 25 | 57 48 37 12 92 86 33

Pass 2: 25 57 | 48 37 12 92 86 33

25 48 | 57 37 12 92 86 33

Pass 3: 25 48 57 | 37 12 92 86 33

25 48 57 | 57 12 92 86 33

25 48 48 | 57 12 92 86 33

25 37 48 | 57 12 92 86 33

23


Pass 4: 25 37 48 57 | 12 92 86 33

25 37 48 57 | 57 92 86 33

25 37 48 48 | 57 92 86 33

25 37 37 48 | 57 92 86 33

25 25 37 48 | 57 92 86 33

12 25 37 48 | 57 92 86 33

24


Pass 5: 12 25 37 48 57 | 92 86 33

Pass 6: 12 25 37 48 57 92 | 86 33

12 25 37 48 57 86 | 92 33

Pass 7: 12 25 37 48 57 86 92 | 33

12 25 37 48 57 86 92 | 92

12 25 37 48 57 86 86 | 92

12 25 37 48 57 57 86 | 92

12 25 37 48 48 57 86 | 92

25


Pass 7: 12 25 37 48 48 57 86 | 92

12 25 37 37 48 57 86 | 92

12 25 33 37 48 57 86 | 92

26

Insertion Sort Summary

Best case: already sorted, O(n)
Worst case: O(n^2) comparisons
Number of exchanges: O(n^2)
In-place: no external storage
In practice, best for small sets (<30 items)
BubbleSort does more comparisons!
Very efficient on nearly-sorted inputs

27

Niklaus Wirth

Invented the Pascal language
Wrote a book:

Algorithms + Data Structures = Programs

28

Divide-and-Conquer Algorithm

An algorithm design technique: Divide a problem of size N into sub-problems Solve all sub-problems Merge/Combine the sub-solutions

This can result in VERY substantial improvements

29

Small Example: f(n)

1. if ( n == 0 OR n == 1)

2. return 1;

3. else

4. return f(n-1)*n;

What is this function?

30

Small Example: f(n)

1. if ( n == 0 OR n == 1)

2. return 1;

3. else

4. return f(n-1)*n;

What is this function?

Factorial! It computes n!

Tell me (n-1)! and I will tell you n!

31

Recursion

32

Divide-and-Conquer in Sorting

Mergesort: O(n log n) always, but O(n) storage
Quick sort: O(n log n) average, O(n^2) worst case in time
Good in practice when n>30, O(log n) storage
But Quick sort is not a stable sort: items with equal keys may end up in a different relative order than they had before sorting

33

Selection Sort, Insertion Sort, Bubble Sort

Selection Sort does more comparisons! (key)
Selection Sort does fewer exchanges! (data)
Worst case for sorting n data items: all three are O(n^2) (Big-O of n squared)

Quick Sort (average case): O(n * log n) = O(n * log2 n)

34

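How fast is Quick Sort, really?

Consider the number of comparisons
Selection sort: (n-1)+(n-2)+…+2+1 = n(n-1)/2
With n items, n-1 comparisons can put one item into its final position
To the very front (leftmost)? To the very back (rightmost)? Could it land in the middle? Nearly the middle: Quick sort

Suppose there are 100 items
Selection sort: 100*99/2 = 50*99 comparisons
Now apply one pass of the quick sort idea first: 99 comparisons put one item in place and split the rest into two halves
With luck the halves have 49 and 50 items; count both as 50 and finish each with selection sort: 2 * 50 * 49 / 2 = 50*49
Total: 99 + 50*49, about 50*51 comparisons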

35

Quicksort Algorithm

Given an array of n elements:
If the array contains only one element, return
Else
    Pick one element to use as the pivot.
    Partition the elements into two sub-arrays:
        Elements less than or equal to ( <= ) pivot
        Elements greater than ( > ) pivot
    Quicksort the two sub-arrays
    Return results

36

Quick Sort Example

We are given array of n integers to sort:

38 18 10 80 60 50 7 30 98

37

Pick Pivot Element

There are a number of ways to pick the pivot element. In this example, we will use the first element in the array:

38 18 10 80 60 50 7 30 98

Find a way, after n-1 comparisons, to put the small elements in the left half and the large ones in the right half

Quick Sort Example

38

Partitioning Array

Given a pivot, partition the elements of the array such that the resulting array consists of:

1. One sub-array that contains elements <= pivot
2. Another sub-array that contains elements > pivot

The sub-arrays are stored in the original data array.

Partitioning loops through, swapping elements below/above pivot.

Quick Sort Example

39

38 18 10 80 60 50 7 30 98    pivot_index = 0

[0] [1] [2] [3] [4] [5] [6] [7] [8]

too_big_index scans from the left; too_small_index scans from the right.

1. While data[too_big_index] <= data[pivot_index]: ++too_big_index

2. While data[too_small_index] > data[pivot_index]: --too_small_index

3. If too_big_index < too_small_index: swap data[too_big_index] and data[too_small_index]

38 18 10 30 60 50 7 80 98    pivot_index = 0    (after the first swap: 80 and 30)

4. While too_big_index < too_small_index, go to 1.

38 18 10 30 7 50 60 80 98    pivot_index = 0    (after the second swap: 60 and 7)

5. Swap data[too_small_index] and data[pivot_index]

7 18 10 30 38 50 60 80 98    pivot_index = 4    (pivot 38 swapped with 7)

[0] [1] [2] [3] [4] [5] [6] [7] [8]

Quick Sort Example

64

Partition Result

7 18 10 30 38 50 60 80 98

[0] [1] [2] [3] [4] [5] [6] [7] [8]

<= pivot > pivot

Recursion: Quicksort Sub-arrays

Quick Sort Example

Currently pivot = Data[too_small_index]
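Steps 1 to 5 of the partition walk-through, followed by the recursive calls, can be sketched in Python as below. This is an illustrative reading of the slides rather than the author's code: the starting positions of `too_big` and `too_small` and the bounds check in step 1 are assumptions, since the slides only showed the indices graphically.

```python
def partition(data, low, high):
    """Partition data[low..high] around data[low]; return the pivot's final index."""
    pivot_index = low
    too_big = low + 1       # assumed start: just right of the pivot
    too_small = high        # assumed start: right end of the range
    while True:
        # Step 1: skip elements that belong on the left (bounds check added).
        while too_big <= high and data[too_big] <= data[pivot_index]:
            too_big += 1
        # Step 2: skip elements that belong on the right (stops at the pivot at the latest).
        while data[too_small] > data[pivot_index]:
            too_small -= 1
        # Steps 3 and 4: swap the misplaced pair and repeat while the indices have not crossed.
        if too_big < too_small:
            data[too_big], data[too_small] = data[too_small], data[too_big]
        else:
            break
    # Step 5: move the pivot between the two sub-arrays.
    data[pivot_index], data[too_small] = data[too_small], data[pivot_index]
    return too_small


def quicksort(data, low=0, high=None):
    """Sort data in place by recursively partitioning the two sub-arrays."""
    if high is None:
        high = len(data) - 1
    if low >= high:
        return
    p = partition(data, low, high)
    quicksort(data, low, p - 1)
    quicksort(data, p + 1, high)


sample = [38, 18, 10, 80, 60, 50, 7, 30, 98]
print(partition(sample, 0, len(sample) - 1))  # 4
print(sample)                                 # [7, 18, 10, 30, 38, 50, 60, 80, 98]
quicksort(sample)
print(sample)                                 # [7, 10, 18, 30, 38, 50, 60, 80, 98]
```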

65

Recursion: Quicksort Sub-arrays

7 18 10 30 38 50 60 80 98

[0] [1] [2] [3] [4] [5] [6] [7] [8]

<= data[pivot] > data[pivot]

66

Algorithm types

Algorithm types we will consider include:
Simple recursive algorithms
Backtracking algorithms
Greedy algorithms
Divide and Conquer algorithms
Dynamic programming algorithms
Branch and bound algorithms
Brute force algorithms
Randomized algorithms

67

Coin Changing problem

Problem: A dollar amount to reach and a collection of coin amounts to use to get there.
Configuration: A dollar amount yet to return to a customer plus the coins already returned
Objective function: Minimize number of coins returned.
Greedy solution: Always return the largest coin you can
Example 1: Coins are valued $.32, $.08, $.01
Has the greedy-choice property, since no amount over $.32 can be made with a minimum number of coins by omitting a $.32 coin (similarly for amounts over $.08, but under $.32).

Coins in USA: 1¢ 5¢ 10¢ 25¢ 50¢

68

Question?

Suppose there are unlimited quantities of coins of each denomination.

What property should the denominations c1, c2, …, ck have so that the greedy algorithm always yields an optimal solution?

Consider this example:
Example 2: Coins are valued $.30, $.20, $.05, $.01
Does not have greedy-choice property, since $.40 is best made with two $.20's, but the greedy solution will pick three coins (which ones?)

The greedy method cannot always find an optimal solution!
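The two coin examples can be checked with a short Python sketch. Amounts are in cents to avoid floating-point error. The function names are hypothetical, and `min_coins` uses a standard dynamic-programming table as the reference optimum; the slides do not prescribe that method.

```python
def greedy_coins(amount, denominations):
    """Always take the largest coin that still fits; return the coins used."""
    coins = []
    for c in sorted(denominations, reverse=True):
        while amount >= c:
            coins.append(c)
            amount -= c
    return coins


def min_coins(amount, denominations):
    """Fewest coins needed for amount, computed bottom-up over all smaller amounts."""
    inf = float("inf")
    best = [0] + [inf] * amount
    for value in range(1, amount + 1):
        for c in denominations:
            if c <= value and best[value - c] + 1 < best[value]:
                best[value] = best[value - c] + 1
    return best[amount]


print(greedy_coins(40, [32, 8, 1]), min_coins(40, [32, 8, 1]))            # [32, 8] 2
print(greedy_coins(40, [30, 20, 5, 1]), min_coins(40, [30, 20, 5, 1]))    # [30, 5, 5] 2
```

For Example 2 the greedy choice uses three coins where two suffice, which is the failure the slide describes.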

69

Now look at RSA, the algorithm that made big money

RSA and Diffie-Hellman

RSA: Ron Rivest, Adi Shamir and Len Adleman at MIT, in 1977.
RSA is a block cipher
The most widely implemented public-key cryptographic algorithm

Diffie-Hellman, in 1976
Exchange a secret key securely
Based on the difficulty of computing discrete logarithms

70

The RSA Algorithm – Key Generation

1. Select p, q: p and q both prime
2. Calculate $n = p \times q$
3. Calculate $\phi(n) = (p-1)(q-1)$
4. Select integer e: $\gcd(\phi(n), e) = 1$; $1 < e < \phi(n)$
5. Calculate d: $d \equiv e^{-1} \pmod{\phi(n)}$
6. Public key KU = {e, n}
7. Private key KR = {d, n}

71

Example of RSA Algorithm

1. Select p, q: p = 7, q = 17
2. Calculate $n = p \times q = 7 \times 17 = 119$
3. Calculate $\phi(n) = (p-1)(q-1) = 96$
4. Select integer e = 5: $\gcd(\phi(n), e) = 1$; $1 < e < \phi(n)$
5. Calculate d = 77: $d \equiv e^{-1} \pmod{\phi(n)}$, because $77 \times 5 = 385 = 4 \times 96 + 1$
6. Public key KU = {e, n} = {5, 119}
7. Private key KR = {d, n} = {77, 119}
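The key-generation example can be reproduced in a few lines of Python. This is a toy sketch for checking the arithmetic only. The message value `m = 19` is an assumption chosen for illustration, and `pow(e, -1, phi)` requires Python 3.8 or later. Real RSA uses far larger primes and padding.

```python
from math import gcd

p, q = 7, 17
n = p * q                      # 119
phi = (p - 1) * (q - 1)        # 96
e = 5
assert gcd(phi, e) == 1 and 1 < e < phi

d = pow(e, -1, phi)            # modular inverse of e modulo phi(n)
print(d)                       # 77

m = 19                         # hypothetical message, must be smaller than n
c = pow(m, e, n)               # encrypt with the public key {e, n}
recovered = pow(c, d, n)       # decrypt with the private key {d, n}
print(c, recovered)            # 66 19
```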

72

Example of RSA Algorithm (cont.)

73

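Diffie-Hellman Key Exchange

The public parameters, including q, are agreed on by both parties in advance or sent by one party to the other (A sends them to B)

The K computed by each side will be equal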
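A small Python sketch shows why both sides arrive at the same K. All concrete numbers here (the prime `q`, the public `base`, and the private values `x_a` and `x_b`) are hypothetical toy values, not taken from the slides. Real exchanges use much larger parameters.

```python
q = 353            # public prime modulus (toy value, assumed)
base = 3           # public base (toy value, assumed)

x_a = 97           # A's private value, never transmitted
x_b = 233          # B's private value, never transmitted

y_a = pow(base, x_a, q)    # A sends y_a to B
y_b = pow(base, x_b, q)    # B sends y_b to A

k_a = pow(y_b, x_a, q)     # A computes (base^x_b)^x_a mod q
k_b = pow(y_a, x_b, q)     # B computes (base^x_a)^x_b mod q

print(k_a == k_b)          # True: both equal base^(x_a * x_b) mod q
```

An eavesdropper sees only `q`, `base`, `y_a`, and `y_b`; recovering a private value from them is a discrete-logarithm problem.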

74

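The most widespread application: DBMS + Web

MySQL + PHP web programs
MS SQL + ASP web programs

A plain web page is not worth much; it has to be combined with a database to show its power
A database needs a management system (DBMS)
The most common kind of DBMS is the relational database management system (RDBMS)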

75

3-tier architecture

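[Figure: three tiers labeled Browsers, Web Server, and DBMS. Other labels in the diagram: users and administrators; program logic; Material, Knowledge, XML; e2.NCTU.]

The learning platform uses a three-tier (3-tier) system architecture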

76

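Introduction to Databases and DBMS

Data, file, database, database management system (DBMS)
Data is suitably organized
Data is accessed systematically
Most often paired with a 4GL (4th-Generation Language): Structured Query Language (SQL)

Select ALL where credits >= 9 and failed_credits * 2 >= total_credits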

77

Common database terms
DBMS (Database Management System)
Relational DBMS (RDBMS)
SQL (Structured Query Language)
DDL (Data Definition Language)
DML (Data Manipulation Language)
DCL (Data Control Language)
Normalization
ER Model

78

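Common database systems
dBase III (the best-known RDBMS of the DOS era)
Lotus and Excel can also be used to store data; Excel is the data format closest to everyday life
MS Access (a small but capable database management system)
MS SQL (a tool for talking to large databases on the PC)
MySQL, usually paired with PHP = DBMS + Web
PostgreSQL (originated in Berkeley's Ingres project)
RDBMS: Oracle / Informix / Sybase / IBM DB2

On April 24, 2001, IBM announced it would acquire the well-known database vendor Informix for US$1 billion.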

79

Why is the prophet in The Matrix called the Oracle?

Oracle Taiwan (美商甲骨文): http://www.oracle.com/global/tw/
Oracle is the world's second-largest software company after Microsoft and the world's largest supplier of database management systems (RDBMS)

Sybase Taiwan: http://www.sybase.com.tw/
Informix Taiwan: the Taiwan Informix User Group (TWIUG) was formally established on September 21, 2005 with IBM's support.

http://www.iiug.org/twiug/

80

How to access a database

Use a spreadsheet: VisiCalc / Lotus / Excel
Write programs directly (traditional style): dBase II, dBase III / Clipper / FoxPro
Write programs directly (visual programming): VB / Delphi / Access / PowerBuilder / Developer 2000
Use the DBMS through middleware:
Various ODBC / JDBC drivers
Using OLEDB / SQL statements

81

Basic RDBMS operations

Table = Relation =~ file
Three basic operations: Select, Project, Join

Structured Query Language (SQL):
Select ALL where credits >= 9 and failed_credits * 2 >= total_credits

82

Example: An employee database consisting of three relations

83

Another example of the JOIN operation in Relational DataBase
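The three basic relational operations can be tried with Python's built-in `sqlite3` module. The `employee` and `department` tables and their rows below are hypothetical stand-ins; the relations shown in the original slide figures are not available in this transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 2), (12, 'Cid', 2);
""")

# Select: keep only the rows that satisfy a condition.
select_rows = conn.execute(
    "SELECT * FROM employee WHERE dept_id = 2 ORDER BY emp_id").fetchall()

# Project: keep only some of the columns.
project_rows = conn.execute(
    "SELECT name FROM employee ORDER BY emp_id").fetchall()

# Join: combine rows of two tables that agree on a common attribute.
join_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id
    ORDER BY e.emp_id""").fetchall()

print(select_rows)   # [(11, 'Bob', 2), (12, 'Cid', 2)]
print(project_rows)  # [('Ann',), ('Bob',), ('Cid',)]
print(join_rows)     # [('Ann', 'Sales'), ('Bob', 'Research'), ('Cid', 'Research')]
conn.close()
```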

84

Introduction to Database Table Normalization (1/2)

First normal form
A relational database requires every table to satisfy first normal form (1NF):
First normal form: each field may hold only one value.
A relational database requires every table to have a primary key, which identifies each tuple. The primary key can be a single field of the table or a combination of several fields.
A relational database places one more requirement on primary-key fields, entity integrity: no field that is part of the primary key may be Null.

Second normal form
The table must satisfy first normal form, and every field must be fully functionally dependent on the primary key.
Functional dependence: the value of attribute X determines a unique value of attribute Y, written X→Y.
Full functional dependence: if attribute Y is functionally dependent on attribute X but not on any proper subset of attribute X, then attribute Y is fully functionally dependent on attribute X.

85

Introduction to Database Table Normalization (2/2)

The relationship X→Y, Y→Z is called transitive functional dependence: Z is transitively functionally dependent on X.

Third normal form (3NF): the table must satisfy second normal form, and no field may be transitively functionally dependent on the primary key.

Problems solved by third normal form:
Problems when inserting data
Problems when deleting data
Problems when updating data

86

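Applications of Data Mining

Customer credit data: reduce the loss rate from loan risk
Predict which customers are likely to leave
Analyze retail stores' historical sales records and location profiles to choose the best location
Collect and analyze music/movie preference questionnaires
Analyze the best locations for installing ATMs
Analyze the effectiveness of sales promotions (e.g. coupons)
Analyze customer behavior to support decisions (e.g. CRM systems)
Predict the effect of corrosive substances on skin, reducing the cost and time of product (drug or toxin) development and the need for animal testing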

87

Some Data Structure examples

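Linked List: insert/delete a node

Data Structures in Java
Java implements the data structures discussed in standard data structures textbooks as libraries and provides a consistent access interface
See Java's online Reference Manual (available from the Sun or Java web sites, or here: http://www.csie.nctu.edu.tw/~tsaiwn/course/java/ )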

88

Linked List

Flexible structure, providing

• Insertion and removal at any place in O(1) once the position is known, compared to O(n) for array-based list

• Sequential access

• Random access at O(n), compared to O(1) for array-based list

89

Connecting Nodes

creating the nodes

connecting

90

Inserting Nodes

p.link = r

r.link = q

q can be accessed by p.link.link

r

91

Removing Nodes

p q

92

Traversing a List

(null)
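The node operations on the preceding slides (connecting, inserting, removing, traversing) can be sketched in Python as follows. The names `Node`, `insert_after`, `remove_after`, and `traverse` are hypothetical; the `link` field and the p, q, r roles follow the slides.

```python
class Node:
    """One list cell: a data value plus a link to the next node (None at the end)."""

    def __init__(self, data, link=None):
        self.data = data
        self.link = link


def insert_after(p, r):
    """Insert node r between p and p's current successor in O(1)."""
    r.link = p.link     # r now points at q, the old successor of p
    p.link = r          # p now points at r


def remove_after(p):
    """Unlink and return the node that follows p, or None if p is the last node."""
    removed = p.link
    if removed is not None:
        p.link = removed.link
        removed.link = None
    return removed


def traverse(head):
    """Follow links from head until (null), collecting the data values."""
    items = []
    node = head
    while node is not None:
        items.append(node.data)
        node = node.link
    return items


q = Node("q")
p = Node("p", q)                 # connecting: p -> q
insert_after(p, Node("r"))       # p -> r -> q
print(traverse(p))               # ['p', 'r', 'q']
print(p.link.link is q)          # True: q is reachable as p.link.link
remove_after(p)                  # removes r
print(traverse(p))               # ['p', 'q']
```

Setting `r.link` before `p.link` matters when q is reachable only through p; reversing the two assignments would lose the rest of the list.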

93

Double Linked Lists

Single linked list: each node holds data and a successor link; the last successor is (null)

Double linked list: each node holds data, a successor link, and a predecessor link; the first predecessor and the last successor are (null)

94

Data Structures in JAVA

Let's see what JAVA has to offer:

95

The Collection Hierarchy

Collection: top interface, specifying requirements for all collections

96

Collection Interface (1/2)

97

Collection Interface (2/2)

!

98

Iterator Interface

Purpose: sequential access to collection elements

Note: the technique used so far, accessing elements sequentially by index, is not reasonable in general (why?)

Methods:
