
Assignment 2: (Due at 10:30 a.m. on Friday of Week 10)

Question 1 (Given in Tutorial 5)

Question 2 (Given in Tutorial 7)

• If you do Question 1 only, you get 60 points.

• If you do Question 2 only, you get 90 points.

• If you correctly do both Question 1 and Question 2, you get 100 points.

• Bonus: 5 points will be given to those who write a Java program for the Huffman code algorithm.

Review of Lecture 1 to Lecture 6

Lecture 1: Some concepts: pseudocode and abstract data types (ADTs). (Page 60 of the textbook.)

Stack: give the ADT of a stack (slide 11 of Lecture 1).

The interface is on slide 19. (Q: Is the interface equivalent to the ADT? Not really. The ADT also needs the rule that the insertion and deletion methods follow, i.e., first in, last out.)

Applications: parentheses matching
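The parentheses-matching application can be sketched with a stack as follows. This is a minimal illustration, not the code from the lecture: the class name ParenMatch and the method isMatched are hypothetical, and java.util.ArrayDeque stands in for whatever stack class the course uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ParenMatch {
    // Returns true if every opening bracket in s is closed by a bracket of
    // the same kind, in the right order. Other characters are ignored.
    static boolean isMatched(String s) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char ch : s.toCharArray()) {
            if (ch == '(' || ch == '[' || ch == '{') {
                stack.push(ch);                      // remember the opener
            } else if (ch == ')' || ch == ']' || ch == '}') {
                if (stack.isEmpty()) return false;   // closer with no opener
                char open = stack.pop();             // most recent opener (last in, first out)
                boolean ok = (open == '(' && ch == ')')
                          || (open == '[' && ch == ']')
                          || (open == '{' && ch == '}');
                if (!ok) return false;               // wrong kind of closer
            }
        }
        return stack.isEmpty();                      // leftover openers are unmatched
    }

    public static void main(String[] args) {
        System.out.println(isMatched("(a+[b*c])"));
        System.out.println(isMatched("(]"));
    }
}
```

The stack is the natural choice here because the most recently opened bracket must be the first one closed.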

Lecture 2: Linked list

Singly linked list

Doubly linked list

Just know how to set up a list. (Assignment 1)

Lecture 3: Analysis of Algorithms (important)

Primitive operations

Count number of primitive operations for an algorithm

Big-O notation: 2n is O(n); 5n^2 + 10n + 11 is O(n^2).
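As a worked check of the quadratic example, recall the usual definition: f(n) is O(g(n)) if there are constants c > 0 and n_0 >= 1 such that f(n) <= c g(n) for all n >= n_0. The constants below are just one convenient choice, not the only one.

```latex
5n^2 + 10n + 11 \;\le\; 5n^2 + 10n^2 + 11n^2 \;=\; 26n^2 \qquad \text{for all } n \ge 1
```

So c = 26 and n_0 = 1 witness that 5n^2 + 10n + 11 is O(n^2). The same argument with c = 2 and n_0 = 1 shows that 2n is O(n).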

Lecture 4: Tree

Definition of tree (slide 7)

Tree terminology: root, internal node, external node (leaf), depth of a node, height of a node.

Inorder traversal of a binary tree

Tree ADT, slide 11, Binary tree ADT, slide 17

In terms of programming, understand TreeInExample1.java. (If this is tested in the exam, the Java code will be given. I do not want to give long code.)

Lecture 5: More on Trees

Linked Structure for Binary Tree.

Just understand the node:

Preorder traversal for any tree

Postorder traversal for any tree

Array-Based representation of binary tree (slide 9)

Algorithms for Depth() and Height(): slides 12-15.
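The traversals and the depth and height computations reviewed above can be sketched on a linked binary tree as follows. This is an illustration under assumptions, not the lecture's code: the class BinaryTreeSketch, its Node fields and the helper link are hypothetical names, and the sketch is restricted to binary trees even though preorder and postorder apply to any tree.

```java
class BinaryTreeSketch {
    // A linked node: an element plus links to parent and children.
    static class Node {
        char element;
        Node parent, left, right;
        Node(char element) { this.element = element; }
    }

    // Attaches left and right as the children of parent.
    static Node link(Node parent, Node left, Node right) {
        parent.left = left;
        parent.right = right;
        if (left != null) left.parent = parent;
        if (right != null) right.parent = parent;
        return parent;
    }

    // Preorder: visit the node, then its left subtree, then its right subtree.
    static void preorder(Node v, StringBuilder out) {
        if (v == null) return;
        out.append(v.element);
        preorder(v.left, out);
        preorder(v.right, out);
    }

    // Inorder: left subtree, then the node, then right subtree.
    static void inorder(Node v, StringBuilder out) {
        if (v == null) return;
        inorder(v.left, out);
        out.append(v.element);
        inorder(v.right, out);
    }

    // Postorder: both subtrees first, then the node.
    static void postorder(Node v, StringBuilder out) {
        if (v == null) return;
        postorder(v.left, out);
        postorder(v.right, out);
        out.append(v.element);
    }

    // Depth of v: the number of ancestors of v, not counting v itself.
    static int depth(Node v) {
        return v.parent == null ? 0 : 1 + depth(v.parent);
    }

    // Height of v: 0 for a leaf, otherwise 1 + the largest child height.
    // An empty subtree is given height -1 so that a leaf comes out as 0.
    static int height(Node v) {
        if (v == null) return -1;
        return 1 + Math.max(height(v.left), height(v.right));
    }
}
```

For a tree with root a, children b and c, and d and e as the children of b, the three traversals give abdec (preorder), dbeac (inorder) and debca (postorder).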

Lecture 6: Priority Queue (Heaps)

Priority Queue ADT (slide 2)

Heap:

1. Definition of a heap

2. What does “heap-order” mean?

3. Complete binary tree (what is a complete binary tree?)

4. The height of a complete binary tree with n nodes is O(log n).

5. Insert a node into a heap: running time O(log n).

6. removeMin: remove a node with the minimum key. Running time O(log n).

Array-based complete binary tree representation.
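The heap operations listed above can be sketched on the array-based representation as follows. This is a minimal sketch under assumptions, not the lecture's implementation: the class name ArrayMinHeap is hypothetical, keys are plain ints, and the root is stored at index 0 so that the children of index i are 2i+1 and 2i+2 (the slides may number positions from 1 instead).

```java
import java.util.ArrayList;
import java.util.List;

class ArrayMinHeap {
    // keys.get(0) is the root; the children of index i are 2i+1 and 2i+2.
    private final List<Integer> keys = new ArrayList<>();

    int size() { return keys.size(); }

    // Insert: add the key as the new last node, then bubble it up.
    // The loop climbs at most the height of the tree, so it is O(log n).
    void insert(int key) {
        keys.add(key);
        int i = keys.size() - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (keys.get(parent) <= keys.get(i)) break;   // heap-order holds
            swap(i, parent);
            i = parent;
        }
    }

    // removeMin: the minimum is at the root. Move the last node to the root,
    // then bubble it down by swapping with the smaller child. Also O(log n).
    int removeMin() {
        if (keys.isEmpty()) throw new IllegalStateException("heap is empty");
        int min = keys.get(0);
        int last = keys.remove(keys.size() - 1);
        if (!keys.isEmpty()) {
            keys.set(0, last);
            int i = 0;
            while (true) {
                int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
                if (left < keys.size() && keys.get(left) < keys.get(smallest)) smallest = left;
                if (right < keys.size() && keys.get(right) < keys.get(smallest)) smallest = right;
                if (smallest == i) break;                 // heap-order restored
                swap(i, smallest);
                i = smallest;
            }
        }
        return min;
    }

    private void swap(int i, int j) {
        int t = keys.get(i);
        keys.set(i, keys.get(j));
        keys.set(j, t);
    }
}
```

Because the tree is complete, no links are needed: parent and child positions are computed from the index alone.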

Show a sample exam paper.


Exercise:

Give some trees and ask students to give the InOrder, PostOrder and PreOrder traversals.

Question 2 of Tutorial 6: using PreOrder.

Given a complete binary tree, write the array representation.

Given an array, draw the complete binary tree.

Given a heap, show the steps of removeMin.

Given a heap, show the steps to insert a node with key 3. (Do it for the tree version, then for the array version.)

Linear time construction of a heap.
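One common way to build a heap in linear time is bottom-up construction: treat the array as a complete binary tree and apply down-heap to every internal node, starting from the last one and ending at the root. The sketch below assumes that this is the construction meant here, and it uses 0-based indexing (children of i at 2i+1 and 2i+2). The names BottomUpHeap, buildHeap, downHeap and isHeap are hypothetical.

```java
class BottomUpHeap {
    // Rearranges a in place into a min-heap. Nodes at index >= a.length / 2
    // are leaves and already satisfy heap-order on their own.
    static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            downHeap(a, i);
        }
    }

    // Moves a[i] down until both children are at least as large as it.
    static void downHeap(int[] a, int i) {
        int n = a.length;
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, smallest = i;
            if (left < n && a[left] < a[smallest]) smallest = left;
            if (right < n && a[right] < a[smallest]) smallest = right;
            if (smallest == i) return;
            int t = a[i];
            a[i] = a[smallest];
            a[smallest] = t;
            i = smallest;
        }
    }

    // Checks heap-order: every non-root key is >= the key of its parent.
    static boolean isHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[(i - 1) / 2] > a[i]) return false;
        }
        return true;
    }
}
```

The total work is O(n) rather than O(n log n) because most nodes are near the bottom of the tree, where down-heap has very little distance to travel.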

Hash Tables 9

Huffman codes (Page 565 Chapter 12.4)

Binary character code: each character is represented by a unique binary string.

A data file can be coded in two ways:

                       a    b    c    d    e     f
frequency (%)          45   13   12   16   9     5
fixed-length code      000  001  010  011  100   101
variable-length code   0    101  100  111  1101  1100

For a file of 100 characters, the first way needs 100 × 3 = 300 bits. The second way needs

45×1 + 13×3 + 12×3 + 16×3 + 9×4 + 5×4 = 224 bits.
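The bit counts for the two codes can be checked mechanically by summing frequency times codeword length over the characters. The sketch below does this for a file of 100 characters with the frequencies from the table; the class name CodeCost and the method totalBits are hypothetical.

```java
class CodeCost {
    // Total bits = sum over characters of (occurrences * codeword length).
    static int totalBits(int[] freq, String[] codewords) {
        int bits = 0;
        for (int i = 0; i < freq.length; i++) {
            bits += freq[i] * codewords[i].length();
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] freq = {45, 13, 12, 16, 9, 5};   // a, b, c, d, e, f
        String[] fixed = {"000", "001", "010", "011", "100", "101"};
        String[] variable = {"0", "101", "100", "111", "1101", "1100"};
        System.out.println(totalBits(freq, fixed));     // prints 300
        System.out.println(totalBits(freq, variable));  // prints 224
    }
}
```

The variable-length code wins because the most frequent character, a, gets the shortest codeword.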

Variable-length code

A variable-length code needs some care to read. Take 001011101 with codewords a=0, b=00, c=01, d=11. Where to cut? 00 can be read as either aa or b.

Prefixes of 0011: 0, 00, 001, and 0011.

Prefix codes: no codeword is a prefix of any other codeword (prefix-free).

Prefix codes are simple to encode and decode.
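The prefix-free condition can be tested directly by comparing every pair of codewords. The sketch below is a simple quadratic check, adequate for small code tables; the names PrefixCheck and isPrefixFree are hypothetical.

```java
class PrefixCheck {
    // Returns true if no codeword is a prefix of a different codeword.
    // Two equal codewords also count as a violation, since each codeword
    // must stand for exactly one character.
    static boolean isPrefixFree(String[] codewords) {
        for (int i = 0; i < codewords.length; i++) {
            for (int j = 0; j < codewords.length; j++) {
                if (i != j && codewords[j].startsWith(codewords[i])) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The ambiguous code from the example: 0 is a prefix of 00 and 01.
        System.out.println(isPrefixFree(new String[] {"0", "00", "01", "11"}));
        // The variable-length code from the table.
        System.out.println(isPrefixFree(new String[] {"0", "101", "100", "111", "1101", "1100"}));
    }
}
```

The first call reports false and the second reports true, which matches the observation that 00 can be cut in two ways under the first code.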

Using the codewords in the table to encode and decode

Encode: abc = 0.101.100 = 0101100 (just concatenate the codewords).

Decode: 001011101 = 0.0.101.1101 = aabe

To decode, use the right-hand binary tree below.

[Figure: two binary trees with leaves a:45, b:13, c:12, d:16, e:9, f:5 and edges labeled 0 and 1. Left: tree for the fixed-length codewords. Right: tree for the variable-length codewords. Each internal node is labeled with the total frequency of the leaves below it, with 100 at the root.]
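Decoding with a prefix code amounts to walking down the code tree one bit at a time and emitting a character whenever a leaf is reached. The sketch below builds the tree from a codeword table and then decodes; it is an illustration under assumptions, with hypothetical names (PrefixCodec, buildTree, tableCode), and it represents bit strings as Java Strings of '0' and '1' characters for readability.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixCodec {
    // Tree node: internal nodes have symbol == null, leaves carry a character.
    static class Node {
        Node zero, one;
        Character symbol;
    }

    // The variable-length code from the table.
    static Map<Character, String> tableCode() {
        Map<Character, String> code = new LinkedHashMap<>();
        code.put('a', "0");
        code.put('b', "101");
        code.put('c', "100");
        code.put('d', "111");
        code.put('e', "1101");
        code.put('f', "1100");
        return code;
    }

    // Builds the code tree: follow each codeword from the root, creating
    // nodes as needed, and mark the final node as the leaf for that character.
    static Node buildTree(Map<Character, String> code) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : code.entrySet()) {
            Node v = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (v.zero == null) v.zero = new Node();
                    v = v.zero;
                } else {
                    if (v.one == null) v.one = new Node();
                    v = v.one;
                }
            }
            v.symbol = e.getKey();
        }
        return root;
    }

    // Encoding just concatenates the codewords.
    static String encode(String text, Map<Character, String> code) {
        StringBuilder out = new StringBuilder();
        for (char ch : text.toCharArray()) out.append(code.get(ch));
        return out.toString();
    }

    // Decoding: walk from the root; on reaching a leaf, emit and restart.
    static String decode(String bits, Node root) {
        StringBuilder out = new StringBuilder();
        Node v = root;
        for (char bit : bits.toCharArray()) {
            v = (bit == '0') ? v.zero : v.one;
            if (v == null) throw new IllegalArgumentException("not a valid encoding");
            if (v.symbol != null) {
                out.append(v.symbol.charValue());
                v = root;
            }
        }
        if (v != root) throw new IllegalArgumentException("input ends inside a codeword");
        return out.toString();
    }
}
```

Because the code is prefix-free, the first leaf reached is always the right place to cut, so no lookahead or backtracking is needed.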

Binary tree

In the tree of an optimal code, every nonleaf node has two children. The fixed-length code in our example is not optimal: its tree has a nonleaf node with only one child.

The total number of bits required to encode a file is

$$B(T) = \sum_{c \in C} f(c)\, d_T(c)$$

where

f(c): the frequency (number of occurrences) of c in the file

d_T(c): the depth of c’s leaf in the tree T

Constructing an optimal code

Formal definition of the problem:

Input: a set of characters C = {c1, c2, …, cn}, where each c ∈ C has frequency f[c].

Output: a binary tree representing codewords so that the total number of bits required for the file is minimized.

Huffman proposed a greedy algorithm to solve the problem.

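[Figure: steps (a) to (f) of Huffman's algorithm on the example frequencies. The two children of each merged node hang from edges labeled 0 and 1.]

(a) Initial set of nodes: f:5, e:9, c:12, b:13, d:16, a:45.

(b) f:5 and e:9 are merged into a node of frequency 14.

(c) c:12 and b:13 are merged into a node of frequency 25.

(d) 14 and d:16 are merged into a node of frequency 30.

(e) 25 and 30 are merged into a node of frequency 55.

(f) a:45 and 55 are merged into the root, of frequency 100.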

HUFFMAN(C)

1 n:=|C|
2 Q:=C
3 for i:=1 to n-1 do
4     z:=ALLOCATE_NODE()
5     x:=left[z]:=EXTRACT_MIN(Q)
6     y:=right[z]:=EXTRACT_MIN(Q)
7     f[z]:=f[x]+f[y]
8     INSERT(Q,z)
9 return EXTRACT_MIN(Q)
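The pseudocode above translates fairly directly into Java if java.util.PriorityQueue plays the role of Q. The following is a minimal sketch, not a reference solution for the assignment: the names HuffmanSketch, build and collect are hypothetical, frequencies are ints, and at least two characters are assumed. When two nodes have equal frequency, PriorityQueue may return either one first, so codewords can differ between runs of equally good trees.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class HuffmanSketch {
    // A leaf carries a character; an internal node has symbol == null.
    static class Node {
        final int freq;
        final Character symbol;
        final Node left, right;

        Node(char symbol, int freq) {            // leaf
            this.symbol = symbol;
            this.freq = freq;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {            // internal node z
            this.symbol = null;
            this.freq = left.freq + right.freq;  // f[z] := f[x] + f[y]
            this.left = left;
            this.right = right;
        }
    }

    // Builds the code tree, mirroring HUFFMAN(C) step by step.
    static Node build(Map<Character, Integer> freq) {
        PriorityQueue<Node> q =
            new PriorityQueue<>((u, v) -> Integer.compare(u.freq, v.freq));
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            q.add(new Node(e.getKey(), e.getValue()));   // Q := C
        }
        int n = q.size();
        for (int i = 1; i <= n - 1; i++) {
            Node x = q.poll();                   // EXTRACT_MIN becomes the left child
            Node y = q.poll();                   // EXTRACT_MIN becomes the right child
            q.add(new Node(x, y));               // INSERT(Q, z)
        }
        return q.poll();                         // the root of the tree
    }

    // Reads the codewords off the tree: 0 for a left edge, 1 for a right edge.
    static void collect(Node v, String prefix, Map<Character, String> out) {
        if (v.symbol != null) {
            out.put(v.symbol, prefix);
            return;
        }
        collect(v.left, prefix + "0", out);
        collect(v.right, prefix + "1", out);
    }

    public static void main(String[] args) {
        Map<Character, Integer> freq = new HashMap<>();
        freq.put('a', 45); freq.put('b', 13); freq.put('c', 12);
        freq.put('d', 16); freq.put('e', 9);  freq.put('f', 5);
        Map<Character, String> codes = new HashMap<>();
        collect(build(freq), "", codes);
        System.out.println(codes);
    }
}
```

On the six-character example there are no ties at any step, so the sketch reproduces the variable-length codewords from the table: a=0, b=101, c=100, d=111, e=1101, f=1100.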

The Huffman Algorithm

This algorithm builds the tree T corresponding to the optimal code in a bottom-up manner.

C is a set of n characters, and each character c in C has a defined frequency f[c].

Q is a priority queue, keyed on f, used to identify the two least-frequent characters to merge together.

The result of the merger is a new object (internal node) whose frequency is the sum of the frequencies of the two merged objects.

Time complexity

Lines 4-8 are executed n-1 times. Each heap operation in Lines 4-8 takes O(lg n) time. Total time required is O(n lg n).

Note: The details of the heap operations will not be tested. The time complexity O(n lg n) should be remembered.

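Another example:

Initial nodes: e:4, a:6, c:6, b:9, d:11.

[Figure: the tree after each merge, with the two edges below every merged node labeled 0 and 1.]

Step 1: e:4 and a:6 are merged into a node of frequency 10. Remaining: c:6, b:9, 10, d:11.

Step 2: c:6 and b:9 are merged into a node of frequency 15. Remaining: 10, d:11, 15.

Step 3: 10 and d:11 are merged into a node of frequency 21. Remaining: 15, 21.

Step 4: 15 and 21 are merged into the root, of frequency 36.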
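As a check on this example, the total number of bits B(T) for the final tree can be worked out from the leaf depths. The depths used below are read off the tree as constructed here: e and a sit below the node of frequency 10 at depth 3, while c, b and d are at depth 2.

```latex
B(T) = \underbrace{4 \cdot 3}_{e} + \underbrace{6 \cdot 3}_{a} + \underbrace{6 \cdot 2}_{c} + \underbrace{9 \cdot 2}_{b} + \underbrace{11 \cdot 2}_{d} = 12 + 18 + 12 + 18 + 22 = 82
```

The same total is the sum of the internal-node frequencies, 10 + 15 + 21 + 36 = 82, because each merge adds one bit to every character below the new node.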

Summary

Huffman code: given a set of characters and their frequencies, you should be able to construct the binary tree for the Huffman code. Proofs of why this algorithm gives an optimal solution are not required.