Approximating Optimal Binary Decision Trees
Brent Heeringa (joint work with Micah Adler)
18 November 2005


Question: I am thinking of a computer scientist. Which one?

Rule: Ask YES/NO questions from a finite set Q

Q1: MIT Professor? YES
Q2: Author of a popular CS text? YES
Q3: Inventor of RSA? NO

[Figure: the questions asked so far form a path in a binary tree. Each internal node is a question (Q1, Q2, ...) with a YES branch and a NO branch. The images on the slides are not recoverable.]

Decision Tree Problem (DT)

Input: A set X = {x1,…,xn} of binary strings (called items)
  Each item has exactly m bits
  E.g. if m=5 then xi might be 10011

Solution: A binary tree with n leaves
  Each internal node indexes some bit k and partitions the items into two groups (bit k = 0 and bit k = 1)
  Each item is a leaf (n leaves total)

Cost: Total sum of leaf depths
Optimal Solution: DT with minimum cost

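Example:

Items: 11110 10111 11010 01101
(bits are numbered from the left, starting at 1)

bit 3
  0: 11010 (depth 1)
  1: bit 1
       0: 01101 (depth 2)
       1: bit 2
            0: 10111 (depth 3)
            1: 11110 (depth 3)

Cost: 1 + 2 + 3 + 3 = 9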

Example (same tree, counting the items at each internal node):

The node testing bit 3 holds 4 items, the node testing bit 1 holds 3, and the node testing bit 2 holds 2

4 + 3 + 2 = 9

Example:

Items: 11110 10111 11010 01101

bit 5
  0: bit 3
       0: 11010 (depth 2)
       1: 11110 (depth 2)
  1: bit 1
       0: 01101 (depth 2)
       1: 10111 (depth 2)

Cost: 2 + 2 + 2 + 2 = 8

OPTIMAL!

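Alternative Cost

Sum the number of items at each internal node. For the optimal tree on 11110 10111 11010 01101, the root (bit 5) holds 4 items and its children (bit 3 and bit 1) hold 2 items each:

4 + 2 + 2 = 8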
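The two cost definitions can be checked against each other in a few lines of Python. This is a minimal sketch, not code from the talk: the `Leaf` and `Node` classes and the function names are illustrative, and it assumes bits are numbered from 1 at the left, which is the convention that reproduces the costs 9 and 8 on the slides.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    item: str


@dataclass
class Node:
    bit: int       # 1-based index of the tested bit, counted from the left (assumed)
    zero: "Tree"   # subtree for items whose tested bit is 0
    one: "Tree"    # subtree for items whose tested bit is 1


Tree = Union[Leaf, Node]


def leaves(tree: Tree) -> List[str]:
    """Return the items stored at the leaves below this node."""
    if isinstance(tree, Leaf):
        return [tree.item]
    return leaves(tree.zero) + leaves(tree.one)


def cost_by_depth(tree: Tree, depth: int = 0) -> int:
    """Cost as defined in the problem statement: total sum of leaf depths."""
    if isinstance(tree, Leaf):
        return depth
    return cost_by_depth(tree.zero, depth + 1) + cost_by_depth(tree.one, depth + 1)


def cost_by_node_size(tree: Tree) -> int:
    """Alternative cost: sum of the number of items at each internal node."""
    if isinstance(tree, Leaf):
        return 0
    return len(leaves(tree)) + cost_by_node_size(tree.zero) + cost_by_node_size(tree.one)


# The two example trees on the items 11110, 10111, 11010, 01101.
first = Node(3, Leaf("11010"),
             Node(1, Leaf("01101"),
                  Node(2, Leaf("10111"), Leaf("11110"))))
second = Node(5,
              Node(3, Leaf("11010"), Leaf("11110")),
              Node(1, Leaf("01101"), Leaf("10111")))

print(cost_by_depth(first), cost_by_node_size(first))    # 9 9
print(cost_by_depth(second), cost_by_node_size(second))  # 8 8
```

The two functions agree because an item at depth d is counted once at each of the d internal nodes above it.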

Decision Trees

Decision Trees (DT) model many natural tasks in
  Medical Diagnosis
  Experiment Design

DT is the 20-questions problem
DT is NP-Complete
  Reduction from Set Cover (Exact Cover by 3 Sets) [Hyafil and Rivest - 1976]

Outline

Problem Introduction
A Greedy Approximation Algorithm for DT
An Analysis of the Greedy Algorithm
  (ln n + 1)-approximation
Other Results and Open Problems

A Greedy DT Algorithm

IDEA: Always choose the bit which most evenly partitions the items

Items: 01101 10001 11101 11110 10111 11010

bit 4
  0: bit 1
       0: 01101
       1: bit 2
            0: 10001
            1: 11101
  1: bit 2
       0: 10111
       1: bit 3
            0: 11010
            1: 11110

A Greedy DT Algorithm

IDEA: Always choose the bit which most evenly partitions the items

GREEDY-DT(X)
  If X = Ø
    Return NIL
  Else if |X| = 1
    Return a leaf holding the single item of X
  Else
    k ← index of the bit most evenly separating X
    T ← new tree node testing bit k
    T[left] ← GREEDY-DT({x ∈ X | x(k) = 0})
    T[right] ← GREEDY-DT({x ∈ X | x(k) = 1})
    Return T
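A minimal Python sketch of GREEDY-DT follows. It assumes the items are distinct, equal-length strings of '0' and '1', that bits are numbered from 1 at the left, and that ties are broken toward the lowest bit index (the slides do not say how ties are broken). The tuple representation and the names `greedy_dt` and `cost` are illustrative choices, not part of the talk.

```python
from typing import List, Optional, Tuple, Union

# A tree is either an item (a leaf) or a tuple (bit, zero_subtree, one_subtree).
Tree = Union[str, Tuple[int, "Tree", "Tree"]]


def greedy_dt(items: List[str]) -> Optional[Tree]:
    """Build a decision tree by always splitting on the most balanced bit."""
    if not items:
        return None
    if len(items) == 1:
        return items[0]

    n = len(items)
    m = len(items[0])

    def ones(k: int) -> int:
        return sum(x[k] == "1" for x in items)

    # Only bits that actually separate the items are candidates.
    splitting = [k for k in range(m) if 0 < ones(k) < n]
    if not splitting:
        raise ValueError("items must be distinct")

    # Most even split: minimise the size difference between the two sides.
    k = min(splitting, key=lambda j: abs(n - 2 * ones(j)))

    zero = [x for x in items if x[k] == "0"]
    one = [x for x in items if x[k] == "1"]
    return (k + 1, greedy_dt(zero), greedy_dt(one))  # report 1-based bit index


def cost(tree: Optional[Tree], depth: int = 0) -> int:
    """Total sum of leaf depths."""
    if tree is None:
        return 0
    if isinstance(tree, str):
        return depth
    _, zero, one = tree
    return cost(zero, depth + 1) + cost(one, depth + 1)


items = ["01101", "10001", "11101", "11110", "10111", "11010"]
tree = greedy_dt(items)
print(tree)
print(cost(tree))
```

On the six items from the slides, this tie-breaking rule splits on bit 4 at the root, then on bits 1 and 2 on the 0 side and bits 2 and 3 on the 1 side, for a total cost of 16.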

Optimal vs. Greedy

[Figure: the optimal tree T* and the greedy tree T on the same eight items a, b, c, d, e, f, g, h. The tree shapes are not recoverable.]

Cost(T*) = 25
Cost(T) = 26
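For small inputs the gap between the greedy and optimal costs can be measured by exhaustive search. The sketch below is an illustration under stated assumptions, not the method of the talk: it tries every splitting bit at every node (exponential time in general), uses the sum of internal node sizes as the cost, and the names `optimal_cost` and `greedy_cost` are hypothetical. The eight items behind the 26 versus 25 example are not recoverable from the slides, so the demo reuses the bit strings from the earlier examples.

```python
from functools import lru_cache
from typing import FrozenSet, Tuple


def split(items: FrozenSet[str], k: int) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Partition items by the value of bit k (0-based here)."""
    zero = frozenset(x for x in items if x[k] == "0")
    return zero, items - zero


@lru_cache(maxsize=None)
def optimal_cost(items: FrozenSet[str]) -> int:
    """Minimum cost over all decision trees, by trying every splitting bit."""
    if len(items) <= 1:
        return 0
    m = len(next(iter(items)))
    return min(
        len(items) + optimal_cost(zero) + optimal_cost(one)
        for zero, one in (split(items, k) for k in range(m))
        if zero and one
    )


def greedy_cost(items: FrozenSet[str]) -> int:
    """Cost of the tree built by always taking the most even split."""
    if len(items) <= 1:
        return 0
    m = len(next(iter(items)))
    parts = [p for p in (split(items, k) for k in range(m)) if p[0] and p[1]]
    zero, one = min(parts, key=lambda p: abs(len(p[0]) - len(p[1])))
    return len(items) + greedy_cost(zero) + greedy_cost(one)


four = frozenset(["11110", "10111", "11010", "01101"])
six = frozenset(["01101", "10001", "11101", "11110", "10111", "11010"])

for name, items in (("four items", four), ("six items", six)):
    print(name, "greedy:", greedy_cost(items), "optimal:", optimal_cost(items))
```

On both of these small inputs greedy happens to match the optimum, so they do not reproduce the one-unit gap shown on the slide.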

Outline

Problem Introduction
A Greedy Approximation Algorithm for DT
An Analysis of the Greedy Algorithm
  (ln n + 1)-approximation
Other Results and Open Problems

Approximation Algorithm Review

Minimization Problem
  C = cost given by approximation algorithm
  Copt = cost of optimal solution

α-approximation: C ≤ α · Copt
  α may be a function of the input size n

Analysis Outline

Accounting Scheme
  Each pair of items {xi, xj} is separated exactly once in any decision tree
    True for Greedy and Optimal
  Distribute cost of the Greedy tree among item pairs
  Analyze cost of greedy tree w.r.t. structure of optimal tree

Theorem: The greedy algorithm yields a tree with cost at most (ln n + 1) times the cost of the optimal tree

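Definitions and Notation

Consider each pair of items {xi, xj}

Sij: the node of the greedy tree T that separates xi from xj; also the set of items that are leaves below that node
Sij+ and Sij-: the larger and smaller child sets of Sij respectively, so |Sij+| ≥ |Sij-|
|Sij| = |Sij+| + |Sij-|

[Figure: greedy tree T; node Sij with children Sij+ and Sij-, separating xi from xj]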

Accounting Method

Assign cost cij to each pair of items {xi, xj}

Distribute |Sij| equally among the |Sij+||Sij-| pairs of items split at Sij

$c_{ij} = \frac{|S_{ij}|}{|S_{ij}^+|\,|S_{ij}^-|} = \frac{1}{|S_{ij}^+|} + \frac{1}{|S_{ij}^-|}$

Example: node {a,b,c,d,e,f} with children {a,b,c,d} and {e,f}

|Sij| = 6, |Sij+| = 4, |Sij-| = 2

c{a,f} = c{c,e} = cij = 6/8 = 3/4
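The accounting scheme can be sketched in Python as follows. The tree is written as nested pairs, and each internal node spreads its size evenly over the pairs it separates. The root split {a,b,c,d} versus {e,f} comes from the slide; the shape of the tree below the root and the function names are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import product


def items_of(tree):
    """Items at the leaves of a tree given as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return items_of(left) + items_of(right)


def pair_costs(tree, out=None):
    """Map each unordered pair of items to its share of the tree cost."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        return out
    left, right = tree
    a, b = items_of(left), items_of(right)
    # The node holds |a| + |b| items and separates |a| * |b| pairs.
    share = Fraction(len(a) + len(b), len(a) * len(b))
    for x, y in product(a, b):
        out[frozenset((x, y))] = share
    pair_costs(left, out)
    pair_costs(right, out)
    return out


def leaf_depth_cost(tree, depth=0):
    """Total sum of leaf depths."""
    if isinstance(tree, str):
        return depth
    left, right = tree
    return leaf_depth_cost(left, depth + 1) + leaf_depth_cost(right, depth + 1)


# Root split from the slide; the subtrees below it are hypothetical.
tree = ((("a", "b"), ("c", "d")), ("e", "f"))
costs = pair_costs(tree)

print(costs[frozenset(("a", "f"))])   # 3/4
print(sum(costs.values()))            # 16
print(leaf_depth_cost(tree))          # 16
```

Every pair is separated at exactly one node, so the pair costs add up to the sum of internal node sizes, which equals the sum of leaf depths.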

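Greedy Tree Cost

Cost of Greedy Tree T: $\mathrm{Cost}(T) = \sum_{\{x_i, x_j\}} c_{ij}$

Free to order pair costs in any way we like!

Reorder cij according to T*: every pair {xi, xj} is also separated at exactly one node Z of the optimal tree T*, with xi in one child set Z+ and xj in the other child set Z-, so

$\mathrm{Cost}(T) = \sum_{Z \in T^*} \; \sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij}$

[Figure: greedy tree T with node Sij and children Sij+, Sij-; optimal tree T* with node Z and children Z+, Z-]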

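A Lemma

Lemma: For any node Z in T*, $\sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij} \le |Z| \, H(|Z|)$, where $H(d) = \sum_{k=1}^{d} 1/k$ is the d-th harmonic number

[Figure: optimal tree T*; node Z with children Z+ and Z-, separating xi from xj]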

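Proof of the Theorem

Lemma: For any node Z in T*, $\sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij} \le |Z| \, H(|Z|)$, where $H(d) = 1 + 1/2 + \dots + 1/d$

$\mathrm{Cost}(T) = \sum_{Z \in T^*} \; \sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij}$
$\le \sum_{Z \in T^*} |Z| \, H(|Z|)$   (lemma)
$\le H(n) \sum_{Z \in T^*} |Z|$   (|Z| ≤ n)
$= H(n) \, \mathrm{Cost}(T^*)$   (Def of tree cost)
$\le (\ln n + 1) \, \mathrm{Cost}(T^*)$   (CLRS)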
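The step annotated (CLRS) is presumably the standard bound on the harmonic number H(n) = 1 + 1/2 + ... + 1/n. A short justification, added here and not taken from the slides, compares each term 1/k with the integral of 1/x over [k-1, k], where 1/x is at least 1/k:

```latex
H(n) = 1 + \sum_{k=2}^{n} \frac{1}{k}
     \le 1 + \sum_{k=2}^{n} \int_{k-1}^{k} \frac{dx}{x}
     = 1 + \int_{1}^{n} \frac{dx}{x}
     = \ln n + 1 .
```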

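Proving the Lemma

Lemma: For any node Z in T*, $\sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij} \le |Z| \, H(|Z|)$

Goal: Relate pair cost (defined w.r.t. greedy tree) to the optimal tree

Claim 1: For xi in Z+ and xj in Z-, $c_{ij} \le \frac{1}{|S_{ij} \cap Z^+|} + \frac{1}{|S_{ij} \cap Z^-|}$

Reason: the bit tested at Z separates xi from xj, so it was available to greedy at Sij, where it would put Sij ∩ Z+ on one side and Sij ∩ Z- on the other. Greedy chose a split at least as even, and a more even split gives a smaller value of 1/|Sij+| + 1/|Sij-| = cij.

[Figure: greedy tree T; node Sij, whose items include Sij ∩ Z+ (containing xi) and Sij ∩ Z- (containing xj)]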

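Claim 2

$\sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij} \le \sum_{x_i \in Z^+,\, x_j \in Z^-} \left( \frac{1}{|S_{ij} \cap Z^+|} + \frac{1}{|S_{ij} \cap Z^-|} \right)$   (claim 1)

Claim 2: For any Z in T*, for any xi in Z+: $\sum_{x_j \in Z^-} \frac{1}{|S_{ij} \cap Z^-|} \le H(|Z^-|)$

Proof of Claim 2

Example: Z- = {a, b, c, d, e, f}

Order Z- from 1 to 6 according to when xj is split from xi in the greedy tree

When the t-th item is split from xi, |Sij ∩ Z-| ≥ 6 - t + 1, so the sum is at most 1/6 + 1/5 + ... + 1/1 = H(6)

[Figure: greedy tree T; the path to xi passes through nodes Si1, ..., Si5. Items a, b are split off at Si2, c at Si3, d, e at Si4, and f at Si5.]

|Si2| ≥ 6
|Si3| ≥ 4
|Si4| ≥ 3
|Si5| ≥ 1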
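As a worked instance of the counting argument, suppose (an assumption made only for illustration) that the lower bounds in the figure hold with equality when restricted to Z-, and that Claim 2 bounds the sum of 1/|Sij ∩ Z-| over xj in Z- by the harmonic number H(6). Items a and b then contribute 1/6 each, c contributes 1/4, d and e contribute 1/3 each, and f contributes 1:

```latex
\frac{1}{6} + \frac{1}{6} + \frac{1}{4} + \frac{1}{3} + \frac{1}{3} + 1
  = \frac{9}{4} = 2.25
  \;\le\;
H(6) = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6}
  = \frac{49}{20} = 2.45 .
```

Items that leave xi's path together share the same denominator, which only makes the sum smaller than the harmonic bound.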

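Wrapping up the Proof:

Lemma: For any node Z in T*, $\sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij} \le |Z| \, H(|Z|)$

Claim 2: For any Z in T*, for any xi in Z+: $\sum_{x_j \in Z^-} \frac{1}{|S_{ij} \cap Z^-|} \le H(|Z^-|)$
Symmetrically, for any xj in Z-: $\sum_{x_i \in Z^+} \frac{1}{|S_{ij} \cap Z^+|} \le H(|Z^+|)$

$\sum_{x_i \in Z^+,\, x_j \in Z^-} c_{ij} \le \sum_{x_i \in Z^+} \sum_{x_j \in Z^-} \left( \frac{1}{|S_{ij} \cap Z^+|} + \frac{1}{|S_{ij} \cap Z^-|} \right)$   (claim 1)
$\le |Z^-| \, H(|Z^+|) + |Z^+| \, H(|Z^-|)$   (claim 2)
$\le |Z| \, H(|Z|)$

QED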

Outline

Problem Introduction
A Greedy Approximation Algorithm for DT
An Analysis of the Greedy Algorithm
  (ln n + 1)-approximation
Other Results and Open Problems

DT has no PTAS unless P=NP

MAX3SAT5 [Feige]: 3CNF; each variable appears in exactly 5 clauses
Thm: There exists a universal constant ε > 0 such that it is NP-Hard to distinguish 3SAT5 formulas that are satisfiable from those in which at most (1 - ε)|C| clauses are simultaneously satisfied.

Gap preserving reduction from MAX3SAT5 to DT
  Via a set cover

DT has no PTAS unless P=NP

All clauses satisfied:

Cost:

DT has no PTAS unless P=NP

At most (1 - ε)|C| clauses satisfied:

Cost:

The ConDT Problem:

Input: A set X = {x1,…,xn} of m-bit binary strings (called items)
  Each item xi has a label TRUE or FALSE

Solution: A binary tree
  Each internal node is a bit k; each leaf is a label
  The tree correctly labels each item (consistent)

Cost: Total number of leaves
Optimal Solution: Consistent decision tree with minimum number of leaves

Not possible to approx. size s DTs with size s^k DTs (for any constant k) unless NP is in DTIME[2^(m^ε)] for some ε < 1
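To make the ConDT definitions concrete, here is a small Python sketch that checks whether a tree is consistent with a labeled item set and counts its leaves. The tuple representation, the function names, and the example data are all hypothetical. Bits are again assumed to be numbered from 1 at the left.

```python
from typing import Dict, Tuple, Union

# A tree is either a label (bool leaf) or a tuple (bit, zero_subtree, one_subtree).
Tree = Union[bool, Tuple[int, "Tree", "Tree"]]


def classify(tree: Tree, item: str) -> bool:
    """Follow the item's bits down the tree and return the label at the leaf."""
    while not isinstance(tree, bool):
        bit, zero, one = tree
        tree = one if item[bit - 1] == "1" else zero
    return tree


def is_consistent(tree: Tree, labeled: Dict[str, bool]) -> bool:
    """A tree is consistent if it assigns every item its given label."""
    return all(classify(tree, item) == label for item, label in labeled.items())


def num_leaves(tree: Tree) -> int:
    """ConDT cost: total number of leaves."""
    if isinstance(tree, bool):
        return 1
    _, zero, one = tree
    return num_leaves(zero) + num_leaves(one)


# Hypothetical instance in which the label equals the first bit.
labeled = {"011": False, "010": False, "101": True, "110": True}
good = (1, False, True)   # tests bit 1
bad = (2, False, True)    # tests bit 2

print(is_consistent(good, labeled), num_leaves(good))
print(is_consistent(bad, labeled), num_leaves(bad))
```

Unlike DT, several items may share a leaf here, which is why the cost counts leaves rather than leaf depths.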

Open Problems

Gap in approximation ratios between lower and upper bounds
  Techniques from ConDT don't work

Items with weights
Tests with weights
  Minimize:

Fin