Approximating Optimal Binary Decision Trees
Brent [email protected]
(joint work with Micah Adler)
18 November 2005
Question: I am thinking of a computer scientist. Which one?
Rule: Ask YES/NO questions from a finite set Q
Q1: MIT Professor? YES
Q2: Author of a popular CS text? YES
Q3: Inventor of RSA? NO
[Figure: photos of the candidate computer scientists. Each answer narrows the candidates, and the questions asked so far form a decision tree whose nodes Q1, Q2, ... branch on YES/NO.]
Decision Tree Problem (DT)
Input: A set X=(x1,…,xn) of binary strings (called items)
  Each item has exactly m bits
  E.g. if m=5 then xi might be 10011
Solution: A binary tree with n leaves
  Each internal node indexes some bit k and partitions the items into two groups (bit k = 0 and bit k = 1)
  Each item is a leaf (n leaves total)
Cost: Total sum of leaf depths
Optimal Solution: DT with minimum cost
Example:
Items: 11110 10111 11010 01101
Test bit 3
  0: 11010 (depth 1)
  1: test bit 1
    0: 01101 (depth 2)
    1: test bit 2
      0: 10111 (depth 3)
      1: 11110 (depth 3)
Cost: 1 + 2 + 3 + 3 = 9
Example:
Items: 11110 10111 11010 01101
Test bit 5
  0: test bit 3
    0: 11010 (depth 2)
    1: 11110 (depth 2)
  1: test bit 2
    0: 10111 (depth 2)
    1: 01101 (depth 2)
Cost: 2 + 2 + 2 + 2 = 8
OPTIMAL!
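Alternative Cost
Same tree as in the previous example (items 11110 10111 11010 01101, root tests bit 5)
Charge each internal node the number of items that reach it:
4 (root, bit 5) + 2 (bit 3) + 2 (bit 2) = 8
This equals the sum of leaf depths, because an item at depth d reaches exactly d internal nodes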
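The two ways of measuring cost can be checked against each other on the example trees. The sketch below is only an illustration: the nested-tuple tree encoding and the function names are assumptions made here, and the two trees are a reading of the slide figures (bits numbered from 1, left to right).

```python
from typing import Union

# A tree is either a leaf (the item string) or a tuple
# (bit, subtree_for_0, subtree_for_1). This encoding is hypothetical.
Tree = Union[str, tuple]


def leaf_depth_cost(tree: Tree, depth: int = 0) -> int:
    """Cost as the total sum of leaf depths."""
    if isinstance(tree, str):
        return depth
    _, zero, one = tree
    return leaf_depth_cost(zero, depth + 1) + leaf_depth_cost(one, depth + 1)


def count_leaves(tree: Tree) -> int:
    """Number of items below (or at) this node."""
    if isinstance(tree, str):
        return 1
    _, zero, one = tree
    return count_leaves(zero) + count_leaves(one)


def node_size_cost(tree: Tree) -> int:
    """Alternative cost: each internal node is charged the number of items reaching it."""
    if isinstance(tree, str):
        return 0
    _, zero, one = tree
    return count_leaves(tree) + node_size_cost(zero) + node_size_cost(one)


# First example: root tests bit 3, then bit 1, then bit 2.
first_tree = (3, "11010", (1, "01101", (2, "10111", "11110")))
# Second example: root tests bit 5, children test bit 3 and bit 2.
second_tree = (5, (3, "11010", "11110"), (2, "10111", "01101"))

for name, tree in [("first", first_tree), ("second", second_tree)]:
    print(name, leaf_depth_cost(tree), node_size_cost(tree))
```

Both functions return 9 for the first tree and 8 for the second, so the node-size view is just a regrouping of the leaf-depth sum.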
Decision Trees
Decision Trees (DT) model many natural tasks in
  Medical Diagnosis
  Experiment Design
DT is the 20-questions problem
DT is NP-Complete
  Reduction from Set Cover (Exact Cover by 3 Sets) [Hyafil and Rivest, 1976]
Outline
Problem Introduction
A Greedy Approximation Algorithm for DT
An Analysis of the Greedy Algorithm
  (ln n + 1)-approximation
Other Results and Open Problems
A Greedy DT Algorithm
IDEA: Always choose the bit which most evenly partitions the items
Items: 01101 10001 11101 11110 10111 11010
Test bit 4 (splits the items 3 / 3)
  0: test bit 1
    0: 01101
    1: test bit 2
      0: 10001
      1: 11101
  1: test bit 2
    0: 10111
    1: test bit 3
      0: 11010
      1: 11110
A Greedy DT Algorithm
GREEDY-DT(X)
  If X = Ø
    Return NIL
  Else if |X| = 1
    Return a leaf holding the single item of X
  Else
    k ← index of the bit most evenly separating X
    T ← new tree node testing bit k
    T[left] ← GREEDY-DT({x ∈ X | x(k) = 0})
    T[right] ← GREEDY-DT({x ∈ X | x(k) = 1})
    Return T
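The greedy procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: items are distinct equal-length bit strings, bits are numbered from 1 (leftmost), ties between equally even bits go to the lowest index (the slides do not specify tie-breaking), and a single remaining item becomes a leaf. The names `Node`, `greedy_dt`, and `cost` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class Node:
    bit: int        # 1-based index of the bit tested at this node
    zero: "Tree"    # subtree for items whose bit is 0
    one: "Tree"     # subtree for items whose bit is 1


Tree = Union[str, Node]


def greedy_dt(items: Sequence[str]) -> Optional[Tree]:
    """Build a decision tree by always testing the most evenly splitting bit."""
    if not items:
        return None
    if len(items) == 1:
        return items[0]  # a single item is a leaf
    m = len(items[0])

    def imbalance(k: int) -> int:
        # | #zeros - #ones | for bit k; 0 means a perfectly even split.
        ones = sum(x[k - 1] == "1" for x in items)
        return abs(len(items) - 2 * ones)

    # min() returns the first minimum, so ties go to the lowest bit index (assumption).
    k = min(range(1, m + 1), key=imbalance)
    zeros = [x for x in items if x[k - 1] == "0"]
    ones = [x for x in items if x[k - 1] == "1"]
    if not zeros or not ones:
        raise ValueError("items must be distinct")
    return Node(k, greedy_dt(zeros), greedy_dt(ones))


def cost(tree: Optional[Tree], depth: int = 0) -> int:
    """Total sum of leaf depths."""
    if tree is None:
        return 0
    if isinstance(tree, str):
        return depth
    return cost(tree.zero, depth + 1) + cost(tree.one, depth + 1)


six_items = ["01101", "10001", "11101", "11110", "10111", "11010"]
tree = greedy_dt(six_items)
print(tree.bit, cost(tree))  # root bit and total cost
```

On the six items from the slides the root tests bit 4, which splits them 3 / 3, and the resulting tree has cost 16.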
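Optimal vs. Greedy
Optimal Tree T*: Cost(T*) = 25
Greedy Tree T: Cost(T) = 26
[Figure: the optimal tree T* and the greedy tree T on the same eight items a to h; the tree shapes are not recoverable from the extracted text.]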
Outline
Problem Introduction
A Greedy Approximation Algorithm for DT
An Analysis of the Greedy Algorithm
  (ln n + 1)-approximation
Other Results and Open Problems
Approximation Algorithm Review
Minimization Problem
C = cost given by approximation algorithm
Copt = cost of optimal solution
α-approximation: C ≤ α · Copt, where α may be a function of the input size n
Analysis Outline
Accounting Scheme
  Each pair of items {xi, xj} is separated exactly once in any decision tree
    True for Greedy and Optimal
  Distribute cost of the Greedy tree among item pairs
  Analyze cost of greedy tree w.r.t. structure of optimal tree
Theorem: The greedy algorithm yields a tree whose cost is at most (ln n + 1) times the cost of the optimal tree
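On tiny instances the theorem can be checked by brute force. The sketch below assumes distinct equal-length bit strings and lowest-index tie-breaking for the greedy choice; `optimal_cost` is a hypothetical exhaustive search that tries every bit at every node, so it is exponential and only meant for a handful of items. Both functions use the node-size form of the cost (each internal node is charged the number of items reaching it).

```python
import math
from functools import lru_cache
from typing import Tuple

Items = Tuple[str, ...]


def split(items: Items, k: int) -> Tuple[Items, Items]:
    """Partition items on 0-based bit k."""
    zeros = tuple(x for x in items if x[k] == "0")
    ones = tuple(x for x in items if x[k] == "1")
    return zeros, ones


def greedy_cost(items: Items) -> int:
    """Cost of the greedy tree: sum over internal nodes of the items reaching them."""
    if len(items) < 2:
        return 0
    m = len(items[0])
    k = min(range(m), key=lambda b: abs(len(items) - 2 * sum(x[b] == "1" for x in items)))
    zeros, ones = split(items, k)
    if not zeros or not ones:
        raise ValueError("items must be distinct")
    return len(items) + greedy_cost(zeros) + greedy_cost(ones)


@lru_cache(maxsize=None)
def optimal_cost(items: Items) -> int:
    """Minimum cost over all decision trees, by exhaustive search."""
    if len(items) < 2:
        return 0
    best = math.inf
    for k in range(len(items[0])):
        zeros, ones = split(items, k)
        if zeros and ones:  # skip bits that do not separate anything
            best = min(best, len(items) + optimal_cost(zeros) + optimal_cost(ones))
    if best == math.inf:
        raise ValueError("items must be distinct")
    return best


six_items = ("01101", "10001", "11101", "11110", "10111", "11010")
g, opt = greedy_cost(six_items), optimal_cost(six_items)
n = len(six_items)
print(g, opt, g <= (math.log(n) + 1) * opt)
```

For the six items used on the greedy slides both costs come out as 16, so greedy happens to be optimal there; the bound only says it can never be worse than a factor of ln n + 1.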
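Definitions and Notation
Consider each pair of items {xi, xj}
Sij: the node of the greedy tree T that separates xi from xj; also the set of items at that node
Sij+ and Sij-: the sets of items at the two children of Sij, named so that |Sij+| ≥ |Sij-|
|Sij| = |Sij+| + |Sij-|
[Figure: greedy tree T with node Sij and children Sij+ and Sij-; xi and xj lie in different children.]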
Accounting Method
Assign cost cij to each pair of items {xi, xj}
Distribute |Sij| equally among the |Sij+||Sij-| pairs of items split at Sij
$c_{ij} = \frac{|S_{ij}|}{|S_{ij}^{+}|\,|S_{ij}^{-}|} = \frac{1}{|S_{ij}^{+}|} + \frac{1}{|S_{ij}^{-}|}$
Example: a node of the greedy tree T holding {a,b,c,d,e,f} with children {a,b,c,d} and {e,f}
|Sij| = 6, |Sij+| = 4, |Sij-| = 2
Pair costs for {a,f} and {c,e}: cij = 6/8 = 3/4
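The accounting scheme can be sanity-checked numerically: if every node of size |S| spreads |S| evenly over the |S+||S-| pairs it separates, the pair costs must add up to the cost of the tree. The sketch below assumes distinct bit-string items and a greedy split with lowest-index tie-breaking; `pair_costs` and `greedy_cost` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, FrozenSet, List, Tuple


def greedy_split(items: List[str]) -> Tuple[List[str], List[str]]:
    """Split on the most evenly partitioning bit (ties: lowest index, an assumption)."""
    m = len(items[0])
    k = min(range(m), key=lambda b: abs(len(items) - 2 * sum(x[b] == "1" for x in items)))
    zeros = [x for x in items if x[k] == "0"]
    ones = [x for x in items if x[k] == "1"]
    if not zeros or not ones:
        raise ValueError("items must be distinct")
    return zeros, ones


def pair_costs(items: List[str]) -> Dict[FrozenSet[str], Fraction]:
    """Map each pair {xi, xj} to its share c_ij of the greedy tree cost."""
    costs: Dict[FrozenSet[str], Fraction] = {}
    if len(items) < 2:
        return costs
    zeros, ones = greedy_split(items)
    # This node has size |S| and separates |S+| * |S-| pairs.
    share = Fraction(len(items), len(zeros) * len(ones))
    for xi, xj in product(zeros, ones):
        costs[frozenset((xi, xj))] = share
    costs.update(pair_costs(zeros))
    costs.update(pair_costs(ones))
    return costs


def greedy_cost(items: List[str]) -> int:
    """Cost of the greedy tree: sum over internal nodes of the items reaching them."""
    if len(items) < 2:
        return 0
    zeros, ones = greedy_split(items)
    return len(items) + greedy_cost(zeros) + greedy_cost(ones)


six_items = ["01101", "10001", "11101", "11110", "10111", "11010"]
costs = pair_costs(six_items)
print(len(costs), sum(costs.values()), greedy_cost(six_items))
```

With six items there are 15 pairs; the nine pairs separated at the root each get 6/9 = 2/3, and all shares together sum to the tree cost of 16. The slide's own example, a node of size 6 split 4 / 2, gives each of its 8 pairs `Fraction(6, 4 * 2)`, which is 3/4.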
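Greedy Tree Cost
Cost of Greedy Tree T: $\mathrm{Cost}(T) = \sum_{\{x_i, x_j\}} c_{ij}$
Free to order pair costs in any way we like!
Reorder cij according to T*: every pair {xi, xj} is separated at exactly one node Z of the optimal tree T*, with xi in one child set Z+ and xj in the other child set Z-
$\mathrm{Cost}(T) = \sum_{Z \in T^{*}} \sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} c_{ij}$
[Figure: optimal tree T* with node Z and children Z+ and Z-; xi and xj lie in different children.]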
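Proof of the Theorem
Lemma: For any node Z in T*, $\sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} c_{ij} \le |Z|\,H(|Z|)$, where H(k) = 1 + 1/2 + … + 1/k
$\mathrm{Cost}(T) = \sum_{Z \in T^{*}} \sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} c_{ij}$
$\le \sum_{Z \in T^{*}} |Z|\,H(|Z|)$ (lemma)
$\le H(n) \sum_{Z \in T^{*}} |Z|$ (|Z| ≤ n)
$= H(n)\,\mathrm{Cost}(T^{*})$ (Def of tree cost)
$\le (\ln n + 1)\,\mathrm{Cost}(T^{*})$ (CLRS)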
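Proving the Lemma
Lemma: For any node Z in T*, $\sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} c_{ij} \le |Z|\,H(|Z|)$
Goal: Relate pair cost (defined w.r.t. greedy tree T) to the optimal tree
Claim 1: For xi in Z+ and xj in Z-, $c_{ij} \le \frac{1}{|S_{ij} \cap Z^{+}|} + \frac{1}{|S_{ij} \cap Z^{-}|}$ (the greedy split of Sij is at least as even as the split of Sij by the bit tested at Z)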
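Proof of Claim 2
Claim 2: For any Z in T*, for any xi in Z+: $\sum_{x_j \in Z^{-}} \frac{1}{|S_{ij} \cap Z^{-}|} \le H(|Z^{-}|)$
Example: Z- = {a, b, c, d, e, f}
Order Z- from 1 to 6 according to when xj is split from xi in the greedy tree T
When the t-th item is split from xi, |Sij ∩ Z-| ≥ 6 - t + 1
Path to xi in the greedy tree T: nodes Si1, Si2, Si3, Si4, Si5
  a, b split from xi at Si2: |Si2| ≥ 6
  c split from xi at Si3: |Si3| ≥ 4
  d, e split from xi at Si4: |Si4| ≥ 3
  f split from xi at Si5: |Si5| ≥ 1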
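The arithmetic behind this example can be replayed exactly. The sketch assumes that Claim 2 bounds the sum of 1/|Sij ∩ Z-| over xj in Z- by the harmonic number H(|Z-|) (the slide's formula did not survive extraction), and that at each split the node still holds exactly the items of Z- not yet separated from xi, giving the sizes 6, 6, 4, 3, 3, 1 for a to f. The names below are hypothetical.

```python
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    """H(k) = 1 + 1/2 + ... + 1/k as an exact fraction."""
    return sum((Fraction(1, t) for t in range(1, k + 1)), Fraction(0))


# |S_ij ∩ Z-| at the node where each item of Z- is split from x_i
# (assumed reading of the slide: a, b at Si2; c at Si3; d, e at Si4; f at Si5).
size_at_split = {"a": 6, "b": 6, "c": 4, "d": 3, "e": 3, "f": 1}

# The t-th item to be split still shares its node with at least 6 - t + 1 items of Z-.
order = ["a", "b", "c", "d", "e", "f"]
per_item_ok = all(size_at_split[x] >= 6 - t + 1 for t, x in enumerate(order, start=1))

total = sum(Fraction(1, s) for s in size_at_split.values())
print(per_item_ok, total, harmonic(6))
```

Here the sum is 9/4, which is below H(6) = 49/20, because items split off in groups (a with b, d with e) see a larger node than the worst case of one item leaving at a time.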
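Wrapping up the Proof:
Claim 2: For any Z in T*, for any xi in Z+: $\sum_{x_j \in Z^{-}} \frac{1}{|S_{ij} \cap Z^{-}|} \le H(|Z^{-}|)$
Symmetrically, for any xj in Z-: $\sum_{x_i \in Z^{+}} \frac{1}{|S_{ij} \cap Z^{+}|} \le H(|Z^{+}|)$
For any node Z in T*:
$\sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} c_{ij} \le \sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} \left( \frac{1}{|S_{ij} \cap Z^{+}|} + \frac{1}{|S_{ij} \cap Z^{-}|} \right)$ (claim 1)
$\le |Z^{-}|\,H(|Z^{+}|) + |Z^{+}|\,H(|Z^{-}|)$ (claim 2)
$\le |Z|\,H(|Z|)$
Lemma: For any node Z in T*, $\sum_{x_i \in Z^{+}} \sum_{x_j \in Z^{-}} c_{ij} \le |Z|\,H(|Z|)$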
Outline
Problem Introduction
A Greedy Approximation Algorithm for DT
An Analysis of the Greedy Algorithm
  (ln n + 1)-approximation
Other Results and Open Problems
DT has no PTAS unless P=NP
MAX3SAT5 [Feige]: 3CNF; each variable appears in exactly 5 clauses
Thm: There exists a universal constant ε > 0 such that it is NP-Hard to distinguish 3SAT5 formulas that are satisfiable from those in which at most (1 - ε)|C| clauses can be simultaneously satisfied.
Gap-preserving reduction from MAX3SAT5 to DT
  Via a set cover
The ConDT Problem:
Input: A set X=(x1,…,xn) of m-bit binary strings (called items)
  Each item xi has a label TRUE or FALSE
Solution: A binary tree
  Each internal node is a bit k; each leaf is a label
  The tree correctly labels each item (consistent)
Cost: Total number of leaves
Optimal Solution: Consistent decision tree with minimum number of leaves
Not possible to approximate size-s DTs with size-s^k DTs (for any constant k) unless NP is in DTIME[2^(m^ε)] for some ε < 1
Open Problems
Gap in approximation ratios between lower and upper bounds
  Techniques from ConDT don't work
Items with weights
Tests with weights
Minimize: