Lecture 14 - web.stanford.edu


Lecture 14: Greedy algorithms!

Announcements

• HW6 due Friday!
• TONS OF PRACTICE ON DYNAMIC PROGRAMMING

Roadmap

[Figure: course roadmap with stops Asymptotic Analysis, Graphs!, Dynamic Programming, Greedy Algs, MIDTERM, and The Future!; an arrow marks last week]

More detailed schedule on the website!

This week

• Greedy algorithms!
• Builds on our ideas from dynamic programming

Greedy algorithms

• Make choices one at a time.
• Never look back.
• Hope for the best.

Today

• One non-example of a greedy algorithm:
  • Knapsack again
• Three examples of greedy algorithms:
  • Activity Selection
  • Job Scheduling
  • Huffman Coding

Non-example

• Unbounded Knapsack.
  • (From pre-lecture exercise)
• Unbounded Knapsack:
  • Suppose I have infinite copies of all of the items.
  • What's the most valuable way to fill the knapsack?
• "Greedy" algorithm for unbounded knapsack:
  • Tacos have the best value/weight ratio!
  • Keep grabbing tacos!

Item:      (five pictured items; the taco is the one with weight 3)
Weight:    6    2    4    3    11
Value:     20   8    14   13   35

Capacity: 10

Optimal: total weight 10, total value 42
Greedy (keep grabbing tacos): total weight 9, total value 39
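To make the non-example concrete, here is a small Python sketch that compares a ratio-greedy fill against a standard bottom-up DP for unbounded knapsack. It uses the (weight, value) pairs from the table. The function names are made up for illustration. This greedy variant keeps filling with lower-ratio items once the best one stops fitting, a slight generalization of "keep grabbing tacos".

```python
from typing import List, Tuple

Item = Tuple[int, int]  # (weight, value)

# (weight, value) pairs from the slide; the taco is (3, 13).
ITEMS: List[Item] = [(6, 20), (2, 8), (4, 14), (3, 13), (11, 35)]
CAPACITY = 10


def greedy_unbounded(items: List[Item], capacity: int) -> int:
    """Take as many copies as fit of the best value/weight item, then the next best, etc."""
    total, remaining = 0, capacity
    for weight, value in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        copies = remaining // weight
        total += copies * value
        remaining -= copies * weight
    return total


def dp_unbounded(items: List[Item], capacity: int) -> int:
    """best[c] = most value achievable with capacity c, using unlimited copies of each item."""
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        for weight, value in items:
            if weight <= c:
                best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]


print(greedy_unbounded(ITEMS, CAPACITY))  # 39: three tacos, weight 9
print(dp_unbounded(ITEMS, CAPACITY))      # 42
```

Greedy gets stuck with one unit of unused capacity. The DP finds 42, for example with two tacos plus two of the weight-2 items.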

Example where greedy works: Activity selection

[Figure: a day's timeline of overlapping activities: Frisbee Practice, Orchestra, CS161 study group, Sleep, CS110 Class, Theory Lunch, Theory Seminar, Combinatorics Seminar, Underwater basket weaving class, Math 51 Class, CS161 Class, CS166 Class, CS161 Section, CS161 Office Hours, Swimming lessons, Programming team meeting, Social activity]

You can only do one activity at a time, and you want to maximize the number of activities that you do.

What to choose?

Activity selection

• Input:
  • Activities a1, a2, …, an
  • Start times s1, s2, …, sn
  • Finish times f1, f2, …, fn
• Output:
  • How many activities can you do today?

Greedy Algorithm

[Figure: activities a1 through a7 drawn as intervals on a time axis]

• Pick the activity you can add with the smallest finish time.
• Repeat.


At least it's fast

• Running time:
  • O(n) if the activities are already sorted by finish time.
  • Otherwise O(n log(n)) if you have to sort them first.
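Here is a minimal Python sketch of the greedy rule. It assumes each activity is a (start, finish) pair. It also assumes an activity may start exactly when the previous one finishes, which the slides don't pin down. Sorting dominates and gives O(n log(n)). If the input is already sorted by finish time, the loop alone is O(n). The function name and sample intervals are made up.

```python
from typing import List, Tuple

Activity = Tuple[float, float]  # (start time s_i, finish time f_i)


def select_activities(activities: List[Activity]) -> List[Activity]:
    """Greedy activity selection: repeatedly add the compatible activity
    with the smallest finish time."""
    chosen: List[Activity] = []
    last_finish = float("-inf")
    # Sorting by finish time is the O(n log(n)) part; the scan is O(n).
    for start, finish in sorted(activities, key=lambda a: a[1]):
        # Assumption: an activity may start exactly when the previous one ends.
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen


SAMPLE = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
          (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]  # made-up intervals
print(select_activities(SAMPLE))
```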

What makes it greedy?

• At each step in the algorithm, make a choice.
  • Hey, I can increase my activity set by one,
  • And leave lots of room for future choices,
  • Let's do that and hope for the best!!!
• Hope that at the end of the day, this results in a globally optimal solution.

Three questions

1. Does this greedy algorithm for activity selection work?
2. In general, when are greedy algorithms a good idea?
3. The "greedy" approach is often the first you'd think of…
   • Why are we getting to it now, in Week 8?

Answers

1. Does this greedy algorithm for activity selection work?
   • Yes. (Seems to: IPython notebook…) (But now let's see why…)
2. In general, when are greedy algorithms a good idea?
   • When they exhibit especially nice optimal substructure.
3. The "greedy" approach is often the first you'd think of…
   • Why are we getting to it now, in Week 8?
   • Related to dynamic programming! (Which we did in Week 7.)
   • Proving that greedy algorithms work is often not so easy.

Why does it work?

• Whenever we make a choice, we don't rule out an optimal solution.

[Figure: activities a1 through a7 on a time axis, with a3, a5, and a7 highlighted]

To see this, consider our next choice (marked in the figure): there's some optimal solution that contains it.

Optimal Substructure

• Subproblem i:
  • A[i] = Number of activities you can do after Activity i finishes.

[Figure: activities ai, ak, a2, a3, a4, a6, a7 on a time axis]

Want to show: when we make a choice ak, the optimal solution to the smaller sub-problem k will help us solve sub-problem i.

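Claim

• Let ak have the smallest finish time among activities do-able after ai finishes.
• Then A[i] = A[k] + 1.

[Figure: activities on a time axis; ak is the first activity to finish among those that start after ai finishes]

A[k]: how many activities can I do here (after ak finishes)?
A[i]: how many activities can I do here (after ai finishes)?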

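Proof

• Let ak have the smallest finish time among activities do-able after ai finishes.
• Then A[i] = A[k] + 1.

[Figure: activities a1, ai, ak, a2, a3, a4, a6, a7 on a time axis]

• Clearly A[i] ≥ A[k] + 1
  • Since we have a solution with A[k] + 1 activities: ak, followed by the A[k] activities of an optimal solution to sub-problem k.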

• Suppose toward a contradiction that A[i] > A[k] + 1.
  • There's some better solution to subproblem (i) that doesn't use ak.
  • Say aj ends first after ai in that better solution.
  • Remove aj and add ak to the better solution. (ak finishes no later than aj, so the result is still conflict-free.)

[Figure: the better solution to sub-problem (i), starting with aj; two activities that don't count for sub-problem (i) are greyed out]

  • Now you have a solution of the same size, but it includes ak, so it must have size ≤ A[k] + 1 (everything after ak is a solution to sub-problem k, which has at most A[k] activities).
  • That contradicts A[i] > A[k] + 1.

[Figure: the modified solution, with ak in place of aj]

• And we just showed A[i] ≤ A[k] + 1
  • By contradiction
• That proves the claim.

We never rule out an optimal solution

• We've shown:
  • If we choose ak to have the smallest finish time among activities do-able after ai finishes, then A[i] = A[k] + 1.
• That is:
  • Assume that we have an optimal solution up to ai.
  • By adding ak we are still on track to hit that optimal value.

[Figure: activities ai, ak, a2, a3, a4, a6, a7 on a time axis]

So the algorithm is correct

• We never rule out an optimal solution.
• At the end of the algorithm, we've got a solution.
• It's not not optimal.
• So it must be optimal.

Lucky the Lackadaisical Lemur

So the algorithm is correct

• Inductive Hypothesis:
  • After adding the t'th thing, there is an optimal solution that extends the current solution.
• Base case:
  • After adding zero activities, there is an optimal solution extending that.
• Inductive step:
  • TODO
• Conclusion:
  • After adding the last activity, there is an optimal solution that extends the current solution.
  • The current solution is the only solution that extends the current solution.
  • So the current solution is optimal.

Plucky the Pedantic Penguin

Inductive step

• Suppose that after adding the t'th thing (Activity i), there is an optimal solution:
  • X activities done and A[i] activities left.
• Then we add the (t+1)'st thing (Activity k).
  • A[k] = A[i] - 1 (by the claim)
• Now:
  • X + 1 activities done and A[k] = A[i] - 1 activities left.
  • Same total number as before!
  • Still optimal.


Common strategy for greedy algorithms

• Make a series of choices.
• Show that, at each step, our choice won't rule out an optimal solution at the end of the day.
• After we've made all our choices, we haven't ruled out an optimal solution, so we must have found one.

Common strategy (formally) for greedy algorithms

• Inductive Hypothesis:
  • After greedy choice t, you haven't ruled out success.
• Base case:
  • Success is possible before you make any choices.
• Inductive step:
  • TODO
• Conclusion:
  • If you reach the end of the algorithm and haven't ruled out success, then you must have succeeded.

DP view of activity selection

• This algorithm is most naturally viewed as a greedy algorithm.
  • Make greedy choices
  • Never rule out success
• But, we could view it as a DP algorithm
  • Take advantage of optimal sub-structure and fill in a table.
• We'll do that now.
  • Just for pedagogy!
  • (This isn't the best way to think about activity selection.)

Recipe for applying Dynamic Programming

• Step 1: Identify optimal substructure.
• Step 2: Find a recursive formulation for the value of the optimal solution.
• Step 3: Use dynamic programming to find the value of the optimal solution.
• Step 4: If needed, keep track of some additional info so that the algorithm from Step 3 can find the actual solution.
• Step 5: If needed, code this up like a reasonable person.

Optimal substructure

• Subproblem i:
  • A[i] = number of activities you can do after Activity i finishes.

[Figure: activities a1, ai, a2, a3, a4, a6, a7 on a time axis]

Recipe for applying Dynamic Programming (continued)

• Step 2: Find a recursive formulation for the value of the optimal solution.

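We did that already

• Let ak have the smallest finish time among activities do-able after ai finishes.
• Then A[i] = A[k] + 1.

[Figure: activities a1, ai, ak, a2, a3, a4, a6, a7 on a time axis]

A[k]: how many activities can I do here (after ak finishes)?
A[i]: how many activities can I do here (after ai finishes)?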

Recipe for applying Dynamic Programming (continued)

• Step 3: Use dynamic programming to find the value of the optimal solution.

Top-down DP

• Initialize a global array A to [None, …, None]
• Make a "dummy" activity (Activity 0) that ends at time -1.
• def findNumActivities(i):
    • If A[i] != None:
        • Return A[i]
    • Let Activity k be the activity I can fit in my schedule after Activity i with the smallest finish time.
    • If there is no such activity k, set A[i] = 0
    • Else, A[i] = findNumActivities(k) + 1
    • Return A[i]
• Return findNumActivities(0)

This is a terrible way to write this!
The only thing that matters here is that the highlighted lines (the two lines that set A[i]) are our recursive relationship.

See IPython notebook for implementation.

Recipe for applying Dynamic Programming (continued)

• Step 4: If needed, keep track of some additional info so that the algorithm from Step 3 can find the actual solution.

Top-down DP

• Initialize a global array A to [None, …, None]
• Initialize a global array Next to [None, …, None]
• Make a "dummy" activity (Activity 0) that ends at time -1.
• def findNumActivities(i):
    • If A[i] != None:
        • Return A[i]
    • Let Activity k be the activity I can fit in my schedule after Activity i with the smallest finish time.
    • If there is no such activity k, set A[i] = 0
    • Else, A[i] = findNumActivities(k) + 1 and Next[i] = k
    • Return A[i]
• findNumActivities(0)
• Step through "Next" array to get schedule.

This is a terrible way to write this!
The only thing that matters here is that the highlighted lines (the lines that set A[i] and Next[i]) are our recursive relationship.

See IPython notebook for implementation.
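One way to turn the top-down pseudocode into runnable Python is sketched below. It assumes activities are (start, finish) pairs with 0 ≤ start < finish, and it uses index 0 for the dummy activity that ends at time -1. All names are hypothetical. The linear scan for Activity k is deliberately naive (O(n^2) overall), in the spirit of "this is a terrible way to write this".

```python
from typing import List, Optional, Tuple

Activity = Tuple[float, float]  # (start, finish); assumes 0 <= start < finish


def find_schedule(activities: List[Activity]) -> List[int]:
    """Top-down DP mirroring findNumActivities; returns 1-based indices of chosen activities."""
    acts = [(-1.0, -1.0)] + list(activities)  # index 0: dummy activity ending at time -1
    n = len(acts)
    A: List[Optional[int]] = [None] * n
    Next: List[Optional[int]] = [None] * n

    def find_num_activities(i: int) -> int:
        if A[i] is not None:
            return A[i]
        # Activity k: fits after Activity i and has the smallest finish time.
        # Naive O(n) scan, as in the pseudocode.
        k: Optional[int] = None
        for j in range(1, n):
            fits = acts[j][0] >= acts[i][1]
            if fits and (k is None or acts[j][1] < acts[k][1]):
                k = j
        if k is None:
            A[i] = 0
        else:
            A[i] = find_num_activities(k) + 1
            Next[i] = k
        return A[i]

    find_num_activities(0)
    # Step through the Next array to get the schedule.
    schedule: List[int] = []
    i = Next[0]
    while i is not None:
        schedule.append(i)
        i = Next[i]
    return schedule


SAMPLE = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
          (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]  # made-up intervals
print(find_schedule(SAMPLE))
```

Following Next from the dummy activity visits activities in increasing finish-time order. That is exactly the sequence the greedy rule would pick.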

Let's step through it. (See IPython notebook for code with some print statements.)

This looks pretty familiar!!

[Figure: activities a1 through a7 on a time axis]

• Start with the activity with the smallest finish time.

• Now find the next activity still do-able with the smallest finish time, and recurse after that.


• Ta-da!

It's exactly the same* as the greedy solution!

*if you implement the top-down DP solution appropriately.

Sub-problem graph view

• Divide-and-conquer:

[Figure: a tree in which the big problem splits into sub-problems, each of which splits into its own sub-sub-problems]

Sub-problem graph view

• Dynamic Programming:

[Figure: the big problem depends on several sub-problems, and those sub-problems share sub-sub-problems]


Sub-problem graph view

• Greedy algorithms:

[Figure: a chain: the big problem depends on one sub-problem, which depends on one sub-sub-problem]

• Not only is there optimal sub-structure:
  • optimal solutions to a problem are made up from optimal solutions of sub-problems
• but each problem depends on only one sub-problem.


Let's see a few more examples

Another example: Scheduling

[Figure: an overcommitted Stanford student's to-do list: CS161 HW!, Call your parents!, Math HW!, Econ HW!, Practice musical instrument!, Read CLRS!, Have a social life!, Sleep!, Administrative stuff for your student club!, Do laundry!, Meditate!]

Scheduling

• n tasks
• Task i takes ti hours
• Everything is already late!
• For every hour that passes until task i is done, pay ci
• CS161 HW, then Sleep: costs 10 ⋅ 2 + (10 + 8) ⋅ 3 = 74 units
• Sleep, then CS161 HW: costs 8 ⋅ 3 + (10 + 8) ⋅ 2 = 60 units

[Figure: CS161 HW! takes 10 hours and costs 2 units per hour until it's done; Sleep! takes 8 hours and costs 3 units per hour until it's done]
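The cost arithmetic above can be checked with a few lines of Python. This is a minimal sketch. It assumes each job is represented as a (hours, cost_per_hour) pair, and the function name is made up. Each job pays its per-hour cost for every hour from time 0 until it finishes.

```python
from typing import List, Tuple

Job = Tuple[float, float]  # (hours t_i, cost per hour c_i)


def schedule_cost(jobs: List[Job]) -> float:
    """Total cost of doing the jobs in the given order (everything is already late)."""
    elapsed = 0.0
    total = 0.0
    for hours, cost_per_hour in jobs:
        elapsed += hours                  # this job finishes at time `elapsed`
        total += cost_per_hour * elapsed  # pay for every hour until it's done
    return total


HW = (10, 2)    # CS161 HW: 10 hours, 2 units per hour
SLEEP = (8, 3)  # Sleep: 8 hours, 3 units per hour
print(schedule_cost([HW, SLEEP]))  # 74.0
print(schedule_cost([SLEEP, HW]))  # 60.0
```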

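Optimal substructure

• This problem breaks up nicely into sub-problems:

Suppose this is the optimal schedule: Job A, Job B, Job C, Job D

Then this must be the optimal schedule on just jobs B, C, D. (Jobs B, C, D all pay for the same tA hours no matter how they are ordered, so a cheaper order on B, C, D would give a cheaper schedule overall.)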

Optimal substructure

• Seems amenable to a greedy algorithm:
  • Job A, Job B, Job C, Job D: take the best job first, then solve this problem:
  • Job B, Job C, Job D: take the best job first, then solve this problem:
  • Job B, Job D: take the best job first, then solve this problem (that one's easy :) )

What does "best" mean?

• Recipe for greedy algorithm analysis:
  • We make a series of choices.
  • We show that, at each step, our choice won't rule out an optimal solution at the end of the day.
  • After we've made all our choices, we haven't ruled out an optimal solution, so we must have found one.

[Figure: Job A, Job B, Job C, Job D]

"Best" means: won't rule out an optimal solution.
The optimal solution to this problem extends an optimal solution to the whole thing.

Head-to-head

• Of these two jobs, which should we do first?
• Cost(A then B) = x⋅z + (x + y)⋅w
• Cost(B then A) = y⋅w + (x + y)⋅z

[Figure: Job A takes x hours and costs z units per hour until it's done; Job B takes y hours and costs w units per hour until it's done]

A then B is better than B then A when:

$$xz + (x + y)w \le yw + (x + y)z$$
$$xz + xw + yw \le yw + xz + yz$$
$$wx \le yz$$
$$\frac{w}{y} \le \frac{z}{x}$$

What matters is the ratio: (cost of delay) / (time it takes).

Do the job with the biggest ratio first.

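Lemma

• Given jobs so that Job i takes time ti with cost ci,
• There is an optimal schedule so that the first job is the one that maximizes the ratio ci/ti
• Proof:
  • Say Job B maximizes this ratio, and it's not first. Let Job A be the job right before it, so cA/tA ≤ cB/tB.
  • Switch A and B! Nothing else will change (every other job still finishes at the same time), and we showed on the previous slide that the cost won't increase.
  • Repeat until B is first.

[Figure: a schedule Job A, Job B, Job C, Job D with cA/tA ≤ cB/tB]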

Choose greedily: Biggest cost/time ratio first

• Job i takes time ti with cost ci
• There is an optimal schedule so that the first job is the one that maximizes the ratio ci/ti
• So if we choose jobs greedily according to ci/ti, we never rule out success!

Greedy Scheduling Solution

• scheduleJobs(JOBS):
  • Sort JOBS by the ratio:

$$r_i = \frac{c_i}{t_i} = \frac{\text{cost of delaying job } i}{\text{time job } i \text{ takes to complete}}$$

  • Say that sorted_JOBS[i] is the job with the i'th biggest r_i
  • Return sorted_JOBS

The running time is O(n log(n)).

Now you can go about your schedule peacefully, in the optimal way.
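Here is a small Python sketch of scheduleJobs, with a brute-force check over all orders on a tiny made-up instance. Jobs are assumed to be (hours t_i, cost per hour c_i) pairs, and the helper names are hypothetical. The brute force tries all n! orders, so it is only for sanity-checking small inputs.

```python
from itertools import permutations
from typing import List, Tuple

Job = Tuple[float, float]  # (hours t_i, cost per hour c_i)


def schedule_cost(jobs: List[Job]) -> float:
    """Each job pays c_i for every hour until it finishes."""
    elapsed, total = 0.0, 0.0
    for hours, cost_per_hour in jobs:
        elapsed += hours
        total += cost_per_hour * elapsed
    return total


def schedule_jobs(jobs: List[Job]) -> List[Job]:
    """Greedy: sort by r_i = c_i / t_i, biggest ratio first (O(n log n))."""
    return sorted(jobs, key=lambda job: job[1] / job[0], reverse=True)


def brute_force_best(jobs: List[Job]) -> float:
    """Cheapest cost over all n! orders; only feasible for tiny inputs."""
    return min(schedule_cost(list(order)) for order in permutations(jobs))


JOBS = [(10, 2), (8, 3), (1, 5), (4, 1)]  # made-up (hours, cost per hour)
greedy_order = schedule_jobs(JOBS)
print(greedy_order)
print(schedule_cost(greedy_order), brute_force_best(JOBS))
```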

Formally, use induction!

• Inductive hypothesis:
  • There is an optimal ordering so that the first t jobs are sorted_JOBS[:t].
• Base case:
  • When t = 0, this reads: "There is an optimal ordering so that the first 0 jobs are []"
  • That's true.
• Inductive Step:
  • Boils down to: there is an optimal ordering on sorted_JOBS[t:] so that sorted_JOBS[t] is first.
  • This follows from the Lemma.
• Conclusion:
  • When t = n, this reads: "There is an optimal ordering so that the first n jobs are sorted_JOBS."
  • aka, what we returned is an optimal ordering.

SLIDE SKIPPED IN CLASS

What have we learned?

• A greedy algorithm works for scheduling
• This followed the same outline as the previous example:
  • Identify optimal substructure:
    [Figure: Job A, Job B, Job C, Job D]
  • Find a way to make "safe" choices that won't rule out an optimal solution.
    • largest ratios first.

One more example: Huffman coding

• everyday english sentence
  • 01100101011101100110010101110010011110010110010001100001011110010010000001100101011011100110011101101100011010010111001101101000001000000111001101100101011011100111010001100101011011100110001101100101
• qwertyui_opasdfg+hjklzxcv
  • 01110001011101110110010101110010011101000111100101110101011010010101111101101111011100000110000101110011011001000110011001100111001010110110100001101010011010110110110001111010011110000110001101110110


ASCII is pretty wasteful. If e shows up so often, we should have a more parsimonious way of representing it!

Suppose we have some distribution on characters

Letter:       A    B    C    D    E    F
Percentage:   45   13   12   16   9    5

For simplicity, let's go with this made-up example.

How to encode them as efficiently as possible?

Try 0 (like ASCII)

Letter:       A     B     C     D     E     F
Percentage:   45    13    12    16    9     5
Encoding:     000   001   010   011   100   101

• Every letter is assigned a binary string of three bits.

Wasteful!
• 110 and 111 are never used.
• We should have a shorter way of representing A.

Try 1

Letter:       A    B    C    D    E    F
Percentage:   45   13   12   16   9    5
Encoding:     0    00   01   1    10   11

• Every letter is assigned a binary string of one or two bits.
• The more frequent letters get the shorter strings.
• Problem:
  • Does 000 mean AAA or BA or AB?

Try 2: prefix-free coding

Letter:       A    B     C     D    E     F
Percentage:   45   13    12    16   9     5
Encoding:     01   101   110   00   111   100

• Every letter is assigned a binary string.
• More frequent letters get shorter strings.
• No encoded string is a prefix of any other.

Decoding 10010101 from left to right: 100 = F, then 101 = B, then 01 = A, giving FBA.

Confusingly, "prefix-free codes" are also sometimes called "prefix codes" (including in CLRS).

Question: What is the most efficient way to do prefix-free coding? (This isn't it.)
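Here is a small Python sketch of prefix-free decoding. It uses the Try 2 table as reconstructed from the slide: A=01, B=101, C=110, D=00, E=111, F=100. Reading bits left to right and emitting a letter as soon as the buffer matches a codeword is safe precisely because no codeword is a prefix of another. The helper names are made up. The Try 1 table (A=0, B=00, C=01, D=1, E=10, F=11) is included to show the prefix check failing.

```python
from typing import Dict

TRY2 = {"A": "01", "B": "101", "C": "110", "D": "00", "E": "111", "F": "100"}
TRY1 = {"A": "0", "B": "00", "C": "01", "D": "1", "E": "10", "F": "11"}


def is_prefix_free(code: Dict[str, str]) -> bool:
    """True if no codeword is a prefix of a different letter's codeword."""
    words = list(code.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))


def decode(bits: str, code: Dict[str, str]) -> str:
    """Emit a letter as soon as the buffered bits match a codeword."""
    lookup = {word: letter for letter, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:
            out.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("leftover bits: " + buffer)
    return "".join(out)


print(decode("10010101", TRY2))  # FBA
print(is_prefix_free(TRY2), is_prefix_free(TRY1))
```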

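A prefix-free code is a tree

[Figure: a binary tree with edges labeled 0 and 1. Depth-2 leaves: D:16 (code 00) and A:45 (code 01). Depth-3 leaves: B:13 (100), F:5 (101), C:12 (110), E:9 (111).]

As long as all the letters show up as leaves, this code is prefix-free.

B:13 means that 'B' makes up 13% of the characters that ever appear.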

Some trees are better than others

[Figure: the same tree: D:16 (00), A:45 (01), B:13 (100), F:5 (101), C:12 (110), E:9 (111)]

• Imagine choosing a letter at random from the language.
  • Not uniform, but according to our histogram!
• The cost of a tree is the expected length of the encoding of that letter.

Expected cost of encoding a letter with this tree:

$$2(0.45 + 0.16) + 3(0.05 + 0.13 + 0.12 + 0.09) = 2.39$$

$$\text{Cost} = \sum_{x \in \text{letters}} P(x) \cdot \text{depth}(x)$$

P(x) is the probability of letter x. The depth in the tree is the length of the encoding.
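The expected-cost formula is easy to check numerically. This sketch hard-codes the tree as letter → (probability, codeword). The codeword lengths (2 for A and D, 3 for the rest) are what the figure shows. The exact 0/1 string for each letter is an assumption, but it does not affect the cost, since depth(x) is just the codeword length.

```python
from typing import Dict, Tuple

# letter -> (P(x), codeword); only len(codeword) = depth(x) matters for the cost.
TREE: Dict[str, Tuple[float, str]] = {
    "A": (0.45, "01"), "B": (0.13, "100"), "C": (0.12, "110"),
    "D": (0.16, "00"), "E": (0.09, "111"), "F": (0.05, "101"),
}


def expected_cost(tree: Dict[str, Tuple[float, str]]) -> float:
    """Sum over letters of P(x) * depth(x)."""
    return sum(p * len(word) for p, word in tree.values())


print(round(expected_cost(TREE), 2))  # 2.39
```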

Question

• Given a distribution P on letters, find the lowest-cost tree, where

$$\text{cost(tree)} = \sum_{x \in \text{letters}} P(x) \cdot \text{depth}(x)$$

P(x) is the probability of letter x. The depth in the tree is the length of the encoding.

Optimal sub-structure

• Suppose this is an optimal tree:

[Figure: an optimal tree with one sub-tree highlighted]

Then this sub-tree is an optimal tree on fewer letters. Otherwise, we could change this sub-tree and end up with a better overall tree.

In order to design a greedy algorithm

• Think about what letters belong in this sub-problem...

[Figure: the optimal tree, with its lower sub-trees highlighted]

What's a safe choice to make for these lower sub-trees? Infrequent elements! We want them as low down as possible.

Solution greedily builds subtrees, starting with the infrequent letters

[Figure, built up step by step from the leaves D:16, A:45, B:13, F:5, C:12, E:9:
 1. merge F:5 and E:9 into a node with key 14;
 2. merge C:12 and B:13 into 25;
 3. merge 14 and D:16 into 30;
 4. merge 25 and 30 into 55;
 5. merge A:45 and 55 into the root, 100.
 In the final tree, A gets the 1-bit code 0; B, C, and D get 3-bit codes (100, 101, 110); E and F get 4-bit codes (1110, 1111).]

Expected cost of encoding a letter:

$$1 \cdot 0.45 + 3 \cdot 0.41 + 4 \cdot 0.14 = 2.24$$

What exactly was the algorithm?

• Create a node like [D:16] for each letter/frequency
  • The key is the frequency (16 in this case)
• Let CURRENT be the list of all these nodes.
• while len(CURRENT) > 1:
  • X and Y ← the nodes in CURRENT with the smallest keys.
  • Create a new node Z with Z.key = X.key + Y.key
  • Set Z.left = X, Z.right = Y
  • Add Z to CURRENT and remove X and Y
• return CURRENT[0]

[Figure: X = F:5 and Y = E:9 become the children of a new node Z with key 14; the other nodes in CURRENT are D:16, A:45, B:13, C:12]
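Here is a Python sketch of the pseudocode. It swaps in a binary heap (heapq) for CURRENT, so each "two smallest keys" step costs O(log n) instead of a linear scan; the slide doesn't specify this choice. The Node class and function names are hypothetical. Left edges are labeled 0 and right edges 1, so the exact codewords may differ from the figure's labeling. The codeword lengths, and hence the 2.24 expected cost, come out the same on this example.

```python
import heapq
from itertools import count
from typing import Dict, Optional


class Node:
    def __init__(self, key: float, letter: Optional[str] = None,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.key = key        # frequency (sum of the children's keys for internal nodes)
        self.letter = letter  # None for internal nodes
        self.left = left
        self.right = right


def huffman(freqs: Dict[str, float]) -> Node:
    tie = count()  # tie-breaker so heapq never has to compare Node objects
    current = [(f, next(tie), Node(f, letter)) for letter, f in freqs.items()]
    heapq.heapify(current)
    while len(current) > 1:
        x_key, _, x = heapq.heappop(current)  # node with the smallest key
        y_key, _, y = heapq.heappop(current)  # node with the next smallest key
        z = Node(x_key + y_key, left=x, right=y)
        heapq.heappush(current, (z.key, next(tie), z))
    return current[0][2]


def codewords(node: Node, prefix: str = "") -> Dict[str, str]:
    """Left edge = 0, right edge = 1; a lone leaf gets code "0"."""
    if node.letter is not None:
        return {node.letter: prefix or "0"}
    codes = codewords(node.left, prefix + "0")
    codes.update(codewords(node.right, prefix + "1"))
    return codes


FREQS = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9, "F": 5}
codes = codewords(huffman(FREQS))
print(codes)
print(sum(FREQS[x] * len(codes[x]) for x in FREQS) / 100)  # expected bits per letter
```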

Does it work?

• Yes.
• Same strategy:
  • Show that at each step, the choices we are making won't rule out an optimal solution.
• Lemma:
  • Suppose that x and y are the two least-frequent letters. Then there is an optimal tree where x and y are siblings.

[Figure: leaves D:16, A:45, B:13, F:5, C:12, E:9, with F:5 and E:9 merged under a node 14]

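Lemma proof idea

If x and y are the two least-frequent letters, there is an optimal tree where x and y are siblings.

• Say that an optimal tree looks like this:
  [Figure: an optimal tree in which x is not at the lowest level; among the lowest-level sibling nodes, at least one (call it a) is neither x nor y]
• What happens to the cost if we swap x for a?
  • The cost can't increase; a is at least as frequent as x, and we just made its encoding shorter (while x's got longer).
• Repeat this logic until we get an optimal tree with x and y as siblings.
  [Figure: x and y as sibling leaves at the lowest level]
  • The cost never increased, so this tree is still optimal.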


Proof strategy just like before

• Show that at each step, the choices we are making won't rule out an optimal solution.
• Lemma:
  • Suppose that x and y are the two least-frequent letters. Then there is an optimal tree where x and y are siblings.

That's enough to show that we don't rule out optimality after the first step.

What about once we start grouping stuff?

[Figure: leaves D:16, A:45, B:13, F:5, C:12, E:9, partially grouped into sub-trees 14, 25, and 30]

Lemma 2: this distinction doesn't really matter

[Figure: on the left, the full tree on {A, B, C, D, E, F} built from sub-trees 14, 25, 30, 55, and the root 100; on the right, the same tree with the 25 sub-tree replaced by a leaf G:25 and the 30 sub-tree replaced by a leaf H:30, alongside A:45]

The first thing is an optimal tree on {A, B, C, D, E, F} if and only if the second thing is an optimal tree on {A, G, H}.

• For a proof:
  • See CLRS, Lemma 16.3
    • Rigorous, although presented in a slightly different way
  • See Lecture Notes 14
    • A bit sketchier, but presented in the same way as here
  • Prove it yourself!
    • This is the best!

Siggi the Studious Stork: Getting all the details isn't that important, but you should convince yourself that this is true.

Together

• Lemma 1:
  • Suppose that x and y are the two least-frequent letters. Then there is an optimal tree where x and y are siblings.
• Lemma 2:
  • We may as well imagine that CURRENT contains only leaves.
• These imply:
  • At each step, our choice doesn't rule out an optimal tree.

The whole argument

• Inductive hypothesis:
  • after the t'th step, there is an optimal tree containing the current subtrees as "leaves"
• Base case:
  • after the 0'th step, there is an optimal tree containing all the characters.
• Inductive step:
  • TODO
• Conclusion:
  • after the last step, there is an optimal tree containing this whole tree as a subtree.
  • aka, after the last step the tree we've constructed is optimal.

After the t'th step, we've got a bunch of current sub-trees.
Inductive hyp. asserts that our subtrees can be assembled into an optimal tree.
[Figure: the current sub-trees, and an optimal tree assembled from them]

Inductive step

• Suppose that the inductive hypothesis holds for t-1
  • After t-1 steps, there is an optimal tree containing all the current sub-trees as "leaves."
• Want to show:
  • After t steps, there is an optimal tree containing all the current sub-trees as leaves.

We've got a bunch of current sub-trees: x, y, w, z. Say that x and y are the two smallest.

• By Lemma 2, we may as well treat each current sub-tree (x, y, w, z) as a single leaf.

  • In particular, optimal trees on this new alphabet correspond to optimal trees on the original alphabet.

• Our algorithm would do this at level t: merge the two smallest, x and y, into a new sub-tree a, where a = x + y.

• Lemma 1 implies that there's an optimal tree (on the new alphabet) in which x and y are siblings under a; aka, our algorithm did okay.

• Lemma 2 again says that there's an optimal tree containing the new current sub-trees (w, z, and a, with children x and y) as leaves.

This is what we wanted to show for the inductive step.


What have we learned?

• ASCII isn't an optimal way to encode English, since the distribution on letters isn't uniform.
• Huffman Coding is an optimal way (among prefix-free codes)!
• To come up with an optimal scheme for any language efficiently, we can use a greedy algorithm.
• To come up with a greedy algorithm:
  • Identify optimal substructure
  • Find a way to make "safe" choices that won't rule out an optimal solution.
    • Create subtrees out of the smallest two current subtrees.

Recap I

• Greedy algorithms!
• Three examples:
  • Activity Selection
  • Scheduling Jobs
  • Huffman Coding

Recap II

• Greedy algorithms!
  • Often easy to write down
  • But may be hard to come up with and hard to justify
• The natural greedy algorithm may not always be correct.
• A problem is a good candidate for a greedy algorithm if:
  • it has optimal substructure
  • that optimal substructure is REALLY NICE
    • solutions depend on just one other sub-problem.

Next time

• Greedy algorithms for Minimum Spanning Tree!

Before next time

• Pre-lecture exercise: candidate greedy algorithms for MST (Recommended)