
Algorithms Part I


Table of Contents

1. Introduction
2. 1-1 Course Introduction
3. 2-1 Dynamic Connectivity
4. 2-2 Quick Find
5. 2-3 Quick Union
6. 2-4 Quick-Union Improvements
7. 2-5 Union-Find Applications
8. 3-1 Analysis of Algorithms Introduction
9. 3-2 Observations
10. 3-3 Mathematical Models
11. 3-4 Order-of-Growth Classifications
12. 3-5 Theory of Algorithms
13. 3-6 Memory


1-1 Course Introduction

Welcome. I'm Bob Sedgewick, professor of computer science at Princeton. This is our online course Algorithms, developed by myself and Kevin Wayne here at Princeton. We're going to start with an overview discussion of why you might want to study algorithms and a little bit of discussion about the resources that you need to take this course.

So, what is this course? It's an intermediate-level survey course on algorithms. We're going to concentrate on programming and problem solving in the context of real applications, and our focus is going to be on two things: algorithms, which are methods for solving problems, and data structures, which store the information associated with a problem and go hand in hand with algorithms. These are the basic topics that we'll cover in part one and part two of the course.

The first part is data types, sorting, and searching. We'll consider a number of data structures and algorithms that are basic to all the methods we consider, including stacks, queues, bags, and priority queues. Then we'll consider classic algorithms for sorting, putting things in order. That's quicksort, mergesort, heapsort, and radix sorts. And we'll consider classic methods for searching, including binary search trees, red-black binary search trees, and hash tables.

The second part of the course is for more advanced algorithms, including graph algorithms, classic graph searching algorithms, minimum spanning tree and shortest path algorithms, and algorithms for processing strings, including regular expressions and data compression. And then some advanced algorithms that make use of the basic algorithms that we developed earlier in the course.

So, why should one study algorithms? Well, their impact is very broad and far-reaching. From the internet to biology to commercial computing, computer graphics, security, multimedia, social networks, and scientific applications, algorithms are all around us. They're used for movies and video games, for particle collision simulation, they're used to study the genome, and all manner of other applications. So, that's one important reason to study algorithms: their impact is broad and far-reaching.

Algorithms are also interesting to study because they have ancient roots. The first algorithm we study goes back to 300 B.C., dating at least to Euclid. The concept of an algorithm was formalized actually here at Princeton, by Church and Turing, in the 1930s. But most algorithms that we consider were discovered in recent decades. In fact, some were discovered by undergraduates in a course like this. And there are plenty of other algorithms waiting to be discovered by students like you.

The main reason that people study algorithms is to be able to solve problems that could not otherwise be addressed. For example, in the first lecture, we're going to talk about the network connectivity problem, where the problem is: given a large set of items that are connected together pairwise, is there a way to get from one to another with a path through the connections? As you can see from this example, it's not clear whether or not there's such a path. We need a computer program to do it; in fact, we need an efficient algorithm to do it. In this case the answer is that there is such a path.

Another reason to study algorithms is for intellectual stimulation. Algorithms are very interesting objects to study. Don Knuth, who wrote several books on algorithms and was a pioneer in the field, said that "An algorithm must be seen to be believed." You can't just think about an algorithm, you have to work with it. Another quote, from Francis Sullivan, says, "The great algorithms are the poetry of computation." Just like verse, they can be terse, elusive, dense, and even mysterious. But once unlocked, they cast a brilliant new light on some aspect of computing. Algorithms are interesting for intellectual stimulation.

Another reason many people study algorithms, and I suspect many of you, is that it's necessary to understand good algorithms, efficient algorithms, and good data structures in order to be a proficient programmer. Linus Torvalds, who created Linux, says that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code; good programmers worry about data structures and their relationships. And, I might add, the algorithms that process them. Niklaus Wirth, another pioneer in computer science, wrote a famous book called Algorithms + Data Structures = Programs.

Another reason nowadays to study algorithms is that they have become a common language for understanding nature. Algorithms are computational models, and algorithmic models are replacing mathematical models in scientific inquiry. In the twentieth century, scientists developed mathematical models to try to understand natural phenomena. It soon became clear that those mathematical models were difficult to solve. It was difficult to create solutions, to be able to test hypotheses against natural phenomena. So, more and more nowadays, people are developing computational models, where they attempt to simulate what might be happening in nature in order to try to better understand it. Algorithms play an extremely important role in this process. And we'll see some examples of this in this course.

Another important reason is that if you know how to effectively use algorithms and data structures, you're going to have a much better chance at interviewing for a job in the technology industry than if you don't.

So, here's a bunch of reasons that I just went through for studying algorithms. Their impact is broad and far-reaching, they have old roots and present new opportunities, they allow us to solve problems that could not otherwise be addressed, you can use them for intellectual stimulation, to


become a proficient programmer. They might unlock the secrets of life in the universe, and they're good for fun and profit. In fact, a programmer might ask, why study anything else? Well, there are plenty of good reasons to study other things, but I'll submit there's no good reason not to study algorithms.

So, for this course we have two resources that I want to talk about and make sure that people are familiar with before entering into the content. This is a publishing model that Kevin Wayne and I developed and have been using for many years, and we think it's a very effective way to support the kinds of lectures that we're going to be giving in this course. Down at the bottom, and it's optional for this course, we have a textbook. It's a traditional textbook that extensively covers the topics in the course, in fact many more topics than we can present in lecture. And then supporting that textbook is free online material that we call the booksite. You can go to the booksite to see the lecture slides. But more important, there's code, there are exercises, there's a great deal of information there. In fact, maybe ten times what's in the book, including a summary of the content. So, during this course you'll be referring to the booksite frequently while working online.

People often ask about prerequisites. We're assuming that people who take this course know how to program, and know the basics of loops, arrays, and functions. They have some exposure to object-oriented programming and recursion. We use the Java language, but we don't dwell on details of Java; we mostly use it as an expository language. We do some math, but not advanced math. If you want to review the material that we think is prerequisite for the material in this course, you can do a quick review by looking at sections 1.1 and 1.2 of the book, either at the booksite or in the textbook. If you want an in-depth review, we have a full textbook called An Introduction to Programming in Java: An Interdisciplinary Approach. There is a booksite and textbook as well. But the bottom line is, you should be able to program, and the quick exercise to get ready is to write a Java program on your computer, perhaps using a programming model as described on the booksite. We will provide much more detailed information on that as we get into the assignments. You can use your own programming environment if you're comfortable with one, or you can download ours. We have instructions on the web on how to do that.


2-1 Dynamic Connectivity

Welcome back to algorithms. Today, we're going to talk about the union-find problem, a set of algorithms for solving the so-called dynamic connectivity problem. We'll look at two classic algorithms, Quick-find and Quick-union, and some applications and improvements of those algorithms.

The subtext of today's lecture really is to go through the steps that we'll follow over and over again to develop a useful algorithm. The first step is to model the problem: try to understand, basically, what are the main elements of the problem that need to be solved. Then we'll find some algorithm to solve the problem. In many cases, the first algorithm we come up with will be fast enough and maybe it fits in memory, and we'll go ahead and use it, and be off and running. But in many other cases maybe it's not fast enough, or there's not enough memory. So, what we do is try to figure out why, find a way to address whatever's causing that problem, find a new algorithm, and iterate until we're satisfied. This is the scientific approach to designing and analyzing algorithms, where we build mathematical models to try and understand what's going on, and then we do experiments to validate those models and help us improve things.

So, first we'll talk about the dynamic connectivity problem, the model of the problem for union-find. So, here's the idea. We're going to have a set of N objects. It doesn't really matter what they are. We're going to use the numbers zero through N-1 to model our objects. And then we have the idea of a connection between two objects. And we'll postulate that there's going to be a command that says, connect two objects: given two objects, provide a connection between them. And then the key part of the problem is the find query, or the connected query, which just asks, is there a path connecting the two objects?

So for example, in this set of ten objects, we've already performed a bunch of union commands, connecting four and three, three and eight, six and five, nine and four, two and one. And now we might have a connected query that says, is zero connected to seven? Well, in this case, there is no connection, so we say no. But if we ask, is eight connected to nine? We are going to say yes, even though we don't have a direct connection between eight and nine. There is a path from eight to three to four to nine. So, that's our problem: to be able to efficiently support these two commands for a given set of objects.
Now, let's say we add a union five, zero. So, that creates a connection between five and zero. Seven and two creates a connection between seven and two. And six and one, between six and one. And one and zero, we can do that too, and that's a redundant connection. And now, if we ask is zero connected to seven, we're going to answer yes.

So that's our problem: intermix union commands and connected queries, and we need to be able to efficiently support those commands for a large number of objects. So, here's a much bigger example. And you can see that we're going to need efficient algorithms for this. First of all, you can see we're going to need a computer for this. It would take quite some time for a human to figure out whether there's a connection. In this case there is a connection. Now, the algorithms that we're looking at today are not going to actually give the path connecting the two objects. They're just going to be able to answer the question, is there a path? In part two of the course, we'll consider algorithms that explicitly find paths. They're not as efficient as union-find because they have more work to do.

Now, applications of these algorithms involve objects of all types. These are used for digital photos, where the objects are pixels; they're used for networks, where the objects are computers; social networks, where it's people; or computer chips, where it's circuit elements; or abstract things like variable names in a program, or elements in a mathematical set; or physical things like metallic sites in a composite system. So, all different types of objects, but for programming we're going to associate each object with a name, and we'll just name the objects with a number, integers from zero to N-1. That's a very convenient starting point for our programs because we can use integers as an index into an array, and then quickly access information relevant to each object. And it also just suppresses a lot of details that are not relevant to union-find. In fact, making this mapping from an object name to the integers zero through N-1 is a fine application of a symbol table or a searching algorithm, which is one of the things that we'll be studying later in this course: algorithms and data structures for solving that problem.

Now, the connections. Well, we need a few abstract properties that these connections have to satisfy. And they're all quite natural and intuitive. So we assume that "is connected to" is an equivalence relation. That is, it's reflexive: every object is connected to itself. It's symmetric: if P is connected to Q, then Q is connected to P. And it's transitive: if P is connected to Q, and Q is connected to R, then P is connected to R. Now these properties are very intuitive. But it's worthwhile to state them explicitly and make sure that our algorithms maintain them.

When we have an equivalence relation, a set of objects and connections divides into subsets called connected components. A connected component is a maximal set of objects that are mutually connected. For example, in this small example here, there are three connected components: one consisting of just object zero, a second one with objects one, four, and five, and a third one with the other four objects. And these components have the property that any two objects in them are connected and there is no object outside that is connected to those objects. That's a connected component. Our algorithms will gain efficiency by maintaining connected


components and using that knowledge to efficiently answer the queries that they're presented with.

Okay, so to implement the operations, we have the find query and the union command. And so we're going to maintain the connected components. The find is going to have to check if two objects are in the same component, and the union command is going to have to replace the components containing two objects with their union. So, for example, if we have these components, and we get the command to union, to connect, two and five, essentially we need to merge the connected component containing two with the one containing five to get one big connected component, and now we have only two connected components.

All of that leads up to, in a programming world, specifying a data type, which is simply a specification of the methods that we are going to implement in order to solve this problem. So, in the typical Java model, what we will do is create a class called UF that contains two methods, one to implement union, the other one to implement connected, which returns a boolean. The constructor takes as its argument the number of objects, so that it can build a data structure based on the number of objects. And we have to bear in mind, as we're building our algorithms, that both the number of objects and the number of operations can be huge. We can have a very large number of union and connected operations, and our algorithms are going to have to be efficient under those conditions.

One of the practices that we'll follow often in this course is to check our API design before getting too far into dealing with the problem, by building a client that is going to use the data type that we develop. So, for this example, we've got a client that will read information from standard input: first, an integer, which is the number of objects that are going to be processed, and then a series of pairs of object names. And what the client does is, first it'll read the integer from standard input and create a UF object. And then, as long as standard input is not empty, it's going to read two integers from the input. And if they're not connected, then it'll connect them and print them out. If they are connected, it'll ignore them. So, that's our test client, and that's a fine test client to make sure that any implementation does what we expect that it will.

So, that's the setup. We've described the operations we want to implement all the way down to code, and we have client code that we're going to have to be able to service with our implementation.
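As a rough illustration of the data type and test client just described, here is a minimal Java sketch. It is not the lecture's UF class: the name ComponentsUF, the count() method, and the run() helper are assumptions made for this example. It keeps each connected component as an explicit set, which mirrors the model directly but is far too slow for large inputs. The sample input in main is also an assumption, built from the pairs mentioned in the lecture; the lecture's client reads standard input instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical name: a direct, unoptimized model of connected components.
class ComponentsUF {
    // Each set in this list is one connected component.
    private final List<Set<Integer>> components = new ArrayList<>();

    // Start with n objects (0 through n-1), each in its own component.
    public ComponentsUF(int n) {
        for (int i = 0; i < n; i++) {
            Set<Integer> single = new HashSet<>();
            single.add(i);
            components.add(single);
        }
    }

    // Linear scan for the component that contains p.
    private Set<Integer> componentOf(int p) {
        for (Set<Integer> c : components) {
            if (c.contains(p)) return c;
        }
        throw new IllegalArgumentException("unknown object " + p);
    }

    // Find query: are p and q in the same component?
    public boolean connected(int p, int q) {
        return componentOf(p).contains(q);
    }

    // Union command: replace the two components with their union.
    public void union(int p, int q) {
        Set<Integer> cp = componentOf(p);
        Set<Integer> cq = componentOf(q);
        if (cp == cq) return;                 // already connected
        cq.addAll(cp);
        components.removeIf(c -> c == cp);    // drop the merged-away set
    }

    // Number of connected components (an extra, not named in the lecture).
    public int count() {
        return components.size();
    }

    // Test client: read N, then pairs; connect and report each new pair.
    public static String run(Scanner in) {
        ComponentsUF uf = new ComponentsUF(in.nextInt());
        StringBuilder out = new StringBuilder();
        while (in.hasNextInt()) {
            int p = in.nextInt();
            int q = in.nextInt();
            if (!uf.connected(p, q)) {
                uf.union(p, q);
                out.append(p).append(' ').append(q).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample input; pass new Scanner(System.in) to read real input.
        String sample = "10 4 3 3 8 6 5 9 4 2 1 8 9 5 0 7 2 6 1 1 0";
        System.out.print(run(new Scanner(sample)));
    }
}
```

Because the client depends only on the constructor, union, and connected, any faster implementation with the same three operations could be substituted without changing run().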


2-2 Quick Find

Now we'll look at our first implementation of an algorithm for solving the dynamic connectivity problem, called Quick-find. This is a so-called eager algorithm for solving the connectivity problem. The data structure that we're going to use to support the algorithm is simply an integer array indexed by object. The interpretation is that two objects, P and Q, are connected if and only if their entries in the array are the same. So for example, in this example with our ten objects, the id array that describes the situation after seven connections is illustrated in the middle of the slide. At this point zero, five, and six are all in the same connected component, because they have the same array entry, zero. One, two, and seven all have entry one. And three, four, eight, and nine all have entry eight. So that representation shows that they're connected. And clearly, that's going to support a quick implementation of the find operation. We just check the array entries to see if they're equal: check if P and Q have the same id. So, six and one have different ids. One has id one, six has id zero. They're not in the same connected component.

Union is more difficult. In order to merge the components containing two given objects, we have to change all the entries whose id is equal to one of them to the other one. And arbitrarily we choose to change the ones that are the same as P to the ones that are the same as Q. So if we're going to union six and one, then we have to change entries zero, five, and six, everybody in the same connected component as six, from zero to one. And, as we'll see, this is a bit of a problem when we have a huge number of objects, because there are a lot of values that can change. But still, it's easy to implement, so that'll be our starting point.

So we'll start with a demo of how this works. So, initially, we set up the id array with each entry equal to its index. And so all that says is that all the objects are independent. They're in their own connected components. Now, when we get a union operation, say, four is supposed to be unioned with three, then we're going to change all entries whose id is equal to the first id to the second one. So in this case, to connect three and four means that we need to change the four to a three. And we'll continue to do a few more so you'll get an idea of how it works. So three and eight: to connect three and eight, now three and four have to be connected to eight. So both of those entries have to change to eight. Okay? So now, what about six and five? So again, we change the first one to match the second one. So to connect six and five, we change the six to a five. What about nine and four? So, now, to connect nine and four, we have to change 9's entry to be the same as 4's. So now we have three, four, eight, and nine all with entry eight. They're all in the same connected component. Two and one means that we connect two and one by changing the 2 to a 1. Eight and nine are already connected. They have the same entries in the id array. So, that connected query, that find, says true, they're already connected. And five and zero have different entries. They're not connected, so we'd return false in that case. And then, if we want to connect five and zero, then, as usual, we change the entries corresponding to both five and six to zero. Seven and two, union seven and two. That's an easy one. And union six and one, so there are three entries that have to get changed. All those zeros have to get changed to ones. So, that's a quick demo of Quick-find.

Now next we'll look at the code for implementing that. Okay, with this concrete demo in mind, moving to coding up this algorithm is pretty straightforward, although it's an interesting programming exercise that a lot of us would get wrong the first time. So let's start with the constructor. Well, we have a private integer array. That's our id array. That's the data structure that's going to support this implementation. The constructor has to create the array and then go through and set the value corresponding to each index i to i. That's straightforward. The find operation, or connected operation, that's the easy one. This is the Quick-find algorithm. So it simply takes its two arguments, P and Q, and checks whether their id entries are equal, and returns that value. If they're equal, it returns true. If they're not equal, it returns false. The more complicated operation to implement is union. And there, we find first the id corresponding to the first argument, and then the id corresponding to the second argument. And then we go through the whole array, looking for the entries whose ids are equal to the id of the first argument, and set those to the id of the second argument. That's a pretty straightforward implementation. And I mentioned that a lot of us would get this wrong. The mistake we might make is to put id of P here rather than first picking out that value. And you can think about the implications of that. That's an insidious bug.

So, that's a fine implementation of Quick-find, so the next thing to decide is how effective or efficient that algorithm is going to be, and we'll talk in some detail about how to do that, but for this it's sufficient to just think about the number of times the code has to access the array. As we saw when doing the implementation, both the initialize and union operations involve a for loop that goes through the entire array. So they have to touch array entries a number of times proportional to N. The find operation is quick; it just checks array entries a constant number of times. And this is problematic because the union operation is too expensive. In particular, if you just have N union commands on N objects, which is not unreasonable, they're either connected or not, then that will take quadratic time, N squared time. And one of the themes that we'll go


through over and over in this course is that quadratic time is much too slow. And we can't accept quadratic-time algorithms for large problems. The reason is they don't scale. As computers get faster and bigger, quadratic algorithms actually get slower. Now, let's just talk roughly about what I mean by that. A very rough standard, say for now, is that people have computers that can run billions of operations per second, and they have billions of entries in main memory. So, that means that you could touch everything in the main memory in about a second. It's kind of an amazing fact that this rough standard has really held for 50 or 60 years. The computers get bigger but they get faster, so to touch everything in the memory is going to take a few seconds. That was true when computers only had a few thousand words of memory, and it's true now that they have billions or more. So let's accept that as what computers are like.

Now, what that means is that, with that huge memory, we can address huge problems. So we could have billions of objects, and hope to do billions of union commands on them. But the problem with the Quick-find algorithm is that that would take 10^18 operations, or, say, array accesses or touches of memory. And if you do the math, that works out to 30-some years of computer time. Obviously, it's not practical to address such a problem on today's computers. And the reason is that quadratic algorithms don't scale with technology. You might have a new computer that's ten times as fast, but you could also address a problem that's ten times as big. And with a quadratic algorithm, when you do that, it's going to be ten times as slow. That's the kind of situation we're going to try to avoid by developing more efficient algorithms for solving problems like this.
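The Quick-find description above can be sketched in Java roughly as follows. This is a reconstruction from the spoken description, not a copy of the slide code: the class name QuickFindUF, the ids() accessor, and the main method are assumptions added so the demo state can be inspected.

```java
import java.util.Arrays;

// Quick-find sketch: p and q are connected exactly when id[p] == id[q].
class QuickFindUF {
    private final int[] id;

    // Each object starts in its own component: id[i] = i (N array accesses).
    public QuickFindUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    // Find query: a constant number of array accesses.
    public boolean connected(int p, int q) {
        return id[p] == id[q];
    }

    // Union command: relabel every entry equal to id[p]; scans the whole array.
    public void union(int p, int q) {
        int pid = id[p];   // read once, because id[p] itself changes in the loop
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) id[i] = qid;
        }
    }

    // Returns a copy of the array (an extra accessor, assumed for this sketch).
    public int[] ids() {
        return id.clone();
    }

    public static void main(String[] args) {
        // The union commands from the lecture demo, in order.
        int[][] pairs = {{4, 3}, {3, 8}, {6, 5}, {9, 4}, {2, 1}, {5, 0}, {7, 2}, {6, 1}};
        QuickFindUF uf = new QuickFindUF(10);
        for (int[] pair : pairs) uf.union(pair[0], pair[1]);
        System.out.println(Arrays.toString(uf.ids()));
    }
}
```

The bug mentioned in the lecture would be writing `id[i] == id[p]` inside the loop: as soon as the loop reaches index p, id[p] takes the new value, so later entries that still hold the old id are never updated. As for the cost estimate, a billion union commands that each scan a billion entries is 10^9 × 10^9 = 10^18 array accesses; at roughly 10^9 accesses per second that is 10^9 seconds, which is a little under 32 years.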


2-3 Quick Union

All right, so Quick-find is too slow for huge problems. So, how are we going to do better? Our first attempt is an alternative called Quick-union. This is a so-called lazy approach to algorithm design, where we try to avoid doing work until we have to. It uses the same data structure, an array id of size N, but now it has a different interpretation. We are going to think of that array as representing a set of trees, called a forest, as depicted at right. So, each entry in the array is going to contain a reference to its parent in the tree. So, for example, 3's parent is four and 4's parent is nine. So 3's entry is four and 4's entry is nine in the array. Now each entry in the array has associated with it a root. That's the root of its tree. Elements that are all by themselves, in their own connected component, point to themselves, so one points to itself, but also nine points to itself. It's the root of the tree containing two, four, and three. So, from this data structure we can associate with each item a root, which is representative, say, of its connected component. So the root of three is nine, going up to that root.

Now, once we can calculate these roots, then we can implement the find operation just by checking whether the two items that we're supposed to check for connectivity have the same root. That's equivalent to saying, are they in the same connected component? So that's some work, going to find the roots of each item, but the union operation is very easy. To merge the components containing two different items, two items that are in different components, all we do is set the id of P's root to the id of Q's root. That makes P's tree point to Q's. So in this case, we would change the entry of nine to be six to merge three and five, the components containing three and five. And with just changing one value in the array we get the two large components merged together. That's the Quick-union algorithm, because a union operation only involves changing one entry in the array. The find operation requires a little more work.

So let's look at a demo of that one in operation first. So again, we start out the same way, but now the id array entry really means that every one of these things is a little tree with one node, each one pointing to itself. It's the root of its own tree. So now if we have to put four and three in the same component, then all we do is take the root of the component containing the first item and make that a child of the root of the component containing the second item. In this case we just make four's parent three. So now three and eight. So again, we take the first item and make it a child of the root of the tree containing the second item. So now three, four, and eight are in the same component. Six and five: six goes below five. Nine and four: so now the root of the tree containing four is eight, and the root of the tree containing nine is nine. And so we make nine a child of eight. Two and one, that's an easy one. Now if we get our eight and nine connected query, we just check whether they have the same root, and they both have the same root, eight, and so they're connected. Five and four: 4's root is eight, 5's root is five. They're different. They're not connected. Five and zero: five goes to be a child of zero. Seven and two: seven goes to be a child of 2's root, which is one. Six and one: 6's root is zero, 1 is its own root, so zero becomes a child of one. Each one of these union operations just involves changing one entry in the array. And finally, seven and three. So 7's root is one, 3's root is eight, one becomes a child of eight. Okay, and now we have one connected component with all the items together.

All right, so now let's look at the code for implementing Quick-union. The constructor is the same as the other one. We create the array and then set each element to be its own root. Now we have a private method that implements this process of finding the root by chasing parent pointers until we get to the point where i is equal to id of i, and if it's not equal, we just move i up one level in the tree, set i equals id of i, and at the end return it. So starting at any node, you just follow i equals id of i until they're equal, and then you're at a root. And that's a private method that we can use to implement the find operation, or the connected operation. You just find the root of P and the root of Q and check if they're equal. And then the union operation is simply: find the two roots, and then set the id of the first one to be the second one. Actually less code than for Quick-find, no for loops. There's this one while loop that we have to worry about a little bit. But that's a quick and elegant implementation of code to solve the dynamic connectivity problem, called Quick-union.

So now we're going to have to look at, can this code be effective for large problems? Well, unfortunately, Quick-union can be faster, but it's also too slow. And it's a little different kind of too slow than for Quick-find: there are times when it could be fast, but there are also times when it could be too slow. And the defect for Quick-union is that the trees can get too tall, which would mean that the find operation would be too expensive. You could wind up with a long skinny tree, with each object just pointing to the next, and then to do a find operation for the object at the bottom would involve going all the way through the tree, costing N array accesses just to do the find operation, and that's going to be too slow if you have a lot of operations.
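A minimal Java sketch of Quick-union, reconstructed from the description above, might look like the following. The class name QuickUnionUF, the ids() accessor, and the main method are assumptions for this example; only the parent-link array, the private root method, connected, and union come from the lecture.

```java
import java.util.Arrays;

// Quick-union sketch: id[i] is the parent of i; a root satisfies id[i] == i.
class QuickUnionUF {
    private final int[] id;

    // Every object starts as the root of its own one-node tree.
    public QuickUnionUF(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) id[i] = i;
    }

    // Chase parent pointers until reaching a root.
    private int root(int i) {
        while (i != id[i]) i = id[i];
        return i;
    }

    // Find query: p and q are connected when they have the same root.
    public boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    // Union command: make p's root a child of q's root (one array write).
    public void union(int p, int q) {
        int i = root(p);
        int j = root(q);
        id[i] = j;
    }

    // Returns a copy of the array (an extra accessor, assumed for this sketch).
    public int[] ids() {
        return id.clone();
    }

    public static void main(String[] args) {
        // The union commands from the lecture demo, in order.
        int[][] pairs = {{4, 3}, {3, 8}, {6, 5}, {9, 4}, {2, 1}, {5, 0}, {7, 2}, {6, 1}, {7, 3}};
        QuickUnionUF uf = new QuickUnionUF(10);
        for (int[] pair : pairs) uf.union(pair[0], pair[1]);
        System.out.println(Arrays.toString(uf.ids()));
    }
}
```

The tall-tree defect is easy to trigger with this version: calling union(0, 1), union(0, 2), union(0, 3), and so on hangs each old root under the next object, producing a single chain, so a later root(0) call walks through every object.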


2-4 Quick-Union Improvements

Okay. So, we've looked at the Quick-union and Quick-find algorithms, both of which are easy to implement but simply can't support huge dynamic connectivity problems. So, how are we going to do better? That's what we'll look at next.

A very effective improvement is called weighting. And it might have occurred to you while we were looking at these algorithms. The idea is, when implementing the Quick-union algorithm, to take steps to avoid having tall trees. If you've got a large tree and a small tree to combine together, what you want to try to do is avoid putting the large tree lower, because that's going to lead to long, tall trees. And there's a relatively easy way to do that. What we'll do is keep track of the number of objects in each tree, and then we'll maintain balance by always making sure that we link the root of the smaller tree to the root of the larger tree. So, we avoid this first situation here where we put the larger tree lower. In the weighted algorithm, we always put the smaller tree lower.

Let's see how we implement that. Let's see a demo first. Okay, so again we start out in our normal starting position, where everybody's in their own tree. And when there are only two items to link, it works the same way as before. But now, when we have eight to merge with four and three, we put the eight as the child, no matter which order the arguments came in, because it's the smaller tree. So, six and five, it doesn't matter; whichever one goes down doesn't matter. Nine and four: so now, nine is the small one and four is the big one. So, nine is going to be the one that goes down below. Two and one. Five and zero: so now, five is in the bigger tree, so zero goes below. Seven and two: two is in the bigger tree, so seven goes below. Six and one: they're in equal-size trees. And seven and three: three is in the smaller tree, so it goes below. So, the weighted algorithm always makes sure that the smaller tree goes below. And again, we wind up with a single tree representing all the objects. But this time, we have some guarantee that no item is too far from the root, and we'll talk about that explicitly in a second.

So, here's an example that shows the effect of doing the weighted Quick-union, where we always put the smaller tree down below, for the same set of union commands. This is with a hundred sites and 88 union operations. You can see in the top that the big tree has some nodes a fair distance from the root. In the bottom, for the weighted algorithm, all the nodes are within distance four from the root. The average distance to the root is much, much lower.

Let's look at the Java implementation and then we'll look in more detail at that quantitative information. So, we use the same data structure except now we need an extra array that, for each item, gives the number of objects in the tree rooted at that item. That we'll maintain in the union operation. The find implementation is identical to the one for Quick-union; you're just checking whether the roots are equal. For the union implementation, we're going to modify the code to check the sizes, and link the root of the smaller tree to the root of the larger tree in each case. And then after changing the id link, we also change the size array. If we make i a child of j, then we have to increment the size of j's tree by the size of i's tree. Or if we do it the other way around, then we have to increment the size of i's tree by the size of j's tree. So, that's the full code, in white, for implementing weighted Quick-union. So, not very much code but much, much better performance.

In fact we can analyze the running time mathematically and show that the find operation takes time proportional to how far down the nodes are in the tree, and we can show that it's guaranteed that the depth of any node in the tree is at most the logarithm to the base two of N. We use the notation lg always for logarithm to the base two. So, if N is a thousand, that's going to be ten; if N is a million, that's twenty; if N is a billion, that's 30. It's a very small number compared to N.

So, let's look at the proof of that. We do some mathematical proofs in this course when they're critical, such as this one. And why is it true that the depth of any node x is at most log base two of N? Well, the key to understanding that is to take a look at exactly when the depth of any node increases. When does it go down further in the tree? Well, x's depth will increase by one when its tree, T1 in this diagram, is merged into some other tree, T2 in this diagram. Well, at that point we said we only do that if the size of T2 is bigger than or equal to the size of T1. So, when the depth of x increases, the size of its tree at least doubles. So, that's the key, because that means that the size of the tree containing x can double at most lg N times, because if you start with one and double lg N times, you get N, and there are only N nodes in the tree. So, that's a sketch of a proof that the depth of any node x is at most log base two of N.

And that has profound impact on the performance of this algorithm. Now the initialization still takes time proportional to N. But now, both the union and the connected or find operations take time proportional to log base two of N. And that is an algorithm that scales. If N grows from a million to a billion, that cost goes from twenty to 30, which is quite acceptable.

Now, this was very easy to implement and we could stop, but usually what happens in the design of algorithms is, now that we understand what it is that gains performance, we take a look and see, well, could we improve it even further? And in this case, it's very easy to improve it much, much more. And that's the idea of path compression. And this idea is that, well, when we're trying to find the root of the tree containing a given node, we're touching all the nodes on the path from that node to the root. While we're doing that we might as well make


each one of those just point to the root. There's no reason not to. So when we're trying to find the root of p, after we find it, we might as well just go back and make every node on that path point to the root. That's going to be a constant extra cost: we went up the path once to find the root, and now we'll go up again to flatten the tree out. And there's no reason not to do that. We had one line of code to flatten the tree, amazingly. Actually, to make it a one-liner, we use a simple variant where we make every other node in the path point to its grandparent on the way up the tree. Now, that's not quite as good as totally flattening, but in practice it's just about as good. So, with one line of code, we can keep the trees almost completely flat.

Now, this algorithm people discovered rather early on, after figuring out the weighting, and it turns out to be fascinating to analyze, quite beyond our scope. But we mention this example to illustrate how even a simple algorithm can have an interesting and complex analysis. What was proved by Hopcroft, Ullman, and Tarjan was that if you have N objects, any sequence of M union and find operations will touch the array at most c (N + M lg* N) times. Now, lg* N is kind of a funny function. It's the number of times you have to take the lg of N to get one; it's called the iterated log function. In the real world, it's best to think of that as a number less than five, because lg* of 2^65536 is five. So that means that the running time of weighted quick-union with path compression is going to be linear in the real world, and the bound actually could be improved to an even more interesting function, the inverse of the Ackermann function, which is even more slowly growing than lg*.

Another point about this: it seems that this is so close to being linear, that is, time proportional to N instead of time proportional to N times a slowly growing function of N. Is there a simple algorithm that is linear? People looked for a long time for that, and it actually works out to be the case that we can prove that there is no such algorithm. So there's a lot of theory that goes behind the algorithms that we use. And it's important for us to know that theory, and that will help us decide how to choose which algorithms we're going to use in practice, and where to concentrate our effort in trying to find better algorithms. It's an amazing fact, eventually proved by Fredman and Saks, that there is no linear-time algorithm for the union-find problem. But weighted quick-union with path compression in practice is close enough that it's going to enable the solution of huge problems.

So that's our summary of algorithms for solving the dynamic connectivity problem. Using weighted quick-union with path compression, we can solve problems that could not otherwise be addressed. For example, if you have a billion operations and a billion objects, I said before it might take thirty years. We can do it in six seconds. What's most important to recognize about this is that it's the algorithm design that enables the solution to the problem. A faster computer wouldn't help much. You could spend millions on a supercomputer, and maybe you could get it done in six years instead of 30, or in two months, but with a fast algorithm you can do it in seconds on your own PC.
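To make the "one line of code" concrete, here is a minimal sketch of weighted quick-union with the grandparent (path-halving) variant described above. The class name, field names, and method names are assumptions chosen for illustration; they are not taken from the lecture slides.

```java
// Sketch: weighted quick-union with path compression (grandparent variant).
// All names here are illustrative assumptions.
class WeightedQuickUnionPathCompressionUF {
    private final int[] parent; // parent[i] = parent of i in its tree
    private final int[] size;   // size[r] = number of nodes in the tree rooted at r

    WeightedQuickUnionPathCompressionUF(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every object starts as its own root
            size[i] = 1;
        }
    }

    int root(int i) {
        while (i != parent[i]) {
            parent[i] = parent[parent[i]]; // the one extra line: point to the grandparent
            i = parent[i];
        }
        return i;
    }

    boolean connected(int p, int q) {
        return root(p) == root(q);
    }

    void union(int p, int q) {
        int rp = root(p);
        int rq = root(q);
        if (rp == rq) return;
        // Weighting: link the root of the smaller tree to the root of the larger one.
        if (size[rp] < size[rq]) {
            parent[rp] = rq;
            size[rq] += size[rp];
        } else {
            parent[rq] = rp;
            size[rp] += size[rq];
        }
    }
}
```

Removing the single grandparent assignment in root gives plain weighted quick-union, so the two variants are easy to compare experimentally.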


2-5-Union-FindApplications

Alright. Now that we've seen efficient implementations of algorithms that can solve the union-find problem for huge problem instances, let's look to see how that might be applied. There's a huge number of applications of union-find. We talked about dynamic connectivity in networks; there are many other examples in our computational infrastructure. Down at the bottom is one of the important ones, in image processing, for understanding how to label areas in images. We'll see later Kruskal's minimum spanning tree algorithm, which is a graph-processing algorithm that uses union-find as a subroutine. There are algorithms in physics for understanding physical phenomena, of which we'll look at an example, and many others on this list.

So, the one we're going to talk about now is called percolation. That's a model for many physical systems. I'll give an abstract model and then just talk briefly about how it applies to physical systems. So let's think of an N-by-N grid of squares that we call sites. And we'll say that each site is open (that's white in the diagram) with probability p, or blocked (that's black in the diagram) with probability 1 - p. And we say that a system percolates if the top and the bottom are connected by open sites. So in the system at the left, you can find a way to get from the top to the bottom through white squares, but the system to the right does not percolate: there's no way to get from the top to the bottom through white squares.

So that's a model for many systems. You can think of electricity: you could think of a vacant site as being a conductor and a blocked site as being insulated, and so if there's a conductor from top to bottom then the thing conducts electricity. Or you could think of it as water flowing through a porous substance of some kind, where a vacant site is just empty and a blocked site has got some material, and either the water flows through from top to bottom or not. Or you could think of a social network, where it's people connected, and either there's a connection between two people or not, and the question is whether there is a way to get from one group of people to another, communicating through that social network. That's just a few examples of the percolation model.

So we are talking about a randomized model where the sites are vacant with the given probability. And so it's pretty clear that if the probability that a site is vacant is low, as in the two examples on the left in this diagram, it's not going to percolate. There are not enough open sites for there to be a connection from the top to the bottom. If the probability is high and there are a lot of open sites, it definitely is going to percolate: there would be lots of ways to get from the top to the bottom. But in the middle, when it's medium, it's questionable whether it percolates or not. So the scientific question, or the mathematical question, from this model is: how do we know whether it's going to percolate or not?

In this problem and in many similar problems, there's what's called a phase transition, which says that when p is low, it's not going to percolate, and when it's high, it is going to percolate. And actually, the threshold between when it percolates and when it doesn't percolate is very sharp. There is a value such that, as N gets large, if you're less than that value it almost certainly will not percolate, and if you're greater it almost certainly will. The question is, what is that value? This is an example of a mathematical model where the problem is very well articulated (what's that threshold value?) but nobody knows the solution to that mathematical problem. The only solution we have comes from a computational model, where we run simulations to try to determine the value of that probability. And those simulations are only enabled by fast union-find algorithms. That's our motivating example for why we might need fast union-find algorithms, so let's look at that.

So what we're going to run is a so-called Monte Carlo simulation, where we initialize the whole grid to be blocked, all black, and then we randomly fill in open sites. And we keep going. Every time we add an open site, we check to see if it makes the system percolate, and we keep going until we get to a point where the system percolates. And we can show that the vacancy percentage at the time that it percolates is an estimate of this threshold value. So what we want to do is run this experiment millions of times, which we can do on a computer, as long as we can efficiently do the calculation of whether it percolates or not. That's a Monte Carlo simulation: a computational problem that gives us a solution to this scientific problem, where the mathematical problem is one nobody knows how to solve yet.

So let's look in a little bit more detail at how we're going to use our dynamic connectivity model to do this. We'll create an object corresponding to each site, and we'll give them names from 0 to N^2 - 1, as indicated here. And then we'll connect them together if they're connected by open sites. So the percolation model on the left corresponds to the connection model on the right, according to what we've been doing. Now, you might say, well, what we want to do is check whether any site in the bottom row is connected to any site in the top row, and use union-find for that. The problem with that is that it would be a brute-force algorithm. It would be quadratic, right on the face of it, because it would have N^2 calls to find to check whether they're connected: for each site on the top, I'd check each site on the bottom. Much too slow. Instead, what we do is create a virtual site on the top and on the bottom. And then, when we want to know whether this system percolates, we just check whether the virtual top site is connected to the virtual bottom site.

So how do we model opening a new site? Well, to open a site we just connect it to all its adjacent open sites. So that's a few calls to union, but that's easy to implement. And then with that simple relationship we can use exactly the code that we developed to go ahead and run a simulation for this connectivity problem. And that's where we get the result that, by running enough simulations for a big enough N, this percolation threshold is about 0.592746. With this fast algorithm we can get an accurate answer to the scientific question. If we use a slow union-find algorithm, we won't be able to run it for very big problems and we won't get a very accurate answer.

So in summary, we took an important problem, the dynamic connectivity problem. We modeled the problem to try to understand precisely what kinds of data structures and algorithms we'd need to solve it. We saw a few easy algorithms for solving the problem, and quickly saw that they were inadequate for addressing huge problems. But then we saw how to improve them to get efficient algorithms. And that left us with applications that could not be solved without these efficient algorithms. All of this involves the scientific method for algorithm design, where we try to develop mathematical models that help us understand the properties of the algorithms that we're developing. And then we test those models through experimentation, enabling us to improve algorithms, iterating, developing better algorithms and more refined models until we get what we need to solve the practical problems that we have of interest. That's going to be the overall architecture for studying algorithms that we're going to use throughout the course.
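The virtual-site idea and the Monte Carlo experiment described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the course's reference code: the class and method names are invented, a small weighted quick-union is inlined so the example is self-contained, and the choice to connect a site to the virtual top or bottom at the moment it is opened is one plausible way to wire up the virtual sites.

```java
import java.util.Random;

// Sketch: percolation on an n-by-n grid using union-find with two virtual sites.
// All names are illustrative assumptions.
class PercolationSketch {
    private final int n;
    private final boolean[] opened;   // opened[site] is true once the site is open
    private final int[] parent, size; // inlined weighted quick-union arrays
    private final int top, bottom;    // indices of the two virtual sites

    PercolationSketch(int n) {
        this.n = n;
        opened = new boolean[n * n];  // all sites start blocked
        parent = new int[n * n + 2];
        size = new int[n * n + 2];
        for (int i = 0; i < parent.length; i++) { parent[i] = i; size[i] = 1; }
        top = n * n;
        bottom = n * n + 1;
    }

    private int root(int i) {
        while (i != parent[i]) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    private void union(int p, int q) {
        int rp = root(p), rq = root(q);
        if (rp == rq) return;
        if (size[rp] < size[rq]) { parent[rp] = rq; size[rq] += size[rp]; }
        else { parent[rq] = rp; size[rp] += size[rq]; }
    }

    boolean isOpen(int row, int col) { return opened[row * n + col]; }

    // Open a site: connect it to its adjacent open sites (a few calls to union).
    void open(int row, int col) {
        int site = row * n + col;
        if (opened[site]) return;
        opened[site] = true;
        if (row == 0) union(site, top);        // top row touches the virtual top
        if (row == n - 1) union(site, bottom); // bottom row touches the virtual bottom
        if (row > 0 && opened[site - n]) union(site, site - n);
        if (row < n - 1 && opened[site + n]) union(site, site + n);
        if (col > 0 && opened[site - 1]) union(site, site - 1);
        if (col < n - 1 && opened[site + 1]) union(site, site + 1);
    }

    // One connectivity query instead of checking every top/bottom pair.
    boolean percolates() { return root(top) == root(bottom); }

    // One Monte Carlo trial: open random sites until the system percolates,
    // then return the fraction of open sites as an estimate of the threshold.
    static double trial(int n, Random rng) {
        PercolationSketch p = new PercolationSketch(n);
        int count = 0;
        while (!p.percolates()) {
            int row = rng.nextInt(n), col = rng.nextInt(n);
            if (!p.isOpen(row, col)) { p.open(row, col); count++; }
        }
        return (double) count / (n * n);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int trials = 100;
        double sum = 0.0;
        for (int t = 0; t < trials; t++) sum += trial(200, rng);
        System.out.println("estimated threshold: " + sum / trials);
    }
}
```

Averaging many trials on a larger grid should move the estimate toward the threshold quoted in the lecture; the grid size and trial count in main are arbitrary.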


3-1-AnalysisofAlgorithmsIntroduction

Welcome back. Today we're going to do some math and some science. Not a lot, but we need to have a scientific basis for understanding the performance of our algorithms to properly deploy them in practice. So today we're going to talk about how to observe performance characteristics of algorithms. We're going to look at how to make mathematical models and how to classify algorithms according to the order of growth of their running time. We'll talk a bit about the theory of algorithms and also how to analyze memory usage.

So to put this all in perspective, we're going to think about these issues from the point of view of different types of characters. The first one is the programmer, who needs to solve a problem and get it working and get it deployed. The second one is the client, who wants to use whatever the programmer did to get the job done. The third one is the theoretician; that's somebody who really wants to understand what's going on. And the last one is kind of a team: the basic blocking and tackling that's sometimes necessary to get all these things done. So there's a little bit of each one of these in today's lecture. And actually, when you're a student, you have to think that you might be playing any or all of these roles someday. So it's pretty important to understand the different points of view.

So the key that we'll focus on is running time. And actually the idea of understanding the running time of a computation goes way back, even to Babbage and probably before. Here's a quote from Babbage: "As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise: by what course of calculation can these results be arrived at by the machine in the shortest time?" If you look at Babbage's machine, called the Analytical Engine, it's got a crank on it. And literally the concern that Babbage had in knowing how long a computation would take is: how many times do we have to turn the crank? It's not that different in today's world. The crank may be something electronic that's happening a billion times a second. But still, we're looking for how many times some discrete operation has to be performed in order to get a computation done.

So there are a lot of reasons to analyze algorithms. In the context of this course we are mainly interested in performance prediction. And we also want to compare the performance of different algorithms for the same task, and to be able to provide some guarantees on how well they perform. Along with this is understanding some theoretical basis for how algorithms perform. But primarily, the practical reason that we want to be analyzing algorithms and understanding them is to avoid performance bugs. We want to have some confidence that our algorithm is going to complete the job in the amount of time that we think it will. And it's very, very frequent to see, in today's computational infrastructure, a situation where the client gets bad performance because the programmer did not understand the performance characteristics of the algorithm. And today's lecture is about trying to avoid that.

Now, we're going to focus on performance and comparing algorithms in this course. There are later courses in typical computer science curricula that have more information about the theoretical basis of algorithms, and I'll mention a little bit about that later on. But our focus is on being able to predict performance and compare algorithms.

Now there's a long list of success stories in designing algorithms with better performance, enabling the solution of problems that would otherwise not be solved. I'll just give a couple of examples. One of the first and most famous is the so-called FFT algorithm. That's an algorithm for breaking down the waveform of N samples of a signal into periodic components. And that's at the basis of DVDs and JPEGs and many other applications. There's an easy way to do it that takes time proportional to N^2, but the FFT algorithm takes only N log N steps. And the difference between N log N and N^2 is the difference between being able to solve a large problem and not being able to solve it. A lot of the digital media technology that we have today is enabled by that fast algorithm. Another example was actually developed by Andrew Appel, who's now the chair of computer science here at Princeton. It was developed when he was an undergraduate, for his senior thesis. It's a fast algorithm for the N-body simulation problem. The easy algorithm takes time proportional to N^2, but Appel's algorithm was an N log N algorithm, which again meant that scientists can do N-body simulation for huge values of N. And that enables new research.

So the challenge that we usually face is: will my program be able to solve a large practical input? And actually, the working programmer is faced with that all the time. Why is my program running so slowly? Why does it run out of memory? That has faced programmers for a really long time, and the insight to address this, from Don Knuth in the 1970s, was that we really can use the scientific method to understand the performance of algorithms in operation. Maybe we're not unlocking new secrets of the universe, but we can use the scientific method and treat the computer as something to be studied in that way, and come to an understanding of how our programs are going to perform. Let's take a look at that in more detail.

So this is just a quick summary of what we mean by the scientific method, which has been successful for a couple of centuries now. What we're going to do is observe some feature of the natural world; in this case, it's going to be the running time of our program on a computer. Then we're going to develop a hypothesis, some model that's consistent with the observations, and we're going to hope that that hypothesis is good enough that it'll allow us to predict something, usually a running time for a larger problem size or on a different computer. And then we'll verify the predictions by making more observations, and validate until we're comfortable that our model, hypothesis, and observations all agree. That's a way to get comfort that we understand the performance of our programs.

Now, within the scientific method there are some basic principles. The first is that if you're going to run experiments, you should expect that somebody else should be able to run the experiments and get the same result. And also, the hypotheses have to have a specific property: that the experiment can show the hypothesis to be wrong. So it has to be carefully crafted, and we'll be sure to try to do that. And again, the feature of the natural world that we're studying is some particular computer that exists in the natural world. It changes the algorithm from an abstraction to some kind of actual physical thing happening, like electrons racing around inside the computer.


3-2-Observations

Okay, so the first step is to be able to make some observations about the running time of the programs. And for analysis of algorithms that's easier than in a lot of scientific disciplines, as we'll see. For a running example we're going to use the so-called 3-sum problem. And it's an easy-to-state problem: if you've got N distinct integers, how many triples sum to exactly zero? For example, in this file 8ints.txt, which has eight integers in it, there are four triples that sum to zero: 30, -40, 10; 30, -20, -10; and so forth. And so our goal is to write a program that can compute this quantity for any input file, any set of N integers. This is actually an extremely important computation that's deeply related to many problems in computational geometry, which is a branch of computer science that covers the algorithms and underlying science related to graphics and movies and geometric models of all sorts. So this is actually an important practical problem.

But it's a simple one to write code for; in fact, you could write down this program without much effort. It's got a static method count that takes an integer array as an argument. N is the number of integers, that's the length of the array. We start with a variable count equal to zero, and then a triple for loop that checks each triple i, j, k: we go i from 0, j from i+1 to N, and k from j+1 to N, so that we get each triple just once. And then if a[i] + a[j] + a[k] = 0, we increment the count. And after that triple for loop, we return the count. And then the main method in this simple class just reads in all the integers and prints out the count. So that's a brute-force algorithm that is a fine method for solving the 3-sum problem. Now what we're interested in is: how much time does this take as a function of N?

Well, one way to time our program is to just look at your watch. If you have a stopwatch, or look at the clock or your phone or whatever you might need, you can just go ahead and time it if you want. Or we have, as part of our standard library, a Stopwatch class that will go ahead and compute elapsed time. So any time you run a program, if it is set up to easily take input of different sizes, a natural thing to do is just run it for bigger sizes. So for eight ints this program takes not too much time; for 1,000 ints it takes half a second. For 2,000 it takes more time, that's 3.7 seconds. Run it again, still takes 3.7 seconds. For 4,000... so each time we're doubling the size of the input, and it's definitely taking more time each time. And actually, as we'll see, programmers who get in the habit of testing the running time of their programs in this way can get so that they can pretty easily and quickly evaluate when it's going to finish. In fact, while you're waiting for it to finish you can often figure it out. So that one took 30 seconds for 4K, and definitely we could figure out how long it's going to take for 8K before it finishes, and you'll see how in just a second. I'm not going to wait right now. You can think about what you think.

Okay, so that's empirical analysis: run it for various input sizes and measure the running time. Now if this were some scientific problem where we were counting something that happens in the natural world, the number of ants in an anthill or whatever, then we'd have only a few data points and we would try to understand what was going on by doing a plot, with the running time that we're interested in on the y axis and the problem size on the x axis. You get a curve like this, and actually what scientists usually do, because so many problems fall into this class, is do the plot as a log-log plot. If you do it as a log-log plot, very often you'll get a straight line. And the slope of the straight line is the key to what's going on. In this case, the slope of the straight line is three, and so you can run what's called a regression to fit a straight line through the data points. And then it's not difficult to do the math to show that if you get a straight line and the slope is b, then your function is proportional to a N^b. That's called a power law. And that's true of many, many scientific problems, including most algorithms.

So here's a little bit of the math for that. The straight line means that, since we did a log-log plot with powers of two, lg(T(N)) = b lg N + c. We have our empirical values of b and c, and then if you raise two to the power of both sides of that equation, you get T(N) = a N^b, where a = 2^c is a constant. So right away, just from observation, we have a pretty good model for the running time of our program. We can do the math and figure out that it seems as though the running time is about 10^-10 N^3 seconds. We can use that hypothesis to go ahead and make predictions. Just plug in different values of N, and it says it will take us 400 seconds for 16,000. 400 seconds is plenty of time, but now we can go ahead and invest and run that experiment, and sure enough we're pretty close to that: 408 seconds when we run it. And now we can make a prediction for 32,000, or for whatever else we might be interested in. The model helps us do predictions without investing the expense to run the experiments.

In fact, in this situation, if there is a power law (and again, the running time of a very great majority of computer algorithms is going to be a power law), what we can do is just double the size of the input each time, the way we were, and take the ratio of the running times for N and 2N. If you do that, that ratio is going to converge to a constant. And in fact the lg of the ratio is going to converge to b, the exponent of N in the running time. You just need a little math to check that one, but that's a very easy and natural way to go ahead and predict running times. So that's what I said before: we have this quick way to estimate b in the power-law relationship. How do we estimate a? Well, we can just run it and solve for a. So once we've decided that the exponent is three, let's run it for some big N, and we get a model pretty close to the one we had from plotting things. So it's an almost identical hypothesis, and we just got it by running the program, doubling N each time.

Okay, so there are a lot of effects in trying to understand the running time of a program on your machine. The key effects are independent of what computer it is: that's the algorithm you're using and what the data is. And that's going to really determine the exponent b in the power law. And then there are a lot of system-dependent effects. What kind of hardware do you have? Do you have a fast computer or a slow one? What kind of software? What's going on in your computer? All of those things really determine the constant a in the power law. In modern systems there is so much going on in the hardware and software that it's sometimes difficult to get really precise measurements. But on the other hand, we don't have to sacrifice animals or fly to another planet the way they do in other sciences. We can just run a huge number of experiments and usually take care of understanding these kinds of effects.
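As a rough illustration of the brute-force 3-sum count and the doubling experiment described in this section, here is a small self-contained sketch. The class and method names are assumptions, System.nanoTime stands in for the Stopwatch class mentioned in the lecture, and the eight sample values are assumed to match the contents of 8ints.txt.

```java
import java.util.Random;

// Sketch: brute-force 3-sum plus a doubling experiment.
// Names and input sizes are illustrative assumptions.
class ThreeSumDoubling {
    static long sink; // keeps the result live so the timed call is not optimized away

    // Count triples i < j < k with a[i] + a[j] + a[k] == 0.
    static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    if (a[i] + a[j] + a[k] == 0) count++;
        return count;
    }

    // Time count() on n random integers; returns elapsed seconds.
    static double timeTrial(int n, Random rng) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = rng.nextInt(2_000_000) - 1_000_000;
        long start = System.nanoTime();
        sink += count(a);
        return (System.nanoTime() - start) / 1e9;
    }

    // Estimate b in T(N) = a N^b from the running times at N and 2N: b = lg(T(2N) / T(N)).
    static double exponent(double timeN, double time2N) {
        return Math.log(time2N / timeN) / Math.log(2);
    }

    // Evaluate the power-law hypothesis T(N) = a N^b.
    static double predict(double a, double b, int n) {
        return a * Math.pow(n, b);
    }

    public static void main(String[] args) {
        int[] sample = {30, -40, -20, -10, 40, 0, 10, 5}; // assumed contents of 8ints.txt
        System.out.println("triples summing to zero: " + count(sample));

        Random rng = new Random(1);
        double prev = timeTrial(250, rng);
        for (int n = 500; n <= 4000; n += n) {
            double t = timeTrial(n, rng);
            System.out.printf("%6d %8.3f s   b ~ %.2f%n", n, t, exponent(prev, t));
            prev = t;
        }
    }
}
```

The measured times depend on the machine, so no particular output is claimed; the point is that the estimated b should settle near 3 as N doubles, and a can then be solved for from any one measurement and plugged into predict.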


3-3-MathematicalModels

Observing what's happening, as we did in the last section, gives us a way to predict performance, but it really doesn't help us understand what the algorithm's doing. So next we're going to look at a mathematical model, a way to get a better concept of what's really happening. Again, this concept was really developed and popularized by Don Knuth starting in the late 60s. At that time, computer systems were really becoming complicated for the first time, and computer scientists were concerned about whether we really were going to be able to understand what's going on. And Knuth was very direct in saying that this is something that we certainly can do. We can calculate the total running time of a program by identifying all the basic operations, figuring out the cost, figuring out the frequency of execution, and summing up the cost times frequency for all the operations. You have to analyze the program to determine what the set of operations is; the cost depends on the machine and the system, which is what we talked about before; and the frequency leads us to mathematics, because it depends on the algorithm and the input data. Knuth has written a series of books that give very detailed and exact analyses, within a particular computer model, for a wide range of algorithms. So, from Knuth, we know that in principle we can get accurate mathematical models for the performance of algorithms or programs in operation.

All right. So what does this process look like? Well, you can, if you want, run experiments. In ancient times, we would actually look at the computer manual; every computer came with a manual that said precisely how long each instruction would take. But nowadays it's a little more complicated. So we run experiments, and you can go ahead and do a billion adds and figure out that maybe on your computer an add takes 2.1 nanoseconds. Or you can do more complicated functions, like computing a sine or an arctangent, although that's already getting close to the analysis of algorithms. So there's some way to determine the costs of the basic operations. And so in most of the cases we'll just postulate that it's some constant, and you can figure out what the constant is. Although when we're working with a collection of N objects, there are some things that take time proportional to N. For example, if you're going to allocate an array of size N, it takes time proportional to N, because in Java the default is that all the elements in the array are initialized to zero. For other operations it depends on the system implementation, and an important one is string concatenation. If you concatenate two strings, the running time is proportional to the length of the string. Many novices programming in Java make the mistake of assuming that's a constant-time operation when it's not.

Alright, so that's the cost of each operation. More interesting is the frequency of execution of the operations. So this is a very simple variant of the 3-sum problem. That's the 1-sum problem: how many numbers are actually equal to zero? How many single numbers add up to zero? For that one, it's just one for loop, and we go through and test whether the number is zero and increment our count. And by analyzing that code you can see that i and count have to be declared, and then they have to be assigned to zero. There are compares of i against N, and there are N+1 of them. There are compares of a[i] against zero, and there are N of those, and N array accesses. And the number of increments is variable: i is incremented N times, but count could be incremented any number of times from zero to N. And so that frequency is dependent on the input data. We might need a model for describing that, or maybe there are other operations that are more expensive and we won't need to worry about that.

So let's look at the next, more complicated problem: what about the frequency of execution of instructions in this program, which is the 2-sum problem, how many pairs of integers sum to zero? Well, in this case you have to do a little bit of math to see that when i goes from 0 to N and j goes from i+1 to N, the number of array accesses that we do is two for each time the if statement is executed, for a[i] and a[j], and that statement is executed N-1 times the first time through the loop, N-2 the second, and so forth. It's the sum of the integers from 0 up to N-1, which is a simple discrete sum, 1/2 N (N-1), and since we're doing it twice, the number of array accesses is N (N-1). So we can go ahead and get these actual exact counts. But already it's getting a little bit tedious to do that.

And as far back as Turing, who also knew, as well as Babbage did, that we want to have a measure of the amount of work involved in the process, he recognized that you didn't want to necessarily go through and do it in full detail. It's still helpful to have a crude estimate. So you could count up the number of times that every operation is applied, give it weights, and count the [inaudible] and so forth. But maybe we should just count the ones that are most expensive. That's what Turing said in 1947, and realistically that's what we do nowadays. So rather than going in and counting every little detail, we take some basic operation that's maybe the most expensive and/or the one that's executed the most often, the one for which cost times frequency is the highest, and use that as a proxy for running time, essentially making the hypothesis that the running time is going to grow like a constant times [inaudible]. So in this case we're going to pick array accesses.

So that's the first simplification. The second simplification is that we're going to ignore low-order terms in the formulas that we derive. And there's an easy way to do that. It's called the tilde notation, and the idea is that when N is large in a formula like this, the N^3 term is much, much higher than the N term or sixteen. In fact, so much so that we would hardly even notice these low-order terms. So all of these formulas are ~ 1/6 N^3, and that's a fine approximation to these quantities. And it greatly simplifies the calculations to throw away low-order terms like this. So we focus on one operation and throw away the low-order terms. And this is the technical definition of tilde: f(N) ~ g(N) means that the limit of f(N)/g(N) as N goes to infinity equals one, and you can check that that's going to hold in these kinds of situations. So that greatly simplifies the frequency counts. And if we're only picking one thing, we're just talking about ~ N^2, and maybe another ~ N^2 for the increment, for the 2-sum problem. Okay. So again, when N is large, the low-order terms are negligible, and when N is really small, they're not negligible, but we don't really care, because we're trying to estimate running times for large N, and running times for small N are going to be small no matter what.

All right, so now we're using both the cost model and the tilde notation, and then we can simply say that this program uses ~ N^2 array accesses, and have implicit the hypothesis that we think the running time is going to be ~ a constant times N^2. Okay, now what about 3-sum? Let's do a real problem. So now we have the triple loop. And then we have to do a more complicated combinatorial problem, and it's not that big a deal, really: we are looking at the number of distinct ways you can choose three things out of N, and that's a binomial coefficient. And again, doing the math and using the tilde, it's just ~ 1/6 N^3 triples, with three array accesses for each triple, so we can say ~ 1/2 N^3 array accesses. So we're not computing and summing the costs of all operations; that's too much work. We're picking the most expensive in terms of cost times frequency, approximating that, and trying to get a good model for the running time.

So now, we're not going to do a full discrete mathematics in this course, but there are some basic things that we'll want to use and that are not that difficult to understand. A lot of times we find out that we need to come up with an estimate of a discrete sum, like we did for 1 + 2 up to N, or a sum of the squares, or other things like the 3-sum triple loop. And so actually, if you've had basic calculus, one way to think of it is to just replace the sum with an integral. That usually works, or we can do the math and use the so-called Euler–Maclaurin summation formula to get a true approximation. But if you think of it this way, you'll believe us when we say that that thing is ~ 1/2 N^2, or that the sum 1 + 1/2 + 1/3 up to 1/N is like the integral from x = 1 to N of 1/x, and that's the natural log of N. Now, even for the 3-sum triple loop, if you're used to multiple integrals, that will quickly give you the 1/6 N^3. There are many other techniques that we could use for this. We're not going to teach all that, but we'll sometimes refer to results of this type.

Alright, so in principle, Knuth tells us that accurate mathematical models are available. In practice, we can get really complicated formulas. We also might need some advanced mathematics that the theoretician will revel in, but that maybe people learning algorithms for the first time might not be expected to know. So in the end, careful exact models are best left for experts. There's really a lot of things that can go on. On the other hand, approximate models are definitely worthwhile. And for all the algorithms that we consider, we'll try to communicate a reasonable approximate model that can be used to describe the running time. Sometimes we'll give the mathematical proofs, and other times we'll have to just cite the work of some expert.
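One way to convince yourself of the counts in this section is to tally array accesses directly and compare them with the tilde approximations. The sketch below does that for the 2-sum and 3-sum loops; the class and method names are assumptions, and the loops only count accesses rather than read a real array.

```java
// Sketch: exact array-access counts for the 2-sum and 3-sum loops,
// compared with the tilde approximations ~ N^2 and ~ 1/2 N^3.
// All names are illustrative assumptions.
class TildeCounts {
    // Two array accesses (a[i] and a[j]) each time the if statement runs: N (N-1) in total.
    static long twoSumArrayAccesses(int n) {
        long accesses = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                accesses += 2;
        return accesses;
    }

    // Three array accesses per triple, and there are "N choose 3" triples.
    static long threeSumArrayAccesses(int n) {
        long accesses = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    accesses += 3;
        return accesses;
    }

    public static void main(String[] args) {
        // Both ratios should approach 1 as N grows, which is what the tilde notation asserts.
        for (int n = 100; n <= 800; n += n) {
            double two = twoSumArrayAccesses(n) / ((double) n * n);
            double three = threeSumArrayAccesses(n) / (0.5 * n * n * n);
            System.out.printf("N=%4d  2-sum/N^2=%.4f  3-sum/(N^3/2)=%.4f%n", n, two, three);
        }
    }
}
```

The low-order terms show up as the small gap between each ratio and 1, and that gap shrinks as N doubles.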


Now,fortunatelywhenweanalyzealgorithms,actuallynottoomanydifferentfunctionsariseandactuallythatpropertyallowsustoreallyclassifyalgorithmsaccordingtotheirperformanceastheproblemsizegrows.Sothat'swhatwe'lltalkaboutnext.Sothegoodnewsisthere'sonlythesefewfunctionsturnupaboutthealgorithmsthatweareinterestedin.Wecancraftthingsthathaveotherfunctionsandtherearecounterexamplestothis.Butreallyagreatnumberofthealgorithmsthatweconsideraredescribedbythesefewfunctionsandthatareplottedhere.And[cough]thewhenwearetalkingabouttheorderofgrowth,wearenottalkingabouttheleadingconstant.Normallywe'llsaytherunningtimeofthealgorithmisproportionaltoNlogN.ThatmeanswethatwethinkthatourhypothesisisthattherunningtimeistildeClgN,NlgN,whereCissomeconstant.Andintheseplots,thesearelg,lgplotsthatnotreallygiveagoodideaofwhat'sgoingon.Ifaorderofgrowthislogarithmicorconstant,doesn'tmatterhowbigthethingis.It'sgoingtobefastoftherunningtimeforisTforsayathousand,andforhalfamillionitwillbeprettyclosetoT.Ifit'slinear,ifit'sautogrowthisproportionaltoNthenastherunningtime,asthesizeincreasestherunningtimeincreasescorrespondingly.Andthesameistrue,almost,ifit'sNlogN.Sothosearethealgorithmsthatwestrivefor.Theyscalewiththeinputsize.Astheinputgrows,sogrowstherunningtime.Andthat's,areasonablesituationtobein.AswetalkedaboutwhenwetalkedaboutUnion-find.Ifit'squadratic,therunningtimegrowsmuchfasterthantheinputsize.Andit'snotfeasibletousesuchanalgorithmforlargeinputs.Andqubicisevenworse.Sowhatwefindisformanyalgorithmsourfirsttaskisreally,simply,makesureit'snotquadraticorqubit.Andtheseorderofgrowthclassificationsactuallycomefromkindofsimplepatternsintermsofthecodethatwewrite.Soifourcodehasnoloopsinit,thentheorderofgrowthisgoingtobeconstant.Ifourcodehassomekindofloopwheretheinput'sdividedinhalf,andsobinarysearchalgorithmisanexampleofthat.Thenourordergrowthwillbelogarithmicandwe'lltakealookatthatanalysisandbutifyoudothedoublingtest,itgrowsalmostlinearly,ifyouhaveahugeinputandyoudoublethesizeit's,it'sstillgoingtobeI'msorry,notl
inearly,constantjustlikeifit'sconstant.You'llhardlynoticethatlgN.Ifyouhavealoopwhereyoutoucheverythinginyourinput.Thantherunningtimeislinear,proportionaltoendsoatypicalexampleofthatwouldbefindthemaximum,ortocountthenumberofzeros.Ouronesomeproblem.Averyinterestingcategoryisaso-calledNlgNalgorithmsorlinearrhythmicalgorithms.Andthosearetheonesthatarisefromaparticularalgorithmsdesigntechniquecalledthedivideandconquer.AndtheMergesortalgorithm,whichwe'lltalkaboutinacoupleofweeks,isaprimeexampleofthat.Andthenifyouhavedoublefourloopslikeourtwosumalgorithm,that'sgoingtobetimeproportionaltoN^2.Aswesaw,that'squadratic,ortriplefourlooplikeour3-sumalgorithm,that'sgoingtobecubicortimeproportionaltoN^3.Foraquadraticalgorithmoracubicalgorithm,thedoublingfactorisfouroreightastheinputsizedoubleforcubicalgorithm,therunningtimegoesupbyafactorofeight,andthat'sthekindofcalculationthatyoucandoinyourheadwhilewaitingforaprogramtofinish.There'salsoacategoryofalgorithmswho'srunningtimeisexponentialandinthosealgorithmsndoesn'tgetverylargeatandwe'lltalkaboutthoseattheendparttwoofthecourse.Sothesearesomepracticalimplicationsof,oftheordergrowth.Andwereallydwellonthistoomuch,excepttocomebacktothepointthatthealgorithmswearereallyinterestedin,thatcansolvehugeproblems,arethelinearandNlgNalgorithms.Becauseevennowaquadraticalgorithmonatypicalfastcomputercouldonlysolveproblemsandsayingthattensofthousandsinacubicalgorithmonlyinthesizeofthousands.Andnowadaysthosearejustnotusefulbecausetheamountofdatathatwehaveismorelikethemillionsorbillionsortrillions.Thatfactisbecomingmoreandmoreevidentastimewearsontheancienttimeswouldhavesomediscussionaboutwhetherquadraticalgorithmmightbeusefulbutthesituationgetsworseasthetimegoeson,soweneedbetteralgorithms.Toillustratetheprocessofdevelopingamathematicalmodelfordescribingaperformancethroughanalgorithm,we'lllookatafamiliaralgorithmcalledbinarysearch.It's,thegoalisthatyouhaveasortedarrayofintegers,sayandyou'regivenakey.Andyouwanttoknow,isthatkeyinthearray?Andifitis,what,w
hat'sitsindex?Andafastalgorithmfordoingthisisknownasbinarysearch,wherewecomparethekeyagainstthemiddleentry.Inthiscase,ifwe'relookingfor33,wecompareitagainst53.Ifitssmallerweknowitsinthelefthalfofthearray,ifit'slargerweknowit'sintherighthalfofthearray,ifit'sequal,wefoundit.Andthenweapplythesamealgorithmrecursively.Solet'squicklylookatademo.Sowe'relookingfor33inthisarray,compareitagainstthemiddleentryinthearray.53andit'slesssowegoleft,sonowwecanconcentratejustonthelefthalfofthearray,nowwelookinthemiddleofthishalf,that's25,33isbiggersowegoright.Andnowweconcentrateontherighthalforthelefthalfandwehaveasmallersubarray.Lookatthemiddle,33islesssowegoleftandnowwehaveonlytheoneelementtolookatandwefoundourkey33inthearrayandwereturnthatindexfour.Ifwe'relookingforsomethingthat'snotinthearray,wedothesameprocess.So,say,we'relookingfor34.It'sgoingtobethesame.Lookinthelefthalf,lookintherighthalf.Looktotheleftofthe43.Now,there'sonlyonekeytolookat.Andit'snot34,sowesay,

3-4-Order-of-GrowthClassifications


it's not there. So that's binary search.

So here's the code for binary search. Actually, although binary search is a simple algorithm, it's notoriously tricky to get every detail right. In fact, one paper claimed that the first bug-free binary search wasn't published until 1962, and even in 2006 a bug was found in Java's implementation of binary search. That's just an indication of the care that we have to take in developing algorithms, especially for libraries that are going to be used by millions of people.

So here's an implementation. It's not recursive, although often we can implement this recursively. It's just reflecting in code what I described in words: we have to find whether a key is in an array. We use two pointers, lo and hi, to indicate the part of the array we are interested in. As long as lo is less than or equal to hi, we compute the middle, and then we compare our key against the middle entry. Actually it's a three-way compare: see if it's less or greater, or if it's equal we return that mid index. If it's less we reset the hi pointer, if it's greater we reset the lo pointer, and we keep on going until the pointers cross. If they cross and we haven't found it, then we return -1. It's easy to persuade ourselves that this program works as advertised by thinking about this invariant: if the key is in the array, then it's between lo and hi in the array.

All right, so that's a program that you are probably familiar with. Let's look at the mathematical analysis of that program. This is a theorem that we are going to prove easily. We won't do a lot of proofs, but this is one worth doing. It says that binary search uses at most 1 + lg N compares to complete a search in a sorted array of size N. To set up the problem, we define a variable T(N), which is the number of compares that binary search needs for an array of size N. Then we write down a recurrence relation that reflects the code. What the code does is divide the problem size in half, so T(N) is less than or equal to T(N/2) plus 1 (depending on how you count the compare; think of it as a two-way compare). We divide the problem in half by doing one compare, and that's true as long as N is bigger than 1. If N is equal to 1, the solution is 1. So it's a recurrence relation describing the computation.

We can go ahead and solve this recurrence by applying the recurrence itself to the first term on the right. That's called telescoping. If this is true, we can apply the same thing to T(N/2) and throw out another 1, and if that's true, apply the same thing to T(N/4) and throw out another 1, and so forth, until we get down to just T(1), in which case we have lg N ones left. That is, T(N) <= T(N/2) + 1 <= T(N/4) + 1 + 1 <= ... <= T(1) + lg N = 1 + lg N. Now, this is a proof sketch. You might have noticed that this proof actually only holds if N is a power of two, because we didn't really specify in this recurrence what we mean if N is odd. But it's possible to go ahead and take care of that detail as well, and show that binary search running time is always logarithmic.

All right, so given that fact, we can develop a faster algorithm for 3-sum. It's a sorting-based algorithm. What we're going to do is take the numbers that we have as input and sort them. We'll talk about sorting algorithms next week. We can do that in time proportional to N lg N, but that's not the main part of the computation. The main part of the computation is, after the numbers are sorted, to go through and, for each pair of numbers a[i] and a[j], do a binary search for -(a[i] + a[j]). If we find it, then we'll have three numbers that sum to zero. So we sort our numbers, and then for each pair we do a binary search to see if it's there. Take -40 and 0: minus their sum is 40, we do a binary search, that's in there, so we have one solution to the 3-sum problem. And we do that for all pairs of numbers.

Then a quick analysis says the order of growth of the running time is going to be N^2 lg N. You don't even need a good sort for that. You could use the elementary insertion sort, the first one we'll talk about, because the running time is dominated by the binary searches: for each of the N^2 pairs, or N^2/2 pairs, we're going to do a binary search, so we get an N^2 lg N running time. So that's a quick example of how we could improve the performance by finding an improved algorithm to solve a problem. N^2 lg N is much less than N^3 for large N.

And so we're implicitly making the hypothesis that if we do the sort-based thing and use binary search, we're going to have a faster program. Sure enough, we can go ahead and run some experiments and find that, whereas it took us 50 seconds to solve the problem for 8,000 numbers before, it's taking less than a second now. In 50 seconds we can solve up to 64,000. So typically we expect that a better order of growth means faster in practice, but when it comes to examining the algorithms in detail, we can go ahead and do the tests and figure out which algorithm is faster. And certainly, going from N^3 to N^2 lg N, we expect that we're going to have a much better algorithm.
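The slide code is not part of this transcript, so what follows is only a minimal sketch of the two ideas described above: the iterative binary search with lo and hi pointers, and the sorting-based 3-sum that binary-searches for -(a[i] + a[j]). The class name `SortedThreeSum` and the method names are hypothetical. The sketch assumes the input values are distinct and small enough that sums do not overflow an `int`.

```java
import java.util.Arrays;

// Hypothetical class name; a sketch of the approach described in the lecture,
// not the course's own code.
class SortedThreeSum {

    // Iterative binary search: returns the index of key in the sorted array a,
    // or -1 if key is absent. Invariant: if key is in a, it lies in a[lo..hi].
    static int indexOf(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // written this way so lo + hi cannot overflow
            if (key < a[mid]) hi = mid - 1;
            else if (key > a[mid]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    // Counts triples that sum to zero. Assumes distinct values (an assumption
    // of this sketch) so that each triple is found at exactly one index k.
    static int count(int[] input) {
        int[] a = input.clone();            // leave the caller's array untouched
        Arrays.sort(a);                     // sorting step, N lg N
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                int k = indexOf(a, -(a[i] + a[j]));   // one lg N search per pair
                if (k > j) count++;         // require i < j < k to count each triple once
            }
        }
        return count;
    }
}
```

With a 15-entry sorted array chosen to be consistent with the demo (middle entry 53, then 25, then 43), `indexOf` returns 4 for the key 33 and -1 for 34. The `k > j` check is a design choice of this sketch: it avoids counting the same triple three times and avoids reusing an entry twice.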


In fact, the order-of-growth classifications are so important that they've led to an enormous amount of research in recent years, and I'll just talk briefly about that now. Life is a little bit more complicated than pointed out in the last example, and one problem is that the inputs can cause the performance of the algorithm to vary widely. So often we have to think about different ways of analyzing the algorithm depending on the input. The running time is going to be somewhere between the best case and the worst case. The best case is a lower bound on cost: it provides something that the running time is always going to be bigger than, or not less than. And then there's the worst case, which is the most difficult input. If we analyze that, then we can guarantee that the running time of the algorithm is not going to be bigger than that. And then in a lot of situations we might consider our input to be random. We need some way to model what we mean by random for the problem that we're solving, but there are a lot of situations where we can do that, and then we have a way to predict performance even when the input might vary widely.

So, for example, for 3-sum it's kind of always the same. With the tilde notation, the only variability in that algorithm is the number of times the counter is incremented, and that's in the low-order terms, so it doesn't need to show up in our analysis. For binary search, you might find the thing right away, in which case it's constant time, and we can show that the average and the worst case are both lg N. In other examples there can be much more variability even. So we have these different types of analysis depending on the input.

But the question is, what about the actual problem that the client is trying to solve? We have to understand that too in order to be able to understand the performance of the algorithm. There are two approaches that are successful in this. One is to design for the worst case, just to make sure that your algorithm always runs quickly, and that's definitely ideal. Another, if you can't do that, is to randomize and then depend on some kind of probabilistic guarantee. We'll see examples of both of these as we go through the course.

Now, those kinds of considerations, the idea of order of growth, lead to a discussion of what I call the theory of algorithms. Here our goals are: we have a problem to solve, like the 3-sum problem, and we want to know how difficult it is. We want to find the best algorithm for solving that problem. The approach that computer scientists use for this is to try to suppress as many details as possible in the analysis, and so just analyze the running time to within a constant factor. That's what order of growth is getting at. And also we want to not worry about the input model at all, so we focus on worst-case design, and we can talk about the performance of algorithms just in terms of the order of growth. It's actually possible to do that in a very rigorous way, and it's taught us a lot about the difficulty of solving problems. Our goal is to find an optimal algorithm, where we can guarantee to within a constant factor a certain performance for any input, because we covered the worst case, but where we also have a proof that no algorithm could provide a better performance guarantee. I'll give a couple of easy examples of this.

Now, in order to do this, there are these commonly used notations called the big theta, big O, and big omega notations, and those definitions are given here. Big theta notation is just the way to describe the order of growth. Theta(N^2) is kind of shorthand for anything that grows like N^2: it's bounded above and below by a constant times N^2, and that's what we really use to classify algorithms. Then there is big O notation, which is for upper bounds on performance. When we say O(N^2), we mean that it's less than some constant times N^2 as N grows. And big omega is used for lower bounds: it means greater than some constant times N^2 as N grows. So those are three notations we're able to use to classify algorithms, and I'll show them in the following.

Examples from our 1-sum, 2-sum, and 3-sum problems are easy to articulate. Our goals are to establish the difficulty of the problem and to develop an optimal algorithm. So, the 1-sum problem is: is there a 0 in the array? Well, an upper bound on the difficulty of the problem is given by some specific algorithm. For example, the brute-force algorithm that looks at every array entry is a specific algorithm, and that takes O(N) time: we have to look at every entry, so it's less than a constant times N for some constant. So the running time of the optimal algorithm has to be O(N); that is, that specific algorithm provides an upper bound on the running time of the optimal algorithm. But in this case it's also easy to develop a lower bound, which is a proof that no algorithm can do better. For 1-sum you have to examine all entries in the array. If you miss one, then that one might be zero. That means that the optimal algorithm has to have a running time of at least some constant times N, and we say the running time is Omega(N). Now, in this case the upper bound and the lower bound match to within a constant factor, so that's a proof that the brute-force algorithm for 1-sum is optimal. Its running time is Theta(N): it's both Omega(N) and O(N).

For that simple problem it was okay to get the optimal algorithm. For more complicated problems it's going to be more difficult to get upper bounds and lower bounds, and particularly upper bounds and lower bounds that match. For example, let's look at 3-sum. An upper bound for 3-sum: our first brute-force algorithm was a proof that the running time of the optimal algorithm is O(N^3). But we found a better, improved algorithm whose running time is O(N^2 lg N). So that's a better upper bound. Lower bound: well,

3-5-TheoryofAlgorithms


we have to examine all entries, because again, we might miss one that makes the 3-sum equal zero, and that's a proof that the running time of the optimal algorithm is Omega(N). But nobody knows a higher lower bound for 3-sum. So there's a gap between the upper bound and the lower bound, and that leaves open problems. Is there an optimal algorithm for 3-sum? We don't know what it is. We don't even know if there's an algorithm whose running time is less than O(N^2), and we don't know a lower bound higher than linear. So that's an example of an open problem in the theory of algorithms: we don't know how difficult it is to solve the 3-sum problem.

Now, this point of view has been extremely successful in recent decades. We have a new problem, we develop some algorithm, we prove some lower bound. If there's a gap, we look for a new algorithm that will lower the upper bound, or we try to find a way to raise the lower bound. Usually it's very difficult to prove non-trivial lower bounds. A trivial lower bound, like "look at every input item," is not so hard. Non-trivial lower bounds, like for example the proof that we talked about for the union-find problem, are much more difficult. In the last several decades people have learned about the computational difficulty of problems by examining steadily decreasing upper bounds, that is, algorithms with better worst-case running times, for lots and lots of important problems. There are plenty of optimal algorithms, and plenty of gaps still remain. It's a fascinating field of research that many people are engaged in.

Now, there are a couple of caveats on this in the context of this course. The first one is that maybe it's overly pessimistic to be focusing on the worst case. We've got data out there. We've got problems to solve. Maybe it's not worst-case data. In lots of fields of engineering and science we don't focus on the worst case. The worst case for this course would be for lightning to strike, and it would be over, so we don't plan for that. Something similar is true for algorithms. Maybe we should be focusing on understanding properties of the input and finding algorithms that are efficient for that input. The other thing is that, in order to really predict performance and compare algorithms, we need to do a closer analysis than to within a constant factor. So we talked about the tilde notation, in addition to the big theta, big O, and big omega notations that are used in the theory of algorithms. And really, there's so much published research in the theory of algorithms that a lot of people make the mistake of interpreting the big O results, which are supposed to give improved upper bounds on the difficulty of the problem, as approximate models for the running time, and that's really a mistake. So in this course we're going to focus on approximate models by making sure that we use the tilde notation, and we'll try to give specific results for certain quantities of interest. Any unspecified constant in the running time will have to do with properties of the machine and the system, so you will be able to use these results to predict performance and to compare algorithms.
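The slide with the definitions is not reproduced in this transcript. As a hedged reconstruction, the standard textbook definitions of the notations discussed above can be written as follows. The symbols f, g, c, and N_0 are introduced here for illustration and are not from the lecture. The last lines restate the 1-sum and 3-sum bounds from the lecture, where T*(N) is a hypothetical name for the worst-case running time of an optimal algorithm.

```latex
% Requires amsmath. Standard definitions; f, g, c, N_0 are illustrative symbols.
\begin{align*}
f(N) = O(g(N))      &\iff \exists\, c > 0,\ N_0 \ \text{such that}\ f(N) \le c\, g(N) \ \text{for all } N \ge N_0 \\
f(N) = \Omega(g(N)) &\iff \exists\, c > 0,\ N_0 \ \text{such that}\ f(N) \ge c\, g(N) \ \text{for all } N \ge N_0 \\
f(N) = \Theta(g(N)) &\iff f(N) = O(g(N)) \ \text{and}\ f(N) = \Omega(g(N)) \\
f(N) \sim g(N)      &\iff \lim_{N \to \infty} \frac{f(N)}{g(N)} = 1
\end{align*}
% Bounds stated in the lecture:
\begin{align*}
\text{1-sum:}\quad & T^{*}(N) = \Omega(N) \ \text{and}\ T^{*}(N) = O(N), \ \text{so}\ T^{*}(N) = \Theta(N) \\
\text{3-sum:}\quad & T^{*}(N) = \Omega(N) \ \text{and}\ T^{*}(N) = O(N^{2} \lg N), \ \text{with a gap in between}
\end{align*}
```

The tilde line is what separates the two uses: big O hides the leading constant, while the tilde notation keeps it, which is why the course uses tilde for approximate models of running time.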


So far, we've been talking about running time. Now we have to talk about the memory requirements of our programs as well. The basics are that we want to know how many bits the program uses, or bytes, eight bits at a time. Actually, we'll be talking in terms of millions of bits or billions of bits, and surprisingly there is a controversy about even these basic definitions. Computer scientists think of a million bits as 2^20 and a billion as 2^30, because that's the number of possible things that you can fit into 30 bits, and everything is consistent with our calculations. Other scientists stick to one million or one billion, for lots of reasons. We'll usually use 2^20 to mean a megabyte.

Now, on old computers, for many years we used a 32-bit machine, so that pointers were four bytes. Just in recent years we've mostly switched to a model where machines are 64-bit and pointers are eight bytes. That allows us to address much more memory, but pointers use much more space, and actually this transition caused a lot of problems initially, because programs were using way more space than people thought they should. You're not going to have to go through this kind of transition the way that we did, because 64 bits is definitely enough to address anything that you might need to address; 2^64 is really a huge number.

So in terms of bytes, we start out with typical memory usage. Again, this is very dependent on machine and implementation, but these numbers are reasonable and are found on typical implementations. A boolean: it would be nice if a boolean just took a bit, because that's just true or false, but actually we usually have to count a byte for a boolean. A byte is a byte. A character nowadays is two bytes, 16-bit characters; not that long ago we used eight bits for chars. A regular int is four bytes, or 32 bits, and a float is also four bytes. A long is eight and a double is eight. Usually we use double for floating point and int for integers in most applications.

So that's for primitive types. Then for arrays there's a certain amount of overhead for making an array, and then if there are N items, it's whatever the cost of the primitive type is times N. So an array of doubles is, say, 8N + 24. For a two-dimensional array, well, we can go ahead and compute the exact thing, but now it's time to use the tilde notation. For arrays we could say an array of doubles is ~8N for one-dimensional. A two-dimensional array of doubles is ~8MN. There are extra terms for the overhead, but for large M and N that's going to be pretty accurate. So that's our basic usage for primitive types and arrays in a typical Java implementation.

Now, a lot of our programs use objects, like linked lists and so forth. So we also have to factor in object overhead and the cost of a reference, and also there's padding built into typical implementations to make it so that each object uses a multiple of eight bytes. For example, if you have a Date object that has three int instance variables, then that object would take a total of 32 bytes. Each int takes four bytes, object overhead is sixteen bytes, and it needs four bytes for padding, so it's a total of 32 bytes. The other one that often comes up is a String. A String is a little bit more complicated than an array, but the typical implementation of a String in Java has a reference out to an array of characters, and then it's got int values for offset, count, and a hash value, and then some padding. Adding it all together, the cost of the String is about 2N + 64 bytes.

So these are the basics that we need to analyze the memory usage for a typical Java program. For a data type value, if it's a primitive type, it's four for an int, eight for a double, and so forth. If it's a reference, it's going to be eight bytes; that's for the pointer. An array takes 24 bytes plus the memory for each entry. An object takes sixteen bytes plus the memory for the instance variables, plus, if there's an inner class, another eight bytes, as we talked about with nodes for linked lists. And then there's the padding. Then we have to think about who is responsible for referenced objects in some cases, and we'll take care of that when we get to those situations.

As a simple example of memory use analysis, let's take a look at how much memory our weighted quick-union UF data structure from a few lectures ago uses as a function of N. There are only a couple of memory elements, and each one of them is easily analyzed using the basics that we just gave. It's an object, so there are sixteen bytes of object overhead. There are two int arrays. Each one of them has array overhead of 24 plus 4N for the N entries, since each of the N entries takes four bytes (and, on a typical implementation, eight bytes for the reference to the array). There are four bytes for the count and four bytes for the padding, and if you add it all together you get 8N + 88, which is ~8N. Again, all that's saying is that when N is large, all we are going to care about in terms of analyzing the memory is that we've got 2N integers, two arrays of size N, each entry of which takes four bytes, for a grand total of 8N bytes.

Okay. So, in summary, we really can figure out how many times we have to turn the crank on modern computers. We can do it with empirical analysis, where we actually execute the program, do experiments, assume a power law, formulate hypotheses, and make predictions. But we can do more: we can do mathematical analysis, where we identify the most costly operations, analyze the frequency of execution of those operations, and use the tilde notation to simplify the analysis. We can actually explain the behavior, not just predict it. This is a fine example of the use of the scientific method to understand the artifacts that we're studying, the algorithms. Our mathematical models are usually independent of a particular computer system and even apply to machines that are not

3-6-Memory


yet built. But we always validate our mathematical models by running experiments on real machines, so that we can be confident when we're making predictions and analyzing algorithms.
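The memory tallies worked through in this lecture (32 bytes for a Date, about 2N + 64 for a String, 8N + 88 for the weighted quick-union structure) can be reproduced with a little arithmetic. The sketch below is only an illustration under stated assumptions: the class and method names are hypothetical, the byte counts are the "typical 64-bit Java implementation" figures quoted in the lecture, array sizes are not padded (the lecture's formulas do not pad them), and each array field is assumed to cost an eight-byte reference in its owning object, which is what makes the quick-union total come out to 8N + 88.

```java
// Hypothetical helper for illustration; real JVMs may differ from these figures.
class MemorySketch {
    static final long OBJECT_OVERHEAD = 16; // per-object header
    static final long ARRAY_OVERHEAD = 24;  // per-array header
    static final long REFERENCE = 8;        // pointer on a 64-bit machine
    static final long INT = 4;
    static final long CHAR = 2;

    // Round an object's size up to a multiple of eight bytes (the padding).
    static long pad(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Date with three int fields: 16 + 3*4 = 28, padded to 32.
    static long date() {
        return pad(OBJECT_OVERHEAD + 3 * INT);
    }

    // String of length n: object (header, char[] reference, three ints, padding)
    // plus the char array itself, 2n + 24. Total 2n + 64.
    static long string(long n) {
        long object = pad(OBJECT_OVERHEAD + REFERENCE + 3 * INT);
        long chars = ARRAY_OVERHEAD + CHAR * n;
        return object + chars;
    }

    // Weighted quick-union with two int arrays of length n and an int count.
    // Assumes one 8-byte reference per array field (an assumption of this sketch).
    static long weightedQuickUnionUF(long n) {
        long object = pad(OBJECT_OVERHEAD + 2 * REFERENCE + INT);
        long intArray = ARRAY_OVERHEAD + INT * n;
        return object + 2 * intArray;   // 40 + 2 * (4n + 24) = 8n + 88
    }
}
```

For large N the constant 88 is negligible next to 8N, which is the point of writing the result as ~8N.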
