C.A.V.E.S.: Content Aware Video Expansion and Scaling
Fall 2008, Group 1
Aneeb Qureshi ([email protected]), Gregory Tress ([email protected]), David Xiang ([email protected])

18-551 Fall 2008, Group 1 Final Report



1. Problem

Since the proliferation of consumer television technology in the 1940s and 1950s, the American television industry has used the 4:3 aspect ratio as a standard for filming and broadcasting, established by the National Television System Committee. This standard has guided both television manufacturers and content providers. More recently, with the increased availability of high-definition content, the nationwide switch to digital broadcast television, and the introduction of low-cost digital television sets, consumers have demonstrated an interest in viewing content in the 16:9 "widescreen" aspect ratio. This aspect ratio was used by the Advanced Television Systems Committee as part of its standard for high-definition television and as the basis for new standardized television resolutions. Many televisions on the market today are designed around the 16:9 aspect ratio, and television studios are transitioning toward digital widescreen filming of content. For consumers, digital television generally results in a higher-quality picture, with increased resolution and decreased visible interference.

At the same time, this transition results in a backwards-compatibility issue. The majority of existing television content has already been filmed in 4:3, and in order to display 4:3 content on a 16:9 television, the consumer must choose how to adjust the aspect ratio on the television itself. Widescreen televisions usually provide a set of user-defined options for this purpose. One option is to maintain the original aspect ratio of the content and center it on the screen. Because the 16:9 screen is significantly wider than the 4:3 video frame of the same height, black bars will appear on the sides of the video; this phenomenon is known as "pillarboxing." As a result, the consumer loses the benefit of the wider screen; in fact, a widescreen television of the same viewable surface area as a traditional 4:3 television will yield a smaller representation of the same video. A second option is to horizontally "stretch" the 4:3 video, forcing it to fill the entire screen. This results in a significant and noticeable distortion. Yet another option is to "zoom" the 4:3 video, cutting off the top and bottom of the frame, so that the video can fill the entire screen without distorting the content. However, this zoom functionality is generally unintelligent and results in cutting off important parts of the video. Consumers will ultimately continue to face aspect-ratio difficulties as long as 4:3 content is broadcast. Currently, there is no simple way to view this 4:3 content on a 16:9 television while both utilizing the entire screen and avoiding noticeable distortion.

In this paper we detail a system that intelligently converts video content from one aspect ratio to another. In the case of the television aspect-ratio problem described above, this system would modify a 4:3 input video in such a way that the resulting output would fill the entire screen of a 16:9 television but would not appear severely distorted. While our motivation stems specifically from the 4:3 to 16:9 conversion problem, the system is


generalized in such a way that it can convert between any two aspect ratios. Our system does not operate on actual television content in real time, but still functions as a valid proof of concept, since our algorithm can be used as the basis for a consumer-end device which does perform this function. Such a device, in the form of a set-top box, would require adequate hardware and a modification to the parameters of our algorithm to operate in real time. In our case, we can still create a 4:3 to 16:9 conversion at lower resolution and lower frame rate to demonstrate the effectiveness of the system. With the prevalence of mobile devices and web-based video in a variety of physical resolutions, there are many possible applications of aspect ratio conversion in addition to television-specific content.

There are numerous methods explored for content-aware image resizing. For videos in particular, there has been increased research in video retargeting, which relies on solving a large system of linear equations in order to determine the desired output aspect ratio. As we will detail in a later section, video retargeting is not suitable for the C67 DSK due to the large number of computations and memory accesses. Instead, we use a modified version of seam carving which takes into account temporal dependency between frames. By using seam carving instead of video retargeting techniques, we trade some quality for speed; considering the limited processing power of the C67 DSK, this is a suitable sacrifice.

2. Novelty

This project relates closely to Fall 2007 Group 6's project, Content Aware Image Resizing, as a video is simply a series of images. However, the algorithm described by their project cannot be directly applied to video due to the large artifacts that occur. When the traditional seam carving method is applied to each frame in a video, the result is jerky: parts of the frame appear to jump from one area to another, usually shaking left and right sporadically. This occurs because the seams in one frame are unrelated to the seams in the next frame. Hence, when seams move by enough pixels between two given frames, the viewer observes this jerky effect. Throughout the paper, we will refer to a less extreme version of the jerky artifact as a wavy artifact, in which only certain segments of the frame experience mild temporal distortion in their expansion.

With that in mind, we have added new improvements to the seam carving algorithm which allow it to function properly for video sequences. These tweaks are described in detail in Section 3. In addition, our project has an enormous amount of data to transfer between the DSK and the PC. This introduces memory and speed problems which are not present in Group 6's project.

In regard to creating the prominence scores, we have fully adopted Group 6's face detection algorithm. We felt


that face detection should not be the prominent focus of our project and testing; hence, we decided not to explore other face detection algorithms.

3. Algorithms

Edge Detection

The first step in creating the prominence scores for a frame involves detecting high-energy areas based on edge detection. Edge detection can be implemented by correlating an image with kernels that detect changes in color between adjacent pixels. To simplify calculations, the RGB image is converted to grayscale, which reduces the number of calculations and memory accesses to a third of the original amount. Of the many operators which perform edge detection, we use the Sobel operator because it uses only 3-by-3 kernels. Smaller kernels are more favorable as they reduce the number of calculations: a pixel can be operated on at most 9 times by a 3-by-3 kernel, but 16 times by a 4-by-4 kernel, an increase of 7 operations per pixel. Considering that the kernel is correlated with a frame of 320 by 240 (76,800 pixels), saving 7 operations per pixel is significant.

Figure 3.1: Sobel operator kernels

The Sobel operator uses the kernels defined in Figure 3.1 and the L2-norm in order to implement edge detection [1]. The L2-norm is usually implemented as the square root of the sum of squares of each kernel output. However, a faster method is to approximate the L2-norm by the sum of the absolute values of each kernel output; this allows the process to be even faster on the DSK. As noted in Figure 3.1, each kernel represents either the gradient in the x-direction or in the y-direction. In actual implementation, these two kernels can be correlated at the same time (meaning that we only need to iterate through the image once).
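As a concrete illustration, the single-pass correlation with the absolute-value approximation of the L2-norm can be sketched as follows. This is a minimal Python sketch on a plain 2-D list, not our DSK C code; border handling is simplified by leaving edge pixels at zero.

```python
def sobel_energy(gray):
    """Approximate gradient magnitude |Gx| + |Gy| in a single pass.

    gray: 2-D list of grayscale values. Border pixels are left at 0.
    The |Gx| + |Gy| sum replaces the exact L2-norm sqrt(Gx^2 + Gy^2).
    """
    h, w = len(gray), len(gray[0])
    energy = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Zero kernel entries and multiplications by one are skipped,
            # mirroring the cycle-saving tricks described in the text.
            gx = (gray[r-1][c+1] - gray[r-1][c-1]
                  + 2 * (gray[r][c+1] - gray[r][c-1])
                  + gray[r+1][c+1] - gray[r+1][c-1])
            gy = (gray[r+1][c-1] - gray[r-1][c-1]
                  + 2 * (gray[r+1][c] - gray[r-1][c])
                  + gray[r+1][c+1] - gray[r-1][c+1])
            energy[r][c] = abs(gx) + abs(gy)
    return energy
```

Both kernels are applied in the same inner loop, so the frame is traversed only once.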


When implementing the Sobel operator on the DSK, we skip computations for the "zero" elements in the kernels to save cycles. In addition, we do not multiply values by one, as it is a useless computation. The output of edge detection from the DSK is shown in Figure 3.2.

Figure 3.2: Edge detection

Face Detection

Face detection is an important aspect of the prominence scores, ensuring that faces are not distorted. In many cases, faces can lack detail (depending on how close the face is to the camera). When this occurs, edge detection will fail to assign high energy to that face. Face detection therefore makes up for the shortcomings of edge detection alone.

As noted earlier, we have fully adopted the face detection algorithm of Group 6 from Fall 2007. The approach consists of three sequential stages: creating a binary image via YCbCr thresholding, opening and closing via erosion and dilation, and blob detection and rejection. The face detection process is shown in Figure 3.3.

Figure 3.3: Face detection block diagram


1: YCbCr Thresholding

YCbCr is a color space that is often used in digital video systems. Y represents the brightness component, Cb represents the blue chroma component, and Cr represents the red chroma component of the image. The face detection algorithm that we are using focuses on differentiating faces from the rest of the image based on skin tone; hence, we only use the Cb and Cr components in our thresholding step. It may be possible to incorporate the Y component to allow the thresholding to identify faces in more situations, but again, we did not want to make face detection the focus of our research, so we simply adopted Group 6's algorithm. The conversion from RGB to YCbCr is done through the Matlab command rgb2ycbcr(). We then use the YCbCr color space to create a binary image, which is high for pixels in the range 100 < Cb < 133 and 140 < Cr < 165.
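The thresholding step can be sketched as follows. This is an illustrative Python version, not the Matlab rgb2ycbcr() call we actually use; the conversion coefficients are the standard ITU-R BT.601 ones that rgb2ycbcr() is based on, and the thresholds are those quoted above.

```python
def skin_mask(rgb_frame):
    """Binary mask: 1 where 100 < Cb < 133 and 140 < Cr < 165.

    rgb_frame: 2-D list of (r, g, b) tuples with 8-bit channels.
    Cb/Cr follow the ITU-R BT.601 conversion; Y is not needed for
    the threshold, matching the text.
    """
    mask = []
    for row in rgb_frame:
        out = []
        for r, g, b in row:
            cb = 128 + (-37.797 * r - 74.203 * g + 112.0 * b) / 255.0
            cr = 128 + (112.0 * r - 93.786 * g - 18.214 * b) / 255.0
            out.append(1 if (100 < cb < 133 and 140 < cr < 165) else 0)
        mask.append(out)
    return mask
```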

2: Morphological Opening and Closing

Before explaining the process of opening and closing, we have to understand erosion and dilation. Both erosion and dilation are performed on binary images.

Erosion is the process of removing noise and other artifacts by moving a structuring element throughout the image while finding overlaps between the structuring element and high pixels. In each overlap, the central pixel stays high and all other pixels become low [5].

Dilation is the opposite process of erosion. Dilation is performed after erosion in order to fill in any holes that could have been created by setting high pixels to low pixels in the erosion process. In dilation, we assert high pixels over the entire area of the structuring element for each high pixel in the image; all other pixels become low [5].

When we apply image opening, we erode and dilate the image with a structuring element of size 9 by 9 pixels. This essentially means that in image opening, we remove artifacts that are smaller than 9 by 9 pixels. In image closing, we dilate and erode the image with a structuring element of size 7 by 7 pixels. Image closing is used to fill holes, like eyes and lips, which are usually a different color than regular skin [5]. Notice that in image opening we first erode, while in image closing we first dilate. Besides the different structuring elements, these are the key differences between image opening and closing. Both structuring elements were found by trial and error by Group 6 from Fall 2007.
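A minimal sketch of the opening and closing steps, using the textbook definitions of binary erosion and dilation with square structuring elements. Out-of-bounds pixels are treated as low here, and the implementation is deliberately naive; it only illustrates the order of operations described above.

```python
def erode(img, k):
    """Binary erosion with a k-by-k square structuring element:
    a pixel stays high only if every pixel it covers is high."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
    return out

def dilate(img, k):
    """Binary dilation: a pixel becomes high if any covered pixel is high."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
    return out

def open_close(img):
    """Opening (erode then dilate, 9x9 element) followed by
    closing (dilate then erode, 7x7 element), per the text."""
    opened = dilate(erode(img, 9), 9)
    return erode(dilate(opened, 7), 7)
```

Opening removes specks smaller than the 9-by-9 element; closing then fills small holes such as eyes and lips.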

3: Blob Detection and Rejection

At this point, the noise (if present originally) should be eliminated and we should be left with a series of blobs.


We now need to interpret which blobs represent faces and which are simply false detections. By using the method regionprops() in Matlab, we can easily determine the width, height, and area of each blob. We then reject all blobs which meet any of the following criteria: width < 20, width > 80, height < 25, height > 150. These blobs are rejected on the grounds that they are probably too small or too large to be a face. Lastly, we examine the width-height ratios of the remaining blobs. If a blob's width-height ratio is between 0.5 and 0.9, we continue processing; otherwise, that blob is rejected. If a blob passes the width-height ratio test, we then look at its density, defined as area / (width * height). If the density is less than 0.5, we reject the blob; otherwise, we have successfully detected a face.
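The rejection cascade amounts to a few comparisons per blob; in this sketch the blob measurements are assumed to come from regionprops() or an equivalent connected-component step.

```python
def accept_blob(width, height, area):
    """Rejection cascade from the text: absolute size limits first,
    then width-height ratio in [0.5, 0.9], then density >= 0.5."""
    if width < 20 or width > 80 or height < 25 or height > 150:
        return False
    if not (0.5 <= width / height <= 0.9):
        return False
    density = area / (width * height)   # filled fraction of bounding box
    return density >= 0.5
```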

Limitations

This face detection algorithm is very limited. Because it is based on color thresholding, it is highly dependent on the lighting conditions of the given frame. With that in mind, this algorithm will not work in every environment. We have noticed that it works best in indoor environments and often fails in outdoor scenarios. An alternative would be a feature-based face detection algorithm. Feature-based face detection is independent of the color of the subject; hence, it would work in any lighting conditions. We originally looked at feature-based algorithms and decided not to implement them because they require a very large training set in order to work correctly. To demonstrate successful face detection, Figure 3.4 and Figure 3.5 each show faces of different sizes detected from different videos. In both cases, the region determined to be a face spreads slightly outside the actual boundaries of the face due to the coloring of the subject's shirt.

Figure 3.4: Input frame and resulting face detection output (with primitive edge detection shown as a guide)


Figure 3.5: Input frame and resulting face detection output (with primitive edge detection shown as a guide)

Motion Detection

The motion algorithm used in the CAVES project is a block-based image thresholding algorithm implemented by Liu et al. [2] The goal of motion scoring in video expansion is to give importance to regions of high motion. Assuming that these regions are ones of high interest, the purpose of detecting movement is to give a better viewing experience for expanded videos. The scores of the motion detector are the third contributor to the prominence test matrix.

The motion algorithm is a proposed way to provide illumination-independent change detection. The discussion of the algorithm is broken into two sections. The first part discusses special values known as circular shift moments and shows how they provide change detection in a noise-free case. The second part applies a new decision rule on top of this to cope with the effects of noise.

1: Change Detection with Circular Shift Moments (CSM)

For the calculations in this algorithm, the 24 bpp images are averaged to give an 8 bpp grayscale image; the mapping from RGB to grayscale is simply an average of the red, green, and blue channels. From this, the image is partitioned into N x N pixel square blocks. For our implementation, we choose N to be 10 for our 320x240 images. This results in 768 possible areas of motion within one given frame. Choosing smaller values of N resulted in significantly slower computation times and less accurate decisions by the motion detector. For example, using an N value of 5 results in 3072 possible areas of motion within one frame. For such small values of N, the image sequences are too noisy and result in almost constant motion detection even when only noise is present. For values larger than 10, we found that the motion algorithm would not return specific enough results.


For example, choosing an N value of 20 results in only 192 areas of motion. Even though computation was faster, the square regions were too big to give an accurate estimation of motion regions. Choosing N to be 10, for our given frame size, produced the most promising results.

For every square area of interest, there is a predefined x-directional circular shift moment and a y-directional circular shift moment; please refer to [2] for the equations. With these, CSM-based change detection can be applied to detect motion as follows. According to the equations given by [2], calculate both the x- and y-directional circular shift moments for every square block in a reference frame, and decide upon a predetermined threshold. For the kth frame in the image sequence, calculate the x- and y-directional circular shift moments for every square block. Finally, declare that a change occurs in a square block if the absolute difference of either the x- or y-directional circular shift moments between the kth frame and the reference frame is greater than the threshold; otherwise, there is no motion in that square. Then re-initialize the reference frame and move on to the next one.
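The structure of the block-wise decision rule can be sketched as follows. Since the circular-shift-moment equations live in [2], a simple block mean stands in here as a placeholder statistic; only the partitioning and thresholding logic is illustrated, not the actual moments.

```python
def block_stat(block):
    """Placeholder for the x/y circular shift moments of [2]: a plain
    block mean, used only so the control flow can be demonstrated."""
    return sum(sum(row) for row in block) / (len(block) * len(block[0]))

def blocks(frame, n):
    """Yield ((row, col), N-by-N sub-block) pairs tiling the frame."""
    h, w = len(frame), len(frame[0])
    for y in range(0, h - n + 1, n):
        for x in range(0, w - n + 1, n):
            yield (y, x), [row[x:x + n] for row in frame[y:y + n]]

def detect_motion(ref, cur, n=10, threshold=5.0):
    """Flag each N-by-N block whose statistic differs from the
    reference frame's by more than the threshold."""
    ref_stats = {pos: block_stat(b) for pos, b in blocks(ref, n)}
    return {pos: abs(block_stat(b) - ref_stats[pos]) > threshold
            for pos, b in blocks(cur, n)}
```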

2: Change Detection with CSM to Cope with Noise

The method proposed in [2] to deal with noise in the video is quite simple. Consider a situation where nothing in the actual video content changes, but pixel values change as a result of noise. Under a noise-corrupted situation, the gray level at a certain position will be the gray level of that same position in the previous frame with the addition of noise. We assume that this noise is additive white Gaussian noise (AWGN) and can be reasonably characterized by its mean and variance. We assume the mean of the noise in our videos is zero with a certain variance; the variance is what will be calculated in order to cope with the noise. Furthermore, the noise is assumed to be independent between pixels.

The goal of these calculations is to intelligently change the threshold from Part 1 in order to cope with the noise better. Hypothetically, in the noise-free case, the circular shift moments of a certain square block in two consecutive image frames must be identical provided that there is no content change. As we know, the effects of AWGN will make these two CSM moments differ even though the scenes are the same.

The process is simple. For a given reference frame and a kth frame, determine the one square 10x10 region which exhibits the least change in its circular shift moments, where "change" is defined as the sum of the absolute differences of both the x- and y-directional CSM moments. In other words, pick the one square in the entire frame which changes the least. Accumulate the gray scale levels in this area in both the reference


frame and the current frame and form a ratio, which will be known as the variation factor. This ratio is then used to estimate the noise variance. The equations and detailed sequential steps are outlined thoroughly in [2]. A basic summary goes as follows. Find the N x N square in a frame which changes the least between two given frames. Assume that this change is due entirely to noise. Calculate the variance of the speculative AWGN and accommodate for it in the predetermined threshold. Any change contributes to the noise variance, which makes the threshold more stringent. Thus, the final steps are exactly the same as the ones outlined in Part 1, except that the threshold is now stricter (higher) in order to filter out noise.
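A heavily simplified sketch of this noise-adaptation step. The real variance estimate follows the equations of [2]; the accumulated-gray-level statistic, the `gain` knob, and the exact update rule below are our own illustrative assumptions, shown only to make the "calmest block stiffens the threshold" idea concrete.

```python
def adaptive_threshold(ref, cur, n, base_threshold, gain=1.0):
    """Illustrative noise-coping step (not the exact formulas of [2]).

    Finds the N-by-N block whose accumulated gray level changes least,
    treats that residual change as pure noise, forms the 'variation
    factor' as the ratio of the two accumulations, and raises the base
    threshold in proportion to the factor's deviation from 1.
    """
    def acc(frame, y, x):
        # Accumulated gray level over the N-by-N block at (y, x).
        return sum(frame[y + dy][x + dx] for dy in range(n) for dx in range(n))

    h, w = len(ref), len(ref[0])
    positions = [(y, x) for y in range(0, h - n + 1, n)
                        for x in range(0, w - n + 1, n)]
    best = min(positions, key=lambda p: abs(acc(cur, *p) - acc(ref, *p)))
    variation = acc(cur, *best) / max(acc(ref, *best), 1)
    # Any residual change in the calmest block makes the threshold stricter.
    return base_threshold * (1.0 + gain * abs(variation - 1.0))
```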

Results

The motion algorithm is implemented entirely in Matlab. The results of the algorithm are good, with room for improvement. Motion is detected in square blocks in areas where it should be; this was verified by viewing the videos, observing what actually moves, and comparing it to the blocks which are detected. All character movement, body movement, and mobile objects are detected well. Results were less accurate during times of intense camera movement, when many things, such as the background, change significantly even though this is not "important" motion. Nonetheless, in the presence of motion, more weight is added in that particular area and is accounted for during the seam carving process. A detailed discussion of how the motion scores contributed to the seam carving results appears in Section 8. An example of motion detection on sequential frames is shown in Figure 3.6. In the figure, the two sequential input frames have just a few subtle visible differences, but the motion detection algorithm easily detects the changes in the position of the subject's lightsaber, arm, and head.


Figure 3.6: Two sequential input frames and the resulting motion detection output (with primitive edge detection shown as a guide)

Seam Carving

Seam carving uses the prominence scores to generate an energy map which shows the "total cost" of a seam at the bottom boundary (for a top-down energy approach). We cannot simply use the prominence scores to determine which seams to add, because each prominence score is independent of the others; hence, we cannot make an educated decision as to where to start seams just from examining one row. This is the motivation behind creating the energy map.

Energy Map

The energy map is calculated with a top-down approach. This means that the total cost of adding a seam is found in the last (bottom) row of the frame. The energy map is created directly from the prominence scores. The very top row of the energy map is equivalent to the top row of the prominence map. Starting from the second row to


the last row, each pixel's value is the sum of its prominence score and the maximum prominence score of the three adjacent pixels above it. This algorithm creates values in the last row which carry information from every other row. The nature of this algorithm is clearly best implemented by dynamic programming. Figure 3.7 shows an example energy map.
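The dynamic-programming accumulation can be sketched as follows. The `combine` parameter is our own addition for illustration: the text accumulates the maximum of the three parent cells, while classical seam carving [3] accumulates the minimum, so both variants are reachable from one sketch.

```python
def energy_map(prominence, combine=max):
    """Top-down cumulative energy map: each cell is its own prominence
    score plus the combined score of the three adjacent cells above it.

    combine=max follows the description in the text; combine=min gives
    the classical seam-carving accumulation [3].
    """
    h, w = len(prominence), len(prominence[0])
    energy = [list(prominence[0])]          # top row copied verbatim
    for r in range(1, h):
        row = []
        for c in range(w):
            # Up to three parents: above-left, above, above-right.
            above = energy[r - 1][max(0, c - 1):min(w, c + 2)]
            row.append(prominence[r][c] + combine(above))
        energy.append(row)
    return energy
```

After one pass, the bottom row carries path information from every row above it, which is exactly what the seam-selection step needs.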

Figure 3.7: A prominence matrix with motion, gradient, and face visible (left) and its corresponding energy map (right)

Seams

As mentioned earlier, we use the energy map to determine where to place seams. We simply choose the lowest 100 energy values in the bottom row of the energy map and make these the starting positions for each seam. This is different from seam removal: when seams are being removed, they are removed one at a time, and after each seam is removed, the energy map is recalculated. This is an extremely slow process. By the same token, video expansion is much more suitable for implementation on a DSK. The reason why we choose the 100 lowest energy values at once, rather than calculating one seam at a time, is to avoid artifacting. Consider calculating seams one at a time. The first seam would be calculated and then duplicated, causing the image width to increase by 1 pixel. We then try to find the next seam. Because the addition of the first seam simply copies the pixels to the left of it, the energy matrix barely changes. Hence, when trying to find the second seam, the algorithm will approximately select the first seam again. This process would continue until all 100 seams are chosen.


In the end, the viewer would notice very obvious artifacting from duplicating the same pixels 100 times. The artifacting problem is shown in Figure 3.8.

Figure 3.8: Seam carving artifacting for expansion [3]

Thus, by choosing all 100 seams at the same time, without altering the energy map, we are able to avoid this artifacting problem (as shown in Figure 3.9). After choosing the 100 starting positions, we simply backtrack through the energy map to determine each entire seam. This results in seams that are both connected and monotonic [3], meaning that a seam can only have one pixel per row and each pixel of the seam must be adjacent to the seam's pixels in the rows above and below it. This makes sense, as we want to uniformly change the width of all rows.
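The selection-and-backtracking step might look like the following sketch. It assumes a max-accumulated energy map as described above, so the contributing parent of each cell is the largest of the (up to) three cells above it; a min-accumulated map would backtrack through the smallest instead.

```python
def carve_seams(energy, k):
    """Pick the k lowest bottom-row energies as seam starts, all at
    once so the energy map is never recomputed, then backtrack each
    seam upward through the parent cell that produced its value.

    Returned seams are monotonic and connected: one column index per
    row, each within one pixel of its vertical neighbors.
    """
    h, w = len(energy), len(energy[0])
    starts = sorted(range(w), key=lambda c: energy[h - 1][c])[:k]
    seams = []
    for start in starts:
        seam, c = [start], start
        for r in range(h - 2, -1, -1):       # walk from bottom to top
            cands = range(max(0, c - 1), min(w, c + 2))
            c = max(cands, key=lambda cc: energy[r][cc])
            seam.append(c)
        seams.append(seam[::-1])             # reorder top-to-bottom
    return seams
```

Each seam would then be duplicated across every row to widen the frame by one pixel per seam.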

Figure 3.9: Proper seam carving for expansion [3]

The method described so far creates seams solely dependent on the energy map. In order to eliminate the jerky artifact and lessen the wavy artifact, we add new restraints on frames. We first compute the first frame of a video as described above. But then for the frames thereafter, we add a new restraint based on the previous


frame. For example, for the second frame, we start off with the energy map for the second frame and the seams used in the first frame. We then add a restraint for calculating the seams in the second frame: the seams in the second frame must be within 3 pixels to the right or 3 pixels to the left of the seams from the first frame. The value of 3 pixels was found from experimentation and gave the best qualitative results. After restraining the seams in the second frame with this 6-pixel window, we then allow the seams to change depending on the energy of the second frame. By restricting the seams of the current frame by the previous frame, we create a temporal dependency which completely eliminates the jerky artifact.
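The windowing restraint can be illustrated by a simple per-row clamp. Note this is only an illustration of the 3-pixels-either-side constraint: in a faithful implementation the window would constrain the backtracking step itself, so that the restrained seams remain connected.

```python
def constrain_seam(prev_seam, new_seam, window=3):
    """Clamp each pixel of a freshly computed seam to within `window`
    pixels (3 in the text, i.e. a 6-pixel total span) of the
    corresponding pixel in the previous frame's seam."""
    return [min(max(c, p - window), p + window)
            for c, p in zip(new_seam, prev_seam)]
```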

Problems

By placing a limit on how much seams can change between two frames, we encounter problems when important regions of the video move quickly between frames. For instance, if there is a person on the right side of frame 1 who moves progressively to the left side up until frame 10, we often see artifacting. This occurs because the seams are not able to move as fast as the high-energy region is moving. When this happens, the high-energy content becomes distorted because it runs into the seams. This can possibly be avoided by not fixing the range of the seam's movability window; in future work, we could calculate the range in a content-aware manner based on the total energy of the seam. Future work and current problems are discussed in depth in Section 8.

One method we use to ameliorate this problem is keyframing. A keyframe is a frame that has no additional restraints beyond the energy map for seam calculations. If we keyframe at content-aware points throughout the video, the seams get a chance to fully "reset": they can move to their proper locations, disregarding temporal dependency. This fixes the issue of a person running into seams, as we can just keyframe that instance. However, determining when to keyframe is a task in itself. We naively keyframe when there is a large energy change between frames: when the energy ratio between the current frame and the previous frame is either 200% or 50%, we make the current frame a keyframe. The values 200% and 50% were found by examining the energy ratios of a high-motion 50-frame video. By keyframing only at these large changes, we avoid keyframing successive frames, which would break our temporal dependency.

To further build on using energy ratios, we change the seam's movability window when the energy ratio is large (or small) enough. If the energy ratio is from 126% to 200% or from 50% to 74%, we change the window from 6 pixels to 50 pixels. The idea is that we would like to keyframe at such a change in energy ratio, but this change occurs too many times throughout the video to allow keyframing each time. Hence, we instead allow more movement for the seams so that they are able to "jump" more. This enhances the wavy artifact, but it allows seams to keep up with moving high-energy areas of importance. Again, these percentage ratios were found from the same high-motion 50-frame video.
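Taken together, the keyframing and window-widening rules amount to a small decision function on the frame-to-frame energy ratio. The thresholds below are the ones quoted above; the return convention (a label plus a per-side window) is our own.

```python
def frame_policy(prev_energy, cur_energy):
    """Decide, from the total-energy ratio between consecutive frames,
    whether to keyframe and how wide the seam window should be.

    Returns ("keyframe", None) or ("track", per_side_window), where
    3 per side = the normal 6-pixel window and 25 = the widened
    50-pixel window from the text.
    """
    ratio = cur_energy / prev_energy
    if ratio >= 2.0 or ratio <= 0.5:
        return "keyframe", None      # seams reset; no window applies
    if 1.26 <= ratio < 2.0 or 0.5 < ratio <= 0.74:
        return "track", 25           # widened 50-pixel total window
    return "track", 3                # normal 6-pixel total window
```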


4. Brief System Overview

The following section provides a brief system overview of the CAVES project. For more details on the processing and data transfer between the PC and the DSK, please refer to Section 7: Processing, Speeds and Data Rates.

From start to finish, we process an input video on the local machine and output its 420x240 24 bpp representation. The whole process can be broken down sequentially into steps with different PC and DSK responsibilities. Figure 4.1 shows a basic representation of these steps. The details of the data flow are described further in this section.

Figure 4.1: Simplified data flow with primary algorithm steps shown

In our system layout, different programs are used in different parts of the data flow (Figure 4.2). Matlab is used for processing on the PC, but a separate network server, written in C, also runs on the PC and handles all communication with the DSK. Both Matlab and the network server read from and write to a common set of image and video files on the PC, but the timing is asynchronous. In the data flow figures in this section, the network server component is excluded for clarity, but it is still used for data transfer.


Figure 4.2: Core components used in data transfer and processing

1) PC: The computer takes the input video and does preliminary processing entirely in Matlab. This includes breaking the video down into separate frames, rescaling, recoloring, and preparation for transfer to the DSK. In addition, Matlab preprocesses every frame of the video by running it through the face detection and motion detection algorithms. The results are accumulated into a "partial PT" (prominence test) matrix which is stored on the PC and sent to the DSK later. (Figure 4.3)

Figure 4.3: Matlab preprocessing

3) DSK: The DSK takes the frame along with the partial PT matrix and computes the gradient. After adding the gradient matrix to the partial PT matrix, the DSK holds the final prominence matrix for that particular frame. The DSK computes the energy matrix based on this prominence matrix and feeds the energy matrix into the seam calculations. All the seams are then routed for this frame. Remember that CAVES calculates seams based on the previous frame's seams along with the current frame's energy. The DSK expands the frame with these calculated seams and sends the expanded frame back to the PC. For debugging and viewing purposes, we also send the expanded frame with red seams, a full PT matrix, and the resulting energy map back to the PC. (Figure 4.4)


Figure 4.4: DSK processing

To deliver the necessary data to the DSK, a separate PC network server program is used, as described earlier. This program is very simple and performs only network, file-handling, and format-conversion operations. It is not shown in the simplified data flow in Figure 4.4, but is present in the system. The Matlab preprocessing is performed relatively fast (about 5 frames per second), and each frame is saved to a file on the PC's local drive. The DSK processing is started simultaneously and is performed asynchronously from the Matlab processing. The PC network server reads each file recently created by Matlab, converts it to RGB, and sends it to the DSK. When the DSK finishes processing each frame, the PC network server receives the RGB output, converts it to BMP, and saves it on the local PC drive. The expanded frame, the expanded frame with red seams, the final PT matrix, and the energy matrix are each saved as independent BMP files for each frame in a designated folder structure on the PC. Figure 4.5 shows the generalized layout of data delivery to and from the DSK.

Figure 4.5: Detail of PC-DSK data flow for each data segment

4) PC: The computer receives everything from Step 3 and reassembles it into a video while saving it to the local machine. All of the information is viewable in the GUI. (Figure 4.6)


Figure 4.6: Matlab post-processing and GUI

This is a quick summary of the basic system layout of our project. Again, please note that an in-depth analysis of this data flow is included in Section 7: Processing, Speeds and Data Rates.

5. Graphical User Interface

Figure 5.1: GUI overview


For the CAVES project, a graphical user interface was created to allow users to easily use the video resizing algorithm and view the results (Figure 5.1). By displaying a variety of intermediate steps and allowing various setting changes, this GUI was extremely helpful in our debugging process and allows us to analyze various steps in the algorithm. It is also very convenient for demonstrating the behavior of the algorithm in a practical way. The user interface is divided into two main sections: retargeting controls and frame viewing.

Retargeting Controls and Settings:

The retargeting controls and settings are located in the top left of the GUI. (Figure 5.2)

Figure 5.2: Retargeting controls and settings

Control Buttons:

The first step in using the user interface is to use the Open Video button to open a video with Matlab. These videos are local to the machine and are read into variables in the Matlab workspace. The second step is to use Extract to determine which sections of the video are to be processed and to perform all the preprocessing required so that the sequence can be sent over to the DSK. The exact processing details are discussed in Section 7: Processing, Speeds, and Data Rates. Finally, the Retarget button establishes the connection between the PC and the DSK hardware unit in order to process the video.

Active Algorithms:

The PT matrix that we calculate per frame is dependent on the gradient of the frame along with motion and face detection. As a default, all three of these attributes are enabled in order to contribute to our prominence scoring, as they are all important. We have added a feature to allow the user to specifically choose among these contributors for the PT matrix calculations. This enables us to analyze how well individual parts are working.

Frame Range:

The user is allowed to input the specific range of frames in the video that he or she would like to retarget. This is useful for debugging, as we sometimes tend to analyze particular sequences in videos. This is also very helpful for the viewing experience, as users may only want to process certain sections of the video.

I/O Height, Width, and Aspect Ratio:

These are not writable by the user and are used merely to display the height, width, and aspect ratio of the input and output video. We have hard-coded the input video to be 320x240 pixels and the output video to be 420x240 pixels. This is shown on the GUI to make the user aware of the changes we are making to the video sequence. If the input video does not have an aspect ratio of 4:3, its converted size will be smaller than 320x240. The correct converted input dimensions and corresponding output dimensions with 100 seams are displayed, preserving the original aspect ratio of the video during resizing. However, we did encounter some problems with retargeting behavior when using input videos that were not originally 4:3. The problems were caused by various discrepancies between the hard-coded 320x240 dimensions and the actual dimensions of the resized input. Some debugging would be necessary to fix these problems, but we were not particularly concerned with these cases because most of the videos we used for testing and demonstration were 4:3. In addition, we did not have time to make the input or output dimension settings user-customizable. Future work in this area should allow the user to specify arbitrary input and output aspect ratios.

Frame View:

Input Video:

We display the video sequence of our input video. It is important to note that all video sequences are rescaled and recolored to 320x240 24bpp.

Output Video:

This is the final product that CAVES outputs. Our final video is 420x240 24bpp and is the retargeted version of the input video to its left.

Prominence:

For the complete sequence of input frames, we display the prominence matrix of each frame. Remember that the prominence matrix consists of all the checked attributes in the Active Algorithms box. This includes the gradient, face, and motion by default but may be changed accordingly. The resulting prominence can then be viewed.

Energy:

For the complete sequence of input frames, we display the energy map, which is calculated with respect to the prominence. This is useful to see how energy changes in different sections of a frame and between different frames. Changing properties in the Active Algorithms box also enables us to analyze how our various algorithms contribute to the energy matrix as well. The energy matrix is thus used to route seams.

Seams:

For the complete sequence of input frames, we display the retargeted video at its new aspect ratio while including the seams drawn in red. This allows the user to see exactly how each of the seams was routed with respect to the energy matrix.

Scroll Bar / Animate:

In the middle of the GUI, there is a scroll bar and an animate button. The scroll bar scrolls between the frames of the video sequence, while the animate button runs through the frames and plays them as a movie. This is essential to see exactly how our algorithm is performing on a video sequence.

Note: In Matlab, behind the scenes, we are not playing a movie. The GUI is stepping through all the frames sequentially. The CAVES program outputs the rescaled input video along with the final output video sequence in a video-formatted file on the local machine. The prominence, energy, and seams are output as a sequence of images and are not reconstructed into a movie file. These are all output neatly into a directory tree.

7. Processing, Speeds, and Data Rates

Video and Image Formats

We chose to use an input video size of 320x240 pixels and an output video size of 420x240 pixels for the system. The input size was chosen because it is exactly a 4:3 aspect ratio and because it is a standardized display resolution, known as Quarter VGA. This is a common size for mobile video displays and is a popular resolution for videos found on the web, including those on the YouTube website. The output size was chosen because it is very close to 16:9 and allows exactly 100 columns to be added to the video frame. A true 16:9 frame would require approximately 426.67 columns, but this is not an integer and there is no standardized widescreen frame size defined at a height of 240 pixels. For the most part, the code of the system is generalized to allow the frame size to be changed relatively easily if necessary. Using the system with larger standardized resolutions (e.g., 640x480) would require significantly longer processing and network transfer times. We decided that our default size of 320x240 was large enough to visibly demonstrate the results of the seam carving algorithm in a practical way while minimizing overall processing and transfer times.

The PC decoding of the input video is performed entirely in Matlab. Any video format and codec supported by Matlab can be processed. When the video is decoded, its frames are resized to 320x240, which is a 4:3 aspect ratio. If the input video is not 4:3, its existing aspect ratio is preserved and blank data is added around the content so that it fits completely inside a 320x240 frame without aspect ratio distortion. Matlab then saves these frames individually as BMP image files with 24bpp color locally on the PC. We decided to use BMP files because they do not require decompression and we are not concerned with storage space on the PC. A BMP file with 24bpp contains 8 bits each for red, green, and blue color information in each pixel. It also contains header data of variable length and data padding between rows of the image for byte alignment purposes. In order to minimize memory access on the DSK, to eliminate unnecessary format parsing on the DSK, and to reduce network transfer time, we transform the BMP files to a simpler RGB format on the PC before each frame is sent. This RGB format has no header and no internal padding. The DSK performs all processing on 24bpp RGB data (in the case of the input frame) or 8bpp grayscale data (in the case of the PT). When the DSK processing is complete, the RGB output is transferred to the PC. The PC saves this output for each frame as a 420x240 24bpp BMP file by creating the appropriate header information and adding data padding between rows of the image as necessary.
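The BMP-to-RGB step described above can be sketched as follows. This is a minimal illustration, not the report's actual code: the function name and layout are our own, and the BMP header is assumed to have been skipped separately. BMP rows are padded to 4-byte boundaries, so the padded stride is rounded up from 3 bytes per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy each padded 24bpp BMP pixel row into a packed, headerless RGB
 * buffer (bottom-up row order preserved, as the DSK expects).
 * Hypothetical helper; the report does not list the PC-side code. */
static void bmp_rows_to_packed_rgb(const unsigned char *bmp_pixels,
                                   unsigned char *rgb, int width, int height)
{
    size_t row_bytes = (size_t)width * 3;           /* useful bytes per row */
    size_t stride = (row_bytes + 3) & ~(size_t)3;   /* BMP pads rows to 4 bytes */
    for (int y = 0; y < height; y++)
        memcpy(rgb + (size_t)y * row_bytes,
               bmp_pixels + (size_t)y * stride, row_bytes);
}
```

Incidentally, at this project's sizes the row lengths (320 x 3 = 960 bytes and 420 x 3 = 1,260 bytes) are already multiples of 4, so the BMP padding happens to be zero; the helper handles the general case anyway.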

We use Matlab's built-in video processing tools to create AVI files from the bitmap images. The GUI allows the user to select a specific range of frames to retarget from the input video. Matlab creates a duplicate video of only the selected range of frames from the input so that it can be easily compared to the output. When the DSK has finished processing all frames, Matlab converts the output bitmaps into an output video. We chose to use uncompressed video output from Matlab to eliminate distortion caused by compression. We are not concerned with storage space on the PC, so this was not a problem. If the output videos need to be saved permanently or moved, or if storage space is an issue, the compression used by Matlab can easily be changed by adjusting a single parameter in Matlab's movie2avi function call.


Color Depth

Our initial implementation of seam carving used an 8bpp input frame. This color depth was chosen to minimize DSK processing time due to fewer memory accesses. To change the color depth of an input video, we added a simple pixel-by-pixel conversion to our PC code which saved 3 bits of red data, 3 bits of green data, and 2 bits of blue data, in accordance with the standard 8-bit truecolor scheme. In our initial testing, we found that videos originally encoded with higher color depth were converted correctly but experienced a noticeable reduction in quality. This reduction is inevitable because of the smaller number of colors that can be represented with 8bpp compared to other common color depths such as 16bpp or 24bpp. We eventually decided to use 24bpp rather than 8bpp in the DSK processing to eliminate any significant reduction in quality. When retargeting videos that are already significantly compressed, the use of 24bpp is not visibly different from 8bpp. In the future, it would be possible to add support for both formats and allow the user to select high-quality or low-quality conversion as one of the settings. This would require additional code support on the DSK, primarily in the gradient calculation and the applySeams() function, to handle both formats. The use of 24bpp not only requires additional memory accesses on the DSK but also requires additional data to be sent over the network. Ultimately, we decided that the higher quality of the video outweighed the cost of longer processing time and network transfer time.
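The 3-3-2 conversion described above can be written as a single packing helper. This is an illustrative sketch (the helper name is ours): keep the top 3 bits of red, the top 3 of green, and the top 2 of blue, and pack them into one byte.

```c
#include <assert.h>

/* Pack a 24bpp pixel into the standard 8-bit truecolor (3-3-2) byte. */
static unsigned char rgb_to_332(unsigned char r, unsigned char g,
                                unsigned char b)
{
    return (unsigned char)((r & 0xE0)           /* RRR..... */
                         | ((g & 0xE0) >> 3)    /* ...GGG.. */
                         | (b >> 6));           /* ......BB */
}
```

The quality reduction discussed above follows directly from this packing: 8bpp can represent only 256 colors, versus roughly 16.7 million at 24bpp.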

DSK Memory and Paging

Our retargeting system operates sequentially on one frame at a time. The only information that needs to be saved on the DSK from one frame to the next is the location of the seams. The functions that change the seam locations (getSeams() and getInitialSeams()) operate in-place and simply overwrite the seams of the past frame with the seams of the current frame. Note that seams are formed and updated from left to right and may not overlap. The only temporal constraint of a seam is based on its own location in the past frame, not the location of any other seams. The seam to its left in the current frame provides a spatial constraint. Thus, no extra seam data has to be saved.

Besides the seam data, all other data is overwritten by each subsequent set of frame data. We defined a simple struct called pixel containing three char fields for red, green, and blue. This was used to make the coding more intuitive for 24bpp frames. The following table details the storage requirements for the various sets of data on the DSK.


Data          Type             Dimensions   Size in bytes
Input frame   pixel (3 bytes)  320x240      230,400
PT            char (1 byte)    320x240      76,800
Gradient      char (1 byte)    320x240      76,800
Energy        float (4 bytes)  320x240      307,200
Output frame  pixel (3 bytes)  420x240      302,400
Seams         short (2 bytes)  100x240      48,000
(Total)                                     1,041,600
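The table's layout can be expressed directly in C. The variable names below are illustrative; only the types and dimensions come from the report, and the sizes in the comments match the table.

```c
#include <assert.h>

/* The pixel struct described above: three char fields for R, G, B. */
typedef struct { unsigned char r, g, b; } pixel;

enum { IN_W = 320, IN_H = 240, OUT_W = 420, NSEAMS = 100 };

static pixel         input_frame[IN_H][IN_W];    /* 230,400 bytes */
static unsigned char pt[IN_H][IN_W];             /*  76,800 bytes */
static unsigned char gradient_map[IN_H][IN_W];   /*  76,800 bytes */
static float         energy[IN_H][IN_W];         /* 307,200 bytes */
static pixel         output_frame[IN_H][OUT_W];  /* 302,400 bytes */
static short         seams[NSEAMS][IN_H];        /*  48,000 bytes */
```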

Approximately 1 MB of data is stored on the DSK. All of these data fields are stored in external memory. Processing is performed directly on the data in external memory and no paging is used. The L2 cache is set to 32KB. In an early implementation of our algorithm, we fully implemented paging for all processing. At the time, we were using 8bpp color and were only performing seam carving functions on the DSK, not prominence or energy functions. DMA was used for all page transfers. The maximum size of the memory workspace that could fit on-chip was approximately 120 KB with the L2 cache turned off. In this implementation, we did not notice a perceptible improvement in overall system speed compared to the same implementation without paging. Later, we upgraded the frame color depth to 24bpp and simultaneously added gradient and energy calculations to the DSK. These changes required adjustments to the paging mechanism. Due to the added code complexity of paging, the need to handle extra boundary cases in all of the processing functions, and the minimal difference in observed processing time, we chose to eliminate paging in our final implementation. After more detailed testing, we found that network transfer time and various inefficiencies in the PC code (which were later fixed) were contributing to a slow overall system speed. It is reasonable to assume that paging would help memory access time in our final algorithm implementation, but we could not re-implement paging due to time constraints.

One simple oversight of our memory management was that very little on-chip data is used except for automatic variables in functions and a few small permanent arrays. Given that the remaining on-chip memory available is close to 100 KB, one of the data fields could easily be moved from external memory to internal memory simply by changing one line of code. The seam data is a good candidate for this because of its small size (48 KB) and its consistent use in multiple functions. We did not realize this until after performing all of our timing measurements, so the timing results in this report are based on fully external memory allocation. Anyone who uses our code in the future can easily make this change to on-chip memory if paging is not desired.


Network

For each frame, the data sent from the PC to the DSK is comprised of the scaled input frame and the partial PT matrix, which consists of weighted face detection and motion detection scores. Both sets of data are 320x240 pixels. The input frame is 24bpp and the PT is 8bpp grayscale. The resulting size of the input frame is 320 x 240 x 3 = 230,400 bytes and the size of the PT is 320 x 240 x 1 = 76,800 bytes. The total amount of data to transfer from the PC to the DSK per frame is 307,200 bytes. Profiling the DSK code for network receiving caused the data transfer to fail, so we could not determine this time exactly. Based on the 2.5 MB/s inbound speed limit of the DSK, we estimate that this transfer takes 123ms. The following table details the time requirements.

Data         Size in bytes            Cycles                Time
Input frame  320 x 240 x 3 = 230,400  20,700,000 estimated  92ms estimated
Partial PT   320 x 240 x 1 = 76,800   6,975,000 estimated   31ms estimated
(Total)      307,200                  27,675,000 estimated  123ms estimated
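The time estimates above can be re-derived from the 2.5 MB/s limit, interpreted here as 2,500,000 bytes per second (the reading that reproduces the report's 123 ms figure; this interpretation is our assumption).

```c
#include <assert.h>

/* Estimated inbound PC-to-DSK transfer time in milliseconds for a
 * given number of bytes, at the 2.5 MB/s limit quoted above. */
static double inbound_ms(double bytes)
{
    return bytes / 2.5e6 * 1000.0;
}
```

For example, inbound_ms(230400) gives roughly 92 ms and inbound_ms(307200) roughly 123 ms, matching the table.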

The data sent from the DSK to the PC is comprised of four output images. The primary output is the expanded video frame itself, which is 420x240 and 24bpp. The other outputs are the final PT matrix, the energy matrix, and the expanded frame with visible seams drawn in red. The PT and the energy are 320x240 because they are based on the input image. The expanded frame with visible seams is 420x240. All output data is sent as 24bpp. This makes the process of converting from RGB to BMP consistent for all output images. In the future, it would be possible to reduce network transfer time simply by sending the final PT matrix and the energy matrix as 8bpp, since the data is grayscale. This would also require additional grayscale-to-BMP conversion code to be written on the PC with appropriate adjustments to the BMP header data to create an 8bpp BMP file. Alternatively, the PC could convert the grayscale data to 24bpp RGB and subsequently convert it to 24bpp BMP. We made the initial decision to make all output data 24bpp for consistency, not knowing how this added data transfer would impact our overall system speed. Ultimately, network transfer time was observed to be more significant than we expected, but due to time constraints we did not have the opportunity to create the additional code necessary to handle 8bpp output conversion. When sending data from the DSK to the PC, we were able to use DSK code profiling to perform precise measurements. These measurements essentially confirmed the 10MB/s outbound speed limit of the DSK. The table below shows the DSK-to-PC data sizes and time requirements.


Data                  Size in bytes            Cycles               Time
Output frame          420 x 240 x 3 = 302,400  6,350,700 measured   28.2ms measured
Output frame + seams  420 x 240 x 3 = 302,400  6,350,700 measured   28.2ms measured
Final PT              320 x 240 x 3 = 230,400  4,833,900 measured   21.5ms measured
Energy                320 x 240 x 3 = 230,400  4,833,900 measured   21.5ms measured
(Total)               1,065,600                22,369,200 measured  99.4ms measured

The total data size is just over 1MB, and the total time is about 100ms. This matches the predicted 10MB/s. If the final PT and energy were sent as 8bpp instead, they would each require 76,800 bytes of data transfer, or about 7.7ms each. This would reduce the total outbound transfer time to an estimated 72ms per frame, or about 72% of the present transfer time. Note that in a practical application of our system, such as a consumer-end video retargeting device, the only necessary output would be the actual output frame, which has only a 28ms transfer time. If the behavior of the algorithm was determined to be adequate, it would be relatively easy to add an option to the system that would allow it to send only the output frame to the PC. This would increase the overall speed of the system by eliminating information about how the algorithm is working. We did not add this option because we always wanted to have the additional information about the behavior of the algorithm available to us.
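The 72 ms estimate above can be checked arithmetically: keep the two measured 28.2 ms frame transfers and replace the two 24bpp grayscale transfers with 76,800-byte transfers at the 10 MB/s limit (taken here as 10,000,000 bytes per second, our interpretation).

```c
#include <assert.h>

/* Estimated outbound time per frame if PT and energy were sent as
 * 8bpp, combining the report's measured and estimated figures. */
static double outbound_ms_8bpp_est(void)
{
    double gray_ms = 76800.0 / 10.0e6 * 1000.0;  /* ~7.7 ms each */
    return 2.0 * 28.2 + 2.0 * gray_ms;           /* 2 frames + PT + energy */
}
```

The result, about 71.8 ms, is roughly 72% of the measured 99.4 ms total, as stated above.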

Matlab Processing

The full PT is made up of face detection, motion detection, and the image gradient. We implemented the face detection and motion detection algorithms in Matlab as part of the pre-processing of each frame. Since these two detection algorithms contribute a part of the PT score for each pixel, we call the output a "partial PT." When the retargeting process is started, the array of input frames selected by the user is already saved locally in Matlab's memory. The Matlab code is then responsible for performing the face detection and motion detection for each frame and saving the resulting data in a file on the PC hard drive. In addition to the existing BMP file, each frame now has a corresponding partial PT file. The data is an 8bpp grayscale representation of the partial PT. These files will later be read by the network server program and sent to the DSK as needed.

In our measurements, we found that Matlab is able to perform preprocessing at a rate of about 5 frames per second. This speed includes the time required for face detection and motion detection and for saving the file to the hard drive. We did not perform any significant optimizations by hand to the Matlab algorithms since their performance was already adequate on the PC.


After the DSK has finished processing all frames, Matlab reads all BMP files of expanded frames and assembles them into a Matlab-native video structure. The final expanded video is saved to the local drive with the built-in movie2avi() function. Because Matlab can read and assemble the expanded frames relatively quickly, we designed the system to wait until all expanded frames were available before starting to read them. This could easily be changed such that, after it completes preprocessing all frames, Matlab starts to asynchronously read the expanded frames while the DSK is still processing. In this scenario, after the final frame is expanded by the DSK, the time required to finish assembling the video in Matlab would be significantly reduced. We believed this issue was relatively unimportant because, for most videos, the total video assembly time is small. For the cases in which this time is significant, it is still relatively small compared to the overall system processing time.

DSK Processing

The processing of each frame on the DSK is performed in distinct stages. The first stage is to complete the PT matrix. The face detection and motion detection results, which are calculated in Matlab and sent to the DSK, are already saved in memory. The only remaining component of the PT is the gradient, which is calculated on the DSK and added to the existing PT. Once the full PT is formed, the energy is calculated top-down on the frame. Seam carving is based solely on the energy matrix, not the input frame itself. Each seam is drawn in the frame from the bottom up. This is logical because bitmap images are customarily stored on disk and in memory row-by-row starting with the bottom row of the image. This row ordering is preserved in the RGB representation of each frame on the DSK. The energy calculation is performed in the opposite direction of the seam carving algorithm so that the seam carving algorithm can "see ahead" of its current position. This ultimately allows us to use a greedy algorithm when carving seams because all the necessary information for directing the seam at each pixel is contained nearby in the energy matrix. The seams are then applied to the input frame to create the expanded frame.
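The two stages above can be sketched as follows: cumulative energy computed top-down, then one seam routed greedily from the bottom row upward. This is a simplified illustration; the real getSeams() also enforces the spatial (left-neighbor) and temporal (previous-frame) constraints described elsewhere in this report, which are omitted here.

```c
#include <assert.h>

enum { W = 320, H = 240 };

/* Accumulate energy top-down: each pixel adds the cheapest of its
 * three upper neighbors, so lower rows "see" everything above them. */
static void cumulate_energy(float e[H][W])
{
    for (int y = 1; y < H; y++)
        for (int x = 0; x < W; x++) {
            float best = e[y - 1][x];
            if (x > 0 && e[y - 1][x - 1] < best) best = e[y - 1][x - 1];
            if (x < W - 1 && e[y - 1][x + 1] < best) best = e[y - 1][x + 1];
            e[y][x] += best;
        }
}

/* Route one seam bottom-up, greedily following the minimum cumulative
 * energy at each step. */
static void route_seam(float e[H][W], short seam[H])
{
    int x = 0;
    for (int i = 1; i < W; i++)            /* cheapest bottom-row pixel */
        if (e[H - 1][i] < e[H - 1][x]) x = i;
    seam[H - 1] = (short)x;
    for (int y = H - 2; y >= 0; y--) {
        int best = x;
        if (x > 0 && e[y][x - 1] < e[y][best]) best = x - 1;
        if (x < W - 1 && e[y][x + 1] < e[y][best]) best = x + 1;
        seam[y] = (short)(x = best);
    }
}
```

Because the cumulative pass already folded in everything above each pixel, the greedy upward walk never needs to look more than one row ahead, which is exactly why the energy is computed in the opposite direction of the carving.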

In general, each major function in the DSK algorithms consists of a constant number of memory accesses for each pixel. Given the fact that the functions tend to take the same amount of time for different input frames and videos, it is reasonable to assume that a significant portion of each stage is limited by memory accesses, although the L2 cache increases the speed. The table below details the cycle counts that we measured for each stage of our algorithm. These cycle counts did vary by several thousand from frame to frame but were always very close.


Function    Cycles (approximate)  Time
gradient    1,850,000             8.2ms
getEnergy   3,540,000             15.7ms
getSeams    4,060,000             18.1ms
applySeams  2,820,000             12.5ms
(Total)     12,280,000            54.6ms

In some cases, the function getInitialSeams() is called in place of getSeams(). The only difference between these functions is the presence of temporal dependency in getSeams(). When getInitialSeams() is called, the algorithm is executed the same way but the previous seam locations are not used. The cycle count we observed for getInitialSeams() was 3,950,000, or about 110,000 cycles less than getSeams(). This decrease is partially due to the smaller number of memory accesses because the previous seam data does not have to be read. Since the actual time difference is less than 0.5ms, we consider the difference to be negligible and simply count the values listed in the above table as average-case times.

Performance Optimization

To improve DSK performance, we changed some of the compiler settings in addition to manually modifying our code to make it more optimal. Once our testing demonstrated that the algorithm behavior was correct, we turned off the memory alias safety compiler option. We expected that this would significantly help performance because of the heavy use of heap pointers in function calls and calculations. In reality, this adjustment only significantly increased the speed of the gradient calculation. Figure 7.1 shows the cycle comparison between the stages of the algorithm with alias safety on (the default setting) and off (which we used in our final implementation).


Figure 7.1: Cycle comparison with and without alias safety

Computation time is spread relatively evenly between the primary algorithm functions, so there is no obvious bottleneck in the algorithm. The slowest functions in cycle time are getSeams and getInitialSeams, but we expect this because of the complexity of the seam routing process and the need for many memory accesses (sometimes repeating) in the seam data and energy data. From our initial estimates, we can see that the compiler is able to optimize our functions significantly. For example, the number of external memory accesses (all occurring as 1-byte reads) in the gradient() function alone is over 1.2 million. If each byte was actually read from external memory at each corresponding lookup in the code, this would require nearly 7 million cycles for memory access alone, given an access time of 5.6 cycles. Alternatively, if each byte was read from internal memory with an access time of 1.5 cycles, memory accesses would require 1.8 million cycles. Since the entire optimized gradient() function requires just over 1.8 million cycles, we can assume that a combination of the L1 and L2 caches minimizes the external memory access time for this function. In getEnergy(), we counted about 2.4 million bytes of external memory accesses, suggesting 3.6 million cycles required if internal memory is used (1.5 cycles per byte), and the cycle time of the function is about 3.5 million. It is obvious from these results that both the L1 and L2 caches are used heavily in our processing. Similar cache optimizations occur for the other functions.

After verifying our algorithm behavior, we revised much of our source code by unrolling conditional statements and boundary cases from within important loops when possible. For the getEnergy() function, this change alone reduced the cycle count by over 50% from our initial implementation. We also eliminated unnecessary memory accesses in the getSeams() and getInitialSeams() functions by establishing a validity check for possible seam routes before comparing their energy measurements. Based on the compiler's suggestions, we added the code speculation option (-mh2) and removed the debug option (-g) to maximize speed in the final tests. Optimization level 3 was used for the project and no settings were changed on a per-file basis. We did not need to worry about negative effects from aggressive optimization because no interrupt handling is performed in our code.

As a final gauge of our DSK performance, we examined the assembly code generated by the compiler. From this we observed that nearly all of the primary processing loops in the algorithm were scheduled with 3 or 4 iterations in parallel. For those that were not scheduled in parallel, we were able to change the code further by hand, making minor adjustments to create additional parallelism. The only exception to this was the getEnergy() method, which contains many intermediate variables and calculations, and for which the compiler could not find a schedule with any iterations in parallel regardless of our adjustments. Still, we were satisfied with this result.

System Speed

Video and image conversion is not counted in our timing of the system since these times are relatively small and since we simply used the built-in Matlab functions as needed. For our purposes, we assumed that the set of input BMP files is already prepared and that the system only needs to output a BMP file for each frame without considering video reassembly.

The retargeting system runs at about 2.7 frames per second. The corresponding time per frame is about 370ms, which is broken down in detail in Figure 7.2. This includes the time for reading the input file from the local drive and saving the output file to the local drive. These file management times, in addition to the time required for BMP-to-RGB and RGB-to-BMP conversion, are counted in the PC overhead time. As described above, Matlab preprocessing of each frame occurs asynchronously and is not counted here. Network transfer times were more significant than we originally expected and contributed 220ms, or about 60% of the time per frame.

Figure 7.2: Average time for retargeting a single frame


DSK processing time is divided into two categories in Figure 7.2. Primary processing refers to the algorithm functions described in the DSK Processing section earlier. These are the functions necessary to generate a single expanded output frame. In addition, the DSK performs some secondary operations to deliver the PT, energy, and visible-seam outputs to the PC. One design choice we made was to send all output data to the PC as 24bpp. This allowed us to use the same RGB-to-BMP conversion function for all outputs on the PC, which simplified our coding but resulted in slower network transfer, as described earlier. In addition, it requires extra time on the DSK to convert the PT matrix (of type char) and the energy matrix (of type float) to RGB. It would be possible to speed up the system by eliminating these DSK-side conversions, but we did not have time to make these changes. The other part of the secondary DSK processing is a second instance of the applySeams() function which draws the seams in red instead of expanding existing pixels in the frame. This is memory intensive and performs mostly identical memory copies to the original call to applySeams(). As a result, calling both functions in series as we do is inefficient, but we wanted to keep the ability to call one or the other independently if necessary. The more efficient alternative would be to draw the seams over the existing output frame after it is sent to the PC, so that the frame does not have to be expanded again. Alternatively, the DSK could simply send the seam data and the PC could draw them on top of the expanded frame. Either of these approaches could be implemented in the future, but our approach is the most robust at the cost of speed.

Once again, all of the secondary DSK processing, a significant part of the PC overhead, and a significant part of the DSK-to-PC transfer could be eliminated in a practical application of the system because the only important result is the expanded frame itself. To estimate the speed increase in this scenario, we calculate that the new DSK-to-PC transfer time would be 28ms, and we know that the secondary DSK processing would be eliminated completely. The PC overhead would be approximately cut in half to 35ms, conservatively. The resulting total time per frame would then be about 240ms, a 35% reduction from the current measured time. This would allow a system throughput of about 4 frames per second. We did not test this scenario because we were more concerned with the behavior of the algorithm than with ideal-case speed upgrades.

8. Problems, Issues, and Future Work

ISSUE:

The output is "wavy" due to moving seams within unimportant regions.

DISCUSSION:

A variety of different videos displayed wavy behavior within seam regions after being processed by CAVES.


Within these unimportant regions, the seams tend to move around quite noticeably and have an adverse effect on the overall output video display experience. The main reason behind this is energy changes due to noise from the video itself. As we rescale input videos to our predetermined height and width, the video quality is not ideal. The existing compression in the videos we used for testing was also significant enough to cause amplified noise in our PT calculations. Subsequently, there is enough noise to cause the energy regions in seam areas to change such that the seams themselves are forced to move based on the nature of the algorithm. The seams change in order to follow the least-energy paths, but these paths appear and disappear with the video noise. As a result, our seams tend to exhibit a wavy effect at times in unimportant regions.

A possible improvement could be to account for this video noise within the energy calculation algorithm itself or to filter out the video noise before processing the frame. This way, the energy matrix calculations could be more accurate and only display legitimate energy changes, not random fluctuations in the input video sequence.
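One simple denoising pre-filter of the kind suggested above would be a 3x3 box blur applied to the grayscale frame before the gradient and energy stages. This is purely illustrative; CAVES does not implement it, and a real fix might prefer a temporal or median filter.

```c
#include <assert.h>

/* 3x3 box blur on an 8bpp grayscale frame. Border pixels average only
 * the neighbors that exist, so the frame size is unchanged. */
static void box_blur_3x3(const unsigned char *src, unsigned char *dst,
                         int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int sum = 0, n = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
                        sum += src[yy * w + xx];
                        n++;
                    }
                }
            dst[y * w + x] = (unsigned char)(sum / n);
        }
}
```

A filter like this attenuates the single-pixel fluctuations that make low-energy paths appear and disappear between frames, at the cost of slightly softening legitimate edges.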

ISSUE:

Seams cannot move fast enough to accommodate fast changes in video sequences such as character movement.

DISCUSSION:

When testing videos with CAVES, we were forced to set bounds on seam movement in order to provide temporal smoothness to the video sequences. Without this temporal dependency in seams, video sequences displayed choppy results as seams relocated themselves around the image too drastically. Such choppy behavior was unacceptable in terms of the final viewing experience, thus making temporal dependency an essential technique. Nonetheless, this customization came at a cost. During periods of extreme movement in the video sequences, characters sometimes run into the seams and become distorted. Due to our restraints, there is no way for the seam to move out of the way fast enough. As expected, the algorithm does detect the energy change and the seams shift accordingly in time. However, in this time the important regions of our video become distorted. There are a couple of potential fixes that we will discuss in order to better accommodate this.

The first potential fix is to make the seam restraint range change with respect to motion. Currently, motion only adds to the prominence matrix, which gets fed into our energy function. The added motion does have its own contribution as it introduces more energy into specific sections of movement, but it does not determine the range of the seams. A new system could take motion into account specifically and change the range of specific seam movement dynamically. Ideally, this would result in seams with a large range of motion in areas where the algorithm detects large quantities of movement.

A second potential fix for this scenario is to force the redrawing of seams in these specific areas of interest. Let us consider an example where a character or object moves into an area densely populated by seams and thus becomes distorted. A proposed fix is to redraw the seams which cause this distortion. If a character moves into a group of seams, that group of seams will be recalculated and drawn in a different area of the frame away from the movement. Since this is only a small section of seams being recalculated at a time, this will not hinder the viewing experience. This is another potential improvement upon our current methods of drawing seams.

ISSUE:

During scenes with a high number of important regions, certain areas inevitably become distorted.

DISCUSSION:

This is a problem with our implementation, which has a tough time handling videos with many important regions. The root of the problem lies in the fact that we have chosen to hard-code the number of seams and expand each seam by exactly one pixel. Aside from this approach, there are a variety of other methods which we thought about but did not implement in the CAVES project.

The potential solution lies in the observation that it is not necessary to expand each seam by only one pixel. Since we decided to expand our image by one hundred pixels, we were not required to route one hundred distinct seams. For example, fifty seams could have been chosen and each added twice instead of once. This still expands the image by one hundred pixels while drawing fewer seams, which could result in less distortion. Furthermore, smarter techniques could have been developed to expand low-energy seams more often than higher-energy seams. This way, the overall number of seams could be reduced while the total expansion stays the same. This would be another potential fix for the inevitable distortion in "busy" frames.
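One way to sketch that allocation is below. This is a simple scheme of our own devising, not the CAVES code: split the expansion evenly across a smaller set of seams and hand the remainder to the lowest-energy seam, so the cheapest seam absorbs the most duplication.

```c
#include <assert.h>

/* Hypothetical sketch: expand `total_expand` pixels using only n_seams
 * seams by duplicating each seam several times. Every seam gets an
 * equal share; any leftover copies go to the lowest-energy seam, so
 * cheap (low-distortion) seams absorb more of the expansion.
 * Returns the total number of pixels allocated. */
int allocate_copies(const int *energies, int n_seams,
                    int total_expand, int *copies)
{
    int base   = total_expand / n_seams;  /* even share per seam */
    int extra  = total_expand % n_seams;  /* leftover pixels */
    int lowest = 0;

    for (int i = 0; i < n_seams; i++) {
        copies[i] = base;
        if (energies[i] < energies[lowest])
            lowest = i;
    }
    copies[lowest] += extra;  /* cheapest seam takes the remainder */
    return base * n_seams + extra;
}
```

A fuller version could weight the copy counts inversely to seam energy rather than only routing the remainder, concentrating even more of the expansion in flat regions of the frame.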

9. Final Work Schedule

We chose to create the GUI and the PC-DSK network infrastructure early in the project, in parallel with the research and design of our core algorithms, so that we could easily observe the behavior of the DSK code as we transitioned from the Matlab implementation of the algorithm to the C implementation. This allowed us to verify the behavior of the PT and energy calculations and was particularly useful in the early stages of debugging our seam carving algorithm. Since the components of our system are very modular, it was straightforward to test each part of the data flow either in Matlab or on the DSK itself. We did not encounter any significant obstacles in the Matlab or C implementations of the functions, so we were able to closely follow our original timeframe and to use the last few weeks to tweak the behavior of the algorithm and work on optimization rather than debug behavioral issues in the system. The table below shows the weekly breakdown of tasks we accomplished.

Week          Task                                                   Person
October 12    Matlab gradients and face detection                    Dave
              PC video/image processing                              Greg
              Retargeting algorithm research                         Aneeb
October 19    GUI creation                                           Greg
              Matlab motion detection                                Dave
              Retargeting algorithm research                         Aneeb
October 26    Matlab retargeting implementation                      Aneeb
              Network infrastructure                                 Greg
              Face detection and motion detection improvements       Dave
November 2    Seam carving implementation on DSK                     Everyone
November 9    Energy calculation on DSK                              Everyone
              Network improvements                                   Greg
November 16   Seam carving behavioral adjustment and optimization    Greg, Dave
              Energy calculation optimization                        Aneeb
November 23   DSK algorithm tweaks; complete system testing          Everyone
November 30   Network and PC code fixes                              Greg
              Convert full videos for demonstration                  Aneeb
              Final code optimizations and complete system testing   Dave

