
Delivering Deep Learning to Mobile Devices via Offloading

Xukan Ran*, Haoliang Chen*, Zhenming Liu¹, Jiasi Chen*
*University of California, Riverside
¹College of William and Mary

Deep learning on mobile devices

• Augmented reality (AR) is the next “killer app”
• Fast object recognition is key for general AR applications
• Deep learning is a popular technique for object recognition

Pokemon Go; Snapchat filters (face detection); Google Translate (text processing)

Problem

• Current approaches for deep learning on mobile devices
  1. Local-only processing: slow! (~600 ms/frame)
     • Apple Photos, Google Translate
     • GPU speedup [1]
  2. Remote-only processing: doesn’t work when network is bad
     • Apple Siri, Amazon Alexa

• Goal: Develop a framework to intelligently offload to nearby edge devices for real-time video analysis using deep learning.

• Cannot use general offloading techniques. Need to specifically account for:
  • Characteristics of the video
  • Characteristics of the deep learning models
  • Application requirements

[1] L. Huynh, Y. Lee, R. Balan, “DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications”, ACM MobiSys, 2016.

Design space

Degrees of freedom
• Video characteristics
  • Frame rate
  • Resolution
  • Bit rate
• Deep learning characteristics
  • Model size
  • Model latency/energy
  • Model accuracy

Constraints
• App requirements
  • Latency
  • Accuracy

Metrics
• Accuracy
• Frame rate
• Energy

Complex interactions between these degrees of freedom and metrics
• e.g., high bit rate when offloading → high accuracy, high energy
• e.g., small deep learning model → high frame rate, low accuracy

How to decide?

Decision framework (optimize decision)

Degrees of freedom:
• Offloading decision
• Neural net model size
• Video resolution

Constraints:
• Current network conditions
• Application requirements

Metrics:
• Detection accuracy
• Frame rate
• Energy consumption

The relation between the degrees of freedom and the metrics cannot be understood analytically → need measurements!
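One way to picture the decision framework is as a search over measured configurations. The sketch below is illustrative only: it assumes the offline measurements are available as a lookup table. The `Config` fields, the objective (maximize frame rate subject to an accuracy floor and an energy budget), and every number in `TABLE` are assumptions made for this example, not the authors' implementation or measured values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    """One point in the design space with hypothetical measured metrics."""
    offload: bool        # True = run on server, False = run on phone
    model: str           # e.g. "tiny-yolo" or "big-yolo"
    resolution: int      # frame side length in pixels
    accuracy: float      # detection accuracy in [0, 1]
    frame_rate: float    # frames per second
    energy: float        # energy per frame (arbitrary units)


def choose_config(
    candidates: Iterable[Config],
    min_accuracy: float,
    max_energy: float,
    network_ok: bool,
) -> Optional[Config]:
    """Return the feasible configuration with the highest frame rate.

    A configuration is feasible if it meets the accuracy floor and the
    energy budget, and does not offload when the network is unusable.
    Returns None when nothing is feasible.
    """
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy
        and c.energy <= max_energy
        and (network_ok or not c.offload)
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.frame_rate)


# Placeholder numbers for illustration only, not measurements from the talk.
TABLE = [
    Config(False, "tiny-yolo", 160, 0.40, 1.5, 1.0),
    Config(False, "tiny-yolo", 320, 0.55, 0.8, 2.0),
    Config(True, "big-yolo", 320, 0.80, 2.5, 3.0),
    Config(True, "big-yolo", 640, 0.95, 1.2, 5.0),
]

best = choose_config(TABLE, min_accuracy=0.5, max_energy=4.0, network_ok=True)
```

A lookup table is used here because the slides state that the relation between degrees of freedom and metrics has to be measured; a different objective (for example, maximizing accuracy under a frame-rate floor) would only change the `key` and the filter.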

System design

[Figure: system design diagram. Labels: offline performance characterization (measurements feeding a battery function, latency function, and accuracy function); front-end device (camera feed, small CNN, online decision engine, output display showing "car 0.9"); server (big CNN).]

Experimental setup

• Deep learning model: YOLO built on TensorFlow [2]
  • tiny-yolo: 9 convolutional layers
  • big-yolo: 22 convolutional layers
• Local processing: OnePlus 3T Android phone with quad-core CPU, 6 GB RAM
• Remote processing: Server with quad-core CPU, 8 GB RAM, NVIDIA GeForce GTX 970 graphics card with 4 GB of RAM

[2] Joseph Redmon, Ali Farhadi, “YOLO9000: Better, Faster, Stronger”, CVPR, 2017.

Developed app to implement offloading:

[Figure: app screenshots for local-only processing and remote-only processing, with labels "video frame" and "bounding box".]

How do latency and energy change with resolution?

• Encode a video frame at different resolutions
• Measure the processing time and energy usage in Android on the smartphone

Energy and latency increase with pixels²

How does accuracy change with bit rate and resolution?

• Encode 20 videos at different bit rates and resolutions
• Measure the accuracy (IoU) relative to big-yolo + raw video

Accuracy increases with larger model, higher resolution, higher bit rate

[Figure: accuracy plots for big-yolo and tiny-yolo.]
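For reference, IoU (intersection over union) between a predicted box and a reference box can be computed as in the minimal sketch below. The `(x_min, y_min, x_max, y_max)` box format is an assumption, and the slides do not say how per-object IoU values were aggregated across frames and videos, so only the per-pair computation is shown.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as no overlap.
    return inter / union if union > 0 else 0.0
```

Here the reference box would come from big-yolo on the raw video, and the predicted box from the model and encoding under test.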

How fast is deep learning, end-to-end?

• Measure # processed frames per second, under controlled network conditions
• Caveat: stop-and-wait for each processed frame

• Increased bandwidth → higher frame rate
• When bandwidth > 5 Mbps, should offload

[Figure: frames per second vs. added network latency (ms); series: offload to server, run locally on phone.]

• Increased latency → lower frame rate
• When latency < 100 ms, should offload
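The two trends above follow from the stop-and-wait caveat: each frame must be sent, processed, and returned before the next one starts, so frame rate is the inverse of the per-frame round trip. The sketch below is a simplified model under stated assumptions: transfer time is frame size divided by bandwidth, network latency is a single round-trip figure, and the result payload is negligible. The function names and all inputs except the ~600 ms/frame local figure quoted in the slides are hypothetical. The 5 Mbps and 100 ms thresholds in the slides come from measurement, not from this model.

```python
LOCAL_MS_PER_FRAME = 600.0  # approximate on-phone figure quoted in the slides


def stop_and_wait_fps(
    frame_kbits: float,
    bandwidth_mbps: float,
    network_latency_ms: float,
    processing_ms: float,
) -> float:
    """Frames per second when each frame waits for the previous result."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    # kbit / (Mbit/s) = 1e3 bit / (1e6 bit/s) = 1e-3 s, i.e. milliseconds.
    transfer_ms = frame_kbits / bandwidth_mbps
    total_ms = transfer_ms + network_latency_ms + processing_ms
    return 1000.0 / total_ms


def should_offload(
    frame_kbits: float,
    bandwidth_mbps: float,
    network_latency_ms: float,
    processing_ms: float,
) -> bool:
    """Offload only if the modeled remote frame rate beats local processing."""
    remote_fps = stop_and_wait_fps(
        frame_kbits, bandwidth_mbps, network_latency_ms, processing_ms
    )
    local_fps = 1000.0 / LOCAL_MS_PER_FRAME
    return remote_fps > local_fps
```

In this model, raising bandwidth shrinks the transfer term and raising latency grows the total, which matches the direction of both measured trends.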

How much time is spent for communication?

• Record timestamps as a frame travels from phone to server and back

When offloading, the majority of time is spent on the network
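A breakdown like this can be derived from four timestamps per frame. The sketch below is one possible way to do it, not the authors' instrumentation; the timestamp names are hypothetical. It subtracts only timestamps taken on the same machine, so the phone and server clocks do not need to be synchronized.

```python
from typing import Dict


def latency_breakdown(ts: Dict[str, float]) -> Dict[str, float]:
    """Split one frame's round trip into network and server compute time.

    Expects timestamps in milliseconds under these (hypothetical) keys:
      phone_send, phone_recv  : read from the phone's clock
      server_recv, server_send: read from the server's clock
    """
    total = ts["phone_recv"] - ts["phone_send"]       # phone clock only
    compute = ts["server_send"] - ts["server_recv"]   # server clock only
    if total <= 0:
        raise ValueError("phone_recv must be later than phone_send")
    network = total - compute
    return {
        "total_ms": total,
        "compute_ms": compute,
        "network_ms": network,
        "network_fraction": network / total,
    }


# Made-up timestamps; the server clock is deliberately offset from the phone's.
example = latency_breakdown({
    "phone_send": 0.0,
    "server_recv": 1000040.0,
    "server_send": 1000070.0,
    "phone_recv": 100.0,
})
```

The network term here lumps together uplink, downlink, and any queueing; separating those would require synchronized clocks or additional timestamps.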

How much battery is used from offloading deep learning?

• Measure the battery drop after 30 seconds of continuous usage

Higher bandwidth → more battery
Prefer to run locally to save battery

How well does offloading do in the wild?

• Perform 5 trials in public locations over LTE and WiFi
  • Coffee shop 1: Different city from server
  • Coffee shop 2: Same city, same subnet as server
  • Apartment 1: Different city than server
  • Apartment 2: Same city as server

Performance over LTE sometimes > WiFi
Higher frame rates over LTE at the expense of data cost

Key Take-Aways

Real-time video analysis using deep learning is slow (~600 ms/frame on smartphones)

Offloading can be beneficial (up to 2x frame rate), but optimal decision is unclear

In the wild, LTE sometimes > public WiFi
