Apps with Hardware - USENIX

Apps with Hardware: Enabling Run-time Architectural Customization in Smart Phones

Michael Coughlin, Ali Ismail, Eric Keller
University of Colorado Boulder

Mobile Devices

Devices are designed around certain restrictions

This leads vendors to make tradeoffs

What if users and developers could choose?

Vision: Smart Phone with an FPGA

[Diagram: an App spanning HW and SW; Android; FPGA and ARM]

Software-defined Radio

High-performance Computing

Cryptography

http://www.nallatech.com/40gbit-aes-encryption-using-opencl-and-fpgas/

Analytics

http://www.datanami.com/2015/03/10/fpga-system-smokes-spark-on-streaming-analytics/

Architectural Enhancements

Somniloquy (NSDI09) (SEC04)

Why is now the right time?

SoCs with Programmable Logic coupled with ARM Cortex A9 (same as iPhone 4 and many other smartphones)

High-level Synthesis: write C/C++/SystemC/OpenCL code

Fundamental Problem:

Sharing the FPGA between applications

What we can already do

App loads: software runs on processor, FPGA configured with hardware

[Diagram: Processor and FPGA; App X (Hardware, Software)]

This is currently possible: run-time reconfiguration

Sort of

What we can’t do

What if we have two apps?

[Diagram: Processor and FPGA; App X (Hardware, Software) and App Y (Hardware, Software)]

What if it’s a single chip (and some I/O goes through the FPGA)?

[Diagram: Processor and FPGA, each with I/O; App X (Hardware, Software) and App Y (Hardware, Software)]

Why hasn’t this been solved before?

• Over a decade of research has proposed two main solutions:
  – Run-time place-and-route
  – Slot-based reconfiguration

Approach 1: Run-time Place/Route

• There is free space in the FPGA
• Place a new module there

• Routing can fail
• Routing is also very time consuming
• Therefore, it is not practical

Approach 2: Slot-Based Reconfiguration

• Identical empty regions are reserved in the FPGA
• Constrain tools to:
  – Not use wires/logic inside of slots
  – Use the exact same wires for the interface

[Diagram: Slot 1, Slot 2, Slot 3]

• Hardware is loaded into slots
• Problem: if other logic exists, wire routing becomes very constrained
• Therefore, it is also not practical

Previous Research

• Run-time Place and Route
  – Is very computationally expensive
  – Can possibly fail
• Slot-based Reconfiguration
  – Constrained routing is very restrictive and not generally applicable
• Therefore, previous research is not practical

Introducing CloudRTR

• Allows for sharing of the FPGA between general apps
• Uses existing vendor technologies
• Adopts the idea of slots from previous research
• CloudRTR makes existing vendor technology work for general apps

The App Deployment Model

CloudRTR

[Diagram: Manufacturers, Developer, and Consumer around CloudRTR; Android on FPGA and ARM; three Static Designs, each with slots 1, 2, 3]

Manufacturer

• Creates a static design
  – All logic that does not change
• Design includes areas reserved for slots
• Sends this to the cloud compiler

[Diagram: Static Design with slots 1, 2, 3, plus GPU and AXI]

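Developer

• Create an app using existing tools
• Create a hardware definition in C

    #include <ap_int.h>  // ap_uint<N> arbitrary-precision integers

    bool example(
        ap_uint<32> *in,
        ap_uint<32> *out,
        bool *enabled
    );  // function body not shown on the slide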

App Store (Cloud Compiler)

• Compiles hardware for each app
  – For each device variant
  – For each slot in each variant

App X:
[device1: [slot1: a.bit, slot2: b.bit, slot3: c.bit]]
[device2: [slot1: d.bit, slot2: e.bit]]

[Diagram: the Cloud Compiler takes App X and three Static Designs, each with slots 1, 2, 3]
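The "for each device variant, for each slot in each variant" rule can be sketched as a nested loop that lists the compile jobs for one app. This is only an illustration: the function name `compile_jobs` and the map of device names to slot counts are assumptions, not part of CloudRTR.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the slides): lists the partial-bitstream
// builds needed for one app, one per (device variant, slot) pair.
// slots_per_device maps a device variant name to its number of slots.
std::vector<std::pair<std::string, int>> compile_jobs(
        const std::map<std::string, int>& slots_per_device) {
    std::vector<std::pair<std::string, int>> jobs;
    for (const auto& device : slots_per_device) {            // each device variant
        for (int slot = 1; slot <= device.second; ++slot) {  // each slot in that variant
            jobs.emplace_back(device.first, slot);
        }
    }
    return jobs;
}
```

For the listing on the slide (device1 with three slots, device2 with two), this yields five jobs, matching the five .bit files shown.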

User (Operating System)

• A system service manages slots
• Downloaded apps include slot hardware
• The system service loads app hardware for apps

.apk: [device1: [slot1: a.bit, slot2: b.bit, slot3: c.bit]]

[Diagram: FPGA with GPU, AXI, and slots 1, 2, 3; App X]
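As a rough sketch of how a system service might pick the right bitstream out of a downloaded app, the listing above can be modelled as a nested map from device to slot to file name. The type names and `bitstream_for` are hypothetical; the slides do not describe the actual package layout or service API.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory form of the per-app listing on the slide:
// device name -> (slot number -> bitstream file name).
typedef std::map<int, std::string> SlotBitstreams;
typedef std::map<std::string, SlotBitstreams> AppHardware;

// Returns a pointer to the bitstream name built for this device and slot,
// or nullptr if the app was not compiled for that combination.
const std::string* bitstream_for(const AppHardware& hw,
                                 const std::string& device, int slot) {
    AppHardware::const_iterator dev = hw.find(device);
    if (dev == hw.end()) return nullptr;
    SlotBitstreams::const_iterator entry = dev->second.find(slot);
    if (entry == dev->second.end()) return nullptr;
    return &entry->second;
}
```

With the device1 entry from the slide, looking up slot 2 gives b.bit, while a slot or device that is not listed gives no result, so the service can refuse to load.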

Security Considerations

• The slot manager enforces access to hardware
• However, FPGAs can theoretically access sensitive resources directly (while bypassing the OS)
• A secure loading system ensures that apps cannot access sensitive resources

Secure loading system

How does the secure loader work?

[Diagram: Processor, Operating System, FPGA, Slot 1, Slot 2, Memory Controller, Signature Verification, Reconfiguration Module, ICAP, Signed module]

• The OS wants to reconfigure Slot 1
• The signature of the module is verified
• The module is written to the ICAP
• The ICAP performs the reconfiguration
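The verify-then-write ordering of the secure loader can be sketched in software as a gate in front of the ICAP. Everything here is an assumption for illustration: `SignedModule`, `Verifier`, `IcapWriter`, and `load_module` are hypothetical names, the real check is done by the signature verification block in the design, and the signature scheme is not specified on the slides.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// All names below are hypothetical; the slides only describe the sequence.
struct SignedModule {
    int target_slot;                      // slot the OS wants to reconfigure
    std::vector<std::uint8_t> bitstream;  // partial bitstream for that slot
    std::vector<std::uint8_t> signature;  // signature over the bitstream
};

// Stand-ins for the signature verification block and the ICAP write path.
typedef std::function<bool(const SignedModule&)> Verifier;
typedef std::function<void(const std::vector<std::uint8_t>&)> IcapWriter;

// Returns true if the module was verified and handed to the ICAP writer.
// A module that fails verification never reaches the ICAP.
bool load_module(const SignedModule& module, const Verifier& verify,
                 const IcapWriter& write_icap) {
    if (!verify(module)) {
        return false;              // signature check failed: slot left untouched
    }
    write_icap(module.bitstream);  // the ICAP then performs the reconfiguration
    return true;
}
```

Passing the verifier and writer in as callables keeps the sketch independent of any particular signature algorithm or driver interface.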

Evaluation

• Is there value in apps with hardware?
• Is the cloud-based compilation of CloudRTR practical?

Microbenchmark 1: QAM demodulator

4 orders of magnitude

Microbenchmark 2: AES

FPGA is 3x vs. OpenSSL

Microbenchmark 3: Memory Scanner

• We also implemented a hardware memory scanner
• It can scan the entire address space transparently to the OS
  – 2.7% memory read performance hit
  – 5.5% memory write performance hit
• We tested this using the LMbench testbench

Brute-force compilation

Google Play Store Figures (provided by AppFigures)

# of Apps as of Dec 14          1.43 Million
Average Monthly App Growth      6.10%
# of Apps for January 16        117,521

Max # of Apps Compiled per Day

# of Slots    Apps
2             121
3             96
4             76
5             59
6             51

2 Slots Requirements: # of Machines Required to Compile Apps

                        % of April Apps that use Hardware (# of Apps Uploaded per Day)
# of Device Variants    0.1 (3)    1 (34)    10 (347)
1                       1          1         3
10                      1          3         29
100                     3          29        288
1000                    29         288       2875

Reasonable for most scenarios

6 Slots Requirements: # of Machines Required to Compile Apps

                        % of April Apps that use Hardware (# of Apps Uploaded per Day)
# of Device Variants    0.1 (3)    1 (34)    10 (347)
1                       1          1         7
10                      1          7         69
100                     7          69        681
1000                    69         681       6809

Still reasonable for most scenarios
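The slides do not state how the machine counts were derived. One plausible reading is that every uploaded app must be compiled once per device variant, divided by the per-machine daily throughput and rounded up. The sketch below uses that assumed formula; `machines_required` and its parameter names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical estimate (the formula is an assumption, not from the slides):
//   apps_per_day              apps with hardware uploaded per day
//   device_variants           number of device variants to compile for
//   apps_per_machine_per_day  "Max # of Apps Compiled per Day" for the slot count
long machines_required(double apps_per_day, long device_variants,
                       double apps_per_machine_per_day) {
    assert(apps_per_machine_per_day > 0.0);
    double builds_per_day = apps_per_day * static_cast<double>(device_variants);
    return static_cast<long>(std::ceil(builds_per_day / apps_per_machine_per_day));
}
```

With the rounded figures from the tables, 347 apps per day across 10 variants at 121 apps per machine gives 29 machines, and the same load at 51 apps per machine gives 69, which agrees with the corresponding table entries. Some other entries come out a few machines lower than the tables, presumably because the slides were computed from unrounded inputs.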

Reducing the numbers even more

• Compilation can be offloaded to manufacturers
• Manufacturers will likely reuse designs (Qualcomm, ARM chips are often reused)
• Developers will likely use libraries

Implementation Case Study: Orbot

• Tor on Android
• AES is on the critical path
• Examine AES as an integration study

What we found:
• Memory operations are the bottleneck
  – Data must be placed correctly in memory
  – Userspace I/O (UIO) has high overhead
  – Many system calls are incompatible with UIO
• It is easier to build an application from the ground up

Conclusion

• We have presented our vision of apps with hardware
• CloudRTR implements our vision by leveraging the mobile app deployment model
• We have demonstrated the value and practicality of our vision

Questions?

• Email: michael.coughlin@colorado.edu
• Source code: https://github.com/nsr-colorado/cloud-rtr

Vendor Supported Partial Reconfiguration

[Diagram: Target FPGA, Static Design, Dynamic Module(s); Vendor tools]

• base.bit
• partial_1.bit
• partial_2.bit

(Partial bitstreams work in 1 location, and are just for base.bit)

Goal: Space saving for customer

Examples of Libraries

• Crypto
  – Asymmetric (RSA, ECDSA, etc…)
  – Symmetric (3DES, Twofish, Blowfish)
• Soft processors
• Encoding
  – Network encoding (Reed-Solomon, etc…)
  – Media encoding (JPEG, MPEG, etc…)
• DSP
  – FFTs, Filters, etc…

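Example hardware definition

    #include <ap_int.h>  // ap_uint<N> arbitrary-precision integers

    bool example(
        ap_uint<32> *in,
        ap_uint<32> *out,
        bool *enabled
    );  // function body not shown on the slide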

More complicated hardware definition

    #include <ap_int.h>      // ap_uint<N>
    #include <hls_stream.h>  // hls::stream<T>

    typedef ap_uint<32> uint32_t_hw;
    typedef hls::stream<uint32_t_hw> mem_stream32;

    bool aes(
        volatile unsigned int m_mm2s_ctl[500],
        volatile unsigned int m_s2mm_ctl[500],
        volatile unsigned sourceAddress,
        ap_uint<128> *key_in,
        ap_uint<128> *iv,
        volatile unsigned destinationAddress,
        unsigned int numBytes,
        int mode,
        mem_stream32 &s_in,
        mem_stream32 &s_out
    );  // function body not shown on the slide

The problem

Let’s examine the problem

[Diagram: Processor and FPGA, each with I/O; App X software; App X hardware]

First, there are various interconnects needed

Control signals and logic must also be placed

The app may have complex inputs, or need to interact with other logic

Secure loading system

• A trusted system is booted with Secure Boot
• Included is a static module that reconfigures slots
• This module only allows signed modules into slots that access sensitive resources

Our solution

• Builds off of prior research…
• …but in a way that is compatible with vendor tools
• To do this, we leverage the deployment model for mobile apps
