
Slick: Secure Middleboxes using Shielded Execution

Bohdan Trach†, Alfred Krohmer†, Sergei Arnautov†, Franz Gregor†, Pramod Bhatotia‡, Christof Fetzer†

†Technische Universität Dresden ‡University of Edinburgh

Abstract

Cloud computing offers economies of scale for computational resources with ease of management, elasticity, and fault tolerance. To take advantage of these benefits, many enterprises are contemplating outsourcing their middlebox processing services to the cloud. However, middleboxes that process confidential and private data cannot be securely deployed in the untrusted environment of the (edge) cloud.

To securely outsource middleboxes to the cloud, state-of-the-art systems advocate network processing over encrypted traffic. Unfortunately, these systems support only restrictive middlebox functionalities, and they incur prohibitively high overheads due to the complex computations involved over the encrypted traffic.

This motivated the design of Slick—a secure middlebox framework for deploying high-performance Network Functions (NFs) on untrusted commodity servers. Slick exposes a generic interface based on Click to design and implement a wide range of NFs using its out-of-the-box elements and C++ extensions. Slick leverages Scone (a shielded execution framework based on Intel SGX) and Intel DPDK to securely process confidential data at line rate.

More specifically, Slick provides hardware-assisted memory protection, and a configuration and attestation service for the seamless and verifiable deployment of middleboxes. We have also added several new features for commonly required functionalities: new specialized Click elements for secure packet processing, secure shared-memory packet transfer for NF chaining, secure state persistence, an efficient on-NIC timer for SGX enclaves, and memory safety against DPDK-specific Iago attacks. Furthermore, we have implemented several SGX-specific optimizations in Slick. Our evaluation shows that Slick achieves near-native throughput and latency.

1 Introduction

Modern enterprises ubiquitously deploy network appliances or “middleboxes” to manage their networking infrastructure. These middleboxes manage a wide range of workflows for improving efficiency (e.g., WAN optimizers), performance (e.g., caching, proxies), reliability (e.g., load balancers and monitoring), and security (e.g., firewalls and intrusion detection systems). Due to their widespread usage in the networking infrastructure, these middleboxes incur significant deployment, maintenance, and management costs [42].

To overcome these limitations, many enterprises are contemplating outsourcing their middlebox processing services to the cloud [31, 42]. Cloud computing offers economies of scale for computational resources with ease of management, elasticity, and fault tolerance. The realization of the vision of “middleboxes as a service in the cloud” is strengthened by advancements in software-defined middleboxes, also known as Network Function Virtualization (NFV) [29]. NFV offers a flexible and modular architecture that can be easily deployed on commodity hardware. Thus, NFV is a perfect candidate to reap the outsourcing benefits of the (edge) cloud computing infrastructure.

However, middleboxes that process confidential and private data cannot be securely deployed in the untrusted environment of the cloud. In the cloud environment, an accidental or, in some cases, intentional action by a cloud administrator could compromise the confidentiality and integrity of execution. These threats of potential violations of the integrity and confidentiality of customer data are often cited as a key barrier to the adoption of cloud services [36].

To securely outsource middleboxes to the cloud, state-of-the-art systems advocate network processing over encrypted traffic [25, 43]. However, these systems support only a restrictive set of middlebox functionalities, and they incur prohibitively high performance overheads since they require complex computations over the encrypted network traffic.

These limitations motivated our work—we strive to answer the following question: How can we securely outsource the middlebox processing service to an untrusted third-party platform without sacrificing performance, while supporting a wide range of enterprise NFs?

To answer this question, we propose Slick—a secure middlebox framework for deploying high-performance Network Functions (NFs) on untrusted commodity servers. The architecture of Slick is based on four design principles: (1) Security — we aim to provide strong confidentiality and integrity guarantees for the execution of middleboxes against a powerful adversary; (2) Performance — we strive to achieve near-native throughput and latency; (3) Generality — we aim to support a wide range of network functions (the same as plain-text processing) with ease of programmability; and (4) Transparency — we aim to provide a transparent, portable, and verifiable environment for deploying middleboxes.

arXiv:1709.04226v2 [cs.CR] 17 Oct 2019


Slick leverages Click [23] to provide a flexible and modular framework for building a rich set of NFs using its out-of-the-box elements and C++ extensions. Slick leverages hardware-assisted secure enclaves based on Intel SGX to provide strong confidentiality and integrity properties [14]. To achieve high performance despite the inherent limitations of the SGX architecture, Slick builds on Scone [35] (a shielded execution framework) and Intel DPDK [2] to efficiently process packets in the userspace secure enclave memory. Finally, Slick builds on container technology with a remote attestation interface [32] to provide a portable deployment and cryptographically verifiable mechanism.

Using the Slick framework, the middlebox owner launches a Slick instance in the cloud and performs remote attestation, passing Slick an encrypted configuration upon successful attestation. Thereafter, Slick executes user-defined Click elements, which are responsible for reading packets in userspace directly from the NIC, performing network traffic processing, and sending them back to the network. All elements run inside an SGX enclave. Packets that must be processed under SGX protection are copied into the enclave explicitly. We efficiently execute the expensive network I/O operations (to and from the enclave memory) by integrating Scone with DPDK.

Furthermore, we have designed several new features for commonly required middlebox functionalities: (a) a remote attestation and configuration service for the seamless and verifiable deployment of middleboxes in the cloud, (b) new Click elements for secure packet processing, (c) an efficient and secure shared-memory packet transfer in the multiple-SGX-enclave setup for NFV chaining [20], (d) a secure state persistence layer for fault tolerance and migration [41], (e) an on-NIC PTP clock as the time source for SGX enclaves, and (f) a memory protection mechanism to defend against DPDK-specific Iago attacks [13].

We have implemented the aforementioned security features, and also added several SGX-specific performance optimizations to Slick. Lastly, we have evaluated the system using a series of microbenchmarks and two case studies: a multiport IP router and an Intrusion Detection System (IDS). Our evaluation shows that Slick achieves near-native throughput and latency. To improve throughput, we limit memory copying by storing most packets outside the enclave, using more efficient data structures, and preallocating memory. To reduce latency, we use the NIC timer when necessary, and avoid unnecessary system calls by modifying the Click timer event scheduler.

2 Background

In this section, we present the necessary background on the three technical building blocks of Slick: shielded execution, DPDK, and Click.

2.1 Shielded Execution

Shielded execution provides strong confidentiality and integrity guarantees for unmodified legacy applications running on untrusted platforms. Our work builds on Scone [35]—a shielded execution framework based on Intel SGX [14].

Intel Software Guard Extensions (SGX). Intel SGX is a set of ISA extensions for Trusted Execution Environments (TEEs) released as part of the Skylake architecture. Intel SGX provides the abstraction of a secure enclave—a memory region for which the CPU guarantees the confidentiality and integrity of the data and code residing in it. More specifically, the enclave memory is located in the Enclave Page Cache (EPC)—a dedicated memory region protected by the MEE, an on-chip Memory Encryption Engine. The MEE encrypts and decrypts cache lines on writes to and reads from the EPC, respectively. The processor verifies that read/write accesses to the EPC originate from the enclave code. Furthermore, the MEE verifies the integrity of the accessed page to detect memory modifications and rollback attacks.

However, the architecture of SGX suffers from two major limitations. First, the EPC is a limited resource, currently restricted to 128 MB (of which only 94 MB is available to all enclaves). To overcome this limitation, SGX supports a secure paging mechanism to unprotected memory. However, the paging mechanism incurs very high overheads depending on the memory access pattern (2× to 2000×). Second, the execution of system calls is prohibited inside the enclave. To execute a system call, the executing thread has to exit the enclave, and the system call arguments need to be copied in and out of the enclave memory. Such enclave transitions are expensive—especially in the context of middleboxes—because of security checks and TLB flushes.

SCONE. SCONE is a shielded execution framework for unmodified applications based on Intel SGX [35]. In the SCONE framework, legacy applications are statically compiled and linked against a modified standard C library (SCONE libc). In this model, the application’s address space is confined to the enclave memory, and interaction with the outside world (or the untrusted memory) is performed only via the system call interface. The SCONE libc executes system calls outside the enclave on behalf of the shielded application. The SCONE framework protects the executing application from the outside world, such as an untrusted OS, through shields. In particular, shields copy the arguments of system calls into and out of the enclave and provide functionality to transparently encrypt the data that leaves the enclave perimeter. Furthermore, SCONE provides a user-level threading mechanism inside the enclave, combined with an asynchronous system call mechanism in which threads outside the enclave asynchronously execute the system calls [45] without forcing the enclave threads to exit the enclave.


2.2 Intel DPDK and Click

Intel DPDK. We build on the Intel DPDK library [2], which supports developing high-performance networked systems running on commodity hardware. The DPDK library allows processing of L2 packets from the NIC directly in userspace; thus, it completely bypasses the OS networking stack to improve both throughput and latency.

The DPDK library consists of three main components: the Environment Abstraction Layer (EAL), the memory management subsystem, and Poll Mode Drivers (PMDs). The EAL provides a unified way to initialize the central DPDK components (memory management, poll drivers, threading, etc.). The memory management unit of DPDK utilizes huge pages for buffer management through its own memory allocator (mempool). The mbuf library provides functionality to store and process the raw packet data (which is directly mapped in userspace virtual memory) in memory blocks allocated from a mempool. Lastly, a PMD uses the memory rings on the NIC to directly send and receive packets without interrupts, thus achieving high CPU utilization.

Click. In addition to DPDK, we leverage Click’s [23] programmable and extensible architecture for the implementation of NFs using its out-of-the-box elements and C++ extensions. In particular, Click provides a dataflow programming language that allows the construction of middleboxes as a graph of elements—small, reusable, atomic pieces of network traffic processing functionality. Thereafter, Click routes packets through the elements in the dataflow graph. The modular architecture of Click greatly improves the programmability, productivity, and dependability of middleboxes.

3 Overview

Basic design. At a high level, the core of our system, Slick, consists of a simple integration of DPDK-enabled Click running inside an SGX enclave using Scone. Figure 1 shows the high-level architecture of Slick.

While designing Slick, we need to take into account the architectural limitations of Intel SGX. As described in §2.1, an enclave context switch (or exiting the enclave synchronously to issue system calls) is quite expensive in the SGX architecture. The Scone framework overcomes this limitation using an asynchronous system call mechanism [45]. While the asynchronous mechanism is good enough for commonly used services like HTTP servers or KV stores, it cannot sustain the traffic rates of modern middleboxes. Therefore, we decided to use the userspace DPDK I/O library as a better fit for SGX enclaves to achieve high performance.

Furthermore, we need to ensure that the memory footprint of Slick’s code and data is minimal, for several reasons. As described in §2.1, enclaves that use more than 94 MB of physical memory suffer high performance penalties due to EPC paging (2× to 2000×). In fact, to process data packets at line rate, an even stricter resource limit must be obeyed: the working set of the application must fit into the L3 cache. Therefore, our design diligently ensures that we incur minimal cache misses and avoid EPC paging.

[Figure 1: Slick basic design. Click, the SCONE runtime, and DPDK run in userspace inside the SGX enclave, on top of the kernel and SGX driver, with the Rx/Tx queues on the NIC.]

Besides performance reasons, minimizing the code size inside the enclave leads to a smaller Trusted Computing Base (TCB) and reduces the attack surface. The core of Click is already quite small (6 MB for the statically linked binary section that is loaded into memory). We decrease its size further by removing unnecessary Click elements at build time. Importantly, we designed Slick with the packet-related DPDK data structures running outside of the enclave.

More specifically, when the Scone runtime starts the application, it automatically places the application code, statically allocated data, and heap in SGX-protected memory. This mechanism is in contrast to the way DPDK operates: by default, DPDK allocates memory using the x86-64 huge-page mechanism, which reduces the TLB miss rate. Such pages are not supported inside the enclave; besides that, the NIC can only deliver packets to unprotected memory, and network traffic entering or leaving the machine can be modified by an attacker. Therefore, we keep the huge pages enabled in DPDK outside the enclave, and explicitly copy packets that must be processed with SGX protection into the enclave. With this scheme, DPDK-created packet data structures are allocated outside the SGX enclave. We support efficient data transfer between DPDK and the enclave, and processing inside the enclave, using the new secure Click elements (detailed in §4.2).

When Slick starts, it performs remote attestation and obtains its configuration. Slick initializes the DPDK subsystems, allocates huge-page memory, and takes control over the available NICs. Then, it starts running the Click element scheduler, which reads packets from the NIC and passes them along the processing chain until they leave the system or are dropped. Packets that need SGX protection are explicitly copied into the enclave.

Threat model. We target a scenario where middleboxes that process confidential and sensitive data are deployed in an untrusted cloud environment (or at edge computing nodes at ISPs) [42]. Although cloud providers are usually contractually obligated not to interfere with the compute and storage units, an accidental leakage, or even a malicious cloud operator, might compromise the confidentiality and integrity of the execution [36]. In the context of middleboxes [25, 43], attackers might try to learn the contents of the encrypted data packets and the configuration of the system, such as cryptographic keys, filtering and classification rules, etc. Furthermore, attackers might try to compromise the integrity of a middlebox by subverting its execution.

To circumvent such attacks, we protect against a very powerful adversary, even in the presence of complex layers of software in the virtualized cloud computing infrastructure. More specifically, we assume that the adversary can control the entire system software stack, including the OS or the hypervisor, and is able to launch physical attacks, such as performing memory probes.

However, we note that Slick is not designed to protect against side-channel attacks [48], such as those exploiting timing and page-fault information. Furthermore, since the underlying infrastructure is controlled by the cloud operator, we cannot defend against denial-of-service attacks.

System workflow. Figure 2 shows the system workflow of Slick. In preparation for deployment, developers build middlebox container images and upload them to an image repository (such as Docker Hub [1]) using the Slick toolchain. The network operator, on the other hand, must bootstrap a Configuration and Attestation Service (CAS) on a trusted host, and a Local Attestation Service (LAS) on the host that will be running the middlebox. After this, Slick can be installed on the target machine in the cloud using container technology—either manually or deployed as a container image from the image repository. Alternatively, it can be installed by transferring a single binary to the target machine.

The Slick framework is bootstrapped using the Configuration and Attestation Service (CAS). The CAS service is launched either inside an SGX enclave of an (already bootstrapped) untrusted machine in the cloud, or on a trusted machine under the control of the middlebox operator outside the cloud. Middlebox developers implement the necessary NFs as Click configurations and send them to the CAS service together with all necessary secrets.

Once the operator has launched Slick in the cloud, it connects to CAS and carries out remote attestation (detailed in §4.1). If the remote attestation is successful, the Slick instance receives the configuration and necessary secrets. After that, Slick can start processing the network traffic, with all of the following benefits:

• Security: Slick provides strong confidentiality and integrity for the middlebox execution. We leverage hardware-assisted SGX memory enclaves to provide strong security properties.

[Figure 2: Slick system workflow. Step #1: Build and host middlebox images using the Slick toolchain. Step #2: Launch the CAS service on a trusted host. Step #3: Install the LAS service on a Slick host. Step #4: Install a Slick middlebox from the repository. Step #5: Provide the Slick configuration and secrets to CAS. Step #6: Perform remote attestation, configuration, and launch of Slick.]

• Performance: Slick achieves near-native throughput and latency by building a high-performance network I/O architecture for the shielded execution of middleboxes. We overcome the architectural limitations of Intel SGX enclaves by optimizing the combination of Scone and DPDK.
• Generality: Slick supports a wide range of NFs, the same as supported in plain-text network processing, without restricting any functionality. We leverage Click to provide a general framework for implementing a wide range of NFs.
• Transparency: Lastly, Slick provides network operators a portable, configurable, and verifiable architecture for middlebox deployment in the cloud. We build on container technology with configuration and attestation services for a seamless deployment.

Limitation. We note that neither DPDK nor Click supports flow-based stateful traffic. This implies that Slick currently supports NFs that work on L2 and L3; only restricted processing of L4–L6 traffic is supported. We plan to extend our system to support flow-based traffic with shielded execution by integrating a user-level networking stack [18].

4 Design Details

We next present the design details of Slick. Figure 3 shows the detailed architecture of Slick.

4.1 Configuration and Remote Attestation

To bootstrap a trusted middlebox system in the cloud, one has to establish trust in the system components. While Intel SGX provides a remote attestation feature, a holistic system must be built for remote attestation and secure configuration of middleboxes [37]. To achieve this goal, we have designed a Configuration and Attestation Service (CAS) for middleboxes. Figure 4 shows the system protocol for the CAS service.


[Figure 3: Slick detailed design. Inside the SGX enclave: Click (all Click state is protected) with the configuration and secrets, protected packet copies, the SCONE trusted runtime, and the DPDK code, NIC drivers, and platform abstraction. Outside the enclave: the untrusted runtime with system call threads (enclave creation and system calls), huge pages holding packet descriptors and control data structures, the kernel with the UIO and SGX modules, and the userspace NIC with receive/transmit queues and a PTP clock. Annotated mechanisms: Remote Attestation & Configuration Service (§4.1), new elements (§4.2), NFV chaining over rings (§4.3), state persistence (§4.4), NIC timer access (§4.5), and Iago attack protection (§4.6).]

To attest an enclave using Intel Remote Attestation, the verifier (the operator of a Slick instance) connects to the application and requests a quote. The enclave requests a report from the SGX hardware and transmits it to the Intel Quoting Enclave (QE), which verifies, signs, and sends back the report. The enclave then forwards it to the verifier. This quote can be verified using the Intel verification service [3].

Our remote attestation system extends Intel’s RA service and is integrated with a configuration system, which provisions Slick with its configuration in a secure way using a trusted channel established during attestation. This system consists of an enclave-level library, the Local Attestation Service (LAS), and the Configuration and Attestation Service (CAS).

• The enclave library interacts with LAS and CAS to carry out remote attestation, and allows setting environment variables, command-line arguments, and keys for the Scone shielding layer in a secure and confidential manner.
• The Local Attestation Service runs on the same machine as the Slick middlebox and acts as a proxy for interaction with the Intel Attestation Service (IAS). It also acts as an intermediate root of trust: once LAS is attested, further Slick instances can be launched even when IAS is unavailable.
• The Configuration and Attestation Service runs on a single (possibly replicated) node and stores the configuration and secrets of the services built with Scone. It maintains information about attested enclave instances, and provisions configuration to applications using the enclave library.

To bootstrap the system, the operator launches CAS, either on a host under his control or on a host in the cloud inside an SGX enclave. Then, the CAS service is populated with configurations and secrets using the REST API or a command-line configuration tool. LAS and IAS instances are launched on the cloud hosts that will run Slick instances. During startup, each Slick instance establishes a TLS connection to CAS. Simultaneously, it connects to LAS to perform local attestation. In case the LAS instance is not yet attested, it will attest itself to CAS using the Intel Remote Attestation protocol, and then attest the Slick instance to CAS using SGX local attestation. Local attestation verifies the integrity of the Slick binary and establishes whether Slick is running under SGX protection. This allows removing distribution mechanisms (such as Docker Hub) from the TCB. After that, Slick sends the LAS quote to CAS, and a secure channel is established between CAS and Slick. Thereafter, Slick obtains its configuration from the CAS service and transfers control to the main Slick code.

4.2 Secure Elements

As described in §3, we designed Slick with the packet-related data structures of DPDK running outside the enclave. Therefore, we needed an efficient way to support communication between DPDK and the enclave memory region. In particular, we have to consider the overheads of accessing SGX-encrypted pages from main memory and of copying data between the protected and unprotected memory regions. When possible, data packets with plain-text contents should not be needlessly copied into the enclave, as this would degrade performance. Therefore, we designed specialized secure Click elements (shown in Table 1) that copy data packets into and out of the enclave to facilitate efficient communication.


[Figure 4: Slick’s configuration and attestation protocol. The operator populates CAS with the configuration. On the Slick machine, the Slick instance establishes a TLS connection to CAS and sends an attestation request to LAS; only after LAS startup does LAS perform Intel Remote Attestation against IAS, obtaining an Intel attestation quote for verification. Attestation replies flow back to CAS, after which Slick issues a configuration request and receives its configuration and secrets.]

By default, packets are read from the NIC queues into untrusted memory. This reduces the overhead of using SGX when processing packets that are not encrypted and can be safely handled with fewer security mechanisms involved. Such packets are immediately forwarded or dropped upon header inspection. On the other hand, we must move packets into the enclave memory with an explicit copy element. We have implemented such an element (ToEnclave), and use it to construct secure packet processing chains.

We have also added support for the commonly used AES-GCM cipher to Slick (the Seal and Unseal elements). This allows us to construct VPN systems that use modern cryptographic mechanisms. These elements were implemented using the Intel ISA-L crypto library. To allow secure key generation inside the enclave, we have exposed the Scone functions for obtaining SGX sealing keys to the Slick internal APIs.

To allow building high-performance IDS systems using Slick, we have created an element based on the HyperScan regular expression library. It allows fast matching of a number of regular expressions against incoming packets, simplifying the implementation of systems like Snort [5].

We have also added elements that implement broader mechanisms: DPDKRing (§4.3) for NFV chaining, and StateFile (§4.4) for state persistence in middleboxes.

4.3 NFV Chaining

Typically, NFVs are chained together to build a dataflow processing graph with multiple Click elements, spanning multiple cores, sockets, and machines [20, 31]. The communication between different cores and sockets happens through shared memory, and communication across machines via NICs over RDMA/InfiniBand. DPDK supports NUMA systems and always explicitly tries to allocate memory from the RAM nodes of the local socket.

ToEnclave: Transfers a packet to the enclave, frees the original packet
Seal(Key, Security Association state): Encrypts the packet with AES-GCM
Unseal(Key, Security Association state): Decrypts the packet with AES-GCM
HyperScan(rule database): High-performance regular expression matching engine
DPDKRing(Ring name): Transfers a packet to the DPDK ring structures
StateFile(Key, path): Provides settings to the persistent state engine

Table 1: Slick's new specialized elements

However, unlike normal POSIX applications, SGX enclaves cannot be shared across different sockets. (Future SGX-enabled servers might add support for the NUMA architecture.) As a result, in the current Intel SGX architecture, users need to run one Slick instance per CPU socket. Another reason for cross-instance chaining is collocation of middleboxes from different developers that do not necessarily trust each other. In this case, developers would want to leverage SGX to protect their secrets. Therefore, it is imperative for the Slick framework to provide an efficient communication mechanism between enclaves to support high-performance NFV chaining.

We built an efficient mechanism for communication between different Slick instances by leveraging existing DPDK features. In particular, DPDK already provides a building block for high-performance communication between different threads or processes with its ring API. This API contains highly optimized implementations of concurrent, lockless FIFO queues which use huge page memory for storage. We have implemented the DPDKRing element (see Table 1) for Slick to utilize it for chaining. As huge page memory is shared between multiple Slick instances, the ring buffers are shared as well and serve as an efficient means of communication between multiple Slick processes.

This solution requires assigning ownership of all shared data structures to a single process. For this, we rely on the DPDK distinction between primary and secondary processes. Primary processes, the default type, request huge page memory from the operating system, allocate memory pools, and initialize the hardware. Secondary processes skip device initialization and map the huge page memory already requested by the primary process into their own address space. To support NFV chaining using multiple processes on the same machine, we added support for starting Slick instances as secondary DPDK processes.

Depending on the process type, the DPDKRing element either creates a new ring (primary process) or looks up an



Seal(StateFile): Seals the elements' state in the StateFile
Unseal(StateFile): Unseals the elements' state from the StateFile
Persist(timer, StateFile): Periodically persists the state to the StateFile

Table 2: Slick APIs for state persistence

existing ring (secondary process) in huge page memory. In Slick, packets pushed towards a DPDKRing element are enqueued into the ring and can be dequeued from the ring in another process for further processing. Bidirectional communication between two processes can be established by using a pair of rings.

4.4 Middlebox State Persistence

Middleboxes often maintain useful state (such as counter values, Ethernet switch mappings, activity logs, routing tables, etc.) for fault tolerance [41], migration [30], diagnostics [47], etc. To securely persist this state, we extend Slick with new APIs (shown in Table 2) for state persistence in middleboxes. The Seal primitive collects the state that must be persisted from the elements and writes it in encrypted form to disk. Unseal reads this state from disk, decrypts it, and populates the elements with it.

To configure this functionality, we have added a new configuration element to Slick, called StateFile (see Table 1). Its parameters are the file to which state should be written and the key that should be used for encryption. Note that this information is transmitted to the Slick instance in the configuration string via remote attestation, and is not accessible outside the enclave. We do not use the Scone file system shield; instead, we encrypt and decrypt the file as a single block. This ensures confidentiality and integrity of the stored data via the use of the AES-GCM cipher.

We do not attempt to extract the relevant state transparently. Instead, we rely on the programmer providing serialization routines that save only the necessary parts of element state. These routines are available in Slick as read and write handlers, and are triggered in the Slick startup procedure after the configuration is loaded and parsed and the initialization of the basic components is finished, or manually via the ControlSocket interface of the StateFile element. It is also possible to trigger them periodically via a timer.

4.5 NIC Time Source

A timer is one of the most commonly used functionalities in middleboxes [29, 31]. It is used for a variety of purposes, such as measuring performance, scheduling the execution of periodically triggered NFs, etc.

The time measurement can be fine-grained or coarse-grained based on the application requirements. For fine-grained cycle-level measurements, developers use the rdtscp instruction, which is extremely cheap and precise. For coarse-grained measurements, applications invoke system calls like gettimeofday or clock_gettime.

However, in the context of SGX enclaves, both rdtscp and system calls have unacceptable latency for use in middleboxes that process network traffic at line rate. More specifically, the rdtscp instruction is forbidden inside the enclave, and therefore causes an enclave exit event; meanwhile, asynchronous system calls in Scone are submitted through a system call queue that is optimized for raw throughput, but not for latency.

To overcome these limitations, we have opted to use the on-NIC PTP clock as the clock source for the enclave. This clock can be read inside the enclave reasonably fast (0.9 µsec, which is on the same order of magnitude as reading the HPET). Moreover, it neither causes enclave exits nor requires submitting system calls. Furthermore, the on-NIC clock is extremely precise, since it is intended for use with the PTP synchronization protocol.

We stress that this time source is not secure, and can be used as a denial-of-service attack vector by a malicious operating system. However, the same is true for the other time sources: a trusted, efficient, and precise time source for SGX enclaves remains an unsolved problem that will likely require changes to the CPU hardware [39].

4.6 DPDK-Specific Iago Attacks

Iago attacks [13] are a serious class of security attacks that can be launched against shielded execution to compromise its confidentiality and integrity properties. In particular, an Iago attack originates through malicious inputs supplied by the untrusted code to the trusted code. In the classical setting, a malicious OS can subvert the execution of an SGX-protected application by exploiting the application's dependency on correct system call return values [11].

The decision (§3) to allocate huge pages for packet buffers and DPDK rings has security implications. The fact that packets are passed through rings by reference, and that DPDK buffers contain pointers, opens a new attack surface on Slick. Attackers with access to this memory region could modify pointers to point into the SGX-protected regions and make the enclave inadvertently leak secrets over the network.

To protect against DPDK-specific Iago attacks, we have implemented a pointer validation function. More specifically, the scheme uses an enclave parameter structure that is located inside the enclave memory and defines the enclave memory boundaries. Pointers are validated by checking that they do not point into the enclave memory range [base, base + enclave_size). We note that Slick is already protected against the classical syscall-specific Iago attacks through Scone's shielding interfaces.



This ensures that no pointers possibly pointing to secrets stored in the EPC are accepted through the unprotected huge page memory. Pointers can still be modified by a malicious attacker, but they can only point to unprotected memory.

As it is possible for an application to enqueue and dequeue arbitrary pointers into DPDK's rte_ring structures, it is not straightforward to integrate this pointer check directly into DPDK. Instead, we implemented these pointer checks in the DPDKRing and FromDPDKDevice (§4.3) elements. If Slick detects a malicious pointer, it assumes an attack, notifies the application operator, and drops the packet.

5 Implementation

5.1 Toolchain

We built Slick's toolchain using DPDK (version 16.11) and Click (master branch commit 0e860a93). We further integrate it with the Scone runtime to compile Slick. The toolchain is based on the musl-cross-make [4] project, modified to use the Scone libc instead of the standard musl-libc. We use gcc version 6.3.0 for the compilation process. This enables us to use both C and C++ code. musl-cross-make is also used to compile the native version of Click for the evaluation. We used the Boost C++ library (version 1.63) to build a static version of the HyperScan high-performance regular expression matching engine (master branch commit 7aff6f61) and incorporated it into Slick. We use the WolfSSL library [7] to implement the remote attestation system, StateFile sealing, and the packet Seal/Unseal elements. The toolchain contains automated scripts for building and deploying middlebox container images, and for setting up the Slick and CAS services (as described in the system workflow in §3).

To make the compilation of Slick work with the Scone toolchain, some changes to DPDK were necessary. In particular, we needed to remove the helper functions for printing stack tracebacks and provide some glibc-specific structures and macros as well as kernel header files. Click is implemented in C++ using mostly high-level APIs and required no adaptations.

The resulting Slick binary is 8.2 MB in size, and around 16 MB including a minimal runtime stack and heap allocation. This implies that we could run roughly up to six instances of Slick in parallel on one processor without the performance being impacted by EPC paging (the system-wide EPC limit is 94 MB).

5.2 Memory Management

Scone implements its own in-enclave memory management, using EPC pages for all malloc and mmap anonymous memory allocation calls. Allocating huge page memory through this allocator is not possible without modifications.

However, accessing huge pages in DPDK does not necessarily require bypassing Scone, because of the specific way DPDK allocates huge pages. Instead of passing the MAP_HUGETLB flag to mmap(), it opens shared memory files in the hugetlbfs virtual filesystem and passes those file descriptors to the mmap call, which is not protected by Scone. This mmap file-to-memory mapping request is passed directly to the operating system by Scone.

5.3 Optimizations

To further improve performance, especially for the case of DPDK running inside the enclave, we optimized the data path inside Click. We used the Linux performance profiling tool perf [6] to find the bottlenecks.

Memory pre-allocation. The FromDPDK element allocated memory for packet descriptor storage on the stack each time the run_task function was called. We pre-allocate this memory once in a constructor.

Branching hints. We inserted GCC-specific unlikely/likely macros in several if-clauses. These get translated to platform-specific instructions which instruct the processor to always try the given branch first instead of using its built-in branch prediction.

Modulo operations. We replaced all modulo operations in the data path by compare-and-subtract operations (which are far less expensive in terms of processing time).

Queue optimization. In the ToDPDKDevice Click element, we replaced the inefficient implementation of the queue (which used std::vector from the C++ standard library) by the rte_ring structure provided by DPDK.

Timer event scheduler optimization. In the Click timer event scheduler, we have optimized the code to reduce the number of clock_gettime system calls. This allowed us to reduce the latency in short element chains to the native level.

6 Evaluation

6.1 Experimental Setup

Testbed. We benchmark our system using two machines: (1) a load generator, and (2) an SGX-enabled machine. The load generator is a Broadwell Intel Xeon D-1540 (2.00 GHz, 8 cores, 16 hyperthreads) machine with 32 GB RAM. The SGX machine under test is an Intel Xeon E3-1270 v5 (3.60 GHz, 4 cores, 8 hyperthreads) with 32 GB RAM running Linux kernel 4.4. Each core has private 32 KB L1 and 256 KB L2 caches, and all cores share an 8 MB L3 cache. The load generator is connected to the test machine using a 40 GbE Intel XL710 network card. We use pktgen-dpdk for throughput testing. The test machine is configured to use the maximum number of cores. The load generator saturates the link starting with 128-byte packets.

Applications. For the microbenchmarks, we use three basic Click elements: (1) Wire, which sends the packet immediately after receiving it; (2) EtherMirror, which sends the packet after swapping the source and destination addresses; and (3) Firewall, which does packet filtering based



Figure 5: Wire throughput w/ varying packet size

Figure 6: EtherMirror throughput w/ varying packet size

on PCAP-like rules. The Firewall element is configured with 10 rules.

For the case studies, we evaluated Slick using two applications: (1) a multiport IPRouter, and (2) an IDS.

Methodology. For the performance measurements, we consider several configurations of our system:

• Native (SGX-independent), with and without generic optimizations.
• SGX-enabled Slick, with and without optimizations.
• SGX-enabled Slick with the on-NIC timer.

6.2 Throughput

We first measure the throughput of the three microbenchmark applications with varying packet size. Slick runs on four cores. Figure 5, Figure 6, and Figure 7 present the throughput for Wire, EtherMirror, and Firewall, respectively.

The results show that the performance of Slick matches the performance of Click. In the case of the Wire application with packet sizes smaller than 256 bytes, Slick is better than the native version. This is explained by the fact that the Click timer event scheduler optimization, which removes some system call overhead from the Wire application, is missing in native Click. The impact is smaller with the other applications, because they contain elements that reduce the relative overhead of the Click scheduler. We also see that Slick achieves line rate at 512-byte packets.

Figure 7: Firewall throughput w/ varying packet size

Figure 8: Wire throughput w/ increasing cores

6.3 Scalability

We next consider the scalability of our system with an increasing number of cores. (Our latest SGX-enabled server has a maximum of 4 cores / 8 hyperthreads. The recently released Intel X-Series will have up to 18 cores.) Figure 8, Figure 9, and Figure 10 present the throughput for Wire, EtherMirror, and Firewall, respectively. The scalability of both Slick and Click is limited. We can see that the performance of both native Click and Slick peaks at four cores. There are several reasons for this:

• DPDK and Slick work best with hyperthreading disabled.
• Slick runs system call threads to execute system calls asynchronously. When there are no system calls for a long time, they back off (i.e., yield the CPU). They wake up from back-off periodically, and as a result, they preempt in-enclave threads.

6.4 Latency

To evaluate the influence of Slick's runtime environment on latency, we have also measured the packet processing latency using the following scheme: the load generator runs an application that continuously generates a UDP packet and spin-waits for its return. We study the packet round-trip time measured at the load generator. On the Slick instance, we run the EtherMirror application. (We omit the results for other applications due to space constraints.) For these measurements, we did not perform any latency-specific tuning of the environment other than thread pinning, which is enabled by default in DPDK. We expect that a production system with stringent low-latency requirements will use



Figure 9: EtherMirror throughput w/ increasing cores

Figure 10: Firewall throughput w/ increasing cores

the SCHED_FIFO scheduler and isolated cores. Figure 11 presents the latency measurements for EtherMirror.

The low performance of Slick without optimizations is explained by the fact that Slick executes clock_gettime system calls in the timer event scheduling code. Scone system calls are optimized for raw throughput with a large number of threads, but not for low latency; this makes the latency measurement result 3× worse than the native execution. We have considered the following latency optimizations:

• Reduced system call rate for immediately scheduled timer events. This removes one system call round-trip from the packet latency.
• A modified scheduler that prioritizes immediately scheduled events and allows removing a system call from the scheduler if there are no periodic timer events.

One surprising result is that neither of these optimizations has a statistically significant influence when applied individually, which can be explained by the fact that once the system call thread has left back-off mode, it executes system calls with low individual overhead. On the other hand, when applied simultaneously, they return the latency to almost-native levels: the influence of SGX and Scone on the latency is extremely small.

We consider using the NIC timer as a separate optimization.

One can see that reading the NIC timer is a costly operation; it happens twice per packet in our measurements, adding approximately 0.9 × 2 = 1.8 µsec to the total latency. On the

Figure 11: EtherMirror latency measurement

Phase: Average duration, µsec
Attestation: 19467
CAS communication: 19301
LAS communication: 1474
Configuration: 825.6
Total time: 26368

Table 3: Overheads of configuration and attestation

other hand, it is much faster than executing clock-reading system calls, and can further improve system timeliness when combined with the other optimizations.

6.5 Configuration and Attestation

We next evaluate the overheads of the configuration and attestation service in Slick. The measurement results are presented in Table 3. The results show that remote attestation has a negligible effect on the Slick startup time. Furthermore, even though TLS session establishment is a costly operation, it is performed once per instance start-up, allowing the operator to use a single CAS node for thousands of Slick instances.

6.6 ToEnclave Overhead

We next measure the throughput of the new secure ToEnclave element added in Slick, which is used to copy packet data into the SGX-protected enclave memory. We evaluate the impact of this extra data copy by measuring the throughput scaling with varying packet size. Figure 12 and Figure 13 show the results for Wire and EtherMirror, respectively.

As we can see, the overhead of the extra memory copy peaks at small packet sizes. This is due to the fact that, for each received packet, operations with rather high overhead must be executed to allocate the packet. One way to reduce this cost would be to batch the memory allocation for all packets. It can also be seen that the overhead of Slick compared to the native execution is relatively small: Slick with ToEnclave runs within 88% of the native version with the extra memory copy in the worst case of small packet sizes, and within 60% of native Click without the ToEnclave element.

6.7 NFV Chaining

To measure the throughput of the NFV chaining scheme, we have implemented a chaining application. The chaining application implements packet communication between



Figure 12: Throughput w/ ToEnclave + Wire

Figure 13: Throughput w/ ToEnclave + EtherMirror

two Slick instances running on the same machine through a DPDK packet ring. One instance contains an application that receives packets from the network and sends them to the other instance via the DPDKRing element. The second instance receives packets from the ring and sends them back through a different DPDKRing element. These packets are received by the first Slick instance, forming a circular ring. Thereafter, the packets are transmitted back to the load-generating node. Please note that the packets cross the rings twice. The chaining application showcases the worst-case throughput, since the Click elements do not perform any computation on the network packets.

Figure 14 presents the throughput with varying packet size for the NFV chaining application. The results show that using the ring communication causes a substantial performance drop for Slick, independent of the optimizations. This is mostly related to the way Scone runs enclaves: it must allocate a constantly running thread for each of the service threads created by Slick and DPDK. Due to this, there is interference between the service threads and the processing cores, which decreases the throughput and also increases the variance of the results.

6.8 Packet Sealing Performance

We next evaluate the throughput of the Seal/Unseal secure elements. In particular, we use our AES-GCM encryption code running inside the SGX enclave (the same code is used for the middlebox state persistence). Figure 15 presents the throughput of the Seal element with varying packet size. The result shows that the code inside the SGX enclave runs within 88% of the native performance irrespective of the optimizations applied. This is explained by the fact that most

Figure 14: NFV chaining application throughput

Figure 15: Seal application throughput

of the application CPU time is spent doing the encryption. The difference between the native and SGX versions can be explained by the different thread scheduling strategies used by Scone and native POSIX. In POSIX, threads are pinned to the real CPU cores, while in Scone, the userspace threads inside the enclave are pinned to the in-enclave kernel threads. This makes thread pinning non-deterministic: sometimes two threads that are meant to be pinned to different cores end up pinned to sibling hyperthreads.

6.9 Case Studies

We next evaluate Slick's performance with the following two case studies: (1) IPRouter, and (2) IDS.

IPRouter. The IPRouter application is an adaptation of a multiport router Click example application to our evaluation hardware. This application first classifies all packets into three categories: ARP requests, ARP replies, and all other packets. ARP requests are answered. ARP replies are dropped. Other packets are passed to a routing table element that sends them to the NIC output port. Figure 16 shows the throughput of the IPRouter application with varying packet size. We can see that Slick has the same performance as Click for packet sizes bigger than 256 bytes, and performs within 90% of Click for smaller packets.

We also measured the latency of the IPRouter application, as presented in Figure 17. We can see that even though the number of elements in the application increases, the latency of the application remains the same as for the native execution.

Intrusion Detection System (IDS). The IDS application implements an NF that is commonly found in enterprise networks. It pushes the traffic through the firewall, and then performs



Figure 16: IPRouter throughput measurements

Figure 17: IPRouter latency measurements

Figure 18: IDS throughput measurements

traffic scanning with the HyperScan element. Traffic that does not match any pattern is sent to the output, while matching traffic passes through a counter and is then dropped.

Slick performs close to the native Click execution, with a slight performance drop. This drop comes from the general SGX overhead for memory accesses. Due to space constraints, we omit the latency measurement results for the IDS.

7 Related Work

Software middleboxes. Click's [23] modular architecture has been leveraged in the research community to build many useful software-based middlebox platforms [10, 12, 19, 27, 29]. Our work also builds on the Click architecture, but unlike the previous work, Slick focuses on securing the Click architecture on untrusted commodity hardware.

However, most Click-based middleboxes operate at the L2-L3 layer, with the notable exception of CliMB [26]. To support flow-based abstractions, many state-of-the-art middleboxes [8, 9, 17, 28, 40] support comprehensive applications and use cases. Since both Click and DPDK are geared toward L2-L3 network processing, our current architecture does not support L4-L7 network functions. As part of future work, we plan to integrate a high-performance user-level networking stack [18] into the Scone framework to support the development of secure higher-layer middleboxes.

Secure middleboxes. APLOMB [42] is one of the first systems to show that it is a viable alternative, performance- and cost-wise, to outsource middleboxes from the enterprise environment to the cloud. However, APLOMB does not consider the security implications of outsourcing to the cloud.

To overcome the limitation of APLOMB, the follow-up

systems, namely Embark [25] and BlindBox [43], advocate network data processing over encrypted traffic. In particular, BlindBox [43] proposes an encryption scheme based on garbled circuits to support string matching operations over encrypted traffic. However, BlindBox supports only a restrictive set of middlebox functionalities, such as NFs for deep packet inspection. To overcome this limitation, Embark [25] extends BlindBox to support a wider range of middlebox functions. However, Embark suffers from prohibitively low performance, as it involves complex cryptographic computations over the encrypted network traffic. In contrast, Slick supports a wide range of network functions (the same as for plain-text traffic), and achieves near-native throughput and latency.

Recently published workshop papers [15, 22] have elaborated on the challenges and potential uses of SGX in network applications. In the domain of network-intensive applications, SGX-Tor [21] is one of the first systems to use SGX to enhance the security and privacy of Tor. In a similar vein, CBR [33] leverages SGX to support privacy-preserving routing. Building on this line of work, Slick is the first comprehensive system that uses Intel SGX to secure middleboxes.

Shielded execution. Shielded execution provides strong security guarantees for legacy applications running on untrusted platforms [11, 24, 35, 38, 44, 46]. Our work leverages shielded execution based on Intel SGX. It is worth noting that, unlike prior uses of shielded execution for commonly used services like HTTP servers or KV stores, we need to adapt shielded execution to process network traffic at line rate. To achieve this, Slick is the first system to integrate a high-speed packet I/O framework [2, 16, 34] with shielded execution.

8 Conclusion

In this paper, we presented the design, implementation, and evaluation of Slick, a secure middlebox framework for deploying high-performance Network Functions (NFs) on untrusted commodity servers. Slick exposes a generic interface based on Click to design and implement a wide range of NFs using its out-of-the-box elements and C++ extensions. To securely process data at line rate, Slick is the first system to integrate a high-performance I/O processing library (Intel



DPDK) with a shielded execution framework (Scone) based on Intel SGX. We have also added several new useful features and optimizations for secure network processing. Our evaluation using a wide range of NFs and case studies shows that Slick achieves near-native throughput and latency.

Acknowledgements. This project was funded by the European Union's Horizon 2020 program under grant agreements No. 645011 (SERECA), No. 690111 (SecureCloud), and No. 690588 (SELIS).

References

[1] Docker Hub. https://hub.docker.com/.

[2] Intel DPDK. http://dpdk.org/.

[3] Intel Software Guard Extensions Remote Attestation End-to-End Example. https://software.intel.com/en-us/articles/intel-software-guard-extensions-remote-attestation-end-to-end-example.

[4] musl-cross-make. https://github.com/richfelker/musl-cross-make.

[5] Snort. https://www.snort.org/.

[6] perf: Linux profiling with performance counters. https://perf.wiki.kernel.org/index.php/Main_Page.

[7] wolfSSL Library. https://www.wolfssl.com/.

[8] A. Alim, R. G. Clegg, L. Mai, L. Rupprecht, E. Seckler, P. Costa, P. Pietzuch, A. L. Wolf, N. Sultana, J. Crowcroft, A. Madhavapeddy, A. W. Moore, R. Mortier, M. Koleni, L. Oviedo, M. Migliavacca, and D. McAuley. FLICK: Developing and Running Application-Specific Network Services. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2016.

[9] J. W. Anderson, R. Braud, R. Kapoor, G. Porter, and A. Vahdat. xOMB: Extensible Open Middleboxes with Commodity Servers. In Proceedings of the Eighth ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), 2012.

[10] B. Anwer, T. Benson, N. Feamster, and D. Levin. Programming Slick Network Functions. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research (SOSR), 2015.

[11] A. Baumann, M. Peinado, and G. Hunt. Shielding Applications from an Untrusted Cloud with Haven. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014.

[12] A. Bremler-Barr, Y. Harchol, and D. Hay. OpenBox: A Software-Defined Framework for Developing, Deploying, and Managing Network Functions. In Proceedings of the 2016 ACM SIGCOMM Conference, 2016.

[13] S. Checkoway and H. Shacham. Iago Attacks: Why the System Call API is a Bad Untrusted RPC Interface. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2013.

[14] V. Costan and S. Devadas. Intel SGX Explained. Cryptology ePrint Archive, Report 2016/086, 2016.

[15] M. Coughlin, E. Keller, and E. Wustrow. Trusted Click: Overcoming Security Issues of NFV in the Cloud. In Proceedings of the ACM International Workshop on Security in Software Defined Networks & Network Function Virtualization (SDN-NFVSec), 2017.

[16] S. Han, K. Jang, K. Park, and S. Moon. PacketShader: A GPU-accelerated Software Router. In Proceedings of the ACM SIGCOMM 2010 Conference, 2010.

[17] M. A. Jamshed, Y. Moon, D. Kim, D. Han, and K. Park. mOS: A Reusable Networking Stack for Flow Monitoring Middleboxes. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017.

[18] E. Jeong, S. Wood, M. Jamshed, H. Jeong, S. Ihm, D. Han, and K. Park. mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2014.

[19] M. Kablan, B. Caldwell, R. Han, H. Jamjoom, and E. Keller. Stateless Network Functions. In Proceedings of the 2015 ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox), 2015.

[20] G. P. Katsikas, G. Q. Maguire Jr., and D. Kostic. Profiling and Accelerating Commodity NFV Service Chains with SCC. Journal of Systems and Software, 2017.

[21] S. Kim, J. Han, J. Ha, T. Kim, and D. Han. Enhancing Security and Privacy of Tor's Ecosystem by Using Trusted Execution Environments. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017.

[22] S. Kim, Y. Shin, J. Ha, T. Kim, and D. Han. A First Step Towards Leveraging Commodity Trusted Execution Environments for Network Applications. In Proceedings of the 14th ACM Workshop on Hot Topics in Networks (HotNets), 2015.

[23] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router. In ACM Transactions on Computer Systems (TOCS), 2000.

[24] D. Kuvaiskii, O. Oleksenko, S. Arnautov, B. Trach, P. Bhatotia, P. Felber, and C. Fetzer. SGXBounds: Memory Safety for Shielded Execution. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys), 2017.

[25] C. Lan, J. Sherry, R. A. Popa, S. Ratnasamy, and Z. Liu. Embark: Securely Outsourcing Middleboxes to the Cloud. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016.

[26] R. Laufer, M. Gallo, D. Perino, and A. Nandugudi. CliMB: Enabling Network Function Composition with Click Middleboxes. In Proceedings of the 2016 Workshop on Hot Topics in Middleboxes and Network Function Virtualization (HotMiddlebox), 2016.

[27] B. Li, K. Tan, L. L. Luo, Y. Peng, R. Luo, N. Xu, Y. Xiong, P. Cheng, and E. Chen. ClickNP: Highly Flexible and High Performance Network Processing with Reconfigurable Hardware. In Proceedings of the 2016 ACM SIGCOMM Conference, 2016.

[28] L. Mai, L. Rupprecht, A. Alim, P. Costa, M. Migliavacca, P. Pietzuch, and A. L. Wolf. NetAgg: Using Middleboxes for Application-specific On-path Aggregation in Data Centres. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies (CoNEXT), 2014.

[29] J. Martins, M. Ahmed, C. Raiciu, V. Olteanu, M. Honda, R. Bifulco, and F. Huici. ClickOS and the Art of Network Function Virtualization. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2014.

[30] V. A. Olteanu and C. Raiciu. Efficiently Migrating Stateful Middleboxes. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), 2012.

[31] S. Palkar, C. Lan, S. Han, K. Jang, A. Panda, S. Ratnasamy, L. Rizzo, and S. Shenker. E2: A Framework for NFV Applications. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP), 2015.

[32] B. Parno, J. M. McCune, and A. Perrig. Bootstrapping Trust in Commodity Computers. In Proceedings of the 2010 IEEE Symposium on Security and Privacy (Oakland), 2010.

[33] R. Pires, M. Pasin, P. Felber, and C. Fetzer. Secure Content-Based Routing Using Intel Software Guard Extensions. In arXiv, 2017.

[34] L. Rizzo. netmap: A Novel Framework for Fast Packet I/O. In 2012 USENIX Annual Technical Conference (USENIX ATC), 2012.

[35] S. Arnautov et al. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016.

[36] N. Santos, K. P. Gummadi, and R. Rodrigues. Towards Trusted Cloud Computing. In Proceedings of the 2009 Conference on Hot Topics in Cloud Computing (HotCloud), 2009.



[37] N. Santos, R. Rodrigues, K. P. Gummadi, and S. Saroiu. Policy-sealed Data: A New Abstraction for Building Trusted Cloud Services. In Proceedings of the 21st USENIX Conference on Security Symposium (USENIX Security), 2012.

[38] F. Schuster, M. Costa, C. Fournet, C. Gkantsidis, M. Peinado, G. Mainar-Ruiz, and M. Russinovich. VC3: Trustworthy Data Analytics in the Cloud Using SGX. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (Oakland), 2015.

[39] M. Schwarz, S. Weiser, D. Gruss, C. Maurice, and S. Mangard. Malware Guard Extension: Using SGX to Conceal Cache Attacks. In arXiv, 2017.

[40] V. Sekar, N. Egi, S. Ratnasamy, M. K. Reiter, and G. Shi. Design and Implementation of a Consolidated Middlebox Architecture. In 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2012.

[41] J. Sherry, P. X. Gao, S. Basu, A. Panda, A. Krishnamurthy, C. Maciocco, M. Manesh, J. Martins, S. Ratnasamy, L. Rizzo, and S. Shenker. Rollback-Recovery for Middleboxes. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM), 2015.

[42] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, and V. Sekar. Making Middleboxes Someone Else's Problem: Network Processing As a Cloud Service. In Proceedings of the ACM SIGCOMM 2012 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), 2012.

[43] J. Sherry, C. Lan, R. A. Popa, and S. Ratnasamy. BlindBox: Deep Packet Inspection over Encrypted Traffic. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM), 2015.

[44] S. Shinde, D. L. Tien, S. Tople, and P. Saxena. Panoply: Low-TCB Linux Applications With SGX Enclaves. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2017.

[45] L. Soares and M. Stumm. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation (OSDI), 2010.

[46] C.-C. Tsai, D. Porter, and M. Vij. Graphene-SGX: A Practical Library OS for Unmodified Applications on SGX. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC), 2017.

[47] W. Wu, K. He, and A. Akella. PerfSight: Performance Diagnosis for Software Dataplanes. In Proceedings of the 2015 Internet Measurement Conference (IMC), 2015.

[48] Y. Xu, W. Cui, and M. Peinado. Controlled-Channel Attacks: Deterministic Side Channels for Untrusted Operating Systems. In Proceedings of the 2015 IEEE Symposium on Security and Privacy (Oakland), 2015.
