Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
UT DALLAS Erik%Jonsson%School%of%Engineering%&%Computer%Science
FEARLESS engineering
SGX BigMatrixA Practical Encrypted Data Analytic Framework with Trusted
Processors
Fahad Shaon Murat Kantarcioglu Zhiqiang LinLatifur Khan
The University of Texas at Dallas
FEARLESS engineering 1 / 49
Problem - Secure Data Analytics on Cloud
Result
Code & Data
I We want to utilize cloud environment for data analytics
I Service provider can observe the data
I Problematic for sensitive data (e.g., medical, financial data)
FEARLESS engineering 2 / 49
Problem - Secure Data Analytics on Cloud
Encrypted Result
Encrypted Code & Data
I We outsource encrypted sensitive data
I However, encrypted data is difficult to analyze
FEARLESS engineering 3 / 49
Problem - Secure Data Analytics - Approaches
Homomorphic Encryption
I Theoretically robust andprovides highest level ofsecurity
I High computational cost
I Impractical for large dataprocessing
Trusted Hardware
I Cost effective
I Provides reasonable security
I Intel SGX is available in allnew processors
I Needs careful considerationof side channel attacks
FEARLESS engineering 4 / 49
Objective of the work
Create a data analytics platform utilizing trustedprocessor, which is - secure, practical, generalpurpose, and scalable.
FEARLESS engineering 5 / 49
State of the Art
ObliVM (Liu et al., 2015)
I Provides a language and covert the logic into circuit
I Difficult to perform analysis on large data set
Oblivious Multi-party ML (Ohrimenko et al., 2016)
I Performs important machine learning algorithms using SGX
I Specific for set of algorithms
Opaque (Zheng et al., 2017)
I Oblivious and encrypted distributed analytics platform usingApache Spark and Intel SGX (mainly focused on supportingSQL)
FEARLESS engineering 6 / 49
Background - Intel SGX
I SGX stands for Software Guard Extensions
I SGX is new Intel instruction set
I Allows us to create secure compartment inside processor,called Enclave
I Privileged softwares, such as, OS, Hypervisor, can’t directlyobserve data and computation inside enclave
FEARLESS engineering 7 / 49
Background - Intel SGX - Attack Surface
I SGX essentially reduce the attack surface to processor andenclave code
OS
VMM
Hardware
App App App
Attack Surface
Attack surface of traditionalcomputation system
OS
VMM
App App App
Hardware
Attack Surface
Attack surface with SGX
FEARLESS engineering 8 / 49
Background - Intel SGX - Attack Surface
I SGX essentially reduce the attack surface to processor andenclave code
OS
VMM
Hardware
App App App
Attack Surface
Attack surface of traditionalcomputation system
OS
VMM
App App App
Hardware
Attack Surface
Attack surface with SGX
FEARLESS engineering 8 / 49
Background - Intel SGX Application
Untrusted Part
of App
Trusted Part
of App
I We only trust the processor and the code inside theenclave (Intel, 2015)
FEARLESS engineering 9 / 49
Background - Intel SGX Impact
SGX Server
Encrypted Result
Encrypted Code & Data
I We can outsource computation securely
I No need to trust the cloud provider (i.e. Hypervisor, OS,Cloud administrators)
FEARLESS engineering 10 / 49
Threat Model
Server
Memory Processor
Enclave
Disk
Code & Data
Result
I Adversary can control OS (i.e. memory, disk, networking)
I Adversary can not temper with enclave code
I Adversary can not observe CPU register content
FEARLESS engineering 11 / 49
Challenges - Obliviousness
Challenge: Access Pattern Leakage
I SGX uses system memory, which is controlled by the adversary
I Adversary can observe memory accesses
I Memory access reveals a lot about the data (Islam, Kuzu, andKantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)
Solution
I To reduce information leakage we ensure Data Obliviousness
FEARLESS engineering 12 / 49
Challenges - Obliviousness
Challenge: Access Pattern Leakage
I SGX uses system memory, which is controlled by the adversary
I Adversary can observe memory accesses
I Memory access reveals a lot about the data (Islam, Kuzu, andKantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)
Solution
I To reduce information leakage we ensure Data Obliviousness
FEARLESS engineering 12 / 49
Data Obliviousness - Example
I Program executes same path for all input of same size
Example: Non-Oblivious swap method of Bitonic sort
if (dir == (arr[i] > arr[j])) {
int h = arr[i];
arr[i] = arr[j];
arr[j] = h;
}
FEARLESS engineering 13 / 49
Data Obliviousness - Example
I Program executes same path for all input of same size
Example: Non-Oblivious swap method of Bitonic sort
if (dir == (arr[i] > arr[j])) {
int h = arr[i];
arr[i] = arr[j];
arr[j] = h;
}
FEARLESS engineering 13 / 49
Data Obliviousness - Example (Cont.)
Example: Oblivious swap method of Bitonic sort
int x = arr[i];
int y = arr[j];
_asm{
...
mov eax , x
mov ebx , y
mov ecx , dir
cmp ebx , eax
setg dl
xor edx , ecx
mov eax , x
mov ecx , y
mov ebx , y
mov edx , x
cmovz eax , ecx
cmovz ebx , edx
mov [x], eax
mov [y], ebx
}
FEARLESS engineering 14 / 49
Data Obliviousness - Challenges
Challenge
I Building data obliviousness solution is non-trivial
I Requires a lot of time and effort
Solution
I We provide our own python (NumPy, Pandas) inspiredlanguage that ensures data obliviousness
FEARLESS engineering 15 / 49
Data Obliviousness - Challenges
Challenge
I Building data obliviousness solution is non-trivial
I Requires a lot of time and effort
Solution
I We provide our own python (NumPy, Pandas) inspiredlanguage that ensures data obliviousness
FEARLESS engineering 15 / 49
Data Oblivious - Vectorization
I We removed if and emphasis on vectorization
Example: Compute average income of people with age >= 50
sum = 0, count = 0
for i = 0 to Person.length:
if Person.age >= 50:
count++
sum += P.income
print sum / count
FEARLESS engineering 16 / 49
Data Oblivious - Example
Example: Compute average income of people with age >= 50
S = where(Person , "Person[‘age ’] >= 50")
print (S .* Person[‘income ’] ) / sum(S)
FEARLESS engineering 17 / 49
Challenge - Memory constraint
Challenge
I Current version of SGX (v1) allows only 90MB of memoryallocation
Solution
I We build flexible data blocking mechanism with efficientand secure caching
I We build matrix manipulation library that supports blockingand we call the abstraction BigMatrix
FEARLESS engineering 18 / 49
Challenge - Memory constraint
Challenge
I Current version of SGX (v1) allows only 90MB of memoryallocation
Solution
I We build flexible data blocking mechanism with efficientand secure caching
I We build matrix manipulation library that supports blockingand we call the abstraction BigMatrix
FEARLESS engineering 18 / 49
Security Properties - Summary
I Individual operations in our system is data oblivious
I Combination of oblivious operations is also oblivious
I Compiler warns user about potential leakage
I We perform optimization based on publicly knowninformation, e.g. data size
FEARLESS engineering 19 / 49
System Overview - SGX BigMatrix
Untrusted Trusted
CompilerBlock SizeOptimizer
Service ManagerBigMatrix Library
Intel SGX SDK
Execution Engine
BlockCache
OCalls
ECalls
Compiler
BMRT Client
ServerClient
SGX BigMatrix
FEARLESS engineering 20 / 49
BigMatrix Library
Untrusted Trusted
CompilerBlock SizeOptimizer
Service ManagerBigMatrix Library
Intel SGX SDK
Execution Engine
BlockCache
OCalls
ECalls
Compiler
BMRT Client
ServerClient
SGX BigMatrix - BigMatrix Library
FEARLESS engineering 21 / 49
BigMatrix Library
Operations in BigMatrix Library
I Data access operations - load, publish, get row, etc.
I Matrix Operations - inverse, multiply, element wise,transpose, etc.
I Relational Algebra Operations - where, sort, join, etc.
I Data generation operations - rand, zeros, etc.
I Statistical Operations - norm, var
FEARLESS engineering 22 / 49
BigMatrix Library - Security Properties
I All the operations are data oblivious
I All the operations supports blocking
I We proved that combination of data oblivious operations isalso data oblivious (in Section 4)
I Data oblivious and blocking aware implementation details inAppendix A
FEARLESS engineering 23 / 49
BigMatrix Library - Trace
I Each operation has fixed trace
I Trace is the information disclosed to adversary duringexecution
I For example: operation type, input and output data size
Example: Trace of Matrix Multiplication C = A ∗BI Instruction type (i.e. multiplication)
I Input Matrices size (i.e., A.rows,A.cols, B.rows,B.cols)
I Output Matrix size (i.e., C.rows,C.cols)
I Block size
I Oblivious memory read and write sequences, which does notdepend on data content
FEARLESS engineering 24 / 49
BigMatrix Library - Trace
I Each operation has fixed trace
I Trace is the information disclosed to adversary duringexecution
I For example: operation type, input and output data size
Example: Trace of Matrix Multiplication C = A ∗BI Instruction type (i.e. multiplication)
I Input Matrices size (i.e., A.rows,A.cols, B.rows,B.cols)
I Output Matrix size (i.e., C.rows,C.cols)
I Block size
I Oblivious memory read and write sequences, which does notdepend on data content
FEARLESS engineering 24 / 49
Exec. Engine & Block Cache
Untrusted Trusted
CompilerBlock SizeOptimizer
Service ManagerBigMatrix Library
Intel SGX SDK
Execution Engine
BlockCache
OCalls
ECalls
Compiler
BMRT Client
ServerClient
SGX BigMatrix - Execution Engine and Block Cache
FEARLESS engineering 25 / 49
Exec. Engine & Block Cache
Execution Engine
I Execute BigMatrix library operations
I Parse instruction in the form of
Var ASSIGN Operation (Var, Var, ...)
I Process sequence of instructions
I Maintain intermediate states required to execute complexprogram, such as, variable to BigMatrix assignments
Block Cache
I Help with the decision when to remove a block from memorybased on next sequence of instructions
FEARLESS engineering 26 / 49
Exec. Engine & Block Cache - Security Properties
I Execution Engine and Block Cache is also data obliviousgiven the input program is data oblivious
I Compiler warns about potential data leakage
I Adversary can not infer anything more about data, apart fromthe trace of all the operations
FEARLESS engineering 27 / 49
Compiler
Untrusted Trusted
CompilerBlock SizeOptimizer
Service ManagerBigMatrix Library
Intel SGX SDK
Execution Engine
BlockCache
OCalls
ECalls
Compiler
BMRT Client
ServerClient
SGX BigMatrix - Compiler
FEARLESS engineering 28 / 49
Compiler
I Compiles our python inspired language into basic command
I It ensures data obliviousness by removing support for if
I We emphasis on operation vectorization
Input: Linear Regression
x = l o a d ( ‘ path / to / X Matr ix ’ )y = l o a d ( ‘ path / to / Y Matr ix ’ )x t = t r a n s p o s e ( x )t h e t a = i n v e r s e ( x t ∗ x ) ∗ x t ∗ yp u b l i s h ( t h e t a )
FEARLESS engineering 29 / 49
Compiler - Output
Output: Linear Regression
x = l o a d ( X M a t r i x I D )y = l o a d ( Y M a t r i x I D )x t = t r a n s p o s e ( x )t1 = m u l t i p l y ( xt , x )u n s e t ( x )t2 = i n v e r s e ( t1 )u n s e t ( t1 )t3 = m u l t i p l y ( t2 , x t )u n s e t ( x t )u n s e t ( t2 )t h e t a = m u l t i p l y ( t3 , y )u n s e t ( y )u n s e t ( t3 )p u b l i s h ( t h e t a )
FEARLESS engineering 30 / 49
Compiler - Track data leakage
I We report against accidental data leakage through trace
I We check if any sensitive data is used in trace of any operation
I In our system, sensitive data - content of any BigMatrix,content of intermediate variables
Example
X = load(‘path/to/X_Matrix ‘)
s = count(where(X[1] >= 0))
Y = zeros(s, 1)
publish(Y)
We report that zeros operation revealing sensitive data s
FEARLESS engineering 31 / 49
SQL Support
I We also support basic SQL
Input
I = sql(‘SELECT *
FROM person p
JOIN person_income pi (1)
ON p.id = pi.id
WHERE p.age > 50
AND pi.income > 100000 ’)
FEARLESS engineering 32 / 49
SQL Support (Cont.)
Output
t1 = where(person , ’C:3;V:50;O:=’)
# person.age is in column 3
t2 = zeros(person.rows , 2)
set_column(t2, 0, t3)
t3 = get_column(person , 0)
# person.id is in column 0
set_column(t2, 1, t1)
t4 = where(person_income , ’C:1;V:100000;O:=’)
t5 = zeros(person_income.rows , 2)
set_column(t5, 0, t6)
t6 = get_column(person_income , 0)
# person_income.id is in column 0
set_column(t5, 1, t4)
A = join(t3, t5, ’c:t1.0;c:t2.0;O:=’, 1)
...FEARLESS engineering 33 / 49
Block Size Optimizer
Untrusted Trusted
CompilerBlock SizeOptimizer
Service ManagerBigMatrix Library
Intel SGX SDK
Execution Engine
BlockCache
OCalls
ECalls
Compiler
BMRT Client
ServerClient
SGX BigMatrix - Block Size Optimizer
FEARLESS engineering 34 / 49
Block Size Optimizer - Intro & Design Decisions
I We observed that input block size has impact onperformances of the system
I Adversary doesn’t gain any knowledge about data based onblock size
I So, we find optimum block size for each instruction beforeexecuting a program
I We explicitly do not want to perform optimization insideenclave because
I Optimization libraries are large and complex, which canintroduce unintended security flaws
I Any efficient optimization algorithm will reveal informationabout data
I So we only perform optimization on trace data, nothing else
FEARLESS engineering 35 / 49
Block Size Optimizer - Overview
I We generate DAG of execution graphI Internal nodes represent operationsI Edges represent block conversions
I We know cost for each operation for different matrix andblock size
I Given input matrix sizes we can find optimized block size
I We can convert one block configuration to another and knowthe cost of conversion
FEARLESS engineering 36 / 49
Block Size Optimizer - Example - Linear Regression
I Execution graph (DAG) of Θ = (XTX)−1XTY in linerregression training phase
FEARLESS engineering 37 / 49
Block Size Optimizer - Example - LR Cost Function
Cost = Convert(X, (brX , bcX), (x0, x1))
+ OP Cost(′Transpose′, X, (x0, x1))
+ Convert(XT , (x1, x0), (x2, x3))
+ Convert(X, (brX , bcX), (x4, x5))
+ OP Cost(′Multiply′, [XT , X], [(x2, x3), (x4, x5)])
+ ...
We convert this into integer programming and solve it for all thexn variables.
FEARLESS engineering 38 / 49
Experimental Evaluations
We implemented a prototype using Intel SGX SDK and observeperformance of different operations
Setup
I Processor Intel Core i7 6700
I Memory 64GB
I OS Windows 7
I SGX SDK Version 1.0
I Number of Machine 1
FEARLESS engineering 39 / 49
Performance Impact - Matrix Size
0
200000
400000
600000
800000
1x106
1.2x106
1.4x106
0
5x10
6
1x10
7
1.5
x107
2x10
7
2.5
x107
Mat
rix
Mu
ltip
lica
tio
n T
ime
(ms)
Matrix Elements
UnencryptedEncrypted
Matrix Multiplication(e.g. C = A ∗B)
150000
200000
250000
300000
350000
400000
450000
500000
550000
600000
650000
700000
750000
800000
850000
900000
950000
1x10
6
Join
tim
e (m
s)
Matrix Elements
UnencryptedEncrypted
Oblivious Join
FEARLESS engineering 40 / 49
Performance Impact - Matrix Size - Summary
I We observe similar trends for all matrix operations
I We observe minimal overhead for encrypted computation
I However, the overhead depends on operation type
I More experimental evaluations in Section 5
FEARLESS engineering 41 / 49
Performance Impact - Block Size
Execution Time
100 200
300 400
500 100
200
300
400
500
140
145
150
155
160
Sca
lar
Oper
atio
n T
ime
(ms)
Scalar Multiplication
Execution Time
100 200
300 400
500 100
200
300
400
500
18000
18400
18800
19200
19600
20000
Mat
rix M
ult
ipli
cati
on T
ime
(ms)
Matrix Multiplication
FEARLESS engineering 42 / 49
Performance Impact - Block Size - Summary
I We observe execution time increases with block size
I Also, very small block size increases execution time, due toblocking overhead
I As a result, we performed optimization
FEARLESS engineering 43 / 49
Comparison with ObliVM
I We compare performance of SGX-BigMatrix with ObliVM fortwo-party matrix multiplication
I We observe that SGX-BigMatrix is magnitude faster becausewe are utilizing hardware and do not require expensive overthe network communication
Matrix ObliVM BigMatrix BigMatrixDimension SGX Enc. SGX Unenc.
100 28s 660ms 10ms 10ms250 7m 0s 90ms 93ms 88ms500 53m 48s 910ms 706.66ms 675.66ms750 2h 59m 40s 990ms 2s 310ms 2s 260ms
1,000 6h 34m 17s 900ms 10s 450ms 10s 330ms
Table: Two-party matrix multiplication time in ObliVM vs BigMatrix
FEARLESS engineering 44 / 49
Case Studies - Page Rank
I Performed Page Rank on three popular datasets
I Each dataset contains directed graph
Data Set Nodes BigMatrix Encrypted
Wiki-Vote 7,115 97s 560msAstro-Physics 18,772 6m 41s 200msEnron Email 36,692 23m 19s 700ms
Table: Page Rank on real datasets
FEARLESS engineering 45 / 49
Conclusion
I We propose a practical data analytics framework with SGX
I We present BigMatrix abstraction to handle large matrices inconstrained environment
I We proposed a programming abstraction for secure dataanalytics
I We applied our system to solve real world problems
FEARLESS engineering 46 / 49
Thank You
Questions / Comments
I Fahad Shaon - [email protected]
I Murat Kantarcioglu - [email protected]
I Zhiqiang Lin - [email protected]
I Latifur Khan - [email protected]
FEARLESS engineering 47 / 49
References I
Intel (2015). Presentation for Intel SGX: ISCA 2015. url: https://software.intel.com/sites/default/files/332680-
002.pdf.Islam, Mohammad Saiful, Mehmet Kuzu, and Murat Kantarcioglu
(2012). “Access Pattern disclosure on Searchable Encryption:Ramification, Attack and Mitigation.” In: NDSS. Vol. 20, p. 12.
Liu, Chang et al. (2015). “Oblivm: A programming framework forsecure computation”. In: Security and Privacy (SP), 2015 IEEESymposium on. IEEE, pp. 359–376.
Naveed, Muhammad, Seny Kamara, and Charles V Wright (2015).“Inference attacks on property-preserving encrypted databases”.In: Proceedings of the 22nd ACM SIGSAC Conference onComputer and Communications Security. ACM, pp. 644–655.
FEARLESS engineering 48 / 49
References II
Ohrimenko, Olga et al. (2016). “Oblivious Multi-Party MachineLearning on Trusted Processors”. In: 25th USENIX SecuritySymposium (USENIX Security 16). Austin, TX: USENIXAssociation, pp. 619–636. isbn: 978-1-931971-32-4. url:https://www.usenix.org/conference/usenixsecurity16/
technical-sessions/presentation/ohrimenko.Zheng, Wenting et al. (2017). “Opaque: A Data Analytics
Platform with Strong Security”. In: 14th USENIX Symposiumon Networked Systems Design and Implementation (NSDI 17).Boston, MA: USENIX Association. url:https://www.usenix.org/conference/nsdi17/technical-
sessions/presentation/zheng.
FEARLESS engineering 49 / 49