Architecture Support for Dynamic Integrity Checking
2015. 6. 2
Bonhyun Koo
Arun K. Kanuparthi (1), Ramesh Karri (1), Gaston Ormazabal (2), Sateesh K. Addepalli (3)
(1) Polytechnic Institute of NYU, Brooklyn, NY, USA
(2) Columbia University, New York, NY, USA
(3) Cisco Systems, San Jose, CA, USA
IEEE Transactions on Information Forensics and Security (2012)
Contents
2/20
1. Introduction
2. Background
3. Dynamic Integrity Checker (DIC)
4. DIC Design
5. Experiment and Evaluation
6. Conclusion
Problem and Contributions
Problem
Existing TPM Architectures do not support runtime integrity checking
Vulnerability : TOCTOU (Time of Check-Time of Use) Attacks
Contribution
1) Integrity Checker Module with a superscalar pipeline
2) Architecture for Dynamic Integrity Checking (Dynamic Execution traces)
3) Optimizations to reduce performance impact (without compromising the security of the system)
4) Evaluation of the proposed scheme (using a cycle-accurate simulator)
3/20
1. Introduction
- Trusted Platform Module (TPM)
4/20
- Trusted Platform Module (developed by the Trusted Computing Group, TCG): a separate chip that addresses security concerns. The TPM acts as a root of trust for checking platform integrity at boot time, so it guarantees boot-time security only.
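[ Example of Integrity Verification ]
Figure labels: (PC) TPM module, with integrity measurement and RSA key pair generation. The PCR in the TPM holds H(PA), H(PB), H(PC), ..., H(PN). For Program C, a nonce is issued, the response is E_PukC(Nonce, H(PC)), and D_PriC(E_PukC(Nonce, H(PC))) is computed so that the recovered H(PC) can be compared.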
2. Background
TOCTOU Threat Model
5/20
TOC (t_i): time of check
TOU (t_i+k): time of use
An attacker can exploit the interval between TOC and TOU by changing the system state after it has been checked and before it is used.
One approach to counter such TOCTOU threats is to check the integrity of the instructions being executed frequently, by calculating their hash.
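The gap between TOC and TOU can be illustrated with a minimal Python sketch. The byte string standing in for a program image, the helper name `digest`, and the use of SHA-1 are all assumptions made for illustration; they are not taken from the paper.

```python
import hashlib

def digest(code: bytes) -> str:
    """Hash of a code region (SHA-1 is used here only for illustration)."""
    return hashlib.sha1(code).hexdigest()

# Hypothetical program image and the reference hash recorded ahead of time.
memory = bytearray(b"mov r1, r2; add r1, 4; ret")
reference = digest(bytes(memory))

# Time of check (t_i): the image matches the reference.
checked_ok = digest(bytes(memory)) == reference

# Between t_i and t_i+k the attacker overwrites part of the image.
memory[0:3] = b"jmp"

# Time of use (t_i+k): a verdict taken only at t_i is stale,
# while a check repeated at use time sees the modification.
stale_verdict = checked_ok
fresh_verdict = digest(bytes(memory)) == reference
```

The stale verdict still reports the image as intact, which is why the check has to be repeated close to the point of use.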
2. Background
Vulnerabilities of Computer Systems
6/20
Stack smashing attack
Cold boot attack
No attack on the superscalar processor is assumed: disk and main memory are insecure, but the processor is secure.
2. Background
7/20
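Cryptographic Hash Function
Hash Function (MAC & MDC)
MAC (keyed): MAC = h(k, m)
MDC (unkeyed): MDC = h(m)
[ MAC : Integrity Checking ] m is sent together with h(k, m); the receiver recomputes h(k, m) with the key k and compares the two values (?=).
[ Integrity + Confidentiality (Encryption) ] m || h(k1, m) is encrypted as E_k2(m || h(k1, m)); the receiver decrypts with k2 (D), recomputes h(k1, m) with k1, and compares the two values (?=).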
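The difference between the keyed and unkeyed variants can be sketched in Python. HMAC-SHA1 is swapped in here as one standard way to build h(k, m); the slides do not say which keyed construction is meant, and the key and messages are hypothetical.

```python
import hashlib
import hmac

def mdc(m: bytes) -> bytes:
    """Unkeyed hash: MDC = h(m)."""
    return hashlib.sha1(m).digest()

def mac(k: bytes, m: bytes) -> bytes:
    """Keyed hash: MAC = h(k, m), built here with HMAC (an assumption)."""
    return hmac.new(k, m, hashlib.sha1).digest()

def verify_mac(k: bytes, m: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time (the '?=' box)."""
    return hmac.compare_digest(mac(k, m), tag)

key = b"shared-secret"            # hypothetical key
message = b"basic block bytes"    # hypothetical protected data
tag = mac(key, message)

# An attacker who alters the message can recompute an MDC for it,
# but cannot produce a matching MAC without the key.
tampered = b"basic block bytez"
forged_mdc = mdc(tampered)
mdc_detects = mdc(tampered) != forged_mdc
mac_detects = not verify_mac(key, tampered, tag)
```

With the unkeyed hash the forged pair passes, so an MDC only helps when the hash itself is stored where the attacker cannot rewrite it.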
2. Background
Different Dynamic Integrity Schemes
8/20
1. REM (Basic Block): comparison with a precomputed hash at execution time (AES-128)
2. XOM (Instruction): session keys to encrypt/decrypt different programs (DES); the program is decrypted instruction by instruction
3. CODESSEAL (Basic Block): hash memory in an FPGA placed between memory and cache
4. SPEF (Basic Block): compiler and micro-architecture modification
3. Dynamic Integrity Checker (DIC)
A. Motivation (choose the optimum granularity level)
10/20
Example) 403.gcc (ref. input 166.i), SPEC CPU2006 benchmark
250x increase in the total execution cycles over the baseline (caused by DIC)
3. Dynamic Integrity Checker (DIC)
B. DIC of Program Traces
11/20
CFG (Control Flow Graph); BB: Basic Block
Edge labels in the CFG figure: T (1), N (0)
There are six possible traces:
#  Trace Path
1  BB0-BB1-BB4-BB5
2  BB0-BB1-BB3
3  BB0-BB1-BB3-BB5
4  BB0-BB2
5  BB0-BB2-BB3
6  BB0-BB2-BB3-BB5
0x00400000: the starting address of the first instruction
TraceID: 0x00400000-010
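A minimal Python sketch of how such a TraceID string could be assembled. It assumes that the suffix after the dash records one bit per branch on the path (T = 1, N = 0), which is how the T(1)/N(0) labels read; the function name `trace_id` is hypothetical.

```python
from typing import Sequence

def trace_id(start_address: int, branch_outcomes: Sequence[bool]) -> str:
    """Join the trace's starting address with its branch-outcome bits.

    Assumption: each branch on the path contributes one bit,
    1 for taken (T) and 0 for not taken (N).
    """
    bits = "".join("1" if taken else "0" for taken in branch_outcomes)
    return f"0x{start_address:08X}-{bits}"

# Not taken, taken, not taken, starting at the first instruction's address.
example = trace_id(0x00400000, [False, True, False])
# example == "0x00400000-010"
```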
4. DIC Design
12/20
A. Application Profiling and Trace Generation (Compile-Time)
B. Interaction with the Pipeline (DIC) (Run-Time)
C. Hash Storage Hierarchy: Hash Trace Cache (HTC)
D. Prefetching: load-time prefetching, to reduce the impact of cold start
4. DIC Design
A. Compile-Time Preprocessing : Application Profiling and Trace Generation
13/20
(1) Generate a list of all basic blocks of the program.
(2) Construct the control flow graph (CFG) of the program, where each node is a basic block.
(3) Enumerate all traces up to a given length (in terms of number of basic blocks).
(4) Profile the program by applying test inputs to count the number of times each trace is encountered.
(5) Order the traces by their frequency of execution.
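Steps (3) to (5) can be sketched in Python as follows. The CFG, the observed trace list, and the rule that every trace starts at the entry block are simplifying assumptions for illustration, not the paper's exact trace definition.

```python
from collections import Counter

# Hypothetical CFG: basic block -> successor blocks (steps 1 and 2 assumed done).
CFG = {"BB0": ["BB1", "BB2"], "BB1": ["BB3"], "BB2": ["BB3"], "BB3": []}

def enumerate_traces(cfg, entry, max_len):
    """Step 3: all paths from `entry` containing at most `max_len` blocks."""
    traces = []

    def walk(path):
        traces.append(tuple(path))
        if len(path) == max_len:
            return
        for successor in cfg[path[-1]]:
            walk(path + [successor])

    walk([entry])
    return traces

def order_by_frequency(observed):
    """Steps 4 and 5: count each observed trace, most frequent first."""
    return [trace for trace, _ in Counter(observed).most_common()]

all_traces = enumerate_traces(CFG, "BB0", max_len=3)

# Hypothetical profiling run: traces seen while applying test inputs.
observed = [("BB0", "BB1", "BB3")] * 3 + [("BB0", "BB2", "BB3")]
ranked = order_by_frequency(observed)
```

The ranked list is what later stages can use to decide which trace hashes deserve the fastest storage.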
4. DIC Design
B. DIC : Interaction with the Pipeline
14/20
Architecture of the proposed scheme
(1) Tag all instructions with a pending bit. (※ The pending bit is cleared at commit.)
(2) Build the TraceID (e.g. 0x00400000-010...).
(3) The DIC initiates a fetch of the encrypted hash from the disk, using the TraceID.
(4) The DIC calculates the hash of the trace identified by the TraceID built in step (2).
(5) The DIC decrypts the encrypted hash using the RSA key (hardwired in the DIC) and compares it with the hash it calculated.
    If the hashes are equal: commit and clear the pending bit.
    If the hashes are not equal: execution is aborted.
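The five steps can be approximated in software with the Python sketch below. It is a simplified model under stated assumptions: the RSA decryption of step (5) is left out (the stored hash is treated as already decrypted), SHA-1 over the raw instruction bytes stands in for the DIC's hash unit, and the class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Instruction:
    encoding: bytes
    pending: bool = True  # step 1: every instruction starts out pending

class IntegrityError(Exception):
    """Raised when a trace hash does not match; models aborting execution."""

def check_trace(instructions, trace_id, hash_store):
    """Steps 3 to 5, with RSA decryption omitted (assumption)."""
    expected = hash_store[trace_id]           # step 3: fetch by TraceID
    running = hashlib.sha1()                  # step 4: hash the trace
    for instruction in instructions:
        running.update(instruction.encoding)
    if running.digest() != expected:          # step 5: compare
        raise IntegrityError(trace_id)
    for instruction in instructions:          # equal: allow commit
        instruction.pending = False

# Hypothetical trace and its precomputed hash.
trace = [Instruction(b"\x01\x02"), Instruction(b"\x03\x04")]
store = {"0x00400000-010": hashlib.sha1(b"\x01\x02\x03\x04").digest()}
check_trace(trace, "0x00400000-010", store)
```

After the call every instruction in `trace` has its pending bit cleared; a trace whose bytes differ from the hashed ones raises `IntegrityError` and keeps its pending bits set.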
4. DIC Design
B. DIC : Interaction with the Pipeline
15/20
Case: A (the hash comparison) completes before B (the instructions reach the head of the reorder buffer, ROB): no performance degradation
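The timing argument can be written as a one-line calculation. This Python sketch assumes a simplified model in which an instruction can commit only once it is at the ROB head and its trace check has finished; the function name and cycle numbers are hypothetical.

```python
def commit_stall_cycles(hash_check_done: int, reaches_rob_head: int) -> int:
    """Cycles an instruction waits at the ROB head for its trace check."""
    return max(0, hash_check_done - reaches_rob_head)

# Check finishes first (A before B): the check is hidden, no stall.
no_stall = commit_stall_cycles(hash_check_done=40, reaches_rob_head=55)

# Check finishes late (for example after a slow hash fetch): commit stalls.
stalled = commit_stall_cycles(hash_check_done=300, reaches_rob_head=55)
```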
4. DIC Design
C. Hash Storage Hierarchy
16/20
Performance problem (disk access): if each disk access consumes such a large number of cycles, instructions will queue up at the commit stage, and performance will degrade to an unacceptable level.
One solution to this problem is a storage hierarchy for hashes.
Hash Trace Cache (HTC): it stores only the hashes of traces predetermined at compile time.
The main goal of the HTC is to cache hashes fetched from the disk.
On an HTC hit, the DIC does not need to generate a cryptographic hash and make the comparison.
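A minimal Python sketch of a hash lookup through such a hierarchy. The three levels (HTC, main memory, disk) and the latencies are taken from the evaluation slides; adding the latencies of each level that is visited, the FIFO eviction, and the class name `HashStore` are assumptions made for illustration.

```python
HTC_LOOKUP_CYCLES = 1         # "hash searching" latency from the setup slide
MEMORY_CYCLES = 200           # main memory access
DISK_CYCLES = 2_656_250       # disk access

class HashStore:
    """Toy model: HTC in front of main memory in front of disk."""

    def __init__(self, disk, htc_capacity):
        self.disk = dict(disk)
        self.memory = {}
        self.htc = {}
        self.htc_capacity = htc_capacity

    def fetch(self, trace_id):
        """Return (hash, cycles spent), filling faster levels on the way."""
        cycles = HTC_LOOKUP_CYCLES
        if trace_id in self.htc:
            return self.htc[trace_id], cycles
        cycles += MEMORY_CYCLES
        if trace_id in self.memory:
            value = self.memory[trace_id]
        else:
            cycles += DISK_CYCLES
            value = self.disk[trace_id]
            self.memory[trace_id] = value
        if len(self.htc) >= self.htc_capacity:
            # FIFO eviction (assumption; the slides do not name a policy).
            self.htc.pop(next(iter(self.htc)))
        self.htc[trace_id] = value
        return value, cycles

hash_store = HashStore({"A": b"hash-of-A"}, htc_capacity=2)
_, first_access = hash_store.fetch("A")    # cold: goes all the way to disk
_, second_access = hash_store.fetch("A")   # warm: HTC hit
```

In this model the first access costs the disk latency once, and every later access to the same trace costs only the HTC lookup.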
4. DIC Design
D. Prefetching (to reduce the impact of Cold Start)
17/20
Example) Trace Access Pattern
AAAAA....AAAAAABCFFFFFF....FFFFFXYGGGG..GG
TraceID  Frequency
A        100
F        70
G        30
B        1
C        1
X        1
Y        1
Load-time prefetching: an easy and cost-effective way to get rid of compulsory misses
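The effect of load-time prefetching on compulsory misses can be sketched in Python using the frequencies from the table. The slot count, the function names, and the idea of preloading the most frequent traces first are assumptions for illustration.

```python
# Frequencies from the table above.
PROFILE = {"A": 100, "F": 70, "G": 30, "B": 1, "C": 1, "X": 1, "Y": 1}

def prefetch_set(profile, slots):
    """Pick the `slots` most frequently executed traces to preload."""
    ranked = sorted(profile, key=profile.get, reverse=True)
    return ranked[:slots]

def compulsory_misses(access_pattern, preloaded):
    """Count first-time accesses to traces that were not preloaded."""
    seen = set(preloaded)
    misses = 0
    for trace in access_pattern:
        if trace not in seen:
            misses += 1
            seen.add(trace)
    return misses

# Access pattern matching the table: A x100, B, C, F x70, X, Y, G x30.
pattern = "A" * 100 + "BC" + "F" * 70 + "XY" + "G" * 30

preloaded = prefetch_set(PROFILE, slots=3)
cold = compulsory_misses(pattern, preloaded=[])
warm = compulsory_misses(pattern, preloaded=preloaded)
```

Preloading A, F and G removes the first-access misses for the traces that account for nearly all executions; only the four rare traces still miss once each.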
5. Experimental Result
A. Experimental Setup
18/20
Simulator : Zesto (DIC, HTC implementation)
Intel Core2 Architecture, Nehalem
Benchmark : (1) SPEC CPU2006 (runspec), (2) BioBench, (3) STREAM (GNU gcc)
Profiling : exp-bbv (basic block generation tool)
- Hash design
Hash comparison : 2 cycles, Hash search : 1 cycle
HTC size : 32KB
- Access Time
Main memory : 200 cycles, Encryption/Decryption : 150 cycles
Disk : 2,656,250 cycles
5. Experimental Result
19/20
HTC Effectiveness (Baseline : without DIC)
Values shown in the chart: 54% (1.54x), 42%, 35%, 32%
HTC (35%) + main memory (17%)
35%, 17.8%
5. Experimental Result
20/20
Worst case : 11.2%
Average : 8.03%
HTC + Main memory + Prefetching
Hit Rate comparison
#1 Scheme : 32 KB direct-mapped HTC only
#2 Scheme : HTC + main memory
#3 Scheme : 4-way set-associative + prefetching
Average : 97%
Lowest : 94%
- Appendix -
21/10
Example) Basic Block Trace
OpenFile ReadFile
22/10
2. Background
23/20
RSA Public-Key Encryption Algorithm
SHA-1 Hash Algorithm
Overall Sequence Flow
24/10
[ Figure: overall SHA-1 sequence flow ]
Padding: the message (K bits, K < 2^64) is followed by the padding 100...0 and a 64-bit message length field. The message plus padding fills the last block up to bit 448, so the padded input is L x 512 bits = N x 32 bits.
Processing: each 512-bit input block is expanded into W0~W79 (80 words of 32 bits) and processed in 80 steps. The 160-bit initial state (A, B, C, D, E: five 32-bit words) becomes the 160-bit internal state after each block, and the 160-bit final state is the 160-bit hash value.
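The padding rule can be checked with a short Python sketch that works on whole bytes (so the leading 1 bit of the padding is the byte 0x80). The function name `sha1_pad` is hypothetical.

```python
def sha1_pad(message: bytes) -> bytes:
    """Append 1, zeros, and the 64-bit message length (in bits)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                        # the single 1 bit
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zeros up to bit 448
    padded += bit_length.to_bytes(8, "big")           # 64-bit length field
    return padded

padded_abc = sha1_pad(b"abc")
```

A 3-byte message fits in one 512-bit block, while a 56-byte message no longer leaves room for the length field and spills into a second block.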
25/10
SHA-1 Hash Algorithm
Generation of W_t
512-bit input block: W0 W1 W2 W3 W4 W5 W6 W7 W8 W9 W10 W11 W12 W13 W14 W15
W16 to W79 are each produced by XORing four earlier words and rotating the result left by 1 bit:
W_t = ROTL1(W_(t-16) ⊕ W_(t-14) ⊕ W_(t-8) ⊕ W_(t-3)), for t = 16, ..., 79
Example: W79 = ROTL1(W63 ⊕ W65 ⊕ W71 ⊕ W76)
26/10
SHA-1 Hash Algorithm
The 512-bit input block is mixed into the 160-bit internal state (80 steps: step 0, step 1, ..., step 79).
Internal state before processing one block: 160 bits (buffers A, B, C, D, E, 32 bits each)
After step 79, each buffer is added (+) to its value from before the block was processed.
Internal state after processing one block: 160 bits (buffers A, B, C, D, E, 32 bits each)
27/10
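SHA-1 Hash Algorithm
SHA-1: processing in each step
Internal state before one step: 160 bits (buffers A, B, C, D, E, 32 bits each)
Internal state after one step: 160 bits (buffers A, B, C, D, E, 32 bits each)
Elements of one step: the logical function f_t, a 5-bit rotation, a 30-bit rotation, the value W_t that depends on the input block and the step (32 bits), and the constant K_t that depends on the step (32 bits)
+ : addition modulo 2^32
Initial value of buffer A: 67 45 23 01
Initial value of buffer B: EF CD AB 89
Initial value of buffer C: 98 BA DC FE
Initial value of buffer D: 10 32 54 76
Initial value of buffer E: C3 D2 E1 F0
Logical functions (here · is AND, + is OR, ~ is NOT, ⊕ is XOR):
f0~f19 = (B · C) + (~B · D)
f20~f39 = B ⊕ C ⊕ D
f40~f59 = (B · C) + (C · D) + (D · B)
f60~f79 = B ⊕ C ⊕ D
Constants:
K0~K19 = 5A 82 79 99
K20~K39 = 6E D9 EB A1
K40~K59 = 8F 1B BC DC
K60~K79 = CA 62 C1 D6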
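The step processing can be put together into a compact Python sketch and compared with the standard library's SHA-1. Padding and W_t expansion are inlined so the block runs on its own; the function names are hypothetical, and the step update follows the standard SHA-1 definition rather than anything specific to these slides.

```python
import hashlib

MASK = 0xFFFFFFFF  # keeps every addition modulo 2^32
INITIAL_STATE = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def f_and_k(t, b, c, d):
    """Logical function f_t and constant K_t for step t."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (c & d) | (d & b), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6

def compress(state, block):
    """Mix one 512-bit block into the 160-bit state in 80 steps."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 80):
        w.append(rotl(w[t - 16] ^ w[t - 14] ^ w[t - 8] ^ w[t - 3], 1))
    a, b, c, d, e = state
    for t in range(80):
        f, k = f_and_k(t, b, c, d)
        temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    # Add the working buffers back onto the state from before the block.
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))

def sha1(message):
    """Pad the message, then fold every 512-bit block into the state."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    state = INITIAL_STATE
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(word.to_bytes(4, "big") for word in state)

matches_library = sha1(b"abc") == hashlib.sha1(b"abc").digest()
```

Comparing against `hashlib.sha1` is a quick way to confirm that the initial values, f_t, and K_t listed above were transcribed correctly.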