
Translation Cache Policies for Dynamic Binary Translation


Page 1: Translation Cache Policies for Dynamic Binary Translation

Responsible: Prof. Frédéric Pétrot

Supervisor: Luc Michel

TIMA Laboratory - SLS Group

Grenoble, France

Translation cache policies for dynamic binary translation

Ecole Nationale des Sciences de l'Informatique

Saber Ferjani

Page 2: Translation Cache Policies for Dynamic Binary Translation

DBT (dynamic binary translation) is a CPU simulation technique: it reads a short sequence of target code, translates it, and executes it on a different CPU (the host).

[Figure: host machine, simulated target CPU, translation of asm code]

Page 3: Translation Cache Policies for Dynamic Binary Translation

Translation cache: a buffer in the host machine that stores the translated blocks (TBs).

[Figure: translation cache holding a sequence of TBs]
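As a rough illustration of how translation and the translation cache fit together, here is a minimal sketch of a DBT execution loop. It assumes a toy target whose code is a list of (opcode, operand) pairs; the instruction set and the names `translate_block` and `run` are hypothetical and are not taken from Qemu.

```python
from typing import Callable, Dict, List, Tuple

# Toy target program (an assumption for illustration): "add" adds its
# operand to an accumulator, "jmp" jumps to an index, "halt" stops.
Program = List[Tuple[str, int]]
HostBlock = Callable[[int], Tuple[int, int]]


def translate_block(program: Program, pc: int) -> HostBlock:
    """Translate the straight-line code starting at pc into a host callable.

    The callable maps an accumulator value to (new accumulator, next pc);
    a next pc of -1 means the target halted.
    """
    total = 0
    while True:
        op, arg = program[pc]
        if op == "add":
            total += arg
            pc += 1
        elif op == "jmp":
            next_pc = arg
            break
        else:  # "halt"
            next_pc = -1
            break
    return lambda acc: (acc + total, next_pc)


def run(program: Program, max_blocks: int = 100) -> Tuple[int, int]:
    """Execute the program block by block; return (accumulator, translations)."""
    cache: Dict[int, HostBlock] = {}  # translation cache: target pc -> TB
    translations = 0
    acc, pc = 0, 0
    for _ in range(max_blocks):
        if pc == -1:
            break
        tb = cache.get(pc)
        if tb is None:  # cache miss: translate once, then reuse
            tb = translate_block(program, pc)
            cache[pc] = tb
            translations += 1
        acc, pc = tb(acc)
    return acc, translations
```

In this sketch a loop body that is executed many times is translated only once, which is the benefit the translation cache is meant to provide.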

Page 4: Translation Cache Policies for Dynamic Binary Translation

Outline

1. Virtualization and simulation techniques

2. Qemu internals

3. Typical cache algorithms

4. Cache algorithm proposal

5. Simulation results

6. Conclusion & Perspectives


Page 5: Translation Cache Policies for Dynamic Binary Translation

1. Virtualization and simulation techniques


1.1. Just In Time Compiler

Page 6: Translation Cache Policies for Dynamic Binary Translation

1. Virtualization and simulation techniques


1.2. Hosted & Native Hypervisors

Page 7: Translation Cache Policies for Dynamic Binary Translation

1. Virtualization and simulation techniques


1.3. Virtualization tools

VirtualBox

Virtual PC

VMware

Xen

Bochs

Valgrind

Qemu

KVM

Page 8: Translation Cache Policies for Dynamic Binary Translation

1. Virtualization and simulation techniques


1.4. Simulation techniques

Interpretive technique ► Extremely slow

Native simulation ► Needs the source code

Binary translation:

Static ► Cannot handle indirect branches

Dynamic ► Quite fast and flexible

Page 9: Translation Cache Policies for Dynamic Binary Translation

2. Qemu internals


2.1. Overview

Generic, open-source machine emulator

Created by Fabrice Bellard in 2003

Supported targets: IA32, ARM, SPARC, MIPS, PPC…

Page 10: Translation Cache Policies for Dynamic Binary Translation

2. Qemu internals


2.2. Execution flow example

Page 11: Translation Cache Policies for Dynamic Binary Translation

2. Qemu internals


2.3. Main execution loop

Page 12: Translation Cache Policies for Dynamic Binary Translation

2. Qemu internals


2.4. Translation cache size

Page 13: Translation Cache Policies for Dynamic Binary Translation

2. Qemu internals


2.5. TB allocation

Page 14: Translation Cache Policies for Dynamic Binary Translation

3. Typical cache algorithms


Optimal cache algorithm (offline)

Basic cache algorithms:

Flush, Random, FIFO, LRU, LFU

Advanced cache algorithms:

LRFU, 2Q, LIRS, ARC

Qemu constraints:

TBs are not movable

TB size is variable

TB size is unpredictable
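To make these constraints concrete, the following sketch shows the simplest policy in the list above, Flush, on a buffer where TBs are placed one after another and never moved. It is an illustration under stated assumptions, not Qemu's code: the class name `FlushCache`, its fields, and the sizes are hypothetical.

```python
class FlushCache:
    """Bump-pointer code buffer that is flushed entirely when it is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.next_free = 0   # offset at which the next TB will be placed
        self.blocks = {}     # target pc -> (offset, size)
        self.flushes = 0

    def lookup(self, pc: int):
        """Return (offset, size) of the TB for pc, or None on a miss."""
        return self.blocks.get(pc)

    def insert(self, pc: int, size: int) -> int:
        """Store a freshly translated TB; its size is known only at this point."""
        if size > self.capacity:
            raise ValueError("TB larger than the whole cache")
        if self.next_free + size > self.capacity:
            # No room left: drop every TB at once rather than evicting one,
            # since variable-size, immovable TBs would leave unusable holes.
            self.blocks.clear()
            self.next_free = 0
            self.flushes += 1
        offset = self.next_free
        self.blocks[pc] = (offset, size)
        self.next_free += size
        return offset
```

Insertion is constant time, but a single flush discards hot and cold TBs alike, which is the cost a finer policy tries to avoid.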

Page 15: Translation Cache Policies for Dynamic Binary Translation

4. Cache algorithm proposal


4.1. Algorithm design

Page 16: Translation Cache Policies for Dynamic Binary Translation

4. Cache algorithm proposal


4.2. Data structure

Constant insertion overhead

Frequently referenced TBs are selected for re-translation into a separate cache area

Page 17: Translation Cache Policies for Dynamic Binary Translation

4. Cache algorithm proposal


4.3. HST update

Before each CSA flush, add to the HST the addresses of all TBs that were executed more than $F_{th}$ times

The HST is used as a circular buffer

The HST size is fixed to half of the HSA size

[Figure: HST holding the addresses @HS1 to @HS5]
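The HST update can be sketched as follows. This is only an illustration: it assumes that sizes are counted in entries, that per-TB execution counts are available as a dictionary, and that an address already in the HST is not added twice. The names `new_hst`, `update_hst`, `exec_counts` and `f_th` are hypothetical.

```python
from collections import deque


def new_hst(hsa_entries: int) -> deque:
    """Create an HST whose size is half of the HSA size (assumed in entries)."""
    return deque(maxlen=hsa_entries // 2)


def update_hst(hst: deque, exec_counts: dict, f_th: int) -> deque:
    """Before a CSA flush, record every TB executed more than f_th times."""
    for addr, count in exec_counts.items():
        # Skipping duplicates is an assumption of this sketch.
        if count > f_th and addr not in hst:
            # A full deque drops its oldest entry: circular-buffer behaviour.
            hst.append(addr)
    return hst
```

For instance, `update_hst(new_hst(8), {0x10: 5, 0x20: 1}, 2)` keeps only `0x10`, because `0x20` was executed just once.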

Page 18: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results

5.1. Qemu log

Qemu monitor: console interface for back-end configuration

Log options:

out_asm: show generated host code

in_asm: show target assembly code

exec: show trace before each executed TB

…etc

Log generated by (log exec):

Trace (Host Address) [(Target Address)]
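A trace in this form can be turned into per-TB execution counts with a few lines of Python. The sketch below assumes each line looks like `Trace 0x4000a0 [c0100000]`, i.e. a hexadecimal host address followed by a hexadecimal target address in brackets; the exact layout varies between Qemu versions, so the regular expression is an assumption to adapt. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout: "Trace <host address> [<target address>]".
TRACE_RE = re.compile(r"^Trace\s+(0x[0-9a-fA-F]+)\s+\[([0-9a-fA-F]+)\]")


def parse_trace(lines):
    """Yield (host_address, target_address) for every matching Trace line."""
    for line in lines:
        match = TRACE_RE.match(line)
        if match:
            yield int(match.group(1), 16), int(match.group(2), 16)


def execution_counts(lines) -> Counter:
    """Count how many times each target address was executed."""
    return Counter(target for _host, target in parse_trace(lines))
```

Lines that do not match, such as other log output mixed into the file, are skipped silently.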

Page 19: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results


5.2. TB-trace: Translation cache simulator

Page 20: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results


5.3. Simulated cache algorithms

• A-LRU

• A-LFU

• A-2Q

[Figure: diagram of each algorithm, with areas labelled LRU, LFU, CSA and HSA, and an HST of addresses (@)]
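For reference, the plain LRU and LFU policies that these variants build on can be compared on a trace of TB addresses. The sketch below is a simplification: it counts cache capacity in entries and ignores TB sizes, so it does not model the A-LRU, A-LFU or A-2Q variants themselves. The function names are hypothetical.

```python
from collections import Counter, OrderedDict


def lru_hits(trace, capacity: int) -> int:
    """Count hits when the least recently used entry is evicted."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[addr] = True
    return hits


def lfu_hits(trace, capacity: int) -> int:
    """Count hits when the least frequently used entry is evicted."""
    counts = Counter()  # reference count of each cached address
    hits = 0
    for addr in trace:
        if addr in counts:
            hits += 1
        elif len(counts) >= capacity:
            victim = min(counts, key=counts.get)  # lowest reference count
            del counts[victim]
        counts[addr] += 1
    return hits
```

On a trace where one address is reused heavily and two others alternate, LFU keeps the hot address while LRU evicts it, so the two policies can give different hit counts for the same capacity.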

Page 21: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results

5.4. Guest machines used with Qemu

LZMA benchmark

Linux Kernel

Windows XP start-up

Page 22: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results


5.5. Guest 1: LZMA benchmark over Debian

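[Figure: two bar charts, CSA flushes and hotspot hit, for LRU, LFU and 2Q at Quota = 0.25, 0.375 and 0.5. Bar labels in extraction order. CSA flushes: 62, 89, 72, 50, 55, 52, 56, 68, 88. Hotspot hit: 18.5%, 39.6%, 26.1%, 86.9%, 91.3%, 90.1%, 81.8%, 81.9%, 81.8%.]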

Page 23: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results


5.6. Guest 2: Linux kernel 2.6.20

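[Figure: two bar charts, CSA flushes and hotspot hit, for LRU, LFU and 2Q at Quota = 0.25, 0.375 and 0.5. Bar labels in extraction order. CSA flushes: 15, 18, 22, 15, 17, 21, 16, 19, 23, with two bars annotated "+1 HSA flush". Hotspot hit: 24.1%, 32.1%, 43.6%, 24.4%, 61.9%, 57.4%, 30.0%, 64.1%, 65.2%.]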

Page 24: Translation Cache Policies for Dynamic Binary Translation

5. Simulation results


5.7. Guest 3: Windows XP start-up

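[Figure: two bar charts, CSA flushes and hotspot hit, for LRU, LFU and 2Q at Quota = 0.25, 0.375 and 0.5. Bar labels in extraction order. CSA flushes: 15, 18, 21, 15, 17, 21, 16, 19, 24, with three bars annotated "+1 HSA flush". Hotspot hit: 16.0%, 45.2%, 52.1%, 23.4%, 56.5%, 51.4%, 29.0%, 45.3%, 64.7%.]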

Page 25: Translation Cache Policies for Dynamic Binary Translation

6. Conclusion & Perspectives

6.1. Conclusion

The Qemu translation cache is inefficient

Cache algorithms based on page replacement cannot be used

Advantages of the proposed algorithm:

Reduces unneeded re-translations

TB insertion overhead is constant

Drawbacks:

Invalidated TBs remain allocated

The address lookup cost depends on the HST size

Page 26: Translation Cache Policies for Dynamic Binary Translation

6. Conclusion & Perspectives

6.2. Perspectives

Use a hash function for the HST to accelerate TB lookup before each new translation

Use an op-code buffer to accelerate the re-translation of hot-spot TBs

Estimate the size of the next translation, and try to overwrite an invalidated TB
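The first perspective, hashing the HST, could look like the sketch below: the circular buffer keeps the replacement order, while a hash set answers the lookup that precedes each translation. This is one possible design under assumptions, not the author's implementation; the class name `HashedHST` and its methods are hypothetical.

```python
from collections import deque


class HashedHST:
    """Fixed-size circular buffer of hot TB addresses with a hash index."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.ring = deque()   # insertion order, oldest entry on the left
        self.index = set()    # same addresses, for fast membership tests

    def add(self, addr: int) -> None:
        """Record a hot TB address, overwriting the oldest entry when full."""
        if addr in self.index:
            return
        if len(self.ring) == self.size:
            oldest = self.ring.popleft()
            self.index.discard(oldest)
        self.ring.append(addr)
        self.index.add(addr)

    def __contains__(self, addr: int) -> bool:
        # Average constant time, independent of the HST size.
        return addr in self.index
```

The extra set costs memory proportional to the HST size, in exchange for a lookup whose cost no longer grows with that size.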

Page 27: Translation Cache Policies for Dynamic Binary Translation


Questions?