Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
VXA: A Virtual Architecture forDurable Compressed Archives
Bryan FordComputer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
http://pdos.csail.mit.edu/~baford/vxa/
The Ubiquity of Data Compression
Everything is compressed these days– Archive/Backup/Distribution: ZIP, tar.gz, ...
– Multimedia streams: mp3, ogg, wmv, ...
– Office documents: XMLinZIP
– Digital cameras: JPEG, proprietary RAW, ...
– Video camcorders: DV, MPEG2, ...
Compressed Data Formats
Observation #1:Data compression formats evolve rapidly
— SQ
— L
U—
ARC
— Z
OO
— L
Harc
— Z
IP
— bz
ip2
— gz
ip
— co
mpress
— 7z
Lossless Compression—
RAR
1980 1985 1990 1995 2000 2005
Compressed Data Formats
Observation #1:Data compression formats evolve rapidly
— SQ
— L
U—
ARC
— Z
OO
— L
Harc
— Z
IP
— bz
ip2
— gz
ip
— co
mpress
— 7z
Lossless Compression—
RAR
— M
PEG2
— D
V—
WM
V7
Video Encoding
Audio Encoding
— M
PEG1
— M
PEG4
— M
P3
— R
ealA
udio
— A
AC
— FLAC
— V
orbis
— Sore
nson
— Q
uickT
ime
— JP
EG
— G
IF
— PNG
— JP
EG 2000
Image Encoding—
TIF
F
— W
MA7
— W
MA9
— W
MV9
— W
MV8
— FLIC
— W
AV
— A
IFF
— 8S
VX
— A
NIM
— A
IFFC
— IL
BM
— PCX
— T
GA
— B
MP
1980 1985 1990 1995 2000 2005
Compressed Data Formats
Observation #1:Data compression formats evolve rapidly
Problems:– Inconvenient:
each new algorithm requires decoder install/upgrade
– Impedes data portability:data unusable on systems without supported decoder
– Threatens longterm data usability:old decoders may not run on new operating systems
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
Fully Backward Compatible Extensions
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
— 68
000
— M
IPS
— A
RM
— PAR
ISC
— Pow
erPC
— D
EC Alph
a
— It
anium
Other Architectures—
SPARC
— 68
000
— M
IPS
— A
RM
— PAR
ISC
— Pow
erPC
— D
EC Alph
a
— It
anium
Other Architectures—
SPARC
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
— 68
000
— M
IPS
— A
RM
— PAR
ISC
— Pow
erPC
— D
EC Alph
a
— It
anium
Other Architectures—
SPARC
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
— 68
000
— M
IPS
— A
RM
— PAR
ISC
— Pow
erPC
— D
EC Alph
a
— It
anium
Other Architectures—
SPARC
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
— 68
000
— M
IPS
— A
RM
— PAR
ISC
— Pow
erPC
— D
EC Alph
a
— It
anium
Other Architectures—
SPARC
Archiving Compressed Data
Observation #2:Processor architectures evolve more conservatively
1980 1985 1990 1995 2000 2005
x86 Architecture—
8086
— 80
386
(32b
it)
— x8
664
(64b
it)
— SSE
(FP ve
ctor)
— M
MX
(int
vecto
r)
Itani
c
VXA: Virtual Executable Archives
Observation 1+2: Instruction formats are historically more durable than compressed data formats
Make archive selfextracting (data + executable decoder)
To extract data, archive reader runs embedded decoder
ArchiveWriter
ArchiveReader
D D
Archive
Encoder
Decoder
Goals of VXA
Make selfextracting archives...
ArchiveWriter
ArchiveReader
D D
Archive
Encoder
Decoder
Goals of VXA
Make selfextracting archives...
1. Safe: malicious decoders can't compromise host
2. Futureproof: simple, welldefined architecture [Lorie]
ArchiveWriter
ArchiveReader
D
Emulator
D
Archive
Encoder
Decoder
Goals of VXA
Make selfextracting archives...
1. Safe: malicious decoders can't compromise host
2. Futureproof: simple, welldefined architecture [Lorie]
3. Easy: allow reuse of existing code, languages, tools
ArchiveWriter
ArchiveReader
D
x86Emulator
D
Archive
Encoder
Decoder
Goals of VXA
Make selfextracting archives...
1. Safe: malicious decoders can't compromise host
2. Futureproof: simple, welldefined architecture [Lorie]
3. Easy: allow reuse of existing code, languages, tools
4. Efficient: practical for short term data packaging too
ArchiveWriter
ArchiveReader
D
Fast x86Emulator
D
Archive
Encoder
Decoder
Outline
● Archiver Operation● vxZIP Archive Format● Decoder Architecture● Emulator Design & Implementation● Evaluation (performance, storage overhead)● Conclusion
Archive Writer Operation
Archive
VXA Archiver
Archive Writer Operation
Archive
D1
Uncompressed Input Files
GeneralCompressor
Decoder1
VXA Archiver
Archive Writer Operation
Archive
D1
Uncompressed Input Files
GeneralCompressor
Decoder1
VXA Archiver
Archive Writer Operation
Archive
D1 D2 D3
Uncompressed Input Files
GeneralCompressor
Decoder1
ImageCompressor
Decoder2
AudioCompressor
Decoder3
Archive Writer Operation
Archive
Uncompressed Input Files PreCompressed Input Files
D1 D2 D3
GeneralCompressor
ImageCompressor
AudioCompressor
Decoder1 Decoder2 Decoder3
Archive Writer Operation
Archive
D4 D5
Uncompressed Input Files PreCompressed Input Files
Image FormatRecognizer
Decoder4
Audio FormatRecognizer
Decoder5
D1 D2 D3
GeneralCompressor
ImageCompressor
AudioCompressor
Decoder1 Decoder2 Decoder3
Archive Reader Operation
Archive
VXA Archive Readerx86 Emulator
D4 D5D1 D2 D3
VXA Archive Reader
Archive Reader Operation
Archive
Original Uncompressed Files
x86 Emulator
Decoder1
D4 D5D1 D2 D3
VXA Archive Reader
Archive Reader Operation
Archive
Original Uncompressed Files
x86 Emulator
Decoder1 Decoder2 Decoder3
D4 D5D1 D2 D3
VXA Archive Reader
Archive Reader Operation
Archive
Original Uncompressed Files Original PreCompressed Files
x86 Emulator
D4 D5D1 D2 D3
Decoder1 Decoder2 Decoder3
VXA Archive Reader
Archive Reader Operation
Archive
Original Uncompressed Files Decompressed Files
x86 Emulator
Decoder4 Decoder5
D4 D5D1 D2 D3
Decoder1 Decoder2 Decoder3
vxZIP Archive Format
● Backward compatible with legacy ZIP format
Central Directory
Audio file
Audio file
Image file
vxZIP Archive
vxZIP Archive Format
● Backward compatible with legacy ZIP format
● Decoders intermixed with archived files
Central Directory
Audio file
FLAC Decoder
Audio file
Image file
JP2 Decoder
vxZIP Archive
vxZIP Archive Format
● Backward compatible with legacy ZIP format
● Decoders intermixed with archived files
● Archived files havenew extension header pointing to decoder
Central Directory
Audio file(FLAC-encoded)
FLAC Decoder
Audio file(FLAC-encoded)
Image file(JP2-encoded)
JP2 Decoder
vxZIP Archive
vxZIP Archive Format
● Backward compatible with legacy ZIP format
● Decoders intermixed with archived files
● Archived files havenew extension header pointing to decoder
● Decoders are hidden,“deflated” (gzip)
Central Directory
Audio file(FLAC-encoded)
FLAC Decoder(deflated)
Audio file(FLAC-encoded)
Image file(JP2-encoded)
JP2 Decoder(deflated)
vxZIP Archive
vxZIP Decoder Architecture
● Decoders are ELF executables for x8632– Can be written in any language, safe or unsafe
– Compiled using ordinary tools (GCC)
● Decoders have access to five “system calls”:– read stdin, write stdout, malloc, next file, exit
● Decoders cannot:– open files, windows, devices, network connections, ...
– get system info: user name, current time, OS type, ...
Decoders Ported So Far(using existing implementations in C, mostly unmodified)
Generalpurpose (lossless):● zlib: Classic gzip/deflate algorithm● bzip2: BurrowsWheeler algorithm
Still image codecs:● jpeg: Classic lossy image compression scheme● jp2: JPEG 2000 waveletbased algorithm, lossy or lossless
Audio codecs:● flac: Free Lossless Audio Codec● vorbis: Standard lossy audio codec for Ogg streams
vx32 Emulator Architecture
Runs in vxUnZIP process● Loads decoder into
address space sandbox● Restricts decoder's
memory accessesto sandbox
● Dispatches decoder's VXA “system calls”to vxUnZIP(not to host OS!)
VXA DecoderAddress Space
(up to 1GB)
vxUnZIPProcessAddress
Space
vxUnZIPApplication
DecoderAddress Space
0
0
VXA System Calls
vx32 Emulator library
vx32 Emulator Implementation
On x86{32/64} hosts:– Secure fault isolation
[Wahbe]
– Data sandboxing viacustom LDT segments
– Code sandboxing viainstruction rewriting[Sites, Nethercote]
– No privileges orkernel extensions
KernelAddress Space
CodeRewriting
VXA DecoderAddress Space
(up to 1GB)
Flat-ModelCode/Data
Segment
vxUnZIPApplication
DecoderData Segment(LDT)
0
0
ControlledProcedureCalls
vx32 Emulator library
Transformed code cache
vx32 Emulator Implementation
On other host architectures:– Portable but slow “fallback” instruction interpreter
(mostly done)
– Fast x86toPowerPC binary translator(in progress)
– Hopefully more in the future
Emulator implemented as generic library– Can be used for other sandboxing applications
Evaluation
Two issues to address:● Performance overhead of emulated decoders
– not important for longterm archival storage, but...
– very important for common shortterm uses of archives:backups, software distribution, structured documents, ...
● Storage overhead of archived decoders
Performance Test Method
Run 6 ported decoders on appropriate data sets– Athlon 64 3000+ PC running SuSE Linux 9.3
– Measure usermode CPU time (not wallclock time)
Compare:– Emulated vs native execution
– Running on x8632 vs x8664 host environment
Performance Overhead
zlib bzip2 jpeg jp2 flac vorbis0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
native x86-32 vx32 on x86-32 native x86-64 vx32 on x86-64
No
rma
lize
d U
ser-
mo
de
Exe
cutio
n T
ime
Performance Overhead
zlib bzip2 jpeg jp2 flac vorbis0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
native x86-32 vx32 on x86-32 native x86-64 vx32 on x86-64
No
rma
lize
d U
ser-
mo
de
Exe
cutio
n T
ime
Storage Overhead
Archiver stores only one copy of each decoder– Storage cost amortized over all files of same type
– Relative overhead depends on size of archive
Therefore, measure only absolute decoder size
(compressed, as stored in archive)
Storage Overhead
zlib bzip2 jpeg jp2 flac vorbis0
10
20
30
40
50
60
70
80
90
100
110
120
130
Decoder
C library
Co
mp
ress
ed
co
de
siz
e (
KB
yte
s)
Conclusion
VXA makes selfextracting archives...
● Safe: decoders fully sandboxed
● Futureproof: simple, OSindependent environment
● Easy: reuse existing decoders, languages, tools
● Efficient: ≤ 11% slowdown vs native x8632
Available at: http://pdos.csail.mit.edu/~baford/vxa/