45
VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology http://pdos.csail.mit.edu/~baford/vxa/

VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

VXA: A Virtual Architecture forDurable Compressed Archives

Bryan FordComputer Science and Artificial Intelligence Laboratory

Massachusetts Institute of Technology

http://pdos.csail.mit.edu/~baford/vxa/

Page 2: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

The Ubiquity of Data Compression

Everything is compressed these days– Archive/Backup/Distribution: ZIP, tar.gz, ...

– Multimedia streams: mp3, ogg, wmv, ...

– Office documents: XML­in­ZIP

– Digital cameras: JPEG, proprietary RAW, ...

– Video camcorders: DV, MPEG­2, ...

Page 3: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Compressed Data Formats

Observation #1:Data compression formats evolve rapidly

— SQ

— L

U—

 ARC

— Z

OO

— L

Harc

— Z

IP

— bz

ip2

— gz

ip

— co

mpress

— 7z

Lossless Compression—

 RAR

1980 1985 1990 1995 2000 2005

Page 4: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Compressed Data Formats

Observation #1:Data compression formats evolve rapidly

— SQ

— L

U—

 ARC

— Z

OO

— L

Harc

— Z

IP

— bz

ip2

— gz

ip

— co

mpress

— 7z

Lossless Compression—

 RAR

— M

PEG­2

— D

V—

 WM

V7

Video Encoding

Audio Encoding

— M

PEG­1

— M

PEG­4

— M

P3

— R

ealA

udio

— A

AC

— FLAC

— V

orbis

— Sore

nson

— Q

uickT

ime

— JP

EG

— G

IF

— PNG

— JP

EG 2000

Image Encoding—

 TIF

F

— W

MA7

— W

MA9

— W

MV9

— W

MV8

— FLIC

— W

AV

— A

IFF

— 8S

VX

— A

NIM

— A

IFF­C

— IL

BM

— PCX

— T

GA

— B

MP

1980 1985 1990 1995 2000 2005

Page 5: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Compressed Data Formats

Observation #1:Data compression formats evolve rapidly

Problems:– Inconvenient:

each new algorithm requires decoder install/upgrade

– Impedes data portability:data unusable on systems without supported decoder

– Threatens long­term data usability:old decoders may not run on new operating systems

Page 6: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

Page 7: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

Fully Backward Compatible Extensions

Page 8: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

— 68

000

— M

IPS

— A

RM

— PA­R

ISC

— Pow

erPC

— D

EC Alph

a

— It

anium

Other Architectures—

 SPARC

Page 9: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

— 68

000

— M

IPS

— A

RM

— PA­R

ISC

— Pow

erPC

— D

EC Alph

a

— It

anium

Other Architectures—

 SPARC

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

Page 10: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

— 68

000

— M

IPS

— A

RM

— PA­R

ISC

— Pow

erPC

— D

EC Alph

a

— It

anium

Other Architectures—

 SPARC

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

Page 11: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

— 68

000

— M

IPS

— A

RM

— PA­R

ISC

— Pow

erPC

— D

EC Alph

a

— It

anium

Other Architectures—

 SPARC

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

Page 12: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

— 68

000

— M

IPS

— A

RM

— PA­R

ISC

— Pow

erPC

— D

EC Alph

a

— It

anium

Other Architectures—

 SPARC

Archiving Compressed Data

Observation #2:Processor architectures evolve more conservatively

1980 1985 1990 1995 2000 2005

x86 Architecture—

 8086

— 80

386

  (32­b

it)

— x8

6­64

  (64­b

it)

— SSE

  (FP ve

ctor)

— M

MX

  (int 

vecto

r)

Itani

c

Page 13: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

VXA: Virtual Executable Archives

Observation 1+2: Instruction formats are historically more durable than compressed data formats

Make archive self­extracting (data + executable decoder)

To extract data, archive reader runs embedded decoder

ArchiveWriter

ArchiveReader

D D

Archive

Encoder

Decoder

Page 14: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Goals of VXA

Make self­extracting archives...

ArchiveWriter

ArchiveReader

D D

Archive

Encoder

Decoder

Page 15: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Goals of VXA

Make self­extracting archives...

1. Safe: malicious decoders can't compromise host

2. Future­proof: simple, well­defined architecture [Lorie]

ArchiveWriter

ArchiveReader

D

Emulator

D

Archive

Encoder

Decoder

Page 16: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Goals of VXA

Make self­extracting archives...

1. Safe: malicious decoders can't compromise host

2. Future­proof: simple, well­defined architecture [Lorie]

3. Easy: allow reuse of existing code, languages, tools

ArchiveWriter

ArchiveReader

D

x86Emulator

D

Archive

Encoder

Decoder

Page 17: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Goals of VXA

Make self­extracting archives...

1. Safe: malicious decoders can't compromise host

2. Future­proof: simple, well­defined architecture [Lorie]

3. Easy: allow reuse of existing code, languages, tools

4. Efficient: practical for short term data packaging too

ArchiveWriter

ArchiveReader

D

Fast x86Emulator

D

Archive

Encoder

Decoder

Page 18: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Outline

● Archiver Operation● vxZIP Archive Format● Decoder Architecture● Emulator Design & Implementation● Evaluation (performance, storage overhead)● Conclusion

Page 19: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Writer Operation

Archive

VXA Archiver

Page 20: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Writer Operation

Archive

D1

Uncompressed Input Files

GeneralCompressor

Decoder1

VXA Archiver

Page 21: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Writer Operation

Archive

D1

Uncompressed Input Files

GeneralCompressor

Decoder1

VXA Archiver

Page 22: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Writer Operation

Archive

D1 D2 D3

Uncompressed Input Files

GeneralCompressor

Decoder1

ImageCompressor

Decoder2

AudioCompressor

Decoder3

Page 23: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Writer Operation

Archive

Uncompressed Input Files Pre­Compressed Input Files

D1 D2 D3

GeneralCompressor

ImageCompressor

AudioCompressor

Decoder1 Decoder2 Decoder3

Page 24: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Writer Operation

Archive

D4 D5

Uncompressed Input Files Pre­Compressed Input Files

Image FormatRecognizer

Decoder4

Audio FormatRecognizer

Decoder5

D1 D2 D3

GeneralCompressor

ImageCompressor

AudioCompressor

Decoder1 Decoder2 Decoder3

Page 25: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Archive Reader Operation

Archive

VXA Archive Readerx86 Emulator

D4 D5D1 D2 D3

Page 26: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

VXA Archive Reader

Archive Reader Operation

Archive

Original Uncompressed Files

x86 Emulator

Decoder1

D4 D5D1 D2 D3

Page 27: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

VXA Archive Reader

Archive Reader Operation

Archive

Original Uncompressed Files

x86 Emulator

Decoder1 Decoder2 Decoder3

D4 D5D1 D2 D3

Page 28: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

VXA Archive Reader

Archive Reader Operation

Archive

Original Uncompressed Files Original Pre­Compressed Files

x86 Emulator

D4 D5D1 D2 D3

Decoder1 Decoder2 Decoder3

Page 29: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

VXA Archive Reader

Archive Reader Operation

Archive

Original Uncompressed Files De­compressed Files

x86 Emulator

Decoder4 Decoder5

D4 D5D1 D2 D3

Decoder1 Decoder2 Decoder3

Page 30: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vxZIP Archive Format

● Backward compatible with legacy ZIP format

Central Directory

Audio file

Audio file

Image file

vxZIP Archive

Page 31: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vxZIP Archive Format

● Backward compatible with legacy ZIP format

● Decoders intermixed with archived files

Central Directory

Audio file

FLAC Decoder

Audio file

Image file

JP2 Decoder

vxZIP Archive

Page 32: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vxZIP Archive Format

● Backward compatible with legacy ZIP format

● Decoders intermixed with archived files

● Archived files havenew extension header pointing to decoder

Central Directory

Audio file(FLAC-encoded)

FLAC Decoder

Audio file(FLAC-encoded)

Image file(JP2-encoded)

JP2 Decoder

vxZIP Archive

Page 33: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vxZIP Archive Format

● Backward compatible with legacy ZIP format

● Decoders intermixed with archived files

● Archived files havenew extension header pointing to decoder

● Decoders are hidden,“deflated” (gzip)

Central Directory

Audio file(FLAC-encoded)

FLAC Decoder(deflated)

Audio file(FLAC-encoded)

Image file(JP2-encoded)

JP2 Decoder(deflated)

vxZIP Archive

Page 34: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vxZIP Decoder Architecture

● Decoders are ELF executables for x86­32– Can be written in any language, safe or unsafe

– Compiled using ordinary tools (GCC)

● Decoders have access to five “system calls”:– read stdin, write stdout, malloc, next file, exit

● Decoders cannot:– open files, windows, devices, network connections, ...

– get system info: user name, current time, OS type, ...

Page 35: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Decoders Ported So Far(using existing implementations in C, mostly unmodified)

General­purpose (lossless):● zlib: Classic gzip/deflate algorithm● bzip2: Burrows­Wheeler algorithm

Still image codecs:● jpeg: Classic lossy image compression scheme● jp2: JPEG 2000 wavelet­based algorithm, lossy or lossless

Audio codecs:● flac: Free Lossless Audio Codec● vorbis: Standard lossy audio codec for Ogg streams

Page 36: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vx32 Emulator Architecture

Runs in vxUnZIP process● Loads decoder into 

address space sandbox● Restricts decoder's 

memory accessesto sandbox

● Dispatches decoder's VXA “system calls”to vxUnZIP(not to host OS!)

VXA DecoderAddress Space

(up to 1GB)

vxUnZIPProcessAddress

Space

vxUnZIPApplication

DecoderAddress Space

0

0

VXA System Calls

vx32 Emulator library

Page 37: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vx32 Emulator Implementation

On x86­{32/64} hosts:– Secure fault isolation 

[Wahbe]

– Data sandboxing viacustom LDT segments

– Code sandboxing viainstruction rewriting[Sites, Nethercote]

– No privileges orkernel extensions

KernelAddress Space

CodeRewriting

VXA DecoderAddress Space

(up to 1GB)

Flat-ModelCode/Data

Segment

vxUnZIPApplication

DecoderData Segment(LDT)

0

0

ControlledProcedureCalls

vx32 Emulator library

Transformed code cache

Page 38: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

vx32 Emulator Implementation

On other host architectures:– Portable but slow “fallback” instruction interpreter

(mostly done)

– Fast x86­to­PowerPC binary translator(in progress)

– Hopefully more in the future

Emulator implemented as generic library– Can be used for other sandboxing applications

Page 39: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Evaluation

Two issues to address:● Performance overhead of emulated decoders

– not important for long­term archival storage, but...

– very important for common short­term uses of archives:backups, software distribution, structured documents, ...

● Storage overhead of archived decoders

Page 40: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Performance Test Method

Run 6 ported decoders on appropriate data sets– Athlon 64 3000+ PC running SuSE Linux 9.3

– Measure user­mode CPU time (not wall­clock time)

Compare:– Emulated vs native execution

– Running on x86­32 vs x86­64 host environment

Page 41: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Performance Overhead

zlib bzip2 jpeg jp2 flac vorbis0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

native x86-32 vx32 on x86-32 native x86-64 vx32 on x86-64

No

rma

lize

d U

ser-

mo

de

Exe

cutio

n T

ime

Page 42: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Performance Overhead

zlib bzip2 jpeg jp2 flac vorbis0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1

1.2

native x86-32 vx32 on x86-32 native x86-64 vx32 on x86-64

No

rma

lize

d U

ser-

mo

de

Exe

cutio

n T

ime

Page 43: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Storage Overhead

Archiver stores only one copy of each decoder– Storage cost amortized over all files of same type

– Relative overhead depends on size of archive

Therefore, measure only absolute decoder size

(compressed, as stored in archive)

Page 44: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Storage Overhead

zlib bzip2 jpeg jp2 flac vorbis0

10

20

30

40

50

60

70

80

90

100

110

120

130

Decoder

C library

Co

mp

ress

ed

co

de

siz

e (

KB

yte

s)

Page 45: VXA: A Virtual Architecture for Durable Compressed ArchivesVXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory

Conclusion

VXA makes self­extracting archives...

● Safe: decoders fully sandboxed

● Future­proof: simple, OS­independent environment

● Easy: re­use existing decoders, languages, tools

● Efficient: ≤ 11% slowdown vs native x86­32

Available at: http://pdos.csail.mit.edu/~baford/vxa/