Page 1: Virus-AntiVirus Co-evolution - DTC

2006 Symantec Corporation, All Rights Reserved

Anonymizing Filesystem Metadata for Analysis

Chris Xin

Symantec

Page 2: Virus-AntiVirus Co-evolution - DTC

Challenges of Filesystem Analysis

Real-time live-system monitoring is difficult.
– performance degradation
– security & privacy concerns
– stability risk

Traces
– difficult to reconstruct I/O dependencies
– system states
– security & privacy concerns

Benchmarks
– “There are lies, damn lies and then there are benchmarks.”

Filesystem images
– snapshots, backups
– security & privacy concerns

Page 3: Virus-AntiVirus Co-evolution - DTC

Agenda

Challenges of filesystem analysis

Keeping filesystem images
– metasave

Metadata anonymization
– secure metasave

Measurement
– space efficiency
– time efficiency
– resource consumption

Summary

Page 4: Virus-AntiVirus Co-evolution - DTC

Filesystem Images

Storing the whole system would be expensive.
– large storage space
– long time

Keeping metadata is a wise idea.
– A good resource for understanding some characteristics of a file system
– Cumulative images can be obtained to track the change trend of a file system
  • file size, age, type information
  • filesystem aging analysis
– Address some privacy concerns by eliminating user data

Some file systems already provide such a utility.
– Ext2: e2image
– Linux NTFS: ntfsclone --metadata
– VxFS: metasave

Page 5: Virus-AntiVirus Co-evolution - DTC

Metasave Utility

The utility saves or restores the metadata of VxFS.
– Available in version 1 and later versions.
– Metadata is kept in a way that the original geometry of a file system is preserved and all the inode information is intact.
– No user data is retained.
– Metadata can be saved on top of a snapshot, a backup, or a live system as an image file.
– The image file can be deflated and metadata can be restored back to a file or a device.

What do we do with images?
– troubleshooting
– debugging
– file system analysis

Page 6: Virus-AntiVirus Co-evolution - DTC

Efficient Anonymization

But … your clients may say no …
– Sensitive information is still in the file and directory names
– Concerns of performance degradation

Solution: Anonymize clients’ information in metadata
– Names of files and directories
– Client information in file system intent logs

Requirements
– Must be difficult to recover original information
– Keep the geometry of the file system: retain the length of the file/directory names
– Time efficient
– Space efficient
– Minimum performance degradation

Page 7: Virus-AntiVirus Co-evolution - DTC

Secure Metasave

Enhanced metasave with encryption options
– Evolved from metasave, a VxFS utility for saving/restoring metadata of a file system
– Online image saving
– Uses a cryptographic message digest algorithm to obfuscate client information
  • The algorithm can be chosen to suit a client’s requirements
  • Default: SHA-1

Page 8: Virus-AntiVirus Co-evolution - DTC

Message Digest

Secure one-way hash function: e=H(M)
– M: original message
– H: hash function
– e: digested message

Key properties
– Given M, easy to compute e=H(M)
– Given e, hard to compute M such that e=H(M)
– Given M, hard to find M' (different from M) such that H(M)=H(M') (minimum collision)
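To make the notation e=H(M) concrete, here is a minimal sketch using Python's standard hashlib module with SHA-1 as H. The function name `digest` and the sample messages are illustrative assumptions, not part of the tool described in these slides.

```python
import hashlib


def digest(message: bytes, algorithm: str = "sha1") -> bytes:
    """Return e = H(M), where H is the named hash function."""
    return hashlib.new(algorithm, message).digest()


if __name__ == "__main__":
    e = digest(b"payroll.xls")          # M is a hypothetical file name
    print(len(e))                       # SHA-1 digests are 20 bytes long
    print(e.hex())
    # Computing e from M is cheap and deterministic ...
    print(digest(b"payroll.xls") == e)
    # ... while a slightly different M' yields an unrelated digest.
    print(digest(b"payroll.xlt").hex())
```

The sketch only illustrates the first property (easy to compute). The other two properties are hardness claims about the hash function and cannot be shown by running code.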

Page 9: Virus-AntiVirus Co-evolution - DTC

Implementation

OpenSSL library

Obfuscate a file/directory name
– Do it by individual pathname components
  • /a/bc/bcd → /x/rd/wyz
– Retain name length
  • Digest works on a fixed length of characters at a time.
  • 20 characters for SHA-1
  • If len(name) > len(digest), process it in segments.
  • If len(name) < len(digest) or len(final segment) < len(digest), digest the name string and remove some characters to preserve its original length.
  • Digest can contain characters that are illegal in file/directory names; map them to legal characters.
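The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the secure metasave implementation: Python's hashlib stands in for the OpenSSL library, the legal alphabet (`LEGAL`) and the modulo-based character mapping are assumptions, the optional `seed` is simply prepended to each segment, and the names `obfuscate_component` and `obfuscate_path` are hypothetical.

```python
import hashlib
import string

DIGEST_LEN = 20  # SHA-1 produces a 20-byte digest
# Assumed set of legal name characters; the real mapping is not given in the slides.
LEGAL = string.ascii_lowercase + string.digits


def obfuscate_component(name: str, seed: bytes = b"") -> str:
    """Obfuscate one pathname component, preserving its length in bytes."""
    raw = name.encode("utf-8")
    pieces = []
    # Process the name in digest-sized segments.
    for start in range(0, len(raw), DIGEST_LEN):
        segment = raw[start:start + DIGEST_LEN]
        d = hashlib.sha1(seed + segment).digest()
        # Map every digest byte to a legal character (assumed mapping).
        mapped = "".join(LEGAL[b % len(LEGAL)] for b in d)
        # Chop the mapped digest to the length of the segment it replaces.
        pieces.append(mapped[:len(segment)])
    return "".join(pieces)


def obfuscate_path(path: str, seed: bytes = b"") -> str:
    """Obfuscate a path one component at a time, keeping separators."""
    parts = path.split("/")
    # Empty parts (from leading or doubled slashes), "." and ".." stay as they are.
    return "/".join(
        p if p in ("", ".", "..") else obfuscate_component(p, seed)
        for p in parts
    )


if __name__ == "__main__":
    print(obfuscate_path("/a/bc/bcd"))
    print(obfuscate_path("/a/bc/bcd", seed=b"site-42"))
```

Because each component is digested on its own, the same directory name obfuscates to the same string wherever it appears, so the tree shape survives. Folding 256 byte values onto a small alphabet and chopping the result both discard information, so collisions become slightly more likely than with the raw digest.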

Page 10: Virus-AntiVirus Co-evolution - DTC

File/Directory Name Manipulation

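Parse a name string
Message digest
Chop it to its original length

Random number generator with a changeable seed

Character mapping

[Figure: an original name string spanning positions 0 to 67 is split at 20, 40 and 60; the resulting digests span positions 0 to 79 and are chopped to the original length, giving an obfuscated filename that spans positions 0 to 67.]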

Page 11: Virus-AntiVirus Co-evolution - DTC

Obfuscation Options

Full-name obfuscation

Retain file extension if any

Obfuscate extensions as well and make them consistent

obfuscation option       original name
                         foo1.c    foo2.c
full-name                abcde     uwxyz
retain file extension    jkis.c    swdx.c
consistent extension     jkis.x    swdx.x

Page 12: Virus-AntiVirus Co-evolution - DTC

Further Handling

Multiple extensions and prefixes for name-only obfuscation option
– Look at the last extension only
  • foo.c.bak → abced.bak
– Retain extensions of 4 characters or fewer; obfuscate anything longer

Do not obfuscate the names of special administrative files or directories
– lost+found

Rebuild directory indexes and block checksums after name obfuscation

Symlinks
– Point to the same place within the file system
– “..” is kept intact

Intent logs
– Offers an option to not include intent logs in an image file.
– If the intent log is retained, file and directory names are obfuscated.
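The three obfuscation options and the extension rules can be sketched as below. This is a hedged illustration, not the tool's code: `scramble` is a stand-in for the digest-based, length-preserving obfuscation, the option strings, the `SPECIAL` set and the treatment of dot-files are assumptions, and only the "last extension, 4 characters or fewer" rule and the lost+found exception come from the slides.

```python
import hashlib
import string

LEGAL = string.ascii_lowercase + string.digits  # assumed legal alphabet
SPECIAL = {".", "..", "lost+found"}              # names left untouched
MAX_EXT = 4                                      # longest extension that is recognized


def scramble(text: str) -> str:
    """Length-preserving stand-in for the digest-based name obfuscation."""
    out = []
    for i in range(0, len(text), 20):
        seg = text[i:i + 20]
        d = hashlib.sha1(seg.encode("utf-8")).digest()
        out.append("".join(LEGAL[b % len(LEGAL)] for b in d)[:len(seg)])
    return "".join(out)


def obfuscate(name: str, option: str = "full-name") -> str:
    if name in SPECIAL:
        return name
    # Only the last extension is considered: "foo.c.bak" -> ("foo.c", "bak").
    stem, dot, ext = name.rpartition(".")
    # Assumption: a leading dot (".bashrc") does not count as an extension.
    has_ext = bool(dot) and bool(stem) and 0 < len(ext) <= MAX_EXT
    if option == "full-name" or not has_ext:
        return scramble(name)
    if option == "retain-extension":
        return scramble(stem) + "." + ext
    if option == "consistent-extension":
        # The extension is scrambled on its own, so every ".c" maps to the same string.
        return scramble(stem) + "." + scramble(ext)
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    for opt in ("full-name", "retain-extension", "consistent-extension"):
        print(opt, obfuscate("foo1.c", opt), obfuscate("foo2.c", opt))
```

Scrambling the extension separately from the stem is what keeps extensions consistent across files, so an analyst can still group files by type without learning the original extension.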

Page 13: Virus-AntiVirus Co-evolution - DTC

Collision Probability

What’s a collision?
– Two files/directories with different names, say A and B, end up with the same name after obfuscation.

Do we have to worry about it?
– Not really
– Collision only matters within individual directories.
– Chance of collision is tiny
  • With SHA-1, about 1 in 10^24 for a filesystem with a trillion file/directory names, and 1 in 10^18 for a quadrillion names.
  • The character mapping and name length chopping increase the chance of collisions slightly.
– An optional name conflict check follows obfuscation for a file system with large directories.
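The orders of magnitude quoted for SHA-1 can be checked with the standard birthday approximation, p ≈ n(n-1)/2 / 2^160, which holds while p is small. The sketch below assumes full 160-bit digests compared across the whole filesystem; the function name `collision_probability` is illustrative.

```python
def collision_probability(n_names: int, digest_bits: int = 160) -> float:
    """Birthday-bound estimate of any collision among n_names digests."""
    pairs = n_names * (n_names - 1) // 2   # number of distinct name pairs
    return pairs / 2 ** digest_bits        # valid approximation while the result is << 1


if __name__ == "__main__":
    for label, n in (("trillion", 10 ** 12), ("quadrillion", 10 ** 15)):
        p = collision_probability(n)
        print(f"{label}: p ~ {p:.2e}, about 1 in {1 / p:.1e}")
```

This treats every pair of names in the filesystem as a potential collision, so it overstates the risk for untruncated digests, since only names in the same directory can actually clash. It does not model the character mapping and length chopping, which raise the probability for short names.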

Page 14: Virus-AntiVirus Co-evolution - DTC

Measurement

Three categories
– Space consumption
– Time consumption
  • encryption overhead
– Resource consumption

Six filesystems measured
– four customer filesystems
– two filesystems on our production server (fs #2 and #6)

Experiment environment
– Live production system
  • Sun Fire E6900
  • 16 SPARC CPUs, 32 GB memory, shared disks
– Test machine
  • Sun Fire V240
  • 2 SPARC CPUs, 2 GB memory, single-user disks

Page 15: Virus-AntiVirus Co-evolution - DTC

Space Efficiency

The image of metadata usually takes about 1-5% of the filesystem size.

[Figure: storage efficiency. Image size as % of filesystem size, per filesystem.]

filesystem         1      2      3      4      5       6
% of total cap.    0.08   0.06   0.04   0.05   6.88    0.60
% of used cap.     0.12   0.08   0.73   0.56   11.73   0.63

Page 16: Virus-AntiVirus Co-evolution - DTC

Time Efficiency

How long does it take to get an anonymized file system image?
– use “filename-only” option
– on the live production system
  • about 30 minutes to get an encrypted metadata image from fs #6
  • 5-8 secs for fs #2
– on the test machine:

[Figure: time efficiency. Time (sec) per filesystem on the test machine.]

filesystem    1     2     3       4     5        6
time (sec)    1.9   1.7   0.267   6.4   108.33   273

Page 17: Virus-AntiVirus Co-evolution - DTC

A closer look

The factors in play
– # of inodes
– total filesystem size
– filesystem capacity

fs   total (GB)   used (GB)   time (sec), test   msv size / total fs cap.   msv size / used fs cap.
1    27.8         18.3        1.9                0.08%                      0.12%
2    49.5         39.4        1.7                0.06%                      0.08%
3    9.0          0.6         0.267              0.04%                      0.73%
4    150.0        12.4        6.4                0.05%                      0.56%
5    3.9          2.3         108.33             6.88%                      11.73%
6    195.4        186.9       273.0              0.60%                      0.63%

Values from the “# files” and “time (sec), production” columns, in extraction order: 39, --, 4, --, --, --, 1836, 742, 3,721, 59,584, 956,180, 2,259,443 (the last on the fs 6 row).

Page 18: Virus-AntiVirus Co-evolution - DTC

Encryption Overhead

Space efficiency is the same.

Time efficiency
– Little overhead introduced on a live production system
  • I/O bound
  • shared disk
– Noticeable computational overhead on the test machine.

Page 19: Virus-AntiVirus Co-evolution - DTC

Encryption Overhead on the Test Machine

[Figure: normalized time (0 to 1.6) for file systems 1-6 on the test machine. Series: no-encryption, full-obfuscation, filename-only, consistent-extension.]

Page 20: Virus-AntiVirus Co-evolution - DTC

Encryption Overhead on the Production System

[Figure: normalized time (0 to 1.6) for file systems 1-6 on the production system. Series: no-encryption, full-obfuscation, filename-only, consistent-extension.]

Page 21: Virus-AntiVirus Co-evolution - DTC

Resource Consumption

Not much performance degradation during image saving

– 20 MB memory and 1% of CPU were utilized during the image dumping on a live production system.

Page 22: Virus-AntiVirus Co-evolution - DTC

Summary

A method of anonymizing filesystem metadata.
– Obfuscates clients’ information to relieve privacy concerns
– Costs 1-5% of the original file system size in storage.
– Fairly quick process with little performance degradation.

We encourage saving file system metadata images with anonymization.
– Provides a good resource for file system analysis
– Benefits both development and research

The anonymization scheme can be used in other file system utilities, such as trace collection.

Page 23: Virus-AntiVirus Co-evolution - DTC

References

Bruce Schneier, Applied Cryptography. Second Edition, J. Wiley and Sons, 1996

Mark Ryan, “One-way secure hash functions”, Computer Security lecture notes, University of Birmingham.

Geoff Kuenning and Ethan L. Miller, "Anonymization Techniques for URLs and Filenames," Technical Report UCSC-CRL-03-05, University of California, Santa Cruz, September 2003.

Xiaoyun Wang, Yiqun Lisa Yin and Hongbo Yu, “Finding Collisions in the Full SHA-1”, CRYPTO 2005

http://www.linux-ntfs.org/

Page 24: Virus-AntiVirus Co-evolution - DTC

Acknowledgements

Thanks to Oleg Kiselev, John Colgrove, Craig Harmer, Chuck Silvers and George Mathew for discussions.

Thanks to Marianne Lent and Paul Massiglia for suggestions.

Thanks to Ken Zachmann for helping with experiments.

Page 25: Virus-AntiVirus Co-evolution - DTC

Questions