Introduction to Steganalysis Schemes Multimedia Security

Outline

• Steganalysis of LSB encoding

• Steganalysis based on JPEG compatibility

• Some discussions

Introduction

• Steganography
  – The art of secret communication
  – Stego content (e.g. images) should not contain any easily detectable artifacts due to message embedding
  – The less information is embedded, the smaller the probability of introducing detectable artifacts

Watermarking vs. Steganography

[Figure: trade-off diagram with the three requirements Fidelity, Robustness, and Capacity, on which Watermarking and Steganography are positioned]

Steganalysis of LSB Encoding

Goal

• To inspect one or possibly more images for statistical artifacts due to message embedding in color images using the LSB method
  – To find out which images are likely to contain secret messages
  – To estimate the reliability of decisions
    • Type I error (false alarm) and Type II error (miss)

Application Scenarios

[Figure: two application scenarios]

• Automatic checking: an Internet node with a special filter
• Forensics expert: images in a seized computer, or images sent to a certain address

LSB Encoding

• Replacing the LSB of every gray level or color channel with message bits
  – On average, 50% of the LSBs are changed
  – Logic behind this scheme
    • LSBs in scanned or camera-taken images are essentially random
    • Encrypted (randomized) messages are random
    • No statistical artifacts will be introduced
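The embedding step described above can be sketched in a few lines. This is a minimal illustration, not the exact tool analyzed in the referenced paper: the function names `embed_lsb` and `extract_lsb`, the sequential (non-random) pixel order, and the synthetic random cover are all assumptions made for the example.

```python
import random

def embed_lsb(pixels, bits):
    """Replace the LSB of successive channel values with message bits.

    pixels: list of (R, G, B) tuples with 8-bit values.
    bits:   list of 0/1 message bits (assumed already encrypted/randomized).
    """
    flat = [v for px in pixels for v in px]      # channel values in scan order
    if len(bits) > len(flat):
        raise ValueError("message exceeds LSB capacity")
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & ~1) | b             # clear the LSB, then set it to b
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def extract_lsb(pixels, n_bits):
    """Read back the first n_bits LSBs in the same scan order."""
    flat = [v for px in pixels for v in px]
    return [v & 1 for v in flat[:n_bits]]

# Synthetic cover and message (hypothetical data, for illustration only).
rng = random.Random(0)
cover = [tuple(rng.randrange(256) for _ in range(3)) for _ in range(1000)]
message = [rng.randrange(2) for _ in range(3000)]
stego = embed_lsb(cover, message)

# Fraction of channel values whose LSB actually changed.
changed = sum(a != b for p, q in zip(cover, stego) for a, b in zip(p, q))
print(changed / len(message))
```

For random message bits the printed fraction should be close to one half, which matches the "50% of the LSBs are changed" observation, and no channel value moves by more than 1.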

Important Observation

• Number of unique colors in cover images
  – Typically smaller than the number of pixels in the image (unique colors : pixels)
    • 1:2 for high-quality scans in BMP format
    • 1:6 or lower for JPEG images or video
  – Many true-color images have a relatively small “palette”
• After LSB embedding, the new color palette will have a distinctive feature
  – Many pairs of close colors
  – Evidence of LSB encoding-based steganography

Formulations

• U: number of unique colors in an image
• P: number of close color pairs
  – Two colors (R1,G1,B1) and (R2,G2,B2) are close if |R1−R2| ≤ 1 and |G1−G2| ≤ 1 and |B1−B2| ≤ 1
• R: ratio between the number of close pairs of colors and the number of all pairs of colors
  – R = P / C(U, 2), where C(U, 2) = U(U−1)/2 is the number of combinations
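A small sketch of how U, P, and R could be computed from a list of pixels. The function name `close_pair_ratio` is hypothetical; the neighbor-lookup approach (checking the 26 surrounding colors in a set rather than comparing all pairs) is an implementation choice of this example, not something stated in the slides.

```python
from math import comb

def close_pair_ratio(pixels):
    """Return (U, P, R) for a list of (R, G, B) tuples."""
    colors = set(pixels)                          # unique colors
    U = len(colors)
    P = 0
    for (r, g, b) in colors:
        for dr in (-1, 0, 1):
            for dg in (-1, 0, 1):
                for db in (-1, 0, 1):
                    n = (r + dr, g + dg, b + db)
                    # "n > color" counts every unordered pair exactly once
                    # and skips the color itself.
                    if n > (r, g, b) and n in colors:
                        P += 1
    R = P / comb(U, 2) if U > 1 else 0.0
    return U, P, R

# Four unique colors; the first three are mutually close, the last is isolated.
pixels = [(10, 10, 10), (10, 10, 11), (11, 11, 11), (200, 0, 0), (10, 10, 10)]
print(close_pair_ratio(pixels))   # (4, 3, 0.5)
```

In the toy input there are U = 4 unique colors and P = 3 close pairs out of C(4, 2) = 6 possible pairs, so R = 0.5.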

The Proposed Scheme

• After embedding, U will increase to U’, and we can evaluate the new number of close color pairs P’.

• The value of R for an image that does not contain a message will be smaller than that of an image that already has a message embedded in it

The Proposed Scheme (cont.)

• It is impossible to find a single threshold of R that works for all images
  – Due to the large variation of U
• Observations for reliable distinguishing
  – For an image that already contains a large message
    • Embedding another message in it does not modify R significantly
  – For an image not containing a message
    • R increases significantly
  – Use the relative comparison of R as the decision criterion

Detection Algorithm

• To find out whether or not an image has a secret message

1. Calculate R = P/C(U, 2)
2. Embed a test message using LSB embedding in randomly selected pixels
  – Size of the test message: 3·a·M·N (for M by N color images)
3. Calculate R’ = P’/C(U’, 2)
4. Decide whether the image contains an embedded message
  – R’ ≈ R: the image already had a large message hidden in it
  – R’ > R: the image did not have a message in it

• R’/R: the separating statistic
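The four steps can be put together as follows. This is a sketch under stated assumptions: the names `close_pair_ratio`, `embed_test_message`, and `separating_statistic` are hypothetical, the test message consists of random bits written into all three channels of a randomly chosen fraction a of the pixels (so it has 3·a·M·N bits), and the default a = 0.05 is an arbitrary illustrative value.

```python
import random
from math import comb

OFFSETS = [(a, b, c) for a in (-1, 0, 1) for b in (-1, 0, 1) for c in (-1, 0, 1)]

def close_pair_ratio(pixels):
    """R = P / C(U, 2) for a list of (R, G, B) tuples."""
    colors = set(pixels)
    P = 0
    for (r, g, b) in colors:
        for (dr, dg, db) in OFFSETS:
            n = (r + dr, g + dg, b + db)
            if n > (r, g, b) and n in colors:     # each unordered pair once
                P += 1
    U = len(colors)
    return P / comb(U, 2) if U > 1 else 0.0

def embed_test_message(pixels, a, rng):
    """Overwrite the LSBs of a random fraction a of the pixels with random bits."""
    out = list(pixels)
    k = round(a * len(out))
    for idx in rng.sample(range(len(out)), k):
        out[idx] = tuple((v & ~1) | rng.getrandbits(1) for v in out[idx])
    return out

def separating_statistic(pixels, a=0.05, seed=0):
    """Return R'/R: near 1 suggests a message is already present."""
    rng = random.Random(seed)
    R = close_pair_ratio(pixels)
    R_prime = close_pair_ratio(embed_test_message(pixels, a, rng))
    return R_prime / R if R > 0 else float("inf")
```

A caller would compare the returned ratio against a threshold: values clearly above 1 point to a clean image, values close to 1 point to an image that already carries a large message. How that threshold is chosen is a separate question.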

Limitations

• If the secret message size is too small
  – The two ratios will be very close to each other
  – We cannot distinguish images with and without messages

Experiments

• Using an image database of 300 color images
  – 350x250 pixels
  – JPEG compressed
  – Capacity for each image: 32.8 KB (350x250x3/8 bytes)

• A message of length 20 KB (roughly 2/3 of the maximal capacity) was embedded into each image to form a new database of images with messages

• The detection algorithm is run on both databases, and message presence is tested by embedding a test message of size 1 KB (a = 1/30)
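The capacity figures can be checked with a few lines of arithmetic. The sketch assumes one embeddable bit per color channel, 1 KB = 1000 bytes, and that a is the fraction of the LSB capacity used by the test message; these unit conventions are assumptions, since the slides do not spell them out.

```python
M, N = 350, 250                      # image dimensions in pixels

capacity_bits = 3 * M * N            # one LSB per color channel
capacity_bytes = capacity_bits / 8   # 32812.5 bytes, i.e. about 32.8 KB

secret_fraction = 20_000 / capacity_bytes   # 20 KB secret message
test_bytes = capacity_bytes / 30            # test message with a = 1/30

print(capacity_bits, capacity_bytes)
print(round(secret_fraction, 2), test_bytes)
```

Under these assumptions the 20 KB message fills about 61% of the capacity, and a = 1/30 corresponds to a test message of roughly 1.1 KB, consistent with the rounded values quoted on the slide.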

Experimental Results

[Figure: results for the two databases. Solid line: original database; dotted line: embedded database]

Parameter Optimization

• Model the density functions as Gaussian distributions
  – N(μ, σ) and N(μs, σs)
• Different sizes of secret messages, denoted as s, and of test messages are tested
  – Secret messages: 1% to 50%
  – Test messages: a = 0.01 to 0.5
• Results
  – μ > μs for all s
  – As s decreases, N(μs, σs) becomes flatter and the peak moves right
  – As s increases, N(μs, σs) becomes narrower and the peak moves left
    • Easier to separate the two peaks for larger secret message sizes

Threshold Selection

• Choose the threshold Th so that Type I error = Type II error (equivalent to minimizing the overall error)

• Shift the threshold Th to lower the chance of missing an image with a secret message, at the expense of more false alarms
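Under the Gaussian model, the equal-error threshold has a closed form. The sketch below assumes that σ and σs are standard deviations, that clean images follow N(μ, σ) and stego images N(μs, σs) with μ > μs, and that an image is flagged when R’/R < Th. Setting the two tail probabilities equal gives (Th − μ)/σ = −(Th − μs)/σs, hence Th = (μ·σs + μs·σ)/(σ + σs). The numeric parameters are made up for illustration and are not results from the experiments.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def equal_error_threshold(mu, sigma, mu_s, sigma_s):
    """Threshold at which the Type I and Type II error rates coincide."""
    return (mu * sigma_s + mu_s * sigma) / (sigma + sigma_s)

def error_rates(th, mu, sigma, mu_s, sigma_s):
    """Return (Type I, Type II) for the rule 'flag the image if R'/R < th'."""
    type_1 = normal_cdf(th, mu, sigma)              # clean image flagged
    type_2 = 1.0 - normal_cdf(th, mu_s, sigma_s)    # stego image missed
    return type_1, type_2

# Hypothetical parameters, chosen only to exercise the functions.
mu, sigma, mu_s, sigma_s = 1.3, 0.1, 1.02, 0.02
th = equal_error_threshold(mu, sigma, mu_s, sigma_s)
print(th, error_rates(th, mu, sigma, mu_s, sigma_s))
print(error_rates(th + 0.05, mu, sigma, mu_s, sigma_s))  # fewer misses, more false alarms
```

Raising Th above the equal-error point trades false alarms for fewer misses, which is the adjustment described in the second bullet.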

Experimental Results

[Figures omitted: the plots on these two slides did not survive extraction]

Conclusions

• The probability of a wrong decision is mainly determined by the size of the secret message
  – The influence of the test message size is much smaller

• The optimal test message size differs for different secret message sizes

• The detection algorithm mainly targets images with a small number of unique colors
  – The results for high-quality scanned and losslessly compressed images (U > 0.5MN) may be unreliable

Steganalysis Based on JPEG Compatibility

Image Steganography

• Image formats
  – Uncompressed (BMP)
    • Offering the highest capacity and best overall security
  – Palette (GIF)
    • Difficult to provide security with reasonable capacity
  – Lossy compressed (JPEG, JPEG 2000)
    • Difficult to hide a message in the JPEG stream in a secure manner while keeping the capacity practical

Goal of this Paper

• To show that images may be extremely poor candidates for cover images if they were
  – Initially acquired as JPEG images and later decompressed to a lossless format

• Steganographic methods aim for a minimal amount of distortion to reduce visible artifacts
  – The act of message embedding will not erase the characteristic structure created by JPEG compression
  – Analyzing the DCT coefficients of the image can recover even the values of the JPEG quantization table

• Evidence for steganography
  – An image stored in a lossless format that bears a strong fingerprint of JPEG compression, yet is not fully compatible with any JPEG-compressed image

JPEG Compression

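[Figure: JPEG compression pipeline]

Uncompressed image
→ 8x8 block $B_{orig}$
→ DCT: $d_k(i)$, $i = 0, \dots, 63$
→ Quantization with the JPEG quantization matrix $Q$: $D_k(i) = \mathrm{round}(d_k(i)/Q(i))$
→ Zigzag scan
→ Huffman coder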

JPEG Decompression

• Huffman decoding
• $QD_k(i) = Q(i) \cdot D_k(i)$
  – Multiplying each quantized DCT coefficient by its quantization step
• $B_{raw} = \mathrm{DCT}^{-1}(QD)$
  – Inverse DCT
• $B = [B_{raw}]$
  – Rounded to integers in the range 0 to 255
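The quantization and dequantization steps for a single 8x8 block can be sketched in pure Python. This is a simplified illustration: zigzag scanning, Huffman coding, and the level shift used by real JPEG codecs are left out, the flat quantization matrix (all entries 10) and the test block are made up, and Python's `round` (round-half-to-even) stands in for the codec's rounding. The function names are hypothetical.

```python
from math import cos, pi, sqrt

N = 8
# Orthonormal DCT-II matrix: C[u][x] = c(u) * cos((2x + 1) * u * pi / 16)
C = [[(sqrt(1 / N) if u == 0 else sqrt(2 / N)) * cos((2 * x + 1) * u * pi / (2 * N))
      for x in range(N)] for u in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)] for i in range(N)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def dct2(block):       # 2-D DCT: C * block * C^T
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):     # inverse 2-D DCT: C^T * coeffs * C
    return matmul(matmul(transpose(C), coeffs), C)

def compress_block(b_orig, Q):
    """D(i) = round(d(i) / Q(i)) for the DCT coefficients d of the block."""
    d = dct2(b_orig)
    return [[round(d[u][v] / Q[u][v]) for v in range(N)] for u in range(N)]

def decompress_block(D, Q):
    """Return (B_raw, B): the real-valued block and its rounded, clipped version."""
    qd = [[Q[u][v] * D[u][v] for v in range(N)] for u in range(N)]
    b_raw = idct2(qd)
    b = [[min(255, max(0, round(x))) for x in row] for row in b_raw]
    return b_raw, b

Q = [[10] * N for _ in range(N)]                       # hypothetical flat matrix
b_orig = [[(13 * x + 7 * y) % 136 + 60 for y in range(N)] for x in range(N)]
D = compress_block(b_orig, Q)
b_raw, b = decompress_block(D, Q)
err = sum((b_raw[x][y] - b[x][y]) ** 2 for x in range(N) for y in range(N))
print(err)             # squared rounding error of the decompressed block
```

Because no pixel of this block saturates, the printed squared rounding error stays within 64 × 0.25 = 16, and recompressing `b` with the same `Q` reproduces `D`.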

Observations

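• If the block B has no pixels saturated at 0 or 255
  – $\|B_{raw} - B\|^2 \le 16$, where $\|\cdot\|$ is the L2 norm
  – Since $|B_{raw}(i) - B(i)| \le 0.5$ for all 64 pixels $i$, the sum of squares is at most $64 \times 0.25 = 16$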

The Proposed Scheme

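• Question
  – Given an arbitrary 8x8 block B of pixel values, could this block have arisen through the process of JPEG decompression with the quantization matrix Q (if available)?

• Let $QD' = \mathrm{DCT}(B)$. By Parseval's equality:

$$\|B - B_{raw}\|^2 = \|\mathrm{DCT}(B) - \mathrm{DCT}(B_{raw})\|^2 = \|QD' - QD\|^2 \le 16$$

$$\|QD' - QD\|^2 \ge \sum_{i} \left| QD'(i) - Q(i)\,\mathrm{round}\!\left(QD'(i)/Q(i)\right) \right|^2 = S$$

• Additional check
  – $\sum_{i} \left(QD'(i) - q_{p(i)}(i)\right)^2 \le 16$, where $q_{p(i)}(i)$ are integer multiples of $Q(i)$ close to $QD'(i)$
  – $B = [\mathrm{DCT}^{-1}(QD)]$, where $QD(i) = q_{p(i)}(i)$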
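The lower bound S can be computed directly from a block and a candidate quantization matrix. The sketch below assumes that S is the sum of squared distances from each DCT coefficient of B to the nearest integer multiple of Q(i) (the squared form is what the comparison with 16 implies), and uses an orthonormal DCT so that Parseval's equality holds. The name `compatibility_S`, the flat matrix Q with all entries 16, and the constant test block are illustrative assumptions.

```python
from math import cos, pi, sqrt

N = 8
# Orthonormal DCT-II matrix, so that the transform preserves the L2 norm.
C = [[(sqrt(1 / N) if u == 0 else sqrt(2 / N)) * cos((2 * x + 1) * u * pi / (2 * N))
      for x in range(N)] for u in range(N)]

def dct2(block):
    """2-D DCT of an 8x8 block: C * block * C^T."""
    tmp = [[sum(C[u][x] * block[x][y] for x in range(N)) for y in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][y] * C[v][y] for y in range(N)) for v in range(N)]
            for u in range(N)]

def compatibility_S(block, Q):
    """Sum of squared distances from DCT(block) to the nearest multiples of Q."""
    qd_prime = dct2(block)
    S = 0.0
    for u in range(N):
        for v in range(N):
            nearest = Q[u][v] * round(qd_prime[u][v] / Q[u][v])
            S += (qd_prime[u][v] - nearest) ** 2
    return S

Q = [[16] * N for _ in range(N)]          # hypothetical quantization matrix
flat = [[100] * N for _ in range(N)]      # DC coefficient 800 = 16 * 50
tampered = [row[:] for row in flat]
for k in range(20):                       # raise 20 pixels by one gray level
    tampered[k // N][k % N] += 1

print(compatibility_S(flat, Q))           # essentially zero
print(compatibility_S(tampered, Q))       # about 20, which exceeds 16
```

The constant block is exactly representable with this Q, so S is essentially zero. Changing 20 pixels by one level adds a perturbation of squared norm 20 that is too small to move any coefficient to a different multiple of 16, so S is about 20 and the block fails the S ≤ 16 test.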

Algorithm

1. Divide the image into 8x8 blocks

2. Arrange the blocks in a list, and remove all saturated blocks from the list
  • T: number of remaining blocks

3. Extract the quantization matrix Q from all T blocks
  • If all elements of Q are 1, the image was not previously stored as JPEG and the method does not apply

Algorithm (cont.)

4. For each block B, calculate S

5. If S > 16, B is not compatible with JPEG compression; otherwise, perform the additional check

6. After going through all T blocks, if no incompatible block is found, no evidence of steganography is available

7. Repeat the algorithm for different 8x8 divisions to detect cropped images
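The block bookkeeping of steps 1, 2, and 4 to 7 can be sketched as follows. The image is assumed to be a single-channel list of rows, the S computation is passed in as a caller-supplied function `compat_S(block, Q)` (a hypothetical interface), and the extraction of Q and the additional check are not covered here.

```python
def blocks(image, dx=0, dy=0):
    """Yield the complete 8x8 blocks of the image, with the grid shifted by (dx, dy)."""
    h, w = len(image), len(image[0])
    for top in range(dy, h - 7, 8):
        for left in range(dx, w - 7, 8):
            yield [row[left:left + 8] for row in image[top:top + 8]]

def is_saturated(block):
    """A block is saturated if any pixel sits at 0 or 255."""
    return any(p == 0 or p == 255 for row in block for p in row)

def scan(image, Q, compat_S, dx=0, dy=0):
    """Return (T, number of blocks with S > 16) for one 8x8 division."""
    kept = [b for b in blocks(image, dx, dy) if not is_saturated(b)]
    incompatible = sum(1 for b in kept if compat_S(b, Q) > 16)
    return len(kept), incompatible

def scan_all_divisions(image, Q, compat_S):
    """Repeat the scan for all 64 grid shifts, as needed for cropped images."""
    return {(dx, dy): scan(image, Q, compat_S, dx, dy)
            for dy in range(8) for dx in range(8)}
```

Blocks with S ≤ 16 would still need the additional check before being declared compatible; this sketch only counts the blocks that already fail the S test.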

Extracting the Quantization Matrix

Some Discussions

Reference

• J. Fridrich, R. Du and M. Long, “Steganalysis of LSB encoding in color images,” ICME 2000, New York, 2000

• J. Fridrich, M. Goljan and R. Du, “Steganalysis based on JPEG compatibility,” SPIE Multimedia Systems and Applications IV, Denver, 2001

• G. Goth, “Steganalysis gets past the hype,” IEEE Distributed Systems Online, April 2005
