Effect of Linearization on Normalized Compression Distance
Jonathan Mortensen, Julia Wu
DePaul University, July 2009


Page 1: Effect of Linearization on Normalized Compression Distance

Jonathan Mortensen, Julia Wu

DePaul University

July 2009

Page 2: Introduction

• Kolmogorov complexity is an emerging similarity metric:
  ◦ a transformation distance
  ◦ a universal similarity measure
  ◦ does not require feature identification and selection
• How can it be applied to images (CBIR, classification)?
• We investigated its effectiveness and discovered that some fundamentals have been overlooked thus far.

Page 3: Outline

• Background
• Kolmogorov Complexity and CompLearn
• Research Topics
  ◦ Spatial Transformations
  ◦ Intensity Transformations
  ◦ Image Groupings
• Conclusion
• Future Work

Page 4: Background

• Li (2004): successful clustering of phylogeny trees, music, and text files. Does this carry over from 1D to 2D data?
• Tran (2007): NCD is not a good predictor of visual indistinguishability. Only one photograph was used, with only one type of linearization (row-by-row).
• Gondra (2008): CBIR using NCD produced statistically significant results against H0 (random retrieval) and against other similarity measures. The test set had hundreds of images, but compression and concatenation methods were inconsistent and the linearization was unclear.

Page 5: Kolmogorov Complexity

• K(x): the length of the shortest program x* that produces x.
• K(x|y): the length of the shortest binary program that produces x given input y.
• Information distance:

$$E(x,y) = \max\{K(x \mid y),\ K(y \mid x)\}$$

• Normalized Information Distance:

$$\mathrm{NID}(x,y) = \frac{\max\{K(x \mid y^*),\ K(y \mid x^*)\}}{\max\{K(x),\ K(y)\}}$$

Page 6: Kolmogorov Complexity (cont.)

• The NID is universal, in that it captures all other semi-computable normalized distance measures.
• It is therefore itself only semi-computable: it can be approximated, but not computed exactly.
• Lossless compression simplifies strings, so the compressed length C(x) is used as an approximation of K(x). This gives the Normalized Compression Distance:

$$\mathrm{NCD}(x,y) = \frac{C(xy) - \min\{C(x),\ C(y)\}}{\max\{C(x),\ C(y)\}}$$

“The human brain is incapable of creating anything which is really complex.” (A. N. Kolmogorov, Statistical Science, 6, p. 314, 1990)
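A minimal Python sketch of the NCD formula above, to make the terms concrete. It assumes zlib (DEFLATE) as the compressor C and plain byte concatenation for xy; both are illustrative choices, not CompLearn's implementation. Note that zlib only sees repeats within a 32 KB window, so for large images a block compressor such as bzip2 may approximate K(x) better.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """C(x): byte length of the zlib (DEFLATE) compression of the input."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with xy = x + y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    import random

    a = bytes(range(256)) * 40
    noise = bytes(random.Random(0).getrandbits(8) for _ in range(len(a)))
    print(ncd(a, a))      # small: the second copy of a costs little extra
    print(ncd(a, noise))  # near 1: a does not help compress the noise
```

Identical inputs give a small but nonzero NCD because a real compressor spends a few bytes referencing the repeated copy.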

Page 7: CompLearn

• Open-source package that implements Kolmogorov complexity approximations via compression
• Developed by Rudi Cilibrasi, Anna Lissa Cruz, Steven de Rooij, and Maarten Keijzer
• Uses basic Linux compression tools to build the comparison map
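A comparison map of this kind is built from a matrix of pairwise NCDs. The sketch below computes such a matrix; it is hypothetical (the function name `distance_matrix`, the zlib compressor, and the zero diagonal are assumptions), not CompLearn's code or API.

```python
import zlib
from typing import Dict, List


def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = (len(zlib.compress(b, 9)) for b in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: Dict[str, bytes]) -> List[List[float]]:
    """Pairwise NCD matrix in the insertion order of `items`.

    Entry [i][j] concatenates item i before item j, so the matrix need not be
    symmetric; the diagonal is set to 0 by convention here.
    """
    names = list(items)
    return [[0.0 if i == j else ncd(items[a], items[b])
             for j, b in enumerate(names)]
            for i, a in enumerate(names)]


if __name__ == "__main__":
    docs = {"a": b"abc" * 300, "b": b"abd" * 300, "c": bytes(range(256)) * 4}
    for name, row in zip(docs, distance_matrix(docs)):
        print(name, " ".join(f"{d:.3f}" for d in row))
```

Because C(xy) and C(yx) can differ by a few bytes, entries [i][j] and [j][i] may not match exactly, which is the kind of small asymmetry visible in the NCD tables later in the deck.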

Page 8

Page 9: Images from “Google Similar Images”

Page 10: Initial Questions

• Linearization methods and alternatives: how can a 2D signal be preserved?
• How does linearization affect NCD under spatial transformations and intensity shifts?
• Do additional feature images lower NCD?
• CBIR: can Kolmogorov complexity be used with feature vectors or image semantics?

Page 11: Spatial Transformations

• Applied 4 types of linearization to 800 images (originals and their 7 transformations).
• Each linearization type produced distinctly different NCDs.
• Certain linearizations result in lower NCDs for certain transformations.

Page 12: Linearization Methods

• Row major
• Column major
• SCPO: images resized to 35% of original size
• Hilbert-Peano SPC: images resized to 128 × 128 pixels
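A minimal pure-Python sketch of three of the four orderings above; SCPO is context-dependent and is omitted. The Hilbert ordering uses the standard iterative index-to-coordinate conversion, which needs a square image whose side is a power of two, presumably one reason the images were resized to 128 × 128. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def row_major(img: Image) -> List[int]:
    return [p for row in img for p in row]


def column_major(img: Image) -> List[int]:
    return [img[r][c] for c in range(len(img[0])) for r in range(len(img))]


def _rot(s: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    # Rotate/flip a quadrant so the sub-curve inside it has the right orientation.
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on an n x n grid."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert(img: Image) -> List[int]:
    """Read pixels along the Hilbert curve; x is the column, y the row."""
    n = len(img)
    if n != len(img[0]) or n & (n - 1):
        raise ValueError("Hilbert order needs a square image with power-of-2 side")
    return [img[y][x] for x, y in (hilbert_d2xy(n, d) for d in range(n * n))]
```

For `[[1, 2], [3, 4]]` the three orderings give `[1, 2, 3, 4]`, `[1, 3, 2, 4]`, and `[1, 3, 4, 2]`. The Hilbert order keeps consecutive pixels spatially adjacent, which is the property that motivates it over row or column order.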

Page 13: Spatial Transformations

• Original image
• Down shift
• Left shift
• 90° rotation
• 180° rotation
• 270° rotation
• Reflection about the Y axis
• Reflection about the X axis
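The experiment on the previous slides can be sketched with NumPy: build the seven transformations, linearize each one, and compute its NCD against the original. This is a hedged reconstruction, not the authors' code. It assumes circular shifts of `k` pixels, counterclockwise rotations (NumPy's `rot90` convention), zlib as the compressor, and only the row- and column-major linearizations; `spatial_transforms` and `ncd_table` are hypothetical names.

```python
import zlib

import numpy as np


def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = (len(zlib.compress(b, 9)) for b in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def spatial_transforms(img: np.ndarray, k: int = 8) -> dict:
    """Original plus 7 transformations of a 2D uint8 image (assumed conventions)."""
    return {
        "original": img,
        "down_shift": np.roll(img, k, axis=0),   # circular shift down by k rows
        "left_shift": np.roll(img, -k, axis=1),  # circular shift left by k columns
        "rot90": np.rot90(img, 1),               # counterclockwise
        "rot180": np.rot90(img, 2),
        "rot270": np.rot90(img, 3),
        "reflect_y": np.fliplr(img),             # mirror across the vertical (Y) axis
        "reflect_x": np.flipud(img),             # mirror across the horizontal (X) axis
    }


LINEARIZATIONS = {
    "row_major": lambda a: a.tobytes(order="C"),
    "column_major": lambda a: a.tobytes(order="F"),
}


def ncd_table(img: np.ndarray) -> dict:
    """NCD between the original and each transformed image, per linearization."""
    out = {}
    for lin_name, lin in LINEARIZATIONS.items():
        base = lin(img)
        for t_name, t_img in spatial_transforms(img).items():
            out[(t_name, lin_name)] = ncd(base, lin(t_img))
    return out
```

A 90° rotation read in row-major order produces the same pixel sequence as the original read in column-major order, only reversed, which is one way a linearization can favor certain transformations.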

Page 14: Intensity Transformations

• Additive constant
• Three types of noise: Gaussian, speckle, salt and pepper
• Least significant bit (LSB) steganography
• Contrast windowing

Page 15: Additive Constant

• P = intensity + constant, for constants +4, +8, +12, … +100
• Handling values that overflow 255:
  ◦ 16-bit: 255 (+4) → 259
  ◦ Truncation: 255 (+4) → 255
  ◦ Wrap: 255 (+4) → 4
• Example: image 937.jpg with +32 and +64, respectively
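The three overflow policies can be sketched in NumPy as follows. This assumes 8-bit input and that "wrap" means ordinary uint8 (modulo 256) arithmetic. Under that assumption 255 + 4 wraps to 3; the slide's value of 4 would instead correspond to wrapping modulo 255, so the exact rule used is uncertain. `add_constant` is a hypothetical name.

```python
import numpy as np


def add_constant(img: np.ndarray, c: int, mode: str) -> np.ndarray:
    """Add c to every pixel of a uint8 image, handling overflow per `mode`."""
    wide = img.astype(np.int32) + c  # widen first so nothing overflows yet
    if mode == "16bit":
        return wide.astype(np.uint16)  # keep values above 255 (e.g. 259)
    if mode == "truncate":
        return np.clip(wide, 0, 255).astype(np.uint8)  # saturate at 255
    if mode == "wrap":
        return (wide % 256).astype(np.uint8)  # modulo-256 (uint8) arithmetic
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    px = np.array([[255]], dtype=np.uint8)
    for mode in ("16bit", "truncate", "wrap"):
        print(mode, int(add_constant(px, 4, mode)[0, 0]))
```

The 16-bit variant also doubles the byte length of the linearized string, which by itself changes what the compressor sees.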

Page 16: Additive Constant (cont.)

[Figure: Additive Intensity. NCD from original vs. intensity added, with one series each for 16-bit, truncated, and wrap.]

Page 17: Various Noise

• Gaussian (statistical)
• Speckle (multiplicative)
• Salt and pepper (drop-off)

Example images use 0.32 and 0.64 variance / noise density, respectively.
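A sketch of the three noise models, assuming intensities scaled to [0, 1] and clipped back into that range. Speckle is modeled as img + n·img with zero-mean normal n; the tool the authors used may define its parameters differently, so treat this as an illustration of the noise types rather than a reproduction. The function names and the fixed seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable


def gaussian(img: np.ndarray, var: float) -> np.ndarray:
    """Additive zero-mean Gaussian noise with the given variance."""
    return np.clip(img + rng.normal(0.0, np.sqrt(var), img.shape), 0.0, 1.0)


def speckle(img: np.ndarray, var: float) -> np.ndarray:
    """Multiplicative noise: img + n * img, with n zero-mean and variance var."""
    n = rng.normal(0.0, np.sqrt(var), img.shape)
    return np.clip(img + n * img, 0.0, 1.0)


def salt_and_pepper(img: np.ndarray, density: float) -> np.ndarray:
    """Replace a `density` fraction of pixels with 0 or 1, chosen with equal odds."""
    out = img.copy()
    hit = rng.random(img.shape) < density
    out[hit] = rng.integers(0, 2, img.shape)[hit].astype(img.dtype)
    return out
```

Gaussian and speckle noise perturb every pixel, whereas salt and pepper leaves the unhit pixels untouched, so the latter leaves more of the original string for the compressor to reuse.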

Page 18: Noise (cont.)

• Gaussian and speckle noise don't compress well.
• Gaussian and salt-and-pepper noise show some posterior decay.

[Figure: Gaussian and Speckle Noise. NCD from original vs. variance, with Gaussian and speckle series.]

[Figure: Salt Pepper. NCD from original vs. noise density.]

Page 19: Least Significant Bit Steganography

• Hide4PGP:
  ◦ “Scrambles” the message
  ◦ Changes a pixel's bit by switching to the most similar color with the opposite bit assignment
  ◦ Spreads the secret data over the entire file
  ◦ True grayscale: changes two bits per pixel
• Examples: an image with no text, and an image hiding the “Gettysburg Address”
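For intuition, here is a generic sequential LSB embedding in NumPy. It is not Hide4PGP: it neither scrambles nor spreads the message and does no palette search. The `bits_per_pixel=2` option only loosely mirrors the slide's two bits per pixel for true grayscale. `embed_lsb` and `extract_lsb` are hypothetical names.

```python
import numpy as np


def embed_lsb(img: np.ndarray, message: bytes, bits_per_pixel: int = 1) -> np.ndarray:
    """Hide `message` in the low bits of a uint8 image, pixels taken in row order."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    bits = np.concatenate([bits, np.zeros((-len(bits)) % bits_per_pixel, np.uint8)])
    # Pack groups of message bits (most significant first) into one value per pixel.
    weights = 2 ** np.arange(bits_per_pixel - 1, -1, -1)
    chunks = bits.reshape(-1, bits_per_pixel) @ weights
    flat = img.flatten()  # a copy, so the cover image is left unchanged
    if len(chunks) > flat.size:
        raise ValueError("message too long for this image")
    mask = np.uint8((0xFF << bits_per_pixel) & 0xFF)  # clears the low bits
    flat[: len(chunks)] = (flat[: len(chunks)] & mask) | chunks.astype(np.uint8)
    return flat.reshape(img.shape)


def extract_lsb(img: np.ndarray, n_bytes: int, bits_per_pixel: int = 1) -> bytes:
    """Read back `n_bytes` hidden by embed_lsb with the same bits_per_pixel."""
    n_pixels = -(-n_bytes * 8 // bits_per_pixel)  # ceiling division
    low = img.flatten()[:n_pixels] & ((1 << bits_per_pixel) - 1)
    shifts = np.arange(bits_per_pixel - 1, -1, -1)
    bits = ((low[:, None] >> shifts) & 1).astype(np.uint8)
    return np.packbits(bits.ravel()[: n_bytes * 8]).tobytes()
```

Each pixel changes by at most 2^k - 1 gray levels for k hidden bits per pixel, which is why such images look identical to the eye.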

Page 20: LSB Steganography (cont.)

[Figure: LSB Steganography. NCD from original vs. bits hidden.]

Page 21: Hamming Distance

[Figure: Hamming: Steganography. Hamming distance vs. bits hidden.]
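The Hamming distance here counts differing bits between the cover and stego images. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np


def hamming_bits(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two same-shape uint8 images."""
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```

With LSB embedding, a hidden bit flips a cover bit only when the two differ, so for a random-looking message one would expect the Hamming distance to be roughly half the number of bits hidden.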

Page 22

Page 23: Contrast Windowing

• Computed tomography (CT) image enhancement that increases contrast in certain structures
• A brief medical exploration

Page 24: Contrast Windowing (cont.)

• Soft tissue window (center 50 HU, width 350 HU)
• Bone window (center 300 HU, width 1500 HU)
• Lung window (center -200 HU, width 2000 HU)

Patient 5: original image (top left)
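A window maps the Hounsfield-unit range [center - width/2, center + width/2] onto the full gray scale and clips everything outside it. The sketch below uses a plain linear mapping to 8 bits with the window settings from this slide; the exact mapping in the DICOM viewer the authors used may differ slightly, and `window` and `WINDOWS` are hypothetical names.

```python
import numpy as np


def window(hu, center: float, width: float) -> np.ndarray:
    """Linearly map [center - width/2, center + width/2] HU onto 0..255, clipping."""
    lo = center - width / 2.0
    scaled = (np.asarray(hu, dtype=np.float64) - lo) / width * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)


# (center HU, width HU) from the slide
WINDOWS = {"tissue": (50, 350), "bone": (300, 1500), "lung": (-200, 2000)}

if __name__ == "__main__":
    hu = np.array([-1000, -200, 50, 300, 1000])
    for name, (center, width) in WINDOWS.items():
        print(name, window(hu, center, width).tolist())
```

Narrow windows such as the soft tissue window saturate large parts of the image to 0 or 255, which changes how compressible the result is.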

Page 25

NCD between the original and windowed images for patients P1, P3, and P5:

P1        original   bone       lung       tissue
original  0          1.028241   1.049258   1.02429
bone      1.028241   0          1.036157   1.011354
lung      1.049258   1.036157   0          1.039524
tissue    1.02429    1.011519   1.039524   0

P3        original   bone       lung       tissue
original  0          1.02097    1.043942   1.025635
bone      1.020539   0          1.037073   1.014142
lung      1.044137   1.037073   0          1.037244
tissue    1.026016   1.014354   1.037244   0

P5        original   bone       lung       tissue
original  0          1.020947   1.047888   1.023039
bone      1.020947   0          1.038712   1.019146
lung      1.047888   1.038712   0          1.036131
tissue    1.023039   1.019924   1.036131   0

Page 26: Cross-DICOM Comparison

Columns (same order as the rows): p1tiss p1lung p1bone p1 p3tiss p3lung p3bone p3 p5tiss p5lung p5bone p5

p1tiss 0.0000 1.0395 1.0115 1.0243 0.9739 1.0390 1.0157 1.0223 0.9813 1.0325 1.0066 1.0234

p1lung 1.0395 0.0000 1.0362 1.0493 1.0362 0.9772 1.0361 1.0485 1.0410 0.9853 1.0412 1.0477

p1bone 1.0114 1.0362 0.0000 1.0282 1.0158 1.0378 0.9642 1.0278 1.0197 1.0365 0.9761 1.0247

p1 1.0243 1.0493 1.0282 0.0000 1.0255 1.0460 1.0258 0.9811 1.0258 1.0455 1.0240 1.0025

p3tiss 0.9741 1.0362 1.0168 1.0255 0.0000 1.0372 1.0144 1.0260 0.9810 1.0328 1.0140 1.0222

p3lung 1.0390 0.9772 1.0378 1.0460 1.0372 0.0000 1.0371 1.0441 1.0434 0.9874 1.0418 1.0513

p3bone 1.0137 1.0361 0.9650 1.0258 1.0141 1.0371 0.0000 1.0205 1.0175 1.0360 0.9728 1.0220

p3 1.0238 1.0485 1.0271 0.9811 1.0256 1.0439 1.0210 0.0000 1.0278 1.0414 1.0218 0.9997

p5tiss 0.9932 1.0410 1.0180 1.0258 0.9821 1.0434 1.0172 1.0278 0.0000 1.0361 1.0199 1.0230

p5lung 1.0325 0.9853 1.0365 1.0455 1.0328 0.9874 1.0360 1.0414 1.0361 0.0000 1.0387 1.0479

p5bone 1.0062 1.0412 0.9757 1.0240 1.0142 1.0418 0.9724 1.0217 1.0191 1.0387 0.0000 1.0209

p5 1.0234 1.0477 1.0247 1.0025 1.0222 1.0513 1.0220 0.9997 1.0230 1.0479 1.0209 0.0000

Page 27: Conclusion

• "How many" vs. "how little"
• NCD for ordinal comparisons
• Numerical redundancy

[Diagram: transformations placed by extent (entire picture vs. selective) and resulting NCD (larger vs. smaller): steganography, salt and pepper noise, Gaussian and speckle noise, additive constants, contrast windowing.]

Page 28: Feature Image Comparison and Grouping

• Feature image: pixel-based values derived from the original image
• 3 main types of linearization
• Average inter-group NCD > average intra-group NCD
• The greater (inter - intra), the better NCD finds groupings
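The inter/intra criterion can be sketched as follows: average NCD over pairs drawn from the same group and over pairs from different groups. The compressor (zlib), the pairing scheme (all unordered pairs), and the toy data in `demo_groups` are assumptions; `intra_inter` is a hypothetical name, not the authors' code.

```python
import random
import zlib
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = (len(zlib.compress(b, 9)) for b in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def intra_inter(groups: Dict[str, List[bytes]]) -> Tuple[float, float]:
    """Average NCD over same-group pairs (intra) and cross-group pairs (inter)."""
    labeled = [(g, item) for g, items in groups.items() for item in items]
    intra, inter = [], []
    for (ga, a), (gb, b) in combinations(labeled, 2):
        (intra if ga == gb else inter).append(ncd(a, b))
    return mean(intra), mean(inter)


def demo_groups(size: int = 2000) -> Dict[str, List[bytes]]:
    """Hypothetical toy data: two groups, each three near-copies of a random base."""
    groups = {}
    for seed, name in [(1, "A"), (2, "B")]:
        rng = random.Random(seed)
        base = bytes(rng.getrandbits(8) for _ in range(size))
        groups[name] = [base[:-10] + bytes([i]) * 10 for i in range(3)]
    return groups


if __name__ == "__main__":
    intra, inter = intra_inter(demo_groups())
    print(f"intra={intra:.3f} inter={inter:.3f} inter-intra={inter - intra:.3f}")
```

On real images the gap is far smaller than on this toy data, so a significance test over many groups, as in the study, is what makes the comparison meaningful.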

Page 29: Feature Image Linearization

• Image-at-once: row-order linearization of one feature image at a time
• Row concatenation: appends all images, then performs row-order linearization
• Pixel order: takes the value at the same pixel from each feature image, visiting pixels in row order
• Gray row-major: converts the image to grayscale and follows row order on the intensities
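A pure-Python sketch of these orderings for a list of equally sized feature images. Two points are assumptions: row concatenation is taken to append the images side by side (horizontally), and grayscale uses Rec. 601 luma weights. The `copies=3` option illustrates the "gray-triple" string mentioned in the results, assuming it is the whole gray image repeated three times. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # one feature image: rows of 8-bit values


def image_at_once(features: List[Image]) -> List[int]:
    """Row-order linearize each feature image in turn, one after another."""
    return [p for img in features for row in img for p in row]


def row_concatenation(features: List[Image]) -> List[int]:
    """Append the images side by side (assumed), then read in row order."""
    return [p for r in range(len(features[0])) for img in features for p in img[r]]


def pixel_order(features: List[Image]) -> List[int]:
    """For each pixel in row order, emit its value from every feature image."""
    rows, cols = len(features[0]), len(features[0][0])
    return [img[r][c] for r in range(rows) for c in range(cols) for img in features]


def gray_row_major(rgb: List[List[Tuple[int, int, int]]], copies: int = 1) -> List[int]:
    """Grayscale via Rec. 601 weights (assumed), row order, repeated `copies` times."""
    gray = [round(0.299 * r + 0.587 * g + 0.114 * b) for row in rgb for (r, g, b) in row]
    return gray * copies
```

For feature images `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]`, the first three functions give `[1, 2, 3, 4, 5, 6, 7, 8]`, `[1, 2, 5, 6, 3, 4, 7, 8]`, and `[1, 5, 2, 6, 3, 7, 4, 8]`; wrap the result in `bytes(...)` before computing an NCD.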

Page 30

Page 31: Data Set and Methods

• Corel image database with 10 predefined groupings
• Linearized by 5 methods
• NCDs were computed within each group, and then against the neighboring groups to the left and to the right

Page 32

Page 33: Results

• Nearly every linearization produced statistically different NCDs.
• Intra-group NCD was always less than inter-group NCD.
• Gray provided the greatest inter-minus-intra difference; we thought this was due to file size.
• Triple-concatenated gray (creating equal file sizes) produced an even greater difference.

Page 34

Page 35

Page 36

Page 37: Conclusion

• NCD is a good model for predefined human groupings, and linearization has little impact on this.
• Gray-triple row-major may be the best form of linearization.
• The direction of concatenation does not matter.
• We defined a methodology for any number of feature images.

Page 38: Conclusion (cont.)

• Compressor errors
• Numerical redundancy: ordinal vs. nominal variables. Example:
  ◦ 195 195 195 195 <=> 198 198 198 198: NCD = 0.100000
  ◦ 199 199 199 199 <=> 202 202 202 202: NCD = 0.128205
  ◦ Both pairs differ by the same amount (3), yet their NCDs differ: the compressor treats values as nominal symbols, not ordinal quantities.
• NCD needs refinement.
• Should a 2D image be treated as a 1D string?
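The comparison above can be rerun with the sketch below. The slide does not say how the values were serialized or which compressor produced its numbers, so both encodings here (decimal text and one raw byte per pixel) and zlib are assumptions, and the printed values will not match the slide's. One plausible reading of "numerical redundancy" is that decimal text lets a compressor reuse shared digits (195 and 198 share "19", while 199 and 202 share none) even though the numeric gap is identical. Function names are hypothetical.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = (len(zlib.compress(b, 9)) for b in (x, y, x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def as_text(value: int, n: int = 4) -> bytes:
    """n copies of a pixel value as decimal text, e.g. b"195 195 195 195"."""
    return " ".join([str(value)] * n).encode("ascii")


def as_raw(value: int, n: int = 4) -> bytes:
    """n copies of a pixel value as raw bytes, one byte per pixel."""
    return bytes([value] * n)


def compare(pairs):
    """(text NCD, raw NCD) for each (a, b) pair of pixel values."""
    return {(a, b): (ncd(as_text(a), as_text(b)), ncd(as_raw(a), as_raw(b)))
            for a, b in pairs}


if __name__ == "__main__":
    for (a, b), (text_ncd, raw_ncd) in compare([(195, 198), (199, 202)]).items():
        print(f"{a} vs {b}: text NCD = {text_ncd:.4f}, raw NCD = {raw_ncd:.4f}")
```

On inputs this short, the compressor's fixed header and checksum dominate the compressed length, which is one concrete form of the "compressor errors" listed above.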

Page 39: Future Work

• Image scaling and normalization
• Additional feature images
• New forms of image concatenation
• Investigate compressors (numeric?)

Page 40: References

A. Itani and D. Manohar. Self-describing context-based pixel ordering. Lecture Notes in Computer Science, pages 124-134, 2002.

M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi. The similarity metric. IEEE Transactions on Information Theory, 50(12), 2004.

R. Dafner, D. Cohen-Or, and Y. Matias. Context-based space filling curves. In Computer Graphics Forum, volume 19, pages 209-218. Blackwell Publishers Ltd, 2000.

R. Cilibrasi, A. L. Cruz, S. de Rooij, and M. Keijzer. CompLearn home. http://www.complearn.org/.

R. Cilibrasi, P. Vitányi, and R. de Wolf. Algorithmic clustering of music. arXiv preprint cs.SD/0303025, 2003.

N. Tran. The normalized compression distance and image distinguishability. Proceedings of SPIE, 6492:64921D, 2007.

I. Gondra and D. R. Heisterkamp. Content-based image retrieval with the normalized information distance. Computer Vision and Image Understanding, 111(2):219-228, 2008.

Page 41: Questions