Diffusion and Data compression for data security

A.J. Han Vinck, University of Duisburg-Essen
April 2013, [email protected]
content

- Why is diffusion important?
- Why is data compression important?
- Unicity distance - the time to discover a secret
- Source coding principle
- How data compression works
- Zipf's law
Han Vinck 2013
Diffusion-transposition
HOW:
rearrange the symbols in the data without changing the symbols
i.e. the frequency of symbols remains the same
GOAL:
destroy the relations between symbols
and make it more difficult to analyze!
ANALYSIS:
index of coincidence, finding periods
example of diffusion
a scytale is a tool used to perform a transposition cipher:
http://www.youtube.com/watch?v=VeH0KnZtljY&feature=related
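The scytale's wrapping can be sketched as a simple columnar transposition (a minimal Python illustration, not from the slides; the number of rows plays the role of the rod's circumference):

```python
def scytale_encrypt(text: str, rows: int) -> str:
    # Write the text along the rod and read it off row by row.
    # Only positions change; symbol frequencies stay the same.
    assert len(text) % rows == 0, "pad the text to a multiple of rows"
    return "".join(text[i::rows] for i in range(rows))

def scytale_decrypt(cipher: str, rows: int) -> str:
    # Unwrapping is the same rearrangement with the other dimension,
    # which is why the same equipment can encipher and decipher.
    return scytale_encrypt(cipher, len(cipher) // rows)

msg = "ATTACKATDAWN"                 # 12 letters, rod circumference 4
ct = scytale_encrypt(msg, 4)
assert scytale_decrypt(ct, 4) == msg
assert sorted(ct) == sorted(msg)     # diffusion only: same letter counts
```

Note that the ciphertext passes a frequency count unchanged, which is exactly why the index of coincidence is the tool for analyzing transpositions.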
Confusion and diffusion in AES

General round structure (substitution - transposition - substitution):
- Substitute bytes
- Shift rows
- Mix columns
- Add round key

Same equipment can be used to decipher:
http://www.youtube.com/watch?v=mlzxpkdXP58
Data compression
The goal of data compression is to create:
- a compact representation of the data to be encrypted
- independent symbols

Decompression gives the original data back!
Source coding in Message encryption (1)

Part 1 | Part 2 | ... | Part n    (for example, every part 56 bits)

Each part is enciphered separately with the key, giving n cryptograms;
the receiver deciphers each of them to recover Part 1, Part 2, ..., Part n.

Attacker: n cryptograms to analyze for a particular message of n parts.

Dependency exists between the parts of the message,
and hence dependency exists between the cryptograms.
Source coding in Message encryption (2)

Part 1 | Part 2 | ... | Part n    (for example, every part 56 bits)

The parts are first source encoded (n-to-1 compression), then enciphered
with the key into 1 cryptogram; the receiver deciphers and source decodes
to recover Part 1, Part 2, ..., Part n.

Attacker:
- 1 cryptogram to analyze for a particular message of n parts
- assume a data compression factor of n-to-1

Hence, less material for the same message!
The position of crypto in a Communication model

source → analogue-to-digital conversion → compression/reduction → security → error protection → from bit to signal (digital)
Source coding

Two principles:
- data reduction: remove irrelevant data (lossy, gives errors)
- data compression: present data in a compact (short) way (lossless)

Transmitter side: original data → remove irrelevance → relevant data → compact description
Receiver side: "unpack" → "original data"
Illustration lossless/lossy

[Figure: lossless reconstruction returns the original; lossy reconstruction returns ≈ original]
What do we want (need)?

All data symbols to be enciphered must
occur with equal probability
and
be independent of each other
Example:
- suppose we have a dictionary with 30,000 words
- these can be numbered (encoded) with 15 bits
- if the average word length is 5, we need "on the average" 3 bits per letter
This can happen
Letter frequency of the Vigenère cipher
How to compress? (binary 1)

source x = (x1, x2, ..., xN), xi ∈ {0,1}

- #0's = f0·N, #1's = f1·N; F = (f0, f1) is the composition of x
- Then, the number of different vectors x for a given F is

      |x_F| = N! / ((f0·N)! · (f1·N)!)

  and the number of bits/symbol needed to represent x is

      (1/N)·log2 |x_F| ≈ -f0·log2 f0 - f1·log2 f1 = -Σ_{i=0}^{1} fi·log2 fi   (the entropy!)
en- and decoding

source → x (N letters) → encoder → { F (composition), lexicographical index for x } → decoder → x

- To transmit the value of F, we need (1/N)·log2(N+1) bits/output letter → 0 for N large
- For N large, fi → pi, and thus -Σ_{i=0}^{1} fi·log2 fi is equal to the Shannon entropy!

Lexicographical en- and decoding is a solved problem in computer science.
exercise

- For sequences of length 12 with 4 ones and 8 zeros:
  - give the lexicographical index for the sequence 1 0 0 1 0 0 1 0 0 1 0 0
  - what is the sequence that belongs to the index 512?
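The exercise can be attacked with standard enumerative (lexicographical) coding. A sketch follows, using the 0-based convention that 0 sorts before 1; the slides do not fix a convention, so the numeric index below depends on this choice:

```python
from math import comb

def rank(bits, n, k):
    # 0-based lexicographical index of a length-n bit list with k ones,
    # among all such lists, with 0 sorting before 1
    idx, ones_left = 0, k
    for pos, b in enumerate(bits):
        if b == 1:
            # skip every sequence that still has a 0 in this position
            idx += comb(n - pos - 1, ones_left)
            ones_left -= 1
    return idx

def unrank(idx, n, k):
    # inverse of rank: rebuild the sequence from its index
    bits, ones_left = [], k
    for pos in range(n):
        c = comb(n - pos - 1, ones_left)   # sequences with a 0 here
        if idx >= c:
            bits.append(1)
            idx -= c
            ones_left -= 1
        else:
            bits.append(0)
    return bits

seq = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
idx = rank(seq, 12, 4)       # 398 under this 0-based convention
assert unrank(idx, 12, 4) == seq
```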
Binary entropy

Interpretation: let a binary sequence of length n contain p·n ones; then we can specify
each sequence with

    log2 C(n, p·n) ≈ log2 2^(n·h(p)) = n·h(p) bits,

i.e.  lim_{n→∞} (1/n)·log2 C(n, p·n) = h(p).

Homework: prove the approximation using ln N! ≈ N·ln N for N large.
Use also the change of base: log_a x = log_b x · log_a b.
The Stirling approximation: N! ≈ √(2πN) · N^N · e^(-N)
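The approximation log2 C(n, p·n) ≈ n·h(p) can be checked numerically (a quick sketch, not part of the slides):

```python
from math import comb, log2

def h(p):
    # the binary entropy function, in bits
    return -p * log2(p) - (1 - p) * log2(1 - p)

n, p = 1000, 0.3
exact = log2(comb(n, int(p * n)))   # log2 of the number of such sequences
approx = n * h(p)                   # the slide's approximation n*h(p)
assert abs(exact - approx) / approx < 0.01   # within 1% already at n = 1000
```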
[Figure: plot of the binary entropy h as a function of p, 0 ≤ p ≤ 1]

The binary entropy: h(p) = -p·log2 p - (1-p)·log2 (1-p)

Note: h(p) = h(1-p)
references

- Information theory books
- MPEG, JPEG, ...
Application to text: symbols are words

The distribution of words follows the law of Zipf (1935):
let fn denote the frequency of the n-th most frequent word; then fn = A/n.
English: A = 0.1

For M = 12366 words:  -Σ_{i=1}^{M} fi·log2 fi ≈ 9.72 bits/word;
with an average word length of 4.5 letters,
the number of bits/letter ≈ 9.72 / 4.5 ≈ 2.16
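The numbers M = 12366 and 9.72 bits/word can be reproduced from Zipf's law directly (a sketch; M = 12366 is roughly where the probabilities fn = 0.1/n sum to 1):

```python
from math import log2

A, M = 0.1, 12366
p = [A / n for n in range(1, M + 1)]     # Zipf: fn = A/n
assert abs(sum(p) - 1.0) < 1e-3          # probabilities sum to ~1 at M = 12366

H = -sum(pi * log2(pi) for pi in p)      # entropy in bits per word, ~9.72
avg_word_length = 4.5
assert abs(H - 9.72) < 0.05
assert abs(H / avg_word_length - 2.16) < 0.02   # ~2.16 bits per letter
```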
Zipf‘s law
A web site with many references and applications:
http://linkage.rockefeller.edu/wli/zipf/index_ru.html

[Figure: web sites rank-ordered by their popularity]
Unicity distance (3)

Idea: for a stream cipher, after some time L the plaintext and keystream
can be determined uniquely from the cipher stream.
The smallest value of L where this is possible is called the UNICITY DISTANCE U.

A necessary condition:

    |ML| × |K| ≤ |CL|,   where |·| means cardinality (the number of elements)

(otherwise, when |ML| × |K| > |CL|, some plaintexts give the same cipher)
Unicity distance (4)

From |ML| × |K| ≤ |CL| we have:

    log|ML| + log|K| ≤ log|CL|

With log|CL| = L·log|C1| and log|ML| = L·R (R = rate of the source in bits per letter):

    log|K| ≤ L·log|C1| - L·R

and

    U ≥ log|K| / (log|M1| - R),   where |M1| = |C1|

IMPORTANT NOTE: log|M1| - R is the maximum redundancy of the source sequence!
For low redundancy, U goes to infinity.
A probabilistic approach (Hellman)

Messages 1, 2, ..., |ML| on one side, cryptograms 1, 2, ..., |CL| on the other;
equally probable messages, equally probable keys, |K| outgoing arrows per message.

Let z(ci) be the number of arrows entering cryptogram ci.
Since the number of outgoing arrows equals the number of incoming arrows:

    |ML|·|K| = Σ_{i=1}^{|CL|} z(ci)   and   P(ci) = z(ci) / (|ML|·|K|)

The expected number of arrows entering the observed cryptogram is

    z̄ = Σ_{i=1}^{|CL|} z(ci)·P(ci) = Σ_i z(ci)² / (|ML|·|K|) ≥ (|ML|·|K|) / |CL|

(proof: consider Σ_i [z(ci) - a/n]² ≥ 0; with Σ_i z(ci) = a over n terms, this gives Σ_i z(ci)² ≥ a²/n)

z̄ = 1 gives the same result as before (one unique pair (M, C)).
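The lower bound on the expected number of entering arrows holds for any way of distributing the |ML|·|K| arrows over the cryptograms; a small random simulation (illustrative, with made-up sizes) confirms it:

```python
import random

random.seed(1)
M, K, C = 16, 8, 64              # toy sizes for |ML|, |K|, |CL|
z = [0] * C                      # z[c] = number of arrows entering cryptogram c
for _ in range(M * K):           # each (message, key) pair sends one arrow
    z[random.randrange(C)] += 1

assert sum(z) == M * K           # outgoing arrows = incoming arrows
zbar = sum(v * v for v in z) / (M * K)   # expected arrows into the observed cryptogram
assert zbar >= M * K / C         # the Cauchy-Schwarz lower bound from the slide
```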
Examples: Unicity distance (5)

Assume that the German language has a rate R of 2 bits per letter.

- Then, for a substitution cipher with 26! keys, or
  a permutation cipher with period 26 (26! keys), we have:

      U ≥ log2 26! / (log2 26 - 2) ≈ 32

- For a Vigenere cipher of length 80:

      U ≥ 80·log2 26 / (log2 26 - 2) ≈ 140

- Try to find U for the DES.
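The two numbers on this slide follow directly from U ≈ log2|K| / (log2|M| - R); a quick check, with R = 2 bits/letter as assumed above:

```python
import math

R = 2.0                                   # assumed rate of German, bits/letter
redundancy = math.log2(26) - R            # bits of redundancy per letter

U_subst = math.log2(math.factorial(26)) / redundancy   # 26! keys
U_vigenere = 80 * math.log2(26) / redundancy           # key length 80

assert 32 < U_subst < 34        # the slide's ~32 letters
assert 138 < U_vigenere < 141   # the slide's ~140 letters
```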
Conclusion: Unicity distance (6)

It is important to make the value of R as high as possible for a large U.

Hence: source compression before encryption is important for secure communications.

Note added: given the message to the analyst, the value of R = 0.
Hence, given the ciphertext and plaintext:  U ≥ log|K| / log|M|
THE MARCONI FELLOWS

1999 - Professor James L. Massey, Marconi Award citation:
"For theoretical and practical contributions to cryptography and related coding problems; teacher and mentor to a generation of scientists and technologists"

Professor Massey made significant advances in forward-error-correcting codes, multi-user communications, and cryptographic systems. In addition, Professor Massey is known for his contributions to the field of engineering education. He is currently an Adjunct Professor at the University of Lund, Sweden.

Professor James L. Massey - A GREAT SCIENTIST and TEACHER! MOTTO: SIMPLE but SOLID
Data compression (M-ary 1)

source x = (x1, x2, ..., xN), xi ∈ {1, 2, ..., M}

- Suppose that a source generates N independent M-ary symbols
- The frequency of symbol i is fi, and thus fi·N symbols i occur in x
- We call F = (f1, f2, ..., fM) the composition of x
- Then, the number of different vectors x for a given F is

      |x_F| = N! / ((f1·N)! · (f2·N)! ··· (fM·N)!)

  and the number of bits/symbol needed to represent x is

      (1/N)·log2 |x_F| ≈ -Σ_{i=1}^{M} fi·log2 fi   (the entropy!)
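As in the binary case, (1/N)·log2 of the multinomial coefficient approaches the entropy of the composition (a numeric sketch; M = 3 symbols and the composition (1/2, 1/3, 1/6) are chosen here only for illustration):

```python
from math import factorial, log2

N = 600
counts = [300, 200, 100]        # fi*N for the composition F = (1/2, 1/3, 1/6)

multinomial = factorial(N)      # |x_F| = N! / ((f1 N)! (f2 N)! (f3 N)!)
for c in counts:
    multinomial //= factorial(c)

bits_per_symbol = log2(multinomial) / N
entropy = -sum((c / N) * log2(c / N) for c in counts)
assert abs(bits_per_symbol - entropy) < 0.03   # approaches the entropy as N grows
```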
en- and decoding

source → x (N letters) → encoder → { F (composition), lexicographical index for x } → decoder → x

- To transmit the value of F, we need ((M-1)/N)·log2(N+1) bits/output letter → 0 for N large
- For N large, fi → pi, and thus -Σ_{i=1}^{M} fi·log2 fi is equal to the Shannon entropy!

Lexicographical en- and decoding is a solved problem in computer science.