
Stream ciphers 2

Session 2

Contents

• PN generators with LFSRs

• Statistical testing of PN generator sequences

• Cryptanalysis of stream ciphers

2/75

PN generators with LFSRs

• Computational complexity of the Berlekamp-Massey algorithm is quadratic in the length of the minimum LFSR capable of generating the intercepted sequence.

• Thus, if the linear complexity is very high, then the task of predicting the next bits of the sequence is too complex.

3/75

PN generators with LFSRs

• The linear complexity achievable with a single LFSR is small: it cannot exceed the length of the register.

• Then, in order to prevent the cryptanalysis of a pseudorandom sequence generator, we must design it in such a way that its linear complexity is too high for the practical application of the Berlekamp-Massey algorithm.
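To illustrate the point above, the following minimal sketch (not part of the original slides) runs the Berlekamp-Massey algorithm on the output of a single 4-stage LFSR. The helper names `lfsr` and `linear_complexity`, the recurrence s[t+4] = s[t] XOR s[t+1] and the initial state are assumptions chosen for the example.

```python
def lfsr(taps, state, n):
    """Generate n bits with s[t+L] = XOR of s[t+k] for k in taps (L = len(state))."""
    s = list(state)
    L = len(state)
    for t in range(n - L):
        bit = 0
        for k in taps:
            bit ^= s[t + k]
        s.append(bit)
    return s[:n]


def linear_complexity(s):
    """Berlekamp-Massey over GF(2): length of the shortest LFSR generating s."""
    n = len(s)
    c = [0] * n          # current connection polynomial
    b = [0] * n          # connection polynomial before the last length change
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # discrepancy between s[i] and the bit predicted by the current LFSR
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L


# Hypothetical 4-stage LFSR (assumed taps and initial state)
seq = lfsr(taps=(0, 1), state=(1, 0, 0, 0), n=30)
print(linear_complexity(seq))  # 4: an attacker needs only about 2*4 = 8 bits
```

The two nested loops make the quadratic cost visible, and the result shows why a single register, however good its statistics, is predictable from a short intercepted segment.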

4/75

PN generators with LFSRs

• Since LFSRs produce output sequences with good statistical properties, it is a good idea to base PN generators on LFSRs.

• But to increase the linear complexity, we have to combine the outputs of several LFSRs in a non-linear manner, through non-linear Boolean functions.

5/75

Algebraic normal form

• It is the form of a Boolean function that uses only the operations $\oplus$ (sum modulo 2, XOR) and $\cdot$ (product modulo 2, AND).

• In the ANF, the number of variables in the product that includes the largest number of variables is called the non-linear order of the function.

• Example: the non-linear order of the function $f(x_1,x_2,x_3) = x_1 \oplus x_1x_3 \oplus x_2x_3$ is 2.

6/75

Algebraic normal form

• The ANF of a Boolean function can be determined from its truth table.

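$$f(x_0, x_1, \ldots, x_{n-1}) = \bigoplus_{u \in \{0,1\}^n} a_u \prod_{i=0}^{n-1} x_i^{u_i}, \qquad a_u \in \{0,1\},\quad u = (u_0, u_1, \ldots, u_{n-1})$$

$$a_u = \bigoplus_{x:\, x \le u} f(x) \qquad \text{(the Möbius transform)}$$

where $x \le u$ means $x_i \le u_i$ for every $i$.

7/75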

Algebraic normal form

• Example: n=3

8/75

x0 x1 x2 | f
 0  0  0 | 0
 0  0  1 | 1
 0  1  0 | 0
 0  1  1 | 1
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 1
 1  1  1 | 0

Algebraic normal form

• u=000: a000 = f(0,0,0) = 0

• u=001: a001 = f(0,0,0) + f(0,0,1) = 0 + 1 = 1

• u=010: a010 = f(0,0,0) + f(0,1,0) = 0 + 0 = 0

9/75

Algebraic normal form

• u=011: a011 = f(0,0,0) + f(0,0,1) + f(0,1,0) + f(0,1,1) = 0 + 1 + 0 + 1 = 0

• u=100: a100 = f(0,0,0) + f(1,0,0) = 0 + 0 = 0

• u=101: a101 = f(0,0,0) + f(0,0,1) + f(1,0,0) + f(1,0,1) = 0 + 1 + 0 + 1 = 0

10/75

Algebraic normal form

• u=110: a110 = f(0,0,0) + f(0,1,0) + f(1,0,0) + f(1,1,0) = 0 + 0 + 0 + 1 = 1

• u=111: a111 = f(0,0,0) + f(0,0,1) + f(0,1,0) + f(0,1,1) + f(1,0,0) + f(1,0,1) + f(1,1,0) + f(1,1,1) = 0 + 1 + 0 + 1 + 0 + 1 + 1 + 0 = 0

Then: f(x0,x1,x2) = a001 x2 + a110 x0x1 = x2 + x0x1

11/75
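The coefficient-by-coefficient computation above can be sketched in a few lines. This is a minimal illustration, not the slides' own code: the function name `anf_coefficients` is hypothetical, and the truth table is assumed to be indexed by the integer whose binary digits are x0 x1 x2 (x0 most significant), as in the n=3 example.

```python
def anf_coefficients(truth_table, n):
    """Binary Moebius transform: a[u] = XOR of f(x) over all x with x_i <= u_i."""
    a = list(truth_table)
    for i in range(n):
        step = 1 << i
        for u in range(1 << n):
            if u & step:
                # fold in the entry that has bit i cleared
                a[u] ^= a[u ^ step]
    return a


# Truth table of the n=3 example, rows 000, 001, ..., 111
f = [0, 1, 0, 1, 0, 1, 1, 0]
a = anf_coefficients(f, 3)
print(a)  # [0, 1, 0, 0, 0, 0, 1, 0] -> only a001 and a110 are 1

# Monomials present in the ANF, written as the bit pattern u
print([format(u, "03b") for u, coeff in enumerate(a) if coeff])  # ['001', '110']
```

The pattern 001 stands for the monomial x2 and 110 for x0x1, which agrees with f = x2 + x0x1.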

Non-linear combiners

• In these generators, the keystream sequence is obtained by combining the output sequences of various LFSRs in a non linear manner.

• Example – it is possible to use a Boolean function (without memory).

12/75

Non-linear combiners

• If F is a Boolean function of N periodic input sequences a1(t), a2(t), ..., aN(t), then the output sequence b(t) = F(a1(t), a2(t), ..., aN(t)) is a linear combination of various products of sequences.

• The products involved are given by the ANF of the function F.

13/75

Non-linear combiners

• Given the ANF of the function F, if we create a function F* from F in such a way that instead of the sum and product modulo 2 in F we use the sum and product of integers, for the linear complexity and the period of the output sequence of F the following holds:

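$$\mathrm{LC}(b) \le F^*\big(\mathrm{LC}(a_1), \mathrm{LC}(a_2), \ldots, \mathrm{LC}(a_N)\big)$$

$$\mathrm{Per}(b) \le \operatorname{lcm}\big(\mathrm{Per}(a_1), \mathrm{Per}(a_2), \ldots, \mathrm{Per}(a_N)\big)$$

14/75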

Non-linear combiners

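• Example (1)

$$F(x_0,x_1,x_2) = x_0 \oplus x_0x_1 \oplus x_0x_2$$

$$F^*(x_0,x_1,x_2) = x_0 + x_0x_1 + x_0x_2$$

– If the characteristic polynomials of the input sequences are:

$$a_0:\ 1 + X + X^4, \qquad a_1:\ 1 + X^3 + X^4, \qquad a_2:\ 1 + X^2 + X^5$$

All these polynomials are primitive!

15/75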

Non-linear combiners

• Example (2)

– Then

$$\mathrm{LC}(b) \le 4 + 4\cdot 4 + 4\cdot 5 = 40$$

$$\mathrm{Per}(b) \le \operatorname{lcm}(15, 15, 31) = 465$$

16/75
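The two bounds of this example can be checked with a short calculation. The sketch below is illustrative only: it assumes the combining function F = x0 XOR x0x1 XOR x0x2 and register lengths 4, 4 and 5 of the example, and the name `f_star` is hypothetical.

```python
from math import lcm


def f_star(x0, x1, x2):
    """Integer version F* of F = x0 XOR x0*x1 XOR x0*x2 (sum and product over the integers)."""
    return x0 + x0 * x1 + x0 * x2


lc = (4, 4, 5)                           # linear complexities of a0, a1, a2
periods = tuple(2**L - 1 for L in lc)    # primitive polynomials give periods 2^L - 1

lc_bound = f_star(*lc)
per_bound = lcm(*periods)
print(lc_bound, per_bound)  # 40 465
```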

Non-linear combiners

• The sum of N sequences in GF(q) (1)

$$\mathrm{LC}(b) \le \sum_{i=1}^{N} \mathrm{LC}(a_i)$$

– The equality holds if the characteristic polynomials of the input sequences do not have common factors.

17/75

Non-linear combiners

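• The sum of N sequences in GF(q) (2)

$$\text{If}\quad \mathrm{LC}(b) = \sum_{i=1}^{N} \mathrm{LC}(a_i) \quad\text{then}\quad \mathrm{Per}(b) = \operatorname{lcm}\big(\mathrm{Per}(a_1), \mathrm{Per}(a_2), \ldots, \mathrm{Per}(a_N)\big)$$

– Obviously, if the periods of the input sequences are mutually prime then

$$\mathrm{Per}(b) = \prod_{i=1}^{N} \mathrm{Per}(a_i)$$

18/75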

Non-linear combiners

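• The sum of N sequences in GF(q) (3)

– Example:

$$f_1(X) = 1 + X^5 + X^6 + X^{10} + X^{61}$$

$$f_2(X) = 1 + X^3 + X^5 + X^6 + X^{89}$$

Primitive! The periods are Mersenne primes.

$$\mathrm{LC}(b) = 61 + 89 = 150$$

$$\mathrm{Per}(b) = (2^{61}-1)(2^{89}-1)$$

19/75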

Non-linear combiners

• The product of N sequences in GF(q) (1)

– Theorem (Golić, 1989)
  • If Per(ai) are mutually prime, then

$$\mathrm{LC}(b) = \prod_{i=1}^{N} \mathrm{LC}(a_i)$$

– Theorem (Lidl, Niederreiter)
  • If Per(ai) are mutually prime, then

$$\mathrm{Per}(b) = \prod_{i=1}^{N} \mathrm{Per}(a_i)$$

20/75

Non-linear combiners

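• Example

$$f_1(X) = 1 + X^5 + X^6 + X^{10} + X^{61}$$

$$f_2(X) = 1 + X^3 + X^5 + X^6 + X^{89}$$

Primitive! The periods are Mersenne primes.

$$\mathrm{LC}(b) = 61 \cdot 89 = 5429$$

$$\mathrm{Per}(b) = (2^{61}-1)(2^{89}-1)$$

21/75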

Non-linear combiners

• The general case (1)

– Let $\hat{F}$ be the Boolean function obtained by removing all the products from the function F except those of the maximum order. Let $\hat{F}^*$ be the corresponding integer function.

22/75

Non-linear combiners

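• The general case (2)

– Theorem (Golić, 1989)
  • F depends on all the N input variables.
  • Per(ai) are mutually prime.
  • Then

$$\mathrm{LC}(b) \ge \hat{F}^*\big(\mathrm{LC}(a_1)-1, \ldots, \mathrm{LC}(a_N)-1\big)$$

$$\mathrm{Per}(b) = \prod_{i=1}^{N} \mathrm{Per}(a_i)$$

23/75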

Non-linear combiners

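• The general case (3)

– Example (1)

$$F(x_0,x_1,x_2) = x_0 \oplus x_0x_1 \oplus x_0x_2$$

$$F^*(x_0,x_1,x_2) = x_0 + x_0x_1 + x_0x_2$$

$$\hat{F}(x_0,x_1,x_2) = x_0x_1 \oplus x_0x_2$$

$$\hat{F}^*(x_0,x_1,x_2) = x_0x_1 + x_0x_2$$

24/75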

Non-linear combiners

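• The general case (4)

– Example (2)
  • If the characteristic polynomials of the input sequences are:

$$f_0(X) = 1 + X^5 + X^6 + X^{10} + X^{61}$$

$$f_1(X) = 1 + X^3 + X^5 + X^6 + X^{89}$$

$$f_2(X) = 1 + X^4 + X^7 + X^9 + X^{107}$$

Primitive, periods Mersenne primes

  • Then

$$\mathrm{LC}(b) \ge 60 \cdot 88 + 60 \cdot 106 = 11640$$

$$\mathrm{Per}(b) = (2^{61}-1)(2^{89}-1)(2^{107}-1)$$

25/75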

Non-linear combiners

• The general case (5)

– Example – Geffe’s generator (1)

$$F(x_1,x_2,x_3) = x_1x_2 \oplus (1 \oplus x_2)x_3 = x_1x_2 \oplus x_2x_3 \oplus x_3$$

26/75

Non-linear combiners

• The general case (6)– Example – Geffe’s generator (2) –

• Equivalent scheme

27/75

Non-linear combiners

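• The general case (7)

– Example – Geffe’s generator (3)
  • If we set the feedback polynomials primitive, with periods that are Mersenne primes:

$$f_1(X) = 1 + X^5 + X^6 + X^{10} + X^{61}$$

$$f_2(X) = 1 + X^3 + X^5 + X^6 + X^{89}$$

$$f_3(X) = 1 + X^4 + X^7 + X^9 + X^{107}$$

  • Then

$$\mathrm{LC}(b) \ge 60 \cdot 88 + 88 \cdot 106 = 14608$$

$$\mathrm{Per}(b) = (2^{61}-1)(2^{89}-1)(2^{107}-1)$$

28/75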
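The lower bound for the general case can be evaluated mechanically once the ANF is known. The following sketch is an illustration under assumptions: each monomial is represented as a tuple of variable indices, and the helper names `top_order_terms` and `integer_anf` are hypothetical. It evaluates the bound for Geffe's generator and for the earlier three-register example.

```python
from math import prod


def top_order_terms(monomials):
    """Keep only the products of maximum order (the function F-hat)."""
    order = max(len(m) for m in monomials)
    return [m for m in monomials if len(m) == order]


def integer_anf(monomials, values):
    """Evaluate the integer function: sums and products over the integers."""
    return sum(prod(values[i] for i in m) for m in monomials)


# Geffe: F = x1x2 XOR x2x3 XOR x3, registers of length 61, 89, 107
geffe = [(1, 2), (2, 3), (3,)]
lc = {1: 61, 2: 89, 3: 107}
geffe_bound = integer_anf(top_order_terms(geffe), {i: L - 1 for i, L in lc.items()})
print(geffe_bound)  # 14608

# Earlier example: F = x0 XOR x0x1 XOR x0x2 with the same register lengths
earlier = [(0,), (0, 1), (0, 2)]
lc0 = {0: 61, 1: 89, 2: 107}
earlier_bound = integer_anf(top_order_terms(earlier), {i: L - 1 for i, L in lc0.items()})
print(earlier_bound)  # 11640

# Period: product of the three (mutually prime) periods
period = prod(2**L - 1 for L in lc.values())
```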

Statistical testing of PN generators

• The output sequence of a generator of pseudorandom sequences looks random, but it is not.

• Pseudorandom generators expand a truly random sequence (the key) to a much longer sequence, such that an adversary cannot distinguish between the pseudorandom sequence and a truly random sequence.

29/75

Statistical testing of PN generators

• In order to obtain a guarantee of the security of this type of generators, various statistical tests are applied, especially designed for this purpose.

• The fact that a generator passes a set of statistical tests should be considered a necessary condition, although not a sufficient one, for the security of the generator.

30/75

Statistical testing of PN generators

• If the result X of an experiment can take any real value, then X is a continuous random variable.

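• The probability density function f(x) of a continuous random variable X can be integrated and the following holds:

– $f(x) \ge 0$ for all $x \in \mathbb{R}$
– $\int_{-\infty}^{\infty} f(x)\,dx = 1$
– For all $a, b \in \mathbb{R}$ the following holds: $P(a < X \le b) = \int_a^b f(x)\,dx$

31/75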

Statistical testing of PN generators

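• A continuous random variable has a normal distribution with mean $\mu$ and variance $\sigma^2$ if its probability density function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty$$

• We say that X is $N(\mu, \sigma^2)$.

• If X is $N(0,1)$, then we say that X has a standard normal distribution.

32/75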

Statistical testing of PN generators

• If the random variable X is $N(\mu, \sigma^2)$, then the variable $Z = (X-\mu)/\sigma$ is $N(0,1)$.

• Euler’s gamma function:

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx, \qquad t > 0$$

33/75

Statistical testing of PN generators

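• A continuous random variable X has a $\chi^2$ distribution with $\nu$ degrees of freedom if its probability density function is

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(\nu/2)\,2^{\nu/2}}\, x^{\nu/2-1} e^{-x/2}, & 0 \le x < \infty \\[2mm] 0, & x < 0 \end{cases}$$

34/75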

Statistical testing of PN generators

• A statistical hypothesis H is an affirmation about the distribution of one or more random variables.

• A hypothesis test is a procedure based on the observed values of the random variable that leads to the acceptance or rejection of the hypothesis H.

35/75

Statistical testing of PN generators

• The test only provides a measure of the strength of evidence given by the data against the hypothesis.

• The conclusion is probabilistic.

• The level of significance $\alpha$ of the test of the hypothesis H is the probability of rejecting the hypothesis H when it is true.

36/75

Statistical testing of PN generators

• The hypothesis to be tested is called the null hypothesis, H0.

• The alternative hypothesis is denoted by H1 or Ha.

• In cryptography:
– H0 – the given generator is a random sequence generator.
– $\alpha$ is between 0.001 and 0.05.

37/75

Statistical testing of PN generators

• A test:

– Determines a statistic for the sample of the output sequence.
– This statistic is compared with the expected value for a random sequence.

38/75

Statistical testing of PN generators

• How is the comparison carried out? (1)

– The computed statistic $X_0$ (usually) follows a $\chi^2$ distribution with $\nu$ degrees of freedom.
– It is assumed that this statistic takes large values for non-random sequences.

39/75

Statistical testing of PN generators

• How is the comparison carried out? (2)

– In order to achieve the significance level $\alpha$, a threshold $X_\alpha$ is chosen (by means of the corresponding table), such that $P(X_0 > X_\alpha) = \alpha$.
– If the value of the statistic for the sample of the output sequence, $X_s$, satisfies $X_s > X_\alpha$, then the sequence fails the test.

40/75

Statistical testing of PN generators

• Basic tests for cryptographic use:
– frequency test,
– serial test,
– poker test,
– runs test,
– autocorrelation test,
– etc.

41/75

Statistical testing of PN generators

• Frequency test (1)

– Purpose: determine if the number of zeros and ones in a sequence s is approximately the same.
– n0 – number of zeros, n1 – number of ones, n = n0 + n1.
– The statistic:

$$X_1 = \frac{(n_0 - n_1)^2}{n}$$

42/75

Statistical testing of PN generators

• Frequency test (2)

– The statistic follows a $\chi^2$ distribution with 1 degree of freedom.
– The approximation is good enough if $n \ge 10$.
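A minimal sketch of the frequency test follows. The 160-bit sample sequence, the function name `frequency_statistic` and the choice of significance level 0.05 (threshold 3.8415 for 1 degree of freedom) are assumptions made for illustration.

```python
# Assumed sample: a 40-bit pattern repeated four times (n = 160)
SAMPLE = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "") * 4
bits = [int(c) for c in SAMPLE]


def frequency_statistic(bits):
    """X1 = (n0 - n1)^2 / n."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    return (n0 - n1) ** 2 / n


CHI2_1DOF_ALPHA_005 = 3.8415  # chi-square threshold, 1 degree of freedom, alpha = 0.05

x1 = frequency_statistic(bits)
print(x1)                          # 0.4 (n0 = 84, n1 = 76)
print(x1 <= CHI2_1DOF_ALPHA_005)   # True: the sample passes this test
```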

43/75

Statistical testing of PN generators

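• Serial test (1)

– Tries to determine if the number of occurrences of 00, 01, 10 and 11, as subsequences of s, is approximately the same.
– The subsequences may overlap, so $n_{00} + n_{01} + n_{10} + n_{11} = n - 1$.
– The statistic:

$$X_2 = \frac{4}{n-1}\left(n_{00}^2 + n_{01}^2 + n_{10}^2 + n_{11}^2\right) - \frac{2}{n}\left(n_0^2 + n_1^2\right) + 1$$

44/75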

Statistical testing of PN generators

• Serial test (2)

– The statistic follows a $\chi^2$ distribution with 2 degrees of freedom.
– The approximation is good enough if $n \ge 21$.
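The serial test can be sketched in the same way. The sample sequence, the function name `serial_statistic` and the threshold 5.9915 (2 degrees of freedom, significance level 0.05) are assumptions for illustration.

```python
# Assumed sample: a 40-bit pattern repeated four times (n = 160)
SAMPLE = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "") * 4
bits = [int(c) for c in SAMPLE]


def serial_statistic(bits):
    """X2 from the counts of overlapping pairs 00, 01, 10, 11 and of single bits."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    pairs = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(bits, bits[1:]):      # n - 1 overlapping pairs
        pairs[(a, b)] += 1
    sum_sq = sum(v * v for v in pairs.values())
    x2 = 4 / (n - 1) * sum_sq - 2 / n * (n0 * n0 + n1 * n1) + 1
    return x2, pairs


CHI2_2DOF_ALPHA_005 = 5.9915  # chi-square threshold, 2 degrees of freedom, alpha = 0.05

x2, pairs = serial_statistic(bits)
print(pairs)                       # n00 = 44, n01 = 40, n10 = 40, n11 = 35
print(round(x2, 4))                # 0.6252
print(x2 <= CHI2_2DOF_ALPHA_005)   # True: the sample passes this test
```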

45/75

Statistical testing of PN generators

• Poker test (1)

– A positive integer m is considered such that $\lfloor n/m \rfloor \ge 5 \cdot 2^m$, and $k = \lfloor n/m \rfloor$.
– The sequence s is divided into k parts of size m.
– $n_i$ is the number of occurrences of the type i of the sequence of length m, $0 \le i \le 2^m - 1$ (that is, i is the value of the integer whose binary representation is the sequence of length m).

46/75

Statistical testing of PN generators

• Poker test (2)

– The test determines if every sequence of length m appears approximately the same number of times.
– The statistic:

$$X_3 = \frac{2^m}{k}\left(\sum_{i=0}^{2^m-1} n_i^2\right) - k$$

– The statistic follows approximately a $\chi^2$ distribution with $2^m - 1$ degrees of freedom.

47/75
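A minimal sketch of the poker test, under the same assumptions as before: the sample sequence and the function name `poker_statistic` are illustrative, m = 3 is chosen so that the condition on n and m holds for n = 160, and 14.0671 is the threshold for 7 degrees of freedom at significance level 0.05.

```python
# Assumed sample: a 40-bit pattern repeated four times (n = 160)
SAMPLE = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "") * 4
bits = [int(c) for c in SAMPLE]


def poker_statistic(bits, m):
    """X3 from the counts of the 2^m block types among k non-overlapping blocks."""
    k = len(bits) // m
    if k < 5 * 2**m:
        raise ValueError("sequence too short for this block size m")
    counts = [0] * (2**m)
    for j in range(k):
        block = bits[j * m:(j + 1) * m]
        value = int("".join(map(str, block)), 2)   # block read as a binary number
        counts[value] += 1
    x3 = (2**m / k) * sum(c * c for c in counts) - k
    return x3, counts


CHI2_7DOF_ALPHA_005 = 14.0671  # chi-square threshold, 2^3 - 1 = 7 degrees of freedom

x3, counts = poker_statistic(bits, m=3)
print(counts)                      # [5, 10, 6, 4, 12, 3, 6, 7]
print(round(x3, 4))                # 9.6415
print(x3 <= CHI2_7DOF_ALPHA_005)   # True: the sample passes this test
```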

Statistical testing of PN generators

• Runs test (1)

– A run of length i – a subsequence of s formed by i consecutive zeros or i consecutive ones that are neither preceded nor followed by the same symbol.
– A run of zeros – gap
– A run of ones – block

48/75

Statistical testing of PN generators

• Runs test (2)

– Purpose: determine if the number of runs of different lengths in the sequence s is that expected in a random sequence.
– The expected number of gaps (or blocks) of length i in a random sequence of length n is

$$e_i = \frac{n - i + 3}{2^{i+2}}$$

– k is taken to be the largest integer i for which $e_i \ge 5$.

49/75

Statistical testing of PN generators

• Runs test (3)

– We denote by $B_i$ and $H_i$ the number of blocks and gaps of length i in s, for each i, $1 \le i \le k$.
– The statistic

$$X_4 = \sum_{i=1}^{k} \frac{(B_i - e_i)^2}{e_i} + \sum_{i=1}^{k} \frac{(H_i - e_i)^2}{e_i}$$

– The statistic follows approximately a $\chi^2$ distribution with $2k-2$ degrees of freedom.

50/75
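The runs test can be sketched as follows. The sample sequence and the function name `runs_statistic` are assumptions for illustration; 9.4877 is the threshold for 2k-2 = 4 degrees of freedom at significance level 0.05.

```python
from itertools import groupby

# Assumed sample: a 40-bit pattern repeated four times (n = 160)
SAMPLE = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "") * 4
bits = [int(c) for c in SAMPLE]


def runs_statistic(bits):
    """X4 from the numbers of blocks (runs of ones) and gaps (runs of zeros)."""
    n = len(bits)

    def e(i):
        # expected number of gaps (or blocks) of length i
        return (n - i + 3) / 2 ** (i + 2)

    k = 0
    while e(k + 1) >= 5:          # largest i with e_i >= 5
        k += 1
    if k == 0:
        raise ValueError("sequence too short for the runs test")

    blocks = [0] * (k + 1)
    gaps = [0] * (k + 1)
    for symbol, run in groupby(bits):
        length = len(list(run))
        if length <= k:           # longer runs do not enter the statistic
            (blocks if symbol == 1 else gaps)[length] += 1

    x4 = sum((blocks[i] - e(i)) ** 2 / e(i) + (gaps[i] - e(i)) ** 2 / e(i)
             for i in range(1, k + 1))
    return x4, k, blocks[1:], gaps[1:]


CHI2_4DOF_ALPHA_005 = 9.4877  # chi-square threshold, 2k - 2 = 4 degrees of freedom

x4, k, blocks, gaps = runs_statistic(bits)
print(k, blocks, gaps)             # 3 [25, 4, 5] [8, 20, 12]
print(round(x4, 4))                # 31.7913
print(x4 <= CHI2_4DOF_ALPHA_005)   # False: the sample fails the runs test
```

The same sample that passes the frequency, serial and poker tests fails here, which illustrates why a battery of tests is applied rather than a single one.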

Statistical testing of PN generators

• Autocorrelation test (1)

– Checks the correlation between s and shifted versions of s.
– An integer d, $1 \le d \le \lfloor n/2 \rfloor$, is considered.
– The number of bits in s that are not equal to their d-shifts is

$$A(d) = \sum_{i=0}^{n-d-1} s_i \oplus s_{i+d}$$

51/75

Statistical testing of PN generators

• Autocorrelation test (2)

– The statistic

$$X_5 = \frac{2\left(A(d) - \frac{n-d}{2}\right)}{\sqrt{n-d}}$$

– The statistic follows approximately a $N(0,1)$ distribution.
– The approximation is good enough if $n - d \ge 10$.

52/75
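A minimal sketch of the autocorrelation test follows. The sample sequence, the shift d = 8, the function name `autocorrelation_statistic` and the two-sided threshold 1.96 (significance level 0.05 for a standard normal statistic) are assumptions for illustration.

```python
from math import sqrt

# Assumed sample: a 40-bit pattern repeated four times (n = 160)
SAMPLE = "11100 01100 01000 10100 11101 11100 10010 01001".replace(" ", "") * 4
bits = [int(c) for c in SAMPLE]


def autocorrelation_statistic(bits, d):
    """Return A(d), the number of positions where s differs from its d-shift, and X5."""
    n = len(bits)
    a_d = sum(bits[i] ^ bits[i + d] for i in range(n - d))
    x5 = 2 * (a_d - (n - d) / 2) / sqrt(n - d)
    return a_d, x5


NORMAL_TWO_SIDED_ALPHA_005 = 1.96  # |X5| above this value is rejected at alpha = 0.05

a_d, x5 = autocorrelation_statistic(bits, d=8)
print(a_d)                                   # 100
print(round(x5, 4))                          # 3.8933
print(abs(x5) <= NORMAL_TWO_SIDED_ALPHA_005)  # False: the sample fails this test
```

Because small values of A(d) are as suspicious as large ones, a two-sided comparison is used in this sketch.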

Cryptanalysis of stream ciphers

53/75

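[Figure: A enciphers the plaintext with the KEY and sends the ciphertext to B, who deciphers it with the same KEY to recover the plaintext; cryptanalysis attempts to decrypt the intercepted ciphertext.]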

Cryptanalysis of stream ciphers

• The problem of cryptanalysis

– Given some information related to the cryptosystem (at least the ciphertext), determine the plaintext and/or the key.

• The goal of the designer is to make this problem as difficult as possible for the cryptanalyst.

54/75

Cryptanalysis of stream ciphers

• General assumption – all the details of the cryptosystem are known to the cryptanalyst.

• The only unknown is the key.

• Types of attack
– Ciphertext-only attack
– Known plaintext attack
– Chosen plaintext attack
– Chosen ciphertext attack

55/75

Cryptanalysis of stream ciphers

• The ciphertext-only attack is the most difficult one for the cryptanalyst (in general).

• The more information known to the cryptanalyst, the easier the attack.

56/75

Cryptanalysis of stream ciphers

• The “brute force attack”

– Elementary attack – no knowledge about cryptanalysis is necessary.
– Assumptions
  • The cryptosystem is known
  • The ciphertext is known
– The goal
  • Determine the key/plaintext
– The means
  • Trying all the possible keys

57/75

Cryptanalysis of stream ciphers

• Complexity of the brute force attack

– Extremely high, if there are many possible keys – impractical

• Key space – the total number of keys possible in a cryptosystem

58/75

Cryptanalysis of stream ciphers

• Examples of key space size

59/75

Key space – 40 bits | $1 \cdot 10^{12}$
Key space – 56 bits (DES) | $7 \cdot 10^{16}$
Key space – 128 bits | $3 \cdot 10^{38}$
Key space – 256 bits | $1 \cdot 10^{77}$
Number of 256-bit primes | $1 \cdot 10^{72}$
Age of the Sun in seconds | $1 \cdot 10^{16}$
Number of clock pulses of a 3GHz computer clock through the Sun’s age | $5.4 \cdot 10^{26}$

Cryptanalysis of stream ciphers

• A cryptosystem’s security is ultimately determined by the size of its key space

• However, this is the upper limit of that security measure

• There may be a problem in the system design that may cause a significant reduction of the effective key space

• The task of the cryptanalyst – to find this pitfall and to use it to attack the system

60/75

Cryptanalysis of stream ciphers

• Basic attack methods against stream (and block) ciphers
– Algebraic
– Statistical

• Algebraic attacks (1)

– The key symbols (e.g. bits) are the unknowns in the system of equations assigned to the PRNG

61/75

Cryptanalysis of stream ciphers

• Algebraic attacks (2)

– Given all the details of the PRNG to be cryptanalyzed (except the key bits), determine the system of equations that relates the bits of the output sequence with the bits of the key
– The designer’s goal
  • To make this system as non-linear as possible
  • The reason
    – non-linear systems are difficult to solve – there is no general method other than trying all the possible values of the variables: $2^n$ possibilities for a system with n variables.

62/75

Cryptanalysis of stream ciphers

• Algebraic attacks (3)

– The problem of solving a non-linear system in GF(2) – the satisfiability problem (SAT)
– Cook’s theorem (1971)
  • SAT is NP-complete
– However, some instances of the SAT problem may be easier to solve
– The designer should check the system assigned to the PRNG

63/75

Cryptanalysis of stream ciphers

• Algebraic attacks (4)

– Example – LFSR with $f(x) = 1 + x + x^4$
– The output sequence: 1110…
– The initial state: a0, a1, a2, a3
– The output bits: y0=1, y1=1, y2=1, y3=0
– The equations

$$y_0 = a_0 \oplus a_3, \qquad y_1 = y_0 \oplus a_1, \qquad y_2 = y_1 \oplus a_2, \qquad y_3 = y_2 \oplus a_3$$

a  | 3 2 1 0
y0 | 1 1 0 0
y1 | 1 1 1 0
y2 | 1 1 1 1
y3 | 0 1 1 1

Linear system – easy to solve!

64/75
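To show how easily such a linear system is solved, the sketch below applies Gauss-Jordan elimination over GF(2). It assumes the equations of this example read y0 = a0 + a3, y1 = y0 + a1, y2 = y1 + a2, y3 = y2 + a3 (sums modulo 2), rewritten in terms of the unknowns a0..a3; the function name `solve_gf2` is hypothetical.

```python
def solve_gf2(rows, rhs):
    """Solve A*a = y over GF(2) by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(rows)
    m = [list(r) + [b] for r, b in zip(rows, rhs)]   # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col])
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                m[r] = [x ^ y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


# Columns: a0, a1, a2, a3
A = [
    [1, 0, 0, 1],   # y0 = a0 + a3
    [1, 1, 0, 1],   # y1 = y0 + a1 = a0 + a1 + a3
    [1, 1, 1, 1],   # y2 = y1 + a2 = a0 + a1 + a2 + a3
    [1, 1, 1, 0],   # y3 = y2 + a3 = a0 + a1 + a2
]
y = [1, 1, 1, 0]    # intercepted output bits y0..y3

state = solve_gf2(A, y)
print(state)  # [0, 0, 0, 1] -> a0 = a1 = a2 = 0, a3 = 1
```

Four output bits are enough to recover the four unknown state bits, and the cost of the elimination grows only polynomially with the register length.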

Cryptanalysis of stream ciphers

• Algebraic attacks (5)– Example (1): consider the non-linear PRNG below

65/75

Cryptanalysis of stream ciphers

• Algebraic attacks (6)

– Example (2): The system of equations
  • (1) y1 = (x1+x4)(x5+x7) = x1x5 + x1x7 + x4x5 + x4x7
  • (2) y2 = (x1+x4+x3)(x5+x7+x6) = x1x5 + x1x7 + x1x6 + x4x5 + x4x7 + x4x6 + x3x5 + x3x7 + x3x6
  • … (we need 7 independent equations)

66/75

Cryptanalysis of stream ciphers

• Algebraic attacks (7)

– Example (3): Methods of solving the system
  • The brute force method: try all the possible $2^7 - 1$ solutions (the all-zero solution is not permitted)
  • The linearization method
    – Replace all the products by new variables
    – Solve the obtained linear system (e.g. by Gaussian elimination)
    – Try to guess the variables that were included in the products, given the values of the new variables, in such a way that the overall system is consistent

67/75

Cryptanalysis of stream ciphers

• Algebraic attacks (8)– Example (4): The linearized system

• y1=z1+z2+z3+z4

• y2=z1+z2+z5+z3+z4+z6+z7+z8+z9

• ...

68/75

Cryptanalysis of stream ciphers

• Algebraic attacks (9)

– Other methods of solving non-linear systems, applied in cryptanalysis
  • Linear consistency test (LCT)
  • Methods of computational commutative algebra (Gröbner bases etc.)
  • etc.
– No matter how sophisticated the method applied to solve the system, cryptanalysis of a seriously designed system always includes search

69/75

Cryptanalysis of stream ciphers

• Statistical methods (1)

– In the previous example, the majority of the output symbols will be zero, due to the AND combining function
– The non-linearity of the assigned system of equations is the highest possible
– However, it is possible to make use of bad statistical properties of the output sequence to determine the plaintext sequence

70/75

Cryptanalysis of stream ciphers

• Statistical methods (2)

– Example

• With the AND output combiner, the probability of zero in the output sequence will be ¾.

• This means that, upon enciphering with this sequence as the keystream, the probability that the plaintext bit is equal to the ciphertext bit is ¾.

• Consequence – easy reconstruction of the plaintext.
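The probabilities quoted above can be confirmed by enumerating all cases. The sketch assumes that the two inputs of the AND combiner and the plaintext bit are independent and uniformly distributed; the variable names are illustrative.

```python
from itertools import product

cases = list(product((0, 1), repeat=3))   # (input a, input b, plaintext bit p)

zeros = 0
agree = 0
for a, b, p in cases:
    z = a & b          # keystream bit produced by the AND combiner
    c = p ^ z          # ciphertext bit
    zeros += (z == 0)
    agree += (c == p)

p_zero = zeros / len(cases)
p_agree = agree / len(cases)
print(p_zero, p_agree)  # 0.75 0.75
```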

71/75

Cryptanalysis of stream ciphers

• Statistical methods (3)

– Correlation – The output sequence coincides too much with one or more internal sequences – this enables correlation attacks – a kind of statistical attack.
– Correlation attacks
  • It is possible to divide the task of the cryptanalyst into several less difficult tasks – “Divide and conquer”

72/75

Cryptanalysis of stream ciphers

• Statistical methods (4)

– Typical example – the Geffe’s generator

$$F(x_1,x_2,x_3) = x_1x_2 \oplus (1 \oplus x_2)x_3 = x_1x_2 \oplus x_2x_3 \oplus x_3$$

F balanced – good statistical properties

73/75

Cryptanalysis of stream ciphers

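• Statistical methods (5)

– Problem: Correlation!

$$\Pr(F = x_1) = \Pr(x_2 = 1) + \Pr(x_2 = 0)\,\Pr(x_3 = x_1) = \tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{4}$$

$$\Pr(F = x_3) = \Pr(x_2 = 0) + \Pr(x_2 = 1)\,\Pr(x_1 = x_3) = \tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{3}{4}$$

74/75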

Cryptanalysis of stream ciphers

• Statistical methods (6)

– Since the output sequence is correlated with both input sequences, we can independently guess the input sequences’ bits with high probability if the output sequence is known.

75/75
