
Master Project

Mohammad Shafiqul Haque 2010-06-14 Subject: Mathematics Level: Advanced Course code: 4MA11E

Title: Various statistical tests of pseudorandom number generators

Abstract

This thesis is concerned with various statistical tests of pseudorandom number generators. I discuss some aspects of selecting and testing pseudorandom number generators. The outputs of such generators may be used in many cryptographic applications, such as the generation of key material. After carrying out the statistical tests, I compare the test values of each generator and discuss which generators produce good sequences and can therefore be regarded as good generators.

Key-words: Pseudorandom number generator (PRNG), statistical test, test value.

Acknowledgments

I would like to thank my supervisor Marcus Nilsson for accepting me and for giving me the chance and the encouragement to write my thesis under his kind supervision. I also want to thank my head of department and the teachers who helped us in different ways. I am thankful to my parents and friends, who supported and encouraged me during my studies. I also thank the university library and the university lab for their excellent support. Finally, I thank the Swedish government for the excellent opportunity to study in Sweden.


Contents

1 Introduction
   1.1 Aim of the Project
   1.2 Random and pseudorandom number generators (RNGs and PRNG)
   1.3 Need for statistical test and why?

2 Applications of PRNG
   2.1 Simulation
   2.2 Cryptography

3 Some Types of PRNGs
   3.1 Linear Congruential Generator
   3.2 How to produce a bit sequence

4 Probability Distribution
   4.1 Gamma distribution
   4.2 chi-square (χ2) distribution

5 chi-square tests

6 Statistical tests
   6.1 Monobit test
   6.2 Twobit test
   6.3 Threebit test
   6.4 Subsequences of length t
   6.5 General discussion about testvalues percentage

7 Investigation of some pseudorandom number generators
   7.1 Generator 1
   7.2 Generator 2
   7.3 Generator 3
   7.4 Generator 4
   7.5 Generator 5

8 Conclusion

References

9 Appendix


1 Introduction

There are two basic types of generators for creating random sequences: random number generators (RNGs) and pseudorandom number generators (PRNGs). Generally, an RNG uses a non-deterministic, i.e. unpredictable, source together with some processing function to generate a random sequence. The output of an RNG may be used directly as a random number or may be fed into a PRNG, which creates a sequence of pseudorandom bits from an initial value, called a seed, by using a known algorithm. Random numbers play an important role in network security applications, and the need for random and pseudorandom numbers arises in many cryptographic applications. Cryptographic protocols require random or pseudorandom inputs at various points, for example for the auxiliary quantities used in generating digital signatures and in authentication.

Random numbers can be generated by spinning wheels, rolling dice or shuffling cards. Nowadays we can also produce pseudorandom number sequences on a computer. Such sequences are useful in many applications, for example simulation, sampling, numerical analysis and computer programming (see [1]). A PRNG with one or more inputs can generate a large quantity of "pseudorandom" numbers. Truly random numbers are obtained from natural sources of randomness; thermal noise in a semiconductor resistor, for example, is a good source of randomness. On the other hand, just as flipping coins to generate random bits would not be practical for cryptographic applications, most natural sources are not practical, because sampling them is inherently slow and it is difficult to ensure that an opponent does not observe the process (see [2]). There are many methods for creating pseudorandom numbers; the linear congruential method and the Blum Blum Shub method are widely used.

1.1 Aim of the Project

The aim of the project is to apply various statistical tests, implemented in Mathematica, to several pseudorandom number generators and, from the resulting test values, to discuss the randomness of the sequences they produce.

1.2 Random and pseudorandom number generators (RNGs and PRNG)

At the heart of simulations of random models is a method for producing random numbers: a procedure or function that will churn out number after number, uniformly distributed in the interval [0,1]. The method explained in this section is the one used by most programming languages that have built-in random number generators. In fact, the random number generator is a specific formula that produces numbers in a completely deterministic way, which contradicts the very idea of randomness. Consequently, the numbers produced are often called pseudorandom: although they follow a very definite pattern, they appear to have no pattern that can be detected without knowing the exact formula used. The fact that the same sequence of pseudorandom numbers is generated each time the generator is started from the same seed is even useful, since it helps in debugging programs and in understanding the results of a simulation.

The second type of generator is the pseudorandom number generator (PRNG). A PRNG uses one or more inputs, called seeds, and generates multiple pseudorandom numbers. In contexts in which unpredictability is needed, the seed itself must be random and unpredictable. Hence, by default, a PRNG should obtain its seeds from the outputs of an RNG, so a PRNG requires an RNG as a companion. The outputs of a PRNG are typically deterministic functions of the seed, so all true randomness is confined to seed generation. The deterministic nature of the process leads to the term pseudorandom. Since each element of a pseudorandom sequence is reproducible from its seed, only the seed needs to be saved if reproduction or validation of the pseudorandom sequence is required.

Ironically, pseudorandom numbers often appear to be more random than random numbers obtained from physical sources. If a pseudorandom sequence is properly constructed, each value in the sequence is produced from the previous value via transformations that appear to introduce additional randomness. A series of such transformations can eliminate statistical autocorrelations between input and output. Thus, the outputs of a PRNG may have better statistical properties and be produced faster than those of an RNG. Randomness means "no pattern"; pseudorandomness means "no apparent pattern".

Start with positive integers MULT (for multiplier), ADDR (for adder) and NORM (for normalizer). SEED is to be a pseudorandom number satisfying

0 ≤ SEED < NORM

Each time a new random number is needed, it is produced from the previous value of

SEED by the formula

SEED := (MULT ∗ SEED + ADDR) mod NORM

That is, first SEED is multiplied by MULT, then ADDR is added, and finally the remainder upon division by NORM is the new value of SEED.

Example 1.1. Use these values

MULT := 6 ADDR := 5 NORM := 11

What values of SEED will be produced if the initial value of SEED is 0? What if the initial value is 4?

Each time, the new value of SEED is

SEED := (6 ∗ SEED + 5) mod 11.

With SEED initially 0 this sequence is generated:

0, 5, 2, 6, 8, 9, 4, 7, 3, 1, 0, 5, 2, 6, 8, ...

With SEED initially 4 this sequence is generated:

4, 7, 3, 1, 0, 5, 2, 6, 8, 9, 4, 7, 3, 1, 0, 5, 2, ...

From Example 1.1 two facts are apparent. First, since the next number in the sequence is generated from only the value of the previous number, if any number is generated again, the entire list repeats from that point on. Second, the "cycle length", that is, the number of distinct numbers generated before repetition occurs, can be at most NORM. This is because the mod function produces the remainder upon division by NORM, which is necessarily a number between 0 and NORM-1; thus there are only NORM possible values of SEED, and as soon as one is repeated, the entire list repeats. A good random number generator uses values of MULT, ADDR and NORM for which the cycle length is large.
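The update rule of Example 1.1 can be sketched in a few lines of Python. This is only an illustration: the thesis itself works in Mathematica, and the function name lcg_sequence and its argument names are chosen here for the example.

```python
def lcg_sequence(mult, addr, norm, seed, count):
    """Return `count` values, starting with the seed itself.

    Each value is obtained from the previous one by
    SEED := (MULT * SEED + ADDR) mod NORM.
    """
    values = [seed]
    for _ in range(count - 1):
        seed = (mult * seed + addr) % norm
        values.append(seed)
    return values


# The two starting values used in Example 1.1.
print(lcg_sequence(6, 5, 11, seed=0, count=15))
print(lcg_sequence(6, 5, 11, seed=4, count=17))
```

Both calls reproduce the sequences listed in Example 1.1. The value 10 never appears in either sequence, because 10 is mapped to itself by this rule.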


Example 1.2. Consider the generator

SEED := (6 ∗ SEED + 3) mod 7

Then with different initial values of SEED, these sequences are generated:

0, 3, 0, 3, 0, 3, 0, 3, 0, 3, ...

1, 2, 1, 2, 1, 2, 1, 2, 1, 2, ...

4, 6, 4, 6, 4, 6, 4, 6, 4, 6, ...

5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ...

Although the value NORM = 7 suggests that the cycle length might reach the maximum value of 7, the actual cycle lengths are small; if SEED is initially 5, the generator is quite useless, since it produces the same number forever. The theory of which parameter values result in good random number generators is complicated and is more a subject of abstract algebra than of probability (see [3]).
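The cycle lengths in Example 1.2 can be checked mechanically. The following Python sketch is an illustration only; the helper name cycle_length is hypothetical. It iterates the rule until a value repeats and reports the length of the cycle that was entered.

```python
def cycle_length(mult, addr, norm, seed):
    """Length of the cycle the generator eventually enters from `seed`."""
    first_seen = {}  # value -> step at which it first appeared
    step = 0
    while seed not in first_seen:
        first_seen[seed] = step
        seed = (mult * seed + addr) % norm
        step += 1
    # `seed` is the first repeated value; the cycle spans the steps since it was first seen.
    return step - first_seen[seed]


# Example 1.2: SEED := (6 * SEED + 3) mod 7, for every possible starting value.
for start in range(7):
    print(start, cycle_length(6, 3, 7, start))
```

For this generator every starting value leads to a cycle of length 2, except the starting value 5, which gives a cycle of length 1.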

1.3 Need for statistical test and why?

This thesis is concerned with statistical tests, which are carried out in Mathematica. Without a statistical test we cannot obtain test values for a sequence, and without test values we cannot judge the randomness of the sequence. After carrying out the statistical tests we will therefore examine the randomness of the sequences (see [4]).

2 Applications of PRNG

2.1 Simulation

Simulation is essentially a technique of controlled statistical sampling, used in combination with a model to obtain approximate answers to questions about complex, multifactorial probabilistic problems. It is most useful when numerical and analytical techniques cannot provide an answer. Simulation is in fact a statistical experiment performed on a digital computer (see [5]). Statisticians use simulation to estimate characteristics of distributions, such as means of functions, quantiles and other quantities that cannot be calculated in closed form. When using a simulation estimator, it is good practice to compute a measure of how accurate the estimate is, in addition to the estimate itself (see [6]). Simulation is a technique that can shed light on how a complex system works even when a thorough analysis is not available.

Example 2.1. Engineers can simulate traffic patterns in the area surrounding a planned construction to see what effects different restrictions might have. A physicist can simulate the behaviour of gas molecules under conditions that are not covered by any known theory. Statisticians use simulation to estimate probabilities in models that cannot be analysed exactly. Because simulation introduces an element of randomness into an analysis, it is sometimes called Monte Carlo analysis, after the famous European gambling centre (see [6]).


2.2 Cryptography

Cryptography is the practice of hiding information. It draws on mathematics, computer science and engineering, and it is used in everyday applications such as ATM (automated teller machine) cards, computer passwords and electronic commerce. Pseudorandom number generators are closely related to cryptography: the design of pseudorandom number generators is one of the main issues in stream ciphers, where pseudorandom sequences are often employed as keystreams (see [7]).

Example 2.2. In cryptography, to encrypt a plaintext we can use a pseudorandom bit sequence as a keystream. To decrypt the ciphertext we use the same keystream and recover the plaintext. Pseudorandom numbers are therefore essential in cryptographic applications.

Plaintext:   01101001100
Keystream:   10110010111

Adding these bitwise (modulo 2) gives the ciphertext 11011011011.

If we again add the keystream bitwise to the ciphertext, we get back our plaintext:

Ciphertext:  11011011011
Keystream:   10110010111
Plaintext:   01101001100
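The bitwise addition in Example 2.2 is the exclusive-or (XOR) of the two bit strings. A minimal Python sketch follows, assuming bit strings are stored as text; the function name xor_bits is chosen here for illustration.

```python
def xor_bits(bits, keystream):
    """Add two equal-length bit strings position by position, modulo 2."""
    if len(bits) != len(keystream):
        raise ValueError("bit strings must have the same length")
    return "".join(str(int(b) ^ int(k)) for b, k in zip(bits, keystream))


plaintext = "01101001100"
keystream = "10110010111"

ciphertext = xor_bits(plaintext, keystream)   # encryption
recovered = xor_bits(ciphertext, keystream)   # decryption uses the same keystream

print(ciphertext)
print(recovered)
```

Encryption and decryption are the same operation, because adding the same bit twice modulo 2 cancels out.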

3 Some Types of PRNGs

3.1 Linear Congruential Generator

The most widely used technique for pseudorandom number generation is the linear congruential method, first introduced by D. H. Lehmer in 1949. We need to choose four parameters, which are given in the table below.

Parameter   Range
m           0 < m
a           0 ≤ a < m
c           0 ≤ c < m
X0          0 ≤ X0 < m

The sequence of random numbers (X_n) is obtained via the following equation

\[ X_{n+1} \equiv (aX_n + c) \pmod{m}, \qquad 0 \le X_{n+1} < m. \tag{3.1} \]

This produces a sequence of integers, each in the range 0 ≤ X_n < m (see [4]).

Example 3.1. Consider the linear congruential generator with parameters m = 5, a = 2, c = 1 and X_0 = 2. We get

\[
\begin{aligned}
X_1 &\equiv 2 \cdot X_0 + 1 \pmod{5} \\
    &\equiv 2 \cdot 2 + 1 \pmod{5} \\
    &\equiv 5 \pmod{5} \\
    &\equiv 0 \pmod{5},
\end{aligned}
\]

and

\[ X_2 \equiv 2 \cdot 0 + 1 \equiv 1 \pmod{5}. \]

In a similar way we get

\[ X_3 \equiv 3, \quad X_4 \equiv 2, \quad X_5 \equiv 0, \quad X_6 \equiv 1 \pmod{5}. \]

The sequence is 0 1 3 2 0 1 3 2 ..., which repeats with period 4.

3.2 How to produce a bit sequence

We now discuss how to produce a bit sequence. Consider a linear congruential equation modulo m with starting value X_0 = 2:

\[ X_n \equiv 3X_{n-1} + 5 \pmod{m}, \qquad X_0 = 2. \tag{3.2} \]

Let S_m denote the sequence of numbers produced by this equation. From S_m we can produce a binary sequence by using the last bit, the last three bits or the last six bits of every number in S_m.

Example 3.2. Take m = 11 in our linear congruential equation. We can then produce binary sequences by using the last bit, the last three bits and the last six bits of every number in S_11. The linear congruential equation becomes

\[ X_n \equiv 3X_{n-1} + 5 \pmod{11}, \qquad X_0 = 2. \]

For n = 1 we get

\[ X_1 \equiv 3X_0 + 5 \equiv 3 \cdot 2 + 5 \equiv 11 \equiv 0 \pmod{11}, \]

so X_1 = 0. For n = 2 we get

\[ X_2 \equiv 3X_1 + 5 \equiv 3 \cdot 0 + 5 \equiv 5 \pmod{11}, \]

so X_2 = 5.

In a similar way we get the sequence 2 0 5 9 10 2 ... Taking the last bit of every number turns it into the binary sequence 0 0 1 1 0 0.

To use the last three bits of every number in S_11, we first write each number in binary. For example,

\[ 2 = 0 \cdot 2^0 + 1 \cdot 2^1, \]

so the binary representation of 2 is 1 0. In a similar way:

The binary representation of 0 is 0
The binary representation of 5 is 1 0 1
The binary representation of 9 is 1 0 0 1
The binary representation of 10 is 1 0 1 0

Using the last bit of every number in S_11 gives 0, 0, 1, 1, 0, 0.

Using the last three bits of every number in S_11 gives 010, 000, 101, 001, 010.

Using the last six bits of every number in S_11 gives 000010, 000000, 000101, 001001, 001010.
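Taking the last k bits of a number amounts to reducing it modulo 2^k and padding with leading zeros. The Python sketch below illustrates this for the numbers of Example 3.2; the helper name last_bits is an assumption made for this example and is not taken from the thesis.

```python
def last_bits(value, k):
    """Return the k least significant bits of `value` as a zero-padded string."""
    return format(value % (1 << k), "0{}b".format(k))


numbers = [2, 0, 5, 9, 10]  # the distinct values of S_11 from Example 3.2

for k in (1, 3, 6):
    print(k, [last_bits(x, k) for x in numbers])
```

Concatenating the resulting strings gives the bit sequence that is later passed to the statistical tests.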

4 Probability Distribution

A probability distribution specifies either the probability of each value of a random variable or the probability of the value falling within a particular interval. The behavior of a random variable is characterized by its probability distribution, that is, by the way probabilities are distributed over the values it assumes. For a continuous random variable the distribution is described by the probability distribution function and the probability density function. Probability distributions are used to calculate confidence intervals for parameters and to calculate critical regions for hypothesis tests. It is sometimes useful to identify a reasonable distributional model for the data. Statistical intervals and hypothesis tests are often based on specific distributional assumptions. Before computing an interval based on a distributional assumption, we need to verify that the assumption is justified for the given data. In this case, the distribution does not need to be the best-fitting distribution for the data, but it must be adequate enough that the statistical technique yields valid conclusions. Simulation studies with random numbers generated from a specific probability distribution are also often needed. The term probability function covers both discrete and continuous distributions, and the term probability density function is sometimes used for both discrete and continuous probability functions.

Given a random experiment with its associated random variable X and a real number x, consider the probability of the event {s : X(s) ≤ x}, or simply P(X ≤ x). This probability clearly depends on the assigned value x. The function

\[ F_X(x) = P(X \le x) \tag{4.1} \]

is defined as the probability distribution function (PDF), or simply the distribution function, of X. In equation (4.1) the subscript X identifies the random variable; it is sometimes omitted when there is no risk of confusion. Note that F_X(x) is simply P(A), the probability of an event A occurring, the event being X ≤ x. The PDF is thus the probability that X will assume a value lying in a subset of S, the subset being the point x and all points lying to the left of x. As x increases, the subset covers more of the real line, and the value of the PDF increases until it reaches 1. The PDF of a random variable thus accumulates probability as x increases, and the name cumulative distribution function (CDF) is also used for this function. In view of the definition and the discussion above, we give some of the important properties possessed by a PDF.

It exists for discrete and continuous random variables and has values between 0 and 1.

It is a nonnegative, right-continuous and nondecreasing function of the real variable x, with

\[ F_X(-\infty) = 0 \quad \text{and} \quad F_X(+\infty) = 1. \tag{4.2} \]


If a and b are two real numbers such that a < b, then

\[ P(a < X \le b) = F_X(b) - F_X(a). \tag{4.3} \]

This relation is a direct result of the identity

\[ P(X \le b) = P(X \le a) + P(a < X \le b). \]

We see from equation (4.3) that the probability of X having a value in an arbitrary interval can be represented by the difference between two values of the PDF. Generalizing, probabilities associated with any sets of intervals are derivable from the PDF (see [8]).

4.1 Gamma distribution

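A random variable X with density

\[ f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \]

where x > 0, α > 0 and β > 0, is said to have a gamma distribution with parameters α and β (see [10]), where

\[ \Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\, dt. \]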

4.2 chi-square (χ2) distribution

A particular type of gamma distribution is known as the χ2 distribution. This distribution is closely related to random samples from a normal distribution, which makes it widely applied in statistics. For a positive integer n, the gamma distribution with α = n/2 and scale parameter β = 2 (that is, rate 1/2) is called the chi-square distribution with n degrees of freedom. The chi-square distribution has one parameter, its degrees of freedom. It has a positive skew, and the skew decreases as the degrees of freedom increase. As the degrees of freedom increase, the chi-square distribution approaches a normal distribution. The mean of a chi-square distribution is its degrees of freedom (see [6]).

Figure 4.1: chi-square distribution with 6 degrees of freedom
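As a worked step, the chi-square density can be written out explicitly. This assumes the scale form of the gamma density, f(x) = x^{α−1} e^{−x/β} / (Γ(α) β^α), and substitutes α = n/2 and β = 2. The mean αβ of a gamma distribution in this form then equals n, in agreement with the statement that the mean equals the degrees of freedom.

```latex
\[
f(x) = \frac{1}{\Gamma(n/2)\, 2^{n/2}}\; x^{n/2 - 1} e^{-x/2}, \qquad x > 0,
\]
\[
E[X] = \alpha\beta = \frac{n}{2} \cdot 2 = n .
\]
```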


Figure 4.2: chi square distribution with 3 degrees of freedom

Figure 5.1: chi square distribution with 5 degrees of freedom

5 chi-square tests

Any statistical test that uses the chi-square distribution can be called a chi-square test. The chi-square test is perhaps the best known of all statistical tests, and it is a basic method that is used in connection with many other tests. A typical example is its application to dice throwing. The chi-square distribution is a mathematical distribution that is used directly in many tests of significance. Its most common use is to test differences among proportions. Although this test is by no means the only test based on the chi-square distribution, it has come to be known as the chi-square test. It is the most commonly used method for comparing frequencies or proportions: a statistical test used to determine whether observed data deviate from those expected under a particular hypothesis. The chi-square test is also referred to as a test of "goodness of fit" between data and a hypothesis.

The χ2 test with k-1 degrees of freedom is a procedure for testing the null hypothesis that our data form a sample from a specific distribution against the alternative hypothesis that the data have some other distribution. The test is most natural when the specific distribution is discrete. Suppose that there are k possible values for each observation, and that we observe N_i observations with value i, for i = 1, ..., k. Suppose that the null hypothesis says that the probability of the ith possible value is p_i for i = 1, ..., k. Then we compute

\[ Q = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} \tag{5.1} \]

where n = N_1 + ... + N_k is the sample size. When the null hypothesis says that the data have a continuous distribution, one must first create a corresponding discrete distribution. One does this by dividing the real line into finitely many intervals, calculating the probabilities p_1, ..., p_k of the intervals, and then pretending that all we learned from the data was into which interval each observation fell. This converts the original data into discrete data with k possible values. All the χ2 test statistics in this text have the form

\[ \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}, \]

where "observed" stands for an observed count and "expected" stands for the expected value of the observed count under the assumption that the null hypothesis is true.

In 1900, Karl Pearson showed that if the hypothesis H0 is true, then as the sample size n → ∞, the distribution function of Q converges to the distribution function of the χ2 distribution with k-1 degrees of freedom. Thus, if H0 is true and the sample size n is large, the distribution of Q will be approximately a χ2 distribution with k-1 degrees of freedom. The discussion above indicates that H0 should be rejected when Q ≥ C, where C is an appropriate constant. If it is desired to carry out the test at the level of significance α0, then C should be chosen to be the 1 − α0 quantile of the χ2 distribution with k-1 degrees of freedom. This is called the χ2 test of goodness of fit (see [6]).

The chi-square analysis is used to test the null hypothesis H0, which states that there is no significant difference between the expected and the observed data. Investigators either accept or reject H0 after comparing the value of chi-square with a probability distribution. Chi-square values with low probability lead to the rejection of H0, and it is then assumed that a factor other than chance creates the large deviation between expected and observed results.

Example 5.1. Suppose we flip a coin 200 times. If the coin is fair, the probability of flipping heads is 0.5 and the probability of flipping tails is 0.5, so we predict that half of the time the coin will come up heads and half of the time it will come up tails. Our hypothesis therefore predicts:

Expected: Heads 100, Tails 100, total 200

To test the hypothesis we flip our penny 200 times and obtain:

Observed: Heads 108, Tails 92, total 200

Our chi-square formula is

\[ Q = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}. \tag{5.2} \]

For heads, (N_i − np_i)^2 = (108 − 100)^2 = 8^2 = 64.

For tails, (N_i − np_i)^2 = (92 − 100)^2 = (−8)^2 = 64.

The number of trials is very important: a particular deviation from the expected count means a lot more if there are only a few trials than it would if there were many trials. We account for this by dividing our (N_i − np_i)^2 values by the expected values:

For heads, (N_i − np_i)^2 / (np_i) = 64/100 = 0.64.

For tails, (N_i − np_i)^2 / (np_i) = 64/100 = 0.64.

To calculate the chi-square value for our experiment, we add together all of the (N_i − np_i)^2 / (np_i) values: χ2 = 0.64 + 0.64 = 1.28.

We can summarize all of this information in the following table.

data    observed (O)   expected (E)   (O−E)   (O−E)^2   (O−E)^2/E
heads   108            100            8       64        0.64
tails   92             100            -8      64        0.64
total   200            200                              χ2 = 1.28


Now we have to find the degrees of freedom v of the χ2 distribution. To calculate v we need to know the number of classes of data. In this example that number is two ("Heads" or "Tails"), so the number of degrees of freedom is v = 2 − 1 = 1. If we were dealing with dice rather than coins, it would be 6 − 1 = 5. We now have χ2 = 1.28 with v = 1. According to the chi-square distribution table, 1.28 falls between the entries 1.07 and 1.64, which correspond to probabilities 0.30 and 0.20 respectively. So the probability associated with our chi-square value lies between 0.20 and 0.30.

A probability of 0.20 corresponds to a "chance" of 20%, and 0.30 to a chance of 30%. This chi-square result means that, if our hypothesis is correct, the probability of getting results at least as far from what we predicted as these is between 0.20 and 0.30. In biological applications, a probability of 5% is usually adopted as the standard. This value means that the chance of the observed deviation arising by chance alone is only 1 in 20. Because the probability we obtained in the coin example is greater than 0.05, we do not reject the null hypothesis and conclude that our results are consistent with a fair coin [10].
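The arithmetic of Example 5.1 can be reproduced with a short Python sketch. This is an illustration rather than the Mathematica code used in the thesis, and the function name chi_square is chosen here. The tail probability uses the closed form P(χ2 ≥ q) = erfc(√(q/2)), which holds only for 1 degree of freedom.

```python
import math


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 200 coin flips: 108 heads and 92 tails observed, 100 of each expected.
q = chi_square([108, 92], [100, 100])

# Valid for 1 degree of freedom only: P(chi2 >= q) = erfc(sqrt(q / 2)).
p_value = math.erfc(math.sqrt(q / 2))

print(q, p_value)
```

The computed probability lies between 0.20 and 0.30, which agrees with the range read from the chi-square table in the example.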

6 Statistical tests

We now define several tests to run in our Mathematica file. Note that we will be dealing with binary sequences. We will use n = 100 as the sample size.

6.1 Monobit test

The focus of this test is the proportion of zeros and ones in the entire sequence. The purpose of the test is to determine whether the numbers of zeros and ones in a sequence are approximately the same as would be expected for a truly random sequence. All subsequent tests depend on the passing of this test. We now derive the statistic for the monobit test from our chi-square formula (see (6.4)). For the monobit test the length of the subsequence is 1, the number of different outcomes is k = 2 and the number of degrees of freedom is k − 1 = 1. Each outcome has probability p_i = 1/2, and n = 100 is the number of samples. Our chi-square formula becomes

\[
\begin{aligned}
Q_1 &= \sum_{i=1}^{2} \frac{(N_i - np_i)^2}{np_i} \\
    &= \frac{(N_1 - np_1)^2}{np_1} + \frac{(N_2 - np_2)^2}{np_2} \\
    &= \frac{(N_1 - 50)^2}{50} + \frac{(N_2 - 50)^2}{50}.
\end{aligned}
\tag{6.1}
\]

6.2 Twobit test

We now derive the statistic for the twobit test from our chi-square formula (see (6.4)). For the twobit test the length of the subsequence is 2, the number of different outcomes is k = 4 and the number of degrees of freedom is k − 1 = 3. Each outcome has probability p_i = 1/4, and n is the number of samples; with n = 100 each expected count is np_i = 25. Our chi-square formula becomes

\[
\begin{aligned}
Q_2 &= \sum_{i=1}^{4} \frac{(N_i - np_i)^2}{np_i} \\
    &= \frac{(N_1 - np_1)^2}{np_1} + \frac{(N_2 - np_2)^2}{np_2} + \frac{(N_3 - np_3)^2}{np_3} + \frac{(N_4 - np_4)^2}{np_4} \\
    &= \frac{(N_1 - 25)^2}{25} + \frac{(N_2 - 25)^2}{25} + \frac{(N_3 - 25)^2}{25} + \frac{(N_4 - 25)^2}{25}.
\end{aligned}
\tag{6.2}
\]

The possible outcomes of two-bit sequences are (00, 10, 01, 11).

6.3 Threebit test

We now derive the statistic for the threebit test from our chi-square formula (see (6.4)). For the threebit test the length of the subsequence is 3, so the number of different outcomes is k = 8 and the number of degrees of freedom is k − 1 = 7. Each outcome has probability p_i = 1/8, and n is the number of samples. Our chi-square formula becomes

\[ Q_3 = \sum_{i=1}^{8} \frac{(N_i - np_i)^2}{np_i}. \tag{6.3} \]

The possible outcomes of three-bit sequences are (000, 100, 010, 001, 110, 101, 011, 111).

6.4 Subsequences of length t

We now discuss subsequences of length t, where t is a positive integer. The number of different outcomes is 2^t, the probability of each subsequence of length t is 1/2^t, and the chi-square formula becomes

\[ Q_t = \sum_{i=1}^{2^t} \frac{(N_i - n/2^t)^2}{n/2^t}. \tag{6.4} \]
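The statistic Q_t can be sketched in Python as follows. Two assumptions are made here that the text does not state: the bit sequence is cut into non-overlapping blocks of length t, and n is taken to be the number of such blocks. The function name subsequence_statistic is hypothetical.

```python
from collections import Counter


def subsequence_statistic(bits, t):
    """Chi-square statistic Q_t for a bit string cut into blocks of length t.

    Assumption: blocks do not overlap, and n is the number of complete blocks.
    """
    blocks = [bits[i:i + t] for i in range(0, len(bits) - t + 1, t)]
    n = len(blocks)
    expected = n / 2 ** t
    counts = Counter(blocks)
    total = 0.0
    # Sum over all 2^t possible outcomes, including those that never occur.
    for i in range(2 ** t):
        pattern = format(i, "0{}b".format(t))
        total += (counts[pattern] - expected) ** 2 / expected
    return total


bits = "0011" * 25  # a highly regular 100-bit sequence
print(subsequence_statistic(bits, 1))
print(subsequence_statistic(bits, 2))
```

The regular sequence in the usage example has equally many zeros and ones, so Q_1 is 0, but only the blocks 00 and 11 ever occur, so Q_2 is large. This illustrates why tests with t > 1 are needed in addition to the monobit test.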

6.5 General discussion about testvalues percentage

From the properties of the probability distribution function, the probability associated with a test value must lie between 0 and 1 (see 4.3).

Figure 6.1: chi-square distribution with 6 degrees of freedom

In the figure, the area on the right-hand side corresponds to the significance level α, which is 5% of the total area under the curve. If our test value V lies in this region, we say that the test value is not good. For the percentages of the test values, see the percentage table.


7 Investigation of some pseudorandom number generators

The sequence of random numbers (X_n) is obtained via the following equation

\[ X_{n+1} \equiv (aX_n + b) \pmod{m}, \qquad 0 \le X_{n+1} < m, \qquad X_0 = s, \]

which is called a linear congruential equation. Here m is the modulus, a is the multiplier, b is the increment and X_0 = s is the starting value.

We now choose values of m, a, b and s and investigate different generators by means of statistical tests. Five generators with different parameter values are considered. After the statistical tests we will observe what kind of sequences the different generators produce and judge whether they are random or not. The parameter values of the generators under investigation are presented in the table below.

generator     m      a    b   s
generator 1   400    25   5   2
generator 2   2509   23   5   2
generator 3   1578   25   5   2
generator 4   3568   25   5   2
generator 5   6784   27   5   2
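The five generators in the table can be set up as in the following Python sketch. It is an illustration only: the thesis uses Mathematica, and the names GENERATORS, outputs and monobit_statistic are chosen here. It is also an assumption that a test is run on the first 100 outputs X_1, ..., X_100.

```python
# Parameters (m, a, b, s) taken from the table of generators.
GENERATORS = {
    1: (400, 25, 5, 2),
    2: (2509, 23, 5, 2),
    3: (1578, 25, 5, 2),
    4: (3568, 25, 5, 2),
    5: (6784, 27, 5, 2),
}


def outputs(m, a, b, s, count):
    """Return X_1, ..., X_count for X_{n+1} = (a X_n + b) mod m with X_0 = s."""
    values = []
    x = s
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values


def monobit_statistic(values):
    """Q_1 computed from the last bit of each value."""
    n = len(values)
    ones = sum(x & 1 for x in values)
    zeros = n - ones
    return (zeros - n / 2) ** 2 / (n / 2) + (ones - n / 2) ** 2 / (n / 2)


for name, params in GENERATORS.items():
    print(name, monobit_statistic(outputs(*params, count=100)))
```

For generator 1 the last bit simply alternates between 1 and 0, because the multiplier and increment are odd and the modulus is even. The counts are then exactly balanced and Q_1 = 0, which is too regular to look random.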

For each of the five generators above we carry out the statistical tests on the sequences formed from the last bit, the last 3 bits and the last 6 bits of each number (see Section 6), and in each case we vary the value of t. For example, for the last 1 bit sequence of generator 1 we carry out the statistical test for t = 1, 2 and 3, one after another. Each statistical test gives a test value, and after analyzing these test values we discuss which sequences are random and which are not.

Let V denote the statistical test value. If V is less than the 1% entry or greater than the 99% entry, we reject the numbers as not sufficiently random, and according to our percentage table we say that the numbers are "very bad". If V lies between the 1% and 5% entries or between the 95% and 99% entries, the numbers are "bad". If V lies between the 5% and 25% entries or between the 75% and 95% entries, the numbers are "not so good". The numbers are "good" when V lies between the 25% and 75% entries. The percentage ranges of the test values are summarized in the table below.

percentage scores

0%-1% very bad

1%-5% bad

5%-25% not so good

25%-75% good

75%-95% not so good

95%-99% bad

99%-100% very bad
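The percentage table can be turned into a small classification routine. The Python sketch below is an illustration under stated assumptions: the percentage of a test value V is taken to be the chi-square distribution function evaluated at V, it is computed with a standard series for the regularized incomplete gamma function, and values exactly on a boundary are assigned to the better class. The names chi2_cdf and score are hypothetical.

```python
import math


def chi2_cdf(q, df):
    """P(X <= q) for a chi-square variable with df degrees of freedom.

    Uses the series gamma(a, x) = x^a e^(-x) * sum_k x^k / (a (a+1) ... (a+k))
    with a = df / 2 and x = q / 2.
    """
    if q <= 0:
        return 0.0
    a, x = df / 2, q / 2
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= x / (a + k)
        total += term
        k += 1
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def score(fraction):
    """Map a distribution-function value in [0, 1] to the label of the percentage table."""
    percent = 100 * fraction
    if percent < 1 or percent > 99:
        return "very bad"
    if percent < 5 or percent > 95:
        return "bad"
    if percent < 25 or percent > 75:
        return "not so good"
    return "good"


print(score(chi2_cdf(1.28, 1)))  # the coin example of Section 5
print(score(chi2_cdf(0.0, 1)))   # a perfectly balanced count
```

Note that a test value of exactly 0 falls in the lowest range and is therefore labelled "very bad": a sequence that matches the expected counts too closely is as suspicious as one that deviates too much.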


After the statistical tests we obtain the test values. The chi-square test is usually carried out at least three times on different sets of data, here called test 1, test 2 and test 3. If at least two of the three results are suspect, the numbers are regarded as not sufficiently random. We now discuss generator 1.

7.1 Generator 1

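7.1.1 generator 1 for last 1 bit

test               test 1         test 2         test 3
monobit test       very bad       very bad       very bad
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.1.2 generator 1 for last 3 bits

test               test 1         test 2         test 3
monobit test       bad            very bad       very bad
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.1.3 generator 1 for last 6 bits

test               test 1         test 2         test 3
monobit test       not so good    good           good
2 sequence test    bad            bad            very bad
3 sequence test    not so good    not so good    good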

7.1.4 general discussion

For the last 1 bit of generator 1, the test values of the monobit, 2 sequence and 3 sequence tests are all "very bad". When we change from the last 1 bit to the last 3 bits, the test values are very similar to those of the 1 bit tests. When we change from 3 bits to 6 bits, two of the three monobit test values are "good", two of the three 2 sequence test values are "bad" and one is "very bad", and in the 3 sequence test two values are "not so good" and one is "good". So for generator 1 the 1 bit and 3 bits tests are not good, but the 6 bits tests are better than the other two.

7.2 Generator 2



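7.2.1 generator 2 for last 1 bit

test               test 1         test 2         test 3
monobit test       good           good           good
2 sequence test    very bad       very bad       very bad
3 sequence test    not so good    good           not so good

7.2.2 generator 2 for last 3 bits

test               test 1         test 2         test 3
monobit test       very bad       good           not so good
2 sequence test    good           good           not so good
3 sequence test    not so good    not so good    not so good

7.2.3 generator 2 for last 6 bits

test               test 1         test 2         test 3
monobit test       not so good    good           good
2 sequence test    bad            not so good    not so good
3 sequence test    not so good    not so good    good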


7.2.4 general discussion

For the last 1 bit of generator 2, all three monobit test values are "good", but in the 2 sequence test all test values are "very bad", and in the 3 sequence test two values are "not so good" and one is "good". If we change from 1 bit to 3 bits, then in the monobit test one value is "good" and the other two are "very bad" and "not so good", while in the 2 sequence test two values are "good" and one is "not so good". Unfortunately, in the 3 sequence test all values are "not so good". In the 6 bits test, two monobit test values are "good" and one is "not so good", in the 2 sequence test two values are "not so good" and one is "bad", and in the 3 sequence test two of the three test values are "not so good" and one is "good". In generator 2 we therefore get 4 "good" test values for the 1 bit test, 3 for the 3 bits test and 3 for the 6 bits test. Since the 1 bit test gives the most "good" values, generator 2 is better for the last 1 bit than for the other two.

7.3 Generator 3








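7.3.1 generator 3 for last 1 bit

test               test 1         test 2         test 3
monobit test       very bad       very bad       very bad
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.3.2 generator 3 for last 3 bits

test               test 1         test 2         test 3
monobit test       not so good    not so good    not so good
2 sequence test    very bad       very bad       very bad
3 sequence test    not so good    good           very bad

7.3.3 generator 3 for last 6 bits

test               test 1         test 2         test 3
monobit test       not so good    good           good
2 sequence test    not so good    good           not so good
3 sequence test    not so good    good           bad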


7.3.4 general discussion

For the last 1 bit of generator 3, the test values of the monobit, 2 sequence and 3 sequence tests are all "very bad". In the 3 bits test, all three monobit test values are "not so good", all 2 sequence test values are "very bad", and the 3 sequence test values are "not so good", "good" and "very bad". In the 6 bits test, the monobit test gives two "good" and one "not so good" test values, the 2 sequence test gives two "not so good" and one "good", and the 3 sequence test gives "not so good", "good" and "bad". We see that generator 3 produces only "very bad" sequences for the last 1 bit, but the 6 bits test gives some "good" sequences. So we can say that generator 3 is good for the last 6 bits.


7.4 Generator 4








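7.4.1 generator 4 for last 1 bit

test               test 1         test 2         test 3
monobit test       very bad       very bad       very bad
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.4.2 generator 4 for last 3 bits

test               test 1         test 2         test 3
monobit test       very bad       not so good    not so good
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.4.3 generator 4 for last 6 bits

test               test 1         test 2         test 3
monobit test       good           not so good    good
2 sequence test    good           not so good    good
3 sequence test    not so good    not so good    bad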


7.4.4 general discussion

For the last 1 bit of generator 4, the test values of the monobit, 2 sequence and 3 sequence tests are all "very bad". For the last 3 bits of generator 4, the monobit test gives two "not so good" values and one "very bad" value, and in the 2 sequence and 3 sequence tests all test values are "very bad". For the last 6 bits of generator 4, the monobit test gives two "good" and one "not so good" test values, the 2 sequence test gives one "not so good" and two "good", and the 3 sequence test gives two "not so good" and one "bad". We can say that generator 4 does not produce good sequences for the last 1 bit, and only the last 6 bits produce a few good sequences. So generator 4 is better for more bits.

7.5 Generator 5













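7.5.1 generator 5 for last 1 bit

test               test 1         test 2         test 3
monobit test       very bad       very bad       very bad
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.5.2 generator 5 for last 3 bits

test               test 1         test 2         test 3
monobit test       very bad       very bad       very bad
2 sequence test    very bad       very bad       very bad
3 sequence test    very bad       very bad       very bad

7.5.3 generator 5 for last 6 bits

test               test 1         test 2         test 3
monobit test       good           bad            not so good
2 sequence test    bad            very bad       very bad
3 sequence test    very bad       very bad       very bad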




7.5.4 general discussion

For the last 1 bit of generator 5, the test values of the monobit, 2 sequence and 3 sequence tests are all "very bad". For the last 3 bits of generator 5, all values are also "very bad". In the 6 bits test, the monobit test gives "good", "bad" and "not so good", the 2 sequence test gives two "very bad" and one "bad" test values, and in the 3 sequence test all test values are "very bad". So we can say that generator 5 is not good at producing good sequences.

8 Conclusion

After the statistical tests we get the test values, and according to our percentage table we mark them as "good", "not so good", "bad" or "very bad". We now compare the values from generator to generator. For the last 1 bit test, all the test values of generators 1, 3, 4 and 5 are "very bad", whereas in generator 2 all monobit test values are "good" but the 2 sequence test values are "very bad". For the last 3 bits, almost all test values of generator 1 are "very bad", while most test values of generator 2 are "not so good". In generator 3 the numbers of "not so good" and "very bad" test values are equal, in generator 4 most values are "very bad", and in generator 5 all values are "very bad". We can observe that in every generator the last 6 bits produce some "good" sequences. In generators 2 and 3 the last 3 bits also produce some "good" sequences, but most are "not so good". Generators 1 and 5 are very similar, because their 1 bit and 3 bits test results are very similar, whereas in generator 3 most values are "good" or "not so good". In generator 5 most values are "very bad". Comparing all five generators, we find many "good" test values for the last 6 bits in every generator except generator 5, and I can say that generator 2 is the best generator for producing "good" sequences.

References

[1] Sheldon M. Ross, Simulation, 2002.

[2] http://csrc.nist.gov/publications/nistpubs/800-22-rev1/SP800-22rev1.pdf

[3] Frederick Solomon, Probability and Stochastic Processes.

[4] Knuth, The Art of Computer Programming, Volume 2, 1998.

[5] P. A. W. Lewis and E. J. Orav, Simulation Methodology for Statisticians, Operations Analysts, and Engineers, 1998.

[6] Morris H. DeGroot, Mark J. Schervish, Probability and Statistics, 2002.

[7] Wade Trappe and Lawrence C. Washington, Introduction to Cryptography with Coding Theory, 2006.

[8] T. T. Soong, Fundamentals of Probability and Statistics for Engineers.

[9] Jay L. Devore, Probability and Statistics for Engineering and the Sciences.

[10] J. Susan Milton, Jesse C. Arnold, Introduction to Probability and Statistics.

[11] http://www. science.jrank.org/pages/1401/chi square- test.html

[12] Kenneth H. Rosen, Elementary Number Theory and Its Applications.


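9 Appendix

In each line below, the first number is the chi-square statistic and the second number is the test value.

Generator 1 for last 1 bit test when t=1,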

when, n=100, test value is 0 , 1

n=200, test value is 0 , 1


now for generator 2,when t=1

when, n=100, test value is 0.36 , 0.548506

n=200, test value is 0.36 , 0.548506

n=300, test value is 0.36 , 0.548506

now adding last 1 bits test when t=2, for generator 1.




now last 1 bit test when t=2, for generator 2,


n=200, test value is 11.92 , 0.00766229

n=300, test value is 11.44 , 0.00956972

now last 1 bit test when t=3 for generator 1.


n=200, test value is 77.6 , 4.24105




n=200, test value is 8.8 , 0.267336

n=300, test value is 9.76 , 0.202587



n=200, test value is 7.84 , 0.00511026

n=300, test value is 9 , 0.0026998



n=200, test value is 0.64 , 0.423711

n=300, test value is 1.44 , 0.230139



n=200, test value is 21.84 , 0.0000704273

n=300, test value is 23.12 , 0.0000381227




n=200, test value is 1.76 , 0.623678

n=300, test value is 0.56 , 0.905525



n=200, test value is 0.16 , 0.999988

n=300, test value is 0.16 , 0.999988



n=200, test value is 2.4 , 0.934437

n=300, test value is 2.88 , 0.895877



n=200, test value is 0.16 , 0.689157

n=300, test value is 0.16 , 0.689157



n=200, test value is 1.16 , 0.689157

n=300, test value is 0.36 , 0.548506



n=200, test value is 7.84 , 0.0494368

n=300, test value is 14.48 , 0.00231953



n=200, test value is 0.72 , 0.86849

n=300, test value is 0.96 , 0.810929



n=200, test value is 3.04 , 0.881277

n=300, test value is 4.68 , 0.7038



n=200, test value is 3.2 , 0.865905

n=300, test value is 8 , 0.332594

If we organize the above information into tables, it becomes

generator:1 for last 1 bit


test test1 test2 test3

monobit test 1 1 1

2 sequence test 0 0 0



test test1 test2 test3

monobit test 0.548506 0.548506 0.548506

2 sequence test 0.00956972 0.00766229 0.00956972

3 sequence test 0.184085 0.267336 0.202587



monobit test 1 1 1





monobit test 1 1 1





monobit test 1 1 1



generator:1 for last 3 bits


monobit test 0.0278069 0.00511026 0.0026998

2 sequence test 0.0000704273 0.0000704273 0.0000381227

3 sequence test 0.999988 0.999988 0.999988



monobit test 1 0.423711 0.230139

2 sequence test 0.288573 0.623678 0.905525

3 sequence test 0.849824 0.934437 0.895877



monobit test 0.689157 0.689157 0.689157

2 sequence test 0.0199763 0.0000176682 1.88138–

3 sequence test 0.151307 0.625835 0.995448




monobit test 1 0.841481 0.841481

2 sequence test 0.0000704273 0.0000704273 0.0000381227

3 sequence test 0.999988 0.999988 0.999988



monobit test 0.0026998 0.000318217 0.000673859

2 sequence test 6.47316– 1.21377– 1.70053–




monobit test 0.841481 0.689157 0.689157

2 sequence test 0.0248683 0.0494368 0.00231953

3 sequence test 0.881277 0.881277 0.7038



monobit test 0.230139 0.689157 0.548506

2 sequence test 0.956224 0.86849 0.810929

3 sequence test 0.193153 0.865905 0.332594



monobit test 0.841481 0.689157 0.423711

2 sequence test 0.86849 0.696186 0.86849

3 sequence test 0.0655731 0.74227 0.984314



monobit test 0.548506 0.841481 0.689157

2 sequence test 0.65939 0.830251 0.606269

3 sequence test 0.909641 0.865905 0.922512



monobit test 0.423711 0.0278069 0.230139

2 sequence test 0.016033 0.0045509 0.00888689

3 sequence test 0.000555888 0.000487356 0.000487356

The Mathematica code is given below.

(* Length of subsequence to test *)
t = 2;
outputbits = 3;
m = 400;


(* The generator *)
lincongen[a_Integer, b_Integer, n_Integer, x_Integer] := Mod[a*x + b, n]

(* Generate a sequence. list1 contains the numbers modulo n; *)
(* list2 contains their last outputbits binary digits (the numbers modulo 2 when outputbits = 1) *)
f[x_] := lincongen[25, 5, m, x];
list1 = {};
list2 = {};
s = 2;
x = s;
Do[
  AppendTo[list1, x];
  bits = IntegerDigits[x, 2, outputbits];
  list2 = Join[list2, bits];
  x = f[x];,
  {i, 1, m}]
list1;
list2;
list2

(* Create a table containing the frequencies of the different subsequences *)
totalfreq = Table[0, {i, 0, 2^t - 1}]
Do[
  pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  totalfreq[[FromDigits[test, 2] + 1]]++;,
  {i, 0, Length[list2]/t - 1}]

{0, 0, 0, 0}

FromDigits[{1, 1, 1, 1}, 2]

totalfreq

{150, 250, 50, 150}

(* Chi-square test on the first n subsequences *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[
  pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  freq[[FromDigits[test, 2] + 1]]++;,
  {i, 0, n - 1}]
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]

21.84

0.0000704273

freq


{50, 50}

(* Chi-square test on the second block of n subsequences *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[
  pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  freq[[FromDigits[test, 2] + 1]]++;,
  {i, n, 2*n - 1}]
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]

0.04

0.841481

(* Chi-square test on the third block of n subsequences *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[
  pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  freq[[FromDigits[test, 2] + 1]]++;,
  {i, 2*n, 3*n - 1}]
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]

0.04

0.841481
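For readers without Mathematica, the listing above can be mirrored in Python roughly as follows. This is only a sketch under stated assumptions: it uses the same settings as the listing (m = 400, a = 25, b = 5, s = 2, outputbits = 3, t = 2, n = 100), and all function names are introduced here for illustration. In place of ChiSquareDistribution it computes the upper tail of the chi-square distribution with a standard recurrence over the degrees of freedom, using only the math module.

```python
import math


def lcg_bits(m, a, b, s, outputbits, count):
    """Concatenate the last `outputbits` bits of the first `count` generator values."""
    bits, x = [], s
    for _ in range(count):
        bits.extend((x >> i) & 1 for i in range(outputbits - 1, -1, -1))
        x = (a * x + b) % m
    return bits


def block_frequencies(bits, t, start, n):
    """Count the 2**t patterns among n non-overlapping t-bit blocks, from block `start`."""
    freq = [0] * (2 ** t)
    for i in range(start, start + n):
        block = bits[t * i: t * i + t]
        freq[int("".join(map(str, block)), 2)] += 1
    return freq


def chi_square_stat(freq):
    """Chi-square statistic against equal expected counts."""
    n = sum(freq)
    expected = n / len(freq)
    return sum((f - expected) ** 2 / expected for f in freq)


def chi_square_sf(x, df):
    """Upper tail 1 - CDF of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-x / 2), 2              # closed form for df = 2
    while k < df:                               # step from df = k to df = k + 2
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


t = 2
bits = lcg_bits(m=400, a=25, b=5, s=2, outputbits=3, count=400)
totalfreq = block_frequencies(bits, t, start=0, n=len(bits) // t)
freq = block_frequencies(bits, t, start=0, n=100)
stat = chi_square_stat(freq)
testvalue = chi_square_sf(stat, 2 ** t - 1)
print(totalfreq, freq, stat, testvalue)
```

With these settings the sketch gives the same total frequencies {150, 250, 50, 150} and the same first statistic 21.84 as the listing. The second and third blocks of subsequences are obtained by calling block_frequencies with start=100 and start=200.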
