Master Project
Mohammad Shafiqul Haque, 2010-06-14
Subject: Mathematics. Level: Advanced. Course code: 4MA11E
Title: Various statistical tests of pseudorandom number generators
Abstract
This thesis is concerned with various statistical tests of pseudorandom number generators. In it I discuss some aspects of selecting and testing pseudorandom number generators. The outputs of such generators may be used in many cryptographic applications, such as the generation of key material. After carrying out the statistical tests, I compare the test values of every generator and discuss which generators produce good sequences and which one is a good generator.
Keywords: Pseudorandom number generator (PRNG), statistical test, test value
Acknowledgments
I would like to thank my supervisor Marcus Nilsson for accepting me, giving me the chance, and encouraging me to do my thesis under his kind supervision. I also want to thank my head of department and the teachers who helped us in different ways. I am also thankful to my parents and friends, who supported and encouraged me during my study. I also thank the university library and the university lab for their excellent support. Finally, I thank the Swedish government for giving me an excellent opportunity to study in Sweden.
Contents
1 Introduction
1.1 Aim of the Project
1.2 Random and pseudorandom number generators (RNGs and PRNGs)
1.3 The need for statistical tests
2 Applications of PRNGs
2.1 Simulation
2.2 Cryptography
3 Some Types of PRNGs
3.1 Linear Congruential Generator
3.2 How to produce a bit sequence
4 Probability Distribution
4.1 Gamma distribution
4.2 Chi-square (χ²) distribution
5 Chi-square tests
6 Statistical tests
6.1 Monobit test
6.2 Twobit test
6.3 Threebit test
6.4 Subsequences of length t
6.5 General discussion of test-value percentages
7 Investigation of some pseudorandom number generators
7.1 Generator 1
7.2 Generator 2
7.3 Generator 3
7.4 Generator 4
7.5 Generator 5
8 Conclusion
References
9 Appendix
1 Introduction
There are two basic types of generators for creating random sequences: the random number generator (RNG) and the pseudorandom number generator (PRNG). Generally, a random number generator uses a non-deterministic (i.e. unpredictable) source together with some processing function to generate a random sequence. The outputs of an RNG may be used directly as random numbers. Alternatively, they may be fed into a pseudorandom number generator (PRNG), which uses a known algorithm to create a sequence of random bits from an initial value called a seed. Random numbers play an important role in network security applications. The need for random and pseudorandom numbers arises in many cryptographic applications. Cryptographic protocols require random or pseudorandom inputs at various points, for example for auxiliary quantities used in generating digital signatures and in authentication protocols.
We can generate random numbers by spinning wheels, rolling dice or shuffling cards. Nowadays we can produce pseudorandom number sequences with computers. Pseudorandom number sequences are useful in many applications, for example simulation, sampling, numerical analysis and computer programming [1]. Using a PRNG with one or more inputs, we can generate many "pseudorandom" numbers. Truly random numbers can be obtained from natural sources of randomness. One example is the thermal noise of a semiconductor resistor, which is a good source of randomness. However, flipping coins to generate random bits would not be practical for cryptographic applications, and for the same reasons most natural sources are impractical. Sampling them is slow, and it is difficult to ensure that an opponent cannot observe the process [2]. There are many methods for creating pseudorandom numbers; the linear congruential method and the Blum Blum Shub method are among the most widely used.
1.1 Aim of the Project
The aim of the project is to implement various kinds of pseudorandom number generators in Mathematica and to carry out various statistical tests on them. The resulting test values are then used to discuss the randomness of the generated sequences.
1.2 Random and pseudorandom number generators (RNGs and PRNGs)
At the heart of simulations of random models is a method for producing random numbers: a procedure or function that churns out number after number, uniformly distributed in the interval [0,1]. The method explained in this section is the one used by most programming languages that have built-in random number generators. Such a random number generator is in fact a specific formula that produces the numbers in a completely deterministic way, which contradicts the very idea of randomness. Consequently, the numbers produced by the generator are often called pseudorandom. Although they follow a definite pattern, they appear to have no pattern that can be detected without knowing the exact formula used. The fact that the same sequence of pseudorandom numbers is generated each time the generator is used is even helpful for debugging programs and for understanding the results of a simulation.
The second type of generator is the pseudorandom number generator (PRNG). A PRNG takes one or more inputs and generates multiple pseudorandom numbers. The inputs to a PRNG are called seeds. In contexts in which unpredictability is needed, the seed itself must be random and unpredictable. Hence, by default, a PRNG should obtain its seeds from the outputs of an RNG, so a PRNG requires an RNG as a companion. The outputs of a PRNG are typically deterministic functions of the seed, so all true randomness is confined to seed generation. The deterministic nature of the process leads to the term pseudorandom. Since each element of a pseudorandom sequence is reproducible from its seed, only the seed needs to be saved if reproduction or validation of the pseudorandom sequence is required.
Ironically, pseudorandom numbers often appear to be more random than random numbers obtained from physical sources. If a pseudorandom sequence is properly constructed, each value in the sequence is produced from the previous value via transformations that appear to introduce additional randomness. A series of such transformations can eliminate statistical autocorrelations between input and output. Thus, the outputs of a PRNG may have better statistical properties than those of an RNG, and they can be produced faster. Randomness means "no pattern"; pseudorandomness means "no apparent pattern".
Start with positive integers MULT (for multiplier), ADDR (for adder) and NORM (for normalizer). SEED is to be a pseudorandom number satisfying
0 ≤ SEED < NORM.
Each time a new random number is needed, it is produced from the previous value of SEED by the formula
SEED := (MULT * SEED + ADDR) mod NORM.
That is, SEED is first multiplied by MULT, then ADDR is added, and finally the remainder upon division by NORM becomes the new value of SEED.
Example 1.1. Use the values
MULT := 6, ADDR := 5, NORM := 11.
What values of SEED will be produced if the initial value of SEED is 0? If the initial value is 4?
Each time, the new value of SEED is
SEED := (6 * SEED + 5) mod 11.
With SEED initially 0, this sequence is generated:
0 5 2 6 8 9 4 7 3 1 0 5 2 6 8 ...
With SEED initially 4, this sequence is generated:
4 7 3 1 0 5 2 6 8 9 4 7 3 1 0 5 2 ...
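The update rule is easy to check by machine. The following Python sketch uses Python rather than the Mathematica of the thesis, and the function name `lcg_sequence` is our own. It iterates SEED := (MULT * SEED + ADDR) mod NORM and reproduces both sequences of Example 1.1.

```python
def lcg_sequence(seed, mult, addr, norm, count):
    """Return the first `count` values of SEED := (MULT * SEED + ADDR) mod NORM."""
    values = []
    for _ in range(count):
        values.append(seed)
        seed = (mult * seed + addr) % norm  # the next SEED depends only on the previous one
    return values

# Example 1.1: MULT = 6, ADDR = 5, NORM = 11
print(lcg_sequence(0, 6, 5, 11, 15))  # [0, 5, 2, 6, 8, 9, 4, 7, 3, 1, 0, 5, 2, 6, 8]
print(lcg_sequence(4, 6, 5, 11, 17))  # [4, 7, 3, 1, 0, 5, 2, 6, 8, 9, 4, 7, 3, 1, 0, 5, 2]
```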
Two facts are apparent from Example 1.1. First, the next number in the sequence is generated only from the value of the previous number. So if any number is generated again, the entire list repeats from that point on. Second, the cycle length (the number of distinct numbers before repetition occurs) can be at most NORM. The mod function produces the remainder upon division by NORM, which is necessarily a number between 0 and NORM − 1. There are therefore NORM possible remainders, that is, NORM possible values of the random number SEED, and as soon as one of them is repeated, the entire list repeats. A good random number generator uses values of MULT, ADDR and NORM for which the cycle length is large.
Example 1.2. Consider the generator
SEED := (6 * SEED + 3) mod 7.
Then, with different initial values of SEED, these sequences are generated:
0 3 0 3 0 3 0 3 0 3 ...
1 2 1 2 1 2 1 2 1 2 ...
4 6 4 6 4 6 4 6 4 6 ...
5 5 5 5 5 5 5 5 5 5 ...
Although the value NORM = 7 suggests that the cycle length might reach the maximum value of 7, the actual cycle lengths are small. If SEED is initially 5, the random number generator is quite useless. The theory of which parameter values give good random number generators is complicated, and it belongs more to abstract algebra than to probability [3].
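The cycle length discussed above can be measured directly. The idea is to record the index at which each value first appears and stop at the first repeat. A minimal Python sketch, with a hypothetical helper name `cycle_length`:

```python
def cycle_length(seed, mult, addr, norm):
    """Length of the cycle that SEED := (MULT * SEED + ADDR) mod NORM eventually enters."""
    first_seen = {}  # value -> index at which it first appeared
    index = 0
    while seed not in first_seen:
        first_seen[seed] = index
        seed = (mult * seed + addr) % norm
        index += 1
    # `seed` is the first repeated value; the cycle runs from its first occurrence up to now
    return index - first_seen[seed]

# Example 1.2: SEED := (6 * SEED + 3) mod 7, for every possible initial value
for start in range(7):
    print(start, cycle_length(start, 6, 3, 7))
```

For Example 1.2 it reports cycles of length 2 for every initial value except 5, whose cycle has length 1. For Example 1.1, `cycle_length(0, 6, 5, 11)` returns 10 rather than 11. The reason is that 10 is a fixed point of that generator: 6 · 10 + 5 = 65 ≡ 10 (mod 11).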
1.3 The need for statistical tests
This thesis is concerned with statistical tests, which are carried out in Mathematica. A sequence cannot be judged random simply by looking at it. The statistical tests produce test values for each sequence, and without these test values it is not possible to justify whether a sequence is random. After doing the statistical tests, we therefore use the test values to check the randomness of the sequences [4].
2 Applications of PRNGs
2.1 Simulation
Simulation is essentially a technique of controlled statistical sampling. Used in combination with a model, it gives approximate answers to questions about complex, multifactorial probabilistic problems. It is especially useful when numerical and analytical techniques cannot provide an answer. Simulation is in fact a statistical experiment performed on a digital computer [5]. In statistics, simulation is used to estimate characteristics of distributions, such as means, quantiles and other functions, that cannot be calculated in closed form. When a simulation estimator is used, it is good practice to calculate a measure of how accurate the estimate is, in addition to the estimate itself [6]. Simulation can thus shed light on how a complex system behaves when a thorough analysis is not available.
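As a small illustration of reporting an estimate together with a measure of its accuracy, the sketch below estimates π by simulation and also returns the standard error. Two choices here are our own assumptions for illustration, not something taken from the thesis: the target quantity π, and Python's `random` module as the PRNG.

```python
import math
import random

def estimate_pi(samples, seed=1):
    """Monte Carlo estimate of pi together with its standard error."""
    rng = random.Random(seed)  # seeded PRNG, so the run is reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point falls inside the quarter circle
            hits += 1
    p_hat = hits / samples  # estimates pi / 4
    std_err = 4 * math.sqrt(p_hat * (1 - p_hat) / samples)
    return 4 * p_hat, std_err

estimate, std_err = estimate_pi(100_000)
print(f"pi is approximately {estimate:.4f} +/- {std_err:.4f}")
```

The standard error shrinks like one over the square root of the number of samples, so quadrupling the sample size roughly halves it.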
Example 2.1. Engineers can simulate traffic in the area surrounding a planned construction site to see what effects different restrictions may have. A physicist can simulate the motion of gas molecules under conditions for which no theoretical results are known. Statisticians can use simulation to estimate probabilities in models that cannot be analyzed mathematically. Because simulation introduces an element of randomness into an analysis, it is sometimes called Monte Carlo analysis, after the famous European gambling center [6].
2.2 Cryptography
Cryptography means hiding information. It is used in mathematics, computer science and engineering, and in everyday applications such as ATM (automated teller machine) cards, computer passwords and electronic commerce. Pseudorandom number generators are closely related to cryptography. The design of pseudorandom number generators is one of the main issues in stream ciphers, where pseudorandom number sequences are often employed as keystreams [7].
Example 2.2. In cryptography we can encrypt a plaintext by using a pseudorandom number sequence as a keystream. Decrypting the ciphertext with the same pseudorandom sequence gives back the plaintext. This shows why pseudorandom numbers are essential in cryptographic applications.
Plaintext:   01101001100
Keystream:   10110010111
Adding the two bitwise modulo 2 (XOR), the ciphertext becomes 11011011011.
If we again add the keystream to the ciphertext bitwise, we get back the plaintext:
Ciphertext:  11011011011
Keystream:   10110010111
Plaintext:   01101001100
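The bitwise addition modulo 2 (XOR) in Example 2.2 can be reproduced with a short Python sketch; `xor_bits` is a hypothetical helper name. XOR-ing twice with the same keystream returns the original bits, which is why the same keystream both encrypts and decrypts.

```python
def xor_bits(a, b):
    """Add two equal-length bit strings position by position modulo 2 (XOR)."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

plaintext = "01101001100"
keystream = "10110010111"
ciphertext = xor_bits(plaintext, keystream)
print(ciphertext)                       # 11011011011
print(xor_bits(ciphertext, keystream))  # 01101001100, the plaintext again
```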
3 Some Types of PRNGs
3.1 Linear Congruential Generator
The most widely used technique for pseudorandom number generation is the linear congruential method, first introduced by D. H. Lehmer in 1949. It requires the four parameters given in the table below.
Parameter             Range
m (modulus)           0 < m
a (multiplier)        0 ≤ a < m
c (increment)         0 ≤ c < m
X0 (starting value)   0 ≤ X0 < m
The sequence of random numbers $(X_n)$ is obtained via the equation
$$X_{n+1} \equiv (a X_n + c) \pmod{m}, \qquad 0 \le X_{n+1} < m. \qquad (3.1)$$
This produces a sequence of integers, each in the range $0 \le X_n < m$ [4].
Example 3.1. Consider the linear congruential generator with parameters m = 5, a = 2, c = 1 and $X_0 = 2$. We get
$$X_1 \equiv 2 X_0 + 1 \equiv 2 \cdot 2 + 1 \equiv 5 \equiv 0 \pmod{5},$$
and
$$X_2 \equiv 2 \cdot 0 + 1 \equiv 1 \pmod{5}.$$
In a similar way we get $X_3 \equiv 3$, $X_4 \equiv 2$, $X_5 \equiv 0$, $X_6 \equiv 1$.
The sequence is 0 1 3 2 0 1 3 2 ..., which repeats with period 4.
3.2 How to produce a bit sequence
We now discuss how to produce a bit sequence. Consider the linear congruential equation modulo m with starting value $X_0 = 2$:
$$X_n \equiv 3 X_{n-1} + 5 \pmod{m}, \qquad X_0 = 2. \qquad (3.2)$$
Let $S_m$ denote the sequence of numbers produced by (3.2). From $S_m$ we can produce a binary sequence by taking the last bit, the last three bits or the last six bits of every number.
Example 3.2. Take m = 11 in our linear congruential equation. We produce binary sequences from the last bit, the last three bits and the last six bits of every number in $S_{11}$. The equation becomes
$$X_n \equiv 3 X_{n-1} + 5 \pmod{11}, \qquad X_0 = 2.$$
For n = 1:
$$X_1 \equiv 3 X_0 + 5 \equiv 3 \cdot 2 + 5 \equiv 11 \equiv 0 \pmod{11}, \quad \text{so } X_1 = 0.$$
For n = 2:
$$X_2 \equiv 3 X_1 + 5 \equiv 3 \cdot 0 + 5 \equiv 5 \pmod{11}, \quad \text{so } X_2 = 5.$$
In a similar way we get the sequence 2 0 5 9 10 2. Taking the last bit of every number, the binary sequence becomes 0 0 1 1 0 0.
Now we produce a binary sequence by using the last three bits of every number in $S_{11}$. The binary expansion of 2 is
$$2 = 0 \cdot 2^0 + 1 \cdot 2^1,$$
so the binary representation of 2 is 10. In a similar way, the binary representation of 0 is 0, of 5 is 101, of 9 is 1001 and of 10 is 1010.
The binary sequence obtained from the last bit of every number in $S_{11}$ is 0, 0, 1, 1, 0, 0.
The binary sequence obtained from the last three bits of every number in $S_{11}$ is 010, 000, 101, 001, 010.
The binary sequence obtained from the last six bits of every number in $S_{11}$ is 000010, 000000, 000101, 001001, 001010.
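The construction of $S_{11}$ and of its last-bit, last-three-bit and last-six-bit sequences in Example 3.2 can be sketched in Python as follows. The helper names are our own, and we assume, as in the example, that the bits of each number are written most significant bit first.

```python
def lcg_values(x0, a, c, m, count):
    """Return X_0, ..., X_{count-1} with X_n = (a * X_{n-1} + c) mod m."""
    values = [x0]
    while len(values) < count:
        values.append((a * values[-1] + c) % m)
    return values

def last_bits(value, k):
    """The last k bits of `value` as a string, most significant bit first."""
    return format(value % 2 ** k, f"0{k}b")

s11 = lcg_values(2, 3, 5, 11, 6)
print(s11)                                 # [2, 0, 5, 9, 10, 2]
print([last_bits(x, 1) for x in s11])      # ['0', '0', '1', '1', '0', '0']
print([last_bits(x, 3) for x in s11[:5]])  # ['010', '000', '101', '001', '010']
print([last_bits(x, 6) for x in s11[:5]])  # ['000010', '000000', '000101', '001001', '001010']
```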
4 Probability Distribution
A probability distribution gives either the probability of each value of a discrete random variable or the probability that the value falls within a particular interval. The behavior of a random variable is characterized by its probability distribution, that is, by the way probabilities are distributed over the values it assumes. Probability distributions are used to calculate confidence intervals for parameters and critical regions for hypothesis tests.

It is sometimes useful to identify a reasonable distributional model for the data, because statistical intervals and hypothesis tests are often based on specific distributional assumptions. Before computing an interval based on a distributional assumption, we need to verify that the assumption is justified for the given data. The distribution does not need to be the best-fitting distribution for the data, but it must be adequate enough that the statistical technique yields valid conclusions. Simulation studies with random numbers generated from a specific probability distribution are also often needed. Note that the term probability function covers both discrete and continuous distributions. The term probability density function is sometimes used loosely for both discrete and continuous probability functions.

Given a random experiment with its associated random variable X and a real number x, consider the probability of the event $\{s : X(s) \le x\}$, or simply $P(X \le x)$. This probability clearly depends on the value x. The function
$$F_X(x) = P(X \le x) \qquad (4.1)$$
is defined as the probability distribution function (PDF), or simply the distribution function, of X. In equation (4.1), the subscript X identifies the random variable; it is sometimes omitted when there is no risk of confusion. Note that $F_X(x)$ is simply P(A), the probability of the event A that $X \le x$. The PDF is thus the probability that X takes a value in the subset of the real line consisting of the point x and all points to its left. As x increases, this subset covers more of the real line, and the value of the PDF increases until it reaches 1. The PDF of a random variable thus accumulates probability as x increases, and the name cumulative distribution function (CDF) is also used for this function. From the definition and the discussion above, a PDF has the following important properties.
It exists for discrete and continuous random variables and takes values between 0 and 1.
It is a nonnegative, right-continuous and nondecreasing function of the real variable x.
$$F_X(-\infty) = 0, \qquad F_X(+\infty) = 1. \qquad (4.2)$$
If a and b are two real numbers such that a < b, then
$$P(a < X \le b) = F_X(b) - F_X(a). \qquad (4.3)$$
This relation follows directly from the identity
$$P(X \le b) = P(X \le a) + P(a < X \le b).$$
Equation (4.3) shows that the probability of X taking a value in an arbitrary interval is the difference between two values of the PDF. More generally, probabilities associated with any set of intervals can be derived from the PDF [8].
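Equation (4.3) can be illustrated with a discrete example of our own choosing, not taken from the thesis: the distribution function of one throw of a fair six-sided die. The sketch uses exact fractions.

```python
from fractions import Fraction

def die_cdf(x):
    """F_X(x) = P(X <= x) for one throw of a fair six-sided die."""
    if x < 1:
        return Fraction(0)
    if x >= 6:
        return Fraction(1)
    return Fraction(int(x), 6)  # faces 1, ..., floor(x) have a value <= x

# Equation (4.3): P(2 < X <= 5) = F_X(5) - F_X(2)
print(die_cdf(5) - die_cdf(2))  # 1/2
```

The function also shows properties (4.2): it is 0 far to the left, 1 far to the right, nondecreasing, and it jumps at each face value while staying continuous from the right.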
4.1 Gamma distribution
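A random variable X with density
$$f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \qquad x > 0,\ \alpha > 0,\ \beta > 0,$$
is said to have a gamma distribution with parameters α and β [10], where
$$\Gamma(\alpha) = \int_0^{\infty} t^{\alpha-1} e^{-t}\,dt.$$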
4.2 Chi-square (χ²) distribution
A particular type of gamma distribution is known as the χ² distribution. This distribution is closely related to random samples from a normal distribution and is widely applied in statistics. For a positive integer n, the gamma distribution of Section 4.1 with α = n/2 and β = 2 is called the chi-square distribution with n degrees of freedom. The chi-square distribution therefore has one parameter, its degrees of freedom. It is skewed to the right, and the skew decreases as the number of degrees of freedom increases. As the degrees of freedom increase, the chi-square distribution approaches a normal distribution. The mean of a chi-square distribution equals its number of degrees of freedom, since the mean of the gamma distribution is αβ = (n/2) · 2 = n [6].
Figure 4.1: Chi-square distribution with 6 degrees of freedom
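In the density of Section 4.1, β divides x in the exponent. With that form, the chi-square distribution with n degrees of freedom corresponds to α = n/2 and β = 2. The Python sketch below builds the chi-square density from the gamma density and checks numerically that it integrates to 1 with mean n. The midpoint-rule integration and the cut-off at 100 are our own choices; they are adequate for n = 6, where the density vanishes at 0 and the tail beyond 100 is negligible.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density of Section 4.1: x**(alpha-1) * exp(-x/beta) / (Gamma(alpha) * beta**alpha)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

def chi_square_pdf(x, n):
    """Chi-square density with n degrees of freedom: gamma with alpha = n/2, beta = 2."""
    return gamma_pdf(x, n / 2, 2.0)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

n = 6
total = integrate(lambda x: chi_square_pdf(x, n), 0.0, 100.0)
mean = integrate(lambda x: x * chi_square_pdf(x, n), 0.0, 100.0)
print(round(total, 6), round(mean, 6))  # approximately 1 and 6
```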
Figure 4.2: chi square distribution with 3 degrees of freedom
Figure 5.1: chi square distribution with 5 degrees of freedom
5 Chi-square tests
Any statistical test that uses the chi-square distribution can be called a chi-square test. The chi-square test is perhaps the best known of all statistical tests. It is a basic method used in connection with many other tests and can be applied, for example, to dice throwing. The chi-square distribution is used directly in many tests of significance. Its most common use is to test differences among proportions. Although this test is by no means the only test based on the chi-square distribution, it has come to be known as the chi-square test. The chi-square test is the most commonly used method for comparing frequencies or proportions. It is used to determine whether observed data deviate from those expected under a particular hypothesis, and it is also referred to as a test of the "goodness of fit" between observed and expected data. The χ² test with k − 1 degrees of freedom is a procedure for testing the null hypothesis that our data are a sample from a specific distribution. The alternative hypothesis is that the data have some other distribution. The test is most natural when the specific distribution is discrete.
Suppose that there are k possible values for each observation, and that we observe $N_i$ observations with value i, for i = 1, ..., k. Suppose that the null hypothesis says that the probability of the ith possible value is $p_i$, for i = 1, ..., k. Then we compute
$$Q = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i}, \qquad (5.1)$$
where $n = \sum_{i=1}^{k} N_i$ is the sample size. When the null hypothesis says that the data have a continuous distribution, one must first create a corresponding discrete distribution. This is done by dividing the real line into finitely many intervals and calculating the probabilities $p_1, \ldots, p_k$ of the intervals. We then pretend that all we learned from the data was which interval each observation fell into. This converts the original data into discrete data with k possible values. All the χ² test statistics in this text have the form
$$\sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}},$$
where "observed" stands for an observed count, and "expected" stands for the expected value of that count under the assumption that the null hypothesis is true.
In 1900, Karl Pearson showed that if the hypothesis $H_0$ is true, then as the sample size n → ∞, the distribution of Q converges to the χ² distribution with k − 1 degrees of freedom. Thus, if $H_0$ is true and the sample size n is large, the distribution of Q is approximately a χ² distribution with k − 1 degrees of freedom. The discussion above indicates that $H_0$ should be rejected when Q ≥ C, where C is an appropriate constant. If it is desired to carry out the test at the level of significance $\alpha_0$, then C should be chosen to be the $1 - \alpha_0$ quantile of the χ² distribution with k − 1 degrees of freedom. This is called the χ² test of goodness of fit [6].
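Equation (5.1) and the rejection rule Q ≥ C translate directly into code. In this Python sketch the die counts are hypothetical illustration data. The critical value C = 11.070, the 0.95 quantile of the χ² distribution with 5 degrees of freedom, is taken from a standard χ² table rather than computed.

```python
def chi_square_statistic(counts, probs):
    """Q of equation (5.1): the sum over i of (N_i - n p_i)^2 / (n p_i)."""
    if len(counts) != len(probs):
        raise ValueError("need one probability per category")
    n = sum(counts)  # the sample size
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))

def reject_h0(q, critical_value):
    """Reject H0 when Q >= C, where C is the 1 - alpha_0 quantile of chi-square(k - 1)."""
    return q >= critical_value

# Hypothetical data: 60 throws of a die, H0: every face has probability 1/6
counts = [8, 12, 9, 11, 10, 10]
q = chi_square_statistic(counts, [1 / 6] * 6)
print(q)                     # about 1.0
print(reject_h0(q, 11.070))  # False: H0 is not rejected at the 5% level
```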
The chi-square analysis is used to test the null hypothesis $H_0$, the hypothesis that there is no significant difference between expected and observed data. Investigators either accept or reject $H_0$ after comparing the value of chi-square to a probability distribution. Chi-square values with low probability lead to the rejection of $H_0$. It is then assumed that a factor other than chance caused the large deviation between expected and observed results.
Example 5.1. Suppose we flip a coin 200 times. Under our hypothesis the probability of heads is 0.5 and the probability of tails is 0.5. That is, we predict that the coin will come up heads half of the time and tails half of the time. Our hypothesis therefore predicts:
Expected: heads 100, tails 100, total 200.
To test the hypothesis, we flip the coin 200 times and observe:
Observed: heads 108, tails 92, total 200.
Our chi-square formula is
$$Q = \sum_{i=1}^{k} \frac{(N_i - n p_i)^2}{n p_i}. \qquad (5.2)$$
For heads, $(N_i - n p_i)^2 = (108 - 100)^2 = 8^2 = 64$.
For tails, $(N_i - n p_i)^2 = (92 - 100)^2 = (-8)^2 = 64$.
The number of trials is very important. A given deviation from the expected count means much more when there are only a few trials than when there are many. This is taken into account by dividing the $(N_i - n p_i)^2$ values by the expected values:
For heads, $\frac{(N_i - n p_i)^2}{n p_i} = \frac{64}{100} = 0.64$.
For tails, $\frac{(N_i - n p_i)^2}{n p_i} = \frac{64}{100} = 0.64$.
To calculate the chi-square value for our experiment, we add together all of the $(N_i - n p_i)^2 / (n p_i)$ values:
$$Q = 0.64 + 0.64 = 1.28.$$
All the information is summarized in the following table.
data    observed   expected   (O − E)   (O − E)^2   (O − E)^2/E
heads   108        100        8         64          0.64
tails   92         100        −8        64          0.64
total   200        200                              χ² = 1.28
Next we need the number of degrees of freedom v, which depends on the number of classes of data. In this example there are two classes ("heads" and "tails"), so the number of degrees of freedom is v = 2 − 1 = 1. If we were dealing with dice rather than coins, it would be v = 6 − 1 = 5. We now have the chi-square value, 1.28, and the degrees of freedom, 1. According to a chi-square distribution table for one degree of freedom, 1.28 falls between 1.07 and 1.64, which correspond to probabilities of 0.30 and 0.20 respectively. So the probability associated with our chi-square value lies between 0.20 and 0.30.
A probability of 0.20 corresponds to a 20% chance, and 0.30 to a 30% chance. If our hypothesis is correct, the probability of obtaining results at least as far from our prediction as these is therefore between 0.20 and 0.30. In biological applications, a probability of 5% is usually adopted as the standard. This means that the chance of the observed deviation arising by chance alone is only 1 in 20. The probability we obtained in the coin example is greater than 0.05, so we do not reject the null hypothesis, and we conclude that the data are consistent with a fair coin [10].
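For one degree of freedom the tail probability can be computed exactly instead of being read from a table. A χ² variable with one degree of freedom is the square of a standard normal variable, so P(Q ≥ q) = erfc(√(q/2)). A Python sketch for Example 5.1, using the standard-library function `math.erfc`:

```python
import math

observed = {"heads": 108, "tails": 92}
expected = {"heads": 100, "tails": 100}  # 200 flips of a fair coin

q = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
# Tail probability of chi-square with 1 degree of freedom: P(Q >= q) = erfc(sqrt(q / 2))
p_value = math.erfc(math.sqrt(q / 2))
print(q)                  # 1.28
print(round(p_value, 3))  # 0.258
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

The exact value, about 0.258, lies between 0.20 and 0.30, in agreement with the table lookup above.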
6 Statistical tests
We now define a few tests to apply in our Mathematica file. Note that we will be dealing with binary sequences, and we use n = 100 as the sample size.
6.1 Monobit test
The focus of this test is the proportion of zeros and ones in the entire sequence. Its purpose is to determine whether the numbers of zeros and ones in a sequence are approximately the same, as would be expected for a truly random sequence. All subsequent tests depend on the passing of this test. We now derive the statistic for the monobit test from our chi-square formula (see (6.4)). In the monobit test the length of the subsequence is 1, so the number of different outcomes is k = 2 and the number of degrees of freedom is k − 1 = 1. Each outcome has probability $p_i = 1/2$, and n = 100 is the number of samples. Our chi-square formula becomes
$$Q_1 = \sum_{i=1}^{2} \frac{(N_i - n p_i)^2}{n p_i} = \frac{(N_1 - n p_1)^2}{n p_1} + \frac{(N_2 - n p_2)^2}{n p_2} = \frac{(N_1 - 50)^2}{50} + \frac{(N_2 - 50)^2}{50}. \qquad (6.1)$$
6.2 Twobit test
We now derive the statistic for the twobit test from our chi-square formula (see (6.4)). In the twobit test the length of the subsequence is 2, so the number of different outcomes is k = 4 and the number of degrees of freedom is k − 1 = 3. Each outcome has probability $p_i = 1/4$, and n is the number of samples. Our chi-square formula becomes
$$Q_2 = \sum_{i=1}^{4} \frac{(N_i - n p_i)^2}{n p_i} = \frac{(N_1 - n p_1)^2}{n p_1} + \frac{(N_2 - n p_2)^2}{n p_2} + \frac{(N_3 - n p_3)^2}{n p_3} + \frac{(N_4 - n p_4)^2}{n p_4} = \frac{(N_1 - 25)^2}{25} + \frac{(N_2 - 25)^2}{25} + \frac{(N_3 - 25)^2}{25} + \frac{(N_4 - 25)^2}{25}, \qquad (6.2)$$
where the last expression uses n = 100. The possible outcomes of a two-bit subsequence are 00, 01, 10 and 11.
6.3 Threebit test
We now derive the statistic for the threebit test from our chi-square formula (see (6.4)). In the threebit test the length of the subsequence is 3, so the number of different outcomes is k = 8 and the number of degrees of freedom is k − 1 = 7. Each outcome has probability $p_i = 1/8$, and n is the number of samples. Our chi-square formula becomes
$$Q_3 = \sum_{i=1}^{8} \frac{(N_i - n p_i)^2}{n p_i}. \qquad (6.3)$$
The possible outcomes of a three-bit subsequence are 000, 001, 010, 011, 100, 101, 110 and 111.
6.4 Subsequences of length t
We now discuss subsequences of length t, where t is a positive integer. The number of different outcomes is $2^t$, each with probability $1/2^t$. There are therefore $2^t - 1$ degrees of freedom, and the chi-square formula becomes
$$Q_t = \sum_{i=1}^{2^t} \frac{(N_i - n/2^t)^2}{n/2^t}. \qquad (6.4)$$
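A Python sketch of the statistic $Q_t$ in (6.4) for a list of bits follows. The thesis does not say here whether its t-bit samples overlap. We assume non-overlapping blocks, with n equal to the number of complete t-bit blocks; the function name `t_bit_statistic` is our own. The monobit, twobit and threebit tests are the cases t = 1, 2 and 3.

```python
import itertools
from collections import Counter

def t_bit_statistic(bits, t):
    """Q_t of equation (6.4), computed over non-overlapping blocks of t bits."""
    n = len(bits) // t  # number of samples = complete t-bit blocks
    if n == 0:
        raise ValueError("the sequence is shorter than one block")
    counts = Counter(tuple(bits[i * t:(i + 1) * t]) for i in range(n))
    expected = n / 2 ** t  # n * p_i with p_i = 1 / 2^t
    # Every one of the 2^t patterns contributes, including patterns that never occur.
    return sum((counts[pattern] - expected) ** 2 / expected
               for pattern in itertools.product((0, 1), repeat=t))

bits = [0, 1] * 50  # 100 bits that alternate perfectly
print(t_bit_statistic(bits, 1))  # 0.0: exactly 50 zeros and 50 ones
print(t_bit_statistic(bits, 2))  # 150.0: every two-bit block is 01
```

The example shows why several values of t are needed. The alternating sequence looks perfect to the monobit test, but the twobit test exposes it immediately.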
6.5 General discussion of test-value percentages
A distribution function takes values between 0 and 1 (see Section 4). The percentage point of a test value V, that is, the value of the chi-square distribution function at V, therefore also lies between 0 and 1.
Figure 6.1: Chi-square distribution with 6 degrees of freedom
In the figure, the right-hand tail area is the significance level α, which is 5% of the total area under the curve. If our test value V lies in this tail area, we say that the test value is not good. The percentages of test values are summarized in the percentage table.
7 Investigation of some pseudorandom number generators
The sequence of random numbers $(X_n)$ is obtained via the equation
$$X_{n+1} \equiv (a X_n + b) \pmod{m}, \qquad 0 \le X_{n+1} < m, \qquad X_0 = s,$$
which is called a linear congruential equation. Here m is the modulus, a is the multiplier, b is the increment and $X_0 = s$ is the starting value (seed).
We now choose values of m, a, b and s and investigate the resulting generators by doing statistical tests. Five generators with different parameter values are used. After the statistical tests we observe what kind of sequences each generator produces and judge whether they are random. The parameter values of the generators are presented in the table below.
generator     m      a    b   s
generator 1   400    25   5   2
generator 2   2509   23   5   2
generator 3   1578   25   5   2
generator 4   3568   25   5   2
generator 5   6784   27   5   2
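The bit streams that the tests are applied to can be generated as follows. This Python sketch is not the thesis's Mathematica code, and the names `GENERATORS`, `generator_values` and `low_bit_stream` are hypothetical. We assume that the last k bits of every number are concatenated, most significant bit first, as in Section 3.2. How many numbers the thesis generated per test is not stated in this section, so any count chosen here is an assumption.

```python
# Parameters (m, a, b, s) of the five generators, copied from the table above
GENERATORS = {
    1: (400, 25, 5, 2),
    2: (2509, 23, 5, 2),
    3: (1578, 25, 5, 2),
    4: (3568, 25, 5, 2),
    5: (6784, 27, 5, 2),
}

def generator_values(m, a, b, s, count):
    """Return X_0, ..., X_{count-1} with X_0 = s and X_{n+1} = (a * X_n + b) mod m."""
    values = [s]
    while len(values) < count:
        values.append((a * values[-1] + b) % m)
    return values

def low_bit_stream(values, k):
    """Concatenate the last k bits of every value, most significant bit first."""
    bits = []
    for x in values:
        bits.extend((x >> shift) & 1 for shift in range(k - 1, -1, -1))
    return bits

m, a, b, s = GENERATORS[1]
xs = generator_values(m, a, b, s, 6)
print(xs)                     # [2, 55, 180, 105, 230, 155]
print(low_bit_stream(xs, 1))  # [0, 1, 0, 1, 0, 1]
```

For generator 1, a = 25 and b = 5 are odd and m = 400 is even, so $X_n$ changes parity at every step. The last-bit stream therefore simply alternates 0, 1, 0, 1, .... A monobit statistic of (nearly) zero lies below the 1% entry, which would be consistent with the "very bad" last-bit scores reported for generator 1 in Section 7.1.1.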
For the five generators above, we carry out the statistical tests on the last bit, the last 3 bits and the last 6 bits (see Section 6). For each choice of bits we vary t. For example, for the last-bit test of generator 1 we carry out the statistical test with t = 1, 2 and 3, one after another. Each statistical test gives a test value for the sequence, and by analyzing these test values we discuss which sequences are random and which are not. The test value V is scored as follows:
- If V is less than the 1% entry or greater than the 99% entry, we reject the numbers as not sufficiently random; according to our percentage table they are "very bad".
- If V lies between the 1% and 5% entries or between the 95% and 99% entries, the numbers are "bad".
- If V lies between the 5% and 25% entries or between the 75% and 95% entries, the numbers are "not so good".
- If V lies between the 25% and 75% entries, the numbers are "good".
The percentage scores are summarized in the table below.
percentage scores
0%-1% very bad
1%-5% bad
5%-25% not so good
25%-75% good
75%-95% not so good
95%-99% bad
99%-100% very bad
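The percentage point of a test value V is the value of the chi-square distribution function at V, so the score in the table above can be computed rather than looked up. For a t-bit test the number of degrees of freedom is $2^t - 1$. The Python sketch below evaluates the chi-square distribution function with the standard power series for the regularized lower incomplete gamma function. That numerical method and the names `chi_square_cdf` and `score` are our own assumptions, not the thesis's Mathematica code. How boundary values such as exactly 5% are classified is also our choice.

```python
import math

def chi_square_cdf(x, df):
    """P(chi-square(df) <= x), via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    s, z = df / 2, x / 2
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1))  # first series term
    if term == 0.0:  # underflow: x lies far out in one of the two tails
        return 1.0 if z > s else 0.0
    total, k = term, 0
    while term > 1e-16 * total and k < 10_000:
        k += 1
        term *= z / (s + k)
        total += term
    return min(total, 1.0)

def score(v, df):
    """Score a test value V by its percentage point, following the percentage table."""
    pct = 100 * chi_square_cdf(v, df)
    if pct < 1 or pct > 99:
        return "very bad"
    if pct < 5 or pct > 95:
        return "bad"
    if pct < 25 or pct > 75:
        return "not so good"
    return "good"

print(score(1.28, 1))  # good: 1.28 is at about the 74% point for 1 degree of freedom
print(score(0.0, 1))   # very bad: suspiciously close to the expected counts
```

Both tails count against a generator. A test value that is too small means the counts agree with the expected values more closely than a random sequence would.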
After the statistical tests we obtain a score for each sequence. The chi-square test is done at least three times, on different sets of data (test 1, test 2 and test 3). If at least two of the three tests are "not good", the results are suspect and the numbers are regarded as not sufficiently random. We now discuss generator 1.
7.1 Generator 1
7.1.1 generator 1 for last 1 bit
test              test 1        test 2        test 3
monobit test      very bad      very bad      very bad
2 sequence test   very bad      very bad      very bad
3 sequence test   very bad      very bad      very bad
7.1.2 generator 1 for last 3 bits
test              test 1        test 2        test 3
monobit test      bad           very bad      very bad
2 sequence test   very bad      very bad      very bad
3 sequence test   very bad      very bad      very bad
7.1.3 generator 1 for last 6 bits
test              test 1        test 2        test 3
monobit test      not so good   good          good
2 sequence test   bad           bad           very bad
3 sequence test   not so good   not so good   good
7.1.4 general discussion
For generator 1 with the last bit, the monobit, two-bit and three-bit test values are all "very bad". When we change from the last bit to the last 3 bits, the test values are very similar to those for the last bit. With the last 6 bits the results improve:
- In the monobit test, two of the three values are "good" and one is "not so good".
- In the two-bit sequence test, two of the three values are "bad" and one is "very bad".
- In the three-bit sequence test, two values are "not so good" and one is "good".
So for generator 1 the last-bit and last-3-bit tests are not good, but the last-6-bit tests are better than the other two.
7.2 Generator 2
7.2.1 Generator 2, last 1 bit
test              test 1       test 2       test 3
monobit test      good         good         good
2-sequence test   very bad     very bad     very bad
3-sequence test   not so good  good         not so good
7.2.2 Generator 2, last 3 bits
test              test 1       test 2       test 3
monobit test      very bad     good         not so good
2-sequence test   good         good         not so good
3-sequence test   not so good  not so good  not so good
7.2.3 Generator 2, last 6 bits
test              test 1       test 2       test 3
monobit test      not so good  good         good
2-sequence test   bad          not so good  not so good
3-sequence test   not so good  not so good  good
7.2.4 General discussion
For generator 2 with the last 1 bit, all three monobit test values are "good", but all
2-sequence test values are "very bad", and in the 3-sequence test two values are "not so
good" and one is "good". With the last 3 bits, the monobit test gives one "good", one
"very bad" and one "not so good" value, while the 2-sequence test gives two "good" values
and one "not so good". Unfortunately, all 3-sequence test values are "not so good". With
the last 6 bits, two monobit test values are "good" and one is "not so good"; the
2-sequence test gives two "not so good" values and one "bad"; and the 3-sequence test
gives two "not so good" values and one "good". Counting the "good" results of generator 2,
we get 4 for the 1-bit tests, 3 for the 3-bit tests and 3 for the 6-bit tests. So generator 2
gives the most "good" results with the last 1 bit, and for this generator the 1-bit tests are
better than the other two.
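The counts of "good" results quoted above can be checked directly against the generator 2 tables. A minimal Python sketch; the scores are copied from tables 7.2.1 to 7.2.3, and the dictionary layout and the names `GENERATOR_2` and `good_counts` are our own illustrative choices.

```python
# Scores of generator 2 from tables 7.2.1 to 7.2.3. Key: number of output bits.
# Rows: monobit, 2-sequence and 3-sequence test; columns: test 1, test 2, test 3.
GENERATOR_2 = {
    1: [["good", "good", "good"],
        ["very bad", "very bad", "very bad"],
        ["not so good", "good", "not so good"]],
    3: [["very bad", "good", "not so good"],
        ["good", "good", "not so good"],
        ["not so good", "not so good", "not so good"]],
    6: [["not so good", "good", "good"],
        ["bad", "not so good", "not so good"],
        ["not so good", "not so good", "good"]],
}

# Number of "good" scores for each bit length.
good_counts = {bits: sum(row.count("good") for row in table)
               for bits, table in GENERATOR_2.items()}
print(good_counts)  # {1: 4, 3: 3, 6: 3}

# Bit length with the most "good" scores.
best_bits = max(good_counts, key=good_counts.get)
print(best_bits)    # 1
```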
7.3 Generator 3
7.3.1 Generator 3, last 1 bit
test              test 1       test 2       test 3
monobit test      very bad     very bad     very bad
2-sequence test   very bad     very bad     very bad
3-sequence test   very bad     very bad     very bad
7.3.2 Generator 3, last 3 bits
test              test 1       test 2       test 3
monobit test      not so good  not so good  not so good
2-sequence test   very bad     very bad     very bad
3-sequence test   not so good  good         very bad
7.3.3 Generator 3, last 6 bits
test              test 1       test 2       test 3
monobit test      not so good  good         good
2-sequence test   not so good  good         not so good
3-sequence test   not so good  good         bad
7.3.4 General discussion
For generator 3 with the last 1 bit, the monobit, 2-sequence and 3-sequence test values
are all "very bad". With the last 3 bits, all three monobit test values are "not so good",
all 2-sequence test values are "very bad", and the 3-sequence test values are "not so
good", "good" and "very bad". With the last 6 bits, the monobit test gives two "good"
values and one "not so good"; the 2-sequence test gives two "not so good" values and one
"good"; and the 3-sequence test gives "not so good", "good" and "bad". Here we see that
generator 3 produces only "very bad" results for the last 1 bit, while the 6-bit tests give
some "good" results. So generator 3 gives its best results with the last 6 bits.
7.4 Generator 4
7.4.1 Generator 4, last 1 bit
test              test 1       test 2       test 3
monobit test      very bad     very bad     very bad
2-sequence test   very bad     very bad     very bad
3-sequence test   very bad     very bad     very bad
7.4.2 Generator 4, last 3 bits
test              test 1       test 2       test 3
monobit test      very bad     not so good  not so good
2-sequence test   very bad     very bad     very bad
3-sequence test   very bad     very bad     very bad
7.4.3 Generator 4, last 6 bits
test              test 1       test 2       test 3
monobit test      good         not so good  good
2-sequence test   good         not so good  good
3-sequence test   not so good  not so good  bad
7.4.4 General discussion
For generator 4 with the last 1 bit, the monobit, 2-sequence and 3-sequence test values
are all "very bad". With the last 3 bits, the monobit test gives two "not so good" values and
one "very bad", and all 2-sequence and 3-sequence test values are "very bad". With the
last 6 bits, the monobit test gives two "good" values and one "not so good"; the 2-sequence
test gives two "good" values and one "not so good"; and the 3-sequence test gives two "not
so good" values and one "bad". So generator 4 does not produce good sequences with the
last 1 or 3 bits, and only with the last 6 bits does it produce a few good sequences.
Generator 4 therefore performs better when more bits are used.
7.5 Generator 5
7.5.1 Generator 5, last 1 bit
test              test 1       test 2       test 3
monobit test      very bad     very bad     very bad
2-sequence test   very bad     very bad     very bad
3-sequence test   very bad     very bad     very bad
7.5.2 Generator 5, last 3 bits
test              test 1       test 2       test 3
monobit test      very bad     very bad     very bad
2-sequence test   very bad     very bad     very bad
3-sequence test   very bad     very bad     very bad
7.5.3 Generator 5, last 6 bits
test              test 1       test 2       test 3
monobit test      good         bad          not so good
2-sequence test   bad          very bad     very bad
3-sequence test   very bad     very bad     very bad
7.5.4 General discussion
For generator 5 with the last 1 bit, the monobit, 2-sequence and 3-sequence test values
are all "very bad". With the last 3 bits, all values are again "very bad". With the last 6 bits,
the monobit test gives "good", "bad" and "not so good"; the 2-sequence test gives two "very
bad" values and one "bad"; and all 3-sequence test values are "very bad". So generator 5
does not produce good sequences.
8 Conclusion
After the statistical tests we obtain the test values, and according to our percentage table
we mark them as "good", "not so good", "bad" or "very bad". We now compare the values
from generator to generator. For the last 1 bit, all test values of generators 1, 3, 4 and 5
are "very bad", whereas for generator 2 all monobit test values are "good" but the
2-sequence test values are "very bad". For the last 3 bits, almost all test values of generator
1 are "very bad", while most test values of generator 2 are "not so good". For generator 3
the numbers of "not so good" and "very bad" values are equal, for generator 4 most values
are "very bad", and for generator 5 all values are "very bad". We observe that in every
generator the last 6 bits give some "good" sequences. Generators 2 and 3 also produce
some "good" sequences for the last 3 bits, but most of them are "not so good". Generators
1 and 5 are very similar, because their 1-bit and 3-bit test results are very similar; for the
last 6 bits, however, most results of generator 3 are "good" or "not so good", while most
results of generator 5 are "very bad". Comparing all five generators, we find many "good"
test values for the last 6 bits in every generator except generator 5, and I conclude that
generator 2 is the best of the five at producing "good" sequences.
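The comparison above can be summarized by tallying the scores from the tables in section 7. This is a minimal Python sketch; the scores are copied from those tables, while the one-letter abbreviations and the names `TABLES`, `NAMES` and `tally` are our own. Counting "good" scores is only one simple way to rank the generators, in the spirit of the count used for generator 2.

```python
from collections import Counter

# Scores copied from section 7, abbreviated: G = good, N = not so good,
# B = bad, V = very bad. Key: (generator, number of output bits).
# Value: monobit, 2-sequence and 3-sequence rows, each giving test 1, 2, 3.
TABLES = {
    (1, 1): ["VVV", "VVV", "VVV"], (1, 3): ["BVV", "VVV", "VVV"], (1, 6): ["NGG", "BBV", "NNG"],
    (2, 1): ["GGG", "VVV", "NGN"], (2, 3): ["VGN", "GGN", "NNN"], (2, 6): ["NGG", "BNN", "NNG"],
    (3, 1): ["VVV", "VVV", "VVV"], (3, 3): ["NNN", "VVV", "NGV"], (3, 6): ["NGG", "NGN", "NGB"],
    (4, 1): ["VVV", "VVV", "VVV"], (4, 3): ["VNN", "VVV", "VVV"], (4, 6): ["GNG", "GNG", "NNB"],
    (5, 1): ["VVV", "VVV", "VVV"], (5, 3): ["VVV", "VVV", "VVV"], (5, 6): ["GBN", "BVV", "VVV"],
}
NAMES = {"G": "good", "N": "not so good", "B": "bad", "V": "very bad"}


def tally(generator, bits=None):
    """Count the scores of one generator, for all bit lengths or just one."""
    counts = Counter()
    for (g, b), rows in TABLES.items():
        if g == generator and (bits is None or b == bits):
            counts.update(NAMES[c] for row in rows for c in row)
    return counts


for g in range(1, 6):
    print(f"generator {g}: {tally(g)['good']} good in total, "
          f"{tally(g, 6)['good']} good for the last 6 bits")

best = max(range(1, 6), key=lambda g: tally(g)["good"])
print("most 'good' scores:", best)  # prints: most 'good' scores: 2
```

With these tables, generator 2 has 10 "good" scores in total, ahead of generator 3 (5), generator 4 (4), generator 1 (3) and generator 5 (1). The same tally also confirms that generator 3 has four "not so good" and four "very bad" values for the last 3 bits. A plain count ignores how far the other scores are from "good", so it is a rough summary only.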
References
[1] Sheldon M. Ross, Simulation, 2002.
[2] NIST SP 800-22 Rev. 1, http://csrc.nist.gov/publications/nistpubs/800-22-rev1/SP800-22rev1.pdf
[3] Frederick Solomon, Probability and Stochastic Processes.
[4] Donald E. Knuth, The Art of Computer Programming, Volume 2, 1998.
[5] P. A. W. Lewis and E. J. Orav, Simulation Methodology for Statisticians, Operations Analysts, and Engineers, 1998.
[6] Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, 2002.
[7] Wade Trappe and Lawrence C. Washington, Introduction to Cryptography with Coding Theory, 2006.
[8] T. T. Soong, Fundamentals of Probability and Statistics for Engineers.
[9] Jay L. Devore, Probability and Statistics for Engineering and the Sciences.
[10] J. Susan Milton and Jesse C. Arnold, Introduction to Probability and Statistics.
[11] http://www.science.jrank.org/pages/1401/chi square- test.html
[12] Kenneth H. Rosen, Elementary Number Theory and Its Applications.
9 Appendix
Each entry below gives the chi-square statistic V and the test value 1 - F(V), where F is
the chi-square distribution function with 2^t - 1 degrees of freedom (t = 1: monobit test,
t = 2: 2-sequence test, t = 3: 3-sequence test). The labels n = 100, 200 and 300 denote
test 1, test 2 and test 3, which use subsequences 1 to 100, 101 to 200 and 201 to 300.

Generator 1, last 1 bit, t = 1:
n = 100: V = 0, test value = 1
n = 200: V = 0, test value = 1
n = 300: V = 0, test value = 1
Generator 2, last 1 bit, t = 1:
n = 100: V = 0.36, test value = 0.548506
n = 200: V = 0.36, test value = 0.548506
n = 300: V = 0.36, test value = 0.548506
Generator 1, last 1 bit, t = 2:
n = 100: V = 300, test value = 0
n = 200: V = 300, test value = 0
n = 300: V = 300, test value = 0
Generator 2, last 1 bit, t = 2:
n = 100: V = 11.44, test value = 0.00956972
n = 200: V = 11.92, test value = 0.00766229
n = 300: V = 11.44, test value = 0.00956972
Generator 1, last 1 bit, t = 3:
n = 100: V = 300, test value = 0
n = 200: V = 77.6, test value = 4.24105
n = 300: V = 100, test value = 0
Generator 2, last 1 bit, t = 3:
n = 100: V = 10.08, test value = 0.184085
n = 200: V = 8.8, test value = 0.267336
n = 300: V = 9.76, test value = 0.202587
Generator 1, last 3 bits, t = 1:
n = 100: V = 4.84, test value = 0.0278069
n = 200: V = 7.84, test value = 0.00511026
n = 300: V = 9, test value = 0.0026998
Generator 2, last 3 bits, t = 1:
n = 100: V = 0, test value = 1
n = 200: V = 0.64, test value = 0.423711
n = 300: V = 1.44, test value = 0.230139
Generator 1, last 3 bits, t = 2:
n = 100: V = 21.84, test value = 0.0000704273
n = 200: V = 21.84, test value = 0.0000704273
n = 300: V = 23.12, test value = 0.0000381227
Generator 2, last 3 bits, t = 2:
n = 100: V = 3.76, test value = 0.288573
n = 200: V = 1.76, test value = 0.623678
n = 300: V = 0.56, test value = 0.905525
Generator 1, last 3 bits, t = 3:
n = 100: V = 0.16, test value = 0.999988
n = 200: V = 0.16, test value = 0.999988
n = 300: V = 0.16, test value = 0.999988
Generator 2, last 3 bits, t = 3:
n = 100: V = 3.36, test value = 0.849824
n = 200: V = 2.4, test value = 0.934437
n = 300: V = 2.88, test value = 0.895877
Generator 1, last 6 bits, t = 1:
n = 100: V = 0.04, test value = 0.841481
n = 200: V = 0.16, test value = 0.689157
n = 300: V = 0.16, test value = 0.689157
Generator 2, last 6 bits, t = 1:
n = 100: V = 1.44, test value = 0.230139
n = 200: V = 0.16, test value = 0.689157
n = 300: V = 0.36, test value = 0.548506
Generator 1, last 6 bits, t = 2:
n = 100: V = 9.36, test value = 0.0248683
n = 200: V = 7.84, test value = 0.0494368
n = 300: V = 14.48, test value = 0.00231953
Generator 2, last 6 bits, t = 2:
n = 100: V = 0.32, test value = 0.956224
n = 200: V = 0.72, test value = 0.86849
n = 300: V = 0.96, test value = 0.810929
Generator 1, last 6 bits, t = 3:
n = 100: V = 3.04, test value = 0.881277
n = 200: V = 3.04, test value = 0.881277
n = 300: V = 4.68, test value = 0.7038
Generator 2, last 6 bits, t = 3:
n = 100: V = 9.32, test value = 0.193153
n = 200: V = 3.2, test value = 0.865905
n = 300: V = 8, test value = 0.332594
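As a worked example of where these numbers come from (following the Mathematica code at the end of this appendix), let Y_s denote the number of the n subsequences of length t that equal the bit pattern s; the symbol Y_s is our notation. The statistic and the test value are

$$V = \sum_{s=0}^{2^t-1} \frac{\left(Y_s - n/2^t\right)^2}{n/2^t}, \qquad \text{test value} = 1 - F_{2^t-1}(V),$$

where F_k is the chi-square distribution function with k degrees of freedom. For the monobit test (t = 1) with n = 100 we have Y_0 + Y_1 = 100, so

$$V = \frac{(Y_0-50)^2}{50} + \frac{(Y_1-50)^2}{50} = \frac{(Y_1-50)^2}{25}.$$

For example, 51 ones give V = 1/25 = 0.04, and since F_1(x) = erf(sqrt(x/2)),

$$1 - F_1(0.04) = \operatorname{erfc}\!\left(\sqrt{0.02}\right) \approx 0.841481,$$

which is the test value listed above for generator 1, last 6 bits, test 1. The same formula explains why every monobit statistic V is a perfect square divided by 25 (0, 0.04, 0.16, 0.36, ...).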
If we organize the test values above into tables, we get the following.
Generator 1, last 1 bit
test              test 1        test 2        test 3
monobit test      1             1             1
2-sequence test   0             0             0
3-sequence test   0             0             0
Generator 2, last 1 bit
test              test 1        test 2        test 3
monobit test      0.548506      0.548506      0.548506
2-sequence test   0.00956972    0.00766229    0.00956972
3-sequence test   0.184085      0.267336      0.202587
Generator 3, last 1 bit
test              test 1        test 2        test 3
monobit test      1             1             1
2-sequence test   0             0             0
3-sequence test   0             0             0
Generator 4, last 1 bit
test              test 1        test 2        test 3
monobit test      1             1             1
2-sequence test   0             0             0
3-sequence test   0             0             0
Generator 5, last 1 bit
test              test 1        test 2        test 3
monobit test      1             1             1
2-sequence test   0             0             0
3-sequence test   0             0             0
Generator 1, last 3 bits
test              test 1        test 2        test 3
monobit test      0.0278069     0.00511026    0.0026998
2-sequence test   0.0000704273  0.0000704273  0.0000381227
3-sequence test   0.999988      0.999988      0.999988
Generator 2, last 3 bits
test              test 1        test 2        test 3
monobit test      1             0.423711      0.230139
2-sequence test   0.288573      0.623678      0.905525
3-sequence test   0.849824      0.934437      0.895877
Generator 3, last 3 bits
test              test 1        test 2        test 3
monobit test      0.689157      0.689157      0.689157
2-sequence test   0.0199763     0.0000176682  1.88138–
3-sequence test   0.151307      0.625835      0.995448
Generator 4, last 3 bits
test              test 1        test 2        test 3
monobit test      1             0.841481      0.841481
2-sequence test   0.0000704273  0.0000704273  0.0000381227
3-sequence test   0.999988      0.999988      0.999988
Generator 5, last 3 bits
test              test 1        test 2        test 3
monobit test      0.0026998     0.000318217   0.000673859
2-sequence test   6.47316–      1.21377–      1.70053–
3-sequence test   0             0             0
Generator 1, last 6 bits
test              test 1        test 2        test 3
monobit test      0.841481      0.689157      0.689157
2-sequence test   0.0248683     0.0494368     0.00231953
3-sequence test   0.881277      0.881277      0.7038
Generator 2, last 6 bits
test              test 1        test 2        test 3
monobit test      0.230139      0.689157      0.548506
2-sequence test   0.956224      0.86849       0.810929
3-sequence test   0.193153      0.865905      0.332594
Generator 3, last 6 bits
test              test 1        test 2        test 3
monobit test      0.841481      0.689157      0.423711
2-sequence test   0.86849       0.696186      0.86849
3-sequence test   0.0655731     0.74227       0.984314
Generator 4, last 6 bits
test              test 1        test 2        test 3
monobit test      0.548506      0.841481      0.689157
2-sequence test   0.65939       0.830251      0.606269
3-sequence test   0.909641      0.865905      0.922512
Generator 5, last 6 bits
test              test 1        test 2        test 3
monobit test      0.423711      0.0278069     0.230139
2-sequence test   0.016033      0.0045509     0.00888689
3-sequence test   0.000555888   0.000487356   0.000487356
The Mathematica code is given below.

(* Length of subsequence to test *)
t = 2;
(* Number of least significant bits taken from each generated number *)
outputbits = 3;
(* Number of generated numbers; also the modulus of the generator *)
m = 400;

(* The generator *)
lincongen[a_Integer, b_Integer, n_Integer, x_Integer] := Mod[a*x + b, n]

(* Generate a sequence: list1 contains numbers modulo m, list2 contains numbers modulo 2 *)
f[x_] := lincongen[25, 5, m, x];
list1 = {}; list2 = {};
s = 2; x = s;
Do[AppendTo[list1, x];
  bits = IntegerDigits[x, 2, outputbits];
  list2 = Join[list2, bits];
  x = f[x];, {i, 1, m}]
list1;
list2;
list2

(* Create a table containing the frequencies of the different subsequences *)
totalfreq = Table[0, {i, 0, 2^t - 1}]
(* Out: {0, 0, 0, 0} *)
Do[pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  totalfreq[[FromDigits[test, 2] + 1]]++;, {i, 0, Length[list2]/t - 1}]
FromDigits[{1, 1, 1, 1}, 2]
totalfreq
(* Out: {150, 250, 50, 150} *)

(* Test 1: subsequences 1 to n *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  freq[[FromDigits[test, 2] + 1]]++;, {i, 0, n - 1}]
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]
(* Out: 21.84 *)
(* Out: 0.0000704273 *)
freq
(* Out: {50, 50} *)
(* The output {50, 50} above and the outputs of tests 2 and 3 below come from a run with t = 1. *)

(* Test 2: subsequences n + 1 to 2n *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  freq[[FromDigits[test, 2] + 1]]++;, {i, n, 2*n - 1}]
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]
(* Out: 0.04 *)
(* Out: 0.841481 *)

(* Test 3: subsequences 2n + 1 to 3n *)
n = 100;
freq = Table[0, {i, 0, 2^t - 1}];
Do[pos = t*i + 1;
  test = Take[list2, {pos, pos + t - 1}];
  freq[[FromDigits[test, 2] + 1]]++;, {i, 2*n, 3*n - 1}]
stat = N[Sum[(freq[[i]] - n*(1/2^t))^2/(n/2^t), {i, 1, 2^t}]]
testvalue = 1 - N[CDF[ChiSquareDistribution[2^t - 1], stat]]
(* Out: 0.04 *)
(* Out: 0.841481 *)
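For readers without Mathematica, the listing above can be sketched in Python. This is a minimal sketch, not the author's code: the function names are ours (hypothetical). Instead of Mathematica's CDF[ChiSquareDistribution[...]], it uses the closed-form upper tail of the chi-square distribution for an odd number of degrees of freedom, which is enough here because 2^t - 1 is always odd; only the standard math module (math.erfc) is needed.

```python
import math


def lincongen(a, b, n, x):
    """One step of the linear congruential generator: (a*x + b) mod n."""
    return (a * x + b) % n


def bit_stream(a, b, n, seed, count, outputbits):
    """Concatenate the `outputbits` least significant bits (most significant
    first, as IntegerDigits[x, 2, outputbits] does) of `count` numbers."""
    bits, x = [], seed
    for _ in range(count):
        bits.extend((x >> j) & 1 for j in range(outputbits - 1, -1, -1))
        x = lincongen(a, b, n, x)
    return bits


def frequencies(bits, t, first, count):
    """Frequencies of the 2**t bit patterns among the non-overlapping
    blocks first, ..., first + count - 1 of length t."""
    freq = [0] * (2 ** t)
    for i in range(first, first + count):
        value = 0
        for bit in bits[t * i : t * i + t]:
            value = 2 * value + bit
        freq[value] += 1
    return freq


def chi2_upper_tail_odd(x, df):
    """P(X >= x) for a chi-square variable with odd df, via the closed form
    erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) sum_{r=1}^{(df-1)/2} x**(r - 1/2) / (1*3*...*(2r-1))."""
    if df % 2 != 1:
        raise ValueError("df must be odd")
    root = math.sqrt(x)
    term, acc = root, 0.0          # term for r = 1 is x**(1/2) / 1
    for r in range(1, (df - 1) // 2 + 1):
        acc += term
        term *= x / (2 * r + 1)    # next term: multiply by x / (2r + 1)
    return math.erfc(root / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * acc


def frequency_test(bits, t, first, count):
    """Chi-square statistic V and test value 1 - F(V) with 2**t - 1 df."""
    freq = frequencies(bits, t, first, count)
    expected = count / 2 ** t
    stat = sum((f - expected) ** 2 / expected for f in freq)
    return stat, chi2_upper_tail_odd(stat, 2 ** t - 1)


# Same parameters as the Mathematica listing: a = 25, b = 5, m = 400, seed 2.
bits = bit_stream(25, 5, 400, 2, 400, 3)
print(frequencies(bits, 2, 0, len(bits) // 2))   # [150, 250, 50, 150]
for k in range(3):                               # test 1, test 2, test 3
    print(frequency_test(bits, 2, k * 100, 100))
```

With these parameters the sketch reproduces the printed totalfreq {150, 250, 50, 150}, and test 1 gives V = 21.84 with a test value of about 0.0000704273, the same output as the Mathematica run with t = 2. Calling `frequency_test(bits, 1, 100, 100)` gives V = 0.04 and a test value of about 0.841481, matching the t = 1 outputs recorded for test 2.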
SE-351 95 Växjö / SE-391 82 Kalmar, Tel +46-772-28 80 00, Lnu.se