Subwords, Regular Languages, and Prime Numbers
Jeffrey Shallit
School of Computer Science
University of Waterloo
Waterloo, Ontario N2L 3G1
https://www.cs.uwaterloo.ca/~shallit
Joint work with Curtis Bright and Raymond Devillers.
1 / 34
Partial orders
Recall: a partial order “≤” on a set S is a subset T ⊆ S × S satisfying three properties (where we write x ≤ y if (x, y) ∈ T):
1. Reflexive: ∀x: x ≤ x
2. Transitive: ∀x, y, z: x ≤ y and y ≤ z implies x ≤ z
3. Anti-symmetric: ∀x, y: x ≤ y and y ≤ x implies x = y
So partial orders mimic the behavior of “≤” on the real numbers.
2 / 34
Comparable and incomparable elements
We say x, y ∈ S are comparable according to the partial order if either x ≤ y or y ≤ x.
Otherwise they are incomparable.
An antichain is a list of pairwise incomparable elements.
Some partial orders have infinite antichains and some do not...
3 / 34
Antichains in N^k
Consider the following partial order on k-tuples of natural numbers (N^k):
a point (a_1, a_2, ..., a_k) is ≤_p (b_1, b_2, ..., b_k)
if a_1 ≤ b_1, a_2 ≤ b_2, ..., a_k ≤ b_k.
Are there infinite antichains in this partial order?
4 / 34
Antichains in N^k
No! We prove this by induction on k .
For k = 1 this is clear: any two elements of N are comparable.
Otherwise assume true for k − 1 and we prove for k .
Let p_1, p_2, ... be an infinite antichain in N^k.
Since each of p_2, p_3, ... is incomparable to p_1, each p_i with i ≥ 2 has some coordinate where it is less than the corresponding coordinate of p_1.
Since there are only k coordinates, some coordinate has the property that infinitely many of the p_i are less than p_1 in that coordinate.
Without loss of generality, let it be the first coordinate.
5 / 34
Antichains in N^k
Call these infinitely many p_i
q_1, q_2, q_3, ... .
Now there are only finitely many non-negative integers less than the first coordinate of p_1, so there is some non-negative integer d, less than the first coordinate of p_1, such that infinitely many of the q_i have their first coordinate equal to d.
Call these r_1, r_2, ... .
Now delete the first coordinate of each of the r_i to get infinitely many pairwise incomparable elements in N^{k−1}, a contradiction.
That completes the proof.
6 / 34
Partial orders on words
There are a number of obvious partial orders on words:
x ≤ y if |x| ≤ |y|
x ≤ y if x is a factor of y (a contiguous block sitting inside y, the way ore is a factor of theorem)
x ≤ y if x precedes y in alphabetic order
x ≤ y if x is a subword of y (alternatively, x is obtained from y by striking out 0 or more letters of y, the way them is a subword of theorem)
Note: “subword” is also called “scattered subword” or “substring” or “subsequence”.
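The factor and subword relations can be tested mechanically. The following is a minimal Python sketch; the function names `is_factor` and `is_subword` are chosen here for illustration and do not come from the slides.

```python
def is_factor(x: str, y: str) -> bool:
    """True if x occurs as a contiguous block inside y."""
    return x in y


def is_subword(x: str, y: str) -> bool:
    """True if x can be obtained from y by striking out 0 or more letters."""
    remaining = iter(y)
    # `ch in remaining` consumes the iterator up to and including the first
    # match, so the letters of x must be found in y from left to right.
    return all(ch in remaining for ch in x)


print(is_factor("ore", "theorem"))    # True
print(is_subword("them", "theorem"))  # True
print(is_factor("them", "theorem"))   # False
```

Every factor is a subword, but not conversely, as the last line shows.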
7 / 34
The factor partial order has infinite antichains
For example, the set
{a b^n a : n ≥ 1} = {aba, abba, abbba, ...}
is an infinite set in which no two words are factors of each other.
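As a finite sanity check (not a proof), the sketch below tests the first 20 words a b^n a. It confirms that no one is a factor of another, and also shows that the same words form a chain under the subword order. The helper names are illustrative assumptions.

```python
from itertools import combinations


def is_factor(x, y):
    """True if x occurs as a contiguous block inside y."""
    return x in y


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


# The words a b^n a for n = 1, ..., 20, in order of increasing length.
words = ["a" + "b" * n + "a" for n in range(1, 21)]

# No two distinct words are factors of each other ...
factor_antichain = all(
    not is_factor(x, y) and not is_factor(y, x)
    for x, y in combinations(words, 2)
)
# ... yet each shorter word is a subword of every longer one.
subword_chain = all(is_subword(x, y) for x, y in combinations(words, 2))

print(factor_antichain, subword_chain)
```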
8 / 34
Higman-Haines theorem: the subword partial order has no infinite antichains
Write x ⊳ y for the partial order “x is a subword of y” and x ⋫ y for “x is not a subword of y”.
Proof strategy: assume there is an infinite antichain.
This implies the weaker result that there is an infinite division-free sequence of words (f_i)_{i≥1}, i.e., a sequence of strings f_1, f_2, ... such that i < j =⇒ f_i ⋫ f_j.
Now iteratively choose a minimal such sequence, as follows:
• Let f_1 be a shortest word beginning an infinite division-free sequence;
• Let f_2 be a shortest word such that f_1, f_2 begins an infinite division-free sequence;
• Let f_3 be a shortest word such that f_1, f_2, f_3 begins an infinite division-free sequence; etc.
9 / 34
Higman-Haines theorem: the subword partial order has no infinite antichains
By the pigeonhole principle, there exists an infinite subsequence of the f_i, say f_{i_1}, f_{i_2}, f_{i_3}, ..., such that each of the strings in this subsequence starts with the same letter, say a.
Define x_j for j ≥ 1 by f_{i_j} = a x_j. Then
f_1, f_2, f_3, ..., f_{i_1 − 1}, x_1, x_2, x_3, ...
is an infinite division-free sequence which precedes (f_i)_{i≥1} (because x_1 is shorter than f_{i_1}), contradicting the supposed minimality of (f_i)_{i≥1}.
To see this, note that f_i ⋫ f_j for 1 ≤ i < j < i_1 by assumption.
Next, if f_i ⊳ x_j for some i with 1 ≤ i < i_1 and j ≥ 1, then f_i ⊳ a x_j = f_{i_j}, a contradiction.
Finally, if x_j ⊳ x_k for some j < k, then a x_j ⊳ a x_k, and hence f_{i_j} ⊳ f_{i_k}, a contradiction. That completes the proof.
10 / 34
The difference between infinite and very large
Notice that although we have proved there are no infinite pairwise incomparable sets for the subword ordering, there are arbitrarily large such sets.
For example, the language {0, 1}^n consists of 2^n strings that are pairwise incomparable.
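A small check of this example, as a sketch: it enumerates {0, 1}^n for n = 4 and confirms the 16 strings are pairwise incomparable under the subword order. The value n = 4 and the helper name `pairwise_incomparable` are arbitrary choices made here.

```python
from itertools import combinations, product


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def pairwise_incomparable(words):
    """True if no word in the collection is a subword of a different one."""
    return all(
        not is_subword(x, y) and not is_subword(y, x)
        for x, y in combinations(words, 2)
    )


n = 4
words = ["".join(t) for t in product("01", repeat=n)]
print(len(words), pairwise_incomparable(words))
```

Distinct strings of equal length can never be subwords of each other, which is why the check succeeds for every n.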
11 / 34
Two operations on languages
We now introduce two operations on languages, the subword and superword operations.
Let L ⊆ Σ∗.
We define
sup(L) = {x ∈ Σ∗ : there exists y ∈ L such that y ⊳ x}
sub(L) = {x ∈ Σ∗ : there exists y ∈ L such that x ⊳ y}
Our goal is to prove that if L is any language, then sub(L) and sup(L) are regular.
12 / 34
Basics
Lemma. Let L ⊆ Σ∗. Then
(a) L ⊆ sup(L);
(b) L ⊆ sub(L);
(c) sub(L) = sub(sub(L)).
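For a finite language the two operations and the three parts of this lemma can be checked by brute force. The sketch below is illustrative only: `sub_finite` and `in_sup` are names introduced here, and sup(L) is handled through a membership test because it is infinite whenever L is nonempty.

```python
from itertools import combinations


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def sub_finite(L):
    """sub(L) for a finite language L: every way of keeping some positions."""
    result = set()
    for y in L:
        for r in range(len(y) + 1):
            for positions in combinations(range(len(y)), r):
                result.add("".join(y[i] for i in positions))
    return result


def in_sup(x, L):
    """Membership test for sup(L): some y in L is a subword of x."""
    return any(is_subword(y, x) for y in L)


L = {"ab", "cba"}
S = sub_finite(L)
print(sorted(S, key=lambda w: (len(w), w)))
print(all(in_sup(w, L) for w in L))   # part (a): L is contained in sup(L)
print(L <= S)                         # part (b): L is contained in sub(L)
print(sub_finite(S) == S)             # part (c): sub is idempotent
```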
13 / 34
Minimal elements
Let R be a partial order on a set S .
Then we say x ∈ S is minimal if
yRx =⇒ y = x
for y ∈ S .
14 / 34
Basic properties of minimal elements
Let D(y) be the set {x ∈ S : xRy}.
Lemma. Let R be a partial order on a set S.
(a) If x, y are distinct minimal elements, then x, y are incomparable.
(b) Suppose the set D(y) is finite. Then there exists a minimal y′ such that y′ R y.
15 / 34
The result for sup
Lemma. Let L ⊆ Σ∗. Then there exists a finite subset M ⊆ L such that sup(L) = sup(M).
Proof. Let M be the set of minimal elements of L.
We proved that the elements of M are pairwise incomparable. Hence, by the Higman-Haines theorem, M is finite.
It remains to see that sup(L) = sup(M).
Clearly sup(M) ⊆ sup(L). Now suppose x ∈ sup(L).
Then there exists y ∈ L such that y ⊳ x. Since y has only finitely many subwords, part (b) of the lemma above gives a y′ ∈ M such that y′ ⊳ y.
Then y′ ⊳ y ⊳ x, and so x ∈ sup(M).
16 / 34
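The lemma can be illustrated on a small finite language. In this sketch the language L, the alphabet {a, b, c}, and the length bound 4 are arbitrary choices made here; the check compares membership in sup(L) and sup(M) on all short words only, so it is an illustration rather than a proof.

```python
from itertools import product


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def minimal_elements(L):
    """Words of L that have no other word of L as a subword."""
    return {x for x in L if not any(y != x and is_subword(y, x) for y in L)}


def in_sup(x, L):
    """Membership test for sup(L)."""
    return any(is_subword(y, x) for y in L)


L = {"ab", "aab", "abb", "ba", "bab", "c", "cc"}
M = minimal_elements(L)
print(sorted(M))

# Compare sup(L) and sup(M) on every word of length at most 4.
short_words = ["".join(t) for n in range(5) for t in product("abc", repeat=n)]
same_sup = all(in_sup(w, L) == in_sup(w, M) for w in short_words)
print(same_sup)
```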
The second lemma
Lemma. Let L ⊆ Σ∗. Then there exists a finite subset G ⊆ Σ∗ such that sub(L) = Σ∗ − sup(G).
Proof. Let T = Σ∗ − sub(L). I claim that T = sup(T).
Clearly T ⊆ sup(T).
Suppose sup(T) ⊈ T.
Then there exists an x ∈ sup(T) with x ∉ T.
Since T = Σ∗ − sub(L), this means x ∈ sub(L).
Since x ∈ sup(T), there exists y ∈ T such that y ⊳ x.
Hence, since sub(sub(L)) = sub(L), we have y ∈ sub(L).
17 / 34
The second lemma
But then y ∉ T, a contradiction.
Finally, by the previous lemma there exists a finite subset G ⊆ T such that sup(G) = sup(T).
Then sup(G) = sup(T) = T = Σ∗ − sub(L), and so sub(L) = Σ∗ − sup(G).
18 / 34
The main result
Theorem. Let L be a language (not necessarily regular). Then both sub(L) and sup(L) are regular.
Proof.
Clearly sup(L) is regular if L = {w} for some single word w.
This is because if w = a_1 a_2 ··· a_k, then
sup({w}) = Σ∗ a_1 Σ∗ a_2 Σ∗ ··· Σ∗ a_k Σ∗.
Similarly, for any finite language F ⊆ Σ∗, sup(F) is regular because
sup(F) = ⋃_{w ∈ F} sup({w}).
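The expression for sup(F) translates directly into a pattern for Python's standard `re` module. This is a minimal sketch: the function name `sup_regex` is introduced here, and the alphabet {0, 1, 2} and the set F are arbitrary choices for the demonstration. The pattern is cross-checked against a direct subword test on all words of length at most 5.

```python
import re
from itertools import product


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def sup_regex(F, alphabet):
    """Pattern for sup(F), F finite and nonempty: union of S* a_1 S* ... a_k S*."""
    sigma_star = "[" + re.escape("".join(sorted(alphabet))) + "]*"
    branches = [
        sigma_star + "".join(re.escape(a) + sigma_star for a in w)
        for w in sorted(F)
    ]
    return "(?:" + "|".join(branches) + ")"


F = {"2", "10", "111"}
pattern = re.compile(sup_regex(F, "012"))
print(pattern.pattern)

# Cross-check the pattern against the definition of sup(F) on short words.
short_words = ["".join(t) for n in range(6) for t in product("012", repeat=n)]
agrees = all(
    (pattern.fullmatch(w) is not None) == any(is_subword(f, w) for f in F)
    for w in short_words
)
print(agrees)
```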
19 / 34
The main result
Now let L ⊆ Σ∗, and let M and G be defined as in the two preceding lemmas.
Then sup(L) = sup(M), and so sup(L) is regular, since M is finite.
Also, sub(L) = Σ∗ − sup(G), and so sub(L) is regular, since G is finite and the regular languages are closed under complement. That completes the proof.
20 / 34
Representations of integers
We’ll represent integers in base k using the digits 0, 1, . . . , k − 1.
We’ll write (n)_k for the word giving the canonical representation of the integer n in base k (with no leading zeroes).
We’ll write [w]_k for the integer represented by the word w in base k (where w can have leading zeroes).
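The two notations correspond to a pair of conversion functions. In this sketch the names `to_base` and `from_base` are introduced here, digits above 9 are written with lowercase letters (an assumption, not something the slides specify), and `to_base` is restricted to n ≥ 1 so that no convention for representing 0 has to be chosen.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base(n: int, k: int) -> str:
    """(n)_k: canonical base-k representation of n >= 1, no leading zeroes."""
    if n < 1 or not 2 <= k <= len(DIGITS):
        raise ValueError("need n >= 1 and 2 <= k <= 36")
    digits = []
    while n > 0:
        n, r = divmod(n, k)
        digits.append(DIGITS[r])
    return "".join(reversed(digits))


def from_base(w: str, k: int) -> int:
    """[w]_k: integer represented by w in base k; leading zeroes allowed."""
    value = 0
    for ch in w:
        d = DIGITS.index(ch)
        if d >= k:
            raise ValueError(f"digit {ch!r} is not valid in base {k}")
        value = value * k + d
    return value


print(to_base(13, 3))       # 111
print(from_base("0111", 3)) # 13
```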
21 / 34
Minimal elements for the prime numbers
Consider the language
P_3 = {2, 10, 12, 21, 102, 111, 122, 201, 212, 1002, ...},
which represents the primes in base 3.
I claim that the minimal elements of P_3 are {2, 10, 111}.
Clearly each of these is in P_3 and no proper subword is in P_3.
Now let x ∈ P_3.
If 2 ⋫ x, then x ∈ {0, 1}∗.
If further 10 ⋫ x, then x ∈ 0∗1∗.
22 / 34
An example involving prime numbers
Since x represents a number, x cannot have leading zeroes.
It follows that x ∈ 1∗.
But the numbers represented by the strings 1 and 11 are not prime.
However, 111 represents the number 13, which is prime.
It now follows that
sup(P_3) = Σ∗2Σ∗ ∪ Σ∗1Σ∗0Σ∗ ∪ Σ∗1Σ∗1Σ∗1Σ∗
where Σ = {0, 1, 2}.
On the other hand, sub(P_3) = Σ∗. This follows from Dirichlet’s theorem on primes in arithmetic progressions, which states that every arithmetic progression of the form (a + nb)_{n≥0} with gcd(a, b) = 1 contains infinitely many primes.
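The claim about the minimal elements in base 3 can be spot-checked numerically. The sketch below uses trial division and an arbitrary bound of 3^8 chosen here; it confirms that every prime below the bound, written in base 3, has 2, 10, or 111 as a subword. It is a finite check only, and the helper names are illustrative.

```python
def is_prime(n):
    """Primality by trial division; adequate for the small bound used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def to_base(n, k):
    """Canonical base-k representation of n >= 1 (for k <= 10)."""
    digits = []
    while n > 0:
        n, r = divmod(n, k)
        digits.append(str(r))
    return "".join(reversed(digits))


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


M = ["2", "10", "111"]
primes_base3 = [to_base(p, 3) for p in range(2, 3 ** 8) if is_prime(p)]
print(primes_base3[:10])

all_covered = all(any(is_subword(m, w) for m in M) for w in primes_base3)
print(all_covered)
```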
23 / 34
The base-10 case
THE PRIME GAME
Ask a friend to write down a prime number. Bet them that you can always strike out 0 or more digits to get a prime on this card.
2, 3, 5, 7, 11, 19, 41, 61, 89, 409, 449, 499, 881, 991,
6469, 6949, 9001, 9049, 9649, 9949, 60649,
666649, 946669, 60000049, 66000049, 66600049
© 2007
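The card can be checked by machine, at least partially. The following sketch verifies that all 26 entries are prime, that no entry is a subword of another, and that every prime below 10^5 contains some entry as a subword. The bound 10^5 is an arbitrary choice made here, so this illustrates the bet without proving it.

```python
from itertools import combinations

CARD = [2, 3, 5, 7, 11, 19, 41, 61, 89, 409, 449, 499, 881, 991,
        6469, 6949, 9001, 9049, 9649, 9949, 60649,
        666649, 946669, 60000049, 66000049, 66600049]


def is_prime(n):
    """Primality by trial division; fine for numbers of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


card_words = [str(p) for p in CARD]
all_prime = all(is_prime(p) for p in CARD)
antichain = all(
    not is_subword(x, y) and not is_subword(y, x)
    for x, y in combinations(card_words, 2)
)
covers = all(
    any(is_subword(c, str(p)) for c in card_words)
    for p in range(2, 10 ** 5) if is_prime(p)
)
print(all_prime, antichain, covers)
```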
24 / 34
Minimal elements for the primes in other bases
A computationally difficult problem! No algorithm is known that is guaranteed to halt.
There is a “sort-of” algorithm:
(1) M := ∅
(2) while (L ≠ ∅) do
(3)     choose x, a shortest string in L
(4)     M := M ∪ {x}
(5)     L := L − sup({x})
It’s hard to carry out step (5)!
In practice we work with L′, a regular “over-approximation” of L, and we assume L′ is the union of sets of the form L_1 L_2∗ L_3, and use heuristics.
25 / 34
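On a finite truncation of L the loop in steps (1) to (5) can be run directly, because step (5) becomes a simple filter. The sketch below does this for the base-10 primes below 10^4; the truncation bound and the function name `minimal_elements_truncated` are assumptions made here, and this is not the heuristic procedure used for the real, infinite language.

```python
def is_prime(n):
    """Primality by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def minimal_elements_truncated(words):
    """Steps (1)-(5) applied to a finite language given as a collection of words."""
    remaining = sorted(set(words), key=lambda w: (len(w), w))
    M = []                                    # (1)
    while remaining:                          # (2)
        x = remaining[0]                      # (3) a shortest remaining word
        M.append(x)                           # (4)
        # (5) remove every superword of x, including x itself
        remaining = [w for w in remaining if not is_subword(x, w)]
    return M


primes = [str(n) for n in range(2, 10 ** 4) if is_prime(n)]
M = minimal_elements_truncated(primes)
print(M)
```

Because a proper subword is always strictly shorter, the output is exactly the set of minimal primes with at most four digits.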
Heuristics
We have to rule out prime numbers in various regular languages.
One method is to find an N such that N divides each of the numbers [x L∗ z]_b.
You might think you have to check [x L^i z]_b for all i.
But in fact
Lemma. Let x, z ∈ Σ_b∗, and let L ⊆ Σ_b∗. Then N divides all numbers of the form [x L∗ z]_b iff N divides [xz]_b and all numbers of the form [x L z]_b.
26 / 34
Heuristics
Corollary.
If 1 < gcd([xz]_b, [x y_1 z]_b, ..., [x y_n z]_b) < [xz]_b, then all numbers of the form [x {y_1, y_2, ..., y_n}∗ z]_b are composite.
Example: since gcd(49, 469) = 7, every number with base-10 representation of the form 46∗9 is divisible by 7 and hence composite.
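The gcd test in the corollary takes only a few lines of code. In this sketch the function name `common_divisor` is introduced here, and the final loop merely spot-checks the conclusion for the family 46∗9 with up to 199 sixes; the lemma is what guarantees it for all lengths.

```python
from functools import reduce
from math import gcd


def common_divisor(x, ys, z, base=10):
    """gcd of [xz]_b and all [x y z]_b for y in ys (words given as strings)."""
    values = [int(x + z, base)] + [int(x + y + z, base) for y in ys]
    return reduce(gcd, values)


g = common_divisor("4", ["6"], "9")
print(g)  # 7

# Corollary hypothesis: 1 < g < [xz]_b.
hypothesis_holds = 1 < g < int("49")

# Spot-check: every number 4 6^n 9 with n < 200 is divisible by g.
checked = all(int("4" + "6" * n + "9") % g == 0 for n in range(200))
print(hypothesis_holds, checked)
```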
27 / 34
Other heuristics
Difference-of-squares factorization:
An example: since
[4 4^n 1]_16 = (4^{n+1} · 8 + 7)(4^{n+1} · 8 − 7) / 15,
it follows that all numbers of the form [4 4^n 1]_16 are composite.
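The identity is easy to confirm numerically for small n. The sketch below reads the word as a digit 4, followed by n further 4s, followed by 1, in base 16; that reading is an interpretation made here, supported by the fact that it gives 65 = 39 · 25 / 15 for n = 0. It also exhibits an explicit nontrivial divisor in each case.

```python
from math import gcd


def value(n):
    """[4 4^n 1]_16: digit 4, then n more 4s, then 1, read in base 16."""
    return int("4" + "4" * n + "1", 16)


def square_factors(n):
    """The two factors (a + 7) and (a - 7), where a = 8 * 4^(n+1)."""
    a = 8 * 4 ** (n + 1)
    return a + 7, a - 7


identity_holds = True
has_proper_divisor = True
for n in range(50):
    p, q = square_factors(n)
    identity_holds &= (15 * value(n) == p * q)
    # Removing the part of p shared with 15 leaves a divisor of value(n).
    d = p // gcd(p, 15)
    has_proper_divisor &= (1 < d < value(n) and value(n) % d == 0)

print(value(0), value(1))
print(identity_holds, has_proper_divisor)
```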
28 / 34
Our results
We were able to find the minimal elements for the primes in all bases up to 16, and some additional bases up to 30.
Sometimes we had to do primality tests on very large numbers (with thousands of digits).
For primes of the form 4n + 3 in base 10, the set of minimal elements consists of 13 elements, with the largest having 19153 decimal digits! This was proved prime by Francois Morain.
29 / 34
Minimal elements for the composite numbers
By contrast, for computing the minimal elements for the composite numbers, there is an algorithm (Devillers).
Write S_b := { (n)_b : n ≥ 4 is composite }.
Theorem. Every minimal element of S_b is of length at most b + 2.
Proof.
Consider any word w of S_b of length ≥ b + 3.
Since there are only b distinct digits, some digit d is repeated at least twice, so that dd ⊳ w.
If d > 1, the number [dd]_b is composite, as it is divisible by [11]_b but not equal to it.
30 / 34
Minimal elements for the composite numbers
If d = 0, then some nonzero digit c precedes it in w, so c00 ⊳ w and [c00]_b = c · b^2 is composite.
Finally, if no digit other than 1 is repeated, then the other b − 1 digits fill at most b − 1 of the at least b + 3 positions, so 1 occurs at least four times and 1111 ⊳ w. Since [1111]_b = [11]_b · [101]_b, this number is composite.
31 / 34
Minimal elements for the composite numbers, base 10
They are:
{4, 6, 8, 9, 10, 12, 15, 20, 21, 22, 25, 27, 30, 32, 33, 35, 50,
51, 52, 55, 57, 70, 72, 75, 77, 111, 117, 171, 371, 711, 713, 731}
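This list can be reproduced with a short search. The sketch below runs a shortest-first greedy pass over the composites below 10^4; that bound is a convenience chosen here and is smaller than the b + 2 = 12 digits the theorem allows, so the run reproduces the list rather than independently proving it complete.

```python
def is_composite(n):
    """True if n >= 4 has a divisor d with 1 < d < n."""
    if n < 4:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return True
        d += 1
    return False


def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def minimal_elements_truncated(words):
    """Greedy pass, shortest words first, over a finite language."""
    remaining = sorted(set(words), key=lambda w: (len(w), w))
    M = []
    while remaining:
        x = remaining[0]
        M.append(x)
        remaining = [w for w in remaining if not is_subword(x, w)]
    return M


composites = [str(n) for n in range(4, 10 ** 4) if is_composite(n)]
M = minimal_elements_truncated(composites)
print(len(M))
print(M)
```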
32 / 34
Some open problems
1. What are the minimal elements for the powers of 2, expressedin base 10? Probably
{1, 2, 4, 8, 65536}
but nobody knows how to prove this!
2. Are there infinitely many primes whose base-10 representation consists of all 1’s? The only known “repunit” primes are of the form (10^p − 1)/9 for p = 2, 19, 23, 317, 1031. It seems likely that those for p = 49081, 86453, 109297, 270343 are also prime, but this has not been rigorously proven.
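The conjecture in problem 1 is easy to test on a bounded range, even though no proof is known. The sketch below computes the minimal elements among the powers 2^0 through 2^500 in base 10; the exponent bound 500 is an arbitrary choice made here, and agreement on this range is evidence only.

```python
def is_subword(x, y):
    """True if x can be obtained from y by deleting 0 or more letters."""
    remaining = iter(y)
    return all(ch in remaining for ch in x)


def minimal_elements_truncated(words):
    """Greedy pass, shortest words first, over a finite language."""
    remaining = sorted(set(words), key=lambda w: (len(w), w))
    M = []
    while remaining:
        x = remaining[0]
        M.append(x)
        remaining = [w for w in remaining if not is_subword(x, w)]
    return M


MAX_EXPONENT = 500  # assumption: an arbitrary cutoff for the experiment
powers = [str(2 ** e) for e in range(MAX_EXPONENT + 1)]
M = minimal_elements_truncated(powers)
print(M)
```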
33 / 34
Some open problems
3. Is the following problem decidable? Given a finite automaton A accepting (say) numbers expressed in base 2, does A accept the base-2 representation of at least one prime number? By contrast, the same problem with “prime” replaced with “composite” is decidable.
4. Is the following even weaker variant decidable? Given a regular expression of the form xy∗z, does it represent the base-2 expansion of at least one prime number? If this were decidable, in principle we could determine if there exists another Fermat prime in addition to 2^{2^i} + 1 for i = 0, 1, 2, 3, 4. (Choose x = 1, y = 0, and z = 0^{16}1.)
34 / 34