
Bachelor’s Thesis in Computer Science at Stockholm University, Sweden 2012

Sudoku and the Minimum Number of Clues

Anders Hård

NADA


Bachelor’s Thesis in Computer Science (15 ECTS credits)
Single Subject Courses
Stockholm University, 2012
Supervisor at Nada: Johan Håstad
Examiner: Johan Håstad
TRITA-CSC-E 2012:069
ISRN-KTH/CSC/E--12/069--SE
ISSN-1653-5715
Department of Numerical Analysis and Computer Science
KTH CSC
SE-100 44 Stockholm, Sweden


Abstract

The Sudoku success story started in Japan in the 1990s and took the world by storm at the beginning of the century. This devious number puzzle can be viewed as a graph coloring problem and has given many a commuter a headache as they try to solve the daily Sudoku in the local newspaper. How can we use mathematics to analyze Sudokus and what can we learn from them? This thesis discusses the progress that has been made in the field of Sudoku research, such as NP-completeness, the total number of valid grids and efficient ways of solving and creating Sudokus, and poses some questions for the future. In particular, this thesis addresses the least-number-of-clues problem:

What is the minimum number of clues required for a Sudoku puzzle to be proper, i.e. have a single, unique solution?

The thesis introduces the concept of unavoidable sets in order to analyze this question and others. We will argue that the answer is most probably 17, although a proof is yet to be found.

Sudoku och minsta antalet ledtrådar

Referat (Swedish abstract, translated)

The Sudoku success story started in Japan in the late 1990s and took the world by storm at the beginning of the century. This tricky logic game can be seen as a graph coloring problem and has given many commuters a headache as they try to solve the daily Sudoku published in the local newspaper. How can we use mathematics to analyze Sudokus and what can we learn from them? This thesis discusses the progress made in Sudoku research, such as NP-completeness, the number of valid grids and efficient methods for solving and generating Sudokus, and also poses questions for the future. The main question of this thesis was that of the minimum number of clues, i.e.:

What is the minimum number of clues required for a Sudoku to be unambiguous, i.e. have a single, unique solution?

The thesis introduces the concept of unavoidable sets as a step towards analyzing this and other questions. We argue that the answer is most likely 17, although we are still waiting for a proof.

Preface

This bachelor’s thesis has been a very long and drawn-out process for a number of reasons. With several years in the making, I hope time has affected it like it does a fine wine. I would like to thank my supervisor and examiner Johan Håstad for his incredible patience and always valuable input throughout the process.

Also a special thanks to my family for their support and especially to my lovely Marie, who has been pushing me to finish it. Your motivation and drive are an inspiration!

Anders Hård

Stockholm, February 2012

Table of contents

1 Background
2 Definitions and terminology
3 Previous results
3.1 Number of Sudokus
3.2 Collection of 17s
3.3 Solving techniques
3.3.1 Brute force
3.3.2 Stochastic search
3.3.3 Exact cover
3.3.4 Constraint Programming
3.4 Methods of creation
3.5 NP-completeness
4 Contributions
4.1 The minimum number of clues
4.1.1 Mathematical reasoning
4.1.2 Exhaustive search
4.1.3 Unavoidable sets
4.1.4 Finding small unavoidable sets
5 Interesting questions
5.1 Minimum number of clues
5.1.1 Mathematical reasoning
5.1.2 Using unavoidable sets
5.1.3 Exhaustive search
5.2 Maximum number of independent clues
5.3 Sudoku space
5.4 Real-world applications
Sources

1 Background

A Sudoku is a number puzzle that started its success story in Japan at the end of the last century. It reached the USA and Europe by 2005 and has been a regular feature in many newspapers ever since.

The rules are simple. Each number from 1 to 9 must appear once and only once in each row, column and “box”. It is essentially a Latin square of rank 9 with the extra condition of boxes. Some numbers will be given from the start and the objective is to fill in the rest using these clues and the constraints. One example of a Sudoku is given in Figure 1. Although you only need logic to solve them, mathematics gives us many tools with which we can analyze them.

A Sudoku can for example be viewed as a graph with each square in the puzzle being represented by a vertex. There would be edges between squares that are in the same row, column or box. The numbers could then be expressed as different colors with some of the vertices already colored. Solving the Sudoku would then be equivalent to extending this partial coloring to encompass all of the vertices under the constraint that no vertices that share an edge may have the same color.
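The graph view described above can be made concrete with a short sketch. It is only an illustration: the function names are our own, and the grid used as a coloring is the solution of the example puzzle as it appears in Figures 3 and 4. The sketch builds the 81 vertices, connects squares that share a row, column or box, and checks that a solved grid is a proper 9-coloring.

```python
from itertools import combinations

# Solution of the example puzzle, one string of digits per row.
SOLUTION = [
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
]

def sudoku_graph():
    """Return the vertices (row, col) and the edges of the 9x9 Sudoku graph."""
    cells = [(r, c) for r in range(9) for c in range(9)]
    edges = []
    for (r1, c1), (r2, c2) in combinations(cells, 2):
        same_box = (r1 // 3, c1 // 3) == (r2 // 3, c2 // 3)
        if r1 == r2 or c1 == c2 or same_box:
            edges.append(((r1, c1), (r2, c2)))
    return cells, edges

def is_proper_coloring(color, edges):
    """True if no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u, v in edges)

cells, edges = sudoku_graph()
color = {(r, c): int(SOLUTION[r][c]) for r, c in cells}
print(len(cells), len(edges))            # 81 810
print(is_proper_coloring(color, edges))  # True
```

Every square has 8 neighbors in its row, 8 in its column and 4 more in its box, which gives 81 · 20 / 2 = 810 edges.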


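Figure 1: Sudoku example.

5 3 . . 7 . . . .
6 . . 1 9 5 . . .
. 9 8 . . . . 6 .
8 . . . 6 . . . 3
4 . . 8 . 3 . . 1
7 . . . 2 . . . 6
. 6 . . . . 2 8 .
. . . 4 1 9 . . 5
. . . . 8 . . 7 9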

The graph coloring problem is one of Karp’s 21 NP-complete problems, and it was shown in 2002 that solving a Sudoku is also NP-complete with respect to the size of the puzzle (see chapter 3).

Other noteworthy results regarding Sudokus include the total number of grids. It was shown in 2005, using logic and brute force, to be about $6.67 \cdot 10^{21}$. This is significantly less than the total number of rank 9 Latin squares but it is still a large enough number to give us problems when trying to analyze them.

The (limited) progress that we have made to the field of Sudoku research will be presented in chapter 4. Although quite a lot of research has been done regarding Sudokus, there are still many unsolved questions. One part of this thesis is to pose these questions and try to discuss them as thoroughly as possible (chapter 5). To do that, we first need to define some terminology (chapter 2) and explain the progress that has been made so far (chapter 3).


2 Definitions and terminology

In this chapter we give some definitions of basic Sudoku terminology.

Square

A square is the empty place where a number should be written when solving a Sudoku. A Sudoku consists of 81 such squares and the objective of the game is to determine the value of each square while avoiding conflicts with the already given and previously determined squares.

Grid

A grid is a 9×9 collection of fully colored squares that obeys the rules of Sudoku. Although there exist both larger and smaller Sudokus than 9×9, we have focused our attention on this standard size as it is the most common.

Puzzle

A partially colored grid is called a puzzle. A puzzle with only one possible solution is called a proper puzzle. Most people agree that proper puzzles are the interesting ones. We investigate this distinction more closely later on.

Color

Sudokus most commonly use the digits 1–9 as the values of each square. The letters A–I or any other set of 9 symbols would do equally well, however, since the numerical values are never used in calculations. We prefer the term color since this brings us back to the close connection between Sudokus and the graph coloring problem.

Clue

The clues of the puzzle are the squares with a given color from the start. Typically 17–32 clues are given initially.

Box

A Sudoku is made up of the 9 boxes from B1 to B9, each covering 9 squares. See Figure 2.

Band/Stack

A band is a set of three horizontally adjacent boxes and a stack is a set of three vertically adjacent boxes, as shown in Figure 2.


Solution

A solution is a way to extend a partial coloring of a Sudoku to a full grid. A proper Sudoku has only one solution. It is not uncommon to find Sudokus in newspapers that have more than one solution, however, as this makes generating them a little easier for the creator.

Equivalence/Isomorphism

Two Sudokus are equivalent or isomorphic if one can be reached from the other through any number of the following operations:

• Relabeling the colors.
• Permuting the bands.
• Permuting the rows within a band.
• Permuting the stacks.
• Permuting the columns within a stack.
• Reflection along the diagonal axes.
• Rotation by 90/180/270 degrees.
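The operations in this list can be sketched in a few lines. The helper names below are our own and the grid is the solution of the example puzzle; the point is only to show that each operation maps a valid grid to another valid grid. Column and stack operations are obtained by transposing, applying the row or band operation, and transposing back.

```python
GRID = [[int(ch) for ch in row] for row in (
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
)]

def is_valid_grid(grid):
    """True if every row, column and box contains each of 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [{grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(unit == digits for unit in rows + cols + boxes)

def relabel(grid, mapping):
    return [[mapping[v] for v in row] for row in grid]

def permute_bands(grid, order):
    return [grid[3 * band + i][:] for band in order for i in range(3)]

def permute_rows_in_band(grid, band, order):
    rows = [row[:] for row in grid]
    rows[3 * band:3 * band + 3] = [grid[3 * band + i][:] for i in order]
    return rows

def transpose(grid):
    """Reflection along the main diagonal."""
    return [list(col) for col in zip(*grid)]

def rotate90(grid):
    """Clockwise rotation by 90 degrees."""
    return [list(row) for row in zip(*grid[::-1])]

swap_1_2 = {v: v for v in range(1, 10)}
swap_1_2[1], swap_1_2[2] = 2, 1

variants = [
    relabel(GRID, swap_1_2),
    permute_bands(GRID, (2, 0, 1)),
    permute_rows_in_band(GRID, 0, (1, 0, 2)),
    transpose(permute_bands(transpose(GRID), (1, 0, 2))),  # permute stacks
    transpose(GRID),
    rotate90(GRID),
]
print(all(is_valid_grid(g) for g in variants))  # True
```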


Figure 2: Bands and stacks.

         Stack 1   Stack 2   Stack 3
Band 1   B1        B2        B3
Band 2   B4        B5        B6
Band 3   B7        B8        B9

Independent clue

An independent clue contains information that cannot be derived from the other clues. Thus, an independent clue cannot be left out without increasing the number of solutions.

Minimal Sudokus

A minimal Sudoku contains only independent clues. All clues are necessary to make the puzzle proper. If any clue is erased, the puzzle will have two or more solutions.

Unavoidable set (UA)

We define an unavoidable set to be a subset of the squares of a grid that must be represented in the set of clues in order for the puzzle to be proper; if none of its squares is given as a clue, the puzzle has more than one solution. We give two basic examples. Take a look at Figure 3, which is the solution to Figure 1 with a few of the squares missing. In both rows with empty squares, a “6” and a “7” are missing. Either order (67/76 or 76/67) results in a valid grid. Hence, if none of these four squares were given from the start, there would be at least two solutions to the puzzle. These 4 squares constitute an unavoidable set.


Figure 3: An unavoidable set of size 4.

5 3 4 . . 8 9 1 2
6 7 2 1 9 5 3 4 8
1 9 8 3 4 2 5 6 7
8 5 9 . . 1 4 2 3
4 2 6 8 5 3 7 9 1
7 1 3 9 2 4 8 5 6
9 6 1 5 3 7 2 8 4
2 8 7 4 1 9 6 3 5
3 4 5 2 8 6 1 7 9

In Figure 4 we are missing 6 squares, three in each of two rows. Either row could be completed in two ways, with the other row to match. The two solutions are 179/791 and 791/179.

Unavoidable sets can be from 4 squares up to all of the squares and contain anywhere from 2 to all 9 colors. There are typically many thousands of unavoidable sets of various sizes in any Sudoku grid and they are central to this thesis. We study these in detail in chapter 4.
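The two examples of Figures 3 and 4 can be checked mechanically. In the sketch below (helper names are our own) the contents of the two affected rows are exchanged column by column inside the unavoidable set; the result is a second valid grid that agrees with the original everywhere outside the set, which is exactly why at least one square of the set has to be a clue.

```python
GRID = [[int(ch) for ch in row] for row in (
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
)]

def is_valid_grid(grid):
    """True if every row, column and box contains each of 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [{grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(unit == digits for unit in rows + cols + boxes)

def exchange(grid, r1, r2, cols):
    """Copy of grid with rows r1 and r2 exchanged in the given columns (0-based)."""
    alt = [row[:] for row in grid]
    for c in cols:
        alt[r1][c], alt[r2][c] = grid[r2][c], grid[r1][c]
    return alt

ua4_alt = exchange(GRID, 0, 3, (3, 4))     # Figure 3: rows 1 and 4, columns 4-5
ua6_alt = exchange(GRID, 4, 8, (6, 7, 8))  # Figure 4: rows 5 and 9, columns 7-9

print(is_valid_grid(ua4_alt), ua4_alt != GRID)  # True True
print(is_valid_grid(ua6_alt), ua6_alt != GRID)  # True True
```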


Figure 4: An unavoidable set of size 6.

5 3 4 6 7 8 9 1 2
6 7 2 1 9 5 3 4 8
1 9 8 3 4 2 5 6 7
8 5 9 7 6 1 4 2 3
4 2 6 8 5 3 . . .
7 1 3 9 2 4 8 5 6
9 6 1 5 3 7 2 8 4
2 8 7 4 1 9 6 3 5
3 4 5 2 8 6 . . .


3 Previous results

This chapter summarizes what are, in our opinion, the most important mathematical results reached so far in the field of Sudokus. A lot of work has also been done on solving Sudokus using logic, but that is not particularly relevant to this thesis and has therefore been left out.

3.1 Number of Sudokus

Early Sudoku analysts naturally asked themselves how many possible Sudoku grids there are. The number is obviously smaller than the number of Latin squares of rank 9 since there are extra constraints. It is known that the number of 9×9 Latin squares is approximately $5.525 \cdot 10^{27}$, but how many of these are actually Sudokus?

The first known calculation was posted in 2003 by the pseudonym “QSCGZ” on an Internet discussion board[1]. Felgenhauer/Jarvis later posted a detailed calculation[2], reaching the same total of

$$9! \cdot 72^2 \cdot 2^7 \cdot 27\,704\,267\,971 = 6\,670\,903\,752\,021\,072\,936\,960 \approx 6.67 \cdot 10^{21}.$$

This number was found by analyzing the permutations of the first band, then brute-forcing the rest of the board. These calculations have later been confirmed by several other, faster methods.
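The arithmetic of the product quoted above is easy to re-check with exact integer arithmetic. The snippet only multiplies the published factors; it does not reproduce the enumeration itself.

```python
import math

# Factors as quoted from the Felgenhauer/Jarvis calculation.
total = math.factorial(9) * 72**2 * 2**7 * 27_704_267_971
print(total)           # 6670903752021072936960
print(f"{total:.3e}")  # 6.671e+21
```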

Russell/Jarvis were then the first to calculate how many of these were essentially different, considering all the possible symmetries given in the last chapter (chapter 2: Equivalence). They first counted all the possible symmetries to divide the Sudokus into isomorphic groups before using Burnside’s Lemma from group theory to calculate that the number of essentially different Sudokus is 5 472 730 538[3].

3.2 Collection of 17s

In order to find the minimum number of clues needed to have a proper puzzle (see chapter 4), Gordon Royle has compiled an impressive collection of minimal proper puzzles with only 17 clues[4]. These are interesting because the general consensus is that 17 is the minimum number of clues needed. In total, 49 151 distinct 17-clue Sudokus have been found (as of 25th of October 2010). It is believed that this list is close to complete, as very few new puzzles are being found.


However, as randomized searches have proved to be inefficient, most have been found by removing and adding clues to an existing low-clue puzzle. This leaves open the possibility of other “regions” in Sudoku space, yet to be explored, that contain more 17-clue Sudokus. We discuss this more in chapter 4.

3.3 Solving techniques

Solving a Sudoku by hand can be anything from a trivial 5 minute job to something that potentially can cause lack of sleep. It is possible to create Sudokus that human logic is incapable of solving, forcing trial and error and backtracking. A cleverly written computer program, on the other hand, can solve any Sudoku in milliseconds. There are several different algorithms that can be used and here we do a quick summary of the most common ones.

3.3.1 Brute force

The brute force algorithm is the most obvious and the simplest to program. The algorithm can for example start in the first square and, if the square is free and no conflicts would arise, write a “1”. It then steps to the next square and tries to write a “1” – which creates a conflict since there is already a “1” in that row. So it tries to write a “2” and moves on. If it’s not possible to write any number in a square, the algorithm backtracks and increases the previous number by one before continuing. The algorithm will always find a solution, even to the most difficult puzzles. The brute force algorithm may be easy to program but can be very slow since very little available information is used and it probably needs to backtrack quite often. Figure 5 shows a puzzle that is close to the worst case scenario for this simplest form of the brute force algorithm. No clues are given in the first row, which solves to 987654321, forcing a huge amount of time spent before even the first row is correct. While most typical Sudokus are solved in a few seconds with this brute force method, this particular Sudoku takes about 30-45 minutes to solve using a 3 GHz processor. More advanced brute force algorithms would for example first rotate this Sudoku to find the best starting point, before solving.
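A minimal sketch of this simplest form of brute force is given below. It is not the program used for the timings above; the function names are our own, and the clue positions of the example puzzle of Figure 1 are an assumption read off from its solution in Figures 3 and 4. Empty squares are written as dots and stored as 0.

```python
PUZZLE = [[0 if ch == "." else int(ch) for ch in row] for row in (
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
)]

def conflicts(grid, r, c, v):
    """True if color v already occurs in the row, column or box of square (r, c)."""
    if v in grid[r]:
        return True
    if any(grid[i][c] == v for i in range(9)):
        return True
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return any(grid[i][j] == v for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid, pos=0):
    """Fill the empty squares in reading order, backtracking on dead ends."""
    if pos == 81:
        return True
    r, c = divmod(pos, 9)
    if grid[r][c] != 0:          # clue or previously filled square
        return solve(grid, pos + 1)
    for v in range(1, 10):       # try 1, 2, ..., 9 in turn
        if not conflicts(grid, r, c, v):
            grid[r][c] = v
            if solve(grid, pos + 1):
                return True
            grid[r][c] = 0       # undo and try the next color
    return False                 # forces the caller to backtrack

grid = [row[:] for row in PUZZLE]
print(solve(grid))
print("".join(map(str, grid[0])))  # 534678912
```

Because colors are always tried in increasing order, a puzzle whose first row solves to 987654321 forces this version to exhaust almost all alternatives before the first row is right, which is the effect described above.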


3.3.2 Stochastic search

It is also possible to solve a Sudoku using a stochastic, i.e. random-based, search. You can for example start by randomly assigning numbers to the blank squares of a grid and then calculate the number of errors. Now start shuffling these inserted numbers around the grid while decreasing the number of errors until there are no more conflicts which means you have found the solution. This algorithm is faster than using brute force but otherwise shares the same benefits and drawbacks. It does not use much of the information given which leaves a big search space.
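One possible shape of such a search is sketched below, under assumptions of our own: the blanks of each box are filled with the colors missing from that box, the error count is the number of colors missing from rows and columns, and a step swaps two non-clue squares inside one box and is kept only if the error count does not increase. This is plain hill climbing and may stall in a local minimum; practical variants also accept some worsening moves. The clue positions of the example puzzle are read off from its solution.

```python
import random

PUZZLE = [[0 if ch == "." else int(ch) for ch in row] for row in (
    "53..7....", "6..195...", ".98....6.",
    "8...6...3", "4..8.3..1", "7...2...6",
    ".6....28.", "...419..5", "....8..79",
)]

def cost(grid):
    """Number of colors missing from rows and columns (0 means no conflicts)."""
    rows = sum(9 - len(set(row)) for row in grid)
    cols = sum(9 - len(set(col)) for col in zip(*grid))
    return rows + cols

def random_fill(puzzle, rng):
    """Fill the blanks of every box with the colors that box is missing."""
    grid = [row[:] for row in puzzle]
    for br in (0, 3, 6):
        for bc in (0, 3, 6):
            cells = [(r, c) for r in range(br, br + 3) for c in range(bc, bc + 3)]
            present = {grid[r][c] for r, c in cells}
            missing = [v for v in range(1, 10) if v not in present]
            rng.shuffle(missing)
            for r, c in cells:
                if grid[r][c] == 0:
                    grid[r][c] = missing.pop()
    return grid

def improve(puzzle, grid, steps, rng):
    """Swap non-clue squares within a box, keeping swaps that do not add errors."""
    boxes = []
    for br in (0, 3, 6):
        for bc in (0, 3, 6):
            free = [(r, c) for r in range(br, br + 3) for c in range(bc, bc + 3)
                    if puzzle[r][c] == 0]
            if len(free) >= 2:
                boxes.append(free)
    current = cost(grid)
    for _ in range(steps):
        (r1, c1), (r2, c2) = rng.sample(rng.choice(boxes), 2)
        grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
        new = cost(grid)
        if new <= current:
            current = new
        else:  # undo a swap that made things worse
            grid[r1][c1], grid[r2][c2] = grid[r2][c2], grid[r1][c1]
    return current

rng = random.Random(0)
grid = random_fill(PUZZLE, rng)
print(cost(grid), improve(PUZZLE, grid, 2000, rng))  # errors before and after
```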

3.3.3 Exact cover

A Sudoku can be viewed as an instance of the exact cover problem. Exact cover is one of Karp’s original 21 NP-complete problems and allows an elegant and fast solution using a backtracking algorithm. In brief, a Sudoku has 324 constraints (4 constraints per square: one for the square itself and one each for its row, column and box) and 729 possibilities (every square can be colored in 9 ways). Each constraint can be satisfied by exactly 9 of the possibilities. The problem can thus be formulated as choosing 81 of the possibilities so that every constraint is covered exactly once. Some of these possibilities will already have been chosen in the form of clues, and every chosen possibility rules out many of the others.


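Figure 5: A near worst case scenario for brute-forcing. Clues, row by row from the top (the first row is empty; the column positions are not preserved): 3 8 5 | 1 2 | 5 7 | 4 1 | 9 | 5 7 3 | 2 1 | 4 9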

Much more of the board information is used to reduce the search space, which makes this a much faster method than the aforementioned two.
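The counts in the exact cover formulation can be verified with a small sketch. The labels used for the constraints ("cell", "row", "col", "box") are our own naming; the backtracking search itself is not shown, only the construction of the possibilities and the check that a solved grid selects 81 of them covering every constraint exactly once.

```python
from collections import Counter

SOLUTION = [
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
]

def exact_cover_rows():
    """Map each possibility (row, col, color) to the four constraints it satisfies."""
    rows = {}
    for r in range(9):
        for c in range(9):
            b = 3 * (r // 3) + c // 3
            for v in range(1, 10):
                rows[(r, c, v)] = (
                    ("cell", r, c),  # square (r, c) holds exactly one color
                    ("row", r, v),   # color v occurs once in row r
                    ("col", c, v),   # color v occurs once in column c
                    ("box", b, v),   # color v occurs once in box b
                )
    return rows

rows = exact_cover_rows()
per_constraint = Counter(k for ks in rows.values() for k in ks)
print(len(rows), len(per_constraint))  # 729 324
print(set(per_constraint.values()))    # {9}

# A solved grid selects 81 possibilities that cover each constraint once.
chosen = [(r, c, int(SOLUTION[r][c])) for r in range(9) for c in range(9)]
covered = Counter(k for p in chosen for k in rows[p])
print(len(covered), set(covered.values()))  # 324 {1}
```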

3.3.4 Constraint Programming

Constraint programming is well suited to solving Sudokus since all the information given by the board can be used to narrow the search space. There are many different reasoning algorithms that can be applied to Sudokus, as detailed in a paper by Helmut Simonis[5]. Almost all Sudokus can be solved without backtracking when using a strong reasoning algorithm, which makes this approach even faster than an exact cover implementation.

3.4 Methods of creation

Creating a Sudoku puzzle is in many ways a more interesting problem than actually solving it. Creating a random Sudoku is not as easy as one might think, if by random we mean that the algorithm generates each of the approximately $10^{37}$ possible minimal Sudoku puzzles, as calculated by Berthier[6], with equal probability. The fastest methods of generating Sudokus are biased, unable to generate all of the possible puzzles, or both.

An obvious method, called top-down, is to start with a random grid and just remove one randomly selected clue, then try to solve the puzzle. If there are several solutions, backtrack to the previous state and try to remove another clue. Otherwise, keep going until no more clues can be removed and you’ll have a minimal puzzle. This method may seem random but is in fact biased towards puzzles with fewer clues since it always tries to achieve a puzzle with as few clues as possible. A better way is to, as soon as you reach a point where you have multiple solutions, discard the puzzle and start over. This will result in only very few runs actually producing a puzzle (about 1 in 250 000) but will improve the bias. Unfortunately, it does not remove it completely.

There are ways to generate a truly random puzzle, but they are all problematic. For example, one can start by generating a random complete grid P. Then, for each cell in P, delete its value with a probability of 1 in 2, thus obtaining a puzzle Q. If Q is minimal you’re done, otherwise set P=Q and start over. This is an unbiased way of generating puzzles, but the chances of actually reaching a valid puzzle are extremely slim, so it is not a practical method.


3.5 NP-completeness

The task of solving a Sudoku was proven to be NP-complete with regard to the size of the puzzle in 2002 by Yato/Seta[7]. They built on a previous result by Colbourn, who had shown NP-completeness of partial Latin square completion in 1983. Yato/Seta were able to construct a polynomial-time reduction from the partial Latin square completion problem to Sudoku, thus showing that solving a Sudoku puzzle is also NP-complete.


4 Contributions

The main objective of this thesis is to investigate the minimum number of clues problem. Even though we do not resolve this problem, in this chapter we discuss some approaches that might lead to insight into it.

4.1 The minimum number of clues

The problem of minimum number of clues can be formulated as follows:

What is the minimum number of clues required for a Sudoku puzzle to be proper, i.e. have a single, unique solution?

As stated in the previous chapter, the believed answer is 17. About 50 000 proper 17-clue puzzles have been found by Sudoku programmers world-wide but not a single 16-clue puzzle. Because very few new 17-clue puzzles are being found, and because 65 different 17-clue puzzles could be created from every 16-clue puzzle by just adding a clue to any empty square, the general consensus is that no 16-clue puzzle exists. The most fruitful grid found can generate 29 different 17-clue puzzles so the existence of a 16-clue puzzle does indeed seem extremely improbable. Yet, it has not been proven.

While our very optimistic hope was to prove the nonexistence of a 16-clue puzzle, we quickly realized that it was beyond the scope of what was possible within a bachelor’s thesis. Thus, we instead tried to work from the bottom up, seeing what kind of results we could reach on puzzles with even fewer clues.

4.1.1 Mathematical reasoning

It is easy to prove that no proper 7-clue puzzle can exist. All but at most one color must be represented among the clues; otherwise two unrepresented colors could be swapped throughout the solution and we would reach another valid Sudoku grid. We also know that at least two of the three bands and two of the three stacks must contain a clue, and that at least two of the three rows (columns) in every band (stack) must contain a clue. This all follows from our definition of equivalence. Unfortunately, there is not much more we can say.
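The color argument can be illustrated with a small check (helper names are our own). Starting from the solution of the example puzzle, every square colored 6 or 7 is erased, which leaves a puzzle whose clues use only 7 colors. Both the original grid and the grid with 6 and 7 swapped are valid and agree with all remaining clues, so the puzzle cannot be proper, however many clues it has.

```python
GRID = [[int(ch) for ch in row] for row in (
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
)]

def is_valid_grid(grid):
    """True if every row, column and box contains each of 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [{grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(unit == digits for unit in rows + cols + boxes)

def swap_colors(grid, a, b):
    """Copy of grid with colors a and b exchanged everywhere."""
    m = {a: b, b: a}
    return [[m.get(v, v) for v in row] for row in grid]

def extends(puzzle, grid):
    """True if grid agrees with every clue of puzzle (0 marks an empty square)."""
    return all(p in (0, g) for prow, grow in zip(puzzle, grid)
               for p, g in zip(prow, grow))

puzzle = [[0 if v in (6, 7) else v for v in row] for row in GRID]
other = swap_colors(GRID, 6, 7)

print(sum(v != 0 for row in puzzle for v in row))       # 63 clues, 7 colors
print(is_valid_grid(other), other != GRID)              # True True
print(extends(puzzle, GRID), extends(puzzle, other))    # True True
```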

When analyzing Sudoku puzzles we have to remember that there are Sudokus that are unsolvable with human logic while still having only one solution. Thus, there is no one set of logical rules that is enough to solve every puzzle, or to prove that a puzzle has no solution. If we had such a complete set of tools, deducing whether a set of clues makes the puzzle proper would be an easy task, and that would in turn make it possible to determine which conditions are insufficient. Unfortunately, we are not that lucky.

4.1.2 Exhaustive search

The problem could in theory be solved quite easily by a brute-force algorithm that tries to solve every possible placement of 16 clues. However, the search space of such an algorithm is simply huge. A naive calculation shows that there are approximately

$$\binom{81}{16} \cdot 9^{16} \approx 3.36 \cdot 10^{16} \cdot 1.85 \cdot 10^{15} \approx 6.2 \cdot 10^{31}$$

ways of placing 16 clues in an empty grid. Even with the fastest Sudoku solver checking each candidate, this is far too many to test one by one in a reasonable amount of time.

Luckily, this search space can be narrowed down significantly. Remember that every Sudoku grid has plenty of isomorphic grids that can be reached through relabelings, reflections, rotations etc. Two isomorphic grids have the same minimum number of clues, since the same transformation can be applied to the puzzle as to the grid. Thus, we only need to consider clue placements that are not isomorphic to each other.

We also know that at least 8 colors must be represented to not force multiple solutions, and we know many clue placements that are invalid, like ones that do not cover at least two of the three bands. This will decrease the search space by a large factor, but additional ideas are still needed for the problem to be solved by brute force in a reasonable amount of time, even using distributed computing. There was an Austrian project[8] that started to do exhaustive searches for 1, 2, 3, … clues in an empty grid, but it only seems to have reached 12. The project has been inactive and the web page closed since October 2009.

So, what other ways could there be to solve this problem? We turn our focus towards something that seems like a much more promising concept: unavoidable sets.

4.1.3 Unavoidable sets

In unavoidable sets we have another tool to use when analyzing Sudokus. The most important implication is that any set of clues that does not hit all the UA’s of a grid leaves more than one solution; hence the clues of any proper puzzle hit all the unavoidable sets of its grid.

In other words, we can find out the minimum number of clues required for a given grid by finding all the unavoidable sets and finding the smallest hitting set covering all these. However, even finding all the unavoidable sets in one given grid is a complex procedure that requires quite some computer work – from a few minutes up to 24 hours using the program called “gridchecker[9]” written by Mladen Dobrichev. Enumerating all the possible unavoidable sets in all possible Sudoku grids is hence not possible given today’s ideas and hardware.
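The hitting set step can be sketched on a toy family of sets. The function below is our own brute-force illustration and has nothing to do with how gridchecker works internally; it tries all subsets of increasing size, which is only feasible for very small families. As input we use the two unavoidable sets of Figures 3 and 4, written as 0-based (row, column) pairs.

```python
from itertools import combinations

def smallest_hitting_set(sets):
    """Smallest set of squares that contains at least one square of every given set."""
    universe = sorted(set().union(*sets))
    for k in range(len(universe) + 1):
        for candidate in combinations(universe, k):
            chosen = set(candidate)
            if all(chosen & s for s in sets):
                return chosen
    return None  # only reached if one of the given sets is empty

ua4 = {(0, 3), (0, 4), (3, 3), (3, 4)}                  # Figure 3
ua6 = {(4, 6), (4, 7), (4, 8), (8, 6), (8, 7), (8, 8)}  # Figure 4

clues = smallest_hitting_set([ua4, ua6])
print(len(clues))  # 2
```

The two sets are disjoint, so two clues are needed for them alone. For a real grid the family contains every unavoidable set, and the size of the smallest hitting set is the minimum number of clues for that grid.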

4.1.4 Finding small unavoidable sets

In an effort to simplify the problem and investigate some of these unavoidable sets, we analyzed the number of small unavoidable sets (of size 4 and 6) in a Sudoku grid. We expanded upon some code written by Guenter Stertenbrink that finds small unavoidable sets in a given grid[10] into an iterative algorithm that searches for the fewest possible small unavoidable sets of any grid. We hoped to find some correlation between grids that have few small unavoidable sets and grids with a low number of minimum clues.
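To give an idea of what searching for small unavoidable sets involves, the sketch below finds the sets of size 4 only. It is our own minimal illustration, not Stertenbrink's code, and it ignores sets of size 6. A size-4 set is a rectangle of squares whose two diagonals each carry a single color and which spans exactly two boxes; swapping the two colors then keeps every row, column and box intact.

```python
from itertools import combinations

GRID = [[int(ch) for ch in row] for row in (
    "534678912", "672195348", "198342567",
    "859761423", "426853791", "713924856",
    "961537284", "287419635", "345286179",
)]

def size4_unavoidable_sets(grid):
    """List all unavoidable sets of size 4 as tuples of 0-based (row, col) squares."""
    found = []
    for r1, r2 in combinations(range(9), 2):
        for c1, c2 in combinations(range(9), 2):
            same_band = r1 // 3 == r2 // 3
            same_stack = c1 // 3 == c2 // 3
            if same_band == same_stack:
                continue  # the rectangle must span exactly two boxes
            if grid[r1][c1] == grid[r2][c2] and grid[r1][c2] == grid[r2][c1]:
                found.append(((r1, c1), (r1, c2), (r2, c1), (r2, c2)))
    return found

sets4 = size4_unavoidable_sets(GRID)
print(((0, 3), (0, 4), (3, 3), (3, 4)) in sets4)  # True: the set of Figure 3
print(len(sets4))
```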

An essential part of any analytical approach to Sudokus is a not-too-biased grid generator. To be able to draw any conclusions of the efficiency of our algorithm we first need to analyze the distribution of random grids. The generator we wrote works one square at a time until the grid is finished, backtracking when necessary. Each square is given a random color, as long as there is no conflict with existing colored squares. If no color can be given to a square, the generator backtracks to the previous square and tries the next color. Using this algorithm, we are able to generate about 40 grids / second. It could be optimized but the generator is fast enough for our application.

In Figure 6 we show the distribution of the number of small (of size 4 and 6) unavoidable sets in random grids. Note the logarithmic Y-axis. The average when generating $10^5$ random grids with our generator is 26.7 small UA’s, with about 90% of the grids falling between 19 and 34 UA’s. We set out to improve on these numbers by swapping the colors of any two squares of which one is in a small UA, in order to break existing UA’s. To still have a valid Sudoku grid, we must then realign the grid by shuffling all the squares in the two-color UA the two squares belonged to. We calculate how many small UA’s these new grids have and choose one to step to. To not end up in the closest local minimum every time, we need to allow the algorithm a chance to sometimes go to a ‘worse’ state.


Using this algorithm we see large improvements in the number of small UA’s after only a few steps, with smaller improvements the more steps we allow the algorithm to run. Figure 7 shows the effect of the program on the average number of small UA’s in a large batch of random grids. We were able to calculate the figures for fewer iterations with more confidence because we used more grids. For large numbers of iterations we used fewer grids in order to avoid unreasonably large amounts of computer work.


Figure 6: Distribution of number of unavoidable sets of sizes 4 and 6 in random grids. [Histogram “Distribution of 100k random grids”: x-axis “Small unavoidable sets”, from 10 to 60; y-axis “Count”, logarithmic, from 1 to 10000; series “UA's”.]

Figure 7: Improving the numbers. [Plot “Program effectiveness”: x-axis “Iterations” (0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024); y-axis “Average small unavoidable sets”, from 0 to 30; series “UA's”.]

The smallest number of small unavoidable sets we could find in any grid was 3 and one of those grids can be seen in Figure 8. The squares that are part of an unavoidable set have been highlighted and squares in the same UA share the same notation. This grid unfortunately does not contain a proper 17-clue puzzle.

It is uncertain whether we could reach the entirety of the Sudoku space with this method. Every step is made by permuting the squares of a UA with two colors, and all two-color UA’s are found and are possible steps. We discuss this further in the next chapter where we deal with the Sudoku space.

Recently (March 2010), some time after our program was written, Mladen Dobrichev posted on the Sudoku programmers’ forum that he had managed to search through all of the roughly 5 billion essentially different Sudoku grids for UA’s of size 4 and 6[11]. This produced some interesting results and confirmed that 3 is indeed the smallest possible number of UA’s of size 4 and 6, validating our program. Those grids are extremely rare; indeed, it took us many hours to find one. Interestingly, the maximum number of small UA’s in a grid is 162, with all of them being of size 6. His results also show some correlation between 17-clue puzzles and a low number of small UA’s: the average is 26.92 for all grids but 21.83 for grids that have a 17-clue puzzle.


Figure 8: A Sudoku grid with only 3 small unavoidable sets.

3 6 2 5 1 4 8 9 7

1 9 4 3 8 7 5 2 6

5 8 7 2 6 9 4 1 3

2 4 6 9 5 3 7 8 1

8 3 5 7 2 1 6 4 9

7 1 9 6 4 8 3 5 2

9 2 8 4 3 6 1 7 5

6 5 1 8 7 2 9 3 4

4 7 3 1 9 5 2 6 8


The next step would naturally be to include slightly larger UA’s and see if we get the same results. The computational work required will increase exponentially with the sizes of the UA’s included since there are many more types of UA’s of size 8 and 10 than of size 4 and 6. Hence we were forced to leave that outside this thesis.

We believe that there is still much to be learned about Sudokus through analyzing the unavoidable sets. They play an integral role in many problems related to clue selection as well as to the structure of the grid. You may not think about them when you try to solve a Sudoku from your favorite newspaper in the morning, but keep in mind that they dictate what clues must be given and may in fact help you to solve the puzzle, provided it is indeed proper.


5 Interesting questions

In this chapter we discuss what are, in our opinion, some of the most important unanswered questions.

5.1 Minimum number of clues

This problem is still unresolved. It is probably the most eagerly awaited result in Sudoku research. Here are our thoughts on some different methods of reaching a definite result in the matter:

5.1.1 Mathematical reasoning

Many purely logical approaches to the board and the position of clues will run into the problem that we do not actually know all the logical rules needed to solve all Sudokus. We do not really know how much information can be derived from having a clue in a certain position. Although much work has been done in the field of solving Sudokus, many approaches remain invalid until we can solve every single Sudoku using a chain of logical rules.

5.1.2 Using unavoidable sets

Using unavoidable sets to prove a minimum number of clues might be more fruitful. It reduces the problem to proving that there will always be unavoidable sets not covered if we use n < 17 clues. By deeper analyzing the structure of unavoidable sets and how they arise, we believe it should be possible to provide a better lower limit than n = 8 clues required.

To prove the nonexistence of a proper 16-clue puzzle, however, one might have to investigate the total space of possible unavoidable sets. Given that every Sudoku grid has tens to hundreds of thousands of unavoidable sets, this space is simply huge and it is doubtful whether any such approach will be feasible. Even if it were, it would likely be a harder problem than simply doing an exhaustive search.

5.1.3 Exhaustive search

Given the complexity of the problem and the large number of possible unavoidable sets, perhaps the easiest way to rule out the existence of a proper 16-clue puzzle is, after all, a cleverly designed exhaustive search. There are many ways to reduce the search space and given that “only” about 5 billion essentially different grids need to be considered, it is certainly possible. The current fastest method to search through a grid for a potential 16-clue puzzle (gridchecker) is, however, too slow. Assuming 1 hour per grid, this would give about five billion computer hours at today’s technology level. Not impossible, but certainly too much to consider without some refinement of the algorithms used. We believe that it should be possible to find a faster way to check a grid for a 16-clue puzzle by optimizing the program for just this task.
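The estimate above is simple arithmetic, and a small Python sketch makes it easy to vary the assumptions. The constants are the rounded figures from the text. The function names and the core counts are hypothetical, and the calculation assumes the work parallelizes perfectly across grids.

```python
GRIDS = 5_000_000_000      # "about 5 billion" essentially different grids
HOURS_PER_GRID = 1.0       # the one-hour-per-grid assumption from the text
HOURS_PER_YEAR = 24 * 365  # ignoring leap years


def total_cpu_hours(grids=GRIDS, hours_per_grid=HOURS_PER_GRID):
    """Total computer hours for checking every grid once."""
    return grids * hours_per_grid


def wall_clock_years(cores, hours_per_grid=HOURS_PER_GRID):
    """Elapsed years if the grids are split evenly over `cores` cores."""
    return total_cpu_hours(GRIDS, hours_per_grid) / cores / HOURS_PER_YEAR


if __name__ == "__main__":
    for cores in (1, 1_000, 100_000):
        print(f"{cores:>7} cores: {wall_clock_years(cores):,.1f} years")
```

Under these assumptions a single core would need roughly 570,000 years, and even 100,000 cores would need more than five years. This is why the time per grid is the number that has to come down.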

One drawback to an exhaustive search is obviously that not much knowledge can be extended to other, related problems. In particular, we will have no use of the result when looking at 16×16 or bigger Sudokus.

5.2 Maximum number of independent clues

So far, we have only talked about the minimum number of clues required. There is a corresponding problem for the maximum number of clues. The maximum number of clues that can be given without the puzzle being proper is 77 – all the squares given except an unavoidable set of size 4. We need a slight addition to the previous sentence to make the problem interesting:

What is the maximum number of independent clues that can be given in a proper puzzle?

As of 11 November 2010, 139 minimal 39-clue Sudokus have been found[12]. They seem to be rarer than the 17-clue puzzles and it is improbable that a minimal 40-clue puzzle exists. However, significantly less work has been done at this end of the spectrum. It is also much harder to find the largest minimal proper puzzle in a given grid than it is to find the smallest proper puzzle, since there are significantly more ways to place 40 clues in a puzzle than 16.
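To get a feeling for the difference in search space, one can count the raw number of ways to choose the clue positions in a fixed grid. The Python sketch below only counts subsets of the 81 squares. It does not take properness or minimality into account, so it is an upper bound on the candidates a search must consider, not a measure of how any actual program works.

```python
from math import comb

SQUARES = 81

# Number of ways to choose which squares of a fixed grid are given as clues.
ways_16 = comb(SQUARES, 16)
ways_40 = comb(SQUARES, 40)

if __name__ == "__main__":
    print(f"16 clues: {ways_16:.3e} placements")
    print(f"40 clues: {ways_40:.3e} placements")
    print(f"ratio:    {ways_40 / ways_16:.3e}")
```

There are on the order of 10^16 placements of 16 clues and on the order of 10^23 placements of 40 clues, a difference of more than six orders of magnitude per grid.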

As with the minimum number of clues problem, we can provide an upper limit on the number of independent clues that can exist in a puzzle. There clearly cannot be more than 72: with 73 or more clues, at least one color would have all 9 of its squares given as clues, and the last of these can always be derived from the positions of the other 8. The same holds true for every box, row and column: no more than 8 clues in connected squares can be independent. Can we do better than 72 using only reasoning? Probably.

5.3 Sudoku space

Another interesting field of research with open questions is the one regarding the Sudoku space. The most obvious way to measure distance is the Hamming distance, which is simply the number of squares in which two Sudokus differ. The maximum distance between two Sudoku grids would then be 81 and the minimum distance would be 4, since no two Sudokus can differ in fewer squares than the size of the smallest possible unavoidable set.
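The Hamming distance is easy to compute. The following Python sketch uses our own helper names (`pattern_grid`, `relabel`, `hamming`) and one convenient valid grid. It shows two relabelings of the colors: exchanging two colors changes 18 squares, while shifting every color changes all 81.

```python
def pattern_grid():
    """Return one valid 9x9 solution grid (a shifted-row pattern)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]


def relabel(grid, mapping):
    """Return a copy of the grid with colors renamed according to mapping."""
    return [[mapping.get(d, d) for d in row] for row in grid]


def hamming(g, h):
    """Number of squares in which the two grids differ."""
    return sum(g[r][c] != h[r][c] for r in range(9) for c in range(9))


if __name__ == "__main__":
    grid = pattern_grid()
    swapped = relabel(grid, {8: 9, 9: 8})                      # exchange two colors
    shifted = relabel(grid, {d: d % 9 + 1 for d in range(1, 10)})  # rename every color
    print(hamming(grid, grid), hamming(grid, swapped), hamming(grid, shifted))
```

Since relabeling the colors of a valid grid always gives a valid grid, the shifted grid shows that the maximum distance of 81 is actually attained.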

Another, in our opinion, more interesting way to measure distance is to count how many unavoidable sets need to be permuted to go from one Sudoku to another. Two Sudokus that are the same up to relabeling the 8’s with 9’s would then have a distance of 1. This measure of distance focuses more on the structure of Sudokus than on the actual colors used. The main question that arises here is whether the entire Sudoku space is connected in this way, or whether there exist “islands” that have no unavoidable sets in common with each other. Or, in other words,

Starting at an arbitrary Sudoku grid, can we reach all the other Sudoku grids by any order of permuting the unavoidable sets of the grid?

We believe the answer is yes. An experimental algorithm written by a forum user with the pseudonym “blue” tried to find connections between about 4 million random Sudoku pairs and succeeded in every case[13]. If there are any remote “islands”, they are small.

The question that then arises, and it connects back to our work in chapter 4, is if the entire Sudoku space can be reached by only permuting unavoidable sets with two colors. If so, our algorithm did indeed have the possibility to reach every possible Sudoku. If not, would also allowing the permutation of three-color UA’s be enough? This is an interesting question and one we would have liked to investigate further.
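As an illustration of what permuting a two-color unavoidable set means, the Python sketch below splits the 18 squares of two colors into connected pieces, where two squares are linked if they share a row, column or box, and exchanges the two colors on one piece. This is our own minimal sketch with hypothetical helper names, not the algorithm from chapter 4. Every row, column and box touching a piece contains exactly one square of each color inside that piece, so the exchange always yields a valid grid.

```python
from itertools import combinations


def pattern_grid():
    """Return one valid 9x9 solution grid (a shifted-row pattern)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]


def is_valid(grid):
    """Check that every row, column and box holds the colors 1..9 once."""
    full = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [{grid[3 * br + i][3 * bc + j] for i in range(3) for j in range(3)}
             for br in range(3) for bc in range(3)]
    return all(unit == full for unit in rows + cols + boxes)


def same_unit(p, q):
    """True if two squares share a row, a column or a box."""
    (r1, c1), (r2, c2) = p, q
    return r1 == r2 or c1 == c2 or (r1 // 3, c1 // 3) == (r2 // 3, c2 // 3)


def two_color_sets(grid, a, b):
    """Split the squares holding a or b into connected components."""
    todo = {(r, c) for r in range(9) for c in range(9) if grid[r][c] in (a, b)}
    components = []
    while todo:
        stack = [todo.pop()]
        component = set(stack)
        while stack:
            p = stack.pop()
            linked = {q for q in todo if same_unit(p, q)}
            todo -= linked
            component |= linked
            stack.extend(linked)
        components.append(component)
    return components


def permute(grid, cells, a, b):
    """Return a copy of the grid with a and b exchanged on the given squares."""
    swap = {a: b, b: a}
    return [[swap[grid[r][c]] if (r, c) in cells else grid[r][c]
             for c in range(9)] for r in range(9)]


if __name__ == "__main__":
    grid = pattern_grid()
    for a, b in combinations(range(1, 10), 2):
        sizes = sorted(len(s) for s in two_color_sets(grid, a, b))
        print((a, b), sizes)
```

Repeatedly applying `permute` to randomly chosen components gives a random walk through the Sudoku space that uses two-color sets only. Whether such a walk can reach every grid is exactly the open question posed above.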

5.4 Real-world applications

Now that significant research has been carried out and results have been obtained in the field of Sudokus, a relevant question to ask is whether these results can be extended to other, related problems. Few Sudokus exist in the real world, but many other problems have similar properties. If we again go back to viewing a Sudoku as a problem in graph theory and look at it as extending a partial coloring of a graph, it should be possible to generalize some of the results that have been reached. One example of a related real-world problem is that of placing mobile masts with different frequencies on a map in such a way that they don’t interfere with each other. Some of the masts have already been placed and some positions are ruled out because they are too close to existing towers. This, and many other problems, can be viewed as a partial graph coloring problem, and thus could benefit from Sudoku research.
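The graph view of a Sudoku can be written down in a few lines of Python. This is a sketch with our own function names. The squares are the vertices, two squares are joined by an edge if they share a row, column or box, and the clues form a partial coloring that must not give the same color to both ends of any edge. The check itself works on any graph given as an adjacency mapping, which is what would make it reusable for a problem like the mast example.

```python
def sudoku_graph():
    """Return the Sudoku graph as a dict: square -> set of adjacent squares."""
    cells = [(r, c) for r in range(9) for c in range(9)]

    def adjacent(p, q):
        (r1, c1), (r2, c2) = p, q
        return p != q and (r1 == r2 or c1 == c2
                           or (r1 // 3, c1 // 3) == (r2 // 3, c2 // 3))

    return {p: {q for q in cells if adjacent(p, q)} for p in cells}


def is_proper_partial_coloring(graph, coloring):
    """Check that no edge joins two colored vertices of the same color.

    `coloring` maps some (not necessarily all) vertices to a color.
    """
    return all(coloring[p] != coloring[q]
               for p in coloring
               for q in graph[p] if q in coloring)


if __name__ == "__main__":
    graph = sudoku_graph()
    edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    print(len(graph), "vertices,", edges, "edges")
    print(is_proper_partial_coloring(graph, {(0, 0): 1, (4, 4): 1}))
    print(is_proper_partial_coloring(graph, {(0, 0): 1, (0, 5): 1}))
```

Every square has 20 neighbors (8 in its row, 8 in its column and 4 more in its box), which gives 810 edges in total. Solving a puzzle then means extending the partial coloring given by the clues to a proper 9-coloring of the whole graph.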

Most of the work within the field of Sudoku research has been done on Sudoku solving methods. A large set of logical rules has been developed that allow single squares to be determined, or possible candidates for a square to be excluded. It would be interesting (and challenging) to try to formalize these rules to work for a more general graph.


Sources

[1] Google group rec.puzzles, post by pseudonym “QSCGZ”, 21 September 2003.
http://groups.google.co.uk/group/rec.puzzles/browse_thread/thread/3ba62ed2d76a052/94ce2b94b84f4e53?lnk=st&q=6670903752021072936960++#
Last visited 2012-02-08

[2] B. Felgenhauer, F. Jarvis (2006), Mathematics of Sudoku I.
http://www.afjarvis.staff.shef.ac.uk/sudoku/felgenhauer_jarvis_spec1.pdf
Last visited 2012-02-08

[3] E. Russell, F. Jarvis (2006), Mathematics of Sudoku II.
http://www.afjarvis.staff.shef.ac.uk/sudoku/russell_jarvis_spec2.pdf
Last visited 2012-02-08

[4] G. Royle (2011), page about minimum Sudoku.
http://mapleta.maths.uwa.edu.au/~gordon/sudokumin.php
Last visited 2012-02-08

[5] H. Simonis (2005), Sudoku as a Constraint Problem.
http://4c.ucc.ie/~hsimonis/sudoku.pdf
Last visited 2012-02-08

[6] D. Berthier (2009), Unbiased Statistics of a Constraint Satisfaction problem – a Controlled-Bias Generator.
http://www-lor.int-evry.fr/~berthier/CISSE09-CSP-UStats.pdf
Last visited 2010-10-25
Published in Innovations in Computing Sciences and Software Engineering, 2010. DOI: 10.1007/978-90-481-9112-3_16
http://www.springerlink.com/content/n537311u481g6278/

[7] T. Yato, T. Seta (2003), Complexity and Completeness of Finding Another Solution and Its Application to Puzzles.
http://www-imai.is.s.u-tokyo.ac.jp/~yato/data2/SIGAL87-2.pdf
Last visited 2012-02-08

[8] The unfinished project of an exhaustive search for a 16-clue puzzle; the site is down as of Nov 2011.
http://dist2.ist.tugraz.at/sudoku/ (not working as of 2012-02-08)
Last visited 2010-10-25

[9] M. Dobrichev (2011), home of the gridchecker program.
http://sites.google.com/site/dobrichev/gridchecker
Last visited 2012-02-08

[10] G. Sterten (2005), home of the unav program that finds small unavoidable sets in a given grid.
http://magictour.free.fr/sudoku.htm
Last visited 2012-02-08


[11] Thread called Solution Grids in U-space (Unavoidable Sets), started 25 February 2010 by M. Dobrichev.
http://www.setbb.com/sudoku/viewtopic.php?t=1893&highlight=finding+unavoidable+sets&mforum=sudoku
Last visited 2012-02-08

[12] Thread called High clue tamagotchis, started 23 May 2010 by pseudonym “eleven”.
http://forum.enjoysudoku.com/high-clue-tamagotchis-t30020.html#p200099
Last visited 2012-02-08

[13] Thread called Solution grids and UA-connections, started 12 October 2010 by pseudonym “blue”.
http://www.setbb.com/sudoku/viewtopic.php?t=2043&highlight=sudoku+space+connected&mforum=sudoku
Last visited 2012-02-08
