Introduction
I was keen to try solving Sudoku using only T-SQL. I enjoy fiddling with scripting, number sequences, and modulo: which I use whenever a repeating sequence presents itself. As I was on a long flight home and my work computer battery was dead it was my time to play on my personal laptop!
In preparation for the task I had screen grabbed a single game on my phone from the internet and started with pen and paper exploring the concepts. I have not read descriptions of other approaches as wanted the challenge and learning opportunity.
In this first article I demonstrate how a single select method can be used to solve a puzzle. The next article shows methods to reduce permutations so the execution time goes from a log relationship to puzzle complexity to an almost linear relationship.
I am not trying through these articles to find the best method, rather show a process in optimising an algorithm from a simple set-based start through to one that first optimises the options prior to running the solve step. The overall process was a good learning exercise and a fun way to make use of a laptop not being stretched.
A word of warning, please do not run this on critical machines as it will impact performance!
The Game
The puzzle is based on a 9 x 9 grid where the player has to fill in missing numbers between fixed values.
+---+---+---+---+---+---+---+---+---+---+ | . | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | +---+---+---+---+---+---+---+---+---+---+ | 1 | 5 | 7 | | | 1 | 8 | | | | +---+---+---+---+---+---+---+---+---+---+ | 2 | | | 8 | | | | | | 4 | +---+---+---+---+---+---+---+---+---+---+ | 3 | | 1 | 2 | 5 | | | | | | +---+---+---+---+---+---+---+---+---+---+ | 4 | | 9 | 5 | | 8 | | | 6 | 7 | +---+---+---+---+---+---+---+---+---+---+ | 5 | 8 | | | 7 | 5 | 2 | | | 3 | +---+---+---+---+---+---+---+---+---+---+ | 6 | 3 | 4 | | | 6 | | 8 | 5 | | +---+---+---+---+---+---+---+---+---+---+ | 7 | | | | | | 5 | 2 | 7 | | +---+---+---+---+---+---+---+---+---+---+ | 8 | 1 | | | | | | 9 | | | +---+---+---+---+---+---+---+---+---+---+ | 9 | | | | 1 | 4 | | | 8 | 5 | +---+---+---+---+---+---+---+---+---+---+
- Note that there are row numbers on the left and column numbers along the top
There are nine lots of 3x3 blocks of cells which for purpose of coordination are numbered left to right and top down (row priority).
+---+---+---+---+---+---+---+---+---+---+ | . | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | +---+---+---+---+---+---+---+---+---+---+ | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | +---+---+---+---+---+---+---+---+---+---+ | 2 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | +---+---+---+---+---+---+---+---+---+---+ | 3 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 | +---+---+---+---+---+---+---+---+---+---+ | 4 | 4 | 4 | 4 | 5 | 5 | 5 | 6 | 6 | 6 | +---+---+---+---+---+---+---+---+---+---+ | 5 | 4 | 4 | 4 | 5 | 5 | 5 | 6 | 6 | 6 | +---+---+---+---+---+---+---+---+---+---+ | 6 | 4 | 4 | 4 | 5 | 5 | 5 | 6 | 6 | 6 | +---+---+---+---+---+---+---+---+---+---+ | 7 | 7 | 7 | 7 | 8 | 8 | 8 | 9 | 9 | 9 | +---+---+---+---+---+---+---+---+---+---+ | 8 | 7 | 7 | 7 | 8 | 8 | 8 | 9 | 9 | 9 | +---+---+---+---+---+---+---+---+---+---+ | 9 | 7 | 7 | 7 | 8 | 8 | 8 | 9 | 9 | 9 | +---+---+---+---+---+---+---+---+---+---+
The game is played following three rules.
- the numbers in a row can not repeat,
- the numbers in a column can not repeat,
- and the numbers in 3x3 cell sets can not repeat.
A properly formed game should have only one solution however some end up with more solutions.
Players typically work by eliminating solutions for the blank cells, using the three rules, then either note those or remember them as they go. The process, for me at least, is one of eliminating options and noting possible answers until I can see the combination that honor the three rules. I will need to work on my memory to be able to speed this process up.
Permutations
For a completely empty 9 x 9 puzzle there are 9^81 possible permutations to be considered or 1.97E+77. When a position is assigned a fixed value and the three rules are applied to adjacent cells the number of permutations decreases dramatically.
The most significant cost saver is to solve a cell, you go from having as many as a x9 permutation penalty for an open cell to x1 (none). However just by reducing the available options for one cell you are reducing the permutations, remember each option acts as a multiplier. So, for all scenarios when the first value is fixed the permutations become 60 cells with 9 possibilities, 20 with 8 and 1 fixed, which is 9^60*8^20*1 or about 2.07E+75 permutations. For subsequent solved values, the reduction in the number of permutations is based on the location of that cell relative to the other.
It does not matter from puzzle solution to solution, the permutations are the same for all puzzles. The reduction in the number of permutations is from the relative locations of fixed cells. The best reduction in permutations is when values are clustered.
The number of unique sequences to fix all 81 values into a game is 81! (81 factorial) or 81*80*79, which is about 5.797126e+120. So there is not shortage of configurations for even 1 game. Of all of the permutations, there are 6,670,903,752,021,072,936,960 complete game solutions according to work by Bertram Felgenhauer and Frazer Jarvis (refer to http://www.afjarvis.staff.shef.ac.uk/sudoku/bertram.html).
The most notable difference between very easy and very hard games is the number of fixed values provided initially. A simple game came be solved by deduction, completing the most filled row, column or block first then progressing to the more sparse as the options are determined.
Approach
Given the puzzle is a regular 9x9 grid it can be serialised into an 81 character string. Using a row preference means the 9th field is the top right and 81st field is the bottom right cell.
Consider the following puzzle that includes a header line of column numbers and a column of row numbers.
+--------+---+---+---+---+---+---+---+---+---+ | RowNum | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | +--------+---+---+---+---+---+---+---+---+---+ | 1 | 2 | | | 3 | | | | | 8 | +--------+---+---+---+---+---+---+---+---+---+ | 2 | | | | | | | | | | +--------+---+---+---+---+---+---+---+---+---+ | 3 | | | 1 | 2 | 8 | 4 | | 5 | | +--------+---+---+---+---+---+---+---+---+---+ | 4 | | 6 | | 9 | | | 2 | | | +--------+---+---+---+---+---+---+---+---+---+ | 5 | | 9 | | | | | | 6 | | +--------+---+---+---+---+---+---+---+---+---+ | 6 | | | 5 | | | 1 | | | | +--------+---+---+---+---+---+---+---+---+---+ | 7 | | 1 | | | 4 | | 7 | | | +--------+---+---+---+---+---+---+---+---+---+ | 8 | | | 8 | | 7 | 9 | | | 3 | +--------+---+---+---+---+---+---+---+---+---+ | 9 | | | | | | | | 9 | 6 | +--------+---+---+---+---+---+---+---+---+---+
For parsing a puzzle, such as the above example, a 81 character length char variable is used. A zero value is used to denote an empty cell from the puzzle.
200300008000000000001284050060900200090000060005001000010040700008079003000000096
Using this data structure means that the row, column and 3x3 block number can be simply derived where:
- Row = ((field number - 1)/9)+1
- Column = ((field number - 1)%9)+1
- Block number = (((Row-1)/3)*3)+((Column-1)/3)+1
- Ordinal in 81 character string = (RowNum*9)-9+ColumnNum
This detail is stored to a temporary table for use in referencing during execution processing.
The steps to solve the puzzle are:
- Capture the initial puzzle
- Determine the options for the remaining open cells
- Solve such that the three rules are met
- Display the solution
This can all be solved from a single select statement. No recursion or complexity required just brute processing force. The issue with using brute force is that there is a logarithmic penalty on the number of options (permutations to be tested) and execution time.
I will present the logic and code for the brute force approach here then in part two a more sophisticated approach will be presented using various reduction methods. The execution time in part two has a linear relationship to the number of options.
Single select brute force method
The grid properties are stored in a table to be able to relate rows, columns and blocks to the numbered cell value. Here are the top 10 lines of the 81 that make up the puzzle.
+--------------+--------+-----------+----------+ | CoordinateID | RowNum | ColumnNum | BlockNum | +--------------+--------+-----------+----------+ | 1 | 1 | 1 | 1 | | 2 | 1 | 2 | 1 | | 3 | 1 | 3 | 1 | | 4 | 1 | 4 | 2 | | 5 | 1 | 5 | 2 | | 6 | 1 | 6 | 2 | | 7 | 1 | 7 | 3 | | 8 | 1 | 8 | 3 | | 9 | 1 | 9 | 3 | | 10 | 2 | 1 | 1 | +--------------+--------+-----------+----------+
The first task is to store the puzzle in a suitable data structure for working on. Since the grid table contains the same number of rows as the puzzle it can be used to drive a substring splitting into 81 records. Note that the 0 values are not stored for processing as they are to be solved.
insert into #tFixed (CoordinateID, Value) select CoordinateID, substring(@PuzzleIn,CoordinateID,1) Value from #tGrid where substring(@PuzzleIn,CoordinateID,1) != 0
The first feature of the problem to solve, is the generation of the permutation for each cell across the puzzle. For a completely empty puzzle the options for all 81 cells is 9. So the permutations that would need to be tested would be 9*9*9... (81 times) or 9^81. Testing that number of records would not be feasible and storing the number of records to a table would be inefficient. Each fixed cell comes off the 81 exponent of the permutation and then rules are applied, this means the adjoining row, column and block have their base (starting at 9) reduced by one.
The simplest method I came to for determining options for each of the 81 cells is using a cross join between the 81 record grid table with a 9 record number table.
select CoordinateID, Num from #tGrid a cross join (select top 9 ROW_NUMBER() over (order by CoordinateID) Num from #tGrid) b
This produces a table of 729 records (81*9).
So the next step is to add a where clause to the options cross join to prevent the fixed numbers from being present in the adjoining row, column or block number. The simplest form of this was using three not exists statements, then finally union all to add back the fixed values.
insert into #tOption (CoordinateID, Value) select CoordinateID, Num from #tGrid a cross join (select top 9 ROW_NUMBER() over (order by CoordinateID) Num from #tGrid) b /* exclude fixed coordinates*/ where not exists (select 1 from #tFixed c where a.CoordinateID = c.CoordinateID) /* exclude row */ and not exists (select 1 from #tFixed c inner join #tGrid d on c.CoordinateID = d.CoordinateID where a.RowNum = d.RowNum and b.Num = c.Value) /* exclude column */ and not exists (select 1 from #tFixed c inner join #tGrid d on c.CoordinateID = d.CoordinateID where a.ColumnNum = d.ColumnNum and b.Num = c.Value) /* exclude value same BlockNum */ and not exists (select 1 from #tFixed c inner join #tGrid d on c.CoordinateID = d.CoordinateID where a.BlockNum = d.BlockNum and b.Num = c.Value) /* add back fixed values */ union select CoordinateID, Value from #tFixed
At this point the #tOption table contains all potential options for each CoordinateID.
The final solution can now be selected from the options table with 81 cross joins and 810 statements to prevent values repeating within rows, columns or blocks in the where clause! The 810 clauses are from three lots of 324 statements however due to the overlap between block and row and block and column this reduces to 810. Since values in each row, column, and block need to be unique; there are only 9 options for each the simplest concept I found was to used <> between the cells. In all cases the operand is smaller on the left side of the operator to ensure that only one orientation is tested.
So the code looks like this:
select * from (select Value [1] from #tOption where CoordinateID = 1) [1] cross join(select Value [2] from #tOption where CoordinateID = 2) [2] ... cross join(select Value [81] from #tOption where CoordinateID = 81) [81] where [1] <> [2] and [1] <> [3] and [1] <> [4] ...
The concept of cross joining so many sub-queries seems absurd however the query completes. The issue with this method is for the more hard puzzles where only few fixed values are provided in the puzzle or the alignment between fixed values is minimal. This is the perfect storm where there are few static values in the cross join and there are still a lot of options requiring testing.
For harder puzzles it is justified to recurse sections and solve or at least reduce open options. By attempting to solve the smallest and most populated segments first a divide and conquer strategy pays off in overall query time. Once the smallest units (rows, columns and blocks) are exhausted moving onto larger segments is still more efficient than skipping forward to the single find solve statement. This will be presented in the next post on optimisation through permutation reduction.
Conclusion
The attached code is of the stored procedure dbo.sp_sudoku_solve_simple. The procedure accepts a puzzle defintion and shows the puzzle, reduced options, and solution once solved as described above.
An example execution is
exec sp_sudoku_solve_simple '023780460000620000060304080001000534280000097439000100010205040000036000056018370'
Very simple puzzles will solve within a few seconds, however, hard puzzles will take from hours to weeks or longer!
My latest testing of optimised methods for solving only take seconds even for the hard puzzles but these use various optimisation methods, which I will describe in the next article along with the optimisation process I followed.
In the header of the stored proceedure a set of puzzles are provided to test with.
A word of warning, please do not run this on critical machines as it will impact performance. My puzzle play is all on personal hardware.
Thanks for reading!