Introduction
I guess that users of SQLServerCentral would be numerate, and that a puzzle such as Sudoku would therefore appeal. Certainly, there are several different resources devoted to this topic on this website already, which begs the question of why I am offering yet another Sudoku Solver. I am just an ordinary guy, and when I start to look at some of the excellent and very clever solutions around, my eyes start to glaze over, and although I am reading what is written, understanding is some distance away.
So when I found myself with a bit of time on my hands, I thought that instead of trying to resolve the imbalance in world trade, the debt mountain, why dogs shed hair or any of those big issues, I would write a Sudoku Solver that would make sense to ordinary people. My intention was not to create something that will solve the puzzle almost before it has started, and neither was I aiming to write something that was incredibly elegant, and demonstrated some cunning code. I was aiming to keep it as simple as possible, and aiming to reproduce the logic that I would use in solving an everyday sort of Sudoku puzzle manually. And if I had to resort to guessing (which would mean that it was a bad puzzle, in my books), I was only ever going to look for the first possible solution.
I also aimed at leaving only a small footprint, so that clearing up after trying this out would not require several weeks’ work. So this meant that I would work with temp tables and table variables rather than instantiating a bunch of tables to clutter things up afterwards. And the logic exists in a query script rather than in a Stored Procedure. In the event, I found that having three UDFs made the query a whole lot simpler, so the only legacy that will be left from running this will be three Scalar Functions, all beginning with ‘sfn’, my hastily implemented naming standard for Sudoku Functions.
By keeping it in its simplest form, it gives the greatest scope for everybody else to make their own tweaks, so that everybody can own their own personalised version. So, if you don’t like what I have done, please do feel absolutely free to make your own smoother, cooler, more customised version, either using mine as a starting point, or writing your own from scratch. Most if not all of the Sudoku Solving solutions that I have seen have been Advanced Level T-SQL, although it has to be said that some of those that do not actually work are perhaps more simple. My aim is to have a solution that is in the Simple T-SQL category, or as close to this as possible.
The reason why we can attempt to solve Sudoku with T-SQL, is that each Sudoku cell is the intersection of three sets – Row, Column and Block. I haven’t gone for huge queries, but have sort of sniped around the edges, using smaller queries to apply smaller logical steps. And when tackling big problems that are nothing to do with Sudoku, this is a method that can work, at least initially.
My aim when writing stuff is always to aim for simplicity – although of course, this ideal is not always easy to implement. Having something simple is much easier to modify, when you come to look at your code again in due course. Or even worse, if somebody else looks at your code to make a change! If they don’t understand what you have done, they can get into a big mess quickly. So with this solution, I have pretty much aimed to replicate the steps that I take when solving things manually, mapping them into simple chunks of code. I was not too fussed about making it as quick as possible, as I suspect that this code is unlikely to be run several thousand times every second, and if it takes a few seconds to produce an answer, then that would not be a big problem.
Within the discussion of previous Sudoku Solvers, reference is made to interdimensional computing resources, in order to accommodate all the possible permutations, but I can assure you that these will not be required in this instance. I have in fact done most of the work on this query on a five year old laptop running Windows 8. When it was new, it was certainly some distance away from being cutting edge, and now that the passage of time has further eroded its technical credentials, I was confident that if I got it to solve any puzzle that I tried in less than 10 seconds, then my solution would run pretty much anywhere in a reasonable time. In fact, I did not come across any puzzles that took over 5 seconds to solve. This is all hugely quicker than the typing of the puzzle in would be, although of course, the clever thing would be to write some code to have a camera scan a puzzle, convert it to text, fire it off at the solver, and show the answer. That would certainly be impressive!
This is meant to be an article, not an epic tome, so we had better leave the introductory matters, and get to the nitty gritty, without further ado.
The Solution, Stage 1 – variables, tables etc
The first thing to resolve is how to introduce the details of the puzzle to the query. In my modest reading up on this subject (see comments above about eyes getting glazed), it would appear that variety is everything when it comes to this topic. Richard Romley has taken the approach of requiring 81 variables, one for each cell. However crazy this may sound, it really does work for him, and his solution is outrageously fast. I saw some talk of comma delimited values, but pretty much everybody else has gone for a single string 81 characters long. The differences within this camp derive from what placeholders are used for the blank cells. I have seen three different characters used, namely a space, a zero or an underbar.
I spent approximately 1 second or less in deep thought and decided that the simplest approach is to use a single string 81 characters long. With this, the simplest code would entail casting each character as a TinyInt, so the placeholder could be either a space or a zero. No doubt if this offends you, it would be simple enough for you to tweak the code to accommodate whatever placeholder you feel is most appropriate.
I have a further 6 variables that control the working of the logic, and two variables that have no bearing on the solving of the puzzle, but do provide some measures of how much work the query has had to do in reaching the solution. Three table variables, and a single temp table complete the initialization of the query.
Because I am accommodating guesswork in this solution, I also need to include a way of unwinding the guess, in the event that a guess proves to be inauspicious. The fact that guesses may be nested several levels deep before an illogical state is discovered makes this process more tricky. Indeed, much of the complexity of my solution comes from this area. So my data structures have to cater for these guesses. I am not scared of the Normalization Police, and have included some denormalized data, as it makes some bits so much easier.
My temp table has a primary key comprised of the Guess Number, Row and Column. Whilst it would be possible to derive the Block number from the Row and Column, it is much easier to have it in the table, as the first of the denormalized fields. Then I have the value in that Cell, which is a zero for an unresolved cell. I round the table off with 4 other denormalized fields, the first three being the Sets for that Row, Column and Block as 9 character strings, and the final field which contains the possible values for that cell, if it is unresolved. These last 4 cells need to be periodically refreshed during the course of the process, which is the price you have to pay for not normalizing. In fact, the updating of these cells is where the query has to work the hardest in the Solving-by-Logic section.
There is a table variable that contains the numbers 1 to 9, and another that is a view by cell of the possible values for that cell. The final table variable contains details of the various guesses that are made. That ends the waffle about all the declarations and definitions, so now we can see how I have approached this problem.
The Solution, Stage 2 – Solving By Logic
The absolute first step is to convert the puzzle string into a table of values, which is a straightforward exercise, performed only once. Thereafter, each section of code may be repeated multiple times, until we get to an error, or the end of the solution.
The first bit of repeatable code is to calculate the Sets for each of Row, Column and Block. The set comprises of any currently known (given or derived) answers, with zeroes as placeholders. Having updated these fields, we ensure that things are valid, using the function dbo.sfnValidateSudokuSet. For each cell, it checks that the 3 sets for that cell are valid i.e. there are no duplicate non-zero values. If it is the first time through, an invalid result would indicate that the puzzle that was offered to the query is invalid, while if an invalid state is detected after some running, it will indicate either a bad guess has been made, or the code is defective, which of course never happens. If it is a bad guess, the guess will be rolled back (which we will discuss further down), but otherwise no further processing will be done, and a status message will be displayed.
Having established that all is well, the next step is to find the possible values for each blank cell, using the function dbo.sfnFindPossibles. This function takes the three relevant sets for each Cell, and looks for any ‘holes’ or ‘gaps’ or call them what you will. The cell can only comprise of values that do not yet appear in any of the sets.
When the possible values in the temp table have been populated, it is time to populate the table variable @tPossibles. This table contains a record for every possible value for each cell, so if a particular cell may have possible values of 4, 5 or 6, there would be 3 records in this table to represent each of those values.
All of the preparatory work is done now, so on to the actual work of solving the puzzle. An important thing to remember is that once a cell has a value in it, no calculations have to be performed against it.
The first bit of processing is easier to code than to describe clearly, and it is about curtailing the possibilities. It boils down to this – if for instance Row 1 has 2 cells that could contain the value 4, and both of these cells are in Block 1, then any other cells in Block 1 that have a possibility of being a 4 can be cleared of that value. Now we reverse that logic, so that in my example, if the only values for 4 within the Block were in Row 1, then Row 1 should contain no other 4’s.
Still with us? That’s grand! Now think about that scenario, but with columns instead of rows. Language does not always permit the precision that logic can easily handle! That logic needs to be applied recursively, until no more such situations arise, at which point the next bit of logic is engaged.
The previous bit of processing may have reduced the possibilities for a cell to just 1 value, so this is the first bit of processing that is done – any cell with only a single possibility will of course adopt that value.
The next bit of processing is related – with having shaved off some possibilities, it may be that within a Row for instance there is only a single cell that can be a 4, although that cell has other possibilities as well. But that single occurrence of 4 will override the others, so that cell gets that value.
And of course, the same goes for single occurrences within Columns and Blocks.
Having done all this, the logic returns to updating the Sets for Rows, Columns and Blocks. This cycle carries on until there is no more trimming of Possibilities or determining values of cells. Once all the logic that can be applied has been applied, the Puzzle is checked for validity. If it fails at this point, there has been an error in applying the logic, a message is displayed, and processing ends.
But if all is well, processing continues, with the first of the following steps being a check to see if there are any unresolved cells. If there aren’t, then obviously the logic was good enough to solve the Puzzle, messages are displayed and processing stops.
The Solution, Stage 3 – First Guess
There is a variable in my query that governs whether or not guesses are to be made. If you don’t want it to guess, then change the value of tiMakeGuesses to anything but 1.
If there are unresolved cells after all the logical bits have been applied to the puzzle, then it is time to enter the murky world of guessing. Because this has to be as simple as possible and controlled programmatically, the guessing has to contain some structure. When choosing a cell to make a guess on, I go for the shortest set of possibilities first - if there are only two possibilities, then it is 50/50 that I get it right first time, but if there are 6 possibilities then the odds of being right first time are greatly reduced. With a cell with 2 possibilities, then if it is wrong first time, it will be right the next time, but with 6 possibilities, there could be a lot of stepping back through the guesses before the right one is found. My selection of cell to make a guess on is Top 1 sorted by Length of Possibilities, Row and Column. Then I start at the left of the string of Possibilities (i.e. the lowest number), and work across.
It would be wonderful if guessing on a value for a cell resulted in the solution being found, but whilst this is possible, it is most probable that guesses may go several deep before an end is reached. And if that end is an invalid Puzzle, then the list of guesses needs to be unwound, 1 at a time, most recent first.
Enough chat, this is how I did it.
Firstly, I write a record to the @tGuesses table variable, with details of the guess. The second step is to make a copy in the temp table of the latest set of good data, but with the new Guess Number, and write the guess in. Then it is a case of going back to the processing of the Sets for Row, Column and Block, and follow the logic through from there.
That is the logic for a first guess, or for a nth level nested guess, which has not been initiated by a roll back.
The Solution, Stage 3 – Roll Guesses back
The logic for rolling guesses back is reached when an invalid state for the puzzle is detected, after a guess or series of guesses has been made. This is the trickiest area of code in my solution. But if guesses are to be made, then it is vital.
If a Guess is proved to be wrong, then the logic has to back down the decision tree, and taking the first free fork to try another route. Any parts of the tree that become redundant by travelling back down the tree need to be pruned from the Guess table.
The temp table is populated with a copy of the data immediately prior to the new branch being taken, and then processing can continue for the new guess.
Consider this situation, which uses values produced from the Puzzle that starts ‘0203’ in my code, so you can step through the logic using that Puzzle, should you wish
The first guess made is for Cell(2, 4), which has possibilities of 6 or 7. As discussed already, the code chooses 6, and processing continues, until it gets stuck again. The Puzzle is still valid at this point, so the FirstGuess code moves on to Cell(2, 6) and chooses 7 for Guess 2. It then finds that this makes the Puzzle invalid, so the RollGuessBack section chooses 8 for Guess Number 3.
Processing applies all the logic it can, and then gets stuck without completing the Puzzle, so the FirstGuess code makes Guess No 4 in Cell(2, 1) for 3. It churns away, but still can’t finish, so it makes Guess No 5 in Cell(4, 2) for 3. This guess proves to be invalid, so Guess 6 is the second option for Cell(4, 2) for 9.
tiGuessNo | tiRow | tiCol | tiVal | Possibles |
1 | 2 | 4 | 6 | 67 |
2 | 2 | 6 | 7 | 78 |
3 | 2 | 6 | 8 | 78 |
4 | 2 | 1 | 3 | 37 |
5 | 4 | 2 | 3 | 39 |
6 | 4 | 2 | 9 | 39 |
This proves to be invalid as well, which indicates that the problem lies further back. So Guesses 5 and 6 are proved to be redundant, and are deleted, and Guess 7 is made, choosing the second character of the Possible in Cell(2, 1), the other half of Guess 4.
tiGuessNo | tiRow | tiCol | tiVal | Possibles |
1 | 2 | 4 | 6 | 67 |
2 | 2 | 6 | 7 | 78 |
3 | 2 | 6 | 8 | 78 |
4 | 2 | 1 | 3 | 37 |
7 | 2 | 1 | 7 | 37 |
Bad news, this gave rise to another invalid situation, so Guesses 4 and 7 had to be removed. This proved that Guess 3 was also invalid, so that had to be removed, together with its other half Guess 2. Which all shows that Guess 1 was invalid, so a new Guess was made as Guess 8 for Cell(2, 4) for 7.
Guesses 9, 10 and 11 were made, but Guess 11 proved to be invalid, so Guess 12 was made to be the other half of Guess 11
tiGuessNo | tiRow | tiCol | tiVal | Possibles |
1 | 2 | 4 | 6 | 67 |
8 | 2 | 4 | 7 | 67 |
9 | 2 | 2 | 3 | 38 |
10 | 2 | 6 | 6 | 68 |
11 | 2 | 1 | 1 | 18 |
12 | 2 | 1 | 8 | 18 |
However, Guess 12 was found to be invalid too, so Guesses 11 and 12 were deleted, indicating that Guess 10 was wrong, so Guess 13 was created to be the second half of Guess 10. This was still not enough to solve the puzzle, and Guess 14 was added, which turned out to be sufficient to finally reach the solution.
tiGuessNo | tiRow | tiCol | tiVal | Possibles |
1 | 2 | 4 | 6 | 67 |
8 | 2 | 4 | 7 | 67 |
9 | 2 | 2 | 3 | 38 |
10 | 2 | 6 | 6 | 68 |
13 | 2 | 6 | 8 | 68 |
14 | 2 | 1 | 1 | 16 |
What is intriguing about this, is that although 14 Guesses were made, only 6 were finally utilised, and of these 6, 2 were wrong choices. Another intriguing thing is that Cell(2, 6) initially appeared in Guesses 2 and 3, but that path down the decision tree was deleted, and then the cell reappeared as Guesses 10 and 13 in the new path through the decision tree.
If you like to draw boxes, decision trees, logic diagrams, etc, then this scenario would no doubt make a nice picture to add to your collection.
That has successfully used up the spare time I had available – decisions have been made, and DIY is calling. Perhaps I will play with this some more in due course, reducing the processing time, making it generally sleeker, smoother and more desirable, etc. But it ain’t broke, so I may never get around to fixing it. You on the other hand might find yourself with some time on your hands, and spend some time customizing this to suit yourself.