A SQL Sudoko Solver
Introduction
During a long wet May bank holiday weekend I decided to try my hand at a Sudoko puzzle in the Sunday paper.
After all, how hard could it be?
Four frustrating hours later I thought "hang on, this is a set based problem, surely I can use
SQL Server to solve this thing".
Basic Sudoko
Sudoko is usually a 9x9 grid, although any grid that can be regularly divided up is theoretically possible.
For the sake of this article we will assume that we are trying to solve a 9x9 grid.
A Sudoko grid may look something like the following:
|
|
| |||||||||||||||||||||||||||
|
|
| |||||||||||||||||||||||||||
|
|
|
As with all the best puzzles the rules are very simple
- Each row must contain the numbers 1 through to 9 only once
- Each column must contain the numbers 1 through to 9 only once
- Each 3x3 cell must contain the numbers 1 through to 9 only once
Looking at our grid this means that in row 9 column 9 we know that its value cannot be
- 1 because a 1 already exists in that 3x3 cell
- 4 or 9 because those digits already exist in that row
- 2,6 or 7 because those digits already exist in that column
Building the initial Sudoko square.
Our first task is to come up a table structure that will hold our Sudoko information.
Field | Type |
---|---|
RowId | TINYINT |
ColumnId | TINYINT |
CellValue | TINYINT |
We can populate this table using simple INSERT statements.
TRUNCATE TABLE dbo.Sudoko -- Row 1 INSERT INTO dbo.Sudoko (RowId, ColumnId, CellValue) VALUES (1,4,5) INSERT INTO dbo.Sudoko (RowId, ColumnId, CellValue) VALUES (1,6,8) ...etc
Where we do not yet know the values in the cells we want an entry with a NULL value. Do do this I have a couple of views
and a quick bit of script as follows
IF EXISTS (SELECT * FROM sys.views WHERE Name='Digits') DROP VIEW dbo.Digits GO CREATE VIEW dbo.Digits AS SELECT 0 AS value UNION ALL SELECT 1 AS value UNION ALL SELECT 2 AS value UNION ALL SELECT 3 AS value UNION ALL SELECT 4 AS value UNION ALL SELECT 5 AS value UNION ALL SELECT 6 AS value UNION ALL SELECT 7 AS value UNION ALL SELECT 8 AS value UNION ALL SELECT 9 AS value GO -------------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='AllValues') DROP VIEW dbo.AllValues GO CREATE VIEW dbo.AllValues AS SELECT R.Value AS RowId, C.Value AS ColumnId, V.Value AS CellValue FROM dbo.Digits AS R,dbo.Digits AS C, dbo.Digits AS V WHERE R.Value>0 AND C.Value>0 AND V.Value>0 GO -------------------------------------------------------------------------- -- Insert NULL values into the Sudoko square where row/column coordinates do not yet exist. INSERT INTO dbo.Sudoko (RowId,ColumnID) SELECT DT.RowId,DT.ColumnId FROM dbo.Sudoko AS SD RIGHT JOIN( SELECT DISTINCT RowId,ColumnId FROM dbo.AllValues WHERE RowId>0 AND ColumnId > 0 AND CellValue>0 ) AS DT ON SD.RowId = DT.RowId AND SD.ColumnId = DT.ColumnID WHERE SD.RowId IS NULL GO
Blocked Values Part One
Once we have built our Sudoko square the first step in the solution is to establish which values are blocked from
each individual cell. Remember the rules we outlined earlier.
- Each row must contain the numbers 1 through to 9 only once
- Each column must contain the numbers 1 through to 9 only once
- Each 3x3 cell must contain the numbers 1 through to 9 only once
We have to build a query that will determine these values for us.
To begin with we want to identify all the values that occur in any one row.
SELECT DISTINCT RowId,CellValue FROM dbo.Sudoko WHERE CellValue IS NOT NULL
Now we want to join this back to the empty cells of our Sudoko square as follows
SELECT SD.RowId, SD.ColumnId,SR.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( -- Get the values for the row. SELECT DISTINCT RowId,CellValue FROM dbo.Sudoko WHERE CellValue IS NOT NULL ) AS SR ON SD.RowId = SR.RowId WHERE SD.CellValue IS NULL
We want a similar query for the columns and finally a query that identifies the values in a particular 3x3 cell.
SELECT SD.RowId, SD.ColumnId,CC.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( -- Get the values present in a 3x3 cell. SELECT DISTINCT (RowId-1)/3+1 AS RowCellId, (ColumnId-1)/3+1 AS ColumnCellId, CellValue FROM dbo.Sudoko WHERE CellValue IS NOT NULL ) AS CC ON (SD.ColumnId-1)/3+1 = CC.ColumnCellId AND (SD.RowId-1)/3+1 = CC.RowCellId WHERE SD.CellValue IS NULL
We have our 3 queries and we want all of these added together which we can do in a single view as follows:
IF EXISTS (SELECT * FROM sys.views WHERE Name='BlockedValues') DROP VIEW dbo.BlockedValues GO --##SUMMARY Values that cannot be inserted into cells CREATE VIEW dbo.BlockedValues AS SELECT SD.RowId, SD.ColumnId,SR.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( SELECT DISTINCT RowId,CellValue FROM dbo.Sudoko WHERE CellValue IS NOT NULL ) AS SR ON SD.RowId = SR.RowId WHERE SD.CellValue IS NULL UNION SELECT SD.RowId, SD.ColumnId,SC.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( SELECT DISTINCT ColumnId,CellValue FROM dbo.Sudoko WHERE CellValue IS NOT NULL ) AS SC ON SD.ColumnId = SC.ColumnId WHERE SD.CellValue IS NULL UNION SELECT SD.RowId, SD.ColumnId,CC.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( SELECT DISTINCT (RowId-1)/3+1 AS RowCellId, (ColumnId-1)/3+1 AS ColumnCellId, CellValue FROM dbo.Sudoko WHERE CellValue IS NOT NULL ) AS CC ON (SD.ColumnId-1)/3+1 = CC.ColumnCellId AND (SD.RowId-1)/3+1 = CC.RowCellId WHERE SD.CellValue IS NULL GO
We now have all those cell values that are directly blocked by existing data.
This isn't the whole story but it is a good start
Possible Values Part One
The list of possible values that can be inserted into any one cell must be the values still left to solve less those
identified in our "BlockValues" view.
To get to this we can define two new views as follows:
--------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='AllPossibleValues') DROP VIEW dbo.AllPossibleValues GO CREATE VIEW dbo.AllPossibleValues AS SELECT AllVal.RowId, AllVal.ColumnId , AllVal.CellValue FROM dbo.Sudoko AS SD INNER JOIN dbo.AllValues AS AllVal ON SD.RowId = AllVal.RowId AND SD.ColumnID = AllVal.ColumnID WHERE SD.CellValue IS NULL GO ---------------------------------------------------------------------
This first view simply produces a total list of all values for those cells that are currently blank.
--------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValues') DROP VIEW dbo.PossibleValues GO CREATE VIEW dbo.PossibleValues AS SELECT AllVal.RowId, AllVal.ColumnId , AllVal.CellValue FROM dbo.AllPossibleValues AS AllVal LEFT JOIN dbo.BlockedValues AS Block ON AllVal.RowId = Block.RowId AND AllVal.ColumnId = Block.ColumnId AND AllVal.CellValue = Block.CellValue WHERE Block.RowId IS NULL GO
This 2nd view excludes those values that are deemed to be blocked.
Our first solving query
What we want to do is to populate any cells in our Sudoko puzzle where there is only one possible value,
that is 8 out of 9 values are blocked.
Of course if we solve a cell (or hopefully cells) this means that our list of possible values will be
reduced for other cells and therefore we want to keep hitting our query until it can solve no more values. This means
we need some code that says
WHILE @@ROWCOUNT >0 BEGIN ... exec dbo.OurSolvingSP END
Firstly we must identify those cells for which there is only one solution.
SELECT RowId, ColumnId, MIN(CellValue) AS CellValue FROM dbo.PossibleValues GROUP BY RowId, ColumnId HAVING COUNT(*)=1
Our stored procedure becomes the following:
------------------------------------------------------------ IF EXISTS(SELECT 1 FROM sys.procedures where name = 'SolveSingleCells') DROP PROC dbo.SolveSingleCells GO CREATE PROCEDURE dbo.SolveSingleCells AS SET NOCOUNT ON DECLARE @RecordCount1 INT , @RecordCount2 INT , @TotalCount INT SET @TotalCount=1 WHILE @TotalCount>0 BEGIN -- Update the Sudoko square where only a single solution exists -- for any one cell UPDATE SD SET SD.CellValue = DT.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( SELECT RowId, ColumnId, MIN(CellValue) AS CellValue FROM dbo.PossibleValues GROUP BY RowId, ColumnId HAVING COUNT(*)=1 ) AS DT ON SD.RowId = DT.RowId AND SD.ColumnId = DT.ColumnId SET @TotalCount= @@ROWCOUNT END GO
In a simple puzzle this will solve quite a few values however our puzzle is a little more complicated than that
so we must think again.
A 2nd attempt
Look at the first 3x3 cell of our grid again. We know that there are no cells with a single possible solution
so we must turn the problem on its head and look for occurrences where a value can only fit in a single cell.
Consider the number 7. The shaded square show where figure 7 cannot go and as you can see there is only one
possible cell where 7 is not blocked.
Using the same method we can see that in the bottom right hand 3x3 cell there is also only one cell where 7 is not blocked.
|
|
| |||||||||||||||||||||||||||
|
|
| |||||||||||||||||||||||||||
|
|
|
Clearly we need a query that can identify these occurrences and that we can incorporate into our solving stored procedure!
Our first step is to build a view that can determine the number of different cell values that can go in any particular
cell.
IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValuesByCell') DROP VIEW dbo.PossibleValuesByCell GO CREATE VIEW dbo.PossibleValuesByCell AS SELECT (PV.RowId-1)/3+1 AS RowCellId, (PV.ColumnId-1)/3+1 AS ColumnCellId, PV.CellValue, COUNT(*) AS Occurrences FROM dbo.possiblevalues AS PV GROUP BY (PV.RowId-1)/3+1 , (PV.ColumnId-1)/3+1 , PV.CellValue GO
Our update statement therefore becomes the following
UPDATE SD SET SD.CellValue = PV.CellValue FROM dbo.Sudoko AS SD INNER JOIN dbo.possiblevalues AS PV ON SD.RowId = PV.RowId AND SD.ColumnId = PV.ColumnID INNER JOIN PossibleValuesByCell AS PC ON PC.RowCellId = (PV.RowId-1)/3+1 AND PC.ColumnCellId = (PV.ColumnId-1)/3+1 AND PC.CellValue = PV.CellValue WHERE PC.Occurrences=1
Wrapping this up into a single stored procedure becomes the following
------------------------------------------------------------ IF EXISTS(SELECT 1 FROM sys.procedures where name = 'SolveSingleCells') DROP PROC dbo.SolveSingleCells GO CREATE PROCEDURE dbo.SolveSingleCells AS SET NOCOUNT ON DECLARE @RecordCount1 INT , @RecordCount2 INT , @TotalCount INT SET @TotalCount=1 WHILE @TotalCount>0 BEGIN -- Update the Sudoko square where only a single solution exists -- for any one cell UPDATE SD SET SD.CellValue = DT.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( SELECT RowId, ColumnId, MIN(CellValue) AS CellValue FROM dbo.PossibleValues GROUP BY RowId, ColumnId HAVING COUNT(*)=1 ) AS DT ON SD.RowId = DT.RowId AND SD.ColumnId = DT.ColumnId SET @RecordCount1 = @@ROWCOUNT UPDATE SD SET SD.CellValue = PV.CellValue FROM dbo.Sudoko AS SD INNER JOIN dbo.PossibleValues AS PV ON SD.RowId = PV.RowId AND SD.ColumnId = PV.ColumnID INNER JOIN PossibleValuesByCell AS PC ON PC.RowCellId = (PV.RowId-1)/3+1 AND PC.ColumnCellId = (PV.ColumnId-1)/3+1 AND PC.CellValue = PV.CellValue WHERE PC.Occurrences=1 SET @RecordCount2 = @@ROWCOUNT SET @TotalCount = @RecordCount1+@RecordCount2 -- Display the number of solved values. RAISERROR('Single cells solved = %d',10,1,@TotalCount) END RETURN @TotalCount GO
By itterating through this procedure until it can solve no other cells we can see that it will fill in
quite a few of the gaps in our puzzle.
|
|
| |||||||||||||||||||||||||||
|
|
| |||||||||||||||||||||||||||
|
|
|
Solving double values
The cells that are left have multiple possible solutions so we need something that will evaluate these cells.
If we look at the row 3 and 4 for column 6 we can see that they are both limited to the same pair of values, 2 &9.
This means that nothing else in that column can take either of these values. If any other cells did take one of these
values it would mean that both of our cells would have exactly the same value, which is against the rules.
What we need is a query that blocks these values from our list of possible values for other cells.
Our next step is to create a view that identifies those cells that have only two possible solutions.
--------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='TwinValues') DROP VIEW dbo.TwinValues GO CREATE VIEW dbo.TwinValues AS SELECT RowId, ColumnId,MIN(CellValue) AS Value1, MAX(CellValue) AS Value2 FROM dbo.PossibleValues GROUP BY RowId, ColumnId HAVING COUNT(*)=2 GO ---------------------------------------------------------------------
Then we want to identify where a pair of values is shared in the same column.
--------------------------------------------------------------------- SELECT T1.ColumnId, MIN(T1.RowId) AS RowId1, MAX(T1.RowId) AS RowId2, T1.Value1, T1.Value2 FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2 ON T1.ColumnId = T2.ColumnId -- Must be in the same column AND T1.Value1 = T2.Value1 -- Must have an identical 1st value AND T1.Value2 = T2.Value2 -- Must have an identical 2nd value WHERE T1.RowId<>T2.RowId -- Must be in a different row. GROUP BY T1.ColumnId,T1.Value1, T1.Value2 ---------------------------------------------------------------------
Finally we must combined this back with our possible values so that we can identify any cells in our column, other
than the ones we have identified that have one or more of these values.
--------------------------------------------------------------------- SELECT PV.RowId , PV.ColumnId, PV.CellValue FROM dbo.PossibleValues AS PV INNER JOIN ( SELECT T1.ColumnId, MIN(T1.RowId) AS RowId1, MAX(T1.RowId) AS RowId2, T1.Value1, T1.Value2 FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2 ON T1.ColumnId = T2.ColumnId AND T1.Value1 = T2.Value1 AND T1.Value2 = T2.Value2 WHERE T1.RowId<>T2.RowId GROUP BY T1.ColumnId,T1.Value1, T1.Value2 ) AS DT ON PV.ColumnId = DT.ColumnId -- Must be in the same column AND PV.CellValue IN (DT.Value1, DT.Value2)-- Must have at least one of the values AND PV.RowId NOT IN (DT.RowId1,DT.RowId2)-- Must not be in the same cells as our paired value cells. ---------------------------------------------------------------------
Blocked Values Part Two
The previous query will form the basis for our 2nd blocked values view. This new view will contain the following:
- Our query identifying values from pairs which occur in column cells other than our paired cells
- A similar query identifying values from pairs which occur in row cells other than our paired cells
- The values from our original BlockedValues view
Putting it all together we can come up with a 2nd stage blocked view
--------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='BlockedValuesStage2') DROP VIEW dbo.BlockedValuesStage2 GO --##SUMMARY Values that cannot be inserted into cells CREATE VIEW dbo.BlockedValuesStage2 AS -- Identify cell values blocked because they exist in a column pair SELECT PV.RowId , PV.ColumnId, PV.CellValue FROM dbo.PossibleValues AS PV INNER JOIN ( SELECT T1.ColumnId, MIN(T1.RowId) AS RowId1, MAX(T1.RowId) AS RowId2, T1.Value1, T1.Value2 FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2 ON T1.ColumnId = T2.ColumnId AND T1.Value1 = T2.Value1 AND T1.Value2 = T2.Value2 WHERE T1.RowId<>T2.RowId GROUP BY T1.ColumnId,T1.Value1, T1.Value2 ) AS DT ON PV.ColumnId = DT.ColumnId AND PV.CellValue IN (DT.Value1, DT.Value2) AND PV.RowId NOT IN (DT.RowId1,DT.RowId2) UNION -- Identify cell values blocked because they exist in a row pair SELECT PV.RowId , PV.ColumnId, PV.CellValue FROM dbo.PossibleValues AS PV INNER JOIN ( SELECT T1.RowId, MIN(T1.ColumnId) AS ColumnId1, MAX(T1.ColumnId) AS ColumnId2, T1.Value1, T1.Value2 FROM dbo.twinvalues AS T1 INNER JOIN dbo.twinvalues AS T2 ON T1.RowId = T2.RowId AND T1.Value1 = T2.Value1 AND T1.Value2 = T2.Value2 WHERE T1.ColumnId<>T2.ColumnId GROUP BY T1.RowId,T1.Value1, T1.Value2 ) AS DT ON PV.RowId = DT.RowId AND PV.CellValue IN (DT.Value1, DT.Value2) AND PV.ColumnId NOT IN (DT.ColumnId1,DT.ColumnId2) UNION -- Identify cells from the original blocked query. SELECT RowId , ColumnId, CellValue FROM dbo.BlockedValues GO
A case to ponder
There is one other case that we could consider. Where a paired value exists in the same 3x3 cell but does not share
a row or column. This would indeed pose a challenge as a query because, where as a column pair or row pair share a common
axis, this instance does not share any easily encoded values.
Let us suppose that row 1, column 1 shares the same pairing as row 3 column 3. Writing a query that would investigate
values not in these two cells would be quite tricky. The ways that the preceding queries work would cause row 1 column 3
and row 3 column 1 to be blocked.
There are two reasons why we don't need to worry about this query
- It would only affect the diagonal 3x3 cells 1,1 & 2,2 & 3,3. That is, you cannot transpose row 3 column 4 with
column 4 row 3 as these are in 3x3 cells 1,2 and 2,1 respectively
- The other blocked values actually work around the whole query making our cell test irrelevant.
At worst we end up
with an extra couple of itterations in our loop but the solution remains the same.
Possible Values Part Two
Of course, now we have an additional blocked values query we also need a 2nd stage PossibleValues query.
This will be similar to the first version but will be dbo.PossibleValues minus dbo.BlockedValuesStage2
--------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValuesStage2') DROP VIEW dbo.PossibleValuesStage2 GO CREATE VIEW dbo.PossibleValuesStage2 AS SELECT AllVal.RowId, AllVal.ColumnId , AllVal.CellValue FROM dbo.PossibleValues AS AllVal LEFT JOIN dbo.BlockedValuesStage2 AS Block ON AllVal.RowId = Block.RowId AND AllVal.ColumnId = Block.ColumnId AND AllVal.CellValue = Block.CellValue WHERE Block.RowId IS NULL GO ---------------------------------------------------------------------
Now we can revisit our dbo.SolveSingleCells procedure and dbo.PossibleValuesByCell view and replace
references to dbo.PossibleValues with dbo.PossibleValuesStage2
--------------------------------------------------------------------- IF EXISTS (SELECT * FROM sys.views WHERE Name='PossibleValuesByCell') DROP VIEW dbo.PossibleValuesByCell GO CREATE VIEW dbo.PossibleValuesByCell AS SELECT (PV.RowId-1)/3+1 AS RowCellId, (PV.ColumnId-1)/3+1 AS ColumnCellId, PV.CellValue, COUNT(*) AS Occurrences FROM dbo.PossibleValuesStage2 AS PV GROUP BY (PV.RowId-1)/3+1 , (PV.ColumnId-1)/3+1 , PV.CellValue GO ------------------------------------------------------------ IF EXISTS(SELECT 1 FROM sys.procedures where name = 'SolveSingleCells') DROP PROC dbo.SolveSingleCells GO CREATE PROCEDURE dbo.SolveSingleCells AS SET NOCOUNT ON DECLARE @RecordCount1 INT , @RecordCount2 INT , @TotalCount INT SET @TotalCount=1 WHILE @TotalCount>0 BEGIN -- Update the Sudoko square where only a single solution exists -- for any one cell UPDATE SD SET SD.CellValue = DT.CellValue FROM dbo.Sudoko AS SD INNER JOIN ( SELECT RowId, ColumnId, MIN(CellValue) AS CellValue FROM dbo.PossibleValuesStage2 GROUP BY RowId, ColumnId HAVING COUNT(*)=1 ) AS DT ON SD.RowId = DT.RowId AND SD.ColumnId = DT.ColumnId SET @RecordCount1 = @@ROWCOUNT UPDATE SD SET SD.CellValue = PV.CellValue FROM dbo.Sudoko AS SD INNER JOIN dbo.PossibleValuesStage2 AS PV ON SD.RowId = PV.RowId AND SD.ColumnId = PV.ColumnID INNER JOIN PossibleValuesByCell AS PC ON PC.RowCellId = (PV.RowId-1)/3+1 AND PC.ColumnCellId = (PV.ColumnId-1)/3+1 AND PC.CellValue = PV.CellValue WHERE PC.Occurrences=1 SET @RecordCount2 = @@ROWCOUNT SET @TotalCount = @RecordCount1+@RecordCount2 RAISERROR('Single cells solved = %d',10,1,@TotalCount) -- If any updates occurred then remove them from the possible values list. IF @TotalCount>0 exec dbo.RemoveSolvedCells END RETURN @TotalCount GO ------------------------------------------------------------
Wrapping it all up
Perhaps the final stage is to write two short views to ensure that the puzzle is truly solved.
The following view should never return any records if the solution is valid. It attempts to find the
following occurrences
- Rows where a single value occurs more than once.
- Columns where a single value occurs more than once.
------------------------------------------------------------ IF EXISTS (SELECT * FROM sys.views WHERE Name='ValidateSolution') DROP VIEW dbo.ValidateSolution GO CREATE VIEW dbo.ValidateSolution AS SELECT RowId AS IndexValue,'Row' AS Item,cellvalue FROM dbo.sudoko GROUP BY rowid,cellvalue HAVING count(*)>1 UNION ALL SELECT ColumnId AS IndexValue,'Column' AS Item,cellvalue FROM dbo.sudoko GROUP BY ColumnId,cellvalue HAVING count(*)>1 GO ------------------------------------------------------------
|
|
| |||||||||||||||||||||||||||
|
|
| |||||||||||||||||||||||||||
|
|
|
Concluding thoughts
I have tested this routine on the Sudoko puzzles in the UK's "The Independent" newspaper over the past week
or so and in each case it has produced the correct solution.
I see no reason why this Sudoko solver shouldn't be extended to solve any size of puzzle thought this would almost
certainly move away from the single solving stored procedure.
Of course it is one thing to solve a Sudoko puzzle but the real challenge would be to come up with the necessary
SQL code to generate the solution!