Blog Post

Sudoku by SQL Part 2 of 3

,

The Challenge

Since I came across the first sudoku I was wondering if it could by solved by SQL using queries that select just the right value for a cell.

Well, some day I tried, here is what I have got now.

Previous Part

In part one I introduced tables, views and triggers. In this part comes the logic.

Rules

Single Candidate

CREATEPROCEDURE ApplyRuleSingleCandidate (@rows SMALLINT OUTPUT) AS
BEGIN
  -- if there is only one candidate for a cell, the cell has this value
  INSERT INTO Cell (CellID, [Rule], Value)
  SELECT t.CellID, t.[Rule], t.Value
  FROM (
    SELECT c.CellID, 'Single Candidate' AS[Rule], MAX(c.Value) AS Value
    FROMCandidate c
    GROUP BY c.CellID
    HAVING COUNT(*)=1
  ) t
  SET @rows=@@ROWCOUNT
  IF EXISTS(SELECT 2 FROM Violation)
   THROW 50001, 'Invalid value in cells', 1
END;

The main point is COUNT(*)=1. There is only one candidate for a cell. As aggregation I choose MAX, as there is only one row, MIN would do the same. For traceability I use the string 'Single Candidate' as the rule. After inserting I check if any violation exists.

How can violations happen?

If we operate pure rule based, no violation should happen. But if we implement guessing we have to check if a guess leads us to a dead end.

Single Token In Group

CREATEPROCEDURE ApplyRuleSingleTokenInGroup (@rows SMALLINT OUTPUT) AS
BEGIN
  -- if, for a given value, there is only one cell in a row, column or square as a candidate for this value, it has to be there
  INSERT INTO Cell (CellID, [Rule], Value)
  SELECT CellID, [Rule], Value
  FROM (
    SELECT MAX(c.CellID) AS CellID
    , 'Single Token in ' + CASE WHEN l.GroupType='C' THEN 'Column' WHEN l.GroupType='R' then 'Row' ELSE 'Square' END AS [Rule]
    , c.Value
    , ROW_NUMBER() OVER(PARTITION BY MAX(c.CellID) ORDER BY l.GroupType) AS RowNumber
    FROMCandidate c
    INNER JOIN Location l ON c.CellID=l.CellID
    GROUP BY l.GroupType, l.GroupValue, c.Value
    HAVING COUNT(*)=1
    ) t
  WHERERowNumber=1 -- we need this to avoid duplicate inserts due to more than one rule that applies
  SET @rows=@@ROWCOUNT
  IF EXISTS(SELECT 2 FROM Violation)
   THROW 50001, 'Invalid value in cells', 1
END;

Again, the main point ist COUNT(*)=1. But grouping is now by value and unit. Unit means row, column or square. The encapsulation in a subquery and restriction to RowNumber=1 is only to avoid duplicate key violation if a value unique in more then one unit, e.g. a row and a column.

Eliminating candidates

The two rules above are all rules that give us a value for a cell that I can think of. If you can think of another, let me know.
Further progress in solving relies on rules that eliminate candidates.

Complete Match

CREATEPROCEDURE RemoveCandidateCompleteMatch (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE c
  FROMCandidate c
  INNER JOIN CandidateMask m ONc.CellID=m.CellID
  INNER JOIN Location l on m.CellID=l.CellID
  INNER JOIN (
    SELECT l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    FROM SetAll s
    INNER JOIN CandidateMask c1 ONs.SetValue = c1.Mask
    INNER JOIN Location l1 ONc1.CellID=l1.CellID
    WHERE s.Size NOT IN (1, 9)
    GROUP BY l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    HAVING COUNT(*) = s.Size
  ) h ON h.GroupType=l.GroupType AND h.GroupValue=l.GroupValue
  WHERE m.Mask != h.SetValue
  AND c.Mask & h.SetValue != 0
  SET @rows=@@ROWCOUNT
END;

If we find a set of candidate values, lets say a set with two members, in a unit exactly two times, the values of this set cant be anywhere else in the unit.
Example: candidate set 1,3 in row 4 two times => 1 and 3 can be removed as candidates from all other cells in row 4.

Partial Match

CREATEPROCEDURE RemoveCandidatePartialSet (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE c
  FROMCandidate c
  INNER JOIN CandidateMask m onc.CellID=m.CellID
  INNER JOIN Location l on m.CellID=l.CellID
  INNER JOIN (
    SELECT l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    FROM SetAll s
    INNER JOIN CandidateMask c1 ONs.SetValue &c1.Mask = c1.Mask AND ~s.SetValue & c1.Mask = 0
    INNER JOIN Location l1 ONc1.CellID=l1.CellID
    WHERE s.Size NOT IN (1, 9)
    GROUP BY l1.GroupType, l1.GroupValue, s.SetValue, s.Size
    HAVING COUNT(*) = s.Size
  ) h ON h.GroupType=l.GroupType AND h.GroupValue=l.GroupValue
  WHERE ~h.SetValue & m.Mask != 0 AND c.Mask & h.SetValue != 0
  SET @rows=@@ROWCOUNT
END;

 If a part of a candidate set, lets say with 3 members, are the only candidate members for 3 cells in a unit, we can remove the members of the set from all other candidate values for the unit.
Example: 1,3 is in cell 11, 1,4 is in cell 13 and 3,4 is in cell 22. Our set is 1,3,4, our square is 1, we can remove the values 1,3 and 4 from all candidates from square 1.

Same Row Or Column And Square  

CREATEPROCEDURERemoveCandidateValueInSameRowOrColumnAndSquare (@rows SMALLINT OUTPUT) AS
BEGIN
  DELETE FROM c
  FROMCandidate c
  INNER  JOIN Location l ON c.CellID=l.CellID
  INNER  JOINLocationWide s ON c.CellID=s.CellID
  INNER JOIN (
    SELECT c.Value, ls.[Square], lrc.GroupType, MAX(lrc.GroupValue) as GroupValue
    FROMCandidate c
    INNER JOIN LocationWide ls ONc.CellID=ls.CellID
    INNER JOIN Location lrc ONc.CellID=lrc.CellID AND lrc.GroupType IN ('C','R')
    GROUP BY c.Value, ls.[Square], lrc.GroupType
    HAVING COUNT(DISTINCT ls.[Square])=1 AND COUNT(DISTINCT lrc.GroupValue)=1
  ) a ON a.Value=c.Value AND l.GroupType=a.GroupType AND l.GroupValue=a.GroupValue
  WHERE s.[Square] != a.[Square]
  SET @rows=@@ROWCOUNT
END;

If a candidate exists within a square multiple times, but only in one row or column, the candidate value can be removed for cells in this row but in another square.
Example: candidate value 3 exists in cell 12 and in cell 13 but nowhere else in square 1. candidate value 3 can be removed for cells 14, 15, 16, 17, 18, 19.

Executing Rules

Now we can execute the rules until they do not affect any rows.
  DECLARE@cells INT
  DECLARE@affected INT
  SET @affected=0
  SET @cells=1
  WHILE @cells>0
  BEGIN
    WHILE@cells>0
    BEGIN
      EXECApplyRuleSingleCandidate @cells OUTPUT
      print SPACE(@@nestlevel) + 'Iteration ' + CAST(@iteration AS VARCHAR(5)) + ' ApplyRuleSingleCandidate solves ' + CAST(@cells AS VARCHAR(5)) + ' cells.'
     
      EXECApplyRuleSingleTokenInGroup @affected OUTPUT
      print SPACE(@@nestlevel) + 'Iteration ' + CAST(@iteration AS VARCHAR(5)) + ' ApplyRuleSingleTokenInGroup solves ' + CAST(@affected AS VARCHAR(5)) + ' cells.'
      SET@cells=@cells+@affected
      SET@iteration=@iteration+1
    END
    -- execution of rules that give values directly ends here
    -- now we apply rules that remove candidates
    EXECRemoveCandidateCompleteMatch @cells OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@cells AS VARCHAR(10)) + ' candidates due to complete match in same group'
    EXECRemoveCandidatePartialSet @affected OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@affected AS VARCHAR(10)) + ' candidates due to exclusive partial sets in same group'
    SET @cells=@cells+@affected
    EXECRemoveCandidateValueInSameRowOrColumnAndSquare @affected OUTPUT
    PRINT SPACE(@@nestlevel) + 'Removed ' + CAST(@affected AS VARCHAR(10)) + ' candidates due to candidate values in a square in only one row or column'
    SET @cells=@cells+@affected
  END

 So that is all we can do right now. If you know more rules, let me know.

Guessing

But I found some sudokus that could not be solved be these rules. I was more curious about implementing guessing than implementing another rule. With guessing, I wanted to make the process as following: 
  1. Try solving by rules
  2. If that does not work out, make up a list of choices. Take the cell with most candidates, fill in the first choice. Try step 1. If it does not solve the sudoku, try the next choice.

You see, it is going to be recursive. Here is the code:

  DECLARE@guess TABLE (CellID SMALLINT, Value SMALLINT)
  INSERT INTO @guess (CellID, Value)
  SELECT c.CellID, c.Value
  FROMCandidate c
  WHERE c.CellID = (SELECT TOP 1 cm.CellID
                    FROMCandidateMask cm
                    INNERJOIN SetAll s ONs.SetValue=cm.Mask
                    ORDERBY s.Size DESC, cm.CellID )
  ORDER BY c.Value
  DECLARE@cellID SMALLINT,@value SMALLINT,@count SMALLINT
  WHILE (SELECT COUNT(*) FROM Cell)<81
  BEGIN
    SET @count=(SELECT COUNT(*) FROM @guess)
    IF (@count=0)
      THROW50002, 'Error in guessing, all candidates used but not solved.',1
    IF NOT EXISTS(SELECT 2 FROMCandidate)
      THROW50002, 'Error in guessing, no candidates.', 1
    SELECT TOP 1 @cellID=g.CellID, @value=g.Value
    FROM @guess g
    ORDER BY g.CellID, g.Value
    DELETE FROM @guess
    WHERECellID=@cellID ANDValue=@value
    PRINT SPACE(@@nestlevel) + 'nestlevel ' + CAST(@@NESTLEVEL as VARCHAR(10)) +': guessing value ' + CAST(@value as CHAR(1)) + ' in cell ' + CAST(@cellID as VARCHAR(5))
    BEGIN TRY
      SAVE TRANSACTION Guessing
      INSERT INTO Cell(CellID, Value, [Rule])
      VALUES (@cellID, @value, 'Guess')
      EXECTrySolve @iteration
    END TRY
    BEGIN CATCH
      PRINT SPACE(@@nestlevel) + 'rolling back transaction due to ' + ERROR_MESSAGE()
      ROLLBACK TRANSACTION Guessing
      IF (ERROR_NUMBER() NOT IN (50001, 50002))
        THROW
    END CATCH
  END
END;

This code together with the code above make up the procedure TrySolve, with is called recursively at EXEC TrySolve @iteration.

Next Part

In the next part I will introduce some convenience features for entering start data and executing the solver.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating