I may be carbon-dating myself here, but one of the first things I implemented on my Apple IIe in the early 80s was John Conway's Game of Life. If you're not familiar with it, it's a cellular automaton: a mathematical game played on an infinite square grid of spaces, each of which may hold a cell. Every generation, cells are born or die according to very simple rules. Despite the simple rules, very complex structures can emerge. See the wiki page for more information.
All the code here can be found on github: https://github.com/pauljchang/sandbox/blob/master/life/life.sql
The rules for the game are:
- Any empty space with exactly three neighbours gives "birth" to a new cell.
- Any cell with 0 or 1 neighbours "dies" from starvation and is removed.
- Any cell with 4 or more neighbours "dies" from overcrowding and is removed.
- Any cell with 2 or 3 neighbours survives to the next generation.
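As a rough sketch, the rules can be summarised in a single CASE expression, with survival (two or three neighbours) as the fall-through case. The `is_alive` and `neighbours` columns and the sample rows are invented purely for illustration; they are not part of the code on github.

```sql
-- Illustrative only: is_alive and neighbours are hypothetical columns,
-- fed from a handful of hand-written sample rows.
SELECT
    s.is_alive
,   s.neighbours
,   CASE
        WHEN s.is_alive = 0 AND s.neighbours = 3 THEN 'birth'
        WHEN s.is_alive = 0                      THEN 'stays empty'
        WHEN s.neighbours <= 1                   THEN 'dies (starvation)'
        WHEN s.neighbours >= 4                   THEN 'dies (overcrowding)'
        ELSE 'survives'
    END AS outcome
FROM (
    VALUES (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (1, 4)
) AS s (is_alive, neighbours)
ORDER BY
    s.is_alive
,   s.neighbours;
```

The implementation below never needs the "stays empty" or "survives" branches: it only inserts births and deletes deaths.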
For fun, I've implemented this in BASIC and C++, and maybe even Java, but I thought I'd share my MSSQL implementation, as it uses some interesting concepts.
First of all, instead of using an array, we are going to create a table that contains one row for every cell in the field. This may seem like overkill, but for very sparse fields, where just a few cells occupy a very large area, it can be surprisingly efficient, and more efficient than a very large two-dimensional array.
So, let's see the table:
    --
    -- dbo.cell -- Table to hold all cells
    --
    -- Each cell has an entry in the dbo.cell table
    -- It knows its (x,y) coordinates, as well as the generation number
    -- The generation number is useful for diagnostics
    -- and indicates how "old" a cell is
    --
    CREATE TABLE dbo.cell (
        x   SMALLINT NOT NULL
    ,   y   SMALLINT NOT NULL
    ,   gen SMALLINT NOT NULL
    );
    CREATE UNIQUE CLUSTERED INDEX KUX_cell ON dbo.cell (y, x);
    CREATE UNIQUE INDEX UX_cell_x_y ON dbo.cell (x, y) INCLUDE (gen);
    CREATE INDEX X_cell_gen ON dbo.cell (gen, y, x);
You might notice that I clustered the table along the y-axis, which will facilitate printing out the field for display. I also included other indices that will help later on as we step through generations of cells.
There is also a "gen" field for the generation in which each cell was born. This tells us the age of each cell, and it also lets us distinguish newly "born" cells from cells that were already there.
Let's also seed the field with the classic R-pentomino shape. This happens to be one of the smallest patterns that grows into a huge shape and doesn't stabilise for many generations. Here's what we want to seed:
       +---+
     1 | 00|
     0 |00 |
    -1 | 0 |
       +---+
The numbers on the left are the y-axis, and I didn't think it was necessary to display the x-axis. Here's the code to seed the table:
    INSERT INTO dbo.cell (x, y, gen)
    VALUES
        ( 0,  1, 0)
    ,   ( 1,  1, 0)
    ,   (-1,  0, 0)
    ,   ( 0,  0, 0)
    ,   ( 0, -1, 0)
    ;
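To check the seed, the field can be rendered straight from the table. The following is only a sketch, not the "print_cells" procedure from the repository: it assumes SQL Server 2017 or later (for `STRING_AGG`), hard-codes a 3x3 window, and the `xs` and `ys` CTEs are names invented here. It also assumes each cell is drawn as the last digit of its generation.

```sql
-- xs and ys are hypothetical helper CTEs listing the columns and rows to draw
WITH xs (x) AS (
    SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
)
, ys (y) AS (
    SELECT -1 UNION ALL SELECT 0 UNION ALL SELECT 1
)
SELECT
    ys.y
    -- One character per column: a space for an empty space, a digit for a cell
,   STRING_AGG(
        CASE WHEN c.gen IS NULL THEN ' ' ELSE CAST(c.gen % 10 AS CHAR(1)) END
    ,   ''
    ) WITHIN GROUP (ORDER BY xs.x) AS row_text
FROM
    ys
CROSS JOIN
    xs
LEFT JOIN
    dbo.cell AS c
    ON  c.x = xs.x
    AND c.y = ys.y
GROUP BY
    ys.y
ORDER BY
    ys.y DESC;
```

With the five seed rows in place, this should return the three rows of the picture above, from y = 1 down to y = -1.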
In order to step through, there are two calculations we must make. One, we have to see if any new cells will be born with the existing cells on the field. And two, we have to see if any of the existing cells (not the newly born cells) will die.
To do this, I use CTEs (Common Table Expressions), which are like named, temporary views: they exist only for the duration of a single statement, but that statement can reference them by name, more than once if need be.
The first CTE I create is "delta", which allows me to look at neighbouring cells or spaces in all eight directions. Just one delta CTE can be referenced multiple times, which is very convenient.
    WITH delta (num) AS (
        SELECT CAST(-1 AS SMALLINT) AS num UNION ALL
        SELECT CAST( 0 AS SMALLINT) AS num UNION ALL
        SELECT CAST( 1 AS SMALLINT) AS num
    )
...
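To see what "delta" buys us, here is a small standalone query (a sketch, not taken from the repository) that cross joins the CTE with itself. The `dx` and `dy` aliases are invented for this example. It should return nine offsets: the eight directions plus (0, 0), which is the cell itself and has to be accounted for by the queries that follow.

```sql
WITH delta (num) AS (
    SELECT CAST(-1 AS SMALLINT) UNION ALL
    SELECT CAST( 0 AS SMALLINT) UNION ALL
    SELECT CAST( 1 AS SMALLINT)
)
SELECT
    delta_x.num AS dx   -- horizontal offset
,   delta_y.num AS dy   -- vertical offset
FROM
    delta AS delta_x
CROSS JOIN
    delta AS delta_y
ORDER BY
    dy DESC
,   dx;
```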
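The next CTE generates the set of empty spaces that neighbour existing cells. It starts from the current set of cells and looks left, right, up, down, and diagonally for empty spaces. Here, we can see how "delta" is used: cross joining it with itself gives every combination of x and y offsets. It's a pretty simple query -- return the distinct neighbouring spaces that don't already have a cell. (The zero offset, which is the cell itself, is filtered out by the same NOT EXISTS check.)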
    ...
    , empty_neighbour (x, y) AS (
        SELECT DISTINCT
            dbo.cell.x + delta_x.num AS x
        ,   dbo.cell.y + delta_y.num AS y
        FROM
            dbo.cell
        CROSS JOIN
            delta AS delta_x
        CROSS JOIN
            delta AS delta_y
        WHERE
            NOT EXISTS (
                SELECT *
                FROM dbo.cell AS other_cell
                WHERE other_cell.x = dbo.cell.x + delta_x.num
                AND   other_cell.y = dbo.cell.y + delta_y.num
            )
    )
    ...
And another CTE, but this one counts how many neighbouring cells exist for each of those empty spaces. Again, we use "delta" to look in eight directions to count neighbouring cells.
    ...
    , neighbour_count (x, y, neighbour_count) AS (
        SELECT
            empty_neighbour.x
        ,   empty_neighbour.y
            -- This expression is here to eliminate the silly NULL aggregation warning
            -- Otherwise, we could just COUNT(other_cell.gen)
        ,   COALESCE(SUM(CASE WHEN other_cell.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS neighbour_count
        FROM
            empty_neighbour
        CROSS JOIN
            delta AS delta_x
        CROSS JOIN
            delta AS delta_y
        LEFT JOIN
            dbo.cell AS other_cell
            ON  other_cell.x = empty_neighbour.x + delta_x.num
            AND other_cell.y = empty_neighbour.y + delta_y.num
        GROUP BY
            empty_neighbour.x
        ,   empty_neighbour.y
    )
    ...
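The funny COALESCE(SUM(CASE...)) expression is there because I'm trying to avoid the annoying "Warning: Null value is eliminated by an aggregate or other SET operation" message that COUNT(other_cell.gen) would trigger on the unmatched rows of the LEFT JOIN. Just think of it as counting the number of neighbours.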
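The difference between the two ways of counting can be seen in isolation. The sketch below uses a throwaway table variable, `@t`, invented for this demonstration. Both queries should return the same counts (1 for k = 1 and 0 for k = 2), but with the default ANSI_WARNINGS setting only the first one should raise the warning.

```sql
-- @t is a throwaway table variable invented for this demonstration
DECLARE @t TABLE (k INT NOT NULL, v INT NULL);

INSERT INTO @t (k, v)
VALUES (1, 10), (1, NULL), (2, NULL);

-- COUNT(v) has to skip the NULLs, which is what raises the warning
SELECT
    k
,   COUNT(v) AS non_null_count
FROM @t
GROUP BY k
ORDER BY k;

-- The CASE turns every row into 0 or 1 first, so SUM never sees a NULL
SELECT
    k
,   COALESCE(SUM(CASE WHEN v IS NOT NULL THEN 1 ELSE 0 END), 0) AS non_null_count
FROM @t
GROUP BY k
ORDER BY k;
```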
And finally, the INSERT. We insert with the current generation number to distinguish the newly "born" cells from older cells. This will be useful in the DELETE query.
    ...
    INSERT INTO dbo.cell (x, y, gen)
    SELECT
        neighbour_count.x
    ,   neighbour_count.y
    ,   @gen AS gen
    FROM
        neighbour_count
    WHERE
        neighbour_count.neighbour_count = 3
    ORDER BY
        neighbour_count.y
    ,   neighbour_count.x
    ;
    ...
After the INSERT, the field looks like this:

       +---+
     1 |100|
     0 |00 |
    -1 |10 |
       +---+
The "1" cells are the ones that we just inserted, based on the rule of empty spaces with three surrounding cells. But we still need to delete cells that are overcrowded or starved. In this case, the centre cell should be removed because it has four neighbours (not including the newly "born" ones).
Here, we merely count the neighbours of existing cells, minus newly "born" ones. Again, we use the "delta" CTE.
    ...
    , neighbour_count (x, y, neighbour_count) AS (
        SELECT
            dbo.cell.x
        ,   dbo.cell.y
            -- This expression is here to eliminate the silly NULL aggregation warning
            -- Otherwise, we could just COUNT(other_cell.gen)
        ,   COALESCE(SUM(CASE WHEN other_cell.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS neighbour_count
        FROM
            dbo.cell
        CROSS JOIN
            delta AS delta_x
        CROSS JOIN
            delta AS delta_y
        LEFT JOIN
            dbo.cell AS other_cell
            ON  other_cell.x = dbo.cell.x + delta_x.num
            AND other_cell.y = dbo.cell.y + delta_y.num
            -- Don't count the cells we just created
            AND other_cell.gen < @gen
            -- We don't want to count the cell itself, just neighbours
            AND (other_cell.x <> dbo.cell.x OR other_cell.y <> dbo.cell.y)
        WHERE
            -- Don't evaluate the cells we just created
            dbo.cell.gen < @gen
        GROUP BY
            dbo.cell.x
        ,   dbo.cell.y
    )
    ...
...and the final delete:
    ...
    DELETE dbo.cell
    FROM
        dbo.cell
    INNER JOIN
        neighbour_count
        ON  neighbour_count.x = dbo.cell.x
        AND neighbour_count.y = dbo.cell.y
        AND (
            neighbour_count.neighbour_count <= 1
        OR  neighbour_count.neighbour_count >= 4
        )
    ;
    ...
Now our field looks like the following:
       +---+
     1 |100|
     0 |0  |
    -1 |10 |
       +---+
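Because every row carries its generation, a quick sanity check after a step is to count cells by "gen". This is a diagnostic sketch rather than something taken from the repository; the `cell_count` alias is invented here. After the first step of the R-pentomino it should report four surviving generation-0 cells and two generation-1 cells, matching the picture above.

```sql
SELECT
    gen
,   COUNT(*) AS cell_count
FROM dbo.cell
GROUP BY gen
ORDER BY gen;
```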
If we step forward a few generations, we see the pattern grow:
       +----+
     2 |  2 |
     1 | 10 |
     0 |2  2|
    -1 | 10 |
       +----+

       +----+
     2 | 32 |
     1 | 103|
     0 |2  2|
    -1 | 10 |
       +----+

       +----+
     2 | 3 4|
     1 |4  3|
     0 |2  2|
    -1 | 10 |
       +----+

       +-----+
     2 |  5  |
     1 |45 35|
     0 |2  2 |
    -1 | 10  |
       +-----+
And after 100 generations, we see it grows, and keeps growing:
    [Field printout after 100 generations, spanning rows y = 12 down to y = -11]
The code on github contains two stored procedures -- "print_cells" to display the field, and "step_cells" to step one generation. It also contains code to set up the initial field with the R-pentomino example above. The code is written for MSSQL, but I've tried to stay close to standard ANSI SQL, so it should be reasonably easy to port this to other RDBMSs. (Strictly speaking, CTEs arrived in SQL:1999 rather than SQL-92, and the index definitions will need adjusting for each vendor.)
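I'm reasonably happy with this code. Array implementations slow down as a pattern spreads out and its bounding rectangle grows, even if most of that rectangle is empty. Here, the work per generation depends on the number of live cells, n, rather than on the area: each cell examines a fixed number of neighbouring positions, so the cost grows roughly as O(n), ignoring the cost of the index lookups.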
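One way to see the difference is to compare the number of stored cells with the area of the rectangle an array would have to cover. The query below is a rough sketch; the `live_cells` and `bounding_area` aliases are invented here. The casts to INT guard against SMALLINT overflow when the width and height are multiplied.

```sql
SELECT
    COUNT(*) AS live_cells
    -- Width times height of the smallest rectangle containing every cell
,   (CAST(MAX(x) AS INT) - MIN(x) + 1)
  * (CAST(MAX(y) AS INT) - MIN(y) + 1) AS bounding_area
FROM dbo.cell;
```

For the freshly seeded R-pentomino this should give 5 live cells in a bounding area of 9. The gap between the two numbers is what the table-based approach avoids paying for.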
Is there a better way to implement this in SQL? Of course, we all know that SQL is likely not the best way to implement this, but within the bounds of SQL, how else can we improve this?