Don’t Put All Your Eggs in One Basket – Use Cartons!
The Puzzle
You are the DBA for Farmer Brown, who runs a chicken farm. Or is it a chicken ranch? Here is Texas, that can be a legal issue. One side argues that it has animals and not plants, therefore it is ranch. The other side says it is a farm because a ranch has livestock, and livestock is defined by an animal that is herded to market. There are cattle drives, there are sheep drives but there are no chicken drives. There is a story about that disaster, but I digress.
Farmer Brown gathers eggs in baskets and the eggs are homogeneous. Each basket has an identifier which tells us where and when it was collected. Or are eggs harvested if it is a farm? But I digress again.
We do not put our eggs in one basket; we put them in dozen egg cartons. The dozen is an incredibly strong tradition. During the decimalization of the UK a dairy offered ‘decimal eggs’- a ten-pack of eggs. People would not buy it; it was just too weird, even when the cost per egg was less. I have a theory about the chickens being taught to lay eggs that way. However, Brits were happier to consider adjusting to metric beer when it was explained that a liter is more beer than a quart, before dismissing the idea as a Napoleonic trick.
We need table(s) for the baskets and the cartons. A basket comes in, and it is split into cartons with 12 or fewer eggs. This means that we will have one or zero cartons with fewer than 12 eggs in it. Example: the basket with 144 eggs becomes 12 cartons. The Basket with 255 eggs becomes 21 full cartons and one with 3 eggs.
- Your first job is to design the table(s)! Yes, this is a puzzle where you have to write DDL. Just doing SELECT/UPDATE/INSERT/DELETE is only part of SQL; most of the real work is in DDL. A bad schema design will force bad DML. A carton, for example, must have CHECK (COUNT(egg_cnt BETWEEN 1 AND 12) constraint.
- Split the baskets of eggs into cartons, each carton having its own identifier. This can be a statement or a stored procedure. Call this procedure or function SplitBaskets().
- Once the baskets are split into cartons, write a procedure to split a carton. You will need parameters for the source carton and the number of eggs moved. You need to create a new carton. Call this procedure or function SplitCarton(). You need to watch the size rules.
- Once the baskets are split into cartons, write a procedure to consolidate two cartons. You will need parameters for the source carton and the destination carton. Assume that the destination carton gets filled and the source carton decremented by the appropriate amount. I.e. (c1 = 10, c2 = 4) => (c1 = 12, c2 = 2). Call this procedure or function ConsolidateCartons(). Watch out for the empty carton problem.
The answer needed include the DDL as well as the DML, as the code is easier if you get the schema design right.
The competitors had to…
- Solve the problem — Duh!
- Avoid proprietary features in SQL Server that will not port or be good across all releases, present and future.
- Use Standard features in SQL Server that will be good across all releases, present and future. Extra points for porting code.
- Be clever but not obscure.
- Explain what they were doing
A discussion about the Stumper
I really liked my Egg Puzzle because it was a chance to show off a lot of newer programming tricks. I usually write ANSI Standard SQL and then translate it into local dialect, so bear with me.
Begin by modeling Baskets of eggs. We can assume that there is a basket identifier of some kind (basket_id) and that it has a certain number of eggs in it. The collection date and other information is not needed for the puzzle, but put him in there to remind us about that data.
1 2 3 4 5 6 7 |
CREATE TABLE Baskets (basket_id INTEGER NOT NULL PRIMARY KEY, collection_date DATE DEFAULT CURRENT_DATE NOT NULL, egg_cnt SMALLINT NOT NULL CHECK (egg_cnt >= 0), dozen_cnt GENERATED ALWAYS AS (FLOOR(egg_cnt/12)), partial_cnt GENERATED ALWAYS AS (MOD(egg_cnt, 12)) ); |
or in SQL Server
1 2 3 4 5 6 7 |
CREATE TABLE Baskets (basket_id INTEGER NOT NULL PRIMARY KEY, collection_date DATE DEFAULT GETDATE() NOT NULL, egg_cnt SMALLINT NOT NULL CHECK (egg_cnt>=0), dozen_cnt AS (FLOOR(egg_cnt/12)) PERSISTED, partial_cnt AS (egg_cnt%12) PERSISTED ) ; |
The number of possible full cartons is a computed value, as its the size of the partial, possibly empty, carton. But where do you compute these values? I think that Dkorzennik was correct to do this job in the table where the parameter lives. They can done with the new computed column feature; Standard SQL uses “GENERATED ALWAYS AS” instead of the “PERSISTED” syntax of SQL Server.
Integer math is preferred over using FLOOR() and CEILING() functions on decimal or floating point expressions.
Next, we need to model the cartons. I am making the assumption that a carton is identified by (basket_id, carton_id) because that is the pattern required by law for chickens, swine, shrimp, etc: RFID tags. Basically, animals are sold in ‘lots’ and not individually identified.
1 2 3 4 5 6 7 |
CREATE TABLE Cartons (basket_id INTEGER NOT NULL REFERENCES Baskets (basket_id) ON UPDATE CASCADE, carton_id INTEGER NOT NULL, PRIMARY KEY (basket_id, carton_id), egg_cnt SMALLINT CHECK (egg_cnt BETWEEN 0 AND 12)); |
Notice the DRI action on the key and a CHECK() on the egg count in each carton. The CHECK() might look redundant at first because the computed columns assure that dozen_cnt and partial_cnt are always between 0 and 12 when we get to the procedure to pack the cartons. Wrong. Firstly, this constraint assures that nobody can subvert the data. Secondly, the optimizer can use it, but cannot use the computations.
Now, let’s pack those cartons. Carton #0 will always be the partial carton from its basket that we cannot ship. That is a handy thing for a lot of queries, where we need to look at partial and full cartons. The parameter for the procedure is just the basket identifier; we have the other data we need already computed in the Baskets table. SQL Server people will need to turn the IN keyword into a @ prefix.
1 2 3 4 5 6 7 8 9 10 11 |
CREATE PROCEDURE Split_Cartons (IN in_basket_id INTEGER) INSERT INTO Cartons (basket_id, carton_id, egg_cnt) SELECT in_basket_id, 0, partial_cnt FROM Baskets WHERE basket_id = in_basket_id UNION ALL SELECT in_basket_id, S.seq, 12 FROM Series AS S, Baskets AS B WHERE B/basket_id = in_basket_id AND S.seq <= dozen_cnt; |
Or in SQL Server…
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE PROCEDURE Split_Cartons (@in_basket_id INTEGER) AS INSERT INTO Cartons (basket_id, carton_id, egg_cnt) SELECT @in_basket_id, 0, partial_cnt FROM Baskets WHERE basket_id=@in_basket_id UNION ALL SELECT @in_basket_id, S.seq, 12 FROM Series AS S,Baskets AS B WHERE B.basket_id=@in_basket_id AND S.seq<=dozen_cnt ; |
I assume we have the usual auxiliary table with a series of integers from 1 to (n). The first term of the UNION ALL created the partial, possibly empty, carton. I don’t want to have empty cartons, I can add a DELETE FROM statement after this or add a “.. AND partial_cnt > 0” predicate to the first SELECT.
The second SELECT returns the full cartons numbered 1 to dozen_cnt within the basket.
Here is a characteristic for good SQL programming: do as much work as you can in ONE statement, so the optimizer can have as much information as possible.
Now, the next procedure is supposed to move eggs from a source carton to the destination carton. Because SQL is a data language, I like data-driven solutions over computational ones. This is a state-change problem. There are “conservation of eggs” laws that have to be met:
- I cannot pack or unpack more than 12 eggs total to either carton.
- I cannot move more eggs than there are in the source carton.
- The total number of eggs is constant after they change locations.
So, we build a classic state-change table, with precondition and post-condition states.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE Consolidations (pre_dest_egg_cnt INTEGER NOT NULL CHECK (pre_dest_egg_cnt BETWEEN 0 AND 12), pre_src_egg_cnt INTEGER NOT NULL CHECK (pre_src_egg_cnt BETWEEN 0 AND 12), post_dest_egg_cnt INTEGER NOT NULL CHECK (post_dest_egg_cnt BETWEEN 0 AND 12), post_src_egg_cnt INTEGER NOT NULL CHECK (post_src_egg_cnt BETWEEN 0 AND 12), CONSTRAINT conservation_of_eggs CHECK (pre_dest_egg_cnt + pre_src_egg_cnt = post_dest_egg_cnt + post_src_egg_cnt), PRIMARY KEY (pre_dest_egg_cnt, pre_src_egg_cnt, post_dest_egg_cnt, post_src_egg_cnt) ); |
Or, in SQL Server…
1 2 3 4 5 6 7 8 9 10 11 |
CREATE TABLE Consolidations (pre_dest_egg_cnt INTEGER NOT NULL CHECK (pre_dest_egg_cnt BETWEEN 0 AND 12), pre_src_egg_cnt INTEGER NOT NULL CHECK (pre_src_egg_cnt BETWEEN 0 AND 12), post_dest_egg_cnt INTEGER NOT NULL CHECK (post_dest_egg_cnt BETWEEN 0 AND 12), post_src_egg_cnt INTEGER NOT NULL CHECK (post_src_egg_cnt BETWEEN 0 AND 12), CONSTRAINT conservation_of_eggs CHECK (pre_dest_egg_cnt + pre_src_egg_cnt = post_dest_egg_cnt + post_src_egg_cnt), constraint unique_consolidation PRIMARY KEY (pre_dest_egg_cnt, pre_src_egg_cnt, post_dest_egg_cnt, post_src_egg_cnt) ); |
The first thought to use the Series table and load the table like this:
1 2 3 4 5 6 7 8 9 10 |
INSERT INTO Consolidations (pre_dest_egg_cnt, pre_src_egg_cnt, post_dest_egg_cnt, post_src_egg_cnt) SELECT S1.seq, S2.seq, S3.seq, S4.seq FROM Series AS S1, Series AS S2, Series AS S3, Series AS S4 WHERE S1.seq <= 12 AND S2.seq <= 12 AND S3.seq <= 12 AND S4.seq <= 12 AND (S1.seq + S2.seq) = (S3.seq + S4.seq); |
Nice, but it does not work! We need to allow for cartons that have or will have zero eggs during consolidation. We could use a CTE that UNIONs a zero the to Series table, but it is just as easy to do a little math and subtract one to adjust the range.
1 2 3 4 5 6 7 8 9 10 |
INSERT INTO Consolidations (pre_dest_egg_cnt, pre_src_egg_cnt, post_dest_egg_cnt, post_src_egg_cnt) SELECT S1.seq-1, S2.seq-1, S3.seq-1, S4.seq-1 FROM Series AS S1, Series AS S2, Series AS S3, Series AS S4 WHERE S1.seq <= 13 AND S2.seq <= 13 AND S3.seq <= 13 AND S4.seq <= 13 AND (S1.seq + S2.seq) = (S3.seq + S4.seq); |
This will give us 169 rows to use. It might throw a new SQL programmer when he sees the two-part keys, but Standard SQL has row constructors; SQL Server programmers will have expand them. What we are going to do is find the egg count in the source table, the egg count in the destination table and use them to do a fancy look-up in the Consolidations. The MERGE statement is made for this.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
CREATE PROCEDURE Consolidate_Cartons (IN in_src_basket_id INTEGER, IN in_src_carton_id INTEGER, IN in_dest_basket_id INTEGER, IN in_dest_carton_id INTEGER) MERGE INTO Cartons USING Consolidations ON (SELECT egg_cnt FROM Cartons -- AS Source WHERE (in_src_basket_id, in_src_carton_id) = (basket_id, carton_id)) = pre_src_egg_cnt AND (SELECT egg_cnt FROM Cartons -- AS Destination WHERE (in_dest_basket_id, in_dest_carton_id) = (basket_id, carton_id)) = pre_dest_egg_cnt WHEN MATCHED THEN UPDATE SET egg_cnt = CASE WHEN (basket_id, carton_id) = (in_src_basket_id, in_src_carton_id) THEN post_src_egg_cnt WHEN (basket_id, carton_id) = (in_dest_basket_id, in_dest_carton_id) THEN post_dest_egg_cnt ELSE egg_cnt END; |
In SQL Server, this would probably look like this…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
CREATE PROCEDURE Consolidate_Cartons (@in_src_basket_id INTEGER, @in_src_carton_id INTEGER, @in_dest_basket_id INTEGER, @in_dest_carton_id INTEGER ) as MERGE INTO Cartons USING Consolidations ON (SELECT egg_cnt FROM Cartons -- AS Source WHERE @in_src_basket_id=basket_ID and @in_src_carton_id=carton_id )=pre_src_egg_cnt AND (SELECT egg_cnt FROM Cartons WHERE @in_dest_basket_id=basket_id and @in_dest_carton_id=carton_id )=pre_dest_egg_cnt WHEN MATCHED THEN UPDATE SET egg_cnt =CASE WHEN (basket_id=@in_src_basket_id and carton_id=@in_src_carton_id) THEN post_src_egg_cnt WHEN (basket_id=@in_dest_basket_id and carton_id=@in_dest_carton_id) THEN post_dest_egg_cnt ELSE egg_cnt END ; |
The CASE expression puts the right post-condition numbers in the appropriate rows. If we cannot find our situation the Consolidations table, we do nothing because there are no matches.
You could make Consolidations into a derived table to get this into one statement. I am not sure that it is worth it; as the table is very small but since you can index a base table, there might be a slight performance boost.
Another point; this code was written in straight ANSI SQL without any proprietary features. Do you think that you would have any problems porting it to SQL Server? DB2? Oracle? Any SQL that has the SQL-99 standards or better?
And the winner is…
I was impressed by the ingenuity and energy of the contestants. As always, it is difficult to choose a winner because almost every entry had something of merit to offer. I liked Peso’s ingenoius entry, which would probably take the speed award, if there was one. I am a fan of his — his stuff is usually clean and solid code. When you earn a living fixing disasters, you really like to see that. However, Phil and I both voted to give this one to Dkorzennik in the end. He actually does this in the Real World. His code was pretty clean. But he have IDENTITY when it was not needed, a WHILE which is evil and some minor dialect that a Petty Printer would clean up (INSERT instead of INSERT INTO, DELETE instead of DELETE FROM, etc).
What I liked was seeing computed columns for computed VALUES. Nobody else got that and this was a DDL problem more than anything else.
Putting it into ISO-11179 data element names and adding constraints, he had:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE TABLE Baskets (basket_id INTEGER NOT NULL PRIMARY KEY, collection_date DATE DEFAULT CURRENT_DATE NOT NULL, egg_cnt SMALLINT NOT NULL CHECK (egg_cnt >= 0) dozen_cnt AS FLOOR(egg_cnt/12) PERSISTED, partial_cnt AS MOD (egg_cnt, 12) PERSISTED); CREATE TABLE Cartons (basket_id INTEGER NOT NULL REFERENCES Baskets (basket_id), carton_id INTEGER NOT NULL, PRIMARY KEY (basket_id, carton_id), egg_cnt SMALLINT CHECK (egg_cnt <= 12)); |
The trigger and its loop can be replaced by a one statement procedure. The partial carton is always #0.
1 2 3 4 5 6 7 8 9 |
CREATE PROCEDURE Split_Cartons (@i_basket_id INTEGER, @n_dozen_cnt SMALLINT, @n_partial_cnt SMALLINT) INSERT INTO Cartons (basket_id, carton_id, egg_cnt) SELECT (basket_id, carton_id, egg_cnt) FROM (VALUES(@in_basket_id, 0, @in_partial_cnt)) UNION ALL SELECT @in_basket_id, seq, 12 FROM Series -- a table of numbers from 1 to (n) WHERE seq <= @in_dozen_cnt; |