Generating n-Tuples with SQL

I ran across an interesting problem the other day that required this and I thought I’d post on this topic of n-Tuples as a result.

First what is an n-Tuple? Suppose you have a list of unique character strings like ‘1’ ‘2’ ‘3’ and these are stored as rows in a table. Tuples are simply combinations of these values, so 1-Tuples would be:

1

2

3

2-Tuples would be:

1,2

1,3

2,1

2,3

3,1

3,2

And of course there are these 3-Tuples:

1,2,3

1,3,2

2,1,3

2,3,1

3,1,2

3,2,1

Let’s now write a SQL query that generates Tuples for all n where we have unique strings we want to combine:

DECLARE @t TABLE (strcol CHAR(3))
INSERT INTO @t (strcol)
SELECT '001' UNION ALL SELECT '002' UNION ALL SELECT '003'
--UNION ALL SELECT '004' UNION ALL SELECT '005' UNION ALL SELECT '006'
--UNION ALL SELECT '007'  UNION ALL SELECT '008'
--UNION ALL SELECT '009' UNION ALL SELECT '010'
;WITH nTuples (n, Tuples) AS (
      SELECT DISTINCT 1, CAST(strcol AS VARCHAR(max))
      FROM @t
      UNION ALL
      SELECT 1 + n.n, t.strcol + ',' + n.Tuples
      FROM @t t JOIN nTuples n ON t.strcol <> n.Tuples
      WHERE CHARINDEX(t.strcol, n.Tuples) = 0
)
SELECT *
 FROM nTuples
 ORDER BY n, Tuples

This recursive CTE returns the following results set when run for n=3:

n Tuples

1 001

1 002

1 003

2 001,002

2 001,003

2 002,001

2 002,003

2 003,001

2 003,002

3 001,002,003

3 001,003,002

3 002,001,003

3 002,003,001

3 003,001,002

3 003,002,001

Interestingly enough, that wasn’t really the result I needed but it was pretty close. What I really needed was what I call “unique, ordered” n-Tuples. In other words, if I list out the 2-Tuples from above, I can show you only the ones I need:

1,2 [Want]

1,3 [Want]

2,1 [Don’t Want]

2,3 [Want]

3,1 [Don’t Want]

3,2 [Don’t Want]

This means that 1,2 is equivalent to 2,1 but I want only want the first one where the data is ordered across the Tuple. Likewise, there will then be only 1 3-Tuple that I’m interested in: 1,2,3

This actually ends up being a pretty simple change to the above CTE. The not equals (<>) has been changed to less than (<).

;WITH UNIQUEnTuples (n, Tuples) AS (
      SELECT DISTINCT 1, CAST(strcol AS VARCHAR(max))
      FROM @t
      UNION ALL
      SELECT 1 + n.n, t.strcol + ',' + n.Tuples
      FROM @t t JOIN UNIQUEnTuples n ON t.strcol < n.Tuples
      WHERE CHARINDEX(t.strcol, n.Tuples) = 0
)
SELECT *
 FROM UNIQUEnTuples
 ORDER BY n, Tuples

The results from running this SQL with n=3 are:

n Tuples

1 001

1 002

1 003

2 001,002

2 001,003

2 002,003

3 001,002,003

Thinking about it, I realized that the row sets generated by these bad boys are going to grow quite quickly! After a little bit of playing around and without resorting to looking up the combinatorial statistics, I was able to deduce that the number of records in the respective row sets are generated by these formulas:

n-Tuples: Sum over n + n*(n-1) + n*(n-1)*(n-2) + … + n*(n-1)*…*(n-(n-1))

Unique, ordered, n-Tuples: 2^n-1

For the n-Tuples formula, you can verify in a spreadsheet that the last 2 factors in the series for n>=2 are n! and that, my intrepid reader, becomes a really big number very quickly!

Of course, being constantly performance-minded, I couldn’t help but wonder how well SQL would handle these huge row sets, so I set about to running my SQL across incrementing n’s.

For n-Tuples, I got these results (CPU and elapsed time are in milliseconds):

n #n-Tuples CPU Elapsed

1 1 0 1

2 4 0 1

3 15 0 1

4 64 0 2

5 325 15 83

6 1956 47 202

7 13699 468 831

8 109600 3682 7033

9 986409 41809 116940

10 9864100 485054 1410456

At n=10, I decided to stop because I’d be too impatient to run for n=11!

But you can take the unique, ordered, n-Tuples much further as the progression of increase in number of the rows in the row sets is much slower.

n #n-Tuples CPU Elapsed

1 1 0 0

2 3 0 0

3 7 0 0

4 15 0 0

5 31 0 2

6 63 15 2

7 127 0 4

8 255 15 62

9 511 0 34

10 1023 16 70

11 2047 63 264

12 4095 109 218

13 8191 234 392

14 16383 453 799

15 32767 1030 1717

16 65535 2230 3711

17 131071 4555 8535

18 262143 10452 27458

19 524287 21871 58594

20 1048575 45802 135406

21 2097151 -- --

22 4194303 -- --

23 8388607 -- --

Once again, I’ll be too impatient to wait for n=21 to complete!

There are some caveats to these functions that you should be aware of:

If there are any duplicates in your source table, these will be eliminated in the results (by design).
You need to take care if your data points look like A, AA, AAA. Note how the source table has defined this column as CHAR(3). This is also by design. The CHARINDEX statement would remove the n-Tuples (where n>1) of this particular set if you used VARCHAR(3) instead. So if you’re working with integers you want n-Tuples of, your best course of action is to left pad them with 0’s or right pad them with blanks.

We hope you enjoyed this quick excursion into the world of recursive CTEs and n-Tuples. We know we most certainly did. Perhaps even one day it will be as useful for you as it was for me.

Now if I can only figure out how these pesky recursive CTEs really work, then I could post something really interesting!

Rate

Share

Categories

Share

Rate

Generating n-Tuples with SQL

Rate

Share

Categories

Share

Rate

Related content

Tame Those Strings - Using SUBSTRING

Troubleshooting ASP and ADO Errors

Using Stored Procedures in SQL Server

Using XP_FILEEXIST

Missing Temp Tables