March 11, 2004 at 9:26 am
Anyone knows how to create a mathematical function for the following purpose:
Problem: A table contains rows of a single field. We wish to determine the number of combinations of these chosen 2 at a time.
examples:
if the table contained rows a, b and c the answer would be three rows: ab, ac, bc
if the table contained rows a, b, c and d the answer would be six rows: ab, ac, ad, bc, bd, cd
if the table contains n rows, the answer has (n!)/(2*(n-2)!)
if I have a table of 26 letters, the result set should contain 325 rows.
How to create a user-defined function to generate the combinations of any arbitrary populated table?
Thanks!
Frank
March 11, 2004 at 11:08 am
In general the formula for a combination of M things taken N at a time is:
C(M,N)=M!/(N!)(M-N)!
For example:
In numbers, C(5,3) = (5*4*3*2*1)/(3*2*1)(2*1) = 60/6 = 10. These numbers mean that there are 60 ways to order any three of five objects, and for each particular group of three there are six ways to order them; so the number of distinct groups of three out of five objects is 60 divided by 6.
Francis
March 11, 2004 at 11:18 am
Thanks. You are right about the formula. Any ideas how to write it in sql?
March 11, 2004 at 11:26 am
You will find it difficult to create a UDF for this as there are some limitations in SQL data types. For example, 21! is creater than the largest number a biginit in sql can hold ; 20! = 2432902008176640000. Also if you use recusion, you are limited to 32 calls so theoretically even if you could store a number bigger than 21! and you used recusion in the function you couldn't calculate anything bigger than 32! I can probably come up with a function with a bit of thought.. stay tuned.
Francis
March 11, 2004 at 1:39 pm
You don't need a function to do this, just a simple self-join.
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col < b.Col
--Jonathan
March 11, 2004 at 1:48 pm
Thank you very much Jonathan!
March 11, 2004 at 2:03 pm
I missread the question. I thought you wanted the number, not the actual rows. That is I thought you wanted to supply the numbers 5, 2 and get the answer 10 and opposed to supply a table with5 rows and get a result set with 10 rows. This was like one of the QOD that you just don't get even after reading it a couple of times. Of course now its obvious.
Francis
March 11, 2004 at 3:08 pm
Thanks fhanlon.
Johnathan, There is still a problem with simple self-join. If the table has duplicates, the total number of return will not match the binomial coefficient sequence by self-join. In my case, aa is legitimate for two rows a and a. Any ideas to count dups and combine them?
Thanks.
Frank
March 11, 2004 at 3:13 pm
The "problem," then is not with the code but instead with your examples. How about:
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col <= b.Col
--Jonathan
March 11, 2004 at 3:34 pm
You will miscount unique rows if you use this script. It can't distinguish unique rows from dups. For example, you have five rows with values a, b, c, d and d. The return should be ten rows as:
------
ab
ac
bc
ad
bd
cd
ad
bd
cd
dd
Using SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col <= b.Col, the return is
------
aa
ab
bb
ac
bc
cc
ad
bd
cd
dd
dd
ad
bd
cd
dd
dd
(16 row(s) affected)
Using SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col < b.Col, the return is
------
ab
ac
bc
ad
bd
cd
ad
bd
cd
(9 row(s) affected)
March 11, 2004 at 7:18 pm
Okay, now I finally know what you're trying to do here.
In SQL, it's a good idea to have a primary key...
create table tab(
col char)
insert tab select 'a' union all select 'b' union all select 'c' union all select 'd' union all select 'd'
SELECT IDENTITY(int) Id, Col
INTO #Tab
FROM Tab
SELECT a.Col + b.Col
FROM #Tab a JOIN #Tab b ON a.Col <= b.Col and a.Id < b.Id
--Jonathan
March 12, 2004 at 8:10 am
Thanks Johnathan.
In general, the issue is related to a binomial coefficient theory. The return rows should be nCm, read as n choose m. Here is the list of values:
m n nCm
------- --------- -------
2 2 1
3 2 3
4 2 6
5 2 10
6 2 15
7 2 21
Using you script can’t get correct returns. For example if a table has 6 rows and two pairs of dups, then return rows should be 15 as
col
----
a
b
c
d
d
a
(6 row(s) affected)
----
ab
ac
bc
ad
bd
cd
ad
bd
cd
dd
aa
ab
ac
ad
ad
(15 row(s) affected)
However, your script only returns 11 rows as follows
----
ab
ac
bc
ad
bd
cd
ad
bd
cd
dd
aa
(11 row(s) affected)
March 12, 2004 at 9:13 am
Okay, how about something like this?
SELECT a.Col + b.Col
FROM #Tab a JOIN #Tab b ON a.Id <> b.Id AND a.Col <= b.Col AND NOT (a.Col = b.Col AND a.Id > b.Id)
--Jonathan
March 12, 2004 at 9:29 am
Thank you very much! You got all right. Appreciate ...
March 12, 2004 at 11:20 pm
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col < b.Col
UNION
SELECT a.Col + b.Col
FROM Table a JOIN Table b ON a.Col = b.Col
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply