July 24, 2006 at 5:52 pm
I'm trying to count common items between pairs of sets. Basically, the job is to rationalise redundant sets of items by making appropriate supersets based on sets with high degrees of commonality. In essence, my data table looks something like this:
set1 item1
set1 item2
set1 item3
set2 item4
set3 item5
... ...
setN itemM
The 'items' column includes considerable redundancy. For example, the value of item4 might be identical to that of item1, meaning that this same value is contained in both set1 and set2 (and maybe a lot of other sets). N is ~ 10,000 and M is ~ 60,000.
I want to build an N x N table in which each cell (i,j) contains the percentage of items in set_i that are also contained in set_j. This table would then serve as a lookup during the subsequent rationalisation process.
I can write an SQL statement that calculates the number of items from set2 that are common to set1, by hard-coding the values of these fields, but I can't figure out how to do this in a general way (i.e. by rows, or in a single shot for the whole array). I am also unsure of how to populate the table using code (in MS Access).
I suppose someone out there has done something like this, but my search keywords have brought me no joy.
Thanks!
July 25, 2006 at 2:55 am
Will this code do?
-- prepare test data
declare @table table (s int, item int)
insert @table
select 1, 1 union all
select 1, 2 union all
select 1, 3 union all
select 2, 2 union all
select 3, 5
-- do the work
select t1.s s1,
t2.s s2,
1.0 * sum(case when t1.item = t2.item then 1 else 0 end) / count(distinct t1.item) matching
from @table t1
inner join @table t2 on t2.s <> t1.s
group by t1.s,
t2.s
order by t1.s,
t2.s
Good luck! Be aware that this will produce a 100 million row resultset.
N 56°04'39.16"
E 12°55'05.25"
July 25, 2006 at 3:48 am
This will produce only the pairs where there are matches, which will be much smaller resultset that previous
-- prepare test data
declare @table table (s int, item int)
insert @table
select 1, 1 union all
select 1, 2 union all
select 1, 3 union all
select 2, 2 union all
select 3, 5
-- do the work
select t1.s s1,
t2.s s2,
1.0 * sum(case when t1.item = t2.item then 1 else 0 end) / count(distinct t1.item) matching
from @table t1
inner join @table t2 on t2.s <> t1.s
inner join (
select t1.s s1,
t2.s s2
from @table t1
inner join @table t2 on t2.s <> t1.s and t2.item = t1.item
) z on t1.s = s1 and t2.s = s2
group by t1.s,
t2.s
order by t1.s,
t2.s
N 56°04'39.16"
E 12°55'05.25"
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply