counting common values of a field within super-fields

Question

Post reply

counting common values of a field within super-fields

Allen Nugent

Right there with Babe

Points: 734
More actions
July 24, 2006 at 5:52 pm

#191578

I'm trying to count common items between pairs of sets. Basically, the job is to rationalise redundant sets of items by making appropriate supersets based on sets with high degrees of commonality. In essence, my data table looks something like this:
set1   item1
set1   item2
set1   item3
set2   item4
set3   item5
...      ...
setN   itemM
The 'items' column includes considerable redundancy. For example, the value of item4 might be identical to that of item1, meaning that this same value is contained in both set1 and set2 (and maybe a lot of other sets). N is ~ 10,000 and M is ~ 60,000.
I want to build an N x N table in which each cell (i,j) contains the percentage of items in set_i that are also contained in set_j. This table would then serve as a lookup during the subsequent rationalisation process.
I can write an SQL statement that calculates the number of items from set2 that are common to set1, by hard-coding the values of these fields, but I can't figure out how to do this in a general way (i.e. by rows, or in a single shot for the whole array). I am also unsure of how to populate the table using code (in MS Access).
I suppose someone out there has done something like this, but my search keywords have brought me no joy.
Thanks!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 1

Will this code do?

-- prepare test data

declare @table table (s int, item int)

insert @table

select 1, 1 union all

select 1, 2 union all

select 1, 3 union all

select 2, 2 union all

select 3, 5

-- do the work

select t1.s s1,

t2.s s2,

1.0 * sum(case when t1.item = t2.item then 1 else 0 end) / count(distinct t1.item) matching

from @table t1

inner join @table t2 on t2.s <> t1.s

group by t1.s,

t2.s

order by t1.s,

t2.s

Good luck! Be aware that this will produce a 100 million row resultset.

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 2

This will produce only the pairs where there are matches, which will be much smaller resultset that previous

-- prepare test data

declare @table table (s int, item int)

insert @table

select 1, 1 union all

select 1, 2 union all

select 1, 3 union all

select 2, 2 union all

select 3, 5

-- do the work

select t1.s s1,

t2.s s2,

1.0 * sum(case when t1.item = t2.item then 1 else 0 end) / count(distinct t1.item) matching

from @table t1

inner join @table t2 on t2.s <> t1.s

inner join (

select t1.s s1,

t2.s s2

from @table t1

inner join @table t2 on t2.s <> t1.s and t2.item = t1.item

) z on t1.s = s1 and t2.s = s2

group by t1.s,

t2.s

order by t1.s,

t2.s

N 56°04'39.16"
E 12°55'05.25"