Label Partitioning Data based on matching attributes

Question

Label Partitioning Data based on matching attributes

Viewing 2 posts - 16 through 16 (of 16 total)

You must be logged in to reply to this topic. Login to reply

R.P.Rozema SSChampion Points: 12317 More actions · Answer 1

After a night of sleep I think I understand what you mean: You want all the rows for which at least one duplicate exists, and leave off any rows for which no duplicate exists. Your test data did not have any rows for which no duplicate exists, so I've added one row, a row with ItemNum 11 and attributes 'ABC'. This row will not be in the results but for all other 10 rows at least one duplicate exists, so all 10 will be returned.

Is this what you are looking for?

insert into T1

values

(1,'AAA','BBB','CCC'),

(2,'DDD','EEE','FFF'),

(3,'GGG','HHH','III'),

(4,'AAA','BBB','CCC'),

(5,'DDD','EEE','FFF'),

(6,'GGG','HHH','III'),

(7,'AAA','BBB','CCC'),

(8,'DDD','EEE','FFF'),

(9,'JJJ','KKK','LLL'),

(10,'JJJ','KKK','LLL'),

(11,'ABC','ABC','ABC')

select Category =

'Group' + CHAR(64 +

DENSE_RANK() OVER(

ORDER BY Attribute1,Attribute2,Attribute3)),

t.*

from T1 t

where exists (

select *

from T1 a

where t.Attribute1 = a.Attribute1

and t.Attribute2 = a.Attribute2

and t.Attribute3 = a.Attribute3

and t.ItemNum <> a.ItemNum

)

This outputs:

Category ItemNum Attribute1 Attribute2 Attribute3

-------- ----------- ---------- ---------- ----------

GroupA 1 AAA BBB CCC

GroupA 4 AAA BBB CCC

GroupA 7 AAA BBB CCC

GroupB 8 DDD EEE FFF

GroupB 5 DDD EEE FFF

GroupB 2 DDD EEE FFF

GroupC 3 GGG HHH III

GroupC 6 GGG HHH III

GroupD 9 JJJ KKK LLL

GroupD 10 JJJ KKK LLL

(10 row(s) affected)

Posting Data Etiquette - Jeff Moden[/url]
Posting Performance Based Questions - Gail Shaw[/url]
Hidden RBAR - Jeff Moden[/url]
Cross Tabs and Pivots - Jeff Moden[/url]
Catch-all queries - Gail Shaw[/url]

If you don't have time to do it right, when will you have time to do it over?

drew.allen SSC Guru Points: 76995 More actions · Answer 2

R.P.Rozema (2/15/2012)
Combine the both solutions. Take Paul's DENSE_RANK() to generate a unique name per 'group' and add another column, using my ROW_NUMBER() suggestion to give each individual member of the group a number from 1 up to the number of members. Does that answer your questions?

Instead of using ROW_NUMBER(), you should be using COUNT(*) OVER( PARTITION BY <your fields> )

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA