February 12, 2012 at 5:05 pm
Hi,
I am trying to figure if there is a way to group partitioned data based on duplication criteria.
Please see the attached file for test data and resulting output that I am trying to accomplish.
It is easier to explain through the example attached. Also, pasting the content of the attachment here.
--Table with unique itemnumbers with their attributes
create table T1
(
ItemNum int not null primary key,
Attribute1 varchar(100) null,
Attribute2 varchar(100) null,
Attribute3 varchar(100) null
)
--Populating the table with Itemnumbers and their attributes.
--Some items numbers has matching attributes with other items and they are considered as duplicate --items
insert into T1
values
(1,'AAA','BBB','CCC'),
(2,'DDD','EEE','FFF'),
(3,'GGG','HHH','III'),
(4,'AAA','BBB','CCC'),
(5,'DDD','EEE','FFF'),
(6,'GGG','HHH','III'),
(7,'AAA','BBB','CCC'),
(8,'DDD','EEE','FFF'),
(9,'JJJ','KKK','LLL'),
(10,'JJJ','KKK','LLL')
--Result Desired, the items are considered duplicates based on matching values for Attribute1,
--Attribute2, and Attribute3. For example, item 1,4,and 7 are duplicates based on their attributes and --hence are labeled aa GroupA.
Item Category
1 GroupA
4 GroupA
7 GroupA
2 GroupB
5 GroupB
8 GroupB
3 GroupC
6 GroupC
9 GroupD
10 GroupD
--I am starting off with following script to partition the data, however,not sure how to label them into groups.
select
ItemNum,
row_number() over(partition by Attribute1,Attribute2,Attribute3 order by ItemNum) as Rownum
from T1
order by ItemNum
February 13, 2012 at 12:59 am
Hi,
I wanted to edit the title of this post but I believe that is not possible.
I wanted review the title of this post to state " Grouping unique items based on matching attributes" instead as partitioning word might confuse this topic with table/data partitioning.
Any feedback to the post will be appreciated.
Thanks
February 13, 2012 at 3:33 pm
I am using the following query to partition the items based on matching attributes, however, not sure if there is any function which will label them into groups based on the matching attributes.
select
ItemNum,
row_number() over(partition by Attribute1,Attribute2,Attribute3 order by ItemNum) as Rownum
from T1
order by ItemNum
February 13, 2012 at 8:34 pm
SELECT
ItemNum,
Category =
'Group' + CHAR(64 +
DENSE_RANK() OVER(
ORDER BY Attribute1,Attribute2,Attribute3))
FROM T1;
February 14, 2012 at 9:06 am
SQL Kiwi (2/13/2012)
SELECT
ItemNum,
Category =
'Group' + CHAR(64 +
DENSE_RANK() OVER(
ORDER BY Attribute1,Attribute2,Attribute3))
FROM T1;
The more I learn about the rank() functions, the more impressed I am at how they can be used. Thanks Paul!
Jared
CE - Microsoft
February 14, 2012 at 2:38 pm
Thanks a lot Paul for the tip.
I was guessing it would be a rank functions but since I never used before, could not get it to work.
Is there way in this solution to only return the items that have matching attribute with another items?
The only thing I care about is the to find the items that have matching attribute as they are duplicates in my case.
Thanks
February 15, 2012 at 12:39 am
Thanks a lot Paul for the tip.
February 15, 2012 at 1:13 pm
Similar to Paul's suggestion:
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) as DupNR
FROM T1;
All rows having a value higher than 1 in DupNR are duplicates of the row that has DupNR = 1 (with no preference on the row that does get DupNR = 1). Put the above query in a cte and you can use it in a delete statement directly, like so:
WITH cte AS
(
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) AS DupNR
FROM T1
)
DELETE cte WHERE DupNR > 1
February 15, 2012 at 1:20 pm
Hi Rozema,
I am not trying to delete but I am rather trying to display the ones that are selected as duplicates.
This requirement makes it trickier.
Thanks
R.P.Rozema (2/15/2012)
Similar to Paul's suggestion:
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) as DupNR
FROM T1;
All rows having a value higher than 1 in DupNR are duplicates of the row that has DupNR = 1 (with no preference on the row that does get DupNR = 1). Put the above query in a cte and you can use it in a delete statement directly, like so:
WITH cte AS
(
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) AS DupNR
FROM T1
)
DELETE cte WHERE DupNR > 1
February 15, 2012 at 1:24 pm
Ok, then don't delete but just select the rows that have DupNR > 1. Or do I not understand your requirements correctly?
February 15, 2012 at 1:25 pm
R.P.Rozema (2/15/2012)
Similar to Paul's suggestion:
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) as DupNR
FROM T1;
All rows having a value higher than 1 in DupNR are duplicates of the row that has DupNR = 1 (with no preference on the row that does get DupNR = 1). Put the above query in a cte and you can use it in a delete statement directly, like so:
WITH cte AS
(
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) AS DupNR
FROM T1
)
DELETE cte WHERE DupNR > 1
No, just change the DELETE to SELECT * FROM
WITH cte AS
(
SELECT
ItemNum,
ROW_NUMBER() OVER(PARTITION BY Attribute1,Attribute2,Attribute3 ORDER BY (SELECT NULL)) AS DupNR
FROM T1
)
SELECT * FROM cte WHERE DupNR > 1
Jared
CE - Microsoft
February 15, 2012 at 1:39 pm
I also need the record that has DupNR > 1 as part of the results.
What I don't want is the rows that only have DupNR = 1, i.e. that are not duplicates based on my criteria.
That is the reason why I was trying to label them and Paul's solutions works for that.
I am trying to filter out the rows that have matching labels from Paul's solution but not able to.
Does that makes sense?
February 15, 2012 at 1:57 pm
Combine the both solutions. Take Paul's DENSE_RANK() to generate a unique name per 'group' and add another column, using my ROW_NUMBER() suggestion to give each individual member of the group a number from 1 up to the number of members. Does that answer your questions?
February 15, 2012 at 2:25 pm
Actually I did that, however I am stuck at how to only select the one that are duplicates and not the ones that are not duplicates. The row_number() function will label the same groups as 1,2,3..so on..the records are the only one in the group will have just 1 as run_number. The problem is how to only select the rows with consecutive row numbers.
February 15, 2012 at 5:50 pm
bdba (2/15/2012)
Actually I did that, however I am stuck at how to only select the one that are duplicates and not the ones that are not duplicates. The row_number() function will label the same groups as 1,2,3..so on..the records are the only one in the group will have just 1 as run_number. The problem is how to only select the rows with consecutive row numbers.
Please provide clear sample data and *expected output*. Any one of us can provide a good T-SQL solution if the requirement is explained well...
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply