Eliminating Duplicates

Question

Post reply

Eliminating Duplicates

olsentimothy03

Old Hand

Points: 329
More actions
October 15, 2009 at 11:09 am

#385587

Hi experts,
I have a question that usually come up in the sql community but with a different twist (i think). I have a table that I am trying to eliminate duplicate data from. The only twist here is that the lngCompanyID, shtCompanyID and dtmModified must all be the same in different rows before it is considered duplicate. If 2 of the columns are the same and 1 is different, it won't be duplicate.
Is there a different way to go about this other than the regular eliminating duplicate "having count / rownumber() > 1" routine?
Below is my test data
CREATE TABLE tDupData
(
lngCompanyID INT,
shtCompanyID INT,
strCompanyName VARCHAR(20),
strAddress VARCHAR(10),
dtmModified DATETIME
)
--Create test data
INSERT INTO tDupData VALUES (1,10,'CompanyOne','Address1','01/15/2003')
INSERT INTO tDupData VALUES (2,20,'CompanyTwo','Address2','01/15/2003')
INSERT INTO tDupData VALUES (3,30,'CompanyThree','Address3','01/15/2003')
INSERT INTO tDupData VALUES (2,40,'CompanyTwo','Address','01/16/2003')
INSERT INTO tDupData VALUES (3,50,'CompanyThree','Address','01/16/2003')
-- Duplicate Data
INSERT INTO tDupData VALUES (1,10,'CompanyOne','Address1','01/15/2003')
GO
Thanks for reading

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Ron McCullough SSC Guru Points: 63877 More actions · Answer 1

Try this:

;with numbered as(SELECT rowno=row_number() over

(partition by lngCompanyID,shtCompanyID,dtmModified order by lngCompanyID)

, lngCompanyID,shtCompanyID,dtmModified,strAddress from #tDupData)

select * from numbered

Resulting in:

rownolngCompanyIDshtCompanyIDdtmModified strAddress

11 10 2003-01-15 00:00:00.000Address1

21 10 2003-01-15 00:00:00.000Address1

12 20 2003-01-15 00:00:00.000Address2

12 40 2003-01-16 00:00:00.000Address

13 30 2003-01-15 00:00:00.000Address3

13 50 2003-01-16 00:00:00.000Address

If after checking you can replace select * from numbered with a DELETE FROM #TdupData WHERE Rowno > 1 .. Check results again before moving to production.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

olsentimothy03 Old Hand Points: 329 More actions · Answer 2

Right on the money. I really appreciate your input. So just pretty much the same thing but include all the 3 columns in the "partition by". Sweeet. You are the best !!!

Ron McCullough SSC Guru Points: 63877 More actions · Answer 3

olsentimothy03
So just pretty much the same thing

No not at all close to the same thing. Using a group by with the HAVING clause

SELECT lngCompanyID,shtCompanyID,dtmModified, COUNT(*) AS '#Dup Entries' FROM #tDupData

GROUP BY lngCompanyID,shtCompanyID,dtmModified

HAVING COUNT(*) > 1;

Returned:

lngCompanyIDshtCompanyIDdtmModified #Dup Entries

1 102003-01-15 00:00:00.000 2

Here is where the difference makes life easy. With above result, how do indicate which single row to delete? Easy enough to delete both rows, but as your table was construced not a easy task to code to delete just one row, and think of the difficulty, if you had said 100 or 150 duplicate rows with some individnual rows being duplicated more than once.

If everything seems to be going well, you have obviously overlooked something.

Ron

Please help us, help you -before posting a question please read[/url]
Before posting a performance problem please read[/url]

olsentimothy03 Old Hand Points: 329 More actions · Answer 4

Understood. I have definitely learnt a lot from you today. Thanks again