Interesting query - remove duplicates from my table

Question

Interesting query - remove duplicates from my table

Oksana March

SSChasing Mays

Points: 653
More actions
September 26, 2012 at 7:11 am

#264912

Hi everyone,
I need help understanding something.
The situation is - I have a table that stores comments about shipments, and sometimes the same comment is in a table several times. So I had to write some code to delete duplicate comments.
I could not find a way without using some programming language and loop through each row, but I asked a colleague from India and he gave me this query:
DELETE NestleComments FROM NestleComments d
LEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM NestleComments
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
GO
It works perfect and deletes the duplicate comments! But I don't understand why! I looked at it like 100 times but don't get it. But I don't want to give up, I want to learn why it works! Can someone please explain what happens in this query? Thank you!

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183706 More actions · Answer 1

~~Are you sure that works?~~

~~For me it has errors. There's no such thing as "DELETE field FROM Table" as it will delete the entire row. And AFAIK you can't use JOINs in a DELETE statement.~~You should post DDL and sample data and explain why are you using 2 tables ( NestleComments & Dup).

Here's an example of what you can do if you have an ID field.

DECLARE @Tabletable(

tableidint identity,

stringvarchar(15))

INSERT @Table(string)

SELECT 'a' UNION ALL

SELECT 'b' UNION ALL

SELECT 'c'

SELECT * FROM @Table

DELETE FROM @Table

WHERE tableid NOT IN (SELECT MIN(tableid)

FROM @Table

GROUP BY string)

SELECT * FROM @Table

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sean Lange SSC Guru Points: 286609 More actions · Answer 2

Are you sure that works?
For me it has errors. There's no such thing as "DELETE field FROM Table" as it will delete the entire row. And AFAIK you can't use JOINs in a DELETE statement.
You should post DDL and sample data and explain why are you using 2 tables ( NestleComments & Dup).

The code is not deleting a field it is deleting the table. This is perfectly valid syntax. It is sql server specific but it is much the same as UPDATE FROM.

Delete TableName

from TableName

join OtherTable ...

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Sean Lange SSC Guru Points: 286609 More actions · Answer 3

Oksana March (9/26/2012)
Hi everyone,
I need help understanding something.
The situation is - I have a table that stores comments about shipments, and sometimes the same comment is in a table several times. So I had to write some code to delete duplicate comments.
I could not find a way without using some programming language and loop through each row, but I asked a colleague from India and he gave me this query:
DELETE NestleComments FROM NestleComments d
LEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM Dup
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
GO
It works perfect and deletes the duplicate comments! But I don't understand why! I looked at it like 100 times but don't get it. But I don't want to give up, I want to learn why it works! Can someone please explain what happens in this query? Thank you!

As Luis said, without ddl and some sample data it is hard to say exactly but the gist of it is pretty straight forward.

You have two tables NestleComments and another called Dup. I assume that this table Dup contains a copy of the comment when it is duplicated? Maybe it will help your understanding if you change this into a select.

Below is the query that will return all the rows to delete.

Select *

FROM NestleComments d

LEFT OUTER JOIN

(SELECT ManagerComment, pk = MIN(PK)

FROM Dup

GROUP BY ManagerComment) x

ON d.pk = x.pk

WHERE x.pk IS NULL

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Luis Cazares SSC Guru Points: 183706 More actions · Answer 4

Sean Lange (9/26/2012)
Are you sure that works?
For me it has errors. There's no such thing as "DELETE field FROM Table" as it will delete the entire row. And AFAIK you can't use JOINs in a DELETE statement.
You should post DDL and sample data and explain why are you using 2 tables ( NestleComments & Dup).
The code is not deleting a field it is deleting the table. This is perfectly valid syntax. It is sql server specific but it is much the same as UPDATE FROM.
Delete TableName
from TableName
join OtherTable ...

Oh, you're right. I got confused and thought it was a field (not sure why). Now it makes sense.

What does not makes sense is the "Dup" table.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

laurie-789651 SSCertifiable Points: 7680 More actions · Answer 5

It looks as though dup is a copy of NestleComments.

Assuming this is so:

The subquery gets the minimum PK for the comment. This is joined to NestleComments on PK. This means all comments in NestleComments with PK <> minimum PK show a NULL in the join, on the x-side.

Any records where x.PK is null are deleted - i.e. any where PK <> minimum PK!

I think thats right... 🙂

Oksana March SSChasing Mays Points: 653 More actions · Answer 6

Sorry, my bad. Of course Dup is an exact copy of NestleComments which is a table I made for testing. I have corrected the original query.

Thanks everyone for responding. I think I started to understand after turning it into SELECT as Sean suggested. Thanks, Sean! The SELECT returns only the records that need to be deleted.

I still have questions...

I have never seen this syntax "delete table from table". When is it used?

What is the value of x in my query? Is x like a table with records? Does it loop through all the records in x and compare the PK? Didn't think this is possible with simple query, only with cursor 😉

Sean Lange SSC Guru Points: 286609 More actions · Answer 7

Oksana March (9/26/2012)
Sorry, my bad. Of course Dup is an exact copy of NestleComments which is a table I made for testing. I have corrected the original query.
Thanks everyone for responding. I think I started to understand after turning it into SELECT as Sean suggested. Thanks, Sean! The SELECT returns only the records that need to be deleted.
I still have questions...
I have never seen this syntax "delete table from table". When is it used?
What is the value of x in my query? Is x like a table with records? Does it loop through all the records in x and compare the PK? Didn't think this is possible with simple query, only with cursor 😉

If Dup is a copy of NestleComments then it is not needed for this at all. You can just do a self join (in fact, I see you have edited your original post doing this).

x is the alias given to the derived table. It is exactly like a table but it is derived for the purpose of this query. This DOES NOT LOOP, it is a join. In this case it is just joining to another query instead of a base table. You can use this technique when you want to delete rows that meet a certain condition like in this example.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

laurie-789651 SSCertifiable Points: 7680 More actions · Answer 8

DELETE table FROM table... is used when you want to use a something like a select statement with joins (usually) to control which records you delete. You leave off 'SELECT *' & use the FROM... onwards. If the condition is within the single table you can just say 'DELETE table WHERE ...'.

x is just an alias for the query in brackets. The results of the query are treated exactly as if it was a real table, & x is the alias of that table. It's not a loop - it's a set-based operation like this:

SELECT ManagerComment, MIN(PK) FROM table GROUP BY ManagerComment.

Rob Schripsema SSCertifiable Points: 7469 More actions · Answer 9

Oksana March (9/26/2012)
DELETE NestleComments FROM NestleComments d
LEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM NestleComments
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
GO
It works perfect and deletes the duplicate comments! But I don't understand why! I looked at it like 100 times but don't get it. But I don't want to give up, I want to learn why it works! Can someone please explain what happens in this query? Thank you!

Let me have a try at explaining (though I think those who have responded before me have probably given you enough info already):

Let's say you start with data like this:

PKManagerComment

1Some Comment

2Another Comment

3Stupid Comment

4Some Comment

5No Comment

6Stupid Comment

The subquery (in quotes) will group by the ManagerComment field and return a sort of virtual table with the smallest PK value for each duplicate comment. In this example, like this:

PKManagerComment

1Some Comment

2Another Comment

3Stupid Comment

5No Comment

The main query refers to this virtual table with the alias 'x', and the original table as 'd'. So, a left outer join would give you results like this:

d.PKd.ManagerCommentx.PKx.ManagerComment

1Some Comment1Some Comment

2Another Comment2Another Comment

3Stupid Comment3Stupid Comment

4Some CommentNULLNULL

5No Comment5No Comment

6Stupid CommentNULLNULL

So, the rows where x.PK is NULL have corresponding d.PK values of 4 and 6. These are the duplicates. SQL then deletes the corresponding rows in d. That's the DELETE [FROM] <table> FROM <table> JOIN <othertable>... It's a confusing syntax, and not an ANSI SQL standard, but it works.

Hope that helps.

Rob Schripsema
Propack, Inc.