September 26, 2012 at 7:11 am
Hi everyone,
I need help understanding something.
The situation is - I have a table that stores comments about shipments, and sometimes the same comment is in a table several times. So I had to write some code to delete duplicate comments.
I could not find a way without using some programming language and loop through each row, but I asked a colleague from India and he gave me this query:
DELETE NestleComments FROM NestleComments d
LEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM NestleComments
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
GO
It works perfect and deletes the duplicate comments! But I don't understand why! I looked at it like 100 times but don't get it. But I don't want to give up, I want to learn why it works! Can someone please explain what happens in this query? Thank you!
September 26, 2012 at 7:23 am
Are you sure that works?
For me it has errors. There's no such thing as "DELETE field FROM Table" as it will delete the entire row. And AFAIK you can't use JOINs in a DELETE statement.You should post DDL and sample data and explain why are you using 2 tables ( NestleComments & Dup).
Here's an example of what you can do if you have an ID field.
DECLARE @Tabletable(
tableidint identity,
stringvarchar(15))
INSERT @Table(string)
SELECT 'a' UNION ALL
SELECT 'a' UNION ALL
SELECT 'b' UNION ALL
SELECT 'b' UNION ALL
SELECT 'b' UNION ALL
SELECT 'c'
SELECT * FROM @Table
DELETE FROM @Table
WHERE tableid NOT IN (SELECT MIN(tableid)
FROM @Table
GROUP BY string)
SELECT * FROM @Table
September 26, 2012 at 7:42 am
Are you sure that works?
For me it has errors. There's no such thing as "DELETE field FROM Table" as it will delete the entire row. And AFAIK you can't use JOINs in a DELETE statement.
You should post DDL and sample data and explain why are you using 2 tables ( NestleComments & Dup).
The code is not deleting a field it is deleting the table. This is perfectly valid syntax. It is sql server specific but it is much the same as UPDATE FROM.
Delete TableName
from TableName
join OtherTable ...
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 26, 2012 at 7:46 am
Oksana March (9/26/2012)
Hi everyone,I need help understanding something.
The situation is - I have a table that stores comments about shipments, and sometimes the same comment is in a table several times. So I had to write some code to delete duplicate comments.
I could not find a way without using some programming language and loop through each row, but I asked a colleague from India and he gave me this query:
DELETE NestleComments FROM NestleComments d
LEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM Dup
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
GO
It works perfect and deletes the duplicate comments! But I don't understand why! I looked at it like 100 times but don't get it. But I don't want to give up, I want to learn why it works! Can someone please explain what happens in this query? Thank you!
As Luis said, without ddl and some sample data it is hard to say exactly but the gist of it is pretty straight forward.
You have two tables NestleComments and another called Dup. I assume that this table Dup contains a copy of the comment when it is duplicated? Maybe it will help your understanding if you change this into a select.
Below is the query that will return all the rows to delete.
Select *
FROM NestleComments d
LEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM Dup
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 26, 2012 at 7:58 am
Sean Lange (9/26/2012)
Are you sure that works?
For me it has errors. There's no such thing as "DELETE field FROM Table" as it will delete the entire row. And AFAIK you can't use JOINs in a DELETE statement.
You should post DDL and sample data and explain why are you using 2 tables ( NestleComments & Dup).
The code is not deleting a field it is deleting the table. This is perfectly valid syntax. It is sql server specific but it is much the same as UPDATE FROM.
Delete TableName
from TableName
join OtherTable ...
Oh, you're right. I got confused and thought it was a field (not sure why). Now it makes sense.
What does not makes sense is the "Dup" table.
September 26, 2012 at 8:10 am
It looks as though dup is a copy of NestleComments.
Assuming this is so:
The subquery gets the minimum PK for the comment. This is joined to NestleComments on PK. This means all comments in NestleComments with PK <> minimum PK show a NULL in the join, on the x-side.
Any records where x.PK is null are deleted - i.e. any where PK <> minimum PK!
I think thats right... 🙂
September 26, 2012 at 8:52 am
Sorry, my bad. Of course Dup is an exact copy of NestleComments which is a table I made for testing. I have corrected the original query.
Thanks everyone for responding. I think I started to understand after turning it into SELECT as Sean suggested. Thanks, Sean! The SELECT returns only the records that need to be deleted.
I still have questions...
I have never seen this syntax "delete table from table". When is it used?
What is the value of x in my query? Is x like a table with records? Does it loop through all the records in x and compare the PK? Didn't think this is possible with simple query, only with cursor 😉
September 26, 2012 at 9:00 am
Oksana March (9/26/2012)
Sorry, my bad. Of course Dup is an exact copy of NestleComments which is a table I made for testing. I have corrected the original query.Thanks everyone for responding. I think I started to understand after turning it into SELECT as Sean suggested. Thanks, Sean! The SELECT returns only the records that need to be deleted.
I still have questions...
I have never seen this syntax "delete table from table". When is it used?
What is the value of x in my query? Is x like a table with records? Does it loop through all the records in x and compare the PK? Didn't think this is possible with simple query, only with cursor 😉
If Dup is a copy of NestleComments then it is not needed for this at all. You can just do a self join (in fact, I see you have edited your original post doing this).
x is the alias given to the derived table. It is exactly like a table but it is derived for the purpose of this query. This DOES NOT LOOP, it is a join. In this case it is just joining to another query instead of a base table. You can use this technique when you want to delete rows that meet a certain condition like in this example.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 26, 2012 at 9:04 am
DELETE table FROM table... is used when you want to use a something like a select statement with joins (usually) to control which records you delete. You leave off 'SELECT *' & use the FROM... onwards. If the condition is within the single table you can just say 'DELETE table WHERE ...'.
x is just an alias for the query in brackets. The results of the query are treated exactly as if it was a real table, & x is the alias of that table. It's not a loop - it's a set-based operation like this:
SELECT ManagerComment, MIN(PK) FROM table GROUP BY ManagerComment.
September 26, 2012 at 6:33 pm
Oksana March (9/26/2012)
DELETE NestleComments FROM NestleComments dLEFT OUTER JOIN
(SELECT ManagerComment, pk = MIN(PK)
FROM NestleComments
GROUP BY ManagerComment) x
ON d.pk = x.pk
WHERE x.pk IS NULL
GO
It works perfect and deletes the duplicate comments! But I don't understand why! I looked at it like 100 times but don't get it. But I don't want to give up, I want to learn why it works! Can someone please explain what happens in this query? Thank you!
Let me have a try at explaining (though I think those who have responded before me have probably given you enough info already):
Let's say you start with data like this:
PKManagerComment
1Some Comment
2Another Comment
3Stupid Comment
4Some Comment
5No Comment
6Stupid Comment
The subquery (in quotes) will group by the ManagerComment field and return a sort of virtual table with the smallest PK value for each duplicate comment. In this example, like this:
PKManagerComment
1Some Comment
2Another Comment
3Stupid Comment
5No Comment
The main query refers to this virtual table with the alias 'x', and the original table as 'd'. So, a left outer join would give you results like this:
d.PKd.ManagerCommentx.PKx.ManagerComment
1Some Comment1Some Comment
2Another Comment2Another Comment
3Stupid Comment3Stupid Comment
4Some CommentNULLNULL
5No Comment5No Comment
6Stupid CommentNULLNULL
So, the rows where x.PK is NULL have corresponding d.PK values of 4 and 6. These are the duplicates. SQL then deletes the corresponding rows in d. That's the DELETE [FROM] <table> FROM <table> JOIN <othertable>... It's a confusing syntax, and not an ANSI SQL standard, but it works.
Hope that helps.
Rob Schripsema
Propack, Inc.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply