January 19, 2010 at 3:47 am
The other day i was working on a number of deduplication routines to remove duplicates from a table via various record attributes. this involved identifying duplicates and recording the record ID in a temporary table to then allow me to delete them from the master table.
However when i had finished and ran the sql it removed all the data (3.1million rows) from the table and not the data i expected (133k).
I have reworked my sql to post it up on here and removed teh deduplication.
If you run the sql asis you will see that 10 records are deleted, but then if swap the commented out delete for the commented in delete and then re-run you will find it deletes 5 records.
This is because the column name in the subquery on teh first run is not teh column from the table being queried in the subquery, however sql is still allowing this to run. Hope i have explained this well enough...
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects] o WHERE o.[name] = N'##SourceDataTable')
DROP TABLE ##SourceDataTable
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects] o WHERE o.[name] = N'##DataToDelete')
DROP TABLE ##DataToDelete
CREATE TABLE ##SourceDataTable
(
System_id INT
)
GO
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 0 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 1 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 2 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 3 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 4 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 5 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 6 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 7 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 8 )
INSERT INTO ##SourceDataTable ( System_id ) VALUES ( 9 )
GO
CREATE TABLE ##DataToDelete
(
SystemID INT
)
GO
INSERT INTO ##DataToDelete ( SystemID )
SELECT System_ID
FROM ##SourceDataTable
WHERE System_id BETWEEN 1 AND 5
--DELETE FROM ##SourceDataTable WHERE System_id IN ( SELECT Systemid FROM ##DataToDelete )
DELETE FROM ##SourceDataTable WHERE System_id IN ( SELECT System_id FROM ##DataToDelete )
January 19, 2010 at 4:10 am
Is there a question here, or is this an observation?
You can always reference a column from an outer table within a subquery. It may look odd, but it is valid and SQL's doing exactly what you tell it to. This is one reason why you should always qualify the column names with the tables.
so
DELETE FROM ##SourceDataTable st WHERE st.System_id IN ( SELECT dd.Systemid FROM ##DataToDelete dd )
DELETE FROM ##SourceDataTable st WHERE st.System_id IN ( SELECT dd.System_id FROM ##DataToDelete dd)
Now if you've accidentally used the wrong column name you'll get an error.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 19, 2010 at 4:17 am
This was more of an observation; and yes school boy error with sql as i do normally alias my tables and refer to them when adding in cloumns, and in this instance i didnt which due to the size of the script i had written made it harder to identify.
I wont be making it again !
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply