November 29, 2017 at 11:22 am
Hi,
The code below used to work fine.
It is part of a stored procedure which bulk inserts data from a csv file.
Suddenly, after editing, I keep on getting an error stating that RowNr does not exist.
Any ideas?
Cheers,
Julian
Netherlands
/*##############################################
###
### FIND DUPLICATE ROWS AND DELETE
### THESE ROWS EXEPT ONE
###
############################################## */
-- ADD COLUMN TO STORE ROW NUMBER
--ALTER TABLE Bewoners_STAGING
--ADD RowNr INT;
SET @MYSQL = 'ALTER TABLE Bewoners_STAGING ADD RowNr INT;'
EXEC (@MYSQL);
-- ADD NUMBER OF ROWS TO FIELD RowNr
UPDATE Bewoners_STAGING
SET RowNr = R.RowNr
FROM(
SELECT ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id]) AS RowNr,
zzpclient_id
FROM [Bewoners_STAGING]) R
WHERE R.zzpclient_id = Bewoners_STAGING.zzpclient_id
-- DELETE ROWS WHERE ROW NUMBER > 1
DELETE FROM Bewoners_STAGING WHERE RowNr > 1
November 29, 2017 at 11:44 am
SQL must parse the code before it runs. At parse time, the column "RowNr" indeed does not exist because the code that adds the column to the table hasn't run yet.
You could use dynamic SQL to run the UPDATE as well. I believe that would work.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
November 29, 2017 at 1:31 pm
You're making this much more complicated than it needs to be. You're 1) adding a column, 2) updating the column, 3) deleting rows based on the column values, and presumably 4) dropping the column. You only need one step: delete duplicates.
WITH dupes_cte AS
(
SELECT ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id]) AS RowNr,
zzpclient_id
FROM [Bewoners_STAGING]
)
DELETE dupes_cte
WHERE RowNr > 1
Also notice that your RowNr definition is non-deterministic. This may cause issues down the line.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
November 30, 2017 at 4:24 am
Drew and Scott,
Thank you.
Cheers,
Julian
USE xxx;
GO
WITH MYCTE AS
(
SELECT COUNT (*) OVER (PARTITION BY zzpclient_id) AS [Rows],
ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id], [startdate] desc) AS [RowNr],
*
FROM CLIENTENOVERZICHT
)
SELECT * FROM MYCTE
WHERE RowNr = 1
ORDER BY ROWS DESC, RowNr ASC
November 30, 2017 at 2:01 pm
drew.allen - Wednesday, November 29, 2017 1:31 PMYou're making this much more complicated than it needs to be. You're 1) adding a column, 2) updating the column, 3) deleting rows based on the column values, and presumably 4) dropping the column. You only need one step: delete duplicates.
WITH dupes_cte AS
(
SELECT ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id]) AS RowNr,
zzpclient_id
FROM [Bewoners_STAGING]
)DELETE dupes_cte
WHERE RowNr > 1Also notice that your RowNr definition is non-deterministic. This may cause issues down the line.
Drew
--------------------------------------------USE XXX;
GO
WITH MYCTE AS
(
SELECT
--COUNT (*) OVER (PARTITION BY zzpclient_id) AS [Rows],
ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id], [startdate] desc) AS [RowNrS],
*
FROM testBewonersSTAGING
)DELETE FROM MYCTE
WHERE RowNrS > 1
Hi Drew,
Now I see what you mean. Great!
Q: why does the delete statement "do its work" on the Staging table?
Cheers,
Julian
November 30, 2017 at 2:24 pm
JJR333 - Thursday, November 30, 2017 2:01 PMHi Drew,
Now I see what you mean. Great!
Q: why does the delete statement "do its work" on the Staging table?
Cheers,
Julian
This follows the rules of "updatable views" as either a view or cte don't have any data stores and simply reference a table (or group of tables), any DML will affect those underlying tables.
Also, ROW_NUMBER is deterministic when the PARTITION and ORDER clauses refer to unique rows and not duplicates.
November 30, 2017 at 2:30 pm
Thank you Luis!
November 30, 2017 at 5:55 pm
Deletes are pretty expensive and they're fully logged even in the SIMPLE recovery model. Why not just use the generated "dupe number" (row number) to process only the data you want?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2017 at 5:26 am
Thank you Jeff, have to look into that.
Cheers,
Julian
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply