January 19, 2016 at 5:28 pm
Hi All,
I am not sure if what i would like to do is possible but thought i would ask. I am doing a simple update statement to clean up some data, after running it produces the error message in italics.
So I am wondering if it possible to pass the key values in to a variable then use that in a further delete statement to remove the duplicate record. Perhaps nest both update\delete statements in a Try Catch?
UPDATE [dbo].[xxx]
SET [Supplier Number] = REPLACE([Supplier Number], '"', '')
WHERE CHARINDEX('"', [Supplier Number]) <> 0
Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'pk_tbl_bcc_supplier'. Cannot insert duplicate key in object 'dbo.Supplier'. The duplicate key value is (01175, 80).
January 19, 2016 at 11:15 pm
I have been taught that relying on exceptions for Control Flow is a poor design pattern. Exceptions are expensive for systems to generate and if we are incorporating them into our logic then surely we can avoid them. Exceptions are by name supposed to be exceptional conditions we could not reasonably foresee. Instead of building the mousetrap why not just issue the delete before you try inserting to remove the rows in the way of your update? If the delete statement deletes 0 rows no problem but this way your update should have a high success rate.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 20, 2016 at 2:42 am
Quick question, how are you going to handle the duplicate values?
😎
The easiest solution is to dedupe the set before the update.
January 20, 2016 at 2:59 am
Thanks for the replies. The plan is to delete duplicate records, there's nearly 100k records hence I don't want to manually enter the values from the error message.
January 20, 2016 at 3:54 am
Why not just delete all the duplicates in an initial pre-processing step, rather than one at a time after an error?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 20, 2016 at 4:25 am
There not duplicate until I run the update and remove the double quotes in the column.
January 20, 2016 at 4:41 am
ringovski (1/20/2016)
There not duplicate until I run the update and remove the double quotes in the column.
If the true value is repeated, regardless of what extra characters are in the data then these are indeed duplicates. Suggest you deduplicate the set based on the true/clean value before processing the data.
😎
January 20, 2016 at 9:59 am
ringovski (1/20/2016)
There not duplicate until I run the update and remove the double quotes in the column.
Yes, but you know the current state and desired end state.
Check this out:
USE tempdb;
GO
IF EXISTS ( SELECT *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'dbo.Supplier')
AND type IN (N'U') )
DROP TABLE dbo.Supplier;
GO
CREATE TABLE dbo.Supplier
(
id INT NOT NULL
PRIMARY KEY,
[Supplier Number] VARCHAR(10) NOT NULL UNIQUE
);
INSERT INTO dbo.Supplier
(id, [Supplier Number])
VALUES (1, '"abc"'),
(2, 'abc');
IF OBJECT_ID(N'tempdb..#tmp_Supplier') IS NOT NULL
DROP TABLE #tmp_Supplier;
SELECT MIN(id) AS id
INTO #tmp_Supplier
FROM dbo.Supplier
GROUP BY REPLACE([Supplier Number], '"', '');
SELECT *
FROM #tmp_Supplier;
MERGE INTO dbo.Supplier AS target
USING #tmp_Supplier AS source
ON source.id = target.id
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
SELECT *
FROM dbo.Supplier;
UPDATE dbo.Supplier
SET [Supplier Number] = REPLACE([Supplier Number], '"', '')
WHERE CHARINDEX('"', [Supplier Number]) <> 0;
SELECT *
FROM dbo.Supplier;
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 21, 2016 at 10:14 pm
Thanks Orlando, I will try your's now my idea was something like this but it not finished yet.
DECLARE @II INT, @iiMax INT, @Code VARCHAR(100),@corporate_id VARCHAR(100)
DECLARE @TemporaryStagingTable TABLE(id INT IDENTITY(1,1) PRIMARY KEY,supplier_id CHAR(100), Corporate_id CHAR(100) )
DECLARE @Error TABLE (Error_ID INT IDENTITY(1,1) PRIMARY KEY,ErrorCode INT,PostCode VARCHAR(100),TransactionState INT,ErrorMessage VARCHAR(255))
INSERT INTO @TemporaryStagingTable (supplier_id,Corporate_id)
SELECT [Supplier Number],[Corporate Group ID]
FROM [dbo].[Supplier]
WHERE CHARINDEX('"', [Supplier Number]) <> 0
SELECT @ii=MIN(id)
,@iiMax=MAX(id)
,@code = (select supplier_id from @TemporaryStagingTable where supplier_id = @II)
,@corporate_id =(select Corporate_id from @TemporaryStagingTable where supplier_id = @II)
FROM @TemporaryStagingTable
WHILE @ii<=@iiMax
BEGIN
BEGIN try
UPDATE [dbo].[Supplier]
SET [Supplier Number] = REPLACE([Supplier Number], '"', '')
WHERE CHARINDEX('"', [Supplier Number]) <> 0
END try
BEGIN CATCH
INSERT INTO @error(ErrorCode, PostCode,TransactionState,ErrorMessage)
SELECT ERROR_NUMBER(), @Code, XACT_STATE(), ERROR_MESSAGE()
DELETE FROM [dbo].[Supplier]
WHERE [Supplier Number] = @II and [Corporate Group ID] = @corporate_id --can i pass in the error values??
END CATCH;
SELECT @ii=@ii+1
END
SELECT * FROM @error
January 21, 2016 at 10:48 pm
Thanks it worked, the group by replace is a good one didn't think of that I ended up with this.
IF OBJECT_ID(N'tempdb..#tmp_Supplier') IS NOT NULL
DROP TABLE #tmp_Supplier;
SELECT MIN([Supplier Number]) AS id
INTO #tmp_Supplier
FROM [dbo].[Supplier2]
GROUP BY REPLACE([Supplier Number], '"', '');
SELECT *
FROM #tmp_Supplier;
MERGE INTO [dbo].[Supplier] AS target
USING #tmp_Supplier AS source
ON source.id = target.[Supplier Number]
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
SELECT *
FROM [dbo].[Supplier];
UPDATE [dbo].[Supplier]
SET [Supplier Number] = REPLACE([Supplier Number], '"', '')
WHERE CHARINDEX('"', [Supplier Number]) <> 0;
SELECT *
FROM [dbo].[Supplier];
January 21, 2016 at 11:01 pm
Quick suggestion, a different and more efficient way of doing this
😎
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @SAMPLE_SIZE INT = 1000;
;WITH T(N) AS (SELECT N FROM ( VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) 0 AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
,SAMPLE_DATA(SD_ID,SD_STR) AS
(
SELECT
ROW_NUMBER() OVER
(
ORDER BY (SELECT NULL)
) AS SD_ID
,CASE
WHEN ABS(CHECKSUM(NEWID())) % 3 = 0 THEN '"'
ELSE ''
END
+ CHAR(48 + (ABS(CHECKSUM(NEWID())) % 10))
+ CHAR(48 + (ABS(CHECKSUM(NEWID())) % 10))
+ CHAR(48 + (ABS(CHECKSUM(NEWID())) % 10))
+ CASE
WHEN ABS(CHECKSUM(NEWID())) % 3 = 0 THEN '"'
ELSE ''
END AS SD_STR
FROM NUMS NM
)
,DEDUPED_DATA AS
(
SELECT
SD.SD_ID
,SD.SD_STR
,ROW_NUMBER() OVER
(
PARTITION BY REPLACE(SD.SD_STR,CHAR(49),'')
ORDER BY (SELECT NULL)
) AS DD_RID
FROM SAMPLE_DATA SD
)
SELECT
DD.SD_ID
,DD.SD_STR
FROM DEDUPED_DATA DD
-- CHANGE TO > 1 TO FIND THE DUPLICATES.
WHERE DD.DD_RID = 1;
January 22, 2016 at 7:59 am
@ringovski, you're welcome, happy you got something suitable going.
@Eirikur, agreed your solution should improve performance. It took me a second to cull out the test data creation from the solution to see what you were doing. I see a couple things I would call out about the results, 1. the replace uses CHAR(49) but I think it should use CHAR(34) to replace double-quotes. 2) I am not sure it is relevant to the solution but there does not seem to be a tiebreaker built in. ringovski is effectively using MIN(Supplier_Number) including the quotes and mine was using a MIN surrogate ID (I assumed there was one). I was thinking a tiebreaker could be implemented in the ORDER BY of the ROW_NUMBER() in DEDUPED_DATA.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply