Preventing Duplicate Rows On An Insert

  • Hi All,

    I'm having issues with duplicate rows being inserted into a table and need some thoughts on the best way to prevent these duplicates from occurring.

    The table in question has a structure like this (edited for simplicity):

    CREATE TABLE t (

    LogId INT NOT NULL IDENTITY(1,1) PRIMARY KEY,

    LocationId INT NOT NULL,

    ExternalId INT NULL

    );

    We must check for the existence of the row and if it doesn't exist perform the insert statement. The stored proc that performs the insert has a structure like this (again, edited for simplicity):

    DECLARE @LogId INT, @L INT, @E INT;

    SET @LogId = NULL;

    SET @L = 17;

    SET @E = 9999;

    BEGIN TRAN;

    SELECT @LogId = LogId FROM t WHERE LocationId = @L AND ExternalId = @E;

    IF @LogId IS NULL

    BEGIN;

    INSERT INTO t (LocationId, ExternalId) VALUES (@L, @E);

    <Updates to several other tables here>

    END;

    COMMIT TRAN;

    Most of the time this works ok, but when this proc is called many times in succession, duplicate entries are inserted. When multiple proc executions are performed in rapid succession the SELECT @LogId = .... returns NULL for each execution and since both are returning NULL each then goes on to execute the INSERT statement, resulting in duplicate entries.

    A unique constraint cannot be implemented since the combination of LocationId and ExternalId must be unique only for the cases when ExernalId is not null. There can be duplicates for a given LocationId and NULL value for ExternalId.

    Even if I were able to put a unique constraint on the columns in question, I would still like to have a more robust solution than having a constraint violation error be returned to the application.

    I have read some posts saying that a trigger could be put on the table to check for duplication on the insert and rollback the transaction if it is a duplicate. But this seems inefficient, in my view. I would rather prevent the insert in the first place.

    There are several instances of stored procs in our system with this particular structure and I'm looking for a solution that I can use to replace this existing method. (This is another reason I'd rather not use a trigger solution since a trigger would have to be implemented on many high traffic tables in the DB).

    I was thinking of using a transaction isolation level of serializable. But, I'm concerned that this will cause excessive blocking (and deadlocking) on the tables. I can live with some blocking, but am unsure if serializable will cause more problems that it would solve.

    What is the best approach to check for existence of a row and only perform the insert if the row doesn't exist, especially in an environment when the proc is called many, many times within a short time frame?

    Any help would be most appreciated.

    Thanks,

    Peter

  • Why not just use a LEFT OUTER JOIN to check for existence?

    DECLARE @LogId INT, @L INT, @E INT;

    SET @LogId = NULL;

    SET @L = 17;

    SET @E = 9999;

    INSERT INTO t (LocationId, ExternalId)

    SELECT @L, @E

    LEFT OUTER JOIN t

    ON @L = t.LocationId

    AND @E = t.ExternalId

    WHERE t.LogId IS NULL;

    NULL never matches NULL so as long as you don't use a ISNULL() function on t.ExternalId and @E, this code should work for you. (But without testing I make no guarantees).

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • AZ Pete (8/15/2014)

    A unique constraint cannot be implemented since the combination of LocationId and ExternalId must be unique only for the cases when ExernalId is not null. There can be duplicates for a given LocationId and NULL value for ExternalId.

    Even if I were able to put a unique constraint on the columns in question, I would still like to have a more robust solution than having a constraint violation error be returned to the application.

    I'm not sure why you'd want to write something on your own, when a constraint built into SQL can do what you need by just adding a computed, persisted column. Admittedly you'd prefer that your app capture it more gracefully, but for now you can name the constraint so that it provides a reasonably good error message.

    ALTER TABLE <table_name>

    ADD ExternalId_Unique AS CASE WHEN ExternalId IS NULL THEN LogId ELSE 0 END PERSISTED

    ALTER TABLE <table_name>

    ADD CONSTRAINT <table_name>_...

    UNIQUE NONCLUSTERED ( LocationId, ExternalId_Unique ) WITH ( FILLFACTOR = ..., SORT_IN_TEMPDB = ON ) ON [<filegroup>]

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • INSERT INTO t (LocationId, ExternalId) VALUES (@L, @E);

    where not exists (

    select 1 from t

    where t.LocationId = @L

    and t.ExternalId = @E

    )

  • I THINK the EXISTS would be faster than the LEFT join, but both are v.easy to read.

    For the unique constraint. I think you could create a filtered unique index with a where clause EXternalID is not null.

    So it only includes ExternalID when it is not null.

    I've never used filtered indices though.

    You might also consider using the MERGE statement.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply