Ignore Violation of PRIMARY KEY constraint

  • In changing from MS Access to SQL server, I have some Access append queries (aka Insert statements), which append to MyTable with a primary key. The way these queries were (intentionally) set up in Access, they will regularly violate MyTable's PK. In Access, you have only to leave off the "dbfailonerror" designation when running the query, and Access will just insert the first record it comes to for each PK, ignoring the others and giving no error.

    So, there a switch somewhere in SQL server where I can similarly turn off the error and still get new records added?

    Here's my code:

    INSERT INTO DataIntegrityErrors ( Product,

    Problem, ProductItemNo, ChartPoint, MinDate, MaxDate, Qty, FinalsCell, MfgGroup, Source, ProductNounProblem )

    SELECT [YieldDataAllforTool].[ProductName],

    'Finals Yields not in FinalsProductInfo' AS Expr1, ProductCodes.Item,

    max([YieldDataAllforTool].ChartPoint), Min([YieldDataAllforTool].[TestDate]) AS [MinOfTestDate],

    Max([YieldDataAllforTool].[TestDate]) AS [MaxOfTestDate],

    Sum([YieldDataAllforTool].QtyTested) AS SumOfQtyTested,

    [YieldDataAllforTool].Cell, [YieldDataAllforTool].[MFGGroup],

    [YieldDataAllforTool].Source, [YieldDataAllforTool].[ProductName] + 'Finals Yields not in FinalsProductInfo' AS Expr2

    FROM (([YieldDataAllforTool] LEFT JOIN FinalsProductInfo

    ON [YieldDataAllforTool].[ProductName] = FinalsProductInfo.Product)

    LEFT JOIN DataIntegrityErrors

    ON [YieldDataAllforTool].[ProductName] = DataIntegrityErrors.Product)

    LEFT JOIN ProductCodes

    ON [YieldDataAllforTool].[ProductName] = ProductCodes.ProductName

    WHERE FinalsProductInfo.Product Is Null

    AND YieldDataAllforTool.TestDate>'5/28/2004'

    AND (DataIntegrityErrors.Ignore=0 Or DataIntegrityErrors.Ignore Is Null)

    AND YieldDataAllforTool.TestDate Between getDate()-30 And getDate()

    AND YieldDataAllforTool.MFGGroup In ('fnet mfg','fpm mfg','ind mfg','ptool mfg','biomed mfg')

    and YieldDataAllforTool.ProductName<>'default item'

    -- and not exists (select [YieldDataAllforTool].[ProductName] + 'Finals Yields not in FinalsProductInfo'

    -- from dataintegrityerrors)

    GROUP BY [YieldDataAllforTool].[ProductName], ProductCodes.Item, [YieldDataAllforTool].Cell, [YieldDataAllforTool].[MFGGroup], [YieldDataAllforTool].Source, [YieldDataAllforTool].[ProductName] + 'Finals Yields not in FinalsProductInfo', [YieldDataAllforTool].Company, [YieldDataAllforTool].Model

    The PK is ProductNounProblem. If this wasn't a calculated field within the query, I'd just do a left-join and add something like "where ProductNounProblem is null". As indicated by my commented-out EXISTS statement, I thought maybe this would do the trick, but it causes nothing to be inserted even when the PK is not violated.

    I can think of some other ways to handle this but just wondering what's the easiest way.

    [font="Comic Sans MS"]She Through Whom All Data Flows[/font]

  • You could use the ignore_dup_key option with a unique index and get a similar effect.

    create unique index ix_Uniq on MyTable (colA asc) with ignore_dup_key

    create table MyTable (colA int not null, bb int)

    insert into MyTable select 1, 2 union select 3, 4

    insert into MyTable select 4, 20 union select 4, 40

    -- gives message Duplicate key was ignored.

    -- one of the records with ColA 4 will be inserted

    Your primary key constraint would have to be removed.

  • Instead of intentionally violating the key (not something I'd encourage), why not just change the query slightly so that it selects the TOP 1 value and inserts that? You wouldn't have to hop through hoops that way.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Is the problem that there are two (or more) rows being generated with duplicate primary keys by your query or that the primary key already exists in the destination table or both?

  • Also, here is a snippet of code that you might be able to use to avoid PK violations:INSERT

    DataIntegrityErrors

    (

    Column1,

    ...

    ColumnN

    )

    SELECT

    T.Col1,

    ...

    T.ColN

    FROM

    (

    -- Insert current SELECT making sure to get

    -- rows without any duplicates PKs

    ) AS T

    LEFT OUTER JOIN

    DataIntegrityErrors

    ON T.PKColumn = DataIntegrityErrors .PKColumn

    WHERE

    DataIntegrityErrors .PKColumn IS NULL

  • Honestly, I would never ignore a row that produces a duplicate primary key. Primary keys are there to uniquely identify a row in a table when it comes to extracting, updating or deleting that row. In addition, I have never come across a table that contains only columns that are part of the primary key, they always have additional columns. So if you have duplicates, how do you decide which one of the rows carries the correct additional information? I'd say you better report on those duplicates BEFORE you attempt to ignore them, resolve them and the retry. Usually it will boil down to the system that the data was extracted from is buggy or contains duplicates (fix those at source), or update existing records on your side given the PK in your input.

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • No offense, but using the option in Access as a means of avoiding having to write the query to produce correct results is a horribly lazy way of getting something accomplished. If you need to not add any duplicate records, then don't generate any. You can easily achieve this by adding to the WHERE clause a NOT EXISTS (SELECT PRIMARY_KEY FROM EXISTING_TABLE AS ET WHERE ET.PRIMARY_KEY=SELECTED_DATA.PRIMARY_KEY) type of construct, substituting in the table aliases and fields as appropriate. That "option" to ignore duplicate key errors is the kind of thing that products like MS Access make all too easy to implement, and then when someone finally moves up to SQL Server, they end up trying to reproduce the same lazy methodology, which in the long run, is just not a good idea. There are lots of ways where someone could hang themselves out to dry by trying to over-minimize their own effort.

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • SHort answer: Lamprey13, I've gone with your subquery solution.

    Longer answer:

    ksullivan, don't want to delete the PK constraint.

    Grant, I can't use the TOP 1 option because the issue is records that already exist in the table, NOT duplicate records in the incoming data.

    Others: IMHO, there are perfectly valid reasons for wanting to ignore duplicate PK's. In this case, the table with the PK contains a list of product-specific data entry issues that need to be fixed. Say people on the manufacturing floor enter data on a product that my db has never heard of. This fact gets entered into my table with a PK (conceptually) something like "Product A -- missing from one of my lookup tables". I may not get around to fixing this issue right away, and would rather not have the issue logged multiple times.

    Yes, it would be nice to disallow data entry on products that I've never heard of; they should be set up first in the db. But realistically in a manufacturing environment, this will not fly, believe me.

    [font="Comic Sans MS"]She Through Whom All Data Flows[/font]

  • I hear you on the "won't fly" deal, but if you stop and think about it, that rather strongly suggests that there's a huge disconnect between the process being used and the database design to hold the information. Just because a manufacturing shift isn't operating during "normal" business hours shouldn't exempt it from an appropriate business process. It kind of sounds like the people entering the data are saying "My lack of planning DOES constitute an emergency on your part", and they're either getting away with it or they have tacit management approval for saying it.

    Sounds like an opportunity to me... to introduce a cost-saving, time-saving plan to ensure there's an appropriate process for managing that information at whatever hour of the day it might be needed. It may be that the company just doesn't want to acknowledge the "true cost" of doing business. One thought I have is to recode your application to attempt to do the insert, and if it fails on primary key, perhaps have some kind of additional unique information that's date/time based that can be grafted onto the the PK somehow, and then allow the data entry, and additionally, insert that same data into a special table for your review the next business day. Whatever you end up with, I do wish you luck with it. Sounds like the politics will be real fun... 😀 😀 😀

    Steve

    (aka sgmunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Yeah, I've been dealing with this for 20 years, and I long ago gave up beating my head against the wall. This is why I have developed the system that tells me: "oh ,they're entering data on a new product, I guess I'd better find out about it so I can get it set up". Not ideal, but the best we seem to be able to do, and it works OK.

    [font="Comic Sans MS"]She Through Whom All Data Flows[/font]

  • Ugh, even with the sub-query solution, it still isn't working as expected. I'm trying this, as suggested (greatly simplified from the full sql)

    To debug, I've commented out the INSERT part so it's just a select statement. But rows that clearly already exist in DataINtegrityError.PKColumn are still showing up.

    Is this because I'm joining on a derived value (a,b,c+d+'hello world' as PKColumn )? :unsure:

    INSERT

    DataIntegrityErrors

    (

    Column1,

    ...

    ColumnN

    )

    SELECT

    T.Col1,

    ...

    T.ColN

    FROM

    (

    select a,b,c+d+'hello world' as PKColumn from MyTable) AS T

    LEFT OUTER JOIN

    DataIntegrityErrors

    ON T.PKColumn = DataIntegrityErrors .PKColumn

    WHERE

    DataIntegrityErrors .PKColumn IS NULL

    [font="Comic Sans MS"]She Through Whom All Data Flows[/font]

  • Unless you have the 'hello world' text on the end of at least some of the data in the DataIntegrityErrors table, you'll never have a match to worry about avoiding, and thus, the only records you'll see are those in MyTable, which if I understand you correctly, also contain the problem records. At some point, you need to be a lot more specific about exactly what you are doing and provide an example with code that produces an identical problem result, with just fake data instead of the real stuff, and it has to reproduce all the elements of the joins in the real-world scenario, as well as getting all the data types accurate, you can leave out fields that don't participate in join conditions or where clauses for simplicity, but everything else would need to be 100% on the money, but just with fake data and even fake field names. It's rather difficult to try and help solve a problem that you can't reproduce because you don't have anything practical to work with. We can't be expected to create sample data because we're not the least bit knowledgable about what kinds of variations exist in your data. That's a task ONLY you can perform.

    Steve

    (aka smunson)

    :-):-):-)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I'm guessing the issue is that you are dealing with sets.. So as I mentioned in my first suggestion, your derived table has to only select DISTINCT primary keys or you will get duplicate PK violations..

    Again I don't know your data so I'm not sure what all the difficulties are, but here are 3 suggestions:

    1. Use the ROW_NUMBER() function to only select one row per PK.

    2. Create a temp table with primary key using the IGNORE DUP KEY option. Dump your select into that table, then use that to left outer join to in order to insert into your final table.

    3. There is the concept of late-arriving data (usually as it pertains to data warehousing). You might be able to take advantage of something like that to handle people entering things that don't exists. But, probably not if they can enter duplicates..

    Here is an example of how to use the ROW_NUMBER() function:DECLARE @Foo TABLE (ID INT, SomeVal INT, SomeOtherVal INT)

    INSERT @Foo

    SELECT 1, 1, 1

    UNION ALL SELECT 1, 2, 1

    UNION ALL SELECT 1, 2, 1

    UNION ALL SELECT 2, 3, 1

    UNION ALL SELECT 2, 4, 1

    UNION ALL SELECT 2, 5, 1

    UNION ALL SELECT 2, 6, 1

    UNION ALL SELECT 3, 7, 1

    UNION ALL SELECT 3, 8, 1

    UNION ALL SELECT 3, 9, 1

    SELECT

    ID,

    SomeVal,

    SomeOtherVal

    FROM

    (

    SELECT

    ID,

    SomeVal,

    SomeOtherVal,

    ROW_NUMBER() OVER (PARTITION BY ID ORDER BY SomeVal) AS RowNum

    FROM

    @Foo

    ) AS D

    WHERE

    RowNum = 1

  • Thanks for all of your help. The problem was (as Lamprey13 suggested) duplicate PK's in my incoming data.

    [font="Comic Sans MS"]She Through Whom All Data Flows[/font]

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply