Potential Foreign Key Bug

  • In my current project I have encountered an interesting situation. I have designed and developed a process to purge old data from several related databases. In the testing portion it has been determined that it is not performing well enough so I've looked for ways to improve performance. The main issue is that database is a normalized OLTP database with many foreign keys defined and there is one table (the final one I delete from) that is the referenced table for 48 foreign keys. A recent test run to delete 100 rows from this table took ~90 minutes (there are thousands of rows to be deleted). One of the methods I decided to test was disabling the foreign keys and then re-enabling them after the delete. The code is like this:

    /* Run this for 48 FK's */

    ALTER TABLE referenced_table NOCHECK fk_name;

    /* This is actually in an SP */

    Delete from referenced_table where exists (Select 1 from staging_table ST WHERE ST.PK = referenced_table.PK);

    /* Run this for 48 fk's */

    ALTER TABLE referenced_table WITH CHECK CHECK fk_name

    Notice that when I re-enable the FK's I'm using the WITH CHECK syntax so that the FK's are enabled and trusted. A query of sys.foreign_keys verifies this.

    This code is being run using SSIS with 3 Execute SQL Tasks - disable FK's - delete - enable FK's. Imagine my delight when I implemented this and the same 100 row delete was SUB-SECOND! I disabled the tasks that disabled and enabled the FK's to run the test again without them to again verify that this is what sped up the process and to my surprise the delete ran JUST AS FAST. This didn't seem right to me so I loaded up the SP in SQL Sentry Plan Explorer and asked for the estimated execution plan while I had a saved execution plan previous time loaded in another window. The new execution plan did NOT have the nodes checking the FK's in it, while the old one did. By adding WITH RECOMPILE to the SP in the new session I could get the old execution plan with the nodes for the FK checks. Now this isn't a bad thing, I mean it is faster and that's what I want. But then I thought, what if SQL Server actually allowed the deletes to take place because the FK is not being checked?

    So I decided to do some testing. Attached is a test that, on my PC, Windows Server 2008R2 and SQL Server 2008 SP2 64-bit, shows that there are times an enabled, trusted FK is not enforced. Please try it out and let me know what happens & what you think.

    The original box that I found this on is a SQL Server 2005 SP4

  • Have you done any test queries to see if there are any FK mismatches?

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Do you mean in the real DB?

    If that's what you mean, no I haven't, but if there were re-enabling the FK would fail and I'd get an error. That doesn't fail.

    I will run the new DBCC command I learned during this DBCC CHECKCONSTRAINTS. That's easier than writing the 48 queries.

  • Jack Corbett (8/25/2011)


    Do you mean in the real DB?

    If that's what you mean, no I haven't, but if there were re-enabling the FK would fail and I'd get an error. That doesn't fail.

    I will run the new DBCC command I learned during this DBCC CHECKCONSTRAINTS. That's easier than writing the 48 queries.

    The re-enabling of the FK might be where the bug lies. I seem to recall encountering a situation where there were FKs that didn't match up and wondering why it allowed the FK, but it was awhile ago and I don't remember the version or specifics, I just remember feeling confounded.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • Hi Jack,

    This is a horrible ROLLBACK bug, a bit like the one Remi found a week or so ago. Well ok so it's more of a plan caching bug really, but the ROLLBACK handling is a big part of it. You should report this on Connect. I modified your script a bit to expose the bug further:

    USE tempdb

    GO

    IFOBJECT_ID(N'dbo.child') IS NOT NULL DROP TABLE dbo.child

    IFOBJECT_ID(N'dbo.parent') IS NOT NULL DROP TABLE dbo.parent

    IFOBJECT_ID(N'dbo.DeleteParent') IS NOT NULL DROP PROCEDURE dbo.DeleteParent

    GO

    -- Clear plan cache

    DBCC FREEPROCCACHE

    GO

    -- Create tables

    CREATE TABLE dbo.parent (id INTEGER PRIMARY KEY)

    CREATE TABLE dbo.child (parent_id INTEGER REFERENCES dbo.parent)

    GO

    -- Add six rows to each

    INSERT dbo.parent (id) VALUES (1), (2), (3), (4), (5), (6)

    INSERT dbo.child (parent_id) VALUES (1), (2), (3), (4), (5), (6)

    GO

    -- Procedure to delete a parent record by key

    CREATE PROCEDURE dbo.DeleteParent (@id INTEGER)

    AS DELETE dbo.parent WHERE dbo.parent.id = @id

    GO

    BEGIN TRANSACTION

    ALTER TABLE dbo.child NOCHECK CONSTRAINT ALL

    -- Cache ad-hoc plan stubs and parameterized plans

    DELETE dbo.parent WHERE id = 1

    EXECUTE dbo.DeleteParent @id = 2

    EXECUTE sys.sp_executesql

    @stmt = N'MERGE dbo.parent AS p USING (VALUES(@key)) AS u (k) ON p.id = u.k WHEN MATCHED THEN DELETE;',

    @params = N'@key INTEGER',

    @key = 3;

    -- No errors

    ROLLBACK

    GO

    -- Plan cache

    SELECT

    decp.cacheobjtype,

    decp.objtype,

    dest.[text],

    deqp.query_plan

    FROM sys.dm_exec_cached_plans AS decp

    CROSS APPLY sys.dm_exec_sql_text (decp.plan_handle) AS dest

    CROSS APPLY sys.dm_exec_query_plan (decp.plan_handle) AS deqp

    GO

    -- FK constraint is still enabled and checked

    SELECT

    constraint_name = OBJECT_NAME(fk.object_id),

    referencing_table = OBJECT_NAME(fk.parent_object_id),

    referenced_table = OBJECT_NAME(fk.referenced_object_id),

    fk.is_disabled,

    fk.is_not_trusted

    FROM sys.foreign_keys AS fk

    GO

    -- All succeed (using non-FK-checking cached plans) but should fail (notice different parameter values)

    DELETE dbo.parent WHERE id = 1

    DELETE dbo.parent WHERE id = 2

    EXECUTE dbo.DeleteParent @id = 3

    EXECUTE dbo.DeleteParent @id = 4

    GO

    -- Correctly checks the constraint (yay!)

    MERGE dbo.parent AS p USING (VALUES(5)) AS u (k) ON p.id = u.k WHEN MATCHED THEN DELETE;

    GO

    -- Uh oh!

    EXECUTE sys.sp_executesql

    @stmt = N'MERGE dbo.parent AS p USING (VALUES(@key)) AS u (k) ON p.id = u.k WHEN MATCHED THEN DELETE;',

    @params = N'@key INTEGER',

    @key = 6;

    GO

    -- Show broken constraint values

    DBCC CHECKCONSTRAINTS (N'dbo.child') WITH ALL_CONSTRAINTS

    edit: The bug is not fixed in Denali CTP 3. I also tested on R2 SP1 CU2.

  • According to your script, IMO it is quit normal DRI checks may take a long time.

    One of my standard advises for DRI is to support FK with matching indexes, meaning put an index on the child table that matches the PK of the parent table 100% ( N columns + column order + asc/desc). (or the used AK for DRI)

    I didn't find those on the published script.

    ( You can opt to disable them during regular ops time, and only enable them when your delete cycle has to be run )

    Can you test your situation having these FK-indexes in place on a larger volume ?

    Maybe this can help you enough, but it doesn't debug the situation.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • SQL 2008 SP1 (I keep seeing R2 in the posts, so I don't know if anyone tested this yet, or not).

    Results for initial constraint test (right after tables are created):

    nameSCHEMAreferenced_tablereferencing_tableis_disabledis_not_trusted

    FK_child_parentdboparentchild00

    So, FK is enabled and is trusted, if I'm reading this correctly.

    Run Try...Catch.

    ErrorNumberErrorMessage

    547The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_child_parent".

    The conflict occurred in database "fk_test", table "dbo.parent", column 'id'.

    This is good, right?

    Second check of constraint is the same as above. Now trying delete.

    ErrorNumberErrorMessage

    547The DELETE statement conflicted with the REFERENCE constraint "FK_child_parent".

    The conflict occurred in database "fk_test", table "dbo.child", column 'parent_id'.

    Again, this looks good. Third check of constraint is same as the first. Now trying delete with hard-coded value. Getting the same ALTER TABLE complaint. Also getting:

    Trying delete of parent without deleting child with hard-coded value and FK disabled then enabled

    Begin Tran 3

    Disable FK

    Delete 3 - hard coded value

    (0 row(s) affected)

    Enable FK

    ******************* ERROR *******************

    (1 row(s) affected)

    Rollback tran 3

    The fourth check of the constraint is the same as the first. Trying the next delete. No error received.

    FK Enabled & Trusted

    Trying delete of parent without deleting child with hard-coded value FK enabled

    Begin Tran 4

    Delete with hard-coded value FK-enabled

    (1 row(s) affected)

    ******************* DELETE SUCCESSFUL!? How is that? *******************

    Commit Tran 4

    Curiousier and Curiousier.

    Running the last bit of code, I get

    Return table results

    (1 row(s) affected)

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    child_idparent_id_in_childparent_id

    11NULL

    TableConstraintWhere

    [dbo].[child][FK_child_parent][parent_id] = '1'

    Paul, I admit I don't quite understand your code. The Plan Cache bit, for instance. What's the purpose of doing that?

    And are you saying that using the MERGE statement works, but everything else doesn't work the way it should?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (8/26/2011)


    Paul, I admit I don't quite understand your code. The Plan Cache bit, for instance. What's the purpose of doing that?

    And are you saying that using the MERGE statement works, but everything else doesn't work the way it should?

    The issue is that SQL Server caches a plan that doesn't check the FK constraint. Future queries that parameterize to the same shape execute incorrectly. The point of displaying the plan cache is to show the parameterized form of the SQL, and the XML plans that go with it (no FK checks). MERGE also fails if it gets parameterized - that's what the sp_executesql example is for.

    I wrote the script primarily to illustrate what I believe to be the core issue (caching) for the Microsoft people that would read a Connect item. If you ignore the plan cache bit, the rest of it should make sense, and worry you a lot.

    The problem with this bug is that there is no way to know if a rogue plan is in cache. If it is, any statement, procedure or whatever that you write, might not check FK constraints. That's the worst sort of data-corrupting bug IMO.

  • Okay. That makes sense. Thank you for clarifying.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.

    https://connect.microsoft.com/SQLServer/feedback/details/685800/parameterized-delete-and-merge-allow-foreign-key-constraint-violations

  • SQLkiwi (8/26/2011)


    I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.

    https://connect.microsoft.com/SQLServer/feedback/details/685800/parameterized-delete-and-merge-allow-foreign-key-constraint-violations

    Paul,

    I'm very happy that you took the time to look at this. You have confirmed exactly what I thought the problem was. I also suspected it was reuse of the plan cached by the delete when the FK is disabled, I just wasn't confident that I understood the internals well enough to be correct about it.

  • ALZDBA (8/26/2011)


    According to your script, IMO it is quit normal DRI checks may take a long time.

    One of my standard advises for DRI is to support FK with matching indexes, meaning put an index on the child table that matches the PK of the parent table 100% ( N columns + column order + asc/desc). (or the used AK for DRI)

    I didn't find those on the published script.

    ( You can opt to disable them during regular ops time, and only enable them when your delete cycle has to be run )

    Can you test your situation having these FK-indexes in place on a larger volume ?

    Maybe this can help you enough, but it doesn't debug the situation.

    I didn't put the indexes on because in reality I don't think they make a difference when it comes to the bug.

    I originally found the issue running against a table with millions of rows with 48 fk's defined against it with many of those tables having millions of rows. I didn't have any orphaned rows in the original delete but I did see the plan re-use which caused me to develop the simple test script.

  • SQLkiwi (8/26/2011)


    I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.

    https://connect.microsoft.com/SQLServer/feedback/details/685800/parameterized-delete-and-merge-allow-foreign-key-constraint-violations

    I'll just up vote and confirm this one. Since your an MVP you might add a little more weight to it.

    Thanks again for the verification.

  • Jack Corbett (8/26/2011)


    SQLkiwi (8/26/2011)


    I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.

    https://connect.microsoft.com/SQLServer/feedback/details/685800/parameterized-delete-and-merge-allow-foreign-key-constraint-violations

    I'll just up vote and confirm this one. Since your an MVP you might add a little more weight to it.

    Thanks again for the verification.

    The only weight it needs is to blow up when they run it. That breaks their sla so they have to fix it.

  • Ninja's_RGR'us (8/26/2011)


    The only weight it needs is to blow up when they run it. That breaks their sla so they have to fix it.

    That may be true, but when it gets fixed and for what release it gets fixed may depend some on who reported it. (A bug I reported was fixed in Denali and closed in Connect. That doesn't really do me any good at this point.)

    Of course mine wasn't any where near as serious as this, so I imagine this one will get fixed in all supported versions fairly quickly. (At least I hope it will.)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply