August 25, 2011 at 1:19 pm
In my current project I have encountered an interesting situation. I have designed and developed a process to purge old data from several related databases. In the testing portion it has been determined that it is not performing well enough so I've looked for ways to improve performance. The main issue is that database is a normalized OLTP database with many foreign keys defined and there is one table (the final one I delete from) that is the referenced table for 48 foreign keys. A recent test run to delete 100 rows from this table took ~90 minutes (there are thousands of rows to be deleted). One of the methods I decided to test was disabling the foreign keys and then re-enabling them after the delete. The code is like this:
/* Run this for 48 FK's */
ALTER TABLE referenced_table NOCHECK fk_name;
/* This is actually in an SP */
Delete from referenced_table where exists (Select 1 from staging_table ST WHERE ST.PK = referenced_table.PK);
/* Run this for 48 fk's */
ALTER TABLE referenced_table WITH CHECK CHECK fk_name
Notice that when I re-enable the FK's I'm using the WITH CHECK syntax so that the FK's are enabled and trusted. A query of sys.foreign_keys verifies this.
This code is being run using SSIS with 3 Execute SQL Tasks - disable FK's - delete - enable FK's. Imagine my delight when I implemented this and the same 100 row delete was SUB-SECOND! I disabled the tasks that disabled and enabled the FK's to run the test again without them to again verify that this is what sped up the process and to my surprise the delete ran JUST AS FAST. This didn't seem right to me so I loaded up the SP in SQL Sentry Plan Explorer and asked for the estimated execution plan while I had a saved execution plan previous time loaded in another window. The new execution plan did NOT have the nodes checking the FK's in it, while the old one did. By adding WITH RECOMPILE to the SP in the new session I could get the old execution plan with the nodes for the FK checks. Now this isn't a bad thing, I mean it is faster and that's what I want. But then I thought, what if SQL Server actually allowed the deletes to take place because the FK is not being checked?
So I decided to do some testing. Attached is a test that, on my PC, Windows Server 2008R2 and SQL Server 2008 SP2 64-bit, shows that there are times an enabled, trusted FK is not enforced. Please try it out and let me know what happens & what you think.
The original box that I found this on is a SQL Server 2005 SP4
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 25, 2011 at 1:31 pm
Have you done any test queries to see if there are any FK mismatches?
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
August 25, 2011 at 1:37 pm
Do you mean in the real DB?
If that's what you mean, no I haven't, but if there were re-enabling the FK would fail and I'd get an error. That doesn't fail.
I will run the new DBCC command I learned during this DBCC CHECKCONSTRAINTS. That's easier than writing the 48 queries.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 25, 2011 at 1:41 pm
Jack Corbett (8/25/2011)
Do you mean in the real DB?If that's what you mean, no I haven't, but if there were re-enabling the FK would fail and I'd get an error. That doesn't fail.
I will run the new DBCC command I learned during this DBCC CHECKCONSTRAINTS. That's easier than writing the 48 queries.
The re-enabling of the FK might be where the bug lies. I seem to recall encountering a situation where there were FKs that didn't match up and wondering why it allowed the FK, but it was awhile ago and I don't remember the version or specifics, I just remember feeling confounded.
--------------------------------------
When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
--------------------------------------
It’s unpleasantly like being drunk.
What’s so unpleasant about being drunk?
You ask a glass of water. -- Douglas Adams
August 25, 2011 at 7:11 pm
Hi Jack,
This is a horrible ROLLBACK bug, a bit like the one Remi found a week or so ago. Well ok so it's more of a plan caching bug really, but the ROLLBACK handling is a big part of it. You should report this on Connect. I modified your script a bit to expose the bug further:
USE tempdb
GO
IFOBJECT_ID(N'dbo.child') IS NOT NULL DROP TABLE dbo.child
IFOBJECT_ID(N'dbo.parent') IS NOT NULL DROP TABLE dbo.parent
IFOBJECT_ID(N'dbo.DeleteParent') IS NOT NULL DROP PROCEDURE dbo.DeleteParent
GO
-- Clear plan cache
DBCC FREEPROCCACHE
GO
-- Create tables
CREATE TABLE dbo.parent (id INTEGER PRIMARY KEY)
CREATE TABLE dbo.child (parent_id INTEGER REFERENCES dbo.parent)
GO
-- Add six rows to each
INSERT dbo.parent (id) VALUES (1), (2), (3), (4), (5), (6)
INSERT dbo.child (parent_id) VALUES (1), (2), (3), (4), (5), (6)
GO
-- Procedure to delete a parent record by key
CREATE PROCEDURE dbo.DeleteParent (@id INTEGER)
AS DELETE dbo.parent WHERE dbo.parent.id = @id
GO
BEGIN TRANSACTION
ALTER TABLE dbo.child NOCHECK CONSTRAINT ALL
-- Cache ad-hoc plan stubs and parameterized plans
DELETE dbo.parent WHERE id = 1
EXECUTE dbo.DeleteParent @id = 2
EXECUTE sys.sp_executesql
@stmt = N'MERGE dbo.parent AS p USING (VALUES(@key)) AS u (k) ON p.id = u.k WHEN MATCHED THEN DELETE;',
@params = N'@key INTEGER',
@key = 3;
-- No errors
ROLLBACK
GO
-- Plan cache
SELECT
decp.cacheobjtype,
decp.objtype,
dest.[text],
deqp.query_plan
FROM sys.dm_exec_cached_plans AS decp
CROSS APPLY sys.dm_exec_sql_text (decp.plan_handle) AS dest
CROSS APPLY sys.dm_exec_query_plan (decp.plan_handle) AS deqp
GO
-- FK constraint is still enabled and checked
SELECT
constraint_name = OBJECT_NAME(fk.object_id),
referencing_table = OBJECT_NAME(fk.parent_object_id),
referenced_table = OBJECT_NAME(fk.referenced_object_id),
fk.is_disabled,
fk.is_not_trusted
FROM sys.foreign_keys AS fk
GO
-- All succeed (using non-FK-checking cached plans) but should fail (notice different parameter values)
DELETE dbo.parent WHERE id = 1
DELETE dbo.parent WHERE id = 2
EXECUTE dbo.DeleteParent @id = 3
EXECUTE dbo.DeleteParent @id = 4
GO
-- Correctly checks the constraint (yay!)
MERGE dbo.parent AS p USING (VALUES(5)) AS u (k) ON p.id = u.k WHEN MATCHED THEN DELETE;
GO
-- Uh oh!
EXECUTE sys.sp_executesql
@stmt = N'MERGE dbo.parent AS p USING (VALUES(@key)) AS u (k) ON p.id = u.k WHEN MATCHED THEN DELETE;',
@params = N'@key INTEGER',
@key = 6;
GO
-- Show broken constraint values
DBCC CHECKCONSTRAINTS (N'dbo.child') WITH ALL_CONSTRAINTS
edit: The bug is not fixed in Denali CTP 3. I also tested on R2 SP1 CU2.
August 26, 2011 at 12:25 am
According to your script, IMO it is quit normal DRI checks may take a long time.
One of my standard advises for DRI is to support FK with matching indexes, meaning put an index on the child table that matches the PK of the parent table 100% ( N columns + column order + asc/desc). (or the used AK for DRI)
I didn't find those on the published script.
( You can opt to disable them during regular ops time, and only enable them when your delete cycle has to be run )
Can you test your situation having these FK-indexes in place on a larger volume ?
Maybe this can help you enough, but it doesn't debug the situation.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
August 26, 2011 at 5:06 am
SQL 2008 SP1 (I keep seeing R2 in the posts, so I don't know if anyone tested this yet, or not).
Results for initial constraint test (right after tables are created):
nameSCHEMAreferenced_tablereferencing_tableis_disabledis_not_trusted
FK_child_parentdboparentchild00
So, FK is enabled and is trusted, if I'm reading this correctly.
Run Try...Catch.
ErrorNumberErrorMessage
547The ALTER TABLE statement conflicted with the FOREIGN KEY constraint "FK_child_parent".
The conflict occurred in database "fk_test", table "dbo.parent", column 'id'.
This is good, right?
Second check of constraint is the same as above. Now trying delete.
ErrorNumberErrorMessage
547The DELETE statement conflicted with the REFERENCE constraint "FK_child_parent".
The conflict occurred in database "fk_test", table "dbo.child", column 'parent_id'.
Again, this looks good. Third check of constraint is same as the first. Now trying delete with hard-coded value. Getting the same ALTER TABLE complaint. Also getting:
Trying delete of parent without deleting child with hard-coded value and FK disabled then enabled
Begin Tran 3
Disable FK
Delete 3 - hard coded value
(0 row(s) affected)
Enable FK
******************* ERROR *******************
(1 row(s) affected)
Rollback tran 3
The fourth check of the constraint is the same as the first. Trying the next delete. No error received.
FK Enabled & Trusted
Trying delete of parent without deleting child with hard-coded value FK enabled
Begin Tran 4
Delete with hard-coded value FK-enabled
(1 row(s) affected)
******************* DELETE SUCCESSFUL!? How is that? *******************
Commit Tran 4
Curiousier and Curiousier.
Running the last bit of code, I get
Return table results
(1 row(s) affected)
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
child_idparent_id_in_childparent_id
11NULL
TableConstraintWhere
[dbo].[child][FK_child_parent][parent_id] = '1'
Paul, I admit I don't quite understand your code. The Plan Cache bit, for instance. What's the purpose of doing that?
And are you saying that using the MERGE statement works, but everything else doesn't work the way it should?
August 26, 2011 at 6:31 am
Brandie Tarvin (8/26/2011)
Paul, I admit I don't quite understand your code. The Plan Cache bit, for instance. What's the purpose of doing that?And are you saying that using the MERGE statement works, but everything else doesn't work the way it should?
The issue is that SQL Server caches a plan that doesn't check the FK constraint. Future queries that parameterize to the same shape execute incorrectly. The point of displaying the plan cache is to show the parameterized form of the SQL, and the XML plans that go with it (no FK checks). MERGE also fails if it gets parameterized - that's what the sp_executesql example is for.
I wrote the script primarily to illustrate what I believe to be the core issue (caching) for the Microsoft people that would read a Connect item. If you ignore the plan cache bit, the rest of it should make sense, and worry you a lot.
The problem with this bug is that there is no way to know if a rogue plan is in cache. If it is, any statement, procedure or whatever that you write, might not check FK constraints. That's the worst sort of data-corrupting bug IMO.
August 26, 2011 at 6:50 am
Okay. That makes sense. Thank you for clarifying.
August 26, 2011 at 6:52 am
I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.
August 26, 2011 at 8:38 am
SQLkiwi (8/26/2011)
I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.
Paul,
I'm very happy that you took the time to look at this. You have confirmed exactly what I thought the problem was. I also suspected it was reuse of the plan cached by the delete when the FK is disabled, I just wasn't confident that I understood the internals well enough to be correct about it.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 26, 2011 at 8:44 am
ALZDBA (8/26/2011)
According to your script, IMO it is quit normal DRI checks may take a long time.One of my standard advises for DRI is to support FK with matching indexes, meaning put an index on the child table that matches the PK of the parent table 100% ( N columns + column order + asc/desc). (or the used AK for DRI)
I didn't find those on the published script.
( You can opt to disable them during regular ops time, and only enable them when your delete cycle has to be run )
Can you test your situation having these FK-indexes in place on a larger volume ?
Maybe this can help you enough, but it doesn't debug the situation.
I didn't put the indexes on because in reality I don't think they make a difference when it comes to the bug.
I originally found the issue running against a table with millions of rows with 48 fk's defined against it with many of those tables having millions of rows. I didn't have any orphaned rows in the original delete but I did see the plan re-use which caused me to develop the simple test script.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 26, 2011 at 8:49 am
SQLkiwi (8/26/2011)
I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.
I'll just up vote and confirm this one. Since your an MVP you might add a little more weight to it.
Thanks again for the verification.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 26, 2011 at 3:24 pm
Jack Corbett (8/26/2011)
SQLkiwi (8/26/2011)
I added a quick Connect item for this bug, to let the powers that be know. I'll happily remove it if and when Jack enters one.I'll just up vote and confirm this one. Since your an MVP you might add a little more weight to it.
Thanks again for the verification.
The only weight it needs is to blow up when they run it. That breaks their sla so they have to fix it.
August 26, 2011 at 3:29 pm
Ninja's_RGR'us (8/26/2011)
The only weight it needs is to blow up when they run it. That breaks their sla so they have to fix it.
That may be true, but when it gets fixed and for what release it gets fixed may depend some on who reported it. (A bug I reported was fixed in Denali and closed in Connect. That doesn't really do me any good at this point.)
Of course mine wasn't any where near as serious as this, so I imagine this one will get fixed in all supported versions fairly quickly. (At least I hope it will.)
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply