March 28, 2014 at 8:14 am
andrew gothard (3/27/2014)
ScottPletcher (3/27/2014)
andrew gothard (3/27/2014)
ScottPletcher (3/27/2014)
yuvipoy (3/27/2014)
Sean Lange (3/27/2014)
True. I have worked with this OP on several threads and the problem is performance because they have several related tables and the primary key (and clustered index) is GUIDs on several tables each with millions of rows. I am certain they have fragmentation in the 90%+ range consistently given the amount of traffic it sounds like these tables have.Apart from GUID Column i dont have any columns unique.
for fragmentation we use to rebuild index when we delete some million of recrods.
while rebuilding the transaction (ldf and mdf)will grow huge these things are apart from the main problem.
If you delete by datetime, then cluster by that.
If not, this seems to be one of the relatively rare times when you should likely add an IDENTITY column and cluster on that. You will also have a nonclus, PK index on the GUID column, if you need to lookup by the GUID.
Then, when deleting, you can delete rows in batches by cluster key, say 100,000 at a time (or whatever value works best for keeping down the log size).
100 000 will escalate to a table lock. Batches of - say 4000 should prevent lock escalation if a table lock would be problematic
SQL would be able to use page locks if it does a key-range lock. I was hoping that deleting by the clustering key, as I stated above, would allow SQL to use page locks.
But I'll admit, often SQL doesn't use key-range locks as I would hope or expect it to :-D, so a table lock could result (assuming that SQL could actually acquire a table lock, otherwise it won't escalate anyway). Since, however, the pages would always be contiguous and linked, I would hope the lock wouldn't last that long anyway.
It will - up to a threshold of 5000 locks (3000 in 2000 and earlier). If you delete 100000 rows, you will escalate to a table lock, that's it (well -you can override it - but I'm not even going to post a hint as to how, because some idiot will try it, seriously, if you do, unless you're a SQL Server GOD, you will end your career) - this is by design. This is documented and deliberate behaviour. It doesn't happen by accident or randomly, and happens for a good reason. http://technet.microsoft.com/en-us/library/ms184286(v=sql.105).aspx
I know lock escalation is a little known feature, but it needs to be bourne in mind. I've seen too many cases of "We'll delete a few (hundreds of) million of rows and they're really old it doesn't matter, nothing's using them" from a third party and seeing a system just go down for hours.
Documented and deliberate behaviour, and done for a reason.
100,000 rows won't necessarily be 5,000 pages. If it's more than that, reduce it to 80,000 or whatever. I hoped if we specified the clus key range to delete from, SQL might do a key-range lock. If SQL decides to use a key-range lock, it can lock pages only, not individual rows.
Besides, SQL won't escalate to a table lock unless it can actually get that table lock. So saying it "will" escalate is not correct.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 29, 2014 at 4:35 pm
ScottPletcher (3/28/2014)
andrew gothard (3/27/2014)
ScottPletcher (3/27/2014)
andrew gothard (3/27/2014)
ScottPletcher (3/27/2014)
yuvipoy (3/27/2014)
Sean Lange (3/27/2014)
True. I have worked with this OP on several threads and the problem is performance because they have several related tables and the primary key (and clustered index) is GUIDs on several tables each with millions of rows. I am certain they have fragmentation in the 90%+ range consistently given the amount of traffic it sounds like these tables have.Apart from GUID Column i dont have any columns unique.
for fragmentation we use to rebuild index when we delete some million of recrods.
while rebuilding the transaction (ldf and mdf)will grow huge these things are apart from the main problem.
If you delete by datetime, then cluster by that.
If not, this seems to be one of the relatively rare times when you should likely add an IDENTITY column and cluster on that. You will also have a nonclus, PK index on the GUID column, if you need to lookup by the GUID.
Then, when deleting, you can delete rows in batches by cluster key, say 100,000 at a time (or whatever value works best for keeping down the log size).
100 000 will escalate to a table lock. Batches of - say 4000 should prevent lock escalation if a table lock would be problematic
SQL would be able to use page locks if it does a key-range lock. I was hoping that deleting by the clustering key, as I stated above, would allow SQL to use page locks.
But I'll admit, often SQL doesn't use key-range locks as I would hope or expect it to :-D, so a table lock could result (assuming that SQL could actually acquire a table lock, otherwise it won't escalate anyway). Since, however, the pages would always be contiguous and linked, I would hope the lock wouldn't last that long anyway.
It will - up to a threshold of 5000 locks (3000 in 2000 and earlier). If you delete 100000 rows, you will escalate to a table lock, that's it (well -you can override it - but I'm not even going to post a hint as to how, because some idiot will try it, seriously, if you do, unless you're a SQL Server GOD, you will end your career) - this is by design. This is documented and deliberate behaviour. It doesn't happen by accident or randomly, and happens for a good reason. http://technet.microsoft.com/en-us/library/ms184286(v=sql.105).aspx
I know lock escalation is a little known feature, but it needs to be bourne in mind. I've seen too many cases of "We'll delete a few (hundreds of) million of rows and they're really old it doesn't matter, nothing's using them" from a third party and seeing a system just go down for hours.
Documented and deliberate behaviour, and done for a reason.
100,000 rows won't necessarily be 5,000 pages. If it's more than that, reduce it to 80,000 or whatever. I hoped if we specified the clus key range to delete from, SQL might do a key-range lock. If SQL decides to use a key-range lock, it can lock pages only, not individual rows.
Besides, SQL won't escalate to a table lock unless it can actually get that table lock. So saying it "will" escalate is not correct.
Simple arithmetic, if you have 8k pages, one row per page (and we know that it's going to be bigger than that), 12500 rows. Table Lock (if one can be gained).
Yes, indeed the lock escalation might possibly fail. This would be an excellent reason for doing it properly. The difference in execution time might be a bit longer, sure, why is this an issue? It's efficient in log use and doesn't lock other processes out or fail if it can't escalate. What part of this are you against?
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
March 29, 2014 at 4:49 pm
yuvipoy (3/27/2014)
andrew gothard (3/27/2014)
Oh - and if you've made the mistake of update / delete cascade being set to on, do not delete from the parent table. Use an approach that deletes, in batches, from the lowest children to the highest parent appropriately. If you have cascade on and delete 5000 parent records with a number of generations of child tables - you're in for a lot of woe. Don't do it kids!
Ok say if i am having 10 tables with cascade Delete ON. All tables are linked with Fkey.
Say 7 of the tables are connected to a single table(F7 key tables--> 1 Master table), if i delete recrods from my child tables(7 tables) first and then my master table.Will not the master table will look in to child tables for records?
Time consumed in the master table delete is more even when i delete the child records first,why? So can i go with deleting master table straightly instead of deleting child table and then master.
ScottPletcher (3/27/2014)
If you delete by datetime, then cluster by that.
If not, this seems to be one of the relatively rare times when you should likely add an IDENTITY column and cluster on that. You will also have a nonclus, PK index on the GUID column, if you need to lookup by the GUID.
Then, when deleting, you can delete rows in batches by cluster key, say 100,000 at a time (or whatever value works best for keeping down the log size).
Kept my clusterindex on date key column
Deleting all the records on batches only 3000 records per batch, to delete 1/2 million row.
Still time consuming.I am first deleting child tables and then master table
Well, yes, if you have FK's on the tables, it'll check whether the delete breeches referential integrity, so if you have a delete that removes rows with value X in the key, it will check whether there are matching rows in the child table before it does the delete. If there are, it will fail (unless you have cascade set - with the associated potential locking issues).
As for the issue of "time consuming". Is it acceptable for the tables to be locked and the system effectively down, or for the delete to fail because it's unable to get an exclusive lock? If so, it's not a problem. If either is a problem, you probably need to do a batched delete. Ask the question Do I want to be in front of the boss answering the question "How long will the system be down for" with the answer "I don't know". In some situations that might be OK. In most it can be a bit career limiting.
Simple question, if it takes a bit longer to run, but no-one notices, because nothing goes wrong, is that a problem, if so, is it a worse problem than the system being down?
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
March 30, 2014 at 3:31 pm
andrew gothard (3/29/2014)
Simple arithmetic, if you have 8k pages, one row per page (and we know that it's going to be bigger than that), 12500 rows. Table Lock (if one can be gained).
Yes, indeed the lock escalation might possibly fail. This would be an excellent reason for doing it properly. The difference in execution time might be a bit longer, sure, why is this an issue? It's efficient in log use and doesn't lock other processes out or fail if it can't escalate. What part of this are you against?
You are again assuming SQL must do row locks. SQL can do page locks only under the right conditions.
I'm against deleting only 4K rows at a time when multiple millions need deleted because I think it breaks it into too small of pieces. That small a batch size is in itself inefficient because of the overhead of such a small transaction.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
March 31, 2014 at 8:15 am
Just want to say, your tag line is truly inspirational:
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
Simple but really, truer words were never spoken.
Thanks,
webrunner
-------------------
A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html
April 2, 2014 at 2:54 pm
ScottPletcher (3/30/2014)
andrew gothard (3/29/2014)
Simple arithmetic, if you have 8k pages, one row per page (and we know that it's going to be bigger than that), 12500 rows. Table Lock (if one can be gained).
Yes, indeed the lock escalation might possibly fail. This would be an excellent reason for doing it properly. The difference in execution time might be a bit longer, sure, why is this an issue? It's efficient in log use and doesn't lock other processes out or fail if it can't escalate. What part of this are you against?
You are again assuming SQL must do row locks. SQL can do page locks only under the right conditions.
I'm against deleting only 4K rows at a time when multiple millions need deleted because I think it breaks it into too small of pieces. That small a batch size is in itself inefficient because of the overhead of such a small transaction.
Ok - the engine, and the people who coded it disagree. Argue with them. " because I think it breaks it into too small of pieces". I think is not evidence. I've done this, tested, and convinced vendors to move to this approach with very busy systems (such as large call centres) where your approach is not acceptable from a business perspective. I'll accept this is not all businesses.
It's a process I've used time and time again - never a problem. Boils down to - if locking the table is ok, do what you want. If locking the table is wrong, batch your deletes appropriately.
I have no idea why you would specifically choose to lock a table in this situation though, when there's an easy way not to. I've seen that approach cause a LOT of pain.
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
April 2, 2014 at 2:59 pm
webrunner (3/31/2014)
Just want to say, your tag line is truly inspirational:I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
Simple but really, truer words were never spoken.
Thanks,
webrunner
Thank you very much. It was a throwaway line in a thread on something and Jeff Moden quite liked it too, so I thought I'd stick with it. Glad you like it. 🙂
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
April 2, 2014 at 3:07 pm
andrew gothard (4/2/2014)
ScottPletcher (3/30/2014)
andrew gothard (3/29/2014)
Simple arithmetic, if you have 8k pages, one row per page (and we know that it's going to be bigger than that), 12500 rows. Table Lock (if one can be gained).
Yes, indeed the lock escalation might possibly fail. This would be an excellent reason for doing it properly. The difference in execution time might be a bit longer, sure, why is this an issue? It's efficient in log use and doesn't lock other processes out or fail if it can't escalate. What part of this are you against?
You are again assuming SQL must do row locks. SQL can do page locks only under the right conditions.
I'm against deleting only 4K rows at a time when multiple millions need deleted because I think it breaks it into too small of pieces. That small a batch size is in itself inefficient because of the overhead of such a small transaction.
Ok - the engine, and the people who coded it disagree. Argue with them. " because I think it breaks it into too small of pieces". I think is not evidence. I've done this, tested, and convinced vendors to move to this approach with very busy systems (such as large call centres) where your approach is not acceptable from a business perspective. I'll accept this is not all businesses.
It's a process I've used time and time again - never a problem. Boils down to - if locking the table is ok, do what you want. If locking the table is wrong, batch your deletes appropriately.
I have no idea why you would specifically choose to lock a table in this situation though, when there's an easy way not to. I've seen that approach cause a LOT of pain.
You might want to read the section "Key-Range Locking" in Books Online, or in white papers in MSDN or selected blogs on the topic. You can explicitly specify page locking for key-range deletes that meet the qualifications, greatly lessening the chance that SQL will need to escalate to a table lock. Edit: In fact, I can't see why SQL would ever escalate to a table lock in a key-range locking situation, because of the extremely large number of small transactions.
I'm not saying your method won't work, just that it's likely causing much more overhead than is necessary to do those DELETEs.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 8 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply