September 27, 2018 at 7:00 am
DISCLAIMER: I'm a dev rather than a DBA.
We use Log4Net to capture database interactions - lots of INFO messages, a few WARNINGs and, crucially, ERRORs. Recently our website was misbehaving, and in an effort to diagnose it I looked at the Log4NetLog table (there are currently 17,756,271 rows in it).
I realised that there was a problem with the design of the table - the useful columns [Message] and [Exception] are nvarchar(4000) and nvarchar(2000) respectively. Our code was trying to write the Exception thrown by the application but because the column wasn't wide enough it was truncating the stack trace.
So I asked our data team to run the following code
ALTER TABLE [dbo].[Log4NetLog]
ALTER COLUMN [Message] nvarchar(MAX) NOT NULL;

ALTER TABLE [dbo].[Log4NetLog]
ALTER COLUMN [Exception] nvarchar(MAX) NOT NULL;
Unfortunately, because of the huge amount of data in the table this essentially took the database off-line, and thus killed the website.
I have two questions, if I may:
1. In the event of someone attempting to execute such a query in future, what is the best way to kill this running query? The database was out of action for half an hour - could this have been speeded up?
2. It would be really useful to be able to apply the DDL script above. To this end we plan a) to schedule it during the night (the website is only used in this country and has virtually no traffic at night), and b) to remove old records - say, any record older than one year. However, my instinct is that if we were to try to run a query like
DELETE
FROM
Log4NetLog
WHERE
([Date] < '27 September 2017')
that this would similarly crash the database. Are there any strategies that might be applied to remove huge numbers of superfluous records?
Many thanks
Edward
September 27, 2018 at 8:06 am
Edward
With 18 million rows, and not narrow ones at that, you'd have to expect it to take a long time. You could try creating a new table and moving the data into it instead - but that would be more to prevent excessive transaction log growth than to save time.
John
September 27, 2018 at 8:10 am
John Mitchell-245523 - Thursday, September 27, 2018 8:06 AM
Thanks John. I wondered whether there was a pattern for easily trimming unwanted data - an approach for doing this in easy stages. Old data could be deleted without impacting our reporting capabilities.
September 27, 2018 at 8:19 am
Edward
Yes, you can delete in batches. Just delete TOP (10000) (or the number of your choice) over and over until there are no more rows that you want to delete. You may want to allow a few seconds and/or run a CHECKPOINT and/or take a transaction log backup after each individual delete, in order to keep log growth in check.
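For example, something along these lines - just a sketch, assuming the table and the cut-off date from your post; adjust the batch size to suit, and use a log backup instead of CHECKPOINT if the database is in full recovery:

DECLARE @BatchSize int = 10000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.Log4NetLog
    WHERE [Date] < '20170927';

    IF @@ROWCOUNT = 0 BREAK;

    CHECKPOINT;               -- or BACKUP LOG ... if you're in full recovery
    WAITFOR DELAY '00:00:05'; -- brief pause so other work can get through
END;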
As for killing the running query, possibly not. Killing the query rolls back the transaction, which may take even longer than allowing it to finish.
John
September 27, 2018 at 8:28 am
John Mitchell-245523 - Thursday, September 27, 2018 8:19 AM
John
I discussed the deleting-in-batches scenario with others in my team, but I assumed that the queries to a) identify the rows to be deleted and b) delete them might have negative effects on DB performance. However, I'll have a look and see if I can get this to work.
As for killing the query - thanks for the information. We'll know next time!
Edward
September 27, 2018 at 8:38 am
Edward
That depends on your indexing. A log table is the perfect candidate to have its clustered index on the Date column. If that's what you have then DELETE TOP (10000) FROM Log4NetLog WHERE Date < '20170927' will be real quick.
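If you want to check what the clustered index is on, a quick look at the system views will tell you - for example:

SELECT i.name AS index_name, c.name AS column_name
FROM sys.indexes i
JOIN sys.index_columns ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.Log4NetLog')
    AND i.type_desc = 'CLUSTERED'
ORDER BY ic.key_ordinal;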
John
September 27, 2018 at 8:41 am
John Mitchell-245523 - Thursday, September 27, 2018 8:38 AM
Thanks John.
September 27, 2018 at 9:16 am
Here's an idea.
Create a view that selects the data you want to keep.
BCP out the data from the view.
Drop, or truncate the table.
BCP the data back in.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
September 27, 2018 at 9:20 am
Michael L John - Thursday, September 27, 2018 9:16 AM
Thank you - an interesting suggestion!
September 27, 2018 at 9:36 am
edwardwill - Thursday, September 27, 2018 9:20 AM
We have just completed a similar exercise to purge old data for GDPR.
As an example, the most recent was a table containing ~57 million records. We had to remove 7 million records.
This method took just under 30 minutes.
More details:
1. SELECT * into Table_OLD from Table
2. Create the view
3. BCP out the data. That took 6 minutes
4. Drop the table
5. Create the table, leaving off the keys and indexes
6. BCP in the data. That took 11 minutes
7. Re-create the PK, indexes, and foreign keys. That took 18 minutes
BCP in will be slower than out. Use native mode, not character mode.
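For reference, the commands look roughly like this - the server, database, file path and view name below are just placeholders for whatever you use:

bcp MyDatabase.dbo.vw_Log4NetLog_Keep out D:\Temp\Log4NetLog.dat -S MyServer -T -n
bcp MyDatabase.dbo.Log4NetLog in D:\Temp\Log4NetLog.dat -S MyServer -T -n -b 10000

The -n switch is native mode, and -b batches the import so it isn't one giant transaction.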
Here's another question. Do you need the MAX data types?
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
September 27, 2018 at 10:35 am
edwardwill - Thursday, September 27, 2018 8:28 AM
Trying to delete a large number of records in one go potentially has a rather more serious impact: once the delete takes out a certain number of row or page locks - around 5,000 - the engine escalates to a table lock, because otherwise the server would become resource-starved and eventually fall over (look up lock escalation). Batching - if the batches are small enough; I often use 3,000 - stops the table being locked.
I'm a DBA.
I'm not paid to solve problems. I'm paid to prevent them.
September 27, 2018 at 11:31 am
edwardwill - Thursday, September 27, 2018 7:00 AM
To avoid the issue when altering the columns, you could just add new columns. Adding the new columns would be very quick, since there is no validation that needs to be done - you would just need to update the code to use the new columns, or you could create a view that checks each column for NULL and returns the non-null value (e.g. COALESCE(MessageMax, Message) AS Message, COALESCE(ExceptionMax, Exception) AS Exception).
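A rough sketch of that - the MessageMax/ExceptionMax names and the view name are just illustrative, and the column list would include whatever else your table has:

ALTER TABLE dbo.Log4NetLog ADD MessageMax nvarchar(MAX) NULL;   -- nullable with no default, so metadata-only and quick
ALTER TABLE dbo.Log4NetLog ADD ExceptionMax nvarchar(MAX) NULL;
GO

CREATE VIEW dbo.vw_Log4NetLog
AS
SELECT [Date],   -- ...plus the rest of the table's columns
       COALESCE(MessageMax, [Message]) AS [Message],
       COALESCE(ExceptionMax, [Exception]) AS [Exception]
FROM dbo.Log4NetLog;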
Or - since this is just a logging table - you could create a new table with the appropriate data types under a new name. Then, at a quiet time, rename the existing table to an archive name and the new table to the original name. The rename should be very quick, and once it's completed your system will start logging to the new table.
If you did that - then deleting from the archive table would have zero impact on your application since that table is no longer being utilized. I would still recommend batching the deletes so you don't negatively impact the transaction log.
And one more item - if you can set up the new table with an appropriate partition scheme, you could eliminate any future issues of archiving/purging data by switching out the old partitions and creating new ones. You could set up the partitions on a monthly basis...
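A very rough sketch of that, with made-up boundary dates and object names - you'd create the new table on the scheme, and add a new boundary with SPLIT RANGE each month:

CREATE PARTITION FUNCTION pf_Log4NetLog_Monthly (datetime)
AS RANGE RIGHT FOR VALUES ('20180801', '20180901', '20181001');

CREATE PARTITION SCHEME ps_Log4NetLog_Monthly
AS PARTITION pf_Log4NetLog_Monthly ALL TO ([PRIMARY]);

-- Purging the oldest month later is then a metadata operation:
-- switch the partition into an empty table with the same structure
-- on the same filegroup, and truncate that table.
ALTER TABLE dbo.Log4NetLog SWITCH PARTITION 1 TO dbo.Log4NetLog_Purge;
TRUNCATE TABLE dbo.Log4NetLog_Purge;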
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
September 27, 2018 at 12:55 pm
If there aren't any foreign keys (and logging tables like this rarely use them), just sp_rename the existing table to something else and recreate a new one. That way you have all the existing data in an offline table should you want to keep any of it, and your system can go about its merry way. You can always perform a delete on the offline copy to clear out the stuff you don't need.
Going forward, have a regular process in place to purge/archive unneeded records before it all gets too large - either by just deleting stuff, or by partitioning the table and using partition switching to move data out of it.
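In other words, roughly (the archive name is just an example):

EXEC sp_rename 'dbo.Log4NetLog', 'Log4NetLog_Archive';
-- Recreate dbo.Log4NetLog here with the nvarchar(MAX) columns (script the
-- old table out first), then tidy up the offline copy at leisure, e.g.:
-- DELETE TOP (10000) FROM dbo.Log4NetLog_Archive WHERE [Date] < '20170927';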
September 28, 2018 at 12:59 am
Thank you everyone for all your helpful input. I believe that our team has a plan, now (and for Blackadder fans, it's a cunning one).
Best wishes
Edward