January 14, 2024 at 7:05 pm
Just to set expectations and keep readers aware: I am not a DBA, but I am learning and working through this as I face the issue.
Scenario:
We have a 2 TB database with a heavy load of activity during the first and second weeks of each month. Data also keeps accumulating: the first month's ETL loads about 4 million records, the next month adds another 4 million, and this trend continues through the year, so every month's processing grows because it reprocesses data from the start of the year. In addition, history is preserved on the source and in the transactional / reporting tables. The biggest object is currently about 250 GB (roughly 400 million records). Because the processing is irregular, the transaction log requirement varies across different periods, and I could not find any formula for what the transaction log limit should be. So, based on common analysis and observation, I set the transaction log to double the size of the biggest object - 500 GB initially (to be revisited at regular intervals). That works fine for most of the year, even with fluctuations in transaction volume, but at the end of the year or the start of the next, when a huge amount of data gets processed, this limit is not enough.
The process then fails due to lack of log space - the log fills up within two to three days at the start of the week once it crosses the limit, and we are forced to add space because the users need the database highly available and on strict timelines. (Note - differential backups are in place and a full backup happens on the weekend.)
Questions –
If we increase the log size, it stays set at that size and getting the space back is an issue, so we end up dedicating more log space (for example 1 TB to handle this case) that sits unused for most of the year. How can we be more conservative, or what steps can be taken to control this?
Will the steps below work, or what else do we need?
- Reviewing long-running transactions and avoiding INSERT / DELETE / UPDATE work when it is not required
- Removing unwanted indexes on the big objects
- CDC is enabled on the database - checking for ways to reduce its log utilization
- Archiving data from the big objects
Question on shrinking the log - there are different opinions on this. What is the impact of building the log shrink into the ETL itself, rather than shrinking only when required (when issues happen) to free up the log?
Are there other approaches you would check from your experience? Thanks.
January 14, 2024 at 8:31 pm
My opinion - predicting log file usage is tricky and often not a good use of resources (your time and energy). I've seen systems that have zero transactions one hour, and thus nothing added to the log file, and then millions of transactions the next hour and a much larger tlog file. So, if I need to predict the size needed, my approach is to look at the size on a specific date (such as the 1st) and then again the following month, assuming I am doing regular backups, am in full recovery mode, and that the month is an "average" month. For an ETL server (aka a reporting server), I would NOT be in full recovery, but you may need to be full for your environment.
IF this is an ETL server, I imagine it is mostly used for reporting and doesn't have much data going in outside of the ETL data. If that is the case, you could probably change the recovery model to simple, do a shrink on the log after the ETL finishes, and then watch the size grow throughout the month until the next ETL load. Then you will know roughly how large the log file needs to be to last the month, and you can shrink it after the ETL to some size larger than that value (so you have wiggle room).
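A minimal sketch of that sequence, assuming a database called YourDB with a logical log file name of YourDB_log (both placeholders - check sys.database_files for the real names) and a post-ETL baseline of roughly 50 GB:

-- Switch to simple recovery (only if you do not need point-in-time restores)
USE [master];
ALTER DATABASE [YourDB] SET RECOVERY SIMPLE;

-- After the ETL finishes, shrink the log back to a baseline size (value is in MB)
USE [YourDB];
DBCC SHRINKFILE (N'YourDB_log', 51200);

Then watch how far the log grows before the next load and adjust the baseline to something comfortably above that.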
If you MUST be in full recovery mode, then make sure you are taking a log backup as soon as the ETL completes, as the ETL is very likely the thing that causes the most growth in the log file. This will help reduce the amount of auto-growth on the log file.
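Something along these lines - the database name and backup path are placeholders, so point them at your normal backup target:

-- Log backup immediately after the ETL step completes (full recovery model)
BACKUP LOG [YourDB]
TO DISK = N'X:\Backups\YourDB_post_ETL.trn'
WITH COMPRESSION, INIT;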
Now, with the above being said, I REALLY like pre-growing my log file to the MOST I anticipate it will use for a period, or letting it auto-grow to the size it needs to be and only shrinking IF the growth was due to an anomaly. The reason is that I do NOT want my ETL to fail due to lack of disk space. In your scenario, you KNOW that your ETL needs a lot of log file to complete, so you will want to make 100% sure that you have at LEAST that much disk free before the ETL runs. If the log file is already using that space, you have nothing to worry about (unless the disk usage goes up monthly). If you really are low enough on disk space that you are juggling log files throughout the month to make sure you aren't wasting disk, I would talk to the IT department and try to get more disk. Disk is cheap - ETL failure due to lack of disk is an expensive mistake.
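If you go the pre-growing route, a rough example is below; the 500 GB size, the 8 GB growth increment, and the file names are all assumptions you would replace with your observed peak:

-- Pre-grow the log once to the peak you expect, with a fixed growth increment as a safety net
ALTER DATABASE [YourDB]
MODIFY FILE (NAME = N'YourDB_log', SIZE = 512000MB, FILEGROWTH = 8192MB);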
As for your guess of making the log file double the size of the largest object, that may be sufficient, it may be overkill, or it may not be enough. It is VERY likely to be overkill, as the tlog only tracks what CHANGED in the database. So (usually) a SELECT will not put anything in the tlog file, but an INSERT, UPDATE, or DELETE will. Unless you are deleting and repopulating the tables in your ETL, you are not likely to use double the space of your largest object. So to predict the size of the tlog file, you need to know how much data will be changing. If it is simple recovery, you need to know the size of the data (in KB/MB/GB, not rows) changing per transaction. If it is full recovery, you need to know the size of the data (in KB/MB/GB, not rows) changing between log backups.
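One way to measure that is to watch log usage between backups, for example with sys.dm_db_log_space_usage (SQL Server 2012 and later) - run it in the context of the database you are sizing:

-- How much log is the database actually using right now, and since the last log backup?
SELECT total_log_size_in_bytes / 1048576.0 AS total_log_mb,
       used_log_space_in_bytes / 1048576.0 AS used_log_mb,
       used_log_space_in_percent,
       log_space_in_bytes_since_last_backup / 1048576.0 AS used_since_last_backup_mb
FROM sys.dm_db_log_space_usage;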
I say SELECTs are usually not in the tlog because there are cases where they CAN end up in there. Like most things with SQL Server, "it depends", but in most cases my understanding is that SELECT operations won't show up in the tlog file.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 16, 2024 at 7:29 pm
Thanks for your detailed analysis.
I have another follow-up question - is it possible to shrink the log file if there is no free space available? Shrinking from SSMS or using DBCC SHRINKFILE is not working. I am trying not to add any space to the log file; without doing that, I would like to release the space - is that possible? I can't truncate the log either - even with the database in simple mode, it comes back to the same issue of no space.
January 16, 2024 at 7:46 pm
Do also note that having CDC on the db means that those transactions will be kept in the LOG, even if the db is in simple recovery, until they are processed - looking at log_reuse_wait_desc in sys.databases will tell you if this is why the log usage is high (see the query below).
So if the volume of changes is high and the processing speed is slow, this may well be the reason why your tlog grows so much.
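For example, something like this (database name is a placeholder); note that CDC reports under the REPLICATION wait description:

-- Why can the log not be reused? CDC and replication both show up as REPLICATION here
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDB';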
January 16, 2024 at 8:18 pm
Are you able to switch to simple recovery mode and do a backup? If so, that should put your log file down to 99% unused, at which point you should be able to shrink it. I think the backup is required after switching recovery models to clean up the log file, or I may be remembering things wrong...
I also would avoid doing tasks in the SSMS GUI and use T-SQL instead. You can run T-SQL from within SSMS, but GUI operations tend to add overhead and sometimes do things you are not expecting. In general, I try to avoid the GUI for anything other than lookups. For changes I use T-SQL, except for Database Mail, as I have never found a way to enable and configure that from T-SQL.
Now, with the above being said, more disk is still my recommendation. If the log file grew to this size once, it is likely to do it again in the future. Shrinking the log file is a band-aid fix for a problem that may (likely will) come back, and when it comes back it may be worse.
What I would try is switching to simple recovery and then shrinking the log file by a small amount. I would go SUPER small - shrink it by just 10 MB so you are no longer sitting at 0 bytes free - and if that succeeds, start doing larger and larger chunks until the log file is at a reasonable size.
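As a rough sketch of that stepped approach (the logical file name and the sizes are placeholders, and DBCC SHRINKFILE takes the target size in MB):

USE [YourDB];
DBCC SHRINKFILE (N'YourDB_log', 495000);  -- tiny first step down from ~500 GB
DBCC SHRINKFILE (N'YourDB_log', 400000);  -- then larger chunks once the small one succeeds
DBCC SHRINKFILE (N'YourDB_log', 200000);
-- repeat until the log is at a reasonable resting size for your workload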
Now, there are risks to switching the recovery model - by switching to simple, you break the log chain, so you can no longer do point-in-time restores. But at the moment, I think the bigger concern is the full disk.
Now, to add to what Frederico_Fonseca said (do correct me if I am mistaken), once CDC writes the changes to the change tables, that part of the log is freed up. So unless you have a TON of writes happening that cause data to remain queued up waiting to be consumed by CDC, CDC shouldn't be causing the log file to fill the disk.
Also, after you get the space back, I would recommend running DBCC CHECKDB to make sure there is no data corruption. Filling the disk shouldn't lead to corruption, but I have seen it happen with consumer-grade disks before - my personal laptop had that happen: the disk hit 100% full, 0 bytes free, and the disk lost its mind and everything was just gone. I was able to repair and recover things (thankfully just the file allocation table went bonkers), but it was a pain in the butt. Not trying to scare you, just something to check.
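For example (database name is a placeholder):

-- Basic integrity check once the disk has free space again
DBCC CHECKDB (N'YourDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;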
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there are negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
February 3, 2024 at 5:55 am
Thanks to everyone for the detailed insight. We were able to shrink the log file after we disabled CDC, as it was not allowing us to do anything. Once we disabled it, we shrank the log file and then enabled CDC again. I am still learning and gathering from all your feedback and resources. Thanks once again - I will share more details as I implement.
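For anyone following along, a rough outline of the sequence described above, with placeholder database, file, and table names; note that disabling CDC at the database level drops the capture instances and change tables, so any unconsumed change data is lost and each table has to be re-enabled afterwards:

USE [YourDB];
EXEC sys.sp_cdc_disable_db;              -- drops CDC metadata and change tables

DBCC SHRINKFILE (N'YourDB_log', 51200);  -- target size in MB

EXEC sys.sp_cdc_enable_db;               -- re-enable CDC on the database
-- then re-enable each tracked table, for example:
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'YourBigTable', @role_name = NULL;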