July 9, 2015 at 10:37 am
Hi,
I have a situation where I need to rebuild indexes on a large DB (500G).
When I do a test run of the rebuilds in my test environment it uses 100G of space - which is fine with me.
When I do a rebuild in my High Availability environment - same DB, same script - it eats up over 600G of space and fills the volume.
What can I do without removing my DB from H/A to rebuild the indexes?
July 9, 2015 at 10:43 am
First, your test environment should mirror your production environment to give you a better sense of what will happen in production.
Second, you say in production it uses 600G of space and fills the volume. Are we talking the mdf/ldf file(s) or the ldf file? What form does your HA environment take (what are you using for HA)? Have you monitored the various DMVs that track your HA environment?
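For reference, a quick health check along those lines might look like the following. This is just a sketch against the standard AlwaysOn DMVs; the database name is a placeholder, not from the thread:

```sql
-- Sketch: check AG replica health and per-database synchronization state.
-- Replace 'YourDatabase' with the database in question.
SELECT ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,   -- KB of log not yet sent to this replica
       drs.redo_queue_size        -- KB of log received but not yet redone
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON ars.replica_id = drs.replica_id
WHERE drs.database_id = DB_ID('YourDatabase');
```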
July 9, 2015 at 11:07 am
Lynn Pettis (7/9/2015)
First, your test environment should mirror your production environment to give you a better sense of what will happen in production.
Second, you say in production it uses 600G of space and fills the volume. Are we talking the mdf/ldf file(s) or the ldf file? What form does your HA environment take (what are you using for HA)? Have you monitored the various DMVs that track your HA environment?
Hi Lynn -
Agreed, we've asked for a real test environment.
It's the LDF that fills the volume.
We are using synchronous mode in HA. HA seems to be working just fine, so I didn't run the HA DMVs while the log was growing.
It seems odd to me that the rebuild takes so little space in our test environment when most things are the same. As soon as I run it in HA, the t-logs fill like crazy. I can take the DB out of HA, but it's a 6-hour process to reattach it to the listener.
July 9, 2015 at 11:52 am
There are a couple things I'd check.
First, is the DB using the Full recovery model in the test environment? If it's using Simple in the test environment, that would go at least part of the way to explaining the difference.
Second, is there concurrent user activity when the rebuild is running in production? Rebuilds run as a single transaction, which prevents log truncation. If you have a fair amount of concurrent user activity in production, and none in test, then that could also be contributing.
You could check that by checking the log_reuse_wait_desc from sys.databases for that DB in each environment during the rebuild, in addition to looking at what was writing to the logs (by leveraging sys.dm_tran_database_transactions). Just compare those results between environments.
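A sketch of those two checks, to run in each environment while the rebuild is going (the database name is a placeholder):

```sql
-- What is currently preventing log truncation for this DB?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'YourDatabase';

-- Which open transactions are generating/holding log space?
SELECT dt.transaction_id,
       dt.database_transaction_begin_time,
       dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_bytes_reserved
FROM sys.dm_tran_database_transactions AS dt
WHERE dt.database_id = DB_ID('YourDatabase')
ORDER BY dt.database_transaction_log_bytes_used DESC;
```

During an active rebuild you'd expect ACTIVE_TRANSACTION in test; anything else (AVAILABILITY_REPLICA, for example) would point elsewhere.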
Also, while it's probably not the case here, since you said you're using synchronous and the AG looked healthy, if the secondary were falling behind (say, it was disconnected or suspended), then that would also prevent log truncation.
Cheers!
July 9, 2015 at 12:23 pm
Jacob Wilkins (7/9/2015)
There are a couple things I'd check.
First, is the DB using the Full recovery model in the test environment? If it's using Simple in the test environment, that would go at least part of the way to explaining the difference.
Second, is there concurrent user activity when the rebuild is running in production? Rebuilds run as a single transaction, which prevents log truncation. If you have a fair amount of concurrent user activity in production, and none in test, then that could also be contributing.
You could check that by checking the log_reuse_wait_desc from sys.databases for that DB in each environment during the rebuild, in addition to looking at what was writing to the logs (by leveraging sys.dm_tran_database_transactions). Just compare those results between environments.
Also, while it's probably not the case here, since you said you're using synchronous and the AG looked healthy, if the secondary were falling behind (say, it was disconnected or suspended), then that would also prevent log truncation.
Cheers!
Hi Jacob,
Yes it is set to full recovery in the test environment.
There is no one in the DB while I am running the rebuild, so little or no user activity.
July 9, 2015 at 12:46 pm
Ah, the HA index rebuild; I've just had to deal with this same situation. I found this article pretty helpful in determining my action plan for the work:
The bottom line is that it creates a huge amount of log activity; there is not a lot you can do to get away from that. My implementation involved booking a maintenance window during a low-activity period for 5 days in a row. I broke the indexes up into 5 groups and ran one group each night.
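One way to build those groups is to rank the indexes by size and fragmentation first, so each night's batch generates roughly the same amount of log. A sketch only; the 30% threshold and 1000-page floor are my assumptions, not from the thread, and it should be run in the target database:

```sql
-- Sketch: list rebuild candidates by fragmentation and size,
-- so they can be split into nightly batches of similar weight.
-- Run in the database being maintained.
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30   -- assumed threshold
  AND ips.page_count > 1000                   -- skip tiny indexes
ORDER BY ips.page_count DESC;
```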
I'm not sure if this question was asked, but you only need to rebuild them on the primary replica. The rebuilds are fully logged, so the changes flow to each secondary through the log stream.
Cheers!
July 9, 2015 at 12:49 pm
Ah, if literally the only difference is the AG, then you're likely just running into the problem of REDO on the secondary falling far behind during the rebuilds. The log on the primary can't be truncated beyond the last redone LSN, so if the REDO thread on the secondary gets blocked or otherwise falls behind, that can lead to a lot of log growth on the primary.
That issue is described here: http://blogs.msdn.com/b/alwaysonpro/archive/2015/03/03/recommendations-for-index-maintenance-with-alwayson-availability-groups.aspx.
I've run into this a couple times in the past, and it's a bit annoying, to be sure.
I'd check whether that's the case by watching the redo queue size on the secondary during the rebuilds. If that is indeed the cause, then you may have to follow some of the recommendations in that article, like rebuilding fewer indexes each time, for example.
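Watching the redo queue during the rebuild could look something like this. Again a sketch, run from the primary, with the database name as a placeholder:

```sql
-- Sketch: watch how far redo on each secondary is falling behind.
-- A steadily growing redo_queue_size during the rebuild is the telltale sign.
SELECT ar.replica_server_name,
       drs.redo_queue_size,    -- KB of log awaiting redo on that replica
       drs.redo_rate,          -- KB/sec redo throughput
       drs.last_redone_time
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id
WHERE drs.database_id = DB_ID('YourDatabase');
```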
Cheers!
July 9, 2015 at 12:59 pm
Jacob,
They're going to think we're the same person...we linked the same article and signed off with the same salutation!
Cheers indeed!
July 9, 2015 at 1:01 pm
And started our posts with "Ah"! My doppelganger! 🙂