Optimizing a Data Warehouse on SQL Server 2012 with AlwaysON DR - Discussion?

  • Hi,

    Just thought I would throw something out for discussion and perhaps see what other people out there are doing to see if I can be doing things better where I am!

    I administer a Data Warehouse which comprises six SQL Server 2012 DBs, each sitting at approx. 600-800GB - each DB holds data in various forms as it is consumed and transformed from our main platform. I didn’t design the DW – I just need to keep it up and running.

    We have a remote DR site which we replicate to via AlwaysOn, and also an additional server in the same data centre as the Primary which is used for ad-hoc reporting by the Analysts – also replicated with AlwaysOn.

    I have Ola Hallengren's IndexOptimize scripts running across the Primary Replica Mon-Fri 0800-2000. Auto Update Statistics is OFF for all DBs, and I do a FULL SCAN update with the scripts. The tables are so big that we found having Auto Update Stats on rarely triggered updates.

    Sooooooooooo - issues!!

    Running the optimisation jobs across the Primary blows out the transaction logs big time, and the DR site constantly falls behind because all of the index rebuilds/reorgs have to be replayed on the Secondaries.

    Even in the time window I have, I can't hit every table every week to do the FULL SCAN stats updates and rebuild/reorg all of the indexes.

    So I am thinking maybe there is a better way. Is AlwaysOn really the best way to DR a Data Warehouse? Should trace flag 2371 negate the need for the FULL SCAN updates? If it were in place, the ETL processes would trigger the stats updates - but would that then degrade performance during ETL processing? I did do a live test: I turned on this trace flag and turned Auto Update Stats back on, but performance degradation on ETL loads was reported, so I reverted.
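    For reference, the test described above boils down to two changes. TF 2371 only lowers the threshold the auto-update mechanism uses, so Auto Update Stats has to be back ON for it to do anything. A minimal sketch (the database name is a placeholder):

    ```sql
    -- TF 2371 lowers the auto-update threshold for large tables, but it
    -- only matters if auto update stats is actually enabled.
    DBCC TRACEON (2371, -1);  -- -1 = global; lost on restart unless added as a -T startup parameter

    ALTER DATABASE [MyWarehouseDB]  -- placeholder database name
        SET AUTO_UPDATE_STATISTICS ON;

    -- Optionally run the updates asynchronously so queries don't wait on them:
    ALTER DATABASE [MyWarehouseDB]
        SET AUTO_UPDATE_STATISTICS_ASYNC ON;
    ```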

    It would just be interesting to hear how other people out there manage their environments.

    Thanks for your time,

    JK

  • JayK (3/12/2014)


    1) Yes, for stats updates on large tables consider TF 2371: http://blogs.msdn.com/b/saponsqlserver/archive/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371.aspx. BUT, as you noted you may still want them disabled so that they don't fire MULTIPLE times during your ETL load.
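    One way to see whether those thresholds are ever being crossed is to look at the modification counters directly. On SQL Server 2012 SP1 or later the sys.dm_db_stats_properties DMF exposes them; a sketch (the table name is a placeholder):

    ```sql
    -- Rows modified since each statistic on the table was last updated
    -- (sys.dm_db_stats_properties requires SQL Server 2012 SP1 or later).
    SELECT  s.name AS stats_name,
            sp.last_updated,
            sp.rows,
            sp.modification_counter
    FROM    sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE   s.object_id = OBJECT_ID(N'dbo.BigFactTable')  -- placeholder table name
    ORDER BY sp.modification_counter DESC;
    ```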

    2) What are the exact parameters you are using for ola's index maintenance execution?

    3) Have you reviewed the output from ola's script? My guess is that you are blowing out tlogs because your fill factors are too high (anyone wanna bet against them being defaulted?? :-)) and you are getting lots of page splits as you load in more data with ETL process. Maybe GUIDs are in play - ufff if so!!
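    To check whether the defaults really are in play, both the server-wide setting and the per-index values are easy to inspect; a quick sketch:

    ```sql
    -- Server-wide default fill factor (0 and 100 both mean "fill pages completely")
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'fill factor (%)';

    -- Per-index fill factors in the current database
    SELECT  OBJECT_NAME(i.object_id) AS table_name,
            i.name                   AS index_name,
            i.fill_factor            -- 0 = server default
    FROM    sys.indexes AS i
    WHERE   i.index_id > 0
    ORDER BY table_name, index_name;
    ```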

    4) Have you optimized your networking stack between the two servers? What kind of bandwidth and latency do you have? Sometimes getting what you want for DR costs $$$. You either pony up or back off your DR requirements.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Kevin,

    Thanks for taking the time to reply - appreciated.

    To address your questions below:

    1. I could look at enabling the T2371 flag again for sure. I guess where I was confused was whether the flag should be enabled along with AUTO UPDATE STATS set to TRUE, or whether I should have left that as FALSE. I can investigate this one further.

    2. Parameters for Ola's script are below:

    @FragmentationLow nvarchar(max) = NULL,
    @FragmentationMedium nvarchar(max) = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationHigh nvarchar(max) = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
    @FragmentationLevel1 int = 10,
    @FragmentationLevel2 int = 30,
    @PageCountLevel int = 100,
    @SortInTempdb nvarchar(max) = 'N',
    @MaxDOP int = 10,
    @FillFactor int = NULL,
    @PadIndex nvarchar(max) = NULL,
    @LOBCompaction nvarchar(max) = 'Y',
    @UpdateStatistics nvarchar(max) = 'ALL',
    @OnlyModifiedStatistics nvarchar(max) = 'Y',
    @StatisticsSample int = 100,
    @StatisticsResample nvarchar(max) = 'N',
    @PartitionLevel nvarchar(max) = 'N',
    @TimeLimit int = NULL,
    @Indexes nvarchar(max) = NULL,
    @Delay int = NULL,
    @LogToTable nvarchar(max) = 'Y',
    @Execute nvarchar(max) = 'Y'

    So yes, I am going with the default here, and I have checked that the server setting is 0. This is an excellent pick-up - I never thought of this.

    We currently have a 100MB dedicated link between our data centres.

    To give you an example of what I am up against: running Ola's script over the biggest table in the DW, which has 2 billion rows, took 5 days.
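    On the scheduling side, Ola's IndexOptimize does support time-boxing and targeting specific databases, which can help spread the work across several nights instead of trying to cover everything in one window. A hedged sketch using the parameters already listed above (the database names are placeholders):

    ```sql
    -- Stop picking up new work after 4 hours and only touch the big DBs tonight.
    EXECUTE dbo.IndexOptimize
            @Databases = 'DW_Staging, DW_Core',  -- placeholder database names
            @FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
            @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
            @TimeLimit = 14400,                  -- seconds; no new commands start after this
            @UpdateStatistics = 'ALL',
            @OnlyModifiedStatistics = 'Y',
            @LogToTable = 'Y';
    ```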

  • JayK (3/12/2014)


    A) How many stats are there? FULL SCAN on a bunch of stats on a 2B row table could hurt!

    B) What is the full table schema, with indexes?

    C) What is the server config (CPU cores, RAM, detailed IO subsystem config)?

    D) Have you done a waitstats and file IO stall differential analysis while the job is running?

    E) Is the big table partitioned?

    F) Are you on enterprise edition of SQL Server and what build number?
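    For point D, the differential analysis is just two snapshots of the same DMVs - one before the job starts and one while it runs - with the numbers diffed afterwards. A minimal sketch:

    ```sql
    -- Snapshot wait stats; run once before and once during the maintenance
    -- job (into a differently named temp table), then diff the counters.
    SELECT  wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
    INTO    #waits_before
    FROM    sys.dm_os_wait_stats
    WHERE   wait_time_ms > 0;

    -- Per-file IO stalls, to spot which files the job is hammering
    SELECT  DB_NAME(vfs.database_id) AS db,
            mf.physical_name,
            vfs.io_stall_read_ms,
            vfs.io_stall_write_ms,
            vfs.num_of_reads,
            vfs.num_of_writes
    FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN    sys.master_files AS mf
            ON mf.database_id = vfs.database_id
           AND mf.file_id = vfs.file_id;
    ```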

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
