ETL into database - auto update stats vs UPDATE STATISTICS

  • I have a SQL 2005 database with tables that gets wiped every night (some tables TRUNCATE'd, others DELETE with no WHERE).

    Then several of the tables are (for the most part) populated by INSERT INTO table SELECT columns FROM linked_server_table. Others are built using logic in stored procs.

    After the data is loaded, the tables are read-only, used by reports, for lookups by apps, etc.

    From a performance point of view, am I better off, or is it best practice to:

    1) Leave "auto update statistics" on and let SQL update the stats sometime after the data is loaded

    2) Turn "auto update statistics" off and run UPDATE STATISTICS ... WITH FULLSCAN on all the tables after the load

    3) Both 1 & 2 (leave "auto update stats" on AND do UPDATE STATISTICS on all the tables after the load

  • I don't know how big your tables are, but unless you have asynchronous updates of stats turned on then is is possible that you will encounter delays and possibly timeouts when the tables are queried for the first time (and the stats consequently auto-updated) after the data manipulation. I would therefore recommend manual stats updates in this scenario.

    Turn the auto stats update off and perform an UPDATE STATISTICS...WITH FULLSCAN after the load, assuming that you have time in your maintenance window to perform a full stats update. Leave the auto creation of stats turned on.

    When making this kind of configuration change to a database, you may find it helpful to add an Extended Property to the database documenting the reason for the change. This way anyone querying the decision in your absence (and with permission to view the database's Extended Properties) can see the justification for the decision.

    Chris

  • Chris Howarth-536003 (9/6/2010)


    Turn the auto stats update off and perform an UPDATE STATISTICS...WITH FULLSCAN after the load, assuming that you have time in your maintenance window to perform a full stats update. Leave the auto creation of stats turned on.Chris

    agree with chris ,but look for heavy tables ,you can go for partial statistics update for those tables.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • For almost all ETL scenarios where you wipe the entire table or affect a reasonable fraction of rows it is best to disable auto update/auto create, do the load, then manually refresh stats. FULLSCAN obviously gets you the best stats, but may not be possible on larger tables within the load window.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • As these settings are at the database level you should turn them on again after the load if you have other tables outside of the ETL process.

    Only true way to tell if it helps is to try it. you could run a server trace alongside it checking for update stats events.

    ---------------------------------------------------------------------

  • mannaggia (9/5/2010)


    I have a SQL 2005 database with tables that gets wiped every night (some tables TRUNCATE'd, others DELETE with no WHERE).

    Then several of the tables are (for the most part) populated by INSERT INTO table SELECT columns FROM linked_server_table. Others are built using logic in stored procs.

    After the data is loaded, the tables are read-only, used by reports, for lookups by apps, etc.

    From a performance point of view, am I better off, or is it best practice to:

    1) Leave "auto update statistics" on and let SQL update the stats sometime after the data is loaded

    2) Turn "auto update statistics" off and run UPDATE STATISTICS ... WITH FULLSCAN on all the tables after the load

    3) Both 1 & 2 (leave "auto update stats" on AND do UPDATE STATISTICS on all the tables after the load

    Most of the time ETL processes are somehow consistent in terms of data volume and data distribution.

    If this is the case I'll probably try option #4 which is ...

    a) turn auto update statistics off at table level.

    b) do nothing else

    The rationale behind this alternative is: Statistics are there to help query optimizer to generate a good execution plan so, assuming current execution plans are good just leave the old stats there.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply