Distribution Statistics

  • One thing I'm not clear on is whether Indices act as distribution statistics?

    I know that if you specify a WHERE clause in a query that uses a non-indexed column you get an entry in sysindexes called _WA_SYS_etc. If you have an index then this doesn't happen.

    If SQL Server's default behaviour is to create these _WA_SYS entries then when would you use CREATE STATISTICTS?

  • Good question..

    I dug into the same situation you are about a year ago. If we all were perfect DBA's we would know what distribution stats to create and would turn auto-stats off.

    There may be a time when you want to create Stats with a composite of columns where an index does not exist. You may also want to insure there are statistics for a column(s) and using CREATE STATS.. will enforce this. I don't feel that there is a gaurantee the stats will be created automatically. I have also found situations where the stats reduced performance...

    As always...take this for what it is worth. It is only one DBA's opinion.


    "Keep Your Stick On the Ice" ..Red Green

  • Thanks for this.

    I notice that the _WA_SYS stats count towards the 255 indices allowed on a table.

    There is an db option to autocreate these stats which defaults to "ON" as does the autoupdate facility.

    I notice that tables under 8Mb do a FULLSCAN where as those over the 8 do a sample. Although the performance overhead of these is minimal in general apps I suspect it is a question of scale.

    If I was populating a data warehouse then I would

    • Delete all indices.
    • Delete all statistics.
    • Switch off autocreate of statistics.
    • Populate the database
    • Recreate indices/stats
    • Switch on autocreate

    I find it very hard to get meaningful stats on performance issues because even if I am the only connected user on a machine the variability of timing for a given tests is huge. For this reason I don't think that I could get a reliable indication of stats vs indices.

  • It's a tough question. Not sure that indexes count towards the statistics, but knowing the laziness of programmers, I doubt it.

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

    http://www.dkranch.net

  • Hi there

    I believe they do, especially if the so called "parameter sniffing" feature of sqlserver is still around. Run something like this for example:

    DBCC SHOW_STATISTICS (mytable, PK_MyPrimaryKey)

    this returns info similar to what I would expect for disribution statistics with a date of the refresh.

    Cheers

    Chris K


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • Statistics DO count toward the limit of indexes allowed on a table. The Query Optimizer will utilize the distribution information from both index statistics blob and the non-index statistics blob when determining which columns to use to limit the query result set first (presuming there is more than one filter condition). It is also used in some join operations to determine which table should be on the left versus the right.

    To try to answer your questions directly, creating an index does not automatically create the corresponding statistics. You may manually create those statistics, or you may wait until the Query Optimizer determines the need for them (at query run-time, which is not ideal).

    You may also create statistics for other columns that are utilized in less frequently used queries that you do not wish to maintain indexes for due to performance considerations.

    If the auto-statistics option is turned on (the default), the Query Optimizer will create the non-index statistics (_WA_SYS_%) as it sees fit.


    David R Buckingham, MCDBA,MCSA,MCP

  • Reading through my old MS notes from SQL 6.5 days it appears that running UPDATE STATISTICS on a table will recreate all distribution statistics within an index. Moving forward to SQL7/2000 this will also update distribution statistics built using CREATE STATISTICS.

    Although there is a database option to auto update statistics (defaults to on) there seems to be the implication that after populating a database you should run an UPDATE STATISTICS on all tables/indices.

  • From an architecture standpoint I really like Buckingham's last post.

    Lots of good stuff in this thread!!


    "Keep Your Stick On the Ice" ..Red Green

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply