September 19, 2012 at 1:21 pm
I have probably a silly question but I wanted to throw it out there anyways...
In developing a methodology of reindexing / index defragging...I have come up with a couple of questions.
For example
If I have a table and the table has a Clustered Key and 1 or more NC'd keys...
The Clustered key needs to be rebuilt
One NC key needs to be rebuilt
and the rest of the NC key's need to be defragged
Now the way I understand it and I am pretty sure i am right (SQL 2005 and above) if I run an ALTER INDEX WITH REBUILD that will not only rebuild the index but will update the statistics on the associated columns with a FULL SCAN.
Any columns that are NOT part of an index will not have their statistics updated during this process.
If I wanted to updated the non indexed columns statistics I would need to run a manual UPDATE STATISTICS (with either a SAMPLE or a FULL SCAN), and then define if it would be ALL, COLUMN, INDEX
QUESTION
Is it assumable, that I could run through a database and do the following in the specific order listed
1. Alter Index REBUILD all indexes that meet requirements for rebuilding
2. Alter Index REORG all indexes that don't meet the requirements for rebuild but for reorg
Once done with the index maint. the REBUILT index table columns should have stats updated with 100% sample rate
In Order to get the non-indexed columns as well as the indexed columns that were only reorg'ed to have updated column stats could I run the following to cover all the bases regarding statistics and index maintenance?
UPDATED STATISTICS </insert table name/> WITH FULLSCAN, COLUMNS
My thought is that the indexed columns are already updated...and this should go through the non-indexed columns and update their stats with 100% same size...
I have been looking and cannot find an answer online that supports my thought so I guess I am trying here next to figure out what the most complete method is to database maintenance regarding index maintenance and maintaining statistics.
Any insight is appreciated.
Thanks in advance,
Lee
September 20, 2012 at 7:31 pm
Looking for ANY comment...any at all 🙂
September 21, 2012 at 7:35 am
Yes, if you rebuild an index, then the stats for the associated columns do get automatically updated in a manner equivalent to fullscan. (Although if your only goal is to update stats themselves, then doing an update stats with fullscan usually incurs much more overhead than necessary; and unless you have some "unique" situation with some of your queries, the sp_updatestats command is sufficient in most cases to keep your stats current.)
However, to your other point, SQL Server maintains NO statistics for columns that are not part of any index, unless you create your own custom statistics with the "CREATE STATISTICS" command. Check Books Online, etc. for details.
September 21, 2012 at 3:00 pm
2ndHelping (9/21/2012)
Yes, if you rebuild an index, then the stats for the associated columns do get automatically updated in a manner equivalent to fullscan. (Although if your only goal is to update stats themselves, then doing an update stats with fullscan usually incurs much more overhead than necessary; and unless you have some "unique" situation with some of your queries, the sp_updatestats command is sufficient in most cases to keep your stats current.)However, to your other point, SQL Server maintains NO statistics for columns that are not part of any index, unless you create your own custom statistics with the "CREATE STATISTICS" command. Check Books Online, etc. for details.
2ndHelping (9/21/2012)
Yes, if you rebuild an index, then the stats for the associated columns do get automatically updated in a manner equivalent to fullscan. (Although if your only goal is to update stats themselves, then doing an update stats with fullscan usually incurs much more overhead than necessary; and unless you have some "unique" situation with some of your queries, the sp_updatestats command is sufficient in most cases to keep your stats current.)However, to your other point, SQL Server maintains NO statistics for columns that are not part of any index, unless you create your own custom statistics with the "CREATE STATISTICS" command. Check Books Online, etc. for details.
Thank you for the clarification...I guess I was looking at the process as how do I get the best bang for the buck from a limited maintenance window.
For a given table...
I determine which indexes need to be rebuilt then the stats are updated as well with 100% sample size...for the given columns of the said indexes.
To your point all it looks like sp_updatestats does is actually execute UPDATE STATISTICS ALL under the covers with either the default sample size or a defined sample size.
Also, based on BOL, it says that sp_udatestats won't re-update the same stats just updated when a ALTER INDEX - REBUILD command occurs...but I can't find that same claim against UPDATE STATISTICS...
Would you say it is safe to assume that the same logic would hold true for UPDATE STATISTICS?
If that is the case would it be advisable to:
Blanket run UPDATE STATISTICS WITH FULLSCAN, ALL (which I would presume would do a full scan against each column in the table that had NOT already had the stats updated with the index rebuild)
OR
Figure out what the names are of the system generated statistics and then build a process to roll through them and update each one individually with a FULLSCAN?
update statistics [dbo].[table_A](_WA_Sys_00000001_3F954511) with fullscan
I suppose the answer is...it DEPENDS lol...
Any input is appreciated!
September 21, 2012 at 3:05 pm
2ndHelping (9/21/2012)
However, to your other point, SQL Server maintains NO statistics for columns that are not part of any index, unless you create your own custom statistics with the "CREATE STATISTICS" command. Check Books Online, etc. for details.
That's not completely true. If auto create statistics is on, SQL can automatically create statistics on columns that it needs distribution info on. Those stats won't be updated when indexes are rebuild.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 21, 2012 at 3:08 pm
Leeland (9/21/2012)
Also, based on BOL, it says that sp_udatestats won't re-update the same stats just updated when a ALTER INDEX - REBUILD command occurs...but I can't find that same claim against UPDATE STATISTICS...Would you say it is safe to assume that the same logic would hold true for UPDATE STATISTICS?
No, it is not.
If that is the case would it be advisable to:
Blanket run UPDATE STATISTICS WITH FULLSCAN, ALL (which I would presume would do a full scan against each column in the table that had NOT already had the stats updated with the index rebuild)
UPDATE STATISTICS WITH FULLSCAN, ALL will update every single statistic regardless of when it was last updated. There's no checks for staleness at all (and all sp_updatestats does is check to see if a single row has changed)
As for what to do, depends on the size of the database and the size of the maintenance windows. If you have the time to blanket rebuild every single index and blanket update every single statistic, go ahead. It's when you don't that you have to get smart about selective rebuilds and selective stats updates.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 24, 2012 at 1:08 pm
GilaMonster (9/21/2012)
Leeland (9/21/2012)
Also, based on BOL, it says that sp_udatestats won't re-update the same stats just updated when a ALTER INDEX - REBUILD command occurs...but I can't find that same claim against UPDATE STATISTICS...Would you say it is safe to assume that the same logic would hold true for UPDATE STATISTICS?
No, it is not.
If that is the case would it be advisable to:
Blanket run UPDATE STATISTICS WITH FULLSCAN, ALL (which I would presume would do a full scan against each column in the table that had NOT already had the stats updated with the index rebuild)
UPDATE STATISTICS WITH FULLSCAN, ALL will update every single statistic regardless of when it was last updated. There's no checks for staleness at all (and all sp_updatestats does is check to see if a single row has changed)
As for what to do, depends on the size of the database and the size of the maintenance windows. If you have the time to blanket rebuild every single index and blanket update every single statistic, go ahead. It's when you don't that you have to get smart about selective rebuilds and selective stats updates.
The last statement is where I am little lost...
I know about managing indexes Clustered / Non-Clustered) selectively, the question for me then comes for managing statistics and how that should be handled...this is something that up until recently I haven't given as much consideration.
As I have learned more I am seeing their importance and want to see what methods are for managing them...
From what I have read and actually tested, the SQL server will auto generate single column statistics on columns used in the predicate of a query...
Why are these system generated statistics created even when there is a covering index?
Example.
CREATE TABLE [dbo].Table_A(
[row_id] [int] IDENTITY(1,1) NOT NULL,
[job_id] [varchar](50) NULL,
[job_name] [varchar](256) NULL,
[job_status] [char](1) NULL,
[transaction_time] [datetime] NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX ci_Table_A ON [dbo].Table_A ([row_id] ASC)
GO
CREATE NONCLUSTERED INDEX nci_Table_A_01 ON [dbo].Table_A ([row_id] ASC) INCLUDE(job_status, job_name)
GO
CREATE NONCLUSTERED INDEX nci_Table_A_02 ON [dbo].Table_A ([row_id], job_status, job_name)
SET IDENTITY_INSERT [dbo].[tb_job_status_checker] ON
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (1, N'F62AC1F0-ED02-449B-934B-0D245FC3DABA', N'job_name test entry Check Refresh', N'N', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (2, N'FCDF6797-498D-44CF-8323-19F0CE065457', N'job_name test entry Check', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (3, N'1EEF0345-6F1E-43F6-9A30-1FE1D4C182FF', N'job_name test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (4, N'A849AC12-A4B8-4C7E-ABD2-3FF3281AE223', N'job_name test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (5, N'611EEBB9-29FD-46CE-A0D4-58EB4B2DDEE2', N'job_name test entry', N'N', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (6, N'4AEC8E61-E57F-4FA6-BF82-60A4147E4998', N'job_name test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (7, N'14094376-1457-4C41-BCDE-8BBEB74A0A7E', N'test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (8, N'45A2771D-6C3D-4632-A36E-9E02979EB47B', N'job_name test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (9, N'0FFD9DD3-A764-43ED-8F85-C1C680C0F244', N'job_name test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (10, N'202419A3-7B3E-4A22-B8DF-D3FA22EB4FF0', N'job_name test entry', N'N', CAST(0x0000A0D600DB3533 AS DateTime))
INSERT [dbo].[tb_job_status_checker] ([row_id], [job_id], [job_name], [job_status], [transaction_time]) VALUES (11, N'ECE28CCE-1856-464E-806F-ECBEE1DF285B', N'job_name test entry', N'Y', CAST(0x0000A0D600DB3533 AS DateTime))
SET IDENTITY_INSERT [dbo].[tb_job_status_checker] OFF
SELECT [row_id]
,[job_status]
,[job_name]
FROM [DB_UTILS].[dbo].[tb_job_status_checker]
WHERE job_status = 'N' and job_name = 'D'
After executing the simple select, two system generated statistics are created...even with a covering index (unless I messed that up.).
Is that normal? Is my test case incorrect?
Regardless, when you speak about selectively managing the indexes and statistics that are the actual problem...
How do you manage the indexes side? Are these manually created statistics or system generated?
What metrics to you measure to decide?
Sorry for the plethora of questions...just a lot of uncertainties for me.
Thanks in advance!
Lee
September 24, 2012 at 2:08 pm
Ignoring the stats question for a moment...
Your indexes are redundant and unnecessary. You've got a clustered index on a column, then two nonclustered indexes on the same column. The nonclustered indexes aren't necessary at all there.
As for the stats...
The query you posted will likely result in stats being created on the job_status and job_name columns, because there is no index on either column, Ok, without an index there's no way other than a scan to process that query, but still...
Stats is a tough question because there's no easy way to manage them selectively and efficiently, hence many people will just update all stats when they do maintenance. If the maint window is large enough, that works. It's when the DB is too large and the maint window too small that it becomes difficult and, unlike indexes, there's no good guidelines as to when the stats should be updated because it's too dependant on data distribution and queries. Hence if you have time to just update them all, just update them all.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 24, 2012 at 3:18 pm
GilaMonster (9/24/2012)
Ignoring the stats question for a moment...Your indexes are redundant and unnecessary. You've got a clustered index on a column, then two nonclustered indexes on the same column. The nonclustered indexes aren't necessary at all there.
As for the stats...
The query you posted will likely result in stats being created on the job_status and job_name columns, because there is no index on either column, Ok, without an index there's no way other than a scan to process that query, but still...
Stats is a tough question because there's no easy way to manage them selectively and efficiently, hence many people will just update all stats when they do maintenance. If the maint window is large enough, that works. It's when the DB is too large and the maint window too small that it becomes difficult and, unlike indexes, there's no good guidelines as to when the stats should be updated because it's too dependant on data distribution and queries. Hence if you have time to just update them all, just update them all.
Hi Gail,
Sorry if that was a little confusing, I guess I was doing trial and error to see what would work as far as NOT creating the system generated stats...
I had thought that if the query (which has the two filters in the WHERE) had a covering NC index that it would bypass the creation of those system generated stats...
It didn't work and the stats were still created...
What I am getting out of your comment is that the only way to avoid having those system generated stats is to have a NC index for each column in the predicate...is that correct?
Thanks for the comments on stats management, I guess I was looking for something like that which detailed that there is no cut and dry way of managing them.
In your experience with updating stats...how much of an impact is it in comparison to running something like a index rebuild?
I would presume that if you are running stats update...and a process hits the table / column that is being updated it will be blocked until it is stats update is completed.
Thanks again and I will play around with that simple process above to get a better understanding of what works and what is redundant.
Lee
September 24, 2012 at 3:32 pm
Leeland (9/24/2012)
I had thought that if the query (which has the two filters in the WHERE) had a covering NC index that it would bypass the creation of those system generated stats...
Except that neither of the indexes you created is at all useful for that query. Technically covering, not none were seekable.
What I am getting out of your comment is that the only way to avoid having those system generated stats is to have a NC index for each column in the predicate...is that correct?
Yes, but that's not a very good indexing strategy. Why are you trying to avoid those stats? There's little downside in having them
In your experience with updating stats...how much of an impact is it in comparison to running something like a index rebuild?
Impact, not all that much usually. Depends on the settings though (sampled vs full scan)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 24, 2012 at 9:41 pm
"Except that neither of the indexes you created is at all useful for that query. Technically covering, not none were seekable."
Sorry for dragging this out, I see what you mean by this statement...I didn't have the index created correct.
"Yes, but that's not a very good indexing strategy. Why are you trying to avoid those stats? There's little downside in having them"
It wasn't so much as trying to avoid them, just trying to see what type of index creation covered the query to avoid having them created. The test wasn't anything more than that.
"Impact, not all that much usually. Depends on the settings though (sampled vs full scan)"
Thanks for the input.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply