January 16, 2012 at 4:27 am
Hello,
I am encountering some critical issues on SQL Server 2008 in a production environment. We have a database that contains an FTS index, and a maintenance job that rebuilds the FTS index and updates statistics like this:
alter fulltext catalog [ftCatalogName] rebuild
update statistics [tableNameWithFtIndex]
The job runs weekly, Monday at 9 AM; it finishes in around 1 minute and always succeeds.
Now the problem: every time the job finishes, the CPU of the sqlserver.exe process goes up to 99%. Normally the CPU stays at around 70-80% (I know that is very high and we are trying to improve it). The only way to make it go back to its regular state (70-80%) is to stop IIS on all front-end servers (we have a web farm) and then start them up again - when IIS is stopped, the CPU for sqlserver.exe drops to 0%, and when IIS is started back up it returns to 70-80%.
I've also analyzed the CPU behavior during the rebuild process:
- start FTS rebuild job (above statements)
- sqlserver.exe CPU = 75% (normal); fdhost.exe = 20-25% (normal during a rebuild)
- the job finishes in 51 seconds, fdhost.exe CPU goes to 0%, and sqlserver.exe goes to 99% and stays there until I apply the workaround described above (stop/start IIS on the front-end servers)
Does anyone have any idea why this happens? At least an explanation... or some ways to optimize this index and make it less resource-hungry at rebuild time...
Thank you in advance!
January 16, 2012 at 4:59 am
Please also provide some information about the server's hardware.
As a guideline:
1. Rebuild the index only in the off hours (in the night or whatever is appropriate)
2. Since you are already rebuilding the index, you don't need to update the statistics again for the same index.
January 16, 2012 at 5:05 am
Thank you for your quick response!
I will remove the "update stats" statement. Regarding the job schedule, we used to run it on Saturday at 3 AM, but we moved it to Monday morning because the CPU remained at 99%, which made SQL Server respond extremely slowly and caused the front-end servers to go down, all leading to long periods of unavailability - so we moved it to Monday to be able to take action quickly when this happened. Either way, from my point of view it is not OK to have to apply that whole IIS-reset workaround, because it is not a reliable long-term solution.
Again, thank you for your suggestions!
January 16, 2012 at 5:21 am
george.pancescu (1/16/2012)
Thank you for your quick response! I will remove the "update stats" statement; regarding the job schedule, we used to have it on Saturday at 3AM, but what made us move it to Monday morning, was the fact that the CPU remained at 99%, and this caused SQL Server to respond extremely slow, thus causing front-end servers to go down, all leading to huge periods of unavailability - and thus we decided to move it on Monday so that we can quickly take action when this happened; either way, it is not OK from my point of view to have to apply that whole workaround with IIS reset because it is not a reliable long-term solution
Again, thank you for your suggestions!
Try using the MAXDOP hint in your index rebuild statement and test the performance after that:
ALTER INDEX IndexName ON TableName REBUILD WITH (MAXDOP=1)
January 16, 2012 at 5:49 am
Divine Flame (1/16/2012)
Try to use the MAXDOP hint in your index rebuild statement & test the performance after that:
ALTER INDEX IndexName ON TableName REBUILD WITH (MAXDOP=1)
I am not sure I can use this hint on an FTS catalog; it works with regular indexes but not with an FTS catalog. I tried "ALTER FULLTEXT CATALOG dictionaryFTCatalog REBUILD WITH MAXDOP=1" and several other syntaxes, but it kept throwing errors.
January 16, 2012 at 6:42 am
george.pancescu (1/16/2012)
Divine Flame (1/16/2012)
Try to use the MAXDOP hint in your index rebuild statement & test the performance after that:
ALTER INDEX IndexName ON TableName REBUILD WITH (MAXDOP=1)
I am not sure I can use this hint on FTS catalog; it works with regular indexes but not with FTS catalog (i tried: "ALTER fulltext catalog dictionaryFTCatalog REBUILD WITH maxdop=1" and several other syntax but kept throwing errors)
It seems you are right, George. MAXDOP can't be used when rebuilding a full-text catalog. My apologies.
January 16, 2012 at 12:52 pm
It is quite possible that the update statistics step was causing this. When you rebuild an index, its statistics are updated with a full scan and are much more accurate than the sampled estimates produced by a plain UPDATE STATISTICS. So basically, you were taking accurate statistics and replacing them with estimates, and that can throw off execution plans among other things. See what happens now that you have taken out that step; it may have been all you needed to fix the issue.
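To illustrate the point (just a sketch - the index name is a placeholder, and you can verify the sampling rate yourself): compare "Rows" vs "Rows Sampled" in the statistics header before and after each operation.

```sql
-- [IX_SomeIndex] is a placeholder; run in the affected database.
ALTER INDEX [IX_SomeIndex] ON [tableNameWithFtIndex] REBUILD;  -- refreshes that index's stats with a full scan

UPDATE STATISTICS [tableNameWithFtIndex];  -- default sampling; can overwrite full-scan stats with estimates

-- Compare "Rows" vs "Rows Sampled" in the header to see the sampling rate:
DBCC SHOW_STATISTICS ('tableNameWithFtIndex', 'IX_SomeIndex') WITH STAT_HEADER;
```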
Jared
CE - Microsoft
January 16, 2012 at 9:43 pm
How many transactions does the database see in a minute? The scheduled index rebuild may be causing the problem. I ran into a problem in the past where a scheduled index rebuild caused high CPU usage on a traffic-heavy DB server: the rebuild creates a traffic jam of transactions that the server cannot keep up with.
Try running the index rebuild manually, on its own.
January 16, 2012 at 11:33 pm
Regarding update statistics, you may be right... I ran some tests and saw that this operation itself is very resource-hungry; I'll wait and see what happens.
Regarding traffic, it is heavy. I do not know the exact number of transactions per minute, but I will check; I do know we serve around 2 million page views per day for this website, and some pages run multiple queries against the DB, so probably around 40-50 transactions per second - though that is just an estimate. Indeed, rebuilding the FTS index is very resource-intensive because it takes locks and calls getData for each record in the table, and combined with the high number of transactions coming from the app, that can skyrocket the CPU...
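For anyone watching the same thing: a quick way to check whether the post-rebuild CPU spike coincides with a population or master merge that is still running (sketch only - substitute your own catalog name):

```sql
-- FULLTEXTCATALOGPROPERTY is available in SQL Server 2008;
-- 'ftCatalogName' is the catalog from the job above.
SELECT FULLTEXTCATALOGPROPERTY('ftCatalogName', 'PopulateStatus') AS populate_status, -- 0 = idle
       FULLTEXTCATALOGPROPERTY('ftCatalogName', 'MergeStatus')    AS merge_status,    -- 1 = master merge in progress
       FULLTEXTCATALOGPROPERTY('ftCatalogName', 'ItemCount')      AS item_count;
```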
I have the following plan:
- stop the existing rebuild job
- create a new job that runs every 2 days and:
-- checks the FTS index fragmentation
-- if fragmentation is between 10% and 35%, launches a REORGANIZE (not a rebuild) to help reduce the fragmentation level
- create a new job that runs weekly on Sunday and:
-- checks the FTS index fragmentation
-- if fragmentation is 35% or higher, launches a REBUILD (without updating stats)
This way, the rebuild operation will not run weekly unless it is really needed. I will, however, need to monitor the results of the FTS queries, because I know they degraded after a certain period in the past - that is why we added the weekly REBUILD. Maybe REORGANIZE will be sufficient...
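For the fragmentation check itself, a sketch of what I have in mind. There is no avg_fragmentation_in_percent for full-text indexes, so a common proxy is the number and total size of closed fragments (the REORGANIZE at the end is the statement I would run when there are many small ones):

```sql
-- Sketch: gauge full-text index fragmentation by counting closed fragments
-- per table (sys.fulltext_index_fragments, SQL Server 2008+).
SELECT t.name                       AS table_name,
       COUNT(*)                     AS closed_fragments,
       SUM(f.data_size) / 1048576.0 AS total_size_mb
FROM sys.fulltext_index_fragments AS f
JOIN sys.tables AS t ON t.object_id = f.table_id
WHERE f.status = 4  -- 4 = closed (in use for queries)
GROUP BY t.name;

-- Many small closed fragments -> merge them online instead of a full rebuild:
-- ALTER FULLTEXT CATALOG [ftCatalogName] REORGANIZE;
```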
What still bothers me is that even under normal traffic conditions the CPU is around 70-80%. We tried optimizing procedures based on how often they were executed (Profiler), but the CPU is still too high... I'm thinking either the hardware resources are not enough for this kind of traffic, or something is wrong in the DB (procedures or something else) - the server has two quad-core Xeon CPUs, 24 GB of RAM, and RAID HDDs...
Thank you all for your responses and help so far!
January 17, 2012 at 1:10 am
This very good script, available on MSDN itself, identifies and rebuilds/reorganizes indexes based on their fragmentation level. You can use the same.
-- Ensure a USE <databasename> statement has been executed first.
SET NOCOUNT ON;
DECLARE @objectid int;
DECLARE @indexid int;
DECLARE @partitioncount bigint;
DECLARE @schemaname nvarchar(130);
DECLARE @objectname nvarchar(130);
DECLARE @indexname nvarchar(130);
DECLARE @partitionnum bigint;
DECLARE @partitions bigint;
DECLARE @frag float;
DECLARE @command nvarchar(4000);

-- Conditionally select tables and indexes from the sys.dm_db_index_physical_stats function
-- and convert object and index IDs to names.
SELECT
    object_id AS objectid,
    index_id AS indexid,
    partition_number AS partitionnum,
    avg_fragmentation_in_percent AS frag
INTO #work_to_do
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, 'LIMITED')
WHERE avg_fragmentation_in_percent > 10.0 AND index_id > 0;

-- Declare the cursor for the list of partitions to be processed.
DECLARE partitions CURSOR FOR SELECT * FROM #work_to_do;

-- Open the cursor.
OPEN partitions;

-- Loop through the partitions.
WHILE (1=1)
BEGIN;
    FETCH NEXT FROM partitions
    INTO @objectid, @indexid, @partitionnum, @frag;
    IF @@FETCH_STATUS < 0 BREAK;

    SELECT @objectname = QUOTENAME(o.name), @schemaname = QUOTENAME(s.name)
    FROM sys.objects AS o
    JOIN sys.schemas AS s ON s.schema_id = o.schema_id
    WHERE o.object_id = @objectid;

    SELECT @indexname = QUOTENAME(name)
    FROM sys.indexes
    WHERE object_id = @objectid AND index_id = @indexid;

    SELECT @partitioncount = count(*)
    FROM sys.partitions
    WHERE object_id = @objectid AND index_id = @indexid;

    -- 30 is an arbitrary decision point at which to switch between reorganizing and rebuilding.
    IF @frag < 30.0
        SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REORGANIZE';
    IF @frag >= 30.0
        SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REBUILD';
    IF @partitioncount > 1
        SET @command = @command + N' PARTITION=' + CAST(@partitionnum AS nvarchar(10));

    EXEC (@command);
    PRINT N'Executed: ' + @command;
END;

-- Close and deallocate the cursor.
CLOSE partitions;
DEALLOCATE partitions;

-- Drop the temporary table.
DROP TABLE #work_to_do;
GO
January 17, 2012 at 2:02 am
george.pancescu (1/16/2012)
What still bothers me is the fact that the CPU even at its normal traffic conditions, it is around 70-80%; we tried optimizing procedures based on how often they were executed (profiler) but still the CPU is too high... i'm thinking either the hardware resources are not enough for this kind of traffic or something is wrong in the DB (procedures or something else) - the server has 2 quad cores Xenon CPUs, 24GB of RAM, RAID hdds...Thank you all for your responses and help so far!
You can use the query below to identify the queries that are using most of the CPU.
SELECT TOP 20
    SUM(qs.total_worker_time) AS total_cpu_time,
    SUM(qs.execution_count)   AS total_execution_count,
    COUNT(*)                  AS number_of_statements,
    qs.plan_handle
FROM sys.dm_exec_query_stats qs
GROUP BY qs.plan_handle
ORDER BY SUM(qs.total_worker_time) DESC;
Once you have identified the CPU-intensive queries, you can start optimizing them (rather than the ones that run most often but may not be resource-intensive ;-)).
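To see the actual statement text behind a plan_handle from the query above, you can join in sys.dm_exec_sql_text (standard DMV usage; adjust TOP as needed):

```sql
SELECT TOP 20
    SUM(qs.total_worker_time) AS total_cpu_time,
    SUM(qs.execution_count)   AS total_execution_count,
    MIN(st.text)              AS sample_sql_text  -- one statement text per plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) AS st
GROUP BY qs.plan_handle
ORDER BY SUM(qs.total_worker_time) DESC;
```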
January 17, 2012 at 4:06 am
Divine Flame, thank you very much for your help; your posts are really helpful! I will try those queries and see what I discover...
Thank you!
January 17, 2012 at 9:04 am
High CPU utilization can be caused by a number of things; for example, it could be a symptom of a slow network. Have you looked at the book Troubleshooting SQL Server: A Guide for the Accidental DBA?
http://www.red-gate.com/our-company/about/book-store/ -- P.S. if anyone can tell me how to embed the URL in text, that would be great 🙂
This can help you a lot in these types of things. It seems like your hardware is pretty good, so I would investigate other things first.
Jared
CE - Microsoft
January 17, 2012 at 9:11 am
SQLKnowItAll (1/17/2012)
High CPU utilization can be a number of things; i.e. it could be a symptom from slow network or other things. Have you looked at the book Troubleshooting SQL Server: A Guide for the Accidental DBA? http://www.red-gate.com/our-company/about/book-store/ --p.s. if anyone can tell me how to embed the url in text, that would be great 🙂
This can help you a lot in these types of things. It seems like your hardware is pretty good, so I would investigate other things first.
Hi, I will look over this book. Regarding network issues, we investigated that pretty thoroughly, and we are sure it is not a network issue - it is most definitely something within SQL Server...
January 17, 2012 at 9:17 am
george.pancescu (1/17/2012)
SQLKnowItAll (1/17/2012)
High CPU utilization can be a number of things; i.e. it could be a symptom from slow network or other things. Have you looked at the book Troubleshooting SQL Server: A Guide for the Accidental DBA? http://www.red-gate.com/our-company/about/book-store/ --p.s. if anyone can tell me how to embed the url in text, that would be great 🙂
This can help you a lot in these types of things. It seems like your hardware is pretty good, so I would investigate other things first.
Hi I will look over this book; regarding network issues, we investigated this pretty thorough, and we are sure that it is not a network issue - it is most definitely something within the sql server...
Well, I was just using that as an example. There are so many things it could be that starting somewhere more general, to point you in the right direction, may be better suited. I'm just saying that I have spent days going over stored procedures and queries only to find that it was the offsite backup causing the issues. Of course, if you are convinced it is SQL Server, then looking at the proper DMVs and knowing how to interpret them, along with a proper trace, will help you find the culprit sooner than trying to sort through stored procs or ad-hoc queries on the application side. 🙂
Jared
CE - Microsoft