November 2, 2002 at 7:17 pm
We have a situation that comes up every week or three.
We have a quad processor W2K server running SQL 2K SP2.
Every once in a while we have a query that is running that appears to block, or almost block, all other activity, without showing any reason to, and leaving three CPU's idle.
This happened just a few days ago. The query was a read-only query against a database that is only read from. It used a few memory tables to (quickly) build a table, the joined it to several large tables and ran for about 6 hours.
During that time, other queries against the same database were running obscenely slowly. NO query showed any blocking. The system had extremely low I/O, no paging, and was showing 25% CPU utilization. Other queries WERE running, but at about 1/10th the speed of normal. The real key here is that 3 of the 4 CPU's were mostly idle (note I do not mean literally 3 CPU's, but that the system showed 25% CPU busy, 75% idle).
Normally the system (which is 98% SQL Server and about 2% web server), when busy, is either I/O limited or running on all 4 CPU's, staying very busy. But something was happening here that prevented the system from simultaneously processing other queries, which were waiting variously in "Awaiting Command", "Conditional", "SELECT INTO". Statuses changing (slowing), nothing unusual. Looking at the DBCC INPUTBUFFER I could see them progressing, but extremely slowly. Taking repetitive snapshots of the CPU and IO showed the long-running query getting about 1 second of CPU for each 1 second elapsed, and no others getting much at all - I/O or CPU.
What does one do in this case to try to determine the cause? No blocking shown, nothing obvious limiting the CPU, there appeared to be something in SQL that caused it to run the long-running query on one CPU, and other queries to make progress only very occasionally (maybe while the long running query did I/O or something).
We killed the long running quries, the rest of the server took off as normal.
November 2, 2002 at 9:11 pm
In current activity in EM or sp_who in QA is tempdb showing a log backup taking place? I have a similar occurrance with slow downs and seemed to be focused around remaining drive free space, I had 3 GB of 56 left, I have since had a day to cleanup and free 6GB so now at 9 of 56 and not seen.
Also, take a look at the query plan of the query this tends to happen on and see if any odd operations occurr.
Lastly, one thing that often gets overlooked when using temp tables is the fact that there is a know issue with using SELECT INTO to create and populate them in that they hold locks on the tables data is from. The work arround is to use CREATE TABLE to create the temp table or make real tables. Then INSERT the data to the table.
November 3, 2002 at 3:52 am
The other option is parallism. If you have a poorly performing query that has a query cost larger than your parallism threshold noremally 5 then the query will try and use >1 processor. There is a known issue with the parallel processors merging their results. The result is the query just grinding to a halt, and showing little activity.
I have had this in 2 situations
Query reading lots (far too much data) due to crap indexing.
SP written that creates a table, populates it, then queries the table populated.
The lattter is a recompilation issue as well, in that you should never populate a table and then query on it in the same scope, because the SP has to do a recompile after the population(or update/delete).
Anyway to see if this is the issue look at sysprocesses, two columns waittype and another one will show 0x0200 and exchange (not sure which way round.
To cure it use the options clause on the query, this forces use of 1 processor (Max Degree Of Parallism).
OPTIONS (MAXDOP=1) (I think check BOL)
Simon Sabin
Co-author of SQL Server 2000 XML Distilled
http://www.amazon.co.uk/exec/obidos/ASIN/1904347088
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons
November 3, 2002 at 5:36 pm
quote:
In current activity in EM or sp_who in QA is tempdb showing a log backup taking place?
No.
quote:
Also, take a look at the query plan of the query this tends to happen on and see if any odd operations occurr.
The query that was running for hours was, simply, a very long query. It might be written better, but it was a one shot need that needed to go through about 60G of data. The problem was not that it was taking a long time, but that it was preventing other queries from proceeding despite the CPU, Memory and I/O being near idle (25%, no faulting, near zero I/O).
quote:
Lastly, one thing that often gets overlooked when using temp tables is the fact that there is a know issue with using SELECT INTO to create and populate them in that they hold locks on the tables data is from. The work arround is to use CREATE TABLE to create the temp table or make real tables. Then INSERT the data to the table.
Can you elaborate, or point me to information?
I set up a controlled test, did a very slow SELECT INTO and tried accessing the (only) table in the FROM clause from another connection, and had no problems. I was even accessing the the same rows. While I've always head do CREATE TABLE instead of SELECT INTO, I had not heard any specific reasons. Locking like this would be a HUGE reason. Does it only happen some times?
November 3, 2002 at 5:50 pm
quote:
The other option is parallism. If you have a poorly performing query that has a query cost larger than your parallism threshold noremally 5 then the query will try and use >1 processor. There is a known issue with the parallel processors merging their results.......
Anyway to see if this is the issue look at sysprocesses, two columns waittype and another one will show 0x0200 and exchange (not sure which way round.
Thanks, I'll try this next time it happens (and if I get some idle time when I'm also idle will try to set up the killer query to try this again -- I saved it).
Can you elaborate a bit though? Or give me a pointer? This is a large data warehouse, so a huge percentage of the queries run on it result in 4 parallel threads (or whatever they are called; we have 4 processors, it seems to always break it up that way). If there's a query running with 4 threads, are you saying that merging these results slows down THAT query, or does it slow down other queries also runing?
The problem I observed was that the first really long query (which very likely had been parallel at some point though I didn't notice by the time I looked) was still running, only one thread (maybe the only) was getting any CPU, and other queries were waiting and not progressing despite there being 75% of the CPU's available to run. Could this query have originally been running in 4 threads, and something about it coming together be blocking the (effective) use of the other CPU's?
I did definitely notice that the other queries that were waiting were running in parallel threads. There were two in particular, both continued to show 4 threads each (same SPID, different ECID), and these 8 were getting extremely little CPU as I took repeative snapshots of activity. But they WERE moving. And no blocking was shown on a SP_WHO.
In SQL7 I have occasionally had to set MAXDOP to 1. But so far in SQL2K I haven't run into any queries that misbehaved. I'd love to read about known issues here, as we use it a lot and I assume setting MAXDOP to 1 in general is a bad thing.
November 4, 2002 at 3:05 am
I always found it to be the case when each parallel process was processing large blocks of data. And yes I haven't some across it yet in 2K.
Simon Sabin
Co-author of SQL Server 2000 XML Distilled
http://www.amazon.co.uk/exec/obidos/ASIN/1904347088
Simon Sabin
SQL Server MVP
http://sqlblogcasts.com/blogs/simons
November 4, 2002 at 3:53 am
As for SELECT INTO causing locks that can become an issue. This relates to locks on the tempdb while trying to create the temp table, which were a problem in SQL 6.5 and 7. May not even be an issue in 2K, have not tested enough to know if can be reproduced there.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply