April 21, 2009 at 9:15 am
I've had slow issues with a PROD Server we just started using. The Prod server has 2 processors, 4 GB of RAM, and a lot of disk Space (On the D Drive).
I first thought that the drive space on the C Drive might be an issue, so I changed the database to the D Drive, which has plenty space. That did not help.
When I launch a query on the server, it takes about 10 times longer than on my dev machine, which is not that great.
Here's a query I tested for speed:
DBCC dropcleanbuffers GO
DBCC FREEPROCCACHE GO
SET statistics IO ON
SET statistics TIME ON
SELECT TOP 1000000 identity(INT,1,1) AS N
INTO dbo.Tally
FROM MASTER.sys.syscolumns sc1
CROSS JOIN MASTER.sys.syscolumns sc2
SET statistics TIME OFF
SET statistics IO OFF
DROP TABLE TALLY
Creating that tally table with 1 000 000 rows gives those results on
DEV:
Table 'syscolrdb'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 19, physical reads 1, read-ahead reads 21, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1172 ms, elapsed time = 1707 ms.
(1000000 row(s) affected)
PROD:
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Table 'syscolrdb'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 11, physical reads 1, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1047 ms, elapsed time = 11727 ms.
(1000000 row(s) affected)
Everything is the same, but the Elapsed time, which takes 10 times longer on the PROD Server. How do I know what the server is waiting on when running the query... and if it's so hard for the server to do this, why isn't it using the whole memory, or processor to run this query, it stays at about 10% processor usage...
Any thoughts?
Cheers,
J-F
April 21, 2009 at 11:10 am
Is there a difference in the execution plans on each machine? Also, have you checked the stats and what your major wait times are on the second server? You might be having a bottleneck that doesn't show in the general performance indicators.
April 21, 2009 at 11:33 am
I've used the master database for the query to show how it's not the query that is the problem, but really something on the server itself. I can run this query on several servers I have, and every server give a better result than this one. There seems to be a bottleneck, and I do not know how I can find it.
Is there another way I could monitor what is happening on the server itself? I ran the Perfmon, but really, I'm confused as what to check to see if the server has a bottleneck.
Cheers,
J-F
April 21, 2009 at 11:48 am
Free up the buffer cache and the proc cache and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
Run your query, then look at the wait stats (this is a very basic version, but can certainly help identify bottlenecks)
select * From sys.dm_os_wait_stats order by wait_time_ms desc
Post up the top 10 to 20 of those
April 21, 2009 at 11:52 am
Nicholas Cain (4/21/2009)
Free up the buffer cache and the proc cache and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
Run your query, then look at the wait stats (this is a very basic version, but can certainly help identify bottlenecks)
select * From sys.dm_os_wait_stats order by wait_time_ms desc
Post up the top 10 to 20 of those
Eh, nice, I did not know I could do that query to get the wait times... I do not know what they mean though..:
LAZYWRITER_SLEEP15116150817811015140
SQLTRACE_BUFFER_FLUSH37711508000040000
WRITELOG397555169584317182531
ASYNC_NETWORK_IO608541016402000765
SLEEP_BPOOL_FLUSH97639473415615
LOGBUFFER311368875234265
OLEDB479966273595150
IO_COMPLETION1499206712030
BROKER_TASK_STOP220000100000
PAGEIOLATCH_UP1077199371090
PAGEIOLATCH_SH14111903117115
PAGEIOLATCH_EX699154062340
LATCH_EX844219840
LCK_M_U942039840
LCK_M_S21178111250
SLEEP_TASK323508131293234
PAGELATCH_UP268906090
MSQL_XP5788282030
ASYNC_IO_COMPLETION17657650
CHKPT15785780
SLEEP_SYSTEMTASK15785780
SOS_SCHEDULER_YIELD547417115156
MSSEARCH121091090
PAGELATCH_EX466311515
LATCH_SH431310
SQLTRACE_LOCK115150
Cheers,
J-F
April 21, 2009 at 11:59 am
Now can you run it against the other machine and see where there's a difference.
April 21, 2009 at 12:04 pm
The query DBCC DropCleanBuffers does not seem to be reseting those counters to 0, ... is that Normal?
I just run DBCC DropCleanBuffers; select * from ...Stats.. and it still has the same results.
Cheers,
J-F
April 21, 2009 at 12:08 pm
For the basics on identifying what's happening on a server, I'd recommend this white paper from Microsoft. It's a great read.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 21, 2009 at 12:14 pm
April 21, 2009 at 12:20 pm
Nicholas Cain (4/21/2009)
oops, sorry
DBCC SQLPERF(WAITSTATS, CLEAR);
Ahhh!! Looks a lot more accurate now!
Here are the results...
LAZYWRITER_SLEEP222200010000
SQLTRACE_BUFFER_FLUSH52000040000
LOGBUFFER5001248412546
SLEEP_BPOOL_FLUSH5357421780
WRITELOG2413903750
PAGEIOLATCH_SH41671780
PAGELATCH_UP129629615
PAGEIOLATCH_EX462150
PAGEIOLATCH_UP246310
Can you see clearer in those? How do I know what they do, is there anything in the White Papers Grant posted that will help me know what are those counters? I guess so... I have some reading to do then!
Cheers,
J-F
April 21, 2009 at 12:24 pm
Can you compare those to another machine that is having better performance?
April 21, 2009 at 12:36 pm
J-F Bergeron (4/21/2009)
Nicholas Cain (4/21/2009)
oops, sorry
DBCC SQLPERF(WAITSTATS, CLEAR);
Ahhh!! Looks a lot more accurate now!
Here are the results...
LAZYWRITER_SLEEP222200010000
SQLTRACE_BUFFER_FLUSH52000040000
LOGBUFFER5001248412546
SLEEP_BPOOL_FLUSH5357421780
WRITELOG2413903750
PAGEIOLATCH_SH41671780
PAGELATCH_UP129629615
PAGEIOLATCH_EX462150
PAGEIOLATCH_UP246310
Can you see clearer in those? How do I know what they do, is there anything in the White Papers Grant posted that will help me know what are those counters? I guess so... I have some reading to do then!
Looks a lot better, on the wait times,
wait_typewaiting_tasks_countwait_time_msmax_wait_time_mssignal_wait_time_ms
LAZYWRITER_SLEEP6601510150
SQLTRACE_BUFFER_FLUSH1400040000
PAGEIOLATCH_SH37218460
LOGBUFFER393930
PAGEIOLATCH_EX1246150
SOS_SCHEDULER_YIELD279311531
PAGEIOLATCH_UP315150
Cheers,
J-F
April 21, 2009 at 2:22 pm
Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).
as an example
PAGEIOLATCH_SH 41 671 78 0 --new server
PAGEIOLATCH_SH 37 218 46 0 --old server
where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.
If you also look you can see
WRITELOG 24 1390 375 0
in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.
This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.
April 21, 2009 at 2:30 pm
Nicholas Cain (4/21/2009)
Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).as an example
PAGEIOLATCH_SH 41 671 78 0 --new server
PAGEIOLATCH_SH 37 218 46 0 --old server
where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.
If you also look you can see
WRITELOG 24 1390 375 0
in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.
This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.
Thanks for the analysis, it's really appreciated, by the way.
I've checked the different information that could be given in that view, and I've found a huge comparison in the "LOGBUFFER". I checked BOL, and all they say is that it's the time it waited to write to the LOGBuffer... that does not help me much.
I get a LOGBUFFER of 93 ms on the old server, compared to 12000 ms on the new one, that seems like my problem, but I do not know what it means...
Is there someplace where we can get a better explanation of every counter in the list? I'm really interested in knowing how to actually know which is what, and how we can solve it.
Thanks,
Edited : Here's what they say on the LOGBuffer:
Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values can indicate that the log devices cannot keep up with the logging information being generated by the server.
Cheers,
J-F
April 21, 2009 at 2:40 pm
J-F Bergeron (4/21/2009)
Nicholas Cain (4/21/2009)
Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).as an example
PAGEIOLATCH_SH 41 671 78 0 --new server
PAGEIOLATCH_SH 37 218 46 0 --old server
where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.
If you also look you can see
WRITELOG 24 1390 375 0
in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.
This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.
Thanks for the analysis, it's really appreciated, by the way.
I've checked the different information that could be given in that view, and I've found a huge comparison in the "LOGBUFFER". I checked BOL, and all they say is that it's the time it waited to write to the LOGBuffer... that does not help me much.
I get a LOGBUFFER of 93 ms on the old server, compared to 12000 ms on the new one, that seems like my problem, but I do not know what it means...
Is there someplace where we can get a better explanation of every counter in the list? I'm really interested in knowing how to actually know which is what, and how we can solve it.
Thanks,
Edited : Here's what they say on the LOGBuffer:
Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values can indicate that the log devices cannot keep up with the logging information being generated by the server.
Check The disk where the Logs are. Here are some pointers.
Make sure that you are getting the throughput you need from it.
Also defragment your log for certain loads it helps.
Make sure that nothing else is using the drive or at least that the data files are not in it.
* Noel
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply