Performance issues on a bigger server.

Question

Performance issues on a bigger server.

J-F Bergeron

SSCrazy Eights

Points: 8593
More actions
April 21, 2009 at 9:15 am

#199606

I've had slow issues with a PROD Server we just started using. The Prod server has 2 processors, 4 GB of RAM, and a lot of disk Space (On the D Drive).
I first thought that the drive space on the C Drive might be an issue, so I changed the database to the D Drive, which has plenty space. That did not help.
When I launch a query on the server, it takes about 10 times longer than on my dev machine, which is not that great.
Here's a query I tested for speed:
DBCC dropcleanbuffers GO
DBCC FREEPROCCACHE GO
SET statistics IO ON
SET statistics TIME ON
SELECT TOP 1000000 identity(INT,1,1) AS N
INTO dbo.Tally
FROM MASTER.sys.syscolumns sc1
CROSS JOIN MASTER.sys.syscolumns sc2
SET statistics TIME OFF
SET statistics IO OFF
DROP TABLE TALLY
Creating that tally table with 1 000 000 rows gives those results on
DEV:
Table 'syscolrdb'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 19, physical reads 1, read-ahead reads 21, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1172 ms, elapsed time = 1707 ms.
(1000000 row(s) affected)
PROD:
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Table 'syscolrdb'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 2, logical reads 11, physical reads 1, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1047 ms, elapsed time = 11727 ms.
(1000000 row(s) affected)
Everything is the same, but the Elapsed time, which takes 10 times longer on the PROD Server. How do I know what the server is waiting on when running the query... and if it's so hard for the server to do this, why isn't it using the whole memory, or processor to run this query, it stays at about 10% processor usage...
Any thoughts?
Cheers,
J-F

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 1

Is there a difference in the execution plans on each machine? Also, have you checked the stats and what your major wait times are on the second server? You might be having a bottleneck that doesn't show in the general performance indicators.

Shamless self promotion - read my blog http://sirsql.net

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 2

I've used the master database for the query to show how it's not the query that is the problem, but really something on the server itself. I can run this query on several servers I have, and every server give a better result than this one. There seems to be a bottleneck, and I do not know how I can find it.

Is there another way I could monitor what is happening on the server itself? I ran the Perfmon, but really, I'm confused as what to check to see if the server has a bottleneck.

Cheers,

J-F

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 3

Free up the buffer cache and the proc cache and buffer cache

DBCC FREEPROCCACHE;

DBCC DROPCLEANBUFFERS;

Run your query, then look at the wait stats (this is a very basic version, but can certainly help identify bottlenecks)

select * From sys.dm_os_wait_stats order by wait_time_ms desc

Post up the top 10 to 20 of those

Shamless self promotion - read my blog http://sirsql.net

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 4

Nicholas Cain (4/21/2009)
Free up the buffer cache and the proc cache and buffer cache
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
Run your query, then look at the wait stats (this is a very basic version, but can certainly help identify bottlenecks)
select * From sys.dm_os_wait_stats order by wait_time_ms desc
Post up the top 10 to 20 of those

Eh, nice, I did not know I could do that query to get the wait times... I do not know what they mean though..:

LAZYWRITER_SLEEP15116150817811015140

SQLTRACE_BUFFER_FLUSH37711508000040000

WRITELOG397555169584317182531

ASYNC_NETWORK_IO608541016402000765

SLEEP_BPOOL_FLUSH97639473415615

LOGBUFFER311368875234265

OLEDB479966273595150

IO_COMPLETION1499206712030

BROKER_TASK_STOP220000100000

PAGEIOLATCH_UP1077199371090

PAGEIOLATCH_SH14111903117115

PAGEIOLATCH_EX699154062340

LATCH_EX844219840

LCK_M_U942039840

LCK_M_S21178111250

SLEEP_TASK323508131293234

PAGELATCH_UP268906090

MSQL_XP5788282030

ASYNC_IO_COMPLETION17657650

CHKPT15785780

SLEEP_SYSTEMTASK15785780

SOS_SCHEDULER_YIELD547417115156

MSSEARCH121091090

PAGELATCH_EX466311515

LATCH_SH431310

SQLTRACE_LOCK115150

Cheers,

J-F

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 5

Now can you run it against the other machine and see where there's a difference.

Shamless self promotion - read my blog http://sirsql.net

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 6

The query DBCC DropCleanBuffers does not seem to be reseting those counters to 0, ... is that Normal?

I just run DBCC DropCleanBuffers; select * from ...Stats.. and it still has the same results.

Cheers,

J-F

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 7

For the basics on identifying what's happening on a server, I'd recommend this white paper from Microsoft. It's a great read.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 8

oops, sorry

DBCC SQLPERF(WAITSTATS, CLEAR);

Shamless self promotion - read my blog http://sirsql.net

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 9

Nicholas Cain (4/21/2009)
oops, sorry
DBCC SQLPERF(WAITSTATS, CLEAR);

Ahhh!! Looks a lot more accurate now!

Here are the results...

LAZYWRITER_SLEEP222200010000

SQLTRACE_BUFFER_FLUSH52000040000

LOGBUFFER5001248412546

SLEEP_BPOOL_FLUSH5357421780

WRITELOG2413903750

PAGEIOLATCH_SH41671780

PAGELATCH_UP129629615

PAGEIOLATCH_EX462150

PAGEIOLATCH_UP246310

Can you see clearer in those? How do I know what they do, is there anything in the White Papers Grant posted that will help me know what are those counters? I guess so... I have some reading to do then!

Cheers,

J-F

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 10

Can you compare those to another machine that is having better performance?

Shamless self promotion - read my blog http://sirsql.net

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 11

J-F Bergeron (4/21/2009)
Nicholas Cain (4/21/2009)
oops, sorry
DBCC SQLPERF(WAITSTATS, CLEAR);
Ahhh!! Looks a lot more accurate now!
Here are the results...
LAZYWRITER_SLEEP222200010000
SQLTRACE_BUFFER_FLUSH52000040000
LOGBUFFER5001248412546
SLEEP_BPOOL_FLUSH5357421780
WRITELOG2413903750
PAGEIOLATCH_SH41671780
PAGELATCH_UP129629615
PAGEIOLATCH_EX462150
PAGEIOLATCH_UP246310
Can you see clearer in those? How do I know what they do, is there anything in the White Papers Grant posted that will help me know what are those counters? I guess so... I have some reading to do then!

Looks a lot better, on the wait times,

wait_typewaiting_tasks_countwait_time_msmax_wait_time_mssignal_wait_time_ms

LAZYWRITER_SLEEP6601510150

SQLTRACE_BUFFER_FLUSH1400040000

PAGEIOLATCH_SH37218460

LOGBUFFER393930

PAGEIOLATCH_EX1246150

SOS_SCHEDULER_YIELD279311531

PAGEIOLATCH_UP315150

Cheers,

J-F

Nicholas Cain SSC-Dedicated Points: 33009 More actions · Answer 12

Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).

as an example

PAGEIOLATCH_SH 41 671 78 0 --new server

PAGEIOLATCH_SH 37 218 46 0 --old server

where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.

If you also look you can see

WRITELOG 24 1390 375 0

in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.

This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.

Shamless self promotion - read my blog http://sirsql.net

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 13

Nicholas Cain (4/21/2009)
Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).
as an example
PAGEIOLATCH_SH 41 671 78 0 --new server
PAGEIOLATCH_SH 37 218 46 0 --old server
where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.
If you also look you can see
WRITELOG 24 1390 375 0
in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.
This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.

Thanks for the analysis, it's really appreciated, by the way.

I've checked the different information that could be given in that view, and I've found a huge comparison in the "LOGBUFFER". I checked BOL, and all they say is that it's the time it waited to write to the LOGBuffer... that does not help me much.

I get a LOGBUFFER of 93 ms on the old server, compared to 12000 ms on the new one, that seems like my problem, but I do not know what it means...

Is there someplace where we can get a better explanation of every counter in the list? I'm really interested in knowing how to actually know which is what, and how we can solve it.

Thanks,

Edited : Here's what they say on the LOGBuffer:

Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values can indicate that the log devices cannot keep up with the logging information being generated by the server.

Cheers,

J-F

noeld SSC Guru Points: 96590 More actions · Answer 14

J-F Bergeron (4/21/2009)
Nicholas Cain (4/21/2009)
Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).
as an example
PAGEIOLATCH_SH 41 671 78 0 --new server
PAGEIOLATCH_SH 37 218 46 0 --old server
where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.
If you also look you can see
WRITELOG 24 1390 375 0
in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.
This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.
Thanks for the analysis, it's really appreciated, by the way.
I've checked the different information that could be given in that view, and I've found a huge comparison in the "LOGBUFFER". I checked BOL, and all they say is that it's the time it waited to write to the LOGBuffer... that does not help me much.
I get a LOGBUFFER of 93 ms on the old server, compared to 12000 ms on the new one, that seems like my problem, but I do not know what it means...
Is there someplace where we can get a better explanation of every counter in the list? I'm really interested in knowing how to actually know which is what, and how we can solve it.
Thanks,
Edited : Here's what they say on the LOGBuffer:
Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values can indicate that the log devices cannot keep up with the logging information being generated by the server.

Check The disk where the Logs are. Here are some pointers.

Make sure that you are getting the throughput you need from it.

Also defragment your log for certain loads it helps.

Make sure that nothing else is using the drive or at least that the data files are not in it.

* Noel