Performance issues on a bigger server.

  • I've had slow issues with a PROD Server we just started using. The Prod server has 2 processors, 4 GB of RAM, and a lot of disk Space (On the D Drive).

    I first thought that the drive space on the C Drive might be an issue, so I changed the database to the D Drive, which has plenty space. That did not help.

    When I launch a query on the server, it takes about 10 times longer than on my dev machine, which is not that great.

    Here's a query I tested for speed:

    DBCC dropcleanbuffers GO

    DBCC FREEPROCCACHE GO

    SET statistics IO ON

    SET statistics TIME ON

    SELECT TOP 1000000 identity(INT,1,1) AS N

    INTO dbo.Tally

    FROM MASTER.sys.syscolumns sc1

    CROSS JOIN MASTER.sys.syscolumns sc2

    SET statistics TIME OFF

    SET statistics IO OFF

    DROP TABLE TALLY

    Creating that tally table with 1 000 000 rows gives those results on

    DEV:

    Table 'syscolrdb'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 2, logical reads 19, physical reads 1, read-ahead reads 21, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1172 ms, elapsed time = 1707 ms.

    (1000000 row(s) affected)

    PROD:

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Table 'syscolrdb'. Scan count 1, logical reads 95, physical reads 0, read-ahead reads 115, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'syscolpars'. Scan count 2, logical reads 11, physical reads 1, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1047 ms, elapsed time = 11727 ms.

    (1000000 row(s) affected)

    Everything is the same, but the Elapsed time, which takes 10 times longer on the PROD Server. How do I know what the server is waiting on when running the query... and if it's so hard for the server to do this, why isn't it using the whole memory, or processor to run this query, it stays at about 10% processor usage...

    Any thoughts?

    Cheers,

    J-F

  • Is there a difference in the execution plans on each machine? Also, have you checked the stats and what your major wait times are on the second server? You might be having a bottleneck that doesn't show in the general performance indicators.



    Shamless self promotion - read my blog http://sirsql.net

  • I've used the master database for the query to show how it's not the query that is the problem, but really something on the server itself. I can run this query on several servers I have, and every server give a better result than this one. There seems to be a bottleneck, and I do not know how I can find it.

    Is there another way I could monitor what is happening on the server itself? I ran the Perfmon, but really, I'm confused as what to check to see if the server has a bottleneck.

    Cheers,

    J-F

  • Free up the buffer cache and the proc cache and buffer cache

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    Run your query, then look at the wait stats (this is a very basic version, but can certainly help identify bottlenecks)

    select * From sys.dm_os_wait_stats order by wait_time_ms desc

    Post up the top 10 to 20 of those



    Shamless self promotion - read my blog http://sirsql.net

  • Nicholas Cain (4/21/2009)


    Free up the buffer cache and the proc cache and buffer cache

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    Run your query, then look at the wait stats (this is a very basic version, but can certainly help identify bottlenecks)

    select * From sys.dm_os_wait_stats order by wait_time_ms desc

    Post up the top 10 to 20 of those

    Eh, nice, I did not know I could do that query to get the wait times... I do not know what they mean though..:

    LAZYWRITER_SLEEP15116150817811015140

    SQLTRACE_BUFFER_FLUSH37711508000040000

    WRITELOG397555169584317182531

    ASYNC_NETWORK_IO608541016402000765

    SLEEP_BPOOL_FLUSH97639473415615

    LOGBUFFER311368875234265

    OLEDB479966273595150

    IO_COMPLETION1499206712030

    BROKER_TASK_STOP220000100000

    PAGEIOLATCH_UP1077199371090

    PAGEIOLATCH_SH14111903117115

    PAGEIOLATCH_EX699154062340

    LATCH_EX844219840

    LCK_M_U942039840

    LCK_M_S21178111250

    SLEEP_TASK323508131293234

    PAGELATCH_UP268906090

    MSQL_XP5788282030

    ASYNC_IO_COMPLETION17657650

    CHKPT15785780

    SLEEP_SYSTEMTASK15785780

    SOS_SCHEDULER_YIELD547417115156

    MSSEARCH121091090

    PAGELATCH_EX466311515

    LATCH_SH431310

    SQLTRACE_LOCK115150

    Cheers,

    J-F

  • Now can you run it against the other machine and see where there's a difference.



    Shamless self promotion - read my blog http://sirsql.net

  • The query DBCC DropCleanBuffers does not seem to be reseting those counters to 0, ... is that Normal?

    I just run DBCC DropCleanBuffers; select * from ...Stats.. and it still has the same results.

    Cheers,

    J-F

  • For the basics on identifying what's happening on a server, I'd recommend this white paper from Microsoft. It's a great read.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • oops, sorry

    DBCC SQLPERF(WAITSTATS, CLEAR);



    Shamless self promotion - read my blog http://sirsql.net

  • Nicholas Cain (4/21/2009)


    oops, sorry

    DBCC SQLPERF(WAITSTATS, CLEAR);

    Ahhh!! Looks a lot more accurate now!

    Here are the results...

    LAZYWRITER_SLEEP222200010000

    SQLTRACE_BUFFER_FLUSH52000040000

    LOGBUFFER5001248412546

    SLEEP_BPOOL_FLUSH5357421780

    WRITELOG2413903750

    PAGEIOLATCH_SH41671780

    PAGELATCH_UP129629615

    PAGEIOLATCH_EX462150

    PAGEIOLATCH_UP246310

    Can you see clearer in those? How do I know what they do, is there anything in the White Papers Grant posted that will help me know what are those counters? I guess so... I have some reading to do then!

    Cheers,

    J-F

  • Can you compare those to another machine that is having better performance?



    Shamless self promotion - read my blog http://sirsql.net

  • J-F Bergeron (4/21/2009)


    Nicholas Cain (4/21/2009)


    oops, sorry

    DBCC SQLPERF(WAITSTATS, CLEAR);

    Ahhh!! Looks a lot more accurate now!

    Here are the results...

    LAZYWRITER_SLEEP222200010000

    SQLTRACE_BUFFER_FLUSH52000040000

    LOGBUFFER5001248412546

    SLEEP_BPOOL_FLUSH5357421780

    WRITELOG2413903750

    PAGEIOLATCH_SH41671780

    PAGELATCH_UP129629615

    PAGEIOLATCH_EX462150

    PAGEIOLATCH_UP246310

    Can you see clearer in those? How do I know what they do, is there anything in the White Papers Grant posted that will help me know what are those counters? I guess so... I have some reading to do then!

    Looks a lot better, on the wait times,

    wait_typewaiting_tasks_countwait_time_msmax_wait_time_mssignal_wait_time_ms

    LAZYWRITER_SLEEP6601510150

    SQLTRACE_BUFFER_FLUSH1400040000

    PAGEIOLATCH_SH37218460

    LOGBUFFER393930

    PAGEIOLATCH_EX1246150

    SOS_SCHEDULER_YIELD279311531

    PAGEIOLATCH_UP315150

    Cheers,

    J-F

  • Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).

    as an example

    PAGEIOLATCH_SH 41 671 78 0 --new server

    PAGEIOLATCH_SH 37 218 46 0 --old server

    where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.

    If you also look you can see

    WRITELOG 24 1390 375 0

    in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.

    This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.



    Shamless self promotion - read my blog http://sirsql.net

  • Nicholas Cain (4/21/2009)


    Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).

    as an example

    PAGEIOLATCH_SH 41 671 78 0 --new server

    PAGEIOLATCH_SH 37 218 46 0 --old server

    where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.

    If you also look you can see

    WRITELOG 24 1390 375 0

    in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.

    This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.

    Thanks for the analysis, it's really appreciated, by the way.

    I've checked the different information that could be given in that view, and I've found a huge comparison in the "LOGBUFFER". I checked BOL, and all they say is that it's the time it waited to write to the LOGBuffer... that does not help me much.

    I get a LOGBUFFER of 93 ms on the old server, compared to 12000 ms on the new one, that seems like my problem, but I do not know what it means...

    Is there someplace where we can get a better explanation of every counter in the list? I'm really interested in knowing how to actually know which is what, and how we can solve it.

    Thanks,

    Edited : Here's what they say on the LOGBuffer:

    Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values can indicate that the log devices cannot keep up with the logging information being generated by the server.

    Cheers,

    J-F

  • J-F Bergeron (4/21/2009)


    Nicholas Cain (4/21/2009)


    Just going by the comparisons that are shown there it appears as though the disk on the new machine is not as fast as on the old (or there is a difference in RAID levels, disk cacheing etc).

    as an example

    PAGEIOLATCH_SH 41 671 78 0 --new server

    PAGEIOLATCH_SH 37 218 46 0 --old server

    where the second number indicates the total wait time in milliseconds and the first being the waiting tasks count. As you can see the total wait time on the new server is higher than on the old.

    If you also look you can see

    WRITELOG 24 1390 375 0

    in the new wait stats, however it does not appear at all in the shortened listing on the old server. If you are waiting to write to the log then you are going to be looking at disk slowness. It may well not be seen via perfmon and not seen as disk queuing, it just can't write to the disk as fast.

    This could simply come down to the speed of the drives in each machine (10k vs 15k for example). I'd check the disk configs between the two and see if anything shows there.

    Thanks for the analysis, it's really appreciated, by the way.

    I've checked the different information that could be given in that view, and I've found a huge comparison in the "LOGBUFFER". I checked BOL, and all they say is that it's the time it waited to write to the LOGBuffer... that does not help me much.

    I get a LOGBUFFER of 93 ms on the old server, compared to 12000 ms on the new one, that seems like my problem, but I do not know what it means...

    Is there someplace where we can get a better explanation of every counter in the list? I'm really interested in knowing how to actually know which is what, and how we can solve it.

    Thanks,

    Edited : Here's what they say on the LOGBuffer:

    Occurs when a task is waiting for space in the log buffer to store a log record. Consistently high values can indicate that the log devices cannot keep up with the logging information being generated by the server.

    Check The disk where the Logs are. Here are some pointers.

    Make sure that you are getting the throughput you need from it.

    Also defragment your log for certain loads it helps.

    Make sure that nothing else is using the drive or at least that the data files are not in it.


    * Noel

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply