Tempdb pefformance problems

  • I would like to know what to check for if I have tempdb contention problems, currently I don’t see any high alerts for the disk related counters on perfmon. At the same time, I want to rule out the tempdb configuration problem with the use of 8 tempdb files with equal sizes, I have learnt that this can sometimes cause problems and one should start with 1 core = 1 file first and work down the list, at the moment I have 8, as the server has 4 CPU’s and is a dual core machine.

    I noticed that select * from #temptable takes around 2 seconds when there is only 1 thread running the query, but as soon as another thread starts, then it takes a ridiculous amount of time, I mean 30 – 40 seconds.

    Are there any DMV’s that I can query, to nail out the root cause of the issue, I have looked at CPU and memory and they all look fine, I’m suspecting tempdb as being the bottleneck.

  • Paul Randal has blogged about the 1 file per cpu with tempdb in the past. You can read his thoughts on this at:

    In Recovery... | Search Engine Q&A #12: Should you create multiple ...

    I wouldn't have more than 2 files with your stated configuration unless you have guaranteed PFS contention. To troubleshoot tempdb see the following whitepaper:

    Working with tempdb in SQL Server 2005

    It would take much more than 2 connections to cause tempdb contention unless tempdb is on a incorrectly configured IO subsystem. Ideally tempdb should be on its own dedicated disk array that is a RAID 1+0 to minimize the impact of write costs associated with RAID 5.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Thanks for the reply, so you saying that 0.5 x number of cores would be beneficial or is it more like 0.25 x no of cores ?

    I guess one would have to try out both of them to see which one is suitable, but is there a way to run checks to track performance when using both options ?

    Thanks.

  • There are a couple of things you could look at. If the bottleneck is not PFS contention and is instead a diskio problem, sys.dm_io_virtual_file_stats() would be a good thing to watch between the two. This will show you io_stall rates for the database files for tempdb as well as the write vs read io rates for each file. Beyond that, it would be monitoring tempdb as outlined in the whitepaper to see which configuration has the least amount of contention.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Jonathan Kehayias (12/9/2009)


    There are a couple of things you could look at. If the bottleneck is not PFS contention and is instead a diskio problem, sys.dm_io_virtual_file_stats() would be a good thing to watch between the two. This will show you io_stall rates for the database files for tempdb as well as the write vs read io rates for each file. Beyond that, it would be monitoring tempdb as outlined in the whitepaper to see which configuration has the least amount of contention.

    Are there any guidelines for checking sys.dm_io_virtual_file_stats, ie baseline figures etc.

    Also, reading up MS information on PFS contention see http://support.microsoft.com/kb/328551, its does say it refers to SQL 2000, for SQL 2005/2005 are there any guidelines for this and what do I need to be looking at for detecting PFS contention.

    Thanks.

  • For IO_Stalls, as close to zero as possible. Stalls = bad performance/high latency to disk.

    For SQL 2005 look at the following Engine Team Blog posts:

    TempDB Monitoring and Troubleshooting: Allocation Bottleneck

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Thanks for the information. I have checked tempdb on system A and ran thesame code on Server B, Server B has a dedicated IO subsystem for tempdb.

    SELECT substring(name,1,20) as 'Database',

    case file_id when 1 then 'Data'

    when 2 then 'Log'

    end as 'Type',

    num_of_reads,

    num_of_bytes_read,

    num_of_writes,

    num_of_bytes_written,

    size_on_disk_bytes,

    io_stall_read_ms,

    io_stall_write_ms,

    io_stall

    FROM sys.dm_io_virtual_file_stats(null,null),

    sys.databases

    where sys.databases.database_id = sys.dm_io_virtual_file_stats.database_id

    and substring(name,1,20) = 'tempdb'

    go

    Database|Type|num_of_reads|num_of_bytes_read|num_of_writes|num_of_bytes_written|size_on_disk_bytes|io_stall_read_ms|io_stall_write_ms|io_stall

    tempdb|Data|363945|22155657216|354290|22878412800|1073741824|650129|6007306|6657435

    tempdb|Log|21|450560|107859|6620096000|5368709120|15|172421|172436

    tempdb|NULL|362888|22146949120|353840|22856851456|1073741824|611836|6062488|6674324

    tempdb|NULL|363025|22144933888|353933|22862233600|1073741824|629907|6034630|6664537

    tempdb|NULL|362770|22139232256|353795|22854279168|1073741824|608596|5929837|6538433

    tempdb|NULL|363005|22142525440|353961|22862700544|1073741824|610345|5971057|6581402

    tempdb|NULL|363107|22149324800|354143|22863060992|1073741824|616765|6067918|6684683

    tempdb|NULL|363044|22153854976|353945|22862241792|1073741824|617332|6043372|6660704

    tempdb|NULL|363137|22153707520|353900|22861438976|1073741824|631008|5965039|6596047

    Not sure how to read the data above, but it looks pretty bad doesnt it ?

  • Without knowing more information it is impossible to tell you good or bad. How long has the server been up? If those numbers are aggregate over 2 months, then they don't look as bad as if you restarted the server on Monday. You can get an idea of uptime by querying sys.databases for the create date of the tempdb database since it is created when the server starts.

    What RAID level and how many disks is the tempdb drive? I'm going to guess RAID 5 or 6 based on the fact that writes are approximate 10 times worse than reads for stalls? One of the trade offs with RAID 5 or 6 is that there is a penalty for writing data due to parity calculation. What is the controller cache configured for ratio wise read/write? How much cache is on the controller. I'd reduce the number of files personally which should help reduce disk head contention trying to write across multiple files. You may have traded one problem, allocation contention in SQL for hardware contention at the disk array. Finding the balance between the two is what you are essentially trying to do here.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

  • Dean Jones-454305 (12/8/2009)


    I noticed that select * from #temptable takes around 2 seconds when there is only 1 thread running the query, but as soon as another thread starts, then it takes a ridiculous amount of time, I mean 30 – 40 seconds.

    For how many rows?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Dean, there are a LOT of variables here. I recommend you get a performance tuning professional in for a day to review your system and make recommendations as well as mentor you on what to look for in the future.

    Some initial questions that I would ask:

    1) how many physical spindles underly your tempdb partition.

    2) how many other things are on that partition?

    3) tlog on same partition?

    4) are the 8 cores physical or is hyperthreading enabled?

    5) what is raid type if applicable?

    6) is the partition sector aligned and is cluster size 64K?

    That is just a start...

    I also don't understand what you mean by a second query causing a 10X delay in response...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (12/10/2009)


    Dean, there are a LOT of variables here. I recommend you get a performance tuning professional in for a day to review your system and make recommendations as well as mentor you on what to look for in the future.

    Some initial questions that I would ask:

    1) how many physical spindles underly your tempdb partition.

    2) how many other things are on that partition?

    3) tlog on same partition?

    4) are the 8 cores physical or is hyperthreading enabled?

    5) what is raid type if applicable?

    6) is the partition sector aligned and is cluster size 64K?

    That is just a start...

    I also don't understand what you mean by a second query causing a 10X delay in response...

    1) how many physical spindles underly your tempdb partition.

    2 spindles for system DB's, as there is a seperate LUN for this, and the size needed is small, hence 2 spindles were used here.

    2) how many other things are on that partition?

    Just other system DB MDF files, so only master,msdb and distribution for replication

    3) tlog on same partition?

    Transaction is on a different LUN as well, 2 spindles used here.

    4) are the 8 cores physical or is hyperthreading enabled?

    HT not sure, 8 cores. 4 physical socksets.

    5) what is raid type if applicable?

    RAID 10

    6) is the partition sector aligned and is cluster size 64K?

    Yes, it is aligned.

  • Dean Jones-454305 (12/12/2009)


    1) how many physical spindles underly your tempdb partition.

    2 spindles for system DB's, as there is a seperate LUN for this, and the size needed is small, hence 2 spindles were used here.

    2) how many other things are on that partition?

    Just other system DB MDF files, so only master,msdb and distribution for replication

    3) tlog on same partition?

    Transaction is on a different LUN as well, 2 spindles used here.

    5) what is raid type if applicable?

    RAID 10

    If you have 2 spindles you have RAID 1, not RAID 10 and this is the problem with your tempdb configuration. A RAID 10 would require a minimum of 4 disks, 2 mirrored pairs, striped. You only have a mirrored pair, no striping, so your maximum IOPS is based off a single disk.

    While tempdb may be a "system database", it is one of the most important databases configuration wise because it is used the most heavily, especially on a system that is memory constrained or that makes heavy use of worktables for interim results, sorts, hashes, or uses temp tables explictly. Tempdb should be moved to a true RAID 10 disk array that has the ability to do the IOPS required by your workload.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply