2008 R2 performance issues (132 GB RAM, 80 CPU, 370 databases)

  • 2008 R2 severe performance issues

    I am a DBA and have a server that is exhibiting severe performance problems. Basically, I'm looking for advice on configuration options we have set incorrectly. I came across an article specifically related to running SQL Server 2008 R2 on machine with more than 64 CPU's (we have 80). Here is the article. http://technet.microsoft.com/en-us/library/ee210547(SQL.105).aspx. A colleague suggested I look particularly hard at affinity mask and process affinity. The article specifically says to not use the affinity mask but to use ALTER SERVER CONFIGURATION SET PROCESS AFFINITY. Any comments on this would also be appreciated.

    Let me know if I left out any pertinent info, etc.

    Here are some facts:

    Hardware

    - Memory: 132 GB

    - CPU: 4 NUMA cores. Each core has 20 nodes.

    I'm not very familiar with NUMA so I hope I am saying this right.

    80 CPU's are displayed in Task Manager on the Performance tab.

    - Disk: K: dedicated to tempdb (SAN)

    L: dedicated to logs (all log files) (SAN)

    M:data drive (SAN)

    N: data drive (SAN)

    I don't know much about the SAN configuration but I know that drive N is configured with 12 (I think) disks all being presented as drive N.

    SQL configuiration:

    Max Memory: 104 GB

    CPU affinity: Auto

    TempDB: 4 data files; 1 log file. All tempdb files are on a single volume (K:).

    I *believe* it is raid 6.

    Recovery: All databases are set to Simple recovery

    MaxDOP: 8

    Cost threshold for parallellism: 10

    Number of databases: 370

    Number of sessions: 1800 - 2000

    Problem:

    - M and N % busy > 75%

    According to Idera SQLdm, data drives M and N are consistently above 50% busy and frequently above 80% busy.

    - CPU typically < 15%

    According to Idera SQLdm, CPU busy is typically < 15% with very occassional spikes to 30 to 50%. That seems low considering the number of databases on this machine.

    - Only 75 GB of 132 GB memory in use.

    I don't understand this one. I've set max memory for SQL Server to be 104 GB. Not sure why it's not using more than 75%

    Questions:

    - Is 370 very busy databases on a single instance excessive? It seems like it to me. The load on the databases is both OLTP and reporting. The split is approximately 50/50.

    - The N: drive on the SAN is configured with 12 drives. Wouldn't it be better to split these up into multiple LUNS? (I hope I'm using LUN correctly here). Generally, the more spindles the better. Also, all drives are sharing a single fiber channel (I assume). If they were split up, more channels could be used. DISCLAIMER: I'm not a SAN engineer so what I'm saying here may be total BS. 🙂

    - location of data/log files

    All the data files are on 2 drives. Generally I try to separate these on different drives when possible. Does this apply equally to a SAN?

    Any suggestions/advice is appreciated. It seems to me a server with this much horsepower should be able to handle a very big load.

    Thanks,

    Brett

  • Most of my performance issues where related to Disk I/O. That many DB's on 2 data drives seem excessive. On very busy databases we usually have 2 or 3 drives for data/indexes separately. I would check the disk queue lengths and throughput. I'd also have more then 1 instance of SQL and maybe breakout like DB's (OLTP/OLAP) into their own instance, can improve disk reads for OLAP since the data is usually more static. Lots of smaller drives are better than 1 large drive. 🙂

    ---------------------------------------------------------------
    Mike Hahn - MCSomething someday:-)
    Right way to ask for help!!
    http://www.sqlservercentral.com/articles/Best+Practices/61537/
    I post so I can see my avatar :hehe:
    I want a personal webpage 😎
    I want to win the lotto 😀
    I want a gf like Tiffa :w00t: Oh wait I'm married!:-D

  • brett.walker (11/29/2012)


    2008 R2 severe performance issues

    - Only 75 GB of 132 GB memory in use.

    I don't understand this one. I've set max memory for SQL Server to be 104 GB. Not sure why it's not using more than 75%

    Questions:

    - Is 370 very busy databases on a single instance excessive? It seems like it to me. The load on the databases is both OLTP and reporting. The split is approximately 50/50.

    - The N: drive on the SAN is configured with 12 drives. Wouldn't it be better to split these up into multiple LUNS? (I hope I'm using LUN correctly here). Generally, the more spindles the better. Also, all drives are sharing a single fiber channel (I assume). If they were split up, more channels could be used. DISCLAIMER: I'm not a SAN engineer so what I'm saying here may be total BS. 🙂

    - location of data/log files

    All the data files are on 2 drives. Generally I try to separate these on different drives when possible. Does this apply equally to a SAN?

    Any suggestions/advice is appreciated. It seems to me a server with this much horsepower should be able to handle a very big load.

    Just because you have allocated more memory for use by SQL Server doesn't mean it will use it all.... it will only grab what it needs.

    I would say 370 databases to be a lot on a server that is 50% OLTP and 50% reporting. Of course it depends of the transaction rate, connection rate and amount of data in each transaction. Add to that the quality of the database and application design and it could go different ways.

    As to how to partition the disk space up... that can depend on the SAN hardware and software and type of queries going on. I can say that having everything on one "drive" is rarely optimal when combining OLTP with reporting.

    Even a server that big has bottlenecks. Look for the obvious: disk queue lengths, obnoxious queries that are doing a lot of I/O or using high CPU, excessive blocking, etc. Proper partitioning of data makes a big difference generally, but the application database design should leverage that as well which tends to reduce contention and blocking.

    With that hardware you might consider virtualization.

    The probability of survival is inversely proportional to the angle of arrival.

  • You've got potential for serious issues on tempdb and log files especially::

    Tempdb should never be RAID6 because of its high write activity. Btw, same for log drives, for the same reason.

    In fact, tempdb and log drives should be RAID10; otherwise performance will not be best.

    You very likely need more data files on tempdb.

    Make sure all of tempdb's data files are exactly the same size in KB. You can see the current file sizes like so:

    EXEC tempdb.sys.sp_helpfile

    ALL of the tempdb data files should have the same size.

    If possible, the write cache % on log drives should be much higher than the read cache % (some drives allow you to specify this, some don't), unlike on data drives.

    Since SQL must wait on log writes to be confirmed, and tempdb is now the center of SQL's universe, performance issues on either of those will slow down your server disproportionately more.

    You may need another data drive set(s), but they don't typically slow down your system that much unless they are really bad. SQL always defers and "batches" data writes. Still, that many dbs on just two drives does seem extreme, so I would keep an eye on it after working on the tempdb and logs.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply