Degrading Service Performance Over Time

  • 0I have a SQL server 2008 R2 server that is in a data presentation environment where we load large amounts of data in evening (500Gb worth of new/update data) and then present it via OLAP during the day. The server performance seriously degrades over time. For instance, our first load package that runs takes about 1.5 hours when we reboot the evening before. The next night it takes a bit over 3 hours for that first package and then it continues at this level for the suceeding days. All the packages experience a similar increase in time from the first time after reboot as do the mostly small queries that run during the day for reporting drill to detail.

    The server has four hexa-core CPU and 120Gb of memory and is attached to a high performace SAN. Right now when this server slows down we are seeing performance time worse then the tiny server we moved it from (2 duel core, 32-bit hardware, 16 Gb Ram, document storage grade SAN). Exact same database (backup and restore to move) and exact same load SSIS packages.

    Max memory is set to 85 Gb because we plan to run powerpivot on this server and are reserving some space for it. The server grows to that 85 Gb pretty fast and stays there.

    I can't find anything that accounts for it. Looking at the performance counters I have not been able to see a bottleneck. During the day is particularly confusing since the server is more or less idle. We run OLAP on another server so once the cubes are processed the only thing going on is an occasional drill to detail taht hits the SQL tables. These are slower also. Pages/Sec is ussually 0, buffer cache hit is ussually 100.

    My DB setup is controlled by my IT department. I have little faith in the setup since they have a standard config they use regardless of the underlying hardware. I already had to force them to change the parameter for max degree of parallelism to 8 on direction of Microsoft because the server basically would not run when it was at 0.

  • Do you have an opportunity to watch (or record) activity from sp_who2 (or Adam Machanic's sp_whoisactive, great demo here)? Check the wait types.

    It's an interesting problem given the fact that the reboot makes all the difference. So what do reboots influence? Well, buffer pages of course, since memory will be cleared. Another possibility would be log writes I suppose. Other processes running on the server that degrade over time? Not in terms of memory (since you said SQL takes its 85 gig allocation without an issue), but perhaps other processes hitting the disk more and more the longer they are up? I recently saw an issue with an infrastructure tool doing WMI logging. Similarly network, something local or even remote that creates more traffic the longer its host, or a client, is up?

  • Have you noticed/tested to see if tempdb is growing?

    What is your tempdb configuration? #files , initial size, storage etc...

    Have you noticed/tested to see if parallelism affects performance - i.e. parallel queries that write to disk perform much worse than non-parallel ones?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • What about chached query plans? These are cleared on startup, could the cached plan no longer be effective? Have you tried doing a DBCC FREEPROCCACHE and seeing how things perform?

    Also is it possible with every load the statistics are going out and not being updated, and fragmentation could be an issue so are you doing reindexing. If you are using Enterprise you can do this online, but watch the size of TempDB.

    Is the data being loaded in the same order as the clustered indexes?

    What is the size of your growth factor on the database files?

    You don't mention which edition you are using, I assume you are using Enterprise, but just in case be aware Standard has a number of limitations on RAM (64GB) and CPU.

    See: http://www.microsoft.com/sqlserver/2008/en/us/editions.aspx

    A whole lot of questions that may point you to an issue.

    Cheers

    Leo

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • Leo.Miller (6/16/2011)


    What about chached query plans? These are cleared on startup, could the cached plan no longer be effective? Have you tried doing a DBCC FREEPROCCACHE and seeing how things perform?

    Also is it possible with every load the statistics are going out and not being updated, and fragmentation could be an issue so are you doing reindexing. If you are using Enterprise you can do this online, but watch the size of TempDB.

    Is the data being loaded in the same order as the clustered indexes?

    What is the size of your growth factor on the database files?

    You don't mention which edition you are using, I assume you are using Enterprise, but just in case be aware Standard has a number of limitations on RAM (64GB) and CPU.

    See: http://www.microsoft.com/sqlserver/2008/en/us/editions.aspx

    A whole lot of questions that may point you to an issue.

    Cheers

    Leo

    That's where I was going to go on this question. I'd normally add my schpeel about page splits on the clustered index and extent splits on non-clustered indexes but you've even got that covered.

    The only other thing I might add is that the growth rate for both the target DB and especially TempDB should be set to something other than a percentage as the default would have it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My mind first jumped to indexing but I don't think it makes sense. Why would a reboot fix it? There's no indication in the OP that a truncate-reload is happening if the server reboots, so if this was the issue I would expect there to be a more linear dropoff over time, not a big initial drop followed by a plateau.

    TempDB is a good idea for sure.

  • There's blocking or resource contention or something. I'd start working off of wait states. If you can see the processes as they're running, what are they waiting on? If you can't, what's your most common waits? Classic performance tuning, gather the metrics, determine where the bottlenecks are, fix 'em. It's something. It's not magic.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • allmhuran (6/17/2011)


    My mind first jumped to indexing but I don't think it makes sense. Why would a reboot fix it?

    Because a reboot clears proc cache. It's like telling every sproc to recompile which will build (possibly) better plans for the larger data content of the tables based on the current content of the indexes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I suppose so, I guess it depends on how long the process has been running for. If you've already got, say, 365 days worth of data, I wouldn't think a couple of reboots over a couple of weeks, with a few percent increase in row count prior to re-planning, would make much difference. But if data is periodically archived off or truncated, sure.

  • Viewing 9 posts - 1 through 8 (of 8 total)

    You must be logged in to reply to this topic. Login to reply