February 18, 2013 at 12:22 am
Hi All
I need help with a performance issue we are currently having. We have a data warehouse that processes and downloads data from different sources into our presentation database for end users. The problem is the execution time of an automated SQL job. The execution times differ quite a bit between runs: a quick run takes around 4 hours and a bad one around 7 hours. What we have found is that after a reboot of the server the job times improve, but over a period of about two weeks the job reverts to poor performance.
1. We are running SQL Server 2008 R2 Enterprise.
2. It's on a VM (and please, let's not go there).
3. We are running fast disks (RAID 10).
4. The server has 64GB of RAM, and the database in question is 5TB.
5. Execution plans, statistics, indexes and locks are not the problem.
6. All other jobs run in acceptable times; it's just this one that's a problem. (It's not the developer's code, because if it were, we would have consistently bad run times.)
7. SQL Server's memory is capped via the memory options in SSMS.
Let me know if you can spot anything.
Thanks guys!!
February 18, 2013 at 12:54 am
I'm working on a project that has a very similar problem, and we found on investigation that a lot of the issues were around suboptimal code, database structure, lack of clustered indexes, and ill-conceived table partitioning (200KB tables partitioned!).
Do you see a lot of Parallelism going on during the load?
Is the database split across filegroups? If so how?
I know you state that the stats and indexes are not the problem, but how often is the index maintenance plan run?
Do you use a lot of stored procedures? It could be that a number of outdated plans are being stored and reused, which I believe are cleared out on a reboot, and that might also explain what you're seeing. You could use the WITH RECOMPILE option on the procedures, and as it's a data warehouse there's not a major impact in doing so.
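As a rough illustration (the procedure and table names below are made up, not from your system), adding the option looks like this:

-- Hypothetical warehouse load procedure; all names are illustrative.
ALTER PROCEDURE dbo.usp_LoadFactSales
WITH RECOMPILE   -- compile a fresh plan on every execution
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.FactSales (SaleDate, CustomerKey, Amount)
    SELECT SaleDate, CustomerKey, Amount
    FROM staging.Sales;
END;
GO
-- Or recompile a single call without altering the procedure:
EXEC dbo.usp_LoadFactSales WITH RECOMPILE;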
It might help to run Perfmon and SQL Profiler traces during a couple of loads to see if there are any obvious culprits, and line up the Profiler trace with the Perfmon counters to see what's happening at critical times (high disk latency, memory pressure, page file swaps, etc.).
Out of curiosity, have you tried running this on a similarly specced physical box to see if you have the same issues?
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 18, 2013 at 12:56 am
Jason-299789 (2/18/2013)
You could use the WITH RECOMPILE option on the procedures, and as it's a data warehouse there's not a major impact in doing so.
How would this not hurt performance? Why generate a new execution plan every time?
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
February 18, 2013 at 1:06 am
Bhuvnesh (2/18/2013)
Jason-299789 (2/18/2013)
You could use the WITH RECOMPILE option on the procedures, and as it's a data warehouse there's not a major impact in doing so.
How would this not hurt performance? Why generate a new execution plan every time?
I wouldn't suggest using WITH RECOMPILE on an OLTP system, but an OLAP system is likely to benefit from generating new plans every time the ETL process is run. You are generally only running each SP once per day during the load cycle, so the RECOMPILE overhead is minimal compared to an OLTP system, which would probably be running the same SP several thousand times an hour (depending on throughput).
It may also be that the WITH RECOMPILE option could be targeted at just the longest-running SPs, so that only they are affected while those that are behaving are left alone.
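If it helps, one rough way to find those candidates is the procedure stats DMV (figures cover only the period since each plan was cached, so run it towards the end of a load cycle); a sketch:

-- Longest-running cached procedures since the last restart / plan eviction.
SELECT TOP (20)
       DB_NAME(ps.database_id)                   AS database_name,
       OBJECT_NAME(ps.object_id, ps.database_id) AS procedure_name,
       ps.execution_count,
       ps.total_elapsed_time / 1000              AS total_elapsed_ms,
       ps.total_elapsed_time / ps.execution_count / 1000 AS avg_elapsed_ms
FROM sys.dm_exec_procedure_stats AS ps
ORDER BY ps.total_elapsed_time DESC;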
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 18, 2013 at 1:13 am
Thanks for the quick reply guys...
@Jason:
- Parallelism: yes! I have noticed up to 90 parallel SPIDs. The machine is a 16-core box, but when we tried playing with the cost threshold and max degree of parallelism it made the times even worse, so I threw that out of the window! (There's a configuration sketch after this list.)
- Split across filegroups: yes! The data files sit on RAID 5 disks and the logs on RAID 10. The data and log files are also pre-grown, at 300GB each.
- Index maintenance is done nightly at around 10pm and either rebuilds or reorganises depending on the fragmentation level.
- With regard to the SPs, am I wrong to say that after a reboot a new execution plan will be generated? (I understand that's not good practice, but it seems to be our only fix at the moment: cold execution plans and cold stats?)
- The problem with running monitoring is that we don't really have a benchmark to work off. The server (disk, CPU, memory, I/O) looks stable; remember, the machine is quite a beast and it handles the load... maybe it's a memory leak or something?
- With regard to physical machines, yes, I am planning to move this server directly onto the physical host it resides on. I used to host 3 VMs there; I migrated them off and have this one standing alone at the moment. But for now, I need to sort out this issue.
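On the parallelism point above, a minimal sketch of the instance-level settings that usually get tuned for this; the values (MAXDOP 8, cost threshold 50) are placeholders to trial in pre-production, not recommendations:

-- Sketch only: instance-level parallelism settings; test in pre-production first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Placeholder: limit each parallel query to 8 of the 16 cores.
EXEC sp_configure 'max degree of parallelism', 8;
-- Placeholder: raise the cost a query must reach before a parallel plan is considered.
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;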
@Bhuvnesh:
That is a long route... getting the developers to rework the code might take a few weeks, a luxury I don't have 🙂
And I can't edit their code myself because of business processes.
February 18, 2013 at 1:36 am
Thanks for the reply,
It's good to see you do a targeted rebuild. I would imagine it's indexes over 30-40% fragmentation that get rebuilt and anything under that reorganised. I'm not sure whether a reorganise updates stats, though.
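For what it's worth, a REORGANIZE does not update statistics (a REBUILD does), so it may be worth adding an explicit stats update after the reorganise step. A minimal sketch, with an illustrative table name:

-- Refresh stats on a table that has only been reorganised (table name is illustrative).
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;
-- Or refresh every out-of-date statistic in the database (sampled):
EXEC sp_updatestats;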
I wouldn't be too concerned about the baseline for the monitoring, as the immediate issue is to see where you have possible problems with long-running processes and try to target them when optimising the code. Besides, baselines really only work if you are getting consistent times, and the times you are seeing vary a lot, possibly due to variation in the data volumes being loaded.
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 18, 2013 at 1:48 am
I agree with you 100%, but the issue isn't the data volumes either.
To rule that out, I requested a row count after every table reload, and the difference between a slow day and a fast one in terms of data loaded is minimal (the tables are truncated daily).
I read somewhere about the OS not releasing free memory to SQL Server, even if SQL Server has been configured to grab it. The problem with that theory is that we were running with 40GB of total memory and increased it to 64GB, and performance did not change...
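(For reference, a quick check I can run to see whether the OS or the VM's balloon driver is squeezing SQL Server's memory; both DMVs are available on 2008 R2:)

-- How much memory the OS says is free, and how SQL Server's own working set looks.
SELECT total_physical_memory_kb / 1024     AS total_mb,
       available_physical_memory_kb / 1024 AS available_mb,
       system_memory_state_desc
FROM sys.dm_os_sys_memory;

SELECT physical_memory_in_use_kb / 1024    AS sql_in_use_mb,
       memory_utilization_percentage,      -- well below 100 suggests the working set is being trimmed
       process_physical_memory_low,
       process_virtual_memory_low
FROM sys.dm_os_process_memory;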
I have also broken down all the stored procedures, views, etc. into their components to see where the bottleneck is...
The problem is that individually they run very quickly; collectively the job is slow.
And with regard to the index rebuilds, you're right on the button: 0-10% fragmentation gets a reorganise, 11% and above gets a rebuild.
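For reference, the fragmentation check behind that split looks roughly like this (a sketch only; the actual maintenance job may well differ):

-- Cheap (LIMITED) fragmentation scan of the current database, using the 10% split above.
SELECT OBJECT_NAME(ips.object_id)          AS table_name,
       i.name                              AS index_name,
       ips.avg_fragmentation_in_percent,
       CASE WHEN ips.avg_fragmentation_in_percent > 10 THEN 'REBUILD'
            WHEN ips.avg_fragmentation_in_percent > 0  THEN 'REORGANIZE'
            ELSE 'OK' END                  AS suggested_action
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.index_id > 0                     -- skip heaps
ORDER BY ips.avg_fragmentation_in_percent DESC;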
February 18, 2013 at 2:08 am
The problem is that there is a tipping point where SQL Server can get itself into a twist; sometimes that can be a difference of only a few hundred thousand rows.
Does your service account have Lock Pages in Memory set? If so, it might be worth reading this: http://blogs.technet.com/b/askperf/archive/2008/03/25/lock-pages-in-memory-do-you-really-need-it.aspx. I've also read that using it on a VM can have performance implications.
I would also consider backing off max server memory to about 90% of the RAM (roughly 58GB), so that SSIS and all the other processes can run in the background without overloading the VM.
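A minimal sketch of that change, assuming the 64GB figure (58000MB is roughly 90%; adjust for whatever headroom the guest actually needs):

-- Cap SQL Server at roughly 58GB, leaving headroom for the OS, SSIS, etc.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 58000;
RECONFIGURE;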
While the processes run quickly individually, it's possible that several running at the same time cause problems as the server runs out of resources.
Do you have any logging for the ETL process, such as individual process/section start and end times? That would be a starting point, with a SQL trace running alongside to look at the specific queries that make up the long-running process.
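If there's nothing in place, even a very small step log is enough to see which step's duration balloons on the slow days. A sketch, with made-up table, step and procedure names:

-- Minimal step-level ETL log (all names are illustrative).
CREATE TABLE dbo.EtlStepLog
(
    LogId     INT IDENTITY(1,1) PRIMARY KEY,
    StepName  sysname  NOT NULL,
    StartTime DATETIME NOT NULL,
    EndTime   DATETIME NULL
);
GO
-- Wrap each step of the job like this:
DECLARE @LogId INT;

INSERT INTO dbo.EtlStepLog (StepName, StartTime)
VALUES ('Load FactSales', GETDATE());
SET @LogId = SCOPE_IDENTITY();

EXEC dbo.usp_LoadFactSales;   -- hypothetical step

UPDATE dbo.EtlStepLog
SET EndTime = GETDATE()
WHERE LogId = @LogId;

Comparing a fast day against a slow day step by step then tells you where the extra hours actually go.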
_________________________________________________________________________
SSC Guide to Posting and Best Practices
February 18, 2013 at 2:27 am
- Thanks for this, I am reading up on the lock pages thing 🙂
- VM performance: yes, a DW should never be run on a VM; the problem is that the business has already bought the technology and we have to use it 🙁
- I have a pre-production environment; I will test the max server memory change there before implementing it in production.
- With regard to multiple threads, I don't see this as a problem, as the I/O does not peak when I run the tests.
- Yes, I have also requested that the individual stored procedures log their run times... Individually there isn't really a problem there either; each stored proc runs in a few minutes. But if that's the case, why do the job times vary so much? The business needs the job to complete in under 5 hours at the most. I restarted the server this past Friday night after a 7-hour run, and the Saturday, Sunday and Monday runs are around 3hr 30min! Also, interestingly, when I restart just SQL Server there is no benefit; when I restart the entire box, everything seems fine.
February 18, 2013 at 2:43 pm
Modest DBA (2/18/2013)
I agree with you 100%, but the issue isn't the data volumes either. To rule that out, I requested a row count after every table reload, and the difference between a slow day and a fast one in terms of data loaded is minimal (the tables are truncated daily).
I read somewhere about the OS not releasing free memory to SQL Server, even if SQL Server has been configured to grab it. The problem with that theory is that we were running with 40GB of total memory and increased it to 64GB, and performance did not change...
I have also broken down all the stored procedures, views, etc. into their components to see where the bottleneck is... The problem is that individually they run very quickly; collectively the job is slow.
And with regard to the index rebuilds, you're right on the button: 0-10% fragmentation gets a reorganise, 11% and above gets a rebuild.
"table are truncated daily".
If the data is truncated and reloaded fully, how could the indexes get fragmented from day to day?
I'm assuming you load to staging and only the very final INSERT is into the actual DW tables, and that it's sorted in clustered key order.
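Something along these lines (purely illustrative names, with the ORDER BY matching the clustered index of the target):

-- Final load from staging into the DW table, sorted by the clustered key so pages fill in order.
-- TABLOCK can also give minimal logging into an empty (truncated) table under simple/bulk-logged recovery.
INSERT INTO dbo.FactSales WITH (TABLOCK)
       (SaleDate, CustomerKey, Amount)
SELECT SaleDate, CustomerKey, Amount
FROM   staging.Sales
ORDER BY SaleDate, CustomerKey;   -- matches the clustered index key of dbo.FactSales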
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
February 18, 2013 at 5:17 pm
In the VM configuration, set the memory reservation to at least half (or better, all 64GB) of the RAM you allocated to that guest.
That will prevent the balloon driver in the guest from flushing out your SQL Server buffer cache.
Set the instance parameter "max server memory" to about 4GB less than the guest RAM. Check in Task Manager that at least 512MB is available at all times while the database, SSIS and the other software are running on that guest. If it isn't, adjust.
Set the instance parameter "min server memory" to at least half of "max server memory".
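A sketch of those two settings with placeholder values (64GB guest, max = RAM minus 4GB, min = half of max); 'show advanced options' needs to be enabled as in the earlier snippets:

-- Placeholder values: 64GB guest, max = 61440MB, min = 30720MB.
EXEC sp_configure 'max server memory (MB)', 61440;
EXEC sp_configure 'min server memory (MB)', 30720;
RECONFIGURE;

-- Verify what the instance is actually using:
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('min server memory (MB)', 'max server memory (MB)');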