April 15, 2015 at 7:14 am
I have a strange case at hand: a database which uses cursors for data access /* I know, this is bad, but I can't change that fact as it is 3rd party... */.
The brief change history of this application: it ran on a VMware-based Windows 2008 R2 / SQL Server 2008 R2 platform until a little over a year ago. There were no general resource issues on that server, nor anything standing out regarding wait stats or blocking.
As performance complaints regarding this application escalated, this particular database was moved to a physical box (2 * Intel Xeon E5-2690, 32 GB memory and 4 SAS drives in RAID 1+0), again with the 2008 R2 combo... for a database of just 18 GB!!! This was of course a temporary solution, but application performance was considered acceptable by the users. Memory utilization peaked at 20 GB and in general most of the processors were flat-lining, with 2 to 4 of them working between 10 and 40% (out of 32 hyperthreaded cores).
Fast forward a year: the physical box had to go, and with the new virtualization platform (SimpliVity OmniCube CN-3000) in place, the database was migrated to a new Windows 2012 R2 / SQL Server 2012 environment on VMware. The target server has 4 cores allocated and 32 GB of memory. Again from day 1, users have had déjà vu with this application's performance. And while, once more, resource usage does not seem problematic and there are no significant wait stats or blocking, the perception by the users is back to what it was before the application's database went physical.
Without hard metrics from the past and with no clear indications from SQL / Windows where the problem lives, my fear is that there is significant overhead introduced with sp_cursorexecute on VMware.
My questions:
Does anyone have similar experiences?
Are there any metrics that could confirm or refute my fear?
Does VMware have settings that could change this behavior?
Any help or thoughts are welcome, even "me too" (so we do not feel lonely). The only answer that does not help is "get rid of the cursors"... I know cursors are evil and this particular database design sucks, but it is 3rd party.
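To put some numbers behind that fear, this is roughly what I have been running to inventory the live API cursors on the instance (a minimal sketch; the column selection is just what I happen to look at):

-- List currently allocated cursors across all sessions (0 = all sessions)
-- and resolve the statement behind each one.
SELECT c.session_id,
       c.properties,          -- shows 'API | ...' for API cursors
       c.creation_time,
       c.is_open,
       t.text AS cursor_statement
FROM sys.dm_exec_cursors(0) AS c
OUTER APPLY sys.dm_exec_sql_text(c.sql_handle) AS t
ORDER BY c.creation_time;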
April 15, 2015 at 7:36 am
When you upgraded from 2008 R2 to 2012, did you do any maintenance? Stats and indexes need to be rebuilt, as the way the engine handles those things is very different between versions. I don't think the VM is what is causing the issue; we all know it is the cursors. That being said, I don't think cursors are any worse on a VM than on a stand-alone box.
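To be clear on what I mean by refreshing the stats after the upgrade, something along these lines (a minimal sketch; the database and table names are placeholders):

USE YourAppDb;   -- placeholder name for the migrated database
GO
-- Resample every statistic after the version change; the 2012 optimizer
-- can pick different plans from the same (possibly stale) statistics.
EXEC sys.sp_updatestats;
GO
-- Or, for a critical table, force a full-scan refresh (table name is illustrative):
UPDATE STATISTICS dbo.PlanningHeader WITH FULLSCAN;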
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 15, 2015 at 7:51 am
Thanks for the feedback.
Regarding the index and statistics maintenance: that is done on a weekly basis using Ola Hallengren's solution. But even with a freshly rebuilt set of indexes and up-to-date statistics (Monday morning), performance is bad and, to the best of my knowledge, does not worsen throughout the week. And whilst the database design isn't optimal, it is supported by additional indexes where applicable.
April 15, 2015 at 10:33 am
I came across one person who had substantial problems with row-by-row processing on SQL 2012. I am severely jet-lagged at the moment, so I can't remember the details, and could in fact be hallucinating. :w00t: I seem to recall that issue was fixed by a CU.
However, what I would investigate is differential wait stats and especially file IO stalls while the server is under load. Also check the entire server stack for config/driver/firmware issues and bottlenecks (the network is especially important here).
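For the differential wait stats, the pattern is as simple as snapshot / wait / snapshot / diff, something like this (the 5-minute delay is illustrative; use whatever window you care about):

IF OBJECT_ID('tempdb..#w1') IS NOT NULL DROP TABLE #w1;

-- First snapshot of the cumulative wait counters.
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
INTO #w1
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:05:00';   -- measurement window

-- Diff against the second snapshot: only the waits accumulated in the window.
SELECT w2.wait_type,
       w2.wait_time_ms        - w1.wait_time_ms        AS wait_ms_delta,
       w2.signal_wait_time_ms - w1.signal_wait_time_ms AS signal_ms_delta,
       w2.waiting_tasks_count - w1.waiting_tasks_count AS waits_delta
FROM sys.dm_os_wait_stats AS w2
JOIN #w1 AS w1 ON w1.wait_type = w2.wait_type
WHERE w2.wait_time_ms - w1.wait_time_ms > 0
ORDER BY wait_ms_delta DESC;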
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on Google's mail service
April 15, 2015 at 11:41 am
Everything Kevin said.
Plus, if you know the queries that are giving you problems, can you look at the execution plans and, if possible, compare them to the plans from before the upgrade? Also, I didn't notice: did you change the compatibility level after the upgrade? That could affect things.
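Checking (and, if needed, changing) the compatibility level is a one-liner each way; the database name below is a placeholder:

-- Compatibility level as it stands after the upgrade.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = N'YourAppDb';

-- If it is still 100 (the 2008 R2 level), 110 switches on the 2012 behaviour:
-- ALTER DATABASE YourAppDb SET COMPATIBILITY_LEVEL = 110;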
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 15, 2015 at 3:21 pm
Oh, this is going to be such a great learning experience...
@kevin, hopefully your jet lag will quickly fade... I very much care about those details. The current SQL 2012 is at SP2, if that matters in aiding your memory.
The unfortunate part is that the physical server is no longer available (it was a leased machine), so there goes comparing wait stats or file IO.
As for the details of the current situation, I've attached the differential filestats for tempdb (DbId = 2) and the database in question (DbId = 5); in both cases the TLog is file_id 2 and tempdb has 4 data files. The numbers were generated over 5-minute intervals during working hours (and a bit before/after) over the course of two and a half days.
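For reference, the collection behind those numbers is nothing fancy; roughly the following, run every 5 minutes (the table name is illustrative, the DbIds match the attachment):

-- One-time setup: a table to hold the cumulative counters per capture.
IF OBJECT_ID('dbo.FileStatsSnapshot') IS NULL
    CREATE TABLE dbo.FileStatsSnapshot
    (
        capture_time         datetime2(0) NOT NULL,
        database_id          int          NOT NULL,
        file_id              int          NOT NULL,
        num_of_reads         bigint       NOT NULL,
        num_of_bytes_read    bigint       NOT NULL,
        io_stall_read_ms     bigint       NOT NULL,
        num_of_writes        bigint       NOT NULL,
        num_of_bytes_written bigint       NOT NULL,
        io_stall_write_ms    bigint       NOT NULL
    );

-- Scheduled every 5 minutes; the deltas between rows give the differential filestats.
INSERT dbo.FileStatsSnapshot
SELECT SYSDATETIME(), database_id, file_id,
       num_of_reads, num_of_bytes_read, io_stall_read_ms,
       num_of_writes, num_of_bytes_written, io_stall_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL)
WHERE database_id IN (2, 5);   -- tempdb and the application database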
The database (DbId 5) is at compatibility level 110, runs with Read Committed Snapshot Isolation (a vendor recommendation/mandatory setting), and has the same collation as tempdb.
Looking at the IO, the read/write ratio is skewed heavily towards writes (in terms of file access)... but when profiling the database workload, about 99% is reads (logical, with next to no physical reads). That is to be expected when there is more memory available than database to claim it.
I do have a similar collection of wait stats, but that is obviously a bigger pile to digest. At first glance, nothing caught my eye there though. Are there any particular wait types to watch when dealing with cursors?
From a user perspective, opening the main planning screens used to take 10 to 15 sec on the physical SQL Server and now takes 45 to 75 sec (I can't vouch for the 10 to 15 sec in the old situation, but the 45 to 75 sec were actually clocked). According to user reports, other actions are significantly slower too.
In the area of config/drivers (firmware does not apply), everything in my visible scope looks okay. But a VMware specialist is going to look under the hood there.
@Grant, as said, the old server is gone, so I can only work with current activity and look back as far as query_stats and cached plans allow me. What I see makes my eyes water, but in this case it is by (database) design, and a lot of FETCH API_CURSOR. So far, there is nothing I can see that more than doubles the wait time at the users' end.
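For completeness, this is roughly how I am mining query_stats for the heaviest statements, cursor-driven or not (a sketch; the TOP is arbitrary):

-- Heaviest cached statements by total elapsed time; the API cursor work
-- shows up here under the statements the cursors actually execute.
SELECT TOP (20)
       qs.execution_count,
       qs.total_elapsed_time  / qs.execution_count AS avg_elapsed_us,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.text,
                 (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                        WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;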
April 16, 2015 at 3:36 am
Stan_Segers (4/15/2015)
1) The average read stall is 2.2 ms and the average write stall is 8.9 ms, which isn't that bad. But there are periods where the writes run 20-40 ms, which isn't very good. In total, your processes waited 4.5 HOURS for IO during this period (a per-file version of that calculation is sketched after this list).
2) The important thing with IO stalls and wait stats is to do differential analysis over a short window. I typically use 3 or 5 minutes when trying to decipher a particular performance issue at a client; then your top waits can be sorted easily. And trying to narrow your waits down to ones specific to cursors is the wrong idea. Whatever the worst wait is, I want to know it so I can fix it, and then iterate until performance is acceptable.
3) I assume you are using sp_whoisactive to do real-time observations for execution metrics?
4) Your app team should obviously be reviewing all external processes for bottlenecks too.
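As a starting point, the per-file stall averages can be pulled straight from the DMV (these numbers are cumulative since instance start, so the differential version over a short window is the one that matters under load):

-- Average read/write stall per file since instance start.
SELECT DB_NAME(vfs.database_id) AS database_name,
       vfs.file_id,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  * 1.0 / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
       vfs.io_stall_write_ms * 1.0 / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
ORDER BY database_name, vfs.file_id;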
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on Google's mail service
April 17, 2015 at 1:52 am
A little update, which seems to underline one of Kevin's first statements: the network is especially important here.
A VMware specialist has reviewed the environment and came up with a few recommendations, mainly in the networking area. So before delving further into our SQL haystack to see if there is a needle in there, these recommendations will be examined and implemented where applicable. I will follow up on whether these changes have made the difference.
The assumption regarding sp_whoisactive was incorrect; I was working with the sys.dm_exec views directly. But the pointer to this gem is well noted. Thanks Kevin!
June 5, 2017 at 2:09 pm
I realize this post is quite old at this point, but I was wondering if a resolution was ever found.
Thanks
DL