February 29, 2012 at 4:58 am
When running a query purely from cache one server is at least 50% slower than another with identical data.
We have 2 virtual SQL servers (hyper-v) from different hosts/backend san/storage etc.
Both SQL server are 2008R2 SP1 64-bit
Both have a single table DB with 3GB of data in it.
A select statement of just Select fielda,fieldb from tableA causes a clustered index scan (field a is the clustered index)
The logicial reads are the same but the duration on one server is between 50% and 100% slower, from 9 seconds to between 13 and 17 seconds to return the table)
Both servers have more than 3GB data cache, the faster server has 6GB and the slower has 30GB.
Before I go back to the hardware team, I need to prove that this slowness is not SQL but infrastrcture.
I would expect that a simple statement, no sum or order when run a few times in quick succession which proves it is in cache would take the same time unless the RAM is slower or slower CPUs or an o/s problem.
What are the best queries/perf mon counters to help prove that SQL is not at fault here.
Thanks
March 1, 2012 at 11:00 am
Use SQL Server profiler; if the reads, the writes, the CPU, and the results are the same, but the duration is different, that's #1.
Check for internal fragmentation, check your statistics, etc.
Check buffer contents with one of the many scripts available (perhaps Pinal Dave's[/url], or spend more than the 5 seconds I spent looking for a better one)
Check for external (filesystem level) fragmentation.
Though honestly, I'd probably start by asking the Hyper-V team what the specs on the hardware each has is, and what the minimum levels allocated for RAM, CPU, and so on are.
March 1, 2012 at 11:12 am
sotn (2/29/2012)
...The logicial reads are the same but the duration on one server is between 50% and 100% slower, from 9 seconds to between 13 and 17 seconds to return the table...
Return table to where? To a client?
It could be differences in your network...
March 2, 2012 at 2:31 am
The results are just in the SSMS query window and I am RDP'd directly onto the server, this is to remove the main application as the cause of the performance issue, as this is our first foray into hyper-v for production servers.
The cpu counts are higher on the slower server, both servers have quad core amd opteron but the slower one is 6174 2199Mhz and the faster server is 8378 2399 Mhz.
The slower servers have 16GB RAM and the database is only 3GB in total.
I ran an update table X with fullscan before running the simple query 5 times in immediate succession whilst there is no other activity to get a good baseline.
The query returns 652,527 rows, and I run this other query immediately before and after the data query
Select num_of_reads, num_of_bytes_read,
num_of_writes, num_of_bytes_written
from sys.dm_io_virtual_file_stats(db_id('DBname'),NULL)
None of these values change, which I understand to mean that all the data returned was from cache, so no I/O was performed.
set statistics io on also reported (the same) logical reads but no physical reads at all
March 2, 2012 at 8:54 am
sotn (3/2/2012)
The results are just in the SSMS query window and I am RDP'd directly onto the server, this is to remove the main application as the cause of the performance issue, as this is our first foray into hyper-v for production servers.The cpu counts are higher on the slower server, both servers have quad core amd opteron but the slower one is 6174 2199Mhz and the faster server is 8378 2399 Mhz.
The slower servers have 16GB RAM and the database is only 3GB in total.
I ran an update table X with fullscan before running the simple query 5 times in immediate succession whilst there is no other activity to get a good baseline.
The query returns 652,527 rows, and I run this other query immediately before and after the data query
Select num_of_reads, num_of_bytes_read,
num_of_writes, num_of_bytes_written
from sys.dm_io_virtual_file_stats(db_id('DBname'),NULL)
None of these values change, which I understand to mean that all the data returned was from cache, so no I/O was performed.
set statistics io on also reported (the same) logical reads but no physical reads at all
First off, the fact that your host hardware platforms are entire generations different would call into question the assumption that performance should be similar in the first place.
That's a good technique to show that SQL Server isn't pulling data from disk for that particular database; however, SSMS very likely is writing the results to a temp file on local (virtual) disk, which could account for at least some of the speed difference, and tempdb may be involved as well for internal sorting or other operations.
You can try writing results to a #temp table on each server as well, or write a query that forces a full table scan or ten and returns almost no results.
Alternately, use SQLIO, and see where the results from each guest SQL Server leads you.
As I said before, I'd start by getting the Hyper-V team involved, checking on the configuration on both VM's, checking for filesystem fragmentation at the host level, watching the overall host load monitors, checking on whether the host hardware configurations are the same (maybe one is set to mirror RAM internally, and the other is not), etc. etc.
Question: Are the guest SQL Server at identical patch and hotfix revisions? Are the guest OS's at identical revisions and patch levels? The host OS's?
Question: Can you even up the guest resources for testing purposes? RAM min, max, and shares, CPU min, max, and shares, IO ... and so on and so forth.
Question: Can you use Hyper-V's equivalent of V-motion (or a backup/restore/remove network access, etc.) to swap physical platforms and try it, then swap back and try it again?
March 2, 2012 at 9:14 am
Thanks for the feedback.
I have to prove/test that SQL is not the issue before I can go to the Hyper-V team, as the application on the server feels sluggish at times, so not touching any db including tempdb (i also captured the vrtiual file stats for tempdb) just to isolate the ram is one of my approaches to prove if SQL itself is at fault or not, so I guess first question would be, is my understanding that my approach for an individual VM is correct so in essence, there is nothing I can do to make SQL faster, it must be a Hyper-V issue. The slow VMs take 24 seconds for the 1.8 million records from cache, the faster VM takes 16 and a new phsyical server in a different branch takes just 11 seconds, so I need to go to the Hyper-V team to start them testing all your suggestions.
Yes I agree it is not exactly like for like, but the business are reluctant to migrate from the physical servers to the new hyper-v unless they can see better performance (the current physical servers are old) so we are trying to prove that and of course the first place of where the problem is, is with SQL, so I am having to check/prove if SQL is or is not the culprit.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply