October 23, 2017 at 3:44 pm
I have two servers configured to have the same hardware specifications. Server A is hosted on vsphere 5.5 and Server B is hosted on vsphere 6.0. Both are running the same version annd edition of SQL Server 2014, and each instance is configured exactly the same. They have identical copies of a database. I do not have details on the configuration of the underlying storage, but all data files are hosted on a D drive, and all log files are hosted on an E drive for both servers. TempDB is configured identically for both. Prior to this test, all indexes and stats were rebuilt and I also ran freeproccache against each server. The test executes the query "EXEC sp_executesql N'SELECT name, SSN, DOB FROM Person WHERE LastName LIKE @p1', @p1=[random values]" where [random values] is a string containing 3 alpha characters and a %.
On server A, the response time for 5000 executions of this query is 6ms. On server B, the response time is 12ms. Both have a ~500ms NETWORK_IO wait at the end of the test, but no other waits were observed. Disk response time is about the same for both. CPU is sub 10% throughout the duration of the test. Even though 6ms doesn't seem like a big deal, this problem actually scales beyond this basic test. In general, queries take twice as long to execute on server B than they do on server A.
I'm of the opinion at this point that there has to be some difference with either the virtualization config or the way the disk subsystem is configured, but seeing as how I have no visibility to those, it's hard to make that case. I just know that I can't see anything to explain the difference.
Anyone have any ideas?
October 25, 2017 at 12:46 pm
Execution plans the same? Row counts in dbs that same? I'd make sure this is the case first.
October 25, 2017 at 12:52 pm
Yes. We literally took a backup of the starting database from server A and restored it on server B. Execution plans are identical.
October 25, 2017 at 1:34 pm
Are all the latencies in terms of disk values that SQL sees, and the CPU usage differing? I'm thinking that you need to show that there is something different that each VM sees. From the perspective of the client, obviously there is a difference. Is that not enough to show your virtualization people?
October 25, 2017 at 1:38 pm
Steve Jones - SSC Editor - Wednesday, October 25, 2017 1:34 PMAre all the latencies in terms of disk values that SQL sees, and the CPU usage differing? I'm thinking that you need to show that there is something different that each VM sees. From the perspective of the client, obviously there is a difference. Is that not enough to show your virtualization people?
Yeah, disk response times are roughly the same, give or take 1 or 2 ms on the mean. CPU was meandering between 6-10% utilization throughout the test on both instances. I'm trying to exhaust all possible options before going back to the infrastructure team, as they are a particularly sensitive and defensive bunch.
October 26, 2017 at 7:59 am
If you see the same data, same execution plans, you've updated stats (implying same exec plan) and latency as well as CPU is the same, there must be something underlying the issues. I think you've worked the stack well, top down. Time to dig into the part underlying the guest OS.
October 26, 2017 at 9:29 am
Steve Jones - SSC Editor - Thursday, October 26, 2017 7:59 AMIf you see the same data, same execution plans, you've updated stats (implying same exec plan) and latency as well as CPU is the same, there must be something underlying the issues. I think you've worked the stack well, top down. Time to dig into the part underlying the guest OS.
Thanks, Steve. Much appreciated.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply