April 14, 2016 at 9:31 pm
Hi Team,
Please take 1 or 2 mins to read out the below. Sorry for long explanation!.
> We migrated to SQL Server 2008 r2 clustered environment, we are getting timeouts on one of the Reports which calls an SP.
> The same report works fine on test Servers, which has the same specs as Prod DB.
> Also, both the comparison were made at night when noone else was using the system and we switched the pointing to test DB,
report was working fine. So definitely some issue in Prod DB.
> Now when we compare the CPU's, RAM etc, both environments are same.
> When I execute the SP on prod DB (Query analyser) it gives results in 1.25 min's (same as test) , but fails when called from
.net 2.0 c# windows app button click.
My Questions:
1.Are there any Cluster level timeout settings which can cause such issues?. We are yet to dig into cluster info ,
we dont have access to these as there is separate team to take care of these things.
2. What Kind of settings needs to be checked as both envrmnts look same but there should be some difference as issue happens
on 1 envrmnt only?. Any particular node settings , DNS settings etc....
Forgive me for the long theory without much table level info , but I am looking into more of Cluster level info as I am not much aware of Clustered envrmnts.
So please provide any inputs that you have experienced.
April 15, 2016 at 2:15 am
Timeout is a client setting, not a server setting. It's nothing to do with cluster settings.
Timeout is the client saying it's waited too long for a database query to execute and aborting the command. For a .net app it's 30 seconds by default.
You need to focus on why the procedure takes longer on production than on your test servers. Look at the execution plans, the answers will be in those.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 15, 2016 at 4:48 am
Thanks for the inputs.
I tried increasing the timeout to a 1 hour , but still get the same issue.
If the Query plans are being generated differently on 2 environments. then isn't it related to the DB settings, h/w etc.. ?
April 17, 2016 at 12:10 pm
Anymore views anyone....
April 17, 2016 at 4:51 pm
Since I don't know anything about your environment Im going to ask several questions
1. Is the cluster heartbeat on a separate nic/vlan?
2. Is there blocking occurring on prod?
3. check the query estimates between test and prod are they the same? if no, investigate the differences...
4. Is your test environment working with the latest prod build?
5. Is your load very heavy during the timeouts? If so check connection pooling its default is 100 and sometimes this needs to be increased (this is a client side/connection string setting)
That's all I can think of for now....
April 18, 2016 at 2:52 am
GonnaCatchIT (4/15/2016)
If the Query plans are being generated differently on 2 environments. then isn't it related to the DB settings, h/w etc.. ?
Maybe, maybe not. Could be stats, data volumes (you haven't said if they're the same), hardware, or any one of a pile of other things.
You need to focus on why the procedure takes longer on production than on your test servers. Look at the execution plans, the answers will be in those. There's not going to be a silver bullet, single setting that fixes things.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 18, 2016 at 4:44 am
GonnaCatchIT (4/17/2016)
Anymore views anyone....
Is this a cluster on virtual machines?
Smendle (4/17/2016)
1. Is the cluster heartbeat on a separate nic/vlan?
Not really relevant in Windows 2008 R2 clusters, the heartbeat connection will bind itself to an available network via the virtual cluster network adapter
-----------------------------------------------------------------------------------------------------------
"Ya can't make an omelette without breaking just a few eggs" 😉
April 18, 2016 at 10:22 am
I am not even sure jumping to clustering is the right move until we have eliminated other DB-level issues. Have you compared execution plans between test and prod? Is it possible that those are different? Statistics and fragmentation? Other DB and SQL settings (again, not clustering)? And then the hardware -- are you sure that there are no issues with the disks? Clustering has very little to do with how SQL works in terms of query optimization (i.e. standalone and clusters should have same execution plans all things being equal) that it would be on the bottom of my list for things to check in your scenario.
April 18, 2016 at 10:31 am
GonnaCatchIT (4/15/2016)
Thanks for the inputs.I tried increasing the timeout to a 1 hour , but still get the same issue.
If the Query plans are being generated differently on 2 environments. then isn't it related to the DB settings, h/w etc.. ?
also note there are two timeouts to consider in your .Net 2.0 app: ConnectionTimeout for the SQLConnection and CommandTimeout for the SQLCommand; make sure you modify both, so that they don't quit early, since you stated the query itself takes a minute and a half..
Lowell
April 18, 2016 at 10:20 pm
Thanks all. I was able to somewhat replicate this. This is happening only during night after 1 am , and at this time the DB backup Job executes for about 20-30 mins.
So this might be some IO resource issue during this time. But all other screens work fine, this particular report times out.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply