Timeouts in SQL Server 2008 r2 Clustered Environment

  • Hi Team,

    Please take 1 or 2 mins to read out the below. Sorry for long explanation!.

    > We migrated to SQL Server 2008 r2 clustered environment, we are getting timeouts on one of the Reports which calls an SP.

    > The same report works fine on test Servers, which has the same specs as Prod DB.

    > Also, both the comparison were made at night when noone else was using the system and we switched the pointing to test DB,

    report was working fine. So definitely some issue in Prod DB.

    > Now when we compare the CPU's, RAM etc, both environments are same.

    > When I execute the SP on prod DB (Query analyser) it gives results in 1.25 min's (same as test) , but fails when called from

    .net 2.0 c# windows app button click.

    My Questions:

    1.Are there any Cluster level timeout settings which can cause such issues?. We are yet to dig into cluster info ,

    we dont have access to these as there is separate team to take care of these things.

    2. What Kind of settings needs to be checked as both envrmnts look same but there should be some difference as issue happens

    on 1 envrmnt only?. Any particular node settings , DNS settings etc....

    Forgive me for the long theory without much table level info , but I am looking into more of Cluster level info as I am not much aware of Clustered envrmnts.

    So please provide any inputs that you have experienced.

  • Timeout is a client setting, not a server setting. It's nothing to do with cluster settings.

    Timeout is the client saying it's waited too long for a database query to execute and aborting the command. For a .net app it's 30 seconds by default.

    You need to focus on why the procedure takes longer on production than on your test servers. Look at the execution plans, the answers will be in those.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the inputs.

    I tried increasing the timeout to a 1 hour , but still get the same issue.

    If the Query plans are being generated differently on 2 environments. then isn't it related to the DB settings, h/w etc.. ?

  • Anymore views anyone....

  • Since I don't know anything about your environment Im going to ask several questions

    1. Is the cluster heartbeat on a separate nic/vlan?

    2. Is there blocking occurring on prod?

    3. check the query estimates between test and prod are they the same? if no, investigate the differences...

    4. Is your test environment working with the latest prod build?

    5. Is your load very heavy during the timeouts? If so check connection pooling its default is 100 and sometimes this needs to be increased (this is a client side/connection string setting)

    That's all I can think of for now....

  • GonnaCatchIT (4/15/2016)


    If the Query plans are being generated differently on 2 environments. then isn't it related to the DB settings, h/w etc.. ?

    Maybe, maybe not. Could be stats, data volumes (you haven't said if they're the same), hardware, or any one of a pile of other things.

    You need to focus on why the procedure takes longer on production than on your test servers. Look at the execution plans, the answers will be in those. There's not going to be a silver bullet, single setting that fixes things.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GonnaCatchIT (4/17/2016)


    Anymore views anyone....

    Is this a cluster on virtual machines?

    Smendle (4/17/2016)


    1. Is the cluster heartbeat on a separate nic/vlan?

    Not really relevant in Windows 2008 R2 clusters, the heartbeat connection will bind itself to an available network via the virtual cluster network adapter

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • I am not even sure jumping to clustering is the right move until we have eliminated other DB-level issues. Have you compared execution plans between test and prod? Is it possible that those are different? Statistics and fragmentation? Other DB and SQL settings (again, not clustering)? And then the hardware -- are you sure that there are no issues with the disks? Clustering has very little to do with how SQL works in terms of query optimization (i.e. standalone and clusters should have same execution plans all things being equal) that it would be on the bottom of my list for things to check in your scenario.

  • GonnaCatchIT (4/15/2016)


    Thanks for the inputs.

    I tried increasing the timeout to a 1 hour , but still get the same issue.

    If the Query plans are being generated differently on 2 environments. then isn't it related to the DB settings, h/w etc.. ?

    also note there are two timeouts to consider in your .Net 2.0 app: ConnectionTimeout for the SQLConnection and CommandTimeout for the SQLCommand; make sure you modify both, so that they don't quit early, since you stated the query itself takes a minute and a half..

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks all. I was able to somewhat replicate this. This is happening only during night after 1 am , and at this time the DB backup Job executes for about 20-30 mins.

    So this might be some IO resource issue during this time. But all other screens work fine, this particular report times out.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply