Weird Performance problem. Fast on standalone laptop, slow on super-duper server.

  • Hi All,

    We have a 3rd party application which collects data and shovels it via a web server into a SQL 2008 R2 database (O/S is Windows Server 2008 R2 - 64bit).

    It runs on a high spec HP physical box 32Gb RAM, 4 Quad Core CPU and SAN attached disk.

    The performance is dreadful loading the data and also running reports against the data.

    Even just switching between different screens on the web based application takes forever.

    Database is currently about 100Gb and continually growing.

    All the stored procs are encrypted so I can't actually see what's going on.

    DMVs would indicate that possibly the 3rd party code is inefficient or could benefit from better indexing BUT.... When we gave the 3rd party a copy of our database they got it flying.... On a laptop....

    I thought maybe SAN latency could be an issue so I ran SQLIO and baselined the SAN disks and got the 3rd party to do the same with their laptop. The SAN and laptop disk latency was very similar but the SAN throughput was about 10 times faster so it doesn't look like disk I/O is the issue.

    If the vendor can get the app to fly when they restore our database onto a laptop but it's slow on our far higher spec server it then I feel it must be something external to SQL to do with our Production environment.

    The network guys claim there are no issues (but then again they always do) and i've checked we don't run anti-virus s/w against database files, etc...

    The 3rd party aren't coming up with any solutions and i'm running out of ideas.

    Anyone got any suggestions where else to check?

  • I am thinking initially it could be blocking but it truly could be any number of things. When they run it on the laptop I assume it's just the one user using the application. When it is run on the server-setup are there multiple users on the system?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • What's the network connection/switches between all this hardware? Is the DB server connected to the SAN via a fiber-channel or regular ethernet? Also does the same performance issues arise late at night or is it typically worse say from 6am- 6pm?

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • Blocking isn't (usually) a problem. It does occur occassionally but only when one particular proc runs and doesn't correlate to the issues we see.

    Disks are connected via fibre channel.

    We also have a test server (physical with locally attached 8 disk RAID10 array) with a copy of the Prod database and performance is just as rubbish on this and doesn't have the continual data loads occurring.

    So the fact we have rubbish performance on 2 high spec servers makes me think this is not SQL causing the bottleneck (as it runs very well on the laptop).

    Performance issues are bad any time of the day.

    On the laptop test it was only one user. But it was also one user on the test system and was still rubbish.

  • Have a look at sys.dm_os_waiting_tasks while the application is running slow on your server. What is the most common wait_type for processes that are executing application threads?

    John

  • Stueyd (4/9/2013)


    Anyone got any suggestions where else to check?

    Was performance miserable since day one or has performance deteriorated over time?

    Long shot... have you attempted to restore the offending database on the same server, with a different name then test performance in the newly created copy?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • The 3 biggest are:

    wait_type wait_time_spct running_pct

    CXPACKET 9290353.1738.61 38.61

    PAGEIOLATCH_SH 4262907.4717.71 56.32

    LCK_M_IX 2542404.0410.57 66.89

    Which I take to mean as we either need to reduce parallelism or the code or indexing is not optimal, I/O could be bottleneck and waiting for an exclusive lock.

    But again, is this relevant when we can reproduce the behaviour on the test system (which isn't running the continual loads)

  • Performance from day 1 was poor and has just got worse.

  • Could be a problem with parallelism, especially given that it occurs on your high-spec servers but not on the lowly laptop. What sort of wait stats do you get on the latter?

    John

  • What is the RAM configuration for the production instance? The default setting could lead the SQL using too much RAM, which can trigger page file use. In Performance monitor, what is the Page Life expentancy?

  • Also check thet "lock pages in memory" has been set at the OS level, if it isn't you could be constantly swapping it all out between RAM and the disk. I would also check to see what the virtual memory has been set to on the server as well

    ______________________________________________________________________________Never argue with an idiot; Theyll drag you down to their level and beat you with experience

  • dan-572483 (4/9/2013)


    What is the RAM configuration for the production instance? The default setting could lead the SQL using too much RAM, which can trigger page file use. In Performance monitor, what is the Page Life expentancy?

    I had the same thought. RAM is 32Gb and I had configured SQL to use 28Gb. As a test I dropped it to 16Gb which had no effect.

    And of course the laptop didn't have any RAM restrictions and ran ok.

  • Long shot but what's the value for Max Degree of Parallelism set to? if it's still set to the default of 0 I would go with the MS recommendation of half the numbe4r of cores and change it to 8.

    ---------------------------------------------------------

    It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
    David Edwards - Media lens[/url]

    Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
    Howard Zinn

  • MyDoggieJessie (4/9/2013)


    Also check thet "lock pages in memory" has been set at the OS level, if it isn't you could be constantly swapping it all out between RAM and the disk. I would also check to see what the virtual memory has been set to on the server as well

    Oooh. Now there's a thought.... We've recently changed our build and this is not set automatically anymore. Just checked and sure enough it's not set. I'll give this a go on the test box.

  • Abu Dina (4/9/2013)


    Long shot but what's the value for Max Degree of Parallelism set to? if it's still set to the default of 0 I would go with the MS recommendation of half the numbe4r of cores and change it to 8.

    Cheers. I'll give this a go on our test server and report back.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply