Perplexing Massive Performance Differences - Bug?

  • I'm very confused. I hope someone here can shed some light on what might be happening.

    There is a job which calls a DTS package, which in turn calls a single stored proc (I know, not the best design, but that's what the developer wanted). Anyway, this proc has been taking several hours each night to complete (as much as 5 hours one night). However, when the proc is run manually through Query Analyzer, it completes in only a minute. I know it's not time-dependent, because the same performance hit happened when we started the job in the middle of the day as well. And what's really puzzling: the job is now running in a minute.

    A little more detail. This morning, I kicked off a full profile on the main server and on the linked server it references. I started the job and, 7 minutes later, I cancelled it, so it was gearing up to take forever again. I then started another profile, ran the sproc in QA, and saw that it took one minute. I kicked off another profile, started it as a job, and this time it only took a minute! I was kind of floored at this point. Now, every time I subsequently run the job, it only takes a minute.

    Here's what I've found from the profiles. After an initial TRUNCATE TABLE, there is an insert statement which draws on local tables and on views which reference a linked server. It spends almost no time on the linked server, so I can disregard that as a factor. The statement always inserts over 274k rows, regardless of time of day. The first profile showed a pattern of 4 events repeated for the entire 7 minutes: SP:ExecContextHit, SP:Starting (for the insert statement), SP:Completed (with a duration of 0), and Transaction Log - InsertRows(Clustered). There were over 54k sets of these. The subsequent profiles, though, show only a single SP:Starting/SP:Completed pair followed by one huge chunk of Transaction Log - InsertRows(Clustered) events. In each run, there is a single transaction ID for the entire batch.

    So in the first instance it looks like it writes to the transaction log after every record, while subsequent runs insert them all in a single batch. It's as if there were a new transaction for every record (but again, there is only a single transaction ID). And note that I get both behaviors from running the same job in the same way, so the job definition, the DTS definition, and the execution method are not factors.

    In case you're interested, here's the piece of code which is at play:

    TRUNCATE TABLE dbo.tblCorporateDeliveryDispatch

    INSERT dbo.tblCorporateDeliveryDispatch
    SELECT dbo.vwCorporateDelivery.BranchAN, dbo.vwCorporateDelivery.PreSellAn, dbo.vwCorporateDelivery.CustomerNr,
        dbo.vwCorporateDelivery.DeliveryDt, dbo.vwCorporateDelivery.ProductAn, NorandLoad.ProductAn AS NorProductAn,
        dbo.vwCorporateDelivery.OrderQt, dbo.vwCorporateDelivery.DeliveryQt, NorandLoad.SalesBranchAn,
        NorandLoad.CasesQt AS DispatchQt, NorandLoad.RouteAn, 0 AS Hotshots
    FROM dbo.vwCorporateDelivery (NOLOCK)
        FULL OUTER JOIN dbo.vwNorandLoadByProduct NorandLoad (NOLOCK)
            ON dbo.vwCorporateDelivery.BranchAN = NorandLoad.SalesBranchAn
            AND dbo.vwCorporateDelivery.CustomerNr = NorandLoad.CustomerNr
            AND dbo.vwCorporateDelivery.DeliveryDt = NorandLoad.DeliveryDt
            AND dbo.vwCorporateDelivery.ProductAn = NorandLoad.ProductAn
    WHERE dbo.vwCorporateDelivery.ProductAn <> '000000'
        AND dbo.vwCorporateDelivery.DeliveryQt > -1

    The table has 12 nullable columns, a single default (which isn't used because of the "0 as Hotshots"), no triggers, and no indexes (and, of course, no primary key).

    I wonder if it's not something to do with indexes or the system cache. There is no autoshrinking or reindexing going on. Maybe it's a SQL bug. I'm going to try changing the insert statement to use explicit columns and see what it does tonight.
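
    For reference, here's roughly what the explicit-column version would look like. The destination column names are just my guess, taken from the aliases in the SELECT list; I still need to check them against the actual table definition.

    -- Same statement, but with an explicit column list on the INSERT and a
    -- table alias for readability. Destination column names are assumed.
    INSERT dbo.tblCorporateDeliveryDispatch
        (BranchAN, PreSellAn, CustomerNr, DeliveryDt, ProductAn, NorProductAn,
         OrderQt, DeliveryQt, SalesBranchAn, DispatchQt, RouteAn, Hotshots)
    SELECT CD.BranchAN, CD.PreSellAn, CD.CustomerNr, CD.DeliveryDt,
        CD.ProductAn, NorandLoad.ProductAn, CD.OrderQt, CD.DeliveryQt,
        NorandLoad.SalesBranchAn, NorandLoad.CasesQt, NorandLoad.RouteAn, 0
    FROM dbo.vwCorporateDelivery CD (NOLOCK)
        FULL OUTER JOIN dbo.vwNorandLoadByProduct NorandLoad (NOLOCK)
            ON CD.BranchAN = NorandLoad.SalesBranchAn
            AND CD.CustomerNr = NorandLoad.CustomerNr
            AND CD.DeliveryDt = NorandLoad.DeliveryDt
            AND CD.ProductAn = NorandLoad.ProductAn
    WHERE CD.ProductAn <> '000000'
        AND CD.DeliveryQt > -1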

    Any thoughts? Thanks!

    What happens if you run the DTS package from DTS Designer?

    Does the stored proc have any params that could have different values?

    Can you change the job so it calls the proc directly? If calling the proc is all the DTS package does, then there's no point in it. ('The developer wants it that way' is never an acceptable reason for doing something weird.) And if you do call it directly, how does it run?
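
    If you do test calling it directly, the job step is just a plain T-SQL step along these lines (the job, database and proc names here are made up; substitute your own):

    -- Add a T-SQL job step that runs the proc directly, bypassing DTS.
    -- All names below are placeholders.
    EXEC msdb.dbo.sp_add_jobstep
        @job_name      = N'Nightly Corporate Delivery Load',
        @step_name     = N'Run load proc directly',
        @subsystem     = N'TSQL',
        @database_name = N'YourDatabase',
        @command       = N'EXEC dbo.usp_LoadCorporateDeliveryDispatch'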

    What is vwCorporateDelivery? A view? If so, what are the base tables like (indexes, columns)?

    Do you have any form of IO bottleneck? A delay in writing to the tran log is indicative of inadequate IO throughput (personal experience; my DB is showing that right now).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the reply!

    Running it from the DTS designer or calling the proc directly from a job doesn't make a difference. They both behave the same. But I'm going to put pressure on the developer to change his method. Unfortunately, I don't have direct power to make him change, but he's real cool with listening to suggestions.

    No, there are no params being used.

    vwCorporateDelivery is a view that simply selects from a table via a linked server. The trace showed, though, that there is very little impact on the linked server and almost no time spent there. Even though it's a huge table, the WHERE clause is effective.
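
    In case it helps, the view is essentially just a pass-through over a four-part linked server name, something like this (server, database and table names changed):

    -- Rough shape of vwCorporateDelivery; the actual linked server, database
    -- and table names are different.
    CREATE VIEW dbo.vwCorporateDelivery
    AS
    SELECT BranchAN, PreSellAn, CustomerNr, DeliveryDt, ProductAn, OrderQt, DeliveryQt
    FROM LINKEDSRV.RemoteDB.dbo.tblCorporateDelivery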

    I suspect it could be some kind of IO bottleneck. The trick, though, is catching it. Ever since it ran quickly yesterday, it's been going great. Last night's run only took a minute and a half. The only thing I changed was I added explicit column names to the INSERT INTO clause.

    One thing I discovered yesterday was that things were running fine until last Thursday or Friday. I don't know what changed, though; a version of the proc from a couple of days before showed no differences. On Friday, I turned off transactions in the DTS package to test whether it was a DTC issue, and it ran quickly, but then it started dying again at night. Besides, nothing was changed between then and yesterday, when I saw it start to run quickly again. So I don't think it's anything in the code or user objects.

    I very much suspect it's something to do with SQL Server dynamically changing its caches and/or buffers. I have no explanation other than that it may have re-optimized itself based on the workload from my running it a couple of times yesterday. It could have expanded a certain pool and thus been able to batch the inserts into one big write instead of writing them individually. I'll get some stats on its buffers now and compare them against the server's values if the proc starts to run poorly again.
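
    Here's the kind of snapshot I have in mind, pulled from master.dbo.sysperfinfo while the job is behaving, so I have a baseline to compare against later (the counter names are the ones I expect to find; I'll adjust if they differ on this build):

    -- Snapshot buffer/memory counters now, while the job is fast, to compare
    -- against a later snapshot if it slows down again.
    SELECT GETDATE() AS SampleTime, object_name, counter_name, cntr_value
    FROM master.dbo.sysperfinfo
    WHERE counter_name IN ('Buffer cache hit ratio', 'Buffer cache hit ratio base',
                           'Page life expectancy', 'Lazy writes/sec',
                           'Checkpoint pages/sec', 'Total Server Memory (KB)')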

  • OK, here are a few suggestions.

    Run profiler overnight and filter so that you'll only catch this sp.
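
    If you'd rather not leave the Profiler GUI running all night, a server-side trace scoped to just that proc would look something like this (the output path and proc name are placeholders; 42/43 are SP:Starting/SP:Completed, 34 is the ObjectName column):

    -- Server-side trace: SP:Starting (42) and SP:Completed (43), filtered on
    -- ObjectName (column 34) so only the one proc is captured.
    DECLARE @TraceID int, @maxfilesize bigint, @on bit
    SET @maxfilesize = 100
    SET @on = 1

    -- 2 = file rollover; path is a placeholder, .trc is appended automatically
    EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\Traces\NightlyLoad', @maxfilesize, NULL

    -- Columns: 1 = TextData, 13 = Duration, 14 = StartTime, 34 = ObjectName
    EXEC sp_trace_setevent @TraceID, 42, 1, @on
    EXEC sp_trace_setevent @TraceID, 42, 14, @on
    EXEC sp_trace_setevent @TraceID, 42, 34, @on
    EXEC sp_trace_setevent @TraceID, 43, 1, @on
    EXEC sp_trace_setevent @TraceID, 43, 13, @on
    EXEC sp_trace_setevent @TraceID, 43, 14, @on
    EXEC sp_trace_setevent @TraceID, 43, 34, @on

    -- Filter: ObjectName (34) LIKE the proc name (0 = AND, 6 = LIKE)
    EXEC sp_trace_setfilter @TraceID, 34, 0, 6, N'YourProcName'

    -- Start the trace (set status 0 to stop, 2 to close and delete the definition)
    EXEC sp_trace_setstatus @TraceID, 1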

    Recompile the stored procedure if you haven't already (sp_recompile), or mark it for recompile every time it runs (CREATE PROCEDURE <proc name> WITH RECOMPILE).
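
    For example (the proc name is a placeholder):

    -- Flag the proc so a fresh plan is compiled the next time it runs
    EXEC sp_recompile N'dbo.usp_LoadCorporateDeliveryDispatch'

    -- Or recreate it to recompile on every execution:
    -- CREATE PROCEDURE dbo.usp_LoadCorporateDeliveryDispatch
    -- WITH RECOMPILE
    -- AS
    --     ...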

    Check the statistics and index fragmentation (may not affect anything, but it's good to be sure).
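
    Assuming SQL 2000, that's along these lines (the table name is a placeholder; since the target table has no indexes, this mainly applies to the base tables behind the views):

    -- Index fragmentation for a base table
    DBCC SHOWCONTIG ('dbo.tblNorandLoad') WITH ALL_INDEXES, TABLERESULTS

    -- Refresh statistics on the base table, or on the whole database
    UPDATE STATISTICS dbo.tblNorandLoad WITH FULLSCAN
    EXEC sp_updatestats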

    Run Performance Monitor and capture the following under PhysicalDisk: Disk Reads/sec, Avg. Disk sec/Read and Avg. Disk Queue Length. Your sec/Read should ideally be around 15 ms; if it exceeds 100 ms you have problems. Also capture Page Faults/sec (under Memory) and Page Splits/sec (under SQLServer:Access Methods).

    Set the counter log to sample once a minute or once every 2 minutes. You'll end up with roughly a 500 KB csv for 24 hours. That should give you a good idea of whether there are any bottlenecks with disk or memory.

    HTH

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
