What's the fastest way to define a dataset in one SP and then use it in another?

  • I'm working on a system with a whole heap of SPs, many of which do a lot of complicated calculations. Some of these need to do the following sort of thing:

    1. SP1 defines a set of data upon which calculations need to be based.
    2. It passes that set to SP2, which does the calculation.

    My question relates to the seemingly-innocuous "passes that set to SP2". We're currently achieving this by:

    1. SP1 fills up a table X with the dataset and calls SP2.
    2. SP2 does its calculation based on the data in X.
    3. When SP2 finishes, SP1 clears its data out of X.

    This seems to work but we do get blocking / deadlocking when accessing X. And it means we have (yet) another table in our database which we're using to store transient data.
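
    For clarity, that pattern looks roughly like this (table, proc and column names here are purely illustrative, not our real schema):

        CREATE TABLE dbo.X
        (
            SomeKey int            NOT NULL,
            Amount  decimal(18, 4) NOT NULL
        );
        GO

        CREATE PROCEDURE dbo.SP2
        AS
        BEGIN
            -- 2. The calculation runs against whatever is currently in X.
            SELECT SUM(Amount) AS Total
            FROM   dbo.X;
        END;
        GO

        CREATE PROCEDURE dbo.SP1
        AS
        BEGIN
            -- 1. Fill the shared table with this run's dataset, then call SP2.
            INSERT INTO dbo.X (SomeKey, Amount)
            VALUES (1, 10.0), (2, 20.0);

            EXEC dbo.SP2;

            -- 3. Clear this run's data back out of X afterwards.
            DELETE FROM dbo.X;
        END;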

    We have tried:

    1. SP1 creates a temporary table Y, fills it up with the dataset and calls SP2.
    2. SP2 does its calculation based on the data in Y.
    3. When SP2 finishes, SP1 drops table Y.

    This usually turns out to be slower than the "permanent" table approach. (As an aside, it also means you don't get IntelliSense when editing SP2, because temporary table Y isn't defined anywhere in SP2's own code.)
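
    Roughly what the temp-table version looks like (names again illustrative). It works because a temp table created in SP1 is in scope for any proc SP1 calls, which is also exactly why IntelliSense can't see Y from inside SP2:

        CREATE PROCEDURE dbo.SP2
        AS
        BEGIN
            -- #Y is created by the caller (SP1); it's visible here at run time,
            -- but its definition appears nowhere in SP2's own text.
            SELECT SUM(Amount) AS Total
            FROM   #Y;
        END;
        GO

        CREATE PROCEDURE dbo.SP1
        AS
        BEGIN
            CREATE TABLE #Y
            (
                SomeKey int            NOT NULL,
                Amount  decimal(18, 4) NOT NULL
            );

            INSERT INTO #Y (SomeKey, Amount)
            VALUES (1, 10.0), (2, 20.0);

            EXEC dbo.SP2;

            DROP TABLE #Y;
        END;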

    More recently, we've tried user-defined table types:

    1. SP1 declares a local variable, Z, of a certain UDTT, fills it up with the dataset and calls SP2, passing Z as a (read-only) parameter.
    2. SP2 does its calculation based on the data in parameter Z.

    This is great because it makes it much clearer what data SP2 is working on - it's the data in the parameter, not data in some distant table. However, it has led to worse performance, particularly when there are many rows in Z - presumably because SQL Server assumes the table variable contains only 1 row (or perhaps 100), irrespective of how many rows it actually contains.

    We tried adding OPTION (RECOMPILE) to the statements that use parameter Z, but this had mixed results in terms of performance.

    We haven't tried the approach in which SP2 caches the contents of parameter Z in its own temporary table and then works from that, the benefit being that SQL Server will maintain accurate statistics on it.
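
    For reference, the user-defined table type version looks roughly like this (type, proc and column names are illustrative), including both the OPTION (RECOMPILE) hint we experimented with and the untried "cache it in a temp table" variant:

        CREATE TYPE dbo.CalcInput AS TABLE
        (
            SomeKey int            NOT NULL,
            Amount  decimal(18, 4) NOT NULL
        );
        GO

        CREATE PROCEDURE dbo.SP2
            @Z dbo.CalcInput READONLY
        AS
        BEGIN
            -- What we do today: query the parameter directly. OPTION (RECOMPILE)
            -- lets the optimiser see the actual row count instead of a fixed guess.
            SELECT SUM(Amount) AS Total
            FROM   @Z
            OPTION (RECOMPILE);

            -- The variant we haven't tried: copy the parameter into a local temp
            -- table, so SQL Server keeps proper statistics on it for later statements.
            SELECT SomeKey, Amount
            INTO   #Z
            FROM   @Z;

            SELECT SUM(Amount) AS Total
            FROM   #Z;
        END;
        GO

        CREATE PROCEDURE dbo.SP1
        AS
        BEGIN
            DECLARE @Z dbo.CalcInput;

            INSERT INTO @Z (SomeKey, Amount)
            VALUES (1, 10.0), (2, 20.0);

            EXEC dbo.SP2 @Z = @Z;
        END;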

    So my question is: do any of you clever people out there have any suggestions for the optimal approach in terms of performance?

    Thanks.

  • At a total guess, perhaps: rather than having SP1 run and pass the data to SP2, why not have SP2 run SP1 and get the data from it? You could use an INSERT INTO ... EXEC statement; then you can reference the object, and get IntelliSense on it, because SP2 created the table and handled the INSERT, not SP1.
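
    A rough sketch of that, assuming SP1 can simply return its dataset as a result set (all object names here are just for illustration):

        CREATE PROCEDURE dbo.SP1
        AS
        BEGIN
            -- SP1 returns the dataset as a result set rather than writing it anywhere.
            SELECT 1 AS SomeKey, CAST(10.0 AS decimal(18, 4)) AS Amount
            UNION ALL
            SELECT 2, 20.0;
        END;
        GO

        CREATE PROCEDURE dbo.SP2
        AS
        BEGIN
            -- SP2 owns the table definition, so you get IntelliSense on #Work,
            -- and there's no shared permanent table involved.
            CREATE TABLE #Work
            (
                SomeKey int            NOT NULL,
                Amount  decimal(18, 4) NOT NULL
            );

            INSERT INTO #Work (SomeKey, Amount)
            EXEC dbo.SP1;

            SELECT SUM(Amount) AS Total
            FROM   #Work;
        END;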

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Tight on time this morning so I may have overlooked some details when skimming through your post.  That said, here is what I would do:

    • Add GUID and datetime2 columns to table X
    • Move deletes out to an async job which runs nightly or weekly (deletes based on a timestamp)
    • Proc 1 - Create GUID variable, insert into X with GUID and datetime stamp, pass GUID to Proc 2
    • Proc 2 - perform work on X using GUID (to identify the sub-set of data in X)
    • Be sure to index X appropriately

    This strategy should mitigate the blocking and deadlocking problems; a rough sketch is below.
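
    Roughly (all names illustrative; the exact indexing and the retention window for the delete job are up to you):

        CREATE TABLE dbo.X
        (
            BatchId   uniqueidentifier NOT NULL,
            CreatedAt datetime2(3)     NOT NULL,
            SomeKey   int              NOT NULL,
            Amount    decimal(18, 4)   NOT NULL
        );
        GO
        CREATE NONCLUSTERED INDEX IX_X_BatchId ON dbo.X (BatchId);
        GO

        CREATE PROCEDURE dbo.SP2
            @BatchId uniqueidentifier
        AS
        BEGIN
            -- Work only against this batch's slice of X.
            SELECT SUM(Amount) AS Total
            FROM   dbo.X
            WHERE  BatchId = @BatchId;
        END;
        GO

        CREATE PROCEDURE dbo.SP1
        AS
        BEGIN
            DECLARE @BatchId uniqueidentifier = NEWID();

            INSERT INTO dbo.X (BatchId, CreatedAt, SomeKey, Amount)
            VALUES (@BatchId, SYSUTCDATETIME(), 1, 10.0),
                   (@BatchId, SYSUTCDATETIME(), 2, 20.0);

            EXEC dbo.SP2 @BatchId = @BatchId;

            -- No delete here: the nightly/weekly job clears out old rows instead, e.g.
            -- DELETE dbo.X WHERE CreatedAt < DATEADD(DAY, -7, SYSUTCDATETIME());
        END;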

  • How complex is SP1 - could it be converted to an inline table-valued function? Since all this does is generate the data to be used by SP2, it might be easier to use an iTVF instead of a temp (or perm) table (rough sketch below).

    If that isn't possible - or it doesn't improve performance - then I agree with Thom.  Have SP2 call SP1 and load the data to a temp table using INSERT ... EXEC.
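
    A rough sketch of the iTVF idea, if SP1's logic can be expressed as a single query (function and column names are just placeholders):

        CREATE FUNCTION dbo.GetCalcInput ()
        RETURNS TABLE
        AS
        RETURN
        (
            -- Stand-in for whatever query currently builds the dataset in SP1.
            SELECT 1 AS SomeKey, CAST(10.0 AS decimal(18, 4)) AS Amount
            UNION ALL
            SELECT 2, 20.0
        );
        GO

        CREATE PROCEDURE dbo.SP2
        AS
        BEGIN
            -- SP2 reads the iTVF directly; no staging table to fill and clear down.
            SELECT SUM(Amount) AS Total
            FROM   dbo.GetCalcInput();
        END;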

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Apologies - I completely forgot to say that table X has a SPID column that's used to isolate the data between processes. The column has a nonclustered index on it. I think the blocking / deadlocking arises from this particular use; table X may have, for example, 1000 rows for process A and 5 rows for process B. SQL might then decide to give process A a table lock as it 'owns' 99.5% of the rows.

  • Unfortunately, the way we've designed much of the code means that SP1, SP2 ... SPn are all quite complex, so they can't be converted to UDFs - they're not just generating the data for the calculation.

