How can I improve this? Any suggestions?

  • I'm trying to write a script that returns rows in one table that are not in another table. One table is supplier data plstat and some example for one supplier for the year 2008 is:

    [pls_supplier] [pls_year] [pls_period]

    A1S001 2008 1

    A1S001 2008 2

    A1S001 2008 3

    A1S001 2008 4

    A1S001 2008 5

    A1S001 2008 6

    A1S001 2008 7

    A1S001 2008 8

    A1S001 2008 9

    A1S001 2008 10

    A1S001 2008 11

    A1S001 2008 12

    My other table is a table variable called @pltransactions and has data like this in for the same supplier for 2008:

    [supplier] [year] [month] [type] [purchases] [others] [cash]

    A1S001 20083c178.400031.2200.0000

    A1S001 20087c.0000 .0000 -209.6200

    The best code I have come up with so far is:

    SELECT DISTINCT pls_supplier, pls_year, pls_period

    FROM plstat LEFT OUTER JOIN @pltransactions

    ON [supplier] = pls_supplier AND [year] = pls_year AND [month] = pls_period

    WHERE [purchases] IS NULL AND [others] IS NULL AND [cash] IS NULL

    ORDER BY pls_supplier, pls_year, pls_period

    In other words, I want to know which periods in plstat are not found in the table variable for the supplier. The above code gives me the right answer which is:

    pls_supplier pls_year pls_period

    A1S001 2008 1

    A1S001 2008 2

    A1S001 2008 4

    A1S001 2008 5

    A1S001 2008 6

    A1S001 2008 8

    A1S001 2008 9

    A1S001 2008 10

    A1S001 2008 11

    A1S001 2008 12

    However! On the data I have, that code takes nearly 100 seconds to run, which is far too long for a production report. The full data is around 21,000 rows in plstat and 16,000 rows in the table variable. plstat has one row per year/period per supplier, but the table variable can have several rows per year/period per supplier.

    It's not an enormous amount of data, so it must be my poor coding skills that's holding it back. Any great ideas out there for improving this?

    Regards,

    Mark

  • Is there any way you could replace the table variable with a temp table? The table variable is no good for so much data. It would not be able to use statistics and this can result in bad performance.

    Also what (if any) indexes do you have on your table?

  • Have a look at the actual execution plan, this will show you where the performance bottlenecks are. Post them here if you need help analysing them.

    Also you should consider using a temp table or perm table so that you can add indexes.

  • steveb. (7/7/2010)


    Have a look at the actual execution plan, this will show you where the performance bottlenecks are. Post them here if you need help analysing them.

    Also you should consider using a temp table or perm table so that you can add indexes.

    Hi Rihojal and Steveb,

    I was hoping to use a table variable because a) I was a bit temp table-happy when I first started with SQL and realised later that it's best to avoid them; and b) the data in the table variable has quite a few duplicates.

    I don't have an index on the table variable as even with an index that includes every field in the table, it's still not unique.

    I'll do as you suggest and look at the execution plan and also compare the speed with a temp table.

    Cheers for your help.

    Regards,

    Mark

  • Why did you think it was best to avoid temp tables and use table variables instead?

  • mmcardle (7/7/2010)


    I was hoping to use a table variable because a) I was a bit temp table-happy when I first started with SQL and realised later that it's best to avoid them; and b) the data in the table variable has quite a few duplicates.

    You might want to check out this article on Comparing Table Variables to Temporary Tables[/url]

    Just a point to make: many of the gurus here only use temp tables - mainly because of the statistics issue on table variables causing inadequate execution plans that are avoided by the use of temporary tables.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • If your supplier data is unique the you don't need distinct when looking for records that don't match since there should only be one of each to begin with. And if your transaction data does not allow nulls then you only need to look for one field to be null instead of all 3. Putting the transaction data in a temp table with an index on the supplier code,year and month would help with the query but may be a trade off for the time taken to build the index on a temp table in the first place. Also all tables , temp or not, should have a unique key on them.


  • mrpolecat (7/7/2010)


    If your supplier data is unique the you don't need distinct when looking for records that don't match since there should only be one of each to begin with. And if your transaction data does not allow nulls then you only need to look for one field to be null instead of all 3. Putting the transaction data in a temp table with an index on the supplier code,year and month would help with the query but may be a trade off for the time taken to build the index on a temp table in the first place. Also all tables , temp or not, should have a unique key on them.

    Many thanks for your and everyone's input above. I've decided to go down the temp table route. I find table variables useful, but I can see now that they are best suited to small amounts of data. This is the biggest table variable I've ever used; most of them have far fewer than 100 rows in them, so that is ideal.

    A lot of the books, training courses, and online articles I have read have said that temp tables should be a last resort. I'd been using them for years on Informix SQL. I guess there are times when it could be deemed to be lazy coding to use temp tables instead of thinking through a requirement, but temp tables do have their uses.

    Thanks again for all your help and for helping me continue my journey of learning.......

  • Just reporting back......

    OK, the temp table route runs in less than 2 seconds......an acceptable speed given the amount of data and processing involved (of which I only showed you a snippet of because it was the main time-lag). Brilliant!

    I used a temp table with an index on the relevant columns for the query links and adding an identity column for uniqueness.

    It works great and I have learned my lesson about table variables Vs. temp tables. Both have their uses depending on circumstances.

    Thanks for all your help.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply