Performance? Efficient code??

  • Below I have two snipplets of code.  I'm looking for suggestions on which code snipplet within a stored procedure might be more efficient.

    SELECT [table1].LastName, [table1].FirstName, [table1].EmpNo AS EmpID
    FROM table1
    WHERE EmpTermDate IS NULL 
    UNION
    SELECT [table2].LastName, [table2].FirstName, [table2].SocSecNo AS EmpID
    FROM table2
    LEFT JOIN [table1] ON [table2].SocSecNo = [table1].EmpNo
    WHERE [table1].EmpNo IS NULL
    OR 
    ([table1].EmpNo IS NOT NULL
     AND 
    [table1].EmpTermDate IS NOT NULL)
    ORDER BY LastName ASC, FirstName ASC, EmpID ASC
    

    SELECT LastName,FirstName,EmpNo AS EmpID

    FROM table1

    WHERE EmpTermDate IS NULL

    UNION

    SELECT LastName, FirstName, SocSecNo AS EmpID

    FROM table2

    WHERE SocSecNo NOT IN (SELECT EmpNo FROM table1 WHERE EmpTermDate IS NULL)

    ORDER BY LastName ASC,FirstName ASC, EmpID ASC

    Table one includes everyone on our payroll including terminated employees. Table two is our health insurance table which includes retirees. The emp number will match especially with the retirees. The problem also comes into play when you have retirees come back to work on a contractual basis. Somedays I wish they would just go golfing.

    Hopefully my explanation above helps in understanding the data I’m trying to select. Both of the selects above work where we include all the active folks on our payroll (table1), plus all the active health insurance participants (table2). Both code snipplets also weed out the duplicate Retirees and only list them once.

    My question: Is one section of code using a stored procedure more efficient than the other? I’ve tested both selects in my VB.NET project and the second code snipplet is about 50 milliseconds faster. My record count is 1578. Even if the second code snipplet is faster, is it more efficient to use the join? Nobody will really notice 50 milliseconds, so I want to go with the code that will take the least amount of resources from SQL. I used Query analyzer to perform an Estimated Execution Plan and didn’t notice any real differences. Of course I’m new to this, and not sure what I’m looking at.

    Of course you need to understand that my background is COBOL using Image databases, and SQL is new to me.  The answer might be who cares which is more efficient, because with SQL performance is not a problem...

    Any suggestions or feedback is appreciated!! 

     

    Lost in the Vast Sea of .NET

  • The LEFT JOIN will be more efficiant than the NOT IN.

    However, if the only purpose of the LEFT JOIN is to check for a NULL to determine a non-existent record, then NOT EXISTS is a better construct, if only for the sake of making the code's intentions clearer when either yourself (or someone else) has to maintain it months from now:

     

    SELECT LastName,FirstName,EmpNo AS EmpID

    FROM table1

    WHERE EmpTermDate IS NULL

    UNION

    SELECT LastName, FirstName, SocSecNo AS EmpID

    FROM table2

    WHERE NOT EXISTS

      SELECT *

      FROM Table1

      WHERE Table2.SocSecNo = Table1.EmpNo

      AND table1.EmpTermDate IS NULL

    )

    ORDER BY LastName ASC,FirstName ASC, EmpID ASC

     

  • There are many ways to write this code, you just have to try them and weigh what occurrs in the execution plans to decide if any changes such as indexes need to occur.

    Try this (note not 100% sure this is the expected results but should be minor to fix).

    SELECT

     IsNull([table1].LastName, [table2].LastName) AS LastName,

     IsNull([table1].FirstName, [table2].FirstName) AS FirstName,

     IsNull([table1].EmpNo, [table2].SocSecNo) AS EmpID

    FROM

     dbo.table1 table1

    FULL OUTER JOIN

     dbo.table2 table2

    ON

     [table2].SocSecNo = [table1].EmpNo AND

     [table1].EmpTermDate IS NULL

  • I have two questions for you though:

    1. Do you have an index (unique) on table1.EmpNo and on table2.SocSecNo ?

    2. How many rows on each table independently ?

     


    * Noel

  • I have a question for you.  It looks like you are using a left join and the IS NULL to avoid duplicates.  However, you then have:

    OR ([table1].EmpNo IS NOT NULL

    AND

    [table1].EmpTermDate IS NOT NULL)

    Are you expecting this condition to produce dupes?  If not, you could change the UNION to UNION ALL to avoid the subsequent sort and search for dupes.

     

  • In answer to Antares686:

    I've tried the code, but instead of the 1578 rows that I was hoping for I received 4755 in the result set.  This code basically gave me a union of the two sets minus the 9 duplicate entries that I know we have.  I'm still trying to figure out your code to see if I can tweak it and change the results.

    In answer to noeld:

    We do not have these two tables indexed.  EmpNo is the Primary key for table 1 and SocSecNo is the Primary key for table 2.  Table 1 has 4158 records.  Around 1400 of those records are active employees with EmpTermDate blank.  Table 2 has 606 records.  Earlier I mentioned that Table 2 was our health insurance records.  Actually it is not related to the active employees.  This set contains COBRA participants, Retirees and two agencies that get Health insurance from us, but are not on our payroll.

    In answer to rhunt:

    I am expecting 9 duplicates.  The duplicates are retired employees who are actively getting benefits from our health insurance system, but also working on a contractual basis and are active on our payroll.

    Thanks for all the inquiries...  I hope this info helps.  I'm going to look into the code suggestion first, and also into indexing to help performance.  Any further help is appreciated...


    Lost in the Vast Sea of .NET

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply