March 10, 2005 at 9:47 am
Below I have two snipplets of code. I'm looking for suggestions on which code snipplet within a stored procedure might be more efficient.
SELECT [table1].LastName, [table1].FirstName, [table1].EmpNo AS EmpID FROM table1 WHERE EmpTermDate IS NULL UNION SELECT [table2].LastName, [table2].FirstName, [table2].SocSecNo AS EmpID FROM table2 LEFT JOIN [table1] ON [table2].SocSecNo = [table1].EmpNo WHERE [table1].EmpNo IS NULL OR ([table1].EmpNo IS NOT NULL AND [table1].EmpTermDate IS NOT NULL) ORDER BY LastName ASC, FirstName ASC, EmpID ASC
SELECT LastName,FirstName,EmpNo AS EmpID
FROM table1
WHERE EmpTermDate IS NULL
UNION
SELECT LastName, FirstName, SocSecNo AS EmpID
FROM table2
WHERE SocSecNo NOT IN (SELECT EmpNo FROM table1 WHERE EmpTermDate IS NULL)
ORDER BY LastName ASC,FirstName ASC, EmpID ASC
Table one includes everyone on our payroll including terminated employees. Table two is our health insurance table which includes retirees. The emp number will match especially with the retirees. The problem also comes into play when you have retirees come back to work on a contractual basis. Somedays I wish they would just go golfing.
Hopefully my explanation above helps in understanding the data I’m trying to select. Both of the selects above work where we include all the active folks on our payroll (table1), plus all the active health insurance participants (table2). Both code snipplets also weed out the duplicate Retirees and only list them once.
My question: Is one section of code using a stored procedure more efficient than the other? I’ve tested both selects in my VB.NET project and the second code snipplet is about 50 milliseconds faster. My record count is 1578. Even if the second code snipplet is faster, is it more efficient to use the join? Nobody will really notice 50 milliseconds, so I want to go with the code that will take the least amount of resources from SQL. I used Query analyzer to perform an Estimated Execution Plan and didn’t notice any real differences. Of course I’m new to this, and not sure what I’m looking at.
Of course you need to understand that my background is COBOL using Image databases, and SQL is new to me. The answer might be who cares which is more efficient, because with SQL performance is not a problem...
Any suggestions or feedback is appreciated!!
March 10, 2005 at 9:52 am
The LEFT JOIN will be more efficiant than the NOT IN.
However, if the only purpose of the LEFT JOIN is to check for a NULL to determine a non-existent record, then NOT EXISTS is a better construct, if only for the sake of making the code's intentions clearer when either yourself (or someone else) has to maintain it months from now:
SELECT LastName,FirstName,EmpNo AS EmpID
FROM table1
WHERE EmpTermDate IS NULL
UNION
SELECT LastName, FirstName, SocSecNo AS EmpID
FROM table2
WHERE NOT EXISTS
SELECT *
FROM Table1
WHERE Table2.SocSecNo = Table1.EmpNo
AND table1.EmpTermDate IS NULL
)
ORDER BY LastName ASC,FirstName ASC, EmpID ASC
March 10, 2005 at 10:12 am
There are many ways to write this code, you just have to try them and weigh what occurrs in the execution plans to decide if any changes such as indexes need to occur.
Try this (note not 100% sure this is the expected results but should be minor to fix).
SELECT
IsNull([table1].LastName, [table2].LastName) AS LastName,
IsNull([table1].FirstName, [table2].FirstName) AS FirstName,
IsNull([table1].EmpNo, [table2].SocSecNo) AS EmpID
FROM
dbo.table1 table1
FULL OUTER JOIN
dbo.table2 table2
ON
[table2].SocSecNo = [table1].EmpNo AND
[table1].EmpTermDate IS NULL
March 10, 2005 at 10:26 am
I have two questions for you though:
1. Do you have an index (unique) on table1.EmpNo and on table2.SocSecNo ?
2. How many rows on each table independently ?
* Noel
March 10, 2005 at 11:09 am
I have a question for you. It looks like you are using a left join and the IS NULL to avoid duplicates. However, you then have:
OR ([table1].EmpNo IS NOT NULL
AND
[table1].EmpTermDate IS NOT NULL)
Are you expecting this condition to produce dupes? If not, you could change the UNION to UNION ALL to avoid the subsequent sort and search for dupes.
March 10, 2005 at 11:31 am
In answer to Antares686:
I've tried the code, but instead of the 1578 rows that I was hoping for I received 4755 in the result set. This code basically gave me a union of the two sets minus the 9 duplicate entries that I know we have. I'm still trying to figure out your code to see if I can tweak it and change the results.
In answer to noeld:
We do not have these two tables indexed. EmpNo is the Primary key for table 1 and SocSecNo is the Primary key for table 2. Table 1 has 4158 records. Around 1400 of those records are active employees with EmpTermDate blank. Table 2 has 606 records. Earlier I mentioned that Table 2 was our health insurance records. Actually it is not related to the active employees. This set contains COBRA participants, Retirees and two agencies that get Health insurance from us, but are not on our payroll.
In answer to rhunt:
I am expecting 9 duplicates. The duplicates are retired employees who are actively getting benefits from our health insurance system, but also working on a contractual basis and are active on our payroll.
Thanks for all the inquiries... I hope this info helps. I'm going to look into the code suggestion first, and also into indexing to help performance. Any further help is appreciated...
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply