March 3, 2005 at 2:05 pm
Would someone please help me with the proper tsql syntax to perform the following:
I need to list all the account_number values in table1 that have the inactive_flag set to 1 and exist in table2 and/or table3 and/or table4.
Basically, list the account_number if it exists at least once in any one of the three other tables.
I only want to list the account_number in table1 once (in account_number order), even though it may be in one or more of the other tables.
I do NOT want to list the account_number from table1 if it does not exist in all three of the other tables.
Any help will be appreciated.
Howard
March 3, 2005 at 2:13 pm
SELECT account_number
FROM table1
WHERE inactive_flag = 1
AND (
EXISTS ( SELECT table2.account_number FROM table2 WHERE table2.account_number = table1.account_number )
OR EXISTS ( SELECT table3.account_number FROM table3 WHERE table3.account_number = table1.account_number )
OR EXISTS ( SELECT table4.account_number FROM table4 WHERE table4.account_number = table1.account_number )
)
I think that will do the trick. It should at least be close. Another variation would be as follows:
SELECT account_number
FROM table1
WHERE inactive_flag = 1
AND EXISTS ( SELECT table2.account_number FROM table2 WHERE table2.account_number = table1.account_number
UNION SELECT table3.account_number FROM table3 WHERE table3.account_number = table1.account_number
UNION SELECT table4.account_number FROM table4 WHERE table4.account_number = table1.account_number )
March 3, 2005 at 2:26 pm
Both appear to work just fine. Thanks for your help.
Do you have any idea as to which one would be more efficient when running against large tables?
Howard
March 3, 2005 at 2:30 pm
something like this should work if your indexes are clean
select account_number
from table1
Where InactiveFlag=1
AND (
Account_number IN ( Select Distinct AccountNumber From table2 )
OR Account_number IN ( Select Distinct AccountNumber From table3 )
OR Account_number IN ( Select Distinct AccountNumber From table4 )
)
March 3, 2005 at 2:34 pm
Then again, here is article that might shed some light on speed and efficiency.
http://archives.postgresql.org/pgsql-performance/2002-12/msg00185.php
seems that exists is better on larger sub-selects, where as in can be better on smaller sub-selects
March 3, 2005 at 2:36 pm
I'm going to say the former based on the following. If it only exists in table2 SQL Server should know enough in the first method to not even bother looking at the other two EXISTS clauses due to them using an OR constraint. Similary, if it is not in table 2 but is in table3 it wouldn't have to check table4. I'm not going to swear that SQL Server will actually evaluate them in the order you list them, but I would hope that it would know enough not to always evaluate every one. I've never done any analysis (or read anything to my knowledge) to prove that ORed conditions can be ignored by SQL Server once one has been satisfied.
The second method requires all three subselects to be evaluated before doing the condition check. What's more, they would need to be merged into a single result which although very fast is technically another step.
March 4, 2005 at 7:50 am
In the statement below from Aaron you should use UNION ALL, not UNION. A UNION on its own ALWAYS forces a sort to remove duplicates regardless of if duplicates actually exist. A UNION ALL does not do any sorting.
However, Aaron's statement using OR EXISTS is possibly your best solution. The version with UNION will always force all 3 tables to be scanned, but the OR EXISTS gives SQL a chance to stop scanning once it has found a match. Even if your current SQL fix level scans all 3 tables with OR EXISTS, a new fix level may avoid it.
SELECT account_number
FROM table1
WHERE inactive_flag = 1
AND EXISTS ( SELECT table2.account_number FROM table2 WHERE table2.account_number = table1.account_number
UNION SELECT table3.account_number FROM table3 WHERE table3.account_number = table1.account_number
UNION SELECT table4.account_number FROM table4 WHERE table4.account_number = table1.account_number )
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
March 4, 2005 at 7:54 am
Well, I wasn't expecting SQL to ignore the extra OR's, but I think it has. I ran the following statement against a test table I already had laying around with >800,000 records in it:
SELECT * FROM TEST
WHERE ID IN (SELECT ID FROM TEST)
OR ID IN (SELECT ID FROM TEST)
OR ID IN (SELECT ID FROM TEST)
The query with the extra "OR"s only took 4 extra seconds. The execution plan shows records coming from TEST and the first subquery of test. The second and third subquery show zero records contributing to the query.
March 4, 2005 at 8:40 am
I couldn't help but do some more research on this. Using 4 different tables I put together various queries. SQL Server is actually more aggressive on its optimization than I guessed. In all cases (EXISTS vs IN and OR vs UNION ALL) SQL Server always quit searching as soon as it found a single row. Not just a result from a single table, but a single row from the first table with any matching rows (1 or more of them). Since the tables I was using were relatively large (100,000 to 500,000 rows) I did see that using IN was slower than using EXISTS. I did not see a performance difference between OR and UNION ALL. Finally, SQL Server uses its own best estimate as to which subselect to perform first when ORing them together and so it is not necessarily the first one you list. However, when doing a UNION ALL it will check the tables in the order you list them. As a result, the UNION ALL is usually less efficient than ORing them together unless you happened to list them in the correct order.
March 4, 2005 at 10:18 am
March 4, 2005 at 5:08 pm
you know, I have seen quite a few optimization programs that all prefer to use the COALESCE over any IN, EXIST or UNION queries. I always thought that there must be a reason for that commonality between these programs.
March 8, 2005 at 9:33 am
Thanks to everyone for the excellent responses to my question.
Howard
March 8, 2005 at 9:47 am
For some reason the following example does not work. It gives me a higher count than all of the other examples.
SELECT * FROM TEST
WHERE ID IN (SELECT ID FROM TEST)
OR ID IN (SELECT ID FROM TEST)
OR ID IN (SELECT ID FROM TEST)
Does anyone know why?
Howard
March 8, 2005 at 9:54 am
For those that are following this thread in response to this suggestion:
I received the following error:
(135450 row(s) affected)
Server: Msg 248, Level 16, State 1, Line 1
The conversion of the varchar value '010115101001' overflowed an int column. Maximum integer value exceeded.
The account_number column is char 12 if this helps.
Howard
March 8, 2005 at 11:10 am
Since it is a varchar field and not int, you will need to set your fallback value in the coalesce to a varchar value.
coalesce(b.accountNumber, c.accountNumber, d.accountNumber,'No Value') <> 'No Value'
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply