efficient Querying - NOT EXISTS

Question

efficient Querying - NOT EXISTS

etl2016

Default port

Points: 1459
More actions
May 3, 2016 at 10:03 am

#313703

hi,
I have 1+5=6 tables, each of which have records around 200 million.
I am required to aggregate a KPI from one of those 6 tables, for all those records whose ID does NOT exists in the remaining 5 tables.
I think I should use "NOT EXIST" feature. Is there a better way? How best these large tables be JOINed or NOT EXISTS feature be crafted to take care of performance? The ID field in the question is not indexed.
Is it like:
Select sum (kpi) from Table_1 T1
where
ID NOT EXITS ( Select * from T2 WHERE T1.ID = T2.ID)
AND ID NOT EXITS ( Select * from T3 WHERE T1.ID = T3.ID)
AND ID NOT EXITS ( Select * from T4 WHERE T1.ID = T4.ID)
AND ID NOT EXITS ( Select * from T5 WHERE T1.ID = T5.ID)
AND ID NOT EXITS ( Select * from T6 WHERE T1.ID = T6.ID)
thank you

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101233 More actions · Answer 1

You need to create an index on ID in T2, T3, T4, T5 and T6.

You should check the tables in the order of most likely to EXIST first, so that you can exit as quickly as possible once a matching row is found.

SELECT SUM (kpi) AS kpi_total

FROM Table_1 T1

WHERE

NOT EXISTS ( SELECT * FROM T2 WHERE T1.ID = T2.ID )

NOT EXISTS ( SELECT * FROM T3 WHERE T1.ID = T3.ID )

NOT EXISTS ( SELECT * FROM T4 WHERE T1.ID = T4.ID )

NOT EXISTS ( SELECT * FROM T5 WHERE T1.ID = T5.ID )

NOT EXISTS ( SELECT * FROM T6 WHERE T1.ID = T6.ID )

Edit: Corrected "EXITS" to "EXISTS".

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Phil Parkin SSC Guru Points: 247211 More actions · Answer 2

ScottPletcher (5/3/2016)
You need to create an index on ID in T2, T3, T4, T5 and T6.
You should check the tables in the order of most likely to EXIST first, so that you can exit as quickly as possible once a matching row is found.
SELECT SUM (kpi) AS kpi_total
FROM Table_1 T1
WHERE
NOT EXITS ( Select * from T2 WHERE T1.ID = T2.ID )
NOT EXITS ( Select * from T3 WHERE T1.ID = T3.ID )
NOT EXITS ( Select * from T4 WHERE T1.ID = T4.ID )
NOT EXITS ( Select * from T5 WHERE T1.ID = T5.ID )
NOT EXITS ( Select * from T6 WHERE T1.ID = T6.ID )

And note that EXISTS contains the letter 's' twice 😀

drew.allen SSC Guru Points: 77014 More actions · Answer 3

ScottPletcher (5/3/2016)
You need to create an index on ID in T2, T3, T4, T5 and T6.
You should check the tables in the order of most likely to EXIST first, so that you can exit as quickly as possible once a matching row is found.
SELECT SUM (kpi) AS kpi_total
FROM Table_1 T1
WHERE
NOT EXISTS ( SELECT * FROM T2 WHERE T1.ID = T2.ID )
NOT EXISTS ( SELECT * FROM T3 WHERE T1.ID = T3.ID )
NOT EXISTS ( SELECT * FROM T4 WHERE T1.ID = T4.ID )
NOT EXISTS ( SELECT * FROM T5 WHERE T1.ID = T5.ID )
NOT EXISTS ( SELECT * FROM T6 WHERE T1.ID = T6.ID )
Edit: Corrected "EXITS" to "EXISTS".

I think you're also missing a few "AND"s.

Also, the physical processing order of the NOT EXISTS clauses isn't guaranteed. The optimizer can choose to evaluate them in any order.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

MadAdmin SSChampion Points: 11261 More actions · Answer 4

you could try doing left joins if the query isn't fast enough.

It may give better speed, but be careful of the possibility of duplicating rows.

Catch-all queries done right [/url]
Gail Shaw's Performance Blog[/url]

pietlinden SSC Guru Points: 63631 More actions · Answer 5

You could... but I'd read this first:

http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

I would have sworn that EXISTS stops as soon as a True/False changes, while an IN reads the whole table. So I would think that EXISTS would be more efficient. Might depend on what you're doing, though.

ScottPletcher SSC Guru Points: 101233 More actions · Answer 6

pietlinden (5/4/2016)
You could... but I'd read this first:
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
I would have sworn that EXISTS stops as soon as a True/False changes, while an IN reads the whole table. So I would think that EXISTS would be more efficient. Might depend on what you're doing, though.

Yes, an EXISTS/NOT EXISTS will stop as soon as it can, but with no index available, it must go thru the entire table when no matching row exists.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

pietlinden SSC Guru Points: 63631 More actions · Answer 7

Interesting... Good to know! So check your indexes, then try each one...

Thanks, Scott!

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 8

pietlinden (5/4/2016)
You could... but I'd read this first:
http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/
I would have sworn that EXISTS stops as soon as a True/False changes, while an IN reads the whole table. So I would think that EXISTS would be more efficient. Might depend on what you're doing, though.

That's a property of the semi join that implements the check, which both IN and EXISTS can use; since both can use it, there will not necessarily be any performance difference between the two (also the results of that cited article, where they had identical execution plans and IO).

Cheers!