December 24, 2015 at 10:34 am
Hi
I have 2 queries that return a different count and can't quite see why
This query uses not exists and is very quick but returns around 4k rows
I'm led to believe not exists is a better approach
select count(*) FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT A
INNER JOIN [SiriusPink]..CLAIM C
ON A.claim_id = C.claim_id
where not exists (--A.base_claim_payment_id NOT in (
SELECT 1
FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT AA
INNER JOIN [SiriusPink]..CLAIM_PAYMENT_ITEM BB
ON AA.claim_payment_id = BB.claim_payment_id
INNER JOIN [SiriusPink]..CLAIM CC
ON AA.claim_id = CC.claim_id
WHERE CC.version_id < C.version_id
)
This query takes uses not in, takes about 30 minutes BUT returns around 55k rows
The execution plan shows a very inefficient left anti-semi join Nested Loop
select count(*) FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT A
INNER JOIN [SiriusPink]..CLAIM C
ON A.claim_id = C.claim_id
where A.base_claim_payment_id NOT in (
SELECT 1
FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT AA
INNER JOIN [SiriusPink]..CLAIM_PAYMENT_ITEM BB
ON AA.claim_payment_id = BB.claim_payment_id
INNER JOIN [SiriusPink]..CLAIM CC
ON AA.claim_id = CC.claim_id
WHERE CC.version_id < C.version_id
)
I cannot see what's causing the row count difference
Any ideas?
Thanks
- Damian
December 24, 2015 at 1:16 pm
The 2nd query is wrong:
select count(*) FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT A
INNER JOIN [SiriusPink]..CLAIM C
ON A.claim_id = C.claim_id
where A.base_claim_payment_id NOT in (
SELECT 1
FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT AA
INNER JOIN [SiriusPink]..CLAIM_PAYMENT_ITEM BB
ON AA.claim_payment_id = BB.claim_payment_id
INNER JOIN [SiriusPink]..CLAIM CC
ON AA.claim_id = CC.claim_id
WHERE CC.version_id < C.version_id
)
December 24, 2015 at 1:16 pm
Some DDL for the tables and some sample data would help us more certainly answer that.
Having said that, unless something got lost in copy/paste translation, it looks like there are a couple issues.
The first is that your NOT IN subquery is returning a result set consisting entirely of rows with a column containing only the value 1, so it's effectively just saying return all rows where the base_claim_payment_id isn't 1. I think that's clearly not what you're wanting to do 🙂
The second is that the NOT IN query makes it look like you're wanting to somehow relate the base_claim_payment_id from the outer query to something in the inner query, but the NOT EXISTS subquery doesn't at all tie base_claim_payment_id from the outer query to anything in the inner query.
If you mocked up some sample data and gave us DDL for the tables, along with the sample data in the form of INSERTs and your desired results, we'd be able to help out more.
Cheers!
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply