March 17, 2014 at 4:12 pm
I'm having odd performance difference when running the same query two different ways. Literally, the only difference is whether its wrapped in an if statement.
This query is actually part of a larger one, but I've isolated it as the culprit.
Running the query by itself returns almost immediately (returns 0)
select COUNT(*) from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
))
Ultimately, I want to avoid running complex calculations based on the result of that query, so I wrap it in an if statement like so, but it takes much longer to run (~10 seconds) and I can't figure out why:
if (0 = (select COUNT(*) from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
)))
begin select 'update will be skipped to save time' end
else begin select 'missing rows will be inserted' end
The data set does not change so in both cases the result is zero / 'update will be skipped', and yet running these two versions of the same query always results in the first version completing quickly, and the second version taking about 10-12 seconds to complete.
the following query performs identically to the second version above, with the same execution plan and no increase in performance (i.e. using 'exists' instead of comparing count(*) to zero makes no difference).
if exists(select 1 from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
))
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end
Changing the query to use left or right join where data.GamingReport_Computerized.ResponseID is null performs the same way as using 'select where not in'.
It seems that using a select statement inside an if statement is causing a bad query execution plan to form. The only way around this I've found is to first assign the result of count to a variable and then test the variable with an if statement. Any combination that involves using 'if' and 'select' in the same statement results in a bad execution plan with poor performance.
See discussion here as well: http://stackoverflow.com/q/22465078/88409
March 17, 2014 at 5:14 pm
Try using NOT EXISTS instead NOT IN.
if exists(
select *
from Responses r
where not exists (
select *
from data.GamingReport_Computerized as t
where t.ResponseID = r.ResponseID
)
)
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end
It will be helpful having an index by ResponseID in each table if you do not have one.
March 17, 2014 at 5:35 pm
I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:
select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
if (select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
) = 0
print 'count is zero'
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
March 17, 2014 at 8:17 pm
mister.magoo (3/17/2014)
I have to say "I don't know", but it's interesting and if anyone wants to join in, this AdventureWorks query shows similar symptoms:
select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
if (select count(*)
from sales.salesorderdetail
where salesorderid not in (select salesorderheader.salesorderid from sales.salesorderheader)
) = 0
print 'count is zero'
As is often the case, we need to review the query plan. The first query gets an estimated row count of 178.5 coming out of the right-anti-semi-join so it chooses a hash join type. The second query has an estimate of 1 row and chooses a merge right-anti-semi-join.
Curious is that you can affect the plan type with different number checks on the IF.
if (select count(*)
from Sales.SalesOrderDetail
where SalesOrderID not in (select SalesOrderHeader.SalesOrderID from Sales.SalesOrderHeader)) > 10000
print 'count is zero'
That one switches back to the hash join. This could be a bug or feature in the optimizer. However, I note that you should pretty much never ever do a query like this. You are forcing the engine to touch EVERY ROW just to determine if there is one or more rows. That should be an EXIST query.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 17, 2014 at 9:01 pm
This is strange, I tried it on one of my databases and got a similar results. Even after creating indexes that I noticed missing. I had the best performance with the following query on my tables.
IF EXISTS (
SELECT COUNT(*)
FROM table1 c
WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
Edit: Remove a tray NOT
I fudged up a test set in case to try and narrow it down. I could not replicate the performance results.
I tried this with MAXDOP on the Server set to 0 and 1.
This is my test set and queries. Though it doesn't really help answer you question:Whistling:
/*
CREATE TABLE Table1 (
ID INT IDENTITY(1,1) PRIMARY KEY
, N INT NOT NULL
, Something Varchar(21) DEFAULT 'Put Something it here'
);
--CREATE INDEX Tab1_IDX1 ON Table1(ID) --Also tried this rather than a primary key
INSERT INTO Table1 (N) SELECT TOP 1000000 N FROM Tally
CREATE TABLE Table2 (
ID INT NOT NULL
, Something Varchar(21) DEFAULT 'Put Something it here'
)
INSERT INTO Table2 (ID) SELECT TOP 300000 N + 500 N FROM Tally
INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1000 N FROM Tally
INSERT INTO Table2 (ID)SELECT TOP 300000 N + 1500 N FROM Tally
INSERT INTO Table2 (ID)SELECT TOP 300000 N + 2000 N FROM Tally
CREATE INDEX Tab2_IDX1 ON Table2(ID)
CREATE TABLE Table3 (
ID INT NOT NULL
, Something Varchar(21) DEFAULT 'Put Something it here'
)
INSERT INTO Table3 (ID) SELECT TOP 500000 N FROM Tally
INSERT INTO Table3 (ID) SELECT TOP 500000 N + 250000 N FROM Tally
INSERT INTO Table3 (ID) SELECT TOP 500000 N + 500000 N FROM Tally
CREATE INDEX Tab3_IDX1 ON Table3(ID)
*/
DECLARE @C int
DECLARE @d DATETIME2 = GETDATE()
PRINT '--------------'
SELECT @C = COUNT(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table2)
PRINT @C
PRINT 'Straight Query - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT COUNT(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table2)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT IN - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT count(*)
FROM table1 c
WHERE NOT EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT EXISTS - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF EXISTS (
SELECT COUNT(*)
FROM table1 c
WHERE EXISTS ( SELECT 1 FROM table2 z WHERE z.ID = c.ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
PRINT 'IF with HAVING - Not 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
SELECT @C = COUNT(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table3)
PRINT @C
PRINT 'Straight Query NOT IN - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT count(*)
FROM table1 c
WHERE c.ID NOT IN ( SELECT ID FROM table3)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT IN - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF (
SELECT count(*)
FROM table1 c
WHERE NOT EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
) = 0
PRINT ' Result 0'
PRINT 'IF with NOT EXISTS - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
SET @d = GETDATE()
PRINT '--------------'
IF EXISTS (
SELECT COUNT(*)
FROM table1 c
WHERE EXISTS ( SELECT 1 FROM table3 z WHERE z.ID = c.ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
PRINT 'IF with HAVING - 0'
PRINT DATEDIFF(ms,@d,GETDATE())
PRINT '--------------'
/*
DROP TABLE Table1
DROP TABLE Table2
DROP TABLE Table3
*/
March 17, 2014 at 9:08 pm
If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever! :crazy:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 17, 2014 at 11:00 pm
TheSQLGuru (3/17/2014)
If truly boggles my mind why an EXISTS with a COUNT(*) and HAVING is faster than a straight EXISTS check. That makes absolutely no sense whatsoever! :crazy:
I agree and I was a bit loathed to post it to be honest. That was why I also mocked up some test data. It didn't perform the same on that. Tomorrow I'm going to take a closer look at the tables that I did the initial test on and try and make sense of it.
I'll try and get some actual plans posted when I do. The only reason I tried it was I thought it might have an early opt out due to the having.
March 18, 2014 at 1:06 pm
Hi
The tables I were querying from have the following:
Table1 1877 unique ids with a non clustered index.
Table2 38438 joining ids, avg of 20 rows per ID up to a maximum of 216 rows, non clustered index.
Statistics all updated with fullscan.
I ran the following querys--Statement1
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
--Statement2
IF NOT EXISTS (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
--Statement3
IF (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
) = 0
PRINT ' Result 0'
Here's the general stats
For Statement 3 the estimated rows are way south of the actual rows. The net effect is that Statement 3 ends up doing way more reads than it needs to.
March 18, 2014 at 1:54 pm
But your 3 samples are missing the important 4th sample:
IF NOT EXISTS (
SELECT *FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
)
PRINT ' Result 0'
You may need to play around with the exist/not exists to get the logic you want. Oh, and you want to code this so that it DOES HIT very often/quickly (assuming that is applicable here) - that way the EXISTS logic can short-circuit and exit early in the query.
Once you get the query right, how does it perform?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 18, 2014 at 2:04 pm
hunchback (3/17/2014)
Try using NOT EXISTS instead NOT IN.if exists(
select *
from Responses r
where not exists (
select *
from data.GamingReport_Computerized as t
where t.ResponseID = r.ResponseID
)
)
begin select 'missing rows will be inserted' end
else begin select 'update will be skipped to save time' end
It will be helpful having an index by ResponseID in each table if you do not have one.
I've found that there are typically no differences in performance between NOT EXISTS and NOT IN. Lot's of tests have been run on this site that demonstrate that fact, as well.
As a side bar, I prefer NOT IN because it doesn't require correlation and the sub-query can be tested separately.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 18, 2014 at 2:19 pm
Hi
Have added the various query options now and we have a new winner 😀 with a query that makes sense-- Statement 1
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
-- Statement 2
IF NOT EXISTS (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
-- Statement 3
IF EXISTS (
SELECT COUNT(*)
FROM Table1 c
WHERE EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
HAVING COUNT(*) > 0
)
PRINT ' Result 0'
-- Statement 4
IF (
SELECT COUNT(*)
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
) = 0
PRINT ' Result 0'
-- Statement 5
IF NOT EXISTS (
SELECT 1
FROM Table1 c
WHERE NOT EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
)
PRINT ' Result 0'
-- Statement 6
IF EXISTS (
SELECT 1
FROM Table1 c
WHERE EXISTS ( SELECT 1 FROM Table2 z WHERE z.CNST_CONSENT_ID = c.CNST_CONSENT_ID)
)
PRINT ' Result 0'
March 18, 2014 at 2:26 pm
I suspect I know why, but don't have time to write an essay tonight (and that's what is required to explain it)
The EXISTS adds a row goal of 1 into the plan (because EXISTS only needs to see whether there's a row or not). Paul White's blog has at least one post on Row Goals
To see if that is the cause, does this query have the same odd performance characteristic as the IF EXISTS... version?
select TOP(1) 1 from Responses r where r.ResponseID not in (
select ResponseID from data.GamingReport_Computerized
)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 18, 2014 at 2:55 pm
Cool! So it looks like the EXISTS/EXISTS is able to very quickly exit with a hit (8 reads). Oh, I forgot to ask - are your plans estimated or actual?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 18, 2014 at 3:09 pm
The plans are actual.
And here is Paul's blog that Gail referred to.
March 19, 2014 at 1:41 pm
mickyT: Which query do you think is a "winner", because those are not logically equivalent queries to my original query. For example, query 6 is using an "exists" in the where clause, which is incorrect. Of course that will short circuit because there exists 2.3 million matching rows. We're looking for existence of non-matching rows (i.e. where not exists or where ResponseID is null in the 2nd table) When you change it to "not exists" (since we're looking for rows that exist in one table but not the other), then it exhibits the exact same performance issue.
GilaMonster: The "top(1)" query you suggested does exhibit the same query plan and same performance issue.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply