Performance issue with EXCEPT

Question

Performance issue with EXCEPT

cmartel 20772

Ten Centuries

Points: 1233
More actions
May 17, 2017 at 11:30 am

#336980

I am rewriting some user-defined functions and I want to test if they give the exact same result as the functions they are replacing. I am testing the functions on all corporations of my database and the result contains about 15,000 rows. Running the following script:
WITH
    OLD AS (
        SELECT C.UniqueID, F.Value
        FROM Corporations C
        OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F
    ),
    NEW AS (
        SELECT C.UniqueID, F.Value
        FROM Corporations C
        OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F
    )
SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;
Takes over 21 minutes while executing the EXCEPT outside the WITH like in:
DECLARE @TOLD TABLE (UniqueID INT, Value INT);
DECLARE @TNEW TABLE (UniqueID INT, Value INT);
INSERT INTO @TOLD
    SELECT C.UniqueID, F.Value
    FROM Corporations C
    OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F;
INSERT INTO @TNEW
    SELECT C.UniqueID, F.Value
    FROM Corporations C
    OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F;
SELECT * FROM @TOLD
EXCEPT
SELECT * FROM @TNEW;
Takes less than one minute. I would prefer using the former script as it is by far more elegant but its performance is not acceptable. Any idea of what is going on?

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 1

To tell you exactly what is going on would require examining the execution plan.

In general, though, what is happening is that in the second case, you are forcing a particular execution order for the query by forcing it to materialize the result sets of the two queries. Once that is done, you then run a query using EXCEPT on those two result sets.

In the CTE case, the optimizer is free to produce results however it sees fit; it won't necessarily process the queries defining the CTEs and then run the final query against those results.

In this way, CTEs are very much like views and are in-lined with the rest of the query.

In your case, the method chosen by the optimizer for the first case is just much worse than materializing the intermediate result sets and then running the final query.

Sometimes it's quite advantageous to let the optimizer decide this, since it will often choose physical execution order better than we do, but sometimes it doesn't. In cases like this, it might be because of poor statistics, or it might be that this data retrieval is a bit too complicated for the optimizer to figure out as one query.

Especially if those UDFs are not iTVFs, they might be the source of much of the optimizer's confusion in the larger query.

Hopefully that helps!

EDIT: Fixed some typos.

cmartel 20772 Ten Centuries Points: 1233 More actions · Answer 2

Thanks Jacob,

Both old functions and new functions are iTVFs (new functions use CTE). I could send you the plans which are actually complex since the functions are themselves complex. I was expecting that someone encountered this issue and share the magic trick that would speed up the first script (I don't mind if it is slower than the second one but not that slow!)

Phil Parkin SSC Guru Points: 247224 More actions · Answer 3

cmartel 20772 - Wednesday, May 17, 2017 2:33 PM
Thanks Jacob,
Both old functions and new functions are iTVFs (new functions use CTE). I could send you the plans which are actually complex since the functions are themselves complex. I was expecting that someone encountered this issue and share the magic trick that would speed up the first script (I don't mind if it is slower than the second one but not that slow!)

There's no magic solution, though I've found many times in the past that breaking large queries into multiple chunks helps with execution speed.
I suggest you try using temp tables with version 2 of your query, it may run even faster.

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 4

As Phil indicated, there really isn't a one-size-fits-all trick that applies here.

Given what you've stated about your code, I'd suspect that the query is simply too complicated for SQL Server to reliably (or at all) come up with a decent execution plan.

I'd wager that if we saw the execution plan, the reason for early termination would be a time out.

In that case, there's not a lot to do but either tune all the involved code as well as possible, or give the optimizer smaller, more manageable queries to work with, as you've done by materializing the intermediate result sets in your second example.

Cheers!

Alan Burstein SSC Guru Points: 61152 More actions · Answer 5

Without Execution plans (you can just post them here, BTW, using the "Add File" link below) it's impossible to determine why each query is performing differently. In SQL server 2014 there's some improvements to how temp variables work - perhaps that is helping you. Note that your first solution could be reduced to:
SELECT C.UniqueID, F.Value FROM Corporations C OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F EXCEPT SELECT C.UniqueID, F.Value FROM Corporations C OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F
As has been mentioned: run both of these queries separately (try using WITH (RECOMPILE) and see if one is very slow. If one is very slow then you've isolated the problem; if they're both blazing fast and the EXCEPT is what's slowing you down then you could consider using temp tables instead of table variables and adding an index on each temp table on UniqueID, Value.

It's most important to understand how EXCEPT works: EXCEPT returns a distinct result set which means that the data must be sorted. sorting is expensive and each row becomes more expensive to sort the more rows you add. Indexes can prevent sorting but not in your query because you are generating a derived column. I would be curious to see if this helps:
( SELECT DISTINCT C.UniqueID, F.Value FROM Corporations C OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F ) EXCEPT ( SELECT DISTINCT C.UniqueID, F.Value FROM Corporations C OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F )
Here we're performing 3 sorts instead of one but, depending on the number of duplicates in each of the two queries, you may be performing 3 wildly less expensive sorts instead of a really big one.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Ed Wagner SSC Guru Points: 287026 More actions · Answer 6

Like others have said, the code and execution plans are really what we need to be able offer any serious help.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

drew.allen SSC Guru Points: 77014 More actions · Answer 7

Alan.B - Wednesday, May 17, 2017 3:50 PM
Note that your first solution could be reduced to:
SELECT C.UniqueID, F.Value FROM Corporations C OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F EXCEPT SELECT C.UniqueID, F.Value FROM Corporations C OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F

I believe it can be reduced even further:

SELECT C.UniqueID, F.Value FROM Corporations C CROSS APPLY ( -- changed this to a CROSS APPLY SELECT Value FROM OldFunction(C.UniqueID) EXCEPT SELECT Value FROM NewFunction(C.UniqueID) ) F

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

cmartel 20772 Ten Centuries Points: 1233 More actions · Answer 8

I just found out a magic key. If I change the end of the first script from:

SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;

To:

SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;

Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!

Michael L John One Orange Chip Points: 27079 More actions · Answer 9

cmartel 20772 - Thursday, May 18, 2017 2:33 PM
I just found out a magic key. If I change the end of the first script from:
SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;
To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;
Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!

By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try this
CREATE TABLE #Tmp ( Var_val varchar(100) )

INSERT INTO #Tmp(Var_val) VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')

SELECT CONVERT(varchar, var_val) FROM #Tmp

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Ed Wagner SSC Guru Points: 287026 More actions · Answer 10

Michael L John - Thursday, May 18, 2017 2:46 PM
cmartel 20772 - Thursday, May 18, 2017 2:33 PM
I just found out a magic key. If I change the end of the first script from:
SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;
To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;
Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try this
CREATE TABLE #Tmp ( Var_val varchar(100) )
INSERT INTO #Tmp(Var_val) VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')
SELECT CONVERT(varchar, var_val) FROM #Tmp

Mike's correct with his point. If you don't size a varchar, a variable will default to a length of 1 and a column will default to 30.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 11

Ed Wagner - Friday, May 19, 2017 8:31 AM
Michael L John - Thursday, May 18, 2017 2:46 PM
cmartel 20772 - Thursday, May 18, 2017 2:33 PM
I just found out a magic key. If I change the end of the first script from:
SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;
To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;
Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try this
CREATE TABLE #Tmp ( Var_val varchar(100) )
INSERT INTO #Tmp(Var_val) VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')
SELECT CONVERT(varchar, var_val) FROM #Tmp
Mike's correct with his point. If you don't size a varchar, a variable will default to a length of 1 and a column will default to 30.

Minor tweak to that:

Both variable declaration and column DDL will use a length of 1 for varchar if length is not specified.

It's when used in CAST/CONVERT without a length that it uses 30.

Cheers!

Ed Wagner SSC Guru Points: 287026 More actions · Answer 12

Jacob Wilkins - Friday, May 19, 2017 8:58 AM
Ed Wagner - Friday, May 19, 2017 8:31 AM
Michael L John - Thursday, May 18, 2017 2:46 PM
cmartel 20772 - Thursday, May 18, 2017 2:33 PM
I just found out a magic key. If I change the end of the first script from:
SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;
To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;
Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try this
CREATE TABLE #Tmp ( Var_val varchar(100) )
INSERT INTO #Tmp(Var_val) VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')
SELECT CONVERT(varchar, var_val) FROM #Tmp
Mike's correct with his point. If you don't size a varchar, a variable will default to a length of 1 and a column will default to 30.
Minor tweak to that:
Both variable declaration and column DDL will use a length of 1 for varchar if length is not specified.
It's when used in CAST/CONVERT without a length that it uses 30.
Cheers!

Aaagh! You're absolutely correct - thanks for correcting it.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions