May 17, 2017 at 11:30 am
May 17, 2017 at 12:11 pm
To tell you exactly what is going on would require examining the execution plan.
In general, though, what is happening is that in the second case, you are forcing a particular execution order for the query by forcing it to materialize the result sets of the two queries. Once that is done, you then run a query using EXCEPT on those two result sets.
In the CTE case, the optimizer is free to produce results however it sees fit; it won't necessarily process the queries defining the CTEs and then run the final query against those results.
In this way, CTEs are very much like views and are in-lined with the rest of the query.
In your case, the method chosen by the optimizer for the first case is just much worse than materializing the intermediate result sets and then running the final query.
Sometimes it's quite advantageous to let the optimizer decide this, since it will often choose physical execution order better than we do, but sometimes it doesn't. In cases like this, it might be because of poor statistics, or it might be that this data retrieval is a bit too complicated for the optimizer to figure out as one query.
Especially if those UDFs are not iTVFs, they might be the source of much of the optimizer's confusion in the larger query.
Hopefully that helps!
EDIT: Fixed some typos.
May 17, 2017 at 2:33 pm
Thanks Jacob,
Both old functions and new functions are iTVFs (new functions use CTE). I could send you the plans which are actually complex since the functions are themselves complex. I was expecting that someone encountered this issue and share the magic trick that would speed up the first script (I don't mind if it is slower than the second one but not that slow!)
May 17, 2017 at 2:41 pm
cmartel 20772 - Wednesday, May 17, 2017 2:33 PMThanks Jacob,Both old functions and new functions are iTVFs (new functions use CTE). I could send you the plans which are actually complex since the functions are themselves complex. I was expecting that someone encountered this issue and share the magic trick that would speed up the first script (I don't mind if it is slower than the second one but not that slow!)
There's no magic solution, though I've found many times in the past that breaking large queries into multiple chunks helps with execution speed.
I suggest you try using temp tables with version 2 of your query, it may run even faster.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 17, 2017 at 3:04 pm
As Phil indicated, there really isn't a one-size-fits-all trick that applies here.
Given what you've stated about your code, I'd suspect that the query is simply too complicated for SQL Server to reliably (or at all) come up with a decent execution plan.
I'd wager that if we saw the execution plan, the reason for early termination would be a time out.
In that case, there's not a lot to do but either tune all the involved code as well as possible, or give the optimizer smaller, more manageable queries to work with, as you've done by materializing the intermediate result sets in your second example.
Cheers!
May 17, 2017 at 3:50 pm
Without Execution plans (you can just post them here, BTW, using the "Add File" link below) it's impossible to determine why each query is performing differently. In SQL server 2014 there's some improvements to how temp variables work - perhaps that is helping you. Note that your first solution could be reduced to:SELECT C.UniqueID, F.Value
FROM Corporations C
OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F
EXCEPT
SELECT C.UniqueID, F.Value
FROM Corporations C
OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F
As has been mentioned: run both of these queries separately (try using WITH (RECOMPILE) and see if one is very slow. If one is very slow then you've isolated the problem; if they're both blazing fast and the EXCEPT is what's slowing you down then you could consider using temp tables instead of table variables and adding an index on each temp table on UniqueID, Value.
It's most important to understand how EXCEPT works: EXCEPT returns a distinct result set which means that the data must be sorted. sorting is expensive and each row becomes more expensive to sort the more rows you add. Indexes can prevent sorting but not in your query because you are generating a derived column. I would be curious to see if this helps:(
SELECT DISTINCT C.UniqueID, F.Value
FROM Corporations C
OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F
)
EXCEPT
(
SELECT DISTINCT C.UniqueID, F.Value
FROM Corporations C
OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F
)
Here we're performing 3 sorts instead of one but, depending on the number of duplicates in each of the two queries, you may be performing 3 wildly less expensive sorts instead of a really big one.
-- Itzik Ben-Gan 2001
May 17, 2017 at 4:19 pm
Like others have said, the code and execution plans are really what we need to be able offer any serious help.
May 18, 2017 at 7:46 am
Alan.B - Wednesday, May 17, 2017 3:50 PMNote that your first solution could be reduced to:SELECT C.UniqueID, F.Value
FROM Corporations C
OUTER APPLY (SELECT Value FROM OldFunction(C.UniqueID)) F
EXCEPT
SELECT C.UniqueID, F.Value
FROM Corporations C
OUTER APPLY (SELECT Value FROM NewFunction(C.UniqueID)) F
I believe it can be reduced even further:
SELECT C.UniqueID, F.Value
FROM Corporations C
CROSS APPLY ( -- changed this to a CROSS APPLY
SELECT Value FROM OldFunction(C.UniqueID)
EXCEPT
SELECT Value FROM NewFunction(C.UniqueID)
) F
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 18, 2017 at 2:33 pm
SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;
To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;
Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
May 18, 2017 at 2:46 pm
cmartel 20772 - Thursday, May 18, 2017 2:33 PMI just found out a magic key. If I change the end of the first script from:SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try thisCREATE TABLE #Tmp
(
Var_val varchar(100)
)
INSERT INTO #Tmp(Var_val)
VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')
SELECT CONVERT(varchar, var_val)
FROM #Tmp
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
May 19, 2017 at 8:31 am
Michael L John - Thursday, May 18, 2017 2:46 PMcmartel 20772 - Thursday, May 18, 2017 2:33 PMI just found out a magic key. If I change the end of the first script from:SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try thisCREATE TABLE #Tmp
(
Var_val varchar(100)
)INSERT INTO #Tmp(Var_val)
VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')SELECT CONVERT(varchar, var_val)
FROM #Tmp
Mike's correct with his point. If you don't size a varchar, a variable will default to a length of 1 and a column will default to 30.
May 19, 2017 at 8:58 am
Ed Wagner - Friday, May 19, 2017 8:31 AMMichael L John - Thursday, May 18, 2017 2:46 PMcmartel 20772 - Thursday, May 18, 2017 2:33 PMI just found out a magic key. If I change the end of the first script from:SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try thisCREATE TABLE #Tmp
(
Var_val varchar(100)
)INSERT INTO #Tmp(Var_val)
VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')SELECT CONVERT(varchar, var_val)
FROM #TmpMike's correct with his point. If you don't size a varchar, a variable will default to a length of 1 and a column will default to 30.
Minor tweak to that:
Both variable declaration and column DDL will use a length of 1 for varchar if length is not specified.
It's when used in CAST/CONVERT without a length that it uses 30.
Cheers!
May 20, 2017 at 7:19 am
Jacob Wilkins - Friday, May 19, 2017 8:58 AMEd Wagner - Friday, May 19, 2017 8:31 AMMichael L John - Thursday, May 18, 2017 2:46 PMcmartel 20772 - Thursday, May 18, 2017 2:33 PMI just found out a magic key. If I change the end of the first script from:SELECT * FROM OLD
EXCEPT
SELECT * FROM NEW;To:
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM OLD
EXCEPT
SELECT C.UniqueID + '|' + CONVERT(VARCHAR, F.Value) FROM NEW;Then both scripts run fast. The technique works here since I only want to know if there are differences. I do not care about presentation. I nevertheless found a way to trick the optimizer and that makes my day!
By not sizing the CONVERT(VARCHAR, you may be truncating your data and invalidating the comparison.
Try thisCREATE TABLE #Tmp
(
Var_val varchar(100)
)INSERT INTO #Tmp(Var_val)
VALUES('1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890')SELECT CONVERT(varchar, var_val)
FROM #TmpMike's correct with his point. If you don't size a varchar, a variable will default to a length of 1 and a column will default to 30.
Minor tweak to that:
Both variable declaration and column DDL will use a length of 1 for varchar if length is not specified.
It's when used in CAST/CONVERT without a length that it uses 30.
Cheers!
Aaagh! You're absolutely correct - thanks for correcting it.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply