Performance increase when qualifying join types

  • I have a simple case where I retrieve data using an inline table-valued function, joining a table on its primary key. Things were going as fast as expected until the database grew. My function currently returns about 100,000 rows while the joined table contains about 10,000 rows, and the following statement, which used to execute in less than a second, now takes minutes.
    SELECT F.SomeValue, T.SomeValue
    FROM fnGetData() F
    INNER JOIN Table T ON T.ID = F.TableID
    I tried qualifying the INNER JOIN by specifying LOOP, MERGE, and HASH, and using ANY of the three hints makes my statement run fast again (the hinted forms are sketched below).
    Is it good practice to qualify joins when the source of a statement is a function?
    Which type is better?
    Is there something else I am missing?

    I also tried replacing the INNER JOIN with a CROSS APPLY, and the statement runs fast again:

    SELECT F.SomeValue, T.SomeValue
    FROM fnGetData() F
    CROSS APPLY (SELECT SomeValue FROM Table WHERE ID = F.TableID) T
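
    For reference, the hinted versions I tried look like this (a sketch reusing the same placeholder fnGetData() and Table names; only one hint is used at a time):

    SELECT F.SomeValue, T.SomeValue
    FROM fnGetData() F
    INNER LOOP JOIN Table T ON T.ID = F.TableID  -- or INNER MERGE JOIN / INNER HASH JOIN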

  • In general, it is NOT a good practice to add join or table hints to force the query to do things it normally wouldn't do.  In SOME cases a hint may help, but there's no general rule telling you which cases those are.  The query optimizer puts a lot of work into choosing the correct query plan, and if you can beat it just by adding a hint, then the optimizer is probably being misled by something.  Outdated statistics, missing indexes, parameter sniffing, bad cardinality estimates, unnecessary table scans?  There are many possible causes.  You need a deeper understanding of the query to figure out what is going on.

    Look at the estimated query plan.  If you have two versions of the query with different run times, put them in the same query pane and press Ctrl+L.  How does the LOOP JOIN plan differ from the CROSS APPLY plan?  Does the cost estimate for each plan match the observed performance difference?  Then compare the actual plans: how do the estimated row counts at each step compare to the actual row counts?  Capture query statistics (SET STATISTICS TIME ON and SET STATISTICS IO ON) to get more detail.  If you have SELECT queries with large row counts, try adding INTO #junk so you're not measuring the time required to send the results over the network and populate a grid in Management Studio.
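
    A minimal measurement harness along those lines might look like this (a sketch reusing the placeholder names from the question):

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- Dump the results into a temp table so the timing doesn't include
    -- streaming 100,000 rows into an SSMS grid.
    SELECT F.SomeValue, T.SomeValue
    INTO #junk
    FROM fnGetData() F
    INNER JOIN Table T ON T.ID = F.TableID;

    DROP TABLE #junk;

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;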

    The best way to figure out what is going wrong with your query is to examine the execution plan, and the best way to learn how to read execution plans is to see what plans are produced by variations in a given query and why the query optimizer is sometimes so wrong about how to handle the query.

  • Thanks Scott,

    The problem is that I think there is nothing wrong with my statement. The base function returns keys and dates. It is quite complex but performs very well. The problem arises when I want to perform an extra join to get the name of a person by primary key.

    Writing «INNER JOIN Person P ON P.ID = F.PersonID» seems the right thing to do but, as mentioned, the statement performs poorly (15 minutes instead of a fraction of a second). Writing «INNER LOOP JOIN Person P ON P.ID = F.PersonID», «INNER MERGE JOIN Person P ON P.ID = F.PersonID», «INNER HASH JOIN Person P ON P.ID = F.PersonID», or even «CROSS APPLY (SELECT Name FROM Person WHERE ID = F.PersonID) P» restores the performance.

    The execution plans are obviously different, and they are quite complex because of the base function. There is no index issue here, since everything is primary-key based. It seems to me that there is a flaw in the SQL Server plan builder (I have not tested more recent versions of SQL Server, but I suspect I might get different results), and I was wondering whether anyone else has ever had similar issues.

    I will fix the issue by using «CROSS APPLY» since I am not comfortable tweaking join types.

  • cmartel 20772 - Tuesday, March 13, 2018 8:00 AM

    I'm pretty sure there is actually something wrong, or you wouldn't be having this problem in the first place.  Clearly, it's not just the query that is the problem.  If the optimizer is choosing a bad execution plan, then you need to fix the cause.  My guess would be out-of-date statistics, but you'll need to determine when they were last updated (see the sketch below).  I'm all for being practical and going ahead with CROSS APPLY, but it really should bother you that this occurred until you have investigated all the possible causes.
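
    A quick way to check that (a sketch; dbo.Person stands in for whatever table is being joined):

    -- When were the statistics on the joined table last updated?
    SELECT s.name AS StatName,
           STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.Person');

    -- Refresh them if they look stale.
    UPDATE STATISTICS dbo.Person WITH FULLSCAN;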

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Probably most of the people on this site have had a similar issue at least once.  Sometimes a major restructuring of a query does not fool the query optimizer at all and you get an identical plan; other times some trivial change makes it perform horribly.  There is not one simple answer that explains all these cases; the only way I know of to figure out why two versions of the same query perform differently is to compare the execution plans.

    Usually there is a reason the query optimizer seems to choose the wrong plan, such as outdated statistics or massive table fragmentation.  It may reject a good plan because it overestimates the cost of some operation, or choose a bad plan because it underestimates the cost.  It can cache a query plan with one set of parameters, then reuse it with another set of parameters that makes the plan a very bad choice.  These are not flaws in the optimizer, but the result of design choices that let it pick a good-enough plan in most cases in a reasonable amount of time.  You can help it by making sure statistics are updated, tables and indexes are not fragmented (see the sketch below), and queries are no more complex than they need to be.
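
    For the fragmentation part, something like this gives a quick read (a sketch; dbo.Person is a stand-in for your own table):

    SELECT OBJECT_NAME(ips.object_id) AS TableName,
           i.name AS IndexName,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Person'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id
     AND i.index_id = ips.index_id;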

    If you want to know why the query optimizer is not getting the job done in this case, you have to do the hard work of going through the plans.  Just going ahead with the CROSS APPLY version if it meets your requirements is also a reasonable choice.

  • cmartel 20772 - Monday, March 12, 2018 1:05 PM

    Is it the fact that you added a join hint, or the fact that adding the join hint caused a recompile, that got you over the hump with a temporary bout of "bad parameter sniffing"?  I suspect it was the resulting recompile (see the sketch below).  Updating stats would probably have caused the same effect.
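
    One way to test that theory without any join hint is to force a fresh plan for just this statement (a sketch using the placeholder names from the original query):

    SELECT F.SomeValue, T.SomeValue
    FROM fnGetData() F
    INNER JOIN Table T ON T.ID = F.TableID
    OPTION (RECOMPILE);

    If that alone restores the speed, the problem is a stale or badly sniffed cached plan rather than the join strategy itself.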

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I found out where the problem was.

    The base function was internally returning a Person ID using an IIF expression that tests a constant, like this:

    SELECT
        IIF(I.Status = K.Active, I.FromPersonID, I.ToPersonID) AS PersonID
    FROM Invoice I
    OUTER APPLY (SELECT * FROM fnConstants()) K

    Where the function fnConstants() looks like this:

    CREATE FUNCTION fnConstants()
    RETURNS TABLE AS RETURN
    SELECT
        0 AS Active,
        1 AS Inactive,
        2 AS Postponed,
        ...

    This way of handling constants is elegant and nearly as fast as hard-coding the value, or at least it was nearly as fast until last week. Using literal values instead of the constants function solves the issue:

    SELECT
        IIF(I.Status = 0 /*Active*/, I.FromPersonID, I.ToPersonID) AS PersonID
    FROM Invoice I

    Since the base function is inline, the «resulting» statement had an INNER JOIN that ultimately looked like:

    INNER JOIN Person P ON P.PersonID = IIF(I.Status = K.Active, I.FromPersonID, I.ToPersonID)

    That now looks like:

    INNER JOIN Person P ON P.PersonID = IIF(I.Status = 0 /*Active*/, I.FromPersonID, I.ToPersonID)

    It is sad that SQL Server failed to compute a good execution plan because of the way constants were handled. From now on, I will stop using fnConstants() and go back to literal values with comments.

    Thanks!
