Extra join is killing query

  • Good for you SS Coach and Chris M.

    You will have to give me a few days to get the sample database together as I will need to remove any hint of real information from it and we are currently finalizing a release here.

    Regards

    David

  • My $0.02 (and worth every penny...).

    When I am faced with a query that is performing poorly, particularly if it is joining a lot of tables and returning a lot of data values, I try to split it into two parts - how do I get the record keys for the records that meet the criteria I specify, and then how do I retrieve the actual data values from those records?

    I have found that when you throw enough table joins at SQL Server, even when the joins are really simple, at some points the query optimizer is going to guess wrong, and the query is going to fall "down the rabbit hole".

    At that point, refactor the query. If you know a priori that some tables will normally return a very small result set in your queries, then get that subset first in a temp table, then start joining your other tables on that small subset. It is hard (not impossible) for the query optimizer to get confused at this point.

    Don't necessarily try to return all the data columns in one big query. Use ONLY your criteria columns and key columns to join the minimum set of tables you need to satisfy the criteria, and focus on getting the record keys to the tables containing the data columns to be returned. Make sure that query runs FAST, and fix it until it does.

    Then, join that result to the tables (maybe even the same tables again!) to get the data columns you wish to return.

    The query optimizer is amazingly good at what it does, but if you throw 10,000 possible execution paths at it and only a few will work well, don't be surprised when it gets confused.

    I am definitely not in favor of "hints" to solve performance issues. Read the caveats about hints - they are there for a reason. A hint that works today may fail spectacularly with SQL 2014 et seq. Solve the underlying performance issues and you don't need hints.

  • Hi Guys

    Here is a zipped up bak (SQL Server 2008 R2) file filed with sample data and the proc. It is 17 meg so too large to post here and I have used dropbox.

    I have changed the proc slightly, but the illogical use of the #tmp table is still required. I have put comments in the proc to explain it's logic, but basically it is a dynamic query that returns a list of clients eligible for targeting to the application grid. To make the list the client must be of a high enough profile or be a target already. These two separate criteria are combined with the union clause.

    Good luck, I would be happy if someone can show me a better way of writing the query, but the main thing that interests me is WHY DO I NEED THAT DAMN #TMP TABLE?

    Regards

    David

  • DavidDroog (8/5/2015)


    Hi Guys

    Here is a zipped up bak (SQL Server 2008 R2) file filed with sample data and the proc. It is 17 meg so too large to post here and I have used dropbox.

    ...

    I have changed the proc slightly, but the illogical use of the #tmp table is still required. I have put comments in the proc to explain it's logic, but basically it is a dynamic query that returns a list of clients eligible for targeting to the application grid. To make the list the client must be of a high enough profile or be a target already. These two separate criteria are combined with the union clause.

    Good luck, I would be happy if someone can show me a better way of writing the query, but the main thing that interests me is WHY DO I NEED THAT DAMN #TMP TABLE?

    Regards

    David

    Awesome, thanks.

    Can you answer the questions in this post please - it would give us a head start.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi ChrisM

    Replies below

    1.Clients with “disciplines” filtered by geography using the geography parameter. This query should reference client, client address and geography tables plus those required to identify "disciplines". If any other tables are referenced in the query then provide an explanation. If any tables are referenced more than once then provide a full justification.

    The discipline field is in the client table to join to the geography table you join to the address bale via Address.AddressID = Client.PrimaryAddressID. You then join the address table to the Geography table via Address.GeographyBrickReferenceID = Geography.ReferenceID where GeographyTypeID = 8 (The lowest level of the Geography Hierarchy known as a brick). The table causing me the dramas (OwnershipRoleDiscipline) joins to the next level up in the geographyhierarchy (known as an SRA), the clients discipline and the reps OwnershipRoleID which is essentially their "team"

    2.Identify how to distinguish between existing and potential targets, but keep this separate from query 1 for now. It might be as simple as a grab from sales tables using clients identified by Query 1.

    The potential targets query is the top half of the query whilst the second half of the query is the existing targets. The two parts are separated by the UNION clause

    3.One rep with “disciplines” and products including info to determine “high enough product profile”, chosen using the rep parameter. If any table is referenced more than once in this query, then provide a full justification for it.

    The high enough product profile is determined by joining to the profile.ProfileValueColour table having a ForDisplay value of 1. It joins via the OwnershipRoleID, the product and the actual and potential values of the client.

    I should also state for people wanting to get the value of the @query to run it in a non-dynamic form the print statement will only spit out the first 5000 characters. You will need to add the rest on yourself.

    Regards

    David

  • DavidDroog (8/5/2015)


    Hi ChrisM

    Replies below

    1.Clients with “disciplines” filtered by geography using the geography parameter. This query should reference client, client address and geography tables plus those required to identify "disciplines". If any other tables are referenced in the query then provide an explanation. If any tables are referenced more than once then provide a full justification.

    The discipline field is in the client table to join to the geography table you join to the address bale via Address.AddressID = Client.PrimaryAddressID. You then join the address table to the Geography table via Address.GeographyBrickReferenceID = Geography.ReferenceID where GeographyTypeID = 8 (The lowest level of the Geography Hierarchy known as a brick). The table causing me the dramas (OwnershipRoleDiscipline) joins to the next level up in the geographyhierarchy (known as an SRA), the clients discipline and the reps OwnershipRoleID which is essentially their "team"

    2.Identify how to distinguish between existing and potential targets, but keep this separate from query 1 for now. It might be as simple as a grab from sales tables using clients identified by Query 1.

    The potential targets query is the top half of the query whilst the second half of the query is the existing targets. The two parts are separated by the UNION clause

    3.One rep with “disciplines” and products including info to determine “high enough product profile”, chosen using the rep parameter. If any table is referenced more than once in this query, then provide a full justification for it.

    The high enough product profile is determined by joining to the profile.ProfileValueColour table having a ForDisplay value of 1. It joins via the OwnershipRoleID, the product and the actual and potential values of the client.

    I should also state for people wanting to get the value of the @query to run it in a non-dynamic form the print statement will only spit out the first 5000 characters. You will need to add the rest on yourself.

    Regards

    David

    Hi David

    Thanks, every little bit helps. However the request was very specific: "Can you make a start by providing these three queries, please:". Can you do this? It's quite fundamental to what I've got in mind and it also gives a head start to any other folks who might wish to chip in. Cheers

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/6/2015)


    DavidDroog (8/5/2015)


    Hi ChrisM

    Replies below

    1.Clients with “disciplines” filtered by geography using the geography parameter. This query should reference client, client address and geography tables plus those required to identify "disciplines". If any other tables are referenced in the query then provide an explanation. If any tables are referenced more than once then provide a full justification.

    The discipline field is in the client table to join to the geography table you join to the address bale via Address.AddressID = Client.PrimaryAddressID. You then join the address table to the Geography table via Address.GeographyBrickReferenceID = Geography.ReferenceID where GeographyTypeID = 8 (The lowest level of the Geography Hierarchy known as a brick). The table causing me the dramas (OwnershipRoleDiscipline) joins to the next level up in the geographyhierarchy (known as an SRA), the clients discipline and the reps OwnershipRoleID which is essentially their "team"

    2.Identify how to distinguish between existing and potential targets, but keep this separate from query 1 for now. It might be as simple as a grab from sales tables using clients identified by Query 1.

    The potential targets query is the top half of the query whilst the second half of the query is the existing targets. The two parts are separated by the UNION clause

    3.One rep with “disciplines” and products including info to determine “high enough product profile”, chosen using the rep parameter. If any table is referenced more than once in this query, then provide a full justification for it.

    The high enough product profile is determined by joining to the profile.ProfileValueColour table having a ForDisplay value of 1. It joins via the OwnershipRoleID, the product and the actual and potential values of the client.

    I should also state for people wanting to get the value of the @query to run it in a non-dynamic form the print statement will only spit out the first 5000 characters. You will need to add the rest on yourself.

    Regards

    David

    Hi David

    Thanks, every little bit helps. However the request was very specific: "Can you make a start by providing these three queries, please:". Can you do this? It's quite fundamental to what I've got in mind and it also gives a head start to any other folks who might wish to chip in. Cheers

    With the weekend coming up and offering folks extended periods of time essential for this kind of task, here's a gentle nudge. I'll get the data loaded this weekend but don't plan on making a start on coding until the remaining questions are addressed.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I've checked the first query text using query diagrammer, namely off-line version of https://snowflakejoins.com/ and i see that

    1) Product p is never used but in this join

    JOIN dbo.Product p

    ON p.ProductID = ppc.ProductID

    What is the reason to have this join, no FK in place or something?

    2) this will hardly affect perfomance but looks like one of two cpp.ProductID =.. predicates is redudant due to predicate of previous inner join

    AND ppc.ProductID = cpa.ProductID

    JOIN Profile.ClientProductProfile cpp

    ON cpp.ProductID = cpa.ProductId

    AND cpp.ProfileValue = ppc.Potential

    AND cpp.ProductID = ppc.ProductID

    Serg

  • serg-52 (8/7/2015)


    I've checked the first query text using query diagrammer, namely off-line version of https://snowflakejoins.com/ and i see that

    1) Product p is never used but in this join

    JOIN dbo.Product p

    ON p.ProductID = ppc.ProductID

    What is the reason to have this join, no FK in place or something?

    2) this will hardly affect perfomance but looks like one of two cpp.ProductID =.. predicates is redudant due to predicate of previous inner join

    AND ppc.ProductID = cpa.ProductID

    JOIN Profile.ClientProductProfile cpp

    ON cpp.ProductID = cpa.ProductId

    AND cpp.ProfileValue = ppc.Potential

    AND cpp.ProductID = ppc.ProductID

    Serg

    You can see where the questions are headed and why, Serg. The original query has sufficient code smells to label it as "coding by accident".

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 9 posts - 31 through 38 (of 38 total)

You must be logged in to reply to this topic. Login to reply