Top 1 makes very fast query into very slow query

  • I have a very simple query:

    SELECT  AkcesAutoID  FROM
    ( SELECT
    P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
    FROMdbo.Podrobnosti P
    INNER JOINRO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
    INNER JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP

    Order By PP.Rok, PP.AkcesitPred, PP.Akcesit

    It returns 281 records, from a quite modest set of tables: Podrobnosti around 120,000 records, Akces around 60,000 and the fntStringsToTable TVF exactly one record, and does so in a fraction of a second. But when I change the query to return only the first record, and NO other changes, the query goes nuts.

    SELECT Top 1 AkcesAutoID  FROM
    ( SELECT
    P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
    FROMdbo.Podrobnosti P
    INNER JOINRO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
    INNER JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP

    Order By PP.Rok, PP.AkcesitPred, PP.Akcesit

    Execution time jumps to several minutes, and the execution plan is right out of the twilight zone - almost 600 MILLION records returned, cost percentages over one million percent, and so on. I've found a number of beefs with a similar theme on the net, but no definite solution. They all say essentially, "Try wording the query differently and see what happens."

    I did try different things, and swapping the TVF join for a WHERE clause fixes the problem, but I also see many posts claiming that JOINs are generally preferred over WHERE clauses. And it doesn't make sense anyway. I already have the sort in place - the recordset built is identical in both versions. All I want is to return only the first record, rather than all 281 of them. Can anyone shed some light on this?

    • This topic was modified 3 years, 7 months ago by  pdanes.
    Attachments:
    You must be logged in to view attached files.
  • Yeah, seems like a bug in SQL optimizer.  I'd make sure SQL didn't use a LOOP join (gack!) for the two main tables.  I think the hint will also force the main tables to be joined before the Ltrs table.  Typically EXISTS() is more efficient than JOIN, assuming only one row in Ltrs, which seems to be the case.

    ( SELECT
    P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
    FROMdbo.Podrobnosti P
    INNER HASH JOIN dbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID
    INNER JOINRO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter ) PP

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Yep, that did it. Inner HASH Join, instead of Inner Join, and the query runs normally. The execution plan is still slightly more convoluted than without the Top clause, but the insanity is gone and the speed is back. Thank you - any idea what is going on, and how to avoid it in general?

    • This reply was modified 3 years, 7 months ago by  pdanes.
  • It also doesn't matter where I put the HASH directive. Both of these work fine, and yield what looks to be the same execution plan:

    INNER JOIN RO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
    INNER HASH JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP
    INNER HASH JOIN RO.fntStringsToTable ('J') Ltrs on Ltrs.EvidenceLetter = P.EvidenceLetter
    INNER JOIN dbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP
  • BTW, the TVF does not always have only one record. Usually one, but it is possible to have almost 50. However, that would be an extreme case, and probably an error in using the application, or one of my edge case tests. In actual service, it's mostly one, occasionally up to around four.

  • If it could ever have more than one, then it needs to be a JOIN, just like you have it.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Well, I was thinking of replacing the entire JOIN clause with WHERE P.EvidenceLetter In('J') [or In('J','K','L')]  and so on. I tried that in my initial tests, and it also made the problem go away, but I would have to change a fair amount of code that generates the dynamic SQL. Simply plopping in the HASH was a lot simpler.

    But I have to wonder now, will I screw up some other queries that don't like the HASH directive? I have a bunch of testing ahead of me anyway, over some other development, so this is something I will be watching.

  • Yes, in theory could be less efficient in some cases than a LOOP join.  But it's still reasonable performance.  You'll never totally drop off a performance cliff going from LOOP to HASH, but you can vice versa (as you're seen).

    Another way to do it, although it would then effect every join in the query and not just a specific join, is to use:

    OPTION(HASH JOIN, MERGE JOIN)

    MERGE joins are extremely efficient when applicable, so you don't want to prevent those.  So this just prevents LOOP joins.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Looking at the two plans, it seems that the TOP 1 version tries to avoid the sort.   (With disastrous results.)    Apparently it can read the correct sequence from the Akces table.    I believe it is then doing a loop join to both the the Podrobnoti table and the fntStringsToTable function, assuming that it will get a "hit" fairly quickly.    But it has to join the entirety of the the two tables before it can ever join to the function.

    I know you already have a fix, but just out of curiosity, I would like to see what the table-valued function does.   I assume it is a multiline table valued function function.    At our shop, best practice is to avoid multiline table valued functions and to APPLY, not JOIN, inline table valued functions.

     

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The query could be simplified, no?  Maybe something like this

    select top(1) p.AkcesAutoID
    from dbo.podrobnosti p
    join ro.fntStringsToTable('J') ltrs on ltrs.EvidenceLetter = p.EvidenceLetter
    join dbo.Akces a on p.AkcesAutoID = a.AkcesAutoID
    order by a.Rok, a.Akcesitpred, a.Akcesit;

    "I was thinking of replacing the entire JOIN clause with WHERE..."  Nah, I wouldn't do it.  A SELECT statement is a procedural sandwich which is evaluated in a fixed order (in most cases).  FROM, ON, and JOIN are evaluated 1st, 2nd, 3rd.  My tendency is to load up as much as possible in the FROM clause because it seems to make the rest of the query simpler and easier to get right.  It's also somewhat more self-documenting imo

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • The Dixie Flatline wrote:

    Looking at the two plans, it seems that the TOP 1 version tries to avoid the sort.   (With disastrous results.)    Apparently it can read the correct sequence from the Akces table.    I believe it is then doing a loop join to both the the Podrobnoti table and the fntStringsToTable function, assuming that it will get a "hit" fairly quickly.    But it has to join the entirety of the the two tables before it can ever join to the function.

    I know you already have a fix, but just out of curiosity, I would like to see what the table-valued function does.   I assume it is a multiline table valued function function.    At our shop, best practice is to avoid multiline table valued functions and to APPLY, not JOIN, inline table valued functions.

    Here you go. It's pretty trivial - just turns a comma-delimited string into a single-column table. The individual string elements are one or two letters, indicating a collection catalog. Mostly people work within a single catalog, but a few operations, mostly searches, can cross catalog boundaries.

    CREATE FUNCTION [RO].[fntStringsToTable] (@LtrFiltr varchar(1000))
    RETURNS
    @Ltrs TABLE(EvidenceLetter varchar(2) not null
    primary key clustered (EvidenceLetter)
    )

    with schemabinding
    AS
    BEGIN
    declare @CommaPos int
    while LEN(@LtrFiltr) > 0
    begin
    set @CommaPos = CHARINDEX(',',@LtrFiltr,1)
    if @CommaPos > 0
    begin
    insert @Ltrs (EvidenceLetter) Select left(@LtrFiltr,@CommaPos-1)
    set @LtrFiltr = SUBSTRING(@LtrFiltr,@CommaPos+1,len(@LtrFiltr)-@CommaPos+1)
    end
    else
    begin
    insert @Ltrs (EvidenceLetter) Select @LtrFiltr
    set @LtrFiltr = ''
    end
    end
    return
    END
  • ScottPletcher wrote:

    Yes, in theory could be less efficient in some cases than a LOOP join.  But it's still reasonable performance.  You'll never totally drop off a performance cliff going from LOOP to HASH, but you can vice versa (as you're seen).

    Another way to do it, although it would then effect every join in the query and not just a specific join, is to use:

    OPTION(HASH JOIN, MERGE JOIN)

    MERGE joins are extremely efficient when applicable, so you don't want to prevent those.  So this just prevents LOOP joins.

    Wow, didn't even know you could do that. I'm fairly competent with basic SQL, but there are so many variations, and new elements being added all the time. Makes my head spin trying to keep up.  Thank you, I'll give that a try as well.

  • You could maybe replace the whole WHILE LOOP'ing function with Eirikur's LEAD splitter.  If this is SQL Server 2016+ you could use the built-in STRING_SPLIT itvf (since the order of the split string(s) doesn't seem to matter (which is another reason to leave in the FROM clause imo)).  Something like this

    select top(1) p.AkcesAutoID
    from dbo.podrobnosti p
    join dbo.DelimitedSplit8K_LEAD('J') ds on p.EvidenceLetter=ds.Item
    join dbo.Akces a on p.AkcesAutoID=a.AkcesAutoID
    order by a.Rok, a.Akcesitpred, a.Akcesit;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    The query could be simplified, no?  Maybe something like this

    select top(1) p.AkcesAutoID
    from dbo.podrobnosti p
    join ro.fntStringsToTable('J') ltrs on ltrs.EvidenceLetter = p.EvidenceLetter
    join dbo.Akces a on p.AkcesAutoID = a.AkcesAutoID
    order by a.Rok, a.Akcesitpred, a.Akcesit;

    "I was thinking of replacing the entire JOIN clause with WHERE..."  Nah, I wouldn't do it.  A SELECT statement is a procedural sandwich which is evaluated in a fixed order (in most cases).  FROM, ON, and JOIN are evaluated 1st, 2nd, 3rd.  My tendency is to load up as much as possible in the FROM clause because it seems to make the rest of the query simpler and easier to get right.  It's also somewhat more self-documenting imo

    It could be simplified, and I tried that when testing, before I posted the problem, but it didn't help much. The major bugaboo seems to be the join to the TVF. Anyway, this query is built by a series of functions - one that opens the select clause, another that generates the fields, another for the joins, another for inner Where, outer Where, and finally sort. It's the result of years of trying different approaches to developing and maintaining all the queries that feed the final user app. It's not the absolute peak in performance - the queries generated could usually be tweaked for better response, but the performance is more than adequate for the setup I'm working with, and it makes both maintenance and additional development much more manageable.

    I originally trotted out the idea at a SQL in the City meeting, back when they were still being conducted as real meetings in London. Several heavyweights told me the idea stunk (although they were much more polite about it), but I kept at it and got it working pretty well. I probably couldn't get away with it in a heavily loaded environment, but I have the luxury of a machine that is loafing around 99% of the time, and has essentially unlimited disk space. As long as individual queries respond reasonably well, around 1 second or less (and almost all do - even with my less than optimal dynamic query generation, response time is usually instant), everyone is happy. Resource contention, blocking, deadlocks, memory pressure - all that sort of stuff simply doesn't exist for me.

    A thought, though - what about putting the Where clause into the From, as a subquery:

    SELECT  AkcesAutoID  FROM
    ( SELECT
    P.AkcesAutoID, A.AkcesitPred, A.Akcesit, A.Rok
    FROM(Select * From dbo.Podrobnosti Where EvidenceLetter In ('J')) P
    INNER JOINdbo.Akces A ON P.AkcesAutoID = A.AkcesAutoID) PP
    Order By PP.Rok, PP.AkcesitPred, PP.Akcesit

    I'll give that a try at work tomorrow and see what happens.

  • Steve Collins wrote:

    You could maybe replace the whole WHILE LOOP'ing function with Eirikur's LEAD splitter.  If this is SQL Server 2016+ you could use the built-in STRING_SPLIT itvf (since the order of the split string(s) doesn't seem to matter (which is another reason to leave in the FROM clause imo)).  Something like this

    select top(1) p.AkcesAutoID
    from dbo.podrobnosti p
    join dbo.DelimitedSplit8K_LEAD('J') ds on p.EvidenceLetter=ds.Item
    join dbo.Akces a on p.AkcesAutoID=a.AkcesAutoID
    order by a.Rok, a.Akcesitpred, a.Akcesit;

    I can't use STRING_SPLIT (yet), I'm still on 2014. I have a virtual 2019 machine online, but still waiting for the staff to sort out some connection bugs - I can experiment with it and talk to the instance with SSMS, but I can't get an ODBC connection, which is kind of important. The app is in MSACCESS, and ODBC is how it communicates with the backend. None of the staff are DB types, and I have no admin rights in the network (domain), so troubleshooting this kid of stuff is problematic. They don't know what I need, and I can't test things on my own.

    But I don't think the performance of the TVF is really the issue. The problem is SQL Server is generating a shitty plan when I add the TOP 1 specifier. A faster function may run somewhat faster, but not nearly enough to make up for the abysmal performance hit from a bad execution plan.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply