Improving Performance for Some LIKE “%string%” Searches (SQL Spackle)

  • As several folks have pointed out, it is not necessary to use the IS NOT NULL test in a CTE to get the INDEX SEEK, as the same query plan results when that filter is put in the WHERE clause. The blog I referenced in the article in fact shows it that way (if memory serves).

    The reason I left it in the CTE for this article has to do with the testing I was doing. I had several equivalent constructs to get the INDEX SEEK. When I was testing against a 10M row table and checking the timings, for some reason the CTE version was slightly faster than the others.

    So I left it that way in case the speed gain might be more obvious when others came along to test the hypothesis.

    I cannot account for the speed difference of the CTE, except to say it was noticeable. The query plans were all the same.
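
    For reference, here is a minimal sketch of the two equivalent constructs under discussion. The table and column names (dbo.StringsTest, AString) are taken from the test code quoted later in this thread; the search string is illustrative.

    -- CTE form: the IS NOT NULL filter lives inside the CTE
    WITH Pruned AS
    (
        SELECT AString
        FROM dbo.StringsTest
        WHERE AString IS NOT NULL
    )
    SELECT AString
    FROM Pruned
    WHERE AString LIKE '%221%';

    -- Equivalent form: the same filter moved into the WHERE clause
    -- produces the same query plan (and the same INDEX SEEK)
    SELECT AString
    FROM dbo.StringsTest
    WHERE AString IS NOT NULL
    AND AString LIKE '%221%';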

    Thanks everyone for taking a look on this reboot of the article!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • Jeff Moden (1/16/2015)


    Hi Dwain,

    Great article, as usual. It spawned some pretty good discussion, as well. Well done and thanks for taking the time to write something like this.

    I did find, as someone else mentioned on this thread, that adding a sargable lookup does the same thing as the CTE does. If you take that one step further, you can improve the query and also ignore blanks as well as NULLs using the following. Of course and as always, YMMV and "It Depends". 😀

    SELECT AString
    FROM dbo.StringsTest
    WHERE AString > ''
    AND AString LIKE '%221%'
    ;

    Thanks Jeff for the kind words as usual.

    The thing I still haven't figured out is why I couldn't find anything on this topic in my searches. It seemed so obvious when I hit upon it, I can't believe that I'm the first person to surface these results. More likely my skills with Google need improvement.



  • dwain.c (1/18/2015)


    The thing I still haven't figured out is why I couldn't find anything on this topic in my searches. It seemed so obvious when I hit upon it, I can't believe that I'm the first person to surface these results. More likely my skills with Google need improvement.

    Nope. I think your Google skills are probably just fine. You're probably one of the first to make this realization, especially concerning leading "%" LIKEs.

    I got taught a lesson (thank you Paul White for the correction) about this (although not specifically about LIKE) when I wrote a bad QOD (which has been corrected) and it really enhanced my understanding of how indexes worked and how to simplify code.

    I was one of those that believed that ANY non-SARGable line of code in the WHERE clause was a very bad thing. To be sure, nothing beats a WHERE clause where all the lines in the WHERE clause are SARGable but sometimes the data in the table makes figuring that out a real pain or sometimes prevents it from happening at all despite all of the trickery in the world.

    For example, one might write a query against a customer table to return all customers that became customers in a given month. No problem. That makes for 100% SARGability in the presence of the correct index. If we add to that that the customer has a particular column that IS NULL or has a value of "0", you end up with either AND ISNULL(somecolumn,0) = 0 or you end up with (somecolumn IS NULL OR somecolumn = 0). Both suck... or so you would think.

    If you add "somecolumn" to the index (and sometimes without adding it), you still get a SEEK followed by a pretty fast range scan because the leading column of the index is based on the date. "Somecolumn" is just there for the ride and more quickly isolates the number of rows that the key lookup would have to do, which makes it faster. Again, not as fast as something that's 100% SARGable but still very fast and faster than if you didn't add the column to the index, in most cases.

    Of course, the right thing to do would be to make the table column NOT NULL so that you could, indeed, write 100% SARGable code but that's not always possible.
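
    Here is a sketch of that pattern; the table, column, and index names are all hypothetical.

    -- Hypothetical customer table; the leading date column drives the seek.
    CREATE INDEX IX_Customer_StartDate
    ON dbo.Customer (StartDate, SomeColumn);

    -- The date range is SARGable and produces an index seek; the
    -- non-SARGable ISNULL test is evaluated as a residual predicate
    -- against the narrow range of rows the seek returns, so it stays fast.
    SELECT CustomerID
    FROM dbo.Customer
    WHERE StartDate >= '20150101'
    AND StartDate < '20150201'
    AND ISNULL(SomeColumn, 0) = 0;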

    Again, thanks for the great article and all the testing you did. A lot of folks have no clue how long it takes to put something like this together and do the research and testing that it takes to actually be right at the same time. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • lsmith 90637 (1/16/2015)


    A word of caution regarding CTEs.

    I have run into problems using CTEs against very large tables, especially on the 2008 line of SQL Server, where the CTE can cause something approaching vapor lock. It gets worse with nested CTEs, because the queries inside the CTEs are re-executed every time they are referenced. The symptom, when watched with SP_WHO2, is that the query fails to progress for many minutes, even hours. It consumes CPU and blows out to a large number of parallelized executions.

    Refactoring the same query to replace the CTE with a derived table, or dumping the CTE results to a #temp table first, restores the query to sub-minute performance.

    It seems that there may have been a defect in 2008R2 that was resolved in 2012. In other words, we could readily reproduce in 2008R2 but not in 2012.

    We've removed all use of CTE from our code, except for recursive queries. Even there, we dump the results to a #temp table and use the #temp thereafter.

    Sledge hammer to swat a fly.

    CTEs are not the problem here. After all, they are nothing more than an alternate way to write subqueries (except for recursive CTEs). Use them to make your code neater.

    It is true though that in some cases you get better performance with named temp tables. As always, it depends.
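
    For anyone who wants to try the #temp table workaround described above, a minimal sketch (all names hypothetical):

    -- Materialize the intermediate result once, instead of letting a
    -- multiply-referenced CTE be re-executed for every reference.
    SELECT KeyCol, Total = SUM(Amount)
    INTO #Staged
    FROM dbo.BigTable
    GROUP BY KeyCol;

    -- Every later reference now reads #Staged rather than re-running
    -- the aggregation against the large base table.
    SELECT cur.KeyCol, cur.Total, prev.Total AS PrevTotal
    FROM #Staged AS cur
    LEFT JOIN #Staged AS prev
    ON prev.KeyCol = cur.KeyCol - 1;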

    Gerald Britton, Pluralsight courses

  • Great article. Thanks for sharing.

    I have used CTEs on a number of occasions to cure slow-running queries, but in my case I believe they worked for entirely different reasons. In each case I was dealing with a query with a massive number of table joins, convoluted logic, etc., mostly caused by badly designed tables.

    In looking at the execution plans I would generally spot a place where a massive amount of data was being pushed through stage after stage of the pipeline only to get reduced to a few records at the very end -- if the filter which reduced the record output so massively had been applied earlier in the pipeline, only a tiny amount of data would have needed to be processed, but the optimizer had timed out before finding an optimal plan.

    I found that if I organized a CTE to do that filtering as a separate piece and then incorporated it into the main query, the optimizer, even though it would generally still time out prior to finding an optimal plan, would generally produce a plan which had my optimization built into it. Sometimes, by coding a few of these optimizations as CTE clauses, I could get some massive improvements in performance.

    Of course, the main problem with hand-optimizing any code is still true -- the next release of the optimizer might handle the original situation spectacularly but perhaps will handle the hand-optimized code less well because the hand-optimized code ties its hands too much.

    There's more art than science, here.

    - Les
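
    A sketch of the shape Les describes, with the highly selective filter isolated in a CTE so the reduction happens early (all names hypothetical):

    -- Reduce the big table to a handful of rows first...
    WITH FilteredOrders AS
    (
        SELECT OrderID, CustomerID
        FROM dbo.Orders
        WHERE OrderStatus = 'FAILED'   -- the highly selective filter
    )
    -- ...then join the small result to the rest of the query, so the
    -- later stages of the plan never see the unfiltered volume.
    SELECT f.OrderID, c.CustomerName
    FROM FilteredOrders AS f
    JOIN dbo.Customers AS c
    ON c.CustomerID = f.CustomerID;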


  • lnoland (2/12/2016)


    Great article. Thanks for sharing.

    I have used CTEs on a number of occasions to cure slow-running queries, but in my case I believe they worked for entirely different reasons. In each case I was dealing with a query with a massive number of table joins, convoluted logic, etc., mostly caused by badly designed tables.

    In looking at the execution plans I would generally spot a place where a massive amount of data was being pushed through stage after stage of the pipeline only to get reduced to a few records at the very end -- if the filter which reduced the record output so massively had been applied earlier in the pipeline, only a tiny amount of data would have needed to be processed, but the optimizer had timed out before finding an optimal plan.

    I found that if I organized a CTE to do that filtering as a separate piece and then incorporated it into the main query, the optimizer, even though it would generally still time out prior to finding an optimal plan, would generally produce a plan which had my optimization built into it. Sometimes, by coding a few of these optimizations as CTE clauses, I could get some massive improvements in performance.

    Of course, the main problem with hand-optimizing any code is still true -- the next release of the optimizer might handle the original situation spectacularly but perhaps will handle the hand-optimized code less well because the hand-optimized code ties its hands too much.

    There's more art than science, here.

    - Les

    A classic example of a case where pulling massive queries apart into smaller ones, then combining them at the end, is the way to go. There's a good video on Channel9 about just that.

    Developing Microsoft SQL Server Databases: (06) Optimizing and Troubleshooting Queries (about 23 minutes in)

    Gerald Britton, Pluralsight courses

  • As usual the simple ideas are the best. Good article.

    From the forum comments I notice that different versions of SQL Server give different results, thereby illustrating that the SQL Server Query Optimiser does get better through time.

    When I started using SQL Server (back in the 6.5 days) there was a trick of putting a completely superfluous, wide-ranging BETWEEN condition in a WHERE clause to force an index seek. From SQL2005 onward that became completely unnecessary and in some cases detrimental.
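
    A hypothetical reconstruction of that old trick (the exact form varied; this is illustrative only, reusing the test table from earlier in the thread):

    -- The BETWEEN spans the entire key range, so it filters nothing,
    -- but on old optimizers it could nudge the plan toward an index seek.
    -- From SQL Server 2005 onward it is unnecessary and sometimes harmful.
    SELECT AString
    FROM dbo.StringsTest
    WHERE AString BETWEEN '' AND 'ZZZZZZZZ'   -- superfluous wide range
    AND AString LIKE '%221%';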

    It is always worth revisiting what we have learned to see if it continues to apply as technology evolves.

    It will probably seem bizarre to today's programmers, but once upon a time garbage collection was seen as a ludicrous waste of resources and languages depending on it would never catch on. We were told always to manage the life cycle of our objects ourselves: always explicitly destroy what you create. These days we are taught not to second-guess garbage collection until we really know what we are doing.

  • Thank you for the enlightenment.

    But in my case the table is large and the column doesn't have NULL values, yet when the query does a string search like %<text>% it still does a scan, even though we have already created an index on the column.

    I learned a long time ago that, if at all possible, you should not use a leading % as that means a full scan of the column. It is better to use a leading character followed by % as it reduces the size of the search set.
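
    For illustration, using the test table quoted earlier in the thread:

    -- Leading wildcard: every value must be examined (a full scan).
    SELECT AString
    FROM dbo.StringsTest
    WHERE AString LIKE '%221%';

    -- Leading literal character: the optimizer can first seek to the
    -- 'A' range of the index, shrinking the set the wildcard tests.
    SELECT AString
    FROM dbo.StringsTest
    WHERE AString LIKE 'A%221%';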
