Subquery with "OR" strange problem.

  • Hi, guys. I would like to ask you a question to prevent me going insane. I stumbled upon something yesterday that hooked me and bothers me tremendously. My super simple SQL does not work as expected and I am looking for somebody to help explain why. I do not need a resolution or rewriting this SQL in a different way. I need to understand WHY this super simple SQL does not work.

    It is SQL 2008 with no traffic. There are two tables Big and Small. Big table is about 500K rows and Small is about 500 rows. The datapype of the columns used are nvarchar(30), the tables are well organized(PK, indexes, itc…)

    SELECT * FROM Customers

    Where Customers.Name like 'Rom%' OR Name In (Select DerivitiveName from Names where MasterName = Customers. Name)

    Both the “like” clause and subquery run a fraction of a second but combined together they run painfully slow, 18 seconds. The subquery usually returns just a few rows. Even if I rewrite it to be one record it still runs identically slow!

    SELECT * FROM Customers

    Where Customers.Name like 'Rom%' OR Name = (Select DerivitiveName from Names where MasterName = Customers. Name and ID =323)

    A Union with both parts of WHER section runs instantly, so the correlated query but this one does not want to run fast! Can somebody to give me a reason WHY it does not work and I am clueless in this situation.

  • Try running it with both the OR and the UNION and compare the query plans. Save the plans and post them here if possible.

  • It would help if you would post the DDL for the tables including index definitions and the actual execution plans of both the fast and slow queries as .SQLPLAN files.

  • What happens if you use a unicode data type to query the data? Might be an issue with implicit conversion. Hard to tell without the actual execution plans and the limited information available....

    SELECT * FROM Customers

    Where Customers.Name like N'Rom%' OR Name In (Select DerivitiveName from Names where MasterName = Customers. Name)



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Most likely (almost certainly!) this query produces a plan with two scans and a nested loop. For each row in the outer (Customers) it scans the inner part, something like a constant scan for [font="Courier New"]Customers.Name like 'Rom%'[/font] and a full scan of

    Names for [font="Courier New"]OR Name In (Select DerivitiveName from Names where MasterName = Customers. Name)[/font]

    ....This might take a while....!

    😎

  • Yes, I am thinking the same: it might correlates each row of the large table with “OR” part of the small table (subquery ) as if I have no result out of subquery then I have no speed problem. The whole point is that it not shall be happening (at least in my view) and with this type of selects I never had any issuers before. I am pondering about what happens I this specific case. I will post the plan tomorrow.

  • As a general rule, including an OR in the WHRE clause will prevent the optimiser from using an index. This can be confirmed in your situation by looking at the query plans.

    One standard way to improve performance where you need to use an OR is to do two separate queries joined by a UNION ALL (or a UNION if you need to eliminate duplicates). Each of the queries can use an index, so the overall performance can be faster.

    You need to try different options, look at the query plan, measure the performance, and decide what is best for your situation.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Thanks guys for your help. Union of cause does fine, so other options I have tried, and the plan looks like did not show anything abnormal. The question I was straggling was why THAT query did not work - everything what I know suggested me that it shall be fine and I modeled this stupid simple scenario over weekend with much larger amount of data and it was flying as expected. I just get to the client side and to my huge surprise (and not without some sadistic happiness) I discovered that the lager table uses NVARCHAR and conversion begin VARCHAR to NVARCHAR cause the understandable performance hit. Well, my faith in God has been restored. Lutz you got the prize, thank you very much to everybody.

    Kind Regards,

    Romy the Cat

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply