Conditional WHERE Clauses and Boolean Algebra

  • Hello All,

    First, good job Tony for the article. If anything, always good to put ideas out there for a vibrant debate.

    Now, I can unequivocally state that the performance aspect of using this technique is "OK" for very, very small datasets but once you have any real amount of data in a table to sift through, the performance degrades very quickly. As well, if you have many columns to filter on, it just drags the performance down even more.

    Having been bitten by this technique before, I switched back to dynamic SQL to take care of the job.

    I place all SQL commands in stored procs anyway, so use of dynamic SQL is acceptable security-wise (still have to be vigilant against injections...check your variables!).

    Again, with any technique you choose, test...test...test and check every aspect, from what I like to call "cold start" executions (the first run of a query), which give a good estimate of what the "worst" performance of your query would be (subsequent calls are served from the SQL plan cache / precompiled), through to searching a large set of data.

    I really liked the methodology at first but once it was placed into a real world scenario, it just couldn't cut the mustard.

    2 cents...

    Dave

    Yes, but it's not as flexible, i.e. when doing something like:

    select * from table where (@date is null or [datecol] < @date)

    Either way though, for best performance with this type of WHERE clause, stick to dynamic SQL. It will outperform any other approach described in this forum.

    The proof:

    Tested on an items table with twenty-something million records and an index on the [date] and [qty] fields. Doing:

    a) select [date], count(qty) from items
       where [date] < @date
       group by [date]

    b) select [date], count(qty) from items
       where [date] < isnull(@date, [date])
       group by [date]

    c) select [date], count(qty) from items
       where (@date is null or [date] < @date)
       group by [date]

    Query A takes 3 seconds and B & C each take 22 seconds. BIG difference.

    In the latter two the optimizer decides to do an index scan, whereas the first does an index seek (obviously much faster on a large table).
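
    For completeness, here is a minimal sketch of the dynamic SQL version I mean (my illustration, not the exact code I tested; it assumes the same items table and [date]/[qty] columns). Building the statement and executing it through sp_executesql keeps @date as a real parameter, so the optimizer can still seek when the filter is present:

    -- Build the filter only when @date is supplied
    declare @date datetime = '2010-01-01';  -- example value; NULL means "no date filter"
    declare @sql nvarchar(max) = N'select [date], count(qty) from items';

    if @date is not null
        set @sql = @sql + N' where [date] < @date';

    set @sql = @sql + N' group by [date]';

    -- The value is passed as a parameter, not concatenated into the string
    exec sp_executesql @sql, N'@date datetime', @date = @date;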

  • Thanks, Eric Wahner-345205.

    I already noticed and corrected the mistake.

  • I would also add that when teaching the benefit of using good mathematical principles, you should pick a better example query... not one that is, at best, teaching people how to write poor queries. If you look at the results above (the real-world example posted), the numbers clearly back the point of view that you should steer clear of this method. Sorry to rain on the parade.

  • opc.three (8/31/2010)


    Rob Fisk (8/31/2010)


    opc.three (8/31/2010)


    Rob Fisk (8/31/2010)


    Very good points at the start of the article and I may do some refresher reading on some bits, but for the specific example would something like this not be better?

    CREATE PROCEDURE [spSearchPeople]
        @firstname nvarchar(50) = '',
        @lastname nvarchar(50) = ''
    AS
    BEGIN
        SELECT ID, FirstName, LastName FROM People
        WHERE FirstName LIKE '%' + @firstname + '%'
        OR LastName LIKE '%' + @lastname + '%'
    END

    This approach does not correctly answer the search question when @firstname and @lastname are not null.

    The parameters are defaulted to an empty string rather than null.

    OK, there is the front-end caveat that you don't pass a parameter unless it's needed, but given that, there is no need to use ISNULL within the query.

    The problem is not with the default value of the parameters, it is with the boolean logic. If someone provides Jim for @firstname and Thorpe for @lastname, your query will incorrectly return people with the name Jim Smith, Jim Williams, etc.

    Silly me. Of course it just needs changing to AND and it's all good.

    Nice spot. It was such a basic error that I didn't see it even when looking again to see if I had missed something.

    That's another issue with complex boolean logic, of course: the simplest of things can be a mare to track down. Especially if you start throwing the odd NOT into the mix.

    To be fair though, the article's main point is that you should be familiar with it, which should make following the logic somewhat easier both when building and troubleshooting.
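
    For reference, a minimal sketch of the corrected procedure with the AND fix (defaults and wildcards kept from the original example; note the leading '%' will still force a scan, which is the performance point raised elsewhere in this thread):

    CREATE PROCEDURE [spSearchPeople]
        @firstname nvarchar(50) = '',
        @lastname nvarchar(50) = ''
    AS
    BEGIN
        -- AND instead of OR, so both filters must match;
        -- an omitted parameter defaults to '' and '%' + '' + '%' matches every non-NULL value
        SELECT ID, FirstName, LastName FROM People
        WHERE FirstName LIKE '%' + @firstname + '%'
          AND LastName LIKE '%' + @lastname + '%'
    END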

    [font="Arial"]_______________________________________________________
    Change is inevitable... Except from a vending machine.[/font]

  • A more elegant and better-performing COALESCE solution:

    firstname = COALESCE(@firstname, firstname)
    and lastname = COALESCE(@lastname, lastname)

    ...and one more comment:

    Dynamic SQL always has to be a last and least-favorite choice, because it is not a precompiled query, among many other valid reasons.

    http://www.sommarskog.se/dynamic_sql.html
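
    A minimal sketch of the COALESCE form in context (table and column names are borrowed from the People example earlier in the thread; this assumes the columns are not nullable, since COALESCE(@p, col) never matches a row where col is NULL):

    DECLARE @firstname nvarchar(50) = NULL;        -- NULL means "no filter on first name"
    DECLARE @lastname  nvarchar(50) = N'Thorpe';

    SELECT ID, FirstName, LastName
    FROM People
    WHERE FirstName = COALESCE(@firstname, FirstName)   -- NULL parameter matches every row
      AND LastName  = COALESCE(@lastname, LastName);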

  • While I agree with some of the initial arguments in the article, particularly those around familiarizing oneself with set logic, boolean algebra, etc., I strongly disagree with the approach proposed. While very simple, it does NOT optimize correctly. There is no short-circuiting in SQL. The first step SQL takes when running a query is to normalize it into an object graph (a hash of which determines whether a matching execution plan already exists). As a result, pretty much ANY @Variable = Value check will result in a full scan. In some cases an @Parameter = Value check may behave differently when contained in a stored proc: at compile time, @Parameter values are considered specific values, whereas @Variables declared inside the procedure are considered unknowns and will ALWAYS result in a scan with this approach (see BOL: Statistics Used by the Query Optimizer). You can use the OPTIMIZE FOR hint to counter this behavior; OPTION (RECOMPILE) also works.

    This is one of the most common patterns I have identified as causing problems in enterprise-level systems, because of the forced scan behavior. Generally I recommend one of two approaches: dynamic SQL using sp_executesql with binding (this treats the values correctly as parameters), which costs extra due to likely recompiles; or, in cases where you have no more than three optional parameters, check the parameters with IFs and execute only the query that matches your specified arguments. Either of these approaches will optimize correctly and yield MUCH better results.

    In rare cases I would accept this pattern, as long as there were sufficient non-NULLable arguments to ensure that the optimizer performed SEEK operations on the non-nullable values.
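
    A minimal sketch of the IF-branching approach for a small number of optional parameters (procedure and column names reuse the People search example from earlier in this thread; this is an illustration, not code from a production system):

    CREATE PROCEDURE spSearchPeople_Branched
        @firstname nvarchar(50) = NULL,
        @lastname  nvarchar(50) = NULL
    AS
    BEGIN
        -- One fixed query per parameter combination, so each branch can seek on its own index
        IF @firstname IS NOT NULL AND @lastname IS NOT NULL
            SELECT ID, FirstName, LastName FROM People
            WHERE FirstName = @firstname AND LastName = @lastname;
        ELSE IF @firstname IS NOT NULL
            SELECT ID, FirstName, LastName FROM People
            WHERE FirstName = @firstname;
        ELSE IF @lastname IS NOT NULL
            SELECT ID, FirstName, LastName FROM People
            WHERE LastName = @lastname;
        ELSE
            SELECT ID, FirstName, LastName FROM People;
    END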

  • Tony Alicea (8/31/2010)


    As far as efficiency - in 14 years I've always found this technique to be acceptable in the realm of efficiency and (especially in the old days) was given weight over closing the potential security holes of dynamic SQL.

    Building Parameterized Dynamic SQL and then invoking it via EXEC sp_executesql and passing back in the parameters will outperform this by a mile.

    No more (field IS NULL OR field = @parameter) junk. If the parameter is NULL it's not part of the WHERE clause.

    Read here.

    http://www.sommarskog.se/dyn-search-2005.html

    This is applicable even in SQL 2008 R2 because the new parameter sniffing screws up more than it fixes.
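
    A minimal sketch of that pattern (names reuse the People example from this thread; the code is an illustration of the technique, not a quote from the linked article):

    DECLARE @firstname nvarchar(50) = N'Jim';    -- NULL means the predicate is omitted entirely
    DECLARE @lastname  nvarchar(50) = NULL;

    DECLARE @sql nvarchar(max) = N'SELECT ID, FirstName, LastName FROM People WHERE 1 = 1';

    -- Append only the predicates that are actually needed
    IF @firstname IS NOT NULL SET @sql = @sql + N' AND FirstName = @firstname';
    IF @lastname  IS NOT NULL SET @sql = @sql + N' AND LastName = @lastname';

    -- Values are passed as real parameters: no injection risk, and each distinct
    -- SQL string gets its own cached plan
    EXEC sp_executesql @sql,
         N'@firstname nvarchar(50), @lastname nvarchar(50)',
         @firstname = @firstname, @lastname = @lastname;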




  • WayneS (8/31/2010)


    For those that have read this far, I'd like to suggest that you read this blog by Gail Shaw on working with this type of query (a catch-all query).

    Another great article about the same thing I quoted. 🙂
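
    For anyone who can't follow the link, another option mentioned earlier in this thread for the catch-all form is OPTION (RECOMPILE), which keeps the IS NULL / OR pattern but recompiles the plan for the actual parameter values (a sketch using the People example, not a quote from the blog):

    SELECT ID, FirstName, LastName
    FROM People
    WHERE (@firstname IS NULL OR FirstName = @firstname)
      AND (@lastname  IS NULL OR LastName  = @lastname)
    OPTION (RECOMPILE);   -- plan is built for the current parameter values on each execution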




  • The main concern is about the use of any indexes that happen to be present on the tables queried.

    Though it is quite subjective to evaluate the terms, consider the words "small", "medium", and "large" as referring to the size of the table or to the size of the resulting rowset:

    For "small" you can use ISNULL or COALESCE as suggested by pfranceschini.

    For "medium" you can use AND/OR sequences as suggested by tony.

    For "large" you can use temp table:

    FIRST - using "medium" approach to create a temp table with the rows that match indexed fields, the temp table will have index on the fields that are NOT indexed in the original table

    AFTER - query the temp table using remaining field not indexed in the original table but indexed in the temp.

    I think each case must be worked on separately; there isn't a general rule of thumb! We must pick whichever solution is worth the most for the case at hand.
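
    A minimal sketch of the two-step "large" approach (the People table and columns are borrowed from the earlier example, and the assumption here is that LastName is indexed in the original table while FirstName is not):

    DECLARE @firstname nvarchar(50) = N'Jim';
    DECLARE @lastname  nvarchar(50) = N'Thorpe';

    -- FIRST: filter on the indexed column into a temp table,
    -- then index the column that the original table does not cover
    SELECT ID, FirstName, LastName
    INTO #candidates
    FROM People
    WHERE (@lastname IS NULL OR LastName = @lastname);

    CREATE INDEX IX_candidates_FirstName ON #candidates (FirstName);

    -- AFTER: apply the remaining filter against the (much smaller) temp table
    SELECT ID, FirstName, LastName
    FROM #candidates
    WHERE (@firstname IS NULL OR FirstName = @firstname);

    DROP TABLE #candidates;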

  • I agree that the approach described is the most rational, and therein lies the irony that is Microsoft. Please correct me if I am mistaken, but this approach will yield an index scan, while the dynamic SQL approach can get an index seek, which is faster.

    The irony is even more exasperating when one considers that Microsoft sells a developer environment that provides all the bells and whistles, like color coding for keywords and on-the-spot compiling to identify what would otherwise be a runtime error.

    But then, for a very common task like the one described in this article, the optimal solution is to code it with dynamic SQL. This forces the developer to sacrifice all the benefits of that color coding and on-the-spot compiling.

  • About a year ago, someone in our company started using this approach to make stored procedures generic enough so multiple places in the application can use the same stored procedure. BAD MOVE! This practice brought the system to its knees. Bad plans are cached, high I/O, table scans, heavy blocking and deadlocks are just some of the problems with using OR statements in WHERE clauses.

    We went through a huge exercise to remove ALL of this code from stored procedures but we still find one we missed every once in a while. They will eventually show up on our daily list of stored procedures with the most reads.

    I'd recommend against this practice.

  • Tim Brooks (8/31/2010)


    I've seen this construct used many times, and typically it is a little poor on performance. An index on First and / or Last Name will usually be used, for sure, but it will tend to be an index scan, rather than a seek. I've tried many variations to get this to change, but with no joy as yet.

    Yes you are correct. I've used this trick myself for many years. Performance of it was pretty bad in some situations. I think, if you are using SQL Server, the dynamic query can be faster - SQL Server will look at your stored proc, and generate and cache a different plan for each possibility of the dynamic query. If certain options are heavily used, you will see an improvement in those, possibly at the expense of more rarely used options. Dynamic SQL in general should probably be avoided, but in the case where you are doing a search query with a lot of optional parameters, I think it can be used effectively.

    But the conditional where clause (if that's what we're calling it now) is a good way to do it if you only have one or two optional search parameters... when you get over two, it gets ugly.

  • Brannon Weigel (8/31/2010)


    About a year ago, someone in our company started using this approach to make stored procedures generic enough so multiple places in the application can use the same stored procedure. BAD MOVE! This practice brought the system to its knees. Bad plans are cached, high I/O, table scans, heavy blocking and deadlocks are just some of the problems with using OR statements in WHERE clauses.

    We went through a huge exercise to remove ALL of this code from stored procedures but we still find one we missed every once in a while. They will eventually show up on our daily list of stored procedures with the most reads.

    I'd recommend against this practice.

    Any method can be abused; that doesn't make it "always bad" - see my earlier post. If this technique is used correctly, performance can be pretty good, but the situation you describe is extreme. I have seen that many times in various places - someone gets the idea that one technique can solve every problem, and pretty soon you have a real mess. Like anything else, this will bite you if you use it wrong.

  • Jeffrey Wiener (8/31/2010)


    But then, for a very common task as described in this article, the most optimal solution is to code it with dynamic SQL. This then forces the developer to sacrifice all the benefits of that color coding and on-the-spot compiling.

    And? Proper stored procs "in theory" have no optional parameters. Because these are optional parameters, what you're getting is a cache with a different plan for each combination of parameters... it's actually very cool that it works this way.

    I like to think of my dynamic procedures like this as stored procedures that write stored procedures... it makes me smile.



