Best Practice question

  • I have a requirement to pull customer data from a table, filtered by one of three different fields and sorted by one of three fields.  All queries would retrieve only the customer_id into a temp table; then I could join the other customer data once it is "filtered".  Here are the two ways I think I could handle this:

    Here is the example code:

    CREATE TABLE #customer 
    ( customer_id INT IDENTITY,
    customer_num CHAR(6),
    name CHAR(20),
    zip CHAR(5),
    colors CHAR(5))
    INSERT INTO #customer
    ( customer_num, name, zip, colors)
    VALUES
    ('100', 'Dave', '92444' , 'Red'),
    ('101', 'Andy', '51111' , 'Blue'),
    ('102', 'Ralph', '01222', 'Red'),
    ('103', 'Dave', '51111' , 'Green'),
    ('104', 'Mike', '10555' , 'Green'),
    ('105', 'Jane', '32145' , 'Red')

    CREATE TABLE #results (customer_id INT)

    DECLARE @name CHAR(20) = ''
    DECLARE @zip CHAR(5) = '51111'
    DECLARE @colors CHAR(5) = ''

    -- Option 1 - 3 statements
    INSERT INTO #results
    SELECT customer_id
    FROM #customer AS c
    WHERE name = @name
    INSERT INTO #results
    SELECT customer_id
    FROM #customer AS c
    WHERE zip = @zip
    INSERT INTO #results
    SELECT customer_id
    FROM #customer AS c
    WHERE colors = @colors

    SELECT *
    FROM #results AS r


    --- Option 2:
    SELECT customer_id
    FROM #customer AS c
    WHERE name = case when @Name = '' then name else @name end
    AND zip = case when @zip = '' then zip else @zip END
    AND colors = CASE WHEN @colors = '' THEN colors ELSE @colors END


    DROP TABLE #customer
    DROP TABLE #results

     

    Which option is better?  Will I experience a problem on large datasets with the 2nd option?  Are there other ways to do this?

    Thanks.

  • Option 2 is going to give you terrible performance, because the @name = '', @zip = '', and @colors = '' tests in the WHERE clause will force a table scan every time.

    Option 1 will potentially give you duplicates, but it can be tweaked to ensure that you get distinct values.  However, it will not necessarily give you the same results, as it is effectively using OR instead of AND.
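
    For example, a minimal tweak (reusing the temp tables and variables from the original post) that removes the duplicates while keeping the OR behaviour:

    -- UNION (rather than UNION ALL) removes duplicate customer_id values,
    -- but the result is still "name OR zip OR colors", not AND.
    INSERT INTO #results
    SELECT customer_id FROM #customer AS c WHERE c.name = @name
    UNION
    SELECT customer_id FROM #customer AS c WHERE c.zip = @zip
    UNION
    SELECT customer_id FROM #customer AS c WHERE c.colors = @colors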

    You can use dynamic SQL and build up the WHERE clause to include only the fields that you are searching on.

    --- Option 3:
    DECLARE @SQL nvarchar(2000); -- Size as needed
    DECLARE @WHERE nvarchar(100) = NULL; -- Size as needed

    SET @SQL = N'/*INSERT INTO #results ( customer_id )*/
    SELECT c.customer_id
    FROM #customer AS c
    [[WHERE]]
    GROUP BY c.customer_id
    ';

    IF ( @name <> '' )
    BEGIN
        SET @WHERE = ISNULL(@WHERE + ' AND ', 'WHERE ') + 'c.name = @name';
    END;

    IF ( @zip <> '' )
    BEGIN
        SET @WHERE = ISNULL(@WHERE + ' AND ', 'WHERE ') + 'c.zip = @zip';
    END;

    IF ( @colors <> '' )
    BEGIN
        SET @WHERE = ISNULL(@WHERE + ' AND ', 'WHERE ') + 'c.colors = @colors';
    END;

    SET @SQL = REPLACE(@SQL, N'[[WHERE]]', ISNULL(@WHERE, N''));

    EXECUTE sys.sp_executesql @stmt = @SQL
        , @params = N'@name char(20), @zip char(5), @colors char(5)'
        , @name = @name
        , @zip = @zip
        , @colors = @colors;

     

  • The common name for such a query is a "Catch-All" query and "It Depends" strongly comes into play here.  How often will this query be executed?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The example is for a query called by the Customer List page of the web application.  It will execute 1000's of times a day.

  • You'd typically want to use dynamic SQL for this, so that only the relevant columns are included in the comparisons.  You'll also typically want to recompile to get a specific query plan, since each run could be very different from a previous run.

    DECLARE @colors ...
    ...

    DECLARE @sql nvarchar(max)

    SET @sql = 'INSERT INTO #results SELECT customer_id FROM #customer AS c WHERE'

    -- Embed the values as literals (doubling any embedded single quotes), because
    -- variables declared here are not visible inside the batch run by EXEC().
    IF @colors <> ''
        SET @sql = @sql + ' colors = ''' + REPLACE(RTRIM(@colors), '''', '''''') + ''' AND'
    IF @name <> ''
        SET @sql = @sql + ' name = ''' + REPLACE(RTRIM(@name), '''', '''''') + ''' AND'
    IF @zip <> ''
        SET @sql = @sql + ' zip = ''' + REPLACE(RTRIM(@zip), '''', '''''') + ''' AND'

    -- Strip the trailing AND, or the dangling WHERE if no criteria were supplied.
    IF RIGHT(@sql, 4) = ' AND'
        SET @sql = LEFT(@sql, LEN(@sql) - 4)
    ELSE IF RIGHT(@sql, 6) = ' WHERE'
        SET @sql = LEFT(@sql, LEN(@sql) - 6)

    SET @sql = @sql + ' OPTION ( RECOMPILE )'

    PRINT @sql
    EXEC(@sql)

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • mike 57299 wrote:

    The example is for a query called by the Customer List page of the web application.  It will execute 1000's of times a day.

    In that case, please see the following article, which I consider to be the "Holy Grail" for such "Catch-All Query" code.

    https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

    The code is NOT subject to SQL Injection.  It's also super important to avoid recompiles for such code when it's used "thousands of times in a day", because that extra compile load on the system adds up at high usage.  That also means that using sp_ExecuteSQL is a must in order to develop separate stored plans depending on usage.

    In other words, follow Gail's direction in the article I linked above, which does NOT include the use of RECOMPILE or a simple EXEC().
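
    To illustrate the "separate stored plans" point using the Option 3 code posted earlier in the thread: each combination of supplied parameters produces a different statement text, and sp_executesql caches and reuses a plan per text.  For example, these are (minus the commented-out INSERT line) the two statements that code would generate:

    -- Search on zip only -> one cached plan for this text:
    SELECT c.customer_id
    FROM #customer AS c
    WHERE c.zip = @zip
    GROUP BY c.customer_id

    -- Search on name and colors -> a different text, so a separate cached plan:
    SELECT c.customer_id
    FROM #customer AS c
    WHERE c.name = @name AND c.colors = @colors
    GROUP BY c.customer_id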

     

    --Jeff Moden

  • Jeff Moden wrote:

    In that case, please see the following article, which I consider to be the "Holy Grail" for such "Catch-All Query" code. 

    +1, me too.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Jeff Moden wrote:

    mike 57299 wrote:

    The example is for a query called by the Customer List page of the web application.  It will execute 1000's of times a day.

    In that case, please see the following article, which I consider to be the "Holy Grail" for such "Catch-All Query" code.

    https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

    +2, me three

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Actually you could be causing a performance problem by getting SQL to use the same query plan for different values, i.e., by avoiding the "overhead" of a recompile.  The first query executed might require a full scan, but the next one could use an index lookup ... but it won't because you parameterized the query and basically forced SQL to use the same plan.

    It's true that the article doesn't mention using recompile with dynamic SQL.  That may be an oversight rather than a feature.

    The OPTION ( RECOMPILE ) may be redundant with EXEC(), but it doesn't hurt anything if the plan would always be (re)compiled anyway.

    SQL DBA, SQL Server MVP (07, 08, 09)

  • To expand upon this, and maybe provide some ideas that can ease your pain, here are some thoughts.  I had this same situation a while back.  The difference was that there were 125 possible columns that could have been included in the search, and the additional wrinkle was the concepts of "related items" and "like items".  Related items would be something like a gas can when someone searches for a lawnmower; like items were all of the lawnmowers with similar features.  The actual code was what Gail Shaw laid out in the article, with the following twists...

    1. Where did the determination come from that these three parameters are what is needed?  In other words, before you start writing the code, make sure you are filling an actual need.  You may be handling a situation that was required once, 15 years ago.  In my experience, this search evolved over time whenever anyone said "I wish I could search on XXX"; being wonderful people, the developers just kept adding and adding possibilities.
    2. Can you determine what the "default" search condition may be?  As an example, 90% of the time the only value entered by the users is "Name", and the other 10% are some combination of the three parameters.  If you can make an ACCURATE determination, then create 3 procs: a parent proc, which tests which parameters contain values and calls one of the next two procs; a second proc written for the 90%; and a third proc written for everything else (a minimal sketch of this split follows this list).  Again, from my experience: I captured the values of the parameters with each execution of the procedure for 3 months.  I determined that 94% of the time the same single parameter contained a value.  Further analysis uncovered that, as one example, person Z always entered these 2 parameters because they thought they had to.  There were situations where the search used a combination of parameters once a week so that the user could export the data into a spreadsheet and create a "catalog".  And so forth.  Once I figured these things out, a proc was created that was tuned for the 94%, a second proc was created that handled all of the rest of the cases, a report was created to produce the catalog, and we were able to reduce 125 parameters down to 33.
    3. Do you need a like search?  If so, do not make that the default.  Make the user choose to do a like search, and if possible, pop up a nag that says "This is gonna be slow, you should probably go to lunch for a while".  You may want to give three choices: an exact match, a "starts with", and a "like" (see the example at the end of this post).  If the like search is mission critical, there are a number of far better things to do than just using wildcard characters.
    4. Can you validate the users' input and possibly provide them a pick list?  In your example, is there a finite list of colors?  If so, make the users pick the colors.  That eliminates a user entering "gray" when the database contains "grey".  You can also possibly leverage filtered indexes, assuming that there are only a few possible values.  An easy example is a column that contains "Active" or "Inactive".  You create 2 indexes, but again, it depends: one is filtered for active, the other is filtered for inactive.
    5. Depending upon the size of your dataset, and the number of columns you need to return, you may want to consider creating a "search" table.  We had a situation where there was a table that contained about 80 columns and about 20 million rows.  There were 5 columns we searched on, 3 of which were datetime columns.  Almost every search was WHERE datetime >= Some Date, which created a lot of scans of the table.  We created a search table that consisted of the 5 search columns and the primary key of the parent table, kept in sync with triggers.  The search used that table, and then used the PK to grab the records out of the parent table.  The search table acted almost like a big index on the larger parent table.
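
    As a rough sketch of the split described in item 2, here is what the parent proc might look like; the two child procedures and all names below are made up for the example (the "general" child would be the dynamic sp_executesql catch-all from Gail's article):

    -- Hypothetical child procs assumed to exist:
    --   dbo.Customer_Search_ByName  - plain query tuned for the common "name only" case
    --   dbo.Customer_Search_General - catch-all (dynamic sp_executesql pattern)
    CREATE PROCEDURE dbo.Customer_Search
        @name char(20) = '', @zip char(5) = '', @colors char(5) = ''
    AS
    BEGIN
        IF @name <> '' AND @zip = '' AND @colors = ''
            EXEC dbo.Customer_Search_ByName @name = @name;   -- the 90% path
        ELSE
            EXEC dbo.Customer_Search_General                 -- everything else
                @name = @name, @zip = @zip, @colors = @colors;
    END;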

    Ultimately, all of this was replaced with a full-text search built with Lucene / Solr, but it ran like a champ for a long while and got me a lot of kudos from the powers that be.
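
    For item 3, and given an index on the column being searched (name, in the OP's example), the three choices usually come down to the predicates below.  Only the first two can use an index seek; the leading wildcard in the last one generally forces a scan.  This is illustrative only; RTRIM is there because the OP's variables are CHAR and carry trailing spaces.

    -- Exact match: can seek on an index on name.
    SELECT c.customer_id FROM #customer AS c
    WHERE c.name = @name

    -- "Starts with": still SARGable; the trailing wildcard allows an index range seek.
    SELECT c.customer_id FROM #customer AS c
    WHERE c.name LIKE RTRIM(@name) + '%'

    -- Full "like" (contains): the leading wildcard defeats the index; expect a scan.
    SELECT c.customer_id FROM #customer AS c
    WHERE c.name LIKE '%' + RTRIM(@name) + '%'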

    Michael L John

  • ScottPletcher wrote:

    Actually you could be causing a performance problem by getting SQL to use the same query plan for different values, i.e., by avoiding the "overhead" of a recompile.  The first query executed might require a full scan, but the next one could use an index lookup ... but it won't because you parameterized the query and basically forced SQL to use the same plan.

    It's true that the article doesn't mention using recompile with dynamic SQL.  That may be an oversight rather than a feature.

    The OPTION ( RECOMPILE ) may be redundant with EXEC(), but it doesn't hurt anything if the plan would always be (re)compiled anyway.

    Where does the article say to re-use the same plan? Am I missing something?

    There are 2 suggestions in this article. One is OPTION(Recompile), which does not use the same plan.

    The second suggestion is sp_executesql, which caches a separate plan for each distinct statement text (that is, for each combination of search parameters) and reuses that plan across executions.

    Depending upon the situation, EXEC can certainly open one up to SQL injection.
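
    For completeness, here is the first suggestion grafted onto the Option 3 code from earlier in the thread (a sketch, not the article's exact code).  The statement stays parameterized through sp_executesql, so there is still no injection surface; OPTION (RECOMPILE) simply trades plan reuse for a fresh plan on every execution:

    SET @SQL = REPLACE(@SQL, N'[[WHERE]]', ISNULL(@WHERE, N''))
             + N'OPTION (RECOMPILE);';  -- per-execution plan; parameters unchanged

    EXECUTE sys.sp_executesql @stmt = @SQL
        , @params = N'@name char(20), @zip char(5), @colors char(5)'
        , @name = @name
        , @zip = @zip
        , @colors = @colors;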

    Michael L John

  • ScottPletcher wrote:

    Actually you could be causing a performance problem by getting SQL to use the same query plan for different values, i.e., by avoiding the "overhead" of a recompile.  The first query executed might require a full scan, but the next one could use an index lookup ... but it won't because you parameterized the query and basically forced SQL to use the same plan.

    It's true that the article doesn't mention using recompile with dynamic SQL.  That may be an oversight rather than a feature.

    The OPTION ( RECOMPILE ) may be redundant with EXEC(), but it doesn't hurt anything if the plan would always be (re)compiled anyway.

    The same holds true for static SQL thanks to parameter sniffing.  Having individual query plans for each main set of criteria is a good thing in these "Catch-All" queries thanks to the plans that are formed.  And RECOMPILE is never a good idea if the proc is going to be hammered upon.  That IS a place where RECOMPILE should not be used because it does, in fact, hurt.

    --Jeff Moden

  • Jeff Moden wrote:

    The same holds true for static SQL thanks to parameter sniffing.  Having individual query plans for each main set of criteria is a good thing in these "Catch-All" queries thanks to the plans that are formed.  And RECOMPILE is never a good idea if the proc is going to be hammered upon.  That IS a place where RECOMPILE should not be used because it does, in fact, hurt.

    NOT if the queries have dramatically different row counts with supporting indexes.  The recompile time is trivial compared to full table scans.

    SQL DBA, SQL Server MVP (07, 08, 09)

  • ScottPletcher wrote:

    NOT if the queries have dramatically different row counts with supporting indexes.  The recompile time is trivial compared to full table scans.

    I'll agree that "It Depends".  Would you add RECOMPILE to every bit of code that might have such an issue even though it's used thousands of times per day?  One would hope not.

    We once had some code that "only" took 100ms to "run".  It took a while to figure out why people were complaining about how long it took to run in the GUI.  It turns out that the code was recompiling EVERY time it ran and was taking 2 to 22 seconds, with the norm being about 20 seconds.  Talk about "scans".

     

    --Jeff Moden

  • Michael L John wrote:

    3. Do you need a like search?  If so, do not make that the default.  Make the user choose to do a like search, and if possible, pop up a nag that says "This is gonna be slow, you should probably go to lunch for a while".  You may want to give three choices: an exact match, a "starts with", and a "like".  If the like search is mission critical, there are a number of far better things to do than just using wildcard characters.

    How would you do a "starts with" search?  Would that use the LIKE operator?
