Dynamic SQL and IN Operator

  • VERY impressive splitter improvements there Jeff! Well done!!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • m.haren (4/20/2011)


    What if you use the same convert for the value you are checking. Like ... convert(nvarchar(20), PRH.ClosedDate, 112) ?

    That should work, but is gloriously inefficient as it's not sargable.

    Tom

  • Jeff Moden (4/16/2011)


    Thanks for posting the splitter code. There are several problems with that splitter code that you may not be aware of.

    1. That splitter doesn’t work as advertised according to the code itself. The input variables suggest that it will allow a 5-character delimiter but if we feed it a string with even just a 2-character delimiter…

    SELECT * FROM dbo.Split('12|-|34|-|56|-|78','|-|');

    … here’s what you get.

    IdData

    112

    2|34

    3|56

    4|78

    2. The splitter code uses RBAR to do the split. In this case, it’s what I call a “Nibbler” splitter which finds the first element, stores it, and the “bites” the first element and first delimiter off of the string. The process continues until no delimiters are left and it does a final insert with everything that’s left as the final element. Because of the multiple inputs and the fact that the function is necessarily an mTVF (Multi-line Table Valued Function which has all of the overhead of a Scalar Function), this function is relatively slow.

    3. The splitter uses artificial limits on the size of the string that it can be fed. With MAX data-types being the exception, one would always expect an NVARCHAR splitter to be able to split up to and including the full width of an NVARCHAR(4000).

    4. The splitter has a built in LTRIM/RTRIM which could be a real problem if leading or trailing spaces are ever required to be preserved. My recommendation would be to do such trimming outside of the function if such a need arises. For those items not needing to be trimmed, having trimming done inside the function doesn’t do anything except slow things down.

    Here’s the code that I recommend you use. If you really need for the trimming to be done inside the splitter, do that at the final assignment of “Item = “. Of course, you can also rename the splitter and the column names it returns to match your current splitter.

    CREATE FUNCTION [dbo].[DelimitedSplitN4K]

    /***************************************************************************************************

    Purpose:

    Split a given string at a given delimiter and return a list of the split elements (items).

    Notes:

    1. Leading a trailing delimiters are treated as if an empty string element were present.

    2. Consecutive delimiters are treated as if an empty string element were present between them.

    3. Except when spaces are used as a delimiter, all spaces present in each element are preserved.

    Returns:

    iTVF containing the following:

    ItemNumber = Element position of Item as a BIGINT (not converted to INT to eliminate a CAST)

    Item = Element value as a NVARCHAR(4000)

    CROSS APPLY Usage Example:

    ---------------------------------------------------------------------------------------------------

    --===== Conditionally drop the test tables to make reruns easier for testing.

    -- (this is NOT a part of the solution)

    IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL

    DROP TABLE #JBMTest

    ;

    --===== Create and populate a test table on the fly (this is NOT a part of the solution).

    -- In the following comments, "b" is a blank and "E" is an element in the left to right order.

    -- Pipes are used to encapsulate the output of "Item" so that you can see that all blanks

    -- are preserved no matter where they may appear.

    SELECT *

    INTO #JBMTest

    FROM ( --# & type of Return Row(s)

    SELECT 0, NULL UNION ALL --1 NULL

    SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)

    SELECT 2, SPACE(1) UNION ALL --1 b (1 space)

    SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)

    SELECT 4, N',' UNION ALL --2 b b (both are empty strings)

    SELECT 5, N'55555' UNION ALL --1 E

    SELECT 6, N',55555' UNION ALL --2 b E

    SELECT 7, N',55555,' UNION ALL --3 b E b

    SELECT 8, N'55555,' UNION ALL --2 b B

    SELECT 9, N'55555,1' UNION ALL --2 E E

    SELECT 10, N'1,55555' UNION ALL --2 E E

    SELECT 11, N'55555,4444,333,22,1' UNION ALL --5 E E E E E

    SELECT 12, N'55555,4444,,333,22,1' UNION ALL --6 E E b E E E

    SELECT 13, N',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b

    SELECT 14, N',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b

    SELECT 15, N' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)

    SELECT 16, N'This,is,a,test.' --E E E E

    ) d (SomeID, SomeValue)

    ;

    GO

    --===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)

    SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = '|'+split.Item+'|'

    FROM #JBMTest test

    CROSS APPLY dbo.DelimitedSplitN4K(test.SomeValue,',') split

    ;

    ---------------------------------------------------------------------------------------------------

    Other Notes:

    1. Optimized for NVARCHAR(4000) or less. No testing or error reporting for truncation at 4000

    characters is done.

    2. Optimized for single character delimiter. Multi-character delimiters should be resolved

    externally from this function.

    3. Optimized for use with CROSS APPLY.

    4. Does not "trim" elements just in case leading or trailing blanks are intended.

    5. If you don't know how a Tally table can be used to replace loops, please see the following...

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    6. Changing this function to use NVARCHAR(MAX) will cause it to run twice as slow. It's just the

    nature of NVARCHAR(MAX) whether it fits in-row or not.

    7. Multi-machine testing for the method of using UNPIVOT instead of 10 SELECT/UNION ALLs shows

    that the UNPIVOT method is quite machine dependent and can slow things down quite a bit.

    Credits:

    This code is the product of many people's efforts including but not limited to the following:

    cteTally concept originally by Iztek Ben Gan and "decimalized" by Lynn Pettis (and others) for a

    bit of extra speed and finally redacted by Jeff Moden for a different slant on readability and

    compactness. Hat's off to Paul White for his simple explanations of CROSS APPLY. Finally,

    special thanks to Erland Sommarskog for his tireless efforts to help people understand

    what you can actually do with T-SQL. I also thank whoever wrote the first article I ever saw

    on "numbers tables" which is located at the following URL ...

    http://sqlserver2000.databases.aspfaq.com/why-should-i-consider-using-an-auxiliary-numbers-table.html

    Revision History:

    Rev 00 - 20 Jan 2010 - Concept for inline cteTally: Lynn Pettis and others.

    Redaction/Implementation: Jeff Moden

    - Base 10 redaction and reduction for CTE. (Total rewrite)

    Rev 01 - 13 Mar 2010 - Jeff Moden

    - Removed one additional concatenation and one subtraction from the SUBSTRING in the

    SELECT List for that tiny bit of extra speed.

    Rev 02 - 14 Apr 2010 - Jeff Moden

    - No code changes. Added CROSS APPLY usage example to the header, some additional credits,

    and extra documentation.

    Rev 03 - 18 Apr 2010 - Jeff Moden

    - No code changes. Added notes 7, 8, and 9 about certain "optimizations" that dont'

    actually work for this type of function.

    Rev 04 - 29 Jun 2010 - Jeff Moden

    - Added WITH SCHEMABINDING thanks to a note by Paul White. This prevents an unnecessary

    "Table Spool" when the function is used in an UPDATE statement even though the function

    makes no external references.

    Rev 05 - 02 Apr 2011 - Jeff Moden

    - Rewritten for extreme performance improvement especially for larger strings approaching

    the 4K boundary and for strings that have wider elements. The redaction of this code

    involved removing ALL concatenation of delimiters, optimization of the maximum "N" value

    by using TOP instead of including it in the WHERE clause, and the reduction of all

    previous calculations (thanks to the switch to a "zero based" cteTally) to just one

    instance of one add and one instance of a subtract. The length calculation for the final

    element (not followed by a delimiter) in the string to be split has been greatly

    simplified by using the ISNULL/NULLIF combination to determine when the CHARINDEX returned

    a 0 which indicates there are no more delimiters to be had or to start with. Depending on

    the width of the elements, this code is between 4 and 8 times faster on a single CPU box

    than the original code especially near the 8K boundary.

    - Modified comments to include more sanity checks on the usage example, etc.

    Rev 06 - 12 Apr 2011 - Jeff Moden

    - Based on a suggestion by Ron "Bitbucket" McCullough, additional test rows were added to

    the above sample and the code was changed to encapsulate the output in pipes so that

    spaces and empty strings could be perceived in the output. The "Notes" section was added

    and the "Other Notes" section was abbreviated.

    ***************************************************************************************************/

    --===== Define I/O parameters

    (

    @pString NVARCHAR(4000),

    @pDelimiter NCHAR(1)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table” produces values from 0 up to

    -- 10,000... enough to cover NVARCHAR(4000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1

    FROM cteTally t

    WHERE (

    SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN

    OR t.N = 0

    )

    )--===== The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,4000))

    COLLATE Latin1_General_BIN

    FROM cteStart s

    ;

    GO

    Just in case you’ve heard that “Tally Table” based splitters get slow as the number of elements increased or the width of the string increased, that used to be true. This is a new type of splitter that I just completed a new article for, which will be coming out soon. If you still have doubts, here’s a performance chart from the article for a very similar VARCHAR(8000) splitter. The heavy Blue line is your “Nibbler” splitter. The Red line is how old “Tally Table” based splitters used to act. The heavy Black line is the new splitter above.

    Can you post the version optimized for varchar(8000) or varchar(max)?

    P.S. I love it when the short, clear and consise explainations takes 3 times more space than the code they explain :hehe:.

  • Thanks for the help everyone, especially Tom. Got it working now.

  • TheSQLGuru (4/20/2011)


    VERY impressive splitter improvements there Jeff! Well done!!

    Aye... thanks, Kevin.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ninja's_RGR'us (4/20/2011)


    Can you post the version optimized for varchar(8000) or varchar(max)?

    You bet... see attached. I've included both the NVARCHAR(4000) and VARCHAR(8000) versions. The only differences are (of course) no divide by 2 for DATALENGTH, 4000 becomes 8000, and NVARCHAR(4000) becomes VARCHAR(8000).

    As a side bar, I've also sussed the same problem for the 2 MAX datatypes which will be in a separate article. I'd attach the code except that I'm not done testing it for accuracy and performance yet. I don't want to put something out there as an "improvement" unless I'm absolutely sure.

    And, no... it's not as easy as just changing the parameters to a MAX datatype. You can certainly do that but performance will be disappointing.

    P.S. I love it when the short, clear and consise explainations takes 3 times more space than the code they explain :hehe:.

    Yeah, I know, huh? Lot's of people on this forum have contributed to the code over a period of time and I kept track of them as best I can. I also didn't want anyone to have a misunderstanding about how the code works so that's why all the comments. It's the first function that's an "article" in itself. 😉

    [font="Arial Black"]{edit} The code I had attached to this post works fine for most delimiters but has a case problem if letters are used as delimiters. I took the code down and will replace it with corrected code as soon as I can.[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • [font="Arial Black"] The code I had attached to this post works fine for most delimiters but has a case problem (thanks to Paul White for finding the problem) if letters are used as delimiters. I took the code down and will replace it with corrected code as soon as I can which will be tonight.[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Considering that I missed something before, I sent the repaired code out to several folks and asked them to try to break it. As soon as I hear back from them, I'll post back here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply