Reaping the benefits of the Window functions in T-SQL

  • Thanks for the great info.

    I was previously using a modified version of

    Jeff Moden's Tally OH! An Improved SQL 8K β€œCSV Splitter” Function

    to parse text files that where pipe delimited.

    I really needed a way to parse proper csv files that are comma delimited with quote text qualifiers

    as the data may have embedded commas.

    i.e. "900 N. May ST., #5"

    using the previous splitter it would get split into "900 N. May ST. and #5"

    After searching the forums I found this and it works perfect but..

    The production server for this project is 2008 R2.

    Is there any way to replicate the Lag() and Lead() functions with 2008 equivalents?

    Any help would be greatly appreciated.

  • Even li'l ol' MySQL has GROUP_CONCAT().

    You mean only MySQL has GROUP_CONCAT(). BOL has a SQLCLR version http://msdn.microsoft.com/en-us/library/ms131056(v=sql.105).aspx and there is an improved versoin in codeplex http://groupconcat.codeplex.com/%5B/url%5D.

  • Hope it's not too late to say, "Great article Eirikur!". I just finished reading this for a second time (there is a problem that I was struggling with that your article helped me solve). I have referred many people to this article as a good example of "How to reap the benefits of Window functions".

    I had one small question... I noticed you used the Latin1_General_BIN collation trick (WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter) in you CSV example but did not use it in dbo_DelimitedSplit8K_LEAD. Is there a specific reason that you did not use it there?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (9/1/2015)


    Hope it's not too late to say, "Great article Eirikur!". I just finished reading this for a second time (there is a problem that I was struggling with that your article helped me solve). I have referred many people to this article as a good example of "How to reap the benefits of Window functions".

    I had one small question... I noticed you used the Latin1_General_BIN collation trick (WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter) in you CSV example but did not use it in dbo_DelimitedSplit8K_LEAD. Is there a specific reason that you did not use it there?

    Not as late as my answer Alan:-D

    The reason for not doing a binary collation on the first part was not changing anything within the original DelimitedSplit8K code apart from introducing the lead function in order to reflect only the changes from charindex to lead.

    😎

  • Good article, well, lengthy. There are better areas to demonstrate the benefit of window functions than parsing strings. For example, when you need to compare rows based on some sequence.

    Using xml functions in SQL server is much easier to parse strings.:-)

  • lucyliu0301 (4/22/2016)


    Good article, well, lengthy. There are better areas to demonstrate the benefit of window functions than parsing strings. For example, when you need to compare rows based on some sequence.

    Using xml functions in SQL server is much easier to parse strings.:-)

    Using XML functions to split CSV is also much slower.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice to see this article again. Still awesome!:-P

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Jeff Moden (4/22/2016)


    lucyliu0301 (4/22/2016)


    Good article, well, lengthy. There are better areas to demonstrate the benefit of window functions than parsing strings. For example, when you need to compare rows based on some sequence.

    Using xml functions in SQL server is much easier to parse strings.:-)

    Using XML functions to split CSV is also much slower.

    Second Jeff's input here, for less than 8000/4000 characters, XML is much slower. Normally I see this assumption where no proper testing has been done, few functions have gone through as rigorous testing as the DelimitedSplit8K/4K functions, thanks to Jeff (cudos Jeff and SSC) which means that on the areas where to demonstrate the benefits of the Window functions, there are hardly any better challenges.

    😎

    Further if one needs more than the number of elements one can fit within 8000/4000 characters string then as Jeff recently posted, "you are doing something wrong";-)

  • Alan.B (4/22/2016)


    Nice to see this article again. Still awesome!:-P

    Thanks Alan! When are you going to do a piece about your interesting work?

    😎

  • Eirikur Eiriksson (4/22/2016)


    Alan.B (4/22/2016)


    Nice to see this article again. Still awesome!:-P

    Thanks Alan! When are you going to do a piece about your interesting work?

    😎

    Very soon sir, I have a few things I just need to clean up a little before submitting.:-D

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks for the article.

  • robert_verell (3/24/2014)


    I like this article if anything for the West Point, MS reference.

    For a moment I thought of the Academy.

  • Having just moved from SQL 2008, I'm finally able to use this improved version of the splitter function.

    I have just one comment, the ISNULL(NULLIF(LEAD(s.N1,1,1) over (order by s.N1)-1,0)-s.N1,8000) seems to be doing a little more work than it needs to - the default value for the LEAD function could be changed to 8000, so you don't need the isnull or nullif. It must make a few microseconds of a difference in performance.
    LEAD(s.N1,1,8000) over (order by s.N1)-1-s.N1

    From the little testing I've done, this does seem to work.

  • Maybe I missed it in the article, I have read through it twice, on the CSV code why are you using the Latin1_General_BIN collation?

  • Lynn Pettis - Friday, November 2, 2018 7:52 AM

    Maybe I missed it in the article, I have read through it twice, on the CSV code why are you using the Latin1_General_BIN collation?

    The reason is that it using a binary collations is more efficient than using a language specific collations.
    😎

Viewing 15 posts - 16 through 30 (of 57 total)

You must be logged in to reply to this topic. Login to reply