Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • peter-757102 (5/6/2011)


    WayneS (5/6/2011)


    Can we get someone to put Peter's and Nadrek's code into a new test script, so that all can easily test it? Just modify Jeff's script to remove all but the DelimitedSplit8K, and add the four newer ones from Peter and Nadrek. If you need a tally table, it needs to be 1-10000, with a CI with 100% fill factor.

    I am working on that as we speak.

    update

    I made the script but am still in the process of optimizing / correcting a few things. Right now, there are too many processes running on my server to get reliable clean data for the graphs, so i will resume tomorrow. Hopefully by then, someone else will have also taken a second look and we can compare findings.

    If you can post the script that you are testing with, I can test it this weekend.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Agh! Sorry for my absence, folks. Thank all of you for covering for me. I haven't tested them for myself, yet, but Peter's modifications make a whole lot of sense and they look very promising! Well done, Peter! Looks like I need to find some smarter dust-bunnies or fewer beer popsicles. 😛

    That's what I love about these discussions... I learn something new every time! 🙂 Even better, what Wayne said looks like it's coming true. We're all going to end up enjoying the ultimate splitter. Keep it up!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • WayneS (5/6/2011)


    Gianluca Sartori (5/6/2011)


    This thread is getting more and more exciting.

    Awesome.

    I know what you mean. I, for one, am looking forward to the revealing of the ultimate DelimitedSplit function. It would be ultra-sweet if it were to handle varchar(max) also!

    You're looking for the ultimate split? One that's 3x faster and handles VARCHAR(MAX) - and NVARCHAR(MAX) for that matter - all in the one function? You know how this post ends... 😀

  • rmechaber (5/6/2011)


    Wow, what a difference 2 years makes:

    Jeff Moden, February 2009, http://www.sqlservercentral.com/Forums/FindPost652105.aspx:

    With the exception of RegEx, damned near all CLR's are simply not the right way to do it.

    Jeff Moden, May 2011, http://www.sqlservercentral.com/articles/Tally+Table/72993/[/url]:

    Make no doubt about it. The current best way to split delimited strings in SQL Server is to use a CLR splitter.

    And no -- this is meant to be neither prod nor provocation.:-D (Also, I don't think I've quoted out of context.)

    My point is that if one of the best and brightest among us (Jeff!) can (1) object to indiscriminate use of a new coding approach and then (B) nevertheless keep an open mind and see the utility of that new approach, then maybe I'll pause before I condemn the next "new thing" that comes along.

    Thanks,

    Rich

    P.S. With acknowledgments to Car Talk....

    Good thing I hedged the bet by saying "damned near all", huh? 🙂

    Just to clarify my position on SQLCLR's so people have a better understanding of where I stand on the subject; When I first heard that MS was going to include the ability to use CLR's in SQL Server, I almost did handsprings. What a wonderful way to do things that either can't be done in SQL Server or shouldn't be. Then, just like when cursors first came out, it seems the larger part of the world started using them for just about everything. As I've told a couple of other people, I had one guy write a CLR to do the Modulus function because he didn't know how to do a Modulus in T-Sql. I have a friend where a developer couldn't figure out how to do a simple "upsert" and wrote a CLR for that. Because I've been looking for them, I've also seen a good number of articles on how to generate random numbers using CLR's because, supposedly, SQL Server isn't any good at it (and there could be nothing further from the truth).

    So here's the bottom line, in case anyone was truly wondering... I love the fact that CLR's exist and that some folks have come up with good, sound uses for them. I hate the fact that so many other people have abused them. Since the latter outweighs the former in number by (so it seems to me) a hundred to one, I end up sounding like an anti-CLR zealot but I'm really not.

    Anyway, thanks for the feedback and the observation, Rich. I very much appreciate you explained that it wasn't a jab (quite the contrary, from what I can see, so thanks!). And, I agree... keep an open mind. Shoot, I may even find a reason to use SSIS, SSAS, or SSRS someday in the future. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/6/2011)


    Shoot, I may even find a reason to use SSIS, SSAS, or SSRS someday in the future. 🙂

    I take it me beating you with them until you submit to the almighty Gates isn't an option in that plan?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This thread has indeed been a pleasure. Not only did I get a great response to the article, but we've collectively gotten a great enhancement to the original code from the article. I've tested Nadrek's and Peter's code both for correct funtionality and performance. My intention is to do the final documentation on the latest enhancement and post it as a replacement for the two functions attached to the article so people can find it easily. (I'll also replace the function in the test code and post the final performance charts)

    Does anyone else have any other suggestions for a speed enhancement before I do that?

    Nadrek and Peter. Thanks a million for taking the time to tweek the code. Wayne, thanks for testing so many people's suggestions and posting the results in graph form. You've just got to love to collaboration that occurred in this thread. Well done, folks! 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Craig Farrell (5/7/2011)


    Jeff Moden (5/6/2011)


    Shoot, I may even find a reason to use SSIS, SSAS, or SSRS someday in the future. 🙂

    I take it me beating you with them until you submit to the almighty Gates isn't an option in that plan?

    Heh... better bring a gun to that knife fight. It took two years for Paul White to convince me that I should even turn CLR on. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLkiwi (5/6/2011)


    WayneS (5/6/2011)


    Gianluca Sartori (5/6/2011)


    This thread is getting more and more exciting.

    Awesome.

    I know what you mean. I, for one, am looking forward to the revealing of the ultimate DelimitedSplit function. It would be ultra-sweet if it were to handle varchar(max) also!

    You're looking for the ultimate split? One that's 3x faster and handles VARCHAR(MAX) - and NVARCHAR(MAX) for that matter - all in the one function? You know how this post ends... 😀

    I think I do :-D.... it was posted a few pages back?

    However, where I work we are selling the product that we're developing. I'm sure that there will be places that have a "no CLR" rule on their SQL Servers, so it will have to be done in all T-SQL. :crying: 🙁

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Jeff Moden (5/7/2011)


    Does anyone else have any other suggestions for a speed enhancement before I do that?

    :ermm: Have you tested Peso's/Peter's code that he sent you yet? His code normally turns out to be wickedly fast.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (5/7/2011)


    Jeff Moden (5/7/2011)


    Does anyone else have any other suggestions for a speed enhancement before I do that?

    :ermm: Have you tested Peso's/Peter's code that he sent you yet? His code normally turns out to be wickedly fast.

    There was a problem with the code he sent me and I sent it back because I didn't want to presume to correct it. I've not heard back from him, yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Since this was a pretty long article I had to put it aside and read it over the weekend. Just finished. I have to say this is probably most exciting and interesting technical article I've ever read. And I mean not just from T-SQL perspective, but for the literary value. Dust bunnies and beer popsicles certainly helped me to stay alert while comprehending Jeff's wisdom. In a word, awesome! Thank you for sharing with the rest of us.

  • Hey Jeff,

    Well I "worked" till 7 am in the morning doing all sorts of stuff with the code., then took a 2.5 hour sleep and wend to do something else after that...gaming is a serious time hog, and so is SQL ;).

    I can't say something final yet Jeff, most my attempts worked fantastic in the query plan, but failed badly when actual run with the testing code. It seems like the code in the article has just the right computation at the right location for the optimizer not to try to be too smart and ruin performance. It is weird, but certainly not the first time I seen it happen that plans that are simpler and more direct, with 5 times better performance according to the optimizer fail badly once actual run.

    Also, I am looking forward to the results of your benchmarks of the code. The code for a persistent tally table did ran slower on my server, and that was before I tweaked it so that it works also well with order by to guarantee correct output. Some code tweaks were needed for this to get the sort operation out of the query plan, as that would be a real performance killer.

    I am certainly going try some more stuff, but have no time line for it, so consider my code final (for now), lol.

    And thanks again for steering up an interesting article plus discussion, you are good with that sort of thing as you always succeed :).

  • mishaluba (5/7/2011)


    Since this was a pretty long article I had to put it aside and read it over the weekend. Just finished. I have to say this is probably most exciting and interesting technical article I've ever read. And I mean not just from T-SQL perspective, but for the literary value. Dust bunnies and beer popsicles certainly helped me to stay alert while comprehending Jeff's wisdom. In a word, awesome! Thank you for sharing with the rest of us.

    Absolutely my pleasure. Thank you for the wonderful critique. I was honestly quite concerned about how well the metaphoric explanations of what was going through my mind would be received, but it was the only way I could think of to turn an otherwise terribly dry subject into something that might also be a bit of fun to read. I've got to admit, it made writing the article a whole lot more fun, too!

    By the way, the dust bunnies told me to tell you that they're very happy you liked their beer popsicle idea. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ChrisM@home (5/3/2011)


    Jeff, please please please write a book, without changing your style. Give us all an opportunity to repay you for what you've done for us.

    I'll second that suggestion. Every time you write an article I am both entertained and educated. A book from you would be a must have.

    Tom

  • Tom.Thomson (5/7/2011)


    ChrisM@home (5/3/2011)


    Jeff, please please please write a book, without changing your style. Give us all an opportunity to repay you for what you've done for us.

    I'll second that suggestion. Every time you write an article I am both entertained and educated. A book from you would be a must have.

    I'm humbled by your fine endorsement, Tom. I guess I'll just have to hucker down and do it. Of course, a fair bit of material will be repeated. I won't copy word for word (well, based on the response, maybe I'll copy this article) but, as Remi suggested, if we were to put some of my articles together, we'd darned near have a book.

    Thank all of you for the confidence.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 196 through 210 (of 990 total)

You must be logged in to reply to this topic. Login to reply