Performance issue with tally solution

  • Sorry, but I think I cannot. Currently I have only a x64 system. If you have a x64 system, sure!

    This works on both 32 and 64 bit:

    (Update: Two people asked me to remove the big script, so go compile your own! :-D)

    --
    Adam Machanic
    whoisactive

  • Thank you to whomever fixed the formatting on this page!

    I had to scroll right about 50 metres to find the reply button earlier... 🙂

    @Flo: Yes scripting is easy, but if the destination server doesn't have CLR enabled...and so on. I was thinking of those occasions where one needs to split some strings quickly in a SSMS session - not for code that is going to be published and re-used. That's where I use the ROW_NUMBER tally, or a recursive syntax.

    Adam Machanic (4/27/2009)


    One of the key things is making sure that the output is NVARCHAR(N), not NVARCHAR(MAX)--as it turns out, returning NVARCHAR(MAX) from a SQLCLR TVF is quite a bit slower (something to do with how the data is allocated internally, and I don't pretend to fully understand it).

    Yes, long, long, ago, at the start of this thread, Flo's original puzzler involved NVARCHAR(MAX), and apparently this was a core requirement. We seem to have moved back to finding optimal solutions for non-LOB parameter examples in recent days.

    The QO/engine in general seems to over-compensate for the potential size and different storage pattern for LOBs - and eliminating parallel plans through use of MAX does not help either.

    Last time I raised this (serial plans for CLR routines receiving a LOB) with PSS, it eventually came back as a 'bug' - to be fixed in Katmai (the usual reply at the time). As far as I know, it is still outstanding, and passing a LOB to a CLR routine which otherwise would be parallelizable, always results in a fully serial plan (not just a serial 'island' in the plan like with backward index scans, for example).

    You and Flo have done some great work on this overnight! (my time zone).

    Cheers,

    Paul

  • Paul White (4/27/2009)


    Yes, long, long, ago, at the start of this thread, Flo's original puzzler involved NVARCHAR(MAX), and apparently this was a core requirement. We seem to have moved back to finding optimal solutions for non-LOB parameter examples in recent days.

    NVARCHAR(MAX) for both input and output? What is the use case? Just to be clear, the issue is only NVARCHAR(MAX) on output -- it's fine to have it as an input.

    --
    Adam Machanic
    whoisactive

  • (Update: Two people asked me to remove the big script, so go compile your own! [BigGrin] )

    Actually, I just asked Adam to put it as an attachment instead, rather than a code block. I'm still no wiser as to how to do the

    CREATE FUNCTION (sigh)

    ..Goodnight folks.

    Best wishes,
    Phil Factor

  • Adam Machanic (4/27/2009)


    NVARCHAR(MAX) for both input and output? What is the use case? Just to be clear, the issue is only NVARCHAR(MAX) on output -- it's fine to have it as an input.

    The original requirement was on the input IIRC. The details are in the first few posts on this thread. I guess Flo would be needed to confirm the reason for the requirement.

    Does having a MAX type on the input not force a serial plan for you then? In some cases, generating a parallel plan would be essential for best performance. A T-SQL solution that was not wrapped in a UDF would be capable of a parallel plan, if the QO so chooses. For certain cases, this could give T-SQL an edge (even if it required running raw script or using a stored procedure).

    None of that changes your original point about MAX datatypes on output killing performance of course.

    Cheers,

    Paul

  • Let me clarify something...

    The requirement that the functions need to return those items as NVARCHAR(MAX) is not actual any more. I have noticed that it is not possible to create "the split function" which handles everything... long time ago in the valleys and forests of this thread.

    I'm currently testing two different requirements:

    One function which returns NVARCHAR(4000) and one that returns NVARCHAR(MAX) for special cases (as parsing Moby Dick :-D).

    Greets

    Flo

  • Phil Factor (4/27/2009)


    (Update: Two people asked me to remove the big script, so go compile your own! [BigGrin] )

    Actually, I just asked Adam to put it as an attachment instead, rather than a code block. I'm still no wiser as to how to do the

    CREATE FUNCTION (sigh)

    ..Goodnight folks.

    Why don't you just use autodeploy?

    Or...

    CREATE FUNCTION [faster_split]

    (

    @instr nvarchar(MAX),

    @delimiter nvarchar(4000)

    )

    RETURNS TABLE(output nvarchar(4000))

    AS

    EXTERNAL NAME [SqlClassLibrary].[UserDefinedFunctions].[faster_split]

    --
    Adam Machanic
    whoisactive

  • @Phil - Jeff convinced me that posting compiled code was a no-no. Even if tools like .NET reflector are available. Maybe Adam will post a script for you* - his bits and bytes should be trustworthy :laugh: :laugh: :laugh:

    @Flo - Impressed that you are not asleep! What was the requirement for MAX originally, even if it no longer applies? Am intrigued.

    My particular use case years ago was to make use of a (streaming ;-)) CLR compression function.

    edit: *he did - still catching up...

  • Paul White (4/27/2009)


    @Flo - Impressed that you are not asleep!

    I am impressed that you are already awake :laugh:

    What was the requirement for MAX originally, even if it no longer applies? Am intrigued.

    The initial requirement sounds as simple that it sounds just stupid. I think the best explanation seems to be "because I'm a perfectionist".

    I'm working (currently worked) on a script to script out a complete database DDL (for fun and to publish here). The script has currently about 1,500 lines. Sure, it covers procedures, functions, XML-Schema-Collections, ... Since I completely don't know how people write their code I need to be able to script lines as NVARCHAR(MAX). Just have a look to my post here. If I don't cut the helper string for the VARBINARY transformation into pieces the line would be about 600 characters. That's the reason for my first test data from sys.sql_modules :Whistling:

    But this requirement is far away. I learned in this thread that "string splitting" is a very own science. As Lynn wrote in one post he works currently in a project which requires very much string splitting. I think this is a very common requirement which almost everybody already needed. It's just annoying if you google for "TSQL split string" the first 1,500,000 solutions how how great those cursors are. Most of them work even worse than my simple first cursor...

    For the moment the DDL script is sleeping because I think this topic is much more important for people. It is a great discussion with all of you great people and I think almost everybody learned something new here (I'm quiet sure that I learned the most 🙂 ).

    Since there was no production problem I had, after this input I could show my chief some really great new things and it causes that some parts of our applications will run about 100 times faster in future. So I also had a huge benefit for my business.

    I'm a younger guy who tries to become better. I know that I'm one of the best developers in my company. But I also know there are many people who are much better than me - like here! So I'm very thankful for every answer to my questions, corrections of other posts or just some tips!

    I really hope that you are not angry because there was initially no production problem to be fixed...

    Thanks again to all of you!

    Flo

  • Flo...

    Since there was no production problem I had, after this input I could show my chief some really great new things and it causes that some parts of our applications will run about 100 times faster in future. So I also had a huge benefit for my business.

    I'm a younger guy who tries to become better. I know that I'm one of the best developers in my company. But I also know there are many people who are much better than me - like here! So I'm very thankful for every answer to my questions, corrections of other posts or just some tips!

    I really hope that you are not angry because there was initially no production problem to be fixed...

    SSC isn't just about helping people fix production problems, it is also about helping people learn more about SQL Server. What is happening on this thread is a testament to this communities commitment to helping people learn and improve their skills and abilities. There is absolutely no reason to apologize for starting this thread. It is an awesome thread and we can't wait (but will have to) for you to write this whole thing up as an article (or two or three).

  • Hey good answers Flo.

    Never be afraid of posting theoretical or odd examples of stuff without a production requirement!

    And yes there is always someone that knows more about a given subject...which is good because there is always more to know!

    BTW camelCase locally and Pascal Case publically ftw!

    I do tend to use underscores at module level, but that's probably just an old bad habit.

    Cheers,

    Paul

  • Hi Lynn

    Thanks for your feedback. It counts a lot for me!

    Greets

    Flo

  • Paul White (4/27/2009)


    Hey good answers Flo.

    Never be afraid of posting theoretical or odd examples of stuff without a production requirement!

    And yes there is always someone that knows more about a given subject...which is good because there is always more to know!

    Also thanks for your feedback! I think it would be worst to know there is nothing more to know!

    BTW camelCase locally and Pascal Case publically ftw!

    I do tend to use underscores at module level, but that's probably just an old bad habit.

    ...and I learned from these habits. I like this syntax!

    Greets

    Flo

  • Adam Machanic (4/27/2009)


    Hi all,

    I have posted a more scalable SQLCLR string split method here:

    http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx

    Thanks to Paul White for pointing me to this thread. Please consider testing all of these methods UNDER LOAD, using either SQLQueryStress, as I did, or another load tool. The results will be quite interesting, I promise. 😀

    Howdy! Long time no see.

    Have done any testing of the methods folks have been showing on this thread?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (4/27/2009)


    Hi Adam

    The String Enumerator (I just called it "String Enumerator" for better understanding what it does) performs very good! Especially for more rows. I just tried with some of my test cases:

    ItemLength ItemCount ItemRows ItemType ItemDescription Ranking Module Duration(ms) CpuTime

    ----------- ----------- ----------- -------------- ---------------- -------- -------------------------- ------------ -----------

    10 10 10 VARCHAR(111) Fixed Length 1 CLR Simple 00:00:00.010 0

    10 10 10 VARCHAR(111) Fixed Length 2 CLR RegEx 00:00:00.010 0

    10 10 10 VARCHAR(111) Fixed Length 3 CLR String Enumerator 00:00:00.010 16

    10 10 10 VARCHAR(111) Fixed Length 4 Traditional Tally TOP(0) 00:00:00.010 16

    10 10 10 VARCHAR(111) Fixed Length 5 Cursor 00:00:00.033 15

    10 10 10 VARCHAR(111) Fixed Length 6 CLR char loop 00:00:00.037 0

    10 10 10 VARCHAR(111) Fixed Length 7 CLR XML 00:00:00.053 31

    10 10 10 VARCHAR(111) Fixed Length 8 Traditional Tally 00:00:00.113 0

    10 10 1000 VARCHAR(111) Fixed Length 1 Traditional Tally TOP(0) 00:00:00.203 156

    10 10 1000 VARCHAR(111) Fixed Length 2 CLR String Enumerator 00:00:00.443 422

    10 10 1000 VARCHAR(111) Fixed Length 3 CLR Simple 00:00:00.467 468

    10 10 1000 VARCHAR(111) Fixed Length 4 CLR RegEx 00:00:00.473 468

    10 10 1000 VARCHAR(111) Fixed Length 5 CLR char loop 00:00:00.533 516

    10 10 1000 VARCHAR(111) Fixed Length 6 Traditional Tally 00:00:00.747 188

    10 10 1000 VARCHAR(111) Fixed Length 7 CLR XML 00:00:00.987 954

    10 10 1000 VARCHAR(111) Fixed Length 8 Cursor 00:00:01.027 969

    100 10 10 VARCHAR(1011) Fixed Length 1 CLR Simple 00:00:00.010 16

    100 10 10 VARCHAR(1011) Fixed Length 2 CLR RegEx 00:00:00.010 15

    100 10 10 VARCHAR(1011) Fixed Length 3 CLR String Enumerator 00:00:00.013 16

    100 10 10 VARCHAR(1011) Fixed Length 4 Traditional Tally TOP(0) 00:00:00.017 16

    100 10 10 VARCHAR(1011) Fixed Length 5 CLR char loop 00:00:00.027 0

    100 10 10 VARCHAR(1011) Fixed Length 6 Cursor 00:00:00.040 16

    100 10 10 VARCHAR(1011) Fixed Length 7 CLR XML 00:00:00.070 15

    100 10 10 VARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:00.370 31

    100 10 100 VARCHAR(1011) Fixed Length 1 CLR String Enumerator 00:00:00.073 47

    100 10 100 VARCHAR(1011) Fixed Length 2 CLR char loop 00:00:00.087 63

    100 10 100 VARCHAR(1011) Fixed Length 3 CLR RegEx 00:00:00.087 62

    100 10 100 VARCHAR(1011) Fixed Length 4 CLR Simple 00:00:00.090 78

    100 10 100 VARCHAR(1011) Fixed Length 5 Traditional Tally TOP(0) 00:00:00.110 94

    100 10 100 VARCHAR(1011) Fixed Length 6 CLR XML 00:00:00.150 109

    100 10 100 VARCHAR(1011) Fixed Length 7 Cursor 00:00:00.167 140

    100 10 100 VARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:00.417 141

    100 10 100 NVARCHAR(1011) Fixed Length 1 CLR String Enumerator 00:00:00.057 47

    100 10 100 NVARCHAR(1011) Fixed Length 2 CLR Simple 00:00:00.070 63

    100 10 100 NVARCHAR(1011) Fixed Length 3 CLR RegEx 00:00:00.070 62

    100 10 100 NVARCHAR(1011) Fixed Length 4 Traditional Tally TOP(0) 00:00:00.143 110

    100 10 100 NVARCHAR(1011) Fixed Length 5 CLR char loop 00:00:00.147 79

    100 10 100 NVARCHAR(1011) Fixed Length 6 CLR XML 00:00:00.150 125

    100 10 100 NVARCHAR(1011) Fixed Length 7 Cursor 00:00:00.180 141

    100 10 100 NVARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:00.720 109

    100 10 1000 NVARCHAR(1011) Fixed Length 1 CLR String Enumerator 00:00:00.567 532

    100 10 1000 NVARCHAR(1011) Fixed Length 2 CLR char loop 00:00:00.670 547

    100 10 1000 NVARCHAR(1011) Fixed Length 3 CLR Simple 00:00:00.677 609

    100 10 1000 NVARCHAR(1011) Fixed Length 4 CLR RegEx 00:00:00.717 641

    100 10 1000 NVARCHAR(1011) Fixed Length 5 Traditional Tally TOP(0) 00:00:01.313 1172

    100 10 1000 NVARCHAR(1011) Fixed Length 6 Cursor 00:00:01.350 1313

    100 10 1000 NVARCHAR(1011) Fixed Length 7 CLR XML 00:00:01.517 1094

    100 10 1000 NVARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:01.837 1141

    10 10 10 VARCHAR(111) Variable Length 1 Traditional Tally TOP(0) 00:00:00.007 0

    10 10 10 VARCHAR(111) Variable Length 2 CLR RegEx 00:00:00.007 16

    10 10 10 VARCHAR(111) Variable Length 3 CLR String Enumerator 00:00:00.007 16

    10 10 10 VARCHAR(111) Variable Length 4 CLR Simple 00:00:00.010 0

    10 10 10 VARCHAR(111) Variable Length 5 CLR char loop 00:00:00.020 0

    10 10 10 VARCHAR(111) Variable Length 6 Cursor 00:00:00.030 15

    10 10 10 VARCHAR(111) Variable Length 7 CLR XML 00:00:00.047 15

    10 10 10 VARCHAR(111) Variable Length 8 Traditional Tally 00:00:00.097 0

    10 10 1000 VARCHAR(111) Variable Length 1 Traditional Tally TOP(0) 00:00:00.110 109

    10 10 1000 VARCHAR(111) Variable Length 2 CLR String Enumerator 00:00:00.453 453

    10 10 1000 VARCHAR(111) Variable Length 3 CLR Simple 00:00:00.457 469

    10 10 1000 VARCHAR(111) Variable Length 4 CLR RegEx 00:00:00.483 484

    10 10 1000 VARCHAR(111) Variable Length 5 CLR char loop 00:00:00.537 484

    10 10 1000 VARCHAR(111) Variable Length 6 Traditional Tally 00:00:00.537 141

    10 10 1000 VARCHAR(111) Variable Length 7 Cursor 00:00:00.933 906

    10 10 1000 VARCHAR(111) Variable Length 8 CLR XML 00:00:01.120 891

    500 10 10 VARCHAR(5011) Variable Length 1 CLR String Enumerator 00:00:00.010 16

    500 10 10 VARCHAR(5011) Variable Length 2 CLR Simple 00:00:00.013 15

    500 10 10 VARCHAR(5011) Variable Length 3 CLR RegEx 00:00:00.013 0

    500 10 10 VARCHAR(5011) Variable Length 4 CLR char loop 00:00:00.020 0

    500 10 10 VARCHAR(5011) Variable Length 5 Traditional Tally TOP(0) 00:00:00.027 15

    500 10 10 VARCHAR(5011) Variable Length 6 Cursor 00:00:00.043 15

    500 10 10 VARCHAR(5011) Variable Length 7 CLR XML 00:00:00.087 16

    500 10 10 VARCHAR(5011) Variable Length 8 Traditional Tally 00:00:00.560 32

    500 10 100 VARCHAR(5011) Variable Length 1 CLR RegEx 00:00:00.073 62

    500 10 100 VARCHAR(5011) Variable Length 2 CLR String Enumerator 00:00:00.073 47

    500 10 100 VARCHAR(5011) Variable Length 3 CLR Simple 00:00:00.090 78

    500 10 100 VARCHAR(5011) Variable Length 4 CLR char loop 00:00:00.110 78

    500 10 100 VARCHAR(5011) Variable Length 5 CLR XML 00:00:00.193 125

    500 10 100 VARCHAR(5011) Variable Length 6 Traditional Tally TOP(0) 00:00:00.210 203

    500 10 100 VARCHAR(5011) Variable Length 7 Cursor 00:00:00.217 188

    500 10 100 VARCHAR(5011) Variable Length 8 Traditional Tally 00:00:00.440 219

    Greets

    Flo

    PS: I just formatted the code more to a C# look in my solution. DBAs writing C#, pew... 😀

    Flo, which test generator are you using for this. Just gotta make sure.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 271 through 285 (of 522 total)

You must be logged in to reply to this topic. Login to reply