April 27, 2009 at 2:40 pm
Sorry, but I think I cannot. Currently I have only a x64 system. If you have a x64 system, sure!
This works on both 32 and 64 bit:
(Update: Two people asked me to remove the big script, so go compile your own! :-D)
--
Adam Machanic
whoisactive
April 27, 2009 at 3:30 pm
Thank you to whomever fixed the formatting on this page!
I had to scroll right about 50 metres to find the reply button earlier... 🙂
@Flo: Yes scripting is easy, but if the destination server doesn't have CLR enabled...and so on. I was thinking of those occasions where one needs to split some strings quickly in a SSMS session - not for code that is going to be published and re-used. That's where I use the ROW_NUMBER tally, or a recursive syntax.
Adam Machanic (4/27/2009)
One of the key things is making sure that the output is NVARCHAR(N), not NVARCHAR(MAX)--as it turns out, returning NVARCHAR(MAX) from a SQLCLR TVF is quite a bit slower (something to do with how the data is allocated internally, and I don't pretend to fully understand it).
Yes, long, long, ago, at the start of this thread, Flo's original puzzler involved NVARCHAR(MAX), and apparently this was a core requirement. We seem to have moved back to finding optimal solutions for non-LOB parameter examples in recent days.
The QO/engine in general seems to over-compensate for the potential size and different storage pattern for LOBs - and eliminating parallel plans through use of MAX does not help either.
Last time I raised this (serial plans for CLR routines receiving a LOB) with PSS, it eventually came back as a 'bug' - to be fixed in Katmai (the usual reply at the time). As far as I know, it is still outstanding, and passing a LOB to a CLR routine which otherwise would be parallelizable, always results in a fully serial plan (not just a serial 'island' in the plan like with backward index scans, for example).
You and Flo have done some great work on this overnight! (my time zone).
Cheers,
Paul
April 27, 2009 at 3:33 pm
Paul White (4/27/2009)
Yes, long, long, ago, at the start of this thread, Flo's original puzzler involved NVARCHAR(MAX), and apparently this was a core requirement. We seem to have moved back to finding optimal solutions for non-LOB parameter examples in recent days.
NVARCHAR(MAX) for both input and output? What is the use case? Just to be clear, the issue is only NVARCHAR(MAX) on output -- it's fine to have it as an input.
--
Adam Machanic
whoisactive
April 27, 2009 at 3:41 pm
(Update: Two people asked me to remove the big script, so go compile your own! [BigGrin] )
Actually, I just asked Adam to put it as an attachment instead, rather than a code block. I'm still no wiser as to how to do the
CREATE FUNCTION (sigh)
..Goodnight folks.
Best wishes,
Phil Factor
April 27, 2009 at 3:42 pm
Adam Machanic (4/27/2009)
NVARCHAR(MAX) for both input and output? What is the use case? Just to be clear, the issue is only NVARCHAR(MAX) on output -- it's fine to have it as an input.
The original requirement was on the input IIRC. The details are in the first few posts on this thread. I guess Flo would be needed to confirm the reason for the requirement.
Does having a MAX type on the input not force a serial plan for you then? In some cases, generating a parallel plan would be essential for best performance. A T-SQL solution that was not wrapped in a UDF would be capable of a parallel plan, if the QO so chooses. For certain cases, this could give T-SQL an edge (even if it required running raw script or using a stored procedure).
None of that changes your original point about MAX datatypes on output killing performance of course.
Cheers,
Paul
April 27, 2009 at 3:42 pm
Let me clarify something...
The requirement that the functions need to return those items as NVARCHAR(MAX) is not actual any more. I have noticed that it is not possible to create "the split function" which handles everything... long time ago in the valleys and forests of this thread.
I'm currently testing two different requirements:
One function which returns NVARCHAR(4000) and one that returns NVARCHAR(MAX) for special cases (as parsing Moby Dick :-D).
Greets
Flo
April 27, 2009 at 3:45 pm
Phil Factor (4/27/2009)
(Update: Two people asked me to remove the big script, so go compile your own! [BigGrin] )
Actually, I just asked Adam to put it as an attachment instead, rather than a code block. I'm still no wiser as to how to do the
CREATE FUNCTION (sigh)
..Goodnight folks.
Why don't you just use autodeploy?
Or...
CREATE FUNCTION [faster_split]
(
@instr nvarchar(MAX),
@delimiter nvarchar(4000)
)
RETURNS TABLE(output nvarchar(4000))
AS
EXTERNAL NAME [SqlClassLibrary].[UserDefinedFunctions].[faster_split]
--
Adam Machanic
whoisactive
April 27, 2009 at 3:46 pm
@Phil - Jeff convinced me that posting compiled code was a no-no. Even if tools like .NET reflector are available. Maybe Adam will post a script for you* - his bits and bytes should be trustworthy :laugh: :laugh: :laugh:
@Flo - Impressed that you are not asleep! What was the requirement for MAX originally, even if it no longer applies? Am intrigued.
My particular use case years ago was to make use of a (streaming ;-)) CLR compression function.
edit: *he did - still catching up...
April 27, 2009 at 4:21 pm
Paul White (4/27/2009)
@Flo - Impressed that you are not asleep!
I am impressed that you are already awake :laugh:
What was the requirement for MAX originally, even if it no longer applies? Am intrigued.
The initial requirement sounds as simple that it sounds just stupid. I think the best explanation seems to be "because I'm a perfectionist".
I'm working (currently worked) on a script to script out a complete database DDL (for fun and to publish here). The script has currently about 1,500 lines. Sure, it covers procedures, functions, XML-Schema-Collections, ... Since I completely don't know how people write their code I need to be able to script lines as NVARCHAR(MAX). Just have a look to my post here. If I don't cut the helper string for the VARBINARY transformation into pieces the line would be about 600 characters. That's the reason for my first test data from sys.sql_modules :Whistling:
But this requirement is far away. I learned in this thread that "string splitting" is a very own science. As Lynn wrote in one post he works currently in a project which requires very much string splitting. I think this is a very common requirement which almost everybody already needed. It's just annoying if you google for "TSQL split string" the first 1,500,000 solutions how how great those cursors are. Most of them work even worse than my simple first cursor...
For the moment the DDL script is sleeping because I think this topic is much more important for people. It is a great discussion with all of you great people and I think almost everybody learned something new here (I'm quiet sure that I learned the most 🙂 ).
Since there was no production problem I had, after this input I could show my chief some really great new things and it causes that some parts of our applications will run about 100 times faster in future. So I also had a huge benefit for my business.
I'm a younger guy who tries to become better. I know that I'm one of the best developers in my company. But I also know there are many people who are much better than me - like here! So I'm very thankful for every answer to my questions, corrections of other posts or just some tips!
I really hope that you are not angry because there was initially no production problem to be fixed...
Thanks again to all of you!
Flo
April 27, 2009 at 4:55 pm
Flo...
Since there was no production problem I had, after this input I could show my chief some really great new things and it causes that some parts of our applications will run about 100 times faster in future. So I also had a huge benefit for my business.
I'm a younger guy who tries to become better. I know that I'm one of the best developers in my company. But I also know there are many people who are much better than me - like here! So I'm very thankful for every answer to my questions, corrections of other posts or just some tips!
I really hope that you are not angry because there was initially no production problem to be fixed...
SSC isn't just about helping people fix production problems, it is also about helping people learn more about SQL Server. What is happening on this thread is a testament to this communities commitment to helping people learn and improve their skills and abilities. There is absolutely no reason to apologize for starting this thread. It is an awesome thread and we can't wait (but will have to) for you to write this whole thing up as an article (or two or three).
April 27, 2009 at 5:01 pm
Hey good answers Flo.
Never be afraid of posting theoretical or odd examples of stuff without a production requirement!
And yes there is always someone that knows more about a given subject...which is good because there is always more to know!
BTW camelCase locally and Pascal Case publically ftw!
I do tend to use underscores at module level, but that's probably just an old bad habit.
Cheers,
Paul
April 27, 2009 at 5:01 pm
Hi Lynn
Thanks for your feedback. It counts a lot for me!
Greets
Flo
April 27, 2009 at 5:06 pm
Paul White (4/27/2009)
Hey good answers Flo.Never be afraid of posting theoretical or odd examples of stuff without a production requirement!
And yes there is always someone that knows more about a given subject...which is good because there is always more to know!
Also thanks for your feedback! I think it would be worst to know there is nothing more to know!
BTW camelCase locally and Pascal Case publically ftw!
I do tend to use underscores at module level, but that's probably just an old bad habit.
...and I learned from these habits. I like this syntax!
Greets
Flo
April 27, 2009 at 6:30 pm
Adam Machanic (4/27/2009)
Hi all,I have posted a more scalable SQLCLR string split method here:
Thanks to Paul White for pointing me to this thread. Please consider testing all of these methods UNDER LOAD, using either SQLQueryStress, as I did, or another load tool. The results will be quite interesting, I promise. 😀
Howdy! Long time no see.
Have done any testing of the methods folks have been showing on this thread?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 27, 2009 at 6:36 pm
Florian Reischl (4/27/2009)
Hi AdamThe String Enumerator (I just called it "String Enumerator" for better understanding what it does) performs very good! Especially for more rows. I just tried with some of my test cases:
ItemLength ItemCount ItemRows ItemType ItemDescription Ranking Module Duration(ms) CpuTime
----------- ----------- ----------- -------------- ---------------- -------- -------------------------- ------------ -----------
10 10 10 VARCHAR(111) Fixed Length 1 CLR Simple 00:00:00.010 0
10 10 10 VARCHAR(111) Fixed Length 2 CLR RegEx 00:00:00.010 0
10 10 10 VARCHAR(111) Fixed Length 3 CLR String Enumerator 00:00:00.010 16
10 10 10 VARCHAR(111) Fixed Length 4 Traditional Tally TOP(0) 00:00:00.010 16
10 10 10 VARCHAR(111) Fixed Length 5 Cursor 00:00:00.033 15
10 10 10 VARCHAR(111) Fixed Length 6 CLR char loop 00:00:00.037 0
10 10 10 VARCHAR(111) Fixed Length 7 CLR XML 00:00:00.053 31
10 10 10 VARCHAR(111) Fixed Length 8 Traditional Tally 00:00:00.113 0
10 10 1000 VARCHAR(111) Fixed Length 1 Traditional Tally TOP(0) 00:00:00.203 156
10 10 1000 VARCHAR(111) Fixed Length 2 CLR String Enumerator 00:00:00.443 422
10 10 1000 VARCHAR(111) Fixed Length 3 CLR Simple 00:00:00.467 468
10 10 1000 VARCHAR(111) Fixed Length 4 CLR RegEx 00:00:00.473 468
10 10 1000 VARCHAR(111) Fixed Length 5 CLR char loop 00:00:00.533 516
10 10 1000 VARCHAR(111) Fixed Length 6 Traditional Tally 00:00:00.747 188
10 10 1000 VARCHAR(111) Fixed Length 7 CLR XML 00:00:00.987 954
10 10 1000 VARCHAR(111) Fixed Length 8 Cursor 00:00:01.027 969
100 10 10 VARCHAR(1011) Fixed Length 1 CLR Simple 00:00:00.010 16
100 10 10 VARCHAR(1011) Fixed Length 2 CLR RegEx 00:00:00.010 15
100 10 10 VARCHAR(1011) Fixed Length 3 CLR String Enumerator 00:00:00.013 16
100 10 10 VARCHAR(1011) Fixed Length 4 Traditional Tally TOP(0) 00:00:00.017 16
100 10 10 VARCHAR(1011) Fixed Length 5 CLR char loop 00:00:00.027 0
100 10 10 VARCHAR(1011) Fixed Length 6 Cursor 00:00:00.040 16
100 10 10 VARCHAR(1011) Fixed Length 7 CLR XML 00:00:00.070 15
100 10 10 VARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:00.370 31
100 10 100 VARCHAR(1011) Fixed Length 1 CLR String Enumerator 00:00:00.073 47
100 10 100 VARCHAR(1011) Fixed Length 2 CLR char loop 00:00:00.087 63
100 10 100 VARCHAR(1011) Fixed Length 3 CLR RegEx 00:00:00.087 62
100 10 100 VARCHAR(1011) Fixed Length 4 CLR Simple 00:00:00.090 78
100 10 100 VARCHAR(1011) Fixed Length 5 Traditional Tally TOP(0) 00:00:00.110 94
100 10 100 VARCHAR(1011) Fixed Length 6 CLR XML 00:00:00.150 109
100 10 100 VARCHAR(1011) Fixed Length 7 Cursor 00:00:00.167 140
100 10 100 VARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:00.417 141
100 10 100 NVARCHAR(1011) Fixed Length 1 CLR String Enumerator 00:00:00.057 47
100 10 100 NVARCHAR(1011) Fixed Length 2 CLR Simple 00:00:00.070 63
100 10 100 NVARCHAR(1011) Fixed Length 3 CLR RegEx 00:00:00.070 62
100 10 100 NVARCHAR(1011) Fixed Length 4 Traditional Tally TOP(0) 00:00:00.143 110
100 10 100 NVARCHAR(1011) Fixed Length 5 CLR char loop 00:00:00.147 79
100 10 100 NVARCHAR(1011) Fixed Length 6 CLR XML 00:00:00.150 125
100 10 100 NVARCHAR(1011) Fixed Length 7 Cursor 00:00:00.180 141
100 10 100 NVARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:00.720 109
100 10 1000 NVARCHAR(1011) Fixed Length 1 CLR String Enumerator 00:00:00.567 532
100 10 1000 NVARCHAR(1011) Fixed Length 2 CLR char loop 00:00:00.670 547
100 10 1000 NVARCHAR(1011) Fixed Length 3 CLR Simple 00:00:00.677 609
100 10 1000 NVARCHAR(1011) Fixed Length 4 CLR RegEx 00:00:00.717 641
100 10 1000 NVARCHAR(1011) Fixed Length 5 Traditional Tally TOP(0) 00:00:01.313 1172
100 10 1000 NVARCHAR(1011) Fixed Length 6 Cursor 00:00:01.350 1313
100 10 1000 NVARCHAR(1011) Fixed Length 7 CLR XML 00:00:01.517 1094
100 10 1000 NVARCHAR(1011) Fixed Length 8 Traditional Tally 00:00:01.837 1141
10 10 10 VARCHAR(111) Variable Length 1 Traditional Tally TOP(0) 00:00:00.007 0
10 10 10 VARCHAR(111) Variable Length 2 CLR RegEx 00:00:00.007 16
10 10 10 VARCHAR(111) Variable Length 3 CLR String Enumerator 00:00:00.007 16
10 10 10 VARCHAR(111) Variable Length 4 CLR Simple 00:00:00.010 0
10 10 10 VARCHAR(111) Variable Length 5 CLR char loop 00:00:00.020 0
10 10 10 VARCHAR(111) Variable Length 6 Cursor 00:00:00.030 15
10 10 10 VARCHAR(111) Variable Length 7 CLR XML 00:00:00.047 15
10 10 10 VARCHAR(111) Variable Length 8 Traditional Tally 00:00:00.097 0
10 10 1000 VARCHAR(111) Variable Length 1 Traditional Tally TOP(0) 00:00:00.110 109
10 10 1000 VARCHAR(111) Variable Length 2 CLR String Enumerator 00:00:00.453 453
10 10 1000 VARCHAR(111) Variable Length 3 CLR Simple 00:00:00.457 469
10 10 1000 VARCHAR(111) Variable Length 4 CLR RegEx 00:00:00.483 484
10 10 1000 VARCHAR(111) Variable Length 5 CLR char loop 00:00:00.537 484
10 10 1000 VARCHAR(111) Variable Length 6 Traditional Tally 00:00:00.537 141
10 10 1000 VARCHAR(111) Variable Length 7 Cursor 00:00:00.933 906
10 10 1000 VARCHAR(111) Variable Length 8 CLR XML 00:00:01.120 891
500 10 10 VARCHAR(5011) Variable Length 1 CLR String Enumerator 00:00:00.010 16
500 10 10 VARCHAR(5011) Variable Length 2 CLR Simple 00:00:00.013 15
500 10 10 VARCHAR(5011) Variable Length 3 CLR RegEx 00:00:00.013 0
500 10 10 VARCHAR(5011) Variable Length 4 CLR char loop 00:00:00.020 0
500 10 10 VARCHAR(5011) Variable Length 5 Traditional Tally TOP(0) 00:00:00.027 15
500 10 10 VARCHAR(5011) Variable Length 6 Cursor 00:00:00.043 15
500 10 10 VARCHAR(5011) Variable Length 7 CLR XML 00:00:00.087 16
500 10 10 VARCHAR(5011) Variable Length 8 Traditional Tally 00:00:00.560 32
500 10 100 VARCHAR(5011) Variable Length 1 CLR RegEx 00:00:00.073 62
500 10 100 VARCHAR(5011) Variable Length 2 CLR String Enumerator 00:00:00.073 47
500 10 100 VARCHAR(5011) Variable Length 3 CLR Simple 00:00:00.090 78
500 10 100 VARCHAR(5011) Variable Length 4 CLR char loop 00:00:00.110 78
500 10 100 VARCHAR(5011) Variable Length 5 CLR XML 00:00:00.193 125
500 10 100 VARCHAR(5011) Variable Length 6 Traditional Tally TOP(0) 00:00:00.210 203
500 10 100 VARCHAR(5011) Variable Length 7 Cursor 00:00:00.217 188
500 10 100 VARCHAR(5011) Variable Length 8 Traditional Tally 00:00:00.440 219
Greets
Flo
PS: I just formatted the code more to a C# look in my solution. DBAs writing C#, pew... 😀
Flo, which test generator are you using for this. Just gotta make sure.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 271 through 285 (of 522 total)
You must be logged in to reply to this topic. Login to reply