May 13, 2008 at 8:59 am
Splitting a string may be trivial in T-SQL, but that doesn't mean it's the best tool for the job. The CLR is vastly faster at splitting strings, and the gap grows with the length of the string. I have written a set of stored procedures that allows my application to do batch updates instead of always calling CRUD procs one at a time. I see a lot of examples of using XML for multi-valued parameters, but for my money the CLR is by far superior and more scalable. I wrote an article comparing string splitting between T-SQL and the CLR, and the differences were very interesting: due to a small bug in one of my test cases, I found I was able to split a string in seconds using the CLR that took 17 minutes using T-SQL. Even for trivial splits the CLR is faster. Here's a link to the article; the test results are at the bottom: http://www.codeproject.com/KB/database/TableValuedFnsAsArrays.aspx
May 13, 2008 at 9:33 am
developmentalmadness (5/13/2008)
You're proving one of my points: that CLR functions can be, and often are, much faster than T-SQL functions. That being said - your article isn't using anywhere near the fastest method for splitting strings in T-SQL.
Try this one on for size:
declare @g datetime
DECLARE @Parameter VARCHAR(8000)
SET @Parameter = replicate('Element01,Element02,Element03,Element04,Element05',150)
--===== Create a table to store the results in
DECLARE @Elements TABLE
(
Number INT IDENTITY(1,1), --Order it appears in original string
Value VARCHAR(8000) --The string value of the element
)
set @g=getdate();
--===== Add start and end commas to the Parameter so we can handle
-- single elements
SET @Parameter = ','+@Parameter +','
--===== Join the Tally table to the string at the character level and
-- when we find a comma, insert what's between that comma and
-- the next comma into the Elements table
INSERT INTO @Elements (Value)
SELECT SUBSTRING(@Parameter,N+1,CHARINDEX(',',@Parameter,N+1)-N-1)
FROM dbo.Tally
WHERE N < LEN(@Parameter)
AND SUBSTRING(@Parameter,N,1) = ','
--Notice how we find the comma
SELECT * FROM @Elements
select DATEDIFF(MS,@G,GETDATE())
set @g=getdate();
select * from dbo.regexsplit(@parameter,',')
select DATEDIFF(MS,@G,GETDATE())
At the very least the ratio is a little more balanced. But still - as was mentioned before - Regex (Split is one of the methods that uses Regex) tends to perform VERY well when used this way, and it then becomes a factor of how long the string is.
(By the way - Jeff - who just waded into the conversation - has a very good article on the Tally table method, which I unceremoniously stole for this purpose).
http://www.sqlservercentral.com/articles/TSQL/62867/
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 13, 2008 at 9:36 am
developmentalmadness (5/13/2008)
I agree that a CLR split is probably faster than an XML split... heh... just about everything is faster than an XML split. 😀
I took a peek at your fine article and didn't see the "big" test you did... and, you tested against a WHILE loop, which I'd never use for production code...
Would it be possible for you to attach the big string that took 17 minutes, please? I'd like to give it a whirl. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 10:13 am
I agree that a CLR split is probably faster than an XML split... heh... just about everything is faster than an XML split.
Yeah, comparing CLR w/ XML wasn't really the most convincing argument. I'm really not a fan of XML in the database - I'm just not comfortable with the performance hit.
I took a peek at your fine article and didn't see the "big" test you did... and, you tested against a WHILE loop, which I'd never use for production code...
At the bottom of the article you'll see this:
Test Project (1000 Iterations)
- T-SQL 17 min 20 sec 765 ms
- CLR 0 min 3 sec 453 ms
That's what I was referring to. The "bug" that caused this was that I forgot to re-initialize my string between iterations, so the string grew very large by the final iteration. Each pass of the loop was supposed to create a string of 25 integers separated by commas, but because the string was never reset it gained 25 more items with each of the 1,000 iterations; by the last one it held 25,000 integers, and across the whole run the test split 25 x (1 + 2 + ... + 1,000), roughly 12.5 million elements in total. Feel free to correct my math, but either way the string gets pretty large. It would be easier for you to construct the string yourself programmatically; I can't imagine the flaming I'd get for posting that string on the forum :D.
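If you want to reproduce it, here's a rough C# sketch (hypothetical - not the original test code) of what the buggy loop effectively did:

using System.Text;

class StringGrowthRepro
{
    static void Main()
    {
        // The bug: the builder is never reset, so each of the 1,000
        // iterations appends 25 more comma-separated integers.
        StringBuilder sb = new StringBuilder();
        int next = 0;
        for (int iteration = 1; iteration <= 1000; iteration++)
        {
            for (int i = 0; i < 25; i++)
            {
                if (sb.Length > 0) sb.Append(',');
                sb.Append(next++);
            }
            // ...the original test called the split routine here with
            // sb.ToString(), on an ever-growing string...
        }
        // By the last iteration the string holds 25,000 integers.
    }
}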
As for production code, I wasn't trying to write something for production. I agree - when I see database hits from within any loop in an application, it's like nails on a chalkboard. I don't even like to use cursors, which is why I use a CLR split to do batch updates when I need to update more than one record.
The goal of the test project in the article was to demonstrate raw speed, not a true concurrent workload - that would have been more true to the real world. But I do believe that if the database is doing nothing else but processing 1,000 - 10,000 sequential requests, the first method to finish can still be declared faster and more scalable, barring any huge difference in resource consumption.
May 13, 2008 at 10:25 am
Regarding string splitting in CLR vs. T-SQL, check out Paul Nielsen's blog. One of his lessons learned is exactly this issue: http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/12/10-lessons-from-35k-tps.aspx
Would be interesting to see how the hardware profiles influence the decision to go one way or the other.
TroyK
May 13, 2008 at 10:30 am
Matt,
Thanks for that article link. I seem to remember seeing a tally table at some point when I first started programming, but I didn't understand it, moved on, and forgot about it. Now that I understand it, I have even fewer reasons to use loops in T-SQL.
Thanks!
May 13, 2008 at 10:49 am
cs_troyk (5/13/2008)
Regarding string splitting in CLR vs. T-SQL, check out Paul Nielsen's blog. One of his lessons learned is exactly this issue: http://sqlblog.com/blogs/paul_nielsen/archive/2007/12/12/10-lessons-from-35k-tps.aspx
Would be interesting to see how the hardware profiles influence the decision to go one way or the other.
TroyK
Wow, that really goes to prove that you gotta test accurately. After the testing I did, I assumed the CLR was just so much faster that it could never have a problem scaling - but then, I've never written anything that had to run at 35K tps. The concurrency issue really exposes a problem with splitting strings. I'm really curious what the tipping point is.
May 13, 2008 at 11:12 am
JohnG (5/13/2008)
inexplicably not able to handle outputting Varchar(MAX), being stuck instead @ varchar(8000), so you can't play with the full spectrum should you need to.
That's a real bad limitation! Thanks for the heads-up. In our case it would be even worse as we support UNICODE (NVARCHAR) which then would be limited to 4,000 characters.
Interesting limitation, as strings in .NET C# are UNICODE with no such limit! Why the screwy limitation when embedded into SQL Server?
This is certainly not true; this limitation does not exist. CLR functions can receive and output NVARCHAR(MAX). In fact, you only have two string options: NVARCHAR(4000) handled as SqlString and NVARCHAR(MAX) handled as SqlChars. The trick with SqlChars is that it is a char[] (Character Array) that, in most cases, needs to be converted to a string via something like the following:
string __StringVersion = new String(InputParameter.Value);
The other trick is how to return a SqlChars. I do not have the code in front of me that I use, but I believe it is something along the lines of:
return new SqlChars(__StringVersion.ToCharArray());
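Putting those two pieces together, a complete function would look something like this (a purely hypothetical example, written from memory - not something out of SQL#):

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Hypothetical example: accepts and returns NVARCHAR(MAX) via SqlChars
    [SqlFunction]
    public static SqlChars ReverseString(SqlChars InputParameter)
    {
        if (InputParameter.IsNull)
            return SqlChars.Null;

        // SqlChars wraps a char[], so build a string from it first
        string __StringVersion = new String(InputParameter.Value);

        char[] __Reversed = __StringVersion.ToCharArray();
        Array.Reverse(__Reversed);

        // ...and hand a char[] back on the way out
        return new SqlChars(__Reversed);
    }
}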
Now, to the question at hand: Why aren't people using the CLR? (which then seems to imply to many people "Should the CLR even exist?")
Well, originally the debate seemed to center on when it was appropriate to use the CLR, with the idea being that the logic of a process would be either 100% T-SQL or 100% CLR. In my experience there does not seem to be a huge need for 100% CLR-based logic; the real value is in extending what T-SQL can do by exposing the power of a real programming language. The typical examples are Regular Expressions and string splitting. Those are good examples, and a string split may or may not be more performant in the CLR vs. native T-SQL (I have seen cases where NVARCHAR(4000) / SqlString is faster but NVARCHAR(MAX) / SqlChars is slower), but there is so much more that can be done. Let's face it: T-SQL is a pretty sparse language. For this reason, I created a CLR library of functions to augment T-SQL called SQL# (http://www.SQLsharp.com/). True, some of the functions I created are just thought experiments, but some are quite handy.
While some argue that logic shouldn't be in the DB anyway, the fact is that is how it goes for many projects (LINQ might get newer projects to re-think this), so why not give the DB programmer a richer set of tools to use? Some will argue that it is a security risk, but the security is only what you give it: the CLR need not extend outside of the database, and within the database it only has the rights given to it, just like any proc. Some will say that it is not portable since other DBs do not have it (Oracle, DB2, PostgreSQL, SQL 2000, etc.). I find this very unconvincing since most projects do not port to another DB; I have heard of it happening but never actually seen it. And if it does happen, then fine - it can be dealt with along with all of the other issues arising from porting datatypes, etc. But when you have a project that is based in a particular vendor's DB and will not exist on another, then by choosing to stay ANSI-compliant and not making use of the DB's extended functions and datatypes, you lose out on the real power of that platform. No, SQLCLR is not a be-all, end-all solution, but it certainly can help in many situations when used appropriately.
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 13, 2008 at 11:13 am
I went to a CLR session at DevTeach recently, and to boil down what the presenter said,
- the CLR is slower than T-SQL in most cases,
- has the ability to take down a server with poor code (and unless the DBA is an expert in the language the assembly was written in, it's not going to be caught)
- is not a feature that can be turned off once a single C# assembly is written; you'll be trying to close the barn door after the horse has left
I agree with the previous poster - this is a "we do it too" feature, trying to keep up with Oracle and their built in JVM. As an Oracle DBA, I found the JVM to be slow and not terribly useful. I would expect to find the same from the CLR.
One last point - you are version-locked with regard to the run-time. Are bug fixes and general improvements ported to the SQL Server CLR?
May 13, 2008 at 11:23 am
Pardon the redundancy, but does anyone have knowledge of, or better yet experience with, porting 32-bit CLR functions to a 64-bit SQL Server?
I'm pretty sure that the 64-bit embedded Common Language Runtime is not going to recompile the DLLs. Will the server still be able to run the 32-bit runtime, or will the 64-bit runtime support the 32-bit DLLs?
May 13, 2008 at 11:33 am
Because CLR assemblies are not compiled to machine language, there is no problem when moving them between 32-bit and 64-bit. The assemblies will be JIT-compiled by the runtime when they are loaded, so you don't need to do anything differently regardless of the architecture (assuming the assembly targets MSIL / "Any CPU" rather than a specific platform).
May 13, 2008 at 11:36 am
Tom Garth (5/13/2008)
Pardon the redundancy, but does anyone have knowledge of, or better yet experience with, porting 32-bit CLR functions to a 64-bit SQL Server? I'm pretty sure that the 64-bit embedded Common Language Runtime is not going to recompile the DLLs. Will the server still be able to run the 32-bit runtime, or will the 64-bit runtime support the 32-bit DLLs?
Unfortunately I do not know for certain, since I do not have 64-bit hardware to play with, BUT my assumption is that it will not matter, since the beauty of .NET (and also of Java) is that assemblies do not technically run on the hardware; they are intermediate code (p-code in .NET, byte-code in Java) that is executed by a run-time that does run on the hardware. This is where the CLR (Common Language Runtime) comes in: it runs natively and handles any p-code (the Assembly) given to it regardless of language (C#, J#, VB.NET, etc.). So the issues of portability are on the shoulders of whoever provides the run-time (in the .NET case, Micro$oft), and there are several vendors producing JVMs (Java Virtual Machines) for Java. I hope that I am correct in all of this :Whistling:.
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 13, 2008 at 11:37 am
- the Table-valued functions need to materialize their entire recordset before being able to return anything, so they can quickly "swamp" the memory area.
This is true for T-SQL TVFs but not for CLR TVFs. The CLR actually "streams" the results. From BOL:
Differences Between Transact-SQL and CLR Table-Valued Functions
Transact-SQL TVFs materialize the results of calling the function into an intermediate table. Since they use an intermediate table, they can support constraints and unique indexes over the results. These features can be extremely useful when large results are returned.
In contrast, CLR TVFs represent a streaming alternative. There is no requirement that the entire set of results be materialized in a single table. The IEnumerable object returned by the managed function is directly called by the execution plan of the query that calls the TVF, and the results are consumed in an incremental manner. This streaming model ensures that results can be consumed immediately after the first row is available, instead of waiting for the entire table to be populated. It is also a better alternative if you have very large numbers of rows returned, because they do not have to be materialized in memory as a whole. For example, a managed TVF could be used to parse a text file and return each line as a row.
http://msdn.microsoft.com/en-us/library/ms131103.aspx
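To make that concrete, a streaming CLR TVF is just a C# iterator plus a fill-row method; here's a rough sketch (hypothetical names, not production code):

using System;
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Hypothetical streaming split: each yield return hands a row to the
    // calling query's plan as it is produced, instead of materializing an
    // intermediate table the way a T-SQL TVF does.
    [SqlFunction(FillRowMethodName = "FillSplitRow",
                 TableDefinition = "Value NVARCHAR(4000)")]
    public static IEnumerable SplitString(SqlString Input, SqlString Delimiter)
    {
        if (Input.IsNull || Delimiter.IsNull)
            yield break;

        foreach (string __Element in Input.Value.Split(
            new string[] { Delimiter.Value }, StringSplitOptions.None))
        {
            yield return __Element;
        }
    }

    // Called once per yielded object to populate the output columns
    public static void FillSplitRow(object Row, out SqlString Value)
    {
        Value = new SqlString((string)Row);
    }
}

(String.Split does build its array up front, but the rows themselves still flow to the caller incrementally rather than landing in an intermediate table first.)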
May 13, 2008 at 11:41 am
Solomon Rutzky (5/13/2008)
JohnG (5/13/2008)
inexplicably not able to handle outputting Varchar(MAX), being stuck instead @ varchar(8000), so you can't play with the full spectrum should you need to.
That's a real bad limitation! Thanks for the heads-up. In our case it would be even worse as we support UNICODE (NVARCHAR) which then would be limited to 4,000 characters.
Interesting limitation, as strings in .NET C# are UNICODE with no such limit! Why the screwy limitation when embedded into SQL Server?
This is certainly not true; this limitation does not exist. CLR functions can receive and output NVARCHAR(MAX). In fact, you only have two string options: NVARCHAR(4000) handled as SqlString and NVARCHAR(MAX) handled as SqlChars. The trick with SqlChars is that it is a char[] (Character Array) that, in most cases, needs to be converted to a string via something like the following:
string __StringVersion = new String(InputParameter.Value);
The other trick is how to return a SqlChars. I do not have the code in front of me that I use, but I believe it is something along the lines of:
return new SqlChars(__StringVersion.ToCharArray());
All right - let me wipe the egg off my face for a second. You're right - I overlooked SqlChars as an output type for functions. I confused this with the user-defined aggregate limit, which DOES cap output at 8,000 bytes (thanks to its MaxByteSize property).
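For anyone following along, that ceiling is declared right on the aggregate; a bare-bones (purely illustrative) concatenating aggregate shows where it lives:

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(
    Format.UserDefined,
    MaxByteSize = 8000)]   // <-- the 8,000-byte limit in question
public struct ConcatAggregate : IBinarySerialize
{
    private System.Text.StringBuilder _sb;

    public void Init() { _sb = new System.Text.StringBuilder(); }

    public void Accumulate(SqlString Value)
    {
        if (Value.IsNull) return;
        if (_sb.Length > 0) _sb.Append(',');
        _sb.Append(Value.Value);
    }

    public void Merge(ConcatAggregate Other) { _sb.Append(Other._sb.ToString()); }

    public SqlString Terminate() { return new SqlString(_sb.ToString()); }

    // Format.UserDefined requires custom serialization, which is also
    // where the MaxByteSize budget gets consumed
    public void Read(System.IO.BinaryReader r) { _sb = new System.Text.StringBuilder(r.ReadString()); }
    public void Write(System.IO.BinaryWriter w) { w.Write(_sb.ToString()); }
}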
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
May 13, 2008 at 11:43 am
developmentalmadness (5/13/2008)
- the Table-valued functions need to materialize their entire recordset before being able to return anything, so they can quickly "swamp" the memory area.
This is true for T-SQL TVFs but not for CLR TVFs. The CLR actually "streams" the results. From BOL:
Differences Between Transact-SQL and CLR Table-Valued Functions
Transact-SQL TVFs materialize the results of calling the function into an intermediate table. Since they use an intermediate table, they can support constraints and unique indexes over the results. These features can be extremely useful when large results are returned.
In contrast, CLR TVFs represent a streaming alternative. There is no requirement that the entire set of results be materialized in a single table. The IEnumerable object returned by the managed function is directly called by the execution plan of the query that calls the TVF, and the results are consumed in an incremental manner. This streaming model ensures that results can be consumed immediately after the first row is available, instead of waiting for the entire table to be populated. It is also a better alternative if you have very large numbers of rows returned, because they do not have to be materialized in memory as a whole. For example, a managed TVF could be used to parse a text file and return each line as a row.
http://msdn.microsoft.com/en-us/library/ms131103.aspx
I do read the above documentation the same way you do, but it is somewhat true (maybe just in a technical sense) that CLR TVFs DO unfortunately need to build the entire result set before returning it. This might be semantics in terms of how the internals of SQL Server merge those results with the calling query, but in the .NET code you do need to hold your entire data structure while the function is executing, and only at the very end does the function return the results. So yes, it consumes as much memory as the result set needs before it can return anything, even if it does so more efficiently than a T-SQL TVF. For this reason I have sometimes used a CLR proc instead, which can send each result row as it is created and hence consume very little memory.
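I don't have my exact code handy either, but the proc-based approach looks roughly like this (hypothetical names, sketched from memory):

using System.Data;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    // Hypothetical streaming proc: each row is pushed down the pipe as
    // soon as it is built, so nothing beyond the input string and one
    // row buffer accumulates in memory
    [SqlProcedure]
    public static void StreamSplit(SqlString Input, SqlString Delimiter)
    {
        SqlDataRecord __Record = new SqlDataRecord(
            new SqlMetaData("Value", SqlDbType.NVarChar, 4000));

        SqlContext.Pipe.SendResultsStart(__Record);   // open the result set

        string __Str = Input.Value;
        string __Delim = Delimiter.Value;
        int __Start = 0;
        while (__Start <= __Str.Length)
        {
            int __Pos = __Str.IndexOf(__Delim, __Start);
            if (__Pos < 0) __Pos = __Str.Length;

            __Record.SetSqlString(0, new SqlString(
                __Str.Substring(__Start, __Pos - __Start)));
            SqlContext.Pipe.SendResultsRow(__Record); // row leaves immediately

            __Start = __Pos + __Delim.Length;
        }

        SqlContext.Pipe.SendResultsEnd();             // close the result set
    }
}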
SQL# — https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR