September 22, 2010 at 8:33 am
The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....
WHERE N BETWEEN 1 AND LEN(@Parameter)+1
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2010 at 8:37 am
Jeff Moden (9/22/2010)
Heh... Of course it doesn't handle Unicode correctly... the variables are all VARCHAR.
That's a bit circular! It doesn't handle Unicode because it doesn't? 😛
The parameters are indeed VARCHAR, my point is that there is no need to exclude Unicode data from your split function. My worry is that people may forget that it is not Unicode-compatible, and pass such a string to it. The function would appear to work correctly (no errors or warnings) but any un-mappable characters would be silently replaced by question marks. Scope for improvement?
September 22, 2010 at 8:44 am
Paul White NZ (9/22/2010)
Jeff Moden (9/22/2010)
Heh... Of course it doesn't handle Unicode correctly... the variables are all VARCHAR.That's a bit circular! It doesn't handle Unicode because it doesn't? 😛
The parameters are indeed VARCHAR, my point is that there is no need to exclude Unicode data from your split function. My worry is that people may forget that it is not Unicode-compatible, and pass such a string to it. The function would appear to work correctly (no errors or warnings) but any un-mappable characters would be silently replaced by question marks. Scope for improvement?
Heh... I thought you were just being obvious. 😛 Rough day yesterday.
I agree. I've repaired the +1 problem (it didn't make a noticeable difference in performance). I'll flip the code to NVARCHAR tonight and test it. Without getting into the world of MAX variables, I need to change the test a bit so it doesn't exceed 4k. Of course, the next thing I need to do after that is some MAX testing and we already know that MAX and Tally splits aren't famous for getting along.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 22, 2010 at 8:47 am
Jeff Moden (9/22/2010)
The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....WHERE N BETWEEN 1 AND LEN(@Parameter)+1
Indeed. Even with that correction, however, the LEN prevents correct results when using a space as the delimiter:
ALTER FUNCTION dbo.Split8KTally
(@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue
FROM dbo.Tally
WHERE N BETWEEN 1 AND LEN(@Parameter)+1
AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter; --Notice how we find the comma
GO
SELECT ItemNumber, ItemValue FROM dbo.Split8KTally('A,B,', ','); -- 3 rows (correct)
SELECT ItemNumber, ItemValue FROM dbo.Split8KTally('A B ', ' '); -- 2 rows (oops!)
Sorry. 🙁
September 22, 2010 at 9:01 am
Jeff Moden (9/22/2010)
The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....WHERE N BETWEEN 1 AND LEN(@Parameter)+1
Shouldn't that be
WHERE N < DATALENGTH(@Parameter) + 2
September 22, 2010 at 9:05 am
steve-893342 (9/22/2010)
Jeff Moden (9/22/2010)
The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....WHERE N BETWEEN 1 AND LEN(@Parameter)+1
Shouldn't that be
WHERE N < DATALENGTH(@Parameter) + 2
Unless @Parameter is NVARCHAR.
Could do LEN(@Parameter+'X')-1
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537September 22, 2010 at 9:45 am
Paul White NZ (9/22/2010)
This seems a good time to highlight some other issues too:. . .
The XML solution needs to be modified (as shown on Brad's blog) to handle entitization correctly:
DECLARE @CSV NVARCHAR(MAX) = N'A,B&C,,C,?,',
@Delim NVARCHAR(255) = N',';
SELECT S.ItemNumber,
S.ItemValue
FROM dbo.Split8KXMLBrad(@CSV, @Delim) S;
Other character values are invalid in XML and would cause the method to fail completely. The XML solution also emit NULLs instead of empty strings.
You're right about the NULL part. That's easily taken care of with an ISNULL call within the function.
As for the "problem" characters like ampersand, less-than, greater-than, etc, the function doesn't handle that because it was originallly designed to receive a list of INTEGER values. Still, to avoid a runtime error, it should handle those special characters.
I wrote about how to do this in a blog post last year:
http://bradsruminations.blogspot.com/2009/10/un-making-list-or-shredding-of-evidence.html
Here is the revised function... unfortunately adding the logic to handle the encoding/decoding adds more overhead and therefore more CPU, but that's the penalty you pay for doing "tricks" like this by using XML:
create function dbo.ufn_SplitNCharArrayXML
(
@List nvarchar(max)
,@Delimiter nchar(1)
)
returns table
as
return
select Item=isnull(Item,N'') --Handle NULL
from (select XMLEncoded=(select [*]=@List for xml path(''))) F0 --Handle Encodings
cross apply (select XMLString='<x>'+replace(XMLEncoded,@Delimiter,'</x><x>')+'</x>') F1
cross apply (select XMLList=cast(XMLString as xml).query('.')) F2
cross apply XMLList.nodes('/x') F3(XMLNode)
cross apply (select Item=XMLNode.value('(./text())[1]','nvarchar(4000)')) F4
select * from dbo.ufn_SplitNCharArrayXML(N'A,B&C,,C,?,',',')
/*
Item
----
A
B&C
C
?
(6 rows affected)
*/
--Brad
September 22, 2010 at 9:56 am
Paul White NZ (9/22/2010)
I must admit I'm a little confused about the conclusions being drawn overall.At the risk of stating the obvious, using a multi-statement TVF is exactly equivalent to splitting the input value into a table variable, and then running a separate join query.
In any case, the net effect is to produce a better plan by counteracting certain costing model limitations by introducing another - namely that table variables are estimated as producing one row (unless OPTION(RECOMPILE) is used).
There's nothing wrong with that of course - but it does mean that one has to choose between Tally implementations depending on how large the input set is.
EXACTLY... There doesn't seem to be a one-stop-shopping function that does it all.
September 22, 2010 at 10:36 am
Brad Schulz (9/22/2010)
You're right about the NULL part. That's easily taken care of with an ISNULL call within the function.
Certainly.
As for the "problem" characters like ampersand, less-than, greater-than, etc, the function doesn't handle that because it was originallly designed to receive a list of INTEGER values. Still, to avoid a runtime error, it should handle those special characters. I wrote about how to do this in a blog post last year: http://bradsruminations.blogspot.com/2009/10/un-making-list-or-shredding-of-evidence.html
I know, I referred to your blog in my previous post. Maybe I should have been more explicit about that.
Here is the revised function... unfortunately adding the logic to handle the encoding/decoding adds more overhead and therefore more CPU, but that's the penalty you pay for doing "tricks" like this by using XML:
Yep, and that was my main reason for mentioning it. To make your method generally applicable, we would need to use the slower method - quite how much slower it is I don't know. I don't think too many people split strings containing invalid XML characters (those that cannot be entitized-around) but I guess the possibility exists.
Paul
September 22, 2010 at 10:40 am
Brad Schulz (9/22/2010)
EXACTLY... There doesn't seem to be a one-stop-shopping function that does it all.
For my money, a production-quality SQLCLR solution is closest, since it produces the best/very good performance for all the cases we have considered, and handles very large strings and Unicode as well.
Unusual cases might require a query hint or other minimal manual tuning, but then that's true of most queries. Is that fair?
September 22, 2010 at 10:52 am
Paul White NZ (9/22/2010)
Brad Schulz (9/22/2010)
EXACTLY... There doesn't seem to be a one-stop-shopping function that does it all.For my money, a production-quality SQLCLR solution is closest, since it produces the best/very good performance for all the cases we have considered, and handles very large strings and Unicode as well.
Unusual cases might require a query hint or other minimal manual tuning, but then that's true of most queries. Is that fair?
Yes, that's fair. I concede that the CLR approach seems to be the best one-stop-shopping approach... it does perform... the 100MB memory grant thing bugs me though.
If CLR is not available, and if any lists you split are going to be less than 100 items (which I think is the case 99.9% of the time), then I would think that the RBAR approach (or any kind of multi-line function approach) is really the best (in SQL2008 that is). 😀
September 22, 2010 at 11:09 am
Brad Schulz (9/22/2010)
Yes, that's fair. I concede that the CLR approach seems to be the best one-stop-shopping approach... it does perform... the 100MB memory grant thing bugs me though.
Me too, but there are robust ways to avoid that. I hope that future releases will have better support for costing TVFs - both CLR and T-SQL.
If CLR is not available, and if any lists you split are going to be less than 100 items (which I think is the case 99.9% of the time), then I would think that the RBAR approach (or any kind of multi-line function approach) is really the best (in SQL2008 that is). 😀
You are probably right. The lack of scalability is a concern, though.
I think I will stop violently agreeing with you now. 😎
September 22, 2010 at 11:30 am
Paul White NZ (9/22/2010)
You are probably right. The lack of scalability is a concern, though.I think I will stop violently agreeing with you now. 😎
:laugh: :laugh: :laugh: :laugh: :laugh:
September 22, 2010 at 12:24 pm
Mark-101232 (9/22/2010)
steve-893342 (9/22/2010)
Jeff Moden (9/22/2010)
The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....WHERE N BETWEEN 1 AND LEN(@Parameter)+1
Shouldn't that be
WHERE N < DATALENGTH(@Parameter) + 2
Unless @Parameter is NVARCHAR.
Could do LEN(@Parameter+'X')-1
Could you write it like this to accommodate the NVARCHAR?
WHERE N < LEN(@Parameter + 'XX')
or perhaps
WHERE N < DATALENGTH(@Parameter)/2 + 2
September 22, 2010 at 6:54 pm
steve-893342 (9/22/2010)
Jeff Moden (9/22/2010)
The problem with the Tally Table related splits is that I forgot :blush: to add 1 to the length....WHERE N BETWEEN 1 AND LEN(@Parameter)+1
Shouldn't that be
WHERE N < DATALENGTH(@Parameter) + 2
You can if you want but the parts in the query that use t.N don't ever include both the leading comma and the trailing comma so it's not necessary.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 166 through 180 (of 214 total)
You must be logged in to reply to this topic. Login to reply