February 14, 2011 at 4:28 pm
Jeff Moden (2/14/2011)
The reason for DATALENGTH in the function posted on this thread is so that you can use a space as a delimiter. It's only important if you have trailing delimiters with nothing inbetween but I wanted to make sure it was handled.The code also works fine (with some modification for DATALENGTH()/2) and fast for NVARCHAR(4000). I've only started on testing against any of the MAX datatypes. Typically, just using one of the MAX datatypes in a JOIN, as is usually done with splitters of this nature, immediately causes the code to run twice as slow allowing even (ugh!) recursive CTE's to outstripe joined splitters. However, this runs so bloody fast now (thanks to there being NO concatenation), I'm hoping it'll keep up with a rock solid well written While loop on the MAX datatypes. I just haven't had the time to do that testing, yet.
Fair enough. You've commented twice that the lack of concatonation has really sped this up. I like the IFNULL(NULLIF()) bit you built in, it looks slick. Its surprising that it's that powerful though. I'll have to go check this out.
Thank you, as always.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
February 14, 2011 at 5:25 pm
Anamika (2/14/2011)
Thanks,It is working fine. I wonder on what scenario I've gone for Global Temporary tables. 🙁
Thanks,
Ami
+1 on sending a normal #temp table into dynamic SQL!
I've used global temp tables when I'm dumping SQL out to bcp to export to a delimited table; in those cases I use entirely dynamic SQL and name the ##temp table something like:
SET @tmpGlobalTempTable = '##tmpStoredProcedureName_' + CAST(@@spid AS VARCHAR(6)) + '_' + CAST(FLOOR(rand()*100000) AS VARCHAR(6))
February 15, 2011 at 7:45 pm
wolfkillj (2/14/2011)
Jeff Moden (2/14/2011)
Ok... the following function will pretty much blow the doors off of other splitter functions including some of my old ones.WOW!!! That crash you just heard was the sound of the doors being blown off the splitter functions we currently use. This is going into production TODAY! Thanks, Jeff!
I'm looking forward to an updated tally table article - it took me a while to grasp the workings of this function, but I'm very curious about how I could use tally table logic in other ways, too.
Jason
Jason, I wonder if you'd do me a favor. Would it be possible for you to post the functions you replaced? I'd like the code for two reasons... 1) to do my own testing against and 2) I want to make sure that my code is actually a correct replacement for yours.
Thanks, Jason.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 16, 2011 at 7:49 am
Sure. My multistatement table-valued function (below) is a WHILE loop that repeats as long as the string has any characters in it, finding the first occurrence of the delimiter, inserting the LEFT() of the string before the first delimiter into a table variable, snipping off that substring plus the delimiter, then repeating on the remaining string until there's nothing left of it. I didn't write this myself, but cribbed it from another website. For short comma-delimited strings, the performance difference between your code and mine seems to be minimal (in very unstructured tests), but since my code requires an INSERT operation for every value in the comma-delimited string, the execution time starts to mount as the string gets longer. We sometimes send comma-delimited strings with hundreds of values to return data for our automated document generation processes, so your function seems to perform better in those cases. I'd be interested to see if rigorous testing bears this out. Thanks!
CREATE FUNCTION [dbo].[Split_Value](@Value NVARCHAR(max), @delimiter CHAR(1))
RETURNS @ValueTable TABLE (Value NVARCHAR(max))
AS
BEGIN
DECLARE @tempValueList NVARCHAR(max)
SET @tempValueList = @Value
DECLARE @i INT
DECLARE @ValueVal NVARCHAR(max)
SET @i = CHARINDEX(@delimiter, @tempValueList)
WHILE (LEN(@tempValueList) > 0)
BEGIN
IF @i = 0
SET @ValueVal = @tempValueList
ELSE
SET @ValueVal = LEFT(@tempValueList, @i - 1)
INSERT INTO @ValueTable(Value) VALUES(@ValueVal)
IF @i = 0
SET @tempValueList = ''
ELSE
SET @tempValueList = RIGHT(@tempValueList, LEN(@tempValueList) - @i)
SET @i = CHARINDEX(@delimiter, @tempValueList)
END
END
Jason Wolfkill
February 17, 2011 at 6:44 am
wolfkillj (2/16/2011)
Sure. My multistatement table-valued function (below) is a WHILE loop that repeats as long as the string has any characters in it, finding the first occurrence of the delimiter, inserting the LEFT() of the string before the first delimiter into a table variable, snipping off that substring plus the delimiter, then repeating on the remaining string until there's nothing left of it. I didn't write this myself, but cribbed it from another website. For short comma-delimited strings, the performance difference between your code and mine seems to be minimal (in very unstructured tests), but since my code requires an INSERT operation for every value in the comma-delimited string, the execution time starts to mount as the string gets longer. We sometimes send comma-delimited strings with hundreds of values to return data for our automated document generation processes, so your function seems to perform better in those cases. I'd be interested to see if rigorous testing bears this out. Thanks!
CREATE FUNCTION [dbo].[Split_Value](@Value NVARCHAR(max), @delimiter CHAR(1))
RETURNS @ValueTable TABLE (Value NVARCHAR(max))
AS
BEGIN
DECLARE @tempValueList NVARCHAR(max)
SET @tempValueList = @Value
DECLARE @i INT
DECLARE @ValueVal NVARCHAR(max)
SET @i = CHARINDEX(@delimiter, @tempValueList)
WHILE (LEN(@tempValueList) > 0)
BEGIN
IF @i = 0
SET @ValueVal = @tempValueList
ELSE
SET @ValueVal = LEFT(@tempValueList, @i - 1)
INSERT INTO @ValueTable(Value) VALUES(@ValueVal)
IF @i = 0
SET @tempValueList = ''
ELSE
SET @tempValueList = RIGHT(@tempValueList, LEN(@tempValueList) - @i)
SET @i = CHARINDEX(@delimiter, @tempValueList)
END
END
Very cool. Thanks for the code. I've seen this "nibbler" type of function before and I have a pretty cool CSV generator I built to test it with. I'll let you know.
As a side bar, if your CSV's will fit in a VARCHAR(8000) or NVARCHAR(4000), you can double the performance of just about any splitter code by not using one of the MAX datatypes. "Right sizing" of data-types works very well, in this case. In fact, I have two splitter functions... one for VARCHAR(8000) and one for VARCHAR(MAX) just so I can get the extra speed out of things that don't need the girth of VARCHAR(MAX). And, yes... I'm also working on improvements for VARCHAR(MAX) splitters. Right now, the VARCHAR(8000) version of the splitter I made is about 3 to 4 times (depending) faster than the nearly identical VARCHAR(MAX) splitter. The ONLY difference between the two is the data-type of the input string variable.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2011 at 7:29 am
Hi Jeff,
Thanks for the tip on the (max) vs. (8000)/(4000) nvarchar/varchar data types in these functions. When I implemented your new splitter function, I used nvarchar(4000), which covers our needs to at least a "five nines" percentage. I'll probably put a nvarchar(max) version in place, too, just to cover any unanticipated needs.
Jason
Jason Wolfkill
February 17, 2011 at 5:25 pm
wolfkillj (2/17/2011)
Hi Jeff,Thanks for the tip on the (max) vs. (8000)/(4000) nvarchar/varchar data types in these functions. When I implemented your new splitter function, I used nvarchar(4000), which covers our needs to at least a "five nines" percentage. I'll probably put a nvarchar(max) version in place, too, just to cover any unanticipated needs.
Jason
It won't hurt anything except slow things down abit because the cteTally table will always "overshoot" the length of NVARCHAR(4000) but we really should change most instances of DATALENGTH() to DATALENGTH()/2 if you're going to use NVARCHAR(). Sorry I didn't give you a heads up on that, before.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply