May 11, 2012 at 1:05 am
Comments posted to this topic are about the item Convert a string list to TABLE with single CTE query
May 11, 2012 at 7:09 am
This is a very nice function. But, to even consider its use, it would be helpful to know how it compares in terms of execution performance to other similar functions that have been written before.
For example I have 2 versions (version 1, version 2 below) of this split function. Would be great if someone is able to provide some metrics and a nice writeup. I take no credit for writing either of them. Beings with higher SQL mojo deserve the credit. (version 3 is code from the article wrapped in a function).
NOTE: It also looks like version 2 and 3 are very similar. Version 2 will not work with a recursion level greater than 100. I use it only on short CSV strings. Version 1 is still my goto function.
simple test run with 110 items.
SET STATISTICS TIME ON;
SELECT * FROM dbo.split('0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9', DEFAULT)
SET STATISTICS TIME OFF;
SET STATISTICS TIME ON;
SELECT * FROM dbo.split2('0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9', DEFAULT)
SET STATISTICS TIME OFF;
SET STATISTICS TIME ON;
SELECT * FROM dbo.split3('0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9', DEFAULT)
SET STATISTICS TIME OFF;
version 1: <=== this version uses a "Tally Table" named dbo.tbl_Number. You need to create one for this version to work. I've included a stored procedure at the end that will create this for you.
CREATE FUNCTION [dbo].[split]
(
@DelimitedString AS VARCHAR(MAX),
@Delimiter AS CHAR = N','
)
RETURNS TABLE
AS
RETURN(
SELECTSUBSTRING(
@Delimiter + @DelimitedString + @Delimiter,
NumberId + 1,
CHARINDEX(@Delimiter, @Delimiter + @DelimitedString + @Delimiter, NumberId + 1) - NumberId - 1
) AS Token
FROMdbo.tbl_Number WITH (NOLOCK)
WHERENumberId >= 1
ANDNumberId < LEN(@Delimiter + @DelimitedString + @Delimiter) - 1
ANDSUBSTRING(@Delimiter + @DelimitedString + @Delimiter, NumberId, 1) = @Delimiter
)
version 2:
CREATE FUNCTION [dbo].[split2]
(
@s-2VARCHAR(512),
@sepCHAR(1) = N','
)
RETURNS TABLE
AS
RETURN (WITH Pieces(pn, start, stop) AS (
SELECT1,
1,
CHARINDEX(@sep, @s-2)
UNION ALL
SELECTpn + 1,
stop + 1,
CHARINDEX(@sep, @s-2, stop + 1)
FROMPieces
WHEREstop > 0
)
SELECTpn,
SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop - start ELSE 512 END) AS s
FROMPieces
)
version 3 <=== this is the same code provided in the article wrapped in a table-function
CREATE FUNCTION [dbo].[split3]
(
@DelimitedString AS VARCHAR(MAX),
@Delimiter AS CHAR = N','
)
RETURNS @Result TABLE(
TokenVARCHAR(128)
)
AS
BEGIN
WITH rep(item, delim) AS
(
SELECT@DelimitedString AS item,
@Delimiter AS delim
UNION ALL
SELECTLEFT(item, CHARINDEX(delim, item, 1) - 1) AS item,
delim
FROMrep
WHERECHARINDEX(delim, item, 1) > 0
UNION ALL
SELECTRIGHT(item, LEN(item) - CHARINDEX(delim, item, 1)) AS item,
delim
FROMrep
WHERECHARINDEX(delim, item, 1) > 0
)
INSERT INTO @Result(Token)
SELECTitem
FROMrep
WHERECHARINDEX(delim, item, 1) = 0
OPTION(MAXRECURSION 0); -- Needed to handle recursion levels greater than 100. By default SQL Server limits recursion levels to 100.
RETURN;
END
Tally Table
/*EXEC dbo.CreateNumbersTable DEFAULT
SELECT * FROM dbo.tbl_Number
\*/
CREATE PROCEDURE [dbo].[CreateNumbersTable]
@EndingNumberINT = 65535
AS
BEGIN
SET NOCOUNT ON
DECLARE @IsDeleteTableBIT
SELECT @IsDeleteTable = dbo.ufn_IsTableExist('tbl_Number')
IF @IsDeleteTable = 1
DROP TABLE dbo.tbl_Number
CREATE TABLE dbo.tbl_Number
(
NumberId INT IDENTITY(1,1) NOT NULL,
CONSTRAINT PK_tbl_Number PRIMARY KEY CLUSTERED
(
NumberId ASC
) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
WHILE COALESCE(SCOPE_IDENTITY(), 0) < @EndingNumber
BEGIN
INSERT dbo.tbl_Number DEFAULT VALUES
END
END
/*SELECT dbo.ufn_IsTableExist('tbl_Number')
\*/
ALTER FUNCTION [dbo].[ufn_IsTableExist]
(
@TableName SYSNAME
)
RETURNS BIT
AS
BEGIN
RETURN(SELECTCASE WHEN EXISTS(SELECT1
FROMINFORMATION_SCHEMA.TABLES
WHERETABLE_TYPE = N'BASE TABLE'
ANDTABLE_NAME = @TableName)
THEN 1
ELSE 0
END);
END
June 15, 2012 at 8:22 am
Sandor,
Have a look at Jeff Moden's splitter testing. You'll find that the inline tally table version is quite fast without needing a helper table and he's done all the testing you need.
http://www.sqlservercentral.com/articles/Tally+Table/72993/
Another article: http://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/
/* Anything is possible but is it worth it? */
June 15, 2012 at 11:18 am
Gatekeeper,
Thank you for the links.
-sandor
August 3, 2012 at 6:44 am
Try this - it works faster for me than using a tally table - it converts the string to xml, then selects from the XML
CREATE FUNCTION [dbo].[Split]
(
@delimited nvarchar(max),
@delimiter nvarchar(100)
) RETURNS @t TABLE
(
val nvarchar(max)
)
AS
BEGIN
declare @xml xml
set @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'
insert into @t(val)
select
r.value('.','nvarchar(max)') as item
from @xml.nodes('//root/r') as records(r)
RETURN
END
August 3, 2012 at 7:35 am
Just tried the XML version with a short test string.
SELECT * FROM dbo.Split(N'03680027,08563027,03682027,03677003,03683027,17948027,03702003,17948003,15636003,03620003,03299003,03014003,03679027,08563003,03013003,15636027,00781003,35879049,03682003,03683003,03679003,03680003,30441003,02747049,02748049,02749049,03300003,02981003,02981027,02981004,02982003,02982027,02982004,02983003,02983027,02983004,06921003,02974003,02974027,02974004,02975003,02975027,02975004,02973003,02973027,02973004', N',');
Looks really good. Seems to perform just as quickly as the one I use with the Tally Table.
Thank you.
Sandor
August 3, 2012 at 7:43 am
I tried it the other day with a very long delimited list of several thousand items - a tally based function took about 3 seconds, the xml one took 0 seconds
May 10, 2016 at 1:43 pm
Thanks for the script.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply