May 10, 2014 at 5:25 am
mister.magoo (5/9/2014)
mburbea (5/9/2014)
Have you tried switching the memory-optimized tally table durability to just schema? I read that a table will perform best in this mode, and it's pretty easy to add a startup script for a db to populate the tally table with rows.
Yes, the performance was almost identical to the millisecond, which seems unlikely, but that's what happened.
Creating a Hekaton table with schema-only durability makes data modification faster (no logging/checkpointing), but it has no effect on reads. Native compilation (rather than T-SQL interop) would speed up reads, but as mentioned before this is not yet available for functions.
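For reference, declaring a Hekaton table with schema-only durability looks roughly like this (a minimal sketch; the table name is illustrative, and the database must already have a MEMORY_OPTIMIZED_DATA filegroup):

```sql
-- Sketch only: SCHEMA_ONLY durability skips logging/checkpointing of the data,
-- so the rows must be repopulated (e.g. by a startup script) after a restart.
CREATE TABLE dbo.TallyInMem
(
    N INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 16384)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```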
May 13, 2014 at 1:32 pm
That makes sense.
After playing with the new Lead variant, my conclusion is that it seems to be slower than the mister.magoo collate latin1_general_bin version. That seems to be the best-performing iTVF.
However, I can offer one very minor optimization that can slightly improve performance depending on the size of the string: reducing the size of the intermediate tables. Studying the plan, it seemed that much larger rows were being created than the optimizer needed for the intermediate derived tables. Since we don't care at all about the values in E1, E2, and E4, we can convert those to bit, and for cteTally we can use a smallint, as we only need up to 8000. Converting the other intermediate results wasn't so successful. For very short strings it performs about the same, but for longer strings the smaller rows for the intermediate results seem to be very beneficial.
Here is the modified function:
if object_id('delimitedSplit8kbsr') is not null drop function delimitedSplit8kbsr
GO
CREATE FUNCTION [dbo].delimitedSplit8kbsr
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT convert(bit,1) UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT convert(bit,1) FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT convert(bit,1) FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- we use conversion to bit and smallint to reduce the row size of these results.
-- sql seems smart enough to avoid doing unnecessary work.
SELECT top (ISNULL(DATALENGTH(@pString),0)) convert(smallint,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1
UNION ALL
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
And attached is the comparison of four functions: clrsplit, delimitedSplit8kB, delimitedSplit8k_lead, and this function, BSR. The results suggest this might be the best option for users without SQL Server 2014 Enterprise who face an inexplicable restriction on CLR.
Split wins, as always, but the BSR function is no slouch, beating the other two T-SQL functions by a sizable margin and performing about halfway between the CLR splitter and the other fast T-SQL functions.
This was performed on a SQL Server 2014 Standard (sigh 🙁 ) Edition instance.
May 13, 2014 at 2:23 pm
Brilliant:-)
Ran a short disco version of the test on 2014 Developer Edition, and also added a Lead version, which came in second!
SplitterName TOTAL_D AVG_D
-------------------------- -------- ---------
Split 3.97600 0.081142
delimitedSplit8kbsr 4.12700 0.084224
delimitedSplit8kbsrL 4.21200 0.085959
DelimitedSplit8K_T1_Lead 4.34300 0.088632
DelimitedSplit8K_Lead 5.47300 0.111693
DelimitedSplit8K_T0_Lead 5.52800 0.112816
Good stuff!
😎
May 13, 2014 at 2:27 pm
mburbea (5/13/2014)
However, I can offer one very minor optimization that can slightly improve performance depending on the size of the string: reducing the size of the intermediate tables. Studying the plan, it seemed that much larger rows were being created than the optimizer needed for the intermediate derived tables. Since we don't care at all about the values in E1, E2, and E4, we can convert those to bit, and for cteTally we can use a smallint, as we only need up to 8000.
Well done, this seems like a very useful improvement. I hope other people will be able to reproduce the speed-up on earlier versions.
May 13, 2014 at 3:55 pm
The collate-binary, short-row optimization seems to really only help on 2014. My guess is that the new cardinality estimator is much more clever (and faster). It seems to be worse on SQL Server 2008 R2. (I don't have access to a SQL Server 2012 box to try it on.)
SplitterName         total_d    avg_d
-------------------- ---------- ------------------
DelimitedSplit8KB    35.164516  0.717643183673469
delimitedSplit8kbsr  37.325727  0.761749530612245
On older versions of sql server, it seems best to just use the binary variant.
Eirikur Eiriksson (5/13/2014)
Brilliant:-)
Ran a short disco version of the test on 2014 Developers edition, also added a Lead version which came in second!
SplitterName TOTAL_D AVG_D
-------------------------- -------- ---------
Split 3.97600 0.081142
delimitedSplit8kbsr 4.12700 0.084224
delimitedSplit8kbsrL 4.21200 0.085959
DelimitedSplit8K_T1_Lead 4.34300 0.088632
DelimitedSplit8K_Lead 5.47300 0.111693
DelimitedSplit8K_T0_Lead 5.52800 0.112816
Good stuff!
😎
Good to hear. Do you mind posting your Lead version? T1 seems quite impressive if it's not using the "SR" (small-row) variant. I also used this slight tweak of LEAD's math, which I think is better but might be worse:
isnull(lead(n1,1,null) over (order by n1) - 1 -n1,8000)
May 13, 2014 at 4:21 pm
My findings differ, in that the "bit trick" makes no significant difference in Jeff's test harness.
I am not saying it doesn't work for some, but not for me on 2014 Dev Edition on a somewhat old PC.
The binary collation mod obviously helps a lot, but only if binary collation works for your use case; in my case there is no discernible improvement from using bit.
On those systems where you do see an improvement, I wonder if you would also see one from explicitly casting the 1 in cteStart as a smallint, so it matches the datatype of cteTally and avoids implicit conversions?
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT cast(1 as smallint)
UNION ALL
SELECT t.N+cast(1 as smallint) FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
)
MM
May 13, 2014 at 8:58 pm
That's rather odd. I found it best not to mess with the cteStart derived table; changing that seems to slow things down. For my use case, which is a comma-delimited list of codes or integers, it seems to be pretty nice.
I tried casting both and it only hurts.
May 13, 2014 at 10:58 pm
Looks like the performance improvements have more to do with the platform (2014). I have been looking at results from the same (or identical) hardware (E6220, i5, 256GB SSD, 8GB RAM, Power Savings disabled!!!) for 2008 and 2012. The findings are that those are similar to each other, but significantly slower than 2014. Mind you, this is not a scientific measurement, more eyeballing 😉
Then, looking at the bit-typed tally, another thing emerged: unless all seeds in the initial CTE are explicitly cast to bit, it still returns int (more in the code). This makes me believe that this bit (excuse the pun 😛 ) is not a contributing factor in terms of performance.
😎
SET NOCOUNT ON;
DECLARE @NUMROWS INT = 1;
/* All seeds explicitly cast to bit, returns bit */
;WITH TN(N) AS (SELECT N FROM (VALUES (CONVERT(BIT,1)),(CONVERT(BIT,1))
,(CONVERT(BIT,1)),(CONVERT(BIT,1)),(CONVERT(BIT,1))
,(CONVERT(BIT,1)),(CONVERT(BIT,1)),(CONVERT(BIT,1))
,(CONVERT(BIT,1)),(CONVERT(BIT,1))) AS X(N))
SELECT TOP(@NUMROWS)
DATALENGTH(NM.N) AS DLEN
,'All seeds explicitly cast to bit, returns bit' AS COMMENT
FROM TN NM;
/* First seed explicitly cast to bit, returns int */
;WITH E1(N) AS (
SELECT convert(bit,1) UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
SELECT TOP(@NUMROWS)
DATALENGTH(NM.N) AS DLEN
,'First seed explicitly cast to bit, returns int' AS COMMENT
FROM E1 NM;
/* No conversion, returns int */
;WITH TN(N) AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS X(N))
SELECT TOP(@NUMROWS)
DATALENGTH(NM.N) AS DLEN
,'No conversion, returns int' AS COMMENT
FROM TN NM;
/* All but last seed cast to bit, returns int */
;WITH TN(N) AS (SELECT N FROM (VALUES (CONVERT(BIT,1)),(CONVERT(BIT,1))
,(CONVERT(BIT,1)),(CONVERT(BIT,1)),(CONVERT(BIT,1))
,(CONVERT(BIT,1)),(CONVERT(BIT,1)),(CONVERT(BIT,1))
,(CONVERT(BIT,1)),(1)) AS X(N))
SELECT TOP(@NUMROWS)
DATALENGTH(NM.N) AS DLEN
,'All but last seed cast to bit, returns int' AS COMMENT
FROM TN NM;
Condensed results:
DLEN COMMENT
----------- ---------------------------------------------
1 All seeds explicitly cast to bit, returns bit
4 First seed explicitly cast to bit, returns int
4 No conversion, returns int
4 All but last seed cast to bit, returns int
May 14, 2014 at 12:06 am
OK, I am dragged in again by the many notifications received for this excellent never-ending thread :-). After looking at the recent comments, it is really fascinating to see the use of new technology methods. I will look into more details when I get more time.
However, as far as the speed comparison tests are concerned, I am not sure that the LEAD function was faster (although I may prefer it for some other reasons, depending upon the requirement). I tried it way back (absolutely no intention to take any credit away from Eirikur Eiriksson; it is excellent, excellent work from him), but I was able to beat it with a CHARINDEX + BINARY COLLATION + FIRST ITEM HANDLED SEPARATELY combination on SQL 2012 (not sure about SQL 2014). However, the most important reason for neglecting it was that I didn't have SQL 2012 in production 😉
The fastest method (after CLR) for me was the AFOREMENTIONED COMBINATION + PHYSICAL TALLY TABLE + (N+1) PHYSICAL COLUMN + NO IMPLICIT CONVERSIONS (TALLY TABLE WITH INT DATATYPE). And as Paul White has already pointed out for the LEAD and 8K functions, this function is not very susceptible to larger strings.
Since I do not have the time to run the speed tests again, following are my three functions (copied from an old web source). It would be better if someone who already did the comparisons could run the tests again after including these three functions.
1.IF OBJECT_ID(N'dbo.DelimitedSplit8K_T1_First_Item_Handling_Seperately', N'IF') IS NOT NULL
2.DROP FUNCTION dbo.DelimitedSplit8K_T1_First_Item_Handling_Seperately;
3. /*===== USES BINARY COLLATION TO SPEED UP THE STRING COMPARISON=======*/
4./*===== FIRST DELIMITED ITEM HANDLED SEPERATELY=======================*/
5.CREATE FUNCTION [dbo].[DelimitedSplit8K_T1_First_Item_Handling_Seperately]
6.--===== Define I/O parameters
7.(@pString VARCHAR(8000), @pDelimiter VARCHAR(1)/*===MADE IT VARCHAR TO AVOID IMPLICIT CONVERSION WHILE COMPARING====*/
8.)
9.RETURNS TABLE WITH SCHEMABINDING AS
10.RETURN
11.--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
12. -- enough to cover VARCHAR(8000)
13. WITH E1(N) AS (
14. SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
15. SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
16. SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
17. ), --10E+1 or 10 rows
18. E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
19. E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
20. cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
21. -- for both a performance gain and prevention of accidental "overruns"
22. SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
23. ),
24.cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
25. /*====FIRST ITEM HANDLED AT THE END======*/
26. --SELECT 1 UNION ALL -- does away with 0 base CTE, and the OR condition in one go!
27. /*=======================================*/
28. SELECT N+1
29. FROM cteTally
30. WHERE SUBSTRING(@pString ,N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
31. ),
32.cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
33. SELECT N1,
34. ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,N1),0) - N1,8000)
35. FROM cteStart
36. )
37.--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
38./*====FIRST ITEM HANDLING===============*/
39.SELECT ItemNumber = 0, Item = SUBSTRING(@pString , 1, ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1, 8000))
40.UNION ALL
41.SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY N1),
42. Item = SUBSTRING(@pString , N1, L1)
43. FROM cteLen
44.;
IF OBJECT_ID(N'dbo.Split8KTally_With_Column_N1', N'IF') IS NOT NULL
DROP FUNCTION dbo.Split8KTally_With_Column_N1;
GO
--===== Tally Table (Split8KTally_With_Column_N1 ITVF) ===============================================================================
/*===== THIS FUNCTION USES COLUMN "N1" OF TALLY TABLE HAVING PRE-CALCULATED VALUES FOR N+1 ===========*/
CREATE FUNCTION dbo.Split8KTally_With_Column_N1 (
@pString VARCHAR(8000), @pDelimiter VARCHAR(1)
)
RETURNS TABLE
AS
RETURN
/*======= FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM TO AVOID "OR" CONDITION IN THE WHERE CLAUSE=====*/
SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER
, ItemValue = SUBSTRING(@pString, 1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,
@pString COLLATE Latin1_General_BIN), 0)
- 1, 8000 ))
UNION ALL -- CONCATENATE THE OUTPUTS
/*======= FIND REST OF THE DELIMITED ITEMS USING TALLY TABLE ======================================*/
SELECT ItemNumber = ROW_NUMBER() OVER ( ORDER BY [N] )
, ItemValue = SUBSTRING(@pString, [N1],
ISNULL(NULLIF(CHARINDEX(@pDelimiter ,
@pString COLLATE Latin1_General_BIN, [N1]), 0)
- [N1], 8000) )
FROM [dbo].[Tally] WITH ( NOLOCK ) --NOLOCK HINT IS NOT NECESSARY
WHERE [N] BETWEEN 1 AND LEN(@pString)
AND SUBSTRING(@pString, [N], 1) = @pDelimiter
GO
--=====================================================================================================================
-- Conditionally drop and recreate the Split8KTally_With_N_PLUS_1 splitter function.
--=====================================================================================================================
IF OBJECT_ID('dbo.Split8KTally_With_N_PLUS_1','IF') IS NOT NULL
DROP FUNCTION [dbo].Split8KTally_With_N_PLUS_1
GO
--===== Tally Table (Split8KTally_With_N_PLUS_1 ITVF) ===============================================================================
/*===== THIS FUNCTION USES RUNTIME CALCULATION "N + 1" INSTEAD OF USING THE PRE-CALCULATED COLUMN "N1" OF THE TALLY TABLE===========*/
CREATE FUNCTION dbo.Split8KTally_With_N_PLUS_1 (
@pString VARCHAR(8000), @pDelimiter VARCHAR(1)
)
RETURNS TABLE
AS
RETURN
/*======= FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM TO AVOID "OR" CONDITION IN THE WHERE CLAUSE=====*/
SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER
, ItemValue = SUBSTRING(@pString, 1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,
@pString COLLATE Latin1_General_BIN), 0)
- 1, 8000 ))
UNION ALL -- CONCATENATE THE OUTPUTS
/*======= FIND REST OF THE DELIMITED ITEMS USING TALLY TABLE ======================================*/
SELECT ItemNumber = ROW_NUMBER() OVER ( ORDER BY [N] )
, ItemValue = SUBSTRING(@pString, (N+1),
ISNULL(NULLIF(CHARINDEX(@pDelimiter ,
@pString COLLATE Latin1_General_BIN, (N+1)), 0)
- (N+1), 8000) )
FROM [dbo].[Tally] WITH ( NOLOCK )--NOLOCK HINT IS NOT NECESSARY
WHERE [N] BETWEEN 1 AND LEN(@pString)
AND SUBSTRING(@pString, [N], 1) = @pDelimiter
GO
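Note that both functions above reference a physical [dbo].[Tally] table with an [N] column and a precomputed [N1] column. Its exact definition isn't shown in the thread; a minimal sketch consistent with how the functions use it might be (the constraint name and population query are assumptions):

```sql
-- Hypothetical definition matching the column names the functions reference.
-- INT avoids the implicit conversions discussed in this thread; the clustered
-- primary key on N is what lets the range scan and window functions perform well.
CREATE TABLE dbo.Tally
(
    N  INT NOT NULL CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED,
    N1 AS (N + 1) PERSISTED  -- precomputed N+1, so no runtime arithmetic
);

-- Populate with 1..8000 (enough to cover VARCHAR(8000)).
INSERT INTO dbo.Tally (N)
SELECT TOP (8000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a CROSS JOIN sys.all_columns b;
```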
One more thing about the LEAD function: IIRC, I used LEAD somewhat like
SUBSTRING(@pString, N,
LEAD(N,1,16000) --DOUBLE THE MAX STRING LENGTH
OVER (ORDER BY N) - N - 1 )
No need for the ISNULL/NULLIF combo.
Also, with a physical tally table, LEAD performed much better, as window functions generally perform better on indexed columns, especially a CLUSTERED INDEX.
Last but not least, even though I saw almost the same behavior in the article's test harness, my preferred test harness was different, as I wanted to test larger strings, i.e. near 8,000 characters.
May 14, 2014 at 12:22 am
mburbea (5/13/2014)
That makes sense.
After playing with the new Lead variant, my conclusion is that it seems to be slower than the mister.magoo collate latin1_general_bin version. That seems to be the best-performing iTVF.
However, I can offer one very minor optimization that can slightly improve performance depending on the size of the string: reducing the size of the intermediate tables. Studying the plan, it seemed that much larger rows were being created than the optimizer needed for the intermediate derived tables. Since we don't care at all about the values in E1, E2, and E4, we can convert those to bit, and for cteTally we can use a smallint, as we only need up to 8000. Converting the other intermediate results wasn't so successful. For very short strings it performs about the same, but for longer strings the smaller rows for the intermediate results seem to be very beneficial.
Here is the modified function:
if object_id('delimitedSplit8kbsr') is not null drop function delimitedSplit8kbsr
GO
CREATE FUNCTION [dbo].delimitedSplit8kbsr
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT convert(bit,1) UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT convert(bit,1) FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT convert(bit,1) FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- we use conversion to bit and smallint to reduce the row size of these results.
-- sql seems smart enough to avoid doing unnecessary work.
SELECT top (ISNULL(DATALENGTH(@pString),0)) convert(smallint,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1
UNION ALL
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
And attached is the comparison of four functions: clrsplit, delimitedSplit8kB, delimitedSplit8k_lead, and this function, BSR. The results suggest this might be the best option for users without SQL Server 2014 Enterprise who face an inexplicable restriction on CLR.
Split wins, as always, but the BSR function is no slouch, beating the other two T-SQL functions by a sizable margin and performing about halfway between the CLR splitter and the other fast T-SQL functions.
This was performed on a SQL Server 2014 Standard (sigh 🙁 ) Edition instance.
Very nice. I guess converting it to bit could mean a smaller footprint in memory and hence may perform better (but I have been proven wrong with 1+1 = 2 logic before, as lots of things depend upon many other variables). I would try some alterations to your function (different combinations as well):
1. Converting each value in the starting CTE to bit
2. Using INT conversion instead of SMALLINT
Somewhat like:
if object_id('delimitedSplit8kbsr') is not null drop function delimitedSplit8kbsr
GO
CREATE FUNCTION [dbo].delimitedSplit8kbsr
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT convert(bit,1) UNION ALL SELECT convert(bit,1) UNION ALL SELECT convert(bit,1) UNION ALL
SELECT convert(bit,1) UNION ALL SELECT convert(bit,1) UNION ALL SELECT convert(bit,1) UNION ALL
SELECT convert(bit,1) UNION ALL SELECT convert(bit,1) UNION ALL SELECT convert(bit,1) UNION ALL SELECT convert(bit,1)
), --10E+1 or 10 rows
E2(N) AS (SELECT convert(bit,1) FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT convert(bit,1) FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- we use conversion to bit and smallint to reduce the row size of these results.
-- sql seems smart enough to avoid doing unnecessary work.
SELECT top (ISNULL(DATALENGTH(@pString),0)) convert(INT,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1
UNION ALL
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
Also, instead of UNION ALL, VALUES could be used, since you are testing it on SQL 2012+.
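As a sketch of that suggestion, here is E1 rewritten with a table value constructor instead of the ten UNION ALLs (VALUES row constructors like this are actually available from SQL Server 2008 on):

```sql
-- Same 10-row E1 seed CTE, built from a table value constructor.
WITH E1(N) AS (
    SELECT 1
    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS V(N)
)
SELECT COUNT(*) AS RowCnt FROM E1;  -- returns 10
```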
Last but not least: IIRC, Chris Morris shared different variations of the cteTally (some were better); I guess they have never been tried here yet?
May 14, 2014 at 8:37 am
Usman actually seems to be right. The avg row size of the intermediate CTE tables seems to have no real bearing on the plan. The conversion of ROW_NUMBER to smallint/int seems to be where all the gains are coming from. Smallint being widened is 'free', whereas the conversion from int to bigint is non-trivial. When you use "set statistics profile on" on the "8kB" version of the function, you will see the following in the argument field (I've removed all the null fields):
DEFINE:([Expr1055]=substring('a,b,c,d',CONVERT_IMPLICIT(int,[Union1052],0),CONVERT_IMPLICIT(int,[Expr1053],0)))
DEFINE:([Expr1054]=row_number)
DEFINE:([Expr1053]=isnull(CONVERT_IMPLICIT(bigint,CASE WHEN charindex(',','a,b,c,d',CONVERT_IMPLICIT(int,[Union1052],0))=(0) THEN NULL ELSE charindex(',','a,b,c,d',CONVERT_IMPLICIT(int,[Union1052],0)) END,0)-[Union1052],(8000)))
VALUES:(((1)))
DEFINE:([Expr1051]=[Expr1050]+(1))
WHERE:(CONVERT(varchar(1),substring('a,b,c,d',CONVERT_IMPLICIT(int,[Expr1050],0),(1)),0)=',')
TOP EXPRESSION:((7))
DEFINE:([Expr1050]=row_number)
The worst offender is the line where it has to convert the expression from bigint to int to do work on it, then convert the result back to bigint. Now let's look at the same plan for "8kbsr":
DEFINE:([Expr1056]=substring('a,b,c,d',[Union1053],[Expr1054]))
DEFINE:([Expr1055]=row_number)
DEFINE:([Expr1054]=isnull(CASE WHEN charindex(',','a,b,c,d',[Union1053])=(0) THEN NULL ELSE charindex(',','a,b,c,d',[Union1053]) END-[Union1053],(8000)))
ORDER BY:([Union1053] ASC)
VALUES:(((1)))
DEFINE:([Expr1052]=CONVERT(int,[Expr1050],0)+(1))
WHERE:(CONVERT(varchar(1),substring('a,b,c,d',CONVERT(int,[Expr1050],0),(1)),0)=',')
TOP EXPRESSION:((7))
DEFINE:([Expr1050]=row_number)
The plans look similar, but the cost of converting back and forth is removed. The row-size optimization had nothing, or very little, to do with it. I think part of the benefit the physical table offered may have been that it wasn't having to convert numbers. I've tried reforming with different tally table shapes, and in practice it made no discernible difference in performance. On SQL Server 2014, though, the convert to int on the row number is a huge gain.
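The bigint origin of those CONVERT_IMPLICIT calls is easy to confirm, since ROW_NUMBER() is typed as bigint. A quick check (sketch):

```sql
-- ROW_NUMBER() returns bigint, so using it directly in SUBSTRING/CHARINDEX
-- arithmetic forces implicit bigint<->int conversions unless it is cast down.
SELECT SQL_VARIANT_PROPERTY(
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS sql_variant),
           'BaseType') AS RowNumberType;
-- returns bigint
```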
Edit:
I can't figure out how to turn off the stupid emoticons, which kind of make my post look ridiculous.
Also, Usman's reformation to have cteStart not have the union, but to have the union on line 31, seems to provide a minor boost in performance, enough so that its overall time is now beating Split 🙂
May 14, 2014 at 9:16 am
mburbea (5/14/2014)
Usman actually seems to be right. The avg row size of the intermediate CTE tables seems to have no real bearing on the plan. The conversion of ROW_NUMBER to smallint/int seems to be where all the gains are coming from. Smallint being widened is 'free', whereas the conversion from int to bigint is non-trivial. When you use "set statistics profile on" on the "8kB" version of the function, you will see the following in the argument field (I've removed all the null fields):
DEFINE:([Expr1055]=substring('a,b,c,d',CONVERT_IMPLICIT(int,[Union1052],0),CONVERT_IMPLICIT(int,[Expr1053],0)))
DEFINE:([Expr1054]=row_number)
DEFINE:([Expr1053]=isnull(CONVERT_IMPLICIT(bigint,CASE WHEN charindex(',','a,b,c,d',CONVERT_IMPLICIT(int,[Union1052],0))=(0) THEN NULL ELSE charindex(',','a,b,c,d',CONVERT_IMPLICIT(int,[Union1052],0)) END,0)-[Union1052],(8000)))
VALUES:(((1)))
DEFINE:([Expr1051]=[Expr1050]+(1))
WHERE:(CONVERT(varchar(1),substring('a,b,c,d',CONVERT_IMPLICIT(int,[Expr1050],0),(1)),0)=',')
TOP EXPRESSION:((7))
DEFINE:([Expr1050]=row_number)
The worst offender is the line where it has to convert the expression from bigint to int to do work on it, then convert the result back to bigint. Now let's look at the same plan for "8kbsr":
DEFINE:([Expr1056]=substring('a,b,c,d',[Union1053],[Expr1054]))
DEFINE:([Expr1055]=row_number)
DEFINE:([Expr1054]=isnull(CASE WHEN charindex(',','a,b,c,d',[Union1053])=(0) THEN NULL ELSE charindex(',','a,b,c,d',[Union1053]) END-[Union1053],(8000)))
ORDER BY:([Union1053] ASC)
VALUES:(((1)))
DEFINE:([Expr1052]=CONVERT(int,[Expr1050],0)+(1))
WHERE:(CONVERT(varchar(1),substring('a,b,c,d',CONVERT(int,[Expr1050],0),(1)),0)=',')
TOP EXPRESSION:((7))
DEFINE:([Expr1050]=row_number)
The plans look similar, but the cost of converting back and forth is removed. The row-size optimization had nothing, or very little, to do with it. I think part of the benefit the physical table offered may have been that it wasn't having to convert numbers. I've tried reforming with different tally table shapes, and in practice it made no discernible difference in performance. On SQL Server 2014, though, the convert to int on the row number is a huge gain.
This was one of the main reasons I chose the INT datatype for the physical Tally table: to get rid of implicit conversions, as for larger strings these could have been more costly 🙂
May 14, 2014 at 10:44 am
Usman's reformation of the query, removing the union from "cteStart" and adding it to the final result, gives a nice, appreciable gain in performance. It now seems to beat the CLR splitter. This is ONLY true on SQL Server 2014, unfortunately.
Here are the optimizations applied from the base "8K splitter" in the updated script file.
1. Collate binary before doing any string comparison.
2. Convert row_number to int.
3. Try a base 20 tally table and doing 3 cross joins instead of 4 (doesn't really seem to change anything).
4. Perform Usman's reformation. (For Lead, the first item still requires CHARINDEX; no choice there.)
SplitterName           total_d  avg_d
---------------------- -------- ------------------
DelimitedSplit8k_LEAD  2.243    0.0457755102040816
DelimitedSplit8KB      2.546    0.0519591836734694
delimitedSplit8kbsr    2.224    0.0453877551020408
split                  2.275    0.0464285714285714
Both versions seem to be faster than the CLR function 😀
Since the slowdown seems to be the virtual tally table, I wonder whether a CLR function that just spits out a streaming number table might improve performance further. Unfortunately, I haven't a clue how to write such a thing.
Here are the functions as they stand:
BSR
CREATE FUNCTION [dbo].delimitedSplit8kbsr
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (
SELECT top (ISNULL(DATALENGTH(@pString),0)) CONVERT(int,ROW_NUMBER() OVER (ORDER BY (select null))) FROM E1,E1 b,E1 c
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
select itemNumber = 1,
item = substring(@pstring,1,ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1,8000))
union all
SELECT ROW_NUMBER() OVER(ORDER BY N1)+1,
SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
LEAD
CREATE FUNCTION dbo.DelimitedSplit8K_LEAD
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) CONVERT(int,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1,E1 b,E1 c
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
LEAD(s.N1, 1, 16000) OVER (ORDER BY s.N1) - 1 - s.N1
FROM cteStart s
)
--===== Do the actual split.
SELECT ItemNumber = 1,
Item =substring(@pstring,1,ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1,8000))
UNION ALL
SELECT ROW_NUMBER() OVER(ORDER BY l.N1)+1,
SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
May 14, 2014 at 11:14 am
mburbea (5/14/2014)
Usman's reformation of the query to remove the union from "cteStart" and add it to the final result has a nice, appreciable gain in performance. It now seems to beat the CLR splitter. Unfortunately, this is ONLY true on SQL Server 2014. Here are the optimizations applied from the base "8K splitter" in the updated script file.
1. Collate binary before doing any string comparison.
2. Convert row_number to int.
3. Try a base 20 tally table and doing 3 cross joins instead of 4 (doesn't really seem to change anything).
4. Perform Usman's reformation (for the LEAD version the first element still requires CHARINDEX; no choice there).
SplitterName            total_d  avg_d
DelimitedSplit8k_LEAD   2.243    0.0457755102040816
DelimitedSplit8KB       2.546    0.0519591836734694
delimitedSplit8kbsr     2.224    0.0453877551020408
split                   2.275    0.0464285714285714
Both versions seem to be faster than the CLR function 😀
Since the slowdown seems to be the virtual tally table, I wonder whether a CLR function that just spits out a streaming number table might improve performance. Unfortunately, I haven't a clue how to write such a thing.
Here are the functions as they stand:
BSR
CREATE FUNCTION [dbo].delimitedSplit8kbsr
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (
SELECT top (ISNULL(DATALENGTH(@pString),0)) CONVERT(int,ROW_NUMBER() OVER (ORDER BY (select null))) FROM E1,E1 b,E1 c
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
select itemNumber = 1,
item = substring(@pstring,1,ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1,8000))
union all
SELECT ROW_NUMBER() OVER(ORDER BY N1)+1,
SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
LEAD
CREATE FUNCTION dbo.DelimitedSplit8K_LEAD
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) CONVERT(int,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1,E1 b,E1 c
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
LEAD(s.N1, 1, 16000) OVER (ORDER BY s.N1) - 1 - s.N1
FROM cteStart s
)
--===== Do the actual split.
SELECT ItemNumber = 1,
Item =substring(@pstring,1,ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1,8000))
UNION ALL
SELECT ROW_NUMBER() OVER(ORDER BY l.N1)+1,
SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
What machine are you on? :hehe: I don't have SQL 2014 yet, so I cannot test it myself. But I guess one extra bit of performance may still be possible.
CREATE FUNCTION [dbo].delimitedSplit8kbsr
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (
SELECT top (ISNULL(DATALENGTH(@pString),0)) CONVERT(int,ROW_NUMBER() OVER (ORDER BY (select null))) FROM E1,E1 b,E1 c
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
select itemNumber = convert(bigint,0),
item = substring(@pstring,1,ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1,8000))
union all
SELECT ROW_NUMBER() OVER(ORDER BY N1),--+1 not needed if we start from zero
SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
LEAD
CREATE FUNCTION dbo.DelimitedSplit8K_LEAD
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) CONVERT(int,ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E1,E1 b,E1 c
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1) AS(
SELECT s.N1,
LEAD(s.N1, 1, 16000) OVER (ORDER BY s.N1) - 1 - s.N1
FROM cteStart s
)
--===== Do the actual split.
SELECT ItemNumber = convert(bigint,0),
Item =substring(@pstring,1,ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN),0)-1,8000))
UNION ALL
SELECT ROW_NUMBER() OVER(ORDER BY l.N1),--+1 not needed if we start from zero
SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
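For clarity, a quick usage sketch of the zero-based versions above (note that ItemNumber now starts at 0, not 1, so anything downstream that assumed 1-based numbering needs adjusting):

SELECT ItemNumber, Item
FROM dbo.DelimitedSplit8K_LEAD('a,b,c', ',');
-- Returns (0,'a'), (1,'b'), (2,'c') with this zero-based variant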
Although I am sure that on larger strings CLR can easily beat these solutions, so it all depends on the requirements.
May 14, 2014 at 12:46 pm
Is everyone running Jeff's original/updated test rig from the article to get these results?
If so, you might not be testing what you think you are. There is a weakness in the test script that can result in the Csv8K table containing duplicates (i.e. every row is the same). This is related to hiding the NEWID call in the view. In (at least) SQL Server 2012 and 2014, this NEWID value may be cached, resulting in every row being the same.
It is obviously not a very good test to simply split the string once, cache the results in a table spool, and replay that for each duplicate row from the Csv8K table. This particularly (and unfairly) benefits the non-CLR splitters by the way.
As a quick hack fix, you can use undocumented trace flag 8690 to prevent caching the NEWID value in this part of the script:
SELECT @sql = ISNULL(@SQL,'')+
'
IF OBJECT_ID(''dbo.Csv8K'',''U'') IS NOT NULL DROP TABLE dbo.Csv8K;
SELECT *
INTO dbo.Csv8K
FROM dbo.CreateCsv8K
('+CAST(NumberOfRows AS VARCHAR(10))+', '
+CAST(NumberOfElements AS VARCHAR(10))+', '
+CAST(MinElementLength AS VARCHAR(10))+', '
+CAST(MaxElementLength AS VARCHAR(10))+') OPTION (QUERYTRACEON 8690); --# of Rows, # of Elements, MIN element length, MAX element length
EXEC dbo.TestEachFunction '+CAST(NumberOfRows AS VARCHAR(10)) +', '
+CAST(NumberOfElements AS VARCHAR(10))+', '
+CAST(MinElementLength AS VARCHAR(10))+', '
+CAST(MaxElementLength AS VARCHAR(10))+';
'
FROM cteControl
Nevertheless, you still need to inspect the Csv8K table to verify that the rows are actually different, as intended.
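One quick way to verify is to compare the total row count with the distinct count (this assumes the CSV column in the test rig's Csv8K table is named CsvParameter; adjust the column name if yours differs):

SELECT COUNT(*) AS TotalRows,
       COUNT(DISTINCT CsvParameter) AS DistinctRows
FROM dbo.Csv8K;
-- If DistinctRows is 1, every row is identical and the test results are suspect.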
One more thing. The CLR code isn't fully compiled to native code on creation. The first call to the function may take a full second or more as just-in-time compilation takes place. This also seems unfair, so I add an initial do-nothing split just after the CLR function creation to the test script. This ensures at least most of the CLR is compiled before the main tests start:
SELECT * FROM dbo.Split(N'A,B,C,D', N',');
FWIW a test I just ran (on SQL Server 2014 RTM CU1) with the fixes in place produced:
+-----------------------+---------------+-------------+
| SplitterName          | TotalDuration | AvgDuration |
+-----------------------+---------------+-------------+
| DelimitedSplit8K_LEAD |      19.07600 |    0.389306 |
| Split                 |      12.35800 |    0.252204 |
+-----------------------+---------------+-------------+
BTW, that's CLR Split vs Usman's latest version of DelimitedSplit8K_LEAD.