March 20, 2018 at 3:25 pm
Jeff Moden - Tuesday, March 20, 2018 3:10 PM
bevanward - Tuesday, March 20, 2018 3:04 PM
Jeff Moden - Monday, March 19, 2018 7:06 PM
Cool feedback. Thanks, Bevan. I think the concern of some of the other posters has to do with performance. Generally speaking, and with few exceptions, an mTVF (Multi-statement Table Valued Function) is going to be slower than an iTVF (Inline Table Valued Function). Your use of LEAD may help overcome that but, if you can use LEAD, take a look at a modernized version of DelimitedSplit8K that a good online friend, Eirikur Eiriksson, wrote. I've tested it and it's twice as fast as DelimitedSplit8K and approaches CLR speeds very closely. Of course, it requires 2012+.
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
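For readers new to that distinction, here is a minimal sketch of the two function shapes (toy examples with hypothetical names, not the actual split code):
--===== iTVF: a single RETURN SELECT that the optimizer can inline into the calling plan
CREATE FUNCTION dbo.Example_iTVF (@n INT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT Doubled = @n * 2;
GO
--===== mTVF: the body fills a table variable first, which blocks inlining
--      and hides row counts from the optimizer
CREATE FUNCTION dbo.Example_mTVF (@n INT)
RETURNS @t TABLE (Doubled INT) AS
BEGIN
    INSERT INTO @t (Doubled) VALUES (@n * 2);
    RETURN;
END;
GO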
Hi Jeff
Have modified Eirikur's version for multiple-character delimiters and longer fields. It certainly runs fast from what I've seen so far, though I have more testing to do.
The only issue I have stumbled into so far is where delimiters overlap and it splits multiple times - for example, when splitting on three dots and passing through an area with lots of dots. Any thoughts appreciated. Thanks for pointing this all out.
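A minimal repro of the overlap problem, using the function as posted below:
-- the delimiter '...' matches the four-dot run at two overlapping positions,
-- yielding two split points where only one was intended (the resulting length
-- arithmetic can even raise an invalid-length SUBSTRING error)
select * from dbo.DelimitedSplit_Max_Multi_LAG('one....two','...')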
Thanks
Bevan
Post the modifications you made.
This is what it looks like - I did not realise you cannot attach .sql files.
Thanks
Bevan
use tempdb
go
drop function if exists DelimitedSplit_Max_Multi_LAG
go
create function [dbo].[DelimitedSplit_Max_Multi_LAG]
/*
Based on
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
and http://www.sqlservercentral.com/articles/Tally+Table/72993/
*/
--===== Define I/O parameters
(@pString NVARCHAR(max), @pDelimiter NVARCHAR(1000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 100,000,000...
-- enough to cover a big file
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows
E6(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+6 or 1,000,000
E8(N) AS (SELECT 1 FROM E6 a, E6 b), --10E+8 or 100,000,000
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8
),
cteStart(N1) AS (--==== Find right edge of delimiter
SELECT t.N+case when t.N = 0 then 0 else datalength(@pDelimiter)/2 end
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,datalength(@pDelimiter)/2) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - case when s.N1=1 or count(*) over (partition by (select null)) = ROW_NUMBER() OVER(ORDER BY s.N1) then 1 else (datalength(@pDelimiter)/2) end),0)-s.N1,datalength(@pString)/2))
FROM cteStart s
;
GO
select * from dbo.DelimitedSplit_Max_Multi_LAG('The shortbeard codling can grow to a maximum standard length of about 33 cm (13 in) and takes the form of a somewhat spindle-shaped cylinder. The snout is fairly blunt and the upper jaw is slightly longer than the lower jaw. There is a short barbel on the chin and a flattened spine on the operculum. There are two dorsal fins with no spines and a total of from 69 to 76 soft rays. The first dorsal fin has 6 rays, the second of which is the longest, and the second dorsal fin originates immediately behind the first fin. Each pectoral fin has 17 to 23 soft rays. The pelvic fins are widely separated and each has two elongated soft rays; these fins originate to the front of the pectoral fins. The anal fin has no spines and has 57 to 63 soft rays. The caudal fin is small and rounded. The general colour of this fish is tan, the dorsal and anal fins having dark edges.','. ')
March 20, 2018 at 5:05 pm
bevanward - Tuesday, March 20, 2018 1:55 AM
Hi Scott
Thanks for your message! It is hard to know how long to look for something before starting to write. What Jeff wrote looks perfect.
Had you seen the modernised version that Jeff mentions above? It seems to be worth considering from the looks of the performance. Thanks again, keep well
Bevan
Hi Bevan,
Your article has triggered a dialogue which is a good thing - you've found Jeff's code, and no, I was not aware of the modernised version that Jeff mentioned. So your post has triggered further education on my part 🙂
Since I'm on SS 2012, I'll definitely look at the code from Eirikur Eiriksson.
Not to hijack this post, but since it's related, I thought I'd share my use case.
I've got a source table where multiple values are stored in a pipe-delimited string. So, instead of columns foo1-foo50, there's a single foo column, with data "A|B|C|D|etc". It would be rare (but possible) to have a complete set of 50 tokens.
There are multiple such columns, and not always a one-to-one match of tokens across the different columns (e.g. foo might contain 20 tokens, bar 15, and blah 3).
I have to "unwind" this data, converting multiple tokens into multiple rows, essentially normalizing the data. A bit like UNPIVOT, but with a concatenated string rather than multiple columns.
Jeff's code has been invaluable in helping me to code this up. I never would have come up with his approach on my own.
Here is my code. fn_MoH_SplitDelimitedStringToTable is identical to fn_DelimitedSplit8K, just a different name.
CREATE VIEW [phi].[vwPROCEDURE_V]
AS
WITH cteProcedure_Code AS
(
SELECT src.FACILITY_STAY_EPISODE_ID
,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber
,CAST(ds.Item AS VARCHAR(8)) AS Item
FROM trn.PROCEDURE src
CROSS APPLY
dbo.fn_MoH_SplitDelimitedStringToTable('|',
CONCAT(ISNULL(src.procedure_code_p,'-'),'|',ISNULL(src.procedure_code_sec,'-'))
) ds
)
,
cteProcedure_Date AS
(
SELECT src.FACILITY_STAY_EPISODE_ID
,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber
,CAST(ds.Item AS VARCHAR(9)) AS Item
FROM trn.PROCEDURE src
CROSS APPLY
dbo.fn_MoH_SplitDelimitedStringToTable('|',
CONCAT(ISNULL(CONVERT(VARCHAR(8),src.procedure_date_p,112),'-'),'|',ISNULL(src.procedure_date_sec,'-'))
) ds
)
,
cteProcedure_Location AS
(
SELECT src.FACILITY_STAY_EPISODE_ID
,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber
,CAST(ds.Item AS CHAR(1)) AS Item
FROM trn.PROCEDURE src
CROSS APPLY
dbo.fn_MoH_SplitDelimitedStringToTable('|',
CONCAT(ISNULL(src.procedure_location_p,'-'),'|',ISNULL(src.procedure_location_sec,'-'))
) ds
)
,
cteBlock_Num AS
(
SELECT src.FACILITY_STAY_EPISODE_ID
,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber
,CAST(ds.Item AS VARCHAR(4)) AS Item
FROM trn.PROCEDURE src
CROSS APPLY
dbo.fn_MoH_SplitDelimitedStringToTable('|',
CONCAT(ISNULL(src.block_num_p,'-'),'|',ISNULL(src.block_num_sec,'-'))
) ds
)
SELECT
-- Source
a.FACILITY_STAY_EPISODE_ID
,a.FACILITY_STAY_ID
,a.FACILITY_ID
,a.clinical_codeset
-- Procedure Code
,b.ItemNumber-1 AS [ROW_NUMBER]
,b.ItemNumber-1 AS procedure_code_number
,CASE WHEN b.Item='-' THEN NULL ELSE b.Item END AS procedure_code
-- Procedure Date
,c.ItemNumber-1 AS procedure_date_number
,CASE WHEN c.Item='-' THEN NULL ELSE CAST(c.Item AS DATE) END AS procedure_date
-- Procedure Location
,d.ItemNumber-1 AS procedure_location_number
,CASE WHEN d.Item='-' THEN NULL ELSE d.Item END AS procedure_location
-- Procedure Type
,CAST(
CASE
WHEN b.ItemNumber-1 = 0 THEN 'P'
ELSE FORMAT(b.ItemNumber-1,'00')
END AS CHAR(2)
) AS procedure_type
-- Block Number
,e.ItemNumber-1 AS block_num_number
,CASE WHEN e.Item='-' THEN NULL ELSE e.Item END AS block_num
-- Source
FROM trn.PROCEDURE a
-- Procedure Code
LEFT JOIN cteProcedure_Code b
ON a.FACILITY_STAY_EPISODE_ID=b.FACILITY_STAY_EPISODE_ID
-- Procedure Date
LEFT JOIN cteProcedure_Date c
ON a.FACILITY_STAY_EPISODE_ID=c.FACILITY_STAY_EPISODE_ID
AND b.ItemNumber=c.ItemNumber
-- Procedure Location
LEFT JOIN cteProcedure_Location d
ON a.FACILITY_STAY_EPISODE_ID=d.FACILITY_STAY_EPISODE_ID
AND b.ItemNumber=d.ItemNumber
-- Block Num
LEFT JOIN cteBlock_Num e
ON a.FACILITY_STAY_EPISODE_ID=e.FACILITY_STAY_EPISODE_ID
AND b.ItemNumber=e.ItemNumber
GO
The source table has about 5M rows; the output has about 17M rows. We don't have the fastest server, and the load takes about 4 hours to run. Any performance improvements (i.e. Eirikur's code) or improvements to my own approach will certainly help.
Regards,
Scott
March 20, 2018 at 8:06 pm
I have previously modified my copy of Jeff's code to only return non-blank items, and return them stripped of leading and trailing spaces.
Here is an excerpt of my changes:
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
--===== Strip leading and trailing spaces, only return non-empty results.
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1),
Item = LTRIM(RTRIM(SUBSTRING(@pString, l.N1, l.L1)))
FROM cteLen l
WHERE LEN(SUBSTRING(@pString, l.N1, l.L1)) > 0
I want to do a similar thing for Eirikur's code, but it's a bit trickier due to the LEAD() windowing function.
I include my changes in their entirety, including a longer header with use cases. Some of the header comments are meant to help less TSQL-aware colleagues. The changes are at the bottom of the code.
Compare the results of my use cases with the original code vs. my changes - my desired output should be obvious by comparing the two results.
If there is a better (i.e. more performant) way to implement these changes please let me know.
/*=====================================================================
Function Name : fn_MoH_DelimitedSplit8K_LEAD.sql
Purpose : User Defined Function to split a string based
on a delimiter and return the tokens in the
string as a table object. Uses the LEAD
windowing function for performance gains over
the DelimitedSplit8K function.
SQL Server Version : Microsoft SQL Server 2012 (SP3) (KB3072779) - 11.0.6020.0 (X64)
Other Code Called : None
Originally Written by : Scott In Sydney
Date : 21MAR2018
Stored Process Version : 1.0
=======================================================================
This code was originally published at:
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
See also:
http://www.sqlservercentral.com/articles/Tally+Table/72993/
Full attribution is given to the original author Eirikur Eiriksson by
virtue of the above URL. My assumption is that this code is in the
public domain by virtue of that URL.
=======================================================================
Modification History : Original version
=====================================================================*/
/*---------------------------------------------------------------------
Usage:
-- Create a synonym to this function in the desired database
USE Dev
GO
DROP SYNONYM [dbo].[fn_MoH_SplitDelimitedStringToTable]
CREATE SYNONYM [dbo].[fn_MoH_SplitDelimitedStringToTable]
FOR [master].[dbo].[fn_MoH_DelimitedSplit8K_LEAD]
SELECT ds.ItemNumber, CAST(ds.Item AS VARCHAR(5)) AS Item
FROM fn_MoH_SplitDelimitedStringToTable(' ','foo bar blah') ds
-- the default delimiter is a blank space, but a UDF requires all parameters to be specified; use "DEFAULT" for the default delimiter
SELECT ds.ItemNumber, CAST(ds.Item AS VARCHAR(5)) AS Item
FROM dbo.fn_MoH_SplitDelimitedStringToTable(DEFAULT,'foo bar blah') ds
Returns:
foo
bar
blah
as a table, with Column Name "Item".
SELECT ds.ItemNumber, CAST(ds.Item AS VARCHAR(5)) AS Item
FROM master.dbo.fn_MoH_DelimitedSplit8K_LEAD(',','fie, fi,,,, fo, fum') ds
Returns:
fie
fi
fo
fum
as a table, with Column Name "Item".
=======================================================================
-- Cross Apply with another table
SELECT s.name, ds.Item
FROM (
SELECT TOP 3 * FROM sys.databases
) s
CROSS APPLY
fn_MoH_SplitDelimitedStringToTable(' ','foo bar blah') ds
SELECT s.name, ds.Item
FROM (
SELECT TOP 3 * FROM sys.databases
) s
CROSS APPLY
dbo.fn_MoH_SplitDelimitedStringToTable(' ',' foo bar blah ') ds
SELECT s.name, ds.Item
FROM (
SELECT TOP 3 * FROM sys.databases
) s
CROSS APPLY
master.dbo.fn_MoH_DelimitedSplit8K_LEAD('^','foo^^bar^^^blah') ds
Returns foo/bar/blah cross joined (Cartesian Product) with the first
three rows from sys.databases.
=======================================================================
-- Split a delimited string supplied by a table instead of hard-coded
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
SELECT *
INTO #temp
FROM (
SELECT ' foo| bar| blah '
UNION ALL
SELECT 'fie|fi|fo|fum'
) d (string)
SELECT * FROM #temp
SELECT t.string, ds.ItemNumber, ds.Item
FROM #temp t
CROSS APPLY
master.dbo.fn_MoH_DelimitedSplit8K_LEAD('|',t.string) ds
Returns the delimited string data split by the supplied delimiter,
with the string data supplied by a column in a table.
-- Nested query
SELECT s.name, x.ItemNumber, x.Item
FROM (
SELECT TOP 3 * FROM sys.databases
) s
CROSS APPLY
(
SELECT ds.*
FROM #temp t
CROSS APPLY
master.dbo.fn_MoH_DelimitedSplit8K_LEAD('|',t.string) ds
) x
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp
Returns the delimited string data split by the supplied delimiter,
with the string data supplied by a column in a table,
cross joined (Cartesian Product) with the first
three rows from sys.databases.
=======================================================================
-----------------------------------------------------------------------
Notes:
The input delimited string is limited to 8000 characters.
Do not mark this user defined function as a system function via
sp_MS_marksystemobject. This only works for stored procedures.
Instead, create a synonym in each database requiring this UDF.
This will allow a two-level name to be used to call this UDF.
Alternatively, use a three-level name to call this UDF,
since it does not require the calling database context.
---------------------------------------------------------------------*/
ALTER FUNCTION [dbo].[fn_MoH_DelimitedSplit8K_LEAD]
--===== Define I/O parameters
( @pDelimiter CHAR(1) = ' '
, @pString VARCHAR(8000)
)
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (
SELECT 1 FROM E1 a, E1 b
), --10E+2 or 100 rows
E4(N) AS (
SELECT 1 FROM E2 a, E2 b
), --10E+4 or 10,000 rows max
cteTally(N) AS (
--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0
UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (
--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
),
cteResults AS (
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
)
--===== Strip leading and trailing spaces, only return non-empty results.
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
LTRIM(RTRIM(r.Item)) AS Item
FROM cteResults r
WHERE LEN(r.Item) > 0
;
/******* END OF FILE *******/
March 20, 2018 at 9:36 pm
Also, I've done a performance comparison of DelimitedSplit8K vs. DelimitedSplit8K_LEAD, given my use case and minor code changes noted in previous posts above.
DelimitedSplit8K:
SET STATISTICS TIME ON
SELECT src.FACILITY_STAY_EPISODE_ID
,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber
,CAST(ds.Item AS VARCHAR(8)) AS Item
FROM trn.DIAGNOSIS src
CROSS APPLY
master.dbo.fn_MoH_DelimitedSplit8K('|',
CONCAT(src.diagnosis_code_p,'|',src.diagnosis_code_sec)
) ds
WHERE ds.Item IS NOT NULL
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(17277002 rows affected)
SQL Server Execution Times:
CPU time = 3927157 ms, elapsed time = 4867563 ms. (Which is 01:21:07)
DelimitedSplit8K_LEAD:
SET STATISTICS TIME ON
SELECT src.FACILITY_STAY_EPISODE_ID
,CAST(ds.ItemNumber AS SMALLINT) AS ItemNumber
,CAST(ds.Item AS VARCHAR(8)) AS Item
FROM trn.DIAGNOSIS src
CROSS APPLY
master.dbo.fn_MoH_DelimitedSplit8K_LEAD('|',
CONCAT(src.diagnosis_code_p,'|',src.diagnosis_code_sec)
) ds
WHERE ds.Item IS NOT NULL
SQL Server parse and compile time:
CPU time = 11 ms, elapsed time = 11 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(17277002 rows affected)
SQL Server Execution Times:
CPU time = 906594 ms, elapsed time = 1832653 ms. (Which is 00:30:32)
Hope this helps...
March 21, 2018 at 5:22 am
Scott In Sydney - Tuesday, March 20, 2018 5:05 PM (quoted in full above)
Hi Scott
Just an initial thought that might be completely off but thought I should mention first up.
Can you run a bunch of batches in parallel?
If you can monopolize the resources on the server then ...
Run the original SQL 2012 LEAD split for each column concurrently.
For example, run a bunch of these concurrently:
insert into tmp<field one>
select PrimaryKey, ItemNumber, Item from <table> cross apply dbo.DelimitedSplit8K_LEAD(<field one>,'|')
delete tmp<field one> where Item is null
If you monitor your resource consumption on the server as you add more of these and measure overall time, that may be the simplest way to get the improvement.
Then once it is complete recomposite the results into the final required shape?
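Something like this for the recomposite step, assuming each concurrent batch landed in its own staging table keyed by PrimaryKey and ItemNumber (hypothetical names):
-- stitch the independently split columns back together on key + ordinal
select f.PrimaryKey, f.ItemNumber, f.Item as FieldOne, b.Item as FieldTwo
from tmpFieldOne f
left join tmpFieldTwo b
on b.PrimaryKey = f.PrimaryKey
and b.ItemNumber = f.ItemNumber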
I think with that number of records you should be able to get that processing time down a lot.
Will have more of a look tonight when I get off.
Thanks
Bevan
March 21, 2018 at 9:02 am
bevanward - Wednesday, March 21, 2018 5:22 AM (quoted in full above)
Hi Scott,
can you post the code of the functions used please?
May have missed them if you've done so earlier, struggling with the connection so cannot see those
😎
March 21, 2018 at 9:28 am
Eirikur Eiriksson - Wednesday, March 21, 2018 9:02 AM (quoted in full above)
Hi Eirikur
Firstly, thanks for your article Reaping the benefits of the Window functions in T-SQL! Jeff pointed it out to me after I had a go at writing a set-based split function, and since then I have had a great education in taking it all much further.
My main needs in adapting what you have written are:
1. Multiple character delimiters
2. Text longer than varchar(8000)
I was wondering if you could have a look at what I have come up with, given your experience. It is already significantly faster than what I have had in use, but I would appreciate any comments if you get a chance.
Many thanks
Bevan
use tempdb
go
drop function if exists DelimitedSplit_Max_Multi_LAG
go
create function [dbo].[DelimitedSplit_Max_Multi_LAG]
/*
This script is based on a modification of an excellent script by Eirikur Eiriksson
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
And commentary and excellent work from Jeff Moden
http://www.sqlservercentral.com/articles/Tally+Table/72993/
Author: Bevan Ward
Date: 20-Feb-2018
Purpose: Table function that takes an nvarchar(max) string and an nvarchar(1000) delimiter and splits the string into records on the delimiter boundaries. If there are no instances of the delimiter then the string is returned as a single item.
Dependency: SQL2012+
Based on:
Version: 1
Syntax: select * from dbo.DelimitedSplit_Max_Multi_LAG('<input string>','<delimiter>')
Example: select * from dbo.DelimitedSplit_Max_Multi_LAG('Apple,Orange,Pear,Grape',',')
select a.name, b.ItemNumber, b.Item from sys.objects a cross apply dbo.DelimitedSplit_Max_Multi_LAG(name,'_') b
Pseudo code:
1. Find character position of delimiters - right edge as position + length of delimiter except for first position
2. Use LEAD to pull up the next sequence position and substring the piece from the string
3. Return this result with a position number
Known issues: overlapping delimiters are not catered for - for example, the delimiter '..' against a string that includes '...'
Form of output:
+----------+----+
|ItemNumber|Item|
+----------+----+
| 1 | <> |
+----------+----+
| n | <> |
+----------+----+
*/
--===== Define I/O parameters
(@pString NVARCHAR(max), @pDelimiter NVARCHAR(1000))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 100,000,000...
-- enough to cover a big file
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows
E6(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+6 or 1,000,000
E8(N) AS (SELECT 1 FROM E6 a, E6 b), --10E+8 or 100,000,000
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8
),
cteStart(N1) AS (--==== Find right edge of delimiter
SELECT t.N+case when t.N = 0 then 0 else datalength(@pDelimiter)/2 end
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,datalength(@pDelimiter)/2) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - case when s.N1=1 or count(*) over (partition by (select null)) = ROW_NUMBER() OVER(ORDER BY s.N1) then 1 else (datalength(@pDelimiter)/2) end),0)-s.N1,datalength(@pString)/2))
FROM cteStart s
;
GO
March 21, 2018 at 4:49 pm
bevanward - Wednesday, March 21, 2018 5:22 AM (quoted in full above)
Hi Bevan,
Thanks for the idea. I suppose I could embed the separate queries into separate data sources in SSDT vs. using a view, and run those separate queries in parallel on SSDT. For various reasons, I like embedding the logic in views and using those views as a data source in SSDT where possible. But that doesn't mean I'm married to that approach, esp. for performance gains.
I've had limited success using temp tables in SSDT, but I think there are options I can use to make this possible (persistent connections). Of course, I could just create permanent tables in my database and then drop them.
Having said that, I'm quite happy with the performance gains from Eirikur's code.
I've had good success in SSDT using the Balanced Data Distributor in other jobs, but I've never used it with a view, especially a complex view such as mine. I wonder if I could use the BDD with my current view to achieve parallel processing? One thing to consider is that the BDD doesn't guarantee any order in the target table, but that shouldn't be an issue in my processing.
When I ran a single query for performance testing, it ran in 30 mins (see above post). When I ran the data load in my Dev environment overnight, the entire view / data load ran in 32 mins. So I'm wondering if splitting out the queries into parallel processes will gain me much?
Regards,
Scott
March 21, 2018 at 5:13 pm
Eirikur Eiriksson - Wednesday, March 21, 2018 9:02 AM
Hi Scott,
can you post the code of the functions used please?
May have missed them if you've done so earlier, struggling with the connection so cannot see those 😎
Hi Eirikur,
Thanks for your code contributions to the community.
My code changes are here: https://www.sqlservercentral.com/Forums/Dialogs/PostDetails.aspx?PostID=1928604
You may also be interested in: https://www.sqlservercentral.com/Forums/Dialogs/PostDetails.aspx?PostID=1928608
Let me know if you still have problems seeing the code.
The crux of my changes is:
...
cteStart(N1) AS (
--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
),
cteResults AS (
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
)
--===== Strip leading and trailing spaces, only return non-empty results.
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
LTRIM(RTRIM(r.Item)) AS Item
FROM cteResults r
WHERE LEN(r.Item) > 0
In summary, I've added an extra cteResults, moved the ROW_NUMBER() derivation to the final select statement, added a WHERE clause to only return non-blank tokens, and strip leading/trailing spaces from the results.
It appears to work correctly. But, it is critical for my processing that ROW_NUMBER() corresponds to the order of the tokens in the delimited string. This appears to be the case when spot checking source vs. target data.
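For what it's worth, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) carries no ordering guarantee. A safer variant (a sketch of the same two CTEs, carrying the start position N1 through and ordering by it) would be:
cteResults AS (
--===== Do the actual split, keeping the start position for deterministic ordering.
SELECT s.N1,
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
FROM cteStart s
)
--===== Strip leading and trailing spaces, only return non-empty results.
SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY r.N1),
LTRIM(RTRIM(r.Item)) AS Item
FROM cteResults r
WHERE LEN(r.Item) > 0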
March 22, 2018 at 4:14 am
Hi Guys,
and thanks for the nice comments - I wouldn't want to take too much credit for this as it is a community (SSC, or SQL Server Community) effort.
I've got some suggestions:
1. varchar(max) can hold up to (2^31)-1 or 2,147,483,647 characters (nvarchar(max) half that, since each character takes two bytes), so the tally table must be able to produce that number of rows; an optimal construct would use 15^8, something like
WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
, NUMS(N) AS (SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))/2) ROW_NUMBER() OVER (ORDER BY @@VERSION)
FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8)
2. Consider using a single-character token as a delimiter and replacing the multi-character delimiter with the token before passing it to the function. This is straightforward, as one can encapsulate the single-character-token function in another function - see the sketch after this list.
3. One of the reasons the LEAD/LAG versions perform so well is that they utilize the special worktable, but it has limits. If the number of rows exceeds that limit, the benefits are minimal or even negative.
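A minimal sketch of suggestion 2 (hypothetical names, and assuming the original (string, delimiter) parameter order of DelimitedSplit8K_LEAD; CHAR(7) is just an example token that must not occur in the data):
CREATE FUNCTION dbo.DelimitedSplit8K_MultiChar
(@pString VARCHAR(8000), @pDelimiter VARCHAR(100))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== Swap the multi-character delimiter for a single-character token, then
--      split on the token; REPLACE consumes matches left to right, which also
--      sidesteps the overlapping-delimiter problem discussed earlier.
SELECT ds.ItemNumber, ds.Item
FROM dbo.DelimitedSplit8K_LEAD(REPLACE(@pString,@pDelimiter,CHAR(7)),CHAR(7)) ds;
GO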
😎
I've done some extensive work on this and even have an article draft with code and a test harness somewhere. Will try to find it and post back when I have the chance.
March 22, 2018 at 2:19 pm
Eirikur Eiriksson - Thursday, March 22, 2018 4:14 AM (quoted in full above)
Hi Eirikur
Thanks for this - most appreciated.
I'll update the tally table to your new method - the ORDER BY @@VERSION trick is something I haven't seen before, but I'm sure it is better than ORDER BY (SELECT NULL).
Replacing the multi-character delimiter with a single character prior to the split makes a lot of sense, as it also removes the need to test for overlapping delimiters.
The way I use it means I'm not sure of the data size, so I'll have to start doing some more complete testing once I've made these changes.
Thanks again - the way you have collated, documented and enhanced the community work is excellent and appreciated by many!
Bevan