November 29, 2016 at 2:36 pm
I have a table that contains one column and is populated as follows:
declare @table table ( Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
I'm in need of a query that will split every row that has multiple values into all combinations, and return a distinct list of combinations. For the example above, I need to return:
Each individual value (unique):
----------------------------------
'1'
'5'
'21'
'32'
'97'
'115'
'173'
Groupings of two values:
---------------------------
'1,32'
'1,173'
'32,173'
'5,32'
'21,32'
'54,115'
Groupings of three values:
-----------------------------
'1,32,173'
In summary, for every row, all possible combinations; that is, for row '1,32,173', I need: '1', '32', '173', '1,32', '1,173', '32,173' and '1,32,173'. Any help is greatly appreciated.
November 29, 2016 at 5:06 pm
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):
-- your sample data
declare @table table ( Col varchar(100) )
insert @table (Col)
values ('173'), ('97'), ('54,115' ), ('1,32,173'), ('21,32'), ('5,32'), ('32,173');
-- solution:
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT a.col, item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col, col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col, col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col, item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
ORDER BY col -- not required, will slow you down. Included for demo purposes only!
You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
-- Itzik Ben-Gan 2001
November 30, 2016 at 2:42 am
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
November 30, 2016 at 2:47 am
N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
Now in a way that one can actually read the code :w00t:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
selectdistinct Col from R order by Col
November 30, 2016 at 9:54 am
N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
First, I noticed an issue with my solution where I was getting duplicates, here's an updated version:
declare @table table (IndexID int identity(1,1), Col varchar(100));
insert @table (Col)
values ('173'), ('97'), ('54,115' ), ('1,32,173'), ('21,32'), ('5,32'), ('32,173');
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION
SELECT item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
ORDER BY item; -- not required.
Next, if FN_ListToTable_8K is not an Inline Table Valued function then you need to change that. A multi-line Table Valued splitter function will KILL your performance. The solution you are using includes a recursive CTE - those can be costly too. Here's a performance test to compare techniques:
SET NOCOUNT ON;
-- Sample data:
declare @table table (IndexID int identity(1,1), Col varchar(100));
WITH parts(n,p) AS
(
SELECT TOP (10000)
'1'+REPLICATE('0',(abs(checksum(newid())%3)+1)), abs(checksum(newid())%3)+1
FROM sys.all_columns
)
INSERT @table
SELECT
CASE p
WHEN 1 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))
WHEN 2 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))
ELSE cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))
END
FROM parts;
--SELECT * FROM @table
DBCC TRACEON(2453); -- for better cardinality estimates on temp variables
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'MY SOLUTION: '+char(13)+char(10)+REPLICATE('-',50);
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION
SELECT item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
--ORDER BY item
PRINT 'RCTE:'+char(13)+char(10)+REPLICATE('-',50);
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ItemNumber,
Col = V.Item
from@table Y cross APPLY dbo.DelimitedSplit8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
SELECT DISTINCT Col
FROM R
--ORDER BY col;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
DBCC TRACEOFF(2453);
Perf Test results:
MY SOLUTION:
--------------------------------------------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 31 ms, elapsed time = 32 ms.
Table '#A73A2F66'. Scan count 25, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 3458, logical reads 98032, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 547 ms, elapsed time = 427 ms.
RCTE:
--------------------------------------------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 21 ms.
Table 'Worktable'. Scan count 58811, logical reads 392249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#A73A2F66'. Scan count 3, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1029 ms, elapsed time = 1193 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
You can see - even with a good splitter the recursive CTE solution is less than half as fast and produces 4X the number of reads.
-- Itzik Ben-Gan 2001
December 6, 2016 at 3:24 pm
N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
Can you post a copy of the FN_ListToTable_8K function so that we can try out your code? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2016 at 4:52 pm
Jeff Moden (12/6/2016)Can you post a copy of the FN_ListToTable_8K function so that we can try out your code? Thanks.
CREATE FUNCTION [dbo].[FN_ListToTable_8K]
(
@listVARCHAR(8000),
@delimiterCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
--
-- Thist function has had minimal if any modification on code
-- by Jeff Moden (SQL Server Central)
--
-- CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS
(
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
), --10E+1 or 10 rows
E2(N) AS
(
SELECT1
FROME1 a,
E1 b
),--10E+2 or 100 rows
E4(N) AS
(
SELECT1
FROME2 a,
E2 b
),--10E+4 or 10,000 rows
cteTally(N) AS
(
--
--This provides the "base" CTE and limits the number of rows right up front
--for both a performance gain and prevention of accidental "overruns"
--
SELECTTOP
(ISNULL(DATALENGTH(@list),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROME4
),
cteStart(N1) AS
(
--
--This returns N+1 (starting position of each "element" just once for each delimiter)
--
SELECT1
UNION ALL
SELECTt.N+1
FROMcteTally t
WHERESUBSTRING(@list,t.N,1) = @delimiter
),
cteLen(N1,L1) AS
(
--
--Return start and length (for use in substring)
--
SELECTs.N1,
ISNULL(NULLIF(CHARINDEX(@delimiter,@list,s.N1),0)-s.N1,8000)
FROMcteStart s
)
--
--Do the actual split. The ISNULL/NULLIF combo handles the length for the final
--element when no delimiter is found.
--
SELECTID= ROW_NUMBER() OVER(ORDER BY l.N1),
ListValues= SUBSTRING(@list, l.N1, l.L1)
FROMcteLen l
December 6, 2016 at 6:01 pm
N_Muller (12/6/2016)
Jeff Moden (12/6/2016)Can you post a copy of the FN_ListToTable_8K function so that we can try out your code? Thanks.
CREATE FUNCTION [dbo].[FN_ListToTable_8K]
(
@listVARCHAR(8000),
@delimiterCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
--
-- Thist function has had minimal if any modification on code
-- by Jeff Moden (SQL Server Central)
--
-- CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS
(
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
), --10E+1 or 10 rows
E2(N) AS
(
SELECT1
FROME1 a,
E1 b
),--10E+2 or 100 rows
E4(N) AS
(
SELECT1
FROME2 a,
E2 b
),--10E+4 or 10,000 rows
cteTally(N) AS
(
--
--This provides the "base" CTE and limits the number of rows right up front
--for both a performance gain and prevention of accidental "overruns"
--
SELECTTOP
(ISNULL(DATALENGTH(@list),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROME4
),
cteStart(N1) AS
(
--
--This returns N+1 (starting position of each "element" just once for each delimiter)
--
SELECT1
UNION ALL
SELECTt.N+1
FROMcteTally t
WHERESUBSTRING(@list,t.N,1) = @delimiter
),
cteLen(N1,L1) AS
(
--
--Return start and length (for use in substring)
--
SELECTs.N1,
ISNULL(NULLIF(CHARINDEX(@delimiter,@list,s.N1),0)-s.N1,8000)
FROMcteStart s
)
--
--Do the actual split. The ISNULL/NULLIF combo handles the length for the final
--element when no delimiter is found.
--
SELECTID= ROW_NUMBER() OVER(ORDER BY l.N1),
ListValues= SUBSTRING(@list, l.N1, l.L1)
FROMcteLen l
I'm thinking that code is going to work pretty well for you. 😀 Thanks for the mention in the comments.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2016 at 6:03 pm
Shifting gears back to the problem at hand, will there ever be CSVs with more than 3 elements?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2016 at 7:23 am
Jeff Moden (12/6/2016)
Shifting gears back to the problem at hand, will there ever be CSVs with more than 3 elements?
Yes, but the most I've seen in my scenarios is about 20.
December 7, 2016 at 7:24 am
Jeff Moden (12/6/2016)
I'm thinking that code is going to work pretty well for you. 😀 Thanks for the mention in the comments.
I give credit, where credit is due 🙂
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply