May 1, 2007 at 9:37 pm
Ok Evan, let's go way back to your original request... you have data that looks like this...
code_list_id |
code_list_text
1
'Alpha,Bravo,Charlie,Delta,Echo'
2
'India,Foxtrot,Golf'
...
..
..
.
100
'X-ray,Yankee,Zulu'
...and you want it to look like this...
code_list_id |
code_1
code_2
code_3
code_4
code_5
1
'Alpha'
'Bravo'
'Charlie'
' Delta'
'Echo'
2
'India'
'Foxtrot'
'Golf'
NULL
NULL
...
.
.
.
.
.
..
.
.
.
.
.
100
'X-ray'
'Yankee'
'Zulu'
NULL
NULL
First, I think that Serqiy is correct... you don't really need it like that... you just need to be able to query it as if it were like that. Don't worry... I'm gonna give it to you both ways...
In order to be able to do this, we need the help of a special little multi-purpose tool known as a "Tally" or "Numbers" table... I like "Tally" because it just sounds cooler All it is is a table containing a single column of well indexed sequential numbers starting at "1" and will help you do some remarkable things that most folks would have to use a cursor for. And, it runs a heck of a lot faster than a cursor does, too. Here's how to make one... make it a permanent part of your database 'cause it's worth it... and notice that we don't use a WHILE loop or cursor to make the Tally table, either...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates when you need it
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
Alright... just for the rest of the folks who might want to try this one, we need a bit of test data...
--===== Create a test table to hold some test data for the demo
CREATE TABLE Code_Lists
(
Code_List_ID INT,
Code_List_Text VARCHAR(8000)
)
INSERT INTO Code_Lists (Code_List_ID,Code_List_Text)
SELECT 1,'Alpha,Bravo,Charlie,Delta,Echo' UNION ALL
SELECT 2,'India,Foxtrot,Golf' UNION ALL
SELECT 3,'Delta,Echo' UNION ALL
SELECT 4,'Alpha' UNION ALL
SELECT 5,' ' UNION ALL
SELECT 6,'X-ray,Yankee,Zulu'
Ok... now the hard part... about now, most folks resort to some form of RBAR (pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row) either by using a WHILE loop or a Cursor. The logical choice would be to make a SPLIT function but even that is a form of RBAR. Nope, we're gonna do like Serqiy suggested (to start with)... we're gonna normalize the data (first) and we're going to do it with what most folks avoid... a good healthy cross-join that works pretty darned fast because of the criteria...
--===== Do the split with a count
SELECT Code_List_ID,
CASE
WHEN LEN(h.Code_List_Text)=0
THEN NULL
ELSE SUBSTRING(','+h.Code_List_Text+',', t.N+1, CHARINDEX(',', ','+h.Code_List_Text+',', t.N+1)-t.N-1)
END AS Code_List_Text,
CASE
WHEN LEN(h.Code_List_Text)=0
THEN 1
ELSE t.N-DATALENGTH(REPLACE(LEFT(h.Code_List_Text,t.N), ',', ''))+1
END AS Position
FROM dbo.Tally t,
dbo.Code_Lists h
WHERE SUBSTRING(','+h.Code_List_Text+',', t.N, 1) = ','
AND t.N < DATALENGTH(','+h.Code_List_Text+',')
Try THAT with a cursor... this will easily do 10,000 rows in a little over 1 second. Yeah, there's probably a faster way to do it but I'm being a bit lazy. Here's what the result set looks like...
Code_List_ID |
Code_List_Text
Position
1
Alpha
1
1
Bravo
2
1
Charlie
3
1
Delta
4
1
Echo
5
2
India
1
2
Foxtrot
2
2
Golf
3
3
Delta
1
3
Echo
2
4
Alpha
1
5
NULL
1
6
X-ray
1
6
Yankee
2
6
Zulu
3
Notice that the result set contains a position number which is correct for the position in the CSV string from which it was plucked. Now, I'm not sure why, but lot's of folks don't like "long skinny" tables where row information is split up... so, let's put it all back together with another chunk of SQL prestidigitation known as a "cross-tab" (it's in Books Online if you want to read about it)... for this, we'll also use the results of the previous query as if it were a table or view... that's called a "Derived Table" and some folks actually prefer to think of them as inline views. Here we go...
SELECT d.Code_List_ID,
MAX(CASE WHEN d.Position = 1 THEN d.Code_List_Text ELSE NULL END) AS Code_1,
MAX(CASE WHEN d.Position = 2 THEN d.Code_List_Text ELSE NULL END) AS Code_2,
MAX(CASE WHEN d.Position = 3 THEN d.Code_List_Text ELSE NULL END) AS Code_3,
MAX(CASE WHEN d.Position = 4 THEN d.Code_List_Text ELSE NULL END) AS Code_4,
MAX(CASE WHEN d.Position = 5 THEN d.Code_List_Text ELSE NULL END) AS Code_5
FROM (--==== Derived table "d" splits the code list by Code_List_ID and provides a position for each item in Code_List_Text
SELECT Code_List_ID,
CASE
WHEN LEN(h.Code_List_Text)=0
THEN NULL
ELSE SUBSTRING(','+h.Code_List_Text+',', t.N+1, CHARINDEX(',', ','+h.Code_List_Text+',', t.N+1)-t.N-1)
END AS Code_List_Text,
CASE
WHEN LEN(h.Code_List_Text)=0
THEN 1
ELSE t.N-DATALENGTH(REPLACE(LEFT(h.Code_List_Text,t.N), ',', ''))+1
END AS Position
FROM dbo.Tally t,
dbo.Code_Lists h
WHERE SUBSTRING(','+h.Code_List_Text+',', t.N, 1) = ','
AND t.N < DATALENGTH(','+h.Code_List_Text+',')
) d
GROUP BY d.Code_List_ID
And, how do you use THAT in a query... you either convert THAT to a view and then query the view (WARNING WARNING WARNING!!!! DO NOT USE A VIEW FOR MORE THAN ABOUT 10K rows for this unless you make an indexed view!), insert the results into a new table and query the new table (even if it's a temp table), or use the query as another derived table!
Lemme know how it works for you... and yes... I agree... legacy hardware is a pain in the patootie but its still fun to mess with
_____________________________________________________________________________________________________________________________________
--Jeff Moden
Change is inevitable... Change for the better is not.
May 1, 2007 at 10:00 pm
Jeff,
there is no need for a cursor if you use my function:
SELECT code_list_id, dbo.List_CharLong(code_list_text, ',')
FROM CodesTable
It returns same set of data.
_____________
Code for TallyGenerator
May 2, 2007 at 12:10 am
Hi Jeff
Your solution using the "Tally" and derived position tables worked beautifully...
I got exactly what I needed.
That was some mighty fine codin' Mr. Moden.
Your assistance is greatly appreciated.
Cheers
Evan
May 2, 2007 at 6:48 am
I know, Serqiy... and I agree... I was just trying to avoid the UDF just for fun.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2007 at 6:52 am
Thanks, Evan... appreciate the feedback. Just remember that with these types of things (cross tab), the whole result set must materialize before criteria will work on it (which is why these really stink for views)...
Also, I just went back and reread a post I missed... since your black box never spits out less than 2 items, we can get rid of the CASE statements in the derived table to make this a wee-bit faster...
SELECT d.Code_List_ID,
MAX(CASE WHEN d.Position = 1 THEN d.Code_List_Text ELSE NULL END) AS Code_1,
MAX(CASE WHEN d.Position = 2 THEN d.Code_List_Text ELSE NULL END) AS Code_2,
MAX(CASE WHEN d.Position = 3 THEN d.Code_List_Text ELSE NULL END) AS Code_3,
MAX(CASE WHEN d.Position = 4 THEN d.Code_List_Text ELSE NULL END) AS Code_4,
MAX(CASE WHEN d.Position = 5 THEN d.Code_List_Text ELSE NULL END) AS Code_5
FROM (--==== Derived table "d" splits the code list by Code_List_ID
-- and provides a position for each item in Code_List_Text
SELECT Code_List_ID,
SUBSTRING(','+h.Code_List_Text+',', t.N+1, CHARINDEX(',', ','+h.Code_List_Text+',', t.N+1)-t.N-1)
AS Code_List_Text,
t.N-DATALENGTH(REPLACE(LEFT(h.Code_List_Text,t.N), ',', ''))+1
AS Position
FROM dbo.Tally t,
dbo.Code_Lists h
WHERE SUBSTRING(','+h.Code_List_Text+',', t.N, 1) = ','
AND t.N < DATALENGTH(','+h.Code_List_Text+',')
) d
GROUP BY d.Code_List_ID
___________________________________________________________________________________________________________________
--Jeff Moden
Change is inevitable... Change for the better is not.
May 2, 2007 at 9:40 am
Thanks, Jeff, for taking time to produce an excellent solution. I was going to take time to use Burton Roberts' 2001 function to solve it. I have been using it extensively since I found it some time ago. It is very similar to Sergiy's function. But your solution works just fine. I'm including it below.
CREATE FUNCTION udf_split(@sText varchar(8000), @sDelim varchar(20) = ' ')
RETURNS @retArray TABLE (idx smallint Primary Key, value varchar(8000))
AS
/***********************************************************************************
udf_split
Emulate VB's Split() function and parameter arrays .
Taken from Burton Roberts, SQL Server Magazine, July 2001.
Renamed from fn_Split to udf_split in order to maintain local naming conventions.
Sample call: SELECT * FROM dbo.udf_Split(@string, '.')
-----------------------------------------------------------------------------------
CHANGE HISTORY
-----------------------------------------------------------------------------------
DATE WHO COMMENT
---------- ----------- ------------------------------------------------------------
05/02/2006 Carl Start.
***********************************************************************************/
BEGIN
DECLARE @idx smallint,
@value varchar(8000),
@bcontinue bit,
@iStrike smallint,
@iDelimlength tinyint
IF @sDelim = 'Space'
BEGIN
SET @sDelim = ' '
END
SET @idx = 0
SET @sText = LTrim(RTrim(@sText))
SET @iDelimlength = DATALENGTH(@sDelim)
SET @bcontinue = 1
IF NOT ((@iDelimlength = 0) or (@sDelim = 'Empty'))
BEGIN
WHILE @bcontinue = 1
BEGIN
--If you can find the delimiter in the text, retrieve the first element and
--insert it with its index into the return table.
IF CHARINDEX(@sDelim, @sText)>0
BEGIN
SET @value = SUBSTRING(@sText,1, CHARINDEX(@sDelim,@sText)-1)
BEGIN
INSERT @retArray (idx, value)
VALUES (@idx, @value)
END
--Trim the element and its delimiter from the front of the string.
--Increment the index and loop.
SET @iStrike = DATALENGTH(@value) + @iDelimlength
SET @idx = @idx + 1
SET @sText = LTrim(Right(@sText,DATALENGTH(@sText) - @iStrike))
END
ELSE
BEGIN
--If you can’t find the delimiter in the text, @sText is the last value in
--@retArray.
SET @value = @sText
BEGIN
INSERT @retArray (idx, value)
VALUES (@idx, @value)
END
--Exit the WHILE loop.
SET @bcontinue = 0
END
END
END
ELSE
BEGIN
WHILE @bcontinue=1
BEGIN
--If the delimiter is an empty string, check for remaining text
--instead of a delimiter. Insert the first character into the
--retArray table. Trim the character from the front of the string.
--Increment the index and loop.
IF DATALENGTH(@sText)>1
BEGIN
SET @value = SUBSTRING(@sText,1,1)
BEGIN
INSERT @retArray (idx, value)
VALUES (@idx, @value)
END
SET @idx = @idx+1
SET @sText = SUBSTRING(@sText,2,DATALENGTH(@sText)-1)
END
ELSE
BEGIN
--One character remains.
--Insert the character, and exit the WHILE loop.
INSERT @retArray (idx, value)
VALUES (@idx, @sText)
SET @bcontinue = 0
END
END
END
RETURN
END
September 25, 2007 at 1:13 am
Sorry,
Was reviewing old PM's and came across this link. Here's a split function that has no loops... the details, as always, are in the comments...
drop FUNCTION dbo.fnTSplit
go
CREATE FUNCTION dbo.fnTSplit
/***************************************************************************************************
Purpose:
Split a delimited string into a table variable including a column that identifies the ordinal of
each value that has been split out of the input string. Just about ANY size delimiter may be used.
Programmer Notes:
1. Blank values return NULLs in the result set even when they are at the beginning or end of the
input string.
2. Empty input string returns result set with 1 value of NULL.
3. NULL input string returns result set with 1 value of NULL.
4. Function raises an error if the length of the input string is greater than 7,998 characters.
Typical Performance:
20 items in a single string: < 1 millisecond
1,373 items in a single string: 30-46 milliseconds
Example Usage:
SELECT * FROM dbo.fnTSplit
Revision History:
Rev 00 - 05/11/2007 - Jeff Moden
- Initial creation, unit test, and performance test for SQLServerCentral.Com
***************************************************************************************************/
--==================================================================================================
-- Declare the I/O variables
--==================================================================================================
(
@DelimitedList VARCHAR(8000), --Delimited string of values to be split
@Delimiter VARCHAR(8000) --The delimiter to split the values on
)
RETURNS @Result TABLE --The return table
(
Ordinal INT IDENTITY(1,1), --The ordinal postion of the split value
String VARCHAR(8000) --The split value
)
AS
BEGIN
--==================================================================================================
-- Declare local variables
--==================================================================================================
DECLARE @MyDelim CHAR(1)
--==================================================================================================
-- Test the input variable for max length. Force error if test fails because cannot use a
-- RAISERROR in a function.
--==================================================================================================
IF LEN(@DelimitedList) > 7998 --Need room for two delimiters (8000-2=7998)
INSERT INTO @Result (STRING)
SELECT STR(1/0) --Forces an error (divide by zero)
--==================================================================================================
-- Presets
--==================================================================================================
--===== Use an "invisible" character that will replace the given delimiter(s)
SET @MyDelim = CHAR(1)
--===== Replace the delimiters in the input string, no matter how long they may be, with the single
-- character "invisible" delimiter we just created above. Then, wrap the string in delimiters.
SET @DelimitedList = @MyDelim+ISNULL(REPLACE(@DelimitedList,@Delimiter,@MyDelim),'')+@MyDelim
--==================================================================================================
-- Split the modified input string into the result table using the Tally table to create a
-- set-based "loop". The WITH (NOLOCK) nearly doubles the speed.
--==================================================================================================
INSERT INTO @Result (String)
SELECT NULLIF(SUBSTRING(@DelimitedList,N+1,CHARINDEX(@MyDelim,@DelimitedList,N+1)-N-1),'')
FROM dbo.Tally WITH (NOLOCK)
WHERE N < LEN(@DelimitedList)
AND SUBSTRING(@DelimitedList,N,1)=@MyDelim
RETURN
END
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply