August 5, 2007 at 12:05 am
My table has data like:
record 1 --> 12 | 34 |
record 2 --> 536|
In this data, the numbers are separated by pipes "|" and there can be any number of numbers. I need to parse each of these records and separate the numbers to different records. The output from the above data should be:
12
34
536
How can I get this? I tried a recursive function. It returns a table if I give it an input string of the format "1 | 2 | 3 | 4|". But I have to call the function as " select * from recursive_fn ('1 | 2| 3|4|') ". I cannot provide the column of the source table as input to the function. Please advise on a better approach.
Thanks to all.
August 5, 2007 at 6:29 am
You should fix INSERT, not SELECT.
_____________
Code for TallyGenerator
August 5, 2007 at 1:08 pm
First and foremost, Serqiy is correct... delimited data should not be allowed to be stored in the database. The inserts should be done via a staging table, normalized, and only then should the data be stored in the database.
Now, if I read you correctly, you want to split a whole table's worth of PSVs (pipe-separated values).
The following method will allow you to either create the normalized data or continue to calculate the data from a delimited column. Again, I strongly recommend the former rather than the latter...
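Just to make the "normalize it" recommendation concrete, the normalized version of your data is nothing more than a child table with one number per row, keyed back to the parent. The table and column names below are made up purely for illustration (they're not from your post), so treat this as a sketch, not a spec...
--===== Sketch only... a normalized home for the numbers (hypothetical names)
CREATE TABLE dbo.ParentRecord
       (RecordID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED)
CREATE TABLE dbo.RecordNumber
       (RecordID INT NOT NULL REFERENCES dbo.ParentRecord(RecordID),
        Position INT NOT NULL,
        Number   INT NOT NULL,
        PRIMARY KEY CLUSTERED (RecordID, Position))
--===== Your example data would land like this... one number per row
INSERT INTO dbo.ParentRecord DEFAULT VALUES --record 1
INSERT INTO dbo.ParentRecord DEFAULT VALUES --record 2
INSERT INTO dbo.RecordNumber (RecordID, Position, Number)
SELECT 1, 1, 12  UNION ALL
SELECT 1, 2, 34  UNION ALL
SELECT 2, 1, 536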
Creating a "Helper Table"
Ok... before we even get started, we need a helper table to do this. It's a "numbers" table and it consists of nothing more than, you guessed it, a column of well indexed sequential numbers. I call it a "Tally" table just because it's easier to say and is a unique enough word so as not to confuse anyone in conversation. Here's how to make one... it should be a permanent table because it has lots of other uses...
--===== Create and populate the Tally table on the fly
SELECT TOP 11000 --equates to more than 30 years of dates
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
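If you want a quick sanity check once it's built (and a taste of one of those "other uses" hinted at by the comment about 30 years of dates), something like this will do... none of it is required, it's just an illustration against the Tally table created above:
--===== Quick sanity check on the Tally table
SELECT COUNT(*) AS TallyRows, MIN(N) AS MinN, MAX(N) AS MaxN
  FROM dbo.Tally
--===== Example of another use... generate a year's worth of consecutive dates
SELECT DATEADD(dd, N-1, '20000101') AS TheDate
  FROM dbo.Tally
 WHERE N <= 365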
Creating Some Test Data
Ok... You didn't really provide any test data or DDL to speak of, but we need something to test on. The following code creates a test table called "JBMTest"... read the comments in the code to see what it's doing...
--===== Create and populate a test table.
-- Column RowNum contains unique sequential numbers starting at 1
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomePSV" holds ten pipe-separated values per row
-- Takes about 11 seconds to execute for a million rows and < 1 second for 10 thousand.
SELECT TOP 10000 --<<< Controls how many rows to create
RowNum = IDENTITY(INT,1,1),
SomeInt = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
SomePSV = 'Part1|Part2|Part3|Part4|Part5||Part7|Part8|Part9|Part10|' --Pipe separated
INTO dbo.JBMTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN
--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Allow nulls in the PSV column
ALTER TABLE dbo.JBMTest
ALTER COLUMN SomePSV VARCHAR(70) NULL
--===== Change some of the rows (evenly divisible by 10) to only have 1 parameter
UPDATE dbo.JBMTest
SET SomePSV = 'Part1|'
WHERE RowNum % 10 = 0
--===== Change one row to not have anything
UPDATE dbo.JBMTest
SET SomePSV = NULL
WHERE RowNum = 1
--===== Change one row to have an empty string
UPDATE dbo.JBMTest
SET SomePSV = ''
WHERE RowNum = 2
--===== Change one row to have just a delimiter
UPDATE dbo.JBMTest
SET SomePSV = '|'
WHERE RowNum = 3
Split a Whole Table's Worth of Delimited Values
And now, the solution to your problem... this solution uses a cross-join, at the character level of the SomePSV column, between your table (JBMTest in the code above and below) and the numbers in the Tally table. It'll expand 100,000 rows of 10 PSVs (pipe-separated values) each in about 26 seconds and 10,000 rows in about 3 seconds, and the results are simply displayed on screen (use the grid mode). The output of the solution could certainly be directed to a table or a temp table for additional use without having to recalculate (hint: it could be used to normalize the data you store, as a child table... there's a sketch of that right after the code).
--===== Declare and start a time to measure duration with
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
--===== Declare a variable to hold the delimiter
DECLARE @Delim CHAR(1)
SET @Delim = '|'
--===== Do the split
-- Do nulls and pipeless blanks
SELECT RowNum,SomeInt,SomePsv,1 AS Position
FROM dbo.JBMTest
WHERE SomePsv IS NULL
OR SomePsv = ''
UNION ALL
-- Do everything that has a delimiter
SELECT RowNum,
SomeInt,
LTRIM(RTRIM(NULLIF(SUBSTRING(@Delim+h.SomePsv+@Delim, t.N+1,
CHARINDEX(@Delim, @Delim+h.SomePsv+@Delim, t.N+1)-t.N-1),''))) AS Val,
t.N-LEN(REPLACE(LEFT(@Delim+h.SomePsv+@Delim,t.N), @Delim, '')) AS Position
FROM dbo.Tally t,
dbo.JBMTest h
WHERE SUBSTRING(@Delim+h.SomePsv+@Delim, t.N, 1) = @Delim
AND t.N < LEN(@Delim+h.SomePsv)
--===== Display the duration
PRINT DATEDIFF(ms,@StartTime,GETDATE())
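And here's the sketch promised above for keeping the normalized result instead of recalculating it... the child table name JBMTestDetail is just something I made up for the example, and I haven't tuned it at all:
--===== Sketch only... persist the split into a child table so the parse
--      never has to be repeated (JBMTestDetail is a made-up name)
CREATE TABLE dbo.JBMTestDetail
       (RowNum   INT NOT NULL,
        Position INT NOT NULL,
        Val      VARCHAR(70) NULL,
        PRIMARY KEY CLUSTERED (RowNum, Position))
INSERT INTO dbo.JBMTestDetail (RowNum, Position, Val)
SELECT h.RowNum,
       t.N-LEN(REPLACE(LEFT('|'+h.SomePsv+'|',t.N), '|', '')) AS Position,
       LTRIM(RTRIM(NULLIF(SUBSTRING('|'+h.SomePsv+'|', t.N+1,
       CHARINDEX('|', '|'+h.SomePsv+'|', t.N+1)-t.N-1),''))) AS Val
  FROM dbo.Tally t,
       dbo.JBMTest h
 WHERE SUBSTRING('|'+h.SomePsv+'|', t.N, 1) = '|'
   AND t.N < LEN('|'+h.SomePsv)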
A word of caution... I had to change some pretty well-established code to allow for the trailing delimiter (a bit different than normal) and haven't spent a great deal of time testing it. You should. It also trims the leading and trailing blanks found in each element (also highly unusual)...
In SQL Server 2005, I imagine some sort of recursive CTE would be in order.
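Untested, but something along these lines would be the 2005 flavor... it assumes the same JBMTest table and trailing delimiter as above, so consider it a sketch rather than a finished splitter:
--===== Untested sketch of a recursive CTE splitter for SQL Server 2005
--      (assumes the dbo.JBMTest table and trailing delimiter from above)
;WITH Split (RowNum, SomeInt, Val, Remainder, Position) AS
(
 SELECT RowNum,
        SomeInt,
        CAST(LEFT(SomePsv, CHARINDEX('|', SomePsv + '|') - 1) AS VARCHAR(70)),
        CAST(STUFF(SomePsv, 1, CHARINDEX('|', SomePsv + '|'), '') AS VARCHAR(70)),
        1
   FROM dbo.JBMTest
  WHERE SomePsv > ''
  UNION ALL
 SELECT RowNum,
        SomeInt,
        CAST(LEFT(Remainder, CHARINDEX('|', Remainder + '|') - 1) AS VARCHAR(70)),
        CAST(STUFF(Remainder, 1, CHARINDEX('|', Remainder + '|'), '') AS VARCHAR(70)),
        Position + 1
   FROM Split
  WHERE Remainder > ''
)
 SELECT RowNum, SomeInt, LTRIM(RTRIM(Val)) AS Val, Position
   FROM Split
  ORDER BY RowNum, Position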
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2007 at 1:35 pm
Thanks Jeff, Serqiy. I really appreciate the detailed reply that Jeff gave. Also, I realise that delimited data is good in files, not in database tables. I have learnt it the hard way.
However, I was up last night and finally created two functions to do the job. The first is a recursive function which splits the input string until the string ends and returns a table with each element of the input string as a row.
The second function is a cursor on the source table and passes each record from the source table to the recursive function. Finally, from this second function I get the entire source table parsed as I want it.
Once again, thank you to all you guys for putting in the time and effort.
August 5, 2007 at 1:57 pm
Thanks. It would be nice if you shared your solution with us... both the function and the cursor, please.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2007 at 5:21 pm
Sure, Jeff... the recursive parsing function, followed by the function that counts and summarises the entire table based on input dates.
CREATE FUNCTION dbo.MPM_PARSE_ERROR(@error_number VARCHAR(100))
RETURNS @ERROR_TABLE TABLE (ERROR VARCHAR(5))
AS
BEGIN
/*------------------------------------------------------------
Name : MPM_PARSE_ERROR
Input : Error numbers as a pipe-delimited string from AUD_ERROR_LOG
Purpose: Parses pipe-delimited error strings, returns a table
with each error as a record of the output table.
------------------------------------------------------------*/
DECLARE @pos_pipe INT
DECLARE @current_error VARCHAR(5)
DECLARE @new_error_str VARCHAR(100)
SET @pos_pipe = CHARINDEX('|', @error_number)
--Set Exit criteria for recursive function
IF (@pos_pipe = 0 AND LEN(LTRIM(RTRIM(@error_number))) = 0) OR @error_number = '|' or @error_number = '||'
BEGIN
RETURN
END
--Error string may begin with a pipe
IF @pos_pipe = 1
BEGIN
SET @error_number = RIGHT(@error_number, len(@error_number) - 1) + '|'
SET @pos_pipe = CHARINDEX('|', @error_number)
END
IF @pos_pipe = 0 --last error in string
BEGIN
SET @current_error = @error_number
SET @new_error_str = '|'
END
ELSE --may be another error number after current error
BEGIN
SET @current_error = SUBSTRING(@error_number, 1, @pos_pipe - 1)
--REPLACE would also strip later occurrences of the same error number, so take the remainder instead
SET @new_error_str = SUBSTRING(@error_number, @pos_pipe + 1, LEN(@error_number))
END
INSERT INTO @ERROR_TABLE
SELECT LTRIM(RTRIM(@current_error)) UNION
SELECT * from dbo.MPM_PARSE_ERROR(@new_error_str)
RETURN
END
--------------------------------------------------------------------------
CREATE FUNCTION MPM_ERROR_SUMMARY (@start_date DATETIME, @end_date DATETIME)
RETURNS @ERROR_SUMMARY TABLE (ERROR VARCHAR(5), MASTER_SEQ_ID INT, LOAD_ID INT, ERROR_DATE_TIME DATETIME)
AS
BEGIN
/*------------------------------------------------------------
Name : MPM_ERROR_SUMMARY
Input : Start and end dates for which errors have to be summarized.
Purpose: Provide a summary of Errors in the AUD_ERROR_LOG table
by process and count of errors.
------------------------------------------------------------*/
DECLARE @error_number VARCHAR(100)
DECLARE @seq INT, @load_id INT, @er_date DATETIME
DECLARE CUR_ERROR CURSOR FOR
SELECT ERROR_NUMBER,
MASTER_SEQ_ID,
LOAD_ID,
ERROR_DATE_TIME
FROM AUD_ERROR_LOG
WHERE ERROR_DATE_TIME >= @start_date
AND ERROR_DATE_TIME <= @end_date
OPEN CUR_ERROR
FETCH NEXT FROM CUR_ERROR INTO @error_number, @seq, @load_id, @er_date
WHILE @@FETCH_STATUS = 0
BEGIN
--Pipe-delimited errors have to be parsed. Call the parsing function for this.
INSERT INTO @ERROR_SUMMARY
SELECT ERROR, @seq, @load_id, @er_date FROM DBO.MPM_PARSE_ERROR(@error_number)
FETCH NEXT FROM CUR_ERROR INTO @error_number, @seq, @load_id, @er_date
END
CLOSE CUR_ERROR
DEALLOCATE CUR_ERROR
RETURN
END
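Both functions are called like any other table source. For example (the error string and the dates below are just sample values, not real data from AUD_ERROR_LOG):
--Sample calls (values are illustrative only)
SELECT * FROM dbo.MPM_PARSE_ERROR('12|34|536|')
SELECT * FROM dbo.MPM_ERROR_SUMMARY('20070801', '20070806')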
August 5, 2007 at 10:02 pm
Thank you...
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2007 at 6:57 am
shailendra,
You might find a WHILE loop in the function more efficient than a recursive function. Recursive functions certainly have their place, but I'm not sure this is one of them.
CREATE FUNCTION dbo.MPM_PARSE_ERROR(@error_number VARCHAR(100))
RETURNS @ERROR_TABLE TABLE (ERROR VARCHAR(5))
AS
BEGIN
/*------------------------------------------------------------
Name : MPM_PARSE_ERROR
Input : Error numbers as a pipe-delimited string from AUD_ERROR_LOG
Purpose: Parses pipe-delimited error strings, returns a table
with each error as a record of the output table.
------------------------------------------------------------*/
DECLARE @pos_pipe INT
DECLARE @current_error VARCHAR(5)
DECLARE @new_error_str VARCHAR(100)

SET @new_error_str = LTRIM(RTRIM(@error_number))
SET @pos_pipe = CHARINDEX('|', @new_error_str)

WHILE (@pos_pipe > 0)
BEGIN
    --Error string may begin with a pipe
    IF @pos_pipe = 1
        SET @current_error = ''
    ELSE
        SET @current_error = LEFT(@new_error_str, @pos_pipe - 1)

    --Comment out this IF if you do want empty values returned
    IF (LEN(@current_error) > 0)
        INSERT INTO @ERROR_TABLE
        SELECT LTRIM(RTRIM(@current_error))

    SET @new_error_str = SUBSTRING(@new_error_str, @pos_pipe + 1, 100)
    SET @pos_pipe = CHARINDEX('|', @new_error_str)
END

--Last error in string?
IF (LEN(@new_error_str) > 0)
    INSERT INTO @ERROR_TABLE
    SELECT LTRIM(RTRIM(@new_error_str))

RETURN
END
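For example (after replacing the recursive version of the same name), an empty element and the trailing pipe are both handled, and only the non-empty values come back... the input string is just a sample:
--Example call... empty elements are skipped by the IF (LEN(@current_error) > 0) test above
SELECT ERROR FROM dbo.MPM_PARSE_ERROR('12||34|536|')
--Returns 12, 34 and 536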
Hope this helps
Mark
August 6, 2007 at 7:47 am
Thanks, Mark.
Your solution also looks good; I'll try it out.
August 6, 2007 at 8:41 am
I'd be interested in seeing the performance characteristics of each of the three solutions.
--------------------
Colt 45 - the original point and click interface
August 6, 2007 at 8:48 am
Yeah, exactly what I had in mind. But I need to wait until the weekend to do that.
August 6, 2007 at 11:19 am
Or... just use the set-based solution I provided to do the whole table at once.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2007 at 11:20 am
Heh... that's why I asked to see the code... I'll be doing the ol' million row testing on these solutions tonight... Ladies and Gentlemen, place your bets
--Jeff Moden
Change is inevitable... Change for the better is not.