When I got done reading Stephen Lasham's article Extracting a String Within Delimiters - Part 2, I thought that I would share with the community, some functions I had just written a couple weeks earlier. I have found that I do a lot of string parsing where I work. It turns out that I am always either reading flat files into SQL, or having to parse some table a Web Developers populates with delaminated strings. As such, I happen to have two functions I've written that I think you may find interesting.
The Problem
For a lot of cases I can use DTS to bring in the flat files, and I do. But what if I have a situation where the flat files have the same kind of data, but different column lengths. I ran into this when I started setting up Performance Logs on all our servers. A Server with 4 processors will have more columns than a server with 2. But the type of data I'll still be trapping, CPU data, will still be the same. Wouldn't it be nice to put all those logs into a central location and have one SQL server pull them in? To do this though, I would have to create a DTS package for each log file. Or, create some convoluted DTS logic to handle it.
What about when I get a Web Developer that wants to dump her data into SQL? It's early in the project and she doesn't know how many columns she will be eventually giving you. Wouldn't it be nice to give her a two-column table? One column for the key information, and one column for data values? She could then concatenate her Key values and Data values and dump them into two varchar fields. Later you could just parse them up, and if there are extra columns you either discard them, or handle them with some logic.
In either circumstance, wouldn't it be easy if we could pull the data into SQL, but then have the parsing ability of the DTS package?
The Solution
I've written two custom functions, and they are:
FN_Repetitive_Str_Parse & FN_PatCount
FN_Repetitive_Str_Parse is just as the name suggests. It will take in three parameters: The string you want to parse, the delimiter, and the number of the column you want returned. It will then return the data from the column you specify.
FN_PatCount is much like PATINDEX, but instead of giving you the position of the pattern in the string, It will count the number of times a the pattern occurs within a string. The return is an int value. This is Useful for dynamically finding out how many columns are in your string. You can then use this number to iterate through the string and lift out the variable columns.
This sample code is lifted right from the header of Repetitive String Parse. It demonstrates the ability of the function to turn two delimited values into a 5 column table.
DECLARE @KEY_DATA VARCHAR(200), @VAL_DATA VARCHAR(200) SELECT @KEY_DATA = 'AB12345~PA~ORD0001', @VAL_DATA = '253~USER1' SELECT dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 1) AS PART, dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 2) AS LOC, dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 3) AS [ORDER], dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 1) AS SHIP_QTY, dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 2) AS NETID, GETDATE() AS DATE_STAMP
This next bit of code is lifted out of a script I wrote to parse Performance logs from various computers. The number of data elements is dependant upon the hardware of the system generating the log, and thus varies from system to system. I wanted to make the parsing algorithm as general as possible, and thus translated a horizontal table into a vertical one. Once the raw data (comma delaminated, Quote-string Identified) is Bulk imported into a holding table, this part of the script parses it into a vertical table. Another script reads this 'parsed' vertical table, and depending on column labels, reconstructs the data into a RDS architecture. But I digress. Here's the relevant parsing section
CREATE TABLE #LOG_OUTPUT ( FILNAME VARCHAR(80), FILDATE DATETIME, LOGINFO VARCHAR(150) ) ---===<<< Some DDL so there's output to view >>>===--- INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"COLUMN1","COLUMN2","COLUMN3","COLUMNn"') INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_1","DATAC2_1","DATAC3_1","DATACn_1"') INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_2","DATAC2_2","DATAC3_2","DATACn_2"') INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_3","DATAC2_3","DATAC3_3","DATACn_3"') INSERT INTO #LOG_OUTPUT VALUES('PERFMON_LOGFILE1', '2004-09-22 10:00:00','"DATAC1_4","DATAC2_4","DATAC3_4","DATACn_4"') ---===<<< This is the code that parses the Log files >>>===--- DECLARE @PARSE_COUNT SMALLINT, @FIELD_IDX SMALLINT, @ROW_ID SMALLINT, @OLD_DATE DATETIME, @FILEDATE DATETIME, @LOGINFO VARCHAR(2000) DECLARE @PARSE_TBL TABLE ( DATESTAMP DATETIME, COLUMN_HEAD VARCHAR(255), DATA VARCHAR(255) ) DECLARE PARSE_PERFLOGS CURSOR FOR SELECT FILDATE, LOGINFO FROM #LOG_OUTPUT OPEN PARSE_PERFLOGS FETCH NEXT FROM PARSE_PERFLOGS INTO @FILEDATE, @LOGINFO SELECT @ROW_ID = 1, @OLD_DATE = @FILEDATE WHILE @@FETCH_STATUS = 0 BEGIN SELECT @PARSE_COUNT = DBO.FN_PATCOUNT (',"', @LOGINFO), @FIELD_IDX = 1 IF @ROW_ID = 1 WHILE @FIELD_IDX <= @PARSE_COUNT BEGIN INSERT INTO @PARSE_TBL SELECT @FILEDATE AS FILEDATE, dbo.FN_Repetitive_Str_Parse (@LOGINFO, ',"', @FIELD_IDX) AS COLUMN_HEAD, @FIELD_IDX AS DATA SET @FIELD_IDX = @FIELD_IDX + 1 END ELSE WHILE @FIELD_IDX <= @PARSE_COUNT BEGIN INSERT INTO @PARSE_TBL SELECT @FILEDATE AS FILEDATE, @FIELD_IDX AS FIELD, dbo.FN_Repetitive_Str_Parse (@LOGINFO, ',"', @FIELD_IDX) AS COLUMN_HEAD SET @FIELD_IDX = @FIELD_IDX + 1 END FETCH NEXT FROM PARSE_PERFLOGS INTO @FILEDATE, @LOGINFO IF @OLD_DATE = @FILEDATE SET @ROW_ID = @ROW_ID + 1 ELSE SET @ROW_ID = 1 SET @OLD_DATE = @FILEDATE END CLOSE PARSE_PERFLOGS DEALLOCATE PARSE_PERFLOGS DROP TABLE #LOG_OUTPUT SELECT * FROM @PARSE_TBL
Here's the source code for the custom Functions.
-------------------- dbo.FN_Repetitive_Str_Parse --------------------- IF EXISTS (SELECT NAME FROM SYSOBJECTS WHERE NAME = 'FN_Repetitive_Str_Parse') DROP FUNCTION FN_Repetitive_Str_Parse GO CREATE FUNCTION dbo.FN_Repetitive_Str_Parse /***************************************************************** ** Name : dbo.FN_Repetitive_Str_Parse ** ** Description : DECLARE @KEY_DATA VARCHAR(200), ** @VAL_DATA VARCHAR(200) ** SELECT @KEY_DATA = 'AB12345~PA~ORD0001', ** @VAL_DATA = '253~USER1' ** SELECT dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 1) AS PART, ** dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 2) AS LOC, ** dbo.FN_Repetitive_Str_Parse (@KEY_DATA, '~', 3) AS [ORDER], ** dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 1) AS SHIP_QTY, ** dbo.FN_Repetitive_Str_Parse (@VAL_DATA, '~', 2) AS NETID, ** GETDATE() AS DATE_STAMP ** ** ** ** ** ** ** Written By : Chris Cathers 8/17/04 ** ** Uses Tables: ** ** Parameters : ** ** Returns : ** ** Modifications: ** ** ** ** ** *****************************************************************/( @STR2PARSE VARCHAR(5000), @CHECK_CHAR VARCHAR(10), @DATA_ELEMENT SMALLINT ) RETURNS VARCHAR(100) AS BEGIN DECLARE @STR_VAL VARCHAR(100), @STR_INC SMALLINT, @STR_LEN SMALLINT, @FIELD_CNT TINYINT, @CHK_LEN TINYINT, @MAX_FIELD TINYINT, @FIELD_IDX SMALLINT, @VAL_INDEX SMALLINT, @TEST_CHAR VARCHAR(10), @TEST_STRING1 VARCHAR(100), @RETURN_STAT TINYINT ---===<<< Create temp table that contains Positional Data >>>===--- DECLARE @POS_DATA TABLE ( Element_type CHAR(4), Element_Count TINYINT, Element_Position SMALLINT ) DECLARE @VAL_LIST TABLE ( STR_VALUE VARCHAR(800), DATA_ELEMENT SMALLINT ) ---===<<< Var Initilalization >>>===--- SELECT @STR_LEN = LEN(@STR2PARSE), @STR_INC = 0, @FIELD_CNT = 0, @CHK_LEN = LEN(@CHECK_CHAR), @TEST_CHAR = '' ---===<<< Parse string >>>===--- -- **************************************** -- Generate a table recording the position in the string -- where the delimiter exists. -- **************************************** WHILE @STR_INC <= @STR_LEN BEGIN IF SUBSTRING(@STR2PARSE, @STR_INC - @CHK_LEN, @CHK_LEN) = @CHECK_CHAR BEGIN SET @FIELD_CNT = @FIELD_CNT + 1 INSERT INTO @POS_DATA VALUES ('LIST', @FIELD_CNT, @STR_INC) END SET @STR_INC = @STR_INC + 1 END -- **************************************** -- Test to see if the string ended with the delimiter. -- If it did not, Record the position of the last field. -- **************************************** IF (SELECT SUBSTRING(@STR2PARSE, MAX(ELEMENT_POSITION) + 1, 1) FROM @POS_DATA) <> '' BEGIN SET @FIELD_CNT = @FIELD_CNT + 1 INSERT INTO @POS_DATA VALUES ('LIST', @FIELD_CNT, LEN(@STR2PARSE) + 1) END ---===<<< Parse values and Columns >>>===--- SELECT @FIELD_CNT = MIN(ELEMENT_COUNT), @MAX_FIELD = MAX(ELEMENT_COUNT) FROM @POS_DATA WHILE @FIELD_CNT <= @MAX_FIELD BEGIN SELECT @FIELD_IDX = CASE WHEN @FIELD_CNT = 1 THEN 1 ELSE @STR_INC - @CHK_LEN + 1 END SELECT @STR_INC = ELEMENT_POSITION FROM @POS_DATA WHERE ELEMENT_TYPE = 'LiST' AND ELEMENT_COUNT = @FIELD_CNT IF @FIELD_CNT = @MAX_FIELD SET @TEST_STRING1 = LTRIM(SUBSTRING(@STR2PARSE, @FIELD_IDX, @STR_INC - @FIELD_IDX)) ELSE SET @TEST_STRING1 = LTRIM(SUBSTRING(@STR2PARSE, @FIELD_IDX, @STR_INC - @FIELD_IDX - @CHK_LEN)) INSERT INTO @VAL_LIST SELECT @TEST_STRING1, @FIELD_CNT SELECT @FIELD_CNT = @FIELD_CNT + 1 END -- ********************************************************** -- Fetch the value requested by user. -- If necessary clean out any occurrences of the delimiter -- ********************************************************** SELECT @STR_VAL = CASE WHEN CHARINDEX(@CHECK_CHAR, STR_VALUE, 2) > 0 THEN LEFT(STR_VALUE, LEN(STR_VALUE) - @CHK_LEN) ELSE STR_VALUE END FROM @VAL_LIST WHERE DATA_ELEMENT = @DATA_ELEMENT RETURN ISNULL(@STR_VAL, '') END GO GRANT EXECUTE ON [dbo].[FN_Repetitive_Str_Parse] TO [PUBLIC] GO ----------------------------------------------------------------------- -------------------------- FN_PATCOUNT ------------------------------- IF EXISTS (SELECT NAME FROM SYSOBJECTS WHERE NAME = 'FN_PATCOUNT' AND XTYPE = 'FN') DROP FUNCTION FN_PATCOUNT GO CREATE FUNCTION FN_PATCOUNT /***************************************************************** ** Name : FN_PATCOUNT ** ** Description : Will return a count of how many times the search ** pattern occurs in the submitted string. ** (A custom Metadata function) ** ** ** Written By : Chris Cathers 2004-09-03 ** ** Uses Tables: ** ** Parameters : ** ** Returns : ** ** Modifications: ** ** ** ** ** *****************************************************************/( @PATX VARCHAR(255), @STR VARCHAR(8000) ) RETURNS SMALLINT AS BEGIN DECLARE @PATCOUNT SMALLINT, @PATIDX SMALLINT, @PATLEN TINYINT SELECT @PATCOUNT = 1, @PATIDX = 1, @PATLEN = LEN(@PATX) WHILE @PATIDX <= LEN(@STR) BEGIN IF (SELECT SUBSTRING(@STR, @PATIDX - @PATLEN, @PATLEN)) = @PATX SET @PATCOUNT = @PATCOUNT + 1 SET @PATIDX = @PATIDX + 1 END RETURN @PATCOUNT END GO GRANT EXECUTE ON [dbo].[FN_PATCOUNT] TO [PUBLIC] GO -----------------------------------------------------------------------
Conclusions
With the creative use of the canned string functions that come with SQL, we can create some custom functions that can make everyone's job a lot easier.