January 4, 2012 at 8:43 am
So I am working on a project where we have a column of data that is of variable width strings and may be comma, space, or comma/space delimited. Sample data below:
CREATE TABLE #TEMP (delimitedString VARCHAR(255), phoneNumber VARCHAR(10))
INSERT INTO #TEMP
SELECT 'AAA,BBBB,CCC,DDD', '1234567890'
UNION ALL
SELECT 'EEEE, FFF','0987654321'
UNION ALL
SELECT 'GGG HHH III JJ KKK','5678901234'
I started with a bit of script that I wrote that accepted 1 parameter to split a string, @siteIds VARCHAR(255) (There may be better ones, but this suited the quick purpose I needed. Please suggest improvements if you have some):
DECLARE @siteIds VARCHAR(255)
DECLARE @strngLen TINYINT
DECLARE @split TABLE(siteId VARCHAR(5))
SET @siteIds = 'AAA,BBB,CCC'
SET @strngLen = CHARINDEX(',', @siteIds)
WHILE CHARINDEX(',', @siteIds) > 0
BEGIN
SET @strngLen = CHARINDEX(',', @siteIds);
INSERT INTO @split
SELECT SUBSTRING(@siteIds,1,@strngLen - 1);
SET @siteIds = SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds));
END
INSERT INTO @split
SELECT @siteIds
SELECT * FROM @split
Simple enough, right? So now I want to alter this a bit to handle spaces and/or commas. I thought it might work to do a PATINDEX instead of the CHARINDEX and use an expression of '%[ ,,, ]%', but I am not really good at understanding these types of expressions. Of course, if I do this I will get some spaces in my table:
DECLARE @siteIds VARCHAR(255)
DECLARE @strngLen TINYINT
DECLARE @split TABLE(siteId VARCHAR(5))
SET @siteIds = 'AAA, BBB, CCC'
SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds)
WHILE PATINDEX('%[ ,,, ]%', @siteIds) > 0
BEGIN
SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds);
INSERT INTO @split
SELECT SUBSTRING(@siteIds,1,@strngLen - 1);
SET @siteIds = SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds));
END
INSERT INTO @split
SELECT @siteIds
SELECT * FROM @split
Ultimately I will turn this into a table valued function and use it in a cross apply on a table with the sample data above. Here is what I have so far for the TVF:
CREATE FUNCTION tvf_parse_column
(
@siteIds VARCHAR(255),
@phone VARCHAR(255)
)
RETURNS
@parsed TABLE
(
siteId VARCHAR(10),
phoneNumber VARCHAR(255)
)
AS
BEGIN
DECLARE @strngLen TINYINT
SET @siteIds = LTRIM(RTRIM(@siteIds)) --trim off entry error
SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds)
WHILE PATINDEX('%[ ,,, ]%', @siteIds) > 0
BEGIN
SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds);
IF(SELECT SUBSTRING(@siteIds,1,@strngLen - 1)) <> ' '
INSERT INTO @parsed (siteId, phoneNumber)
SELECT SUBSTRING(@siteIds,1,@strngLen - 1), @phone;
SET @siteIds = SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds));
END
INSERT INTO @parsed (siteId, phoneNumber)
SELECT @siteIds, @phone;
RETURN
END
Any thoughts on improvements? Also, to remove the spaces in the function versus the select that will call on the function?
Jared
CE - Microsoft
January 4, 2012 at 10:07 am
Jeff Moden wrote an excellent article on splitters here[/url].
January 4, 2012 at 10:22 am
Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.
That being said I have changed the function to this:
CREATE FUNCTION tvf_parse_column
(
@siteIds VARCHAR(255),
@phone VARCHAR(255)
)
RETURNS
@parsed TABLE
(
siteId VARCHAR(10),
phoneNumber VARCHAR(255)
)
AS
BEGIN
DECLARE @strngLen TINYINT
SET @siteIds = LTRIM(RTRIM(@siteIds)) --trim off entry error
SET @strngLen = PATINDEX('%[\, , ,\,]%', @siteIds)
WHILE PATINDEX('%[\, , ,\,]%', @siteIds) > 0
BEGIN
SET @strngLen = PATINDEX('%[\, , ,\,]%', @siteIds);
INSERT INTO @parsed (siteId, phoneNumber)
SELECT SUBSTRING(@siteIds,1,@strngLen - 1), @phone;
SET @siteIds = LTRIM(SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds)));
END
INSERT INTO @parsed (siteId, phoneNumber)
SELECT @siteIds, @phone;
RETURN
END
Any better thoughts?
Jared
CE - Microsoft
January 4, 2012 at 11:22 am
Replace comma plus space with comma, then space with comma. Or something similar which works with your data (test it).
Then use Jeff's splitter.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 5, 2012 at 1:37 am
SQLKnowItAll (1/4/2012)
Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.
Any better thoughts?
Yep. . . if you need to do this in a database then use Jeff's splitter 😉
January 5, 2012 at 5:11 am
Cadavre (1/5/2012)
SQLKnowItAll (1/4/2012)
Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.
Any better thoughts?
Yep. . . if you need to do this in a database then use Jeff's splitter 😉
The first thing Jeff would say in this case is that since the conditions of my split are different than those referenced in his article (multiple delimiters in a random assortment, fixed element size, and a relatively small and fixed amount of data around 100 rows) is to test for myself and use what is most suited for the specific case.
That being said, the goal of my post was not to decide on a function, but to come up with a solution for the multiple delimiters in the same column. So let's assume for a moment that I AM using Jeff's new DelimitedSplit8K splitter. How would you attack my scenario of multiple delimiters?
Jared
CE - Microsoft
January 5, 2012 at 9:14 am
SQLKnowItAll (1/5/2012)
Cadavre (1/5/2012)
SQLKnowItAll (1/4/2012)
Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.
Any better thoughts?
Yep. . . if you need to do this in a database then use Jeff's splitter 😉
The first thing Jeff would say in this case is that since the conditions of my split are different than those referenced in his article (multiple delimiters in a random assortment, fixed element size, and a relatively small and fixed amount of data around 100 rows) is to test for myself and use what is most suited for the specific case.
That being said, the goal of my post was not to decide on a function, but to come up with a solution for the multiple delimiters in the same column. So let's assume for a moment that I AM using Jeff's new DelimitedSplit8K splitter. How would you attack my scenario of multiple delimiters?
ChrisM@home (1/4/2012)
Replace comma plus space with comma, then space with comma. Or something similar which works with your data (test it).Then use Jeff's splitter.
If this really really won't work, there's some mileage in using APPLY to pick up the delimiters one at a time. This is good for about 5 or so, after that the performance tails off.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply