Split Function - Comma and/or Space Delimited

  • So I am working on a project where we have a column of data that is of variable width strings and may be comma, space, or comma/space delimited. Sample data below:

    CREATE TABLE #TEMP (delimitedString VARCHAR(255), phoneNumber VARCHAR(10))

    INSERT INTO #TEMP

    SELECT 'AAA,BBBB,CCC,DDD', '1234567890'

    UNION ALL

    SELECT 'EEEE, FFF','0987654321'

    UNION ALL

    SELECT 'GGG HHH III JJ KKK','5678901234'

    I started with a bit of script that I wrote that accepted 1 parameter to split a string, @siteIds VARCHAR(255) (There may be better ones, but this suited the quick purpose I needed. Please suggest improvements if you have some):

    DECLARE @siteIds VARCHAR(255)

    DECLARE @strngLen TINYINT

    DECLARE @split TABLE(siteId VARCHAR(5))

    SET @siteIds = 'AAA,BBB,CCC'

    SET @strngLen = CHARINDEX(',', @siteIds)

    WHILE CHARINDEX(',', @siteIds) > 0

    BEGIN

    SET @strngLen = CHARINDEX(',', @siteIds);

    INSERT INTO @split

    SELECT SUBSTRING(@siteIds,1,@strngLen - 1);

    SET @siteIds = SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds));

    END

    INSERT INTO @split

    SELECT @siteIds

    SELECT * FROM @split

    Simple enough, right? So now I want to alter this a bit to handle spaces and/or commas. I thought it might work to do a PATINDEX instead of the CHARINDEX and use an expression of '%[ ,,, ]%', but I am not really good at understanding these types of expressions. Of course, if I do this I will get some spaces in my table:

    DECLARE @siteIds VARCHAR(255)

    DECLARE @strngLen TINYINT

    DECLARE @split TABLE(siteId VARCHAR(5))

    SET @siteIds = 'AAA, BBB, CCC'

    SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds)

    WHILE PATINDEX('%[ ,,, ]%', @siteIds) > 0

    BEGIN

    SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds);

    INSERT INTO @split

    SELECT SUBSTRING(@siteIds,1,@strngLen - 1);

    SET @siteIds = SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds));

    END

    INSERT INTO @split

    SELECT @siteIds

    SELECT * FROM @split

    Ultimately I will turn this into a table valued function and use it in a cross apply on a table with the sample data above. Here is what I have so far for the TVF:

    CREATE FUNCTION tvf_parse_column

    (

    @siteIds VARCHAR(255),

    @phone VARCHAR(255)

    )

    RETURNS

    @parsed TABLE

    (

    siteId VARCHAR(10),

    phoneNumber VARCHAR(255)

    )

    AS

    BEGIN

    DECLARE @strngLen TINYINT

    SET @siteIds = LTRIM(RTRIM(@siteIds)) --trim off entry error

    SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds)

    WHILE PATINDEX('%[ ,,, ]%', @siteIds) > 0

    BEGIN

    SET @strngLen = PATINDEX('%[ ,,, ]%', @siteIds);

    IF(SELECT SUBSTRING(@siteIds,1,@strngLen - 1)) <> ' '

    INSERT INTO @parsed (siteId, phoneNumber)

    SELECT SUBSTRING(@siteIds,1,@strngLen - 1), @phone;

    SET @siteIds = SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds));

    END

    INSERT INTO @parsed (siteId, phoneNumber)

    SELECT @siteIds, @phone;

    RETURN

    END

    Any thoughts on improvements? Also, to remove the spaces in the function versus the select that will call on the function?

    Jared
    CE - Microsoft

  • Jeff Moden wrote an excellent article on splitters here[/url].


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (1/4/2012)


    Jeff Moden wrote an excellent article on splitters here[/url].

    Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.

    That being said I have changed the function to this:

    CREATE FUNCTION tvf_parse_column

    (

    @siteIds VARCHAR(255),

    @phone VARCHAR(255)

    )

    RETURNS

    @parsed TABLE

    (

    siteId VARCHAR(10),

    phoneNumber VARCHAR(255)

    )

    AS

    BEGIN

    DECLARE @strngLen TINYINT

    SET @siteIds = LTRIM(RTRIM(@siteIds)) --trim off entry error

    SET @strngLen = PATINDEX('%[\, , ,\,]%', @siteIds)

    WHILE PATINDEX('%[\, , ,\,]%', @siteIds) > 0

    BEGIN

    SET @strngLen = PATINDEX('%[\, , ,\,]%', @siteIds);

    INSERT INTO @parsed (siteId, phoneNumber)

    SELECT SUBSTRING(@siteIds,1,@strngLen - 1), @phone;

    SET @siteIds = LTRIM(SUBSTRING(@siteIds, @strngLen + 1, LEN(@siteIds)));

    END

    INSERT INTO @parsed (siteId, phoneNumber)

    SELECT @siteIds, @phone;

    RETURN

    END

    Any better thoughts?

    Jared
    CE - Microsoft

  • Replace comma plus space with comma, then space with comma. Or something similar which works with your data (test it).

    Then use Jeff's splitter.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • SQLKnowItAll (1/4/2012)


    Cadavre (1/4/2012)


    Jeff Moden wrote an excellent article on splitters here[/url].

    Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.

    Any better thoughts?

    Yep. . . if you need to do this in a database then use Jeff's splitter 😉


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre (1/5/2012)


    SQLKnowItAll (1/4/2012)


    Cadavre (1/4/2012)


    Jeff Moden wrote an excellent article on splitters here[/url].

    Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.

    Any better thoughts?

    Yep. . . if you need to do this in a database then use Jeff's splitter 😉

    The first thing Jeff would say in this case is that since the conditions of my split are different than those referenced in his article (multiple delimiters in a random assortment, fixed element size, and a relatively small and fixed amount of data around 100 rows) is to test for myself and use what is most suited for the specific case.

    That being said, the goal of my post was not to decide on a function, but to come up with a solution for the multiple delimiters in the same column. So let's assume for a moment that I AM using Jeff's new DelimitedSplit8K splitter. How would you attack my scenario of multiple delimiters?

    Jared
    CE - Microsoft

  • SQLKnowItAll (1/5/2012)


    Cadavre (1/5/2012)


    SQLKnowItAll (1/4/2012)


    Cadavre (1/4/2012)


    Jeff Moden wrote an excellent article on splitters here[/url].

    Thanks for that, but what I am really trying to get at in this post is not an actual delimited string, but maybe partially. I say that because this is a column in an excel sheet that may or may not be separated by commas or spaces. There are no requirements for format as the customer did not request this, we are doing it to make life easier on ourselves. Therefore, we are not "requiring" a strict format to this column. Ultimately they are just elements separated by a space, a comma, or both. If on the unlikely chance that something else is used, we will manually fix it. So, my nibbler splitter is fine for this project as the database will be 3 tables only (staging table, application table, and reference table) with relatively small amounts of data and small strings.

    Any better thoughts?

    Yep. . . if you need to do this in a database then use Jeff's splitter 😉

    The first thing Jeff would say in this case is that since the conditions of my split are different than those referenced in his article (multiple delimiters in a random assortment, fixed element size, and a relatively small and fixed amount of data around 100 rows) is to test for myself and use what is most suited for the specific case.

    That being said, the goal of my post was not to decide on a function, but to come up with a solution for the multiple delimiters in the same column. So let's assume for a moment that I AM using Jeff's new DelimitedSplit8K splitter. How would you attack my scenario of multiple delimiters?

    ChrisM@home (1/4/2012)


    Replace comma plus space with comma, then space with comma. Or something similar which works with your data (test it).

    Then use Jeff's splitter.

    If this really really won't work, there's some mileage in using APPLY to pick up the delimiters one at a time. This is good for about 5 or so, after that the performance tails off.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply