Split String into Table or Record Set

  • Morning Guys,

    I have 3000 rows of address data in a single field called Address.
    The actual components of the address are split by LF [CHAR(10)].
    Because it is UK data the format isn't strict, so there can be more than a constant number of LF's present.

    How would I go about turning this into a table, where each LF marks a new column.

    I am not clear on how to handle the unknown number of columns.
    I don't need each address part to be in the right field, so there is no concept of Street Name or County Name just AddressFragment1 to N

    Cheers
    Alex

  • alex.sqldba - Thursday, August 24, 2017 5:50 AM

    Morning Guys,

    I have 3000 rows of address data in a single field called Address.
    The actual components of the address are split by LF [CHAR(10)].
    Because it is UK data the format isn't strict, so there can be more than a constant number of LF's present.

    How would I go about turning this into a table, where each LF marks a new column.

    I am not clear on how to handle the unknown number of columns.
    I don't need each address part to be in the right field, so there is no concept of Street Name or County Name just AddressFragment1 to N

    Cheers
    Alex

    Can you provide some sample DDL, some sample data (in the form of INSERT statements) and desired results (also with DDL)? Just four or five rows of sample data should be enough.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • [This site does not play well with Edge!]


    create table SourceAddress
    (
        [AddressData] varchar(max) not null
    )

    insert SourceAddress(AddressData)
    values('51 Anonymous Street//London//NA1 4PQ'),
    ('The Ugly Building//66 Benjamin Street//London//SW1E 7YN'),
    ('Top Floor//The Short Squat Building//401 Limehouse Street//London//SW7 9XY'),
    ('Overpriced Office, Inc//300 Victoria Street//WC1A 4NN'),
    ('57 Albany House//Charlotte Street//Chelsea//London//SW3 6CB')

    The problem for me is the unknown and variable number of breaks or '//' that can appear.

  • Assuming this is SQL 2016, you can use the STRING_SPLIT function, but understand that while you get a table, you would get one row per address fragment.   The problem would be how to separate the records for one address from the records for another address.   That's why we need some DDL for the existing table, so that we can see if there's a way to order the original records so that some original key to the data can "come along for the ride", so to speak, and at least serve as a marker for all the pieces of any one given address.   As I suspect that the order of appearance in the original string is going to matter, I'd actually recommend you use Jeff Moden's string splitter function instead of STRING_SPLIT, as it can operate on SQL Server versions going back to SQL 2008.   It's called DelimitedSplit8K, and you can find it by searching the "Articles" on this site for the search term "Tally Oh".   Read the article - it's a great read and covers all the technical details.   It provides an ordered list in table form, similar to the output of SQL 2016's STRING_SPLIT function, but you get the Item Number as a field in the table, and it performs so well that only CLR can do better.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Normally, at least for a UK address, you split it into 5 parts. Address lines 1-4 and the postcode.  If you have anymore, then you'll need to combine some of these lines togheter (delimtied with a comma (,)).

    This is a bit of a guess without DLL and DLM, and could very likely be improved upon, but:
    CREATE TABLE #Address (CompanyName varchar(100),FullAddress varchar(500));
    INSERT INTO #Address
    VALUES ('Sample Company Ltd','Sample Company Ltd' + CHAR(10) + 'Big Tower' + CHAR(10) + 'South Park' + CHAR(10) + 'Big City' + CHAR(10) + 'Lancashire' + CHAR(10) + 'AB12 2BA'),
           ('Test Co, PLC','Test Co, PLC' + CHAR(10) + 'Small Town' + CHAR(10) + 'Lancashire' + CHAR(10) + 'AB17 9QD');

    SELECT *
    FROM #Address;
    GO

    WITH Split AS (
      SELECT *,
        MAX(Itemnumber) OVER (PARTITION BY A.CompanyName) AS PcodePos
      FROM #Address A
       CROSS APPLY dbo.DelimitedSplit8K (A.FullAddress, CHAR(10)) DS),
    rCTE AS (
      SELECT S.CompanyName,
        MAX(CASE WHEN S.ItemNumber = 1 AND S.ItemNumber < S.PcodePos THEN Item ELSE NULL END) AS AddressLine1,
        MAX(CASE WHEN S.ItemNumber = 2 AND S.ItemNumber < S.PcodePos THEN Item ELSE NULL END) AS AddressLine2,
        MAX(CASE WHEN S.ItemNumber = 3 AND S.ItemNumber < S.PcodePos THEN Item ELSE NULL END) AS AddressLine3,
        MAX(CASE WHEN S.ItemNumber = 4 AND S.ItemNumber < S.PcodePos THEN Item ELSE NULL END) AS AddressLine4,
        MAX(CASE WHEN S.ItemNumber = S.PcodePos THEN item ELSE NULL END) AS Postcode,
        S.PcodePos,
        4 AS ItemNumber
      FROM Split S
      GROUP By S.CompanyName, S.PcodePos
      UNION ALL
      SELECT rCTE.CompanyName,
        rCTE.AddressLine1, rCTE.AddressLine2, rCTE.AddressLine3,
        rCTE.AddressLine4 +', ' + S.Item AS AdddressLine4,
        rCTE.Postcode,
        rCTE.PcodePos,
        rCTE.ItemNumber + 1 AS ItemNumber
      FROM rCTE
       JOIN Split S ON rCTE.CompanyName = S.CompanyName AND rCTE.ItemNumber + 1 = S.ItemNumber AND S.ItemNumber < rCTE.PcodePos)
    SELECT *
    FROM rCTE
    WHERE ItemNumber = (SELECT MAX(ItemNumber)
           FROM rCTE sq
           WHERE sq.CompanyName = rCTE.CompanyName);
    GO

    DROP TABLE #Address;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Forgot to mention, the DelimitedSplit8K function can be found here: http://www.sqlservercentral.com/articles/Tally+Table/72993/

    (For some reason, editing my above post means i can't use SSC for about 2-3 minutes and resultes in an error.)

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I should add that I can isolate Post Code separately with some string manipulation.

    It's just the remaining number of field that are unknown. Could be 3 fields, could be 4 that I am not sure how to handle. But I shall have a play with the code in the replies above. Thanks All. Much appreciated.

  • alex.sqldba - Thursday, August 24, 2017 6:34 AM

    I should add that I can isolate Post Code separately with some string manipulation.

    It's just the remaining number of field that are unknown. Could be 3 fields, could be 4 that I am not sure how to handle. But I shall have a play with the code in the replies above. Thanks All. Much appreciated.

    I've made a slight amendment to my code, adding an extra sample line, and to handle that (missed it out initially). It'll return a NULL if there are less than 4 Address Lines in any subsequent columns.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply