Seperate digits

  • Hi There,

    I have a table, holding both streetnames and housenumbers/Housenumberintervals/letters in the same column,

    StreetnamesNumbersLetters

    Appelstreet, city 345 A

    James Hight 107-109

    Upper 4 - 8

    Sunstreet 2

    and I need to seperate streetnames, housenumbers and letters in three seperate columns, like this

    Streetnames Numbers Letters

    Appelstreet, city 345 A

    James Hight 107

    James Hight 109

    Upper 4

    Upper 6

    Upper 8

    Sunstreet 2

    I have no idea how to this, all help is appreciated.

    joejoe

  • Information may have been lost during the concatenation/storing this data in a single column. Probably it will not always be possible to get this information back. E.g 13-19 may need to be expanded to 13, 15 A, 15 B, 17, 19. Do you have any data to validate the results? Is your data consistent, i.e. does every entry satisfy the simple rule of {street name} {number|range} [{letter}] format. If it does you can first expand the ranges, and then split the data based on the location of the number (CHARINDEX, LEFT, RIGHT). If not, you will need to identify the rows that do not follow the above syntax, and and decide what to do with them. If there are too many exceptions, see if these can be parsed using a different syntax.

    It is a pain to check whether a particular column in a table satisfies a regular expression. You probably want to use an application written in a language like C# that has libraries supporting regular expression matching.

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • Is the database really the best place to do this?

    However providing the data is always in the format specified then this is how I would do it

    Create a temp table and fill it with your data

    DECLARE @temp TABLE (StreetnamesNumbersLetters varchar(100),Streetnames varchar(30),Numbers varchar(30),Letters varchar(30))

    insert into @temp

    select 'Appelstreet, city 345 A','','','' union

    select 'James Hight 107-109','','','' union

    select 'Upper 4 - 8','','','' union

    select 'Sunstreet 2','','',''

    Move any data with no number !!!!

    UPDATE @temp

    SET Streetnames = StreetnamesNumbersLetters,StreetnamesNumbersLetters = ''

    WHERE StreetnamesNumbersLetters <> ''

    AND PATINDEX('%[0-9]%',StreetnamesNumbersLetters) = 0

    Move text up to first digit

    UPDATE @temp

    SET Streetnames = LEFT(StreetnamesNumbersLetters,PATINDEX('%[0-9]%',StreetnamesNumbersLetters)-1),

    StreetnamesNumbersLetters = STUFF(StreetnamesNumbersLetters,1,PATINDEX('%[0-9]%',StreetnamesNumbersLetters)-1,'')

    WHERE StreetnamesNumbersLetters <> ''

    Move any data from letter onwards (if present)

    UPDATE @temp

    SET Letters = SUBSTRING(StreetnamesNumbersLetters,PATINDEX('%[a-z]%',StreetnamesNumbersLetters),255),

    StreetnamesNumbersLetters = STUFF(StreetnamesNumbersLetters,PATINDEX('%[a-z]%',StreetnamesNumbersLetters),255,'')

    WHERE StreetnamesNumbersLetters <> ''

    AND PATINDEX('%[a-z]%',StreetnamesNumbersLetters) > 0

    Move single numbers

    UPDATE @temp

    SET Numbers = StreetnamesNumbersLetters,StreetnamesNumbersLetters = ''

    WHERE StreetnamesNumbersLetters <> ''

    AND CHARINDEX('-',StreetnamesNumbersLetters) = 0

    Create additional rows for number range

    INSERT INTO @temp (StreetnamesNumbersLetters,Streetnames,Numbers,Letters)

    SELECT '',t.Streetnames,CAST(n.number as varchar),t.Letters

    FROM @temp t

    INNER JOIN dbo.Numbers n

    ON n.Number BETWEEN CAST(LTRIM(RTRIM(LEFT(t.StreetnamesNumbersLetters,CHARINDEX('-',t.StreetnamesNumbersLetters)-1))) as int)+1

    AND CAST(LTRIM(RTRIM(SUBSTRING(t.StreetnamesNumbersLetters,CHARINDEX('-',t.StreetnamesNumbersLetters)+1,255))) as int)

    AND n.Number % 2 = CAST(LTRIM(RTRIM(LEFT(t.StreetnamesNumbersLetters,CHARINDEX('-',t.StreetnamesNumbersLetters)-1))) as int) % 2

    WHERE t.StreetnamesNumbersLetters <> ''

    Move first number

    UPDATE @temp

    SET Numbers = LEFT(StreetnamesNumbersLetters,CHARINDEX('-',StreetnamesNumbersLetters)-1),StreetnamesNumbersLetters = ''

    WHERE StreetnamesNumbersLetters <> ''

    Far away is close at hand in the images of elsewhere.
    Anon.

  • If you want something slick to split addresses and validate them at the same time, look into a product called "ZP4".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    I looked into the ZP4, and truely looks very handy. But I don't have any adresses in USA, only in Europe.

    But thanks for the tip πŸ™‚

  • David,

    Thanks for posting the scripts that I needed. I understand most of it when I see it, but I doubt I ever have the skills to write anything like it!

    I have now,

    - moved any data with no number

    - moved text up to first digit

    - moved any data from letter onwards (if present)

    - moved single numbers

    But the "Create additional rows for number range" part fails with the following message:

    Msg 208, Level 16, State 1, Line 39

    Invalid object name 'msp.dbo.Numbers'.

    I have tried to rename the Number column with several combinations (n.Numbers & Number & n.number) with no luck. Any suggestions?

    Best joejoe

  • ...but I doubt I ever have the skills to write anything like it!

    Sure you will, we all started as beginners πŸ˜‰

    Now we try to elevate ourselves to Jeffs' level πŸ˜€

    Invalid object name 'msp.dbo.Numbers'.

    Sorry my bad, used one of my tables in testing and forgot to remove db name, I have removed the reference from the post

    However the Numbers table is just that, a table of numbers, this site has many examples

    Far away is close at hand in the images of elsewhere.
    Anon.

  • joe joe (10/17/2007)


    Hi Jeff,

    I looked into the ZP4, and truely looks very handy. But I don't have any adresses in USA, only in Europe.

    But thanks for the tip πŸ™‚

    Ack... my bad... didn't realize the continent never mind the country πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Now we try to elevate ourselves to Jeffs' level

    Heh... nah... it's not worth all the nosebleeds due to lack of Oxygen πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm going back to Andras' post.

    If you want to spilt the adderss 'Upper 4 - 8' you really do need to know the actual addresses used in 'Upper'. It may be that originally Upper had 3 buildings named 'Upper 4', 'Upper 6' and 'Upper 8', but these were replaced over time with a single building with one street door called 'Upper 4 - 8'. Equally, Upper may have had consecutive numbering ('Upper 4', 'Upper 5', etc.) - this is not uncommon in Europe.

    Whatever was done to 'Upper' in the past, if you send the same mail to 'Upper 4', 'Upper 6', 'Upper 8' you will look less than professional.

    I would advise against splitting addresses such as 'Upper 4 - 8' unless you know that such a split makes sense with what is on the ground today.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop HΓ©lder CΓ’mara

  • Unless you are going to use some sort of address validation software to verify your change you will most likely completely scramble the data more than humpty dumpty. I would discourage against it. Even getting people to enter data in a valid manor is difficult how will you maintain it? There are just too many possibilities.

    As good as it sounds to break it up like that, any attempt with out some sort of method to validate your choice is going to possibly lose customer data.

  • A quick search found these guys http://www.grcdi.nl/dmtools.htm never used them so I have no recommendation to give, but they have a 30 day trial πŸ™‚ How long will it take for you to re-format your data? However keeping it valid is the tougher part.

  • Hi Guys,

    Thanks for all the advices. No panic, I'm not a dba for a sql server in production, and I'm not using the addresses for contacting purposes. With the addresses in the split format I will be able to, validate and geocode them for visualisation.

    But thanks for your concern πŸ™‚

  • You bet... thanks for the feedback!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • In the U.S. I would use a service like Melissa Data that will accept a street address in pretty much any format and return the address broken down into it's USPS (postal service) components (e.g. house number, street, zip/postal code, etc.) I'm sure that there are similar services for the rest of the world (at least I would hope so) search for "data quality" and the country...

    I would not spend a great deal of time trying to parse the addresses myself, too many variables/exceptions, you'll go crazy and probably end up with garbage.

    Joe

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply