String splitting using like and data cleansing

  • I have a large poorly designed table (inherited) With a Name field that contains comma delimited text containing address information. I need to do several things with it but unfortunately there doesn't appear to be any true consistency in it. When it displays in its own text box it works by placing each section on a new Line and looks ok.

    But I need to pull it apart and place things like unit number, Building Name in its own column etc. In the data it could be in either the 2nd,3rd, 4th, dependent on what came 1st.

    the data looks some thing like the following

    unitNumber/StreetNumber Space StreetName (Building Name), Subub,City,Country

    Some addresses won't have unit number or Suburb or country so when splitting you could have Suburbs and Citys in multiple columns even if you try and stagger the split process.

    Has any body go a good tool or reference site for dealing for this sort of problem. Code snippets?

    I have a table that I have made up that has some of the street names that could be used for comparing against existing records but it is by no means fool proof due to spelling inconsistencies . I also have another list of Common building names that could be used to compare, remove and place in the new building column.

    All the examples I can find seem simplistic.

    Thanks for recommendations in advance

    Gerard

  • To do this properly, I'd be looking at paying for specialist address cleansing software (QAS or PostcodeAnywhere are the two I've used in the UK), which have built up quite a lot of decent fuzzy matching rules etc. to return a clean address from many different input formats.

    If there's not enough business value to justify that, you could look at playing around with some string splitters and trying to get the majority of these wrangled into a consistent format.

    Sounds like the majority have CRLF's to separate them? These are stripped out when viewing in grid mode in SSMS. If you see them when you return results to text, then they're there in the data and can be used to split. If you want to do this with T-SQL, Jeff's written a fast Table-Valued function[/url] to do this, although writing a CLR function would allow more flexibility/easier handling of address data.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply