Data Cleansing

  • Hi,

    I have got a address line as:

    Unit3

    Unit 4

    Shop2

    Shop 5

    Villa 7

    Villa11

    I want the output to be like:

    Unittype Unit Number

    -------- ---------------------

    Unit 3

    Unit 4

    Shop 2

    Shop 5

    Villa 7

    Villa 11

    How can this be done? Any help will be appreciated

  • declare @t table(Unit nvarchar(100))

    insert into @t(Unit) values

    ('Unit3'),

    ('Unit 4'),

    ('Shop2'),

    ('Shop 5'),

    ('Villa 7'),

    ('Villa11')

    select

    t.unit

    , left(t.unit, patindex('%[0-9]%', t.unit)-1)

    , substring(t.unit, patindex('%[0-9]%', t.unit), 100)

    from

    @t t

    ?

    I Have Nine Lives You Have One Only
    THINK!

  • How will this work when addresses are like:

    Dummy Management Unit PO BOX 999 Dept of Energy, Training and The Arts CITY EAST DummyState 9999,

    Unit 1/100 Dummy Street, DummySuburb, DummyState

    For first address the Unit and UnitType both should be NULL and for the 2nd address it should be Unit 1

  • rka (10/31/2011)


    How will this work when addresses are like:

    Dummy Management Unit PO BOX 999 Dept of Energy, Training and The Arts CITY EAST DummyState 9999,

    Unit 1/100 Dummy Street, DummySuburb, DummyState

    Horribly. Your text is non-standardized, which is a whole different issue then a standardized set like described originally.

    For first address the Unit and UnitType both should be NULL and for the 2nd address it should be Unit 1

    The first and second don't even have the work Unit IN it. The third is a different ballgame.

    You're going to have to multi-pass this data, probably once for each set of conditions, to break it out intelligently. Then you'll have to manually inspect for outlyers or bad settings. Finally you'll have to repeat the process after you rinse it once or twice.

    There is no easy answer. Depending on just how many rules you have and how bad this data is, this is somewhere between a few days to a few weeks of work.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This may be a good point to start looking for a good third-party address cleansing tool. This particular rabbit hole can be very very deep... :crazy:

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply