Any ways to unify messy input?

  • Ahoi,

    i have the situation that im dealing with data that is basically user input for each row. A user wants to analyse this data with a cube.

    But the data dimension becomes quiet messy because of all the possible variations that have the same meaning and should be grouped.

    Here is a subset of the inputdata:

    create table #Base(
    Company_Name1 nvarchar(255)
    ,Street nvarchar(255)
    ,Adress nvarchar(255)
    )
    insert #Base
    values


    ('Company KG','Kekstr. 51' ,'Metropolis')
    ,('Company KG','Kekstr. 53' ,'Metropolis')
    ,('COMPANY KG','Kekstreet 53' ,'METROPOLIS' )
    ,('COMPANY KG','Warnerer Str. 50-54' ,'METROPOLIS')
    ,('COMPANY KG','Warnererstr 50-54' ,'METROPOLIS')
    ,('COMPANY KG','Warnererstr. 50 - 54' ,'Metropolis')
    ,('COMPANY KG','Warnererstr. 50-54' ,'Metropolis' )
    ,('COMPANY KG','Warnererstreet 50 - 54' ,'METROPOLIS' )
    ,('COMPANY KG','Warnererstreet 50-54' ,'Metropolis')

    Here is some kind of result im looking for.

    Note: These are not the ONE solution, these possible variations, i am just looking to create 3 distinct rows out of the given ones.

    select *
    from #Base
    where Street in ('Kekstr. 51','Kekstr. 53','Warnerer Str. 50-54')


    select *
    from #Base
    where Street in ('Kekstr. 51','Kekstreet 53','Warnererstr. 50 - 54')

    Anyone ideas?

  • I think Google has some type of API that does address correcting.

    I'd look for that or some other known app to do this.  What you're trying to do is way too complex to do on your own.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher wrote:

    I think Google has some type of API that does address correcting.

    I'd look for that or some other known app to do this.  What you're trying to do is way too complex to do on your own.

    Thanks, was kinda expecting that to be the case.

    Guess i will try to aggregate the data by creating hierarchies in the Dimension to group the Input by different Logics:

    • First N String positions are the same
    • Same Numbers including
    • .... whatever else i can think of

     

  • This was removed by the editor as SPAM

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply