How to compare data (join) based on two varchar columns please

  • -- My first Data

    create table #myfirst (id int, city varchar(20))

    insert into #myfirst values (500,'Newyork')

    insert into #myfirst values (100,'Ediosn')

    insert into #myfirst values (200,'Atlanta')

    insert into #myfirst values (300,'Greenwoods')

    insert into #myfirst values (400,'Hitchcok')

    insert into #myfirst values (700,'Walmart')

    insert into #myfirst values (800,'Madida')

    -- My Second Data

    create table #mySecond (id int, city varchar(20),Sector varchar(2))

    insert into #mySecond values (1500,'Newyork','MK')

    insert into #mySecond values (5500,'Ediosn','HH')

    insert into #mySecond values (5060,'The Atlanta','JK')

    insert into #mySecond values (7500,'The Greenwoods','DF')

    insert into #mySecond values (9500,'Metro','KK')

    insert into #mySecond values (3300,'Kilapr','MK')

    insert into #mySecond values (9500,'Metro','NH')

    --Third Second Data

    create table #myThird (id int, city varchar(20),Sector varchar(2))

    insert into #myThird values (33,'Walmart','PP')

    insert into #myThird values (20,'Ediosn','DD')

    select f.*,s.Sector from #myfirst f join #mySecond s on f.city = s.city

    /*

    idcitySector

    500NewyorkMK

    100EdiosnHH

    */

    i have doubt on two things

    1) How Can i compare the City names, by eliminating 'The ' at the begining (if there is any in second tale city) between first and second

    2) after comparing first and second if there is no match found in second them want to compare with third table values for those not found

    --i tried below to solve first doubt, it is working but want to know any otherwasys to do it please

    select f.*,s.Sector from #myfirst f join #mySecond s on replace (f.city, 'THE ','')= replace (s.city, 'THE ','')

    --Expected results wull be

    create table #ExpectResults (id int, city varchar(20),Sector varchar(2))

    insert into #ExpectResults values (200,'Atlanta','JK')

    insert into #ExpectResults values (100,'Ediosn','HH')

    insert into #ExpectResults values (300,'Greenwoods','DF')

    insert into #ExpectResults values (500,'Newyork','MK')

    insert into #ExpectResults values (700, 'Walmart','PP')

    insert into #ExpectResults values (800, 'Madidar','')

    IDcitySector

    200AtlantaJK

    100EdiosnHH

    300GreenwoodsDF

    500NewyorkMK

    700 WalmartPP

    800Madidar

  • That sounds like dirty data. The single best solution is to clean the data at it's source. The functions that you ran on the columns will prevent index use making your query extremely slow. However, if you can't clean the data, that's about all you can do. Another option is to create a calculated column that stores the data in a cleaned state.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply