Find string in another string during select

  • Hi Everyone,

    This probably has a simple answer that I just can't see now- I have 2 tables with course numbers, table 1 in the format "1234", table 2 in the format "1234 5678" (or more).

    What I am trying to do via select is identify the data in table 1 that cannot be found in the values in table 2.

    I've tried several simpler queries with no success. I am thinking the only way this might be possible is via nested cursors to compare each table 1 value to each table 2 value via charindex, and if charindex equals 0 when fetch status equals zero the table 1 value was not found in table 2 so return the table 1 value.

    If anyone knows a quicker and easier way to accomplish this I would be very impressed as well as appreciative.

    Thanks!

  • Check EXCEPT operator, like it was designed to solve your conundrum.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • You definitely do not need to use a one-at-a-time solution with a cursor or WHILE loop. Besides using EXCEPT to find the members of one resultset not found in another, you also can find rows that don't match specified criteria (in this case, whether CharIndex returns >0) by putting the match condition in the ON clause of a LEFT OUTER JOIN that relates the two tables and then using the WHERE clause to test for NULL in one of the fields of the table to be searched.

    Select TableA.ColumnA

    from TableA a

    left outer join TableB b on -- condition here - your charindex comparison, for instance

    where b.anycolumn is null

    In general, procedural solutions are less efficient in SQL Server than specifying a set-based result definition and letting the optimizer work out the best way to find the answer.

  • This is a fast and dirty way of doing what you are asking for. However, this query won't win any speed contensts, so if you have a lot of records, you may look at something else.

    --table1 is the table with the search for values (eg '1234')

    --table2 is the table with the values being searched (eg '1234 5678')

    select * from table1 a

    inner join table2 b on b.Col1 like '%'+a.Col1+'%'

  • Carleton (9/19/2009)


    This is a fast and dirty way of doing what you are asking for. However, this query won't win any speed contensts, so if you have a lot of records, you may look at something else.

    --table1 is the table with the search for values (eg '1234')

    --table2 is the table with the values being searched (eg '1234 5678')

    select * from table1 a

    inner join table2 b on b.Col1 like '%'+a.Col1+'%'

    Yes, the "LIKE" operator is another tool that would help, and may be easier to understand than the charindex approach. As is, this code snippet doesn't identify the table1 entries NOT found in table2 (the original problem), but perhaps one could compare table1 data with the output of this SELECT with an EXCEPT operation to get that final resultset.

    And, I don't really see how LIKE would work here anyway. Isn't it now looking for literal percent signs (inside quotes)? As they're quoted, they're no longer wildcards.

    As for winning speed contests, none of the approaches suggested would be able to use an index to find matches in table2, so you'll always have a full scan in the execution plan. If this is for a real-world application, you may want to explore normalizing that data -- putting each value that's in the "already there" table into its own row. That way an index could be established and queries would be much faster.

  • My appologies, I overlooked that you wanted records that were NOT a match. In this case adding the word NOT to my original script suggestion should work:

    select * from #table1 a

    inner join #table2 b on b.Col1 NOT like '%'+a.Col1+'%'

    John, I must respectfully disagree on the wildcard comment you mentioned:

    And, I don't really see how LIKE would work here anyway. Isn't it now looking for literal percent signs (inside quotes)? As they're quoted, they're no longer wildcards.

    If you want to search on the literal characters % or _ you must enclose them in brackets ([%] and [_]). If you run this query, you will find that the % characters are treated as wildcards even tho they are in the single quotes:

    create table #Table1 (Col1 varchar(max))

    Create table #Table2 (Col1 varchar(max))

    insert into #table1 values ('1234')

    insert into #table1 values ('1')

    insert into #table1 values ('2')

    insert into #table1 values ('3')

    insert into #table2 values ('1234 5678')

    insert into #table2 values ('5678 1234')

    insert into #table2 values ('1234')

    insert into #table2 values ('0')

    select * from #table1 a

    inner join #table2 b on b.Col1 NOT like '%'+a.Col1+'%'

    Cheers,

    Carleton

  • Carleton,

    Thank you for that correction. You're absolutely right about the wildcards. I'm here to learn and appreciate the chance for dialog.

    Now, however, the NOT LIKE still doesn't do the trick here. When I run the code you posted, each of the four rows in table1 finds a row in table2 that meets the condition of not being "like" the table1 value, so all four are in the resultset, even though the intention is not to include '1234', for instance. One could still use the LIKE operator to define a filter result set to be compared via EXCEPT.create table #Table1 (Col1 varchar(max))

    Create table #Table2 (Col1 varchar(max))

    insert into #table1 values ('1234')

    insert into #table1 values ('2345')

    insert into #table1 values ('2')

    insert into #table1 values ('3')

    insert into #table2 values ('1234 5678')

    insert into #table2 values ('5678 1234')

    insert into #table2 values ('1234')

    insert into #table2 values ('0')

    Select * from #table1

    EXCEPT

    select a.* from #table1 a

    inner join #table2 b on b.Col1 like '%'+a.Col1+'%'

    An additional gotcha turns up here: table1 values of "1", "2", and "3" are found in table2 even though the intention appears to be to match only on full four-digit numbers. Again, I'd encourage the OP to get that data normalized if he's dealing with a real-world database.

  • Well there you have it then, a true example of teaming! So John, you and I can split the revenue for this solution.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply