Using like in a set-based manner

  • Field1Field2

    12345 12345c

    The suffix of "c" (or others) is what's causing me the issue. Field1 and Field2 are not in the same table. Field1 is the base/parent/join defining field - Table.Field2 = Table.Field1.

    The trouble is these values reside in tables that both contain millions of rows and have no covering indexes. The join I'm using as I work this, which gives correct data but totally unacceptable performance is within the WHERE clause:

    (OK, this is Solomon, please don't ask if creating indexes are possible, it's not)

    ltrim(rtrim(APDOC.InvcNbr)) like ltrim(rtrim(ARDOC.RefNbr)) + '%'

    In a similar manner within a join:

    INNER JOIN APDOC ON substring(ltrim(rtrim(APDOC.InvcNbr)), 1, len(ltrim(rtrim(ARDOC.RefNbr)))) = ltrim(rtrim(ARDOC.RefNbr))

    Performance is NOT good with this any way I approach it.

    Thoughts or ideas really appreciated!

    Thank you.

  • Few questions:

    1) Is suffix always 1 character? If not, any chance of more representive sample data?

    2) Is it possible to add a calculated column to table APDOC?

    3) And while I'm at it, why no indexes?

    Cheers

  • ok, i know that the moment you use a function on a JOIN or WHERE condition, you lose the ability to use indexes, and it requires a table scan.

    Your #1 objective should be to remove all functions from the join condition so you can take advantage of indexes.

    WHY do you do this ltrim(rtrim(APDOC.InvcNbr)) <--to this column? is it a char with spaces at the end of it? is there spaces at the beginning of it?(same question goes for the column with the substring...why trim it? out of habit, or is there really bad data due to white space?

    if there are, can you change the datatype to varchar? Can you update to remove the spaces?

    the previous poster noted, that a persisted calculated column could solve your problem so you can do a clean join, but you'd have to add an index.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I'll echo what Lowell said. The Like is not the problem it is the Functions that cause the problem. If you have no white space then get rid of the trim or, run a one-time update to remove the white space.

    When Allister asked

    Is suffix always 1 character?

    I think he was asking because you can use the "_" as a 1 character wild card which can help the optimizer.

  • Excellent questions from everyone, and I should've addressed them in the original post.

    Suffix is not always 1 character, but can be multiple values from 1 to 3 characters in length.

    Modifying the schema isn't possible, it's a third party app (Solomon/Dynamics) that we're attempting to create an external application for. The suffix values were added to a legacy process 8-10 years ago and changing them now isn't possible.

    Indexes are completely up to the DBA and beyond my call, but I can recommend them.

    The field is char(15), so there are trailing space issues, without the trim calls can't get to a real comparison. Changing to varChar can't be done wihout blowing up the application.

  • Rick Kurtz (7/29/2009)


    Excellent questions from everyone, and I should've addressed them in the original post.

    Suffix is not always 1 character, but can be multiple values from 1 to 3 characters in length.

    Any chance of a more representative sample?

    Modifying the schema isn't possible, it's a third party app (Solomon/Dynamics) that we're attempting to create an external application for. The suffix values were added to a legacy process 8-10 years ago and changing them now isn't possible.

    OK, modifying, meaning, for example, changing existing columns, is likely to cause a lot of problems, but adding columns shouldn't lead to too many problems, so is it just politically difficult to add columns/indexes?

    Indexes are completely up to the DBA and beyond my call, but I can recommend them.

    You're going to have to post some more details here to enable folks to recommend suitable indexes, check out recomendations at following url on what info is needed http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    The field is char(15), so there are trailing space issues, without the trim calls can't get to a real comparison. Changing to varChar can't be done wihout blowing up the application.

    Which field is char(15), one or both?

  • If modifying the schema is not an option, can you create a cope of this table or may be even replicate the table. If that is possible, then you can add computed column that will store the numeric part of the expression.

    Here is something that may work for the computed column. Obviously, replace the variable @STR with the column name for computed column specification.

    DECLARE @STR VARCHAR(10)

    SELECT @STR = '12345cxyz'

    SELECT SUBSTRING(@str,1,PATINDEX('%[a-z]%',@str)-1)

    Hope that helps.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply