Using Left Function in a JOIN

  • We have a field with a compound value which is used to join to another table in the following way:

    select * from TABLE1 T1, TABLE2, T2

    WHERE

    LEFT(T1.FIELD1, 3) = T2.FIElD1

    The values of T1.FIELD look like this:

    001ABC123456, 002ABC123456, etc.

    There are approx 130000 rows in TABLE1, with 12 different values for LEFT(T1.FIELD1, 3), with an equal distribution.

    I have created a new table for TABLE1 with the value for LEFT(FIELD1, 3) populated in a separate field. I ran the new query:

    select * from TABLE1 T1, TABLE2, T2

    WHERE

    T1.NEWFIELD = T2.FIELD1

    and found no noticeable difference in speed.

    QUESTION: Is using a function such as LEFT or SUBSTRING inherently more costly than joining on a separate field? Is this a practice to avoid? Or does it not make any difference?

    Thanks

    Jay

  • Im sure there is a cost associated with it - how much probably depends on which operation you're doing. I'd guess left is likely to optimize much better than right or substring would, and definitely better than a like with a leading wildcard.

    Until you have a performance problem, do it the right way - which means use the function in the join. Temp tables, table functions, etc, should be considered only if that is the ONLY way you can solve the query or you just cannot get the performance you need. My opinion anyway. Let SQL do its thing!

    Andy

  • quote:


    Until you have a performance problem, do it the right way - which means use the function in the join. Temp tables, table functions, etc, should be considered only if that is the ONLY way you can solve the query or you just cannot get the performance you need. My opinion anyway. Let SQL do its thing!


    I disagree - as a general rule, using a function in the join will invalidate andy indexes you may have on T1.Field1, because the result of the function is not the same thing as the field, so the query optimizer cannot do it's thing - you'll end up with more inefficient query.

    You won't notice much difference with only 130000 rows/1 join, but you will on complex/bulky queries

    I'd advise using a #temp table.

    Trev

  • For your situation I think the function is fine. Maybe it would be diferent if you had 130000 diferent values for LEFT(T1.FIELD1,3). With only 12 values in 130000 records I would think that optimization is probably minimal. However you could consider adding a column to your table and populating it with an Insert trigger. Thats assuming the extra column doesn't break anything.

  • Trev,

    You're saying that you would ALWAYS use a temp table if you needed to use a function in a join?

    Leon's point is well taken as a way to optimize - but you'd need to use an update trigger as well to keep the data synced. An alternative with less maintenance would be a computed column, which is indexable in SQL2K.

    Andy

  • If your column is indexed, SQL Server is able to pull out the info it needs from there and not go to the table. that's why this query doesn't seem to matter. In general, if you are joining on this, I would say you have a column that is not fully decomposed.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply