"OR" statement in JOIN query

  • I've got a LEFT OUTER JOIN I'm trying to do on a table, where I want to match on two different criteria. Something like this:

    LEFT OUTER JOIN mytable a ON a.ID = maintable.ID AND

    ((a.field1 = maintable.field1)

    OR

    (a.f2 = maintable.f2

    AND a.f3 = maintable.f3

    AND a.f4 = maintable.f4))

    I'm seeing *horrible* performance from this. If I take either of the individual statements, i.e. LEFT OUTER JOIN mytable a ON a.ID = maintable.ID AND a.field1 = maintable.field1

    or the other one, each individually performs just fine. Heck, if I join to the same table twice in successive statements, that performs well, too, but I don't really want to join to the same table twice in the same query.

    How can I left outer join to two different subsets of the same table in the same query? Thanks!!

  • Pete there is a difference between Joining and using a WHERE statement to limit your results; I'm thinking that you are including the WHERE statement as part of the join, and you don't want to do that.

    i'm pretty sure your query should be this, and will perform a lot better, but you really need to paste your real query, and not pseudocode to get a more definititve answer.

    --your snippet implies that 3 other tables aliased f2,f3 and f4 are already part of the query

    LEFT OUTER JOIN mytable a ON a.ID = maintable.ID

    LEFT OUTER JOIN maintable ON a.field1 = maintable.field1

    WHERE

    (a.f2 = maintable.f2

    AND a.f3 = maintable.f3

    AND a.f4 = maintable.f4))

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thank you for your response! Sorry I wasn't more clear; my original query is at work and I just remembered that I wanted to look for help with that specific problem I was seeing. There's really only two tables involved, f2, f3, and f4 were meant to be fields in one of the tables. Let me try and explain it better....

    I essentially have two identical tables (one with "new" data, and one with existing data that I want to compare to). They both will match on field GROUPNUMBER, but then I want to match on either SSN, or the combination of FIRSTNAME, LASTNAMES, DOB, and SEX. I need a LOJ because I want to know what in the existing data is different, and if the existing side is null then I know the new data is new.

    Sorry again for pseudcode, from what I rememeber this is essentially it:

    Two tables: tblNewData, tblExistingData. They each have data about persons. I want to return all rows from tblNewData, and matching rows from tblExistingData.

    select *

    from tblNewData a LEFT OUTER tblExistingData b ON a.GROUPNUMBER = b.GROUPNUMBER

    AND

    (

    a.SSN = b.SSN OR

    (a.FIRSTNAME = b.FIRSTNAME

    and a.LASTNAME = b.LASTNAME

    and a.DOB = b.DOB

    and a.SEX = b.SEX)

    )

    That's basically where I am now.

    Putting the conditions in the where clause is an interesting idea and I'll try that. I think what I was aiming for was to not have to join to *all* of the rows on the right-side table, thinking that limiting those in the join would speed things up. But maybe that's not how it will work.

  • I don't think you want to put the criteria in the WHERE clause because that will likely cause the Optimizer to see the LEFT JOIN as and INNER JOIN. The issue you are having is that the OR is making the query filter non-SARGable, so the Optimizer is unlikely to be able to do any SEEKS, but have to scan the index or table and in your case I would bet you are seeing a table/clustered index scan. Attaching the query plan (graphical) will help us help you.

    Many times when needing to use an OR you can get better performance by UNION'ing the queries together. Something like this:

    select

    *

    from

    tblNewData a LEFT OUTER JOIN

    tblExistingData b

    ON a.GROUPNUMBER = b.GROUPNUMBER AND

    a.SSN = b.SSN

    UNION

    select

    *

    from

    tblNewData a LEFT OUTER JOIN

    tblExistingData b

    ON a.GROUPNUMBER = b.GROUPNUMBER AND

    a.FIRSTNAME = b.FIRSTNAME and

    a.LASTNAME = b.LASTNAME and

    a.DOB = b.DOB and

    a.SEX = b.SEX

    This will allow the Optimizer to better use existing indexes.

  • I've always used this method:

    SELECT a.*, b1.*, b.*

    FROM tblNewData a

    LEFT OUTER JOIN tblExistingData b1

    ON b1.GROUPNUMBER = a.GROUPNUMBER AND b1.SSN = a.SSN

    LEFT OUTER JOIN tblExistingData b

    ON a.GROUPNUMBER = b.GROUPNUMBER AND

    a.FIRSTNAME = b.FIRSTNAME and

    a.LASTNAME = b.LASTNAME and

    a.DOB = b.DOB and

    a.SEX = b.SEX

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yeah, I am seeing clustered index scans, I'll see if I can up a graphic of it.

    Setting the query as Chris suggests actually seems to perform the quickest, but it seems like that is essentially querying the same table twice. Plus, then I need to do some manipulation since I basically want a value in b1, or if nothing matches there then the value in b. Ideally I get this represented in a single field from the query because ultimately I need to compare that to the single field from the new data.

    All good ideas and I'm gonna play with it some more. Thanks everyone!

  • SELECT a.GROUPNUMBER, a.SSN,

    ISNULL(b1.FIRSTNAME, b.FIRSTNAME) AS FIRSTNAME,

    ISNULL(b1.LASTNAME, b.LASTNAME) AS LASTNAME

    FROM tblNewData a

    LEFT OUTER JOIN tblExistingData b1

    ON b1.GROUPNUMBER = a.GROUPNUMBER AND b1.SSN = a.SSN

    LEFT OUTER JOIN tblExistingData b

    ON a.GROUPNUMBER = b.GROUPNUMBER AND

    a.FIRSTNAME = b.FIRSTNAME and

    a.LASTNAME = b.LASTNAME and

    a.DOB = b.DOB and

    a.SEX = b.SEX

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply