Extremely slow queries when doing an OR join.

  • I have found some curious behavior. I have two tables that are related and I want to execute a join between them. But, they can join on one of two fields. My join looks like this:

    SELECT *

    FROM tableA

    INNER JOIN tableB ON tableA.field1 = tableB.field1 OR tableA.field2 = tableB.field2

    When I run this query on two tables with around 50-100k rows, the join process takes many minutes, sometimes 30-40 minutes. When I join on either the first field or the second field it takes around 1 second.

    I rewrote the query to left join to tableB twice and use a WHERE clause to determine that at least one of them was a match. But that doesn't explain why the first query takes so darn long to run. Looking at the execution plan doesn't really tell me, it looks like the one-field join plan but with just the extra OR clause on the join.

    One other curious feature is that I see the same problem if the second clause isn't even a join field. ie:

    SELECT *

    FROM tableA

    INNER JOIN tableB ON tableA.field1 = tableB.field1 AND (tableB.field2 >= val1 OR tableB.field2 <= val2) The join processor just really hates the OR clause. Notes: This is SQL Server 2000 with all service packs and running on a not-huge database. Indexes are available on both fields in the OR clause in both tables. Anybody else seen similar behavior?

  • Asside from changing the data model (Not saying you should, or should not) but joins perform poorly when an or is stated in them much as it would if you place OR in the where clause.

    I would suggest you create your select using a union.

    Select Field1, field2, field3

    From (Select Field1, field2, field3

          From TableA

          INNER JOIN tableB ON tableA.field1 = tableB.field1

          Where tableB.field2 >= val1

          union

          Select Field1, field2, field3

          From TableA

          INNER JOIN tableB ON tableA.field1 = tableB.field1

          where tableB.field2 <= val2) as DerivedTable

    Sometimes it is what it is.

     

  • I'm guessing the obvious indexes have been configured.  Can you determine anything from the execution plan as to why this is happening?  OR is not great performance, but I see that your scenario does not make sense.

    Check IO statistics and plan for strange behavior.

     

  • Yeah, things are indexed as needed. The execution plan doesn't tell me much more than the regular join method does.

    It is even slower than if it had to do a full table or index scan during the lookup. It's like it has to do a full table scan for every record in the index, essentially making a temp table for each row of results or something.

    I am working around it now, but it makes my SQL very messy and I worry that it is not really the same results that I would get with the OR join.

  • If this is taking 30+ minutes on only 50K rows, then there's more going on, and most likely cardinality issues.

    What is the uniqueness of field1 and field2 in the 2 tables ?

  • I would say that the uniqueness of field1 and field2 are pretty good. For example, in one of my joins the fields were both US State codes. (Where a particular person (tableA) had a billing state and a shipping state for example.) They are pretty well distributed. But even when it was not a join condition, it was just as bad.

    Ill see if I can come up with some better examples of this with the Northwind database or something and see if I can reproduce it for others.

    In most cases I can work around it with more than one left join and a where clause, or some union query or a reworking of my data design, but I don't feel like I should have to. Even if I am willing to pay the price for a table scan, that is no excuse for SQL to try to do O(N^2) table scans just to punish me.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply