Left Anti Semi Join - performance issue

  • Hi,

    I have a strange performance issue trying to upgrade our SQL 2005 to SQL2012.

    TableA in on a database running SQL2012

    Remoteserver is a 2005 server

    When I run the following query my current production server (TableA is on SQL2005 in production), the performance is great:

    select * from TableA where CustID not in (select CustID from remoteserver.PRDA.dbo.Cust)

    But when I run the same query on my SQL2012, I get a left anti semi join and the query run for a very long time. How can I get ride of the Left Anti Semi Join?

    If I save the result of the query from the remote server in a temp table, and use the temp table it works well.

    select CustID into #tempCust from remoteserver.PRDA.dbo.Cust

    select * from TableA where CustID not in (select CustID from #tempCust).

    Thanks for your help !

  • Ah the joys of the Distributed Transaction Coordinator! That's the most likely culprit. My usual solution is pull the remote data over into a table variable (avoids log bloat and DTC), and then work with it locally. Depending on the amount of data, staging to a table variable and then dumping into a local temp table (for the stats those keep) can actually be worth it sometimes.

    If you load into a table variable from remote, it takes less aggressive locks, because of how table variables work. That's my experience with it, anyway. Haven't done that trick for a few years, so don't know how well it works on more modern SQL versions. Last time I tried it, was still on SQL 2005. So test thoroughly if you decide to try that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks GSquared....

    interestingly, this works great:

    select * from TableA a

    where not exists (select CustID from remoteserver.PRDA.dbo.Cust b where b.Custid=a.Custid)

  • Yeah, it's kind of unpredictable what will work and what won't, when it comes to cross-server queries.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Rem70Rem (8/27/2012)


    Thanks GSquared....

    interestingly, this works great:

    select * from TableA a

    where not exists (select CustID from remoteserver.PRDA.dbo.Cust b where b.Custid=a.Custid)

    Personally I think that is a better query in any case, the IN's just offend my delicate sensibilities. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (8/28/2012)


    Personally I think that is a better query in any case, the IN's just offend my delicate sensibilities. 🙂

    Why?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/28/2012)


    TheSQLGuru (8/28/2012)


    Personally I think that is a better query in any case, the IN's just offend my delicate sensibilities. 🙂

    Why?

    Because I have delicate sensibilities, of course!! 😀

    I also note that NOT IN can give "wrong"/"unexpected" output when NULLs are in play, and in some scenarios can be WAY less efficient than NOT EXISTS. Simple web search will show examples, including from your own blog.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • GilaMonster (8/28/2012)


    TheSQLGuru (8/28/2012)


    Personally I think that is a better query in any case, the IN's just offend my delicate sensibilities. 🙂

    Why?

    In my experience, IN performance stinks with sets. They work fine with literals [WHERE state_code IN ('FL', 'NH', 'TN')]. It's been so long since I used an IN with a set that I don't remember why it performed so monumentally bad.

    If you're going to use IN with a set, you might as well join to the set. That gives SQL Server more options on how to do the matching. If you need a NOT IN then I recommend going with NOT EXISTS (as Kevin mentioned).

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • IN with a subquery performs identically to EXISTS

    http://sqlinthewild.co.za/index.php/2009/08/17/exists-vs-in/

    INNER JOINs perform slightly slower than IN

    http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/

    NOT in performs badly when the columns involved are nullable because there's a logical difference in their behaviour (NOT IN vs NOT EXISTS) when there are NULLs involved. If the columns are not nullable, NOT IN and NOT EXISTS perform identically.

    http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Tatsu (9/6/2012)


    GilaMonster (8/28/2012)


    TheSQLGuru (8/28/2012)


    Personally I think that is a better query in any case, the IN's just offend my delicate sensibilities. 🙂

    Why?

    In my experience, IN performance stinks with sets. They work fine with literals [WHERE state_code IN ('FL', 'NH', 'TN')]. It's been so long since I used an IN with a set that I don't remember why it performed so monumentally bad.

    If you're going to use IN with a set, you might as well join to the set. That gives SQL Server more options on how to do the matching. If you need a NOT IN then I recommend going with NOT EXISTS (as Kevin mentioned).

    I've actually found the opposite to be true... that IN will, many times, perform better than an inner join.

    You show me your test code and I'll show you mine. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Gail and Jeff!

    The experience I was referring to was apparently old and moldy. I think I ran into a problem with subqueries and IN back in SQL 7 or 2000. I've used EXISTS since then with excellent results so never went back.

    Great information, especially Gail's links.

    [font="Tahoma"]Bryant E. Byrd, BSSE MCDBA MCAD[/font]
    Business Intelligence Administrator
    MSBI Administration Blog

  • Tatsu (9/7/2012)


    Thanks Gail and Jeff!

    The experience I was referring to was apparently old and moldy. I think I ran into a problem with subqueries and IN back in SQL 7 or 2000. I've used EXISTS since then with excellent results so never went back.

    Great information, especially Gail's links.

    Heh... been there and done that. I've had to retest everything I could think of when a new release comes out because you just don't know what's going to happen. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply