Totally confused by query behavior and plan

  • I am transferring, cleaning, upgrading data from our old FoxPro system to SQL Server. I am currently working on citizenship information and I am getting some odd, at least to me, behaivor. It looks like in the query plan I am getting CROSS JOIN behavior from a MERGE JOIN Operator as I have ~20000 rows on each side of the JOIN and 251862818 coming out of the JOIN. I can't for the life of me figure out how to get rid of it. It returns and should return ~20000 rows but takes over 2 minutes. Obviously I am missing something.

    Attached is a zip file with table creation scripts and the query (Also here) (only necessary columns), a data creation script (about 100 rows for the main tables) and the .sqlplan (RealData.sqlplan). I went with the attachment because the scripts are very long because of the detail.

    BTW when I run the attached scripts and the query against the 100 row tables I get a similar plan (TestData.sqlplan) also attached.

    SQL Server 2005 sp2 on Windows Server 2003 all 64-bit. 4 dual-core processors, plenty of ram as it is a dev box where I am basically the only one currently working.

    Thanks all.

    Edit: Sorry, forgot the Drop commands for all the objects:

    DROP TABLE data_xfer.citizen_xfer

    DROP TABLE data_xfer.ntm_iso_country_xref

    DROP TABLE data_xfer.citizen_xfer

    DROP TABLE dbo.Persons

    DROP TABLE dbo.Citizenships

    DROP SCHEMA data_xfer

    And the actually query although it is included in the TableCreate.sql file:

    SELECT

    C.Citizenship_ID

    FROM

    data_xfer.citizen_xfer AS CX JOIN

    (

    SELECT

    ISNULL(data_xfer.person_xfer.person_no, dbo.Persons.ntmaa_person_no) AS personno,

    dbo.Persons.Person_ID

    FROM

    data_xfer.person_xfer RIGHT OUTER JOIN

    dbo.Persons ON

    data_xfer.person_xfer.primary_person_no = dbo.Persons.ntmaa_person_no

    ) AS P ON

    CX.personno = P.personno JOIN

    data_xfer.ntm_iso_country_xref AS NICX ON

    CX.ctrycd = NICX.ntm_code JOIN

    dbo.Citizenships AS C ON

    P.Person_Id = C.Person_Id AND

    NICX.iso_code = C.Country_Code

  • That's a strange one. The estimated & actual rows are WAY off. You might check the statistics. Let me keep looking.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Looking at it some more, you're query is joining between tables with this:

    JOIN dbo.Citizenships AS C

    ON P.Person_Id = C.Person_Id

    AND NICX.iso_code = C.Country_Code

    Which hits the derived table and the NICX table seperately, but you'll notice in the merge join that they Person_ID column isn't used as part of the WHERE clause for the Join. I think that's where the problem is coming up.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Completely no help to you, this is a great example of how the estimated costs and the actual costs have nothing to do with each other. Jeff would love it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Guys, I've never seen a table joined against two different tables before. Do you do it a lot?

    Just a hunch, but have you tried getting the P.Person_Id and NICX.iso_code together in a dt before you join the citizenships table?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Bob Hovious (2/6/2009)


    Guys, I've never seen a table joined against two different tables before. Do you do it a lot?

    Sure. I do "glue" tables (many to many relationships) all the time. This is just an odd case because we are changing the structure so much (normalizing) and it is behaving quite oddly.

    Just a hunch, but have you tried getting the P.Person_Id and NICX.iso_code together in a dt before you join the citizenships table?

    I am doing this now since I could not get a single query to do what I wanted in a timely manner. I load up everything except the Citizenships data into a table variable and then join to the Citizenships table and, viola, done in no time.

    I probably spent too much time on it, but I was annoyed that I couldn't get it to work so I kept banging my head on it. Then I thought I might have been missing something simple, so I posted it here.

    Thanks for taking a look to both you and Grant.

  • Give this a try at least. It runs faster with your test data.

    I use glue tables all the time too... but always in this format:

    select blah

    from tableA a

    join glueAB glue on glue.acolumn = a.column

    join tableB b on b.column = glue.bcolumn

    Here's the revision.

    ----------------------------------------------------------------------------------------------

    SELECT C.Citizenship_ID

    FROM (select person_ID,iso_code

    From data_xfer.citizen_xfer AS CX

    JOIN data_xfer.ntm_iso_country_xref AS NICX ON CX.ctrycd = NICX.ntm_code

    JOIN (SELECT ISNULL(data_xfer.person_xfer.person_no, dbo.Persons.ntmaa_person_no) AS personno, dbo.Persons.Person_ID

    FROM data_xfer.person_xfer

    RIGHT OUTER JOIN dbo.Persons ON data_xfer.person_xfer.primary_person_no = dbo.Persons.ntmaa_person_no

    ) AS P ON CX.personno = P.personno

    ) as newDT

    JOIN dbo.Citizenships AS C ON newDT.Person_Id = C.Person_Id AND

    newDT.iso_code = C.Country_Code

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks Bob, I'll give it a shot on Monday.

  • Or not 🙂 You've already fixed the problem.

    I was just curious if the optimizer would generate a different query plan with your live data.

    Hope you have a good weekend, Jack.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • This may not be what was happening here, but I had a somewhat similar issue on Friday using nested derived tables. It made me ask a question that I've never 100% known the answer to, and that is whether or not a nested derived table will inherit the where exclusions before it runs. Here's an example of what I was doing. Sometimes it ran in milliseconds, sometimes it took 50+ seconds.

    SELECT ...

    FROM fs_sro sro (NOLOCK)

    INNER JOIN CSRT_Login_User_Customer_List LUCL (NOLOCK) ON sro.Cust_Num = LUCL.Cust_Num

    INNER JOIN (SELECT S.ref_num,

    CONVERT(UNIQUEIDENTIFIER, MAX(CONVERT(BINARY(16), Rowpointer))) Rowpointer

    FROM fs_schedule S (NOLOCK)

    INNER JOIN (SELECT ref_num,

    MAX(Sched_Date) Sched_Date

    FROM fs_Schedule (NOLOCK)

    WHERE appt_stat = '30 SCHEDULED'

    OR appt_stat = '40 DISPATCHED'

    GROUP BY Ref_Num

    ) SMD ON S.ref_num = SMD.ref_num

    AND S.sched_Date = SMD.Sched_Date

    GROUP BY S.ref_num

    ) SMR ON sro.sro_num = SMR.ref_num

    INNER JOIN fs_schedule sched (NOLOCK) ON SMR.RowPointer = sched.RowPointer

    WHERE (sched.Complete = 0)

    AND (sched.appt_stat = '30 SCHEDULED'

    OR sched.appt_stat = '40 DISPATCHED'

    )

    AND LUCL.SessionID = '2fr5gnf3v4w41yjp0vua3pfn'

    What I'm wondering, is if those derived tables will always inherit the exclusions (based on the where and other joins above them), or if it could be trying to apply a uniqueidentifier conversion to binary to the entire table. Adding the same where criteria to the inner derived table seems to have "fixed" the issue, but it made the base processing time about half again as long due to having to rejoin tables.

    This is the version that could take milliseconds or minutes. I had already duplicated some of the criteria in both places, but not all of it.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Seth, if you look at the execution plan, even visually, can't you see where the exclusions are happening in the process?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Garadin (2/7/2009)


    This may not be what was happening here, but I had a somewhat similar issue on Friday using nested derived tables. It made me ask a question that I've never 100% known the answer to, and that is whether or not a nested derived table will inherit the where exclusions before it runs.

    CTE's seem to do a better job of that than derived tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey (2/6/2009)


    Completely no help to you, this is a great example of how the estimated costs and the actual costs have nothing to do with each other. Jeff would love it.

    😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think I could. The problem is, the estimated showed no issues at all, and the actual either ran in milliseconds or I had to kill it after a few minutes. The problem kept vanishing on me when I was trying to troubleshoot it and after half an hour of clearing the proc cache and generating new sessionID's to try to make it happen again, I called it quits and added in the extra criteria.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Jeff Moden (2/7/2009)


    Garadin (2/7/2009)


    This may not be what was happening here, but I had a somewhat similar issue on Friday using nested derived tables. It made me ask a question that I've never 100% known the answer to, and that is whether or not a nested derived table will inherit the where exclusions before it runs.

    CTE's seem to do a better job of that than derived tables.

    If I could use CTE's, I wouldn't have been doing that in the first place. I'd have just used ROW_NUMBER instead :hehe:.

    But that seems to confirm my suspicion that the optimizer can choose to resolve the inner rows before the outer (which makes sense). Is it good practice then to add all criteria to the derived tables instead of the outer query? Guess that makes more sense overall anyways.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply