February 6, 2009 at 9:08 am
I am transferring, cleaning, upgrading data from our old FoxPro system to SQL Server. I am currently working on citizenship information and I am getting some odd, at least to me, behaivor. It looks like in the query plan I am getting CROSS JOIN behavior from a MERGE JOIN Operator as I have ~20000 rows on each side of the JOIN and 251862818 coming out of the JOIN. I can't for the life of me figure out how to get rid of it. It returns and should return ~20000 rows but takes over 2 minutes. Obviously I am missing something.
Attached is a zip file with table creation scripts and the query (Also here) (only necessary columns), a data creation script (about 100 rows for the main tables) and the .sqlplan (RealData.sqlplan). I went with the attachment because the scripts are very long because of the detail.
BTW when I run the attached scripts and the query against the 100 row tables I get a similar plan (TestData.sqlplan) also attached.
SQL Server 2005 sp2 on Windows Server 2003 all 64-bit. 4 dual-core processors, plenty of ram as it is a dev box where I am basically the only one currently working.
Thanks all.
Edit: Sorry, forgot the Drop commands for all the objects:
DROP TABLE data_xfer.citizen_xfer
DROP TABLE data_xfer.ntm_iso_country_xref
DROP TABLE data_xfer.citizen_xfer
DROP TABLE dbo.Persons
DROP TABLE dbo.Citizenships
DROP SCHEMA data_xfer
And the actually query although it is included in the TableCreate.sql file:
SELECT
C.Citizenship_ID
FROM
data_xfer.citizen_xfer AS CX JOIN
(
SELECT
ISNULL(data_xfer.person_xfer.person_no, dbo.Persons.ntmaa_person_no) AS personno,
dbo.Persons.Person_ID
FROM
data_xfer.person_xfer RIGHT OUTER JOIN
dbo.Persons ON
data_xfer.person_xfer.primary_person_no = dbo.Persons.ntmaa_person_no
) AS P ON
CX.personno = P.personno JOIN
data_xfer.ntm_iso_country_xref AS NICX ON
CX.ctrycd = NICX.ntm_code JOIN
dbo.Citizenships AS C ON
P.Person_Id = C.Person_Id AND
NICX.iso_code = C.Country_Code
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 6, 2009 at 11:47 am
That's a strange one. The estimated & actual rows are WAY off. You might check the statistics. Let me keep looking.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 6, 2009 at 11:55 am
Looking at it some more, you're query is joining between tables with this:
JOIN dbo.Citizenships AS C
ON P.Person_Id = C.Person_Id
AND NICX.iso_code = C.Country_Code
Which hits the derived table and the NICX table seperately, but you'll notice in the merge join that they Person_ID column isn't used as part of the WHERE clause for the Join. I think that's where the problem is coming up.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 6, 2009 at 11:57 am
Completely no help to you, this is a great example of how the estimated costs and the actual costs have nothing to do with each other. Jeff would love it.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 6, 2009 at 2:29 pm
Guys, I've never seen a table joined against two different tables before. Do you do it a lot?
Just a hunch, but have you tried getting the P.Person_Id and NICX.iso_code together in a dt before you join the citizenships table?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 6, 2009 at 2:48 pm
Bob Hovious (2/6/2009)
Guys, I've never seen a table joined against two different tables before. Do you do it a lot?
Sure. I do "glue" tables (many to many relationships) all the time. This is just an odd case because we are changing the structure so much (normalizing) and it is behaving quite oddly.
Just a hunch, but have you tried getting the P.Person_Id and NICX.iso_code together in a dt before you join the citizenships table?
I am doing this now since I could not get a single query to do what I wanted in a timely manner. I load up everything except the Citizenships data into a table variable and then join to the Citizenships table and, viola, done in no time.
I probably spent too much time on it, but I was annoyed that I couldn't get it to work so I kept banging my head on it. Then I thought I might have been missing something simple, so I posted it here.
Thanks for taking a look to both you and Grant.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 6, 2009 at 3:03 pm
Give this a try at least. It runs faster with your test data.
I use glue tables all the time too... but always in this format:
select blah
from tableA a
join glueAB glue on glue.acolumn = a.column
join tableB b on b.column = glue.bcolumn
Here's the revision.
----------------------------------------------------------------------------------------------
SELECT C.Citizenship_ID
FROM (select person_ID,iso_code
From data_xfer.citizen_xfer AS CX
JOIN data_xfer.ntm_iso_country_xref AS NICX ON CX.ctrycd = NICX.ntm_code
JOIN (SELECT ISNULL(data_xfer.person_xfer.person_no, dbo.Persons.ntmaa_person_no) AS personno, dbo.Persons.Person_ID
FROM data_xfer.person_xfer
RIGHT OUTER JOIN dbo.Persons ON data_xfer.person_xfer.primary_person_no = dbo.Persons.ntmaa_person_no
) AS P ON CX.personno = P.personno
) as newDT
JOIN dbo.Citizenships AS C ON newDT.Person_Id = C.Person_Id AND
newDT.iso_code = C.Country_Code
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 6, 2009 at 7:20 pm
Thanks Bob, I'll give it a shot on Monday.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
February 7, 2009 at 9:39 am
Or not 🙂 You've already fixed the problem.
I was just curious if the optimizer would generate a different query plan with your live data.
Hope you have a good weekend, Jack.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 7, 2009 at 10:11 am
This may not be what was happening here, but I had a somewhat similar issue on Friday using nested derived tables. It made me ask a question that I've never 100% known the answer to, and that is whether or not a nested derived table will inherit the where exclusions before it runs. Here's an example of what I was doing. Sometimes it ran in milliseconds, sometimes it took 50+ seconds.
SELECT ...
FROM fs_sro sro (NOLOCK)
INNER JOIN CSRT_Login_User_Customer_List LUCL (NOLOCK) ON sro.Cust_Num = LUCL.Cust_Num
INNER JOIN (SELECT S.ref_num,
CONVERT(UNIQUEIDENTIFIER, MAX(CONVERT(BINARY(16), Rowpointer))) Rowpointer
FROM fs_schedule S (NOLOCK)
INNER JOIN (SELECT ref_num,
MAX(Sched_Date) Sched_Date
FROM fs_Schedule (NOLOCK)
WHERE appt_stat = '30 SCHEDULED'
OR appt_stat = '40 DISPATCHED'
GROUP BY Ref_Num
) SMD ON S.ref_num = SMD.ref_num
AND S.sched_Date = SMD.Sched_Date
GROUP BY S.ref_num
) SMR ON sro.sro_num = SMR.ref_num
INNER JOIN fs_schedule sched (NOLOCK) ON SMR.RowPointer = sched.RowPointer
WHERE (sched.Complete = 0)
AND (sched.appt_stat = '30 SCHEDULED'
OR sched.appt_stat = '40 DISPATCHED'
)
AND LUCL.SessionID = '2fr5gnf3v4w41yjp0vua3pfn'
What I'm wondering, is if those derived tables will always inherit the exclusions (based on the where and other joins above them), or if it could be trying to apply a uniqueidentifier conversion to binary to the entire table. Adding the same where criteria to the inner derived table seems to have "fixed" the issue, but it made the base processing time about half again as long due to having to rejoin tables.
This is the version that could take milliseconds or minutes. I had already duplicated some of the criteria in both places, but not all of it.
February 7, 2009 at 10:45 am
Seth, if you look at the execution plan, even visually, can't you see where the exclusions are happening in the process?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
February 7, 2009 at 10:54 am
Garadin (2/7/2009)
This may not be what was happening here, but I had a somewhat similar issue on Friday using nested derived tables. It made me ask a question that I've never 100% known the answer to, and that is whether or not a nested derived table will inherit the where exclusions before it runs.
CTE's seem to do a better job of that than derived tables.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 7, 2009 at 10:57 am
Grant Fritchey (2/6/2009)
Completely no help to you, this is a great example of how the estimated costs and the actual costs have nothing to do with each other. Jeff would love it.
😀
--Jeff Moden
Change is inevitable... Change for the better is not.
February 8, 2009 at 2:06 am
I think I could. The problem is, the estimated showed no issues at all, and the actual either ran in milliseconds or I had to kill it after a few minutes. The problem kept vanishing on me when I was trying to troubleshoot it and after half an hour of clearing the proc cache and generating new sessionID's to try to make it happen again, I called it quits and added in the extra criteria.
February 8, 2009 at 11:01 am
Jeff Moden (2/7/2009)
Garadin (2/7/2009)
This may not be what was happening here, but I had a somewhat similar issue on Friday using nested derived tables. It made me ask a question that I've never 100% known the answer to, and that is whether or not a nested derived table will inherit the where exclusions before it runs.CTE's seem to do a better job of that than derived tables.
If I could use CTE's, I wouldn't have been doing that in the first place. I'd have just used ROW_NUMBER instead :hehe:.
But that seems to confirm my suspicion that the optimizer can choose to resolve the inner rows before the outer (which makes sense). Is it good practice then to add all criteria to the derived tables instead of the outer query? Guess that makes more sense overall anyways.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply