August 3, 2009 at 10:09 am
Here is my scenario
T1, Primary table, pk col is char(10),not null, approx. 2.5 million rows
T2, foreign table, fk col is char(10) not null, between 3 and 6 million rows. FK col is indexed.
this is a one to many relationship between T1 and T2
I often have to verify that a row from T1 is not in T2 but I do not need any other data and have no additional filter criteria for T2. All query filter criteria pertains to T1 and will result in between 10k and 200k result rows.
I'm not sure of the best way to perform a simple existence check between T1 and T2 so I find myself doing a left join and just checking for T2.Col IS NOT NULL. For example:
SELECT t1.col, t2.col, etc..
FROM
T1
LEFT JOIN T2 ON T2.FkCol = F1.PkCol
WHERE
t1.Col = 'xxx' AND
t2.FkCol IS NOT NULL
This works but the query plans are inconsistant. Sometimes I get an index seek and sometimes I get an index scan. I would like to consistantly get index seeks on T2 but so far its been hit or miss.
For test purposes I have converted T2.FkCol to a primary key and removed the duplicate rows but I still get index scans. When looking at the query plan item for T2 I can see that the estimated row count is the same or almost the same as the table row count so sql thinks it needs all the T2 rows and is returning them, which makes sense (I think) because of the left join.
Is there a better way to do this ? I don't know if there are any db design patterns or common practices that might apply to this scenario, or if there is a way to accomplish the same work without the need for a left join.
btw, this is basically a phone list and a do not call list so filter criteria are applied to T1 and T2 is just checked to make sure its not a DNC phone number. The Dnc list may have many rows added or removed during the business day so adding a bit column to T1 and flagging rows as dnc does not appear to be a viable option at this time.
Any suggestions would be appreciated and Thanks.
Bill
August 3, 2009 at 11:36 am
This:
SELECT
t1.col,
t2.col,
etc..
FROM
T1
LEFT JOIN T2
ON T2.FkCol = T1.PkCol
WHERE
t1.Col = 'xxx' AND
t2.FkCol IS NOT NULL
is the same as this:
SELECT
t1.col,
t2.col,
etc..
FROM
Table1 t1
INNER JOIN Table2 t2
on (t1.PkCol = t2.FkCol)
WHERE
t1.Col = 'xxx'
August 3, 2009 at 12:37 pm
Thanks for the reply but after reading it I was still confused (no surprise there), so I re-read my own post and I had a typo that probably explains your response.
Should have been:
SELECT t1.col, t2.col, etc..
FROM
T1
LEFT JOIN T2 ON T2.FkCol = F1.PkCol
WHERE
t1.Col = 'xxx' AND
t2.FkCol IS NULL
I need the rows from T1 where there is NO MATCH in T2. Sorry for any confusion.
August 3, 2009 at 12:54 pm
If I understand what you are getting at you should use a NOT EXISTS query like this:
select t1.col1,
t1.col2,...
from t1
where not exists
(
select *
from t2
where t1.PkCol = t2.FkCol
)
This will return all rows in t1 where a corresponding row does not exist in t2
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
August 3, 2009 at 1:04 pm
William Plourde (8/3/2009)
Thanks for the reply but after reading it I was still confused (no surprise there), so I re-read my own post and I had a typo that probably explains your response.Should have been:
SELECT t1.col, t2.col, etc..
FROM
T1
LEFT JOIN T2 ON T2.FkCol = F1.PkCol
WHERE
t1.Col = 'xxx' AND
t2.FkCol IS NULL
I need the rows from T1 where there is NO MATCH in T2. Sorry for any confusion.
You may also want to try this:
select
t1.col, ...
from
table1 t1
where
t1.col = 'xxx'
and not exists(select 1 from table2 t2 where t1.PkCol = t2.FkCol);
If your execution plans change from seeks to scans when run with a different value for t1.Col = 'xxx' then it maybe the Query Optimizer has determined that a scan may be more efficient than a seek for that particular execution of the query.
August 3, 2009 at 1:06 pm
I should add that as long as the seek predicate on t1 is indexed and all the required columns are included that should get you an index seek on t1 and as long as the FkCol is indexed on t2 that should get you an index seek there.
The query might still be pretty expensive depending on how selective your seek predicate is.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
August 3, 2009 at 1:22 pm
How about an EXCEPT?
SELECT t1.col1
FROM T1
EXCEPT
SELECT t2.FkCol
FROM T2
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 3, 2009 at 1:36 pm
The EXCEPT function under the covers is doing the same thing as the NOT EXISTS, at least in my limited testing.
There is one major drawback to the EXCEPT function and that is that like UNION both recordsets must contain the same number of columns and the columns must be of compatible data types. In a situation like this the EXCEPT function could be used to identify the ID's that you are interested in (your query) but you would have to place the EXCEPT in a derived table or subquery and use the outer query actually return the data you want.
Something like this, but the NOT EXIST is more efficient in this case.
select t1.col1,
t1.col2,...
from t1
join
(
SELECT iCol = t1.PkCol1
FROM T1
EXCEPT
SELECT iCol = t2.FkCol
FROM T2
)list
on t1.PkCol = list.iCol
where t1.col3 = 'xxx'
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
August 3, 2009 at 4:12 pm
Thanks for the help everyone I appreciate it. I tried several of the suggestions and wouldnt'cha know it, the index scan was the fastest. One of the suggestions did result in an index seek on T2 but it was still slower. Go figure.
But I had to give it a try because I'm joining several large tables and the difference between a good query and a bad query was 15minutes. Here is my final query and yes, its dynamic but it has to be for business reasons.
unnecessary detail here....
main tables are about 2million rows, 3.5million, and 2million rows
basically this assigns unallocated leads in a marketing campaign into a logical output file but only if they are not used by another campaign or if they are used but have been designated to allow use by multiple campaigns.
SET @SqlUpdate = '' +
';WITH cteLocked (LeadId) AS (' +
'SELECT cl.LeadId ' +
'FROM CampaignLeads cl ' +
'LEFT JOIN vw_LockedLeads vwll on vwll.LeadId = cl.LeadId ' +
'WHERE ' +
'cl.CampaignId = [CampaignId] AND ' +
'cl.CampaignFileId IS NULL and ' +
'vwll.LeadId IS NOT NULL ' +
'),' +
'cteOverlap (LeadId) AS (' +
'SELECT LeadId ' +
'FROM CampaignLeads cl ' +
'JOIN CampaignOverlap co on co.OverlapCampaignFileId = cl.CampaignFileId ' +
'WHERE co.CampaignId = [CampaignId] ' +
')' +
'UPDATE [TopClause] CampaignLeads ' +
'SET CampaignFileId = [NewFileId] ' +
'FROM ' +
'CampaignLeads cl ' +
'LEFT JOIN cteLocked on cteLocked.LeadId = cl.LeadId ' +
'LEFT JOIN [CarrierDncTableName] dnc on dnc.Btn = cl.LeadId ' +
'LEFT JOIN cteOverlap on cteOverlap.LeadId = cl.LeadId ' +
'WHERE ' +
'cl.CampaignId = [CampaignId] AND ' +
'cl.CampaignFileId IS NULL AND ' +
'dnc.Btn IS NULL AND ' +
'(cteLocked.LeadId IS NULL OR cteOverlap.LeadId IS NOT NULL)'
August 3, 2009 at 4:17 pm
When you have a nonclustered index seek that winds up being more expensive than an index scan you are probably also getting a Key Lookup. Which in turn means that your nonclustered index isn't a "covering" index for the query. I'm going to guess that you can improve the performance of the query dramatically by including a few columns to your index.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply