April 6, 2013 at 2:34 pm
I have multiple pairs of "Offers" and each Offer is comprised of several lists of "Items".
My pairs of Offers are defined in one table (OfferOverlaps) with about 200,000 rows, and I have another table (OfferItems) which contains the mapping of each Offer's lists of Items with about 30,000,000 rows.
Using these two tables I need to find the rows from OfferOverlap where the two overlapping Offers defined in each row contain at least one common Item in OfferItems.
I first tried this:
select distinct OO.OfferID, OO.OverlappingOfferID
from OfferOverlaps OO
join OfferItems OI1 on OO.OfferID = OI1.OfferID
join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID
where OI1.ItemID = OI2.ItemID
But the performance was not satisfactory.
I also tried to use exists in an attempt to try to get the comparison of matching ItemID's to stop immediately after the first match was found rather than evaluating all matches (potentially over a million) between each set of OfferItems:
select distinct OO.OfferID, OO.OverlappingOfferID
from OfferOverlaps OO
join OfferItems OI1 on OO.OfferID = OI1.OfferID
join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID
where exists (select OI1.ItemID intersect select OI2.ItemID)
The performance was the same. In fact, the query plans are identical for both of these approaches.
I would be very grateful for any ideas you might have about a different approach to this query problem that would yield better performance than these approaches that I've tried.
Below is a simple example that sets up an illustration of the query problem.
[font="Courier New"]
--drop table OfferOverlaps
--drop table OfferItems
-- 200,000 rows in this table
create table OfferOverlaps (
OfferID int,
OverlappingOfferID int,
constraint cpk_OfferOverlaps primary key clustered (OfferID,OverlappingOfferID)
)
insert into OfferOverlaps values
(1, 6),
(3, 8),
(5, 9);
-- 30,000,000 rows in this table
create table OfferItems (
OfferID int,
ListID int,
ItemID int,
constraint cpk_OfferItems primary key clustered (OfferID,ListID,ItemID)
)
insert into OfferItems values
(1, 1, 111),
(1, 1, 222),
(2, 2, 222),
(3, 1, 333),
(3, 2, 888),
(4, 1, 444),
(4, 2, 555),
(5, 1, 555),
(6, 1, 444),
(6, 2, 666),
(7, 1, 777),
(8, 1, 333),
(8, 1, 888),
(9, 1, 999);
select distinct OO.OfferID, OO.OverlappingOfferID
from OfferOverlaps OO
join OfferItems OI1 on OO.OfferID = OI1.OfferID
join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID
where OI1.ItemID = OI2.ItemID
select distinct OO.OfferID, OO.OverlappingOfferID
from OfferOverlaps OO
join OfferItems OI1 on OO.OfferID = OI1.OfferID
join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID
where exists (select OI1.ItemID intersect select OI2.ItemID)
[/font]
April 7, 2013 at 5:38 am
For larger data sets, I'd work on getting rid of that DISTINCT operator. It's going to lead to issues because of the aggregation. But, leaving everything roughly the same, I found if you just added an index like this
CREATE INDEX ix1 on dbo.OfferOverlaps(OverlappingID);
Performance on my machine went from around 92ms for the original structure to about 14ms with this index. Because the clustered index is included in the key of the nonclustered index, it makes this nonclustered index covering. Having the data sorted by OverlappingID eliminates the clustered index scan.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
April 8, 2013 at 11:47 pm
This should help a bit:
Add to table OfferOverlaps:
,
constraint uk_OfferOverlaps unique (OverlappingOfferID,OfferID)
and this to table OrderItems:
CREATE INDEX IX_ItemOrder ON OfferItems(ItemID, OfferID)
And to avoid DISTINCT you may use this approach:
select OO.OfferID, OO.OverlappingOfferID
from OfferOverlaps OO
WHERE EXISTS (
SELECT *
FROM OfferItems OI1
INNER JOIN OfferItems OI2 on OI2.OfferID = OO.OverlappingOfferID AND OI1.ItemID = OI2.ItemID
WHERE OI1.OfferID = OO.OfferID
)
_____________
Code for TallyGenerator
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply