preventing duplication in inner join

Question

Post reply

preventing duplication in inner join

Roger Hicks

Valued Member

Points: 61
More actions
September 23, 2004 at 2:26 pm

#60315

I have two tables that I want to join: One that contains items for sale, and one that contains items to buy. I want to join these tables on a common ProductID number column that identifies the product for sale or purchase. Here is a sample query:
SELECT * FROM BuyListings INNER JOIN SellListings ON BuyListings.ProductID=SellListings.ProductID
For any given product, there are a certain number of items for sale, and a certain number of items to buy. I want to uniquely match each sale item from the SellListings table to an item in the BuyListings table.
SAMPLE DATA:
BuyListings Table
BuyListingID    ProductID
1                  1
2                  1
3                  1
4                 2
5                  3
SellListings Table
SellListingID     ProductID
10                  1
11                  1
12                  3
12                  3
Desired Result Set
BuyListingID     SellListingID        ProductID
1                    10                    1
2                    11                    1
5                    12                    3
To clarify, for each ProductID, if there is both a BuyListing and SellListing available, I want to join them in the result set. I want to remove that BuyListing and SellListing so they can't be put into the result set again. Then if there is a BuyListing and SellListing still available, repeat the process. If (for a single ProductID) there are more BuyListings then SellListings (or vice versa) the extras are ignored.
Notice that BuyListingID 2 does not join to SellListingID 10 because it is already used in the result set. Also, BuyListingID 3 does not join to SellListingID 10 or 11, because there are already BuyListings joined to those SellListings.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

noeld SSC Guru Points: 96590 More actions · Answer 1

SELECT DISTINCT BuyListingID, SellListingID, ProductID

FROM BuyListings INNER JOIN SellListings ON BuyListings.ProductID=SellListings.ProductID

HTH

* Noel

Roger Hicks Valued Member Points: 61 More actions · Answer 2

Roger Hicks

Valued Member

Points: 61

September 23, 2004 at 3:04 pm

#524261

Gee....it's that simple?

Thanks for the help!

Roger Hicks Valued Member Points: 61 More actions · Answer 3

Uh-oh, that didn't work. Because DISTINCT only eliminates duplicate rows, I still get the following result set (based on the sample data above):

BuyListingID SellListingID ProductID

1 10 1

1 11 1

2 10 1

2 11 1

3 10 1

3 11 1

5 12 3

5 13 3

sukhoi971 SSCarpal Tunnel Points: 4833 More actions · Answer 4

pls give schema and some records so that we can make out what ur asking for

<a href="http://www.websolsoftware.com"> For IT jobs click here</a>

*Sukhoi*[font="Arial Narrow"][/font]

sukhoi971 SSCarpal Tunnel Points: 4833 More actions · Answer 5

how are buyinglisting and selllisting dependant on each other ?

<a href="http://www.websolsoftware.com"> For IT jobs click here</a>

*Sukhoi*[font="Arial Narrow"][/font]

Jonathan Tomkins Ten Centuries Points: 1268 More actions · Answer 6

I suspect you'll need to use a cursor to accomplish this in sql since ( if I've understood what you want :S ) you need to continually monitor the results set. Looks like something that might be better done in code, but o/w a cursor and a temp table should suffice.

jon

Yukas Mr or Mrs. 500 Points: 521 More actions · Answer 7

Hi!

Notice that BuyListingID 2 does not join to SellListingID 10 because it is already used in the result set. Also, BuyListingID 3 does not join to SellListingID 10 or 11, because there are already BuyListings joined to those SellListings

What you want is simple,

It is not INNER JOIN but combination EXISTS (semi-join) with TOP 1 :

SELECT

BL.BuyListingId

,SellListingId = (

SELECT TOP 1

SellListingId FROM SellListings WHERE ProductId = BL.ProductId

)

,ProductId

FROM

BuyListings BL

WHERE

EXISTS(

SELECT * FROM SellListings WHERE ProductId = BL.ProductId

)

I can't see a practical purpose in such statement - but get!

Good Luck!

Vladan SSC-Insane Points: 22072 More actions · Answer 8

Cursor is IMHO not necessary... and if the BuyListingID and SellListingID would always be precisely as they are in the example - meaning that the ID is identity column with increment 1 and all listings of a certain product follow immediately one after another - then it could be doable even without temporary tables. However, if it can happen, that buy listing with ID 1 is for product A, ID 2 for product B and ID 3 again for product A, then temporary table is needed, to order the tables by product and assign sequence numbers correctly - if this is the case, we can discuss it later, because solution is basically the same, just with pre-ordering the rows. BTW, I suppose, that the 2 identical rows in table SellListings are typing error and SellListingID really identifies the row.

Solution is in marking each line with certain product (in both tables) with numbers from 1 to x and then join both tables on 2 columns - productID, and this additional Sequence number column.

To get the right result from supplied values, following query works fine:

SELECT BuyListingID, SellListingID, buy.productID

FROM

(SELECT *, BuyListingID + 1 - (select min(BuyListingID) from BuyListings where productID = bl.productID) AS seq_no

FROM BuyListings bl) AS buy

JOIN

(SELECT *, SellListingID + 1 - (select min(SellListingID) from SellListings where productID = bl.productID) AS seq_no

FROM SellListings bl) AS sell

ON buy.productID = sell.productID AND buy.seq_no = sell.seq_no

HTH, Vladan

PS: I am aware that this query is far from perfect and most probably would need some ORDER BY clauses in the subqueries... but I'm at work, so my time is limited and I just wanted to show the principial idea behind solution quickly, using supplied sample. We can look at the problem in greater detail later, when we know more about the real structure of data.

Scott Coleman One Orange Chip Points: 27452 More actions · Answer 9

I have a similar task, where I'm matching two lists and want the best matches but can have only one unique match for an item from either list. You don't say whether you have any criteria to choose one match over another, but I will assume there is something (most recent posting, sale/asking price?) that you can put into an expression for ranking.

Create a temp table, create unique indexes on BuyListingID and SellListingID using WITH IGNORE_DUP_KEY. Join the two tables to enumerate every possible match, and insert the results in the temp table ordered by match score. Only row will be saved for any BuyListingID or SellListingID, the others are ignored. I've found this technique to be much simpler than trying to create a query to identify and delete all the duplicates.

CREATE TABLE #Results (

BuyListingID INT NOT NULL,

SellListingID INT NOT NULL,

ProductID INT NOT NULL)

CREATE UNIQUE NONCLUSTERED INDEX IX_Results_Buy ON #Results (BuyListingID) WITH IGNORE_DUP_KEY

CREATE UNIQUE NONCLUSTERED INDEX IX_Results_Sell ON #Results (SellListingID) WITH IGNORE_DUP_KEY

INSERT INTO #Results (BuyListingID, SellListingID, ProductID)

SELECT BuyListingID, SellListingID, b.ProductID

FROM BuyListings b

INNER JOIN SellListings s ON b.ProductID = s.ProductID

ORDER BY <ranking expression>

You'll get a warning message about duplicates ignored, but the results will be what you're looking for.

Vladan SSC-Insane Points: 22072 More actions · Answer 10

Hello,

I got to look at the problem again and realized, that - if I understand everything correctly - this query should work even if the products are not ordered, as I mentioned with the first solution... and there is no need for temporary tables, inserts and similar.

SELECT BuyListingID, SellListingID, buy.productID

FROM

(SELECT *, (select count(*) from BuyListings where productID = bl.productID AND BuyListingID <= bl.BuyListingID) AS seq_no FROM BuyListings bl) AS buy

JOIN

(SELECT *, (select count(*) from SellListings where productID = sl.productID AND SellListingID <= sl.SellListingID) AS seq_no FROM SellListings sl) AS sell

ON buy.productID = sell.productID AND buy.seq_no = sell.seq_no

This query should always return correct results... e.g. with this example, where the first one fails:

create table BuyListings (BuyListingID int, ProductID int)

insert into BuyListings values (1,1)

insert into BuyListings values (2,2)

insert into BuyListings values (3,1)

insert into BuyListings values (4,1)

insert into BuyListings values (5,3)

insert into BuyListings values (6,3)

create table SellListings (SellListingID int, ProductID int)

insert into SellListings values (10,1)

insert into SellListings values (11,1)

insert into SellListings values (12,3)

Result:

BuyListingID SellListingID productID

------------ ------------- -----------

1 10 1

3 11 1

5 12 3

I also apologize for the confusing aliases in my previous attempt - it worked OK, as they were in different select statements, but giving identical alias to two different tables within one complex query is not the best practice :-).

sukhoi971 SSCarpal Tunnel Points: 4833 More actions · Answer 11

if u could clarify the other way that u need

it based on product id it would have been simpler to understand

productid buylistingid selllistingid

becoz u say other way problem looks confused

<a href="http://www.websolsoftware.com"> For IT jobs click here</a>

*Sukhoi*[font="Arial Narrow"][/font]

Bob Monahon Ten Centuries Points: 1282 More actions · Answer 12

I agree with Vladan's approach. For a more complex case, however, the ListingID's won't be in

continuous sequence. For this technique to work, however, they at least have to unique in each table.

1A. The first intermediate result calculates the sequence number for each BuyListing.

You can save this in a #temp table, or use the SQL as a sub-query (see below step #3).

SELECT ROWB.BuyListingID, ROWB.ProductID, COUNT(ALLB.BuyListingID) AS seq_no

FROM BuyListings AS ROWB,

BuyListings AS ALLB

WHERE ALLB.ProductID = ROWB.BuyListingID

AND ALLB.BuyListingID <= ROWB.BuyListingID

GROUP BY ROWB.ProductID, ROWB.BuyListingID

1B. Intermediate result set #1:

BuyListingID ProductID seq_no

1 1 1

2 1 2

3 1 3

4 2 1

5 3 1

2A. Repeat exactly for the SellListing table:

SELECT ROWS.SellListingID, ROWS.ProductID, COUNT(ALLS.SellListingID) AS seq_no

FROM SellListings AS ROWS,

SellListings AS ALLS

WHERE ALLS.ProductID = ROWS.SellListingID

AND ALLS.SellListingID <= ROWS.SellListingID

GROUP BY ROWS.ProductID, ROWS.SellListingID

2B. Intermediate result set #2:

SellListingID ProductID seq_no

10 1 1

11 1 2

12 3 1

3A. Now join the intermediate result sets on ProductID and seq_no:

SELECT INTB.BuyListingID, INTS.SellListingID, INTB.ProductID, INTB.seq_no

FROM <IntermediateBuyResult> AS INTB

<IntermediateSellResult> AS INTS

WHERE INTB.ProductID = INTS.ProductID

AND INTB.seq_no = INTS.seq_no

3B. Final result set:

BuyListingID SellListingID ProductID seq_no

1 10 1 1

2 11 1 2

5 12 3 1

Regards,

Bob Monahon

Vladan SSC-Insane Points: 22072 More actions · Answer 13

Hi Bob,

I agree absolutely, and in fact I already used almost the same SQL in my second post :-). I just didn't have enough time when I was writing the first one, so I decided to hit the basics only and return to this question later. Of course, if the Listing ID's are not unique, then it is hard to make any query work... Anyway, I'm glad that you came up with the same solution independently; it makes me more confident, that it really works

Scott Coleman One Orange Chip Points: 27452 More actions · Answer 14

Vladan and Bob have a great solution, unless:

The listing ID's are not unique

You want to use more complicated criteria to choose the best match

You have very large tables

The solution I posted using unique indexes works nicely with 50+ million row tables, but if I tried something with multiple self-joins my co-workers would string me up before I lived long enough to see the results.