September 23, 2004 at 2:26 pm
I have two tables that I want to join: One that contains items for sale, and one that contains items to buy. I want to join these tables on a common ProductID number column that identifies the product for sale or purchase. Here is a sample query:
SELECT * FROM BuyListings INNER JOIN SellListings ON BuyListings.ProductID=SellListings.ProductID
For any given product, there are a certain number of items for sale, and a certain number of items to buy. I want to uniquely match each sale item from the SellListings table to an item in the BuyListings table.
SAMPLE DATA:
BuyListings Table
BuyListingID ProductID
1 1
2 1
3 1
4 2
5 3
SellListings Table
SellListingID ProductID
10 1
11 1
12 3
12 3
Desired Result Set
BuyListingID SellListingID ProductID
1 10 1
2 11 1
5 12 3
To clarify, for each ProductID, if there is both a BuyListing and SellListing available, I want to join them in the result set. I want to remove that BuyListing and SellListing so they can't be put into the result set again. Then if there is a BuyListing and SellListing still available, repeat the process. If (for a single ProductID) there are more BuyListings then SellListings (or vice versa) the extras are ignored.
Notice that BuyListingID 2 does not join to SellListingID 10 because it is already used in the result set. Also, BuyListingID 3 does not join to SellListingID 10 or 11, because there are already BuyListings joined to those SellListings.
September 23, 2004 at 2:43 pm
SELECT DISTINCT BuyListingID, SellListingID, ProductID
FROM BuyListings INNER JOIN SellListings ON BuyListings.ProductID=SellListings.ProductID
HTH
* Noel
September 23, 2004 at 3:04 pm
Gee....it's that simple?
Thanks for the help!
September 24, 2004 at 12:25 am
Uh-oh, that didn't work. Because DISTINCT only eliminates duplicate rows, I still get the following result set (based on the sample data above):
BuyListingID SellListingID ProductID
1 10 1
1 11 1
2 10 1
2 11 1
3 10 1
3 11 1
5 12 3
5 13 3
September 24, 2004 at 1:45 am
pls give schema and some records so that we can make out what ur asking for
<a href="http://www.websolsoftware.com"> For IT jobs click here</a>
*Sukhoi*[font="Arial Narrow"][/font]
September 24, 2004 at 1:52 am
how are buyinglisting and selllisting dependant on each other ?
<a href="http://www.websolsoftware.com"> For IT jobs click here</a>
*Sukhoi*[font="Arial Narrow"][/font]
September 24, 2004 at 2:36 am
I suspect you'll need to use a cursor to accomplish this in sql since ( if I've understood what you want :S ) you need to continually monitor the results set. Looks like something that might be better done in code, but o/w a cursor and a temp table should suffice.
jon
September 24, 2004 at 4:31 am
Hi!
Notice that BuyListingID 2 does not join to SellListingID 10 because it is already used in the result set. Also, BuyListingID 3 does not join to SellListingID 10 or 11, because there are already BuyListings joined to those SellListings
What you want is simple,
It is not INNER JOIN but combination EXISTS (semi-join) with TOP 1 :
SELECT
BL.BuyListingId
,SellListingId = (
SELECT TOP 1
SellListingId FROM SellListings WHERE ProductId = BL.ProductId
)
,ProductId
FROM
BuyListings BL
WHERE
EXISTS(
SELECT * FROM SellListings WHERE ProductId = BL.ProductId
)
I can't see a practical purpose in such statement - but get!
Good Luck!
September 24, 2004 at 4:43 am
Cursor is IMHO not necessary... and if the BuyListingID and SellListingID would always be precisely as they are in the example - meaning that the ID is identity column with increment 1 and all listings of a certain product follow immediately one after another - then it could be doable even without temporary tables. However, if it can happen, that buy listing with ID 1 is for product A, ID 2 for product B and ID 3 again for product A, then temporary table is needed, to order the tables by product and assign sequence numbers correctly - if this is the case, we can discuss it later, because solution is basically the same, just with pre-ordering the rows. BTW, I suppose, that the 2 identical rows in table SellListings are typing error and SellListingID really identifies the row.
Solution is in marking each line with certain product (in both tables) with numbers from 1 to x and then join both tables on 2 columns - productID, and this additional Sequence number column.
To get the right result from supplied values, following query works fine:
SELECT BuyListingID, SellListingID, buy.productID
FROM
(SELECT *, BuyListingID + 1 - (select min(BuyListingID) from BuyListings where productID = bl.productID) AS seq_no
FROM BuyListings bl) AS buy
JOIN
(SELECT *, SellListingID + 1 - (select min(SellListingID) from SellListings where productID = bl.productID) AS seq_no
FROM SellListings bl) AS sell
ON buy.productID = sell.productID AND buy.seq_no = sell.seq_no
HTH, Vladan
PS: I am aware that this query is far from perfect and most probably would need some ORDER BY clauses in the subqueries... but I'm at work, so my time is limited and I just wanted to show the principial idea behind solution quickly, using supplied sample. We can look at the problem in greater detail later, when we know more about the real structure of data.
September 24, 2004 at 6:29 am
I have a similar task, where I'm matching two lists and want the best matches but can have only one unique match for an item from either list. You don't say whether you have any criteria to choose one match over another, but I will assume there is something (most recent posting, sale/asking price?) that you can put into an expression for ranking.
Create a temp table, create unique indexes on BuyListingID and SellListingID using WITH IGNORE_DUP_KEY. Join the two tables to enumerate every possible match, and insert the results in the temp table ordered by match score. Only row will be saved for any BuyListingID or SellListingID, the others are ignored. I've found this technique to be much simpler than trying to create a query to identify and delete all the duplicates.
CREATE TABLE #Results (
BuyListingID INT NOT NULL,
SellListingID INT NOT NULL,
ProductID INT NOT NULL)
CREATE UNIQUE NONCLUSTERED INDEX IX_Results_Buy ON #Results (BuyListingID) WITH IGNORE_DUP_KEY
CREATE UNIQUE NONCLUSTERED INDEX IX_Results_Sell ON #Results (SellListingID) WITH IGNORE_DUP_KEY
INSERT INTO #Results (BuyListingID, SellListingID, ProductID)
SELECT BuyListingID, SellListingID, b.ProductID
FROM BuyListings b
INNER JOIN SellListings s ON b.ProductID = s.ProductID
ORDER BY <ranking expression>
You'll get a warning message about duplicates ignored, but the results will be what you're looking for.
September 24, 2004 at 7:18 am
Hello,
I got to look at the problem again and realized, that - if I understand everything correctly - this query should work even if the products are not ordered, as I mentioned with the first solution... and there is no need for temporary tables, inserts and similar.
SELECT BuyListingID, SellListingID, buy.productID
FROM
(SELECT *, (select count(*) from BuyListings where productID = bl.productID AND BuyListingID <= bl.BuyListingID) AS seq_no FROM BuyListings bl) AS buy
JOIN
(SELECT *, (select count(*) from SellListings where productID = sl.productID AND SellListingID <= sl.SellListingID) AS seq_no FROM SellListings sl) AS sell
ON buy.productID = sell.productID AND buy.seq_no = sell.seq_no
This query should always return correct results... e.g. with this example, where the first one fails:
create table BuyListings (BuyListingID int, ProductID int)
insert into BuyListings values (1,1)
insert into BuyListings values (2,2)
insert into BuyListings values (3,1)
insert into BuyListings values (4,1)
insert into BuyListings values (5,3)
insert into BuyListings values (6,3)
create table SellListings (SellListingID int, ProductID int)
insert into SellListings values (10,1)
insert into SellListings values (11,1)
insert into SellListings values (12,3)
Result:
BuyListingID SellListingID productID
------------ ------------- -----------
1 10 1
3 11 1
5 12 3
I also apologize for the confusing aliases in my previous attempt - it worked OK, as they were in different select statements, but giving identical alias to two different tables within one complex query is not the best practice :-).
September 24, 2004 at 7:31 am
if u could clarify the other way that u need
it based on product id it would have been simpler to understand
productid buylistingid selllistingid
becoz u say other way problem looks confused
<a href="http://www.websolsoftware.com"> For IT jobs click here</a>
*Sukhoi*[font="Arial Narrow"][/font]
September 24, 2004 at 7:56 am
I agree with Vladan's approach. For a more complex case, however, the ListingID's won't be in
continuous sequence. For this technique to work, however, they at least have to unique in each table.
1A. The first intermediate result calculates the sequence number for each BuyListing.
You can save this in a #temp table, or use the SQL as a sub-query (see below step #3).
SELECT ROWB.BuyListingID, ROWB.ProductID, COUNT(ALLB.BuyListingID) AS seq_no
FROM BuyListings AS ROWB,
BuyListings AS ALLB
WHERE ALLB.ProductID = ROWB.BuyListingID
AND ALLB.BuyListingID <= ROWB.BuyListingID
GROUP BY ROWB.ProductID, ROWB.BuyListingID
1B. Intermediate result set #1:
BuyListingID ProductID seq_no
1 1 1
2 1 2
3 1 3
4 2 1
5 3 1
2A. Repeat exactly for the SellListing table:
SELECT ROWS.SellListingID, ROWS.ProductID, COUNT(ALLS.SellListingID) AS seq_no
FROM SellListings AS ROWS,
SellListings AS ALLS
WHERE ALLS.ProductID = ROWS.SellListingID
AND ALLS.SellListingID <= ROWS.SellListingID
GROUP BY ROWS.ProductID, ROWS.SellListingID
2B. Intermediate result set #2:
SellListingID ProductID seq_no
10 1 1
11 1 2
12 3 1
3A. Now join the intermediate result sets on ProductID and seq_no:
SELECT INTB.BuyListingID, INTS.SellListingID, INTB.ProductID, INTB.seq_no
FROM <IntermediateBuyResult> AS INTB
<IntermediateSellResult> AS INTS
WHERE INTB.ProductID = INTS.ProductID
AND INTB.seq_no = INTS.seq_no
3B. Final result set:
BuyListingID SellListingID ProductID seq_no
1 10 1 1
2 11 1 2
5 12 3 1
Bob Monahon
September 24, 2004 at 8:09 am
Hi Bob,
I agree absolutely, and in fact I already used almost the same SQL in my second post :-). I just didn't have enough time when I was writing the first one, so I decided to hit the basics only and return to this question later. Of course, if the Listing ID's are not unique, then it is hard to make any query work... Anyway, I'm glad that you came up with the same solution independently; it makes me more confident, that it really works
September 24, 2004 at 8:52 am
Vladan and Bob have a great solution, unless:
The listing ID's are not unique
You want to use more complicated criteria to choose the best match
You have very large tables
The solution I posted using unique indexes works nicely with 50+ million row tables, but if I tried something with multiple self-joins my co-workers would string me up before I lived long enough to see the results.
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply