November 20, 2018 at 9:40 am
Hi all,
I feel certain this should be a common query, just maybe not using the right search.
Here goes: I have an ordered items table. How can i determine items that are commonly sold together?
In the following example, i want to see that items 3 and 4 are commonly sold together, and 2, 3, and 4 are also sometimes ordered together.
There can be 2-n items in an order.
Any help will be appreciated!
Thanks,
Paul
CREATE TABLE #tmp_data
(OrderNo INTEGER,
OrderItem VARCHAR(8));
INSERT #tmp_data
SELECT 1, 'Item1'
UNION ALL SELECT 1, 'Item2'
UNION ALL SELECT 1, 'Item3'
UNION ALL SELECT 1, 'Item4'
UNION ALL SELECT 2, 'Item2'
UNION ALL SELECT 2, 'Item3'
UNION ALL SELECT 2, 'Item4'
UNION ALL SELECT 3, 'ItemA'
UNION ALL SELECT 3, 'Item3'
UNION ALL SELECT 3, 'Item4'
UNION ALL SELECT 4, 'ItemB'
UNION ALL SELECT 4, 'Item3'
UNION ALL SELECT 4, 'Item4';
November 20, 2018 at 10:03 am
You're looking for something called Market Basket Analysis.
This article might be helpful
November 20, 2018 at 11:10 am
Yup, sounds like what I want to do. Except in SQL Server 2008. I don't have access to DW.
November 20, 2018 at 11:20 am
A data warehouse is just another database with a star schema instead of a standard 3NF schema.
November 20, 2018 at 11:33 am
Ah sorry, I thought thew WITH(DISTRIBUTION = xxx) was a required part of the tools for the analysis.
I'll have a go at this, thanks!
November 20, 2018 at 12:57 pm
This is an fun and interesting problem. If my solution is what you are looking for (or close enough) then I would suggest this index:CREATE CLUSTERED INDEX cl_nu__temp_data__OrderItem ON #tmp_data(OrderItem);
Solution:SELECT getTop1.searchItem, getTop1.orderItem, getTop1.total, getTop1.countRank
FROM
(
SELECT
searchItem = tx.OrderItem,
orderItem = itvf_itempairs.OrderItem,
total = itvf_itempairs.total,
countRank = DENSE_RANK() OVER
(PARTITION BY tx.OrderItem ORDER BY -itvf_itempairs.total)
FROM
(
SELECT t.OrderItem
FROM #tmp_data AS t
GROUP BY t.OrderItem
) AS tx
CROSS APPLY
(
SELECT t.OrderItem, total = COUNT(*)
FROM #tmp_data AS t
WHERE t.OrderNo IN
(
SELECT t.OrderNo
FROM #tmp_data AS t
WHERE EXISTS
(
SELECT *
FROM #tmp_data AS t2
WHERE t2.OrderItem = tx.orderItem
AND t2.OrderItem = t.orderItem
)
)
AND t.OrderItem <> tx.orderItem
GROUP BY t.OrderItem
) AS itvf_itempairs
) AS getTop1
WHERE getTop1.countRank = 1;
Returns:
searchItem orderItem total countRank
---------- --------- ----------- --------------------
Item1 Item2 1 1
Item1 Item3 1 1
Item1 Item4 1 1
Item2 Item3 2 1
Item2 Item4 2 1
Item3 Item4 4 1
Item4 Item3 4 1
ItemA Item3 1 1
ItemA Item4 1 1
ItemB Item3 1 1
ItemB Item4 1 1
-- Itzik Ben-Gan 2001
November 21, 2018 at 7:02 am
That's pretty impressive Alan, and applied to my real-world data, the results look something like expected.
Thank you very much.
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply