Finding "like" or "related" items

  • Hello All,

    I am hoping someone can help me with an elegant and practical solution. I have sales data representing more than a million individual sale items and roughly 40 million rows of actual sales transactions. Because all sales transaction contain an invoice number I can group all sales by invoice and determine all parts sold on the same invoice. Approaching this with a sample of one (one item) it is relatively straight forward to select out all invoices that contain that particular item, count up (group) all other items also appearing on those invoices, then rank or score them by the frequency of their appearance.

    Now, I need to apply this logic to the entire database and a cursor seems illogical and would likely exceed the system resources.

    I'm looking for the smart way to approach this situation. I can take a subset of roughly 12,000 items as the core for this project but the same questions apply.

    Any and all help is appreciated.

    Thank you,

    Lonnie M

  • This sounds like a classic analysis cube situation.

    Correct me if this is wrong: You need to know things like, if item X is purchased, how frequently does item Y also get purchased, for every item in the list?

    If so, that's pretty much what analysis cubes are designed to help you find. Take a look at SSAS (SQL Server Analysis Services) and see if that'll do what you need.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Ralph Kimball outlines a sql based approach for this kind of market basket analysis.

    http://www.kimballgroup.com/html/articles_search/articles1999/9904IE.html?TrkID=IE199904

    This article assumes you've got a firm grasp on Dimensional Modeling, ETL's and Data Warehousing.

    Good Luck!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply