Capturing product affinities

  • Does anyone have any recommendation for a good way to code the capture of product affinities?  I'd like to do it without using a cursor, but I'm not sure that it's possible.

    Here's the scenario:  A sales order may contain 1 to N lines.  In our example, we'll say that the sales order contains lines for the sale of items 'A', 'B', and 'C'.  Product affinities are defined as "When someone bought item 'A', they also bought item 'B', AND when someone bought item 'A', they also bought item 'C', AND the reverse of these is also true.  That is, when someone bought 'B', they also bought 'A', for example."

    We want to capture the follow data elements: 1) The date on which the affinity occurred, 2) The sum quantity of each item that sold, and 3) The number of 'hits' for the affinity pair.

    I'm thinking the columns in the table might look something like:

    • AffinityKey (unique key for each item pair combination)
    • Item1ID
    • Item2ID
    • Item1Qty
    • Item2Qty
    • AffinityHits
    • Date (not datetime)

    The quantity columns would the sum quantities sold for each item in the affinity pair for the date, and AffinityHits column would be incremented by one for each occurrence of the affinity pair for the date.  AffinityKey-Date would be a unique constraint on the table.

    Any ideas would be appreciated.  Especially if you've done something like this before.

    Thanks.


    Richard D. Cushing
    Sr. Consultant
    SYNGERGISTIC SOFTWARE SOLUTIONS, LLC

  • It'll take me a bit to setup some test data and write an example (an I'm going to bed for the night so not happening tonight), but I'm thinking a "self join" of the source table where the Invoice number = the Invoice number and the ItemID <> ItemID... of course Affinity Hits would be a SUM aggragate, Date would come from either the left or right table (no outer join, though... this would be more of a semi-cross-join).  Seems like the AffinityKey should actually be a combination fo the two ItemID's.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Assuming the structure of the Invoice & InvoiceDetail to be as follows with the sample test data

    create table Invoice (InvNum int NOT NULL identity(1,1), InvDate datetime NOT NULL)

    create table InvDetl (InvNum int NOT NULL, ItemID varchar(5) NOT NULL, Qty int NOT NULL)

    insert into Invoice (InvDate) select '01/01/2006'

    insert into Invoice (InvDate) select '01/02/2006'

    insert into Invoice (InvDate) select '01/02/2006'

    insert into Invoice (InvDate) select '01/03/2006'

    insert into Invoice (InvDate) select '01/03/2006'

    insert into Invoice (InvDate) select '01/03/2006'

    insert into Invoice (InvDate) select '01/03/2006'

    insert into InvDetl select 1, 'A', 3

    insert into InvDetl select 1, 'B', 2

    insert into InvDetl select 1, 'C', 1

    insert into InvDetl select 2, 'A', 4

    insert into InvDetl select 2, 'B', 6

    insert into InvDetl select 2, 'D', 2

    insert into InvDetl select 2, 'E', 2

    insert into InvDetl select 3, 'C', 5

    insert into InvDetl select 3, 'B', 2

    insert into InvDetl select 3, 'A', 1

    insert into InvDetl select 4, 'A', 9

    insert into InvDetl select 5, 'C', 4

    insert into InvDetl select 5, 'B', 6

    insert into InvDetl select 5, 'D', 2

    insert into InvDetl select 5, 'E', 2

    insert into InvDetl select 6, 'A', 4

    insert into InvDetl select 6, 'B', 6

    insert into InvDetl select 6, 'D', 3

    insert into InvDetl select 6, 'E', 1

    insert into InvDetl select 6, 'G', 2

    insert into InvDetl select 6, 'H', 8

    insert into InvDetl select 7, 'A', 4

    insert into InvDetl select 7, 'B', 6

    insert into InvDetl select 7, 'C', 3

    insert into InvDetl select 7, 'D', 1

    insert into InvDetl select 7, 'E', 5

    insert into InvDetl select 7, 'F', 7

    insert into InvDetl select 7, 'G', 2

    insert into InvDetl select 7, 'H', 9

     

    You can use the following query, that I have attempted as best as I can to validate & test. If you do find errors - please let me know.

    select  d.AffDate, d.Item1, d.Item2, sum(Qty1), sum(Qty2), count(*) as AffHits

    from   

     (select d1.InvNum, d1.ItemID as Item1, d2.ItemID as Item2, d1.Qty as Qty1, d2.Qty as Qty2,

      (select InvDate from Invoice i with (nolock)

       where i.InvNum = d1.InvNum) as AffDate

      from InvDetl d1 with (nolock), InvDetl d2 with (nolock)

      where d1.InvNum = d2.InvNum

      and d1.ItemID <> d2.ItemID

    &nbsp as d

    /*

    where d.AffDate = '01/03/2006' --For Verification purposes

    */

    group by d.AffDate, d.Item1, d.Item2

    order by 2,3

    I would suggest doing this in a batch process once every day, since one would assume that you have a large amount of data.

     

     


    I feel the need - the need for speed

    CK Bhatia

  • Bingo!... Nice job, CK...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, nice indeed. But I was surprised to see this strange difference in results between using the old full join syntax vs the new full join syntax. The new full join syntax produces 2 extra records with nulls in the invoice number and the second item column. And it's all caused by invoice 4 with its lonely single item. I think there is something subtle I'm missing.

    Old style

    select d1.InvNum,d1.ItemID,d2.ItemID

    from InvDetl d1,InvDetl d2

    where d1.InvNum=d2.InvNum and d1.ItemID<>d2.ItemID

    New style

    select d1.InvNum,d1.ItemID,d2.ItemID

    from InvDetl d1

      full join InvDetl d2 on d1.InvNum=d2.InvNum and d1.ItemID<>d2.ItemID

  • Old style

    select d1.InvNum,d1.ItemID,d2.ItemID

    from InvDetl d1,InvDetl d2

    where d1.InvNum=d2.InvNum and d1.ItemID<>d2.ItemID

    New style

    select d1.InvNum,d1.ItemID,d2.ItemID

    from InvDetl d1

      INNER join InvDetl d2 on d1.InvNum=d2.InvNum and d1.ItemID<>d2.ItemID

    or

    select d1.InvNum,d1.ItemID,d2.ItemID

    from InvDetl d1

      CROSS join InvDetl d2

    WHERE d1.InvNum=d2.InvNum and d1.ItemID<>d2.ItemID

  • Right! My confusion about 'full joins'.

  • Not sure Michael is set on this... FULL joins return all rows from both tables 1 time, Cross joins will produce either a partial or full cartesian join (ex. if two tables have 10 rows each, will produce 10*10 or 100 rows if full cartesian join is realized) depending on the cirteria.  INNER join and old style equi-joins must have matches in both tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply