combining multiple rows into one

  • i have this sales data for a retail shop that's pretty basic/primitive.

    each row is for an item, and not for an order. so if a customer purchases 3 items in a single order, there are 3 separate rows in the data. i would like to combine these rows in a single row that represents an order, for doing market basket analysis.

    customers can be uniquely identified by combining "customer code" and "customer name". a SINGLE order can be identified by combining "customer code", "customer name", and "bill date"

    columns that need to be in the final table are bill date, customer code, customer name, brand, item name, category, and subcategory.

    all columns from brand---subcategory need to be renamed and included/repeated for all items in a single purchase by a customer. for example, if an order has 3 items, the columns should be

    billdate custcode custname brand item cat subcat brand2 item2 cat2 subcat2 brand3 item3 cat3 subcat3

    i'm thinking of a left outer join, but i'm still not getting it. can someone please help me here? am using sql 2008 express.

    thanks,

    zypsy

  • zypsy (10/4/2008)


    each row is for an item, and not for an order. so if a customer purchases 3 items in a single order, there are 3 separate rows in the data. i would like to combine these rows in a single row that represents an order, for doing market basket analysis.

    Have you actually tried doing a "market basket analysis" on a single row like that? You have no idea of the world of hurt that will cause you. Stick with properly designed tables and normalized data and the analysis will be much easier in all cases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Market basket analysis starts with studying all the items in a single purchase/order by a customer. And that's the first thing I'm trying to do with this dataset.

    The maximum number of items in a single order may be 4-5 judging from my data exploration. I will look at things like reducing the complexity of the MB analysis by moving up from item level to subcategory (or to category level) later. What I would first like to do is create a dataset where each row translates into a single purchase/order by a customer.

    thanks,

    zypsy

  • You'll actually be increasing the level of complexity to do the analysis if the analysis will be done by computer and not a human.

    Post some data according to the specs in the link in my signature below and I'll try to show you what I mean...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • i have attached a sample, it's faster. thanks!

  • Heh... yeah, for you, maybe. If you want my help, please post it in the readily consumable form along with a table creation statement like I asked for in the article. You're the one who needs help... make it easy on me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You might also want to describe the analysis you intend to do. I see where you say what you think you need to do to the data, but you never really describe the kind of analysis you want to do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ...AND, if you really want to pivot the data as you say (I still think it's a mistake), take a look at the following article. If you use a partioned ROW_NUMBER to mark each customer's purchases, you can easily pivot the data.

    http://www.sqlservercentral.com/articles/T-SQL/63681/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff, below are the codes. I have selected the top 100 rows only, there are about 600,000 rows in the dataset.

    1. I just need each row in the dataset to represent a single order/purchase.

    2. I will reduce the data for the analysis by selecting only orders that have more than one item.

    3. The market basket analysis will answer something like, if an order has A what is the probability of the order having B item too? or B and C? or B and C and D?....For this, I will be creating additional columns that will calculate the SUPPORT and CONFIDENCE for each rule. I can do this easily in MS Excel.

    I work with SAS and SPSS most of the times but thought this problem can be better worked out in SQL. I will check out your article on pivoting. And i hope i have uploaded the data in the format you want.

    Thank you for your time Jeff.

    -------------------------------------------------------

    create table sample_data

    (

    Billdaten smalldatetime,

    custcode varchar(12),

    custname varchar(50),

    brand varchar(50),

    Category varchar(40),

    SubCategory varchar(50)

    )

    INSERT INTO sample_data (Billdaten,custcode,custname,brand,Category,SubCategory)

    SELECT 'Apr 13 2007 12:00AM','MH14780','RAHUL','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 2 2007 12:00AM','Gnr13992','RAMESH','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 5 2007 12:00AM','RJN19912','PREAM 9844340986','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 1 2007 12:00AM','CLK14382','DEVIRAMMA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 22 2007 12:00AM','Gnr14136','H N BHAGYA LAKSHMI','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 2 2007 12:00AM','MH15244','HARISH SHARMA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 11 2007 12:00AM','RJN20716','MANJUIA9448444407','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 11 2007 12:00AM','HRB112633','MUNIRAJ 9448620141','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Jun 4 2007 12:00AM','DML17764','SANGENE BIOTECH PVT LTD','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 21 2007 12:00AM','DML17764','SANGENE BIOTECH PVT LTD','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 30 2007 12:00AM','MH15182','KARTHIK','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 17 2007 12:00AM','Bsk17493','P SUBRAMANYA NAIDU','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 21 2007 12:00AM','HSR12300','P C SEKHAR','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 9 2007 12:00AM','DML17598','T V S NAIDU','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 24 2007 12:00AM','HSR11951','VIKRAM LOGISTIC AND MARITIME SERVICES PVT LTD','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 21 2007 12:00AM','HSR12305','LOKSESH REDDY','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Jun 3 2007 12:00AM','Bsk17677','LAXMI INDUSTRIES','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 29 2007 12:00AM','Bsk10246','Krishna','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 29 2007 12:00AM','Bsk17214','BABY NAGARAJ','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 1 2007 12:00AM','Bsk10090','Santosh','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 5 2007 12:00AM','Bsk17326','SOUMEN','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 19 2007 12:00AM','HRB110238','M D"SOUZA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 11 2007 12:00AM','HRB111113','REGENCY MAGNUM','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 1 2007 12:00AM','HRB111507','BAJWAN DAS','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 30 2007 12:00AM','DML17454','K CHANDRASEKHAR','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Jun 12 2007 12:00AM','MH16094','HEMACHANDRA VENKATESH','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 1 2007 12:00AM','MH15217','RAMESH','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 14 2007 12:00AM','DML17216','DEJONGE','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 26 2007 12:00AM','DML17411','SANGEETHA PREETHAN','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 16 2007 12:00AM','Bsk17487','MARK DEVELOPERS','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Nov 10 2007 12:00AM','HRB119055','PRAMODA D T','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Aug 8 2007 12:00AM','HSR13327','NITI GUPTA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 8 2007 12:00AM','Gnr13800','PRABHAKAR','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 29 2007 12:00AM','HSR12021','P PARAMASIVAN','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Jun 21 2007 12:00AM','HRB113550','KPD NAIR 9980510261','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Oct 5 2007 12:00AM','Bsk19741','PROF G R RAMADAS','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 22 2007 12:00AM','HRB112949','RAJAT NAYAK 9900235618','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Aug 27 2007 12:00AM','MYS16397','P.GURUNATH RAO','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 7 2007 12:00AM','RJN19955','KESHAVA MURTHY 98844039731','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 8 2007 12:00AM','RJN19971','SHANKAR','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 28 2007 12:00AM','BGM10489','A K MULLA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 6 2007 12:00AM','BGM10542','RAMJAN MULLA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 1 2007 12:00AM','BGM10278','VIKRAM A JAIN','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Jun 12 2007 12:00AM','Bsk17903','SARALAI','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Oct 14 2007 12:00AM','HRB117635','NIRMALA 9740276183','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Oct 1 2007 12:00AM','DML20220','ASHA JERLIN','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 29 2007 12:00AM','RJN20446','Dayananda 9844419010','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 29 2007 12:00AM','Bsk17668','AXIOM COUSULTING PVT LTD','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 3 2007 12:00AM','Bsk16927','CUSTOMERR','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 16 2007 12:00AM','HRB112790','PRADEEP MADURAM 9845165790','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 14 2007 12:00AM','HRB111892','HARISH 9886750812','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 24 2007 12:00AM','DML17389','MICRO ACADAMY','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 3 2007 12:00AM','DML17515','RITVINDER KAVR','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 23 2007 12:00AM','Bsk13951','MURGESH','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Apr 16 2007 12:00AM','MH14910','ATUL AROMA','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'Aug 19 2007 12:00AM','DML19417','M SURESH','ACO','FAN','TOWER FAN' UNION ALL

    SELECT 'May 17 2007 12:00AM','CLK14554','ANIL KUMAR','AIWA','AUDIO','HI FI' UNION ALL

    SELECT 'Apr 11 2007 12:00AM','HSR11790','PREETHAM','ARCTIC','STABLIZER','1KV' UNION ALL

    SELECT 'Aug 28 2007 12:00AM','Bsk19750','ANURADHA','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jun 23 2007 12:00AM','HRB113888','GAURAV 9945881115','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Aug 13 2007 12:00AM','HRB115559','Venu','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Apr 8 2007 12:00AM','RJN10001','Customer','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Sep 30 2007 12:00AM','RJN24830','NANJAPPA','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Oct 11 2007 12:00AM','RJN25000','AEW RJN CUSTOMER','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Aug 4 2007 12:00AM','RJN22923','SRINIVAS 9945078101','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jun 22 2007 12:00AM','Bsk18090','LOKESH','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jul 12 2007 12:00AM','Bsk15477','CUSTOMERR','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Aug 17 2007 12:00AM','Bsk19403','LINGEGOWDA','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'May 26 2007 12:00AM','RJN10001','Customer','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Oct 9 2007 12:00AM','RJN25034','RAVI 9900522111','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'May 20 2007 12:00AM','Bsk10769','SHEKAR','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jul 25 2007 12:00AM','RJN22706','DR.M S GOPAL 9845024686','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Sep 2 2007 12:00AM','Bsk19839','WIPRO LTD','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jul 25 2007 12:00AM','HRB114972','SHIVANANDA','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Nov 16 2007 12:00AM','HRB119329','RAJESH 9986423294','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Nov 18 2007 12:00AM','HRB119410','M arjunan 9845241317','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'May 20 2007 12:00AM','Bsk17547','SHALINI CHOWDHARY','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jul 23 2007 12:00AM','Bsk18755','J JAYAKRISHNAN / ASHA','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Oct 18 2007 12:00AM','Bsk20912','HUSSAIN','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Sep 8 2007 12:00AM','Gnr15265','MR NAGARAJA URS','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jun 20 2007 12:00AM','Bsk18061','HARISH KUMAR','ARCTIC','STABLIZER','200HC' UNION ALL

    SELECT 'Jul 19 2007 12:00AM','Bsk18676','VEERABADRACHAR','ARCTIC','STABLIZER','300HC' UNION ALL

    SELECT 'Apr 14 2007 12:00AM','Bsk13269','SMITHA','ARCTIC','STABLIZER','300HC' UNION ALL

    SELECT 'Aug 16 2007 12:00AM','RJN23393','AYYANAGOWDA 9845986641','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Jun 6 2007 12:00AM','RJN21282','APPJI 9844439614','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'May 21 2007 12:00AM','CLK14597','CHRISTOPHER','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Nov 1 2007 12:00AM','RJN26000','RJN 1005 AEW CUSTOMER','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Nov 25 2007 12:00AM','Bsk10379','YOUGESH','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Jun 3 2007 12:00AM','HRB113294','PADMA SHARMA 9945787878','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Apr 3 2007 12:00AM','Bsk16740','SATYANARAYANA PRAKASH','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Nov 7 2007 12:00AM','Bsk21643','G S GOPALASWAMY','ARCTIC','STABLIZER','50R' UNION ALL

    SELECT 'Oct 20 2007 12:00AM','RJN25461','ALUMUNIYAM SALES CORPORATION','ARCTIC','STABLIZER','AC STABILIZER' UNION ALL

    SELECT 'May 11 2007 12:00AM','HRB112626','THRIVIKRAM NAIDU','ARCTIC','STABLIZER','AC STABILIZER' UNION ALL

    SELECT 'Apr 8 2007 12:00AM','Bsk16820','BALAJI K B','ARCTIC','STABLIZER','AC STABILIZER' UNION ALL

    SELECT 'Jul 3 2007 12:00AM','RJN22060','LAITH SONY 9342358941','ATALSO','BLENDER','HAND MIXER' UNION ALL

    SELECT 'Jul 4 2007 12:00AM','RJN10001','Customer','ATALSO','BLENDER','HAND MIXER' UNION ALL

    SELECT 'May 16 2007 12:00AM','MYS15079','THE PRINCIPAL','BAJAJ','AIR COOLER','NONE' UNION ALL

    SELECT 'May 19 2007 12:00AM','MS10269','DEVRAJ D.N ELECTRICAIN RMC STAFF','BAJAJ','AIR COOLER','NONE' UNION ALL

    SELECT 'May 19 2007 12:00AM','MS10267','M.V. SRINIVAS RMC INCHARGE','BAJAJ','AIR COOLER','NONE' UNION ALL

    SELECT 'Sep 6 2007 12:00AM','RJN24037','S Naga kumar','BAJAJ','ELECTRIC COILER','ELECTRIC COILER'

  • That's better, but guess what this code means?

    SELECT BillDaten, CustCode, COUNT(*) AS ThingsThatNeedPivoting

    FROM Sample_Data

    GROUP BY BillDaten, CustCode

    HAVING COUNT(*) > 1

    BillDaten CustCode ThingsThatNeedPivoting

    ----------------------- ------------ ----------------------

    (0 row(s) affected)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Please, teacher, I know, I know :w00t:

    (sorry, I'm in a good mood today, so I couldn't miss the opportunity for a stupid joke 😛 )

  • And now on a serious note... this is a bit scary for me:

    all columns from brand---subcategory need to be renamed and included/repeated for all items in a single purchase by a customer

    You shouldn't do that... not even if you think you know how many items in one purchase is a maximum. And you can NEVER know that, there can always be one more tomorrow or next month or next year. But even if we accept that you will never have more than 3 items in one order, as soon as you create several columns (item1, item2, item3), your evaluation query needs to take into account 3 columns instead of one. You need to write the query so that it recognizes that these purchases are identical:

    item1=FAN, item2=CD-ROM drive, item3=MOTHERBOARD

    and

    item1=CD-ROM drive, item2=FAN, item3=MOTHERBOARD

    and

    item1=MOTHERBOARD, item2=CD-ROM drive, item3=FAN

    and

    item1=MOTHERBOARD, item2=FAN, item3=CD-ROM drive

    and so on - I'm lazy to write all combinations.

    Now imagine you can have 4 items in a purchase...beginning to look a bit scary, isn't it?

    While as long as you leave the data in a normal table, you have just one column and no variations for different ordering. Don't you think it is better?

  • Zypsy absolutely wants to use a spreadsheet because of some functionality that's easily available there. Although I normally oppose denormalized pivots for processing, Zypsy may be correct here and such a dynamic pivot can be done pretty easily provided the pivot doesn't exceed the maximum number of columns in the spreadsheet. I'm just waiting for some decent data with at least one customer who bought more than 1 item in a day to test with. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Perhaps Zypsy just needs a little of what I'll call "solution motivation". It seems fairly simple that given the existing data structure, the proposed analysis involving what the probability is that a customer that bought B as well as A is exceedingly simple. I don't have time to map to exact field names right now, but consider the following as "concept code":

    DECLARE @ORDERED_A TABLE (ORDERID int PRIMARY KEY)

    INSERT INTO @ORDERED_A

    SELECT DISTINCT ORDERID

    FROM ORDERS

    WHERE ITEM = 'A'

    DECLARE @ORDERED_B TABLE (ORDERID int PRIMARY KEY)

    INSERT INTO @ORDERED_B

    SELECT DISTINCT ORDERID

    FROM ORDERS

    WHERE ITEM = 'B'

    DECLARE @HOW_MANY_ORDERED_BOTH AS int

    SELECT @HOW_MANY_ORDERED_BOTH = COUNT(A.ORDERID) FROM ORDERED_A AS A INNER JOIN ORDERED_B AS B

    ON A.ORDERID = B.ORDERID

    DECLARE @HOW_MANY_ORDERED_A AS int

    SELECT @HOW_MANY_ORDERED_A = COUNT(*)

    FROM @ORDERED_A

    DECLARE @PCT_ALSO_ORDERED_B AS decimal(10,8)

    SET @PCT_ALSO_ORDERED_B = (@HOW_MANY_ORDERED_BOTH * 1.) / @HOW_MANY_ORDERED_A

    Or am I missing something? Seems like this could also be fairly easily extended to do almost any given number of items.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • i have got the table i want by running something like

    SELECT billdate,custcode,custname,

    MAX(CASE WHEN Seq=1 THEN brand ELSE NULL END) AS brand,

    MAX(CASE WHEN Seq=1 THEN item ELSE NULL END) AS item,

    MAX(CASE WHEN Seq=1 THEN cat ELSE NULL END) AS cat,

    MAX(CASE WHEN Seq=2 THEN brand ELSE NULL END) AS brand2,

    MAX(CASE WHEN Seq=2 THEN item ELSE NULL END) AS item2,

    .

    .

    .

    FROM

    (SELECT ROW_NUMBER() OVER (PARTITION BY billdate,custcode,custname

    ORDER BY billdate,custcode,custname) AS Seq, *

    FROM MyTable)t

    GROUP BY billdate,custcode,custname

    To answer Vladan's "You shouldn't do that... not even if you think you know how many items in one purchase is a maximum. And you can NEVER know that, there can always be one more tomorrow or next month or next year."

    An analysis of the dataset revealed that the maximum number of items in an order is 5. The market basket analysis i will be doing is based on historical data, and the results will be used for selling items as bundles, selling/offering promotional coupons, determining product placements/locations (on shelves), etc. This is not the real-time application of Association Rules you see in websites like Amazon.

    Regarding Jeff's query, "I'm just waiting for some decent data with at least one customer who bought more than 1 item in a day to test with"

    This data is from an electronics/home appliances store; items are normally big and home delivered. Very unlikely for customers to come to the shop more than once in a day. And in the rare event that a customer has more than one order in a day, the second order will have the same custcode as the returning customer normally mentions his/her name (and sometimes the mobile #) and the salesperson will look it up in the day's record.

    Based on some criteria, I have reduced the number of orders in the dataset

    from about 42000 to about 6000. The next step is to count the number of times an item appears with/without combination with other items. And then I will calculate the support, confidence, and lift for the rules.

    I know this is going to be a tedious process but in the absence of the Data

    Mining editions/modules of MS SQL, SAS and SPSS (which have the Market

    Basket Analysis feature/wizard/function), I guess I have no choice.

    Thanks again!

    -zypsy

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply