Help with Customer Classification Script

  • Hi there,

    I have been scrutinising the quality of the SQL code from our 3rd Party software supplier and came across some code which would win the RBAR awards (at least the European RBAR awards). I look forward to what Jeff has to say about this ๐Ÿ˜›

    This is the scenario:

    We want to perform a customer classification on a regular basis using the RFM(R) method (Basically customer scoring according to order history).

    The way the software company does it is as follows:

    {Pseudo-Code}

    START LOOP IN JAVA

    Find customer

    {start collecting information for customer}

    find all orders for this customer

    find all orders for this customer in the last 60 days

    find last 3 orders for this customer

    find all returned items for this customer

    find all catalogues for this customer

    {end of collection}

    get next customer goto top of loop

    {End of Pseudo-code}

    As you can (hopefully) see, they have basically moved the oh-so-evil cursor out of the T-SQL and into Java. They were wondering why it took 3+ days to classify our customers (5+ million).

    When I saw this in the profiler I nearly fell off my chair. This is what they call batch-processing :w00t:

    So I thought I would be able to outperform them and decided to try it out. I would appreciate it if you could help me here and give me some pointers if I am doing anything terribly wrong.

    The way that I would attack this would be set-based (รก la Jeff Moden). I have read alot about his aggressive advertisement of RBAR and how to avoid it, and try to keep this in mind.

    If you're still reading, here is my proposed method ๐Ÿ™‚

    For each of the loop steps above I would collect the information for all customers at once into pre-tables (not in tempdb, but as real tables). Once they are there, push it all into one final table and then update the customers with their new RFM score.

    This is really easy except for one part - find last 3 orders for customer. I had some trouble getting my head around this in a set-based way. In a cursor, piece of cake, but my brain is a bit fried at the moment (built a house, had a child, built a datawarehouse, had NO holidays all in the last 18 months).

    I came up with the following method to do this:

    {Disclaimer: Tables,Columns etc are all faked. I also wouldn't do a select * on the customers table but you get the idea!}

    Grab all orders per customer and give them a rank, youngest order has rank 1

    SELECT

    OrderID,

    CustomerID,

    OrderDate,

    Row_number() OVER (PARTITION BY CustomerID ORDER BY Orderdate DESC) AS lastOrderRank

    INTO

    lastorders

    FROM

    Orders

    Grab all customers details and the information (if any) on the youngest 3 orders from the lastorders table I just created.

    SELECT

    l.OrderID,

    l.Orderdate,

    c.*

    FROM

    Customers c WITH (NOLOCK)

    LEFT JOIN lastorders l

    ON l.CustomerID = c.CustomerID AND

    l.lastOrderRank < 4

    As far as I can see, I get the youngest three (or less if the ordered less times)orders for each customer we have registered.

    I discussed this with a fellow DBA/DBDev and we were both happier with this than the Java-Cursor from hell that has been offered up to us.

    Now my question to the SSCers: Am I going about this the right way, or should/could I do it better/differently?

    I have no problems with totally different methods, I would appreciate any input on this, even if it's to say that the JAVA-Cursor is a lovely idea! ๐Ÿ˜‰

    Thank you for your patience, if you've made it so far. Hopefully you could understand my ramblings and feel inclined to answer me.

    Regards

    GermanDBA

    Regards,

    WilliamD

  • You are on the right track. Just make sure that your final query is covered by the appropriate indexes and you will finish *much* faster than the "java-cursor" I believe all you need is just a single query ๐Ÿ˜‰


    * Noel

  • GermanDBA (6/11/2008)


    This is really easy except for one part - find last 3 orders for customer. I had some trouble getting my head around this in a set-based way.

    I came up with the following method to do this:

    ...

    Grab all orders per customer and give them a rank, youngest order has rank 1

    SELECT

    OrderID,

    CustomerID,

    OrderDate,

    Row_number() OVER (PARTITION BY CustomerID ORDER BY Orderdate DESC) AS lastOrderRank

    INTO

    lastorders

    FROM

    Orders

    Grab all customers details and the information (if any) on the youngest 3 orders from the lastorders table I just created.

    SELECT

    l.OrderID,

    l.Orderdate,

    c.*

    FROM

    Customers c WITH (NOLOCK)

    LEFT JOIN lastorders l

    ON l.CustomerID = c.CustomerID AND

    l.lastOrderRank < 4

    You actually want to roll these together:

    WITH lastorder(OrderID, CustomerID, OrderDate, lastOrderRank) AS

    (SELECT

    OrderID,

    CustomerID,

    OrderDate,

    Row_number() OVER (PARTITION BY CustomerID ORDER BY Orderdate DESC)

    FROM Orders)

    SELECT

    l.OrderID,

    l.Orderdate,

    c.*

    FROM Customers c WITH (NOLOCK)

    LEFT JOIN lastorders l ON l.CustomerID = c.CustomerID

    AND l.lastOrderRank < 4

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Hi RBarryyoung,

    I thought about that but wasn't sure what was better, to go the CTE way or to build a pre-table with a better index. I have played with CTEs before and seen that they can be very usefull but not always.

    Seen as the CTE runs over all orders for all customers and does the row_count stuff too, I though it better to get that done and then join it to the customers afterwards. I will try it out both ways and see which is quicker.

    regards

    GermanDBA

    Regards,

    WilliamD

  • Hi again,

    I just thought I'd give a quick heads-up. The CTE method did beat the pre-table + index that I initially suspected to be quicker.

    I also had all of the steps from the evil-cursor pump into intermediate tables for all customers in under 10 minutes - SET BASED WORK FTW!!!!!!! ๐Ÿ˜›

    Thanks for your comments and tips.

    GermanDBA

    Regards,

    WilliamD

  • Great! Glad it worked out.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply