Need Some Expert Advice to optimize this query

  • Description: I am selecting transactions by customeres who were added within the past 10 years.  Then group them by add_date in 12 month groups.  Then group transactions by transaction_date.  Grouping starts at current month - 12 months

    Ex: Customer add_date Group

         Transaction_Date Group

    So, Customers with an add_date between 3/31/2005 and 3/31/2006 would be grouped together.  then their transactions would be group by transaction_date between 3/31/2005 and 3/31/2006, 3/31/2004 and 3/31/2005, etc...  going back 10 years.  Obviously the most current Customers added would have fewer transaction groups.

    Table look like this:  BTW - our production transaction table has over 12 million records

    Customer_Table (Customer_ID, Add_Date)

    Transaction_Table (Transaction_ID, Customer_ID, Transaction_Date, Transaction_Amount)

     

    My query looks like this:  Query takes nearly 2 minutes to complete.

    Select Disctinct  SUM(t.Transaction_Amount) as Amount

    -- This code assigns a group number.  For example customers added from 3/31/2005 to 4/1/2006 would be assigned a CustomerAddGroup number of 0.

    -- The customers added 2 years ago would have a CustomerAddGroup number of 1 and so on...

              , FLOOR((DATEADIFF(MM, c.Add_Date, '3/31/2006')-1 / 12) as CustomerAddGroup 

              , FLOOR((DATEDIFF(MM, t.Transaction_Date, '3/31/2006')-1 / 12) as TransactionDateGroup

    From Customer_Table c

    Inner Join Transaction_Table t

    on c.Customer_ID = t.Customer_ID

    Where c.Add_Date <= '3/31/2006'

    And c.Add_Date >= dateadd(yy, -10, '3/31/2006')

    And t.Transaction_Date <= '3/31/2006'

    And t.Transaction_Date >= DateAdd(yy, -10, '3/31/2006')

    Group BY  FLOOR((DATEADIFF(MM, c.Add_Date, '3/31/2006')-1 / 12) as CustomerAddGroup

                  , FLOOR((DATEDIFF(MM, t.Transaction_Date, '3/31/2006')-1 / 12) as TransactionDateGroup

     

  • Some Pointers here:

    1. All Dates should be converted to datetime variable prior to query execution.

    2. Make sure you have indexes in your searched (and joined)  columns

    3. "Distinct" is not necessary - in my opinion- for that query

    4. Not sure what you need to accomplish with: FLOOR((DATEADIFF(MM, c.Add_Date, '3/31/2006')-1 / 12). Can you be more explicit here? Don't you have a missing parenthesis there ?

     


    * Noel

  • here is an example of how I would go about it:

    Declare @Firstdate datetime, @LastDate datetime

    select @FirstDate = dateadd( mm, -12 *10, '20063103'), @LastDate = '20063103'

     

    Select      SUM(t.Transaction_Amount) as Amount

              , month( c.Add_Date) as CustomerAddGroup

              , convert(datetime,datediff(day,0,t.Transaction_Date)) as TransactionDateGroup

    From Customer_Table c

         Inner Join Transaction_Table t

         on c.Customer_ID = t.Customer_ID

    Where     c.Add_Date <= @LastDate

          And c.Add_Date >= @FirstDate

          And t.Transaction_Date <= @LastDate

          And t.Transaction_Date >= @FirstDate

    Group BY   month(c.Add_Date) 

             , convert(datetime,datediff(day,0,t.Transaction_Date))

     


    * Noel

  • I appreciate the response, but i think your solution wont work for us because you are grouping by month.  I need to group the results by groups of customers within 12 months then their transactions over a 10 year period in groups of 12 months.

    It would look like this in excel.  Each group is a sliding 12 month grouping.  So for all customers added between 03/06-03/05 would have there transactions listed below that column grouped by transaction group.

                                                                      Customer Add Groups

    Transactions                  03/06-03/05              03/05-03/04              03/04-03/03             03/03-03/02

    By Customer Add Group

    03/06-03/05                   100                         200                           300                        400

    03/05-03/04                   0                            100                           200                        300   

    03/04-03/03                   0                             0                             100                        200 

    03/03-03/02                   0                             0                             0                            100

     

  • Ok Now I think I understand what you need.

    If you can not persist the calculation some how you won't be able to use an index on the transaction_date nor on the add_date for grouping purposes.

    If you have already implemeted all the ponts in my previous post all that is left is the implemetation of some sort of persistence.

    Create a calulated column or an indexed view that implement those values and if you go with the calculated column don't forget the index on it. At that point your query should look like:

    Select      SUM(t.Transaction_Amount) as Amount

              , Calculated_Customer_group as CustomerAddGroup

              , Calculated_transaction_group as TransactionDateGroup

    From Customer_Table c

         Inner Join Transaction_Table t

         on c.Customer_ID = t.Customer_ID

    Where     c.Add_Date <= @LastDate

          And c.Add_Date >= @FirstDate

          And t.Transaction_Date <= @LastDate

          And t.Transaction_Date >= @FirstDate

    Group BY  Calculated_Customer_group

     , Calculated_transaction_group

    hth

     


    * Noel

  • Thanks Noeld.  My original system design was to encapsulate this in a process and archive the results.  Unfortunately, the project requestor wants to be able to run this for any date range using a web application.  Which requires the CustomerAddGroup and TransactionGroups to slide.  The 12 month window always slides, so it's not static.

  • I thought about that too and you can make it static by simply defining group date 0 as the earliest date in your db and increment from that to current date. This is the opposit of what you have now, you can always increment and your 0 reference is static

    Then it will be extremelly easy to persist those values all needed afterwards is to generate the translation lookup (translate from 'GroupNumber' to 'DateRange') which is also static by the way

     

     

     


    * Noel

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply