May 22, 2007 at 12:07 am
I have a question I'm hoping someone will be able to help me with - I've been working on it for a while and can't seem to find an obvious solution.
Basically I have a fairly standard transaction fact table, and customers may have made multiple transactions over the selected range.
I need to know how many customers made 1 order, 2 orders, 3 orders, etc. As far as I can tell, I'm effectively doing a count of a count.
It seems to me to be a fairly straightforward request, however I can't work out a way to do it in Analysis Services. If someone can help me out or point me in the right direction I would be very grateful
Thanks,
Matt
May 22, 2007 at 4:59 pm
Hi Matt,
you need a customer dimension and a facttable (your transaction table). The customer dimension you could build by using customer data from your legacy system or you can do a view with select distinct customernummer, customername from your fact table.
You have to connect the dimension and the fact table in the data source view by using a foreign key relationship between the customer key in the dimension (e.g. customerid, number) and the reference customer key in your fact table
(this has no effect at the underlying system)
Then you add a new measure to you measuregroup (=fact table in your case) using count as aggregation type. You could do a count on the customer number for example or any other NOT NULL column in the fact table.
When I do those things I mostly use a view for the fact table including a select 1 as columnx, This is cheap, I'm sure that a value is in the column and Integer values do not waste too much bytes.
When you processed your cube and select your customer in the dimension then the measure should show the number of transactions for the selected customer.
Best regards and good luck,
Stefan
SK
May 22, 2007 at 9:02 pm
Hi Stefan
Thanks for your reply, but I'm afraid I don't think you quite understood what I was asking. At the moment I have about 800,000 customers in the database, so I don't want to see the order count for each one individually.
Ideally what I want is something like:
Orders Customers
1 1050
2 2570
3 5890
etc. which tells me that over the selected period, 1050 customers only made a single order, 2570 customers made 2 orders, etc.
So effectively I need to do a count of the number of orders, then a count of those counts.
At the moment I'm looking into using sets, so I can do the counts over month ranges, so if anyone has any suggestions about sets or otherwise it would be much appreciated still.
Thanks,
Matt
May 23, 2007 at 12:23 am
I'd say you need a new dimension, the ORder Count Group dim. It's likely that this dim would be related only to a new, higher level aggregate fact table. In AS2K5 you could achieve this reasonably easily with the ability to have multiple fact tables/measure groups per cube; AS2K could take a little more work but should produce the same outcome.
A poor mans way (but quite possibly the only way? in AS2K) to approach this would be to create a view (or table your choce depending on performance and required latency of the data) that holds the timeperiod reference (eg some month identifier like 200705), the customer key and the (summed from the lower transaction/fact table) count of orders. You could then layer another view over this that is either hard-coded or uses a lookup to group the customers in to order counts per time period. You would need to create i) a new key in that view that relates to ii) a new dimension whose keys relate to textual names such as 0, 1, 2, 3..., >= 10.
If you've used/created your Time dimension as Shared, then create a new cube that references this dimension with this new fact table and the new 'Order Count Group' dimension. Then use a virtual cube to bring in the measures from both fact tables and all relevant dimensions.
Steve.
May 23, 2007 at 1:18 am
Hi Steve
Thanks for your thoughts also. Essentially that's what we're doing at the moment. Unfortunately creating the aggregate tables in the database require much more direct maintenance, and effectively doubles the size of our database, so I'm looking for other approaches.
I think that defining sets for each month might help solve this, however they still require the time periods (each month) to be predefined, and ideally I would like this approach to work across any selection of the data.
Maybe I'm trying to do something that's too complex here, but I would have thought that something like this would be a fairly common reporting request.
Thanks,
Matt
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply