August 2, 2005 at 4:15 am
Hi all,
Suppose I have the following tables:
PRODUCTS
(prd_id, prd_name, price, supplier, date_of_purchase)
VENDORS
(vendor_name)
I am building a cube with price as the measure, and 3 dimensions: Product_id (from prd_id), date_of_purchase, and supplier.
The supplier field maps to a vendor_name value in the VENDORS table.
In order to show all vendors even if there is no product associated with that vendor, when I build the Supplier dimension, I use the vendor_name column of the VENDORS table, and join the VENDORS and PRODUCTS tables at the cube level.
The problem is, it happens that in the VENDORS table, there are 2 records with the same vendor_name (there is no unique key or primary key in the table), let's call it "ADIDAS".
Suppose that in the PRODUCTS table, the total price of all products with supplier "ADIDAS" is 3,000. When the aggregations are calculated in the cube, the "ADIDAS" vendor will show a value of 6,000, because there are 2 ADIDAS values in the VENDORS table to join with each PRODUCTS record with supplier ADIDAS.
As I am not the designer of the database I cannot decide to remove the duplicate record. What can I do? I tried to set those allow unique names/keys properties in the dimension (set to false), but still the result is the same. Is it possible to do something like placing a "DISTINCT" keyword in the Source Table Filter (which I dunno how to do so as the filter is supposed to be a WHERE clause). Or is there anything I missed out?
Thanks for your help,
delpiero
August 2, 2005 at 5:53 pm
This is a pretty major fault with the DB design. If the two suppliers aren't meant to be the same entity, then there is no way to tell AS (or any other query) which one to assign the sales to.
If you can. i'd write a view over the suppliers table with a distinct clause in it, but as you've noted, this is really an arbitrary removal of one of the dupes.
Are you *sure* there is no other method of joining the transactions to the suppliers? If what you're seeing is truly how the source DB works, then looking through the front end (data entry) application, you'd not be able to differentiate between the two suppliers sales.
Steve.
August 3, 2005 at 7:19 am
In reality this is not an AS issue. It simply comes down to duplicates in your data that would affect any application using this information.
As stevefromOZ rightly points out "If the two suppliers aren't meant to be the same entity, then there is no way to tell" them apart. Therefore any record (in your fact table for example) assigned to that entity will be duplicated for each instance of it in the joining table.
If these are in fact the same entity then Distincting the data used to populate your vendors dimension is a reasonable solution. However, if this is not the case then you have a very diffucult job ahead!! You could potentially get around this by assigning a surrogate key to each member of your vendors demension and assign this appropriatly in the fact table depending on what instance of vendor you want to assign it to, i.e. surrogate_key 1 or surrogate_key 2. But this is all pointless though, if at the end of the day if you have no way of differentiating the two records as they will appear to be a single member 'ADIDAS' in your cube.
Good luck!
August 3, 2005 at 12:17 pm
I guess I'd join with a subquery like
SELECT DISTNCT vendor_name FROM VENDORS
Or create a similar view in the DB and join with that
August 8, 2005 at 10:40 am
Chances are that the VENDORS table has a vendor_id column that is unique. This is the column that should be joined to the PRODUCTS table -- not the vendor_name column.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply