July 22, 2007 at 4:36 pm
We have system for one of your client, which collects the customer information from different data sources (65+) and stores in a relational model. Any changes to the customer information is logged in log tables. My question is, should I call this system as a datawarehouse or a Operational data store (ODS). Is it necessary for data in a datawarehouse to be in dimensional model ?
July 24, 2007 at 7:33 am
I'm a beginner at datawarehousing, so you might not put much weight on my response:
The dimensional model is necessary for a datawarehouse. You have described an ODS (see http://www.dmreview.com/article_sub.cfm?articleId=469). The point of the dimensional model is to provide fast reporting in a model understandable to the users. A relational model does not do that because most queries have to pull together many rows from multiple tables and that takes far too long.
You can use the model you have described as a datawarehouse. You can also sprint in clogs. I wouldn't recommend either.
(By the way, I am disregarding your sentence about the log table, because I'm not sure at what point you do that or how it fits in.)
July 24, 2007 at 9:15 am
I am also a beginner at this, so someone correct me if I'm wrong, but here's what I've gotten from the articles I've read.
Once you've determined where all your source data is coming from, which pieces of data you're going to need and what items (reports, etc) you're going to have to furnish to the end users, you have to set up a database. Truth be told, it doesn't matter what you call it (a lot of different people call them by different names) but it still comes down to being a database. You design your database based on the final needs of how the data is to be accessed.
Keep in mind that things like OLAP cubes and reports will be built "on top" of the database and that your database (data mart, data warehouse, ODS, what ever you want to call it) will be denormalized to a certain extent. You can take it all the way and do a star schema there if you want or only take it part of the way. But it should NOT be a mirror of one of your data sources because your data sources are probably transactional dbs and require much more data and much more normalization than your datawarehouse will need.
Currently, the one I'm playing with has a double star schema in it. Products, Dealers, Distributors and Agents are all pointing to a center "Transactions table" which houses all the keys. An Orders table also points to the Transactions table. Because different departments sell different products, the finanicals ended up in several different data sources, so the second star has the different finanical tables and things specifically relevant to Orders pointing to the Orders table instead of the Transactions table. Hence, double-star with a connector between Orders and Transactions.
BTW, I'm still not too clear on the difference between a datawarehouse and datamart. Supposedly there is one, but it seems to me that they are both databases and the datamarts are just more specialized than the general datawarehouse. If that's the case, and you don't have a ton of sources (less than 5), I see no reason to use datamarts. If someone disagrees with my whole datamart / datawarehouse concept, please let me know. I like hearing other people's insight on BI.
July 24, 2007 at 9:58 am
At risk of taking this forum thread on a tangent, a data mart is a subset of a data warehouse. Unless you are working with a very simple business, you will eventually need multiple data marts in your data warehouse. Each mart deals with a business process. For example, if your business is a Web site, you would have one mart for user sessions, another for order line items, etc.
July 24, 2007 at 10:02 pm
I'll vote with Ralph on this one, what you've got is best described as a operational data store...
Joe
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply