August 2, 2007 at 11:02 am
We just had a consultant in to do work. They built a few tables in our 'staging' database and also in our 'datamart' database. We have an ETL tool to load data from the staging into the datamart.
I now 'own' the work they delivered.
The source data is extracted from an operational system into MS Access files (ugh!... but we have no control on this).
There are 80+ fields in the extract that are loaded into staging and about 45 fields loaded from the staging into a fact table in the datamart.
The consultant built the datamart tables so that every text field in the fact table is 'VarChar 1020'...!
There are about 35 fields with this attribute.
They claim that because they are varchar it doesn't matter, but something about looking at the table schema and seeing 35+ fields all with a length of 1020 chars just doesn't seem right.
Many of the source fields are small (1, 2, 8) char lengths.
Is this design crazy (i.e. should I be adjusting it) or should I just leave it alone?
August 2, 2007 at 12:07 pm
August 2, 2007 at 4:10 pm
Next time when you are looking for a consultant, make sure that person knows what a data warehouse is ? Also most companies have this mistake, including my current company, they hand the project to the consultant(s) and let them do everything. There is not one single internal employee working with the consultants, now the consultants are leaving and no one knows what they had done. It is going to be a disaster. I had warned the management once before because I had been in this situation a lot of times, but did they listen, noooooooooooo!!!!!!
Now I am watching the history repeats itself again.
There should not be more than 8 dimensions in a fact table otherwise you are going crazy, the table will be too big. You have to know the source data before making any changes, basically you have to start from scratch. I am sorry.
August 2, 2007 at 4:13 pm
BTW, why varchar(1020) ? How did he pick this number?
I agreed at this point, if it is working, don't do anything. Once you start making changes, you basically have to re-write the whole system.
August 3, 2007 at 2:12 am
I'm not the smartest when it comes to the intricacies of SQL Server storage, but I *think* the consultant is right in saying that the 1020 doesn't matter because it is variable length. Please tell me if I am wrong!
When I am designing tables for datawarehouses I tend to make the field size pretty big (and variable), to cater for data that is not guaranteed to be a certain size in the future. That is, a 'Description' field may be varchar(255) in the source, but a) an upgrade to the source system can change that pretty quickly, possibly crashing your ETL batch and b) datawarehouses should be source-agnostic, and that means the entire source system should be able to change without necessarily changing the end tables. Not sure exactly where the 1020 comes from, but hey, it's a big number where a big number is called for, so fair enough.
As for not having more than 8 dimensions, I'm afraid that this is the real world and businesses are complex. You've got to match the design to the needs of the business, and if you insist on staying to an arbitrary number of dimensions because more might be confusing, then you run a very big risk of not fulfilling your customers' requirements.
August 3, 2007 at 1:15 pm
It is recommended that the fact table should not have more than 8 dimensions. Of course you can have more than that. Actually if your fact table has 8 dimensions, it is already pretty big. Maybe you can re-define some of your dimension table or make another fact table.
The customers always want everything. But you don't have to stick to one design or one way of doing thing. Maybe it is just me. 🙂
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply