January 16, 2008 at 8:46 pm
Please forgive the cross-posting, I wasn't sure the best place to put this question.
I'm modeling a procurement process as part of a data warehouse project - and in this case what we're procuring is people (filling jobs). The basic process is that the company creates a requisition, and that requisition will be to hire X number of people for a certain type of job. People then apply for those jobs until we hire enough to fill the number of openings from the requisition.
In the transactional system, there are two main tables that hold the information for this process. One is Requisition, which has info like the number of openings and the min and max wage for the jobs. The second is RequisitionApplicants, which has a row for every person that applies for the job, and a flag that indicates of that person was hired or not. For the most part, the facts we're concerned with are counts - how many Requisitions there are by the age of the requisition, department, etc... and how many applicants we had vs how many hired by various dimensions.
There's a disagreement among the design team about how this should be modeled. My contention is that there should be two fact tables, one based on the Requisition, another based on RequisitionPeople. This will allow us to easily do both kinds of counts that we need and keep the design clean. It would look like:
FactRequsition
--------------
AgeBucketKey
DepartmentKey
NumberOfOpenings
MaxRate
FactRequisitionApplicant
------------------------
PersonKey
StatusKey
HiredFlag
Another team member says we should combine them into one table with the granularity being the applicant from RequisitionPeople - rolling everything up into one table. His design would be:
FactRequisitionApplicant
-----------------------
AgeBucketKey
DepartmentKey
PersonKey
StatusKey
NumberOfOpenings
MaxRate
HiredFlag
The dimension keys from the old FactRequisition table would be repeated for every applicant to that requisition. Similarly, the facts would be repeated as well.
I have a lot of problems with this. The grain feels all wrong and incoherent. It seems like we're unnecessarily repeating data and dimension keys. It also seems like the counts I need to come up with will be more difficult to obtain. For instance, if we have 5 requisitions and 22 people have applied to the various jobs from those requisitions, we'll have 22 rows in the fact table. Counting the applicants is easy, but counting the requisitions is more difficult.
My question is - am I right? Which design is better?
May 26, 2008 at 9:15 am
Hi,
That's a pretty simple scenario.
I would go with your design. You can have 2 cubes, each serving different purpose. If required, you can always have a virtual cube to combine these cubes to do a comparison.
Regards,
Vijay
Warm Regards,
Neel aka Vijay.
May 27, 2008 at 11:00 am
I agree. You seem to have a good understanding of the level of granularity, which is the fundemental concept to proper OLAP design. Without the data, I can't be sure, but I agree that the combined concept does not on the surface feel right. It would seem to be mixing grains.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply