Fact Table Design Question

  • Please forgive the cross-posting, I wasn't sure the best place to put this question.

    I'm modeling a procurement process as part of a data warehouse project - and in this case what we're procuring is people (filling jobs). The basic process is that the company creates a requisition, and that requisition will be to hire X number of people for a certain type of job. People then apply for those jobs until we hire enough to fill the number of openings from the requisition.

    In the transactional system, there are two main tables that hold the information for this process. One is Requisition, which has info like the number of openings and the min and max wage for the jobs. The second is RequisitionApplicants, which has a row for every person that applies for the job, and a flag that indicates of that person was hired or not. For the most part, the facts we're concerned with are counts - how many Requisitions there are by the age of the requisition, department, etc... and how many applicants we had vs how many hired by various dimensions.

    There's a disagreement among the design team about how this should be modeled. My contention is that there should be two fact tables, one based on the Requisition, another based on RequisitionPeople. This will allow us to easily do both kinds of counts that we need and keep the design clean. It would look like:

    FactRequsition

    --------------

    AgeBucketKey

    DepartmentKey

    NumberOfOpenings

    MaxRate

    FactRequisitionApplicant

    ------------------------

    PersonKey

    StatusKey

    HiredFlag

    Another team member says we should combine them into one table with the granularity being the applicant from RequisitionPeople - rolling everything up into one table. His design would be:

    FactRequisitionApplicant

    -----------------------

    AgeBucketKey

    DepartmentKey

    PersonKey

    StatusKey

    NumberOfOpenings

    MaxRate

    HiredFlag

    The dimension keys from the old FactRequisition table would be repeated for every applicant to that requisition. Similarly, the facts would be repeated as well.

    I have a lot of problems with this. The grain feels all wrong and incoherent. It seems like we're unnecessarily repeating data and dimension keys. It also seems like the counts I need to come up with will be more difficult to obtain. For instance, if we have 5 requisitions and 22 people have applied to the various jobs from those requisitions, we'll have 22 rows in the fact table. Counting the applicants is easy, but counting the requisitions is more difficult.

    My question is - am I right? Which design is better?

  • I would tend to agree that your design is better, the other design would appear to make it much more difficult to generate meaningful results for an average end user.

    I would also be looking to modify your design slightly though. Not sure if you been planning to use the Req as a Fact and a DIm but I would be leaning that way. I think the Req needs to be a fact so you can get your counts etc and let you users analyze the Requisition info, but using it as a DImension tied in to an Applications Fact (not Applicants) which is then related to the Applicants Dim (also a Fact, used for raw counts, demographic analysis etc). As you've provided it, the Applicants fact looks wrong to me to have the Hired flag in it - this would tel me that a single Applicant can be hired only once, and doesn't tell me for which Req (or Reqs) they were hired.

    FactRequsition (also a Dim)

    --------------

    [Req_ID]

    [TimeKey?]

    AgeBucketKey[maintenance of this?]

    DepartmentKey

    NumberOfOpenings

    MaxRate

    [Other Req attributes, used in the Dim]

    FactApplications

    ----------------

    Req_ID -- ties this to the Requisitions dimension

    Person_Key -- this this to the FactRequisitionApplicant (a fact & dim)

    Hired -- bit flag, count these up

    [Applied_Flag] -- again a bit, in case you just want to sum up the records to get your total applicants count for the Reqs, this is instead of having to do counts, your choice.

    FactRequisitionApplicant (also a dim)

    ------------------------

    PersonKey

    StatusKey[you didn't mention this one, not sure of purpose]

    HiredFlag [not sure if you want this here, means a person is either hired or not, and doesn't allow analysis for when people are hired to multiple positions over time]

    [Other Demographic attributes of Applicants, e.g. State, City, Key Skills etc]

    The above design would allow you to do a sort of bi-directional analysis -- see counts of applicants by city etc for given Reqs, but also let you view an Applicant, and see how many times they've applied to Reqs, how many times they've been hired etc.

    Slightly off topic but would be interested to see how you'l manage your 'Age Bucket Key' on the FactReq. To me this is temporal (e.g a Req was a week old a week ago, this week it's two weeks old) - you'd have to be updating the keys on a regular basis to get a true indication of the age of the Req. Possibly adding a TimeKey and Time Dim would allow your users to see counts of Reqs opened in on days, in given weeks, months, qtrs etc. I'd imagine that this would also let you see things like count of Reqs for january, count of openings (total across Reqs for Jan), count of hires, and maybe with a little bit of work, count of applications (true applications not hires, allowing you to analyze the 'calibre' of candidates, e.g. you could have 1000 appplicants but hire only 5, not a good hit ratio when 20 were required).

    Hope this helps out somehow,

    Steve.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply