November 16, 2007 at 9:04 am
I am cutting my teeth on star schema design. I have a simple star schema I am building for Headounct analysis at work. I have a factless fact table where a row represents a head in the company. Each head is toed to a particulat week in a Date dimension tabel. There are additional dimensions for things like gender, ethnicity, marital status, age, etc. Now in my department dimension - it's hierarchical. In the DimDepartmnet there is a department which belongs to a company. Comapnies belong to divisions. Now the fun part. Each division has a headcount target for each year. Up to this point I am in a perfect star schema (no snow flaking). How would I integrate in this concept of a headcount target for each division for a given year?
We are using cognos on top of this star schema to provide reporting and analysis services if that is relevant. From the Star Schema design stand point... any thoughts?
Christian Loris
November 16, 2007 at 10:07 am
So I assume you have a Date dimension table, Company table, what about gender, ethnicity, marital status, does each one have its own dimension table or all these belong to a customer table?
I like to use surrogate key to build my fact table.
Date table
DateID INT IDENTITY(1,1),
DateTime DATETIME,
DateYear VARCHAR,
DateMonth VARCHAR,
DateDay VARCHAR...
You can add all kind of date related columns in this table.
Company table
CompanyID INT IDENTITY(1,1),
CompanyName VARCHAR
Division VARCHAR
Department VARCHAR
Aside from gender, ethnicity, the fact table should be
HeadcountFact
DateID
CompanyID
This is factless table which will work find. If you want to find out how many headcount in particular year in particular division.
SELECT COUNT(*)
FROM HeadcountFact h
INNER JOIN Date d ON h.DateID = d.DateID
INNER JOIN Company c ON h.companyid = c.companyID
WHERE h.YEAR = '2006' AND c.DIVISOIN = 'IT'
However I am not familiar with cognos, so I don't know if it can use factless table.
If it can't use factless table, then you have to create a headcount measure in the fact table.
Headcounttable
DateID
CompanyID
Headcount
my 2 cents
Am I missing something?
November 16, 2007 at 10:58 am
Loner:
You've got pretty much what I have got as far as design. The thing that is missing is that I have a headcount target for each division for each year. So for example Division A has a headcount target of 2000 for 2005 and a traget of 2100 for 2006... and so on. How does this figure into the picture?
Chris
November 16, 2007 at 11:30 am
Chris, one way to do this would be to have another fact table - headcount target and join it with date and division dimensions which you already have.
If I understand your problem correctly, you have a headcount target value for each division for each year. That makes headcount target a measure with a different granularity than your headcount fact table. So it needs to go in fact table of its own.
November 16, 2007 at 12:04 pm
Ok. That seems simplistic. I am not too familiar with the AS stack and cubing at this point - but I am assuming if I wanted to report or conduct analysis, I would just have to point to both cube/data sources. I could then view, for instance, an analysis table listing the headcount at a division for each week. And then using the other fact table, correlate the division and year, and show the delta for that week from the annual target. I just have to ensure I can map the business keys for the divisions between the two fact tables.
Chris
November 20, 2007 at 2:21 am
Hi Chris,
I go along with Loner's solution except where a table occurs that stores hierarchical information, then I would always associate it with its lowest level of granularity and hence name it accordingly. The lowest level of granularity here is Department. Therefore, in my design, it would be known as DEPARTMENT_DIMENSION rather than COMPANY.
You seem to be objecting to snowflaking although I'm unclear why. If you can accept it, then I suggest you create a DIVISION_DIMENSION containing DIVISION_ID (foreign key on DEPARTMENT_DIMENSION), YEAR (enables you to build up headcount history) and HEADCOUNT.
Personally, I like to keep things as simple as possible and having multiple fact tables strikes me as over engineering.
If you need some convincing about snowflaking then consider this point: In a data warehouse to be used for marketing purposes, then typically the CUSTOMER_DIMENSION will be a slowly changing dimension. If the address key is not on the CUSTOMER_DIMENSION (snow flaking), but perhaps on the fact table, then how do you ensure you have selected the most recent address for a customer for a direct mail marketing campaign - especially if the criteria for selecting the customer was a match on a non-current row of the CUSTOMER_DIMENSION?
I hope the above helps.
November 21, 2007 at 6:58 pm
When you build up the fact table, you always ended up with the count of the lowest level unless you put it under Analyzer Service and it will do the count for you for all levels. Otherwise what you do when you want to get the upper level is the set the query with group by and where clause.
SELECT c.Department, SUM(Headcount)
FROM HeadcountTable h
INNER JOIN Companytable c ON h.companyid = c.companyid
WHERE c.Department = 'IT'
GROUP BY c.Department
November 22, 2007 at 1:44 pm
I don't think snowflake schema is necessary for this design. This is just a typical company dimension.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply