March 12, 2008 at 7:54 am
Hi,
In relational OLTP databases, tables should all have a primary key constraint.
In a datawarehouse, is this also a good practise for fact tables ?
I'm convinced that a fact table should be implemented as a clustered index ( on a meaningfull column like its identity column), but should additionaly a primary key constraint be defined on the combination of the foreign keys to the dimensions.
What is a good rule of thumb here ?
Regards,
Franky
Franky L.
March 12, 2008 at 12:06 pm
Yes, you should have a primary Key and a Clustered Key on an OLAP table.
They do not necessarily have to be the same, though they often are (more than 50% of the time).
Your clustered key will have the greatest impact on performance of both ETL and OLAP functions. Choosing the fact table's ID column as the clustered key may benefit ETL performance, but it will not help the OLAP functions.
OLAP functions will normally benefit most from a clustered key that queries will use as a ranged subset in the filtering clauses. For a fact table this is not normally the FactID column.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 13, 2008 at 2:33 am
Thx for the inputs RBaryy !
Franky L.
March 25, 2008 at 8:36 am
What's the point of having an ID column on a fact table?
I have an identity column on each dimension table which I use as a surrogate key in the fact table.
Jez
March 26, 2008 at 12:32 am
Jez (3/25/2008)
What's the point of having an ID column on a fact table?
It can facilitate the ETL process.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
October 28, 2008 at 5:35 pm
No sense adding PK to FACT if strategy is only INSERT(APPEND).
Will just slow down the load.
January 28, 2009 at 12:30 pm
I am building my first DW/star schema in sql 2008. I just partitioned one of my fact tables (I have two) and I used the GUI & then generated the script to execute by hand so that I could see what it was doing.
My question: I noticed that my clustered primary key (an identity column) was dropped and recreated as a nonclustered primary key by the GUI.
Can someone explain why it did this? Is this because of what RBarryYoung said? Should I instead create a clustered index using other columns (supposedly composed of two or more surrogate keys?)
and leave the non-clustered primary key alone? How do I go about choosing which of the surrogate keys to include in the clustered index?
Thanks,
Sonya
January 29, 2009 at 7:13 pm
In my practice I never seen the situation when jobs are always finished successfully.
Besides, FACT tables are a subject to read - not only ETL insert is involved.
Sometimes you may even need to update FACT or DELETE some garbage - you can never have 100% clean data.
My advice - use PK in FACT. By default in SQL Server it is unique and clustered.
If you will use another method by creating a column with unique constraint and non clustered index - it will definitely
create overhead for ETL insert as every insert DB engine will be checking constraint via table scan.
As variant - it may be created as composite field containing ETL build PK + increment.
(I am keeping info about ETL builds and their status in table in the same Database)
Even automatic increment may help you further in case of any issues with ETL and data quality especially if you create duplicates.
Otherwise, in order to fix something you may scan your tables forever:) or be unable to make fixes at all.
Cheers!
February 4, 2009 at 9:59 am
Primary key on what columns? if it is based on Identity column then i dont see any use.
If it is on business Columns YES it will be benificial.
Thanks -- Vj
February 4, 2009 at 10:22 am
Let’s start with the basics: All tables should have a primary key.
If a table does not have a primary key, it is not in first normal form. Any time you are violating this most basic tenant of relational database design, you are going off the tracks.
You can debate the wisdom of implementing it as identity vs. non-identity, natural vs. surrogate, clustered vs. non-clustered, or unique index vs. constraint.
November 8, 2011 at 7:25 pm
Here's my concern. I got a fact table with 10 columns (let us say column A-J). Columns A-G determines the unique set of records. Does that mean I have to create primary key for A-G? That will greatly increase the index size, right? Is there any alternative solution to that?
November 9, 2011 at 8:42 am
I got a fact table with 10 columns (let us say column A-J). Columns A-G determines the unique set of records. Does that mean I have to create primary key for A-G?
You could do that, and I think some people would say you should do this. In this case I would, however, create a unique identifier column. Depending on the urgency of keeping the values in the columns unique, I would consider a unique index.
November 9, 2011 at 12:57 pm
drcrelao (11/8/2011)
Here's my concern. I got a fact table with 10 columns (let us say column A-J). Columns A-G determines the unique set of records. Does that mean I have to create primary key for A-G? That will greatly increase the index size, right? Is there any alternative solution to that?
You can always use a surrogate key, add a bridge table between FACT and all DIMensions that are included in the otherwise natural key.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply