May 9, 2014 at 7:46 am
The SQL CAT team's recommendation is to avoid partitioning dimension tables: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspx
I have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)
I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.
Thanks in advance.
May 9, 2014 at 8:14 am
GavD (5/9/2014)
The SQL CAT team's recommendation is to [font="Arial Black"]avoid partitioning dimension tables[/font]: http://blogs.msdn.com/b/sqlcat/archive/2013/09/16/top-10-best-practices-for-building-a-large-scale-relational-data-warehous.aspxI have inherited a dimension table that has almost 3 billion rows and is 1TB and been asked to look at partitioning and putting maintenance in place, etc. (first step is to compress)
I'm not a DW expert so was wondering what are the reasons to not partition dimensions? I can find articles that reference the above article but not much that explains why.
Thanks in advance.
Are you sure that you're not mixing things up a bit? Dimension tables should be relatively small when compared to Fact tables and, as a result, shouldn't need to be partitioned.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2014 at 8:28 am
That was my understanding but this is a dimension. The large fact tables are all partitioned, they are a similar size to this one.
May 9, 2014 at 8:36 am
Can you post the CREATE TABLE statement (including indexes, keys, FKs)? That might help a bit here.
Just so you know, partitioning a table that has other tables pointing to it via FK can be a real pain because, in order for a column to be unique on such a table, either the partitioning column must be added to the otherwise unique column (which would break it as an FK candidate) or the UNIQUE index on the column can't be "aligned" which would kill the ability to do SWITCHes to move legacy data out of the table quickly.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 9, 2014 at 9:28 am
CREATE TABLE [dbo].[dim_dr](
[dr_sk] [bigint] IDENTITY(1,1) NOT NULL,
[dr_nk] [varchar](30) NOT NULL,
[s_sk] [bigint] NULL,
[s_nk] [varchar](30) NULL,
[dr_frdate_hq_sk] [int] NULL,
[dr_frtime_hq_sk] [int] NULL,
[dr_tdate_hq_sk] [int] NULL,
[dr_ttime_hq_sk] [int] NULL,
[dr_frdate_local_sk] [int] NULL,
[dr_frtime_local_sk] [int] NULL,
[dr_tdate_local_sk] [int] NULL,
[dr_ttime_local_sk] [int] NULL,
[dr_frdate_utc_sk] [int] NULL,
[dr_frtime_utc_sk] [int] NULL,
[dr_tdate_utc_sk] [int] NULL,
[dr_ttime_utc_sk] [int] NULL,
[p_dr_id] [int] NULL,
[p_s_id] [int] NULL,
[p_ita_dr_id] [int] NULL,
[p_ita_s_id] [int] NULL,
[e_idfodr] [numeric](28, 12) NULL,
[e_idfos] [numeric](28, 12) NULL,
[c_datetime_hq] [datetime] NULL,
[c_datetime_local] [datetime] NULL,
[c_datetime_utc] [datetime] NULL,
[dr_decimal] [decimal](28, 12) NULL,
[dr_numerator] [int] NULL,
[dr_denominator] [numeric](9, 0) NULL,
[dr_fraction] [varchar](20) NULL,
[ibr_dr_flag] [bit] NULL,
[dr_frdatetime_hq] [datetime] NULL,
[dr_frdatetime_local] [datetime] NULL,
[dr_frdatetime_utc] [datetime] NULL,
[dr_tdatetime_hq] [datetime] NULL,
[dr_tdatetime_local] [datetime] NULL,
[dr_tdatetime_utc] [datetime] NULL,
[current_dr_flag] [bit] NULL,
[inferred_m_flag] [bit] NULL,
[inserted_exec_log_key] [bigint] NULL,
[updated_exec_log_key] [bigint] NULL,
CONSTRAINT [PK_dim_dr] PRIMARY KEY CLUSTERED
(
[dr_sk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_dr]
) ON [DIM_dr]
GO
SET ANSI_PADDING ON
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d] FOREIGN KEY([dr_frdate_hq_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_hq_sk_dim_d]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d] FOREIGN KEY([dr_frdate_local_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_local_sk_dim_d]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d] FOREIGN KEY([dr_frdate_utc_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frdate_utc_sk_dim_d]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t] FOREIGN KEY([dr_frtime_hq_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_hq_sk_dim_t]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t] FOREIGN KEY([dr_frtime_local_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_local_sk_dim_t]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t] FOREIGN KEY([dr_frtime_utc_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_frtime_utc_sk_dim_t]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d] FOREIGN KEY([dr_tdate_hq_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_hq_sk_dim_d]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d] FOREIGN KEY([dr_tdate_local_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_local_sk_dim_d]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d] FOREIGN KEY([dr_tdate_utc_sk])
REFERENCES [dbo].[dim_d] ([d_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_tdate_utc_sk_dim_d]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t] FOREIGN KEY([dr_ttime_hq_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_hq_sk_dim_t]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t] FOREIGN KEY([dr_ttime_local_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_local_sk_dim_t]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t] FOREIGN KEY([dr_ttime_utc_sk])
REFERENCES [dbo].[dim_t] ([t_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_dr_ttime_utc_sk_dim_t]
GO
ALTER TABLE [dbo].[dim_dr] WITH NOCHECK ADD CONSTRAINT [FK_dim_dr_s_sk_dim_s] FOREIGN KEY([s_sk])
REFERENCES [dbo].[dim_s] ([s_sk])
NOT FOR REPLICATION
GO
ALTER TABLE [dbo].[dim_dr] NOCHECK CONSTRAINT [FK_dim_dr_s_sk_dim_s]
GO
CREATE NONCLUSTERED INDEX [IX_dim_dr_dr_nk] ON [dbo].[dim_dr]
(
[dr_nk] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO
CREATE NONCLUSTERED INDEX [IX_dim_dr_s_nk] ON [dbo].[dim_dr]
(
[s_nk] ASC,
[dr_frdatetime_hq] ASC,
[dr_tdatetime_hq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO
CREATE NONCLUSTERED INDEX [IX_dim_dr_s_sk] ON [dbo].[dim_dr]
(
[s_sk] ASC,
[dr_frdatetime_hq] ASC,
[dr_tdatetime_hq] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [DIM_DR]
GO
May 9, 2014 at 10:24 am
Except for the dr_sk column, I don't see any other unique indexes on this table and that's a good thing for paritioning-sake.
What's the nature of the dr_frdatetime_hq column?
1. Is it assigned a value as part of the insert of the row? If not, is there such a date column in the table (we need one to partition)?
2. Is it (that column) ever updated?
The reason I ask is because (especially because of the indexes on this table) it looks like it might make a decent column to base monthly partitions on.
Also, (and I have to double-check when I get home) the existing constraints with the NO CHECK hint might be a problem for paritioning. IIRC, no constraints in a table can have the NO CHECK option if you want fully aligned partitions that can someday take advantage of SWITCH (out or in) but that might only apply to the parititioning column. Like I said, I have to double-echeck on that.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2014 at 5:39 am
Some of the int fields are loaded as YYYYMMDD so these may be good partitioning columns also but I see what you say about the indexes (looking at the dmv's it seems one index is not being used)
I am approaching the partitioning mainly from a maintenance perspective, but will also be looking at queries that may hit the table.
I'm trying to find out if there are any big pitfalls to partitioning a dimension, from a performance, load, querying perspective etc or what are the reasons to avoid doing it?
May 12, 2014 at 7:50 am
If the rows eventually become static, there are some huge pitfalls with individual FileGroups having mounds of wasted space.... and individual FileGroups by month is definitely the way to go for maintenance especially where backups and piece-meal restores are concerned.
I've solved those pitfalls but it's not easy because of the problems associated with extra space generated during index rebuilds and the fact that DBCC SHRINKFILE hammers indexes for fragmentation.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply