November 18, 2013 at 7:50 am
Guys,
I have the following situation for my new Data Warehouse:
A table called ENTITY
CREATE TABLE Entity
(
ID int identity(1,1),
EntityType int not null,
NaturalKey nvarchar(100) not null
)
(The NaturalKey is the value we will want to report on). Once set the Entity.NaturalKey should not change and is therefore a candiate for PK on Entity.
and I have a table called SOURCE
CREATE TABLE Source
(
ID int identity(1,1),
SourceName nvarchar(100) not null
)
Now the challenge is that the same Entity may have different keys in different source systems (don't ask!)
so there is an interim table
CREATE TABLE Source_X_Entity
(
ID int identity(1,1),
SourceID int not null, -- FK to source
EntityID int not null, -- FK to Entity
SourceRef nvarchar(100) not null,
-- some other fields not relevant to the discussion
)
The issue is that most of the time we will be searching, grouping and filtering by the Entity.NaturalKey but going through the Source_X_Entity table. Should I break 3rd Normal Form and put the NaturalKey into the Source_X_Entity table as well and risk having to do a mass update if the natural key is ever changed. It will save me a join on almost every query.
FYI we are using SQL2012 for this project but I would be interested to know if the answer changes for SQL2008 ro SQL2005
November 18, 2013 at 8:49 am
Is this "extra join" causing a performance issue? If not, I would recommend that you keep the table structure you have. If this is a new system load it with data and test the performance.
November 18, 2013 at 9:14 am
assuming you would join Source_X_Entity to the other two tables ,
i would think that if you create a non clustered index on Entity(NaturalKey), or maybe a non cluster on(Entity(EntityID) INCLUDE(NaturalKey) you would see a good query plan when hitting the three tables together, na dwould not need to modify the structure.
Lowell
November 18, 2013 at 9:54 am
Thanks guys,
The Entity.NaturalKey will definately be indexed in some form as it will be displayed on most granular level reports. As to performance, I am not sure because I havn't built it yet. I am a contractor and have 5 weeks to develop a generic DW which will then need to perform admirably with an unknown quantity of data. most of the time the X table will be a 1:1:1 relationship so I doubt that performance will be an issue. The question was whether to save effort when creating queries by including the remote field in the primary driving table - The major con of this would be that if you do join the two tables you would need to specify the table name in the field definitions.
I think I will stick to the 3NF for now and say you need to join to the table if you need the field. I might create a view with the two tables already joined so save code duplication.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply