September 27, 2016 at 2:47 pm
I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity. I would be interested to hear if there is a different way of modelling the data that is both normalized and enforces integrity.
Here is a typical (simplified) example:
create table [ProductType]
(
[ProductTypeId] INT identity(1,1) not null,
[ProductTypeName] nvarchar(100) not null,
constraint [PK_ProductType] primary key ([ProductTypeId])
)
create table [Product]
(
[ProductId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[ProductName] nvarchar(100) not null,
constraint [PK_Product] primary key ([ProductId]),
constraint [AK_Product] unique ([ProductId], [ProductTypeId]), -- "unnecessary" superkey so child tables can reference the type as well
constraint [FK_Product_ProductType] foreign key ([ProductTypeId]) references [ProductType]([ProductTypeId])
)
create table [ProductTypeProperty]
(
[PropertyId] INT identity(1,1) not null,
[ProductTypeId] int not null,
[PropertyName] nvarchar(100) not null,
constraint [PK_ProductTypeProperty] primary key ([PropertyId]),
constraint [AK_ProductTypeProperty] unique ([PropertyId], [ProductTypeId]), -- "unnecessary" superkey for the composite FK below
constraint [FK_ProductTypeProperty_ProductType] foreign key ([ProductTypeId]) references [ProductType]([ProductTypeId])
)
create table [ProductPropertyValue]
(
[ProductId] INT not null,
[PropertyId] INT not null,
[ProductTypeId] int not null, -- de-normalized copy of the type, carried so both FKs below can check it
[PropertyValue] nvarchar(100) not null,
constraint [PK_ProductPropertyValue] primary key ([ProductId], [PropertyId]),
constraint [FK_ProductPropertyValue_ProductTypeProperty] foreign key ([PropertyId], [ProductTypeId]) references [ProductTypeProperty]([PropertyId], [ProductTypeId]),
constraint [FK_ProductPropertyValue_Product] foreign key ([ProductId], [ProductTypeId]) references [Product]([ProductId], [ProductTypeId])
)
SET IDENTITY_INSERT [dbo].[ProductType] ON
INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (1, N'Clothing')
INSERT [dbo].[ProductType] ([ProductTypeId], [ProductTypeName]) VALUES (2, N'Drink')
SET IDENTITY_INSERT [dbo].[ProductType] OFF
SET IDENTITY_INSERT [dbo].[ProductTypeProperty] ON
INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (1, 1, N'Colour')
INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (2, 1, N'Size')
INSERT [dbo].[ProductTypeProperty] ([PropertyId], [ProductTypeId], [PropertyName]) VALUES (3, 2, N'Volume')
SET IDENTITY_INSERT [dbo].[ProductTypeProperty] OFF
SET IDENTITY_INSERT [dbo].[Product] ON
INSERT [dbo].[Product] ([ProductId], [ProductTypeId], [ProductName]) VALUES (1, 1, N'T-shirt')
INSERT [dbo].[Product] ([ProductId], [ProductTypeId], [ProductName]) VALUES (2, 2, N'Milk')
SET IDENTITY_INSERT [dbo].[Product] OFF
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 1, 1, N'Red')
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 2, 1, N'XL')
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (2, 3, 2, N'1 pint')
-- NOTE THAT THE FKS ON [PRODUCTPROPERTYVALUE] MEAN YOU CANNOT RUN EITHER OF THESE
-- WHICH TRY TO ASSIGN A PROPERTY TO A PRODUCT THAT DOESN'T BELONG TO ITS TYPE
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 3, 1, N'Red')
INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (2, 2, 1, N'XL')
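(To see the rejection without halting the script, either offending insert can be wrapped in TRY...CATCH - just a demonstration, not part of the design:)
begin try
    -- Property 3 (Volume) belongs to type 2 (Drink), so the composite FK
    -- [FK_ProductPropertyValue_ProductTypeProperty] rejects (PropertyId 3, ProductTypeId 1)
    INSERT [dbo].[ProductPropertyValue] ([ProductId], [PropertyId], [ProductTypeId], [PropertyValue]) VALUES (1, 3, 1, N'Red')
end try
begin catch
    -- expect error 547 (constraint conflict)
    select ERROR_NUMBER() as ErrorNumber, ERROR_MESSAGE() as ErrorMessage
end catch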
Here's what I am trying to model:
1. A product has a product type (e.g. "clothing", "drink" etc)
2. A product type has a number of properties (e.g. "clothing" has "colour" and "size")
3. A product has property values for any/all of the properties belonging to its type
4. A product cannot have property values for properties not belonging to its type
Point #4 is the one that causes the challenge. In order to create foreign keys to achieve this ([FK_ProductPropertyValue_ProductTypeProperty] and [FK_ProductPropertyValue_Product]) I have de-normalized and added "unnecessary" unique constraints:
1. Added [ProductTypeId] to [ProductPropertyValue]
2. Added a unique constraint on [PropertyId], [ProductTypeId] to [ProductTypeProperty]
3. Added a unique constraint on [ProductId], [ProductTypeId] to [Product]
However it seems impossible to achieve all 4 of the above points without these ... is it?
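Of course the rule could be enforced procedurally instead - here is a rough sketch with a trigger (the trigger name is mine, and it assumes [ProductPropertyValue] keeps only [ProductId], [PropertyId] and [PropertyValue]) - but then the rule is no longer declarative, which is what I was hoping for:
create trigger [TR_ProductPropertyValue_TypeCheck]
on [ProductPropertyValue]
after insert, update
as
begin
    set nocount on;
    -- reject any row whose property belongs to a different type than its product
    if exists
    (
        select 1
        from inserted i
        join [Product] p on p.[ProductId] = i.[ProductId]
        join [ProductTypeProperty] ptp on ptp.[PropertyId] = i.[PropertyId]
        where p.[ProductTypeId] <> ptp.[ProductTypeId]
    )
    begin
        rollback transaction;
        throw 50000, N'Property does not belong to the product''s type.', 1;
    end
end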
This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:
P ----------> PT
^             ^
|             |
|             |
PPV --------> PTP
Note: this is a simple example with 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table, etc).
September 27, 2016 at 3:20 pm
If you could add some sample data to your tables above, that would help.
September 27, 2016 at 4:28 pm
Done
September 28, 2016 at 7:14 am
I think I'd add another table in there. Instead of having a direct link between [ProductTypeProperty] and [ProductPropertyValue], I would define the values, define the properties, and then have a table that maps between them. Then only the [ProductPropertyValue] rows that appear in the mapping table for a given property will be available to Products of that ProductType. That should completely satisfy the business requirement.
Denormalization just isn't a mechanism I would normally use for data integrity. It's kind of the opposite really.
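Something along these lines - purely a sketch, one reading of the idea, and all the names here are mine:
create table [PropertyValue]
(
    [ValueId] INT identity(1,1) not null,
    [ValueText] nvarchar(100) not null,
    constraint [PK_PropertyValue] primary key ([ValueId])
)
create table [PropertyValueMap]
(
    [PropertyId] INT not null,
    [ValueId] INT not null,
    constraint [PK_PropertyValueMap] primary key ([PropertyId], [ValueId]),
    constraint [FK_PropertyValueMap_ProductTypeProperty] foreign key ([PropertyId]) references [ProductTypeProperty]([PropertyId]),
    constraint [FK_PropertyValueMap_PropertyValue] foreign key ([ValueId]) references [PropertyValue]([ValueId])
)
-- the product's values then reference the mapping row, so a product can only
-- be given a value that has been mapped to that property
create table [ProductPropertyAssignment]
(
    [ProductId] INT not null,
    [PropertyId] INT not null,
    [ValueId] INT not null,
    constraint [PK_ProductPropertyAssignment] primary key ([ProductId], [PropertyId]),
    constraint [FK_ProductPropertyAssignment_Product] foreign key ([ProductId]) references [Product]([ProductId]),
    constraint [FK_ProductPropertyAssignment_PropertyValueMap] foreign key ([PropertyId], [ValueId]) references [PropertyValueMap]([PropertyId], [ValueId])
)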
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
October 8, 2016 at 7:16 am
Laurence Neville (9/27/2016)
I often come up against the following challenge, and end up de-normalizing my table design in order to enforce data integrity. I would be interested to hear if there is a different way of modelling the data that is both normalized and enforces integrity.
I think you are probably not doing sufficient normalisation: if you don't stop at 3NF but instead go on to EKNF and then to 4NF and 5NF, while avoiding the BCNF nonsense (BCNF was a mistake, but descriptions of 4NF and 5NF that incorporate it are excessively common), you may find that you are enforcing data integrity quite effectively without any denormalisation. Of course, if the denormalisation you are referring to is reverting from BCNF to EKNF, that is not denormalisation when the two forms differ (because in that case BCNF is not a normal form for your data model, and reverting to EKNF is normalisation).
This challenge seems to come up when my design departs from a simple branching "snowflake" and there needs to be "loops" in the relationships:
P ----------> PT
^             ^
|             |
|             |
PPV --------> PTP
Note: this is a simple example with 4 tables. With a real-world design with many levels of related tables, the issue becomes magnified (unique constraints on multiple columns, more than one "unnecessary" unique constraint per table, etc).
Well, your diagram doesn't have a loop in it - it is acyclic - so the loop idea seems to be something of a red herring. Real loops in the network of foreign key constraints would be a problem, but if you had such a loop I am pretty sure that the problem would be that you had chosen a bizarre and illogical bunch of data that didn't represent reality. If the network is acyclic and not a tree, like the network in the diagram, cascades are a pain to handle (which is why SQL Server refuses to handle them and you have to do it yourself) but that shouldn't lead to denormalisation.
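A quick illustration of that last point about cascades (demo tables with my own names, mirroring the shape of your diagram, nothing to do with the real tables):
create table [PT_demo] ([PTId] int not null primary key)
create table [P_demo] ([PId] int not null primary key, [PTId] int not null references [PT_demo]([PTId]) on delete cascade)
create table [PTP_demo] ([PTPId] int not null primary key, [PTId] int not null references [PT_demo]([PTId]) on delete cascade)
create table [PPV_demo]
(
    [PId] int not null references [P_demo]([PId]) on delete cascade,
    [PTPId] int not null,
    primary key ([PId], [PTPId])
)
-- this is the one SQL Server refuses (error 1785, "may cause cycles or multiple
-- cascade paths"): deleting a PT_demo row would reach PPV_demo down both sides
-- of the diamond, so that side of the cleanup has to be done by hand
alter table [PPV_demo] add constraint [FK_PPV_demo_PTP_demo]
    foreign key ([PTPId]) references [PTP_demo]([PTPId]) on delete cascade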
Tom