March 12, 2012 at 2:51 pm
Hi everyone
I want to know what would be best practice for the following scenario:
Article table with link to a Category and each Category can have sub categories.
In my head there are 3 solutions, but don't know which is best:
1. Separate Category and Sub Cat tables, with a link in SubCat to MainCat. The Article table will then just hold the SubCat id. You can use a lookup to get the MainCat
2. Same as above, but you store both MainCat and SubCat id's in the Article table.
3. All categories in one table, with an extra column defining parent/main category and then a linking table for category id to article id.
In my mind, performance should be best in solution 2 as this uses the least joins?
Please advise.
Thank you.
March 12, 2012 at 4:05 pm
I would probably separate the Category and Subcatergory tables. Only data related to Category would be in the Category table and only data related to Subcategory would be in the subcategory table. Such things as descriptions, and other specific information.
If combined, you will have the potential for duplicated data and the need to ensure all cpoies of the duplicated data are properly updated when needed.
March 12, 2012 at 6:59 pm
Method 1 works badly if some articles point to category and others to subcategory; either you give up on database-enforced referential transparency, or you do it with a trigger instead of with a foreign key a constraint, or you use two pointers and two nullable reference constraints plus a check constraint to ensure that one or the other pointer is null (accepting to have two nullable columns in the article table). So I wouldn't use method 1.
Method 2 copes slightly better with the basic referential transparency constraint (at most one of the columns which is source of a reference constraint has to be nullable); but if you want the database to ensure valid data, you have to ensure if two articles point to the same subcategory they also point to the same category. That can only be done with a trigger, which is required even if all articles point to a subcategory (so that neither reference needs to be nullable). Ideally of course the category to subcategory relationship is expressed between those two tables, which would have an effect on that trigger (probably make it easier to understand version, in fact). But again, I don't like it.
Method 3 could be varied a little: the article table points to an entry in the category_and_subcategory table, so there's only one pointer and it can never be null; each subcategory entry in the latter table points to the entry for its parent category, while each top level category points to itself (this pointer is the only extra column in the category table, a separate marker to indicate whether something is a subcategory or not is unneccessary; and it doesn't need to be nullable - indeed it must not be nullable). I don't understand what the pointer from category id to article id that you mention is. No trigger is needed for this method. This method, unlike the other two, can support a hierarchy with more than 2 levels of category, which may or may not be useful - probably not, judging by what you say about joins. There needs to be a clean relationship between primary keys of top level categories and primary keys of subcategories. If you have only two levels (top level category and subcategory) you could insist that top level categories have negative keys (if you use numeric keys) or have one of a small set of first characters (if they are strings) whereas subcategories have positive keys (using integers) or begin with none of those characters (using strings); this allows a simple check constraint to be written that ensures there are only two levels. If you have a deeper hierarchy you might want to insist that a subcategory's primary key compares greater than its parent's key (whether they are numeric or string - with of course all the mess of comparing composite keys if the keys are not simple, but it looks from your query as if they are simple single column ids); this ensures that a very simple check constraint can be used to ensure that there is no loop in the category hierarchy except where a top level category points to itself to indicate that it is top level. Anyway, uou end up with no nullable columns, which is good news, and no need for any triggers, which is also good news. So I think I would choose that method.
Tom
March 13, 2012 at 12:09 am
Thanks for the extensive reply Tom, appreciate it.
I think I will go with option 3 then as this allows for better scale-ability in case another category level gets added at a later stage.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply