August 11, 2012 at 1:59 pm
I need some help with designing my database.
I'm creating a database that holds patents.
A patent has this data
Patent number
Title
Classifications
Inventors
Filed date
Issue date
Referenced patents
Referenced by patents
Short description
Since Inventors and Classification can be more than one, I have created three tables
Patents
Inventors
Classification
I have also created a two join tables
PatentsInventors
PatentsClassifications
To top it off I have the Referenced patents and Referenced by patents, which are other patents (in the Patents table) that this patent has a reference to/from.
That's two more tables
PatentsReferenced
PatentsReferencedBy
After that rather long winded intro...
Have I created a structure that is OK?
How should I use Foreign Keys on these tables?
I'm using MS SQL 2008 R2 Express, Windows 7 Ultimate
// Anders
August 13, 2012 at 4:42 am
This was removed by the editor as SPAM
August 13, 2012 at 5:32 am
One thing to add - change your ID fields to int. Unless I'm missing something, there's no reason to make them varchar or any text based data type.
Mark
August 13, 2012 at 5:46 am
The ids have characters in them...
Maybe I should create a new column that holds the value and change patent_id and classification_id to INT and make them an Identity (I'm using MS SQL Server Management Studios label on it)
Then I will automatically get a new unique id...?
// Anders
August 13, 2012 at 9:28 am
Yes, create a PK on each table and make that the auto incrementing ID and set up FK constraints where appropriate. I tend to reserve "ID" for this field only.
Mark
August 13, 2012 at 1:02 pm
OK, I will do that.
What are the advantages for using INT as PK (which I assume stands for Primary Key) vs CHAR?
// Anders
August 13, 2012 at 2:10 pm
The main advantage is the storage 4bytes against 20bytes.
Another advantage would be the ease of creation for new identities (SQL Server will create them automatically).
However, there's a constant debate on how you should declare you Primary Keys. The option for the auto-increment ints referred as surrogate keys that end-users shouldn't see and the option for natural keys that are defined by business rules (a good example would be the patent number that uses an alpha-numeric code).
I would suggest to use both (a surrogate and a natural key) in the tables that apply. Natural keys will help you to identify the row and surrogate will help you with other processes and to ensure the uniqueness. I realize that it would sound redundant to use both, but you just use both in your primary table and use the surrogate for the foreign keys.
August 13, 2012 at 3:03 pm
Another advantage of INT declared as IDENTITY for surrogate keys is the increased potential of keeping fragmentation down over time. Are patent ids ever-increasing in value over time as they are issued?
More important than your primary key, which can be a logical choice like your alpha-numeric patent id if it is in fact unique and non-null, is the clustering key. A clustered index is more important to focus on when it comes to performance, space planning and maintainability. The primary key and clustering key do not necessarily need to be one in the same. Some reading for you:
More considerations for the clustering key - the clustered index debate continues!
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply