April 2, 2009 at 6:27 am
I have a small application where users mapped to my application will login in and make entries into the system. Each user is mapped to a department and one department can have multiple users. I need to cerate a report in my application where each user can view the entries made my him and all other users of his department only i.e all users should be able to view data of their respective department only.
For this I have a departmentmaster table where I have columns like DepID,DeptCode,Departmentname.
DepId is a auto generated identity column of type integer where values will be inserted by Sql on every new insert starting from 1.
DepCode is a 4 character code of type varchar(4) for the department and Departmentname is the name.
DepCode can have values like 'ACCO' for 'Accounts' department and 'PROC' for 'Processing' department
I also have a Usermaster table where each user is mapped to a single department
Also there is a Transaction table where I am storing the details of the transaction done. In this transaction detail table I have the department column.
My question is that whether in the usermaster table and the Transaction table the column (for department mapping) should be the 'DepId' or the 'DepCode'. Which column would be ideal to have the join on considering I would have to generate reports and have many transactions in the transaction table. How do I consider my table design for this requirement
Thanks
April 2, 2009 at 2:47 pm
Hi
The answer for the join type is: the INT column.
I would add the UserId instead of the DepId to the transaction table to avoid data redundancy. So you can use a CTE to get all users of the current users department and join it with the transaction table.
Only in case of very huge tables with millions of data it makes sense to add the DepId to the transaction table for better reporting. Usually this would be done by a data-warehouse database for reporting.
Hope this helps!
Greets
Flo
April 2, 2009 at 3:32 pm
[font="Verdana"]I agree with Flo.
If you use generated keys (like the integer ID field) rather than "natural keys" or "business keys" (like the Code field), then you can save yourself a lot of problems. Here are some reasons why.
Natural keys do change! All you need is a decision to change department codes, and suddenly you not only have to change the department table, but all of the other tables where you linked by the department code.
Natural keys tend to be longer than generated keys. So all of your related tables take up more storage space, and the joins are harder to make.
Natural keys are often strings. SQL Server uses a bunch of rules to compare strings. Not only is this slower than doing a binary comparison for integers, but also the results can vary depending on whether the database is case sensitive or on the sorting order of the strings.
System generated numeric keys also have the advantage of being ideal for clustered indexes. They order your tables by the order in which rows are created, and optimise the amount of time that SQL Server takes to allocate storage. (Note that this is not true for other systems like Oracle that use different space allocation schemes, but is true in both SQL Server and Sybase.)
Hopefully that gives you enough background to understand why you use ID fields over Code fields for creating relationships. 😀
[/font]
April 2, 2009 at 5:17 pm
Bruce,
I just wanted to add that a cluster key on an identity column is not without issues:
1. While it can help keep clustered index fragmentation down, this is not true if existing records are updated with values which will no longer fit
2. Non-clustered indexes will likely fragment more quickly than they might do otherwise.
3. Inserts always happen on the same page, so a 'hot spot' can develop if insert activity is high.
4. Statistics on the ID column will not cover recently added items. If the maximum ID in the table was 1000 the last time statistics were built, searches on rows 1001+ may produce a poor plan, as the QO will expect just one row from the search. This problem is worse for larger tables, as the algorithms for auto-updating statistics have effect of rebuilding statistics much less often on large tables. Note that the bad statistics will also rule out parallel plans in most cases.
Having said that, a unique index on a INT column can have amazing performance benefits in parallel plans with a hash join with bitmap filter, where the QO is often able to push the bitmap 'in row' on the probe input, rather than eliminating rows at the repartition streams operator.
Sorry if all that is a bit random!
Cheers,
/Paul
April 2, 2009 at 6:43 pm
Paul White (4/2/2009)
Bruce,I just wanted to add that a cluster key on an identity column is not without issues.
[font="Verdana"]Hey Paul,
Any design for a large system needs to be tuned for physical performance. In this specific case, it was around an alternative between natural keys (which would, as it happens, still have many of the same issues you raised) and IDs. It was also mentioned that this is for a smallish system.
So yes, there's a whole 'nother level of complication that happens when you start tuning a design for a physical implementation. But you don't do that at the initial design level. And for there, building your relationships around IDs is a better choice than natural keys.
I could argue some of your points, btw (having re-read them.) But I think that would be side-tracking the point of this discussion.
[/font]
April 2, 2009 at 7:29 pm
Bruce W Cassidy (4/2/2009)I could argue some of your points, btw (having re-read them.) But I think that would be side-tracking the point of this discussion.
[/font]
Hi Bruce,
Sure. I should have started off by saying that I agree that natural keys are a poor choice and a surrogate is my default starting point too. I guess I just wanted to mention some of the less glamorous implications of clustering on a monotonically increasing key. I do enjoy a good discussion, so if you feel inclined, send me your thoughts in a PM!
Cheers,
/Paul
April 2, 2009 at 8:12 pm
Thanks to all for your replies
What I understand from your replies is that it would be best to have the identity column for the join rather than the generated string column becasue of indexes and other issues.
Suppose even if say the Depcode(the string column) would never change its value, would you still advice for the identity column as the foreign key. Ofcourse I need to look at the future since my transaction table will be huge in terms of data.
April 2, 2009 at 8:33 pm
znkin (4/2/2009)
Thanks to all for your repliesWhat I understand from your replies is that it would be best to have the identity column for the join rather than the generated string column becasue of indexes and other issues.
Suppose even if say the Depcode(the string column) would never change its value, would you still advice for the identity column as the foreign key. Ofcourse I need to look at the future since my transaction table will be huge in terms of data.
[font="Tahoma"]If I got a dollar for every time I heard "the natural key will not change" and then... discovered it did change, I'd probably have around $3. 😀 They don't change often, but they do change.
The real benefit is that you don't have to carry the larger key around for your relationships (and you avoid string comparisons.)
When you come to design your transaction tables, I'd suggest you keep two rules of thumb in mind:
1. Make your transaction table "write once".
Ensure that nothing modifies transactions. If you have a process that comes along and adds some additional information to transactions, do it as an insert into a related table.
2. Keep your transaction tables as lean as possible.
Avoid copying down additional information that can be looked up "to make the developers lives easier because they don't know how to write joins." If you really have that issue, you can create a view that does the joins for them.
3. Don't use your transaction store for reporting.
If you have business reports running off your transactions, sooner or later you are going to hit the trade off of reporting performance versus transaction processing performance. You will have reports locking out transactions and vice-versa. It turns into a mess.
So right away you want to consider that you will have some sort of reporting version of your transactions. The transaction system should capture the data ASAP and get on with the job of capturing more data. You can have a separate process that reads new transactions and copies them across to a reporting system.
4. Learn how to count to two!
I think I failed that one! 😀
[/font]
April 2, 2009 at 9:07 pm
...and a further quick word on the evil that is string keys.
Not only are they much more work to compare on than integer keys, but the comparison depends on collation too. A place I worked once had a situation where different databases had different collations (for historical reasons). As a consequence cross-database and linked-server string comparisons were not fun.
If I had a dollar for every time a query failed with the 'unable to resolve collation conflict' message, I'd be considerably richer than Bruce 😀
Worse, the collation of tempdb rarely matched either. Joining to temporary tables on strings was therefore also decidedly less than fantastic...
String keys: just don't do it.
/paul
[and I didn't even mention case-sensitive matches!]
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply