May 15, 2019 at 5:12 am
I have two databases, each containing the following tables.
Database: testSQL
USE [testSQL]
GO
/****** Object: Table [dbo].[AppUser] Script Date: 15/5/2019 12:56:47 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AppUser](
[AppUser_Id] [int] IDENTITY(1,1) NOT NULL,
[Username] [varchar](255) NOT NULL,
[FirstName] [varchar](255) NULL,
[LastName] [varchar](255) NULL,
CONSTRAINT [PK_AppUser] PRIMARY KEY CLUSTERED
(
[Username] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[AppUserRole] Script Date: 15/5/2019 12:56:47 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AppUserRole](
[AppUserRole_Id] [int] IDENTITY(1,1) NOT NULL,
[Username] [varchar](255) NULL,
[Role_Id] [int] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[AppUserRole] WITH CHECK ADD CONSTRAINT [FK_AppUserRole_AppUserRole_001] FOREIGN KEY([Username])
REFERENCES [dbo].[AppUser] ([Username])
ON UPDATE CASCADE
GO
ALTER TABLE [dbo].[AppUserRole] CHECK CONSTRAINT [FK_AppUserRole_AppUserRole_001]
GO
Database: testSQL2
USE [testSQL2]
GO
/****** Object: Table [dbo].[AppUser] Script Date: 15/5/2019 12:57:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[AppUser](
[AppUser_Id] [int] IDENTITY(1,1) NOT NULL,
[Username] [varchar](255) NOT NULL,
[FirstName] [varchar](255) NULL,
[LastName] [varchar](255) NULL,
CONSTRAINT [PK_AppUser] PRIMARY KEY CLUSTERED
(
[AppUser_Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [UQ_AppUserRole_001] UNIQUE NONCLUSTERED
(
[Username] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[AppUserRole] Script Date: 15/5/2019 12:57:55 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[AppUserRole](
[AppUserRole_Id] [int] IDENTITY(1,1) NOT NULL,
[AppUser_Id] [int] NULL,
[Role_Id] [int] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[AppUserRole] WITH CHECK ADD CONSTRAINT [FK_AppUserRole_AppUserRole_001] FOREIGN KEY([AppUser_Id])
REFERENCES [dbo].[AppUser] ([AppUser_Id])
GO
ALTER TABLE [dbo].[AppUserRole] CHECK CONSTRAINT [FK_AppUserRole_AppUserRole_001]
GO
In testSQL, a custom value is defined as the Primary Key; in this case, Username (varchar(255)) is the Primary Key.
In testSQL2, an Identity running number is defined as the Primary Key; in this case, AppUser_Id (int) is the Primary Key.
In testSQL, if a user wants to change the Username value, we can use AppUser_Id (int) as follows:
use testSQL
update [dbo].[AppUser]
set Username=@NewUsername
where [AppUser_Id]=@AppUser_Id
*In any table using Username (varchar(255)) as a Foreign Key, the Username value is changed automatically because of the cascading update.
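The cascade behaviour can be illustrated with a small sketch (the sample user data below is my own illustration, not from the original scripts):

```sql
-- Sketch of ON UPDATE CASCADE in testSQL; sample data is illustrative.
USE testSQL;

INSERT INTO dbo.AppUser (Username, FirstName, LastName)
VALUES ('jsmith', 'John', 'Smith');

INSERT INTO dbo.AppUserRole (Username, Role_Id)
VALUES ('jsmith', 1);

-- Renaming the user...
UPDATE dbo.AppUser
SET Username = 'john.smith'
WHERE Username = 'jsmith';

-- ...also rewrites the child row, because the FK is declared ON UPDATE CASCADE.
SELECT Username FROM dbo.AppUserRole;
```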
In testSQL2, if a user wants to change the Username value, it is simply changed as follows:
use testSQL2
update [dbo].[AppUser]
set Username=@NewUsername
where AppUser_Id=@AppUser_Id
*No rows in other tables are affected, because there is no cascading update. Furthermore, the Identity running number is unchangeable.
My question is: which database design is best practice and better for performance? Database testSQL or database testSQL2?
May 15, 2019 at 7:15 am
Here's the bad news. You need both. If logically the UserName value is the business defined identifying value, you should have a unique constraint on it. This is what is called the natural key. However, I, along with a lot of other people, have found that using the natural key in the database as a primary key leads to problems, primarily around performance. You note that cascading updates can take care of changes to the natural key, and you're right. That is a performance nightmare. You're much better off using a meaningless primary key that can be propagated to all the child tables, then, data changes only occur when and where they are supposed to occur instead of across a huge swath of additional locations. However, you can't just have the meaningless key and still maintain business appropriate data. You need to have the natural key enforced as well. So, best practice on the databases I build, use an identity column for the PK (and maybe for the clustered index, totally different discussion) AND have a unique index on the natural key, UserName in this case.
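To make the recommendation concrete, here is a minimal sketch (my own illustration, not code from this reply) of why the surrogate key keeps writes local in a testSQL2-style schema:

```sql
-- With the surrogate-key design (as in testSQL2), a rename is a single-row write:
UPDATE dbo.AppUser
SET Username = @NewUsername
WHERE AppUser_Id = @AppUser_Id;
-- Child rows keep referencing the unchanged AppUser_Id, so nothing cascades,
-- while the unique constraint (UQ_AppUserRole_001) still enforces the natural key.
```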
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 15, 2019 at 7:40 am
Your writing is valuable. Thanks
May 15, 2019 at 8:56 am
Not a problem. Happy to help. Hopefully we hear from a few others as well. There is some degree of disagreement on this topic.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 16, 2019 at 1:39 am
Here's the bad news. You need both. If logically the UserName value is the business defined identifying value, you should have a unique constraint on it. This is what is called the natural key. However, I, along with a lot of other people, have found that using the natural key in the database as a primary key leads to problems, primarily around performance. You note that cascading updates can take care of changes to the natural key, and you're right. That is a performance nightmare. You're much better off using a meaningless primary key that can be propagated to all the child tables, then, data changes only occur when and where they are supposed to occur instead of across a huge swath of additional locations. However, you can't just have the meaningless key and still maintain business appropriate data. You need to have the natural key enforced as well. So, best practice on the databases I build, use an identity column for the PK (and maybe for the clustered index, totally different discussion) AND have a unique index on the natural key, UserName in this case.
Ditto that. I'll also flat out say that since it's a requirement that the user can change their user name, that immediately disqualifies the column as either a PK (for the very reasons you state) or a clustering key for me with only super rare exceptions.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2019 at 6:03 am
It's better not to update such "key" values as the user name in this example.
The old user name must stay in the database, in case some historical records have to be audited.
External references like, say, XML messages, verification tokens, etc., will contain the original user name, which could no longer be matched to any record in the database.
You'd better have "effective" dates against each user name record, and when a user changes the name, add a new record, link it to the old one, and deactivate the old name by putting an EffectiveTo date against it.
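A minimal sketch of that approach might look like this (the table and column names here are my own, hypothetical choices):

```sql
-- One row per user-name version; EffectiveTo IS NULL marks the active name.
CREATE TABLE dbo.AppUserName (
    AppUserName_Id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    AppUser_Id     int NOT NULL REFERENCES dbo.AppUser (AppUser_Id),
    Username       varchar(255) NOT NULL,
    EffectiveFrom  datetime2 NOT NULL,
    EffectiveTo    datetime2 NULL
);

-- Renaming a user: deactivate the old name, then add the new one.
UPDATE dbo.AppUserName
SET EffectiveTo = SYSDATETIME()
WHERE AppUser_Id = @AppUser_Id AND EffectiveTo IS NULL;

INSERT INTO dbo.AppUserName (AppUser_Id, Username, EffectiveFrom)
VALUES (@AppUser_Id, @NewUsername, SYSDATETIME());
```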
_____________
Code for TallyGenerator
May 16, 2019 at 6:28 am
Oooh, this is a HUGE topic. Posting into comments in a thread is not going to cover everything. You need to look at a completely different database design. Here's an old article I wrote about how to do versioned data. A different point of view on the same topic. You should also look at Change Data Capture. This goes WAY beyond what defines a key. In fact, you might get better information opening a new thread.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 16, 2019 at 12:17 pm
Oooh, this is a HUGE topic. Posting into comments in a thread is not going to cover everything. You need to look at a completely different database design. Here's an old article I wrote about how to do versioned data. A different point of view on the same topic. You should also look at Change Data Capture. This goes WAY beyond what defines a key. In fact, you might get better information opening a new thread.
Just to be sure (and I might be reading things incorrectly), I don't believe that Sergiy is suggesting keeping history in "comments". I believe what he's saying is that there are a whole lot of people out there that do and the only way to ever be able to match the two up in the future would be to ensure that history in the database IS saved/versioned as you suggest.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2019 at 12:33 pm
As a bit of a sidebar, there's no way that I'd keep history/versioning in the same table as the current data. Nor would I use CDC for this table. Temporal Tables or something similar... yeah... that's the ticket for me. If a history/version/audit table contains anything but old rows that were made old by UPDATEs or DELETEs, then you've done something wrong, IMHO. If there's an "IsLatestVersion" column in the table, you've done something really really wrong.
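For reference, a system-versioned Temporal Table (SQL Server 2016+) keeps the current rows and the history apart automatically; this is only a sketch of the syntax, not code from the thread:

```sql
CREATE TABLE dbo.AppUser (
    AppUser_Id int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Username   varchar(255) NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AppUser_History));

-- Old versions land in dbo.AppUser_History on every UPDATE/DELETE,
-- and point-in-time queries need no extra flag columns:
SELECT * FROM dbo.AppUser FOR SYSTEM_TIME AS OF '2019-01-01';
```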
--Jeff Moden
Change is inevitable... Change for the better is not.
May 16, 2019 at 1:44 pm
As a bit of a sidebar, there's no way that I'd keep history/versioning in the same table as the current data. Nor would I use CDC for this table. Temporal Tables or something similar... yeah... that's the ticket for me. If a history/version/audit table contains anything but old rows that were made old by UPDATEs or DELETEs, then you've done something wrong, IMHO. If there's an "IsLatestVersion" column in the table, you've done something really really wrong.
Jeff, I would like you to elaborate more on why you think it is a bad design to have the current value in a "history" table. I've seen database designs where the most current version is in, say, a "CurrentPolicy" table and the history data is in a table like "CurrentPolicy_History". Now, that makes it easy for the novice SQL users; I find it a nightmare. I like to design the code so that, no matter what date I run it for, I get the same results by just passing the one date. In that design you would need to change the SQL to look at the "History" table if you needed to run a report to see how the data looked a year ago.
That is why the "History" tables I have set up contain the history data as well as the current version. To me this is the best way to do it. It's just a matter of setting up a subquery correctly so you get the "correct" row from the "History" table you are after.
Now, I've also seen designs where, as in my example tables above, both tables contain the most current version. And I have also copied that design, to help the novice users.
I value your opinion, Jeff; I just would like to know more about why you think this is a bad design.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
May 16, 2019 at 10:01 pm
As Grant stated, it's a huge subject, so I'll just limit (at least for now) most of this to a "User" table similar to what the OP asked about.
First, the idea of using a surrogate key, such as an IDENTITY column, seems essential for this type of table to me. Yes, you could use a SEQUENCE but then, in order to guarantee a correct sequence value, you would have to either write any and all code to use a sequence or you need to write an INSERT trigger to guarantee that a correct sequence number is used. Writing all code to use a correct sequence number is fine but that won't protect you when it comes to ad hoc inserts. You could also protect against ad hoc inserts but now we're starting to get complicated. An IDENTITY column seems to be the perfect choice for the PK.
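As a sketch of the SEQUENCE alternative mentioned above (object names here are illustrative): a default constraint lets ad hoc inserts pick up the next value automatically, but unlike IDENTITY it cannot stop someone from supplying an arbitrary value explicitly.

```sql
CREATE SEQUENCE dbo.AppUser_Seq AS int START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.AppUser (
    AppUser_Id int NOT NULL
        CONSTRAINT DF_AppUser_Id DEFAULT (NEXT VALUE FOR dbo.AppUser_Seq)
        CONSTRAINT PK_AppUser PRIMARY KEY,
    Username varchar(255) NOT NULL
);

-- Works without mentioning AppUser_Id at all:
INSERT INTO dbo.AppUser (Username) VALUES ('jsmith');
-- But nothing prevents an explicit, possibly wrong, value
-- (IDENTITY would reject this unless IDENTITY_INSERT is switched on):
INSERT INTO dbo.AppUser (AppUser_Id, Username) VALUES (999, 'adhoc');
```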
On to your question... and, yeah... a whole lot of people will have different opinions on this subject and so consider this to be my opinion.
If you store historical rows in the User table, that means you'll be inserting new rows into the table instead of updating existing rows. That means that, to preserve the UserID (AppUser_Id according to the OP's table), which is the PK, you'd need yet another column to do so, AND you'd have to make sure it was populated with the value of the IDENTITY column the first time a row for the user was created. It also means the second column would sometimes have to be treated as the PK, along with some indication of which version of a row is current: start and end dates for each version of the row (my preferred method, in an SCD Type II fashion, if the table is going to be used for both current and historical data), a version column, or an IsCurrentRow flag column (which I never use, because I need the start and end date columns anyway).
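The in-table layout described above could be sketched like this (hypothetical column names; the per-version surrogate is the IDENTITY, and the preserved user id is carried in a second column):

```sql
-- In-table history, SCD Type II style: one row per version of each user.
CREATE TABLE dbo.AppUser (
    RowId        int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- per-version surrogate
    AppUser_Id   int NOT NULL,  -- preserved user id, identical across versions
    Username     varchar(255) NOT NULL,
    RowStartDate datetime2 NOT NULL,
    RowEndDate   datetime2 NOT NULL CONSTRAINT DF_AppUser_RowEnd DEFAULT ('9999-12-31')
);

-- The current version of each user is the row whose end date is still open:
SELECT * FROM dbo.AppUser WHERE RowEndDate = '9999-12-31';
```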
And then you have the UserName column, which must be unique if that's what it's truly being used for. What will happen when a female user wants to change her last name to her husband's surname but wants to keep the same user name? Yes, I agree that there are ways around that. I've just not found it to be worth it in the long run for other reasons, as you'll see below.
As you suggest, having current row and history rows in the same table does make reporting easier: There's no need to have a report be smart enough to look at one table, the other, or both depending, of course, on the purpose and type of report. The reports DO have to be smart enough to use something like IsCurrentRow to look at only current, only history, or both and that shouldn't be a problem. I can't speak for anyone else but, even for a table such as this, there are people writing ad hoc reports (yeah... big mistake but it's like fighting city hall... I'd have to burn it down to win that fight) where I don't actually want them to have access to historical rows. I can certainly fix that with row level security in the form of a couple of pass-through views and grant individuals read privs on only the views but that ends up being a pain, as well.
Because user tables don't generally get that large even when history is contained, these next couple of items may not seem to apply, but I generally follow certain types of auditing rules, because history is a form of audit and can be used as such if the notion strikes someone. One of the rules (at least where I work, and the auditors love it) is that audit tables must be separate. If the table doesn't qualify as an audit-able table, then there's no need for a separate history table, right? That's absolutely correct until a bee gets into someone's bonnet and they change the classification of the table to being audit-able. I've been burned by that at other companies, where I lost the battle of saying "If it has history, it's an audit-able table, and it's better to be safe than sorry later".
Now, for tables that aren't going to stay as small as a user table, there are other issues. For example, let's talk about indexes. If you agree that historical rows must never change (and I'm sure you will), why should you need to do index maintenance on them? Also, I've found that there are serious advantages to having different and fewer indexes on current-row information, while there's also an advantage to having different and, perhaps, more indexes on historical tables even if they're not being used for audits. Seriously reduced index maintenance is also included in that. I also don't see the need to continuously back up that which will never change, and so I frequently move such historical tables to an "Archive" database (whether the table is partitioned or not... whole 'nuther discussion there). And while the "guarantee" by Managers and Developers of a given table "never getting that big" is certainly something to consider, if you combine the fact that that's one of the more frequently broken guarantees with the other "guarantee" of a table never being declared "audit-able", I find it both easier and safer, and it provides other advantages, to just follow the rule of "If history of any type is involved, it must be stored in a separate table".
There's also the occasional problem of someone writing an ad hoc (or even "good") query with an "oops" attached to it: they end up doing a table scan, and if the history is large, it can cause some interference, or I'll get a call from the person who issued the query about the database being slow. While a dose of pork chops and some training can help curtail that, it's just one more thing that can be avoided if I keep the current data separate from the historical data.
And, yeah... I don't know all there is to know about them, yet (I've finally found some time to start experimenting with them), but, from what I've seen so far, I'm thinking that Temporal Tables made all of that a proverbial walk in the park and they made the whole process much more secure.
I've probably left a couple of things out but that's my opinion.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 17, 2019 at 3:16 am
I agree, temporal tables are probably a better approach than CDC. I actually was sitting here trying to remember what they were called. Ha! Old age is fun.
As to history in the table, I was part of (not taking anything but partial credit on this one) a team that successfully put together a system that did this. Performance was awesome and we had all the data in one set of tables. It required a lot of coding & design discipline, but it worked. Not arguing with Jeff, just saying: it can be done. It's not easy.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning