How To Approach Adding A Primary Key To An Existing Table

  • In our vendor-created/managed DB we have a table that is like an audit table the vendor uses for tracking down issues. It's something you can turn on or off via the application that uses the DB.  The table has NO Primary Key. It does have a date-stamp-like column, so we've used it as the way to determine order of operations. The problem is it's not consistent. When I make a change it stamps it with my time, my time zone.  We know that it uses other times because the last X rows in the table, based on this time stamp, have times that have yet to come, that are 1 or more hours in the future.  Trying to get the software vendor to explain this has proven difficult because they see it as something they use and not really something for clients. We can use it because we can access the DB to query it. We can even modify some things like adding our own custom tables, views, and UDFs, and in some cases, depending on the table, add triggers.

    What I'd like to try to do is add a PK to this thing so we can get the data from it in the actual order it occurred and not the order the time stamp implies. My concern is that it's an existing table with a LOT-O-ROWS, and because it captures every Update, Insert and Edit command it's highly volatile.

    Thoughts? Ideas? Suggestions? Maybe even a prophecy or 2?

     

    Kindest Regards,

    Just say No to Facebook!
  • If the table is just a heap (you said there's no PK, but that doesn't necessarily mean there's no clustered index), as you know, you'll have to rewrite the whole table to add any kind of clustered index (PK or not).  In that case, you'd likely want to use the create-a-new-table-and-switch-over-to-it approach.  For that, you'd need the disk space and some way to tell new rows in that table from old ones, for rows added while you're copying the bulk of the old rows to the new table.
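
    As a first check -- just a sketch, and dbo.YourAuditTable is a placeholder since we don't have the real table name yet -- sys.indexes will tell you whether it's a heap or already has a clustered index:

    SELECT i.index_id, i.type_desc, i.name
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.YourAuditTable');
    /* index_id 0 / type_desc 'HEAP' = heap; index_id 1 = a clustered index already exists */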

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Well, there are actually numerous ways this can be approached, but most of them are going to depend greatly upon the layout of the current table as well as the data contained within it.

    Still, addressing this specific case without taking everything else into account, I think I hear the following:

    1)  We use a date-time stamp that changes based on the last insert or update as the key for the record -- bad practice.

    2) We want to add a PK (sequential number) to the records to track the order of execution -- good practice

    3) The date-time stamp sometimes has unexplainable future dates -- this needs to be resolved as to how you plan to sequence these. Personally I would sequence them as they come in regardless of the date-time stamp since you have already determined it is not a reliable piece of data.

    4) Now, assuming you want to track the executions -- aka the series of date-time stamps for inserted and updated records -- you have to have a means to uniquely identify a single series of records.  For instance, if Record 1 comes in at 1:00pm and then is updated at 2:00pm, the date-time stamp changes but the sequence number does not -- that is, unless you assign it a new sequence number that catalogs that update (which seems like what one would do), but then you need a way to reference the parent record.

    A secondary solution might be to create a secondary table that tracks the changes with date-time stamps and denotes what was changed -- aka a change log. Then only on inserts does a new record get a sequential identifier and any updates to that record are recorded within the change log table.
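
    Very roughly, the shape of that would be something like the sketch below -- all names are placeholders, and the actual columns would depend on what the vendor's audit table stores:

    CREATE TABLE dbo.AuditRecord (
        iRecordId  bigint IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_AuditRecord PRIMARY KEY,
        dFirstSeen datetime2(3) NOT NULL
            CONSTRAINT DF_AuditRecord_dFirstSeen DEFAULT SYSDATETIME()
        /* ...columns that identify the logical record... */
    );

    CREATE TABLE dbo.AuditRecordChange (
        iChangeId  bigint IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_AuditRecordChange PRIMARY KEY,
        iRecordId  bigint NOT NULL
            CONSTRAINT FK_AuditRecordChange_AuditRecord REFERENCES dbo.AuditRecord (iRecordId),
        dChanged   datetime2(3) NOT NULL
            CONSTRAINT DF_AuditRecordChange_dChanged DEFAULT SYSDATETIME()
        /* ...columns that describe what changed... */
    );

    Inserts get a row in AuditRecord (the sequential identifier); every subsequent update only adds a row to AuditRecordChange that references the parent.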

    If you have any more information to better outline the issue, then perhaps that might be helpful in helping us help you. Otherwise I am currently available to take on assignments 😉 if you would like me to dive into this more deeply.

  • Scott/Dennis - Thanks for such quick replies. I didn't provide the DDL for the table originally, so it's listed below.

    CREATE TABLE [dbo].[TRACK](
    [hRecord] [NUMERIC](18, 0) NULL,
    [dDateIn] [DATETIME] NULL,
    [sUserName] [VARCHAR](120) NULL,
    [sTableName] [VARCHAR](60) NULL,
    [sActionType] [VARCHAR](6) NULL,
    [sSqlLine] [VARCHAR](2000) NULL
    ) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [I_Track_1] ON [dbo].[TRACK]
    (
    [sUserName] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [I_Track_2] ON [dbo].[TRACK]
    (
    [dDateIn] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    GO

    Kindest Regards,

    Just say No to Facebook!
  • Okay, you did not answer the question I posed about whether or not the date-time stamp gets changed based on an Insert versus an Update -- that was an assumption I was making. Knowing that helps greatly in understanding the problem.

    So yes, you will definitely need to add a column to the current table as Scott outlines, although I have done this without creating a new table; it greatly depends on the size of the table and the nature of the data -- so again, on parameters that we are not aware of.

    Also, what is hRecord? This seems like it should be a unique identifier for each record, perhaps combined with the date-time stamp. Everything else within it looks like a typical change-log table.
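
    A quick check along these lines (just a sketch) would show whether hRecord is ever populated and whether it repeats:

    SELECT COUNT_BIG(*)                AS total_rows,
           COUNT_BIG(hRecord)          AS non_null_hRecord,
           COUNT_BIG(DISTINCT hRecord) AS distinct_hRecord
    FROM dbo.TRACK;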

  • I don't know if the time changes based on the type of change.  I have no idea why some items are future dated.  I do know from limited testing that when I do something within the application, regardless of the type of change, it gets recorded with my actual time, my time zone.  Since there are future-dated items in it, we can only speculate that this is true for all DML commands.  If, however, I make a change via SSMS, that doesn't get recorded.  The capturing of changes is being managed by the software app itself, so any changes done outside the app aren't captured.  I know this is a BAD design, but I didn't make it; I'm just stuck with it.  When we asked the vendor how the time stamp works, they said it is done based on the client's time zone, but clearly that's not completely true since there are future-dated items.

    As for the hRecord, I too initially thought that would be a key until I looked at it. This column is NULL in every row. My guess is it was something planned to serve as a key but for whatever reason it didn't get implemented.

    In terms of the amount of data, it varies. As of right now there are 94,986,439 rows, and the row with the oldest time stamp is 45 days old.  I can tell you it's not always 45 days. At one point there was 3 months' worth. We have asked the vendor how this works and what determines how much data is retained, and we were told 3 months, but this is clearly not the case.  One time when I checked there was close to 6 months' worth, but that was in the first few months that this thing was enabled.  So we can't make any assumptions about how much data it will always retain.
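
    For anyone wanting to check the volume and retention themselves, something along these lines works (sketch only):

    SELECT MIN(dDateIn) AS oldest_row, MAX(dDateIn) AS newest_row, COUNT_BIG(*) AS total_rows
    FROM dbo.TRACK;

    SELECT CAST(dDateIn AS date) AS track_date, COUNT_BIG(*) AS rows_that_day
    FROM dbo.TRACK
    GROUP BY CAST(dDateIn AS date)
    ORDER BY track_date;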

    Apologies if I missed something, but I'm trying to respond to the posts while also in a virtual class.

    Kindest Regards,

    Just say No to Facebook!
  • It's nice that there's already an index on dDateIn. I would suggest something like this:

    --Setup

    (A1) Create a new table, with a usable clustered index. Add data compression to the table (clus index) if it:

    (1) is available, (2) is allowed, and (3) would save significant space (it usually does!).

    Use the proc sys.sp_estimate_data_compression_savings to see how much space would be saved, e.g. EXEC sys.sp_estimate_data_compression_savings 'dbo', 'TRACK', NULL, NULL, 'PAGE';

    --Process

    (B1) Capture the process start datetime (@max_date_to_copy).

    <Loop>

    (B2) Copy rows from the existing table into the new table, based on dDateIn:

    (1) Copy TOP (some number), where (some number) is chosen so that it is as high as reasonably possible while still allowing SQL to:

    (A) Use the dDateIn index to look up the rows (check the query plan); and

    (B) Run the INSERTs reasonably quickly.

    For example, something roughly like below (uncomment the sequence code if you want a unique key value):

    /*CREATE SEQUENCE dbo.TRACK_sequence AS bigint START WITH 1 INCREMENT BY 1 NO CYCLE;*/
    GO
    CREATE TABLE [dbo].[TRACK_NEW](
    [hRecord] [NUMERIC](18, 0) NULL,
    [dDateIn] [DATETIME] NULL,
    /*[iSequence] bigint NOT NULL CONSTRAINT TRACK__DF_iSequence DEFAULT NEXT VALUE FOR dbo.TRACK_sequence,*/
    [sUserName] [VARCHAR](120) NULL,
    [sTableName] [VARCHAR](60) NULL,
    [sActionType] [VARCHAR](6) NULL,
    [sSqlLine] [VARCHAR](2000) NULL
    ) ON [PRIMARY];

    CREATE /*UNIQUE*/ CLUSTERED INDEX [TRACK__CL] ON dbo.TRACK_NEW
    ( dDateIn /*, iSequence */) WITH ( DATA_COMPRESSION=PAGE, DROP_EXISTING=OFF, ONLINE=OFF, SORT_IN_TEMPDB=ON, FILLFACTOR=98 ) ON [PRIMARY];

    GO

    DECLARE @dDateIn_start datetime;
    DECLARE @dDateIn_end datetime;
    DECLARE @max_date_to_copy datetime;
    DECLARE @rows_at_a_time int;

    /* change any/all of these values as needed (for example, to restart for a given datetime) */
    /* !!For best performance, check the query plan to verify that SQL does use the nonclus index to */
    /* do the INSERT INTO dbo.TRACK_NEW based on this value of @rows_at_a_time!! */
    SET @rows_at_a_time = 100000;
    SET @max_date_to_copy = DATEADD(MS, -3, CAST(CAST(GETDATE() AS date) AS datetime));
    SET @dDateIn_start = '19000101';

    WHILE 1 = 1
    BEGIN
    IF @dDateIn_start >= @max_date_to_copy
    BREAK;

    /* based on dDateIn order, find the dDateIn value @rows_at_a_time values ahead of the current value; */
    /* SQL can use the nonclus index to do this, so it should be very fast */
    SELECT TOP (@rows_at_a_time) @dDateIn_end = dDateIn
    FROM dbo.TRACK
    WHERE dDateIn >= @dDateIn_start
    ORDER BY dDateIn;
    IF @@ROWCOUNT = 0
    BREAK;

    IF @dDateIn_end > @max_date_to_copy
    SET @dDateIn_end = @max_date_to_copy;

    /* Since there is no unique key, the number of rows copied may be somewhat more than the specified @rows_at_a_time */
    /* !!For best performance, check the query plan to verify that SQL does use the nonclus index to */
    /* do the INSERT INTO dbo.TRACK_NEW based on this value of @rows_at_a_time!! */
    INSERT INTO dbo.TRACK_NEW /*(...)*/
    SELECT * /*..., ...,*/FROM dbo.TRACK
    WHERE dDateIn >= @dDateIn_start AND dDateIn <= @dDateIn_end
    ORDER BY dDateIn;

    SET @dDateIn_start = DATEADD(MS, 3, @dDateIn_end);

    /*WAITFOR DELAY '00:00:01'*/
    END /*WHILE*/
    /* at this point, all rows before today should be copied */
    /* now, copy rows from today from time 00:00:00.003 to, say, five minutes before current time */
    /* then, create the other nonclus index: */
    CREATE NONCLUSTERED INDEX [TRACK__IX_sUserName] ON [dbo].[TRACK_NEW]
    (
    [sUserName]
    ) WITH ( DATA_COMPRESSION=PAGE, DROP_EXISTING=OFF, ONLINE=OFF, SORT_IN_TEMPDB=ON, FILLFACTOR=90 ) ON [PRIMARY];
    GO

    /* finally, copy all rows from five minutes ago to end of table, then rename the tables */
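
    /* a rough sketch of that final catch-up copy; since there is no unique key, rows that  */
    /* share the boundary dDateIn value need care (>= can duplicate them, > can miss late   */
    /* arrivals stamped with that same value), so verify the boundary rows afterwards:      */
    DECLARE @last_copied_dDateIn datetime;
    SELECT @last_copied_dDateIn = MAX(dDateIn) FROM dbo.TRACK_NEW;

    INSERT INTO dbo.TRACK_NEW /* list columns explicitly if the optional iSequence column was added */
    SELECT *
    FROM dbo.TRACK
    WHERE dDateIn > @last_copied_dDateIn
    ORDER BY dDateIn;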

    EXEC sys.sp_rename 'dbo.TRACK', 'TRACK_ORIGINAL'

    EXEC sys.sp_rename 'dbo.TRACK_NEW', 'TRACK'

    /*Verify no rows are missing from the last few minutes, then, whenever you are comfortable with the change,

    drop table dbo.TRACK_ORIGINAL.*/

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • My advice is actually going to be the opposite of a lot of others here - I wouldn't add any columns to the table or change the structure. The reason being that if the application that reads and writes to that table is expecting things to exist in a specific way, adding a column may result in the app misbehaving. For example, if the application is doing a SELECT * on the table and expected 10 columns back and you add a column, the app may not handle that scenario.

    If you really want to have a table for this, what I would do is make a new table and have a trigger on the current tracking table that copies the data from INSERTED into your new table. It doubles the disk space used by that auditing table, but it reduces the risk that you are going to break the app.
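
    A minimal sketch of that trigger idea, using the DDL posted above and hypothetical names (dbo.TRACK_COPY, dbo.trg_TRACK_copy) -- whether the vendor tolerates a trigger on their table is a separate question:

    CREATE TABLE dbo.TRACK_COPY(
        iSeq        bigint IDENTITY(1,1) NOT NULL CONSTRAINT PK_TRACK_COPY PRIMARY KEY CLUSTERED,
        hRecord     numeric(18, 0) NULL,
        dDateIn     datetime NULL,
        sUserName   varchar(120) NULL,
        sTableName  varchar(60) NULL,
        sActionType varchar(6) NULL,
        sSqlLine    varchar(2000) NULL
    );
    GO
    CREATE TRIGGER dbo.trg_TRACK_copy
    ON dbo.TRACK
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        /* copy every new audit row into our own table, which has a real sequential key */
        INSERT INTO dbo.TRACK_COPY (hRecord, dDateIn, sUserName, sTableName, sActionType, sSqlLine)
        SELECT hRecord, dDateIn, sUserName, sTableName, sActionType, sSqlLine
        FROM inserted;
    END;
    GO

    The IDENTITY column then gives you the actual arrival order, independent of whatever dDateIn says.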

    My advice with ANY 3rd party vendor tool is to not touch any of their base tables. If you need to add/remove anything, do it in a new table and have a trigger on it. If you want to keep the disk space usage lower (ie not duplicate the whole table), calculate a hash on the row and use that hash as a map between the base table and your custom one which stores additional details.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • My main advice would also be not to modify the table structure unless you really need to (I deliberately put the code that changes the structure in comments, in case you wanted to use it).

    However, I would change the date index to clustered and I would compress the table.  Those should be transparent to their app: the compression certainly will be.
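
    Something roughly like this is what I mean -- a sketch only; it rebuilds the entire heap, so check that your edition supports page compression and pick a quiet window first:

    CREATE CLUSTERED INDEX [I_Track_2] ON dbo.TRACK
    ( dDateIn ASC )
    WITH ( DROP_EXISTING = ON, DATA_COMPRESSION = PAGE, SORT_IN_TEMPDB = ON, FILLFACTOR = 98 )
    ON [PRIMARY];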

     

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I agree with Brian... check with the 3rd party before you mess with one of their tables.  While this is just a simple "audit" table that won't even stand up in a court of law, you don't want to void a warranty or any possible future help or prevent future updates, etc.

    Once you find out if you can use and populate the column, and considering what the table is used for and its current size, I'd set it up for automatic monthly table partitioning, possibly with yearly (or monthly) filegroups so that you can set the older months to READ ONLY to help relieve backups and a couple of other things.
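
    A bare-bones sketch of the partitioning objects (the names and boundary dates are examples only; the real setup would map filegroups and add new monthly boundaries automatically):

    CREATE PARTITION FUNCTION pf_TrackMonthly (datetime)
    AS RANGE RIGHT FOR VALUES ('20240101', '20240201', '20240301');

    CREATE PARTITION SCHEME ps_TrackMonthly
    AS PARTITION pf_TrackMonthly ALL TO ([PRIMARY]);

    /* the clustered index (or the new copy of the table) would then be created */
    /* ON ps_TrackMonthly (dDateIn) instead of ON [PRIMARY] */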

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • How to Approach Adding a Primary Key to an Existing Table

    In relational databases, a primary key is a crucial element that uniquely identifies each record in a table. It ensures data integrity by preventing duplicate records and facilitating efficient data retrieval. However, there may be situations where a primary key was not initially defined for a table, or the need for one arises due to changing data requirements. Adding a primary key to an existing table is a task that requires careful planning and consideration. This article outlines a step-by-step approach to safely and effectively implement a primary key in an existing table.

    1. Assess the Current Table Structure and Data

    Before adding a primary key, it is essential to thoroughly understand the current structure and contents of the table. This includes:

    Reviewing the Data: Examine the table's data to ensure there are no duplicate rows or NULL values, as these can cause issues when creating a primary key.

    Identifying Potential Key Columns: Determine which columns can serve as a unique identifier. The ideal primary key column should uniquely identify each row and be as short and stable as possible.

    Data Analysis: Check for consistency and cleanliness in the potential key column(s). If the table lacks a clear candidate for a primary key, consider creating a new column specifically for this purpose.
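
    For example, generic checks along these lines (table_name and column_name are placeholders) will surface duplicate and NULL values in a candidate key column:

    -- candidate key values that repeat
    SELECT column_name, COUNT(*) AS duplicate_count
    FROM table_name
    GROUP BY column_name
    HAVING COUNT(*) > 1;

    -- rows where the candidate key is NULL
    SELECT COUNT(*) AS null_count
    FROM table_name
    WHERE column_name IS NULL;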

    2. Prepare the Table for the Primary Key

    Once you have identified the potential primary key column(s), prepare the table to ensure a smooth transition:

    Clean the Data: Remove any duplicate rows or NULL values in the columns intended to be the primary key. This step is crucial because primary keys must contain unique, non-null values.

    Create a New Column if Necessary: If the table lacks a suitable column for a primary key, add a new column. This column can be populated with unique values, such as an auto-incrementing integer, to serve as the primary key.
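
    For instance, in SQL Server the new column could be added like this (table_name and row_id are placeholders):

    -- note: on a large table this is a size-of-data operation, since every existing row is populated
    ALTER TABLE table_name ADD row_id INT IDENTITY(1,1) NOT NULL;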

    3. Backup the Database

    Before making any structural changes, it's vital to back up the database. This backup provides a recovery point in case anything goes wrong during the primary key addition process.

    4. Test the Changes in a Development Environment

    Implement the changes in a non-production environment to identify potential issues. This step allows you to test the impact of the new primary key on existing queries, indexes, and application logic. It's also an opportunity to optimize the new key structure if needed.

    5. Implement the Primary Key in the Production Database

    After successfully testing the changes, proceed to implement the primary key in the production environment:

    Add the Primary Key Constraint: Use an appropriate SQL command to add the primary key constraint to the table. For instance, in SQL Server, you might use:

    ALTER TABLE table_name ADD PRIMARY KEY (column_name);

    Ensure that the command is tailored to your specific database management system (DBMS).

    Monitor the Change: After implementing the primary key, closely monitor the database for any unusual behavior or performance issues. This includes checking for any application errors that may arise from the new constraint.

    6. Update Documentation and Application Logic

    Once the primary key is in place, update all relevant documentation and application logic. This includes:

    Database Schema Documentation: Reflect the new primary key in your database schema documentation.

    Application Code: Update any code that interacts with the database to recognize and leverage the new primary key. This step is crucial to avoid runtime errors or logic issues in your applications.

    7. Plan for Regular Maintenance and Monitoring

    Finally, establish a plan for regular maintenance and monitoring of the table:

    Index Optimization: Review and optimize indexes to ensure they complement the new primary key and do not degrade performance.

    Data Integrity Checks: Implement regular checks to ensure data integrity, including monitoring for any changes that could introduce duplicates or NULL values into the primary key column.

    Conclusion

    Adding a primary key to an existing table is a critical task that enhances data integrity and query performance. By carefully planning the process, thoroughly testing changes, and ensuring robust documentation and monitoring, you can successfully implement a primary key without disrupting existing systems. This proactive approach not only safeguards the data but also optimizes the overall database performance.

  • This was removed by the editor as SPAM

  • Jalaal Akber wrote:

    Index Optimization: Review and optimize indexes to ensure they complement the new primary key and do not degrade performance. Data Integrity Checks: Implement regular checks to ensure data integrity, including monitoring for any changes that could introduce duplicates or NULL values into the primary key column.

    The above bit of advice is useless. The primary key, by definition, must be unique, so you will never have duplicates. NULLS are also impossible in a primary key column by definition of the primary key. The primary key is there to ensure that each row is unique. Microsoft docs say that for a primary key - "All columns defined within a primary key constraint must be defined as not null" and "each combination of values from all the columns in the primary key constraint definition must be unique". Link - https://learn.microsoft.com/en-us/sql/relational-databases/tables/primary-and-foreign-key-constraints?view=sql-server-ver16

    The quoted post sounds like it was written by an AI and nobody read and understood the reply before passing the information along...

    I do agree with ScottPletcher about some changes having no impact on the application, but there is also always risk with making changes that go against what the vendor set up and recommends. Like Jeff said, it could void your warranty and support. I've seen some vendors where adding an index on one of their tables means you lose support, even though a new index shouldn't cause any issues except a small performance hit on CRUD operations. Mind you, even adding a trigger can ruin support, so my advice of basically cloning the data with a trigger can be bad too.

    Using an SSIS job to migrate the data on a schedule (unfortunately it wouldn't be real-time that way) could be an option, but I'm pretty sure that is not a good option for your specific scenario... CDC (Change Data Capture) may be an option (see the sketch at the end of this post), though again it may result in loss of support/warranty.

    I have one system where, if my database is in FULL recovery mode, I lose support for "running in an unsupported configuration". The database must be in simple recovery mode and I have no clue why. I argued with the vendor on that until they stopped replying, and I decided support was more important than a good RPO. The app owner agreed.
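
    For reference, enabling CDC on that table would look roughly like this -- a sketch only, assuming your edition supports it, SQL Agent is running for the capture/cleanup jobs, and the vendor doesn't object; @role_name = NULL just means no gating role:

    EXEC sys.sp_cdc_enable_db;
    GO
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'TRACK',
         @role_name     = NULL;
    GO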

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
