Changing from Non-IDENTITY to IDENTITY and vice versa
In my career I have come across many situations where it was required to include/remove the Identity Property to/from a column at a latter stage of the development cycle due to some flaw.
If a sequence number is manually maintained, it would require the application to have a separate logic. The change-over will require extra effort and additional over heads. The obvious workaround for this would be to introduce an Identity Column to the table which will not require any additional effort or cost. The main disadvantage in it is that it cannot be updated.
Before SQL Server 2005 was introduced there were only two methods to add an Identity Column to a table through TSQL. There is no DDL statement that assists with altering a column to an Identity and vice versa. The two methods are as follows.
I will be using the following table structure in the explanation:
CREATE TABLE SOURCE_TB
(
ID INT NOT NULL
,DATE DATETIME
,COST MONEY
)
100,000 records were inserted to the table SOURCE_TB when testing was carried out
Method 1 - Adding a new Identity column to the existing table.
The thinking behind this is to introduce a new column (SOURCE_TB.ID_II) to the SOURCE_TB table with the Identity and remove the column SOURCE_TB.ID.
Following are the steps that could be followed:
Add a new identity field SOURCE_TB.ID_II
Drop all related objects ex. Indexes, Foreign Keys, Primary Keys that have been created on SOURCE_TB.ID
Drop the field SOURCE_TB.ID
Change the name of SOURCE_TB.ID_II to SOURCE_TB.ID
Recreate all Foreign Keys, Primary Keys and other constraints that were dropped in step 2
The Code
ALTER TABLE SOURCE_TB ADD ID_II INT IDENTITY(1,1)
ALTER TABLE SOURCE_TB DROP COLUMN ID
EXEC sp_rename 'SOURCE_TB.ID_II' ,'ID','COLUMN'
The following counters were gathered from the profiler:
Duration | CPU | Reads | Writes | |
Adding the column Source.ID_II | 7,957,414 | 7,422 | 38,657 | 451 |
Dropping the Column Source.ID | 20,089 | 0 | 28 | 2 |
Rename the Column Source.ID_II to Source.ID | 296,194 | 0 | 194 | 0 |
Disadvantages
Depending on the size of the row there could be a possibility of a page split
The columns will not be in the original order when SOURCE_TB is created.
Originally the order of the columns would be ID, Date and Cost and after including the Identity column the order will change to Date, Cost and ID. Assume that the application code didn’t explicitly specify the column name when retrieving data from the Data Base through a record set. As the fields are not retrieved in the same order as before the application may generate an error.If there are any gaps in SOURCE_TB.ID field, this could create a referential integrity problem if it is a referenced key
The work around for this would be a cumbersome task:Before the SOURCE_TB.ID is dropped you need to identify existing gaps
After the SOURCE_TB.ID_II is renamed to SOURCE_TB.ID an update needs to be done on all the referencing tables to avoid any integrity issues.
If all the fields do not reside in a single page the operation would fail.
If the field SOURCE_TB.ID was a clustered Index and you plan to continue with it, it will take a long time to create the Index
There will be a down time in the Production Environment when implementing this.
Advantages
Only a column change will take place
System dependencies will still be in tack as no changes have taken place on the SOURCE_TB
It will not require a large amount of additional hard disk space to implement this compared to the next option.
Method 2 - Create the destination table with the same schema as the source table
The thinking behind this is to transfer all the data from SOURCE_TB to DESTINATION_TB and rename the DESTINATION_TB to SOURCE_TB.
Following are the steps that could be followed:
Create DESTINATION_TB table with the same schema as SOURCE_TB
Transfer the data from SOURCE_TB to DESTINATION_TB
Drop all the Foreign Keys referencing to SOURCE_TB if any
Alter all the objects with schemabinding if any
Drop the table SOURCE_TB
Rename the table DESTINATION_TB to SOURCE_TB
Recreate all the Foreign Keys to refer back to SOURCE_TB
Recreate all the Indexes on the SOURCE_TB
Recreate functions and views with schemabinding
Create all other constraints
The same steps could be followed if the change is done by the Enterprise Manager Tool.
The Code
CREATE TABLE DESTINATION_TB
(
ID INT IDENTITY(125006,1) NOT NULL
,DATE DATETIME
,COST MONEY
)
INSERT INTO DESTINATION_TB (ID, DATE, COST) SELECT ID ,DATE, COST FROM SOURCE_TB
DROP TABLE SOURCE_TB
EXEC sp_rename 'DESTINATION_TB' ,'SOURCE_TB'
Following counters were gathered from the profiler
Duration | CPU | Reads | Writes | |
---|---|---|---|---|
Creation of the Destination table | 855 | 0 | 64 | 17 |
Insert to Destination Table from Source | 3,856,785 | 719 | 115,957 | 396 |
Drop the Source table | 6809 | 0 | 279 | 17 |
Renaming the Destination table to Source | 362,363 | 0 | 171 | 0 |
Disadvantages
If the SOURCE_TB is a large table you have to be concern of the disk space availability.
As we are going to copy all the data from one table to another, you need to be aware of the temporary disk space requirement. So you will need to know the exact space required for the table.As the data is physically moved to the Destination_TB it may require just more than a simple “Insert”
You may need a batch wise “Insert” to the Destination_TB. This will avoid any locks from taking place and daily operation can proceed without any effect.There will be a down time in the Production Environment
Advantages
All the columns will be of the same sequence as the original Source_TB
In SQL Server 2005 you can do this in a much efficient manner.
It is expected, that you have the basic understanding of partitioning. Have a look at the following links:
http://www.databasejournal.com/features/mssql/article.php/3640891
The most important and interesting aspect of SQL Server 2005, is that a table belongs to a partition by default even though the table is not physically partitioned.
You can verify this by going through the sys.partitions table
SELECT * FROM sys.partitions
WHERE [object_id] = OBJECT_ID('SOURCE_TB')
Following are the steps:
Drop all the Foreign Keys referencing to SOURCE_TB if any
Alter all the objects with schemabinding if any
Drop all the Indexes from SOURCE_TB
Create the new table DESTINATION_TB with same schema as the SOURCE_TB
SWITCH the data between the tables
Drop the table Source_TB
Rename the table Destination_TB to Sourse_TB
Recreate all the Foreign Keys to refer to SOURCE_TB
Recreate all the Indexes on SOURCE_TB
Recreate functions, views with SCHEMABINDING
Create all other constraints
The code
CREATE TABLE DESTINATION_TB
(
ID INT IDENTITY(125006,1) NOT NULL
,DATE DATETIME
,COST MONEY
)
ALTER TABLE SOURCE_TB SWITCH TO DESTINATION_TB
DROP TABLE SOURCE_TB
EXEC sp_rename 'DESTINATION_TB' ,'SOURCE_TB'
Following counters were gathered from the profiler:
Duration | CPU | Reads | Writes | |
---|---|---|---|---|
Creation of the Destination table | 0 | 0 | 64 | 17 |
Do the switching of the partitions | 2 | 0 | 54 | 0 |
Drop the Source table | 1 | 16 | 100 | 0 |
Renaming the Destination table to Source | 259 | 140 | 519 | 2 |
What you need to know
At the time of making the switch, both source and destination tables should be having the same schema (This does not apply to the identity column).
No data will be physically moved from one location to another but the pointers will change, which sorts a lot of problem, locks from taking place.
Each time you SWITCH data from one to another partition, the data from the source partition will be zero.
Disadvantages
It’s only available in Enterprise Edition and Developer Edition.
Sysdependencies will be effected as the object ID’s will be swapped.
All objects with schemabinding will have to be altered.
All Foreign Keys will have to be created to repoint to SOURCE_TB again.
Advantages
Requires the least down time in the Production Environment compared to the previous options
Conclusion
Taking the resource usage and cost into consideration, using table partitioning will be the safest and best option.