Identity column versus RowID

Question

Identity column versus RowID

rash3554

SSCommitted

Points: 1984
More actions
September 7, 2016 at 1:38 pm

#313202

Hi there,
I have a table with 100 million records containing Employee information without a Unique key.
Create Table Employee
(
EmpName varchar(100),
EmpAddress1 varchar(max),
EmpAddress2 varchar(1000),
EmpAddress2 varchar(500),
EmpCity varchar(100),
EmpState varchar(50),
StartDate varchar(100),
EndDate varchar 100)
)
I wanted to create a full text search and hence needed to create a unique key.
I thought of 2 different ways of doing it.
1A. ALTER TABLE EMPLOYEE ADD RowID [int] NULL
1B. DECLARE @id INT
SET @id = 0
UPDATE [MYDB].[dbo].[Employee]
SET @id = RowID = @id + 1
OPTION ( MAXDOP 1 )
1C: ALTER TABLE [dbo].[Employee]
ALTER COLUMN RowID [int] NOT NULL
GO
1D: CREATE UNIQUE NONCLUSTERED INDEX [uixEmp-RowID]
ON [dbo].[Employee] (RowID)
GO
OR
Create Table Employee_2
(
ID int identity(1,1) not null,
EmpName varchar(100);
EmpAddress1 varchar(max);
EmpAddress2 varchar(1000);
EmpAddress2 varchar(500);
EmpCity varchar(100);
EmpState varchar(50);
StartDate Datetime,
EndDate Datetime
)
INSERT INTO Employee_2
(EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState,StartDate,EndDate)
SELECT EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState,StartDate,EndDate FROM dbo.Employee
I am not sure which one is better.
Any help is appreciated.
Thanks
MR

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Michael L John One Orange Chip Points: 27079 More actions · Answer 1

rash3554 (9/7/2016)
Hi there,
I have a table with 100 million records containing Employee information without a Unique key.
Create Table Employee
(
EmpName varchar(100);
EmpAddress1 varchar(max);
EmpAddress2 varchar(1000);
EmpAddress2 varchar(500);
EmpCity varchar(100);
EmpState varchar(50);
)
I wanted to create a full text search and hence needed to create a unique key.
I thought of 2 different ways of doing it.
1A. ALTER TABLE EMPLOYEE ADD RowID [int] NULL
1B. DECLARE @id INT
SET @id = 0
UPDATE [MYDB].[dbo].[Employee]
SET @id = RowID = @id + 1
OPTION ( MAXDOP 1 )
1C: ALTER TABLE [dbo].[Employee]
ALTER COLUMN RowID [int] NOT NULL
GO
1D: CREATE UNIQUE NONCLUSTERED INDEX [uixEmp-RowID]
ON [dbo].[Employee] (RowID)
GO
OR
Create Table Employee_2
(
ID int identity(1,1) not null,
EmpName varchar(100);
EmpAddress1 varchar(max);
EmpAddress2 varchar(1000);
EmpAddress2 varchar(500);
EmpCity varchar(100);
EmpState varchar(50);
)
INSERT INTO Employee_2
(EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState)
SELECT EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState FROM dbo.Employee
I am not sure which one is better.
Any help is appreciated.
Thanks
MR

In both cases, the names are somewhat vague. What's wrong with "Employee_ID"?

The code in 1B does not look correct.

How about option 3? Depending upon the number of rows, and the downtime you can withstand:

ALTER TABLE Employee ADD Employee_ID int IDENTITY(1,1)

GO

CREATE UNIQUE NONCLUSTERED INDEX [uixEmployee]

ON [dbo].[Employee] (Employee_ID)

GO

Otherwise, I would probably go with option 2. You may be able to simplify this.

SELECT *

INTO Employee_Tmp

FROM Employee

GO

ALTER TABLE Employee_Tmp ADD Employee_ID int IDENTITY(1,1)

Go

sp_rename 'Employee', 'Employee_Old'

GO

sp_rename 'Employee_temp', 'Employee'

GO

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

tshad SSCertifiable Points: 5905 More actions · Answer 2

I agree.

I usually have the ID have the same name as the table (Employee or Employees table and EmployeeID). I wouldn't put the underscore to separate the ID from the Employee as you already have separation with the caps.

I also agree with option 2 (but watch out for Celko - he might be lurking and hates identities) :-D.

We have used identities on hundreds of tables with many clients. I haven't seen a client yet who does not use identities in there databases.

Grant Fritchey SSC Guru Points: 399379 More actions · Answer 3

Celko or not, just adding an identity column to a table doesn't address the business logic. Do you have multiple rows with the same employee? There must be a logically unique constraint on most tables, even if you don't use it as the primary key or the clustered index.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

rash3554 SSCommitted Points: 1984 More actions · Answer 4

Sorry needed to edit table columns. I had forgotten to add to columns.

So there are multiple records for same employee:

INSERT INTO dbo.Employees(EmpName ,EmpAddress1 ,EmpAddress2 ,EmpAddress2 ,

EmpCity ,EmpState ,StartDate,EndDate )

Values ('Ann','1 Main Street',NULL,NULL,'SALEM','PA','December 1990','2002-06-01')

INSERT INTO dbo.Employees(EmpName ,EmpAddress1 ,EmpAddress2 ,EmpAddress2 ,

EmpCity ,EmpState ,StartDate,EndDate )

Values ('Ann','122 Box Street',NULL,NULL,'DANBURY','CT','2002-06-02','Present')

tshad SSCertifiable Points: 5905 More actions · Answer 5

Depending on the logic, you might want to have multiple tables.

Just having a unique id to have one is not normally useful. What does it signify. In your case, it seems to signify a name and an address. But that means that you have Ann in two addresses. How do you know if this is the same "Ann". What if there are two "Ann"'s. How do you differentiate from the two.

Not knowing your use, you might want to have two tables.

Create Table Employees

(

EmployeeID int identity(1,1) not null,

EmpName varchar(100);

Age int,

StartDate DateTime not null,

EndDate DateTime,

)

Create Table Addresses

(

AddressID int identity(1,1) not null,

EmployeeID int not null,

EmpAddress1 varchar(max);

EmpAddress2 varchar(1000);

EmpAddress2 varchar(500);

EmpCity varchar(100);

EmpState varchar(50);

StartDate Datetime,

EndDate Datetime

)

Put your information specific to the employee in the employee in the employee table and the information specific to the addresses in the addresses table.

Now you can put your EmployeeID in other tables as foreign keys.

Michael L John One Orange Chip Points: 27079 More actions · Answer 6

tshad (9/8/2016)
Depending on the logic, you might want to have multiple tables.
Just having a unique id to have one is not normally useful. What does it signify. In your case, it seems to signify a name and an address. But that means that you have Ann in two addresses. How do you know if this is the same "Ann". What if there are two "Ann"'s. How do you differentiate from the two.
Not knowing your use, you might want to have two tables.
Create Table Employees
(
EmployeeID int identity(1,1) not null,
EmpName varchar(100);
Age int,
StartDate DateTime not null,
EndDate DateTime,
)
Create Table Addresses
(
AddressID int identity(1,1) not null,
EmployeeID int not null,
EmpAddress1 varchar(max);
EmpAddress2 varchar(1000);
EmpAddress2 varchar(500);
EmpCity varchar(100);
EmpState varchar(50);
StartDate Datetime,
EndDate Datetime
)
Put your information specific to the employee in the employee in the employee table and the information specific to the addresses in the addresses table.
Now you can put your EmployeeID in other tables as foreign keys.

There may be an argument about this, but this structure looks a bit flawed.

For starters, an address is an attribute of an employee. An employee is not an attribute of an address. That is what you are proposing.

The employee table should have the foreign key to the address table.

With this structure, you have also created a one-to-one relationship. I suspect that there would a one to many relationship, an employee can have many addresses.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

Smendle Hall of Fame Points: 3987 More actions · Answer 7

Michael L John (9/8/2016)
tshad (9/8/2016)
Depending on the logic, you might want to have multiple tables.
Just having a unique id to have one is not normally useful. What does it signify. In your case, it seems to signify a name and an address. But that means that you have Ann in two addresses. How do you know if this is the same "Ann". What if there are two "Ann"'s. How do you differentiate from the two.
Not knowing your use, you might want to have two tables.
Create Table Employees
(
EmployeeID int identity(1,1) not null,
EmpName varchar(100);
Age int,
StartDate DateTime not null,
EndDate DateTime,
)
Create Table Addresses
(
AddressID int identity(1,1) not null,
EmployeeID int not null,
EmpAddress1 varchar(max);
EmpAddress2 varchar(1000);
EmpAddress2 varchar(500);
EmpCity varchar(100);
EmpState varchar(50);
StartDate Datetime,
EndDate Datetime
)
Put your information specific to the employee in the employee in the employee table and the information specific to the addresses in the addresses table.
Now you can put your EmployeeID in other tables as foreign keys.
There may be an argument about this, but this structure looks a bit flawed.
For starters, an address is an attribute of an employee. An employee is not an attribute of an address. That is what you are proposing.
The employee table should have the foreign key to the address table.
With this structure, you have also created a one-to-one relationship. I suspect that there would a one to many relationship, an employee can have many addresses.

Nothing a couple of well placed unique constraints couldn't fix....

tshad SSCertifiable Points: 5905 More actions · Answer 8

I think that is what the original design was doing - many addresses for one employee.

It looks like the user wanted to keep track of each address an employee was at and the duration of the stay at each address.

So yes, there would be a one to many relationship. Your way, you could have many addresses but an employee could only be at one address.

In that case, I would even have an address table. I would just leave it the way the user had it, with the user address in the employee table.

To go further, if you only knew that there would be two address (home and billing, for example), you could do it either way.

Chris Wooding SSCarpal Tunnel Points: 4615 More actions · Answer 9

The structure proposed by tshad allows employees to flat-share. If you want overall flexibility, you need 3 tables; Employee, Address and EmployeeAddressLink. The start and end dates would then belong on the link table (along with address type, if you have separate billing and residential addresses).

tshad SSCertifiable Points: 5905 More actions · Answer 10

That would be correct for a many to many relationship and would be applicable only if you also want take into account that more than one employee can be at the same address, which would be preferable to multiple instances of the same address. And would be necessary, in this case, since the start and end dates would be different for two employees at the same address.