Identity column versus RowID

  • Hi there,

    I have a table with 100 million records containing Employee information without a Unique key.

    Create Table Employee

    (

    EmpName varchar(100),

    EmpAddress1 varchar(max),

    EmpAddress2 varchar(1000),

    EmpAddress2 varchar(500),

    EmpCity varchar(100),

    EmpState varchar(50),

    StartDate varchar(100),

    EndDate varchar 100)

    )

    I wanted to create a full text search and hence needed to create a unique key.

    I thought of 2 different ways of doing it.

    1A. ALTER TABLE EMPLOYEE ADD RowID [int] NULL

    1B. DECLARE @id INT

    SET @id = 0

    UPDATE [MYDB].[dbo].[Employee]

    SET @id = RowID = @id + 1

    OPTION ( MAXDOP 1 )

    1C: ALTER TABLE [dbo].[Employee]

    ALTER COLUMN RowID [int] NOT NULL

    GO

    1D: CREATE UNIQUE NONCLUSTERED INDEX [uixEmp-RowID]

    ON [dbo].[Employee] (RowID)

    GO

    OR

    Create Table Employee_2

    (

    ID int identity(1,1) not null,

    EmpName varchar(100);

    EmpAddress1 varchar(max);

    EmpAddress2 varchar(1000);

    EmpAddress2 varchar(500);

    EmpCity varchar(100);

    EmpState varchar(50);

    StartDate Datetime,

    EndDate Datetime

    )

    INSERT INTO Employee_2

    (EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState,StartDate,EndDate)

    SELECT EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState,StartDate,EndDate FROM dbo.Employee

    I am not sure which one is better.

    Any help is appreciated.

    Thanks

    MR

  • rash3554 (9/7/2016)


    Hi there,

    I have a table with 100 million records containing Employee information without a Unique key.

    Create Table Employee

    (

    EmpName varchar(100);

    EmpAddress1 varchar(max);

    EmpAddress2 varchar(1000);

    EmpAddress2 varchar(500);

    EmpCity varchar(100);

    EmpState varchar(50);

    )

    I wanted to create a full text search and hence needed to create a unique key.

    I thought of 2 different ways of doing it.

    1A. ALTER TABLE EMPLOYEE ADD RowID [int] NULL

    1B. DECLARE @id INT

    SET @id = 0

    UPDATE [MYDB].[dbo].[Employee]

    SET @id = RowID = @id + 1

    OPTION ( MAXDOP 1 )

    1C: ALTER TABLE [dbo].[Employee]

    ALTER COLUMN RowID [int] NOT NULL

    GO

    1D: CREATE UNIQUE NONCLUSTERED INDEX [uixEmp-RowID]

    ON [dbo].[Employee] (RowID)

    GO

    OR

    Create Table Employee_2

    (

    ID int identity(1,1) not null,

    EmpName varchar(100);

    EmpAddress1 varchar(max);

    EmpAddress2 varchar(1000);

    EmpAddress2 varchar(500);

    EmpCity varchar(100);

    EmpState varchar(50);

    )

    INSERT INTO Employee_2

    (EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState)

    SELECT EmpName,EmpAddress1,EmpAddress2,EmpAddress2 ,EmpCity,EmpState FROM dbo.Employee

    I am not sure which one is better.

    Any help is appreciated.

    Thanks

    MR

    In both cases, the names are somewhat vague. What's wrong with "Employee_ID"?

    The code in 1B does not look correct.

    How about option 3? Depending upon the number of rows, and the downtime you can withstand:

    ALTER TABLE Employee ADD Employee_ID int IDENTITY(1,1)

    GO

    CREATE UNIQUE NONCLUSTERED INDEX [uixEmployee]

    ON [dbo].[Employee] (Employee_ID)

    GO

    Otherwise, I would probably go with option 2. You may be able to simplify this.

    SELECT *

    INTO Employee_Tmp

    FROM Employee

    GO

    ALTER TABLE Employee_Tmp ADD Employee_ID int IDENTITY(1,1)

    Go

    sp_rename 'Employee', 'Employee_Old'

    GO

    sp_rename 'Employee_temp', 'Employee'

    GO

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • I agree.

    I usually have the ID have the same name as the table (Employee or Employees table and EmployeeID). I wouldn't put the underscore to separate the ID from the Employee as you already have separation with the caps.

    I also agree with option 2 (but watch out for Celko - he might be lurking and hates identities) :-D.

    We have used identities on hundreds of tables with many clients. I haven't seen a client yet who does not use identities in there databases.

  • Celko or not, just adding an identity column to a table doesn't address the business logic. Do you have multiple rows with the same employee? There must be a logically unique constraint on most tables, even if you don't use it as the primary key or the clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Sorry needed to edit table columns. I had forgotten to add to columns.

    So there are multiple records for same employee:

    INSERT INTO dbo.Employees(EmpName ,EmpAddress1 ,EmpAddress2 ,EmpAddress2 ,

    EmpCity ,EmpState ,StartDate,EndDate )

    Values ('Ann','1 Main Street',NULL,NULL,'SALEM','PA','December 1990','2002-06-01')

    INSERT INTO dbo.Employees(EmpName ,EmpAddress1 ,EmpAddress2 ,EmpAddress2 ,

    EmpCity ,EmpState ,StartDate,EndDate )

    Values ('Ann','122 Box Street',NULL,NULL,'DANBURY','CT','2002-06-02','Present')

  • Depending on the logic, you might want to have multiple tables.

    Just having a unique id to have one is not normally useful. What does it signify. In your case, it seems to signify a name and an address. But that means that you have Ann in two addresses. How do you know if this is the same "Ann". What if there are two "Ann"'s. How do you differentiate from the two.

    Not knowing your use, you might want to have two tables.

    Create Table Employees

    (

    EmployeeID int identity(1,1) not null,

    EmpName varchar(100);

    Age int,

    StartDate DateTime not null,

    EndDate DateTime,

    )

    Create Table Addresses

    (

    AddressID int identity(1,1) not null,

    EmployeeID int not null,

    EmpAddress1 varchar(max);

    EmpAddress2 varchar(1000);

    EmpAddress2 varchar(500);

    EmpCity varchar(100);

    EmpState varchar(50);

    StartDate Datetime,

    EndDate Datetime

    )

    Put your information specific to the employee in the employee in the employee table and the information specific to the addresses in the addresses table.

    Now you can put your EmployeeID in other tables as foreign keys.

  • tshad (9/8/2016)


    Depending on the logic, you might want to have multiple tables.

    Just having a unique id to have one is not normally useful. What does it signify. In your case, it seems to signify a name and an address. But that means that you have Ann in two addresses. How do you know if this is the same "Ann". What if there are two "Ann"'s. How do you differentiate from the two.

    Not knowing your use, you might want to have two tables.

    Create Table Employees

    (

    EmployeeID int identity(1,1) not null,

    EmpName varchar(100);

    Age int,

    StartDate DateTime not null,

    EndDate DateTime,

    )

    Create Table Addresses

    (

    AddressID int identity(1,1) not null,

    EmployeeID int not null,

    EmpAddress1 varchar(max);

    EmpAddress2 varchar(1000);

    EmpAddress2 varchar(500);

    EmpCity varchar(100);

    EmpState varchar(50);

    StartDate Datetime,

    EndDate Datetime

    )

    Put your information specific to the employee in the employee in the employee table and the information specific to the addresses in the addresses table.

    Now you can put your EmployeeID in other tables as foreign keys.

    There may be an argument about this, but this structure looks a bit flawed.

    For starters, an address is an attribute of an employee. An employee is not an attribute of an address. That is what you are proposing.

    The employee table should have the foreign key to the address table.

    With this structure, you have also created a one-to-one relationship. I suspect that there would a one to many relationship, an employee can have many addresses.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (9/8/2016)


    tshad (9/8/2016)


    Depending on the logic, you might want to have multiple tables.

    Just having a unique id to have one is not normally useful. What does it signify. In your case, it seems to signify a name and an address. But that means that you have Ann in two addresses. How do you know if this is the same "Ann". What if there are two "Ann"'s. How do you differentiate from the two.

    Not knowing your use, you might want to have two tables.

    Create Table Employees

    (

    EmployeeID int identity(1,1) not null,

    EmpName varchar(100);

    Age int,

    StartDate DateTime not null,

    EndDate DateTime,

    )

    Create Table Addresses

    (

    AddressID int identity(1,1) not null,

    EmployeeID int not null,

    EmpAddress1 varchar(max);

    EmpAddress2 varchar(1000);

    EmpAddress2 varchar(500);

    EmpCity varchar(100);

    EmpState varchar(50);

    StartDate Datetime,

    EndDate Datetime

    )

    Put your information specific to the employee in the employee in the employee table and the information specific to the addresses in the addresses table.

    Now you can put your EmployeeID in other tables as foreign keys.

    There may be an argument about this, but this structure looks a bit flawed.

    For starters, an address is an attribute of an employee. An employee is not an attribute of an address. That is what you are proposing.

    The employee table should have the foreign key to the address table.

    With this structure, you have also created a one-to-one relationship. I suspect that there would a one to many relationship, an employee can have many addresses.

    Nothing a couple of well placed unique constraints couldn't fix....

  • I think that is what the original design was doing - many addresses for one employee.

    It looks like the user wanted to keep track of each address an employee was at and the duration of the stay at each address.

    So yes, there would be a one to many relationship. Your way, you could have many addresses but an employee could only be at one address.

    In that case, I would even have an address table. I would just leave it the way the user had it, with the user address in the employee table.

    To go further, if you only knew that there would be two address (home and billing, for example), you could do it either way.

  • The structure proposed by tshad allows employees to flat-share. If you want overall flexibility, you need 3 tables; Employee, Address and EmployeeAddressLink. The start and end dates would then belong on the link table (along with address type, if you have separate billing and residential addresses).

  • That would be correct for a many to many relationship and would be applicable only if you also want take into account that more than one employee can be at the same address, which would be preferable to multiple instances of the same address. And would be necessary, in this case, since the start and end dates would be different for two employees at the same address.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply