Urgent: Adding Primary Key to the 70 Million table

  • Hi,

    I have 70million records table having email column.I want to add primary key to the email column.

    plssssssssssssss suggest me the efficient way to do this in sqlserver 2005.

    Thanks in advance...

  • I am assuming the email field is a varchar, adding a primary key to such a field might not be the best thing. I hope you have a test system somewhere to try this but I would look at creating a new table that is partitioned in some way, migrating the data into that table and renaming. Can you provide the full schema of the table?

    Facts are stubborn things, but statistics are more pliable - Mark Twain
    Carolyn
    SQLServerSpecialists[/url]

  • Below is the total table with 70 Million records

    Total ( Email,FirstName,LastName,Address,City,State,Zipcode,Gender,DOB ,

    Phone,WebAddress1,Webaddress2,IPAddress,DateTime,Interest

    )

    Ofcourse Email is nvarchar(255)

    I do not want duplicates in that, so wanted to add Primary Key.

    I need the best way for adding that..

    My idea is like

    ALTER TABLE total ADD PRIMARY KEY (Email)

    the above is enough or do we need to mention clustered also

    ALTER TABLE total ADD CONSTRAINT pk_email PRIMARY KEY CLUSTERED (Email)

    Plsssssssssssss suggest me the best way with syntax

  • An nvarchar(255) is a rather bad choice for a clustered index. Clustered indexes should be narrow (among other things). 512 bytes is not narrow.

    If there an existing primary key on the table? Is there an existing clustered index on the table? If you just want to enforce uniqueness, you can define a unique constraint. Is the email column nullable? If so, you can't put a primary key on it, but a unique constraint (or unique index) is allowed

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No Email column is not Nullable.

    No other primary key or clustered index existing in the table.

    This is the first one we are going to put.

    Do u mean Unique is better than primary key or how is that

    Thnx in advance..

  • DBTeam (11/18/2009)


    No Email column is not Nullable.

    No other primary key or clustered index existing in the table.

    This is the first one we are going to put.

    Good.. you have everything clear to create a prmiary key, but wait till you read the entire reply.

    Do u mean Unique is better than primary key or how is that

    In your case, it is better to have a unique constraint enabled rather than having a primary key (Since it is nvarchar(255).


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • So it won't create any problem right,

    even though if we insert million of records into that in the future.

    B'coz earlier we kept Primary key after that insertion process got delayed(taking days together), so we dropped primary key constraint.

    Unique constraint will not delay the insertion process right...

    Pls help me reg this.......

  • DBTeam (11/18/2009)


    So it won't create any problem right,

    even though if we insert million of records into that in the future.

    Maybe. You need to test.

    Unique constraint will not delay the insertion process right...

    Maybe. You need to test, especially since it seems you're fairly new to this.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • DBTeam (11/18/2009)


    So it won't create any problem right,

    even though if we insert million of records into that in the future.

    It will create a problem when a duplicate record is inserted, other than that no major problems should occur. Of course performance problems may arise at a later time, but that may or may not be because of this.

    B'coz earlier we kept Primary key after that insertion process got delayed(taking days together), so we dropped primary key constraint.

    SQL Server is designed to work with tables containing millions of records and we all have been using them with primary key, and unique key.

    By the way you said earlier it was delaying for days together, how many rows already existed in the table and how many rows were you inserting? There might be many reasons for that to be slow.

    To sum it up, I repeat the same advice "test it".


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • If there was clustered primary key on email, it could have been lagging due to page splits during inserts. I think in this case a nonclustered unique constraint (to ensure uniqueness) would be better.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • I think the poster just "clicked" on the PK button to create the original PK and by defualt it's created as a clustered index. That would account for the slow inserts because of what was already mentioned (page splits, moving the CI column in order,...) This constant re arraigment of the data will really slow down.

    Definitely do not create a clustered index on that field. Do as mentioned above and create a Unique Index on the email column.

  • Thank u So much for all ur sugesstions,

    i'll create unique index only

  • Bru Medishetty (11/18/2009)


    In your case, it is better to have a unique constraint enabled rather than having a primary key (Since it is nvarchar(255).

    Ummmm.... not necessarily true. A non-clustered PK would work just fine here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • DBTeam (11/20/2009)


    Thank u So much for all ur sugesstions,

    i'll create unique index only

    If you have no clustered index, index maintance won't be worth much on a 70 million row heap. Other than the email address, is there anything (preferably something temporal if there are a lot of inserts) you could build a clustered index on?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply