Primary key - Integer? or Varchar?

  • The main reason for setting primary key as auto generated id is that the sql server only will handle the generation of the key... which will take care of the duplicates... it will never insert a duplicate id.

    If you choose it as int or char ... you need to handle the key generation... if sometimes while inserting the key if duplicates are inserted it will raise the error and that insertion will not complete.

    So it is recommended to use identity column as primary key.

  • btw:

    Did anyone ever test the "ok i will determine the max value myself and add 1" .... and conclude sqlserver actually performs a FULL TABLE SCAN to resolve it.

    afaik DB2 V3 or 4 on mainframe back in 1994 or so did resolve this issue (determine Min/Max of a indexed column ) with approx. 4 i/o on the index B-tree.

    create table dbo.T_MaxTest ( TheNumber varchar(15) not null primary key, myfiller char(1000) not null default 'abc', tscrea datetime default getdate())

    go

    -- pk = clustered index on dbo.T_MaxTest (TheNumber);

    go

    insert into dbo.T_MaxTest (TheNumber)

    Select TOP 110000 N

    from dbo.Tally

    order by convert(varchar(15),N);

    go

    /*

    -- is tested but did not help out !

    dbcc dbreindex ('dbo.T_MaxTest')

    checkpoint

    dbcc dropcleanbuffers

    dbcc freeproccache

    set statistics io on

    set statistics time on

    */

    Select max(TheNumber)

    --, count(*) as aantal

    from dbo.T_MaxTest

    ;

    GO

    Select max(TheNumber)

    from dbo.T_MaxTest

    WHERE TheNumber > 500000;

    GO

    create table dbo.T_MaxTest2 ( TheNumber varchar(15) not null, tscrea datetime default getdate())

    go

    Create NONclustered index CI_MaxTest2 on dbo.T_MaxTest2 (TheNumber);

    go

    Insert into dbo.T_MaxTest2 (TheNumber)

    Select TheNumber

    from dbo.T_MaxTest

    go

    Select max(TheNumber)

    from dbo.T_MaxTest2;

    GO

    Select max(TheNumber)

    from dbo.T_MaxTest2

    WHERE TheNumber > 500000;

    GO

    create table dbo.T_MaxTest3 ( TheNumber int not null, tscrea datetime default getdate())

    go

    Create clustered index CI_MaxTest3 on dbo.T_MaxTest3 (TheNumber);

    go

    Insert into dbo.T_MaxTest3

    Select TheNumber

    from dbo.T_MaxTest

    go

    Select max(TheNumber)

    from dbo.T_MaxTest3;

    GO

    Select max(TheNumber)

    from dbo.T_MaxTest3

    WHERE TheNumber > 500000;

    GO

    create table dbo.T_MaxTest4 ( TheNumber char(15) not null, tscrea datetime default getdate())

    go

    Create clustered index CI_MaxTest4 on dbo.T_MaxTest4 (TheNumber);

    go

    Insert into dbo.T_MaxTest4

    Select TheNumber

    from dbo.T_MaxTest

    go

    Select max(TheNumber)

    from dbo.T_MaxTest4;

    GO

    Select max(TheNumber)

    from dbo.T_MaxTest4

    WHERE TheNumber > 500000;

    GO

    drop table T_MaxTest4;

    drop table T_MaxTest3;

    drop table T_MaxTest2;

    drop table T_MaxTest;

    :sick:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Michael Valentine Jones (11/25/2008)


    Lynn Pettis (11/25/2008)

    ...Besides, this discussion has remained relatively positive (so far)...

    In that case, let me jump in. 😀

    There are many opinions of this subject, but I am firmly in the IDENTITY integer clustered primary key camp.

    ...

    sorry, just curious : design of identity as PK & timestamp/datetime [convert(datetime, getdate(), 21)] as clustered key for those heavily-relied-on-date table, i think it will improve the perf on Select/Update/Delete more, and the purpose of table partitioning, right?

  • I you're quoting URLs, may I give a plug for Database Design Workbench - Keys (19 September 2008) [/url] which we wrote because we felt a bit confused by the nomenclature for keys, so we wrote out a glossary, and then played around with all kinds of keys. I ended up being surprised how usable some of Joe Celko's suggestions really are. I hope the article helps. it certainly cleared our thoughts

    To answer the original question, We couldn't find any difference between integers and Varchars for use as keys in our million-row test databases.

    Best wishes,
    Phil Factor

  • gan (11/27/2008)


    Michael Valentine Jones (11/25/2008)


    Lynn Pettis (11/25/2008)

    ...Besides, this discussion has remained relatively positive (so far)...

    In that case, let me jump in. 😀

    There are many opinions of this subject, but I am firmly in the IDENTITY integer clustered primary key camp.

    ...

    sorry, just curious : design of identity as PK & timestamp/datetime [convert(datetime, getdate(), 21)] as clustered key for those heavily-relied-on-date table, i think it will improve the perf on Select/Update/Delete more, and the purpose of table partitioning, right?

    If that was a question, I have no idea what you are asking.

  • gan (11/27/2008)


    Michael Valentine Jones (11/25/2008)


    Lynn Pettis (11/25/2008)

    ...Besides, this discussion has remained relatively positive (so far)...

    In that case, let me jump in. 😀

    There are many opinions of this subject, but I am firmly in the IDENTITY integer clustered primary key camp.

    ...

    sorry, just curious : design of identity as PK & timestamp/datetime [convert(datetime, getdate(), 21)] as clustered key for those heavily-relied-on-date table, i think it will improve the perf on Select/Update/Delete more, and the purpose of table partitioning, right?

    It all depends on how you wish to partition. That in itself is an entirely different conversation, and usually on that comes back to "what makes sense in your scenario". If your partitioning scheme is to keep most of the updating into a single partition, then making your partition scheme match up to your clustered index might make sense. (This simplifies removal of archived/aged out data). If on the other hand, your goal with partitioning is to improve perf by spreading out the reads and writes, then doing it by date might not make sense (since it might create a "hot spot" continuously getting hammered on.

    This however is likely to become a lengthy conversation, which might do better in its own thread (so as to not hijack this one).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • sorry, just curious : design of identity as PK & timestamp/datetime [convert(datetime, getdate(), 21)] as clustered key for those heavily-relied-on-date table, i think it will improve the perf on Select/Update/Delete more, and the purpose of table partitioning, right?

    It all depends on how you wish to partition. That in itself is an entirely different conversation, and usually on that comes back to "what makes sense in your scenario". If your partitioning scheme is to keep most of the updating into a single partition, then making your partition scheme match up to your clustered index might make sense. (This simplifies removal of archived/aged out data). If on the other hand, your goal with partitioning is to improve perf by spreading out the reads and writes, then doing it by date might not make sense (since it might create a "hot spot" continuously getting hammered on.

    This however is likely to become a lengthy conversation, which might do better in its own thread (so as to not hijack this one).

    Thanks 😛

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply