Handling of Primary Keys

  • Hi folks, this could be my first question on here (although I have replied to one or two). Most of the time BOL or SSC gives me the answers I need so I don't feel the need to post. I am not sure if there is a definitive answer so opinions are greatly appreciated.

    I will apologise now for not putting the tables up in required format but it is only the 1 field I am concerned with.

    I have table A and column 1 is ID(int) with a Primary Key set on that column. An external contractor has developed a system for us whereby he uses a second table to hold Primary Key values from Table A (and other tables). So Table B has two columns, [table_name] and [PK_value] (no primary key on this table, that would be too much to ask for).

    A stored procedure is called by the application to read table B, increment the column [PK_value] by 1 and then use that value to insert a new record into table A.

    Am I missing something here because this seems to be really round the houses? If we accept that the IDENTITY option wasn't used at Design Time (which would have been my preferred option in this case) and we don't want to add it in retrospectively, why couldn't a local variable be used as in the code below?

    declare @PK_value as int

    select @PK_value = max(id) from test_table

    select @PK_value = @PK_value +1

    insert into test_table

    values (@PK_value, 'John', 'Ecc')

    (the code may not be good but works and I am still learning):-)

    The contractor has said it was set up like that in the original design because it was 'industry standard', it may be, I don't know. That is why I would like some opinions.

  • Ah, the Sequence Table. If not implemented correctly, you're going to be chasing deadlock problems until you're blue in the face. I can't say it any better than Jeff, so check out his comments on Sequence Tables in his long post mid-way down the page.

    http://www.sqlservercentral.com/Forums/Topic286732-8-1.aspx

    Tim Wilkinson has the solution on page 2.

    On a personal note, if you're consultant is attempting to implement a Sequence Table into your application and he is not following these guidelines, I would question his experience and advice.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • It was an industry standard... about 15 years ago.

    In addition to the problems outlined above, you may hit serious problems with rogue data. I had to deal with a system that had this design. Every time we did bulk loads we had to be EXTREMELY careful to get the ID table updated or we started running into constraint errors all over the place. It was a HUGE pain with very little benefit.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I have seen this strategy used before to help with merging disconnected datasets.

    The reason they do not use the PK + 1 is probably because of the possibilty of inserting bulk amounts of data.

    They reserve a certain number of keys and then the app can continue processing while they perform the bulk operation.

    I prefer the Identity key with a unique constraint usually, unless it is a simple natural key. it depends on your needs.

  • Thanks for the comments Guys, it just didn't feel right to me, and I know I can be a stubborn pain at times if I haven't got clarity on something.

    I don't think we'll get anything sorted retrospectively but there is a Project Mark 2 coming up and I will be a lot more involved on the SQL side from the design level. There has been limited SQL involvement in the Company previously and my boss is determined to get everything tightened up.

    John, thanks for the link, I will make the effort to go read it.:-)

  • Has anybody come across issues of the identity function in T-Sql (using IDENTITY(1,1) I guess) not producing sequential entries. This was one of the reasons (I have been told) for not using IDENTITY in the original design but I have been unable to find anything on the net. I will have another look later but thought somebody may already be aware of this issue?

  • Has anybody come across issues of the identity function in T-Sql (using IDENTITY(1,1) I guess) not producing sequential entries.

    Yes, frequently. If you create a row (in a table with an IDENTITY column), then roll back the transaction, that identity is not re-used, hence the possible gaps in the sequence.

  • Thanks. Logical answer. Could anything else account for it (without direct manipulation)?

  • Sure, errors will cause gaps, deletions, anything that causes an incomplete transaction, implicit or explicit, can lead to gaps. If you need a perfectly flawless incrementally increasing number, with no gaps, you have to create one (or borrow one of the many listed in the Scripts section here at SSC).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • John Rowan (6/18/2009)


    Ah, the Sequence Table. If not implemented correctly, you're going to be chasing deadlock problems until you're blue in the face. I can't say it any better than Jeff, so check out his comments on Sequence Tables in his long post mid-way down the page.

    http://www.sqlservercentral.com/Forums/Topic286732-8-1.aspx

    The key phrase there is 'if not implemented correctly' 🙂

    I can't believe that no-one has said 'it depends' yet - because it does. Sequence tables have their place in solving particular problems which are hard to do with an IDENTITY-based scheme. Implemented correctly, they can be extremely concurrent and have excellent performance.

    Rather than take this thread too far off course, I have added comments and a fairly comprehensive demo script to Jeff's post - as referenced in your quote above.

    Paul

  • I would question their use even when implemented correctly. IDENTITY is the way to go in SQL Server when for surrogate key creation. Can someone give me a business case for 'requiring' key values to be sequential w/o gaps? If so, there's a disconnect between the DB developers and application designers. Surrogate key use differs a bit different from natural key use.

    Let me explain. Natural keys or business keys contain values that have a meaningful interpretation to the end user. The end user cares about the values in the key because the data carries a meaning that adds value to the user. When surrogate keys are used, they represent an arbitrary value that is used only to assign uniqueness to the row and to relate that row to other entities. The end user should not care about the data in that column. In fact, the end user should never see the data in that column. Surrogate keys have no meaning to the end user so there is no value in exposing them to the user. In this scenario, the DB designer would create a unique index on the canidate or business key (which would be the same business key as the previous example) and the user would still use that to select from the table.

    With that said, if someone said that they needed the rows to be in 'order' without gaps, that is something that you would give them as you returned the data using ROW_NUMBER(). You would not need to store them in the database that way.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (6/19/2009)


    I would question their use even when implemented correctly.

    Heh. Well unless you are $deity posting under a psedonym, I think we're going to need more than just that statement! 🙂

    Ok. Let's say you have a database which is connected to a middle-tier which is responsible for the business rules. Each thread on this middle-tier generates the procedure calls and parameter values necessary to perform some business transactions - whatever they might be.

    Each transaction involves multiple objects (tables) and foreign key relationships. The database table design (with IDENTITY columns) means that the next call to a procedure requires the identity value(s) inserted by the previous procedure.

    Calling each procedure sequentially and using the identity value returned in subsequent procedures is clearly not going to scale well. Consider the number of network round trips involved.

    The usual solution is to pre-allocate a range of ID values for each object to each thread, let the application construct a bunch of calls with the parameter values already filled in, and chuck that batch at the SQL Server.

    How would you accommodate that requirement with IDENTITY columns?

    John Rowan (6/19/2009)


    Can someone give me a business case for 'requiring' key values to be sequential w/o gaps?

    A business reason? For something internal to the database that the users never see? Not really.

    There may be times when guaranteed sequential ids would be useful internally. Comparing the previous or following row efficiently for large tables springs to mind...

    John Rowan (6/19/2009)


    If so, there's a disconnect between the DB developers and application designers.

    Shocking! I do hope that never happens in the real world 😀

    John Rowan (6/19/2009)


    Surrogate key use differs a bit different from natural key use.

    Indeed. But let's not go down that track. Surrogate keys are generally a good idea yes.

    John Rowan (6/19/2009)


    With that said, if someone said that they needed the rows to be in 'order' without gaps, that is something that you would give them as you returned the data using ROW_NUMBER(). You would not need to store them in the database that way.

    Or you could use order by and let the application do the numbers. ROW_NUMBER is not for free you know - especially if the data needs to be sorted.

    Cheers,

    Paul

  • Paul White (6/19/2009)


    John Rowan (6/19/2009)


    I would question their use even when implemented correctly.

    Heh. Well unless you are $deity posting under a psedonym, I think we're going to need more than just that statement! 🙂

    Ok. Let's say you have a database which is connected to a middle-tier which is responsible for the business rules. Each thread on this middle-tier generates the procedure calls and parameter values necessary to perform some business transactions - whatever they might be.

    Each transaction involves multiple objects (tables) and foreign key relationships. The database table design (with IDENTITY columns) means that the next call to a procedure requires the identity value(s) inserted by the previous procedure.

    Calling each procedure sequentially and using the identity value returned in subsequent procedures is clearly not going to scale well. Consider the number of network round trips involved.

    The usual solution is to pre-allocate a range of ID values for each object to each thread, let the application construct a bunch of calls with the parameter values already filled in, and chuck that batch at the SQL Server.

    How would you accommodate that requirement with IDENTITY columns?

    At some point, you still need to query the DB to determine which ranges belong to which threads and which values within those ranges correspond to table data that matches the criteria for the business transaction, right?

    Why not make one call to the DB to get back the IDENTITY values for the rows that meet the business criteria and use the business logic in the middle tier to evenly distribute those values to the threads. Better yet, create a service broker queue and make one procedure call to gather the IDENTITY values that meet the business transaction criteria and throw all of those values into the service broker queue. The activation SPs on the SB queue will replace the SPs that would have been called by your middle-tier, round trips would be reduced to one, SQL Server would handle scaling the SP call threads via service broker, and you show your boss how by taking advantage of the functionality built into SQL Server for scaling can save time, money, and allow for a much more portable solution.

    John Rowan (6/19/2009)


    Can someone give me a business case for 'requiring' key values to be sequential w/o gaps?

    A business reason? For something internal to the database that the users never see? Not really.

    There may be times when guaranteed sequential ids would be useful internally. Comparing the previous or following row efficiently for large tables springs to mind...

    I wouldn’t consider this a solid reason to use a sequence table. This can be done just fine w/o one.

    John Rowan (6/19/2009)


    With that said, if someone said that they needed the rows to be in 'order' without gaps, that is something that you would give them as you returned the data using ROW_NUMBER(). You would not need to store them in the database that way.

    Or you could use order by and let the application do the numbers. ROW_NUMBER is not for free you know - especially if the data needs to be sorted.

    Agreed.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (6/19/2009)


    At some point, you still need to query the DB to determine which ranges belong to which threads and which values within those ranges correspond to table data that matches the criteria for the business transaction, right?

    No, not at all. The idea is that each thread preallocates a range of IDs for each table, and uses those to process incoming requests without further reference to the database.

    Take the example of a retail sales system. Sale transaction data arrives as XML, which is processed at the mid-tier. Depending on the content of the XML, various records will be inserted into different tables, and other non-database processing may also take place. By preallocating key values, the mid-tier doesn't need to check the identity value after inserting each record - it can just send a stream of inserts wrapped in a transaction (some operations would call a stored procedure rather than doing a simple insert, but you see the idea).

    This sort of thing can also be done with Service Broker, as you say. That is a perfectly valid design, but requires business logic to be implemented (or mirrored) in the database - which is not always desirable. It also places additional load on the database server, which might be better distributed across a farm of mid-tier servers. If the rule processing requires significant non-database processing, the database is also probably also not the best place to do it.

    John Rowan (6/19/2009)


    I wouldn’t consider this a solid reason to use a sequence table. This can be done just fine w/o one.

    Sequence tables don't guarantee that there won't be gaps in the range of values allocated. That isn't the point of them really. The problem of comparing previous and next row values is a tricky one regardless of whether identity columns are used - at least it is tricky to optimize.

    Paul

  • What about a hybrid GUID/sequential system? It is easy to generate a GUID on the client, let the GUID move to the database and then when it's inserted into a table, allow a sequential system to be generated on the database. You can keep the GUID and always relate that back to the sales system.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply