incremental in second column

  • CREATE TABLE Invoice(

    ModelID    CHAR(12) NOT NULL,

    InvoiceID INT  NOT NULL,

    OtherDeatils DataTyes

    )

    Any Model will have InvoiceID as 0

    Whenever an Invoice is created based on a model InvoiceId should be incremeted by 1

    All are fine until there is a cuncurrency.

    Same number is cretaed by selecting the max if InvoiceID increment it by one. Is there a

    fool proof way (like identity column) to increment numbers on the second column for each invoice.

    Thank you

    ModelID InvoiceID

    AAAA    1

    AAAA    2

    BBBB    1

    BBBB    2

    BBBB    3

    CCCC    1

    CCCC    2

     

    Regards,
    gova

  • The only way is to lock the whole table while you do the insert so that noone else can read/update from it... can be a pain in the ass if you have a lot of inserts to do.

  • There are lot of inserts that is why there is cocurrency. I even tried parent child table design. That doesn't work either as it is the same but we have models in seperate table.

    Regards,
    gova

  • Just a taught... Could you keep the date (datetime so you have ms precision) of the insert instead and then maybe add an identity column so that you have a tie breaker for the rare cases where 2 inserts will happen at the exact same time. Then you can select the data order by ModelID - DateInsert - IdentCol.

  • Could test for Existence and then find the next ID.  That would probably make for a procedure hit during the save process, but it could be better than not allowing a save? 

     

    I wasn't born stupid - I had to study.

  • It is a primary key. It will not allow. I check for the error code call same proc again which is a bad way to do it. If there a logic that makes sure the number is not same like second identity that is what I am looking for.

    Regards,
    gova

  • 1. Add identity or use another column with unique value in it.

    2. Create function lioke this:

    CREATE FUNCTION dbo.GetInvoiceID (@ModelId char(12), @IdentValue int)

    RETURN int

    AS

    declare @C int

    SET @C = (select count(*) from ModelTable

    where ModelId = @ModelId and IdentValue <= @IdentValue))

    RETURN @C

    3. Make column InvoiceId computed. Apply the formula:

       = dbo.GetInvoiceID (ModelId, IdentValue)

    4. For better performance create clustered index on column IdentValue.

    _____________
    Code for TallyGenerator

  • Thanks Sergiy. Looks like it would work. I would try to use this in after update trigger.

    You know what in our group meeting we came to a conclusion identical to this. Even Model ID created like this as Mod000001.

     

    Regards,
    gova

  • The function Sergiy posted would certainly work, but I would worry about performance issues as the invoice table grows.  With a clustered index on the identity field, it would have to scan from the beginning of the table for each call to GetInvoiceID.  If the clustered index was on ModelID, IdentValue then GetInvoiceID would be very quick but you would have problems with page splits.  This might be minimized using a clustered key on IdentValue and an index on (ModelID, IdentValue) (no fragmentation in the table, but some in the index).

    You said you would put the function call in an after insert trigger rather than as a computed column.  That's probably a good idea, I wouldn't want to see those function calls repeated for every row of a "SELECT * FROM Invoice" with a million rows.

    What happens if you ever decide to archive and remove old invoices?  The InvoiceID numbers will all be reset.

    Using a separate ModelInvoiceID table (was this the parent/child design you mentioned?) would be slower at first, but it shouldn't slow down with age.

    CREATE

    TABLE ModelInvoice (

        ModelID CHAR(12) NOT NULL PRIMARY KEY CLUSTERED,

        LastInvoiceID INT NOT NULL)

    INSERT

    INTO ModelInvoice(ModelID, LastInvoiceID)

    SELECT ModelID, MAX(InvoiceID) FROM Invoice GROUP BY ModelID

    CREATE PROCEDURE GetInvoiceID @ModelID CHAR(12) AS

        DECLARE @n int

        BEGIN TRAN

        SELECT @n = LastInvoiceID + 1 FROM ModelInvoice WITH (HOLDLOCK,UPDLOCK) WHERE ModelID = @ModelID

        IF @n IS NULL

            INSERT INTO ModelInvoice(ModelID, LastInvoiceID) VALUES (@ModelID, 1)

        ELSE

            UPDATE ModelInvoice SET LastInvoiceID = @n WHERE ModelID = @ModelID

        COMMIT TRAN

        RETURN @n

    This should only lock a single row in ModelInvoice while getting the next invoice number, and only long enough to update that row.  There is no concurrency issue for invoices with different model numbers, only with simultaneous invoices for the same model.  You will have the same page splitting issue with new models that you would have with a (ModelID, IdentValue) secondary index using the other approach.

    This approach is undoubtedly slower at first.  It requires two inserts, and has more of a concurrency issue than the function.  On the other hand it should only slow down proportional to the depth of the ModelInvoice index, while the function looks to me like it slows down proportional to the Invoice row count.  (When you max out the INT IDENTITY field at two billion Invoice rows, the ModelInvoice index depth will only be around 4 or 5.)  You wouldn't have concurrency issues unless you're inserting a lot of rows, so I would want to know what happens to the function as your row count goes up.

  • If result of the function will be stored in computed column it will not affect SELECT performance at all.

    And clustered index is not mandatory because this function counts only rows with ModelId = @ModelId, so optimizer will seek the index on ModelId first and than compare subset of values in identity column with value in curremt row.

    If you not expect more than 10 rows per 1 model you better don't create index at all.

    _____________
    Code for TallyGenerator

  • Your post said to create a clustered index on IdentValue and nothing about a ModelID index, but I assume this was an oversight.  Obviously it works much better with a ModelID index.

    "If you not expect more than 10 rows per 1 model you better don't create index at all." - I don't see how this works without a (ModelID, IdentValue) secondary index, or a clustered (ModelID) index.  Then you're back to scanning the whole table (at least all rows <= IdentValue).

    "If result of the function will be stored in computed column" - Computed columns don't store their value.  According to the "Creating Indexes on Computed Columns" topic in BOL you couldn't use an index to cache the value either (the value is nondeterministic because it pulls data from multiple rows).  I think using the function in an AFTER INSERT trigger to compute the value once and store it as a normal column is a good idea.

    I think the function idea is probably a very good approach, if used with the proper indexes and if only evaluated once per row (not as a computed column).  Without knowing the ModelID distribution though, you don't know what the worst case behavior is.  The function slows down proportional to the size of the model index (# of rows for a given model), while the ModelInvoiceID table approach will slow down proportional to the depth of the index (no more than 5 for billions of rows, assuming the index is rebuilt periodically and doesn't get unbalanced).  The function is definately faster if there are tens or a few hundred invoice rows per model, but would be terrible if there are millions.  There is a crossover somewhere in the thousands, or tens of thousands, and I would want to look into it before I committed to an approach.  Again, it's not the total Invoice rows but the max invoices per ModelID.  If 10,000 or more invoices for one model is inconceivable, then go with the function.

    Using the function does prevent you from ever removing invoice history, except for obsolete models that will not be reused.

    UPDATE - After some further thought, in the best case (a well-maintained defragmented index) you'll get around 2000 index rows in a single extent that can be read with one I/O, so the Invoices/ModelID value would have to get pretty big before you need to worry about the performance of the function.  It's still nice to know what the worst case might be.

  • "Computed columns don't store their value." - where did you read this?

    If value is not deterministic and cannot be used in index it does not mean this value does not actually exist.

    Talking about identity column we must have in mind different ones. I was talking about new column with IDENTITY(1,1). If you expect many rows per one ModelId it's better for function calculations performance to have index on it. But if typical number of rows will not exceed 50 forget about it.

    And if we are talking about tens of thousands of invoices per model it's a reason to reconsider your design. In such case it's better to have separate tables for Model and for Invoice.

    Anyway the main index in your schema must be on ModelId. It will be used by optimizer both for update/insert and for select statements.

    _____________
    Code for TallyGenerator

  • I read from the holy BOL:

    computed_column_expression

    Is an expression defining the value of a computed column. A computed column is a virtual column not physically stored in the table. It is computed from an expression using other columns in the same table. For example, a computed column can have the definition: cost AS price * qty. The expression can be a noncomputed column name, constant, function, variable, and any combination of these connected by one or more operators. The expression cannot be a subquery.

    If I'm wrong, then I'll be grateful for learning something.  But the way I read it computed columns always have to be recomputed, and if the function involves other table rows it sounds recursive to me.

  • Just jumping in here... the computed columns are always calculated on the fly UNLESS they are indexed. However, to be indexable, the formula must be deterministic. This one is not because you are counting something from a table, so indexing this column in any way is out of the question as long as it is calculated.

  • Sergiy, I don't understand your comments on the index structure.

    With no index on ModelID, every function evaluation requires a full table scan.  Or at least up to IdentValue, if there is a clustered primary key on the identity field.

    With a secondary index only on (ModelID), it will either do a table scan or (most likely) use an index seek on the ModelID index and then do bookmark lookups in the main table to find IdentValue.  Regardless of the invoices/ModelID value this is additional I/O.

    A clustered index on ModelID works (for low invoices/ModelID) because no bookmark lookups are needed.  If invoices/ModelID * (avg row size) grows larger than an extent (64KB) extra I/O is required.

    A (ModelID, IdentValue) secondary index works best because it has the maximum number of rows/page and requires no bookmark lookup (minimum I/O).  If IdentValue is also the clustered primary key it is even better.

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply