May 17, 2006 at 9:58 am
Absolutely
* Noel
May 17, 2006 at 10:06 am
Noel, you're confusing me. "A sequence table is the right idea", but "It is not my preferred way"? So there's another way you prefer that's better than the ones posted, but you're not going to share it with us?
I agree that the sequence table is a very workable solution and would not worry about scaling if it is done right. My biggest concern would be guaranteeing that all inserts are done by a stored procedure that correctly accesses the sequence table.
You could argue that the identity field approach should scale better because you only need one action to insert the record, while the sequence table requires an extra action for the update. This is probably just a theoretical argument, because I can't imagine a real situation where a sequence table approach would break down if all of Noel's suggestions were applied.
On the other hand if you've got such a volume of invoices that you're worried about scaling, you should be able to afford enough hardware to deal with it. I'd be happy to come help set up your new 32-processor servers for the reasonable sum of twice my current salary.
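Regarding the stored procedure point above, here is a minimal sketch of what such a procedure could look like (the dbo.InvoiceSequence table and procedure name are hypothetical, not something posted elsewhere in this thread):
CREATE TABLE dbo.InvoiceSequence
(
    Seq_year   int NOT NULL PRIMARY KEY,  -- e.g. 2006
    Last_value int NOT NULL               -- last invoice number handed out for that year
)
GO
CREATE PROCEDURE dbo.usp_NextInvoiceNumber
    @year int,
    @next int OUTPUT
AS
BEGIN
    -- increment and read back in a single statement, so the row lock taken by the
    -- UPDATE covers both steps and two concurrent sessions cannot get the same number
    UPDATE dbo.InvoiceSequence
    SET @next = Last_value = Last_value + 1
    WHERE Seq_year = @year
END
GO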
May 17, 2006 at 10:21 am
One of my concerns with a sequence table, aside from the additional I/O and other overhead, is that you cannot then easily insert multiple invoices in one statement.
If, for example, thru some EDI (yes, it is still used) or other process you have a batch of invoices to enter, you must single-step thru them with a sequence table, but not with an identity column.
As for a non-numeric, that's true, it could not be in the column itself -- then again, it shouldn't be anyway imo. It should be a separate column and concatenated into the computed column that is displayed to the user.
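A rough sketch of that separate-columns idea (the table and column names here are hypothetical, not the definition used elsewhere in the thread):
create table dbo.InvoiceSketch
(
    Invoice_year   smallint not null,   -- e.g. 2006
    Invoice_number int      not null,   -- plain sequential number within the year
    -- display-only value in the 'yy-nnnnnn' format, built from the two real columns
    Invoice_id as right('0' + convert(varchar(4), Invoice_year % 100), 2)
                 + '-' + right('000000' + convert(varchar(10), Invoice_number), 6),
    constraint PK_InvoiceSketch primary key (Invoice_year, Invoice_number)
)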
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 17, 2006 at 10:29 am
I love functions... below is the function declaration, but first the insert statement:
INSERT INTO dbo.Invoice ( Invoice_id, Customer_name, Invoice_amount )
VALUES ( dbo.fn_NextKey(05), 'ACME Corp.', 12345.67 )
This technique also works with SELECT inserts for multi-row inserting joy. The function is also easy to modify for different key compositions, mmyy-nnnnnnnnnn or whatever...
Cheers!
CREATE FUNCTION dbo.fn_NextKey( @year int )
RETURNS varchar(10)
AS
BEGIN
/* ASSERTIONS
* Invoice_Id has a fixed pattern of yy-nnnnnn where yy is last two digits of a year
* and nnnnnn is a 6 digit 0 padded number that is incremented by 1
*/
DECLARE @pattern char(3), @answer varchar(10)
SET @pattern = right('00'+convert(varchar(10),@year),2) + '-'
SELECT @answer =
    CASE WHEN max(Invoice_id) IS NULL
         THEN @pattern + '000001' /* first invoice of given year */
         ELSE @pattern + right('000000' + convert(varchar(10), convert(int, substring(max(Invoice_id), 4, 6)) + 1), 6)
    END
FROM dbo.Invoice
WHERE Invoice_id LIKE @pattern + '%'
RETURN @answer
END
May 17, 2006 at 10:54 am
I'm not sure how the function will work with multiple rows; I get a duplicate key error when I try it.
Maybe if it was implemented as a CLR function and flagged as non-deterministic, but I don't think it's worth the trouble.
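One set-based workaround (just a sketch, assuming the dbo.Invoice layout from the post above and a hypothetical @NewInvoices table variable holding the batch) is to read the current maximum once and then offset it with ROW_NUMBER(), which is available from SQL Server 2005 on:
DECLARE @pattern char(3), @base int
SET @pattern = '06-'

-- read the current maximum for the year once
SELECT @base = COALESCE(MAX(CONVERT(int, SUBSTRING(Invoice_id, 4, 6))), 0)
FROM dbo.Invoice
WHERE Invoice_id LIKE @pattern + '%'

-- number the whole batch in one statement by offsetting from that maximum
-- (this still needs to be serialized, e.g. inside a transaction, to be safe under concurrency)
INSERT INTO dbo.Invoice (Invoice_id, Customer_name, Invoice_amount)
SELECT @pattern + RIGHT('000000' + CONVERT(varchar(10), @base + ROW_NUMBER() OVER (ORDER BY Customer_name)), 6),
       Customer_name, Invoice_amount
FROM @NewInvoices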
May 17, 2006 at 12:34 pm
It should be a separate column and concatenated into the computed column that is displayed to the user.
I think that's the most important post in this discussion. I think one of the worst things you can do to a relational DB is to store more than one piece of information in a field. And if you "force" the field into the wrong datatype it gets even worse.
Take the Year-Number example...
SET @tmp_invoice_id = (SELECT MAX(SUBSTRING(invoice_id, 4, 5) + 1)
FROM Invoice WHERE (invoice_id LIKE @tmp_date + '%'))
To find all "year entries" you can utilize an index with LIKE '06%', but after that you are using a function to find the maximum value of ALL invoices for that year. That means in the worst case you have to do 99,999 substrings?
I think the overhead of creating the sequential invoices can be neglected if you are using a sequence table... Also, here is a small suggestion for the self-updating of the sequence table. When I am faced with the same problem, I usually do it the "lazy way": update the row I need to update, and if I get a rowcount of 0 then I know that I need to actually insert the row. Updating 0 rows has no negative performance effect on SQL Server (assuming correct indexes), whereas if I first check whether the row exists and then update, I need to access the index twice... once for checking and once for updating... So I can skip the second access to the index most of the time, and the insert is automatically fired when it is needed (see the sketch below).
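Roughly, that pattern looks like this (using the same hypothetical dbo.InvoiceSequence table sketched earlier in the thread):
DECLARE @year int, @next int
SET @year = 2006

-- try the update first; if the year row already exists this is all that runs
UPDATE dbo.InvoiceSequence
SET @next = Last_value = Last_value + 1
WHERE Seq_year = @year

-- a rowcount of 0 means the year row is missing, so seed it now
IF @@ROWCOUNT = 0
BEGIN
    SET @next = 1
    INSERT INTO dbo.InvoiceSequence (Seq_year, Last_value) VALUES (@year, @next)
END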
But when the reporting season starts, this (string) system will hurt you. It is not possible to generate a "good" index on these fields, and when you start running queries this will start to hurt you. It might only start with a concatenated invoice number... but in a current application I am working with, this started to run rampant... They needed to change the datatype from varchar to binary to store all the "extra" information (yes, I tried to convince them that this is stupid... trust me, I tried... I am also looking for a job now where people actually listen).
P.S.: The suggested solution can still be valid if the number of invoices remains small. You always have to think about which compromise you can make...
May 17, 2006 at 3:08 pm
Hehe...
I was referring to the identity column with a primary key constraint.
Should you need this "sequencing" you can use the table, and should you not care about it, use the identity instead.
The sequence table is also very good for "reserving" client-side batches of candidate keys in one go, as in the sketch below.
I am certain that scalability is excellent if you apply all the recommendations I posted (I have!)
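For example (again only a sketch against the hypothetical dbo.InvoiceSequence table above), a client can reserve a whole block in one round trip and then hand out the numbers locally:
DECLARE @year int, @batch int, @last int
SELECT @year = 2006, @batch = 100

-- grab @batch numbers in a single atomic statement
UPDATE dbo.InvoiceSequence
SET @last = Last_value = Last_value + @batch
WHERE Seq_year = @year

-- the caller now exclusively owns the range (@last - @batch + 1) through @last
SELECT @last - @batch + 1 AS FirstReserved, @last AS LastReserved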
* Noel
May 17, 2006 at 8:26 pm
Thank you all for taking the time to post your comments... I am glad to see a healthy discussion here on a topic which I started.
Regards,
Asif
May 18, 2006 at 12:59 am
I was thinking a bit about the performance problem I mentioned above when creating a sequence.
The MAX() query could be rewritten so it can correctly use the indexes.
The following query would do the trick:
SET @tmp_invoice_id = (SELECT TOP 1 (SUBSTRING(invoice_id, 4, 5) + 1)
FROM Invoice WHERE (invoice_id LIKE @tmp_date + '%') ORDER BY invoice_id DESC)
This will limit the query to only retrieve one record and only do one substring and one implicit cast on the data.
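For that TOP 1 ... ORDER BY ... DESC to be resolved as a backward seek on the prefix range instead of a scan, invoice_id has to be the leading column of an index. A sketch (unnecessary if invoice_id is already the clustered primary key):
CREATE INDEX IX_Invoice_InvoiceId ON dbo.Invoice (Invoice_id)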
May 18, 2006 at 2:31 am
How about this: string keys are to be avoided (not least because they become foreign keys), and the indexes will be larger and less efficient than integers. This method uses an identity key and a derived column that calculates the invoice id from the identity column. An alternative would be to calculate the invoice id using the function and store it alongside the identity column; it might be worth storing and indexing the year too, as you are likely to group indexes by year, and you can use check constraints to ensure the values stay in sync.
create table tbl(id int identity(1,1), dt datetime)
go
create function dbo.GetInvoiceId(@i int, @yr int) returns varchar(10)
as
begin
	declare @r varchar(10)
	-- subtract the highest identity value of all earlier years so numbering restarts at 1 each year
	set @r = @i - coalesce((select max(id) from tbl where year(dt) < @yr), 0)
	-- last two digits of the year, a dash, then the number left-padded to six digits
	return substring(cast(@yr as varchar(4)), 3, 2) + substring('-000000', 1, 7 - len(@r)) + @r
end
go
begin
alter table tbl add MyId as dbo.GetInvoiceId([id], year(dt))
insert tbl(dt)
select top 100 '2004-01-01' from sysobjects
insert tbl(dt)
select top 100 '2005-01-01' from sysobjects
insert tbl(dt)
select top 100 '2006-01-01' from sysobjects
select * from tbl
end
go
drop table tbl
go
drop function dbo.GetInvoiceId
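For completeness, a rough sketch of the "store it alongside and keep it in sync" alternative mentioned above (hypothetical names throughout):
create table tbl2
(
    id int identity(1,1) primary key,
    dt datetime not null,
    yr as year(dt) persisted,       -- persisted so it can be indexed and grouped on
    InvoiceId varchar(10) not null, -- calculated at insert time (e.g. with the function above) and stored
    -- keep the stored invoice id consistent with the row's year
    constraint ck_tbl2_year check (left(InvoiceId, 2) = right(cast(year(dt) as varchar(4)), 2))
)
go
create index ix_tbl2_yr_invoiceid on tbl2 (yr, InvoiceId)
go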
Phil Nicholas
May 18, 2006 at 8:39 am
I liked your revised approach of letting SQL pick the proper index to help speed up performance. The goal of this approach was to totally eliminate the need to "maintain any sort of counters", and I think I have mentioned that the solution I proposed is for a small to medium size operation. Believe it or not, it has always worked for me for about 8 years.
I will give it another shot using CLR integration and see how much difference it makes.
Regards,
Asif
June 17, 2006 at 11:57 am
Iron clad?
-- prepare test data
declare @test table (t varchar(50))
insert @test
select 'ov1' union all
select 'ov2' union all
select 'ov4' union all
select 'ov31' union all
select 'ssss1' union all
select 'ssss99'
-- show data before inserting new values
select * from @test
-- do the work
declare @prefix varchar(50)
select @prefix = 'ov' -- Throw anything in!
insert @test
select @prefix + convert(varchar, t.n + 1)
from (
	select max(convert(int, substring(t, 1 + len(@prefix), 50))) n
	from @test
	where t like @prefix + '%'
	and not substring(t, 1 + len(@prefix), 50) like '%[^0-9]%'
) t
where not @prefix like '%[0-9]%'
and not t.n like '%[^0-9]%'
-- show data after inserting new values
select * from @test
N 56°04'39.16"
E 12°55'05.25"
May 16, 2007 at 10:30 pm
hmmm....
Well, it does solve the issue MOMENTARILY.
However, if it's really deployed over a vast network where insert transactions are fired very frequently... the user is definitely getting more "already used" errors... I am sure...
And besides, there are better alternatives already available...
Anyway, this definitely helped though.
May 17, 2007 at 5:41 am
We've had a similar situation and deployed a solution like Noel's, which I think is the best. It gives you something to lock on to prevent the concurrency problem (which will show up once you get a lot of users and data on the system). It will also perform much faster, because you only have to read one row out of a table that will have only a couple of rows (one per year) rather than constantly scanning your invoice table for the max.