March 10, 2010 at 8:29 am
I have to design an accounting system and I am stucked at the point where i need to design some complex business rules.
To make it simplier, take the following example.
I have two classical tables Invoice and InvoiceLines
CREATE TABLE Invoice (InvoiceID INT, Total decimal)
CREATE TABLE InvoiceLine(InvoiceLineID INT, InvoiceID INT, Amount decimal)
I have a procedure, like, InsertInvoiceLine(...)
How can I make sure that after the InsertInvoiceLine is executed the Invoice.Total is always equal to SUM(InsertInvoiceLine.Amount) ?
I am not sure that my questions is 100% correct, but currently I have problems with corrupted data - like in production, sometimes the invoice total is different then the sum of all the invoice lines. How can I stop this happening? How can I throw an error if this business rule is not respected? Also, consider that I have hundred of thousands of invoices, so not all the constraints are acceptable.
What would you recommend?
Thanks,
Stelian.
March 12, 2010 at 11:15 pm
First of all, I would not store the total of the Invoice Lines in the Invoice Header record. This can be computed as needed when the header record is read.
Second, it appears (without seeing the code of the stored procedure) that you are inserting one row at a time into the Invoice Line Item table. This procedure should be able to update the Header record as each Line Item is entered.
You mention corruption of your data, can you explain how you think this is occuring?
March 13, 2010 at 2:52 am
stelianx (3/10/2010)
...in production, sometimes the invoice total is different then the sum of all the invoice lines. How can I stop this happening?
Get SQL Server to do the hard work of maintaining the totals for you, automatically. Guaranteed never to be wrong.
Demonstration script:
-- For demonstration purposes
USE tempdb;
GO
-- Required connection settings
SET ANSI_NULLS,
ANSI_PADDING,
ANSI_WARNINGS,
ARITHABORT,
CONCAT_NULL_YIELDS_NULL,
QUOTED_IDENTIFIER
ON;
SET NUMERIC_ROUNDABORT
OFF;
GO
-- Drop demo objects if they exist
IF OBJECT_ID(N'dbo.InvoiceTotal', N'V')
IS NOT NULL
DROP VIEW dbo.InvoiceTotal;
IF OBJECT_ID(N'dbo.InvoiceLine', N'U')
IS NOT NULL
DROP TABLE dbo.InvoiceLine;
IF OBJECT_ID(N'dbo.Invoice', N'U')
IS NOT NULL
DROP TABLE dbo.Invoice;
GO
-- Header table
CREATE TABLE
dbo.Invoice
(
invoice_id INTEGER NOT NULL PRIMARY KEY,
other_columns CHAR(1024) NOT NULL DEFAULT (SPACE(1024)),
);
GO
-- Detail table
CREATE TABLE
dbo.InvoiceLine
(
invoice_id INTEGER NOT NULL
CONSTRAINT [FK dbo.InvoiceLine dbo.Invoice invoice_id]
FOREIGN KEY
REFERENCES dbo.Invoice,
line_id INTEGER NOT NULL,
amount MONEY NOT NULL,
CONSTRAINT [PK dbo.InvoiceLine invoice_id, line_id]
PRIMARY KEY CLUSTERED (invoice_id, line_id)
WITH (FILLFACTOR = 100)
);
GO
-- Add 10,000 invoices
INSERT dbo.Invoice
(invoice_id)
SELECT TOP (10000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
-- Add 8 lines per invoice, total 80,000 lines
INSERT dbo.InvoiceLine
(invoice_id, line_id, amount)
SELECT TOP (80000)
(ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1) / 8 + 1,
(ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1) % 8 + 1,
CONVERT(MONEY, RAND(CHECKSUM(NEWID())) * 10000)
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
-- Create a view of all invoice totals
CREATE VIEW
dbo.InvoiceTotal
WITH SCHEMABINDING
AS
SELECT INV.invoice_id,
amount = SUM(LINE.amount),
line_count = COUNT_BIG(*)
FROM dbo.Invoice INV
JOIN dbo.InvoiceLine LINE
ON LINE.invoice_id = INV.invoice_id
GROUP BY
INV.invoice_id;
GO
-- Materialize the view
CREATE UNIQUE CLUSTERED INDEX [CUQ dbo.InvoiceTotal invoice_id]
ON dbo.InvoiceTotal (invoice_id);
GO
-- Demonstrate using the view
SELECT invoice_id, amount, line_count
FROM dbo.InvoiceTotal WITH (NOEXPAND)
ORDER BY
invoice_id;
GO
-- Object drops
IF OBJECT_ID(N'dbo.InvoiceTotal', N'V')
IS NOT NULL
DROP VIEW dbo.InvoiceTotal;
IF OBJECT_ID(N'dbo.InvoiceLine', N'U')
IS NOT NULL
DROP TABLE dbo.InvoiceLine;
IF OBJECT_ID(N'dbo.Invoice', N'U')
IS NOT NULL
DROP TABLE dbo.Invoice;
GO
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 13, 2010 at 2:56 am
Lynn Pettis
I would not store the total of the Invoice Lines in the Invoice Header record. This can be computed as needed when the header record is read.
This is a common de-normalization. Computing invoice totals on-the-fly is unlikely to scale well.
Lynn Pettis
It appears that you are inserting one row at a time into the Invoice Line Item table. This procedure should be able to update the Header record as each Line Item is entered.
This is also unlikely to scale, and introduces the potential for deadlocks. Worse, the logic must be repeated wherever a line item might be modified.
Lynn Pettis
You mention corruption of your data, can you explain how you think this is occuring?
My guess is that the current design relies either on triggers or code similar to that suggested above.
Indexed views are the natural solution, as shown in my demonstration code.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 13, 2010 at 3:31 am
Thanks guys for your help.
I came with the invoice example just for reference, to simplify the scenario, but probably I should give you the real example.
As my first remark - yes I have the the Invoice / InvoiceTotal tables - it is designed by updating totals with triggers and it's working quite well.
My real scenario is however here:
- I have to keep track of 2 (or multiple) accounting systems, meaning that the same invoice should be represented differently for different systems. As simple example, an invoice for 3 days of rental of cost of 100 of a car will look like:
System1:
----
InvoiceLine: 100
System2:
-----
InvoiceLine, day1: 33.33
InvoiceLine, day3: 33.33
InvoiceLine, day3: 33.34
So, first system expects the whole amount as one line, while the other system is splitting the amount per day of rental.
In my database, whenever a new InvoiceLine is inserted / updated / deleted, I should mirror this operation in the 2nd accounting system and the golden rule is that both accounting systems should always calculate the same amounts
However, some programming errors can occur, like rounding error and the 2nd system can calculate 99,99 instead of 100.00. And then the customer complains. The rules to respect are very complex, so I cannot guarantee that the accounting will calculate correctly in 100% of cases - however, how can I make sure that an error is thrown if the 2 accounting systems do not calculate the same amounts? I would prefer users to see errors, rather then loosing money...
March 13, 2010 at 3:51 am
stelianx (3/13/2010)
it is designed by updating totals with triggers and it's working quite well.
I used to use triggers for this before there was an alternative. Indexed views are superior for many reasons, including performance. Triggers are vulnerable to being accidentally disabled, to logic errors (particularly for multi-row operations), and concurrency issues. I would encourage you to use indexed views where possible for future development work.
stelianx (3/13/2010)
My real scenario is however here: I have to keep track of 2 (or multiple) accounting systems, meaning that the same invoice should be represented differently for different systems.
The problem is now very different from the original one posted, and I am not going to write another test rig and script. If you provide good-quality set-up code, I might post a solution. Please include all relevant detail this time, including whether the tables concerned exist in the same database or not.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 13, 2010 at 4:05 am
If you need to figure out where you have different results in your two tables then just use the great sample code Paul provided, join it to your invoice header table and select the rows with a difference <> 0.
March 13, 2010 at 9:00 am
Paul White (3/13/2010)
Lynn Pettis
I would not store the total of the Invoice Lines in the Invoice Header record. This can be computed as needed when the header record is read.This is a common de-normalization. Computing invoice totals on-the-fly is unlikely to scale well.
Lynn Pettis
It appears that you are inserting one row at a time into the Invoice Line Item table. This procedure should be able to update the Header record as each Line Item is entered.This is also unlikely to scale, and introduces the potential for deadlocks. Worse, the logic must be repeated wherever a line item might be modified.
Lynn Pettis
You mention corruption of your data, can you explain how you think this is occuring?My guess is that the current design relies either on triggers or code similar to that suggested above.
Indexed views are the natural solution, as shown in my demonstration code.
Paul
First, yes I understand it is a common means of denormalization and I use that when appropriate. However, in the simple scheme of things, it really comes down to "It Depends". How is the total being used in and by the system.
Also, I'd have to run some tests, but an in-line TVF and a cross apply may be scalable enough for this particular requirement and not need to use an indexed view with it attendant use of additional disk space (I know, disk is cheap, but it also depends on what the business can afford).
Of course, all of this is mute as this had nothing to do with the real problem.
I had to write code years ago that solved a similar problem, unfortunately it was in COBOL, not SQL. I had break down tax amounts to the individual taxing authorities and ran into similar problems regarding rounding. The difference, I couldn't kick out errors, I had to incorporate logic to ensure that the break down matched the overall amount.
This one may take a little thinking to help solve. It will help if the OP posts a test rig for us to work with along with the expected results.
March 13, 2010 at 7:43 pm
Lynn Pettis (3/13/2010)
Also, I'd have to run some tests, but an in-line TVF and a cross apply may be scalable enough for this particular requirement and not need to use an indexed view with it attendant use of additional disk space (I know, disk is cheap, but it also depends on what the business can afford).
An in-line user-defined table-valued function and APPLY is exactly equivalent to in-lining the code in the stored procedure you mentioned. The part that does not scale well is the cost of firing triggers and running complex code to maintain the totals. It is also quite easy to produce deadlocks using this approach.
As anyone who has ever written one of these 'manual' de-normalization strategies knows, it is just hard work to ensure that everything stays consistent, while maintaining good concurrency, and avoiding deadlocks.
Indexed views absolutely guarantee consistency, with very little effort.
The disk space used by the indexed view comes to 20 bytes per row.
Storing the sum and count in the header table would require 16 bytes per row.
The incremental cost is therefore 4 bytes per row. 😛
No-brainer.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 13, 2010 at 8:07 pm
First, read what I wrote, I would have to do some testing to know if the in-line TVF would scale well or not.
Second, it really depends on how someone creates the indexed view. If they create the view properly, primary key and computed value only, then yes. Problem I've seen is that most people create the indexed view over all the columns of the base table and that is what I was actually talking about but didn't articlate properly.
Now, back to the Air Force v Army hockey game. We are currently tied 1 - 1 in the 2nd period.
March 13, 2010 at 8:54 pm
Lynn Pettis
First, read what I wrote, I would have to do some testing to know if the in-line TVF would scale well or not.
:laugh: I did read what you wrote!
You might have to do some testing, but I have direct experience all the approaches mentioned so far in many large-scale environments.
Your original recommendation - to calculate the total from the base data every time it is required - quite obviously will not scale.
Lynn Pettis
Second, it really depends on how someone creates the indexed view. If they create the view properly, primary key and computed value only, then yes. Problem I've seen is that most people create the indexed view over all the columns of the base table and that is what I was actually talking about but didn't articlate properly.
What would be the point of that? That just creates a full copy of the table! A bizarre suggestion.
I have already shown how to do it correctly in the code provided, and explained why it is correct in detail.
If you wish to continue trying to salvage an untenable position, post some code to back up your statements. Otherwise, you are just hand-waving.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 13, 2010 at 9:09 pm
What is untenable about saying I'd need to do testing? Seems to me that's says I'm not sure and would do more investigating before commiting one way or the other.
Also, I am also speaking from experience. Most people when first creating indexed views tend to go to the extreme and build indexed views over the entire base table or tables instead of just the coluns that are needed. It is through discussions like this one they learn better methods.
Once I have done some testing I'll post my findings so that everyone can see the differences. This how we all learn, including myself.
March 13, 2010 at 9:19 pm
Lynn Pettis
I'm not sure and would do more investigating before commiting one way or the other.
I find it hard to credit that any reasonably competent person could believe that summing the detail records every time will scale well.
Does selecting a pre-computed aggregate not seem obviously better to you?
Nevertheless, should you fail to find the time to produce a test rig, I will happily oblige 🙂
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply