September 3, 2007 at 2:46 am
Hi,
This is not exactly a database design problem, it's rather a question about data presentation. However, all ideas are welcome, including change of database schema.
I have a system which, let's say, handles buckets of water. Buckets are received to facility, processed and then sent out. Processing means some water might be added or removed from any bucket, that will change the weight of the bucket. The weight is the main bucket parameter we care about.
Each bucket has ID. A bucket may "cycle" multiple times, that means be received, processed and sent out.
The events about each buckets are stored as a table:
BucketID Date Operation Weight
1 Jan 1 2007 IN 10
1 Jan 5 2007 CHANGE 11 --change can be shown as either new weight or as a weight difference
1 Jan 10 2007 OUT 11
2 Feb 1 2007 IN 15
2 Feb 10 2007 OUT 15
Now to the problem: I need to create an inventory turnover report that would take two parameters (begin_date and end_date) and for given period of time, display the following data:
Naturally, closing inventory should be equal to opening weight + received weight + processed weight - shipped weight.
Example: for data rows above, the report from Jan 8 to Feb 8 would display:
opening received changed sent closing
1/11 1/15 0/0 1/11 1/15
I could of course pull out all records and run thru them in report, conditionally filtering out buckets that are long gone, or have not yet arrived. But my concern is performance. There are hundreds of thousands of buckets, while only thousand or so will make it into a typical weekly report. I would like to push as much processing as I can to the server, and only run thru those records that actually matter.
So I need a way to query only those buckets "cycles" that will intersect with my reporting period. How?
Thanks for sharing your ideas,
Denis
September 3, 2007 at 9:40 am
Denis,
I'm not sure there's an easy way to run this report without performance issues. You can set up a clustered index on the date, which will limit he amount of stuff that you have to run through, but it still could be problematic since you have to know the last entry of each bucket in the report, so essentially you have to get the list of buckets, then get the "max(entry) < start date" in addition to what's in the system at that time.
I've done inventory stuff like this and it's problematic.
The first thing I'd suggest is that you add a "current weight" to the table that shows the current weight before the operation. So it would be (in your example)
BucketID Date Operation Weight Current
1 Jan 1 2007 IN 10 0
1 Jan 5 2007 CHANGE 11 10
1 Jan 10 2007 OUT 11 11
2 Feb 1 2007 IN 15 0
2 Feb 10 2007 OUT 15 15
This is essentially a back pointer so you don't have to scan backwards to any data prior to your start_date.
Other than that, I'd highly recommend this being processed in the reporting engine/server/client so it can scan through the rows that exist for the dates and start it's own "bucketing" (no pun intended) of the data for each bucket.
Also, if there's change in weight v the total weight, I'd put those in two separate columns. Don't share one column since that adds a lot of processing headaches.
Early here, but maybe someone has a better solution or an easy T-SQL query to get the results.
September 4, 2007 at 2:40 am
I couldn't quite figure out if you wanted to include buckets that have been brought IN or CHANGEd before the start date but not sent OUT before the report (therefore, like a bank, would be an opening credit).
If you didn't, I think this should produce what you are after:
DECLARE
@table TABLE (BucketID SMALLINT, [Date] DATETIME, Operation VARCHAR(6), Weight SMALLINT)
-- Date format is YYYY-MM-DD
INSERT INTO @table VALUES(2,'2006-12-01','IN', 12)
INSERT INTO @table VALUES(2,'2006-12-03','CHANGE', 13)
INSERT INTO @table VALUES(2,'2006-12-10','OUT', 13)
INSERT INTO @table VALUES(1,'2007-01-01','IN', 10)
INSERT INTO @table VALUES(1,'2007-01-05','CHANGE', 11)
INSERT INTO @table VALUES(1,'2007-01-10','OUT', 11)
INSERT INTO @table VALUES(2,'2007-02-01','IN', 15)
INSERT INTO @table VALUES(2,'2007-02-10','OUT', 15)
/** Date Variables **/
DECLARE @start_date DATETIME
DECLARE @end_date DATETIME
-- Date format is YYYY-MM-DD YYYY-MM-DD
SELECT @start_date = '2007-01-08', @end_date = '2007-02-08'
/** Create Results Table **/
DECLARE @results TABLE (descript VARCHAR(10), totalBuckets SMALLINT NULL, totalWeight SMALLINT NULL)
INSERT
INTO @results(descript) VALUES('opening')
INSERT INTO @results(descript) VALUES('received')
INSERT INTO @results(descript) VALUES('changed')
INSERT INTO @results(descript) VALUES('sent')
INSERT INTO @results(descript) VALUES('closing')
/** Get Opening Balance **/
-- Report Intersect Buckets
DECLARE @intersectBuckets TABLE (BucketID SMALLINT, [Date] DATETIME, Weight SMALLINT)
INSERT
INTO @intersectBuckets
SELECT original.bucketID, original.[Date], original.weight
FROM @table original
INNER JOIN(
SELECT bucketID, MAX([Date]) AS [Date] FROM @table -- Get last date for bucket for opening balance
WHERE bucketID IN (
SELECT bucketID --Get BucketIDs that intersect with report
FROM @table
WHERE [Date] BETWEEN @start_date AND @end_date
AND operation IN ('CHANGE', 'OUT')
)
AND operation IN ('CHANGE','IN')
AND [Date] < @start_date
GROUP BY bucketID) intersectBuckets
ON original.bucketID = intersectBuckets.bucketID
AND original.[Date] = intersectBuckets.[Date]
-- Intersect Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @intersectBuckets)
, totalWeight = (SELECT ISNULL(SUM(weight),0) FROM @intersectBuckets)
WHERE descript = 'opening'
/** Calculate Changes **/
--Received Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @table WHERE operation = 'IN' AND [Date] BETWEEN @start_date AND @end_date)
, totalWeight = (SELECT SUM(weight) FROM @table WHERE operation = 'IN' AND [Date] BETWEEN @start_date AND @end_date)
WHERE descript = 'received'
--Changed Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @table WHERE operation = 'CHANGE' AND [Date] BETWEEN @start_date AND @end_date)
, totalWeight = (SELECT ISNULL(SUM(weight),0) FROM @table WHERE operation = 'CHANGE' AND [Date] BETWEEN @start_date AND @end_date)
WHERE descript = 'changed'
--Sent Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @table WHERE operation = 'OUT' AND [Date] BETWEEN @start_date AND @end_date)
, totalWeight = (SELECT ISNULL(SUM(weight),0) FROM @table WHERE operation = 'OUT' AND [Date] BETWEEN @start_date AND @end_date)
WHERE descript = 'sent'
/** Calculate Closing **/
--Update to Opening
UPDATE @results
SET totalBuckets = (SELECT totalBuckets FROM @results WHERE descript = 'opening'),
totalWeight = (SELECT totalWeight FROM @results WHERE descript = 'opening')
WHERE descript = 'closing'
--Add Received
UPDATE @results
SET totalBuckets = totalBuckets + (SELECT totalBuckets FROM @results WHERE descript = 'received'),
totalWeight = totalWeight + (SELECT totalWeight FROM @results WHERE descript = 'received')
WHERE descript = 'closing'
--Minus Sent
UPDATE @results
SET totalBuckets = totalBuckets - (SELECT totalBuckets FROM @results WHERE descript = 'sent'),
totalWeight = totalWeight - (SELECT totalWeight FROM @results WHERE descript = 'sent')
WHERE descript = 'closing'
/** Show Results **/
SELECT * FROM @results
September 4, 2007 at 3:16 am
There is a bug in the above script... It does not return correct results when a bucket has been reused..... Change the dates to:
SELECT @start_date = '2007-01-08', @end_date = '2007-02-10'
and I don't think that the result is correct.
September 4, 2007 at 4:57 am
Steve, Adrian, much appreciate your replies,
Adrian, this is one impressive script! But, as you already noticed, it doesn't work if buckets "cycle" several times. However, the script has one more serious flaw: when you calculate the opening part (INSERT INTO @intersectBuckets), you assume that all buckets in your opening inventory must have at least one transaction within the reporting period. This assumption is incorrect, a bucket can be received in year 2000 and sit quietly in stock until year 2009 or so, and still it must be shown in opening and closing inventory for weeks in between. Try this to see the problem:
SELECT @start_date = '2007-01-06', @end_date = '2007-01-07'
Edit: sorry, I missed your disclaimer about not including "passive" buckets in the beginning of your original post. Yes, I need to include those buckets. The reported inventory must exactly correspond to actual inventory.
And, there is still another issue. In the final part of the script, you calculate the closing inventory from intermediate results. In this case, you should also take into account weight changes. This will show the problem:
SELECT @start_date = '2007-01-03', @end_date = '2007-01-20'
Please remove "cycling" bucket (first three datarows in @table) before trying my examples.
I generally agree with Steve that such kind of report is always difficult to implement. For each bucket "cycle", you need to know the dates for IN and OUT transactions. Then and only then you can decide whether this bucket must be in opening inventory or not. If I had a table for bucket cycles, such as:
CREATE TABLE BucketCycle (BucketCycleID INT PRIMARY KEY IDENTITY(1,1), BucketID INT NOT NULL, InDate DATETIME NOT NULL, OutDate DATETIME NULL)
ALTER TABLE BucketTransaction ADD BucketCycleID INT NOT NULL
then it would be trivial to obtain list of buckets that intersect the reporting date.
But "cost" of creating and upkeeping separate table just for one report is too high, so I consider it a last resort.
Denis
September 4, 2007 at 7:08 am
The cycling was getting me too. The best way, IMHO, is a hash table of some sort for each bucket, tracking the changes in a row by row basis. That's not a strength of SQL Server, so would be best done in a CLR function or a client side app.
I still think adding a "before" column would help greatly as well since it limits the scan to the "in report rows".
September 4, 2007 at 7:35 am
I've got a script which I think answers the question. I haven't fully tested it; but it's all set-based so the performance on a large table shouldn't be too bad.
The script finds buckets that have come IN and not been sent OUT yet, calculates the intersected buckets, and adjusts the closing balance for changes. I think!!
It's a bit long and messy but it should be followable (is that a word?)....
SET NOCOUNT ON
DECLARE @table TABLE (BucketID SMALLINT, [Date] DATETIME, Operation VARCHAR(6), Weight SMALLINT)
-- Date format is YYYY-MM-DD
INSERT INTO @table VALUES(3,'2006-11-11','IN', 9)
INSERT INTO @table VALUES(2,'2006-12-01','IN', 12)
INSERT INTO @table VALUES(2,'2006-12-03','CHANGE', 13)
INSERT INTO @table VALUES(2,'2006-12-10','OUT', 13)
INSERT INTO @table VALUES(1,'2007-01-01','IN', 10)
INSERT INTO @table VALUES(1,'2007-01-05','CHANGE', 11)
INSERT INTO @table VALUES(1,'2007-01-10','OUT', 11)
INSERT INTO @table VALUES(2,'2007-02-01','IN', 15)
INSERT INTO @table VALUES(2,'2007-02-10','OUT', 15)
INSERT INTO @table VALUES(3,'2007-03-01','OUT', 9)
SET NOCOUNT OFF
/** Date Variables **/
DECLARE @start_date DATETIME
DECLARE @end_date DATETIME
-- Date format is YYYY-MM-DD YYYY-MM-DD
SELECT @start_date = '2007-01-08', @end_date = '2007-02-08'
/** Create Results Table **/
DECLARE @results TABLE (descript VARCHAR(10), totalBuckets SMALLINT NULL, totalWeight SMALLINT NULL, comment VARCHAR(100))
INSERT INTO @results(descript, comment) VALUES('opening', 'Including Buckets already IN or CHANGEd but NOT CHANGed or OUT by end_date')
INSERT INTO @results(descript, comment) VALUES('received', 'IN in date range')
INSERT INTO @results(descript, comment) VALUES('changed', 'CHANGE in date range')
INSERT INTO @results(descript, comment) VALUES('sent', 'OUT in date range')
INSERT INTO @results(descript, comment) VALUES('closing', 'Final Balance incl. CHANGE')
/** Opening Balance and already IN/CHANGED Buckets **/
DECLARE @bucketsIN TABLE (bucketID INT, lastInDate DATETIME NULL, lastInWeight SMALLINT NULL, lastChangeDate DATETIME NULL, lastChangeWeight SMALLINT NULL, recycleDate DATETIME NULL)
--Define all buckets currently IN... That is have no OUT operation
INSERT INTO @bucketsIN (bucketID)
SELECT bucketID
FROM @table
WHERE [date] <= @end_Date
GROUP BY bucketID
HAVING SUM(CASE operation WHEN 'IN' THEN 1 ELSE 0 END) > SUM(CASE operation WHEN 'OUT' THEN 1 ELSE 0 END)
--Retrieve last IN date for each bucket, as a bucket can not be IN twice without going OUT
UPDATE @bucketsIN
SET lastInDate = [Date]
FROM (SELECT original.bucketID AS bucket, MIN(original.[DATE]) AS [Date]
FROM @table original
INNER JOIN @bucketsIN bucketsIN
ON original.bucketID = bucketsIN.bucketID
WHERE original.[date] <= @end_Date
AND operation = 'IN'
GROUP BY original.bucketID)dates
WHERE bucketID = bucket
--Retrieve weight for buckets already IN
UPDATE @bucketsIN
SET lastinWeight = weight
FROM @table original
INNER JOIN @bucketsIN bucketsIN
ON original.bucketID = bucketsIN.bucketID
AND original.[Date] = bucketsIN.lastInDate
-- Check if bucket has been SENT
UPDATE @bucketsIN
SET recycleDate = [Date]
FROM (
SELECT original.bucketID as bucket, MIN(original.[Date]) AS DATE
FROM @table original
INNER JOIN @bucketsIN bucketsIN
ON original.bucketID = bucketsIn.bucketID
WHERE original.[Date] > bucketsIn.lastInDate
AND original.operation = 'OUT'
GROUP BY original.bucketID
)recycled
WHERE bucketID = bucket
-- If recycled, last CHANGE would occure before recycle, if not then CHANGE is next CHANGE
UPDATE @bucketsIN
SET lastChangeDate = [Date]
FROM (
SELECT original.bucketID as bucket, MIN(original.[Date]) AS DATE
FROM @table original
INNER JOIN @bucketsIN bucketsIN
ON original.bucketID = bucketsIn.bucketID
WHERE original.[Date] > bucketsIn.lastInDate
AND original.[Date] < bucketsIn.recycleDate
AND bucketsIN.recycleDate IS NOT NULL
AND original.operation = 'CHANGE'
GROUP BY original.bucketID
)recycled
WHERE bucketID = bucket
-- Remove any buckets that have been recycled before report dates
DELETE FROM @bucketsIN
WHERE recycleDate <= @start_date
UPDATE @bucketsIN
SET lastChangeWeight = original.weight
FROM @table original
INNER JOIN @bucketsIN bucketsIN
ON original.bucketID = bucketsIN.bucketID
AND original.[Date] = bucketsIN.lastChangeDate
--Report Opening Buckets
UPDATE @results
SET totalBuckets = bucketCount,
totalWeight = weightSum
FROM(
SELECT COUNT(bucketID) AS bucketCount, ISNULL(SUM(ISNULL(lastChangeWeight, lastInWeight)), 0) AS weightSum
FROM @bucketsIN
WHERE lastInDate < @start_date
)openingBuckets
WHERE descript = 'opening'
--Report Intersect Buckets
DECLARE @intersectBuckets TABLE (BucketID SMALLINT, [Date] DATETIME, Operation VARCHAR(6), Weight SMALLINT)
INSERT INTO @intersectBuckets
SELECT original.bucketID, original.[Date], original.operation, original.weight
FROM @table original
INNER JOIN(
SELECT bucketID, MAX([Date]) AS [Date] FROM @table -- Get last date for bucket for opening balance
WHERE bucketID IN (
SELECT bucketID --Get BucketIDs that intersect with report
FROM @table
WHERE [Date] BETWEEN @start_date AND @end_date
AND operation IN ('CHANGE', 'OUT')
)
AND operation IN ('CHANGE','IN')
AND [Date] < @start_date
GROUP BY bucketID) intersectBuckets
ON original.bucketID = intersectBuckets.bucketID
AND original.[Date] = intersectBuckets.[Date]
-- Intersect Buckets
UPDATE @results
SET totalBuckets = totalBuckets + (SELECT ISNULL(COUNT(bucketID), 0) FROM @intersectBuckets)
, totalWeight = totalWeight + (SELECT ISNULL(SUM(weight), 0) FROM @intersectBuckets)
WHERE descript = 'opening'
/** Calculate Changes **/
--Received Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @table WHERE operation = 'IN' AND [Date] BETWEEN @start_date AND @end_date)
, totalWeight = (SELECT ISNULL(SUM(weight), 0) FROM @table WHERE operation = 'IN' AND [Date] BETWEEN @start_date AND @end_date)
WHERE descript = 'received'
--Changed Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @table WHERE operation = 'CHANGE' AND [Date] BETWEEN @start_date AND @end_date)
, totalWeight = (SELECT ISNULL(SUM(weight),0) FROM @table WHERE operation = 'CHANGE' AND [Date] BETWEEN @start_date AND @end_date)
WHERE descript = 'changed'
--Sent Buckets
UPDATE @results
SET totalBuckets = (SELECT COUNT(bucketID) FROM @table WHERE operation = 'OUT' AND [Date] BETWEEN @start_date AND @end_date)
, totalWeight = (SELECT ISNULL(SUM(weight),0) FROM @table WHERE operation = 'OUT' AND [Date] BETWEEN @start_date AND @end_date)
WHERE descript = 'sent'
/** Calculate Closing **/
--Update to Opening
UPDATE @results
SET totalBuckets = (SELECT totalBuckets FROM @results WHERE descript = 'opening'),
totalWeight = (SELECT totalWeight FROM @results WHERE descript = 'opening')
WHERE descript = 'closing'
--Add Received
UPDATE @results
SET totalBuckets = totalBuckets + (SELECT totalBuckets FROM @results WHERE descript = 'received'),
totalWeight = totalWeight + (SELECT totalWeight FROM @results WHERE descript = 'received')
WHERE descript = 'closing'
--Minus Sent
UPDATE @results
SET totalBuckets = totalBuckets - (SELECT totalBuckets FROM @results WHERE descript = 'sent'),
totalWeight = totalWeight - (SELECT totalWeight FROM @results WHERE descript = 'sent')
WHERE descript = 'closing'
/** Finally adjust for Changes **/
--Find IN for changes in Weight records
DECLARE @changes TABLE(bucketID SMALLINT, changeDate DATETIME NULL, changeWeight SMALLINT NULL, inDate DATETIME NULL, inWeight SMALLINT NULL)
INSERT INTO @changes (bucketID, changeDate, changeWeight)
SELECT bucketID, [Date], weight
FROM @table
WHERE [Date] BETWEEN @start_date AND @end_date
AND operation = 'CHANGE'
--Get Maximum IN date < CHANGE date
UPDATE @changes
SET inDate = [Date]
FROM (
SELECT original.bucketID as bucket, MAX(original.[Date]) AS DATE
FROM @table original
INNER JOIN @changes changes
ON original.bucketID = changes.bucketID
WHERE original.[Date] < changes.changeDate
AND original.operation = 'IN'
GROUP BY original.bucketID
)recycled
WHERE bucketID = bucket
--Get old WEIGHT for change
UPDATE @changes
SET inWeight = weight
FROM @table original
INNER JOIN @changes changes
ON original.bucketID = changes.bucketID
AND original.[Date] = changes.inDate
DECLARE @totalChange SMALLINT
SELECT @totalChange = ISNULL(SUM(changeWeight) - SUM(inWeight), 0) FROM @changes
--Update results with changes
UPDATE @results
SET totalWeight = totalWeight + @totalChange
WHERE descript = 'closing'
SELECT * FROM @results
Anyone else crossing their fingers?
September 4, 2007 at 7:40 am
Steve, can you give an example of such hash table? Would it be something different from BucketCycle table I proposed?
Of course "before" column is a must to be able to track weight changes. In actual BucketTransaction table, rows are organized in linked sequence where each row points to the next row. So I can easily fetch the previous state if I need to calculate the difference.
Denis
September 4, 2007 at 7:41 am
Just thinking on, if the CHANGED result (difference of CHANGES) is to be the difference and not the total of the weights after the changes have been made:
Remove the Changed Buckets section under Calculate Changes starting line 146 and update the table with the @totalChanges figure only.
September 4, 2007 at 9:04 am
Jesus Christ this is one mean piece of TSQL
Bravo. But please try:
SELECT @start_date = '2007-01-31', @end_date = '2007-04-01'
The script returns opening inventory 2/22, and closing inventory 1/13
The correct result would be 1/9 and 0/0 (everything is OUT by 2007-03-01).
It will take me some time to find the error, providing I will not fry my brains while doing so But anyway, I usually try to avoid excessively complicated TSQL code, breaking it into parts and possibly moving outside of SQL server. My sad experience is that complicated code tends to become unmanageable when you need to change something 3 months later. This is why I prefer to use reporting engine and only write as much TSQL as needed to give me the data to process.
In this case, all I need is a list of BucketID-s that intersect with my reporting period. Then I can pull all transactions for those buckets into report and process further, discarding rows I don't need and calculating inventory values. With this approach even cycles are not the problem, since the recycling percentage is relatively small and I will simply discard old cycles based on dates.
BTW I liked the following statement very much:
--Define all buckets currently IN... That is have no OUT operation INSERT INTO @bucketsIN (bucketID) SELECT bucketID FROM @table WHERE [date] < @begin_Date --my change, was: <= @end_date GROUP BY bucketID HAVING SUM(CASE operation WHEN 'IN' THEN 1 ELSE 0 END) > SUM(CASE operation WHEN 'OUT' THEN 1 ELSE 0 END)
It seems that it can give me the opening balance for my inventory. I must also add buckets that had transaction within the reporting period and I am all set.
I will try to play with it and see if it gives me my report with reasonable performance.
BR,
Denis
September 4, 2007 at 9:16 am
Denis,
Glad you could understand it okay.
The problem lies within the Intersect Buckets table. You'll need a DELETE statement to remove bucketID 2 (as has been SENT, similar DELETE method used for @bucketsIN) and bucketID 3 (as already captured by, and is exact match in, @bucketsIN).
EDIT: Or you could add the clauses in to the INSERT INTO @intersectBuckets query for the same effect... Might be degrading to performance this way though.
Hope this stops the brain frying!
Ade
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply