October 2, 2012 at 2:56 pm
I swear I've done this before, but I'm drawing a blank today.
in this example, one or more detail transactions should add up together to be a specific value; but it's not known which transactions should be included.
so in the example below, i have 23 individual transactions, and some subset of them will add up to exactly 244,194.49
can anyone point me to an example where they determine the transactions involved, since they would add up to a specific value?
With MySampleData(RID,VAL)
AS
(
SELECT 2172, 429.00 UNION ALL
SELECT 2173, 20555.00 UNION ALL
SELECT 2174, 5855.71 UNION ALL
SELECT 2175, 13892.89 UNION ALL
SELECT 2194, 36175.10 UNION ALL
SELECT 2188, 5451.32 UNION ALL
SELECT 2189, 566.00 UNION ALL
SELECT 2190, 706.40 UNION ALL
SELECT 2191, 19344.21 UNION ALL
SELECT 2192, 16600.00 UNION ALL
SELECT 2193, 52775.10 UNION ALL
SELECT 2182, 13320.34 UNION ALL
SELECT 2183, 1454.00 UNION ALL
SELECT 2184, 15960.00 UNION ALL
SELECT 2185, 18365.00 UNION ALL
SELECT 2186, 18646.00 UNION ALL
SELECT 2187, 21279.72 UNION ALL
SELECT 2176, 17525.00 UNION ALL
SELECT 2177, 19796.00 UNION ALL
SELECT 2178, 20573.00 UNION ALL
SELECT 2179, 21064.00 UNION ALL
SELECT 2180, 21252.00 UNION ALL
SELECT 2181, 7503.11
)
--theoretically, multiple val items add up to $244,194.49
--23 rows max, since there's 23 entries in this example
SELECT row_number() OVER (ORDER BY RID) AS RW,
RID,VAL
FROM MySampleData
Lowell
October 2, 2012 at 3:12 pm
Will they be sequential, or can it be any combination of rows?
Jared
CE - Microsoft
October 2, 2012 at 3:20 pm
SQLKnowItAll (10/2/2012)
Will they be sequential, or can it be any combination of rows?
definitely non sequential/gaps are involved;
i'm trying a mini tally table and n-1 transactions, but i keep thinking i need to joint he table agaisnt itself 23 times(for 23 row?), and i know that's not right;
i'm feeling dumb today, for sure.
the actual biz problem is one system shows the transactions were applied at a certain dollar amount, but the otehr system ahs them as unapplied;
trying to determine which details are involved is a bit of a pain, and trying the eyeball method didn't work so far.
Lowell
October 2, 2012 at 6:42 pm
There are two solutions to this amount:
2172,2173,2174, 2176,2177,2178,2179,2180,2181,2182,2183,2184,2185,2186,2187, 2189,2190,2191
2172,2173,2174,2175,2176,2177,2178,2179,2180,2181,2182,2183,2184,2185,2186,2187,2188,2189,2190
I have formatted them like that to highlight the differences.
The query is damn ugly and runs like a pig with all four legs roughly hacked away at the knee, taking over 2 minutes to find the solutions...
IF OBJECT_ID('tempdb..#data') is null
BEGIN
;WITH MySampleData(RID,VAL )
AS
(
SELECT 2172, 429.00 UNION ALL
SELECT 2173, 20555.00 UNION ALL
SELECT 2174, 5855.71 UNION ALL
SELECT 2175, 13892.89 UNION ALL
SELECT 2194, 36175.10 UNION ALL
SELECT 2188, 5451.32 UNION ALL
SELECT 2189, 566.00 UNION ALL
SELECT 2190, 706.40 UNION ALL
SELECT 2191, 19344.21 UNION ALL
SELECT 2192, 16600.00 UNION ALL
SELECT 2193, 52775.10 UNION ALL
SELECT 2182, 13320.34 UNION ALL
SELECT 2183, 1454.00 UNION ALL
SELECT 2184, 15960.00 UNION ALL
SELECT 2185, 18365.00 UNION ALL
SELECT 2186, 18646.00 UNION ALL
SELECT 2187, 21279.72 UNION ALL
SELECT 2176, 17525.00 UNION ALL
SELECT 2177, 19796.00 UNION ALL
SELECT 2178, 20573.00 UNION ALL
SELECT 2179, 21064.00 UNION ALL
SELECT 2180, 21252.00 UNION ALL
SELECT 2181, 7503.11
)
SELECT RID,VAL
INTO #data
FROM MySampleData
CREATE UNIQUE CLUSTERED INDEX ix1 ON #data (RID);
END
-- little hack - i ran the query including only items with non-zero "pennies/cents" - looking for combinations that ended in ".49"
-- this told me that nothing higher than 2181 could be involved in this particular solution, so I could include a WHERE RID<=2181 in this case.
;WITH rec AS
(
SELECT RID,CONVERT(DECIMAL(30,2),VAL) VAL,CONVERT(VARCHAR(4000),'') as items,RID as RID2
FROM #data
WHERE RID<=2181
UNION ALL
SELECT rec.RID,CONVERT(DECIMAL(30,2),rec.VAL + MySampleData.VAL),CONVERT(VARCHAR(4000),items+','+CONVERT(VARCHAR(4),MySampleData.RID)),MySampleData.RID
FROM rec
JOIN #data MySampleData
ON MySampleData.RID> rec.RID2 -- don't go over items you already covered
AND rec.VAL + MySampleData.VAL<=244194.49 -- no point adding up stuff that exceeds the target!
)
SELECT *
FROM rec
WHERE VAL = 244194.49 -- just in case it was out by a small amount...
OPTION(MAXRECURSION 0)
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
October 2, 2012 at 6:42 pm
Well then, it is possible to have 23! combinations, I believe. You will have to have a pretty sophisticated algorithm to figure this out. I would think this is a CLR thing and not an SQL thing.
Jared
CE - Microsoft
October 2, 2012 at 7:54 pm
Lowell - Mister Magoo has pointed you in the right direction and his analysis to reduce the initial rowset based on the cents was really good.
His approach is similar to the second script in this article: http://www.sqlservercentral.com/articles/sql+n-Tuples/89809/ (for future reference).
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 3, 2012 at 3:31 am
Thanks for the validation dwain, and having read your thread, I thought that Jeff's preference for the WHILE loop must have some mileage, so I tried to speed things up and took the best of both worlds....a recursive CTE inside a while loop...
Now, this optimisation is pretty specific to this particular problem, so it is by no means meant as a general solution.
That said, it does bring the solution duration down to zero seconds, from over 2 minutes, so it's not bad...:-D
IF OBJECT_ID('tempdb..#data') is null
BEGIN
;WITH MySampleData(RID,VAL )
AS
(
SELECT 2172, 429.00 UNION ALL
SELECT 2173, 20555.00 UNION ALL
SELECT 2174, 5855.71 UNION ALL
SELECT 2175, 13892.89 UNION ALL
SELECT 2194, 36175.10 UNION ALL
SELECT 2188, 5451.32 UNION ALL
SELECT 2189, 566.00 UNION ALL
SELECT 2190, 706.40 UNION ALL
SELECT 2191, 19344.21 UNION ALL
SELECT 2192, 16600.00 UNION ALL
SELECT 2193, 52775.10 UNION ALL
SELECT 2182, 13320.34 UNION ALL
SELECT 2183, 1454.00 UNION ALL
SELECT 2184, 15960.00 UNION ALL
SELECT 2185, 18365.00 UNION ALL
SELECT 2186, 18646.00 UNION ALL
SELECT 2187, 21279.72 UNION ALL
SELECT 2176, 17525.00 UNION ALL
SELECT 2177, 19796.00 UNION ALL
SELECT 2178, 20573.00 UNION ALL
SELECT 2179, 21064.00 UNION ALL
SELECT 2180, 21252.00 UNION ALL
SELECT 2181, 7503.11
)
SELECT RID,VAL
INTO #data
FROM MySampleData
CREATE UNIQUE CLUSTERED INDEX ix1 ON #data (RID);
END
IF OBJECT_ID('tempdb..#bits') IS NOT NULL
DROP TABLE #bits;
DECLARE
@val DECIMAL(30,2)
,@MaxRID BIGINT
,@val2 DECIMAL(30,2)
,@id int
,@items VARCHAR(4000)
SET @val = 244194.49
-- little hack - run the query including only items with non-zero "pennies/cents" - looking for combinations that end in the required "pennies/cents"
-- store the results so we can reduce the number of items on the next run
;WITH rec AS
(
SELECT RID,CONVERT(DECIMAL(30,2),VAL) VAL,CONVERT(VARCHAR(4000),'') as items,RID as RID2
FROM #data
-- only interested in non-whole numbers here
WHERE VAL>FLOOR(VAL)
UNION ALL
SELECT rec.RID,CONVERT(DECIMAL(30,2),rec.VAL + MySampleData.VAL),CONVERT(VARCHAR(4000),items+','+CONVERT(VARCHAR(4),MySampleData.RID)),MySampleData.RID
FROM rec
JOIN #data MySampleData
-- don't go over items you already covered
ON MySampleData.RID> rec.RID2
-- no point adding up stuff that exceeds the target!
AND rec.VAL + MySampleData.VAL<=@val
-- only interested in non-whole numbers here
WHERE MySampleData.VAL>FLOOR(MySampleData.VAL)
)
SELECT identity(int,1,1) as id,RID,VAl,items
INTO #bits
FROM rec
WHERE (VAL-FLOOR(VAL)) = (@val-FLOOR(@val))
OPTION(MAXRECURSION 0)
select * from #bits
IF OBJECT_ID('tempdb..#results') IS NOT NULL
DROP TABLE #results;
CREATE TABLE #results(matched varchar(4000));
-- Now use a loop to calculate each possible set based on the knowledge we already have about non-whole numbers
SELECT TOP 1 @id = id, @MaxRID = RID,@val2=VAL,@items=CONVERT(VARCHAR(10),RID)+items+',' FROM #bits ORDER BY id
WHILE @@ROWCOUNT=1
BEGIN
-- find combinations of whole numbers that satisfy the remaining requirement
;WITH rec AS
(
SELECT RID,CONVERT(DECIMAL(30,2),VAL) VAL,CONVERT(VARCHAR(4000),'') as items,RID as RID2
FROM #data d
WHERE 1=1
-- No point checking RIDs that come after the one we are interested in, is there?
AND RID<=@MaxRID
-- whole numbers only please
AND VAL=FLOOR(VAL)
UNION ALL
SELECT rec.RID,CONVERT(DECIMAL(30,2),rec.VAL + MySampleData.VAL),CONVERT(VARCHAR(4000),items+','+CONVERT(VARCHAR(4),MySampleData.RID)),MySampleData.RID
FROM rec
JOIN #data MySampleData
-- don't go over items you already covered
ON MySampleData.RID> rec.RID2
-- no point adding up stuff that exceeds the target!
AND (MySampleData.VAL+rec.VAL)<=(@val-@val2)
-- whole numbers only please
AND MySampleData.VAL=FLOOR(MySampleData.VAL)
)
INSERT #results(matched)
SELECT CONVERT(VARCHAR(10),r.RID)+r.items+','+@items
FROM rec r
WHERE r.VAL + @val2 =@val
OPTION(MAXRECURSION 0, FAST 1)
SELECT TOP 1 @id = id, @MaxRID = RID,@val2=VAL,@items=CONVERT(VARCHAR(10),RID)+items+',' FROM #bits WHERE id>@id ORDER BY id
END
SELECT @val as TargetAmount,matched
FROM #results
Edit: Included a mistake made while tidying the code...now fixed.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
October 3, 2012 at 5:20 am
MM,
I don't know if you're familiar with the "Sweden Redistricting" problem, but it is mentioned at the bottom of the page in this thread: http://www.sqlservercentral.com/Forums/Topic1318149-392-5.aspx.
I spent two weeks working on a solution to it that ran in what I considered a reasonable period of time, using exactly the approach you used in your latest post to solve Lowell's problem. Essentially, both are bin packing problems:
1. Sweden re-districting: pack the bins, first with doublets and then with triplets
2. Lowell's problem: pack the bins with values containing fractional amounts and then pack them with the remaining values that are whole numbers.
I didn't post my final solution to the Sweden problem. If you're interested, PM me.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 3, 2012 at 5:50 am
ahh "bin packing" was the keyword; I've got a lot of saved scripts, could't remeber the label for it.
I wrote a script i wrote long ago, but my version involved while loops instead of rCTE's
thank you everyone for the help and the articles again!
Lowell
October 3, 2012 at 5:53 am
Dwain,
I did have a read of some of the Swedish districts thread, but not in that much detail.
I guess the general approach with these multi-level recursion problems is to reduce the size of the sets as often as possible, however suits the problem - in your case doublets first ,in this case fractional amounts first.
The key is in spotting how you can do that reduction and finding a way to apply it in a multi-level solution.
I was on the brink of looking at a solution for the Swedish District problem, because it looked interesting, but it may have to wait a while 🙂
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
October 4, 2012 at 6:43 am
I had a bit of an epiphany today that I thought I would share. I figure, if Mr. Magoo can use tricks to speed up the solution, I can use a trick to the solution approach. So instead of thinking of this as a bin packing problem, let's think of it conversely as a bin unpacking problem.
Consider the following (set up data omitted but courtesy of Mr. Magoo):
DECLARE @ReferenceTotal MONEY = 244194.49
,@Delimiter CHAR(1) = ','
,@ActualTotal MONEY = (
SELECT SUM(VAL)
FROM #data)
DECLARE @TargetTotal MONEY = @ActualTotal - @ReferenceTotal
;WITH UNIQUEnTuples (n, Tuples, ID, value) AS (
SELECT DISTINCT 1, CAST(RID AS VARCHAR(8000)), RID, CAST(VAL AS MONEY)
FROM #data
WHERE VAL <= @TargetTotal
UNION ALL
SELECT 1 + n.n, CAST(t.RID AS VARCHAR(8000)) + @Delimiter + n.Tuples, RID
,CAST(n.value + t.VAL AS MONEY)
FROM UNIQUEnTuples n
CROSS APPLY (
SELECT RID, VAL
FROM #data t
WHERE t.RID < n.ID) t
WHERE n.value + t.VAL <= @TargetTotal
)
SELECT [Soln No]=ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
,ExcludedRIDs=Tuples, value
INTO #Temp
FROM UNIQUEnTuples
WHERE value = @ActualTotal - @ReferenceTotal
-- Display the solutions to the converse problem
SELECT * FROM #Temp
;WITH MyData AS (
SELECT [Soln No], RID
FROM #Temp
CROSS APPLY #data
EXCEPT
SELECT [Soln No], Item
FROM #Temp
CROSS APPLY dbo.DelimitedSplit8K(Tuples, @Delimiter)
)
SELECT [Soln No]
,IncludedRIDs=STUFF((
SELECT ',' + CAST(RID AS VARCHAR(10))
FROM MyData b
WHERE a.[Soln No] = b.[Soln No]
ORDER BY RID
FOR XML PATH('')), 1, 1, '')
FROM MyData a
GROUP BY [Soln No]
What I have done is to break the problem into 2 parts:
- Identify the RIDs that should be excluded to match the reference total provided by Lowell.
- Unpack those items from the full bin using EXCEPT.
- What is left is then STUFF/FOR XML PATH grouped into the two resulting records as the IncludedRIDs.
The solution may be a little more general than Mr. Magoo's (not sure), possibly a bit "prettier" (beauty is in the eye of the beholder) and certainly not as speedy (runs in about 17 seconds for me) but it was fun intellectually. The second step proved a bit more challenging than I expected until I switched my approach to using EXCEPT (before that I was doing something utterly stupid).
It would probably run faster if it didn't need to recurse to 5 levels to identify the 5 excluded RIDs, so is an option to consider when there's only a handful of RIDs you expect to be excluded.
UNIQUEnTuples is based on the article I quoted earlier with a couple of speed improvements that I posted today: http://www.sqlservercentral.com/Forums/Topic1301485-3122-5.aspx.
I need to thank you guys for the speed improvements. It was this problem and the slow run time of Mr. Magoo's original effort that made me go back and try some of the things I'd been thinking about.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply