April 7, 2010 at 2:40 pm
I think the title says it all. This little step is part of a much larger procedure.
Bascially I have 3 documents : orders, shippings and invoices. All with headers and lines obviously.
The fun starts with the fact that shippings and invoices have the column orderNo, but no link between the shipping and invoice. From that little misdesign, I need to match all 3 document # in a list of invoices...
The challenge: each order can have n shippings and the major PITA is that any # of shippings (for a single order) can then be merged into a single invoice... or not.
I have cases where a single order has 5 shippings and only 3 invoices which is where I'm hitting a wall.
In the final query I need to have only 1 line per invoice, so when I have more than 1 shipping, I need to display it like so : "order #" "invoice #" "VBL-001234, VBL-001236"
The only way I can safely match all documents is to go at the details level of shipments and invoices and check that I have 100% match on all rows for sku and Qty which proves I have the correct document links.
This method works excellent if I have only 1 invoice per order even if I have n shippings.
I have adapted my query to work with a list of possible combinaisons (ship #1 + ship #3).
Now what I need to do is generate that list of combos. I need to generate a list of all distinct combinaisons of two or more shipping # having without containing all possible documents. I have the feeling a recursive cte could do it but I've never worked with that and I'm a little lost on how to make it work.
So far, the worst case scenario I have is 16 shippings for a single order hence the reason I'm not building this with a series of self joins.
Here's some sample data and required output :
DECLARE @Base TABLE (OrderNo VARCHAR(20) NOT NULL, ShippingNo VARCHAR(20) NOT NULL, GroupRowID INT NULL, PRIMARY KEY CLUSTERED (OrderNo, ShippingNO))
INSERT INTO @Base (OrderNo, ShippingNO)
VALUES
('VCB-000070', '1'),
('VCB-000070', '2'),
('VCB-000070', '3'),
('VCB-000071', '1'),
('VCB-000071', '2'),
('VCB-000071', '3'),
('VCB-000071', '4')
for 'VCB-000070' :
--1 --exclude
--2 --exclude
--3 --exclude
1,2
1,3
2,3
--1,2,3 --exclude
for 'VCB-000071'
--1 --exclude
--2 --exclude
--3 --exclude
--4 --exclude
1,2
1,3
1,4
2,3
2,4
3,4
1,2,3
1,2,4
2,3,4
--1,2,3,4 --exclude
April 7, 2010 at 3:12 pm
It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.
April 7, 2010 at 3:45 pm
I'm not sure if I fully understood the requirement...
But I would consider building a permanent table holding all variations and query that one based on the number of shippings.
Maybe the following code helps to understand what I'd do (@lvl is used to define the number of shipments):
DECLARE @lvl INT
SET @lvl=4
DECLARE @tbl TABLE
(
id INT
)
INSERT INTO @tbl
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
;WITH cte AS
(
SELECT
t1.id AS id1,
t2.id AS id2,
t3.id AS id3,
t4.id AS id4,
CASE
WHEN t2.id=t3.id THEN 3
WHEN t2.id<t3.id AND t3.id=t4.id THEN 4
ELSE 5 END AS lvl
FROM @tbl t1
CROSS JOIN @tbl t2
CROSS JOIN @tbl t3
CROSS JOIN @tbl t4
WHERE t1.id<=t2.id
AND t2.id<=t3.id
AND t3.id<=t4.id
AND t1.id<>t2.id AND t1.id<>t3.id AND t1.id<>t4.id
), cte2 AS
(
SELECT
id1,
id2,
CASE WHEN @lvl>3 AND id2<>id3 AND id2<>id4 THEN id3 ELSE 8 END AS id3,
CASE WHEN @lvl>4 AND id3<>id4 AND id2<> id4 AND id3<>id4 THEN id4 ELSE 9 END AS id4
FROM cte
WHERE lvl<= @lvl
AND id2<= @lvl
AND id3<= @lvl
AND id4<= @lvl
)
SELECT
id1,
id2,
CASE WHEN id3<8 THEN id3 ELSE 0 END AS id3,
CASE WHEN id4<9 THEN id4 ELSE 0 END AS id4
FROM cte2
WHERE id2>id1 AND id3>id2 AND id4>id3
GROUP BY id1,id2,id3,id4
April 7, 2010 at 4:38 pm
Lynn Pettis (4/7/2010)
It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.
That post would contain about 500 lines of code and another 2000 to build the relevant objects and test data which I'm sure I can't provide because it contains personal data. And of course there's even more to it than what I said but those missing pieces are basic joins to other tables, nothing relevant to my little problem.
I have everything working except the piece to build all possible combinaisons which is where I need help. Other than just going with a loop I don't see any easy way out of this one.
April 7, 2010 at 4:42 pm
lmu92 (4/7/2010)
I'm not sure if I fully understood the requirement...But I would consider building a permanent table holding all variations and query that one based on the number of shippings.
Maybe the following code helps to understand what I'd do (@lvl is used to define the number of shipments):
DECLARE @lvl INT
SET @lvl=4
DECLARE @tbl TABLE
(
id INT
)
INSERT INTO @tbl
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5
;WITH cte AS
(
SELECT
t1.id AS id1,
t2.id AS id2,
t3.id AS id3,
t4.id AS id4,
CASE
WHEN t2.id=t3.id THEN 3
WHEN t2.id<t3.id AND t3.id=t4.id THEN 4
ELSE 5 END AS lvl
FROM @tbl t1
CROSS JOIN @tbl t2
CROSS JOIN @tbl t3
CROSS JOIN @tbl t4
WHERE t1.id<=t2.id
AND t2.id<=t3.id
AND t3.id<=t4.id
AND t1.id<>t2.id AND t1.id<>t3.id AND t1.id<>t4.id
), cte2 AS
(
SELECT
id1,
id2,
CASE WHEN @lvl>3 AND id2<>id3 AND id2<>id4 THEN id3 ELSE 8 END AS id3,
CASE WHEN @lvl>4 AND id3<>id4 AND id2<> id4 AND id3<>id4 THEN id4 ELSE 9 END AS id4
FROM cte
WHERE lvl<= @lvl
AND id2<= @lvl
AND id3<= @lvl
AND id4<= @lvl
)
SELECT
id1,
id2,
CASE WHEN id3<8 THEN id3 ELSE 0 END AS id3,
CASE WHEN id4<9 THEN id4 ELSE 0 END AS id4
FROM cte2
WHERE id2>id1 AND id3>id2 AND id4>id3
GROUP BY id1,id2,id3,id4
No you're missing a small part of it. I don't know ahead of time how many levels of cross joins I'll need and it's going to be different for every order # (which a single client can have as much as 300 in a single day... and this report I'm building can display all the client's order since day 1 of the ERP).
So in essence I need to build all the possible combinaisons for all orders at run time. So that leaves me with 2 obvious answers
1 - Dynamic SQL which is really hard to implement
2 - Looping 1 level at the time until everything is resolved... which would be my next try assuming a recursing cross join is not possible is SQL.
April 7, 2010 at 4:50 pm
Think of it this way...
You have 4 persons in a room, how many different "couples" can do a handshake.
p1 can shake hands with p2,p3,p4
p2 can't shake hands again with p1 so only p2 with p3 and p4 remain
then only p3 can shake hands with p4.
Now that you have that done, how many 3 way hand shaking can you have (this is where recursion kicks in)?
p1,p2,p3
p1,p2,p4
p2,p3,p4
In my current solution I'm already adressing the possibilities that a single hand shake or everybody at the same time will resolve the join. Now I need everything in between.
Again I have cases with as much as 16 peoples in the room (16 shippings for 1 single order). So that's 14 levels of recursions to build all possible groups of handshakes. I wish I could give you a formula but I just can't figure that one out atm!!!
April 7, 2010 at 4:54 pm
Ninja's_RGR'us (4/7/2010)
Lynn Pettis (4/7/2010)
It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.That post would contain about 500 lines of code and another 2000 to build the relevant objects and test data which I'm sure I can't provide because it contains personal data. And of course there's even more to it than what I said but those missing pieces are basic joins to other tables, nothing relevant to my little problem.
I have everything working except the piece to build all possible combinaisons which is where I need help. Other than just going with a loop I don't see any easy way out of this one.
Not asking for actual data, but sample data; something that at least represents the problem domain. From what you have posted I'll have to leave it to others who are more imaginative. I need to see the problem and what the ultimate result is going to be to be of any help.
April 7, 2010 at 5:00 pm
Do you know the max number of cross joins ahead of time?
What I'd probably do in this case is to build a static table like the one I posted covering a "safe" level. When processing new data I would check if the nesting level would be sufficient and, if not, call as separate proc that would rebuild that table with expanded levels (most probably max_level + 1).
To query the data I would use an iTVF with the number of shippings as an input parameter to return the reduced matrix.
Basically, the "matrix" table would either be created based on a dynamic SQL statement to cover the unknown levels (including re-creation of the iTVF) or I would throw an exception and redo it script based (depending on the business requirement).
I might even add a pre-warning alert to send me an email if I'd have a level of 20 covered and there will be data 18 levels deep... It depends. π
But I agree trying to avoid looping for each shipment.
April 7, 2010 at 6:05 pm
This is a really interesting problem. I cannot think of a way to do it completely dynamic in terms of the maximum size of your combination collections and I'm hoping someone will come along with an amazing method I never would have considered.
However, if you can limit the maximum element grouping to a certain number, I have a possible suggestion for you. Basically, it's a tally table ... the numbers would join back to a ROW_NUMBER or identity or however you want to join back to your actual data. The 0's would be non-joins (so as to account for groupings smaller than the maximum.
My tiny tally table to avoid getting too many results back:
declare @test-2 table (myInt int)
insert into @test-2(myInt)
select 0
union select 1
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 10
And my logic to build my complex-er tally table:
select t1.myInt col1,
t2.myInt col2,
t3.myInt col3,
t4.myInt col4,
t5.myInt col5,
t6.myInt as col6
from @test-2 t1
join @test-2 t2
on t2.myInt > t1.myInt
join @test-2 t3
on t3.myInt > t2.myInt or t3.myInt = 0
join @test-2 t4
on (t4.myInt > t3.myInt and t3.myInt > 0) or t4.myInt = 0
join @test-2 t5
on (t5.myInt > t4.myInt and t4.myInt > 0) or t5.myInt = 0
join @test-2 t6
on (t6.myInt > t5.myInt and t5.myInt > 0) or t6.myInt = 0
where t1.myInt > 0
order by t1.myInt, t2.myInt
With only the 10 initial values, it should return 837 possible combination of between 2 and 6 elements.
I would be interested to hear feedback.
Thanks,
-Ben
April 7, 2010 at 6:39 pm
bteraberry (4/7/2010)
This is a really interesting problem. I cannot think of a way to do it completely dynamic in terms of the maximum size of your combination collections and I'm hoping someone will come along with an amazing method I never would have considered.However, if you can limit the maximum element grouping to a certain number, I have a possible suggestion for you. Basically, it's a tally table ... the numbers would join back to a ROW_NUMBER or identity or however you want to join back to your actual data. The 0's would be non-joins (so as to account for groupings smaller than the maximum.
My tiny tally table to avoid getting too many results back:
declare @test-2 table (myInt int)
insert into @test-2(myInt)
select 0
union select 1
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 10
And my logic to build my complex-er tally table:
select t1.myInt col1,
t2.myInt col2,
t3.myInt col3,
t4.myInt col4,
t5.myInt col5,
t6.myInt as col6
from @test-2 t1
join @test-2 t2
on t2.myInt > t1.myInt
join @test-2 t3
on t3.myInt > t2.myInt or t3.myInt = 0
join @test-2 t4
on (t4.myInt > t3.myInt and t3.myInt > 0) or t4.myInt = 0
join @test-2 t5
on (t5.myInt > t4.myInt and t4.myInt > 0) or t5.myInt = 0
join @test-2 t6
on (t6.myInt > t5.myInt and t5.myInt > 0) or t6.myInt = 0
where t1.myInt > 0
order by t1.myInt, t2.myInt
With only the 10 initial values, it should return 837 possible combination of between 2 and 6 elements.
I would be interested to hear feedback.
Thanks,
-Ben
The problem with this approach is that it assumes a max # of recursions... I could of course go on a limb and go all the way to 25 levels up front but I can't even estimate the # of rows this dataset would include. I adjusted your code to have 25 values instead of 10 and still do only 6 joins and the result had almost 250 000 rows... I can't imagine going even only to 16 with this.
April 7, 2010 at 6:41 pm
Lynn Pettis (4/7/2010)
Ninja's_RGR'us (4/7/2010)
Lynn Pettis (4/7/2010)
It may help to see the whole problem not just a piece of it. Can you put together a test suite including the expected output? I know I work better when I know the whole problem, and can see what the end result should be.That post would contain about 500 lines of code and another 2000 to build the relevant objects and test data which I'm sure I can't provide because it contains personal data. And of course there's even more to it than what I said but those missing pieces are basic joins to other tables, nothing relevant to my little problem.
I have everything working except the piece to build all possible combinaisons which is where I need help. Other than just going with a loop I don't see any easy way out of this one.
Not asking for actual data, but sample data; something that at least represents the problem domain. From what you have posted I'll have to leave it to others who are more imaginative. I need to see the problem and what the ultimate result is going to be to be of any help.
I'll see what I can do but I can't promise anything... I'd need to built scripts for at least 8 tables, adjust 500 lines of code to match the new tables then do about 2000 insert statements to have anything close to reality for 1 month of data for only 1 client... that's a lot of work for something I already know how to do... but wishing there was another way.
April 7, 2010 at 7:22 pm
This would make a great TSQL Challenge. I've been going around in circles for a couple hours now trying to figure out a way to do this in a dynamic set based manner. Haven't quite made it all come together yet.
April 8, 2010 at 2:46 am
^__^ Hi there,
uhmmm, i'm not usre if i understood the problem right...
i turn Ben's code to dynamic sql (which i always avoid using)
from Ben's code, are you saying you want to specify the number of fields/joins?
NOTES:
1. I have included the minimum number of combinations as an input parameter but it doesn't work yet...
2. I kinda hate how my code looks...
3. You can combine the two loops but will have additional conditional statements...
4. I tried 25 records with 16 joins... it took 3 mins and row 1 still has the value 1 (i stoped the execution)... that sample data would be kinda slow...
5. Just change the value of @MaxRows to the desired number of joins. (enter 3 or more, dont have proper validations yet)
hmmmm, is this what your looking for?
I hope it helps ^__^
SAMPLE DATA
CREATE TABLE #Test (myInt int)
insert into #Test(myInt)
select 0
union select 1
union select 2
union select 3
union select 4
union select 5
union select 6
union select 7
union select 8
union select 9
union select 10
MY CODE
DECLARE @MinRows INT, @MaxRows INT, @SqlString NVARCHAR(MAX),
@RowString NVARCHAR(MAX),@JoinString NVARCHAR(MAX),
@i INT,@iStr NVARCHAR(10),@iStrBefore NVARCHAR(10)
SELECT@MinRows='2'-- NO USE YET
,@MaxRows='8'
SET @i=1
SET @RowString='SELECT'
--// ROWS
WHILE @i<=@MaxRows
BEGIN
SET @iStr=CAST(@i AS NVARCHAR(MAX))
SELECT @RowString = @RowString + '
t' + @iStr + '.myInt col' + @iStr + ','
SET @i=@i+1
END
SELECT @RowString=LEFT(@RowString,LEN(@RowString)-1)
--//JOINS
SET @JoinString='
from #Test t1
inner join #Test t2
on t2.myInt > t1.myInt'
IF @MaxRows>=3
BEGIN
SELECT @JoinString=@JoinString+'
inner join #Test t3
on t3.myInt > t2.myInt or t3.myInt = 0'
IF @MaxRows>=4
BEGIN
SET @i=4
WHILE @i<=@MaxRows
BEGIN
SET @iStr=CAST(@i AS NVARCHAR(MAX))
SET @iStrBefore=CAST(@i-1 AS NVARCHAR(MAX))
SELECT @JoinString=@JoinString+'
inner join #Test t' + @iStr + '
on (t' + @iStr + '.myInt > t' + @iStrBefore + '.myInt and t' + @iStrBefore + '.myInt > 0) or t' + @iStr + '.myInt = 0'
SET @i=@i+1
END
END
END
SELECT @SqlString = @RowString + @JoinString +
'where t1.myInt > 0
order by t1.myInt, t2.myInt'
SELECT@SqlString AS 'My_SQL_Query_String'
,CAST(@SqlString AS XML) AS 'CLICK_ME_TO_VIEW'
EXEC (@SqlString)
DROP TABLE #Test
Is this what your looking for?
_____________________________________________
[font="Comic Sans MS"]Quatrei Quorizawa[/font]
:):D:P;):w00t::cool::hehe:
MABUHAY PHILIPPINES!
April 8, 2010 at 2:51 am
Backtracking a little...
Orders (OrderID)
OrderLines (OrderLineID)
Shipping (ShippingID)
ShippingLines (ShippingLineID)
Invoices (InvoiceID)
InvoiceLines (InvoiceLineID)
Which tables hold fk's of which other tables?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 8, 2010 at 4:34 am
Ninja's_RGR'us (4/7/2010)
Think of it this way...You have 4 persons in a room, how many different "couples" can do a handshake.
N-way handshaking combinations...
USE tempdb;
GO
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
GO
DROP FUNCTION dbo.Numbers;
GO
CREATE FUNCTION dbo.Numbers
(@N BIGINT)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
WITH
N1 AS (SELECT N = 1 UNION ALL SELECT 1),
N2 AS (SELECT N = 1 FROM N1 T, N1),
N3 AS (SELECT N = 1 FROM N2 T, N2),
N4 AS (SELECT N = 1 FROM N3 T, N3),
N5 AS (SELECT N = 1 FROM N4 T, N4),
N6 AS (SELECT N = 1 FROM N5 T, N5),
NM AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS N FROM N6)
SELECT TOP (@N)
N
FROM NM
WHERE N <= @N
ORDER BY
N ASC;
GO
DECLARE @People
TABLE (
person_id INTEGER IDENTITY(0,1) PRIMARY KEY,
name VARCHAR(10) NOT NULL
);
INSERT @People (name) VALUES ('Alan');
INSERT @People (name) VALUES ('Bob');
INSERT @People (name) VALUES ('Carol');
INSERT @People (name) VALUES ('Debra');
INSERT @People (name) VALUES ('Eric');
INSERT @People (name) VALUES ('Frank');
INSERT @People (name) VALUES ('Gina');
INSERT @People (name) VALUES ('Harry');
INSERT @People (name) VALUES ('Ian');
INSERT @People (name) VALUES ('Julie');
DECLARE @Bits
TABLE (
bit_id BIGINT NOT NULL PRIMARY KEY,
value BIGINT NOT NULL
);
INSERT @Bits (bit_id, value)
SELECT P.person_id - 1, POWER(2, (P.person_id - 1))
FROM @People P;
DECLARE @Grouped
TABLE (
N BIGINT NOT NULL,
name VARCHAR(10) NOT NULL,
group_size BIGINT NOT NULL,
group_order BIGINT NOT NULL,
group_row BIGINT NOT NULL,
UNIQUE (N, name),
UNIQUE (group_row, group_size, group_order, N)
);
INSERT @Grouped
(N, name, group_size, group_order, group_row)
SELECT Q.N,
Q.name,
Q.group_size,
group_order = DENSE_RANK() OVER (PARTITION BY Q.group_size ORDER BY Q.N),
group_row = DENSE_RANK() OVER (PARTITION BY Q.group_size ORDER BY Q.name)
FROM (
SELECT N.N,
P.name,
group_size = COUNT(*) OVER (PARTITION BY N.N)
FROM dbo.Numbers (POWER(2, (SELECT COUNT(*) FROM @People))) N
JOIN @Bits B
ON N.N & B.value = B.value
JOIN @People P
ON P.person_id = B.bit_id
) Q
SELECT names =
STUFF
(
(
SELECT ',' + name
FROM @Grouped T2
WHERE T2.N = T1.N
ORDER BY
name ASC
FOR XML PATH('')
)
, 1, 1, SPACE(0))
FROM @Grouped T1
WHERE group_row = 1
ORDER BY
T1.group_size,
T1.group_order;
GO
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 1 through 15 (of 64 total)
You must be logged in to reply to this topic. Login to reply