October 22, 2013 at 12:25 pm
I'm trying to figure out an alternative to a WHILE LOOP statement.
I have a table with database key numbers (PartyKey) and a person's ID number (PersonId)
partykeypersonId
11
23
31
33
44
The problem is a Person can have more than one PartyKey number, and visa versa.
For instance, PartyKey 1 and 3 both have the same PersonId Number of 1.
PartyKey 3 and Partykey 2 also share a PersonId number of 3.
I need to get all combinations of Partykeys. In this case I would need PartyKeys 1, 2, and 3.
The WHILE LOOP works but this type of query is hard to understand and took quite some time to figure out.
Does anyone know of a better way to get the desired output? I have spent days working on this but other solutions took twice as long as the WHILE LOOP.
The desired output is this:
orderidpartykeynbrOfIterations
732141
882911
882932
882923
932331
932312
932322
Here is some test data and the desired resultset.
DECLARE @increment TINYINT
DECLARE @didinsert TINYINT
/* create test data for person id combinations */
CREATE TABLE #personIds(
partykey INT
,personId INT
)
INSERT INTO #personIds
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 1
UNION ALL
SELECT 3, 3
UNION ALL
SELECT 4, 4
;
/* create test data for orders */
CREATE TABLE #orders (
orderId INT
,partykey INT
)
INSERT INTO #orders
SELECT8829, 1
UNION ALL
SELECT 9323, 3
UNION ALL
SELECT 7321, 4
;
/* to store all partykey combinations */
CREATE TABLE #orderPartyIds (
orderid INT
,partykey INT
,incrementNbr TINYINT
)
;
SET @increment = 1
SET @didinsert = 1
/* insert order data orderspartyids table */
/* to be used in while loop below */
INSERT INTO #orderPartyIds (
orderid
,partykey
,incrementNbr
)
SELECTorderid
,partykey
,@increment /* to show this partykey is the original in the order */
FROM#orders
/* this routine loops thru the orderparties and personids tables to get all posible partykey combinations */
/* this actually works, just looks messy */
WHILE @increment < 10 AND @didinsert = 1
BEGIN
INSERT INTO #orderPartyIds (orderid, partykey, incrementNbr)
SELECT DISTINCT o.orderid, p1.partykey, @increment
FROM #orderPartyIds o
INNER JOIN #personIds p ON o.partykey = p.partykey
INNER JOIN #personIds p1 ON p.personId = p1.personId
WHEREincrementNbr = @increment - 1
AND NOT EXISTS (
SELECTorderid
FROM#orderPartyIds o1
WHEREo1.orderid = o.orderid
ANDo1.partykey = p1.partykey
)
SET @didinsert = 0
/* set the variable back to one if data was inserted */
SELECT@didinsert = 1
FROM#orderPartyIds
WHEREincrementNbr = @increment
/* increment the counter by one */
SET @increment = @increment + 1
END /* end of WHILE loop */
;
/* expected output */
SELECTorderid
,partykey
,incrementNbr AS nbrOfIterations
FROM#orderPartyIds
ORDER BY orderid
,incrementNbr
;
/* clean up */
DROP TABLE #personIds
DROP TABLE #orders
DROP TABLE #orderPartyIds
October 22, 2013 at 3:25 pm
If you try grouping order id and partkey with count(*) that should give you number of unique combination of order id and part key.
If you want 1 2 3 next to it you can use Rownumber().
October 23, 2013 at 7:11 am
Thats not what he wants if you look at his expected results. He wants to do a "six degrees of kevin bacon" on his data, finding all the parties associated to an order through mutual people.
It can be done with a recursive CTE, which is still not ideal, but likely better than the while loop. I don't think you can avoid some type of recursion here given that table, but I'd love to be proven wrong. There might be a way to do it with dynamic sql, too, but my head hurts just thinking about it.
EDIT: I THINK it can be done with a recursive CTE. No longer sure
October 23, 2013 at 9:10 am
Here is the recursive CTE solution.
Not sure if it will be any quicker at all, and you still need to know the max iteration depth to put in an exit condition.
WITH
recursiveCTE AS (
SELECTorderid,
o.partykey,
p.personId,
1 AS incrementNbr
FROM #orders o
INNER JOIN #personIds p
ON p.partykey = o.partykey
UNION ALL
SELECT r.orderid,
p2.partykey,
p2.personId,
incrementNbr + 1 AS incrementNbr
FROM recursiveCTE r
INNER JOIN #personIds p1
ON p1.partykey = r.partykey
INNER JOIN #personIds p2
ON p2.personId = p1.personId
AND p2.partykey <> r.partykey
WHERE incrementNbr < 10
)
SELECT orderid,partykey,min(incrementNbr) AS nbrOfIterations
FROM recursiveCTE
GROUP by orderid,partykey
ORDER BY orderid, min(incrementNbr),partykey
October 23, 2013 at 10:48 am
Thanks Nevyn. That help me figure out what I did wrong.
Learned something new today. Didn't know what a "six degrees of kevin bacon" was. Interesting readying.
Anyway, the WHILE LOOP was still faster when executed against a table with 2,789,799 records.
I got the below script from some article on SSC some time ago and use it frequently on db's I have limited access on.
You can see the results of recursive CTE and the WHILE LOOP statement. SQL 2008 R2 on an i7 HP laptop.
The recursive CTE took almost one minute and the WHILE LOOP about 10 seconds or less. Did not clear cache between runs.
Anyway, I was hoping to clean up some code, since it being used more frequently, but the original developer may have had this right. Not giving up but it has to be shelved for right now.
Set NoCount On;
Declare @cpu_ int;
Declare @lreads_ int;
Declare @eMsec_ int;
Select
@cpu_ = cpu_time,
@lreads_ = logical_reads,
@eMsec_ = total_elapsed_time
From
sys.dm_exec_requests
Where
session_id = @@spid;
--====== Paste Code Below ============
--WHILE LOOP - three runs
----CpuMsLogRdsElapsed
----2596641465099
----2719641565540
----2841641465366
--Recursive CTE - two runs
----CpuMsLogRdsElapsed
----57360160042057833
----57687160041459077
--====== Paste Code Above =============
Select
cpu_time-@cpu_ as CpuMs,
logical_reads- @lreads_ as LogRds,
total_elapsed_time - @eMsec_ as Elapsed
From
sys.dm_exec_requests
Where
session_id = @@spid
GO
October 23, 2013 at 11:04 am
Yeah, I'm not surprised.
If the deepest # of iterations isn't that high, the while loop performance would not be that bad.
The recursive cte can't tell which matches it has already found (I tried to integrate that but it did not go well), so it keeps linking down to max iterations, and then fixes it in the grouping.
I don't think the dynamic sql would be a performance boost either. Maybe if hundreds of iterations were possible one of these could become worth it.
You might be better off tweaking the loop for performance gains instead of replacing it.
Try out (loop only below, needs the rest of your code):
WHILE @increment < 10 AND (@didinsert > 0 OR @increment <= 2)
BEGIN
INSERT INTO #orderPartyIds (orderid, partykey, incrementNbr)
SELECT DISTINCT o.orderid, p1.partykey, @increment
FROM #orderPartyIds o
INNER JOIN #personIds p ON o.partykey = p.partykey
INNER JOIN #personIds p1 ON p.personId = p1.personId
WHEREincrementNbr = @increment - 1
AND NOT EXISTS (
SELECTorderid
FROM#orderPartyIds o1
WHEREo1.orderid = o.orderid
ANDo1.partykey = p1.partykey
)
SET @didinsert = @@ROWCOUNT
/* set the variable back to one if data was inserted */
--SELECT@didinsert = 1
--FROM#orderPartyIds
--WHEREincrementNbr = @increment
/* increment the counter by one */
SET @increment = @increment + 1
END /* end of WHILE loop */
In other words, you should not need to query your temp table to find out if you are at max depth. @@rowcount will tell you if anything got inserted on the current increment.
October 23, 2013 at 12:37 pm
As for the 'six degrees of kevin bacon' analogy, I thought it was a good fit to the problem you describe 🙂
Kind of a strange requirement ...
October 23, 2013 at 1:02 pm
You know, sometimes the obvious is just too easy to see. Thanks.
I have been concentrating on getting rid of the WHILE LOOP.
When I first looked at the stored procedure it had over 960 lines of code. It also was not very readable (stretched out across the whole screen). I was able to take out repeatable processes and put them in either scalar functions or table valued functions, and made use of views where possible. This improved the performance of the proc tremendously and eliminated most, not all (yet), of the table scans.
The data we're working with is from integrations for other systems. Unfortunately in the source system a person can have more than one id and one id can have more than one person. Geez.
Just have to work with it. Not liking it but Oh Well.
Now that I've resigned myself to sticking with the WHILE LOOP, for awhile, I'll take your suggestion and look at ways to improve the logic.
Thanks again.
October 23, 2013 at 8:54 pm
I guess its going to depend on how many levels you need but I believe this works on your test data:
WITH AllPersons AS
(
SELECT DISTINCT c.partykey
,personid=CASE WHEN a.personid = b.personid AND a.personid = c.personid THEN a.personid
WHEN a.personid = b.personid THEN c.personid
WHEN b.personid = c.personid THEN a.personid
ELSE b.personid END
FROM #personIDs a
JOIN #personIDs b ON a.partykey = b.partykey
JOIN #personIDs c ON b.personid = c.personid
)
SELECT DISTINCT orderid, c.partykey
FROM #orders a
JOIN AllPersons b ON a.partykey = b.partykey
JOIN #personIDs c ON b.personid = c.personid;
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 23, 2013 at 9:20 pm
This WHILE LOOP might be a little cleaner and faster than what you've got and should resolve to any number of levels.
DECLARE @increment TINYINT
DECLARE @didinsert TINYINT
/* create test data for person id combinations */
CREATE TABLE #personIds(
partykey INT
,personId INT
)
INSERT INTO #personIds
SELECT 1, 1
UNION ALL
SELECT 2, 3
UNION ALL
SELECT 3, 1
UNION ALL
SELECT 3, 3
UNION ALL
SELECT 4, 4
;
/* create test data for orders */
CREATE TABLE #orders (
orderId INT
,partykey INT
)
INSERT INTO #orders
SELECT8829, 1
UNION ALL
SELECT 9323, 3
UNION ALL
SELECT 7321, 4
;
/* to store all partykey combinations */
CREATE TABLE #orderPartyIds (
orderid INT
,partykey INT
,personid INT
,incrementNbr TINYINT
)
;
DECLARE @RowsCount INT, @Iteration INT = 1;
INSERT INTO #orderPartyIDs (orderid, partykey, personid, incrementNbr)
SELECT orderid, a.partykey, b.personid, @Iteration
FROM #orders a
JOIN #personIDs b ON a.partykey = b.partykey;
SELECT @RowsCount = @@ROWCOUNT, @Iteration = @Iteration + 1;
WHILE @RowsCount > 0
BEGIN
INSERT INTO #orderPartyIDs (orderid, partykey, personid, incrementNbr)
SELECT orderid, partykey, personid, @Iteration
FROM
(
SELECT a.orderid, b.partykey, b.personid
FROM #orderPartyIDs a
JOIN #personIDs b ON a.personid = b.personid OR a.partykey = b.partykey
EXCEPT
SELECT orderid, partykey, personid
FROM #orderPartyIDs
) a;
SELECT @RowsCount = @@ROWCOUNT, @Iteration = @Iteration + 1;
END
SELECT orderid, partykey, IncrementNbr
FROM
(
SELECT orderid, partykey, IncrementNbr
,rn=ROW_NUMBER() OVER (PARTITION BY orderid, partykey ORDER BY IncrementNbr)
FROM #orderPartyIDs
) a
WHERE rn=1;
GO
/* clean up */
DROP TABLE #personIds
DROP TABLE #orders
DROP TABLE #orderPartyIds
Note that I added a column to your #orderPartyIDs temp table.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
October 27, 2013 at 6:36 pm
Hmmm... I was really hoping to hear back from the OP whether my suggestion helped.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply