March 15, 2011 at 9:28 am
I'm tracking a variety of processes to know when a given process is "compliant" vs "non-compliant". In the end, I just want to count the number of times a process has switched from non-compliant to compliant. So, below is a simplied example with sample data for a six month period:
ProcessID, Month, Compliant (0= non-compliant, 1 = compliant)
1,1,0
1,2,0
1,3,1
1,4,1
1,5,0
1,6,1
From the example, the process switched from non-compliant to compliant twice: once from month 2 to month 3, and then again from month 5 to month 6. I don't need to count the number of actual months that the process was compliant, only when it switches to be compliant.
I keep thinking that I could leverage a CTE or a self join, but I'm obviously struggling to solve it.
Thanks in advance for assistance,
Pete
March 15, 2011 at 9:48 am
Pete
Something like this?
SELECT COUNT(*)
FROM MyTable t1
JOIN Mytable t2
ON t1.ProcessId = t2.ProcessId
AND t2.Month = t1.Month + 1
AND t1.Compliant <> t2.Compliant
John
March 15, 2011 at 9:49 am
Try this
DECLARE @t TABLE(ProcessID INT, Month INT, Compliant INT)
INSERT INTO @t(ProcessID, Month, Compliant)
SELECT 1,1,0 UNION ALL
SELECT 1,2,0 UNION ALL
SELECT 1,3,1 UNION ALL
SELECT 1,4,1 UNION ALL
SELECT 1,5,0 UNION ALL
SELECT 1,6,1;
WITH CTE AS (
SELECT ProcessID, Month, Compliant,
ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month) AS rn
FROM @t)
SELECT x.ProcessID, x.Month
FROM CTE x
WHERE x.Compliant=1
AND EXISTS (SELECT * FROM CTE y
WHERE y.ProcessID=x.ProcessID
AND y.Compliant=0
AND y.rn=x.rn-1);
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 15, 2011 at 10:48 am
This is presenting itself as a "running total" problem...i.e. you need to compare a value in the previous row to a value in the current row to make a decision on whether to aggregate.
This would be best handled in an application layer better suited to iterative processing...but if you're stuck in T-SQL you're looking at a cursor, a self-referencing sub-select (similar to a triangular join) which was presented by Mark or maybe a quirky update.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
March 15, 2011 at 11:31 am
Mark - your query solution is great. I was leaning towards a cte and row_number(), but your technique with correlating a query using EXISTS provides the missing piece that eluded me.
John - I initially tried a self-join, as in your query, but without success. Your solution, unfortunately, returns an incorrect count (3), instead of 2. Nonetheless, I appreciate your assistance. I'm sure I've seen a self-join solution before for problems similar as the one I've presented -- wish I had saved them off.
/* HERE'S THE KICKER */
Looks like I've got a moving target --> Since the moment of when I posted my quest for help, the scope of the project appears to be shifting quickly to something similar to maintaining a SCD (slowly changing dimension). So, rather than merely being able to count the number of times the Compliant flag changes from non-compliant to compliant, it seems a table may be needed to track the startmonth & endmonth for each series of the compliant flag. Thus, in my example, 4 rows would be extracted as follows:
Process, StartMonth, EndMonth, Compliant
1,1,2,0
1,3,4,1
1,5,5,0
1,6,null,1
It's a simple process of summing the Compliant field, thereafter.
Unfortunately, I don't have a lot of practice with transforming rows into a date-span recordset; I'm used to going in the opposite direction of taking date spans, and transforming them into rows. I'm aware that SSIS has a SDC component, and as well as alternative techniques in SSIS, but is it a straight-forward enough process to maintain a SCD (aka date-span recordset) using just T-SQL?
March 15, 2011 at 12:10 pm
Something like this?
WITH CTE AS (
SELECT ProcessID, Month, Compliant,
ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month) -
ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month) AS rnDiff
FROM @t)
SELECT ProcessID,
MIN(Month) AS StartMonth,
MAX(Month) AS EndMonth,
Compliant
FROM CTE x
GROUP BY ProcessID, Compliant, rnDiff
ORDER BY ProcessID,StartMonth;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 15, 2011 at 12:36 pm
Marrrrrrk!
OK, I'm completely jazzed by the elegance of the query. Reminds me of the various techniques I've seen by Itzik Ben-Gan leveraging row_number() and other ranking functions.
So, was this type of query something you keep around in your bag of tricks or were you able to think it through? (I've parsed out your cte to better understand how the data comes out of it before it gets rolled up - I really want to understand the logic.)
Thanks,
Pete
March 15, 2011 at 2:19 pm
It's quite a well known technique, just google for sql "gaps and islands".
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 16, 2011 at 7:19 am
Mark,
Just before you provided the second query solution, I actually started reading up on gaps and islands --I just didn't get a chance to put something to the test. (At least I now know I was on the right track... )
Thanks again for your help
--Pete
March 16, 2011 at 7:24 am
Also a minor change will give you the NULL for the latest end month
WITH CTE AS (
SELECT ProcessID, Month, Compliant,
ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month DESC) AS rn1,
ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month DESC) AS rn2
FROM @t)
SELECT ProcessID,
MIN(Month) AS StartMonth,
MAX(CASE WHEN rn1>1 THEN Month END) AS EndMonth,
Compliant
FROM CTE x
GROUP BY ProcessID, Compliant, rn1-rn2
ORDER BY ProcessID,StartMonth;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537March 16, 2011 at 7:28 am
Mark-101232 (3/16/2011)
Also a minor change will give you the NULL for the latest end month
WITH CTE AS (
SELECT ProcessID, Month, Compliant,
ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY Month DESC) AS rn1,
ROW_NUMBER() OVER(PARTITION BY ProcessID,Compliant ORDER BY Month DESC) AS rn2
FROM @t)
SELECT ProcessID,
MIN(Month) AS StartMonth,
MAX(CASE WHEN rn1>1 THEN Month END) AS EndMonth,
Compliant
FROM CTE x
GROUP BY ProcessID, Compliant, rn1-rn2
ORDER BY ProcessID,StartMonth;
BONUS!
thx
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply