August 3, 2009 at 2:16 am
I'm hoping Jeff Modem can help me on this one 🙂
What I've got is a table that tells me when a clock is stopped or started on every entry. What I want to do is get the first start entry, then ignore the next start entries until I get a stop entry, then ignore the next stop entries until I get a start entry - ad nauseum.
I have no idea how you would do this without using a cursor 🙁
To make it easier, I have an easy temporary table that a solution could be derived from:
drop table #TempTable
create table #TempTable (ref int, StartStop varchar (5))
insert into #TempTable values (1, 'Start')
insert into #TempTable values (2, 'Start')
insert into #TempTable values (3, 'Stop')
insert into #TempTable values (4, 'Stop')
insert into #TempTable values (5, 'Start')
insert into #TempTable values (6, 'Stop')
insert into #TempTable values (7, 'Start')
insert into #TempTable values (8, 'Start')
So my query should return:
1, Start
3, Stop
5, Start
6, Stop
7, Start
How would I achieve this?
August 3, 2009 at 2:55 am
Hi,
try this
select * from
(
select a.ref,
(case when (select b.StartStop from #TempTable b where b.ref = a.ref -1)= a.StartStop
then 'ERROR' else StartStop end)StartStop
from #TempTable a
) as X
where StartStop 'ERROR'
August 3, 2009 at 3:33 am
Thank you so much!
I think I see the logic...
It looks like my table has NULLs as well... where a NULL is the same as 'Start'. I'm terribly sorry if this is too much bother, but I've been racking my brains for the past 15 minutes to work out how to implement the same thing...
drop table #TempTable
create table #TempTable (ref int, StartStop varchar (5))
insert into #TempTable values (1, NULL)
insert into #TempTable values (2, NULL)
insert into #TempTable values (3, 'Start')
insert into #TempTable values (4, 'Start')
insert into #TempTable values (5, 'Stop')
insert into #TempTable values (6, 'Stop')
insert into #TempTable values (7, 'Start')
insert into #TempTable values (8, 'Stop')
insert into #TempTable values (9, 'Start')
insert into #TempTable values (10, 'Start')
Do you have any suggestions? Or pointers on how to solve this?
August 3, 2009 at 3:43 am
select * from
(
select a.ref,
(case when (select isnull(b.StartStop,'Start') from #TempTable b where b.ref = a.ref -1)= isnull(a.StartStop,'Start')
then 'ERROR' else isnull(a.StartStop,'Start') end)StartStop
from #TempTable a
) as X
where StartStop 'ERROR'
August 3, 2009 at 3:52 am
August 3, 2009 at 4:09 am
However we are waiting for jeff mark on this (any performance issue?)
August 3, 2009 at 5:37 am
Funnily enough - compared to the last thing I was looking at it's a dream of a query. 🙂
I think that the isnulls will probably cause a bit of RBAR, but then that's a problem with the database and not really a fault of SQL. I'm seriously considering asking them to run an update to convert all the NULLs to Start.
Obviously the table that I'm looking at it slightly different - but not by much 🙂 and there's a clustered index on the ref value, so it works pretty well I think...
But I would love to see what Jeff does with this.
I'm curious where you learned your magic though... I've been reading everything I can get my hands on to try to replace cursors. I'm actually trying to do a running update, but I have to subtract the date of the time the clock first started from the time the clock stopped...
In other words, the table actually is like this:
drop table #TempTable
create table #TempTable (ref int, StartStop varchar (5), theDate datetime)
insert into #TempTable values (1, 'Start', '2009-01-01')
insert into #TempTable values (3, 'Stop', '2009-01-05')
insert into #TempTable values (4, 'Stop', '2009-01-05')
insert into #TempTable values (5, 'Start', '2009-02-04')
insert into #TempTable values (6, 'Stop', '2009-03-04')
insert into #TempTable values (7, 'Start', '2009-03-12')
insert into #TempTable values (8, 'Stop', '2009-03-14')
I'm still working on that 🙂
I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.
August 3, 2009 at 8:45 am
ta.bu.shi.da.yu (8/3/2009)
I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.
I hope that you're saying that as a joke. There is no reason to resort to a RBAR solution. Here is a set-based solution.
WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate
I learned a lot of my advanced SQL from the "T-SQL Black Belt" column in SQL Server Magazine.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 3, 2009 at 4:01 pm
drew.allen (8/3/2009)
ta.bu.shi.da.yu (8/3/2009)
I'm thinking that I'll just use a cursor or a while loop. I was thinking of running the query as the select statement in the cursor, then find the difference between each stop and start, the insert each difference as a row in a temp table then sum them all together.I hope that you're saying that as a joke. There is no reason to resort to a RBAR solution. Here is a set-based solution.
WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate
I learned a lot of my advanced SQL from the "T-SQL Black Belt" column in SQL Server Magazine.
Drew
Oh man... all you guys are awesome 🙂
August 3, 2009 at 4:09 pm
WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate
This won't quite work if there are multiple consecutive "Start" rows, since the requirement was to ignore duplicate "Start" rows that immediately follow another "Start" row. Also, the triangular join will perform poorly if there are many rows.
Here is an alternative query that uses the difference beween two ROW_NUMBER functions to help identify consecutive sequences with the same status (0 = Start, 1 = Stop), then uses another ROW_NUMBER function to join start-stop row pairs for which to calculate the date difference. This query will perform significantly faster than the above query if the number of rows is large (>10000).
;WITH cteStage1 AS (
SELECT
ref,
CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END AS status,
ROW_NUMBER() OVER (ORDER BY ref)
- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn
FROM #TempTable
),
cteStage2 AS (
SELECT S.ref, S.status, T.theDate,
ROW_NUMBER() OVER (ORDER BY S.ref) AS rn
FROM #TempTable T
INNER JOIN (
SELECT MIN(ref) AS ref, status
FROM cteStage1
GROUP BY status, cn
) S ON (S.ref = T.ref)
)
SELECT
T0.ref AS StartRef,
T0.theDate AS StartDate,
T1.ref AS StopRef,
T1.theDate AS StopDate,
DATEDIFF(day, T0.theDate, T1.theDate) AS Days
FROM cteStage2 T0
INNER JOIN cteStage2 T1 ON (T0.rn = T1.rn - 1)
WHERE (T0.status = 0 AND T1.status = 1)
ORDER BY T0.ref
August 3, 2009 at 4:36 pm
andrewd.smith (8/3/2009)
WITH StartStop AS (
SELECT Ref, IsNull(StartStop, 'Start') AS StartStop, theDate
FROM #TempTable
)
SELECT a.Ref, a.theDate AS Start, Min(b.theDate) AS Stop, DateDiff(dd, a.theDate, Min(b.theDate)) AS Days
FROM StartStop AS a
LEFT OUTER JOIN StartStop AS b
ON a.Ref <= b.Ref
AND a.StartStop b.StartStop
WHERE a.StartStop = 'Start'
GROUP BY a.Ref, a.theDate
This won't quite work if there are multiple consecutive "Start" rows, since the requirement was to ignore duplicate "Start" rows that immediately follow another "Start" row. Also, the triangular join will perform poorly if there are many rows.
Here is an alternative query that uses the difference beween two ROW_NUMBER functions to help identify consecutive sequences with the same status (0 = Start, 1 = Stop), then uses another ROW_NUMBER function to join start-stop row pairs for which to calculate the date difference. This query will perform significantly faster than the above query if the number of rows is large (>10000).
;WITH cteStage1 AS (
SELECT
ref,
CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END AS status,
ROW_NUMBER() OVER (ORDER BY ref)
- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn
FROM #TempTable
),
cteStage2 AS (
SELECT S.ref, S.status, T.theDate,
ROW_NUMBER() OVER (ORDER BY S.ref) AS rn
FROM #TempTable T
INNER JOIN (
SELECT MIN(ref) AS ref, status
FROM cteStage1
GROUP BY status, cn
) S ON (S.ref = T.ref)
)
SELECT
T0.ref AS StartRef,
T0.theDate AS StartDate,
T1.ref AS StopRef,
T1.theDate AS StopDate,
DATEDIFF(day, T0.theDate, T1.theDate) AS Days
FROM cteStage2 T0
INNER JOIN cteStage2 T1 ON (T0.rn = T1.rn - 1)
WHERE (T0.status = 0 AND T1.status = 1)
ORDER BY T0.ref
Oh boy... my head feels like it's about to explode...
I'm confused by the first bit of the CTE... what does the following do?
ROW_NUMBER() OVER (ORDER BY ref)
- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn
August 3, 2009 at 5:22 pm
I'm confused by the first bit of the CTE... what does the following do?
ROW_NUMBER() OVER (ORDER BY ref)
- ROW_NUMBER() OVER (PARTITION BY (CASE WHEN StartStop = 'Stop' THEN 1 ELSE 0 END) ORDER BY ref) AS cn
The combination of the numeric difference between the 2 ROW_NUMBER functions [cn] and the expression over which the 2nd ROW_NUMBER function is partitioned [status] uniquely identifies each consecutive sequence of rows with the same value of [status] when ordered by the ref column.
Once we have calculated the values of [cn] and [status] in the CTE, we can group by these columns and easly find the first (i.e. minimum ref column value) row in each consecutive sequence.
This ROW_NUMBER difference method used to solve the problem of combining sequential rows originated from the following article, or at least that's where I first saw it used.
http://www.sqlmag.com/Article/ArticleID/93606/sql_server_93606.html
August 4, 2009 at 8:24 am
Sorry... I missed something... had to take the code down from this post... I'll be back.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2009 at 8:27 am
And this was an accidental double post from above.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2009 at 9:25 am
Nah... I didn't miss anything... Andrew not only did it the way I would have, he even cited the same URL I was going to cite where I first saw the trick of using the difference between two row numbers.
Arun's code worked fine but only if the Ref #'s were sequential.
I have done performance testing in the past on the method Andrew used... even the quirky update won't beat it. It comes close but it was edged out.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply