February 4, 2009 at 6:52 am
I have a somewhat different business application summarized below.
A table with columns Interval and Value. These represent an activity that occurs at each minute of a 24-hour day.
The interval is unique [one value for each minute during the day]. The requirement is to "pivot" this data for contiguous intervals with the same value:
Example:
480 279
481 279
482 279
483 279
484 282
485 282
486 279
487 279
The output is
Value From To
279 480 483
282 484 485
279 486 487
I used the Row_Number() approach to find all the contiguous data islands.
February 4, 2009 at 2:05 pm
Edward Boyle (2/4/2009)
I have a somewhat different business application summarized below.A table with columns Interval and Value. These represent an activity that occurs at each minute of a 24 hour day.
The interval is unique [one value for each minute during the day]. The requirement is to "pivot" this data for contiguous intervals with the same value:
Example:
480 279
481 279
482 279
483 279
484 282
485 282
486 279
487 279
The output is
Value From To
279 480 483
282 484 485
279 486 487
I used the Row_Number() approach to find all the contiguous data islands.
I don't see the need for ROW_NUMBER(). In that case, straight SQL solves it:
select value_point, min(interval) as from_min, max(interval) as to_min
from YourTable          -- the Interval/Value table
group by value_point
order by from_min
* Noel
February 4, 2009 at 4:50 pm
Jeff Moden (2/3/2009)
I agree with Barry... enjoy the cleverness of the code and the way the author of that code overcame the shortcomings of the language. 😉
I agree. I appreciate clever and said so:
microsoft.public.access.queries
Sep 18 2003
grouping records (See Groucho)
Fast forward 5 years.
microsoft.public.sqlserver.programming
Oct 23 2008
'stroed procedure to break numbers into ranges'
And in 2013 I'm sure some others will discover the same clever trick 🙂
BWAA-HAAA! So what? Heh... Consider this... If the people who wrote T-SQL and other languages got it all right the first time, people wouldn't have the need for what you're trying to sell. 😛
I appreciate your insight...when are you going to dive into Dataphor? 🙂
February 4, 2009 at 6:49 pm
steve dassin (2/4/2009)
I appreciate your insight...when are you going to dive into Dataphor? 🙂
In all likelihood, never. Not because I don't think it's worthwhile or anything like that. It's for one of the same reasons I don't use CLRs... it's not available on every instance of SQL Server I'll run into, and telling people I can't do something without a chunk of 3rd-party software just isn't what people want to hear when they hire a consultant. Maybe, if I were a permanent employee, things might be a bit different.
You've got some good stuff in there... why don't you try to convince M$ to buy your good extensions?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2009 at 10:39 am
Jeff Moden (1/31/2009)
Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.
Before I studied the winner's code, I did a "data smear" using the "running total quirky update" to form the groups. The advantage with the quirky update is that it'll work in both SQL Server 2k and 2k5. But, for 2k5, the winner's solution beats the quirky update.
Here's the results...
7936 ms - Quirky update
7390 ms - Winner's code
I'll take that 6.9% defeat as a substantial butt whipping... further, the winner's code also has the advantage of being able to be put into a view, if you need to do such a thing. My hat's off to the winner... I didn't believe such a thing was possible.
Lordy, I love it when I learn something new! 🙂
Wow! That approach really is brilliant.
I am in awe that someone thought of that!
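For anyone who hasn't seen it, here's a minimal sketch of the "quirky update" data smear that Jeff describes (this is NOT his actual test code; the table and data below are made up for illustration). The usual caveats apply: the technique depends on undocumented update order, so it wants a clustered index in the smear order, an exclusive table lock, and MAXDOP 1.
--===== A hypothetical table just for this sketch (NOT Jeff's test harness)
CREATE TABLE #Smear
        (
        SampleDate DATETIME,
        Item       CHAR(1),
        GroupNum   INT
        )
CREATE CLUSTERED INDEX IX_Smear_SampleDate ON #Smear (SampleDate)

 INSERT INTO #Smear (SampleDate, Item)
 SELECT '2009-02-14 21:27:33.433','A' UNION ALL
 SELECT '2009-02-14 21:27:35.970','A' UNION ALL
 SELECT '2009-02-14 21:27:45.247','B' UNION ALL
 SELECT '2009-02-14 21:27:50.077','A'

DECLARE @Grp      INT,
        @PrevItem CHAR(1)
    SET @Grp = 0

--===== "Smear" a group number down the rows in clustered index (date) order,
--      bumping it each time the Item changes from the previous row.
 UPDATE #Smear WITH (TABLOCKX)
    SET @Grp      = GroupNum = CASE WHEN Item = @PrevItem THEN @Grp ELSE @Grp + 1 END,
        @PrevItem = Item
 OPTION (MAXDOP 1)

--===== One row per contiguous "island" of the same Item
 SELECT Item,
        MIN(SampleDate) AS StartDate,
        MAX(SampleDate) AS EndDate
   FROM #Smear
  GROUP BY Item, GroupNum
  ORDER BY StartDate
The ROW_NUMBER() difference method gets to the same groups without leaning on that undocumented behavior, which is part of why the winner's code is such a nice find for 2k5.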
February 6, 2009 at 6:32 am
Noel
Perhaps I misstated the problem. I am looking for each separate contiguous interval with the same value, not the overall start and end for a value [which is what your code provides].
In any case, the sample code is below.
Create Table #T (Interval Int,Val Int)
Insert #T Values(480, 279)
Insert #T Values(481, 279)
Insert #T Values(482, 279)
Insert #T Values(483, 279)
Insert #T Values(484, 282)
Insert #T Values(486, 279)
Insert #T Values(487, 279)
Select * from #t
------------- Your Code------------
select Val , min(interval) as from_min ,max(interval) as to_min
from #T
group by val
order by from_min
/* Result
Val  from_min  to_min
279  480       487
282  484       484
The 279 value has two separate intervals [which I need to identify]
*/
------------- My Code ---------------------
With T as
(Select Interval,Val,RN=Row_Number() over(partition by Val order by Interval) From #T
), Island as
(Select Grp=Row_Number() Over(Order by Interval)-RN,Interval,Val From T)
Select
Start=Min(Interval),
[End]=Max(Interval),
Val
From Island
Group by Val,Grp
order by Start
/* Result
Start  End  Val
480    483  279
484    484  282
486    487  279
*/
February 14, 2009 at 8:11 pm
RBarryYoung (1/31/2009)
Heh. You know Jeff, I think that this might change that article that you're writing, just a little bit. I bet you don't mind too much though. 😛
Actually, I just found a fly in the ointment... the method isn't perfect, and your first instinct on the Grp sometimes duplicating was spot on. I'll post the example that demonstrates the imperfection soon... I wanna play with it some more.
Looks like the article might change, alright... as a possible recommendation not to do this.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 8:41 pm
Jeff Moden (2/14/2009)
RBarryYoung (1/31/2009)
Heh. You know Jeff, I think that this might change that article that you're writing, just a little bit. I bet you don't mind too much though. 😛
Actually, I just found a fly in the ointment... the method isn't perfect, and your first instinct on the Grp sometimes duplicating was spot on. I'll post the example that demonstrates the imperfection soon... I wanna play with it some more.
Looks like the article might change, alright... as a possible recommendation not to do this.
Ok... here's the code that exposes the "fly"... maybe I'm doing something wrong, but if I did, I sure would like to know what it is.
--===== Create a test table
CREATE TABLE #Sample
(
SampleDate DATETIME,
Item CHAR(1)
)
--===== Populate the test table with sample data
INSERT INTO #Sample
(SampleDate,Item)
SELECT '2009-02-14 21:27:33.433','A' UNION ALL
SELECT '2009-02-14 21:27:35.970','A' UNION ALL
SELECT '2009-02-14 21:27:45.247','B' UNION ALL
SELECT '2009-02-14 21:27:50.077','A' UNION ALL
SELECT '2009-02-14 21:28:12.177','A' UNION ALL
SELECT '2009-02-14 21:28:13.937','B' UNION ALL
SELECT '2009-02-14 21:28:26.360','B' UNION ALL
SELECT '2009-02-14 21:28:26.753','B' UNION ALL
SELECT '2009-02-14 21:28:27.673','B' UNION ALL
SELECT '2009-02-14 21:28:39.193','B' UNION ALL
SELECT '2009-02-14 21:28:42.510','A' UNION ALL
SELECT '2009-02-14 21:28:51.493','B' UNION ALL
SELECT '2009-02-14 21:28:55.037','B' UNION ALL
SELECT '2009-02-14 21:29:04.493','A' UNION ALL
SELECT '2009-02-14 21:29:16.020','B' UNION ALL
SELECT '2009-02-14 21:29:20.267','B' UNION ALL
SELECT '2009-02-14 21:29:36.410','B' UNION ALL
SELECT '2009-02-14 21:29:53.947','B' UNION ALL
SELECT '2009-02-14 21:30:00.377','A' UNION ALL
SELECT '2009-02-14 21:30:04.150','A' UNION ALL
SELECT '2009-02-14 21:30:05.013','A' UNION ALL
SELECT '2009-02-14 21:30:10.857','B' UNION ALL
SELECT '2009-02-14 21:30:16.770','B' UNION ALL
SELECT '2009-02-14 21:30:17.687','A' UNION ALL
SELECT '2009-02-14 21:30:18.110','B' UNION ALL
SELECT '2009-02-14 21:30:18.347','B' UNION ALL
SELECT '2009-02-14 21:30:22.533','B' UNION ALL
SELECT '2009-02-14 21:30:38.897','A' UNION ALL
SELECT '2009-02-14 21:30:44.870','A' UNION ALL
SELECT '2009-02-14 21:31:13.933','B'
--===== Try to calculate the Groups in date order. It fails.
-- Look at rows that contain 11 and 15-18 in the RowNumSampleDate
-- column to observe the failure. Obviously, that will break
-- a sequence by group.
;WITH
cteRowNumItem AS
(
SELECT SampleDate,
Item,
RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
FROM #Sample
)
SELECT SampleDate,
Item,
GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,
ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,
RowNum AS RowNumItem
FROM cteRowNumItem
ORDER BY SampleDate
Man... it's too bad, too... I thought this was a really great method. I don't trust this method because of the breakage shown above...
I dunno... maybe I just didn't go far enough... maybe when I do the sequencing, it'll work if I include both the Item and the GroupNum in the partition.... lemme see.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 8:50 pm
Yeah... that fixes it... or at least for this particular problem...
;WITH
cteRowNumItem AS
(
SELECT SampleDate,
Item,
RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
FROM #Sample
)
,
cteGroup AS
(
SELECT SampleDate,
Item,
GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,
ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,
RowNum AS RowNumItem
FROM cteRowNumItem
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Item,GroupNum ORDER BY RowNumSampleDate) AS Sequence
FROM cteGroup
ORDER BY SampleDate
I still don't trust it, yet. I've got some more testing to do with a larger data set.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 9:52 pm
Ok... now I trust it... just gotta remember that GroupNums aren't unique across items by themselves... it's the combination of Item and GroupNum that works as expected when partitioning to build a sequence by group...
If anyone else wants to play with a large data set, here's the code I'm using to do my testing with...
USE TempDB
GO
SELECT TOP 1000000
RAND(CAST(NEWID() AS VARBINARY))+GETDATE() AS SampleDate,
CHAR(ABS(CHECKSUM(NEWID()))%2+65) ITEM
INTO Sample
FROM Master.sys.SysColumns t1 CROSS JOIN Master.sys.SysColumns t2
CREATE CLUSTERED INDEX IX_Sample_Item_SampleDate ON dbo.Sample
(
[ITEM] ASC,
[SampleDate] ASC
)
GO
--===== This works if you include both the Item and GroupNum in the partition...
;WITH
cteRowNumItem AS
(
SELECT SampleDate,
Item,
RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
FROM Sample
)
,
cteGroup AS
(
SELECT SampleDate,
Item,
GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,
ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,
RowNum AS RowNumItem
FROM cteRowNumItem
)
SELECT SampleDate,Item,
ROW_NUMBER() OVER (PARTITION BY Item,GroupNum ORDER BY RowNumSampleDate) AS Sequence,
GroupNum,RowNumSampleDate,RowNumItem
FROM cteGroup
ORDER BY SampleDate
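And, just as a side sketch (NOT part of the timing test above)... if what you actually want is one row per "island" instead of a running sequence, the same Item and GroupNum combination does the trick in a GROUP BY...
--===== Sketch only... collapse each island of the same Item to a single row
--      by grouping on BOTH Item and GroupNum.
;WITH
cteRowNumItem AS
(
 SELECT SampleDate,
        Item,
        RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
   FROM Sample
)
,
cteGroup AS
(
 SELECT SampleDate,
        Item,
        GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum
   FROM cteRowNumItem
)
 SELECT Item,
        MIN(SampleDate) AS IslandStart,
        MAX(SampleDate) AS IslandEnd,
        COUNT(*)        AS IslandRows
   FROM cteGroup
  GROUP BY Item, GroupNum
  ORDER BY IslandStart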
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 10:04 pm
Jeff Moden (2/14/2009)
Ok... now I trust it... just gotta remember that GroupNums aren't unique across items by themselves... it's the combination of Item and GroupNum that works as expected when partitioning to build a sequence by group...
I'm still on hiatus, but yeah, I think that's the key...
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 15, 2009 at 4:43 am
Jeff,
You got me worried with your doubts about this ROW_NUMBER() difference technique.
I used this technique in a proposed solution to this topic:
http://www.sqlservercentral.com/Forums/Topic656492-338-1.aspx
However, I think the proposed solution is valid, because the columns used in the PARTITION BY clause of the ROW_NUMBER() functions are also included in the final GROUP BY clause.
Do you agree?
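In schematic form (hypothetical table and column names here, not the actual code from that topic), the pattern I'm relying on is:
-- Schematic only... dbo.SomeTable and SomeKey are placeholders
;WITH cteGroups AS
(
 SELECT Item,
        SomeKey,
        Grp = ROW_NUMBER() OVER(ORDER BY SomeKey)
            - ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SomeKey)
   FROM dbo.SomeTable
)
 SELECT Item,                        -- the PARTITION BY column...
        MIN(SomeKey) AS GroupStart,
        MAX(SomeKey) AS GroupEnd
   FROM cteGroups
  GROUP BY Item, Grp                 -- ...also appears in the GROUP BY
  ORDER BY Item, GroupStart
Because Item appears in both the PARTITION BY and the final GROUP BY, a duplicated Grp value from a different Item can never be folded into the wrong group.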
Andrew
February 15, 2009 at 9:14 am
Looks fine to me, Andrew. Sorry for the panic... I went into one when I saw that the group numbers weren't unique across items while I was setting up an experiment for an article on "data smears"... I just didn't take the code to the next step. The group numbers combined with the item number do make the correct grouping. It's a bit like Cod Liver Oil... tastes like crap but is so good for you. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 15, 2009 at 11:46 am
To add to this discussion, I think that a nested CTE can provide a solution for this. The problem is that nested CTEs are not supported, but there is a trick to work around this. Following your test cases, we can define a simple table as:
create table result
(
id int,
code char(1)
)
with these rows:
id  code
 8  A
 6  A
 5  A
 4  B
 3  B
 2  B
 1  A
Step 1: get a sequential ordering of the records. This will help us ensure there are no gaps in the IDs. This can be achieved by using ROW_NUMBER() and, for reusability, wrapping it in a CTE:
WITH SEQ_LIST (seq,id,code) as
(
select row_number() over (order by id desc) as seq, id,code
from result (nolock)
)
Step 2: since we would like to reuse this with another CTE, we can create a view:
create view dbo.vwResult
as
WITH SEQ_LIST (seq,id,code) as
(
select row_number() over (order by id desc) as seq, id,code
from result (nolock)
)
select * from SEQ_LIST
The view returns this:
Seq  id  Code      (note how id has a gap from 8 to 6)
  1   8  A
  2   6  A
  3   5  A
  4   4  B         (the seq should end here because B is found)
  5   3  B
  6   2  B
  7   1  A
Step 3: We are now ready to use the view. The trick is to find the most recent record first; once that is found, we need to select the records prior to it. Since the view already numbers the rows in sequential order, we can do a select top 1 (ordered by seq) to get the latest record:
SELECT top 1 seq, code, 1 as cnt
FROM vwResult
ORDER BY seq
returns this row:
seq  code  cnt
  1  A     1
Step 4: We now need to get the records prior to the last one with a matching code. This can be done by using a recursive CTE.
with RESULT_CNT (seq, code,cnt) as
(
SELECT top 1 seq, code, 1 as cnt --get the most recent row (seq = 1 in the view)
FROM vwResult (NOLOCK)
WHERE seq = 1
union all
SELECT a.seq, a.code, 1 as cnt --gets the next row only if same code
FROM vwResult (NOLOCK) a
inner join RESULT_CNT b on a.seq = b.seq+1 and a.code = b.code
)
select code, sum(cnt) as Total from RESULT_CNT -- gets the repetition count
group by code
The key is to select the records with the same code and with the next sequence number (a.seq = b.seq + 1). The result for this test case is:
Seq  Code  Count
  1  A     1
  2  A     1
  3  A     1
Seq 4 has code B, so the recursive query only returns three rows.
We select all those sequential records with the same code. For each record, we provide an instance count of 1. This will help us do an aggregate by code at the end of the query. The aggregate result is:
Code  Total
A     3
I hope this helps.
February 15, 2009 at 8:05 pm
Nice job, but a recursive CTE like that is actually slower than a cursor because it still processes the rows one at a time.
--Jeff Moden
Change is inevitable... Change for the better is not.