February 4, 2009 at 6:52 am
I have a somewhat different business application summarized below.
A table with columns Interval and Value. These represent an activity that occurs at each minute of a 24-hour day.
The interval is unique [one value for each minute during the day]. The requirement is to "pivot" this data for contiguous intervals with the same value:
Example:
480 279
481 279
482 279
483 279
484 282
485 282
486 279
487 279
The output is
Value From To
279 480 483
282 484 485
279 486 487
I used the Row_Number() approach to find all the contiguous data islands.
February 4, 2009 at 2:05 pm
Edward Boyle (2/4/2009)
I have a somewhat different business application summarized below.A table with columns Interval and Value. These represent an activity that occurs at each minute of a 24 hour day.
The interval is unique [one value for each minute during the day]. The requirement is to "pivot" this data for contiguous intervals with the same value:
Example:
480 279
481 279
482 279
483 279
484 282
485 282
486 279
487 279
The output is
Value From To
279 480 483
282 484 485
279 486 487
I used the Row_Number() approach to find all the contiguous data islands.
I don't see the need for ROW_NUMBER(). In that case, straight SQL solves it:
select value_point, min(interval) as from_min, max(interval) as to_min
from YourTable          -- the Interval/Value table
group by value_point
order by from_min
* Noel
February 4, 2009 at 4:50 pm
Jeff Moden (2/3/2009)
I agree with Barry... enjoy the cleverness of the code and the way the author of that code overcame the shortcomings of the language. 😉
I agree. I appreciate clever and said so:
microsoft.public.access.queries
Sep 18 2003
grouping records (See Groucho)
Fast forward 5 years.
microsoft.public.sqlserver.programming
Oct 23 2008
'stroed procedure to break numbers into ranges'
And in 2013 I'm sure some others will discover the same clever trick 🙂
BWAA-HAAA! So what? Heh... Consider this... If the people who wrote T-SQL and other languages got it all right the first time, people wouldn't have the need for what you're trying to sell. 😛
I appreciate your insight...when are you going to dive into Dataphor? 🙂
February 4, 2009 at 6:49 pm
steve dassin (2/4/2009)
I appreciate your insight...when are you going to dive into Dataphor? 🙂
In all likelihood, never. Not because I don't think it's worthwhile or anything like that. It's for one of the same reasons I don't use CLRs... it's not available on every instance of SQL Server I'll run into, and telling people I can't do something without a chunk of 3rd-party software just isn't what people want to hear when they hire a consultant. Maybe, if I were a permanent employee, things might be a bit different.
You've got some good stuff in there... why don't you try to convince M$ to buy your good extensions?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 5, 2009 at 10:39 am
Jeff Moden (1/31/2009)
Nah... I take it back... not clever... freakin' brilliant. The key to this whole thing is the grouping. You've gotta be able to group each "cluster" of rows that are in a sequence by date and slot. I knew it couldn't be done with a single ROW_NUMBER, but the winner used the difference between two ROW_NUMBERs with different partitions to generate the GRP for each student... absolutely brilliant.
Before I studied the winner's code, I did a "data smear" using the "running total quirky update" to form the groups. The advantage with the quirky update is that it'll work in both SQL Server 2k and 2k5. But, for 2k5, the winner's solution beats the quirky update.
Here's the results...
7936 ms - Quirky update
7390 ms - Winner's code
I'll take that 6.9% defeat as a substantial butt whipping... further, the winner's code also has the advantage of being able to be put into a view, if you need to do such a thing. My hat's off to the winner... I didn't believe such a thing was possible.
Lordy, I love it when I learn something new! 🙂
Wow! That approach really is brilliant.
I am in awe that someone thought of that!
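For anyone who hasn't seen it, here's a minimal sketch of the "quirky update" data smear that Jeff describes (this is NOT his actual test code; the table and data below are made up for illustration). The usual caveats apply: the technique depends on undocumented update order, so it wants a clustered index in the smear order, an exclusive table lock, and MAXDOP 1.
--===== A hypothetical table just for this sketch (NOT Jeff's test harness)
CREATE TABLE #Smear
        (
        SampleDate DATETIME,
        Item       CHAR(1),
        GroupNum   INT
        )
CREATE CLUSTERED INDEX IX_Smear_SampleDate ON #Smear (SampleDate)

 INSERT INTO #Smear (SampleDate, Item)
 SELECT '2009-02-14 21:27:33.433','A' UNION ALL
 SELECT '2009-02-14 21:27:35.970','A' UNION ALL
 SELECT '2009-02-14 21:27:45.247','B' UNION ALL
 SELECT '2009-02-14 21:27:50.077','A'

DECLARE @Grp      INT,
        @PrevItem CHAR(1)
    SET @Grp = 0

--===== "Smear" a group number down the rows in clustered index (date) order,
--      bumping it each time the Item changes from the previous row.
 UPDATE #Smear WITH (TABLOCKX)
    SET @Grp      = GroupNum = CASE WHEN Item = @PrevItem THEN @Grp ELSE @Grp + 1 END,
        @PrevItem = Item
 OPTION (MAXDOP 1)

--===== One row per contiguous "island" of the same Item
 SELECT Item,
        MIN(SampleDate) AS StartDate,
        MAX(SampleDate) AS EndDate
   FROM #Smear
  GROUP BY Item, GroupNum
  ORDER BY StartDate
The ROW_NUMBER() difference method gets to the same groups without leaning on that undocumented behavior, which is part of why the winner's code is such a nice find for 2k5.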
February 6, 2009 at 6:32 am
Noel
Perhaps I misstated the problem. I am looking for each separate contiguous interval with the same value, not the overall start and end for a value [which is what your code provides].
In any case, the sample code is below.
Create Table #T (Interval Int,Val Int)
Insert #T Values(480, 279)
Insert #T Values(481, 279)
Insert #T Values(482, 279)
Insert #T Values(483, 279)
Insert #T Values(484, 282)
Insert #T Values(486, 279)
Insert #T Values(487, 279)
Select * from #t
------------- Your Code------------
select Val , min(interval) as from_min ,max(interval) as to_min
from #T
group by val
order by from_min
/* Result
Val  from_min  to_min
279  480       487
282  484       484
The 279 value has two separate intervals [which I need to identify]
*/
------------- My Code ---------------------
With T as
(Select Interval,Val,RN=Row_Number() over(partition by Val order by Interval) From #T
), Island as
(Select Grp=Row_Number() Over(Order by Interval)-RN,Interval,Val From T)
Select
Start=Min(Interval),
[End]=Max(Interval),
Val
From Island
Group by Val,Grp
order by Start
/* Result
Start  End  Val
480    483  279
484    484  282
486    487  279
*/
February 14, 2009 at 8:11 pm
RBarryYoung (1/31/2009)
Heh. You know Jeff, I think that this might change that article that you're writing, just a little bit. I bet you don't mind too much though. 😛
Actually, I just found a fly in the ointment... the method isn't perfect, and your first instinct on the Grp sometimes duplicating was spot on. I'll post the example that demonstrates the imperfection soon... I wanna play with it some more.
Looks like the article might change, alright... as a possible recommendation not to do this.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 8:41 pm
Jeff Moden (2/14/2009)
RBarryYoung (1/31/2009)
Heh. You know Jeff, I think that this might change that article that you're writing, just a little bit. I bet you don't mind too much though. 😛
Actually, I just found a fly in the ointment... the method isn't perfect, and your first instinct on the Grp sometimes duplicating was spot on. I'll post the example that demonstrates the imperfection soon... I wanna play with it some more.
Looks like the article might change, alright... as a possible recommendation not to do this.
Ok... here's the code that exposes the "fly"... maybe I'm doing something wrong, but if I did, I sure would like to know what it is.
--===== Create a test table
CREATE TABLE #Sample
(
SampleDate DATETIME,
Item CHAR(1)
)
--===== Populate the test table with sample data
INSERT INTO #Sample
(SampleDate,Item)
SELECT '2009-02-14 21:27:33.433','A' UNION ALL
SELECT '2009-02-14 21:27:35.970','A' UNION ALL
SELECT '2009-02-14 21:27:45.247','B' UNION ALL
SELECT '2009-02-14 21:27:50.077','A' UNION ALL
SELECT '2009-02-14 21:28:12.177','A' UNION ALL
SELECT '2009-02-14 21:28:13.937','B' UNION ALL
SELECT '2009-02-14 21:28:26.360','B' UNION ALL
SELECT '2009-02-14 21:28:26.753','B' UNION ALL
SELECT '2009-02-14 21:28:27.673','B' UNION ALL
SELECT '2009-02-14 21:28:39.193','B' UNION ALL
SELECT '2009-02-14 21:28:42.510','A' UNION ALL
SELECT '2009-02-14 21:28:51.493','B' UNION ALL
SELECT '2009-02-14 21:28:55.037','B' UNION ALL
SELECT '2009-02-14 21:29:04.493','A' UNION ALL
SELECT '2009-02-14 21:29:16.020','B' UNION ALL
SELECT '2009-02-14 21:29:20.267','B' UNION ALL
SELECT '2009-02-14 21:29:36.410','B' UNION ALL
SELECT '2009-02-14 21:29:53.947','B' UNION ALL
SELECT '2009-02-14 21:30:00.377','A' UNION ALL
SELECT '2009-02-14 21:30:04.150','A' UNION ALL
SELECT '2009-02-14 21:30:05.013','A' UNION ALL
SELECT '2009-02-14 21:30:10.857','B' UNION ALL
SELECT '2009-02-14 21:30:16.770','B' UNION ALL
SELECT '2009-02-14 21:30:17.687','A' UNION ALL
SELECT '2009-02-14 21:30:18.110','B' UNION ALL
SELECT '2009-02-14 21:30:18.347','B' UNION ALL
SELECT '2009-02-14 21:30:22.533','B' UNION ALL
SELECT '2009-02-14 21:30:38.897','A' UNION ALL
SELECT '2009-02-14 21:30:44.870','A' UNION ALL
SELECT '2009-02-14 21:31:13.933','B'
--===== Try to calculate the Groups in date order. It fails.
-- Look at rows that contain 11 and 15-18 in the RowNumSampleDate
-- column to observe the failure. Obviously, that will break
-- a sequence by group.
;WITH
cteRowNumItem AS
(
SELECT SampleDate,
Item,
RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
FROM #Sample
)
SELECT SampleDate,
Item,
GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,
ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,
RowNum AS RowNumItem
FROM cteRowNumItem
ORDER BY SampleDate
Man... it's too bad, too... I thought this was a really great method. I don't trust this method because of the breakage shown above...
I dunno... maybe I just didn't go far enough... maybe when I do the sequencing, it'll work if I include both the Item and the GroupNum in the partition.... lemme see.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 8:50 pm
Yeah... that fixes it... or at least for this particular problem...
;WITH
cteRowNumItem AS
(
SELECT SampleDate,
Item,
RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
FROM #Sample
)
,
cteGroup AS
(
SELECT SampleDate,
Item,
GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,
ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,
RowNum AS RowNumItem
FROM cteRowNumItem
)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Item,GroupNum ORDER BY RowNumSampleDate) AS Sequence
FROM cteGroup
ORDER BY SampleDate
I still don't trust it, yet. I've got some more testing to do with a larger data set.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 9:52 pm
Ok... now I trust it... just gotta remember that GroupNums aren't unique across items by themselves... it's the combination of Item and GroupNum that works as expected when partitioning to build a sequence by group...
If anyone else wants to play with a large data set, here's the code I'm using to do my testing with...
USE TempDB
GO
SELECT TOP 1000000
RAND(CAST(NEWID() AS VARBINARY))+GETDATE() AS SampleDate,
CHAR(ABS(CHECKSUM(NEWID()))%2+65) ITEM
INTO Sample
FROM Master.sys.SysColumns t1 CROSS JOIN Master.sys.SysColumns t2
CREATE CLUSTERED INDEX IX_Sample_Item_SampleDate ON dbo.Sample
(
[ITEM] ASC,
[SampleDate] ASC
)
GO
--===== This works if you include both the Item and GroupNum in the partition...
;WITH
cteRowNumItem AS
(
SELECT SampleDate,
Item,
RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
FROM Sample
)
,
cteGroup AS
(
SELECT SampleDate,
Item,
GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum,
ROW_NUMBER() OVER(ORDER BY SampleDate) AS RowNumSampleDate,
RowNum AS RowNumItem
FROM cteRowNumItem
)
SELECT SampleDate,Item,
ROW_NUMBER() OVER (PARTITION BY Item,GroupNum ORDER BY RowNumSampleDate) AS Sequence,
GroupNum,RowNumSampleDate,RowNumItem
FROM cteGroup
ORDER BY SampleDate
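And, just as a side sketch (NOT part of the timing test above)... if what you actually want is one row per "island" instead of a running sequence, the same Item and GroupNum combination does the trick in a GROUP BY...
--===== Sketch only... collapse each island of the same Item to a single row
--      by grouping on BOTH Item and GroupNum.
;WITH
cteRowNumItem AS
(
 SELECT SampleDate,
        Item,
        RowNum = ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SampleDate)
   FROM Sample
)
,
cteGroup AS
(
 SELECT SampleDate,
        Item,
        GroupNum = ROW_NUMBER() OVER(ORDER BY SampleDate)-RowNum
   FROM cteRowNumItem
)
 SELECT Item,
        MIN(SampleDate) AS IslandStart,
        MAX(SampleDate) AS IslandEnd,
        COUNT(*)        AS IslandRows
   FROM cteGroup
  GROUP BY Item, GroupNum
  ORDER BY IslandStart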
--Jeff Moden
Change is inevitable... Change for the better is not.
February 14, 2009 at 10:04 pm
Jeff Moden (2/14/2009)
Ok... now I trust it... just gotta remember that GroupNums aren't unique across items by themselves... it's the combination of Item and GroupNum that works as expected when partitioning to build a sequence by group...
I'm still on hiatus, but yeah, I think that's the key...
-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc. "Performance is our middle name."
February 15, 2009 at 4:43 am
Jeff,
You got me worried with your doubts about this ROW_NUMBER() difference technique.
I used this technique in a proposed solution to this topic:
http://www.sqlservercentral.com/Forums/Topic656492-338-1.aspx
However, I think the proposed solution is valid, because the columns used in the PARTITION BY clause of the ROW_NUMBER() functions are also included in the final GROUP BY clause.
Do you agree?
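In schematic form (hypothetical table and column names here, not the actual code from that topic), the pattern I'm relying on is:
-- Schematic only... dbo.SomeTable and SomeKey are placeholders
;WITH cteGroups AS
(
 SELECT Item,
        SomeKey,
        Grp = ROW_NUMBER() OVER(ORDER BY SomeKey)
            - ROW_NUMBER() OVER(PARTITION BY Item ORDER BY SomeKey)
   FROM dbo.SomeTable
)
 SELECT Item,                        -- the PARTITION BY column...
        MIN(SomeKey) AS GroupStart,
        MAX(SomeKey) AS GroupEnd
   FROM cteGroups
  GROUP BY Item, Grp                 -- ...also appears in the GROUP BY
  ORDER BY Item, GroupStart
Because Item appears in both the PARTITION BY and the final GROUP BY, a duplicated Grp value from a different Item can never be folded into the wrong group.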
Andrew
February 15, 2009 at 9:14 am
Looks fine to me, Andrew. Sorry for the panic... I went into one when I saw that the group numbers weren't unique across items while I was setting up an experiment for an article on "data smears"... I just didn't take the code to the next step. The group numbers combined with the item number do make the correct grouping. It's a bit like Cod Liver Oil... tastes like crap but is so good for you. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 15, 2009 at 11:46 am
To add to this discussion, I think that a nested CTE can provide a solution for this. The problem is that nested CTEs are not supported, but there is a trick to work around this. Following your test cases, we can define a simple table as:
create table result
(
id int,
code char(1)
)
with these rows:
id  code
 8  A
 6  A
 5  A
 4  B
 3  B
 2  B
 1  A
Step 1: get a sequential ordering of the records. This will help us ensure there are no gaps in the IDs. This can be achieved by using ROW_NUMBER() and, for reusability, wrapping it in a CTE:
WITH SEQ_LIST (seq,id,code) as
(
select row_number() over (order by id desc) as seq, id,code
from result (nolock)
)
Step 2: since we would like to reuse this with another CTE, we can create a view:
create view dbo.vwResult
as
WITH SEQ_LIST (seq,id,code) as
(
select row_number() over (order by id desc) as seq, id,code
from result (nolock)
)
select * from SEQ_LIST
The view returns this:
Seq  id  Code      (note how id has a gap from 8 to 6)
  1   8  A
  2   6  A
  3   5  A
  4   4  B         (the seq should end here because B is found)
  5   3  B
  6   2  B
  7   1  A
Step 3: We are now ready to use the view. The trick is to find the most recent record first; once that is found, we need to select the records prior to it. Since the view already numbers the rows in sequential order, we can do a select top 1 (ordered by seq) to get the latest record:
SELECT top 1 seq, code, 1 as cnt
FROM vwResult
ORDER BY seq
returns this row:
seq  code  cnt
  1  A     1
Step 4: We now need to get the records prior to the last one with a matching code. This can be done by using a recursive CTE.
with RESULT_CNT (seq, code,cnt) as
(
SELECT top 1 seq, code, 1 as cnt --get the most recent row (seq = 1 in the view)
FROM vwResult (NOLOCK)
WHERE seq = 1
union all
SELECT a.seq, a.code, 1 as cnt --gets the next row only if same code
FROM vwResult (NOLOCK) a
inner join RESULT_CNT b on a.seq = b.seq+1 and a.code = b.code
)
select code, sum(cnt) as Total from RESULT_CNT -- gets the repetition count
group by code
The key is to select the records with the same code and with the next sequence number (a.seq = b.seq + 1). The result for this test case is:
Seq  Code  Count
  1  A     1
  2  A     1
  3  A     1
Seq 4 has code B, so the recursive query only returns three rows.
We select all those sequential records with the same code. For each record, we provide an instance count of 1. This will help us do an aggregate by code at the end of the query. The aggregate result is:
Code  Total
A     3
I hope this helps.
February 15, 2009 at 8:05 pm
Nice job, but a recursive CTE like that is actually slower than a cursor because it still processes the rows one at a time.
--Jeff Moden
Change is inevitable... Change for the better is not.