Grouping results based on previous rows

Question

Post reply

Grouping results based on previous rows

luisap

Grasshopper

Points: 20
More actions
May 1, 2015 at 12:47 pm

#303587

The following represents data from a table that tracks conditions along sections of highways.
The sections are defined between two points along the highway (Start, End).
The Start/End points are always increasing in value, but starts may not match the end of the previous section.
I'm trying to report on matching conditions along the highway, and break only when there's a change in conditions.
Here's the example table
CREATE TABLE Test
(
Hwy Char (5),
Start int,
End int,
Condition1 Char(4),
Condition2 Char(4)
);
Insert into Test Values ('A', 10000, 10020, 'BARE', 'GOOD');
Insert into Test Values ('A', 10025, 10030, 'BARE', 'GOOD');
Insert into Test Values ('A', 10030, 10040, 'ICY', 'GOOD');
Insert into Test Values ('A', 10040, 10050, 'ICY', 'GOOD');
Insert into Test Values ('A', 10050, 10060, 'ICY', 'GOOD');
Insert into Test Values ('A', 10060, 10070, 'BARE', 'GOOD');
Insert into Test Values ('A', 10075, 10080, 'BARE', 'FAIR');
Insert into Test Values ('A', 10080, 10090, 'BARE', 'FAIR');
Insert into Test Values ('A', 10090, 10100, 'BARE', 'GOOD');
I'm trying to get the following results:
Hwy Start End Condition1 Condition2
QEW 10000 10030 BARE GOOD
QEW 10030 10060 ICY GOOD
QEW 10060 10070 BARE GOOD
QEW 10075 10090 BARE FAIR
QEW 10090 10100 BARE GOOD
I've been trying to use min and max on Start/End, and group by, but I think it's too simplistic an approach. Any thoughts?

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 1

Quick solution, should be self explanatory (I'll add comments if I'll get the time;-))

😎

USE tempdb;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.Test_ROADS') IS NOT NULL DROP TABLE dbo.Test_ROADS;

CREATE TABLE dbo.Test_ROADS

(

Hwy Char (5),

Start int,

[End] int,

Condition1 Char(4),

Condition2 Char(4)

);

Insert into dbo.Test_ROADS Values

('A', 10000, 10020, 'BARE', 'GOOD')

,('A', 10025, 10030, 'BARE', 'GOOD')

,('A', 10030, 10040, 'ICY', 'GOOD')

,('A', 10040, 10050, 'ICY', 'GOOD')

,('A', 10050, 10060, 'ICY', 'GOOD')

,('A', 10060, 10070, 'BARE', 'GOOD')

,('A', 10075, 10080, 'BARE', 'FAIR')

,('A', 10080, 10090, 'BARE', 'FAIR')

,('A', 10090, 10100, 'BARE', 'GOOD');

;WITH RANKED_CONDITIONS AS

(

SELECT

TR.Hwy

,X.X_POINT

,ROW_NUMBER() OVER

(

PARTITION BY TR.Hwy

ORDER BY TR.Hwy

,X.X_POINT

) AS BD_RID

,DENSE_RANK() OVER

(

PARTITION BY TR.Hwy

ORDER BY X.Condition1

,X.Condition2

) AS BD_DRNK

,X.Condition1

,X.Condition2

FROM dbo.Test_ROADS TR

CROSS APPLY

(

SELECT TR.Start,TR.Condition1,TR.Condition2 UNION ALL

SELECT TR.[End],TR.Condition1,TR.Condition2

) AS X(X_POINT,Condition1,Condition2)

)

,GROUPED_SET AS

(

SELECT

RC.Hwy

,RC.X_POINT

,RC.BD_RID - ROW_NUMBER() OVER

(

PARTITION BY RC.Hwy

,RC.BD_DRNK

ORDER BY RC.BD_RID

) AS GRP_ID

,RC.Condition1

,RC.Condition2

FROM RANKED_CONDITIONS RC

)

SELECT

GS.Hwy AS HWY

,MIN(GS.X_POINT) AS START_POINT

,MAX(GS.X_POINT) AS END_POINT

,MAX(GS.Condition1) AS CONDITION_1

,MAX(GS.Condition2) AS CONDITION_2

FROM GROUPED_SET GS

GROUP BY GS.Hwy, GS.GRP_ID

ORDER BY HWY

,START_POINT

,END_POINT;

Results

HWY START_POINT END_POINT CONDITION_1 CONDITION_2

----- ----------- ----------- ----------- -----------

A 10000 10030 BARE GOOD

A 10030 10060 ICY GOOD

A 10060 10070 BARE GOOD

A 10075 10090 BARE FAIR

A 10090 10100 BARE GOOD

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 2

This works with your data

WITH CTE AS (

SELECT Hwy, Start, [End], Condition1, Condition2,

ROW_NUMBER() OVER(PARTITION BY Hwy ORDER BY Start, [End]) -

ROW_NUMBER() OVER(PARTITION BY Hwy, Condition1, Condition2 ORDER BY Start, [End]) AS rnDiff

FROM Test)

SELECT Hwy,

MIN(Start) AS Start,

MAX([End]) AS [End],

Condition1, Condition2

FROM CTE

GROUP BY Hwy, Condition1, Condition2, rnDiff

ORDER BY Hwy, Start;

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

luisap Grasshopper Points: 20 More actions · Answer 3

luisap

Grasshopper

Points: 20

May 2, 2015 at 4:41 pm

#1795203

Thank you both! I'll give that a try!

Sigerson Hall of Fame Points: 3697 More actions · Answer 4

@mark-3 Cowne

+10

Very elegant solution.

Sigerson

"No pressure, no diamonds." - Thomas Carlyle

Mark Cowne One Orange Chip Points: 26990 More actions · Answer 5

Sigerson (5/5/2015)
@mark-3 Cowne
+10
Very elegant solution.

Thanks!

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Dwain Camps SSC Guru Points: 86978 More actions · Answer 6

Sigerson (5/5/2015)
@mark-3 Cowne
+10
Very elegant solution.

I'll second that. Using the staggered rows approach is pretty well known when it comes to finding contiguous dates, but not everyone realizes how it can be used to locate changing data by row.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St