I need help breaking free from RBAR

Question

I need help breaking free from RBAR

Tom Rippetoe

SSC Veteran

Points: 213
More actions
March 29, 2011 at 10:34 pm

#389170

I am trying to break free from my RBAR ways, but this one is stumping me. Can you provide any suggestions on how to accomplish the following in a more set-based approach?
I have a table that stores information about road segments. Each segment must have a begin-point and an end-point and is ranked based on some criteria (the criteria are not important to this question). A segment is 0.10 miles long, and a new segment begins every 0.01 miles. So there are lots of overlaping segments. My challenge is to 'group' segments together that rank above some threshold value (in this example 90%) and that have begin-points that are 'consecutive' along the roadway. That is, there is an unbroken sequence of begin-points along a physical roadway. If there is a break in begin-points, a new 'group' should be started.
Here is the relevant table structure:
CREATE TABLE [dbo].[RoadSegment](
[SegmentId] [nvarchar](25) NOT NULL,
[BeginPoint] [float] NOT NULL,
[EndPoint] [float] NOT NULL,
[SegmentRank] [float] NOT NULL,
[GroupId] [nvarchar](25) NULL
)
A small set of sample data:
INSERT INTO RoadSegment (SegmentId, BeginPoint, [EndPoint], SegmentRank) VALUES
('123', 0.00, 0.10, 90),
('124', 0.01, 0.11, 92),
('125', 0.02, 0.12, 95),
('126', 0.03, 0.13, 84),
('127', 0.04, 0.14, 95),
('128', 0.05, 0.15, 95)
In this example, segments with the IDs '123', '124', and '125' would be grouped together, i.e. get the same GroupId. A new group woud begin with segment '127' and include '128'. Segment '126' would not be included in any group because its rank was below the threshold value.
I can start off the data processing easily enough....SELECT * FROM RoadSegment WHERE SegmentRank >= 90 ORDER BY BeginPoint. But after that my old RBAR ways want to take over and iterate over each record:
psuedo-code:
IF CurrentBeginPoint - PreviousBeginPoint = 0.01
THEN CurrentGroupId = PreviousGroupId
ELSE CurrentGroupId = PreviousGroupId + 1
SET GroupId = CurrentGroupId
PreviousBeginPoint = CurrentEndPoint
PreviousGroupId = CurrentGroupId
FETCH NEXT record
Is there a better way of doing this?
Thank you.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

UMG Developer SSChampion Points: 13482 More actions · Answer 1

Essentially what you need is a running total. You can read about that in an article from Jeff Moden here: http://www.sqlservercentral.com/articles/T-SQL/68467/

So something along this should for for you:

--Determine where each group should start

WITH CTE AS (SELECT SegmentId, BeginPoint, EndPoint, ROW_NUMBER() OVER (ORDER BY BeginPoint) AS RN FROM RoadSegment WHERE SegmentRank >= 90)

SELECT

C.SegmentId,

C.RN,

CASE WHEN C.BeginPoint = (P.BeginPoint + 0.01) THEN 0 ELSE 1 END AS NewGroup

INTO #Pass1

FROM CTE C

LEFT JOIN CTE P

ON (C.RN - 1) = P.RN;

--Setup for the "Quirky Update"

ALTER TABLE #Pass1 ALTER COLUMN RN INT NOT NULL;

ALTER TABLE #Pass1 ADD GroupNum INT NULL;

ALTER TABLE #Pass1 ADD CONSTRAINT Pass1_PK PRIMARY KEY CLUSTERED (RN ASC);

DECLARE

@Anchor INT,

@SUM INT;

SET @SUM = 0;

--Perform the "Quirky Update" (i.e. Running Total)

UPDATE #Pass1

SET

@Anchor = RN,

@Sum = GroupNum = @SUM + NewGroup

FROM #Pass1 WITH (TABLOCKX)

OPTION (MAXDOP 1);

--Store the group number in the source table

UPDATE RS

SET

GroupID = CAST(P1.GroupNum AS NVARCHAR(25))

FROM RoadSegment RS

INNER JOIN #Pass1 P1

ON RS.SegmentId = P1.SegmentId;

--Cleanup

DROP TABLE #Pass1;

--Display the results

SELECT

*

FROM RoadSegment

ORDER BY

BeginPoint;

I did it in a temporary table since the "Quirky Update" method has very specific requirements to make it work correctly, and then updated your table with the results. (Read and understand the article.)

I noticed that your GroupID column is a NVARCHAR, is there a reason you aren't using an INT?

BTW: Using the FLOAT type for your Begin/End Point probably isn't the best idea since it in an imprecise data type, in that what you put in is not always what you will get out. (Precision can be lost.) I would recommend using the DECIMAL/NUMERIC type instead.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

This uses what is called the quirky update by some. It is one way of doing this, but the code below is is not completely correct, as I am sure Jeff will point out when he sees it.

I have left out several key components because of the size of the data set, plus it is getting late. I just wanted to demonstrate one way of doing this.

Read this article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url] and the discussions to understand this method better, especially if you intend to use this method in a production environment.

drop table [dbo].[RoadSegment];

go

CREATE TABLE [dbo].[RoadSegment](

[SegmentId] [nvarchar](25) NOT NULL,

[BeginPoint] [float] NOT NULL,

[EndPoint] [float] NOT NULL,

[SegmentRank] [float] NOT NULL,

[GroupId] [int] NULL -- change to int from nvarchar

);

GO

INSERT INTO RoadSegment (SegmentId, BeginPoint, [EndPoint], SegmentRank) VALUES

('123', 0.00, 0.10, 90),

('124', 0.01, 0.11, 92),

('125', 0.02, 0.12, 95),

('126', 0.03, 0.13, 84),

('127', 0.04, 0.14, 95),

('128', 0.05, 0.15, 95);

GO

declare @SeqRank float = 90,

@GrpId int = 0;

update dbo.RoadSegment set

GroupId = null; -- ensure GroupId is null before processing

with RoadSegments as (

select

SegmentId,

BeginPoint,

BeginPoint + 0.01 as NextPoint,

EndPoint,

SegmentRank,

GroupId

from

dbo.RoadSegment

where

SegmentRank >= @SeqRank

),

GroupedSegments as (

select

rs1.SegmentId,

rs1.BeginPoint,

rs1.NextPoint,

rs1.EndPoint,

rs1.SegmentRank,

rs1.GroupId,

rs2.BeginPoint as BeginPoint2

from

RoadSegments rs1

left outer join RoadSegments rs2

on rs1.BeginPoint = rs2.NextPoint

)

update dbo.RoadSegment set

@GrpId = GroupId = @GrpId + case when gs.BeginPoint2 is null then 1 else 0 end

from

dbo.RoadSegment rs

inner join GroupedSegments gs

on (rs.SegmentId = gs.SegmentId)

;

select

*

from

dbo.RoadSegment

where

GroupId is not null

order by

GroupId,

SegmentId

;

Tom Rippetoe SSC Veteran Points: 213 More actions · Answer 3

Thank you for the code and the article references. I am excited to read and understand both.

(I was a bit too casual in my table definition. I am using an int for the GroupId, and per the advice here, i will change the data type of the begin/end points).

Thanks again.

UMG Developer SSChampion Points: 13482 More actions · Answer 4

CELKO (3/31/2011)
Cast the segment_id to integer and do the subtraction:
('123', 0.00, 0.10, 90, ('123- 1))
('124', 0.01, 0.11, 92, ('124- 2))
('125', 0.02, 0.12, 95, ('125- 3))
('127', 0.04, 0.14, 95, ('127- 4))
('128', 0.05, 0.15, 95, ('128- 5))'

Celko,

I had thought about suggesting that method, but I made the assumption that I couldn't guarantee that the segment_id would always be an integer and always be in begin point order so I went with the safer, but less performant method.

Tom Rippetoe SSC Veteran Points: 213 More actions · Answer 5

It was just coincidence / convenience regarding the order of segment IDs in the example I created. I think the method proposed by Celko wouldn't work out for that reason.

Tom Rippetoe SSC Veteran Points: 213 More actions · Answer 6

First, thank you for your time. I do appreciate it.

The segment id is a varchar(24), and its value is assigned randomly-ish. There are some pertinent bits of info in the first 8 or so characters of the Id to help identify the road (county number, region, district, etc), but with the remaining characters there is not a particular sequence that we could count on to 'order' the segments.

I suspect adding another ROW_NUMBER() as an 'ordering surrogate' to view would be the way to go - but certainly not sure what that would look like.