Sub-Grouping by consecutive rows

Question

Sub-Grouping by consecutive rows

mishka-723908

SSCommitted

Points: 1675
More actions
January 22, 2016 at 5:16 pm

#310976

I have a dataset like the one below. I would like to group by Col be and cola. But by the sequence of col A. See my desired output below. Thank you in advance.
Starting dataset -
ABTo Sum
1AL2
1Al3
1AL4
23NY5
1Al6
34NY7
34NY8
32CA8
42NY9
1Al10
1Al11
1Al12
Desired output
ABTo Sum
1AL9
23NY5
1Al6
34NY15
32CA8
42NY9
1Al33
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 1

I've not tested it (no readily consumable data in post... see first link in my signature line for how to do that and I strongly recommend you do that before your next post) but this is simple, provided that you understand that there's no way to enforce the output order that you posted.

SELECT A,B,ToSum = SUM(ToSum)

FROM dbo.YourTable

GROUP BY A,B

;

You can certainly add an ORDER BY after the GROUP BY but there's nothing in the data that will preserve the original order of data to get the same order as your desired output.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mishka-723908 SSCommitted Points: 1675 More actions · Answer 2

Hello, Thanks for your reply.

This will group all the rows where A = 1 and B = Al to one row.

I only want to sum those up where they are consecutive in the data set.

mister.magoo SSC-Forever Points: 47174 More actions · Answer 3

mishka-723908 (1/22/2016)
Hello, Thanks for your reply.
This will group all the rows where A = 1 and B = Al to one row.
I only want to sum those up where they are consecutive in the data set.

I suggest you read Jeff's entire post again, then follow his advice about how to ask your question so that people can help.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mishka-723908 SSCommitted Points: 1675 More actions · Answer 4

I tried to paste the result to make it looks nice, but could not.

Here is the code for the dataset in question. Thanks for your help.

CREATE TABLE #testa

(

A INTEGER,

B VARCHAR(2),

ToSum INTEGER

)

INSERT INTO #testa

( A, B, ToSum )

select 1, 'AL', 2 UNION ALL

SELECT 1, 'AL', 3 UNION ALL

SELECT 1, 'AL', 4 UNION ALL

SELECT 23, 'NY', 5 UNION ALL

SELECT 1, 'AL', 6 UNION ALL

SELECT 34, 'NY', 7 UNION ALL

SELECT 34, 'NY', 8 UNION ALL

SELECT 32, 'CA', 8 UNION ALL

SELECT 42, 'NY', 9 UNION ALL

SELECT 1, 'AL', 2 UNION ALL

SELECT 1, 'AL', 2

SELECT * FROM #testa

I would like the result to look like -

ABTo Sum

1AL9

23NY5

1Al6

34NY15

32CA8

42NY9

1Al33

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 5

I think this what you're after... Please note that this much easier on later editions of SQL server using the LAG/LEAD functions...

Some consumable test data...

IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL

DROP TABLE #temp;

CREATE TABLE #temp (

SeqNum INT IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED, -- adding an indetity column to estabolist an insertion order

-- and a clustered primary key. A properly order clustered index is going to be essential for the next step.

ColA INT NOT NULL,

ColB CHAR(2) NOT NULL,

ToSum INT NOT NULL,

SeqGroup INT NULL -- This value will be filled in to estabolish "sequense groups"

);

INSERT #temp (ColA, ColB, ToSum) VALUES

(1,'AL',2),

(1,'AL',3),

(1,'AL',4),

(23,'NY',5),

(1,'AL',6),

(34,'NY',7),

(34,'NY',8),

(32,'CA',8),

(42,'NY',9),

(1,'AL',10),

(1,'AL',11),

(1,'AL',12);

The actual solution...

-- Use the "Quirky Method" to populate the SeqGroup column.

DECLARE

@ColA INT,

@ColB CHAR(2),

@SeqNum INT,

@SeqGroup INT;

UPDATE t SET

@SeqGroup = t.SeqGroup = CASE WHEN @ColA <> t.ColA AND @ColB <> t.ColB THEN t.SeqNum ELSE ISNULL(@SeqGroup, t.SeqNum) END,

@ColA = t.ColA,

@ColB = t.ColB,

@SeqNum = t.SeqNum

FROM

#temp t WITH (TABLOCKX) -- not really necessary w/ a temp table but you do need a 100% guarantee that you are the only one accessing the table at this time.

OPTION(MAXDOP 1); -- necessary to prevent a parallel execution plan.

-- The final select

SELECT

t.ColA,

t.ColB,

ToSum = SUM(t.ToSum)

FROM

#temp t

GROUP BY

t.SeqGroup,

t.ColA,

t.ColB;

The output...

ColA ColB ToSum

----------- ---- -----------

1 AL 9

23 NY 5

1 AL 6

34 NY 15

32 CA 8

42 NY 9

1 AL 33

Side note... Jeff will be back shortly to show a better quirky update... I always manage to mangle it to some degree... 😛

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 6

mishka-723908 (1/22/2016)
Hello, Thanks for your reply.
This will group all the rows where A = 1 and B = Al to one row.
I only want to sum those up where they are consecutive in the data set.

One of the "problems" with data in a table is that you simply can't rely on the "natural order". There has to be something in the table that enforces the order. What you're asking for may look like it will work but without a column to enforce the correct sequence will break sooner than later and produce the wrong answers.

No solution is possible for what you ask unless such a column exists in the table. Anything that anyone posts that looks like it will work will be totally unreliable without it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 7

Jason A. Long (1/22/2016)
I think this what you're after... Please note that this much easier on later editions of SQL server using the LAG/LEAD functions...
Some consumable test data...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
SeqNum INT IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED, -- adding an indetity column to estabolist an insertion order
-- and a clustered primary key. Aproperly order clustered index is going to be important.
ColA INT NOT NULL,
ColB CHAR(2) NOT NULL,
ToSum INT NOT NULL,
SeqGroup INT NULL -- This value will be filled in to estabolish "sequense groups"
);
INSERT #temp (ColA, ColB, ToSum) VALUES
(1,'AL',2),
(1,'AL',3),
(1,'AL',4),
(23,'NY',5),
(1,'AL',6),
(34,'NY',7),
(34,'NY',8),
(32,'CA',8),
(42,'NY',9),
(1,'AL',10),
(1,'AL',11),
(1,'AL',12);
The actual solution...
-- Use the "Quirky Method" to populate the SeqGroup column.
DECLARE
@ColA INT,
@ColB CHAR(2),
@SeqNum INT,
@SeqGroup INT;
UPDATE t SET
@SeqGroup = t.SeqGroup = CASE WHEN @ColA <> t.ColA AND @ColB <> t.ColB THEN t.SeqNum ELSE ISNULL(@SeqGroup, t.SeqNum) END,
@ColA = t.ColA,
@ColB = t.ColB,
@SeqNum = t.SeqNum
FROM
#temp t WITH (TABLOCKX) -- not really necessary w/ a temp table but you do need a 100% guarantee that you are the only one accessing the table at this time.
OPTION(MAXDOP 1); -- necessary to prevent a parallel execution plan.
-- The final select
SELECT
t.ColA,
t.ColB,
ToSum = SUM(t.ToSum)
FROM
#temp t
GROUP BY
t.SeqGroup,
t.ColA,
t.ColB;
The output...
ColA ColB ToSum
----------- ---- -----------
1 AL 9
23 NY 5
1 AL 6
34 NY 15
32 CA 8
42 NY 9
1 AL 33
Side note... Jeff will be back shortly to show a better quirky update... I always manage to mangle it to some degree... 😛

Nope. won't be demonstrating a Quirky Update for this because there is nothing to enforce the desired order. This problem has no solution without such a column to enforce the order.

There must be a disturbance in the force because this is the second such request for an impossible solution due to unenforceable ordered data in two days.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mishka-723908 SSCommitted Points: 1675 More actions · Answer 8

Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.

Thank you all for your help.

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 9

mishka-723908 (1/22/2016)
Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.
Thank you all for your help.

What is that column because that will change everything. Jason's code will need to be changed to take advantage of that column.

It you want the Quirky Update solution, I need for you to repost the readily consumable data including that column because we can't do the Quirky Update without it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jason A. Long SSC-Insane Points: 23786 More actions · Answer 10

mishka-723908 (1/22/2016)
Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.
Thank you all for your help.

Mishka - Before you scamper off and attempt to adapt this to your own data, heed Jeff's request for consumable test data that includes your actual ordering column.

The "Quirky Method" is more than capable of generating some nasty surprises, in the form of difficult to detect, incorrect results, if you don't know what you're doing and why you're doing it.

If you're considering it for a production, Jeff's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url] is a worthwhile read (if not 100% compulsory).

It's a lengthy article, so you may want to concentrate on the 2nd 1/2 that introduces the "Quirky Update"...

The running totals problem that he address in the article is obviously different that you current task, but the rules he lays out for using the Quirky method are universal and should be fully understood before using this method in any production code.

Just my 2 cents...

mishka-723908 SSCommitted Points: 1675 More actions · Answer 11

Hello,

Here is the set. There will be a begin and end date where we get the min(begin) and max(end) for each set. The cols will be ordered by the date combination. Sorry if I asked the question in the wrong area. We are using SQL 2014 for this.

IF OBJECT_ID('tempdb..#testa', 'U') IS NOT NULL

DROP TABLE #testa;

CREATE TABLE #testa

(

A INTEGER,

B VARCHAR(2),

ToSum INTEGER,

beginDate BIGINT,

endDate bigint

)

INSERT INTO #testa

( A, B, ToSum, beginDate, endDate )

select 1, 'AL', 2, 20081003153540,20081003160918 UNION ALL

SELECT 1, 'AL', 3, 20081003160918,20090824171732 UNION ALL

SELECT 1, 'AL', 4, 20090824171732,20090824172124 UNION ALL

SELECT 23, 'NY', 5, 20090824172124,20100805131507 UNION ALL

SELECT 1, 'AL', 6, 20100805131507,20100805134025 UNION ALL

SELECT 34, 'NY', 7, 20100805134025,20100805135736 UNION ALL

SELECT 34, 'NY', 8, 20100805135736,20110720153831 UNION ALL

SELECT 32, 'CA', 8, 20110720153831,20110815143945 UNION ALL

SELECT 42, 'NY', 9, 20110815143945,20111017170554 UNION ALL

SELECT 1, 'AL', 2, 20111017170554,20111017170640 UNION ALL

SELECT 1, 'AL', 2, 20111017170640,20120809174002 UNION ALL

SELECT 1, 'AL', 2, 20120809174002,20130809174002

SELECT * FROM #testa

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 12

There's no need for the Quirky Update in such a case. The "Difference Between Two Row Numbers" will do it, will be faster that the Quirky Update if the right clustered index is available, and will use methods that are "supported".

Thanks for posting the readily consumable data. I'll be back soon.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004402 More actions · Answer 13

Here's the latest readily consumable data with a couple of comments added for other experimenters...

--===== Conditionally drop the test table to make reruns in SSMS easier.

IF OBJECT_ID('tempdb..#testa', 'U') IS NOT NULL

DROP TABLE #testa

;

--===== Create the test table. Note the modifications

CREATE TABLE #testa

(

A INTEGER NOT NULL,

B CHAR(2) NOT NULL,

ToSum INTEGER NOT NULL,

beginDate BIGINT NOT NULL,

endDate BIGINT NOT NULL--,

CONSTRAINT PK_#testa PRIMARY KEY CLUSTERED (beginDate,A,B)

)

;

--===== Populate the test table

INSERT INTO #testa

(A, B, ToSum, beginDate, endDate)

SELECT 1, 'AL', 2, 20081003153540, 20081003160918 UNION ALL

SELECT 1, 'AL', 3, 20081003160918, 20090824171732 UNION ALL

SELECT 1, 'AL', 4, 20090824171732, 20090824172124 UNION ALL

SELECT 23, 'NY', 5, 20090824172124, 20100805131507 UNION ALL

SELECT 1, 'AL', 6, 20100805131507, 20100805134025 UNION ALL

SELECT 34, 'NY', 7, 20100805134025, 20100805135736 UNION ALL

SELECT 34, 'NY', 8, 20100805135736, 20110720153831 UNION ALL

SELECT 32, 'CA', 8, 20110720153831, 20110815143945 UNION ALL

SELECT 42, 'NY', 9, 20110815143945, 20111017170554 UNION ALL

SELECT 1, 'AL', 2, 20111017170554, 20111017170640 UNION ALL

SELECT 1, 'AL', 2, 20111017170640, 20120809174002 UNION ALL

SELECT 1, 'AL', 2, 20120809174002, 20130809174002

;

--===== Show what we have, so far.

SELECT * FROM #testa

;

Here's one possible solution. I say "possible" solution because I don't have a 2012+ box at my disposal right now to test a Preceding Rows thing, which might cut down on the number of sorts and shorten the code, to boot.

--===== Solve the problem using the ordered difference between two row numbers.

WITH

cteEnumerate AS

( --=== Create the difference between row numbers to form aggregate group markers.

SELECT A, B, ToSum, beginDate,

GrpNum = ROW_NUMBER() OVER (ORDER BY beginDate)

- ROW_NUMBER() OVER (PARTITION BY A,B ORDER BY beginDate)

FROM #testa

),

cteAggregate AS

( --=== Do the aggregates by GrpNum and figure out the display order

SELECT A, B, ToSum = SUM(ToSum), SortDisplay = MIN(beginDate)

FROM cteEnumerate

GROUP BY GrpNum, A, B

) --=== Display the aggregated data in the correct order.

SELECT A, B, ToSum

FROM cteAggregate

ORDER BY SortDisplay

;

Here are the results with sequence groups in the original order as they appear in the original data...

A B ToSum

----------- ---- -----------

1 AL 9

23 NY 5

1 AL 6

34 NY 15

32 CA 8

42 NY 9

1 AL 6

(7 row(s) affected)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ed Wagner SSC Guru Points: 287026 More actions · Answer 14

Nice job, Jeff. I especially like the way you calculated the group number using two row number functions. Kind of reminiscent of something Itzik wrote once, but I don't remember where.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions