January 22, 2016 at 5:16 pm
I have a dataset like the one below. I would like to group by Col be and cola. But by the sequence of col A. See my desired output below. Thank you in advance.
Starting dataset -
ABTo Sum
1AL2
1Al3
1AL4
23NY5
1Al6
34NY7
34NY8
32CA8
42NY9
1Al10
1Al11
1Al12
Desired output
ABTo Sum
1AL9
23NY5
1Al6
34NY15
32CA8
42NY9
1Al33
January 22, 2016 at 5:37 pm
I've not tested it (no readily consumable data in post... see first link in my signature line for how to do that and I strongly recommend you do that before your next post) but this is simple, provided that you understand that there's no way to enforce the output order that you posted.
SELECT A,B,ToSum = SUM(ToSum)
FROM dbo.YourTable
GROUP BY A,B
;
You can certainly add an ORDER BY after the GROUP BY but there's nothing in the data that will preserve the original order of data to get the same order as your desired output.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 5:45 pm
Hello, Thanks for your reply.
This will group all the rows where A = 1 and B = Al to one row.
I only want to sum those up where they are consecutive in the data set.
January 22, 2016 at 5:49 pm
mishka-723908 (1/22/2016)
Hello, Thanks for your reply.This will group all the rows where A = 1 and B = Al to one row.
I only want to sum those up where they are consecutive in the data set.
I suggest you read Jeff's entire post again, then follow his advice about how to ask your question so that people can help.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 22, 2016 at 6:16 pm
I tried to paste the result to make it looks nice, but could not.
Here is the code for the dataset in question. Thanks for your help.
CREATE TABLE #testa
(
A INTEGER,
B VARCHAR(2),
ToSum INTEGER
)
INSERT INTO #testa
( A, B, ToSum )
select 1, 'AL', 2 UNION ALL
SELECT 1, 'AL', 3 UNION ALL
SELECT 1, 'AL', 4 UNION ALL
SELECT 23, 'NY', 5 UNION ALL
SELECT 1, 'AL', 6 UNION ALL
SELECT 34, 'NY', 7 UNION ALL
SELECT 34, 'NY', 8 UNION ALL
SELECT 32, 'CA', 8 UNION ALL
SELECT 42, 'NY', 9 UNION ALL
SELECT 1, 'AL', 2 UNION ALL
SELECT 1, 'AL', 2 UNION ALL
SELECT 1, 'AL', 2
SELECT * FROM #testa
I would like the result to look like -
ABTo Sum
1AL9
23NY5
1Al6
34NY15
32CA8
42NY9
1Al33
January 22, 2016 at 6:38 pm
I think this what you're after... Please note that this much easier on later editions of SQL server using the LAG/LEAD functions...
Some consumable test data...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
SeqNum INT IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED, -- adding an indetity column to estabolist an insertion order
-- and a clustered primary key. A properly order clustered index is going to be essential for the next step.
ColA INT NOT NULL,
ColB CHAR(2) NOT NULL,
ToSum INT NOT NULL,
SeqGroup INT NULL -- This value will be filled in to estabolish "sequense groups"
);
INSERT #temp (ColA, ColB, ToSum) VALUES
(1,'AL',2),
(1,'AL',3),
(1,'AL',4),
(23,'NY',5),
(1,'AL',6),
(34,'NY',7),
(34,'NY',8),
(32,'CA',8),
(42,'NY',9),
(1,'AL',10),
(1,'AL',11),
(1,'AL',12);
The actual solution...
-- Use the "Quirky Method" to populate the SeqGroup column.
DECLARE
@ColA INT,
@ColB CHAR(2),
@SeqNum INT,
@SeqGroup INT;
UPDATE t SET
@SeqGroup = t.SeqGroup = CASE WHEN @ColA <> t.ColA AND @ColB <> t.ColB THEN t.SeqNum ELSE ISNULL(@SeqGroup, t.SeqNum) END,
@ColA = t.ColA,
@ColB = t.ColB,
@SeqNum = t.SeqNum
FROM
#temp t WITH (TABLOCKX) -- not really necessary w/ a temp table but you do need a 100% guarantee that you are the only one accessing the table at this time.
OPTION(MAXDOP 1); -- necessary to prevent a parallel execution plan.
-- The final select
SELECT
t.ColA,
t.ColB,
ToSum = SUM(t.ToSum)
FROM
#temp t
GROUP BY
t.SeqGroup,
t.ColA,
t.ColB;
The output...
ColA ColB ToSum
----------- ---- -----------
1 AL 9
23 NY 5
1 AL 6
34 NY 15
32 CA 8
42 NY 9
1 AL 33
Side note... Jeff will be back shortly to show a better quirky update... I always manage to mangle it to some degree... 😛
January 22, 2016 at 6:46 pm
mishka-723908 (1/22/2016)
Hello, Thanks for your reply.This will group all the rows where A = 1 and B = Al to one row.
I only want to sum those up where they are consecutive in the data set.
One of the "problems" with data in a table is that you simply can't rely on the "natural order". There has to be something in the table that enforces the order. What you're asking for may look like it will work but without a column to enforce the correct sequence will break sooner than later and produce the wrong answers.
No solution is possible for what you ask unless such a column exists in the table. Anything that anyone posts that looks like it will work will be totally unreliable without it.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 6:48 pm
Jason A. Long (1/22/2016)
I think this what you're after... Please note that this much easier on later editions of SQL server using the LAG/LEAD functions...Some consumable test data...
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
DROP TABLE #temp;
CREATE TABLE #temp (
SeqNum INT IDENTITY (1,1) NOT NULL PRIMARY KEY CLUSTERED, -- adding an indetity column to estabolist an insertion order
-- and a clustered primary key. Aproperly order clustered index is going to be important.
ColA INT NOT NULL,
ColB CHAR(2) NOT NULL,
ToSum INT NOT NULL,
SeqGroup INT NULL -- This value will be filled in to estabolish "sequense groups"
);
INSERT #temp (ColA, ColB, ToSum) VALUES
(1,'AL',2),
(1,'AL',3),
(1,'AL',4),
(23,'NY',5),
(1,'AL',6),
(34,'NY',7),
(34,'NY',8),
(32,'CA',8),
(42,'NY',9),
(1,'AL',10),
(1,'AL',11),
(1,'AL',12);
The actual solution...
-- Use the "Quirky Method" to populate the SeqGroup column.
DECLARE
@ColA INT,
@ColB CHAR(2),
@SeqNum INT,
@SeqGroup INT;
UPDATE t SET
@SeqGroup = t.SeqGroup = CASE WHEN @ColA <> t.ColA AND @ColB <> t.ColB THEN t.SeqNum ELSE ISNULL(@SeqGroup, t.SeqNum) END,
@ColA = t.ColA,
@ColB = t.ColB,
@SeqNum = t.SeqNum
FROM
#temp t WITH (TABLOCKX) -- not really necessary w/ a temp table but you do need a 100% guarantee that you are the only one accessing the table at this time.
OPTION(MAXDOP 1); -- necessary to prevent a parallel execution plan.
-- The final select
SELECT
t.ColA,
t.ColB,
ToSum = SUM(t.ToSum)
FROM
#temp t
GROUP BY
t.SeqGroup,
t.ColA,
t.ColB;
The output...
ColA ColB ToSum
----------- ---- -----------
1 AL 9
23 NY 5
1 AL 6
34 NY 15
32 CA 8
42 NY 9
1 AL 33
Side note... Jeff will be back shortly to show a better quirky update... I always manage to mangle it to some degree... 😛
Nope. won't be demonstrating a Quirky Update for this because there is nothing to enforce the desired order. This problem has no solution without such a column to enforce the order.
There must be a disturbance in the force because this is the second such request for an impossible solution due to unenforceable ordered data in two days.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 6:52 pm
Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.
Thank you all for your help.
January 22, 2016 at 6:56 pm
mishka-723908 (1/22/2016)
Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.Thank you all for your help.
What is that column because that will change everything. Jason's code will need to be changed to take advantage of that column.
It you want the Quirky Update solution, I need for you to repost the readily consumable data including that column because we can't do the Quirky Update without it.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 7:23 pm
mishka-723908 (1/22/2016)
Hello, Thank you for the reply. It did work. This is the way I get the data in the file from a source system and need to properly insert it. There will be a column to keep the table properly ordered.Thank you all for your help.
Mishka - Before you scamper off and attempt to adapt this to your own data, heed Jeff's request for consumable test data that includes your actual ordering column.
The "Quirky Method" is more than capable of generating some nasty surprises, in the form of difficult to detect, incorrect results, if you don't know what you're doing and why you're doing it.
If you're considering it for a production, Jeff's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url] is a worthwhile read (if not 100% compulsory).
It's a lengthy article, so you may want to concentrate on the 2nd 1/2 that introduces the "Quirky Update"...
The running totals problem that he address in the article is obviously different that you current task, but the rules he lays out for using the Quirky method are universal and should be fully understood before using this method in any production code.
Just my 2 cents...
January 22, 2016 at 7:24 pm
Hello,
Here is the set. There will be a begin and end date where we get the min(begin) and max(end) for each set. The cols will be ordered by the date combination. Sorry if I asked the question in the wrong area. We are using SQL 2014 for this.
IF OBJECT_ID('tempdb..#testa', 'U') IS NOT NULL
DROP TABLE #testa;
CREATE TABLE #testa
(
A INTEGER,
B VARCHAR(2),
ToSum INTEGER,
beginDate BIGINT,
endDate bigint
)
INSERT INTO #testa
( A, B, ToSum, beginDate, endDate )
select 1, 'AL', 2, 20081003153540,20081003160918 UNION ALL
SELECT 1, 'AL', 3, 20081003160918,20090824171732 UNION ALL
SELECT 1, 'AL', 4, 20090824171732,20090824172124 UNION ALL
SELECT 23, 'NY', 5, 20090824172124,20100805131507 UNION ALL
SELECT 1, 'AL', 6, 20100805131507,20100805134025 UNION ALL
SELECT 34, 'NY', 7, 20100805134025,20100805135736 UNION ALL
SELECT 34, 'NY', 8, 20100805135736,20110720153831 UNION ALL
SELECT 32, 'CA', 8, 20110720153831,20110815143945 UNION ALL
SELECT 42, 'NY', 9, 20110815143945,20111017170554 UNION ALL
SELECT 1, 'AL', 2, 20111017170554,20111017170640 UNION ALL
SELECT 1, 'AL', 2, 20111017170640,20120809174002 UNION ALL
SELECT 1, 'AL', 2, 20120809174002,20130809174002
SELECT * FROM #testa
January 22, 2016 at 8:04 pm
There's no need for the Quirky Update in such a case. The "Difference Between Two Row Numbers" will do it, will be faster that the Quirky Update if the right clustered index is available, and will use methods that are "supported".
Thanks for posting the readily consumable data. I'll be back soon.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 22, 2016 at 9:07 pm
Here's the latest readily consumable data with a couple of comments added for other experimenters...
--===== Conditionally drop the test table to make reruns in SSMS easier.
IF OBJECT_ID('tempdb..#testa', 'U') IS NOT NULL
DROP TABLE #testa
;
--===== Create the test table. Note the modifications
CREATE TABLE #testa
(
A INTEGER NOT NULL,
B CHAR(2) NOT NULL,
ToSum INTEGER NOT NULL,
beginDate BIGINT NOT NULL,
endDate BIGINT NOT NULL--,
CONSTRAINT PK_#testa PRIMARY KEY CLUSTERED (beginDate,A,B)
)
;
--===== Populate the test table
INSERT INTO #testa
(A, B, ToSum, beginDate, endDate)
SELECT 1, 'AL', 2, 20081003153540, 20081003160918 UNION ALL
SELECT 1, 'AL', 3, 20081003160918, 20090824171732 UNION ALL
SELECT 1, 'AL', 4, 20090824171732, 20090824172124 UNION ALL
SELECT 23, 'NY', 5, 20090824172124, 20100805131507 UNION ALL
SELECT 1, 'AL', 6, 20100805131507, 20100805134025 UNION ALL
SELECT 34, 'NY', 7, 20100805134025, 20100805135736 UNION ALL
SELECT 34, 'NY', 8, 20100805135736, 20110720153831 UNION ALL
SELECT 32, 'CA', 8, 20110720153831, 20110815143945 UNION ALL
SELECT 42, 'NY', 9, 20110815143945, 20111017170554 UNION ALL
SELECT 1, 'AL', 2, 20111017170554, 20111017170640 UNION ALL
SELECT 1, 'AL', 2, 20111017170640, 20120809174002 UNION ALL
SELECT 1, 'AL', 2, 20120809174002, 20130809174002
;
--===== Show what we have, so far.
SELECT * FROM #testa
;
Here's one possible solution. I say "possible" solution because I don't have a 2012+ box at my disposal right now to test a Preceding Rows thing, which might cut down on the number of sorts and shorten the code, to boot.
--===== Solve the problem using the ordered difference between two row numbers.
WITH
cteEnumerate AS
( --=== Create the difference between row numbers to form aggregate group markers.
SELECT A, B, ToSum, beginDate,
GrpNum = ROW_NUMBER() OVER (ORDER BY beginDate)
- ROW_NUMBER() OVER (PARTITION BY A,B ORDER BY beginDate)
FROM #testa
),
cteAggregate AS
( --=== Do the aggregates by GrpNum and figure out the display order
SELECT A, B, ToSum = SUM(ToSum), SortDisplay = MIN(beginDate)
FROM cteEnumerate
GROUP BY GrpNum, A, B
) --=== Display the aggregated data in the correct order.
SELECT A, B, ToSum
FROM cteAggregate
ORDER BY SortDisplay
;
Here are the results with sequence groups in the original order as they appear in the original data...
A B ToSum
----------- ---- -----------
1 AL 9
23 NY 5
1 AL 6
34 NY 15
32 CA 8
42 NY 9
1 AL 6
(7 row(s) affected)
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2016 at 9:30 am
Nice job, Jeff. I especially like the way you calculated the group number using two row number functions. Kind of reminiscent of something Itzik wrote once, but I don't remember where.
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply