Find Similar rows and add totals together.

Question

Post reply

Find Similar rows and add totals together.

leroy-1092048

Ten Centuries

Points: 1054
More actions
November 30, 2010 at 2:41 pm

#229196

Hello all,
I have a question and just cant figure it out.
For example I have a table with 5 columns A, B, C, D and E
I would like to find the rows that all have identical Values in A, B and C and return the SUM(D) and MIN(E)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

scott.pletcher SSCertifiable Points: 5958 More actions · Answer 1

Is this a homework q?

Look at GROUP BY clause ... it will provide what you need.

Scott Pletcher, SQL Server MVP 2008-2010

leroy-1092048 Ten Centuries Points: 1054 More actions · Answer 2

LoL.. this is not home work... well i guess it is since its for work, but i am at home. 🙂

to clarify a little:

we have a table in our DB that keeps stats of the number of transactions.

the stats are stored in the table every hour and keep track of Who, Where, why, number/hour and the rowID, time

these hourly stats are flooding our system and we are looking to reduce them to daily stats.

so the ultimate plan is to take the grouping of (who, where and why) A.K.A. A, B and C

and total the number/hour (D) and the min rowID (E) for each grouping. i can deal with the time.

---------------------------

you are right i got the results needed by using the Group By clause

, for some reason i got lost in the Over(partition by.... ) stuff

here is what i used

Select A, B, C, sum(D) as Total, min(E) as MinID

from statsTable

group by A, B, C

now here is what i believe to be the tricky part.

I need to take these results and save the Total - sum(E) to the row with the MinID

and delete the rows that are not the minID for each group.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Looks like you were on the right track at the start - does this help?

;WITH RowPicker AS (

SELECT A, B, C, D, E,

Total = SUM(D) OVER (PARTITION BY A, B, C),

MinID = MIN(E) OVER (PARTITION BY A, B, C)

FROM statsTabl

)

SELECT *

FROM RowPicker WHERE MinID = E

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

leroy-1092048 Ten Centuries Points: 1054 More actions · Answer 4

thanks for your iput on that..

i am currently working with:

Select A, B, C, sum(D) as Total, min(E) as MinID

from statsTable

group by A, B, C

as this query retuns exactly what i want my final data to ook like.

the issue now, is to make the data in the 'Statstable' end up exactly like the data the query returns.

so i am not sure how to modify the T-SQL to end up like that.

bassically i need to take the datafrom the above query and:

for each row returned in the above query i need to

update the row where E = Min(E) and set D with the value of SUM(D)

delete all rows not equal to minID's

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

leroy-1092048 (12/1/2010)
...
update the row where E = Min(E) and set D with the value of SUM(D)
delete all rows not equal to minID's
...

So...how will you identify those rows? The query I posted above provides this information.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

scott.pletcher SSCertifiable Points: 5958 More actions · Answer 6

Barring some type of "fuzzy" update, the fastest way is probably to save your results as a temporary table, then use that table to UPDATE the original data and DELETE the unneeded rows.

For example:

IF OBJECT_ID('tempdb.dbo.#totals') IS NOT NULL

DROP TABLE tempdb.dbo.#totals

SELECT A, B, C, SUM(D) AS Total, MIN(E) AS MinID, MAX(E) AS MaxID

INTO #totals

FROM tablename

GROUP BY A, B, C

UPDATE tablename

SET D = t.Total

FROM tablename

INNER JOIN #totals t ON

t.A = tablename.A AND t.B = tablename.B AND t.C = tablename.C AND

t.MinID = tablename.E

DELETE FROM tablename

FROM tablename

INNER JOIN #totals t ON

t.A = tablename.A AND t.B = tablename.B AND t.C = tablename.C AND

t.MinID <> tablename.E AND tablename.E <= t.MaxID

Scott Pletcher, SQL Server MVP 2008-2010

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

scott.pletcher (12/2/2010)
Barring some type of "fuzzy" update, the fastest way is probably to save your results as a temporary table, then use that table to UPDATE the original data and delete the unneeded rows.
For example:
IF OBJECT_ID('tempdb.dbo.#totals') IS NOT NULL
DROP TABLE tempdb.dbo.#totals
SELECT A, B, C, SUM(D) AS Total, MIN(E) AS MinID, MAX(E) AS MaxID
INTO #totals
FROM tablename
GROUP BY A, B, C
UPDATE tablename
SET D = t.Total
FROM tablename
INNER JOIN #totals t ON
t.A = tablename.A AND t.B = tablename.B AND t.C = tablename.C AND
t.MinID = tablename.E
DELETE FROM tablename
FROM tablename
INNER JOIN #totals t ON
t.A = tablename.A AND t.B = tablename.B AND t.C = tablename.C AND
t.MinID <> tablename.E AND tablename.E <= t.MaxID

What happens when two rows share the same minimum value for 'E'?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

scott.pletcher SSCertifiable Points: 5958 More actions · Answer 8

I thought E was an id value and thus unique. Maybe I'm wrong there.

Scott Pletcher, SQL Server MVP 2008-2010

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

scott.pletcher (12/2/2010)
I thought E was an id value and thus unique. Maybe I'm wrong there.

Maybe not - there's not enough info.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

leroy-1092048 Ten Centuries Points: 1054 More actions · Answer 10

Chris ad scott,

let me test what you two have posted and i will give you an update.

E is a time stamp for each hour of the day. so there are no duplicates of E.

so the idea here is to take each time stamp for the day (upto 24 of them) and take the sum(D), that is the transaction count for each hour and reduce it down to daily totals instead of the hourly totals... 1 entry per day. reducing the storage needs and rowcount... especially on older data.

the reason for the Min(E) on the time stamp is because if a unique (user, location, action --- A, B, C) did not happen then there would be no entry that hour... so i could not assume that each grouping would have a 00:00:00 time stamp

but i really appreciate your help and you have given me a lot of ammo here to continue. Thank you.

I will keep you posted and also if all works well i will post the final code that was used.

Leroy L

scott.pletcher SSCertifiable Points: 5958 More actions · Answer 11

Thanks for the explanation, that makes sense.

Scott Pletcher, SQL Server MVP 2008-2010

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

@leroy - thanks.

@scott - your method will almost certainly perform better than mine, with two analytical functions.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden