Simple GROUP BY Query

Question

Simple GROUP BY Query

stephen.aa

SSC Eights!

Points: 815
More actions
March 25, 2021 at 3:54 pm

#3862067

Hello:
I have this query. I am trying to understand why I have to GROUP by all of the fields in my SELECT Statement. Also, the query is returning the sum for all of the jmoJobID fields, when I want this to be summed by the jmaPartID field.
SELECT DISTINCT JobOperations.jmoJobID, JobAssemblies.jmaPartID, JobOperations.jmoWorkCenterID, JobOperations.jmoProcessID, JobOperations.jmoOperationQuantity, SUM(JobOperations.jmoCompletedProductionHours) AS SUM_CompletedProductionHours
FROM JobOperations INNER JOIN
JobAssemblies ON JobOperations.jmoJobID = JobAssemblies.jmaJobID
WHERE jmoJobID = '201739-01'
GROUP BY JobOperations.jmoJobID, JobOperations.jmoWorkCenterID, JobAssemblies.jmaPartID, JobOperations.jmoProcessID, JobOperations.jmoOperationQuantity
ORDER BY JobOperations.jmoWorkCenterID

Thank you!
Steve Anderson

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101126 More actions · Answer 1

Here's my best guess at what you're trying to do.

SELECT 
    JO.jmoJobID, JA.jmaPartID, JO.jmoWorkCenterID, 
    JO.jmoProcessID, JO.jmoOperationQuantity, 
    JA.SUM_CompletedProductionHours

FROM JobOperations JO INNER JOIN

(
    SELECT jmaJobID, jmaPartID, SUM(jmaCompletedProductionHours) AS SUM_CompletedProductionHours
    FROM JobAssemblies 
    GROUP BY jmaJobID, jmaPartID
) AS JA ON 
JO.jmoJobID = JA.jmaJobID

WHERE jmoJobID = '201739-01'

ORDER BY JO.jmoWorkCenterID

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Mr. Brian Gale SSC-Insane Points: 24800 More actions · Answer 2

It is because of your aggregate function (SUM). Once you put in an aggregate function, you NEED to group by all non-aggregate columns that are in use. Otherwise SQL has no way to "group" the data to be used by the SUM in any meaningful manner.

Now the fun trick comes in when you want to SUM but NOT group by all of the columns (as is your case) - that is where Windowing functions come in. So your SUM statement would be something like:

SUM(JobOperations.jmoCompletedProductionHours) OVER (PARTITION BY jmoJobID) AS SUM_CompletedProductionHours

Then remove the GROUP BY clause completely.

The "OVER" keyword tells SQL "this is a windowing function". The PARTITION BY clause tells SQL "I want to group by this/these column(s)". In this case, it would group it by jmoJobID and ONLY jmoJobID. You can add "ORDER BY" to a windowing function as well, but that clause is useless in a SUM operation. Now, if you were doing ROW_NUMBER() (for example), then the ORDER BY would be helpful; but SUM and COUNT don't benefit from the data being in any specific order.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

stephen.aa SSC Eights! Points: 815 More actions · Answer 3

Thanks for your reply.

I actually tried this. It is summing the hours not by the PartID in the Job, but by the job, so it appears. Every row for the last column has the same value.

SELECT DISTINCT JobOperations.jmoJobID, JobAssemblies.jmaPartID, JobOperations.jmoWorkCenterID, JobOperations.jmoProcessID, JobOperations.jmoOperationQuantity, SUM(JobOperations.jmoCompletedProductionHours) OVER (PARTITION BY JobAssemblies.jmaPartID) AS SUM_CompletedProductionHours

FROM JobOperations INNER JOIN

JobAssemblies ON JobOperations.jmoJobID = JobAssemblies.jmaJobID

WHERE jmoJobID = '201739-01'

ORDER BY JobOperations.jmoWorkCenterID

Steve Anderson

stephen.aa SSC Eights! Points: 815 More actions · Answer 4

Scott:

Thank you for your reply. What did I miss?

Invalid column name 'jmoCompletedProductionHours'.

Steve Anderson

ScottPletcher SSC Guru Points: 101126 More actions · Answer 5

Sorry, thought I changed it to jma.... (which I'll now do in my orig post as well). But I'm assuming that such a column exists just because or your comments in the OP:

SUM(jmaCompletedProductionHours) AS SUM_CompletedProductionHours

Now that I think about it more, there's likely not such a column in that table.

Keep in mind, we know NOTHING about your tables. Especially as you didn't provide DDL for the tables, so I had to guess.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

jcelko212 32090 SSCrazy Eights Points: 9303 More actions · Answer 6

>> I am trying to understand why I have to GROUP by all of the fields[sic: columns] in my SELECT Statement. Also, the query is returning the sum for all of the jmoJob_id fields [sic], when I want this to be summed by the jmaPart_id field [sic]. <<

The ANSI/ISO standards use "field" to a part of the column that has some meaning in itself. For example, a DATE value has the fields {year, month, day}, which have some meaning but not a complete meaning in themselves. I think you meant column.

Your use of camel case is also somewhat problematic, because so many ISO standards have different conventions about the meaning of upper and lowercase and identifiers. When I worked at AIRMICS, we also found that a person's eyes jump to an uppercase letter, so the text actually gets harder to read when you mix cases. This is why the body of your books, newspapers, etc. don't do that.

Also, putting the table name as a prefix is a really bad idea. For example, that the job_id is a totally different attribute when it appears in job operations and in job assemblies; if this is the same data element, then it should have the same consistent name of your entire schema and actually for your entire enterprise. This convention is a leftover from the 1960s filesystems, when each file was completely separate from the others and not part of a schema

It would also help if you would follow netiquette and post DDL for us. In over 30 years of writing SQL, I have very rarely had to use the SELECT DISTINCT because I usually have a key and the key guarantees me uniqueness. I also recommend not using an ORDER BY in a query. This turns the query into a cursor, and the display options for cursor should be done in the presentation layer. Never in the database layer.

SELECT JO.job_id, JA.part_id, JO.workcenter_id, JO.process_id, JO.operation_quantity,

SUM (JO.completed_production_hours) AS completed_production_hours_sum

FROM JobOperations. AS JO,

JobAssemblies AS JA

WHERE JO.job_id = JA.job_id

AND JO. job_id = '201739-01'

GROUP BY JO.job_id, JA.part_id, JO.workcenter_id, JO.process_id, JO.operation_quantity;

Now to answer your original question. Think about it, what single value would you use for, say, the workcenter_id when you have 100 of them?

Please post DDL and follow ANSI/ISO standards when asking for help.