Group By clause

Question

Group By clause

Viewing 14 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

At this point though - if the group by and the select don't match, can't match and won't match, I prefer to switch away from the aggregate altogether. In other words - use a CTE with the row_number() set correctly, and then only pull rows where the row_number equates to 1....

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

RBarryYoung SSC Guru Points: 143329 More actions · Answer 2

Brandie Tarvin (3/9/2009)
Case statements. You do a "CASE MyStatus When 'A' Then 1 When 'C' Then 0 When 'Z' Then 0 End as StatusCount".
You HAVE to group by MyStatus even though it's nowhere else within your SELECT statement. Suddenly, if you have a status history table you're connecting to, you get duplicated results. Therefore, you have to use DISTINCT instead of "fixing" the GROUP BY.

I have to agree with Lynn. I need to see a more concrete example because I am having a hard time understanding why you have to Group By a column that you are not including in the result set, but you still need the result set to be distinct.

It's important to remember that DISTINCT is just a degenerate (though convenient) form of GROUP BY where all of the result columns are also GROUP BY columns. Any query with a DISTINCT can be transformed into an equivalent GROUP BY query.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Kit G SSCrazy Points: 2888 More actions · Answer 3

I'm new to SQL myself and I might be missing something here, but from the first few posts it looks like you're saying that GROUP BY doesn't always give you distinct/unique rows. But in that second example that you gave, the one with the additional field in the GROUP BY clause, the computer IS giving you distinct rows, by its definition of distinct. The quantities are different so that makes the row different from any other row and therefore is returned as a unique row.

The problem isn't with the GROUP BY function, but with the data being feed to it from what I can see.

I'm just trying to understand what's happening so I can understand what SQL does with the data and then hopefully make SQL do what I want to with the data. I've been given to understand that DISTINCT and GROUP BY are pretty much the same, just one is in the SELECT statement and one is in the WHERE statement. If there is another difference, I'd like to know so I use them properly.

-- Kit

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 4

Kit G (3/9/2009)
I'm new to SQL myself and I might be missing something here, but from the first few posts it looks like you're saying that GROUP BY doesn't always give you distinct/unique rows. But in that second example that you gave, the one with the additional field in the GROUP BY clause, the computer IS giving you distinct rows, by its definition of distinct. The quantities are different so that makes the row different from any other row and therefore is returned as a unique row.
The problem isn't with the GROUP BY function, but with the data being feed to it from what I can see.
I'm just trying to understand what's happening so I can understand what SQL does with the data and then hopefully make SQL do what I want to with the data. I've been given to understand that DISTINCT and GROUP BY are pretty much the same, just one is in the SELECT statement and one is in the WHERE statement. If there is another difference, I'd like to know so I use them properly.

Oh I think you have the essence of the idea. It's a matter of semantics at this point.

What Brandie is getting at is a scenario that makes it "look" like the GROUP BY is not returning distinct values (since you're only showing some of the GROUP BY columns in the SELECT statement, essentially removing an opportunity to show why something is distinct from its counterparts).

In large queries with busy developers, it's easy enough for someone to update just one of the two clauses, and cause themselves long periods of trying to "debug" why SQL isn't giving them distinct stuff.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 5

Brandie Tarvin (3/9/2009)[hr Group By doesn't guarantee distinct results.

Books Online
Groups a selected set of rows into a set of summary rows by the values of one or more columns or expressions. One row is returned for each group. Aggregate functions in the SELECT clause list provide information about each group instead of individual rows.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Kit G SSCrazy Points: 2888 More actions · Answer 6

Thank you. And now I understand where any such "errors" might be coming from.

-- Kit

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

My favorite trick to resolving most of the problems stated in this thread is to do a pre-aggregation in a Temp Table and go from there. It limits performance problems due to whacko rowcounts generated by joining to large tables and it makes the code a lot easier to write and understand. Of course, all bet's are off if you're trying to do it in a View, but I don't do aggregations in views, anyway... there's always some clown in the crowd that will try to join or do some form of filtering on the aggregated columns in the view causing the whole view to materialize before the filtering/joining occurs.

Yeah... the temp table does use TempDB... so will a lot of the stuff that gets spawned if you try to do it all in one query.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Brandie Tarvin SSC Guru Points: 173125 More actions · Answer 8

As soon as I find some data and code I can post without having to sanitize it first, I'll put up an example.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Brandie Tarvin SSC Guru Points: 173125 More actions · Answer 9

Now I feel silly. @=)

I went back through my most recent examples and found I'd re-written my code to use nested sub-queries because our BU decided they wanted different values, etc. So, I'm not currently using a DISTINCT & GROUP BY in the same query / sub-query.

But I swear I've had to throw a DISTINCT on top of a GROUP BY because if issues of using a field in a CASE statement that doesn't end up in the end SELECT.

Oh, well. Sorry. No example today.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

matt.mallen Old Hand Points: 314 More actions · Answer 10

Brandie,

Having a field referenced in a CASE within the SELECT statement does not specifically require the field to be part of the GROUP BY clause. It will work if you put all the fields referenced in the CASE, but it may not give the desired results. Alternatively, you can place the CASE into the GROUP BY clause so the results are grouped on the results of the CASE and not on the individual fields within it. I'm not sure this is the fastest route to use as I usually experience a bit of a performance hit when grouping by the output of a CASE.

Matt

Brandie Tarvin SSC Guru Points: 173125 More actions · Answer 11

Matt,

I've actually had script syntax check failures because I didn't put a field in my CASE statement in my GROUP BY. That's how I found out I needed it.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

matt.mallen Old Hand Points: 314 More actions · Answer 12

Brandie,

I have yet to find an instance where copying the CASE statement to the GROUP BY clause has not satisfied the need for grouping, but you mentioned you have experienced it. If you find an example, would you mind passing it on? It would be interesting to try and figure out why it wouldn't be able to group by the output of the CASE. The only thing that comes to mind would potentially be a call within the CASE to a scalar function pulling data from a view that holds non-deterministic values, but that doesn't seem like something one would want to group on.

Brandie Tarvin SSC Guru Points: 173125 More actions · Answer 13

Matt,

Just found a case in code I'm doing right now that failed the check because I didn't put the conditional check field in the GROUP BY.

Create Table #Detail_REIFeed (DetREIFeedID int identity(1,1) NOT NULL,

CycleID int NOT NULL, TransactionType char(6) NOT NULL,

TransactionAmount decimal(12,2) NOT NULL, AmtSign char(1) NOT NULL);

GO

Insert into #Detail_REIFeed (CycleID, TransactionType, TransactionAmount, AmtSign)

(Select 199,'PRM ',494.00,'+' UNION ALL

Select 199,'IBNR ',0.26,'+' UNION ALL

Select 199,'CHGUEP',493.48,'+' UNION ALL

Select 199,'PRM ',411.00,'+' UNION ALL

Select 199,'IBNR ',0.01,'+' UNION ALL

Select 199,'CHGUEP',410.98,'+' UNION ALL

Select 199,'PDLSS ',500.00,'+' UNION ALL

Select 199,'PDALAE',0.00,'+' UNION ALL

Select 199,'PDLSS ',200.00,'-' UNION ALL

Select 199,'PDLSS ',200.00,'-')

GO

Select Distinct CycleID, SUM(CASE AmtSign WHEN '-'

THEN ISNULL(TransactionAmount,0.00) Else 0.00 End) as NegTransAmt,

SUM(CASE AmtSign WHEN '+'

THEN ISNULL(TransactionAmount,0.00) Else 0.00 End) as PosTransAmt

from #Detail_REIFeed

Group By CycleID--, AmtSign;

-- Don't group by AmtSign

GO

Select Distinct CycleID, SUM(CASE AmtSign WHEN '-'

THEN ISNULL(TransactionAmount,0.00) Else 0.00 End) as NegTransAmt,

SUM(CASE AmtSign WHEN '+'

THEN ISNULL(TransactionAmount,0.00) Else 0.00 End) as PosTransAmt

from #Detail_REIFeed

Group By CycleID, AmtSign;

-- See the difference when grouping by AmtSign ??

GO

For reasons passing understanding, the mainframe I'm sending this to doesn't want the positive or negative amount sign in the final text file with the actual dollar amount. They want it as a separate field. And this amount is a subquery inside my main query (the main query does the final math and additional other record counts).

Anyway, this doesn't cover the DISTINCT with GROUP BY issue at the moment, but it should answer your question as to why I'd have a field in my Group By that I'm not selecting on.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

matt.mallen Old Hand Points: 314 More actions · Answer 14

You don't need to include the AmtSign in the group by (as your code shows) and once you don't include it in the group by (which is unnecessary as it is inside the sum), you only get one row with or without the distinct. The multiple rows is caused by over-grouping the results. If I wanted the sum of sales for a year I wouldn't group by the transaction date, I would group by the year. It is definitely possible to group by information inside the sum or not required for the grouping and get multiple records for the answer (though they should be distinct to the groups).