June 25, 2021 at 3:00 pm
Hi,
We have a report that returns a set of aggregated data and in some cases i've duplicated records due to some nulls on a date column, is there any way to maintain the aggregation working by escaping the null value?
Example query:
select t.col1, t.col2, sum(t.col3) col3, sum(t.col4) col4, col5
from (
select 7213 col1, 992 col2, 30 col3, 0 col4, cast(getdate() as date) col5
union all
select 7213 col1, 992 col2, 0 col3, 30 col4, null col5
) t
group by t.col1, t.col2, col5
It returns 2 rows because of the Null value on col5 and what should return was a single record, any ideia on how to escape the null on col5?
Thanks in advance
June 25, 2021 at 3:24 pm
This is likely to be problematic. Consider the following (I added an extra row of data)
SELECT t.col1
,t.col2
,col3 = SUM(t.col3)
,col4 = SUM(t.col4)
,t.col5
FROM
(
SELECT col1 = 7213
,col2 = 992
,col3 = 30
,col4 = 0
,col5 = CAST(GETDATE() AS DATE)
UNION ALL
SELECT col1 = 7213
,col2 = 992
,col3 = 0
,col4 = 30
,col5 = NULL
UNION ALL
SELECT col1 = 7213
,col2 = 992
,col3 = 0
,col4 = 30
,col5 = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
) t
GROUP BY t.col1
,t.col2
,t.col5;
Given the above scenario, which row should the NULL aggregate into?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 25, 2021 at 3:30 pm
select t.col1, t.col2, sum(t.col3) col3, sum(t.col4) col4, col5
from (
select 7213 col1, 992 col2, 30 col3, 0 col4, cast(getdate() as date) col5
union all
select 7213 col1, 992 col2, 0 col3, 30 col4, null col5
) t
where t.col5 is not null
group by t.col1, t.col2, col5
June 25, 2021 at 3:34 pm
where t.col5 is not null
Hah! I thought of this too, but assumed that the values in Col3 and Col4 for the NULL date were required to be included somewhere in the SUMs.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 25, 2021 at 4:03 pm
If it would be OK to force the NULL value to be added to lowest date:
;WITH test_data AS (
SELECT col1 = 7213
,col2 = 992
,col3 = 30
,col4 = 0
,col5 = CAST(GETDATE() AS DATE)
UNION ALL
SELECT col1 = 7213
,col2 = 992
,col3 = 0
,col4 = 30
,col5 = NULL
UNION ALL
SELECT col1 = 7213
,col2 = 992
,col3 = 0
,col4 = 30
,col5 = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
)
SELECT t.col1
,t.col2
,col3 = SUM(t.col3)
,col4 = SUM(t.col4)
,col5
FROM (
SELECT col1, col2, col3, col4, CASE WHEN col5 IS NULL THEN LEAD(col5, 1) OVER(PARTITION BY col1, col2 ORDER BY col5) ELSE col5 END AS col5
FROM test_data t
) AS t
GROUP BY t.col1
,t.col2
,t.col5
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 25, 2021 at 4:50 pm
If it would be OK to force the NULL value to be added to lowest date:
;WITH test_data AS (
SELECT col1 = 7213
,col2 = 992
,col3 = 30
,col4 = 0
,col5 = CAST(GETDATE() AS DATE)
UNION ALL
SELECT col1 = 7213
,col2 = 992
,col3 = 0
,col4 = 30
,col5 = NULL
UNION ALL
SELECT col1 = 7213
,col2 = 992
,col3 = 0
,col4 = 30
,col5 = DATEADD(DAY, -1, CAST(GETDATE() AS DATE))
)
SELECT t.col1
,t.col2
,col3 = SUM(t.col3)
,col4 = SUM(t.col4)
,col5
FROM (
SELECT col1, col2, col3, col4, CASE WHEN col5 IS NULL THEN LEAD(col5, 1) OVER(PARTITION BY col1, col2 ORDER BY col5) ELSE col5 END AS col5
FROM test_data t
) AS t
GROUP BY t.col1
,t.col2
,t.col5
That will have issues with multiple nulls,
You could try something like,
WITH TEMP_CTE AS(
SELECT *, ISNULL(COL_ONE, MIN(COL_ONE) OVER(PARTITION BY 1)) AS SUM_DATE FROM
(VALUES (getdate(), 1)
, (DATEADD(day, -1, getdate()), 2)
, (DATEADD(day, -1, getdate()), 5)
, (null, 3)
, (null, 4)
, (null, 9)
, (getdate(), 3)
, (DATEADD(day, 1, getdate()), 5)) TESTY(COL_ONE, COL_TWO)
)
SELECT SUM_DATE, SUM(COL_TWO) FROM TEMP_CTE
GROUP BY SUM_DATE
June 25, 2021 at 5:31 pm
Hi,
In our report the col5 only returns:
On the 1 and 2 scenarios no issue, but on the rare ocasions that the report returns option 3 i've the problem
Thanks
June 25, 2021 at 5:45 pm
Instead of using LEAD or LAG you could use MIN/MAX/FIRST_VALUE/LAST_VALUE depending on requirements. Or - you could define a default value to all null values.
Which one you use depends on the requirements.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 25, 2021 at 8:19 pm
For over 35 years, SQL forums have asked posters to provide DDL, sample data and their first honest attempt at a solution on their own.
I had a little something extra and require that you follow ISO and ANSI standard rules, as well as good coding practices. Looking at the number of points you have, you probably have no idea what any of this means. But basically you post a lot of garbage exclamation. For example, why do you think that col "-n" is a meaningful name in a relational model? Sorry that this is what you might have expressed in a spreadsheet 30 years ago. If you're going to do this kind of garbage programming, that at least put the "as" keyword between an expression and its new name.
Without any DDL, it is impossible to answer your question. We cannot read your mind or see what's on your screen. I would strongly suggest that you start using the ANSI/ISO standard CURRENT_TIMESTAMP instead of the old Sybase getdate(). Writing things with proprietary code simply makes you look at a hillbilly who doesn't yet speak the language.
Please post DDL and follow ANSI/ISO standards when asking for help.
July 15, 2021 at 3:15 am
Thanks for your sharing,That's pretty cool.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply