CTE Crazy: Sums, Counts and Pivots I'm missing something here

Question

Post reply

CTE Crazy: Sums, Counts and Pivots I'm missing something here

Tom257

SSChampion

Points: 11543
More actions
February 28, 2018 at 7:19 am

#327702

I've not had to use SQL for a while and I'm having trouble getting to grips with aggregates again.
In the code below I think its This line that is not doing what I expect

  SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls

I want to get a report of counts by Day and Type - with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve.
Here is a Sample table
-- DROP TABLE #TmpData; CREATE TABLE #TmpData (     CallDateTime DateTime,     Duration Time,     CallTypeId Int ) INSERT INTO #TmpData SELECT '2017-08-01 10:14:37.00', '00:00:03.450', 1 UNION ALL SELECT '2017-08-01 10:55:46.00', NULL, 2 UNION ALL SELECT '2017-08-01 11:30:50.00', NULL, 2 UNION ALL SELECT '2017-08-01 12:48:02.00', '00:00:02.500', 1 UNION ALL SELECT '2017-08-01 14:48:24.00', '00:00:02.910', 1 UNION ALL SELECT '2017-08-01 17:05:26.00', '00:00:04.460', 1 UNION ALL SELECT '2017-08-04 02:11:47.00', '00:13:54.790', 4 UNION ALL SELECT '2017-08-06 01:29:13.00', '00:01:18.180', 5 UNION ALL SELECT '2017-08-06 04:33:33.00', NULL, 3 UNION ALL SELECT '2017-08-06 04:33:45.00', '00:00:00.000', 5 UNION ALL SELECT '2017-08-06 04:33:49.00', NULL, 3 UNION ALL SELECT '2017-08-06 04:35:59.00', NULL, 3 UNION ALL SELECT '2017-08-06 05:08:22.00', '00:00:00.000', 5 UNION ALL SELECT '2017-08-06 20:51:28.00', '00:00:03.310', 4 UNION ALL SELECT '2017-08-08 15:49:27.00', '00:00:19.000', 5 UNION ALL SELECT '2017-08-10 15:24:19.00', '00:04:32.790', 4 UNION ALL SELECT '2017-08-10 18:12:30.00', NULL, 2 UNION ALL SELECT '2017-08-10 18:13:01.00', NULL, 3 UNION ALL SELECT '2017-08-10 18:40:06.00', NULL, 2 UNION ALL SELECT '2017-08-10 22:35:13.00', '00:01:33.710', 4 ;
Here is my report so far :
Its wrong! it produces counts of 1 everywhere there should be a count, but some of those 1's should be true counts e.g. Column [1] on the first date [2017-08-01 should be 4
And it seems far too complicated - (I've gone CTE crazy here) - in addition to fixing the counts there has got to be a simpler way
;WITH SplitDates AS (     SELECT    CAST(N.CallDateTime AS date) AS CallDate,             DATEDIFF(MILLISECOND, '1900-01-01',ISNULL(N.Duration,'00:00:00')) AS [DurationMs],             N.CallTypeId,             -- Rules - some circumstances are not counted             CASE WHEN N.CallTypeID = 4 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0                  WHEN N.CallTypeID = 5 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0                  ELSE 1             END AS CallCount     FROM #TmpData AS N ), DurationByDay AS (     SELECT CallDate, SUM(DurationMs) AS [Total Duration]     FROM SplitDates     GROUP BY CallDate ), SplitDays AS (     SELECT S.CallTypeId,         SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls,                 S.CallDate     FROM SplitDates AS S     GROUP BY S.CallDate, S.CallTypeId, S.CallCount ), PivotNumbers AS (     SELECT CallDate, [1], [2], [3], [4], [5] , [Total Duration] -- TODO Convert back to HH:mm:ss here     FROM ( SELECT CallTypeId, T.CallDate, NumCalls, D.[Total Duration]             FROM SplitDays AS T             JOIN DurationByDay AS D ON D.CallDate = T.CallDate             ) AS SourceTable     PIVOT ( SUM(NumCalls)             FOR CallTypeId IN ([1], [2], [3], [4], [5]) ) AS PivotTable ) SELECT * FROM PivotNumbers ORDER BY CallDate

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Tom257 SSChampion Points: 11543 More actions · Answer 1

OK so preparing this data for SSC seems to have stimuated a few grey cells

I've fixed the output it - as follows
However - can it be further simplified?

;WITH SplitDates AS ( SELECT CAST(N.CallDateTime AS date) AS CallDate, DATEDIFF(MILLISECOND, '1900-01-01',ISNULL(N.Duration,'00:00:00')) AS [DurationMs], N.CallTypeId, -- Rules - some not counted CASE WHEN N.CallTypeID = 4 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0 WHEN N.CallTypeID = 5 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0 ELSE 1 END AS CallCount FROM #TmpData AS N ), SumCallsPerTypePerDay AS ( SELECT SUM(CallCount) AS NumCalls, CallDate, CallTypeId FROM SplitDates GROUP BY CallDate, CallTypeId ), DurationByDay AS ( SELECT CallDate, SUM(DurationMs) AS [Total Duration] FROM SplitDates GROUP BY CallDate ), PivotNumbers AS ( SELECT CallDate, [1], [2], [3], [4], [5] , [Total Duration] FROM ( SELECT CallTypeId, T.CallDate, NumCalls, D.[Total Duration] FROM SumCallsPerTypePerDay AS T JOIN DurationByDay AS D ON D.CallDate = T.CallDate ) AS SourceTable PIVOT ( SUM(NumCalls) FOR CallTypeId IN ([1], [2], [3], [4], [5]) ) AS PivotTable ) SELECT * FROM PivotNumbers ORDER BY CallDate

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

Your query looks so overwhelming when it can be so simple:
SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00'))) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);

EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Tom257 SSChampion Points: 11543 More actions · Answer 3

@luis Cazares
Thanks for this simplification - I have several of these reports to do, and this will help me get back into the SQL mindset.

drew.allen SSC Guru Points: 77014 More actions · Answer 4

Luis Cazares - Wednesday, February 28, 2018 7:41 AM
Your query looks so overwhelming when it can be so simple:
SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00'))) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);
EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.

You can make it even simpler. Null values are ignored in SUM, so there is no reason to convert it to the additive identity, i.e., zero.

SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,Duration)) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 77014 More actions · Answer 5

Part of the problem you were running into was your choice of using PIVOT instead of the more flexible CROSS TAB, which Luis used.

PIVOT is limited to aggregating a single attribute, but you want to aggregate TWO attributes (CallCount and Duration).
PIVOT can only reference one expression for the column definitions, but you needed to reference two (CallType and Duration).

I tend not to use PIVOT at all, because of these limitations.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Luis Cazares SSC Guru Points: 183706 More actions · Answer 6

drew.allen - Wednesday, February 28, 2018 8:42 AM
Luis Cazares - Wednesday, February 28, 2018 7:41 AM
Your query looks so overwhelming when it can be so simple:
SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00'))) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);
EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.
You can make it even simpler. Null values are ignored in SUM, so there is no reason to convert it to the additive identity, i.e., zero.
SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,Duration)) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);
Drew

Except that in some cases, you might get a NULL instead of a zero. It's a matter of what you expect as part of the results.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

jcelko212 32090 SSCrazy Eights Points: 9306 More actions · Answer 7

Tom257 - Wednesday, February 28, 2018 7:19 AM

I've not had to use SQL for a while and I'm having trouble getting to grips with aggregates again.

In the code below I think its This line that is not doing what I expect

SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls

I want to get a report of counts by Day and Type - with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve.
Here is a Sample table
-- DROP TABLE #TmpData; CREATE TABLE #TmpData ( CallDateTime DateTime, Duration Time, CallTypeId Int ) INSERT INTO #TmpData SELECT '2017-08-01 10:14:37.00', '00:00:03.450', 1 UNION ALL SELECT '2017-08-01 10:55:46.00', NULL, 2 UNION ALL SELECT '2017-08-01 11:30:50.00', NULL, 2 UNION ALL SELECT '2017-08-01 12:48:02.00', '00:00:02.500', 1 UNION ALL SELECT '2017-08-01 14:48:24.00', '00:00:02.910', 1 UNION ALL SELECT '2017-08-01 17:05:26.00', '00:00:04.460', 1 UNION ALL SELECT '2017-08-04 02:11:47.00', '00:13:54.790', 4 UNION ALL SELECT '2017-08-06 01:29:13.00', '00:01:18.180', 5 UNION ALL SELECT '2017-08-06 04:33:33.00', NULL, 3 UNION ALL SELECT '2017-08-06 04:33:45.00', '00:00:00.000', 5 UNION ALL SELECT '2017-08-06 04:33:49.00', NULL, 3 UNION ALL SELECT '2017-08-06 04:35:59.00', NULL, 3 UNION ALL SELECT '2017-08-06 05:08:22.00', '00:00:00.000', 5 UNION ALL SELECT '2017-08-06 20:51:28.00', '00:00:03.310', 4 UNION ALL SELECT '2017-08-08 15:49:27.00', '00:00:19.000', 5 UNION ALL SELECT '2017-08-10 15:24:19.00', '00:04:32.790', 4 UNION ALL SELECT '2017-08-10 18:12:30.00', NULL, 2 UNION ALL SELECT '2017-08-10 18:13:01.00', NULL, 3 UNION ALL SELECT '2017-08-10 18:40:06.00', NULL, 2 UNION ALL SELECT '2017-08-10 22:35:13.00', '00:01:33.710', 4 ;

Here is my report so far :
Its wrong! it produces counts of 1 everywhere there should be a count, but some of those 1's should be true counts e.g. Column [1] on the first date [2017-08-01 should be 4
And it seems far too complicated - (I've gone CTE crazy here) - in addition to fixing the counts there has got to be a simpler way

;WITH SplitDates AS ( SELECT CAST(N.CallDateTime AS date) AS CallDate, DATEDIFF(MILLISECOND, '1900-01-01',ISNULL(N.Duration,'00:00:00')) AS [DurationMs], N.CallTypeId, -- Rules - some circumstances are not counted CASE WHEN N.CallTypeID = 4 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0 WHEN N.CallTypeID = 5 AND ISNULL(N.Duration,'00:00:00.000') = '00:00:00.000' THEN 0 ELSE 1 END AS CallCount FROM #TmpData AS N ), DurationByDay AS ( SELECT CallDate, SUM(DurationMs) AS [Total Duration] FROM SplitDates GROUP BY CallDate ), SplitDays AS ( SELECT S.CallTypeId, SUM(S.CallCount) OVER(PARTITION BY S.CallTypeId, S.CallDate) AS NumCalls, S.CallDate FROM SplitDates AS S GROUP BY S.CallDate, S.CallTypeId, S.CallCount ), PivotNumbers AS ( SELECT CallDate, [1], [2], [3], [4], [5] , [Total Duration] -- TODO Convert back to HH:mm:ss here FROM ( SELECT CallTypeId, T.CallDate, NumCalls, D.[Total Duration] FROM SplitDays AS T JOIN DurationByDay AS D ON D.CallDate = T.CallDate ) AS SourceTable PIVOT ( SUM(NumCalls) FOR CallTypeId IN ([1], [2], [3], [4], [5]) ) AS PivotTable ) SELECT * FROM PivotNumbers ORDER BY CallDate

>>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<

actually,you have some more fundamental problems. What you posted is not a table! By definition, a table must have a key and you have no way to ever have a key, because everything is nullable. You then committed a design fallacy called â€œattribute splittingâ€ by putting a date, a timestamp and a time in their own columns. Letâ€™s go ahead and try and correct your design and bring you up to the current SQL standards(well, actually, whatâ€™s been in effect for over 15 years).

Please think about how silly a â€œ<something>_type_idâ€ is; what is your blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (thatâ€™s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math were you going to do on it? What is the square root of your credit cardnumber?

CREATE TABLE Foobars

(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY,

call_end_timestamp DATETIME2(0),

CHECK(call_start_timestamp <= call_end_timestamp ) ,

call_typeCHAR(1) NOT NULL

CHECK(call_type IN (â€˜01â€™, â€˜02â€™. â€˜03â€™, â€™04â€™);

Please use the ANSI/ISO syntax for insertion. Why still use the old Sybase conventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a huge conceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isnâ€™tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server youâ€™re going to need a start and end timestamp (Microsoft has no INTERVAL data type). It would look like this:

INSERT INTO Foobars

VALUES

('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),

('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'),

etc

>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<

UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping.

>>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<

actually,you have some more fundamental problems. What you posted is not atable! By definition, a table must have a key and you have no way to ever have a key, because everything is nullable. You then committed a design fallacy called â€œattribute splittingâ€ by putting a date, a timestamp and a time in their own columns. Letâ€™s go ahead and try and correct your design and bring you up to the current SQL standards(well, actually, whatâ€™s been in effect for over 15 years).

Pleasethink about how silly a â€œ<something>_type_idâ€ is; what isyour blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (thatâ€™s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?

CREATETABLE Foobars

(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY,

call_end_timestamp DATETIME2(0),

CHECK(call_start_timestamp <= call_end_timestamp ) ,

call_typeCHAR(1) NOT NULL

CHECK(call_type IN (â€˜01â€™, â€˜02â€™. â€˜03â€™, â€™04â€™);

Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isnâ€™tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona ontemporal data in SQL. In SQL server youâ€™re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this:

INSERTINTO Foobars

VALUES

('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),

('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'),

etc

>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<

Use CAST (call_start_timestamp AS DATE) AS all_date for your grouping.

Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isnâ€™tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona ontemporal data in SQL. In SQL server youâ€™re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this: INSERTINTO FoobarsVALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<actually,you have some more fundamental problems. What you posted is not atable! By definition, a table must have a key and you have no way toever have a key, because everything is nullable. You then committed adesign fallacy called â€œattribute splittingâ€ by putting a date, atimestamp and a time in their own columns. Letâ€™s go ahead and tryand correct your design and bring you up to the current SQLstandards(well, actually, whatâ€™s been in effect for over 15 years).Pleasethink about how silly a â€œ<something>_type_idâ€ is; what isyour blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (thatâ€™s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?CREATETABLE Foobars(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY, call_end_timestamp DATETIME2(0), CHECK(call_start_timestamp <= call_end_timestamp ) , call_typeCHAR(1) NOT NULL CHECK(call_type IN (â€˜01â€™, â€˜02â€™. â€˜03â€™, â€™04â€™);Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isnâ€™tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona ontemporal data in SQL. In SQL server youâ€™re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this: INSERTINTO FoobarsVALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. [/code] etc>I want to get a report[sic:queries are not reports;]of counts by call_dateand call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want tosolve. <<UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. >>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<actually,you have some more fundamental problems. What you posted is not atable! By definition, a table must have a key and you have no way toever have a key, because everything is nullable. You then committed adesign fallacy called â€œattribute splittingâ€ by putting a date, atimestamp and a time in their own columns. Letâ€™s go ahead and tryand correct your design and bring you up to the current SQLstandards(well, actually, whatâ€™s been in effect for over 15 years).Pleasethink about how silly a â€œ<something>_type_idâ€ is; what isyour blood type ID? If you read a book on basic data modeling, youwould learn that an attribute property (thatâ€™s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?CREATETABLE Foobars(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY, call_end_timestamp DATETIME2(0), CHECK(call_start_timestamp <= call_end_timestamp ) , call_typeCHAR(1) NOT NULL CHECK(call_type IN (â€˜01â€™, â€˜02â€™. â€˜03â€™, â€™04â€™);Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, theoptimizer can do anything with it. Finally, you made a huge conceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isnâ€™t your fault, so much as the way we measure time. Get a copy of the Rick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server youâ€™re going to need a start and end timestamp (Microsoft has no INTERVAL data type). It wouldlook like this: INSERT INTO Foobars VALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_date and call_type, with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve. <<
UseCAST (call_start_timestamp AS DATE) AS all_date for your grouping. [/code]Please use the ANSI/ISO syntax for insertion. Why still use the old Sybaseconventions? When you use the original Sybase row-at-a time, the optimizer can do anything with it. Finally, you made a hugeconceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isnâ€™tyour fault, so much as the way we measure time. Get a copy of theRick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server youâ€™re going to need a start and end timestamp (Microsoft has no INTERVAL data type). It would look like this: INSERT INTO Foobars VALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_date and call_type, with the added factor of ignoring some items in the count - the sample data is much simplified to show the specific issue I want to solve. <<Use Cast (call_start_timestamp AS DATE) AS all_date for your grouping.

>>I've not had to use SQL for a while and I'm having trouble getting togrips with aggregates again. <<actually,you have some more fundamental problems. What you posted is not a
table! By definition, a table must have a key and you have no way toever have a key, because everything is nullable. You then committed adesign fallacy called â€œattribute splittingâ€ by putting a date, atimestamp and a time in their own columns. Letâ€™s go ahead and tryand correct your design and bring you up to the current SQL standards(well, actually, whatâ€™s been in effect for over 15 years).Pleasethink about how silly a â€œ<something>_type_idâ€ is; what is your blood type ID? If you read a book on basic data modeling, you would learn that an attribute property (thatâ€™s the ISO term for it)can be a type or an ID, but not both in some weird silly hybrid.Also, neither a type nor an identifier can be numeric; what math wereyou going to do on it? What is the square root of your credit cardnumber?

CREATETABLE Foobars(call_start_timestamp DATETIME2(0) NOT NULL PRIMARY KEY, call_end_timestamp DATETIME2(0), CHECK(call_start_timestamp <= call_end_timestamp ) , call_typeCHAR(1) NOT NULL CHECK(call_type IN (â€˜01â€™, â€˜02â€™. â€˜03â€™, â€™04â€™);Pleasethe ANSI/ISO syntax for insertion. Why still use the old Sybase conventions? When you use the original Sybase row-at-a time, the optimizer can do anything with it. Finally, you made a huge conceptual error; TIME data type is not a duration; itâ€™s a clocktime! What youâ€™re doing is fundamentally wrong. This really isn't your fault, so much as the way we measure time. Get a copy of the Rick Snodgrass PDF download from the University of Arizona on temporal data in SQL. In SQL server youâ€™re going to need a startand end timestamp (Microsoft has no INTERVAL data type). It would look like this: INSERTINTO Foobars VALUES('2017-08-0110:55:46', '2017-08-01 10:55:46', '02'),('2017-08-0111:30:50', '2017-08-01 11:30:50', '02'), etc>I want to get a report[sic:queries are not reports;]of counts by call_date and call_type, with the added factor of ignoring some items in the count - thesample data is much simplified to show the specific issue I want to solve. <<Use CAST (call_start_timestamp AS DATE) AS all_date for your grouping.

Please post DDL and follow ANSI/ISO standards when asking for help.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 8

Bravo! You really out did yourself, Joe.
I wonder if anyone can make sense of your post.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Tom257 SSChampion Points: 11543 More actions · Answer 9

@joe Celko

Wow Joe, thanks for the thorough treatise!

In answer: The original table does have a key, I omitted it as part of the simplification process as it plays no part in the output, as I omitted about 30 other columns from the original table.

The data presented is intended to isolate the specific problem I was running into, which enabled Luis Cazares to immediately see where I was going wrong.

The Duration column - You're right, it's not good to store a timespan in a time column - as you can see by converting to milliseconds for the SUM operation - but once again, the data is already there, and it doesn't include an end time. I guess that underlying this some physical device records a number of milliseconds or ticks and some *bright* developer decided to convert that to a time. If I ever get to that part of the system I'll see if its possible to store end time, or the timespan as a number.

CallType is a lookup table - and CallTypeId is a foreign key created automatically by some relationship modelling software - again non under my control and irrelevant to the problem.

And while you're marking my question like some schoolkid's homework 😀 - maybe improve the way you present your comments, I probably missed half of what you said because of repetition and poor formatting. :crazy:

Lynn Pettis SSC Guru Points: 442470 More actions · Answer 10

Tom257 - Thursday, March 1, 2018 1:02 PM
@joe Celko
Wow Joe, thanks for the thorough treatise!
In answer: The original table does have a key, I omitted it as part of the simplification process as it plays no part in the output, as I omitted about 30 other columns from the original table.
The data presented is intended to isolate the specific problem I was running into, which enabled Luis Cazares to immediately see where I was going wrong.
The Duration column - You're right, it's not good to store a timespan in a time column - as you can see by converting to milliseconds for the SUM operation - but once again, the data is already there, and it doesn't include an end time. I guess that underlying this some physical device records a number of milliseconds or ticks and some *bright* developer decided to convert that to a time. If I ever get to that part of the system I'll see if its possible to store end time, or the timespan as a number.
CallType is a lookup table - and CallTypeId is a foreign key created automatically by some relationship modelling software - again non under my control and irrelevant to the problem.
And while you're marking my question like some schoolkid's homework 😀 - maybe improve the way you present your comments, I probably missed half of what you said because of repetition and poor formatting. :crazy:

Sometimes Mr. Celko actually says something worthwhile, just doesn't seem to happen too often. Also, he wears sunglasses at night. Just ignore most of what he says, but watch for those few tidbits that actually make sense.

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 11

Luis Cazares - Wednesday, February 28, 2018 8:50 AM
drew.allen - Wednesday, February 28, 2018 8:42 AM
Luis Cazares - Wednesday, February 28, 2018 7:41 AM
Your query looks so overwhelming when it can be so simple:
SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,ISNULL(Duration,'00:00:00'))) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);
EDIT: I made a mistake on the Types 4 and 5. I have corrected it and made it even simpler.
You can make it even simpler. Null values are ignored in SUM, so there is no reason to convert it to the additive identity, i.e., zero.
SELECT CAST( td.CallDateTime AS DATE), COUNT(CASE WHEN td.CallTypeId = 1 THEN 1 END) AS [1], COUNT(CASE WHEN td.CallTypeId = 2 THEN 1 END) AS [2], COUNT(CASE WHEN td.CallTypeId = 3 THEN 1 END) AS [3], COUNT(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 END) AS [4], COUNT(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 END) AS [5], SUM(DATEDIFF(MILLISECOND, 0,Duration)) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE);
Drew
Except that in some cases, you might get a NULL instead of a zero. It's a matter of what you expect as part of the results.

NULLs can have some really surprising effects on applications, especially when they can a NULL aggregation warning like both of those solutions do.

Warning: Null value is eliminated by an aggregate or other SET operation.

Some applications will interpret such a return as an error and others may include it as a part of the output while others may use it as the first of two result sets and ignore the second result set. My suggestion would be to always avoid producing such an advisory message. Since you really don't want to do this by messing with ANSI WARNINGS settings, you have to do it all in the code even if it makes it a bit longer.

Also, since this appears to be a reporting mechanism for CDRs (Call Detail Records) from a telephone system, I'll also state that the use of DATEDIFF will return an INT datatype which, speaking from experience, can quickly and easily be outstripped when aggregating millisecond call durations even though it'll handle nearly 26 days (25D:20:31:23.647, to be exact) of duration.

With both of those thoughts in mind, here's the code that I recommend for this problem because, even though it's a bit longer, is a lot more bullet proof when it comes to the problems I've cited.
SELECT CallDate = CAST( td.CallDateTime AS DATE) ,[1] = SUM(CASE WHEN td.CallTypeId = 1 THEN 1 ELSE 0 END) ,[2] = SUM(CASE WHEN td.CallTypeId = 2 THEN 1 ELSE 0 END) ,[3] = SUM(CASE WHEN td.CallTypeId = 3 THEN 1 ELSE 0 END) ,[4] = SUM(CASE WHEN td.CallTypeId = 4 AND Duration <> '00:00:00.000' THEN 1 ELSE 0 END) ,[5] = SUM(CASE WHEN td.CallTypeId = 5 AND Duration <> '00:00:00.000' THEN 1 ELSE 0 END) ,[Total Duration] = SUM(ISNULL(DATEDIFF_BIG(ms, 0,Duration),0)) FROM #TmpData AS td GROUP BY CAST( td.CallDateTime AS DATE) ;
@ Tom257 ... the method that all of us have used is an ancient "Black Arts" form of T-SQL known as a CROSS TAB. Microsoft used to include it in Books Online until they introduced the PIVOT function. Here are a couple of articles about CROSS TABs and why (IMHO) CROSS TABs are better than PIVOT.

http://www.sqlservercentral.com/articles/T-SQL/63681/
http://www.sqlservercentral.com/articles/Crosstab/65048/

@ Joe Celko ... You really need to buy a good book on T-SQL and another on the typical functionality of telephone systems.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

jcelko212 32090 SSCrazy Eights Points: 9306 More actions · Answer 12

Tom257 - Thursday, March 1, 2018 1:02 PM

>> Wow Joe, thanks for the thorough treatise! <<

I am always pedantic 🙂

>> In answer: The original table does have a key, I omitted it as part of the simplification process as it plays no part in the output, as I omitted about 30 other columns from the original table. <<

Wrong. The key is part of the very definition of a table and omitting it is an incredibly serious flaw. It says you regarding the relational database as it was a deck of punch cards with no constraints, no schema, etc. stop doing this!

>> The data presented is intended to isolate the specific problem I was running into, which enabled Luis Cazares to immediately see where I was going wrong. <<

Actually Luis was able to successfully guess at what you might have meant thanks to his having a lot of experience with bad postings. About 20 something years ago, I would've made the same guesses. But I got tired of doing that after a few decades. So I stop guessing and started correcting that

>> The Duration column - You're right, it's not good to store a timespan in a time column - as you can see by converting to milliseconds for the SUM operation - but once again, the data is already there, and it doesn't include an end time. I guess that underlying this some physical device records a number of milliseconds or ticks and some *bright* developer decided to convert that to a time. If I ever get to that part of the system I'll see if its possible to store end time, or the timespan as a number. <<

The problem with using a duration is that it's always wrong, because time is a moving thing. If you use the (start timestamp, COALSECE (end timestamp, CURRENT_TIMESTAMP)) idiom, the date range will always be correct at the time of invocation.

>> call_types is a lookup table - and call_type_id is a foreign key created automatically by some relationship modelling software - again not under my control and irrelevant to the problem. <<

There's a good rule of thumb that if the values in such a lookup table are relatively small (whatever that currently means) and static (whatever that means), then use a CHECK( .. IN ..) constraint. The optimizer will be able to immediately use this and not have to do a table join. I'm going to assume you have fewer than 100 call types, so this would fit into that model. However, if this lookup has lots of values, or is dynamic, then use a REFRENCES clause to establish the relationship in the schema. The worst thing you can see a somebody actually using outer joins.

However, there is no such thing as <something>_type_id in RDBMS. The data element can be a type, such as a blood type, employee type, etc. or it can be a "something in particular" kind of identifier. . If you like reading dull, boring standards look at the made metadata standards committee rules and the ISO 11179 standards. Those postfixes are called "attribute properties", and an attribute can have one and only one of them.

>> And while you're marking my question like some schoolkid's homework BigGrin - maybe improve the way you present your comments, I probably missed half of what you said because of repetition and poor formatting. <<

Yeah, I know I'm having trouble with my dictation program. I had to stop typing because my hands are going. It actually takes longer to train the software that I would have thought, but I do a lot of technical stuff. Maybe I should have stuck to do in those Harlequin romancesâ€¦ Someday you'll also have 45 or 50 years in this game, and your hands will be an arthritic mess too 🙁

Please post DDL and follow ANSI/ISO standards when asking for help.