September 24, 2013 at 3:50 am
Hi All - I have a count/grouping problem and for the life of me can't make sense of it. Involved in the query is one table (plus a time-dimension table as a variation) which holds info on enquiries made, each enquiry has its int key and an FK referencing the table which contains info on cases, it's a straight forward one-to many relationship. What I want the query to do is count distinct cases and count enquiries per month. It shouldn't be hard, but I get inconsistent and partly wrong results. Below is the sql with result sets. The first two snipets are variations on achiving the count/group operations in one query, and the third bit looking up each month seperately and puts it in one result set, where the months are in the same order as in the two queries above. The last result set is also the correct one.
I have checked the data and there are no abnormalities, so I must be to do with just not understanding the way sql count and groups. Any help would be most welcome, thanx.
SELECT TOP (100) PERCENT COUNT(DISTINCT CaseID) AS Cases, COUNT( distinct CallID) AS Enquiries, RIGHT(CONVERT(varchar(10), CallDatAdd, 5), 5) as [Month-Year]
FROM tbldCalls
WHERE (CallDatAdd BETWEEN CONVERT(DATETIME, '2013-01-01 00:00:00', 102) AND CONVERT(DATETIME, '2013-06-30 00:00:00', 102))
GROUP BY RIGHT(CONVERT(varchar(10), CallDatAdd, 5), 5)
order by [Month-Year] asc
CasesEnquiriesMonth-Year
72394501-13
64686602-13
77393703-13
67582904-13
75892805-13
63278206-13
SELECT COUNT(DISTINCT tbldCalls.CaseID) AS Cases, COUNT(tbldCalls.CallID) AS Enquiries, dimension_time.Month_Text
FROM dimension_time INNER JOIN
tbldCalls ON CONVERT(date, dimension_time.Day_Timestamp) = CONVERT(date, tbldCalls.CallDatAdd)
WHERE (tbldCalls.CallDatAdd BETWEEN CONVERT(DATETIME, '2013-01-01 00:00:00', 102) AND CONVERT(DATETIME, '2013-06-30 00:00:00', 102))
GROUP BY dimension_time.Month_Text, dimension_time.Month_Key
ORDER BY dimension_time.Month_Key asc
CasesEnquiriesMonth_Text
723945Jan
646866Feb
773937Mar
675829Apr
758928May
632782Jun
SELECT count(distinct CaseID) as Cases, count (callid) as Enquiries
FROM tbldCalls
WHERE (CallDatAdd between CONVERT(DATETIME, '2013-01-01 00:00:00', 102) and CONVERT(DATETIME, '2013-01-31 00:00:00', 102))
Union all
SELECT count(distinct CaseID) as FebCases, count (callid) as Enquiries
FROM tbldCalls
WHERE (CallDatAdd between CONVERT(DATETIME, '2013-02-01 00:00:00', 102) and CONVERT(DATETIME, '2013-02-28 00:00:00', 102))
Union all
SELECT count(distinct CaseID) as MarchCases, count (callid) as Enquiries
FROM tbldCalls
WHERE (CallDatAdd between CONVERT(DATETIME, '2013-03-01 00:00:00', 102) and CONVERT(DATETIME, '2013-03-31 00:00:00', 102))
Union all
SELECT count(distinct CaseID) as AprilCases, count (callid) as Enquiries
FROM tbldCalls
WHERE (CallDatAdd between CONVERT(DATETIME, '2013-04-01 00:00:00', 102) and CONVERT(DATETIME, '2013-04-30 00:00:00', 102))
Union all
SELECT count(distinct CaseID) as MayCases, count (callid) as Enquiries
FROM tbldCalls
WHERE (CallDatAdd between CONVERT(DATETIME, '2013-05-01 00:00:00', 102) and CONVERT(DATETIME, '2013-05-31 00:00:00', 102))
Union all
SELECT count(distinct CaseID) as JuneCases, count (callid) as Enquiries
FROM tbldCalls
WHERE (CallDatAdd between CONVERT(DATETIME, '2013-06-01 00:00:00', 102) and CONVERT(DATETIME, '2013-06-30 00:00:00', 102))
CasesEnquiries
680888
618825
773937
630772
678831
632782
September 24, 2013 at 6:26 am
DOH!!!!
the union all queries should look up the date as for example
BETWEEN CONVERT(DATETIME, '2012-12-01 00:00:00', 102) AND CONVERT(DATETIME, '2012-12-31 23:59:59', 102)
as the day ends 23:59:59 ... :blush:
September 25, 2013 at 6:58 am
Things to keep in mind:
- the BETWEEN operator is the logical equivalent of ">= and <=".
- the DATETIME datatype has precision down to fractions of a second.
- when you define a DATETIME without specifying the time a time of 00:00:00.000 is assumed.
When you are dealing with DATETIME ranges you should use the ">= AND <" construct to avoid excluding or double-counting edge values. For example:
TestDate >= '01/01/2013' and TestDate < '02/01/2013'
The reason is that if you try to specify a <= value you would have to be much more precise to avoid excluding any data. For example:
TestDate >= '01/01/2013' and TestDate <= '02/01/2013 23:59:59.999'
If you use the pattern:
TestDate >= '01/01/2013' and TestDate <= 01/31/2013'
TestDate >= 02/01/2013' and TestDate <= 02/28/2013'
You will exclude any records with values between '01/31/2013 00:00:00.001' and '01/31/2013 32:59:59.999'.
If you use the pattern:
TestDate >= '01/01/2013' and TestDate <= 02/01/2013'
TestDate >= 02/01/2013' and TestDate <= 03/01/2013'
You will double-count any records with a value of '02/01/2013 00:00:00.000'.
By using the ">= and <" construct you are clearly splitting values into one group or another.
September 26, 2013 at 6:27 am
Thanx paul.s.lach, makes sense.
Some of my parameters are passed by an access front end and I hadn't realised that passing a date only parameter would result in an 00:00:000 assumption (we don't really need anything more accurate than date when searching). Have rewritten my procs now and gotten rid of the 'between ... and' logic alltogether. Thanx for the tip!
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply