August 17, 2009 at 5:57 am
Here's an interesting little problem that I'm trying to solve ...
Basically, I have a table of registrations and subscribers. What I want is to have a report, with the data presented by month. For each month, there should be a record with the amount of subscribers we had up to that date.
So, as an example set of data,
DECLARE @TestTable TABLE
(
ID INT PRIMARY KEY IDENTITY,
DateCreated SMALLDATETIME,
Subscriber BIT
)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-01-01', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-02-01', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-02-02', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-04-05', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-05-05', 0)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-05-07', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-05-09', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-10-01', 0)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-11-01', 1)
INSERT INTO @TestTable(DateCreated, Subscriber)
VALUES('2009-12-01', 1)
IDDateCreatedSubscriber
12009-01-01 00:00:001
22009-02-01 00:00:001
32009-02-02 00:00:001
42009-04-05 00:00:001
52009-05-05 00:00:000
62009-05-07 00:00:001
72009-05-09 00:00:001
82009-10-01 00:00:000
92009-11-01 00:00:001
102009-12-01 00:00:001
And, the result set should be:
YearMonthCount
2009January1
2009Februray3
2009March3
2009April4
2009May6
2009June6
2009July6
2009August6
2009September6
2009October6
2009November7
2009December8
Now, here's the issue I'm trying to solve. I know this would be "easy" to do by just running a "SELECT COUNT(*) FROM TABLE WHERE dateCreated < @CurrentDate AND Subscriber = 1" for each month and inserting a record into a temp table. But what I'm trying to accomplish is to avoid having to query the entire table several times, especially since I'm potentially dealing with a non-indexed field. What I want to do is to basically run the query once, get all the data, and then do some sort of join or aggregate for each month that's requested. something like "SELECT SUM(CASE WHEN dateCreated < @CurrentDate THEN 1 ELSE 0) FROM TABLE GROUP BY dateCreated", except only showing the months that I'm interested in. One solution that I've considered is making a temp table, dumping all the records from my raw table into that temp table, then indexing the temp table accordingly and re-querying it repeatedly for each month, but I'd still like to find a solution that would involve only querying the table once, in whatever form I choose.
August 17, 2009 at 7:22 am
I assume you have something like a calendar table...
If not, I've included a calendar (CTE_calendar) in the code below.
Also, please note that I've used a temp table with a PK constraint to store the intermediate result.
The reason is as simple as it is complicated:
Jeff Moden's article [/url] shows how the "quirky update" works (at least in the attached SQL code).
DECLARE @TestTable TABLE
(
ID INT PRIMARY KEY IDENTITY,
DateCreated SMALLDATETIME,
Subscriber BIT
)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-01-01', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-02-01', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-02-02', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-04-05', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-05-05', 0)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-05-07', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-05-09', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-10-01', 0)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-11-01', 1)
INSERT INTO @TestTable (DateCreated, Subscriber)
VALUES ('2009-12-01', 1)
SET NOCOUNT ON
-- Declare intermedite table to hold the results and to perform the running total on
CREATE TABLE #test2 (ID INT IDENTITY(1,1), datevalue DATETIME, cnt INT, rn_total INT,
CONSTRAINT PK_#test2_ID PRIMARY KEY (ID)
)
; WITH
CTE_calendar AS -- build calendar table, assuming it doesn't exist
(
SELECT DATEADD(mm, RowNum-1,CONVERT(DateTime, '20090101', 112))AS datevalue
FROM
(
SELECT row_number() OVER (ORDER BY object_id) AS RowNum FROM sys.all_objects
) RowTable
WHERE RowNum <= DATEDIFF(Month,'20090101','20091231') + 1
),
CTE_monthlyData AS --build monthly total of rows with [subscriber] = 1
(
SELECT
DATEADD(mm, DATEDIFF(mm, 0, CAST(DateCreated AS DATETIME)), 0) AS CREATED,
COUNT(*) AS CNT
FROM @TestTable TT
WHERE Subscriber = 1
GROUP BY DATEADD(mm, DATEDIFF(mm, 0, CAST(DateCreated AS DATETIME)), 0)
)
INSERT INTO #test2 (datevalue, CNT, rn_total) -- fill intermdiate table
SELECT datevalue, ISNULL(CNT,0),0
FROM CTE_calendar
LEFT OUTER JOIN CTE_monthlyData ON datevalue = CREATED
ORDER BY datevalue
DECLARE @run INT-- used for adding the running total
SET @run = 0
UPDATE #test2-- calculate the running total
SET @run = rn_total = CNT + @run
FROM #test2 WITH (INDEX(PK_#test2_ID),TABLOCKX)
-- print requested result
SELECT
YEAR(datevalue) AS Year,
DATENAME(MONTH, datevalue) AS Month,
rn_total AS COUNT
FROM #test2
-- cleanup
DROP TABLE #test2
/*result set:
YearMonthCOUNT
2009January1
2009February3
2009March3
2009April4
2009May6
2009June6
2009July6
2009August6
2009September6
2009October6
2009November7
2009December8
*/
August 17, 2009 at 7:39 am
So, let me see if I understand this right.
What you're essentially doing is, getting a total of the subscribers on a monthly basis, and then, using the results stored into a temp table, you're getting a running total of the sum of the monthly data, and updating the temp table "sum" field with that running total.
Creative. Very creative. I like it, thanks!
August 17, 2009 at 9:27 am
kramaswamy (8/17/2009)
So, let me see if I understand this right.What you're essentially doing is, getting a total of the subscribers on a monthly basis, and then, using the results stored into a temp table, you're getting a running total of the sum of the monthly data, and updating the temp table "sum" field with that running total.
Creative. Very creative. I like it, thanks!
Youre understanding is 100% correct. Glad you like the solution! 🙂
But I have to admit that the only creative part from my side was to apply a solution I learned here at SCC (based on Jeff's article I mentioned in my previous post). So, if there'd be any creativity award for this solution, it belongs to Jeff. 😉
August 17, 2009 at 10:04 am
A few more things to add:
#1) I forgot to thank you for the way you've provided sample data and that you took the time to dive into the solution I proposed in order to understand what's going on! Good job!
#2) You've mentioned that you're "potentially dealing with a non-indexed field". I assume you're talking about the subscriber column. What's the reason for not having it as included column (preferred on an index having the DateCreated column as first index column)?
If the subscriber column can either be 0 or 1 then you could use SUM(subscriber) instead of COUNT(*) and the WHERE clause... - assuming you can change the data type from BIT to say TINYINT (mabe with a constraint to only allow 0,1 and NULL)
#3) The CAST part of my code might not be required, when I look at the code "from a distance"...
#4) If you're having a calendar table with separate month column then you wouldn't even need the date conversion function. This should also help performance...
Here's an alternative solution considering #3 and #4 (assuming same sample data as before):
SET NOCOUNT ON
-- Create intermedite table to hold the results and to perform the running total on
CREATE TABLE #test2 (ID INT IDENTITY(1,1), MonthValue DATETIME, cnt INT, rn_total INT,
CONSTRAINT PK_#test2_ID PRIMARY KEY NONCLUSTERED (ID)
)
; WITH
CTE_calendar AS -- build calendar table, assuming it doesn't exist
(
SELECT
CONVERT(DateTime, '20090101', 112) + RowNum - 1 AS datevalue,
DATEADD(mm, DATEDIFF(mm, 0, CONVERT(DateTime, '20090101', 112) + RowNum - 1), 0) AS MonthValue
FROM
(
SELECT row_number() OVER (ORDER BY object_id) AS RowNum FROM sys.all_objects
) RowTable
WHERE RowNum <= DATEDIFF(dd,'20090101','20091231') + 1
)
INSERT INTO #test2 (MonthValue, CNT, rn_total) -- fill intermdiate table
SELECT MonthValue, SUM(ISNULL(CAST(Subscriber AS INT),0)),0-- CAST only required if data type change from BIT to integer type is not an option.
FROM CTE_calendar
LEFT OUTER JOIN @TestTable TestTable ON datevalue = DateCreated
GROUP BY MonthValue
ORDER BY MonthValue
DECLARE @run INT-- used for adding the running total
SET @run = 0
UPDATE #test2-- calculate the running total
SET @run = rn_total = CNT + @run
FROM #test2 WITH (INDEX(PK_#test2_ID),TABLOCKX)
-- print requested result
SELECT
YEAR(MonthValue) AS Year,
DATENAME(MONTH, MonthValue) AS Month,
rn_total AS COUNT
FROM #test2
-- cleanup
DROP TABLE #test2
August 17, 2009 at 10:09 am
Yeah, having tried to help some other people out here with incomplete data to work with, I know how annoying that can be, so I do what I can 😛
The reason why I mentioned potentially non-indexed columns is because I didn't give a complete example case, the actual situation involves some more intricate querying with joins and stuff, but which wasn't really applicable to the problem at hand.
I've got the date part handled by creating a year and month temp table based on the values which we need to use for the query, so I'm pretty much set now. Thanks alot for the help~
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply