June 24, 2018 at 11:25 pm
Hi there,
I have been stuck on this for days and I need submitting this report in the next two days.
This is the data I am working with:
And this is what I need:
Essentially, If I am reporting for June I need to have all the months even for those institutions who only have data as of February or March perhaps and need to add the same totals for those missing months as the total of the last month.
Anyone can help please?
Here is the SQL code to create the data:
CREATE TABLE #data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
GO
INSERT INTO #data
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GO
SELECT * FROM #data
June 25, 2018 at 7:33 am
Something like this would work. Please note, I personally would refine this - I hate looping, I would make it more configurable (i.e. tweaking parameters) and I would investigate why the LAST_VALUE function was not working as I expected. Anyway, I didn't have more than a few minutes to spend on this.
Best wishes on this!
CREATE TABLE #data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
GO
INSERT INTO #data
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GO
declare
@ReportDateLimitEnd date = '20180601'
, @ReportDateLimitStart date = '20180101'
;
declare @DatePeriods table (ReportingDate Date, Institution varchar(4));
while @ReportDateLimitEnd >= @ReportDateLimitStart
begin
insert into @DatePeriods (ReportingDate, Institution)
select distinct @ReportDateLimitStart, Institution
from #data;
set @ReportDateLimitStart = dateadd(month,1,@ReportDateLimitStart);
end
select
dp.ReportingDate
, dp.Institution
, case when d.Totals is not null then d.Totals else lt.LastTotal end as Totals
from
@DatePeriods dp
left join #data d on dp.ReportingDate = d.ReportingMonth and dp.Institution = d.Institution
left join (
select distinct d.Institution, FIRST_VALUE(d.Totals) over (partition by d.Institution order by d.ReportingMonth desc) as LastTotal
from #data d
) lt on lt.Institution = dp.Institution
order by
dp.Institution, dp.ReportingDate
;
drop table #data
June 25, 2018 at 7:39 am
June 25, 2018 at 7:44 am
Below is a solution without looping. Although I guess this is still not the most optimal way... it's the best I could come up with for now.
if OBJECT_ID('tempdb..#data') is not null
drop table #data
CREATE TABLE #data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
GO
-- create a temp table with the sample data
INSERT INTO #data
VALUES
('20180101','Ins1',32),
('20180201','Ins1',37),
('20180301','Ins1',32),
('20180401','Ins1',40),
('20180501','Ins1',38),
('20180601','Ins1',37),
('20180101','Ins2',145),
('20180201','Ins2',155),
('20180101','Ins3',1489),
('20180201','Ins3',1487),
('20180301','Ins3',1485);
GO
-- create a TALLY table to list all the required dates
if OBJECT_ID('tempdb..#tally_date') is not null
drop table #tally_date
CREATE TABLE #tally_date (AllMonths datetime)
INSERT INTO #tally_date
VALUES
('20180101'),
('20180201'),
('20180301'),
('20180401'),
('20180501'),
('20180601'),
('20180701'),
('20180801'),
('20180901'),
('20181001'),
('20181101'),
('20181201')
;
select all_institutions.Institute as Institution
, #tally_date.AllMonths
, COALESCE(#data.Totals, last_reported.Totals) AS Totals
from #tally_date
-- use OUTER APLLY to get all months for each Institution
outer apply (select distinct Institution as Institute from #data) all_institutions
left join #data
on #tally_date.AllMonths = #data.ReportingMonth
and all_institutions.Institute = #data.Institution
-- use CROSS APPLY to get the last reported value of each Institution
cross apply (select top 1 Totals
from #data
where Institution = Institute
order by ReportingMonth desc
) as last_reported
where AllMonths < '20180701'
order by all_institutions.Institute
, AllMonths
;
June 25, 2018 at 7:44 am
The best option would be to use a time/date dimension table if you have one. If you don't, you can easily find one. Using it would avoid the loop. I forgot to mention that.
With this table, you would have a column for the day number of the month. You could then query the table like this, "give me all dates where day is 1 and dates are between @DateRangeStart and @DateRangeEnd"
June 25, 2018 at 11:45 am
Still need to learn LEAD/LAG but would think one of these would be something to use with SQL Server 2012 and newer.
June 25, 2018 at 12:36 pm
romina - Monday, June 25, 2018 8:28 AMThank you HanShi
Looking into your solution too.You guys have no idea how much I appreciate you taking the time to help me out with this. It's been so frustrating ... it looks simple and yet it hasn't been!
Thanks
A solution using analytical function max() over(order by). This looks at all the available dates for an institution and generates 12 rows one for each month. I have slightly modified the input DML statements to test various scenarios. More complicated than the earlier solutions i guess but made more generic
CREATE TABLE data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
GO
INSERT INTO data
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-09-01','Ins1',40),
('2019-09-01','Ins1',42),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GO
with generate_12_months
as (
select dateadd(month
,y.rnk-1
,cast(concat(year_val,'-','01','-','01') as date)
) as running_months
,x.institution
from (select top 12
row_number() over(order by (select null)) as rnk
from sys.objects
)y
join (select distinct /* for each distinct year in a institution generates 12 months*/
datepart(year,reportingmonth) as year_val
,Institution
from data
)x
on 1=1
)
,interim_data
as (
select cd.running_months
,cd.institution
,d.totals
,d.reportingmonth
from generate_12_months cd
left join data d
on cd.running_months=d.reportingmonth
and cd.institution=d.institution
)
,interim_data2
as(/*the max reportingmonth gets the first not null value of the month ordered by running_months*/
select *
,max(x.reportingmonth) over(partition by x.institution order by x.running_months) as max_val
from interim_data x
)
select *
,max(y.totals) over(partition by y.institution,y.max_val) as not_null_totals
from interim_data2 y
June 25, 2018 at 1:18 pm
Lynn Pettis - Monday, June 25, 2018 11:45 AMStill need to learn LEAD/LAG but would think one of these would be something to use with SQL Server 2012 and newer.
if OBJECT_ID('tempdb..#data') is not null
drop table #data
CREATE TABLE #data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
-- create a temp table with the sample data
INSERT INTO #data
VALUES
('20180101','Ins1',32),
('20180201','Ins1',37),
('20180301','Ins1',32),
('20180401','Ins1',40),
('20180501','Ins1',38),
('20180601','Ins1',37),
('20180101','Ins2',145),
('20180201','Ins2',155),
('20180101','Ins3',1489),
('20180201','Ins3',1487),
('20180301','Ins3',1485);
-- create a TALLY table to list all the required dates
if OBJECT_ID('tempdb..#tally_date') is not null
drop table #tally_date
CREATE TABLE #tally_date (AllMonths datetime)
INSERT INTO #tally_date
VALUES
('20180101'),
('20180201'),
('20180301'),
('20180401'),
('20180501'),
('20180601'),
('20180701'),
('20180801'),
('20180901'),
('20181001'),
('20181101'),
('20181201')
;
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 25, 2018 at 3:03 pm
Here is another option:
Declare @startDate date = '2018-01-01'
, @endDate date = '2018-06-01';
With reportDates
As (
Select ReportDate = dateadd(month, -d.rn, @endDate)
From (Select row_number() over(Order By ac.[object_id]) - 1 As rn From sys.all_columns ac) As d
Where dateadd(month, -d.rn, @endDate) >= @startDate
)
, distinctInstitutions
As (
Select Distinct
d.Institution
From #data d
)
Select r.ReportDate
, i.Institution
, Totals = coalesce(d.Totals, t.Totals, 0)
From reportDates r
Cross Join distinctInstitutions i
Left Join #data d On d.ReportingMonth = r.ReportDate
And d.Institution = i.Institution
Outer Apply (Select Top 1
d3.Totals
From #data d3
Where d3.ReportingMonth < r.ReportDate
And d3.Institution = i.Institution
Order By
d3.ReportingMonth desc) t
Order By
i.Institution
, r.ReportDate;
This builds the reporting dates based on the start/end dates you define - this also assumes that you will always have data from the beginning of the date range. If you don't have data on the first date for each institution those will return a 0 until you have data available.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
June 27, 2018 at 9:10 am
drop table temp1;
create table temp1 ([date] date,Institution varchar(50),Totals int)
INSERT INTO temp1
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GO
drop table temp2
create table temp2 ([date] date)
declare @startDate date,@endDate date
set @startDate = '2018-01-01'
set @endDate = '2018-06-01'
truncate table temp2
while (@startDate <= @endDate)
begin
insert into temp2 values(@startDate)
set @startDate = dateadd(dd,1,Eomonth(@startDate))
end
drop table #ReportDtl
;with cte_tmp as (select * from temp2, (select distinct institution from temp1 where date<=@endDate) t)
select t2.[date],t2.institution,t1.Totals into #ReportDtl
from temp1 t1 right outer join cte_tmp t2
on t1.[date] = t2.[date] and t1.institution = t2.institution order by 2,1
UPDATE sf
SET sf.Totals = sf3.Totals
FROM #ReportDtl sf3,#ReportDtl sf
WHERE sf3.date = (SELECT MAX(date) FROM #ReportDtl sf2 WHERE sf2.date <
sf.date and sf2.Totals IS NOT NULL AND sf2.institution = sf.institution)
and sf3.institution = sf.institution
and sf.Totals IS NULL;
select * from #ReportDtl;
June 27, 2018 at 9:26 am
Sudhakar Ramakrishnan - Wednesday, June 27, 2018 9:10 AMdrop table temp1;
create table temp1 ([date] date,Institution varchar(50),Totals int)
INSERT INTO temp1
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GO
drop table temp2
create table temp2 ([date] date)
declare @startDate date,@endDate date
set @startDate = '2018-01-01'
set @endDate = '2018-06-01'
truncate table temp2
while (@startDate <= @endDate)
begin
insert into temp2 values(@startDate)
set @startDate = dateadd(dd,1,Eomonth(@startDate))
end
drop table #ReportDtl
;with cte_tmp as (select * from temp2, (select distinct institution from temp1 where date<=@endDate) t)
select t2.[date],t2.institution,t1.Totals into #ReportDtl
from temp1 t1 right outer join cte_tmp t2
on t1.[date] = t2.[date] and t1.institution = t2.institution order by 2,1
UPDATE sf
SET sf.Totals = sf3.Totals
FROM #ReportDtl sf3,#ReportDtl sf
WHERE sf3.date = (SELECT MAX(date) FROM #ReportDtl sf2 WHERE sf2.date <
sf.date and sf2.Totals IS NOT NULL AND sf2.institution = sf.institution)
and sf3.institution = sf.institution
and sf.Totals IS NULL;
select * from #ReportDtl;
WHILE loops tend to perform horribly. There were several solutions offered earlier in the thread which did not require a WHILE loop.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 27, 2018 at 10:49 am
Perhaps the simplest way to demonstrate what is going on is:
If you need to show each month for each institution, regardless of any activity for given institution in given month,
you need to generate data set where each month is combined with each institution.
First, we need a table with all the months we want to report for:?
IF Object_ID( 'tempdb..#ReportingMonths' ) IS NOT NULL
DROP TABLE #ReportingMonths
;
GO
CREATE TABLE #ReportingMonths
( ReportingMonth datetime NOT NULL )
GO
INSERT INTO #ReportingMonths
VALUES
('2018-01-01'),
('2018-02-01'),
('2018-03-01'),
('2018-04-01'),
('2018-05-01'),
('2018-06-01')
;
GO
SELECT * FROM #ReportingMonths
;
GO
Now you need to combine each row from #ReportingMonths with each Institution. I assume you have a table
where you keep Institutions, one row per Institution, similar to this:--2018-Jun-27 12:29:07--
IF Object_ID( 'tempdb..#Institution' ) IS NOT NULL
DROP TABLE #Institutions
;
GO
SELECT DISTINCT
Institution
INTO #Institutions
FROM #Data
;
GO
SELECT * FROM #Institutions;
GO
Following query will combine each row from #ReportingMonths with each row from #Institutions:
SELECT
D.Institution
, M.ReportingMonth
FROM #ReportingMonths AS M
, #Institutions AS D
;
The final query would look like this:WITH CrossJoinPart
AS
(
SELECT
D.Institution
, M.ReportingMonth
FROM #ReportingMonths AS M
, #Institutions AS D
)
SELECT
X.Institution
, X.ReportingMonth
, D.Totals
FROM CrossJoinPart AS X
LEFT JOIN #Data AS D
ON X.ReportingMonth = D.ReportingMonth
AND X.Institution = D.Institution
ORDER BY Institution , ReportingMonth
;
No loops, no Window functions, no APPLY.
🙂
June 28, 2018 at 9:30 am
romina - Sunday, June 24, 2018 11:25 PMHi there,I have been stuck on this for days and I need submitting this report in the next two days.
This is the data I am working with:
And this is what I need:
Essentially, If I am reporting for June I need to have all the months even for those institutions who only have data as of February or March perhaps and need to add the same totals for those missing months as the total of the last month.
Anyone can help please?
Here is the SQL code to create the data:
CREATE TABLE #data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
GOINSERT INTO #data
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GOSELECT * FROM #data
How about we go entirely dynamic instead of hard coding the needed dates? I assumed that the date values already in the table set the range of dates by just taking the minimum and maximum values.CREATE TABLE #data (
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
INSERT INTO #data (ReportingMonth, Institution, Totals)
VALUES ('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
SELECT *
FROM #data;
DECLARE @StartMonth AS date,
@EndMonth AS date;
SELECT @StartMonth = MIN(ReportingMonth),
@EndMonth = MAX(ReportingMonth)
FROM #data;
WITH Numbers AS (
SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
DATES AS (
SELECT TOP (DATEDIFF(month, @StartMonth, @EndMonth) + 1)
DATEADD(month, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @StartMonth) AS ReportingMonth
FROM Numbers AS N1
CROSS JOIN Numbers AS N2
),
Institutions AS (
SELECT Institution, MAX(ReportingMonth) AS MaxMonth
FROM #data
GROUP BY Institution
),
MAX_MONTH_TOTALS AS (
SELECT D.Institution, I.MaxMonth, D.Totals
FROM #data AS D
INNER JOIN Institutions AS I
ON D.Institution = I.Institution
AND D.ReportingMonth = I.MaxMonth
)
SELECT
DT.ReportingMonth,
I.Institution,
CASE
WHEN DT.ReportingMonth > MMT.MaxMonth THEN MMT.Totals
ELSE D.Totals
END
FROM DATES AS DT
CROSS JOIN Institutions AS I
INNER JOIN MAX_MONTH_TOTALS AS MMT
ON I.Institution = MMT.Institution
LEFT OUTER JOIN #data AS D
ON DT.ReportingMonth = D.ReportingMonth
AND I.Institution = D.Institution
ORDER BY
I.Institution,
DT.ReportingMonth;
DROP TABLE #data;
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
June 28, 2018 at 12:15 pm
sgmunson - Thursday, June 28, 2018 9:30 AMromina - Sunday, June 24, 2018 11:25 PMHi there,I have been stuck on this for days and I need submitting this report in the next two days.
This is the data I am working with:
And this is what I need:
Essentially, If I am reporting for June I need to have all the months even for those institutions who only have data as of February or March perhaps and need to add the same totals for those missing months as the total of the last month.
Anyone can help please?
Here is the SQL code to create the data:
CREATE TABLE #data
(
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
GOINSERT INTO #data
VALUES
('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);
GOSELECT * FROM #data
How about we go entirely dynamic instead of hard coding the needed dates? I assumed that the date values already in the table set the range of dates by just taking the minimum and maximum values.
CREATE TABLE #data (
ReportingMonth datetime,
Institution varchar(4),
Totals integer
);
INSERT INTO #data (ReportingMonth, Institution, Totals)
VALUES ('2018-01-01','Ins1',32),
('2018-02-01','Ins1',37),
('2018-03-01','Ins1',32),
('2018-04-01','Ins1',40),
('2018-05-01','Ins1',38),
('2018-06-01','Ins1',37),
('2018-01-01','Ins2',145),
('2018-02-01','Ins2',155),
('2018-01-01','Ins3',1489),
('2018-02-01','Ins3',1487),
('2018-03-01','Ins3',1485);SELECT *
FROM #data;DECLARE @StartMonth AS date,
@EndMonth AS date;SELECT @StartMonth = MIN(ReportingMonth),
@EndMonth = MAX(ReportingMonth)
FROM #data;WITH Numbers AS (
SELECT 1 AS N UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
DATES AS (SELECT TOP (DATEDIFF(month, @StartMonth, @EndMonth) + 1)
DATEADD(month, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1, @StartMonth) AS ReportingMonth
FROM Numbers AS N1
CROSS JOIN Numbers AS N2
),
Institutions AS (SELECT Institution, MAX(ReportingMonth) AS MaxMonth
FROM #data
GROUP BY Institution
),
MAX_MONTH_TOTALS AS (SELECT D.Institution, I.MaxMonth, D.Totals
FROM #data AS D
INNER JOIN Institutions AS I
ON D.Institution = I.Institution
AND D.ReportingMonth = I.MaxMonth
)
SELECT
DT.ReportingMonth,
I.Institution,
CASE
WHEN DT.ReportingMonth > MMT.MaxMonth THEN MMT.Totals
ELSE D.Totals
END
FROM DATES AS DT
CROSS JOIN Institutions AS I
INNER JOIN MAX_MONTH_TOTALS AS MMT
ON I.Institution = MMT.Institution
LEFT OUTER JOIN #data AS D
ON DT.ReportingMonth = D.ReportingMonth
AND I.Institution = D.Institution
ORDER BY
I.Institution,
DT.ReportingMonth;DROP TABLE #data;
The problem with this approach is that it requires at least three scans of the table: once for determining the institutions; once for determining the min/max dates; and once for the final results. Here is a test approach that only requires one scan. You may need to increase the size of the tally table if you have gaps larger than 10 months.
;
WITH
Tally AS
(
SELECT ROW_NUMBER() OVER(ORDER BY @@VERSION) - 1 AS n FROM (VALUES(0), (0), (0), (0), (0), (0), (0), (0), (0), (0)) c(n)
)
, ReportingIntervals AS
(
SELECT
Institution,
Totals,
ReportingMonth,
DATEDIFF(MONTH, ReportingMonth, LEAD(ReportingMonth, 1, DATEADD(MONTH, 1, GETDATE())) OVER(PARTITION BY Institution ORDER BY ReportingMonth)) AS ReportMonthDuration
FROM #data
)
SELECT DATEADD(MONTH, n, ReportingMonth) AS ReportingMonth, Institution, Totals
FROM ReportingIntervals ri
INNER JOIN Tally t
ON t.n < ri.ReportMonthDuration
;
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply