Pivot Data based on the date range

Question

Pivot Data based on the date range

Sandeeppunreddy2

SSC Veteran

Points: 203
More actions
March 25, 2020 at 5:18 pm
Go to Answer
#3736992

Hi All,
I have a dataset which has information about students and their registered courses. following is a temp table with a sample structure and data.
DROP TABLE IF EXISTS #StudentData
CREATE TABLE #StudentData (StudentID INT,CourseType VARCHAR(50),CourseID INT,SubscriptionStartDate DATE, SubscriptioneEndDate DATE)
INSERT INTO #StudentData VALUES (1,'Basic',50,'10-15-2016','10-15-2017')
INSERT INTO #StudentData VALUES (1,'Advanced',72,'10-15-2016','2018-12-31')
INSERT INTO #StudentData VALUES (1,'Basic',50,'11-13-2017','12-31-2018')
INSERT INTO #StudentData VALUES (1,'Intermediate',66,'01-10-2017','07-30-2017')
INSERT INTO #StudentData VALUES (1,'Intermediate',68,'08-01-2017','12-30-2017')
INSERT INTO #StudentData VALUES (1,'Advanced',85,'01-10-2019',NULL)
INSERT INTO #StudentData VALUES (1,'Basic',57,'10-15-2020','10-15-2021')
SELECT * FROM #StudentData
ORDER BY CourseType,SubscriptionStartDate
I am trying to create a query that can group the data for different bands(based on the start and end dates), folloiwng is the desired output for the sample provided above.
DROP TABLE IF EXISTS #DesiredOutput
CREATE TABLE #DesiredOutput (StudentID INT,CourseType VARCHAR(50),[16-17] VARCHAR(50),[17-18] VARCHAR(50),[18-19] VARCHAR(50),[19-20] VARCHAR(50),[20-21] VARCHAR(50))
INSERT INTO #DesiredOutput VALUES (1,'Basic',50,50,null,null,57)
INSERT INTO #DesiredOutput VALUES (1,'Intermediate',null,'66,68',null,null,null)
INSERT INTO #DesiredOutput VALUES (1,'Advanced',72,72,null,85,85)
SELECT * FROM #DesiredOutput
I tried Pivot and was trying to use case whens but had no luck and hoping someone can help me. any help is highly appreciated.
Thanks in advance.

Jeff Moden

SSC Guru

Points: 1004704
More actions
March 26, 2020 at 5:48 am
Answer
#3737091
Consider that when we run your code, it produces 68 reads (pages), which equates to 557,056 bytes of memory I/O just to handle 7 very narrow rows of data for one student. Here's the output from statistics I/O...
Table 'Worktable'. Scan count 2, logical reads 67, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#StudentData________________________________________________________________________________________________________00000000001B'.
Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using the fnTally function (see my signature line below for a like-named link) and some ancient "Black Arts" T-SQL known as a CROSSTAB, we can get it down 1 read. It also gets rid of the NULLs and sorts the output in a more naturally expected order.
```
WITH
cteDates AS
(--==== Create all the year numbers for each date range.
 SELECT DISTINCT
         StudentID, CourseType, CourseID
        ,SubYear = DATENAME(yy,DATEADD(yy,t.N,SubscriptionStartDate))
   FROM #StudentData sd
  CROSS APPLY dbo.fnTally(0,DATEDIFF(yy,SubscriptionStartDate,ISNULL(SubscriptioneEndDate,GETDATE()))) t
)
,ctePreAgg AS  
(--==== Aggregate multiple courses per year for each student and course type
 SELECT  StudentID, CourseType, SubYear
        ,CourseIDs = STRING_AGG(CourseID,',') WITHIN GROUP (ORDER BY CourseID)
   FROM cteDates
  GROUP BY StudentID, CourseType, SubYear
)--==== This ancient art form is known as a CROSSTAB
 SELECT  StudentID 
        ,CourseType
        ,[2016] = MAX(CASE WHEN SubYear = 2016 THEN CourseIDs ELSE '' END)
        ,[2017] = MAX(CASE WHEN SubYear = 2017 THEN CourseIDs ELSE '' END)
        ,[2018] = MAX(CASE WHEN SubYear = 2018 THEN CourseIDs ELSE '' END)
        ,[2019] = MAX(CASE WHEN SubYear = 2019 THEN CourseIDs ELSE '' END)
        ,[2020] = MAX(CASE WHEN SubYear = 2020 THEN CourseIDs ELSE '' END)
        ,[2021] = MAX(CASE WHEN SubYear = 2021 THEN CourseIDs ELSE '' END)
   FROM ctePreAgg
  GROUP BY StudentID,CourseType
  ORDER BY StudentID, CASE CourseType WHEN 'Basic' THEN 1 WHEN 'Intermediate' THEN 2 ELSE 3 END
;
```
Here's the output from statistics I/O...
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#StudentData________________________________________________________________________________________________________00000000001B'.
Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Personally, I try to avoid incremental rCTEs and PIVOTs at all costs. Please see the following articles for why (performance, of course!).
https://www.sqlservercentral.com/articles/hidden-rbar-counting-with-recursive-ctes
https://www.sqlservercentral.com/articles/cross-tabs-and-pivots-part-1-%e2%80%93-converting-rows-to-columns-1
--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
Change is inevitable... Change for the better is not.
Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 1

It would help a whole lot if you explained the basis of the pivot. For example, what does the column labels of [xx-yy] equate to in the original data?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

BTylerWhite SSCommitted Points: 1569 More actions · Answer 2

This is quite a tricky task. It also looks like from your provided desired output you want some string aggregation with the CourseIDs. I'm not really sure if a PIVOT is the best case for this, since we aren't really trying to perform any aggregate functions against a true aggregation column.

Also, some of the Start and EndDates don't match up quite like the desired output. For example, I see some 2017-2017 ranges.

This might help you get started:

SELECT SD.StudentID,
       SD.CourseType,
       CONCAT(FORMAT(SD.SubscriptionStartDate, N'yy'), '-', FORMAT(SD.SubscriptioneEndDate, N'yy')) AS SubscriptionYears,
       STRING_AGG(SD.CourseID, ',') WITHIN GROUP(ORDER BY SD.CourseType) AS CourseIds
FROM #StudentData AS SD
GROUP BY SD.StudentID,
         SD.CourseType,
         CONCAT(FORMAT(SD.SubscriptionStartDate, N'yy'), '-', FORMAT(SD.SubscriptioneEndDate, N'yy'));

I used FORMAT for readability, but you may want to consider another way to extract these years for better performance for your columns.

Sandeeppunreddy2 SSC Veteran Points: 203 More actions · Answer 3

Hi Jeff,

The years are basically enrollment years like the medical enrollment years provided by employers to their employees which can start from any month of the year but usually are 12 months in length (but that is not standard as there can be programs that can start and end within the same year) so those are standard. (like [2016-2017],[2017-2018], etc.. basically each year.)

as an example, if a student enrolled for a course that starts on 01/01/2016 and the course is valid for 30 months with an end date of 06/30/2018.there will be one row in the source, in the final output for that student the course should appear in [16-17], [17-18] and [18-19] bands or in other words, the course was valid for at least 1 day in the years 2016,2017 and 2018. and as we do not have a bucket that shows only [18] we display it in the band [18-19] along with the other 2 bands. I think it will be simpler if I change those bands to be just years like [2016], [2017], [2018],[2019],[2020] and if a course is valid for at least 1 day in that year then the course ID should appear in the appropriate column.

It is possible that a student registers for different courses within each course type in the same year and that is the 'Intermediate' course type in the sample data. the only rule is that with in the same course type, a student can not be registered in more than one course at the same time.

I appreciate the help.

This reply was modified 5 years, 11 months ago by Sandeeppunreddy2.
This reply was modified 5 years, 11 months ago by Sandeeppunreddy2.
This reply was modified 5 years, 11 months ago by Sandeeppunreddy2.

Sandeeppunreddy2 SSC Veteran Points: 203 More actions · Answer 4

Hi Tyler,

Agreed, it is difficult to get this done using the Pivot. spent a lot of time trying to get that work.

and Yes, it is possible for a student to register in 2 different courses with in the same course type in the same year. but the rule is that the 2 courses should not overlap so the 1st course will end before the second course is started so the SubscriptionEndDate for the fist course will always be before the Subscription Start date of the second course (again, with in the same Course Type. 'Intermediate' in the sample I provided originally as an example). so, in this case, the band [2017-2018] should display both the courses as they have a start date in 2017. notice the comma-separated courses 66,68 in the band [17-18] in the desired output resultset.

Steve Collins SSCrazy Points: 2251 More actions · Answer 5

This is very tricky. I'm this far and my @$%@$ pivot isn't work for some reason and I can't figure out why. If (or when) it works my plan is/was to make it dynamic and exec sp_executesql

set ansi_warnings off;

declare
  @bands            table(start_yr          int,
                          end_yr            int,
                          band              char(7));
declare
  @cols             varchar(max);

declare
  @sql              nvarchar(max);

with
min_max_yr_cte(min_yr, max_yr) as (
    select min(year(SubscriptionStartDate)), max(year(SubscriptioneEndDate)) from #StudentData)
insert @bands
select
  (mmy.min_yr+t.n-1), (mmy.min_yr+t.n),
  concat('[',right(convert(varchar, (mmy.min_yr+t.n-1)), 2),'-', right(convert(varchar, (mmy.min_yr+t.n)), 2), ']') 
from
  min_max_yr_cte mmy
 cross apply
  dbo.tally(mmy.max_yr-mmy.min_yr) t;

select 
  @cols=string_agg(band, ', ')
from
  @bands;

drop table if exists #rows_to_pivot;
create table #rows_to_pivot(
  band              char(7),
  StudentID         INT,
  CourseType        VARCHAR(50),
  CourseID          INT);

insert #rows_to_pivot
select
  b.band,
  sd.StudentID,
  sd.CourseType,
  sd.CourseID
from
  @bands b
 join
  #StudentData sd on b.start_yr>=year(sd.SubscriptionStartDate)
                     and b.end_yr<=isnull(year(sd.SubscriptioneEndDate),b.end_yr)
union all
select
  b.band,
  sd.StudentID,
  sd.CourseType,
  sd.CourseID
from
  @bands b
 join
  #StudentData sd on b.start_yr=year(sd.SubscriptionStartDate)
                     and b.start_yr=isnull(year(sd.SubscriptioneEndDate),b.end_yr)
where
  year(sd.SubscriptionStartDate)=isnull(year(sd.SubscriptioneEndDate),b.end_yr);

select * from #rows_to_pivot order by 4,1;

select StudentID, CourseType, [16-17], [17-18], [18-19], [19-20], [20-21]
from 
  (select StudentID, CourseType, CourseID, band from #rows_to_pivot) rtp 
 pivot
  (max(rtp.CourseID) for rtp.band in([16-17], [17-18], [18-19], [19-20], [20-21])) pvt

This reply was modified 5 years, 11 months ago by Steve Collins.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Steve Collins SSCrazy Points: 2251 More actions · Answer 6

Because the start date, end date combinations aren't uniformly spanning 2 years the first part of the script creates the year bands using a tally table. Then the StudentData is bucketed into the year bands into the temp table #rows_to_pivot. Then for some reason the pivot (which to my eye should work fine) isn't working. Right now the aggregate function is max but it should switched to string_agg().

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Sandeeppunreddy2 SSC Veteran Points: 203 More actions · Answer 7

@SCdecade - Thanks for the attempt. I will look into the approach.

I have finally managed to complete the query.

changed the bands to use simple years assuming 2016 is [16-17], 2017 as [17-18], etc.. got this idea when I was trying to explain the year band to Jeff in this same post.
I converted the student data to be more granular from the original source by using CTEs to create one row per StudentID, CourseType, CourseID and Year the course is active for. as following

; WITH CTE AS

(

SELECT studentID,coursetype,courseid, datepart(year, SubscriptionStartDate) as yr,SubscriptioneEndDate,SubscriptionStartDate

FROM #StudentData

UNION ALL

SELECT studentID,coursetype,courseid, yr + 1,SubscriptioneEndDate,SubscriptionStartDate

FROM CTE

WHERE yr < datepart(year, isnull(SubscriptioneEndDate,getdate()))

)

SELECT * FROM (

SELECT studentID,coursetype,yr,STRING_AGG(courseid,',') AS Courses

FROM (

SELECT DISTINCT studentID,coursetype,yr,courseid

FROM (

select studentID,coursetype,courseid,yr

from CTE

) as a

) AS B

GROUP BY studentID,coursetype,yr

) AS C

PIVOT

(

MIN(Courses)

for yr in ([2016],[2017],[2018],[2019],[2020],[2021])

) as pvt

After the granularity change and the tip from SCdecade to use the STRING_AGG() to aggregate , I used Pivot and it worked great!

Thanks for all the help guys.

This reply was modified 5 years, 11 months ago by Sandeeppunreddy2.