Find minimum and maximum start date with repeated values

Question

Find minimum and maximum start date with repeated values

SSC Enthusiast

Points: 146

May 4, 2017 at 8:01 pm

I have a bunch of data which has a serviceid, 2 amount fields and a date range from the 16th to the 15th of each month. Of the 2 amount fields only 1 will have a value each month with the other being zero. Lately we have implemented a new system where we have another date range and value which has accurate start and end dates. Over time this new system will phase out the old system however at the moment I need to have a report that combines the 2 together giving priority to the new system.

The report I need will show the serviceid, amount, start date and end date to basically build up a history of changes in value over the years. I had this all working ok until I realised that sometimes the amount changes for a few months and then changes back and so using MIN & MAX was causing overlapping data. I have no idea how to get around this and have been stumped for weeks now.

Here is some test data with the amount being Priority1, Priority2 & Priority3 which is used in that order so the new system overrides the old. My basic query that I used is below but as you can see from the results this doesn't work.

CREATE TABLE #Test (ServiceID int, StartDate datetime, EndDate datetime, Priority2 decimal(10,2), Priority3 decimal(10,2), Priority1 decimal(10,2), Priority1Start datetime, Priority1End datetime) GO

INSERT INTO #Test VALUES (100,'2016-04-16','2016-05-15',100.00,0.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-05-16','2016-06-15',100.00,0.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-05-16','2016-06-15',0.00,150.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-06-16','2016-07-15',0.00,150.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-07-16','2016-08-15',0.00,150.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-08-16','2016-09-15',0.00,180.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-09-16','2016-10-15',0.00,200.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-10-16','2016-11-15',0.00,240.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-11-16','2016-12-15',0.00,240.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (120,'2016-11-16','2016-12-15',0.00,300.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2016-12-16','2017-01-15',0.00,200.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (120,'2016-12-16','2017-01-15',0.00,300.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2017-01-16','2017-02-15',0.00,200.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (120,'2017-01-16','2017-02-15',0.00,300.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (122,'2017-01-16','2017-02-15',180.00,0.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2017-02-16','2017-03-15',0.00,400.00,400.00,'2017-03-05','2017-04-20') INSERT INTO #Test VALUES (120,'2017-02-16','2017-03-15',0.00,300.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (122,'2017-02-16','2017-03-15',180.00,0.00,NULL,NULL,NULL) INSERT INTO #Test VALUES (100,'2017-03-16','2017-04-15',0.00,400.00,400.00,'2017-03-05','2017-04-20') INSERT INTO #Test VALUES (122,'2017-03-16','2017-04-15',280.00,0.00,280.00,'2017-03-21',NULL) INSERT INTO #Test VALUES (120,'2017-03-16','2017-04-15',0.00,200.00,200.00,'2017-04-01',NULL) INSERT INTO #Test VALUES (122,'2017-04-16','2017-05-15',280.00,0.00,280.00,'2017-03-21',NULL) INSERT INTO #Test VALUES (120,'2017-04-16','2017-05-15',0.00,200.00,200.00,'2017-04-01',NULL) INSERT INTO #Test VALUES (100,'2017-04-16','2017-05-15',0.00,450.00,450.00,'2017-04-21',NULL)

SELECT ServiceID, StartDate = MIN(StartDate), EndDate = MAX(EndDate), Amount FROM ( SELECT ServiceID, StartDate = COALESCE(Priority1Start,StartDate), EndDate = CASE WHEN Priority1Start IS NOT NULL THEN Priority1End ELSE EndDate END, Amount = COALESCE(Priority1,CASE WHEN Priority2 <> 0 THEN Priority2 ELSE Priority3 END) FROM #Test) t1 GROUP BY ServiceID, Amount ORDER BY 1,2,3

The results show an overlap on ServiceID 100 for the amount of 200.00 and I'm unsure how to get around that.
Also, Priority2 and Priority3 are the old system and Priority1 is the new system and when the data changes from the old system to the new system it leaves a gap in the dates. I actually worked around that by inserting this into a temp table with an identity increment and then joining it to itself and checking the startdate date was always 1 day before the end date for each service. If not I would change the date. This was working ok except for the duplicate amounts.
I also noticed that the data on line 3 has the same dates as line 2 which is quite common throughout the table so it should ignore priority3 and only include priority2 and apart from running a delete query first to remove the duplicate data I wasn't sure how to deal with this.

The final outcome should actually look

ServiceID	StartDate	EndDate	Amount
100	2016/04/16	2016/06/15	100.00
100	2016/06/16	2016/08/15	150.00
100	2016/08/16	2016/09/15	180.00
100	2016/09/16	2016/10/15	200.00
100	2016/10/16	2016/12/15	240.00
100	2016/12/16	2017/03/04	200.00
100	2017/03/05	2017/04/20	400.00
100	2017/04/21	NULL	450.00
120	2016/11/16	2017/03/31	300.00
120	2017/04/01	NULL	200.00
122	2017/01/16	2017/03/20	180.00
122	2017/03/21	NULL	280.00

Sorry for the long post and I hope it all makes sense. Each line in the table also has a unique id if needed although they aren't in any particular order. It basically a table of charges that generates every month. Is it possible to do what I'm trying to do please?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

-- "choose priority values" and "eliminate date range dupes" could be done within the same subquery. SELECT ServiceID, StartDate = MIN(StartDate), EndDate = MAX(EndDate), Amount FROM ( -- d3 SELECT -- manufacture an appropriate "GROUP BY" column ServiceID, StartDate, EndDate, Amount, Grouper = DATEADD(MONTH,0-ROW_NUMBER() OVER(PARTITION BY ServiceID, Amount ORDER BY StartDate),EndDate) FROM ( -- d2 SELECT -- eliminate date range dupes ServiceID, StartDate, EndDate, Amount = MIN(Amount) FROM ( -- d1 SELECT -- choose priority values ServiceID, StartDate = COALESCE(Priority1Start,StartDate), EndDate = CASE WHEN Priority1Start IS NOT NULL THEN Priority1End ELSE EndDate END, Amount = COALESCE(Priority1,NULLIF(Priority2,0),Priority3) FROM #Test ) d1 GROUP BY ServiceID, StartDate, EndDate ) d2 ) d3 GROUP BY ServiceID, Grouper, Amount ORDER BY ServiceID, StartDate

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

jamie 82947 SSC Enthusiast Points: 146 More actions · Answer 2

Thank you so much!! This is awesome and works great! After running it through all of the real data I found a few more problems however I was able to adapt your code to make it work perfectly.
I had not used NULLIF before but that is going to come in handy. I'm also still trying to get my head around your Grouper as I've never used ROW_NUMBER OVER PARTITION before either but I'm sure I'll figure it out as I test it on various data. I love learning extra commands that I have never used before.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

jamie 82947 - Sunday, May 7, 2017 9:53 PM
Thank you so much!! This is awesome and works great! After running it through all of the real data I found a few more problems however I was able to adapt your code to make it work perfectly.
I had not used NULLIF before but that is going to come in handy. I'm also still trying to get my head around your Grouper as I've never used ROW_NUMBER OVER PARTITION before either but I'm sure I'll figure it out as I test it on various data. I love learning extra commands that I have never used before.

Excellent work Jamie for figuring out the tweaks, and many thanks for the feedback.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden