CTE Adding a value to a 'datetime' column caused an overflow

Question

CTE Adding a value to a 'datetime' column caused an overflow

Lee Polikoff

Mr or Mrs. 500

Points: 576
More actions
January 6, 2016 at 11:49 am

#323368

I have a table with gaps in the data. The requirement is to fill in the gaps with the data from the last month that had data.
Here is a sample of the source data:
SOURCEUSER_IDPHASE PHASE_MONTH
CMN625150Open 1/1/2014
CMN625150In Work 3/1/2014
CMN625150In Work 9/1/2014
CMN625150Released 10/1/2014
The result should be:
CMN625150Open 1/1/2014
CMN625150Open 2/1/2014
CMN625150In Work 3/1/2014
CMN625150In Work 4/1/2014
CMN625150In Work 5/1/2014
CMN625150In Work 6/1/2014
CMN625150In Work 7/1/2014
CMN625150In Work 8/1/2014
CMN625150In Work 9/1/2014
CMN625150Released 10/1/2014
CMN625150Released 11/1/2014
etc...
I created a CTE but it is throwing a overflow error. Here is some sample code:
CREATE TABLE #SAMPLETABLE
(
[SOURCE] [varchar](50) NOT NULL,
[USER_ID] [varchar](50) NOT NULL,
[PHASE] [varchar](100) NOT NULL,
[PHASE_MONTH] [datetime] NOT NULL
)
insert into #SAMPLETABLE
Select 'CMN','625150','Open','1/1/2014' Union all
Select 'CMN','625150','In Work','3/1/2014' Union all
Select 'CMN','625150','In Work','9/1/2014' Union all
Select 'CMN','625150','Released','10/1/2014'
;with cte as
(
select SOURCE,USER_ID,PHASE,PHASE_MONTH
from #SAMPLETABLE
union all
select SOURCE,USER_ID,PHASE,dateadd(mm, 1, PHASE_MONTH)
from cte
where not exists
--the resultant YearMonth value should not lie in the original table
( select 1 from #SAMPLETABLE s
where s.PHASE_MONTH = dateadd(mm, 1, cte.PHASE_MONTH)
--stop at current month
and dateadd(mm, 1, cte.PHASE_MONTH) < GETDATE()
)
)
select SOURCE,USER_ID,PHASE,PHASE_MONTH from cte
order by PHASE_MONTH
option (maxrecursion 0)
drop table #SAMPLETABLE
Can someone please tell me what I need to do to get this to work correctly?
The source data has about 100K rows.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 1

The immediate problem is the location of your clause that restricts the results to dates before today.

Right now it's in the NOT EXISTS subquery, which means that once rows start getting generated with dates greater than today, the subquery won't return rows for those dates. When the subquery doesn't return rows, the NOT EXISTS evaluates to true, so the recursion continues until you've exceeded the maximum value for datetime.

To fix that problem, you just need to move this

AND DATEADD(mm, 1, cte.PHASE_MONTH) < GETDATE()

outside the NOT EXISTS subquery.

Additionally, there's likely a much more efficient way to solve this problem than the rCTE. I can't test alternate solutions right now, but when I can I'll follow up.

Cheers!

EDIT: Got bitten by the greater-than/less-than transformations in code tags again, so removed them.

Lee Polikoff Mr or Mrs. 500 Points: 576 More actions · Answer 2

So Simple - Thanks!

I tried doing this using nested loops and it kept crashing my system after hours of processing. I just made the change to the CTE and it took 9 seconds.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Here's an alternative to generate the rows using cascade CTEs.

I'm including a SQL 2012+ version which you might want to use if at all possible.

WITH

E(n) AS(

SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

),

E2(n) AS(

SELECT a.n FROM E a, E b

),

cteTally(n) AS(

SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

FROM E2

),

Phases AS(

SELECT SOURCE,

USER_ID,

PHASE,

PHASE_MONTH,

ROW_NUMBER() OVER( PARTITION BY SOURCE, USER_ID ORDER BY PHASE_MONTH) rn

FROM #SAMPLETABLE

),

PhasesRanges AS(

SELECT s.SOURCE,

s.USER_ID,

s.PHASE,

s.PHASE_MONTH,

ISNULL( e.PHASE_MONTH, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()) + 1, 0)) AS end_range

FROM Phases s

LEFT

JOIN Phases e ON s.source = e.source

AND s.user_id = e.user_id

AND s.rn = e.rn - 1

)

SELECT p.SOURCE,

p.USER_ID,

p.PHASE,

DATEADD(MM, t.n - 1, p.PHASE_MONTH) PHASE_MONTH

FROM cteTally t

JOIN PhasesRanges p ON p.PHASE_MONTH <= DATEADD(MM, -t.n, p.end_range)

ORDER BY PHASE_MONTH;

-- SQL Server 2012+ version using LEAD()

WITH

E(n) AS(

SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

),

E2(n) AS(

SELECT a.n FROM E a, E b

),

cteTally(n) AS(

SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

FROM E2

),

Phases AS(

SELECT SOURCE,

USER_ID,

PHASE,

PHASE_MONTH,

ISNULL( LEAD(PHASE_MONTH) OVER( PARTITION BY SOURCE, USER_ID ORDER BY PHASE_MONTH) , DATEADD(MM, DATEDIFF(MM, 0, GETDATE()) + 1, 0)) AS end_range

FROM #SAMPLETABLE

)

SELECT p.SOURCE,

p.USER_ID,

p.PHASE,

DATEADD(MM, t.n - 1, p.PHASE_MONTH) PHASE_MONTH

FROM cteTally t

JOIN Phases p ON p.PHASE_MONTH <= DATEADD(MM, -t.n, p.end_range)

ORDER BY PHASE_MONTH;

Note: The PARTITION BY might be removed if it's not needed and it's causing incorrect results.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Eirikur Eiriksson SSC Guru Points: 182918 More actions · Answer 4

Further on Luis's fine answer, make certain that the proper POC (partition, order, covering) index is in place as it will remove the need for a very expensive sort operator in the execution plan.

😎

POC index for the #SAMPLETABLE

CREATE NONCLUSTERED INDEX NCLIDX_TMP_SAMPLETABLE_SOURCE_USER_ID_PHASE_MONTH_INCL_PASE ON #SAMPLETABLE

(

[SOURCE] ASC

,[USER_ID] ASC

,[PHASE_MONTH] ASC

)

INCLUDE ([PHASE]);