CTE Adding a value to a 'datetime' column caused an overflow

  • I have a table with gaps in the data. The requirement is to fill in the gaps with the data from the last month that had data.

    Here is a sample of the source data:

    SOURCEUSER_IDPHASE PHASE_MONTH

    CMN625150Open 1/1/2014

    CMN625150In Work 3/1/2014

    CMN625150In Work 9/1/2014

    CMN625150Released 10/1/2014

    The result should be:

    CMN625150Open 1/1/2014

    CMN625150Open 2/1/2014

    CMN625150In Work 3/1/2014

    CMN625150In Work 4/1/2014

    CMN625150In Work 5/1/2014

    CMN625150In Work 6/1/2014

    CMN625150In Work 7/1/2014

    CMN625150In Work 8/1/2014

    CMN625150In Work 9/1/2014

    CMN625150Released 10/1/2014

    CMN625150Released 11/1/2014

    etc...

    I created a CTE but it is throwing a overflow error. Here is some sample code:

    CREATE TABLE #SAMPLETABLE

    (

    [SOURCE] [varchar](50) NOT NULL,

    [USER_ID] [varchar](50) NOT NULL,

    [PHASE] [varchar](100) NOT NULL,

    [PHASE_MONTH] [datetime] NOT NULL

    )

    insert into #SAMPLETABLE

    Select 'CMN','625150','Open','1/1/2014' Union all

    Select 'CMN','625150','In Work','3/1/2014' Union all

    Select 'CMN','625150','In Work','9/1/2014' Union all

    Select 'CMN','625150','Released','10/1/2014'

    ;with cte as

    (

    select SOURCE,USER_ID,PHASE,PHASE_MONTH

    from #SAMPLETABLE

    union all

    select SOURCE,USER_ID,PHASE,dateadd(mm, 1, PHASE_MONTH)

    from cte

    where not exists

    --the resultant YearMonth value should not lie in the original table

    ( select 1 from #SAMPLETABLE s

    where s.PHASE_MONTH = dateadd(mm, 1, cte.PHASE_MONTH)

    --stop at current month

    and dateadd(mm, 1, cte.PHASE_MONTH) < GETDATE()

    )

    )

    select SOURCE,USER_ID,PHASE,PHASE_MONTH from cte

    order by PHASE_MONTH

    option (maxrecursion 0)

    drop table #SAMPLETABLE

    Can someone please tell me what I need to do to get this to work correctly?

    The source data has about 100K rows.

  • The immediate problem is the location of your clause that restricts the results to dates before today.

    Right now it's in the NOT EXISTS subquery, which means that once rows start getting generated with dates greater than today, the subquery won't return rows for those dates. When the subquery doesn't return rows, the NOT EXISTS evaluates to true, so the recursion continues until you've exceeded the maximum value for datetime.

    To fix that problem, you just need to move this

    AND DATEADD(mm, 1, cte.PHASE_MONTH) < GETDATE()

    outside the NOT EXISTS subquery.

    Additionally, there's likely a much more efficient way to solve this problem than the rCTE. I can't test alternate solutions right now, but when I can I'll follow up.

    Cheers!

    EDIT: Got bitten by the greater-than/less-than transformations in code tags again, so removed them.

  • So Simple - Thanks!

    I tried doing this using nested loops and it kept crashing my system after hours of processing. I just made the change to the CTE and it took 9 seconds.

  • Here's an alternative to generate the rows using cascade CTEs.

    I'm including a SQL 2012+ version which you might want to use if at all possible.

    WITH

    E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    ),

    E2(n) AS(

    SELECT a.n FROM E a, E b

    ),

    cteTally(n) AS(

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

    FROM E2

    ),

    Phases AS(

    SELECT SOURCE,

    USER_ID,

    PHASE,

    PHASE_MONTH,

    ROW_NUMBER() OVER( PARTITION BY SOURCE, USER_ID ORDER BY PHASE_MONTH) rn

    FROM #SAMPLETABLE

    ),

    PhasesRanges AS(

    SELECT s.SOURCE,

    s.USER_ID,

    s.PHASE,

    s.PHASE_MONTH,

    ISNULL( e.PHASE_MONTH, DATEADD(MM, DATEDIFF(MM, 0, GETDATE()) + 1, 0)) AS end_range

    FROM Phases s

    LEFT

    JOIN Phases e ON s.source = e.source

    AND s.user_id = e.user_id

    AND s.rn = e.rn - 1

    )

    SELECT p.SOURCE,

    p.USER_ID,

    p.PHASE,

    DATEADD(MM, t.n - 1, p.PHASE_MONTH) PHASE_MONTH

    FROM cteTally t

    JOIN PhasesRanges p ON p.PHASE_MONTH <= DATEADD(MM, -t.n, p.end_range)

    ORDER BY PHASE_MONTH;

    -- SQL Server 2012+ version using LEAD()

    WITH

    E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    ),

    E2(n) AS(

    SELECT a.n FROM E a, E b

    ),

    cteTally(n) AS(

    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

    FROM E2

    ),

    Phases AS(

    SELECT SOURCE,

    USER_ID,

    PHASE,

    PHASE_MONTH,

    ISNULL( LEAD(PHASE_MONTH) OVER( PARTITION BY SOURCE, USER_ID ORDER BY PHASE_MONTH) , DATEADD(MM, DATEDIFF(MM, 0, GETDATE()) + 1, 0)) AS end_range

    FROM #SAMPLETABLE

    )

    SELECT p.SOURCE,

    p.USER_ID,

    p.PHASE,

    DATEADD(MM, t.n - 1, p.PHASE_MONTH) PHASE_MONTH

    FROM cteTally t

    JOIN Phases p ON p.PHASE_MONTH <= DATEADD(MM, -t.n, p.end_range)

    ORDER BY PHASE_MONTH;

    Note: The PARTITION BY might be removed if it's not needed and it's causing incorrect results.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Further on Luis's fine answer, make certain that the proper POC (partition, order, covering) index is in place as it will remove the need for a very expensive sort operator in the execution plan.

    😎

    POC index for the #SAMPLETABLE

    CREATE NONCLUSTERED INDEX NCLIDX_TMP_SAMPLETABLE_SOURCE_USER_ID_PHASE_MONTH_INCL_PASE ON #SAMPLETABLE

    (

    [SOURCE] ASC

    ,[USER_ID] ASC

    ,[PHASE_MONTH] ASC

    )

    INCLUDE ([PHASE]);

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply