How to return a range value based on the first day of year and the value of row

  • Hello,

    i have a table called Events, the value of that table like as bellow :

    create table Events (
    id int,
    code_events varchar(10),
    Events varchar(10),
    Events_start datetime,
    Events_end datetime
    )

    insert into Events values(1,'AC_83','Event 1','2020-07-15','2020-07-30')
    insert into Events values(2,'AC_84','Event 2','2019-06-01','2030-07-30')
    insert into Events values(3,'AC_86','Event 3','2020-07-15','2022-12-15')
    insert into Events values(4,'AC_83','Event 1','2020-09-15','2025-04-30')
    insert into Events values(5,'AC_87','Event 4','2021-07-15','2022-12-15')

    I want to return a range of missing dates from the first date of 2020 and only if Events_start>=2020

    As bellow what i want to get :

    code_events    Events    Events_start    Events_end                       IS_Messing
    AC_83 Event 1 2020-01-01 00:00:00.000 2020-07-15 00:00:00.000 1
    AC_83 Event 1 2020-07-15 00:00:00.000 2020-07-30 00:00:00.000 0
    AC_83 Event 1 2020-07-30 00:00:00.000 2020-09-15 00:00:00.000 1
    AC_83 Event 1 2020-09-15 00:00:00.000 2025-04-30 00:00:00.000 0
    AC_84 Event 2 2019-06-01 00:00:00.000 2030-07-30 00:00:00.000 0
    AC_86 Event 3 2020-01-01 00:00:00.000 2020-07-15 00:00:00.000 1
    AC_86 Event 3 2020-07-15 00:00:00.000 2022-12-15 00:00:00.000 0
    AC_87 Event 4 2020-01-01 00:00:00.000 2021-07-15 00:00:00.000 1
    AC_87 Event 4 2021-07-15 00:00:00.000 2022-12-15 00:00:00.000 0

    Any idea how can i do that ?

    Thanks for help !

    • This topic was modified 3 years, 1 month ago by  IdealPR.
  • It's a one time only type query or it's intended to be run frequently?  The id value in the last row of the sample data should be 5, no?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • WITH PrevEnds
    AS
    (
    SELECT E.code_events, E.[Events], E.Events_start, E.Events_end
    ,LAG(E.Events_end) OVER (PARTITION BY E.code_events ORDER BY E.Events_start) AS Prev_Events_end
    FROM Events E
    )
    ,Results
    AS
    (
    SELECT P.code_events, P.[Events]
    ,CASE
    WHEN N.N = 1 AND X.Keep1 = 1
    THEN COALESCE(P.Prev_Events_end, '20200101')
    ELSE P.Events_start
    END AS Events_start
    ,CASE
    WHEN N.N = 1 AND X.Keep1 = 1
    THEN P.Events_start
    ELSE P.Events_end
    END AS Events_end
    ,CASE
    WHEN N.N = 1 AND X.Keep1 = 1
    THEN 1
    ELSE 0
    END AS IsMissing
    ,CASE
    WHEN N.N = 2 OR X.Keep1 = 1
    THEN 1
    ELSE 0
    END AS KeepRow
    FROM PrevEnds P
    CROSS APPLY (VALUES (1),(2)) N(N)
    CROSS APPLY
    (
    VALUES
    (
    CASE
    WHEN P.Events_start <> COALESCE(P.Prev_Events_end, '19000101')
    AND P.Events_start > '20200101'
    THEN 1
    ELSE 0
    END
    )
    ) X (Keep1)
    )
    SELECT code_events, [Events], Events_start, Events_end, IsMissing
    FROM Results
    WHERE KeepRow = 1
    ORDER BY Code_events, Events_start;

    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
  • the query will be run frequently

    yes the id should be 5 it is a mistake

    • This reply was modified 3 years, 1 month ago by  IdealPR.
  • Ken McKelvey this looks familiar

    https://www.sqlservercentral.com/forums/topic/need-to-create-gaps-in-a-table-of-date-ranges

    Again one "brute force" way to do this would be to expand all the date ranges and use the set operator EXCEPT to remove the (also expanded) gaps.  Then the results could be grouped by splitting the calculated dates into contiguous ranges.  Reliable and simple (sort of) but maybe not the fastest way

    with
    unq_events_cte as (
    select code_events, min(Events_end) min_end
    from #Events
    group by code_events
    having year(min(Events_start))>=2020),
    exp_cte as (
    select code_events, v.dt
    from unq_events_cte e
    cross apply dbo.fnTally(0, datediff(day, '20200101', iif(e.min_end<'20201231', '20201231', e.min_end))) fn
    cross apply (values (dateadd(day, fn.n, '20200101'))) v(dt)
    except
    select code_events, v.dt
    from #Events e
    cross apply dbo.fnTally(0, datediff(day, e.Events_start, e.Events_end)) fn
    cross apply (values (dateadd(day, fn.n, e.Events_start))) v(dt)),
    gap_cte as (
    select *, case when datediff(day, dt, lag(dt, 1, 1)
    over (partition by code_events
    order by dt))<>-1
    then 1 else 0 end gap
    from exp_cte),
    grp_cte as (
    select *, sum(gap) over (partition by code_events order by dt) grp
    from gap_cte)
    select code_events,
    iif(min(dt)='20200101', '20200101', dateadd(day, 1, min(dt))) Events_start,
    iif(max(dt)='20201231', '20201231', dateadd(day, 1, max(dt))) Events_end,
    1 Is_missing
    from grp_cte
    group by code_events, grp
    union all
    select code_events, Events_start, Events_end, 0
    from #Events
    order by code_events, Events_start;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • >> I have a table called Events, the value of that table like as bellow :<<

    Actually you don't have a table in your posting. By definition, a table must have a key and that means it's columns cannot all be NULL. You also don't know there's no such thing as a generic "_id" in RDBMS. It must be the identifier of something in particular, it can't be a numeric because it's always measured on a nominal scale.

    You might also want to read the ISO 11179 standards and learn how to properly name columns. For example, since a column is made of rows, and each row is made of columns, you can't have plural column names. They are by their very nature, in the definition of RDBMS, scalar values.

    CREAT TABLE Events

    (event_code VARCHAR(10) NOT NULL

    CHECK (event_code IN (..)),

    event_name VARCHAR(10) NOT NULL,

    event_start_date DATE NOT NULL,

    event_end_date DATE,

    CHECK (event_start_date <= event_end_date),

    PRIMARY KEY (event_code, event_name, event_start_date)

    );

    Did you know we've had a DATE data type for several years now? There is no need to carry the time and in fact simply waste storage. You're still writing that old Sybase code!

    Please note the use of a check clause to guarantee that your codes are drawn from a list. But it looks like what you're calling a code is not what we call a code. It looks more like the name of an event with the year attached to it. Think of an area code and phone numbe, as an example of what a code is. You also want to use a check constraint to be sure the start and end dates are in order. Notice the use of a null for the end date of an event; this means event is still ongoing and has not yet completed. You want to download a copy of temporal queries and SQL by Rick Snodgrass. It's a free PDF and the University of Arizona on their website.

    Unfortunate this is still pretty bad design, as far as I can tell. This is why you wind up with a three column primary key when a unique event name would have allowed a single column. You will eventually discover that most of the work and SQL is done in the DDL, and not the DML you have to do elaborate things in the DML, it's usually because you screwed up the basic design and have to include everything.

    We've had no need for the old-fashioned Sybase insertion one at a time. If you put in a table constructor, this will let the optimizer stand a chance of picking the best way to do this. If you do it as if you were still using punch cards, the SQL engine is obligated to treat them as punchcards one at a time. The semi-colon was also left out of Sybase in the early days, but now many of the SQL Server statements require it. Since it is ANSI and ISO standard. You might want to start actually writing real SQL and not dialect.

    INSERT INTO Events

    VALUES

    ('AC_83', 'Event 1', '2020-07-15', '2020-07-30'),

    ('AC_83', 'Event 1', '2020-09-15', '2025-04-30'),

    ('AC_84', 'Event 2', '2019-06-01', '2030-07-30'),

    ('AC_86', 'Event 3', '2020-07-15', '2022-12-15'),

    ('AC_87', 'Event 4', '2021-07-15', '2022-12-15');

    >> I want to return a range of missing dates from the first date of 2020 and only if event_start>=2020 <<

    Why are you writing SQL as if it was assembly language? We don't use bit flags in high-level languages, and in particular since oriented high-level languages. Again, this is designed failure will lead to elaborate queries and poor performance.

    As below what I want to get :

    code_events Events event_start event_end is_missing_flg

    AC_83 Event 1 2020-07-15 2020-07-30 0

    AC_83 Event 1 2020-07-30 2020-09-15 1

    AC_83 Event 1 2020-09-15 2025-04-30 0

    AC_84 Event 2 2019-06-01 2030-07-30 0

    AC_86 Event 3 2020-01-01 2020-07-15 1

    AC_86 Event 3 2020-07-15 2022-12-15 0

    AC_87 Event 4 2020-01-01 2021-07-15 1

    AC_87 Event 4 2021-07-15 2022-12-15 0

    did you notice that the end of one of your intervals is the start of the following interval? So the event status at the intersection of these two intervals is ambiguous. Is it missing or not missing? Or is it a quantum thin so it is both?

    Do you have a calendar table? Instead of actually materializing the missing data flags, you could compute them with a simple lookup. But right now your specs are too vague to make any judgments. Why don't you post specs that tell us what you're actually trying to do?.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Steve Collins wrote:

    this looks familiar

    I think both windowed and brute force approaches have their place. I have just tidied up my code to try and make it more understandable:

    WITH PrevEnds
    AS
    (
    -- Get previous Events_end as will be needed to determine if there are missing rows
    -- and also for the Events_start value of missing rows
    SELECT E.code_events, E.[Events], E.Events_start, E.Events_end
    ,LAG(E.Events_end)
    OVER (PARTITION BY E.code_events ORDER BY E.Events_start) AS Prev_Events_end
    FROM Events E
    )
    SELECT P.code_events, P.[Events]
    ,CASE
    WHEN X.IsMissing = 1
    THEN COALESCE(P.Prev_Events_end, '2020')
    ELSE P.Events_start
    END AS Events_start
    ,CASE
    WHEN X.IsMissing = 1
    THEN P.Events_start
    ELSE P.Events_end
    END AS Events_end
    ,X.IsMissing
    FROM PrevEnds P
    -- The maximum number of missing rows = the number of original rows
    CROSS APPLY (VALUES ('PossibleMissing'),('Original')) R(RowType)
    -- Determine if the row is missing
    CROSS APPLY
    (
    VALUES
    (
    CASE
    WHEN R.RowType = 'PossibleMissing'
    AND P.Events_start <> COALESCE(P.Prev_Events_end, '1900')
    AND P.Events_start > '2020'
    THEN 1
    ELSE 0
    END
    )
    ) X (IsMissing)
    WHERE R.RowType = 'Original' OR X.IsMissing = 1
    ORDER BY Code_events, Events_start;

    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
    • This reply was modified 3 years, 1 month ago by  Ken McKelvey.
  • Ken McKelvey wrote:

    Steve Collins wrote:

    this looks familiar

    I think both windowed and brute force approaches have their place. I have just tidied up my code to try and make it more understandable:

    It looks good.  Next time I'm going to start with your code 🙂

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply