How to build 7 day slots based on start and end dates?

  • Hi All,

    I have a interesting challenge. I have data lets say RowID, UserID, Date in a table. If UserID appears twice within a 7 day period, i need to delete the one with the lower RowID. The data is for one month only and each month min and max dates are different. i have to use those to define "one month", i cannot use standard months. So lets say its Jan 2 - Jan 30. if at the end of the month its not a complete 7 day period, use whatever days are left to create the last week.

    the key is to build those 7 day time slots in a programmatic way which i am short-handed with. I am thinking CTEs or maybe a table variable but the logic is escaping me. Any gurus can shed some pointers?

    Thanks!

  • Are you wanting to delete any user record if that user has another record within 7 actual days or are you only wanting to do this if it's in the same specific 7 day span (such as Sunday - Saturday)?

    If it's the former, you can use something like this:

    DELETE

    FROM YourTable yt

    WHERE EXISTS (SELECT sq.RowID

    FROM YourTable sq

    WHERE sq.UserID = yt.UserID and

    sq.RowID > yt.RowID and

    DATEDIFF(dd, yt.[Date], sq.[Date]) < 7)

    On a side note, I personally dislike doing deletions like this. I would rather add a bit field 'IsActive' if possible and deactive records rather than deleting them.

    └> bt



    Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Hi buddy, got ur solution or looking for something else......

  • Please explain a little more detailed what your looking for. The best way would be to give us some sample data and expected result. Furthermore, you need to define the "direction" where to start.

    Example: date values: 3-20, 4-1, 4-6, 4-9, 4-14

    If you start at the most recent date, you'd eleminate 4-9 and 4-1. But if you'd start with the oldest date, 4-6 and 4-14 need to be removed...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Example: date values: 3-20, 4-1, 4-6, 4-9, 4-14

    If you start at the most recent date, you'd eleminate 4-9 and 4-1. But if you'd start with the oldest date, 4-6 and 4-14 need to be removed...

    Also, how is your data entered? Is it done on a daily basis throughout the month or bulk entered into a table at the end of a period? That too makes a difference. For instance, if it were done on a daily basis the only records left from Imu92's example would be 3/20 and 4/14. 4/1 would be eliminated when 4/6 was entered, 4/6 would be eliminated when 4/9 was entered, and 4/9 would be eliminated when 4/14 was entered.

    If it is data is entered interactively or on a daily basis then the answer is very straightfoward and can go in a trigger or in the procedure you use to insert the data (assuming the same procedure is used for all data inserts).

    If you are bulk loading the data at the end of the time period then you need to establish very clear rules for eliminating the data. As has been pointed out, it makes a big difference in the data.

  • sorry for the late reply guys

    berry...this worked perfect..i cant believe how simple this is...

    if anyone is interested, here's the sample script

    create table sample_7day (call_id int identity(1,1), UserID int, Post_date datetime)

    insert into sample_7day values (1, '20100102')

    insert into sample_7day values (1, '20100109')

    insert into sample_7day values (1, '20100102')

    insert into sample_7day values (2, '20100103')

    insert into sample_7day values (3, '20100104')

    insert into sample_7day values (3, '20100104')

    insert into sample_7day values (3, '20100111')

    insert into sample_7day values (3, '20100123')

    insert into sample_7day values (4, '20100123')

  • bteraberry (4/8/2010)


    Are you wanting to delete any user record if that user has another record within 7 actual days or are you only wanting to do this if it's in the same specific 7 day span (such as Sunday - Saturday)?

    If it's the former, you can use something like this:

    DELETE

    FROM YourTable yt

    WHERE EXISTS (SELECT sq.RowID

    FROM YourTable sq

    WHERE sq.UserID = yt.UserID and

    sq.RowID > yt.RowID and

    DATEDIFF(dd, yt.[Date], sq.[Date]) < 7)

    On a side note, I personally dislike doing deletions like this. I would rather add a bit field 'IsActive' if possible and deactive records rather than deleting them.

    Hi bteraberry,

    First, I absolutely agree... I dislike doing deltions like this and would rather mark a column as you suggest.

    On the code side of the house, I'm not sure what I'm doing differently than you but I get the following error when I run your code...

    Msg 102, Level 15, State 1, Line 2

    Incorrect syntax near 'yt'.

    When I make the necessary fix, it has some unintended consequences...

    --===== Create a table with all the same UserID and a year's worth of dates

    SELECT IDENTITY(INT,1,1) AS RowID,

    1 AS UserID,

    DATEADD(dd,Number,'20100101') AS Date

    INTO dbo.YourTable

    FROM Master.dbo.spt_Values

    WHERE Type = 'P'

    AND Number BETWEEN 0 AND 364

    SELECT * FROM dbo.YourTable

    --===== Run the repaired delete

    DELETE yt

    FROM YourTable yt

    WHERE EXISTS (SELECT sq.RowID

    FROM YourTable sq

    WHERE sq.UserID = yt.UserID and

    sq.RowID > yt.RowID and

    DATEDIFF(dd, yt.[Date], sq.[Date]) < 7)

    SELECT * FROM dbo.YourTable

    DROP TABLE dbo.YourTable

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BIMind (4/10/2010)


    sorry for the late reply guys

    berry...this worked perfect..i cant believe how simple this is...

    if anyone is interested, here's the sample script

    create table sample_7day (call_id int identity(1,1), UserID int, Post_date datetime)

    insert into sample_7day values (1, '20100102')

    insert into sample_7day values (1, '20100109')

    insert into sample_7day values (1, '20100102')

    insert into sample_7day values (2, '20100103')

    insert into sample_7day values (3, '20100104')

    insert into sample_7day values (3, '20100104')

    insert into sample_7day values (3, '20100111')

    insert into sample_7day values (3, '20100123')

    insert into sample_7day values (4, '20100123')

    There's a potential problem with the code. Please post the actual code that you're using so one of us can test it to make sure that you're not going to delete more than you bargained for.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hmmm Jeff..thanks for the input...

    i am surprised by the results with your code...i am wondering why...because it works perfectly with the same data i posted above

  • Here's one very high speed method that also has the advantage of being able to stop the code just prior to the final delete to verify what's actually going to be deleted (test data generator included)...

    --=============================================================================

    -- Create and populate a test table with known data.

    -- This section isn't part of the solution. It's just to prove the code.

    --=============================================================================

    --===== Create a table with all the same UserID and a year's worth of dates

    SELECT IDENTITY(INT,1,1) AS RowID,

    1 AS UserID,

    DATEADD(dd,Number,'20100101') AS Date

    INTO dbo.YourTable

    FROM Master.dbo.spt_Values

    WHERE Type = 'P'

    AND Number BETWEEN 0 AND 364

    --===== Ok... now add dupes (triplicates really) for all those rows

    -- just to make this a "worst case" problem

    INSERT INTO dbo.YourTable

    SELECT UserID, Date

    FROM dbo.YourTable

    CROSS JOIN (SELECT 1 UNION ALL SELECT 1)t(N) --Cross join causes the dupes

    --===== Show the mess we have on our hands (this section can be deleted)

    SELECT * FROM dbo.YourTable ORDER BY UserID, Date, RowID

    --=============================================================================

    -- Solve the problem

    --=============================================================================

    --===== First, copy the required data from your table

    -- into someplace where we can do a little manipulation

    SELECT *, CAST(0 AS TINYINT) AS KeepMe

    INTO #Delete

    FROM dbo.YourTable

    --===== Add the quintessential clustered index

    CREATE INDEX IX_#Delete_Guide

    ON #Delete (UserID, Date, RowID) WITH FILLFACTOR = 100

    --===== Declare and preset some handy variables

    DECLARE @PrevUserID INT,

    @KeepDate DATETIME,

    @PrevKeepMe TINYINT

    SELECT @PrevUserID = -1,

    @KeepDate = '17530101',

    @PrevKeepMe = 0

    --===== Mark the rows to either be kept or deleted

    UPDATE d

    SET @PrevKeepMe = KeepMe

    = CASE

    WHEN UserID <> @PrevUserID THEN 1

    WHEN DATEDIFF(dd,@KeepDate,Date) < 7 THEN 0

    ELSE 1

    END,

    @KeepDate = CASE WHEN @PrevKeepMe = 1 THEN Date ELSE @KeepDate END,

    @PrevUserID = UserID

    FROM #Delete d WITH(TABLOCKX)

    OPTION (MAXDOP 1)

    --===== Now, do the final delete on the original table

    DELETE yt

    FROM dbo.YourTable yt

    INNER JOIN #Delete d

    ON yt.RowID = d.RowID

    AND d.KeepMe = 0

    --===== Show the original table after the deletes have occurred

    SELECT * FROM dbo.YourTable ORDER BY UserID, Date, RowID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • BIMind (4/10/2010)


    Hmmm Jeff..thanks for the input...

    i am surprised by the results with your code...i am wondering why...because it works perfectly with the same data i posted above

    It's because there is no break in the dates for the given UserID... therefor, all the dates are within 7 days of some other date. The code I wrote above should do just fine and it's very fast. Read the comments in the code. The column order of the clustered index on the Temp Table is essential as is the OPTION (MAXDOP 1) hint and the WITH(TABLOCKX) hint.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh... and just to be sure... this code also removes dupes if they occur on the same date. Only the earliest dupe according to the RowID will be kept. Change the clustered index so RowID is DESC if you want to keep the latest dupe according to RowID. If you want to keep all dupes for a given date, then we need to make a change. Lemme know, please.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff...this is pretty close...however this keeps the lowest row ID of any duplicates found and deletes the higher row IDs..i need it the other way around..

    i tried row ID desc in the c. index and changing column positions but i cant get it how i want?

  • BIMind (4/12/2010)


    Jeff...this is pretty close...however this keeps the lowest row ID of any duplicates found and deletes the higher row IDs..i need it the other way around..

    i tried row ID desc in the c. index and changing column positions but i cant get it how i want?

    I'm not sure which column positions you changed but put them back the way they were. Just changing the ROWID to DESC in the clustered index does the trick.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply