April 8, 2010 at 6:14 pm
Hi All,
I have a interesting challenge. I have data lets say RowID, UserID, Date in a table. If UserID appears twice within a 7 day period, i need to delete the one with the lower RowID. The data is for one month only and each month min and max dates are different. i have to use those to define "one month", i cannot use standard months. So lets say its Jan 2 - Jan 30. if at the end of the month its not a complete 7 day period, use whatever days are left to create the last week.
the key is to build those 7 day time slots in a programmatic way which i am short-handed with. I am thinking CTEs or maybe a table variable but the logic is escaping me. Any gurus can shed some pointers?
Thanks!
April 8, 2010 at 6:45 pm
Are you wanting to delete any user record if that user has another record within 7 actual days or are you only wanting to do this if it's in the same specific 7 day span (such as Sunday - Saturday)?
If it's the former, you can use something like this:
DELETE
FROM YourTable yt
WHERE EXISTS (SELECT sq.RowID
FROM YourTable sq
WHERE sq.UserID = yt.UserID and
sq.RowID > yt.RowID and
DATEDIFF(dd, yt.[Date], sq.[Date]) < 7)
On a side note, I personally dislike doing deletions like this. I would rather add a bit field 'IsActive' if possible and deactive records rather than deleting them.
April 9, 2010 at 3:35 am
Hi buddy, got ur solution or looking for something else......
April 9, 2010 at 2:58 pm
Please explain a little more detailed what your looking for. The best way would be to give us some sample data and expected result. Furthermore, you need to define the "direction" where to start.
Example: date values: 3-20, 4-1, 4-6, 4-9, 4-14
If you start at the most recent date, you'd eleminate 4-9 and 4-1. But if you'd start with the oldest date, 4-6 and 4-14 need to be removed...
April 9, 2010 at 3:37 pm
Example: date values: 3-20, 4-1, 4-6, 4-9, 4-14
If you start at the most recent date, you'd eleminate 4-9 and 4-1. But if you'd start with the oldest date, 4-6 and 4-14 need to be removed...
Also, how is your data entered? Is it done on a daily basis throughout the month or bulk entered into a table at the end of a period? That too makes a difference. For instance, if it were done on a daily basis the only records left from Imu92's example would be 3/20 and 4/14. 4/1 would be eliminated when 4/6 was entered, 4/6 would be eliminated when 4/9 was entered, and 4/9 would be eliminated when 4/14 was entered.
If it is data is entered interactively or on a daily basis then the answer is very straightfoward and can go in a trigger or in the procedure you use to insert the data (assuming the same procedure is used for all data inserts).
If you are bulk loading the data at the end of the time period then you need to establish very clear rules for eliminating the data. As has been pointed out, it makes a big difference in the data.
April 10, 2010 at 11:06 am
sorry for the late reply guys
berry...this worked perfect..i cant believe how simple this is...
if anyone is interested, here's the sample script
create table sample_7day (call_id int identity(1,1), UserID int, Post_date datetime)
insert into sample_7day values (1, '20100102')
insert into sample_7day values (1, '20100109')
insert into sample_7day values (1, '20100102')
insert into sample_7day values (2, '20100103')
insert into sample_7day values (3, '20100104')
insert into sample_7day values (3, '20100104')
insert into sample_7day values (3, '20100111')
insert into sample_7day values (3, '20100123')
insert into sample_7day values (4, '20100123')
April 10, 2010 at 7:03 pm
bteraberry (4/8/2010)
Are you wanting to delete any user record if that user has another record within 7 actual days or are you only wanting to do this if it's in the same specific 7 day span (such as Sunday - Saturday)?If it's the former, you can use something like this:
DELETE
FROM YourTable yt
WHERE EXISTS (SELECT sq.RowID
FROM YourTable sq
WHERE sq.UserID = yt.UserID and
sq.RowID > yt.RowID and
DATEDIFF(dd, yt.[Date], sq.[Date]) < 7)
On a side note, I personally dislike doing deletions like this. I would rather add a bit field 'IsActive' if possible and deactive records rather than deleting them.
Hi bteraberry,
First, I absolutely agree... I dislike doing deltions like this and would rather mark a column as you suggest.
On the code side of the house, I'm not sure what I'm doing differently than you but I get the following error when I run your code...
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near 'yt'.
When I make the necessary fix, it has some unintended consequences...
--===== Create a table with all the same UserID and a year's worth of dates
SELECT IDENTITY(INT,1,1) AS RowID,
1 AS UserID,
DATEADD(dd,Number,'20100101') AS Date
INTO dbo.YourTable
FROM Master.dbo.spt_Values
WHERE Type = 'P'
AND Number BETWEEN 0 AND 364
SELECT * FROM dbo.YourTable
--===== Run the repaired delete
DELETE yt
FROM YourTable yt
WHERE EXISTS (SELECT sq.RowID
FROM YourTable sq
WHERE sq.UserID = yt.UserID and
sq.RowID > yt.RowID and
DATEDIFF(dd, yt.[Date], sq.[Date]) < 7)
SELECT * FROM dbo.YourTable
DROP TABLE dbo.YourTable
--Jeff Moden
Change is inevitable... Change for the better is not.
April 10, 2010 at 7:04 pm
BIMind (4/10/2010)
sorry for the late reply guysberry...this worked perfect..i cant believe how simple this is...
if anyone is interested, here's the sample script
create table sample_7day (call_id int identity(1,1), UserID int, Post_date datetime)
insert into sample_7day values (1, '20100102')
insert into sample_7day values (1, '20100109')
insert into sample_7day values (1, '20100102')
insert into sample_7day values (2, '20100103')
insert into sample_7day values (3, '20100104')
insert into sample_7day values (3, '20100104')
insert into sample_7day values (3, '20100111')
insert into sample_7day values (3, '20100123')
insert into sample_7day values (4, '20100123')
There's a potential problem with the code. Please post the actual code that you're using so one of us can test it to make sure that you're not going to delete more than you bargained for.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 10, 2010 at 7:52 pm
Hmmm Jeff..thanks for the input...
i am surprised by the results with your code...i am wondering why...because it works perfectly with the same data i posted above
April 10, 2010 at 7:53 pm
Here's one very high speed method that also has the advantage of being able to stop the code just prior to the final delete to verify what's actually going to be deleted (test data generator included)...
--=============================================================================
-- Create and populate a test table with known data.
-- This section isn't part of the solution. It's just to prove the code.
--=============================================================================
--===== Create a table with all the same UserID and a year's worth of dates
SELECT IDENTITY(INT,1,1) AS RowID,
1 AS UserID,
DATEADD(dd,Number,'20100101') AS Date
INTO dbo.YourTable
FROM Master.dbo.spt_Values
WHERE Type = 'P'
AND Number BETWEEN 0 AND 364
--===== Ok... now add dupes (triplicates really) for all those rows
-- just to make this a "worst case" problem
INSERT INTO dbo.YourTable
SELECT UserID, Date
FROM dbo.YourTable
CROSS JOIN (SELECT 1 UNION ALL SELECT 1)t(N) --Cross join causes the dupes
--===== Show the mess we have on our hands (this section can be deleted)
SELECT * FROM dbo.YourTable ORDER BY UserID, Date, RowID
--=============================================================================
-- Solve the problem
--=============================================================================
--===== First, copy the required data from your table
-- into someplace where we can do a little manipulation
SELECT *, CAST(0 AS TINYINT) AS KeepMe
INTO #Delete
FROM dbo.YourTable
--===== Add the quintessential clustered index
CREATE INDEX IX_#Delete_Guide
ON #Delete (UserID, Date, RowID) WITH FILLFACTOR = 100
--===== Declare and preset some handy variables
DECLARE @PrevUserID INT,
@KeepDate DATETIME,
@PrevKeepMe TINYINT
SELECT @PrevUserID = -1,
@KeepDate = '17530101',
@PrevKeepMe = 0
--===== Mark the rows to either be kept or deleted
UPDATE d
SET @PrevKeepMe = KeepMe
= CASE
WHEN UserID <> @PrevUserID THEN 1
WHEN DATEDIFF(dd,@KeepDate,Date) < 7 THEN 0
ELSE 1
END,
@KeepDate = CASE WHEN @PrevKeepMe = 1 THEN Date ELSE @KeepDate END,
@PrevUserID = UserID
FROM #Delete d WITH(TABLOCKX)
OPTION (MAXDOP 1)
--===== Now, do the final delete on the original table
DELETE yt
FROM dbo.YourTable yt
INNER JOIN #Delete d
ON yt.RowID = d.RowID
AND d.KeepMe = 0
--===== Show the original table after the deletes have occurred
SELECT * FROM dbo.YourTable ORDER BY UserID, Date, RowID
--Jeff Moden
Change is inevitable... Change for the better is not.
April 10, 2010 at 7:57 pm
BIMind (4/10/2010)
Hmmm Jeff..thanks for the input...i am surprised by the results with your code...i am wondering why...because it works perfectly with the same data i posted above
It's because there is no break in the dates for the given UserID... therefor, all the dates are within 7 days of some other date. The code I wrote above should do just fine and it's very fast. Read the comments in the code. The column order of the clustered index on the Temp Table is essential as is the OPTION (MAXDOP 1) hint and the WITH(TABLOCKX) hint.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 10, 2010 at 8:11 pm
Oh... and just to be sure... this code also removes dupes if they occur on the same date. Only the earliest dupe according to the RowID will be kept. Change the clustered index so RowID is DESC if you want to keep the latest dupe according to RowID. If you want to keep all dupes for a given date, then we need to make a change. Lemme know, please.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 12, 2010 at 10:43 am
Jeff...this is pretty close...however this keeps the lowest row ID of any duplicates found and deletes the higher row IDs..i need it the other way around..
i tried row ID desc in the c. index and changing column positions but i cant get it how i want?
April 12, 2010 at 12:45 pm
BIMind (4/12/2010)
Jeff...this is pretty close...however this keeps the lowest row ID of any duplicates found and deletes the higher row IDs..i need it the other way around..i tried row ID desc in the c. index and changing column positions but i cant get it how i want?
I'm not sure which column positions you changed but put them back the way they were. Just changing the ROWID to DESC in the clustered index does the trick.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply