June 5, 2008 at 1:21 pm
I'm dealing with a large table and I need to delete the rows that are greater than the 5000th row let's say. What's the simplest way to get a row number and be able to delete all records in my table after the 5000th record?
Any help would be greatly appreciated.
Cheers!
June 5, 2008 at 1:36 pm
Bull -
Can you give us some specifics on your table? I'm not sure I'm clear on whether you have a row number or not, and if you don't, how to go about assigning a row number to those rows.
Meaning - there are plenty of methods to do so, but they require having a specific order (or do you want just any 5000 rows, and delete the rest?)
Anyway - your table layout, and how you plan on figuring out which 5000 rows to keep would be helpful...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 5, 2008 at 1:50 pm
Matt,
Here's an example of what I have:
A table of information, w/o row numbers anywhere. The order of what I need deleted is the order that the data has been loaded into the table already...after the nth record I need to delete all of the data.
select * from a1;
test field date num
-------- -------- -------- ---------
1 NULLNULLNULL
2 NULLNULLNULL
3 NULLNULLNULL
4 NULLNULLNULL
5 NULLNULLNULL
10 NULLNULLNULL
7 NULLNULLNULL
122311141 fsf 2001-01-01 00:00:001
3453566456fhrthrj 2001-01-02 00:00:006
423 NULLNULLNULL
324 NULLNULLNULL
645 NULLNULLNULL
If I wanted to remove the 9th, 10th, and 11th, and 12th record shown above, is there a way in SQL to fetch the row_number? There was this row_number() over function I am reading about, but I don't know if that is what I need.
Is there some way to do something that might work like this...?
delete from (select * from a1)
where rownum > 8;
Thank you for your help!
June 5, 2008 at 2:04 pm
The problem you have is that the ROW_NUMBER() you keep hearing about is the row number applied over a specified order, OR a randomized order. Physical order has no meaning in a database table, so even if we were to try to apply a row number or autonumber column, there's no guarantee that it will assign them in the order you're expecting......
To fix this going forward - you should consider adding a column like an identity that would give you some understanding of the order they hit the database (again - even that's not a perfect scenario, but is pretty darn close).
Depending on how big the table is - try selecting the TOP 5000 rows without an order, save them to a temp table, truncate the original table and insert those rows back in. You'll end up with 5000 rows, although I can't guarantee that they will be the FIRST 5000 rows......most likely it will roughly be them, but again - it could read from just about anywhere in the table.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 5, 2008 at 2:05 pm
I can't base the row/record deletes on the values because I need the records as they were updated into the table, not greater than that first column value or anything like that. There is not last update date type of field either, otherwise that would have made it easier as well. I just need a way to get everything > the nth row out of my table as it is stored in SQL Server 2005.
Looks like a cursor might do the trick...
SQL Server 2005 Books Online
Deleting Rows in Result Sets
Transact-SQL scripts, stored procedures, and triggers can use the WHERE CURRENT OF clause on a DELETE statement to delete the cursor row on which they currently are positioned. The following example deletes a single row from the EmployeePayHistory table using a cursor named complex_cursor. DELETE affects only the single row currently fetched from the cursor.
Copy Code
USE AdventureWorks;
GO
DECLARE complex_cursor CURSOR FOR
SELECT a.EmployeeID
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID) ;
OPEN complex_cursor;
FETCH FROM complex_cursor;
DELETE FROM HumanResources.EmployeePayHistory
WHERE CURRENT OF complex_cursor;
CLOSE complex_cursor;
DEALLOCATE complex_cursor;
GO
Please let me know if you concur or know of an easier or better way to accomplish this.
Thank you!
June 5, 2008 at 2:11 pm
You're still running up against essentially the same problem.
SELECT isn't guaranteed to pull them in physical order, so while you will end up with 5000 rows, some may not be in the "first 5000" at all.
Because of the "there's no such thing as physical order" - you can't rely on that to give you those rows. I'm not sure there IS a way to give them to you without some column that emulate the physical order.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 5, 2008 at 2:15 pm
bullcrap420 (6/5/2008)
SELECT a.EmployeeID
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)
If this defines the record set you need to delete, why not do the following (or have I missed something here):
DELETE
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)
June 5, 2008 at 3:25 pm
jim.powers (6/5/2008)
bullcrap420 (6/5/2008)
SELECT a.EmployeeID
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)
If this defines the record set you need to delete, why not do the following (or have I missed something here):
DELETE
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)
There's a difference (if I understand it) - what Bull was showing only deleted one row ("delete any one row NOT equal to the max") - whereas the operation you're looking at deletes ALL rows not equal to the max.
Besides - that seems to have specific criteria allowing you to exclude what you don't want.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 5, 2008 at 4:54 pm
bullcrap420 (6/5/2008)
Please let me know if you concur or know of an easier or better way to accomplish this.Thank you!
Heh... I don't concur with any of what you're trying to do... the cursor will likely be your best bet if not terribly slow. But whatever you do, make sure you have a backup before you delete a thing. As the others have stated, you've not specified and order and the physical order of a table cannot be relied upon over time.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2008 at 2:45 am
Thanks guys.
Looks like the select into using top was a quick/easy way that worked. There is guarantee this way because the data is random from what I am reading, but it randomly grabbed the same records in the same order like I needed based on doing a visual scroll-spot check.
Here's what I ended up doing and it seems to have worked fine:
Let's say I had a table where I only needed the top 4 records that show up when I run.... select * from TableA1;... (The data appears to be in the same exact order every time I run the 'select *...', and I don't need the rest of the data. Data will be modified only if I change it.)
TableA1
1 123 2erh52 adrehfasdf q3sh45
51 17623 25hr2 adfahrsdf q3trh45
21 1253 25hr2 adfahretsdf q3ehr45
21 2123 25e2 adfarehsdf q34reh5
17 1263 23452 adfahesdf q34r5
7 12w3 te252 adfaerhtsdf q3trh45
I just put the 4 records in a new table....
select TOP 4 *
INTO NEW_TableA1
FROM TableA1;
Then, after spot-checking the source data, and what was in the table, it looked good...so I dropped the TableA1 and renamed the NEW_TableA1.
I agree, ideally it would be nice to have a column for row number or some data that I can order by, but I have neither for this. This appeared to have worked fine. Thank you guys for your help! 😀
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply