newbie SQL Server delete rows question

Question

newbie SQL Server delete rows question

bullcrap420

SSC Enthusiast

Points: 152
More actions
June 5, 2008 at 1:21 pm

#187626

I'm dealing with a large table and I need to delete the rows that are greater than the 5000th row let's say. What's the simplest way to get a row number and be able to delete all records in my table after the 5000th record?
Any help would be greatly appreciated.
Cheers!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 1

Bull -

Can you give us some specifics on your table? I'm not sure I'm clear on whether you have a row number or not, and if you don't, how to go about assigning a row number to those rows.

Meaning - there are plenty of methods to do so, but they require having a specific order (or do you want just any 5000 rows, and delete the rest?)

Anyway - your table layout, and how you plan on figuring out which 5000 rows to keep would be helpful...

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

bullcrap420 SSC Enthusiast Points: 152 More actions · Answer 2

Matt,

Here's an example of what I have:

A table of information, w/o row numbers anywhere. The order of what I need deleted is the order that the data has been loaded into the table already...after the nth record I need to delete all of the data.

select * from a1;

test field date num

-------- -------- -------- ---------

1 NULLNULLNULL

2 NULLNULLNULL

3 NULLNULLNULL

4 NULLNULLNULL

5 NULLNULLNULL

10 NULLNULLNULL

7 NULLNULLNULL

122311141 fsf 2001-01-01 00:00:001

3453566456fhrthrj 2001-01-02 00:00:006

423 NULLNULLNULL

324 NULLNULLNULL

645 NULLNULLNULL

If I wanted to remove the 9th, 10th, and 11th, and 12th record shown above, is there a way in SQL to fetch the row_number? There was this row_number() over function I am reading about, but I don't know if that is what I need.

Is there some way to do something that might work like this...?

delete from (select * from a1)

where rownum > 8;

Thank you for your help!

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

The problem you have is that the ROW_NUMBER() you keep hearing about is the row number applied over a specified order, OR a randomized order. Physical order has no meaning in a database table, so even if we were to try to apply a row number or autonumber column, there's no guarantee that it will assign them in the order you're expecting......

To fix this going forward - you should consider adding a column like an identity that would give you some understanding of the order they hit the database (again - even that's not a perfect scenario, but is pretty darn close).

Depending on how big the table is - try selecting the TOP 5000 rows without an order, save them to a temp table, truncate the original table and insert those rows back in. You'll end up with 5000 rows, although I can't guarantee that they will be the FIRST 5000 rows......most likely it will roughly be them, but again - it could read from just about anywhere in the table.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

bullcrap420 SSC Enthusiast Points: 152 More actions · Answer 4

I can't base the row/record deletes on the values because I need the records as they were updated into the table, not greater than that first column value or anything like that. There is not last update date type of field either, otherwise that would have made it easier as well. I just need a way to get everything > the nth row out of my table as it is stored in SQL Server 2005.

Looks like a cursor might do the trick...

SQL Server 2005 Books Online

Deleting Rows in Result Sets

Transact-SQL scripts, stored procedures, and triggers can use the WHERE CURRENT OF clause on a DELETE statement to delete the cursor row on which they currently are positioned. The following example deletes a single row from the EmployeePayHistory table using a cursor named complex_cursor. DELETE affects only the single row currently fetched from the cursor.

Copy Code

USE AdventureWorks;

GO

DECLARE complex_cursor CURSOR FOR

SELECT a.EmployeeID

FROM HumanResources.EmployeePayHistory AS a

WHERE RateChangeDate <>

(SELECT MAX(RateChangeDate)

FROM HumanResources.EmployeePayHistory AS b

WHERE a.EmployeeID = b.EmployeeID) ;

OPEN complex_cursor;

FETCH FROM complex_cursor;

DELETE FROM HumanResources.EmployeePayHistory

WHERE CURRENT OF complex_cursor;

CLOSE complex_cursor;

DEALLOCATE complex_cursor;

GO

Please let me know if you concur or know of an easier or better way to accomplish this.

Thank you!

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

You're still running up against essentially the same problem.

SELECT isn't guaranteed to pull them in physical order, so while you will end up with 5000 rows, some may not be in the "first 5000" at all.

Because of the "there's no such thing as physical order" - you can't rely on that to give you those rows. I'm not sure there IS a way to give them to you without some column that emulate the physical order.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jim SSCrazy Eights Points: 8720 More actions · Answer 6

bullcrap420 (6/5/2008)
SELECT a.EmployeeID
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)

If this defines the record set you need to delete, why not do the following (or have I missed something here):

DELETE

FROM HumanResources.EmployeePayHistory AS a

WHERE RateChangeDate <>

(SELECT MAX(RateChangeDate)

FROM HumanResources.EmployeePayHistory AS b

WHERE a.EmployeeID = b.EmployeeID)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 7

jim.powers (6/5/2008)
bullcrap420 (6/5/2008)
SELECT a.EmployeeID
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)
If this defines the record set you need to delete, why not do the following (or have I missed something here):
DELETE
FROM HumanResources.EmployeePayHistory AS a
WHERE RateChangeDate <>
(SELECT MAX(RateChangeDate)
FROM HumanResources.EmployeePayHistory AS b
WHERE a.EmployeeID = b.EmployeeID)

There's a difference (if I understand it) - what Bull was showing only deleted one row ("delete any one row NOT equal to the max") - whereas the operation you're looking at deletes ALL rows not equal to the max.

Besides - that seems to have specific criteria allowing you to exclude what you don't want.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004441 More actions · Answer 8

bullcrap420 (6/5/2008)
Please let me know if you concur or know of an easier or better way to accomplish this.
Thank you!

Heh... I don't concur with any of what you're trying to do... the cursor will likely be your best bet if not terribly slow. But whatever you do, make sure you have a backup before you delete a thing. As the others have stated, you've not specified and order and the physical order of a table cannot be relied upon over time.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

bullcrap420 SSC Enthusiast Points: 152 More actions · Answer 9

Thanks guys.

Looks like the select into using top was a quick/easy way that worked. There is guarantee this way because the data is random from what I am reading, but it randomly grabbed the same records in the same order like I needed based on doing a visual scroll-spot check.

Here's what I ended up doing and it seems to have worked fine:

Let's say I had a table where I only needed the top 4 records that show up when I run.... select * from TableA1;... (The data appears to be in the same exact order every time I run the 'select *...', and I don't need the rest of the data. Data will be modified only if I change it.)

TableA1

1 123 2erh52 adrehfasdf q3sh45

51 17623 25hr2 adfahrsdf q3trh45

21 1253 25hr2 adfahretsdf q3ehr45

21 2123 25e2 adfarehsdf q34reh5

17 1263 23452 adfahesdf q34r5

7 12w3 te252 adfaerhtsdf q3trh45

I just put the 4 records in a new table....

select TOP 4 *

INTO NEW_TableA1

FROM TableA1;

Then, after spot-checking the source data, and what was in the table, it looked good...so I dropped the TableA1 and renamed the NEW_TableA1.

I agree, ideally it would be nice to have a column for row number or some data that I can order by, but I have neither for this. This appeared to have worked fine. Thank you guys for your help! 😀