August 24, 2022 at 6:34 pm
I work on SQL Server 2019 and have two tables, dbo.books and dbo.bookpublishdetails. I need to delete every row from dbo.bookpublishdetails whose bookid exists in dbo.books, so I wrote the statement below.
dbo.books holds 20 million book IDs and dbo.bookpublishdetails holds 19 million related rows; after the delete, dbo.bookpublishdetails should be left with 1 million rows.
DELETE d
FROM dbo.bookpublishdetails AS d
INNER JOIN dbo.books AS b
    ON b.bookid = d.bookid;
So the delete removes 18 million rows from dbo.bookpublishdetails, and it takes far too long, close to an hour, even though I have a clustered index on bookid in dbo.books and a nonclustered index on the bookid column of dbo.bookpublishdetails.
After the delete I will run further processing (SELECT, DELETE, and INSERT) against dbo.bookpublishdetails.
Is there a better approach than this delete that avoids so much time being consumed?
DDL details:
CREATE TABLE dbo.books
(
    bookid INT,
    bookname VARCHAR(200),
    ISBN VARCHAR(20),
    PageNumbers INT,
    PublishedBy VARCHAR(200)
);
CREATE CLUSTERED INDEX book_ix ON dbo.books (bookid);

CREATE TABLE dbo.bookpublishdetails
(
    bookdetailsid INT,
    bookid INT,
    BookPublishedId INT,
    PublishedBy VARCHAR(300)
);
CREATE NONCLUSTERED INDEX bookdetails_ix ON dbo.bookpublishdetails (bookid);
The goal of the question is not removing data for its own sake; I only need the rows that remain, ignoring the 18 million matching rows, so I can build some reports on the 1 million left over.
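For reference, here is a sketch of the set I need to keep, written as an anti-join using the DDL above, just to pin down which rows should survive:

-- Rows that will remain after the delete (about 1 million by the counts above)
SELECT d.bookdetailsid, d.bookid, d.BookPublishedId, d.PublishedBy
FROM dbo.bookpublishdetails AS d
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.books AS b
    WHERE b.bookid = d.bookid
);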
August 24, 2022 at 6:53 pm
Two ways to come at this. #1 is to delete in batches so each transaction stays small:

DECLARE @batchCount INT = 4000;
DECLARE @deleteCount INT = 4000;

-- Keep deleting until a batch removes fewer rows than the batch size
WHILE @deleteCount = @batchCount
BEGIN
    DELETE TOP (@batchCount) d
    FROM dbo.bookpublishdetails AS d
    INNER JOIN dbo.books AS b
        ON b.bookid = d.bookid;

    SET @deleteCount = @@ROWCOUNT;
END
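One usage note on the loop above: a batched delete still writes every deleted row to the transaction log, so under the FULL recovery model it's common practice to take log backups between batches (or rely on CHECKPOINTs under SIMPLE) to keep the log from growing unchecked.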
#2, copying out only the rows you want to keep and swapping tables (with renaming or partition switching preferable to truncate & reinsert), is often the better choice when the number of rows to be deleted is much larger than the number of rows to be retained.
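A minimal sketch of the rename swap, assuming the rows to keep have already been copied into a hypothetical dbo.bookpublishdetails_New:

BEGIN TRANSACTION;
    -- Park the original, then promote the new copy under the original name
    EXEC sp_rename 'dbo.bookpublishdetails', 'bookpublishdetails_Old';
    EXEC sp_rename 'dbo.bookpublishdetails_New', 'bookpublishdetails';
COMMIT TRANSACTION;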
August 24, 2022 at 7:14 pm
I just want to make sure we get this right. You're saying that you want to eliminate all book details for books that DO exist in the books table?
I just can't imagine the business requirements for why that would need to be done instead of eliminating all the book details that DON'T exist in the books table.
Like I said, just making sure because it doesn't seem like the logical thing to do.
What I'd do is a FULL join between the two tables and use the WHERE clause to identify the IDs that you want to keep and store those in a separate table. Then, create a new table and copy only the row details that you want to keep into that new table using the IDs from the separate table that you created from the FULL join.
Then, I'd rename the old table (append the word _Old to the current name) and rename the new table to what the original table was. Keep the old table for a week or so to make sure no one needs anything from it. If you do this in the BULK LOGGED Recovery Model and set up to do the copy using minimal logging even with a new Clustered Index already on it, it'll go about twice as fast as copying the data and then adding a Clustered Index with a whole lot less waste (provided that you do the Minimal Logging correctly). If you can't shift to the BULK LOGGED Recovery Model because of AG or similar requirements, it'll still go a lot faster than doing the DELETES.
Once the copy is done, you can create the same non-clustered indexes as you have on the old table.
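To sketch that copy (using NOT EXISTS rather than the FULL join described above, for brevity, and assuming a hypothetical dbo.bookpublishdetails_New already created with its clustered index and still empty; YourDb is a placeholder database name):

ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED;

-- TABLOCK on an empty clustered target is what allows minimal logging here
INSERT INTO dbo.bookpublishdetails_New WITH (TABLOCK)
        (bookdetailsid, bookid, BookPublishedId, PublishedBy)
SELECT d.bookdetailsid, d.bookid, d.BookPublishedId, d.PublishedBy
FROM dbo.bookpublishdetails AS d
WHERE NOT EXISTS (SELECT 1 FROM dbo.books AS b WHERE b.bookid = d.bookid);

ALTER DATABASE YourDb SET RECOVERY FULL;  -- follow with a full or log backup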
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2022 at 7:15 pm
And, no... I wouldn't do a delete in batches. You end up blowing out the log file even with that and you end up with no temporary backup if something goes haywire or the requirements were wrong.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2022 at 7:31 pm
On top of echoing Jeff's concerns about deleting book details as sounding wrong/dangerous, I'd also question why you're trying to delete book details at all.
If it's about performance, then make sure statistics are up-to-date and you have an index that supports the reporting query. With proper selectivity (don't arbitrarily select all rows or all columns unless you need them) and indexing (and decent I/O), performance shouldn't vary that much whether you have 2 million, 20 million, or 200 million rows.
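As a sketch of that tuning, assuming a hypothetical report that filters on bookid and reads the publish columns:

-- Refresh statistics so the optimizer sees current row counts
UPDATE STATISTICS dbo.bookpublishdetails WITH FULLSCAN;

-- A covering index for that hypothetical report query
CREATE NONCLUSTERED INDEX IX_bookpublishdetails_bookid_covering
    ON dbo.bookpublishdetails (bookid)
    INCLUDE (BookPublishedId, PublishedBy);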