August 24, 2022 at 6:34 pm
I work on SQL Server 2019 and have two tables, dbo.books and dbo.bookpublishdetails. I need to delete every row from dbo.bookpublishdetails whose bookid exists in dbo.books, so I wrote the statement below.
dbo.books holds 20 million book IDs and dbo.bookpublishdetails holds 19 million related rows; after the delete, dbo.bookpublishdetails should be left with 1 million rows.
DELETE d
FROM dbo.bookpublishdetails AS d
INNER JOIN dbo.books AS b
    ON b.bookid = d.bookid;
So the delete removes 18 million rows from dbo.bookpublishdetails, and it takes far too long, close to an hour, even though I have a clustered index on bookid in dbo.books and a nonclustered index on the bookid column of dbo.bookpublishdetails.
After the delete I will run further processing (SELECT, DELETE, and INSERT) against dbo.bookpublishdetails.
Is there a better approach than this delete that avoids so much time being consumed?
DDL details:
CREATE TABLE dbo.books
(
    bookid INT,
    bookname VARCHAR(200),
    ISBN VARCHAR(20),
    PageNumbers INT,
    PublishedBy VARCHAR(200)
);
CREATE CLUSTERED INDEX book_ix ON dbo.books (bookid);

CREATE TABLE dbo.bookpublishdetails
(
    bookdetailsid INT,
    bookid INT,
    BookPublishedId INT,
    PublishedBy VARCHAR(300)
);
CREATE NONCLUSTERED INDEX bookdetails_ix ON dbo.bookpublishdetails (bookid);
The goal of the question is not removing data for its own sake; I only need the rows that remain, ignoring the 18 million matching rows, so I can build some reports on the 1 million left over.
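For reference, here is a sketch of the set I need to keep, written as an anti-join using the DDL above, just to pin down which rows should survive:

-- Rows that will remain after the delete (about 1 million by the counts above)
SELECT d.bookdetailsid, d.bookid, d.BookPublishedId, d.PublishedBy
FROM dbo.bookpublishdetails AS d
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.books AS b
    WHERE b.bookid = d.bookid
);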
August 24, 2022 at 6:53 pm
Two ways to come at this. #1 is to delete in batches so each transaction stays small:

DECLARE @batchCount INT = 4000;
DECLARE @deleteCount INT = 4000;

-- Keep deleting until a batch removes fewer rows than the batch size
WHILE @deleteCount = @batchCount
BEGIN
    DELETE TOP (@batchCount) d
    FROM dbo.bookpublishdetails AS d
    INNER JOIN dbo.books AS b
        ON b.bookid = d.bookid;

    SET @deleteCount = @@ROWCOUNT;
END
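One usage note on the loop above: a batched delete still writes every deleted row to the transaction log, so under the FULL recovery model it's common practice to take log backups between batches (or rely on CHECKPOINTs under SIMPLE) to keep the log from growing unchecked.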
#2, copying out only the rows you want to keep and swapping tables (with renaming or partition switching preferable to truncate & reinsert), is often the better choice when the number of rows to be deleted is much larger than the number of rows to be retained.
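A minimal sketch of the rename swap, assuming the rows to keep have already been copied into a hypothetical dbo.bookpublishdetails_New:

BEGIN TRANSACTION;
    -- Park the original, then promote the new copy under the original name
    EXEC sp_rename 'dbo.bookpublishdetails', 'bookpublishdetails_Old';
    EXEC sp_rename 'dbo.bookpublishdetails_New', 'bookpublishdetails';
COMMIT TRANSACTION;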
August 24, 2022 at 7:14 pm
I just want to make sure we get this right. You're saying that you want to eliminate all book details for books that DO exist in the books table?
I just can't imagine the business requirements for why that would need to be done instead of eliminating all the book details that DON'T exist in the books table.
Like I said, just making sure because it doesn't seem like the logical thing to do.
What I'd do is a FULL join between the two tables and use the WHERE clause to identify the IDs that you want to keep and store those in a separate table. Then, create a new table and copy only the row details that you want to keep into that new table using the IDs from the separate table that you created from the FULL join.
Then, I'd rename the old table (append the word _Old to the current name) and rename the new table to what the original table was. Keep the old table for a week or so to make sure no one needs anything from it. If you do this in the BULK LOGGED Recovery Model and set up to do the copy using minimal logging even with a new Clustered Index already on it, it'll go about twice as fast as copying the data and then adding a Clustered Index with a whole lot less waste (provided that you do the Minimal Logging correctly). If you can't shift to the BULK LOGGED Recovery Model because of AG or similar requirements, it'll still go a lot faster than doing the DELETES.
Once the copy is done, you can create the same non-clustered indexes as you have on the old table.
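To sketch that copy (using NOT EXISTS rather than the FULL join described above, for brevity, and assuming a hypothetical dbo.bookpublishdetails_New already created with its clustered index and still empty; YourDb is a placeholder database name):

ALTER DATABASE YourDb SET RECOVERY BULK_LOGGED;

-- TABLOCK on an empty clustered target is what allows minimal logging here
INSERT INTO dbo.bookpublishdetails_New WITH (TABLOCK)
        (bookdetailsid, bookid, BookPublishedId, PublishedBy)
SELECT d.bookdetailsid, d.bookid, d.BookPublishedId, d.PublishedBy
FROM dbo.bookpublishdetails AS d
WHERE NOT EXISTS (SELECT 1 FROM dbo.books AS b WHERE b.bookid = d.bookid);

ALTER DATABASE YourDb SET RECOVERY FULL;  -- follow with a full or log backup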
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2022 at 7:15 pm
And, no... I wouldn't do a delete in batches. You end up blowing out the log file even with that and you end up with no temporary backup if something goes haywire or the requirements were wrong.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 24, 2022 at 7:31 pm
On top of echoing Jeff's concerns about deleting book details as sounding wrong/dangerous, I'd also question why you're trying to delete book details at all.
If it's about performance, then make sure statistics are up-to-date and you have an index that supports the reporting query. With proper selectivity (don't arbitrarily select all rows or all columns unless you need them) and indexing (and decent I/O), performance shouldn't vary that much whether you have 2 million, 20 million, or 200 million rows.
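As a sketch of that tuning, assuming a hypothetical report that filters on bookid and reads the publish columns:

-- Refresh statistics so the optimizer sees current row counts
UPDATE STATISTICS dbo.bookpublishdetails WITH FULLSCAN;

-- A covering index for that hypothetical report query
CREATE NONCLUSTERED INDEX IX_bookpublishdetails_bookid_covering
    ON dbo.bookpublishdetails (bookid)
    INCLUDE (BookPublishedId, PublishedBy);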