How do you remove 18 million bookids from a table without using DELETE? Are there alternatives?

  • I work on SQL Server 2019. I have two tables, books and bookpublishdetails.

    I need to delete all rows from bookpublishdetails whose bookid exists in the books table,

    so I wrote it as below.

    The books table has 20 million bookids.

    bookpublishdetails has 19 million bookid rows.

    After the delete, bookpublishdetails will have one million rows left.

    delete d
    from dbo.bookpublishdetails d
    inner join dbo.books b on b.bookid = d.bookid;

    So the delete removes 18 million rows from bookpublishdetails,

    and it takes too much time (close to an hour), although I have a clustered index on bookid on dbo.books

    and a nonclustered index on column bookid on dbo.bookpublishdetails.

    After the delete I will use dbo.bookpublishdetails for further processing (SELECT, DELETE, and INSERT against

    the table).

    Is there a better approach than DELETE, one that avoids so much time being consumed by the delete?

    DDL details:

    create table dbo.book
    (
        bookid int,
        bookname varchar(200),
        ISBN varchar(20),
        PageNumbers int,
        PublishedBy varchar(200)
    );

    create clustered index book_ix on dbo.book (bookid);

    create table dbo.bookpublishdetails
    (
        bookdetailsid int,
        bookid int,
        BookPublishedId int,
        PublishedBy varchar(300)
    );

    create nonclustered index bookdetails_ix on dbo.bookpublishdetails (bookid);

    The goal of the question is not to remove the data as such; I only need the remaining rows. The 18 million can be neglected, since the reports are for the one million rows that remain.

    1. You can do it in batches -- e.g.,

      DECLARE @batchCount INT = 4000;
      DECLARE @deleteCount INT = 4000;

      WHILE @deleteCount = @batchCount
      BEGIN
          -- Delete the next batch of matching rows (alias form of DELETE with a join).
          DELETE TOP (@batchCount) d
          FROM dbo.bookpublishdetails d
          INNER JOIN dbo.books b ON b.bookid = d.bookid;

          -- @@ROWCOUNT is a system function; it takes no parentheses.
          SET @deleteCount = @@ROWCOUNT;
      END

    2. Or you can insert the rows you want to keep into another table and either

      • rename tables or switch partitions, or
      • truncate the original & reinsert from the table with only the desired rows.

    #2 is often preferable (with renaming or partition switching preferable to truncate & reinsert) when the number of rows to be deleted is much larger than the number of rows to be retained. A sketch of the rename variant follows below.
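
    As a rough illustration of the rename variant -- a sketch only, where the table names follow the DDL above and dbo.bookpublishdetails_keep / bookpublishdetails_old are hypothetical names:

      -- Copy only the rows whose bookid has no match in dbo.book (the ~1 million to keep).
      SELECT d.*
      INTO   dbo.bookpublishdetails_keep
      FROM   dbo.bookpublishdetails d
      WHERE  NOT EXISTS (SELECT 1 FROM dbo.book b WHERE b.bookid = d.bookid);

      -- Swap the tables; keep the original around until you're sure nothing is missing.
      EXEC sp_rename 'dbo.bookpublishdetails', 'bookpublishdetails_old';
      EXEC sp_rename 'dbo.bookpublishdetails_keep', 'bookpublishdetails';

      -- Recreate the nonclustered index on the new table.
      CREATE NONCLUSTERED INDEX bookdetails_ix ON dbo.bookpublishdetails (bookid);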

  • I just want to make sure we get this right.  You're saying that you want to eliminate all book details for books that DO exist in the books table?

    I just can't imagine the business requirements for why that would need to be done instead of eliminating all the book details that DON'T exist in the Books table.

    Like I said, just making sure because it doesn't seem like the logical thing to do.

    What I'd do is a FULL join between the two tables and use the WHERE clause to identify the IDs that you want to keep and store those in a separate table.  Then, create a new table and copy only the row details that you want to keep into that new table using the IDs from the separate table that you created from the FULL join.

    Then, I'd rename the old table (append the word _Old to the current name) and rename the new table to what the original table was. Keep the old table for a week or so to make sure no one needs anything from it.

    If you do this in the BULK_LOGGED recovery model and set up to do the copy using minimal logging, even with a new clustered index already on it, it'll go about twice as fast as copying the data and then adding a clustered index, with a whole lot less waste (provided that you do the minimal logging correctly). If you can't shift to the BULK_LOGGED recovery model because of AG or similar requirements, it'll still go a lot faster than doing the DELETEs.

    Once the copy is done, you can create the same non-clustered indexes as you have on the old table.
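
    For reference, a minimally logged version of that copy might look like the sketch below. Assumptions: the database can be switched to BULK_LOGGED, dbo.bookpublishdetails_new is a hypothetical name, and the exact preconditions for minimal logging (heap vs. clustered index, TABLOCK, empty target, etc.) should be checked against the documentation for your build:

      ALTER DATABASE CURRENT SET RECOVERY BULK_LOGGED;

      -- Pre-create the new table with its clustered index already in place.
      CREATE TABLE dbo.bookpublishdetails_new
      (
          bookdetailsid   int,
          bookid          int,
          BookPublishedId int,
          PublishedBy     varchar(300)
      );
      CREATE CLUSTERED INDEX bookdetails_cix ON dbo.bookpublishdetails_new (bookid);

      -- TABLOCK is one of the requirements for a minimally logged insert.
      INSERT INTO dbo.bookpublishdetails_new WITH (TABLOCK)
      SELECT d.bookdetailsid, d.bookid, d.BookPublishedId, d.PublishedBy
      FROM   dbo.bookpublishdetails d
      WHERE  NOT EXISTS (SELECT 1 FROM dbo.book b WHERE b.bookid = d.bookid);

      ALTER DATABASE CURRENT SET RECOVERY FULL;  -- take a log backup afterwards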

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And, no... I wouldn't do a delete in batches.  You end up blowing out the log file even with that and you end up with no temporary backup if something goes haywire or the requirements were wrong.

    --Jeff Moden



  • On top of echoing Jeff's concerns about deleting book details as sounding wrong/dangerous, I'd also question why you are trying to delete book details at all.

    If it's about performance, then make sure statistics are up to date and you have an index that supports the reporting query. With proper selectivity (don't arbitrarily select all rows or all columns unless you need them) and indexing (and decent I/O), performance shouldn't vary that much whether you have 2 million, 20 million, or 200 million rows.
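
    To make that concrete, a hedged example -- the report query and its columns are hypothetical, so adjust the key and included columns to whatever the real report actually selects:

      -- Refresh statistics so the optimizer has accurate row estimates.
      UPDATE STATISTICS dbo.bookpublishdetails WITH FULLSCAN;

      -- A covering index for a report that filters on bookid and returns publisher info.
      CREATE NONCLUSTERED INDEX ix_bookpublishdetails_report
          ON dbo.bookpublishdetails (bookid)
          INCLUDE (BookPublishedId, PublishedBy);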
