Compare and Delete rows from 2 different tables in 2 different databases - same instance...

  • Lynn Pettis (8/16/2012)


    SQL_ME_RICH (8/16/2012)


    Lynn - Would I need to modify the WHERE clause in the sub-query if I wanted to do this in batches? So it's not trying to eat the elephant all at once?

    I tried running this last night, but it ran for over 17 hours and still did not complete.

    🙁

    The subquery already has a WHERE clause. What you most likely need is an index on the PK_Column column of the table ARCH.dbo.MyTable.

    And if t2.PK_Column is unique (as it appears to be from earlier posts), the index should be a unique index as well.

  • Also, you could probably get away with a nonclustered index.
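
    For example, a unique nonclustered index on the archive table might look like this (a sketch using the archive table and column names that appear later in this thread; the index name itself is made up):

        USE yQueue_Archive;
        GO

        -- Unique nonclustered index to support the EXISTS lookup in the delete loop.
        -- Index name is hypothetical; adjust schema/table/column to your environment.
        CREATE UNIQUE NONCLUSTERED INDEX IX_yConnections_ConnectionID
            ON dbo.yConnections (ConnectionID);

    With this in place, each EXISTS check becomes an index seek instead of a table scan.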

  • Let me give it a try - also, in addition to indexing, would increasing the batch size help, or (in your opinion) might making the batch size smaller speed things up?

    Thanks again!

  • Also - is doing this in batches instead of all at once not the best approach? The environment I am supporting is not a very robust one, and the CPU is being pegged by this process, in addition to causing a lot of locking and deadlocking.

  • SQL_ME_RICH (8/16/2012)


    Also - is doing this in batches instead of all at once not the best approach? The environment I am supporting is not a very robust one, and the CPU is being pegged by this process, in addition to causing a lot of locking and deadlocking.

    Slight change:

    USE AutoDispatch;
    GO

    DECLARE @BatchSize int;
    SET @BatchSize = 10000;

    WHILE @BatchSize > 0
    BEGIN
        DELETE TOP (@BatchSize)
        FROM t1
        FROM dbo.yConnections t1
        WHERE EXISTS (SELECT 1
                      FROM yQueue_Archive.dbo.yConnections t2
                      WHERE t2.ConnectionID = t1.ConnectionID);

        SET @BatchSize = @@ROWCOUNT;

        WAITFOR DELAY '00:00:05'; -- wait 5 seconds before deleting the next batch; adjust as needed
    END

  • SQL_ME_RICH (8/16/2012)


    Also - is doing this in batches instead of all at once not the best approach? The environment I am supporting is not a very robust one, and the CPU is being pegged by this process, in addition to causing a lot of locking and deadlocking.

    If you delete (or attempt to delete) all 140 million rows at once, you will force a table lock and no one will be able to do anything that uses the table. Also, your t-log will explode, as the entire 140-million-row delete will be a single transaction.

    Instead of the WAITFOR above, you could substitute a BACKUP LOG to help manage t-log growth.
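
    For example, the WAITFOR line in the loop could be swapped for something like this (a sketch - it assumes the database is in the FULL or BULK_LOGGED recovery model, and the backup file path is just a placeholder):

        -- Back up the log after each batch so the t-log can be reused
        -- rather than growing for the whole run.
        -- Path is a placeholder; point it at your backup location.
        BACKUP LOG AutoDispatch
            TO DISK = N'X:\Backups\AutoDispatch_log.trn';

    Note that under the SIMPLE recovery model this isn't allowed (or needed) - the log truncates on checkpoint, and keeping each batch a separate transaction is what keeps it small.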

  • You can shrink the size of the batches being deleted as well.

  • OK - let me do a little more reading on how to implement a log backup in this query from your article earlier, and do some testing on this today. I think it looks solid, and I like the idea of making the PKey unique and non-clustered. At the rate it ran last night (17 hours for 1 million rows) it would take 6 months or longer to complete.

    I will report back with some testing results.

    Thanks again Lynn!

  • SQL_ME_RICH (8/16/2012)


    OK - let me do a little more reading on how to implement a log backup in this query from your article earlier, and do some testing on this today. I think it looks solid, and I like the idea of making the PKey unique and non-clustered. At the rate it ran last night (17 hours for 1 million rows) it would take 6 months or longer to complete.

    I will report back with some testing results.

    Thanks again Lynn!

    Well, think about it: it had to do a table scan for each record it wanted to check for existence.

  • Yep - it needed indexing at the very least...I thought about that last night before running it.
