delete as head blocker

  • Hi All,

    Need some help in finding out of what is the problem in the query and how can we make the query faster.

    On one of the production environment, a DELETE statement was running over 50 mins and blocked more than 100 sessions.

    I wasn't able to get the actual plan. However, including estimated plan in the attached zip file.

    Below is the SQL statement.

    DELETE FROM c_repos_applied_lock
    WHERE rowid_lock IN (SELECT c_repos_applied_lock.rowid_lock
    FROM c_repos_applied_lock,
    c_repos_table
    WHERE ( c_repos_applied_lock.job_type_str = 'A' )
    AND ( c_repos_table.rowid_table =
    c_repos_applied_lock.rowid_table )
    AND ( (
    Datediff (second, c_repos_applied_lock.create_date,
    CURRENT_TIMESTAMP) ) > (
    c_repos_table.batch_lock_wait_seconds * 3 ) ))

    ran sp_whoisactive during the time of issue and collected some info.

    1

     

    spid 233

    <--sql text -->

    <?query --
    DELETE FROM C_REPOS_APPLIED_LOCK WHERE ROWID_LOCK IN (SELECT C_REPOS_APPLIED_LOCK.ROWID_LOCK FROM C_REPOS_APPLIED_LOCK, C_REPOS_TABLE WHERE (C_REPOS_APPLIED_LOCK.JOB_TYPE_STR = 'A') AND (C_REPOS_TABLE.ROWID_TABLE = C_REPOS_APPLIED_LOCK.ROWID_TABLE) AND ((DATEDIFF (second, C_REPOS_APPLIED_LOCK.CREATE_DATE, CURRENT_TIMESTAMP )) > (C_REPOS_TABLE.BATCH_LOCK_WAIT_SECONDS * 3)))
    --?>

    <--lock info>

    <Database name="CMX_ORS">
    <Locks>
    <Lock request_mode="S" request_status="GRANT" request_count="1" />
    </Locks>
    <Objects>
    <Object name="C_REPOS_APPLIED_LOCK" schema_name="dbo">
    <Locks>
    <Lock resource_type="KEY" index_name="NI_APPLIED_LOCK_EWGPS3" request_mode="U" request_status="GRANT" request_count="56" />
    <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="1" />
    <Lock resource_type="PAGE" page_type="*" index_name="NI_APPLIED_LOCK_EWGPS3" request_mode="IU" request_status="GRANT" request_count="3" />
    </Locks>
    </Object>
    <Object name="C_REPOS_TABLE" schema_name="dbo">
    <Locks>
    <Lock resource_type="OBJECT" request_mode="Sch-S" request_status="GRANT" request_count="1" />
    </Locks>
    </Object>
    </Objects>
    </Database>

     

    spid = 657 update (victim)

    sql text :

    <?query --
    UPDATE C_REPOS_APPLIED_LOCK SET LOCK_QUERY_SQL=@P0, LOCK_EXCLUSIVE_IND=@P1, JOB_TYPE_STR=@P2, MODULE_NAME=@P3, INTERACTION_ID=@P4, LAST_UPDATE_DATE=@P5, UPDATED_BY=@P6 WHERE ROWID_TABLE = @P7 AND LOCK_GROUP_STR=@P8
    --?>

    <--lock info>

    <Database name="CMX_ORS">
    <Locks>
    <Lock request_mode="S" request_status="GRANT" request_count="1" />
    </Locks>
    <Objects>
    <Object name="(null)">
    <Locks>
    <Lock resource_type="METADATA.DATABASE_PRINCIPAL" principal_name="dbo" request_mode="Sch-S" request_status="GRANT" request_count="1" />
    </Locks>
    </Object>
    <Object name="C_REPOS_APPLIED_LOCK" schema_name="dbo">
    <Locks>
    <Lock resource_type="KEY" index_name="NI_APPLIED_LOCK_EWGPS3" request_mode="U" request_status="WAIT" request_count="1" />
    <Lock resource_type="OBJECT" request_mode="IX" request_status="GRANT" request_count="1" />
    <Lock resource_type="PAGE" page_type="*" index_name="NI_APPLIED_LOCK_EWGPS3" request_mode="IU" request_status="GRANT" request_count="1" />
    </Locks>
    </Object>
    </Objects>
    </Database>

     

    Questions:

    1. Why DELETE is slow? It took more than 50 mins and as per app team confirmation, we had to KILL the spid eventually. was it is reads more data? usually, we don't see more than 10-15 rows in this table at any moment of time. I don't know why I see a lot of reads at that time frame. please refer above screenshot.

    2. From sp_wia output I was expecting 'X' locks on the table as it is a DELETE operation, but I see UPDATE locks. Any reason why update locks for delete operation?

    3. Any ways we can re-write of optimize this query. this query is there been for a while.. I don't know why they are doing co-related query/sub query instead of direct where condition. not sure why?

    Is there away to decrease the execution time of this query? can it be done in smaller chunks for the locks to held for shorter duration? I don't know, I am just doing my brain dump of what to ask.

    4. The applied lock is kind of a table which used to hold some lock info. this is very dynamic table. A lots of inserts and deletes happen before acquiring and release this lock. Was the load on the table was more I don't know,

    how to build a narrative at that time frame?

    Please provide some guidance and reasoning on how can we make it faster. if there any specific questions to be asked to the app team, or if more information is needed pls let me know.

    Attachment contains below info.

    -Table structure

    -sql stmt

    -Table structure and index definitions

    -Estimated plan

    -few blocking screenshots


     

    Thanks,

    Sam

     

    Attachments:
    You must be logged in to view attached files.
  • First of all, don't use this construct of you can directly perform the delete from the "in-list-query"

    DELETE RAL
    FROM c_repos_applied_lock RAL
    INNER JOIN c_repos_table RT
    ON RAL.job_type_str = 'A'
    AND RT.rowid_table = RAL.rowid_table
    AND DATEDIFF(second, RAL.create_date, CURRENT_TIMESTAMP) > RT.batch_lock_wait_seconds * 3;

    this part of your query is non-SARGable ! " AND DATEDIFF(second, RAL.create_date, CURRENT_TIMESTAMP) > RT.batch_lock_wait_seconds * 3;" ( functions in where-clause or on-clause )

    Meaning this can only be processed after the full set has been assembled ! ( and probably pumped to tempdb )

    I think your c_repos_table must be smaller than your c_repos_applied_lock, so give this version a swirl:

    DELETE RAL
    FROM c_repos_applied_lock RAL
    INNER JOIN c_repos_table RT
    ON RAL.job_type_str = 'A'
    AND RT.rowid_table = RAL.rowid_table
    AND RAL.create_date < dateadd(ss, RT.batch_lock_wait_seconds * -3 , CURRENT_TIMESTAMP);

     

    I don't think your "job_type_str = 'A'" will filter massively, so locking ( and lock escallation ) can be massive.

    Chances are splitting and using a temptb may be a solution here:

    Select RT.rowid_table 
    , dateadd(ss, RT.batch_lock_wait_seconds * -3 , CURRENT_TIMESTAMP) TsRef
    INTO #TMP
    FROM c_repos_table RT

    DELETE RAL
    FROM c_repos_applied_lock RAL
    INNER JOIN #TMP RT
    ON RAL.job_type_str = 'A'
    AND RT.rowid_table = RAL.rowid_table
    AND RAL.create_date < RT.TsRef;

    TEST IT - test it - TEST IT !!!!

     

    edited: You may also want to narrow the scope using "Delete Top ( nnnn ) ..." to avoid lock escallation.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Although it probably won't make a difference - I would recommend moving filtering from the JOIN to a WHERE:

    DELETE RAL
    FROM c_repos_applied_lock RAL
    INNER JOIN c_repos_table RT ON RT.rowid_table = RAL.rowid_table
    WHERE RAL.job_type_str = 'A'
    AND RAL.create_date < dateadd(ss, RT.batch_lock_wait_seconds * -3 , CURRENT_TIMESTAMP);

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thanks a lot Johan and Jeffrey.

    One thing, I wanted to ask. Is there a way to run deletes in small batches to minimize blocking especially when child tables are involved i.e. CASCADING deletes are involved ensuring child deletes are also taken care in small batches.

    Also, any share thoughts on Question 1 and Question 2.

     

     

  • And here's the bite in the back by cascading deletes !

    The reason why we only use foreign keys "NO ACTION" is because we want to take full control over the cleanup process.

    Keep in mind cascading deletes may end up processing a whole tree structure of multiple parent - child - grandchild -...  IN A SINGLE TRANSACTION !!

    Taking control of the cleanup process ( " bottom to top" ) means you can also take control over the transaction size and scope.

    Downside (but also a win ) is you must build the whole selection process and determine per table how many (bottom) rows you'll handle in a batch.

    Looping a "delete top( @n) from bottomtable inner join #tempbottomkeys ..."  until @@rowcount < @n gives you the control over the size of your transactions, so you avoid massive (b)locking and lock escalation.

    Of course, you must support all foreign keys with the appropriate indexes on all tables involved.

     

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Remember that deletes affect all indexes (except maybe filtered indexes) and causes a check for every FK pointing at the table.  There's some silly wide tables at one of the companies I do work for and it took almost 6 minutes to delete just one row.  The fix was to index the FK related columns on all of those other referencing tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My guideline for FK indexes is: put indexes in place that match the FK definition 100%, unless it is proven for that case it hurts regular operations.

    And even then, just disable those indexes and rebuild them during a cleanup cycle and disable afterward again.

    This way they stay documented, so you remember why they have been disabled

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Question to some of the panel, would setting the DEADLOCK_PRIORITY  be an option?

    Chris Powell

    George: You're kidding.
    Elroy: Nope.
    George: Then lie to me and say you're kidding.

  • It is an option, but you may end up being deadlocked most of the time, depending the amount of blocking and lock escallation

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply