Clustered Index Update Slow Update On Large Table

  • Hi I am having problem updating a large table with millions of rows please advice to reduce the update time.

    Table


    CREATE TABLE [dbo].[tbl_sms_job_detail](
        [JobDetailID] [int] IDENTITY(1,1) NOT NULL,
        [JobID] [int] NULL,
        [DistributorID] [int] NULL,
        [ResellerID] [int] NULL,
        [CustomerID] [int] NULL,
        [SenderID] [nvarchar](50) NULL,
        [PhoneNumber] [nvarchar](100) NULL,
        [SMSMessage] [nvarchar](1000) NULL,
        [MessageType] [nvarchar](50) NULL,
        [MessageLength] [int] NULL,
        [MessageParts] [int] NULL,
        [ClientRate] [decimal](18, 5) NULL,
        [ClientCost] [decimal](18, 5) NULL,
        [ResellerRate] [decimal](18, 5) NULL,
        [ResellerCost] [decimal](18, 5) NULL,
        [DistributorRate] [decimal](18, 5) NULL,
        [DistributorCost] [decimal](18, 5) NULL,
        [RouteDetailID] [int] NULL,
        [SMSID] [nvarchar](200) NULL,
        [DLRStatus] [nvarchar](100) NULL,
        [ErrorCode] [int] NULL,
        [ErrorDescription] [nvarchar](2000) NULL,
        [SentDate] [datetime] NULL,
        [SentDateUTC] [datetime] NULL,
        [SMSSource] [nvarchar](50) NULL,
        [SMSType] [nvarchar](100) NULL,
        [APISMSID] [int] NULL,
        [DLRDate] [datetime] NULL,
        [DLRDateUTC] [datetime] NULL,
    CONSTRAINT [PK_tbl_sms_job_detail] PRIMARY KEY CLUSTERED
    (
        [JobDetailID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]




    Non Clustered Index

    CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170919-173756] ON [dbo].[tbl_sms_job_detail]
    (
        [JobID] ASC,
        [DLRStatus] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    GO

    CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170919-174142] ON [dbo].[tbl_sms_job_detail]
    (
        [SMSID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    GO

    Update Procedure

    CREATE Procedure [dbo].[sp_update_message_status]
    @SMSID nvarchar(200),
    @DLRStatus nvarchar(100),
    @errorcode int,
    @ErrorDescription nvarchar(2000)
    AS
    UPDATE tbl_sms_job_detail SET DLRStatus = @DLRStatus, ErrorCode = @errorcode, ErrorDescription = @ErrorDescription WHERE SMSID = @SMSID

    Execution Plan

    This Procedure is called up to 1000 times in several minutes and some of them fail to update as it takes time to update the previous record what can be done to increase the update of record in this table.

    Thanks

  • The plan doesn't show anything useful.

    How long does the update take?
    Does it always take that long?

    Is the query waiting for anything, and if so, what is the wait type and resource?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • And how many records will it update, typically, for a given value of SMSID ?  Are your statistics up to date?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • So my understanding is that you have this table containing millions of rows and a procedure which updates the following NULL-able columns, sometimes as often as 1,000 times per minute (or 17 times per second). 

      [DLRStatus] [nvarchar](100) NULL,
      [ErrorCode] [int] NULL,
      [ErrorDescription] [nvarchar](2000) NULL,

    I'm guessing the back story is that this table logs or queues SMS messages, row updates occur retoractively for specific messages that fail, and when the messaging service is down this table gets pounded pretty hard. I suspect the issue going on here is page splitting. In addition to causing increased write activity, page splitting can also cause index fragmentation. For this table you need to specify a Fill Factor of maybe 90 or 80 percent. What this does is reserve some un-allocated space in each page as it's initially inserted, so when a row is subsequently updated, changing it's data length, there is room for the row to change in size within the same page, without SQL Server having to split rows onto a new page and shuffle clustered pages around. However, specifying a new Fill Factor on the clustered index would require rebuilding all indexes on the table and therefore a window of downtime.

    So we know for sure what's going on with fill factor and page allocation, the following query will reveal more detail about the allocation and configuration of the table in question.


    SELECT
        -- @@servername AS ServerName
        -- , DB_NAME() AS DatabaseName
        s.name AS SchemaName
        , o.name AS ObjectName
        , MIN(o.create_date) AS CreateDate
        , ISNULL(i.name,'-') AS IndexName
        , i.type_desc AS IndexType
        , pp.data_compression_desc AS CompressionType
        , MAX(CAST(i.is_primary_key AS TINYINT)) AS IsPrimary
        , MAX(CAST(i.is_unique AS TINYINT)) AS IsUnique
        , MAX(CAST(i.is_padded AS TINYINT)) AS IsPadded
        , AVG(i.fill_factor) AS FillFactorPct
        , MAX(STATS_DATE(i.object_id,i.index_id)) AS StatsLastUpdated
        , CASE WHEN SUM(pp2.partition_number) = 2 THEN 'Y' ELSE 'N' END AS IsPartitioned
        , ps.partition_number AS PartitionNumber
        , MAX( ps.record_count ) AS RowsCount
        , (SUM(ps.page_count) * 8192 ) / 1024000 AS SizeTotalMB
        , CAST( AVG( ps.avg_page_space_used_in_percent ) AS SMALLINT ) AS AvgPageFullPct
        , CAST( AVG( ps.avg_fragmentation_in_percent ) AS SMALLINT ) AS AvgFragmentationPct
    FROM sys.dm_db_index_physical_stats(
            DB_ID()    -- database_id
            , NULL    -- object_id
            , NULL    -- index_id
            , NULL    -- partition_number
            , 'SAMPLED' -- LIMITED | SAMPLED | DETAILED
            ) ps
        INNER JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
        INNER JOIN sys.partitions AS pp ON pp.object_id = ps.object_id AND pp.index_id = ps.index_id AND pp.partition_number = ps.partition_number
        LEFT OUTER JOIN sys.partitions AS pp2 ON pp2.object_id = ps.object_id AND pp2.index_id = ps.index_id AND pp2.partition_number = 2
        INNER JOIN sys.objects o ON o.object_id = ps.object_id
            INNER JOIN sys.schemas s ON s.schema_id = o.schema_id
    WHERE o.name = 'tbl_sms_job_detail'
    GROUP BY
        s.name
        , o.name
        , i.name
        , i.type_desc
        , i.index_id
        , ps.partition_number
        , pp.data_compression_desc;

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • The first thing I would do is a wait stats analysis and file IO stall analysis during the run. If it is on the same spid run in a loop it would be very easy by using sp_whoisactive and it's delay parameter. 

    I have a suspicion that it could be your server is grossly underpowered from a memory and/or (especially) IO write capacity. I say that because the majority of clients I see have that. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Eric M Russell - Wednesday, September 20, 2017 9:16 AM

    So my understanding is that you have this table containing millions of rows and a procedure which updates the following NULL-able columns, sometimes as often as 1,000 times per minute (or 17 times per second). 

      [DLRStatus] [nvarchar](100) NULL,
      [ErrorCode] [int] NULL,
      [ErrorDescription] [nvarchar](2000) NULL,

    I'm guessing the back story is that this table logs or queues SMS messages, row updates occur retoractively for specific messages that fail, and when the messaging service is down this table gets pounded pretty hard. I suspect the issue going on here is page splitting. In addition to causing increased write activity, page splitting can also cause index fragmentation. For this table you need to specify a Fill Factor of maybe 90 or 80 percent. What this does is reserve some un-allocated space in each page as it's initially inserted, so when a row is subsequently updated, changing it's data length, there is room for the row to change in size within the same page, without SQL Server having to split rows onto a new page and shuffle clustered pages around. However, specifying a new Fill Factor on the clustered index would require rebuilding all indexes on the table and therefore a window of downtime.

    So we know for sure what's going on with fill factor and page allocation, the following query will reveal more detail about the allocation and configuration of the table in question.


    SELECT
        -- @@servername AS ServerName
        -- , DB_NAME() AS DatabaseName
        s.name AS SchemaName
        , o.name AS ObjectName
        , MIN(o.create_date) AS CreateDate
        , ISNULL(i.name,'-') AS IndexName
        , i.type_desc AS IndexType
        , pp.data_compression_desc AS CompressionType
        , MAX(CAST(i.is_primary_key AS TINYINT)) AS IsPrimary
        , MAX(CAST(i.is_unique AS TINYINT)) AS IsUnique
        , MAX(CAST(i.is_padded AS TINYINT)) AS IsPadded
        , AVG(i.fill_factor) AS FillFactorPct
        , MAX(STATS_DATE(i.object_id,i.index_id)) AS StatsLastUpdated
        , CASE WHEN SUM(pp2.partition_number) = 2 THEN 'Y' ELSE 'N' END AS IsPartitioned
        , ps.partition_number AS PartitionNumber
        , MAX( ps.record_count ) AS RowsCount
        , (SUM(ps.page_count) * 8192 ) / 1024000 AS SizeTotalMB
        , CAST( AVG( ps.avg_page_space_used_in_percent ) AS SMALLINT ) AS AvgPageFullPct
        , CAST( AVG( ps.avg_fragmentation_in_percent ) AS SMALLINT ) AS AvgFragmentationPct
    FROM sys.dm_db_index_physical_stats(
            DB_ID()    -- database_id
            , NULL    -- object_id
            , NULL    -- index_id
            , NULL    -- partition_number
            , 'SAMPLED' -- LIMITED | SAMPLED | DETAILED
            ) ps
        INNER JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
        INNER JOIN sys.partitions AS pp ON pp.object_id = ps.object_id AND pp.index_id = ps.index_id AND pp.partition_number = ps.partition_number
        LEFT OUTER JOIN sys.partitions AS pp2 ON pp2.object_id = ps.object_id AND pp2.index_id = ps.index_id AND pp2.partition_number = 2
        INNER JOIN sys.objects o ON o.object_id = ps.object_id
            INNER JOIN sys.schemas s ON s.schema_id = o.schema_id
    WHERE o.name = 'tbl_sms_job_detail'
    GROUP BY
        s.name
        , o.name
        , i.name
        , i.type_desc
        , i.index_id
        , ps.partition_number
        , pp.data_compression_desc;

    Hi Thanks for the help i run this query and got the following results

  • Hi smsid is unique field it will only update one record for each sms id

  • My sever spec is intel xeon 8 core 16 gb ram and 512 gb ssd i do not think server specs is the problem

  • vineet.boom - Thursday, September 21, 2017 2:31 AM

    Hi smsid is unique field it will only update one record for each sms id

    Then why isn't the index on that field specified as unique?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • vineet.boom - Wednesday, September 20, 2017 11:38 PM

    vineet.boom - Wednesday, September 20, 2017 11:38 PM

    Hi Thanks for the help i run this query and got the following results

    According to your execution plan posted earlier, 85% of the cost is the Clustered Index Update operation, which suggests page splits. Looking at this index allocation information, fill factor is 0, which means 100%. If your process is updating NULL-able columns, then I'd recommend filling 90%, which would leave 10% for row growth.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Move the error details to a different table.  That table could be keyed by the clustering key of this table plus perhaps a sequential error number.

    This table should almost certainly be clustered on JobID first rather than identity.  If so, that could save one nonclustered index and perhaps drastically reduce overall overhead, depending on how joins/lookups are typically done.  Ridding your mind of the myth that "every table should be clustered by identity" could massively improve performance of your entire system.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Eric M Russell - Wednesday, September 20, 2017 9:16 AM

    So my understanding is that you have this table containing millions of rows and a procedure which updates the following NULL-able columns, sometimes as often as 1,000 times per minute (or 17 times per second). 

      [DLRStatus] [nvarchar](100) NULL,
      [ErrorCode] [int] NULL,
      [ErrorDescription] [nvarchar](2000) NULL,

    I'm guessing the back story is that this table logs or queues SMS messages, row updates occur retoractively for specific messages that fail, and when the messaging service is down this table gets pounded pretty hard. I suspect the issue going on here is page splitting. In addition to causing increased write activity, page splitting can also cause index fragmentation. For this table you need to specify a Fill Factor of maybe 90 or 80 percent. What this does is reserve some un-allocated space in each page as it's initially inserted, so when a row is subsequently updated, changing it's data length, there is room for the row to change in size within the same page, without SQL Server having to split rows onto a new page and shuffle clustered pages around. However, specifying a new Fill Factor on the clustered index would require rebuilding all indexes on the table and therefore a window of downtime.

    So we know for sure what's going on with fill factor and page allocation, the following query will reveal more detail about the allocation and configuration of the table in question.


    SELECT
        -- @@servername AS ServerName
        -- , DB_NAME() AS DatabaseName
        s.name AS SchemaName
        , o.name AS ObjectName
        , MIN(o.create_date) AS CreateDate
        , ISNULL(i.name,'-') AS IndexName
        , i.type_desc AS IndexType
        , pp.data_compression_desc AS CompressionType
        , MAX(CAST(i.is_primary_key AS TINYINT)) AS IsPrimary
        , MAX(CAST(i.is_unique AS TINYINT)) AS IsUnique
        , MAX(CAST(i.is_padded AS TINYINT)) AS IsPadded
        , AVG(i.fill_factor) AS FillFactorPct
        , MAX(STATS_DATE(i.object_id,i.index_id)) AS StatsLastUpdated
        , CASE WHEN SUM(pp2.partition_number) = 2 THEN 'Y' ELSE 'N' END AS IsPartitioned
        , ps.partition_number AS PartitionNumber
        , MAX( ps.record_count ) AS RowsCount
        , (SUM(ps.page_count) * 8192 ) / 1024000 AS SizeTotalMB
        , CAST( AVG( ps.avg_page_space_used_in_percent ) AS SMALLINT ) AS AvgPageFullPct
        , CAST( AVG( ps.avg_fragmentation_in_percent ) AS SMALLINT ) AS AvgFragmentationPct
    FROM sys.dm_db_index_physical_stats(
            DB_ID()    -- database_id
            , NULL    -- object_id
            , NULL    -- index_id
            , NULL    -- partition_number
            , 'SAMPLED' -- LIMITED | SAMPLED | DETAILED
            ) ps
        INNER JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
        INNER JOIN sys.partitions AS pp ON pp.object_id = ps.object_id AND pp.index_id = ps.index_id AND pp.partition_number = ps.partition_number
        LEFT OUTER JOIN sys.partitions AS pp2 ON pp2.object_id = ps.object_id AND pp2.index_id = ps.index_id AND pp2.partition_number = 2
        INNER JOIN sys.objects o ON o.object_id = ps.object_id
            INNER JOIN sys.schemas s ON s.schema_id = o.schema_id
    WHERE o.name = 'tbl_sms_job_detail'
    GROUP BY
        s.name
        , o.name
        , i.name
        , i.type_desc
        , i.index_id
        , ps.partition_number
        , pp.data_compression_desc;

    Adding to the above query, you can use the https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-os-wait-stats-transact-sql#WaitTypes dmv to analyse the WAITS if any AND 
    https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-exec-query-stats-transact-sql for the Query stats.

    First solve the problem then write the code !

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply