166 days to create index

  • I have a table containing 2,163,568,622 rows hosted on SQL server Standard 2008.

    Attempted to create a new index:

    --------------------------------------------------------------------------------------------------------------------------------------------------

    Using TEMPDB = ON this ran for 2 hours 8 mins and only read 27m rows.

    run time 02:08:12.174

    rows 27,293,752

    Web applications reported periodic timeout messages.

    Cancelled.

    --------------------------------------------------------------------------------------------------------------------------------------------------

    Tried again with TEMPDB = OFF. Ran for 4 hours 34 mins and read 53m rows.

    run time 04:35:09.826

    row 53,166,762

    Web application reported no timeout errors.

    Cancelled.

    --------------------------------------------------------------------------------------------------------------------------------------------------

    TempDB disk space is 500GB , the table I am reading is larger but the resulting index size is smaller.

    13m record reads per hours / 2,163,568,622 rows = 166 days to create this index !

    5 other indexes on this table display about 70% fragmentation. Will try rebuilding these first. Previous rebuilds on existing indexes took approx 1 - 2 hours only.

    Enterprise edition is not an option.

    Index was suggested via tuning advisor and the dev team concur.

    Thanks for ideas.

    Scott

    CREATE NONCLUSTERED INDEX [_dta_index_L_9_375672386__K3_K2_K10_1_6_7_9_11_12] ON [dbo].[L]

    (

    [DID] ASC,

    [JID] ASC,

    [Date] ASC

    )

    INCLUDE ( [LID],

    [Ig],

    [Sp],

    [Od],

    [TID],

    [SID]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

    CREATE TABLE [dbo].[L](

    [LID] [int] IDENTITY(1,1) NOT NULL,

    [JID] [int] NULL,

    [DID] [int] NOT NULL,

    [Lat] [decimal](20, 10) NULL,

    [Long] [decimal](20, 10) NULL,

    [Ig] [bit] NULL,

    [Sp] [decimal](4, 1) NULL,

    [Co] [decimal](4, 1) NULL,

    [Od] [decimal](8, 2) NULL,

    [Date] [datetime] NULL,

    [TID] [int] NULL,

    [SID] [int] NULL,

    [In] [datetime] NULL,

    [Icon] [int] NULL,

    [Re] [bit] NULL,

    [Lst] [int] NULL,

    [Cel] [decimal](13, 8) NULL,

    [Ch] [decimal](13, 8) NULL,

    [St] [int] NULL,

    [Ce] [int] NULL,

    [La] [int] NULL,

    [Co] [int] NULL,

    [So] [decimal](13, 8) NULL,

    CONSTRAINT [L_PK] PRIMARY KEY CLUSTERED

  • First, please, for clarity's sake, rename that index. Anytime I open a server and see _dta at the start of an index, I almost want to cry.

    I would suggest you capture the wait statistics on the process as it's running in order to understand where your slow-down is coming from. I'd think it's either in memory or possibly waiting on I/O within tempdb, but understanding why and where is going to be a big help.

    How much memory does the system have? Is tempdb and your database on the same disk? What's the clustered index on? I can see that you have one, but not the definition.

    But, at the end of the day, you'll probably have to throw hardware at this problem. You're trying to sort a very large data set. You need memory and lots of fast disks to get it done.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the reply.

    "First, please, for clarity's sake, rename that index. Anytime I open a server and see _dta at the start of an index, I almost want to cry."

    Can do - sorry.

    "I would suggest you capture the wait statistics on the process as it's running in order to understand where your slow-down is coming from."

    I captured the following dueing index creation using SP_WHO3.

    ---------------------------------------------------------------------------------

    dd hh ss ss mss: 00 02:08:12.174

    sql_text: <?query -- insert [dbo].[L] select * from [dbo].[L] option (maxdop 1) --?>

    reads: 27,293,752

    writes: 662

    wait info: NULL

    blocking session: NULL

    TempDB = on.

    ---------------------------------------------------------------------------------

    dd hh ss ss mss: 00 02:11:41.440

    sql_text: <?query -- insert [dbo].[L] select * from [dbo].[L] option (maxdop 1)--?>

    reads: 27,905,369

    writes: 675

    wait info: (26544ms)PREEMPTIVE_OS_WRITEFILEGATHER

    blocking session :NULL

    TempDB = on.

    ---------------------------------------------------------------------------------

    dd hh ss ss mss: 00 00:01:29.580

    sql_test: <?query -- insert [dbo].[L] select * from [dbo].[L] option (maxdop 1) --?>

    reads: 524,440

    writes: 5

    wait info: (33ms)PAGEIOLATCH_SH:DM2:1(*)

    blocking session: NULL

    TempDB = off.

    -----------------------------------------------------------------------------------

    dd hh ss ss mss:00 04:35:09.826

    sql_test: <?query -- insert [dbo].[L] select * from [dbo].[L] option (maxdop 1) --?>

    reads: 53,166,762

    writes: 4,084

    wait info: NULL

    blocking session: NULL

    TempDB = off.

    ----------------------------------------------------------------------------------

    "I'd think it's either in memory or possibly waiting on I/O within tempdb, but understanding why and where is going to be a big help."

    I use Quest Spotlight to monitor blocking and locking performance.

    The "WAIT STATISICS" tab is currently showing 75% MISC WAITS.

    I will check it again during the index creation and publish the results.

    Anything specific to look for ?

    "How much memory does the system have? "

    96GB

    "Is tempdb and your database on the same disk?"

    m: data 2.18tb (raid10)

    n: logs 553gb (raid1)

    t: tempdb 558gb (raid1)

    "What's the clustered index on? I can see that you have one, but not the definition."

    Clustered index on Primary Key LID (ASC).

    NAME: LID

    SORT: ASC

    DATA TYPE: INT

    Size: 4

    Idenity: YES

    Allow nulls: NO

    ALTER TABLE [dbo].[L] ADD CONSTRAINT [L_PK] PRIMARY KEY CLUSTERED

    (

    [LID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    "But, at the end of the day, you'll probably have to throw hardware at this problem. You're trying to sort a very large data set. You need memory and lots of fast disks to get it done. "

    15x600GB (SAS) disks in a NexSan SaysBoy. (1 hot spare).

    Disk performance during daily operations is great.

    SQL performance during daily operation is great.

    - 18% CPU utilisation

    - 64Gb of 96gb used

    - cache hit rate is 82%.

    - no deadlocking

    - low number of blocked processes

    - lock waits 2-5 seconds on Object Locks

    Maintenance on a large single table:

    - can rebuild in 1 - 2 hours but some writes will timeout.

    - can delete 3000 rows per min before causing blocking problems (2.8m inserts per day in this table - another table has more but can truncate).

    - cannot create new index tbl.L.

    We are very heavy write and low read in general during normal operations.

    Thanks for any advice.

    Scott

  • scott_lotus (4/29/2014)


    I use Quest Spotlight to monitor blocking and locking performance.

    The "WAIT STATISICS" tab is currently showing 75% MISC WAITS.

    I will check it again during the index creation and publish the results.

    Anything specific to look for ?

    Just what the actual waits are. You already have two that are interesting. PREEMPTIVE_OS_WRITEFILEGATHER is an indication of file growth occurring and your system is waiting on it. So, if you're growing files during this process, it'll certainly slow you down. You'd want to grow those files ahead of time so it's not waiting. You're also seeing PAGEIOLATCH_SH which is flat out an indication you're waiting on the disk for some other process to clear. But those waits seem to be associated with an INSERT query, not your index build. You need to see what the index build itself is waiting on.

    ALTER TABLE [dbo].[L] ADD CONSTRAINT [L_PK] PRIMARY KEY CLUSTERED

    (

    [LID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    Tiny thing, unlikely to help overall, but since your PK is on LID, you don't need to have that in the INCLUDE statement. SQL Server will just ignore it anyway since the key for the PK is automatically stored with the nonclustered index.

    15x600GB (SAS) disks in a NexSan SaysBoy. (1 hot spare).

    Disk performance during daily operations is great.

    SQL performance during daily operation is great.

    - 18% CPU utilisation

    - 64Gb of 96gb used

    - cache hit rate is 82%.

    - no deadlocking

    - low number of blocked processes

    - lock waits 2-5 seconds on Object Locks

    Maintenance on a large single table:

    - can rebuild in 1 - 2 hours but some writes will timeout.

    - can delete 3000 rows per min before causing blocking problems (2.8m inserts per day in this table - another table has more but can truncate).

    - cannot create new index tbl.L.

    We are very heavy write and low read in general during normal operations.

    Thanks for any advice.

    Scott

    You have a reasonably hefty system. I'd think it could handle what you're adding to it, but you need to identify where the bottleneck is first, then understand why it's a bottleneck. Early, and incomplete, indications are that it's disk related, but you need to get more accurate information about the specific blocking occurring in your index creation script.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • What indexes does SQL report are missing? What is the usage of existing indexes on that table?

    There's a reasonable chance that the table should be clustered by [DID] rather than [LID], but obviously not enough info yet to decide that. And would need to know the insert pattern of [DID] column.

    First, please run the queries below, after changing the db name and table name if needed, and post the results:

    --!! chg to your db name

    USE [<your_db_name>]

    SET DEADLOCK_PRIORITY LOW --probably irrelevent, but just in case

    DECLARE @list_missing_indexes bit

    DECLARE @table_name_pattern sysname

    SET @list_missing_indexes = 1 --1=list missing index(es); 0=don't.

    --!! put your table name/name pattern here

    SET @table_name_pattern = 'L'

    PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

    --SELECT create_date FROM sys.databases WITH (NOLOCK) WHERE name = N'tempdb'

    IF @list_missing_indexes = 1

    BEGIN

    SELECT

    GETDATE() AS capture_date,

    DB_NAME(mid.database_id) AS Db_Name,

    OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,

    mid.equality_columns, mid.inequality_columns, mid.included_columns,

    ca1.sql_up_days AS days_in_use,

    migs.*,

    mid.statement, mid.object_id, mid.index_handle

    FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)

    CROSS APPLY (

    SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'

    ) AS ca1

    LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON

    mig.index_handle = mid.index_handle

    LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON

    migs.group_handle = mig.index_group_handle

    --order by

    --DB_NAME, Table_Name, equality_columns

    WHERE

    1 = 1

    AND mid.database_id = DB_ID()

    AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern

    ORDER BY

    --avg_total_user_cost * (user_seeks + user_scans) DESC,

    Db_Name, Table_Name, equality_columns, inequality_columns

    END --IF

    -- list index usage stats (seeks, scans, etc.)

    SELECT

    ius2.row_num, DB_NAME() AS db_name,

    i.name AS index_name,

    OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,

    i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,

    dps.row_count,

    SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,

    ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,

    ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,

    fk.Reference_Count AS fk_ref_count,

    FILEGROUP_NAME(i.data_space_id) AS filegroup_name,

    ca1.sql_up_days AS days_in_use,

    ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,

    ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update

    FROM sys.indexes i WITH (NOLOCK)

    INNER JOIN sys.objects o WITH (NOLOCK) ON

    o.object_id = i.object_id

    CROSS APPLY (

    SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'

    ) AS ca1

    OUTER APPLY (

    SELECT

    ', ' + COL_NAME(object_id, ic.column_id)

    FROM sys.index_columns ic

    WHERE

    ic.key_ordinal > 0 AND

    ic.object_id = i.object_id AND

    ic.index_id = i.index_id

    ORDER BY

    ic.key_ordinal

    FOR XML PATH('')

    ) AS key_cols (key_cols)

    OUTER APPLY (

    SELECT

    ', ' + COL_NAME(object_id, ic.column_id)

    FROM sys.index_columns ic

    WHERE

    ic.key_ordinal = 0 AND

    ic.object_id = i.object_id AND

    ic.index_id = i.index_id

    ORDER BY

    COL_NAME(object_id, ic.column_id)

    FOR XML PATH('')

    ) AS nonkey_cols (nonkey_cols)

    LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON

    dps.object_id = i.object_id AND

    dps.index_id = i.index_id

    LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON

    ius.database_id = DB_ID() AND

    ius.object_id = i.object_id AND

    ius.index_id = i.index_id

    LEFT OUTER JOIN (

    SELECT

    database_id, object_id, MAX(user_scans) AS user_scans,

    ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans

    FROM sys.dm_db_index_usage_stats WITH (NOLOCK)

    WHERE

    database_id = DB_ID()

    --AND index_id > 0

    GROUP BY

    database_id, object_id

    ) AS ius2 ON

    ius2.database_id = DB_ID() AND

    ius2.object_id = i.object_id

    LEFT OUTER JOIN (

    SELECT

    referenced_object_id, COUNT(*) AS Reference_Count

    FROM sys.foreign_keys

    WHERE

    is_disabled = 0

    GROUP BY

    referenced_object_id

    ) AS fk ON

    fk.referenced_object_id = i.object_id

    WHERE

    i.object_id > 100 AND

    i.is_hypothetical = 0 AND

    i.type IN (0, 1, 2) AND

    o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND

    (

    o.name LIKE @table_name_pattern AND

    o.name NOT LIKE 'dtprop%' AND

    o.name NOT LIKE 'filestream[_]' AND

    o.name NOT LIKE 'MSpeer%' AND

    o.name NOT LIKE 'MSpub%' AND

    --o.name NOT LIKE 'queue[_]%' AND

    o.name NOT LIKE 'sys%'

    )

    --AND OBJECT_NAME(i.object_id, DB_ID()) IN ('tbl1', 'tbl2', 'tbl3')

    ORDER BY

    --row_count DESC,

    --ius.user_scans DESC,

    --ius2.row_num, --user_scans+user_seeks

    -- list clustered index first, if any, then other index(es)

    db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name

    PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Have you tried using the ONLINE = ON option in your index creation statement?

  • Michael Valentine Jones (4/29/2014)


    Have you tried using the ONLINE = ON option in your index creation statement?

    I believe OP said that Enterprise Edition was not an option.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • As Grant said, it looks like the system might be waiting on file growth. Make an educated guess on how big the Leaf Level of the index is going to be and grow your database at least that much.

    Also, if you're in the FULL recovery model, your log file is going to grow a lot on this one. Make a backup just before you're ready to try another rebuild and the set the server to the BULK LOGGED recovery model. Your point-in-time backups will take the hit of not being able to recover to a point in time because, in the BULK LOGGED mode, CREATE INDEX is minimally logged.

    Also, what kind of table is this? Is it an "audit" or "order detail" type of table by any chance? Do you have a "Date_Created" column on this table and, regardless of type, is it "temporal" and the old rows are mostly static?

    As for the new index, everyone on your team realizes that NCIs are a duplication of data that includes all keys, includes, and the PK columns, right? You sure you REALLY want to make an index that turns out to be 9 columns wide (PK includes in the INCLUDEs)? Are you sure that key lookups are going to hurt that bad? Are you sure that your backups can handle the extra load and that you actually have the time to do the backups each night. Are you really, really sure that you want another index and set of stats to maintain on this relatively large table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Maybe not relevant, but.... the column [LID] has datatype [int] with IDENTITY(1,1).

    The table contains 2,163,568,622 rows.

    Max size of an integer is 2,147,483,647.

    I guess it's possible that negative numbers have been used, but otherwise I see a problem!

  • Thank you again for the reply , very helpful.

    Grant Fritchey (4/29/2014)


    Just what the actual waits are. You already have two that are interesting. PREEMPTIVE_OS_WRITEFILEGATHER is an indication of file growth occurring and your system is waiting on it. So, if you're growing files during this process, it'll certainly slow you down. You'd want to grow those files ahead of time so it's not waiting. You're also seeing PAGEIOLATCH_SH which is flat out an indication you're waiting on the disk for some other process to clear. But those waits seem to be associated with an INSERT query, not your index build. You need to see what the index build itself is waiting on.

    Ok will grow the MDF.

    ScottPletcher (4/29/2014)


    First, please run the queries below, after changing the db name and table name if needed, and post the results:

    Results are as follows (unmasked names - apologies for masking before - hope this formatting is legible):

    2014-04-30 08:37:19.390DM2LocationNULL[DateTimeInserted][DID]48339384434302014-04-30 00:02:00.940NULL30685.116185009395.1800NULLNULL00[DM2].[dbo].[Location]375672386339383

    2014-04-30 08:37:19.390DM2 Location[DID][SentDate][LID], [JID], [Latitude], [Longitude], [Ignition], [Speed], [Course], [Odometer], [PacketTypeID], [PacketStatusID], [DateTimeInserted], [DisplayIcon], [RESETFlag], [LastAckTime], [CellVoltage], [CellChargeVoltage], [GSMStrength], [GSMCID], [GSMLacID], [CoulombCounter], [SourceVoltage]48211032723130302014-04-30 08:18:30.490NULL3.1263729958005593.4500NULLNULL00[DM2].[dbo].[Location]3756723862110326

    79DM2 Location_PKLocation12166456225LID NULL5681059650464500791017587392014-04-30 08:37:15.4402014-04-30 00:02:00.9402014-04-30 08:37:19.3402014-04-30 08:37:19.3471PRIMARY480700NULL2014-04-26 00:54:22.787NULLNULL

    79DM2_dta_index_Location_100_375672386__K2D_K7_1_3_4_5_6_8_9_10_11_12_13_14_15_16Location172166456225JID, SpeedCourse, DateTimeInserted, DID, DisplayIcon, Ignition, LastAckTime, Latitude, LID, Longitude, Odometer, PacketStatusID, PacketTypeID, RESETFlag, SentDate17193336001017587392014-04-30 08:37:19.283NULLNULL2014-04-30 08:37:19.3471PRIMARY480000NULLNULLNULLNULL

    79DM2 _dta_index_Location_5_103671417__K2_K1_K3_4_5_6_7_8_9_10_11_12_13_14Location22166456225JID, LID, DID, Course, DateTimeInserted, DisplayIcon, Ignition, Latitude, Longitude, Odometer, PacketStatusID, PacketTypeID, SentDate, Speed50709411001017587392014-04-30 08:37:19.340NULLNULL2014-04-30 08:37:19.3471PRIMARY480000NULLNULLNULLNULL

    79DM2 _dta_index_Location_5_103671417__K3_K1Location32166456225DID, LID NULL29265001017587222014-04-30 08:36:54.340NULLNULL2014-04-30 08:37:19.3471PRIMARY480000NULLNULLNULLNULL

    79DM2 _dta_index_Location_did_sent_status_locid_invalidsLocation152166456225DID, SentDate, PacketStatusID, LID NULL11839657001017587222014-04-30 08:37:19.307NULLNULL2014-04-30 08:37:19.3471PRIMARY480000NULLNULLNULLNULL

    I used this statement found in this forum which reports same results:

    select * from

    (select user_seeks * avg_total_user_cost * (avg_user_impact * 0.01) as index_advantage,

    migs.* from sys.dm_db_missing_index_group_stats migs) as migs_adv

    inner join sys.dm_db_missing_index_groups as mig on migs_adv.group_handle=

    mig.index_group_handle

    inner join sys.dm_db_missing_index_details as mid on mig.index_handle=mid.index_handle

    where statement like '%location%'

    order by migs_adv.index_advantage

    896.941267673874211032723530702014-04-30 09:48:22.680NULL3.1264128343777193.4500NULLNULL002110327211032621103269375672386[DID][SentDate][LID], [JID], [Latitude], [Longitude], [Ignition], [Speed], [Course], [Odometer], [PacketTypeID], [PacketStatusID], [DateTimeInserted], [DisplayIcon], [RESETFlag], [LastAckTime], [CellVoltage], [CellChargeVoltage], [GSMStrength], [GSMCID], [GSMLacID], [CoulombCounter], [SourceVoltage][DM2].[dbo].[Location]

    1255862.02415035339384434302014-04-30 00:02:00.940NULL30685.116185009395.1800NULLNULL003393843393833393839375672386NULL[DateTimeInserted][DID][DM2].[dbo].[Location]

    Jeff Moden (4/29/2014)


    As Grant said, it looks like the system might be waiting on file growth. Make an educated guess on how big the Leaf Level of the index is going to be and grow your database at least that much.

    Good idea.

    USE DM2

    GO

    ALTER DATABASE DM2

    MODIFY FILE

    (NAME = DM2,

    SIZE = 1375GB)

    GO

    -- current size 1350004736KB

    Jeff Moden (4/29/2014)


    Also, if you're in the FULL recovery model ...

    In SIMPLE MODE.

    Jeff Moden (4/29/2014)


    Also, what kind of table is this? Is it an "audit" or "order detail" type of table by any chance? Do you have a "Date_Created" column on this table and, regardless of type, is it "temporal" and the old rows are mostly static?

    Logs location data , approx 2.5m rows per day.

    6 hours a day we execute the following in a bid to reduce the table size.

    delete top (3000) from location where -- DID is unallocated

    This has been added fairly recently, causing index fragmentation.

    Jeff Moden (4/29/2014)


    As for the new index, everyone on your team realizes that NCIs are a duplication of data that includes all keys, includes, and the PK columns, right? You sure you REALLY want to make an index that turns out to be 9 columns wide (PK includes in the INCLUDEs)? Are you sure that key lookups are going to hurt that bad? Are you sure that your backups can handle the extra load and that you actually have the time to do the backups each night. Are you really, really sure that you want another index and set of stats to maintain on this relatively large table?

    All good questions. Yes we understand the space considerations when adding NCIs. We have an overnight service gathering reporting data. This service timed out on a specific report (location table) after 20 mins if concurrent instances of this reports are running (WCF service appears to allow 5 max) but execute successfully one at a time. The service cursors which i believe is inefficient.

    OPTION (MAXDOP 1) has been set to no effect. NOLOCK (which i understand is not best practice) has been tested to no effect (not blocking issues). Query appears to SCAN. Tunning Advisor recommended the index listed above. Independently the dev team came to the same conclusion.

    Before attempting to add the index its clearer as a result of this post we should investigate:

    - growing the mdf manually.

    - rebuild existing indexes.

    - amend the report service to allow administrative control over the number of concurrent executions and set to 1 for testing.

    - investigate the removal of the cursor / making the query more efficient.

    - add a statistics job to table location (disabled some time ago) , ensure this is run often

    - analyse the query in tuning advisor again

    If the index still needs to be added then:

    - grow the MDF to the appropriate size.

    - min number of columns

    Thank you for taking the time to respond.

    Scott

  • I may have missed some information as this is a long thread but I noticed that you have not set a MAXDOP value when creating your index, if the MAXDOP setting on the server is set to say 1 for example, you will be limiting the index creation statement to only use a single thread.

    I have came across this scenario before and setting the MAXDOP = X value in the index creation statement to a higher value to utilise more CPU has dramatically reduced the index creation time.

    Be sure though to not set the value to high that it affects the overall performance of the SQL instance.

    MCITP SQL 2005, MCSA SQL 2012

  • I strongly recommend looking over the execution plan for the query and do your own investigation and tuning. The DTA is notorious for making poor choices at times and you seem to have multiple indexes from the DTA.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (4/30/2014)


    I strongly recommend looking over the execution plan for the query and do your own investigation and tuning. The DTA is notorious for making poor choices at times and you seem to have multiple indexes from the DTA.

    That is an interesting comment. I had read in the past to avoid creating your own indexes because DTA was more likely to do it correctly.

  • scott_lotus (4/30/2014)


    That is an interesting comment. I had read in the past to avoid creating your own indexes because DTA was more likely to do it correctly.

    I would love to know where you read that (assuming it wasn't on Microsoft documentation about the DTA). I really don't know anyone who suggests using the DTA except in cases where there is literally no one who knows anything about SQL Server and you're desperate for some hope of better performance. Some of its recommendations can even be dangerous for performance. The kindest thing I can say about it is, always thoroughly test its recommendations.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • @scott_lotus,

    I agree with Grant and I believe that you'll find that most of the heavy hitters (on this site, anyway) are very leary of taking DTA at its word. Looking at the width of some of your indexes (all, apparently, DTA recommendations), they appear to be full covering indexes. While that's not necessarily bad, it's also not necessarily good and it may be that less wide indexes with the related key row lookups might not be that bad and could have the advantage of reducing you maintenance windows and table/index footprint without slowing down the related queries in a significant manner.

    Of course, if you have the space and the maintenance window has the time, it all might be a moot point.

    The reason why I asked what type of table this is is to get a feel for the temporal nature of the table. It would appear that this is, indeed, a temporal log table where the rows are inserted and never updated. It may be that your could benefit greatly from partitioning. Since you have the Standard Edition, that would require a partitioned View rather than a partitioned Table, but the effort may very well be worth the effort. Once automated, it'll save on just about all aspects of the table especially if you do something like partitioning by month and placing the table for each month on separate file groups. It would also give you the hidden advantage of being able to do things like piece-meal restores should that need ever raise its ugly head.

    Partitioned Views have the disadvantage of requiring whole row inserts if you want the view to be updateable. You can get by the problem by writing an Instead-Of trigger for INSERTs so that you can still have an identity column on the underlying tables. It may also help your queries through "partition elimination" if one of the main criteria for your query is the date the rows were created. It would also substantially reduce index maintenance time if you truly do write rows and never update them. We could also show you how to "pack" the file groups once you did the empty DID deletes so that you don't have unneccessarily large file groups that reserve a lot of blank space forever.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply