May 10, 2021 at 3:42 pm
Hi
I have a package that runs weekly
It has an intermittant problem
There are a number of tables that have a reasonable amount of data in
Once built indexes are added. This includes clustered
Around 1 in every 3 times, one specific table takes around 8-12 hours to add the clustered index
A typical build would be around 15 mins
There's only 80M rows and the index is narrow
On investigation, LATCH_EX is showing as the wait type for the duration
Is there a better approach when you see LATCH_EX as a wait type
I'm thinking leave the CIX on (this is something i've been looking at else where as I've been advised to test - although that was a seperate issue and did not have LATCH_EX waits)
OR pre-order by the clustered index i.e. load in that order then apply the cluster
OR something else?
I haven't really seen this when adding a clustered index before
Thanks
- Damian
May 10, 2021 at 3:49 pm
Create the clustered index only before loading the table. Create the non-clustered indexes after loading the table.
If this is a permanent table, seriously consider using data compression. Although it takes CPU and elapsed time to compress the data, often the I/O saved makes the table still load faster and you still get the benefits of much faster reads from the table from then on.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 10, 2021 at 3:53 pm
Great, thanks
I will give that a go
- Damian
May 11, 2021 at 11:43 am
A quick note on why creating a clustered index first and then loading the data is usually the best approach (testing is always your best friend). The clustered index defines the data storage. It's not simple an index. A clustered index is the table. So, when you create a table, without a clustered index, you're defining what is called a heap. It stores the data one way. When you then tell that table, now you have a clustered index, you're not "adding" an index. Instead, you're telling that table, now you ARE a clustered index. This means that the data has to be moved. It's simply adding a thing on the side. It's literally moving the data out of the heap and into the cluster. So, you're going to see lots and lots of IO, and possibly lots and lots of contention for researches, making that process really painful.
Hope that helps a little.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 11, 2021 at 5:42 pm
Ummm.... ok... I want to remind you that NOTHING is a panacea! And, with that, I have a couple of questions...
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2021 at 2:57 pm
Thanks
In answer to your question:
1. The source is another table in another database and I am using a dataflow task to move it from one table to another. The source is a Stored procedure
So, within SSIS, it’s split in 3. One task to drop the index, one task to extract the data and one task to create the index
It’s the index creation that seems to be the issue here
2. Full recovery model
3. Not really
4. The data is not updated; it’s read for the next phase
5. Wholesale replacement
6.
CREATE TABLE [InboundExp].[CreditAccountHeader](
[CreditAccountHeaderID] [bigint] NOT NULL,
[PersonID] [bigint] NOT NULL,
[AccountTypeID] [smallint] NOT NULL,
[CompanyTypeID] [smallint] NOT NULL,
[SpecialInstructionID] [smallint] NOT NULL,
[SourceCode] [int] NULL
) ON [PRIMARY]
GO
CREATE CLUSTERED INDEX [IXC_CreditAccountHeader] ON [InboundExp].[CreditAccountHeader]
(
[PersonID] ASC,
[CreditAccountHeaderID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
- Damian
May 12, 2021 at 6:21 pm
K. Thanks. One of your answers begs another question. Is that "other" database on the same instance?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2021 at 6:43 pm
Yes, same instance
Thanks
- Damian
May 12, 2021 at 7:28 pm
I would recommend the following:
The load will cause the clustered index to fragment - but since you will be rebuilding all indexes after the load that shouldn't cause any issues. But - you could output the data from the source system in clustered index order and that may help. Only way to be sure would be to test, and compare the increase in how long it takes to extract the data sorted - as it takes to rebuild.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 17, 2021 at 9:07 am
This was removed by the editor as SPAM
May 17, 2021 at 12:48 pm
I'll try to get back to this later today but... the load should not actually cause the Clustered Index to fragment... if I'm reading things right, it's a wholesale load/replacement.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 17, 2021 at 2:39 pm
Thanks
Yes, it's wholesale
Fragmentation post population does not appear to be an issue, so doesn't need rebuilding (99.4%)
If I leave the CIX in place (i.e. don not drop and recreate) and load using my normal SP, it takes forever (cancelled after one hour at around 20% through)
If I leave the CIX in place and tweak my normal SP to order by the CIX order, it takes around 10% longer (no great deal - 16 mins vs 18 mins); This comparison is against the original drop , populate, recreate method (when it works fine)
If I leave as is, it sometimes sits waiting (LATCH_EX) for hours - initial problem
One further issue - I cannot reproduce this on my development server!
So, do I take the marginal increase as a reliability expense against no waits or am I missing something else here?
Thanks
- Damian
May 17, 2021 at 2:51 pm
1. The source is another table in another database and I am using a dataflow task to move it from one table to another. The source is a Stored procedure
There must be some blocking / other delay in reading the data from that source table. Look for some activity on that table that is slowing things down.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 17, 2021 at 2:54 pm
I thought that initially Scott
but, there are 2 seperate tasks both indepdant and both sequential; 1 to populate the table then a second to create the index
It's waiting on the index create task by which time any interaction with anoth DB has taken place
- Damian
May 17, 2021 at 3:00 pm
Hmm, interesting. For the nonclustered index(es)? For a clus index create, I could see delays as SQL would have to sort the entire contents of the table.
What is the 'cost threshold for parallelism' setting? Is it at least 40? Sometimes SQL has huge waits if it tries to parallel a task that shouldn't be. 80M rows presumably could gain from parallelism, but not necessarily, and frankly SQL has some quirks in this area.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply