May 22, 2013 at 8:05 am
GilaMonster (5/22/2013)
I would disagree there, stats that are more than a day old and have a single row change absolutely do not need updating. That's almost as bad as blanket updating everything.
It also shows the rows inserted / updated / deleted since the last statistics update, which can give a good idea of which tables need a statistics update, instead of blanket updating everything.
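For anyone who wants to run that check directly, here's a minimal sketch. It assumes SQL Server 2008 R2 SP2 or 2012 SP1 and later, where sys.dm_db_stats_properties is available:

-- List each statistic with the number of rows modified since it was last
-- updated, so only the stats that actually need attention get touched.
SELECT  OBJECT_NAME(s.object_id)  AS table_name,
        s.name                    AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;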
May 22, 2013 at 8:20 am
Bhaskar.Shetty (5/22/2013)
GilaMonster (5/22/2013)
I would disagree there, stats that are more than a day old and have a single row change absolutely do not need updating. That's almost as bad as blanket updating everything.
It also shows the rows inserted / updated / deleted since the last statistics update, which can give a good idea of which tables need a statistics update, instead of blanket updating everything.
Yes, however your post made no mention of making an educated decision based on the number of rows changed. It said "these tables require a statistics update with full scan", which is not necessarily true and is likely to mislead someone new to SQL who doesn't have the background to understand the nuances of statistics maintenance.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 22, 2013 at 10:43 am
Have any of the queries changed? Is there maybe an unindexed column in a WHERE clause now? Maybe one index needs a covering column added?
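If one of those questions turns up a culprit, the usual fix is a covering index. A hypothetical sketch, with all table, column, and index names invented for illustration:

-- Key the index on the newly-filtered column and carry the selected
-- columns along via INCLUDE so the query avoids key lookups.
CREATE NONCLUSTERED INDEX IX_TallTable_Status
    ON dbo.TallTable (Status)
    INCLUDE (Created, dType);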
May 22, 2013 at 1:49 pm
I have no idea (maybe someone else here does), but your last 400 million rows would put your DID into negative numbers, assuming it is not an unsigned int. Could that be causing a problem?
May 22, 2013 at 2:10 pm
John_P (5/22/2013)
I have no idea (maybe someone else here does), but your last 400 million rows would put your DID into negative numbers, assuming it is not an unsigned int. Could that be causing a problem?
If an identity hits max INT, it doesn't loop round and start at negative numbers; instead any further inserts fail with an arithmetic overflow error (and SQL Server's INT is always signed).
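A quick way to see this for yourself, as a minimal sketch that's safe to run anywhere you can create a temp table:

-- Seed the identity two values short of the INT maximum.
CREATE TABLE #MaxIntTest (ID INT IDENTITY(2147483646, 1), Val CHAR(1));
INSERT INTO #MaxIntTest (Val) VALUES ('a');  -- gets 2147483646
INSERT INTO #MaxIntTest (Val) VALUES ('b');  -- gets 2147483647, the maximum
INSERT INTO #MaxIntTest (Val) VALUES ('c');  -- fails with an arithmetic overflow error
DROP TABLE #MaxIntTest;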
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
May 23, 2013 at 10:22 am
Eric M Russell (5/22/2013)
Abu Dina (5/21/2013)
We have a tall table that contains 2.6 billion rows. Table structure:
The application which uses this table has been running slow for the last couple of days and it seems to have happened following the addition of about 400 million rows last weekend.
I think it's because of index fragmentation, although I'm not sure how to check whether this is the case without affecting the application.
So my first question is, how do I check to see if the indexes are fragmented and whether the stats need updating on such a large table?
I notice that your clustered index, IX_dType, is on the dType column. Why was that column chosen to cluster the table? Especially for tables with a large number of rows, you typically want to cluster on a column with unique, sequential values. There could very well be fragmentation.
I don't know; I didn't create the table, but now I'm looking after it! :w00t:
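For the question quoted above about checking fragmentation without affecting the application: sys.dm_db_index_physical_stats in LIMITED mode reads only the upper levels of each index, so it's comparatively cheap even on a table this size. A sketch, with dbo.TallTable standing in for the real table name (which isn't given in the thread):

SELECT  i.name  AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.TallTable'),
            NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
  ON    i.object_id = ips.object_id
 AND    i.index_id  = ips.index_id;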
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media Lens
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
May 23, 2013 at 7:34 pm
Abu Dina (5/21/2013)
We have a tall table that contains 2.6 billion rows. Table structure:
Gosh. A couple of people have already mentioned things that are seriously wrong here, but let me summarize what I see wrong, even if some of it repeats what's been said.
1. 2.6 billion rows using an INT for a PK. Most people don't have the foresight to start such a column at -2,147,483,648, and that's about the only way this table could have 2.6 billion rows in it. If you're going to keep adding 400 million rows on a regular basis, now would be a good time to convert the PK column (DID) to a BIGINT (heh... while the table is still "small" 😛 ).
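A sketch of what that conversion involves, with the table and constraint names assumed (only DID comes from the thread). Widening the column is a size-of-data operation on a table this big, so it needs a maintenance window:

-- First, check how much INT headroom is actually left.
SELECT IDENT_CURRENT('dbo.TallTable') AS current_identity,
       2147483647 - IDENT_CURRENT('dbo.TallTable') AS values_remaining;

-- The PK constraint must be dropped before the column can be widened
-- (any foreign keys referencing it would need the same treatment).
ALTER TABLE dbo.TallTable DROP CONSTRAINT PK_TallTable;
ALTER TABLE dbo.TallTable ALTER COLUMN DID BIGINT NOT NULL;
ALTER TABLE dbo.TallTable ADD CONSTRAINT PK_TallTable
    PRIMARY KEY NONCLUSTERED (DID);  -- nonclustered: the clustered index lives elsewhere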
2. The clustered index is on the DTYPE column. I suspect relatively few unique values go into that column. If you added 400 million rows, you will likely have caused huge page splits throughout the table, which means you have to read far more pages than necessary, because many of the pages may only be half full after the splits. To wit, I strongly agree that the clustered index is on the wrong column. I don't know whether your DID column is an IDENTITY column (or similar in operation) or whether the Created column appears in many queries but, if they are, I'd be tempted to make a UNIQUE clustered index on the Created and DID columns, in that order (a sketch follows just below). Every index will benefit from the uniqueness of the clustered index.
The rule of thumb for a clustered index is "Narrow, ever increasing, and unique". If you build it correctly, you should almost never have to defrag the clustered index.
It would also be highly beneficial if the Created column were based on GETDATE(), just to keep the "ever increasing" thing going.
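A sketch of that index change. The table name is assumed; IX_dType is the existing clustered index name mentioned earlier in the thread. DROP_EXISTING changes the key columns in a single rebuild, so the nonclustered indexes are only rebuilt once rather than twice:

CREATE UNIQUE CLUSTERED INDEX IX_dType
    ON dbo.TallTable (Created, DID)
    WITH (DROP_EXISTING = ON, SORT_IN_TEMPDB = ON);
-- DROP_EXISTING requires keeping the old name; rename afterwards if desired:
-- EXEC sp_rename 'dbo.TallTable.IX_dType', 'CIX_TallTable_Created_DID', 'INDEX';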
3. I don't know if the table is partitioned or not. For the sake of easier index maintenance, you should strongly consider it if it's not (a sketch follows below). It'll take some planning, and maybe some code changes if the DID column is an IDENTITY column, but it will absolutely be worth it insofar as index maintenance goes. With a bit more planning, you can also make it so you only need to back up a very small portion of the table, and you'll suddenly be able to do that on a nightly basis with point-in-time recovery. It'll also allow you (if you use a partitioned view across two databases) to restore the more recent parts of the table separately in case of a DR event, to get the system back online much more quickly than trying to restore a 2.6 billion row table in one fell swoop.
Also, keep in mind that partitioning isn't done for performance. The only things partitioning has to do with performance are that 1) it'll make index management a whole lot faster, so it can be done every night if necessary, and 2) properly managed indexes will help performance. Depending on which method of partitioning you choose, it may also reduce backup times, DR restore times, and tape space.
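A minimal sketch of what a date-based partitioning setup looks like. The boundary values, names, and single-filegroup mapping are all illustrative; a real design needs capacity planning first:

-- Monthly ranges on the Created column (assumed to be DATETIME).
CREATE PARTITION FUNCTION pf_ByMonth (DATETIME)
    AS RANGE RIGHT FOR VALUES ('20130301', '20130401', '20130501');

CREATE PARTITION SCHEME ps_ByMonth
    AS PARTITION pf_ByMonth ALL TO ([PRIMARY]);

-- The clustered index would then be built ON ps_ByMonth(Created), after
-- which index maintenance can target one partition at a time, e.g.:
-- ALTER INDEX IX_dType ON dbo.TallTable REBUILD PARTITION = 2;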
4. I can't tell from your graphic, but it would also be a good thing to find out whether there are any triggers on the table. Those won't hurt SELECTs, but they could really put the whammy on INSERTs, UPDATEs, and DELETEs.
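Checking for triggers is straightforward from the catalog views; a sketch, with the table name assumed:

SELECT  t.name, t.is_disabled, t.is_instead_of_trigger, te.type_desc
FROM    sys.triggers AS t
JOIN    sys.trigger_events AS te ON te.object_id = t.object_id
WHERE   t.parent_id = OBJECT_ID('dbo.TallTable');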
5. Consider archiving some of the data in the table. Do you REALLY need all 2.6 billion rows to be accessible on an instantaneous basis? Even if you do, using partitioned views will allow you to move a great deal of the data out of the main database and into an archive database. Again, you have to plan for this carefully, especially if you want to do INSERTs via the partitioned view.
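A bare-bones sketch of such a partitioned view. All object names are assumed; only the DID, dType, and Created columns come from the thread. For the view to be updatable and for the optimizer to skip member tables a query can't touch, each member table needs a CHECK constraint on the partitioning column, and that column must be part of the primary key:

CREATE VIEW dbo.TallTable_All
AS
SELECT DID, dType, Created  -- plus the remaining columns
FROM   ArchiveDB.dbo.TallTable_2012   -- CHECK (Created <  '20130101')
UNION ALL
SELECT DID, dType, Created
FROM   dbo.TallTable_Current;         -- CHECK (Created >= '20130101')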
Of course, this is all just an opinion on my part and isn't complete (they're suggestions to look into, not complete solutions). I just wanted to give you some things to look at that I've had pretty good success with in the past.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2013 at 10:20 am
Jeff, thanks so much for your ideas and suggestions. I've spent a few days on this problem and the table has been restructured and a new indexing strategy has been implemented.
I'm glad to report that performance has significantly improved.
Unfortunately I'm a little busy at the moment, so I can't go into too much detail about what I've done, but I will keep this thread alive and post back with more detail. Watch this space!
Once again thank you and thanks to all who have contributed to this thread. I couldn't have done this without you guys! 😎
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media Lens
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn