Dealing with super huge table which is partially partitioned.

Question

Post reply

Dealing with super huge table which is partially partitioned.

JasonO

SSC-Addicted

Points: 499
More actions
February 14, 2023 at 9:31 am

#4149414

I'm dealing with 1 production database which is very large, roughly about 17TB in size.
The problem is there are a couple of super large tables in the server which are partially partitioned.
During the initial stage of the database, partitions are being created to those tables however somehow it stopped. The last partition which I'm able to see was back in 2011. Since then all datas moving forward are only kept on a single partition! The biggest table was having close to 20Bil rows!!!
I do not know much about the history of the DB as most of the team which manages the database has long gone. Now we're left to clean up this mess which I've not much idea on how to move along other than to create partitions and move the data back into it's partition. However this method do come with a very heavy price to pay since I'm guessing the database will need to sort each data into their partitions.
Another way is to totally abandon the existing table by renaming it then create a duplicate table with proper partitioning in placed & all the new data will be inserted into this new table.
Is there any other way for me to move forward on this?

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Jo Pattyn SSC-Dedicated Points: 32641 More actions · Answer 1

Not familiar with partitions, but partitioned views seem interesting to minimize downtime.

https://www.sqlshack.com/sql-server-partitioned-views/

Moving to a new partition with split range https://medium.com/@MadhavanR51/add-new-partition-range-to-the-sql-server-partitioned-table-split-partition-range-60e197c2f73d

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 2

wrote:

Dealing with a super huge table that is partially partitioned can be a challenging task for many database administrators and developers. However, with the right approach and tools, you can effectively manage this type of data structure and make the most of its potential benefits.
One of the key things to consider when dealing with a partially partitioned table is to understand the partitioning scheme and how it affects your queries. Make sure to know the column(s) used for partitioning and how data is distributed across the partitions. This will help you optimize your queries for better performance by taking advantage of partition pruning.
Another important factor to consider is the storage engine used for the table. Some storage engines like MyISAM do not support partitioning, while others like InnoDB have limited support. If you are using a storage engine that does not support partitioning, you may need to consider switching to a different one to take full advantage of partitioning.
It's also important to regularly monitor the growth of the table and its partitions. As the table grows, you may need to add more partitions to ensure that each partition is of manageable size. This will help to avoid performance issues and ensure that your queries run efficiently.
Finally, be sure to maintain the table and its partitions by regularly performing maintenance tasks such as optimizing the table, checking for corruption, and repairing any damaged partitions.
In conclusion, dealing with a super huge table that is partially partitioned can be a complex task, but with proper planning and the right tools, you can effectively manage this type of data structure and achieve better performance and scalability.

If you get your answers from ChatGPT you should say so.

https://depositphotos.com/38269889/stock-photo-beautiful-young-woman.html

This reply was modified 3 years ago by Jonathan AC Roberts. Reason: link updated

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 3

Jonathan AC Roberts wrote:

If you get your answers from ChatGPT you should say so.

Ha! I was just coming down to say something similar. I suspect we're going to be seeing a LOT of this from now on.

Ugh!

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Grant Fritchey SSC Guru Points: 399381 More actions · Answer 4

If you've been running without partitions for 12 years or so, why add them back? Partitions are all about data management, not performance. If you're not dealing with all the data movement of partitions, dropping them as they age out, adding them as new stuff comes in, why are they even there and why are you worrying about them at all?

As far as performance goes, what everything thinks will get them improved behaviors with partitions is partition elimination. Problem is, partition elimination is hard to guarantee for lots of workloads. If you can't guarantee partition elimination in your queries, then you get partition scans. These are worse than regular scans.

However, no, the only way to get partitions is to move data into them, either at the point of collection, or after the fact. So yeah, big, ugly task in front of you if you want to reimplement them. I'd suggest, instead, you just start adding them since you've been without them up to this point. But, great big but, only if you can guarantee partition elimination in your queries. Further, only if you're dropping partitions too. You don't want to only add, forever. The whole idea of partitions are moving data windows, say, for example, only storing the last five years of data, partitioned by month (because we only, ever, query the data by month), where in we drop the last partition as each new month arrives. If you're not doing something along these lines, again, partitioning is not the way to go.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 5

Grant Fritchey wrote:

Jonathan AC Roberts wrote:
If you get your answers from ChatGPT you should say so.
Ha! I was just coming down to say something similar. I suspect we're going to be seeing a LOT of this from now on.
Ugh!

Openai have an app where you can test text to see if it was written by AI: https://platform.openai.com/ai-text-classifier

AI answers always seems to talk in the third person and never about itself it also always has an "In conclusion" or "In summary" to end. It also repeats the main subject of the question many times in the answer. The answer starts and ends with paragraphs that contain "dealing with a super huge table that is partially partitioned" Also the word "partition" occurs 15 times in the answer.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

Jonathan AC Roberts wrote:

wrote:
Dealing with a super huge table that is partially partitioned can be a challenging task for many database administrators and developers. However, with the right approach and tools, you can effectively manage this type of data structure and make the most of its potential benefits.
One of the key things to consider when dealing with a partially partitioned table is to understand the partitioning scheme and how it affects your queries. Make sure to know the column(s) used for partitioning and how data is distributed across the partitions. This will help you optimize your queries for better performance by taking advantage of partition pruning.
Another important factor to consider is the storage engine used for the table. Some storage engines like MyISAM do not support partitioning, while others like InnoDB have limited support. If you are using a storage engine that does not support partitioning, you may need to consider switching to a different one to take full advantage of partitioning.
It's also important to regularly monitor the growth of the table and its partitions. As the table grows, you may need to add more partitions to ensure that each partition is of manageable size. This will help to avoid performance issues and ensure that your queries run efficiently.
Finally, be sure to maintain the table and its partitions by regularly performing maintenance tasks such as optimizing the table, checking for corruption, and repairing any damaged partitions.
In conclusion, dealing with a super huge table that is partially partitioned can be a complex task, but with proper planning and the right tools, you can effectively manage this type of data structure and achieve better performance and scalability.
If you get your answers from ChatGPT you should say so.
spam link removed from here/

Jonathan... we normally mark such posts as your as spam when they contain a non-database related link, such as the one in your post above.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 7

Jeff Moden wrote:

Jonathan AC Roberts wrote:
wrote:
Dealing with a super huge table that is partially partitioned can be a challenging task for many database administrators and developers. However, with the right approach and tools, you can effectively manage this type of data structure and make the most of its potential benefits.
One of the key things to consider when dealing with a partially partitioned table is to understand the partitioning scheme and how it affects your queries. Make sure to know the column(s) used for partitioning and how data is distributed across the partitions. This will help you optimize your queries for better performance by taking advantage of partition pruning.
Another important factor to consider is the storage engine used for the table. Some storage engines like MyISAM do not support partitioning, while others like InnoDB have limited support. If you are using a storage engine that does not support partitioning, you may need to consider switching to a different one to take full advantage of partitioning.
It's also important to regularly monitor the growth of the table and its partitions. As the table grows, you may need to add more partitions to ensure that each partition is of manageable size. This will help to avoid performance issues and ensure that your queries run efficiently.
Finally, be sure to maintain the table and its partitions by regularly performing maintenance tasks such as optimizing the table, checking for corruption, and repairing any damaged partitions.
In conclusion, dealing with a super huge table that is partially partitioned can be a complex task, but with proper planning and the right tools, you can effectively manage this type of data structure and achieve better performance and scalability.
If you get your answers from ChatGPT you should say so.
spam link removed form here/
Jonathan... we normally mark such posts as your as spam when they contain a non-database related link, such as the one in your post above.

It is just a link that has the same photo as the avatar of person who posted the message taken from ChatGPT.

woman-with-great-hair-1506380402

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

Grant Fritchey wrote:

If you've been running without partitions for 12 years or so, why add them back? Partitions are all about data management, not performance. If you're not dealing with all the data movement of partitions, dropping them as they age out, adding them as new stuff comes in, why are they even there and why are you worrying about them at all?
As far as performance goes, what everything thinks will get them improved behaviors with partitions is partition elimination. Problem is, partition elimination is hard to guarantee for lots of workloads. If you can't guarantee partition elimination in your queries, then you get partition scans. These are worse than regular scans.
However, no, the only way to get partitions is to move data into them, either at the point of collection, or after the fact. So yeah, big, ugly task in front of you if you want to reimplement them. I'd suggest, instead, you just start adding them since you've been without them up to this point. But, great big but, only if you can guarantee partition elimination in your queries. Further, only if you're dropping partitions too. You don't want to only add, forever. The whole idea of partitions are moving data windows, say, for example, only storing the last five years of data, partitioned by month (because we only, ever, query the data by month), where in we drop the last partition as each new month arrives. If you're not doing something along these lines, again, partitioning is not the way to go.

All I can say to that is + 1 Million!!!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 9

Jonathan AC Roberts wrote:

It is just a link that has the same photo as the avatar of person who posted the message taken from ChatGPT.

I'm thinking that may have been their goal to begin with. I'd remove that link because it has zero to do with databases or even ChatGPT and everything to do with hair-care advertising.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 10

Jeff Moden wrote:

Jonathan AC Roberts wrote:
It is just a link that has the same photo as the avatar of person who posted the message taken from ChatGPT.
I'm thinking that may have been their goal to begin with. I'd remove that link because it has zero to do with databases or even ChatGPT and everything to do with hair-care advertising.

I don't think it anything to do with hair products, that's just a stock photo that people can use on their website. It's more they just pick a photo of a good looking smiling girl with beautiful hair as they think it will get more attention from the men answering questions.

I've updated the link so it points to the stock photo site where it came from instead of a site that used it.

This reply was modified 3 years ago by Jonathan AC Roberts. Reason: I've updated the link so it points to the stock photo site where it came from instead of a site that used it

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 11

Interesting... the first time I clicked on it, it took me to a page selling hair-care products with that picture on it. I just clicked on it again and, you're correct... just a picture.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 12

Jeff Moden wrote:

Interesting... the first time I clicked on it, it took me to a page selling hair-care products with that picture on it. I just clicked on it again and, you're correct... just a picture.

I just did a Google Reverse Image search on his avatar and found a link to it and put it in. When you complained about it potentially advertising hair products I picked a different link from Google Reverse Image search that went to a stock photo site that had his image and replaced the link with that.

JasonO SSC-Addicted Points: 499 More actions · Answer 13

Grant Fritchey wrote:

If you've been running without partitions for 12 years or so, why add them back? Partitions are all about data management, not performance. If you're not dealing with all the data movement of partitions, dropping them as they age out, adding them as new stuff comes in, why are they even there and why are you worrying about them at all?
As far as performance goes, what everything thinks will get them improved behaviors with partitions is partition elimination. Problem is, partition elimination is hard to guarantee for lots of workloads. If you can't guarantee partition elimination in your queries, then you get partition scans. These are worse than regular scans.
However, no, the only way to get partitions is to move data into them, either at the point of collection, or after the fact. So yeah, big, ugly task in front of you if you want to reimplement them. I'd suggest, instead, you just start adding them since you've been without them up to this point. But, great big but, only if you can guarantee partition elimination in your queries. Further, only if you're dropping partitions too. You don't want to only add, forever. The whole idea of partitions are moving data windows, say, for example, only storing the last five years of data, partitioned by month (because we only, ever, query the data by month), where in we drop the last partition as each new month arrives. If you're not doing something along these lines, again, partitioning is not the way to go.

Yes, basically the database has been running without partition for 12 years but the most surprising fact is none of the application guys are aware of this. They had been informed that the database for their application has partition table implemented. It's only now when they decided to perform housekeeping that they realize the database has been without partition for 12 years.

Now, they're facing with the issue of slowness in dropping old data as well as temp DB issue due to index eager spool.

Somehow this issues are all inter-related to each other which makes troubleshooting and pin pointing the exact cause problematic.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 14

I'm curious... has anyone checked to see if any of that 17TB can be archived and moved out of the main table?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)