May 10, 2024 at 7:54 am
I've inherited a couple of rather large databases from my ex-colleague when I join this company.
Today, a developer reached out to me to inform me that in that databases there's a couple of large tables which has partition & they would want to partition ALL of the tables in the DB. Bear in mind this is a huge 1TB database.
Upon some deep dive, I notice the partition function's date range are out of date, the last values is back in 2018. So all of the data after Dec 2018 are now kept in the last partition. Seems like this defeats the actual purpose of partitioning. I get back to the developer and mention my findings & inform them that to partition the whole database will take huge effort and resources thus he needs to discuss with his manager first.
My question is this, in what situation will partitioning actually beneficial? My basic understanding is partitioning is good for data maintenance & performance if they frequently needs to query data based on the date ranges. Is there any other scenario where partitioning is beneficial? Appreciate if someone could advice in laymen term.
Furthermore, if there's no housekeeping being done towards data then partitioning will still be meaningless right? As data keeps growing and the number of partitioning will need to be increase all the time.
May 10, 2024 at 10:12 am
https://www.brentozar.com/archive/2012/03/how-decide-if-should-use-table-partitioning/
I'm quite hesitant to use partition because it requires maintance and good queries that support it. Maybe the odd partition swap.
As Brent usually mentions, what is the developer trying to fix?
You can also use partitioned views in some cases https://sqlsunday.com/2014/08/31/partitioned-views/
May 10, 2024 at 1:12 pm
Partitioning, first and foremost, is a data management tool, not a performance tool. We must be up front and clear with this. If you partition any tables, let alone every table (that is without a doubt probably not needed) in order to not suffer negative impacts to performance (your performance will get very bad), you must be able to guarantee that every query will filter by the partitioning key. If you can't guarantee filtering on the partitioning key, you will see scans across partitions, and that is going to hurt performance, a lot.
I would only partition if I had a lot of data, i could guarantee partition elimination, and, most importantly, I needed to regularly remove data, and/or move it around. Rolling dates is a good example. You only have to keep the previous twelve months of data, so once a month, you drop the oldest partition and create a new one.
Partitioning for performance? Nope. Just almost never have I done that. It's frequently just too hard to guarantee the right kinds of queries.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 10, 2024 at 2:19 pm
Jeff will no doubt add something here - do heed his advise.
but as mentioned (and as you yourself raised as an issue) partitioning should only be used when most of the following are met
in majority of cases that people decide on using partitioning is because A - they don't really understand how it works, and B their code is bad and they think partitioning will help performance (It WONT and it will most times make it worst)
May 10, 2024 at 2:57 pm
Actually, partitioning came sometimes accidentally (or coincidentally) help performance. For example, the data was originally clustered by id, but to partition you cluster it first by a date. IF you happen to do the big majority of your lookups by date, then the new partitioning could indeed help performance.
However, you could get the same performance gain simply by properly clustering the data without doing the partitioning at all. Often, that makes it look as though "partitioning helped performance" when in fact it was really just the re-clustering that helped performance.
The single most important factor for best overall performance is best clustering the table. So focus initially on that, and ignore any phony rules such as "always cluster by id." Clustering is far too important to be controlled by trite rule(s). Instead, carefully select each table's clustering based on that table's actual needs and usage.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
May 22, 2024 at 6:26 am
As Joe said, quoting Brent, what problem are the developers trying to fix.
The first thing in any troubleshooting situation is to articulate the problem in neutral terms. If it is described in terms of a potential solution then almost certainly the underlying problem is not understood and the wrong solution will be applied.
Speak to your manager, speak to the developers, and speak to the developer's manager. The aim should be to get the problem defined, so that potential solutions can be proposed and a cost-effective plan 0ut together.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
May 24, 2024 at 12:13 am
All of the other folks have brought up great points on this thread and there's really nothing to add. To emphasize, though...
The bottom line is that most (again... can't bring myself to say "ALL" and be able to guarantee it) partitioned tables are slower than a properly indexed monolith. They can be great for index maintenance, backups, and restores but, like everyone here has been stressing, you have to know what problem you're trying to solve because they have a lot of gazintas and special requirements that frequently make them not worth it. That's probably why the predecessor of the person on this thread did even try to maintain them for, apparently, the last 6 years.
So, back to the original post... it partitioning going to actually be worth it? Wouldn't it just be better to archive some old stuff by moving it to another database that doesn't require nightly full and transaction log backups?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 10, 2024 at 5:23 am
If you really want to see the differences between proper cluster/non-cluster indexing, and Partitioning data management processes, you need a test platform to check the performance indicators for alternative solutions. As one person stated, there were design/implementation flaws causing problems for many years.
DBASupport
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply