When working with large amounts of data in ETL jobs, you can often sink a lot of time into inserting new data into a table and archiving old data out.
For example, imagine we have a reporting table in our data warehouse that stores the last 12 completed months of sales. Each month we remove or archive the oldest month from the table and add the latest complete month to it. The delete operation alone can take hours to run once the data gets large enough, and it often causes heavy locking and blocking, potentially making the whole table inaccessible to other queries. If we partition the data by month, SQL Server will let us remove a whole partition from the table pretty much instantly, and swap a new one in just as quickly from a source table where we’ve pre-loaded the data. All this and next to no blocking!
Now, partitioning does have its drawbacks, and it needs to be a good fit for your scenario before you think about using it. Some potential blockers: it’s Enterprise-only before SQL Server 2016 SP1, and if you’re querying across partitions where SQL Server can’t use partition elimination to limit the partitions it reads, things can get a lot slower.
Let’s take a look at a working example that uses partitioning to swap data in and out of a partitioned table. To set this up we first need a new database, in which we’ll build a table with 12 partitions, one for each month…
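A minimal sketch of that first step. The database name PartitionDemo is just a placeholder I've picked for illustration; any empty database will do.

```sql
-- PartitionDemo is a hypothetical name, not one taken from the original listing.
CREATE DATABASE PartitionDemo;
GO

-- Everything that follows is assumed to run inside this database.
USE PartitionDemo;
GO
```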
We then need a partition function that will put our data into the correct partition depending on its month…
We now need a partition scheme to map the partitions of a partition function to one or more filegroups. In our case we’ll put all our partitions in the same filegroup to keep things simple…
I’ll keep the sales table small by just putting a date and a quantity on it. I’ll also add a computed column for the month, as that’s what we need to pass into the partition function…
For example’s sake, let’s just insert a record per day in 2018 with a random quantity…
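Something along these lines should work. The function name MonthPartitionFunction and the choice of RANGE LEFT are my assumptions; the key point is that eleven boundary values produce twelve partitions.

```sql
-- MonthPartitionFunction is a hypothetical name.
-- The parameter is INT because DATEPART(MONTH, ...) returns INT.
-- RANGE LEFT means each boundary value belongs to the partition on its left:
-- month 1 goes to partition 1, month 2 to partition 2, ..., month 12 to partition 12.
CREATE PARTITION FUNCTION MonthPartitionFunction (INT)
AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11);
GO

-- $PARTITION reports which partition a given value would land in.
SELECT $PARTITION.MonthPartitionFunction(1)  AS PartitionForJanuary,   -- 1
       $PARTITION.MonthPartitionFunction(12) AS PartitionForDecember;  -- 12
```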
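As a sketch, assuming a partition function called MonthPartitionFunction already exists (both names here are placeholders), mapping every partition to the PRIMARY filegroup looks like this:

```sql
-- MonthPartitionScheme and MonthPartitionFunction are hypothetical names.
-- ALL TO sends every partition to the same filegroup.
CREATE PARTITION SCHEME MonthPartitionScheme
AS PARTITION MonthPartitionFunction
ALL TO ([PRIMARY]);
```

In a real warehouse you might spread partitions over several filegroups instead, by listing one filegroup per partition in place of ALL TO.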
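A possible shape for the table, assuming a partition scheme named MonthPartitionScheme that takes an INT month. The column names are my own guesses; only the DaySales table name comes from later in this post.

```sql
-- Column names and MonthPartitionScheme are assumptions for illustration.
CREATE TABLE dbo.DaySales
(
    SalesDate  DATE NOT NULL,
    Quantity   INT  NOT NULL,
    -- A computed column used for partitioning must be PERSISTED.
    -- DATEPART(MONTH, ...) returns INT, matching the partition function's parameter.
    SalesMonth AS DATEPART(MONTH, SalesDate) PERSISTED
) ON MonthPartitionScheme (SalesMonth);
```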
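One way to do this, assuming a dbo.DaySales table with SalesDate and Quantity columns (column names are my assumption), is a recursive CTE that walks through the year a day at a time:

```sql
-- Generate one row per day of 2018.
WITH Days AS
(
    SELECT CAST('20180101' AS DATE) AS SalesDate
    UNION ALL
    SELECT DATEADD(DAY, 1, SalesDate)
    FROM Days
    WHERE SalesDate < '20181231'
)
INSERT INTO dbo.DaySales (SalesDate, Quantity)
SELECT SalesDate,
       ABS(CHECKSUM(NEWID()) % 100) + 1   -- pseudo-random quantity from 1 to 100
FROM Days
OPTION (MAXRECURSION 366);                -- the default limit of 100 is too low for a year
```

A numbers table would do the job just as well; the CTE simply keeps the example free of extra objects.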
You can then view the partitions we’ve created and how many rows they have in them by querying sys.partitions…
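A query like the following would do it, assuming the table is called dbo.DaySales:

```sql
SELECT p.partition_number,
       p.rows                              -- row count per partition (documented as approximate)
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.DaySales')
  AND p.index_id IN (0, 1)                 -- heap or clustered index only, to avoid double counting
ORDER BY p.partition_number;
```

With one row per day of 2018 loaded, you'd expect twelve rows back, one per month.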
Let’s pretend we’re loading in data for January 2019 and, as part of this process, want to first remove or archive the data from January 2018. As I mentioned before, large deletes can take time and cause a lot of blocking, but because we’re using partitioned tables we can either truncate a partition or swap one out almost instantly. On SQL Server 2016+ we have support for truncating a partition like this…
With 1 being the number of the partition we want to truncate. If you’re running an earlier version of SQL Server, or want to archive rather than delete, you can instead move the partition out into a new table…
At this point the data for January 2018 has been removed or archived, so we’re ready to swap in January 2019. To do that, our source table needs to be on the same filegroup and have a constraint that prevents it from holding any data that doesn’t belong in the partition we’re swapping into. Without the constraint the swap will error, as SQL Server will not allow the possibility of swapping in data that does not fit the partition according to the partition function.
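Assuming the partitioned table is dbo.DaySales and January sits in partition 1, the statement would look like this:

```sql
-- Requires SQL Server 2016 or later.
-- Removes every row in partition 1 and leaves the other partitions untouched.
TRUNCATE TABLE dbo.DaySales WITH (PARTITIONS (1));
```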
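Here's a sketch of the switch-out. DaySalesArchive is a name I've made up, and the column list assumes DaySales has a date, a quantity and a persisted computed month. The target has to be empty, have the same structure as the partitioned table, and live on the same filegroup as the partition being moved.

```sql
-- DaySalesArchive and the column names are hypothetical.
CREATE TABLE dbo.DaySalesArchive
(
    SalesDate  DATE NOT NULL,
    Quantity   INT  NOT NULL,
    SalesMonth AS DATEPART(MONTH, SalesDate) PERSISTED
) ON [PRIMARY];

-- A metadata-only operation: the rows are not physically copied.
ALTER TABLE dbo.DaySales SWITCH PARTITION 1 TO dbo.DaySalesArchive;
```

After the switch, partition 1 of DaySales is empty and the January 2018 rows live in the archive table.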
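The switch-in might look something like the sketch below. The NewData and DaySales table names come from this post; the column names, the constraint name and the sample load are my assumptions. It also assumes partition 1 of DaySales is already empty, since the target partition of a switch has to be.

```sql
-- Same structure and filegroup as the partitioned table, plus a CHECK constraint
-- proving that every row belongs in partition 1 (month 1).
CREATE TABLE dbo.NewData
(
    SalesDate  DATE NOT NULL,
    Quantity   INT  NOT NULL,
    SalesMonth AS DATEPART(MONTH, SalesDate) PERSISTED,
    CONSTRAINT CK_NewData_SalesMonth CHECK (SalesMonth = 1)   -- hypothetical constraint name
) ON [PRIMARY];

-- Pre-load January 2019, one row per day with a pseudo-random quantity.
WITH Days AS
(
    SELECT CAST('20190101' AS DATE) AS SalesDate
    UNION ALL
    SELECT DATEADD(DAY, 1, SalesDate) FROM Days WHERE SalesDate < '20190131'
)
INSERT INTO dbo.NewData (SalesDate, Quantity)
SELECT SalesDate, ABS(CHECKSUM(NEWID()) % 100) + 1
FROM Days;

-- Swap the pre-loaded rows into the empty first partition.
ALTER TABLE dbo.NewData SWITCH TO dbo.DaySales PARTITION 1;
```

If DaySales had indexes, NewData would need matching ones before the switch would be accepted.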
All the data from NewData is now in the first partition for our DaySales table. For large warehouse tables this process can MASSIVELY speed up load and archiving processes and prevent excessive blocking/locking.