October 30, 2015 at 4:18 am
Okay, yesterday I was in a bit of a panic and made the post Partition Sliding window causing loads of... in haste without thinking it through, so that topic can be ignored; I've since managed to rest, calm down and think.
Here's the situation: we have a database for our warehouse where machines are tested before being shipped off to clients; these machines are either newly built or have come back for repairs. Each machine gets its own table, named after the warehouse followed by a sequence number that's unique to that machine. The database is starting to get sluggish, so we want to implement a sliding-window partition containing 2 partitions, Archive & Current, with Current containing at most the past 45 days.
The issue, like I said, is new machines: on an almost 24/7 basis new tables can be added, or an old table that hasn't been touched in years gets new data.
On a weekly basis a job checks the tables' compression levels, and every 4 hours a backup of the transaction log is taken.
There is a 0.00 margin for downtime :crying:
So this is a RANGE RIGHT partition; I've done the following:
CREATE PARTITION FUNCTION [WAREHOUSE_P_F](datetime) AS RANGE RIGHT FOR VALUES (N'2015-09-08T00:00:00.000')
CREATE PARTITION SCHEME [WAREHOUSE_S_F] AS PARTITION [WAREHOUSE_P_F] TO ([FG_DATA_001], [FG_DATA_CURRENT])
Now I've been moving the tables over, starting with the smallest plus a few big ones, checking locks, transaction logs and the file growth of FG_DATA_CURRENT.
Now comes my problem with the implementation of the sliding window: the procedure I have takes too long, blocks too many processes, and gets deadlocked itself after a while.
This was not an issue during development of the setup, and my request back then for a proper, realistic test environment fell on deaf ears.
I've seen some potential solutions, but they all used staging tables (and they were also dealing with a single table), so has anyone got an idea what I could do?
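For reference, the textbook sliding-window move for the boundary of a partition function like this one looks roughly as follows. This is only a sketch: the new boundary date is made up, and the warning comment is the crux, because it's exactly this data movement that produces the blocking and deadlocking described above.

```sql
-- Sketch: sliding the 45-day boundary of WAREHOUSE_P_F forward.
-- WARNING: SPLIT and MERGE on non-empty partitions physically move
-- rows, are fully logged, and take exclusive locks on every aligned
-- table -- which matches the blocking/deadlocking described above.

-- Tell the scheme which filegroup the partition created by the
-- next SPLIT should use.
ALTER PARTITION SCHEME [WAREHOUSE_S_F] NEXT USED [FG_DATA_001];

-- Add a new boundary 45 days back from "today" (date made up here);
-- rows between the old and new boundary leave the Current partition.
ALTER PARTITION FUNCTION [WAREHOUSE_P_F]() SPLIT RANGE (N'2015-09-19T00:00:00.000');

-- Remove the old boundary, collapsing the two older partitions so
-- only Archive and Current remain.
ALTER PARTITION FUNCTION [WAREHOUSE_P_F]() MERGE RANGE (N'2015-09-08T00:00:00.000');
```

Note that because every table sharing the function is processed in one shot, a single SPLIT/MERGE here touches all the machine tables at once.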
October 30, 2015 at 5:04 am
Resender (10/30/2015)
Before proceeding any further,
how many rows are in the table?
What size is it?
What gains do you expect from partitioning?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 3, 2015 at 3:52 am
Before proceeding any further,
ChrisM@Work (10/30/2015): how many rows are in the table?
The largest table contains over 1.5 million records, the smallest about 16, and there are hundreds upon hundreds that are currently empty.
ChrisM@Work (10/30/2015): What size is it?
I currently don't have the figures or the time, but based on our estimates we would need a minimum of 54GB of disk for all the data from the past 45 days, and it's been over 1,600 days since this database first came into use with this system.
ChrisM@Work (10/30/2015): What gains do you expect from partitioning?
Improved query response times: for one machine we will need all the historical data, but the reports only need the data from the past 6 weeks.
November 3, 2015 at 4:01 am
Resender (11/3/2015)
ChrisM@Work (10/30/2015): What gains do you expect from partitioning?
Improved query response times: for one machine we will need all the historical data, but the reports only need the data from the past 6 weeks.
Partitioning is unlikely to improve your query response times, in fact it's quite likely to make them worse. There's a good proof of this here. If performance is your sole reason for considering partitioning, then you'd be well advised not to pursue it further, and focus on other avenues for query tuning.
1.5 million rows is very modest.
November 3, 2015 at 4:59 am
OK, not such good news then; the total size of the database is currently 1.9TB.
November 3, 2015 at 5:12 am
Each machine gets its own table which is the warehouse name followed by a sequence number that's unique for that machine.
How many of these tables are there in the db?
Are they structurally identical?
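Both questions can be answered in one shot from the catalog views; this is only a sketch, and it assumes the machine tables are the only user tables in the database, which may not hold in practice.

```sql
-- Count the user tables and group them by column signature, so
-- structurally identical tables collapse into a single result row.
SELECT COUNT(*) AS TableCount,
       Signature
FROM
(
    SELECT t.object_id,
           Signature = STUFF((SELECT ',' + c.name + ':' + ty.name
                              FROM sys.columns c
                              JOIN sys.types ty ON ty.user_type_id = c.user_type_id
                              WHERE c.object_id = t.object_id
                              ORDER BY c.column_id
                              FOR XML PATH('')), 1, 1, '')
    FROM sys.tables t
) d
GROUP BY Signature;
```

One row in the output per distinct structure; a single row means every table is identical.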
November 3, 2015 at 5:46 am
Total number of machine tables (3/11/2015 13:44 CET): 123,851.
Those tables all share the same structure:
[MachineRegisterId] [int] NOT NULL,
[UtcTime] [datetime] NOT NULL,
[Value] [float] NOT NULL
With a primary key on the combination of MachineRegisterId & UtcTime.
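With that structure, partitioning one of these tables on the scheme from the first post would look something like this hedged sketch; the table name WAREHOUSE_00001 is made up, and the key point is that UtcTime, the partitioning column, has to be part of the clustered primary key for the index to be aligned.

```sql
-- Hypothetical example: one machine table rebuilt on the partition
-- scheme. The partitioning column (UtcTime) must appear in the
-- clustered primary key, so the existing PK definition already fits.
CREATE TABLE [dbo].[WAREHOUSE_00001]
(
    [MachineRegisterId] [int]      NOT NULL,
    [UtcTime]           [datetime] NOT NULL,
    [Value]             [float]    NOT NULL,
    CONSTRAINT [PK_WAREHOUSE_00001]
        PRIMARY KEY CLUSTERED ([MachineRegisterId], [UtcTime])
) ON [WAREHOUSE_S_F] ([UtcTime]);
```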
November 3, 2015 at 5:52 am
Resender (11/3/2015)
Your plan is to merge these 123,851 tables into one partitioned table (with two additional columns derived from the table names), with a partition on [UtcTime]?
What is the total rowcount?
November 3, 2015 at 6:16 am
ChrisM@Work (11/3/2015)
Your plan is to merge these 123,851 tables into one partitioned table (with two additional columns derived from the table names), with a partition on [UtcTime]?
What is the total rowcount?
Oh, not at all: the idea is to partition all 123,851 tables over 2 partitions so that each table has an archive/active section. My apologies if my explanation was bad.
November 3, 2015 at 7:15 am
Resender (11/3/2015)
Oh, not at all: the idea is to partition all 123,851 tables over 2 partitions so that each table has an archive/active section. My apologies if my explanation was bad.
what's the total rowcount?
November 3, 2015 at 7:16 am
ChrisM@Work (11/3/2015)
what's the total rowcount?
and how is this lot typically interrogated?
November 3, 2015 at 7:26 am
In the meantime there are 123,883 tables (yes, 30 new tables got added); the total record count is 89,372,968,620.
ChrisM@Work (11/3/2015): and how is this lot typically interrogated?
I wish I knew, but I'm frequently told that the reports are above my pay grade.
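For what it's worth, that total can be pulled from the catalog views in one query instead of counting table by table; this is a sketch, and the LIKE filter assumes the machine tables follow the warehouse-name pattern (here spelled 'WAREHOUSE%', which is a guess).

```sql
-- Sum the row counts of every machine table from the catalog views.
-- sys.partitions row counts are approximate, but close enough for
-- sizing discussions like this one.
SELECT COUNT(DISTINCT t.object_id) AS MachineTables,
       SUM(p.rows)                 AS TotalRows
FROM sys.tables t
JOIN sys.partitions p ON p.object_id = t.object_id
WHERE p.index_id IN (0, 1)          -- heap or clustered index only
  AND t.name LIKE 'WAREHOUSE%';     -- assumed naming pattern
```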
November 3, 2015 at 7:49 am
Resender (11/3/2015)
In the meantime there are 123,883 tables (yes, 30 new tables got added); the total record count is 89,372,968,620.
ChrisM@Work (11/3/2015): and how is this lot typically interrogated?
I wish I knew, but I'm frequently told that the reports are above my pay grade.
Until you posted that number, I was thinking that the best way to handle this lot would be a single table, but the largest single table I've worked with was a little over the INT limit, about 2,800,000,000 rows - and it took some effort to get it to play nicely. Here, our largest table is a little under 1,000,000,000 rows and we get results back in milliseconds. 89,372,968,620 is outside my experience / comfort zone, but I'd still lean towards one or two tables: one to hold individual devices, the other to hold activity. Let's see what alternatives folks come up with.
November 3, 2015 at 8:14 am
ChrisM@Work (11/3/2015)
Let's up that discomfort a bit: from the figures I have, a minute of unexpected downtime, with the input buffers for the data not holding, would cost us 4,160 euro.
Or to put it better: this is an application that resulted from the merger of several dozen others instead of being developed from the ground up, and we've reached a point where we can neither maintain it nor get proper feedback from it.
November 3, 2015 at 11:09 am
Resender (11/3/2015)
OK, not such good news then; the total size of the database is currently 1.9TB.
Hold the phone a minute... While it's true that partitioning is NOT a panacea of performance for queries in general, it's fantastic at being able to almost instantly drop (using SWITCH) large amounts of legacy data.
With that in mind, let's peel just one potato at a time to get to the bottom of all this. For starters, what are the 3 largest tables in the database that you'd like to apply the 45 day drop-rows rule to and how are they used? Are they OLTP or more like an audit/invoice detail table?
--Jeff Moden
Change is inevitable... Change for the better is not.
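To help answer Jeff's question about the three largest tables, something like this sketch against sys.dm_db_partition_stats would list the candidates by reserved size:

```sql
-- Top 3 user tables by reserved size, with approximate row counts.
SELECT TOP (3)
       t.name                                 AS TableName,
       SUM(ps.row_count)                      AS [RowCount],
       SUM(ps.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.dm_db_partition_stats ps
JOIN sys.tables t ON t.object_id = ps.object_id
WHERE ps.index_id IN (0, 1)        -- heap or clustered index only
GROUP BY t.name
ORDER BY ReservedMB DESC;
```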