Table Partitioning for 500gb in size

Question

Post reply

Table Partitioning for 500gb in size

AK1516

SSCrazy

Points: 2705
More actions
February 25, 2014 at 6:50 am

#288643

Can some help me reffer to a good document or script to do the table partitioning for a 500gb in sliding window technique ?
In my 4 year experince never got a chance... this is my first ...

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Hunchback SSCrazy Points: 2649 More actions · Answer 1

Hope this helps.

Partitioned Table and Index Strategies Using SQL Server 2008

http://technet.microsoft.com/en-us/library/dd578580(v=SQL.100).aspx

I will also suggest all blog posts from Dan Guzman about this theme.

http://weblogs.sqlteam.com/dang/search.aspx?q=table%20partitioning

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 2

Hold the phone, folks. 500GB isn't something to sneeze at here. If the table isn't already partitioned, it will need to be and, unless you want to wait for a Clustered Index and all of the non-clustered indexes to be rebuilt, I'd suggest a bit more than "lemme do it now" planning. A "sliding window" scenario is only the tip of the proverbial iceberg.

1. The first thing to ask is, what kind of table is it? Is it an OLTP table or an audit table where rows are written but never updated?

2. What is the current primary key?

3. Are there any foreign keys point to or from the table?

4. What is the current clustered index (it can be different than the PK).

5. Are there any other unique indexes other than the PK?

6. What are you proposing for the partioning column?

7. What is the definition of the table? (generating a CREATE TABLE from the table would help here).

8. Do you have enough free disk space to hold 2.2 copies of the table?

9. Are you going to try to take advantage of greatly reduced backup sizes?

10. Are you going to try to take advantage of "Piecmeal restores"?

11. If this is an audit table, do you have room on some other disk that the server can see to keep the table online for all but a minute during the partitioning process?

12. If this is an audit table, can we store this partitioned table in a separate database so we can take advantage of "Read only" partitions and "packed partitions"?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

AK1516 SSCrazy Points: 2705 More actions · Answer 3

1. The first thing to ask is, what kind of table is it? Is it an OLTP table or an audit table where rows are written but never updated?

2. What is the current primary key?

4 columns

3. Are there any foreign keys point to or from the table?

Yes - I am mainly looking for this answer ..... how to handle the foreign keys.. there are few examples online.. none have explained handling the foreign keys from table and to table...in particular for sliding window technique..

4. What is the current clustered index (it can be different than the PK).

Sam columns as PK

5. Are there any other unique indexes other than the PK?

Yes

6. What are you proposing for the partioning column?

Datetime

7. What is the definition of the table? (generating a CREATE TABLE from the table would help here).

Cannot Put it here

8. Do you have enough free disk space to hold 2.2 copies of the table?

yes

9. Are you going to try to take advantage of greatly reduced backup sizes?

No

10. Are you going to try to take advantage of "Piecmeal restores"?

No

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 4

AK1516 (2/25/2014)
1. The first thing to ask is, what kind of table is it? Is it an OLTP table or an audit table where rows are written but never updated?
2. What is the current primary key?
4 columns
3. Are there any foreign keys point to or from the table?
Yes - I am mainly looking for this answer ..... how to handle the foreign keys.. there are few examples online.. none have explained handling the foreign keys from table and to table...in particular for sliding window technique..
4. What is the current clustered index (it can be different than the PK).
Sam columns as PK
5. Are there any other unique indexes other than the PK?
Yes
6. What are you proposing for the partioning column?
Datetime
7. What is the definition of the table? (generating a CREATE TABLE from the table would help here).
Cannot Put it here
8. Do you have enough free disk space to hold 2.2 copies of the table?
yes
9. Are you going to try to take advantage of greatly reduced backup sizes?
No
10. Are you going to try to take advantage of "Piecmeal restores"?
No

You didn't answer the first question. Is it an OLTP table or an audit table where you write each row once and never update it?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

AK1516 SSCrazy Points: 2705 More actions · Answer 5

AK1516

SSCrazy

Points: 2705

February 25, 2014 at 11:06 pm

#1692132

Its In OLTP

SQLBlimp SSCertifiable Points: 6340 More actions · Answer 6

AK1516 (2/25/2014)
Can some help me reffer to a good document or script to do the table partitioning for a 500gb in sliding window technique ?
In my 4 year experince never got a chance... this is my first ...

Jeff Moden's questions are all great and need to be answered in detail.

However, I would add two more:

1. Does the table have any LOB columns (nvarchar(max), varbinary(max), text, ntext, image)?

2. If so is the text file group the same as the file group of the clustered index?

I have seen gigantic tables like this with low row counts and gigantic LOB, and partitioning might not be as efficient for that as placing the text file group on other than PRIMARY.

Thanks

John.

Igor Micev SSC-Dedicated Points: 33109 More actions · Answer 7

JohnFTamburo (2/27/2014)
AK1516 (2/25/2014)
Can some help me reffer to a good document or script to do the table partitioning for a 500gb in sliding window technique ?
In my 4 year experince never got a chance... this is my first ...
Jeff Moden's questions are all great and need to be answered in detail.
However, I would add two more:
1. Does the table have any LOB columns (nvarchar(max), varbinary(max), text, ntext, image)?
2. If so is the text file group the same as the file group of the clustered index?
I have seen gigantic tables like this with low row counts and gigantic LOB, and partitioning might not be as efficient for that as placing the text file group on other than PRIMARY.
Thanks
John.

Hi John. This is a nice remark. I think LOB columns usually go to a separate table (usually on another filegroup) having the original table's primary key as a foreign key. However, if it's not the case, than it would be a better strategy.

Igor Micev,My blog: www.igormicev.com

aaa121 SSCertifiable Points: 5902 More actions · Answer 8

Hi All,

Really interesting thread. I happen to have just inherited a database which has been thrown together with no filegroups or partitioning and this has a table or 500GB (2 billion records).

I'll answer the questions Jeff set out as I would be interested to hear your views.

1. The first thing to ask is, what kind of table is it? Is it an OLTP table or an audit table where rows are written but never updated?
It's a table in a DW where incremental records are first staged and then appended to this table everyday
2. What is the current primary key?
JourneyEventID BIGINT
3. Are there any foreign keys point to or from the table?
There is a Journey_ID field which is a FK but it is not physically bound by any constraints - I presume this is for the performance of loading the data - with the data integrity maintained in the ETL
4. What is the current clustered index (it can be different than the PK).
UNIQUE CLUSTERED INDEX on 2 columns (Journey_ID, JourneyEventID)
5. Are there any other unique indexes other than the PK?
UNIQUE NONCLUSTERED INDEX on JourneyEventID
UNIQUE NONCLUSTERED INDEX on 3 columns (Journey_ID, JourneyEventID, EventTimeStamp)
6. What are you proposing for the partioning column?
EventTimeStamp
7. What is the definition of the table? (generating a CREATE TABLE from the table would help here).
CREATE TABLE [dbo].[JourneyEvent](
[JourneyEventID] [bigint] NOT NULL,
[Journey_ID] [bigint] NULL,
[EventTimeStamp] [datetime] NULL,
[Latitude] [float] NULL,
[Longitude] [float] NULL,
[PointHeading] [int] NULL,
[PointSpeed] [int] NULL,
[EventType_ID] [smallint] NULL,
[EventDuration] [int] NULL,
[DistanceTravelled] [int] NULL,
[TimeElapsed] [int] NULL,
[AverageAcceleration] [float] NULL,
[MaxAcceleration] [float] NULL,
[AverageSpeedFromPrevPoint] [int] NULL,
[RoadType_ID] [smallint] NULL,
[GPSQuality_ID] [smallint] NULL,
[GeoFencingAreaFlag] [char](1) NULL,
[PrevJourneyEvent_ID] [bigint] NULL,
[ExclusionReason_ID] [smallint] NULL,
CONSTRAINT [PK_JourneyEvent] PRIMARY KEY NONCLUSTERED
(
[JourneyEventID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
8. Do you have enough free disk space to hold 2.2 copies of the table?
Yes.
9. Are you going to try to take advantage of greatly reduced backup sizes?
This is not an immediate issue, but it is something I would like to take into account now.
10. Are you going to try to take advantage of "Piecmeal restores"?
This is not an immediate issue, but it is something I would like to take into account now.
11. If this is an audit table, do you have room on some other disk that the server can see to keep the table online for all but a minute during the partitioning process?
12. If this is an audit table, can we store this partitioned table in a separate database so we can take advantage of "Read only" partitions and "packed partitions"?
Eventhough this is not a audit table, I think this is something that can be looked at.

AK1516 SSCrazy Points: 2705 More actions · Answer 9

AK1516

SSCrazy

Points: 2705

March 6, 2014 at 9:22 am

#1694777

Jeff, It would be great if you can guide us..

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 10

My apologies. I lost track of this thread.

I'll try to answer both sets of questions that you good folks have answered.

First, FK's that point to the partitioned table require a unique index just like any other table. The problem with partitioned tables and FKs is that if you "align" the indexes, you MUST include the partitioning column in that index. Of course, that also means the "unique" index will no longer be truly unique because of the (usual) date column that the partitioning column is made up from. That includes PKs and any other unique index. You CAN, of course, create non-aligned indexes and they'll work just fine as targets for FKs but, if you have any non-aligned indexes, two things happen... first, you lose the ability to rebuild indexes on just on partition at a time (a BIG loss, so far as I'm concerned) AND, should you ever need it, will not allow SWITCH in or SWITCH out unless you first drop the non-aligned indexes. Switch out is particularly important if you decide to drop large older sections of the table (like the oldest month). If you can't SWITCH out, then you have to resort to DELETEs and we know how long those can take and what it does to the log file. It also screws things up if you're after a reduction in backups for partitions that didn't change. Yeah... I know. People don't think that's important for indexes but I have a 25GB audit table at work where only 9GB of it is actual data and the rest is (ugh!) indexes.

I think that Partitioned Views are a bit more forgiving because they require that only the PK include the partitioning check constraint column. They seem like a bit more work to maintain but, with what I've had to go through to really make Partitioned Tables work the way I wanted (mostly to optimize backups and index maintenance on mostly static Audit tables where only the latest partition is written to), it almost seems worth it to me. The trouble is that they limited to only 256 tables in the UNION ALLs according to Books Online and my personal experience is that 253 is the actual limit. I don't know why that is.

Getting to the creation of Partitioned Tables, if you switch the target database (I'll explain that in a minute) to Bulk Logged or SIMPLE, the creation of the Clustered Index and the NCI's can be minimally logged operations so as to not cause your log file to explode, especially on really big tables. Supposedly, it can be an ONLINE operation during the first build of the indexes as well (can't do rebuilds of individual partitions ONLINE until 2014). I preferred not to do it that way, though.

I have time on my side and I was partitioning an Audit table where only INSERTs are done and they are temporal in nature. That allowed me to build a monthly process for each month and to use it during the initial build where I would copy just one month of data to its new partition on a separate file in a separate file group in a separate database (all of that is for backup reduction, being able to move lesser used legacy data to less expensive storage than the SAN, and to be able to make the older files READ ONLY without having to run the main database through a SINGLE USER mode period, which is required to make a file READ ONLY). It also allowed me to include a special process I've built to shrink a partition and rebuild the indexes on it without it doubling the size and wasting the resulting freespace when I make the partition READ ONLY. I'm writing an article on that bad boy. You certainly don't need to make partitions READ ONLY but I and the auditors certainly feel more comfortable about most of the Audit table being READ ONLY.

That's also why I asked if you have twice the disk space. I keep the original table online the whole time I'm building the partitions. When I'm all done, then I set the databases both to SINGLE USER, do a final true up between the old table and the new partitioned table, do a full backup on both databases, flop a synonym to point at the new table, drop the old table, and bring both databases back to a MULTI-USER mode.

OLTP tables aren't like Audit tables. They generally have important unique indexes that get fouled up by the partitioning column. If you can avoid such unique columns without avoiding DRI (an INSTEAD OF trigger is another way to guarantee uniqueness), then you'll find partitioning a bit easier. Invoice/InvoiceDetail and similar pairs of tables will work better and a whole lot easier if you have a "CreatedDate" column in both tables instead of trying to partition the detail table based on the unique IDs that point back the header (Invoice) table. Of course, if someone wrote some bad front end code, especially for INSERTs, adding a column to the table will cause the app to diesel once... BOOM!

I love SQL Server but MS didn't make partitioning the walk in the park that everyone thinks if you really need for things like SWITCH to work and you really want to take advantage of things like reduced maintenance, reduced backups (got mine down from 6 hours to 6 minutes), and performance (which really only occurs if the partitioning column is included in the criteria of a query... other stuff can actually take longer than a monolithic table/index). Unless the partitioning column is also the PK, they also made having UNIQUE indexes a real chore.

So, here's my bottom line recommendation. If you're partitioning for some vision of performance, you might want to consider NOT partitioning. Partition to reduce backups (tough to do for non-temporal based OLTP and other tables, "easy" to do for Audit tables, which are normally the biggest tables), reduce index maintenance, and to enable "Piecemeal Restores".

The other major recommendation that I have is that you REALLY need to study about partitioning in Books Online. There are a helluva lot of "gotchas". If you partition without understanding those and your partition without a serious plan in place, you're going to hate yourself. That's another reason for having twice the disk space for the table you want to partition... keep the original table online and working until you've proven that you haven't broken anything by partitioning the table. I've spent all of my spare time and some serious overtime in the last 6 months studying partitioning and doing full blown experiments with it. I'm just now comfortable with what I think I know about it but I still don't claim to know everything about it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 11

aaa121 (3/6/2014)
Hi All,
Really interesting thread. I happen to have just inherited a database which has been thrown together with no filegroups or partitioning and this has a table or 500GB (2 billion records).
I'll answer the questions Jeff set out as I would be interested to hear your views.
1. The first thing to ask is, what kind of table is it? Is it an OLTP table or an audit table where rows are written but never updated?
It's a table in a DW where incremental records are first staged and then appended to this table everyday
2. What is the current primary key?
JourneyEventID BIGINT
3. Are there any foreign keys point to or from the table?
There is a Journey_ID field which is a FK but it is not physically bound by any constraints - I presume this is for the performance of loading the data - with the data integrity maintained in the ETL
4. What is the current clustered index (it can be different than the PK).
UNIQUE CLUSTERED INDEX on 2 columns (Journey_ID, JourneyEventID)
5. Are there any other unique indexes other than the PK?
UNIQUE NONCLUSTERED INDEX on JourneyEventID
UNIQUE NONCLUSTERED INDEX on 3 columns (Journey_ID, JourneyEventID, EventTimeStamp)
6. What are you proposing for the partioning column?
EventTimeStamp
7. What is the definition of the table? (generating a CREATE TABLE from the table would help here).
CREATE TABLE [dbo].[JourneyEvent](
[JourneyEventID] [bigint] NOT NULL,
[Journey_ID] [bigint] NULL,
[EventTimeStamp] [datetime] NULL,
[Latitude] [float] NULL,
[Longitude] [float] NULL,
[PointHeading] [int] NULL,
[PointSpeed] [int] NULL,
[EventType_ID] [smallint] NULL,
[EventDuration] [int] NULL,
[DistanceTravelled] [int] NULL,
[TimeElapsed] [int] NULL,
[AverageAcceleration] [float] NULL,
[MaxAcceleration] [float] NULL,
[AverageSpeedFromPrevPoint] [int] NULL,
[RoadType_ID] [smallint] NULL,
[GPSQuality_ID] [smallint] NULL,
[GeoFencingAreaFlag] [char](1) NULL,
[PrevJourneyEvent_ID] [bigint] NULL,
[ExclusionReason_ID] [smallint] NULL,
CONSTRAINT [PK_JourneyEvent] PRIMARY KEY NONCLUSTERED
(
[JourneyEventID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
8. Do you have enough free disk space to hold 2.2 copies of the table?
Yes.
9. Are you going to try to take advantage of greatly reduced backup sizes?
This is not an immediate issue, but it is something I would like to take into account now.
10. Are you going to try to take advantage of "Piecmeal restores"?
This is not an immediate issue, but it is something I would like to take into account now.
11. If this is an audit table, do you have room on some other disk that the server can see to keep the table online for all but a minute during the partitioning process?
12. If this is an audit table, can we store this partitioned table in a separate database so we can take advantage of "Read only" partitions and "packed partitions"?
Eventhough this is not a audit table, I think this is something that can be looked at.

You might have an easier time of things because you've already added EVENTTIMESTAMP. You'll either have to drop the other UNIQUE index or make it non-aligned with all the caveats that a non-aligned index brings with it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)