Need to Create Partition to an existing Table

  • Thanks for all your information and time today, highly appreciate it.

    I will reconvene if at all I am stuck or need more info.

    Thanks again, have a nice eve...

  • My pleasure.

    Welcome to ask more questions should they arise.

  • :), here it goes.

    Below is the PF i was using earlier, if you see the oldest day available is 20081231. Now the requirement is to extend this for all the months in 2007 also. I need to increase the range to include the below.

    20070131, 20070228, 20070331, 20070430, 20070531, 20070630, 20070731, 20070831, 20070930, 20071031, 20071130, 20071231

    how to do this on the PF and PS which is already created. Do i need to alter this existing PF and PS.

    Please provide me the accurate syntax. Thank you..

    CREATE PARTITION FUNCTION [pf_MONTHLY_LOADS](int) AS RANGE LEFT FOR VALUES (20081231, 20090131, 20090228, 20090331, 20090430, 20090531, 20090630, 20090731, 20090831, 20090930, 20091031, 20091130, 20091231, 20100131, 20100228, 20100331, 20100430, 20100531, 20100630, 20100731, 20100831, 20100930, 20101031, 20101130, 20101231, 20110131, 20110228, 20110331, 20110430, 20110531, 20110630, 20110731, 20110831, 20110930, 20111031, 20111130, 20111231, 20120131, 20120229, 20120331, 20120430, 20120531, 20120630, 20120731, 20120831, 20120930, 20121031, 20121130, 20121231, 20130131, 20130228, 20130331, 20130430, 20130531, 20130630, 20130731, 20130831, 20130930, 20131031, 20131130, 20131231, 20140131, 20140228, 20140331, 20140430, 20140531, 20140630, 20140731, 20140831, 20140930, 20141031, 20141130, 20141231, 20150131, 20150228, 20150331, 20150430, 20150531, 20150630, 20150731, 20150831, 20150930, 20151031, 20151130, 20151231, 20160131, 20160229, 20160331, 20160430, 20160531, 20160630, 20160731, 20160831, 20160930, 20161031, 20161130, 20161231, 20170131, 20170228, 20170331, 20170430, 20170531, 20170630, 20170731, 20170831, 20170930, 20171031, 20171130, 20171231, 20180131, 20180228, 20180331, 20180430, 20180531, 20180630, 20180731, 20180831, 20180930, 20181031, 20181130, 20181231, 20190131, 20190228, 20190331, 20190430, 20190531, 20190630, 20190731, 20190831, 20190930, 20191031, 20191130, 20191231, 20200131, 20200229, 20200331, 20200430, 20200531, 20200630, 20200731, 20200831, 20200930, 20201031, 20201130, 20201231)

    GO

  • rangu (6/5/2014)


    I have an existing table with 34M records and is expected to grow further.

    we have decided to partition the table.

    we are planning to create a partition base on a date value.

    Without any malice or sarcasm intended or implied at all, I have to ask the most important question there is when it comes to partitioning....

    [font="Arial Black"]WHY DO YOU WANT TO PARTITION THE TABLE?[/font]

    Seriously... what do you want partitioning to do for you? Except when you include the paritioning column in the criteria of a query, a lot of queries actually run a bit slower when they've been moved to a partitioned table.

    You're doing the partitioning by "date" (in which case I also recommend using the first of the month for the partitioning values along with "RIGHT" partitioning)... do these rows ever change once created?

    I have some recommendations for how to partition this table but one recommedation, depending on why you think you need to partition, might be "don't waste your time or energy". In order to be able to tell, I need to know the real reason ("It's a big table" isn't necessarily the right reason) you want to partition the table and what the table and index structure for ALL the indexes is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rangu (6/5/2014)


    :), here it goes.

    Below is the PF i was using earlier, if you see the oldest day available is 20081231. Now the requirement is to extend this for all the months in 2007 also. I need to increase the range to include the below.

    20070131, 20070228, 20070331, 20070430, 20070531, 20070630, 20070731, 20070831, 20070930, 20071031, 20071130, 20071231

    how to do this on the PF and PS which is already created. Do i need to alter this existing PF and PS.

    Please provide me the accurate syntax. Thank you..

    CREATE PARTITION FUNCTION [pf_MONTHLY_LOADS](int) AS RANGE LEFT FOR VALUES (20081231, 20090131, 20090228, 20090331, 20090430, 20090531, 20090630, 20090731, 20090831, 20090930, 20091031, 20091130, 20091231, 20100131, 20100228, 20100331, 20100430, 20100531, 20100630, 20100731, 20100831, 20100930, 20101031, 20101130, 20101231, 20110131, 20110228, 20110331, 20110430, 20110531, 20110630, 20110731, 20110831, 20110930, 20111031, 20111130, 20111231, 20120131, 20120229, 20120331, 20120430, 20120531, 20120630, 20120731, 20120831, 20120930, 20121031, 20121130, 20121231, 20130131, 20130228, 20130331, 20130430, 20130531, 20130630, 20130731, 20130831, 20130930, 20131031, 20131130, 20131231, 20140131, 20140228, 20140331, 20140430, 20140531, 20140630, 20140731, 20140831, 20140930, 20141031, 20141130, 20141231, 20150131, 20150228, 20150331, 20150430, 20150531, 20150630, 20150731, 20150831, 20150930, 20151031, 20151130, 20151231, 20160131, 20160229, 20160331, 20160430, 20160531, 20160630, 20160731, 20160831, 20160930, 20161031, 20161130, 20161231, 20170131, 20170228, 20170331, 20170430, 20170531, 20170630, 20170731, 20170831, 20170930, 20171031, 20171130, 20171231, 20180131, 20180228, 20180331, 20180430, 20180531, 20180630, 20180731, 20180831, 20180930, 20181031, 20181130, 20181231, 20190131, 20190228, 20190331, 20190430, 20190531, 20190630, 20190731, 20190831, 20190930, 20191031, 20191130, 20191231, 20200131, 20200229, 20200331, 20200430, 20200531, 20200630, 20200731, 20200831, 20200930, 20201031, 20201130, 20201231)

    GO

    No, You can't alter PF and PS for this. You need to go back:

    1. Recreate clustered index on another FG

    2. Drop PS

    3. Drop PF

    and now go forward again:

    1. PF

    2. New files/FG's if neeeded

    3. PS

    4. Clustered index on PS

  • ok, The existing PF and PS has dependency with other tables which are in the DB, and they are active today.

    If I go ahead try to alter or Drop and Recreate the PS and PF to extend the range, would there be any impact on other tables? what is the option, I cannot go ahead and make changes on all those tables.

    Just want to extend the range, Please suggest if this is possible.

    The below is my assumption, that it would work.

    ALTER PARTITION SCHEME ArchivePartitionScheme NEXT USED [PRIMARY] GO

    ALTER PARTITION FUNCTION [ArchivePartitionFunction]() SPLIT RANGE(20081130) GO

  • You can split only the last range, for example 2014-05-31, but not in the beginning or the middle, however please test it, I had such a case long time ago.

    Sharing same PF and PS between multiple tables assumes that they will have same partitioning structure, particularly same boundary values. However if this is not the case in your situation, you can create dedicated PF-PS for each table or set of tables.

    In your current situation, what I can suggest is to run these scripts during the weekend, hopefully you will have less user activity. And more common note: you should have carefully planned ahead and tested it in dev prior to run it in production.

  • It sounds as if the original table wasn't clustered. That alone can cause big performance issues.

    You should put the appropriate clustered index on the table before doing the partitioning. 34M rows isn't many at all for SQL Server and would normally never require partitioning based on row count.

    For now, I suggest making it a unique clustered index by including the PK, since I don't have further knowledge of the table's data at this time.

    CREATE UNIQUE CLUSTERED INDEX [ClusteredIndex_on_ps_MONTHLY_LOADS_635375610053010257] ON [HISTORICAL_Test]

    (

    [AS_OF_TS],

    [hist_row_id]

    ) WITH ( SORT_IN_TEMPDB = ON, ONLINE = OFF ) ON [<<filegroup_name>>]

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks for your response Scott, [hist_row_id] acts as a Primary key for that table, so acts a Clustered Index anyway.

    And today it has 34M may not look many, but over the next 2 years it would grow in to Billions.

  • rangu (6/6/2014)


    Thanks for your response Scott, [hist_row_id] acts as a Primary key for that table, so acts a Clustered Index anyway.

    And today it has 34M may not look many, but over the next 2 years it would grow in to Billions.

    But that's the wrong column to cluster the table on, and that will cause performance issues. Partitioning will "solve" them, but mainly because you're clustering the table by date then, not because of the partitioning itself.

    As long as you specify the date / a date range in queries, 2B rows will not necessarily be a performance issue once the table is properly clustered.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • so I have tested, if it accepts values.

    Used below to alter to include new boundary values.

    ALTER PARTITION SCHEME [ps_Historical_FCM_Test] NEXT USED [PRIMARY]

    GO

    ALTER PARTITION FUNCTION [pf_Historical_FCM_Test]() SPLIT RANGE(20081130)

    GO

    The new value is added, however this didn't reflect on the partition yet. for that I need to drop and re create the Partition I believe. what do you think?

  • rangu (6/6/2014)


    The new value is added, however this didn't reflect on the partition yet.

    Does this mean that if you select * from table where your_value = .... does not return any records? Or you believe that new value is placed in wrong partition? If this is the case, it proves that you can't split partition in the beginng or anywhere in the middle.

    In any case I think you have to drop and create partitioning.

  • Still haven't got an answer to why the table is being partitioned in the first place. What's the point?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • So the question why are we doing a Partition.

    Before answering this question, since I am newbie and want to understand from you when exactly we need a partition on any table?

    In our current scenario, the table on which queries are used to generate reports, are running very slow.

    Though it has a clustered in the name of Primary Key and a non Clustered Index in our case is a date column, as this is most frequent column used while writing queries. And we never use this primary key while writing queries as this is more of table specific column to tell us row uniqueness.

    The table itself has 270+ columns and with 34M records today, will grow further. Added to this there is no unique column identified in this table, the primary key is a kind of Identity column to identify a row uniquely.

    I strongly understand, correct me if I am wrong. If a table has at least one unique field, or combination of two or more keys to form a unique key, would for sure increase the performance levels.

    Here we have no choice left. The idea is to partition such a way to improve the query performance. in most of the cases, We would require to generate report which form latest feeds(I mean latest dates of month and year). But need to store the historical information too which would be helpful in yearly report generation or comparisons against previous years data etc.

    Please let me know if our approach is correct or do you still feel the performance can be still improved without going for partition at all. Need your valuable suggestions.

  • The primary goal of partitioning is to reduce maintenance efforts and time by limiting index rebuild jobs only to latest partitions and implementing partial backup (with setting prior partitions to read-only).

    In some cases it may benefit performance as well:

    1. When your files are placed on different physical spindles (however in your case you're placing them all in primary)

    AND

    2. You have lots of queries that scan accross multiple partitions, like compare current moth sales to the prior one, etc.

    Before deploying partitioning to production, I would recommend to thorougly test it in lower environment for performance.

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply