Partitioning Basics

  • Hi there

    I have a table with 33,112,015 rows, the main indexes are built on Year and Week, performance atm is good, but I am thinking of partitioning the data. I have two questions:

    1. If the file groups I partition to are on the same disk, will I see an improvement? I am under the impression that it wont improve performance as it will reside on the same disk, so the load will still be on 1 disk only?

    2. The wksYear range from 2003 to 2011 and weeks for each year range from 52-53. Is it possible to split initially on year, so have 9 FileGroups each containing 1 year's data, and then have FG9 containing 2011 data, be split again into 52 partitions according to the week data for 2011?

    New data gets loaded weekly.

    CREATE TABLE [dbo].[weeksData](

    [source] [char](1) NOT NULL,

    [wksYear] [char](4) NOT NULL,

    [wksWeek] [char](2) NOT NULL,

    [wksISSN] [varchar](13) NULL,

    [wksTCMPOS_VOL] [bigint] NULL,

    [wksTCMPOS_VAL] [bigint] NULL,

    [wksTCM_VALUE] [money] NULL,

    [wksTCM_QUANTITY] [bigint] NULL,

    [wksTCM_ARP] [money] NULL,

    CONSTRAINT [PK_dumpData] PRIMARY KEY CLUSTERED

    (

    [source] ASC,

    [wksYear] ASC,

    [wksWeek] ASC,

    [wksISBN] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    ) ON [PRIMARY]

    I don't want any partition functions or code, I just want to know if my logic is correct/viable?

    Thanks.

  • you can read this http://msdn.microsoft.com/en-us/library/ms179316.aspx

  • thanks, they are not really clear on the performance improvements on having the FG on a single disk.

    will go read some more online.

  • Hi Jakodewet,

    I believe that you will still see performance benefits yes, even if all files in the filegroups of the partition scheme are resident on the same spindles.

    Obviously performance should be better if the files are resident on separate spindles, but you will still see benefits even without this.

    Firstly, the optimiser understands partition schemes and makes use of them when formulating a query plan.

    If you have 10 years of data in a partitioned table partitioned on year, and you specify that you are interested in 2002 only in your query, then the optimiser can immediately discard the other 2001, 2003, 2004…2010 partitions and focus on the 2002 partition it needs. This is called partition elimination and can yield a big performance boost for queries with predicates based on the partition key.

    If you have more than one table to partition and you can utilise the same partition function for them both, then do so.

    If you can use the same partition scheme as well then do this too. This can also help the optimiser when you query the two tables together. It is called alignment.

    In SQL Server 2008 I think the threading model was improved a lot with partitioned tables.

    SQL Server 2005 queries over partitioned tables used 1 thread per partition. In SQL Server 2008 I believe multiple threads can be applied to each partition making the potential performance gain much greater.

    Not sure I understood the second part of your question properly, but if you are asking if partitions have to be symmetrical then they do not.

    Perhaps for your 2010 and 2009 partitions you might like these partitioned by month, so 24 partitions for the two years.

    Then 2008, 2007…2001 by year. So 8 partitions for all of these.

    SPLIT and MERGE can be used to carve up your partitions and glue them back together however you want to.

    Happy Partitioning!

  • Hi

    Thanks a lot for your input, really appreciate it.

    Put this on the back burner for a couple of days, but going to get into it ASAP.

    You understood my second question 100% thanks, I would like 9 partitions for each year and then also 52 partitions for the 2011 year of data, split into weeks.

    Thanks and will let you guys know if I need any further assistance.

  • Andy, just a quick question.

    If these partitions are all going to be on the same disk, is it necessary to create a file group for each partition or can I just use the "ALL TO PRIMARY" clause?

    Thanks

    Jako

  • Glad I could help on this Jakodewet.

    If all your partitions are going to be on the same spindles then there is no law to stop you housing them all in the one filegroup I suppose, with the "ALL TO PRIMARY" clause.

    I think it's usually stated as good practice to leave the Primary filegroup exclusively for the system objects of the database and then create at least one new filegroup to contain data files for all the user objects. So although you could put everything on primary, perhaps best to have at least one additional filegroup.

    Additionally, if your database is fairly large and all the datafiles are in the primary filegroup then backing up and restoring the database is an all or nothing proposition.

    If you have datafiles spread out through different filegroups, then you have the possibility for backing up and restoring individual file groups (piece-meal restores). You'll need to be using enterprise edition for this functionality.

    Filegroups are free though so why not use them? You might just be doing yourself a favour in the long term if your database continues to grow.

  • Thanks Andy, you have been a great help and you have given me a lot of info to work with.

    Will start with the partitioning shortly.

  • Glad I could help Jako, I wish you good luck with your partitioning.

  • Hi Andy

    Sorry to bother again, but I have run into a little problem. I have created a new file group and the scheme/partition function:

    ALTER DATABASE bs

    ADD FILEGROUP FG1

    GO

    ALTER DATABASE bs

    ADD FILE

    (

    NAME = 'FG1',

    FILENAME = 'C:\SQLDATA\bs_FG1.ndf',

    SIZE = 500MB

    )

    TO FILEGROUP FG1;

    GO

    CREATE PARTITION FUNCTION pf_BS_years (CHAR(4))

    AS RANGE RIGHT

    FOR VALUES ('2002','2003','2004','2005','2006','2007','2008','2009','2010')

    GO

    CREATE PARTITION SCHEME ps_BS_years

    AS PARTITION pf_BS_years ALL TO (FG1)

    GO

    the problem is that the table already exists in the db and it has a clustered index

    ALTER TABLE [dbo].[Bookscan_Weeks] ADD CONSTRAINT [PK_Bookscan_Weeks] PRIMARY KEY CLUSTERED

    (

    [Panel] ASC,

    [wksYear] ASC,

    [wksWeek] ASC,

    [wksISBN] ASC

    )

    How will I apply the scheme to this table?

    I was thinking along the lines of creating a temp table with the partition scheme, populate it, drop the old version, rename the temp table and adding the clustered index again?

    Am I thinking about this wrong?

  • Hi Jako,

    Even simpler than that.

    To move a table to a partition scheme you just create or re-create the clustered index.

    When you create the clustered index just specify the partition scheme like this at the end....."ON MyPartitionScheme(MyPartitionKey)"

    Or if you already have the table with a clustered index in place then something along these lines...

    CREATE CLUSTERED INDEX idx_MyClusteredIndex ON MyTable(MyClusterIndexField) WITH DROP_EXISTING ON MyPartitionScheme(MyPartitionKey)

    If you have a big table, rebuilding the clustered index can take quite some time, so don't panic.

    Go and have a cuppa and when you return your table will be in partitioned Nirvana.

  • Thanks Andy, did it my way in the meantime, worked a treat, but will keep your way in mind when I do it on the production server.

    Already seeing an improvement, the table contains about 3.2 million records and its bringing basic yearly calculations up in 2-3 seconds.

    Now just to partition 2011 into weeks.

    Thanks again for all the help

  • ok, so the year partition is up and running, working good so far.

    Now I want to include the weeks, not sure how to do it only for 2011, but I was thinking of using a Filtered Index?

    CREATE PARTITION FUNCTION pf_BS_weeks (CHAR(2))

    AS RANGE RIGHT

    FOR VALUES ('00','01','02','03','04','05','06',...'51')

    GO

    CREATE PARTITION SCHEME ps_BS_weeks

    AS PARTITION pf_BS_weeks ALL TO (FG1)

    GO

    CREATE INDEX idx_Weeks

    ON BS (wksWeek)

    WHERE wksYear = '2011'

    ON ps_BS_weeks(wksWeek)

    ?

  • 1) partitioning is an ADVANCED feature, and there are MANY ways you can zig when you should zag. I HIGHLY recommend you get a professional on board to help you get the most out of this feature (if it is even right for you to use at all).

    2) it is possible to get increased performance from partitioning, even with all data on same set of spindles, if you can get partition elimination in your queries. But partitioning is first and foremost a MANAGEMENT feature, not a performance feature.

    3) that is a 'questionable' table design. those sub-date fields shoud be numerics of the smallest type required - MUCH more efficient design than chars. Depending on data value distributions and data access patters the indexing order of cluster could be suboptimal as well.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • thanks for the input Kevin.

    The table design is from an old legacy system before I started working here, looking at some ways to improve performance.

    Will probably redo the design in the near future.

    Thanks again, all info is helpful as this is my first time working with partitions.

    Have a good weekend

    Jako

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply