Understanding creating Partition Key on a date

  • Hi all,

    I'm trying to understand partition keys and how to create them.  Did some googling but the samples I found it still doesn't click to me.  So I have a requirement in which my user specifically requests a partition key on a date as this is what will be in the where clause of user queries. In the past I've typically used indexes on date fields like this and I've never worked with partition keys before so trying to learn them and implement them.  So a little preliminary info; The records in the tables I'm working with are 500 million + (size most likely being the reason they want partition key on this date).   Here are my fields (For simplistic sake I just included four fields):

    ClaimLineID (PK bigint)

    ClaimHeaderID (FK bigint)

    ClaimLineAmount (money)

    ServiceStartDate  (date)

    I learn best by seeing in action so I'm wondering if someone could throw a quick create table statement with these 4 fields where ServiceStartDate is being implemented as a partition key.  Would also be great to see it as an alter table statement as well.  This will help me to understand how to create these a new table as well as modify existing table to add it.  Any help provided would be greatly appreciated.  Thanks in advance!

    Strick

  • Why do you want to partition the table? Partitioning a table is for speeding up backup, not speeding up your queries.

    (Where's the "SQL Howlers" article that Gail Shaw wrote about that?)

  • This should give you some idea.  With that partitioning, you can no longer have ClaimLineID as a pk (or even a stand-alone unique index/constraint).

    If you can use data compression, you almost certainly should for that many rows (i.e., uncomment "DATA_COMPRESSION = ").  If you have (max) types in the table, you should consider forcing them to LOB pages (by using sp_tableoption to set 'max value types out of row' to 1).

    SET ANSI_NULLS ON;
    SET QUOTED_IDENTIFIER ON;
    GO
    --DROP TABLE dbo.Services;
    --DROP PARTITION SCHEME ServiceStartDatePartitionScheme;
    --DROP PARTITION FUNCTION ServiceStartDatePartitionFunction;
    GO

    CREATE PARTITION FUNCTION ServiceStartDatePartitionFunction ( date )
    AS RANGE LEFT
    FOR VALUES (
    '20200101', '20200102', '20200103', '20200104', '20200105', '20200106',
    '20200107', '20200108', '20200109', '20200110', '20200111', '20200112',
    '20210101', '20210102', '20210103', '20210104', '20210105', '20210106',
    '20210107', '20210108', '20210109', '20210110', '20210111', '20210112',
    '20220101', '20220102', '20220103', '20220104', '20220105', '20220106',
    '20220107', '20220108', '20220109', '20220110', '20220111', '20220112',
    '20230101', '20230102', '20230103', '20230104', '20230105', '20230106',
    '20230107', '20230108', '20230109', '20230110', '20230111', '20230112' );
    GO

    CREATE PARTITION SCHEME ServiceStartDatePartitionScheme
    AS PARTITION ServiceStartDatePartitionFunction
    ALL TO ( [PRIMARY] );
    GO

    CREATE TABLE dbo.Services (
    ServiceStartDate date NOT NULL,
    ClaimLineID bigint NOT NULL,
    ClaimHeaderID bigint NOT NULL,
    ClaimLineAmount money NULL,
    INDEX Services__CL UNIQUE CLUSTERED ( ServiceStartDate, ClaimLineID ) WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98 ) ON ServiceStartDatePartitionScheme ( ServiceStartDate )
    /* , CONSTRAINT Services__PK PRIMARY KEY NONCLUSTERED ( ClaimLineId ) */ );
    GO

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Hello and thanks for your response.

    My requirement is for a specific field not a table.  They are asking for a partition key on the field ServiceStartDate.  I'm very ignorant to partition keys and it is very new to me so I don't know why it is requesting this specifically so I'm not in a place to rebuttal why it may or may not be a good idea.

    The reading up on it via googling it; it appears be like indexing that field but better for very large tables but maybe its not and I'm just trying to learn it so I can implement it.

  • Hi Scott,

     

    Thanks for your response.  I am using your sample to learn this and am working through setting this up in our test environment.  A few questions.

     

    1. In the create partition function what are the  "for values" for and what made you select those specific values in your example?
    2. Also in the partition function what do "as Range left" mean and when would I use left vs right?

     

    Thank you so much for all your help!

     

    Strick

  • Clustering by the date will solve the performance issues.  Partitioning will aid performance only if you need to rebuild the table, in which case you would only need to rebuild the current partition rather than the whole table.  Similarly, theoretically you could back up only the current partition and not all the old partitions that aren't changing, thus also reducing overhead somewhat for backups.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Gail's piece: https://www.red-gate.com/simple-talk/databases/sql-server/database-administration-sql-server/gail-shaws-sql-server-howlers/

    You can't partition a column. This is for tables. A client asking for this doesn't really understand what they want.

  • By far the best performance payoff for this is creating the best clustered index on the table, as coded below.  You would likely not even need to partition then.

    Based on the PK comment you made, I'm assuming you have a clustered PK on an identity column.  Which is a so-called "standard practice", but that is actually the biggest myth in table design.  Defaulting to a clustered identity is usually WRONG, WRONG, **SO WRONG**.

    Here are the general commands to adjust the indexes, if my guess above about the PK is correct.

    ALTER TABLE DROP CONSTRAINT [PK_yourtablename] ON dbo.yourtablename;

    CREATE UNIQUE CLUSTERED INDEX yourtablename__CL ( ServiceStartDate, ClaimLineID ) ON dbo.yourtablename WITH ( /*DATA_COMPRESSION = PAGE,*/ FILLFACTOR = 98, SORT_IN_TEMPDB = ON );

    ALTER TABLE ADD CONSTRAINT yourtablename__PK PRIMARY KEY ( ClaimLineID ) WITH ( /*DATA_COMPRESSION = ROW,*/ FILLFACTOR = 99, SORT_IN_TEMPDB = ON );

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Hi Scott,

     

    Sorry, actually there was a typo.  I can see how the typo would create your assumption.  This is a Claim Line table so the ClaimHeaderID and ClaimLineID as actually a composite PK.

    ClaimHeaderID is still a FK back to a ClaimHeader table.  So if should have read like this:

     

    ClaimHeaderID (PK FK bigint)

    ClaimLineID (PK bigint)

    ClaimLineAmount (money)

    ServiceStartDate  (date)

     

    As far as whether I should be using index vs the partition request I'm inclined to agree with you since I've worked with indexes on dates in large tables my whole career and haven't ever come across where someone wanted a partition key.  Unfortunately I'm kinda stuck implementing the requirement.  So I'm inclined to figure out how to implement a partition key per the request.

  • There is no way to implement the request. There is no partition key on a non partitioned table

  • But you're going to have to include the date in the clustering key to do partitioning anyway, you might as well change the clustering first.  And that will be less work than having to partition the table too.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Hi Steve,

    So even if though the index approach should be the way I go, why wouldn't Scott's earlier code not work for me if I at least implement it that way?

  • You very likely already have a clustered index on the table, on ( ClaimHeaderID, ClaimLineID ).  (That's a vastly better clus key than an identity column, btw.)

    You can only have one clustered index on a table.  So you will have to drop the current index; create the new clustered index (on ServiceStartDate, ClaimHeaderID, ClaimLineID); finally recreate the PK as nonclustered.

    Best way to do that: (1) script out the CREATE for the existing index, but change it to NONCLUSTERED; (2) drop the index; (3) create the new clus index; recreate the old index as nonclus.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • stricknyn wrote:

    Hi Steve,

    So even if though the index approach should be the way I go, why wouldn't Scott's earlier code not work for me if I at least implement it that way?

    That's a partitioned table.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply