Hidden gotchas in horizontal partitioning

  • After receiving some advice from an expert here at SSC (http://www.sqlservercentral.com/Forums/Topic946575-266-1.aspx), I realized I need to horizontally partition my largest table.

    I don't have Enterprise Edition, so I'm essentially going to use a schema-bound view over partitioned tables. This approach is based on an article I found online: http://www.simple-talk.com/sql/sql-tools/sql-server-partitioning-without-enterprise-edition/.

    Here is what it will look like:

    -- The original table

    CREATE TABLE GPS (

    PingDate DATETIME2 NOT NULL,

    DeviceID INT NOT NULL,

    PingLocation GEOGRAPHY NOT NULL,

    CONSTRAINT GSP_PK PRIMARY KEY (PingDate, DeviceID)

    )

    -- Paritioned table

    CREATE TABLE GPS_2008 (

    PingDate DATETIME2 NOT NULL,

    DeviceID INT NOT NULL,

    PingLocation GEOGRAPHY NOT NULL,

    CONSTRAINT GPS_08_PK PRIMARY KEY (PingDate, DeviceID),

    CONSTRAINT GPS_PingDate CHECK (PingDate >= '2008-01-01' AND PingDate < '2009-01-01')

    )

    CREATE TABLE GPS_2009 (

    PingDate DATETIME2 NOT NULL,

    DeviceID INT NOT NULL,

    PingLocation GEOGRAPHY NOT NULL,

    CONSTRAINT GPS_08_PK PRIMARY KEY (PingDate, DeviceID),

    CONSTRAINT GPS_PingDate CHECK (PingDate >= '2009-01-01' AND PingDate < '2010-01-01')

    )

    CREATE TABLE GPS_2010 (

    PingDate DATETIME2 NOT NULL,

    DeviceID INT NOT NULL,

    PingLocation GEOGRAPHY NOT NULL,

    CONSTRAINT GPS_08_PK PRIMARY KEY (PingDate, DeviceID),

    CONSTRAINT GPS_PingDate CHECK (PingDate >= '2010-01-01' AND PingDate < '2011-01-01')

    )

    GO

    -- parititoned view

    CREATE VIEW VW_GPS WITH SCHEMABINDING

    AS

    SELECT

    PingDate,

    DeviceID,

    PingLocation

    FROM GPS_2008

    UNION ALL

    SELECT

    PingDate,

    DeviceID,

    PingLocation

    FROM GPS_2009

    UNION ALL

    SELECT

    PingDate,

    DeviceID,

    PingLocation

    FROM GPS_2010

    Are there any hidden gotchas to this approach that I should be aware of?

    Has anyone here done this with success? And if so, can you recommend any best practices?

  • Just be aware that this approach - also known as poor's man partitioning - is not as efficient as real partitioning is.

    This approach allows to simulate table partitioning handling "partitioning" on the application side as opposed as handling "partitioning" at SQL Server kernel. Final users will never notice the difference but DBA knows it would not perform as well as real partitioning does.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • Thanks for your response...can I ask a follow-up question? (and maybe this is more philosophical than technical, but I'm curious to hear your thoughts)

    You refer to this method as poor man's partitioning, but Enterprise edition is something along the lines of $32k per processor so I can't imagine that this is an unusual occurrence for many cost-conscious companies.

    Do these other companies accept the performance hit of this partitioning method or is there an alternative?

  • 8kb (7/6/2010)


    Thanks for your response...can I ask a follow-up question? (and maybe this is more philosophical than technical, but I'm curious to hear your thoughts)

    You refer to this method as poor man's partitioning, but Enterprise edition is something along the lines of $32k per processor so I can't imagine that this is an unusual occurrence for many cost-conscious companies.

    Do these other companies accept the performance hit of this partitioning method or is there an alternative?

    In the database world size matters.

    "Size" in this context refers to Volume (how much data), Volatility (how often data changes e.g. ins/upd/del) and Concurrency (how many processes are hitting the database concurrently).

    "Size" is also compounded by "Criticality" - it is not the same a large Datamart for historical Advertising trends than a same size OLTP system handling global sales.

    Bottom line is SLA.

    When the very survival of the organization depends on a specific database "the Business" will probably find $32K per processor a bargain.

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.
  • I used to do this before we had 2K5 and above! The additional things we did was to have the tables in different files\filegroups on difference drives for better performance.

  • Thanks for both of your responses. Very informative for me...

  • I hope it's not too late to chip in some extra thoughts on this.

    What you are doing is known as a local partitioned view. It uses the same idea as distributed partitioned views, except all the member tables reside on the same server.

    The documentation does say that partitioned tables are the preferred solution, but as you have discovered, that feature is only available in Enterprise Edition.

    Both partitioned tables and local partitioned views have their strengths and weaknesses. Partitioned tables have some definite performance issues on 2005, especially as the number of partitions rises. Many of these problems were addressed for 2008, but there are still some difficulties.

    Local partitioned views can work extremely well, so long as the query optimiser has enough information to eliminate partitions or otherwise limit the amount of data searched. In the best case, these optimisations can occur at compile time, although this typically requires one or more predicates to reference constant values.

    Partitions can also be eliminated or restricted at run time through the use of start-up filters, which will often be the case if you use parameters or variables in the query. The query plan will show a regular Filter iterator, but it will have a Start Up Expression Predicate. This is evaluated before all the iterators below it in the plan, and that branch of the plan only starts to execute if the predicate evaluates to true.

    A full discussion of the merits of table partitioning and local partitioned views would require much more space than is reasonable to use here. My view is that there is a good deal of overlap between the two technologies, and there are cases where each will out-perform the other.

    As far as query optimisation is concerned, local partitioned views have reasonably deep support, so you should find that the optimiser does a pretty good job with it. The downside is that newer engine features may not have such deep support because Microsoft do not appear to be actively extending the local partitioned views feature. Likely this will be more of an issue with the new features introduced in 2008, rather than 2005, but it is something to bear in mind.

    Partitioned tables are pretty much the reverse. This is a new feature (at least from 2005's point of view) so optimisation and query processor support is not yet deeply embedded. It is being actively developed, so new features will be partition-aware. I hope that makes some sort of sense.

    As far as the sample code you provided goes, you should be aware that constraint names must be unique in each table, and schema bound views require that all referenced objects specify a schema. Once those modifications are done, the example code works well. Bear in mind that local partitioned views do not have to be schema bound, but I do recommend it as long as you are sure you can meet the conditions. One requirement is that all objects referenced must be in the same database as the view - which is not always the case for local partitioned views.

  • This is great information. Thanks for taking the time to provide such a detailed explanation (and for catching the bugs in the code...). The majority of responses I've received when asking about partitioning is that it really isn't recommended as a best practice, or if it is necessary, it's something that should be embarked upon with extreme caution. But if we do decide to go forward with it, then I'll definitely be using your information as a reference.

    I stumbled upon this scenario because of a very large table running in SQL Server Express that was causing an out of memory error during DBCC CHECKDB. The overwhelming opinion has been to simply get a licensed version (even Workgroup edition) and keep the table in its normal size. I have to assume this is the best solution only because I know there are much, much bigger databases out there, presumably with giant tables, and I can't imagine they are all running out of memory during the integrity checks.

    Anyway, I'm kind of rambling. But my overall impression is that this isn't something that is typically done, and that a licensed edition of SQL Server with enough RAM should be able handle very large tables for all maintenance tasks.

  • 8kb (7/17/2010)


    I stumbled upon this scenario because of a very large table running in SQL Server Express that was causing an out of memory error during DBCC CHECKDB. The overwhelming opinion has been to simply get a licensed version (even Workgroup edition) and keep the table in its normal size. I have to assume this is the best solution only because I know there are much, much bigger databases out there, presumably with giant tables, and I can't imagine they are all running out of memory during the integrity checks.

    Anyway, I'm kind of rambling. But my overall impression is that this isn't something that is typically done, and that a licensed edition of SQL Server with enough RAM should be able handle very large tables for all maintenance tasks.

    Yes, I did read the link you posted where Paul Randal gave you some advice on that. It was interesting because I had never seen DBCC CHECKDB fail for this reason before.

    Certainly buying a more capable version of SQL Server makes sense if you have outgrown the capabilities of Express, but I'm not sure I'd spend my own money on Workgroup Edition (it is very limited). I guess it depends a lot on the wider circumstances, but if the issue with DBCC CHECKDB is your only present concern, I might be tempted to go the local partitioned view route and keep saving up for Standard Edition!

    By the way, I have worked on a number of very large production systems for major corporations that use local partitioned views. Not everyone needs Enterprise Edition, but many large tables can benefit from horizontal partitioning.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply