Table Partioning or Any Other Solution

  • I am reposting this from Microsoft forum

    /******************  My Original Question  *******************/

    I have a table that needs to be updated every cycle (15 minutes).  The data needs to be truncated and new data inserted every cycle. The problem is that I need the current cycle of data to be readable until the next cycle of data is completely loaded.  The writes will happen once a cycle. The reads will occur very frequently throughout the entire cycle.

    I was thinking of using SQL Server 2005 table partitioning (range partitioning) and have two partitions with 15 mins of data in each. I just wanted to know if there is a better way of doing it than using table partitioning for this purpose. Thanks for your help ...

    /****************** Two useful responses from Microsoft Forum *******************/

    Partitioning is definitely the way to go with this. Here is one thing to watch out for:

    You might want more buckets than just two. Maybe set up enough buckets for a 24 hour period. Why? What happens if there is a processing anomaly? You only have 15 minutes to correct it before the data starts to get intemingled - then you can't truncate, you have to delete based upon datetime of the transaction. If you have enough buckets for a 24 hour period, then you have 24 hours to remedy the problem. Same concept, just more transaction buckets.

    -- ABC

    If you're using partitioning there's a much easier way than partitioning every 15 minutes.  Use a partition function with a single boundary at NULL and all your data will go into a single partition.  Create two tables on the same partition scheme using that function.  One is your target table, the other is the staging table.  Load the staging table, and then simply issue an ALTER TABLE SWITCH to switch the data from staging into production and vice versa.  Once the ALTER TABLE obtains a schema lock, the data will be instantly switched.  This should appear as a brief wait to your users, it depends on how many there are and how long their SELECT queries run.  After that truncate the staging table.

    - XYZ

    /********** Now the problem I am having ******************/

    I have setup the partitioning exactly the way XYZ said it. When i try to switch partitions it doesn't switch because of the reads that are occuring. I am running the following sql command for switch:

    Truncate Table OptimizationResults

    GO

    ALTER TABLE OptimizationResultsStg SWITCH PARTITION 1 TO OptimizationResults

    PARTITION 1;

    GO

    It asks me to delete data before switch. I don't want to use Delete FROm Table because it will take too long with millions of rows in this table. Truncate statement gets blocked by the read statements being run by users. I am hesitant of using Snapshot Isolation (in combination with switch partions) because it will put a lot of IO overhead with millions of records. Do you have any idea how can i resolve this issue? Please help ...

  • Not sure I would use partioning for this, but then I grew up on a farm and like to keep things simple!

     

    How about a 2nd table that holds which cycle is the current actvive cycle that the queries would read this info from.  Then in the main table you add a column with this value in it (I would use an INT value).

    Now when you are done inserting into the table you simply switch which cycle ID is the active one.  Now you should be able to go clean up the table, or if you have downtime during the night etc. you could set up a cleanup job at that time.

    This is of course without knowing anything about your environment, so hard to tell if this would work performance wise.

  • Hi,

    have you thought of using a view that is dynamically modified after the load of data to always point to the table you want to report from?

    Jakub

  • How do you read data? Do you read data by time range like from 9:00 to 10:00? or just for the current cycle?

    The solution can be very different based on how you read data.

     

  • not sure if I understand your purpose. I was thinking about reading the current cycle only

    anyway if you need to read data from previous cycles too:

    how to do that may depend whether you load data incrementaly. If I understand correctly you may think about loading data as follow:

    Example:

    1st cycle (table 1)

    previous new input to table 2 (cycle 0) + new input to table 1

    2nd cycle (table 2)

    new input to table 1 (from cycle 1) + new input to table 2

  • First of all thanks for your help guys. Let me clarify a bit more:

    I only need to read current cycle. I haven't decided how the data will be inserted I will probably use bulk insert. So again the current cycle has to be available to users until I am done loading the new cycle.

    I am looking to use one of the new SQL Server 2005 features to do this task for performance and less management overhead reasons. I might actually go back to partitioning but for now I am going to try another approach which is ALTER SCHEMA. Have three shemas lets say dbo, stg, temp.

    dbo.OptimizationResults is being read by users. I will load up stg.OptimizationResults. Once the load is done I will switch dbo.OptimizationResults to temp.OptimizationResults and then switch stg.OptimizationResults to dbo.OptimizationResults.

    I will test it out and let you guys know.

  • If I understand correctly, when you are loading current cycle data, users still need to read data from previous cycle.

    Partition may be not the best solution unless you explicitly add the partititon number as query condition.

     

    My suggestion is as follows:

    1) Create two data tables instead of partition;

    2) Create a status table with two columns: DataTableName and Status. Insert two rows with the DataTableName  as the two data table and Status set to 1. (1 is loading and 0 is current cycle ready to use)

    3) Create a view

    SELECT ... FROM DataTable1 D, StatusTable S WHERE S.DataTableName='DataTable1' AND Status=0

    UNION ALL

    SELECT ... FROM DataTable1 D, StatusTable S WHERE S.DataTableName='DataTable2' AND Status=0

    4) When you loading data, get the table name with status 1 from status table

    5) Truncate data in this data table and bulk loading data into that table

    5) When loading is done, use "begine tran, switch the status in the status table, commit tran" for the two rows in the status table.

     

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply