How to switch data online

  • Hi SQL experts:

    My project requires switching data online. We use a regular job to prepare new data and they should be made available when ready. For customers we only support simple read-only operations which will be finished in several seconds. We thought of 2 plans, each has pros and cons.

    (1)Use three machines for the switching. MachineA is working online and serve clients, while MachineB is preparing data. MachineC contains meta info (which machine is available currently). When MachineB is ready, it asks MachineC to update the meta info. Then machineA will went offline and start to prepare data for the next round. When clients want to access the data, firstly it needs to ask MachineC to know whether A or B is available currently.

    Pros:

    a.Switching is quick and online performance will not be affected

    Cons:

    a.The logic is complex. We need to synchronize between 3 machines

    b.Client is close coupled to server

    c.Switching might happen just after client gets the meta info. Exception will happen then and client application should be designed carefully to handle this situation.

    (2)Use only one DB. All requests are handled by a stored procedure SP_ SomeTask . For each request, SP_SomeTask will check meta information to determine which table to use. Suppose currently tableA has the data. When new data is ready, we load it into tableB and create indexes. After tableB is ready, we update the meta information and queries will be executed on tableB afterwards. TableA will be used for loading next time. And so on.

    Pros:

    a.Logic is simple and client don’t care about the details

    Cons:

    a.Performance of online request might be affected when tableB is loading and indexing

    My question is:

    Do we have other candidate solutions?

    Is there supplementation of analysis or improvements for solution (1) and (2)?

    Any comments is appreciated

  • I would look at using 3 databases - not servers.

    DatabaseA has application code and access to the data.

    DatabaseB and C alternate between build cycles. When B is active, C is being built. When C is active, B is being built.

    In DatabaseA you then define synonyms for all objects in databaseB/C. When you are ready to publish, you drop the synonyms and recreate them pointing to the other database. The switch should take no more than a few seconds.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • There are other alternatives as well. Can you give us some of the business reasons for wanting to do this? What are you trying to accoplish by switching the data like this?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Updating this active thread.

    An even easier solution:

    Two Tables, let's call them MyTableA and MyTableB

    You also have a view called MyTable.

    Consumers of the data read from the view, MyTable, which points to the current active table. Whenever you wish to swap out tables, alter the view. This will manage concurrency very nicely - the alter will get blocked by any active requests. When those requests complete, the view is recreated and subsequent requests hit the new table.

    If you are concerned about performance, (and you are indexing, writing etc) then ensure your disk configuration is optimal for the load. (i.e. the tables are each on separate filegroups (on separate spindles or shelfs (if you are on a SAN) Also, you are likely making heavy use of TempDB building your indexes so see if you can also separate tempdb out into multiple files on separate spindles. Finally, make sure the maximum degree of paralellesm won't allow a very cpu intensive process to dominate the server.

    One 'gotcha' if your view definition is 'select * from myTableA' and you alter the definition of that table, you need to exec SP_refreshView 'myTable' to refresh the metadata.

    Good Luck, I hope that helped

    SQL guy and Houston Magician

  • Just for everyone's benfit, this question has been posted in 6 different threads. I think they've all been linked to this thread now, so we can all work within one thread.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I would recommend looking into SQL Server Table Partitioning for this solution. You don't need multiple servers, multiple databases, or multiple tables. Just a well-designed partition strategy (possible date driven) and an understanding of the new partitioning functions/schemes that became available with SQL Server 2005.

    As far as 'switching' data into an active table, SQL Server Table Partitioning can do this too.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Doh - posted this in the wrong thread

    Some lite reading:

    http://sqlblog.com/blogs/linchi_shea/archive/2007/01/02/some-vldb-availability-tidbits.aspx

    http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Thanks a lot for all of the help.

    Now I have other candidates:

    (3) use SQL Server Table Partitioning

    (4) use 2 tables and one view as Robert Cary suggested

    (5) use 3 databases as Jeffrey suggested (which is similar to (4))

    All of them are cool. But they might have the same problem as my second solution. When the refresh job loads data from upstream systems, the online performance might be affected. You know, we might need to load a large bunch of data, so I/O and CPU could be used heavily at that time.

    Can I set the priority of different tasks? If we can, then I can set high priority for online query.

  • You set the priority of a job to an extent using Maximum Degree of Paralellism. If you want the refresh job to take a lower priority you can reduce the number of CPUs a query will utilize.

    SQL guy and Houston Magician

  • Thank you Robert!

    I will try max degree of parallelism Option. Suppose that will help:)

  • I would not recommend attempting to use the max degree of parallelism setting to micromanage your thread priority.

    xuejianpan,

    These data loads that you're asking about:

    1. how often do you need to load the data?

    2. how much data is loaded for each load?

    3. what time constraints do you have around getting the data loaded? Meaning, does the data come in at a certain time during business hours and need to be loaded within a certain amount of time from receiving it?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply