Dealing with a "replicated" system

  • Hi all,

    Title doesn't really explain my situation very well, so I'll try to be a bit more clear.

    I'm working with a system that has one very large table of about 70 million records, and roughly 30,000 new records are added to it every day.

    The table contains a myriad of different fields. To make searches against it more effective, I've come up with a new table structure that divides the main table up into smaller components.

    The new structure still has one large table, but many of the columns are changed to tinyints, smallints, or ints, and are foreign keys pointing to smaller lookup tables. The searches are done against the smaller tables and then joined back to the main table using indexes.
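
    For concreteness, here's a rough sketch of that layout. All of the names (dbo.SearchTable, dbo.StatusLookup, MainID, Status, and so on) are stand-ins I've made up for illustration, not the real schema:

        -- One narrow lookup table per repeated text column, keyed by a tinyint.
        CREATE TABLE dbo.StatusLookup
        (
            StatusID   TINYINT     NOT NULL PRIMARY KEY,
            StatusName VARCHAR(50) NOT NULL
        );

        -- The rebuilt "large" table keeps one row per main-table row, but
        -- stores small integer keys instead of the original wide columns.
        CREATE TABLE dbo.SearchTable
        (
            MainID   INT     NOT NULL PRIMARY KEY,  -- same key as the main table
            StatusID TINYINT NOT NULL
                FOREIGN KEY REFERENCES dbo.StatusLookup (StatusID)
        );

        -- A search filters on the narrow keys, then joins back for the full rows.
        SELECT s.MainID
        FROM   dbo.SearchTable s
        JOIN   dbo.StatusLookup l ON l.StatusID = s.StatusID
        WHERE  l.StatusName = 'Active';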

    This system works great - the searches are all fast, and everything works nicely.

    The problem is that building the new structure is slow. The job runs every morning, truncating all of the tables and rebuilding them along with their indexes, and it takes about 2 hours. That runtime is only going to increase as the years go by.
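
    In outline, the nightly rebuild amounts to something like this (same hypothetical names as above):

        -- Wipe and rebuild the replicated structure from scratch.
        TRUNCATE TABLE dbo.SearchTable;

        INSERT INTO dbo.SearchTable (MainID, StatusID)
        SELECT m.MainID, l.StatusID
        FROM   dbo.MainTable m
        JOIN   dbo.StatusLookup l ON l.StatusName = m.Status;

        -- Rebuilding every index over 70 million rows is where the time goes.
        ALTER INDEX ALL ON dbo.SearchTable REBUILD;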

    If the main table were just being appended to on a daily basis, this wouldn't be a hard problem to solve, since all I'd need to do is disable the indexes, add the new rows, and then rebuild the indexes.
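
    That append-only path would look roughly like this. I'm assuming a nonclustered index called IX_SearchTable_StatusID; the clustered index has to stay enabled, since disabling it makes the table inaccessible:

        -- Disable the nonclustered indexes so the insert doesn't maintain them.
        ALTER INDEX IX_SearchTable_StatusID ON dbo.SearchTable DISABLE;

        -- Append only the rows that arrived since the last load.
        INSERT INTO dbo.SearchTable (MainID, StatusID)
        SELECT m.MainID, l.StatusID
        FROM   dbo.MainTable m
        JOIN   dbo.StatusLookup l ON l.StatusName = m.Status
        WHERE  m.MainID > (SELECT MAX(MainID) FROM dbo.SearchTable);

        -- Rebuilding re-enables the index.
        ALTER INDEX IX_SearchTable_StatusID ON dbo.SearchTable REBUILD;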

    The problem with that idea is that the main table is also being updated. Some of the updates are being done through stored procs, while others are being done through straight UPDATE statements.

    After thinking about the situation, I've come up with two solutions.

    The first, and easiest, solution is to just leave it as-is. The system I've built works, and it works well. The problem, of course, is the time. If there's a problem with the script, re-running it costs us a lot of working time, and after a few years the rebuild will probably take too long to justify.

    The second solution I've thought of is ugly, and I hate suggesting it. But it makes sense. The suggestion is to put an UPDATE trigger on the main table. Every time an UPDATE occurs on the main table, the trigger would replicate that update against the new structures (a sketch follows below). That way the daily job would only need to build each day's new rows, which removes the problem of the build time growing over the years.
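
    Roughly, the trigger would look like this (same hypothetical names as before, with a Status column standing in for whichever fields get updated):

        CREATE TRIGGER trg_MainTable_Replicate
        ON dbo.MainTable
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;

            -- inserted holds the post-update rows; a set-based UPDATE can
            -- put many rows here, so the trigger has to stay set-based too.
            UPDATE s
            SET    s.StatusID = l.StatusID
            FROM   dbo.SearchTable s
            JOIN   inserted i         ON i.MainID = s.MainID
            JOIN   dbo.StatusLookup l ON l.StatusName = i.Status;
        END;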

    So, my questions to you all are these -

    1. Can anyone think of a better solution than either of the two I've proposed? And, just to head this one off at the pass: no, I can't change the main table; it's being used by too many other systems.

    2. If there aren't any better solutions, which of the two that I've proposed is the better choice? Are the benefits of the UPDATE trigger worth the costs of having a trigger on a main production table containing millions of rows?

  • Actually, after thinking about it more, I think I have a better idea.

    Rather than having the UPDATE trigger do all the work of updating the replicated tables, how about the trigger just updates a field in the main table, called LastUpdated, setting it to GETDATE()?
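
    Something like this, keeping the trigger as cheap as possible (names still hypothetical):

        CREATE TRIGGER trg_MainTable_Stamp
        ON dbo.MainTable
        AFTER UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;

            -- Guard: if only LastUpdated itself was written, do nothing, so
            -- the trigger's own UPDATE can't re-fire it where recursive
            -- triggers are enabled.
            IF UPDATE(LastUpdated) RETURN;

            UPDATE m
            SET    m.LastUpdated = GETDATE()
            FROM   dbo.MainTable m
            JOIN   inserted i ON i.MainID = m.MainID;
        END;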

    Then, in my daily job that inserts the new records into the replicated tables, I'll also delete all records from the replicated tables whose corresponding records in the main table were updated in the previous 24 hours, and then re-create those records.
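
    In the daily job, that would be something along these lines:

        DECLARE @Cutoff DATETIME;
        SET @Cutoff = DATEADD(HOUR, -24, GETDATE());

        -- Throw away the replicated rows whose source rows changed...
        DELETE s
        FROM   dbo.SearchTable s
        JOIN   dbo.MainTable m ON m.MainID = s.MainID
        WHERE  m.LastUpdated >= @Cutoff;

        -- ...then re-create them, along with the day's brand-new rows.
        INSERT INTO dbo.SearchTable (MainID, StatusID)
        SELECT m.MainID, l.StatusID
        FROM   dbo.MainTable m
        JOIN   dbo.StatusLookup l ON l.StatusName = m.Status
        WHERE  m.LastUpdated >= @Cutoff
           OR  NOT EXISTS (SELECT 1 FROM dbo.SearchTable s
                           WHERE s.MainID = m.MainID);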

    The number of updates is probably minimal compared to the size of the tables, so this should save a significant amount of time.
