Help with high impact searches or expensive searches against a table that is mostly in motion

  • I have a design quandary that I hope someone has experienced or can assist with.

    We are using SQL Server 2005.

    There is a main search db (DB1):

    This DB1 is updated using a SQL Job that runs every hour. The update process is performed by deleting records in need of updating then reinserting them. This is due in part to the fact that the data rows inserted and/or deleted are not necessarily equal to the count of data rows that currently exist in the table.

    (i.e.

    DB1.Current table contains:

    Acct1 John Doe

    Acct2 John Doe

    Acct3 John Doe

    Update contains:

    Acct2 John Doe

    Acct3 John Doe

    Acct4 John Doe

    Acct5 John Doe

    Process would delete Acct1, Acct2, Acct 3 from DB1.Current table

    then

    Insert Acct2, Acct3, Acct4, and Acct5 to DB1.Current table

    Please bear in mind DB1.CurrentTable has over 30 columns whose data is derived from 9 different tables)

    On top of the hourly update, individual updates happen to rows sporadically by web users. The hourly update can contain anywhere from 0 to 30000 rows requiring updating. Depending on the count of rows and how busy the aforementioned tables are ("9 different tables"), this process can take up to 50 minutes.

    DB1 contains up to 17-18 million records, and does not get searched. However, it replicates to various other databases, including DB2, whose main use is as a search table as follows:

    A SQL job process searches this DB2.currenttable by joining anywhere from 100 - 100,000 records from another table using "fuzzy logic" to DB2.currenttable, ultimately dumping the results into another table where further processing is performed to whittle down the matches as designated.

    This search happens intermittedly, and there are no controls as to when it is performed or how many records are being searched. The only control is that only one set of data is searched at a time as the SQL Job will only run in sequential mode.

    Our problem seems to lie in the fact that we are constantly trying to update a table (in DB1), replicate the table (to DB2), then search the table (in DB2) and this causes often dead locks, and at times, the search is performed while data has been deleted (by the update process) BEFORE the data is replaced (by the update process), which sacrifices the integrity of our data.

    I don' t know if I have explained this well, but what I need to know essentially is how do you do high impact searches or expensive searches against a table that is mostly in motion?

    I have tried to convince the admin that we are trying to do too much and that we should reconsider how often our table is being updated, perhaps moving to a daily or twice daily update model, during which time all searches are suspended; of course, I am being told our clients expect close to real time data and this is not an option.

    Hopefully, someone has dealt with this type of business process and can shed some light as to how to effectively deal with the problem.

    Thank you in advance....

  • Make sure that the delete and insert (it's not an update) are within a single transaction. That will prevent missing data during a query and should help, a bit, with the deadlocks. If this were 2008 I'd suggest looking into snapshot isolation because I think that would help as well. Otherwise, you need to reexamine the process. Also, you haven't mentioned performance on these processes. Depending on how slowly this stuff is occurring, that could be increasing the number of deadlocks. As far as the deadlocks go, you need to make sure that you're processing the information in the same order to avoid the classic deadly embrace occurence which it sounds like you might be hitting.

    I realize this is all general advice, but that's about all I can give with that level of detail. The "delete and reinsert everything" model is extremely problematic, as you're discovering.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the response....

    As far as the timing, the actual insert and delete can take a few minutes, depending on the amount of data.

    What is your opinion regarding the practice to try and update a search table every 60 minutes yet perform high impact searches that join other tables simultaneously?

    Do you know if this is a common practice in general, or do most organizations try and use a process by which the search table gets updated once or twice daily, and during those times, the join search query process is disabled?

    Or perhaps another way to tackle this is to clone the table, update the cloned table that is not in use, then swap the cloned table with the existing table.

    (I am not sure that process would be practical, but I am trying to come up with ideas that may solve our issue)

  • It's not at all unheard of to modify data, a lot, and query it at the same time. It's pretty standard in fact. But, you have set expectations and spend a lot of time tweaking and tuning to be sure you're doing it the best way possible.

    Any attempt at cloning the table is still subject to some of the same issues you're already dealing with. I don't think it will help necessarily.

    As far as the data goes, do the queries and the inserted data directly correlate? Meaning, are the queries more likely to go after older data or newer data? If there is some sort of dividing line you can try to cluster based on that so that the access goes against one part of the table and the insert/delete go against another.

    I'm still back to working the methodology & process more than trying to add another layer on top. Also tuning everything so that it's performing optimally is vital.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Unfortunately, there is no specific pattern to the data being updated or the data being queried so clustering would not work here.

    You mentioned a transaction earlier. To clarify, if we enclose the delete and insert within a transaction, will this help keep maintain data integrity? (In other words, while the external select join query executes, it will have to wait for the transaction to be complete ensuring no data is missing from the search table before it is reinserted)

  • Robert Heynen-454800 (8/24/2011)


    Unfortunately, there is no specific pattern to the data being updated or the data being queried so clustering would not work here.

    You mentioned a transaction earlier. To clarify, if we enclose the delete and insert within a transaction, will this help keep maintain data integrity? (In other words, while the external select join query executes, it will have to wait for the transaction to be complete ensuring no data is missing from the search table before it is reinserted)

    Yes it should.

    BTW, the table does have a clustered index through, right? I was just thinking that adjusting it to match the access patterns might help.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Oh yes the search table and the joined tables have indexes on the columns used for matching.

    If applied, wouldn't the transaction cause potential blocking?

    Just making sure I know the impact if I initiate...

  • Oh yeah, it's going to block.

    It's all about choices. You want to get rows back and not fall into the crack between a delete and an insert... gotta have the transaction. You don't want the delete and insert to cause any blocking... transaction may have to go (or get tuned, or get shortened, or ....).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply