28 Million Records to Be Matched -- URGENT

  • Hey Guys I need some help... I have a table with 28 million records of zipcode and state... I have another table that has SH, SS district codes that I need to be updated in that first table with 28 records. The only problem is when I try to run Update Query joing the tables with state and zipcode ... the query has been running forever... What is the best way to do this?

  • While 28 million records is a large table I've seen larger.  Not knowing your enviornment means most advise you get will be fairly generic.  But some things to consider:

    1. Review how many indexes the table has, and how the update will affect them.  Anything affected by the update (unless used directly by the update) should be removed. 

    2. Check all referential integrity and see if anything that is involved with the update can be safely disabled (for this large update).  This is something you want to be careful with but it is a consideration.

    3. Review all triggers on the table and see what will be firing and if it is really needed in this situation (sometimes there are logging/security triggers that can be eliminated for this type of batch update).

    4. Check the data used to determine what will be updated and eliminate duplicate (if possible) data (i.e. anything that would cause the same record to be updated multiple times).

    5. break the update into batches with a commit after each batch.  A batch can be a state, or a state and zipcode range.  Check the data that defines what will be updated and see what the best groupings are, then simply write multiple update statements. 

    6. Run this type of update statement during off peak hours.

    James.

  • what is your execution plan saying?

  • Can you post the statement?  I have a feeling that there is a cross join.  If you are saying that you should be only updating 28 of the 28M records, something is wrong. 

    FYI, what is your opinion of "Forever"  Mine is pretty long   5min, 50min, 5 hours, etc...

  • This is one place where a "control loop" will help... write the code to only update a couple million at a time in a loop until everything has been updated.  You'll be amazed at how much faster it performs.

    Also, make sure you have the correct indexes to support the update and a PK on both tables...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • in addition to the above

    check that your logs and tables are not on the same drives - this will cause massive performance issues for an update of this nature

    if your field is already in an index, but you are doing a mass update and with this level of updating (I'm presuming you are updating 28million records with one of 28 categories) it might be worth disabling any indexes, running the update, rebuild the indexes (however if you choose the looping method you may want to retain indexes to improve selection speed)

    if you're adding a new field I'm guessing that it is not in an index and this maybe forcing the query plan to use a table seek for a row by row update which of course will invalidate any gains from a good performing index

  • Thank you all for all the suggestions... All of them are educational for me and I will use them now and in future.

  • Let us know what you finally come up with so that others may learn, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Before running the Update query, I normally would run a similar Select query to check if the appropriate rows would be updated properly. Try to run a Select with all the query structure that you composed for the update, but with minor modification, so it would just select insted of updating.

    Regards,Yelena Varsha

  • Add to the list for things to do to the tables needed for this process:

    • UPDATE STATISTICS
    • sp_recompile
    • sp_refreshview

    All of the assistance/suggestions/etc. listed in this posting will not help unless you have good information for the optimizer to use. A tablescan of 28 rows is OK unless it is used in a JOIN woth a table that has 28 million ...

    RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply