Update or Reinsert? Which one is best?

  • On a daily basis I have to update data from a flat file (about 2 lac records). Either I update the data in place (a minimum of 10 fields per record needs to be updated), or I delete the selected data and reinsert the complete data from the flat file.

    Which is the better way, and why? Please suggest.

    Thanks

    Jai

  • This really depends on what you are trying to achieve, and also on your data volumes and indexes.

    If you delete/re-insert, this may be slow and you may get heavy fragmentation on your table (I suggest dropping the indexes first, then doing the delete/insert, then re-adding the indexes).
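
    For example, a rough sketch of that sequence (the table, column, and index names here are made up for illustration):

    -- drop the nonclustered index before the heavy modification
    DROP INDEX ix_customer_name ON dbo.customer

    -- delete the affected rows and reinsert them from the staging table
    DELETE FROM dbo.customer WHERE customer_id IN (SELECT customer_id FROM dbo.customer_staging)
    INSERT INTO dbo.customer SELECT * FROM dbo.customer_staging

    -- re-add the index once the data is in place
    CREATE INDEX ix_customer_name ON dbo.customer (customer_name)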

    If you can code decent logic into your update and you are sure your row counts are good (i.e. no extra rows, no deleted rows), then an update MIGHT be better.

    How many rows are you talking about? How many fields?

    Also, do you have referential integrity on this table (i.e. are there foreign keys hanging off it)? Do you have triggers on the table?

    These are all things you would need to consider before making a decision.

    MVDBA

  • I'd tend to agree with Michael. It also depends on resources and time. The updates might be spread out, whereas a reinsert tends to run at once. If you index later, is that an issue? Do you have the time, or will the indexing generate blocks or delays?

    You need to test both, and see how they impact your system and application.

  • There are around 40 lac records in the table.

    Total columns = 48

    Updates are required in 10 columns.

    The data needs to be updated every six hours (around 3 lac records).

    No referential integrity

    No Triggers

    The table has an index. Basically, data is uploaded from the flat file into temporary tables; from these tables the data is updated into the original tables, which are then used for reporting.

    I think the reinsert option is better: delete and reinsert the data, dropping and recreating the index.

    Please suggest.

  • 40 lac = 4,000,000 (4 million)

    (1 lac = 100,000)

    Is this correct?

    One more question: does the file you are importing/updating from have a unique identifier? If so, is it carried through into your import?

    Also, is there a modified_datetime field so that you can easily identify which rows to update?

    If that is the case, then there might be some benefit in performing an update on just the 300,000 rows you need to update.

    In both scenarios I imagine you'll be importing from the flat file into a holding table of some kind? If you can identify the rows that need updating and you can match primary keys, then the updates to the final table would take a much shorter time (and therefore more uptime would be available to your users).
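
    Something like this, assuming a holding table keyed on the same unique identifier (the table and column names are illustrative only):

    -- update only the columns that change, matching on the shared key
    UPDATE t
    SET    t.col1 = h.col1,
           t.col2 = h.col2
           -- ... and the other eight columns that need refreshing
    FROM   dbo.final_table AS t
    INNER JOIN dbo.holding_table AS h
           ON h.record_id = t.record_id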

    But there are so many restrictions here that this might not be practical. I would be interested in seeing the timing difference between:

    a) updating 10 columns in 300k records

    b) truncating the table and doing a bulk insert

    Do you have any existing timings?
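
    If not, a quick way to get rough numbers (a sketch; run each option in turn against a copy of the table and compare):

    DECLARE @start datetime
    SET @start = GETDATE()
    -- run option (a) or option (b) here
    SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms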

    MVDBA

  • Yes, it is 4 million.

    The records have a unique identifier, but there is no modified flag in the data to identify the updated records; that's why I have to update all 300,000 records.

    There are other records in the table too, so I cannot truncate it, but as I mentioned, I can delete/reinsert these 300,000 records.
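
    A rough sketch of that keyed delete/reinsert, assuming the staging table holds the 300,000 refreshed rows and shares the unique identifier (the names are illustrative):

    -- remove only the rows that are being replaced
    DELETE t
    FROM   dbo.main_table AS t
    INNER JOIN dbo.staging_table AS s
           ON s.record_id = t.record_id

    -- reinsert the refreshed rows from staging
    INSERT INTO dbo.main_table
    SELECT * FROM dbo.staging_table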

    These tables are not directly available to the users; jobs have been defined which summarize this data and transfer it to other tables, which are accessed by the users.

    Now, what are your views?
