Transferring millions of records in live production

Question

Transferring millions of records in live production

saum70

SSC Eights!

Points: 993
More actions
June 6, 2024 at 6:27 pm

#4426650
Hi
we have req of transferring huge data from live table to another table in production env. Live table means simultaneous insert will be going on when transfer of data is happening. For e. G table A at 11 in the night has 5 million records. At 11 it is decided to shift this 5 million to another table B while more new records are getting added in A. Table A does not have partition. Once data transferred this data shall be removed. It will be everyday process
Kindly suggest best practice to make this transfer without any delay to new insertion in the table A. Kindly assume no partition on A
- This topic was modified 6 months, 2 weeks ago by saum70.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Dennis Jensen SSChasing Mays Points: 606 More actions · Answer 1

If you are using MS SQL Server you would use the Snapshot or Read Uncommitted (or something like that) and make sure you grab only the information that is post the last read and prior to the point for new insertions. Then subsequent pulls of data are going to do the same thing. This allows you to minimize the amount of data you are pulling and does not block the table for insertions.

Further you could do this in smaller increments updating table B from table A on a smaller record basis which would allow the updating of table B to me more frequent with smaller chunks of data by implementing a trigger based on the insertion record count. Perhaps doing the update every 1 million records instead of 5 million records. As the more data you work with the less efficient (time wise) the entire process will be. For while you will not be locking the table, you will still be using the processor for this transfer of data which then could slow things down.

Note: This assumes that your data is time incremental in nature and that changes to old data is not occurring as that would be an update and not an insertion. If you do not time stamp the insertion then I strongly suggest that you do so for table A while table B does not necessarily need that information unless it is something that should accompany that data.

This reply was modified 6 months, 2 weeks ago by Dennis Jensen.

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 2

What is supposed to happen to the 5 million rows in Table A AFTER you've copied them to Table B and is Table B on the same database and on the same server?

EDIT... Thanks for updating the post.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 3

Have two tables... for example, TableA1 and Table A2.

Start out with a synonym called TableA and have it pointing at TableA1.

Have another synonym called TableAWork and have it pointing at TableA2.

Table A1 will be the one getting filled up during the day because the TableA synonym will be pointing to it.

At 11PM, drop both synonyms and recreate them to point to the opposite tables. The synonym for TableA will now be pointing to empty TableA2 and start taking the new rows. The synonym for TableAWork is now pointing to the 5 million row TableA1. You can basically do with it what you want without slowing down the input process. After you've copied the rows from it to wherever you were going to copy them to with impunity. When you're done, get the table name that the TableA synonym is pointing at and truncate it.

The next night, simply reverse the naming in the synonyms.

Total "down time" will be measured in low milliseconds and no interruption will be perceived because of the short time and the fact that an in-use synonym will wait for it to temporarily not be used.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

saum70 SSC Eights! Points: 993 More actions · Answer 4

Hello Dennis. Thanx for reply. Yes MS SQL will be used. But Wouldn’t trigger be resource expensive. Our system is going to be live 24x7. The data are time incremental and no update will happen in old data. To define trigger for transferring small chunk will call for logics and will be expensive.

Hello Jeff. Thanx for reply. I have never come across synonyms. Read some info on it. I am not really sure it will work. Anyways the old data will be removed from table A after transfer to B.

I have re@d that Switch command work even though there is no partition. Need to check.

Kindly revert for any doubts on the requirements.

Dennis Jensen SSChasing Mays Points: 606 More actions · Answer 5

Okay not wanting to confuse things I would say based on your most recent information that the solution presented by Jeff Moden would be a good method of doing this and can make the whole process easier to handle.

As for setting up a synonym that is fairly easy thing to do but here is a link to something that outlines it in detail.

https://www.sqlservertutorial.net/sql-server-basics/sql-server-synonym/#:~:text=To%20create%20a%20synonym%2C%20you%20use%20the%20CREATE,SYNONYM%20%5B%20schema_name_1.%20%5D%20synonym_name%20FOR%20object%20%3B

Still the options are somewhat limitless and the actual best practice is usually the simpliest most efficient method that you can implement that handles this as automatically as possible.

Perry Whittle SSC Guru Points: 234013 More actions · Answer 6

You haven’t specified whether there are any keys or referential integrity on the tables, do they contain any?

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Jeff Moden SSC Guru Points: 1004406 More actions · Answer 7

saum70 wrote:

Hello Jeff. Thanx for reply. I have never come across synonyms. Read some info on it. I am not really sure it will work. Anyways the old data will be removed from table A after transfer to B. I have re@d that Switch command work even though there is no partition. Need to check.

Yes... that'll also work. As Perry Whittle stated, you shouldn't have any FK's pointed at either table because your goal would be to TRUNCATE TableB instead of doing the much more expensive DELETEs.

Most of the work I've done in this area was long before partitioned tables were available in the "Standard Edition" and I'm afraid that I was stuck in that rut when I made my recommendation.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Perry Whittle SSC Guru Points: 234013 More actions · Answer 8

saum70 wrote:

I have never come across synonyms.

they're big in oracle, synonyms are very popular and also useful in sql server

-----------------------------------------------------------------------------------------------------------

"Ya can't make an omelette without breaking just a few eggs" 😉

Emperor100 SSCrazy Points: 2173 More actions · Answer 9

We have a similar requirement, we have used a SSIS package to copy the data from a table in Azure to on-premise.

As an alternate method, Create two jobs that can use components like SSIS packages, ADF pipelines, or T-SQL scripts. TableA should have a timestamp column (as Dennis suggested) for incremental data copying. You can use 50K or 100K chunk size.

Job 1: Every hour, retrieve the max value from the timestamp column and save it in a log table. Use this timestamp to incrementally copy data from TableA to TableB (handling INSERTs or UPSERTs).

Job 2: Run every 10 minutes and delete rows from TableA based on the timestamp stored in the log table (rows already copied to TableB).

After 11 pm daily, ensure that all rows deleted from TableA that were added before or at 11 pm.

Note: This method may not work if it is mandatory to copy or move the rows from TableA to TableB only after 11 pm and not through out the day.

=======================================================================