Big data: transfer x-number of rows per batch?

  • Super_Grover (10/24/2016)


    Okay, we're still trying to solve this, but I'll mark Phil's post as the solution.

    Thanks again!

    If you find that you cannot run this as a distributed transaction, it should be possible to tweak the code.

    Won't be quite as robust, but should get the job done.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Since you've posted to a SQL 2012 forum, you might want to use a feature that was introduced in SQL 2012 specifically for paging: OFFSET/FETCH.

    DECLARE @pg      INT = 1,
            @pg_size INT = 1000,
            @rows    INT;

    SET @rows = @pg_size;

    -- Keep looping while the previous batch was full; a short final batch means we're done.
    WHILE @rows = @pg_size
    BEGIN
        INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
        SELECT *
        FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD]
        ORDER BY doc_number
        OFFSET @pg_size * (@pg - 1) ROWS
        FETCH NEXT @pg_size ROWS ONLY;

        SELECT @rows = @@ROWCOUNT, @pg += 1;
    END

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Another consideration is whether the process could or should be interrupted mid-stream. Some additional flexibility might come from using a real table instead of a temp table, where you could add a datetime column recording when each record was transferred. Then you could pull batches of records from the set with no transferred datetime. That way, you could split the operation up over more than one day, or, if a batch got interrupted, delete that batch from the destination and refill it. If the server catches on fire, you could stop your transfers and deal with it.
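
    A rough sketch of that idea (the tracking table and its INT key type are assumptions, SELECT s.* assumes both tables have identical column layouts, and the four-part names are copied from the code posted earlier):

    -- Local tracking table: one row per source key, stamped when that row has been copied.
    CREATE TABLE dbo.MULTIMED_transfer_log
    (
        doc_number     INT       NOT NULL PRIMARY KEY,
        transferred_at DATETIME2 NULL
    );

    INSERT INTO dbo.MULTIMED_transfer_log (doc_number)
    SELECT doc_number
    FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD];

    -- One batch: pick the next 1000 unstamped keys, copy those rows, then stamp them.
    DECLARE @batch TABLE (doc_number INT PRIMARY KEY);

    INSERT INTO @batch (doc_number)
    SELECT TOP (1000) doc_number
    FROM dbo.MULTIMED_transfer_log
    WHERE transferred_at IS NULL
    ORDER BY doc_number;

    INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
    SELECT s.*
    FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD] AS s
    WHERE s.doc_number IN (SELECT doc_number FROM @batch);

    UPDATE l
    SET    transferred_at = SYSDATETIME()
    FROM   dbo.MULTIMED_transfer_log AS l
    JOIN   @batch AS b ON b.doc_number = l.doc_number;

    If a batch fails partway, you delete those doc_numbers from the destination, clear their transferred_at stamps, and rerun; anything already stamped is simply skipped on the next pass.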

  • Another thing to strongly consider when transferring such large amounts of data is "Minimal Logging" and the proper use of Trace Flag 610, which has been available since 2008 R2. It can save not only a relatively huge amount of time, but also a huge amount of log file space and, sometimes, MDF space (depending on when you build the Clustered Index). There's a rough sketch at the end of this post.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
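
    A rough sketch of the minimally logged version (assumptions: it runs on the destination server P2WIDB002 so the insert is local, the destination table starts empty, the database is in SIMPLE or BULK_LOGGED recovery, and the names are the ones from the code earlier in the thread):

    -- Allow minimally logged inserts into a table that already has indexes.
    DBCC TRACEON (610, -1);

    -- TABLOCK on the target is required for a minimally logged INSERT...SELECT.
    INSERT INTO [CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD] WITH (TABLOCK)
    SELECT *
    FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD]
    ORDER BY doc_number;   -- load in clustered-key order if the clustered index is already in place

    DBCC TRACEOFF (610, -1);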

  • You should use bcp

  • wtren (10/25/2016)


    You should use bcp

    Please justify this comment.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Bcp can control the batch size directly, writes very fast, and puts very little disturbance or burden on the production database. After the export, you can copy the data to the destination and do your data loading slowly, and there's no need to read from the source again if you run into any trouble. It's a different strategy.
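
    For example, a rough sketch from the command line (the file path, batch size, and trusted-connection switch are assumptions; -n keeps everything in native format):

    rem Export from the source server in native format.
    bcp CS_ConversieAanlevering.dbo.MULTIMED_INHOUD out D:\transfer\MULTIMED_INHOUD.dat -n -S NLDSLASSQLTST -T

    rem Load into the destination in 10000-row batches; -h "TABLOCK" takes a table lock on the target.
    bcp CS_ConversieAanlevering_VWB.dbo.MULTIMED_INHOUD in D:\transfer\MULTIMED_INHOUD.dat -n -b 10000 -h "TABLOCK" -S P2WIDB002 -T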

  • I was going to recommend SSIS for the same reasons, but it seemed like the conversation was focused on the solution we were discussing. It also sounds like a one-off, so I'm not sure how much of a learning curve is acceptable or what the time frame is.

  • Yes, it splits the process into two steps and does them one by one. As I said, it's just a different strategy for doing the work; it depends on your requirements.

  • If you're willing to go the CLR route, and the LOB data is compressible, you can use the compression functionality in System.IO.Compression to compress it into a staging table locally, push the smaller staging table across the network, then inflate it at the destination.

    Catch-all queries done right
    Gail Shaw's Performance Blog

  • Super_Grover (10/23/2016)


    Hi all,

    I must transfer a big table (231k rows, not that big, but 735GB big because of blob data) to another location by INSERT INTO SELECT FROM...

    It appears that first the whole source table is being loaded before any record is actually inserted into the destination table. But there's not enough disk space because tempdb just grows and grows.

    ...

    If you're just selecting from one table into another table, then I'm thinking it shouldn't dump into tempdb, unless you're doing an ORDER BY.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Phil Parkin (10/25/2016)


    wtren (10/25/2016)


    You should use bcp

    Please justify this comment.

    Although it was a bit short, it's not a bad idea. I've done similar before. Instead of messing around with loops and all manner of other tricks, you BCP the data out as native format (very fast) and then BCP or BULK INSERT it back in where you want it. If you hold your mouth just right while you're doing it (especially with TF 610), you can even do it in a minimally logged fashion with the Clustered Index already in place (there's a rough BULK INSERT sketch at the end of this post).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
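
    A rough sketch of the BULK INSERT half of that (the file path and batch size are placeholders; the .dat file is assumed to come from a bcp ... out -n native export, and the statement runs on the destination server):

    DBCC TRACEON (610, -1);   -- minimal logging into an indexed table, as above

    BULK INSERT [CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]
    FROM 'D:\transfer\MULTIMED_INHOUD.dat'
    WITH (
            DATAFILETYPE = 'native',
            TABLOCK,                   -- needed for minimal logging
            BATCHSIZE    = 10000,
            ORDER (doc_number ASC)     -- only if the export really is in doc_number order
         );

    DBCC TRACEOFF (610, -1);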

  • Yes, very fast. For multiple tables, you can also open several windows and transfer different tables simultaneously.

  • wtren (10/26/2016)


    Yes, very fast. For multiple tables, you can also open several windows and transfer different tables simultaneously.

    Except, in most SAN environments, that will actually slow things down, because you can't control which spindles come into play and head-thrashing frequently becomes a performance problem for such loads. Even if parallel loads work very fast today, they might not tomorrow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A SAN normally provides cache to ease that, and SAN I/O is always a random pattern anyway.
