Big data: transfer x-number of rows per batch?

  • wtren (10/27/2016)


    SAN normally provide cache to ease it, and SAN is always random io pattern.

    Are you saying that a SAN won't stream to multiple contiguous sectors on disk? If so, I find that very difficult to believe. It would spend most of the time thrashing the heads instead of writing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It is, but normally SAN shared simultaneously by multiple Servers, that makes things different.

  • wtren (10/28/2016)


    It is, but normally SAN shared simultaneously by multiple Servers, that makes things different.

    So do a test. Do 3 or 4 file parallel transfers, loads, or backups to the SAN and then try the same amount of data with just a single serial transfer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Okay, so I've been very busy at work with this case. Data migration is finished just in time but there's lot of room for improvements.

    We ended up using this querie which worked great:

    declare @Documentnummer nvarchar(20);

    declare @Bestandsnaam nvarchar(255);

    declare @BinData varbinary(max);

    declare @Cnt int;

    declare @TotCnt int;

    declare @WhileCnt int;

    declare @SavCnt int;

    declare @ChkCnt int;

    declare @msg varchar(100);

    --set @Cnt=0;

    --vanaf begin beginnen

    --TRUNCATE TABLE [CS_ConversieAanlevering].[harry].[MULTIMED_INHOUD];

    set @msg=N'Aantal regels gedaan: %d';

    SELECT @TotCnt=COUNT(*) FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD];

    SELECT @Cnt=COUNT(*) FROM [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD];--[CS_ConversieAanlevering].[harry].[MULTIMED_INHOUD];--

    IF @Cnt=0

    set @Documentnummer='';

    ELSE

    SELECT TOP 1 @Documentnummer=[DOCUMENTNUMMER] FROM [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD] ORDER BY [DOCUMENTNUMMER] DESC;

    WHILE @Cnt<@TotCnt

    BEGIN

    SET @SavCnt=@Cnt;

    SET @WhileCnt=0;

    WHILE @WhileCnt<100

    BEGIN

    SELECT TOP 1 @Documentnummer=[DOCUMENTNUMMER]

    ,@Bestandsnaam=[BESTANDSNAAM]

    ,@BinData=[BLOB]

    FROM [NLDSLASSQLTST].[CS_ConversieAanlevering].[dbo].[MULTIMED_INHOUD]

    WHERE DOCUMENTNUMMER>@Documentnummer

    ORDER BY [DOCUMENTNUMMER];

    INSERT INTO [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]([Documentnummer],[Bestandsnaam],[BLOB])

    VALUES(@documentnummer,@bestandsnaam,@bindata);

    --INSERT INTO [CS_ConversieAanlevering].[harry].[MULTIMED_INHOUD]([Documentnummer],[Bestandsnaam],[BLOB])

    --VALUES(@documentnummer,@bestandsnaam,@bindata);

    --print @documentnummer;

    --print @bestandsnaam;

    --print @bindata;

    SET @Cnt+=1;

    SET @WhileCnt+=1;

    IF (@Cnt % 25)=0 RAISERROR (@Msg,10,1,@Cnt) WITH NOWAIT; --WAITFOR DELAY '00:00:01' --WITH NOWAIT --@ModResult=0

    --IF @Cnt=100 BREAK;

    END

    SELECT @Cnt=COUNT(*) FROM [P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD];--[CS_ConversieAanlevering].[harry].[MULTIMED_INHOUD]; --[P2WIDB002].[CS_ConversieAanlevering_VWB].[dbo].[MULTIMED_INHOUD]

    SET @ChkCnt=@Cnt-@SavCnt;

    IF @ChkCnt<>100

    BEGIN

    SET @msg=N'Fout opgetreden bij het INSERTEN, '+CAST(@ChkCnt AS nvarchar)+' ongelijk aan 100';

    THROW 55555,@msg,1;

    END

    --IF @Cnt>=100 BREAK;

    END

    This is the largest table in the whole DB. The whole DB needs to be transferred. Next delivery will be halfway december.

    The whole DB consists of about 250 table and a total size of about 1 TB.

    But we need to improve a lot, like:

    -if VPN goes down while transfer is ongoing it need to continu where it stopped when VPN is back up

    -same goes if VM-host crashes and takes sql machine with it (it happend)

    -when finished it needs to present a report stating rec counts and transfer times

    -in general: things need to be automated / be more consistent / more reliable etc. So pre define all tasks for transfering all data from all tables and when the moment arrives just hit the 'Go'-button.

    So currently I'm looking into making some sort of 'project' and ran into couple of terms like SSIS, SSDT and Visual Studio, (I already know VS pretty well; I do my c# and vb.net programming in VS) but I'm getting a bit confused about what would be the best solution for such a project.

    So, what would you guys recommend? Thanks again!

  • Super_Grover (11/2/2016)


    This is the largest table in the whole DB. [font="Arial Black"]The whole DB needs to be transferred.[/font]

    The solution is pretty easy then. Don't try to write a home-grown solution. Do a RESTORE or, if your SAN is so equipped, do a "SAN Snapshot", which may take only seconds.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/2/2016)


    Super_Grover (11/2/2016)


    This is the largest table in the whole DB. [font="Arial Black"]The whole DB needs to be transferred.[/font]

    The solution is pretty easy then. Don't try to write a home-grown solution. Do a RESTORE or, if your SAN is so equipped, do a "SAN Snapshot", which may take only seconds.

    I'd wish I could do a restore but sorry, that's not allowed.

    It's just: we have our filled DB on our side; the exact same DB (but empty/no data just empty tables) on the other side. We must insert the data table by table through sql commands.

    I watching YT video after another about SSIS / SSDT / VS projects but I'm getting more and more confused... 🙁 Any step-by-step guide available somewhere?

  • Super_Grover (11/2/2016)


    Jeff Moden (11/2/2016)


    Super_Grover (11/2/2016)


    This is the largest table in the whole DB. [font="Arial Black"]The whole DB needs to be transferred.[/font]

    The solution is pretty easy then. Don't try to write a home-grown solution. Do a RESTORE or, if your SAN is so equipped, do a "SAN Snapshot", which may take only seconds.

    I'd wish I could do a restore but sorry, that's not allowed.

    It's just: we have our filled DB on our side; the exact same DB (but empty/no data just empty tables) on the other side. We must insert the data table by table through sql commands.

    I watching YT video after another about SSIS / SSDT / VS projects but I'm getting more and more confused... 🙁 Any step-by-step guide available somewhere?

    Any chance of setting up "Replication" so that you don't need to worry about all this anymore?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The fastest data transferring channel is a truck loaded with DVD's

    :hehe:

    _____________
    Code for TallyGenerator

  • Yes, 1TB over WAN, is a really challenge, normally we need build a base offline, and do synchronise online after that.

  • if you transfer table by table, how are you going to be assured of a consistent view of the entire database? Are you going to leave the source table inactive for the duration?

  • To piggyback on consistency, are there dependencies that would make any particular batch of tables too large to fit within an acceptable window of time?

  • Sergiy (11/2/2016)


    The fastest data transferring channel is a truck loaded with DVD's

    :hehe:

    Or an overnight care package with a monster hard disk in it. 😀 Did that when I was working for Mediacom and they couldn't figure out how to transfer all the data in a single weekend of downtime. The old system was taken offline at COB on Friday in Ann Arbor, MI and the new system in New York, NY was up and running by Saturday night. Of course, consistency was there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/7/2016)


    Sergiy (11/2/2016)


    The fastest data transferring channel is a truck loaded with DVD's

    :hehe:

    Or an overnight care package with a monster hard disk in it. 😀 Did that when I was working for Mediacom and they couldn't figure out how to transfer all the data in a single weekend of downtime. The old system was taken offline at COB on Friday in Ann Arbor, MI and the new system in New York, NY was up and running by Saturday night. Of course, consistency was there.

    Monster hard drive is OK.

    What about the whole blade with RAID in it?

    When our server crashed Friday night (what other time would you pick?) the chance of procuring a new one did not look too promising.

    So, out sysadmin grabbed our QA server, jumped into airplane from Auckland to Sydney and by weekend wired it to the PROD infrastructure.

    The point of having proper "PROD copy" in QA has been proven.

    🙂

    _____________
    Code for TallyGenerator

  • Sergiy (11/7/2016)


    Jeff Moden (11/7/2016)


    Sergiy (11/2/2016)


    The fastest data transferring channel is a truck loaded with DVD's

    :hehe:

    Or an overnight care package with a monster hard disk in it. 😀 Did that when I was working for Mediacom and they couldn't figure out how to transfer all the data in a single weekend of downtime. The old system was taken offline at COB on Friday in Ann Arbor, MI and the new system in New York, NY was up and running by Saturday night. Of course, consistency was there.

    Monster hard drive is OK.

    What about the whole blade with RAID in it?

    When our server crashed Friday night (what other time would you pick?) the chance of procuring a new one did not look too promising.

    So, out sysadmin grabbed our QA server, jumped into airplane from Auckland to Sydney and by weekend wired it to the PROD infrastructure.

    The point of having proper "PROD copy" in QA has been proven.

    🙂

    Z'actly!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 31 through 43 (of 43 total)

You must be logged in to reply to this topic. Login to reply