Bulk load architecture

  • Hello SQL Guys,
    I am working on a project where a file location will receive daily files like marketing_input_1.txt, marketing_input_2.txt, etc. and sales_input_1.txt, sales_input_2.txt, etc. The number of files can vary, and that is what the number in the name indicates. Everything else is the same across files: the number of columns, the sequence of columns, and the datatypes of the columns. These all have to be loaded into a SQL Server table at one particular time. The table contains two additional columns: first createdatetime and second the version. If we run the data load for the first time in a day, the version should be 1; if we run it a second time that day, the version for all the files should be 2.
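
    To make the version rule concrete, here is a minimal T-SQL sketch of how the next version for a day could be derived (the table name dbo.DailyLoad is an assumption; only createdatetime and version come from the requirement):

      DECLARE @NextVersion int;

      -- Next version for today = highest version already loaded today + 1 (1 if none)
      SELECT @NextVersion = COALESCE(MAX([version]), 0) + 1
      FROM dbo.DailyLoad                                   -- assumed table name
      WHERE CAST(createdatetime AS date) = CAST(SYSDATETIME() AS date);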

    Kindly let me know what the ideal and effective approach to implement this would be.

  • Use SSIS and a For Each Loop on the directory where you are storing your files.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Tuesday, October 10, 2017 6:01 AM

    Use SSIS and a For Each Loop on the directory where you are storing your files.

    I second this, but should add that you will need to move/archive the files after they have been processed, to avoid processing them more than once.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Is SSIS as efficient as bulk load commands? I am asking because performance will be a big test here. There will be around 20-30 files, varying in size from 1 million to 6 million records.

  • sqlenthu 89358 - Tuesday, October 10, 2017 6:54 AM

    Is SSIS as efficient as bulk load commands? I am asking because performance will be a big test here. There will be around 20-30 files, varying in size from 1 million to 6 million records.

    Not quite, because there is an overhead in terms of startup/closedown and logging. I'd suggest that you test one against the other – those volumes should be no trouble for SSIS. (Of course, if your target tables have lots of constraints and indexes, that will slow things down regardless of your import method.)

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • The major problem you have is identifying the file to load. You could write an extended stored procedure to loop through the directory and find the file names to be loaded; it would return the filename, which you can then use in the BULK INSERT process.
    In fact, you can write a SQL script that accesses the file system using sp_OAMethod, but I would say an extended SP would be better.
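
    For illustration, here is a hedged T-SQL sketch of that idea, using the undocumented xp_DirTree in place of an extended SP or sp_OAMethod (the share path, staging table name, and delimiters are all assumptions):

      -- Capture the directory listing (third argument 1 = include files)
      DECLARE @Files TABLE (subdirectory nvarchar(260), depth int, [file] int);
      INSERT INTO @Files
      EXEC master.sys.xp_DirTree N'\\server\share\incoming', 1, 1;  -- assumed path

      DECLARE @FileName nvarchar(260), @Sql nvarchar(max);
      DECLARE file_cursor CURSOR LOCAL FAST_FORWARD FOR
          SELECT subdirectory FROM @Files
          WHERE [file] = 1 AND subdirectory LIKE N'%[_]input[_]%.txt';

      OPEN file_cursor;
      FETCH NEXT FROM file_cursor INTO @FileName;
      WHILE @@FETCH_STATUS = 0
      BEGIN
          -- Load each file into an assumed staging table
          SET @Sql = N'BULK INSERT dbo.Staging_Input FROM ''\\server\share\incoming\'
                   + @FileName
                   + N''' WITH (FIELDTERMINATOR = ''|'', ROWTERMINATOR = ''\n'', TABLOCK);';
          EXEC sys.sp_executesql @Sql;
          FETCH NEXT FROM file_cursor INTO @FileName;
      END
      CLOSE file_cursor;
      DEALLOCATE file_cursor;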

  • Avi1 - Tuesday, October 10, 2017 7:56 AM

    The major problem you have is identifying the file to load. You could write an extended stored procedure to loop through the directory and find the file names to be loaded; it would return the filename, which you can then use in the BULK INSERT process.
    In fact, you can write a SQL script that accesses the file system using sp_OAMethod, but I would say an extended SP would be better.

    There is an SP created by Jeff Moden (see the link below) which returns a dataset of all the files in a directory; it may be helpful.
    https://www.sqlservercentral.com/Forums/Attachment1801.aspx

  • Avi1 - Tuesday, October 10, 2017 7:56 AM

    The major problem you have is identifying the file to load. You could write an extended stored procedure to loop through the directory and find the file names to be loaded; it would return the filename, which you can then use in the BULK INSERT process.
    In fact, you can write a SQL script that accesses the file system using sp_OAMethod, but I would say an extended SP would be better.

    No problem in SSIS.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin - Tuesday, October 10, 2017 8:01 AM

    Avi1 - Tuesday, October 10, 2017 7:56 AM

    The major problem you have is identifying the file to load. You could write an extended stored procedure to loop through the directory and find the file names to be loaded; it would return the filename, which you can then use in the BULK INSERT process.
    In fact, you can write a SQL script that accesses the file system using sp_OAMethod, but I would say an extended SP would be better.

    No problem in SSIS.

    You are right. I did not mean that there is a problem in SSIS finding the latest filename. I meant that the main issue in his data load process is finding the latest file name; once it is identified, the data load can be done using any load process.

  • Avi1 - Tuesday, October 10, 2017 8:10 AM

    Phil Parkin - Tuesday, October 10, 2017 8:01 AM

    No problem in SSIS.

    You are right. I did not mean that there is a problem in SSIS finding the latest filename. I meant that the main issue in his data load process is finding the latest file name; once it is identified, the data load can be done using any load process.

    Possibly, but this is not a problem if (as I suggested in my first post above) files are archived after they have been processed – which is industry best practice.
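
    For anyone doing this outside SSIS, a minimal sketch of that archive step from T-SQL (the paths are assumptions, and xp_cmdshell must be enabled on the instance):

      -- Move a processed file into an archive folder (assumed paths)
      DECLARE @Cmd varchar(1000) =
          'MOVE /Y "\\server\share\incoming\marketing_input_1.txt" "\\server\share\archive\"';
      EXEC master.sys.xp_cmdshell @Cmd, NO_OUTPUT;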

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Mm, like Phil says, looping through files in a directory in SSIS is a trivial task. Not only that, but archiving them afterwards is just as simple. One uses a For Each Loop Container, and the other is just a File System Task.

    The OP didn't imply there was a "latest" file either, just that there are multiple. It could well be that the day's file overwrites any existing file there as well (so unprocessed files would be lost, which in a cumulative environment is not an issue).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Apologies for not clarifying earlier, but only the latest files will be in that directory, i.e. the ones which need to be processed.

  • I wouldn't think that looping through the files or archiving them would be a big deal in either SSIS or SQL.  From what's been posted by people who know SSIS like Phil and Thom, it should be almost trivial.  If you're going to have multiple files containing from 1M to 6M rows each, I'd think log bloat will be your biggest problem.  I'd start reading up on minimal logging.  Here's a link to a guide on it: https://msdn.microsoft.com/library/dd425070.aspx

    Phil, can you achieve minimal logging with SSIS?
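
    For reference, a minimally logged load along the lines of that guide might look like this in T-SQL (a sketch, assuming a heap target and the SIMPLE or BULK_LOGGED recovery model; the table name, path, and delimiters are assumptions):

      -- TABLOCK is what qualifies the load for minimal logging on a heap
      BULK INSERT dbo.Staging_Input
      FROM '\\server\share\incoming\sales_input_1.txt'      -- assumed path
      WITH (TABLOCK, FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', BATCHSIZE = 1000000);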

  • Ed Wagner - Wednesday, October 11, 2017 5:22 AM

    I wouldn't think that looping through the files or archiving them would be a big deal in either SSIS or SQL.  From what's been posted by people who know SSIS like Phil and Thom, it should be almost trivial.  If you're going to have multiple files containing from 1M to 6M rows each, I'd think log bloat will be your biggest problem.  I'd start reading up on minimal logging.  Here's a link to a guide on it: https://msdn.microsoft.com/library/dd425070.aspx

    Phil, can you achieve minimal logging with SSIS?

    When using the OLE DB Destination, you have the option of using "Fast Load". When you do, you get configurable options, such as the batch size. As highlighted in the article, you can add ROWS_PER_BATCH = 1000 to the FastLoadOptions. This helps keep the log small in a Simple Recovery Model database, as the transactions are much smaller.
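
    For example, the destination's FastLoadOptions property could be set along these lines (a sketch; the exact option list depends on your table and indexes):

      TABLOCK,ROWS_PER_BATCH = 1000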

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Ed Wagner - Wednesday, October 11, 2017 5:22 AM

    Phil, can you achieve minimal logging with SSIS?

    Yes, you can. Some tweaks are required to the properties of the 'destination' component to achieve it, but there are resources online which describe how to do that (here is one).

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel
