October 10, 2017 at 5:46 am
Hello SQL Guys,
I am working on a project where files arrive daily in a file location, with names like marketing_input_1.txt, marketing_input_2.txt, etc. and sales_input_1.txt, sales_input_2.txt, etc. The number of files can vary and is reflected in the number in the file name. Everything else is the same across files: the number of columns, the column order and the column datatypes. All of these files have to be loaded into a SQL Server table at one particular time. The table contains two additional columns: createdatetime and version. The version works like this: the first data load run on a given day gets version 1, and a second run on the same day gets version 2 for all the files.
Kindly let me know what would be an ideal and effective approach to implement this.
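A minimal sketch of the version rule described above (first run of the day is 1, second run is 2, and so on). It uses Python's built-in sqlite3 only as a stand-in so the example runs anywhere; the table name marketing_input and the payload column are hypothetical, and on SQL Server the date comparison would more likely be written with CAST(createdatetime AS date).

```python
import sqlite3
from datetime import date, datetime


def next_version(conn, load_day):
    """Return 1 for the first load of load_day, 2 for the second, and so on."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 "
        "FROM marketing_input WHERE date(createdatetime) = ?",
        (load_day.isoformat(),),
    ).fetchone()
    return row[0]


def demo():
    # In-memory stand-in for the real target table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE marketing_input "
        "(payload TEXT, createdatetime TEXT, version INTEGER)"
    )
    day = date(2017, 10, 10)
    versions = []
    for run in range(2):
        # Work out the version once per run, then stamp every row with it.
        version = next_version(conn, day)
        stamp = datetime(2017, 10, 10, 6 + run, 0, 0).isoformat(sep=" ")
        conn.executemany(
            "INSERT INTO marketing_input VALUES (?, ?, ?)",
            [("row from file 1", stamp, version),
             ("row from file 2", stamp, version)],
        )
        versions.append(version)
    # A new day has no rows yet, so its first load starts again at 1.
    version_next_day = next_version(conn, date(2017, 10, 11))
    conn.close()
    return versions, version_next_day
```

The point of the design is that the version is computed once at the start of a run and reused for every file in that run, so all files loaded together share the same number.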
October 10, 2017 at 6:08 am
Thom A - Tuesday, October 10, 2017 6:01 AM: Use SSIS and a For Each Loop on the directory where you are storing your files.
I second this, but should add that you will need to move/archive the files after they have been processed, to avoid processing them more than once.
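Outside SSIS, the same loop-then-archive pattern can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name process_and_archive, the load_file callback and the "*_input_*.txt" pattern are hypothetical, and the actual load step (SSIS, BULK INSERT or anything else) is whatever the callback does.

```python
import shutil
from pathlib import Path


def process_and_archive(input_dir, archive_dir, load_file, pattern="*_input_*.txt"):
    """Load every matching file, then move it to the archive folder.

    load_file is a caller-supplied function that performs the actual load.
    Returns the names of the files that were processed, in sorted order.
    """
    input_dir = Path(input_dir)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    processed = []
    # sorted() takes a snapshot of the matches, so moving files is safe.
    for path in sorted(input_dir.glob(pattern)):
        load_file(path)
        # Archive only after a successful load; an exception in load_file
        # leaves the file in place so it can be retried.
        shutil.move(str(path), str(archive_dir / path.name))
        processed.append(path.name)
    return processed
```

Because each file is moved only after its load succeeds, running the function a second time finds nothing left to process, which is the protection against double loading mentioned above.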
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 10, 2017 at 6:54 am
Is SSIS as efficient as bulk load commands? I am asking because performance will be a big test here. There will be about 20-30 files, varying in size from 1 million to 6 million records.
October 10, 2017 at 7:04 am
sqlenthu 89358 - Tuesday, October 10, 2017 6:54 AM: Is SSIS as efficient as bulk load commands? I am asking as performance will be a big test here. There will be almost 20-30 files with size varying from 1 million to 6 million records.
Not quite, because there is an overhead in terms of startup/closedown and logging. I'd suggest that you test one against the other – those volumes should be no trouble for SSIS. (Of course, if your target tables have lots of constraints and indexes, that will slow things down regardless of your import method.)
October 10, 2017 at 7:56 am
The major problem you have is identifying the file to load. Can you write an extended stored procedure to loop through the directory and find the name of the file to be loaded? The procedure would return the file name, which you can then use in the BULK INSERT process.
In fact, you can write a SQL script to access the file system using dbo.sp_OAMethod, but I would say an extended SP would be better.
October 10, 2017 at 8:00 am
Avi1 - Tuesday, October 10, 2017 7:56 AM: Major problem you have is to identify the file to load, can you write extended stored procedure to loop through and find out the file name to be loaded. The function will return the filename which you can use in the bulkinsert process.
In fact you can write SQL script to access file system using dbo.sp_OAMethod, but I would say extended SP will be better
There is an SP created by Jeff Moden (see the link below) that returns a dataset of all the files in a directory; it may be helpful.
https://www.sqlservercentral.com/Forums/Attachment1801.aspx
October 10, 2017 at 8:01 am
Avi1 - Tuesday, October 10, 2017 7:56 AM: Major problem you have is to identify the file to load, can you write extended stored procedure to loop through and find out the file name to be loaded. The function will return the filename which you can use in the bulkinsert process.
In fact you can write SQL script to access file system using dbo.sp_OAMethod, but I would say extended SP will be better
No problem in SSIS.
October 10, 2017 at 8:10 am
Phil Parkin - Tuesday, October 10, 2017 8:01 AM:
Avi1 - Tuesday, October 10, 2017 7:56 AM: Major problem you have is to identify the file to load, can you write extended stored procedure to loop through and find out the file name to be loaded. The function will return the filename which you can use in the bulkinsert process.
In fact you can write SQL script to access file system using dbo.sp_OAMethod, but I would say extended SP will be better
No problem in SSIS.
You are right. I did not mean that there is a problem finding the latest file name in SSIS. I meant that the main issue for his data load process is finding the latest file name; once it is identified, the data load can be done using any load process.
October 10, 2017 at 8:15 am
Avi1 - Tuesday, October 10, 2017 8:10 AM:
Phil Parkin - Tuesday, October 10, 2017 8:01 AM: No problem in SSIS.
You are right. I did not mean, there is a problem in SSIS to find latest filename. I mean for his data load process main issue is to find the latest file name, once it is identified data load can be done using any load process.
Possibly, but this is not a problem if (as I suggested in my first post above) files are archived after they have been processed – which is industry best practice.
October 10, 2017 at 8:32 am
Mm, like Phil says, looping through files in a directory in SSIS is a trivial task. Not only that, but archiving them is just as simple. The first uses a Foreach Loop Container, and the second is just a File System Task.
The OP didn't imply there was a "latest" file either, just that there are multiple. It could well be that the day's file overwrites any existing file there as well (so unprocessed files would be lost, which in a cumulative environment is not an issue).
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
October 10, 2017 at 8:37 am
Apologies for not clarifying earlier, but there will only be the latest files in that directory, i.e. the ones which need to be processed.
October 11, 2017 at 5:22 am
I wouldn't think that looping through the files or archiving them would be a big deal in either SSIS or SQL. From what's been posted by people who know SSIS like Phil and Thom, it should be almost trivial. If you're going to have multiple files containing from 1M to 6M rows each, I'd think log bloat will be your biggest problem. I'd start reading up on minimal logging. Here's a link to a guide on it: https://msdn.microsoft.com/library/dd425070.aspx
Phil, can you achieve minimal logging with SSIS?
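For the plain T-SQL route, a rough sketch of generating one BULK INSERT statement per file is shown here. The helper name bulk_insert_statement, the tab and newline terminators and the batch size are assumptions made for illustration. TABLOCK is included because a table lock is one of the usual prerequisites for minimal logging, alongside a simple or bulk-logged recovery model; the linked guide covers the full set of conditions.

```python
def bulk_insert_statement(table, file_path, batch_size=100000):
    """Build a T-SQL BULK INSERT statement for one input file.

    table must come from trusted configuration, not user input, because it
    is placed into the statement as-is. Single quotes in the path are
    doubled so the path stays a valid T-SQL string literal.
    """
    safe_path = file_path.replace("'", "''")
    return (
        f"BULK INSERT {table} "
        f"FROM '{safe_path}' "
        # Assumed file layout: tab-delimited columns, one row per line.
        f"WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n', "
        f"TABLOCK, BATCHSIZE = {batch_size});"
    )
```

The statement text would then be executed through whatever client is already in use (sqlcmd, an Agent job step, a driver), once per file found in the directory.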
October 11, 2017 at 5:52 am
Ed Wagner - Wednesday, October 11, 2017 5:22 AM: I wouldn't think that looping through the files or archiving them would be a big deal in either SSIS or SQL. From what's been posted by people who know SSIS like Phil and Thom, it should be almost trivial. If you're going to have multiple files containing from 1M to 6M rows each, I'd think log bloat will be your biggest problem. I'd start reading up on minimal logging. Here's a link to a guide on it: https://msdn.microsoft.com/library/dd425070.aspx
Phil, can you achieve minimal logging with SSIS?
When using the OLE DB Destination, you have the option of using "Fast Load". When you do this, you can use configurable options, such as the batch size. As highlighted in the article, you can add ROWS_PER_BATCH=1000 to the FastLoadOptions. This would aid in keeping the logs minimal in a Simple Recovery Model database, as the transactions are much smaller.
Thom~
October 11, 2017 at 5:56 am
Ed Wagner - Wednesday, October 11, 2017 5:22 AM: Phil, can you achieve minimal logging with SSIS?
Yes you can. Some tweaks are required to the properties of the 'destination' component to achieve it, but there are resources online which describe how to do that (here is one).