4000 files to rename and move

  • Greetings all. I have an SSIS package that moves a few thousand files every few hours. The input and output paths are both variable and are supplied to a For-Each loop. The loop runs a script task that checks the input path, renames the file, moves the file to the new location and then deletes the file from the source, catching errors etc. All goes smoothly except . . .

    It goes 1 file at a time. These 4000 files take about an hour.

    I'm looking for a solution to allow multiple files to be moved concurrently. There is no opportunity for wildcards because of the variable paths. I have thought that I could have a file task for each step and hand it off. I don't know if that will help much. So, any suggestions?

  • Not sure whether this would be faster ...

    Maybe zip all the files and then move the zip file. Depending on the nature of the file names and the renaming, you could possibly do it via a single wildcard command, e.g.

    ren *.tmp *.tmp2

    after unzipping at the destination.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Another way that comes to my mind is to conditionally direct the control flow to multiple file tasks according to the name of the file.

    For example, create 10+ file tasks (i.e. rename file tasks), take the first character of the file name, and direct the control flow (using precedence constraints) to one of these tasks according to that character.

    Note that this requires the package's maximum concurrent executables property (MaxConcurrentExecutables) to be set to an appropriate value.

    --Ramesh
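
The split by first character described above can be sketched outside SSIS as a simple grouping step. This is only an illustration in Python, not the SSIS precedence-constraint setup itself; the function name `bucket_by_first_char` and the sample file names are hypothetical, and Windows-style paths are assumed.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def bucket_by_first_char(file_names):
    """Group file names by the upper-cased first character of the base name.

    Each resulting bucket could feed one of the parallel file tasks.
    """
    buckets = defaultdict(list)
    for name in file_names:
        base = PureWindowsPath(name).name  # strip any directory part
        key = base[0].upper() if base else ""
        buckets[key].append(name)
    return dict(buckets)
```

Note that buckets built this way are only as balanced as the file names are; if most names start with the same character, one task still does most of the work.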


  • Thanks for your replies.

    Zipping at the source is not an option.

    Breaking out the files into separate streams is probably the best option; it is a natural way to create more file tasks.

    I'm still looking for alternatives, maybe something besides SSIS?
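
One possible alternative outside SSIS is a small script that moves several files at once with a thread pool. The following is a minimal sketch in Python, assuming the (source, destination) pairs have already been worked out from the variable paths; `move_one`, `move_all`, and the `workers` default are hypothetical names and values, not part of the original package. Because the destination path carries the new file name, the rename and the move happen in one step.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def move_one(job):
    """Move a single file; return (source, error), with error None on success."""
    src, dst = job
    try:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))  # across volumes this copies, then deletes
        return (src, None)
    except OSError as exc:
        return (src, exc)


def move_all(jobs, workers=8):
    """Move all (source, destination) pairs concurrently; return the failures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(move_one, jobs))
    return [(src, err) for src, err in results if err is not None]
```

Whether this helps depends on where the time goes. If the network link is already saturated, more workers will add little.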

  • "I'm looking for a solution to allow multiple files to be moved concurrently"

    If you look at Task Manager, you will probably see that the process consumes very little CPU. The limitation is probably not CPU power but network capacity.

    I also move a lot of files over a company intranet. Because it took many hours, I experimented with VB6 (today I would use VB.NET) and ran a couple of parallel processes concurrently. That saved very little total time, because the limit was the network speed. The network has since been replaced by optical fiber and the total time is now OK.

    If your network is not homogeneous (it has different capacity to different locations), then it might be useful to set up parallel processes, one for each location.

    /Gosta

  • What is your source? A local server (network location)? An FTP site? A drive on your SQL Server?

    What is your destination? A drive on your SQL Server? A network location?

    What about your network? Is it pure TCP / IP? Are there other network protocols intermixed?

    Is the only problem the transfer time? Why is taking one hour such a problem? What are you doing with the files once you have moved them? Is that processing faster than the move?

    Just trying to understand your situation so I can see if I can help you.

    Thanks,

    Tim

  • I will look at a way of logging the time per file. Looking at the network is an excellent idea.

    The source for these files is numerous polling servers; these servers poll cash registers and download store data. The SSIS package then collects the files and moves them to a single location so that they can be processed into the warehouse.

    Another option I can try is to have each polling server push the files. That might make more sense.

    Thanks for your help, I have a couple of ideas to work with here.
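
Logging the time per file, as mentioned above, might look like the following minimal Python sketch. `timed_move` and the keys of the record it returns are hypothetical; in the SSIS package the same idea would live inside the script task.

```python
import shutil
import time
from pathlib import Path


def timed_move(src, dst):
    """Move one file and return its name, size in bytes, and elapsed seconds."""
    src, dst = Path(src), Path(dst)
    size = src.stat().st_size  # read the size before the source disappears
    start = time.perf_counter()
    shutil.move(str(src), str(dst))
    elapsed = time.perf_counter() - start
    return {"file": src.name, "bytes": size, "seconds": elapsed}
```

Comparing bytes per second across source servers should show whether a few slow links account for most of the hour.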
