processing a flat file without knowing the name

  • Simple question,

    The users drop a file in a specific folder on the network. I don't know the name. I'm just curious how others handle this. Don't need to be detailed.

    I'm using a for each file loop (even tho it's always one file) because its a simple way to enumerate the file info.

    The other option is a script that grabs the fileinfo and passes back to a var used on the file connection.

    Aside from that are there any other techniques used? Thx

     

  • Put *.* in the filespec of the Foreach loop and it will pick up whatever file is in the folder, regardless of name.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks Phil, yeah that's the default value when creating the container.

    I was asking outside of the FELC or Script Task is there a technique people use.

  • Can you tell us more about what you want to do?

    You could run a poweshell script to read the file names in the folder tons sql table and then run a script to bulk insert the recordst to a table

  • I'm using SSIS to load a file from the network. In this case I don't know the filename ahead of time and was just wondering what methods people use.

    I'm not stuck or anything I've actually completed the file load that I was working on, but was just thinking maybe there is another technique.

    My usual method is a C# script within the SSIS control flow where I check for the presence of the file then pass back the full path.

    Another method is the for each file loop. This is nice because you just give it a path and it enumerates the file names at run time as it loops through all files within a directory (and sub directories if desired) and combined with an expression on the file connection you can then iterate and load files using a single data flow.

    I'm assuming these are the 2 methods everyone uses.

    Powershell and BCP is a good thought. This file has quoted strings with embedded commas. Its my understanding that BCP doesn't handle quoted strings prior to SQL 2017.

  • One simple way is to use xpDirtree to read the file names in the source directory and then use BULK INSERT or OPENROWSET with the ACE drivers (if you're trying to import a spreadsheet) so that it can all be done in a relatively simple stored procedure.  I also handle double quoted true CSV files with a little prestidigitation of the delimiters in a BCP format file even before 2017 came out (been doing it since SQL Server 7.0 and it probably would have worked on 6.5).

    I also have been known to use xp_CmdShell to work some serious magic for things like file prep (especially on CSVs), file handling (find it, use it, move it to archive), some really simple but super effective error detection (and sometimes error correction) by row and column along with human readable/computer parsable error messages, and more but most people think it's a security risk and I don't spend much time demonstrating how it's not to people anymore because they're so afraid of it, they don't even believe what they're seeing.  I can even call PowerShell if I need to and I can use the SQL Agent to schedule the jobs (instead of using {gasp!} Windows Task Scheduler for PowerShell jobs, etc, etc.  It also allows me to pull data from some very remote places and, if internal, I can do it without having to give users privs on the source box where the data lives.

    The cool part about all of that is that I can do everything I need to do in relatively simple stored procedures and I don't have to worry about migrating packages during upgrades or going through any hell when the finally decide to change SSIS to something else like they did way back with DTS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    8<

    I also have been known to use xp_CmdShell to work some serious magic for things like file prep (especially on CSVs), file handling (find it, use it, move it to archive), some really simple but super effective error detection (and sometimes error correction) by row and column along with human readable/computer parsable error messages, and more but most people think it's a security risk and I don't spend much time demonstrating how it's not to people anymore because they're so afraid of it, they don't even believe what they're seeing.  I can even call PowerShell if I need to and I can use the SQL Agent to schedule the jobs (instead of using {gasp!} Windows Task Scheduler for PowerShell jobs, etc, etc.  It also allows me to pull data from some very remote places and, if internal, I can do it without having to give users privs on the source box where the data lives.

    >8

    Have you got some links to old examples from before you stopped demonstrating please? I also like to avoid Task Scheduler and wrap up the "odd" things up in one Agent job at all costs. So far a CmdExec job step has sufficed, but am interested to learn more about safe use of xp_CmdShell - I've always avoided it due to the dire warnings I saw everywhere when starting out.

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

  • david.edwards 76768 wrote:

    Jeff Moden wrote:

    8<

    I also have been known to use xp_CmdShell to work some serious magic for things like file prep (especially on CSVs), file handling (find it, use it, move it to archive), some really simple but super effective error detection (and sometimes error correction) by row and column along with human readable/computer parsable error messages, and more but most people think it's a security risk and I don't spend much time demonstrating how it's not to people anymore because they're so afraid of it, they don't even believe what they're seeing.  I can even call PowerShell if I need to and I can use the SQL Agent to schedule the jobs (instead of using {gasp!} Windows Task Scheduler for PowerShell jobs, etc, etc.  It also allows me to pull data from some very remote places and, if internal, I can do it without having to give users privs on the source box where the data lives.

    >8

    Have you got some links to old examples from before you stopped demonstrating please? I also like to avoid Task Scheduler and wrap up the "odd" things up in one Agent job at all costs. So far a CmdExec job step has sufficed, but am interested to learn more about safe use of xp_CmdShell - I've always avoided it due to the dire warnings I saw everywhere when starting out.

    As you see in the PowerPoint presentation in the attached ZIP file, I used to be the same way until I went to work for a company that used it in production code.  I advised them that it was a "Best Practice" to disable it and never use it.  Their reply was simple... "Prove It" and I set out to do just that.  Instead, I ended up proving (at least to myself) that disabling it and never using it is actually a "Worst Practice", IMHO.

    Just so you know, I don't write PowerPoint "presentations".  I write PowerPoint "books" so that folks have something to study after the presentation is long over.  This one was no exception.  It's full of goodies.

    Go through the presentation and if you come to the conclusion that "Disabling it and never using it" is the wrong answer to "Ancillary Question #2", then I can provide some answers to other questions (I've not kept track of any links to forum questions solved by a demonstration of its use) and, perhaps, a demonstration or two.  I've also included demo code in the zip file and instructions for the correct setup in the presentation.  Please understand that those setup instructions were pre-Windows 10 and the screen shots have changed substantially but the steps are basically the same.  We still use xp_CmdShell (alot!) at work using the same basic setup.

    If you don't come to such a conclusion, then it would be complete waste of both our time to continue with demonstrations.

     

    Attachments:
    You must be logged in to view attached files.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    As you see in the PowerPoint presentation in the attached ZIP file, I used to be the same way until I went to work for a company that used it in production code.  I advised them that it was a "Best Practice" to disable it and never use it.  Their reply was simple... "Prove It" and I set out to do just that.  Instead, I ended up proving (at least to myself) that disabling it and never using it is actually a "Worst Practice", IMHO.

    Just so you know, I don't write PowerPoint "presentations".  I write PowerPoint "books" so that folks have something to study after the presentation is long over.  This one was no exception.  It's full of goodies.

    Go through the presentation and if you come to the conclusion that "Disabling it and never using it" is the wrong answer to "Ancillary Question #2", then I can provide some answers to other questions (I've not kept track of any links to forum questions solved by a demonstration of its use) and, perhaps, a demonstration or two.  I've also included demo code in the zip file and instructions for the correct setup in the presentation.  Please understand that those setup instructions were pre-Windows 10 and the screen shots have changed substantially but the steps are basically the same.  We still use xp_CmdShell (alot!) at work using the same basic setup.

    If you don't come to such a conclusion, then it would be complete waste of both our time to continue with demonstrations.

    Thanks Jeff a thoroughly engaging and informative presentation. I wish that all powerpoints were that useful!

    It fell on particularly fertile ground because so often, recently, my heart sinks when I hear the words "Best Practice". I feel that the term has been significantly diluted over the years, and now often means a circular reinforcement of subjective opinion at best, and the killer "ace up the sleeve" used to win an argument at worst.

    I am very much of the opinion, now, that "Disabling it and never using it" is the wrong answer.

    "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it. When we enquire into any subject, the first thing we have to do is to know what books have treated of it. This leads us to look at catalogues, and at the backs of books in libraries."
    — Samuel Johnson
    I wonder, would the great Samuel Johnson have replaced that with "GIYF" now?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply