How To Work with Files based on file name patterns

  • This is tuff to explain and not be verbose but here goes.

    PROBLEM: How to delete a small number of xml files from a folder after the xml files have been downloaded from the source and before they are imported into our DB.

    GOTCHA #1: The process we use to download an d import is proprietary and can not be altered so we can't simply not download the few files we don't want to import.

    GOTCHA #2: The few files we don't want to import can be found using something like the IN() predicate in T-SQL.

    A psedocode example would be something like this:

    FOR EACH sFile in FILES

    IF Left(sFile, (CharIndex('.',sFile,1))-1) IN (SELECT sPrefix FROM PREFIXLIST ) THEN

    DELETE sFile

    LOOP

    This psudecode mixes VB and T-SQL but I hope it properly conveys what it is I am trying to do. If its not possible to delete the file then simply moving it would be welcomed.

    The first X number of letters in the file names are like an object identifier and thats how the system knows what the data in the file is related to. I can filter out these unwanted files by matching the first X number of characters against a list I can have set up in a View.

    Anyone know if this is possible in SSIS and if so how? Even a link to an example would be most welcomed.

    Thanks

    Kindest Regards,

    Just say No to Facebook!
  • Have you looked at the for each object in SSIS? Perform a loop using the file name's pattern to find all the instances, and then do what's necessary to it (archive/delete/whatyoulike).

    However, if you simply have a standardized list of files that always have the same name (no _YYYYMMDD, for example), you could even do this as an item list, and for each of these objects perform your tasks. You'd most likely want to name them with the .txt/.csv in the table/list.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (2/14/2011)


    Have you looked at the for each object in SSIS? Perform a loop using the file name's pattern to find all the instances, and then do what's necessary to it (archive/delete/whatyoulike).

    However, if you simply have a standardized list of files that always have the same name (no _YYYYMMDD, for example), you could even do this as an item list, and for each of these objects perform your tasks. You'd most likely want to name them with the .txt/.csv in the table/list.

    I have actually used the For each Object in SSSI to loop thru files but never to find within those files, files that match based on a set to compare to. If I were looking for all files that begin with X or end with X that would be easy but I need to fin all files whose first X number of characters (where X can vary from between 2 and 8 characters in length) match any one of N possible values where N is a set of matches that can vary from a few to as many as 12.

    The files we would want to delete or at least move can and will vary in number and so the whole process has to be dynamic.

    Imagine you have a table containing a list of file prefixes and each of these varies from 2 characters to 8 characters in length. You want to go thru all files in a specified folder and get the prefix of the filename and compare that prefix to this list of prefixes and if you find a match then that file needs to be deleted or moved.

    A files prefix is the first N number of letters up until the first period. So in the file abcd.2011014.xml the prefix would abdc . If in my list I had abcd then I would want to delete or move that file so that it does not get imported.

    Make sense?

    Kindest Regards,

    Just say No to Facebook!
  • Makes perfect sense.

    I can't post images inline here so I'll try to explain this as descriptively as possible.

    The general idea here is loop the loop.

    First, get your table of prefixes into an ADO Recordset using a simple dataflow task. Build a package scoped variable as an Object (I'll call it rsPrefix from here on), another one named strPrefix as a string, and a third called strFile as a string. Next, setup your OLEDB connection to pull your data out of table you're storing your prefixes in. Then put it to the Recordset Destination component.

    Now that your object is loaded, create a foreach loop using Foreach ADO Enumerator in the Collection item. Set the ADO object source variable as User::rsPrefix. Switch to the "Variable Mappings" on the left, and set the first item to User::strPrefix with Index 0. Index is just the column #'s here.

    Now, inside this loop, build ANOTHER ForEach Loop, this one ForEach File enumeration. Set your folder appropriately, set files to something impossible, a.b.c.d.e.f.g for example, so just in case anything goes awry nothing gets damaged.

    You'll want to retrieve the file name as Name and Extension. In variable mappings, set User::strFile to Index 0. Click OK here. Now we have fun with expressions.

    In the Collection tab, click the plus to the left of expressions, then the ellipsis (...) on the right. In the pop up, select FileSpec and then use: @[User::strPrefix] & (DT_STR, 2,1252)"*" as your expression. This will take the prefix and append an asterisk to it for the filename.

    Now, you'll use strFile for anything inside the loop, such as a file system task of move or delete file. Let's go with move for the moment. Under Source you'll want to build an expression of your original directory & strFile, and under destination you'll want to build an expression of your target directory & strFile.

    I hope that helps. Let me know if anything was unclear.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • YSLGuru (2/14/2011)


    GOTCHA #1: The process we use to download an d import is proprietary and can not be altered so we can't simply not download the few files we don't want to import.

    Just making sure... are the "download" and "import" processes executed separately and are they individual processes where we can wedge some other process inbetween?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • IMO this might be simpler if implemented in a Script Task.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Jeff Moden (2/14/2011)


    YSLGuru (2/14/2011)


    GOTCHA #1: The process we use to download an d import is proprietary and can not be altered so we can't simply not download the few files we don't want to import.

    Just making sure... are the "download" and "import" processes executed separately and are they individual processes where we can wedge some other process in between?

    Jeff

    In short, yes.

    The system, a Windows Service, is proprietary and so I cannot change how it works but I can insert in between the Download Data step and the Import Data step a step that calls a stored procedure of my choice including a custom SP. The only limitation is in what can be passed and nothing can be returned so the SP would need to be able to do %90 of the desired task without any run time arguments passed to it.

    Kindest Regards,

    Just say No to Facebook!
  • Phil Parkin (2/15/2011)


    IMO this might be simpler if implemented in a Script Task.

    That's what I would do.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • YSLGuru (2/22/2011)


    Jeff Moden (2/14/2011)


    YSLGuru (2/14/2011)


    GOTCHA #1: The process we use to download an d import is proprietary and can not be altered so we can't simply not download the few files we don't want to import.

    Just making sure... are the "download" and "import" processes executed separately and are they individual processes where we can wedge some other process in between?

    Jeff

    In short, yes.

    The system, a Windows Service, is proprietary and so I cannot change how it works but I can insert in between the Download Data step and the Import Data step a step that calls a stored procedure of my choice including a custom SP. The only limitation is in what can be passed and nothing can be returned so the SP would need to be able to do %90 of the desired task without any run time arguments passed to it.

    Then you could use xp_Dirtree to read the file names (using the 3rd operand) into a table and convert the ones you want to delete to viable DOS delete commands via xp_CmdShell or via (possibly) a script task depending on what you've allowed in your surface area configuration. To be sure, if you need it, there are ways to run DOS commands without actually using xp_CmdShell. If you don't want to use something like xp_Dirtree, you could use something like...

    DECLARE @ShellRefNum INT

    EXEC sp_oaCreate 'wScript.Shell', @ShellRefNum OUT

    EXEC sp_oaMethod @ShellRefNum, 'Run', NULL, 'CMD /c "DIR C:\ /s /a-d /b" > c:\Temp\WScriptTest.txt'

    EXEC sp_oaDestroy @ShellRefNum

    ... and then do a BULK INSERT from the c:\Temp\WScriptTest.txt file.

    If you can use it, the xp_DirTree command would look like this...

    EXEC Master.dbo.xp_DirTree 'C:\',1,1

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Parkin (2/15/2011)


    IMO this might be simpler if implemented in a Script Task.

    For one with SSIS knowledge/experience this probably is easier to do in SSIS. However if you haven't trained in SSIS or worse, used somehing else before like DTS or some other ETL tool, trying to use SSIS for even the simplest of task feels very much like trying to create a Rube Goldberg device to peform a task easier done without a device.

    I posted here only because it looked like I had no choice but to look at SSIS for this. If you have a link to an example Script Task for something like this, working with files I'd be most thankful. U have worked wih the FSO scripting ibject before in VB so I'm familiar with the Drives,Folders & Files techniques, I just am literally fearful of SSIS.

    Thanks again

    Kindest Regards,

    Just say No to Facebook!
  • Jeff

    Don't laugh but I'm not that familiar with Command line syntax and so I never thought about trying something like xp_DirTree. Do you know of any examples using xp_DirTree to work with files from within T-SQL? I'm not famialir with command line operations and so the example you gave is not as clear as an example T_SQL Only piece of code.

    Thanks

    Kindest Regards,

    Just say No to Facebook!
  • YSLGuru (3/15/2011)


    Jeff

    Don't laugh but I'm not that familiar with Command line syntax and so I never thought about trying something like xp_DirTree. Do you know of any examples using xp_DirTree to work with files from within T-SQL? I'm not famialir with command line operations and so the example you gave is not as clear as an example T_SQL Only piece of code.

    Thanks

    Let's define the problem a bit more specifically. Post the pattern of the file names you want to work with. Also post the "share" (usually in the form of a UNC like the following... \\machinename\path\subpath\filename.ext).

    If you were to also include a description of the fields in the file (usually by a record layout) and the create table statement for the staging table you want the files to load into, I can show you some "stuff". 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • YSLGuru (3/15/2011)


    Phil Parkin (2/15/2011)


    IMO this might be simpler if implemented in a Script Task.

    For one with SSIS knowledge/experience this probably is easier to do in SSIS. However if you haven't trained in SSIS or worse, used somehing else before like DTS or some other ETL tool, trying to use SSIS for even the simplest of task feels very much like trying to create a Rube Goldberg device to peform a task easier done without a device.

    I posted here only because it looked like I had no choice but to look at SSIS for this. If you have a link to an example Script Task for something like this, working with files I'd be most thankful. U have worked wih the FSO scripting ibject before in VB so I'm familiar with the Drives,Folders & Files techniques, I just am literally fearful of SSIS.

    Thanks again

    No need to use FSO now that you've got the power of .NET available to you.

    Here is an example of a script task which deletes all filenames ending in a certain letter ('y' in my example) from a certain folder.

    So

    freddy.txt gets torched

    freddie.txt is safe.

    public void Main()

    {

    // Delete only those files in a specified folder which end in a specified letter.

    string folder = "C:\\Temp\\SSISFileToDelete";

    string letter = "Y";

    string[] fileList = Directory.GetFiles(folder, "*" + letter + ".*");

    foreach (string f in fileList)

    {

    File.Delete(f);

    }

    Dts.TaskResult = (int)ScriptResults.Success;

    }

    Pretty straightforward, once you know how.

    One other thing you'll need to do is add a line to the default usings list:

    using System.IO;

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Heh... and they say xp_CmdShell is a risk. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply