Unzip .gz compressed XML file using SSIS

  • Hi,

    I need to pull a file from an SFTP server and unzip it as its compressed in .GZ format. Is that something that can be done within SSIS? Sorry for the noob Q but this is my first SSIS package.

    Thanks,

    S

    --
    :hehe:

  • Yes, you would use the EXECUTE PROCESS Task to run gzip to uncompress the file.

  • Thanks Mr.Pettis

    --
    :hehe:

  • Slick84 (3/16/2010)


    Thanks Mr.Pettis

    Now how does that go?? Oh yes, Mr. Pettis is my dad, I'm Lynn. 😉

    Your welcome, and if you need more help as you work through the process, don't hesitate to ask.

  • Hmm great.

    So i've been reading SSIS does not support official SFTP file retrieval and that I have to use EXECUTE PROCESS task in SSIS to run a command line utility program such as WINSCP to grab the file from the SFTP server. In fact, you were also in that forum discussion.

    Is that still true? Asking under the assumption some patch might have fixed this issue as SSIS 2K5 is 5 years old now.

    Thanks,

    S

    --
    :hehe:

  • Slick84 (3/16/2010)


    Hmm great.

    So i've been reading SSIS does not support official SFTP file retrieval and that I have to use EXECUTE PROCESS task in SSIS to run a command line utility program such as WINSCP to grab the file from the SFTP server. In fact, you were also in that forum discussion.

    Is that still true? Asking under the assumption some patch might have fixed this issue as SSIS 2K5 is 5 years old now.

    Thanks,

    S

    Yes, still true. Hasn't changed in SQL Server 2008 (SSIS 2008) either as far as I can tell.

  • Still true..

    CEWII

  • Okay, so i'll look into the EXECUTE TASK process. Thanks again.

    Typically, is it preferred to receive data in XML format or flat file? I know this might be something that comes down to personal preference, but since XML has tags defined, etc, Im thinking it would be preferred over flat files.

    I ask this because I have an option of receiving a GZ zipped flat file or GZ zipped XML file.

    Thanks,

    S

    UPDATE:

    I've been reading online that to receive data in XML files, when the data will be stored/imported in a database, is useless and thus the flat file is far more efficient. Any thoughts?

    --
    :hehe:

  • Flat files would be smaller and faster to receive. It really comes down to preference. Not having had to work with XML files (yet) and having much experience working with flat files (prefer fixed column/length files over csv files), you can guess my preference.

  • Lynn Pettis (3/17/2010)


    Flat files would be smaller and faster to receive. It really comes down to preference. Not having had to work with XML files (yet) and having much experience working with flat files (prefer fixed column/length files over csv files), you can guess my preference.

    ... and mine. The XML file contains everything the flat file contains, plus a lot of fluff (all of the mark-up commands).

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Thanks guys for the responses. I just talked to the vendor and for this particular feed, they are only able to provide an XML file.

    The only reason I was trying to avoid this was because of the extra step involved to unzip it as I believe if it would have been a flat file, it wouldnt be zipped.

    Anyhow, good knowledge to know regardless. Thank you again.

    --
    :hehe:

  • Okay another question! hehe!

    Basically I have the SFTP profile setup on our server from where we are going to run the SSIS package and get the file from the SFTP server.

    So taking it one-step-at-a-time. The files on the server having a naming convention and is generated daily.

    To make it easier to visualize.. the folder on the SFTP server contains files up until 2 months ago for history & audit purposes and has the naming convention like so COMPANYNAME_Reviews_YYYYMMDD.xml.gz

    Now I want to grab only the latest file so basically today I would have grabbed COMPANYNAME_Reviews_20100317.xml.gz. When my SSIS package runs tomorrow I would want it to grab the 18th file and so on.

    Any ideas how I would tell my package to do this? It would be much easier to just specify a static name but since the dates are being stamped on the file name I'm sort of lost.

    Currently.. I have created a EXECUTE PROCESS task and under the PROCESS tab which has the "Executable" set as my WINSCP application (C:\Program Files\WinSCP\winscp.exe). Under the "Arguments" I have /console /script=C:\FolderSFTP\Vendor\ProductRatingFTPScript.txt which is a text file which runs the below code:

    #Option Batch - all prompts are automatically replied negatively

    option batch on

    #Option Conrfim - Toggles confirmations

    option confirm off

    #Open connect using this profile (must be setup on server executing script)

    open company@sftp.vendor.com

    #Change folder directory to Feeds

    cd /feeds

    #Opion Transfer Mode

    option transfer binary

    #Retrieve the file xml into this file

    get COMPANYNAME_Reviews_YYYYMMDD.xml.gz C:\FolderSFTP\Vendor\COMPANYNAME_Reviews_YYYYMMDD.xml.gz

    #Close session

    close

    #Close WinSCP

    exit

    Please notice where I retrieve XML file, I have to define the file name. But since this file name will change daily due to the dates changing on each days file, this process would be static than dynamic and I need it to run automatically basically. Can I use * (wildcard) to make this work?

    Hopefully my explanations make sense.

    UPDATE:

    Okay I just used the wild card and while it works, it pulls in all the files instead of just the latest one! Pretty obvious AFTER running it. LOL. Any advice?

    Thanks,

    S

    --
    :hehe:

  • look at date and string conversion functions. You can Getdate() convert it to a string in the format YYYYMMDD the concantenate it to the static part of the string to build the filename to retrive.

  • Hello Uripedes,

    Thanks for your response.

    I went a totally different route and used jscript to grab the latest file. I use a EXECUTE SQL task to run XP_CMDSHELL and have it execute the cscript which calls the jscript which in turn runs the WINSCP command line utility to grab the latest file only..

    Works great.

    Thanks,

    S

    --
    :hehe:

  • Glad ya got it... there's always more than one way to skin a... file sytem 😉

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply