Attacking SSIS at last...ummm?...

  • I have a little time to prep for a new project, and I believe that SSIS may be the right choice for it. I say that after having completed the Microsoft: SSIS Tutorial: Creating a Simple ETL Package (http://msdn.microsoft.com/en-us/library/ms169917.aspx). I must admit that I finally understand what all the fuss is about, but I have questions.

    If I were to create a simple import process for 1 format, I believe I can handle that with only mild irritation. My issue is trying to figure out how to account for 11 different formats. My initial thoughts are to use the following Control Flow items:

    File Watcher Task - Monitors the FTP folder looking for when the client uploads a file. I still have no real idea how this object works, so I thought I would try to learn this.

    File System Task - I have 2 of these tasks; one to copy to a staging folder (not sure if I need this), and one to either copy or move to an archive folder.

    Script Task - I need to remove the bottom row of text on most (not all) of the files. This is a check sum row which always starts with the same text. I am using the files copied to staging to remove this text, and those will be the actual upload files. The original files will be moved (with the check sum line in place) to the archive folder to preserve the original data form.

    I also believe this is where I am supposed to review the different format versions (i.e. - Format 1 has CustID, TransDate, & Amount; Format 2 has CustID, Name, TransDate, & Amount, etc.) and select which path the control flow should follow. The data flow for each of these would be different so I would be able to place each of those on their distinct table.

    Foreach Loop Container - I know I need this, but I have no idea what it should encapsulate.

    Data Flow Task - I can set the file name in the directory using what I learned in the tutorial. Can I use multiple Data Flow Task items on the same Control Flow to push different data sets to different tables?

    Send Mail Task - I have not looked at this yet, so I am not sure if I will have issues with it. I saw it, and thought it looked like something I should learn.

    This is a practice run, so it is not a truly vital issue. That being said, I could use all the help I can get. Is there a way of leaving the data in the original folder, and testing if the data has already been loaded to see if I should load the file again? Is there a simple means of getting the file name, and parsing the date from the name itself (yes, the date is part of the name already)?

    I guess the best thing to ask is now that I've completed the first tutorial what should I focus on doing first? I need to learn this quickly, but I also want to learn it the right / best way I can.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • Jarid,

    SSIS is a wide and varied topic. For examples of Script Task (a lot), ForEach Loop Task, Sequence Container, Data-Flow Task, WMI Reader Task, and I'm sure others I can't think of right now you might look at: http://nclsqlinv.codeplex.com/[/url], its a package I wrote for inventorying SQL environments and uses a lot of different tasks and tech to get under the hood. There is a lot to see there.

    You have listed a few tasks that I want to address specifically.

    The File Watcher task basically watches for a filem but that means the package has to be running while it is watching. I'm not a big fan of having packages sitting out there watching for something to happen. YMMV.

    The File System task is one I rarely use, I generally opt for a Script Task, most of the uses for a File System task are moving files. I can easily accomplish this in .Net and I feel I have more control. Again, YMMV.

    Send Mail Task, this one I generally like but it has some limitations, such as it does not do any authentication other than trusted authentication, so if your mail host requires authentication you'll have to revert to using a Script Task.

    CEWII

  • To your specific points.

    File Watcher: I've never had much luck with this and usually just iterate the job to repeat every 5 minutes or so, with a quit successfully if the file is not found.

    Alternatively, you could do something like the following code stolen off one of my projects. It checks until the configured variable's time (6 AM in this case) looking for a file to come in. Once it does it processes. If it doesn't, it fails the job so I know we're out of date.

    Public Sub Main()

    '

    ' Add your code here

    '

    Dim result As String

    Dim TimeToFail As String

    TimeToFail = CStr(Dts.Variables("TimeToFailRepeatedChecks").Value)

    Do While Format(Now(), "HH:mm:ss") < TimeToFail

    result = FileSystem.Dir(CStr(Dts.Variables("FlatFilePathName").Value) & "StaticFilePattern_*.*")

    If result = "" Then

    System.Threading.Thread.Sleep(20000)

    Else

    Dts.TaskResult = Dts.Results.Success

    Exit Sub

    End If

    Loop

    ' If file never found.

    Dts.TaskResult = Dts.Results.Failure

    End Sub

    File System Tasks: Try it, but the script equivalent is more robust. I use it rarely.

    I hate removing Checksum rows. It's a mainframe holdover that should be shot, or at the least moved to a different file. I don't know of a better approach then dealing with it as a stream via Script before trying to feed it into a data source.

    You will need a different connection built for each format, but since you're looking at different data flows I don't think you're going to run into an issue there. However, how are you going to detect which format this should be using? Is there a naming pattern of some kind you can identify it with?

    What you'll want once you do know is to use conditional precedents to help control them. Drag a successful between two objects then double-click it and you'll see what options are available.

    The forEach loop is to just continuously loop through, for example, all the files in a directory one at a time. A little blog research will get you a lot of walkthroughs on that.

    The Control flow can hold as many dataflows as you like. You'll probably want to make sure you're not running more than one at a time however due to the complexity of multiple formats you're going to be dealing with.

    I'd avoid Send Mail. It's crotchety in my opinion. Work with sp_dbsendmail (or whatever it is) in the main server as a better option.

    Is there a way of leaving the data in the original folder, and testing if the data has already been loaded to see if I should load the file again?

    Don't archive it until you're done? Testing data is going to be a bit difficult but I usually always leave my archive component as the final piece, so that if anything broke a rerun automatically picks up the problem file once I correct whatever was misbehaving.

    Is there a simple means of getting the file name, and parsing the date from the name itself (yes, the date is part of the name already)?

    ForEach will easily let you get the filenames to a variable if you tell it what directory to look in, since it allows wildcarding. Once it's in a variable, it's easy to manipulate in a script component.

    I guess the best thing to ask is now that I've completed the first tutorial what should I focus on doing first? I need to learn this quickly, but I also want to learn it the right / best way I can.

    Lower your complexity on multiple data flows for now. Create a simple 10 row file, copy/name it 10 different dates, shove it into the same folder, and learn to process them iteratively into where you want them and then get them archived. Then, include another format type and learn to do conditional controls and keep them straight via detection and variable settings for the conditional precedents.

    That's my recommendation on where to start.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (2/23/2012)


    To your specific points.

    File System Tasks: Try it, but the script equivalent is more robust. I use it rarely.

    Second that one. The File System Task is sometimes counter-intuitive. (you have to use rename to move a file to a different folder if you are using expressions. Instead of move. Makes sense? Not really...)

    In .NET it only takes 2 lines of very simple code.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (2/24/2012)


    Evil Kraig F (2/23/2012)


    To your specific points.

    File System Tasks: Try it, but the script equivalent is more robust. I use it rarely.

    Second that one. The File System Task is sometimes counter-intuitive. (you have to use rename to move a file to a different folder if you are using expressions. Instead of move. Makes sense? Not really...)

    In .NET it only takes 2 lines of very simple code.

    I personally never use the file system task, I alway go for the script task becuse I am in control more.

    CEWII

  • Thank you all for your responses. You've given me a great deal to think about, and a greater focus on my research for this weekend.

    I want to make sure that I understand the recommendations you're making:

    FileWatcher - Ignore this one for the most part. Set a program which looks into the folder every so often until a set time. Once the file is there then call the dtsx package (BTW, I have only ever been able to do this via a command line call. It was not very pretty, and I REALLY want to believe there is a better way to do this). If the final test time passes, and the file still is not there then set a WatcherFlag = Fail. I like that plan since I already know how to do most of that in VB.

    FileTask - Consider this a plague, and avoid it as such. Use a script component, and do whatever is needed in VB. Again, I like that...especially with the comments about needing to use rename instead of move. Again, this is something I can already do in VB.

    SendMailTask - Hit or miss seems to be the best thing that was said about this. I have not played with the sp_sned_dbmail before, but from what I see this seems like a much better option. I'll play and should have that up and running today.

    Multiple Formats - I have a few questions on this one still. From what I can tell (both from the prior comments as well as my own research) if there are 4 different file types - each of which have their own format - and the file name can note the file type then I can set a Foreach Loop looking for the first file type, process all of those, end that Foreach Loop, move to the next format type with a new Foreach Loop, etc.

    What happens when I have 4 different formats within the same naming convention? Do I need to detect that with an external process before the dtsx is called, modify the file name to account for the specific format type (i.e. - AD_1, AD_2, AD_3, & AD_4), and then use the earlier mentioned technique? Is there a way that I can run this detection in the dtsx using a script component? If so, how do I know which connection manager to call to connect to the file? Or am I not understanding the connection manager as well as I think I am.

    My only thought that I can come up with on how to do all of this in the dtsx package is to pass the file name as a variable in the Foreach Loop, detect the format in a script task using If InStr(StrRead,StrTestFor) Then Path = Call matching dtsx package. If I am understanding SSIS correctly then that should not require the connection manager until after I know which format I'm dealing with. Doing it that way I would also be able to run just 1 Foreach Loop, and deal with n formats.

    For the time being, I'm going to run the make 10 different files with 10 different dates, etc. recommendation. I am fairly sure I will be at least at a working level of understanding by Monday, and what ever else I need I'll learn during the project.

    To that end, my project... I am heading over to a partner company's site. They manage a number of our accounts, but they do this for more than just our company. We need a daily feed of the account statuses, so the option that we've come up with is to have them send a backup of their database each day. They do not want to send the whole database since that has other company's data sets included. I am to create a procedure that will isolate only our records, back up the database, and send the back up to my company via an SFTP connection. Once on our end I will restore the database (replacing the previous day's database), and run a process to update our permanent tables (which I believe will end up being a separate database).

    I know this is a somewhat convoluted means of doing the data transfer, but both companies are under the gun to get this project up YESTERDAY. And neither I nor my counterpart's skill level in SSIS is very high, and we don't have much time to devote to this project exclusively. As such we're just field dressing the wound, and we'll get it into surgery later.

    Given my goals, does my approach seem viable?

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • For simplicity of purpose - on your foreach question, I'd frankly create a foreach loop specific to each file type (since each file type will tend to have a slightly different way to read files, parse the contents), etc.....

    Assuming that once you have brought the files in the processing then becomes the same, then the steps INSIDE the foreach loop would be essentially:

    import the file (custom by file type)

    Execute a package (which uses the imported stuff to run a standardized process)

    I've usually found it cleaner to set up multiple loops than to make a single loop with a lots of "ifs" involved.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/24/2012)


    Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    Wood, Aluminum or Composite?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (2/24/2012)


    Jeff Moden (2/24/2012)


    Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    Wood, Aluminum or Composite?

    I think wood would be most satisfying, the clang when using aluminum is a downer..

    CEWII

  • Elliott Whitlow (2/24/2012)


    SQLRNNR (2/24/2012)


    Jeff Moden (2/24/2012)


    Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    Wood, Aluminum or Composite?

    I think wood would be most satisfying, the clang when using aluminum is a downer..

    2 votes for wood, and 1 additional vote for a 12-gauge.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • jarid.lawson (2/24/2012)


    Elliott Whitlow (2/24/2012)


    SQLRNNR (2/24/2012)


    Jeff Moden (2/24/2012)


    Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    Wood, Aluminum or Composite?

    I think wood would be most satisfying, the clang when using aluminum is a downer..

    2 votes for wood, and 1 additional vote for a 12-gauge.

    "Swing away" brings some nice visuals to mind.. I had a server in the past that deserved a horrible painful "de-commissioning".

    CEWII

  • Elliott Whitlow (2/24/2012)


    jarid.lawson (2/24/2012)


    Elliott Whitlow (2/24/2012)


    SQLRNNR (2/24/2012)


    Jeff Moden (2/24/2012)


    Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    Wood, Aluminum or Composite?

    I think wood would be most satisfying, the clang when using aluminum is a downer..

    2 votes for wood, and 1 additional vote for a 12-gauge.

    "Swing away" brings some nice visuals to mind.. I had a server in the past that deserved a horrible painful "de-commissioning".

    CEWII

    TNT comes to mind for those things.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • SQLRNNR (2/24/2012)


    Jeff Moden (2/24/2012)


    Attacking SSIS at last...

    I finally started doing the same thing and had to stop. They really took serious offense to me hitting the server with a bat.

    Wood, Aluminum or Composite?

    Or the actual animal (going Ozzy)?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply