Help with SSIS please

  • Hello,

    I am starting to try and get into learning SSIS and would appreciate some pointers please.

    I would be grateful if the hierarchy of the following could be explained and how they relate to one another:

    * Projects

    * Packages

    * Solutions

    I've seen a few SSIS tutorials, but none of them seem to really start at the beginning as far as I can see and don't explain the relationship between the above or how they should be used.

    Initially, for the sake of simplicity I have created two SSIS projects. One for each flat file I'm importing into an SQL table. By keeping each of these in a separate project it seems to make thing simpler. However, if I were to want to pull in several flat files into various tables for a given purpose or application, should I place all these packages into a single project?

    Any guidance or advice welcome.

    Regards

    Steve

  • The solution is the root element of the overall SSIS file structure which is determined by the .SLN file extension, which is usually in the root of the directory. The project (.dtproj) is the holder of multiple packages (.dtsx), it knows what packages are in the project. Then the packages are the things that actually do the work.

    The example you have given below can easily be done in one package with the use of a for each loop (FEL) and a variable. The FEL would be set as a file enumerator which looks at a particular directory. It would then set the variable to the path of the file it current has read, then you can change the connection manager to that particular file on the fly as its based on what is stored in the variable.

    There are lots of good resources on SSIS out there and a load on SSC as well.

    This link might be a good place to start, especially in relation to loops and variables SSIS 101 By Andy Leonard[/url].

    There is also the stairway to SSIS, in the Stairways section as well here[/url]

  • Thanks for your help.

    However, I have seen the "For Each" loop and followed a tutorial using one. But, I guess what I'm really asking is how should I group SSIS tasks?

    In my original scenario regarding multiple flat file imports, I was meaning several files of differring formats i.e where a single connectiin manager would not suffice (If, indeed, I understand this at all!).

    If I had say a dozen flat files which were all different in format, but all files related to a single application or system, should I place all these tasks into separate projects?

    In other words, what should the scope of a "Project" be? I am currently keeping each SSIS task in separate projects to keep things simple, but is there a right and wrong philosophy here? I don't want, for example, to have dozens of packages in a single project for fear of screwing up another package in the same project etc etc.

    Any help welcome.

  • Personally I group tasks in order of acheving the overall goal. So for example, if one goal is to copy 100 tables from ServerA to ServerB, I would class that as 1 package with 100 tasks, I wouldnt class it as 100 packages with 1 tasks as you have to think about maintainability. Its easier to maintain 1 package than it is to maintain 100 packages.

    So in your example, if the goal was ApplicationA creates 10 files, each file has a different number of columns, I need to get all 10 files into 10 tables, I would class that as 1 package with 10 data flow tasks.

    Now if the files are timestamped daily then that brings in loops as you would have to deploy the package every day with the updated timestamps, or write an expression which modifies the connection string to the flat files at run time. So there are two options on that if file name changes daily.

    A project is a collection of packages, so if you have for your example 12 projects with 1 package in each then thats just overkill, I would of done it as 1 project with 12 packages as the scope of the tasks is the same, or even 1 project 1 package depending what needs to be done as you can enfoce precedence constraints between the tasks. One example I can give is in relation to data warehousing and having multiple OLTP environments which are pulled from OLTP to the warehouse nightly. There is 1 project with a large number of packages, as each package falls under the same scope of loading the warehouse its the same project. Then there are multiple packages which go to the different OLTP environments and gets the data.

    You could just have 1 main SSIS project which all of your packages live in for every task you ever want to do. The most important part of the project is the packages, they are the things that execute the tasks, they are they things that are deployed to SQL. It then all boils down to personal preference.

    Do you go for a solution/project/package for 1 task

    Do you go for a solution/project/#packages for a scope of tasks

    Do you go for a soltuion/project/#packages for every tasks your business ever needs

    My preference is option 2, a new SSIS solution per scope, but you might want to do it differently

  • Hmmm .. quite a bit of food for thought there, thank you.

    So, just to clarify for me (apologies for labouring this point) if in my example I had to import 12 flat files with different numbers of columns in each (Let's keep it simple and assume static filenames), then would it not be better to have each flat file import task in a separate package under the same project as opposed to having all 12 in a single package?

    I ask because I wonder how easy it is to maintain one package with 12 tasks in from the perspective of separation for maintenance purposes. or, If I wanted to only execute a particular one of those 12 flat file imports, wouldn't this be easier if they were in their own packages?

    Apologies once again for being somewhat slow on the up take with all this.

  • Again, personal preference.

    In the example you have given below, within BIDS (business intelligence development studio/visual studio) you have the option to right click a task and run it on its own. Once deployed to a server it runs the whole package. So if for example the application creates 6 files 1 night, 6 files the other night, then I would have that as 2 packages 6 tasks each, if you want to just re import 1 via a deployed package then yes it would be 12 packages with 1 task other wise the package would just run all 12 imports again. So if 1 file is created at 8AM, 1 and 9AM etc, and you wanted to import them at 8:05AM, 9:05AM etc I would treat that as 12 packages, as you only need to execute a specific task, you dont want to import all 12 files again.

    It is a case of weighing up what is needed against what you think is best for your environment and maintaning that.

    Once you know the requirements of what is needed, when certain things need to happen etc then you can make the informed choice of 1 package lots of taks, multiple packages 1 task.

  • Thank you so much for all your help and advice.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply