Downside of a large number of script tasks in SSIS projects?

  • Hi Folks,
    I am currently working on a SSIS project that has many sub packages, being run in parralel, via a master package (in its current iteration, there are approximately 200 sub packages).  This project is created using a BIML metadata driven pattern.

    The sub packages are being used to BCP out data from a Sybase ASE data source, Zip the output using 7Z, query meta data from the source and write the meta data to the control databases log tables.

    In each of the sub packages, I want to include a C# script task, that is going to be used to parse an output log (that is capturing the Standard Output from the BCP task) and capture the file size of the BCP output prior to the zip task.  This will mean that the project could potentially have over 200 script tasks in total.  With the current server configuration (currently 4 cores), up to 6 sub packages could be running concurrently.

    I understand to make this work I will need to supply a unique ProjectCoreName for each script.  I had toyed around a single script task to run in the master package to parse all the log files after everything has been completed but it won't be able to retrieve the raw BCP output size, as the files would have already been compressed.

    Can anyone advise whether what I am attempting to do is a bad idea?

    Thanks

  • ash.j.hoff - Monday, July 17, 2017 5:26 PM

    Hi Folks,
    I am currently working on a SSIS project that has many sub packages, being run in parralel, via a master package (in its current iteration, there are approximately 200 sub packages).  This project is created using a BIML metadata driven pattern.

    The sub packages are being used to BCP out data from a Sybase ASE data source, Zip the output using 7Z, query meta data from the source and write the meta data to the control databases log tables.

    In each of the sub packages, I want to include a C# script task, that is going to be used to parse an output log (that is capturing the Standard Output from the BCP task) and capture the file size of the BCP output prior to the zip task.  This will mean that the project could potentially have over 200 script tasks in total.  With the current server configuration (currently 4 cores), up to 6 sub packages could be running concurrently.

    I understand to make this work I will need to supply a unique ProjectCoreName for each script.  I had toyed around a single script task to run in the master package to parse all the log files after everything has been completed but it won't be able to retrieve the raw BCP output size, as the files would have already been compressed.

    Can anyone advise whether what I am attempting to do is a bad idea?

    Thanks

    Assuming the script tasks are identical, it seems like a serious DRY violation putting the same thing into 200+ packages.
    Have you considered creating a SQLCLR routine which does whatever your script task does and then adding an ExecSQL task to your packages which calls this routine, with appropriate params? Seems a bit tidier.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thanks for your reply Phil,
    Yes, a DRY violation pretty much sums it up.  I wasn't comfortable with where I was heading, hence why I posted.  I have only used scripting thus far in single packages.  It has worked properly and seemed straight forward enough (except for getting LINQ to work via BIML and MIST, but that is another subject for another time).
    Now, to be honest, I am not familiar with SQLCLR....yet.  I'll read up on it and see how it will fit.  I have just found Stairway to SQLCLR series, so I'll start with that.
    Cheers

  • ash.j.hoff - Tuesday, July 18, 2017 6:02 AM

    Thanks for your reply Phil,
    Yes, a DRY violation pretty much sums it up.  I wasn't comfortable with where I was heading, hence why I posted.  I have only used scripting thus far in single packages.  It has worked properly and seemed straight forward enough (except for getting LINQ to work via BIML and MIST, but that is another subject for another time).
    Now, to be honest, I am not familiar with SQLCLR....yet.  I'll read up on it and see how it will fit.  I have just found Stairway to SQLCLR series, so I'll start with that.
    Cheers

    Have fun learning & post back with any questions. To sum up: SQLCLRs are C# routines 'embedded' (my term) in a SQL Server database. The advantage of this is that they can be called in the same way as other SQL Server functions and procs.
    Practical examples usually include some form of string manipulation (eg, search using regular expressions), but I have used them in other ways too (eg, to execute a generic query and write out the results to a text file, with headers. Oh dear, I can imagine Jeff Moden (a regular poster) asking all sorts of questions about that!).

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • ash.j.hoff - Monday, July 17, 2017 5:26 PM

    Hi Folks,
    I am currently working on a SSIS project that has many sub packages, being run in parralel, via a master package (in its current iteration, there are approximately 200 sub packages).  This project is created using a BIML metadata driven pattern.

    The sub packages are being used to BCP out data from a Sybase ASE data source, Zip the output using 7Z, query meta data from the source and write the meta data to the control databases log tables.

    In each of the sub packages, I want to include a C# script task, that is going to be used to parse an output log (that is capturing the Standard Output from the BCP task) and capture the file size of the BCP output prior to the zip task.  This will mean that the project could potentially have over 200 script tasks in total.  With the current server configuration (currently 4 cores), up to 6 sub packages could be running concurrently.

    I understand to make this work I will need to supply a unique ProjectCoreName for each script.  I had toyed around a single script task to run in the master package to parse all the log files after everything has been completed but it won't be able to retrieve the raw BCP output size, as the files would have already been compressed.

    Can anyone advise whether what I am attempting to do is a bad idea?

    Thanks

    I know this is an older post and I know that Phil is waiting to bite my head off 😀 but you don't need to write code to parse the output log.  Unless you're saving the output log for posterity, you don't need to save that either.  If you learn to us xp_CmdShell properly, all of that will come back to you as a blob table that you can easily search and parse using T-SQL.

    And, no... turning xp_CmdShell off does nothing for security.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ash.j.hoff - Monday, July 17, 2017 5:26 PM

    Hi Folks,
    I am currently working on a SSIS project that has many sub packages, being run in parralel, via a master package (in its current iteration, there are approximately 200 sub packages).  This project is created using a BIML metadata driven pattern.

    The sub packages are being used to BCP out data from a Sybase ASE data source, Zip the output using 7Z, query meta data from the source and write the meta data to the control databases log tables.

    In each of the sub packages, I want to include a C# script task, that is going to be used to parse an output log (that is capturing the Standard Output from the BCP task) and capture the file size of the BCP output prior to the zip task.  This will mean that the project could potentially have over 200 script tasks in total.  With the current server configuration (currently 4 cores), up to 6 sub packages could be running concurrently.

    I understand to make this work I will need to supply a unique ProjectCoreName for each script.  I had toyed around a single script task to run in the master package to parse all the log files after everything has been completed but it won't be able to retrieve the raw BCP output size, as the files would have already been compressed.

    Can anyone advise whether what I am attempting to do is a bad idea?

    Thanks

    Sounds like you may benefit from introducing one more layer of abstraction into this. Can a new Child (Sub) Package be created that exposes Parameter that allow it to generically handle this work:

    1. BCP out data from a Sybase ASE data source
    2. C# script task parses output log (that is capturing the Standard Output from the BCP task) and captures the file size of the BCP output prior to the zip task
    3. Zip the output using 7Z
    4. query meta data from the source
    5. write the meta data to the control databases log tables
    The refactoring work would be complete when this new Child Package is called within the 200+ existing sub packages.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply