T-SQL (Proc) Conversion to SSIS!

  • Jeff Moden (7/24/2012)


    Koen Verbeeck (7/24/2012)


    Jeff Moden (7/23/2012)


    Thanks for the response, Jason. Do you have any good examples of where SSIS might outshine T-SQL? Please don't say "anything that needs xp_CmdShell". 😉

    How would you assess row-level error handling with T-SQL? For example, in SSIS you can stream data in the data flow and you have a component that converts strings to integers (because the data source was dirty and everything was stored as strings). If a conversion fails, you route the offending row to a flat file for further inspection.

    There are two ways. In T-SQL, I'd simply prevalidate the rows and mark the ones that had an error in them in a set-based fashion. If it's an import, both BCP and BULK INSERT (as of SQL Server 2005 for BULK INSERT) have the ability to "automagically" route bad rows to an errata file for further inspection. Rumor has it that SSIS uses the same BULK INSERT engine.
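
    A minimal sketch of that routing, with the path and table names entirely hypothetical:

    ```sql
    -- Bad rows are written to the ERRORFILE instead of killing the load;
    -- MAXERRORS lets the import keep going past a tolerable number of them.
    BULK INSERT dbo.StagingTable
    FROM 'C:\Imports\SourceData.txt'
    WITH (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR   = '\n',
        ERRORFILE       = 'C:\Imports\SourceData.err',  -- rejected rows land here
        MAXERRORS       = 1000
    );
    ```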

    Good to know. Thanks for the answer.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Jeff Moden (7/24/2012)


    Koen Verbeeck (7/24/2012)


    Jeff Moden (7/23/2012)


    Thanks for the response, Jason. Do you have any good examples of where SSIS might outshine T-SQL? Please don't say "anything that needs xp_CmdShell". 😉

    How would you assess row-level error handling with T-SQL? For example, in SSIS you can stream data in the data flow and you have a component that converts strings to integers (because the data source was dirty and everything was stored as strings). If a conversion fails, you route the offending row to a flat file for further inspection.

    There are two ways. In T-SQL, I'd simply prevalidate the rows and mark the ones that had an error in them in a set-based fashion. If it's an import, both BCP and BULK INSERT (as of SQL Server 2005 for BULK INSERT) have the ability to "automagically" route bad rows to an errata file for further inspection. Rumor has it that SSIS uses the same BULK INSERT engine.

    Routing the bad rows via BCP/Bulk is similar to the SSIS method. The pre-validation takes two passes.

    SSIS has its uses, and some in particular that come to mind are optimization items:

    - Previous transformation dependent logic

    What I mean by this are the cases where you are forced to run multiple UPDATE statements against a table (unless you're going to overcomplicate a single UPDATE statement) because the values from the previous update are required for the secondary update. If you have enough of these, even the multi-computation, single-pass update will slow down due to optimizer complexity.

    SSIS can evaluate this as each row goes 'through' the pipeline and deliver the final output in a single write.
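
    A hypothetical two-pass illustration of the problem (table and column names invented):

    ```sql
    -- Two UPDATEs are forced because the second computation depends on the
    -- value the first one produced, so the table is written twice.
    UPDATE dbo.Staging
    SET    NetAmount = GrossAmount - Discount;

    UPDATE dbo.Staging
    SET    TaxAmount = NetAmount * 0.08;  -- needs NetAmount from the first pass
    -- In an SSIS data flow both expressions are evaluated per row in memory,
    -- and the table is written only once.
    ```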

    - Collapsing transformations

    The asynchronous Script Component allows for much easier row-collapsing than multi-passing the data in a staging table. It does require a background in some simple coding techniques in your language of choice, however, and the component itself can have a painful learning curve.

    (You'll note the two above are particularly optimized to avoid multi-write and to make all changes 'in stream' before write)
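
    For contrast, the staging-table multi-pass that the Script Component replaces might look something like this in T-SQL (names hypothetical; FOR XML PATH being the string-aggregation idiom of the day):

    ```sql
    -- Collapse many detail rows into one row per key with an extra pass
    -- over the staged data.
    SELECT  s.CustomerID,
            STUFF((SELECT ',' + s2.ItemCode
                   FROM   dbo.StagingDetail s2
                   WHERE  s2.CustomerID = s.CustomerID
                   FOR XML PATH('')), 1, 1, '') AS ItemList
    INTO    dbo.StagingCollapsed
    FROM    dbo.StagingDetail s
    GROUP BY s.CustomerID;
    ```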

    - Lookup Chains

    Updating multiple associated fields (for example, when setting the facts to use the IDs of the dimensions) can, I've found, be more optimal when done as a series of finite stream updates via the Lookup Component than as a single multi-column update against a staging table. This will depend heavily on the volume of data in your dimensions.
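
    The staging-table version being compared against would be something like this single multi-column update (hypothetical star-schema names):

    ```sql
    -- Resolve all surrogate keys in one wide update against the staging table.
    UPDATE  f
    SET     f.CustomerKey = dc.CustomerKey,
            f.ProductKey  = dp.ProductKey,
            f.DateKey     = dd.DateKey
    FROM    dbo.FactStaging AS f
    JOIN    dbo.DimCustomer AS dc ON dc.CustomerBK = f.CustomerBK
    JOIN    dbo.DimProduct  AS dp ON dp.ProductBK  = f.ProductBK
    JOIN    dbo.DimDate     AS dd ON dd.FullDate   = f.TransactionDate;
    -- The SSIS alternative chains Lookup components, each resolving one key
    -- in-stream, followed by a single insert into the fact table.
    ```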

    - Configurations, particularly environmentally controlled

    While this could theoretically be done just as easily by reading your variables in from a local database table that's standardized on all your servers, it's already done for you. Why fight the system and risk 'code misses' on multi-server deployments?
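
    The roll-your-own alternative mentioned above would amount to something like this (schema hypothetical):

    ```sql
    -- A standardized per-server config table, read at run time.
    CREATE TABLE dbo.EnvironmentConfig (
        ConfigName  sysname        NOT NULL PRIMARY KEY,
        ConfigValue nvarchar(4000) NOT NULL
    );

    DECLARE @SourcePath nvarchar(4000);
    SELECT  @SourcePath = ConfigValue
    FROM    dbo.EnvironmentConfig
    WHERE   ConfigName = N'SourcePath';
    ```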

    - Easier to interface with external apps

    I've found that on the rare occasions I need to interface with external exes (with all the workload of a CLR component), and because we run a dedicated SSIS server, my sysadmins and multi-DB-engine DBAs are more comfortable with SSIS talking to external code than with the DB doing it, so things like string splitters and the like are much easier to push through via SSIS.

    - File interpreter tools

    Biased: I find the definition/template files for BCP and BULK INSERT to be a royal pain in the a$$. The GUI interface for setting up my file definitions makes me much faster at setting up text inbounds.

    - Simpler controls - File Access and Error Handling

    Another bias: I find SSIS to be much simpler for file-system looping and for control pathing dependent on results and findings. I also find error controls much easier to deal with in SSIS than in T-SQL, though that may be more familiarity than actual fact. TRY-CATCH is new and a toy to me; SSIS error controls are not, and I've built some pretty heavy-duty error controls in there.
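
    For comparison, the T-SQL side of that error control is roughly this (proc name hypothetical):

    ```sql
    -- Minimal TRY-CATCH wrapper around a load step, 2008-era style (no THROW).
    BEGIN TRY
        BEGIN TRANSACTION;
        EXEC dbo.usp_LoadStaging;  -- hypothetical load step
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);  -- re-raise for the caller or the Agent job
    END CATCH;
    ```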

    So, in some cases: It Depends on circumstances. In other cases: It's preference.

    However, sorry Abrar, didn't mean to go off on a diatribe there without actually discussing your case.

    If this is the same code with merely dynamic ID seed increments and ID pull restrictions, then I agree, I wouldn't attempt to bring this to SSIS. I couldn't say for sure unless you can define more about the dynamic pieces.

    To your questions, though: SSIS is no more secure than a properly secured and administered DB server (if anything, it's less so, given the network traffic and sniffing exposure of yet another piece of software being involved).

    For dynamic deployment, you're looking at using configurations. They're not overly difficult to set up, but if you're not currently doing multi-server (or environmentally controlled) configuration setups, and don't intend to in the near future, there's no reason to reinvent the wheel, so I generally agree with the other posters above. Create this proc with parameters for your dynamic numbers, deploy it to the servers (those names imply they're always on the same server and you're just swapping DBs), and use jobs to call it with the correct parameters on each server.
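
    In outline, that approach is just this (proc and parameter names invented for illustration):

    ```sql
    -- One proc, parameterized on the dynamic pieces, deployed everywhere.
    CREATE PROCEDURE dbo.usp_ProcessIDs
        @SeedStart bigint,
        @SeedEnd   bigint
    AS
    BEGIN
        SET NOCOUNT ON;
        -- ... the existing T-SQL logic, driven by the parameters ...
    END;
    GO
    -- Each server's Agent job step then calls it with that server's values:
    EXEC dbo.usp_ProcessIDs @SeedStart = 1000000, @SeedEnd = 1999999;
    ```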


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (7/24/2012)


    However, sorry Abrar, didn't mean to go off on a diatribe there without actually discussing your case.

    If this is the same code with merely dynamic ID seed increments and ID pull restrictions, then I agree, I wouldn't attempt to bring this to SSIS. I couldn't say for sure unless you can define more about the dynamic pieces.

    To your questions, though: SSIS is no more secure than a properly secured and administered DB server (if anything, it's less so, given the network traffic and sniffing exposure of yet another piece of software being involved).

    For dynamic deployment, you're looking at using configurations. They're not overly difficult to set up, but if you're not currently doing multi-server (or environmentally controlled) configuration setups, and don't intend to in the near future, there's no reason to reinvent the wheel, so I generally agree with the other posters above. Create this proc with parameters for your dynamic numbers, deploy it to the servers (those names imply they're always on the same server and you're just swapping DBs), and use jobs to call it with the correct parameters on each server.

    A most comprehensive response. And surely my requirement is much simpler, and the same as:

    If this is the same code with merely dynamic ID seed increments and ID pull restrictions, then I agree, I wouldn't attempt to bring this to SSIS.

    Thank you ppl!

  • Just the row-by-row error handling itself should be enough to give SSIS a fair look. And yes, the SSIS OLE DB Destination with the FastLoad option uses the same bulk-load API as BULK INSERT. After routing bad rows to an error file with bcp or BULK INSERT, programmatically handling those error rows to actually do something useful with them requires two extra disk I/O operations over what SSIS requires: one to write the error file in the first place, and a second to read it back and act on the rows. SSIS places us squarely inside the data pipeline to handle those rows in memory however we like: writing them to a file, discarding them without writing them to disk at all, emailing an alert based on data within them, writing them to a table... endless options there, all with the full complement of SSIS functionality.
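
    In T-SQL terms, that second extra I/O is the read-back pass, something like this (paths and names hypothetical):

    ```sql
    -- After a load has written SourceData.err, a second pass pulls the
    -- rejects into a one-wide-column table so they can be acted on.
    CREATE TABLE dbo.BadRows (RawRow nvarchar(max) NOT NULL);

    BULK INSERT dbo.BadRows
    FROM 'C:\Imports\SourceData.err'
    WITH (FIELDTERMINATOR = '\0',   -- a terminator assumed absent from the data,
          ROWTERMINATOR   = '\n'); -- so each whole line lands in one column
    ```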

    Awesome responses all around; however, I doubt anyone except Jeff can convince Jeff to come off the T-SQL kick long enough to give SSIS a fair shake, and that's unfortunate, because Jeff would be valuable as an SSIS aficionado to complement his black belt in T-SQL.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (7/26/2012)


    SSIS places us squarely inside the data pipeline to handle those rows in memory however we like: writing them to a file, discarding them without writing them to disk at all, emailing an alert based on data within them, writing them to a table... endless options there, all with the full complement of SSIS functionality.

    Writing them to a file. Already done by BCP/BULK INSERT. No advantage to SSIS.

    Discarding them without writing them to disk at all. Already done by BCP/BULK INSERT. No advantage to SSIS.

    Emailing an alert based on data within them. WHAT data? It's bad data that the import failed to read in properly. Since the data is bad, other than basing the alert on the fact that there is bad data, what data within the bad rows would you base the alert on? No advantage SSIS.

    "endless options there....all with the full compliment of {T-SQL} functionality." 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/26/2012)


    opc.three (7/26/2012)


    SSIS places us squarely inside the data pipeline to handle those rows in memory however we like: writing them to a file, discarding them without writing them to disk at all, emailing an alert based on data within them, writing them to a table... endless options there, all with the full complement of SSIS functionality.

    Writing them to a file. Already done by BCP/BULK INSERT. No advantage to SSIS.

    Not quite. With BULK INSERT and bcp we have no opportunity to manipulate the data before it is written to the file, as we do with SSIS.

    Discarding them without writing them to disk at all. Already done by BCP/BULK INSERT. No advantage to SSIS.

    Hmmm, I have doubts. Having used BULK INSERT and bcp for quite some time, I am pretty sure I have run into issues trying to do this reliably. Do you mind sharing how you would accomplish this with BULK INSERT?

    Emailing an alert based on data within them. WHAT data? It's bad data that the import failed to read in properly. Since the data is bad, other than basing the alert on the fact that there is bad data, what data within the bad rows would you base the alert on? No advantage SSIS.

    I beg to differ. Sending an email is a silly example of what one could do, but that is kind of the point: you can do useful things or silly things; the options are endless, and as we know, business users have a way of conjuring and justifying some pretty silly requirements. We also inherit some pretty nasty ones from those who came before us. A more realistic option would be to write the bad rows to a table with a single wide column. Another common one is to send the bad row to a transformation Script Component to manually parse the data row based on known data issues, e.g. embedded column delimiters or 'bad' control characters, before attempting to load the now-parseable data into the staging table. At any rate, SSIS absolutely has a distinct advantage here.
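
    For reference, the equivalent repair expressed in T-SQL against a one-wide-column capture table might be this (everything here hypothetical; the Script Component does the same thing in memory, in-stream):

    ```sql
    -- Strip known 'bad' control characters from the raw rows so they
    -- become parseable again before re-splitting and loading them.
    UPDATE dbo.BadRows
    SET    RawRow = REPLACE(REPLACE(RawRow, CHAR(26), ''), CHAR(7), '')
    WHERE  RawRow LIKE '%' + CHAR(26) + '%'
       OR  RawRow LIKE '%' + CHAR(7) + '%';
    ```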

    "endless options there....all with the full compliment of {T-SQL} functionality." 😀

    Heh, if T-SQL were complete and needed no complementary tools, then xp_cmdshell would not exist 😀

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
