March 2, 2016 at 11:01 am
Hi all,
I have a process written in SQL that is around 4,000 lines. The process is basically 25 sections of SQL code that pull data from different data sources, then do some data validation, aggregation, and calculation, and finally feed DW tables.
I am trying to automate this code and put everything into one or two stored procedures, but the challenge is that every section pretty much relies on the output from the previous section. I am running the code manually right now to monitor the output from each section and make sure there are no errors.
Any thoughts on best practices for automating this huge code?
Greatly appreciated.
Thanks
March 2, 2016 at 11:27 am
Any thoughts on best practices for automating this huge code?
Best practice is to break up the code into small chunks, each of which does essentially one thing, and not to create a code behemoth.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 2, 2016 at 11:29 am
Phil Parkin (3/2/2016)
Any thoughts on best practices for automating this huge code?
Best practice is to break up the code into small chunks, each of which does essentially one thing, and not to create a code behemoth.
Then, of course, you would need a 'master' proc to drive the entire process. This would call the smaller procs you have created.
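Something along these lines, for example (a minimal sketch; the step procedure names are hypothetical placeholders for your 25 sections):

-- Minimal sketch of the 'master' pattern: one small proc per section,
-- called in dependency order. All procedure names here are made up.
CREATE PROCEDURE dbo.usp_LoadDW_Master
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        EXEC dbo.usp_LoadDW_Step01_Extract;
        EXEC dbo.usp_LoadDW_Step02_Validate;
        EXEC dbo.usp_LoadDW_Step03_Aggregate;
        -- ... one call per section, in dependency order ...
        EXEC dbo.usp_LoadDW_Step25_LoadTargets;
    END TRY
    BEGIN CATCH
        -- The first failure stops the chain and surfaces the error
        -- to the caller (THROW requires SQL Server 2012 or later).
        THROW;
    END CATCH;
END;

Because each step proc does exactly one thing, you can also run any step in isolation when debugging.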
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 2, 2016 at 12:05 pm
Phil - Do you mean a master SP that will call 25 smaller SPs, assuming I am turning each section (chunk) into an SP?
Thanks,
March 2, 2016 at 12:17 pm
oneteabag (3/2/2016)
Phil - Do you mean a master SP that will call 25 smaller SPs, assuming I am turning each section (chunk) into an SP? Thanks,
Precisely.
Takes more setting up initially, but far easier to debug and maintain going forward.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 2, 2016 at 12:34 pm
That's a lot of work indeed.
Thanks,
March 2, 2016 at 12:45 pm
The alternative, since this appears to be some type of ETL process to feed a DW table, would be to create an SSIS package.
That tool is specifically built for this job.
March 2, 2016 at 1:04 pm
I haven't used SSIS before. Will it handle this giant 4,000-line code?
March 2, 2016 at 1:05 pm
Hugo Kornelis (3/2/2016)
The alternative, since this appears to be some type of ETL process to feed a DW table, would be to create an SSIS package. That tool is specifically built for this job.
IMO that's overkill if everything is on the same server and can be accomplished in pure T-SQL.
SSIS would require additional resources for O&M that aren't always available.
March 2, 2016 at 1:09 pm
Luis Cazares (3/2/2016)
Hugo Kornelis (3/2/2016)
The alternative, since this appears to be some type of ETL process to feed a DW table, would be to create an SSIS package. That tool is specifically built for this job.
IMO that's overkill if everything is on the same server and can be accomplished in pure T-SQL.
SSIS would require additional resources for O&M that aren't always available.
I agree, if we're talking about a single database.
But if there are multiple DBs involved, even on the same instance, I'd use SSIS to avoid the creation of explicit cross-database dependencies.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 2, 2016 at 1:15 pm
Hugo Kornelis (3/2/2016)
The alternative, since this appears to be some type of ETL process to feed a DW table, would be to create an SSIS package. That tool is specifically built for this job.
The previous recommendation would be ideal for building it in SSIS also... the 'master sp' becomes the SSIS package. I would even consider this solution if all sources were from the same database and the destination was the same database - if for no other reason than the fact that I can control the batch and commit sizes without having to write code in SQL.
With that said - I would guess that the OP is pulling the data from multiple sources over a linked server (although he didn't specifically state that - he did state multiple sources). Additionally, the data validation, aggregation, and calculation could be simplified in SSIS and performed in the pipeline between the source and destination, improving processing time.
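For contrast, here is roughly what that batching looks like when you do have to write it in T-SQL (illustrative only; the staging and fact table names are invented):

-- Hypothetical batching loop in plain T-SQL. Under autocommit, each
-- INSERT commits one batch; an SSIS destination exposes batch and
-- commit sizes as simple settings instead.
DECLARE @BatchSize int = 50000,
        @Rows int = 1;

WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.FactSales (BusinessKey, Amount)
    SELECT TOP (@BatchSize) s.BusinessKey, s.Amount
    FROM staging.Sales AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.FactSales AS f
                      WHERE f.BusinessKey = s.BusinessKey);

    SET @Rows = @@ROWCOUNT;  -- stop once no rows remain to copy
END;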
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
March 2, 2016 at 1:22 pm
Phil Parkin (3/2/2016)
Luis Cazares (3/2/2016)
Hugo Kornelis (3/2/2016)
The alternative, since this appears to be some type of ETL process to feed a DW table, would be to create an SSIS package. That tool is specifically built for this job.
IMO that's overkill if everything is on the same server and can be accomplished in pure T-SQL.
SSIS would require additional resources for O&M that aren't always available.
I agree, if we're talking about a single database.
But if there are multiple DBs involved, even on the same instance, I'd use SSIS to avoid the creation of explicit cross-database dependencies.
I'd like to have cross-database dependencies so I can analyze whether any change to the schema would impact current code.
There's not enough information to give solid advice.
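For what it's worth, the explicit cross-database references a database already contains can be listed from the sys.sql_expression_dependencies catalog view. A minimal sketch, run in the referencing database:

-- Lists objects in the current database that reference objects in
-- other databases (references recorded when the objects were created).
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS referencing_schema,
       OBJECT_NAME(d.referencing_id)        AS referencing_object,
       d.referenced_database_name,
       d.referenced_schema_name,
       d.referenced_entity_name
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_database_name IS NOT NULL;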
March 2, 2016 at 1:23 pm
oneteabag (3/2/2016)
I haven't used SSIS before. Will it handle this giant 4,000-line code?
If you decide to use SSIS (and note that some important remarks about this have been made by others), you would not use your existing code. You would build an SSIS package from scratch, using the native SSIS tools.
Using SSIS just to call an existing stored procedure that does all the work will not help you at all; that would just add a layer of complexity.
March 2, 2016 at 1:29 pm
Luis Cazares (3/2/2016)
Phil Parkin (3/2/2016)
Luis Cazares (3/2/2016)
Hugo Kornelis (3/2/2016)
The alternative, since this appears to be some type of ETL process to feed a DW table, would be to create an SSIS package. That tool is specifically built for this job.
IMO that's overkill if everything is on the same server and can be accomplished in pure T-SQL.
SSIS would require additional resources for O&M that aren't always available.
I agree, if we're talking about a single database.
But if there are multiple DBs involved, even on the same instance, I'd use SSIS to avoid the creation of explicit cross-database dependencies.
I'd like to have cross-database dependencies so I can analyze whether any change to the schema would impact current code.
There's not enough information to give solid advice.
I'm surprised that you think this. Cross-database dependencies limit scalability (because you can't move the DBs to separate instances without refactoring), and it's all too easy to create what I think of as 'circular-reference syndrome', where database A depends on database B and vice versa, meaning that you cannot recreate one database from script unless the other already exists.
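A contrived illustration of the syndrome (the database and object names are hypothetical):

-- Hypothetical pair of mutually dependent databases. Neither can be
-- rebuilt from script unless the other already exists.
USE DatabaseA;
GO
CREATE VIEW dbo.RecentOrders AS
SELECT OrderID, OrderDate
FROM DatabaseB.dbo.Orders;            -- A depends on B
GO
USE DatabaseB;
GO
CREATE VIEW dbo.ActiveCustomers AS
SELECT CustomerID, CustomerName
FROM DatabaseA.dbo.Customers;         -- B depends on A
GO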
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 2, 2016 at 1:39 pm
To clarify: it is just one database we are talking about.
So the question is: would I create one SSIS package with 4,000 lines of SQL code, or do I need to break this into 25 small SSIS packages, one for each section?