March 2, 2016 at 2:00 pm
Sorry, a small addition: the first section of the code pulls data from another DB over a linked server. The rest of the sections are basically data validation, aggregation, and calculation. The last section pushes the results of the calculations to a number of tables in the DW.
March 2, 2016 at 2:53 pm
oneteabag (3/2/2016)
To clarify, it is just one database we are talking about. So the question is: should I create one SSIS package with 4,000 lines of SQL code, or do I need to break this into 25 small SSIS packages, one for each section?
Without details of exactly what you are trying to do, it's impossible to say, but my first inclination is to say neither/both. You'll probably need separate pieces for each section, but they can easily be combined into a single SSIS package. Also, since SSIS is essentially an object-oriented GUI, it doesn't really make sense to talk about lines of code. If you're planning to just plop your existing code into a SQL Task, don't bother with SSIS.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
March 2, 2016 at 4:39 pm
Thank you, Allen. I am not very familiar with SSIS. The SQL code basically pulls revenue & GM by client for 5,000 clients from SQL Server A to SQL Server B, then runs a series of sales commission calculations based on predefined criteria. The last step is to push the data to a sales fact table.
The SQL code is big because there are a lot of exceptions depending on whether the client is domestic or international, plus currency conversion, etc.
Right now I am running it section by section in SSMS to make sure no errors are generated... which I think is not the best way to run it.
Thanks,
March 2, 2016 at 10:12 pm
oneteabag (3/2/2016)
...but the challenge is every section is pretty much relying on the output from the previous section.
So why not use the single proc with the 4,000 lines of code? That's not a bad thing for a full-up ETL process that does everything you want. You know it already works correctly, right? Why change it, except to maybe add a little progress reporting/logging and the CREATE PROCEDURE statement along with whatever parameters you need to pass, and Bob's your uncle. 😉
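Something like this skeleton, perhaps — proc and parameter names here are just placeholders, and the sections are the OP's existing code dropped in as-is:

```sql
-- Hypothetical skeleton: the existing 4,000-line script wrapped in one proc,
-- with a little progress reporting added between sections.
CREATE PROCEDURE dbo.usp_CalculateSalesCommissions
    @PeriodStart DATE,
    @PeriodEnd   DATE
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        PRINT 'Section 1: pulling client revenue/GM over the linked server...';
        -- <existing Section 1 code goes here>

        PRINT 'Section 2: validation and aggregation...';
        -- <existing Section 2 code goes here>

        -- ... remaining sections ...

        PRINT 'Final section: loading the sales fact table...';
        -- <existing final-section code goes here>
    END TRY
    BEGIN CATCH
        -- Re-raise so the caller (e.g. a SQL Agent job step) sees the failure.
        THROW;
    END CATCH
END;
```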
--Jeff Moden
Change is inevitable... Change for the better is not.
March 3, 2016 at 7:50 am
Jeff Moden (3/2/2016)
oneteabag (3/2/2016)
...but the challenge is every section is pretty much relying on the output from the previous section. So why not use the single proc with the 4,000 lines of code? That's not a bad thing for a full-up ETL process that does everything you want. You know it already works correctly, right? Why change it, except to maybe add a little progress reporting/logging and the CREATE PROCEDURE statement along with whatever parameters you need to pass, and Bob's your uncle. 😉
I'm with Jeff on this; leave it as one SP, nothing wrong with that. Just remember, if you do break it into 25 chunks, you may later need to update all 25 if you have to add any new fields. Myself, I would rather have to touch it in one place than 25, and it could make testing easier. A 'standard' or 'best practice' is usually what works best for you and your organization, IMHO.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
March 3, 2016 at 8:36 am
below86 (3/3/2016)
Jeff Moden (3/2/2016)
oneteabag (3/2/2016)
...but the challenge is every section is pretty much relying on the output from the previous section. So why not use the single proc with the 4,000 lines of code? That's not a bad thing for a full-up ETL process that does everything you want. You know it already works correctly, right? Why change it, except to maybe add a little progress reporting/logging and the CREATE PROCEDURE statement along with whatever parameters you need to pass, and Bob's your uncle. 😉
I'm with Jeff on this; leave it as one SP, nothing wrong with that. Just remember, if you do break it into 25 chunks, you may later need to update all 25 if you have to add any new fields. Myself, I would rather have to touch it in one place than 25, and it could make testing easier. A 'standard' or 'best practice' is usually what works best for you and your organization, IMHO.
If a new table column requires a change to all modularised procs in a large ETL process, the modularisation went badly wrong.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 3, 2016 at 9:28 am
What I am getting from all the opinions is that I need to break these 4K lines of code into smaller chunks, either multiple SPs or multiple SSIS packages.
Will it be easy to create the dependencies if I elect to use SPs? i.e., SP3 will only run if SP2 is successful, and so on. Just curious if someone has tried it before.
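For reference, that kind of dependency is commonly handled with a controller proc and TRY/CATCH, so each step only runs if the previous one succeeded. A sketch, with hypothetical proc names:

```sql
-- Hypothetical controller: each section is its own proc, and an error
-- in any step jumps to CATCH before the next step can run.
CREATE PROCEDURE dbo.usp_RunCommissionProcess
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        EXEC dbo.usp_Section1_PullSourceData;   -- runs first
        EXEC dbo.usp_Section2_ValidateData;     -- only if Section 1 succeeded
        EXEC dbo.usp_Section3_CalcCommissions;  -- only if Section 2 succeeded
        EXEC dbo.usp_Section4_LoadFactTable;    -- only if Section 3 succeeded
    END TRY
    BEGIN CATCH
        -- Re-raise so the caller (e.g. a SQL Agent job) reports the failure.
        THROW;
    END CATCH
END;
```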
March 3, 2016 at 9:46 am
Phil Parkin (3/3/2016)
below86 (3/3/2016)
Jeff Moden (3/2/2016)
oneteabag (3/2/2016)
...but the challenge is every section is pretty much relying on the output from the previous section. So why not use the single proc with the 4,000 lines of code? That's not a bad thing for a full-up ETL process that does everything you want. You know it already works correctly, right? Why change it, except to maybe add a little progress reporting/logging and the CREATE PROCEDURE statement along with whatever parameters you need to pass, and Bob's your uncle. 😉
I'm with Jeff on this; leave it as one SP, nothing wrong with that. Just remember, if you do break it into 25 chunks, you may later need to update all 25 if you have to add any new fields. Myself, I would rather have to touch it in one place than 25, and it could make testing easier. A 'standard' or 'best practice' is usually what works best for you and your organization, IMHO.
If a new table column requires a change to all modularised procs in a large ETL process, the modularisation went badly wrong.
Well, you would at least need to make changes to two, then: the initial pull from the source, and another to add it to the warehouse. I'm talking about a new column/field being pulled from the source, either new to the source or previously ignored. That value then needs to work its way through to the warehouse. Depending on the design and what each step is doing, you may need to touch most if not all of them. Unless you are just coding a 'SELECT *', you will need to add code for the 'new' column/field. And I think we would all agree using a 'SELECT *' is a bad idea.
-------------------------------------------------------------
we travel not to escape life but for life not to escape us
Don't fear failure, fear regret.
March 3, 2016 at 9:43 pm
My sound advice:
1) Leave the code as it is. The fact that code is very long does not, in and of itself, mean it is bad. I would break it up only if you plan on re-using parts of the code for other areas. If not, leave it. If so, break the shared code out into its own procs and call them.
2) Implement logging. Create a central logging table to log progress, and have each key step write key info to it. That way you can easily see the current status at run time, along with any key info you want to capture. You can even use it to test whether the previous section worked as expected (e.g. pulled more than 0 rows) and act accordingly (e.g. don't run the next section if it is 0).
That is how I would approach it....
As to the concept of SSIS: based on your responses, I see that you are very new to it. I would suggest keeping this out of SSIS, as migrating would require a significant rewrite/rework and a learning curve for the tool, and depending on how you do it, it might even perform worse.
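A minimal version of that logging idea might look like this (table, column, and proc names are just a sketch):

```sql
-- Hypothetical central logging table plus a small helper proc to write to it.
CREATE TABLE dbo.ProcessLog
(
    LogID        INT IDENTITY(1,1) PRIMARY KEY,
    ProcessName  SYSNAME       NOT NULL,
    StepName     VARCHAR(200)  NOT NULL,
    RowsAffected INT           NULL,
    Message      VARCHAR(4000) NULL,
    LoggedAt     DATETIME2     NOT NULL DEFAULT SYSDATETIME()
);
GO
CREATE PROCEDURE dbo.usp_LogStep
    @ProcessName  SYSNAME,
    @StepName     VARCHAR(200),
    @RowsAffected INT           = NULL,
    @Message      VARCHAR(4000) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.ProcessLog (ProcessName, StepName, RowsAffected, Message)
    VALUES (@ProcessName, @StepName, @RowsAffected, @Message);
END;
GO
-- Then, after each key statement in the big script:
-- EXEC dbo.usp_LogStep 'CommissionCalc', 'Section 1 - source pull', @@ROWCOUNT;
```

Querying the table ordered by LoggedAt then shows exactly how far the run got, and a check such as "did Section 1 log more than 0 rows?" can gate the next section.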
March 4, 2016 at 9:12 am
Nice thoughts, newbie. I got the logging part, but wasn't sure, if I stay away from SSIS, how I would automate this process. Right now it is just plain SQL code that I run on demand.
March 4, 2016 at 9:26 am
below86 (3/3/2016)
Phil Parkin (3/3/2016)
below86 (3/3/2016)
Jeff Moden (3/2/2016)
oneteabag (3/2/2016)
...but the challenge is every section is pretty much relying on the output from the previous section. So why not use the single proc with the 4,000 lines of code? That's not a bad thing for a full-up ETL process that does everything you want. You know it already works correctly, right? Why change it, except to maybe add a little progress reporting/logging and the CREATE PROCEDURE statement along with whatever parameters you need to pass, and Bob's your uncle. 😉
I'm with Jeff on this; leave it as one SP, nothing wrong with that. Just remember, if you do break it into 25 chunks, you may later need to update all 25 if you have to add any new fields. Myself, I would rather have to touch it in one place than 25, and it could make testing easier. A 'standard' or 'best practice' is usually what works best for you and your organization, IMHO.
If a new table column requires a change to all modularised procs in a large ETL process, the modularisation went badly wrong.
Well, you would at least need to make changes to two, then: the initial pull from the source, and another to add it to the warehouse. I'm talking about a new column/field being pulled from the source, either new to the source or previously ignored. That value then needs to work its way through to the warehouse. Depending on the design and what each step is doing, you may need to touch most if not all of them. Unless you are just coding a 'SELECT *', you will need to add code for the 'new' column/field. And I think we would all agree using a 'SELECT *' is a bad idea.
If someone offered me the choice between
a) Modifying one long and complex piece of code, and
b) Modifying several small and well-organised pieces of code, where the function of each separate piece of code is clear
I would elect option (b) any day of the week.
The only circumstances under which option (a) would have a chance would be where I am the author of the code and know exactly what's going on.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 4, 2016 at 9:39 am
oneteabag (3/4/2016)
Nice thoughts, newbie. I got the logging part, but wasn't sure, if I stay away from SSIS, how I would automate this process. Right now it is just plain SQL code that I run on demand.
If you want it to run on a defined schedule, you could use SQL Server Agent or Windows scheduled tasks (or something else). Those would still be the options with SSIS.
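If SQL Server Agent is available, the job can be set up in the SSMS GUI or scripted against msdb; roughly like this, with illustrative job names, database name, and schedule:

```sql
-- Hypothetical SQL Agent job: runs the commission proc nightly at 2:00 AM.
-- Job name, database name, and proc name are placeholders.
USE msdb;
GO
EXEC dbo.sp_add_job
    @job_name = N'Nightly Commission Calc';

EXEC dbo.sp_add_jobstep
    @job_name      = N'Nightly Commission Calc',
    @step_name     = N'Run commission proc',
    @subsystem     = N'TSQL',
    @command       = N'EXEC dbo.usp_RunCommissionProcess;',
    @database_name = N'YourDW';

EXEC dbo.sp_add_schedule
    @schedule_name     = N'Nightly 2 AM',
    @freq_type         = 4,        -- daily
    @freq_interval     = 1,        -- every 1 day
    @active_start_time = 020000;   -- 02:00:00

EXEC dbo.sp_attach_schedule
    @job_name      = N'Nightly Commission Calc',
    @schedule_name = N'Nightly 2 AM';

EXEC dbo.sp_add_jobserver
    @job_name = N'Nightly Commission Calc';  -- defaults to the local server
GO
```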
March 4, 2016 at 11:47 am
Phil Parkin (3/4/2016)
If someone offered me the choice between a) Modifying one long and complex piece of code, and
b) Modifying several small and well-organised pieces of code, where the function of each separate piece of code is clear
I would elect option (b) any day of the week.
The only circumstances under which option (a) would have a chance would be where I am the author of the code and know exactly what's going on.
I would love option (b) but, in most cases, I find they are neither short nor well-organized, and the functionality is usually not clear, because people have written them as if they were writing managed code, usually relegated to handling just one row at a time and usually within a transaction.
I'm with you, though. If I write it, option (b) is frequently the right way for me to go, because a lot of the functionality can be and is reused.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2016 at 3:17 am
Luis Cazares (3/4/2016)
oneteabag (3/4/2016)
Nice thoughts, newbie. I got the logging part, but wasn't sure, if I stay away from SSIS, how I would automate this process. Right now it is just plain SQL code that I run on demand. If you want it to run on a defined schedule, you could use SQL Server Agent or Windows scheduled tasks (or something else). Those would still be the options with SSIS.
What he says is correct. Scheduling should not be the deciding factor in whether to use SSIS or a stored proc, because both can be scheduled in pretty much the same way. If you have access to SQL Server Agent on the server, that would definitely be my preferred way, as it will run even when your PC is off (as long as the server is still running), as opposed to Windows scheduled tasks, where your PC must be on and logged in for it to run.
March 9, 2016 at 1:38 pm
gabtugm (3/5/2016)
Luis Cazares (3/4/2016)
oneteabag (3/4/2016)
Nice thoughts, newbie. I got the logging part, but wasn't sure, if I stay away from SSIS, how I would automate this process. Right now it is just plain SQL code that I run on demand. If you want it to run on a defined schedule, you could use SQL Server Agent or Windows scheduled tasks (or something else). Those would still be the options with SSIS.
What he says is correct. Scheduling should not be the deciding factor in whether to use SSIS or a stored proc, because both can be scheduled in pretty much the same way. If you have access to SQL Server Agent on the server, that would definitely be my preferred way, as it will run even when your PC is off (as long as the server is still running), as opposed to Windows scheduled tasks, where your PC must be on and logged in for it to run.
I was thinking of using scheduled tasks on a server, not a workstation. But your comment is important for the OP to consider.