May 31, 2023 at 9:14 pm
I'm trying to create an SSIS package that will query a master control table [TableList], which contains 2 columns (SourceTable and TargetTable), then load those tables from a source server to a target server.
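For reference, [TableList] is nothing more than the two name columns; something like this (the data types shown are just a simplified sketch of my actual table):

-- Simplified sketch of the control table; actual data types/constraints may differ.
CREATE TABLE dbo.TableList
(
    SourceTable sysname NOT NULL,  -- table name on the source server
    TargetTable sysname NOT NULL   -- table name on the target server
);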
It "runs", but although the variables with the table names seem to update for each iteration, it looks like the table metadata doesn't update for each new iteration for each new pair of tables. So the package errors out on the second step through that indicates it's still looking for columns from the first table:
Error: 0xC0202005 at Data Flow Task, OLE DB Source [105]: Column "facility_id" cannot be found at the datasource.
Error: 0xC004701A at Data Flow Task, SSIS.Pipeline: OLE DB Destination failed the pre-execute phase and returned error code 0xC0202025.
Information: 0x4004300B at Data Flow Task, SSIS.Pipeline: "OLE DB Destination" wrote 0 rows.
(where facility_id is a column in the first table, but not in subsequent tables).
Here's what I've done:
What I've tried to fix this:
Based on the watch window, the correct table names look like they're getting retrieved, but it seems that the column metadata for the data flow source/destination is stale and isn't refreshed for each new pair of tables.
Thanks for any assistance you can offer!
SQL 2016. VS2019, with package set to 2016 mode.
Rich
May 31, 2023 at 9:33 pm
The metadata isn't going to change; that's a designer thing.
What you're trying to do isn't going to work.
Once you create a data flow, that's it; that's the only metadata it will have.
You'll have better luck using stored procedures if you need to do this dynamically. Then you can use your metadata to build the statements and execute them. Of course, depending on the source and destination servers/platforms, this may not be an option.
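For example, assuming you can set up a linked server to the source (I've called it [SourceSrv] here, with a SourceDb.dbo schema and matching column lists - all of that is just illustrative), a minimal sketch of that approach could look like:

-- Minimal sketch only. Assumes a linked server named [SourceSrv], a source
-- database/schema of SourceDb.dbo, identical column lists on both sides,
-- and a full reload (truncate then insert). Adjust for your environment.
DECLARE @SourceTable sysname,
        @TargetTable sysname,
        @sql         nvarchar(max);

DECLARE tbl_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT SourceTable, TargetTable
    FROM   dbo.TableList;

OPEN tbl_cur;
FETCH NEXT FROM tbl_cur INTO @SourceTable, @TargetTable;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build and run one load statement per control-table row
    SET @sql = N'TRUNCATE TABLE dbo.' + QUOTENAME(@TargetTable) + N'; '
             + N'INSERT INTO dbo.' + QUOTENAME(@TargetTable) + N' '
             + N'SELECT * FROM [SourceSrv].SourceDb.dbo.' + QUOTENAME(@SourceTable) + N';';

    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM tbl_cur INTO @SourceTable, @TargetTable;
END;

CLOSE tbl_cur;
DEALLOCATE tbl_cur;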
Another option might be to create the DTSX files on the fly. Maybe look at Biml (Business Intelligence Markup Language). I've never really used it, so I can't offer much advice. Here's a resource: https://www.bimlscript.com/
June 1, 2023 at 12:56 am
Isn't this a job for "Replication"? IIRC, it can even update the metadata in the target to match that in the source.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 1, 2023 at 9:28 am
If you can purchase third-party software products, the following may be of interest:
https://www.cozyroc.com/ssis/data-flow-task
But, as stated above, the structure of an SSIS package is fixed at design time. If you create a separate data flow for each of the tables in your control table, you can make this work, though it makes your package ungainly. Better to have separate packages (one per table) and a master package which executes them, once again driven by what is in the control table.
Despite the additional complexity, each of these solutions offers a potential performance improvement over your current intended architecture, because the multiple dataflows / packages may be executed in parallel.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 1, 2023 at 11:55 am
This is one of those cases where you do NOT want to use the SSIS native data flow to do the transfer.
- PowerShell - same as above, or using dbatools
- bcp (out and in)
In all of the above, you would supply the source/destination servers and a list of table-name pairs (source/destination names).
If the columns between source and destination are exactly the same, it is easier; if they differ, you will need to do a bit more code (BulkCopy column mapping), but nothing significant.
June 1, 2023 at 9:12 pm
Thanks everyone for your responses. Good to have confirmation that I wasn't simply inept in not getting my test to work with dynamic iteration over a list of tables. I'm surprised to learn that SSIS doesn't natively support dynamic refresh of table metadata (I'm not an SSIS guru by any means).
Phil: up until now, I've been running this package with manually-created pairs of OLEDB source/destination, one pair per table. I put all of these source-destination pairs (there are maybe 20 pairs, for 20 tables) into one data flow, in one package. You suggested splitting these off into one-table-per-package (or per-dataflow). Is there any reason for your recommendation? At least in debug mode, the tables are processed in parallel when I put them all into one dataflow.
Jeff: I have restricted access to the source server, just SELECT permissions on a single schema, so setting up replication is unlikely to be possible without a lot of conversation and negotiation.
Thank you all again!
June 2, 2023 at 8:54 am
There are reasons for separating logic into multiple packages, even though it may seem less tidy and requires more config up front.
Number 1 is around error handling and the debugging of errors. If you have 20 source-target pairs in a single dataflow and any one of them has an error, you might find it difficult to track down which pair(s) caused the error. You may also find it painful trying to reproduce the error, because all of the other successful pairs are running within the same container. If you have separate packages, this whole process is easier, from identification to resolution.
Number 2 is control. You can control the order of execution and which packages run in parallel or series. I use an SSIS execution framework which allows me to define all of this in metadata.
Number 3 is maintenance and adherence to software engineering best practices (each package should do one thing, and keep it simple).
SSIS packages tend to grow over time. Before you know it, you could be dealing with a monster, unless you stick to best practices.
Having said all of this, it does feel like overkill in your situation. One intermediate step you should consider is to put each source-target pair into its own dataflow. This would immediately give you control over execution order and parallelism and would make debugging easier, because each dataflow can be executed independently of the others (via right-click / Execute Task).
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 2, 2023 at 1:34 pm
Thanks for the detailed reply, Phil. I guess I meant to include in my last post "is there a reason to split this up other than for debugging/error-trapping?".
For my purposes, this is easier to maintain in one data flow, and I've been running this package with an Agent job for over a year without any errors (with about 20 tables in the one data flow). I need to add some more tables, and I had started to think this was getting a bit overloaded to maintain with all these source/destination pairs, so I take your points. Still, replacing those with 20 data flows feels like more work to maintain, not less. It sounds like you are doing some more advanced SSIS work than my simple ETL (I've implemented most of my value-added work using post-load T-SQL stored procs, called from SSIS as Execute SQL tasks).
I don't like being *that guy* who creates expedient rather than well-ordered code... I'll give this some more thought.
Still a bit surprised this isn't provided as native functionality in SSIS, without resorting to third-party tools or C# coding. (Then again, MS shipped the last version of Visual Studio without the ability to work with SSIS, so...)
Thanks again! I haven't been on here in quite some time, but there was a long period when I was on here almost daily, and I've learned a lot from your posts over the years!
June 2, 2023 at 4:03 pm
No problem.
Just picking up on one point in your last comment, I 100% agree with using stored procs to implement business logic on the SQL Server side. I mostly use SSIS to move data from A to B, where A and B are not the same database (or technology), and to orchestrate that movement. The simpler, the better.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
June 2, 2023 at 4:23 pm
A couple of things I would like to add to Phil's reply about logic separation.
Way back when we started with SSIS in 2005 and didn't have a clue, we built some monster packages. They wouldn't load properly, took forever to open, and would even crash VS. One thing SSIS likes to do is validate everything upon opening; separation helps alleviate this.
One of the projects we created back in 2005 as one monster package is still running today, but it has since been split into about 30 packages. Our rule is generally one data flow per package. As Phil said, it makes testing and debugging much easier.
Another thing that doesn't work well with SSIS is code merging. With separation you can have people work on individual packages and still be able to merge their changes since the code is in separate files. Even if not merging, it still allows multiple devs to work on the same project more easily.