October 23, 2013 at 11:11 pm
I have a huge result set from SQL, 2-3 million rows, which needs to be "processed". Is there any way I can retrieve this set in small pieces to save memory?
October 24, 2013 at 12:49 am
You really need to give more specific details.
What exactly do you mean by "processed"? Which transformations are you using?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 24, 2013 at 10:37 am
Koen Verbeeck (10/24/2013)
You really need to give more specific details. What exactly do you mean by "processed"? Which transformations are you using?
Sorry, I'll clarify. I pull each record, concatenate ALL the columns of each row, and then dump the result into a destination table (via a script task). I have no choice but to do it this complicated way, so I want to save memory. Did I explain that properly?
October 24, 2013 at 1:40 pm
Just curious: why do you use a script task as a destination? What is your destination?
Maybe there are easier alternatives.
As long as you don't use any blocking transformations (sort, aggregate and maybe your script transformation), the SSIS data flow will act as a pipeline.
Data flows out while data comes in. This means you can transfer a lot of data without experiencing memory pressure.
You can concatenate the columns with a derived column component, which is synchronous and will not lead to memory pressure.
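If you do stay with a script approach, a synchronous script component (rather than a script task) keeps the pipeline streaming in the same way. Purely as a hedged sketch, assuming made-up input columns SourceCol1/SourceCol2 and an output column ConcatenatedRow added to the same output:

// Inside the script component's generated ScriptMain class.
// Synchronous: rows are modified in place in the buffer, so data keeps
// flowing through and memory stays flat.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Hypothetical column names; substitute the real input columns here.
    Row.ConcatenatedRow = string.Concat(Row.SourceCol1, "|", Row.SourceCol2);
}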
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 24, 2013 at 2:51 pm
Koen Verbeeck (10/24/2013)
Just curious: why do you use a script task as a destination? What is your destination? Maybe there are easier alternatives.
As long as you don't use any blocking transformations (sort, aggregate and maybe your script transformation), the SSIS data flow will act as a pipeline.
Data flows out while data comes in. This means you can transfer a lot of data without experiencing memory pressure.
You can concatenate the columns with a derived column component, which is synchronous and will not lead to memory pressure.
In my ETL, the source column to destination column mapping is dynamic. It's not like Address always goes to Home_Address, etc.
There is only one source table and one target table. Source_Column_Number1 can contain anything - Address, PersonName, etc.! (LOL!) It's like Source_Column_Number1 goes to Target_Column_Number1, and then Source_Column_Number33 goes to Target_Column_Number1. Yes, that is applicable in my case. I have no control over that.
October 24, 2013 at 10:28 pm
blasto_max (10/24/2013)
Koen Verbeeck (10/24/2013)
Just curious: why do you use a script task as a destination? What is your destination? Maybe there are easier alternatives.
As long as you don't use any blocking transformations (sort, aggregate and maybe your script transformation), the SSIS data flow will act as a pipeline.
Data flows out while data comes in. This means you can transfer a lot of data without experiencing memory pressure.
You can concatenate the columns with a derived column component, which is synchronous and will not lead to memory pressure.
In my ETL, the source column to destination column mapping is dynamic. It's not like Address always goes to Home_Address, etc.
There is only one source table and one target table. Source_Column_Number1 can contain anything - Address, PersonName, etc.! (LOL!) It's like Source_Column_Number1 goes to Target_Column_Number1, and then Source_Column_Number33 goes to Target_Column_Number1. Yes, that is applicable in my case. I have no control over that.
You mentioned concatenation in an earlier post, yet what you are describing here is a dynamic data flow. Where does the concatenation come in?
How do you determine the column mappings?
It's rather advanced, but one idea would be to have a master package which determines the column mappings and then creates and runs a custom child package which uses these mappings.
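Purely as a hedged sketch of that master/child idea (the package path is made up, and the mapping-driven rewiring of the child's data flow, which is the hard part, is only marked as a comment), the master could do something like this from a script task:

using System;
using Microsoft.SqlServer.Dts.Runtime;

class RunChildPackage
{
    static void Main()
    {
        // Hypothetical template path; this code could equally live inside
        // a script task in the master package.
        var app = new Application();
        Package child = app.LoadPackage(@"C:\Packages\ChildTemplate.dtsx", null);

        // ... rewire the child's data flow here using the column-mapping metadata ...

        DTSExecResult result = child.Execute();
        Console.WriteLine(result);   // Success or Failure
    }
}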
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 25, 2013 at 1:03 am
Effecting dynamic stuff like that in SSIS can be painful.
You somehow have to make the changes to the package at "design" time, which can mean programmatically creating a package in memory or manipulating a template package before executing it, either from the controller package or from an app.
CozyRoc has some custom components that could help too.
EzAPI can help generate packages, but it's not so good at manipulating an existing package. BIML could help out too.
Manipulating SSIS packages programmatically isn't well documented. The SSIS dev blog has some key bits of info... and a couple of other blogs have working non-trivial code too (such as manipulating a derived column transform, etc.).
I've done this to walk a source DB, use the ODBC GetSchema functions to help generate DDL SQL and run it on the destination server, then set up a package to pump each source table to its destination table.
Obviously this would be a no-brainer for SQL Server to SQL Server...
EzAPI & BIML would be my bets now if I didn't have to use 32-bit data sources through ODBC...
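A hedged sketch of that GetSchema-driven DDL generation, with a made-up DSN and table name and a deliberately crude type mapping (everything becomes NVARCHAR) just to show the shape of it:

using System;
using System.Data;
using System.Data.Odbc;
using System.Text;

class GenerateDdlFromOdbc
{
    static void Main()
    {
        // Hypothetical DSN and table; the restrictions array filters GetSchema("Columns").
        using (var conn = new OdbcConnection("DSN=SourceDsn;"))
        {
            conn.Open();
            DataTable cols = conn.GetSchema("Columns", new[] { null, null, "SourceTable", null });

            var ddl = new StringBuilder("CREATE TABLE dbo.SourceTable_Copy (");
            foreach (DataRow row in cols.Rows)
            {
                // Real code would translate TYPE_NAME/COLUMN_SIZE properly.
                ddl.AppendFormat("[{0}] NVARCHAR(4000), ", row["COLUMN_NAME"]);
            }
            ddl.Length -= 2;            // drop the trailing ", "
            ddl.Append(");");
            Console.WriteLine(ddl);     // run this DDL on the destination server
        }
    }
}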
October 25, 2013 at 10:30 am
corey lawson (10/25/2013)
Effecting dynamic stuff like that in SSIS can be painful. You somehow have to make the changes to the package at "design" time, which can mean programmatically creating a package in memory or manipulating a template package before executing it, either from the controller package or from an app.
CozyRoc has some custom components that could help too.
EzAPI can help generate packages, but it's not so good at manipulating an existing package. BIML could help out too.
Manipulating SSIS packages programmatically isn't well documented. The SSIS dev blog has some key bits of info... and a couple of other blogs have working non-trivial code too (such as manipulating a derived column transform, etc.).
I've done this to walk a source DB, use the ODBC GetSchema functions to help generate DDL SQL and run it on the destination server, then set up a package to pump each source table to its destination table.
Obviously this would be a no-brainer for SQL Server to SQL Server...
EzAPI & BIML would be my bets now if I didn't have to use 32-bit data sources through ODBC...
Custom solutions like CozyRoc, BIML and server linking are simply not an option. All I've got is SSIS and C#. That's what makes it harder.
October 25, 2013 at 12:51 pm
It is the dynamic remapping of input & output fields in data flow tasks that bites...
Look at BIFuture.blogspot.com for the "C# code to add a derived column to SSIS" article...
The TechNet SSIS team blog has some magic info you may need too.
With EzAPI you could potentially write a script task that creates a new in-memory package and then executes it. It is doable without EzAPI too; EzAPI just makes hooking tasks together easier than the raw API calls.
EzAPI seems to work great in VS 2010.
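For the "doable without EzAPI" route, here is a hedged sketch of creating a bare package in memory with the raw runtime API (names are made up, and the pipeline wiring that would do the actual work is omitted):

using Microsoft.SqlServer.Dts.Runtime;

class BuildPackageInMemory
{
    // This could equally sit inside a script task's Main() method.
    static void Main()
    {
        Package pkg = new Package { Name = "DynamicChild" };

        // "STOCK:PipelineTask" is the stock moniker for a Data Flow Task.
        TaskHost dataFlow = (TaskHost)pkg.Executables.Add("STOCK:PipelineTask");
        dataFlow.Name = "Dynamic Data Flow";

        // Configuring the data flow's source, destination and column mappings
        // (via the pipeline wrapper API) is the hard part and is omitted here.

        DTSExecResult result = pkg.Execute();
        System.Console.WriteLine(result);
    }
}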
October 25, 2013 at 12:55 pm
Export to a delimited flat file, then bulk insert each file row into a table column big enough to handle the whole row?
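A hedged sketch of driving that from C# (connection string, file path, staging table and the field terminator are all made up; the terminator is just something assumed never to appear in the data, so each whole line lands in one wide column):

using System.Data.SqlClient;

class BulkLoadWholeRows
{
    static void Main()
    {
        // Hypothetical staging table with a single NVARCHAR(MAX) column.
        const string sql = @"
            BULK INSERT dbo.StagingWide
            FROM 'C:\export\rows.txt'
            WITH (FIELDTERMINATOR = '|~|', ROWTERMINATOR = '\n');";

        using (var conn = new SqlConnection("Server=.;Database=Target;Integrated Security=SSPI;"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}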
October 25, 2013 at 2:27 pm
corey lawson (10/25/2013)
It is the dynamic remapping of input & output fields in data flow tasks that bites... Look at BIFuture.blogspot.com for the "C# code to add a derived column to SSIS" article...
The TechNet SSIS team blog has some magic info you may need too.
With EzAPI you could potentially write a script task that creates a new in-memory package and then executes it. It is doable without EzAPI too; EzAPI just makes hooking tasks together easier than the raw API calls.
EzAPI seems to work great in VS 2010.
Thanks for all the great suggestions. Wow! There is really a lot I did not know about.
I think I found the right BIFuture link you mentioned -
http://bifuture.blogspot.com/2011/01/ssis-adding-derived-column-to-ssis.html
That is a maybe.
Can you give me the "TechNet SSIS team blog" link which is related to this problem?
EzAPI is too much work for me. I read about it and will pass.
Thanks once again...and...
October 25, 2013 at 2:28 pm
Oops, double post.
October 25, 2013 at 2:57 pm
Here's some of the obscure info regarding working with connection managers in a package programmatically, to get the AcquireConnection() method to actually work:
There's another article there too for doing the same with an ADO.Net connection manager.
Here are some more links to look into:
http://blogs.msdn.com/b/mattm/archive/2008/12/30/api-sample-ado-net-source.aspx
Basically, MattM's blogs have some really useful details...
I didn't find this until the other day, oddly enough:
http://technet.microsoft.com/en-us/library/ms136025.aspx
The big thing: if you're trying to manipulate a DTS package object from a script task (control flow) instead of a script component (data flow), you'll need to find the right .NET libraries to reference in your script task project before you can use them. They're already available to script components.
You may need to make copies of the two DLLs (SqlServer.Dts.Pipeline.Wrapper, SqlServer.Dts.RuntimeWrapper) from the GAC and put them somewhere else so you can add them. I believe I posted my travails about this somewhere here on SQL Server Central forums too...
Another thing: when you see object/interface names like "IDTS...90", those are for the SSIS 2005 libraries; "IDTS...100" is 2008; "IDTS...110" is 2010/2012...
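As a hedged illustration of the AcquireConnection() pattern from a script task, assuming an ADO.NET (SqlClient) connection manager with the made-up name "MyAdoNetConn" (an OLE DB connection manager needs the runtime-wrapper cast described in those links instead):

using System.Data.SqlClient;
using Microsoft.SqlServer.Dts.Runtime;

// Sketch of a script task's Main() body; Dts is the object model the
// script task template exposes. "MyAdoNetConn" and dbo.SomeTable are hypothetical.
public void Main()
{
    ConnectionManager cm = Dts.Connections["MyAdoNetConn"];

    // For an ADO.NET (SqlClient) connection manager this returns a SqlConnection.
    var conn = (SqlConnection)cm.AcquireConnection(Dts.Transaction);
    try
    {
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM dbo.SomeTable", conn))
        {
            object rowCount = cmd.ExecuteScalar();
        }
    }
    finally
    {
        cm.ReleaseConnection(conn);
    }
}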