October 23, 2013 at 11:11 pm
I have a huge result set from SQL, 2-3 million rows, which needs to be "processed". Is there any way I can retrieve this set in small pieces to save memory?
October 24, 2013 at 12:49 am
You really need to give more specific details.
What exactly do you mean by "processed"? Which transformations are you using?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 24, 2013 at 10:37 am
Koen Verbeeck (10/24/2013)
You really need to give more specific details. What exactly do you mean by "processed"? Which transformations are you using?
Sorry, I'll clarify. Pull each record, concatenate ALL columns of each row, and then dump the result into a destination table (via a script task). I have no choice but to do it this complicated way, so I want to save memory. Did I explain that properly?
October 24, 2013 at 1:40 pm
Just curious: why do you use a script task as a destination? What is your destination?
Maybe there are easier alternatives.
As long as you don't use any blocking transformations (sort, aggregate and maybe your script transformation), the SSIS data flow will act as a pipeline.
Data flows out while data comes in. This means you can transfer a lot of data without experiencing memory pressure.
You can concatenate the columns with a derived column component, which is synchronous and will not lead to memory pressure.
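If you do end up in script instead of a derived column, a synchronous script component keeps the same streaming behaviour. A minimal sketch (the input columns Col1/Col2 and the output column Concatenated are hypothetical names you would define in the component editor):

// Inside the script component's generated ScriptMain class. "Col1", "Col2"
// and "Concatenated" are hypothetical; add them on the component's single
// synchronous output.
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Runs once per row as each buffer streams through; because the output
    // is synchronous, rows are not accumulated in memory.
    Row.Concatenated = Row.Col1 + "|" + Row.Col2;
}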
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
October 24, 2013 at 2:51 pm
Koen Verbeeck (10/24/2013)
Just curious: why do you use a script task as a destination? What is your destination? Maybe there are easier alternatives.
As long as you don't use any blocking transformations (sort, aggregate and maybe your script transformation), the SSIS data flow will act as a pipeline.
Data flows out while data comes in. This means you can transfer a lot of data without experiencing memory pressure.
You can concatenate the columns with a derived column component, which is synchronous and will not lead to memory pressure.
In my ETL, the source-column-to-destination-column mapping is dynamic. It's not like Address always goes to Home_Address, etc.
There is only one source table and one target table. Source_Column_Number1 can contain anything: Address, PersonName, etc.! It's like Source_Column_Number1 goes to Target_Column_Number1, and then Source_Column_Number33 goes to Target_Column_Number1. Yes, that is applicable in my case. I have no control over that.
October 24, 2013 at 10:28 pm
blasto_max (10/24/2013)
Koen Verbeeck (10/24/2013)
Just curious: why do you use a script task as a destination? What is your destination? Maybe there are easier alternatives.
As long as you don't use any blocking transformations (sort, aggregate and maybe your script transformation), the SSIS data flow will act as a pipeline.
Data flows out while data comes in. This means you can transfer a lot of data without experiencing memory pressure.
You can concatenate the columns with a derived column component, which is synchronous and will not lead to memory pressure.
In my ETL, the source-column-to-destination-column mapping is dynamic. It's not like Address always goes to Home_Address, etc.
There is only one source table and one target table. Source_Column_Number1 can contain anything: Address, PersonName, etc.! It's like Source_Column_Number1 goes to Target_Column_Number1, and then Source_Column_Number33 goes to Target_Column_Number1. Yes, that is applicable in my case. I have no control over that.
You mentioned concatenation in an earlier post, yet what you are describing here is a dynamic data flow. Where does the concatenation come in?
How do you determine the column mappings?
It's rather advanced, but one idea would be to have a master package which determines the column mappings and then creates and runs a custom child package which uses these mappings.
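In code, the "create and run a child package" part of that idea might look something like this from the master package's script task (a sketch using the Microsoft.SqlServer.Dts.Runtime API; the file path and the User::ColumnMap variable are made-up names):

using Microsoft.SqlServer.Dts.Runtime;

// Load the child package that the mapping-generation step wrote out,
// hand it the mappings, and run it. Path and variable name are hypothetical.
Application app = new Application();
Package child = app.LoadPackage(@"C:\etl\ChildGenerated.dtsx", null);
child.Variables["User::ColumnMap"].Value = "Source_Column_Number33=Target_Column_Number1";
DTSExecResult result = child.Execute();
bool succeeded = (result == DTSExecResult.Success);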
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 25, 2013 at 1:03 am
Doing dynamic stuff like that in SSIS can be painful.
You somehow have to make the changes to the package at "design" time, which can mean programmatically creating a package in memory, or manipulating a template package before executing it, either from the controller package or from an app.
CozyRoc has some custom components that could help too.
EzAPI can help generate packages, but it's not so good at manipulating an existing package. BIML could help out too.
Manipulating SSIS packages programmatically isn't well documented. The SSIS dev blog has some key bits of info... and a couple of other blogs have working non-trivial code too (such as manipulating a derived column transform).
I've done this to walk a source db, use ODBC GetSchema calls to help generate DDL SQL and run it on the destination server, then set up a package to pump the source table to the destination table (the schema-walking part is sketched below).
Obviously this would be a no-brainer for SQL Server to SQL Server...
EzAPI and BIML would be my bets now if I didn't have to use 32-bit data sources through ODBC...
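The schema-walking part of that approach can be done with plain ADO.NET. A minimal sketch, assuming a hypothetical DSN and table name:

using System;
using System.Data;
using System.Data.Odbc;

// Walk the source db's columns via ODBC GetSchema to drive DDL generation.
// "SourceDb" and "SourceTable" are hypothetical.
using (var conn = new OdbcConnection("DSN=SourceDb;"))
{
    conn.Open();
    // Restrictions for the "Columns" collection: catalog, schema, table, column.
    DataTable cols = conn.GetSchema("Columns", new string[] { null, null, "SourceTable", null });
    foreach (DataRow row in cols.Rows)
        Console.WriteLine("{0} {1}({2})",
            row["COLUMN_NAME"], row["TYPE_NAME"], row["COLUMN_SIZE"]);
}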
October 25, 2013 at 10:30 am
corey lawson (10/25/2013)
Doing dynamic stuff like that in SSIS can be painful. You somehow have to make the changes to the package at "design" time, which can mean programmatically creating a package in memory, or manipulating a template package before executing it, either from the controller package or from an app.
CozyRoc has some custom components that could help too.
EzAPI can help generate packages, but it's not so good at manipulating an existing package. BIML could help out too.
Manipulating SSIS packages programmatically isn't well documented. The SSIS dev blog has some key bits of info... and a couple of other blogs have working non-trivial code too (such as manipulating a derived column transform).
I've done this to walk a source db, use ODBC GetSchema calls to help generate DDL SQL and run it on the destination server, then set up a package to pump the source table to the destination table.
Obviously this would be a no-brainer for SQL Server to SQL Server...
EzAPI and BIML would be my bets now if I didn't have to use 32-bit data sources through ODBC...
Custom solutions like CozyRoc, BIML and server linking are simply not an option. All I've got is SSIS and C#. That's what makes it harder.
October 25, 2013 at 12:51 pm
It is the dynamic remapping of input & output fields in data flow tasks that bites...
Look at BIFuture.blogspot.com for the "C# code to add a derived column to SSIS" article...
The TechNet SSIS team blog has some magic info you may need too.
With EzAPI you could potentially write a script task that creates a new in-memory package and then executes it. It is doable without EzAPI too (see the sketch below). EzAPI makes hooking tasks together easier than the raw API calls.
EzAPI seems to work great in VS 2010.
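For reference, the "doable without" route through the raw runtime and pipeline APIs starts out roughly like this. This is only a sketch: the component ProgID varies by SSIS version, and everything after the source component is elided.

using Microsoft.SqlServer.Dts.Runtime;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

// Create an empty package in memory and add a data flow (pipeline) task.
Package pkg = new Package();
TaskHost th = (TaskHost)pkg.Executables.Add("STOCK:PipelineTask");
MainPipe pipe = (MainPipe)th.InnerObject;

// Add a source component to the pipe. The ProgID below is the general
// form; check the registered ID for your SSIS version.
IDTSComponentMetaData100 src = pipe.ComponentMetaDataCollection.New();
src.ComponentClassID = "DTSAdapter.OleDbSource";
CManagedComponentWrapper inst = src.Instantiate();
inst.ProvideComponentProperties();
// ...wire up a connection manager, map columns, add a destination,
// then run it with pkg.Execute().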
October 25, 2013 at 12:55 pm
Export to a delimited flat file, then bulk insert each file row into a table column big enough to hold the whole row?
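If you went that route, the load side could be a single BULK INSERT into a one-column staging table (a sketch; the connection string, file path, and dbo.RawRows table are hypothetical):

using System.Data.SqlClient;

// Load each exported line into a single wide column (e.g. varchar(max)).
// The '\0' field terminator is a null character that shouldn't occur in
// the file, so each whole line lands in the one column.
using (var conn = new SqlConnection("Server=.;Database=Target;Integrated Security=true;"))
using (var cmd = new SqlCommand(
    @"BULK INSERT dbo.RawRows FROM 'C:\etl\export.txt' " +
    @"WITH (ROWTERMINATOR = '\n', FIELDTERMINATOR = '\0');", conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}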
October 25, 2013 at 2:27 pm
corey lawson (10/25/2013)
It is the dynamic remapping of input & output fields in data flow tasks that bites... Look at BIFuture.blogspot.com for the "C# code to add a derived column to SSIS" article...
The TechNet SSIS team blog has some magic info you may need too.
With EzAPI you could potentially write a script task that creates a new in-memory package and then executes it. It is doable without EzAPI too. EzAPI makes hooking tasks together easier than the raw API calls.
EzAPI seems to work great in VS 2010.
Thanks for all the great suggestions. Wow, there is really a lot I did not know about!
I think I found the right BIFuture link you mentioned -
http://bifuture.blogspot.com/2011/01/ssis-adding-derived-column-to-ssis.html
That is a maybe.
Can you give me the "TechNet SSIS team blog" link that relates to this problem?
EzAPI is too much work for me. I read about it and will pass.
Thanks once again...and...
October 25, 2013 at 2:28 pm
Oops, double post.
October 25, 2013 at 2:57 pm
Here's some of the obscure info regarding working with connection managers in a package programmatically, to get the AcquireConnection() method to actually work:
There's another article there too, covering the same for an ADO.NET connection manager.
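For what it's worth, the script task side of AcquireConnection() on an ADO.NET connection manager looks roughly like this once the references are in place (a sketch; "SourceDb" is a hypothetical connection manager name):

using System.Data.SqlClient;
using Microsoft.SqlServer.Dts.Runtime;

// Inside a script task: AcquireConnection on an ADO.NET (SqlClient)
// connection manager returns the underlying, already-opened SqlConnection.
ConnectionManager cm = Dts.Connections["SourceDb"];
var conn = (SqlConnection)cm.AcquireConnection(Dts.Transaction);
try
{
    // ...use conn here...
}
finally
{
    cm.ReleaseConnection(conn);
}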
Here's some more links to look into:
http://blogs.msdn.com/b/mattm/archive/2008/12/30/api-sample-ado-net-source.aspx
Basically, MattM's blogs have some really useful details...
I didn't find this until the other day, oddly enough:
http://technet.microsoft.com/en-us/library/ms136025.aspx
The big thing: if you're trying to manipulate a DTS package object from a script task (control flow) instead of a script component (data flow), you'll need to find the right .NET libraries to reference in your script task project before you can use them. They're already referenced for script components.
You may need to make copies of the two DLLs (SqlServer.Dts.Pipeline.Wrapper, SqlServer.Dts.RuntimeWrapper) from the GAC and put them somewhere else so that you can add the references. I believe I posted my travails about this somewhere here on the SQL Server Central forums too...
Another thing: when you see object/interface names like "IDTS...90", those are for the SSIS 2005 libraries; "IDTS...100" is 2008; "IDTS...110" is 2010/2012...