June 17, 2009 at 2:20 pm
I created an SSIS package that pulls data and partitions it based on State and then Partition Number.
The partitioning works great; the trouble I have is the performance of writing directly to the files on the destination server, which is internal.
The script component uses the parameters and opens a StreamWriter for each state and channel. At the row level, a counter per set of states and partitions is kept to distribute the records evenly. In this case there are 6 files. The files are written directly to the destination server (\\destserv\stategroup1\file1.dat etc.).
The server the SSIS package is running on has a 1Gb card but is only using 1% of network utilization.
On the destination server side, it's using 12% of a 100Mb card.
If I copy a file generated on the source server to the destination, I get 70% utilization on the destination server and 7% on the source server.
Is there a way to increase performance here? I'm next going to test writing the files out on the source server and then file-copying them to the destination. The previous file-copy test used a file generated last month by another process.
Also, I thought this was the appropriate forum to post this in... if it should go in Performance Tuning, my apologies in advance.
June 17, 2009 at 2:46 pm
Just curious, why are you using a script task for this? I agree that you should run a test where you write the files locally and then move them to the destination server, but I have a feeling you'll find the bottleneck is in your script task.
June 17, 2009 at 3:05 pm
The number of files we send varies, and I found myself creating several flat file destinations, so I was thinking of programmatically writing these files out directly to the destination.
Before going this route I had 9 destination flat files. I used a conditional split to sort out the states, which was fine, but splitting on the next level seemed to give up the flexibility I was looking for.
So instead I decided to write the files from a script component. The flexibility was great, although it appears I suffer in performance.
Writing to local files is definitely helping. During the first 30 minutes the package has written just over 4.5 GB.
It'll probably be better to create the partitioned sets in tables (or a table) and cycle through and export.
Either way, it was fun getting this to work in a Script Task.
June 17, 2009 at 3:14 pm
You could create one file destination and use the expression builder to create the file names so your process is dynamic enough to create as many (or few) files as needed based on your partitioning logic.
As far as getting the partitions into a data flow, here's what I'd do:
1. Control flow - Execute SQL Task: call a stored procedure that retrieves just the distinct states and partition #s from the database.
2. Foreach Loop container - use this to loop through the recordset from #1 and assign the state and partition number into variables.
3. Data flow within the loop - your source query can use the variables from #2 to get the data for that partition. You then write that data to your file destination and use whatever identifying values (such as the variable values) in your expression builder to define the file name, path, etc. (a sketch of such an expression is below).
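For example, the ConnectionString expression on the flat file connection manager might look something like this (the variable names here are illustrative only, not from your package):
@[User::BasePath] + "state" + @[User::State] + "_part" + (DT_WSTR, 4) @[User::PartitionNum] + ".dat"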
Make sense?
June 17, 2009 at 8:43 pm
You have mentioned "Script Task" and "Script Component" as if they are interchangeable - they are not. Just for clarification, I'm assuming that you mean Script Task here?
If you were using a Script Component for this, I could understand why performance might be lacking ...
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
June 17, 2009 at 8:56 pm
Have you tried using BufferedStream?
June 18, 2009 at 8:07 am
I'm using a Script Component for this.
John,
Yep I see where you're going with that. That is the route I'm planning on taking if this doesn't pan out. Appreciate the help.
Thanks for the suggestion of the BufferedStream. I'm not sure exactly how to implement it.
A few excerpts from the script regarding the way I write to a file:
I declare a 2D array of streamwriters under the public class:
Public textWriter(10, 10) As StreamWriter
I loop through the states and partition counts and assign files to the streamwriters:
' Open one StreamWriter per day/channel combination.
For i = 1 To Me.Variables.numdays
    For j = chanbot To chantop
        textWriter(i, j) = New StreamWriter(basepath & "day" & CStr(i) & "idwcdp" & CStr(j) & ".txt")
    Next
    chan(i) = chanbot   ' start each day's round-robin at the bottom channel
Next
At the row level I check the value of state and write to the appropriate file. The partition (chan) for that day is cycled to write out the data evenly.
With Row
    ' Bucket the row into a day based on the state value.
    If .state <= statelim1 Then
        day = 1
    ElseIf .state > statelim1 And .state <= statelim2 Then
        day = 2
    Else
        day = 3
    End If
    textWriter(day, chan(day)).WriteLine(Row.lname)
    textWriter(day, chan(day)).WriteLine(Row.fname)
    textWriter(day, chan(day)).WriteLine(Row.address)
    ' ... etc. - one WriteLine per field
    ' Cycle that day's channel so the records are distributed evenly.
    If chan(day) = chantop Then
        chan(day) = chanbot
    Else
        chan(day) = chan(day) + 1
    End If
End With
From trying to quickly implement the BufferedStream, it appears it doesn't like StreamWriter as an object type:
Value of type 'System.IO.StreamWriter' cannot be converted to 'System.IO.Stream'
The error was from attempting this in PreExecute:
bufStream(i, j) = New BufferedStream(textWriter(i, j))
Along with this beforehand:
Public bufStream(10, 10) As BufferedStream
June 18, 2009 at 8:59 am
Actually you can use FileStream, which is also buffered. Here is how you can construct it:
textWriter(i, j) = New StreamWriter(New FileStream(basepath + "day" + CStr(i) + "idwcdp" + CStr(j) + ".txt", FileMode.CreateNew))
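And if you do want an explicit BufferedStream, note that it wraps a Stream, not a StreamWriter - that is what your conversion error was about. The nesting would go the other way around, along these lines:
textWriter(i, j) = New StreamWriter(New BufferedStream(New FileStream(basepath & "day" & CStr(i) & "idwcdp" & CStr(j) & ".txt", FileMode.CreateNew)))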
June 18, 2009 at 9:11 am
Phil Parkin (6/17/2009)
You have mentioned "Script Task" and "Script Component" as if they are interchangeable - they are not. Just for clarification, I'm assuming that you mean Script Task here? If you were using a Script Component for this, I could understand why performance might be lacking ...
Phil, you've brought up an interesting point here. I try to avoid script tasks/components unless absolutely necessary, and even then I try to use them for variable manipulation, error processing, and non-data-centric operations. Can you elaborate on the differences between the script task and component? I'm assuming that the component processes rows one by one, similar to the OLE DB Command transformation. Is this what you are referring to?
June 18, 2009 at 9:17 am
John Rowan (6/18/2009)
Phil Parkin (6/17/2009)
You have mentioned "Script Task" and "Script Component" as if they are interchangeable - they are not. Just for clarification, I'm assuming that you mean Script Task here? If you were using a Script Component for this, I could understand why performance might be lacking ...
Phil, you've brought up an interesting point here. I try to avoid script tasks/components unless absolutely necessary, and even then I try to use them for variable manipulation, error processing, and non-data-centric operations. Can you elaborate on the differences between the script task and component? I'm assuming that the component processes rows one by one, similar to the OLE DB Command transformation. Is this what you are referring to?
Actually the script task / component are very fast. They are in fact full-blown .NET classes. The performance problem is not caused by the technology used.
The difference between the script task and the script component is:
- the script task is a piece of functionality used in the SSIS package control flow.
- the script component is a piece of functionality used in the SSIS data flow.
That's it. The same terminology applies to the other parts of the SSIS framework: you have the FTP Task and then you have the Lookup Component.
June 18, 2009 at 11:26 am
I implemented your suggestion, CozyRoc, and didn't see a performance gain. I also tried increasing the DefaultBufferMaxRows and DefaultBufferSize properties on the data flow, which didn't yield anything.
I also tried using FileStream on its own, with a helper function I found in an example. The critical changes are:
Public textWriter(10, 10) As FileStream

textWriter(i, j) = New FileStream(basepath & "day" & CStr(i) & "idwcdp" & CStr(j) & ".txt", FileMode.Create)

AddText(textWriter(day, chan(day)), .lname & .fname & .address & .city & .state & .zip _
    & .zip4 & .phone & .mtgdt & .mor & .mor & .blank & .npid & .ssn)

' Encodes the string and writes the raw bytes to the stream. Note it adds no
' line terminator, and it constructs a new UTF8Encoding on every call -
' building the encoding once as a Shared field would avoid that overhead.
Private Shared Sub AddText(ByVal fs As FileStream, ByVal value As String)
    Dim info As Byte() = New UTF8Encoding(True).GetBytes(value)
    fs.Write(info, 0, info.Length)
End Sub
Again, no performance increase: roughly 1 million records written every 1 minute and 48 seconds, the same as before, if not slower.
Thanks again for the suggestions... I learned a couple of things here. If you have other suggestions I welcome them. If not, I'll go the other route of writing out one file at a time and handling the partitioning in an Execute SQL Task.
June 18, 2009 at 2:38 pm
I suppose so... thanks again for your suggestions.
June 18, 2009 at 5:47 pm
A Script Component is the way to go here, in my opinion.
File I/O is the bottleneck, so you want to limit the number of writes you do. I recommend caching the data you want to write to the file for a while and then flushing it all to the file at once.
One way to do this is to write a class that handles it for you. Basically, the class holds every line you add in memory until a limit is hit, at which point it flushes out to the file.
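Here's a minimal sketch of what I mean (the class name and the 10,000-line threshold are just illustrative - tune the limit to your memory budget):
Imports System.IO
Imports System.Text

Public Class CachedFileWriter
    Private ReadOnly _path As String
    Private ReadOnly _flushAtLines As Integer
    Private ReadOnly _buffer As New StringBuilder()
    Private _lineCount As Integer = 0

    Public Sub New(ByVal path As String, Optional ByVal flushAtLines As Integer = 10000)
        _path = path
        _flushAtLines = flushAtLines
    End Sub

    ' Cache the line in memory; only touch the disk once the limit is reached.
    Public Sub AddLine(ByVal line As String)
        _buffer.AppendLine(line)
        _lineCount += 1
        If _lineCount >= _flushAtLines Then Flush()
    End Sub

    ' Append everything cached so far to the file in a single write.
    ' (This opens and closes the file on each flush; a StreamWriter kept
    ' open for the package's lifetime would work just as well.)
    Public Sub Flush()
        If _buffer.Length = 0 Then Return
        File.AppendAllText(_path, _buffer.ToString())
        _buffer.Length = 0   ' clears the builder (Clear() needs .NET 4)
        _lineCount = 0
    End Sub
End Class
Just remember to call Flush() on each writer one last time (e.g. in PostExecute) so the tail end of each file isn't lost.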
A friend did something similar before and also created the files locally before copying them to the destination. This was the most efficient he was able to get it.