November 15, 2010 at 9:42 am
I'm trying to maximise the row through my Data flow process.
A large number of the columns are defined as VARCHAR(30). Does the SSIS dataflow treat varchar the same as the database engine? (that is to say, if a field is defined as varchar(20) and it contains only 2 characters, the size consumed is 2 bytes).
logically I think that is what should happed, but I can't seem to find an answer to confirm that on the net.
_____________________________________________________________________________MCITP: Business Intelligence Developer (2005)
November 15, 2010 at 11:39 am
The section Estimated Row Size/Length of this article seems to contradict that.
It says that the length is determined by the largest possible values in the columns. Which makes sense I think. The rows in the buffer have always the same size (unless they are all made smaller or bigger by some component), so it should be big enough to hold a row where all the columns contain their largest value. In the database however you can play with the row length in the pages. But unfortunately for you, buffers are not the same as pages.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 16, 2010 at 2:29 am
damn it. I suspected as much.
The 'largest values' row size I have to deal with is a horendous 4276 bytes! the actual data contained in the fields is probably more like 500-1000 bytes.
Thanks for answer my question.
_____________________________________________________________________________MCITP: Business Intelligence Developer (2005)
November 16, 2010 at 4:19 am
Can every column contain it's maximum size? There's no chance in shrinking the size?
If not, just don't use blocking components 🙂
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 16, 2010 at 4:32 am
But adding a semi blocking component will improve the performance. Check the following article. Hope this helps.
http://msdn.microsoft.com/en-us/library/cc966529.aspx
Regards,
Pravasis
November 16, 2010 at 5:06 am
pravasis (11/16/2010)
But adding a semi blocking component will improve the performance. Check the following article. Hope this helps.
Thanks for the very interesting article.
But I have trouble finding the section that says partial blocking transformations will improve performance. They introduce a new buffer and possibly a new worker thread, but does this automatically mean a performance improvement?
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 16, 2010 at 5:23 am
Well, it depends on the number of CPUs you are having. In Figure 6 of "Design approaches" section, you will find two additional union all which are redundant. This has been added to add an additional thread to improve the performance. Hope its clear now.
Regards,
Pravasis
November 16, 2010 at 5:27 am
da-zero (11/16/2010)
Can every column contain it's maximum size? There's no chance in shrinking the size?If not, just don't use blocking components 🙂
highly unlikely that they will - it's a poorly designed database, pretty much from the ground up, but hey, we are where we are - just got to live with it unfortunately.
the rest of the process is pretty clean a derived column task to add some audit info and a row count - thats it.
my next plan of attack is to look at reducing the field sizes, but it should be a one off move data migration, so it the effort may be worth more than the benefit...
_____________________________________________________________________________MCITP: Business Intelligence Developer (2005)
November 16, 2010 at 5:36 am
pravasis (11/16/2010)
Well, it depends on the number of CPUs you are having. In Figure 6 of "Design approaches" section, you will find two additional union all which are redundant. This has been added to add an additional thread to improve the performance. Hope its clear now.
Hmmm. The section you are referring to describes the use of parallelism to speed up a slow component. The union alls are used to reduce buffer size (I guess, there's not really an explanation) in each thread and the final union all is used to merge the two threads back together.
But it is not the same as saying "adding a semi blocking component will improve the performance". That is simply not true. It can improve performance in some occasions, but not always.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
November 16, 2010 at 5:42 am
Very correct. Probably thats the reason I suggested to go through the link to get a complete understanding and design the package accoring to the available resources and selecting best approach. Any ways, thanks for clarifying the point.
Regards,
Pravasis
November 16, 2010 at 10:57 am
pravasis (11/16/2010)
But adding a semi blocking component will improve the performance. Check the following article. Hope this helps.
Thank you for that article! It helped clarify some of the "unknown areas" of SSIS for me. In depth, yet easy to read - perfect! It definitely provoked some future awareness in this area for me.
November 17, 2010 at 9:37 pm
Even it helped me to understand the internal principles of SSIS in a better way and how we can tune the SSIS package to get the best performance.
Regards,
Pravasis
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply