SSIS Data Tuning Question

  • I'm trying to maximise the row through my Data flow process.

    A large number of the columns are defined as VARCHAR(30). Does the SSIS dataflow treat varchar the same as the database engine? (that is to say, if a field is defined as varchar(20) and it contains only 2 characters, the size consumed is 2 bytes).

    logically I think that is what should happed, but I can't seem to find an answer to confirm that on the net.

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • The section Estimated Row Size/Length of this article seems to contradict that.

    It says that the length is determined by the largest possible values in the columns. Which makes sense I think. The rows in the buffer have always the same size (unless they are all made smaller or bigger by some component), so it should be big enough to hold a row where all the columns contain their largest value. In the database however you can play with the row length in the pages. But unfortunately for you, buffers are not the same as pages.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • damn it. I suspected as much.

    The 'largest values' row size I have to deal with is a horendous 4276 bytes! the actual data contained in the fields is probably more like 500-1000 bytes.

    Thanks for answer my question.

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • Can every column contain it's maximum size? There's no chance in shrinking the size?

    If not, just don't use blocking components 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • But adding a semi blocking component will improve the performance. Check the following article. Hope this helps.

    http://msdn.microsoft.com/en-us/library/cc966529.aspx

    Regards,
    Pravasis

  • pravasis (11/16/2010)


    But adding a semi blocking component will improve the performance. Check the following article. Hope this helps.

    http://msdn.microsoft.com/en-us/library/cc966529.aspx

    Thanks for the very interesting article.

    But I have trouble finding the section that says partial blocking transformations will improve performance. They introduce a new buffer and possibly a new worker thread, but does this automatically mean a performance improvement?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Well, it depends on the number of CPUs you are having. In Figure 6 of "Design approaches" section, you will find two additional union all which are redundant. This has been added to add an additional thread to improve the performance. Hope its clear now.

    Regards,
    Pravasis

  • da-zero (11/16/2010)


    Can every column contain it's maximum size? There's no chance in shrinking the size?

    If not, just don't use blocking components 🙂

    highly unlikely that they will - it's a poorly designed database, pretty much from the ground up, but hey, we are where we are - just got to live with it unfortunately.

    the rest of the process is pretty clean a derived column task to add some audit info and a row count - thats it.

    my next plan of attack is to look at reducing the field sizes, but it should be a one off move data migration, so it the effort may be worth more than the benefit...

    _____________________________________________________________________________MCITP: Business Intelligence Developer (2005)

  • pravasis (11/16/2010)


    Well, it depends on the number of CPUs you are having. In Figure 6 of "Design approaches" section, you will find two additional union all which are redundant. This has been added to add an additional thread to improve the performance. Hope its clear now.

    Hmmm. The section you are referring to describes the use of parallelism to speed up a slow component. The union alls are used to reduce buffer size (I guess, there's not really an explanation) in each thread and the final union all is used to merge the two threads back together.

    But it is not the same as saying "adding a semi blocking component will improve the performance". That is simply not true. It can improve performance in some occasions, but not always.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Very correct. Probably thats the reason I suggested to go through the link to get a complete understanding and design the package accoring to the available resources and selecting best approach. Any ways, thanks for clarifying the point.

    Regards,
    Pravasis

  • pravasis (11/16/2010)


    But adding a semi blocking component will improve the performance. Check the following article. Hope this helps.

    http://msdn.microsoft.com/en-us/library/cc966529.aspx

    Thank you for that article! It helped clarify some of the "unknown areas" of SSIS for me. In depth, yet easy to read - perfect! It definitely provoked some future awareness in this area for me.

  • Even it helped me to understand the internal principles of SSIS in a better way and how we can tune the SSIS package to get the best performance.

    Regards,
    Pravasis

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply