Troubleshoot Data Flow Pre-Execute Process

  • Yeah, that's right. The pre-execute process.

    It's taking me three hours just to get through pre-execution on one particular Data Flow task, before it even starts loading the data.... GRRR. This is driving me nuts.

    I'm currently testing it with an increased MaxBufferRowSize and logging the BufferTuning Event, but I'm not sure that'll help. I've played with all the usual settings in the Data Flow task to speed up performance, but my biggest worry is why it's taking so long for the darn thing to think about what it's about to do.

    BTW, quick note, the Proc pulling the records for the data source references 2 different servers on the network. Linked Servers. However, I've tuned my Stored Procedure as best I can given the circumstances and since the issue is in the pre-execute, I can't figure out what might be causing the problem.

    Any thoughts?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Try and disable package validation.

    Every little bit of gained juice will help you

    ~PD

  • Pre-execute is not just thinking about what it is going to do. It is actually running the stored procedure and filling the first buffer.

    Pre-execute on some components in SSIS does a lot of work. OLEDB Sources actually execute their queries and wait for data to start coming back and Lookup components in full cache mode fill their cache and sort the data.

    Your particular problem is likely the stored procedure taking forever to return data. All is not lost - it does not have to complete to get out of the pre-execute, in most cases, it just has to start returning data. So, if you can speed up the initial return of data, you will probably see improved results overall because the procedure will run in parallel with your data flow a bit more. However, if the procedure is tuned as well as it is going to get, you could simply be finished.

    Here is another thought - the procedure is using a couple of linked servers - did you consider doing the work being done in the procedure using SSIS? SSIS is pretty good at collecting data from several sources, joining it together, and transforming it. You may find that you can do the same operation in SSIS is less time.

  • Michael,

    Actually, the SSIS package I'm building is pretty huge already and the last few times I tried to use LOOKUP, my performance ended up being worse than using a stored procedure with Left Outer Join. I'm unfortunately in a situation where the package was supposed to be done weeks ago and I can't afford delay to play with something I wasn't able to make work the first several times I tried it. So I'll have to experiment on my own time after putting what I've got into production.

    pduplessis, As far as turning off package validation goes, you can't turn off package validation. You can delay it until the individual steps are ready to run, but you can't turn it completely off. And since I was only executing one task (not the entire package), I know delay won't help.

    I've tuned and tuned my stored procedure and it's as good as it's going to get. I guess I'll just have to let it lie for now. Thanks for the advice, guys.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • If you can temporarily live with the performance, that is probably what you need to do.

    LOOKUP components are not very fast once you get to a lookup table that is of any size. Remember that the MERGE JOIN component is a pretty good alternative if you can sort your data at the data source (rather than using a SORT component).

  • MERGE JOIN is something I'll look into. The one time I played with it, I couldn't quite figure it out. I can't remember exactly what my problem was with it.

    Thanks, Michael.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • It is just a join of two sorted data sets.

    Remember, like everything else in SSIS, it is data-type, case, accent, and trailing space sensitive.

  • Brandie,

    Thanks for setting the validation part straight. One day when I am grown up, I will start playing with some serious SSIS performance tuning...

    Kirk Hassleden wrote a reaaaally good SSIS tuning article. Its old, but worthwhile the read. Maybe you can find some answers in here...

    http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/

    Good luck man

    ~PD

  • Michael,

    It just occurred to me... Doesn't using a Merge Join mean you're pulling all columns of the table into SSIS prior to the task anyway?

    I think that's why I originally went with a stored proc. Because on the SP I could do all my joins and only pull a specific # of columns instead of "pulling" all the columns of all the tables into SSIS prior to processing. Or maybe this is just an assumption I made somewhere along the way.

    PD, thanks for the link. I'll definitely look into it.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Using a MERGE JOIN, you would have to include the columns you intend to join on. With a stored procedure, you may not need these columns, but they are probably not a significant portion of the data you are consuming.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply