SSIS data import reduced to a crawl

  • Hi

    I am not looking for a solution to this because I believe I already have one, but possibly if others have suffered the same type of issues then maybe we can share the experiences.

    The problem is that when using SSIS to import data via ODBC from a 3rd pty RDBMS (Progress 10.1C) into SQL 2005 that initially the data flow runs very fast and then reduces to a crawl.

    In this instance I have to load all data from a Progress 10.1C database into SQL 2005....3.5M rows and 20 cols.....once done we will run scheduled tasks to keep data in sync. The problem I describe is representative of the 15 tables and 120M rows that I initially need to load into SQL.

    The destination table in SQL has an identical structure to the source table in Progress. The initial SQL table is a “heap”..ie there are no indexes or constraints and is empty....intend to create indexes etc after load.

    I created a new SSIS package using BIDS via the Imp/Exp Wizard...data source is Net Provider for ODBC and uses a preconfigured ODBC DSN (Progress OpenEdge Driver)...data destination is local SQL server using SQL native Client. I can connect without any problems to both dB’s.

    In the data flow task, the source is a SQL command “SELECT....from...etc”. There are no transformations/look ups/ containers etc...just a simple data load.

    When I run this either in BIDS dev or as a scheduled job from SSMS the time taken to load 3.5M rows is approx 30 minutes....when I view the data flow in BIDS this initially runs very fast and I get 1.5M rows in 5 mins...the next 3M rows take another 25 mins...????

    I have played around with various options...TABLOCK, ROWS_PER_BATCH, DefaultBufferMaxRows, DefaultBufferSize etc etc and I cannot improve on the overall load speed....it always slows down to a crawl....why???

    HOWEVER....the following is the solution I have come up with and maybe more experienced SSIS users can shed some light on this please........

    The source data table has an IDENTITY PK col....so I have used this to limit a dataflow task to only import data where the identity PK col is for example >499999 and < 1000000..ie 500K blocks. I now have seven identical dataflow tasks in one SSIS package...where the only difference is in the source select statements...to break out into 500K chunks. [b]This runs in 6 minutes ![/b] Has anyone got any thoughts/explanation for this behaviour please.......I have another 14 tables and 100M records to import. In case anyone is wondering how we might continue to keep the data in sync...we solved this first..by using Linked Server/Openquery. We get the MAX ID NO from the exg SQL table and pass that as a variable to the OPENQUERY statement that selects records > than the variable....we thought that was the hard part 😀

    Any thoughts and comments will be appreciated.

    Kind regards gah

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Can you check if there is sudden memory consumption spike at this 5 minute range? Can you make a test, forwarding the data to Trash Destination, so you can confirm this is an issue with the data source?

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • what am I supposed to do with Trash Destination?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Just follow the link provided by CozyRoc and you will see what the trash destination is and how you use it.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • CozyRoc (9/1/2009)


    Can you check if there is sudden memory consumption spike at this 5 minute range? Can you make a test, forwarding the data to Trash Destination, so you can confirm this is an issue with the data source?

    OK..there are no memory spikes on either server, nor any significant spikes in CPU..

    have mapped source to Trash Destination ...the slow down still happens....what am I supposed to be looking for in the way of results?

    thanks gah

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • This confirms the problem is with the specific ODBC source. I don't think the issue is in SSIS because this same general component is successfully working with many other ODBC drivers. I would recommend you contact Progress and ask for assistance to diagnose the issue.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc (9/2/2009)


    This confirms the problem is with the specific ODBC source. I don't think the issue is in SSIS because this same general component is successfully working with many other ODBC drivers. I would recommend you contact Progress and ask for assistance to diagnose the issue.

    Hi CozyRoc...thanks for your input and time...I do appreciate the feedback.

    For the time being I will continue with what I have got.

    We only deal with Progress thro our VER...(very expensive reseller :hehe:) ....and I am not sure that they really wish us to pursue this avenue..after all it significantly reduces our reliance on them for ad hoc reports and hence their revenue stream !

    We can sync new and updated records in near real time (every 5mins)...will roll our own solution for a full population that should at the moment run in our overnight window.

    Regards gah

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • I would like to also thank you for the detailed post and workaround you have come up for this issue. It will help many other users with similar problems.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Are you using the Merant driver for Progress? If so, Merant is a third party vendor right? Contact them instead of Progress. By the way, the Progress knowledgebase is quite good, make sure you check that out as well.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • John Rowan (9/2/2009)


    Are you using the Merant driver for Progress? If so, Merant is a third party vendor right? Contact them instead of Progress. By the way, the Progress knowledgebase is quite good, make sure you check that out as well.

    John its the driver that is bundled with 10.1C (I believe its DataDirect) and provided by Progress...previous versions were Merant drivers for Progress 9.x

    Have already looked on Progresstalk.com and PSDN...but not a lot of comment/feedback

    I have logged this with our VAR who have promised to fwd onto Progress...all we can do is wait. It maybe that there are Progress db parameters that require tweaking on the Progress sql brokers.

    In the meantime ... am still pondering on why multiple threads work and a single slows down.

    many thanks for your intrerest and suggestions.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply