SSIS, bulk import with ODBC

  • Hi all,

First of all, my bad if I am posting this in the wrong forum; I didn't see a better fit for 2012 Business Intelligence.

    Let me explain our scenario:

We are running SQL Server 2012 and we run a daily import from a Progress database to this SQL Server. However, the bulk import from the SSIS package is very slow. What we do now:

    - Recreate the table with indexes and fields etc.

    - Import with bulk import

- Delete the old table and rename the newly imported table (roughly as in the sketch below)
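    Roughly, the SQL Server side of that swap looks like this (table, column and index names are simplified placeholders; the real package handles many more columns):

    -- 1. Create a fresh staging copy of the table, structure and indexes included
    CREATE TABLE dbo.FinTrans_Staging
    (
        TransId    int           NOT NULL,
        TransDate  date          NOT NULL,
        Amount     decimal(18,2) NOT NULL,
        CONSTRAINT PK_FinTrans_Staging PRIMARY KEY (TransId)
    );

    -- 2. The SSIS data flow bulk-loads dbo.FinTrans_Staging at this point

    -- 3. Drop the old table and rename the freshly loaded one into its place
    BEGIN TRANSACTION;
    DROP TABLE dbo.FinTrans;
    EXEC sp_rename 'dbo.FinTrans_Staging', 'FinTrans';
    COMMIT TRANSACTION;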

However, the total import time is around 9 hours each day, and even then we are not importing all of the tables. Our main problem is one table that is over 3 million rows, and the import of that table alone takes around 3 hours. In my opinion that can go faster, but I simply don't know how to achieve it...

    Is there a better practice to increase the speed of the bulk import so that the total import time gets faster?

One thing that might be a problem: the Progress database the data is extracted from is read-only for us; we can't input data into that database directly (only through our application). So changes to the database are not allowed.

    Does anyone have tips to increase the speed of our SSIS package so our import runs (much) faster?

    Thx in advance!

Three million rows in three hours, now that is slow! Then again, what does the table look like? Are there tons of blobs, or what is the average row size?

You could consider adding the indexes after the bulk load, and using the TABLOCK hint. The database should be in simple or bulk_logged recovery for the bulk load to be minimally logged.
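    A rough sketch of that pattern, with made-up database, table and column names:

    -- Minimally logged loads require simple or bulk_logged recovery
    ALTER DATABASE ImportDB SET RECOVERY BULK_LOGGED;

    -- Load into a table without indexes, taking a table lock
    INSERT INTO dbo.FinTrans_Staging WITH (TABLOCK)
           (TransId, TransDate, Amount)
    SELECT TransId, TransDate, Amount
    FROM   OPENQUERY(LinkedServer, 'select "TransId", "TransDate", "Amount" from PUB."Table"');

    -- Create the indexes only after the load has finished
    CREATE UNIQUE CLUSTERED INDEX CIX_FinTrans_Staging ON dbo.FinTrans_Staging (TransId);

    -- Switch back afterwards if you normally run in full recovery
    ALTER DATABASE ImportDB SET RECOVERY FULL;

    In the SSIS data flow itself, the counterpart of the TABLOCK hint is ticking the Table lock option on the OLE DB Destination with fast load.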

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Thx for the TABLOCK tip. How can I find the size of a row? 🙂

The easiest way to find the average data size of a row in a table is to run sp_spaceused on the table, and then compute:

    SELECT convert(bigint, reserved) * 8192 / rows

    This gives you the result in bytes.

Or for that matter, since you recreate the table, the size of the table after the load is just as relevant. The whole point is that the number of rows is not the most important metric; the total size is.
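    Or, if you want it in a single statement, you can read the figures straight from the catalog views (dbo.FinTrans is just a placeholder, replace it with your own table name):

    -- Total reserved bytes divided by the number of rows in the table
    SELECT SUM(ps.reserved_page_count) * 8192
           / NULLIF(SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count END), 0) AS avg_bytes_per_row
    FROM   sys.dm_db_partition_stats AS ps
    WHERE  ps.object_id = OBJECT_ID('dbo.FinTrans');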

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Result of sp_spaceused:

name: Table
rows: 38131415
reserved: 9961568 KB
data: 4952240 KB
index_size: 5007848 KB
unused: 1480 KB

Size reserved per row: 2140 bytes

    with:

SELECT *
FROM OPENQUERY(LinkedServer, 'select * from PUB."Table" where "dd_sys-dat" = sysdate')

This query returns 62 rows and takes up to 00:07:12, and the total size of this table is 38131415 rows; in your opinion is this too long? It is a Progress database.

  • Sorry, but this is utterly confusing. In your initial post, you talked about a three-million row table, which you truncated and reinserted through a bulk-load operation in SSIS.

    Now you give me data about a table with 38 million rows and of a total size of 81 GB. And out of nowhere, there comes a Progress database which you are fetching data from. I assumed that you were reading data from file.

    If you are in fact importing 81 GB of data with indexes in place, three hours is not that excessive.

    As for whether the performance for your Progress query is acceptable, I am not the person to say; I don't work with Progress. But I assume that Progress has indexes as well, and an index could serve you well here.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

As Erland said, the above statistics seem like reasonable performance to me with indexes in place. But if the destination database is not an operational database but just an OLAP database, then consider adding the indexes after the data is fully loaded into the destination table. What I mean is: with the indexes in place, data transfer and index maintenance happen simultaneously. In some cases that is needed to protect integrity (mostly OLTP databases); in other cases it is not (mostly OLAP databases).
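    For an existing destination table, the pattern looks roughly like this (index and table names are just examples):

    -- Disable the nonclustered indexes before the load
    -- (do not disable the clustered index, that makes the table unreadable)
    ALTER INDEX IX_FinTrans_TransDate ON dbo.FinTrans DISABLE;

    -- ... run the SSIS data flow / bulk load here ...

    -- Rebuild everything once, after all rows are in
    ALTER INDEX ALL ON dbo.FinTrans REBUILD;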

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

Hmm, sorry for the late reply, busy times here...

For the record, it isn't an OLAP database, it is our operational database. But since my last post the size has grown a lot; it now takes 5 hours to import the complete table (all of our financial transactions). During the day it is kept up to date with an OPENQUERY() each hour.

However, what I see is that an SSIS package in this case is much faster, BUT that is with a complete fresh import, not with a WHERE clause in the Progress query. So which is faster: a stored procedure with OPENQUERY, or an SSIS package with an OLE DB connection query?

Then a last question: is it smarter to add the indexes BEFORE inserting the 38 million rows, or to add the indexes AFTER the complete import of the table is done?

  • If imports are taking that long, maybe you should consider incremental loads, rather than truncating and reloading? This may require logic to detect change, and it may require changes in the Progress database, but it may pay off in the long run.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • That could be an idea,

however, I am not familiar with writing Progress queries and I always get a type collision...

Besides that, the Progress database is a read-only connection for us: data input is only done through the application layer, and we are not allowed, and will never get the ability, to alter the tables within Progress. So that isn't an option. Is there a smart way to do an hourly import and make it really fast (like 30 seconds or less on the 38-million-row table)?

Truncating the table and reloading is not defensible in the long run. Your best option is to invest time with the Progress application to find out how you can identify deltas, and to work for any necessary changes. Just shrugging your shoulders is somewhat irresponsible in my opinion.

The second best option is to write a stored procedure that accepts a table-valued parameter and then runs a MERGE statement where you compare on all the columns in the table. It will not complete in 30 seconds, but you will not have to rebuild the indexes every time.
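    A skeleton of what I mean, with invented type, table and column names (your table has many more columns, and nullable columns need IS NULL handling in the comparison):

    -- Table type that mirrors the target table
    CREATE TYPE dbo.FinTrans_Type AS TABLE
    (
        TransId    int           NOT NULL PRIMARY KEY,
        TransDate  date          NOT NULL,
        Amount     decimal(18,2) NOT NULL
    );
    GO
    CREATE PROCEDURE dbo.FinTrans_Upsert @Rows dbo.FinTrans_Type READONLY
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE dbo.FinTrans AS tgt
        USING @Rows        AS src ON tgt.TransId = src.TransId
        WHEN MATCHED AND (tgt.TransDate <> src.TransDate OR tgt.Amount <> src.Amount)
            THEN UPDATE SET TransDate = src.TransDate,
                            Amount    = src.Amount
        WHEN NOT MATCHED BY TARGET
            THEN INSERT (TransId, TransDate, Amount)
                 VALUES (src.TransId, src.TransDate, src.Amount)
        WHEN NOT MATCHED BY SOURCE
            THEN DELETE;
    END;

    The WHEN NOT MATCHED BY SOURCE branch assumes you still send the complete table every time; if you only send rows that may have changed, leave that branch out.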

    My assumption is that less than 1% of the data actually changes from day to day.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

What do you mean by deltas?

And is it faster to compare if the row changed, or just to reload the table?

Because I have 2 fields I can check to see if there is new data (a date field and a UNIX time field where 0 is the beginning of that day).

However, if I try to write a Progress query for it:

    SELECT * FROM PUB."ab-149" WHERE dd-sys_dat = @Current_Date AND dd-sys_time = @Current_Time

I get an error that there is a collision between the types. I tried to convert the variables within the Progress query, but that wasn't a solution, I just kept getting errors... 🙁

  • prennings (10/19/2013)


What do you mean by deltas?

From what you have told me, you are loading data from a Progress table every day. I assume that you don't do it just to pass the time, but because there have been changes. Exactly what changes I don't know, but I will have to assume that there are new rows, changed rows and deleted rows. Those changes are the delta.

And is it faster to compare if the row changed, or just to reload the table?

    Yes. Even if it is that bad that you need to scan the entire source table for changes. (Which it appears that you do not.)

However, if I try to write a Progress query for it:

    SELECT * FROM PUB."ab-149" WHERE dd-sys_dat = @Current_Date AND dd-sys_time = @Current_Time

I get an error that there is a collision between the types. I tried to convert the variables within the Progress query, but that wasn't a solution, I just kept getting errors... 🙁

    So this is a forum for SQL Server and not for Progress. Or are you running a query against the Progress database from SQL Server?

    Whatever, you need to get that query going.
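    One thing to check: OPENQUERY does not accept variables for its arguments, so the whole pass-through query has to be built as a string and run through dynamic SQL. Something along these lines (the linked server name is taken from your earlier post, and the date literal format that Progress expects is an assumption you should verify against your ODBC driver's documentation):

    DECLARE @d        char(10) = CONVERT(char(10), GETDATE(), 120),  -- yyyy-mm-dd
            @progress nvarchar(max),
            @sql      nvarchar(max);

    -- The statement that goes to Progress, with the date spliced in as a literal
    SET @progress = N'select * from PUB."ab-149" where "dd-sys_dat" = ''' + @d + N'''';

    -- OPENQUERY only takes a string literal, so wrap the call in dynamic SQL
    SET @sql = N'SELECT * FROM OPENQUERY(LinkedServer, ''' +
               REPLACE(@progress, N'''', N'''''') + N''')';

    EXEC (@sql);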

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

What version of Progress (OpenEdge) are you running?

What version of the ODBC driver have you installed?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day
