Loading change data from AS400 (or other source systems)

  • Well it is looking like the SSIS method is not going to work either sigh 😉

    Apparently there is a bug when using a Lookup transform on DB2 data when the column is a Decimal data type. Unfortunately, several columns in our source system (DB2) are Decimal, and many of them would need to be compared, making this bug rather prohibitive! However, I would still like to thank you, Robert, as I did read the Stairway to SSIS series, learned quite a bit, and was successful with incremental loads on several tables; there are just some major ones that won't work because of the data conversion issue mentioned above!

    So it's back to the OPENQUERY method. Hopefully Jeff can chime back in here regarding some more effective methods 🙂

    I have also revised my insert and update methods to filter on the entry and change dates (I had forgotten to do so before), and that has sped things up considerably. However, the deletion process still takes forever, as it has to scan the whole source table to find which destination rows are no longer there, and that takes just as long as loading the data.

    Link to my blog http://notyelf.com/

  • shannonjk (12/14/2012)


    Well it is looking like the SSIS method is not going to work either sigh 😉

    Apparently there is a bug when using a Lookup transform on DB2 data when the column is a Decimal data type. Unfortunately, several columns in our source system (DB2) are Decimal, and many of them would need to be compared, making this bug rather prohibitive! However, I would still like to thank you, Robert, as I did read the Stairway to SSIS series, learned quite a bit, and was successful with incremental loads on several tables; there are just some major ones that won't work because of the data conversion issue mentioned above!

    So it's back to the OPENQUERY method. Hopefully Jeff can chime back in here regarding some more effective methods 🙂

    I have also revised my insert and update methods to filter on the entry and change dates (I had forgotten to do so before), and that has sped things up considerably. However, the deletion process still takes forever, as it has to scan the whole source table to find which destination rows are no longer there, and that takes just as long as loading the data.

    I'm not familiar with the SSIS-DB2 bug, but why couldn't you cast the Decimal to something else, say an integer, and then undo the math on the SSIS side? So if the decimal is 123.45, multiply by 100 and bring it into SSIS as 12345; divide by 100 in SSIS and then write the 123.45 into your DW. I'm sure it's not as simple as that, but there must be a way to work around this, especially since you're already bringing that data into your DW.
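    That workaround might look something like this on the source-query side; the library, table, and column names below are made up purely for illustration:

    ```sql
    -- Hypothetical DB2-side query for the SSIS source: scale the DECIMAL
    -- up to a whole number so the Lookup never touches a Decimal column.
    SELECT ITEMKEY,
           CAST(UNITPRICE * 100 AS BIGINT) AS UNITPRICE_SCALED  -- 123.45 -> 12345
    FROM   MYLIB.ITEMFILE

    -- Back in SSIS, a Derived Column (or the final insert into the DW)
    -- divides by 100.0 to restore the original value: 12345 -> 123.45
    ```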

    Just a thought,

    Rob

  • What is the actual error that you are getting?

    The reason that I ask is because I get a bogus truncation error on about a dozen tables going from DB2 to staging.

    If you use OPENQUERY, you can use a MERGE statement.
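    For what it's worth, a minimal sketch of a MERGE over OPENQUERY; the linked server, table, and column names here are hypothetical:

    ```sql
    -- Inserts new rows, updates changed ones, and deletes rows that have
    -- disappeared from the source, all in one statement.
    MERGE dbo.destinationtable AS tgt
    USING OPENQUERY(sourceserv,
            'SELECT ITEMKEY, ITEMDESC FROM MYLIB.ITEMFILE') AS src
          ON tgt.ItemKey = src.ITEMKEY
    WHEN MATCHED AND tgt.ItemDesc <> src.ITEMDESC THEN
        UPDATE SET tgt.ItemDesc = src.ITEMDESC
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ItemKey, ItemDesc) VALUES (src.ITEMKEY, src.ITEMDESC)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
    ```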

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Robert, the source table is referenced inside the Lookup transform itself, so I do not know of a way to feed it through anything else to change the data types. I have tried 'Use the results of a SQL query' to cast the data type, but that does not work either. Additionally, doing that over 12 million rows could be slow enough to defeat the original point anyway 😉

    Welsh, the error is that the data types do not match, yet after digging into the Advanced Editor and looking at the input/output columns, all the data types are exactly the same. Using the MERGE query yields the same extraordinarily slow results as the LEFT JOIN (tested and verified).

    Link to my blog http://notyelf.com/

  • shannonjk (12/17/2012)


    Robert, the source table is referenced inside the Lookup transform itself, so I do not know of a way to feed it through anything else to change the data types. I have tried 'Use the results of a SQL query' to cast the data type, but that does not work either. Additionally, doing that over 12 million rows could be slow enough to defeat the original point anyway 😉

    Welsh, the error is that the data types do not match, yet after digging into the Advanced Editor and looking at the input/output columns, all the data types are exactly the same. Using the MERGE query yields the same extraordinarily slow results as the LEFT JOIN (tested and verified).

    I never stated that the MERGE would be fast, and it is not, because you are using OPENQUERY. I avoid it because it is terribly slow and I could not load the data within the window that I have.

    Have you considered using the Data Conversion Task?

    I have tables that do not have insert date or update date columns. I can't add columns to the tables because the AS400 is hosted by a vendor and it is proprietary.

    I'm using a combination of a Lookup Task for inserts and a hash for updates.

    I'm loading hundreds of tables from the DB2 data source alone, and several have more than 12 million rows.

    I'm not using OPENQUERY. I'm redirecting the records that fail to a text file.

    You have gotten a lot of good advice from the forum members; hopefully things work out for you. :-)

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I never stated that the MERGE would be fast, and it is not, because you are using OPENQUERY. I avoid it because it is terribly slow and I could not load the data within the window that I have.

    Have you considered using the Data Conversion Task?

    I have tables that do not have insert date or update date columns. I can't add columns to the tables because the AS400 is hosted by a vendor and it is proprietary.

    I'm using a combination of a Lookup Task for inserts and a hash for updates.

    I'm loading hundreds of tables from the DB2 data source alone, and several have more than 12 million rows.

    I'm not using OPENQUERY. I'm redirecting the records that fail to a text file.

    You have gotten a lot of good advice from the forum members; hopefully things work out for you. :-)

    Ah, I misunderstood the intention of the MERGE statement then. I agree; so far, when connecting to the AS400 (which is also a hosted solution), using OPENQUERY is horribly slow.

    I guess what I don't understand is this: according to the Stairway to SSIS articles, I would use the OLE DB Source for the data that is in my data warehouse, and the Lookup transform would hold the source data; I would then redirect the rows that didn't match to an insert. The Lookup won't work, however, as that is where I get my error when attempting to connect the columns from the destination to the source (even though the data types are exact).

    So, if the source (AS400) data is defined in the Lookup itself, I am not sure how I would go about converting its data using a Data Conversion step in the data flow. Does that make sense?

    Perhaps, since we both have similar situations, we could communicate in private so you could show me an example of how you have this set up? That would be greatly appreciated!

    And yes, I am very happy with the advice; it may not have worked exactly yet, but I have found other uses for both suggestions, lol. I have also gotten the incremental load to work fine, just not when connecting to the AS400. The OPENQUERY is also super useful in some other tasks I have. All in all, I am learning a lot, just trying to deal with the shortcomings of my company's current setup 🙂

    Link to my blog http://notyelf.com/

  • I need to correct a statement that I made with respect to OPENQUERY being slow.

    OPENQUERY itself is not necessarily slow just because it is using a linked server to an AS400.

    I suspect that the bottleneck is the bandwidth of the connection to the remote AS400.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • shannonjk (12/12/2012)


    Ok so finally getting the testing done on this.

    So far I am finding OPENQUERY to be a slower method. Are there particular things I should be utilizing in order to accomplish this? I am simply starting off with the basic insert script of

    select t1.column1, t1.column2
    from openquery(sourceserv, 'select key, column1, column2 from table') as t1
    left join destinationtable as t2
        on t1.key = t2.key
    where t2.key is null;

    It is taking quite a long time to accomplish but perhaps I am not using the best method here?

    That's basically a join across a linked server, which probably won't be the fastest car on the track. How many rows are in the table you've identified in the OPENQUERY?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, I believe OPENQUERY in this case is limited by the connection to the AS400 (which is hosted off-site from the SQL Server; we have a 50 Mb connection versus the original Gb connection), which is why I am trying to switch to the incremental load so we are loading less data, and loading it intelligently.

    Jeff, the table in the openquery has about 12 million rows in it.

    Link to my blog http://notyelf.com/

  • shannonjk (12/18/2012)


    Yes, I believe OPENQUERY in this case is limited by the connection to the AS400 (which is hosted off-site from the SQL Server; we have a 50 Mb connection versus the original Gb connection), which is why I am trying to switch to the incremental load so we are loading less data, and loading it intelligently.

    Jeff, the table in the openquery has about 12 million rows in it.

    A LEFT JOIN with a NULL detector is one of the slowest ways to check for things not present, even within SQL Server, never mind across a linked server. Explore which is better in this case... WHERE NOT IN or WHERE NOT EXISTS. I haven't tried it yet, but an EXCEPT just between key IDs might be the way to go, as well. MERGE certainly is an option, but I can't really vouch for its performance because I'm still stuck in the 2k5 world at work. My gut tells me that it would be as slow as the LEFT JOIN/NULL detect across linked servers, but I could certainly be wrong on that.
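    Sketches of those two alternatives, reusing the made-up linked server and column names from the earlier query:

    ```sql
    -- NOT EXISTS version of the new-row check.
    select t1.key, t1.column1, t1.column2
    from openquery(sourceserv, 'select key, column1, column2 from table') as t1
    where not exists (select 1
                      from destinationtable as t2
                      where t2.key = t1.key);

    -- EXCEPT between just the key IDs.
    select t1.key
    from openquery(sourceserv, 'select key from table') as t1
    except
    select t2.key from destinationtable as t2;
    ```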

    You may also be fighting an implicit conversion because of the datatype disparity between SQL Server and DB2. I believe I'd write a small, limited, inner join query of known key IDs and see if you're getting seeks on the SQL Server side. If you're not, you may have to change the datatype on the SQL Server side.
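    One way to run that seek test, with a few made-up key values:

    ```sql
    -- Join a handful of known key IDs against the destination table and
    -- inspect the actual execution plan: an Index Seek suggests the types
    -- line up; a Scan hints at an implicit conversion on the SQL Server side.
    select t2.*
    from (values (1001), (1002), (1003)) as probe(keyval)
    inner join destinationtable as t2
        on t2.[key] = probe.keyval;
    ```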

    There's another thing that a lot of people don't know about AS400's. Their cache speed is reliant on a little battery pack inside the machine. Most people don't even know it exists. If the battery is out of juice, the cache doesn't work... PERIOD! This is actually a part of the reason why some people think AS400's are slow. Seriously... I just went through this at work.

    Also, some people forget to expose the indexes on the AS400 side when they make such a DB2 "FILE" (as the DB2 folks call it) available. I forget what the name of the process is for doing that but I'll ask my AS400 guy tomorrow at work.

    The AS400 also has the equivalent of a WITH(NOLOCK) hint that you can include in the query. I'll find that out, as well. Rumor has it that it does NOT have the same problems as the SQL Server hint has.

    Last but not least, when you do your laundry, do you wash the clean clothes? I don't. With that in mind, if you told the AS400 to keep track (using a "write back" from SQL Server) of what it has already sent to SQL Server, either by a date column or some sort of flag, and built an index on that as well, you could exclude a whole bunch of rows on the AS400 side. It won't help at all on the first run, but it'll help a whole lot on subsequent runs.
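    A related technique that needs no write access to the AS400 is to keep the high-water mark on the SQL Server side and push the filter into the OPENQUERY text (OPENQUERY will not accept a variable, so the statement has to be built dynamically). All names below are hypothetical, and it assumes the change date is stored numerically (e.g. 20121217), as is common on the AS400:

    ```sql
    -- Highest change date already loaded into the warehouse.
    declare @lastLoaded varchar(8);
    select @lastLoaded = convert(varchar(8), max(change_date))
    from dbo.destinationtable;

    -- Build the pass-through query so the WHERE clause runs (and can use
    -- an index) on the AS400, not on the SQL Server side.
    declare @sql nvarchar(max) =
        N'select key, column1, change_date
          from openquery(sourceserv,
               ''select key, column1, change_date
                 from table
                 where change_date > ' + @lastLoaded + N''')';
    exec (@sql);
    ```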

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, thank you! That was quite a lot of useful information. Yes, I went with the LEFT JOIN/NULL detect, then I tried the EXCEPT on the keys, which was just as slow, followed by the MERGE. I did not know that WHERE NOT IN or WHERE NOT EXISTS could be faster, so that is good to know.

    In regards to your laundry analogy, I will have to research whether I have that capability on our AS400; right now I am pretty sure I am in read-only mode, but I will check with the AS400 admin, as that may help. I do filter inserts on just yesterday's date (or a date range from the last date loaded through yesterday), and that does seem to help a bit. I think the deletes are really going to be the only pain; the inserts and changes are easy because there is an entry date and a change date to filter on.

    I will check on this soon and let you know how it goes. I will also check regarding the seeks.

    Link to my blog http://notyelf.com/

  • The DB2 equivalent to WITH (NOLOCK) is described in the following post:

    http://www.sqlservercentral.com/Forums/Topic1040710-338-1.aspx#bm1040762
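    In short, DB2 supports an isolation clause on the statement itself; WITH UR (uncommitted read) is the rough analogue of WITH (NOLOCK). A sketch against the same hypothetical linked server used above:

    ```sql
    -- The 'with ur' clause is evaluated on the DB2 side, inside the
    -- pass-through query, giving a dirty read much like NOLOCK.
    select *
    from openquery(sourceserv,
         'select column1, column2 from table with ur');
    ```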

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
