August 26, 2008 at 6:10 pm
Okay, I am going to jump back into the thread. I was out of the office all day today otherwise I would have jumped in sooner.
Based on what has been described as needed I would likely agree with your boss. If you are loading data from a text file and need to create or update a column then doing it in a dataflow should be fairly simple. I agree that it is probably easier as a SQL guy to dump it into a staging table and use T-SQL to manipulate it, but then you are adding storage and IO to your package. A lookup component should work. If the join column is character data then you may need to disable caching in order to get a match and/or trim and same case the strings.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
August 27, 2008 at 3:57 am
If you're trying to avoid using the OLE DB Command in the Data Flow (which I recommend because it's RBAR), then you might try using a Derived Column transformation after either a Lookup or a left outer Merge Join.
If you don't have multiple tables to use (staging & regular), then load the text file into an ADO recordset in one Data Flow, then in the next, read in the recordset, do your lookups and set the derived column to 'Y' for those that meet the criteria and "whatever" for those that don't meet the criteria. Load that set into your OLE DB Destination with the derived column being used instead of the original column and you should be set.
August 27, 2008 at 12:18 pm
Frank,
I decided to stage data into STAGE_DDA_REV_ADJ table
because I need to see new COLUMN NAMES. Not File Source column names.
It's confusing.
So again, I still don't understand how I can implement this SQL
using Lookup and Derived Columns.
UPDATE STAGE_DDA_REV_ADJ
SET valid = 'Y'
FROM
STAGE_DDA_REV_ADJ STG,
CIS_ACCOUNT CUR
WHERE
CUR.CUR_ACCT_NUM = STG.CUR_ACCT_NUM AND
STG.CUST_AGRMNT_NUM IS NULL
If in [Lookup / Reference Table Tab] I use query:
SELECT CUR_ACCT_NUM FROM CIS_ACCOUNT
that gives me one column in the next [Tab - Columns].
It is CUR_ACCT_NUM.
Frank:
The output I'd expect here is the cust_agrmnt_num for any cur_acct_num's that exist in your database table and NULL for those that don't.
But I don't get cust_agrmnt_num as an output.
I only get one output column - CUR_ACCT_NUM
But what the use of it?
I add Derived Columns object.
In Derived Columns I drag "valid" column to the Derived Column Name field.
In Expression I enter - ISNULL( [CUST_AGRMNT_NUM] ) ? "Y"
In Derived Column field I choose 'replace valid'.
The expression stays in red color
and I get error
"Cannot parse the expression ISNULL( [CUST_AGRMNT_NUM] ) ? "Y".....
Am I doing the right thing?
--2.--
Another problem is how do I implement this SQL:
update stage_dda_rev_adj
set cust_agrmnt_num= null
where rtrim(ltrim(cust_agrmnt_num))=''
I tried Derived Columns / Expression:
CUST_AGRMNT_NUM == "" ? NULL
No luck. Parsing Error.
I tried Expression: CUST_AGRMNT_NUM == "" ? NULL(DT_STR, 22, 1251)
Same error.
August 27, 2008 at 12:44 pm
Small example.
In your dataflow, you have the following:
OLE DB Source1 -> Stage_DDA_Rev_Adj
OLE DB Source2 -> CIS_ACCount
--NOTE: Both sources have to be sorted. Advanced Editor, go to last tab, click on the top level of the Output heirarchy. IsSorted = True (and make sure the sources are sorted in your source queries with an OrderBy). Open the Output heirarchy and choose the Sort order for your columns. I recommend setting only one column, Cur_Acct_Num on both sources, with a setting of 1.
Merge Join -> Set to Left outer with Source1 as Left input and Source2 as Right input. Choose all columns from left input and Cur_Acct_Num from Right input (there are boxes in the screen).
Conditional Split -> Add formula for ISNULL(CIS_Account.Cur_Acct_Num) == TRUE. Add formula for ISNULL(CIS_Account.Cur_Acct_Num) == FALSE. Precedence constraint from true step leads to DerivedColumn1. False step leads to DerivedColumn2.
DerivedColumn1 -> Create a new column (NewValid for example) where the value is NULL.
DerivedColumn2 -> Create a new column (NewValid for example) where the value is 'Y'.
Each Derived column task leads to separate Ole DB destinations where the NewValid Column is mapped to the proper column in your destination table. Or you can UNION the results (not recommend. Takes longer to process) and send them both to the same OLE DB Dest. Task.
The above process is a little bulky. It's the long way around Lookup (which you can use straight off your OLE DB Source1), but Lookup works the same way. You just use a Failure from Lookup to go to your DerivedColumn1 and a Success to go to DerivedColumn2.
August 27, 2008 at 1:48 pm
I found the answer to one of my questions:
To update to NULL if empty, you need to use Expression in Derived Coluns:
CUST_AGRMNT_NUM == "" ? (DT_STR,22,1252)NULL(DT_STR,22,1252) : CUST_AGRMNT_NUM
Jesus!
What takes seconds in T-SQL takes hours in SSIS!
I have less and less confidence in SSIS Derived Columns/Lookup objects.
August 28, 2008 at 3:44 am
Hi Riga,
I think my post wasn't clear enough and so I'll try to answer each point 1 by 1.
I decided to stage data into STAGE_DDA_REV_ADJ table
because I need to see new COLUMN NAMES. Not File Source column names.
It shouldn't matter what the column names are. When you insert to your OLEDB Destination you decide the mappings. Also, in most (if not all) components you have the opportunity to alias your output so you can call the column names in the pipeline anything you like.
If in [Lookup / Reference Table Tab] I use query:
SELECT CUR_ACCT_NUM FROM CIS_ACCOUNT
Your query should be
SELECT CUR_ACCT_NUM, CUST_AGRMT_NUM FROM CIS_ACCOUNT
You will then see the CUST_AGRMT_NUM column becomes available for you to add to your pipeline.
If you create your join between CURR_ACCT_NUM coming through the pipeline from the flat file and the CURR_ACCT_NUM from your lookup table, you will then have the following outputs:-
(Green arrow) Successful lookups - all those records from the flat file where the lookup was able to find a CURR_ACCT_NUM in CIS_ACCOUNT. Because you have checked CUST_AGRMT_NUM you will also have this column added to the pipeline. The value of which you will use downstream. This part of the dataflow is replicating the where condition in your update
CUR.CUR_ACCT_NUM = STG.CUR_ACCT_NUM
(the other part of the WHERE clause we deal with in an expression).
(Red arrow) Failed lookups - all those records from the flat file where the lookup was unable to find a record in CIS_ACCOUNT whose CURR_ACCT_NUM matched.
You have a couple of options here. If you wanted to keep these nonmatching rows as part of the final dataset for insert, you can pull the red output to another component and union it downstream, or you can change the error behaviour of the lookup so that it ignores errors and sends all output down the green pipeline.
The green / successful output is the one you need to work with (for the purposes of replicating your update statement). If you pull this into your derived column you will now have the column CUST_AGRMT_NUM to work with. You add a new column as at this stage there is nothing to replace. Call it "IsValid". The expression that replicates the last part of your WHERE clause
AND
STG.CUST_AGRMNT_NUM IS NULL
is ISNULL(CUST_AGRMNT_NUM)?"Y":"N" .
In summary, you now have added a new column to your pipeline which will have the value "Y" for all those records that a) had a matching CURR_ACCT_NUM in CIS_ACCOUNT and b) Had a null value for CUST_AGRMNT_NUM in that record.
In Expression I enter - ISNULL( [CUST_AGRMNT_NUM] ) ? "Y"
The expression language used in SSIS is very powerful and can do many useful things however it can be hard to debug. The expression you're entering is whats called a ternary expression and needs three arguments. [Test Expression]?[ValueOrExpressionIfTestSucceeds]:[ValueOrExpressionIfTestFails]
All arguments need to be present. From your post you've missed the final argument in the expression.
Good luck 🙂
Kindest Regards,
Frank Bazan
August 28, 2008 at 4:09 am
The speed of SSIS is supposed to be much better in SQL 2008. Apparently MS learned from the 2005 implementation and have added some "umph" to the engine. @=)
August 28, 2008 at 2:28 pm
Frank,
You wrote
Your query should be
SELECT CUR_ACCT_NUM, CUST_AGRMT_NUM FROM CIS_ACCOUNT
You will then see the CUST_AGRMT_NUM column becomes available for you to add to your pipeline.
But CIS_ACCOUNT doesn't have CUST_AGRMT_NUM column ?..
August 29, 2008 at 3:06 am
In which case, your lookup query was correct and you just need to adjust the expression downstream. (TRIM(CUST_AGRMNT_NUM) == "")||(ISNULL(CUST_AGRMT_NUM))? "Y":"N"
Hope this helps
Kindest Regards,
Frank Bazan
August 29, 2008 at 7:15 am
I agree that the boss probably wants to eliminate "execute SQL" statements because he just plain doesn't understand and doesn't have enough knowledge. My opinion is that if it can be done from SQL in the database, it should be done there because it will perform much better. If you are doing looping, conditions, or processing based on a lot of variables, then SSIS will help you, but the basics should still be done in the database.
Hope this helps.
Karen
August 29, 2008 at 7:36 am
In some circumstances using SQL can be the right way to go, but I reckon that in this example performing the logic in the dataflow will prove quicker.
What you're effectively saying is that a bulk insert from a csv to a temp table, followed by an update on the temp table, followed by another insert to the main table, is faster than an insert from a csv file straight into the main table? I'm afraid I can't see how that's faster?
Kindest Regards,
Frank Bazan
August 29, 2008 at 7:52 am
No, that is not what I said. I answered the first question only. Sorry to have confused you or anybody else who read my reply.
Karen
Viewing 12 posts - 16 through 26 (of 26 total)
You must be logged in to reply to this topic. Login to reply