Filter rows from source file

  • Hi

    I want to transform a flat file records to a database table using SSIS data flow task, it is working properly, but I want to filter my file record on basis of a particular column value before transforming into distination table.

    I am using flat file source for input and Oledb distination for output.

    can anybody help me

  • You want to use a conditional split component in your data flow. This will allow you to specify an expression (MyCol=="A") to split data into multiple outputs.

    It is not necessary to use all of the outputs - so if you simply want to discard some data just do not connect the output to anything.

  • Hi

    Thanks

    its working properly. Now I want some other task, I do not want to insert records from flat file to table every time, I want to insert only that record which not exist in distination table already, if exist then I want to update only the value on the basis of key column.

    How can I do this?

    Please help.

    Thanks

    Vaseem

  • Ok, that is getting into a bigger subject.

    What you are trying to do is exactly the same thing as a "Type 1 Slowly Changing Dimension" update. You have several options:

    1) There is a "Slowly Changing Dimension" update component. It is actually a wizard that creates an update and an insert (and some other components if necessary) that inserts new records and updates existing records. This component is easy to use, but slow for tables with more than 1000 records.

    2) You can use a lookup component to find existing records and manually configure an insert off of the lookup failure output and an update (OLEDB Command transformation) from the lookup success. The OLEDB Command transformation has an output, but you do not have to have it go anywhere, so you can use this to update individual records. This is another relatively slow option, but performance will be better than #1.

    3) You can use a MERGE JOIN component with a FULL OUTER JOIN to find matching records and then a conditional split to send data to the insert or the update. This option will work for larger tables, but not huge ones (under 10,000 records).

    4) You can download and use the free TableDifference component found here:

    http://www.sqlbi.com/Projects/TableDifference/tabid/74/language/en-US/Default.aspx There are instructions on the SQLBI site - this component is pretty much the MERGE JOIN and CONDITIONAL SPLIT from #3 all in one component.

    5) You can load your data into a staging table and use T-SQL to do the insert and update. This option is the way to go for very large tables. SSIS does all of it's comparisons one row at a time. It is way better at RBAR activity than the MSSQL database engine, but a set-based comparison can be tuned to work much better.

    I would suggest searching on google for "MSSQL Type 1 Slowly Changing Dimension" to see other options. Look for data warehousing and BI related topics. You may not be building a data warehouse, but the BI community does this type of table update all the time and they will have the best information for you.

  • Thanks for prompt response.

    i will check the options which you are suggesting but performance will be the issue since I m working on huge records more than 3 lakhs, so this will be a major issue.

    Can I use delete then insert logic? Is there any built in facility available or I have to manually perform delete operation (using any sql task component).

    Please suggest?

    Thanks

    Vaseem

  • You could truncate the table and then insert, but with a large number of records, this is pretty inefficient unless most of the records actually change.

    You will probably want to stage the records in an additional table and then use T-SQL to do the insert and update. This makes it pretty simple:

    1) Truncate "staging" table

    2) Insert all records into staging table

    3) Update all existing records by joining on the key field

    4) Insert any records that are in staging but not in your final table

    - If you were using SQL 2008, you could do what is often referred to as an UPSERT -

    Like I said, look on Google for some additional options and real-life performance information before you make a decision.

  • Hi

    Staging table logic works great.

    Now I have some other requirement. I want to do all these task on a specific condition else some other task. Is there any method in which I can put all these task in one condition and rest in some others or I have to use Script component.

    please suggest?

    Thanx

    Vaseem

  • You were not very specific, but conditional logic can be handled in the control flow through the Precedence Constraints. Double-click on one of your control flow connectors (the green arrows) and fill in the logic in the dialog that appears.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply