May 12, 2010 at 9:18 am
What's the simplest way to get a distinct list of values from a table using SSIS?
For example, if a SQL table contains Customer names and I just want to get the distinct list of customer Last Names (don't care about first names), how can this be done using built-in SSIS features without using SQL statements?
Thanks in advance
--pete
May 12, 2010 at 9:21 am
Use an Aggregate transform. T-SQL is faster though.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
May 12, 2010 at 9:24 am
I agree T-SQL would be A LOT faster.. What is the aversion to SELECT statements, in almost ALL cases they perform better than chosing the name off the list. So much so that I NEVER just select a tablename for a source.
CEWII
May 12, 2010 at 9:42 am
Thanks for the quick replies. I kinda guessed that Aggregate could serve the purpose, but will go with SQL nonetheless.
I'm developing a dimensional data warehouse and my boss prefers that the SSIS packages be more than just a bunch of ExecuteSQL tasks strung together.... Plus, I'm learning the ropes of SSIS (broadening my horizons), and would like to leverage SSIS features as much as possible.
Thanks again.
May 12, 2010 at 10:17 am
you should really be using select statements for your SQL server sources rather than using table access so you can add the Select Dinstinct or Group by there without using execute sql tasks.
If you need some proof to show your boss then there are some really good guidlines here to building quality ssis packages.
http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx"> http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx
May 12, 2010 at 10:24 am
Thanks for the best practices link!
My boss isn't against Execute SQL tasks, but merely a proponent of using SSIS as an ETL tool where data manipulation is fairly visible/obvious, rather than doing transformations solely in SQL stored procs -- (aka "brute force sql" in his words).
May 12, 2010 at 10:32 am
What comes to mind is that each tool has its strong points.. You can just select a tablename from a list BUT you take a performance hit to do so. You can do a date conversion in your SQL statement that you could also do in a Derived Column task, but T-SQL wold be faster in most cases. If you can do much of your work before the data even gets into the SSIS pipeline then it will run that much faster. Every task you add to the pipeline adds "cost".
That link should be helpful.
CEWII
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply