August 5, 2011 at 11:31 am
General question here, though I'm going to ask it specifically with regards to the Aggregate Task
Is SSIS more, less, or just as efficient as SQL Server at handling data manipulation tasks? Specifically, the Aggregate Task?
As a simple example, let's say I have a table which contains 1 million rows.
Case 1 : Pipeline read the rows through an SSIS data flow task, using an OLE DB Source reading directly from a table, then pass the values through an Aggregate task to compute some aggregate values, and then output those results into an Excel Destination.
Case 2 : Instead of reading all the rows, have the aggregation be done as a SQL Command, then feed the rows into an Excel Destination
Is this the same for other SSIS tasks? For example, using the Conditional Split task to divide the incoming rows vs using a WHERE clause in the SQL Command, or using a derived column to calculate a new column vs using a condition in the SELECT clause in the SQL command.
August 8, 2011 at 12:12 pm
anyone happen to have any insight on this one? been looking around and haven't been able to find much information.
August 8, 2011 at 1:55 pm
kramaswamy (8/5/2011)
General question here, though I'm going to ask it specifically with regards to the Aggregate TaskIs SSIS more, less, or just as efficient as SQL Server at handling data manipulation tasks? Specifically, the Aggregate Task?
As a simple example, let's say I have a table which contains 1 million rows.
Case 1 : Pipeline read the rows through an SSIS data flow task, using an OLE DB Source reading directly from a table, then pass the values through an Aggregate task to compute some aggregate values, and then output those results into an Excel Destination.
Case 2 : Instead of reading all the rows, have the aggregation be done as a SQL Command, then feed the rows into an Excel Destination
Is this the same for other SSIS tasks? For example, using the Conditional Split task to divide the incoming rows vs using a WHERE clause in the SQL Command, or using a derived column to calculate a new column vs using a condition in the SELECT clause in the SQL command.
This would be easy to test - why not set it up and see what happens?
By the way, your Case 2 is not formed quite the way I would suggest. Better to have a datasource with a query containing the aggregation as its source & map that to an Excel destination (if you must use Excel) as part of a dataflow.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 9, 2011 at 1:56 am
First of all, don't use the OLE DB command in the dataflow. It is slow.
If you do the aggregation in the source query, as Phil suggested, it will probably have better results than the Aggregate transform in the dataflow.
Why? Because the Aggregate is a blocking transform, which means that it will read all data in memory first, before it actually does anything. This can become quite troublesome on large datasets.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 9, 2011 at 9:18 am
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.
CEWII
August 9, 2011 at 9:23 am
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 9, 2011 at 9:36 am
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII
August 9, 2011 at 9:39 am
Elliott Whitlow (8/9/2011)
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII
We think alike. Doing the manipulation at source keeps things tidy & feels more efficient. And please don't mention the 'E' word on this forum 🙂
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 9, 2011 at 11:40 am
Elliott Whitlow (8/9/2011)
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII
Not entirely true regarding the Excel file part. You can use some sort of SQL query to read from the Excel file. I say "some sort of SQL", because not all the TSQL syntax is supported. But some other functions are, so you can also do some "cleaning" in the source query. However no aggregates 🙁
But I do exactly the same thing. Rather complex source query, rather simple data flow.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 9, 2011 at 11:42 am
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII
We think alike. Doing the manipulation at source keeps things tidy & feels more efficient. And please don't mention the 'E' word on this forum 🙂
The E-word is bad, but what about .csv files with a comma delimiter, but with values that also contains comma's and double quotes?
*shudder*
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 9, 2011 at 12:26 pm
well that's not too bad is it? you just put the double quote as a text qualifier, so it removes them?
as for my question, I'll see if i can get a chance to do some tests, but the main reason i was asking the question here is because i don't really have a good test environment to use - the PC which I'm running SSIS on is pretty slow, so that's gonna skew the tests a lot. the production environment is better, but i'm not going to run tests on that environment.
August 9, 2011 at 3:17 pm
kramaswamy (8/9/2011)
well that's not too bad is it? you just put the double quote as a text qualifier, so it removes them?
That works when the double quotes are used as they are intended to: around the text. But it all goes wrong when double quotes appear in the actual text.
What if you need to import a column that contains the following text:
"And then, at the crack of down, Irene shouted: "Hey, are you allright?" Then she turned on the radio."
(I made this up, it doesn't come from a movie or a book or something like that :-))
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply