Question regarding SSIS efficiency relative to SQL Server

Question

Post reply

Question regarding SSIS efficiency relative to SQL Server

kramaswamy

SSCoach

Points: 18181
More actions
August 5, 2011 at 11:31 am

#254935

General question here, though I'm going to ask it specifically with regards to the Aggregate Task
Is SSIS more, less, or just as efficient as SQL Server at handling data manipulation tasks? Specifically, the Aggregate Task?
As a simple example, let's say I have a table which contains 1 million rows.
Case 1 : Pipeline read the rows through an SSIS data flow task, using an OLE DB Source reading directly from a table, then pass the values through an Aggregate task to compute some aggregate values, and then output those results into an Excel Destination.
Case 2 : Instead of reading all the rows, have the aggregation be done as a SQL Command, then feed the rows into an Excel Destination
Is this the same for other SSIS tasks? For example, using the Conditional Split task to divide the incoming rows vs using a WHERE clause in the SQL Command, or using a derived column to calculate a new column vs using a condition in the SELECT clause in the SQL command.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

kramaswamy SSCoach Points: 18181 More actions · Answer 1

anyone happen to have any insight on this one? been looking around and haven't been able to find much information.

Phil Parkin SSC Guru Points: 246985 More actions · Answer 2

kramaswamy (8/5/2011)
General question here, though I'm going to ask it specifically with regards to the Aggregate Task
Is SSIS more, less, or just as efficient as SQL Server at handling data manipulation tasks? Specifically, the Aggregate Task?
As a simple example, let's say I have a table which contains 1 million rows.
Case 1 : Pipeline read the rows through an SSIS data flow task, using an OLE DB Source reading directly from a table, then pass the values through an Aggregate task to compute some aggregate values, and then output those results into an Excel Destination.
Case 2 : Instead of reading all the rows, have the aggregation be done as a SQL Command, then feed the rows into an Excel Destination
Is this the same for other SSIS tasks? For example, using the Conditional Split task to divide the incoming rows vs using a WHERE clause in the SQL Command, or using a derived column to calculate a new column vs using a condition in the SELECT clause in the SQL command.

This would be easy to test - why not set it up and see what happens?

By the way, your Case 2 is not formed quite the way I would suggest. Better to have a datasource with a query containing the aggregation as its source & map that to an Excel destination (if you must use Excel) as part of a dataflow.

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Koen Verbeeck SSC Guru Points: 259115 More actions · Answer 3

First of all, don't use the OLE DB command in the dataflow. It is slow.

If you do the aggregation in the source query, as Phil suggested, it will probably have better results than the Aggregate transform in the dataflow.

Why? Because the Aggregate is a blocking transform, which means that it will read all data in memory first, before it actually does anything. This can become quite troublesome on large datasets.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 4

I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.

CEWII

Phil Parkin SSC Guru Points: 246985 More actions · Answer 5

Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.
CEWII

Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Elliott Whitlow SSC Guru Points: 102402 More actions · Answer 6

Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.
CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!

Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.

CEWII

Phil Parkin SSC Guru Points: 246985 More actions · Answer 7

Elliott Whitlow (8/9/2011)
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.
CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII

We think alike. Doing the manipulation at source keeps things tidy & feels more efficient. And please don't mention the 'E' word on this forum 🙂

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

Koen Verbeeck SSC Guru Points: 259115 More actions · Answer 8

Elliott Whitlow (8/9/2011)
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.
CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII

Not entirely true regarding the Excel file part. You can use some sort of SQL query to read from the Excel file. I say "some sort of SQL", because not all the TSQL syntax is supported. But some other functions are, so you can also do some "cleaning" in the source query. However no aggregates 🙁

But I do exactly the same thing. Rather complex source query, rather simple data flow.

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

Koen Verbeeck SSC Guru Points: 259115 More actions · Answer 9

Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
Phil Parkin (8/9/2011)
Elliott Whitlow (8/9/2011)
I have long been a fan of letting the database engine handle those kind of operations before they even get into the pipeline. For this reason I don't think I have ever used the aggregate functions in SSIS, perhaps once as a test, but never in production.
CEWII
Haha, me neither. That's why I'm trying to get someone else to test it, just to be sure!
Where ever possible I will tweak the source query and make the source system condition the data to as near a "pipeline ready" state as I can. If I can do the operation readily in the source query, I will put it there. Which is why I hate reading Excel files, I have to do all the conditioning in the pipeline.
CEWII
We think alike. Doing the manipulation at source keeps things tidy & feels more efficient. And please don't mention the 'E' word on this forum 🙂

The E-word is bad, but what about .csv files with a comma delimiter, but with values that also contains comma's and double quotes?

*shudder*

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP

kramaswamy SSCoach Points: 18181 More actions · Answer 10

well that's not too bad is it? you just put the double quote as a text qualifier, so it removes them?

as for my question, I'll see if i can get a chance to do some tests, but the main reason i was asking the question here is because i don't really have a good test environment to use - the PC which I'm running SSIS on is pretty slow, so that's gonna skew the tests a lot. the production environment is better, but i'm not going to run tests on that environment.

Koen Verbeeck SSC Guru Points: 259115 More actions · Answer 11

kramaswamy (8/9/2011)
well that's not too bad is it? you just put the double quote as a text qualifier, so it removes them?

That works when the double quotes are used as they are intended to: around the text. But it all goes wrong when double quotes appear in the actual text.

What if you need to import a column that contains the following text:

"And then, at the crack of down, Irene shouted: "Hey, are you allright?" Then she turned on the radio."

(I made this up, it doesn't come from a movie or a book or something like that :-))

Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP