August 11, 2014 at 4:29 am
Hi
I have changed an ETL process which used to run in SSMS to run directly from the file system through SSIS: I run an Execute SQL Task, which then gets the file containing the SQL script. It is taking 4+ times longer to run the same query through SSIS than directly in SSMS. There are a vast number of scripts, so it is difficult to narrow down the cause. Any hints on what I can do and what to look for to improve performance? We use quite a lot of cursors in the queries to do updates on tables, some of which have more than a billion records.
Andre
August 11, 2014 at 4:43 am
Andre
First, do you really need those cursors? If you can find a way to do your update with a set-based query, it's likely to run much faster. Second, are the settings the same for you when you run it from SSMS as they are for whatever account runs it through SSIS? Options such as ANSI NULLS and regional settings can make a difference. Finally, is the network latency the same between the source server and where SSMS runs as between the source server and whatever server consumes the output?
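For example, a row-by-row cursor update can usually be collapsed into a single statement. Something like this (table and column names are made up, since I don't know your schema):

```sql
-- Instead of fetching each row in a cursor and updating it
-- individually, update every qualifying row in one pass:
UPDATE dbo.YourTable          -- made-up name
SET    StatusCode = 'DONE'
WHERE  StatusCode IS NULL;
```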
John
August 11, 2014 at 5:51 am
John, thanks for the reply
First of all, the cursor seems to be the only way. We are updating/deleting records where row counts are in the hundreds of millions or low billions; we had issues when running straight updates, so we put them into a cursor to update a million rows at a time.
Second, are the settings the same for you when you run it from SSMS as they are for whatever account runs it through SSIS? Options such as ANSI NULLS and regional settings can make a difference.
How do I check this for SSIS and SSMS?
To the third question: it's on the same server. We have set max memory on SQL Server, so SSIS has about 50 GB of memory to use.
Anything else?
August 11, 2014 at 7:11 am
For SSMS, go to Tools -> Options -> Query Execution -> SQL Server -> Advanced. For SSIS, it depends on the properties of the network library you're using to connect to SQL Server. I think you can also capture the options in Profiler when the connection is made. Make sure you don't leave the trace running if it's a live server.
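Alternatively, if you're on SQL Server 2005 or later, you can compare the SET options of the two connections directly while both are open:

```sql
-- Compare the session options of the SSMS connection and the
-- SSIS connection side by side:
SELECT session_id,
       program_name,            -- distinguishes SSMS from the SSIS package
       login_name,
       ansi_nulls,
       ansi_padding,
       ansi_warnings,
       arithabort,
       concat_null_yields_null,
       quoted_identifier,
       language,
       date_format
FROM   sys.dm_exec_sessions
WHERE  is_user_process = 1;
```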
You might also want to query your plan cache to see whether you have two different execution plans.
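Something like this will show you the cached plans and their statement text so you can compare them (adjust the LIKE filter to match your statement):

```sql
-- Look for the same statement text appearing with more than one cached plan
SELECT qs.plan_handle,
       qs.execution_count,
       qs.total_elapsed_time,
       st.text,
       qp.query_plan
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE  st.text LIKE '%DELETE FROM%'   -- filter to the statement in question
ORDER BY st.text;
```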
John
August 11, 2014 at 7:18 am
Not seeing it. I am using Visual Studio 201 BI development; I am getting to Options but not seeing the rest there.
August 11, 2014 at 7:30 am
You said you were using SSMS.
John
August 11, 2014 at 2:04 pm
It runs faster in SSMS than in SSIS. I need to get the performance of SSIS up, and I am not sure why the scripts run about four times faster through SSMS than through SSIS. Sorry if I was not clear on this.
August 12, 2014 at 2:03 am
That's what I thought you meant, and it's why I suggested you compare your options on SSMS to those it uses when run through SSIS.
John
August 12, 2014 at 2:33 am
Maybe I am being a bit slow, but can you explain to me exactly how to compare the two?
I would also like to find out why everyone says not to use cursors in SSIS; I have not really found a concrete reason to avoid them.
August 12, 2014 at 2:43 am
For SSMS, go to Tools -> Options -> Query Execution -> SQL Server -> Advanced. For SSIS, it depends on the properties of the network library you're using to connect to SQL Server. I think you can also capture the options in Profiler when the connection is made. Make sure you don't leave the trace running if it's a live server.
Cursors aren't bad in SSIS in particular; they're just bad for performance generally (in most cases). There are cases where there's no alternative but to use them, and even a few situations in which they'll outperform set-based code. Your desire to avoid locking a whole chunk of data and filling up the transaction log is a good reason to use some sort of loop. SSIS can automatically process data in batches, though, so you may wish to take advantage of that instead of explicitly declaring cursors in your code. I don't know enough about what you're trying to do to recommend one way or the other.
John
August 12, 2014 at 4:15 am
Thanks, I will have a look at what you said and check the settings etc.
The cursors we use look up a table name and field name; for example, if a certain table has two data fields that are both NULL, we delete the entry (row). We do it in batches, checking a million rows at a time. The code works perfectly in SQL Server Management Studio, but when we use the same code, pull it in from a file location and run it as a SQL query, it takes four times longer to run; that is our main concern. There are probably better ways to do it, but as we are working with about 500 tables in total, we found it better to keep the code as generic as possible rather than writing separate code for each table. I will see what the settings are in SSMS and SSIS, but I do not think that is the issue: the normal queries that also run from a file location, for example those that set up primary and foreign keys, create indexes, etc., are all running within what we expect.
At this stage it looks like getting away from the cursors is the way to go, but that is a lot of extra work to put in, seeing as I am also trying to make the ETL run through SSIS rather than running code manually as has been done before. We would ideally also like to keep the code the same, as we might have clients where the ETLs rather need to be done through SSMS because of server capacity.
August 12, 2014 at 4:46 am
Andre 425568 (8/12/2014)
I will see what the settings are in SSMS and SSIS, but I do not think that is the issue: the normal queries that also run from a file location, for example those that set up primary and foreign keys, create indexes, etc., are all running within what we expect.
Yes, but DDL operations don't (usually) return result sets and so network latency is much less likely to be a factor. Also, DML is more likely to be sensitive to options and regional settings. Would you be able to post a snippet of your code (or all of it), please?
John
August 12, 2014 at 4:57 am
This is the specific cursor where I picked up the problem:
DECLARE @object_id INT
DECLARE @table_name VARCHAR(128)
DECLARE @col_name VARCHAR(128)
DECLARE @exec_sql_string VARCHAR(500)
DECLARE @exec_sql NVARCHAR(500)
DECLARE @max_id NVARCHAR(4000)
DECLARE @OutputParameter NVARCHAR(4000)
DECLARE @id_count NUMERIC(38, 0)
DECLARE @id_count_to NUMERIC(38, 0)

SET NOCOUNT ON

-- Outer cursor: every table that has a column named INDEX0
DECLARE recs CURSOR DYNAMIC
FOR
SELECT object_id, name
FROM sys.tables ts
WHERE object_id IN (
        SELECT object_id
        FROM sys.all_columns
        WHERE name = 'INDEX0'
        )
ORDER BY name

OPEN recs
FETCH FROM recs INTO @object_id, @table_name

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @exec_sql_string = ''

    -- Inner cursor: build a DELETE that tests each data column for NULL
    DECLARE recs1 CURSOR DYNAMIC
    FOR
    SELECT name
    FROM sys.all_columns
    WHERE object_id = @object_id
        AND name != 'ID'
        AND name != 'INDEX0'
        AND (name NOT LIKE 'MAS%' AND name NOT LIKE 'Z%')

    OPEN recs1
    FETCH FROM recs1 INTO @col_name

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @exec_sql_string = ''
        BEGIN
            SET @exec_sql_string = 'DELETE FROM ' + @table_name + ' WHERE ' + @col_name + ' IS NULL '
        END
        ELSE
        BEGIN
            SET @exec_sql_string = @exec_sql_string + 'AND ' + @col_name + ' IS NULL '
        END

        FETCH FROM recs1 INTO @col_name
    END

    -- Get the highest ID so the deletes can run in batches of a million
    SET @exec_sql = 'SELECT @OutputParameter = MAX(ID) FROM ' + @table_name
    EXEC @max_id = sp_executesql @exec_sql
        ,N'@OutputParameter nvarchar(4000) output'
        ,@OutputParameter OUTPUT

    IF LTRIM(RTRIM(@OutputParameter)) = '' OR @OutputParameter IS NULL
    BEGIN
        SET @OutputParameter = '0'
    END

    SET @id_count = 1

    IF @OutputParameter <> '0'
    BEGIN
        WHILE @id_count < @OutputParameter
        BEGIN
            SET @id_count_to = @id_count + 1000000
            EXEC ('BEGIN TRANSACTION ' + @exec_sql_string + ' AND ID >= ' + @id_count + ' AND ID <= ' + @id_count_to + ' COMMIT TRANSACTION')
            PRINT ('FROM: ' + CAST(@id_count AS VARCHAR(150)) + ' TO: ' + CAST(@id_count_to AS VARCHAR(150)))
            SET @id_count = @id_count + 1000000
        END

        -- One final batch past the end of the loop
        SET @id_count_to = @id_count + 1000000
        EXEC ('BEGIN TRANSACTION ' + @exec_sql_string + ' AND ID >= ' + @id_count + ' AND ID <= ' + @id_count_to + ' COMMIT TRANSACTION')
        PRINT ('FROM: ' + CAST(@id_count AS VARCHAR(150)) + ' TO: ' + CAST(@id_count_to AS VARCHAR(150)))
    END

    CLOSE recs1
    DEALLOCATE recs1

    FETCH FROM recs INTO @object_id, @table_name
END

CLOSE recs
DEALLOCATE recs
GO
August 12, 2014 at 5:14 am
I can't see anything there that would obviously run differently under different settings. What I advise you to do is to capture each event with a trace, and capture the execution plans as well. You can then compare the slow one to the faster one. Are you confident that they are doing exactly the same thing and it's just the speed that's different?
One thing I did notice - I think your DELETEs would run quicker if you use the TOP clause instead of searching for ranges in your data.
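For example, something like this instead of the ID ranges (using a made-up table name, and the same NULL checks your dynamic SQL builds):

```sql
-- Delete in fixed-size batches until nothing qualifies,
-- rather than computing ID ranges up to MAX(ID):
DECLARE @batch INT
SET @batch = 1000000

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM   dbo.YourTable        -- made-up name
    WHERE  Col1 IS NULL
      AND  Col2 IS NULL         -- same NULL checks as your generated DELETE

    IF @@ROWCOUNT = 0
        BREAK                   -- nothing left to delete
END
```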
John
August 12, 2014 at 5:31 am
I will try that with a trace. We have looked at using TOP, but it has not been very effective. The only difference between the two is that one gets copied and run in a query window in SSMS, while the one in SSIS uses the file with the code from an Execute SQL Task: under General -> SQL Statement we use a file connection, and in the option just above that we link to the file with the exact same code that we run through the SSMS query window.
I will only test Profiler in a few days, as we are in the middle of a test ETL for one client and just after that another client is going live. For now I will run the scripts directly where we have cursors, and for the rest of the code I will use SSIS.