August 11, 2014 at 4:29 am
Hi
I have changed an ETL process which used to run in SSMS to run directly from the file system through SSIS: I run an Execute SQL Task, which then gets the file containing the SQL script. It is taking 4+ times longer to run the same query through SSIS than directly in SSMS. There are a vast number of scripts, so it is difficult to narrow down the cause. Any hints on what I can do and what to look for to improve performance? We use quite a lot of cursors in the queries to do updates on tables, some of which have more than a billion records.
Andre
August 11, 2014 at 4:43 am
Andre
First, do you really need those cursors? If you can find a way to do your update with a set-based query, it's likely to run much faster. Second, are the settings the same for you when you run it from SSMS as they are for whatever account runs it through SSIS? Options such as ANSI NULLS and regional settings can make a difference. Finally, is the network latency the same between the source server and where SSMS runs as between the source server and whatever server consumes the output?
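For example, a row-by-row cursor update can usually be collapsed into a single statement. Something like this (table and column names are made up, since I don't know your schema):

```sql
-- Instead of fetching each row in a cursor and updating it
-- individually, update every qualifying row in one pass:
UPDATE dbo.YourTable          -- made-up name
SET    StatusCode = 'DONE'
WHERE  StatusCode IS NULL;
```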
John
August 11, 2014 at 5:51 am
John, thanks for the reply
First of all, the cursor seems to be the only way. We are updating/deleting records where row counts are in the hundreds of millions or low billions; we had issues when running straight updates, so we put them into a cursor to update a million rows at a time.
Second, are the settings the same for you when you run it from SSMS as they are for whatever account runs it through SSIS? Options such as ANSI NULLS and regional settings can make a difference.
How do I check this for SSIS and SSMS?
To the third question: it's on the same server. We have set max memory on SQL Server, so SSIS has about 50 GB of memory to use.
Anything else?
August 11, 2014 at 7:11 am
For SSMS, go to Tools -> Options -> Query Execution -> SQL Server -> Advanced. For SSIS, it depends on the properties of the network library you're using to connect to SQL Server. I think you can also capture the options in Profiler when the connection is made. Make sure you don't leave the trace running if it's a live server.
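Alternatively, if you're on SQL Server 2005 or later, you can compare the SET options of the two connections directly while both are open:

```sql
-- Compare the session options of the SSMS connection and the
-- SSIS connection side by side:
SELECT session_id,
       program_name,            -- distinguishes SSMS from the SSIS package
       login_name,
       ansi_nulls,
       ansi_padding,
       ansi_warnings,
       arithabort,
       concat_null_yields_null,
       quoted_identifier,
       language,
       date_format
FROM   sys.dm_exec_sessions
WHERE  is_user_process = 1;
```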
You might also want to query your plan cache to see whether you have two different execution plans.
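Something like this will show you the cached plans and their statement text so you can compare them (adjust the LIKE filter to match your statement):

```sql
-- Look for the same statement text appearing with more than one cached plan
SELECT qs.plan_handle,
       qs.execution_count,
       qs.total_elapsed_time,
       st.text,
       qp.query_plan
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE  st.text LIKE '%DELETE FROM%'   -- filter to the statement in question
ORDER BY st.text;
```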
John
August 11, 2014 at 7:18 am
Not seeing it. I am using Visual Studio 201 BI development; I am getting to Options but not seeing the rest there.
August 11, 2014 at 7:30 am
You said you were using SSMS.
John
August 11, 2014 at 2:04 pm
It runs faster in SSMS than in SSIS. I need to get the performance of SSIS up, and I am not sure why the scripts run about four times faster through SSMS than through SSIS. Sorry if I was not clear on this.
August 12, 2014 at 2:03 am
That's what I thought you meant, and it's why I suggested you compare your options on SSMS to those it uses when run through SSIS.
John
August 12, 2014 at 2:33 am
Maybe I am being a bit slow, but can you explain to me exactly how to compare the two?
I would also like to find out why everyone says not to use cursors in SSIS; I have not really found a concrete reason to avoid them.
August 12, 2014 at 2:43 am
For SSMS, go to Tools -> Options -> Query Execution -> SQL Server -> Advanced. For SSIS, it depends on the properties of the network library you're using to connect to SQL Server. I think you can also capture the options in Profiler when the connection is made. Make sure you don't leave the trace running if it's a live server.
Cursors aren't bad in SSIS in particular; they're just bad for performance generally (in most cases). There are cases where there's no alternative but to use them, and even a few situations in which they'll outperform set-based code. Your desire to avoid locking a whole chunk of data and filling up the transaction log is a good reason to use some sort of loop. SSIS can automatically process data in batches, though, so you may wish to take advantage of that instead of explicitly declaring cursors in your code. I don't know enough about what you're trying to do to recommend one way or the other.
John
August 12, 2014 at 4:15 am
Thanks, I will have a look at what you said and check the settings etc.
The cursors we use look up a table name and field name; for example, if a certain table has two data fields that are both NULL, we delete the entry (row). We do it in batches, checking a million rows at a time. The code works perfectly in SQL Server Management Studio, but when we use the same code, pull it in from a file location and run it as a SQL query, it takes four times longer to run; that is our main concern. There are probably better ways to do it, but as we are working with about 500 tables in total, we found it better to keep the code as generic as possible rather than writing separate code for each table. I will see what the settings are in SSMS and SSIS, but I do not think that is the issue: the normal queries that also run from a file location, for example those that set up primary and foreign keys, create indexes, etc., are all running within what we expect.
At this stage it looks like getting away from the cursors is the way to go, but that is a lot of extra work to put in, seeing as I am also trying to make the ETL run through SSIS rather than running code manually as has been done before. We would ideally also like to keep the code the same, as we might have clients where the ETLs rather need to be done through SSMS because of server capacity.
August 12, 2014 at 4:46 am
Andre 425568 (8/12/2014)
I will see what the settings are in SSMS and SSIS, but I do not think that is the issue: the normal queries that also run from a file location, for example those that set up primary and foreign keys, create indexes, etc., are all running within what we expect.
Yes, but DDL operations don't (usually) return result sets and so network latency is much less likely to be a factor. Also, DML is more likely to be sensitive to options and regional settings. Would you be able to post a snippet of your code (or all of it), please?
John
August 12, 2014 at 4:57 am
This is the specific cursor where I picked up the problem:
DECLARE @object_id INT
DECLARE @table_name VARCHAR(128)
DECLARE @col_name VARCHAR(128)
DECLARE @exec_sql_string VARCHAR(500)
DECLARE @exec_sql NVARCHAR(500)
DECLARE @max_id NVARCHAR(4000)
DECLARE @OutputParameter NVARCHAR(4000)
DECLARE @id_count NUMERIC(38, 0)
DECLARE @id_count_to NUMERIC(38, 0)

SET NOCOUNT ON

-- Outer cursor: every table that has a column named INDEX0
DECLARE recs CURSOR DYNAMIC
FOR
SELECT object_id, name
FROM sys.tables ts
WHERE object_id IN (
        SELECT object_id
        FROM sys.all_columns
        WHERE name = 'INDEX0'
        )
ORDER BY name

OPEN recs
FETCH FROM recs INTO @object_id, @table_name

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @exec_sql_string = ''

    -- Inner cursor: build a DELETE that tests each data column for NULL
    DECLARE recs1 CURSOR DYNAMIC
    FOR
    SELECT name
    FROM sys.all_columns
    WHERE object_id = @object_id
        AND name != 'ID'
        AND name != 'INDEX0'
        AND (name NOT LIKE 'MAS%' AND name NOT LIKE 'Z%')

    OPEN recs1
    FETCH FROM recs1 INTO @col_name

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @exec_sql_string = ''
        BEGIN
            SET @exec_sql_string = 'DELETE FROM ' + @table_name + ' WHERE ' + @col_name + ' IS NULL '
        END
        ELSE
        BEGIN
            SET @exec_sql_string = @exec_sql_string + 'AND ' + @col_name + ' IS NULL '
        END

        FETCH FROM recs1 INTO @col_name
    END

    -- Get the highest ID so the deletes can run in batches of a million
    SET @exec_sql = 'SELECT @OutputParameter = MAX(ID) FROM ' + @table_name
    EXEC @max_id = sp_executesql @exec_sql
        ,N'@OutputParameter nvarchar(4000) output'
        ,@OutputParameter OUTPUT

    IF LTRIM(RTRIM(@OutputParameter)) = '' OR @OutputParameter IS NULL
    BEGIN
        SET @OutputParameter = '0'
    END

    SET @id_count = 1

    IF @OutputParameter <> '0'
    BEGIN
        WHILE @id_count < @OutputParameter
        BEGIN
            SET @id_count_to = @id_count + 1000000
            EXEC ('BEGIN TRANSACTION ' + @exec_sql_string + ' AND ID >= ' + @id_count + ' AND ID <= ' + @id_count_to + ' COMMIT TRANSACTION')
            PRINT ('FROM: ' + CAST(@id_count AS VARCHAR(150)) + ' TO: ' + CAST(@id_count_to AS VARCHAR(150)))
            SET @id_count = @id_count + 1000000
        END

        -- One final batch past the end of the loop
        SET @id_count_to = @id_count + 1000000
        EXEC ('BEGIN TRANSACTION ' + @exec_sql_string + ' AND ID >= ' + @id_count + ' AND ID <= ' + @id_count_to + ' COMMIT TRANSACTION')
        PRINT ('FROM: ' + CAST(@id_count AS VARCHAR(150)) + ' TO: ' + CAST(@id_count_to AS VARCHAR(150)))
    END

    CLOSE recs1
    DEALLOCATE recs1

    FETCH FROM recs INTO @object_id, @table_name
END

CLOSE recs
DEALLOCATE recs
GO
August 12, 2014 at 5:14 am
I can't see anything there that would obviously run differently under different settings. What I advise you to do is to capture each event with a trace, and capture the execution plans as well. You can then compare the slow one to the faster one. Are you confident that they are doing exactly the same thing and it's just the speed that's different?
One thing I did notice - I think your DELETEs would run quicker if you use the TOP clause instead of searching for ranges in your data.
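For example, something like this instead of the ID ranges (using a made-up table name, and the same NULL checks your dynamic SQL builds):

```sql
-- Delete in fixed-size batches until nothing qualifies,
-- rather than computing ID ranges up to MAX(ID):
DECLARE @batch INT
SET @batch = 1000000

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM   dbo.YourTable        -- made-up name
    WHERE  Col1 IS NULL
      AND  Col2 IS NULL         -- same NULL checks as your generated DELETE

    IF @@ROWCOUNT = 0
        BREAK                   -- nothing left to delete
END
```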
John
August 12, 2014 at 5:31 am
I will try that with a trace. We have looked at using TOP, but it has not been very effective. The only difference between the two is that one gets copied and run in a query window in SSMS, while the one in SSIS uses the file with the code from an Execute SQL Task: under General -> SQL Statement we use a file connection, and in the option just above that we link to the file with the exact same code that we run through the SSMS query window.
I will only test Profiler in a few days, as we are in the middle of a test ETL for one client and just after that another client is going live. For now I will run the scripts directly where we have cursors, and for the rest of the code I will use SSIS.