September 24, 2023 at 4:30 am
Hi everyone
One of my SSIS components seems to be very slow. I am fairly new to SQL Server, so I am probably doing something wrong. My SSIS package downloads files from an SFTP server, imports them into a SQL Server table, and then runs a stored procedure. After I import the file(s) into the table, SSIS runs the code below to create the indexes. If there are no indexes, the SP runs super slow, so I definitely have to include the query below. Is there a better way to handle how the indexes get created/updated?
Thank you
USE [OptionsDB]
GO
/****** Object: StoredProcedure [dbo].[CreateOptionsEODIndex] Script Date: 2023-09-23 9:22:59 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[CreateOptionsEODIndex]
AS
CREATE CLUSTERED COLUMNSTORE INDEX [OptionsEODIndex] ON [dbo].[OptionsEOD] WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 0) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [OptionsEODIndex2] ON [dbo].[OptionsEOD]
(
[UNDERLYING_SYMBOL] ASC,
[QUOTE_DATE] ASC,
[EXPIRATION] ASC,
[OPTION_TYPE] ASC
)
INCLUDE([ROOT_SYMBOL],[BID_1545],[ASK_1545]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
September 24, 2023 at 9:51 am
Are you saying that this proc runs slower in SSIS than it would if you were running it from SSMS?
Have you determined which statement is the slow one (I imagine it's the clustered index creation)?
Have you tried doing the import with the indexes in place?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 24, 2023 at 12:41 pm
why are you dropping and recreating the indexes? what are you trying to accomplish with it?
(for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)
September 24, 2023 at 6:30 pm
why are you dropping and recreating the indexes? what are you trying to accomplish with it?
(for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)
Ah, thank you for that. Looking forward to seeing the OP's response.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 24, 2023 at 7:50 pm
Do you know which index takes the longest to create or are they both about the same?
You can use MAXDOP in the options when creating an index, which might speed it up a bit:
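For example (a sketch using the index and table names from the OP's script; MAXDOP = 4 is purely illustrative — tune it to your core count):

```sql
-- Rebuild the clustered columnstore index, capping parallelism at 4 cores
CREATE CLUSTERED COLUMNSTORE INDEX [OptionsEODIndex]
    ON [dbo].[OptionsEOD]
    WITH (DROP_EXISTING = ON, MAXDOP = 4)
    ON [PRIMARY];
```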
September 24, 2023 at 9:10 pm
Are you saying that this proc runs slower in SSIS than it would if you were running it from SSMS?
Have you determined which statement is the slow one (I imagine it's the clustered index creation)?
Have you tried doing the import with the indexes in place?
No, SSIS runs in about the same time as SSMS.
Yes, it is the clustered one. It takes about 3 minutes to run; the entire SP takes 3.5 minutes.
The files that get added are new CSV files from the SFTP server, so there are no indexes to import.
September 24, 2023 at 9:19 pm
why are you dropping and recreating the indexes? what are you trying to accomplish with it?
(for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)
I am a rookie so there is most likely a more efficient way to do what I want to do.
I know that indexes speed up queries big time, so that is why I re-index the main table after SSIS imports the CSV file into it. I just don't know if the way I am handling the re-creation is the most efficient. As for timing: the entire SP takes about 3.5 minutes to run, and the clustered index alone (found by commenting out the nonclustered one) accounts for about 3 minutes of that. The entire SSIS package takes about 6m19s to run. The table has about 31 million records, so not a huge table, and it is only storing test data. My concern is that when I get the full financial data from the data provider, the run time will become unacceptable.
Originally I thought a clustered index made sense, so I added it. The nonclustered one came from the execution plan; I added it because the plan suggested it. I am completely open to re-writing the entire indexing SP if need be. In the end, I have to make sure the main table is indexed properly so the various SPs have acceptable run times.
Is there a better way to index the table? I am open to all ideas.
September 24, 2023 at 9:30 pm
those files you are getting.... can you answer the following.
and a few questions regarding the production server.
September 24, 2023 at 9:58 pm
those files you are getting.... can you answer the following.
- how many files per load
- how many rows per file (max, min and average size)
- how often you get them.
- should they all be processed as a unit or can some be processed and some rejected?
- are you loading first to a staging table or directly to their final destination
and a few questions regarding the production server.
- how many cores
- total server ram
- how much ram allocated to the SQL instance
- current max DOP setting
- current CTFP (cost threshold for parallelism)
- Sql server edition (express, standard, enterprise)
here you go
September 24, 2023 at 10:26 pm
Sql server edition (express, standard, enterprise) - developer --- if you are using this for production you CANNOT use a Developer license. Developer is acceptable for development and testing only, not for production.
are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table. Care to be clearer on this?
with regards to your SP CreateOptionsEODIndex - (change its name to Rebuild instead, as you are not creating the indexes!)
September 24, 2023 at 11:11 pm
Sql server edition (express, standard, enterprise) - developer --- if you are using this for Production usage you CAN NOT use developer license - if it is only for development and testing using Developer license is acceptable but not for Production.
- I use SQL Server for hobby reasons only, no commercial use whatsoever. The DB is large, so I picked Developer because it has no cap on how big a DB can be.
are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table. care to be clearer on this
- does your SP read the files (as CSV files) and load them into the staging table
- or are you using SSIS to load the files and then calling an SP to process the staging table data
- and are the files moved to the final table all at once, or one by one as soon as each file is loaded into the staging table?
SSIS downloads the files from the SFTP server and then it calls this SP to import the CSV into the temp table and then the main table:
ALTER PROCEDURE [dbo].[UpdateOptionsEOD] (@FILENAME varchar(200), @RECORD_ADD_DATE datetime)
AS
DECLARE @FILEPATH VARCHAR(200)
SET @FILEPATH = @FILENAME
DECLARE @RECORD_ADD_DT varchar(26)
SET @RECORD_ADD_DT = convert(varchar(26),@RECORD_ADD_DATE,121)
DROP TABLE IF EXISTS #TEMP_TABLE;
--temp table definition here
Exec ( 'BULK INSERT #TEMP_TABLE
FROM ''' +
@FILEPATH +
''' WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', MAXERRORS = 0);
UPDATE #TEMP_TABLE
SET UNDERLYING_SYMBOL = ''SPX''
WHERE TRIM(UNDERLYING_SYMBOL) LIKE ''^SPX''
INSERT INTO DBO.OptionsEOD
SELECT *, ''' + @RECORD_ADD_DT + ''' AS TIME_STAMP
FROM #TEMP_TABLE'
)
DROP TABLE #TEMP_TABLE

with regards to your SP CreateOptionsEODIndex - (change its name to Rebuild instead, as you are not creating the indexes!)
- For such a small number of rows being added each time, you should not be recreating the columnstore index. And as you are on SQL 2019, there is no point in doing a reorganize to clean up the index, as an internal background SQL task already does that for you.
- if going forward you still think the columnstore index needs maintenance as part of this load, use ALTER INDEX ... REORGANIZE instead (NOTE: only use REORGANIZE with columnstore indexes!)
- the nonclustered index might be OK to run if it is fast - but I would add "DATA_COMPRESSION = PAGE" to the WITH options
What exactly do you mean by "there is no point in doing a reorganize to clean up the index as an internal background SQL task already does that for you"? Are you saying that SQL Server 2019 does the rebuild in the background whenever I add a new file to the main table?
September 26, 2023 at 11:01 pm
It helps to read the documentation.
Search for "tuple mover".
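To see what the tuple mover has (or has not yet) compressed, you can inspect the columnstore rowgroup states yourself. A sketch, reusing the table name from the OP's script:

```sql
-- Rowgroup states for the columnstore index on dbo.OptionsEOD:
-- OPEN/CLOSED delta rowgroups are what the background tuple mover compresses.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows,
       deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.OptionsEOD')
ORDER BY row_group_id;
```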
September 27, 2023 at 7:14 am
If you want a load into a new table, then create the empty table AND create the Clustered Index. Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.
If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, AND you won't have to build the Clustered Index afterwards, and you won't have the "doubling of space required".
There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS. Phil Parkin would likely be the one to help there.
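A minimal sketch of the load Jeff describes (the file path is a placeholder; note that minimal logging also depends on the database being in the SIMPLE or BULK_LOGGED recovery model):

```sql
-- Target table is empty and already has its clustered index;
-- TABLOCK is one of the prerequisites for a minimally logged load.
BULK INSERT dbo.OptionsEOD
FROM 'C:\loads\options_eod.csv'   -- placeholder path
WITH (
    FIRSTROW       = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK                        -- required for minimal logging
);
```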
--Jeff Moden
Change is inevitable... Change for the better is not.
September 27, 2023 at 7:55 am
If you want a load into a new table, then create the empty table AND create the Clustered Index. Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.
If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, AND you won't have to build the Clustered Index afterwards, and you won't have the "doubling of space required".
There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS. Phil Parkin would likely be the one to help there.
the OP is already using BULK INSERT within an SP into a temp table, followed by an insert into the final table. SSIS is not being used for the load itself, only for the FTP part of getting the files in the first place.
the loading itself should not have any performance issue, as the OP is loading 7 files once per week, sequentially, each of less than 30k rows - an average of 11.5K rows.
the performance issue the OP is trying to address is that they are rebuilding a columnstore index on a table with more than 30 million rows - and this is taking 3 minutes to do.
September 27, 2023 at 8:48 am
If you want a load into a new table, then create the empty table AND create the Clustered Index. Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.
If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, AND you won't have to build the Clustered Index afterwards, and you won't have the "doubling of space required".
There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS. Phil Parkin would likely be the one to help there.
You can just add an Execute SQL Task and put any T-SQL statement in it, so you can do minimal logging from SSIS.