September 24, 2023 at 4:30 am
Hi everyone
One of my SSIS components seems to be very slow. I am fairly new to SQL Server, so I am probably doing something wrong. My SSIS package downloads files from an SFTP server, imports them into a SQL Server table, and then runs a stored procedure. After I import the file(s) into the table, SSIS runs the code below to create the indexes. If there are no indexes, the SP runs super slow, so I definitely have to include the query below. Is there a better way to handle how the indexes get created/updated?
Thank you
USE [OptionsDB]
GO
/****** Object: StoredProcedure [dbo].[CreateOptionsEODIndex] Script Date: 2023-09-23 9:22:59 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[CreateOptionsEODIndex]
AS
CREATE CLUSTERED COLUMNSTORE INDEX [OptionsEODIndex] ON [dbo].[OptionsEOD] WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 0) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [OptionsEODIndex2] ON [dbo].[OptionsEOD]
(
[UNDERLYING_SYMBOL] ASC,
[QUOTE_DATE] ASC,
[EXPIRATION] ASC,
[OPTION_TYPE] ASC
)
INCLUDE([ROOT_SYMBOL],[BID_1545],[ASK_1545]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
September 24, 2023 at 9:51 am
Are you saying that this proc runs slower in SSIS than it would if you were running it from SSMS?
Have you determined which statement is the slow one (I imagine it's the clustered index creation)?
Have you tried doing the import with the indexes in place?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 24, 2023 at 12:41 pm
why are you dropping and recreating the indexes? what are you trying to accomplish with it?
(for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)
September 24, 2023 at 6:30 pm
why are you dropping and recreating the indexes? what are you trying to accomplish with it?
(for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)
Ah, thank you for that. Looking forward to seeing the OP's response.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 24, 2023 at 7:50 pm
Do you know which index takes the longest to create or are they both about the same?
You can use MAXDOP in the options when creating an index, which might speed it up a bit:
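For example (a sketch using the index and table names from the OP's script; MAXDOP = 4 is purely illustrative — tune it to your core count):

```sql
-- Rebuild the clustered columnstore index, capping parallelism at 4 cores
CREATE CLUSTERED COLUMNSTORE INDEX [OptionsEODIndex]
    ON [dbo].[OptionsEOD]
    WITH (DROP_EXISTING = ON, MAXDOP = 4)
    ON [PRIMARY];
```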
September 24, 2023 at 9:10 pm
Are you saying that this proc runs slower in SSIS than it would if you were running it from SSMS?
Have you determined which statement is the slow one (I imagine it's the clustered index creation)?
Have you tried doing the import with the indexes in place?
No, SSIS runs in about the same time as SSMS.
Yes, it is the clustered one. It takes about 3 minutes to run; the entire SP takes 3.5 minutes.
The files that get added are new CSV files from the SFTP server, so there are no indexes to import.
September 24, 2023 at 9:19 pm
why are you dropping and recreating the indexes? what are you trying to accomplish with it?
(for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)
I am a rookie so there is most likely a more efficient way to do what I want to do.
I know that indexes speed up queries big time, so that is why I re-index the main table after SSIS imports the CSV file into it. I just don't know if the way I am handling the re-creation is the most efficient. As for timing: the entire SP takes about 3.5 minutes to run, and the clustered index alone (found by commenting out the nonclustered one) accounts for about 3 minutes of that. The entire SSIS package takes about 6m19s to run. The table has about 31 million records, so not a huge table, and it is only storing test data. My concern is that when I get the full financial data from the data provider, the run time will become unacceptable.
Originally I thought a clustered index made sense, so I added it. The nonclustered one came from the execution plan; I added it because the plan suggested it. I am completely open to re-writing the entire indexing SP if need be. In the end, I have to make sure the main table is indexed properly so the various SPs have acceptable run times.
Is there a better way to index the table? I am open to all ideas.
September 24, 2023 at 9:30 pm
those files you are getting.... can you answer the following.
and a few questions regarding the production server.
September 24, 2023 at 9:58 pm
those files you are getting.... can you answer the following.
- how many files per load
- how many rows per file (max, min and average size)
- how often you get them.
- should they all be processed as a unit or can some be processed and some rejected?
- are you loading first to a staging table or directly to their final destination
and a few questions regarding the production server.
- how many cores
- total server ram
- how much ram allocated to the SQL instance
- current max DOP setting
- current CTFP (cost threshold for parallelism)
- Sql server edition (express, standard, enterprise)
here you go
September 24, 2023 at 10:26 pm
Sql server edition (express, standard, enterprise) - developer --- if you are using this for production you CANNOT use a Developer license. Developer is acceptable for development and testing only, not for production.
are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table. Care to be clearer on this?
with regards to your SP CreateOptionsEODIndex - (change its name to Rebuild instead, as you are not creating the indexes!)
September 24, 2023 at 11:11 pm
Sql server edition (express, standard, enterprise) - developer --- if you are using this for Production usage you CAN NOT use developer license - if it is only for development and testing using Developer license is acceptable but not for Production.
- I use SQL Server for hobby reasons only, no commercial use whatsoever. The DB is large, so I picked Developer because it has no cap on how big a DB can be.
are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table. care to be clearer on this
- does your SP read the files (as CSV files) and load them into the staging table
- or are you using SSIS to load the files and then calling an SP to process the staging table data
- and are the files moved to the final table all at once, or one by one as soon as each file is loaded into the staging table?
SSIS downloads the files from the SFTP server and then it calls this SP to import the CSV into the temp table and then the main table:
ALTER PROCEDURE [dbo].[UpdateOptionsEOD] (@FILENAME varchar(200), @RECORD_ADD_DATE datetime)
AS
DECLARE @FILEPATH VARCHAR(200)
SET @FILEPATH = @FILENAME
DECLARE @RECORD_ADD_DT varchar(26)
SET @RECORD_ADD_DT = convert(varchar(26),@RECORD_ADD_DATE,121)
DROP TABLE IF EXISTS #TEMP_TABLE;
--temp table definition here
Exec ( 'BULK INSERT #TEMP_TABLE
FROM ''' +
@FILEPATH +
''' WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', MAXERRORS = 0);
UPDATE #TEMP_TABLE
SET UNDERLYING_SYMBOL = ''SPX''
WHERE TRIM(UNDERLYING_SYMBOL) LIKE ''^SPX''
INSERT INTO DBO.OptionsEOD
SELECT *, ''' + @RECORD_ADD_DT + ''' AS TIME_STAMP
FROM #TEMP_TABLE'
)
DROP TABLE #TEMP_TABLE

with regards to your SP CreateOptionsEODIndex - (change its name to Rebuild instead, as you are not creating the indexes!)
- For such a small number of rows being added each time, you should not be recreating the columnstore index. And as you are on SQL 2019, there is no point in doing a reorganize to clean up the index, as an internal background SQL task already does that for you.
- if going forward you still think the columnstore index needs maintenance as part of this load, use ALTER INDEX ... REORGANIZE instead (NOTE: only use REORGANIZE with columnstore indexes!)
- the nonclustered index might be OK to run if it is fast - but I would add "DATA_COMPRESSION = PAGE" to the WITH options
What exactly do you mean by "there is no point in doing a reorganize to clean up the index as an internal background SQL task already does that for you"? Are you saying that SQL Server 2019 does the rebuild in the background whenever I add a new file to the main table?
September 26, 2023 at 11:01 pm
It helps to read the documentation.
Search for "tuple mover".
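To see what the tuple mover has (or has not yet) compressed, you can inspect the columnstore rowgroup states yourself. A sketch, reusing the table name from the OP's script:

```sql
-- Rowgroup states for the columnstore index on dbo.OptionsEOD:
-- OPEN/CLOSED delta rowgroups are what the background tuple mover compresses.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows,
       deleted_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.OptionsEOD')
ORDER BY row_group_id;
```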
September 27, 2023 at 7:14 am
If you want a load into a new table, then create the empty table AND create the Clustered Index. Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.
If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, AND you won't have to build the Clustered Index afterwards, and you won't have the "doubling of space required".
There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS. Phil Parkin would likely be the one to help there.
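A minimal sketch of the load Jeff describes (the file path is a placeholder; note that minimal logging also depends on the database being in the SIMPLE or BULK_LOGGED recovery model):

```sql
-- Target table is empty and already has its clustered index;
-- TABLOCK is one of the prerequisites for a minimally logged load.
BULK INSERT dbo.OptionsEOD
FROM 'C:\loads\options_eod.csv'   -- placeholder path
WITH (
    FIRSTROW       = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK                        -- required for minimal logging
);
```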
--Jeff Moden
Change is inevitable... Change for the better is not.
September 27, 2023 at 7:55 am
If you want a load into a new table, then create the empty table AND create the Clustered Index. Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.
If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, AND you won't have to build the Clustered Index afterwards, and you won't have the "doubling of space required".
There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS. Phil Parkin would likely be the one to help there.
the OP is already using BULK INSERT within an SP into a temp table, followed by an insert into the final table. SSIS is not being used for the load itself, only for the FTP part of getting the files in the first place.
the loading itself should not have any performance issue, as the OP is loading 7 files once per week, sequentially, each of less than 30k rows - an average of 11.5K rows.
the performance issue the OP is trying to address is that they are rebuilding a columnstore index on a table with more than 30 million rows - and this is taking 3 minutes to do.
September 27, 2023 at 8:48 am
If you want a load into a new table, then create the empty table AND create the Clustered Index. Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.
If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, AND you won't have to build the Clustered Index afterwards, and you won't have the "doubling of space required".
There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS. Phil Parkin would likely be the one to help there.
You can just add an Execute SQL Task and put any T-SQL statement in it, so you can do minimal logging from SSIS.