How to speed this up?

  • Hi everyone

One of my SSIS components seems to be very slow. I am fairly new to SQL Server, so I am probably doing something wrong. My SSIS package downloads files from an SFTP server, imports them into a SQL Server table, and then runs a stored procedure. After I import the file(s) into the table, SSIS runs the code below to recreate the indexes. If there are no indexes, the SP runs super slowly, so I definitely have to include the query below. Is there a better way to handle how the indexes get created/updated?

    Thank you

    USE [OptionsDB]
    GO
    /****** Object: StoredProcedure [dbo].[CreateOptionsEODIndex] Script Date: 2023-09-23 9:22:59 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    ALTER PROCEDURE [dbo].[CreateOptionsEODIndex]

    As

CREATE CLUSTERED COLUMNSTORE INDEX [OptionsEODIndex] ON [dbo].[OptionsEOD]
WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 0) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [OptionsEODIndex2] ON [dbo].[OptionsEOD]
(
[UNDERLYING_SYMBOL] ASC,
[QUOTE_DATE] ASC,
[EXPIRATION] ASC,
ASC,
[OPTION_TYPE] ASC
)
INCLUDE ([ROOT_SYMBOL], [BID_1545], [ASK_1545])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]

     

  • Are you saying that this proc runs slower in SSIS than it would if you were running it from SSMS?

    Have you determined which statement is the slow one (I imagine it's the clustered index creation)?

    Have you tried doing the import with the indexes in place?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • why are you dropping and recreating the indexes? what are you trying to accomplish with it?

    (for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)

  • frederico_fonseca wrote:

    why are you dropping and recreating the indexes? what are you trying to accomplish with it?

    (for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)

    Ah, thank you for that. Looking forward to seeing the OP's response.


  • Do you know which index takes the longest to create or are they both about the same?

    You can use MAXDOP in the options when creating an index, which might speed it up a bit:

    https://learn.microsoft.com/en-us/sql/t-sql/statements/create-columnstore-index-transact-sql?view=sql-server-ver16

    https://learn.microsoft.com/en-us/sql/t-sql/statements/create-index-transact-sql?view=sql-server-ver16
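For example, a sketch of the clustered columnstore rebuild from the first post with MAXDOP added (the value 8 is illustrative; tune it to your hardware):

```sql
CREATE CLUSTERED COLUMNSTORE INDEX [OptionsEODIndex] ON [dbo].[OptionsEOD]
WITH (DROP_EXISTING = ON, COMPRESSION_DELAY = 0, MAXDOP = 8) ON [PRIMARY];
```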

     

  • Phil Parkin wrote:

    Are you saying that this proc runs slower in SSIS than it would if you were running it from SSMS?

    Have you determined which statement is the slow one (I imagine it's the clustered index creation)?

    Have you tried doing the import with the indexes in place?

No, SSIS runs in about the same time as SSMS.

Yes, it is the clustered one. It takes 3m to run; the entire SP takes 3.5m to run.

The files that get added are new CSV files from the SFTP server, so there are no indexes to import.

  • frederico_fonseca wrote:

    why are you dropping and recreating the indexes? what are you trying to accomplish with it?

    (for Phil - OP has DROP_EXISTING = ON so indexes are there already while import is being done)

I am a rookie, so there is most likely a more efficient way to do what I want to do.

I know that indexes speed up queries big time, which is why I re-index the main table after SSIS imports the CSV files. I just don't know if the way I am handling the re-creation is the most efficient. As for timing, the entire SP takes about 3.5m to run; the clustered index alone (with the nonclustered one commented out) accounts for about 3m of that. The entire SSIS package takes about 6m19s to run. The table has about 31 million records, so it is not huge - it is only storing test data. My concern is that when I get the full financial data set from the data provider, the run time will become unacceptable.

Originally I thought a clustered index made sense, so I added it. The nonclustered one came from the execution plan; I added it because the plan suggested it. I am completely open to rewriting the entire indexing SP if need be. In the end, I have to make sure the main table is indexed properly so the various SPs have acceptable run times.

Is there a better way to index the table? I am open to all ideas.

  • those files you are getting.... can you answer the following.

    • how many files per load
    • how many rows per file (max, min and average size)
    • how often you get them.
    • should they all be processed as a unit or can some be processed and some rejected?
    • are you loading first to a staging table or directly to their final destination

    and a few questions regarding the production server.

    • how many cores
    • total server ram
    • how much ram allocated to the SQL instance
    • current max DOP setting
    • current CTFP (cost threshold for parallelism)
    • Sql server edition (express, standard, enterprise)
  • frederico_fonseca wrote:

    those files you are getting.... can you answer the following.

    • how many files per load
    • how many rows per file (max, min and average size)
    • how often you get them.
    • should they all be processed as a unit or can some be processed and some rejected?
    • are you loading first to a staging table or directly to their final destination

    and a few questions regarding the production server.

    • how many cores
    • total server ram
    • how much ram allocated to the SQL instance
    • current max DOP setting
    • current CTFP (cost threshold for parallelism)
    • Sql server edition (express, standard, enterprise)

    here you go

• how many files per load - varies. I get one new file each business day. I usually run the SSIS package on the weekend, so that would be 5 files being loaded.
• how many rows per file (max, min and average size) - min is 1,998, max is 23,626, avg is 11,426; the avg for the last 4 years is 18,667.
• how often you get them - I get one new file each business day.
• should they all be processed as a unit or can some be processed and some rejected? - all need to be imported.
• are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table.

    and a few questions regarding the production server.

• how many cores - I have a 12th Gen Intel(R) Core(TM) i9-12900K @ 3.20 GHz. Per Intel's website, it has 16 cores (8 performance cores, 8 efficient cores).
• total server ram - 128 GB
• how much ram allocated to the SQL instance - 50,000 MB = 50 GB
• current max DOP setting - 4
• current CTFP (cost threshold for parallelism) - 40
• Sql server edition (express, standard, enterprise) - Developer
Sql server edition (express, standard, enterprise) - developer --- if you are using this for Production you CANNOT use the Developer license. It is acceptable for development and testing, but not for Production.

    are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table. Care to be clearer on this?

    • does your SP read the files (as CSV files) and load them into a staging table?
    • or are you using SSIS to load the files and then calling an SP to process the staging-table data?
    • and are the files moved to the final table all at once, or one by one as soon as each file is loaded into the staging table?

     

with regards to your SP CreateOptionsEODIndex - (change its name to "rebuild" instead, as you are not creating the indexes!!!)

    • For such a small number of rows being added each time, you should not be recreating the columnstore index. And as you are on SQL 2019, there is no point in doing a reorganize to clean up the index, as an internal background SQL task already does that for you.

      • if going forward you still think the columnstore index needs maintenance as part of this load, use ALTER INDEX ... REORGANIZE instead (NOTE: only use REORGANIZE with columnstore indexes!!!)

    • the nonclustered index might be OK to run if it is fast - but I would add DATA_COMPRESSION = PAGE to the WITH options
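A minimal sketch of both suggestions, using the index and table names from the script in the first post:

```sql
-- Columnstore: reorganize instead of rebuilding
ALTER INDEX [OptionsEODIndex] ON [dbo].[OptionsEOD] REORGANIZE;

-- Nonclustered: rebuild with page compression added
ALTER INDEX [OptionsEODIndex2] ON [dbo].[OptionsEOD]
REBUILD WITH (DATA_COMPRESSION = PAGE);
```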

     

  • frederico_fonseca wrote:

    Sql server edition (express, standard, enterprise) - developer --- if you are using this for Production usage you CAN NOT use developer license - if it is only for development and testing using Developer license is acceptable but not for Production.

- I use SQL Server for hobby purposes only, no commercial use whatsoever. The database is large, which is why I picked Developer: it has no cap on how big a database can be.

    are you loading first to a staging table or directly to their final destination - the SP loads into a temp table and then into the main table. care to be clearer on this

    • does your SP read the files (as csv files) and loads them to staging table
    • or are you using SSIS to load the files and then calling a SP to process the staging table data
    • and are the files moved one at the time onto the final table or one by one as soon as each file is loaded onto the staging table

SSIS downloads the files from the SFTP server and then calls this SP to import the CSV into a temp table and then into the main table:

    ALTER PROCEDURE [dbo].[UpdateOptionsEOD] (@FILENAME varchar(200), @RECORD_ADD_DATE datetime)
    AS

    DECLARE @FILEPATH VARCHAR(200)
    SET @FILEPATH = @FILENAME

    DECLARE @RECORD_ADD_DT varchar(26)
    SET @RECORD_ADD_DT = convert(varchar(26),@RECORD_ADD_DATE,121)

    DROP TABLE IF EXISTS #TEMP_TABLE;

    --temp table definition here

Exec ('BULK INSERT #TEMP_TABLE
FROM ''' + @FILEPATH + '''
WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', MAXERRORS = 0);

UPDATE #TEMP_TABLE
SET UNDERLYING_SYMBOL = ''SPX''
WHERE TRIM(UNDERLYING_SYMBOL) LIKE ''^SPX''

INSERT INTO dbo.OptionsEOD
SELECT *, ''' + @RECORD_ADD_DT + ''' AS TIME_STAMP
FROM #TEMP_TABLE'
)

    DROP TABLE #TEMP_TABLE
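As an aside, the dynamic SQL above concatenates the file path directly into the statement. A sketch of a slightly safer variant of just the BULK INSERT portion, escaping any embedded quotes in the path (same logic as the original; the escaping is the only change):

```sql
DECLARE @sql nvarchar(max) =
    N'BULK INSERT #TEMP_TABLE
      FROM ''' + REPLACE(@FILEPATH, '''', '''''') + N'''
      WITH (FIRSTROW = 2, FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', MAXERRORS = 0);';

-- The temp table created in the outer procedure is still visible here
EXEC sys.sp_executesql @sql;
```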

with regards to your SP CreateOptionsEODIndex - (change its name to "rebuild" instead, as you are not creating the indexes!!!)

    • For such a small number of rows being added each time, you should not be recreating the columnstore index. And as you are on SQL 2019, there is no point in doing a reorganize to clean up the index, as an internal background SQL task already does that for you.

      • if going forward you still think the columnstore index needs maintenance as part of this load, use ALTER INDEX ... REORGANIZE instead (NOTE: only use REORGANIZE with columnstore indexes!!!)

    • the nonclustered index might be OK to run if it is fast - but I would add DATA_COMPRESSION = PAGE to the WITH options

What exactly do you mean by "there is no point in doing a reorganize to clean up the index as an internal background SQL task already does that for you"? Are you saying that SQL Server 2019 does the rebuild in the background whenever I add a new file to the main table?

It helps to read the documentation:

    https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver16

    Search for "tuple mover".
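To see whether the background tuple mover still has delta rowgroups left to compress, you can check rowgroup state (a sketch; assumes the table name from earlier in the thread):

```sql
-- OPEN/CLOSED rowgroups are delta stores awaiting compression;
-- COMPRESSED rowgroups are finished columnstore segments
SELECT state_desc, COUNT(*) AS rowgroups, SUM(total_rows) AS total_rows
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.OptionsEOD')
GROUP BY state_desc;
```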

  • If you want a load into a new table, then create the empty table AND create the Clustered Index.  Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.

If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, you won't have to build the Clustered Index afterwards, and you won't have the doubling of space required.

    There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS.  Phil Parkin would likely be the one to help there.
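For reference, a minimal sketch of such a bulk load (the file path is hypothetical, and minimal logging also depends on the database recovery model and other prerequisites):

```sql
-- TABLOCK is one of the requirements for a minimally logged bulk load
BULK INSERT dbo.OptionsEOD
FROM 'C:\data\options_eod.csv'  -- hypothetical path
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);
```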

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    If you want a load into a new table, then create the empty table AND create the Clustered Index.  Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.

If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, you won't have to build the Clustered Index afterwards, and you won't have the doubling of space required.

    There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS.  Phil Parkin would likely be the one to help there.

the OP is already using BULK INSERT within an SP into a temp table, followed by an INSERT into the final table. SSIS is not being used for the load itself, only for the SFTP part of getting the files in the first place.

    the loading itself would not have any performance issue, as the OP is loading about 5 files once per week, sequentially, each with fewer than 30K rows (an average of 11.5K rows).

    the performance issue the OP is trying to address is that they are rebuilding a columnstore index on a table with more than 30 million rows, and this is taking 3 mins to do.

  • Jeff Moden wrote:

    If you want a load into a new table, then create the empty table AND create the Clustered Index.  Then, forget SSIS... use BULK INSERT with the TABLOCK option and read up on the requirements for MINIMAL LOGGING.

If your "Minimal Logging" is successful, the load will be faster than loading into a bare table with full logging, you won't have to build the Clustered Index afterwards, and you won't have the doubling of space required.

    There is a way to do such "Minimal Logging" in SSIS but I don't know how to do it because I avoid SSIS.  Phil Parkin would likely be the one to help there.

You can just add a script control and put any T-SQL statement in it, so you can do minimal logging from SSIS.
