Slower and slower... what will you do?

  • We have a table with a column that stores filenames. These filenames are on average 100 bytes long. We have a procedure that needs to add a filename record to this table but only if the filename does not already exist in the table. As the table gets larger and larger it takes longer and longer to insert a single filename record into it.

    One way to speed this up is to store the filename field as a "char" rather than a varchar type. We could also index the filename field, but indexing those 100-byte-long names upon record insertion really slows down the entire process.

    How would you deal with this issue?

    TIA,

    barkingdog

  • I'd add an index to filename with a generous fill factor, say 70, and I'd rebuild indexes at least weekly.

    Did you test performance with an index?
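
    A sketch of that suggestion, using the column and table names posted later in the thread (the index name is illustrative; the fill factor of 70 is the value proposed above):

    ```sql
    -- Nonclustered index with a generous fill factor: 70 leaves 30% free
    -- space per leaf page, reducing page splits between rebuilds.
    CREATE NONCLUSTERED INDEX IX_IF2_FileName
        ON dbo.IF2 ([FileName])
        WITH (FILLFACTOR = 70);

    -- Weekly maintenance: rebuild to defragment and re-apply the fill factor.
    ALTER INDEX IX_IF2_FileName ON dbo.IF2
        REBUILD WITH (FILLFACTOR = 70);
    ```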

  • For testing purposes, how many file names are we talking about? And, if you don't mind, can you post the CREATE statement for the table, any existing indexes, and any triggers, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And the code you're using to check existence and insert also, please.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Or create a unique constraint (or check the ignore duplicate key) on the existing index.

    then you always can insert directly and if a record exists, you will get a warning.
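
    A sketch of that approach, assuming the uniqueness is on the column combination discussed later in the thread (the index name is illustrative):

    ```sql
    -- With IGNORE_DUP_KEY = ON, inserting a duplicate key raises only a
    -- warning ("Duplicate key was ignored.") and the duplicate row is
    -- silently skipped instead of failing the statement.
    CREATE UNIQUE NONCLUSTERED INDEX UQ_IF2_FileName_Type_Status
        ON dbo.IF2 ([FileName], TypeId, Status)
        WITH (IGNORE_DUP_KEY = ON);
    ```

    The procedure can then insert unconditionally, with no "if exists" check beforehand.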


    N 56°04'39.16"
    E 12°55'05.25"

  • The general issue we are having is this: inserting 10,000 rows into a clean table is quick. If we add another 100,000 rows to that table, then inserting another 10,000 takes a lot longer than it did for the first batch of 10,000. This is true regardless of whether we use an index on filename or not.

    We will be adding about 10,000 such records each day. Our test table has about 1.2 million rows, so things can get pretty slow!

    The table schema (sans index on Filename) is shown below:

    CREATE TABLE [dbo].[IF2](
        [IFID] [int] IDENTITY(1,1) NOT NULL,
        [TypeID] [int] NOT NULL,
        [FPath] [varchar](255) NOT NULL,
        [InDateTime] [datetime] NOT NULL CONSTRAINT [DF_IF2_InputDateTime] DEFAULT (getdate()),
        [Status] [int] NOT NULL,
        [FileName] [varchar](100) NOT NULL,
        CONSTRAINT [PK_IFId] PRIMARY KEY CLUSTERED
        (
            [IFID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]

    Here's how we check for existence of the filename....

    IF EXISTS(SELECT * FROM IF2 WHERE [FileName] = @fileName AND TypeId = @TypeId AND Status = @Status)
    BEGIN
        SELECT @existingpath = (SELECT FPath FROM IF2 WHERE [FileName] = @fileName AND TypeId = @TypeId AND Status = @Status)
        RETURN
    END

    and if not found, we insert it thus.

    INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
    VALUES(@TypeId, @fpath, @fileName, @status, getDate())

    TIA,

    Barkingdog

  • It seems to me that you'd need an index on (filename, typeID, status) with an INCLUDE for (fpath). Given how big fpath is, the INCLUDE may or may not do anything for you, so perhaps test with and without it.

    It also seems to me that you're doing this one filename at a time, and the first reaction to that is - WHY? You haven't shown anything so far that would require one-at-a-time handling.

    Just curious. One single 10,000-row insert operation is a whole lot faster than 10,000 one-row inserts.
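
    The suggested index might look like this (the index name is illustrative; as noted above, test with and without the INCLUDE):

    ```sql
    -- Covering index for the existence check: the three lookup columns
    -- form the key, and FPath rides along in the leaf level so the
    -- follow-up SELECT of FPath needs no key lookup into the clustered index.
    CREATE NONCLUSTERED INDEX IX_IF2_FileName_TypeID_Status
        ON dbo.IF2 ([FileName], TypeID, Status)
        INCLUDE (FPath);
    ```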

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt is spot on... rewrite the insert code to be set based instead of RBAR.

    --Jeff Moden



  • That's gonna really hurt, especially if there's no index. Are there other indexes on the table, other than the primary key? If so, on what columns?

    Where are you getting the filenames from? Another table? Input from an application?

    Gail Shaw
  • If you cannot convert to set-based inserts (preferably bulk inserts) ...

    skip the "if exists" check: use the new SQL 2005 TRY...CATCH method and put a unique constraint on the field combination

    (TypeId, FPath, FileName, Status)

    -- Avoid an extra read to see if the row exists; just do the insert.
    -- It will fail due to the unique constraint.
    BEGIN TRY
        INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
        VALUES(@TypeId, @fpath, @fileName, @status, getDate())
    END TRY
    BEGIN CATCH
        -- add any logging over here
        ...
    END CATCH

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me


  • >>>Where are you getting the filenames from?

    A separate non-SQL application generates the files and their names on demand. Then the app ships them over to a directory, which is "watched" by a filescan routine. When it finds a new file, it triggers the SQL that this set of posts is about.

    I definitely think that batching things together (maybe in a separate table) and later updating the primary table in a single go makes sense. The idea of letting an error happen and trapping it also has value (depending upon how quickly the error-handling routine works).

    Thanks to everyone for their help. Much appreciated.

    Barkingdog

  • This may be way off from what you are looking for in an answer, but have you thought about horizontally partitioning that table?
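
    Hedged sketch only: the poster doesn't say which column to partition on, so the choice of InDateTime, the monthly ranges, and the boundary dates below are all illustrative assumptions.

    ```sql
    -- One possible scheme: range partitioning on the insert date, so new
    -- rows always land in the newest partition and older data stays cold.
    CREATE PARTITION FUNCTION pfIF2ByMonth (datetime)
        AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301');

    CREATE PARTITION SCHEME psIF2ByMonth
        AS PARTITION pfIF2ByMonth ALL TO ([PRIMARY]);

    -- The table (or its clustered index) would then be created
    -- ON psIF2ByMonth(InDateTime) instead of ON [PRIMARY].
    ```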

  • In that case, I would suggest the following.

    When the new file is spotted, import that file (bcp or BULK INSERT) into a staging table. Then do joins between that staging table and the main table to eliminate dups as you insert. Something like this:

    CREATE TABLE #FileStaging (
        ...
    )

    BULK INSERT #FileStaging
    FROM 'C:\SomeDirectory\someFile.csv' WITH ... (bulk insert options here)

    -- Test indexes on the # table. Probably on FileName, TypeID and Status.

    INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
    SELECT TypeID, fPath, FileName, Status, getdate()
    FROM #FileStaging FS
    WHERE NOT EXISTS
        (SELECT 1 FROM IF2
         WHERE IF2.FileName = FS.FileName AND IF2.TypeID = FS.TypeID AND IF2.Status = FS.Status)

    The NOT EXISTS is probably the best. You may also want to try an outer join with a check for non-existence to see if it's faster. It should be much the same.

    INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
    SELECT TypeID, fPath, FileName, Status, getdate()
    FROM #FileStaging FS
    LEFT OUTER JOIN IF2 ON IF2.FileName = FS.FileName AND IF2.TypeID = FS.TypeID AND IF2.Status = FS.Status
    WHERE IF2.FileName IS NULL

    I would also recommend an index on IF2 (if one doesn't exist) on FileName, TypeID and Status.

    Is there any chance of duplicates in the dumped file?

    Gail Shaw
