Slower and slower... what will you do?

  • We have a table with a column that stores filenames. These filenames are on average 100 bytes long. We have a procedure that needs to add a filename record to this table but only if the filename does not already exist in the table. As the table gets larger and larger it takes longer and longer to insert a single filename record into it.

    One way to speed this up is to store the filename field as a "char" rather than a varchar type. We could also index the filename field, but indexing those 100-byte-long names upon record insertion really slows down the entire process.

    How would you deal with this issue?

    TIA,

    barkingdog

  • I'd add an index to filename with a generous fill factor, say 70, and I'd rebuild indexes at least weekly.

    Did you test performance with an index?
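
    A sketch of that suggestion, using the column and table names posted later in the thread (the index name is illustrative; the fill factor of 70 is the value proposed above):

    ```sql
    -- Nonclustered index with a generous fill factor: 70 leaves 30% free
    -- space per leaf page, reducing page splits between rebuilds.
    CREATE NONCLUSTERED INDEX IX_IF2_FileName
        ON dbo.IF2 ([FileName])
        WITH (FILLFACTOR = 70);

    -- Weekly maintenance: rebuild to defragment and re-apply the fill factor.
    ALTER INDEX IX_IF2_FileName ON dbo.IF2
        REBUILD WITH (FILLFACTOR = 70);
    ```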

  • For testing purposes, how many file names are we talking about? And, if you don't mind, can you post the CREATE statement for the table, any existing indexes, and any triggers, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And the code you're using to check existence and insert also, please.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Or create a unique constraint (or check the ignore duplicate key) on the existing index.

    then you always can insert directly and if a record exists, you will get a warning.
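
    A sketch of that approach, assuming the uniqueness is on the column combination discussed later in the thread (the index name is illustrative):

    ```sql
    -- With IGNORE_DUP_KEY = ON, inserting a duplicate key raises only a
    -- warning ("Duplicate key was ignored.") and the duplicate row is
    -- silently skipped instead of failing the statement.
    CREATE UNIQUE NONCLUSTERED INDEX UQ_IF2_FileName_Type_Status
        ON dbo.IF2 ([FileName], TypeId, Status)
        WITH (IGNORE_DUP_KEY = ON);
    ```

    The procedure can then insert unconditionally, with no "if exists" check beforehand.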


    N 56°04'39.16"
    E 12°55'05.25"

  • The general issue we are having is this: inserting 10,000 rows into a clean table is quick. If we add another 100,000 rows to that table, then inserting another 10,000 takes a lot longer than it did for the first batch of 10,000. This is true regardless of whether we use an index on filename or not.

    We will be adding about 10,000 such records each day. Our test table has about 1.2 million rows, so things can get pretty slow!

    The table schema (sans index on Filename) is shown below:

    CREATE TABLE [dbo].[IF2](
        [IFID] [int] IDENTITY(1,1) NOT NULL,
        [TypeID] [int] NOT NULL,
        [FPath] [varchar](255) NOT NULL,
        [InDateTime] [datetime] NOT NULL CONSTRAINT [DF_IF2_InputDateTime] DEFAULT (getdate()),
        [Status] [int] NOT NULL,
        [FileName] [varchar](100) NOT NULL,
        CONSTRAINT [PK_IFId] PRIMARY KEY CLUSTERED
        (
            [IFID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
    ) ON [PRIMARY]

    Here's how we check for existence of the filename....

    IF EXISTS(SELECT * FROM IF2 WHERE [FileName] = @fileName AND TypeId = @TypeId AND Status = @Status)
    BEGIN
        SELECT @existingpath = (SELECT FPath FROM IF2 WHERE [FileName] = @fileName AND TypeId = @TypeId AND Status = @Status)
        RETURN
    END

    and if not found, we insert it thus.

    INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
    VALUES(@TypeId, @fpath, @fileName, @status, getDate())

    TIA,

    Barkingdog

  • It seems to me that you'd need an index on (filename, typeID, status) with an INCLUDE for (fpath). Given how big fpath is, the INCLUDE may or may not do anything for you, so perhaps test with and without it.

    It also seems to me that you're doing this one filename at a time, and the first reaction to that is - WHY? You haven't shown anything so far that would require one-at-a-time handling.

    Just curious. One single 10,000-row insert operation is a whole lot faster than 10,000 one-row inserts.
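
    The suggested index might look like this (the index name is illustrative; as noted above, test with and without the INCLUDE):

    ```sql
    -- Covering index for the existence check: the three lookup columns
    -- form the key, and FPath rides along in the leaf level so the
    -- follow-up SELECT of FPath needs no key lookup into the clustered index.
    CREATE NONCLUSTERED INDEX IX_IF2_FileName_TypeID_Status
        ON dbo.IF2 ([FileName], TypeID, Status)
        INCLUDE (FPath);
    ```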

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt is spot on... rewrite the insert code to be set based instead of RBAR.

    --Jeff Moden



  • That's gonna really hurt, especially if there's no index. Are there other indexes on the table, other than the primary key? If so, on what columns?

    Where are you getting the filenames from? Another table? Input from an application?

    Gail Shaw
  • If you cannot convert to set-based inserts (preferably bulk inserts) ...

    skip the "if exists" check: use the new SQL 2005 TRY...CATCH method and put a unique constraint on the field combination

    (TypeId, FPath, FileName, Status)

    -- Avoid an extra read to see if the row exists; just do the insert.
    -- It will fail due to the unique constraint.
    BEGIN TRY
        INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
        VALUES(@TypeId, @fpath, @fileName, @status, getDate())
    END TRY
    BEGIN CATCH
        -- add any logging over here
        ...
    END CATCH

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me


  • >>>Where are you getting the filenames from?

    A separate non-SQL application generates the files and their names on demand. Then the app ships them over to a directory, which is "watched" by a filescan routine. When it finds a new file, it triggers the SQL that this set of posts is about.

    I definitely think that batching things together (maybe in a separate table) and later updating the primary table in a single go makes sense. The idea of letting an error happen and trapping it also has value (depending upon how quickly the error-handling routine works).

    Thanks to everyone for their help. Much appreciated.

    Barkingdog

  • This may be way off from what you are looking for in an answer, but have you thought about horizontally partitioning that table?
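
    Hedged sketch only: the poster doesn't say which column to partition on, so the choice of InDateTime, the monthly ranges, and the boundary dates below are all illustrative assumptions.

    ```sql
    -- One possible scheme: range partitioning on the insert date, so new
    -- rows always land in the newest partition and older data stays cold.
    CREATE PARTITION FUNCTION pfIF2ByMonth (datetime)
        AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301');

    CREATE PARTITION SCHEME psIF2ByMonth
        AS PARTITION pfIF2ByMonth ALL TO ([PRIMARY]);

    -- The table (or its clustered index) would then be created
    -- ON psIF2ByMonth(InDateTime) instead of ON [PRIMARY].
    ```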

  • In that case, I would suggest the following.

    When the new file is spotted, import that file (bcp or BULK INSERT) into a staging table. Then do joins between that staging table and the main table to eliminate dups as you insert. Something like this:

    CREATE TABLE #FileStaging (
        ...
    )

    BULK INSERT #FileStaging
    FROM 'C:\SomeDirectory\someFile.csv' WITH ... (bulk insert options here)

    -- Test indexes on the # table. Probably on FileName, TypeID and Status.

    INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
    SELECT TypeID, fPath, FileName, Status, getdate()
    FROM #FileStaging FS
    WHERE NOT EXISTS
        (SELECT 1 FROM IF2
         WHERE IF2.FileName = FS.FileName AND IF2.TypeID = FS.TypeID AND IF2.Status = FS.Status)

    The NOT EXISTS is probably the best. You may also want to try an outer join with a check for non-existence to see if it's faster. It should be much the same.

    INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
    SELECT TypeID, fPath, FileName, Status, getdate()
    FROM #FileStaging FS
    LEFT OUTER JOIN IF2 ON IF2.FileName = FS.FileName AND IF2.TypeID = FS.TypeID AND IF2.Status = FS.Status
    WHERE IF2.FileName IS NULL

    I would also recommend an index on IF2 (if one doesn't exist) on FileName, TypeID and Status.

    Is there any chance of duplicates in the dumped file?

    Gail Shaw
