January 22, 2008 at 5:18 pm
We have a table with a column that stores filenames. These filenames are on average 100 bytes long. We have a procedure that needs to add a filename record to this table but only if the filename does not already exist in the table. As the table gets larger and larger it takes longer and longer to insert a single filename record into it.
One way to speed this up is to store the filename field as a "char" rather than a varchar type. We could also index the filename field, but indexing those 100-byte-long names upon record insertion really slows down the entire process.
How would you deal with this issue?
TIA,
barkingdog
January 22, 2008 at 5:21 pm
I'd add an index on the filename column with a generous fill factor, say 70, and I'd rebuild the indexes at least weekly.
Did you test performance with an index in place?
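A minimal sketch of that suggestion; the table and index names here are placeholders, since the actual schema is posted later in the thread:

-- index the filename column, leaving 30% free space per page for inserts
CREATE NONCLUSTERED INDEX IX_Files_FileName
ON dbo.Files ([FileName])
WITH (FILLFACTOR = 70);

-- the suggested weekly rebuild reapplies the fill factor and removes fragmentation
ALTER INDEX IX_Files_FileName ON dbo.Files REBUILD WITH (FILLFACTOR = 70);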
January 22, 2008 at 6:38 pm
For testing purposes, how many file names are we talking about? And, if you don't mind, can you post the CREATE statement for the table, any existing indexes, and any triggers, please?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 23, 2008 at 1:48 am
And the code you're using to check existence and insert also, please.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 23, 2008 at 1:55 am
Or create a unique constraint (or check the ignore duplicate key option) on the existing index.
Then you can always insert directly, and if a record already exists you just get a warning.
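Something along these lines, as a sketch; the table and constraint names are illustrative. With IGNORE_DUP_KEY = ON, a duplicate insert raises the "Duplicate key was ignored." warning instead of failing with an error:

-- duplicate filenames are silently skipped with a warning, not an error
ALTER TABLE dbo.Files
ADD CONSTRAINT UQ_Files_FileName UNIQUE ([FileName])
WITH (IGNORE_DUP_KEY = ON);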
N 56°04'39.16"
E 12°55'05.25"
January 25, 2008 at 4:51 pm
The general issue we are having is this: inserting 10,000 rows into a clean table is quick. If we add another 100,000 rows to that table, then inserting another 10,000 takes a lot longer than it did for the first batch of 10,000. This is true whether or not we use an index on filename.
We will be adding about 10,000 such records each day. Our test table has about 1.2 million rows, so things can get pretty slow!
The table schema (sans index on Filename) is shown below:
CREATE TABLE [dbo].[IF2](
[IFID] [int] IDENTITY(1,1) NOT NULL,
[TypeID] [int] NOT NULL,
[FPath] [varchar](255) NOT NULL,
[InDateTime] [datetime] NOT NULL CONSTRAINT [DF_IF2_InputDateTime] DEFAULT (getdate()),
[Status] [int] NOT NULL,
[FileName] [varchar](100) NOT NULL,
CONSTRAINT [PK_IFId] PRIMARY KEY CLUSTERED
(
[IFID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
Here's how we check for existence of the filename....
IF EXISTS(SELECT * FROM IF2 WHERE [FileName] = @fileName AND TypeId = @TypeId AND Status = @Status)
BEGIN
SELECT @existingpath = (SELECT FPath FROM IF2 WHERE [FileName] = @fileName AND TypeId = @TypeId AND Status = @Status)
RETURN
END
and if not found, we insert it thus.
INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
VALUES(@TypeId, @fpath, @fileName, @status, getDate())
TIA,
Barkingdog
January 25, 2008 at 5:27 pm
It seems to me that you'd need an index on (filename, typeID, status) with an include for (fpath). Given how big fpath is, the include may or may not do anything for you, so perhaps test with and without it.
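A sketch of that index against the posted schema (the index name is illustrative):

-- covers the existence check; INCLUDE makes the FPath lookup index-only
CREATE NONCLUSTERED INDEX IX_IF2_FileName_TypeID_Status
ON dbo.IF2 ([FileName], TypeID, Status)
INCLUDE (FPath);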
It also seems to me that you're doing this one filename at a time, and the first reaction to that is - WHY? You haven't shown anything so far that would require one at a time handling.
Just curious. One single 10,000-row insert operation is a whole lot faster than 10,000 one-row inserts.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
January 26, 2008 at 7:51 am
Matt is spot on... rewrite the insert code to be set-based instead of RBAR.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 26, 2008 at 11:34 am
That's gonna really hurt, especially if there's no index. Are there other indexes on the table, other than the primary key? If so, on what columns?
Where are you getting the filenames from? Another table? Input from an application?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 26, 2008 at 12:02 pm
If you cannot convert to set-based inserts (preferably bulk inserts) ...
skip the "if exists", use the new SQL 2005 try/catch method, and put a unique constraint on the field combination
(TypeId, FPath, FileName, Status)
-- avoid an extra read to see if exists, just do the insert, it will fail
-- due to the unique constraint
begin try
INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
VALUES(@TypeId, @fpath, @fileName, @status, getDate())
end try
begin catch
-- add any logging over here
-- ...
end catch
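The unique constraint itself might look like this (the constraint name is a placeholder):

-- a duplicate insert now fails here and drops into the catch block above
ALTER TABLE dbo.IF2
ADD CONSTRAINT UQ_IF2_Type_Path_Name_Status
UNIQUE (TypeId, FPath, [FileName], Status);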
Johan
Learn to play, play to learn !
Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
January 26, 2008 at 12:17 pm
>>>Where are you getting the filenames from?
A separate non-SQL application generates the files and their names on demand. Then the app ships them over to a directory, which is "watched" by a filescan routine. When it finds a new file, it triggers the SQL that this set of posts is about.
I definitely think that batching things together (maybe in a separate table) and later updating the primary table in a single go makes sense. The idea of letting an error happen and trapping it also has value (depending upon how quickly the error handling routine works).
Thanks to everyone for their help. Much appreciated.
Barkingdog
January 26, 2008 at 1:10 pm
This may be way off what you are looking for in an answer, but have you thought about horizontally partitioning that table?
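For illustration only, a minimal sketch of range partitioning on the identity column; the boundary values and the single-filegroup mapping are assumptions, and table partitioning requires Enterprise Edition on SQL Server 2005:

-- split rows into ranges by IFID; boundary values here are made up
CREATE PARTITION FUNCTION pfIF2 (int)
AS RANGE RIGHT FOR VALUES (1000000, 2000000);

-- map every partition to the same filegroup for simplicity
CREATE PARTITION SCHEME psIF2
AS PARTITION pfIF2 ALL TO ([PRIMARY]);

-- the clustered primary key would then be rebuilt on psIF2(IFID)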
January 27, 2008 at 2:54 am
In that case, I would suggest the following.
When the new file is spotted, import that file (bcp or Bulk insert) into a staging table. Then do joins between that staging table and the main table to eliminate dups as you insert. Something like this.
CREATE TABLE #FileStaging (
...
)
BULK INSERT #FileStaging
FROM 'C:\SomeDirectory\someFile.csv' WITH ... (bulk insert options here)
-- Test indexes on the # table. Probably on Filename, Type and Status.
INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
SELECT TypeID, fPath, FileName, Status, getdate()
FROM #FileStaging FS
WHERE NOT EXISTS
(SELECT 1 FROM IF2
WHERE IF2.FileName = FS.FileName AND IF2.TypeID = FS.TypeID AND IF2.Status = FS.Status)
The NOT EXISTS is probably the best. You may also want to try an outer join with a check for nonexistence to see if it's faster. It should be much the same.
INSERT INTO IF2(TypeId, FPath, FileName, Status, InDatetime)
SELECT TypeID, fPath, FileName, Status, getdate()
FROM #FileStaging FS LEFT OUTER JOIN IF2 ON IF2.FileName = FS.FileName AND IF2.TypeID = FS.TypeID AND IF2.Status = FS.Status
WHERE IF2.FileName IS NULL
I would also recommend an index on IF2 (if one doesn't exist) on FileName, TypeID and Status
Is there any chance of duplicates in the dumped file?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability