January 5, 2009 at 8:27 am
Hello,
I am developing a system for a large insurance company in Sweden that will replace a Documentum system.
I am far from ready with the implementation, but I have done a lot of testing with concurrent users and with the speed of writing/reading files to SQL Server 2008 with FILESTREAM.
It is really, really fast, but the files all end up in the same NTFS folder, and I know for a fact that millions of files in the same folder is always a bad thing (especially in NTFS, even though it supports over 4 billion files per volume).
So what strategy should I use when I know that around 1 million files (200 KB to 3 MB each) will be saved every year?
Best regards
Jonas
January 5, 2009 at 9:30 am
I wonder if it's just one folder or if SQL Server will manage to create subfolders to balance the load. FILESTREAM is really designed to integrate and work well with large document management systems like those from EMC and Hitachi, so it's an interesting question.
I'll look around.
January 5, 2009 at 9:55 am
I had the great fortune, while at PASS, of talking to one of the project engineers for the FILESTREAM product. I asked him about that issue, given how I've seen massive performance degradation on folders with a large number of files. He gave me the lowdown on how to keep performance high on drives with a large number of files.
Removing the support for short (8.3) filenames and removing the last-access update on files makes an enormous difference once you get past roughly 300k files.
fsutil behavior set disable8dot3 1 (disables 8.3 naming convention)
fsutil behavior set disablelastaccess 1 (disables checking on the last access time)
You can also create multiple filegroups and use a partition function to put files in different locations which would reduce the number of documents in a given folder.
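Roughly, that setup looks like the following. This is just a sketch to show the shape of it -- the database name, filegroup names, paths, and the yearly boundary are placeholders I made up, and it assumes FILESTREAM is already enabled on the instance and database:

-- One FILESTREAM filegroup (container) per partition, ideally on separate volumes
ALTER DATABASE DocStore ADD FILEGROUP FSGroup1 CONTAINS FILESTREAM;
ALTER DATABASE DocStore ADD FILE
    (NAME = FSContainer1, FILENAME = 'D:\FSData\Container1') TO FILEGROUP FSGroup1;
ALTER DATABASE DocStore ADD FILEGROUP FSGroup2 CONTAINS FILESTREAM;
ALTER DATABASE DocStore ADD FILE
    (NAME = FSContainer2, FILENAME = 'E:\FSData\Container2') TO FILEGROUP FSGroup2;

-- Partition by year: one boundary value = two partitions, hence two FILESTREAM filegroups above
CREATE PARTITION FUNCTION pfDocYear (int) AS RANGE RIGHT FOR VALUES (2009);

-- Relational data can all stay on PRIMARY; the FILESTREAM data gets its own scheme
CREATE PARTITION SCHEME psDocYear   AS PARTITION pfDocYear ALL TO ([PRIMARY]);
CREATE PARTITION SCHEME psDocYearFS AS PARTITION pfDocYear TO (FSGroup1, FSGroup2);

CREATE TABLE dbo.Documents
(
    DocId      uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
    DocYear    int              NOT NULL,
    DocContent varbinary(max)   FILESTREAM NULL,
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocYear, DocId),
    -- FILESTREAM needs a single-column UNIQUE on the ROWGUIDCOL; on a partitioned
    -- table that index cannot be aligned, so it is placed on a fixed filegroup
    CONSTRAINT UQ_Documents_DocId UNIQUE NONCLUSTERED (DocId) ON [PRIMARY]
)
ON psDocYear (DocYear)
FILESTREAM_ON psDocYearFS;

The key point is that ON and FILESTREAM_ON use two schemes built on the same partition function, so each year's documents end up in a different NTFS container.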
Check out the excellent filestream white paper at http://msdn.microsoft.com/en-us/library/cc949109.aspx
January 5, 2009 at 11:25 am
I was just going to post that stuff and Nicholas beat me to it. Thanks, Nicholas.
The 8.3 thing seems to be a biggy and then filegroups might be a good second solution. If you have Enterprise edition, I'd look at partitioning instead of filegroups.
January 5, 2009 at 11:53 am
Hello,
Thanks a lot for that information. I had already read the white paper, but that was at the beginning of my development with FILESTREAM; it was really great to read through it again now that I have learned more about it the trial-and-error way. 🙂
I'm not really getting how partitioning is meant to work with FILESTREAM schemas, because it seems like I also need to create a new filegroup for each partition to make it efficient?
I plan to put my whole project out on CodeProject once it is ready, but there are a couple of months left before I'm there.
In the database I will have one table with native XML that holds all associations and attributes (metadata) for the files (it also keeps a higher folder level, so each folder has 0-n folders and files).
With XSL transformations against UDFs I then get exactly the XML I want to present in each .NET TreeView.
The XML gets really big for some customers (rows): one row is one customer, each customer has 1-10,000 insurances (folders), each insurance has 0-100,000 claims (subfolders), and each claim has one or several FILESTREAM objects that map against a table holding the FILESTREAM data.
I think it will be a great application and I am really happy for the help I have received.
Even though the XML for some customers is over 5 MB (100k rows), once I have it in client memory it only takes a couple of milliseconds to find the correct FileGUID and read the file from the file table (and my XSL will never let me get that many rows back from the SP even if I wanted to 🙂).
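Very simplified, the idea looks something like this (the names here are just made-up examples, not my real schema, and it assumes the database already has a default FILESTREAM filegroup):

-- One XML document per customer: insurances -> claims -> FileGuid references
CREATE TABLE dbo.CustomerTree
(
    CustomerId int PRIMARY KEY,
    TreeXml    xml NOT NULL
);

-- The actual documents, one row per file
CREATE TABLE dbo.ClaimFile
(
    FileGuid    uniqueidentifier ROWGUIDCOL NOT NULL DEFAULT NEWID(),
    FileName    nvarchar(260)    NOT NULL,
    FileContent varbinary(max)   FILESTREAM NULL,
    CONSTRAINT UQ_ClaimFile_FileGuid UNIQUE (FileGuid)
);

-- The client finds a FileGuid in the transformed XML and then reads exactly one row:
-- SELECT FileContent FROM dbo.ClaimFile WHERE FileGuid = @FileGuid;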
Best regards
Jonas
January 5, 2009 at 12:22 pm
That sounds awesome, even though you lost me after the first paragraph :w00t:
January 7, 2009 at 3:42 am
hehe
I'm sorry, English isn't my native language, so I guess I'm not that good at explaining. 🙂