March 2, 2020 at 2:57 pm
This isn't strictly a SQL Server question, but there are some pretty sharp people on here, and I'm probably not the only one with a problem like this.
I have a database that has associated files in a folder/subfolder structure on the server – Windows Server Standard, 2007, SP2. The associated files are mostly images, over 60,000 of them, in around 50 folders and a few subfolders.
The association with the database is strictly by name – record number E4327ac is associated with any file that has E4327ac as the leftmost part of the filename. When a record was called up, a stored procedure ran a PowerShell command (formerly a VBScript command) to get all appropriate files, and the list of associated files was displayed automatically.
This worked fine initially, when the number of associated files was small. A few years ago, when the number got large and the SP got slow, I made the display of the associated file list 'on demand' via a command button in the application. This solved the problem of every record being slow, but it was still slow for the records being displayed manually, and it removed the ability to select records by the presence or absence of associated files, something the users wanted. And it kept getting slower, as the number of files continuously grew.
My next step was to create a SQL Server table of filenames, and instead of an SP that calls a PowerShell command, I join directly to this table. This gives me both instant response time and filtering ability, so the users are happy. I run a stored procedure to update the table when the user manipulates files via the database UI, and once per week to keep it current, since the primary photographer does her work in batches and loads her images directly onto the server, rather than going through the database UI. There are also buttons in the UI to allow manual updating of all associated files for a record, a record group, or the entire database, if a user thinks there are new files that are not being shown in the database's file list.
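For illustration only, here is a rough T-SQL sketch of the kind of filename table and prefix join described above; the table, column, and schema names (dbo.AssociatedFile, dbo.Record, RecordNumber) are invented for the example, not the actual objects:

-- Hypothetical filename cache table, refreshed by the weekly stored procedure.
CREATE TABLE dbo.AssociatedFile
(
    FileName      nvarchar(260) NOT NULL PRIMARY KEY,
    FolderPath    nvarchar(400) NOT NULL,
    LastRefreshed datetime2(0)  NOT NULL DEFAULT SYSDATETIME()
);

-- Records are matched to files by the leftmost part of the file name,
-- e.g. record E4327ac matches E4327ac_front.jpg, E4327ac-detail.tif, etc.
-- (dbo.Record stands in for the existing record table.)
SELECT r.RecordNumber, f.FolderPath, f.FileName
FROM dbo.Record AS r
JOIN dbo.AssociatedFile AS f
    ON f.FileName LIKE r.RecordNumber + N'%';

Because the pattern only has a trailing wildcard, SQL Server can usually turn the LIKE into a range seek on the FileName index, which is what makes the response effectively instant compared with shelling out to PowerShell.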
This is mostly satisfactory, but I am not entirely happy with the disconnect from the actual file/folder structure. I tried activating the Windows file indexing service, to see if that would make the PowerShell 'DIR' command any faster. I don't know if it made any difference – I didn't conduct any timing tests, but if it did, the difference was not noticeable. It is still on the order of minutes – trivial for a procedure that runs once per week as a batch in the middle of Sunday night, but not at all practical for the 'instant' response I need in the UI.
Is there some method for caching the file structure, or otherwise setting things up so that SQL Server can get the list quickly AND have it be up-to-date at all times?
March 2, 2020 at 4:03 pm
I can only think of 2 options
MVDBA
March 2, 2020 at 4:19 pm
March 2, 2020 at 4:39 pm
filestream stores the files within a filegroup of the database, so maybe it would remove your need to get the filenames – the files are contained in the database.
we use it to keep the files linked to the data and backed up at the same time (no mismatch between our artwork and customer data due to backups running at different times on different servers)
it's pretty fast, but I'm not 100% sold - still, it's better than storing files in a database as varbinary
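To make the idea concrete, here is a minimal sketch of a FILESTREAM-backed table, assuming the instance and database already have FILESTREAM enabled and a FILESTREAM filegroup in place; all names below are placeholders, not a real schema:

-- A FILESTREAM table needs a uniqueidentifier ROWGUIDCOL column with a unique constraint.
CREATE TABLE dbo.RecordImage
(
    ImageId      uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    RecordNumber varchar(20)      NOT NULL,
    ImageName    nvarchar(260)    NOT NULL,
    ImageData    varbinary(max)   FILESTREAM NULL  -- stored as a file in the FILESTREAM container
);

The rows are queried like any other table, but the varbinary(max) data lives as files in the FILESTREAM container on disk rather than in the data pages. (If the database has more than one FILESTREAM filegroup, the table also needs a FILESTREAM_ON clause to pick one.)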
MVDBA
March 2, 2020 at 5:03 pm
Hm, okay - that's an interesting idea.
This was originally an Access database (actually, it was initially a series of Excel spreadsheets, before I got involved), so packing this much attachment data into the database would have been impossible there. I've also read various opinions on this sort of approach, mostly from people saying it's something to go at very carefully.
I currently have over 220 GB worth of such files. Loading that into a database that is currently less than 200 MB is going to have some impact. At the very least, my backup time and storage requirements will go through the roof, unless I split the table containing the files out to a different file group and have separate backup procedures for each.
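For reference, separate backup schedules per filegroup would look roughly like this; the database, filegroup, and path names are made up, and file/filegroup backups of read-write filegroups also assume the full or bulk-logged recovery model:

-- Back up the small primary data frequently...
BACKUP DATABASE ArchiveDB
    FILEGROUP = N'PRIMARY'
    TO DISK = N'D:\Backups\ArchiveDB_primary.bak';

-- ...and the large FILESTREAM filegroup on its own, slower schedule.
BACKUP DATABASE ArchiveDB
    FILEGROUP = N'ImageFS'
    TO DISK = N'E:\Backups\ArchiveDB_images.bak';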
Also, displaying such a file would mean reading it out of the database and storing it to a temp file, no? I don't have viewing code in the application - I just shell to a DOS run command and let the user's OS figure out what to do with it. But that only works for a file on disk somewhere, not an image of a file in a SQL Server table.
If you've got this implemented, how well does it work for you? How many images and records do you have, and how do you deal with the issues I mentioned? Did you convert to this from something else, or was it built that way from scratch?
This sounds intriguing, but to be frank, it also sounds like a huge amount of work. I would also have to implement some sort of mass import mechanism for the photographer. Forcing her to use the current UI, which only handles file(s) for one record at a time would be a huge inconvenience for her. A mass export function would also be necessary. Users occasionally copy multiple selected image files from the server storage to their own computers, because they need them for an article they are writing, or an exhibition they are preparing.
Do you have any links to articles that detail how to implement this method? And do you think it's worth the effort to try it?
March 2, 2020 at 5:15 pm
It is a big amount of work, but we also came from an Access system and evolved - as for the backups, well yes, your database will get bigger, but your fileshare will be smaller - I bet with backup compression it might be faster and smaller
as for splitting the filegroups - it's a 100% must-do, as they are different filegroup types
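Roughly, that setup looks like this (instance-level FILESTREAM also has to be turned on in SQL Server Configuration Manager first; the database, filegroup, and path names below are placeholders):

-- Allow T-SQL and Win32 streaming access to FILESTREAM data on the instance.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;

-- Add a dedicated FILESTREAM filegroup and point it at a container folder on disk.
ALTER DATABASE ArchiveDB
    ADD FILEGROUP ImageFS CONTAINS FILESTREAM;

ALTER DATABASE ArchiveDB
    ADD FILE (NAME = N'ArchiveDB_ImageFS', FILENAME = N'E:\SQLData\ArchiveDB_ImageFS')
    TO FILEGROUP ImageFS;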
as for quantity, we use it for digital signage (the screens you see in McDonald's, Starbucks, etc.) - for the past 3 years, everything that gets printed to go on a tray and every 3-for-2 offer we print - maybe 900GB just in compressed artwork and videos - plus all the stuff we store in Amazon S3
MVDBA
March 2, 2020 at 7:44 pm
Filestream/filetable don't necessarily mean changes for your client. This tech is designed to somewhat sync the filesystem with the db, include the binaries (files) in backups, and prevent sync issues.
Filetable gives you an SMB file share for access to the files, which is much simpler and easier for most clients and apps (like web apps) to work with. You could still see the file names inside SQL Server.
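As a hedged illustration of the FileTable flavour (this assumes FILESTREAM is already configured for the instance and database; the database, directory, and table names are placeholders):

-- Non-transacted access plus a directory name is what exposes the Windows share.
ALTER DATABASE ArchiveDB
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'ArchiveFiles');

-- A FileTable has a fixed schema (name, path_locator, file_stream, etc.).
CREATE TABLE dbo.ArchiveImages AS FILETABLE
    WITH (FILETABLE_DIRECTORY = N'Images', FILETABLE_COLLATE_FILENAME = database_default);

-- Files dropped into the share (something like \\server\instance-share\ArchiveFiles\Images)
-- show up as rows immediately, so the file list can be queried directly:
SELECT name, file_type, cached_file_size
FROM dbo.ArchiveImages
WHERE is_directory = 0;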
I've worked with systems like yours that were not synced, and we typically had a process run to "sync" the file system and db on a regular basis. Problems were manually handled, but we then had all queries in the db, not db+filesystem.
When I did this with imaging systems, we often had over 100,000 image files stored.
March 2, 2020 at 8:09 pm
Thank you, Steve. I just looked at some intro tutorials. It doesn't seem as bad as I thought at first. I enabled Filestream in the DB, according to the instructions on Microsoft Docs. I'm now picking through subsequent pages on how to set up Filestream tables - so far, it looks reasonable. I've actually got three sets of file attachment constructs in this database, although only one is large - the other two are less than 100 files each. I'll try it with one of the small sets first and see how well I can handle it. If it goes well, I'll do it for the others.
March 2, 2020 at 8:26 pm
This might also help you. It's free (for an email): https://www.sqlservercentral.com/books/the-art-of-sql-server-filestream-by-jacob-sebastian-and-sven-aelterman
March 2, 2020 at 10:00 pm
Looks great. Many thanks to both you and Mike - I had no idea this feature even existed (SQL Server is kind of overwhelming sometimes). This takes care of my reading for the next few days.
March 3, 2020 at 9:00 am
just keep going - when you are done, maybe write it up and post it to steve… I like a good refresher course on things I haven't done for a while
MVDBA