What is the best way to store PDF files in db?

  • I am researching storing PDF file in our MS SQL Server database, but can't really find any help on the topic.  Can anyone point me in the right direction?  Should we use binary, blob, other?  Any guidance would be appreciated.

  • G'day,

    My opinion is that you do not really want to store the PDF in a DB.  At the risk of arguing semantics, you probably want to store either the contents of the PDF, or the physical location of the PDF.  Which one you choose to store depends on what you are going to do with the PDF. 

    If you are simply going to echo the PDf back to someone when requested, you are better off storing the physical location in a normal varchar field, and storing the PDF in a file system.

    If you are going to allow update of the content then you shuld look at the TEXT data type.  Update implies generating a different PDF after the changes have been made.  This introduces another set of issues.

    Assuming you want to support search of the PDF contents, you may want to consider a subset of the actual text.  I do not have any experience in the area of selecting appropriate subsets for indexing purposes.  I am sure there are others on the forums who do have that experience.

    I could continue speculating all afternoon, but it probably makes sense to just ask the question: What are you going to do with the PDF?

    Hope this helps

    Wayne

  • Thanks for the input Wayne - we will generate once and then just make it available to the public.  I have considered storing just the path to the file, which could be stored on our server - but the developer doesn't like that.  But, it may be the best solution - thanks for you input!

  • You'll want to use the IMAGE datatype.

    Note:  We have an on-going project where we are either creating or uploading pdf's.  Prior to this project starting we conducted research as to the pro's/con's of storing files in the database or filesystem.  All in all it was about a 50/50 split.  The deciding factor was that our system requires merge replication and we didn't have the means to replicate files through the file system, so we decided to keep the files in the database.  When we started uploading files larger than 90 MB it was beginning to cause us problems.  At about the same time we began using the AVAIL software and decided to just store the files back on the file system.  Of course, we had to run a conversion out of the database to the file system and change a lot of front-end code to accomodate this structure change.  I can say I have experience both methods for storing the files, so here are a few of my pro's/con's for you to consider.

    PRO'S

    1.  The files are probably most secure in the database.

    2.  Although the database backup takes longer(becuase of size), it's nice having 1 backup.  Restoring the data is simpler.

    CON'S

    1.  It's a real pain restoring the database and then having the server guys restore the files.  If you don't write a custom application for restoring both at the same time, you're in for a world of hurt because the files must match the database data.  Another point is when you restore to a different environment, such as, DEV or QA you have to copy all of those files across your network which could take a lot of time.  When restoring to DEV/QA, we had to do a work around because we have 100's of GB's of files, so now we just copy the database and update the file-pointers to 1 bogus file.

    CONSIDERATIONS:

    Id recommend, if you're going to store the files in the db, to create a seperate data filegroup(s) on a seperate LUN(s).  Put the table which stores the files on this filegroup(s).  In addition, this table should contain only the bare minimum columns.  Perhaps only 3 columns (1. TABLEpk, 2. IMAGE, 3.  FKcolumn from the parent table)

    Good luck

  • Another thought (since we just went through this Painful process) is to budget a HUGE amount of space for this.  For example, our 150GB database had one table (that contained our *.pdf files as IMAGE) whose size was 50GB after gathering data for one year.

    Just a nickle's worth of FREE advise: Plan ahead for this potential soring growth!!

    BTW - we ended up taking the *.pdf files out of the database and only leaving pointers to their location on a SHARE using NTFS security.

    Good luck!

  • Another thought regarding PDF storage. We went through the same debate two years ago and decided to store the PDFs in a file system with the physical location stored in the database (our system has about a 100,000 PDF files). We have an administrative interface and a web based end user interface. The web based interface uses a mapped drive to the PDFs. The system has worked fine, but whenever we restart the file server, the web system loses its mapped drive. We had to write a script (for the web server) that continually ensures the mapped drive is active.

  • I do similar in several of my systems but use unc to access the pdf's from my web servers, do not have connection problems with server reloads.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I couldn't think of a more terrible way to access the data, mapped drives do not constitute a best practice.   Mapped drives are for human consumption not computer to computer processing.   If you are going to store on a network drive, which is fine, do so with UNC path names.

  • in addition to using a UNC path name, set up an alias in your DNS so that if you hvae to move it you don't have to update the database with the new path.  this lets you store documents on multiple servers if need be as well.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply