Excel blobs & BCP extract commands...

  • Hi everyone, I have been asked to investigate storing Excel files in Sql Server as a BLOB. These files could be of any size and I will not know what headers they will have beforehand. They could have multiple sheets per Excel. We may need to re-constitute the file/s promptly at any stage.

    I haven't used blobs before so in an effort to get my head around this I created a test database called: "Scratch" and a table inside it called: "TestBlob". Here is the table structure:

    CREATE TABLE dbo.TestBlob -- Table where BLOB will be stored 
    (
    tbId int IDENTITY(1,1) NOT NULL,
    tbName varchar (50) NULL,
    tbDesc varchar (100) NULL,
    tbBin varbinary (max) NULL
    )

    I was able to load a simple Excel file (literally 1 column)

    Insert TestBlob(tbName, tbDesc, tbBin) Select 'C:\Test_Excel_Files\simple1.xlsx','Files',
    BulkColumn from Openrowset( Bulk 'C:\Test_Excel_Files\simple 1.xlsx', Single_Blob) as tb

    Extracting it has proved to be more problematic. I used a BCP command I found online and modified acordingly. It needs a format file also (which I called simple.fmt).

    Here is the code I used:

    DECLARE @sqlExtract VARCHAR(1000);
    SET @sqlExtract = 'bcp "SELECT tbBin FROM Scratch.dbo.TestBlob where tbId = 1 " QUERYOUT C:\BlobTestOut\simple1Out.xlsx -T -f "C:\BlobTestOut\Formatfile\simple1.fmt" -S ' + @@SERVERNAME;
    EXEC master.dbo.xp_cmdshell @sqlExtract;

    The format file is below:

    <?xml version="1.0"?> 
    <BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <RECORD>
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="7"/>
    </RECORD>
    <ROW>
    <COLUMN SOURCE="1" NAME="Salary" xsi:type="SQLSMALLINT"/>
    </ROW>
    </BCPFORMAT>

    It produced an output but when I try to open it I get the error message: "Excel cannot open the file because the file format or file extension is not valid"

    I would appreciate if you could indicate where I am going wrong or perhaps a better approach. I am conscious of the fact that in the future I will not know the header format of the Excel files so a format file approach could be useless and hence is there a better extract command?

    Comments/suggestions/guidance greatly appreciated...

    J.

  • Are you looking to store the entire file as a single BLOB, or import its contents as individual columns?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • thats an invalid format file for what you need.

    see  https://jahaines.blogspot.com/2009/10/exporting-binary-files-to-file-system.html for a full example of what you can do - and you should likely reconsider using bcp for this - a client tool likely should be the one extracting and saving the file, not the server

     

  • Did you investigate Filestream or FileTable options ?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Folks, thanks for the feedback and apologies for the delay. I managed to get the the BLOB technique working - the issue was the format of the .fmt file. Thank a million also for the advice in regards the alternative approaches ie a client tool or Filestream/FileTable.

    Let me provide you with some background to the issue as it may assist. I have been asked to investigate storing Excel files either on the server or an untouchable fileshare. The objective is to have a copy of the file so we can use it to restore or issue resolve if it becomes necessary, which is more likely scenario...

    I did an analysis of the previous files loaded over the past 10 years. Average file size: is 594KB. So not super big.

    Number of files and total yearly file weight is more or less going up ie for the years: 2019 to 2023 I discovered the following:

    Number of files loaded: 2256, 2018, 2278, 2199, 2426

    Total yearly weight: ~1.2GB, ~435MB, ~582MB, 1.7GB, 2GB

    For this year so far we are at 918GB so will on course to hit about 2GB by year end.

    I think these numbers rule out the blob approach for *permanent* storage of Excel files and the total number of files & their net weight will increase over time.

    To be honest, I liked the blob technique and from what I saw of the restored Excel files its a faithful facsimile of the original (and was speedy ie I tested over a several examples). Hence, I was thinking of using this technique to temporarily store the excel in the table and then after EOB (say every day) restore elsewhere on a fileshare for a permanent record. Then scrub down the table at periodic intervals. Hence the table will never really grow very large. Just a thought...

    I will however investigate the other approaches...let me know if you have further advice in light of what I mentioned...

    Thanks again, J.

  • Quick question; why store the Excel files as BLOB?

    😎

    There are several resilient options for file storage, most of which will provide better visibility/usability of the content.

  • No reason other than it was introduced to me first, so I got a primitive prototype working but as I did I learned about its limitations. I'm leaning away from it now Eirikur and learning about the FileStream method mentioned earlier. Would you like to suggest an alternative? Any alternative much appreciated...

    J

  • Happy to help, now let's dig into the problem you need to solve!

    😎

    Could you please list the high-level requirements in a handful of bullet points?

  • Requirements are fairly loose Eirkur. Investigate the storing of Excel documents either in Sql Server or a fileshare (on a different server to the Sql one), whichever is best.

    I did a historic analysis and we are loading about 2,000 documents per year with total current size of about 2GB. The trend is upward.

    That's about it really. I think I have ruled out Blobs & am currently investigating Filestream. Let me know if you have any other suggestions & I will dig into it....

    Cheers, J.

  • jellybean wrote:

    Requirements are fairly loose Eirkur. Investigate the storing of Excel documents either in Sql Server or a fileshare (on a different server to the Sql one), whichever is best. I did a historic analysis and we are loading about 2,000 documents per year with total current size of about 2GB. The trend is upward. That's about it really. I think I have ruled out Blobs & am currently investigating Filestream. Let me know if you have any other suggestions & I will dig into it....

    Cheers, J.

    OK, so those are the storage requirements, but what are the ongoing requirements regarding workbooks that have already been stored? How will they be accessed and used?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • The previous files have been stored on the the Sql server itself in an accessible share available to the uers.

    I'm not sure what will happen to them TBH. I suspect that they will be relocated to a external fileshare to get them off the Server and a line drawn under them so to speak.

    Going forward from that point onward, basically we want to allow the users to upload a document from their work PCs and we will make a faithful copy of the file uploaded in case its needed at a later stage. What technique that uses I am investigating...ie blobs, Filestream or fileshare...etc...

    I hope that helps...J.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply