Can SSIS Check for Viruses?

  • Apologies if it is a silly question.

    We periodically receive an Excel file from a customer. There is now a requirement to store it as a BLOB object in SQL Server for the reference.

    I would use Exchange Server for it: ask the customer to email the file to the Exchange Server and SSIS - to listen to and pick the file from there, but there is no Exchange Server our SSIS can see. The physical location of the Excel file, before it will be picked up by SSIS, is scanned, but is it enough? Are there good practice rules on how to implement it, please?

  • Does the antivirus software you use have a CLI?

    You could write a script task in C#, VB.Net to interact with your AV and call it to scan the file in question before importing to the DB.

  • Ant-Green wrote:

    Does the antivirus software you use have a CLI?

    You could write a script task in C#, VB.Net to interact with your AV and call it to scan the file in question before importing to the DB.

    +1, good answer. There is nothing built in to SSIS to help you achieve this natively.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Ant-Green wrote:

    Does the antivirus software you use have a CLI?

    You could write a script task in C#, VB.Net to interact with your AV and call it to scan the file in question before importing to the DB.

    very good point. I will check. Thank you.

  • Does anyone realize that if the docs are stored on the file system as opposed to a blob in a table, the need to scan them is then part of the OS, not the database?

     

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John wrote:

    Does anyone realize that if the docs are stored on the file system as opposed to a blob in a table, the need to scan them is then part of the OS, not the database?

    +1000

    And, they can be stored on less expensive storage, to boot.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Michael L John wrote:

    Does anyone realize that if the docs are stored on the file system as opposed to a blob in a table, the need to scan them is then part of the OS, not the database?

    Thank you, Michael.

    I did realised hence asked the question: "The physical location of the Excel file, before it will be picked up by SSIS, is scanned, but is it enough? " as my knowledge of anti-virus tools is limited to clicking next-next-next whilst installing it on my personal device.

  • BOR15K wrote:

    Michael L John wrote:

    Does anyone realize that if the docs are stored on the file system as opposed to a blob in a table, the need to scan them is then part of the OS, not the database?

    Thank you, Michael.

    I did realised hence asked the question: "The physical location of the Excel file, before it will be picked up by SSIS, is scanned, but is it enough? " as my knowledge of anti-virus tools is limited to clicking next-next-next whilst installing it on my personal device.

    Huh?

    Calling the virus scan thru the API to scan a file would be the same process and what the anti-virus program would do when a file is copied or modified on the file system.  I'm assuming that you can execute a "deep scan" of the file thru the API, similar to what a full system scan would do on a server.  Without any information on the anti-virus you have in place, I would not be surprised that there is no option to do a "deep scan" thru the API

    If the source of these files is the public internet, then I think that a much more robust process would need to be implemented.  I would think that the files would be uploaded to a "holding area" on the file system that is isolated from the rest of the network.  When a new file is placed in that area, an automated process should then do the deep scan, and if it passes that, the file is moved to it's final resting place.

    If the source of these files are your clients, then I would think that there are some requirements for these files, as well as a some sort of legal requirement that the clients agree to.  In the absence of that, I would take the same steps as a public document.

    An SSIS package is not the place to trigger a virus scan on a file.  This definitely falls into the category of "You can do it, but should you do it".

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • You can use Microsoft Graph for Online data access to access and pull attachments out of exchange using PowerShell as a job step in the ssis package. That way you will get the malware filtering from exchange.

    The last time I looked at using the new api a few years ago it was very poorly documented and I chose to continue on with exchange web services until it was either someone else's problem or the documentation improved. It is now someone else's problem.

    If you can avoid storing the blobs in SQL at all, I would. you could implement compliance hold on the exchange mailbox then merely export the data and some sort of file reference that points back to the original in the mailbox. This would achieve better attestation to the integrity of the data anyways as it would keep the file exactly as it was as it was sent to the mailbox. An excel file stored as blob in SQL server could not attest to that as there would be no way to prove it was not modified before it was loaded.

    I used the importExcel powershell module in the past to deal with excel data. you can even use it to load the raw data into a table in SQL if you want. Normalizing the data in the columns to uniform, unambiguous formats before it is out of excel helps the ETL, especially if you have an excel file that has multiple presentation formats within a single column.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply