SQL Server Encrypt data into a file, send it and then decrypt

  • First off, my apologies for what could potentially be a bad title!

    I am looking for some general information.

    I may have a potential project where I need to pull data from a SQL Server, populate a parquet file and then SFTP this parquet file to the end user.  Seems simple enough.  The caveat here is that some of the SQL server data needs to be encrypted, placed in the file and then sent via SFTP.  Once that file is with the consumer, they will of course need to decrypt those data elements to make them usable within their system.

    Does anyone have some good info on an overall process to complete this task?  I have been googling and reading up on this, but I dont feel like I am reading up on a complete solution to this.

    Any information will be greatly appreciated.

     

    Thank you!

  • I suspect that encrypting the entire file after it has been created would be an easier process. Is that a possibility?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil,

    At this point everything is on the table!  Any info you can give on this would be appreciated.  We would be using ADF for this work as well.  It should be worth noting that we will be sending this file via an SFTP server, which could render all of this talk worthless.

    Thank you

  • GBeezy wrote:

    Phil,

    At this point everything is on the table!  Any info you can give on this would be appreciated.  We would be using ADF for this work as well.  It should be worth noting that we will be sending this file via an SFTP server, which could render all of this talk worthless.

    Thank you

    I have not worked with Parquet files, so I cannot help with specifics, sorry.

    If you were considering using straight FTP rather than SFTP, and that is the underlying reason for your question, I'd suggest you take FTP out of the equation. Don't even consider it as a possibility.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • i would suggest two things here. python can grab data into a dataframe, and then export out parquet files.

    once the file is created, use GnuPG to create an encrypted version of the file.

    then you give the client the public key so they can decrypt the file, and SFTP as expected.

    below is some code i created when i was testing python to all sorts of formats, including export to parquet.

    ###################################################################################################
    ## Python SQL Server Table to Parquet File using Pandas
    ###################################################################################################
    #####--prerequisites for this script--#############################################################
    ## # pip install --upgrade pip
    ## # pip install pandas
    ## # pip install pyodbc
    ## # pip install fastparquet
    ## # pip install openpyxl
    ## # pip install pyarrow
    ###################################################################################################
    ## Get-Data
    ###################################################################################################
    import pyodbc
    import pandas as pd
    import os
    import openpyxl
    # Some other example server values are
    # server = 'localhost\sqlexpress' # for a named instance
    # server = 'myserver,port' # to specify an alternate port
    basedirectory = "D:\Bak\Data\GitHub\Workspace\Python\Output"
    tablename = '[dbo].[vwServerInventoryPatches]'
    query = "SELECT * FROM " + tablename + ";"
    server = 'dbaserver.sqlservercentral.com,1433'
    database = 'DBAResources'
    UseSQLLogin = False
    username = 'sql-lowell'
    password = 'MyRealPasswordForTheDemo#312'
    if (UseSQLLogin):
    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE='+database+';UID='+username+';PWD='+ password)
    else:
    cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database + ';')

    cursor = cnxn.cursor()
    # select 26 rows from SQL table to insert in dataframe.

    df = pd.read_sql(query, cnxn)
    ## print(df.head(26))


    ###################################################################################################
    ## save Data to Parquet
    ###################################################################################################
    parquetfilename = os.path.join(basedirectory,tablename + ".parquet")
    if os.path.exists(parquetfilename): os.remove(parquetfilename)
    df.to_parquet(parquetfilename, engine='pyarrow')
    # df.to_parquet(parquetfilename, engine='fastparquet')

    ###################################################################################################
    ## Save Data To excel
    ###################################################################################################
    csvfilename = os.path.join(basedirectory,tablename + ".csv")
    if os.path.exists(csvfilename): os.remove(csvfilename)
    df.to_csv(csvfilename)
    ###################################################################################################
    ## Save Data To excel
    ###################################################################################################
    excelfilename = os.path.join(basedirectory,tablename + ".xlsx")
    if os.path.exists(excelfilename): os.remove(excelfilename)
    df.to_excel(excelfilename,sheet_name='Results')
    ##append?
    #with pd.ExcelWriter(excelfilename, mode='a') as writer: df.to_excel(writer, sheet_name='Results')
    ###################################################################################################
    ## save Data to Flat File
    ###################################################################################################
    tabfilename = os.path.join(basedirectory,tablename + ".tsv")
    if os.path.exists(tabfilename): os.remove(tabfilename)
    df.to_csv(tabfilename, sep ='\t')
    ###################################################################################################
    ## Save Data To csv and then zip
    ###################################################################################################
    zipfilename = os.path.join(basedirectory,tablename + ".zip")
    if os.path.exists(zipfilename): os.remove(zipfilename)
    compression_opts = dict(method='zip', archive_name=tablename + ".csv")
    df.to_csv(zipfilename, index=False, compression=compression_opts)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Are you looking to encrypt the parquet file or just some data inside? meaning, do you trust everyone that might get the parquet file or only some?

    If the former, then use what Lowell posted above. If you want to encrypt some of the data, I'd use some sort of data encryption. This is column level encryption in SQL Server, though you can encrypt some data in a query and pass along the key to users. I think you'd likely use a cert-based asymmetric encryption here. You could also use Python to query and encrypt certain data in a dataframe and not other data. Note, this is then putting binary data in your parquet file. I'm not sure how someone then decrypts that on the other side, but likely also with some app that reads the file and can apply a key and encryption function to the data.

     

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply