Data compression in a field - is it possible?

  • I have a laboratory instrument that tests about 500 samples of product per day. Each test produces a text file report of about 6.5 KB containing a summary of the data (which I'll parse and put into individual table columns) and the data points measured. These take the form

    0.440,0.5|0.460,0.5|0.480,0.5|0.500,0.6|0.520,0.6|0.540,0.6|0.560,0.3, etc., about 6k.

    This works out to about 1 GB per year - manageable on its own, but I have a lot of other stuff being logged too and I want to keep below IT radar.

    Does anyone know of any tricks for lossless compression of this data before insertion into the DB? The data uses only the characters 0123456789,.| - 13 in all - so the compression ratio could be quite high. I'll be coding in VBScript, so a simple approach would be appreciated.

    Thanks.
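    Since the data uses only 13 distinct characters, one very simple lossless trick is to pack each character into 4 bits (two characters per byte), roughly halving the size before insertion. The sketch below is illustrative Python rather than VBScript, and the `PAD` code and function names are my own invention, but the same bit-shifting logic translates directly:

    ```python
    # Hypothetical sketch: pack a 13-symbol alphabet into 4-bit codes.
    ALPHABET = "0123456789.,|"  # 13 symbols, so each fits in 4 bits
    ENC = {ch: i for i, ch in enumerate(ALPHABET)}
    PAD = 15  # unused 4-bit code, marks padding in the final byte

    def pack(text):
        """Encode text as two 4-bit codes per byte (lossless)."""
        codes = [ENC[ch] for ch in text]
        if len(codes) % 2:
            codes.append(PAD)  # pad to an even number of nibbles
        return bytes((codes[i] << 4) | codes[i + 1]
                     for i in range(0, len(codes), 2))

    def unpack(data):
        """Decode the packed bytes back to the original text."""
        out = []
        for b in data:
            for code in (b >> 4, b & 0x0F):
                if code == PAD:
                    break
                out.append(ALPHABET[code])
        return "".join(out)

    sample = "0.440,0.5|0.460,0.5|0.480,0.5"
    packed = pack(sample)
    assert unpack(packed) == sample      # round-trip is exact
    assert len(packed) < len(sample)     # ~50% of the original size
    ```

    This gives a guaranteed, dictionary-free 2:1 reduction; a general-purpose compressor would likely do better on this repetitive data, at the cost of more moving parts.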

  • db-1061069 (7/22/2011)


    [...] This works out at about 1GB per year - manageable on its own but I have a lot of other stuff getting logged too and I want to keep below IT radar. [...]

    I would consider the highlighted statement - wanting to keep below IT radar - to be quite alarming.

    One option would be FILESTREAM with compression on the disk volume. There is no built-in way to compress a single column otherwise, but you could enable compression on the entire table without using FILESTREAM.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • You can create a CLR function to compress and decompress the data, and call it when storing and retrieving the rows.

    -------------------------------------------------
    Will C,
    MCITP 2008 Database Admin, Developer
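    The CLR approach boils down to a compress-on-write, decompress-on-read pair of functions wrapping a standard deflate library. A minimal sketch of that round trip, written in Python with `zlib` purely to show the idea (a real CLR function would use the .NET compression classes instead, and these function names are my own):

    ```python
    import zlib

    def compress_report(text):
        # Lossless general-purpose compression; level 9 favours size over speed.
        return zlib.compress(text.encode("ascii"), 9)

    def decompress_report(blob):
        # Exact inverse: recovers the original report text from the stored blob.
        return zlib.decompress(blob).decode("ascii")

    # Stand-in for one ~6 KB report of repetitive data points.
    sample = "0.440,0.5|0.460,0.5|0.480,0.5|0.500,0.6|" * 150

    blob = compress_report(sample)
    assert decompress_report(blob) == sample  # round-trip is lossless
    assert len(blob) < len(sample)            # highly repetitive text shrinks a lot
    ```

    Stored in a VARBINARY column, the compressed blob stays opaque to queries, so this only suits data you read back whole rather than search within.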

  • SQLRNNR, don't worry - I'm a white hat, not a black hat. IT has its own worries, one of which is the total amount of data storage. If my DB requirements get too big I'll have to battle to keep the data.

    Ricer, I wasn't aware of CLR. I'll look into it.

    Thanks.

  • Is it actually necessary to keep the data "online"? Or can you get away with "offline but available"?

    SQL Server can load information into a work table very, very quickly, especially if the data is in "good shape", as I imagine the test equipment produces. If you zip the files (I'd expect 80% or better compression for what you've described) and only unzip and load them when needed, IT would probably never bug you about space requirements.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good point, Jeff. I'll consider that too.
