Determine doctype of raw binnary data

  • Hello Experts,

    a coworker asked me how to convert raw binary data for for readability. He has a table that contain raw binary data it looks like this:

    0x020000000500000003000000FFF67F3F21A734430264803FD72F803F010000000DFEFF3EAFA640431FF6003F217E003F0100...

    I used bcp to export it out to file using various file format, however the data is not readable. My question is...is it possible to determine which doc type (txt,jpg,pdf, etc.) based on this binary data, or whatever mean to make the data human readable? Eventually he may have to go back and ask the client how to decode the data, but in the meantime I wonder if there is anything we can do about it?

    I suspect the data is encrypted.

    Thanks in advance!

  • Is it a file stored by a client or is he perchance looking at ssis or ssrs binary stored in the database?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • That looks a lot like a byte array. This is the typical way of storing binary type data in the database. I don't think there is a way to determine what file type it is based on the information. We have a table where we can store literally any possible file type. The way we deal with it is to capture the file name along with the contents.

    We have another system that stores nothing but pdfs. The data in both these looks just like what you posted. Depending on what this is used for you might be to make some guesses at the file type.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • in a similar post a couple of years ago, i did a TSQL example that looked at the first x bytes and compared them to known values to find gif/jpg/bmp/png files;

    see this post here, and the rest of the thread too:

    http://www.sqlservercentral.com/Forums/FindPost1179452.aspx

    you'd have to expand the mapping, and find what doc,docx,pdf,rtf,xls,xlsx, etc all start with.

    the right thing would have been to store the original filename witht eh varbinary, of course, but you could reverse engineer about 99% of them i bet.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (3/4/2014)


    in a similar post a couple of years ago, i did a TSQL example that looked at the first x bytes and compared them to known values to find gif/jpg/bmp/png files;

    see this post here, and the rest of the thread too:

    http://www.sqlservercentral.com/Forums/FindPost1179452.aspx

    you'd have to expand the mapping, and find what doc,docx,pdf,rtf,xls,xlsx, etc all start with.

    the right thing would have been to store the original filename with the varbinary, of course, but you could reverse engineer about 99% of them i bet.

    Cool stuff Lowell.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Is it a file stored by a client or is he perchance looking at ssis or ssrs binary stored in the database?

    I heard it is stored by the clients application. I think Sean was right it may be byte array

    Thank you guy for your advise, I just want to make sure I didn't miss out chance to know a method to deal with this stuff, otherwise it's not worth to go extra miles to find it out.

    Thanks again!

  • Lowell (3/4/2014)


    in a similar post a couple of years ago, i did a TSQL example that looked at the first x bytes and compared them to known values to find gif/jpg/bmp/png files;

    see this post here, and the rest of the thread too:

    http://www.sqlservercentral.com/Forums/FindPost1179452.aspx

    you'd have to expand the mapping, and find what doc,docx,pdf,rtf,xls,xlsx, etc all start with.

    the right thing would have been to store the original filename witht eh varbinary, of course, but you could reverse engineer about 99% of them i bet.

    Sweet. I was thinking we could possibly reverse engineer the first part of the file comparing to known values since the various file types do all appear to have very similar info at the beginning of the byte array.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (3/4/2014)


    Lowell (3/4/2014)


    in a similar post a couple of years ago, i did a TSQL example that looked at the first x bytes and compared them to known values to find gif/jpg/bmp/png files;

    see this post here, and the rest of the thread too:

    http://www.sqlservercentral.com/Forums/FindPost1179452.aspx

    you'd have to expand the mapping, and find what doc,docx,pdf,rtf,xls,xlsx, etc all start with.

    the right thing would have been to store the original filename witht eh varbinary, of course, but you could reverse engineer about 99% of them i bet.

    Sweet. I was thinking we could possibly reverse engineer the first part of the file comparing to known values since the various file types do all appear to have very similar info at the beginning of the byte array.

    Googling "file signatures", "magic numbers" etc brings up some pages.

    http://en.wikipedia.org/wiki/List_of_file_signatures

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply