Extracting Documents (.pdf, .doc, .xls, etc) stored in image type fields in multiple rows

  • Hi,

    We have a legacy database in SqlSERVER 2005 where documents of all types have been stored. We no longer have access to the original gui but we would like to extract the documents stored. It appears that if they are very large, they are broken up into multiple rows. The table that they are stored in has the following fields:

    id (PK),

    name (varchar, this is the file name of the original file including the .ext),

    content (image),

    previous_id (int)

    an example of a few rows:

    ID name content, previous_id

    1 doc1.doc xxxxxx -1

    2 pdf1.pdf xxxxxx -1

    3 pdf1.pdf xxxxxx 2

    4 pdf1.pdf xxxxxx 3

    5 xls1.xls xxxxxx -1

    6 vsd1.vsd xxxxxx -1

    7 vsd1.vsd xxxxxx 6

    My problem is this: We would like to extract the files and save them to the file system

    We have been able to successfully extract the files using a ssis package if they are contained in only one row but we can't figure out how to put the multiple rows back together. So from my example, doc1.doc is easy. We can extract that fine but pdf1.pdf is split up into 3 rows so this document can't be extracted.

    Has anyone ever encountered this and how do you concatenate the rows into one file? I've searched and searched google, this forum and many others and can't find a similar situation.

    Any help or steer into the right direction would be greatly appreciated!

    Thanks--Nancy

  • Nancy you posted in a SQL 200 forum, so I'm assuming a lot here; i'm guessing the data is chopped up into VARCHAR(8000) columns, instead of being stored in an IMAGE or TEXT datatypes?

    can you show the CREATE TABLE definition of the table?

    will you be ding this in SQL 2005 or above(lets us know what tools we can use, Like CLR and more)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • This might help.

    Without sample data and DDL it's the best I can offer.

    SELECT ID, name, content, previous_id

    INTO #yourTable

    FROM (SELECT 1, 'doc1.doc', 'abc', -1

    UNION ALL SELECT 2, 'pdf1.pdf', 'abc', -1

    UNION ALL SELECT 3, 'pdf1.pdf', 'def', 2

    UNION ALL SELECT 4, 'pdf1.pdf', 'ghi', 3

    UNION ALL SELECT 5, 'xls1.xls', 'abc', -1

    UNION ALL SELECT 6, 'vsd1.vsd', 'abc', -1

    UNION ALL SELECT 7, 'vsd1.vsd', 'def', 6) a (ID, name, content, previous_id)

    ;WITH CTE AS (

    SELECT ID, name, CONVERT(VARCHAR(MAX),content) AS content, previous_id

    FROM #yourTable

    WHERE previous_id = -1

    UNION ALL

    SELECT t1.ID, t2.name, t2.content+t1.content, t1.previous_id

    FROM #yourTable t1

    INNER JOIN CTE t2 ON t2.ID = t1.previous_id)

    SELECT MIN(ID) AS ID, name, MAX(content) AS content

    FROM CTE

    GROUP BY name

    Lowell (12/1/2011)


    Nancy you posted in a SQL 200 forum, so I'm assuming a lot here; i'm guessing the data is chopped up into VARCHAR(8000) columns, instead of being stored in an IMAGE or TEXT datatypes?

    Yep, but at the top of her post she stated SQL Server 2005, so I'm assuming that's what we've got to work with.

    If not, then my suggestion can be scrapped. 😀


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Cadavre pretty much what i was thinking, unless she doesn't have access to 2005+, which means no varchar(max) or CTE's...it's all textpointers and stuff if she's locked into 2000, not much fun.

    --Edit-- somehow i missed that she posted in she has 2005, mea culpa.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • From your description I think you must be in sql 2005? You say you have a column of type image so we are all assuming at least 2005. From the few sample rows it appears that maybe you have versions of each file and the version ordering is handled by previous_id???

    If you want to save all these to the file system I would put together a quick utility in <insert your favorite programming language here>. Being my preference I would do this in .net console app. It should be pretty quick and painless to write the app. Depending on the size of your table the execution could take quite some time though.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi All! It is SQL 2005. I thought I had posted this there. Sorry about that.

  • Unfortunately, you're going to need to do as mentioned, create a quick software in .NET (and friends) that can create a filestream, and append each row in order and rebuild the file that way. You can't really work with image internal to the SQL Server.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • So are the multiple rows per file versions or are they additional segments? Given that it is already in the image datatype I think they are likely versions because the image datatype allows for up to 2gb in a single row. There of course may have been something in your previously lost UI that broke it into chunks. Most likely the only you will figure that out is to start pulling out the data and see.

    Do you want/need some help there? I can toss together a quick and dirty c# console app pretty easily if you need some help.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • All of your assumptions are correct. If we can get away with doing this all in SQL Server, that would be great. While I can easily write something in visual basic to do this, I don't have those tools at my disposal at work.

    Do you think I should put this in the 2005 forum as well? Or is there a way I can direct people here?

    I thought I put it there, I don't remember ever selecting "SQL Server 7, 2000". Yikes! What a blast from the past!

    Thanks for your responses!

  • No need to start a new thread. You already have several people helping. You might be able to do this directly from sql but it would take way more time. Are you familiar with c# and have visual studio? I can put together a console app to read your data and save it to a file very quickly. You may (more like almost certainly) will need to adjust it a bit to fit your exact needs.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean good question. The multiple rows are NOT versions. There are actually more columns in the db than I posted and one of them is [version]. Not many records have a version > 1. Our users really didn't utilize this feature way back when.

    Thanks for your offer to write some code!

  • nancy g (12/1/2011)


    Sean good question. The multiple rows are NOT versions. There are actually more columns in the db than I posted and one of them is [version]. Not many records have a version > 1. Our users really didn't utilize this feature way back when.

    Thanks for your offer to write some code!

    So they are not versions...but what are they and how to do they relate?

    Here is the basic gist of the code in .net (untested but should be VERY close).

    using System.Drawing;

    using System.IO;

    ...

    get a datatable filled with your data.

    Select Image [and other columns]

    foreach(DataRow dr in dt.Rows)

    {

    Bitmap bmp = new Bitmap(new MemoryStream((byte[])(dr["Image"]))

    if (bmp == null)

    {

    bmp.Save(YourDerivedFileNameHere);

    bmp.Dispose();

    }

    }

    --edit my omnipresent fat fingers.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • here's a VB.Net snippet I wrote and tested for a different post on the same subject;

    with a few tweaks realted to the connection and table to select from, this is ready to go:

    '--Imports System.IO

    '--Imports System.Data.SqlClient

    '--this works whether the datatype for is IMAGE or VARBINARY(max)

    Dim sqlcmd As String = "SELECT TOP 3 FileName,ImageData From BatchOfBlobs;"

    Dim mySqlConnectionFormat As String = "data source={0};initial catalog={1};user id={2};password={3};Trusted_Connection=False;Connect Timeout=600;Workstation ID=GhostInTheMachine;Application Name=HaxxorPadPlusPlus;"

    Dim MyConn As New SqlConnection

    MyConn.ConnectionString = String.Format(mySqlConnectionFormat, "DEV223", "SandBox", "Noobie", "NotARealPassword")

    MyConn.Open()

    'now lets get a commadn object

    Dim mySqlCommand As New SqlCommand

    mySqlCommand.Connection = MyConn

    mySqlCommand.CommandTimeout = 600

    mySqlCommand.CommandType = CommandType.Text

    mySqlCommand.CommandText = sqlcmd

    Dim myDataReader As SqlDataReader

    myDataReader = mySqlCommand.ExecuteReader

    While myDataReader.Read

    Try

    Dim path As String = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData)

    Dim imageInBytes As Byte() = myDataReader("ImageData")

    Dim memoryStream As System.IO.Stream = New System.IO.MemoryStream(imageInBytes, True)

    Dim image As New System.IO.BinaryWriter(File.Open(path & "\" & myDataReader("FileName"), FileMode.Create))

    '%APPDATA% variable like "C:\Users\Lowell\AppData\Roaming"

    image.Write(imageInBytes)

    image.Flush()

    image.Close()

    Catch ex As Exception

    Console.WriteLine(ex.Message)

    End Try

    End While

    End Sub

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • DOH! I forgot the one I posted is strictly for images (not exactly going to work for you).

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply