Processing XML Column in SSIS

  • Here's my dilemma.

    I have to process about 66k rows of data which contain an XML column (stored as NTEXT not XML) that must be shredded and processed. I am currently doing it all in T-SQL looping over the rows and using sp_xml_preparedocument and sp_xml_removedocument and it takes over 2 hours to process. This is unacceptable in my opinion. I did try converting to XML and using the bative XML functions (.nodes) and that process takes even longer (I never ran it on a full result set, but on limited rows it took at least twice as long). Now I'm thinking it might be faster to do this in SSIS, but I'm not sure how to do it.

    My first thought is to load the id's for the rows I need to process into an ADO variable and then use a foreach loop to get the XML and process it. I realize that this is 66k calls to the DB, but when doing it in T-SQL it is faster to loop through each row to get the XML than to include the NTEXT (Xml) column in the original query. My issue is I'm not sure how to then process the XML once I have it. Can I use adn Object variable and append each XML document to it, or do I need to process each XML document within the foreach loop?

    Any ideas?

  • I would like to try a few things out on this side. Just so we're on the same page which version of SQL Server and SSIS are we on?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • 2008 SP2 for both.

    Thanks,

  • when doing it in T-SQL it is faster to loop through each row to get the XML than to include the NTEXT (Xml) column in the original query.

    I could see that if you were using a while loop in T-SQL to iterate over the ids and then getting the XML/NTEXT into a variable from the table using the id for each iteration...but in SSIS I think you'll want to pull all the data into the recordset with the initial query.

    My issue is I'm not sure how to then process the XML once I have it. Can I use adn Object variable and append each XML document to it, or do I need to process each XML document within the foreach loop?

    I went for:

    1. Get id and xml (as ntext) from database into ADO recordset using a single query.

    2. Use script task to process recordset as follows for each row:

    - cast ntext string as .NET XML document

    - work on XML document as needed

    - options: store XML object as string back to recordset in original column for later use in SSIS package -or- immediately update DB with revised xml data using id for that row


    Proof of concept:

    Setup a table with an id and some xml stored as ntext. I used AdventureWorks data but you have real data to work with.

    USE tempdb

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.some_stuff')

    AND type IN (N'U') )

    DROP TABLE dbo.some_stuff ;

    GO

    CREATE TABLE dbo.some_stuff

    (

    some_stuff_id INT NOT NULL

    PRIMARY KEY,

    xml_data NTEXT

    ) ;

    GO

    INSERT INTO dbo.some_stuff

    (

    some_stuff_id,

    xml_data

    )

    SELECT CustomerID,

    CAST(CAST(Demographics AS VARCHAR(MAX)) AS NTEXT)

    FROM AdventureWorks2008R2.Sales.Store

    GO

    See the attached SSIS package.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • I started to look at this and move toward testing it, but then the boss decided we could use another system which does not have the XML to be the source for this project. The way we are doing it there returns the data in about 1 second (including display time).

    I may still doing some playing with this on my own time though. I think it will be an interesting exercise.

    Thanks for the effort it definitely pointed me in the right direction.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply