XML Stream Reader

  • Hi All,

    I am looking to try and figure out a solution to load and process large and complex XML files into SQL. The standard SSIS XML Source is unable to handle the complexity of the documents (3x XSD's and multiple namespaces and 100+ node possibilities). My preference would be to avoid writing the XML files into an XML blob as many of the files are 1GB+ and this adds additional complexities as well as having storage impacts.

    A possible solution which I came across was in Pentaho, where we could use the XML Input Stream (StAX) to produce a simplified node/attribute branching into SQL in a generic format (see first pic), this was generated based on the XML from the second pic. (samples are of a simple xml file, however this solution is scalable to very large and files with complex nesting etc).

    Is it possible to produce something similar with SSIS? I haven't had to do too much with XML other than relatively straight forward files, so when this one came up it threw me quite a bit.

    Does anyone know of a way to do this with SSIS or can you advise on how to create a customer Script Source to work this out?

    We are using SQL 2014 and are looking to upgrade to 2016 shortly, so a possible solution that will work with either would be ideal.

    Thanks for your help,

    Steve

  • The conventional wisdom is to use Linq to stream XML data. Note this article. I have never used Linq and have heard that it's slow for this; nonetheless, that's one option.

    Adam Aspin has a good way to do this using the SQLXML Bulk Loader[/url]. I did this a very long time ago and it seemed to work well. I was looking through the article I just posted and noticed this line:

    SQLXML Bulk Loader is best used in the following situations:

    ...When the XML source structure is relatively simple.

    Since you're dealing with a complex XML structure you may want to first transform the XML into a less complex structure using XSLT. SSIS has an XSLT transform task[/url]. If you are not familiar/comfortable with XSLT post some sample XML with the desired results here and I can show you how to do it.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank you for your reply, I have a feeling these files may be too much for XMLBulkLoader however I will have a look at the links you sent and see if I can get them to work. No doubt I will be back in touch shortly

  • IRL11 (8/30/2016)


    Thank you for your reply, I have a feeling these files may be too much for XMLBulkLoader however I will have a look at the links you sent and see if I can get them to work. No doubt I will be back in touch shortly

    This should get you a fair amount of the way there. This is a bit of brute force, but it does get you there:

    ;with structCTE as

    (select --distinct

    b.value('text()[1]','varchar(400)') as NodeValue,

    b.value('local-name(.)','varchar(400)') as Node,

    b.value('local-name(..)','varchar(400)') as l1,

    b.value('local-name(../..)','varchar(400)') as l2,

    b.value('local-name(../../..)','varchar(400)') as l3

    ,b.value('local-name(../../../..)','varchar(400)') as l4

    ,b.value('local-name(../../../../..)','varchar(400)') as l5

    ,b.value('local-name(../../../../../..)','varchar(400)') as l6

    ,b.value('local-name(../../../../../../..)','varchar(400)') as l7

    ,b.value('local-name(../../../../../../../..)','varchar(400)') as l8

    ,c.value('local-name(.)','varchar(400)') as attname

    ,c.value('.','varchar(400)') as attvalue

    from (select top 1 * from #RsXMLLoad) x

    cross apply [Message].nodes('//*') a(b)

    outer apply b.nodes('@*') b(c)

    ),

    XMLsummary as (

    select case when l1='' then ''

    when l2='' then l1+'/'

    when l3='' then l2+'/'+l1+'/'

    when l4='' then l3+'/'+l2+'/'+l1+'/'

    when l5='' then l4+'/'+l3+'/'+l2+'/'+l1+'/'

    when l6='' then l5+'/'+l4+'/'+l3+'/'+l2+'/'+l1+'/'

    when l7='' then l6+'/'+l5+'/'+l4+'/'+l3+'/'+l2+'/'+l1+'/'

    when l8='' then l7+'/'+l6+'/'+l5+'/'+l4+'/'+l3+'/'+l2+'/'+l1+'/'

    else l8+'/'+l7+'/'+l6+'/'+l5+'/'+l4+'/'+l3+'/'+l2+'/'+l1+'/'

    endxpath,

    case when l1='' then 1

    when l2='' then 2

    when l3='' then 3

    when l4='' then 4

    when l5='' then 5

    when l6='' then 6

    when l7='' then 7

    when l8='' then 8

    else 9

    endxml_element_level,

    case when l1='' then 0

    when l2='' then 1

    when l3='' then 2

    when l4='' then 3

    when l5='' then 4

    when l6='' then 5

    when l7='' then 6

    when l8='' then 7

    else 8

    endxml_parent_level,

    Node,

    nodevalue,nullif(attname,'') attname,attvalue

    from structCTE s)

    select ROW_NUMBER() over (order by (select null)) Rn,

    xml_parent_level,

    xml_element_level,

    xpath+Node xml_path,

    xpath xml_parent_path,

    Node xml_data_name,

    NodeValue ElementValue,

    Attname as AttributeName,

    AttValue as AttributeValue

    from XMLsummary

    Note: this is hardcoded to handle a max depth of 9 (to the "deepest element"). You could certainly add more levels if you need although I'd question the sanity of transport model if it goes a lot deeper.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply