Reading XML performance - openxml or xquery

  • My original post didn't get the responses I was looking for. So here's an example of the performance problem I have the XML

    declare @xml xml =

    '<Furniture>

    <Table>

    <Color>Blue</Color>

    </Table>

    <Table>

    <Color>Green</Color>

    </Table>

    <Chair>

    <Color>Yellow</Color>

    </Chair>

    <Table>

    <Color>Red</Color>

    </Table>

    <Chair>

    <Color>Magenta</Color>

    </Chair>

    </Furniture>'

    and I want to retrieve the type of furniture and the position of the furniture in the XML. I'm currently running this query:

    SELECT

    ID = R.value('for $i in . return count(../*[. << $i]) + 1', 'INT'),

    Type = 'Table',

    Color = R.value('Color[1]','varchar(10)')

    FROM @xml.nodes( '/Furniture/Table' ) as a(R)

    union

    SELECT

    ID = R.value('for $i in . return count(../*[. << $i]) + 1', 'INT'),

    Type = 'Chair',

    Color = R.value('Color[1]','varchar(10)')

    FROM @xml.nodes( '/Furniture/Chair' ) as a(R)

    It gives me the correct answer:

    ID Type Color

    ---- ----- ----------

    1 Table Blue

    2 Table Green

    3 Chair Yellow

    4 Table Red

    5 Chair Magenta

    The problem is that if I have thousands of tables and chairs, the performance is severely degraded by the loop in getting the position of the furniture in the XML. Is there a more efficient way to get the position of the furniture (ID in the result set) in the XML?

    Thanks

    ====

    I have several XMLs that we keep receiving and need to be processed. Some are small (3-10MB) and some are large (200MB+). The large XML has a mostly flat structure and it reads relatively fast with either openxml or xquery. The smaller XMLs have multiple hierarchical levels and reading it is very time consuming. I tried OPENXML from an xml variable, XQuery from an xml variable and XQuery from an indexed XML column in database (primary index, secondary path index and secondary property index). The file is roughly 3 MB but it takes over 6 minutes to read it.

    Is there a way that XMLs can be read by SQL with reasonable performance?

    Thanks

  • That all depends on the precise nature of your query.

    There are several ways to query the data from an xml document, and some perform better than others for different documents.

    Perhaps if you share your query we could offer more specific advice.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thanks and here's a query:

    SELECT

    SequenceID = R.value('for $i in . return count(../*[. << $i]) + 1', 'INT'),

    ObjectID = R.value('identifier[1]','VARCHAR(50)'),

    Name = R.value('name[1]','VARCHAR(50)'),

    Type = R.value('type[1]','VARCHAR(50)'),

    IsCool = R.value('isCool[1]','VARCHAR(3)'),

    StartDateTime = R.value('startDate[1]','DATETIME2(0)'),

    EndDateTime = R.value('endDate[1]','DATETIME2(0)')

    FROM XMLTable X cross apply Payload.nodes( '/Root/Payload/ListedGadgets' ) AS a(R)

    WHERE X.IndexID = @ID

    I'm reading from table XMLTable where the XML column is named Payload for IndexID = @ID. The XML contains roughly 3,500 gadgets. Thanks again for the help.

  • Is this one of the "flat" xmls that is relatively fast?

    Because it doesn't look like a multi-level one from that query?

    Aside from that, why are you using that flwor query to get a sequence number? row_number() would work wouldn't it? and it would be much faster.

    One small optimisation which will speed things up a lot of the time is to extract the text() from a node, so instead of this:

    ObjectID = R.value('identifier[1]','VARCHAR(50)'),

    Try this:

    ObjectID = R.value('(identifier/text())[1]','VARCHAR(50)'),

    It may seem odd that introducing a function into the value would speed it up, but it can make a big difference on a large dataset.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • I didn't spot that you had provided sample data and a query...I'll take a look, but it does seem to be a non-optimal approach...

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • SELECT

    ID = row_number() over(order by @@spid) ,

    Type = R.value('local-name(.)','varchar(10)'),

    Color = R.value('(Color/text())[1]','varchar(10)')

    FROM @xml.nodes( '/Furniture/*' ) as a(R)

    How does this work for you?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Thanks.

    It seems to have done the trick but I need to test on my complex XML. One thing I didn't get is the "order by @@spid" in teh row_number(). How does it guarantee the right order?

  • It doesn't guarantee the right order any more than your count() did, unfortunately - well at least not to my knowledge.

    I have yet to see any documentation that guarantees that count() method will return the order of the nodes in the document, and there is none to support the row_number(), but both "appear" to work.

    If you need to guarantee the order, you should include it in the XML document as an attribute of each Chair/Table.

    If you cannot include a node position in the xml and prefer to trust something a bit more solid and reliable, you are going to lose out on speed again, but you could use a numbers/tally table to selectively pull each node, which would be guaranteed ordering...

    SELECT

    ID = T.N ,

    Type = R.value('local-name(.)','varchar(10)'),

    Color = R.value('(Color/text())[1]','varchar(10)')

    FROM Tally T

    cross apply @xml.nodes( '(/Furniture/*)[sql:column("T.N")]' ) as a(R)

    where T.N <= (select @xml.value('count(/Furniture/*)', 'int'));

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 8 posts - 1 through 7 (of 7 total)

    You must be logged in to reply to this topic. Login to reply