Parsing huge XML file into a database table using sql server

  • I tried to parse a huge xml file using sql server (OPENROWSET) into a database table, but got an error that says duplicate attributes (author). The author column varies for each article, ranging from one author to upto six authors. The columns I am really interested in are author, title and journal. Below is a snippet of the xml data. Any suggestions will be highly appreciated. Thanks in advance.

    <?xml version="1.0" encoding="ISO-8859-1"?>

    <!DOCTYPE dblp SYSTEM "dblp.dtd">

    <dblp>

    <article mdate="2002-01-03" key="persons/CoddD74">

    <author>E. F. Codd</author>

    <author>C. J. Date</author>

    <title>Interactive Support for Non-Programmers: The Relational and Network Approaches.</title>

    <journal>IBM Research Report, San Jose, California</journal>

    <volume>RJ1400</volume>

    <month>June</month>

    <year>1974</year>

    </article>

    </dblp>

  • Hi... Have you tried something like below.. This works fine with your example xml:

    DECLARE @xml XML

    SELECT @xml = CONVERT(XML, (SELECT * FROM OPENROWSET(BULK N'c:\testxml.xml', SINGLE_BLOB) AS xmldocument), 2)

    SELECT t.c.value('(./text())[1]', 'varchar(30)')

    FROM @xml.nodes('//author') AS T(c)

  • Thanks for the response, it worked, but I would like the output to be in the formatt below. All the authors of a particular article should be together in a column, separate by comma, and the title as well as the journal should occupy the next two columns. The authors of a particular article, title and journal should be just one record. Thanks in advance for your suggestions.

    Author (s)

    E F Cod, Patric A V Hall, Markus Tresch

  • I wrote a blog on something similar a while back: http://www.olcot.co.uk/sql-blogs/using-xquery-to-create-a-single-xml-node-with-comma-separated-string-from-multiple-xml-nodes

    here is one of a few ways to comma separate the node author for you:

    DECLARE @xml XML

    SELECT @xml = CONVERT(XML, (SELECT * FROM OPENROWSET(BULK N'c:\testxml.xml', SINGLE_BLOB) AS xmldocument), 2)

    SELECT t.c.query('for $x in author

    return

    if (not( ($x) is (author[last()])[1] )) then

    concat($x/text()[1], ",")

    else

    string($x/text()[1])').value('.', 'varchar(200)')

    , t.c.value('(title/text())[1]', 'varchar(150)') AS 'title'

    , t.c.value('(journal/text())[1]', 'varchar(100)') AS 'journal'

    FROM @xml.nodes('//article') AS T(c)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply