Extract the titles from XML

Question

Post reply

Extract the titles from XML

Steve Jones - SSC Editor

SSC Guru

Points: 741285
More actions
July 12, 2015 at 2:18 am

#73646

Comments posted to this topic are about the item Extract the titles from XML

Viewing 15 posts - 1 through 15 (of 47 total)

You must be logged in to reply to this topic. Login to reply

Hany Helmy SSChampion Points: 13921 More actions · Answer 1

Good Question, thanx.

Thanks & Best Regards,
Hany Helmy
SQL Server Database Consultant

Bob JH Cullen SSCrazy Points: 2082 More actions · Answer 2

Er, all which three books of the four that are in the data .... ?

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".

As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.

People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ed Wagner SSC Guru Points: 287026 More actions · Answer 4

And here I thought I'd be the first one to comment that I don't like XML. 😀

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

TomThomson SSC Guru Points: 104768 More actions · Answer 5

Jeff Moden (7/13/2015)
The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".
As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.
People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

+100. Or maybe I mean + a googol.

The only reason to use XML is that one has some data whose structure can't reasonably and efficiently be represented as a table in an RDBMS but can be represented efficiently in XML. When the XML representation of the data is a tree with data only at the leaf nodes and every leaf node has the same set of attributes it is clearly trivial, not just reasonable and efficient, to represent the data in the leaf nodes as rows in a relational table whose column names match teh attribute names for thos leaf nodes. It is also extremely clear that any XML representation of that data is grossly inefficient both in storage space and in processing cost. So the data used in this QotD is a perfect exanple of data for which XML representation shouldn't even be considered, let alone used.

Nevertheless, it is a nice easy question. Very pleasant to get two whole points for remembering "start at the ground and work down" (or "start at the ground and work up").

edit: I see that more than half of the first seven comments are from people who have not fallen for the XML hype. I wonder if that'll stay above 50% as more comments come in.

Tom

Xavon SSCrazy Points: 2280 More actions · Answer 6

Ed Wagner (7/13/2015)
And here I thought I'd be the first one to comment that I don't like XML. 😀

Nope, lots of people hate XML. (raises hand)

I messed it up; didn't pay close enough attention to the XML string.

SQLRNNR SSC Guru Points: 281344 More actions · Answer 7

TomThomson (7/13/2015)
Jeff Moden (7/13/2015)
The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".
As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.
People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.
+100. Or maybe I mean + a googol.
The only reason to use XML is that one has some data whose structure can't reasonably and efficiently be represented as a table in an RDBMS but can be represented efficiently in XML. When the XML representation of the data is a tree with data only at the leaf nodes and every leaf node has the same set of attributes it is clearly trivial, not just reasonable and efficient, to represent the data in the leaf nodes as rows in a relational table whose column names match teh attribute names for thos leaf nodes. It is also extremely clear that any XML representation of that data is grossly inefficient both in storage space and in processing cost. So the data used in this QotD is a perfect exanple of data for which XML representation shouldn't even be considered, let alone used.
Nevertheless, it is a nice easy question. Very pleasant to get two whole points for remembering "start at the ground and work down" (or "start at the ground and work up").
edit: I see that more than half of the first seven comments are from people who have not fallen for the XML hype. I wonder if that'll stay above 50% as more comments come in.

Not a fan here either.

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

TomThomson (7/13/2015)
Jeff Moden (7/13/2015)
The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".
As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.
People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.
+100. Or maybe I mean + a googol.
[font="Arial Black"]The only reason to use XML is that one has some data whose structure can't reasonably and efficiently be represented as a table in an RDBMS but can be represented efficiently in XML.[/font] When the XML representation of the data is a tree with data only at the leaf nodes and every leaf node has the same set of attributes it is clearly trivial, not just reasonable and efficient, to represent the data in the leaf nodes as rows in a relational table whose column names match teh attribute names for thos leaf nodes. It is also extremely clear that any XML representation of that data is grossly inefficient both in storage space and in processing cost. So the data used in this QotD is a perfect exanple of data for which XML representation shouldn't even be considered, let alone used.
Nevertheless, it is a nice easy question. Very pleasant to get two whole points for remembering "start at the ground and work down" (or "start at the ground and work up").
edit: I see that more than half of the first seven comments are from people who have not fallen for the XML hype. I wonder if that'll stay above 50% as more comments come in.

I absolutely agree with all the above. The only thing is (maybe due to my current state of caffeine deprecation and the fact that I've been up all night), I can't actually envision any data that will ultimately be targeted for an RDBMS that can't easily be represented is some other form, any and all of which would likely have at least the efficiency gains that I posted in my first email.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

chgn01 SSCarpal Tunnel Points: 4420 More actions · Answer 9

try './/Book'

--------------------------------------
;-)“Everything has beauty, but not everyone sees it.” ― Confucius

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 10

chgn01 (7/13/2015)
try './/Book'

Will work but it's far more expensive to traverse the structure than to do a direct reference.

😎

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 11

Jeff Moden (7/13/2015)
The correct answer isn't actually available. The correct answer should be to "hunt down the moroff that used XML to transmit purely flat data and introduce him to high velocity pork chops".
As it is and without the leading spaces, this file contains 487 characters. If it were converted to a columnar flat file without headers, it would reduce to just 167 characters to be roughly only 1/3rd the size. Adding a single row of headers wouldn't do much to change that.
People crank on about how slow I/O is. You won't find hardware that loads your data in a third of the time so make sure that it's mostly data you're sending or receiving instead of the tag bloat in XML especially for such flat data.

Shall we set up some tests to see how these two actually compare?

😎

SQL_Hunt SSC-Dedicated Points: 33518 More actions · Answer 12

SQL_Hunt

SSC-Dedicated

Points: 33518

July 13, 2015 at 10:11 am

#1812065

Nice Question. I scored the maximum. 🙂

Thanks.

Mikael Eriksson SE SSCommitted Points: 1706 More actions · Answer 13

I guess this is the preferred form.

SELECT

col.value('(Title/text())[1]', 'varchar(50)') AS 'Book'

FROM @x.nodes('/root/Books/Book') a(col)

Using text() in the values clause removes one call to a table valued function and the UDX operator that is there to take care of mixed content XML.

This is faster still if you only want the Title node.

SELECT

col.value('.', 'varchar(50)') AS 'Book'

FROM @x.nodes('/root/Books/Book/Title/text()') a(col)

There is no difference in query plan or performance between using //Book .//Book or /root/Books/Book in the nodes function.

SQLRNNR SSC Guru Points: 281344 More actions · Answer 14

Mikael Eriksson SE (7/13/2015)
I guess this is the preferred form.
SELECT
col.value('(Title/text())[1]', 'varchar(50)') AS 'Book'
FROM @x.nodes('/root/Books/Book') a(col)
Using text() in the values clause removes one call to a table valued function and the UDX operator that is there to take care of mixed content XML.
This is faster still if you only want the Title node.
SELECT
col.value('.', 'varchar(50)') AS 'Book'
FROM @x.nodes('/root/Books/Book/Title/text()') a(col)
There is no difference in query plan or performance between using //Book .//Book or /root/Books/Book in the nodes function.

Thanks. In support of that, here is a little setup to show it.

DECLARE @x XML =

N'<root>

<Books>

<Book>

<Title>Personal</Title>

<Author>Lee Child</Author>

<Publisher Pubdate="20150428">Dell</Publisher>

</Book>

<Book>

<Title>Never Go Back</Title>

<Author>Lee Child</Author>

<Publisher Pubdate="20120312">Dell</Publisher>

</Book>

<Book>

<Title>Ready Player One</Title>

<Author>Ernest Cline</Author>

<Publisher Pubdate="20110416">Broadway</Publisher>

</Book>

<Book>

<Title>The Martian</Title>

<Author>Andy Weir</Author>

<Publisher Pubdate="20140211">Broadway</Publisher>

</Book>

</Books>

</root>'

SET STATISTICS TIME ON;

--cost 135.66

SELECT col.value('data(Title[1])', 'varchar(50)') AS 'Book'

FROM @x.nodes('root/Books/Book') a ( col );

PRINT '///////////////////////////////////////////////';

--cost 2.23

SELECT col.value('.', 'varchar(50)') AS 'Book'

FROM @x.nodes('/root/Books/Book/Title/text()') a ( col );

PRINT '///////////////////////////////////////////////';

--cost 2.23

SELECT col.value('.', 'varchar(50)') AS 'Book'

FROM @x.nodes('/.//Book/Title/text()') a ( col );

SET STATISTICS TIME OFF;

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events