Check 'exist', case insensitively in Xquery

  • I am using SQL Server 2008 R2. Below is the  query  used to get the count  and the time taken for that query.

    --time taken for table without any xml indexing
    SELECT COUNT(*) from [largexml-schema-nodoc].dbo.Table_noindex WHERE bookdata.exist('(/bookstore/book/author)[fn:upper-case(.)=fn:upper-case(''Erik T. Ray'')]') = 1
    --CPU time = 10797 ms, elapsed time = 10994 ms.
    SELECT COUNT(*) from [largexml-schema-nodoc].dbo.Table_noindex WHERE bookdata.exist('(/bookstore/book/author)[.=''Erik T. Ray'']') = 1
    --CPU time = 3312 ms, elapsed time = 3298 ms.


    --time taken for table with all types secondary xml indexing
    SELECT COUNT(*) from [largexml-schema-nodoc].dbo.tblallindex WHERE bookdata.exist('(/bookstore/book/author)[fn:upper-case(.)=fn:upper-case(''Erik T. Ray'')]') = 1
    --CPU time = 7282 ms, elapsed time = 7465 ms.
    SELECT COUNT(*) from [largexml-schema-nodoc].dbo.tblallindex WHERE bookdata.exist('(/bookstore/book/author)[.=''Erik T. Ray'']') = 1
    --CPU time = 453 ms, elapsed time = 461 ms.
    *****100000 rows with XML col value ****
    "<bookstore>
    <book category=""COOKING"">
    <title lang=""en"">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30</price>
    </book>
    <book category=""CHILDREN"">
    <title lang=""en"">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
    </book>
    <book category=""WEB"">
    <title lang=""en"">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
    </book>
    <book category=""WEB"">
    <title lang=""en"">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
    </book>
    </bookstore>"

    Is there any option to check 'exist', case insensitively ?

  • XML is case sensitive by definition. There is no escape from it.

    Two points to mention.

    (1) You can try the following approach by using a different XPath predicate like below.

    (2) Additionally, SQL Server 2012 introduced a new 3rd type of XML index: Selective XML Index

    You can see a good example of it here:

    https://www.red-gate.com/simple-talk/sql/learn-sql-server/precision-indexing-basics-of-selective-xml-indexes-in-sql-server-2012/

    SQL

    -- DDL and sample data population, start
    DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, bookdata XML);
    INSERT INTO @tbl (bookdata) VALUES
    (N'<bookstore>
    <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30</price>
    </book>
    <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
    </book>
    <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
    </book>
    <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
    </book>
    </bookstore>');
    -- DDL and sample data population, end

    DECLARE @author VARCHAR(30) = 'Erik T. Ray';
    SET @author = UPPER(@author);

    -- slow
    SELECT COUNT(*)
    FROM @tbl
    WHERE bookdata.exist('(/bookstore/book/author)[fn:upper-case(.)=fn:upper-case(''Erik T. Ray'')]') = 1;

    -- much faster
    SELECT COUNT(*)
    FROM @tbl
    WHERE bookdata.exist('/bookstore/book/author/text()[upper-case(.) = sql:variable("@author")]') = 1;

    • This reply was modified 3 years, 7 months ago by  ykhabins.
  • Below query tested and time updated

    -- with ucase
    SELECT COUNT(*) from tblallindex
    WHERE bookdata.exist('/bookstore/book/author/text()[upper-case(.) = ''ERIK T. RAY'']') = 1;
    -- CPU time = 5781 ms, elapsed time = 5903 ms.

    -- without ucase
    SELECT COUNT(*) from tblallindex
    WHERE bookdata.exist('/bookstore/book/author/text()[. = ''Erik T. Ray'']') = 1;
    -- CPU time = 94 ms, elapsed time = 100 ms.

    text() doesn't solve (without schema case),

    when schema is enabled, text() is not allowed, but same speed as text().

  • Possibly a dumb question but would it be faster to do this on the application side?  Not sure how much data you are pulling (in MB/GB) if you just did a full data dump of the XML to the application layer and let the application handle searching and sorting the data.

    Or, alternatively, is it possible to take your XML and put it into relational and easy to use tables and just have a view that pulls the data back as XML for use by the application?

    I am not saying don't do it in SQL if it makes sense, but XML is rarely efficient in SQL Server and can be challenging to debug and get coded up JUST right (based on my experience). I am thinking that if you can pull the XML directly, pass it to the application and let the application handle the searching and sorting, it may be easier as I imagine that the application is parsing the XML into a format that it can handle (table, array(s), dictionary, list, etc) and .NET is generally good at working with its objects in memory.  Having the data in-memory on the application side means that processing doesn't need to re-query the database with each new search that is required on the XML.

    Now on the other hand, if the application requires XML input and it gets that from the database and changing the application is not an option, storing your data in the database in a table format with columns/tables like "Author" and "Title" rather than storing all of the data in a big XML blob will make searches easier as well as inserts and updates.  With your big blobs of XML, finding duplicate or near duplicate data becomes challenging.

    And if you had it all in the database in a nice relational manner with author names and book titles, your SQL queries for searching will be MUCH easier to write and maintain without needing to parse through XML.  And if the application needs an XML result, SQL can convert the table results to XML and the application would have no idea that the database design changed!

    Mind you, I know changing a "working" system into something maintainable like that can be challenging and time consuming and since it adds no "benefit" from an end user perspective, getting approvals to work on changes like that may not be approved by your boss.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Dear Brian Gale,

    The OP just needs to migrate off the current SQL Server 2008R2 to a newer version.

    Starting from SQL Server 2012, it has a very powerful new index: Selective XML Index.

    It allows to create an index based on the XPath expessions. Exactly what is needed for the scenario.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply