Using T-SQL to cross-check lookup lists in XML format (Second attempt)

  • Hello experts,

    First - apologies if this post appears twice. I tried to the edit the SQL code sample and for some reason my whole post disappeared or got turned invisible.

    I'm wrestling with a way to accomplish the following. I have been able to cobble together most of what I need (ugly as it is - aesthetic improvements are most welcome). However, I need help with getting this to a completed and repeatable state.

    My goal is to be able to run a T-SQL script that can:

    1. extract a number of lookup lists from an XML document
    2. compare the values for each of those lookup lists against what we have stored in local lookup tables
    3. report the difference for any mismatches or missing values on either side

    Below is the sample code that you should be able to run to create a sample XML table, followed by the XML script that I have so far. It successfully goes through the XML and spits out some of what I need, but not correctly. Specifically:

    1. I extract some of the elements, but not necessarily the right way
    2. If the script tries to process multiple rows from the sample table, the output is all out of order
    3. I would like to neatly extract any given list from its starting element (<AddedColumn>...) to its ending element (</AddedColumn>...)  to store/compare to its corresponding table but at the moment have no confidence that the script can "know" that it's getting the proper elements and sub-elements without jumbling it all up.

    Additionally, if you take a closer look at the XML in the table column, you will notice that certain sets of elements consist of elements that together make up a lookup list (ColumnType List), but other sets are free-form text fields (ColumnType Text) and in the data I receive the two types appear in alternating fashion. In other words, this XML document stores all the additional columns that we have added to the external system but not neatly separated into first lookup lists then free text fields.

    So I would like to process only the lookup lists, because only those have corresponding lookup tables that we need to keep in sync.

    Sorry for this long description but I think it's needed to point out the essential complications that are giving me trouble. I haven't worked with XML or XQuery a lot so I suspect I'm missing an elegant solution to all of this.

     

    --===== Create sample table.
    /****** Object: Table [dbo].[SampleXML] Script Date: 5/28/2019 4:50:55 PM ******/
    SET ANSI_NULLS ON
    GO

    SET QUOTED_IDENTIFIER ON
    GO

    CREATE TABLE [dbo].[SampleXML](
    [AddedXMLDescription] [varchar](100) NOT NULL,
    [AddedColumnXML] [varchar](max) NULL,
    PRIMARY KEY CLUSTERED
    (
    [AddedXMLDescription] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    --===== Insert data.
    BEGIN TRANSACTION;
    INSERT INTO [dbo].[SampleXML]([AddedXMLDescription], [AddedColumnXML])
    SELECT N'Sample XML', N'<AddedColumn>
    <ColumnName>Shape</ColumnName>
    <ColumnType>List</ColumnType>
    <ColumnValues>
    <string>Circle</string>
    <string>Square</string>
    <string>Triangle</string>
    <string>Pentagon</string>
    </ColumnValues>
    </AddedColumn>
    <AddedColumn>
    <ColumnName>Favorite Dessert</ColumnName>
    <ColumnType>Text</ColumnType>
    <FieldValues />
    </AddedColumn>
    <AddedColumn>
    <ColumnName>Currency</ColumnName>
    <ColumnType>List</ColumnType>
    <ColumnValues>
    <string>Dollar (US)</string>
    <string>Dollar (CAN)</string>
    <string>Euro</string>
    <string>Yen</string>
    </ColumnValues>
    </AddedColumn>'
    COMMIT;
    RAISERROR (N'[dbo].[SampleXML]: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;
    GO

    --===== Run script - the one that is giving me a headache :-).
    --===== Set two variables, varchar(max), one xml.
    DECLARE @FixedXML AS VARCHAR(MAX);
    DECLARE @x AS XML;

    --===== Create temp table to hold the results.
    IF OBJECT_ID('tempdb..#TempAddedXML') IS NOT NULL
    DROP TABLE #TempAddedXML;

    CREATE TABLE #TempAddedXML
    (
    AddedXMLDescription VARCHAR(100),
    LookupListName VARCHAR(MAX),
    parentElementName VARCHAR(MAX),
    elementName VARCHAR(MAX),
    elementValue VARCHAR(MAX)
    );

    --===== Cursor
    DECLARE @AddedXMLDescription AS VARCHAR(100);
    DECLARE @LookupListName AS VARCHAR(MAX);

    DECLARE @AddedXMLCursor CURSOR;

    SET @AddedXMLCursor = CURSOR FOR
    SELECT AddedColumnXML
    FROM SampleXML;

    OPEN @AddedXMLCursor;

    FETCH NEXT FROM @AddedXMLCursor
    INTO @AddedXMLDescription;

    WHILE @@FETCH_STATUS = 0
    BEGIN

    --===== Assign the table column data to the varchar(max) variable.
    SELECT @FixedXML = AddedColumnXML
    FROM SampleXML;

    --===== I initially got the error below:
    /* Msg 9402, Level 16, State 1, Line 11 XML parsing: line 1, character 38, unable to switch the encoding */

    --===== Replace the UTF encoding to fix the error, per this forum comment:
    /*
    --https://social.msdn.microsoft.com/Forums/sqlserver/en-US/a87f906e-c2d4-4fe0-958c-69367f71d675/xml-parsing-line-1-character-38-unable-to-switch-the-encoding?forum=sqlreplication
    */
    SELECT @FixedXML = REPLACE(@FixedXML, '"UTF-8"', '"UTF-16"');

    --===== Convert the varchar(max) variable data to nvarchar(max) then to xml.
    SET @x = CONVERT(XML, CONVERT(NVARCHAR(MAX), @FixedXML));

    --===== Output the result. For testing/debugging.
    select @x;

    --===== Declare another xml variable.
    DECLARE @xml XML;
    SET @xml = @x;

    -- Show all elements
    /* Ref.:
    --https://social.technet.microsoft.com/Forums/en-US/c154ef94-72fb-4511-99c1-cf701bda5e2e/query-xml-column-to-extract-child-nodes-into-multiple-rows?forum=sqlxml
    */
    INSERT INTO #TempAddedXML ( AddedXMLDescription, LookupListName,
    parentElementName, elementName, elementValue )
    SELECT @AddedXMLDescription, NULL,
    x.y.value('local-name(..)', 'VARCHAR(MAX)') AS parentElementName,
    x.y.value('local-name(.)', 'VARCHAR(MAX)') AS elementName,
    x.y.value('.', 'VARCHAR(MAX)') AS elementValue
    FROM @xml.nodes('//*[text()]') AS x(y);

    FETCH NEXT FROM @AddedXMLCursor
    INTO @AddedXMLDescription;

    END;

    CLOSE @AddedXMLCursor;
    DEALLOCATE @AddedXMLCursor;

    --===== Check the results, ranking with a number in each row.
    SELECT DENSE_RANK() OVER (ORDER BY AddedXMLDescription) AS Ranking,
    t.AddedXMLDescription, t.parentElementName, t.elementName,
    t.elementValue, t.LookupListName
    FROM#TempAddedXML t
    ORDER BY Ranking;

    If you need any more info, let me know. Thanks for any help.

    -- webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • There was a syntax error in your code, so it wouldn't run.  I had to guess at what you wanted.

    1. WHY are you storing XML data as VARCHAR(max)?
    2. WHY are you using a CURSOR?

      1. It also appears that you are using the CURSOR incorrectly.  You're pulling the same data every time you run the CURSOR.

    Here is what I came up with. (I changed your permanent table to a temporary table.) The actual query itself is very simple.

    CREATE TABLE #SampleXML(
    AddedXMLDescription varchar(100) NOT NULL,
    AddedColumnXML XML NULL,
    PRIMARY KEY CLUSTERED
    (
    AddedXMLDescription ASC
    )
    )

    --===== Insert data.
    BEGIN TRANSACTION;
    INSERT INTO #SampleXML(AddedXMLDescription, AddedColumnXML)
    SELECT N'Sample XML', N'<AddedColumn>
    <ColumnName>Shape</ColumnName>
    <ColumnType>List</ColumnType>
    <ColumnValues>
    <string>Circle</string>
    <string>Square</string>
    <string>Triangle</string>
    <string>Pentagon</string>
    </ColumnValues>
    </AddedColumn>
    <AddedColumn>
    <ColumnName>Favorite Dessert</ColumnName>
    <ColumnType>Text</ColumnType>
    <FieldValues />
    </AddedColumn>
    <AddedColumn>
    <ColumnName>Currency</ColumnName>
    <ColumnType>List</ColumnType>
    <ColumnValues>
    <string>Dollar (US)</string>
    <string>Dollar (CAN)</string>
    <string>Euro</string>
    <string>Yen</string>
    </ColumnValues>
    </AddedColumn>'
    COMMIT;
    RAISERROR (N'#SampleXML: Insert Batch: 1.....Done!', 10, 1) WITH NOWAIT;

    /* Final Query */
    SELECT c.value('../../ColumnName[1]', 'VARCHAR(50)') AS ColumnName
    ,c.value('.', 'VARCHAR(50)') AS Value
    FROM #SampleXML s
    CROSS APPLY s.AddedColumnXML.nodes('/AddedColumn[ColumnType="List"]/ColumnValues/string') T(c)

    DROP TABLE #SampleXML

    Note that the actual query is only four lines of code.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks for you reply, Drew.

    Quick question - could you send me the syntax error that you got? I just ran the code and did not get any errors.

    To answer your other questions:

    WHY are you storing XML data as VARCHAR(max)?

    The XML in question is produced by a third-party vendor, and for some reason, I had to cast it as VARCHAR(MAX) and then back to XML in order to get the entire contents of the XML. I've just started working with XML in this kind of real-world coding so honestly I am not sure why all of the XML document was not showing to begin with.

    WHY are you using a CURSOR?

    I used a cursor in the sample code just to better understand what was happening with the XQuery commands. But I know the longstanding criticisms of cursors and would certainly be open to rewriting the code in a set-based way (if that's possible).

    It also appears that you are using the CURSOR incorrectly.  You're pulling the same data every time you run the CURSOR.

    Thanks - I think that may be an artifact of the sample I used, if I understand your comment correctly. If not, what do you mean by "the same data"? My goal is to pull all of the XML documents in the table each time in order to extract and compare the lookup lists, which is why the cursor query has no WHERE clause (that is, SELECT AddedColumnXML

    FROM SampleXML;).

    Thanks again!

    -- webrunner

     

     

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply