November 12, 2012 at 11:35 pm
Dear All,
I have a Prod table with a million on rows having an ntext column to store XML.
I need to process something from that XML, but my query crashed in between as there are some invalid xml exists in some rows. So, i need to skip those. how can i?.
create table #temp (id int, xmldoc ntext)
insert into #temp values (1, '<parent><child></child></parent>')
insert into #temp values (2, '<parent><child>')
insert into #temp values (3, '<parent><child></child></parent>')
select * from #temp
where convert(xml,convert(nvarchar(max),xmldoc)).exist('(/parent)') = 1
I expect this to skip 2nd row instead an error.
Appreciating your helps.
November 14, 2012 at 11:47 am
This is a cursor-based "RBAR" function so if this is going to be a task that runs often it may be necessary to rewrite it using a Tally Table for improved performance...but I'll leave that up to someone else. I also include an optional XML "cleanup" function (below) that removes tabs and spaces. If that is not necessary, then obviously leaving it out will also improve performance.
/* Test data */
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
CREATE TABLE #temp (id INT,xmldoc NTEXT)
INSERT INTO #temp
VALUES
(1,'<parent><child1></child1></parent>')
INSERT INTO #temp
VALUES
(2,'<parent><child>')
INSERT INTO #temp
VALUES
(3,'<parent><child2></child2></parent>')
/* Process data */
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
DROP TABLE #TempTable
CREATE TABLE #TempTable (
[id] INT NOT NULL,
[xmldoc] NVARCHAR(4000) NULL,
PRIMARY KEY (ID))
DECLARE
@idoc INT
,@rootnode NVARCHAR(255)
,@ID INT
,@strXML NVARCHAR(4000)
,@errMsg NVARCHAR(255)
SET @rootnode = '/parent'
DECLARE ValidateXML CURSOR
FOR
SELECT
id
,xmldoc
FROM
#temp
OPEN ValidateXML
FETCH NEXT FROM ValidateXML INTO
@ID
,@strXML
WHILE @@FETCH_STATUS = 0
BEGIN
/* Optional but recommended to remove excess spaces and tabs. */
/* However, this may slow down the process considerably so use with care. */
SET @strXML = dbo.svfRemoveExcessSpacesFromXML(@strXML)
BEGIN TRY
EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML
INSERT INTO #TempTable
SELECT
id
,xmldoc
FROM
#temp
WHERE
ID = @ID
AND CONVERT(XML,CONVERT(NVARCHAR(MAX),@strXML)).exist('(/parent)') = 1
END TRY
BEGIN CATCH
SELECT @errMsg = ERROR_MESSAGE()
/* Comment out this insert to just skip rows with invalid XML */
INSERT INTO #TempTable
SELECT
@ID AS id
,'Invalid XML: '+@errMsg AS xmldoc
END CATCH
SET @strXML = NULL
FETCH NEXT FROM ValidateXML INTO
@ID
,@strXML
END
CLOSE ValidateXML
DEALLOCATE ValidateXML
SELECT
id
,xmldoc
FROM
#TempTable
OUTPUT:
idxmldoc
1<parent><child1></child1></parent>
2Invalid XML: The error description is 'The following tags were not closed: parent, child.'.
3<parent><child2></child2></parent>
Optional XML cleanup function:
CREATE FUNCTION dbo.svfRemoveExcessSpacesFromXML
(
@strXML NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
BEGIN
DECLARE
@STR NVARCHAR(MAX)
,@xml XML
SET @STR = @strXML
SET @STR = REPLACE(@str,CHAR(9),' ') -- convert tabs to spaces
SET @STR =
REPLACE(
REPLACE(
REPLACE(
,' ',' '+CHAR(7))
,CHAR(7)+' ','')
,CHAR(7),'')
SET @STR = REPLACE(@str,'> <','><')
RETURN @STR
/*
SELECT dbo.svfRemoveExcessSpacesFromXML(' hi thisxxxx is test for onr jh1 lljkl l l jl l ')
SELECT dbo.svfRemoveExcessSpacesFromXML('<soapenv:Body><BackgroundReports xmlns="http://www.cpscreen.com/schemas"> <BackgroundReportPackage type="report"><ProviderReferenceId>WPS-6308077</ProviderReferenceId><PackageInformation> <ClientReferences />')
*/
END
November 14, 2012 at 12:43 pm
SQL Mad Rafi (11/12/2012)
Dear All,I have a Prod table with a million on rows having an ntext column to store XML.
I need to process something from that XML, but my query crashed in between as there are some invalid xml exists in some rows. So, i need to skip those. how can i?.
create table #temp (id int, xmldoc ntext)
insert into #temp values (1, '<parent><child></child></parent>')
insert into #temp values (2, '<parent><child>')
insert into #temp values (3, '<parent><child></child></parent>')
select * from #temp
where convert(xml,convert(nvarchar(max),xmldoc)).exist('(/parent)') = 1
I expect this to skip 2nd row instead an error.
Appreciating your helps.
No offense but why in the world would you use ntext to hold xml data? We have an xml datatype and ntext is deprecated. You need to get this data into a table that will be supportable in the long run. The RBAR example is exactly what I would come up with for this. I would change the datatype to XML for your xml data. Run that to get your data in the proper datatypes, fix up the garbage that can't convert and change the process that loads this data.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply