October 25, 2012 at 12:41 pm
Folks:
I need help with XML query below. It is working fine but runs very slow if the #output table has more than 20K records.
We get the output using windows powershell and that's the reason you will see the data in this format.
CREATE TABLE #output
(line varchar(255))
INSERT INTO #Output values ('<?xml version="1.0"?>')
INSERT INTO #Output values ('<Objects>')
INSERT INTO #Output values (' <Object>')
INSERT INTO #Output values (' <Property Name="Folder">\\SERVER1\Clients\APHE\Accounts</Pr')
INSERT INTO #Output values ('operty>')
INSERT INTO #Output values (' <Property Name="Access">Modify, Synchronize</Property>')
INSERT INTO #Output values (' <Property Name="User">Domain1\User3</Property>')
INSERT INTO #Output values (' </Object>')
INSERT INTO #Output values (' <Object>')
INSERT INTO #Output values (' <Property Name="Folder">\\SERVER1\Clients\APHE\Location</Property>')
INSERT INTO #Output values (' <Property Name="Access">Modify, Synchronize</Property>')
INSERT INTO #Output values (' <Property Name="User">Domain1\User3</Property>')
INSERT INTO #Output values (' </Object>')
INSERT INTO #Output values (' <Object>')
INSERT INTO #Output values (' <Property Name="Folder">\\SERVER1\Clients\BPE\Accounts</Pr')
INSERT INTO #Output values ('operty>')
INSERT INTO #Output values (' <Property Name="Access">ReadAndExecute, Synchronize</Property>')
INSERT INTO #Output values (' <Property Name="User">Domain1\User2</Property>')
INSERT INTO #Output values (' </Object>')
INSERT INTO #Output values (' <Object>')
INSERT INTO #Output values (' <Property Name="Folder">\\SERVER1\Clients\BPE\Location</Pr')
INSERT INTO #Output values ('operty>')
INSERT INTO #Output values (' <Property Name="Access">Modify, Synchronize</Property>')
INSERT INTO #Output values (' <Property Name="User">Domain1\User1</Property>')
INSERT INTO #Output values (' </Object>')
INSERT INTO #Output values (' <Object>')
INSERT INTO #Output values (' <Property Name="Folder">\\SERVER1\Clients\BPE\Comments</Pr')
INSERT INTO #Output values ('operty>')
INSERT INTO #Output values (' <Property Name="Access">ReadAndExecute, Synchronize</Property>')
INSERT INTO #Output values (' <Property Name="User">Domain1\User2</Propert')
INSERT INTO #Output values ('y>')
INSERT INTO #Output values (' </Object>')
INSERT INTO #Output values (' <Object>')
INSERT INTO #Output values (' <Property Name="Folder">\\SERVER1\Clients\PRD\Accounts</Pr')
INSERT INTO #Output values ('operty>')
INSERT INTO #Output values (' <Property Name="Access">ReadAndExecute, Synchronize</Property>')
INSERT INTO #Output values (' <Property Name="User">Domain1\User3</Propert')
INSERT INTO #Output values ('y>')
INSERT INTO #Output values (' </Object>')
INSERT INTO #Output values ('</Objects>')
DELETE #output WHERE line IS NULL
DECLARE @doc varchar(max)
SET @doc = ''
DECLARE @line varchar(255)
DECLARE xml_cursor CURSOR
FOR SELECT line FROM #output
OPEN xml_cursor
FETCH NEXT FROM xml_cursor INTO @line
WHILE @@FETCH_STATUS = 0
BEGIN
SET @doc = @doc + @line
FETCH NEXT FROM xml_cursor INTO @line
END
CLOSE xml_cursor
DEALLOCATE xml_cursor
SELECT
item.ref.value('(Property/text())[1]', 'nvarchar(200)') AS Folder
,item.ref.value('(Property/text())[2]', 'nvarchar(500)') AS Access
,item.ref.value('(Property/text())[3]', 'nvarchar(200)') AS UserGroup
FROM (SELECT CAST(@doc AS XML) AS feedXml) feeds(feedXml)
CROSS APPLY feedXml.nodes('/Objects/Object') AS item(ref)
Thanks !
October 25, 2012 at 1:02 pm
Try this instead, see if it's any faster:
SET NOCOUNT ON;
IF OBJECT_ID(N'tempdb..#output') IS NOT NULL
DROP TABLE #output;
CREATE TABLE #output (line VARCHAR(255))
INSERT INTO #Output
VALUES ('<?xml version="1.0"?>'),
('<Objects>'),
(' <Object>'),
(' <Property Name="Folder">\\SERVER1\Clients\APHE\Accounts</Pr'),
('operty>'),
(' <Property Name="Access">Modify, Synchronize</Property>'),
(' <Property Name="User">Domain1\User3</Property>'),
(' </Object>'),
(' <Object>'),
(' <Property Name="Folder">\\SERVER1\Clients\APHE\Location</Property>'),
(' <Property Name="Access">Modify, Synchronize</Property>'),
(' <Property Name="User">Domain1\User3</Property>'),
(' </Object>'),
(' <Object>'),
(' <Property Name="Folder">\\SERVER1\Clients\BPE\Accounts</Pr'),
('operty>'),
(' <Property Name="Access">ReadAndExecute, Synchronize</Property>'),
(' <Property Name="User">Domain1\User2</Property>'),
(' </Object>'),
(' <Object>'),
(' <Property Name="Folder">\\SERVER1\Clients\BPE\Location</Pr'),
('operty>'),
(' <Property Name="Access">Modify, Synchronize</Property>'),
(' <Property Name="User">Domain1\User1</Property>'),
(' </Object>'),
(' <Object>'),
(' <Property Name="Folder">\\SERVER1\Clients\BPE\Comments</Pr'),
('operty>'),
(' <Property Name="Access">ReadAndExecute, Synchronize</Property>'),
(' <Property Name="User">Domain1\User2</Propert'),
('y>'),
(' </Object>'),
(' <Object>'),
(' <Property Name="Folder">\\SERVER1\Clients\PRD\Accounts</Pr'),
('operty>'),
(' <Property Name="Access">ReadAndExecute, Synchronize</Property>'),
(' <Property Name="User">Domain1\User3</Propert'),
('y>'),
(' </Object>'),
('</Objects>');
SELECT item.ref.value('(Property/text())[1]', 'nvarchar(200)') AS Folder,
item.ref.value('(Property/text())[2]', 'nvarchar(500)') AS Access,
item.ref.value('(Property/text())[3]', 'nvarchar(200)') AS UserGroup
FROM (SELECT CAST((SELECT '' + line
FROM #output
FOR XML PATH(''),
TYPE).value('.[1]', 'varchar(max)') AS XML) AS FormattedXML)
AS Sub
CROSS APPLY FormattedXML.nodes('/Objects/Object') AS item (ref);
Ignore the part that constructs the test data, of course. It's just the final query that matters.
It uses a For XML Path trick that concatenates strings together. It's documented behavior, and nicely replaces the cursor.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 25, 2012 at 1:20 pm
I don't see much difference in performance. I have heard that OPENXML is faster, never used though. Do you think using OPENXML will be much faster? If so, how will I use OPENXML here?
With 31K records in the #output table it takes 40 mins.
Thanks for all your help.
October 25, 2012 at 2:08 pm
Honestly, I'm used to using OpenRowset's Bulk option to pull XML files into an XML column in a single step. I haven't used OpenXML in years.
Once the XML is in a table in an XML column, I can have XML indexes on it and all that kind of thing. I've processed some fairly large files (multiple Mb per file) pretty quickly that way.
So, I can't really speak to how well OpenXML will work on this.
Another option I've used successfully is SSIS to import and normalize XML data. But I've only done that with fairly small files. Usually no more than about 100kb for the ASCII-format XML files.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
October 25, 2012 at 2:36 pm
I just modified the SQL you sent and put the XML data in # table and wow it is much faster.
INSERT INTO #xmlData
SELECT CAST((SELECT '' + line
FROM #output
FOR XML PATH(''),
TYPE).value('.[1]', 'varchar(max)') AS XML) AS FormattedXML
SELECT item.ref.value('(Property/text())[1]', 'nvarchar(200)') AS Folder,
item.ref.value('(Property/text())[2]', 'nvarchar(500)') AS Access,
item.ref.value('(Property/text())[3]', 'nvarchar(200)') AS UserGroup
FROM #xmlData
AS Sub
CROSS APPLY XmlData.nodes('/Objects/Object') AS item (ref);
Thanks for all your help !
October 29, 2012 at 6:37 am
Excellent! Glad I could help.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply