April 2, 2008 at 6:18 am
Jeff Moden (4/1/2008)
Ok... here's my entry for the virtual beer...Looking forward to your entry 😉
No fair working at night... my wife won't let me get geeky at night.
April 2, 2008 at 12:13 pm
antonio.collins (4/2/2008)
No fair working at night... my wife won't let me get geeky at night.
Heh... understood... I keep telling mine "How can I miss you if you won't go away?" 😀 She thinks it's better than me playing pool or bowling because she knows where I am and I don't drink at home. 2 out of 3 ain't bad but I'm not giving up the "WTFs" for anyone, if you know what I mean 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
April 2, 2008 at 1:14 pm
antonio.collins (4/2/2008)
Jeff Moden (4/1/2008)
Ok... here's my entry for the virtual beer...Looking forward to your entry 😉
No fair working at night... my wife won't let me get geeky at night.
I'd respond to that...but none of the answers are PG!!!
😀
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 2, 2008 at 1:43 pm
I created test data as follows:
create table ParserTest (
ID int identity primary key,
String varchar(max))
go
insert into dbo.parsertest (string)
SELECT 'Subject=Drilling&Category=Drill Header&Status=NO&IPPDM_Count=94356'
+'&Well_Count=84353&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&'
+'BeginDate=2/3/2008&EndDate=2/5/2008'
from common.dbo.bignumbers
insert into dbo.parsertest (string)
SELECT 'Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count='
+'94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest'
+'=107674'
from common.dbo.bignumbers
Then I tested a Numbers Table solution. I used the Pivot function, instead of Jeff's solution, but otherwise quite similar:
set statistics io on
set statistics time on
;with
Parsed as
(select id, SUBSTRING(String + '&', number,
CHARINDEX('&', String + '&', number) - number) as Parsed,
row_number() over (order by number) as Row
FROM common.dbo.numbers
inner join dbo.parsertest
on number <= LEN(String)
and SUBSTRING('&' + String, number, 1) = '&'),
EAV (ID, Col, Val) as
(select id, left(parsed, charindex('=', parsed)-1) Col,
reverse(left(reverse(parsed), charindex('=', reverse(parsed))-1)) Val
from parsed)
select ID, Subject, Category, Status, IPPDM_Count,
Well_Count, Zone_Status, Prod_String_Count, BeginDate, EndDate
from
(select id, col, val
from eav) sub
pivot
(max(val)
for col in (Subject, Category, Status, IPPDM_Count,
Well_Count, Zone_Status, Prod_String_Count, BeginDate, EndDate)) pvt
It's 2M rows (instead of the 1M Jeff proposed), and I got these results:
=============================
(2000002 row(s) affected)
Table 'Numbers'. Scan count 2000002, logical reads 6000006, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParserTest'. Scan count 1, logical reads 47907, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4781359 ms, elapsed time = 4807388 ms.
=============================
Next is an XML parser test. (Separate post, these take WAY too long to run on this many rows on this machine.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 2, 2008 at 1:51 pm
XML Function test:
ALTER function [dbo].[StringParserXML]
(@String_in varchar(max),
@Delimiter_in char(10))
returns @Parsed table (
Parsed varchar(100))
as
-- This one is faster than StringParser, but it doesn't correctly handle
-- XML-specific characters, such as "<" or "&". StringParser will handle those
-- without difficulty.
begin
declare @XML xml
select @xml = ' '
insert into @parsed(parsed)
select x.i.value('.', 'varchar(100)')
from @xml.nodes('//i') x(i)
return
end
The test:
set statistics io on
set statistics time on
;with
Parsed as
(select id, parsed
FROM dbo.parsertest
cross apply common.dbo.stringparserxml(string, '&')),
EAV (ID, Col, Val) as
(select id, left(parsed, charindex('=', parsed)-1) Col,
reverse(left(reverse(parsed), charindex('=', reverse(parsed))-1)) Val
from parsed)
select ID, Subject, Category, Status, IPPDM_Count,
Well_Count, Zone_Status, Prod_String_Count, BeginDate, EndDate
from
(select id, col, val
from eav) sub
pivot
(max(val)
for col in (Subject, Category, Status, IPPDM_Count,
Well_Count, Zone_Status, Prod_String_Count, BeginDate, EndDate)) pvt
The results:
==============================
SQL Server parse and compile time:
CPU time = 13 ms, elapsed time = 13 ms.
(2000002 row(s) affected)
Table '#5B4453CB'. Scan count 2, logical reads 3000003, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ParserTest'. Scan count 1, logical reads 47907, physical reads 14999, read-ahead reads 288, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 220125 ms, elapsed time = 227719 ms.
==============================
The Numbers table version I used took over an hour (4781359 ms), while the XML version took just under 4 minutes (220125 ms). The XML version also had a HUGE advantage in terms of scan counts, which matters on a system with a lot of I/O bottlenecks.
Either I'm doing something horribly wrong in the Numbers table version, or the XML totally kicked the Numbers table version's ***.
Jeff, please tell me what I got wrong on the Numbers table version. I'm too tired (long day) and possibly just not seeing it right.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 2, 2008 at 2:03 pm
Just in case the Cross Apply is somehow more efficient than a straight up join (which is ridiculous on the face of it), I also tested:
ALTER function [dbo].[StringParser]
(@String_in varchar(max),
@Delimiter_in char(1))
returns table
as
return(
SELECT top 100 percent
SUBSTRING(@String_in+@Delimiter_in, number,
CHARINDEX(@Delimiter_in, @String_in+@Delimiter_in, number) - number) as Parsed,
row_number() over (order by number) as Row
FROM numbers
WHERE number <= LEN(@String_in)
AND SUBSTRING(@Delimiter_in + @String_in, number, 1) = @Delimiter_in
ORDER BY number
)
With this as the test:
;with
Parsed as
(select id, parsed
FROM dbo.parsertest
cross apply common.dbo.stringparser(string, '&')),
EAV (ID, Col, Val) as
(select id, left(parsed, charindex('=', parsed)-1) Col,
reverse(left(reverse(parsed), charindex('=', reverse(parsed))-1)) Val
from parsed)
select ID, Subject, Category, Status, IPPDM_Count,
Well_Count, Zone_Status, Prod_String_Count, BeginDate, EndDate
from
(select id, col, val
from eav) sub
pivot
(max(val)
for col in (Subject, Category, Status, IPPDM_Count,
Well_Count, Zone_Status, Prod_String_Count, BeginDate, EndDate)) pvt
I didn't get results, because I killed it after it had been running for a bit over 8 minutes, since the XML test took less than 4 minutes.
(I thought an interesting thing in all of these tests is that I got the first rows back almost instantly. I would have thought the Pivot function would be a "hidden RBAR" that would build a result set in tempdb, and then would run through it to pivot the results, but apparently it doesn't work that way.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 2, 2008 at 4:13 pm
Hi Jeff,
Your code works fine as per my original request. There is an error on my part on the requirement, I apolozise. Apparently, there are Rows that are NULL, less than 5 parameter items(eg. Subject, Category, Status). ..more than 5 parameter items, which I failed to mentioned. So therefore, 1st 5 items extraction from the parameter text would not work in my case. I probably need to search by item name Charidex(Subject=, Category=,Status= etc.) and then extract the values after the '=' (eg. Drilling, Drill Header, Yes, 4567, 4567)
Below is the table script and sample data of my SOURCE and also the Expected format of the RESULT set I'm trying to accomplish. LEt me know if you need additional info.
Thanks again!
-Tash
--===== If the test source table already exists, drop it
DROP TABLE #SourceTable
--===== Create the test table with
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE #SourceTable(
[ReportKey] [int] IDENTITY(1,1) PRIMARY KEY CLUSTERED,
[Parameters] [ntext] COLLATE SQL_Latin1_General_CP1_CI_AS NULL, [TimeStart] [datetime] NOT NULL,
)
GO
--===== All Inserts into the IDENTITY column
SET IDENTITY_INSERT #SourceTable ON
--===== Insert the test data into the test table
INSERT INTO #SourceTable
(ReportKey, Parameters, TimeStart)
SELECT '2181','Subject=Zone&Category=Address&CountTable_B=NO&Zone_Count=95220&R_Master_Zone=95200','Feb 12 2008 5:18PM' UNION ALL
SELECT '2923','Subject=Drilling&Category=Drill Header&Status=YES&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356','Mar 10 2008 2:20PM' UNION ALL
SELECT '1284','Subject=Zone&Category=Address&CountTable_B=NO&Zone_Count=95229&R_Master_Zone=95224','Jan 20 2008 6:20PM' UNION ALL
SELECT '2513','Division=1&District=105&Date=03/28/2008 17:35:08','Mar 28 2008 5:35PM' UNION ALL
SELECT '134',NULL, 'Mar 28 2008 5:35PM' UNION ALL
SELECT '2169','begindate=03/11/2008 00:00:00&enddate=03/11/2008 00:00:00','Mar 11 2008 3:42PM' UNION ALL
SELECT '254','Subject=Zone&Category=Zone Header&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356&AddlZoneInterest=107674', 'Mar 28 2008 5:35PM' UNION ALL
SELECT '468','Category=Drill Header&Status=NO&IPPDM_Count=9435&Well_Count=8433&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94353&BeginDate=2/3/2008&EndDate=2/5/2008', 'Mar 28 2008 5:35PM' UNION ALL
SELECT '182','Division=1&Date=01/14/2008 09:38:27&District=94','Jan 15 2008 10:38PM' UNION ALL
SELECT '112','Subject=Zone&Category=Address&IPPDM_Count=94356&Well_Count=94356','Jan 21 2008 7:20PM' UNION ALL
SELECT '9854',NULL, 'Mar 28 2008 5:35PM' UNION ALL
SELECT '167','WellID=616311&reportdate=1/1/2008 1:30:00 PM','Jan 7 2008 10:05AM' UNION ALL
SELECT '2523','Subject=Drilling&Status=YES&IPPDM_Count=94356&Well_Count=94356&Zone_Status=NO&Prod_String_Count=95249&R_Master_W_Count=94356','Mar 21 2008 3:20PM' UNION ALL
SELECT '654',NULL, 'Mar 28 2008 5:35PM' UNION ALL
SELECT '2526','Division=2&District=99&Date=03/15/2008 06:35:01','Mar 15 2008 6:35AM' UNION ALL
SELECT '1285','Date=01/13/2008 21:50:09','Jan 13 2008 9:50PM' UNION ALL
SELECT '1326',NULL, 'Mar 28 2008 5:35PM' UNION ALL
SELECT '2563','WellID=617521&WellName:isnull=true','Mar 26 2008 8:26AM' UNION ALL
SELECT '192','Subject=Zone&Category=Address&IPPDM_Count=94356&Zone_Status=NO','Dec 10 2007 12:20PM'
--===== Set the identity insert back to normal
SET IDENTITY_INSERT #SourceTable ON
----===========Output table============
--===== If the test output table already exists, drop it
DROP TABLE #ResultTable
--===== Create the test output table with
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE #ResultTable(
[ReportKey] [int] IDENTITY(1,1) PRIMARY KEY CLUSTERED,
[Subject] [nvarchar] (200) NULL,
[Category] [nvarchar] (200) NULL,
[Status] [nvarchar] (200) NULL,
[Ippdm_Count] [nvarchar] (200) NULL,
[Well_Count] [nvarchar] (200) NULL,
[TimeStart] [datetime] NOT NULL,
)
GO
----------------------Resutlset should look like this-----------------Below is sample output sample data-----------
--===== All Inserts into the IDENTITY column
SET IDENTITY_INSERT #ResultTable ON
--===== Insert the test OUTPUT data into the test table
INSERT INTO #ResultTable
(ReportKey, Subject,Category, Status,Ippdm_Count,Well_Count,TimeStart)
SELECT '2181','Zone','Address','NO',NULL,'9510','Feb 12 2008 5:18PM' UNION ALL
SELECT '1284','Zone',NULL,'NO','95229','95224','Jan 20 2008 6:20PM' UNION ALL
SELECT '134',NULL,NULL,NULL,NULL,NULL,'Mar 28 2008 5:35PM' UNION ALL
SELECT '2923',NULL,'Drill Header','YES','94356','94356','Mar 10 2008 2:20PM' UNION ALL
SELECT '9854',NULL,NULL,NULL,NULL,NULL,'Mar 28 2008 5:35PM' UNION ALL
SELECT '192','Zone','Address',NULL,'94356','94356','Dec 10 2007 12:20PM'
--===== Set the identity insert back to normal
SET IDENTITY_INSERT #ResultTable ON
April 2, 2008 at 8:42 pm
declare @xml XML
-- convert all row data to an XML element
set @xml = (
select cast('<parm ReportKey="'+ cast(ReportKey as varchar(12))
+'" TimeStart="'+ cast(TimeStart as varchar(32))+ '" '
+ replace(replace(Parameters,'=','="'),'&','" ')+ '"/>' as XML) as data
from #SourceTable
for xml auto, type)
select D.parm.value('@ReportKey','int') as ReportKey,
D.parm.value('@Subject','varchar(255)') as [Subject],
D.parm.value('@Category','varchar(255)') as Category,
D.parm.value('@Status','varchar(255)') as [Status],
D.parm.value('@IPPDM_Count','varchar(255)') as IPPDM_Count,
D.parm.value('@Well_Count','varchar(255)') as Well_Count,
D.parm.value('@TimeStart','smalldatetime')
from @xml.nodes('/SourceTable/data/parm') as D(parm)
go
see http://www.sqlservercentral.com/Forums/Topic478799-338-1.aspx for more info.
April 2, 2008 at 10:27 pm
GSquared (4/2/2008)
Jeff, please tell me what I got wrong on the Numbers table version
At a high level, yes... You split on the "&" instead of the "=" which forces you into a second CTE. That and the REVERSE function (3 times) is very expensive.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 2, 2008 at 10:31 pm
Antonio,
Your solution doesn't work with the test data I created. Please post your test data generator and the result times you got.
Gus, you're splitting out more than requested and you haven't printed any of the result times.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 3, 2008 at 6:14 am
Jeff Moden (4/2/2008)
Antonio,Your solution doesn't work with the test data I created. Please post your test data generator and the result times you got.
Sorry. The Parameters column became ntext since the last time I looked. Also, I was using a physical SourceTable, not a temp table. Anyway, simple enough to correct. I also added some comments since the multiple replaces can get confusing. The block below works perfectly with the sample data script posted earlier.
declare @xml XML
-- convert all row data to an XML element
set @xml = (
select cast('<parm ReportKey="'+ cast(ReportKey as varchar(12))
+'" TimeStart="'+ cast(TimeStart as varchar(32))+ '" '
+ isnull(
replace(
replace(
replace( cast(Parameters as varchar(max)),
'=','="'), -- add leading quote to value
'&','" ') + '"', -- remove leading ampersand and add trailing quote to value
':isnull=','_isnull='), -- handle bitfield; suffix acts like a namespace
'') -- handle null parameters
+ '/>' as XML) as data -- replace XML with varchar(max) to see the constructed set
from #SourceTable
for xml raw('SourceData')) -- must use 'raw(tag)' if using temp table; 'auto' uses table name as parent element
select D.parm.value('@ReportKey','int') as ReportKey,
D.parm.value('@Subject','varchar(255)') as [Subject],
D.parm.value('@Category','varchar(255)') as Category,
D.parm.value('@Status','varchar(255)') as [Status],
D.parm.value('@IPPDM_Count','varchar(255)') as IPPDM_Count,
D.parm.value('@Well_Count','varchar(255)') as Well_Count,
D.parm.value('@TimeStart','smalldatetime') as TimeStart
from @xml.nodes('/SourceData/data/parm') as D(parm) -- .nodes() parameter is XPath selector
April 3, 2008 at 6:23 am
Jeff Moden (4/2/2008)
GSquared (4/2/2008)
Jeff, please tell me what I got wrong on the Numbers table versionAt a high level, yes... You split on the "&" instead of the "=" which forces you into a second CTE. That and the REVERSE function (3 times) is very expensive.
Yeah, but the XML version is doing exactly the same thing in those regards, and finished in 1/20th the time of the Numbers table version. If those were what was causing the delay, both would be slowed down.
I'm trying to figure out how the Numbers table split ended up so slow compared to the XML split. I think it's all the scans, but I'm not sure how to eliminate those. That's what I'm looking for help on.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 3, 2008 at 7:00 am
Jeff Moden (4/2/2008)
Antonio,Your solution doesn't work with the test data I created. Please post your test data generator and the result times you got.
Gus, you're splitting out more than requested and you haven't printed any of the result times.
I included result times in both tests. Look on page 2 of this thread.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 3, 2008 at 8:03 am
Many many thanks Antonio. Your XML solution gets me the expected result. I'd also like to thank GSquard and Jeff for their contribution.
-Lalon
Viewing 14 posts - 16 through 28 (of 28 total)
You must be logged in to reply to this topic. Login to reply