July 5, 2012 at 4:13 am
dwain.c (7/4/2012)
+1 to Chris for trying!I guess now I'll need to check whether I'm using the right version of DelimitedSplit8K!
Cheers mate. It had to be done, and it was fun playing with this toy. Its performance in my little "real world" test harness is indistinguishable from the most recent DS8K.
The plans are sufficiently different to hint that "absolute zero" has been reached - TSQL won't do the job any faster.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 6, 2012 at 12:07 am
Jeff,
I just scanned all 35 pages of this discussion thread seeking the revised string splitter but unfortunately was unable to find it posted by you.
Can you point me to the page in the discussion where it is or repost it?
Thanks.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 6, 2012 at 3:12 am
dwain.c (7/6/2012)
Jeff,I just scanned all 35 pages of this discussion thread seeking the revised string splitter but unfortunately was unable to find it posted by you.
Can you point me to the page in the discussion where it is or repost it?
Thanks.
The alternatives from the discussion have been made part of the test set. You can find the test set in de article under the header resources (all the way to the bottom).
There are several zip files there, the one you seek is called Build all objects and run all tests.zip. The file in this zip containing all contributions is called Updated Build all objects and run all tests.sql
July 6, 2012 at 3:14 am
peter-757102 (7/6/2012)
dwain.c (7/6/2012)
Jeff,I just scanned all 35 pages of this discussion thread seeking the revised string splitter but unfortunately was unable to find it posted by you.
Can you point me to the page in the discussion where it is or repost it?
Thanks.
The alternatives coming from the discussion have been made part of the test set.
You can find the test set in de article under the header resources (all the way to the bottom).
There are several zip files there, the one you seek is called Build all objects and run all tests.zip
The file in this zip containing all contributions is called Updated Build all objects and run all tests.sql
Cool! Thanks Peter.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
July 6, 2012 at 6:18 am
ChrisM@Work (7/5/2012)
The plans are sufficiently different to hint that "absolute zero" has been reached - TSQL won't do the job any faster.
I wish I could take the credit for that but it was the "team" effort over many years (see the revision history in the code if you want to know more) culminated with the help of a couple of sharp folks in this very discussion. My hat is off to the community at large.
And thank you for trying once more. I'm not sure that "absolute zero" has actually been reached and your idea still has strong and logical merit. Someday, someone will look at that idea and find a way to implement it with great speed.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2012 at 6:23 am
peter-757102 (7/6/2012)
dwain.c (7/6/2012)
Jeff,I just scanned all 35 pages of this discussion thread seeking the revised string splitter but unfortunately was unable to find it posted by you.
Can you point me to the page in the discussion where it is or repost it?
Thanks.
The alternatives from the discussion have been made part of the test set. You can find the test set in de article under the header resources (all the way to the bottom).
There are several zip files there, the one you seek is called Build all objects and run all tests.zip. The file in this zip containing all contributions is called Updated Build all objects and run all tests.sql
Thank you for the "cover", Peter.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 22, 2012 at 10:18 am
While this article is a bit old I did want to comment on a trick that I've discovered to build an itvf for a xml splitter. This will perform better than the tally based splitter for most strings.
The basic trick is to create an intermediate UDF. This will force sql server to calculate the xml result first, instead of the usual bit where if you try to itvf it starts streaming and thus recalculating the result once it has to start chunking. (Last I heard it was every 64 bytes,yikes).
Here are my functions
CREATE function xmlify(@str varchar(max),@d varchar(1))
returns xml
begin
return '<r>'+replace(@str,@d,'</r><r>')+'</r>'
end
CREATE FUNCTION xmlSplit
(
@STR VARCHAR(max),
@d VARCHAR(1)
)
RETURNS table
as
return(
SELECT Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (select xml=dbo.xmlify(@str,@d)) cte
cross apply cte.xml.nodes('r') x(i)
)
ANd here is some test code I used to build an extremely large string and how long it took to split it up.
declare @STR nvarchar(max),@start datetime,@res varchar(8000);
select @STR=stuff((select top (1000*1000) concat(';',newid()) from sys.all_columns a,sys.all_columns b for xml path('')),1,1,'')
select datalength(@str)
select @start=getdate()
select @res=Item from xmlSplit(@str,';')
select datediff(ms,@start,getdate())
This created a 70mb string
Here is the timing from my machine: 11.636 sec
Not bad at all!
August 23, 2012 at 4:07 am
mburbea (8/22/2012)
While this article is a bit old I did want to comment on a trick that I've discovered to build an itvf for a xml splitter. This will perform better than the tally based splitter for most strings.The basic trick is to create an intermediate UDF. This will force sql server to calculate the xml result first, instead of the usual bit where if you try to itvf it starts streaming and thus recalculating the result once it has to start chunking. (Last I heard it was every 64 bytes,yikes).
...
Brilliant thinking - and it doesn't half work, around 75-125 x faster than without the udf (when tested on single strings of different sizes).
But what if you want to force the datatype conversion without the limitations of a udf, and gain ItemID to boot?
This does just that:
SELECT
ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
I wonder how well this works against a large number of rows, as opposed to a large number of characters in a string? Might find time later.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 23, 2012 at 4:48 am
ChrisM@Work (8/23/2012)
mburbea (8/22/2012)
While this article is a bit old I did want to comment on a trick that I've discovered to build an itvf for a xml splitter. This will perform better than the tally based splitter for most strings.The basic trick is to create an intermediate UDF. This will force sql server to calculate the xml result first, instead of the usual bit where if you try to itvf it starts streaming and thus recalculating the result once it has to start chunking. (Last I heard it was every 64 bytes,yikes).
...
Brilliant thinking - and it doesn't half work, around 75-125 x faster than without the udf (when tested on single strings of different sizes).
But what if you want to force the datatype conversion without the limitations of a udf, and gain ItemID to boot?
This does just that:
SELECT
ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
I wonder how well this works against a large number of rows, as opposed to a large number of characters in a string? Might find time later.
Just curious, but why this ?
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537August 23, 2012 at 5:01 am
Mark-101232 (8/23/2012)
ChrisM@Work (8/23/2012)
mburbea (8/22/2012)
While this article is a bit old I did want to comment on a trick that I've discovered to build an itvf for a xml splitter. This will perform better than the tally based splitter for most strings.The basic trick is to create an intermediate UDF. This will force sql server to calculate the xml result first, instead of the usual bit where if you try to itvf it starts streaming and thus recalculating the result once it has to start chunking. (Last I heard it was every 64 bytes,yikes).
...
Brilliant thinking - and it doesn't half work, around 75-125 x faster than without the udf (when tested on single strings of different sizes).
But what if you want to force the datatype conversion without the limitations of a udf, and gain ItemID to boot?
This does just that:
SELECT
ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
I wonder how well this works against a large number of rows, as opposed to a large number of characters in a string? Might find time later.
Just curious, but why this ?
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
Try with and without, Mark. A single row is streamed, as mburbea (8/22/2012) points out. Two rows causes complete conversion of the expression to xml.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 23, 2012 at 5:20 am
ChrisM@Work (8/23/2012)
Mark-101232 (8/23/2012)
ChrisM@Work (8/23/2012)
mburbea (8/22/2012)
While this article is a bit old I did want to comment on a trick that I've discovered to build an itvf for a xml splitter. This will perform better than the tally based splitter for most strings.The basic trick is to create an intermediate UDF. This will force sql server to calculate the xml result first, instead of the usual bit where if you try to itvf it starts streaming and thus recalculating the result once it has to start chunking. (Last I heard it was every 64 bytes,yikes).
...
Brilliant thinking - and it doesn't half work, around 75-125 x faster than without the udf (when tested on single strings of different sizes).
But what if you want to force the datatype conversion without the limitations of a udf, and gain ItemID to boot?
This does just that:
SELECT
ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
I wonder how well this works against a large number of rows, as opposed to a large number of characters in a string? Might find time later.
Just curious, but why this ?
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
Try with and without, Mark. A single row is streamed, as mburbea (8/22/2012) points out. Two rows causes complete conversion of the expression to xml.
Amazing!
PRINT '========== Multi Row - Not Streamed ==========================================='
DECLARE @STR VARCHAR(8000)
DECLARE @ItemID VARCHAR(8000)
DECLARE @Item VARCHAR(8000)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;'
SET @STR = REPLICATE(@str,32)
SET STATISTICS TIME ON
SELECT
@ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
@Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
SET STATISTICS TIME OFF;
GO
PRINT '========== Single Row - Streamed ==========================================='
DECLARE @STR VARCHAR(8000)
DECLARE @ItemID VARCHAR(8000)
DECLARE @Item VARCHAR(8000)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;'
SET @STR = REPLICATE(@str,32)
SET STATISTICS TIME ON
SELECT
@ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
@Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
--****UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
SET STATISTICS TIME OFF;
GO
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537August 23, 2012 at 5:33 am
Great thinking pattern !
Marvelous results !
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
August 23, 2012 at 7:47 am
I'm not seeing the XML trouncing the Delimittedsplit8k tvf though.
My test code I'm using DateTime2 for the greater accuracy
DECLARE @STR VARCHAR(8000)
DECLARE @ItemID VARCHAR(8000)
DECLARE @Item VARCHAR(8000)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;'
SET @STR = REPLICATE(@str,32)
DECLARE @start DATETIME2
DECLARE @end DATETIME2
DECLARE @start2 datetime2
DECLARE @end2 datetime2
SELECT @start=SYSDATETIME()
SELECT
ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
SELECT @end= SYSDATETIME()
SELECT @start2=SYSDATETIME()
SELECT
*
FROM
dbo.DelimitedSplit8K(@str,';')
SELECT @end2= SYSDATETIME()
SELECT
'xml',
DATEDIFF(ms,@start,@end)
UNION ALL
SELECT
'delimitedsplit8k',
DATEDIFF(ms,@start2,@end2)
Results after 100 runs (in ms)
delimitedsplit8k98
xml 115
Looks like a tie to me.
Ran another 100 samples and it comes in as
delimitedsplit8k 83
xml 152
not as much of a tie... but 152 ms is pretty handy. What I find interesting is that the Delimitedsplit8k seems to be more consistent. Or weirder yet it keeps getting faster 🙂
Ran yet another 100 samples and it comes in as
delimitedsplit8k69
xml 155
and again
delimitedsplit8k36
xml 159
So that's 400 samples I've done...
delimitedsplit8k 71 ms
xml 145 ms
I did this without clearing any part of the plan or data cache after I had run both the delimitedsplit8k udf and the xml method.
August 23, 2012 at 7:54 am
Realized I skipped the xml function tricky thing.
First run 100 times
runtype durationms
delimitedsplit8k89
xml 99
xmlfunction165
Second run 100 times
runtype durationms
delimitedsplit8k29
xml151
xmlfunction252
CREATE TABLE #Results(
runtype VARCHAR(50),
durationinms INT)
TRUNCATE TABLE #results
DECLARE @count INT=0
WHILE @count < 100
BEGIN
DECLARE @STR VARCHAR(8000)
DECLARE @ItemID VARCHAR(8000)
DECLARE @Item VARCHAR(8000)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z'
SET @STR = REPLICATE(@str,32)
DECLARE @start DATETIME2
DECLARE @end DATETIME2
DECLARE @start2 datetime2
DECLARE @end2 datetime2
DECLARE @start3 datetime2
DECLARE @end3 datetime2
SELECT @start=SYSDATETIME()
SELECT
ItemID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM (
SELECT Stringy = CAST('<r>' + REPLACE(@str,';','</r><r>')+'</r>' AS XML)
UNION ALL SELECT NULL -- eliminated by CROSS APPLY
) cte (xmlstring)
CROSS APPLY cte.xmlstring.nodes('r') x(i)
SELECT @end= SYSDATETIME()
SELECT @start2=SYSDATETIME()
SELECT
*
FROM
dbo.DelimitedSplit8K(@str,';')
SELECT @end2= SYSDATETIME()
SELECT @start3=SYSDATETIME()
select * from xmlSplit(@str,';')
SELECT @end3= SYSDATETIME()
INSERT #results
SELECT
'xml',
DATEDIFF(ms,@start,@end)
UNION ALL
SELECT
'delimitedsplit8k',
DATEDIFF(ms,@start2,@end2)
UNION ALL
SELECT
'xmlfunction',
DATEDIFF(ms,@start3,@end3)
SET @count+=1
END
go
SELECT
runtype,
AVG(durationinms)
FROM #results
GROUP BY
runtype
August 23, 2012 at 8:09 am
mtassin (8/23/2012)
I'm not seeing the XML trouncing the Delimittedsplit8k tvf though.....
Nor me. I've plugged it into the splitter test harness from the article attachent and on some permutations of the csv file it's dog slow. In fact, the dog appears to have missing limbs ("Take me out for a drag").
It does have a couple of plus points though - as the OP pointed out, it appears to be pretty good at super-long strings, and if you want to split a string without using a function i.e. inlined with the rest of the query, then it's really compact.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 346 through 360 (of 990 total)
You must be logged in to reply to this topic. Login to reply