August 28, 2012 at 5:46 am
SQL Kiwi (8/28/2012)
Usman Butt (8/28/2012)
But MTVF code speeded up things quite significantly which I shared earlier. For "not created in the same batch" part here is an example of the same batch table values split in which the appending no-length string have no effect 🙂I have some good news for you:
DECLARE @STR VARCHAR(8000);
DECLARE @table TABLE
(
DelimitedStr VARCHAR(8000)
)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;';
SET @STR = REPLICATE(@str,32);
INSERT @table
(
[DelimitedStr]
)
VALUES (
)
-- Fast
;WITH CTE ([DelimitedStr], xmlstring) AS (
SELECT [DelimitedStr],xmlstring FROM @table AS T
CROSS APPLY(
SELECT CAST('<r>' + REPLACE([DelimitedStr],';','</r><r>')+'</r>' + LEFT(NEWID(),0) AS XML)) x (xmlstring))
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM CTE CROSS APPLY cte.xmlstring.nodes('r') x(i);
Good news not yet. :unsure: I did not opt this while testing since I want it to be an iTVF, So we know NEWID() cannot be used :(. I know the workaround of using VIEW but that too slows down the code pretty much. I hope you can get around this limitation as well 😎
August 28, 2012 at 6:11 am
SQL Kiwi (8/28/2012)
Jeff Moden (8/26/2012)
Ok... I tried those two. Why does the LEFT(@@Dbts,0) speed things up so much?It's intriguing, enough to be the subject of my next blog post after the one I am writing now. The others on this thread came commendably close to explaining why (it does indeed have quite a bit to do with non-determinism, though not constant-folding). Y'all might work out the details in the meantime, but I'll post a link to my post anyway when it goes live. There's just not space here for a decent explanation.
Ohhhhh...Seems like we have to wait for a while 🙁 But good news is yet another blog of yours is coming up on sqlblog 😎 BTW, still trying to digest your excellent posts on temporary tables 😉
August 28, 2012 at 8:22 am
SQL Kiwi (8/28/2012)
Jeff Moden (8/26/2012)
Ok... I tried those two. Why does the LEFT(@@Dbts,0) speed things up so much?It's intriguing, enough to be the subject of my next blog post after the one I am writing now. The others on this thread came commendably close to explaining why (it does indeed have quite a bit to do with non-determinism, though not constant-folding). Y'all might work out the details in the meantime, but I'll post a link to my post anyway when it goes live. There's just not space here for a decent explanation.
Is it something that could change characteristics based on the next change that MS might bless us with or do you think it's stable for the good part of the future? I ask because I tried other functions and string constants in the LEFT function that don't produce the same performance. Of course, those things were deterministic in nature. Heh... and yeah... I'm one that certainly knows that MS could change even documented features. I REALLY look forward to your blog post on the subject because I love this kind of stuff. Thanks, Paul.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 28, 2012 at 8:57 am
Jeff Moden (8/28/2012)
Is it something that could change characteristics based on the next change that MS might bless us with or do you think it's stable for the good part of the future? I ask because I tried other functions and string constants in the LEFT function that don't produce the same performance. Of course, those things were deterministic in nature. Heh... and yeah... I'm one that certainly knows that MS could change even documented features. I REALLY look forward to your blog post on the subject because I love this kind of stuff. Thanks, Paul.
Perhaps set up a poll and an easy to run test harness, so all of us can report back precise SQL Server versions and the results - I'd be worried that a previous MS change affected it, as well. As a community we have several versions, quite a few SP's, and many CU's in use.
August 28, 2012 at 9:10 am
Nadrek (8/28/2012)
Jeff Moden (8/28/2012)
Is it something that could change characteristics based on the next change that MS might bless us with or do you think it's stable for the good part of the future? I ask because I tried other functions and string constants in the LEFT function that don't produce the same performance. Of course, those things were deterministic in nature. Heh... and yeah... I'm one that certainly knows that MS could change even documented features. I REALLY look forward to your blog post on the subject because I love this kind of stuff. Thanks, Paul.Perhaps set up a poll and an easy to run test harness, so all of us can report back precise SQL Server versions and the results - I'd be worried that a previous MS change affected it, as well. As a community we have several versions, quite a few SP's, and many CU's in use.
Excellent Idea. :Wow: But so far I have seen the same behavior on 2005, 2008, 2008 R2. But CUs, SPs etc. could change the behavior.
August 28, 2012 at 3:21 pm
Jeff Moden (8/28/2012)
Is it something that could change characteristics based on the next change that MS might bless us with or do you think it's stable for the good part of the future?
It's not documented at the moment (I don't generally write about documented things) and the behaviour changed at least once (in SQL Server 2005) so yes, this is a curiosity and not something one could rely on. That's not to say there are no circumstances in which one might do so, but it comes with all the usual risks.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 28, 2012 at 3:55 pm
Usman Butt (8/28/2012)
Good news not yet. :unsure: I did not opt this while testing since I want it to be an iTVF, So we know NEWID() cannot be used :(. I know the workaround of using VIEW but that too slows down the code pretty much. I hope you can get around this limitation as well 😎
It's a bit tricky in an in-line TVF with column references; one option is to use Brad Shultz's trick (not quite as fast):
DECLARE @STR VARCHAR(8000);
DECLARE @table TABLE
(
DelimitedStr VARCHAR(8000)
)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;';
SET @STR = REPLICATE(@str,32);
INSERT @table
(
[DelimitedStr]
)
VALUES (
)
;WITH CTE ([DelimitedStr], xmlstring) AS (
SELECT [DelimitedStr],xmlstring FROM @table AS T
CROSS APPLY(
SELECT CAST('<r>' + REPLACE([DelimitedStr],';','</r><r>')+'</r>' AS XML).query('.')) x (xmlstring))
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM CTE CROSS APPLY cte.xmlstring.nodes('r') x(i);
By the way, Oleg's XML function also fails when a semicolon is the delimiter and characters like '<' or '&' are present. In the version you posted, it had a comma hard-coded as the delimiter (the @Delimiter parameter in the function definition is unused).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 28, 2012 at 11:04 pm
SQL Kiwi (8/28/2012)
By the way, Oleg's XML function also fails when a semicolon is the delimiter and characters like '<' or '&' are present.
Yes, absolutely correct. This is why I said that RCTE version could be a better option unless one can lives with these caveats 😉
In the version you posted, it had a comma hard-coded as the delimiter (the @Delimiter parameter in the function definition is unused).
Yes, this was already highlighted. The funny thing was that the code I was using doesn't have the comma hard-coded :hehe: Should have cover it with the argument that it was deliberate in order to know someone used it? 😉 But my bad, I did not change it here then (I did it now). Thanks for pointing it out.
August 29, 2012 at 1:02 am
SQL Kiwi (8/28/2012)
Usman Butt (8/28/2012)
Good news not yet. :unsure: I did not opt this while testing since I want it to be an iTVF, So we know NEWID() cannot be used :(. I know the workaround of using VIEW but that too slows down the code pretty much. I hope you can get around this limitation as well 😎It's a bit tricky in an in-line TVF with column references; one option is to use Brad Shultz's trick (not quite as fast):
DECLARE @STR VARCHAR(8000);
DECLARE @table TABLE
(
DelimitedStr VARCHAR(8000)
)
SET @STR = 'a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;';
SET @STR = REPLICATE(@str,32);
INSERT @table
(
[DelimitedStr]
)
VALUES (
)
;WITH CTE ([DelimitedStr], xmlstring) AS (
SELECT [DelimitedStr],xmlstring FROM @table AS T
CROSS APPLY(
SELECT CAST('<r>' + REPLACE([DelimitedStr],';','</r><r>')+'</r>' AS XML).query('.')) x (xmlstring))
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM CTE CROSS APPLY cte.xmlstring.nodes('r') x(i);
Ok. I think I am getting a bit more understanding of this behavior (Seems like it :hehe:). I have managed to use NEWID()’s zero length string in iTVF with the same speed as was for the inline query. Another good thing about this iTVF is that we are doing the entitization as well and not losing the execution speed.
USE [tempdb]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[XmlInlineSplitter]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[XmlInlineSplitter]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[XmlInlineSplitter]
(
@STR VARCHAR(MAX),
@Delimiter VARCHAR(1),
@NewId VARCHAR(50)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH CTE ( xmlstring) AS (
SELECT CAST('<r>' + REPLACE((SELECT @STR FOR XML PATH('')),@Delimiter,'</r><r>')+'</r>' + LEFT(@NewId,0) AS XML))
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = x.i.value('text()[1]', 'varchar(8000)')
FROM CTE CROSS APPLY cte.xmlstring.nodes('r') x(i)
GO
SET NOCOUNT ON;
DECLARE @STR VARCHAR(8000), @ItemId BIGINT, @Item VARCHAR(8000);
DECLARE @table TABLE
(
DelimitedStr VARCHAR(8000)
)
INSERT @table
(
[DelimitedStr]
)
SELECT REPLICATE('a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;', 32)
UNION ALL
SELECT REPLICATE('y;a;b;c;z;d;e;c;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;', 32);
SET STATISTICS TIME ON;
SELECT @ItemID = ItemID,@Item = Item FROM @table AS T
CROSS APPLY dbo.XmlInlineSplitter([DelimitedStr], ';', NEWID())
SET STATISTICS TIME OFF;
I do not know BWAA-HAAA!!!! is the right word to use here but using it anyhow:-D:-D But couple of the many questions comes to mind are: Is it an opening of a new dimension? Should all other code be revisited where this undocumented hack could be utilized to speed things up?
All these questions may be answered when your blog becomes live. Cannot wait for that. I hope you would be kind enough to share the in-depth details as soon as possible 😎
Last but not least, I do not see such use of NEWID() at runtime is a different behavior, but just in case if it is different, then please shed some light on this as well. Thanks.
August 29, 2012 at 2:40 am
Usman Butt (8/29/2012)
I do not know BWAA-HAAA!!!! is the right word to use here but using it anyhow.
Ha! Very good. You know passing NEWID as a parameter is cheating, right? I won't go on about it, because I'm sure the limitations of this idea are very apparent to you. Aside from that, I dislike implicit conversions where than can be avoided. So, hoping you don't mind, here's a slightly tidied version, with the formatting I prefer, and explicit type conversion. Anyway, well done.
IF OBJECT_ID(N'dbo.XmlInlineSplitter', N'IF') IS NOT NULL
DROP FUNCTION dbo.XmlInlineSplitter;
GO
CREATE FUNCTION dbo.XmlInlineSplitter
(
@STR varchar(max),
@Delimiter char(1),
@NewId uniqueidentifier
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = T2.i.value('./text()[1]', 'varchar(8000)')
FROM
(
SELECT
CONVERT(xml,
'<r>' +
REPLACE(
(SELECT @STR FOR XML PATH('')),
@Delimiter,
CONVERT(nvarchar(max), N'</r><r>')
) + N'</r>' +
LEFT(CONVERT(nvarchar(40),@NewId),0)
)
) AS T1 (xmlstring)
CROSS APPLY T1.xmlstring.nodes('./r') AS T2(i);
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 29, 2012 at 3:07 am
SQL Kiwi (8/29/2012)
Usman Butt (8/29/2012)
I do not know BWAA-HAAA!!!! is the right word to use here but using it anyhow.Ha! Very good. You know passing NEWID as a parameter is cheating, right? I won't go on about it, because I'm sure the limitations of this idea are very apparent to you. Aside from that, I dislike implicit conversions where than can be avoided. So, hoping you don't mind, here's a slightly tidied version, with the formatting I prefer, and explicit type conversion. Anyway, well done.
IF OBJECT_ID(N'dbo.XmlInlineSplitter', N'IF') IS NOT NULL
DROP FUNCTION dbo.XmlInlineSplitter;
GO
CREATE FUNCTION dbo.XmlInlineSplitter
(
@STR varchar(max),
@Delimiter char(1),
@NewId uniqueidentifier
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = T2.i.value('./text()[1]', 'varchar(8000)')
FROM
(
SELECT
CONVERT(xml,
'<r>' +
REPLACE(
(SELECT @STR FOR XML PATH('')),
@Delimiter,
CONVERT(nvarchar(max), N'</r><r>')
) + N'</r>' +
LEFT(CONVERT(nvarchar(40),@NewId),0)
)
) AS T1 (xmlstring)
CROSS APPLY T1.xmlstring.nodes('./r') AS T2(i);
Thanks a lot for the compliment 😎 Yes, your version is much tidier 😀
But I have got two questions which I forgot before
1. Does Replacing an expression from varchar(max) takes less time than nvarchar(max)? I have seen this behavior many times. Not to forget the Replace bug you pointed out. So should not the final version have the Binary Collation conversion? 🙂
2. My thinking is that NEWID() update SQL server internal tables? This is why this function cannot be part of a user defined function? Also what about the RAND() function? Does it change something internally as well? If yes, then why it does not have the same behavior? I guess some part of this question is related to your blog, so not expecting much.
August 29, 2012 at 4:47 am
Usman Butt (8/29/2012)
1. Does Replacing an expression from varchar(max) takes less time than nvarchar(max)? I have seen this behavior many times.
The varchar version will be half the size of the nvarchar, so yes. The explicit conversions are to remove the CONVERT_IMPLICITs from the query plan; I didn't spend any time testing what was faster.
Not to forget the Replace bug you pointed out. So should not the final version have the Binary Collation conversion?
I didn't want to assume anything about the comparison semantic that would be appropriate for all users.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 29, 2012 at 7:04 am
SQL Kiwi (8/29/2012)
Usman Butt (8/29/2012)
1. Does Replacing an expression from varchar(max) takes less time than nvarchar(max)? I have seen this behavior many times.The varchar version will be half the size of the nvarchar, so yes. The explicit conversions are to remove the CONVERT_IMPLICITs from the query plan; I didn't spend any time testing what was faster.
I did some testing and explicit conversion to varchar(max) after entitization process optimized it quite a bit. So with your code formatting style and with no implicit conversions (hope so), this is what I have come up with
IF OBJECT_ID(N'dbo.XmlInlineSplitter', N'IF') IS NOT NULL
DROP FUNCTION dbo.XmlInlineSplitter;
GO
CREATE FUNCTION dbo.XmlInlineSplitter
(
@STR varchar(max),
@Delimiter varchar(1),
@NewId uniqueidentifier
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = T2.i.value('./text()[1]', 'varchar(8000)')
FROM
(
SELECT
CONVERT(xml,
'<r>' +
REPLACE(
CONVERT(varchar(max),(SELECT @STR FOR XML PATH(''))),
@Delimiter,
CONVERT(varchar(max), '</r><r>')
) + '</r>' +
LEFT(CONVERT(varchar(40),@NewId),0)
)
) AS T1 (xmlstring)
CROSS APPLY T1.xmlstring.nodes('./r') AS T2(i);
August 29, 2012 at 7:18 am
Usman Butt (8/29/2012)
I did some testing and explicit conversion to varchar(max) after entitization process optimized it quite a bit. So with your code formatting style and with no implicit conversions (hope so), this is what I have come up with
Yes that is better. Of course now someone will need to split:
DECLARE @x nvarchar(max) = N'?????,??,???,????,??????,???????????,?????,??,?????,???';
SELECT * FROM dbo.XmlInlineSplitter(@x, ',', NEWID()) AS xis;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
August 29, 2012 at 7:38 am
SQL Kiwi (8/29/2012)
Usman Butt (8/29/2012)
I did some testing and explicit conversion to varchar(max) after entitization process optimized it quite a bit. So with your code formatting style and with no implicit conversions (hope so), this is what I have come up withYes that is better. Of course now someone will need to split:
DECLARE @x nvarchar(max) = N'?????,??,???,????,??????,???????????,?????,??,?????,???';
SELECT * FROM dbo.XmlInlineSplitter(@x, ',', NEWID()) AS xis;
hahahaha..but your version too started with varchar(max) as the parameter, so would the explicit conversion to nvarchar(max) would have matter :hehe:
So for Unicode data, nvarchar(max) version could be something like
IF OBJECT_ID(N'dbo.XmlInlineSplitter_For_Unicode_Data', N'IF') IS NOT NULL
DROP FUNCTION dbo.XmlInlineSplitter;
GO
CREATE FUNCTION dbo.XmlInlineSplitter_For_Unicode_Data
(
@STR nvarchar(max),
@Delimiter nvarchar(1),
@NewId uniqueidentifier
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
SELECT
ItemID = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
Item = T2.i.value('./text()[1]', 'nvarchar(max)')
FROM
(
SELECT
CONVERT(xml,
N'<r>' +
REPLACE(
(SELECT @STR FOR XML PATH('')),
@Delimiter,
CONVERT(nvarchar(max), N'</r><r>')
) + N'</r>' +
LEFT(CONVERT(nvarchar(40),@NewId),0)
)
) AS T1 (xmlstring)
CROSS APPLY T1.xmlstring.nodes('./r') AS T2(i);
Viewing 15 posts - 391 through 405 (of 990 total)
You must be logged in to reply to this topic. Login to reply