September 20, 2010 at 4:55 pm
gah (9/20/2010)
Jeff Moden (9/20/2010)
Thank you one and all for your help. 🙂
Jeff... you are more than welcome; it was the least I could do to repay you for the amount of help you have provided to so many others like myself in the past.
Kind regards Graham
I'm humbled, Graham. Thank you for your kind words.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 20, 2010 at 5:10 pm
Jeff Moden (9/20/2010)
P.S. I really want to know when you took up mind-reading, Ron. I was about to post and ask for a full run like that. Thanks.
Jeff, thank yourself - unknown to you, I have learned a vast amount of good, solid things to do in SQL Server. Having come from a background as an interface (GUI) programmer where everything is done RBAR, reading your articles and posts was the dawn of a new day for me.
And thanks to all the others who have contributed to my learning.
That, and learning to test and understand what others post - when I see something new, I have to do just that. Reading Brad Schulz's post to this forum got me wanting to learn and test, and once I had tested it, I thought, why trash it? Jeff might be able to use it. As the old saying goes, "curiosity killed the cat, satisfaction brought it back."
September 20, 2010 at 7:21 pm
bitbucket-25253 (9/20/2010)
Jeff Moden (9/20/2010)
P.S. I really want to know when you took up mind-reading, Ron. I was about to post and ask for a full run like that. Thanks.
Jeff, thank yourself - unknown to you, I have learned a vast amount of good, solid things to do in SQL Server. Having come from a background as an interface (GUI) programmer where everything is done RBAR, reading your articles and posts was the dawn of a new day for me.
And thanks to all the others who have contributed to my learning.
That, and learning to test and understand what others post - when I see something new, I have to do just that. Reading Brad Schulz's post to this forum got me wanting to learn and test, and once I had tested it, I thought, why trash it? Jeff might be able to use it. As the old saying goes, "curiosity killed the cat, satisfaction brought it back."
That's mighty kind of you to say, Ron. Looking forward to seeing you this February. Thanks again.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 21, 2010 at 9:39 pm
rob.symonds (9/20/2010)
Anyone else a little concerned about the 2k8 results for the inline Tally? Sorry, I would obviously try this myself, but I don't have a 2k8 instance to hand - could someone try a multi-line Tally version on 2k8?
Below is Jeff's original, just with the ML Tally inserted.
Don't worry, Jeff, I didn't forget to attach the results of the query in the original post.
--===== Do this in a nice safe place that everyone has
USE tempdb;
GO
--===================================================================
-- Create a Tally table from 1 to 11000
--===================================================================
--===== If the Tally table already exists here, drop it to make reruns easier.
IF OBJECT_ID('tempdb.dbo.Tally','U') IS NOT NULL
DROP TABLE dbo.Tally;
--===== Create and populate the Tally table on the fly.
-- The IDENTITY function makes the column NOT NULL
-- so we can put a Primary Key on it
SELECT TOP 11000
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
;
GO
--=====================================================================================================================
-- Create the various functions to test
--=====================================================================================================================
--===== Do this in a nice safe place that everyone has
USE tempdb;
--===== Tally Table (Split8KTally iTVF) ===============================================================================
IF OBJECT_ID('tempdb.dbo.Split8KTally') IS NOT NULL
DROP FUNCTION dbo.Split8KTally
GO
CREATE FUNCTION dbo.Split8KTally
(@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue
FROM dbo.Tally
WHERE N BETWEEN 1 AND LEN(@Parameter)
AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter; --Notice how we find the comma
GO
--===== Tally Table (Split8KTally mlTVF) ===============================================================================
IF OBJECT_ID('tempdb.dbo.Split8KTally_mlTVF') IS NOT NULL
DROP FUNCTION dbo.Split8KTally_mlTVF
GO
Create FUNCTION [dbo].[Split8KTally_mlTVF]
(@Parameter varchar(8000), @Delimiter varchar(5))
RETURNS @RtnValue table (Id int identity(1,1), [Value] varchar(8000))
AS BEGIN
SET @Parameter = @Delimiter + @Parameter + @Delimiter
DECLARE @SplitCharLength int;
SET @SplitCharLength = len(@Delimiter);
INSERT INTO @RtnValue (Value)
SELECT SUBSTRING(@Parameter, (N + @SplitCharLength), CHARINDEX(@Delimiter, @Parameter, N+1) - (N + @SplitCharLength))
FROM dbo.Tally
WHERE N < (LEN(@Parameter) - @SplitCharLength)
AND SUBSTRING(@Parameter, N, @SplitCharLength) = @Delimiter
RETURN
END
GO
--===== Loop Method 1 (Split8KL1 mlTVF) ===============================================================================
IF OBJECT_ID('tempdb.dbo.Split8KL1') IS NOT NULL
DROP FUNCTION dbo.Split8KL1
GO
CREATE FUNCTION dbo.Split8KL1
(@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))
RETURNS @Result TABLE (ItemNumber INT IDENTITY(1,1), ItemValue VARCHAR(8000))
WITH SCHEMABINDING AS
BEGIN
--===== Declare a variable to remember the position of the current comma
DECLARE @N INT;
--===== Add start and end commas to the Parameter so we can handle
-- single elements
SELECT @Parameter = @Delimiter + @Parameter + @Delimiter,
--===== Preassign the current comma as the first character
@N = 1;
--===== Loop through and find each comma, then insert the string value
-- found between the current comma and the next comma. @N is
-- the position of the current comma.
WHILE @N < LEN(@Parameter) --Don't include the last comma
BEGIN
--==== Do the insert using the value between the commas
INSERT INTO @Result (ItemValue)
SELECT SUBSTRING(@Parameter, @N+1, CHARINDEX(@Delimiter, @Parameter, @N+1)-@N-1);
--==== Find the next comma
SELECT @N = CHARINDEX(@Delimiter, @Parameter, @N+1);
END; --END While
RETURN;
END; --END Function
GO
--===== XML-Brad (Split8KXMLBrad iTVF) ======================================================================================
IF OBJECT_ID('dbo.Split8KXMLBrad') IS NOT NULL
DROP FUNCTION dbo.Split8KXMLBrad
GO
CREATE FUNCTION dbo.Split8KXMLBrad
(@Parameter VARCHAR(MAX), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber,
x.i.value('(./text())[1]','varchar(8000)') AS ItemValue
from (select XMLList=cast('<i>'+replace(@Parameter,@Delimiter,'</i><i>')+'</i>' as xml).query('.')) a
cross apply XMLList.nodes('i') x(i)
;
GO
--=====================================================================================================================
-- Build some test data from the table that is actually used in the tests
--=====================================================================================================================
--===== Build a thousand item CSV row without it being dependent on other test tables
DECLARE @CSVList VARCHAR(8000);
SELECT @CSVList = STUFF(
(
SELECT TOP 1000 ','+ CAST(SalesOrderDetailID AS VARCHAR(10))
FROM AdventureWorks.Sales.SalesOrderDetail
ORDER BY NEWID()
FOR XML PATH('')
)
,1,1,'');
--===== Show what we have for a parameter now...
PRINT @CSVList;
--=====================================================================================================================
-- Run the tests using Brad's code
--=====================================================================================================================
--===== Commence testing ...
SET STATISTICS TIME,IO ON;
--And use it in a query:
PRINT '--===== Split8KTally ==========================================================================================='
declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KTally(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
PRINT '--===== Split8KTally mlTVF ==========================================================================================='
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KTally_mlTVF(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.Value as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
--declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
PRINT '--===== Split8KXMLBrad ==========================================================================================='
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KXMLBrad(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
--declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
PRINT '--===== Split8KL1 ==========================================================================================='
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KL1(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
SET STATISTICS TIME,IO OFF;
--=====================================================================================================================
-- Housekeeping
--=====================================================================================================================
--===== Do this in a nice safe place that everyone has
USE tempdb;
GO
--===== Drop all the objects that we just created
DROP FUNCTION dbo.Split8KTally, dbo.Split8KL1, dbo.Split8KXMLBrad;
DROP TABLE dbo.Tally;
GO
So I'm sitting in the corner sucking my thumb, twiddling my hair, nervously clutching my favorite blanky while rocking back and forth, occasionally taking a lick from a beer lollipop and muttering a constant stream of "WTF, over?". The "reflection" back into the Tally-table-based function that caused a full Cartesian join was still fresh in my mind, and I asked myself why the "reflection" was occurring in 2008 and not in 2005. Then I said, "Who cares... let's try something different!" I knew it was worth thinking about because even Brad managed to squeak out a "Whoa!" when it came to the speed of the Tally table solution in 2005. There had to be a way in 2008.
Then it dawned on me to try an mlTVF instead of an iTVF so, still clutching my favorite blanky and the beer lollipop, I built one... and it smokes all the code anyone has tried on 2008 so far. Or at least I thought so. That's when I remembered seeing something about it in a post in this thread and went looking for it. Obviously, I found it (quoted above) and couldn't understand why we didn't actually try it; the answer is that Rob was having problems uploading the results, and I flat forgot about it.
So, here it is... the Tally table solution that beats the XML and the While loop in 2008 without any fancy flags or option settings. It's a very old solution, normally frowned upon by whiz kids. Sorry it took so long. I'm not as old as BitBucket, but I will blame it all on a bit of CRS. I'm also a wee bit embarrassed that I let these kids whip my hiney with XML and While loops. 😉
Here are the results from that second bit of code with the 3 original players and the new (old) Tally Table solution...
Routine
Split8KTally CPU time = 101573 ms, elapsed time = 57478 ms.
Split8KXMLBrad CPU time = 732 ms, elapsed time = 706 ms.
Split8KL1 CPU time = 63 ms, elapsed time = 81 ms.
Split8KTallyMulti CPU time = 31 ms, elapsed time = 27 ms.
Here's the test code for anyone who wants to do their own runs.
--===== Do this in a nice safe place that everyone has
USE tempdb;
GO
--===================================================================
-- Create a Tally table from 1 to 11000
--===================================================================
--===== If the Tally table already exists here, drop it to make reruns easier.
IF OBJECT_ID('tempdb.dbo.Tally','U') IS NOT NULL
DROP TABLE dbo.Tally;
--===== Create and populate the Tally table on the fly.
-- The IDENTITY function makes the column NOT NULL
-- so we can put a Primary Key on it
SELECT TOP 11000
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
;
GO
--=====================================================================================================================
-- Create the various functions to test
--=====================================================================================================================
--===== Do this in a nice safe place that everyone has
USE tempdb;
--===== Tally Table (Split8KTally iTVF) ===============================================================================
IF OBJECT_ID('tempdb.dbo.Split8KTally') IS NOT NULL
DROP FUNCTION dbo.Split8KTally
GO
CREATE FUNCTION dbo.Split8KTally
(@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue
FROM dbo.Tally
WHERE N BETWEEN 1 AND LEN(@Parameter)+1
AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter; --Notice how we find the comma
GO
--===== XML-Brad (Split8KXMLBrad iTVF) ================================================================================
IF OBJECT_ID('dbo.Split8KXMLBrad') IS NOT NULL
DROP FUNCTION dbo.Split8KXMLBrad
GO
CREATE FUNCTION dbo.Split8KXMLBrad
(@Parameter VARCHAR(MAX), @Delimiter VARCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber,
x.i.value('(./text())[1]','varchar(8000)') AS ItemValue
from (select XMLList=cast('<i>'+replace(@Parameter,@Delimiter,'</i><i>')+'</i>' as xml).query('.')) a
cross apply XMLList.nodes('i') x(i)
;
GO
--===== Loop Method 1 (Split8KL1 mlTVF) ===============================================================================
IF OBJECT_ID('tempdb.dbo.Split8KL1') IS NOT NULL
DROP FUNCTION dbo.Split8KL1
GO
CREATE FUNCTION dbo.Split8KL1
(@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))
RETURNS @Result TABLE (ItemNumber INT IDENTITY(1,1), ItemValue VARCHAR(8000))
WITH SCHEMABINDING AS
BEGIN
--===== Declare a variable to remember the position of the current comma
DECLARE @N INT;
--===== Add start and end commas to the Parameter so we can handle
-- single elements
SELECT @Parameter = @Delimiter + @Parameter + @Delimiter,
--===== Preassign the current comma as the first character
@N = 1;
--===== Loop through and find each comma, then insert the string value
-- found between the current comma and the next comma. @N is
-- the position of the current comma.
WHILE @N < LEN(@Parameter) --Don't include the last comma
BEGIN
--==== Do the insert using the value between the commas
INSERT INTO @Result (ItemValue)
SELECT SUBSTRING(@Parameter, @N+1, CHARINDEX(@Delimiter, @Parameter, @N+1)-@N-1);
--==== Find the next comma
SELECT @N = CHARINDEX(@Delimiter, @Parameter, @N+1);
END; --END While
RETURN;
END; --END Function
GO
--===== Tally Table (Split8KTallyMulti mlTVF) ==========================================================================
IF OBJECT_ID('tempdb.dbo.Split8KTallyMulti') IS NOT NULL
DROP FUNCTION dbo.Split8KTallyMulti
;
GO
CREATE FUNCTION dbo.Split8KTallyMulti
(@Parameter VARCHAR(8000), @Delimiter VARCHAR(1))
RETURNS @Split TABLE (ItemNumber INT, ItemValue VARCHAR(8000))
WITH SCHEMABINDING AS
BEGIN
INSERT INTO @Split
(ItemNumber,ItemValue)
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue
FROM dbo.Tally
WHERE N BETWEEN 1 AND LEN(@Parameter)+1
AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter; --Notice how we find the comma
RETURN;
END;
GO
--=====================================================================================================================
-- Build some test data from the table that is actually used in the tests
--=====================================================================================================================
--===== Build a thousand item CSV row without it being dependent on other test tables
DECLARE @CSVList VARCHAR(8000);
SELECT @CSVList = STUFF(
(
SELECT TOP 1000 ','+ CAST(SalesOrderDetailID AS VARCHAR(10))
FROM AdventureWorks.Sales.SalesOrderDetail
ORDER BY NEWID()
FOR XML PATH('')
)
,1,1,'');
--===== Show what we have for a parameter now...
PRINT @CSVList;
--=====================================================================================================================
-- Run the tests using Brad's code
--=====================================================================================================================
--===== Commence testing ...
SET STATISTICS TIME,IO ON;
--And use it in a query:
PRINT '--===== Split8KTally (iTVF) ==========================================================================================='
declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KTally(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
--declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
PRINT '--===== Split8KXMLBrad (XML)==========================================================================================='
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KXMLBrad(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
--declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
PRINT '--===== Split8KL1 (Loop mlTVF)========================================================================================='
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KL1(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
--declare @ProductID int, @ProductName varchar(50), @SalesOrderID int, @OrderDate datetime, @TerritoryName varchar(50);
PRINT '--===== Split8KTallyMulti (mlTVF) ====================================================================================='
select @ProductID=d.ProductID
,@ProductName=p.Name
,@SalesOrderID=h.SalesOrderID
,@OrderDate=h.OrderDate
,@TerritoryName=t.Name
from dbo.Split8KTallyMulti(@CSVList,',') a
join AdventureWorks.Sales.SalesOrderDetail d on cast(a.ItemValue as int)=d.ProductID
join AdventureWorks.Production.Product p on d.ProductID=p.ProductID
join AdventureWorks.Sales.SalesOrderHeader h on d.SalesOrderID=h.SalesOrderID
join AdventureWorks.Sales.SalesTerritory t on h.TerritoryID=t.TerritoryID
order by d.ProductID,h.SalesOrderID ;
SET STATISTICS TIME,IO OFF;
--=====================================================================================================================
-- Housekeeping
--=====================================================================================================================
--===== Do this in a nice safe place that everyone has
USE tempdb;
GO
--===== Drop all the objects that we just created
DROP FUNCTION dbo.Split8KTally, dbo.Split8KL1, dbo.Split8KXMLBrad, dbo.Split8KTallyMulti;
DROP TABLE dbo.Tally;
GO
--Jeff Moden
Change is inevitable... Change for the better is not.
September 21, 2010 at 9:45 pm
GilaMonster (9/20/2010)
Jeff, would you like a run on 2008 R2? It will have to be this weekend at the earliest, because I'm not planning to upgrade before then.
Hi Gail... now that I actually have something decent for you to test, I'd love it if you would run both the code from the original post and the code in the post right above this one whenever you get the chance. Thank you much! 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
September 21, 2010 at 10:54 pm
Jeff Moden (9/21/2010)
Then it dawned on me to try an mlTVF instead of an iTVF so, still clutching my favorite blanky and the beer lollipop, I built one... and it smokes all the code anyone has tried on 2008 so far. Or at least I thought so. That's when I remembered seeing something about it in a post in this thread and went looking for it. Obviously, I found it (quoted above) and couldn't understand why we didn't actually try it; the answer is that Rob was having problems uploading the results, and I flat forgot about it.
I mentioned this approach on Sunday night as well, though I didn't post any hard numbers:
Brad Schulz (9/19/2010)
I tried also converting the Tally approach and the XML approach to multi-line functions as opposed to inline functions (based on a suggestion by Adam Machanic), and also put a "wrapper" around the CLR function to force it to act like a multi-line function, and all approaches came out pretty much on par, though the CLR approach required some more reads than the other approaches because of the "wrapper".
This brings all the approaches (Tally, XML, RBAR, CLR) to a fairly even playing field, because it creates the same query plan for each of them, and doesn't require any crazy memory grant requests. Essentially the query plan is pretty much as written... in other words, execute the function to get the set of items you want to look up, and LOOP JOIN that with the table you want to look it up in. The optimizer assumes 1 row as the output of the function, and thus it does a Nested Loops join. The problem is that this can cause a slowdown if your list of items is reeeeeaaallllyyyy looooonnnnnggggg.
So it brings it down again to: which function will strip the item list fastest? So CLR comes in first (though it requires more reads), then the Tally approach (though the RBAR approach is actually better for a small list of items... Tally better for loooong list of items), and, sadly, XML comes in last (because it requires more CPU).
Again, as I mentioned on Sunday, I intended to write a follow-up post about this, but I keep putting it off.
--Brad
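As an illustration only, here is a rough sketch of the kind of mlTVF "wrapper" Brad describes. It assumes a SQLCLR splitter named dbo.SplitString_Multi returning (sequence, item), as referenced later in this thread, and the column types are guesses; all it does is materialize the CLR output into a table variable so the outer query gets the same one-row estimate as the other multi-statement functions.
--===== Hypothetical mlTVF wrapper around the SQLCLR splitter (sketch only; column types assumed)
IF OBJECT_ID('dbo.SplitString_Multi_Wrapped') IS NOT NULL
DROP FUNCTION dbo.SplitString_Multi_Wrapped
GO
CREATE FUNCTION dbo.SplitString_Multi_Wrapped
(@Parameter NVARCHAR(MAX), @Delimiter NVARCHAR(255))
RETURNS @Split TABLE (sequence INT, item NVARCHAR(4000))
AS BEGIN
--===== Materialize the CLR output into the table variable
INSERT INTO @Split (sequence, item)
SELECT S.sequence, S.item
FROM dbo.SplitString_Multi(@Parameter, @Delimiter) S;
RETURN;
END;
GO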
September 22, 2010 at 12:56 am
I must admit I'm a little confused about the conclusions being drawn overall.
At the risk of stating the obvious, using a multi-statement TVF is exactly equivalent to splitting the input value into a table variable, and then running a separate join query.
In any case, the net effect is to produce a better plan by counteracting certain costing model limitations by introducing another - namely that table variables are estimated as producing one row (unless OPTION(RECOMPILE) is used).
There's nothing wrong with that of course - but it does mean that one has to choose between Tally implementations depending on how large the input set is.
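Purely to illustrate that equivalence (a sketch, not code anyone here has benchmarked; it assumes the dbo.Split8KTally function and the @CSVList variable from the test scripts above are in scope), the same thing written out by hand looks like this. The OPTION (RECOMPILE) is what lets the optimizer see the table variable's real row count instead of the fixed one-row guess.
--===== Split into a table variable first (the same effect as calling an mlTVF)...
DECLARE @Items TABLE (ItemNumber INT, ItemValue VARCHAR(8000));
INSERT INTO @Items (ItemNumber, ItemValue)
SELECT ItemNumber, ItemValue
FROM dbo.Split8KTally(@CSVList,',');
--===== ...then join in a separate statement. Without the hint the table variable
-- is costed at one row; with it, the actual cardinality is used.
SELECT d.ProductID, d.SalesOrderID
FROM @Items i
JOIN AdventureWorks.Sales.SalesOrderDetail d ON CAST(i.ItemValue AS INT) = d.ProductID
OPTION (RECOMPILE);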
September 22, 2010 at 1:10 am
This seems a good time to highlight some other issues too:
The Tally solution does not handle trailing blank records or Unicode characters correctly:
DECLARE @CSV NVARCHAR(MAX) = N'A,B&C,,C,?,',
@Delim NVARCHAR(255) = N',';
SELECT S.ItemNumber,
S.ItemValue
FROM dbo.Split8KTally(@CSV, @Delim) S;
There should be six records, not five; and the fifth record contains a '?'.
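For anyone who wants to experiment, here is a minimal, untested sketch of one way to close those two gaps: it simply combines the LEN()+1 upper bound from Jeff's second script with NVARCHAR/NCHAR parameters. The function name is made up for illustration, and it is not a drop-in replacement for the 8K VARCHAR version.
--===== Hypothetical Unicode-aware Tally splitter (sketch only)
IF OBJECT_ID('dbo.Split4KTallyN') IS NOT NULL
DROP FUNCTION dbo.Split4KTallyN
GO
CREATE FUNCTION dbo.Split4KTallyN
(@Parameter NVARCHAR(4000), @Delimiter NCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,
SUBSTRING(@Parameter,N,CHARINDEX(@Delimiter,@Parameter+@Delimiter,N)-N) AS ItemValue
FROM dbo.Tally
WHERE N BETWEEN 1 AND LEN(@Parameter)+1 --The +1 picks up a trailing empty element
AND SUBSTRING(@Delimiter+@Parameter,N,1) = @Delimiter;
GO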
The XML solution needs to be modified (as shown on Brad's blog) to handle entitization correctly:
DECLARE @CSV NVARCHAR(MAX) = N'A,B&C,,C,?,',
@Delim NVARCHAR(255) = N',';
SELECT S.ItemNumber,
S.ItemValue
FROM dbo.Split8KXMLBrad(@CSV, @Delim) S;
Other character values are invalid in XML and would cause the method to fail completely. The XML solution also emits NULLs instead of empty strings.
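Purely for illustration (this may not be exactly the fix shown on Brad's blog, and the delimiter is assumed not to be an XML special character), one common way to handle the entitization and the NULL-versus-empty-string issues is to let FOR XML PATH entitize the string before it is cast to XML, and to wrap the extracted value in ISNULL. Truly invalid XML characters would still break it.
--===== Hypothetical entitization-safe variant of the XML splitter (sketch only)
IF OBJECT_ID('dbo.Split8KXMLSafe') IS NOT NULL
DROP FUNCTION dbo.Split8KXMLSafe
GO
CREATE FUNCTION dbo.Split8KXMLSafe
(@Parameter NVARCHAR(MAX), @Delimiter NCHAR(1))
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber,
ISNULL(x.i.value('(./text())[1]','nvarchar(4000)'),N'') AS ItemValue --empty elements come back as '' rather than NULL
from (select XMLList=cast('<i>'
+replace((select @Parameter as [*] for xml path('')),@Delimiter,'</i><i>')
+'</i>' as xml).query('.')) a
cross apply XMLList.nodes('i') x(i)
;
GO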
The SQLCLR solution works correctly:
DECLARE @CSV NVARCHAR(MAX) = N'A,B&C,,C,?,',
@Delim NVARCHAR(255) = N',';
SELECT S.sequence,
S.item
FROM dbo.SplitString_Multi(@CSV, @Delim) S;
September 22, 2010 at 1:32 am
Brad Schulz (9/20/2010)
CLR is certainly fastest in and of itself, but it requires a big memory grant (almost 100MB) and brings about more reads than necessary (because the optimizer kind of treats it like a "black box" that spits out an estimated 1000 rows and therefore constructs a plan with hash matches because of the estimated 360,000 rows that the final plan expects to produce).
The SQLCLR solution isn't always fastest, but it does have consistently good performance, and significantly out-performs the other methods on non-trivial tests.
It's not cardinality estimation that's to blame for the memory grant; it's the lack of statistical information. The test rig happens to produce exactly the estimated 1000 rows, but the costing component has no statistical information to reason about the distribution of values. The resulting guesses at join cardinality are responsible for the excessive memory grant.
September 22, 2010 at 3:10 am
steve-893342 (9/19/2010)
Good one :-) I shall remember that for next time I need to bcp and split at the same time ;-)
:laugh: Well there is some reasoning behind the example that might interest you.
The SQLCLR solution does not perform anywhere near as well when the results of the split need to be written to a table.
When used with an INSERT statement, the reason is that the QO needs to provide Halloween protection, since there is a possibility that the TVF reads from the same table that is being inserted to. Even if the TVF is marked as performing no data access, QO does not take that into consideration; or if it does, it isn't prepared to believe us 🙂
So, when columns are sourced from a SQLCLR TVF, you'll see an Eager Spool (or another blocking operator) between the read and write cursors. Sad, but true.
The situation with SELECT INTO is much less clear, because the query plan contains nothing to indicate why throughput is so slow.
The reason for using bcp was to demonstrate that the throughput of the TVF itself is not the issue. Bulk copying rows in this way is much faster from a SQLCLR TVF than from a normal row source (such as a Tally or XML splitter).
Happily, the bcp-style interface (SqlBulkCopy) is directly accessible from SQLCLR, so I can write a function/procedure to bulk-copy split rows directly to a target table (not a file!). This is immensely quick, as you might imagine.
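If anyone wants to see the spool Paul describes for themselves, a minimal repro sketch might look like the following (it assumes the dbo.SplitString_Multi function quoted earlier and creates a throwaway target table in the same database). Run it with the actual execution plan on and look for the Eager Spool on the insert side of the plan.
--===== Hypothetical repro for the Halloween-protection spool (sketch only)
IF OBJECT_ID('dbo.SplitTarget','U') IS NOT NULL
DROP TABLE dbo.SplitTarget;
CREATE TABLE dbo.SplitTarget (sequence INT NOT NULL, item NVARCHAR(4000) NULL);
GO
DECLARE @CSV NVARCHAR(MAX) = N'A,B,C,D,E',
@Delim NVARCHAR(255) = N',';
--===== INSERT...SELECT from the CLR TVF: expect an Eager Spool in the actual plan
INSERT INTO dbo.SplitTarget (sequence, item)
SELECT S.sequence, S.item
FROM dbo.SplitString_Multi(@CSV, @Delim) S;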
September 22, 2010 at 5:02 am
Paul White NZ (9/22/2010)
steve-893342 (9/19/2010)
Good one :-) I shall remember that for next time I need to bcp and split at the same time ;-)
:laugh: Well there is some reasoning behind the example that might interest you.
The SQLCLR solution does not perform anywhere near as well when the results of the split need to be written to a table.
When used with an INSERT statement, the reason is that the QO needs to provide Halloween protection, since there is a possibility that the TVF reads from the same table that is being inserted to. Even if the TVF is marked as performing no data access, QO does not take that into consideration; or if it does, it isn't prepared to believe us 🙂
So, when columns are sourced from a SQLCLR TVF, you'll see an Eager Spool (or another blocking operator) between the read and write cursors. Sad, but true.
The situation with SELECT INTO is much less clear, because the query plan contains nothing to indicate why throughput is so slow.
The reason for using bcp was to demonstrate that the throughput of the TVF itself is not the issue. Bulk copying rows in this way is much faster from a SQLCLR TVF than from a normal row source (such as a Tally or XML splitter).
Happily, the bcp-style interface (SqlBulkCopy) is directly accessible from SQLCLR, so I can write a function/procedure to bulk-copy split rows directly to a target table (not a file!). This is immensely quick, as you might imagine.
Paul
That's a very timely response indeed and extremely well explained. I was just about to show you results comparing the performance of INSERT queries for the CLR splitter versus a virtual tally in-line splitter, but it looks like you've got all bases covered. Using SqlBulkCopy to circumvent the conventional INSERT query is a good trick indeed and very well thought out. Thanks very much for this info and your other contributions to this thread, which have been most informative :-)
Cheers
Steve
September 22, 2010 at 6:56 am
Paul White NZ (9/22/2010)
The SQLCLR solution works correctly:
DECLARE @CSV NVARCHAR(MAX) = N'A,B&C,,C,?,',
@Delim NVARCHAR(255) = N',';
SELECT S.sequence,
S.item
FROM dbo.SplitString_Multi(@CSV, @Delim) S;
6 rows returned, eh? Looks like my persistence paid off after all :-)
September 22, 2010 at 7:08 am
steve-893342 (9/22/2010)
6 rows returned, eh? Looks like my persistence paid off after all :-) http://www.sqlservercentral.com/Forums/FindPost979750.aspx
Yes indeed - and you still owe me a pint 😎
September 22, 2010 at 8:00 am
Jeff, remember this article?
The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
By Jeff Moden, 2008/05/07
Total article views: 41,560. Views in the last 30 days: 985.
http://www.sqlservercentral.com/articles/T-SQL/62867/
Unless my arithmetic is wrong, that is 47 reads per day and 1,484 per month - a vast audience to be taught.
Time for a rewrite or an addendum???
(And please do not hate me for bringing it up)
September 22, 2010 at 8:30 am
Paul White NZ (9/22/2010)
This seems a good time to highlight some other issues too:
The Tally solution does not handle trailing blank records or Unicode characters correctly:
Heh... Of course it doesn't handle Unicode correctly... the variables are all VARCHAR. 😉
I am, however, surprised by the trailing comma problem. I'll take a look at that.
--Jeff Moden
Change is inevitable... Change for the better is not.