January 6, 2012 at 11:33 am
SQL Kiwi (1/6/2012)
reinpost (1/6/2012)
Just a trivial remark: right at the start it saysAPPLY is named after the process of applying a set of input rows to a table-valued function.
Surely you mean to say applying a table-valued function to a set of input rows.
I think it makes sense either way! ๐
So do I, but mathematicians and programmers tend to say that functions are applied to things. In fact this new operator was probably named after Lisp's 'apply' , which does the same thing.
January 6, 2012 at 12:11 pm
reinpost (1/6/2012)
SQL Kiwi (1/6/2012)
reinpost (1/6/2012)
Just a trivial remark: right at the start it saysAPPLY is named after the process of applying a set of input rows to a table-valued function.
Surely you mean to say applying a table-valued function to a set of input rows.
I think it makes sense either way! ๐
So do I, but mathematicians and programmers tend to say that functions are applied to things. In fact this new operator was probably named after Lisp's 'apply' , which does the same thing.
Even though it is irrelevant... I'll agree that the proper is that the function applies to the rows. I thought about it like a car and paint. When you apply the paint to the car, the properties of the paint do not change the car does. So in this case the function does not change at all, it is applied to the rows of which the properties have changed.
Now that I said that... Who cares, it still accomplishes the same thing ๐
Jared
CE - Microsoft
January 6, 2012 at 4:52 pm
A word of caution
I took the tip about SCHEMABINDING and the optimizer's expanding iTVFs and converted a Multi-statement UDF (referred to from here on as the TF) that splits up to 12MB of delimited data using a Numbers table. Perhaps this is SS2005EE-specific and later versions are smarter, but...
BAD MOVE!
I have this (inherited) snippet from an SP that uses CROSS APPLY in the FROM clause and has a WHERE clause that contains DBA.dbo.udf_Split(), which is the TF I converted to an iTVF. The SP happens to be the most important SP in the system - the one that gets a User's permissions to do anything. It runs in under 1 second for over 500 Users using the original TF. However, it takes 15+ minutes to run the same SP using the iTVF for a single User, I consider this a very bad move and will pull the plug on changing this TF to an iTVF:-D
FROM dbo.t_user_ref ur
CROSS APPLY dbo.fn_GetPermissionByUser(user_id) AS p
WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)
AND ( @SiteIDs IS NULL
OR ( @SiteIDs IS NOT NULL
AND EXISTS ( SELECT 1
FROM DBA.dbo.udf_Split( @SiteIDs + ', ',',')
WHERE ',' + r_site_ids + ',' <> REPLACE(',' + r_site_ids + ',', ',' + [value] + ',', '')
)
)
)
AND ( @user-id IS NULL
OR ( @user-id IS NOT NULL
AND @user-id > 0
AND user_id = @user-id
)
)
AND ( @UserRole IS NULL
OR ( @UserRole IS NOT NULL
AND p.r_user_role LIKE '%' + @UserRole
)
)
The optimizer did indeed expand the iTVF form of the UDF into the main SELECT, but then goodness knows why it decides that 12789618 rows must now be scanned for each User row.
Be very, very careful which TFs you convert to iTVFs... :ermm:
January 6, 2012 at 5:02 pm
SAinCA (1/6/2012)
A word of caution
It is certainly still possible to write highly dubious code using in-line table-valued functions if you really try. There are a few cases where a multi-statement function makes sense; in this case it's because the results of the split are materialized into a table variable once and reused many times that way. Of course, this is something that could easily be done without the msTVF - just split the site IDs into a temporary table first. This can be better than using an msTVF since the 'real' temporary table can have statistics and can be indexed - the hidden msTVF table variable cannot do either of those things!
So my word of caution would be: if you find a multi-statement TVF performing better, check that you shouldn't be materializing something yourself. Do it properly with an iTVF and a temporary table and you'll get statistics, indexes, and probably better performance.
Fellow MVP Brad Shultz went into some details about why the multi-statement hidden materialization works here:
http://bradsruminations.blogspot.com/2010/08/integer-list-splitting-sql-fable.html
January 6, 2012 at 6:50 pm
SAinCA (1/6/2012)
A word of cautionI took the tip about SCHEMABINDING and the optimizer's expanding iTVFs and converted a Multi-statement UDF (referred to from here on as the TF) that splits up to 12MB of delimited data using a Numbers table. Perhaps this is SS2005EE-specific and later versions are smarter, but...
It could be the way the split function is using the Numbers table. I've seen it where the entire table gets scanned because of "reflection" (not an official term but that's what I think of it as) back to the table from the "outside" world cause the table to be scanned many times.
Would you mind posting your split function so I can take a look, please?
Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 7, 2012 at 12:52 am
Jeff Moden (1/6/2012)
Would you mind posting your split function so I can take a look, please?
Can I ask that you guys start a new thread for this? Feel free to post a link here though.
January 8, 2012 at 1:15 pm
SQL Kiwi (1/7/2012)
Jeff Moden (1/6/2012)
Would you mind posting your split function so I can take a look, please?Can I ask that you guys start a new thread for this? Feel free to post a link here though.
Thought the discussion would lend itself to the proper application of CROSS APPLY, but sure.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2012 at 9:08 pm
Jeff Moden (1/8/2012)
Thought the discussion would lend itself to the proper application of CROSS APPLY, but sure.
Hm, that's actually a good point. As you were, discuss away...:-)
January 9, 2012 at 10:28 am
As requested, Jeff:
USE DBA
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[udf_SplitN]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[udf_SplitN]
GO
CREATE FUNCTION [dbo].[udf_SplitN]
( @inString nvarchar(MAX)
, @delim nvarchar(255)
)
RETURNS @arrSplit TABLE
( IDX int IDENTITY(0, 1) PRIMARY KEY
, [value] nvarchar(1024)
)
AS
/* =============================================================================
** original idea by Anith Sen, posted on http://www.simple-talk.com at
** http://www.simple-talk.com/sql/t-sql-programming/the-helper-table-workbench/
** Description: Split a nvarchar string into chunks using a delimiter of up to
** 255 characters.
**
** Notes: See the corresponding udf_Split for the varchar version.
**
SELECT *
FROM dbo.udf_Split('one,two,three,four,five,six,seven,eight,nine,ten', ',')
SELECT *
FROM dbo.udf_Split('Monday--Tuesday--Wednesday--thursday--friday--saturday--sunday', '--')
========================================================================== */
BEGIN
INSERT INTO @arrSplit
( [value]
)
SELECT SUBSTRING(@inString + @delim, number, CHARINDEX(@delim, @inString + @delim, number) - number)
FROM dbo.Numbers
WHERE number <= LEN(REPLACE(@inString, N' ', N'`'))
AND SUBSTRING(@delim + @inString, number, LEN(REPLACE(@delim, N' ', N'`'))) = @delim
ORDER BY number
RETURN
END
GOThis is used in preference to the function below in cases where 4000 characters is likely to be exceeded, and sometimes by MB. The msTVF works very well at splitting XML files that contain multiple documents - a problem a third-party vendor presented that could not be changed due to their third party software. That's the primary reason for the msTVF's existence in this form.
When I know the delimited varchar data are short, I'll use the much-discussed and augmented UDF found on SSC (renamed by me to distinguish functionality from my other 2 udf_Split% UDFs):
USE [DBA]
GO
/****** Object: UserDefinedFunction [dbo].[udf_Split8Kv] Script Date: 01/09/2012 09:26:22 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER FUNCTION [dbo].[udf_Split8Kv]
/***************************************************************************************************
Purpose: Split a given string at a given delimiter and return a list of the split elements (items).
Returns: iTVF containing the following:
ItemNumber = Index Position of the Item as an int, begins at ZERO.
Item = Element value as a VARCHAR(8000)
CROSS APPLY Usage Example:
-----------------------------------------------------------------------------------------------------
===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
IF OBJECT_ID('tempdb..#JBMTest') IS NOT NULL
DROP TABLE #JBMTest;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
SELECT *
INTO #JBMTest
FROM ( SELECT 1,'1,10,100,1000,10000,100000,1000000'
UNION ALL
SELECT 2,'2000000,200000,20000,2000,200,20,2'
UNION ALL
SELECT 3, 'This,is,a,test'
UNION ALL
SELECT 4, 'and so is this'
UNION ALL
SELECT 5, 'This, too (no pun intended)'
UNION ALL
SELECT 6, LEFT(REPLICATE('Thisisanunbrokenstring',400),7999)
) d (SomeID,SomeValue);
GO
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID
, split.ItemNumber
, split.Item
FROM #JBMTest test
CROSS APPLY ( SELECT ItemNumber
, Item
FROM dbo.udf_Split8Kv(test.SomeValue,',')
) split;
***************************************************************************************************/
--===== Define I/O parameters
( @pString varchar(7999)
, @pDelimiter char(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
--===== "Inline" CTE Driven "Tally Tableโ produces values up to
-- 10,000... enough to cover VARCHAR(8000)
WITH E1 ( N )
AS ( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
, E2 ( N )
AS ( SELECT 1
FROM E1 a
, E1 b
) -- 100
, E4 ( N )
AS ( SELECT 1
FROM E2 a
, E2 b
) --10,000
, cteTally ( N )
AS ( SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT N ) )
FROM E4
)
--===== Do the split
SELECT ROW_NUMBER() OVER ( ORDER BY N ) - 1 AS IDX
, SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Value
FROM cteTally
WHERE N < LEN(@pString) + 2
AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter
GO
January 9, 2012 at 3:52 pm
Excellent article. I liked the graphics, although I have to say I liked the code file even better. I have a few questions but I think I will wait until I have had a chance to read part 2 before expressing any ๐
Kenneth
Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]
January 9, 2012 at 8:39 pm
SAinCA (1/9/2012)
As requested, Jeff:
I thought the problem may have been with the split function you had listed but, looking at the code you provided for the split functions, I don't see a particular problem with them other than they're slower (because of the delimiter concatenation) than the new split function available at the URL coming up. The attachments to the article have been updated, so be sure to get the code from the attachments and not the article itself. The code in the attachments is about 20% faster after someone was good enough to provide yet another optimization.
Here's the link to the article with the new splitter functions...
http://www.sqlservercentral.com/articles/Tally+Table/72993/
One of the "problems" (in quotes because it's a human error, not a problem with CROSS APPLY or iTVFs themselves) with iTVFs and CROSS APPLY is a problem that is very similar to a problem with VIEWs. That is, it's "hidden" code or at least "out of site" code AND the code becomes a part of the execution plan as if it had been written as a part of the code. Because of that, you can miss some of the most terrible performance problems due to a very simple oversight... mismatched datatypes which, of course, will cause table and index scans and even full "rewinds" which are worse than mere table scans. That was the same problem that Brad Shultz ran into in the link that Paul previously posted (I never did post back on that thread to show what the problem was). In that case (IIRC... it was a while ago), it was caused by CHARINDEX returning BIGINT instead of just an INT because the first operand of the CHARINDEX Brad used was MAX datatype. The datatype mismatch was a real killer there. Instead of the Numbers table selecting only the numbers it needed, it "rewound" the table 121,316 times and generated a total of 121,317,000 rows internally through a Lazy Table Spool (I have the execution plan but have misplaced the analysis that showed the precise reason, so operating as best I can from memory).
Because you haven't included any information as to the datatypes you used for variables or any of the columns used in your code example, I can't even begin to guess where the datatype mismatch (or, possibly, accidental many-to-many join) may be. It may be in the splitters or it (more likely) is in the fn_GetPermissionByUser function you actually used in the CROSS APPLY. And, yeah... because of the lack of information about your code, I can only guess but anytime something like what you've described happens with a conversion to an iTVF and a CROSS APPLY, it's usually due to a datatype mismatch. You have to remember... iTVFs and VIEWs are a lot alike... they "become one" with the calling code and, as with any fully inline code, data type matches become rather critical for any decent performance.
I agree with your "word of caution" but I wouldn't blame the iTVF or the CROSS APPLY. Chances are it's something like a simple datatype mismatch and that's likely what you have to be most cautious about. Why does multi-line and scalar TVFs let you get away with it (datatype mismatches and accidental many-to-many joins)? Because the execution plan doesn't incorporate the plans for such things in the main-stream execution plan.
Paul is much more of an expert on execution plans than I am. If you were to follow the procedure (see the second link in my signature line at the end of this post) for posting performance problems and opened a new thread for the problem (we've diverged too much from the intent of this thread already), I'm sure that someone, or even Paul himself, would be happy to show you how to correctly incorporate the very high speed methods of combining CROSS APPLY and iTVFs for this particular problem.
As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...
WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)
Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...
WHERE GETDATE() BETWEEN eff_date AND term_date
AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))
The fact that you're using BETWEEN to compare dates is a whole other chapter in performing SQL Harakiri but this thread isn't the place for that particular discussion. ๐
--Jeff Moden
Change is inevitable... Change for the better is not.
January 10, 2012 at 9:51 am
Jeff Moden (1/9/2012)
As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...
WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)
Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...
WHERE GETDATE() BETWEEN eff_date AND term_date
AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))
I can't see any declarations anywhere in any of the code posted that prevents companyID being negative, and unless there is such a restriction this simplification is broken. It seems likely that there is such a restriction (column names with ID that get compared with 0 are often identity columns with 1 as initial seed) but there isn't certain to be, unless I've missed something.
Tom
January 10, 2012 at 11:21 am
Thanks, Jeff, for the complete treatment given to the script.
Thanks, too, to Paul for uncovering the treasure that is APPLY.
My only defense is "inherited code; wasn't broken so didn't fix it". I grabbed the newest split code (thanks) and looked at the code for implicit data type conversions. The usage of the split-string within the snippet posted takes a varchar and concatenates strings before comparison against a string, so there are no implicit conversions there.
The get permissions UDF is unchanged in my tenure apart from reformatting it from unreadable spaghetti. It's a msTVF that could probaby be rewritten as an iTVF using some CTEs and UNION ALL statements, but time doesn't permit that change just yet. Folks no longer with the Company said the underlying design needed complete overhaul and I can wait another day until it no longer supports the business - I have bigger fish to fry. The defensive "company > 0" was already there in case the UI sent garbage (outside my control); as I said, it ain't broke and under 2-second response to get all permissions for all users is acceptable by my Director.
BTW: Did you have a URL for the reference to the dates-between-harakiri, please? Wouldn't want to bump into misleading posts, as sometimes is the case. The term_date is defaulted to 9999-12-31 23:59:59 and if someone leaves a client company, we usually find out after the fact, so their term_date bereft of time is OK.
Thanks again for such an in-depth, meaty post on the posted scripts. If I get into performance issues (growth is just beginning in earnest), I'll be sure to post a new thread per your directive.
January 10, 2012 at 4:31 pm
L' Eomot Inversรฉ (1/10/2012)
Jeff Moden (1/9/2012)
As a side bar, you've got a huge amount of unnecessary code in your code example that you posted. Take the following snippet, for example...
WHERE GETDATE() BETWEEN eff_date AND term_date
AND ( @CompanyID IS NULL
OR ( @CompanyID IS NOT NULL
AND @CompanyID > 0
AND company_id = @CompanyID
)
)
Because NULLs cannot be related to anything else including other NULLs and assuming that you're not actually using negative CompanyIDs, the code can be simplified quite a bit as follows...
WHERE GETDATE() BETWEEN eff_date AND term_date
AND (@CompanyID IS NULL OR company_id = NULLIF(@CompanyID,0))
I can't see any declarations anywhere in any of the code posted that prevents companyID being negative, and unless there is such a restriction this simplification is broken. It seems likely that there is such a restriction (column names with ID that get compared with 0 are often identity columns with 1 as initial seed) but there isn't certain to be, unless I've missed something.
Heh... agreed. That's why I very specifically stated...
...and assuming that you're not actually using negative CompanyIDs...
--Jeff Moden
Change is inevitable... Change for the better is not.
January 10, 2012 at 5:57 pm
Jeff Moden (1/10/2012)
Heh... agreed. That's why I very specifically stated......and assuming that you're not actually using negative CompanyIDs...
Yes, I'm getting really erratic :blush:. I even quoted that statement and still didn't notice it. How erratic can I get?
Tom
Viewing 15 posts - 76 through 89 (of 89 total)
You must be logged in to reply to this topic. Login to reply