June 3, 2017 at 8:44 am
Alan.B - Monday, January 4, 2016 11:16 AMEirikur Eiriksson (1/4/2016)
Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
...{snip}...
Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.
I know it's an older post but just saw this. Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2017 at 8:48 am
Mickey Stuewe - Sunday, January 3, 2016 11:24 PMComments posted to this topic are about the item Why Scalar Functions Can Be Costly
Good article, Mickey. You said there would be follow-up articles on mTVFs and iTVFs. I hope the comments on this thread didn't scare you off.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2017 at 8:53 am
Jeff Moden - Saturday, June 3, 2017 8:48 AMMickey Stuewe - Sunday, January 3, 2016 11:24 PMComments posted to this topic are about the item Why Scalar Functions Can Be CostlyGood article, Mickey. You said there would be follow-up articles on mTVFs and iTVFs. I hope the comments on this thread didn't scare you off.
I'll try not to scare Mickey off 😀
😎
June 3, 2017 at 8:59 am
Jeff Moden - Saturday, June 3, 2017 8:44 AMAlan.B - Monday, January 4, 2016 11:16 AMEirikur Eiriksson (1/4/2016)
Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
...{snip}...
Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.I know it's an older post but just saw this. Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?
It does serve it's purpose when trying to match i.e. user's input and fuzzy matching. As fuzzy matching is an Enterprise feature in SSIS then one often has to revert to Standard SQL Server alternatives.
😎
June 3, 2017 at 3:12 pm
Nice article, Mickey. I missed it when it was first published, but it sounds like the stuff I talk about all the time.
June 4, 2017 at 12:06 am
Eirikur Eiriksson - Saturday, June 3, 2017 8:59 AMJeff Moden - Saturday, June 3, 2017 8:44 AMAlan.B - Monday, January 4, 2016 11:16 AMEirikur Eiriksson (1/4/2016)
Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
...{snip}...
Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.I know it's an older post but just saw this. Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?
It does serve it's purpose when trying to match i.e. user's input and fuzzy matching. As fuzzy matching is an Enterprise feature in SSIS then one often has to revert to Standard SQL Server alternatives.
😎
Is it any good or is it horribly slow and requires a table or index scan like most on-the-fly fuzzy lookups?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 6, 2017 at 5:04 am
Another point of optimization would be rewriting the scalar Function dbo.fn_SumRangeOfNumbers() so instead of summing integers from a Tally table the function should use the formula for the sum of an arithmetic progression.
i.e.: (SecondNumber - FirstNumber + 1) * (FirstNumber + SecondNumber) / 2
July 26, 2017 at 8:28 am
Jeff Moden - Saturday, June 3, 2017 8:44 AMAlan.B - Monday, January 4, 2016 11:16 AMEirikur Eiriksson (1/4/2016)
Worse still are the multi-statement table-valued functions, in fact there is no excuse or justification for using those.I agree with this 99.9999% of the time but I did find one exception here... For the Longest Common Substring; I've been playing around with this for a couple years and here's my iTVF solution:
...{snip}...
Phil Factor has a mTVF solution here: https://www.simple-talk.com/blogs/2014/12/20/string-comparisons-in-sql-the-longest-common-substring/[/url]My solution is much, much faster than Phil's for shorter strings but slows down as the strings get longer with the tipping point being somewhere around 100 characters. The problem with the iTVF version is that, without an updateable variable, I have to use a more brute-force approach. I can't beat the mTVF in this case for longer strings (yet). For me, that's been the 0.0001% exception.I know it's an older post but just saw this. Although it's a seriously interesting exercise, is there a practical use to finding the longest common string?
Sorry I missed your question Jeff; I just saw it now.
fI have never used the longest common substring to solve a business problem. I was posted on a forum few years ago and I thought it was a fun and challenging exercise. Today I use it to when showing people how to use tally tables - it seams to make a good impression on people new to the concept.
One thing you can do with a Longest common substring function is do some preliminary plagiarism detection. I have a couple examples but can't find them at the moment. Let's say, for example, you have a website like SSC where authors can submit articles. You could (after some initial cleanup such as making breaks/line feeds/carriage returns uniform earlier in the routine) compare a submitted article to all existing articles in your database to check for a longest common substring longer than <user defined number>. If the query finds that an article has a shares a, say, 90-character substring with another article, the editor is notified to review and see if the submitting author is steeling someone's work or is correctly referencing another article where they give credit.
-- Itzik Ben-Gan 2001
July 26, 2017 at 8:38 am
TheSQLGuru - Monday, January 4, 2016 7:02 PMNot even for Check Constraints since you can get bad data. Search for sql server udf check constraint bug and you can go here: http://sqlblog.com/blogs/tibor_karaszi/archive/2009/12/17/be-careful-with-constraints-calling-udfs.aspxAddressing someone else's comment about WITH SCHEMABINDING: that is a MUST, especially if you use UDFs in UPDATES. You can avoid a nasty table spool put in place to prevent the Halloween Problem in some cases. I recommend using SCHEMABINDING all the time it is an option though - build a box around the user, and in this case the user is YOU, the developer. 🙂
Not even for computed columns. Another reason why scalar functions in computed columns is a bad idea
-- Itzik Ben-Gan 2001
Viewing 9 posts - 46 through 53 (of 53 total)
You must be logged in to reply to this topic. Login to reply