August 3, 2009 at 12:25 am
We are using set of table valued user defined functions at base level and use it instead of direct table access in queries. Reason behind using the UDF's is to reuse the code.
Suppose, there is table as 'Companies' in database. I have created few UDF's for different conditions which are required at many places e.g. fnGetSoftwareCompanies, fnGetContractedCompanies, gnGetCompaniesRegisteredInLastYear etc..
There is also another table 'CompanyProducts'. And now consider that I need to retrieve products of software companies, I have written another UDF 'fnGetsoftwarecompanyProducts' which internally use 'fnGetSoftwareCompanies' and then join to CompanyProducts. Final data is pivoted get product names as columns.
Now in one of my SP's, I need to join both UDF's 'fnGetSoftwareCompanies' and 'fnGetsoftwarecompanyProducts' in order to retieve some data.
Second UDF itself uses first UDF internally. So same block of code is executed twice in that SP. I understand that I could have avoided that by not using UDFs and writting whole logic in one batch. But there are reasons I have to use UDFs compulsory.
Now query looks something like
Select
--columns
From fnGetSoftwareCompanies(id)
Join fnGetsoftwarecompanyProducts(id)
WHERE --conditions
Here 'fnGetSoftwareCompanies' code block will be executed twice. Once in this outer code and next in 'fnGetsoftwarecompanyProducts'
I am currently working on performance optimization task and I want to get rid of this issue. I would have tried use of temp table but can not use it in functions.
Other solution I tried was, to build XML from 'fnGetSoftwareCompanies' resultset and pass it to other function 'fnGetsoftwarecompanyProducts' which then use that XML data instead of call to 'fnGetSoftwareCompanies' function. But performance was slow in this approach.
Could someone please suggest me solution to resolve this issue ?
August 3, 2009 at 6:55 am
You want an honest answer? Don't do this. Don't use functions in this manner. I've seen an entire application written exactly the way you described. It looked great and when tested within the development environment on only a few rows of data with only a few connections, it worked fine. Just as soon as hundreds of rows (note, that's two zero's,) were inserted into the underlying tables, the crystaline structure of UDF's calling UDF's joined to UDF's that called other UDF's fell completely appart. It's not scalable. You'll get very bad execution plans and serious repitition on the code, as you're already seeing. If you're using multi-statement table valued UDF's then you're in for an even bigger world of hurt. These UDF's have no statistics. Look at the execution plan. The estimated rows for all the multi-statement UDF's is 1. That means the execution plan thinks it is returning a single row. This works fine as long as it is only return one, or a very few, row, but as soon as it starts returning 100, 1000 or more, the execution plans cease to function appropriately.
Save yourself tons of headaches, rethink and redesign now.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 3, 2009 at 7:25 am
Thanks Grant for taking your time to answer this post. We are facing some serious performance issues with this approach on PROD server with increasing users and data.
I think it makes sense to redesign queries. Do you recommend not using inline table valued function at all - which returns more than 1,00,000 rows ?
August 3, 2009 at 7:32 am
Inline UDF's are not usually a problem because, as you can see in the execution plan, they'll resolve out to a standard set of operations, like a view (unlike the multi-statement UDF's). But when you start joining them to each other, the optimizer can, and will, get overwhelmed and start making poor choices for execution, leading to bad performance.
So, from a general stand-point, sure, go ahead and use the inline UDF, but when you start combining them and joining them, get ready for a rough ride.
Returning that much data if it's not an ETL process begs the question, do you need that many rows returned to the client, for real? Human beings don't usually read more than a couple of hundred rows of data.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 3, 2009 at 8:53 am
Inline UDFs, like views, can be great if you use them properly. If you deviate even slightly from "optimum" use, they can kill performance very, very efficiently.
The key to using both, is use them for the least number of columns you'll need, as a shortcut for your most common joins, and don't use them for anything more than that.
For example, if you have two tables that you have to join all the time, to get two or three columns total, then a view or inline UDF can be a valuable shortcut that saves having to write that join over and over and over again.
But if they have even a single column you don't need, that's extra I/O, extra processing, extra RAM, etc., that you'll waste every time you don't need that column.
At my last job, there were two tables in one database that needed to be joined for about 80% of the queries in use. Needed to be two tables, not one, for normalization reasons with regard to insert/update/delete, but needed to be one table for just about every select used. I took the six or seven columns that were used in 90% of those queries, and put them into a view. Then used that view for most of the selects, just to make the coding a bit easier.
That's the kind of thing that a view or inline UDF is good for.
At the same shop, one of the devs had built a series of nested views. The base pulled every column from three tables. Several others took that, and added calculated columns and even more joins. The top level view had every column from six or seven tables, a bunch of aggregated/calculated data columns, and pulled data from the Windows Active Directory as well. There were procs that used that view to get as few as two columns from tables that had a direct FK/PK relationship. The dev was shocked, honestly, when I was able to take some of those procs and cut their run-time down from 10-15 seconds to under 10 milliseconds. It meant I had to type a little more code in those procs, to build the actual joins they really needed instead of just having the view in the From clause all by itself, but I thought it was worth the effort.
That's the kind of trap that people get into with views and UDFs.
Does that help clarify the situation?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 3, 2009 at 8:15 pm
That's a good summary, Gus. We had some developers create scalar-valued UDFs (which contained queries) to simplify/standardize their final queries. That way, instead of writing a join to the accounts receivables table, they could just write a where clause with
dbo.ufAccountOverdue(acctNo) = 'Y'
I'm sure you can imagine how poorly this ran when used to select a list of overdue accounts. It was especially bad when several such functions were used in the where clause.
One question though: If you do a select against a view or an inline table valued function, doesn't the resulting query plan only use the columns required? I'm trying to find my source authority for that right now.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 4, 2009 at 5:58 am
It may be too late for the current project, but one of the best ways I have found to promote good types of 'code re-use' inside SQL Server is to base a solution on Service Broker.
August 4, 2009 at 7:34 am
Quickly Confused (8/3/2009)
That's a good summary, Gus. We had some developers create scalar-valued UDFs (which contained queries) to simplify/standardize their final queries. That way, instead of writing a join to the accounts receivables table, they could just write a where clause withdbo.ufAccountOverdue(acctNo) = 'Y'
I'm sure you can imagine how poorly this ran when used to select a list of overdue accounts. It was especially bad when several such functions were used in the where clause.
One question though: If you do a select against a view or an inline table valued function, doesn't the resulting query plan only use the columns required? I'm trying to find my source authority for that right now.
I can certainly imagine how that performed, since I've seen it before. Heck, early on, I made that same mistake and had to refactor it out of my own databases.
When you include a view or inline UDF in a query, it amounts to including its definition in the calling query as a derived table. It's just shorthand for doing exactly that.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 4, 2009 at 8:28 am
Care to elaborate on that, Paul? I'm somewhat familiar with Service Broker but I want to be sure I'm not misunderstanding you.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 5, 2009 at 8:25 am
I have spent a significant chunk of the last 2 years or so helping several clients refactor UDFs out of their databases due to extremely poor performance. It is a perfect example of something being (potentially) good for the developer that is absolutely horrid for the server. Some of the processes I have rewritten have run 5 ORDERS OF MAGNITUDE more efficiently!!!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
August 5, 2009 at 11:31 am
Bob Hovious (8/3/2009)
One question though: If you do a select against a view or an inline table valued function, doesn't the resulting query plan only use the columns required? I'm trying to find my source authority for that right now.
Try Grant's blog.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 5, 2009 at 12:05 pm
Thanks, Gail. Nice to see you back.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 5, 2009 at 4:18 pm
Bob Hovious (8/4/2009)
Care to elaborate on that, Paul? I'm somewhat familiar with Service Broker but I want to be sure I'm not misunderstanding you.
Briefly, I meant that using Service Broker tends to produce designs which encourage code re-use. It's a bit of a woolly assertion I grant you, but I have found it to be the case. There are specific cases of great re-use opportunities - one example that leaps to mind is in serializing and de-serializing data especially where a remote service is involved. Did you have something else in mind Bob?
Paul
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply