October 20, 2010 at 8:38 am
Hallidayd --
Hmm, very interesting - I'm intrigued... Your changes remind me of Itzik Ben-Gan's technique for turning a scalar udf into an inline table-valued udf (http://www.sqlmag.com/print/sql-server/inline-scalar-functions.aspx).
So, in the container stored proc that joins a table to this udf (modified to be inline as you suggest), would I then wrap the udf in a derived table, similar to the technique Itzik uses to pull a single value into a field within the SELECT statement?
SELECT (various fields)
FROM dbo.PROVIDER_ACHIEVEMENT A
JOIN
(
SELECT ConditionKey, NETWORK_COND_PREVQTR_RATE
FROM dbo.udf_RPT_NETWORK_PREVQTR_COND_RATE(@UserID)
) B
ON A.ConditionKey = B.ConditionKey
October 20, 2010 at 8:45 am
I first saw that technique described by Adam Machanic: http://sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx
Although perhaps superficially similar, Itzik is really solving a slightly different problem: getting a scalar inline, whereas you are getting a multi-statement TVF inline. As such, you can call it like you do there, you can call it as you always did, or you can use CROSS APPLY. The point is that you can use it just like the old UDF; it is the optimiser that is really going to treat things differently.
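For example (a sketch reusing the names from your post, and assuming @UserID has already been declared), both of these calling styles work against the inline version:

-- Direct call, exactly as before:
SELECT ConditionKey, NETWORK_COND_PREVQTR_RATE
FROM dbo.udf_RPT_NETWORK_PREVQTR_COND_RATE(@UserID);

-- CROSS APPLY, correlating each outer row to the function's result:
SELECT A.ConditionKey, B.NETWORK_COND_PREVQTR_RATE
FROM dbo.PROVIDER_ACHIEVEMENT AS A
CROSS APPLY (SELECT F.NETWORK_COND_PREVQTR_RATE
             FROM dbo.udf_RPT_NETWORK_PREVQTR_COND_RATE(@UserID) AS F
             WHERE F.ConditionKey = A.ConditionKey) AS B;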
October 20, 2010 at 8:55 am
The parameter sniffing is at the sproc level.
I would google parameter sniffing in SQL. Your options depend on your circumstances and SQL Server version.
What I would definitely do first, however, is execute your procedure with the RECOMPILE hint; if it still takes a very, very long time, then your problem is not (only) parameter sniffing.
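For example (the procedure and parameter names here are hypothetical, reusing the sample value from your tests):

-- WITH RECOMPILE forces a fresh plan for this execution only, so a stale
-- sniffed plan cannot be the cause if it is still slow:
EXEC dbo.usp_MyReport @UserID = 530438 WITH RECOMPILE;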
October 20, 2010 at 8:57 am
As mentioned, statistics are another avenue to investigate. Are autoupdate statistics on? And if not, do you have a regular maintenance routine to selectively update them?
October 20, 2010 at 9:05 am
Awesome information!
I'll start digging into the code, as well as attempt to translate the execution plans, and see if I can configure SQL Profiler correctly to assess performance.
Thanks again!
--pete
October 20, 2010 at 9:10 am
hallidayd (10/20/2010)
As mentioned, statistics are another avenue to investigate. Are autoupdate statistics on? And if not, do you have a regular maintenance routine to selectively update them?
I'll have to check with our new dba regarding the autoupdate statistics setting. (Is there a query that can be run to determine if autoupdate stats is on?)
Also, should statistics be refreshed whenever a table structure changes or index is added/deleted/modified?
October 20, 2010 at 9:17 am
SELECT DATABASEPROPERTYEX('myDB', 'IsAutoUpdateStatistics')
peterzeke (10/20/2010)
I'll have to check with our new dba regarding the autoupdate statistics setting. (Is there a query that can be run to determine if autoupdate stats is on?) Also, should statistics be refreshed whenever a table structure changes or index is added/deleted/modified?
Statistics are based on the data distribution, not the structure. IIRC the trigger point for an automatic update is modifications to 20% of the rows the statistics are based on (plus 500) since the last update.
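If you want to check for yourself, something along these lines works (a sketch; the table name is illustrative):

-- Auto-update setting per database (equivalent to the DATABASEPROPERTYEX check above):
SELECT name, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'myDB';

-- When was each statistic on a given table last updated?
SELECT s.name AS stat_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.PROVIDER_ACHIEVEMENT');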
October 20, 2010 at 5:31 pm
Well, I decided to test out converting the multi-statement table-valued udf to an inline udf as suggested in a previous post. To my surprise (shock), the inline version is performing very poorly. The original udf takes less than a second to run, compared to the inline udf, which takes about a minute.
Here are the queries I tested, which return two columns with 15 rows:
SELECT CONDITIONKEY
,NETWORK_COND_PREVQTR_RATE
FROM DBO.[udf_RPT_NETWORK_PREVQTR_COND_RATE](530438)
SELECT CONDITIONKEY
,NETWORK_COND_PREVQTR_RATE
FROM DBO.[udf_RPT_NETWORK_PREVQTR_COND_RATE_Inline](530438)
Here's the performance output from the original udf:
Table '#2CA8951C'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 47 ms, elapsed time = 59 ms.
Here's the performance output from the inline udf:
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'NETWORK_ACHIEVEMENT'. Scan count 1, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PROVIDER_ACHIEVEMENT'. Scan count 1, logical reads 7223, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DIM_PROVIDER'. Scan count 1, logical reads 157, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 64031 ms, elapsed time = 65717 ms.
Does this suggest that the inline approach is not ideal for the specific data that the udf is pulling? Does it suggest that there are missing indexes that the inline udf needs to perform optimally? Perhaps the inline version kindly offered to me in an earlier post needs to be tweaked?
Again, many thanks for the assistance that's been offered.
October 20, 2010 at 11:29 pm
Stats IO lies. The original udf would not have appeared at all in the IO stats. That may well explain why the difference looks so pronounced in terms of IO.
http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/
The IOs are low, which is good. Lots of blocking? Lots of data streaming back to the client? Was the execution plan on? (If so, turn it off for the time tests; it takes ages to generate.)
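A quick way to look for blocking while the query runs is to fire this from a second connection:

SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;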
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 24, 2010 at 6:32 pm
Gee whiz, Mr Celko-- I never thought that this discussion thread would draw your attention. As much as I'd like to consider your post helpful, I'm not so sure what to think. Your reply reminds me of the parable of the Pharisee and the Tax Collector. I come to SSC in an effort to learn, to improve... hopefully from people who aren't fed up with people like me.
So, back to the task at hand. Joe's reply does indeed highlight problems in my coding, specifically within the udf I've offered up for public examination. In fact, one of the reasons I chose to post a udf is that I'm not completely thrilled with using it in the first place.
The existence of this udf is an effort to simplify and standardize various queries. For example, rather than all of my colleagues remembering to join certain tables and the various criteria that must be present to ensure the validity of the data returned from these tables, a udf or view can be leveraged that ensures the correct data is returned. Write once, use everywhere. Without using udfs or views, my colleagues can end up generating all sorts of queries that can yield different results or merely be difficult to debug. (Just look at the different queries people submit for SQL Challenges and Speed Freaks!)
The data warehouse, from which this udf queries, is based on the Kimball method of dimensional modeling. I sense that Joe Celko is anti-Kimball; otherwise Joe would recognize the various "key" fields in the udf as being Kimball-based. The datekey field is a prime example. I don't want to argue with Joe about the validity of the Kimball method; I'm merely pointing out the style of data warehouse I'm querying.
Also, I'm not the creator of this data warehouse, including the table Dim_Provider (and others just like it). I agree that it would make a lot more sense if this table were in plural form (Dim_Providers).
As for typing in uppercase letters or using hyphens, it's a style that exists within the IT group at my company. Be it wrong or right, it's a style that works for us. Should I join a different company elsewhere, I'll code according to their standards.
Lastly, I don't know what "declarative programming" really means. I make an honest effort to keep up with set-based querying as illustrated by known sql gurus like Itzik Ben-Gan and Jeff Moden (here on SSC). So, Joe, feel free to teach me instead of condemning me. I'm not looking for a fight; I'm looking to learn.
October 26, 2010 at 5:46 am
In terms of comparing the two, the easiest method is to isolate the SQL within each function and test the two side by side (that is, take the guts out of the functions and run them as plain scripts). You can then compare plans and costs more easily. As mentioned, you can't easily debug the workings of a multi-statement UDF.
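Something along these lines (a sketch; substitute the real function bodies):

DECLARE @UserID INT;
SET @UserID = 530438;  -- sample value from your earlier test

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- paste the body of udf_RPT_NETWORK_PREVQTR_COND_RATE here, with the
-- parameter references replaced by @UserID ...

-- paste the body of udf_RPT_NETWORK_PREVQTR_COND_RATE_Inline here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;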
October 26, 2010 at 6:25 am
hallidayd,
As an initial step in breaking down the udf, specifically the modified udf you provided, I converted the nested udfs (within the cross join) to inline udfs as well, and sure enough a tremendous performance gain resulted. Performance is now approximately 200 ms: not quite as fast as my original udf, but that's hardly the point.
The point for me is that I'm certainly acquiring a better understanding of inline udfs, and how the optimizer can leverage the sql. I'm also finding the nested udf approach to be a bit "unnatural" -- i.e., when debugging sql that has nested udfs, how deep does the rabbit hole go?
I am a little torn, however, between the benefit of a udf (i.e., "write once, use everywhere"... which also means "modify once, modified everywhere") and Joe Celko's stance of spelling out all joins within a query. If a particular business rule is implemented in multiple queries and that rule changes in the future, then every query carrying the old implementation has to be modified individually, versus a single correction to a udf, which takes effect immediately for all queries that reference it.
Again, thanks for your assistance.
October 26, 2010 at 9:20 am
I don't believe that is Joe's stance. Note also that he has rewritten the UDF as a single statement but has not used the inline function syntax, so the benefit of the rewrite is somewhat lost. However, it is only a couple of minor changes to alter.
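To illustrate the syntax difference (a generic sketch, not your actual udf; the table and columns are made up):

-- Multi-statement form: the optimizer treats the result as an opaque table variable.
CREATE FUNCTION dbo.udf_Example (@UserID INT)
RETURNS @result TABLE (ConditionKey INT, Rate DECIMAL(9,4))
AS
BEGIN
    INSERT @result (ConditionKey, Rate)
    SELECT ConditionKey, Rate
    FROM dbo.SomeTable            -- hypothetical table
    WHERE UserID = @UserID;
    RETURN;
END;
GO

-- Inline form of the same single statement: this is what the optimizer can expand.
CREATE FUNCTION dbo.udf_Example_Inline (@UserID INT)
RETURNS TABLE
AS
RETURN
    SELECT ConditionKey, Rate
    FROM dbo.SomeTable
    WHERE UserID = @UserID;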
There is some contention amongst professionals re. reusable code when it comes to SQL. I know and respect some very good DBAs that do not allow views and functions - everything is coded from scratch. I know when I first started I overused them and my applications suffered.
One thing to remember is that views (and also inline UDFs) are incorporated into plans not as all-or-nothing objects, but inline. As such, the optimiser can dispose entirely of tables referenced in the view if they are not referenced in the "parent SQL" (for want of a better term), provided there would be no change to the logical semantics of the query. It follows, then, that judicious use of views and inline UDFs is a valid technique to my mind; it flattens the database structure and makes it easier for less skilled developers/users to extract the information they need.
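A small illustration of that (the view, the DIM_CONDITION table, and the columns are made up; it assumes ConditionKey is the primary key of DIM_CONDITION, so the optimiser can prove the join is redundant):

CREATE VIEW dbo.vw_ProviderConditions
AS
SELECT p.ProviderKey, p.ProviderName, c.ConditionName
FROM dbo.DIM_PROVIDER AS p
LEFT JOIN dbo.DIM_CONDITION AS c
    ON c.ConditionKey = p.ConditionKey;
GO

-- This query never references DIM_CONDITION's columns, so the optimiser can
-- remove that table from the plan entirely:
SELECT ProviderKey, ProviderName
FROM dbo.vw_ProviderConditions;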
October 26, 2010 at 10:28 am
Agreed, I probably painted Celko's viewpoint with too wide a brush.
As for your mention of views, you've answered a question I was pondering: can the optimizer leverage a view much like an inline udf? In fact, I'm wondering if some of the udfs I've created (which I haven't posted) would be better as views, especially since parameters don't need to be passed to some of them.
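For instance, a parameterless inline udf and the equivalent view (names and columns here are hypothetical) give the optimizer the same thing to expand:

CREATE FUNCTION dbo.udf_ActiveProviders ()
RETURNS TABLE
AS
RETURN
    SELECT ProviderKey, ProviderName
    FROM dbo.DIM_PROVIDER
    WHERE ActiveFlag = 1;
GO

-- Same logic as a view; callers just drop the parentheses:
CREATE VIEW dbo.vw_ActiveProviders
AS
SELECT ProviderKey, ProviderName
FROM dbo.DIM_PROVIDER
WHERE ActiveFlag = 1;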
Given the constructive feedback and education I received from this discussion thread, I've been developing a sql query (from scratch) that will replace the current sql that supports my currently ill-performing report. I'm attempting to not refer to any of the sql already built -- i.e., use a clean slate.
What makes the report tricky is that information needs to be reported at different granularities on the same page:
1) Display a physician's performance for various services (e.g., blood tests, office visits) related to specific diseases/conditions
2) Display performance metrics at the disease/condition level (i.e., rollup service level data)
3) Display comparative stats from the previous calendar quarter regarding the entire network of physician performances at the service and disease/condition level
4) Display the performance rates of the entire group/practice to which this physician belongs
5) Display the best physician performance rates across the entire network at both the service and disease/condition level.
The data warehouse contains snapshot tables of most of this data, but not all of it. So, given these different granularities, I took an initial approach of acquiring chunks of the data via udfs rather than one monster query.