December 29, 2007 at 10:02 am
My app uses multiple calendars. For the regular (Gregorian) calendar, the week, month, quarter, etc. of a given date can be determined by datepart(). However, datepart() can't be used with the other calendar types since the year start and month boundaries are arbitrary.
For example, the fiscal year may begin June 1 so July 15 is actually quarter 1. In the production calendar, weeks never split months so July 30th may actually be considered part of August.
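To make that boundary logic concrete, here's a minimal sketch (in Python rather than T-SQL, since the real function is app-specific) of the fiscal-quarter calculation, assuming a June 1 fiscal year start:

```python
from datetime import date

FY_START_MONTH = 6  # assumed: fiscal year begins June 1

def fiscal_quarter(d):
    """Fiscal quarter (1-4) of date d for a fiscal year beginning June 1."""
    months_into_fy = (d.month - FY_START_MONTH) % 12
    return months_into_fy // 3 + 1

# July is only the second month of the fiscal year, so July 15 falls in Q1:
print(fiscal_quarter(date(2007, 7, 15)))  # 1
```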
I figure I can support the calendar in one of two ways. The first is to create a table which associates the proper week/month/qtr/etc. to each date in the db for each calendar. Since our retention interval is 5 years, that's only 1800 rows per calendar.
The alternative is to create a function that would be used instead of datepart(). This UDF never does any I/O since the production calendar start dates are hard-coded for each year. The logic is pretty simple and the function is only about 40 lines long.
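For comparison, the table in the first option could be populated once by a small script along these lines (a sketch only; the column layout and fiscal-year start are assumptions, and the real production calendar would use its hard-coded boundaries instead):

```python
from datetime import date, timedelta

def build_calendar_rows(start, end, fy_start_month=6):
    """Yield one (date, fiscal_year, fiscal_quarter, fiscal_month) row per day
    in [start, end]; fiscal_year is labeled by its starting calendar year."""
    d = start
    while d <= end:
        months_into_fy = (d.month - fy_start_month) % 12
        fy = d.year if d.month >= fy_start_month else d.year - 1
        yield (d, fy, months_into_fy // 3 + 1, months_into_fy + 1)
        d += timedelta(days=1)

# Five fiscal years starting June 1, 2007 -- roughly 1,800 rows, as estimated above.
rows = list(build_calendar_rows(date(2007, 6, 1), date(2012, 5, 31)))
print(len(rows), rows[0])
```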
I think the table solution is the best choice. But since this logic will be used in very large queries (7 tables, 2 of which have millions of rows) some other folks have advised me against adding yet another table. Still, I'm very skeptical about the performance resulting from calling such a function 50,000+ times in a query.
Frankly, I don't want to waste time working on something that's a bad idea or trying to prove which one is better. Hopefully one of you folks can point me to some reference material that will let us decide which route to take before expending any effort.
December 29, 2007 at 10:54 am
Then, consider this...
The only problem is that I've not figured out a good set based method for future dating things like deliveries given a ship date and a number of business days.
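The loop version of that business-day calculation is easy enough (a Python sketch, ignoring holidays; a set-based T-SQL version would instead join against a calendar table with a working-day flag):

```python
from datetime import date, timedelta

def add_business_days(ship_date, n):
    """Date n business days (Mon-Fri) after ship_date; holidays are ignored,
    which is exactly where a calendar table with a holiday flag would help."""
    d = ship_date
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            n -= 1
    return d

# Friday Dec 28, 2007 plus 3 business days lands on Wednesday Jan 2, 2008
# (New Year's Day still counts here because holidays are ignored):
print(add_business_days(date(2007, 12, 28), 3))  # 2008-01-02
```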
Sergiy has some code somewhere to do that type of stuff but I've sadly lost track of it.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 29, 2007 at 3:59 pm
Thanks, but that link simply describes how to create a calendar table (which we already know how to do for the various calendars). It does not contrast the benefits/drawbacks of using a table with various calendars vs. a datepart() replacement UDF when the target queries deal with large tables and the result set considers 50,000+ rows.
December 29, 2007 at 5:48 pm
Heh... thought you said you "Frankly, I don't want to waste time working on something that's a bad idea or trying to prove which one is better"
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2007 at 11:30 am
For a calendar, there are two alternative solutions: creating a table or using a function.
As the SQL query optimizer can make use of table indexes, constraints, and statistics but cannot determine the result of a function, I recommend using a table.
SQL = Scarcely Qualifies as a Language
December 30, 2007 at 11:47 am
Jeff Moden (12/29/2007)
Heh... thought you said you "Frankly, I don't want to waste time working on something that's a bad idea or trying to prove which one is better"
correct. we already know how to implement either alternative, but we don't want to implement both and then run timings to determine which is better. i'm hoping someone has already investigated the performance impact of calling a scalar UDF 1000s of times in large queries. if the UDF read info from the db, then i know it would be slower than using another table, but since the UDF performs no I/O its impact should be reduced.
Hopefully one of you folks can point me to some reference material that will let us decide which route to take before expending any effort.
December 30, 2007 at 12:13 pm
Carl Federl (12/30/2007)
As the SQL Query optimizer can make use of table indicies, constraints and statistics but cannot determine the result of a function, recommend using a table.
thanks, but the UDF wouldn't be used for selectivity or joins, mostly just grouping/reporting labels.
here's pseudo-code for both approaches.
[font="Courier New"]declare @startDate datetime, @endDate datetime
set @startDate = dbo.fCalPartToDate( 'P', 'Q', 200501 )
set @endDate = dateadd( day, -1, dbo.fCalPartToDate( 'P', 'Q', 200502 ) )
select dbo.fDateToCalPart( 'P', 'Y', O.orderDate ),
dbo.fDateToCalPart( 'P', 'M', O.orderDate ),
dbo.fDateToCalPart( 'P', 'W', O.orderDate ), ...
from Orders O ...
where ... and O.orderDate between @startDate and @endDate
...
group by dbo.fDateToCalPart( 'P', 'Y', O.orderDate ),
dbo.fDateToCalPart( 'P', 'M', O.orderDate ),
dbo.fDateToCalPart( 'P', 'W', O.orderDate )[/font]
Calendar table:
[font="Courier New"]select C.year, C.month, C.week, ...
from Calendar C join Orders O on C.date = O.orderDate ...
where ... and C.calType = 'P' and C.quarter = 200501
...
group by C.year, C.month, C.week[/font]
December 30, 2007 at 2:20 pm
Your example SQL is exactly the case where a table would be superior to a function. With a table, the values of the date range are visible to the optimizer, but in the case of a function, the values are not known.
SQL = Scarcely Qualifies as a Language
December 30, 2007 at 2:52 pm
Carl Federl (12/30/2007)
Your example SQL is exactly the case where a table would be superior to a function. With a table, the values of the date range are visible to the optimizer, but in the case of a function, the values are not known.
Good. I've always preferred the table route since it's clearer and less susceptible to coding errors. But some folks seem to be paranoid about having more than 5 tables in a join.
December 30, 2007 at 10:06 pm
but we don't want to implement both and then run timings to determine which is better.
Yeah... you do...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2007 at 1:37 pm
I agree with Jeff that you should do at least some time trials with your data and setup.
That being said I have used the table method with great success. You can create any custom output value you need fairly easily and add it to the table and avoid later processing during your query. For example MMQQ or QQYYYY.
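A label column like that could be computed once when the calendar table is loaded, along these lines (a hypothetical sketch; the label format and June 1 fiscal-year start are assumptions):

```python
from datetime import date

def fiscal_label(d, fy_start_month=6):
    """Precomputable grouping label like 'Q1 2007', for a fiscal year
    beginning June 1 and labeled by its starting calendar year."""
    months_into_fy = (d.month - fy_start_month) % 12
    fy = d.year if d.month >= fy_start_month else d.year - 1
    return f"Q{months_into_fy // 3 + 1} {fy}"

print(fiscal_label(date(2007, 7, 15)))  # Q1 2007
```

Storing the finished string in the table means the big reporting query just selects and groups by it, with no per-row formatting at query time.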
Personally I've always thought that the "fear" of using extra tables is somewhat misunderstood. I've added a dozen small, well-indexed lookup tables to a query with minimal impact, even on large million-plus-row queries. It all depends on the size of the table, how you are using it, and how it is indexed. A small table (I would say 1800 rows is small) with a good index (say on a date field) that is used strictly as a lookup should have minimal impact on your query, even if it's a large query on large tables.
Just my 2 cents though. I would give it a test and see how it works for you.
Kenneth
Kenneth Fisher
I was once offered a wizard's hat but it got in the way of my dunce cap.
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Link to my Blog Post --> www.SQLStudies.com
January 2, 2008 at 8:34 am
thanks for the feedback kenneth. we've decided on using a calendar table and should have it implemented in our framework by week's end.
as for testing, i don't agree entirely with you or jeff. i've found that doing simple tests outside of the application (aka, the real code) can often lead to unanticipated problems once the solution is deployed. ("it worked fine in development.") that's because this type of testing is usually done on small datasets and/or the access logic for the test isn't exactly the same as the real code (a few tables were omitted, the overall query was simplified, etc.). for us to properly test either solution, we'd have to implement it in our app and let the app generate the actual queries. and since this is calendar functionality, the gui requires changes also so we needed to decide which route to take beforehand.
i still don't have a good idea of what the performance impact of a UDF used instead of another table in a large query is. and that's really the question i wanted answered. most articles compare UDFs to builtins or discuss them used as sargs or make them so trivial that they can be easily replaced.
Are UDFs Harmful to SQL Server Performance
maybe later this month i'll try a couple of tests on our full dataset and decide for myself. but since we've got a deliverable and enough ammo to support the table alternative, the calendar table will be implemented. thanks to all who shared their thoughts on this.
January 2, 2008 at 9:08 am
I think it primarily depends on the type of testing. You're absolutely right in that testing on a small amount of data or with a simplified query isn't going to do it if you are working with a complicated query or millions of rows. You can, however, get a good idea by running some tests with, say, 500k rows (should be sufficiently large to give you a true idea) if you use Profiler to capture a sample query or two from your app and then modify them to use either the function or the additional table. I will say that in the end you have to write both alternatives (at least mostly) before you can begin testing, but at least you don't actually have to put them in your app.
All that being said my opinions are just that .. opinions and I certainly wouldn't say that anyone else's opinions aren't just as valid.
Kenneth Fisher
I was once offered a wizard's hat but it got in the way of my dunce cap.
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Link to my Blog Post --> www.SQLStudies.com
January 2, 2008 at 8:59 pm
There is a third option that lots of folks forget about... that is, a formula in the code instead of an extra join or a dip into a UDF. Sure, you have to type the formula correctly every time, but that's what "code libraries" are for. And not all UDFs are bad... a well-written UDF will have virtually no overhead. I forget what the UDF did when we did some testing a few years back, but I do remember laughing out loud when the UDF actually outperformed the discrete code embedded in a proc. As I'm sure you're aware, UDFs also have the advantage of streamlining some development efforts... but so do small, well-indexed "helper" tables... after the first call, they can easily live entirely in cache memory and provide very fast results in a join across multiple simultaneous sessions.
So, like Kenneth said, it's all in how you test and what the application of the method is.
--Jeff Moden
Change is inevitable... Change for the better is not.