December 29, 2007 at 10:02 am
My app uses multiple calendars. For the regular (Gregorian) calendar, the week, month, quarter, etc. of a given date can be determined by datepart(). However, datepart() can't be used with the other calendar types since the year start and month boundaries are arbitrary.
For example, the fiscal year may begin June 1 so July 15 is actually quarter 1. In the production calendar, weeks never split months so July 30th may actually be considered part of August.
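To make that boundary logic concrete, here's a minimal sketch (in Python rather than T-SQL, since the real function is app-specific) of the fiscal-quarter calculation, assuming a June 1 fiscal year start:

```python
from datetime import date

FY_START_MONTH = 6  # assumed: fiscal year begins June 1

def fiscal_quarter(d):
    """Fiscal quarter (1-4) of date d for a fiscal year beginning June 1."""
    months_into_fy = (d.month - FY_START_MONTH) % 12
    return months_into_fy // 3 + 1

# July is only the second month of the fiscal year, so July 15 falls in Q1:
print(fiscal_quarter(date(2007, 7, 15)))  # 1
```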
I figure I can support the calendar in one of two ways. The first is to create a table which associates the proper week/month/qtr/etc. to each date in the db for each calendar. Since our retention interval is 5 years, that's only 1800 rows per calendar.
The alternative is to create a function that would be used instead of datepart(). This UDF never does any I/O since the production calendar start dates are hard-coded for each year. The logic is pretty simple and the function is only about 40 lines long.
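For comparison, the table in the first option could be populated once by a small script along these lines (a sketch only; the column layout and fiscal-year start are assumptions, and the real production calendar would use its hard-coded boundaries instead):

```python
from datetime import date, timedelta

def build_calendar_rows(start, end, fy_start_month=6):
    """Yield one (date, fiscal_year, fiscal_quarter, fiscal_month) row per day
    in [start, end]; fiscal_year is labeled by its starting calendar year."""
    d = start
    while d <= end:
        months_into_fy = (d.month - fy_start_month) % 12
        fy = d.year if d.month >= fy_start_month else d.year - 1
        yield (d, fy, months_into_fy // 3 + 1, months_into_fy + 1)
        d += timedelta(days=1)

# Five fiscal years starting June 1, 2007 -- roughly 1,800 rows, as estimated above.
rows = list(build_calendar_rows(date(2007, 6, 1), date(2012, 5, 31)))
print(len(rows), rows[0])
```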
I think the table solution is the best choice. But since this logic will be used in very large queries (7 tables, 2 of which have millions of rows) some other folks have advised me against adding yet another table. Still, I'm very skeptical about the performance resulting from calling such a function 50,000+ times in a query.
Frankly, I don't want to waste time working on something that's a bad idea or trying to prove which one is better. Hopefully one of you folks can point me to some reference material that will let us decide which route to take before expending any effort.
December 29, 2007 at 10:54 am
Then, consider this...
The only problem is that I've not figured out a good set based method for future dating things like deliveries given a ship date and a number of business days.
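The loop version of that business-day calculation is easy enough (a Python sketch, ignoring holidays; a set-based T-SQL version would instead join against a calendar table with a working-day flag):

```python
from datetime import date, timedelta

def add_business_days(ship_date, n):
    """Date n business days (Mon-Fri) after ship_date; holidays are ignored,
    which is exactly where a calendar table with a holiday flag would help."""
    d = ship_date
    while n > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            n -= 1
    return d

# Friday Dec 28, 2007 plus 3 business days lands on Wednesday Jan 2, 2008
# (New Year's Day still counts here because holidays are ignored):
print(add_business_days(date(2007, 12, 28), 3))  # 2008-01-02
```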
Sergiy has some code somewhere to do that type of stuff but I've sadly lost track of it.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 29, 2007 at 3:59 pm
Thanks, but that link simply describes how to create a calendar table (which we already know how to do for the various calendars). It does not contrast the benefits/drawbacks of using a table with various calendars vs. a datepart() replacement UDF when the target queries deal with large tables and the result set considers 50,000+ rows.
December 29, 2007 at 5:48 pm
Heh... thought you said you "Frankly, I don't want to waste time working on something that's a bad idea or trying to prove which one is better"
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2007 at 11:30 am
For a calendar, there are two alternative solutions: creating a table or using a function.
As the SQL query optimizer can make use of table indexes, constraints, and statistics but cannot determine the result of a function, I recommend using a table.
SQL = Scarcely Qualifies as a Language
December 30, 2007 at 11:47 am
Jeff Moden (12/29/2007)
Heh... thought you said you "Frankly, I don't want to waste time working on something that's a bad idea or trying to prove which one is better"
correct. we already know how to implement either alternative, but we don't want to implement both and then run timings to determine which is better. i'm hoping someone has already investigated the performance impact of calling a scalar UDF 1000s of times in large queries. if the UDF read info from the db, then i know it would be slower than using another table, but since the UDF performs no I/O its impact should be reduced.
Hopefully one of you folks can point me to some reference material that will let us decide which route to take before expending any effort.
December 30, 2007 at 12:13 pm
Carl Federl (12/30/2007)
As the SQL Query optimizer can make use of table indicies, constraints and statistics but cannot determine the result of a function, recommend using a table.
thanks, but the UDF wouldn't be used for selectivity or joins, mostly just grouping/reporting labels.
here's pseudo-code for both approaches.
[font="Courier New"]declare @startDate datetime, @endDate datetime
set @startDate = dbo.fCalPartToDate( 'P', 'Q', 200501 )
set @endDate = dateadd( day, -1, dbo.fCalPartToDate( 'P', 'Q', 200502 ) )
select dbo.fDateToCalPart( 'P', 'Y', O.orderDate ),
dbo.fDateToCalPart( 'P', 'M', O.orderDate ),
dbo.fDateToCalPart( 'P', 'W', O.orderDate ), ...
from Orders O ...
where ... and O.orderDate between @startDate and @endDate
...
group by dbo.fDateToCalPart( 'P', 'Y', O.orderDate ),
dbo.fDateToCalPart( 'P', 'M', O.orderDate ),
dbo.fDateToCalPart( 'P', 'W', O.orderDate )[/font]
Calendar table:
[font="Courier New"]select C.year, C.month, C.week, ...
from Calendar C join Orders O on C.date = O.orderDate ...
where ... and C.calType = 'P' and C.quarter = 200501
...
group by C.year, C.month, C.week[/font]
December 30, 2007 at 2:20 pm
Your example SQL is exactly the case where a table would be superior to a function. With a table, the values of the date range are visible to the optimizer, but in the case of a function, the values are not known.
SQL = Scarcely Qualifies as a Language
December 30, 2007 at 2:52 pm
Carl Federl (12/30/2007)
Your example SQL is exactly the case where a table would be superior to a function. With a table, the values of the date range are visible to the optimizer, but in the case of a function, the values are not known.
Good. I've always preferred the table route since it's clearer and less susceptible to coding errors. But some folks seem to be paranoid about having more than 5 tables in a join.
December 30, 2007 at 10:06 pm
but we don't want to implement both and then run timings to determine which is better.
Yeah... you do...
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2007 at 1:37 pm
I agree with Jeff that you should do at least some time trials with your data and setup.
That being said I have used the table method with great success. You can create any custom output value you need fairly easily and add it to the table and avoid later processing during your query. For example MMQQ or QQYYYY.
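A label column like that could be computed once when the calendar table is loaded, along these lines (a hypothetical sketch; the label format and June 1 fiscal-year start are assumptions):

```python
from datetime import date

def fiscal_label(d, fy_start_month=6):
    """Precomputable grouping label like 'Q1 2007', for a fiscal year
    beginning June 1 and labeled by its starting calendar year."""
    months_into_fy = (d.month - fy_start_month) % 12
    fy = d.year if d.month >= fy_start_month else d.year - 1
    return f"Q{months_into_fy // 3 + 1} {fy}"

print(fiscal_label(date(2007, 7, 15)))  # Q1 2007
```

Storing the finished string in the table means the big reporting query just selects and groups by it, with no per-row formatting at query time.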
Personally I've always thought that the "fear" of using extra tables is somewhat misunderstood. I've added a dozen small, well-indexed lookup tables to a query with minimal impact, even on large million-plus-row queries. It all depends on the size of the table, how you are using it, and how it is indexed. A small table (I would say 1800 rows is small) with a good index (say on a date field) that is used strictly as a lookup should have minimal impact on your query, even if it's a large query on large tables.
Just my 2 cents though. I would give it a test and see how it works for you.
Kenneth
Kenneth Fisher
I was once offered a wizard's hat but it got in the way of my dunce cap.
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Link to my Blog Post --> www.SQLStudies.com
January 2, 2008 at 8:34 am
thanks for the feedback kenneth. we've decided on using a calendar table and should have it implemented in our framework by week's end.
as for testing, i don't agree entirely with you or jeff. i've found that doing simple tests outside of the application (aka, the real code) can often lead to unanticipated problems once the solution is deployed. ("it worked fine in development.") that's because this type of testing is usually done on small datasets and/or the access logic for the test isn't exactly the same as the real code (a few tables were omitted, the overall query was simplified, etc.). for us to properly test either solution, we'd have to implement it in our app and let the app generate the actual queries. and since this is calendar functionality, the gui requires changes also so we needed to decide which route to take beforehand.
i still don't have a good idea of what the performance impact of a UDF used instead of another table in a large query is. and that's really the question i wanted answered. most articles compare UDFs to builtins or discuss them used as sargs or make them so trivial that they can be easily replaced.
Are UDFs Harmful to SQL Server Performance
maybe later this month i'll try a couple of tests on our full dataset and decide for myself. but since we've got a deliverable and enough ammo to support the table alternative, the calendar table will be implemented. thanks to all who shared their thoughts on this.
January 2, 2008 at 9:08 am
I think it primarily depends on the type of testing. You're absolutely right in that testing on a small amount of data or with a simplified query isn't going to do it if you are working with a complicated query or millions of rows. You can, however, get a good idea by running some tests with, say, 500k rows (should be sufficiently large to give you a true idea) if you use Profiler to capture a sample query or two from your app and then modify them to use either the function or the additional table. I will say that in the end you have to write both alternatives (at least mostly) before you can begin testing, but at least you don't actually have to put them in your app.
All that being said my opinions are just that .. opinions and I certainly wouldn't say that anyone else's opinions aren't just as valid.
Kenneth Fisher
I was once offered a wizard's hat but it got in the way of my dunce cap.
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Link to my Blog Post --> www.SQLStudies.com
January 2, 2008 at 8:59 pm
There is a third option that lots of folks forget about... that is, a formula in the code instead of an extra join or a dip into a UDF. Sure, you have to type the formula correctly every time, but that's what "code libraries" are for. And not all UDFs are bad... a well-written UDF will have virtually no overhead. I forget what the UDF did when we did some testing a few years back, but I do remember laughing out loud when the UDF actually outperformed the discrete code embedded in a proc. As I'm sure you're aware, UDFs also have the advantage of streamlining some development efforts... but so do small, well-indexed "helper" tables... after the first call, they can easily live entirely in cache memory and provide very fast results in a join across multiple simultaneous sessions.
So, like Kenneth said, it's all in how you test and what the application of the method is.
--Jeff Moden
Change is inevitable... Change for the better is not.