June 8, 2008 at 12:36 am
I have a piece of code that runs continually. It assigns a category to new data: the data is a position, and the category is the nearest road (or place). So the table the category comes from is huge and growing, and determining the closest road is quite a calculation.
I have tried various ways of optimising this code and have reached a strange conclusion that the code is fastest if all the workings are left within the main procedure. If I move it into a sub-procedure it is a little slower. If I move it into a function it takes more than twice as long to run.
Does anyone know a good reference, or have experience of joining large tables where the join is not an exact match (I am linking on the closest match)?
In simple terms there are two tables: a reference table with 10 million rows and a new-data table with 100 to 10,000 rows. The idea is to match each of the new-data rows to a single (closest) reference row using a distance calculation.
If I do everything in the stored procedure the run time is 50 milliseconds per row. If I move the calculation into a function this becomes 150 ms, and when I move the join into a function it slows to 400 ms.
This makes no sense to me. I thought functions were supposed to be more efficient.
The machine: dual CPU and dual-core, 4 GB RAM. There is no problem there.
June 8, 2008 at 10:00 am
For better or worse, running a function like that will usually end up being very slow. In my experience, functions (T-SQL functions, that is) are NOT known for their speed.
Not knowing anything about this calculation, it's going to be hard to be specific on things. That being said - for complicated calculations, I've found that CLR functions are faster than T-SQL functions, although they can sometimes run into resource issues if garbage collection can't keep up, etc... If appropriate, try marking the functions as deterministic and/or precise, so that the optimizer knows it can reuse previous results for the same input.
The problem is that a function often forces SQL Server to touch each and every row one at a time, instead of using set-based retrieval and calculations, which ends up being a huge performance penalty. So even though a CLR function is more efficient than the T-SQL version, both start with such a performance penalty that they can almost never compensate for it.
As a result, you will probably find that building the calculation directly into the query, and possibly encapsulating the entire recordset in a stored proc, will give the fastest execution, assuming your indexing scheme is correct and helpful, etc...
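To put a rough shape on that - a sketch only, where the table and column names (dbo.Roads, dbo.NewData, Lat/Lon) are made up, not yours. A scalar T-SQL function gets invoked once per row comparison:

CREATE FUNCTION dbo.fnDistance
(
    @Lat1 FLOAT, @Lon1 FLOAT, @Lat2 FLOAT, @Lon2 FLOAT
)
RETURNS FLOAT
AS
BEGIN
    -- crude flat-earth distance, purely for the illustration
    RETURN SQRT(SQUARE(@Lat1 - @Lat2) + SQUARE(@Lon1 - @Lon2))
END

whereas writing the same math straight into the query keeps it inside the set-based plan (still a TOP 1 per new row, but without a function call per comparison):

SELECT n.NewDataID,
       (SELECT TOP 1 r.RoadID
        FROM dbo.Roads r
        ORDER BY SQUARE(n.Lat - r.Lat) + SQUARE(n.Lon - r.Lon)
       ) AS NearestRoadID
FROM dbo.NewData n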
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 9, 2008 at 3:20 am
Thanks for the confirmation. I reached the conclusion after trial and error that a long query with all elements within the SELECT statement was faster than a beautifully written one that referred complex calculations out to functions. I just have no experience of functions and optimisation to back up my 'best-performance' structure.
Apart from the benefit of a common code segment placed in a function, I now don't see any benefit in using them. I went so far as to test a few other stored procedures that used the complex calculation function - where I copied the exact code in the function back into the procedure - and in every case the procedure ran faster than when the code used the function.
If there are any articles out there that explain the actual workings of cases where a function is more efficient, please point me to them. I still like the concept of encapsulating repeated code elements in functions, but the performance penalty is a concern.
As far as using CLR goes, it will not work in this case because the calculation uses columns from two tables.
June 9, 2008 at 9:44 am
The place where I've found functions useful is where you have to join to them. It's not so easy joining to a proc.
Also, multi-statement functions use a table variable, which means they don't have statistics, which slows them down, but inline table-valued functions can use statistics in the same manner as views and procs (no table variable).
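For example, the structural difference looks roughly like this (dbo.Orders and its columns are just made up for the example):

-- Multi-statement TVF: rows get spooled into a table variable, so no statistics
CREATE FUNCTION dbo.fnOrdersMulti (@CustomerID INT)
RETURNS @Result TABLE (OrderID INT, OrderDate DATETIME)
AS
BEGIN
    INSERT INTO @Result (OrderID, OrderDate)
    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
    RETURN
END
GO
-- Inline TVF: a single SELECT the optimizer can expand like a view
CREATE FUNCTION dbo.fnOrdersInline (@CustomerID INT)
RETURNS TABLE
AS
RETURN (SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustomerID)
GO

And the inline one joins cleanly, e.g. FROM Customers c CROSS APPLY dbo.fnOrdersInline(c.CustomerID) o, which you can't do with a proc.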
Functions do take a performance hit, but it can be worth it for complex code that will be re-used in many places, if you can minimize the performance issues. I have a hierarchy function (inline, not multi-statement) that has very good performance (complex hierarchies in under 20 milliseconds). Placing the same code in each query that uses it makes it about 5-10% faster, but also makes for slower development, more complex troubleshooting, and so on. In this case, the cost is worth it to me. Especially as it only gets called when the nested sets hierarchy is out of synch (which isn't very often).
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 9, 2008 at 10:05 am
You might also want to take a look at Adam Machanic's book Expert SQL Server 2005: http://www.amazon.com/gp/product/159059729X/ref=cm_arms_pdp_dp
Especially chapter 9: "Spatial Data", which I think was written by Hugo Kornelis (who posts here occasionally).
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 9, 2008 at 12:37 pm
For my 2 cents.
I've found the fastest way to do something like selecting zip codes within a radius is to use simple BETWEENs on latitude and longitude to fill a temp table. Latitude and longitude can be indexed in your zip code table. You calculate the furthest latitudes and longitudes from the center point and use those to build the initial temp table.
That basically gives you a square around the center point. Then use the trigonometric distance calculation to trim the square into a circle, deleting the initially selected temp table rows that fall outside the radius.
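Roughly what I mean, as a sketch only - the table, column, and variable names here are made up, and the delta math is approximate:

DECLARE @CenterLat FLOAT, @CenterLon FLOAT, @RadiusMiles FLOAT
DECLARE @LatDelta FLOAT, @LonDelta FLOAT
SET @CenterLat = 40.0
SET @CenterLon = -75.0
SET @RadiusMiles = 25
-- roughly 69 miles per degree of latitude; longitude degrees shrink by COS(latitude)
SET @LatDelta = @RadiusMiles / 69.0
SET @LonDelta = @RadiusMiles / (69.0 * COS(RADIANS(@CenterLat)))

-- Step 1: the square - simple BETWEENs that can use indexes on Lat/Lon
SELECT ZipCode, Lat, Lon
INTO #Candidates
FROM dbo.ZipCodes
WHERE Lat BETWEEN @CenterLat - @LatDelta AND @CenterLat + @LatDelta
  AND Lon BETWEEN @CenterLon - @LonDelta AND @CenterLon + @LonDelta

-- Step 2: trim the square to a circle with the great-circle distance
-- (a production version should clamp the ACOS argument to 1 to guard against rounding)
DELETE FROM #Candidates
WHERE 3959 * ACOS(SIN(RADIANS(@CenterLat)) * SIN(RADIANS(Lat))
                + COS(RADIANS(@CenterLat)) * COS(RADIANS(Lat)) * COS(RADIANS(Lon - @CenterLon))
                 ) > @RadiusMiles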
This may not be applicable to your application.
As far as functions go, I've found that they can be very, very fast when used to string data together - it's about the only way to avoid a cursor. It's still RBAR, but it's faster RBAR.
Something like:
CREATE FUNCTION fnStringCustNotes
(
    @CustomerID INT
)
RETURNS VARCHAR(MAX)
AS
BEGIN
    DECLARE
        @Notes VARCHAR(MAX)
      , @Comma VARCHAR(1)

    SET @Notes = ''
    SET @Comma = ''

    -- Concatenate every note onto @Notes; @Comma is empty for the first row only
    SELECT @Notes = @Notes + @Comma + X.Notes, @Comma = ','
    FROM
        (SELECT Notes
         FROM CustomerNotes
         WHERE CustomerID = @CustomerID
        ) AS X

    RETURN @Notes
END
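And you'd call it something like this (assuming a Customers table with a CustomerID column):

SELECT c.CustomerID,
       dbo.fnStringCustNotes(c.CustomerID) AS AllNotes
FROM Customers c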
June 9, 2008 at 1:42 pm
Actually, when it comes to concatenating strings, you can do that with the COALESCE function pretty easily. Doesn't require any UDF or loop.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 10, 2008 at 10:45 am
I'd like to see how that's done exactly. This is for a one-to-many relationship between Customer and CustomerNotes? There could be zero or more notes per customer.
Todd Fifield
June 10, 2008 at 10:56 am
Todd - assuming that we're dealing with SQL2005, you actually have a few options. One is the "xml trick" - which looks something like this:
SELECT t1.SomeID,
       STUFF((SELECT ',' + t2.SomeCode
              FROM dbo.TestData t2
              WHERE t1.SomeID = t2.SomeID
              FOR XML PATH('')), 1, 1, ''
             )
FROM dbo.TestData t1
GROUP BY t1.SomeID
(stolen as is right out of Jeff's article on concatenation)
http://www.sqlservercentral.com/articles/Test+Data/61572/
The second is that you can actually build a user-defined aggregate in CLR to do the concatenation. Books Online sports the code for the concatenation aggregate right here:
http://msdn.microsoft.com/en-us/library/ms131056.aspx
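Once the assembly is built and registered, calling it looks something like this (assuming the aggregate from that example ends up named dbo.Concatenate - check the name in the BOL sample):

SELECT SomeID,
       dbo.Concatenate(SomeCode) AS AllCodes
FROM dbo.TestData
GROUP BY SomeID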
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 10, 2008 at 11:40 am
Matt,
Thanks. I'm going to try it. I'd forgotten about these features since I'm mostly dealing with databases with compatibility mode 80. I keep getting burned trying to use 2005 features so mostly I don't bother.
Todd Fifield
June 10, 2008 at 1:07 pm
I'll save you the pain - that won't work against compat=80....:D
Actually - neither will. Those are both "compatibility 90" options...
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 11, 2008 at 3:27 pm
Thanks Matt. I'll just have to create one off to the side to play with the new features.
Todd Fifield