This article doesn't introduce anything new with respect to development instead but highlights the effects of optimizing code for "mass executions" as is likely typical with a SQL CLR deployment.
I'm just now getting to a point where I can start playing with the SQL CLR capabilities of SQL Server 2005 / 2008 / 2008 R2. I started out with what I determined would be the best possible scenario we (my company) stand to gain from the availability of SQL CLR - a distance calculation user-defined function. The T-SQL implementation is not overly complicated or messy, but given it is purely a computational scalar function and based on what I've read online regarding when to use the SQL CLR, I saw this as a good candidate.
My first implementation of the Distance user-defined SQL CLR function looked similar to this:
<SqlFunction()> _ Public Shared Function Distance( _ ByVal Latitude1 As SqlDecimal, _ ByVal Longitude1 As SqlDecimal, _ ByVal Latitude2 As SqlDecimal, _ ByVal Longitude2 As SqlDecimal) As SqlDecimal Dim Latitude1Radian As SqlDecimal = Latitude1 * Math.PI / 180.0 Dim Longitude1Radian As SqlDecimal = Longitude1 * Math.PI / 180.0 Dim Latitude2Radian As SqlDecimal = Latitude2 * Math.PI / 180.0 Dim Longitude2Radian As SqlDecimal = Longitude2 * Math.PI / 180.0 Dim LongSpanRadian As SqlDecimal = Longitude2Radian - Longitude1Radian Dim LatSpanRadian As SqlDecimal = Latitude2Radian - Latitude1Radian 'intermediate result a Dim a As SqlDecimal = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2) 'intermediate result c (great circle distance in radians) Dim c As SqlDecimal = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a)) Return 3956.0 * c End Function
How did it perform? Not too bad. I calculated (in a very unofficial fashion) about an 11% reduction in time to execute a query compared to the pure T-SQL implementation. But based on reports I've seen regarding the benefits of SQL CLR, I must say I was disappointed in such a marginal improvement.
As such, I set about trying some different methods to attempt to optimize the function.
Notice the prevalence of SqlDecimal data types. I used them (thinking I had to) based on examples I've found on the Internet. Then I saw an example that used Decimal (a native .net data type) instead and I thought I'd try that to see if there were any differences. I also thought I could eliminate redundant operations by creating some public constants.
The resulting block of code (shown below) was my next iteration of the Distance user-defined function using SQL CLR.
Imports System.Data.SqlTypes Imports Microsoft.SqlServer.Server Partial Public Class UserDefinedFunctions Public Const EarthRadiusInMiles As Decimal = 3956.0 Public Const RadianConversionConst As Decimal = Math.PI / 180.0 <SqlFunction()> _ Public Shared Function Distance( _ ByVal Latitude1 As Decimal, _ ByVal Longitude1 As Decimal, _ ByVal Latitude2 As Decimal, _ ByVal Longitude2 As Decimal) As Decimal Dim Latitude1Radian As Decimal = Latitude1 * RadianConversionConst Dim Longitude1Radian As Decimal = Longitude1 * RadianConversionConst Dim Latitude2Radian As Decimal = Latitude2 * RadianConversionConst Dim Longitude2Radian As Decimal = Longitude2 * RadianConversionConst Dim LongSpanRadian As Decimal = Longitude2Radian - Longitude1Radian Dim LatSpanRadian As Decimal = Latitude2Radian - Latitude1Radian 'intermediate result a Dim a As Decimal = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2) 'intermediate result c (great circle distance in radians) Dim c As Decimal = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a)) Return EarthRadiusInMiles * c End Function End Class
The result was an (unofficial) 60% reduction in execution time! And it makes sense. The SqlDecimal would have to be converted to the associated native data type when passed into the various methods (Math.Cos, Math.Pow, Math.Sin, etc). That's a lot of conversions and added overhead for no substantial gain.
I then decided to write a simple test template and get some more accurate results of these differences and how performance is impacted. I created the four following methods within my SQL CLR library file:
- DistanceSqlTypesNoGlobalConsts
- DistanceSqlTypesGlobalConsts
- DistanceNoGlobalConsts
- DistanceGlobalConsts
I then created 4 separate 'stub' user-defined functions that called the associated assembly method and executed each one 30 times using a table of 100,000 latitude/longitude values and the following loop (showing the execution of one of the test functions - DistanceGlobalConsts).
DECLARE @counter int, @StartTime datetime, @EndTime datetime, @ElapsedMilliseconds int DECLARE @ResultsTable table (iteration int not null, elapsedtime int not null) SET @counter = 30 WHILE @counter > 0 BEGIN -- Clear cached data CHECKPOINT; DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; SELECT @StartTime = GETDATE(); SELECT dbo.DistanceGlobalConsts(40.82000, -73.05000, Latitude, Longitude) FROM TestTable SELECT @EndTime = GETDATE(); SELECT @ElapsedMilliseconds = DATEDIFF(ms, @StartTime, @EndTime); INSERT INTO @ResultsTable (iteration, elapsedtime) VALUES (@counter, @ElapsedMilliseconds) SET @counter = @counter - 1; END SELECT * FROM @ResultsTable
The full results are attached for your review but the condensed results are presented in the table below.
Method | Average Time (milliseconds) |
---|---|
DistanceSqlTypesNoGlobalConsts | 1821.766667 |
DistanceSqlTypesGlobalConsts | 1283.966667 |
DistanceNoGlobalConsts | 756.866667 |
DistanceGlobalConsts * | 734.433333 |
T-SQL | 1403.933333 |
* Note: I ran the DistanceGlobalConsts test case first and the first iteration of the 30 iteration test was about 200 milliseconds longer than the other iterations. I did not notice the same discrepancy on my other test cases so I re-processed the DistanceGlobalConsts and saw the first iteration was consistent with the other 29 iterations. I suspect the difference may have been a result of loading the assembly since it was the initial call to the assembly since it was created. Since all methods were in the same assembly, I think it is more consistent to use the re-processed results for DistanceGlobalConsts. Both tests are included in the full results attachment.
I like to think the hardware is immaterial for the purposes of this test (all tests were run on the same hardware in the same configuration/state); however, if you are interested in me posting specifics about the hardware used in this test, please feel free to request the details you are interested in. Also, I haven't considered any other performance metrics (disk io or memory consumption, in particular) so those should be considered as well.
There may be more that can be done to optimize this particular user-defined function but the general lack of information outlining the significance of using native .net data types versus the sqltypes within the SQL CLR assembly prompted me to share my discovery as I embark on playing with the SQL CLR.