When Microsoft first introduced CLR integration in SQL Server 2005, I was quite interested to see what wonderful things could be done that either: a) couldn't be done in regular T-SQL or b) would at least be more efficient than being done in T-SQL. As I started to read about Table-Valued Functions, I was excited to learn that the output from Table-Valued Functions (TVFs) could stream out to the calling process. There were examples in the MSDN documentation and I followed them and they worked. However, I started to notice that while the output itself might have streamed out of the data collected in the TVF, it seemed that you were still required to collect all of that data first in memory before returning it. This was a bit frustrating since it did not seem to truly "stream" in the way that I had expected. Sure CLR Stored Procedures can stream their results back but I wanted to create TVFs so that I could more easily interact with the output by doing WHERE conditions, GROUP BYs, ORDER BYs, etc. which are not possible with Stored Procedures unless you trap the output into a Table Variable or Temp Table (which I did not want to do).
After working with TVFs for some time, I learned more about the IEnumerable interface and what else could be done with it. I was elated to learn that the new TVF structure really could fully stream data out to the calling T-SQL query as Microsoft originally claimed. Of course, I was a little annoyed at the lack of easy to find examples of how to do this. If I search on "streaming clr tvf" then I can find a few examples, but they mostly show the full IEnumerator syntax requiring the MoveNext(), Current(), and Reset() methods. This method might be a bit confusing and most people will likely not start out using the word "streaming" when doing a search for an example. Simply searching on "clr tvf example" shows mostly articles showing the method which collects all data before releasing it. So, I decided it would be helpful for others who are starting out making their own CLR TVFs to have an actual example that will hopefully be easy to find in a search and will show the quick and easy method.
The Original / Standard Method
The method that I first found, and that I suspect a lot of others are also using, centers around creating a Generic List of the intended result row, adding a result row to that list per iteration of a loop, and then returning the entire Generic List which is the full result set.
private struct ReturnValues
{
public int Value;
} private static void FillValues(object obj, out SqlInt32 TheValue)
{
ReturnValues ReturnVals = (ReturnValues)obj;
TheValue = ReturnVals.Value;
}
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, IsPrecise = true, SystemDataAccess = SystemDataAccessKind.None, FillRowMethodName = "FillValues", TableDefinition = "IntValue INT")]
public static IEnumerable TVF_Standard(SqlInt32 MaxValue)
{
if (MaxValue.IsNull) {
return new List<ReturnValues>(); // return no rows } List<ReturnValues> AllVals = new List<ReturnValues>(); // container for the Result Set ReturnValues Vals = new ReturnValues(); // each row for (int index = 1; index <= MaxValue.Value; index++)
{
Vals.Value = index;
AllVals.Add(Vals); // add row to the Result Set container
} return AllVals; // return all data at once
}
The Fully Streaming Method
Before I show the quick and easy method I need to mention that there are actually two ways of accomplishing this full streaming: implementing IEnumerator or using "yield". The more involved method is providing a formalized framework to interact with the IEnumerator interface which involves creation of MoveNext(), Current(), and Reset() methods. This method is documented in a couple of places and hence I will not re-document that approach here since I want to focus on the less documented approach.
Now we can get to the fun stuff. All that is really needed to send a result row back per each iteration is using the "yield" command. Most of the time you will simply "yield return" the single-row data structure. Sometimes you might want to exit without sending any data back in which case you can use the "yield break" command.
private struct ReturnValues
{
public int Value;
} private static void FillValues(object obj, out SqlInt32 TheValue)
{
ReturnValues ReturnVals = (ReturnValues)obj;
TheValue = ReturnVals.Value;
}
[Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, IsPrecise = true, SystemDataAccess = SystemDataAccessKind.None, FillRowMethodName = "FillValues", TableDefinition = "IntValue INT")]
public static IEnumerable TVF_Streaming(SqlInt32 MaxValue)
{
if (MaxValue.IsNull) {
yield break; // return no rows } // we do not need the Generic List of <ReturnValues> ReturnValues Vals = new ReturnValues(); // each row for (int index = 1; index <= MaxValue.Value; index++)
{
Vals.Value = index;
yield return Vals; // return row per each itteration
} // we do not need to return everything at once
}
I put comments in the code to direct your attention to the differences. The main difference is that the "Standard" function has a Generic List that it builds up and returns at the end while the "Streaming" function does not have the Generic List and instead returns each row as it is created via the "yield return" command.
In order to test this easily, I have attached a SQL script that will create the Assembly which contains both of the above CLR Table-Valued Functions as well as their T-SQL wrapper functions. I have also attached a ZIP file containing the Visual Studio 2008 Project if you want to compile this yourself or see the full C# code. Please see the "Resources" section at the bottom of the article for these files.
Testing the first one (TVF_Standard) with a large value of 20,000,000 shows (at least on my machine 😉 that it runs out of memory with the following error:
SELECT MAX(IntValue) FROM Test.dbo.TVF_Standard(20000000)
Msg 6532, Level 16, State 49, Line 1
.NET Framework execution was aborted by escalation policy because of out of memory.
System.Threading.ThreadAbortException: Thread was being aborted.
System.Threading.ThreadAbortException:
at System.Collections.Generic.List`1.set_Capacity(Int32 value)
at System.Collections.Generic.List`1.EnsureCapacity(Int32 min)
at TestFunctions.TVF_Standard(SqlInt32 MaxValue)
Testing the second one (TVF_Streaming) with a value of 40,000,000 -- a value twice as great at the value that didn't work with the TVF_Standard function -- works with no problems.
SELECT MAX(IntValue) FROM Test.dbo.TVF_Streaming(40000000)
-- returns: 40000000
And I did include the "yield break" syntax in the C# function example so the queries below show each of the two functions working with a NULL input:
SELECT MAX(IntValue) FROM Test.dbo.TVF_Standard(NULL) SELECT MAX(IntValue) FROM Test.dbo.TVF_Streaming(NULL)
Both of those queries return NULL.
Sadly, Not Always
As a final note to recommending "yield return", it should be mentioned that it does not work in all cases. Sometimes, when using the SqlChars and SqlBytes datatypes for input parameters you might run into the following error:
Msg 6260, Level 16, State 1, Line 1
An error occurred while getting new row from user defined Table Valued Function :
System.InvalidOperationException: Accessing members of an object from a wrong thread.
System.InvalidOperationException:
at System.Data.SqlServer.Internal.ClrLevelContext.XvarProxyRead(CClrXvarProxy* pXvarProxy, UInt64 iPosition, Byte* pbBuffer, UInt32 cbCount)
at System.Data.SqlServer.Internal.ClrLevelContext.System.Data.SqlServer.Internal.IXvarProxyAccessor.XvarProxyRead(CClrXvarProxy* , UInt64 , Byte* , UInt32 )
at System.Data.SqlServer.Internal.StreamOnBlobHandle.Read(Byte* pbBuffer, UInt64 offset, UInt32 count)
at System.Data.SqlServer.Internal.XvarWlobStream.Read(Char[] buffer, Int32 offset, Int32 count)
at System.Data.SqlTypes.SqlChars.get_Value()
This is a bug that was introduced in SQL Server 2008 and according to a Microsoft blog, is fixed in the next version:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=382235
Copyright © 2009 Solomon Rutzky