A couple of days ago I read Steve Jones’ blog on creating a user defined table type. This reminded me that I had recently been reading about TVPs (Table Valued Parameters), which require user defined table types. Steve wondered how useful the concept of a user defined table type really is, and that started me thinking about the different ways TVPs could be used. Specifically, replacing the comma delimited lists that get passed into stored procedures for use in an IN clause.
In this particular case I’m going use AdventureWorks2008 for my examples.
Here is my made up situation: A developer has come to me asking for help. He has a table called Person.BusinessEntity and needs to pass in a list of IDs and get back a record set with the matching rows from the table. The first thing he thought of was to use an IN clause.
CREATE PROCEDURE usp_GetBusinessEntitys (@EntityList varchar(max)) AS
SELECT * FROM Person.BusinessEntity WHERE BusinessEntityID IN (@EntityList)
GO
It would be great if this worked but unfortunately it doesn’t and he gets back:
Conversion failed when converting the varchar value 'comma delimited list of values' to data type int.
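The reason is that @EntityList is a single scalar value, not a list. The IN clause compares the whole string against the int column, and since int has higher data type precedence the string gets implicitly cast to int and the cast fails. In effect the procedure is running something like this (my own illustration, not code from the procedure):

```sql
-- The entire parameter is treated as ONE value, so this is effectively
-- WHERE BusinessEntityID = CAST('1,5,67' AS int), which blows up.
SELECT *
FROM Person.BusinessEntity
WHERE BusinessEntityID = '1,5,67'
```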
You wouldn’t believe how many times I’ve been asked over the years for a good solution to this particular situation. (Well, since most of the people reading this are going to either be developers or DBAs you probably do.) The best solution I’ve been able to come up with over the years is to use dynamic SQL as I do here:
ALTER PROCEDURE usp_GetBusinessEntitys (@EntityList varchar(max)) AS
DECLARE @SQL nvarchar(max)
SET @SQL = 'SELECT * FROM Person.BusinessEntity WHERE BusinessEntityID IN ('+@EntityList+')'
EXEC sp_executesql @SQL
GO
Now to get back on topic I’m going to try to do the same thing using a TVP.
First I create the user defined table type.
CREATE TYPE BusinessEntityIds AS TABLE (BusinessEntityId INT PRIMARY KEY);
GO
I wasn’t certain I would be able to create one with a primary key so I tried it and it worked. Honestly I find that one fact really cool. But to move on ….
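As a quick sanity check (my own test, not part of the original comparison), the primary key behaves just as it would on a real table — a duplicate insert into a variable of the type fails:

```sql
DECLARE @t BusinessEntityIds;
INSERT INTO @t VALUES (1);
INSERT INTO @t VALUES (1); -- fails: Violation of PRIMARY KEY constraint
```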
Now I create the stored procedure:
CREATE PROCEDURE usp_GetBusinessEntitys_TVP (@EntityList BusinessEntityIds READONLY) AS
SELECT Person.BusinessEntity.*
FROM Person.BusinessEntity
JOIN @EntityList EntityList
    ON BusinessEntity.BusinessEntityID = EntityList.BusinessEntityId
GO
Pretty cool, huh?
I could also have done it this way:
CREATE PROCEDURE usp_GetBusinessEntitys_TVP2 (@EntityList BusinessEntityIds READONLY) AS
SELECT *
FROM Person.BusinessEntity
WHERE BusinessEntityID IN (SELECT BusinessEntityId FROM @EntityList)
GO
Now I’ll try all 3 versions out and make sure that everything is working correctly and that I get the same results from each sp.
DECLARE @EntityList varchar(max)
DECLARE @EntityList_TVP BusinessEntityIds

SET @EntityList = '1,5,67,100,301,543,997,1111,1245,3356,4295,6546,8342,8567,9000,9265,10045,10321,11456,12545'

INSERT INTO @EntityList_TVP VALUES
    (1), (5), (67), (100), (301),
    (543), (997), (1111), (1245), (3356),
    (4295), (6546), (8342), (8567), (9000),
    (9265), (10045), (10321), (11456), (12545)

-- Run stored procedures
EXEC usp_GetBusinessEntitys @EntityList
EXEC usp_GetBusinessEntitys_TVP @EntityList_TVP
EXEC usp_GetBusinessEntitys_TVP2 @EntityList_TVP
I’m not going to bother showing the results. You can either try it yourself or take my word for it that everything works just fine.
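If you do want to verify it yourself rather than eyeball three result sets, one way (my own check, with the column list assumed from AdventureWorks2008's Person.BusinessEntity) is to capture the output of two of the procedures into temp tables and compare them with EXCEPT:

```sql
-- Columns assumed: BusinessEntityID int, rowguid uniqueidentifier, ModifiedDate datetime
CREATE TABLE #dyn (BusinessEntityID int, rowguid uniqueidentifier, ModifiedDate datetime);
CREATE TABLE #tvp (BusinessEntityID int, rowguid uniqueidentifier, ModifiedDate datetime);

DECLARE @EntityList_TVP BusinessEntityIds;
INSERT INTO @EntityList_TVP VALUES (1), (5), (67);

INSERT INTO #dyn EXEC usp_GetBusinessEntitys '1,5,67';
INSERT INTO #tvp EXEC usp_GetBusinessEntitys_TVP @EntityList_TVP;

-- Both queries return no rows if the result sets are identical
SELECT * FROM #dyn EXCEPT SELECT * FROM #tvp;
SELECT * FROM #tvp EXCEPT SELECT * FROM #dyn;

DROP TABLE #dyn;
DROP TABLE #tvp;
```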
Next I’m going to do some comparisons. I used the same script only with SET STATISTICS IO ON and SET STATISTICS TIME ON. I also turned on the actual execution plan for each.
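For reference, those statistics switches are just prepended to the test script:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- ... same test script as above ...
```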
Running the same script again I get some rather interesting results. First the part where I load the table variable was 7% of the total plan. Seemed like rather a lot to me but we can go with it. Next comes the stored procedure with dynamic SQL. It only took up a whopping 2%! The remaining 2 stored procedures (the TVP versions) took up an even 45% each. Then I looked at the IO and TIME statistics.
usp_GetBusinessEntitys
Table 'BusinessEntity'. Scan count 20, logical reads 40, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 279 ms.

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 280 ms.
usp_GetBusinessEntitys_TVP
Table 'BusinessEntity'. Scan count 0, logical reads 40, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#0F975522'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 0 ms, elapsed time = 396 ms.

SQL Server Execution Times:
   CPU time = 15 ms, elapsed time = 397 ms.
usp_GetBusinessEntitys_TVP2
Table 'BusinessEntity'. Scan count 0, logical reads 40, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#15502E78'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 16 ms, elapsed time = 423 ms.

SQL Server Execution Times:
   CPU time = 16 ms, elapsed time = 429 ms.
The reads are pretty much even across the board (ignoring the extra couple from the table variable). On time, the two TVP versions are almost even with each other, and the dynamic SQL version is clearly faster.
So far the dynamic SQL version is clearly the winner. But that is with only 20 values in the list. Will it scale? Let’s try it with the full table, some 20,000 rows.
DECLARE @EntityList varchar(max)
DECLARE @EntityList_TVP BusinessEntityIds

SELECT @EntityList = STUFF((SELECT ','+CAST(BusinessEntityID AS varchar)
                            FROM Person.BusinessEntity
                            ORDER BY BusinessEntityId
                            FOR XML PATH(''),TYPE).value('.','varchar(max)'),1,1,'')

INSERT INTO @EntityList_TVP
SELECT BusinessEntityId
FROM Person.BusinessEntity

-- Run stored procedures
EXEC usp_GetBusinessEntitys @EntityList
EXEC usp_GetBusinessEntitys_TVP @EntityList_TVP
EXEC usp_GetBusinessEntitys_TVP2 @EntityList_TVP
Again first the execution plans. This time we get a vastly different skew. The setup (loading the parameters) was 74% and the remaining 26% was 22% to the dynamic SQL version and an even split of 2% each on the TVP versions. The big difference was the plan for the dynamic SQL SP. The plans for the TVP versions stayed consistent.
Now the stats:
usp_GetBusinessEntitys
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'BusinessEntity'. Scan count 1, logical reads 98, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 891 ms, elapsed time = 2441 ms.

SQL Server Execution Times:
   CPU time = 907 ms, elapsed time = 2447 ms.

SQL Server Execution Times:
   CPU time = 907 ms, elapsed time = 2570 ms.
usp_GetBusinessEntitys_TVP
Table 'BusinessEntity'. Scan count 0, logical reads 41554, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4BAC3F29'. Scan count 1, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 172 ms, elapsed time = 38086 ms.

SQL Server Execution Times:
   CPU time = 172 ms, elapsed time = 38088 ms.
usp_GetBusinessEntitys_TVP2
Table 'BusinessEntity'. Scan count 0, logical reads 41554, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4BAC3F29'. Scan count 1, logical reads 36, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:
   CPU time = 250 ms, elapsed time = 2150 ms.

SQL Server Execution Times:
   CPU time = 250 ms, elapsed time = 2151 ms.
On reads, the dynamic SQL version wins with 98 logical reads against 41554 for each of the TVP SPs (on the main table only; reads on the TVPs were insignificant by comparison). On execution time, the TVP version that uses IN is somewhat faster than the dynamic SQL version, at 2.1 seconds to 2.5 seconds. The TVP version that uses a JOIN takes significantly longer, at 38 seconds.
The end result? I’ll probably continue to use dynamic SQL. Now that varchar(max) variables and parameters exist, there is no longer a problem of running out of space with large numbers of values. On IO, the dynamic SQL version was clearly the winner once the numbers were scaled up. The TVP IN version obviously scaled better in terms of speed, but it still wasn’t enough faster than the dynamic SQL version to be worth it unless you are expecting huge numbers of values in the list, and it’s considerably slower with small numbers of values.