(The complete usable code is in the very bottom of this blog post)
Some of you may have seen this “numbers table” technique in use before, but I thought I would share a few simpel examples to those of you that haven’t seen the light yet
Lets start with a classic problem: Converting a delimited string into a table.
In many old setups I have seen, it’s not uncommon to have the need for passing multiple values to a stored procedure, to look up data for multiple values in one go. Let me build my demo table, and show how such a stored procedure could look like:
CREATE TABLE MyFavoriteMovies ( MovieId INT IDENTITY(1,1) PRIMARY KEY, MovieTitle VARCHAR(100), NumberOfViews INT ) GO INSERT INTO MyFavoriteMovies (MovieTitle, NumberOfViews) VALUES ('Gone with the Wind', 23), ('Driving Miss Daisy', 9), ('Martha', 16), ('Fætrene på Torndal', 33), ('Casablanca', 8), ('Dumb & Dumber', 6), ('Wargames', 24), ('Geniiius, the movie', NULL) GO CREATE PROCEDURE FetchMovieTitles @MovieIds VARCHAR(1000) AS DECLARE @sqlcmd VARCHAR(MAX) = ' SELECT * FROM MyFavoriteMovies WHERE MovieId IN (' + @MovieIds + ')' EXEC (@sqlcmd) GO
I created a table to hold a list of my favorite movies, and then I have a stored procedure i can execute, that takes a comma separated list of MovieId’s as parameter. It’s not pretty, I know, but nevertheless it’s a template I have seen numerous times over the time.
So why is this not ideal?
The obvious reason is, that because the stored procedure generates the sql command dynamically, and then executes it without any check whatsoever, it’s open for sql injections. On top of that security issue, it also generates a new execution plan for every different parameter that the procedure is executed with. If my query was more complex in the procedure, the dynamic sql could easily becomre nasty to work with. Think about the single quote which serves both as the escape character but also surrounds strings. An example of this could look like this:
DECLARE @sqlcmd VARCHAR(MAX) = ' SELECT * FROM MyFavoriteMovies WHERE MovieId IN (' + @MovieIds + ') AND MovieTitle = ''Wargames'''
Ugly, right?
So how can we fix this you ask? Imaging we could convert the list of MovieId’s to a table, then the query could look something like this:
SELECT * FROM MyFavoriteMovies WHERE MovieId IN (SELECT MovieId FROM #MovieIds) AND MovieTitle = 'Wargames'
Now we got rid of the dynamic sql, and it’s much prettier to read. But we don’t have the #MovieIds table yet, so we need to find a way to convert our delimited string to a table.
Let’s start to build the solution up, by looking at this:
DECLARE @MovieIds VARCHAR(1000) = '1,3,5' SELECT SUBSTRING(@MovieIds, 1, len(@MovieIds)) SELECT SUBSTRING(@MovieIds, 2, len(@MovieIds)) SELECT SUBSTRING(@MovieIds, 3, len(@MovieIds)) SELECT SUBSTRING(@MovieIds, 4, len(@MovieIds)) SELECT SUBSTRING(@MovieIds, 5, len(@MovieIds))
Here I have five select statements, which performs a substring on the variable, but with an increasing offset going from 1 to 5. Notice that the length of the @MovieIds string is exactly 5 characters. Does this get my any closer to the solution? Can you spot the pattern? The first character in the first, third and fifth result is actually the values I’m looking for. But how could I identify these? What if we instead of selecting the rest of the string in each of the substring command, could stop at the next comma?
DECLARE @MovieIds VARCHAR(1000) = '1,3,5' SELECT SUBSTRING(@MovieIds, 1, CHARINDEX(',', @MovieIds+',', 1) - 1) SELECT SUBSTRING(@MovieIds, 2, CHARINDEX(',', @MovieIds+',', 2) - 2) SELECT SUBSTRING(@MovieIds, 3, CHARINDEX(',', @MovieIds+',', 3) - 3) SELECT SUBSTRING(@MovieIds, 4, CHARINDEX(',', @MovieIds+',', 4) - 4) SELECT SUBSTRING(@MovieIds, 5, CHARINDEX(',', @MovieIds+',', 5) - 5)
The length of the substring is now replaced by the CHARINDEX function. Once again notice the pattern with the increasing values from 1 to 5. The result now almost look like what I want. If I could filter out the second and fourth row, and merge the rest into a table – then I’m done.
Enter Numbers table
CREATE TABLE Numbers ( n INT PRIMARY KEY ) GO INSERT INTO Numbers (n) VALUES (1), (2), (3), (4), (5) GO SELECT n FROM Numbers GO
Do you see a pattern that looks like the substring queries above? Now we can actually use the n column from the Numbers table instead of the hardcoded values for the increasing numbers 1 to 5. That would look like this:
DECLARE @MovieIds VARCHAR(1000) = '1,3,5' SELECT SUBSTRING(@MovieIds, n, CHARINDEX(',', @MovieIds+',', n) - n) FROM Numbers
Now I select from the Numbers table, and the only coulumn I select, is the substring function from before – but now with column n instead of the hardcoded values 1 to 5. The output:
It’s getting warm… we are almost there.
To filter out the rows I don’t want, I need to find something I can use in a where clause. The easiest is to simply use the very same substring line, and the find the rows with a value <> ‘’. That would look like this:
DECLARE @MovieIds VARCHAR(1000) = '1,3,5' DECLARE @Delimiter VARCHAR(5) = ',' SELECT SUBSTRING(@MovieIds, n, CHARINDEX(@Delimiter, @MovieIds+@Delimiter, n) - n) as MyCol FROM Numbers WHERE SUBSTRING(@MovieIds, n, CHARINDEX(@Delimiter, @MovieIds+@Delimiter, n) - n) <> ''
This looks very much like what we need. If we put this in a function called dbo.SplitToTable() that takes the parameters @Input and @Delimiter, we can write our original procedure like this:
CREATE FUNCTION dbo.SplitToTable(@Input VARCHAR(1000), @Delimiter VARCHAR(5)) RETURNS TABLE AS RETURN ( SELECT SUBSTRING(@Input, n, CHARINDEX(@Delimiter, @Input+@Delimiter, n) - n) as val FROM Numbers WHERE SUBSTRING(@Input, n, CHARINDEX(@Delimiter, @Input+@Delimiter, n) - n) <> '' ) GO ALTER PROCEDURE FetchMovieTitles @MovieIds VARCHAR(1000) AS SELECT * FROM MyFavoriteMovies WHERE MovieId IN (SELECT SUBSTRING(val AS INT) FROM dbo.SplitToTable(@MovieIds, ',')) GO EXEC FetchMovieTitles @MovieIds = '1,3,5' GO
Now I have accomplished my goal to get rid of the dynamic sql to avoid security risks, and to be able to reuse cached execution plans.
There are a few corrections that need to be made to make this work for longer inputs. First of all we need to have more values in our numbers table. 1 to 5 only covers input string up to a length of 5. The where clause in the function can also be optimized a bit. I won’t go into the details, but the setup I often implement uses the code below:
CREATE TABLE Numbers ( n INT PRIMARY KEY ) --Fill data in Numbers table (this needs only to be done 1 time) ;WITH Pass0 as (select 1 as C union all select 1), --2 rows Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),--4 rows Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),--16 rows Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),--256 rows Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),--65.536 rows Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),--4.294.967.296 rows Tally as (select row_number() over(order by C) as Number from Pass5) INSERT Numbers (n) SELECT Number FROM Tally WHERE Number < 1000000 --Fill 1 million rows (or as many as you like) --This will fill approximately 13 MB GO CREATE FUNCTION [dbo].[SplitToTable] ( @Input varchar(8000), @Delimiter varchar(5) ) RETURNS TABLE AS RETURN ( SELECT SUBSTRING(@Input+@Delimiter, n, CHARINDEX(@Delimiter, @Input+@Delimiter, n) - n) AS Val FROM Numbers WHERE n <= LEN(@Input) AND SUBSTRING(@Delimiter + @Input, n, 1) = @Delimiter ) GO SELECT * FROM dbo.SplitToTable('hello|world|pipe|delimited|string', '|') GO
To use the Numbers table one needs to think a bit differently about things, than the normal way of solving problems. The way I described the solution in this blog post, pretty much covers how I often attach a new problem, where my gut tells me that the numbers table might come in handy. What if I need a table with all dates the next 30 days? Simple:
SELECT DATEADD(DD, n, '2012-04-01') FROM Numbers WHERE n <= 30
Sometimes I end up with pretty nifty solutions to crazy problems, that otherwise was really nasty to solve. This SplitToTable function is just one of the many examples of things to use the Numbers table for. If you would like to learn more about this, just google for auxiliary numbers table, and a lot of different solutions and ideas pop up.