The invitation this month is from Erik Darling, and it’s a neat one. I like this thought, asking us to find code that impressed us or made us feel something. I tend to look at this as positive, but it could be negative.
In any case, I was on vacation from 1 Jul to 9 Jul, out of touch with the world and unwired. So I’m doing this as a quick post by the seat of my pants. I might have to come back and make a second post in the future.
Changing the way I think about T-SQL
I’m a decent T-SQL developer. Not amazing, or great even, but I am effective. I’ve learned a lot over the years and I’ve been able to get things done for my employers. I often look at other’s code and I try to improve how I view problem solving. I’ve learned a lot from Itzik and others over the years, though I have to admit that many of us solutions go over my head. I’m just not in those spaces where I need complex coded solutions very often.
That being said, years ago I got an article from Jeff Moden on the tally table. I hadn’t used this, and was fascinated. I know Itzik had written about numbers tables early on, but it hadn’t caught my attention. However, in a follow-up, Jeff wrote about a splitter function, which would use the tally table to split strings efficiently. This is the function (credit to Jeff in his article):
CREATE FUNCTION [dbo].[DelimitedSplit8K] --===== Define I/O parameters (@pString VARCHAR(8000), @pDelimiter CHAR(1)) --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE! RETURNS TABLE WITH SCHEMABINDING AS RETURN --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000... -- enough to cover VARCHAR(8000) WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter) SELECT 1 UNION ALL SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter ), cteLen(N1,L1) AS(--==== Return start and length (for use in substring) SELECT s.N1, ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000) FROM cteStart s ) --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found. SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1), Item = SUBSTRING(@pString, l.N1, l.L1) FROM cteLen l ;
Now, note this is not the original code, but updated and improved code. I love that Jeff has maintained this.
What I found great is how Jeff approached the problem. This is limited to 8k strings, but that’s for performance reasons. One could easily enhance this to be larger if needed. I also like the tally table is quickly generated from simple code that gives us 10,000 numbers.
I also found the simplicity of the substring and the charindex to be something I think I not only understand, but could have written.
Could have. Certainly didn’t and might not have. This is great code that’s been helpful to me over the years in places where I wanted to break up code. I’ve used this in a number of demos for clients and I’ve referred people to this over the years as well.