October 3, 2016 at 9:09 am
I know about DelimitedSplit8K, however what I want is to split a long text (varchar(max)) field on a delimiter at a given length (2000).
For Example:
Say I have a value:
DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';
if I want a max of five characters then the output would be:
AB,CD
EF,GH
IJ
if I want a max of ten characters then the output would be:
AB,CD,EF
GH,IJ
So if this makes sense, how would I go about programming this? My web search skills are dismal so I did not find any direction.
Thanks
DJJ
October 3, 2016 at 9:47 am
The simple answer would be to use something like this:
DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';
WITH
E(n) AS(
SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)
),
E2(n) AS(
SELECT a.n FROM E a, E b
),
E4(n) AS(
SELECT a.n FROM E2 a, E2 b
),
cteTally(n) AS(
SELECT TOP( CAST(CEILING(LEN(@myv) / 5.) AS int))
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 n
FROM E4
)
SELECT SUBSTRING( @myv, (5 * n) + 1, 5)
FROM cteTally;
Of course, that would be to simple to be what you're looking for.
So I opened the briefcase and found this thread that might help you:
http://www.sqlservercentral.com/Forums/Topic1709361-149-1.aspx
October 3, 2016 at 10:02 am
Thanks, Luis!
I will study both your example and the link.
October 3, 2016 at 12:46 pm
We have a script for exactly this kind of thing; have a look at nsplit2B which you can find here[/url].
Here's a couple examples:
DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';
SELECT * FROM dbo.nsplit2B(@myv,5,1);
SELECT * FROM dbo.nsplit2B(@myv,8,1);
Results:
ItemNumber Token
-------------------- ----------
1 AB,CD
2 EF,GH
3 IJ
ItemNumber Token
-------------------- ----------
1 AB,CD,EF
2 GH,IJ
-- Itzik Ben-Gan 2001
October 3, 2016 at 12:58 pm
Hi Alan, you're still missing the GetNumsAB definition. I'd be interested on checking out how everything works. 🙂
October 3, 2016 at 1:05 pm
Alan.B (10/3/2016)
We have a script for exactly this kind of thing; have a look at nsplit2B which you can find here[/url].Here's a couple examples:
DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';
SELECT * FROM dbo.nsplit2B(@myv,5,1);
SELECT * FROM dbo.nsplit2B(@myv,8,1);
Results:
ItemNumber Token
-------------------- ----------
1 AB,CD
2 EF,GH
3 IJ
ItemNumber Token
-------------------- ----------
1 AB,CD,EF
2 GH,IJ
This looks like it expects all of the tokens to be the same length. While this is true of the sample data, it wasn't clear that this would be true of the actual data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 3, 2016 at 1:25 pm
Luis Cazares (10/3/2016)
Hi Alan, you're still missing the GetNumsAB definition. I'd be interested on checking out how everything works. 🙂
getnumsAB
CREATE FUNCTION dbo.GetNumsAB
(
@low bigint,
@high bigint,
@gap bigint,
@row1 bit
)
/****************************************************************************************
Purpose:
Creates up to 100,544,625 sequential numbers beginning with @low and ending with @high.
Used to replace iterative methods such as loops, cursors and recursive CTEs to solve SQL
problems. Based on Itzik Ben-Gan's getnums function with some tweeks and enhancements
and added functionality. The logic for getting rn to begin at 0 or 1 is based comes from
Jeff Moden's fnTally function.
Compatibility:
SQL Server 2008+ and Azure SQL Database
Syntax:
SELECT rn, n1, n2 FROM dbo.getnumsAB(@low,@high,@gap,@row1);
Parameters:
@low = a bigint that represents the lowest value for n1.
@high = a bigint that represents the highest value for n1.
@gap = a bigint that represents how much n1 and n2 will increase each row; @gap also
represents the difference between n1 and n2.
@row1 = a bit that represents the first value of rn. When @row = 0 then rn begins
at 0, when @row = 1 then rn will begin at 1.
Return Types:
Inline Table Valued Function returns:
rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can
start at 0 or 1 which is dictated by @row1.
n1 = bigint; a sequential number starting at the value of @low and incrimentingby the
value of @gap until it is less than or equal to the value of @high.
n2 = bigint; a sequential number starting at the value of @low+@gap and incrimenting
by the value of @gap.
Developer Notes:
1. The lowest and highest possible numbers returned are whatever is allowable by a
bigint. The function, however, returns no more than 100,544,625 rows (465^3).
2. @gap does not affect rn, rn will begin at @row1 and increase by 1 until the last row
unless its used in a query where a filter is applied to rn.
3. @gap must be greater than 0 or the function will not return any rows.
4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of
rows returned minus 1
5. If you only need is a sequential set beginning at 0 or 1 then, for best performance
use the RN column. Use N1 and/or N2 when you need to begin your sequence at any
number other than 0 or 1 or if you need a gap between your sequence of numbers.
6. Although @gap is a bigint it must be a positive integer or the function will
not return any rows.
7. @high must be equal or higher than @low or the function won't return any rows.
8. There is no performance penalty for sorting by rn ASC; there is a large performance
penalty for sorting in descending order. To sort in descending order, first make sure
that @gap = 1. Next create an alias column (called "n_desc" in the example below) and
use the formula: n_desc = ABS(rn-(@high+1)). The example below will return the numbers
1 to 10 in descending order:
DECLARE @low int = 1, @high int = 10, @gap int = 1, @row1 bit = 1;
SELECT n_desc = ABS(rn-(@high+(@row1)))
FROM dbo.getnumsAB(@low,@high,1,@ro1)
ORDER BY rn;
Examples:
--===== 1. Using RN (rownumber)
-- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
SELECT RN FROM dbo.getnumsAB(1,5,1,1);
-- (1.2) The best way to get the numbers 0,1,2...@high-1 (e.g. 0 to 5):
SELECT RN FROM dbo.getnumsAB(0,5,1,0);
--===== 2. Using N1
-- (2.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
SELECT N1 FROM dbo.getnumsAB(-3,3,1,1);
-- (2.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
SELECT RN, N1 FROM dbo.getnumsAB(-3,3,1,1);
-- (2.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
SELECT RN, N1 FROM dbo.getnumsAB(-3,3,1,0);
--===== 3. Using N2 and @gap
-- (3.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
SELECT N1 FROM dbo.getnumsAB(0,100,10,1);
-- (3.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
-- For example, to get (0,10),(10,20),(20,30).... (90,100):
SELECT N1, N2 FROM dbo.getnumsAB(0,90,10,1);
-- (3.3) Remember that a rownumber is included and it can begin at 0 or 1:
SELECT RN, N1, N2 FROM dbo.getnumsAB(0,90,10,1);
--===== (4) A real life example using RN, N1 and N2:
-- Beginning with @StartDate, to generate ranges of weeks that occur between
-- @startDate & @EndDate:
DECLARE @StartDate datetime = '1/1/2015', @EndDate datetime = '2/28/2015';
SELECT
WeekNbr = 'Week #'+CAST(RN AS varchar(2)),
WeekStart = CONVERT(DATE, DATEADD(DAY,N1,@StartDate)),
WeekEnd = CONVERT(DATE, DATEADD(DAY,N2-1,@StartDate))
FROM dbo.getnumsAB(0,datediff(DAY,@StartDate,@EndDate),7,1);
---------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20140518 - Initial Development - Alan Burstein
Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section
- Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
L1(N) AS
(
SELECT 1
FROM (VALUES
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),
(NULL),(NULL),(NULL)) t(N) -- 465 values
),
iTally AS
(
SELECT rn = 0 WHERE @row1 = 0
UNION ALL
SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM L1 a CROSS APPLY L1 b CROSS APPLY L1 c
) --Up to 100,544,625 numbers (select 465*465*465 = 100,544,625)
SELECT TOP (ABS((@high-@low)/ISNULL(NULLIF(@gap,0),1)+1))
rn,
n1 = ((rn-( @row1 ))*@gap+@low),
n2 = ((rn-(@row1-1))*@gap+@low)
FROM iTally
WHERE @high >= @low and @gap > 0
ORDER BY rn;
dbo.NSplit2B
CREATE FUNCTION dbo.NSplit2B
(
@String varchar(max),
@TokenLen bigint,
@DelimLen bigint
)
/****************************************************************************************
Purpose:
Identical to NSplit8K but handles an input string of data type varchar(max). NSplit2B
splits an input string (@String) into @TokenLen sized tokens with @DelimLen sized gaps
between each token. Intended to be a better performing alternative to the traditional
"splitter" for cases where the size of the token (AKA "item") is always the same.
Compatibility:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
Syntax:
SELECT ItemNumber, Token
FROM dbo.NSplit2B(@string, @length, @delimiterLength)
Parameters:
@String = varchar(max); the input string to "split" into tokens
@TokenLen = bigint; the size of the output token
@DelimLen = bigint; the size of the delimiter. Can be set to 0 for when there's
no delimiter or set to negetive if you desire overlap
Returns:
ItemNumber = bigint; represents the order that the token appears in the input string
Token = varchar(max); the @lengh-sized token returned by the function.
Developer notes:
1. Requires GetNumsAB: http://www.sqlservercentral.com/scripts/Set+Based/139370/
Alternatively you can incorporate your own tally table logic at your own risk.
2. If the last token is shorter then @TokenLen then it will be truncated.
3. Tends to perform better with a serial execution plan; it may be adventageous to test
with OPTION (MAXDOP 1) and compare that performance with a parallel plan. To force a
parallel execution plan see make_parallel by Adam Machanic:
sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx
4. There are no safeguards built into the function to make sure the tokens are all the
same size. The only safeguard is to test and know your data.
5. If any of the parameters are NULL the function will return a a single NULL value.
6. Will perform slower than NSplit8K due to the fact that it's handling varchar(max)
strings. If you're sure all strings contain 8,000 characters or less use NSPlit8K.
7. NSplit2B is deterministic. For more about deterministic functions see:
https://msdn.microsoft.com/en-us/library/ms178091.aspx
Examples:
--===== (1) extract the values "ab", "cd", "ef", and "gg"; token-size=2; delimiter=","
SELECT ItemNumber, Token FROM dbo.NSplit2B('ab,cd,ef,gg',2,1);
--===== (2) extract the values "xxx", "yyy", "bbb", etc... token-size=3; delimiter=","
DECLARE @string1 varchar(100) = 'xxx,yyy,bbb,123,zzz';
SELECT ItemNumber, Token FROM dbo.NSplit2B(@string1,3,1);
--===== (3) extract the values "client01", "client02", etc...
DECLARE @string varchar(100) = '<client01>,<client02>,<client03>,<client04>';
SELECT ItemNumber, Token FROM dbo.NSplit2B(SUBSTRING(@string,2,LEN(@string)-2),8,3);
--===== (4) If you dont need a delimiter to split the string
SELECT ItemNumber, Token FROM dbo.NSplit2B('abcdefghi',3,0);
--===== (5) Turn it into an N-Gram function
DECLARE @string varchar(max) = 'abcdefghi',
@N int = 1; -- change this to 2 for bigrams, 3 for trigrams, 4 for 4-grams, etc...
SELECT TOP (LEN(@string)-(@N-1)) ItemNumber, Token
FROM dbo.NSplit2B('abcdefghi',@N,-(@N-1));
--===== (6) Split a string into unigrams then count the vowels
SELECT VowelCount = COUNT(*)
FROM dbo.NSplit2B('abcdefghi',1,0)
WHERE Token LIKE '[aeiou]';
--===== (7) Behavior for NULL inputs
SELECT ItemNumber, Token FROM dbo.NSplit2B(NULL,1,0);
SELECT ItemNumber, Token FROM dbo.NSplit2B('ABC',NULL,0);
SELECT ItemNumber, Token FROM dbo.NSplit2B('ABC',1,NULL);
----------------------------------------------------------------------------------------
Rev 00 - 20151030 - Alan Burstein - Initial Development
Rev 01 - 20160407 - Alan Burstein
- Rewrote using getnumsAB
- Added logic for handling NULL inputs
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
ItemNumber = rn,
Token = SUBSTRING(@string,n1,@TokenLen)
FROM dbo.GetNumsAB(1,CONVERT(bigint,DATALENGTH(@string),0),@TokenLen+@DelimLen,1)
WHERE @string IS NOT NULL AND @TokenLen+@DelimLen IS NOT NULL
UNION ALL
SELECT 1, NULL -- if any of the parameters are NULL then return a single NULL value
WHERE REPLICATE(@string,@TokenLen+@DelimLen) IS NULL;
-- Itzik Ben-Gan 2001
October 3, 2016 at 1:26 pm
drew.allen (10/3/2016)
Alan.B (10/3/2016)
We have a script for exactly this kind of thing; have a look at nsplit2B which you can find here[/url].Here's a couple examples:
DECLARE @myv VARCHAR(MAX) = 'AB,CD,EF,GH,IJ';
SELECT * FROM dbo.nsplit2B(@myv,5,1);
SELECT * FROM dbo.nsplit2B(@myv,8,1);
Results:
ItemNumber Token
-------------------- ----------
1 AB,CD
2 EF,GH
3 IJ
ItemNumber Token
-------------------- ----------
1 AB,CD,EF
2 GH,IJ
This looks like it expects all of the tokens to be the same length. While this is true of the sample data, it wasn't clear that this would be true of the actual data.
Drew
I did not fully understand the requirement so I'm guessing a little (waiting on OP feedback). Yes - if tokens are not the same length then what I posted wont work.
-- Itzik Ben-Gan 2001
October 3, 2016 at 1:35 pm
I used the link Luis presented and have it working. The size of each string varies greatly but the needed split is around 2000 characters.
So, I modified to add the extra column I needed, and use the different delimiter.
Thank you, everyone.
October 12, 2016 at 9:22 am
djj (10/3/2016)
I used the link Luis presented and have it working. The size of each string varies greatly but the needed split is around 2000 characters.So, I modified to add the extra column I needed, and use the different delimiter.
Thank you, everyone.
EDIT***
Never mind I realize now where my confusion lay
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply