March 7, 2021 at 4:53 am
Jonathan AC Roberts wrote:Or you could just add a WHERE clause to filter them out as it's a table valued function.
Yes ... but my Splitter returns an Item No, and if I use WHERE afterwards I have gaps in the Item No, so I would then also have to add code to solve that, every time I have that requirement, whereas if I get my Splitter to remove them that function returns contiguous Item Nos.
I still have the option of the Splitter not removing them, so I can have the data "raw" and post-process if that is more appropriate.
I'm always interested in what other folks come up with in the area of splitters, especially those that can handle LOB datatypes. Would you post your splitter, please?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 7, 2021 at 2:03 pm
Would you post your splitter, please?
Sadly that requires more time than I have readily available to sanitise and remove any naming etc. that is "internal" ... which in turn would mean testing to make sure that I hadn't goofed something up ...
What I can do more easily is to catalogue the splitters I have / use
CREATE FUNCTION dbo.FN_DelimSplitChar
(
@strSourcevarchar(8000)
, @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
, @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
-- Use '[DELETE]' to delete any NULL items
)
-- Split a string based on delimiter. Returns a Character String (which is NOT Trimmed)
RETURNS @tblArray TABLE
(
S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
, S_StrValuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
Standard string splitter. Optimised for <= 8,000 char
CREATE FUNCTION dbo.FN_DelimSplitInt
(
@strSourcevarchar(8000)
, @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
, @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. NULL or '-1')
-- Use '[DELETE]' to delete any NULL items
)
-- Split a string based on delimiter. Returns an array of INTs.
RETURNS @tblArray TABLE
(
S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
, S_IntValueint NULL
)
Same as CHAR, optimised for INTs
CREATE FUNCTION dbo.FN_DelimSplitIntMAX
(
@strSourcevarchar(MAX)
, @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
, @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. NULL or '-1')
-- Use '[DELETE]' to delete any NULL items
)
-- Split a string based on delimiter. Returns an array of INTs.
RETURNS @tblArray TABLE
(
S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
, S_IntValueint NULL
)
Same ... used if Source could exceed [and is optimised for] 8,000+ chars
CREATE FUNCTION dbo.FN_DelimSplitNVarcharMax
(
@strSourcenvarchar(MAX)
, @strDelimiternvarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
, @strBlanknvarchar(4000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
-- Use '[DELETE]' to delete any NULL items
)
-- Split a TEXT string based on delimiter. Returns a Character String.
RETURNS @tblArray TABLE
(
S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
, S_StrValuenvarchar(4000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
Same, only used when Wide-Chars need to be accommodated (Input up to 8,000 char)
CREATE FUNCTION dbo.FN_DelimSplitVarcharMax
(
@strSourcevarchar(MAX)
, @strDelimitervarchar(10) = ','-- NOTE: Delimiter is CASE SENSITIVE
, @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
-- Use '[DELETE]' to delete any NULL items
)
-- Split a TEXT string based on delimiter. Returns a Character String.
RETURNS @tblArray TABLE
(
S_Itemint IDENTITY(1,1) NOT NULL PRIMARY KEY
, S_StrValuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
)
Wide-Chars, more than 8,000 char
CREATE FUNCTION dbo.FN_FixedSplitChar
(
@strValuevarchar(8000)-- String Value to split
, @intWidthint-- Width (of each element)
, @intOptionssmallint = 0-- RESERVED
)
-- Fixed-width string splitter
RETURNS TABLE
...
DECLARE@tblArray TABLE
(
T_Itemint NOT NULL
, T_StrValuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
, PRIMARY KEY
(
T_Item
)
)
Split fixed width strings (i.e. no delimiter)
CREATE FUNCTION dbo.FN_FixedSplitInt
(
@strValuevarchar(8000)-- String Value to split
, @intWidthint-- Width (of each element)
, @intStuffStart int-- Offset for STUFF
, @intStuffLenint-- Length for STUFF
, @intOptionssmallint = 0-- 0=Default, 2=Last item blank=ignore
)
-- Fixed-width string splitter
RETURNS TABLE
AS
...
DECLARE@tblArray TABLE
(
T_Itemint NOT NULL
, T_IntValueint NULL
, PRIMARY KEY
(
T_Item
)
)
Same, but for INTs. The STUFF option is where the INTs are fixed length, but also have delimiter - strings such as:
'/123/45 /7 /0 /567'
CREATE FUNCTION dbo.FN_PathSplit
(
@strSourcevarchar(8000)
, @strDelimitervarchar(10) = '/'
, @strBlankvarchar(8000) = '[DELETE]'-- What to return if an item is blank? (e.g. '', NULL or '[BLANK]')
-- Use '[DELETE]' to delete any NULL items
)
-- Split a string into its Path components
-- e.g. /L1/L2/L3 will be split into /L1, /L1/L2 and /L1/L2/L3
RETURNS @tblArray TABLE
(
S_Itemint IDENTITY(1,1) NOT NULL
, S_Valuevarchar(8000) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
, PRIMARY KEY
(
S_Item
)
)
CREATE FUNCTION dbo.FN_SplitWords
(
@strValuevarchar(8000)-- String Value to split into words
, @intOptionssmallint = 0-- 0=Normal, 1=Append plurals
)
RETURNS TABLE
Dictionary table available for significant number of English words, aliased for root/singular words and so on. Not a substitute for a proper lexicon! but we find it works well enough for general text searching where we need "story" to also match "stories", and ""tell" / "told" etc
So take a search string, of one/multiple words, split it, run it through the dictionary to get "root" words, and then use the "word lookup table" to find (weighted) matches for one/many/all words in the search string, and then join that back to the original table/record. The central Word Lookup Table is maintained by trigger, splitting e.g. a "Description" into words and storing them (the "root" variant) in the Word Lookup Table (WordID, TableID, RecordPKeyID).
For searching where no pre-processing to Word Lookup Table exists, then convert the words in the search string into all possible singular/plural, present/past tense etc. and then use THAT list (somehow!) to match strings - using LIKE/whatever
CREATE FUNCTION dbo.FN_Trigram_Split
(
@stringvarchar(8000)
)
-- Find matching records in TrigramData Table
RETURNS TABLE
We use Trigrams for text searching in some circumstances
November 11, 2022 at 10:30 pm
I recently had reason to revisit this article and I found a bug in the code. It does not preserve leading spaces in the individual elements. If an element is all spaces, it removes all the spaces even when the delimiter is a comma.
Compare the outputs of items 2, 3, and 18 in the following test.
--===== Conditionally drop the test tables to make reruns easier for testing.
-- (this is NOT a part of the solution)
DROP TABLE IF EXISTS #JBMTest
;
--===== Create and populate a test table on the fly (this is NOT a part of the solution).
-- In the following comments, "b" is a blank and "E" is an element in the left to right order.
-- Double Quotes are used to encapsulate the output of "Item" so that you can see that all blanks
-- are preserved no matter where they may appear.
SELECT SomeID, SomeValue = CONVERT(NVARCHAR(4000),SomeValue)
INTO #JBMTest
FROM ( --# of returns & type of Return Row(s)
SELECT 0, NULL UNION ALL --1 NULL
SELECT 1, SPACE(0) UNION ALL --1 b (Empty String)
SELECT 2, SPACE(1) UNION ALL --1 b (1 space)
SELECT 3, SPACE(5) UNION ALL --1 b (5 spaces)
SELECT 4, ',' UNION ALL --2 b b (both are empty strings)
SELECT 5, '55555' UNION ALL --1 E
SELECT 6, ',55555' UNION ALL --2 b E
SELECT 7, ',55555,' UNION ALL --3 b E b
SELECT 8, '55555,' UNION ALL --2 b B
SELECT 9, '55555,1' UNION ALL --2 E E
SELECT 10, '1,55555' UNION ALL --2 E E
SELECT 11, '55555,4444,333,22,1' UNION ALL --5 E E E E E
SELECT 12, '55555,4444,,333,22,1' UNION ALL --6 E E b E E E
SELECT 13, ',55555,4444,,333,22,1,' UNION ALL --8 b E E b E E E b
SELECT 14, ',55555,4444,,,333,22,1,' UNION ALL --9 b E E b b E E E b
SELECT 15, ' 4444,55555 ' UNION ALL --2 E (w/Leading Space) E (w/Trailing Space)
SELECT 16, 'This,is,a,test.' UNION ALL --4 E E E E
SELECT 17, ',,,,,,' UNION ALL --7 (All Empty Strings)
SELECT 18, ' a , b '
) d (SomeID, SomeValue)
;
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID, test.SomeValue, Value = QUOTENAME(split.value,N'"'), split.position
FROM #JBMTest test
CROSS APPLY dbo.STRING_SPLIT(test.SomeValue,N',') split
;
--===== Split the CSV column for the whole table using CROSS APPLY (this is the solution)
SELECT test.SomeID, test.SomeValue, Item = QUOTENAME(split.Item,N'"'), split.ItemNumber
FROM #JBMTest test
CROSS APPLY dbo.DelimitedSplitN4K(test.SomeValue,N',') split
;
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2022 at 1:57 am
Thanks for spotting that Jeff. I deliberately wrapped an LTRIM around the return value so if a manually written csv sometimes had comma space or just comma it would return the value with no leading space. I should really remove LTRIM as the T-SQL string_split function returns leading spaces. I should also rename the position column to ordinal to make it the same as the T-SQL function.
I'll replace the function in my script article with this one:
IF OBJECT_ID('[dbo].[STRING_SPLIT]','IF') IS NULL BEGIN
EXEC ('CREATE FUNCTION [dbo].[STRING_SPLIT] () RETURNS TABLE AS RETURN SELECT 1 X')
END
GO
ALTER FUNCTION [dbo].[STRING_SPLIT]
(
@string nvarchar(MAX),
@separator nvarchar(MAX)
)
RETURNS TABLE WITH SCHEMABINDING
AS RETURN
WITH X(N) AS (SELECT 'Table1' FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) T(C)),
Y(N) AS (SELECT 'Table2' FROM X A1, X A2, X A3, X A4, X A5, X A6, X A7, X A8) , -- Up to 16^8 = 4 billion
T(N) AS (SELECT TOP(ISNULL(LEN(@string),0)+1) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 N FROM Y),
Delim(Pos) AS (SELECT t.N FROM T WHERE SUBSTRING(@string, t.N, LEN(@separator+'x')-1) LIKE @separator OR t.N = 0),
Separated(value,position) AS (SELECT SUBSTRING(@string, d.Pos + CASE WHEN Pos=0 THEN 1 ELSE LEN(@separator+'x')-1 END,
LEAD(d.Pos,1,2147483647) OVER (ORDER BY (SELECT NULL))
- CASE WHEN d.Pos=0 THEN 2-LEN(@separator+'x')
ELSE d.Pos END - (LEN(@separator+'x')-1)),
d.pos
FROM Delim d
WHERE @string IS NOT NULL)
SELECT s.value value,
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) ordinal
FROM Separated s
WHERE s.value <> @separator;
GO
November 13, 2022 at 12:36 am
That sounds good and right, Jonathan. I tell people that good splitters shouldn't make any assumptions and that it should return EVERYTHING that's between the delimiters, an empty string if there's nothing between the delimiters and, hopefully, at NULL if a NULL was passed for the string value.
Thank you for your efforts here.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2022 at 12:01 pm
This was removed by the editor as SPAM
December 19, 2022 at 9:58 pm
Comments posted to this topic are about the item A varchar(MAX) STRING_SPLIT function for SQL 2012 and above
December 22, 2022 at 9:54 am
Hi folks,
Some words about this interesting function.
Now I call this function like this:
SELECT * FROM dbo.STRING_SPLIT(N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit', N'DO');
GO
But what to do if I need separator1 or separator2 or separator3?
In the example above: separators are :
DO
BE
BO
i.e. I want to split string separated by one of these separators?
The call (sep2) after call(sep1) does not help in that case.
December 22, 2022 at 11:54 am
Hi folks,
Some words about this interesting function.
Now I call this function like this:
SELECT *FROM dbo.STRING_SPLIT(N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit', N'DO');GO
But what to do if I need separator1 or separator2 or separator3?
In the example above: separators are :
DO
BE
BO
i.e. I want to split string separated by one of these separators?
The call (sep2) after call(sep1) does not help in that case.
Not sure exactly what you want, but if you want to split by just one of the separators you could put the separator into a variable and change that variable to the value you want:
DECLARE @Separator nvarchar(MAX) = 'BO'
DECLARE @String nvarchar(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
SELECT *
FROM dbo.STRING_SPLIT(@String, @Separator)
If you want to split by all the separators you could use REPLACE.
DECLARE @String nvarchar(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
SELECT *
FROM dbo.STRING_SPLIT(REPLACE(REPLACE(REPLACE(@String, ' BO ', '|'), ' DO ', '|'), ' BE ', '|'), '|') t
December 23, 2022 at 9:05 am
Yes, the 2nd is what I have asked.
But the replace
DECLARE @String NVARCHAR(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
SELECT *
FROM dbo.STRING_SPLIT(REPLACE(REPLACE(REPLACE(@String, ' BO ', '|'), ' DO ', '|'), ' BE ', '|'), '|') t
is not for all cases, need to be sure that no the character '|' in the text. May be replace all separators with the 1st one.
December 23, 2022 at 12:11 pm
Yes, the 2nd is what I have asked.
But the replace
DECLARE @String NVARCHAR(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
SELECT *
FROM dbo.STRING_SPLIT(REPLACE(REPLACE(REPLACE(@String, ' BO ', '|'), ' DO ', '|'), ' BE ', '|'), '|') tis not for all cases, need to be sure that no the character '|' in the text. May be replace all separators with the 1st one.
Yes, you could do that:
DECLARE @String NVARCHAR(MAX) = N'Lorem DO ipsum BE dolor BO sit amet, consectetur adipiscing elit'
SELECT *
FROM dbo.STRING_SPLIT(REPLACE(REPLACE(@String, ' BO ', ' DO '), ' BE ', ' DO '), ' DO ' ) t
Viewing 11 posts - 31 through 40 (of 40 total)
You must be logged in to reply to this topic. Login to reply