April 18, 2015 at 9:04 pm
I need a function or MSSQL query that returns the surrounding text in a string. So, let's say I have a row that contains the following comma separated string:
"adventure,devotion,for children,happiness,imagination,love,world,lovely,confusion,introspection,on work and working,on writing and words,"
The user searches for the word "lovely".
I want to return the word "lovely" and the text surrounding it like: "imagination,love,world,lovely,confusion,introspection,on work and working,"
What's the most efficient way to do this. Currently I am doing this which includes a function for returning a specific number of words...
dbo.FirstNWords(REPLACE(LEFT([Categories],CHARINDEX(','+ @term,[Categories],20)),',',', '),3)+' '+ dbo.FirstNWords(REPLACE(SUBSTRING([Categories],CHARINDEX(@term,[Categories]),LEN([Categories])),',',', '),3)
It works, however it seems too slow.
April 18, 2015 at 10:02 pm
Sorry, this should be in the SQL Server 2012 - T-SQL forum...
April 18, 2015 at 11:21 pm
Quick solution which uses dbo.DelimitedSplit8K[/url]
😎
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @SAMPLE_STR VARCHAR(500) = 'adventure,devotion,for children,happiness,imagination,love,world,lovely,confusion,introspection,on work and working,on writing and words,';
DECLARE @SEARCH_WORD VARCHAR(50) = 'lovely';
DECLARE @WORD_COUNT INT = 3;
;WITH WORD_LIST AS
(
SELECT
WL.ItemNumber
,WL.Item
,MAX(CASE
WHEN WL.Item = @SEARCH_WORD THEN WL.ItemNumber
ELSE 0
END) OVER
(
PARTITION BY (SELECT NULL)
) AS WL_FLAG
FROM dbo.DelimitedSplit8K(@SAMPLE_STR,',') AS WL
)
SELECT
STUFF( (SELECT
',' + W.Item
FROM WORD_LIST W
WHERE W.ItemNumber BETWEEN W.WL_FLAG - @WORD_COUNT AND W.WL_FLAG + @WORD_COUNT
FOR XML PATH(''),TYPE).value('.[1]','VARCHAR(500)')
,1,1,'') AS OUTPUT_STR;
Results
OUTPUT_STR
--------------------------------------------------------------------------
imagination,love,world,lovely,confusion,introspection,on work and working
April 19, 2015 at 3:27 am
Second solution, which is more efficient than the previous one as it doesn't use the XML concatenation. The code is somewhat self-explanatory.
😎
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @SAMPLE_STR VARCHAR(8000) = 'adventure,devotion,for children,happiness,imagination,love,world,lovely,confusion,introspection,on work and working,on writing and words';
DECLARE @SEARCH_WORD VARCHAR(50) = 'lovely';--
DECLARE @WORD_COUNT INT = 3;
DECLARE @DELIMITER CHAR(1) = ',';
/* Inline Tally Table
Range 1-8000
*/
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
) AS X(N))
, NUMS(N) AS (SELECT TOP(LEN(@SAMPLE_STR)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)
/* Base information
W_POS = Position of the search word within the string
W_LEN = Length of the string
*/
,WORD_POS AS
(
SELECT
CHARINDEX(@SEARCH_WORD,@SAMPLE_STR,1) AS W_POS
,LEN(@SAMPLE_STR) AS W_LEN
)
/* Identify the position of all delimiters in the string.
LAG and LEAD to bring the desired range of
(@WORD_COUNT x 2) + 1
*/
,WORD_GROUPS AS
(
SELECT
NM.N
,WP.W_POS
,WP.W_LEN
,LAG(NM.N,@WORD_COUNT,0) OVER
(
ORDER BY NM.N
) AS START_POS
,LEAD(NM.N,@WORD_COUNT + 1,0) OVER
(
ORDER BY NM.N
) AS END_POS
FROM NUMS NM
CROSS APPLY WORD_POS WP
WHERE SUBSTRING(@SAMPLE_STR,NM.N,1) = @DELIMITER
)
/* Find the group where the search word is closest to the
delimiter and calculate the length of the extraction
*/
,RANKED_GROUPS AS
(
SELECT
DENSE_RANK() OVER
(
ORDER BY ABS(WG.W_POS - WG.N)
) RG_DRNK
,WG.START_POS
,CASE
WHEN WG.END_POS = 0 THEN WG.W_LEN - WG.START_POS
ELSE (WG.END_POS - WG.START_POS) - 1
END AS XLEN
,WG.W_POS
FROM WORD_GROUPS WG
)
/* Return NULL if the search word is not within the string
else the right group of words.
*/
SELECT
CASE
WHEN RG.W_POS = 0 THEN NULL
ELSE SUBSTRING(@SAMPLE_STR,RG.START_POS + 1,RG.XLEN)
END AS OUTPUT_STR
FROM RANKED_GROUPS RG
WHERE RG.RG_DRNK = 1;
Results
OUTPUT_STR
--------------------------------------------------------------------------
imagination,love,world,lovely,confusion,introspection,on work and working
April 19, 2015 at 12:12 pm
Eirikur Eiriksson (4/19/2015)
Second solution, which is more efficient than the previous one as it doesn't use the XML concatenation.
/* Find the group where the search word is closest to the
delimiter and calculate the length of the extraction
*/
I don't know if that's true or not (and I will find out) but what a freakin' awesome idea, Eirikur! And, not only does it work in 2K5 but I've got a couple of other ideas for this. Well done!
--Jeff Moden
Change is inevitable... Change for the better is not.
April 19, 2015 at 3:08 pm
Thanks. Is it possible to put this into a function? Or how would I use that within a select statement?
April 19, 2015 at 3:17 pm
markpringle (4/19/2015)
Thanks. Is it possible to put this into a function? Or how would I use that within a select statement?
Sure. I'd make sure that it was an "Inline Table Valued Function" for performance but there's no reason why this couldn't be encapsulated for reuse in such a manner.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 19, 2015 at 4:00 pm
It uses a Temp table, but temp tables can't be used in functions, right? I am just not SQL savvy enough to know how to repurpose it.
April 19, 2015 at 4:24 pm
He! Now I must retract the earlier statement, the bespoke function got left behind like an Escargot on a barbecue (I like them flamed)!
Here is the code for the inline(able) table value functions, the first one works on 2005 and later and the second one on 2012 and later.
😎
USE tempdb;
GO
SET NOCOUNT ON;
GO
/* dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K
Delimitedsplit8K based solution
*/
IF OBJECT_ID(N'dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K') IS NOT NULL DROP FUNCTION dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K;
GO
CREATE FUNCTION dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K
(
@SAMPLE_STR VARCHAR(500)
,@SEARCH_WORD VARCHAR(50)
,@WORD_COUNT INT
,@DELIMITER CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
WITH WORD_LIST AS
(
SELECT
WL.ItemNumber
,WL.Item
,MAX(CASE
WHEN WL.Item = @SEARCH_WORD THEN WL.ItemNumber
ELSE 0
END) OVER
(
PARTITION BY (SELECT NULL)
) AS WL_FLAG
FROM dbo.DelimitedSplit8K(@SAMPLE_STR,@DELIMITER) AS WL
)
SELECT
CASE WHEN CHARINDEX(@SEARCH_WORD,@SAMPLE_STR,1) > 0 THEN
STUFF( (SELECT
@DELIMITER + W.Item
FROM WORD_LIST W
WHERE W.ItemNumber BETWEEN W.WL_FLAG - @WORD_COUNT AND W.WL_FLAG + @WORD_COUNT
FOR XML PATH(''),TYPE).value('.[1]','VARCHAR(500)')
,1,1,'')
ELSE NULL
END
AS OUTPUT_STR;
GO
/* dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE
Alternative solution based on an inline tally
table and LAG/LEAD
*/
IF OBJECT_ID(N'dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE') IS NOT NULL DROP FUNCTION dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE;
GO
CREATE FUNCTION dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE
(
@SAMPLE_STR VARCHAR(8000)
,@SEARCH_WORD VARCHAR(50)
,@WORD_COUNT INT
,@DELIMITER CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
/* Inline Tally Table
Range 1-8000
*/
WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
) AS X(N))
, NUMS(N) AS (SELECT TOP(LEN(@SAMPLE_STR)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)
/* Base information
W_POS = Position of the search word within the string
W_LEN = Length of the string
*/
,WORD_POS AS
(
SELECT
CHARINDEX(@SEARCH_WORD,@SAMPLE_STR,1) AS W_POS
,LEN(@SAMPLE_STR) AS W_LEN
)
/* Identify the position of all delimiters in the string.
LAG and LEAD to bring the desired range of
(@WORD_COUNT x 2) + 1
*/
,WORD_GROUPS AS
(
SELECT
NM.N
,WP.W_POS
,WP.W_LEN
,LAG(NM.N,@WORD_COUNT,0) OVER
(
ORDER BY NM.N
) AS START_POS
,LEAD(NM.N,@WORD_COUNT + 1,0) OVER
(
ORDER BY NM.N
) AS END_POS
FROM NUMS NM
CROSS APPLY WORD_POS WP
WHERE SUBSTRING(@SAMPLE_STR,NM.N,1) = @DELIMITER
)
/* Find the group where the search word is closest to the
delimiter and calculate the length of the extraction
*/
,RANKED_GROUPS AS
(
SELECT
DENSE_RANK() OVER
(
ORDER BY ABS(WG.W_POS - WG.N)
) RG_DRNK
,WG.START_POS
,CASE
WHEN WG.END_POS = 0 THEN WG.W_LEN - WG.START_POS
ELSE (WG.END_POS - WG.START_POS) - 1
END AS XLEN
,WG.W_POS
FROM WORD_GROUPS WG
)
/* Return NULL if the search word is not within the string
else the right group of words.
*/
SELECT
CASE
WHEN RG.W_POS = 0 THEN NULL
ELSE SUBSTRING(@SAMPLE_STR,RG.START_POS + 1,RG.XLEN)
END AS OUTPUT_STR
FROM RANKED_GROUPS RG
WHERE RG.RG_DRNK = 1;
GO
Then some sample data to play around with
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TBL_SURROUND_TEST') IS NOT NULL DROP TABLE dbo.TBL_SURROUND_TEST;
DECLARE @SAMPLE_SIZE INT = 3; -- 4 = 531441 ROWS = 12 ^ 3
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)
,SAMPLE_DATA AS
(
SELECT
NM.N AS TRAN_ID
,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) + CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) + CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) AS TRIPLET
FROM NUMS NM
)
SELECT
ISNULL(ROW_NUMBER() OVER
(
ORDER BY (SELECT NULL)
),0) AS SD_RID
,CONCAT(
S01.TRIPLET ,CHAR(44)
,S02.TRIPLET ,CHAR(44)
,S03.TRIPLET ,CHAR(44)
,S04.TRIPLET ,CHAR(44)
,S05.TRIPLET ,CHAR(44)
,S06.TRIPLET ,CHAR(44)
,S07.TRIPLET ,CHAR(44)
,S08.TRIPLET ,CHAR(44)
,S09.TRIPLET ,CHAR(44)
,S10.TRIPLET ,CHAR(44)
,S11.TRIPLET ,CHAR(44)
,S12.TRIPLET) AS INPUT_STR
INTO dbo.TBL_SURROUND_TEST
FROM SAMPLE_DATA S01
CROSS JOIN SAMPLE_DATA S02
CROSS JOIN SAMPLE_DATA S03
CROSS JOIN SAMPLE_DATA S04
CROSS JOIN SAMPLE_DATA S05
CROSS JOIN SAMPLE_DATA S06
CROSS JOIN SAMPLE_DATA S07
CROSS JOIN SAMPLE_DATA S08
CROSS JOIN SAMPLE_DATA S09
CROSS JOIN SAMPLE_DATA S10
CROSS JOIN SAMPLE_DATA S11
CROSS JOIN SAMPLE_DATA S12
;
ALTER TABLE dbo.TBL_SURROUND_TEST ADD CONSTRAINT PK_DBO_SURROUND_TEST_SD_RID PRIMARY KEY CLUSTERED (SD_RID ASC);
SELECT
COUNT(*) AS SAMPLE_COUNT
FROM dbo.TBL_SURROUND_TEST;
Test harness
USE tempdb;
GO
SET NOCOUNT ON;
DECLARE @INT_BUCKET INT = 0;
DECLARE @CHAR_BUCKET_01 VARCHAR(100) = '';
DECLARE @CHAR_BUCKET_02 VARCHAR(100) = '';
RAISERROR(N'-------------------------------------------------------------
DRY RUN
-------------------------------------------------------------',0,0) WITH NOWAIT;
SET STATISTICS IO,TIME ON;
SELECT
@INT_BUCKET = ST.SD_RID
,@CHAR_BUCKET_02 = ST.INPUT_STR
FROM dbo.TBL_SURROUND_TEST ST;
SET STATISTICS TIME,IO OFF;
RAISERROR(N'-------------------------------------------------------------
iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K
-------------------------------------------------------------',0,0) WITH NOWAIT;
SET STATISTICS IO,TIME ON;
SELECT
@INT_BUCKET = ST.SD_RID
,@CHAR_BUCKET_01 = X.OUTPUT_STR
,@CHAR_BUCKET_02 = ST.INPUT_STR
FROM dbo.TBL_SURROUND_TEST ST
CROSS APPLY dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K(ST.INPUT_STR,'ABC',3,',') AS X
WHERE X.OUTPUT_STR IS NOT NULL;
SET STATISTICS TIME,IO OFF;
RAISERROR(N'-------------------------------------------------------------
iTVF_SURROUNDING_WORDS_ALTERNATIVE
-------------------------------------------------------------',0,0) WITH NOWAIT;
SET STATISTICS IO,TIME ON;
SELECT
@INT_BUCKET = ST.SD_RID
,@CHAR_BUCKET_01 = X.OUTPUT_STR
,@CHAR_BUCKET_02 = ST.INPUT_STR
FROM dbo.TBL_SURROUND_TEST ST
CROSS APPLY dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE(ST.INPUT_STR,'ABC',3,',') AS X
WHERE X.OUTPUT_STR IS NOT NULL;
SET STATISTICS TIME,IO OFF;
Results:blush:
-------------------------------------------------------------
DRY RUN
-------------------------------------------------------------
Table 'TBL_SURROUND_TEST'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 125 ms, elapsed time = 131 ms.
-------------------------------------------------------------
iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K
-------------------------------------------------------------
Table 'Worktable'. Scan count 396, logical reads 12605, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TBL_SURROUND_TEST'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 5086 ms, elapsed time = 5088 ms.
-------------------------------------------------------------
iTVF_SURROUNDING_WORDS_ALTERNATIVE
-------------------------------------------------------------
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TBL_SURROUND_TEST'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 66612 ms, elapsed time = 66738 ms.
Post Mortem: Looks like the dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE got hit by an SORT Operator, passed away very slowly.
Edit: Just remembered that on this test system the DelimitedSplit8K is an alias for DelimitedSplit8K_Lead, might produce slightly different results.
April 19, 2015 at 5:56 pm
markpringle (4/19/2015)
It uses a Temp table, but temp tables can't be used in functions, right? I am just not SQL savvy enough to know how to repurpose it.
I see nothing of a temp table in the code. It's just a standard practice for folks like Eirikur and many others to add "USE TempDB" to code to ensure that we cause no damage to a real database whether we use any DDL or not. It's just a habit on forums.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply