Finding the longest string within a string field

Question

Finding the longest string within a string field

olibbhq

SSCarpal Tunnel

Points: 4034
More actions
March 20, 2014 at 6:10 am

#289613

Hi,
This might sounds a bit odd but here goes, thanks for your help in advance. We have some URLs within a bulk block of text some of which are very long. I need to identify rows where such urls exceed say 100 characters in length in amongst other text.
So the rule would be return a record if within the string there is a string (without spaces) longer than 100 characters.
Hopefully this is not too confusing and thank you for your help in advance.
Oliver

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

SQLCJ SSCommitted Points: 1791 More actions · Answer 1

You may have to use combination of charindex() and len() function to determine this. If you could please provide the sample set then we can give it a go.

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 2

Hi,

Please see attached as an example.

Many Thanks,

Oliver

CREATE TABLE [dbo].[A_TABLE](

[ID] [int] NOT NULL,

[column1] [nvarchar](max) NULL,

CONSTRAINT [PK_A_TABLE] PRIMARY KEY CLUSTERED

(

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (1, N'asb unliunliuniluniuniuiluniuniuneiwudbeuwybduweybduweybduywebdukybwedybweuybdweyubedukywbed nnnn unuuuu jjjjj ')

GO

INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (2, N'eirfreuu erfiuneriufnerufn iurnfliuenfliuwenfiu unliufrenwuifnewuinfliuewnrfiuwruiefniuernfilurneliunwflriunwliufnrwliunfrliuwneflirunwef rufnwerun')

GO

INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (3, N'ireunfiluwernfiulrenwliufrnweiurnluewniuwnliueniuenfrliunewuifniuewnrieureuinflriunwliuerfn nn urr urrrrrr')

GO

INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (4, N'frewunfilurewn erifunliuwernfliruwenf')

GO

INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (5, N' reufniruenfrluienw oeriunfirewifrweionfriweo;ifernfoiwrneoifnweo;inrfioweno;irnfweiornioen;oiewnoifrn;irenniefvjhdbscvuybsd')

GO

Luis Cazares SSC Guru Points: 183706 More actions · Answer 3

Based on that sample data. What should be the output?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 4

Thanks for the reply. The output I would like is to return the ID for the rows which have a continous character string longer than say 60 within the string, so for example it would return rows:

1

2

3

5

Row 4 does not have and continous characters strings longer that 60 characters so therefore is excluded.

Many Thanks,

Oliver

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

If I understand correctly, you want to check if there's a continuous character string in the column, it doesn't matter the position of it.

You could do this with a string delimiter and checking each item. The best T-SQL option available is explained in here: http://www.sqlservercentral.com/articles/Tally+Table/72993/

And I'm using it in the following query.

DECLARE @Length int = 60

SELECT DISTINCT ID

FROM dbo.A_TABLE

CROSS APPLY dbo.DelimitedSplit8K( column1, ' ')

WHERE LEN(Item) > @Length

If you have any questions, feel free to ask.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 6

Many Thanks this works perfectly. I really appreciate the help that splitter is going to be so useful going forward for other tasks. It seems that cross apply seems to be the answer to all my questions at the moment!

Thanks again

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 7

Just a quick thought;

Splitting the string is somewhat like crossing the stream for a sip of water, when all that is needed is to get the position of the spaces/delimiters.

Here is a suggestion:

USE tempdb;

GO

DECLARE @LENGTH_LIMIT INT = 50;

DECLARE @DELIM SMALLINT = UNICODE(N' ');

SELECT

XX.ID

FROM

(

SELECT

ROW_NUMBER() OVER

(

PARTITION BY X.ID

ORDER BY (SELECT NULL)

) AS PRT_RID

,X.ID

FROM

(

SELECT

PS.ID

,LEAD(PS.N,1) OVER

(

PARTITION BY PS.ID

ORDER BY PS.ID, PS.N

) - PS.N AS ST_LEN

FROM

(

SELECT TA.ID,0 AS N,@DELIM AS CH_CODE

FROM dbo.A_TABLE TA

UNION ALL

SELECT

TA.ID

,NN.N

,UNICODE(SUBSTRING(TA.column1,NN.N,1)) AS CH_CODE

FROM dbo.A_TABLE TA

CROSS APPLY

(

SELECT

ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N

FROM

(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N1

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N2

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N3

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N4

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N5

) AS NN

WHERE NN.N <= LEN(TA.column1)

UNION ALL

SELECT TA.ID,LEN(TA.column1),@DELIM

FROM dbo.A_TABLE TA

) AS PS

WHERE PS.CH_CODE = @DELIM

) AS X

WHERE X.ST_LEN > @LENGTH_LIMIT

) AS XX

WHERE XX.PRT_RID = 1;

olibbhq SSCarpal Tunnel Points: 4034 More actions · Answer 8

Many Thanks will give this a go.

Thanks for the help.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 9

Eirikur Eiriksson (3/21/2014)
Just a quick thought;
Splitting the string is somewhat like crossing the stream for a sip of water, when all that is needed is to get the position of the spaces/delimiters.

That seems valid, but you're scanning the table trice. Would that perform better than the splitter?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Luis Cazares SSC Guru Points: 183706 More actions · Answer 10

I can't test this right now because I don't have a 2012 instance available, but it uses Eirikur's idea.

WITH e1(N) AS(

SELECT N FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))e(N)

),

e2(N) AS(

SELECT a.N FROM e1 a, e1 b

),

e4(N) AS(

SELECT a.N FROM e2 a, e2 b

),

Spaces AS(

SELECT ID, N

FROM #A_TABLE a

CROSS APPLY(

SELECT TOP(LEN(column1) + 2) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) - 1 N

FROM e4

) t

WHERE SUBSTRING( column1 + ' ', N, 1) = ' '

),

Lengths AS(

SELECT ID, LEAD(N,1) OVER(PARTITION BY ID ORDER BY N) - N AS StringLength

FROM Spaces

)

SELECT ID

FROM Lengths

WHERE StringLength > @Length

GROUP BY ID

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Eirikur Eiriksson SSC Guru Points: 182920 More actions · Answer 11

Luis Cazares (3/21/2014)
Eirikur Eiriksson (3/21/2014)
Just a quick thought;
Splitting the string is somewhat like crossing the stream for a sip of water, when all that is needed is to get the position of the spaces/delimiters.
That seems valid, but you're scanning the table trice. Would that perform better than the splitter?

The reason is that this is the same work as the splitter does, minus the string manipulation. The three scans are a result of the union clauses, the first and the last scans only bring back the identity value.

I made a mistake in my earlier code as I forgot to limit the number sequence generation to the length of the string. Something that is more than 99 percent of the cost. A correct code is:

DECLARE @LENGTH_LIMIT INT = 60;

DECLARE @DELIM SMALLINT = UNICODE(N' ');

SELECT

XX.ID

FROM

(

SELECT

ROW_NUMBER() OVER

(

PARTITION BY X.ID

ORDER BY (SELECT NULL)

) AS PRT_RID

,X.ID

FROM

(

SELECT

PS.ID

,LEAD(PS.N,1) OVER

(

PARTITION BY PS.ID

ORDER BY PS.ID, PS.N

) - PS.N AS ST_LEN

FROM

(

SELECT TA.ID,0 AS N,@DELIM AS CH_CODE

FROM dbo.A_TABLE TA

UNION ALL

SELECT

TA.ID

,NN.N

,UNICODE(SUBSTRING(TA.column1,NN.N,1)) AS CH_CODE

FROM dbo.A_TABLE TA

CROSS APPLY

(

SELECT

ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N

FROM

(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N1

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N2

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N3

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N4

,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N5

ORDER BY (SELECT NULL) OFFSET 0 ROWS FETCH FIRST (LEN(TA.column1)) ROWS ONLY

) AS NN

WHERE NN.N <= LEN(TA.column1)

UNION ALL

SELECT TA.ID,LEN(TA.column1),@DELIM

FROM dbo.A_TABLE TA

) AS PS

WHERE PS.CH_CODE = @DELIM

) AS X

WHERE X.ST_LEN > @LENGTH_LIMIT

) AS XX

WHERE XX.PRT_RID = 1;

thava SSCrazy Points: 2275 More actions · Answer 12

how about a recursive idea

WITH cte AS (

SELECT a.id, a.column1, CAST(0 AS BIGINT) Base, CHARINDEX(' ', a.column1, 0) AS MID

FROM a_table a

UNION ALL

SELECT c.id, c.column1, CHARINDEX(' ', a.column1, c.Mid) AS Base,

CASE ---- To pick up the last segment

WHEN CHARINDEX(' ', a.column1, c.Mid+1)>0 THEN CHARINDEX(' ', a.column1, c.Mid+1)

ELSE LEN(A.column1)

END AS MID

FROM a_table a

INNER JOIN cte c

ON c.id = a.ID

WHERE CHARINDEX(' ', a.column1, c.Mid+1)>=0 AND mid<LEN(a.column1) -- Tell the recursive part that it reach the end of the string

)

SELECT *

FROM cte

WHERE Mid - base >60

ORDER BY id

if you want it you can even get the word which length is greater than 60

Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script [/url]

Luis Cazares SSC Guru Points: 183706 More actions · Answer 13

thava (3/24/2014)
how about a recursive idea

It's nice, but the DelimitedSplit8K version will work in half the time :-). You might not notice it with a few rows but it becomes clear with larger datasets.

I still need to test it against the 2012 options. But I don't have that version available.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Luis Cazares SSC Guru Points: 183706 More actions · Answer 14

And after some testing on SQL Server 2012, the DelimitedSplit8K still is the fastest solution posted in this thread. However, I haven't tested the new version presented by Eirikur in his article: Reaping the benefits of the Window functions in T-SQL[/url]

I haven't read it in detail but when I do it I might come back 🙂

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2