March 20, 2014 at 6:10 am
Hi,
This might sounds a bit odd but here goes, thanks for your help in advance. We have some URLs within a bulk block of text some of which are very long. I need to identify rows where such urls exceed say 100 characters in length in amongst other text.
So the rule would be return a record if within the string there is a string (without spaces) longer than 100 characters.
Hopefully this is not too confusing and thank you for your help in advance.
Oliver
March 20, 2014 at 6:23 am
You may have to use combination of charindex() and len() function to determine this. If you could please provide the sample set then we can give it a go.
March 20, 2014 at 6:49 am
Hi,
Please see attached as an example.
Many Thanks,
Oliver
CREATE TABLE [dbo].[A_TABLE](
[ID] [int] NOT NULL,
[column1] [nvarchar](max) NULL,
CONSTRAINT [PK_A_TABLE] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (1, N'asb unliunliuniluniuniuiluniuniuneiwudbeuwybduweybduweybduywebdukybwedybweuybdweyubedukywbed nnnn unuuuu jjjjj ')
GO
INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (2, N'eirfreuu erfiuneriufnerufn iurnfliuenfliuwenfiu unliufrenwuifnewuinfliuewnrfiuwruiefniuernfilurneliunwflriunwliufnrwliunfrliuwneflirunwef rufnwerun')
GO
INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (3, N'ireunfiluwernfiulrenwliufrnweiurnluewniuwnliueniuenfrliunewuifniuewnrieureuinflriunwliuerfn nn urr urrrrrr')
GO
INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (4, N'frewunfilurewn erifunliuwernfliruwenf')
GO
INSERT [dbo].[A_TABLE] ([ID], [column1]) VALUES (5, N' reufniruenfrluienw oeriunfirewifrweionfriweo;ifernfoiwrneoifnweo;inrfioweno;irnfweiornioen;oiewnoifrn;irenniefvjhdbscvuybsd')
GO
March 20, 2014 at 10:31 am
Based on that sample data. What should be the output?
March 20, 2014 at 10:41 am
Thanks for the reply. The output I would like is to return the ID for the rows which have a continous character string longer than say 60 within the string, so for example it would return rows:
1
2
3
5
Row 4 does not have and continous characters strings longer that 60 characters so therefore is excluded.
Many Thanks,
Oliver
March 20, 2014 at 10:51 am
If I understand correctly, you want to check if there's a continuous character string in the column, it doesn't matter the position of it.
You could do this with a string delimiter and checking each item. The best T-SQL option available is explained in here: http://www.sqlservercentral.com/articles/Tally+Table/72993/
And I'm using it in the following query.
DECLARE @Length int = 60
SELECT DISTINCT ID
FROM dbo.A_TABLE
CROSS APPLY dbo.DelimitedSplit8K( column1, ' ')
WHERE LEN(Item) > @Length
If you have any questions, feel free to ask.
March 20, 2014 at 11:20 am
Many Thanks this works perfectly. I really appreciate the help that splitter is going to be so useful going forward for other tasks. It seems that cross apply seems to be the answer to all my questions at the moment!
Thanks again
March 21, 2014 at 3:27 am
Just a quick thought;
Splitting the string is somewhat like crossing the stream for a sip of water, when all that is needed is to get the position of the spaces/delimiters.
Here is a suggestion:
USE tempdb;
GO
DECLARE @LENGTH_LIMIT INT = 50;
DECLARE @DELIM SMALLINT = UNICODE(N' ');
SELECT
XX.ID
FROM
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY X.ID
ORDER BY (SELECT NULL)
) AS PRT_RID
,X.ID
FROM
(
SELECT
PS.ID
,LEAD(PS.N,1) OVER
(
PARTITION BY PS.ID
ORDER BY PS.ID, PS.N
) - PS.N AS ST_LEN
FROM
(
SELECT TA.ID,0 AS N,@DELIM AS CH_CODE
FROM dbo.A_TABLE TA
UNION ALL
SELECT
TA.ID
,NN.N
,UNICODE(SUBSTRING(TA.column1,NN.N,1)) AS CH_CODE
FROM dbo.A_TABLE TA
CROSS APPLY
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM
(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N1
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N2
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N3
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N4
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N5
) AS NN
WHERE NN.N <= LEN(TA.column1)
UNION ALL
SELECT TA.ID,LEN(TA.column1),@DELIM
FROM dbo.A_TABLE TA
) AS PS
WHERE PS.CH_CODE = @DELIM
) AS X
WHERE X.ST_LEN > @LENGTH_LIMIT
) AS XX
WHERE XX.PRT_RID = 1;
March 21, 2014 at 3:31 am
Many Thanks will give this a go.
Thanks for the help.
March 21, 2014 at 11:35 am
Eirikur Eiriksson (3/21/2014)
Just a quick thought;Splitting the string is somewhat like crossing the stream for a sip of water, when all that is needed is to get the position of the spaces/delimiters.
That seems valid, but you're scanning the table trice. Would that perform better than the splitter?
March 21, 2014 at 12:14 pm
I can't test this right now because I don't have a 2012 instance available, but it uses Eirikur's idea.
WITH e1(N) AS(
SELECT N FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))e(N)
),
e2(N) AS(
SELECT a.N FROM e1 a, e1 b
),
e4(N) AS(
SELECT a.N FROM e2 a, e2 b
),
Spaces AS(
SELECT ID, N
FROM #A_TABLE a
CROSS APPLY(
SELECT TOP(LEN(column1) + 2) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) - 1 N
FROM e4
) t
WHERE SUBSTRING( column1 + ' ', N, 1) = ' '
),
Lengths AS(
SELECT ID, LEAD(N,1) OVER(PARTITION BY ID ORDER BY N) - N AS StringLength
FROM Spaces
)
SELECT ID
FROM Lengths
WHERE StringLength > @Length
GROUP BY ID
March 21, 2014 at 2:27 pm
Luis Cazares (3/21/2014)
Eirikur Eiriksson (3/21/2014)
Just a quick thought;Splitting the string is somewhat like crossing the stream for a sip of water, when all that is needed is to get the position of the spaces/delimiters.
That seems valid, but you're scanning the table trice. Would that perform better than the splitter?
The reason is that this is the same work as the splitter does, minus the string manipulation. The three scans are a result of the union clauses, the first and the last scans only bring back the identity value.
I made a mistake in my earlier code as I forgot to limit the number sequence generation to the length of the string. Something that is more than 99 percent of the cost. A correct code is:
DECLARE @LENGTH_LIMIT INT = 60;
DECLARE @DELIM SMALLINT = UNICODE(N' ');
SELECT
XX.ID
FROM
(
SELECT
ROW_NUMBER() OVER
(
PARTITION BY X.ID
ORDER BY (SELECT NULL)
) AS PRT_RID
,X.ID
FROM
(
SELECT
PS.ID
,LEAD(PS.N,1) OVER
(
PARTITION BY PS.ID
ORDER BY PS.ID, PS.N
) - PS.N AS ST_LEN
FROM
(
SELECT TA.ID,0 AS N,@DELIM AS CH_CODE
FROM dbo.A_TABLE TA
UNION ALL
SELECT
TA.ID
,NN.N
,UNICODE(SUBSTRING(TA.column1,NN.N,1)) AS CH_CODE
FROM dbo.A_TABLE TA
CROSS APPLY
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
FROM
(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N1
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N2
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N3
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N4
,(SELECT NM.N FROM ( VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1) ) AS NM(N)) AS N5
ORDER BY (SELECT NULL) OFFSET 0 ROWS FETCH FIRST (LEN(TA.column1)) ROWS ONLY
) AS NN
WHERE NN.N <= LEN(TA.column1)
UNION ALL
SELECT TA.ID,LEN(TA.column1),@DELIM
FROM dbo.A_TABLE TA
) AS PS
WHERE PS.CH_CODE = @DELIM
) AS X
WHERE X.ST_LEN > @LENGTH_LIMIT
) AS XX
WHERE XX.PRT_RID = 1;
March 24, 2014 at 4:23 am
how about a recursive idea
WITH cte AS (
SELECT a.id, a.column1, CAST(0 AS BIGINT) Base, CHARINDEX(' ', a.column1, 0) AS MID
FROM a_table a
UNION ALL
SELECT c.id, c.column1, CHARINDEX(' ', a.column1, c.Mid) AS Base,
CASE ---- To pick up the last segment
WHEN CHARINDEX(' ', a.column1, c.Mid+1)>0 THEN CHARINDEX(' ', a.column1, c.Mid+1)
ELSE LEN(A.column1)
END AS MID
FROM a_table a
INNER JOIN cte c
ON c.id = a.ID
WHERE CHARINDEX(' ', a.column1, c.Mid+1)>=0 AND mid<LEN(a.column1) -- Tell the recursive part that it reach the end of the string
)
SELECT *
FROM cte
WHERE Mid - base >60
ORDER BY id
if you want it you can even get the word which length is greater than 60
Every rule in a world of bits and bytes, can be bend or eventually be broken
MyBlog About Common dialog control
A Visualizer for viewing SqlCommand object script [/url]
March 24, 2014 at 9:32 am
thava (3/24/2014)
how about a recursive idea
It's nice, but the DelimitedSplit8K version will work in half the time :-). You might not notice it with a few rows but it becomes clear with larger datasets.
I still need to test it against the 2012 options. But I don't have that version available.
March 24, 2014 at 11:04 pm
And after some testing on SQL Server 2012, the DelimitedSplit8K still is the fastest solution posted in this thread. However, I haven't tested the new version presented by Eirikur in his article: Reaping the benefits of the Window functions in T-SQL[/url]
I haven't read it in detail but when I do it I might come back 🙂
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply