March 19, 2015 at 2:37 pm
Hello,
I have several URLS in a table that look like this
This is the SQL I wrote
select * from dbo.URLTable where url not like '%[^a-zA-Z0-9]%.[a-z][a-z]%'
but couldn't catch the above cases.
I tried using SQL CLR function REGEXParse but was unsuccesful. Can somebody provide me hints?
Thanks
MR
March 19, 2015 at 3:06 pm
rash3554 (3/19/2015)
Hello,I have several URLS in a table that look like this
This is the SQL I wrote
select * from dbo.URLTable where url not like '%[^a-zA-Z0-9]%.[a-z][a-z]%'
but couldn't catch the above cases.
I tried using SQL CLR function REGEXParse but was unsuccesful. Can somebody provide me hints?
Thanks
MR
You will NEVER get this correct in t-sql. It is far too complicated. For example, there is absolutely nothing wrong with the first example you posted. If the base domain is com.com and there is a subdomain of g this is perfectly valid.
I would look into a decent RegEx online. For example look at this page dedicated to finding a regex that can do this 100%. https://mathiasbynens.be/demo/url-regex
In other words, you need to find yourself a regex that captures most of the invalid stuff and move on.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 19, 2015 at 3:14 pm
I'm sure you could write some kind of CLR function that would go ping your URL's, that'll tell you whether they're valid or not 😀
Trying to rely on regular expressions for this kind of thing is prone to either being too loose or too strict.
March 19, 2015 at 3:30 pm
I'd go with the ping option as a validation before inserting in the table.
Would this be a valid URL http://www.thisisaninvalidurl.com? It has the correct format but won't get you anywhere.
March 19, 2015 at 3:32 pm
ZZartin (3/19/2015)
I'm sure you could write some kind of CLR function that would go ping your URL's, that'll tell you whether they're valid or not 😀Trying to rely on regular expressions for this kind of thing is prone to either being too loose or too strict.
I built one for CLR a while ago as a proof of concept:
http://www.sqlservercentral.com/Forums/FindPost1382517.aspx
you can grab the text results and see oif there is an 404 error or if it raised an error or whatever
Lowell
March 19, 2015 at 3:39 pm
Luis Cazares (3/19/2015)
I'd go with the ping option as a validation before inserting in the table.Would this be a valid URL http://www.thisisaninvalidurl.com? It has the correct format but won't get you anywhere.
Using ping is probably more likely to return false negatives than it is to find invalid urls. for example you can't ping microsoft.com, but you can ping http://www.microsoft.com. Either of those entered into ANY browser are perfectly fine.
Also, some perfectly valid servers will not send a reply to a ping so those would also possibly indicate a false negative.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 19, 2015 at 3:40 pm
Lowell (3/19/2015)
ZZartin (3/19/2015)
I'm sure you could write some kind of CLR function that would go ping your URL's, that'll tell you whether they're valid or not 😀Trying to rely on regular expressions for this kind of thing is prone to either being too loose or too strict.
I built one for CLR a while ago as a proof of concept:
http://www.sqlservercentral.com/Forums/FindPost1382517.aspx
you can grab the text results and see oif there is an 404 error or if it raised an error or whatever
Excellent. I was remembering this thread but wasn't having any luck finding it. Awesome.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 19, 2015 at 4:09 pm
Sean Lange (3/19/2015)
Luis Cazares (3/19/2015)
I'd go with the ping option as a validation before inserting in the table.Would this be a valid URL http://www.thisisaninvalidurl.com? It has the correct format but won't get you anywhere.
Using ping is probably more likely to return false negatives than it is to find invalid urls. for example you can't ping microsoft.com, but you can ping http://www.microsoft.com. Either of those entered into ANY browser are perfectly fine.
Also, some perfectly valid servers will not send a reply to a ping so those would also possibly indicate a false negative.
Yeah ping might have been the wrong word, I meant anything that will verify that a url can be resolved for example ping will resolve microsoft.com and you can see that even though you don't get a reply.
And it looks like Lowell posted a solution that will work just fine for that.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply