September 7, 2004 at 6:05 am
Does anyone know how I can search through a textfield in a table and return only the links in the textfield?
September 7, 2004 at 8:49 am
Aris --
By "textfield", I assume you mean a column of datatype TEXT? What is the maximum length of the data that will ever be inserted into the column?
Have you defined exactly what "links" mean to you? Will you support all valid URI formats? Is the data marked up (e.g. anchor tags, which would certainly help), or are you hoping to "automatically" figure out links like Word does?
Finally, are you in control of the data on the way in to the database? If so, I would recommend doing this parsing before insertion, and either A) Adding markup if none exists, or B) Storing "pointers" (substring locations) to the links in a separate table.
--
Adam Machanic
whoisactive
September 7, 2004 at 9:21 am
Yes, I am referring to a column of datatype text. The links I want to find will contain "http://" so that should make it somewhat easier.
The data is already stored in the database. I just need to go through and return the links that are in there. This sounds like something that may require a cursor, but I don't even know where to start to accomplish something like this.
September 7, 2004 at 9:27 am
Depending on your data, this may be doable without a cursor. Can you post some sample rows?
--
Adam Machanic
whoisactive
September 7, 2004 at 10:12 am
There wouldn't be anything special about any rows I might post. Basically, if you create a text field in the db and type in
"This is sample text with a link right here: http://www.google.com - This is more sample text and more sample text, etc., etc. and this is another link http://www.amazon.com "
September 7, 2004 at 10:31 am
You can use a sequence table for this. Here's a very simple example. I have not tested it thoroughly, so it may require some or a lot of tweaking ... Anyway, tell me if you have any questions about how this works:
use tempdb
go
select top 8000 identity(int, 1, 1) as number
into numbers
from pubs..authors a,
pubs..authors b,
pubs..authors c
go
create table testsearch(sometext text)
go
insert testsearch values ('This is sample text with a link right here: http://www.google.com - This is more sample text and more sample text, etc., etc. and this is another link http://www.amazon.com ')
go
select substring(testsearch.sometext,
numbers.number,
charindex(' ',
testsearch.sometext,
charindex('http://', testsearch.sometext, numbers.number)) - charindex('http://', testsearch.sometext, numbers.number))
from testsearch
join numbers on numbers.number = charindex('http://', testsearch.sometext, numbers.number)
--
Adam Machanic
whoisactive
September 7, 2004 at 1:11 pm
Thank you very much for trying to help me out, but to be honest I've never delved this far into TSQL. If you could possibly give me a quick tutorial on what's going on I would be most grateful.
September 7, 2004 at 1:17 pm
First, a table called 'numbers' is created, with every number from 1 - 8000. This is meant to be a one-time thing. You should also make the number a primary key. But I didn't do that, for brevity.
Next, we JOIN to the numbers table, based on the start position of the string 'HTTP://'. The numbers table is used to control the CHARINDEX and SUBSTRING functions, which take integers as input. So now we have the location of the start position of every 'HTTP://' (from our JOIN). I assumed that the end of the link could always be identified by a space (which, of course, may not be true in reality)... So using SUBSTRING, you ask for the text between the start of the 'HTTP://' and the next space after that 'HTTP://'.
It's a bit confusing the first time you see it, but try pulling apart the components and it should (I hope) begin to make more sense...
--
Adam Machanic
whoisactive
September 8, 2004 at 6:42 am
Two more questions if you would be so kind.
Let's say the field I want to search is called authorbio in the authors table. How would I tell it to insert this value rather than the hard-coded string below?
insert testsearch values ('This is sample text with a link right here: http://www.google.com - This is more sample text and more sample text, etc., etc. and this is another link http://www.amazon.com ')
go
This doesn't work - insert testsearch values (authorbio) go
lastly, why are you selecting from pubs..authors a, pubs..authors b, pubs..authors c? Why do you not just select from pubs..authors?
September 8, 2004 at 7:17 am
Aris,
You don't need to use the 'testsearch' table at all. It was just there for an example. Use your authors table itself:
select substring(authors.authorbio,
numbers.number,
charindex(' ',
authors.authorbio,
charindex('http://', authors.authorbio, numbers.number)) - charindex('http://', authors.authorbio, numbers.number))
from authors join numbers on numbers.number = charindex('http://', authors.authorbio, numbers.number)
I select from pubs..authors a, b, .. to create a cross-join. There are 23 rows in pubs..authors. We require a numbers table with 8000 characters, in order to search as much data as we can using the standard functions. By cross-joining three times, we produce 12167 rows, which can then be used for the TOP 8000 with the IDENTITY.
Make sure you add a clustered primary key to the number column of the numbers table. It will help with performance of this query.
--
Adam Machanic
whoisactive
September 8, 2004 at 8:23 am
That works great! Thank you so much! You rock!
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply