Optimizing query with LIKE

  • Probably we can't trick the nature, but anyhow. Is there any way (tips) to optimize query with LIKE, i.e. select id from table1 where url like '%strtosearch%' ? Sometimes this query is pretty fast but mainly... Unfortunately need to add % to both sides of searching string. Or the only decision is to use Full Text Search? Thanks

  • Full-text search uses like 'string%'. What do you need to search exactly?

  • The reason its slow is because by using a wildcard in the beginning of the string eliminates all chance of using an index to eliminate any candidate rows.

    Full text indexing is faster in some respects, but it takes additional work to implement, but i'm not sure havn't used it yet

  • Table I mentioned (pretty big ~15 million rows) has url column varchar(800). Query is looking for records (id) where this url contains some string (sometimes 15-20 characters). I know that wildcard from both side is a real pain but, unfortunately, can not get rid of them. There is nonclustered index on url field but I am not sure it helps in this query. Probably, we can not do a lot in this case. Thanks.

  • It probably doesn't apply in your case but if a clustered index is placed on the field that has the LIKE condition it works quite well because clustered indexes are good with range finds.

    If your table is fairly static and you need an additional clustered index (only one allowed per table) then consider creating a view and putting a clustered index on that.

    I would read up about indexed views first because there are some additional caveats on them such as the various options such as SET ANSI_NULLS etc.

  • Can you split the url into its parts.

    protocol = https://

    DomainName = http://www.SubDomain.DomainName

    Folders = new tables with all the folders

    FileName = Whatever.aspx

    parameters = new table or string with the parameters.

    That way you could index all the parts of the urls and do search on indexed columns (assuming the search can be modified to fit this design).

  • I can't call this table static as new records with new url are inserted pretty often. Thanks

  • Probably later we can think about it (problem is that string can be from domain, folder, file in your design approach). But currently I have what I have- whole url. Thanks

  • No offense but that data is pretty easy to split (assuming the url is in a valid url format). And then the search engine wouldn't be too hard to create, but I don't know the rest of the requirements so I have no clue how well this would work.

  • And you cannot have any other filter criteria in this query besides the url filter criteria ?  Can't you restrict based on another filter criteria that is on an indexed column, something like:

    select id

    from table1

    where url like '%strtosearch%'

    and mod_date_time > @start_date

    and mod_date_time < @end_date

    or maybe a status or some other column.

    That would prevent it from going through all the records...are you really interested in looking through all the 15 million+ rows ?  The only other solution is what Remi suggested i.e. break the URL into parts and store as computed columns that are then indexed to make the access faster.

  • Just a warning: url parsing can be a tricky business specially in shops that use url remapping

     


    * Noel

  • Care to elaborate?

  • Sure!

    There are many websites that for security reasons perform remaping of the urls depending to what pages you want to access and those modules are usually plugged in as http modules. It makes it harder for hackers to figure out the asp or php pages that are being called. Then if you come up with a logic to perform the "unmapping" you will also need to update your code every time the map function changes

     


    * Noel

  • I see .

  • Sorry guys, didn't see new posts. Anyhow, the actual query is a join:

    select t1.id

    from table1 t1

    inner join table2 t2 on t2.id = t1.contentid

    where t1.custid = @custid and t1.deleted = 'N' and t1.statusid != @newstatus

             and t2.url like @urlstr

    Table t2 has ~15 million row, clustered index on id; Table t1 has ~80,000 rows, nonclustered index on url. Just playing with indexes

    Thanks

     

     

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply