Best way to handle like% on column too large for an index

  • We have a table to 100M rows and up until now we were fine with an non clustered index a varchar(4000) because we never went above 900 bytes (yes it is a bad design).

    We have the need to support international character sets now so the column was updated to nvarchar(4000) and now we have data past the 900 byte limit.

    The data is long, seems useless but is needed by the business and they need to be able to search "where bigcolumn like 'test%'". With an index, even with a huge amount of data, it was 'fast'. Now of course without an index it is unusable. The wildcard is always at the end of the search.

    I made a full text index on the column and basic queries such as:

    select * from ourtable where contains(bigcolumn, '\\AReallyLongStringofTextHere') works fine unless there is a space in the data. We loose thousands of returned rows because of spaces in the data.

    I have tried select * from ourtable where contains(bigcolumn, '"\\AReallyLongStringofTextHere that includes spaces"') but not all of the data is returned. I get 112 rows with the contains statement. The table scanning statement of "select * from ourtable where bigcolumn like '\\AReallyLongStringofTextHere that includes spaces%' returns 1939 rows.

    I understand that a full text index is breaking the long string up since it contains spaces. Is there a way to retain the entire string as 1 index entry or is there a way to fix my query to return all of the rows?

    Thanks

  • Since you're always searching at the start of the string, you could add a persisted computed column of ~900 bytes and search on that:

    ALTER TABLE ourtable

    ADD bigcolumn_search AS CAST(LEFT(bigcolumn, 450) AS nvarchar(450)) PERSISTED

    CREATE NONCLUSTERED INDEX ourtable__IX_bigcolumn_search ON ourtable ( bigcolumn_search ) WITH ( FILLFACTOR = 99 ) ON [PRIMARY] --or whatever filegroup it should go on

    SELECT *

    FROM outtable

    WHERE bigcolumn_search LIKE 'some long string of stuff%'

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • kevitra (9/18/2015)


    We have a table to 100M rows and up until now we were fine with an non clustered index a varchar(4000) because we never went above 900 bytes (yes it is a bad design).

    We have the need to support international character sets now so the column was updated to nvarchar(4000) and now we have data past the 900 byte limit.

    The data is long, seems useless but is needed by the business and they need to be able to search "where bigcolumn like 'test%'". With an index, even with a huge amount of data, it was 'fast'. Now of course without an index it is unusable. The wildcard is always at the end of the search.

    I made a full text index on the column and basic queries such as:

    select * from ourtable where contains(bigcolumn, '\\AReallyLongStringofTextHere') works fine unless there is a space in the data. We loose thousands of returned rows because of spaces in the data.

    I have tried select * from ourtable where contains(bigcolumn, '"\\AReallyLongStringofTextHere that includes spaces"') but not all of the data is returned. I get 112 rows with the contains statement. The table scanning statement of "select * from ourtable where bigcolumn like '\\AReallyLongStringofTextHere that includes spaces%' returns 1939 rows.

    I understand that a full text index is breaking the long string up since it contains spaces. Is there a way to retain the entire string as 1 index entry or is there a way to fix my query to return all of the rows?

    Thanks

    Here is what I am hearing. We always search for some text value at the beginning of the string, hence ... LIKE 'Test%' as an example. What this doesn't tell use is that you may also do a search like ... LIKE 'This is a test%'. To really give you a good answer, we really need to know a little more about what it is you are trying to accomplish.

  • ScottPletcher (9/18/2015)


    Since you're always searching at the start of the string, you could add a persisted computed column of ~900 bytes and search on that:

    ALTER TABLE ourtable

    ADD bigcolumn_search AS CAST(LEFT(bigcolumn, 450) AS nvarchar(450)) PERSISTED

    CREATE NONCLUSTERED INDEX ourtable__IX_bigcolumn_search ON ourtable ( bigcolumn_search ) WITH ( FILLFACTOR = 99 ) ON [PRIMARY] --or whatever filegroup it should go on

    SELECT *

    FROM outtable

    WHERE bigcolumn_search LIKE 'some long string of stuff%'

    With the PERSISTED index anything over 450 characters will be ignored, correct? We need 100% matching on the entire string. If a string is 700 characters most likely the search criteria will be 675 characters with a wildcard at the end. Matching on the first 450 would not be acceptable for our application.

    Thanks

  • Here is what I am hearing. We always search for some text value at the beginning of the string, hence ... LIKE 'Test%' as an example. What this doesn't tell use is that you may also do a search like ... LIKE 'This is a test%'. To really give you a good answer, we really need to know a little more about what it is you are trying to accomplish.

    The data is a dump of character data that does not need to be full text indexed like a book. It is always in a specific order and if someone searched for LIKE 'this is a test%' we would want 'this is a test%' and everything that matches. Not "this" or "is"... like a normal full text index.

    Usually the search string is almost all of the characters except the last word. You can imagine it like a file system c\windows\users\me. A search string could be c\windows\users% except > 450 characters long at times. The users are selecting the base text from a GUI and then wanting every child item that matches the string they selected. They rarely enter more search criteria.

  • kevitra (9/18/2015)


    With the PERSISTED index anything over 450 characters will be ignored, correct? We need 100% matching on the entire string. If a string is 700 characters most likely the search criteria will be 675 characters with a wildcard at the end. Matching on the first 450 would not be acceptable for our application.

    I would match on the first 450 characters (using the INDEX) to narrow the search, and then search on the, JOINed, table for an exact match to narrow the resultset further.

    Hopefully many/most of your searches are highly selective on the first 450 characters?

  • kevitra (9/18/2015)


    I made a full text index on the column and basic queries such as:

    select * from ourtable where contains(bigcolumn, '\\AReallyLongStringofTextHere') works fine unless there is a space in the data. We loose thousands of returned rows because of spaces in the data.

    I have tried select * from ourtable where contains(bigcolumn, '"\\AReallyLongStringofTextHere that includes spaces"') but not all of the data is returned. I get 112 rows with the contains statement. The table scanning statement of "select * from ourtable where bigcolumn like '\\AReallyLongStringofTextHere that includes spaces%' returns 1939 rows.

    I understand that a full text index is breaking the long string up since it contains spaces.

    Quick suggestion, enclose the search string in double quotes.

    😎

  • Quick suggestion, enclose the search string in double quotes.

    😎

    I do have double quotes around the CONTAINS but it is missing 1000 rows of data. I don't understand why this is happening.

    This is the SELECT with single and double quotes:

    SELECT * FROM ourtable WHERE CONTAINS(bigcolumn,'"aholebunchof text that sometimes has spacesandsometimes does not and needs to be found in order with all child records"')

  • Kristen-173977 (9/19/2015)


    kevitra (9/18/2015)


    With the PERSISTED index anything over 450 characters will be ignored, correct? We need 100% matching on the entire string. If a string is 700 characters most likely the search criteria will be 675 characters with a wildcard at the end. Matching on the first 450 would not be acceptable for our application.

    I would match on the first 450 characters (using the INDEX) to narrow the search, and then search on the, JOINed, table for an exact match to narrow the resultset further.

    Hopefully many/most of your searches are highly selective on the first 450 characters?

    Most of the time searches are well under 450 characters. Thanks for the suggestion.

  • If you don't require an nvarchar search, you can use varchar and get 900 chars to search.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply