How To Efficiently Index Using a VARCHAR Column?

  • I've inherited several databases designed by software developers. There are a lot of VARCHAR columns and a lot of non-indexed searches on them with poor response times. I've been asked to improve response times.

    Efficient searches require indexing but my best knowledge says that it is very inefficient to index a VARCHAR column, either individually, or as part of a composite index, because (I have read) this condition causes index scans instead of seeks.

    Is this true? If not, what is true about using a VARCHAR column in an index?

    If you were faced with my situation, what would be the best course of action(s)?

    LC

  • crainlee2 (9/16/2009)


    If you were faced with my situation, what would be the best course of action(s)?

    Test and find out for sure... here's a starter kit...

    SELECT TOP 1000000

    IDENTITY(INT,1,1) AS UniqueInt,

    CAST(NEWID() AS VARCHAR(36)) AS UniqueVarChar

    INTO #TestTable

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Before I use your suggestion to produce a 1M times 1M cartesian product that will consume all of memory, my CPU, and a hard drive, I'd appreciate an explanation of what your suggestion is designed to test.

    If your suggestion is a DBA joke, I got it. Ha ha. Very funny. Isn't there another section of this forum for clowns?

    Does anyone else have a serious comment?

    LC

  • As a rule, you want indexes to be as narrow as possible and as selective as possible. So the best index is a unique integer value. However, the real world intrudes on that a bit.

    Heck yes, index varchar. You just need to validate the selectivity of the index and balance the added cost of index maintenance for inserts/updates/deletes over the value of the index for selects (and, to an extent, updates/deletes). I don't know anyone dealing with real systems that doesn't index varchar. You just have to pick & choose where you do it carefully, but that shouldn't be any different than indexing an integer, datetime, binary, etc..

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • crainlee2 (9/17/2009)


    Jeff,

    Before I use your suggestion to produce a 1M times 1M cartesian product that will consume all of memory, my CPU, and a hard drive, I'd appreciate an explanation of what your suggestion is designed to test.

    If your suggestion is a DBA joke, I got it. Ha ha. Very funny. Isn't there another section of this forum for clowns?

    Does anyone else have a serious comment?

    LC

    First of all, this will NOT produce a 1 M times 1 M cartesian product. It will create a table of 1M rows that you can use to test, with a varchar of 36, that you can index and see its behavior.

    Now, that looks pretty ironic that you insult one of the best posters on SSC saying he gave you a clown suggestion.

    Maybe you did not understand what he was trying to help you with, then all you need to do is kindly ask questions.

    People here are just trying to help, and I find it really immature that you answer that way when someone is trying to help you, and teach you indexing.

    Now, I propose you create the table, and query using a where condition on the varchar column.

    Then, add a nonclustered index on it, and do the same query, it will show you how fast it will now be.

    Cheers,

    J-F

  • Depending on how large the varchar columns are you may get quicker results using Full-Text Indexes on these columns

    http://msdn.microsoft.com/en-us/library/ms345119(SQL.90).aspx"> http://msdn.microsoft.com/en-us/library/ms345119(SQL.90).aspx

    Full-Text is efficient at searching a word from a large group of text, for example it can speed up LIKE '%string%' queries where a normal index may struggle.

    If the text columns are not to large, for example a persons first name then i would be more inclined to use a traditional non-clustered index.

    Like a lot of things in SQL it can really depend on your data and what results you are trying to achieve as Jeff suggested sometimes it can be better to try out the scenarios with some test data to find the optimal solution

    soluiton

  • Thanks Grant and Ten. I'll simply move forward on indexing.

    Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

    Thanks to all,

    LC

  • Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

    It does, but did you spot this part of Jeff's query?

    SELECT TOP 1000000

  • crainlee2 (9/17/2009)


    Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

    Dude

    Seriously. Listen to the Jeff. He knows. He has pulled out massively performant SQL from nowhere more times than I'd care / even be able to count.

    Perhaps, if you can't spot a TOP clause, you might want to take a bit of humility before pointing people at BOL. 😀

    Atlantis Interactive - SQL Server Tools
    My blog[/url]
    Why I wrote a sql query analyzer clone

  • crainlee2 (9/17/2009)


    Thanks Grant and Ten. I'll simply move forward on indexing.

    Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

    Thanks to all,

    LC

    The whole point of that cross join is to produce a million row cartesian product. That way, you have a million rows to test with. Testing on a few thousand rows isn't generally considered adequate, because you can have solutions for a few thousand rows that simply don't scale well. Of course, if you can guarantee that the table will never have more than X rows, then scale may not matter, but there are very, very few cases where you can actually be absolutely sure that you'll never have the data grow.

    Before you simply move forward on indexing, I recommend testing like crazy. Some index schemes can actually slow things down more than they speed them up, and there's no way to be certain without extensive testing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Yeah, I guess I didn't say it my post, it's fine to index a VARCHAR column, but you need to test, test, test. The same goes for any index on any data type.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • You know, I copied Jeff's code and ran it on my development server here at work. It ran for 6 seconds and generated a temp table with 1,000,000 rows.

    Don't see where the joke is with this one.

    I think someone owes Jeff a very sincere and LENGTHY apology.

  • Jeff, J-F, Ian, Matt,

    I sincerely apologize. I'm not liable for anything I think, say, or do until after my 3rd cup of coffee in the morning. When I read Jeff's reply, I was on my second cup.

    No, I did not notice the TOP keyword. Duh.

    Jeff, I appreciate your response. I will give it a try, to experiment and assess results.

    Thanks to everyone for their responses.

    Sincerely,

    LC

  • If you are searching VARCHAR columns for exact matches, an index on a checksum column (or computed checksum column, or a checksum in a view) can be a good solution.

    These threads have fairly extensive discussions and examples on this subject.

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47302

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=66170

    The advantage of this approach is that you can have fast integer indexes without the overhead of indexes of large VARCHAR columns.

  • Thanks for sharing the CHECKSUM links. I didn't know that function existed.

    That's a pretty cool way of quickly narrowing searches on large chucks of character data.

    LC

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply