How To Efficiently Index Using a VARCHAR Column?

Question

Post reply

How To Efficiently Index Using a VARCHAR Column?

Gail Wanabee

SSCrazy Eights

Points: 8471
More actions
September 16, 2009 at 6:27 pm

#210932

I've inherited several databases designed by software developers. There are a lot of VARCHAR columns and a lot of non-indexed searches on them with poor response times. I've been asked to improve response times.
Efficient searches require indexing but my best knowledge says that it is very inefficient to index a VARCHAR column, either individually, or as part of a composite index, because (I have read) this condition causes index scans instead of seeks.
Is this true? If not, what is true about using a VARCHAR column in an index?
If you were faced with my situation, what would be the best course of action(s)?
LC

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004370 More actions · Answer 1

crainlee2 (9/16/2009)
If you were faced with my situation, what would be the best course of action(s)?

Test and find out for sure... here's a starter kit...

SELECT TOP 1000000

IDENTITY(INT,1,1) AS UniqueInt,

CAST(NEWID() AS VARCHAR(36)) AS UniqueVarChar

INTO #TestTable

FROM Master.sys.All_Columns ac1

CROSS JOIN Master.sys.All_Columns ac2

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 2

Jeff,

Before I use your suggestion to produce a 1M times 1M cartesian product that will consume all of memory, my CPU, and a hard drive, I'd appreciate an explanation of what your suggestion is designed to test.

If your suggestion is a DBA joke, I got it. Ha ha. Very funny. Isn't there another section of this forum for clowns?

Does anyone else have a serious comment?

LC

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 3

As a rule, you want indexes to be as narrow as possible and as selective as possible. So the best index is a unique integer value. However, the real world intrudes on that a bit.

Heck yes, index varchar. You just need to validate the selectivity of the index and balance the added cost of index maintenance for inserts/updates/deletes over the value of the index for selects (and, to an extent, updates/deletes). I don't know anyone dealing with real systems that doesn't index varchar. You just have to pick & choose where you do it carefully, but that shouldn't be any different than indexing an integer, datetime, binary, etc..

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

J-F Bergeron SSCrazy Eights Points: 8593 More actions · Answer 4

crainlee2 (9/17/2009)
Jeff,
Before I use your suggestion to produce a 1M times 1M cartesian product that will consume all of memory, my CPU, and a hard drive, I'd appreciate an explanation of what your suggestion is designed to test.
If your suggestion is a DBA joke, I got it. Ha ha. Very funny. Isn't there another section of this forum for clowns?
Does anyone else have a serious comment?
LC

First of all, this will NOT produce a 1 M times 1 M cartesian product. It will create a table of 1M rows that you can use to test, with a varchar of 36, that you can index and see its behavior.

Now, that looks pretty ironic that you insult one of the best posters on SSC saying he gave you a clown suggestion.

Maybe you did not understand what he was trying to help you with, then all you need to do is kindly ask questions.

People here are just trying to help, and I find it really immature that you answer that way when someone is trying to help you, and teach you indexing.

Now, I propose you create the table, and query using a where condition on the varchar column.

Then, add a nonclustered index on it, and do the same query, it will show you how fast it will now be.

Cheers,

J-F

steveb. SSC-Forever Points: 46733 More actions · Answer 5

Depending on how large the varchar columns are you may get quicker results using Full-Text Indexes on these columns

http://msdn.microsoft.com/en-us/library/ms345119(SQL.90).aspx"> http://msdn.microsoft.com/en-us/library/ms345119(SQL.90).aspx

Full-Text is efficient at searching a word from a large group of text, for example it can speed up LIKE '%string%' queries where a normal index may struggle.

If the text columns are not to large, for example a persons first name then i would be more inclined to use a traditional non-clustered index.

Like a lot of things in SQL it can really depend on your data and what results you are trying to achieve as Jeff suggested sometimes it can be better to try out the scenarios with some test data to find the optimal solution

soluiton

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 6

Thanks Grant and Ten. I'll simply move forward on indexing.

Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

Thanks to all,

LC

Ian Scarlett SSC-Insane Points: 23197 More actions · Answer 7

Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

It does, but did you spot this part of Jeff's query?

SELECT TOP 1000000

Matt Whitfield SSCrazy Eights Points: 8107 More actions · Answer 8

crainlee2 (9/17/2009)
Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.

Dude

Seriously. Listen to the Jeff. He knows. He has pulled out massively performant SQL from nowhere more times than I'd care / even be able to count.

Perhaps, if you can't spot a TOP clause, you might want to take a bit of humility before pointing people at BOL. 😀

Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone

GSquared SSC Guru Points: 260824 More actions · Answer 9

crainlee2 (9/17/2009)
Thanks Grant and Ten. I'll simply move forward on indexing.
Old Hand: Note that without a WHERE clause, a CROSS JOIN produces a cartesian product. If you don't believe me, look it up in Books Online.
Thanks to all,
LC

The whole point of that cross join is to produce a million row cartesian product. That way, you have a million rows to test with. Testing on a few thousand rows isn't generally considered adequate, because you can have solutions for a few thousand rows that simply don't scale well. Of course, if you can guarantee that the table will never have more than X rows, then scale may not matter, but there are very, very few cases where you can actually be absolutely sure that you'll never have the data grow.

Before you simply move forward on indexing, I recommend testing like crazy. Some index schemes can actually slow things down more than they speed them up, and there's no way to be certain without extensive testing.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 10

Yeah, I guess I didn't say it my post, it's fine to index a VARCHAR column, but you need to test, test, test. The same goes for any index on any data type.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 11

You know, I copied Jeff's code and ran it on my development server here at work. It ran for 6 seconds and generated a temp table with 1,000,000 rows.

Don't see where the joke is with this one.

I think someone owes Jeff a very sincere and LENGTHY apology.

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 12

Jeff, J-F, Ian, Matt,

I sincerely apologize. I'm not liable for anything I think, say, or do until after my 3rd cup of coffee in the morning. When I read Jeff's reply, I was on my second cup.

No, I did not notice the TOP keyword. Duh.

Jeff, I appreciate your response. I will give it a try, to experiment and assess results.

Thanks to everyone for their responses.

Sincerely,

LC

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 13

If you are searching VARCHAR columns for exact matches, an index on a checksum column (or computed checksum column, or a checksum in a view) can be a good solution.

These threads have fairly extensive discussions and examples on this subject.

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47302

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=66170

The advantage of this approach is that you can have fast integer indexes without the overhead of indexes of large VARCHAR columns.

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 14

Thanks for sharing the CHECKSUM links. I didn't know that function existed.

That's a pretty cool way of quickly narrowing searches on large chucks of character data.

LC