How To Efficiently Index Using a VARCHAR Column?

Question

How To Efficiently Index Using a VARCHAR Column?

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 1

crainlee2 (9/17/2009)
Jeff, J-F, Ian, Matt,
I sincerely apologize. I'm not liable for anything I think, say, or do until after my 3rd cup of coffee in the morning. When I read Jeff's reply, I was on my second cup.
No, I did not notice the TOP keyword. Duh.
Jeff, I appreciate your response. I will give it a try, to experiment and assess results.
Thanks to everyone for their responses.
Sincerely,
LC

BWAA-HAA!!! Ohhh my... I've been there and done that. Not a problem and thanks for the apology. I learned a long time ago that the 3rd cup of coffee is the magic cup of coffee and I try to not even post until then. 😀

Lee, looking back at it, I can understand your original response. There are a lot of folks that still don't know the method of using a constrained Cartesian Product to produce all sorts of things such as Tally/Numbers tables, on the fly date tables, and test tables. If you were one of the folks that didn't know about it, I can see how you might have thought I was doing something devious especially since the verbal part of my response was so bloody short.

In case there is any doubt by anyone and for the record, even if I have a deep rooted hatred for an individual or a company, I will never ever tamper with their data or their server in any way... it would be a violation of personal ethics and principles that would destroy other people's trust in me both here and at work and it would most certainly cost me my MVP status.

Heh... Don't get me wrong... I'm human and I might pork chop someone real good and hard if they deserve it, but I'll never mess with their data.

Back to the problem on the original post. You asked what we would do if faced with a similar situation and, although my response was terribly short (I only had two cups of coffee :-P), the response was accurate... use the code I posted to build a million row test table and test various scenarios with different indexes. If required, modify the code to meet your needs and test some more.

The reason why I'm suggesting such testing is because you will get a lot of responses from this forum as to what to do. Some will be based on experience and some will be based on book-read opinion AND some of those based on experience may be different than your particular situation. Some will be flat out wrong. Even if someone gives you the absolute correct answer, the ONLY way you will know is if you test against some data... a LOT of data. The script I provided will enable you to do that.

The other thing is, if I were your boss and you told me what you were going to do (one way or another), the first thing out of my mouth would be... "Cool... did you test it?" The answer had better be "Yes" followed by a demonstrable proof in the form of code and data. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 2

Lynn,

I just tested Jeff's starter kit code, too. It ran in 3 seconds.

I did some testing of SELECT statements, with and without a non-clustered index on the UniqueVarChar column. The resulting times to execute were perceptably different but both were quick, so I changed his original query to create 100,000,000 rows. This required 1 minutes and 15 seconds.

Then, I picked out a row, captured part of the value in a uniqueIdentifier column, and searched for it using a LIKE clause. This required 8 seconds.

Then, I indexed the UniqueVarChar column (this required 3 minutes and 20 seconds). I cleared the system cache and reran the same search using the LIKE clause. The elapsed time was 13 seconds which surprised me.

I removed the LIKE clause, cleared the system cache, and reran the search query searching for the entire uniqueIdentifier value. The elapsed time was instantaneous.

I dropped the non-clustered index, cleared the system cache, and reran the search query again, searching for the entire uniqueIdentifier value. The elapsed time was 9 seconds.

So, I'm sold on the merits of indexing a varchar column. But, I am surprised and cannot explain why using a LIKE clause, an aggravating factor in a search, would run faster without an index than with one.

I've repeated these experiments several times with approximately the same results. Any idea why searching with a LIKE clause on an indexed table would be slower than searching with a LIKE clause on a non-indexed table?

LC

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 3

To the rest of you folks, I just want to say "Thanks" for backing me up. I've been under the weather and in bed for the last 18 hours or so and it's a wonderful thing to return to such kind words. You're the best.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 4

crainlee2 (9/17/2009)
Lynn,
I just tested Jeff's starter kit code, too. It ran in 3 seconds.
I did some testing of SELECT statements, with and without a non-clustered index on the UniqueVarChar column. The resulting times to execute were perceptably different but both were quick, so I changed his original query to create 100,000,000 rows. This required 1 minutes and 15 seconds.
Then, I picked out a row, captured part of the value in a uniqueIdentifier column, and searched for it using a LIKE clause. This required 8 seconds.
Then, I indexed the UniqueVarChar column (this required 3 minutes and 20 seconds). I cleared the system cache and reran the same search using the LIKE clause. The elapsed time was 13 seconds which surprised me.
I removed the LIKE clause, cleared the system cache, and reran the search query searching for the entire uniqueIdentifier value. The elapsed time was instantaneous.
I dropped the non-clustered index, cleared the system cache, and reran the search query again, searching for the entire uniqueIdentifier value. The elapsed time was 9 seconds.
So, I'm sold on the merits of indexing a varchar column. But, I am surprised and cannot explain why using a LIKE clause, an aggravating factor in a search, would run faster without an index than with one.
I've repeated these experiments several times with approximately the same results. Any idea why searching with a LIKE clause on an indexed table would be slower than searching with a LIKE clause on a non-indexed table?
LC

[font="Arial Black"]BE CAREFULL!!!!![/font] Look at the content of the test table you created when you set the starter kit to 100,000,000 rows... it probably only created a little over 16 million rows because sys.All_Columns only has a little over 4 thousand rows in it (on a new system, anyway).

On the LIKE thing... post your test code. It's the only way we can setup to do a similar test before we can answer your question.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Whitfield SSCrazy Eights Points: 8107 More actions · Answer 5

Jeff Moden (9/17/2009)
I've been under the weather and in bed for the last 18 hours or so...

That doesn't sound like enough time to recover from being under the weather - hot cocoa and bed for you!! 😀

No seriously, hope you feel better mate.

Atlantis Interactive - SQL Server Tools
My blog[/url]
Why I wrote a sql query analyzer clone

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 6

Jeff,

I'll post my test code when I get back to work tomorrow morning.

In the meantime, may I suggest a hot toddy: A small glass, a little lemon juice, some water, honey, a shot or 2 of good bourbon, stir, then heat it in the microwave until it's so hot you can barely sip it.

It may not make you well, but I promise you'll feel better and sleep well.

If you're taking Tylenol (acetaminophen), forget the bourbon. The combination is very dangerous to your liver.

Tomorrow,

Lee

Grant Fritchey SSC Guru Points: 398890 More actions · Answer 7

Jeff Moden (9/17/2009)
To the rest of you folks, I just want to say "Thanks" for backing me up. I've been under the weather and in bed for the last 18 hours or so and it's a wonderful thing to return to such kind words. You're the best.

Too much partying on vacation?

Good to have you back.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 8

Matt Whitfield (9/17/2009)
Jeff Moden (9/17/2009)
I've been under the weather and in bed for the last 18 hours or so...
That doesn't sound like enough time to recover from being under the weather - hot cocoa and bed for you!! 😀
No seriously, hope you feel better mate.

Hot scotch and some antihistimines should do it. 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 9

crainlee2 (9/17/2009)
Jeff,
I'll post my test code when I get back to work tomorrow morning.
In the meantime, may I suggest a hot toddy: A small glass, a little lemon juice, some water, honey, a shot or 2 of good bourbon, stir, then heat it in the microwave until it's so hot you can barely sip it.
It may not make you well, but I promise you'll feel better and sleep well.
If you're taking Tylenol (acetaminophen), forget the bourbon. The combination is very dangerous to your liver.
Tomorrow,
Lee

Heh... Toddies and I are old friends except I use scotch or a good whiskey.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 10

Grant Fritchey (9/17/2009)
Jeff Moden (9/17/2009)
To the rest of you folks, I just want to say "Thanks" for backing me up. I've been under the weather and in bed for the last 18 hours or so and it's a wonderful thing to return to such kind words. You're the best.
Too much partying on vacation?
Good to have you back.

Nah... I've just been staying up too late and it all caught up with me. It hit me like a tone of bricks. No fever, no runny nose... just massive sleep deprivation. Getting ready to cut some more logs right now...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 11

Jeff Moden (9/17/2009)
Matt Whitfield (9/17/2009)
Jeff Moden (9/17/2009)
I've been under the weather and in bed for the last 18 hours or so...
That doesn't sound like enough time to recover from being under the weather - hot cocoa and bed for you!! 😀
No seriously, hope you feel better mate.
Hot scotch and some antihistimines should do it. 😛

With a combo like that - be sure to say hi to the Pink Elephants that come by:)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004372 More actions · Answer 12

Matt Miller (#4) (9/17/2009)
With a combo like that - be sure to say hi to the Pink Elephants that come by:)

Ohhhhh... THAT's what those are. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

GSquared SSC Guru Points: 260824 More actions · Answer 13

crainlee2 (9/17/2009)
Any idea why searching with a LIKE clause on an indexed table would be slower than searching with a LIKE clause on a non-indexed table?
LC

Can you post both execution plans? That might help narrow down the issue.

"Like" and indexes sometimes get along well, and sometimes don't. A lot of it depends on the specifics of the query and the index, and on the stats involved.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 14

Okay, I finally finished my daily server audits and I've got some time to work on this.

Below is my test code. I alternately commented out or un-commented lines to test execution times and change execution conditions. I even added a clustered index this morning to see if it changed execution times. It did not.

The initial table population created 27,772,900 rows.

For my search values, I first created the table, then searched for the last row, and used the contents of the UniqueVarChar field as my search critiera. After that, I experimented with searches.

Search Test Cases:

------------------

1. Search on a FULL UniqueVarChar value with no index on the UniqueVarChar column. - 1 second

2. Search using LIKE on a partial UniqueVarChar value with no index on the UniqueVarChar column. - 8 seconds

3. Search using LIKE on a FULL UniqueVarChar value with no index on the UniqueVarChar column. - 2 seconds

4. Search on a FULL UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0 seconds

5. Search using LIKE on a partial UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 8 seconds

6. Search using LIKE on a FULL UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 2 seconds

Conclusions:

-----------

1. Indexing the UniqueVarChar column made almost no difference in search times (zero vs. 1 second).

2. Using LIKE with a longer search argument instead of a short search argument made a big difference in search times.

Question: Am I using the correct DBCC statement to clear the cache ("DBCC FREESYSTEMCACHE ('ALL')")?

LC

P.S. If I've got time today, I'll post the execution plans.

________________________

--SELECT TOP 100000000

-- IDENTITY(INT,1,1) AS UniqueInt,

-- CAST(NEWID() AS VARCHAR(36)) AS UniqueVarChar

-- INTO #TestTable

-- FROM Master.sys.All_Columns ac1

-- CROSS JOIN Master.sys.All_Columns ac2

--CREATE CLUSTERED INDEX I1 ON #TestTable (UniqueInt ASC)

--DROP INDEX I2 ON #TestTable

--CREATE NONCLUSTERED INDEX I2 ON #TestTable (UniqueVarChar ASC)

DBCC FREESYSTEMCACHE ('ALL')

USE tempdb

--SELECT TOP 1 * FROM #TestTable ORDER BY UniqueVarChar DESC -- Use this query to locate the last record in the table.

--SELECT * FROM #TestTable WHERE UniqueVarChar = 'FFFFFF28-AE73-4A74-A5ED-577B4822B319'

--SELECT * FROM #TestTable WHERE UniqueVarChar LIKE '%ED-577B4822B3%'

--SELECT * FROM #TestTable WHERE UniqueVarChar LIKE '%FFFFFF28-AE73-4A74-A5ED-577B4822B319%'

Gail Wanabee SSCrazy Eights Points: 8471 More actions · Answer 15

I decided to create a more aggravating set of test conditions to see if I could obtain more definitive results.

> I made the created records much larger, so that only one will fit on a page.

> I created 1,000,000 records.

> I experimented with CHAR vs. VARCHAR for the search datatype.

> I increased the length of the UniqueVarChar column to 50.

> I eliminated the clearing of the SQL Server cache before executing the searches.

> I did not create a clustered index.

Again, below is my test code. I alternately commented out or un-commented lines to test execution times and change execution conditions.

For my search values, I first created the table, then searched for the last row, then alternately used the contents of the UniqueChar and UniqueVarChar columns as my search critiera.

CHAR Search Test Cases:

------------------------

1. Search on a FULL UniqueVarChar value with no index on the UniqueVarChar column. - 2:12 minutes:seconds

2. Search using LIKE on a partial UniqueVarChar value with no index on the UniqueVarChar column. - 2:10 minutes:seconds

3. Search using LIKE on a FULL UniqueVarChar value with no index on the UniqueVarChar column. - 2:10 minutes:seconds

4. Search on a FULL UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0:00 minutes:seconds

5. Search using LIKE on a partial UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0:00 minutes:seconds

6. Search using LIKE on a FULL UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0:00 minutes:seconds

VARCHAR Search Test Cases:

------------------------

1. Search on a FULL UniqueVarChar value with no index on the UniqueVarChar column. - 2:10 minutes:seconds

2. Search using LIKE on a partial UniqueVarChar value with no index on the UniqueVarChar column. - 2:10 minutes:seconds

3. Search using LIKE on a FULL UniqueVarChar value with no index on the UniqueVarChar column. - 2:10 minutes:seconds

4. Search on a FULL UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0:00 minutes:seconds

5. Search using LIKE on a partial UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0:00 minutes:seconds

6. Search using LIKE on a FULL UniqueVarChar value with a non-clustered index on the UniqueVarChar column. - 0:00 minutes:seconds

Conclusions:

-----------

1. Indexing the UniqueVarChar column had a tremendous impact on improved search performance.

2. Not clearing the cache is a strong argument for as much RAM as a company can afford for its servers.

3. Using LIKE in a WHERE clause made no difference in search times.

4. There was no difference in search performance between searching on an indexed CHAR or indexed VARCHAR column.

LC

________________________

--SELECT TOP 1000000

-- IDENTITY(INT,1,1) AS UniqueInt,

-- CAST(NEWID() AS CHAR(50)) AS UniqueChar,

-- CAST(NEWID() AS VARCHAR(50)) AS UniqueVarChar,

-- REPLICATE('A',7500) AS Padding

-- INTO #TestTable

-- FROM Master.sys.All_Columns ac1

-- CROSS JOIN Master.sys.All_Columns ac2

--DROP INDEX I2 ON #TestTable

--CREATE NONCLUSTERED INDEX I2 ON #TestTable (UniqueVarChar ASC)

USE tempdb

--SELECT TOP 1 * FROM #TestTable ORDER BY UniqueInt DESC -- Use this query to locate the last record in the table.

--SELECT * FROM #TestTable WHERE UniqueVarChar = 'FC6AC668-45E2-473A-9CBD-2A9126CE536E'

--SELECT * FROM #TestTable WHERE UniqueVarChar LIKE '%BD-2A9126CE53%'

--SELECT * FROM #TestTable WHERE UniqueVarChar LIKE '%FC6AC668-45E2-473A-9CBD-2A9126CE536E%'