March 19, 2009 at 2:13 am
So it's better to leave them like this, non-clustered, than put a clustered index on pk?
I repeat, they are mainly used for joins, so a clustered index on other columns is not necessary.
Though I understand two different things : from Grant I understand that is better to put a clustered index, this way non-clustered indexes have smth to rely on, and from Gila that is better to keep it this way...
As I said before , all my tables have guids as Pk and the join use mainly this columns.
I guess I will leave them this way....to maintain the logic...
Unfortunetly the database is prety slow , and I hoped that rebuilding the indexes an removing this heaps will be the best move.
Thanks again for your help
March 19, 2009 at 5:44 am
I don't think Gail and I are in disagreement here. You need a clustered index on these tables, however, a primary key of GUID is not the best choice, and is even a bad choice. So, either you use the ordered GUID, which is a little better, but not great, or you find another candidate for the cluster. Regardless, you really should get clustered indexes on the tables. SQL Server functionality in data retrieval is largely designed around the clustered index. Without them, you're hurting your system.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 19, 2009 at 6:07 am
shnex, you clearly want a solution that involves you in making minimal changes, which makes clustering the GUID an obvious choice for you.
Gail, Grant, can I ask you why you think a GUID is automatically a bad choice. I'm in complete agreement with you if the fill factor is set to 100%.
However if the fill factor is 90%, you should be able to increase the number of rows very effectively because the GUID is truly random (I assume) and therefore the inserts should spread across the pages in a very regular manner.
So if you increase the number of rows by 10% you should now expect each page to be about 99% full with little splitting because of the distribution.
(This all assumes that the row size allows a reasonable number of rows to be held in each page of course.)
.
March 19, 2009 at 6:25 am
I have several issues with GUID's as clustered indexes. First, they're very wide. This means fewer per page and more pages per index, making them somewhat less efficient. That's a minor issue. I've dealt with wider indexes before. The real issue is because the GUID is completely randomly distributed across the index, even with a lower fill factor, it leads to page splitting. That's an expensive operation in itself and, of course, it causes the index to become fragmented much quicker.
Now, if you use SEQUENTIALID() instead of NEWID(), it offsets alot of the issues, and reduces the random nature of the GUID, but the width is still something of an issue.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 19, 2009 at 6:33 am
Ok man...message received.I will search for other candidates on clustered index, and if I don't find I will put them on guids.That's the situation.Thank you...things are more clear now:-)
March 19, 2009 at 6:39 am
Grant Fritchey (3/19/2009)
I have several issues with GUID's as clustered indexes. First, they're very wide. This means fewer per page and more pages per index, making them somewhat less efficient. That's a minor issue. I've dealt with wider indexes before.
I can't argue with any of that.
Grant Fritchey (3/19/2009)
The real issue is because the GUID is completely randomly distributed across the index, even with a lower fill factor, it leads to page splitting. That's an expensive operation in itself and, of course, it causes the index to become fragmented much quicker.
This doesn't stack up theoretically for me. A truly random insertion would cause less splitting not more providing a sensible fill factor is used. It would also reduce page contention for locks on insert compared to any monotonically increasing key. The only way this would not be true is if NEWID() does not generate a truly random GUID.
Have you seen the fragmentation effect you describe in practice?
.
March 19, 2009 at 7:29 am
Oh yeah. We did a lot of tests to see if we could put GUID to work. It made developers lives easier, but it really seriously impacted the database. I don't have numbers any more, the tests were done a long time ago. I do know that SEQUENTIALID made a huge positive difference.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 19, 2009 at 8:49 am
Now I'm choosing my indexes, so I have another question: I have a bigint column with 90% different values, the duplicate values are rare, but the column is not very used(about 10% of the queries and joins). The fact that the values are most of the time different could be important in choosing the clustered index column? Should I combine the pk column that's a guid with this column and create a clustered index?
March 19, 2009 at 9:21 am
shnex (3/19/2009)
Now I'm choosing my indexes, so I have another question: I have a bigint column with 90% different values, the duplicate values are rare, but the column is not very used(about 10% of the queries and joins). The fact that the values are most of the time different could be important in choosing the clustered index column? Should I combine the pk column that's a guid with this column and create a clustered index?
I'm afraid it's very hard to answer specific questions without knowing what your schema is like.
I prefer a clustered primary key (which is by definition unique) or a UNIQUE clustered index. Usually you shouldn't choose a value which changes as this will physically move the row when you update it.
Although a clustered index improves performance when you are scanning based on the value in it (e.g. orders created between 11:00am and 11:30am) on many occasions they are just used to organise the data, so in a retail system Order Code might make a good one.
There is a hit if they are not declared as UNIQUE in terms of space required and performance as well as making the defragmentation processes take longer and do more logging.
So you are right to look for columns with lots of different values.
.
March 19, 2009 at 9:24 am
Grant Fritchey (3/19/2009)
Oh yeah. We did a lot of tests to see if we could put GUID to work. It made developers lives easier, but it really seriously impacted the database. I don't have numbers any more, the tests were done a long time ago. I do know that SEQUENTIALID made a huge positive difference.
OK. I still think my theory is correct. Sounds like NEWID() isn't as random as it might be.
.
March 19, 2009 at 9:52 am
So....
The table contains person information and it looks like this
column1 uniqueidentifier,
column 2 bigint,
column 3 varchar(30),
...
column 12 decimal(18,2)
from column 3 to 12 there are columns that contain data, int,decimal or char
column2 is a registration code, that could be the same for persons that are related(same family usualy)
column1 is reference for some fk and is used in joins.
there are some sp that use column2 as a filter, or in a join, but they are few.
So, my question was: puting the clustered index on column2(bigint) is beter than on column1(guid)?
March 19, 2009 at 9:55 am
Tim Walker (3/19/2009)
Gail, Grant, can I ask you why you think a GUID is automatically a bad choice.
Experience.
I've had a table with a guid cluster that could go from 0% fragmentation to 99.95% fragmentation in under 10 hours, and that was with a fill factor of 75%. I wasn't willing to drop the fill factor any lower because, as it was, the table took over 300 GB of space. (150 million rows). Performance degraded noticeably (on some rather important queries) once the fragmentation reached about 50%.
My criteria for clustered indexes - narrow, unchanging, unique, ever-increasing. Guids fail miserably on the last point. I don't mind them on slow-changing tables, but on tables with lots of inserts, they tend to be a disaster.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 19, 2009 at 9:58 am
shnex (3/19/2009)
So, my question was: puting the clustered index on column2(bigint) is beter than on column1(guid)?
Maybe. Can you give us the table's actual schema and some of the more common queries on it.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 19, 2009 at 10:07 am
GilaMonster (3/19/2009)
My criteria for clustered indexes - narrow, unchanging, unique, ever-increasing.
That's a good one Gail. Explains everything just in 4 words.:-D
March 19, 2009 at 10:13 am
Krishna (3/19/2009)
GilaMonster (3/19/2009)
My criteria for clustered indexes - narrow, unchanging, unique, ever-increasing.That's a good one Gail. Explains everything just in 4 words.:-D
I can't take credit. Kimberly Tripp[/url]'s been preaching that for years
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 15 posts - 46 through 60 (of 63 total)
You must be logged in to reply to this topic. Login to reply