July 10, 2008 at 5:44 am
A question for the experts...
Does using an identity column as a surrogate key result in an imbalanced index?
e.g. 1001,1002,1003,1004 {...} appears to give poor selectivity,
so if you reverse the values to give 1001, 2001, 3001,4004{...}
does it improve selectivity and performance in 2005?
Regards
George.
July 10, 2008 at 6:44 am
From my own experience and everything I've read, no, the order doesn't change the selectivity. It will affect sorting speed. If you need to retreive the data in a particular order all the time, I'd make sure the index for that data is in that same order.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 10, 2008 at 8:39 am
Thanks,
I didn't think so , just some due dilligence.
I guess it'd screw range scans too.
Regards
George
July 11, 2008 at 9:43 am
g.brennan (7/10/2008)
A question for the experts...Does using an identity column as a surrogate key result in an imbalanced index?
e.g. 1001,1002,1003,1004 {...} appears to give poor selectivity,
so if you reverse the values to give 1001, 2001, 3001,4004{...}
does it improve selectivity and performance in 2005?
Regards
George.
Assuming you are using some form of integer as identity, you have a misconception about actual byte storage in the low-level workings of a computer. Recall that integer is 4 bytes, which break down to 32 bits of 1s and 0s, which the computer is VERY VERY efficient at processing. The numbers you present are just human representations.
Also note that selectivity of your two examples is exactly the same - 1/n - right?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 11, 2008 at 9:51 am
Since an identity column is, generally, unique per row, it already has as high a selectivity as is possible. How would converting it to a string, then reversing it, result in more selectivity?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2008 at 12:50 pm
Assuming you are using some form of integer as identity, you have a misconception about actual byte storage in the low-level workings of a computer. Recall that integer is 4 bytes, which break down to 32 bits of 1s and 0s, which the computer is VERY VERY efficient at processing. The numbers you present are just human representations.
Also note that selectivity of your two examples is exactly the same - 1/n - right?
No misconception on either count but selectivity is probably the wrong word, so much for trying to be brief...;).
A number will be processed into the tree based based on it's comparative value, if there are 1 million index values all begining with '1' then I was basically asking does the the index tree become imbalanced over time, requiring reindexing and would reversing the key assist.
The brief answer is no it wouldn't, but it would screw up index range scans so not a good idea.
George.
July 13, 2008 at 4:04 pm
I still don't understand why you think range scans would be "screwed up" if you have a million integer identity values from 1000000 to 1999999. Actually I am not really sure what the technical term "screwed up" is supposed to mean here either! 😉
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 14, 2008 at 1:10 am
TheSQLGuru (7/13/2008)
I still don't understand why you think range scans would be "screwed up" if you have a million integer identity values from 1000000 to 1999999. Actually I am not really sure what the technical term "screwed up" is supposed to mean here either! 😉
I'm assuming that with a reversed key, if you read an index page from disk the reversed values wouldn't be sequential e.g. key 1001 would is very unlikely to be on the same page as 1002. Whereas a normal index 1001,1002 etc would, with a high probability, be on the same page
July 14, 2008 at 3:56 am
reversing would not make any difference as you'll get just as many starting with 1, 2, 3, 4 etc. ( no I'm not that sad to sit down and work it out exactly ! ) as every 10 values the end digit will be repeated, as in 04, 14,24,34,44,45
Anyway as the index considers the whole number ( assuming it's an int) not just the leading digit the index will be highly selective. 1 million is fairly small all things considered, I've worked with several hundred million row tables with identity or sequential integer values without any issue.
It's an interesting thought btw.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
July 14, 2008 at 7:31 am
g.brennan (7/13/2008)
Assuming you are using some form of integer as identity, you have a misconception about actual byte storage in the low-level workings of a computer. Recall that integer is 4 bytes, which break down to 32 bits of 1s and 0s, which the computer is VERY VERY efficient at processing. The numbers you present are just human representations.
Also note that selectivity of your two examples is exactly the same - 1/n - right?
No misconception on either count but selectivity is probably the wrong word, so much for trying to be brief...;).
A number will be processed into the tree based based on it's comparative value, if there are 1 million index values all begining with '1' then I was basically asking does the the index tree become imbalanced over time, requiring reindexing and would reversing the key assist.
The brief answer is no it wouldn't, but it would screw up index range scans so not a good idea.
George.
You could achieve this by converting the number to a set of char(1) values, then indexing them in reverse order. Otherwise, no, it doesn't work that way for numeric indexes. Does for string data (well, sort of), but not for numbers.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 14, 2008 at 8:12 am
Keep in mind that you may want the "imbalance" you're talking about performance-wise. (The imbalance does happen somewhat, I just think you're not 100% on your description, since it doesn't deal with the decimal representation of the numbers, but the binary version)
A clustered index that would keep the tree fairly balanced at all times would be one where you'd have your inserts spread over the entire spectrum of the table, which means page splits. Page splits can be VERY costly if you don't keep them under control: having a sequential ID as your clustered key can definitely help with avoid that (since inserts will happen "at the end").
The B-Tree maintenance is a much lower cost in this case IMO.
Still - like Kevin pointed out, on a million rows, that really shouldn't be much of a factor.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply