June 28, 2012 at 4:12 am
Situation.
Indexes for daily use:
Clustered index on the identity.
The main use during reading will be on this identity.
Extra index on the 'logical key'
This will be used a lot during the load of the data, for resolving relations, the identity is used afterwards.
This can be used afterwards for the reading of the data, but the main use wil be on identiy.
Actuel field:
Actual_indication, can hold two values: Y and N.
This field is only of importance during the load.
Only actual rows are loaded or updated.
Actual rows can change from Y to N during to load, a N never changes back to a Y.
After they change to N, the load does not change them anymore.
I am not familliar yet with the 'numbers' and timings. So do not know if performance issues will occure.
But I have a question.
During the load in the where clause there is a selection (or join) on the logical key and the row must be actual (Y).
What would be the effect of adding the actual field to the clustered index?
I thought about this. So here is some of my thoughts:
1. By NOT adding the actual field to the Extra index, this index does not need to be reorganised when the actual is changed from Y to N.
2. By adding the actual field to the clustered identity index, changing from Y to N does give a change in de the index, but the index does not need reorganising, because the identity is unique and therefore the change can be done in place.
3. By having the actual in the clustered index, although the Extra index has to be travelled through, it can use the selection on the actual field as wel, because it is in the clustered index. So the combined indexes wil result in the actuel rows only. (Is the true ?).
So my reasoning is that this is fairly effective, because although the index can change, even the clustered index, all updates of the index can be done IN PLACE. So we have the advantage of having the extra field in de index, and the advantage that we have only IN PLACE changes in the index. So updates of the actual field can be done efficiently.:w00t:
So this solutions looks effective to me, but this it work this way?
(Is there a difference between 2008/2005 with this construction?).
Thanks for your time and attention.
Ben Brugman
June 28, 2012 at 6:14 am
When doing updates to index keys, SQL does not do them as inplace updates, they will be converted into a delete/insert pair. This is why one of the guidelines for chosing a clustered index key is that the column are never updated.
I don't see any use in adding a second column to the clustered index. It makes it wider and won't be seekable as a second column unless you're filtering on the ID column as well. If you need that column as part of a nonclustered index, specify it as part of the nonclustered index.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 28, 2012 at 10:25 am
During the load in the where clause there is a selection (or join) on the logical key and the row must be actual (Y).
If the logical key nonclus index needs only the "Y" values, because all processes you expect to use this index require the value to be "Y", you should consider making it a filtered index and including only the "Y" values. That should increase the chance of the non-clus index being used.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 29, 2012 at 1:30 am
GilaMonster (6/28/2012)
When doing updates to index keys, SQL does not do them as inplace updates, they will be converted into a delete/insert pair. This is why one of the guidelines for chosing a clustered index key is that the column are never updated.I don't see any use in adding a second column to the clustered index. It makes it wider and won't be seekable as a second column unless you're filtering on the ID column as well. If you need that column as part of a nonclustered index, specify it as part of the nonclustered index.
Thank you, 'won't be seekable' does exactly answer my question.
My reasoning although technically possible won't happen, thank you for increasing my understanding of indexes.
Ben Brugman
ScottPletcher (6/28/2012)
If the logical key nonclus index needs only the "Y" values, because all processes you expect to use this index require the value to be "Y",.....
Only during the load of data the "Y" field is accessed (write/read/update). The size of it (char(1)) is not problem, so we have to test if loading is faster with or without the field.
As said it is not hugely important, but the anwsers do extend my knowledge of indexes, which is valuable, thanks.
Ben Brugman
June 29, 2012 at 2:04 am
You may want to take a read through these:
http://www.sqlservercentral.com/articles/Indexing/68439/
http://www.sqlservercentral.com/articles/Indexing/68563/
http://www.sqlservercentral.com/articles/Indexing/68636/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 29, 2012 at 6:23 am
GilaMonster (6/29/2012)
You may want to take a read through these:http://www.sqlservercentral.com/articles/Indexing/68439/
Thanks for providing the links.
I do understand indexes, both clustered and heap.
I do not always understand behavior of the optimizer combined with the indexes. Your anwser did provide new knowledge. I did check the above pages (quickly) and your anwser was an addition to these pages.
Technically it would be possible to use the information from the clustered key in addition to the extra key, while not using the first part of the clustered key. But you described that this will not happen in the current implementation of SQLserver.
And for some complex situations, for example using an IN construction in the where clause I do not know what the behavior is of the optimizer is. ClusteredKey in ('OneValue') or the situation ClusteredKey in ('value1','value2','value3').
Thanks for your anwsers,
Ben Brugman
June 29, 2012 at 6:35 am
ben.brugman (6/29/2012)
Technically it would be possible to use the information from the clustered key in addition to the extra key, while not using the first part of the clustered key.
Only as a scan, not a seek. That would be like looking someone up in the phone book by first name. Since the phone book is ordered by surname, you can't use the first name only unless you read the entire phone book and check every entry
And for some complex situations, for example using an IN construction in the where clause I do not know what the behavior is of the optimizer is. ClusteredKey in ('OneValue') or the situation ClusteredKey in ('value1','value2','value3').
Exactly the same as with an equality.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 29, 2012 at 6:47 am
GilaMonster (6/29/2012)
ben.brugman (6/29/2012)
Technically it would be possible to use the information from the clustered key in addition to the extra key, while not using the first part of the clustered key.Only as a scan, not a seek. That would be like looking someone up in the phone book by first name. Since the phone book is ordered by surname, you can't use the first name only unless you read the entire phone book and check every entry
And for some complex situations, for example using an IN construction in the where clause I do not know what the behavior is of the optimizer is. ClusteredKey in ('OneValue') or the situation ClusteredKey in ('value1','value2','value3').
Exactly the same as with an equality.
Has this changed?
Because in the OLDEN DAYS, an IN could result in bad performance. From memory I remember that doing selections with UNIONS between single value selections could deliver far better results than the IN construct. (A Union between OneValue, TwoValue and Threevalue, gave much better performance than the IN Construction).
This is from memory, could be from SQLserver 2000 or 2005, do not remember.
Ben
June 29, 2012 at 8:34 am
Some behaviour likely changed between SQL 6.5 and SQL 7. Nothing more recent than that though.
WHERE Column IN ('Value1') is completely equivalent to WHERE Column = 'Value1' and WHERE Column IN ('V1', 'V2','V3') is treated exactly like WHERE Column = 'V1' OR Column = 'V2' OR Column = 'V3' (and in fact is converted to the OR format during parsing)
Assuming there's an appropriate index, that'll be a seek that executes once for the first example, three times for the second.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 29, 2012 at 8:59 am
GilaMonster (6/29/2012)
Some behaviour likely changed between SQL 6.5 and SQL 7. Nothing more recent than that though.WHERE Column IN ('Value1') is completely equivalent to WHERE Column = 'Value1' and WHERE Column IN ('V1', 'V2','V3') is treated exactly like WHERE Column = 'V1' OR Column = 'V2' OR Column = 'V3' (and in fact is converted to the OR format during parsing)
Assuming there's an appropriate index, that'll be a seek that executes once for the first example, three times for the second.
Thank you,
I am sure to have seen the behaviour I described, but can not reproduce it.
My memory does not hold all information.
But I am fairly sure my memory is not making this up (yet).
Haven't been using 7 for years and 6.5 only for a very short time, so those two can not be the situation were I observed the behavior. (I tested the situation multiple times, with timing and resetting the cache at certain points. But as said my memory fails me with the details).
Up to your anwser I considered the IN or the OR construction as constructions to be avoided, now I have to re-evaluate that.
Thanks for educating me,
Ben Brugman
June 29, 2012 at 9:01 am
SQL 2000 often showed poor behaviour with this form of query:
WHERE Col1 = A OR Col2 = B OR Col3 = C
Ors with multiple columns. I don't recall any problems with IN or single-column ORs.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 29, 2012 at 11:20 am
FYI:
The query plans in 2000, 2005 and 2008R2 all appear to be the same regardless if I use IN (x,y,z) OR use or X or Y or Z.
Using UNIONs to accomplish the same is about a 30% increase in cost, as expected.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply