January 17, 2011 at 7:37 am
I often have to select the largest set of numbers from a table that has data like:
A 1
A 4
A 6
B 2
B 5
B 9
C 1
C 6
C 8
...
I want
A 6
B 9
C 8
...
The letter/number combinations are unique, and I have a non-clustered, ascending index covering the two fields, letter then number. Some letters have only a few numbers, some have around 20,000. Can I do any better by:
A. Adding an index on JUST the number field?
B. Specifying the number field in the index in descending order, so that the largest is found first? (I have little interest in the smallest number, but I need the largest quite frequently.)
Here is the query that retrieves what I need from this table:
SELECT EvidenceLetter, MAX(EvidenceNumber) AS MaxOfEvidenceNumber
FROM dbo.Podrobnosti
GROUP BY EvidenceLetter
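In case it helps, here is roughly what the current index and the two variants I'm asking about would look like (the index names are just illustrative):

-- Existing non-clustered index: letter then number, both ascending
CREATE NONCLUSTERED INDEX IX_Podrobnosti_Letter_Number
    ON dbo.Podrobnosti (EvidenceLetter ASC, EvidenceNumber ASC);

-- Option A: an index on just the number field
CREATE NONCLUSTERED INDEX IX_Podrobnosti_Number
    ON dbo.Podrobnosti (EvidenceNumber);

-- Option B: same key, but with the number descending so the largest comes first
CREATE NONCLUSTERED INDEX IX_Podrobnosti_Letter_NumberDesc
    ON dbo.Podrobnosti (EvidenceLetter ASC, EvidenceNumber DESC);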
January 17, 2011 at 7:59 am
A: No
B: Maybe, test and see. Doubt it will be a major difference.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 17, 2011 at 8:28 am
Tend to agree with Gail. You'll have to test.
Depending on how many unique values there are and the size of the table, you may or may not get better performance with the index. Not sure ascending or descending matters if you are pulling from a wide range.
January 17, 2011 at 9:07 am
GilaMonster (1/17/2011)
A: No. B: Maybe, test and see. Doubt it will be a major difference.
You were right. I tried both and almost no difference. Seems a little odd to me; I would have thought the reverse ordering should at least cut down on the index reads. Although, now that I consider it more carefully, the index, as constructed, provides no guarantee that the largest number will be hit early, even when the order is reversed. The index consists of both the letter and number, and that combination may be distributed with no regard for the grouping of letters.
I don't suppose there is any way to force the index to take account of some sort of grouping by the first field, is there?
January 17, 2011 at 10:49 am
pdanes2 (1/17/2011)
You were right. I tried both and almost no difference. Seems a little odd to me; I would have thought the reverse ordering should at least cut down on the index reads.
No, because it's going to scan the index regardless. SQL has no other way to do a MAX with a GROUP BY when there's no WHERE clause.
It can't seek for the max value for each EvidenceLetter because it doesn't know, without reading the index, what the different values of EvidenceLetter are, and even if it did, a seek for each one would likely be more expensive than a scan of the whole index.
If you have an index scan and stream aggregate (which you should) it's about as efficient as this is going to get.
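If you want to see that plan shape for yourself, something along these lines (standard SHOWPLAN options, nothing special) will print the estimated plan without running the query:

SET SHOWPLAN_TEXT ON;
GO
SELECT EvidenceLetter, MAX(EvidenceNumber) AS MaxOfEvidenceNumber
FROM dbo.Podrobnosti
GROUP BY EvidenceLetter;
GO
SET SHOWPLAN_TEXT OFF;
GO
-- Expect a Stream Aggregate fed by an Index Scan of the (EvidenceLetter, EvidenceNumber) index.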
I don't suppose there is any way to force the index to take account of some sort of grouping by the first field, is there?
Err, not sure what you're asking here. Indexes are sorted by the index keys
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 17, 2011 at 12:44 pm
GilaMonster (1/17/2011)
pdanes2 (1/17/2011)
Seems a little odd to me; I would have thought the reverse ordering should at least cut down on the index reads.
No, because it's going to scan the index regardless. SQL has no other way to do a MAX with a GROUP BY when there's no WHERE clause.
WHERE clause? You've lost me here. Do you mean when the query contains a WHERE clause, it would hunt through the index more efficiently? Even if so, I doubt that I would profit from repeated individual interrogation by letter over letting SQL Server figure it out on its own. You said yourself that a seek for each one would probably be more expensive than an index scan, so individual requests from me would certainly be more so.
If you have an index scan and stream aggregate (which you should) it's about as efficient as this is going to get.
I checked the query plan and it does show a stream aggregate element, cost 0%. I didn't do anything special to make that happen, so I imagine it's a default setting. Not sure what it means; I've just done a quick lookup and found several interesting tutorials, but haven't read them yet. I'm trying to get a few things done before I have to leave the building, so I'll read them at home tonight.
In any case, it's not so much about this one query; that already runs very quickly. It's more about trying to get a firm handle on principles while I have something very simple and understandable on which to experiment, so I won't be completely out to lunch when I get involved in troubleshooting something more complicated.
I don't suppose there is any way to force the index to take account of some sort of grouping by the first field, is there?
Err, not sure what you're asking here. Indexes are sorted by the index keys
I probably didn't explain it very well, and I doubt very much it's possible anyway, but...
What I meant was forcing the index to store the locations of letter-field changes, even if the index page was not full yet. It would see, when the letter changes from A to B, that it should store that new, highest (or lowest) number in the B category, rather than some number further down in the B range that's not an endpoint of the range, because that's what the indexing routine sees as the next place to mark. That would make certain that the endpoints (min or max, depending on ordering) were always at the topmost level of the index, guaranteeing that min or max queries would always be resolved at the top level of the index.
January 17, 2011 at 1:39 pm
pdanes2 (1/17/2011)
You've lost me here. Do you mean when the query contains a WHERE clause, it would hunt through the index more efficiently?
No, I'm saying that since there's no where clause, nothing limiting the data, you're going to always get a scan of some form. Index scan, clustered index scan or table scan
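For contrast, the moment you do limit the data, that same index can be used for a seek. Something like this (just a sketch against your table) can go straight to the end of the 'A' range instead of touching the whole index:

SELECT MAX(EvidenceNumber) AS MaxForA
FROM dbo.Podrobnosti
WHERE EvidenceLetter = 'A';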
I checked the query plan and it does show a stream aggregate element, cost 0%. I didn't do anything special to make that happen, so I imagine it's a default setting.
You did do something special. You added an index that supports the grouping. More advanced indexing than basics, but still...
In any case, it's not so much about this one query; that already runs very quickly. It's more about trying to get a firm handle on principles while I have something very simple and understandable on which to experiment, so I won't be completely out to lunch when I get involved in troubleshooting something more complicated.
You asked...
http://www.sqlservercentral.com/articles/Indexing/68439/
http://www.sqlservercentral.com/articles/Indexing/68563/
http://www.sqlservercentral.com/articles/Indexing/68636/
http://sqlinthewild.co.za/index.php/category/sql-server/indexes/ :hehe:
(Yes, all my writing, so if you want an alternate opinion be sure to search around. I don't have other links handy)
What I meant was forcing the index to store the locations of letter-field changes, even if the index page was not full yet. It would see, when the letter changes from A to B, that it should store that new, highest (or lowest) number in the B category, rather than some number further down in the B range that's not an endpoint of the range, because that's what the indexing routine sees as the next place to mark.
I think you may be under a misconception as to how indexes work. With your index on EvidenceLetter, EvidenceNumber, ALL rows of the table are reflected in the index.
To get what you're asking for you need another table with something to keep it in sync, or an indexed view. Either way, specialised techniques, not general options.
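To give you an idea of the synced-table route, here's a minimal sketch. The names and column types are all just assumptions, it only covers inserts, and a real version would also have to handle updates and deletes:

CREATE TABLE dbo.PodrobnostiMax
(
    EvidenceLetter varchar(5) NOT NULL PRIMARY KEY,   -- assumed type
    MaxEvidenceNumber int NOT NULL                     -- assumed type
);
GO
CREATE TRIGGER trg_Podrobnosti_MaxSync
ON dbo.Podrobnosti
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- Keep the per-letter maximum current as new rows arrive
    MERGE dbo.PodrobnostiMax AS tgt
    USING (SELECT EvidenceLetter, MAX(EvidenceNumber) AS MaxEvidenceNumber
           FROM inserted
           GROUP BY EvidenceLetter) AS src
        ON tgt.EvidenceLetter = src.EvidenceLetter
    WHEN MATCHED AND src.MaxEvidenceNumber > tgt.MaxEvidenceNumber THEN
        UPDATE SET MaxEvidenceNumber = src.MaxEvidenceNumber
    WHEN NOT MATCHED THEN
        INSERT (EvidenceLetter, MaxEvidenceNumber)
        VALUES (src.EvidenceLetter, src.MaxEvidenceNumber);
END;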
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 17, 2011 at 5:40 pm
pdanes2 (1/17/2011)
You've lost me here. Do you mean when the query contains a WHERE clause, it would hunt through the index more efficiently?
No, I'm saying that since there's no where clause, nothing limiting the data, you're going to always get a scan of some form. Index scan, clustered index scan or table scan
Oh, I see. I sort of thought that the MAX function and Group By clause would be enough hints to act similarly to a WHERE clause, but maybe that's unrealistic.
I do have another table with just the letters - I wonder what would happen if I did a join to the letter table and then asked for the max number of each letter set. Maybe, instead of a scan, it would look up each max number directly based on the small number of entries in the letter table...? I'll give it a try tomorrow and see what happens.
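Something like this is what I have in mind (the letters table name is just a placeholder here):

SELECT l.EvidenceLetter,
       m.MaxOfEvidenceNumber
FROM dbo.EvidenceLetters AS l                          -- placeholder name for the letters-only table
CROSS APPLY (SELECT MAX(p.EvidenceNumber) AS MaxOfEvidenceNumber
             FROM dbo.Podrobnosti AS p
             WHERE p.EvidenceLetter = l.EvidenceLetter) AS m;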
You asked...
I did indeed ask and thank you for the links. I've already spent a good bit of time poking around your site, but sometimes it's difficult to even know which questions to ask - it never hurts to have the author point out which specific articles may apply to a particular problem.
What I meant was forcing the index to store the locations of letter-field changes, even if the index page was not full yet. It would see, when the letter changes from A to B, that it should store that new, highest (or lowest) number in the B category, rather than some number further down in the B range that's not an endpoint of the range, because that's what the indexing routine sees as the next place to mark.
I think you may be under a misconception as to how indexes work. With your index on EvidenceLetter, EvidenceNumber, ALL rows of the table are reflected in the index.
That's certainly possible and it wouldn't be the first such, but I don't think I'm too far off the mark here. I do have some experience with databases, just not SQL Server.
In all the engines I've ever worked with, the index is not just a simple ordered list of key values, but a multi-layered structure, where each node in a layer is an entry pointer to some group of nodes in the layer below. Layers are built to as many levels as necessary until the top node is some minimum number of entries - sometimes just one entry, sometimes one 'page', one 'cluster', one 'bucket' or whatever the designers decide to call it. This allows a query to examine a layer and, from the info therein, jump immediately down to the correct point in the next lower layer of the index instead of scanning the entire dataset. Certainly SQL Server has some such mechanism, even if the details are different from other implementations - that's pretty much the whole point of having an index.
What I was wondering was whether there is a way to force specific entries (the biggest in the number field, in this case) to be present in the uppermost layer, and thereby be automatically read in during the first step of examining the index, rather than wherever they happen to land based on fill factor, cluster size and who knows what all other factors.
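From what I've read, SQL Server will even show those layers if you ask; something along these lines (the index name is just a guess at what mine is called):

SELECT ips.index_level, ips.page_count, ips.record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Podrobnosti'), NULL, NULL, 'DETAILED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id
WHERE i.name = N'IX_Podrobnosti_Letter_Number'         -- guessed index name
ORDER BY ips.index_level DESC;                         -- root level first, leaf level (0) last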
I don't really expect that it can be done, it's a pretty esoteric property and I've never heard of an engine that gave users that much control over building indexes, but software gets better all the time and it doesn't hurt to ask.
To get what you're asking for you need another table with something to keep it in sync, or an indexed view. Either way, specialised techniques, not general options.
I do have some indexed views in place, designed according to posts and articles by you and others, but I'm still not very good at reading query plans, so I'm not always sure if what I've done has helped or not. But overall, my app is constantly running faster and crashing less, and I'm getting a better feel for what I'm doing, so there is progress.
Thanks for the explanations.
January 17, 2011 at 10:51 pm
pdanes2 (1/17/2011)
I do have another table with just the letters - I wonder what would happen if I did a join to the letter table and then asked for the max number of each letter set. Maybe, instead of a scan, it would look up each max number directly based on the small number of entries in the letter table...? I'll give it a try tomorrow and see what happens.
It won't be more efficient. By adding the second table you'll add a second index scan and a join to the execution plan. Seeks are not always the most efficient way to execute a query. Reading the entire table requires reading the entire table, so if you managed to get seeks, you'd have the same or, more likely, a higher page read count, and probably higher CPU.
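If you want to see the numbers for yourself, run both versions with statistics on and compare what comes back on the messages tab (the letters table name is whatever yours actually is):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Grouped MAX against the detail table only
SELECT EvidenceLetter, MAX(EvidenceNumber) AS MaxOfEvidenceNumber
FROM dbo.Podrobnosti
GROUP BY EvidenceLetter;

-- Seek-per-letter variant driven off the letters table
SELECT l.EvidenceLetter,
       (SELECT MAX(p.EvidenceNumber)
        FROM dbo.Podrobnosti AS p
        WHERE p.EvidenceLetter = l.EvidenceLetter) AS MaxOfEvidenceNumber
FROM dbo.EvidenceLetters AS l;                         -- placeholder name again

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- Compare the logical reads and CPU time reported for each statement.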
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 18, 2011 at 9:10 am
Don't forget that for small values, like those you have with letters, a single page can potentially hold thousands of entries. Since SQL Server reads entire pages, a scan of the index will be more efficient than a seek; you just grab the entire index in a page or two.
So while it seems on a small scale that surfacing more information in a second table or bringing it up in the index tree is more efficient, it's not. The scale doesn't go down that low.