Question about index

Question

Post reply

Question about index

runningdad

Valued Member

Points: 63
More actions
November 26, 2019 at 7:38 pm

#3701205

I was asked this question:
There is a table calls [Runner], which is a table wih 10 million rows, with 3 columns
[Runner].[UserName]
[Runner].[ZipCode]
[Runner].[Mileage]
There is an cluster index on the [Runner].[Mileage] column
Whats problem with the following query and how to improve it?
SELECT [UserName], [ZipCode] FROM [Runner]
WHERE FUNCAveMile([Mileage]) = 50
FUNCAveMile() is a certain function.
Thanks.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

jonas.gunnarsson 52434 Ten Centuries Points: 1240 More actions · Answer 1

The problem is the FUNCAveMile(), that is need to be run for every row in table. What does it do?

Grant Fritchey SSC Guru Points: 398913 More actions · Answer 2

There's no way to index this that will improve performance. So, you could modify the structure and store a computed column instead of the FUNCAveMile function, which you can index. Then you would see performance improvements.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Adi Cohn SSC-Addicted Points: 425 More actions · Answer 3

You can also create a persisted computed column for FUNCAveMile([Mileage]) and index that column. This way you won't even have to modify the query. Of course if you can just modify the code, there is a good chance that it will be better

Adi

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 4

all the guys are right on this. if you perform a function on every row in a where clause you will die very quickly - you can't use a function in a persisted computed column.. but you can sure as heck make some improvements by taking 90% of the function and persisting it

I love persisted computed columns - they save my systems so many times

MVDBA

runningdad Valued Member Points: 63 More actions · Answer 5

Thanks guys for the suggestion.

The other way I can think of is put that pre-comput column into a temp table (with the other columns).

Then I create an index for the temp table and query from it.

Grant Fritchey SSC Guru Points: 398913 More actions · Answer 6

You could. Depending on what you're doing, that's a lot of data movement

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 7

Grant Fritchey wrote:

You could. Depending on what you're doing, that's a lot of data movement

grant has hit the magic words - data movement

write once , read many - don't keep writing to a temp table every time you need to read - just sort it out at write time (once)

MVDBA

Jeff Moden SSC Guru Points: 1004436 More actions · Answer 8

This actually sounds like a pretty cool interview question for anyone that claims to have "years of performance tuning expertise" on their resume.

First, I'd have to explain how the code doesn't produce what I think they want as a correct answer to begin with because there is no correlation between what the function returns and the name of the runner is. We MUST know what the function is doing first. And that might actually be the root of the question, if it's an interview question.

If they were to say that it's by runner name, then ...

If no schema changes to the table or the clustered index were allowed, I guess I'd ask how many times new data would be added in comparison to how often the results of such a query would be needed. If the query isn't needed that often, then I'm thinking that the quick answer would be to make a single AVG() pass on the table using a GROUP BY and a HAVING.

If the usage is going to be a lot compared to the inserts, then creating an indexed view to pre-aggregate and materialize the results of an average calculation might be the way to go. If the latency of the data could be, say, 1 day, I'd might be tempted to setup a pre-aggregated table that would be updated once per day so that the indexed view wouldn't slow down any inserts or updates.

I'd have to check the function to know whether or not I could use Adi's idea of an indexed persisted computed column.

Another option might be to upgrade to SQL Server 2019 and see what happens. It would be interesting but I'd rather have the guarantee of an indexed view or pre-aggregated table.

No matter the method, though, I'd definitely explain why that function had to go.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 9

jeff - indexed views, are they enterprise edition only? -I've stopped tracking features per version, but standard edition is so much cheaper that our finance department shout at us when we use enterprise features. 🙁

perhaps the OP has standard?

Jeff has some very valid points, but my gut says computed column

MVDBA