Question about index

  • I was asked this question:

    There is a table calls [Runner], which is a table wih 10 million rows, with 3 columns

    [Runner].[UserName]

    [Runner].[ZipCode]

    [Runner].[Mileage]

    There is an cluster index on the [Runner].[Mileage] column

    Whats problem with the following query and how to improve it?

    SELECT [UserName], [ZipCode] FROM [Runner]

    WHERE FUNCAveMile([Mileage]) = 50

    FUNCAveMile() is a certain function.

    Thanks.

  • The problem is the FUNCAveMile(), that is need to be run for every row in table. What does it do?

  • There's no way to index this that will improve performance. So, you could modify the structure and store a computed column instead of the FUNCAveMile function, which you can index. Then you would see performance improvements.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • You can also create a persisted computed column for FUNCAveMile([Mileage]) and index that column.  This way you won't even have to modify the query.  Of course if you can just modify the code, there is a good chance that it will be better

    Adi

  • all the guys are right on this. if you perform a function on every row in a where clause you will die very quickly - you can't use a function in a persisted computed column.. but you can sure as heck make some improvements by taking  90% of the function and persisting it

    I love persisted computed columns - they save my systems so many times

    MVDBA

  • Thanks guys for the suggestion.

    The other way I can think of is put that pre-comput column into a temp table (with the other columns).

    Then I create an index for the temp table and query from it.

  • You could. Depending on what you're doing, that's a lot of data movement

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey wrote:

    You could. Depending on what you're doing, that's a lot of data movement

    grant has hit the magic words - data movement

    write once , read many - don't keep writing to a temp table every time you need to read - just sort it out at write time (once)

    MVDBA

  • This actually sounds like a pretty cool interview question for anyone that claims to have "years of performance tuning expertise" on their resume.

    First, I'd have to explain how the code doesn't produce what I think they want as a correct answer to begin with because there is no correlation between what the function returns and the name of the runner is.  We MUST know what the function is doing first.  And that might actually be the root of the question, if it's an interview question.

    If they were to say that it's by runner name, then ...

    If no schema changes to the table or the clustered index were allowed, I guess I'd ask how many times new data would be added in comparison to how often the results of such a query would be needed.  If the query isn't needed that often, then I'm thinking that the quick answer would be to make a single AVG() pass on the table using a GROUP BY and a HAVING.

    If the usage is going to be a lot compared to the inserts, then creating an indexed view to pre-aggregate and materialize the results of an average calculation might be the way to go.  If the latency of the data could be, say, 1 day, I'd might be tempted to setup a pre-aggregated table that would be updated once per day so that the indexed view wouldn't slow down any inserts or updates.

    I'd have to check the function to know whether or not I could use Adi's idea of an indexed persisted computed column.

    Another option might be to upgrade to SQL Server 2019 and see what happens.  It would be interesting but I'd rather have the guarantee of an indexed view or pre-aggregated table.

    No matter the method, though, I'd definitely explain why that function had to go.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jeff - indexed views, are they enterprise edition only? -I've stopped tracking features per version, but standard edition is so much cheaper that our finance department shout at us when we use enterprise features. 🙁

    perhaps the OP has standard?

    Jeff has some very valid points, but my gut says computed column

    MVDBA

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply