Max function and indexes

Question

Post reply

Max function and indexes

Craig Smith

SSChasing Mays

Points: 600
More actions
January 15, 2020 at 6:52 pm

#3715763
We often need to pull the maximum values for date fields. I thought we would need to index those fields, but without indexes the queries perform relatively well. A couple examples below; note that these are narrow tables, and not having to output all the rows certainly helps, but I'm still surprised at the performance.
10 million rows: < 1 second
300 million rows: 20 seconds
- This topic was modified 4 years, 11 months ago by Craig Smith.

Viewing 15 posts - 1 through 15 (of 30 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

So, is this just an observation or do you actually have a question here?

Craig Smith SSChasing Mays Points: 600 More actions · Answer 2

Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.

Jeff Moden SSC Guru Points: 1004433 More actions · Answer 3

Craig Smith wrote:

Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.

Are you able to share the actual execution plan (not just a graphic but the actual saved file). It's the only way we might be able to explain what's happening.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Craig Smith SSChasing Mays Points: 600 More actions · Answer 4

Execution plan is attached.

Attachments:

You must be logged in to view attached files.

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 5

I suppose it depends on your table definitions and the query you are using to "pull maximum values of date fields". That could mean a number of different things.

If for example, you have an OrderStatus table with a row for each OrderID for each time the status changes, the speed will depend on how many rows per OrderID you have.

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 6

looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.

Jeffrey Williams SSC Guru Points: 90313 More actions · Answer 7

Chris Harshman wrote:

looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.

It also is parallel across 8 CPUs and took a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.

Adding a clustered index (doesn't appear this table has a clustered index) would allow for a clustered index scan and should reduce the query to quite a bit less - for even much larger tables.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

ScottPletcher SSC Guru Points: 101130 More actions · Answer 8

Jeffrey Williams wrote:

Chris Harshman wrote:
looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.
It also is parallel across 8 CPUs and took a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.
Adding a clustered index (doesn't appear this table has a clustered index) would allow for a clustered index scan and should reduce the query to quite a bit less - for even much larger tables.

Create the clustered index on that column, which reduces the I/O to a single seek, as fast as it gets.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Craig Smith SSChasing Mays Points: 600 More actions · Answer 9

Thanks everyone. I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different. When I do the 300 million row table I'll do a before and after test.

So the actual row size is about 185 bytes. I read that fixed length columns (such as this date) are stored at the beginning of the row however; does it only have to retrieve the first part of the row?

Edit -- posts crossed in the mail -- That column is not in the primary key. Sure, any index on that column would help, but it can't be the clustering key so I'm trying to see if it's really necessary. Doesn't appear to be on tables this size, we'll see about the larger one.

This reply was modified 4 years, 11 months ago by Craig Smith.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 10

Craig Smith wrote:

Thanks everyone. I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different. When I do the 300 million row table I'll do a before and after test.
So the actual row size is about 185 bytes. I read that fixed length columns (such as this date) are stored at the beginning of the row however; does it only have to retrieve the first part of the row?
Edit -- posts crossed in the mail -- That column is not in the primary key. Sure, any index on that column would help, but it can't be the clustering key so I'm trying to see if it's really necessary. Doesn't appear to be on tables this size, we'll see about the larger one.

Clustered index doesn't have to be the primary key and sometimes it is better that way. Really depends on the access patterns of the tables.

Jeff Moden SSC Guru Points: 1004433 More actions · Answer 11

Adding to Lynn's post, I will say that having a UNIQUE Clustered Index is usually better because it gets used in all Non-Clustered indexes, as well.

To wit, I do have tables that I've assigned the Clustered Index to a DATETIME column but I've added the IDENTITY column as a secondary column in the definition of the Clustered Index as a "uniquifier/temporal tie breaker" to be able to force uniqueness.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101130 More actions · Answer 12

Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.

But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004433 More actions · Answer 13

That being said, I'd love to see the schema of the table and what the real use of finding the max date would be for this table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004433 More actions · Answer 14

ScottPletcher wrote:

Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.
But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.

Also, for single row inserts, it will certainly "help" cut down down on dup key values but it will not eliminate them. Also, if you use something even like SYSDATETIME ( ), ALL of the rows inserted in a single batch are guaranteed to have exactly the same date and time.

If you need the clustered index to be unique, just having a single temporal column isn't going to do it... not even DATETIME2(7).

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)