January 15, 2020 at 6:52 pm
We often need to pull the maximum values for date fields. I thought we would need to index those fields, but without indexes the queries perform relatively well. A couple examples below; note that these are narrow tables, and not having to output all the rows certainly helps, but I'm still surprised at the performance.
10 million rows: < 1 second
300 million rows: 20 seconds
January 15, 2020 at 8:30 pm
So, is this just an observation or do you actually have a question here?
January 15, 2020 at 8:37 pm
Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.
January 15, 2020 at 8:51 pm
Sorry, I am curious as to how it can do this and what factors influence it; I would like to know when indexes are needed for these types of queries.
Are you able to share the actual execution plan (not just a graphic but the actual saved file). It's the only way we might be able to explain what's happening.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 15, 2020 at 10:08 pm
I suppose it depends on your table definitions and the query you are using to "pull maximum values of date fields". That could mean a number of different things.
If for example, you have an OrderStatus table with a row for each OrderID for each time the status changes, the speed will depend on how many rows per OrderID you have.
January 15, 2020 at 10:14 pm
looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.
January 15, 2020 at 10:45 pm
looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.
It also is parallel across 8 CPUs and took a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.
Adding a clustered index (doesn't appear this table has a clustered index) would allow for a clustered index scan and should reduce the query to quite a bit less - for even much larger tables.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
January 15, 2020 at 10:57 pm
Chris Harshman wrote:looking at your execution plan, 9 million rows at 15 bytes per row is not very large, so it was able to read the entire table and do the table scan.
It also is parallel across 8 CPUs and took a little over 1.2 seconds per thread and almost 3 seconds of total CPU time.
Adding a clustered index (doesn't appear this table has a clustered index) would allow for a clustered index scan and should reduce the query to quite a bit less - for even much larger tables.
Create the clustered index on that column, which reduces the I/O to a single seek, as fast as it gets.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 15, 2020 at 11:01 pm
Thanks everyone. I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different. When I do the 300 million row table I'll do a before and after test.
So the actual row size is about 185 bytes. I read that fixed length columns (such as this date) are stored at the beginning of the row however; does it only have to retrieve the first part of the row?
Edit -- posts crossed in the mail -- That column is not in the primary key. Sure, any index on that column would help, but it can't be the clustering key so I'm trying to see if it's really necessary. Doesn't appear to be on tables this size, we'll see about the larger one.
January 16, 2020 at 7:02 am
Thanks everyone. I have added a clustered primary key in our test environment but can't compare the performance as the configuration is totally different. When I do the 300 million row table I'll do a before and after test.
So the actual row size is about 185 bytes. I read that fixed length columns (such as this date) are stored at the beginning of the row however; does it only have to retrieve the first part of the row?
Edit -- posts crossed in the mail -- That column is not in the primary key. Sure, any index on that column would help, but it can't be the clustering key so I'm trying to see if it's really necessary. Doesn't appear to be on tables this size, we'll see about the larger one.
Clustered index doesn't have to be the primary key and sometimes it is better that way. Really depends on the access patterns of the tables.
January 16, 2020 at 3:38 pm
Adding to Lynn's post, I will say that having a UNIQUE Clustered Index is usually better because it gets used in all Non-Clustered indexes, as well.
To wit, I do have tables that I've assigned the Clustered Index to a DATETIME column but I've added the IDENTITY column as a secondary column in the definition of the Clustered Index as a "uniquifier/temporal tie breaker" to be able to force uniqueness.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2020 at 4:03 pm
Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.
But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 16, 2020 at 4:15 pm
That being said, I'd love to see the schema of the table and what the real use of finding the max date would be for this table.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 16, 2020 at 4:24 pm
Given that the table had no index at all, I didn't think in this case that making the clus index unique was necessary, particularly to the extent of adding the overhead of an identity column. SQL can very quickly seek to find the highest key value whether it's unique or not.
But, to help cut down on dup key values, make the underlying column datetime2 rather than datetime.
Also, for single row inserts, it will certainly "help" cut down down on dup key values but it will not eliminate them. Also, if you use something even like SYSDATETIME ( ), ALL of the rows inserted in a single batch are guaranteed to have exactly the same date and time.
If you need the clustered index to be unique, just having a single temporal column isn't going to do it... not even DATETIME2(7).
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 30 total)
You must be logged in to reply to this topic. Login to reply