slow query on a partitioned table

Question

Post reply

slow query on a partitioned table

ghostman

Old Hand

Points: 327
More actions
July 25, 2020 at 3:01 pm

#3774729
I have a huge table with 3B of rows containing around 6 months of data running on Microsoft SQL server 2017 (RTM), which is partitioned day by day on a date column (each day on a separate filegroup and each filegroup has 1 data file).
this table has an identity field which is bigint.
I have two indexes : a clustered index on date and id a non clustered index on date
I'm trying to run the following query :
```
 select top 500000 * from table with(nolock) where id>@certain_id order by id
```
but the query is taking a lot of time. I tried to create an non clustered index on id field , but nothing changes!
the weird part I was able to run the same query with no issues and with fast response in the past. But due to some circumstances I had to format the server and re-attach the database containing the partitioned table, and I'm now having this issue.
any hint is much appreciated.
- This topic was modified 5 years, 7 months ago by ghostman.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110209 More actions · Answer 1

There is no point having a clustered index on date and id a non clustered index on date - they duplicate each other.

On the old version of the server you must have had a clustered index on id and date.

That's the only reason I see for making a query like you posted:

where id>@certain_id order by id

ID's are meaningless from the business point of view, and it was a workaround for bad indexing - clustering on ID instead of date.

Since your indexing is now almost right (you still need to have non-clustered PK on id instead on non-clustered index on date) you may query straight on date:

where date>@certain_date order by date

_____________
Code for TallyGenerator

Sergiy SSC Guru Points: 110209 More actions · Answer 2

And try to get rid of daily partitioning. With clustering on date it only adds to overheads without providing any advantage.

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 3

ghostman wrote:

I have a huge table with 3B of rows containing around 6 months of data running on Microsoft SQL server 2017 (RTM), which is partitioned day by day on a date column (each day on a separate filegroup and each filegroup has 1 data file).
this table has an identity field which is bigint.
I have two indexes : a clustered index on date and id a non clustered index on date
I'm trying to run the following query :
 select top 500000 * from table with(nolock) where id>@certain_id order by id
but the query is taking a lot of time. I tried to create an non clustered index on id field , but nothing changes!
the weird part I was able to run the same query with no issues and with fast response in the past. But due to some circumstances I had to format the server and re-attach the database containing the partitioned table, and I'm now having this issue.
any hint is much appreciated.

See the article at the second link in my signature line below on how to post performance problems. Without the things in that list, we'd only guessing, although Sergiy is definitely on the right track. Also, do you REALLY need all columns in the table to be returned?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Adi Cohn SSC-Addicted Points: 429 More actions · Answer 4

Unfortunately there is an old bug that for some reason Microsoft is not fixing. When using top clause or min or max aggregation functions on a partitioned table, SQL Server will do a table scan even if a good index exists. In some cases I managed to get around it by using a CTE or derived table that had a union query and each part of it queried a specific partition (by using the function $PARTITION). Then I ran the original query on the CTE (or the derived table). In other cases I had to remove the partitioning.

Adi

Sergiy SSC Guru Points: 110209 More actions · Answer 5

I don't think it's a bug.

When you misuse partitioning and create daily or weekly partitions you end up with hundreds or thousands separate indexes created for every partition. Overhead of seeking each of those indexes followed by building a table of the results in memory to perform an aggregation on that table is most likely exceeds the cost of simply scanning of all of the records in sequence.

_____________
Code for TallyGenerator

Adi Cohn SSC-Addicted Points: 429 More actions · Answer 6

Actually it is a bug. It doesn't depend on the number of partitions. I can have only 2 partitions and the optimizer will do a table scan instead of an index seek even if a good index exists

Adi

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

Adi Cohn wrote:

Unfortunately there is an old bug that for some reason Microsoft is not fixing. When using top clause or min or max aggregation functions on a partitioned table, SQL Server will do a table scan even if a good index exists. In some cases I managed to get around it by using a CTE or derived table that had a union query and each part of it queried a specific partition (by using the function $PARTITION). Then I ran the original query on the CTE (or the derived table). In other cases I had to remove the partitioning.
Adi

If you look at the criteria the OP is using and compare that to the indexes and the fact that he's trying to return 500,000 rows using SELECT *, it's not a bug or at least not in this case.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 8

ghostman wrote:

I have a huge table with 3B of rows containing around 6 months of data running on Microsoft SQL server 2017 (RTM), which is partitioned day by day on a date column (each day on a separate filegroup and each filegroup has 1 data file).
this table has an identity field which is bigint.
I have two indexes : a clustered index on date and id a non clustered index on date
I'm trying to run the following query :
 select top 500000 * from table with(nolock) where id>@certain_id order by id
but the query is taking a lot of time. I tried to create an non clustered index on id field , but nothing changes!
the weird part I was able to run the same query with no issues and with fast response in the past. But due to some circumstances I had to format the server and re-attach the database containing the partitioned table, and I'm now having this issue.
any hint is much appreciated.

There's no index to support the ORDER BY ID. That also means that the query is not able to even take advantage of Partition Elimination. Are you absolutely sure that THAT's the query you were running and that it wasn't actually based on a date before? And the non-clustered index on the ID column is probably not even coming into play (check the execution plan) because of the * in the SELECT. If the query was fast before, it may have somehow been doing a scan of the B-TREE based on the second column of your Clustered Index but I find that highly unlikely.

I have some difficulty believing the query you posted ever ran fast but, if it did, it may be that the data finally grew enough to reach "the tipping point" where SQL server finally said enough is enough.

The real key here, though, is that we're just guessing. See my first post on this thread to see what we need to help you solve this problem.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)