Improving Performance of Cross-Partition Queries

Question

Post reply

Improving Performance of Cross-Partition Queries

Solomon Rutzky

SSCoach

Points: 16431
More actions
May 3, 2010 at 10:28 pm

#221158

Comments posted to this topic are about the item Improving Performance of Cross-Partition Queries
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Viewing 15 posts - 1 through 15 (of 41 total)

You must be logged in to reply to this topic. Login to reply

Sander A. Default port Points: 1440 More actions · Answer 1

Have you tried changing the order of the clustered index. You say you did try this index:

(IntDate ASC, CustomerID ASC, PartitionID ASC) ON[PartitionedTables](PartitionID)

But have you tried putting the PartitionID fist? :

(PartitionID ASC, IntDate ASC, CustomerID ASC) ON[PartitionedTables](PartitionID)

I'm very curious what happens when you try this, so please post the result....

Paul White SSC Guru Points: 150468 More actions · Answer 2

For SQL Server magazine subscribers, Itzik Ben-Gan has an article on exactly this issue at:

http://www.sqlmag.com/article/microsoft-products/max-and-min-aggregates-against-partitioned-tables.aspx

That was published March 16 2010.

Anyway, about this article. I'm afraid I did not rate it highly for many reasons. One shortcoming is the lack of easily consumable sample code either in-line with the text or as a download.

It would also have been good to mention that a non-partitioned index on a partitioned table does not suffer from this optimiser limitation:

-- Everyone has one of these, feel free to use another test database

USE tempdb;

GO

-- Example partition function and scheme

CREATE PARTITION FUNCTION PF(TINYINT) AS RANGE RIGHT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]);

GO

-- Test table (partitioned heap)

CREATE TABLE dbo.ExampleTable

(

[ExampleTableID] [bigint] NOT NULL IDENTITY(1, 1),

[CustomerID] [int] NOT NULL,

[PartitionID] [tinyint] NOT NULL,

[IntDate] [int] NOT NULL,

)

ON PS (PartitionID);

GO

-- Partitioned clustered index

-- (note: clustering key should always be unique)

CREATE CLUSTERED INDEX [PCX dbo.ExampleTable IntDate, CustomerID]

ON dbo.ExampleTable

(IntDate, CustomerID)

ON PS (PartitionID);

GO

-- Add 100,000 randomish rows

INSERT dbo.ExampleTable WITH (TABLOCK)

(CustomerID, PartitionID, IntDate)

SELECT TOP (100000)

RAND(CHECKSUM(NEWID())) * 1000 + 1,

RAND(CHECKSUM(NEWID())) * 10 + 1,

CONVERT(INTEGER, CONVERT(CHAR(8), DATEADD(DAY, 0 - RAND(CHECKSUM(NEWID())) * 365 + 1, GETDATE()), 112))

FROM master.sys.columns C1,

master.sys.columns C2,

master.sys.columns C3;

GO

-- Add partitioned primary key

ALTER TABLE dbo.ExampleTable

ADD CONSTRAINT [PK dbo.ExampleTable ExampleTableID, CustomerID, PartitionID]

PRIMARY KEY NONCLUSTERED (ExampleTableID, CustomerID, PartitionID) ON PS (PartitionID);

GO

-- Add partitioned index on CustomerID

CREATE NONCLUSTERED INDEX [PIX dbo.ExampleTable CustomerID]

ON dbo.ExampleTable (CustomerID)

ON PS (PartitionID);

GO

-- Add non-partitioned index on IntDate

CREATE NONCLUSTERED INDEX [IX dbo.ExampleTable IntDate]

ON dbo.ExampleTable (IntDate)

ON [PRIMARY];

GO

-- +++++++++++++++++++++++++++++++++++++++++++++

-- Efficient plan based on non-partitioned index

-- +++++++++++++++++++++++++++++++++++++++++++++

SELECT minimum_date = MIN(IntDate)

FROM dbo.ExampleTable;

-- ++++++++++++++++++++++++++++++++++++++++++++++

-- Alternative method to create an efficient plan

-- based partly on Itzik Ben-Gan's approach

-- +++++++++++++++++++++++++++++++++++++++++++++

SELECT min_customer_id = MIN(iTVF.partition_minimum)

FROM sys.partitions AS P

CROSS

APPLY (

-- Minimum per partition

SELECT partition_minimum = MIN(ET.CustomerID)

FROM dbo.ExampleTable ET

WHERE $PARTITION.PF(ET.PartitionID) = P.partition_number

) iTVF

WHERE P.[object_id] = OBJECT_ID(N'dbo.ExampleTable', N'U')

AND P.index_id IN (0, 1);

GO

-- Tidy up

DROP TABLE dbo.ExampleTable;

DROP PARTITION SCHEME PS;

DROP PARTITION FUNCTION PF;

Of course it is fair to say that the non-aligned index would have to be dropped and re-created to perform SWITCH operations. In mitigation, I would just say that this likely occurs in a maintenance window anyway, and indexes are pretty quick to build. One other option is to maintain both partitioned and non-partitioned indexes - the optimiser is smart enough to choose the efficient plan based on the non-partitioned index for MIN and MAX queries.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

Numerous SQL Server MVPs (and lots of other users as well) have been screaming at Microsoft for years to improve partitioning. I hope improvements will be coming in later versions of the engine. Connect has some entries for these things I think. It is a sin we need to jump through hoops for things like this!!

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

alen teplitsky SSC-Dedicated Points: 30014 More actions · Answer 4

nice article

we have a 2TB database we're going to be moving to new hardware soon and i wanted to try to partition it because people seem to like to select an entire year's worth of data or more in one query. but there is no way to test it since we don't have 2TB of empty space anywhere

Solomon Rutzky SSCoach Points: 16431 More actions · Answer 5

Sander A. (5/4/2010)
But have you tried putting the PartitionID fist? :
(PartitionID ASC, IntDate ASC, CustomerID ASC) ON[PartitionedTables](PartitionID)

Hello Sander. I did just try that with the same table and data that I used before and it performed basically the same as with the PartitionID at the end. Essentially 10 - 11 seconds for the generic/simple MIN(IntDate) query. The row order should be the same in either case since the values for PartitionID can only be of one value per partition.

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

UMG Developer SSChampion Points: 13482 More actions · Answer 6

What happens if you use a sub query to get the minimum value for each partition, and then get the min of that?

i.e. Something like this:

SELECT MIN(MinIntDate)

FROM (SELECT PartitionID, MIN(IntDate) MinIntDate

FROM dbo.ExampleTablePartitioned

GROUP BY PartitionID) a

or

SELECT MIN(MinIntDate)

FROM (SELECT PartitionID FROM dbo.ExampleTable GROUP BY PartitionID) a

CROSS APPLY (SELECT MIN(IntDate) MinIntDate

FROM dbo.ExampleTable b

WHERE B.PartitionID = a.PartitionID) c

chapman.tim SSC-Addicted Points: 417 More actions · Answer 7

Hi alen,

Be careful before you decide to employ partitioning here. Do whatever you can do test your queries before you decide to implement. It could be a life-saver. 🙂

I agree w/ you SQL Guru...I've spoke w/ the partitioning team @ MS. The idea for partitioning is not that it makes queries faster (believe it or not), but that it eases moving data in and out and along with some maintenance advantages.

Tim

SQL Server MVP

Solomon Rutzky SSCoach Points: 16431 More actions · Answer 8

Paul White NZ (5/4/2010)
It would also have been good to mention that a non-partitioned index on a partitioned table does not suffer from this optimiser limitation

Paul, thanks for the feedback. Interesting about the Ben-Gan article; I did not know about it.

And, for the sake of being thorough, you make a good point about the non-partitioned index. However, it is not that applicable in my situation. The index comes with two costs: disk space (in this case over 1 GB!) and it needs to be maintained across INSERT and DELETE operations. Given that the piece of code in my article is in about five Stored Procs, that is negligible disk space and still fairly maintainable. True, it does not help ad-hoc queries but we generally don't do those in Production, and if we do then rarely would they be across all partitions. I am sure others might find themselves in a similar situation.

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Solomon Rutzky SSCoach Points: 16431 More actions · Answer 9

alen teplitsky (5/4/2010)
nice article
we have a 2TB database we're going to be moving to new hardware soon ... but there is no way to test it since we don't have 2TB of empty space anywhere

Hello Alen and thanks. I am surprised that you cannot find 2 TB of available space given that I keep hearing how "disk is cheap" ;-). Good luck!

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Solomon Rutzky SSCoach Points: 16431 More actions · Answer 10

UMG Developer (5/4/2010)
What happens if you use a sub query to get the minimum value for each partition, and then get the min of that?
SELECT MIN(MinIntDate)
FROM (SELECT PartitionID, MIN(IntDate) MinIntDate
FROM dbo.ExampleTablePartitioned
GROUP BY PartitionID) a

Hello. I just tried your code and it had the same effect as the regular SELECT MIN(IntDate). It took 10 - 11 seconds each time. There is really no getting around using the Partition Key in the WHERE clause (outside of doing a non-partitioned Index like Paul mentioned).

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

UMG Developer SSChampion Points: 13482 More actions · Answer 11

Solomon Rutzky (5/4/2010)
Hello. I just tried your code and it had the same effect as the regular SELECT MIN(IntDate). It took 10 - 11 seconds each time. There is really no getting around using the Partition Key in the WHERE clause (outside of doing a non-partitioned Index like Paul mentioned).

What about the other version I edited in:

SELECT MIN(MinIntDate)

FROM (SELECT PartitionID FROM dbo.ExampleTable GROUP BY PartitionID) a

CROSS APPLY (SELECT MIN(IntDate) MinIntDate

FROM dbo.ExampleTable b

WHERE B.PartitionID = a.PartitionID) c

I would hope it was similar to the other cross apply that Paul supplied. (I guess the real problem there is that you would have to have an index with the PartitionID as the first column to get the best performance and using Sys.Partitions avoids that.)

Paul White SSC Guru Points: 150468 More actions · Answer 12

UMG Developer (5/4/2010)
I would hope it was similar to the other cross apply that Paul supplied. (I guess the real problem there is that you would have to have an index with the PartitionID as the first column to get the best performance and using Sys.Partitions avoids that.)

Ah, sadly no - that APPLY results in two full scans of the partitioned index. You can use the test rig code to test for yourself - just don't create the non-partitioned index. You have to include the $PARTITION function to help the optimiser along.

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

Paul White SSC Guru Points: 150468 More actions · Answer 13

Solomon Rutzky (5/4/2010)
...you make a good point about the non-partitioned index. However, it is not that applicable in my situation. The index comes with two costs: disk space (in this case over 1 GB!) and it needs to be maintained across INSERT and DELETE operations.

I guess it comes down to circumstances and personal preference.

Not looking to start a debate here, but I just want to respond briefly:

The non-partitioned index could replace the partitioned index, so no additional space or maintenance would be required. I'm not sure the space or maintenance overhead arguments are terribly strong ones.

Your precise circumstances may differ, but I think that in general, I would prefer the solution that just works every time, without having to rewrite queries, or remember to do the trick whenever new code is written containing MIN or MAX. I dislike the $PARTITION approach for similar reasons, but it does strike me as preferable to the WHILE loop + hard coded partition count.

Paul

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

alen teplitsky SSC-Dedicated Points: 30014 More actions · Answer 14

Solomon Rutzky (5/4/2010)
alen teplitsky (5/4/2010)
nice article
we have a 2TB database we're going to be moving to new hardware soon ... but there is no way to test it since we don't have 2TB of empty space anywhere
Hello Alen and thanks. I am surprised that you cannot find 2 TB of available space given that I keep hearing how "disk is cheap" ;-). Good luck!
Take care,
Solomon...

its finding 2TB of usable space after you create separate RAID1 volumes for all the partitions, logs and tempdb

$7500 or so to buy the disk and jbod for a soon to be free server

$8000 to buy a new server with just enough disk