May 3, 2010 at 10:28 pm
Comments posted to this topic are about the item Improving Performance of Cross-Partition Queries
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 4, 2010 at 2:40 am
Have you tried changing the order of the clustered index. You say you did try this index:
(IntDate ASC, CustomerID ASC, PartitionID ASC) ON[PartitionedTables](PartitionID)
But have you tried putting the PartitionID fist? :
(PartitionID ASC, IntDate ASC, CustomerID ASC) ON[PartitionedTables](PartitionID)
I'm very curious what happens when you try this, so please post the result....
May 4, 2010 at 5:58 am
For SQL Server magazine subscribers, Itzik Ben-Gan has an article on exactly this issue at:
That was published March 16 2010.
Anyway, about this article. I'm afraid I did not rate it highly for many reasons. One shortcoming is the lack of easily consumable sample code either in-line with the text or as a download.
It would also have been good to mention that a non-partitioned index on a partitioned table does not suffer from this optimiser limitation:
-- Everyone has one of these, feel free to use another test database
USE tempdb;
GO
-- Example partition function and scheme
CREATE PARTITION FUNCTION PF(TINYINT) AS RANGE RIGHT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]);
GO
-- Test table (partitioned heap)
CREATE TABLE dbo.ExampleTable
(
[ExampleTableID] [bigint] NOT NULL IDENTITY(1, 1),
[CustomerID] [int] NOT NULL,
[PartitionID] [tinyint] NOT NULL,
[IntDate] [int] NOT NULL,
)
ON PS (PartitionID);
GO
-- Partitioned clustered index
-- (note: clustering key should always be unique)
CREATE CLUSTERED INDEX [PCX dbo.ExampleTable IntDate, CustomerID]
ON dbo.ExampleTable
(IntDate, CustomerID)
ON PS (PartitionID);
GO
-- Add 100,000 randomish rows
INSERT dbo.ExampleTable WITH (TABLOCK)
(CustomerID, PartitionID, IntDate)
SELECT TOP (100000)
RAND(CHECKSUM(NEWID())) * 1000 + 1,
RAND(CHECKSUM(NEWID())) * 10 + 1,
CONVERT(INTEGER, CONVERT(CHAR(8), DATEADD(DAY, 0 - RAND(CHECKSUM(NEWID())) * 365 + 1, GETDATE()), 112))
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
-- Add partitioned primary key
ALTER TABLE dbo.ExampleTable
ADD CONSTRAINT [PK dbo.ExampleTable ExampleTableID, CustomerID, PartitionID]
PRIMARY KEY NONCLUSTERED (ExampleTableID, CustomerID, PartitionID) ON PS (PartitionID);
GO
-- Add partitioned index on CustomerID
CREATE NONCLUSTERED INDEX [PIX dbo.ExampleTable CustomerID]
ON dbo.ExampleTable (CustomerID)
ON PS (PartitionID);
GO
-- Add non-partitioned index on IntDate
CREATE NONCLUSTERED INDEX [IX dbo.ExampleTable IntDate]
ON dbo.ExampleTable (IntDate)
ON [PRIMARY];
GO
-- +++++++++++++++++++++++++++++++++++++++++++++
-- Efficient plan based on non-partitioned index
-- +++++++++++++++++++++++++++++++++++++++++++++
SELECT minimum_date = MIN(IntDate)
FROM dbo.ExampleTable;
-- ++++++++++++++++++++++++++++++++++++++++++++++
-- Alternative method to create an efficient plan
-- based partly on Itzik Ben-Gan's approach
-- +++++++++++++++++++++++++++++++++++++++++++++
SELECT min_customer_id = MIN(iTVF.partition_minimum)
FROM sys.partitions AS P
CROSS
APPLY (
-- Minimum per partition
SELECT partition_minimum = MIN(ET.CustomerID)
FROM dbo.ExampleTable ET
WHERE $PARTITION.PF(ET.PartitionID) = P.partition_number
) iTVF
WHERE P.[object_id] = OBJECT_ID(N'dbo.ExampleTable', N'U')
AND P.index_id IN (0, 1);
GO
-- Tidy up
DROP TABLE dbo.ExampleTable;
DROP PARTITION SCHEME PS;
DROP PARTITION FUNCTION PF;
Of course it is fair to say that the non-aligned index would have to be dropped and re-created to perform SWITCH operations. In mitigation, I would just say that this likely occurs in a maintenance window anyway, and indexes are pretty quick to build. One other option is to maintain both partitioned and non-partitioned indexes - the optimiser is smart enough to choose the efficient plan based on the non-partitioned index for MIN and MAX queries.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 4, 2010 at 7:49 am
Numerous SQL Server MVPs (and lots of other users as well) have been screaming at Microsoft for years to improve partitioning. I hope improvements will be coming in later versions of the engine. Connect has some entries for these things I think. It is a sin we need to jump through hoops for things like this!!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 4, 2010 at 9:08 am
nice article
we have a 2TB database we're going to be moving to new hardware soon and i wanted to try to partition it because people seem to like to select an entire year's worth of data or more in one query. but there is no way to test it since we don't have 2TB of empty space anywhere
May 4, 2010 at 9:43 am
Sander A. (5/4/2010)
But have you tried putting the PartitionID fist? :(PartitionID ASC, IntDate ASC, CustomerID ASC) ON[PartitionedTables](PartitionID)
Hello Sander. I did just try that with the same table and data that I used before and it performed basically the same as with the PartitionID at the end. Essentially 10 - 11 seconds for the generic/simple MIN(IntDate) query. The row order should be the same in either case since the values for PartitionID can only be of one value per partition.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 4, 2010 at 9:54 am
What happens if you use a sub query to get the minimum value for each partition, and then get the min of that?
i.e. Something like this:
SELECT MIN(MinIntDate)
FROM (SELECT PartitionID, MIN(IntDate) MinIntDate
FROM dbo.ExampleTablePartitioned
GROUP BY PartitionID) a
or
SELECT MIN(MinIntDate)
FROM (SELECT PartitionID FROM dbo.ExampleTable GROUP BY PartitionID) a
CROSS APPLY (SELECT MIN(IntDate) MinIntDate
FROM dbo.ExampleTable b
WHERE B.PartitionID = a.PartitionID) c
May 4, 2010 at 9:56 am
Hi alen,
Be careful before you decide to employ partitioning here. Do whatever you can do test your queries before you decide to implement. It could be a life-saver. 🙂
I agree w/ you SQL Guru...I've spoke w/ the partitioning team @ MS. The idea for partitioning is not that it makes queries faster (believe it or not), but that it eases moving data in and out and along with some maintenance advantages.
Tim
SQL Server MVP
May 4, 2010 at 9:56 am
Paul White NZ (5/4/2010)
It would also have been good to mention that a non-partitioned index on a partitioned table does not suffer from this optimiser limitation
Paul, thanks for the feedback. Interesting about the Ben-Gan article; I did not know about it.
And, for the sake of being thorough, you make a good point about the non-partitioned index. However, it is not that applicable in my situation. The index comes with two costs: disk space (in this case over 1 GB!) and it needs to be maintained across INSERT and DELETE operations. Given that the piece of code in my article is in about five Stored Procs, that is negligible disk space and still fairly maintainable. True, it does not help ad-hoc queries but we generally don't do those in Production, and if we do then rarely would they be across all partitions. I am sure others might find themselves in a similar situation.
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 4, 2010 at 10:00 am
alen teplitsky (5/4/2010)
nice articlewe have a 2TB database we're going to be moving to new hardware soon ... but there is no way to test it since we don't have 2TB of empty space anywhere
Hello Alen and thanks. I am surprised that you cannot find 2 TB of available space given that I keep hearing how "disk is cheap" ;-). Good luck!
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 4, 2010 at 10:07 am
UMG Developer (5/4/2010)
What happens if you use a sub query to get the minimum value for each partition, and then get the min of that?
SELECT MIN(MinIntDate)
FROM (SELECT PartitionID, MIN(IntDate) MinIntDate
FROM dbo.ExampleTablePartitioned
GROUP BY PartitionID) a
Hello. I just tried your code and it had the same effect as the regular SELECT MIN(IntDate). It took 10 - 11 seconds each time. There is really no getting around using the Partition Key in the WHERE clause (outside of doing a non-partitioned Index like Paul mentioned).
Take care,
Solomon...
SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR
May 4, 2010 at 10:16 am
Solomon Rutzky (5/4/2010)
Hello. I just tried your code and it had the same effect as the regular SELECT MIN(IntDate). It took 10 - 11 seconds each time. There is really no getting around using the Partition Key in the WHERE clause (outside of doing a non-partitioned Index like Paul mentioned).
What about the other version I edited in:
SELECT MIN(MinIntDate)
FROM (SELECT PartitionID FROM dbo.ExampleTable GROUP BY PartitionID) a
CROSS APPLY (SELECT MIN(IntDate) MinIntDate
FROM dbo.ExampleTable b
WHERE B.PartitionID = a.PartitionID) c
I would hope it was similar to the other cross apply that Paul supplied. (I guess the real problem there is that you would have to have an index with the PartitionID as the first column to get the best performance and using Sys.Partitions avoids that.)
May 4, 2010 at 10:26 am
UMG Developer (5/4/2010)
I would hope it was similar to the other cross apply that Paul supplied. (I guess the real problem there is that you would have to have an index with the PartitionID as the first column to get the best performance and using Sys.Partitions avoids that.)
Ah, sadly no - that APPLY results in two full scans of the partitioned index. You can use the test rig code to test for yourself - just don't create the non-partitioned index. You have to include the $PARTITION function to help the optimiser along.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 4, 2010 at 10:38 am
Solomon Rutzky (5/4/2010)
...you make a good point about the non-partitioned index. However, it is not that applicable in my situation. The index comes with two costs: disk space (in this case over 1 GB!) and it needs to be maintained across INSERT and DELETE operations.
I guess it comes down to circumstances and personal preference.
Not looking to start a debate here, but I just want to respond briefly:
The non-partitioned index could replace the partitioned index, so no additional space or maintenance would be required. I'm not sure the space or maintenance overhead arguments are terribly strong ones.
Your precise circumstances may differ, but I think that in general, I would prefer the solution that just works every time, without having to rewrite queries, or remember to do the trick whenever new code is written containing MIN or MAX. I dislike the $PARTITION approach for similar reasons, but it does strike me as preferable to the WHILE loop + hard coded partition count.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 4, 2010 at 10:46 am
Solomon Rutzky (5/4/2010)
alen teplitsky (5/4/2010)
nice articlewe have a 2TB database we're going to be moving to new hardware soon ... but there is no way to test it since we don't have 2TB of empty space anywhere
Hello Alen and thanks. I am surprised that you cannot find 2 TB of available space given that I keep hearing how "disk is cheap" ;-). Good luck!
Take care,
Solomon...
its finding 2TB of usable space after you create separate RAID1 volumes for all the partitions, logs and tempdb
$7500 or so to buy the disk and jbod for a soon to be free server
$8000 to buy a new server with just enough disk
Viewing 15 posts - 1 through 15 (of 40 total)
You must be logged in to reply to this topic. Login to reply