Compatibility of streaming aggregate operator and partitioning.

Question

Post reply

Compatibility of streaming aggregate operator and partitioning.

SpringTownDBA

SSCertifiable

Points: 6917
More actions
April 19, 2013 at 10:15 pm

#276700

I've got a 350 million row partitioned table that I'm trying to aggregate. The plan I get uses a HASH Aggregate. I would like to get a query plan that uses a streaming aggregate so results can be returned to the client faster. Here is the original query (Schema is at the end):
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID
Schema for the Table and indexes are fairly straight forward. The clustered index is on (userid, datetime). There is a Non-clustered index on (Username, sessionid). Both indexes are partitioned on (DateTime). Since the query is limited to one partition and the columns being grouped on are either in the clustered index or the NC index key, the query optimizer should be able to generate a plan using a streaming aggregate that scans the NC index.
However, I am unable to produce the desired plan. I've tried 4 variations: with/without MAXDOP 1, with/without query "ORDER GROUP" query hint. The plans for these are in the attachment.
--This is the query that should be producing a streaming agg
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID
--Still get a Hash Aggregate without parallelism
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1)
--Prevent HASH Aggregation, and we get a distinct sort instead
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (ORDER GROUP)
--Still get distinct sort
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1, ORDER GROUP)
It there something preventing the use of a streaming aggregate for this query, or is this a bug/limitation of the QO?
Create Schema for the table:
CREATE PARTITION FUNCTION [PartitionByYear](datetime) AS RANGE RIGHT FOR
VALUES (N'2005-01-01T00:00:00.000', N'2006-01-01T00:00:00.000', N'2007-01-01T00:00:00.000',
N'2008-01-01T00:00:00.000', N'2009-01-01T00:00:00.000', N'2010-01-01T00:00:00.000',
N'2011-01-01T00:00:00.000', N'2012-01-01T00:00:00.000', N'2014-01-01T00:00:00.000')
GO
/****** Object: PartitionScheme [PartYearsToArchiveAndPrimary] Script Date: 4/19/2013 11:04:53 PM ******/
CREATE PARTITION SCHEME [PartYearsToArchiveAndPrimary] AS PARTITION [PartitionByYear]
TO ([Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [PRIMARY], [PRIMARY])
GO
CREATE TABLE [dbo].[AttemptedLogin](
[ObjectId] [int] IDENTITY(1,1) NOT NULL,
[DateTime] [datetime] NOT NULL,
[UserId] [int] NOT NULL,
[UserName] [varchar](30) NULL,
[Password] [varchar](30) NULL,
[SessionID] [varchar](30) NULL,
[IPAddress] [varchar](16) NULL,
[ProxyAddress] [varchar](16) NULL,
[CountryCode] [char](2) NULL,
CONSTRAINT [PK_NewAttemptedLogin] PRIMARY KEY NONCLUSTERED
(
[ObjectId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO
/****** Object: Index [CIX_AttemptedLogin] Script Date: 4/19/2013 11:02:19 PM ******/
CREATE CLUSTERED INDEX [CIX_AttemptedLogin] ON [dbo].[AttemptedLogin]
(
[UserId] ASC,
[DateTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO
/****** Object: Index [IX_AttemptedLogin_UserNameSessionID] Script Date: 4/19/2013 11:02:20 PM ******/
CREATE NONCLUSTERED INDEX [IX_AttemptedLogin_UserNameSessionID] ON [dbo].[AttemptedLogin]
(
[UserName] ASC,
[SessionID] ASC
)
WHERE ([SessionId] IS NOT NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO
Thanks!

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

SpringTownDBA SSCertifiable Points: 6917 More actions · Answer 1

Forgot the include information about the machine:

Microsoft SQL Server 2012 (SP1) - 11.0.3128.0 (X64)

Dec 28 2012 20:23:12

Copyright (c) Microsoft Corporation

Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

Paul White SSC Guru Points: 150468 More actions · Answer 2

Scanning the nonclustered index, there is an implied first key on the partition ID, so the following can use a Stream Aggregate:

SELECT sessionid, username

FROM dbo.AttemptedLogin

WHERE sessionid IS NOT null

GROUP BY $Partition.PartitionByYear([DateTime]), UserName, SessionID

Similarly, if a single partition is statically specified, the following also results in a Stream Aggregate, since the keys of the nonclustered index are (UserName, SessionID):

SELECT SessionID, UserName FROM dbo.AttemptedLogin

WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null

GROUP BY UserName, SessionID

The problem in the original code is that it adds UserID to the mix, and fails to use a Stream Aggregate:

SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin

WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null

GROUP BY UserId, UserName, SessionID

Now UserID forms part of the key of the nonclustered index since it is not declared as unique, but it comes after the nonclustered keys. Writing the GROUP BY clause to reflect the key order results in the Stream Aggregate you are looking for:

SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin

WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null

GROUP BY UserName, SessionID, UserId

Should the order of the GROUP BY keys matter? No. In that sense, it is a limitation, in the same way that writing a query with several window functions might result in an unnecessary sort depending on the written order of the query.

It may be more natural (and easier for future maintenance) if you explicitly include UserID in the nonclustered index definition.

edit: In response to your other Twitter question, yes the same considerations apply to DISTINCT:

-- Sort

SELECT DISTINCT UserId, UserName, SessionID FROM dbo.AttemptedLogin

WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

-- No sort

SELECT DISTINCT UserName, SessionID, UserId FROM dbo.AttemptedLogin

WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

Another option is to keep the GROUP BY or DISTINCT column order as it is, and add a presentation ORDER BY UserName, SessionID, UserId

Paul White
All articles available on SQL.kiwi
@SQL_Kiwi

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 3

I am glad you saw this one Paul! I didn't think anyone else could give as clear of a description of why the OP was getting the observed behavior.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service