October 24, 2019 at 3:15 pm
All,
I'm trying to increase my knowledge of query plans on large databases. This question is purely a learning exercise for me. I'm also aware that there may not be a clear answer as it might depend on too many other contributing factors.
The DDL for the tables is:
/****** Object: Table [dbo].[Posts] Script Date: 24/10/2019 15:41:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Posts](
[Id] [int] IDENTITY(1,1) NOT NULL,
[AcceptedAnswerId] [int] NULL,
[AnswerCount] [int] NULL,
[Body] [nvarchar](max) NOT NULL,
[ClosedDate] [datetime] NULL,
[CommentCount] [int] NULL,
[CommunityOwnedDate] [datetime] NULL,
[CreationDate] [datetime] NOT NULL,
[FavoriteCount] [int] NULL,
[LastActivityDate] [datetime] NOT NULL,
[LastEditDate] [datetime] NULL,
[LastEditorDisplayName] [nvarchar](40) NULL,
[LastEditorUserId] [int] NULL,
[OwnerUserId] [int] NULL,
[ParentId] [int] NULL,
[PostTypeId] [int] NOT NULL,
[Score] [int] NOT NULL,
[Tags] [nvarchar](150) NULL,
[Title] [nvarchar](250) NULL,
[ViewCount] [int] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
/****** Object: Table [dbo].[Comments] Script Date: 24/10/2019 15:57:05 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Comments](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CreationDate] [datetime] NOT NULL,
[PostId] [int] NOT NULL,
[Score] [int] NULL,
[Text] [nvarchar](700) NOT NULL,
[UserId] [int] NULL,
CONSTRAINT [PK_Comments__Id] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Two of the indexes are as follows:
/****** Object: Index [idxPostID] Script Date: 24/10/2019 15:40:17 ******/
CREATE CLUSTERED INDEX [idxPostID] ON [dbo].[Posts]
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [idxPostCUI] Script Date: 24/10/2019 15:40:40 ******/
CREATE NONCLUSTERED INDEX [idxPostCUI] ON [dbo].[Posts]
(
[OwnerUserId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
My query is:
Select id from posts where id in (select postid from comments)
The execution plan is:
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.481" Build="14.0.3192.2" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="24484400" StatementId="1" StatementOptmLevel="FULL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="1294.62" StatementText="Select id from posts where id in (select postid from comments)" StatementType="SELECT" QueryHash="0x3EFF698920163089" QueryPlanHash="0xB9DF43C412BAC1AD" RetrievedFromCache="true" SecurityPolicyApplied="false">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan CachedPlanSize="48" CompileTime="4" CompileCPU="4" CompileMemory="296">
<ThreadStat Branches="2" />
<MemoryGrantInfo SerialRequiredMemory="1024" SerialDesiredMemory="4511592" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="104857" EstimatedPagesCached="13107" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="1309920" />
<RelOp AvgRowSize="11" EstimateCPU="22.9642" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484400" LogicalOp="Gather Streams" NodeId="1" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="1294.62">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</OutputList>
<Parallelism>
<RelOp AvgRowSize="11" EstimateCPU="286.071" EstimateIO="394.003" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484400" LogicalOp="Inner Join" NodeId="2" Parallel="true" PhysicalOp="Hash Match" EstimatedTotalSubtreeCost="1271.65">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
<Hash>
<DefinedValues />
<HashKeysBuild>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</HashKeysBuild>
<HashKeysProbe>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</HashKeysProbe>
<RelOp AvgRowSize="11" EstimateCPU="16.9169" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484800" LogicalOp="Bitmap Create" NodeId="3" Parallel="true" PhysicalOp="Bitmap" EstimatedTotalSubtreeCost="452.424">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</OutputList>
<Bitmap>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Bitmap1005" />
</DefinedValue>
</DefinedValues>
<HashKeys>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</HashKeys>
<RelOp AvgRowSize="11" EstimateCPU="16.9169" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484800" LogicalOp="Aggregate" NodeId="4" Parallel="true" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="452.424">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</OutputList>
<StreamAggregate>
<DefinedValues />
<GroupBy>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</GroupBy>
<RelOp AvgRowSize="11" EstimateCPU="267.68" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="43182600" LogicalOp="Repartition Streams" NodeId="5" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="435.507">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</OutputList>
<Parallelism PartitioningType="Hash">
<PartitionColumns>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</PartitionColumns>
<OrderBy>
<OrderByColumn Ascending="true">
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</OrderByColumn>
</OrderBy>
<RelOp AvgRowSize="11" EstimateCPU="29.6551" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="43182600" LogicalOp="Aggregate" NodeId="6" Parallel="true" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="167.827">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</OutputList>
<StreamAggregate>
<DefinedValues />
<GroupBy>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</GroupBy>
<RelOp AvgRowSize="11" EstimateCPU="41.4909" EstimateIO="96.6809" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="75437800" EstimatedRowsRead="75437800" LogicalOp="Index Scan" NodeId="7" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="138.172" TableCardinality="75437800">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</OutputList>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
</DefinedValue>
</DefinedValues>
<Object Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Index="[idxCommentPostID]" IndexKind="NonClustered" Storage="RowStore" />
</IndexScan>
</RelOp>
</StreamAggregate>
</RelOp>
</Parallelism>
</RelOp>
</StreamAggregate>
</RelOp>
</Bitmap>
</RelOp>
<RelOp AvgRowSize="11" EstimateCPU="55.0462" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="45919800" LogicalOp="Repartition Streams" NodeId="8" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="139.154">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</OutputList>
<Parallelism PartitioningType="Hash">
<PartitionColumns>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</PartitionColumns>
<RelOp AvgRowSize="11" EstimateCPU="25.256" EstimateIO="58.852" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="45919800" EstimatedRowsRead="45919800" LogicalOp="Index Scan" NodeId="9" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="84.108" TableCardinality="45919800">
<OutputList>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</OutputList>
<IndexScan Ordered="false" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</DefinedValue>
</DefinedValues>
<Object Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Index="[idxPostCUI]" IndexKind="NonClustered" Storage="RowStore" />
<Predicate>
<ScalarOperator ScalarString="PROBE([Bitmap1005],[StackOverflow].[dbo].[Posts].[Id],N'[IN ROW]')">
<Intrinsic FunctionName="PROBE">
<ScalarOperator>
<Identifier>
<ColumnReference Column="Bitmap1005" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="N'[IN ROW]'" />
</ScalarOperator>
</Intrinsic>
</ScalarOperator>
</Predicate>
</IndexScan>
</RelOp>
</Parallelism>
</RelOp>
</Hash>
</RelOp>
</Parallelism>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
I'm wondering why the optimiser would use the idxPostCUI non clustered index rather than the idxPostID clustered index to get the list of IDs from the post table? The only reason I could thing of is because it's a smaller index and the non clustered index will have links to the clustered index?
If anyone is interested in which database I'm using it's this one https://www.brentozar.com/archive/2018/06/new-stack-overflow-public-database-available-2018-06/
Thanks
October 24, 2019 at 6:25 pm
I believe that the optimizer will see the estimated count from the subquery and determine that the list of postid from the Comments table won't be selective enough to justify doing seeks against the clustered index.
Then, since the column in the query is just id, and it is included in the nonclustered index because it's the clustered index key, then the cheapest way to get a list of all the ids that match the Comments table is to use the smaller index idxPostCUI and build a hash table using multiple CPU cores to filter the matches.
October 24, 2019 at 6:33 pm
Thanks for your help.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply