Query plan question

Question

Post reply

Query plan question

as_1234

SSCrazy

Points: 2865

October 24, 2019 at 3:15 pm

#3691653

All,

I'm trying to increase my knowledge of query plans on large databases. This question is purely a learning exercise for me. I'm also aware that there may not be a clear answer as it might depend on too many other contributing factors.

The DDL for the tables is:

/****** Object:  Table [dbo].[Posts]    Script Date: 24/10/2019 15:41:07 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Posts](
[Id] [int] IDENTITY(1,1) NOT NULL,
[AcceptedAnswerId] [int] NULL,
[AnswerCount] [int] NULL,
[Body] [nvarchar](max) NOT NULL,
[ClosedDate] [datetime] NULL,
[CommentCount] [int] NULL,
[CommunityOwnedDate] [datetime] NULL,
[CreationDate] [datetime] NOT NULL,
[FavoriteCount] [int] NULL,
[LastActivityDate] [datetime] NOT NULL,
[LastEditDate] [datetime] NULL,
[LastEditorDisplayName] [nvarchar](40) NULL,
[LastEditorUserId] [int] NULL,
[OwnerUserId] [int] NULL,
[ParentId] [int] NULL,
[PostTypeId] [int] NOT NULL,
[Score] [int] NOT NULL,
[Tags] [nvarchar](150) NULL,
[Title] [nvarchar](250) NULL,
[ViewCount] [int] NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

/****** Object:  Table [dbo].[Comments]    Script Date: 24/10/2019 15:57:05 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Comments](
[Id] [int] IDENTITY(1,1) NOT NULL,
[CreationDate] [datetime] NOT NULL,
[PostId] [int] NOT NULL,
[Score] [int] NULL,
[Text] [nvarchar](700) NOT NULL,
[UserId] [int] NULL,
 CONSTRAINT [PK_Comments__Id] PRIMARY KEY CLUSTERED 
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Two of the indexes are as follows:

/****** Object:  Index [idxPostID]    Script Date: 24/10/2019 15:40:17 ******/
CREATE CLUSTERED INDEX [idxPostID] ON [dbo].[Posts]
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

/****** Object:  Index [idxPostCUI]    Script Date: 24/10/2019 15:40:40 ******/
CREATE NONCLUSTERED INDEX [idxPostCUI] ON [dbo].[Posts]
(
[OwnerUserId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

My query is:

Select id from posts where id in (select postid from comments)

The execution plan is:

<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.481" Build="14.0.3192.2" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="24484400" StatementId="1" StatementOptmLevel="FULL" CardinalityEstimationModelVersion="70" StatementSubTreeCost="1294.62" StatementText="Select id from posts where id in (select postid from comments)" StatementType="SELECT" QueryHash="0x3EFF698920163089" QueryPlanHash="0xB9DF43C412BAC1AD" RetrievedFromCache="true" SecurityPolicyApplied="false">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan CachedPlanSize="48" CompileTime="4" CompileCPU="4" CompileMemory="296">
            <ThreadStat Branches="2" />
            <MemoryGrantInfo SerialRequiredMemory="1024" SerialDesiredMemory="4511592" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="104857" EstimatedPagesCached="13107" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="1309920" />
            <RelOp AvgRowSize="11" EstimateCPU="22.9642" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484400" LogicalOp="Gather Streams" NodeId="1" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="1294.62">
              <OutputList>
                <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
              </OutputList>
              <Parallelism>
                <RelOp AvgRowSize="11" EstimateCPU="286.071" EstimateIO="394.003" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484400" LogicalOp="Inner Join" NodeId="2" Parallel="true" PhysicalOp="Hash Match" EstimatedTotalSubtreeCost="1271.65">
                  <OutputList>
                    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                  </OutputList>
                  <MemoryFractions Input="1" Output="1" />
                  <Hash>
                    <DefinedValues />
                    <HashKeysBuild>
                      <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                    </HashKeysBuild>
                    <HashKeysProbe>
                      <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                    </HashKeysProbe>
                    <RelOp AvgRowSize="11" EstimateCPU="16.9169" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484800" LogicalOp="Bitmap Create" NodeId="3" Parallel="true" PhysicalOp="Bitmap" EstimatedTotalSubtreeCost="452.424">
                      <OutputList>
                        <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                      </OutputList>
                      <Bitmap>
                        <DefinedValues>
                          <DefinedValue>
                            <ColumnReference Column="Bitmap1005" />
                          </DefinedValue>
                        </DefinedValues>
                        <HashKeys>
                          <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                        </HashKeys>
                        <RelOp AvgRowSize="11" EstimateCPU="16.9169" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="24484800" LogicalOp="Aggregate" NodeId="4" Parallel="true" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="452.424">
                          <OutputList>
                            <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                          </OutputList>
                          <StreamAggregate>
                            <DefinedValues />
                            <GroupBy>
                              <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                            </GroupBy>
                            <RelOp AvgRowSize="11" EstimateCPU="267.68" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="43182600" LogicalOp="Repartition Streams" NodeId="5" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="435.507">
                              <OutputList>
                                <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                              </OutputList>
                              <Parallelism PartitioningType="Hash">
                                <PartitionColumns>
                                  <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                                </PartitionColumns>
                                <OrderBy>
                                  <OrderByColumn Ascending="true">
                                    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                                  </OrderByColumn>
                                </OrderBy>
                                <RelOp AvgRowSize="11" EstimateCPU="29.6551" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="43182600" LogicalOp="Aggregate" NodeId="6" Parallel="true" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="167.827">
                                  <OutputList>
                                    <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                                  </OutputList>
                                  <StreamAggregate>
                                    <DefinedValues />
                                    <GroupBy>
                                      <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                                    </GroupBy>
                                    <RelOp AvgRowSize="11" EstimateCPU="41.4909" EstimateIO="96.6809" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="75437800" EstimatedRowsRead="75437800" LogicalOp="Index Scan" NodeId="7" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="138.172" TableCardinality="75437800">
                                      <OutputList>
                                        <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                                      </OutputList>
                                      <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                        <DefinedValues>
                                          <DefinedValue>
                                            <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Column="PostId" />
                                          </DefinedValue>
                                        </DefinedValues>
                                        <Object Database="[StackOverflow]" Schema="[dbo]" Table="[Comments]" Index="[idxCommentPostID]" IndexKind="NonClustered" Storage="RowStore" />
                                      </IndexScan>
                                    </RelOp>
                                  </StreamAggregate>
                                </RelOp>
                              </Parallelism>
                            </RelOp>
                          </StreamAggregate>
                        </RelOp>
                      </Bitmap>
                    </RelOp>
                    <RelOp AvgRowSize="11" EstimateCPU="55.0462" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="45919800" LogicalOp="Repartition Streams" NodeId="8" Parallel="true" PhysicalOp="Parallelism" EstimatedTotalSubtreeCost="139.154">
                      <OutputList>
                        <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                      </OutputList>
                      <Parallelism PartitioningType="Hash">
                        <PartitionColumns>
                          <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                        </PartitionColumns>
                        <RelOp AvgRowSize="11" EstimateCPU="25.256" EstimateIO="58.852" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="45919800" EstimatedRowsRead="45919800" LogicalOp="Index Scan" NodeId="9" Parallel="true" PhysicalOp="Index Scan" EstimatedTotalSubtreeCost="84.108" TableCardinality="45919800">
                          <OutputList>
                            <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                          </OutputList>
                          <IndexScan Ordered="false" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                            <DefinedValues>
                              <DefinedValue>
                                <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                              </DefinedValue>
                            </DefinedValues>
                            <Object Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Index="[idxPostCUI]" IndexKind="NonClustered" Storage="RowStore" />
                            <Predicate>
                              <ScalarOperator ScalarString="PROBE([Bitmap1005],[StackOverflow].[dbo].[Posts].[Id],N'[IN ROW]')">
                                <Intrinsic FunctionName="PROBE">
                                  <ScalarOperator>
                                    <Identifier>
                                      <ColumnReference Column="Bitmap1005" />
                                    </Identifier>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Identifier>
                                      <ColumnReference Database="[StackOverflow]" Schema="[dbo]" Table="[Posts]" Column="Id" />
                                    </Identifier>
                                  </ScalarOperator>
                                  <ScalarOperator>
                                    <Const ConstValue="N'[IN ROW]'" />
                                  </ScalarOperator>
                                </Intrinsic>
                              </ScalarOperator>
                            </Predicate>
                          </IndexScan>
                        </RelOp>
                      </Parallelism>
                    </RelOp>
                  </Hash>
                </RelOp>
              </Parallelism>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>

I'm wondering why the optimiser would use the idxPostCUI non clustered index rather than the idxPostID clustered index to get the list of IDs from the post table? The only reason I could thing of is because it's a smaller index and the non clustered index will have links to the clustered index?

If anyone is interested in which database I'm using it's this one https://www.brentozar.com/archive/2018/06/new-stack-overflow-public-database-available-2018-06/

Thanks

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Chris Harshman SSC-Forever Points: 42192 More actions · Answer 1

I believe that the optimizer will see the estimated count from the subquery and determine that the list of postid from the Comments table won't be selective enough to justify doing seeks against the clustered index.

Then, since the column in the query is just id, and it is included in the nonclustered index because it's the clustered index key, then the cheapest way to get a list of all the ids that match the Comments table is to use the smaller index idxPostCUI and build a hash table using multiple CPU cores to filter the matches.

as_1234 SSCrazy Points: 2865 More actions · Answer 2

as_1234

SSCrazy

Points: 2865

October 24, 2019 at 6:33 pm

#3691737

Thanks for your help.