Eliminating a Distinct SORT operation

Question

Eliminating a Distinct SORT operation

lmarkum

SSCertifiable

Points: 7989
More actions
June 19, 2015 at 12:02 pm

#318317

I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286590 More actions · Answer 1

lmarkum (6/19/2015)
I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

Without the query at a bare minimum there is nothing we can do help here. It would probably also help to know the table structures and indexes.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

And also post the actual execution plan as a *.sqlplan file.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 3

lmarkum (6/19/2015)
I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

Are the two clustered indexes both unique?

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

lmarkum SSCertifiable Points: 7989 More actions · Answer 4

Here is the SQL and the attached sql plan

DECLARE @SendId INT

SET @SendId = 1

UPDATEEmailTrackingSummary

SETEmailName = tets.EmailName,

SentDate = tets.SentDate,

SendDate = tets.SendDate,

NumberSent = tets.NumberSent,

NumberDelivered = tets.NumberDelivered,

Duplicates = tets.Duplicates,

UniqueClicks = tets.UniqueClicks,

UniqueOpens = tets.UniqueOpens,

HardBounces = tets.HardBounces,

Unsubscribes = tets.Unsubscribes,

ExistingUndeliverables = tets.ExistingUndeliverables,

ExistingUnsubscribes = tets.ExistingUnsubscribes,

InvalidAddresses = tets.InvalidAddresses,

NumberExcluded = tets.NumberExcluded,

NumberTargeted = tets.NumberTargeted,

OtherBounces = tets.OtherBounces,

SoftBounces = tets.SoftBounces,

PreviewURL = tets.PreviewURL,

ts = GetDate(),

Reconciled = 0

FROMEmailTrackingSummary ets

INNER JOIN TempEmailTrackingSummary tets ON tets.SendId = ets.SendId

WHERE@SendId IS NULL OR TETS.SendId = @SendId

Alan Burstein SSC Guru Points: 61141 More actions · Answer 5

lmarkum (6/19/2015)
Here is the SQL and the attached sql plan
DECLARE @SendId INT
SET @SendId = 1
UPDATEEmailTrackingSummary
SETEmailName = tets.EmailName,
SentDate = tets.SentDate,
SendDate = tets.SendDate,
NumberSent = tets.NumberSent,
NumberDelivered = tets.NumberDelivered,
Duplicates = tets.Duplicates,
UniqueClicks = tets.UniqueClicks,
UniqueOpens = tets.UniqueOpens,
HardBounces = tets.HardBounces,
Unsubscribes = tets.Unsubscribes,
ExistingUndeliverables = tets.ExistingUndeliverables,
ExistingUnsubscribes = tets.ExistingUnsubscribes,
InvalidAddresses = tets.InvalidAddresses,
NumberExcluded = tets.NumberExcluded,
NumberTargeted = tets.NumberTargeted,
OtherBounces = tets.OtherBounces,
SoftBounces = tets.SoftBounces,
PreviewURL = tets.PreviewURL,
ts = GetDate(),
Reconciled = 0
FROMEmailTrackingSummary ets
INNER JOIN TempEmailTrackingSummary tets ON tets.SendId = ets.SendId
WHERE@SendId IS NULL OR TETS.SendId = @SendId

Can you post the query plan and DDL for EmailTrackingSummary and TempEmailTrackingSummary?

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61141 More actions · Answer 6

Answer removed, bad advice.

I think I got to the bottom of this... Give me a few minutes...

Ok I'm back. Again, DDL would really be helpful. Note the link in my signature line (or Sean's or Lynn's) for best practices on getting help/asking questions. What I'm really curious about is if there's a one-to-many relationship between EmailTrackingSummary and TempEmailTrackingSummary. If there is then there is a problem with your logic. Take the query below where I'm using a join to set the val of dbo.T_one/col1 with the value of dbo.T_many/col1.

USE tempdb

GO

IF OBJECT_ID('tempdb.dbo.T_One') IS NOT NULL DROP TABLE dbo.T_One;

IF OBJECT_ID('tempdb.dbo.T_Many') IS NOT NULL DROP TABLE dbo.T_Many;

GO

CREATE TABLE dbo.T_One

(

to_id int identity primary key,

col1 varchar(10)

);

CREATE TABLE dbo.T_Many

(

tm_id int identity primary key,

to_id int,

col1 varchar(10)

);

INSERT INTO dbo.T_One (col1)

VALUES ('xxx'),('xxx'),('xxx'),('xxx'),('xxx'),('xxx'),('xxx');

INSERT INTO dbo.T_Many (to_id, col1)

VALUES (1,'asd'),(1,'fgb'),(2,'ddd'),(3,'ddd'),(3,'ppp'),(3,'ooo'),

(4,'ddd'),(4,'zzz'),(5,'123'),(6,'vvv'),(7,'yyy');

UPDATE dbo.T_One

SET col1 = m.col1

FROM dbo.T_One o

JOIN dbo.T_Many m ON o.to_id = m.to_id;

SELECT * FROM T_One;

Because I'm doing an update based on a one-to-many join the optimizer is selecting the first match between each id and performing the update based on that.

Hopefully this helps you understand the problem with your update statement and why you are getting a DISTINCT sort. See attached query plan.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

lmarkum SSCertifiable Points: 7989 More actions · Answer 7

Alan,

In your example the SORT is happening right after the table is read and prior to the Nested Loops join, which makes sense given your example. In my situation the TempEmailTrackingSummary table and the EmailTrackingSummary table go into a Nested Loops inner join and then the distinct sort occurs after the tables are joined. The clustered index on TempEmailTrackingSummary is not unique. I will have to learn more about the data and how this table is used in order to know if it can be made a unique index.

Here is the DDL:

/****** Object: Table [dbo].[TempEmailTrackingSummary] Script Date: 6/21/2015 10:32:44 PM ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TempEmailTrackingSummary]') AND type in (N'U'))

BEGIN

CREATE TABLE [dbo].[TempEmailTrackingSummary](

[SendID] [int] NOT NULL,

[EmailName] [varchar](150) NULL,

[SentDate] [datetime] NULL,

[SendDate] [datetime] NULL,

[NumberSent] [int] NULL,

[NumberDelivered] [int] NULL,

[Duplicates] [int] NULL,

[UniqueClicks] [int] NULL,

[UniqueOpens] [int] NULL,

[HardBounces] [int] NULL,

[Unsubscribes] [int] NULL,

[ExistingUndeliverables] [int] NULL,

[ExistingUnsubscribes] [int] NULL,

[InvalidAddresses] [int] NULL,

[NumberExcluded] [int] NULL,

[NumberTargeted] [int] NULL,

[OtherBounces] [int] NULL,

[SoftBounces] [int] NULL,

[PreviewURL] [varchar](150) NULL

) ON [PRIMARY]

END

GO

SET ANSI_PADDING OFF

GO

CREATE CLUSTERED INDEX [ClusteredIndex-SendId] ON [dbo].[TempEmailTrackingSummary]

(

[SendID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

GO

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

SET ANSI_PADDING ON

GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[EmailTrackingSummary]') AND type in (N'U'))

BEGIN

CREATE TABLE [dbo].[EmailTrackingSummary](

[SendID] [int] NOT NULL,

[EmailName] [varchar](150) NULL,

[SentDate] [datetime] NULL,

[SendDate] [datetime] NULL,

[NumberSent] [int] NULL,

[NumberDelivered] [int] NULL,

[Duplicates] [int] NULL,

[UniqueClicks] [int] NULL,

[UniqueOpens] [int] NULL,

[HardBounces] [int] NULL,

[Unsubscribes] [int] NULL,

[ExistingUndeliverables] [int] NULL,

[ExistingUnsubscribes] [int] NULL,

[InvalidAddresses] [int] NULL,

[NumberExcluded] [int] NULL,

[NumberTargeted] [int] NULL,

[OtherBounces] [int] NULL,

[SoftBounces] [int] NULL,

[ts] [datetime] NULL,

[PreviewURL] [varchar](150) NULL,

[Reconciled] [bit] NULL DEFAULT ((0)),

[ReconciledCount] [int] NULL DEFAULT ((0)),

CONSTRAINT [PK_EmailTrackingSummary] PRIMARY KEY CLUSTERED

(

[SendID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

) ON [PRIMARY]

END

GO

SET ANSI_PADDING OFF

GO

SET ANSI_PADDING ON

GO

/****** Object: Index [IX_EmailTrackingSummary_EmailName] Script Date: 6/21/2015 10:01:02 PM ******/

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[EmailTrackingSummary]') AND name = N'IX_EmailTrackingSummary_EmailName')

CREATE NONCLUSTERED INDEX [IX_EmailTrackingSummary_EmailName] ON [dbo].[EmailTrackingSummary]

(

[EmailName] ASC

)

INCLUDE ( [NumberSent],

[UniqueClicks],

[UniqueOpens],

[HardBounces],

[Unsubscribes]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

GO

/****** Object: Index [IX_EmailTrackingSummary_Reconciled_ReconciledCount] Script Date: 6/21/2015 10:01:02 PM ******/

IF NOT EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[EmailTrackingSummary]') AND name = N'IX_EmailTrackingSummary_Reconciled_ReconciledCount')

CREATE NONCLUSTERED INDEX [IX_EmailTrackingSummary_Reconciled_ReconciledCount] ON [dbo].[EmailTrackingSummary]

(

[Reconciled] ASC,

[ReconciledCount] ASC

)

INCLUDE ( [SendID]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]

GO

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

lmarkum (6/19/2015)
I have an UPDATE statement that joins two table by SendId. One table, I'll call it T1, has a clustered index on SendId ASC. The other table I will call T2 also has a clustered index on SendID ASC. All the columns from T2 are used to update T1. The execution plan shows a Clustered index scan on T2 and a Clustered Index Seek on T1 going into a Nested Loops inner join. Immediately following is a Distinct Sort that is done on SendId ASC. Why the Distinct SORT if the tables are already ordered by SendID?

I think you've more or less answered your question with some encouragement from other folks. Whether you have or not, here's a demo which clearly shows the point - without a unique index on the source set, SQL Server can't determine if it contains dupes or not, so dedupes with a DISTINCT SORT.

DROP TABLE #a; DROP TABLE #b;

WITH

e1 AS (SELECT d.n FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0)) d (n)),

e2 AS (SELECT e1.n FROM e1 CROSS JOIN e1 b),

e4 AS (SELECT e2.n FROM e2 CROSS JOIN e2 b),

iT AS (SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM e4 CROSS JOIN e2 CROSS JOIN e1)

SELECT

n,

c1 = NEWID(),

c2 = DATEADD(millisecond,CHECKSUM(NEWID()),GETDATE()),

c3 = ABS(CHECKSUM(NEWID()))

INTO #a

FROM iT

CREATE CLUSTERED INDEX cx_Thing ON #a (n)

SELECT TOP 24 *

INTO #b

FROM #a

CREATE CLUSTERED INDEX cx_NotUnique ON #b (n)

-- Check the execution plan: DISTINCT SORT

UPDATE a SET

c1 = b.c1,

c2 = b.c2,

c3 = b.c3

FROM #a a

INNER JOIN #b b ON b.n = a.n

DROP INDEX cx_NotUnique ON #b

CREATE UNIQUE CLUSTERED INDEX cx_Unique ON #b (n)

-- Check the execution plan: NO SORT

UPDATE a SET

c1 = b.c1,

c2 = b.c2,

c3 = b.c3

FROM #a a

INNER JOIN #b b ON b.n = a.n

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden