OUTPUT clause returning the wrong row?

Question

OUTPUT clause returning the wrong row?

g.britton

SSChampion

Points: 13863
More actions
November 6, 2014 at 7:44 am

#298210

I'm looking at various methods for deleting duplicate rows. Among the alternatives, one works just fine but gives me results that make me go, "huh?".
Consider this script:
declare @t table (a int, b int, c int, d int, e int)
insert into @t (a, b, c, d, e) values
(1, 2, 3, 4, 5),
(3, 4, 2, 3, 4),
(1, 2, 3, 4, 5)
select a,b,c,d,e, rn = row_number() over (
partition by a,b,c,d,e
order by a,b,c,d,e)
from @t;
with cte as (
select a,b,c,d,e, rn = row_number() over (
partition by a,b,c,d,e
order by a,b,c,d,e)
from @t
)
delete
from cte
output deleted.*
where rn > 1;
The code works -- that is, the duplicate row is deleted. However the output clause returns:
abcdern
123451
So....why? Why does the output clause show that the row with rn=1 was deleted, when the where clause stipulates rn > 1?
Gerald Britton, Pluralsight courses

Viewing 11 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247224 More actions · Answer 1

I presume this is just an 'out of interest' type question, because when you're deleting exact duplicates from a heap, there is no real row number? You could not use this information to relate back to the original data and match the duplicate rows one to one.

If you add another dupe row, you'll see that the deleted row numbers are 1 and 2. So it seems that the SQL engine removes the correct rows and then somehow loses track of the row which remains. Looks a bit strange, but not really a problem.

g.britton SSChampion Points: 13863 More actions · Answer 2

Phil Parkin (11/6/2014)
I presume this is just an 'out of interest' type question, because when you're deleting exact duplicates from a heap, there is no real row number?

The row number is generated by the ROW_NUMBER() function and the sort order is specified. AFAIK there is no "real" row number in a clustered index either. The closest you get is the CI key which maps to RID which is just a file/page/rowid.

Even if I change the table definition to:

declare @t table (id int identity(1,1) primary key, a int, b int, c int, d int, e int)

to make it a CI, the output clause still returns incorrect results.

Gerald Britton, Pluralsight courses

Phil Parkin SSC Guru Points: 247224 More actions · Answer 3

g.britton (11/6/2014)
Phil Parkin (11/6/2014)
I presume this is just an 'out of interest' type question, because when you're deleting exact duplicates from a heap, there is no real row number?
The row number is generated by the ROW_NUMBER() function and the sort order is specified. AFAIK there is no "real" row number in a clustered index either. The closest you get is the CI key which maps to RID which is just a file/page/rowid.
Even if I change the table definition to:
declare @t table (id int identity(1,1) primary key, a int, b int, c int, d int, e int)
to make it a CI, the output clause still returns incorrect results.

Well, using your revised table definition, the delete should be done like this:

delete t

output deleted.*

from @t t

join cte on t.Id = cte.id

where rn > 1;

And everything works.

ScottPletcher SSC Guru Points: 101235 More actions · Answer 4

My guess is it's just some quirk with the output clause and ROW_NUMBER(), because the expected row gets kept even when the OUTPUT seems to indicate otherwise:

declare @t table (a int, b int, c int, d int, e int, f int)

insert into @t (a, b, c, d, e, f) values

(1, 2, 3, 4, 5, 99),

(3, 4, 2, 3, 4, 98),

(1, 2, 3, 4, 5, 96),

(1, 2, 3, 4, 5, 97)

select a,b,c,d,e, rn = row_number() over (

partition by a,b,c,d,e

order by a,b,c,d,e)

from @t;

with cte as (

select a,b,c,d,e, rn = row_number() over (

partition by a,b,c,d,e

order by a,b,c,d,e,f)

from @t

)

delete

from cte

output deleted.*

where rn > 1;

select * from @t

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

g.britton SSChampion Points: 13863 More actions · Answer 5

well, isn't that interesting!

Don't know about the "should be done" bit though. My version doesn't need a self-join. It's just weird that SQL loses track of what it just deleted.

Gerald Britton, Pluralsight courses

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

g.britton (11/6/2014)
well, isn't that interesting!
Don't know about the "should be done" bit though. My version doesn't need a self-join. It's just weird that SQL loses track of what it just deleted.

You're missing something.

The issue has to do with using the row_number() against the DELETED "magic table". If you were to add in the ID number like you mentioned and control which one to delete you will see it's wiping out the expected row. That said the ROW_NUMBER value isn't a physical value in the table, so DELETED doesn't have it to return. As a result the query attempt to provide based on what it knows from the CTE, but only find one row (in the delete), ergo RN is 1.

Confirm the results this way:

declare @t table (a int, b int, c int, d int, e int, id int identity(1,1))--<-- adding in the identity

insert into @t (a, b, c, d, e) values

(1, 2, 3, 4, 5),

(3, 4, 2, 3, 4),

(1, 2, 3, 4, 5)

select a,b,c,d,e,ID, row_number() over (

partition by a,b,c,d,e

order by a,b,c,d,e, ID) as RN --<-- making SURE that the "second" duplicate gets deleted

from @t;

with cte as (

select a,b,c,d,e,ID, row_number() over (

partition by a,b,c,d,e

order by a,b,c,d,e, ID) as RN

from @t

)

delete

from cte

output deleted.*

where rn > 1;

You will see that ID = 3 is the one that is gone. The ID is reliable since it's a physical column in the table which is what exists in the DELETED pseudo table

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 7

Quick thought, there are actually two row_number instances in the plan, one used for the delete and another for the output. As only one row is outputted, the row number is of course 1.

😎

[Table Scan]-->[Sort]--[Segment]-->[Sequence Project]-->[Filter]-->[Table Delete]-->[Sort]-->[Segment]-->[Sequence Project]-->[DELETE]

|<---------- 1st row number -------->| delete |<--------- 2nd row number ---------->| output

g.britton SSChampion Points: 13863 More actions · Answer 8

Good observation, though the output becomes misleading. I never doubted that the "right" row was deleted. I just found the results from the output clause to be confusing. I've lowered that to "misleading" 🙂

Gerald Britton, Pluralsight courses

Gazareth One Orange Chip Points: 27737 More actions · Answer 9

I actually encountered this exact behaviour last weekend. Got me a bit worried!

My explanation would be that the CTE is never materialised as such, so the rn value is never stored, and the ROW_NUMBER() clause is re-evaluated in the deleted table.

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 10

Gazareth (11/6/2014)
I actually encountered this exact behaviour last weekend. Got me a bit worried!
My explanation would be that the CTE is never materialised as such, so the rn value is never stored, and the ROW_NUMBER() clause is re-evaluated in the deleted table.

That's in a way correct, there are two row number (segment-sequence project) operators in the execution plan, one for the delete and one for the output.

😎