May 4, 2012 at 10:51 pm
Jeff Moden (5/4/2012)
I think the extra sorts are absolutely worth it, though. They get rid of the accidental cross join, nearly an order of magnitude of reads, and runs several orders of magnitude faster according to priofiler.
It depends, as always (e.g. the sorts might spill to disk). Anyway, the point I'm trying to make here is that the optimizer has limitations which mean it does not choose as good an execution plan as it might here. Aside from anything else, things like ROW_NUMBER are not relational, and the optimizer reasons about things primarily in relational terms. ROW_NUMBER does not produce a set (strictly, a multi-set) it produces a sequence. Adding the row number column (projecting a new column in the jargon) means projecting a sequence; hence the Sequence Project operator in the plan. More limitations mean the common table expression is evaluated, and rows numbered, twice. Yet more mean the merge join has to account for the possibility of duplicates, and it runs in many-to-many mode with a needless work table.
Often, it is possible to give the optimizer better information, so it will produce good plans now and in the future when the number of rows and distribution changes. One way to do this is to materialize the CTE:
CREATE TABLE #Materialized
(
sortOrder bigint PRIMARY KEY CLUSTERED,
idLigne character (2) NOT NULL,
id integer NOT NULL,
statut character(1) NOT NULL,
);
INSERT #Materialized
(
sortOrder,
idLigne,
id,
statut
)
SELECT
sortOrder = ROW_NUMBER() OVER (ORDER BY mt.idLigne),
mt.idLigne,
mt.id,
mt.statut
FROM
dbo.MyTest AS mt;
-- Pretend there are a few more rows
UPDATE STATISTICS #Materialized WITH ROWCOUNT = 50, PAGECOUNT = 1;
-- One to many merge
SELECT
this.idLigne,
this.id,
this.statut
FROM #Materialized AS this
WHERE
NOT EXISTS
(
SELECT 1
FROM #Materialized AS nxt
WHERE
nxt.sortOrder = this.sortOrder + 1
AND nxt.statut = this.statut
);
DROP TABLE #Materialized;
May 4, 2012 at 11:09 pm
Jeff Moden (5/4/2012)
To continue, both queries use a "Work Table". The one with the MERGE hint uses it a whole lot less, though.
Sure. The work table in the loops join case is for the spool, the one in the merge join case is for the logical possibility of duplicates (the many-to-many part) but is obviously never actually required (though it is still created).
May 5, 2012 at 2:53 pm
SQL Kiwi (5/4/2012)
One way to do this is to materialize the CTE
I'm definitely a fan of that especially when a CTE is referenced more than once.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 5, 2012 at 7:47 pm
Jeff Moden (5/5/2012)
SQL Kiwi (5/4/2012)
One way to do this is to materialize the CTEI'm definitely a fan of that especially when a CTE is referenced more than once.
I think that is one thing Oracle does right with their subquery refactoring clause (aka, SQL Servers cte). It can either execute it inline or use it as a temporary table.
May 5, 2012 at 7:54 pm
Lynn Pettis (5/5/2012)
Jeff Moden (5/5/2012)
SQL Kiwi (5/4/2012)
One way to do this is to materialize the CTEI'm definitely a fan of that especially when a CTE is referenced more than once.
I think that is one thing Oracle does right with their subquery refactoring clause (aka, SQL Servers cte). It can either execute it inline or use it as a temporary table.
Have you voted for the Connect item?
May 6, 2012 at 10:00 am
I have, indeed.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 6, 2012 at 10:42 am
So have I.
May 7, 2012 at 3:05 am
Hello everyone. I'm sorry if I took time to answer you I was not home this weekend.
Thank you for your response ! your proposals are very interesting especially the solution without creating and dropping a table.
I gave you an example too simple to understand the real problem seeing this example:
I have the table "TableSorted"
idLigne | id | stat | datetime
1 1 A 1/01/2011 14:00:00
2 1 A 1/01/2011 14:00:05
3 1 A 1/01/2011 15:00:00
4 1 B 1/01/2011 16:00:00
5 1 A 1/01/2011 17:00:00
6 1 B 1/01/2011 18:00:00
7 1 B 1/01/2011 19:00:00
8 2 A 1/01/2011 12:00:00
9 2 A 1/01/2011 12:30:00
10 2 A 2/01/2011 11:00:00
11 2 A 2/01/2011 11:10:00
12 2 B 2/01/2011 15:00:00
and here is the desired result:
idLigne | id | stat | datetime
3 1 A 1/01/2011 15:00:00
4 1 B 1/01/2011 16:00:00
5 1 A 1/01/2011 17:00:00
7 1 B 1/01/2011 19:00:00
9 2 A 1/01/2011 12:30:00
11 2 A 2/01/2011 11:10:00
12 2 B 2/01/2011 15:00:00
I opt finally to use of the following query that i found simple :
with transformed as (
select idLigne, id,
stat,datetime,
row_number () over (order by idLigne) rn
from TableSorted)
select TableSorted.*
from TableSorted
inner join transformed t1
on TableSorted.idLigne = t1.idLigne
left join transformed t2
on t1.rn = t2.rn - 1
and t1.stat = t2.stat and t1.id = t2.id and year(t1.datetime) = year(t2.datetime) and month(t1.datetime) = month(t2.datetime) and day(t1.datetime) = day(t2.datetime)
where t2.rn is null
Viewing 8 posts - 16 through 22 (of 22 total)
You must be logged in to reply to this topic. Login to reply