March 12, 2010 at 5:14 am
Hi
I have a table that contains amongst others 2 columns. One (COL1) is a unique code column that increments for each record and the other (COL2) is one that contains a link to another record in a table (there could be mulltiple of these that are the same).
What I need to do is return only one record for each COL2 value and this needs to be the one with the highest COL1 value. So this would allow for the latest record for each COL2 value to be returned.
I thought it may be as simple as something like:
Select DISTINCT(COL2) From Table1 Order By COL1 Desc
However this comes up with an error as the COL1 value needs to be in the DISTINCT, but if I put this in here it brings back everything.
So if I have the following values in the table
COL1 COL2
1 10
2 55
3 68
4 55
5 8
6 66
7 66
8 4
I would want it to return the rows
COL1 COL2
1 10
3 68
4 55
5 8
7 66
8 4
Thank
Steve
March 12, 2010 at 5:31 am
GROUP BY is the more general form of DISTINCT:
Select MAX(COL1) as COL1, COL2
From Table1
Group By COL2
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 14, 2010 at 3:27 am
Some people find this way of writing the query more intuitive:
Setup:
CREATE TABLE #Data
(
col1 INTEGER NOT NULL PRIMARY KEY,
col2 INTEGER NOT NULL,
);
GO
INSERT #Data (col1, col2) VALUES (1, 10);
INSERT #Data (col1, col2) VALUES (2, 55);
INSERT #Data (col1, col2) VALUES (3, 68);
INSERT #Data (col1, col2) VALUES (4, 55);
INSERT #Data (col1, col2) VALUES (5, 08);
INSERT #Data (col1, col2) VALUES (6, 66);
INSERT #Data (col1, col2) VALUES (7, 66);
INSERT #Data (col1, col2) VALUES (8, 04);
Solution:
SELECT D1.col1,
D1.col2
FROM #Data D1
WHERE col1 =
(
SELECT MAX(D2.col1)
FROM #Data D2
WHERE D2.col2 = D1.col2
)
ORDER BY
D1.col1;
GO
DROP TABLE #Data;
Results:
col1 col2
==== ====
3 68
7 66
4 55
1 10
5 8
8 4
Despite appearances, this implementation only reads from the source table once:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 15, 2010 at 4:41 am
Hi Steven,
Is this what you need?
create table #t(col1 int, col2 int)
insert into #t values (1, 10)
insert into #t values (2, 55)
insert into #t values (3, 68)
insert into #t values (4, 55)
insert into #t values (5, 8)
insert into #t values (6, 66)
insert into #t values (7, 66)
insert into #t values (8, 4)
;with cte as
(
select col1, col2, row_number() over (partition by col2 order by col1 desc) 'rn'
from
#t
)
select col1, col2 from cte where rn = 1 order by col1
drop table #T
Alternately, you can use temporary table instead of CTE. Temporary tables are faster if your query returns 10k - 100k rows. I guess it is faster for more number of rows as well but i have not verified this.
https://sqlroadie.com/
March 15, 2010 at 4:56 am
Arjun Sivadasan (3/15/2010)
Alternately, you can use temporary table instead of CTE. Temporary tables are faster if your query returns 10k - 100k rows
Are you sure about that? 😉
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 15, 2010 at 5:05 am
Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Ofcourse, with the addition of index) Paul.
I was wondering what was that thing about the sample data until i saw your previous reply.
Believe me, I wrote that without seeing your reply. Or I would not have bothered to change the names.
🙂
https://sqlroadie.com/
March 15, 2010 at 5:20 am
Arjun Sivadasan (3/15/2010)
Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Of course, with the addition of index).
I would love to hear more about that.
Can you post a demo of a temporary table being faster than the ROW_NUMBER or Sequence Top methods?
Or a technical reference for it?
Just outline your method if you like - I will happily write the test rig for you.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 15, 2010 at 5:26 am
Paul White (3/15/2010)
Arjun Sivadasan (3/15/2010)
Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Of course, with the addition of index).I would love to hear more about that.
Can you post a demo of a temporary table being faster than the ROW_NUMBER or Sequence Top methods?
Or a technical reference for it?
Just outline your method if you like - I will happily write the test rig for you.
Paul
Aww I didn't mean to say that temp table can replace ROW_NUMBER Paul. I am sorry for not being specific. A couple of extra words costs me two posts now. 🙂
I meant to say that Steven can use temp table instead of CTE (which i had used in the query). CTEs crawl when there are heavy where clauses and when the number or rows are greater than 10k.
https://sqlroadie.com/
March 15, 2010 at 6:01 am
Arjun Sivadasan
I meant to say that Steven can use temp table instead of CTE (which i had used in the query). CTEs crawl when there are heavy where clauses and when the number or rows are greater than 10k.
Sorry, but I really want to nail this down. Bear with me.
Are you saying it is faster to copy to the rows to a temporary table, and add the ROW_NUMBER there, than it is to do it with a Common Table Expression?
Complex WHERE clauses and numbers of rows make no difference: only execution plans matter.
It seems as if you have been misled by a bad past experience and drawn an unsound conclusion.
Remember that a CTE is just a fancy derived table...
I am always happy to bust myths 😉
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 15, 2010 at 8:06 am
Sorry, but I really want to nail this down. Bear with me.
Are you saying it is faster to copy to the rows to a temporary table, and add the ROW_NUMBER there, than it is to do it with a Common Table Expression?
Sure Paul. I understand the purpose. I am saying from my experience in a particular scenario. I guess you are much better endowed than me to explain the why behind this. So, I will just explain what I did.
I had to retrieve say 'n' number of columns from a view and often joining with other views or tables. The number of columns and joins depend on certain configurations. So this one was a dynamic query. As in
declare @query nvarchar(max)
set @query = 'select a,b,c from dbo.tableabc' (dynamic to accommodate the configuration thingy)
execute sp_executesql @query
I had to join the output of this select query (which i had in a CTE) with other tables / views (didn't use row_number here).
When i replaced this CTE with a temp table and built an index on the PK column and did the joins, my query time was reduced by a great margin.
Complex WHERE clauses and numbers of rows make no difference: only execution plans matter.
It seems as if you have been misled by a bad past experience and drawn an unsound conclusion.
Remember that a CTE is just a fancy derived table...
I am always happy to bust myths 😉
You could be right here. I don't have the same query as I have changed my employer recently.
I had done one more thing which could have played its part in reducing the execution time.
Instead of
select a,b,c --the column list is built dynamically
from
dbo.tableabc tabc
join dbo.tabledef tdef
on tdef.c1 = tabc.c1 -- and so on
where
blah blah
I did
create table #temp(int PK)
insert into #temp
select PK
from dbo.tableabc tabc
join dbo.tabledef tdef
on tdef.c1 = tabc.c1
where blah blah
select a,b,c
from
#temp t
join dbo.tableabc tabc
on tabc.PK = t.PK
join dbo.tabledef tdef
on tdef.c1 = tabc.c1
I admit that I may have made an unsound conclusion. But tell me this, the fancy derived table that a CTE is, can you add an index to a CTE? I think I can prove that temp table will be faster in the above scenario. I will post some code some time soon.
Going back to the initial comment I made, I admit that in that particular scenario CTE or temp table may not make much of a difference.
PS: PK = primary key
This would have been much better if we discussed F2F. Phew!! 🙂
https://sqlroadie.com/
March 15, 2010 at 8:19 am
Thanks for explaining Arjun - I can imagine the sort of query you are talking about.
To answer your question though: no - you cannot create an index on a CTE (or a derived table for that matter). Neither are 'real' in the sense that a temporary table or table variable is.
As a purely logical construction, the CTE can obviously use any useful index that happens to exist on the base tables. Given an optimal index for generating that ranking function's output, the ROW_NUMBER CTE/derived table method is hard to beat in the specific scenario that concerns this thread.
Hard to beat, but not impossible. APPLY/TOP can beat it for some data distributions, and so (believe it or not) can a recursive CTE under some rather specific circumstances.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 15, 2010 at 8:27 am
Hey thanks for the time Paul. I will watch what I write here. I think it is important not to mislead other people who refer to the forums. I will do some performance testing on CTEs and temp tables in the same scenarios (with and without joins) and post the results. Again, thanks for the time.
https://sqlroadie.com/
March 15, 2010 at 8:32 am
Thanks for being so reasonable about it! If you come across anything in your tests that you'd like my opinion on before posting, feel free to PM me.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply