Problem Returning Distinct Results

Question

Post reply

Problem Returning Distinct Results

steven.tryner

SSC Enthusiast

Points: 127
More actions
March 12, 2010 at 5:14 am

#103909

Hi
I have a table that contains amongst others 2 columns. One (COL1) is a unique code column that increments for each record and the other (COL2) is one that contains a link to another record in a table (there could be mulltiple of these that are the same).
What I need to do is return only one record for each COL2 value and this needs to be the one with the highest COL1 value. So this would allow for the latest record for each COL2 value to be returned.
I thought it may be as simple as something like:
Select DISTINCT(COL2) From Table1 Order By COL1 Desc
However this comes up with an error as the COL1 value needs to be in the DISTINCT, but if I put this in here it brings back everything.
So if I have the following values in the table
COL1 COL2
1 10
2 55
3 68
4 55
5 8
6 66
7 66
8 4
I would want it to return the rows
COL1 COL2
1 10
3 68
4 55
5 8
7 66
8 4
Thank
Steve

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

RBarryYoung SSC Guru Points: 143329 More actions · Answer 1

GROUP BY is the more general form of DISTINCT:

Select MAX(COL1) as COL1, COL2

From Table1

Group By COL2

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Paul White SSC Guru Points: 150467 More actions · Answer 2

Some people find this way of writing the query more intuitive:

Setup:

CREATE TABLE #Data

(

col1 INTEGER NOT NULL PRIMARY KEY,

col2 INTEGER NOT NULL,

);

GO

INSERT #Data (col1, col2) VALUES (1, 10);

INSERT #Data (col1, col2) VALUES (2, 55);

INSERT #Data (col1, col2) VALUES (3, 68);

INSERT #Data (col1, col2) VALUES (4, 55);

INSERT #Data (col1, col2) VALUES (5, 08);

INSERT #Data (col1, col2) VALUES (6, 66);

INSERT #Data (col1, col2) VALUES (7, 66);

INSERT #Data (col1, col2) VALUES (8, 04);

Solution:

SELECT D1.col1,

D1.col2

FROM #Data D1

WHERE col1 =

(

SELECT MAX(D2.col1)

FROM #Data D2

WHERE D2.col2 = D1.col2

)

ORDER BY

D1.col1;

GO

DROP TABLE #Data;

Results:

col1 col2

==== ====

3 68

7 66

4 55

1 10

5 8

8 4

Despite appearances, this implementation only reads from the source table once:

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 3

Hi Steven,

Is this what you need?

create table #t(col1 int, col2 int)

insert into #t values (1, 10)

insert into #t values (2, 55)

insert into #t values (3, 68)

insert into #t values (4, 55)

insert into #t values (5, 8)

insert into #t values (6, 66)

insert into #t values (7, 66)

insert into #t values (8, 4)

;with cte as

(

select col1, col2, row_number() over (partition by col2 order by col1 desc) 'rn'

from

#t

)

select col1, col2 from cte where rn = 1 order by col1

drop table #T

Alternately, you can use temporary table instead of CTE. Temporary tables are faster if your query returns 10k - 100k rows. I guess it is faster for more number of rows as well but i have not verified this.

https://sqlroadie.com/

Paul White SSC Guru Points: 150467 More actions · Answer 4

Arjun Sivadasan (3/15/2010)
Alternately, you can use temporary table instead of CTE. Temporary tables are faster if your query returns 10k - 100k rows

Are you sure about that? 😉

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 5

Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Ofcourse, with the addition of index) Paul.

I was wondering what was that thing about the sample data until i saw your previous reply.

Believe me, I wrote that without seeing your reply. Or I would not have bothered to change the names.

🙂

https://sqlroadie.com/

Paul White SSC Guru Points: 150467 More actions · Answer 6

Arjun Sivadasan (3/15/2010)
Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Of course, with the addition of index).

I would love to hear more about that.

Can you post a demo of a temporary table being faster than the ROW_NUMBER or Sequence Top methods?

Or a technical reference for it?

Just outline your method if you like - I will happily write the test rig for you.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 7

Paul White (3/15/2010)
Arjun Sivadasan (3/15/2010)
Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Of course, with the addition of index).
I would love to hear more about that.
Can you post a demo of a temporary table being faster than the ROW_NUMBER or Sequence Top methods?
Or a technical reference for it?
Just outline your method if you like - I will happily write the test rig for you.
Paul

Aww I didn't mean to say that temp table can replace ROW_NUMBER Paul. I am sorry for not being specific. A couple of extra words costs me two posts now. 🙂

I meant to say that Steven can use temp table instead of CTE (which i had used in the query). CTEs crawl when there are heavy where clauses and when the number or rows are greater than 10k.

https://sqlroadie.com/

Paul White SSC Guru Points: 150467 More actions · Answer 8

Arjun Sivadasan
I meant to say that Steven can use temp table instead of CTE (which i had used in the query). CTEs crawl when there are heavy where clauses and when the number or rows are greater than 10k.

Sorry, but I really want to nail this down. Bear with me.

Are you saying it is faster to copy to the rows to a temporary table, and add the ROW_NUMBER there, than it is to do it with a Common Table Expression?

Complex WHERE clauses and numbers of rows make no difference: only execution plans matter.

It seems as if you have been misled by a bad past experience and drawn an unsound conclusion.

Remember that a CTE is just a fancy derived table...

I am always happy to bust myths 😉

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 9

Sorry, but I really want to nail this down. Bear with me.
Are you saying it is faster to copy to the rows to a temporary table, and add the ROW_NUMBER there, than it is to do it with a Common Table Expression?

Sure Paul. I understand the purpose. I am saying from my experience in a particular scenario. I guess you are much better endowed than me to explain the why behind this. So, I will just explain what I did.

I had to retrieve say 'n' number of columns from a view and often joining with other views or tables. The number of columns and joins depend on certain configurations. So this one was a dynamic query. As in

declare @query nvarchar(max)

set @query = 'select a,b,c from dbo.tableabc' (dynamic to accommodate the configuration thingy)

execute sp_executesql @query

I had to join the output of this select query (which i had in a CTE) with other tables / views (didn't use row_number here).

When i replaced this CTE with a temp table and built an index on the PK column and did the joins, my query time was reduced by a great margin.

Complex WHERE clauses and numbers of rows make no difference: only execution plans matter.
It seems as if you have been misled by a bad past experience and drawn an unsound conclusion.
Remember that a CTE is just a fancy derived table...
I am always happy to bust myths 😉

You could be right here. I don't have the same query as I have changed my employer recently.

I had done one more thing which could have played its part in reducing the execution time.

Instead of

select a,b,c --the column list is built dynamically

from

dbo.tableabc tabc

join dbo.tabledef tdef

on tdef.c1 = tabc.c1 -- and so on

where

blah blah

I did

create table #temp(int PK)

insert into #temp

select PK

from dbo.tableabc tabc

join dbo.tabledef tdef

on tdef.c1 = tabc.c1

where blah blah

select a,b,c

from

#temp t

join dbo.tableabc tabc

on tabc.PK = t.PK

join dbo.tabledef tdef

on tdef.c1 = tabc.c1

I admit that I may have made an unsound conclusion. But tell me this, the fancy derived table that a CTE is, can you add an index to a CTE? I think I can prove that temp table will be faster in the above scenario. I will post some code some time soon.

Going back to the initial comment I made, I admit that in that particular scenario CTE or temp table may not make much of a difference.

PS: PK = primary key

This would have been much better if we discussed F2F. Phew!! 🙂

https://sqlroadie.com/

Paul White SSC Guru Points: 150467 More actions · Answer 10

Thanks for explaining Arjun - I can imagine the sort of query you are talking about.

To answer your question though: no - you cannot create an index on a CTE (or a derived table for that matter). Neither are 'real' in the sense that a temporary table or table variable is.

As a purely logical construction, the CTE can obviously use any useful index that happens to exist on the base tables. Given an optimal index for generating that ranking function's output, the ROW_NUMBER CTE/derived table method is hard to beat in the specific scenario that concerns this thread.

Hard to beat, but not impossible. APPLY/TOP can beat it for some data distributions, and so (believe it or not) can a recursive CTE under some rather specific circumstances.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Arjun Sivadasan SSCertifiable Points: 5902 More actions · Answer 11

Hey thanks for the time Paul. I will watch what I write here. I think it is important not to mislead other people who refer to the forums. I will do some performance testing on CTEs and temp tables in the same scenarios (with and without joins) and post the results. Again, thanks for the time.

https://sqlroadie.com/

Paul White SSC Guru Points: 150467 More actions · Answer 12

Thanks for being so reasonable about it! If you come across anything in your tests that you'd like my opinion on before posting, feel free to PM me.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi