Problem Returning Distinct Results

  • Hi

    I have a table that contains amongst others 2 columns. One (COL1) is a unique code column that increments for each record and the other (COL2) is one that contains a link to another record in a table (there could be mulltiple of these that are the same).

    What I need to do is return only one record for each COL2 value and this needs to be the one with the highest COL1 value. So this would allow for the latest record for each COL2 value to be returned.

    I thought it may be as simple as something like:

    Select DISTINCT(COL2) From Table1 Order By COL1 Desc

    However this comes up with an error as the COL1 value needs to be in the DISTINCT, but if I put this in here it brings back everything.

    So if I have the following values in the table

    COL1 COL2

    1 10

    2 55

    3 68

    4 55

    5 8

    6 66

    7 66

    8 4

    I would want it to return the rows

    COL1 COL2

    1 10

    3 68

    4 55

    5 8

    7 66

    8 4

    Thank

    Steve

  • GROUP BY is the more general form of DISTINCT:

    Select MAX(COL1) as COL1, COL2

    From Table1

    Group By COL2

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Some people find this way of writing the query more intuitive:

    Setup:

    CREATE TABLE #Data

    (

    col1 INTEGER NOT NULL PRIMARY KEY,

    col2 INTEGER NOT NULL,

    );

    GO

    INSERT #Data (col1, col2) VALUES (1, 10);

    INSERT #Data (col1, col2) VALUES (2, 55);

    INSERT #Data (col1, col2) VALUES (3, 68);

    INSERT #Data (col1, col2) VALUES (4, 55);

    INSERT #Data (col1, col2) VALUES (5, 08);

    INSERT #Data (col1, col2) VALUES (6, 66);

    INSERT #Data (col1, col2) VALUES (7, 66);

    INSERT #Data (col1, col2) VALUES (8, 04);

    Solution:

    SELECT D1.col1,

    D1.col2

    FROM #Data D1

    WHERE col1 =

    (

    SELECT MAX(D2.col1)

    FROM #Data D2

    WHERE D2.col2 = D1.col2

    )

    ORDER BY

    D1.col1;

    GO

    DROP TABLE #Data;

    Results:

    col1 col2

    ==== ====

    3 68

    7 66

    4 55

    1 10

    5 8

    8 4

    Despite appearances, this implementation only reads from the source table once:

  • Hi Steven,

    Is this what you need?

    create table #t(col1 int, col2 int)

    insert into #t values (1, 10)

    insert into #t values (2, 55)

    insert into #t values (3, 68)

    insert into #t values (4, 55)

    insert into #t values (5, 8)

    insert into #t values (6, 66)

    insert into #t values (7, 66)

    insert into #t values (8, 4)

    ;with cte as

    (

    select col1, col2, row_number() over (partition by col2 order by col1 desc) 'rn'

    from

    #t

    )

    select col1, col2 from cte where rn = 1 order by col1

    drop table #T

    Alternately, you can use temporary table instead of CTE. Temporary tables are faster if your query returns 10k - 100k rows. I guess it is faster for more number of rows as well but i have not verified this.

    https://sqlroadie.com/

  • Arjun Sivadasan (3/15/2010)


    Alternately, you can use temporary table instead of CTE. Temporary tables are faster if your query returns 10k - 100k rows

    Are you sure about that? 😉

  • Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Ofcourse, with the addition of index) Paul.

    I was wondering what was that thing about the sample data until i saw your previous reply.

    Believe me, I wrote that without seeing your reply. Or I would not have bothered to change the names.

    🙂

    https://sqlroadie.com/

  • Arjun Sivadasan (3/15/2010)


    Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Of course, with the addition of index).

    I would love to hear more about that.

    Can you post a demo of a temporary table being faster than the ROW_NUMBER or Sequence Top methods?

    Or a technical reference for it?

    Just outline your method if you like - I will happily write the test rig for you.

    Paul

  • Paul White (3/15/2010)


    Arjun Sivadasan (3/15/2010)


    Ya I have tried that (the temp table approach on queries that return ~40k rows with performance improvement from 10 to 2 secs. Of course, with the addition of index).

    I would love to hear more about that.

    Can you post a demo of a temporary table being faster than the ROW_NUMBER or Sequence Top methods?

    Or a technical reference for it?

    Just outline your method if you like - I will happily write the test rig for you.

    Paul

    Aww I didn't mean to say that temp table can replace ROW_NUMBER Paul. I am sorry for not being specific. A couple of extra words costs me two posts now. 🙂

    I meant to say that Steven can use temp table instead of CTE (which i had used in the query). CTEs crawl when there are heavy where clauses and when the number or rows are greater than 10k.

    https://sqlroadie.com/

  • Arjun Sivadasan


    I meant to say that Steven can use temp table instead of CTE (which i had used in the query). CTEs crawl when there are heavy where clauses and when the number or rows are greater than 10k.

    Sorry, but I really want to nail this down. Bear with me.

    Are you saying it is faster to copy to the rows to a temporary table, and add the ROW_NUMBER there, than it is to do it with a Common Table Expression?

    Complex WHERE clauses and numbers of rows make no difference: only execution plans matter.

    It seems as if you have been misled by a bad past experience and drawn an unsound conclusion.

    Remember that a CTE is just a fancy derived table...

    I am always happy to bust myths 😉

  • Sorry, but I really want to nail this down. Bear with me.

    Are you saying it is faster to copy to the rows to a temporary table, and add the ROW_NUMBER there, than it is to do it with a Common Table Expression?

    Sure Paul. I understand the purpose. I am saying from my experience in a particular scenario. I guess you are much better endowed than me to explain the why behind this. So, I will just explain what I did.

    I had to retrieve say 'n' number of columns from a view and often joining with other views or tables. The number of columns and joins depend on certain configurations. So this one was a dynamic query. As in

    declare @query nvarchar(max)

    set @query = 'select a,b,c from dbo.tableabc' (dynamic to accommodate the configuration thingy)

    execute sp_executesql @query

    I had to join the output of this select query (which i had in a CTE) with other tables / views (didn't use row_number here).

    When i replaced this CTE with a temp table and built an index on the PK column and did the joins, my query time was reduced by a great margin.

    Complex WHERE clauses and numbers of rows make no difference: only execution plans matter.

    It seems as if you have been misled by a bad past experience and drawn an unsound conclusion.

    Remember that a CTE is just a fancy derived table...

    I am always happy to bust myths 😉

    You could be right here. I don't have the same query as I have changed my employer recently.

    I had done one more thing which could have played its part in reducing the execution time.

    Instead of

    select a,b,c --the column list is built dynamically

    from

    dbo.tableabc tabc

    join dbo.tabledef tdef

    on tdef.c1 = tabc.c1 -- and so on

    where

    blah blah

    I did

    create table #temp(int PK)

    insert into #temp

    select PK

    from dbo.tableabc tabc

    join dbo.tabledef tdef

    on tdef.c1 = tabc.c1

    where blah blah

    select a,b,c

    from

    #temp t

    join dbo.tableabc tabc

    on tabc.PK = t.PK

    join dbo.tabledef tdef

    on tdef.c1 = tabc.c1

    I admit that I may have made an unsound conclusion. But tell me this, the fancy derived table that a CTE is, can you add an index to a CTE? I think I can prove that temp table will be faster in the above scenario. I will post some code some time soon.

    Going back to the initial comment I made, I admit that in that particular scenario CTE or temp table may not make much of a difference.

    PS: PK = primary key

    This would have been much better if we discussed F2F. Phew!! 🙂

    https://sqlroadie.com/

  • Thanks for explaining Arjun - I can imagine the sort of query you are talking about.

    To answer your question though: no - you cannot create an index on a CTE (or a derived table for that matter). Neither are 'real' in the sense that a temporary table or table variable is.

    As a purely logical construction, the CTE can obviously use any useful index that happens to exist on the base tables. Given an optimal index for generating that ranking function's output, the ROW_NUMBER CTE/derived table method is hard to beat in the specific scenario that concerns this thread.

    Hard to beat, but not impossible. APPLY/TOP can beat it for some data distributions, and so (believe it or not) can a recursive CTE under some rather specific circumstances.

    Paul

  • Hey thanks for the time Paul. I will watch what I write here. I think it is important not to mislead other people who refer to the forums. I will do some performance testing on CTEs and temp tables in the same scenarios (with and without joins) and post the results. Again, thanks for the time.

    https://sqlroadie.com/

  • Thanks for being so reasonable about it! If you come across anything in your tests that you'd like my opinion on before posting, feel free to PM me.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply