using distinct

  • I'm looking at some other developer's code in stored procedures.

    I see she used distinct in the select statement in quite a few of the queries.

    the select queries related with 2-3 table joins.

    I run those queries and use distinct or not use it -returns the same numbers of records.

    I guess she may think it is safe to add distinct to remove duplicated records. I will probably do so too.

    My question is: does using distinct or not affect the performance of the query? does it returns the same execution plan?

    I tested her queries, it excutes using the same much time.

    But for my question , I mean in generic, does distinct affect excution time or not?

    Thanks

  • I have seen it go both ways. It depends on the query and how the results are being displayed or not displayed.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Asking for distinct values requires SQL server to do an additional task. Not only does it need to find and return all data matching the specified criteria, it now also needs to see if any duplicates are in the results and if found these need to be filtered out. So from that point of view asking for a distinct result set is always going to ask a bigger effort from your server, i.e. going to be slower.

    However, if you've got a lot of duplicates in your result set, the entire result set may be a significantly bigger than only the unique values in there. i.e. By the sheer size of the data being processed the distinct set may be returned to you faster than the entire set. Thus making the distinct query look "faster".

    Yet an other factor is the optimizer. If this can somehow determine that the result set you requested is always going to be unique, it will not add the code for filtering out distinct values, thereby such a query will show no performance difference with or without the distinct. And even if the filtering code needs to be added, the optimizer may see a way to do it very early on in the data collection process, thus resulting in very little (maybe even close to unnoticeable) performance loss.

    So in general: yes, adding distinct costs performance. However the amount of data needing to be processed/transferred has such a big influence on the overall performance of your query that by reducing the amount of data to be processed the net result of the distinct query can be any magnitude better than that of the non-distinct query. It is again a matter of "it depends".



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • DISTINCT is an aggregation. Will it always increase execution time? No, but it frequently does. It's usually put in as a crutch because someone doesn't know how to properly write TSQL code so their JOINs or WHERE clauses create duplicate rows and the aggregation (effectively a GROUP BY operation) eliminates duplicates. Fixing the TSQL in these cases is an imperative and usually results in faster execution. Sometimes it's because of bad structures that have ill-defined JOIN criteria or don't enforce referential integrity so that there are actual duplicates. In these cases you can't remove the DISTINCT until you fix the data & structure problems.

    DISTINCT should only be present when you know precisely why you're using it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • +1

    In your example DISTINCT is used as a check to make sure that the output returns only a unique result , I have seen code where developers have put in top 1 in the same fashion. This is not the way the DISTINCT and TOP Keyword are to be used.

    Test the code with data and validate your getting the correct results instead of putting int checks like DISTINCT etc.

    Your goal should always be to get exactly those records you want.

    Jayanth Kurup[/url]

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply