SELECT Statement Question

  • Hi GSquared,

    Thank you for your input. I am currently working with a tutor on the finer points before I go fo certification.

    She and I were discussing this very same keyword and its results.

    Per MSDN site for Transact-SQL, it says:

    "The DISTINCT keyword eliminates duplicate rows from the results of a SELECT statement."

    As I am still in training mode, right now I cannot try the query both ways.

    Would you please elaborate on why DISTINCT doesn't apply to this query, thanks.

  • Hi GSquared,

    Thank you for your input. I am currently working with a tutor on the finer points before I go fo certification.

    She and I were discussing this very same keyword and its results.

    Per MSDN site for Transact-SQL, it says:

    "The DISTINCT keyword eliminates duplicate rows from the results of a SELECT statement."

    As I am still in training mode, right now I cannot try the query both ways.

    Would you please elaborate on why DISTINCT doesn't apply to this query, thanks.

    This creates additional overhead with no benefit because it does not matter whether or not 1 or 10 of the same value exist in the in clause. It only matters that one exists.

    For example:

    declare @t table(

    ID int,

    txt varchar(25)

    )

    insert into @t

    select 1, 'test' union all

    select 1, 'test2' union all

    select 2, 'test3' union all

    select 3, 'test4'

    select *

    from @t

    WHERE ID IN (SELECT DISTINCT ID from @t Where ID < 3)

    --i look look like this

    --where id in (1,2)

    select *

    from @t

    WHERE ID IN (SELECT ID from @t Where ID < 3)

    --i look look like this

    --where id in (1,1,2)

    The main thing to note here is that both queries returned the same records and the second did not incur an additional overhead because distinct was not used. It may seem minuscule in this example but on a table with lots of records using distinct can degrade performance.

  • robertafricker (3/11/2008)


    Hi GSquared,

    Thank you for your input. I am currently working with a tutor on the finer points before I go fo certification.

    She and I were discussing this very same keyword and its results.

    Per MSDN site for Transact-SQL, it says:

    "The DISTINCT keyword eliminates duplicate rows from the results of a SELECT statement."

    As I am still in training mode, right now I cannot try the query both ways.

    Would you please elaborate on why DISTINCT doesn't apply to this query, thanks.

    The theory behind it is a little complex unless you're comfortable with set-based mathematics, but it can be explained simply this way.

    The query after the "IN" statement is simply a list of allowed values for the main part.

    If I said, "you can have 5 apples, or 5 apples, or 5 apples, or 4 oranges", it's the same thing as, "you can have 5 apples, or 4 oranges". Right?

    "Distinct" is used to tell the database, "just give me the options that aren't duplicates of each other". In other words, it goes through and gets rid of the extra copies of "5 apples".

    That's a useful thing in the final query if the final query will be seen by a human being and will help them understand what they are looking at.

    But the database is the only thing that ever sees the results of the sub-query (the query in parenthesis after "IN" is a sub-query). It doesn't get confused by the multiple copies of "5 apples". But, if you tell it "distinct" in that case, then it has to do the work of removing the extra ones. So, in a sub-query like that, if you use "distinct", you're telling the database to do more work, which it will do, but it doesn't actually gain you anything. On a small query in a small database, it won't matter. In a larger database, with thousands of users and processes, every little bit counts.

    In your particular query, you're looking for all of the categories that are named "beverage" in the sub-query, and then the outer query does it's stuff and limits itself based on the sub-query. If, for example, there were ten category IDs that were all called "beverage", and all ten were the same, it wouldn't matter in the slightest to the database, and the results of the final query would be the same either way (with or without "distinct"). The only difference is that "distinct" tells the database to do the extra work of removing the duplicates where it doesn't need to.

    Does that help?

    (Forgive me if I'm over-explaining or telling you stuff you already know. I'm trying to make this as easy to understand as possible and I might overshoot that mark by a bit.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Great jobs with the explanations you guys!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi GSquared,

    Thank you for the more visual explanation (apples) of the DISTINCT clause.

    I definitely understand performance issues and how adding any additional search criteria will slow the DB down.

    Seeing the result set in a table form would make it easier.

    I am going to send some queries with DISTINCT in them as well as some others that I have questions about.

    Please keep an eye out, and thanks again for your expertise...:cool:

  • Roberta (or is it Robert A?):

    One thing I've found very useful over the years is to create a testing grounds database. I call mine ProofOfConcept.

    He's something you can do if it will help you to understand this. You want some actual tables you can look at:

    use proofofconcept

    go

    create table #Test1 (

    ID int identity primary key,

    Name varchar(25),

    CategoryID tinyint);

    insert into #test1 (name, categoryid)

    select 'Joe', 1 union all

    select 'Bob', 1 union all

    select 'Sam', 2 union all

    select 'Steve', 3 union all

    select 'Doug', 4

    select name

    from #test1

    where categoryid in (1, 1)

    select name

    from #test1

    where categoryid in (1)

    select name

    from #test1

    where categoryid in (1, 1, 2, 3)

    select name

    from #test1

    where categoryid in (1, 2, 3)

    If you run it, you'll see that the first two selects return the same results, and the second two selects return the same results, and it's pretty obvious when you read the queries.

    Think of a sub-query as a way to have SQL fill in the list of values, instead of you typing "(1, 2, 3)", then try the following.

    create table #Test2 (

    Category tinyint)

    insert into #test2 (category)

    select 1 union all

    select 1 union all -- notice the duplicate

    select 2 union all

    select 3 union all

    select 4

    select category -- will show the duplicate

    from #test2

    select distinct category -- won't show the duplicate

    from #test2

    select name

    from #test1

    where categoryid in

    (select category -- with duplicate

    from #test2)

    select name

    from #test1

    where categoryid in

    (select distinct category -- without duplicate

    from #test2)

    Try running this code in Management Studio, so you can see it and see how it works. You'll notice that the last two selects get the exact same results.

    Whenever I run into something new and interesting on these forums (or other SQL web pages, etc.), I try it out in my ProofOfConcept database. It makes it much easier to understand when I can actually run the code, where it can't possibly mess up a server or real database.

    With these tests, you can actually see the results very easily. The tables are simple enough and have few enough rows to get a better grip on it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi GSQuared,

    Yes, it's Roiberta, the feminine version....; )

    Besides that, I agree, that "seeing is believing."

    I do have SQL Express Edition on my PC, so I will try

    what you have provided.

    That's part of my situation, I need to be actually

    executing the queries and reviewing the results.

    Thanks again, I'll follow up with you on the results.

  • If you're serious about getting into SQL Server, I recommend getting a copy of SQL Server 2005 Developer Edition. It's about $50 and gives you much more power and more tools and features than Express. Some things you may want to try out might not work in Express. Everything works in Dev.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • And, on your name: Pleased to meet you. I'm Gus, but use GSquared on as many of these forums as I can (it's a joke that goes back to high school in the 80s).

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi Adam,

    Thank you for your code samples, I have copied them into a Word doc to review. It's going to take a little time for me to really look it over.

    I do understand that DISTINCT is a performance issue, and does not return duplicates.

    Therefore, my next question would be WHEN would it be appropriate to use it?

    Let me know, thanks!

  • Hi Gus, (and I'm Roberta not Roiberta as I mistyped-!)

    I still like your tag name, I'm sure there's a company out there with that same name. ( I think in Seattle where I recently moved from).

    Yes, I am definitely serious about SQL Server. The online course I just finished really separates the wheat from the chaff shall we say...;)...meaning, you wouldn't finish it if you weren't committed.

    As I think I mentioned in this topic, I am working with a tutor who works for Microsoft as well, in addition to utilizing this forum. You just can't get TMI especially as a Newbie. And you know the old adage, 2 heads are better than....etc....

    I'm copying your code in a word doc for the near future. My hands are currently full studying 200 pages of notes and working with the tutor and you all as well.

    I'm getting ready to look at your other replies on my other queries, thanks for your eternal wizdom.

  • robertafricker (3/12/2008)


    Hi Adam,

    Thank you for your code samples, I have copied them into a Word doc to review. It's going to take a little time for me to really look it over.

    I do understand that DISTINCT is a performance issue, and does not return duplicates.

    Therefore, my next question would be WHEN would it be appropriate to use it?

    Let me know, thanks!

    Distinct is appropriate to use when the data is being presented to a human being, and removing duplicate rows will increase ease of use and speed of comprehension.

    It can also be used correctly in circumstances where you absolutely don't want duplicates.

    For example, if you are populating a temporary table with first names out of a customers table, and you want the first name to be the primary key, you'll definitely want to use Distinct in that case.

    Basically, the main thing to keep in mind is when to not use it, which is mainly in sub-queries only the database ever sees, or in places where the data is going to be unique already. For example, if you are selecting a list, and the select includes the primary key, each row is already going to be unique, so Distinct would, again, be a waste of time.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Roberta,

    I'd like to add that the code samples using Northwind might be intended to illustrate how to do something, but not necessarily the best or most efficient way to do it.

  • Hi William,

    Thank you for the tip. I had another member say the same thing and suggest purchasing the Developer edition for about $50 which I will do.

    Right now, I am studying up a storm, so it's one thing at a time. Thanks again.

    -Roberta-

  • Roberta,

    Welcome to the SSC Community!

    As you can see from the deluge of responses, a lot of folk are more than willing to work with people like you that are putting in the time and energy to make yourself understand the "basics".

    When and if you decide to, you can order a copy of Microsoft SQL Server Developer Edition 2005 Win32 X64/IA64 English CD/DVD for $49.95 (U.S. Dollars) Item: E32-00575 here: http://www.microsoft.com/products/info/product.aspx?view=22&pcid=f544888c-2638-48ed-9f0f-d814e8b93ca0&type=ovr

    I look forward to your future postings!

    Happy T-SQL'ing,

    "Key"
    MCITP: DBA, MCSE, MCTS: SQL 2005, OCP

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply