Select data but exclude Duplicates

  • Hi all (SQL Server 2000)

    Can someone help me with the following problem.

    table (tblSiblings) structure as follows:

    intSiblingID,

    intStudent1ID

    intStudent2ID

     

    Data looks like the following

    1     student 1    student 2

    2     student 2    student 1

    3     student 3    student 4

    4     student 4    student 3

    5     student x    student y

    6     student y    student x 

    This was set up by a previous developer. I want to extract 1 of every unique relationship in a select statement. At present, every (as far as I am aware) relationship is set up twice (the second version being inverted). I don't care if I select the normal version or the inverted version, I just need one of each relationship eg  1 to 2, 3 to 4, x to y.

    Any suggestions on how to code this. I was thinkinbg of a not in sub-select but can't get this to work.

    any help appreciated.

    thanks

    CCB

  • With the data you provided, this should work:

    SELECT x.intStudent1ID, x.intStudent2ID

    FROM tblSiblings x

    INNER JOIN tblSiblings y

      ON x.intStudent1ID = y.intStudent2ID

     AND x.intSiblingID < y.intSiblingID

  • Perfect.

    Thanks for the quick reply

    CCb.

  • You may try the following code:

    select S1.intSiblingID,

     S1.intStudent1ID,

     S1.intStudent2ID

    from #stud S1 inner join #stud S2 on

     S1.intStudent1ID = S2.intStudent2ID

     and S1.intStudent2ID = S2.intStudent1ID

     and S1.intSiblingID < S2.intSiblingID

    order by S1.intSiblingID

    But it will work for those data rows that *have* duplicates

  • Maybe you could add a little check constraint to avoid those kind of dulicates :

    check : intStudent1ID <= intStudent2ID

    that way 1, 1 is valid (but only once because I assume that you have a unique constraint in there... by the way this should really be your primary key clustered). Also 1, 2 is valid but 2, 1 is not so that solve your problem for good.

  • Hi Remi,

    Thanks for the advice. I am actual extracting the data from a legacy DB and uploadinging it into a new one where the proper constraints are set up.

    Regards

    CCB

  • Happy to hear that .

    Did you also know that intColName is a worst practice (in case the datatype changes down the line)??

  • Hi Remi,

    No I d'dn't know that.

    What do you use ?

  • The actual name that makes sens. Here's an article on this matter :

    Worst Practices - Part 1 of a Very Long Series!

  • Hungarian notation in the way it was recommended by Microsoft (it is no longer recommended) is wrong in every way and has always been. That goes for general programming and even more so for naming database objects. Columns should be named to show what property of the entity they hold, nothing more.

  • I agree for database objects. But I like to name my variables bVerdict for booleans, sSomething for strings. Saves a lot of time on long procedures. But I definitly stay away from that when naming columns. However I prefix the views with vw, the functions with fn and table functions with fnvw. That's it .

  • Joel Spolsky has a great article that discusses why Hungarian notation is stupid. Or rather the Hungarian notation as it was recommended (and used) by Microsoft, which is not at all the same as the inventor meant. The article is really about a bigger subject (still very interesting), so if only interested in Hungarian notation then skip to the end (the section called "I'm Hungary").

    Making Wrong Code Look Wrong

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply