problem with DISTINCT

Question

problem with DISTINCT

jonny-291221

SSC Rookie

Points: 26
More actions
January 5, 2006 at 10:51 am

#170053

Hello. I have a complex query that joins 2 tables, a person table and an address table. The problem is i need only ONE of each person, but if i put distinct in the select statement, it still returns multiple rows for people with more than one address entry. Is there a way to force a distinct on the person id and not all the information i am selecting? This is being used in a search query to find a person and i want some information from the address table displayed, but only one entry for each person.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

You need to provide the business rules for which address to select.

If there are multiple addresses per person, and you want some or all of the address columns in the resultset, then you need business rules for discarding addresses. Do you want to show only the most recently created address ? The most recently updated ? The most complete ? A random selection ?

jonny-291221 SSC Rookie Points: 26 More actions · Answer 2

just a random selection i guess...

my company wants to see the city, position, and company name from the address table in the search results table, tho they only want to see one of every person, and many people have more than one address...

JacekO SSCertifiable Points: 6610 More actions · Answer 3

If you provide the table definitions for both we can see what can be done.

---------------------------------------------
[font="Verdana"]Nothing is impossible.
It is just a matter of time and money.[/font]

Vladan SSC-Insane Points: 22072 More actions · Answer 4

IMHO random selection is the worst possible solution. Lets say someone has 5 addresses, one of them is address that they are using only for invoicing, another one is incomplete, one was entered 10 years ago and never used since that time, and the rest are actual, active addresses. You really don't care which one will be returned? Just that it is some address connected in a way with the person?

Richard Gardner-291039 Hall of Fame Points: 3603 More actions · Answer 5

I agree with Vladan - sounds like you've got a problem with your data model. If there is more than one address for each person then you have to work out why there is more than one - are there different types of addresses for each person (i.e. invoice, despatch, home, work), in which case you need to give each address a category number then decide which category of address you want before you run your query - that way you restrict the recordset to one address per person and the user has control over the type of data returned.

If it is simply that addresses are badly managed and there should only be one per person then you need some way of controlling and archiving old addresses and uniquely returning the current one, like a flag controlled from the user interface (i.e. an 'archived' column which reads either "Arch" or "Current" then return WHERE archived = 'Current').

Until you have one of these you're whistling in the wind.....

p.s. It has just occured to me you might be looking at the problem the wrong way - the reason you are getting multiple people returned is because for each person the database is finding more than one address, for each instance of an address it finds it returns the person each time. Rather than concentrating on getting rid of the "extra" people you need to concentrate on getting rid of the extra addresses..... Rememer the DISTINCT clause only refers to DISTINCT rows in the recordset - so if you have Pete Smith at 10 Smith Close and 12 Smith Close the db sees these are not distinct and returns both of them.

Stewart Joslyn SSCertifiable Points: 6131 More actions · Answer 6

A crude (untested) solution would be :

SELECT c.*, a.* FROM Contacts c

JOIN AddressTable a ON a.AddressID = c.PersonID JOIN

(SELECT PersonID, MAX(AddressID) AS MaxAddress FROM AddressTable GROUP BY PersonID) dt

ON dt.MaxAddress = a.AddressID

I'm sure that this could be optimised but it gives the idea (Get a filtered list of addresses by some rule in a derived table and then use that).

Richard Gardner-291039 Hall of Fame Points: 3603 More actions · Answer 7

Of course that would work Stewart, fair point.

Interesting bone of contention between myself and our developers. I say that a nested max should be unnecessary in a well designed database, they say I am being a wuss with my SQL - I'd be interested in hearing any thoughts anyone has about that one - I'm actually having an argument about it with them at the moment.

Of course in this case it would return the latest address added to the database, which you would assume in most cases would be the correct one.

jonny-291221 SSC Rookie Points: 26 More actions · Answer 8

yeah the tables were not set up in an efficient way (and the code and queries were horribly written), but they dont want much time spent on this program, tho they want it to work right. we are redoing our major internal software soon, and after that is done we will be redoing this little program. We have decided for now just to let it return different rows for different addresses. Thanks for the input!

jonny-291221 SSC Rookie Points: 26 More actions · Answer 9

in my experience, assumptions always end in fixes later cause users will always do the wrong thing

it would be better to store a timestamp for when the info was updated and grab the most recently updated row based on that

Stewart Joslyn SSCertifiable Points: 6131 More actions · Answer 10

It would return the latest address onli if the ids were sequential and not GUIDs or something else horrible ... and there's no guarantee that the latest is the correct one, anyway.

I agree that there should be an indicator to distinguish between multiple addresses. Given that the same address could, in principle, be of different significance to different contacts (eg its my main office but your secondary office), then there should be a relationship table will fields for contact id, address id and relationship. Start and end dates of address validity on the relationship table also would be good and lastupdated and updateuser everywhere.