March 1, 2006 at 5:06 am
I am trying to join 5 tables in a sql server 2k db. Does anyone know of a good set of guidelines for doing this? Alternately, could someone find the problem in the following query?
The query that I am using is listed here (please let me know if I am violating any programming guidelines on this):
SELECT p.ParticipantID, pr.Age, ir.FnlTime, e.EventDate
FROM Participant p INNER JOIN PartRace pr ON p.ParticipantID = pr.ParticipantID
JOIN IndResults ir ON pr.ParticipantID = ir.ParticipantID
JOIN RaceData rd ON ir.RaceID = rd.RaceID
JOIN Events e ON e.EventID = rd.EventID
WHERE rd.Dist = '5_km' AND p.Gender = 'm' AND ir.FnlTime <> '00:00' AND e.EventGrp = 1
ORDER BY ir.ParticipantID
The problem that I am having is that if a participant shows up multiple times (which they could do since this is designed to get the performances for an event over a series of years) it does not associate the correct data from year to year. Basically some times show up where they shouldn't.
March 1, 2006 at 5:16 am
Post the DDL please. For now, I recommend, you qualify your joins. SQL Qyery Optimizer interprets, true, but are your joins MEANT to be what is interpreted? e.g. INNER, LEFT OUTER etc...?
_/_/_/ paramind _/_/_/
March 1, 2006 at 5:21 am
There is obviously much I don't know about this process. What is DDL? Also, I will qualify my joins as you suggested.
Thanks~
March 1, 2006 at 5:33 am
DDL = data description language, which means the creation statements for your tables in this case.
Is the following, what you mean? :
On an event, multiple races take place(e).
For each race, some racedata (beginning, end etc. is stored)
Each race has some participants.
A participant has ONE individual result per race.
The participant has further properties, eg. name, age ...
?
SELECT
p.ParticipantID,
pr.Age,
ir.FnlTime,
e.EventDate
FROM Events e
INNER JOIN RaceData rd
ON e.EventID = rd.EventID
INNER JOIN PartRace pr
ON rd.RaceID = pr.RaceID
INNER JOIN IndResults ir
ON pr.ParticipantID = pr.ParticipantID
INNER JOIN Participant p
ON pr.ParticipantID = p.ParticipantID
WHERE
rd.Dist = '5_km'
AND p.Gender = 'm'
AND ir.FnlTime <> '00:00'
AND e.EventGrp = 1
ORDER BY
ir.ParticipantID
_/_/_/ paramind _/_/_/
March 1, 2006 at 6:26 am
Just guessing but I think the problem is the join between PartRace and IndResults not matching on RaceID
SELECT p.ParticipantID, pr.Age, ir.FnlTime, e.EventDate
FROM Participant p
INNER JOIN PartRace pr
ON pr.ParticipantID = p.ParticipantID
INNER JOIN IndResults ir
ON ir.ParticipantID = pr.ParticipantID
AND ir.RaceID = pr.RaceID
AND ir.FnlTime <> '00:00'
INNER JOIN RaceData rd
ON rd.RaceID = ir.RaceID
AND rd.Dist = '5_km'
INNER JOIN Events e
ON e.EventID = rd.EventID
AND e.EventGrp = 1
WHERE p.Gender = 'm'
ORDER BY p.ParticipantID
Far away is close at hand in the images of elsewhere.
Anon.
March 1, 2006 at 7:11 am
DDL is not data description language, DDL is data definition language and the last query is better because per ANSI SQL when the query processor see the FROM and ON clause in a JOIN operation the WHERE clause is ignored so the three extra evaluations introduced with the AND operator after the WHERE clause will be ignored by the query processor. Hope this helps.
Kind regards,
Gift Peddie
Kind regards,
Gift Peddie
March 1, 2006 at 7:23 am
definition : of course - just overpaced while planning dinner
But with the query, it is still NOT clear, whether inner or outer joins are meant. The inner joins are just a guess, as I denoted. With regard to possible outer joins, it makes a big difference in query logic, whether the criteria are incorporated in join logic or later applied to the result set. As this is still unknown, I'd rather not worry about which query performs better, since we don't even know, which one is possibly wrong in terms of logic
_/_/_/ paramind _/_/_/
March 1, 2006 at 7:45 am
From the standpoint of code itself this is how I would setup the query.
SELECT
p.ParticipantID,
pr.Age,
ir.FnlTime,
e.EventDate
FROM
dbo.Participant p
INNER JOIN
dbo.PartRace pr
INNER JOIN
dbo.IndResults ir
INNER JOIN
dbo.RaceData rd
INNER JOIN
dbo.Events e
ON
e.EventID = rd.EventID AND
e.EventGrp = 1 AND
rd.Dist = '5_km'
ON
ir.RaceID = rd.RaceID AND
ir.FnlTime != '00:00'
ON
pr.ParticipantID = ir.ParticipantID
-- feel like you probably have a race id that needs to be reference in this join
-- AND pr.RaceID = ir.RaceID
ON
p.ParticipantID = pr.ParticipantID
WHERE
p.Gender = 'm'
ORDER BY
ir.ParticipantID
March 1, 2006 at 7:55 am
why would you do so?
_/_/_/ paramind _/_/_/
March 1, 2006 at 8:30 am
First and foremost for readability.
Second to constrain the joins so I cannot connect on unrealted items. Queries are interpreted innermost to outtermost joins, top to bottom.
So the join
dbo.RaceData rd
INNER JOIN
dbo.Events e
ON
e.EventID = rd.EventID AND
e.EventGrp = 1 AND
rd.Dist = '5_km'
The ON will not let me reference any table other than RaceData and Events.
Also in general the innermost join will occurr (and many times this is what the query engine will try to do itself anyway) first and work it's way out. This hopefully with provide a small dataset comparing to a larger each time which means the query at each stage should generate the smallest possible dataset to work with before continuing, this can also be helped along when you take the where conditions that relate to that join and move them to the ON clause.' Which in turn makes it easier to read that when I look at table Events, I only want EventGrp = 1 and Dist = '5_km' at the time the join to RaceData occurs.
March 1, 2006 at 8:50 am
I'm pretty sure, I know and I knew, what you mean, but let me state again, what I already wrote in a prior message:
>it is still NOT clear, whether inner or outer joins are meant. The inner joins are just a guess, as I denoted.<
Do you know, what will happen, when you incorporate '5 km' in the join logic and there's no data entered at some point?
You will miss the records, not for not being there, but for being incomplete. Wouldn't you always want to adjust the "where"-filter only, when querying for different cases?, e.g. WHERE Dist IS NULL? But you'll never get those results, if already a join condition.
As for the database's table-logic-driven join conditions, I agree. (e.g. ID etc.)
_/_/_/ paramind _/_/_/
March 1, 2006 at 4:18 pm
Ok this has been incredibly helpful and educational. I apologize for anyone who was trying to help me with insufficient information. The last query submitted worked very well and, yes, you were correct that the there needed to be a connection between the RaceIDs so thanks for that suggestion.
I have a question: Why the dbo. ?
Thanks!
March 1, 2006 at 4:30 pm
>I have a question: Why the dbo. ?<
the object owner (here dbo. = DataBaseOwner) is always an integral part of naming in SQL-Server. This is true for all objects (tables, procedures etc.) Try the following:
CREATE TABLE dbo.[tblCustomer] ([CustomerID] [int] NOT NULL)
CREATE TABLE user1.[tblCustomer] ([CustomerID] [int] NOT NULL)
CREATE TABLE user2.[tblCustomer] ([CustomerID] [int] NOT NULL)
CREATE TABLE user3.[tblCustomer] ([CustomerID] [int] NOT NULL)
dbo is simply implicitly assumed, as long as you don't provide another distinct name. In other words - if you use different object owners you will run into problems, if you omit the owner part. Nice feature, but beware of the pitfalls.
PS: Replace users1 to 3 with real accounts in your db
_/_/_/ paramind _/_/_/
March 7, 2006 at 2:15 pm
There is also a minor performance gain in not forcing the query engine to go looking. It actually does first go looking for loggedonuser.object before assuming dbo.object. And in case someone thinks to ask curren server and current database are always assumed for their portion of the 4 part name.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply