February 27, 2006 at 11:37 am
On 2/10 I posted a question, which a member (PW) responded to and gave me a great solution, which also follows. However, I found that the code does not work when Table1.Field1 is not unique. Here is the problem as it was originally posted:
"I want to select several fields from 2 tables where each row in Table1 may map to 0 to n rows on Table2, but I only want to return a single row from Table2 where a date field is >= a date on Table1.
For example:
Select R_ID
, P.Field1
, Date1
, S.Field2
, S.Field3
from Table1 as P
Left outer join Table2 as S
on P.Field1 = S.Field1 and S.StartDate >= P.Date1
Order by Field1, S.StartDate
This returns all the rows on Table2 but I want only that row with the earliest S.StartDate
I tried to use a subquery with the result set sorted by StartDate asc and specify "Top1" but I couldn't get the syntax to work.
Any suggestions?"
---------
PW's proposed solution works most of the time. It was:
"For problems like this, you generally need to join to a derived table (note, not a sub-query), where the derived table introduces a column that resolves to just 1 row. In your case, you want to introduce MIN(StartDate) as a date to join on:
In this example, I used a derived table named "dtEarliest"
Select R_ID
, P.Field1
, Date1
, S.Field2
, S.Field3
from Table1 as P
Left outer join
-- Join a derived table, to get the Earliest date for each Field1 value
(
Select S.Field1, Min(S.StartDate) As EarliestStartDate
From Table1 As P
Inner Join Table2 as S
on P.Field1 = S.Field1 and S.StartDate >= P.Date1
Group By S.Field1
) dtEarliest
On (dtEarliest.Field1 = P.Field1)
-- Joint to Table2, this time joining on both Field1 and the Date
-- so that there is only 1 row per Field1
Left Outer Join Table2 As S
On (dtEarliest.Field1 = S.Field1 And
dtEarliest.EarliestStartDate = s.StartDate)
Order by P.Field1, S.StartDate
"
----------------------------------------
It turns out that Table1.Field1 is not unique. I can have any number of rows with the same value of Field1; the rows would be distinct because they would differ on Date1. And because they have different values for Date1, they also map to different values for Table2.Field2. If, for example, my Table1 has 3 rows where Field1 = '555' then all 3 rows are returned in the solution set but they are returned with identical values for Table2.Field2. And that value for Field2 is the value that should only map to the row with the earliest Table1.Date1.
The result set might look like:
Field1 Date1 Field2 StartDate
555 01/01/2005 aaaa 1/3/2005
555 05/01/2005 aaaa 1/3/2005
555 02/24/2006 aaaa 1/3/2005
StartDate should always be >= Date1 so the 2nd and 3rd rows are obviously wrong. What isn't obvious is that the values for Field2 are also wrong for the 2nd and 3rd rows.
I am stumped. Please help?
Thank you.
February 27, 2006 at 12:11 pm
Select P.R_ID
, P.Field1
, P.Date1
, Min(S1.StartDate) As EarliestStartDate
, S2.Field2
, S2.Field3
from Table1 as P
Left join Table2 as S1
On P.Field1 = S1.Field1
And S1.StartDate >= P.Date1
Left Join Table2 S2
On P.Field1 = S2.Field1
Group By P.R_ID
, P.Field1
, P.Date1
, S2.Field2
, S2.Field3
HAVING S2.StartDate = MIN(S1.StartDate)
February 27, 2006 at 3:08 pm
Thank you for helping, Kevin.
I am getting an error which states:
"S2.StartDate is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause."
I am reviewing the code to see if I left something out. Do you recommend adding S2.StartDate to the Group BY clause?
February 27, 2006 at 3:18 pm
Kevin - thanks again for suggesting a solution!...
I added S.StartDate to the Group BY clause. While the result set returned looks accurate, it took 1.5 minutes to return, which is significantly longer than the solution with the derived table.
Table2 may contain several rows with the same Field1 and the same EarliestStartDate. I want to return only 1 row. I tried using "TOP 1" in the Select statement of the derived table originally suggested to me, but that did not work - all the StartDates and EndDates returned were NULL. Any suggestions?
I am learning a lot.
February 27, 2006 at 11:34 pm
did you set a primary key or index for your tables? how many records in each?
you may think about using the EXISTS statement.
p
inner join s1
on ...
where exists
(select min() from s2 group by...)
check out msdn.com on the usage of EXISTS
also, are you looking for only one record from both table1 and table2? If so, what's the criteria? MAX(R_ID) ?
February 28, 2006 at 8:16 am
Table1 has 18 rows. Table2 (actually a view) has about 15k rows. I had a field in table1 set as identity but had not set it as PK. I set it as PK just now and reran the query you suggested and the result set came back in 17 seconds. 🙂
The user doesn't care which of the records from table2 with the same earliest date is returned with a row from table1. As the current date approaches that earliest date they said those multiple rows will resolve into a single row, so it is irrelevant which one I select; the first one will be fine.
To sum up:
I need all records from Table1 returned.
If a record on table2 has the same value for Field1 that exists on a row in table1 AND has a StartDate >= table1's Date1, then I need to return fields from that row along with the fields from table1.
If more than 1 row exists on table2 with those 2 criteria I want to select the one with the earliest StartDate.
If more than 1 row have that same earliest StartDate than I select 1 row and it doesn't matter which one.
I will look into "Exists".
Thank you again, and again for the help and insights you provided.
m
February 28, 2006 at 8:38 am
Kevin - the solution you proposed with "Having" does not return rows from table1 that map to zero rows on table2. Is this the result of using the Having clause: HAVING S2.StartDate = MIN(S1.StartDate)?
March 1, 2006 at 1:16 pm
--I haven't forgot about you...
--I'm pretty sure this is your expected result set
SELECT
P2.R_ID,
P2.Field1,
P2.Date1,
S2.Field2,
S2.Field3
FROM
(
SELECT
R_ID,
--P.Field1,
MIN(S1.startDate) minStartDate
FROMTable1 as P
LEFT JOINTable2 as S1
ONP.Field1 = S1.Field1
ANDS1.StartDate >= P.Date1
GROUP BY
R_ID
--,P.Field1
) P1
INNER JOINTable1 P2
ONp1.R_ID = P2.R_ID
--Remove the join above if P.Field1 is part of the GROUP in P1
LEFT JOINTable2 S2
ONP1.Field1 = S2.Field1
ANDS2.startDate = minStartDate
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply