January 12, 2015 at 8:12 pm
I have a situation where I need to identify all appropriate records in my main table and only appropriate records in my LEFT JOIN table where they match those in a third table without letting the third table affect my primary result set.
To help explain, here is a simplified code snippit from a health care environment:DECLARE @RptPeriodStart DATETIME, @RptPeriodStop DATETIME;
CREATE TABLE #Element (ElementId BIGINT, Concept VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));
CREATE TABLE #Property (PropertyId BIGINT, ElementId BIGINT, PropertyName VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));
CREATE TABLE #ValueSet (ValueSetId BIGINT, ValueSetName VARCHAR(50), Description VARCHAR(256), StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));
SELECT e.ElementId, e.StartDateTime, e.StopDateTime
FROM #Element e
LEFT JOIN #Property p
ON p.PropertyName = 'FacilityLocation'
AND p.ElementId = e.ElementId
INNER JOIN #ValueSet vs
ON vs.ValueSetName = 'ED'
AND vs.StdTaxonomy = p.StdTaxonomy
AND vs.StdTerm = p.StdTerm
WHERE e.Concept = 'encounter'
AND (e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop
OR p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop);
In this query, I want to identify encounters that end within the reporting period or their associated ED Visit ends within the reporting period. The ED Visit is identified by the property 'FacilityLocation' but I only want those locations identified as ED according to the given value set (ignoring others such as ICU). Therefore, I need the table #ValueSet to restrict the selection of #Property without affecting #Element.
If the join to #ValueSet is made with a LEFT JOIN, then I do indeed get all records from #Element, but I also get those from #Property whether they're in the value set or not. If I use an INNER JOIN as shown, then I only get records from #Element that have an ED Visit and I don't get those without an ED Visit.
Due to the data volume in the tables, if I do a LEFT JOIN to a sub-query of #Property which is an INNER JOIN to #ValueSet [basically LEFT JOIN (SELECT FROM #Property INNER JOIN #ValueSet)] then performance suffers tremendously because it forces a look at ALL #Property records and not just the ones for the required #Element records.
This is really driving me nuts!! Is there a way to do this effectively without using a correlated sub-query?
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
January 12, 2015 at 9:16 pm
Very difficult for me to follow precisely without some sample data and expected results to work with, but have you tried EXISTS and/or NOT EXISTS?? Sounds like one or both could be helpful here. The great thing about EXISTS is that it will "short circuit" and stop processing once it finds the first "hit". I have used that about a bajillion times at clients to replace counting all records and then verifying count > 0. 😎
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 13, 2015 at 2:23 am
There are two issues here which might be causing some confusion.
Firstly, the FROM list of your query, which looks like
SELECT ...
FROM parent
LEFT JOIN child to parent
INNER JOIN grandchild to child
SQL Server converts the LEFT JOIN to an INNER JOIN. See this post for more details: http://www.sqlservercentral.com/Forums/FindPost1583604.aspx
Secondly, if you filter a column of a left-joined table in the WHERE clause without accounting for nulls (where’s no match on the RHS for a row on the LHS), you’re converting the join to an INNER JOIN.
As Kevin points out, it’s difficult to figure out what you are trying to do without sample data and expected results, but I think it might be something like this:
SELECT e.ElementId, e.StartDateTime, e.StopDateTime
FROM #Element e
LEFT JOIN (
SELECT p.ElementId
FROM #Property p
INNER JOIN #ValueSet vs
ON vs.ValueSetName = 'ED'
AND vs.StdTaxonomy = p.StdTaxonomy
AND vs.StdTerm = p.StdTerm
WHERE p.PropertyName = 'FacilityLocation'
AND p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop
) pvs ON pvs.ElementId = e.ElementId
WHERE e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop
OR pvs.ElementId IS NOT NULL
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 13, 2015 at 5:22 am
Aaron N. Cutshall (1/12/2015)
If the join to #ValueSet is made with a LEFT JOIN, then I do indeed get all records from #Element, but I also get those from #Property whether they're in the value set or not. If I use an INNER JOIN as shown, then I only get records from #Element that have an ED Visit and I don't get those without an ED Visit.
OK, may be just filter them in WHERE ?DECLARE @RptPeriodStart DATETIME, @RptPeriodStop DATETIME;
CREATE TABLE #Element (ElementId BIGINT, Concept VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));
CREATE TABLE #Property (PropertyId BIGINT, ElementId BIGINT, PropertyName VARCHAR(16), StartDateTime DATETIME, StopDateTime DATETIME, StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));
CREATE TABLE #ValueSet (ValueSetId BIGINT, ValueSetName VARCHAR(50), Description VARCHAR(256), StdTaxonomy VARCHAR(16), StdTerm VARCHAR(50));
SELECT e.ElementId, e.StartDateTime, e.StopDateTime
FROM #Element e
LEFT JOIN #Property p
ON p.PropertyName = 'FacilityLocation'
AND p.ElementId = e.ElementId
LEFT JOIN #ValueSet vs
ON vs.ValueSetName = 'ED'
AND vs.StdTaxonomy = p.StdTaxonomy
AND vs.StdTerm = p.StdTerm
WHERE e.Concept = 'encounter'
AND (e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop
OR p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop)
-- no real Property without ValueSet
AND (p.ID IS NULL OR vs.ID IS NOT NULL);
January 13, 2015 at 7:41 am
ChrisM@Work (1/13/2015)
There are two issues here which might be causing some confusion.Firstly, the FROM list of your query, which looks like
SELECT ...
FROM parent
LEFT JOIN child to parent
INNER JOIN grandchild to child
SQL Server converts the LEFT JOIN to an INNER JOIN. See this post for more details: http://www.sqlservercentral.com/Forums/FindPost1583604.aspx
Chris,
I had great difficulty in simplifying the situation and generating data would have let to greater confusion rather than clarification, so I created a different example that is not as complex but I think still serves the purpose:
SELECT A.Item, B.Descr, C.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
ON B.Item = A.Item
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color;
As expected, I only received two lines as the LEFT JOIN was converted to an INNER JOIN:
ItemDescrColor
2Red 2Red
4Blue 4Blue
Changing the INNER JOIN to a LEFT JOIN did provide all 5 lines but allowed a child record not in the color value set (green):
ItemDescrColor
1NULLNULL
2Red 2Red
3Green 3NULL
4Blue 4Blue
5NULLNULL
Then I modified the query based upon your posting:
SELECT A.Item, B.Descr, C.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color
ON B.Item = A.Item;
And it worked!!
ItemDescrColor
1NULLNULL
2Red 2Red
3NULLNULL
4Blue 4Blue
5NULLNULL
I have to admit that I don't really understand how it works syntactically. Could you clarify?
Aaron
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
January 13, 2015 at 7:43 am
serg-52 (1/13/2015)
OK, may be just filter them in WHERE ?
Unfortunately, that would filter out rows, not columns.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
January 14, 2015 at 12:27 am
Aaron N. Cutshall (1/13/2015)
serg-52 (1/13/2015)
OK, may be just filter them in WHERE ?Unfortunately, that would filter out rows, not columns.
Yes, my understanding was you complaint just about excessive rows. If you need just set column to NULL then
SELECT A.Item
-- when key of C is NULL ignore B data
, CASE WHEN C.Color IS NOT NULL THEN B.Descr END Descr
, C.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
ON B.Item = A.Item
LEFT JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color
While it gives the same results as the one by Chris it may have different execution plan. If
Due to the data volume in the tables, if I do a LEFT JOIN to a sub-query of #Property which is an INNER JOIN to #ValueSet [basically LEFT JOIN (SELECT FROM #Property INNER JOIN #ValueSet)] then performance suffers tremendously because it forces a look at ALL #Property records and not just the ones for the required #Element records.
then i suggest you choose one of the scripts based on their perfomance under your environment.
January 14, 2015 at 1:30 am
Aaron N. Cutshall (1/13/2015)
ChrisM@Work (1/13/2015)
There are two issues here which might be causing some confusion.Firstly, the FROM list of your query, which looks like
SELECT ...
FROM parent
LEFT JOIN child to parent
INNER JOIN grandchild to child
SQL Server converts the LEFT JOIN to an INNER JOIN. See this post for more details: http://www.sqlservercentral.com/Forums/FindPost1583604.aspx
Chris,
I had great difficulty in simplifying the situation and generating data would have let to greater confusion rather than clarification, so I created a different example that is not as complex but I think still serves the purpose:
SELECT A.Item, B.Descr, C.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
ON B.Item = A.Item
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color;
As expected, I only received two lines as the LEFT JOIN was converted to an INNER JOIN:
ItemDescrColor
2Red 2Red
4Blue 4Blue
Changing the INNER JOIN to a LEFT JOIN did provide all 5 lines but allowed a child record not in the color value set (green):
ItemDescrColor
1NULLNULL
2Red 2Red
3Green 3NULL
4Blue 4Blue
5NULLNULL
Then I modified the query based upon your posting:
SELECT A.Item, B.Descr, C.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color
ON B.Item = A.Item;
And it worked!!
ItemDescrColor
1NULLNULL
2Red 2Red
3NULLNULL
4Blue 4Blue
5NULLNULL
I have to admit that I don't really understand how it works syntactically. Could you clarify?
Aaron
Hi Aaron, I haven't come across an online source which describes and accounts for this behaviour - yet - but you may find some tips in this thread.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 14, 2015 at 2:39 am
BOL says,
<joined_table> ::=
{
<table_source> <join_type> <table_source> ON <search_condition>
| <table_source> CROSS JOIN <table_source>
| left_table_source { CROSS | OUTER } APPLY right_table_source
| [ ( ] <joined_table> [ ) ]
}
<join_type> ::=
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN
http://msdn.microsoft.com/en-us/library/ms177634.aspx
Which means the code between JOIN and ON must be a <table_source>.
It leaves no way to interpret the query but as it is
SELECT A.Item, B.Descr, C.Color
FROM
(SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN
(
(SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color
)
ON B.Item = A.Item;
January 14, 2015 at 8:59 am
Serg,
You are correct that it essentially transforms the INNER JOIN into a sub-query.
SELECT A.Item, B.Descr, C.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color
ON B.Item = A.Item;
SELECT A.Item, B.Descr, B.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT B.Item, B.Descr, C.Color
FROM (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
INNER JOIN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color) as C
ON C.Color = B.Color
) AS B
ON B.Item = A.Item;
When I look at the execution plan for the above two queries, they were nearly identical with the sub-query having only one additional step. I abandoned that approach due to the performance hit I encountered and will have to abandon this solution as well for the same reason.
Another alternative is to tweak my environment to allow something like this one:
SELECT A.Item, B.Descr, B.Color
FROM (SELECT 1 as Item UNION ALL SELECT 2 as Item UNION ALL SELECT 3 as Item UNION ALL SELECT 4 as Item UNION ALL SELECT 5 as Item) as A
LEFT JOIN (SELECT 2 as Item, 'Red 2' AS Descr, 'Red' as Color UNION ALL SELECT 3 as Item, 'Green 3' AS Descr, 'Green' as Color UNION ALL SELECT 4 as Item, 'Blue 4' AS Descr, 'Blue' as Color) as B
ON B.Item = A.Item
AND B.Color IN (SELECT 'Red' as Color UNION ALL SELECT 'White' as Color UNION ALL SELECT 'Blue' as Color);
That will be a bit difficult as my true scenario has not one value to be validated but a pair of values and it doesn't lend itself easily to this approach.
Thanks to everyone who contributed and helped me to work out this difficult issue.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
January 14, 2015 at 10:41 am
So, have you actually solved your problem yet with Chris' solution, or is there still an issue with what you are trying to do?
January 14, 2015 at 10:52 am
Chris' solution works well in environments with low to medium data volume, but in the very high data volume environment I have it's too slow. I'm considering other alternatives.
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
January 14, 2015 at 12:14 pm
I would not have thought the slowness would be due to the derived table he used to enable your joins. I would think its more likely a question of what indexes you have or of the OR statement in the where clause resulting in the wrong indexes being used.
Out of curiosity, is the following faster, slower, or about the same (replaces the OR with UNION ALLs)? That is assuming I got the logic right, no data to test against ...
;
WITH pvs AS (
SELECT p.ElementId
FROM #Property p
INNER JOIN #ValueSet vs
ON vs.ValueSetName = 'ED'
AND vs.StdTaxonomy = p.StdTaxonomy
AND vs.StdTerm = p.StdTerm
WHERE p.PropertyName = 'FacilityLocation'
AND p.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop
)
SELECT e.ElementId, e.StartDateTime, e.StopDateTime
FROM #Element e
LEFT JOIN pvs ON pvs.ElementId = e.ElementId
WHERE e.StopDateTime BETWEEN @RptPeriodStart AND @RptPeriodStop
UNION ALL
SELECT e.ElementId, e.StartDateTime, e.StopDateTime
FROM #Element e
JOIN pvs ON pvs.ElementId = e.ElementId
WHERE e.StopDateTime < @RptPeriodStart
UNION ALL
SELECT e.ElementId, e.StartDateTime, e.StopDateTime
FROM #Element e
JOIN pvs ON pvs.ElementId = e.ElementId
WHERE e.StopDateTime > @RptPeriodStop
January 14, 2015 at 12:31 pm
Yet, not surprising when you consider that the parent table has 1.85M rows, the child table has 7.5M rows, and the validation table has 54K. The indexes were optimized to access the child table with the parent record known. The sub-query joins the child table with the validation table and then after that joins to the parent table. This forced a complete table scan join between the child and the validation tables without narrowing it down to the correct parent first. Combine that with the number of records needed from the parent table and you could understand why it took some time. It worked for sure, but the time involved was just too long for acceptable performance (at least 10 times longer).
LinkedIn: https://www.linkedin.com/in/sqlrv
Website: https://www.sqlrv.com
January 14, 2015 at 12:59 pm
1) Did you try out my rewrite of the query?
2)
The sub-query joins the child table with the validation table and then after that joins to the parent table.
It may be picking that plan, but that may be because the OR makes it tough to pick good indexes.
With a derived table or CTE, the query optimizer does not restrict itself to calculating the derived table and then joining it to the others.
So if it could find a good index to the parent table it would be using it. The derived table is just a way of communicating the correct logic for the query.
An article with an example[/url]
edit: of course, performance could still be tough because you can only filter so much of the parent table. You don't care what the dates on it are if you have the same dates within the child table.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply