November 18, 2013 at 7:35 am
A coworker brought this to my attention and I can't figure out what is happening. I was hoping someone here could explain what is happening in SQL that is causing this result. The code is straight forward enough; I have modified it to use two simple temp tables.
The query should return a value from table A based on the existence of a value in table B. The first select is the proper one that functions as you would expect. The second query however is throwing me off, specifically the subquery. "SELECT DISTINCT temp_int_field FROM #sub_table" is an invalid select statement but for some reason, when placed in a sub-select it not only parses, but also runs and returns data. Can someone break this down for me?
I do have experience with using a WHERE NOT EXISTS as opposed to a left join and I know in these cases the sub query references the columns in the outer query so I'm guessing that something similar is happening here.
CREATE TABLE #temp (temp_int_field INT, temp_varchar_field VARCHAR(5))
CREATE TABLE #sub_table (sub_table_int INT, sub_table_varchar VARCHAR(5))
INSERT INTO #temp
SELECT 1,'test'
INSERT INTO #temp
select 2, 'test2'
INSERT INTO #temp
SELECT 3, 'test3'
--//This is the sub table value that i want to return from the temp table
INSERT INTO #sub_table
select 1, 'test2'
---------------------------------
-->This query returns the dataset I expect; one field
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (SELECT sub_table_int FROM #sub_table)
---------------------------------
-->This query returns all rows from the #temp table
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (SELECT temp_int_field FROM #sub_table)
--//Take the sub query out and make it a regular select statement and it blows up as it should.
--SELECT DISTINCT temp_int_field FROM #sub_table
--DROP TABLE #temp,#sub_table
November 18, 2013 at 7:46 am
temp_int_field DOES NOT EXIST in #sub_table, it's a column in #temp.
because you are not aliasing your columns, it isn't obvious that that is the issue at first glance.
--this works, becuase the column exisits in #subtable
--ELECT DISTINCT temp_int_field FROM #sub_table does not, and must be part of the implied group by that distinct actually does behind the scenes.
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (SELECT DISTINCT sub_table_int FROM #sub_table)
Lowell
November 18, 2013 at 7:49 am
Since you didn't specify the table's name in the sub query, SQL Server got the value from the outer query. In order to avoid this, you should specify the table's name in the sub query. When you'll do it, you'll get the error that the column doesn't exist. Most chances are that this was your expected results.
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (SELECT SubTable.temp_int_field FROM #sub_table as SubTable )
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
November 18, 2013 at 8:27 am
Yes, I realize that the field does not exist in the sub table, that's why it was throwing me off that it would run.
Conceptually when I think of a subquery in this manner I see the inner query running and and then the outer query using it's results to check the existence of fields that match.
So using my thought process for the current example:
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (SELECT sub_table_int FROM #sub_table)
--//Becomes
--//1 being the number that is returned when the subquery runs
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (1)
However, it's obviously not as simple as this if I am able to reference outer query columns in the inner query. I am just wondering what SQL Server is actually doing or how it is running the query to allow this.
November 18, 2013 at 8:53 am
You have to understand that the sub query is connected to the outer query. Take a look at this query:
SELECT temp_int_field, temp_varchar_field
FROM #temp WITH(NOLOCK)
WHERE temp_int_field IN (SELECT sub_table_int FROM #sub_table where temp_int_field = 1)
This is more intuitive. In the sub query I reference a column that exists only in the table that I referenced in the outer query. Try to visualize what is going on. When the first record from #temp table is being evaluated the server checks the sub query. Since it didn't find the column temp_int_field in #sub_table, it tries to find it in the outer query. Since there is such table in the outer query it uses it and checks if temp_int_field (from the outer table) equals 1 or not. The same thing happens with the rest of the records. Once you realize that the sub query can reference the table from the outer query, you can understand that it can do it also in the select clause and not only in the where clause.
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply