(last updated: 2020-01-25 @ 00:05 EST / 2020-01-25 @ 05:05 UTC )
Many of us SQL Server professionals are aware that it’s a “best practice” to avoid "SELECT * ...
" queries. If you were not aware of this before, you are now (if you want to better understand why, there are plenty of articles / posts out there explaining it). Within the context of this advice there is a debate as to whether or not "SELECT * ...
" should be avoided in an EXISTS
operator. The theory is that the EXISTS
operator is looking at rows instead of columns and isn’t bring back any data from any column(s), so here it should be ok to use "SELECT * ...
". In trying to avoid the use of "*
", people get creative and instead specify expressions such as:
1
NULL
TOP 1 1
(this one seems like an odd choice given that the benefit of theEXISTS
operator is that it stops processing at the first row it encounters)- {column_name}
- others…
Several (many?) others have written about how the evidence seems to indicate that what you specify in the SELECT
list within the EXISTS
operator is not truly evaluated, so using "*
" should be fine, or even preferred. Two such articles are:
- The Age-Old SELECT * vs. SELECT 1 Debate by Brad Schulz
- Should I Type * or 1 With the EXISTS Logical Operator by Viacheslav Maliutin
Both of those authors do an excellent job showing that the EXISTS
operator ignores the items in the SELECT
list:
- They both show that the execution plans look the same.
- Brad also shows that invalid expressions (i.e. expressions that should generate an error, such as "
1 / 0
") do not throw any errors. - Viacheslav also compares the performance between the various options and finds no difference among them.
In the past, I myself have preferred to use the “divide-by-zero” expression "1 / 0
" as proof that there is no reason to not use "*
". But, revisiting the issue now, it occurs to me that all of these proofs are really only inferring that "*
" and "1
" (and "TOP 1
", "1 / 0
", etc) are being handled the same way by SQL Server. To be fair, seeing that the execution plans look the same is pretty convincing, and is really close to being direct proof. But, what if two execution plans look the same, but have slight differences that are not visible in the graphical execution plans? I’m not sure how possible or likely that is.
Thankfully, we don’t need to wonder about any of this. We can get SQL Server to tell us, directly, that all of these variations are indeed the same, as far as it’s concerned.
What can SQL Server show us?
Methodology
How do we know whether or not the execution plans are truly the same? By inspecting the execution plan XML.
The execution plan XML has the following structure:
<ShowPlanXML> <BatchSequence> <Batch> <Statements> <StmtSimple ... ... >
Within the <StmtSimple>
element / node, there are several attributes, but the two we are interested in are:
QueryHash
QueryPlanHash
If two queries have the same QueryHash
, then they are considered to be the exact same query. The expectation is that since each variation is slightly different text, each test should have a different value for QueryHash
(a query’s hash is based on the actual bytes of the query text, hence even a difference in casing — "A" vs "a" — would result in a different hash value).
Likewise, if two execution plans have the same QueryPlanHash
value, then they are considered to be the exact same execution plan.
Make Testing a Little Easier
In order to view the execution plan XML, after each execution we will need to:
- click the “Execution plan” tab
- right-click in the “Execution plan” area (to bring up the context menu)
- move the mouse cursor to “Show Execution Plan XML” and click on it
Not complicated, but doing this repeatedly over many tests gets a little but annoying. Fortunately, we can create a shortcut key for “Show Execution Plan XML” such that we will only need to click on the “Execution plan” tab (i.e. Step 1) and then hit the shortcut key. It would be nice to also skip Step 1, but there doesn’t seem to be a way to do that. For general instructions on how to create the shortcut key, please see my post:
SSMS Tip #2: Shortcut Keys for SQLCMD Mode, Open Containing Folder, Copy Full Path, etc
For this particular shortcut:
- the command to find is: OtherContextMenus.ExecutionPlanContext.ShowExecutionPlanXML
- create the shortcut either in “Global” (the default), or in “SQL Query Editor” (if you want to use the same shortcut in another context to do something else)
- shortcut keys that I used (was previously unused): Control + ‘
Are We Going to Test Anything or What?
Ok, fine. Let’s get on with it.
Initial Setup
USE [tempdb]; -- DROP TABLE dbo.[Nuthin]; CREATE TABLE dbo.Nuthin ([a] INT, INT); INSERT INTO dbo.Nuthin ([a], ) VALUES (999, -999);
We added a row to the table to make sure that the query optimizer doesn’t skip anything as an optimization if it knows that there are no rows and hence the EXISTS
is guaranteed to return false
.
In order to view the execution plans, you must enable “Include Actual Execution Plan”:
- right-click in the query editor and select “Include Actual Execution Plan” from the context menuOR
- type Control + M
Testing, Part 1
The first test is a commonly used proof that the SELECT
list is not evaluated as it should produce a “divide-by-zero” error as shown here:
SELECT 1/0 FROM sys.objects; /* Msg 8134, Level 16, State 1, Line XXXXX Divide by zero error encountered. */
Yet, as you will see, no error is produced. For each of the tests, I viewed the execution plan XML, copied the QueryHash
and QueryPlanHash
values, and pasted them below the query. In each case, the QueryHash
value is different (as expected), but the QueryPlanHash
value is the same. This proves that there is no functional difference between any of these options (meaning, "SELECT 1 ...
" is not better than "SELECT * ...
" in this context because they are the same, as far as SQL Server is concerned, and that’s the only opinion that matters here).
SELECT 'bob' WHERE EXISTS (SELECT 1/0 FROM dbo.Nuthin); -- QueryHash="0xABC20B0FB2DF9C3E" QueryPlanHash="0xA4D6E870A12967AA" SELECT 'bob' WHERE EXISTS (SELECT * FROM dbo.Nuthin); -- QueryHash="0x91691B2FCD26DBD9" QueryPlanHash="0xA4D6E870A12967AA" SELECT 'bob' WHERE EXISTS (SELECT TOP 1 1 FROM dbo.Nuthin); -- QueryHash="0xFA5DEF9D8794F783" QueryPlanHash="0xA4D6E870A12967AA" SELECT 'bob' WHERE EXISTS (SELECT (SELECT 1) FROM dbo.Nuthin); -- QueryHash="0xAC803197792327AF" QueryPlanHash="0xA4D6E870A12967AA" SELECT 'bob' WHERE EXISTS (SELECT CASE WHEN DATEDIFF(DAY, GETDATE(), '2020-01-22') > 3 THEN 22 ELSE 33 END FROM dbo.Nuthin); -- QueryHash="0xA679E8825CC576BD" QueryPlanHash="0xA4D6E870A12967AA"
Does the "*
" get expanded into the column list?
SELECT 'bob' WHERE EXISTS ( SELECT * FROM (SELECT *, 1/0 AS [nope] FROM dbo.Nuthin) y ); -- QueryHash="0x26C7EEF1114519EC" QueryPlanHash="0xA4D6E870A12967AA"
The query above indicates that even if "*
" is being expanded, it’s not being done in such a way that evaluates any of the data or values for the columns.
Is the column list evaluated or verified in any way? Based on the query below, we can see that if a column is referenced, it must at least exist:
SELECT 'bob' WHERE EXISTS (SELECT FROM dbo.[Nuthin]); /* Data Provider: Msg 207, Level 16, State 1, Line XXXXX Invalid column name 'c' */
Permissions?
Richard.Bielawski, in a forum post related to Viacheslav’s article, said:
If you don’t have access to every single column in the table then any value except a specific column that you do have access to will cause a security failure.
All generic values such as *, 1 or null cause all columns to be checked. If you name a specific column, that’s the only column that will be checked.
Is that true? To test, we will create a restricted User that can SELECT
from the Table, but is denied access to one particular column.
CREATE USER [Denied] WITHOUT LOGIN; GRANT SELECT ON dbo.[Nuthin] TO [Denied]; DENY SELECT ON dbo.[Nuthin]() TO [Denied];
The restricted User cannot view execution plans, so disable those by typing Control + M one more time.
We then impersonate that restricted User:
EXECUTE AS USER = N'Denied'; SELECT CURRENT_USER AS [User]; -- Denied
We verify that the restricted User cannot SELECT
column "", but does have permission to read column "
[a]
":
SELECT * FROM dbo.[Nuthin]; SELECT 1 FROM dbo.[Nuthin]; SELECT NULL FROM dbo.[Nuthin]; /* Msg 230, Level 14, State 1, Line XXXXX The SELECT permission was denied on the column 'b' of the object 'Nuthin', database 'tempdb', schema 'dbo'. */SELECT [a] FROM dbo.[Nuthin]; -- 999 (success)
Great. Now we see if column permissions are checked or not:
SELECT 'bob' WHERE EXISTS (SELECT * FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT NULL FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT 1 FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT 1/0 FROM dbo.[Nuthin]); /* Msg 230, Level 14, State 1, Line XXXXX The SELECT permission was denied on the column 'b' of the object 'Nuthin', database 'tempdb', schema 'dbo'. */SELECT 'bob' WHERE EXISTS (SELECT [a] FROM dbo.[Nuthin]); -- bob (success)
The queries shown above prove that permissions are indeed being checked for all columns when no columns are referenced.
BUT, there’s more to the story. It seems that the generic options do still work, just so long as you reference a column that the User has permissions to somewhere in the query:
SELECT 'bob' WHERE EXISTS (SELECT *, [a] FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT * FROM dbo.[Nuthin] WHERE [a] = 0); SELECT 'bob' WHERE EXISTS (SELECT NULL, [a] FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT NULL FROM dbo.[Nuthin] WHERE [a] = 0); SELECT 'bob' WHERE EXISTS (SELECT 1, [a] FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT 1 FROM dbo.[Nuthin] WHERE [a] = 0); SELECT 'bob' WHERE EXISTS (SELECT 1 / 0, [a] FROM dbo.[Nuthin]); SELECT 'bob' WHERE EXISTS (SELECT 1 / 0 FROM dbo.[Nuthin] WHERE [a] = 0);
Hence, while it is true that column permissions are sometimes checked and can be a problem, it is fairly easy to avoid. And, on a practical level, how often do the queries being used within an EXISTS
operator not reference any columns at all? Meaning, this is definitely something to be aware of, but possibly not something that most people will encounter.
Now we need to stop impersonating the restricted User:
REVERT; SELECT CURRENT_USER AS [User]; -- dbo
One More Thing
We’re almost done, but I did notice something else in a few of the execution plans that was interesting. Remember how the QueryHash
value will be different if even a single byte of the query text is different? Well, as you can see below, those are definitely three different queries, yet they all have the exact same QueryHash
value:
SELECT 'bob' WHERE EXISTS (SELECT [a] FROM dbo.[Nuthin]); -- StatementText="SELECT 'bob' WHERE EXISTS (SELECT [a] FROM dbo.[Nuthin])" -- QueryHash="0x17771D953AFFBFA7" QueryPlanHash="0xA4D6E870A12967AA" SELECT 'bob' WHERE EXISTS (SELECT FROM dbo.[Nuthin]); -- StatementText="SELECT 'bob' WHERE EXISTS (SELECT FROM dbo.[Nuthin])" -- QueryHash="0x17771D953AFFBFA7" QueryPlanHash="0xA4D6E870A12967AA" SELECT 'bob' WHERE EXISTS (SELECT [a], FROM dbo.[Nuthin]); -- StatementText="SELECT 'bob' WHERE EXISTS (SELECT [a], -- FROM dbo.[Nuthin])" -- QueryHash="0x17771D953AFFBFA7" QueryPlanHash="0xA4D6E870A12967AA"
Now that’s not something you see every day ??.
Final Thought
My preference is to use "SELECT *
" as it makes debugging easier. Assumming there are no variables being used in the EXISTS
subquery, I can highlight the entire subquery within the EXISTS(...)
and hit F5 or Control + E to execute it to see what, if anything, is returned. If it’s a "1
" or some other constant then, if any rows are returned, they are all of that constant, which doesn’t tell me nearly as much as the actual column values. And, if it’s an invalid expression, then it doesn’t work at all and can’t even clue me into whether or not rows are returned.