Structured Query Language (SQL) is an implementation of some core concepts in computer science, set theory, and general mathematics. Having a foundation in where SQL comes from can lead to better and more natural approaches to querying a database.
When we say natural, we mean in line with the mathematical foundations of relational databases. Why is natural a good thing? Because, generally, more natural mathematical approaches to a particular query will yield faster and more stable results.
If you're a self-taught query writer, you would do yourself well to look into the areas of mathematics that were used to define the modern query language. Areas like boolean algebra, relational algebra, and relational calculus (tuple and domain) - it will affect how you approach querying.
An example of this comes into play in when building parameter-based conditional WHERE clauses. Let's look at a simple example.
Example - Step 1:
Suppose we are writing a search query to look for individuals in a table. The table name is People, and it contains an ID field, and FirstName and LastName fields. We want the query to either retrieve all the rows in the table, or filter based on first name, last name, or both.
We start with a stored procedure that looks like this:
CREATE PROCEDURE [spSearchPeople]
@firstname nvarchar(50) = null,
@lastname nvarchar(50) = null
AS
BEGIN
SELECT ID, FirstName, LastName FROM People
END
We have two nullable parameters, which currently are not being used. The SELECT statement pulls all of the values from the table. Now -- how to conditionally filter?
One approach is to use dynamic SQL -- programmatically constructing the WHERE clause as a string to execute. This is especially common if one has learned T-SQL, as opposed to pure SQL, since T-SQL adds procedural programming concepts to a set-based language.
There is nothing wrong with the dynamic SQL approach, per se. There may, also, be certain instances in which dynamic SQL is necessary depending on the complexity of what is being accomplished (though that is rare, in my experience).
In most cases, though, you can approach these types of problems from a more natural, Boolean point of view.
Step 2: The First Boolean Filter
Let's start with filtering on the first name. We adjust our query as such:
ALTER PROCEDURE [spSearchPeople]
@firstname nvarchar(50) = null,
@lastname nvarchar(50) = null
AS
BEGIN
SELECT ID, FirstName, LastName FROM People WHERE (@firstname IS NULL) OR (FirstName LIKE '%' + @firstname + '%')
END
What have we done here? Remember first that the results of a WHERE clause is nothing but pure boolean arithmetic. As the database engine looks at rows it is asking: "Is the WHERE clause 'true' or 'false' for this row"? When the query is executed the database engine will include the row in the result if the WHERE clause is true.
The key to getting the above filter to run properly is the 'OR'. If any condition within a sequence of 'OR's is true, then the whole sequence of conditions is said to be true. In the above our first condition is @firstname IS NULL, thus taking advantage of the nullability of passed in parameters to the stored procedure, another key element of this technique.
So imagine the database engine running through all of the rows in our People table. To decide whether or not to include a particular row in the result it checks the conditions. Suppose the parameter @firstname is a null value. Then the entire WHERE clause is deemed true (since it is ORs not ANDs) and that row will be returned.
Therefore, if @firstname is a null value, then all of the rows will be returned regardless of the rest of the WHERE clause, since @firstname IS NULL is always true no matter what row you are on!
Now let's suppose we give the query something to filter on, passing a string into @firstname. The clause @firstname IS NULL is now false.
Imagine the database engine looking at rows in this case. It sees that the first condition, @firstname IS NULL, is false. The condition is connected by ORs, so it says "well, the first condition is false, so I can only return this row if one of the other conditions is true". In this case, that the FirstName field is LIKE the value in the parameter.
So we've accomplished a conditional filter! If you pass an @firstname then the LIKE clause will be the condition. If you pass nothing, you'll get all the rows back.
We could summarize this technique this way: By including a condition in an OR sequence that has nothing to do with values in the row, you override other conditions that are row dependant.
To be a bit clearer, we might rewrite our query in the following preparatory (though unnecessary) way:
ALTER PROCEDURE [spSearchPeople]
@firstname nvarchar(50) = null,
@lastname nvarchar(50) = null
AS
BEGIN
SELECT ID, FirstName, LastName FROM People WHERE (@firstname IS NULL) OR ( ( FirstName LIKE '%' + @firstname + '%' ) )
END
The above shows in a perhaps more direct way what is happening and also prepares us for adding more than one filter clause if the @firstname is passed in. You may have an entire complex filter in addition to the first name search itself.
Step 3: The Second Filter
Let's add the last name filter:
ALTER PROCEDURE [dbo].[spSearchPeople]
@firstname nvarchar(50) = null,
@lastname nvarchar(50) = null
AS
BEGIN
SELECT ID, FirstName, LastName FROM People
WHERE
(
(@firstname IS NULL)
OR
(
(
FirstName LIKE '%' + @firstname + '%' --AND any other conditions associated with first name
)
)
)
AND
(
(@lastname IS NULL)
OR
(
(
LastName LIKE '%' + @lastname + '%' --AND any other conditions associated with last name'
)
) )
END
Here we have simply repeated the concept for the last name. Please note, though, that now our placement of parenthesis becomes extremely important. Think of each filter as a group, surrounded by parenthesis, and connected by ANDs. It is structured as:
(filter for first name) AND (filter for last name) AND...
whereas inside the filter you have a structure of:
parameter does not exist OR (filter using that parameter)
The ANDs and proper parenthesis ensure that as you add filters you properly reduce the resulting rows. There must be an explicit differentiation between the ANDs on the filters, and the ORs within the filters. If you do not place the parenthesis properly (thus properly separating the filters) then be prepared for unpredictable query results.
Again, there is some unnecessary syntax in the above example, but this kind of query structure can quickly become difficult to manage if you cannot see clearly what is happening. I personally recommend to always structure your conditional queries with good amounts of white space, so that you do not lose track of the filters as you add them.
Conclusion
There are multiple ways to accomplish conditional WHERE clauses, as there are always multiple ways to do things in programming. These examples are not meant to be a critique of any method, but rather the explanation of one approach.
I prefer this method because, while there are other options in T-SQL, this methodology is purely mathematical and thus elegant, natural to the domain of Boolean and relational algebra, and transferable to nearly all other relational database systems.