Introduction
This is the first of a two-part series of articles looking at the APPLY operator. This part sets out to give the reader a good understanding of how to design queries using the APPLY operator, using a detailed example to cover the new ideas presented. There is also a brief look at user-defined table-valued functions; since the concepts involved are central to using APPLY effectively.
Next week, the second part will compare APPLY with the familiar JOIN operator, examine APPLY in more detail, and present more examples.
Understanding APPLY
Overview
APPLY encourages us to take a divide and conquer approach to query writing; it works by building a solution in stages. The method might be familiar to those who have worked with procedural languages like those in the .NET family. As we will see, APPLY works by calling a routine for each member of an input set.
The details
APPLY is named after the process of applying a set of input rows to a table-valued function. In this context, a table-valued function can refer to just about anything that produces rows and columns as its output - so we are not just referring to a user-defined table-valued function here.
APPLY calls the function once for each row from the input. Column values from the input row are available for the function to use. The function returns a table on each call, which always has the same set of columns. The number of rows returned may vary between calls, however. The table produced by the function is appended to the input row. If the function produces more than one row, the input row is duplicated to match. Each member of the input set produces part of the final output; these partial results are combined by APPLY to produce the eventual full result.
If that seems all a bit complex and abstract, do not worry; it is a lot simpler in practice, as shown later in an example.
Query design
APPLY encourages the designer to think in terms of applying a common function to each row of an input set, with intermediate results combining to form the final output. With one exception (which we will come to next), this difference in logical design is largely a convenience feature. Most queries written using APPLY can also be expressed using a JOIN. Part two of this series will examine that statement in more detail.
The exception is where the function used is a user-defined table-valued function (defined using the CREATE FUNCTION statement), which uses values from the input set in its parameter list. This powerful ability is unique to the APPLY operator.
In the next section, we will take a simple problem and solve it using the divide and conquer approach promoted by APPLY. Incidentally, the problem is easily solved using joins, but the point is to emphasise the design approach taken when using APPLY.
Using APPLY
A simple problem
Imagine you are the person responsible for the database at a school. You are asked to produce a report showing grade scores for each male student, in each subject. The following illustration shows the three relevant tables, together with a sample of the data in each.
Writing the function
Frequently, the natural-language description of the problem makes it clear what the input set should be, and what the function should do.
In this case, the input set is the set of all male students. The function is required to list the subjects and scores for one particular student. By applying each member of the input set to the function, the desired result will be built up in stages.
We start by writing a query to return subjects and scores for one student. Notice that instead of embedding a particular value in the query, a variable is used.
Though not strictly necessary (as we will see later on), our next step is to encapsulate this logic in a user-defined table-valued function. The parts carried over from the original query are highlighted in blue.
Notice how easy that is to do - the variable becomes the parameter, and the body of the SELECT statement becomes the definition of the function. We can use the following query to test our new function with a single student id.
This produces a table with three rows and two columns, as shown below. This is one of the partial results that will later be combined by APPLY to form the final output.
The input set
Now that we have the function, we can move on to produce the input set, which will drive the APPLY. The query to list all male students is trivial:
Writing the APPLY
The final step is to APPLY the input set to the function. The APPLY slots into the FROM clause of the input set query, and the function sits to the right of the APPLY operator.
The input set passes student ids, one at a time, to the function through its parameter. Since we are using APPLY, we can pass this value as a direct column reference.
The result of executing that query is:
The column data shown in dark green comes from the input set, and the data shown in blue comes from the function. Values from the input set are repeated as necessary, to match the number of rows returned by the function. The repeated rows are shown in a lighter green.
APPLY and User-Defined Functions
As was mentioned in the section on query design, APPLY is not limited to using user-defined functions created using the CREATE FUNCTION statement. In the previous example, we could have omitted the user-defined function completely, and written the query like this:
Instead of passing the student id to the function as a parameter, the inner WHERE clause now contains a direct reference to a column from the input set. This connection between input set and function is known as a correlation. Some people find it useful to think of this usage of APPLY as a correlated join.
You might be wondering why we went to the trouble of creating a user-defined function in the previous section's example. We will discuss those reasons next.
The advantages of in-line user-defined functions
There are two types of user-defined table-valued function: multi-statement and in-line. For brevity, we will refer to the in-line variety as an iTVF from now on.
An iTVF consists of a single SELECT statement, which can include references to the function's parameters. See CREATE FUNCTION in the SQL Server documentation for syntax details, more examples, and the list of restrictions. You might find it useful to think of an iTVF as a view that accepts parameters. Just as for views, SQL Server expands the definition of an iTVF directly into the query plan of an enclosing query, before optimization is performed.
The effect is that SQL Server is able to apply its full range of optimizations, considering the query as a whole. It is just as if you had written the expanded query out by hand. This makes iTVFs a great way to:
- Encapsulate logic;
- Produce compact and readable code;
- Promote code-reuse;
- Improve consistency; and
- Encourage a modular programming style
These advantages can help to reduce the number of bugs, shorten development time, and make training new staff a quicker and easier process. Many professionals maintain a library of iTVFs, for precisely these reasons.
Multi-statement functions
A multi-statement user-defined function works quite differently. In particular, it does not share the view-like property of being expanded into the containing query before optimization. For this reason, multi-statement functions tend to perform much less well than iTVFs. If you decide to write a multi-statement function, be sure to test it thoroughly if performance is an important consideration.
End of Part One
As you may have noticed from the preceding code examples, the APPLY operator cannot be used by itself in the FROM clause. There are also two forms of APPLY, CROSS and OUTER. We will discuss these in Part II of this series.
The Resources section below contains heavily annotated versions of all the code featured in this part of the article, together with a script to create the sample data used. It also includes an extra example, showing how to return the TOP N scores for each student.