I haven't posted much lately, but it has been a busy few months with many family activities. At work, I am happy to say that we are in the middle of upgrading our financial system, which includes an upgrade from SQL Server 2000 to SQL Server 2005 64 bit. Basically, I am reviewing everything I have done concerning that system for the past 4 years. This is a chance to improve stored procedures and create new SSIS packages to replace DTS. I am really happy that I am getting to use the new T-SQL features, such as Common Table Expressions (CTE). I have been a big fan of virtual tables, but I believe that CTEs have some advantages.
The first advantage is that CTEs are probably easier to read once you understand the syntax. The CTE is defined at the beginning and then used in one, and only one, statement immediately following it. But it can be used multiple times in that statement. Another interesting thing is that multiple queries can be defined within it and that they can build on each other. CTEs can not be nested, that is, you can't define one inside the other. So, CTEs are very flexible, but not always the best performing solution.
My favorite example is from Itzik Ben-Gan's recent book "Inside SQL Server 2005 T-SQL Programming." It creates a virtual number table. Levels 1 - 5 are catesian products that join the previous level on itself. The num statement uses the new Row_Number() function to apply a row number for each of the rows produced.
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1), --2 rows
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows
L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows
L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
SELECT N FROM NUM WHERE N <= 100000;
CTEs are gone right after the query that uses them, but they can be persisted in a view. Here is a view definition so that I always have a table of numbers whenever I need them:
create view vwNumbers
as
WITH
L0 AS (SELECT 1 AS C UNION ALL SELECT 1),
L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),
L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),
L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),
L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),
L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),
num AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS N FROM L5)
SELECT N FROM NUM WHERE N <= 1000000
Here is an example that builds on Itzik's and it is very efficient. It shows how you can "nest" CTEs by using a view and define multiple queries in one CTE.
--Find the dates that orders were not placed in AdventureWorks:
Use AdventureWorks
GO
With
Dates as (select dateadd(d,N,'1/1/1900') as NDate from vwNumbers), -- a "nested" cte
DateBounds as (select min(OrderDate) as MinDate, max(OrderDate) as MaxDate from Sales.SalesOrderHeader) --second query in CTE
select NDate as MissingDate
from Dates left join Sales.SalesOrderHeader
on NDate = OrderDate join DateBounds
on NDate between MinDate and MaxDate
where OrderDate is null
order by NDate
In this example, I have used the dateadd function to translate my numbers view into a date query. I am defining two queries inside the CTE. The second gives me the first and last order date in the orders table. I don't care about dates outside that range. Finally, I use both of the queries defined in the CTE to help me solve the problem. I am using the left join syntax and checking for a null to find rows that are missing.