Introduction
Pagination is not a new concept to developers or users. In
fact, most of us use pagination every day. Pagination gives users the ability
to view data in small subsets, much like flipping through the pages of a book.
In fact, many development tools have built-in pagination; however, I believe we
sometimes become too dependent on canned code. It seems that managers today
give less thought to scalability and performance and more thought to faster development.
I am not saying that auto generated code is bad, but it is fair to say auto
generated code is not always the most efficient. Sometimes this inefficiency
can directly impact user experience and the overall perspective of application
performance. Pagination is one of these inefficiencies. While it is easier to
click the check box that says pagination, it is also a lot more costly.
The Problem
The biggest problem with built-in pagination is the amount
of data returned to the application. Built-in pagination retrieves all the
data from a subsequent query and performs filtering at the application tier,
not the database tier. This leads to greater overhead and resource consumption,
with no additive benefit. For example, if you have 100,000 rows returned by a
SQL query and the user wants to view 10 rows at a time, built-in pagination
will return 100,000 rows regardless of the number of rows displayed on the
screen. From a user’s perspective, the load time should be fast because the
application is only returning 10 rows. When load times are slow for 10 rows,
users begin to get frustrated and scream that the application is slow. Users
do not care, nor should they, what is going on in the background. Users care
about two things: usability and performance. This article addresses the
database logic that can be used to successfully implement a custom paging
solution, which can translate into faster load times and a better user
experience.
The Solution
SQL 2005 introduced new functionality called Row_Number.
Row_Number makes pagination a breeze because it gives developers the ability to
assign a sequential number to each row returned. The numbering is dictated by
the partition and ordering set within the function. This functionality allows
developers to return only the data that is needed by the application, nothing
more nothing less. By returning less data to the application, we can directly
increase performance and put less stress on the server.
Let’s start by creating a sample table and some data.
Note: I created test data by populating empty tables with Adventure
Works data. You can download a free copy of the AdventureWorks database, on
the codeplex website. http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004
USE
[tempdb]
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE [name] = 'Contacts')
BEGIN
DROP
TABLE Contacts
END
GO
CREATE
TABLE Contacts(
ContactID
INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
FirstName
VARCHAR(25),
LastName
VARCHAR(25)
)
GO
CREATE
INDEX IXC_LastName ON
dbo.Contacts(LastName) INCLUDE(FirstName)
GO
CREATE
INDEX IXC_FirstName ON
dbo.Contacts(FirstName) INCLUDE(LastName)
GO
DECLARE
@max_id INT
SET
@max_id = (SELECT max(contactid)
FROM
[Adventureworks].[Person].[Contact]
)
INSERT
INTO Contacts
SELECT
TOP 1000000 --<<<--THE
NUMBER OF ROWS!
FirstName = (SELECT TOP 1
FirstName
FROM
adventureworks.person.contact
WHERE ContactID =
ABS(CONVERT(BINARY(6),NEWID()) % @max_id) + 1),
LastName = (SELECT TOP 1
LastName
FROM adventureworks.person.contact
WHERE ContactID =
ABS(CONVERT(BINARY(6),NEWID()) % @max_id) + 1)
FROM
Master.dbo.SysColumns
t1,
Master.dbo.SysColumns
t2
GO
Now that we have our base table and data, we can begin to
discuss how to best accomplish the task at hand. I am sure that most will need
more functionality than paging data, so I have included dynamic sorting and filtering
functionality. Usually, when someone hears the word dynamic they automatically
think dynamic SQL; however, I will demonstrate some ways to achieve dynamic
functionality, without the use of dynamic SQL. I will also show how to achieve
the same functionality through the use of dynamic SQL.
Note: I will not go into any great detail about the
performance benefits/costs and potential hazards of using dynamic SQL, but will
show you some techniques to minimize risk. For more information about the use of
dynamic sql, please read the following article by Erland Sommarskog, http://www.sommarskog.se/dynamic_sql.html.
CREATE
PROCEDURE usp_ContactPaging
(
@SortCol
VARCHAR(25)='ContactId ASC',
@FirstName
VARCHAR(25)=NULL,
@LastName
VARCHAR(25)=NULL,
@pgSize
INT=25, @pgNbr INT=1
)
AS
BEGIN
/*==================================================
I use the below code to get the total number of
rows. If you do not need this functionality,
you can remove this section of code and remove
@NbrPages in the SELECT statements.
==================================================*/
DECLARE
@NbrPages INT
IF
@FirstName IS NULL
AND @LastName IS
NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
END
IF
@FirstName IS NOT
NULL AND
@LastName IS NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[FirstName] = @FirstName
END
IF
@FirstName IS NULL
AND @LastName IS
NOT NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[LastName] = @LastName
END
IF
@FirstName IS NOT
NULL AND
@LastName IS NOT
NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[FirstName] = @FirstName AND
[LastName] = @LastName
END
;WITH
PagingCTE (Row_ID,ContactId,FirstName,LastName)
AS
(
SELECT
ROW_NUMBER()
OVER(ORDER BY
CASE WHEN @SortCol='FirstName DESC' THEN FirstName END DESC,
CASE WHEN @SortCol='FirstName ASC' THEN FirstName END ASC,
CASE WHEN @SortCol='LastName ASC' THEN LastName END ASC,
CASE WHEN @SortCol='LastName DESC' THEN LastName END DESC,
CASE WHEN @SortCol='ContactID ASC' THEN ContactId END ASC,
CASE WHEN @SortCol='ContactID DESC' THEN ContactId END DESC
) AS [Row_ID],
ContactId,
FirstName,
LastName
FROM
Contacts
WHERE
[FirstName] = COALESCE(@FirstName,FirstName) AND
[LastName]
= COALESCE(@LastName,LastName)
)
SELECT
Row_ID,
ContactId,
FirstName,
LastName,
@pgNbr AS PageNumber,
@NbrPages
AS TotalNbrPages
FROM
PagingCTE
WHERE
Row_ID >= (@pgSize
* @pgNbr) - (@pgSize -1) AND
Row_ID
<= @pgSize *
@pgNbr
END
GO
This stored procedure is using a CTE (Common Table
Expression) to create row numbering. The variables are sort column, first
name, last name, page number and page size. The variable names are self
explanatory and I need not go into detail. The where clause uses a formula
based on page size and the page number to calculate which rows should be returned
to the application. By submitting values such as @pgSize = 25, @pgNbr = 1, we
are telling the procedure to return the 1st page and each page will
contain 25 rows.
One of the key things to note in the above stored procedure
is the order by clause. The order by clause sorts the data depending on the
parameter passed in. As you can see, I used a case expression to filter the
data conditionally based on the value of @SortCol. We can use this method to dynamically
sort, without the use of dynamic sql. This still gives us good performance,
costs little overhead and does not subject us to injection attack. The
predicate is built dynamically using COALESCE. I am using the COALESCE
function to set the column equal to itself, unless a value has been passed into
the stored procedure. Often developers will use (WHERE FirstName = @FirstName
OR @FirstName IS NULL) instead. This is essentially the same thing and should
not result in a differing execution plan. Can we all agree that this stored
procedure looks good? Well I disagree. I used COALESCE to demonstrate what
not to do when dealing with dynamic filters. The method described above guarantees
that the query will result in a table scan and that no indexes will be used. I
know this is outside the scope of the article, but should be mentioned. The
idea is to make data retrieval faster, not slower.
Have your doubts? Let’s test the stored procedure. Note:
Since we are dynamically generating table data, I cannot guarantee you will
have the same first and last names in your table. Supply a name that is valid
in your table.
EXEC
usp_contactpaging 'contactid DESC','Adam',NULL,25,1
GO
Results:
Execution Plan:
What can we do to make this query index seek? Well, we
really only have two options. One is to use dynamic sql to build the predicate
dynamically. Two is to introduce control flow logic and have differing
statements for each possibility. I know that using control flow logic can be
tedious and will make the procedure more difficult to manage, but I still prefer
to use this method where possible. For those in a situation where there are
too many variables to use control flow logic, I will be demonstrating a dynamic
sql method a bit later in the article.
Revised Stored Procedure
ALTER
PROCEDURE usp_ContactPaging
(
@SortCol
VARCHAR(25)='ContactId ASC',
@FirstName
VARCHAR(25)=NULL,
@LastName
VARCHAR(25)=NULL,
@pgSize
INT=25, @pgNbr INT=1
)
AS
BEGIN
/*==================================================
I use the below code to get the total number of
rows. If you do not need this functionality,
you can remove this section of code and remove
@NbrPages in the SELECT statements.
==================================================*/
DECLARE
@NbrPages INT
IF
@FirstName IS NULL
AND @LastName IS
NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
END
IF
@FirstName IS NOT
NULL AND
@LastName IS NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[FirstName] = @FirstName
END
IF
@FirstName IS NULL
AND @LastName IS
NOT NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[LastName] = @LastName
END
IF
@FirstName IS NOT
NULL AND
@LastName IS NOT
NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[FirstName] = @FirstName AND
[LastName] = @LastName
END
--NO filters, this will always TABLE/INDEX scan
IF
@FirstName IS NULL
AND @LastName IS
NULL
BEGIN
;WITH PagingCTE (Row_ID,ContactId,FirstName,LastName)
AS
(
SELECT
ROW_NUMBER()
OVER(ORDER BY
CASE WHEN @SortCol='FirstName DESC' THEN FirstName END DESC,
CASE WHEN @SortCol='FirstName ASC' THEN FirstName END ASC,
CASE WHEN @SortCol='LastName ASC' THEN LastName END ASC,
CASE WHEN @SortCol='LastName DESC' THEN LastName END DESC,
CASE WHEN @SortCol='ContactID ASC' THEN ContactId END ASC,
CASE WHEN @SortCol='ContactID DESC' THEN ContactId END DESC
) AS [Row_ID],
ContactId,
FirstName,
LastName
FROM Contacts
)
SELECT
Row_ID,
ContactId,
FirstName,
LastName,
@pgNbr
AS PageNumber,
@NbrPages
AS TotalNbrPages
FROM PagingCTE
WHERE Row_ID >= (@pgSize * @pgNbr) - (@pgSize -1) AND
Row_ID
<= @pgSize *
@pgNbr
END
--FIRST NAME ONLY
IF
@FirstName IS NOT
NULL AND
@LastName IS NULL
BEGIN
;WITH PagingCTE (Row_ID,ContactId,FirstName,LastName)
AS
(
SELECT
ROW_NUMBER()
OVER(ORDER BY
CASE WHEN @SortCol='FirstName DESC' THEN FirstName END DESC,
CASE WHEN @SortCol='FirstName ASC' THEN FirstName END ASC,
CASE WHEN @SortCol='LastName ASC' THEN LastName END ASC,
CASE WHEN @SortCol='LastName DESC' THEN LastName END DESC,
CASE WHEN @SortCol='ContactID ASC' THEN ContactId END ASC,
CASE WHEN @SortCol='ContactID DESC' THEN ContactId END DESC
) AS [Row_ID],
ContactId,
FirstName,
LastName
FROM Contacts
WHERE [FirstName] =
@FirstName
)
SELECT
Row_ID,
ContactId,
FirstName,
LastName,
@pgNbr
AS PageNumber,
@NbrPages
AS TotalNbrPages
FROM PagingCTE
WHERE Row_ID >= (@pgSize * @pgNbr) - (@pgSize -1) AND
Row_ID
<= @pgSize *
@pgNbr
END
--LAST NAME ONLY
IF
@FirstName IS NULL
AND @LastName IS
NOT NULL
BEGIN
;WITH PagingCTE (Row_ID,ContactId,FirstName,LastName)
AS
(
SELECT
ROW_NUMBER()
OVER(ORDER BY
CASE WHEN @SortCol='FirstName DESC' THEN FirstName END DESC,
CASE WHEN @SortCol='FirstName ASC' THEN FirstName END ASC,
CASE WHEN @SortCol='LastName ASC' THEN LastName END ASC,
CASE WHEN @SortCol='LastName DESC' THEN LastName END DESC,
CASE WHEN @SortCol='ContactID ASC' THEN ContactId END ASC,
CASE WHEN @SortCol='ContactID DESC' THEN ContactId END DESC
) AS [Row_ID],
ContactId,
FirstName,
LastName
FROM Contacts
WHERE [LastName] =
@LastName
)
SELECT
Row_ID,
ContactId,
FirstName,
LastName,
@pgNbr
AS PageNumber,
@NbrPages
AS TotalNbrPages
FROM PagingCTE
WHERE Row_ID >= (@pgSize * @pgNbr) - (@pgSize -1) AND
Row_ID
<= @pgSize *
@pgNbr
END
--FIRST AND LAST NAME
IF
@FirstName IS NOT
NULL AND
@LastName IS NOT
NULL
BEGIN
;WITH PagingCTE (Row_ID,ContactId,FirstName,LastName)
AS
(
SELECT
ROW_NUMBER()
OVER(ORDER BY
CASE WHEN @SortCol='FirstName DESC' THEN FirstName END DESC,
CASE WHEN @SortCol='FirstName ASC' THEN FirstName END ASC,
CASE WHEN @SortCol='LastName ASC' THEN LastName END ASC,
CASE WHEN @SortCol='LastName DESC' THEN LastName END DESC,
CASE WHEN @SortCol='ContactID ASC' THEN ContactId END ASC,
CASE WHEN @SortCol='ContactID DESC' THEN ContactId END DESC
) AS [Row_ID],
ContactId,
FirstName,
LastName
FROM Contacts
WHERE [FirstName] =
@FirstName AND
[LastName]
= @LastName
)
SELECT
Row_ID,
ContactId,
FirstName,
LastName,
@pgNbr
AS PageNumber,
@NbrPages
AS TotalNbrPages
FROM PagingCTE
WHERE Row_ID >= (@pgSize * @pgNbr) - (@pgSize -1) AND
Row_ID
<= @pgSize *
@pgNbr
END
END
GO
As you can see, the stored
procedure just got a lot uglier and longer; however, the optimizer can now
generate a quality execution plan. Execute the new procedure, using the same
variables.
EXEC
usp_contactpaging 'contactid DESC','Adam',NULL,25,1
GO
Query Results:
Execution Plan:
Note: If you use the LIKE operator with begins with, you
may see a nested loop join instead of a single seek. Like uses a nested loop
join to compare a constant value (your parameter value) to an underlying table,
in our case contacts. This type of join will still yield good performance for
our purposes because we are only seeking the table one time; however, this type
of join is usually slower for large table joins.
By adding control flow logic to the existing stored
procedure we are able to index seek. The major problem with this method is
maintainability becomes increasingly difficult; however, I believe
manageability takes a backseat to security. Next I will demonstrate the
parameterized dynamic sql method.
The biggest consideration when building a dynamic SQL query
is how to minimize risk and parameterize the statement. In order to
parameterize dynamic SQL you must execute dynamic SQL using the system stored
procedure sp_executesql. Parameterized dynamic SQL is less susceptible to
injection attack, in comparison to non parameterized dynamic SQL.
Parameterized dynamic SQL also has the distinct advantage of query plan reuse. The
optimizer does not usually allow for query plan reuse with non parameterized dynamic
SQL, unless an identical query plan already exists. For parameterized dynamic
SQL, the optimizer is able to keep parts of an existing query plan and
interchange values, without having to generate a new plan.
Parameterized Dynamic SQL:
ALTER
PROCEDURE usp_ContactPaging
(
@SortCol
VARCHAR(25)='ContactId ASC',
@FirstName
VARCHAR(25)=NULL,
@LastName
VARCHAR(25)=NULL,
@pgSize
INT=25, @pgNbr INT=1
)
AS
BEGIN
DECLARE
@SQL nvarchar(max),
@params nvarchar(100)
/*==================================================
I use the below code to get the total number of
rows. If you do not need this functionality,
you can remove this section of code and remove
@NbrPages in the SELECT statements.
==================================================*/
DECLARE
@NbrPages INT
IF
@FirstName IS NULL
AND @LastName IS
NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
END
IF
@FirstName IS NOT
NULL AND
@LastName IS NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[FirstName] = @FirstName
END
IF
@FirstName IS NULL
AND @LastName IS
NOT NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[LastName] = @LastName
END
IF
@FirstName IS NOT
NULL AND
@LastName IS NOT
NULL
BEGIN
SELECT
@NbrPages =
CEILING(count(*)/(@pgSize*1.0))
FROM
Contacts
WHERE
[FirstName] = @FirstName AND
[LastName] = @LastName
END
SET
@params = N'@SIZE INT, @nbr
INT, @Pages INT, @Sort VARCHAR(25)'
SET
@SQL = N'
;WITH PagingCTE (Row_ID,ContactId,FirstName,LastName)
AS
(
SELECT
--YOU CAN REMOVE
THE CASE AND USE THE SORT VARIABLE HERE BUT IT
--INCREASES
SUSCEPTIBILITY
ROW_NUMBER()
OVER(ORDER BY
CASE WHEN @Sort=''FirstName DESC'' THEN FirstName END
DESC,
CASE WHEN @Sort=''FirstName ASC'' THEN FirstName END
ASC,
CASE WHEN @Sort=''LastName ASC'' THEN LastName END
ASC,
CASE WHEN @Sort=''LastName DESC'' THEN LastName END
DESC,
CASE WHEN @Sort=''ContactID ASC'' THEN ContactId END
ASC,
CASE WHEN @Sort=''ContactID DESC'' THEN ContactId END
DESC
) AS [Row_ID],
ContactId,
FirstName,
LastName
FROM Contacts
' + CASE
WHEN @FirstName IS NOT NULL AND @LastName IS NULL THEN
N'WHERE FirstName = ' +
QUOTENAME(@FirstName,'''')
WHEN @FirstName IS NULL AND @LastName IS NOT NULL THEN
N'WHERE LastName = ' + QUOTENAME(@LastName,'''')
WHEN @FirstName IS NOT NULL AND @LastName IS NOT NULL THEN
N'WHERE FirstName = ' +
QUOTENAME(@FirstName,'''') + ' AND ' + N'LastName = ' + QUOTENAME(@LastName,'''')
END + N'
)
SELECT
Row_ID,
ContactId,
FirstName,
LastName,
@Nbr AS PageNumber,
@Pages AS TotalNbrPages
FROM PagingCTE
WHERE Row_ID >= (@SIZE * @nbr) - (@SIZE -1) AND Row_ID <=
@SIZE * @nbr'
--PRINT @sql
EXEC
sp_executesql
@SQL,
@params,
@SIZE
= @pgSize,
@nbr
= @pgNbr,
@Pages
= @NbrPages,
@Sort
= @SortCol
END
GO
Now
let’s execute the stored procedure
EXEC
usp_contactpaging 'contactid DESC','Adam',NULL,25,1
GO
Query Results:
Execution Plan:
Note: You can remove the Row_Number case statement and use
@SortCol to build the order by clause dynamically; however, you will be subject
to greater security risk and will not greatly increase performance.
Conclusion
In our attempt to cut corners to meet deadlines, we
sometimes miss the key ingredients to making an application as efficient as
possible. One of these corners is built-in pagination. While it is much
easier to check the box for pagination, it does not mean this is the best way to
introduce pagination to your environment. SQL 2005 has given us a simplistic method
to create custom pagination. The methodology is possible because of a new
function called Row_Number. By implementing custom pagination functionality,
we can directly increase user satisfaction, have faster page loads, and reduce amount
of data returned to the application.
Links:
·
Adventure Works DB - http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx?ReleaseId=4004
·
Dynamic SQL – http://www.sommarskog.se/dynamic_sql.html
·
Row_Number() - http://msdn.microsoft.com/en-us/library/ms186734(SQL.90).aspx
·
Sp_executesql - http://msdn.microsoft.com/en-us/library/ms188001(SQL.90).aspx