Recently someone asked me about what had become my feature in SQL 2005 and if
it was the feature I thought it would be. That's really a pretty good question
since it's easy to get excited about the 'wow' and overlook the more mundane.
Going into the release my favorite feature that I was looking forward to was the
ability to modify replicated columns (if you've done replication I'm sure you
share my enthusiasm for that feature!). My path has taken a twist and turn since
then so that I don't use replication daily. Instead I find that out of all the
new things the one I seem to use most is Top N in updates and deletes (followed
closely by the output clause). No, not exciting, but definitely practical.
Assuming you've been using SQL 2000 for a while you know that you can use Top
N in selects, either as a number or a percentage. For example, you might write
this:
select top 100 * from sometable
What you couldn't do was parameterize it. It only worked with selects too. If
you wanted to restrict the number of rows affected by a select at runtime, or
restrict the rows affected by an update or delete you
had to explicitly describe the rows using a join or where clause, or you could
cheat a bit by using ROWCOUNT, like this:
set rowcount 1000 delete from table where somefield=1 set rowcount 0
It worked well enough (and still does in SQL 2005). The downside was a bit of
a clunky syntax and it created the risk that somehow rowcount did not get reset back to 0,
not a good thing since it is a session based setting. Forgetting to change it
back would mean that all the statements would be restricted to 100 rows. Not the
best thing if you're trying to select all employees that need a paycheck. SQL 2005 let's you use Top N in selects, plus updates and deletes
- but the syntax has some quirks as you'll see in a second.
For select statements you must specify parentheses if you are passing
a parameter, otherwise they are optional
Select Top (@TopN) * from employees Select Top 10 * from employees
When doing an update or delete, you have to use the parentheses in both
cases:
Delete Top (@TopN) from employees Delete Top (10) from employees
update Top (@TopN) employees set payrate = payrate * 1.1 update Top (10) from employees set payrate = payrate * 1.1
So a simple rule to follow would be to always wrap the 'N' in parentheses!
The most common reason I've had to do restrict the number of rows has been
when doing operations against millions of records. Updating millions of records
can not only cause blocking, but it can bloat your transaction log even if
you're in simple recovery mode (because it has to log the entire transaction).
Doing the update in batches doesn't guarantee no blocking of course, but it
should reduce the time the query takes to complete. It often takes some
experimenting to figure out the optimum batch size. I suggest making sure that
the query will complete in less than 30 seconds, as that is typically the
default timeout for a database action. Even if you're in full recovery mode
there is an advantage to doing chunks. If the size of your transaction won't fit
into the current virtual log segment the log will grow (or you'll come to a
halt!). By doing batches and running log backups you can easily keep the log
from growing to a huge size.
I've shared my favorite feature and an application for it as well - how about
writing about your favorite feature and submitting it for the SQL world to see?