Computed Columns

Computed columns are one of those features that we all know about but tend to

forget about when it's time to implement. Are they really a useful option since

we're denormalizing our data in some cases? Can they boost performance? The

answer to both is yes and they can actually save you some effort compared to

other options. We'll use a simple reporting example to drive our discussion.

Let's use Adventureworks for the examples. We'll start simple. You've been

helping build reports and a very common task is to list data from Person.Contact

including showing contact names as lastname then a comma and a space, then a first name (Warren, Andy). Because you're a

power user you just create it in your select statement, like this:

select lastname + ', ' + firstname as Fullname from person.contact

One way to to help out the less technical users is to create a view that

encapsulates that usage, like this:

create view vPersonContact as
--note, you could alternatively use the * to indicate all columns, I just prefer to specify
select 
  lastname + ', ' + firstname as Fullname
  ,ContactID
      ,NameStyle
      ,Title
      ,FirstName
      ,MiddleName
      ,LastName
      ,Suffix
      ,EmailAddress
      ,EmailPromotion
      ,Phone
      ,PasswordHash
      ,PasswordSalt
      ,AdditionalContactInfo
      ,rowguid
      ,ModifiedDate
from person.contact

You'll need to grant select permissions on the view to the same users that

currently have select permissions on the Person.Contact table. At that point

they can use the view in place of the table and get the fullname column for

free. Conceivably not all the users will 'get' that the view is really the table

plus a column. You could - using a little care and doing this off hours - rename

the Person.Contact table to Person.ContactBase, and then modify your view as

follows:

create view Person.Contact as
--note, you could alternatively use the * to indicate all columns, I just prefer to specify
select 
      lastname + ', ' + firstname as Fullname
      ,ContactID
      ,NameStyle
      ,Title
      ,FirstName
      ,MiddleName
      ,LastName
      ,Suffix
      ,EmailAddress
      ,EmailPromotion
      ,Phone
      ,PasswordHash
      ,PasswordSalt
      ,AdditionalContactInfo
      ,rowguid
      ,ModifiedDate
from person.contactbase

This can take a little work if you have foreign keys, but it's doable, and ends up being being a nice transparent change to your report writers and to any application

using the table/view. I'm leaving that as an exercise for you, and we'll just

proceed using the first view. Now let's see what happens when they start to take

advantage of our new column in unexpected ways such as filtering against it:

select * from vPersonContact where fullname like '%, andy'

In my copy of Adventureworks I have separate indexes on lastname and firstname. You can see the user is trying to identify rows in the table with a firstname of Andy.

SQL typically doesn't give us great query plans with a leading wild card so it

goes back to the dependable clustered index scan (essentially a table scan,

meaning it has to look at every row in the table to see if it matches or

doesn't). There's nothing to stop them from filtering on firstname and

displaying fullname, but most users wouldn't know that or tend to do that by

default. But, let's look at what would happen if they did:

select * from Person.Contact where firstname='andy'

We get a much better plan (always use Profiler to confirm that the plan truly

is more effective by comparing reads for the two queries). Clearly we'd like to

avoid a table scan and the best way short of turning our users into DBA's is to

index the column. One way we can accomplish this is to modify the view to change

it to an indexed view which would allow us to index fullname after creating a

unique clustered index on the view. Indexed views are useful, but they add some

weight and complexity that we might not yet need. Another option would be to add

FullName to the table as a real column, and then create insert and update

triggers on the table that maintain the column based on changes to either first

or last name. We could then index FullName and avoid the extra complexity of the

indexed view. Our final solution and of course our target for today is to use a

computed column which is almost the the same as adding the column and creating

the triggers as you'll see next.

To add a computed column via TSQL the syntax is easy:

ALTER TABLE Person.Contact ADD FullName  AS lastname + ', ' + firstname

By default computed columns are not persisted; that is, it's really just adding the expression to the table much as we added it to the view earlier. It's cleaner than

the view if you always want the column to be present. Each time we access the

column SQL will calculate the value and then display or filter against it

appropriately. It will not change our query performance though as we're

still doing it all on the fly. Here's what the plan looks like now:

If we change the definition to persisted that will eliminate the overhead of

calculating the value every time, the trade off is that we have to store the

calculated value and maintain it (essentially our trigger based solution from

above). We'll just add another column called FullName2 that will be persisted

and then retest our query:

ALTER TABLE Person.Contact ADD FullName2  AS lastname + ', ' + firstname PERSISTED

Just a minor change in the plan and one that probably doesn't greatly affect

performance as we're still scanning the table. The final option we have is to

index the column, but does it make a difference whether we index the persisted

or non persisted column? We'll create a non clustered index on each column and

then run our queries again:

The query plans are identical which makes sense since regardless of how we

got the values, the final index would turn out exactly the same. We get an index

scan because we're still using the leading wild card (maybe training them not to

use that isn't such a bad idea after all) and that usually eliminates a seek as

an option. It's still considerably faster than a table scan though and that's

better than where we started.

At this point I haven't tested with more complex computed columns to see if I

can see a difference in overhead to maintain them. Persisting and indexing would

take up more space than just indexing or just persistently used singly. I

typically opt to just index the computed column rather than persist it, but

that's not a best practice - just my practice. The best guidance I can offer is

that if you're about to write triggers to maintain a column take a hard look at

computed columns first, then indexed views, as a way to let SQL do all the work

for you in a standard way. Denormalizing data should not be our first choice and

we want to look at all the options, but as you can see from the example we

really were denormalizing to help our users rather than just to boost

performance.

I blog once a week or so at

http://blogs.sqlservercentral.com/blogs/andy_warren/default.aspx about

SQLServer, SQL user groups, and related topics. I hope you'll visit and comment

occasionally!

Rate

Share

Categories

Share

Rate

Computed Columns

Rate

Share

Categories

Share

Rate

Related content

Index on a computed column

Auto generated SQL Server keys - uniqueidentifier or IDENTITY

Identity Columns

A Normalization Primer

Querying Responses