Using Computed Columns

Question

Post reply

Using Computed Columns

Luis Martín Caballero

SSC Rookie

Points: 47
More actions
February 11, 2019 at 10:18 pm

#370543

Comments posted to this topic are about the item Using Computed Columns

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Mike D Say Hey Kid Points: 703 More actions · Answer 1

Why not use a view for computed columns? Is this not clearer?

Alan Burstein SSC Guru Points: 61152 More actions · Answer 2

Good article and great topic. I'm a big fan of computed columns so it's nice to see a detailed article on the topic. I would warn, however, against using a T-SQL scalar UDF for a computed column if performance is important. Once a scalar UDF's (pre-SQL 2019) becomes part of the table queries that involve that table will become slower. For starters, queries involving that table can no longer get a parallel execution plan even when that column is not referenced! Even index rebuilds must be performed with a serial plan. Based on my testing, non-inline Scalar UDFs bring these kind of problems when used for computed columns or as constraints (Check & Default). Note these two articles by Eric Darling:

Another Hidden Parallelism Killer: Scalar UDFs In Check Constraints
Another reason why scalar functions in computed columns is a bad idea

Again - I enjoyed the article and am looking forward to the next one. I'm not being critical of it in any way, I just wanted to make people aware of the performance implications of using scalar UDFs for this. A correctly developed CLR, on the other hand, does not have this problem. Fortunately, too, SQL 2019 adds inline scalar UDFs which solves this problem on the T-SQL side.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Luis Martín Caballero SSC Rookie Points: 47 More actions · Answer 3

Hi Mike,
There is no universal solution for every situation. Computed columns are another posibility like views.
When you use computed columns you can persist the information so you will have it stored whith the rest of the table. This way you get better performance when accesing the column and also you reduce the number of objects to administer. Computed columns can be used in some constraints what can also be an advantage.
As usual, neither of the two options can always be used, so it will be necessary to see each particular case to find which of the two options is the one that best fits.

Luis Martín Caballero SSC Rookie Points: 47 More actions · Answer 4

Hi Alan,
Thank you very much for the clarification. The idea was mainly to show the possibilities of the use of calculated columns, but you are right that it's necessary to be careful with the performance according to the solution that we are going to adopt.

thisisfutile Hall of Fame Points: 3669 More actions · Answer 5

Mike D - Tuesday, February 12, 2019 7:55 AM
Why not use a view for computed columns? Is this not clearer?

Our software vendor uses a lot of integer columns for status levels. For example, in the Customer table, the CustomerStatus can be 0,1,2 or 3. They then have a CustomerStatusComputed column for the convenience of anyone that's writing queries or reports. This column shows 'Open','Hold','Closed' or 'Prospect' respectively. This column acts as a sort of anchor or index location for human eyes. Yes, a view can be used and many views already exist with this exact information but, to me, it's much more convenient to look at the table for this information.

drew.allen SSC Guru Points: 77014 More actions · Answer 6

Mike D - Tuesday, February 12, 2019 7:55 AM
Why not use a view for computed columns? Is this not clearer?

I think that they're both equally clear. Views are good for complex computations, e.g., ones involving sub-queries, but I think they're overkill if you are just doing simple computed columns.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

thisisfutile Hall of Fame Points: 3669 More actions · Answer 7

We have a database that is filled with tables utilizing an integer column for things like 'Status'. For example, Customer table has a Status column and 0-3 are the integers used in this column. It also has a StatusComputed column that's computed as (case [Status] when 0 then 'Open' when 1 then 'Hold' when 2 then 'Closed' when 3 then 'Prospect' else 'Unknown' end). Without doing any testing I'd have to guess that the query optimizer would perform better with a query that has WHERE Status = 0 rather than WHERE StatusComputed = 'Open'. Does anyone have any opinions?

drew.allen SSC Guru Points: 77014 More actions · Answer 8

thisisfutile - Tuesday, February 12, 2019 11:57 AM
We have a database that is filled with tables utilizing an integer column for things like 'Status'. For example, Customer table has a Status column and 0-3 are the integers used in this column. It also has a StatusComputed column that's computed as (case [Status] when 0 then 'Open' when 1 then 'Hold' when 2 then 'Closed' when 3 then 'Prospect' else 'Unknown' end). Without doing any testing I'd have to guess that the query optimizer would perform better with a query that has WHERE Status = 0 rather than WHERE StatusComputed = 'Open'. Does anyone have any opinions?

I prefer using lookup tables for this. It's much easier to add new rows to a lookup table than it is to modify an existing table if you need to add another status, especially if all code changes need to go through a change control process. (Adding a row isn't usually considered a code change, so it doesn't need to go through the change control process.)

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

thisisfutile Hall of Fame Points: 3669 More actions · Answer 9

drew.allen - Tuesday, February 12, 2019 12:26 PM
thisisfutile - Tuesday, February 12, 2019 11:57 AM
We have a database that is filled with tables utilizing an integer column for things like 'Status'. For example, Customer t has a Status column and 0-3 are the integers used in this column. It also has a StatusComputed column that's computed as (case [Status] when 0 then 'Open' when 1 then 'Hold' when 2 then 'Closed' when 3 then 'Prospect' else 'Unknown' end). Without doing any testing I'd have to guess that the query optimizer would perform better with a query that has WHERE Status = 0 rather than WHERE StatusComputed = 'Open'. Does anyone have any opinions?
I prefer using lookup tables for this. It's much easier to add new rows to a lookup table than it is to modify an existing table if you need to add another status, especially if all code changes need to go through a change control process. (Adding a row isn't usually considered a code change, so it doesn't need to go through the change control process.)
Drew

I'm actually not aware of what a lookup table is. Are you referring to a normalized table? Like status is the key field in the Customer table and they reference a lookup table called "CustomerStatus"?

For reference, the DB I'm referring to is maintained by our software developer. It's their design and the ERP software references the Status column (0-3) and displays Open, Hold, Closed, or Prospect based on its C# code. They decided to put the computed columns in for us customers simply for reference. No more need to write a case statement in the select clause, instead just select the StatusComputed column.

I'm interested to know more about lookup tables though. I searched it and my first hit was a Joe Celko article. Since he's not for the faint of heart (and since I fainted) I figured I'd let you respond to point me in the right direction. 😀

drew.allen SSC Guru Points: 77014 More actions · Answer 10

thisisfutile - Tuesday, February 12, 2019 12:54 PM
drew.allen - Tuesday, February 12, 2019 12:26 PM
thisisfutile - Tuesday, February 12, 2019 11:57 AM
We have a database that is filled with tables utilizing an integer column for things like 'Status'. For example, Customer t has a Status column and 0-3 are the integers used in this column. It also has a StatusComputed column that's computed as (case [Status] when 0 then 'Open' when 1 then 'Hold' when 2 then 'Closed' when 3 then 'Prospect' else 'Unknown' end). Without doing any testing I'd have to guess that the query optimizer would perform better with a query that has WHERE Status = 0 rather than WHERE StatusComputed = 'Open'. Does anyone have any opinions?
I prefer using lookup tables for this. It's much easier to add new rows to a lookup table than it is to modify an existing table if you need to add another status, especially if all code changes need to go through a change control process. (Adding a row isn't usually considered a code change, so it doesn't need to go through the change control process.)
Drew
I'm actually not aware of what a lookup table is. Are you referring to a normalized table? Like status is the key field in the Customer table and they reference a lookup table called "CustomerStatus"?
For reference, the DB I'm referring to is maintained by our software developer. It's their design and the ERP software references the Status column (0-3) and displays Open, Hold, Closed, or Prospect based on its C# code. They decided to put the computed columns in for us customers simply for reference. No more need to write a case statement in the select clause, instead just select the StatusComputed column.
I'm interested to know more about lookup tables though. I searched it and my first hit was a Joe Celko article. Since he's not for the faint of heart (and since I fainted) I figured I'd let you respond to point me in the right direction. 😀

It's just what it sounds like. You use one value to look up another. Normalization is a separate, but related concept. Lookup tables are usually normalized, but a normalized table isn't necessarily considered a lookup table.

Generally, you have a single code that you look up in a table to get the description. Joe mentions the "One True Lookup Table" in which case, you will need two (or more) codes which will give you a unique combination, but otherwise the idea is the same--you look up the unique combination to get a description.

Also, you have your terminology a little mixed up. The key field of a table MUST be unique. So, status cannot be the "key field in the Customer table," because it is not unique in the customer table.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

thisisfutile Hall of Fame Points: 3669 More actions · Answer 11

drew.allen - Tuesday, February 12, 2019 2:05 PM
Also, you have your terminology a little mixed up. The key field of a table MUST be unique. So, status cannot be the "key field in the Customer table," because it is not unique in the customer table.
Drew

I'm sorry, I thought 'foreign' was understood, just as 'primary' is in your statement.

drew.allen SSC Guru Points: 77014 More actions · Answer 12

thisisfutile - Wednesday, February 13, 2019 1:34 PM
drew.allen - Tuesday, February 12, 2019 2:05 PM
Also, you have your terminology a little mixed up. The key field of a table MUST be unique. So, status cannot be the "key field in the Customer table," because it is not unique in the customer table.
Drew
I'm sorry, I thought 'foreign' was understood, just as 'primary' is in your statement.

The word primary is optional, because it's the default kind of key. The word foreign is required, because it's not the default kind of key. At least, that is my understanding of how the language works, and I haven't seen anything to indicate otherwise (although that could just be observational bias).

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA