Computed column storage size

Question

Computed column storage size

CferMN

SSC Eights!

Points: 922
More actions
October 19, 2016 at 10:41 am

#313985
We have an ID that is 8 characters long with leading zeros if the ID is less than 8 characters long. Currently, it is stored as a VARCHAR(8), with the developer having to specifically add the leading zeros during insertion. I would like to change that MemberId field to an INT field, since letters are never used for this field, only numbers. The only issue is we still need to display the leading zeros for reports, websites, applications, etc.
My solution was to use a computed column on the INT field, using for the presentation the function FORMAT(MemberId, '00000000'). While this does work, I noticed an anomaly with the computed column's data type; it returns an NVARCHAR(4000), which is the max size for NVARCHAR.
I also added another computed column, this time using REPLICATE and LEN to get the desired result. This returned a data type of VARCHAR(8000), which is the max size for VARCHAR.
Here's the code I used to set up this quick example:
CREATE TABLE test.MemberTrial
(
MemberId INT,
ReplicatedMemberId AS REPLICATE('0', 8 - LEN(MemberID)) + CAST(MemberID AS VARCHAR(8)),
FormattedMemberId AS FORMAT(MemberId, '00000000'),
CastMemberId AS CAST(FORMAT(MemberId, '00000000') AS CHAR(8))
)
INSERT INTO test.MemberTrial (MemberId)
VALUES
(123),
(1123),
(11235),
(112358),
(11235813),
(112358132),
(9999999),
(321321),
(3),
(73)
SELECT
MemberId,
ReplicatedMemberId,
FormattedMemberId,
CastMemberId
FROM
test.MemberTrial
My questions are:
- Since they are non-persisted computed columns, does it matter if FORMAT() or REPLICATE() returns a max NVARCHAR/VARCHAR data type?
- Is there a way to force FORMAT() to a smaller data type without casting?
- If I wrap FORMAT() in a CAST(), will that cause a little performance hit when selecting the values?

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

DesNorton SSC-Insane Points: 24396 More actions · Answer 1

A computed column is not stored, therefor there is no need to be concerned about the size. HOWEVER, SQL has to recompute the value every time that it is read.

If you make the computed column PERSISTED, then it is persisted to disk. You also have the added benefit of being able to index the value, if necessary/desired.

Both of these options will result in a column of type varchar(8)

ReplicatedMemberId AS RIGHT(REPLICATE('0', 8) + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

ReplicatedMemberId2 AS RIGHT('00000000' + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

How to post data/code on a forum to get the best help.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 2

Exactly what DesNorton mentioned. I just wanted to add something about the performance.

The FORMAT function has been proven to be very slow. It's up to 40 times slower that previous formatting methods. What seems to be the most effective option is to add the zeros before casting.

CREATE TABLE MemberTrial

(

MemberId INT,

RightMemberId AS RIGHT(MemberId + 100000000, 8)

);

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

CferMN SSC Eights! Points: 922 More actions · Answer 3

DesNorton (10/19/2016)
A computed column is not stored, therefor there is no need to be concerned about the size. HOWEVER, SQL has to recompute the value every time that it is read.
If you make the computed column PERSISTED, then it is persisted to disk. You also have the added benefit of being able to index the value, if necessary/desired.
Both of these options will result in a column of type varchar(8)
ReplicatedMemberId AS RIGHT(REPLICATE('0', 8) + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,
ReplicatedMemberId2 AS RIGHT('00000000' + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,

I realize that if you persist them they will go to disk. But wouldn't that also increase storage? If you're already storing the INT value (4 bytes), then persist the VARCHAR(8) value (10 bytes, correct? Eight for the characters, 2 for overhead?), you're basically storing the value twice, taking up 14 bytes as opposed to 4. Is that correct?

And wouldn't it be best just to store and index the original MemberId INT column and use the computed column for display only? Then you're indexing an INT instead of a VARCHAR. Isn't that better for speed and performance, too?

DesNorton SSC-Insane Points: 24396 More actions · Answer 4

CferMN (10/19/2016)
DesNorton (10/19/2016)
A computed column is not stored, therefor there is no need to be concerned about the size. HOWEVER, SQL has to recompute the value every time that it is read.
If you make the computed column PERSISTED, then it is persisted to disk. You also have the added benefit of being able to index the value, if necessary/desired.
Both of these options will result in a column of type varchar(8)
ReplicatedMemberId AS RIGHT(REPLICATE('0', 8) + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,
ReplicatedMemberId2 AS RIGHT('00000000' + CAST(MemberID AS VARCHAR(8)), 8) PERSISTED,
I realize that if you persist them they will go to disk. But wouldn't that also increase storage? If you're already storing the INT value (4 bytes), then persist the VARCHAR(8) value (10 bytes, correct? Eight for the characters, 2 for overhead?), you're basically storing the value twice, taking up 14 bytes as opposed to 4. Is that correct?
And wouldn't it be best just to store and index the original MemberId INT column and use the computed column for display only? Then you're indexing an INT instead of a VARCHAR. Isn't that better for speed and performance, too?

If you ever need to search for MemberId like '000012%' then you will have to do a full table scan for each search. In this case, you are better off adding the persisted column and indexing it.

If that never happens, then there are 2 options

1 - Store the extra 10Bytes, and use drive space - With a max of 10mil rows, that transaltes to a max 100MB (or 95.3 if 1KB = 1024B).

2 - Take the CPU perf hit of having to do the calc on every read.

You need to make the call of which is the lesser of 2 evils for your situation.

How to post data/code on a forum to get the best help.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

Persisted computed columns only make sense when the computation is expensive or when an index is needed. If that's not the case, persisting the computed column makes little sense.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2