November 7, 2018 at 1:03 am
When a table has one hundred columns, and 40 of them have declared datatypes far exceeding the MAX(LEN()) of the data actually stored in them, will this alone impact query performance?
Breakdown (a way to gather these figures is sketched after the list):
varchar(100) - 19 columns, max length 20
varchar(300) - 4 columns, max length 51
varchar(500) - 12 columns, max length 94
varchar(1000) - 2 columns, max length 86
varchar(2000) - 2 columns, max length 115
varchar(4000) - 2 columns, max lengths 3999 & 392
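A minimal sketch of how one of those maximums can be gathered, with hypothetical names (dbo.WideTable, Col1) standing in for the real table:

SELECT
    MAX(LEN(Col1))        AS max_characters,  -- longest value actually stored, in characters
    MAX(DATALENGTH(Col1)) AS max_bytes        -- the same value measured in bytes
FROM dbo.WideTable;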
--Quote me
November 7, 2018 at 1:52 am
Probably not, although it may affect data integrity. If the maximum length of any value in the column is going to be 50, then you don't want varchar(300) for that column, since there's a danger you will accidentally admit invalid values.
One way you may get a small performance improvement is if you have any fixed-length columns - credit card numbers for example. In those cases you should use char instead of varchar, both for data integrity and to avoid the two-byte overhead that varchar incurs over char. (Of course, if it's credit card numbers we're talking about, there are all sorts of security concerns on top of that, but that's beyond the scope of this topic.)
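A minimal sketch of that approach, using hypothetical table and column names; a CHECK constraint can back up the data-integrity point:

CREATE TABLE dbo.CardPayment              -- hypothetical names, illustration only
(
    PaymentID  int      NOT NULL PRIMARY KEY,
    CardNumber char(16) NOT NULL,         -- fixed width: no 2-byte varchar length overhead
    CONSTRAINT CK_CardPayment_CardNumber
        CHECK (CardNumber NOT LIKE '%[^0-9]%' AND LEN(CardNumber) = 16)  -- exactly 16 digits
);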
John
November 7, 2018 at 2:12 am
Memory allocated for the execution of a query is based on the assumed row width. The assumed row width is not based on reality - it's based on the declared size of the columns.
I posted details in another thread (https://www.sqlservercentral.com/Forums/1997686/NVarchar4000-and-Performance#bm1997863). That shows the memory requests and grants associated with selecting the same ~40-byte value out of columns of different data types. The larger the column size, the more wasted memory granted to the query.
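For anyone who wants to watch this on their own system, the grants for currently executing queries are visible in sys.dm_exec_query_memory_grants (a general sketch, not the exact script from the linked thread):

SELECT session_id,
       requested_memory_kb,   -- what the query asked for, driven by estimated row size
       granted_memory_kb,     -- what it was actually given
       used_memory_kb         -- what it is using right now
FROM sys.dm_exec_query_memory_grants;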
-Eddie
Eddie Wuerch
MCM: SQL
November 7, 2018 at 12:38 pm
John says no; Eddie gives a fully explained yes (at the referenced link). Because of the well-reasoned explanation, I accept Eddie's answer.
Large datatypes require more space to index as well, don't they?
--Quote me
November 7, 2018 at 3:30 pm
polkadot - Wednesday, November 7, 2018 12:38 PM
Large datatypes require more space to index as well, don't they?
With character data, no.
Physically, there's no difference between storing or indexing a 40-byte string as varchar(128) or varchar(512).
An ASCII value stored as nvarchar(100) will take twice the space as varchar(100), unless row-level compression is in use.
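A quick throwaway sketch of that doubling, using variables instead of a real table:

DECLARE @ascii   varchar(100)  = 'forty-ish character ASCII sample string';
DECLARE @unicode nvarchar(100) = N'forty-ish character ASCII sample string';
SELECT DATALENGTH(@ascii)   AS varchar_bytes,   -- one byte per character
       DATALENGTH(@unicode) AS nvarchar_bytes;  -- two bytes per character, double the storage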
Eddie Wuerch
MCM: SQL
November 7, 2018 at 11:05 pm
Oh, then you and John are both saying that if the varchar datatype size is much bigger than the actual length of the data, this will not affect query performance or indexing?
if I take a len() of a column value and it returns 38, is this 38 bytes?
--Quote me
November 8, 2018 at 1:50 am
It won't affect indexing, no, but Eddie is right - it does affect memory grants, which in turn affects performance. I overlooked that in my first reply.
LEN() returns the number of characters in a string. DATALENGTH() returns the number of bytes it uses.
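So for a varchar column in a typical single-byte collation, a LEN() of 38 will also be 38 bytes. A small throwaway sketch of the two functions, including the trailing-space quirk:

DECLARE @v varchar(100)  = 'abc   ';            -- three trailing spaces
DECLARE @n nvarchar(100) = N'abc';
SELECT LEN(@v)        AS len_v,        -- 3: LEN counts characters and ignores trailing spaces
       DATALENGTH(@v) AS bytes_v,      -- 6: DATALENGTH counts every byte, spaces included
       LEN(@n)        AS len_n,        -- 3 characters
       DATALENGTH(@n) AS bytes_n;      -- 6 bytes, because nvarchar uses two bytes per character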
John
November 9, 2018 at 3:01 am
polkadot - Wednesday, November 7, 2018 11:05 PM
Oh, then you and John are both saying that if the varchar datatype size is much bigger than the actual length of the data, this will not affect query performance or indexing?
Declared varchar column length will not affect the physical (on-disk) or data cache storage. It will, however, affect the performance of actually using that index. The values must be loaded into a query's executing memory space in order to be read and processed. The memory requested in order to load that data is based on the declared size, not the actual size. The data must be read into memory before the engine can determine how large it actually is.
varchar(100) - 19 columns, max length 20
varchar(300) - 4 columns, max length 51
varchar(500) - 12 columns, max length 94
varchar(1000) - 2 columns, max length 86
varchar(2000) - 2 columns, max length 115
varchar(4000) - 2 columns, max lengths 3999 & 392
With the exception of the varchar(4000) column storing up to 3999 bytes, any query accessing those columns will request much more memory than it actually needs in order to execute the query. This isn't a cache - it's single-use and each running query requests a separate grant. Memory wasted on queries can't be used for caches or keeping sorts out of tempdb.
The cool thing is that shrinking the column definition (such as turning the varchar(2000) columns into varchar(200) columns) doesn't require any writes to the table, just a read to ensure the change is safe. Making similar moves across the other columns will have a positive effect on performance, which could be significant on a highly-concurrent system that touches this data frequently on many threads.
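A minimal sketch of that kind of change, again with hypothetical names; ALTER COLUMN needs the NULLability restated, and it will fail (rather than truncate) if any existing value is longer than the new size:

-- confirm nothing exceeds the new declared size
SELECT MAX(LEN(Notes)) FROM dbo.WideTable;           -- hypothetical table/column

-- then shrink the declaration; no rows are rewritten
ALTER TABLE dbo.WideTable
    ALTER COLUMN Notes varchar(200) NOT NULL;        -- restate the original NULL/NOT NULL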
Eddie Wuerch
MCM: SQL
November 10, 2018 at 8:15 am
Eddie Wuerch - Friday, November 9, 2018 3:01 AM (quoted above)
You should write an article on this, Eddie. Too many people think that NUMERIC(18,0) and NVARCHAR(256) (the defaults of a lot of "table designer" software) are OK to use for just about everything.
And while Knuth was absolutely correct, too many people confuse doing things the right way with "premature optimization".
--Jeff Moden
Change is inevitable... Change for the better is not.