January 29, 2013 at 4:53 pm
Hello Everyone
I hope that everyone is having a Fabulous day.
I was in a discussion with a DB2 developer who has somehow designed a SQL Server database. It is a nice mess, actually.
He was trying to convince me that char performs better than varchar.
I am trying to find some documentation about this, but cannot, not even in BOL. I have always used varchar to store string data, unless I know the data length for certain, like a postal code that includes the hyphen.
I am hoping that someone can shed some light on the subject.
Thanks in advance for all your advice, help and suggestions
Andrew SQLDBA
January 29, 2013 at 5:39 pm
AndrewSQLDBA (1/29/2013)
He was trying to convince me that char performs better than varchar.
There is one small area where CHAR can beat VARCHAR on storage: a VARCHAR value needs 2 extra bytes in the data page per row to record its length. You're at the nanosecond response level there, though, so you would need to be operating on millions of records simultaneously to see any benefit.
In theory, CHAR should also get some benefit from skipping the extra internal step of determining the data length during row extraction that VARCHAR requires, but... I've never seen it.
So, you got me curious. Kowabunga!
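The table setup wasn't included here, so here's a minimal hedged sketch of what it could look like: two temp tables holding the same 20-character value, one varchar and one char. The row counts and data are assumptions, not the original script.

CREATE TABLE #SampleSetVC (SomeData VARCHAR(20) NOT NULL);
CREATE TABLE #SampleSetCH (SomeData CHAR(20) NOT NULL);

-- Roughly 50,000 identical rows per table, enough to make the scans measurable
INSERT INTO #SampleSetVC (SomeData)
SELECT TOP (50000) '@@@@@@@@@@@@@@@@@@@@'
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

INSERT INTO #SampleSetCH (SomeData)
SELECT SomeData FROM #SampleSetVC;
GO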
DECLARE @starttime DATETIME2;

-- Time a scan of the varchar table
SET @starttime = GETDATE();
SELECT *
FROM #SampleSetVC
WHERE SomeData = '@@@@@@@@@@@@@@@@@@@@';
PRINT 'VC Check: ' + CONVERT(VARCHAR(60), DATEDIFF(ms, @starttime, GETDATE()));

-- Time the same scan of the char table
SET @starttime = GETDATE();
SELECT *
FROM #SampleSetCH
WHERE SomeData = '@@@@@@@@@@@@@@@@@@@@';
PRINT 'CH Check: ' + CONVERT(VARCHAR(60), DATEDIFF(ms, @starttime, GETDATE()));
GO 10
Results:
Beginning execution loop
(50070 row(s) affected)
VC Check: 447
(49956 row(s) affected)
CH Check: 427
(50070 row(s) affected)
VC Check: 587
(49956 row(s) affected)
CH Check: 400
(50070 row(s) affected)
VC Check: 380
(49956 row(s) affected)
CH Check: 413
(50070 row(s) affected)
VC Check: 390
(49956 row(s) affected)
CH Check: 390
(50070 row(s) affected)
VC Check: 413
(49956 row(s) affected)
CH Check: 520
(50070 row(s) affected)
VC Check: 416
(49956 row(s) affected)
CH Check: 660
(50070 row(s) affected)
VC Check: 557
(49956 row(s) affected)
CH Check: 590
(50070 row(s) affected)
VC Check: 410
(49956 row(s) affected)
CH Check: 414
(50070 row(s) affected)
VC Check: 417
(49956 row(s) affected)
CH Check: 423
(50070 row(s) affected)
VC Check: 410
(49956 row(s) affected)
CH Check: 447
Batch execution completed 10 times.
High enough variability that I wouldn't say for sure one is definitely better than the other, but varchar outperformed char often enough that I'd say my one theory is debunked.
Basically, it's storage considerations only, unless there's some extreme test I just hadn't thought of.
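To see the storage side in isolation, here's a quick check (the variable names are just for illustration):

DECLARE @vc VARCHAR(20) = 'abc';
DECLARE @ch CHAR(20) = 'abc';

-- char always pads to its declared width; varchar stores only the data
-- (plus the 2-byte length kept in the row, which DATALENGTH doesn't count)
SELECT DATALENGTH(@vc) AS vc_bytes, -- 3
       DATALENGTH(@ch) AS ch_bytes; -- 20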
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
January 29, 2013 at 6:19 pm
varchar fields that get frequently updated can cause page splits.
Doubly so if they are indexed.
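A hedged sketch of how to see that (the table and values are made up): fill a clustered index with short values, widen them all, and fragmentation from page splits shows up immediately.

CREATE TABLE #Splitter (
    id  INT IDENTITY PRIMARY KEY, -- clustered, so expanding rows must split pages
    txt VARCHAR(100) NOT NULL
);

INSERT INTO #Splitter (txt)
SELECT TOP (10000) 'short'
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;

-- Widen every value: the rows no longer fit on their original pages
UPDATE #Splitter SET txt = REPLICATE('x', 100);

-- Fragmentation jumps well above zero after the update
SELECT avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID('tempdb'), OBJECT_ID('tempdb..#Splitter'), NULL, NULL, 'LIMITED');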
January 29, 2013 at 10:18 pm
Just as a rule of thumb (one that I follow): use CHAR only when you are sure of the character length. For example, an SSN is always a fixed length. For data whose exact length you do not know in advance, use VARCHAR.
SQL Server differentiates storage by allocation unit: IN_ROW_DATA, ROW_OVERFLOW_DATA and LOB_DATA. CHAR values always live in IN_ROW_DATA; VARCHAR values also start out in-row, but can be pushed to ROW_OVERFLOW_DATA pages when the row grows past the 8,060-byte row-size limit.
IN_ROW_DATA means the value is stored on the same page as the rest of the row (assuming a simple table design with no other allocation units in play). SQL Server knows in advance how many bytes are stored, which keeps querying the data fast.
With ROW_OVERFLOW_DATA, on the other hand, SQL Server determines the length of the data at run time (within the declared maximum, of course). If the data does not fit in a single page, it overflows to another page, and a query then has to span multiple pages to retrieve a single record.
I am only presenting a sketch here so you can see what is happening internally; for complete detail, kindly refer to BOL.
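If you want to see the allocation units for yourself, here is a sketch against the catalog views ('YourTable' is a placeholder; LOB units would join on partition_id instead of hobt_id):

SELECT o.name       AS table_name,
       au.type_desc AS allocation_unit,
       au.total_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
    ON au.container_id = p.hobt_id -- valid for IN_ROW_DATA and ROW_OVERFLOW_DATA units
JOIN sys.objects AS o
    ON p.object_id = o.object_id
WHERE o.name = 'YourTable'; -- substitute a real table name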
~ Lokesh Vij
Link to my Blog Post --> www.SQLPathy.com
Follow me @Twitter
January 30, 2013 at 10:16 am
AndrewSQLDBA (1/29/2013)
I was in a discussion with a DB2 developer ...
He was trying to convince me that char performs better than varchar.
Andrew SQLDBA
First, let's be grateful he's a developer, not a DBA 🙂
Char was often used on the "DB2" on AS/400 because on that platform char was indeed more efficient (all strings were stored as char anyway).
For SQL Server, that's just a ridiculous claim. Char just wastes space for many columns.
Varchar does have some overhead, including a 2-byte length. So for 3 chars or less, just use char(1|2|3). 4 chars is borderline: if at least two were (almost) always present, I'd probably use char there too.
Otherwise, as noted by others, use varchar unless the length is truly fixed.
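If you want to verify that overhead, a hedged sketch (the exact byte counts depend on the row header and null bitmap, but the varchar rows should come out a few bytes wider):

CREATE TABLE #C3 (v CHAR(3) NOT NULL);
CREATE TABLE #V3 (v VARCHAR(3) NOT NULL);

INSERT INTO #C3 (v)
SELECT TOP (10000) 'abc'
FROM sys.all_columns AS a CROSS JOIN sys.all_columns AS b;
INSERT INTO #V3 (v) SELECT v FROM #C3;

-- DETAILED mode exposes the average record size, overhead included
SELECT 'char(3)' AS col_type, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
    DB_ID('tempdb'), OBJECT_ID('tempdb..#C3'), NULL, NULL, 'DETAILED')
UNION ALL
SELECT 'varchar(3)', avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(
    DB_ID('tempdb'), OBJECT_ID('tempdb..#V3'), NULL, NULL, 'DETAILED');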
If you know that specific varchar columns are initially inserted as null/blank but will be updated later, you might pre-fill them with nn '*'s or another dummy character to prevent the row from expanding later, which can cause forwarding pointers (shudder). A sketch of that trick follows.
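A hedged sketch of the pre-fill trick (the table, column and width are invented for illustration):

-- Reserve the space up front with a dummy default...
CREATE TABLE dbo.Orders_Hypothetical (
    order_id    INT IDENTITY PRIMARY KEY,
    tracking_no VARCHAR(30) NOT NULL DEFAULT REPLICATE('*', 30)
);

-- ...so the later update overwrites in place instead of growing the row
UPDATE dbo.Orders_Hypothetical
SET tracking_no = '1Z999AA10123456784'
WHERE order_id = 1;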
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 31, 2013 at 6:12 am
"Forwarding pointers" only if we're talking about a heaps, else a page split. Neither is good though. If you have a column that will initially be empty, even something wider than a few bytes, but will certainly be provided later then a CHAR may make sense to avoid having the data exceed the size of the page on update.
I heard a viable theory recently that said that IBM settled on CHAR for all data types because they also sold the disk drives that the databases ran on 😉
In theory CHAR would perform slightly better than a VARCHAR because the length is known ahead of time, so retrieving the value is as simple as an fseek (drawing from my C++ days) with a known offset, whereas retrieving a VARCHAR requires some other inference or derivation from the schema, or at worst reading one byte at a time until the end-of-VARCHAR byte is reached. That said, modern computing has supposedly closed the gap via optimizations for handling VARCHAR, so the differences are now negligible and can effectively be cast aside as a performance consideration, in theory, of course.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
January 31, 2013 at 6:25 am
opc.three (1/31/2013)
In theory CHAR would perform slightly better than a VARCHAR because the length is known ahead of time, so retrieving the value is as simple as an fseek (drawing from my C++ days) with a known offset, whereas retrieving a VARCHAR requires some other inference or derivation from the schema, or at worst reading one byte at a time until the end-of-VARCHAR byte is reached. That said, modern computing has supposedly closed the gap via optimizations for handling VARCHAR, so the differences are now negligible and can effectively be cast aside as a performance consideration, in theory, of course.
Very nicely said 🙂
~ Lokesh Vij
Link to my Blog Post --> www.SQLPathy.com
Follow me @Twitter
February 4, 2013 at 8:06 am
The considerations for char and varchar on SQL Server and DB2 are much the same.
Many (over ten) years ago, DB2 moved to an inherently variable-length row format, just as SQL Server has. Prior to that change, char was faster to process than varchar; since then it has been hard to discern any difference.
If the data has a fixed length, then char avoids a tiny amount of processing when returning the row to the caller, but this is all about setting the length attribute for the field. If the output field is of variable length, then the length has to be set regardless of whether the table column is char or varchar.
On insert or update, there is probably no difference between the work needed for char and varchar. Rows in DB2 and SQL Server are formatted in a similar way, with a row header and an offset array that records where each variable-length column's data sits. You could not tell by looking at the data on disk whether a given column was char or varchar; it is only when the table definition is merged with the offset list that the DBMS knows which it has.
This does not apply to all database systems. Historically, Oracle deals only in varchar data and treats a zero-length varchar as a Null. Both SQL Server and DB2 treat a zero-length varchar as an empty string and keep a separate indicator to identify whether the value is a Null.
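That last difference is easy to check on SQL Server (on Oracle the same '' would come back as a Null):

DECLARE @s VARCHAR(10) = '';

SELECT CASE WHEN @s IS NULL THEN 'Null' ELSE 'empty string' END AS what_it_is, -- 'empty string'
       DATALENGTH(@s) AS bytes;                                               -- 0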
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
February 4, 2013 at 10:32 am
opc.three (1/31/2013)
"Forwarding pointers" only if we're talking about a heaps, else a page split. Neither is good though. If you have a column that will initially be empty, even something wider than a few bytes, but will certainly be provided later then a CHAR may make sense to avoid having the data exceed the size of the page on update.I heard a viable theory recently that said that IBM settled on CHAR for all data types because they also sold the disk drives that the databases ran on 😉
In theory CHAR would perform slightly better than a VARCHAR because the length is known ahead of time so retrieving the value is as simple as fseek (drawing from my C++ days) with a known offset whereas retrieving a VARCHAR would require some other inference or derivation from the schema or at worst reading one byte at a time until the 'end of VARCHAR'-byte was reached. That said, modern computing has supposedly closed the gap via optimizations for handling VARCHAR so the differences are now negligible and can effectively be cast aside as a performance consideration, in theory, of course.
I think IBM did it on the AS/400 because the underlying system was a flat file system with fixed-length records. That is, IBM just put a relational access layer on top of a non-relational, traditional file system. That was in the mid-1990s. I guess by now they've changed it, although I think many people simply left their AS/400 systems with fixed char(nn) columns.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".