June 7, 2004 at 12:52 pm
It's been just a month since I joined my current project...
I have noticed that most of the tables in the Databases have character data defined as 'CHAR' instead of 'VARCHAR'. We have one particular table with over 58 columns with over 5 million records have field lengths like char(25), char(50) etc. We are at the point where we have to alter some of the columns in the table. Was wondering if it would make logical sense to take this time to change the datatype from CHAR to VARCHAR (almost all the of the char fields are mainly descriptive in nature). Apart from the space gain that will occur, is there a performance gain that could happen with the change. There are queries based on these char fields, will they improve? Is it worth making the change? We have some slow running queries off this table which need tuning (I am working on it..).
Please advise...Thanxs
FYI: We have SQL Server 2000, all latest servicepacks
June 7, 2004 at 1:43 pm
First if you are near the max length on most of your fields you could actually loose space because varchar incurrs and extra 2 bytes to manage the offset for non-null values. On top of that if you do have nulled columns but have varchars after the columns in other columns that are not null the 2 bytes are still used sorta as a place holder. And if you have several columns before it that are all null and there would be no non-nulled columns after this then each of those records their 2 bytes for the offset. But again that is if you are near the max length for the columns.
Second consider shortening the char columns to the largest length you actually use to gain space back nd get rid of immediate waste. If you have some like this then reconsider my first comment afterwards.
Third and I am still working to confirm this, but in comparing strings especially for orders SQL Server may actually pad the varchars with blanks to have a fixed length to do an easier compare. If this is the case then that adds overhead to the job of doing such tasks and in such cases CHARs would be better. Any MS folks who know for sure please jump in as I only read this in a book based on several different dbs and didn't say anything specific.
Now if the key is to conserve space consider INT columns and fixed length columns you may currently have and see if they truely need to be their current length or implementation. I have seen people use INTs for columns whose only values would ever be 1,2,3,or 4. Using a TINYINT instead would easily meet those needs with 1 byte of storage versus 4 for INT, so with 5 million rows they will save 15 million bytes or roughly 14.3 MBs of storage in just one column, and depending on the width of the entire row make gain enough space multiple coluns to cause extra rows to fit in each page saving pages some of the slack space (the difference between the maximum allowed space and what can fit physically in it).
Also like I said with varchars you have to be carefully for things like put all your non-nullable columns first, and that will always reach their max or at least 2 bytes from should be made into chars and consider percentage of condition. In storage all fixed length columns are stored first then each varchars offset and finally the data for any varchar filled that has actual data. Remember a tables column order unless you program based on order (which is considered poor) is unimportant. Even then a view can replace handle for the program as such setting a specific controlled order so the base table can be better optimized.
June 7, 2004 at 1:57 pm
Thanks Antares686.
We do not have a space crunch issue. My focus is to improve performance and just wanted to see if a change from char to varchar would improve performance of queries. From your reply, I gather not to expect a big gain in any form.
Any one else with a different view?????? Any general recommendation on when to define a field as CHAR and VARCHAR (from performance impact stand point...)
June 8, 2004 at 12:30 am
Antares686 sead it all
- Figure out the average actual datalength of you char columns.
SELECT avg(datalength( rtrim([yourcol]) )) as yourcol_AVG_datalength
, Max(datalength( rtrim([yourcol]) )) as yourcol_Max_datalength
- are those columns indexed ? Also consider the overhead for that (in both ways).
- only use float if you realy have to.
Correct datatyping saves a bunch of troubles ! (now and in the future)
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply