June 7, 2005 at 2:32 pm
I am playing with some data I have and working on restricting it as small as possible.
I just discovered something that didn't occurr to me but it might have to some.
Taking the started US Postal zip code nnnnn-nnnn (where n = a number) what is the datatype (or combination of more than 1) you can cram this into?
Note: I don't want any compression schema used just straight data in and out of a datatype or two in one or two fields.
This is just for fun in case there is something elase I didn't see and to see if anyone has every thought of the way I just discovered.
June 7, 2005 at 2:37 pm
It's the first time I see us zip codes of more than 5 numbers. Is this a common occurance?
If not maybe you could ship that in its own colum using a smallint (2 bytes instead of 3 bytes for chars), Then you could shrink the current zip code column to char(5) or int(4), therefore saving at least 2-3 bytes per line.
Just my 2 cents.
June 7, 2005 at 4:46 pm
If I am understanding the requirements correctly then a DECIMAL(9) seems efficient, needing 5 bytes storage. But it of course means there will be conversion needed when working with it, but that is the way I understand the reqs.
June 7, 2005 at 11:22 pm
I prefer varchar(10) over 2 int or 1 int and 1 smallint, due to leading zeros in the Zip Code and Plus 4 portions, otherwise when you output two numbers you will have to:
SELECT NULLIF(RIGHT('00000'+CONVERT(varchar,Zip_int),5),'00000')+ISNULL('-'+RIGHT('0000'+CONVERT(varchar,Plus4_smallint),4),'')
Instead of:
SELECT ZipCode
With varchar(10) you do not waste database storage with 5 spaces for a char(10) when you only have the Zip code portion of the value.
A plus for varchar(10) over int, is the support for the Canadian and UK PostalCode, which use alpha numeric characters.
Andy
June 7, 2005 at 11:42 pm
This shows the space usage for a table with only the Zip code:
CREATE TABLE Zip_1_Num (Zip decimal(9) NULL)
CREATE TABLE Zip_2_Num (Zip int NULL, Plus4 smallint NULL)
CREATE TABLE Zip_varchar (Zip varchar(10) NULL)
TableName NumCols RowSize
------------ ----------- --------
Zip_1_Num 1 12
Zip_2_Num 2 13
Zip_varchar 1 20
So for strictly space DECIMAL wins, but try and ORDER BY and you will get a surprise, plus how to deal with a NJ Zip+4:
Decimal Int Smallint Varchar
--------- ----- --------- ----------
80080234 8008 234 08008-0234
Andy
June 8, 2005 at 8:39 am
Remi,
This zip code format is referred to as zip plus 4. All us addresses have a zip + 4 code but it is only commonly used in commercial applications. Its purpose it to further narrow down the destination of the mail and may help to reduce bulk mailing costs.
I couldn't even tell you what my home address extra 4 digits are without looking at some commercial mail sent to me.
June 8, 2005 at 8:42 am
Thanx for the info.
So what's the final (shortest) datatype for this problem?
Anyones already done this in a real application and can relate the pros and cons of changing from the varchar(10) datatype?
June 8, 2005 at 10:52 am
INT is smallest, but generally not recommended because of the manipulation. INT is +- 2 billion, roughly, plenty for 9 digits. DECIMAL(9) is 5 bytes. I see no advantage to that over INT, though I suppose you could store the +4 as decimals and extract the ZIP5 with FLOOR(). I'd recommend CHAR(9) or CHAR(10), unless you have a really good reason for saving space. The following works well for measuring row size.
It looks like SQL 2000 requires a minimum of 11 bytes for a row (smallint, two tinyints, or 8 bit fields, or a null varchar()). Note that bit fields take two bits each, because of the NULL bit, even if NOT NULL. It seems VARCHAR has 4 bytes of overhead for the length. VARCHAR(9) with 5 digits stored and CHAR(9) take the same amount of space.
I wrote a script to measure this using dynamic SQL:
use tempdb
go
declare @datatype varchar(20)
declare dtcurs cursor fast_forward for
SELECT datatype = 'INT'
UNION ALL SELECT 'DEC(9)'
UNION ALL SELECT 'CHAR(5)'
UNION ALL SELECT 'CHAR(9)'
UNION ALL SELECT 'VARCHAR(9)'
open dtcurs
fetch next FROM dtcurs into @datatype
while @@fetch_status = 0
begin
if object_id('tv') is not null
drop table tv
if object_id('tv2') is not null
drop table tv2
exec ('create table tv (x ' + @datatype + ')')
set nocount on
declare @i int set @i=0
while @i < 32
begin
insert tv(x) select 98105
set @i=@i+1
end
select tv1.* into tv2 from tv tv1, tv tv2, tv tv3, tv tv4
--exec sp_spaceused tv2
declare @id int
set @id = object_id('tv2')
declare @output varchar(80)
select @output = left(@datatype + space(15), 15) + ' takes '
+ str( ( ( select sum(dpages) from sysindexes
where indid < 2 and id = @id)
+( select isnull(sum(used), 0) from sysindexes
where indid = 255 and id = @id )
) * d.low / 1048576., 7, 2) + ' bytes'
from master.dbo.spt_values d
where d.number = 1 and d.type = 'E'
print @output
fetch next FROM dtcurs into @datatype
end
close dtcurs
deallocate dtcurs
go
set nocount off
go
--OUTPUT (where data = 98105):
INT takes 13.18 bytes
DEC(9) takes 14.19 bytes
CHAR(5) takes 14.19 bytes
CHAR(9) takes 18.26 bytes
VARCHAR(9) takes 18.26 bytes
Measured somewhat more manually:
--varchar(9) 11.14 ''
--char(5) 14.18
--char(9) 18.25
--varchar(9) 18.26 '98105'
--char(10) 19.24
--dec(9) 14.19
--int 13.20
Some additional:
--tinyint 11.14 --set to 1
--2 tinyints 11.14
--3 tinyints 12.16
--4 tinyints 13.18
--5 tinyints 14.19
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply