NULL vs. Empty String

Question

Post reply

NULL vs. Empty String

vijju2k

SSCommitted

Points: 1531
More actions
October 20, 2003 at 11:03 am

#101723

Hi all,
I was wonderning what is a good entry for a database record if it has to be left blank (ie, no data is entered). Is NULL better than Empty String or vice versa.
I apologize if this is a redundant posting.
thank you all in advance,
V

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

SQLBill SSC Guru Points: 51440 More actions · Answer 1

My answer is that it depends.

Basically, if you mean using a 'space' for an empty string, then NULL takes up less space.

But if you are using a query such as this:

SELECT col1, col2

FROM mytable

WHERE col1 = @myvariable

Then NULL will probably cause problems. To use NULL in the search you would have to change the query to WHERE col1 IS NULL. So you couldn't use the variable. But a space could be entered as:

SET @myvariable = ' '

and you wouldn't have to make any changes to the WHERE.

-SQLBill

vijju2k SSCommitted Points: 1531 More actions · Answer 2

My thinking while writing this question was, if we discuss from a purely database maintainence (and/or performance) perspective, which is better?

Jonathan SSC-Insane Points: 20427 More actions · Answer 3

My rule is to only make a column nullable if it truly needs to hold "unknown" values. Nullable columns will force you to use ternary predicates with NULL logic that work correctly for the actual "unknown" meaning of NULL. E.g., if you create a column for "Address Line 2", there may be many tuples where an empty string means "known to be blank," which is logically very different from "unknown."

If you don't need NULL logic then don't use nullable columns.

--Jonathan

Frank Kalis SSC Guru Points: 111183 More actions · Answer 4

Well, as has been said above, it depends...

you might take a look at

http://www.firstsql.com/inulls.htm

http://www.sqlservercentral.com/columnists/jtravis/understandingthedifferencebetweenisnull.asp

http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

to get further information on NULL

Frank

http://www.insidesql.de

Well, actually no real need to read my last link. Might actually lead to confusion . But nice to know anyway.

Edited by - Frank Kalis on 10/21/2003 01:12:18 AM

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Johan Bijnens SSC Guru Points: 135662 More actions · Answer 5

Keep it simple and stupid.

Follow Jonathan's guidelines.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data and code to get the best help

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Who am I ? Sometimes this is me but most of the time this is me

Frank Kalis SSC Guru Points: 111183 More actions · Answer 6

quote:
Keep it simple and stupid.

are we talking about management ???

Frank

http://www.insidesql.de

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

cneuhold Hall of Fame Points: 3375 More actions · Answer 7

hi!

try to avoid empty strings, if you're up to creating "clean" code on whatever client side you're using. stumbling over a NULL from time to time will make your developers remember, that ther *is* something like NULL and that they'll have to check for it!

otherwise they'll assume that empty means just "empty string" which will ease programming but make your applications do the funniest things, once some NULLs arise.

best regards,

chris.

NPeeters SSChampion Points: 12654 More actions · Answer 8

Have to agree with Jonathan. Null can have an entirely different meaning from an empty string.

Just think about an integer column. In that case a NULL is completely different from a zero value.

For me every column that is not required has to be nullable, and a null value indicates that it has not been entered yet.

lambje SSCrazy Points: 2085 More actions · Answer 9

Definitely NULL. I would never even consider using a blank. In fact, it drives me nuts when a front end tool inserts a blank instead of a null in a nullable column. Because of this you should always query with IsNull(nullcol, '') = ''.

Antares686 SSC Guru Points: 125444 More actions · Answer 10

First off I will state I disagree with CJ Date and the others who think NULL is a waste in SQL. NULL does not apply only to SQL but to most all languages and in fact came from the others as far as concept.

NULL is the abstract expression of the unknown.

Without the abstract value of NULL how then do you define a completely unknown value. Some suggest defaults but give no clear idea as to how these defaults solve the NULL problem as they will still represent the unknown.

Now consider this when a datatype is fixed length such as an int or char how is the data express in the database when it does exist.

Here is the DBCC PAGE output from a table I created tx with column d int, c char(10), and a int.

First off ignore the 00160010 at the begining of each hex set as it has special meaning but not important here.

Note: int is 4 bytes char(10) is 10 and int is 4 for a total of 18 bytes

In Slot 0 (record 1 on the page) the space from the [ and ] is 20 but the 0003 in the last set can be ignored because it has to do with end of data so subtract 2 (HEX is base on 00 thru FF as 1 byte and we are removing 2 sets).

Thus we have 18 total bytes. Also note the order of bytes is funny in that 00000001 is actually read 00010000 and so on to read out the rows in this view.

436B4060: 00160010 [00000001 00000000 00000000 ................

436B4070: 00000000 00030000] 06 .........

Now none of that is key what is key is the 06 at the very end seperate from the rest in this case.

It is important in the fact that columns 2 and 3 are NULL in the first example. Now as you see it is 6 but the key is it is a bitmask. Column 1 = 1, 2 = 2, 3 = 4, 4 = 8 and so on up to 1,024 columns

So 2 + 4 = 6 the bitmask tells me column 2 and 3 are NULL as this is the NULL bitmask.



Slot 0 Offset 0x60

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

436B4060:  00160010  00000001  00000000  00000000 ................

436B4070:  00000000  00030000        06           .........

d                                = 1                

c                                = [NULL]           

a                                = [NULL]           



Slot 1 Offset 0x79

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

436B4079:  00160010  00000002  00000000  00000000 ................

436B4089:  00000000  00030000        06           .........

d                                = 2                

c                                = [NULL]           

a                                = [NULL]           



Slot 2 Offset 0x92

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

436B4092:  00160010  00000003  2020207a  20202020 ........z       

436B40A2:  00002020  00030000        04             .......

d                                = 3                

c                                = z                

a                                = [NULL]           



Slot 3 Offset 0xab

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

436B40AB:  00160010  00000004  2020207a  20202020 ........z       

436B40BB:  00012020  00030000        00             .......

d                                = 4                

c                                = z                

a                                = 1

Ok so under the hood the data is still stored but with 0 and a bitmask to tell the engine which columns are NULL.

Now from there what happens with varchars. It is a bit different.

The table was defined as ty with column a int, b varchar(5), c tinyint, and d char(8).

Here is the DBCC PAGE output



Slot 0 Offset 0x60

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

43808060:  00110010  00000001  00000000  00000000 ................

43808070:  0e000400                               ....

a                                = 1                

b                                = [NULL]           

c                                = [NULL]           

d                                = [NULL]           



Slot 1 Offset 0x74

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

43808074:  00110010  00000001  00000002  00000000 ................

43808084:  0a000400                               ....

a                                = 1                

b                                = [NULL]           

c                                = 2                

d                                = [NULL]           



Slot 2 Offset 0x88

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP                    

43808088:  00110010  00000001  61616102  20202061 .........aaaa   

43808098:  02000420                                ...

a                                = 1                

b                                = [NULL]           

c                                = 2                

d                                = aaaa             



Slot 3 Offset 0x9c

------------------

Record Type = PRIMARY_RECORD                        

Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS   

4380809C:  00110030  00000001  61616102  20202061 0........aaaa   

438080AC:  00000420  00190001        63            .......c

a                                = 1                

b                                = c                

c                                = 2                

d                                = aaaa

Now there are a few differences here that make using NULLable varchars good and tell us a bit about the engine.

In Slot 0 we notice we are 5 bytes short of what a usual record looks like when all fixed length, this is because nothing is written for the varchar field when not filled.

But we see the fist byte on the second row has 0e00 whih is what we are looking for here. What has happened is the engine bitmasks the fact 2,3, and 4 are NULL (in other words 2+4+8=14 or 0E in HEX) so the engine uses the NULL marker even for varchar fields.

Proceed to SLOT 2 and SLOT 3 and you see the respective values for NULL bitmask 0a (columsn 2 and 4) and 02 (column 2 only).

Now the fun comes when we add data to the varchar our row grows by several bytes but not neccessarily equal to the 5 for the varchar. What happens here is there is a marker added to tell the engine where the varchar data for the first non-fixed-length column begins and takes up 2 bytes itself (this is the offset from begining of record). a new one is added for each varchar column that is put into play and all come before the varchar data for all columns is written (or any non-fixed-length datatype).

The key thou is the bitmask for columns is used under all cirumstances to note when there are NULL values.

So in the case of a non-fixed-length column that may have NULLs using a default instead is a waste of space which is a commoditity best served otherwise.

Also, if you are going to use a Non-NULL value how do you deal with it more effectively (or at least as effectively) compared to the NULL bitmask. You still have to dream up something that will add the overhead for NULL to be dealt with but you will rely on the developer coding a NULL or UNKNOWN value process which opens the door for many errors with a strict guideline.

My opinion is use NULL when possible but in some situations it may make more sense to use a default for no data such as '' because it may be easier to deal with user input in char values. Just be carefull that good records do not have a overlap issue such a $0 != $0 becuase one is unknown and oen represents no value. As for non-fixed-length datatypes thou, it is still up to you, but keep in mind you save space with NULL.

I hope all that helps a bit or leads you to some understanding of NULL and it's handling but as for the whole NULLs are not valueable I am afraid that is not the cause becuase you will always run into the need to express an unknown value as opposed to a non-existant value.

Antares686 SSC Guru Points: 125444 More actions · Answer 11

Oh 1 more not I would suggest if you have alots of repeating values or if the NULLable column will be NULL often then consider splitting the table and using a 1 to 1 relationship. This will still produce NULLs when joined but can be bennificial, especially with fixed length datatypes, in saving space.

PBirch SSCrazy Eights Points: 8493 More actions · Answer 12

I prefer defaults whenever possible. In some columns (such as Termination Date in an Employees table) NULL is the logical choice.

It depends.

Dr. Peter Venkman: Generally you don't see that kind of behavior in a major appliance.

Patrick

Quand on parle du loup, on en voit la queue

Jonathan SSC-Insane Points: 20427 More actions · Answer 13

quote:
I prefer defaults whenever possible. In some columns (such as Termination Date in an Employees table) NULL is the logical choice.
It depends.

That's interesting, as I would use something like 99991231 if they're not terminated. The termination date is not "unknown" in the sense that they may or may not have been terminated, and this allows normal logic and (perhaps) making the column NOT NULL, although it does preclude using the IS NOT NULL logic as a shortcut for "not terminated." Personally I think this "logic" is misguided: you may save a tiny bit of space by making the TerminationDate do double duty, but you have limited the information you can store. E.g., if you use a separate flag column for Terminated, you will be able to tell if someone is terminated whether or not their termination date has been set. (Why am I getting a mental picture of the Governor-Elect of California?)

The nulls vs no nulls debate has been raging since Date and Codd. We're not going to settle it in this forum, although it's interesting to read some of these opinions and reasoning.

quote:
My opinion is use NULL when possible...

I guess I disagree with James ("Antares686") on this; I am loathe to use nullable columns and create them only when I need the "unknown" logic. I certainly won't allow NULL to be used when the value is instead empty. E.g. I might make a City column in an address table nullable and not allow blanks, but I would make the Name column not null and not allow blanks. I might make the PhoneNumber column nullable and allow blanks (not everyone has a phone). That way one could query for entities with no phones or for entities where we don't know their phone numbers. A problem with allowing both blanks and nulls, as I wrote in another thread, is making clear the distinction in the data-entry front-end.

--Jonathan

Frank Kalis SSC Guru Points: 111183 More actions · Answer 14

quote:
That's interesting, as I would use something like 99991231 if they're not terminated. The termination date is not "unknown" in the sense that they may or may not have been terminated, and this allows normal logic and (perhaps) making the column NOT NULL, although it does preclude using the IS NOT NULL logic as a shortcut for "not terminated." Personally I think this "logic" is misguided: you may save a tiny bit of space by making the TerminationDate do double duty, but you have limited the information you can store. E.g., if you use a separate flag column for Terminated, you will be able to tell if someone is terminated whether or not their termination date has been set. (Why am I getting a mental picture of the Governor-Elect of California?)

uff, and I thought while reading Patricks' post, the programmers of our human resources system ?!? are completely wrong using a termination date of 12/31/9999.

Now that Jonathan mentioned the Californian Governor Election I may ask the question if it advisable to keep calm with critic of the government in California?

Do you have to fear hearing 'Hasta la vista, baby' and getting terminated

quote:
The nulls vs no nulls debate has been raging since Date and Codd. We're not going to settle it in this forum, although it's interesting to read some of these opinions and reasoning.

NULLs have been, are and I guess will always be an integral part of RDBMS, so help me Codd (#3).

Frank

http://www.insidesql.de

Edited by - Frank Kalis on 10/22/2003 12:29:16 AM

Edited by - Frank Kalis on 10/22/2003 12:31:20 AM

--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]