Normalization

Question

Post reply

Normalization

Viewing 15 posts - 46 through 60 (of 144 total)

You must be logged in to reply to this topic. Login to reply

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 1

Steven

Please read up on relational database design.

Thank you,

Welsh

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 2

Someone, please show me an Relational EDR where the Address1 and Addess2 Lines are separated into two tables.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 3

Holy Spam, Welsh. Think it through, read it over, make sure it contains what you want to say, then hit post.. or even Edit. Postcount doesn't mean much here, and I'd like to avoid 14 emails for updates to a thread when you only had one or two things to say.

You're mistaking the theoretical normalization with the engines that have optimizers to work against normalization. The data is the purpose, the engine optimizes to read it. Depending on the expected format of the data will depend on the expected optimizations in the engine. The theory and method of optimization does not negate normal form methods, they merely get entangled with which are good on which engines.

As to your request for two tables for address, that will happen at higher normal forms so that nulls and the like only appear during reporting, saving the storage and avoiding nulls, a requirement for 5NF. Another method would be an address table with keys in address1 and address2, but then you still need a NULL (or a 0 for a blank), doing two unique lookups to the address table, which if memory serves would fall under 4NF but I'd have to re-research. Been a while since I cared about anything above 3.

Optimized in SQLServer? No. Correct to certain levels of normal form? Yes. Depending on the methodology you want for NULL usage and avoidance and repetitive data storage. Where that latter theory really gets interesting is the whole single point of storage entry. For example: Address2: Apt 101. When that person changes apartments to 103, do you edit the looked up address2 ID item which might invalidate others using it, or do you seek and correct to Apt 103? Things like that are where higher normal forms can become problematic, especially from an upkeep scenario.

These are items where the engine and optimization interfere with data theory. You can certainly pound on optimization in an engine all day long, and around here we do it a lot. That has little to do with the theoretical discussion of methodology of storage, which I believe was the point Steven was trying to make.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Jeff Moden SSC Guru Points: 1004454 More actions · Answer 4

Welsh Corgi (4/23/2011)
Jeff,
I have no issues with your address solution but it requires an extra join and it can degrade performance.
If the objective is perfect normalization then go with an extra table but if you want to potentially avoid derogation of performance use an address1 & address 2 lines.
The address 2 line is usually null.
Personally I would not create a second address line for the same type of address unless it improved the execution plan.
Regards,
Welsh

Gosh, I suppose but I've simply not experienced the performance problems of which you speak.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 5

I'm was wrong what I stated and I apologize.

I should no better.

I regret what I said and that I offended anyone.

I'm sorry for the spam.

I was watching my home town team get massacred in the playoffs but that is not an excuse.

I was not trying to run up the points and I would request that they be taken away.

Respectfully,

Welsh

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

TomThomson SSC Guru Points: 104773 More actions · Answer 6

Welsh Corgi (4/23/2011)
I do not mean to be sarcastic but how many people have deployed a relational database beyond 3rd normal for? I could see 1 up from that but 3rd normal form but 6th normal form?
What kind of performance do you get?
How did your OLTP perform?

I aim at EKNF (which is somewhere between 3rd normal form and BCNF), and if there are any MFD or Decomposition anomalies left I decide whether to normalise further (and introduce representation failures that have to be coded around) or not normalise further (and have to code around the problems introduced by the MFDs and Decomposition anomalies).

But almost anything in EKNF is in 5NF because there are no MFDs or decomposition anomalies (in those rare cases where there are, I suspect that there has been some misunderstanding of the part of the real world that the database is to model and review that, because tearing up a bad model and starting again is probably going to be better than living with either decomposition anomalies or representation failures) so of course I usually end up with 5NF without having to normalise beyond EKNF.

Indeed almost anything in 3NF is in 5NF! So I suspect that a lot of people have deployed databases "beyond" 3NF, sinply because the process of normalising as far as 3NF resulted in a 5NF schema.

This of course relies on using the original definitions of the normal forms, not the perverted versions with an explicit forbidding of NULL added long after the definition was originally published and accepted as standard. Except of course in the case of 6NF, where it depends whether you mean this 6NF or that 6NF - there are two completely different versions and one of them did indeed prohibit NULL in its original form.

If you forbid NULL then any normalisation is fraught with dangers since it is neccessary to introduce lots of extra tables and extra entity types to cater for an attribute being known or not and find some way of coding around the fact that the result of an outer join is not a relation (first step: find some way of expressing the result of an outer join; note that this can't be done in SQL modified to exlude NULL; in RA it can be treated as a set of relations; or alternatively it can be treated as the union of the members of that set, ie the union of several relations with different signatures).

Incidentally, I would normally have a single address table with billing addresses, delivery addresses, and correspondance addresses in it. They are NOT different sorts of address, they are just different uses made of the same sort of address (often of the same address). Each address can be associated with a particular "person" (whether a human person or a corporation) for a particluar purpose - in the simplest case three attributes of the "person" are foreign keys into the address table, in the case when I want to record several billing addresses and/or several correspondance addresses or several delivery addresses for a "person" I will use a (single) association table to indicate wheich addresses are associated whith which "person" for which function and which are defaults and which are extras. The shipment table will of course have a foreign key into the same address table (or often into the order table, which has that foreign key, but I've seen systems where an individual order contains items to be shipped to several different addresses) to get an address for shipment. And of course my schema won't need any NULLs in the foreign keys. Within the address table I'm happy to work with a lot of attributes; for many attributes (such as appartment designator ['Flat 12', 'Appt 2C', '1st Floor Red'] or building name) there's a perfectly acceptable default (zero length string) that won't cause problems. For address attributes that go onto a separate line if they exist it may be better to use NULL or to have anseparate indicator column (a single bit) for each that indicates whether the item should or should not be present (in which case any value in teh domain is an acceptable default) - addresses (even if I have to cover most of the world, so lots of different address styles) will have few enough attributes that indicator columns are not going to be a big problem if the presentation layer prefers those to NULLs; but there is no imaginable way I'm going to have two tables to cover adresses with and without address_line_1, two tables to cover addresses with and without province, and so on (ending up with maybe 32 tables) because now I have to have either oodles of association tables to associate persons with addresses or oodles of person tables (for the case where there is specified that each person can have only one address for each of the three functions - ending up with 32768 "person" tables if I ended up with 32 address tables). Some people claim this is a normalisation issue - but I believe that it is an issue caused by an irrational prejudice agains NULLs and a rational fear of indicator columns (if you have too many performance can suffer). Not that my schema will be 5NF without splitting the address table into several tables - of course if it uses NULLs instead of indicator columns that has to be Fagin's original definition of 5NF, not the anti-null brigade's definition, but if it uses indicator columns then it's what even the anti-NULL brigade would call 5NF.

Tom

Steven993 Mr or Mrs. 500 Points: 588 More actions · Answer 7

Tom.Thomson (4/24/2011)
This of course relies on using the original definitions of the normal forms, not the perverted versions with an explicit forbidding of NULL added long after the definition was originally published and accepted as standard

"perverted" ? Why ? :hehe:

Tom.Thomson (4/24/2011)
If you forbid NULL then any normalisation is fraught with dangers since it is neccessary to introduce lots of extra tables and extra entity types to cater for an attribute being known or not

That's a danger ? Psychological danger maybe. Having a big number of tables is not a problem for me.

Database integrity, predictability of query result, table meaning is more important than table number...

And we always have views to hide that "complexity" to users.

Tom.Thomson (4/24/2011)
and find some way of coding around the fact that the result of an outer join is not a relation

The fact is that SQL allow NULL. That's a big mistake. We have to deal with it and be aware of all the problems it can bring in our databases.

Tom.Thomson (4/24/2011)
(first step: find some way of expressing the result of an outer join; note that this can't be done in SQL modified to exlude NULL; in RA it can be treated as a set of relations; or alternatively it can be treated as the union of the members of that set, ie the union of several relations with different signatures).

What do you mean ? What do you mean by "different signatures".

If I have different tables, and I need to present a result to the user, with possibly missing values for some columns, I'll use SQL's LEFT JOIN operator and use functions like COALESCE to explicitly represent to the user "unknown", "no value", etc..

I make a view from that query, so the user don't have to build query himself and fight the NULL monster.

By the Way, LEFT JOIN operator is not a primitive one.

It can be expressed with UNION, NATURAL JOIN and EXISTS operators.

Example:

I used MySQL, because unlink SQL Server it allow me to use NATURAL JOIN, which I like a lot.

create table news (

news_id integer primary key,

title varchar(20)

) Engine=InnoDB;

insert into news values (1, '1st news');

insert into news values (2, '2nd news');

insert into news values (3, '3rd news');

create table commentary (

com_id integer primary key,

com_text text,

news_id integer,

constraint fk_com_news foreign key (com_id)

references news (news_id)

) Engine=InnoDB;

insert into commentary values (1, 'com_1', 2);

insert into commentary values (2, 'com_2', 3);

insert into commentary values (3, 'com_3', 3);

I want to list all the "news", with for each the number of commentaries.

With LEFT JOIN :

SELECT news_id, title, Count(com_id) AS comNb

FROM news

NATURAL LEFT JOIN commentary

GROUP BY news_id

ORDER BY comNb DESC

Without LEFT JOIN :

SELECT news_id, title, Count(*) AS comNb

FROM news

NATURAL JOIN commentary

GROUP BY news_id

UNION

SELECT news_id, title, 0

FROM news AS n1

WHERE NOT EXISTS (

SELECT *

FROM news AS n2

NATURAL JOIN commentary

WHERE n2.news_id = n1.news_id)

ORDER BY comNb DESC

Same result :

news_id title comNb

3 3rd news 2

2 2nd news 1

1 1st news 0

TomThomson SSC Guru Points: 104773 More actions · Answer 8

Steven993 (4/24/2011)
Tom.Thomson (4/24/2011)
This of course relies on using the original definitions of the normal forms, not the perverted versions with an explicit forbidding of NULL added long after the definition was originally published and accepted as standard
"perverted" ? Why ? :hehe:

Perhaps because Dace had to pervert them to get them to fit his strange world view? :hehe::-P

Tom.Thomson (4/24/2011)
If you forbid NULL then any normalisation is fraught with dangers since it is neccessary to introduce lots of extra tables and extra entity types to cater for an attribute being known or not
That's a danger ? Psychological danger maybe. Having a big number of tables is not a problem for me.
Database integrity, predictability of query result, table meaning is more important than table number...
And we always have views to hide that "complexity" to users.

You think the danger is only psychological because you haven't seen how fast the number of tables can blow up. When I end up with hundreds of thousands of tables simply because I refuse to recognise that there is a simpler way of representing the fact that I don't have a value for something than to produce a new value for that case I am needlessly complicating and obfuscating the schema with an almost guaranteed increase in the bug rate and consequential reduction in reliability and correctness of query results, and possibly crippling performance because what should be trivial queries involve the unions of hundreds or even thousands of natural left joins (where the things being joined are in turn unions of many natural left joins).

Tom.Thomson (4/24/2011)
and find some way of coding around the fact that the result of an outer join is not a relation
The fact is that SQL allow NULL. That's a big mistake. We have to deal with it and be aware of all the problems it can bring in our databases.

If you allow null, the result of an outer join is a relation; if you don't, it isn't. The reason we have outer join is that it is an extremely useful operation that greatly simplifies many otherwise complex tasks. And of course the reason we have null is that when used properly (and I'm fully aware that it is very often badly misued - and SQL makes it quite hard to avoid misusing it) it greatly simplies what would otherwise be enourmous and complex schemata.

Tom.Thomson (4/24/2011)
(first step: find some way of expressing the result of an outer join; note that this can't be done in SQL modified to exlude NULL; in RA it can be treated as a set of relations; or alternatively it can be treated as the union of the members of that set, ie the union of several relations with different signatures).
What do you mean ? What do you mean by "different signatures".

I mean that their signatures are not identical. They don't all have the same attribute set. Some columns in one don't exist in some of the others.

If I have different tables, and I need to present a result to the user, with possibly missing values for some columns, I'll use SQL's LEFT JOIN operator and use functions like COALESCE to explicitly represent to the user "unknown", "no value", etc..
I make a view from that query, so the user don't have to build query himself and fight the NULL monster.

That appears to assume that you are doing presentation in SQL. Why do you want to do that? Why do you want to assume that SQL can be used to do the presentation? If you are passing data out to me for me to do some pretty complext mathematics on it before showing it to the user I can assure you that I would rather see a NULL to represent BOTTOM that some string you have chosen for me to waste space storing and time parsing.

By the Way, LEFT JOIN operator is not a primitive one.
It can be expressed with UNION, NATURAL JOIN and EXISTS operators.
Example:
I used MySQL, because unlink SQL Server it allow me to use NATURAL JOIN, which I like a lot.
create table news (
news_id integer primary key,
title varchar(20)
) Engine=InnoDB;
insert into news values (1, '1st news');
insert into news values (2, '2nd news');
insert into news values (3, '3rd news');
create table commentary (
com_id integer primary key,
com_text text,
news_id integer,
constraint fk_com_news foreign key (com_id)
references news (news_id)
) Engine=InnoDB;
insert into commentary values (1, 'com_1', 2);
insert into commentary values (2, 'com_2', 3);
insert into commentary values (3, 'com_3', 3);
I want to list all the "news", with for each the number of commentaries.
With LEFT JOIN :
SELECT news_id, title, Count(com_id) AS comNb
FROM news
NATURAL LEFT JOIN commentary
GROUP BY news_id
ORDER BY comNb DESC
Without LEFT JOIN :
SELECT news_id, title, Count(*) AS comNb
FROM news
NATURAL JOIN commentary
GROUP BY news_id
UNION
SELECT news_id, title, 0
FROM news AS n1
WHERE NOT EXISTS (
SELECT *
FROM news AS n2
NATURAL JOIN commentary
WHERE n2.news_id = n1.news_id)
ORDER BY comNb DESC
Same result :
news_id title comNb
3 3rd news 2
2 2nd news 1
1 1st news 0

It's very easy to do trivial cases, like this one. Now try one where some domain has 5 attributes that may or may not be present - that's 32 relations for that domain; and some other domain has to reference 3 entities of the first domain - that's 32768 relations for the second domain; and a third domain has to reference 2 entities in the second domain and 1 in the first - that's 33555432 relations for the third domain. So these three domains alone require 33587232 tables. Throw in a few more domains and the null-free world can suddenly become a lot bigger than a mere 33 million tables. How big and messy is your example going to become in such a real-world situation?

Tom

Steven993 Mr or Mrs. 500 Points: 588 More actions · Answer 9

Tom.Thomson (4/24/2011)
Perhaps because Dace had to pervert them to get them to fit his strange world view? :hehe::-P

I don't know Dace's work, but maybe you could explain it for me, and explain me why you disagree with it ?

Tom.Thomson (4/24/2011)
It's very easy to do trivial cases, like this one. Now try one where some domain has 5 attributes that may or may not be present - that's 32 relations for that domain.....

"some domain has 5 attributes" ? What do you mean by attribute and domain here ?

Where does that 32 comes from ? Please give an example.

TomThomson SSC Guru Points: 104773 More actions · Answer 10

Steven993 (4/24/2011)
Tom.Thomson (4/24/2011)
Perhaps because Dace had to pervert them to get them to fit his strange world view? :hehe::-P
I don't know Dace's work, but maybe you could explain it for me, and explain me why you disagree with it ?

I couldn't explain it; (a) there's too much of it to fit here and (b) I can't follow his reasoning about NULL values (he claimed at one point, for example, that if you introduced a NULL it was easy to prove that this had as a consequence that you had to introduce an infinite number of different sorts of NULL; the argument he presented to support this appeared to me to be ludicrous, complete arm-waving nonsense). He has also been know to claim that it is impossible to build any useful theory based on multi-valued logic, which any serious mathematician will tell you is utter nonsense (we may disagree amongst ourselves whether a useful mathematics can be built on 2-valued logic, but no-one believes that MVL is not useful). Also he claimed to disagree with Codd about atomicity, but I can't see where they differ. On pretty much everything except NULL (and MVL) I agree with him (and I believe that his position on everything except NULL and MVL is identical with Codd's anyway). If you look at Date's description of a debate he had on nulls at this url you will get a picture of how he argued about nulls (note that this is a revised version of an edited version or the report of the original debate - both the revision and the earlier editing done by Date, without Codd).

Tom.Thomson (4/24/2011)
It's very easy to do trivial cases, like this one. Now try one where some domain has 5 attributes that may or may not be present - that's 32 relations for that domain.....
"some domain has 5 attributes" ? What do you mean by attribute and domain here ?
Where does that 32 comes from ? Please give an example.

A domain is something I want to represent: I wanted to avoid saying "relation" because I know you don't accept the concept of a relation with nullable attributes (since you don't accept null); and I wanted to avoid "table" because I'm not talking SQL but relational theory. But probably either "relation" or "table" would be a better word.

I won't give you an example where the combinatorial explosion takes place, but instead one where it can be avoided, because it's an easier example to understand. Let's talk about addresses (again ;-)); possible attributes (taking account the address schemes of some of the countries where I've lived and worked) are

1)house or building name

2)community

3)Appartment or house number (internal)

4)PO Box

5)street address (including number on street)

6)district

7)town or city

8)municipality

9)municipality code

10)county

11)state

12)island

13)province

14)region

15)component country

16)post code

17)zip

18)country

My address here includes 2,3,4,5,7,8,9,12,13,18; my daughter's address includes 3,5,7,15,16,18; so you can see just from those two that there's quite a lot of variation in which attributes are used.

Depending on where you live, your home address may have values for any combination of attributes 1,2,3,4,6,8,10,12,15,16. So in a system with nulls each of those columns would be NULLABLE unless you could invent a safe default. Now suppose I can't invent a safe default for any of them: then the number of different combinations of attributes that can exist for addresses is 1024 (2 to the power 10) because there are 10 independently existing attributes. So if I can't have nullable attributes addresses are divided into 1024 sets, and the collection of attributes in any one set is different from the collection of attributes for any other set; so no two sets can be represented by the same relation, so there have to be 1024 relations. Of course it's ridiculous to say that I can't invent safe defaults for any of these attributes (a zero length string works for several), so the number is nowhere near 1024 in this particular case. But you can see how I got 32 from 5 nullable (because possibly absent or unknown) attributes. Note that most of the other attributes in an address can be absent or unknown too, the only ones guaranteed to be there and the street address and the country (I don't need town or city in countries like Luxembourg or Monte Carlo), but even so I don't think I would be forced to use NULL to avoid multiple tables in this particular case (because a zero length string is probably a safe default for every attribute, and my software can interpret that value as "not applicable"). However, as soon as I have (signed) integer or real attributes instead of text strings choosing a default can become very hard or even impossible, so the options are to use NULL, to introduce extra attribute whose sole function is to say "the value in that column over there is not a value becasue it doesn't have one in this row" (ie code NULLs by hand instead of using a built-in NULL provided by the system), or to allow the proliferation of tables to take place (and if you are familiar with combinatorial mathematics, you will realise how quickly this proliferation can spread through a schema and push the numbers up to extremely high levels).

So the question of whether NULL is needed essentially boils down to a different question: is it always possible to assign a default value to every attribute in such a way that using the default rather than admitting that you don't have a value is always safe? If it isn't possible, then you either permit NULL (whether a hand-rolled NULL or the one provided in you database system) or accept that there will be a combinatorial explosion. If it is possible, then you don't need NULL but you do need a default assignment algorithm.

My personal view is that it is not always possible to assign default values safely (they will make a mess of aggregation). It would be nice if this weren't so, but my experience tells me that it is. The Dace position is that it is always possible to choose safe defaults; my comment on that when he first published that view was "he must have led a sheltered life" - there, that gives you a little more idea of what he did and why I disagree.

Tom

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 11

I'm sure that you know that you can determine information from the FIPS county code so that you do not need to store a lot of information in one table:

http://en.wikipedia.org/wiki/Federal_Information_Processing_Standard"> http://en.wikipedia.org/wiki/Federal_Information_Processing_Standard

The same is true with the Zip Code Plus, enter the zip and relate it to parent tables.

I also used Country Codes and regions for an International Trade Database years ago, so there were a lot of tables involved but you only had to enter limited information and the fields on the form were auto populated.

Respectfully,

Welsh

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004454 More actions · Answer 12

Welsh Corgi (4/24/2011)
I'm sure that you know that you can determinte information from the FIPS county code so that you do not need to store a lot of information in one table:
http://en.wikipedia.org/wiki/Federal_Information_Processing_Standard"> http://en.wikipedia.org/wiki/Federal_Information_Processing_Standard
The same is true with the Zip Code Plus, enter the zip and relate it to parent tables.
Respectfully,
Welsh

How odd. I believe I recall that, when you and I breifly talked about addresses, you said the extra joins would cause a performance problem. Have you changed your opion on that?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 13

Jeff,

No offense but you can't get it down to the exact address but you can narrow down the information that you need to enter, City State by entering the Zip. When you enter the zip + it narrows it down to a geographical area.

The Postal Service or any other State or Federal Entity does not store address1 & address2 in separate tables.

Regards...

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Jeff Moden SSC Guru Points: 1004454 More actions · Answer 14

Duplicate post withdrawn.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steven993 Mr or Mrs. 500 Points: 588 More actions · Answer 15

@tom.Thomson

When you write Dace, I guess you mean Date.

I actually own almost every books Chris published. He wrote hundred of pages on NULL topic.

Maybe I'll come later and write something about the history of NULL in relational theory and why it's not allowed in relationland.

There is not only one reason to exclude NULL from relational theory, they are a lot.

Here is my favorite one, quickly:

As we can both agree, relational theory is a branch of applied mathematics, it mainly take it roots from set theory and predicate logic.

A relation header can be seen as a predicate. It is the meaning of the relation.

Take this relation header:

SUPPLIER{S#, SNAME, CITY}

The predicate of the relation can be read as follow:

The SUPPLIER S# is named SNAME and is located in CITY.

Each attribute of the header is a variable of the predicate.

What is a tuple ?

A tuple is what you get when you assign values to the predicate's variables. Tuples are truth propositions.

Here are some propositions of that relation :

The SUPPLIER S127 is named Steven and is located in Paris.

The SUPPLIER S876 is named Chris and is located in London.

What append if you introduce NULL ?

The SUPPLIER S671 is named ??? and is located in Madrid.

Is that a truth proposition ? Does it make sense ? No.

Concerning the address "problem".

This is a database design question.

It depends on the business rules you have to follow, to represent.

In France, the addresses are like that :

Steven Relational
16 rue de la Paix
75000 Paris

Name and FirstName.

Then one to 5 lines, containing whatever we want, street, house, region, door number, floor....

Then ZipCode and City

In databases we just want to save First name, Last name, ZipCode, City and the One to Five lines.

That's enough to send product to clients and bills to suppliers. No more details.

How to deal with the possible 5 address lines ?

I consider the address lines has multivalued properties of the Address.

Here is my database schema (primary key underlined, # for foreign keys):

Customer(cust_id, cust_lastName, cust_firstName)

Address(add_id, add_Name, add_zipCode, add_City, cust_id#)

AddressLine(add_id#, line_id, line_Content)

Then you can add the fact that a customer can have multiple address of different kind, etc.. no problem.

That's what I need 99% of the time.

If my business rules forces me to go deeper in details on address properties and manage 10, 18 or 50 properties like you proposes, I'll change my design and do something like that:

Address(add_id, add_Name, cust_id#)

Properties(pro_id, pro_Name) -- City, Street, ZipCode, Country, Floor, ....

AdressProperties(add_id#, pro_id#, ap_value)

I follow my business rules with simple design, I never see one NULL.

With good database design we can avoid NULL very easily.