August 16, 2006 at 4:49 am
Hi everybody,
Today one developer asked me if she should divide a table that is going be filled in a way that is different before. The table is used to be filled with non-null values on all columns but now it is going to be filled with null values on some rows except for the first two columns. The original and divided table pair structures as follows
CREATE
TABLE Original(ID
INT IDENTITY(1,1) NOT NULL,ColA VARCHAR
(10),ColB VARCHAR
(10),ColC VARCHAR
(10),ColD VARCHAR
(10),ColE VARCHAR
(10),ColF VARCHAR
(10),ColG VARCHAR
(10),ColH VARCHAR
(10),...)CREATE
TABLE Main(ID
INT IDENTITY(1,1) NOT NULL,ColA VARCHAR
(10),ColB VARCHAR
(10))CREATE
TABLE Detail(ID
INT NOT NULL,ColC VARCHAR
(10),ColD VARCHAR
(10),ColE VARCHAR
(10),ColF VARCHAR
(10),ColG VARCHAR
(10),ColH VARCHAR
(10),...)
Now the question comes: is there a ratio of the rows with null values to all rows in the table or the number of columns with null values to the number of all columns in a table to help you decide if the table should be divided into main-details form?
Thanks in advance
Zubeyir
August 16, 2006 at 8:17 am
This is a "it depends" answer. You're not splitting the table to normalise so all you're really doing is to add an extra join into what other joins you might have. Horizontal partitioning is a valid procedure but to be honest your table is really too narrow to really benefit.
I'd actually ask the question - what is it you expect to gain from splitting your table this way?
I'm not going to enter into arguments about nulls in columns as this isn't what you asked and diverts from your original question.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
August 16, 2006 at 10:06 am
Table schema reflects real world entities or relations. Without knowing the "real thing" the table represents in the real world, it's even hard to say the spliting is a nomalising process or not.
If all the columns in the table are attributes of one entity, and you are not exceeding the row length limit of sql server, i would not split it.
However, if the ratio (null rows/all rows ) you mentioned is over 0.5, it definitely can save spaces when u split it.
August 16, 2006 at 10:21 am
I'm intrigued to understand how a null varchar column can save space by placing it in another table.
sorry I typed horizontal in place of vertical in my first post .. vertical partitioning cannot be normalisation, my apologies for any confusion. You normalise to remove nulls, any table column which allows multiple nulls is not normalised.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
August 16, 2006 at 10:27 am
let's say if 50% of the records have null value for all the nullable columns, you do not need to insert those records into the second table. The second table only contains records that at least two columns (ID column and any other column) have non-null value.
The space saved depends on the data type of the columns in the table.
August 16, 2006 at 2:51 pm
Thanks for replies!
Let me tell you what I'm faced with by making example of it
The table (and some dummy data was like that)
IDColAColBColCColDColE ---------------------------------- 1AAABBBCCCDDDEEE 2AAABBBCCCDDDEEE 3AAABBBCCCDDDEEE 4AAABBBCCCDDDEEE 5AAABBBCCCDDDEEE 6AAABBBCCCDDDEEE 7AAABBBCCCDDDEEE
All (or most) of the columns has non-null values so it's a good desing for now. But a business requirement comes and makes it difficult.
IDColAColBColCColDColE ---------------------------------- 1AAABBBNULLNULLNULL 2AAABBBCCCDDDEEE 3AAABBBCCCDDDEEE 4AAABBBNULLNULLNULL 5AAABBBCCCDDDEEE 6AAABBBNULLNULLNULL 7AAABBBCCCDDDEEE
So we have null values for some of the rows but not all of them. In order to avoid nulls (to save space in other terms) we meant to move the columns ColC, ColD, ColE to a second table and join them to the original one with ID column. Then we began to think how necessary is that change? Can we just leave the null values in the table with peace or do we really need to move them into a secondary table? What is the criteria of such a question?
I know it took a bit long to explain it but you know long question makes the answer easier
Cheers
Zubeyir
August 16, 2006 at 3:12 pm
if it ain't broke don't fix it
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
August 18, 2006 at 1:33 am
I would take into consideration the following facts:
- what is the proportion of rows that have NULL values in ALL columns of proposed second table?
- how often will you be able to use only data from the main table, i.e. no need to JOIN the details table?
- what is the total row length at the moment and what part of this length will go into details table?
Your example is not a "real" sample of the data, that makes any decisions harder for us (e.g. all rows have 'AAA' in ColA, but I suppose the values are different in each row in reality; given that, I'm not sure whether the columns are really VARCHAR(10) or whether it is just another simplification; number of columns is unclear - there are dots after the last column suggesting there are more columns following).
Generally, if you know you will most of the time join both tables in queries, and if the row is not too long (not near allowable maximum), I'd suggest to leave it as it is. On the other hand, if you know that lots of rows will not need corresponding row in Details, and in half of the queries where Original table is used, you will get all you need from Main table - so you don't have to join Details - I would consider dividing the table.
If you are not sure, better just leave it as it is - you can always divide the tables later (and use a view that will represent Original table).
August 18, 2006 at 2:07 am
Thanks a lot everybody.
I know what I represented as my table is a bit general but since I wanted to reach a general rule for dividing tables I choose that manner on purpose. Sorry if it caused any trouble for you.
I completely agree with Colin's "it depens" explanation. In db you can berely make sharp decisions that is valid always, but anyway I tried my shot , on the other hand having some "key" values like peterhe and Vladan gave (%50) is useful as a start point sometimes.
Now I'll suggest the developer to keep the table as one as long as it gets.
Thanks again
Zubeyir
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply