Modeling occasional entity grouping

Question

Modeling occasional entity grouping

robert.levy

SSC Veteran

Points: 251
More actions
August 5, 2010 at 6:35 am

#236299

Hi DB folks,
I'm looking for advice about modeling the following setup as I'm terrified that I'm about to head into a design which would have better DBAs than me tearing their hair out.
Imagine a world in which 90% of the population live alone (a household of one). The other 10% can live in (potentially huge) family households. Each household needs a TV Licence.
In my current model I have:
People
PersonID, PersonName, HouseholdID
Households
HouseholdID, HouseholdName, LicenceID
Licences
LicenceID, LicenceNumber
My fear is that for 90% of the rows of household what I'm really representing is a person but I need to create a mythical household object just for the sake of giving that person a TV licence (and the household name will be the same as the person name). This means that if a person changes his or her name, he/she now belongs to a household with a name that no longer makes any sense.
Also, if there are two lonely John Smith's in this lonely world, they might have the same household name but no PersonID on the household object to tell them apart. Or some horrible solution with JohnSmithHousehold1, JohnSmithHousehold2. Yuk.
What I really want to do is to allow a licence to belong either to a person or to a household (obviously not both). Can this be modeled in a safe and sane way?
Many thanks in advance for your always brilliant input,
Rob

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 1

robert.levy (8/5/2010)
Hi DB folks,
I'm looking for advice about modeling the following setup as I'm terrified that I'm about to head into a design which would have better DBAs than me tearing their hair out.
Imagine a world in which 90% of the population live alone (a household of one). The other 10% can live in (potentially huge) family households. Each household needs a TV Licence.
In my current model I have:
People
PersonID, PersonName, HouseholdID
Households
HouseholdID, HouseholdName, LicenceID
Licences
LicenceID, LicenceNumber
My fear is that for 90% of the rows of household what I'm really representing is a person but I need to create a mythical household object just for the sake of giving that person a TV licence (and the household name will be the same as the person name). This means that if a person changes his or her name, he/she now belongs to a household with a name that no longer makes any sense.
Also, if there are two lonely John Smith's in this lonely world, they might have the same household name but no PersonID on the household object to tell them apart. Or some horrible solution with JohnSmithHousehold1, JohnSmithHousehold2. Yuk.
What I really want to do is to allow a licence to belong either to a person or to a household (obviously not both). Can this be modeled in a safe and sane way?
Many thanks in advance for your always brilliant input,
Rob

I had the opportunity of living in London for a couple of years and if I remember it right the TV license was tied to a specific TV set sitting in a specific address and paid by for a specific person.

Bottom line is, not sure if the system cares about "all the people in the country" or just about "the people that has paid (or should pay) a TV license".

Each TV license had a nice unique "number" and that number was the first piece of information you should give over the phone each time you wanted to do something... like moving to a different flat or cancelling the TV license.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

robert.levy SSC Veteran Points: 251 More actions · Answer 2

Hi again all,

apologies (and particular to Paul below!) for not being clearer.

The TV licences example was purely fictitious (I'm actually making a database to deal with quantitative financial information). I didn't want to get bogged down in the details of what I'm actually doing so I came up with a toy example.

What I'm really asking about is how to model the setup where a relationship can be made to either a single entity (95% of cases) or to a group of these same entities.

If you read my initial post and thought "I don't know anything about TV licences. I won't reply to this thread" please have another read with the thought that it's just a toy example!

Sorry again not to have been clearer.

Rob

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 3

robert.levy (8/5/2010)
In my current model I have:
People
PersonID, PersonName, HouseholdID
Households
HouseholdID, HouseholdName, LicenceID
Licences
LicenceID, LicenceNumber
My fear is that for 90% of the rows of household what I'm really representing is a person but I need to create a mythical household object just for the sake of giving that person a TV licence (and the household name will be the same as the person name). This means that if a person changes his or her name, he/she now belongs to a household with a name that no longer makes any sense.

Oops... sorry about that, some times I get stuff too literaly.

I'm assuming column HouseholdID is a surrogate key - is that correct?

Two solutions...

1- Why even bother in having a HouseholdName? if needed for some reason couldn't it match HouseholdID?

2- Apply a sort of SCD type II approach. This will require to add to Households table some columns including a new meaningless PK as well as DateValidFrom and DateValidUntil columns. This way when a row on People table changes personal name you just expire the related row in Households and create a new one with the proper information. Approach will keep an accurate histore of Households over time and will solve the name change issue.

Hope this time I got it right 😀

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

robert.levy SSC Veteran Points: 251 More actions · Answer 4

Hi Paul,

thanks (again!) for your quick response!

I'm not sure I've properly explained the problem. The issue of whether a household has a name or not (and, hence, whether it is different to the Person name) is a symptom of my design problem, not the problem itself.

The problem itself is this:

Every A has to have a C.

Sometimes many As are grouped into a single B.

In this case, it is the B that has the C.

Is this explanation any clearer? (perhaps much less clear!!)

Another toy example: Every Person has exactly one Heart. Occasionally a SiameseTwin is born where two Persons have one Heart. In this case it is the group SiameseTwin which has the Heart and the two Persons are both members of this one SiameseTwin. (Database table names in bold)

I'm looking for a good, sound modelling approach for this setup.

Many thanks,

Rob

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 5

robert.levy (8/9/2010)
The problem itself is this:
Every A has to have a C.
Sometimes many As are grouped into a single B.
In this case, it is the B that has the C.
Is this explanation any clearer? (perhaps much less clear!!)
Another toy example: Every Person has exactly one Heart. Occasionally a SiameseTwin is born where two Persons have one Heart. In this case it is the group SiameseTwin which has the Heart and the two Persons are both members of this one SiameseTwin. (Database table names in bold)
I'm looking for a good, sound modelling approach for this setup.

I'll translate the business spec this way.

Every B has to have one and only one C

90% of the time B has a single A

10% of the time B has more than one A

Going back to original specs I will model this way:

A-PersonTable (1-n) <<===> (1) B-HouseholdTable (1) <===> (1) C-LicenseTable

Does this works?

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

robert.levy SSC Veteran Points: 251 More actions · Answer 6

Absolutely right, this is the setup I currently have. But:

for the 90% of Bs which only have one A:

I want to present the user with the friendly name of A (ie. not the surrogate key)

Should the B in this case have no friendly, human readable name? (i.e. NULL in this column?)

for the 10% of Bs which are a genuine group of As:

The user needs to then see a friendly name for the group

This all just seems a bit non "database-theory" correct somehow. More like a sort of quick-and-dirty solution.

Can anyone offer reassurances that this won't lead to horrible problems going forward? Or that there isn't a better, more elegant way of doing this. Seems to me like having lots of groups with only one element would almost count as redundancy in this eyes of theory purists.

Advice, previous experience, links to good reading matter all welcomed!

Thanks,

Rob

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 7

robert.levy (8/9/2010)
Absolutely right, this is the setup I currently have. But:
for the 90% of Bs which only have one A:
I want to present the user with the friendly name of A (ie. not the surrogate key)
Should the B in this case have no friendly, human readable name? (i.e. NULL in this column?)
for the 10% of Bs which are a genuine group of As:
The user needs to then see a friendly name for the group
This all just seems a bit non "database-theory" correct somehow. More like a sort of quick-and-dirty solution.
Can anyone offer reassurances that this won't lead to horrible problems going forward? Or that there isn't a better, more elegant way of doing this. Seems to me like having lots of groups with only one element would almost count as redundancy in this eyes of theory purists.

... which leads us to solution #2 on a previous post of mine. Treat B-HouseholdsTable as a Type II Slowly Changing Dimension. At the time a row gets created in this table assign the "friendly name" then if somebody decides to change his/her name you can safely expire the original row and insert a new row showing the new "friendy name".

As stated in a previous post this will require to add a meaningless PK tothis table - you have to keep the same HouseholdID as a non-unique index - and also add a couple of DateValidFrom/DateValidUntil columns that will accurately describe the change history as well as the current value of the "friendly name".

You may want to check Type II Slowly Changing Dimensions concept - plenty of documentation in the net.

Hope this helps.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

robert.levy SSC Veteran Points: 251 More actions · Answer 8

Ok, thanks Paul.

I did understand your post first time round. SCD is definitely the way to go with references to the name of B. I only posted again because I was looking for comments about the theoretical soundness of this way of modeling the setup.

By the fact that you haven't already said "urgh! why haven't you read <insert DB design book here> before posting this, you fool!", I'll assume you think that this is an ok way for me to model my little problem.

That's good news. I'll press ahead with the model you suggest.

Thanks,

Rob

p.s. If anyone else did say "urgh!" upon reading my proposed setup, feel free to let me know!