January 27, 2009 at 9:42 am
Hi,
first post from a lousy sql apprentice, so please forgive
any noobness in the following question:
In my current assignment i have been asked to derive a
relational schema for a simple, one level deep, hierarchy of
objects ( just theoretical stuff, no products and order lines here :))
that are all specializations of a common entity.
Between any two of these entities a certain relationship can
be established, that also has to go in the database.
For several reasons i am forced to use a 'collapsed'
design with one table for each derived entity that includes both
local and inherited attributes ( plus a surrogate key ), and it
all works fine except i don't see a simple way of modeling the
relationship.
If i used a composition table RELAT(a, b) i would need to write
a trigger to make sure that a and b identify two actual rows of some
table, as i have no way of enforcing referential integrity through
a foreign key ( right? ). But all of this doesn't sound very 'relational to me.
The internet hasn't been very helpful so far so i thought i should ask
the professionals: Is there a smart way of establishing cross table
relationships like the one i've described?
Thanks in advance.
January 28, 2009 at 10:59 pm
beetlejuice (1/27/2009)
But all of this doesn't sound very 'relational to me.
I've never heard sound effects emanating from logic:) Lets see what you have so far. If it bleeds you can kill it, if you can relate things you can model it.
January 30, 2009 at 7:54 am
This depends rather heavily on the nature of the relationship. How, exactly, would any one of these "table-based entities" relate to another one? Unless the two to be "related" share something in common, you'll have to abstract their relationship to be something you can materialize into a common field, and ideally, have that same abstraction be applicable to ALL the entities. Lacking that, I'll go so far as to say you may well have the wrong design for a relational database.
Codd layed out the basic concepts for relational databases a rather long time ago, but blind normalization is not always the best way to go. The most important thing, most of the time, is to let any given database object represent a "real-world" entity, getting as close to reality as you reasonably can. Abstraction is often useful, especially when representing reality is inherently difficult, for whatever reason. The critical thing with abstraction is to do it consistently, and in some manner consistent with the nature of the entities being represented.
That may sound like a lot of fancy talk that doesn't say very much, but without more detail about the nature of your "entities", it's a bit difficult to be more specific.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 1, 2009 at 10:24 am
smunson (1/30/2009)
That may sound like a lot of fancy talk that doesn't say very much, but without more detail about the nature of your "entities", it's a bit difficult to be more specific.
I'm sorry, in an attempt to be concise i just left out all the vital pieces of information 🙂
My 'entities' are the symbols of an UML-like class diagram, classes, attributes and methods.
Attributes and methods must belong to a class, and a class can be in turn part of the scope
of another class or just be a top level class within a diagram.
The only things these entities have in common is that they're part of some diagram and have a name
which, to further complicate things, is optional for some entities and mandatory for others ( and
possibly unique within the containing entity ).
In my design, i have one table for each of the primary entities ( please don't mind the
awful naming of columns, it's just for clarity ):
Diagram ( id, ... )
Class ( id, diagram_id, owning_class_id, ... )
Attribute ( id, owning_class_id, ... )
Method ( id, owning_class_id, ... )
ClassRelationship ( class1_id, class2_id )
where ClassRelationship models, surprisingly, the kind of relationship you can establish between any
two classes. What i don't see is how to model heterogeneous relationships, eg. a 'dependency' between
an attribute of a class and a method of another class, without heavy use of triggers and possibly
with some referential integrity so that when an element gets dropped, all the related elements would
follow gracefully without me, or the dba, having to do too much work 🙂
February 2, 2009 at 6:52 am
Ok, but then what you're looking for is entirely "generic", such that you could effectively model ANY given class, method, and attribute scenario. The idea that a class not be named doesn't make any sense from a programming perspective - it has to be called something in order to be able to be referenced, so I'm not sure what that's all about. If you want more help, I think you need to be a lot more specific about the real-world scenario you're looking to implement. If, on the other hand, you're looking to build a generic model so that it can be used in a proprietary product, then you may well be on your own.
However, be aware there's an inherent problem associated with building a database to represent a "generic" object. You tend to end up with "EAV" designs (aka Entity, Attribute, Value). Such designs are notoriously poor in one aspect or another, be it performance or the ability to create a simple query. Often, the latter can become almost impossible without severe performance constraints. The problem is that a relational database can be made to be pretty darned good (if not excellent) at representing real-world objects, but them moment you lose sight of the mapping between real-world objects and database objects, things tend to get messy and out of hand in rather short order. You only abstract things so far before you start to lose sight of the true objective.
I think you need to worry more about WHY you need a database to represent these objects, because that has a better chance at driving the right design direction.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 2, 2009 at 5:03 pm
smunson (2/2/2009)
Ok, but then what you're looking for is entirely "generic", such that you could effectively model ANY given class, method, and attribute scenario. The idea that a class not be named doesn't make any sense from a programming perspective - it has to be called something in order to be able to be referenced, so I'm not sure what that's all about. If you want more help, I think you need to be a lot more specific about the real-world scenario you're looking to implement. If, on the other hand, you're looking to build a generic model so that it can be used in a proprietary product, then you may well be on your own.
I wasn't hiding anything this time, I just need to have UML-like diagrams, outputted from a visual tool, stored inside a relational database,
that's all 🙂
February 2, 2009 at 5:43 pm
Ok, but you still haven't answered the WHY question. Why do you need to do this? What's the objective that a database will give you that some other solution won't?
Whatever the answer, you're probably going to want to avoid the design you had in mind, as the power of the database would likely disappear due to the complexity of the queries necessary to get any useful output. Abstraction is likely to have a good use in this case. Identifying a particular diagram is easy - use an integer or GUID field, but don't make the field unique wihtin the table. All the records that are part of the same diagram will use the same ID. Recording the pieces of that diagram is then a matter of determining all possible "types" of diagram entities, and having fields available to describe the attributes. If the number of attributes is too large to be practical, then just store the items inherent to ALL diagrams in the main table, and have a 2nd table that gives you a limited EAV design, such that each entity type has a type id, and the attribute name and value. You'll need to be able to ID each entity as well, and then cross reference them with a linkage field.
This can get messy, and may require significantly recursive CTE's, but I'm not sure how practical the alternative is.
The other question that comes to mind is why not use a diagramming tool? What will the DB do that a UML diagramming tool won't?
Steve
(aka smunson)
:):):)
beetlejuice (2/2/2009)
smunson (2/2/2009)
Ok, but then what you're looking for is entirely "generic", such that you could effectively model ANY given class, method, and attribute scenario. The idea that a class not be named doesn't make any sense from a programming perspective - it has to be called something in order to be able to be referenced, so I'm not sure what that's all about. If you want more help, I think you need to be a lot more specific about the real-world scenario you're looking to implement. If, on the other hand, you're looking to build a generic model so that it can be used in a proprietary product, then you may well be on your own.I wasn't hiding anything this time, I just need to have UML-like diagrams, outputted from a visual tool, stored inside a relational database,
that's all 🙂
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 2, 2009 at 6:20 pm
I believe UML strives to be entity based and since relational is often mistaken for being a hierarchical form of entity management, relational databases often get put under pressure to cater for this.
Steve's analysis of the situation is quite correct but I also believe that given the correct structure, relational databases can offer object orientated functionality. That's not to say that a certain amount of background processing won't be necessary. I've seen some crazy methods being employed, some with constructed referential keys that depict the heirarchy within (there's your trigger scenario - primarily calling sp's to do the dirty work) and some with really complex referential hierarchies but all of them demand the overhead to maintain them.
Define your elements, then your limitations and then code the rest, hope this helps.
Max
February 2, 2009 at 7:03 pm
[font="Verdana"]I guess I am still trying to understand why you need to do this. From experience, over-generalised designs are a way of compensating for a lack of analysis. The performance and ease of use of the resulting design suffers accordingly.
If you plan to use a relational database, design your data store to use it efficiently. Don't use a relational database as a generalised hierarchical document store. There are other products to do that.[/font]
February 3, 2009 at 6:36 am
Most modeling tools use proprietary file formats to store diagrams, which works fine
for local editing but less in distributed environments, with several people accessing
and modifying the same diagrams.
Also, the definition of rules to enforce the correctness of diagrams, which may
vary on a project basis, must be done in code and this means either the application
needs to be rebuilt every time or some complex scripting engine must be employed.
All this data management and scripting is already done by RDBMS, but much more
efficiently and robustly than any code you could ever write.
can you make a brief example of your design, cause i didn't understand it fully.
i've done some research and it seems there are a few accepted methods to
handle OO scenarios, NONE of which i like really :):).
I know Oracle and some other RDBMS even introduced specific SQL statements
for this purpose, but i am devising this system to be used by people with
an average level of SQL expertise, so for now I am just going for plain relational.
February 3, 2009 at 9:10 am
beetlejuice,
My so-called design is what most folks call "EAV", or Entity, Attribute, Value. Such designs are notorious for poor performance. The biggest problem I foresee is being able to model all the object relationships within any given UML diagram. If you have to define rules about those relationships, you're in serious trouble, and EAV is not capable of handling things at that level without a really fancy application behind the scenes, or a rather complex trigger setup that calls stored procedures. Given your most recent statement about the "average level of SQL experience", that kind of setup just doesn't qualify.
However, if you can live without defining rules for the relationships, and simply allow your data store to say what type of relationship it is, and have that relationship always described exclusively at either the parent or the child, but never both, then you might be able to work with EAV. I still can't recommend it based on what I know so far, but here's the concept:
CREATE TABLE dbo.DIAGRAMS (
DIAGRAM_ID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
DIAGRAM_NAME varchar(50),
COMMON_PROPERTY1 varchar(20),
COMMON_PROPERTY2 varchar(20),
.
.
.
)
CREATE TABLE dbo.ENTITIES (
ENTITY_ID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
DIAGRAM_ID int NOT NULL, -- REFERENCES dbo.DIAGRAMS.DIAGRAM_ID
IS_RELATED_TO int NULL, -- REFERENCES dbo.ENTITIES.ENTITY_ID
RELATIONSHIP_TYPE varchar(40) NULL, -- used by your application
COMMON_PROPERTY1 varchar(20),
COMMON_PROPERTY2 varchar(20),
.
.
.
)
CREATE TABLE dbo.E_UNIQUE_PROPERTIES (
PROPERTY_ID int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
ENTITY_RN int NOT NULL, -- REFERENCES dbo.ENTITIES.ENTITY_ID
UNIQUE_PROPERTY_TYPE varchar(50),
UNIQUE_PROPERTY_VALUE varchar(50)
)
That last table is designed to have one record for each unique property associated with a given entity. I haven't actually coded the foreign-key references, as I don't have the time to test this right now, but I think you get the concept. Again, it's not exactly the recommended way to go, and you'll have to determine for yourself what the performance characteristics are going to be, and they could (and probably will) be troublesome.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
February 3, 2009 at 12:19 pm
Before anything else, thank you smunson for devoting so much time to my problem 🙂
This EAV approach is new to me and seems indeed very powerful ( although i fear Edgar Codd might be spinning in his grave 🙂 ), but as you pointed out it will require too much coding to ensure diagram correctness ( and yes, relationships are not only subject to several constraints, but are also entities themselves and can have properties :w00t: ).
What do you think of this solution:
table DIAGRAMS
id PK,
name,
table ENTITIES
id PK,
diagram_id REF. DIAGRAMS
table CLASSES
id PK, REF. ENTITIES,
table ATTRIBUTES
id PK, REF. ENTITIES,
class_id REF. CLASSES,
table RELATIONSHIP_A
id PK,
source REF. ENTITIES,
target REF. ENTITIES,
This configuration enables me to perform basic checks with minimal effort ( eg. two entities that are in a relationship must belong to
the same diagram ), but on the downside, the definition of a each entity is scattered between two tables and this
means a bit of trigger work is required to maintain consistency.
February 3, 2009 at 12:41 pm
beetlejuice (2/3/2009)
Most modeling tools use proprietary file formats to store diagrams, which works fine for local editing but less in distributed environments, with several people accessing and modifying the same diagrams.
[font="Verdana"]Most modelling tools I know of allow you to store your models within a database to solve precisely this issue.[/font]
February 3, 2009 at 2:18 pm
Bruce W Cassidy (2/3/2009)
[font="Verdana"]Most modelling tools I know of allow you to store your models within a database to solve precisely this issue.[/font]
I should have said most 'freeware' tools don't support this feature.
February 3, 2009 at 2:24 pm
[font="Verdana"]Okay, I am probably going to anger some purists with this suggestion, but let's be pragmatic here.
If you are trying to come up with a database design that ensures that all potential data stored is "valid", I have to ask... why?
Try a simple database design, use security to lock it down so only your application can make data changes, and ensure that the logic within your application is posting valid data.
That way you can focus on designing a database model that is easy to work with, performant, secure and so on. Don't over-complicate the issue by trying to build in what is really part of the business logic as part of the database design.
The database doesn't need to know what the data is for (that's the job of the application). It just needs to store the data.
Falls off the soapbox.[/font]
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply