December 29, 2005 at 12:05 am
I recently got a requirement to design a database for capturing sales and sector competitor data. This data would be used primarily for reports and would be updated once or twice a month. Since it's a reporting module, no more than 5 to 10 people would be using it at a time (10 being slightly rare).
We are to integrate this with our existing product, and there are plans to "productise" it. The orders from the top brass are to keep it as generalized as possible so that adapting it to other requirements would be relatively easy.
I had proposed a slightly de-normalized schema and someone else produced something on which I would like to have your comments.
This is what was proposed.
Table1
AttributeID | Section | Subsection | Attribute | Data |
1 | Order | OrderDetails | OrderID | <SOMEDATE> |
2 | Order | OrderDetails | Quantity | |
3 | Order | OrderDetails | ProductID | |
4 | Order | OrderDetails | SalesPerson | |
5 | State | City | Name | |
6 | State | NULL | | |
Table 2
SrNo(Unique ID) | AttributeID | Value |
1 | 5 | State1 |
2 | 5 | State2 |
3 | 6 | City1 |
4 | 6 | City2 |
5 | 6 | City3 |
6 | 1 | 101 |
7 | 2 | 500 |
8 | 3 | P1 |
9 | 4 | Andrew N |
10 | 6 | City3 |
11 | 6 | City4 |
This is how the entire schema will be represented; there will be no physical tables corresponding to the logical schema shown in Table 1.
Hence Table 2 will contain all the data for all the "tables" that are represented logically in Table 1. I foresee not more than 0.5 million records at its peak (in about 8 years), so it would grow by roughly 80-90K records/year.
Queries would first be made against Table 1 to pick up the schema, and then Table 2 would be queried for the actual values.
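Roughly, a lookup for one logical section would go something like this (just a sketch, using the names from the tables above, with Table 2 written as Table2):

-- Step 1: pick up the logical schema for the Order/OrderDetails section
SELECT AttributeID, Attribute
FROM Table1
WHERE Section = 'Order'
  AND Subsection = 'OrderDetails';

-- Step 2: fetch the actual values stored against those attributes
SELECT t2.SrNo, t1.Attribute, t2.Value
FROM Table2 AS t2
JOIN Table1 AS t1
  ON t1.AttributeID = t2.AttributeID
WHERE t1.Section = 'Order'
  AND t1.Subsection = 'OrderDetails';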
I am at a loss for words, hence asking for your comments on the same…
December 29, 2005 at 2:11 am
Read http://www.sqlservercentral.com/columnists/dpeterson/lookuptablemadness.asp
If you use foreign keys then I can see the potential for performance issues when it comes to deleting records.
If you look at the execution plan of a delete query when DRI is declared, you will notice that it searches both tables to check whether the insert, update or delete is valid. Inserts and updates tend not to be a problem because they tend to be single-record operations.
Deletes, on the other hand, tend to be large archiving processes, so what you will be doing (assuming that DRI is declared) is a query across two quite large tables.
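For example, something along these lines (the constraint itself is an assumption; the original post doesn't show one being declared):

-- Assumed declared referential integrity between the two tables
ALTER TABLE Table2
  ADD CONSTRAINT FK_Table2_Table1
  FOREIGN KEY (AttributeID) REFERENCES Table1 (AttributeID);

-- A large archiving delete against the parent table: with the constraint
-- in place, the plan also has to probe Table2 for rows that still
-- reference the attributes being removed
DELETE FROM Table1
WHERE Section = 'Order';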
If you are going to use this sort of schema then your aim should be to limit its use to attributes that have a small number of values. To clarify, you wouldn't want attributes with several thousand values.
December 29, 2005 at 9:22 am
I like "slightly denormalized" data for reporting, if it's a secondary data store and not a primary one.
I agree with David that this can become cumbersome, not only for deletes, but for simple daily troubleshooting and querying.
December 29, 2005 at 10:36 am
Just to add a bit more: this is called an open-schema system. It can be very flexible if you are looking at extensibility, but for data integrity it could become a royal pain, and in my opinion you will really need to allow DML through SPs only, and those SPs will have to be tailored VERY carefully to guarantee data quality, at least through code!
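Something like this sketch is what I mean (the procedure name and parameters are made up, and SrNo in Table2 is assumed to be an IDENTITY column):

CREATE PROCEDURE dbo.usp_InsertAttributeValue
    @AttributeID INT,
    @Value       VARCHAR(255)
AS
BEGIN
    SET NOCOUNT ON;

    -- Reject values for attributes that are not defined in the logical schema
    IF NOT EXISTS (SELECT 1 FROM Table1 WHERE AttributeID = @AttributeID)
    BEGIN
        RAISERROR('Unknown AttributeID - value rejected.', 16, 1);
        RETURN;
    END

    INSERT INTO Table2 (AttributeID, Value)
    VALUES (@AttributeID, @Value);
END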
Just my $0.02
* Noel
December 29, 2005 at 12:26 pm
"Use the Force, Luke!"
I would like to see YOUR version that is "slightly denormalized"... Having been on both ends... a user/manager whispering in my ear "let's eliminate all those messy joins and put this all in one big table" vs. the architect saying "it's only seven tables and ___ can handle that easily"....
Denormalization is fine when justified; but make sure it is justified. Consider each element and how it is being used. The management orders of "generalization" and "integration" also mean there is a lot more to be considered. You must also consider what you are integrating with... and if it were to become a product, whether you would be proud for anyone to see it.
I do not know how skilled you are with this; but I suspect this is a pretty new experience for you or you would not be lost for words. There is no substitute for analysis on something like this and understanding exactly how your data is used and why you are putting it "there"...
Okay... off my soapbox... good luck to you... and err, Happy New Year!!!
December 29, 2005 at 11:31 pm
Thanks for all the replies.
The structure which I had proposed (part of it) is keyed in below.
Order
OrderID | SalesPersonId | Date |
Order Details
OrderID | ProductID | Quantity |
Region
RegionID | RegionName | Parent |
1 | State1 | 0 |
2 | State2 | 0 |
3 | City1 | State1 |
4 | City2 | State1 |
5 | City3 | State2 |
6 | City4 | State3 |
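As DDL, the same structure would look roughly like this (data types are my guesses; Orders and OrderDate are renamed to avoid reserved words; and the self-referencing ParentID is one way to key the Parent column shown above, where the sample uses 0 for a state and the state's name for a city):

CREATE TABLE Region (
    RegionID   INT          NOT NULL PRIMARY KEY,
    RegionName VARCHAR(100) NOT NULL,
    ParentID   INT          NULL REFERENCES Region (RegionID)  -- NULL for a state, otherwise the owning state
);

CREATE TABLE Orders (   -- "Orders" to avoid the reserved word ORDER
    OrderID       INT      NOT NULL PRIMARY KEY,
    SalesPersonID INT      NOT NULL,
    OrderDate     DATETIME NOT NULL
);

CREATE TABLE OrderDetails (
    OrderID   INT         NOT NULL REFERENCES Orders (OrderID),
    ProductID VARCHAR(20) NOT NULL,
    Quantity  INT         NOT NULL,
    PRIMARY KEY (OrderID, ProductID)
);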
When one considers solution 1 (mentioned in the first post), the complexity lies in the joins required with external tables for data retrieval.
Not to mention the fact that every insert, update or delete will have to be broken down into multiple rows rather than a single row. Even a simple select, which would otherwise come back as a single row, would be returned in a multiple-row format.
E.g.:
6 | City4 | State3 |
Will be returned as
6 |
City4 |
State3 |
Since that’s how the data is stored for the solution in my original post.
One will have to do additional processing to convert the rows into columns, which would obviously be needed to display the data to the user in the right format.
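Something like the following would be needed just to get one logical row back as columns (a sketch only; Table 2 as shown has nothing that groups the attribute/value pairs of one logical row, so the GroupID column here is hypothetical):

-- Rebuild one logical order row from the attribute/value pairs
SELECT
    GroupID,
    MAX(CASE WHEN AttributeID = 1 THEN Value END) AS OrderID,
    MAX(CASE WHEN AttributeID = 2 THEN Value END) AS Quantity,
    MAX(CASE WHEN AttributeID = 3 THEN Value END) AS ProductID,
    MAX(CASE WHEN AttributeID = 4 THEN Value END) AS SalesPerson
FROM Table2
GROUP BY GroupID;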
And though I am a little new to the concept, I am not totally alien to designing databases.
Wish you all a happy new year.