Denormalisation and the other option.

  • I recently got a requirement to design a database for capturing sales and sector competitor data. This data would be used primarily for reports and would be updated once or twice a month. Since it is a reporting module, no more than 5 to 10 people would be using it at a time (10 being slightly rare).

    We are to integrate this with our existing product, and there are plans to “productise” it. The orders from the top brass are to keep it as generalized as possible so that adapting it to other requirements would be relatively easy.

    I had proposed a slightly de-normalized schema, and someone else produced the following, on which I would like to have your comments.

    This is what was proposed.

    Table 1

    AttributeID   Section   Subsection     Attribute     Data
    1             Order     OrderDetails   OrderID       <SOMEDATE>
    2             Order     OrderDetails   Quantity
    3             Order     OrderDetails   ProductID
    4             Order     OrderDetails   SalesPerson
    5             State     City           Name
    6             State     NULL

    Table 2

    SrNo (Unique ID)   AttributeID   Value
    1                  5             State1
    2                  5             State2
    3                  6             City1
    4                  6             City2
    5                  6             City3
    6                  1             101
    7                  2             500
    8                  3             P1
    9                  4             Andrew N
    10                 6             City3
    11                 6             City4

    This is the way in which the entire schema will be represented; there will be no physical mapping of the logical schema shown in Table 1.

    Hence Table 2 will contain all the data for all the tables, which are represented only logically in Table 1. I foresee no more than 0.5 million records at its peak (in about 8 years), so it would grow by roughly 80-90K records per year.
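
    Just to make the physical side explicit, the only real objects would be something like this (a rough sketch; the data types are my guesses, not part of the proposal):

    CREATE TABLE Table1                 -- the "logical schema" / metadata table
    (
        AttributeID int          NOT NULL PRIMARY KEY,
        Section     varchar(50)  NOT NULL,
        Subsection  varchar(50)  NULL,
        Attribute   varchar(50)  NULL,
        Data        varchar(50)  NULL
    );

    CREATE TABLE Table2                 -- the single table holding all the values
    (
        SrNo        int          NOT NULL PRIMARY KEY,   -- unique ID
        AttributeID int          NOT NULL,               -- points at Table1
        [Value]     varchar(100) NULL
    );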

     

    Queries would first be made against Table 1 to pick up the schema, and then Table 2 would be queried for the actual values.
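
    In other words, the access pattern would be something like this (a sketch only, against the hypothetical Table1/Table2 above):

    -- Step 1: pick up the logical schema for a section from Table1
    SELECT AttributeID, Attribute
    FROM Table1
    WHERE Section = 'Order'
      AND Subsection = 'OrderDetails';

    -- Step 2: fetch the actual values for those attributes from Table2
    SELECT t2.SrNo, t1.Attribute, t2.[Value]
    FROM Table2 AS t2
    JOIN Table1 AS t1 ON t1.AttributeID = t2.AttributeID
    WHERE t1.Section = 'Order'
      AND t1.Subsection = 'OrderDetails';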

     

    I am at a loss for words, hence asking for your comments on the same.


    What I hear I forget, what I see I remember, what I do I understand

  • Read http://www.sqlservercentral.com/columnists/dpeterson/lookuptablemadness.asp

    If you use foreign keys then I can see the potential for performance issues when it comes to deleting records.

    If you look at the execution plan of a delete query when DRI is declared, you will notice that it searches both tables to detect whether the insert, update or delete is valid. Inserts and updates tend not to be a problem because they tend to be single-record operations.

    Deletes on the other hand tend to be large archiving processes so what you will be doing (assuming that DRI is declared) will be a query on two quite large tables.

    If you are going to use this sort of schema then your aim should be to limit its use to attributes that have a small number of values. To clarify, you wouldn't want attributes with several thousand values.
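
    To illustrate the delete point (a sketch only, using the hypothetical Table1/Table2 names from the first post and assuming a foreign key is declared between them):

    -- Declared referential integrity between the value rows and the attribute list
    ALTER TABLE Table2
        ADD CONSTRAINT FK_Table2_Table1
        FOREIGN KEY (AttributeID) REFERENCES Table1 (AttributeID);

    -- Delete from the referenced table and look at the actual execution plan:
    -- Table2 is probed as well, to verify that no rows still reference the
    -- attributes being removed.
    DELETE FROM Table1
    WHERE Section = 'Order';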

  • I like "slightly denormalized" data for reporting, if it's a secondary data store and not a primary one.

    I agree with David that this can become cumbersome, not only for deletes, but for simple daily troubleshooting and querying.

  • Just to add a bit more: this is called an open-schema system. It can be very flexible if you are looking at extensibility, but for data integrity it could become a royal pain. In my opinion you will really need to allow DML through stored procedures only, and those procedures will have to be tailored VERY carefully to guarantee data quality, at least through code!
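
    For example, something along these lines (a rough sketch against the hypothetical Table1/Table2 from the first post; the procedure name and the checks are made up):

    CREATE PROCEDURE dbo.usp_AddAttributeValue
        @AttributeID int,
        @Value       varchar(100)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- The schema itself cannot enforce much, so the checks have to live here.
        IF NOT EXISTS (SELECT 1 FROM Table1 WHERE AttributeID = @AttributeID)
        BEGIN
            RAISERROR('Unknown AttributeID.', 16, 1);
            RETURN;
        END;

        -- Per-attribute rules (data type, range, allowed values, ...) would also
        -- have to be coded and maintained here.

        -- Assumes SrNo is an IDENTITY column (an assumption; the post does not say).
        INSERT INTO Table2 (AttributeID, [Value])
        VALUES (@AttributeID, @Value);
    END;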

    Just my $0.02 


    * Noel

  • "Use the Force, Luke!"

    I would like to see YOUR version that is "slightly denormalized"... Having been on both ends... a user/manager whispering in my ear "let's eliminate all those messy joins and put this all in one big table" vs. the architect saying "it's only seven tables and ___ can handle that easily"....

    Denormalization is fine when justified; but make sure it is justified. Consider each element and how it is being used. The management orders of "generalization" and "integration" also mean there is a lot more to be considered. You must also consider what you are integrating with... and if it were to become a product, whether you would be proud for anyone to see it.

    I do not know how skilled you are with this; but I suspect this is a pretty new experience for you or you would not be lost for words.  There is no substitute for analysis on something like this and understanding exactly how your data is used and why you are putting it "there"...

    Okay... off my soapbox... good luck to you... and err, Happy New Year!!!


    Cheers,

    david russell

  • Thanks for all the replies.

    The structure which I had proposed (part of it) is as below.

    Order

    OrderID   SalesPersonId   Date


    Order Details

    OrderID   ProductID   Quantity


    Region

    RegionID   RegionName   Parent
    1          State1       0
    2          State2       0
    3          City1        State1
    4          City2        State1
    5          City3        State2
    6          City4        State3
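
    With this, a city and its state come back in one row through a plain self-join (a sketch only; it assumes Parent holds the parent's RegionName, as in the sample rows above):

    SELECT c.RegionID,
           c.RegionName AS City,
           p.RegionName AS [State]
    FROM Region AS c
    JOIN Region AS p ON p.RegionName = c.Parent
    WHERE c.RegionName = 'City3';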

    When one considers solution 1 (mentioned in the first post), the complexity lies in the joins required with external tables for data retrieval.

    Not to mention the fact that every insert, update or delete will have to be broken down into multiple rows rather than a single row. Even a simple select which would otherwise be returned in a single-row format would be returned in a multiple-row format.

    E.g. the single row

    6    City4    State3

    will be returned as

    6
    City4
    State3

    Since that’s how the data is stored for the solution in my original post.

    One will have to do additional processing to convert the rows into columns, which would obviously be needed to display the data to the user in the right format.
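
    The kind of processing I mean would look roughly like this (a sketch only; it assumes, hypothetically, that Table2 also carried some EntityID column tying together the attribute rows that make up one order, which the proposed design does not even provide):

    SELECT v.EntityID,
           MAX(CASE WHEN a.Attribute = 'OrderID'     THEN v.[Value] END) AS OrderID,
           MAX(CASE WHEN a.Attribute = 'ProductID'   THEN v.[Value] END) AS ProductID,
           MAX(CASE WHEN a.Attribute = 'Quantity'    THEN v.[Value] END) AS Quantity,
           MAX(CASE WHEN a.Attribute = 'SalesPerson' THEN v.[Value] END) AS SalesPerson
    FROM Table2 AS v
    JOIN Table1 AS a ON a.AttributeID = v.AttributeID
    WHERE a.Section = 'Order'
    GROUP BY v.EntityID;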

     

    And though I am a little new to the concept, I am not totally alien to designing databases.

     

    Wish you all a Happy New Year.


    What I hear I forget, what I see I remember, what I do I understand
