March 15, 2010 at 9:02 am
Paul White (3/15/2010)
Alvin Ramard (3/15/2010)
(Seriously, there was no pun intended.)Given your track record for bad puns, Alvin, I have my doubts :laugh:
Benefit of the doubt. 😛
If a pun had been intended, I would have included a smiley.
I can understand your comment. I do have a reputation for trying to add a bit of humor to many situations. 🙂
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
March 15, 2010 at 11:01 am
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
March 15, 2010 at 11:28 am
Both of these examples depict DSS-type operations. Rather than denormalize a live database, I would prefer to create a data warehouse, where denormalization is the norm. I believe you denormalize during load testing, and then only IF you have a significant performance issue. Over time, a normalized database is easier to modify than a denormalized one.
Another alternative would be to use all natural keys; that way the UID of the parent table is carried down to the UID of the children, grandchildren, etc. Of course, the big disadvantage to this approach is that if you have many generations, there can be more key columns than data columns in the lowest generation.
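To illustrate the natural-key alternative (a minimal sketch with hypothetical table and column names, not from the thread): each child's primary key includes every ancestor key column, so by the third generation the key is already three columns wide.

```sql
-- Hypothetical schema: natural keys cascade from parent to child.
CREATE TABLE Customer (
    CustomerCode varchar(10) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerCode)
);

CREATE TABLE CustomerOrder (
    CustomerCode varchar(10) NOT NULL,
    OrderDate    datetime    NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY (CustomerCode, OrderDate),
    CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerCode)
        REFERENCES Customer (CustomerCode)
);

-- Third generation: the key now carries both ancestor columns.
CREATE TABLE OrderLine (
    CustomerCode varchar(10) NOT NULL,
    OrderDate    datetime    NOT NULL,
    LineNumber   int         NOT NULL,
    Quantity     int         NOT NULL,
    CONSTRAINT PK_OrderLine PRIMARY KEY (CustomerCode, OrderDate, LineNumber),
    CONSTRAINT FK_Line_Order FOREIGN KEY (CustomerCode, OrderDate)
        REFERENCES CustomerOrder (CustomerCode, OrderDate)
);
```

The upside is that a query against OrderLine can filter on CustomerCode without any joins; the downside, as noted above, is that the key width keeps growing with each generation.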
March 15, 2010 at 11:30 am
Les Cardwell (3/15/2010)
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Nicely said Grasshopper 🙂
I like the idea of denormalization, but many people look for these types of articles to re-establish their nonexistent case for designing a sloppy, good-for-nothing database. They just totally ignore the last paragraph! :rolleyes:
It usually takes a lot of time and effort to denormalize a database. But shouldn't you FIRST NORMALIZE, then DENORMALIZE only if a benefit can be measured? Right? 😀
March 15, 2010 at 11:42 am
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Actually, this:
WHERE P_MS.DateReceived > getdate() - 365
can use an index on DateReceived. The function call is on the right of the conditional and will only be calculated once.
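To make the distinction concrete (using the thread's P_MS table and DateReceived column), the predicate stays sargable as long as the function is applied to the constant side of the comparison rather than to the column itself. A sketch:

```sql
-- Sargable: getdate() - 365 is evaluated once as a runtime constant,
-- so an index on DateReceived can be used for a seek.
SELECT *
FROM dbo.P_MS
WHERE P_MS.DateReceived > getdate() - 365;

-- Non-sargable: wrapping the indexed column in a function hides it
-- from the index, typically forcing a scan.
SELECT *
FROM dbo.P_MS
WHERE DATEADD(day, 365, P_MS.DateReceived) > getdate();
```

The second form is the pattern to avoid; the first is equivalent to the scalar-variable version for index-usage purposes.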
March 15, 2010 at 11:44 am
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
March 15, 2010 at 11:45 am
Paul White (3/15/2010)
Normalize 'til it hurts...de-normalize* 'til it works!
Agreed.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 15, 2010 at 11:47 am
Alvin Ramard (3/15/2010)
Paul White (3/15/2010)
Jim, yes. Data warehouses are a totally different kettle.
It's normal for denormalization to be present in a data warehouse.
(Seriously, there was no pun intended.)
Absolutely. There should not be a lot of transactions occurring there and flatter structures can be much more beneficial.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 15, 2010 at 11:51 am
Lynn Pettis (3/15/2010)
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 15, 2010 at 11:58 am
CirquedeSQLeil (3/15/2010)
Lynn Pettis (3/15/2010)
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.
March 15, 2010 at 12:10 pm
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.
Good catch on the 'type' 🙂
Actually, in 2005 it needs to be...
DECLARE @selectDate DATETIME
SET @selectDate = getdate() - 365
;
I'm jumping around between SQL2000, SQL2005, SQL2008, Oracle10g, and DB2... nutz.
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
March 15, 2010 at 12:17 pm
Actually, this:
WHERE P_MS.DateReceived > getdate() - 365
can use an index on DateReceived. The function call is on the right of the conditional and will only be calculated once.
Hmmm... positive? Since getdate() is a nondeterministic function, like all nondeterministic functions, we've always assigned it to a scalar variable to ensure the DBMS won't perform a table scan...although admittedly, these days they seem to be more implementation-dependent.
From SQL Server Help...
For example, the function GETDATE() is nondeterministic. SQL Server puts restrictions on various classes of nondeterminism. Therefore, nondeterministic functions should be used carefully. The lack of strict determinism of a function can block valuable performance optimizations. Certain plan reordering steps are skipped to conservatively preserve correctness. Additionally, the number, order, and timing of calls to user-defined functions is implementation-dependent. Do not rely on these invocation semantics.
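One way to see why the optimizer can still seek on this predicate: in current SQL Server versions, GETDATE() is treated as a runtime constant, evaluated once at the start of statement execution rather than once per row (unlike, say, NEWID()). A quick illustrative check:

```sql
-- GETDATE() is evaluated once per statement, not once per row:
-- every row returned by this query carries the identical timestamp.
SELECT TOP (1000) name, GETDATE() AS EvaluatedAt
FROM sys.objects;
```

That single evaluated value is what gets compared against the indexed column, which is why the predicate behaves like the scalar-variable version.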
JFWIW...
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
March 15, 2010 at 12:39 pm
Good points made. I have never found using getdate() inside a SQL query to be problematic in my execution plans. However, if it's "best practice" to not do it, then I'll probably stop. I had never thought about it, before now.
March 15, 2010 at 12:52 pm
Les Cardwell (3/15/2010)
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.
Good catch on the 'type' 🙂
Actually, in 2005 it needs to be...
DECLARE @selectDate DATETIME
SET @selectDate = getdate() - 365
;
I'm jumping around between SQL2000, SQL2005, SQL2008, Oracle10g, and DB2... nutz.
Pretty sure.
Table/Index defs
USE [SandBox]
GO
/****** Object: Table [dbo].[JBMTest] Script Date: 03/15/2010 12:49:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[JBMTest](
[RowNum] [int] IDENTITY(1,1) NOT NULL,
[AccountID] [int] NOT NULL,
[Amount] [money] NOT NULL,
[Date] [datetime] NOT NULL
) ON [PRIMARY]
GO
/****** Object: Index [IX_JBMTest] Script Date: 03/15/2010 12:49:16 ******/
CREATE CLUSTERED INDEX [IX_JBMTest] ON [dbo].[JBMTest]
(
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [IX_JBMTest_AccountID_Date] Script Date: 03/15/2010 12:49:16 ******/
CREATE NONCLUSTERED INDEX [IX_JBMTest_AccountID_Date] ON [dbo].[JBMTest]
(
[AccountID] ASC,
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Simple query:
select * from dbo.JBMTest where Date > getdate() - 365
Actual execution plan attached.
There are 1,000,000 records in the test table.
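For anyone who can't open the attached plan, a sketch of one way to inspect the seek-vs-scan behavior yourself against the same test table:

```sql
-- Show the estimated plan as text instead of executing the query.
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM dbo.JBMTest WHERE [Date] > getdate() - 365;
GO
SET SHOWPLAN_TEXT OFF;
GO
-- With the clustered index IX_JBMTest on [Date], the plan should
-- report a Clustered Index Seek rather than a Scan.
```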
March 15, 2010 at 1:10 pm
Lynn Pettis (3/15/2010)
USE [SandBox]
GO
/****** Object: Table [dbo].[JBMTest] Script Date: 03/15/2010 12:49:16 ******/
Looks like a familiar setup 😉
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
Viewing 15 posts - 16 through 30 (of 45 total)