March 15, 2010 at 9:02 am
Paul White (3/15/2010)
Alvin Ramard (3/15/2010)
(Seriously, there was no pun intended.)Given your track record for bad puns, Alvin, I have my doubts :laugh:
Benefit of the doubt. 😛
If a pun had been intended, I would have included a smiley.
I can understand your comment. I do have a reputation for trying to add a bit of humor to many situations. 🙂
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
March 15, 2010 at 11:01 am
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
March 15, 2010 at 11:28 am
Both of these examples depict DSS-type operations. Rather than denormalize a live database, I would prefer to create a data warehouse, where denormalization is the norm. I believe you denormalize during load testing, and then only IF you have a significant performance issue. Over time, a normalized database is easier to modify than a denormalized one.
Another alternative would be to use all natural keys; that way the UID of the parent table is carried down to the UID of the children, grandchildren, etc. Of course, the big disadvantage to this approach is that if you have many generations, there can be more key columns than data columns in the lowest generation.
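To illustrate the natural-key alternative (a minimal sketch with hypothetical table and column names, not from the thread): each child's primary key includes every ancestor key column, so by the third generation the key is already three columns wide.

```sql
-- Hypothetical schema: natural keys cascade from parent to child.
CREATE TABLE Customer (
    CustomerCode varchar(10) NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY (CustomerCode)
);

CREATE TABLE CustomerOrder (
    CustomerCode varchar(10) NOT NULL,
    OrderDate    datetime    NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY (CustomerCode, OrderDate),
    CONSTRAINT FK_Order_Customer FOREIGN KEY (CustomerCode)
        REFERENCES Customer (CustomerCode)
);

-- Third generation: the key now carries both ancestor columns.
CREATE TABLE OrderLine (
    CustomerCode varchar(10) NOT NULL,
    OrderDate    datetime    NOT NULL,
    LineNumber   int         NOT NULL,
    Quantity     int         NOT NULL,
    CONSTRAINT PK_OrderLine PRIMARY KEY (CustomerCode, OrderDate, LineNumber),
    CONSTRAINT FK_Line_Order FOREIGN KEY (CustomerCode, OrderDate)
        REFERENCES CustomerOrder (CustomerCode, OrderDate)
);
```

The upside is that a query against OrderLine can filter on CustomerCode without any joins; the downside, as noted above, is that the key width keeps growing with each generation.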
March 15, 2010 at 11:30 am
Les Cardwell (3/15/2010)
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Nicely said Grasshopper 🙂
I like the idea of denormalization, but many people look for these types of articles to re-establish their nonexistent case for designing a sloppy, good-for-nothing database. They just totally ignore the last paragraph! :rolleyes:
It usually takes a lot of time and effort to denormalize a database. But shouldn't you FIRST NORMALIZE, then DENORMALIZE only if a benefit can be measured? Right? 😀
March 15, 2010 at 11:42 am
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Actually, this:
WHERE P_MS.DateReceived > getdate() - 365
can use an index on DateReceived. The function call is on the right of the conditional and will only be calculated once.
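To make the distinction concrete (using the thread's P_MS table and DateReceived column), the predicate stays sargable as long as the function is applied to the constant side of the comparison rather than to the column itself. A sketch:

```sql
-- Sargable: getdate() - 365 is evaluated once as a runtime constant,
-- so an index on DateReceived can be used for a seek.
SELECT *
FROM dbo.P_MS
WHERE P_MS.DateReceived > getdate() - 365;

-- Non-sargable: wrapping the indexed column in a function hides it
-- from the index, typically forcing a scan.
SELECT *
FROM dbo.P_MS
WHERE DATEADD(day, 365, P_MS.DateReceived) > getdate();
```

The second form is the pattern to avoid; the first is equivalent to the scalar-variable version for index-usage purposes.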
March 15, 2010 at 11:44 am
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
March 15, 2010 at 11:45 am
Paul White (3/15/2010)
Normalize 'til it hurts...de-normalize* 'til it works!
Agreed.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 15, 2010 at 11:47 am
Alvin Ramard (3/15/2010)
Paul White (3/15/2010)
Jim, yes. Data warehouses are a totally different kettle.
It's normal for denormalization to be present in a data warehouse.
(Seriously, there was no pun intended.)
Absolutely. There should not be a lot of transactions occurring there and flatter structures can be much more beneficial.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 15, 2010 at 11:51 am
Lynn Pettis (3/15/2010)
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 15, 2010 at 11:58 am
CirquedeSQLeil (3/15/2010)
Lynn Pettis (3/15/2010)
Les Cardwell (3/15/2010)
In spite of the criticism, it was still a simple example of minimal denormalization to achieve an end result rather than a full-on explosion of wide rows to reduce the NF to 0 :pinch:
Interestingly enough, the biggest cost to the initial query, which probably exceeded benefits of denormalization, was using a 'function' in a WHERE predicate...
WHERE P_MS.DateReceived > getdate() - 365
...would have been better expressed declaring a scalar variable:
DECLARE @selectDate = getdate()-365
...
WHERE P_MS.DateReceived > @selectDate
...
...which would allow the optimizer to use an index on DateReceived.
Unfortunately, denormalization for immutable datasets as we've used it in the past just doesn't scale, especially on large datasets...not to mention the escalating complexity (and headaches) it entails. Ironically, not even for data warehouses that go the same route (MOLAP) vs. a multi-dimensional ROLAP snowflake schema. The statistical implications are the subject of current research, though it's proving a bit of a challenge to account for all the complexities it can entail (code proliferation, data correctness, increased complexity of refactoring to accommodate changing business rules, data explosion, etc.). The data tsunami is upon us :w00t:
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.
March 15, 2010 at 12:10 pm
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.
Good catch on the 'type' 🙂
Actually, in 2005 it needs to be...
DECLARE @selectDate DATETIME
SET @selectDate = getdate() - 365
;
I'm jumping around between SQL2000, SQL2005, SQL2008, Oracle10g, and DB2... nutz.
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
March 15, 2010 at 12:17 pm
Actually, this:
WHERE P_MS.DateReceived > getdate() - 365
can use an index on DateReceived. The function call is on the right of the conditional and will only be calculated once.
Hmmm... positive? Since getdate() is a nondeterministic function, like all nondeterministic functions, we've always assigned it to a scalar variable to ensure the DBMS won't perform a table scan...although admittedly, these days they seem to be more implementation-dependent.
From SQL Server Help...
For example, the function GETDATE() is nondeterministic. SQL Server puts restrictions on various classes of nondeterminism. Therefore, nondeterministic functions should be used carefully. The lack of strict determinism of a function can block valuable performance optimizations. Certain plan reordering steps are skipped to conservatively preserve correctness. Additionally, the number, order, and timing of calls to user-defined functions is implementation-dependent. Do not rely on these invocation semantics.
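One way to see why the optimizer can still seek on this predicate: in current SQL Server versions, GETDATE() is treated as a runtime constant, evaluated once at the start of statement execution rather than once per row (unlike, say, NEWID()). A quick illustrative check:

```sql
-- GETDATE() is evaluated once per statement, not once per row:
-- every row returned by this query carries the identical timestamp.
SELECT TOP (1000) name, GETDATE() AS EvaluatedAt
FROM sys.objects;
```

That single evaluated value is what gets compared against the indexed column, which is why the predicate behaves like the scalar-variable version.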
JFWIW...
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
March 15, 2010 at 12:39 pm
Good points made. I have never found using getdate() inside a SQL query to be problematic in my execution plans. However, if it's "best practice" to not do it, then I'll probably stop. I had never thought about it, before now.
March 15, 2010 at 12:52 pm
Les Cardwell (3/15/2010)
Also, this:
DECLARE @selectDate = getdate()-365
won't work. In SQL Server 2008 it needs to be like this:
DECLARE @selectDate datetime = getdate()-365
For what it's worth, it doesn't work in 2005 either.
Cannot assign a default value to a local variable.
Nope, it doesn't. Being able to assign a value to a variable when it is declared is new to SQL Server 2008. Guess what, we upgraded our PeopleSoft systems to SQL Server 2008 EE. Now, we just need to start upgrading our other systems.
Good catch on the 'type' 🙂
Actually, in 2005 it needs to be...
DECLARE @selectDate DATETIME
SET @selectDate = getdate() - 365
;
I'm jumping around between SQL2000, SQL2005, SQL2008, Oracle10g, and DB2... nutz.
Pretty sure.
Table/Index defs
USE [SandBox]
GO
/****** Object: Table [dbo].[JBMTest] Script Date: 03/15/2010 12:49:16 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[JBMTest](
[RowNum] [int] IDENTITY(1,1) NOT NULL,
[AccountID] [int] NOT NULL,
[Amount] [money] NOT NULL,
[Date] [datetime] NOT NULL
) ON [PRIMARY]
GO
/****** Object: Index [IX_JBMTest] Script Date: 03/15/2010 12:49:16 ******/
CREATE CLUSTERED INDEX [IX_JBMTest] ON [dbo].[JBMTest]
(
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
/****** Object: Index [IX_JBMTest_AccountID_Date] Script Date: 03/15/2010 12:49:16 ******/
CREATE NONCLUSTERED INDEX [IX_JBMTest_AccountID_Date] ON [dbo].[JBMTest]
(
[AccountID] ASC,
[Date] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
Simple query:
select * from dbo.JBMTest where Date > getdate() - 365
Actual execution plan attached.
There are 1,000,000 records in the test table.
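For anyone who can't open the attached plan, a sketch of one way to inspect the seek-vs-scan behavior yourself against the same test table:

```sql
-- Show the estimated plan as text instead of executing the query.
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM dbo.JBMTest WHERE [Date] > getdate() - 365;
GO
SET SHOWPLAN_TEXT OFF;
GO
-- With the clustered index IX_JBMTest on [Date], the plan should
-- report a Clustered Index Seek rather than a Scan.
```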
March 15, 2010 at 1:10 pm
Lynn Pettis (3/15/2010)
USE [SandBox]
GO
/****** Object: Table [dbo].[JBMTest] Script Date: 03/15/2010 12:49:16 ******/
Looks like a familiar setup 😉
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
Viewing 15 posts - 16 through 30 (of 45 total)