March 16, 2010 at 7:02 am
The Dixie Flatline (3/16/2010)
Jeff: Simply thanks. I've told you before at length how much I owe you. It's good not to be savaged with a pork chop first thing in the morning. 😉
BWAA-HAAA!!!! Well, "It Depends" on whether it's a smoked pork chop or not and whether there's a bit of apple-sauce to go with it. 🙂 Mmmmmm... pork chops.... gaaahhhh....
It's difficult to convince new and even some intermediate users that SQL Server has a proverbial "mind of its own" called the "optimizer" and that what works very well for a thousand rows may actually not be "scalable". I think this article demonstrates that quite nicely. Very well done (and I'm not talking pork chops :-)), Bob.
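For anyone who wants to see that first-hand, here's a minimal sketch (assuming the article's dbo.SalesDetail table and its ID column; this is not code from the article) that lets you watch the reads and the plan change as the volume grows:
-- Compare the optimizer's behavior for the same aggregate at two very different volumes.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Small sample: a thousand rows.
SELECT product, MAX(SaleDate) AS LastDate
FROM (SELECT TOP (1000) * FROM dbo.SalesDetail ORDER BY ID) AS sample
GROUP BY product;
-- Full table: compare the reads, elapsed time, and actual plan.
SELECT product, MAX(SaleDate) AS LastDate
FROM dbo.SalesDetail
GROUP BY product;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Run it with "Include Actual Execution Plan" turned on and you can watch the plan shape change between the two.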
--Jeff Moden
Change is inevitable... Change for the better is not.
March 16, 2010 at 7:11 am
Just to chime in, great article.
This kind of thing is why performance tuning isn't "one and done" but an ongoing battle. Good illustration of that.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 16, 2010 at 7:27 am
Well done.
Great layout of examples testing several scenarios, showing why you want to test with different data volumes.
I can see this being pointed to often.
Greg E
March 16, 2010 at 7:35 am
I don't tend to praise a lot of articles, but this one gets my 5 stars!
Great Job.
* Noel
March 16, 2010 at 7:46 am
Am I the only one who copied the code directly and got three different result sets from the three queries?
Granted, I used RedGate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.
I limited the generated data to 100 products, with dates falling within a two-year span.
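For reference, here's roughly what my generated data looks like in plain T-SQL, for anyone without RedGate; the column names are my guesses at the article's schema:
-- Roughly 500,000 rows, 100 products, sale dates spread across two years.
IF OBJECT_ID('dbo.SalesDetail') IS NOT NULL DROP TABLE dbo.SalesDetail;
SELECT TOP (500000)
ID = IDENTITY(INT, 1, 1),
product = 'Product' + RIGHT('00' + CAST(ABS(CHECKSUM(NEWID())) % 100 AS VARCHAR(3)), 3),
SaleDate = DATEADD(DAY, ABS(CHECKSUM(NEWID())) % 730, '20080101')
INTO dbo.SalesDetail
FROM sys.all_columns AC1
CROSS JOIN sys.all_columns AC2;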
March 16, 2010 at 7:52 am
Well done Bob.
You're moving up in the world. First a presentation to the Memphis PASS Chapter on SQL Server Spatial Data, now this article. Next step, book. So when's the book coming out? 🙂
For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help
March 16, 2010 at 7:59 am
Not only does this article provide a concise explanation of "It depends" but it also provides a good example of how articles should be written for SQLServerCentral. (well written, to the point, contains easily-consumed example scripts & charts)
Thanks!
March 16, 2010 at 8:00 am
Thank you for a very clear and useful article. It provides a good reference to bring to some of those difficult discussions (you know the ones). Now I can start with, "What conditions did you test it under?" and give them a link (or a clue) 🙂
March 16, 2010 at 8:05 am
TedS (3/16/2010)
Am I the only one who copied the code directly and got three different result sets from the three queries? Granted, I used RedGate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.
I limited the generated data to 100 products, with dates falling within a two-year span.
Please go back and look at Paul's comment on the format of the code and see if that explains your issue.
Greg E
March 16, 2010 at 8:14 am
Good Job Bob... 🙂
-Roy
March 16, 2010 at 8:16 am
I really appreciate the depth on such a common topic. This is a simple scenario, explored in great depth, that should help any newcomer understand that the number of records, along with the available indexes, can have a significant impact on performance. Thank you; even after many years of database development, I actually learned from this article.
Owen White
March 16, 2010 at 8:17 am
Greg Edwards-268690 (3/16/2010)
TedS (3/16/2010)
Am I the only one who copied the code directly and got three different result sets from the three queries? Granted, I used RedGate to generate 500,000 rows in the example table, but I got 699 rows with the first example, 101 rows with the second, and 102 rows with the third.
I limited the generated data to 100 products, with dates falling within a two-year span.
Please go back and look at Paul's comment on the format of the code and see if that explains your issue.
Greg E
I did, and it doesn't.
My generated data should work just fine; the results of the three queries (run both separately and together) are simply different.
While I do appreciate the effort and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results still aren't the same.
March 16, 2010 at 8:33 am
Great article Bob. It couldn't have been easy to find examples that lined up so nicely to demonstrate your point. I'm guessing there are quite a few hours of work in that piece.
March 16, 2010 at 8:35 am
TedS (3/16/2010)
I did, and it doesn't. My generated data should work just fine; the results of the three queries (run both separately and together) are simply different. While I do appreciate the effort and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results still aren't the same.
I think the problem is in CTE1:
SELECT TOP(@rows) *
FROM dbo.SalesDetail
Without an ORDER BY, TOP(@rows) is free to return any @rows rows, so each query's CTE1 can pull a different set. Add an ORDER BY ID to each CTE1 and all three should produce the same results.
My formatted version of the code for anyone interested (with ORDER BY 'fix'):
------------------------------------------------------------
-- Testing All Three Techniques
------------------------------------------------------------
DECLARE @results TABLE (
Technique VARCHAR(20) NOT NULL,
RowsCount INTEGER NOT NULL,
Millisecs INTEGER NOT NULL
);
DECLARE @timer DATETIME;
DECLARE @rows INTEGER;
DECLARE @loop INTEGER;
SET @rows = 1000; -- will be incremented X10 up to a million
WHILE @rows <= 1000000
BEGIN -- OUTER LOOP based on number of rows
SET @loop = 1;
WHILE @loop <= 50 -- INNER LOOP: 50 timing runs per row count
BEGIN
------------------------------------------------------------
-- Join Detail to MAX([SaleDate])
------------------------------------------------------------
SET @timer = GETDATE();
WITH CTE1
AS (
SELECT TOP(@rows) *
FROM dbo.SalesDetail
ORDER BY ID
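-- (the ORDER BY makes TOP deterministic; without it each run can return different rows)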
),
CTE2
AS (
SELECT product,
MAX(SaleDate) AS LastDate
FROM cte1
GROUP BY
product
)
SELECT S.*
INTO #results1
FROM dbo.SalesDetail S
JOIN cte2 C
ON C.product = S.product
AND S.SaleDate = C.LastDate;
INSERT @results
SELECT 'Match MAX()',
@rows,
DATEDIFF(MILLISECOND, @timer, GETDATE());
------------------------------------------------------------
-- SELECT TOP 1 for each product by SaleDate desc
------------------------------------------------------------
SET @timer = GETDATE();
WITH CTE1
AS (
SELECT TOP(@rows) *
FROM dbo.SalesDetail
ORDER BY ID
),
CTE2
AS (
SELECT DISTINCT product
FROM CTE1
)
SELECT ca.*
INTO #results2
FROM CTE2 C
CROSS
APPLY (
SELECT TOP (1) *
FROM dbo.SalesDetail S
WHERE S.product = C.product
ORDER BY
SaleDate DESC
) CA;
INSERT @results
SELECT 'Top (1)',
@rows,
DATEDIFF(MILLISECOND, @timer, GETDATE());
--------------------------------------------------------------------
-- use ROW_NUMBER() function to assign sequence IDs to rows
--------------------------------------------------------------------
SET @timer = GETDATE();
WITH CTE1
AS (
SELECT TOP(@rows) *
FROM dbo.SalesDetail
ORDER BY ID
),
CTE2
AS (
SELECT *,
ROW_NUMBER() OVER(
PARTITION BY product
ORDER BY SaleDate DESC)
AS seqID
FROM CTE1
)
SELECT *
INTO #results3
FROM CTE2
WHERE seqID = 1;
INSERT @results
SELECT 'Row_Number()',
@rows,
DATEDIFF(MILLISECOND, @timer, GETDATE());
-- note: prior to adding loops, the queries were tested to ensure
-- they were producing identical result sets
DROP TABLE #results1;
DROP TABLE #results2;
DROP TABLE #results3 ;
SET @loop = @loop + 1;
END; -- INNER LOOP
SET @rows = @rows * 10;
END; -- OUTER LOOP
----------------------------------------------------------
SELECT Technique,
RowsCount,
AVG(1.0 * Millisecs) AS AvgMS
FROM @results
GROUP BY
Technique, RowsCount;
Untested.
March 16, 2010 at 8:58 am
Nice article with great statistics.
Although in the real world, I would think that a table containing only the distinct products would be available, and that a unique clustered index on product, SaleDate DESC, and ID would have been considered for the SalesDetail table, to keep all of the sales data for a particular product in the same area of disk with the most recent sales at the beginning.
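Something along these lines, for example; a minimal sketch where the table and index names (and the VARCHAR(20) product type) are just illustrative guesses at the article's schema:
-- A lookup table of distinct products, plus a clustered index that keeps
-- each product's sales together with the most recent sales first.
CREATE TABLE dbo.Products (
product VARCHAR(20) NOT NULL
CONSTRAINT PK_Products PRIMARY KEY CLUSTERED
);
CREATE UNIQUE CLUSTERED INDEX CIX_SalesDetail_Product_SaleDate
ON dbo.SalesDetail (product, SaleDate DESC, ID);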