October 23, 2007 at 9:06 pm
>I tested my cursor version against this on a million rows
>TESTING TESTING TESTING....did someone mention testing?
(This is not directed at anyone in particular. At most it
rises to the level of a shot across the bow of a ship
whose course needs dire correction).
Exactly what was tested here? What specific question(s) have been answered?
There is a pervasive idea in sql communities that sql operates in a different environment than the rest of computer science. That basic concepts of computer science do not apply to sql databases. That somehow the 'sql standard' immunizes sql users from concepts and standards established in the rest of computer science. There exists a double standard, if you will. This of course is not only bizarre but stupid, and it has significant consequences.

The idea of an sql programmer is taken as synonymous with a developer, much like an sql database system is taken as synonymous with a relational database. That knowing everything in bol equates to understanding the database environment. That knowing how to write complex queries implies a deep understanding of relational ideas. All these things are utter nonsense. It leads to the sorry spectacle of brilliant programmers who, when asked to explain what they have really done, prove that they have no idea what they are talking about. It leads to the assumption that sql vendors are the real gatekeepers of how things ought to be.

Sadly this means the database industry is based on faith and little else. It means the confusion that exists within sql is perpetually passed on to each newbie. It leads to the sorry spectacle of users who are unable even to frame an intelligent question, outdone only by the ridiculous explanations they are given. Unfortunately, in the ever more isolated world of sql, 'really' reading something like bol and testing software remain elusive targets.
best,
October 23, 2007 at 11:31 pm
And, thank you for the compliment, Sandy.
--Jeff Moden
Hey Jeff :),
You deserve the compliment for this type of approach to SQL scripting.
It's a really good way, I feel.
Cheers!
Sandy.
--
October 23, 2007 at 11:34 pm
>No doubt... RAC does everything...
>
>... slower...
>
>I already have enough performance impinged tools... but thanks
>anyway. You still haven't answered my question, though... how
>long does the million row example take?
Jeff,
The short answer:
That's like asking the fox to watch over the henhouse, isn't it?
You're placing your blind trust in an answer from the vendor? So
you want me to parrot the same nonsense offered by TPC. Allow
me to save you from yourself. If you're that interested in seeing
a number from a million rows, why in the world would you ask someone
else to do it for you, much less the vendor?! And it would please
me even more if, in addition, you decided to test it.
A perspective
The time to go from A to B is one thing. To be able to get from
A to B is another.
It is one thing to recognize elapsed time as significant; it is
quite another to worship it. Time is tangible, concepts are
intangible. It is easier to sell time. You want to rank faster?
Use a ranking function. It's fast as long as the rank and its
order are the same. You just paid the price of fast. And how do
you measure the elapsed time of not being able to use it? You
want fast triggers? OK, you have them. But you also have undefined
triggers, triggers that can't distinguish between one row and
another. You want faster queries on joins? OK, you have them,
since sql doesn't concern itself with the key of a resultset.
You want even faster queries on joins with any kind of
predicate(s)? You have them, since there are no restrictions on
the cardinality of a join. You want faster inserts? Well, just
limit the kinds of constraints defined on columns. You want
as fast a response as possible on any operation? Well, just
remove the entire process of checking types. In fact, remove
any vestige of type altogether. How fast is lack of integrity?
Sql still really hasn't learned how to tell time. But, like
an old dog, it still can learn how to live with a few more
ticks.
best,
October 23, 2007 at 11:43 pm
rog pike (10/23/2007)
(This is not directed at anyone in particular. At most it rises to the level of a shot across the bow of a ship
whose course needs dire correction).
Exactly what was tested here? What specific question(s) have been answered?
There is a pervasive idea in sql communities that sql operates in a different environment than the rest of computer science. That basic concepts of computer science do not apply to sql databases. That somehow the 'sql standard' immunizes sql users from concepts and standards established in the rest of computer science. There exists a double standard if you will. This of course is not only bizarre but stupid and has significant consequences. The idea of an sql programmer is taken as synonymous with a developer much like an sql database system is taken as synonymous with a relational database. That knowing everything in bol equates to understanding the database environment. That knowing how to write complex queries implies a deep understanding of relational ideas. All these things are utter nonsense. It leads to the sorry spectacle of brilliant programmers, who when asked to explain what they have really done, proving that they have no idea of what they are talking about. It leads to the assumption that sql vendors are the real gatekeepers of how things ought to be. Sadly this means the database industry is based on faith and little else. This means that the confusion that exists within sql is perpetually passed on to each newbie. This leads to the sorry spectacle of users who are unable to even frame an intelligent question being only outdone by the ridiculous explanations they are given. Unfortunately in the ever increasingly isolated world of sql 'really' reading something like bol and testing software remain elusive targets.
So, you think the whole SQL Community is stupid... nice way to entice people to buy your product, Rog... and if you think this has anything to do with knowing everything in BOL, you're sadly mistaken.
You still haven't answered the question of how long it takes RAC to do the same thing... take the test data these guys have provided and run it against RAC... report the results for duration. Think of it as me asking you to spam us with how good your product really is, just this once 😛 This is your opportunity to shine. And, if it happens to run a bit slower, shower us with your knowledge of why that's a good thing. Show us how your product uses fewer resources in terms of CPU usage, RAM usage, and Disk I/O than anything in T-SQL... show us this wonderful decrease in development time and how 20 or 30 batch processes that use your product won't simply bring the server to its knees as so many other 3rd party products have been demonstrated to do. C'mon... you can do it! :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
October 23, 2007 at 11:51 pm
Sometimes it is necessary to rotate results so that columns are presented horizontally and rows are presented vertically. This is known as creating a PivotTable®, creating a cross-tab report, or rotating data.
Assume there is a table Pivot that has one row per quarter. A SELECT of Pivot reports the quarters vertically:
Year Quarter Amount
---- ------- ------
1990 1 1.1
1990 2 1.2
1990 3 1.3
1990 4 1.4
1991 1 2.1
1991 2 2.2
1991 3 2.3
1991 4 2.4
A report must be produced with a table that contains one row for each year, with the values for each quarter appearing in a separate column, such as:
Year  Q1   Q2   Q3   Q4
----  ---  ---  ---  ---
1990  1.1  1.2  1.3  1.4
1991  2.1  2.2  2.3  2.4
These are the statements used to create the Pivot table and populate it with the data from the first table:
USE Northwind
GO
CREATE TABLE Pivot
( Year SMALLINT,
Quarter TINYINT,
Amount DECIMAL(2,1) )
GO
INSERT INTO Pivot VALUES (1990, 1, 1.1)
INSERT INTO Pivot VALUES (1990, 2, 1.2)
INSERT INTO Pivot VALUES (1990, 3, 1.3)
INSERT INTO Pivot VALUES (1990, 4, 1.4)
INSERT INTO Pivot VALUES (1991, 1, 2.1)
INSERT INTO Pivot VALUES (1991, 2, 2.2)
INSERT INTO Pivot VALUES (1991, 3, 2.3)
INSERT INTO Pivot VALUES (1991, 4, 2.4)
GO
This is the SELECT statement used to create the rotated results:
SELECT Year,
SUM(CASE Quarter WHEN 1 THEN Amount ELSE 0 END) AS Q1,
SUM(CASE Quarter WHEN 2 THEN Amount ELSE 0 END) AS Q2,
SUM(CASE Quarter WHEN 3 THEN Amount ELSE 0 END) AS Q3,
SUM(CASE Quarter WHEN 4 THEN Amount ELSE 0 END) AS Q4
FROM Northwind.dbo.Pivot
GROUP BY Year
GO
This SELECT statement also handles a table in which there are multiple rows for each quarter. The GROUP BY combines all rows in Pivot for a given year into a single row in the output. When the grouping operation is being performed, the CASE functions in the SUM aggregates are applied in such a way that the Amount values for each quarter are added into the proper column in the result set and 0 is added to the result set columns for the other quarters.
If the results of this SELECT statement are used as input to a spreadsheet, it is easy for the spreadsheet to calculate a total for each year. When the SELECT is used from an application it may be easier to enhance the SELECT statement to calculate the yearly total. For example:
SELECT P1.*, (P1.Q1 + P1.Q2 + P1.Q3 + P1.Q4) AS YearTotal
FROM (SELECT Year,
SUM(CASE P.Quarter WHEN 1 THEN P.Amount ELSE 0 END) AS Q1,
SUM(CASE P.Quarter WHEN 2 THEN P.Amount ELSE 0 END) AS Q2,
SUM(CASE P.Quarter WHEN 3 THEN P.Amount ELSE 0 END) AS Q3,
SUM(CASE P.Quarter WHEN 4 THEN P.Amount ELSE 0 END) AS Q4
FROM Pivot AS P
GROUP BY P.Year) AS P1
GO
Both GROUP BY with CUBE and GROUP BY with ROLLUP compute the same sort of information as shown in the example, but in a slightly different format.
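For comparison with the rest of this thread, the same rotation can also be written with the SQL Server 2005 PIVOT operator. This is just a sketch against the same Pivot table and is not part of the BOL excerpt above:
SELECT Year, [1] AS Q1, [2] AS Q2, [3] AS Q3, [4] AS Q4
FROM (SELECT Year, Quarter, Amount FROM dbo.Pivot) AS src
PIVOT (SUM(Amount) FOR Quarter IN ([1], [2], [3], [4])) AS pvt
GO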
October 24, 2007 at 12:12 am
Matt, excellent testing.
BUT... I am missing data from SQL Profiler regarding CPU/DURATION/READS and WRITES for the different methods used.
When I started out with SQL Server 2005 and the PIVOT operator in Beta 1-3, I found it very slow and resource-hogging.
Now with SP2 it seems to be fixed.
About a month ago I helped Scott Klein with a problem that involved PIVOT.
I found that PIVOT now takes fewer READS than the CASE/GROUP BY statement.
Do you think you can run the test again and add the missing SQL Profiler data?
And yes, the CASE/GROUP BY approach is still more versatile because it can have multiple aggregations per record. PIVOT is still only able to handle one aggregation.
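For example, a quick sketch against the small Pivot table posted above, getting both a SUM and a COUNT per quarter in a single pass (something one PIVOT operator can't do on its own):
SELECT Year,
       SUM(CASE Quarter WHEN 1 THEN Amount ELSE 0 END) AS Q1_Amount,
       COUNT(CASE Quarter WHEN 1 THEN 1 END)           AS Q1_Rows,
       SUM(CASE Quarter WHEN 2 THEN Amount ELSE 0 END) AS Q2_Amount,
       COUNT(CASE Quarter WHEN 2 THEN 1 END)           AS Q2_Rows
FROM dbo.Pivot
GROUP BY Year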
N 56°04'39.16"
E 12°55'05.25"
October 24, 2007 at 1:10 am
VAIYDEYANATHAN.V.S (10/23/2007)
Sometimes it is necessary to rotate results so that columns are presented horizontally and rows are presented vertically. This is known as creating a PivotTable®, creating a cross-tab report, or rotating data. Assume there is a table Pivot that has one row per quarter. A SELECT of Pivot reports the quarters vertically:....
Nice copy of BOL... 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2007 at 1:30 am
Peter Larsson (10/24/2007)
Matt, excellent testing. BUT... I am missing data from SQL Profiler regarding CPU/DURATION/READS and WRITES for the different methods used.
When I started out with SQL Server 2005 and the PIVOT operator in Beta 1-3, I found it very slow and resource-hogging.
Now with SP2 it seems to be fixed.
About a month ago I helped Scott Klein with a problem that involved PIVOT.
I found that PIVOT now takes fewer READS than the CASE/GROUP BY statement.
Do you think you can run the test again and add the missing SQL Profiler data?
And yes, the CASE/GROUP BY approach is still more versatile because it can have multiple aggregations per record. PIVOT is still only able to handle one aggregation.
I gotta agree with the excellent testing, there. Nicely done.
Just to be sure my eyes haven't gotten too bad, though... test results say 10M... but I'm pretty sure that the test pivot table only has 5M rows in it... true or do I really need a new set of glasses? :hehe:
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2007 at 7:31 am
Jeff Moden (10/24/2007)
I gotta agree with the excellent testing, there. Nicely done.
Just to be sure my eyes haven't gotten too bad, though... test results say 10M... but I'm pretty sure that the test pivot table only has 5M rows in it... true or do I really need a new set of glasses? :hehe:
Well - close but no cigar :) You missed the second little piece:
Insert blah....
Select top 50000000......
Go 2
It runs the insert twice.
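For anyone following along: GO with a number is an SSMS/sqlcmd feature that reruns the whole preceding batch that many times. A tiny sketch of the behaviour:
CREATE TABLE #demo (i INT)
GO
INSERT INTO #demo (i) VALUES (1)
GO 2 -- reruns the INSERT batch 2 times, so #demo ends up with 2 rows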
And - the group by is more versatile, no doubt.
I'll get the profiler data later on.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 24, 2007 at 9:27 am
Heh... good... not my eyes... just time of day. Thanks, Matt.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 25, 2007 at 8:13 am
I have the profiler results. They're attached, but I summarized them a bit.
I ran it twice, once as is, and once after adding a CHAR(300) field (so that the process stops thinking a table scan is the best way to approach this).
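Something along these lines, for reference (the table and column names here are hypothetical; the point is just to widen the rows so a full table scan gets noticeably more expensive than an index seek):
ALTER TABLE dbo.TestOrders
    ADD Filler CHAR(300) NOT NULL DEFAULT ''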
Anyway here's a summary of the results:
TextData                                               CPU (ms)  Reads  Writes  Duration (h:mm:ss.fff)  Duration (µs)
--CASE syntax - no helper index 26,719 37,963 - 0:00:14.412 14,412,168
--CASE syntax - ProdID include (coid,orderamount) 28,030 31,257 - 0:00:14.446 14,445,524
--CASE syntax - (ProdID,coid) include (orderamount) 27,892 31,285 - 0:00:14.408 14,407,634
--CASE syntax - (CoID,ProdID) include (orderamount) 26,142 28,554 - 0:00:13.448 13,447,878
--CASE syntax - (CoID) include (ProdID,orderamount) 25,610 28,540 - 0:00:13.188 13,188,389
--PIVOT syntax - no helper index 32,874 40,096 2 0:00:18.319 18,318,680
--PIVOT syntax - ProdID include (coid,orderamount) 34,390 31,289 - 0:00:18.009 18,009,332
--PIVOT syntax - (ProdID,coid) include (orderamount) 34,703 31,261 - 0:00:17.800 17,799,724
--PIVOT syntax - (CoID,ProdID) include (orderamount) 29,515 28,561 - 0:00:15.368 15,368,222
--PIVOT syntax - (CoID) include (ProdID,orderamount) 29,297 28,548 - 0:00:29.993 29,992,939
Pivot range (min) 29,297 28,548 0:00:15.368 15,368,222
Pivot range (max) 34,703 40,096 0:00:29.993 29,992,939
CASE range (min) 25,610 28,540 0:00:13.188 13,188,389
CASE range (max) 28,030 37,963 0:00:14.446 14,445,524
Case 2 - same as 1, BUT add a CHAR(300) field to simulate all of the other data a prod table might have
--CASE syntax - no helper index 58,608 313,642 1 0:02:31.113 151,113,171
--CASE syntax - ProdID include (coid,orderamount) 72,062 31,203 - 0:00:40.085 40,084,973
--CASE syntax - (ProdID,coid) include (orderamount) 71,750 31,231 - 0:00:36.794 36,793,597
--CASE syntax - (CoID,ProdID) include (orderamount) 74,625 31,229 - 0:00:38.376 38,375,597
--CASE syntax - (CoID) include (ProdID,orderamount) 74,642 31,201 - 0:00:37.861 37,860,628
--PIVOT syntax - no helper index 57,407 313,436 1 0:02:40.110 160,110,450
--PIVOT syntax - ProdID include (coid,orderamount) 79,672 31,207 - 0:00:41.119 41,119,222
--PIVOT syntax - (ProdID,coid) include (orderamount) 87,063 31,235 - 0:00:44.353 44,352,686
--PIVOT syntax - (CoID,ProdID) include (orderamount) 75,250 31,235 - 0:00:38.147 38,147,400
--PIVOT syntax - (CoID) include (ProdID,orderamount) 80,062 31,272 - 0:00:40.515 40,514,660
Pivot range (min) 57,407 31,207 0:00:38.147 38,147,400
Pivot range (max) 87,063 313,436 0:02:40.110 160,110,450
CASE range (min) 58,608 31,201 0:00:36.794 36,793,597
CASE range (max) 74,642 313,642 0:02:31.113 151,113,171
This is no doubt going to get "whacked" formatting-wise, so I'll throw the spreadsheet in as well.
Bottom line though -
- PIVOT and CASE seem to both perform fairly close to on par, with a slight edge to CASE.
- They don't respond the same to indexing, since their "optimal" indexing scheme seems to be different.
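For anyone not used to the "include" shorthand in the labels above: those are covering nonclustered indexes with included columns. One of them written out as a sketch (the table name is hypothetical; the key/INCLUDE split follows the label):
CREATE NONCLUSTERED INDEX IX_TestOrders_CoID
    ON dbo.TestOrders (CoID)
    INCLUDE (ProdID, OrderAmount)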
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 8, 2011 at 6:18 am
Jeff,
absolutely brilliant!
I have been using a different method for some time to do this - but yours is simpler and blistering fast!
Well done and thank you.
JJ
April 8, 2011 at 7:18 pm
Thanks for the kudo, Jeremy. :blush: I really appreciate it. And thanks for helping me find one of my "lost posts". I've been looking for this one but got thousands of hits when searching for things like the word "Pivot".
As a side bar, I wrote a couple of articles on Crosstabs and Pivots about a year after the code in this thread. I may have to revisit the code in this thread to see what I can do to improve it because the CrossTab method almost always beat the Pivot method in the articles.
Here're the links for the articles.
http://www.sqlservercentral.com/articles/T-SQL/63681/
http://www.sqlservercentral.com/articles/Crosstab/65048/
--Jeff Moden
Change is inevitable... Change for the better is not.
September 1, 2015 at 12:16 pm
I have code that creates a pivot table from a view and then appends a summary row with a UNION ALL.
I need to take the above output and put it into a table that can be displayed on a website and downloaded if necessary.
Will the published procedure do that?
September 1, 2015 at 1:02 pm
tnpich (9/1/2015)
I have code that creates a pivot table from a view and then appends a summary row with a UNION ALL.I need to take the above output and put it into a table that can be displayed on a website and downloaded if necessary.
Will the published procedure do that?
Which one? A lot of people have made entries on this thread.
Also, your method above hits the table twice: once for the pivot and once for the total. That second hit probably isn't necessary if you do the pivot with a CROSSTAB/GROUP BY and use WITH ROLLUP or similar.
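Roughly like this, using the little Pivot table from earlier in the thread as a stand-in for your view (just a sketch, not your actual schema):
SELECT ISNULL(CAST(Year AS VARCHAR(10)), 'Total') AS Year,
       SUM(CASE Quarter WHEN 1 THEN Amount ELSE 0 END) AS Q1,
       SUM(CASE Quarter WHEN 2 THEN Amount ELSE 0 END) AS Q2,
       SUM(CASE Quarter WHEN 3 THEN Amount ELSE 0 END) AS Q3,
       SUM(CASE Quarter WHEN 4 THEN Amount ELSE 0 END) AS Q4
FROM dbo.Pivot
GROUP BY Year WITH ROLLUP -- the ROLLUP row (NULL Year) becomes the 'Total' row, so no second pass is needed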
--Jeff Moden
Change is inevitable... Change for the better is not.