December 30, 2009 at 8:03 pm
mister.magoo (12/30/2009)
For the sake of it, a solution which copes with new "ColumnName" values by using dynamic sql (go ahead - pork chop me or whatever you do :-D)
Heh... nah... no pork chops for you. I don't believe that dynamic SQL is all that bad a thing. Take a look at the following... I even published that it's a good thing when done correctly.
http://www.sqlservercentral.com/articles/Crosstab/65048/
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2009 at 8:15 pm
lobbymuncher (12/30/2009)
Another overcomplicated effort from the peanut gallery.
I like it. Nicely done.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2009 at 8:25 pm
Jeff Moden (12/30/2009)
lobbymuncher (12/30/2009)
Another overcomplicated effort from the peanut gallery.I like it. Nicely done.
Ack... I may have to take that back. The code has a triangular join in it that causes 60 internal rows to be read from one of the 4 appearances of the table in the execution plan. It's not quite as bad as a full blown triangular join in that it reads each row once for each group where as a real triangular join at the row level would be devastating here.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2009 at 8:39 pm
Justin,
Please post the table creation statement and be sure to include any indexes you may have on the table.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2009 at 9:28 pm
Jeff Moden (12/30/2009)
Jeff Moden (12/30/2009)
lobbymuncher (12/30/2009)
Another overcomplicated effort from the peanut gallery.I like it. Nicely done.
Ack... I may have to take that back. The code has a triangular join in it that causes 60 internal rows to be read from one of the 4 appearances of the table in the execution plan. It's not quite as bad as a full blown triangular join in that it reads each row once for each group where as a real triangular join at the row level would be devastating here.
Nah... it's good. The WHERE clause keeps it from being a full "Triangular Join" but only if you have an index on ID.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2009 at 10:05 pm
Man, in the presence of a clustered index on the ID, that's some mighty fast code there, Chris.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2009 at 11:43 pm
Jeff Moden (12/30/2009)
Man, in the presence of a clustered index on the ID, that's some mighty fast code there, Chris.
:blush: thanks Jeff. I read your triangular join article!
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 4, 2010 at 6:17 am
mister.magoo (12/30/2009)
For the sake of it, a solution which copes with new "ColumnName" values by using dynamic sql (go ahead - pork chop me or whatever you do :-D)If you don't like dynamic SQL , then you could just use this to generate the query as and when the data requires a change....just change the "exec sp_executesql @sql" line to "print @sql"....
--= Dump the data into a temp table as this uses dynamic sql and needs a "proper" table to read.
IF OBJECT_ID('tempdb..#school') IS NULL
SELECT ID, ColumnName, [Text]
INTO #school
FROM @school
DECLARE @sql NVARCHAR(4000)
SET @sql = ''
;WITH base(ID,ColName,Text,ColNum)
AS
(
--= Get the data into some kind of order with the ID from the '[DW].[DimClass]' row for each set of data as the key
--= And a column number to ascertain the correct sequence for the columns later
SELECT s1.ID, Details.ColName, Details.Text , ROW_NUMBER() OVER(PARTITION BY s1.ID ORDER BY Details.ID) AS ColNum
FROM #school s1
OUTER APPLY (
--= Get the associated rows for the current group/ID
SELECT ID,ColumnName,[Text]
FROM #school s2
WHERE s2.ID>s1.ID
AND s2.ID < ISNULL(
(
--= Get the ID of the next group of data
SELECT TOP 1 ID
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'
ORDER BY s3.ID
),
(
--= Get the last row of data when there are no more groups
SELECT MAX(ID) FROM #school
)
)
) AS Details(ID,ColName,TEXT)
WHERE s1.ColumnName = 'DestinationTableName'
), cols(Name,POSITION)
AS
(
--= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)
SELECT ColName ,MAX(ColNum)
FROM base
GROUP BY ColName
), colList(list)
AS
(
--= Use FOR XML PATH('') trick to get a list of columns
SELECT STUFF((
SELECT ', '+QUOTENAME(Name,'[')
FROM cols
ORDER BY POSITION
FOR XML PATH('')),1,1,'')
)
SELECT @sql=list
FROM colList
--= Now build a dynamic SQL to PIVOT the data
SET @sql = N'
;WITH base(BASEID,ColName,TEXT)
AS
(
SELECT s1.ID, Details.ColName, Details.Text
FROM #school s1
OUTER APPLY (
SELECT ID,ColumnName,[Text]
FROM #school s2
WHERE s2.ID>s1.ID
AND s2.ID < ISNULL(
(
SELECT TOP 1 ID
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = ''DestinationTableName''
ORDER BY s3.ID
),
(
SELECT MAX(ID) FROM #school
)
)
) AS Details(ID,ColName,TEXT)
WHERE s1.ColumnName = ''DestinationTableName''
)
SELECT BASEID,' + @sql + '
FROM base
PIVOT (MAX([Text]) FOR ColName IN ('+@sql +')) AS PVT
ORDER BY BASEID'
EXEC sp_executesql @sql
IF NOT (OBJECT_ID('tempdb..#school') IS NULL)
DROP TABLE #school
I make no claims for performance as I have not tried to optimise it, but it demonstrates a technique that could be used...
MM
This query did exactly what I needed! However, the execution time is HORRIFIC -- 1 day, 5 hours, and 5 minutes to execute. For the record, the table on which I'm operating this query has appx 900,000 rows which are being pivoted out into 46 possible columns. The indexes that I have on the @school table are below:
create NONClustered index idx1 on @school (columnname)
create Unique clustered index idx2 on @school (ID)
Are there any improvements that you can think of which I could incorporate into this query? I'm not that familiar with indexes, but have at least a basic understanding of them.
January 4, 2010 at 6:34 am
Hi Justin
This solution has a triangular join, notorious for scaling poorly - but over a day! Blimey. The solution I posted earlier doesn't, so may be worth a try. If I were in your shoes, I'd hard-code those 46 potential columns because it will take less time than dicking around with dynamic sql.
Cheers
ChrisM
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 4, 2010 at 6:43 am
Justin James (1/4/2010)
mister.magoo (12/30/2009)
For the sake of it, a solution which copes with new "ColumnName" values by using dynamic sql (go ahead - pork chop me or whatever you do :-D)If you don't like dynamic SQL , then you could just use this to generate the query as and when the data requires a change....just change the "exec sp_executesql @sql" line to "print @sql"....
...
I make no claims for performance as I have not tried to optimise it, but it demonstrates a technique that could be used...
MM
This query did exactly what I needed! However, the execution time is HORRIFIC -- 1 day, 5 hours, and 5 minutes to execute. For the record, the table on which I'm operating this query has appx 900,000 rows which are being pivoted out into 46 possible columns. The indexes that I have on the @school table are below:
create NONClustered index idx1 on @school (columnname)
create Unique clustered index idx2 on @school (ID)
Are there any improvements that you can think of which I could incorporate into this query? I'm not that familiar with indexes, but have at least a basic understanding of them.
Is there any specific reason why you didn't use lobbymunchers or my solution?
Both are supposed to perform pretty good. Even better with clustered index on the ID, as Jeff mentioned in an earlier post.
It makes me wondering why you go for a solution that had a performance disclaimer attached in the first place...
As a rough duration estimate (depending on the system and, of course, index applied as mentioned before) we should talk about seconds, not hours, or even DAYS!!!
January 4, 2010 at 7:11 am
I agree that is HORRENDOUS!
If you can get away with a fixed query then do so.(One of the other examples)
If you have to cater for dynamic columns then let us know and someone will come up with a quick scalable method.
(I am busy right now but have some ideas)
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
January 4, 2010 at 9:23 am
After working with MM's query, I was able to get results to come back in 36 seconds. Below is the final SQL that I was able to get working:
--= Dump the data into a temp table as this uses dynamic sql and needs a "proper" table to read.
IF OBJECT_ID('tempdb..#school') IS NULL
SELECT UniqueID as [ID], ColumnName, [Text]
INTO #school
FROM @school
order by uniqueid
create NONClustered index idx1 on #school (columnname)
create Unique clustered index idx2 on #school (ID)
update statistics #school
DECLARE @sql NVARCHAR(4000)
SET @sql = ''
;WITH base(ID,ColName,Text,ColNum)
AS
(
--= Get the data into some kind of order with the ID from the 'DestinationTableName' row for each set of data as the key
--= And a column number to ascertain the correct sequence for the columns later
SELECT s1.ID, Details.ColName, Details.Text , ROW_NUMBER() OVER(PARTITION BY s1.ID ORDER BY Details.ID) AS ColNum
FROM #school s1
OUTER APPLY (
--= Get the associated rows for the current group/ID
SELECT ID,ColumnName,[Text]
FROM #school s2
WHERE s2.ID>s1.ID
AND s2.ID < (
(
--= Get the ID of the next group of data
SELECT TOP 1 isnull(ID,(SELECT max(ID) from #school))
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'
ORDER BY s3.ID
)
)
) AS Details(ID,ColName,TEXT)
WHERE s1.ColumnName = 'DestinationTableName'
), cols(Name,POSITION)
AS
(
--= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)
SELECT ColName ,MAX(ColNum)
FROM base
GROUP BY ColName
), colList(list)
AS
(
--= Use FOR XML PATH('') trick to get a list of columns
SELECT STUFF((
SELECT ', '+QUOTENAME(Name,'[')
FROM cols
ORDER BY POSITION
FOR XML PATH('')),1,1,'')
)
SELECT @sql=list
FROM colList
--= Now build a dynamic SQL to PIVOT the data
SET @sql = N'
;WITH base(BASEID,ColName,TEXT)
AS
(
SELECT s1.ID, Details.ColName, Details.Text
FROM #school s1
OUTER APPLY (
SELECT ID,ColumnName,[Text]
FROM #school s2
WHERE s2.ID>s1.ID
AND s2.ID < (
(
SELECT TOP 1 isnull(ID,(select max(ID) from #school))
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = ''DestinationTableName''
ORDER BY s3.ID
)
)
) AS Details(ID,ColName,TEXT)
WHERE s1.ColumnName = ''DestinationTableName''
)
SELECT BASEID,' + @sql + '
FROM base
PIVOT (MAX([Text]) FOR ColName IN ('+@sql +')) AS PVT
ORDER BY BASEID'
--print @sql
EXEC sp_executesql @sql
IF NOT (OBJECT_ID('tempdb..#school') IS NULL)
DROP TABLE #school
The only changes that I made were:
1. Added indexes to the #school table to increase performance
create nonclustered index idx1 on #school (columnname)
create Unique clustered index idx2 on #school (ID)
2. Modified MM's original query, removing the ISNULL function, and incorporating it into the subquery.
MANY MANY thanks for all of the help that you all have provided!! These forums are an indispensable source of information! π
January 4, 2010 at 9:24 am
The reason why it took so long is because, in the presence of a triangular join, it must spawn over 405,000,450,000 rows. To put it more succinctly, it spawned .4 TERA rows!!!!!!! I'm surprised it didn't take a week. π
--Jeff Moden
Change is inevitable... Change for the better is not.
January 4, 2010 at 9:27 am
Jeff Moden (1/4/2010)
The reason why it took so long is because, in the presence of a triangular join, it must spawn over 405,000,450,000 rows. To put it more succinctly, it spawned .4 TERA rows!!!!!!! I'm surprised it didn't take a week. π
Some people are so reluctant to learn. Here Jeff, take this large bag of frozen pork chops.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 4, 2010 at 9:59 am
Justin,
I think you may still have some work to do as the change you made will be excluding the final grouping - I think...
moving the ISNULL the way you have will not solve the problem of finding the records belonging to the final group.
(
(
--= Get the ID of the next group of data
SELECT TOP 1 isnull(ID,(SELECT max(ID) from #school))
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'
ORDER BY s3.ID
)
)
DOES NOT return the same information as
ISNULL(
(
--= Get the ID of the next group of data
SELECT TOP 1 ID
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'
ORDER BY s3.ID
),
(
--= Get the last row of data when there are no more groups
SELECT MAX(ID) FROM #school
)
)
But, if you just declare a variable and select the MAX(ID) into that outside the main query, you should get the same saving...
-- outside the CTE:
DECLARE @MaxID BIGINT
SELECT @MaxID=MAX(ID) FROM #school
-- inside - where you made your change
ISNULL(
(
--= Get the ID of the next group of data
SELECT TOP 1 ID
FROM #school s3
WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'
ORDER BY s3.ID
),
(
--= Get the last row of data when there are no more groups
@MaxID
)
)
There should be further savings possible later....
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
Viewing 15 posts - 31 through 45 (of 50 total)
You must be logged in to reply to this topic. Login to reply