Normalize data using Pivot WITHOUT using Aggregate function

Question

Normalize data using Pivot WITHOUT using Aggregate function

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

jjbear11 Default port Points: 1497 More actions · Answer 1

Crap -- you're right about not getting the same results. There has to be a way around using ISNULL. That specific piece is causing records to come back 200/minute, which won't work for a table that has 900,000 records.

LutzM SSC Guru Points: 107049 More actions · Answer 2

Justin James (1/4/2010)
... There has to be a way around using ISNULL. ...

YES, there is: Try a different solution!

I have to ask again: What is your reason for refusing to have a look at the other solutions?? If you can wait more than a day for one query to finish there should be 10 minutes for an alternative...

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

jjbear11 Default port Points: 1497 More actions · Answer 3

There is a reason -- the reason is that the solution that I posted above WORKS, albeit slowly. I tried your solution, and I have performance issues with it as well (returns 15,000 records in 3 minutes). Also, the number/names of columns is dynamic -- hence the reason I went with a dynamic SQL result.

LutzM SSC Guru Points: 107049 More actions · Answer 4

Well, since we just have learned that you're looking for a dynamic solution, here it is:

Step 1: running the code from before excluding the PIVOT part, but storing the result into a temp table instead,

Step 2: build a DynamicCrossTab as learned from Jeff Moden (by studying the related article in my signature...), Thanx again, Jeff 😉

Step 3: Verify the resulting SQL code is what you want

Step 4: uncomment the EXEC part at then end of the query and comment the PRINT section

Issue that is still open:

a) There is no information regarding the column order of the resulting PIVOT. And there are not many options either (e.g. order by Columnname or first appearance (=min(id)) or by number of appearance count(*) )

SET nocount on

declare @school table ([ID] int, ColumnName varchar(45), [Text] varchar(50))

--Group 1

insert into @school values (0, 'DestinationTableName', '[DW].[DimClass]')

insert into @school values (1, 'ID', '123146')

insert into @school values (2, 'SubjectArea', 'Math')

insert into @school values (3, 'LocalCourseCode', '000A1')

insert into @school values (4, 'SchoolID', '123')

insert into @school values (5, 'Source', 'Hub')

insert into @school values (6, 'CourseTitle', 'Algebra I')

--Group 2

insert into @school values (7, 'DestinationTableName', '[DW].[DimClass]')

insert into @school values (8, 'ID', '94503')

insert into @school values (9, 'SubjectArea', 'Science')

insert into @school values (10, 'LocalCourseCode', '9037')

insert into @school values (12, 'SchoolID', '46')

insert into @school values (12, 'Source', 'Annex')

insert into @school values (13, 'CourseTitle', 'Biology II')

insert into @school values (14, 'TeacherID', '58398')

insert into @school values (15, 'Credits Possible', '1.0')

--Group 3

insert into @school values (16, 'DestinationTableName', '[DW].[DimClass]')

insert into @school values (17, 'ID', '84023')

insert into @school values (18, 'Source', 'Annex')

insert into @school values (19, 'CourseTitle', 'Physics I')

;with cte1 AS

(

SELECT id,

row_number() OVER (order BY id) AS row

FROM @school

WHERE text = '[DW].[DimClass]'

),

cte2 AS

(

SELECT

row_number() OVER (order BY a.id) AS grp,

a.id start,

isnull(b.id,99) finish

FROM cte1 a

LEFT OUTER JOIN cte1 b ON a.row=b.row-1

)

SELECT s.*, c.grp

INTO #TmpSchool

FROM @school s

INNER JOIN cte2 c ON s.id >= c.start AND id < c.finish

DECLARE @SQL1 NVARCHAR(4000),

@SQL2 NVARCHAR(4000),

@SQL3 NVARCHAR(4000)

--===== Create the "static" section of the code

SELECT @SQL1 = 'SELECT max(CASE WHEN Columnname = ''ID'' THEN [text] ELSE null END) AS [ID]'+CHAR(10)

--===== Create the "Mostly Static" section of the code

SELECT @SQL3 = '

from #TmpSchool GROUP BY grp'

--===== The "Hard" Part - Concatenation to Form the Date Columns

SELECT @SQL2 = COALESCE(@SQL2,'')

+ ' ,max(CASE WHEN Columnname = ' + QUOTENAME(d.Columnname,'''')

+ ' THEN [text] ELSE null END) AS [' + d.Columnname + ']' + CHAR(10)

FROM

(

SELECT Columnname FROM #TmpSchool WHERE Columnname<>'ID' AND Columnname<>'DestinationTableName' GROUP BY Columnname

) d ORDER BY Columnname

--===== Print the Dynamic SQL

PRINT @SQL1 + @SQL2 + @SQL3

--===== Execute the Dynamic SQL to create the desired report (uncomment to run it)

--EXEC (@SQL1 + @SQL2 + @SQL3)

DROP TABLE #TmpSchool

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Jeff Moden SSC Guru Points: 1004703 More actions · Answer 5

lmu92 (1/4/2010)
Thanx again, Jeff

Anytime, Lutz... thanks for the feedback.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 6

Justin James (1/4/2010)
There is a reason -- the reason is that the solution that I posted above WORKS, albeit slowly. I tried your solution, and I have performance issues with it as well (returns 15,000 records in 3 minutes). Also, the number/names of columns is dynamic -- hence the reason I went with a dynamic SQL result.

Justin,

Hope this helps

--= New improved version - tested against 937932 rows with 100 "ColumnNames"

--= And a maximum of 35 "ColumnNames" in any 1 group

--= Execution time was about 30 seconds on my quad core / sql 2008 3GB ram (with results discarded)

--= Execution time for 1041408 rows with results displayed in a grid and a maximum of 100 ColumnNames per group = 37 secs

--= Clean up

IF OBJECT_ID('tempdb..#GroupIDs') IS NOT NULL

DROP TABLE #groupIDs

IF OBJECT_ID('tempdb..#data') IS NOT NULL

DROP TABLE #data

--= Get the group header records into a temp table

SELECT IDENTITY(BIGINT,1,1) AS ROWID, ID ,CONVERT(BIGINT,0) AS NextID

INTO #GroupIDs

FROM school s3

WHERE s3.ColumnName = 'DestinationTableName'

ORDER BY ID

CREATE UNIQUE CLUSTERED INDEX ix_GRP_ID ON #GroupIDs (ID)

--= Figure out the last record in the table

DECLARE @MaxID BIGINT

SELECT @MaxID = MAX(ID) FROM school

--= Work out the range of IDs for each group

--= Note - I tried a "Quirky Update" here and just could not get it to work - this is quick enough (<1 sec) though

UPDATE G1

SET NextID = ISNULL(G2.ID,@MaxID+1)

FROM #GroupIDs G1

LEFT OUTER JOIN #GroupIDs G2

ON G2.ROWID = G1.ROWID+1

--= This now pulls the actual data rows into a temp table with the new "GroupID" and a row number within each group

SELECT S1.ID AS GroupID, S2.ID AS ItemID,S2.ColumnName,S2.Text ,ROW_NUMBER() OVER(PARTITION BY S1.ID ORDER BY S2.ID) AS ColNum

INTO #Data

FROM #GroupIDs G1

JOIN school S1

ON S1.ID = G1.ID

JOIN school S2

ON S2.ID > G1.ID AND S2.ID < G1.NextID

CREATE UNIQUE CLUSTERED INDEX ix_data_IDs ON #data(GroupID, ItemID)

CREATE INDEX ix_data_cols ON #data(ColumnName,ColNum)

--= Now to build the dynamic query

DECLARE @sql NVARCHAR(MAX)

SET @sql = ''

--= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)

--= Use FOR XML PATH('') trick to get a list of columns as a string

SELECT @sql=STUFF(

(

SELECT ', '+QUOTENAME(ColumnName,'[')

FROM (

SELECT ColumnName ,MAX(ColNum) AS Position

FROM #Data

GROUP BY ColumnName

) cols

ORDER BY Position

FOR XML PATH('')),1,1,'')

--= Now build a dynamic SQL to PIVOT the data

SET @sql = N'

SELECT GroupID,' + @sql + '

FROM (SELECT GroupID,ColumnName,Text FROM #data) a

PIVOT (MAX([Text]) FOR ColumnName IN ('+@sql +')) AS PVT

ORDER BY GroupID'

--= And run it

EXEC sp_executesql @sql

--= Clean up

IF OBJECT_ID('tempdb..#GroupIDs') IS NOT NULL

DROP TABLE #groupIDs

IF OBJECT_ID('tempdb..#data') IS NOT NULL

DROP TABLE #data

EDIT: Sorry, I forgot to say that I used a permanent table for this one called "school"

It was populated with some horrible random data using this script:

IF OBJECT_ID('school') IS NOT NULL

DROP TABLE school

GO

IF OBJECT_ID('column_names') IS NULL

CREATE TABLE column_names (ColumnName VARCHAR(50))

ELSE

TRUNCATE TABLE column_names

GO

insert column_names (ColumnName )

select TOP 100 NEWID()

from syscolumns

;with Numbers(N)

as

(

select 1

union all

select N+1 from Numbers

union all

select N+2 from Numbers

union all

select N+3 from Numbers

),data(ID,ColumnName,[Text],Children)

as

(

select top 20000 N,'DestinationTableName', '[DW].[DimClass]', ABS(CHECKSUM(NEWID())) % 100

from Numbers

)

select ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ID, CASE WHEN ROW_NUMBER() OVER( PARTITION BY ID ORDER BY (SELECT NULL)) = 1 THEN data.ColumnName ELSE Colums.ColumnName END AS ColumnName,

CASE WHEN ROW_NUMBER() OVER( PARTITION BY ID ORDER BY (SELECT NULL)) = 1 THEN data.Text ELSE LEFT(Colums.ColumnName,ABS(CHECKSUM(Colums.ColumnName)) % 20) END As [Text]

into school

from data

cross apply (SELECT TOP 5 ColumnName from column_names ORDER BY ColumnName UNION Select TOP(data.Children) ColumnName from column_names ORDER BY NEWID() ) Colums(ColumnName)

option (maxrecursion 0)

create unique clustered index ix_school_ID on school(ID)

CREATE NONCLUSTERED INDEX ix_school_data

ON [dbo].[school] ([ColumnName])

INCLUDE ([ID])

NOTE: the indexes at the end of this query are part of the solution

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);