Normalize data using Pivot WITHOUT using Aggregate function

  • ouch that changes things a bit;

    can you please do a SELECT DISTINCT ColumnName From yourtable, so we can have the list of columnnames to normalize this data into? so far we have 9 columns, i'm betting there might be more:

    'DestinationTableName'

    'ID'

    'SubjectArea'

    'LocalCourseCode'

    'SchoolID'

    'Source'

    'CourseTitle'

    'TeacherID'

    'Credits Possible'

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I just did a select distinct, and those are all of the columns from the table.

  • Man, this would make for a good T-SQL challenge!

    Now I'm really determined to figure this out!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • This is what I've come up with so far:

    declare @count int

    declare @groupcount int

    declare @NumberofRows int

    select @NumberofRows = COUNT(*) from @school

    set @count = 1

    set @groupcount = 0

    while @count <= @NumberofRows

    begin

    if (select case when text = '[DW].[DimClass]' then 1 else 0 end from @school where UniqueID = @count) = 1

    BEGIN

    set @groupcount = @groupcount + 1

    update @school

    set GroupID = @groupcount

    where UniqueID = @count

    END

    ELSE

    update @school

    set GroupID = @groupcount

    where UniqueID = @count

    set @count = @count + 1

    END

    However, it is RBAR, and INCREDIBLY SLOOOOOW! I altered the @school table to include a GroupID (int) column.

    Since performance is such a bear, I'd ideally like to do something like the above on the fly -- when Pivoting the data, as it is looking at each row, check if the Text is '[DW].[DimClass]'; if it is, increment the GroupID by 1.

  • Justin James (12/29/2009)


    However, it is RBAR, and INCREDIBLY SLOOOOOW!

    Which is why I'm making every effort to avoid RBAR. (If I was coding this in VB.NET or something similar, I'd already have a solution.)

    I've been messing around with a recursive CTE, but I haven't been able to get it to work correctly. Here's what I have so far:

    declare @RowNum table (ID int, RowNo int)

    insert into @RowNum select ID, row_number() over (order by ID) from @school where ColumnName = 'DestinationTableName'

    select * from @RowNum

    ; with NumberThis (ID, RowNo, ColumnName, [Text]) as

    (

    select ID, 1, ColumnName, [Text] from @school where ID < (select ID from @RowNum where RowNo = 2)

    union all

    select s.ID, n.RowNo + 1, s.ColumnName, s.[Text] from @school s join NumberThis n on s.ID > n.ID and s.ID < (select ID from @RowNum where RowNo = n.RowNo + 1)

    )

    select * from NumberThis

    I haven't been able to get the recursive part of the CTE (after the UNION ALL) to work properly. Maybe someone else reading this can point out what I'm doing wrong.

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • I'm not having much luck with this, but here's where I'm going with it.

    Using the @school table that we defined earlier, I started with this query:

    select ID, row_number() over (order by ID) from @school

    where ColumnName = 'DestinationTableName'

    What this does is it assigns an index to every ID that contains "DestinationTableName." It looks something like this:

    ID(No column name)

    01

    72

    163

    In other words, your "DestinationTableName" (your delimiter, if you will) occurs at IDs 0, 7, and 16 in our sample data.

    What I'm hoping to do is to use this to define row numbers for the values in the table. I've been messing around with a recursive CTE for this, but haven't gotten it to work properly. I started with this code:

    ; with NumberThis (ID, RowNo, ColumnName, [Text]) as

    (

    select ID, 1, ColumnName, [Text] from @school where ID < (select ID from @RowNum where RowNo = 2)

    )

    select * from NumberThis

    (Note: @RowNum is a table I created to hold the contents of the "DestinationFileName" IDs I described above.)

    This CTE (without the recursion) gives me this:

    IDRowNoColumnNameText

    01DestinationTableName[DW].[DimClass]

    11ID123146

    21SubjectAreaMath

    31LocalCourseCode000A1

    41SchoolID123

    51SourceHub

    61CourseTitleAlgebra I

    Unfortunately, at this point, I'm at a bit of a loss as to what to do with it next. I'm still messing around with it, but I'm hoping one of you who has more SQL experience than I do can pick up the ball and continue helping poor Justin out!

    +--------------------------------------------------------------------------------------+
    Check out my blog at https://pianorayk.wordpress.com/

  • Justin James (12/29/2009)


    Please see my original post for code to copy/paste.

    Tell ya what, Justin... please see the first link in my signature line below. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Justin James (12/29/2009)


    There is not. I was thinking of using ROW_NUMBER(), but was unsure how to partition by each block of 6 rows.

    I see you've already got that... simple integer divide will do the trick.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Alright.. I think this might what you're looking for with a pivot-ish solution. I found this on another site and retro fitted to your code ( I just used a normal table for school):

    -- May be a solution

    ;With CTE(RID,ID,ColumnName,Text)

    AS

    (

    Select ROW_NUMBER() over (Order by ID) as RID,* from

    (Select * from school UNION Select Max(ID),'ID','' From school)a

    Where ColumnName = 'ID'

    )

    Select * into #group FROM School a

    JOIN (Select a.RID,A.id as Srt,B.ID as ends from cte a JOIN cte b on

    B.RID - A.rid =1)b

    ON a.id >= b.srt and a.id < b.ends

    Select PKeys.Text as ID,B.Text as 'SubjectArea',

    C.Text as 'LocalCourseCode',D.Text as 'SchoolID',

    E.Text as 'Source', F.Text as 'CourseTitle',

    G.Text as 'TeacherID', H.Text as 'Credits Possible'--,

    --I.columnName as 'Destination

    FROM

    (Select RID,Text from #group where ColumnName = 'ID')PKEYS

    LEFT OUTER JOIN #group B ON PKEYS.RID = B.RID AND B.ColumnName= 'SubjectArea'

    LEFT OUTER JOIN #group C ON PKEYS.RID = C.RID AND C.ColumnName= 'LocalCourseCode'

    LEFT OUTER JOIN #group D ON PKEYS.RID = D.RID AND D.ColumnName= 'SchoolID'

    LEFT OUTER JOIN #group E ON PKEYS.RID = E.RID AND E.ColumnName= 'Source'

    LEFT OUTER JOIN #group F ON PKEYS.RID = F.RID AND F.ColumnName= 'CourseTitle'

    LEFT OUTER JOIN #group G ON PKEYS.RID = G.RID AND G.ColumnName= 'TeacherID'

    LEFT OUTER JOIN #group H ON PKEYS.RID = H.RID AND H.ColumnName= 'Credits Possible'

    The original source was here: http://www.calsql.com/2009/10/pivot-with-out-aggregate.html

    Thanks,

    Rich

  • How about this to assign a group number to each set regardless of the number of rows?

    ;with cte1 AS

    (

    SELECT id,

    row_number() OVER (order BY id) AS row

    FROM @school

    WHERE text = '[DW].[DimClass]'

    ),

    cte2 AS

    (

    SELECT

    row_number() OVER (order BY a.id) AS grp,

    a.id start,

    isnull(b.id,99) finish

    FROM cte1 a

    LEFT OUTER JOIN cte1 b ON a.row=b.row-1

    )

    ,cte3 AS (

    SELECT s.*, c.grp

    FROM @school s

    INNER JOIN cte2 c ON s.id >= c.start AND id < c.finish

    )

    SELECT [ID],[SubjectArea], [LocalCourseCode], [CourseTitle], [SchoolID], [Source],[TeacherID],[Credits Possible]

    FROM (select ColumnName,[text],grp FROM cte3) As TheSource

    PIVOT

    (

    max([Text])

    FOR

    ColumnName --each value

    IN ( [ID],[SubjectArea], [LocalCourseCode], [CourseTitle], [SchoolID], [Source],[TeacherID],[Credits Possible])

    ) AS PIVOTALIAS

    /* result set:

    IDSubjectAreaLocalCourseCodeCourseTitleSchoolIDSourceTeacherIDCredits Possible

    123146Math000A1Algebra I123HubNULLNULL

    94503Science9037Biology II46Annex583981.0

    84023NULLNULLPhysics INULLAnnexNULLNULL

    */

    Edit: complete code and result set added (PIVOT clause was missing before...)



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Another overcomplicated effort from the peanut gallery.

    SELECT d.OutputRowNo,

    MAX(CASE ColumnName WHEN 'ID' THEN [Text] END) AS [ID],

    MAX(CASE ColumnName WHEN 'SubjectArea' THEN [Text] END) AS [SubjectArea],

    MAX(CASE ColumnName WHEN 'LocalCourseCode' THEN [Text] END) AS [LocalCourseCode],

    MAX(CASE ColumnName WHEN 'SchoolID' THEN [Text] END) AS [SchoolID],

    MAX(CASE ColumnName WHEN 'Source' THEN [Text] END) AS [Source],

    MAX(CASE ColumnName WHEN 'CourseTitle' THEN [Text] END) AS [CourseTitle],

    MAX(CASE ColumnName WHEN 'TeacherID' THEN [Text] END) AS [TeacherID],

    MAX(CASE ColumnName WHEN 'Credits Possible' THEN [Text] END) AS [Credits Possible]

    FROM @school s

    INNER JOIN (

    SELECT

    OutputRowNo = ROW_NUMBER() OVER(ORDER BY s.[ID]),

    Lowerbound = s.[ID],

    Upperbound = ISNULL((SELECT MIN([ID]) FROM @school WHERE [ID] > s.[ID] AND [Text] = '[DW].[DimClass]')-1, (SELECT MAX([ID]) FROM @school))

    FROM @school s

    WHERE s.[Text] = '[DW].[DimClass]'

    ) d ON s.[ID] BETWEEN d.Lowerbound AND d.Upperbound

    GROUP BY d.OutputRowNo

    Results:

    OutputRowNo ID SubjectArea LocalCourseCode SchoolID Source CourseTitle TeacherID Credits Possible

    ----------- ------- ----------- --------------- -------- ------ ----------- --------- ----------------

    1 123146 Math 000A1 123 Hub Algebra I NULL NULL

    2 94503 Science 9037 46 Annex Biology II 58398 1.0

    3 84023 NULL NULL NULL Annex Physics I NULL NULL

    Cheers

    ChrisM@home


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Nice one, Chris!!

    Another overcomplicated effort from the peanut gallery.

    I wouldn't take it that serious... if the requirement would have been to always have a group of 6, then there would have been easier solutions than what you recommended in your earlier post...

    But this one beats my PIVOT solution by about 50% (as per execution plan, since there are not enough sample data to get any comparable duration results).

    I'm sure there will be one or the other around to confirm that the CASE-based solution "outperformed" the PIVOT. Again.

    But I guess it's not really about using PIVOT or CASE but more how to get the data grouped together. And your solution requires one less sorting operation (I have one more call of the ROW_NUMBER function...).

    Again: Good job! (as far as I can see... 🙂 )



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Cheers Lutz!

    Having SQL Server handy to play with helps a bit ...

    Looks like the OP has at least two solutions to choose from & tweak to perfection. Job's a good'un.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Unless I missed it earlier in the thread, has anyone wondered why the text is "DW.DimClass"?

    Is this data coming out of a Data Warehouse?

    If so, would you not be better querying it with MDX (or failing that, finding the source data instead?)

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • For the sake of it, a solution which copes with new "ColumnName" values by using dynamic sql (go ahead - pork chop me or whatever you do :-D)

    If you don't like dynamic SQL , then you could just use this to generate the query as and when the data requires a change....just change the "exec sp_executesql @sql" line to "print @sql"....

    --= Dump the data into a temp table as this uses dynamic sql and needs a "proper" table to read.

    IF OBJECT_ID('tempdb..#school') IS NULL

    SELECT ID, ColumnName, [Text]

    INTO #school

    FROM @school

    DECLARE @sql NVARCHAR(4000)

    SET @sql = ''

    ;WITH base(ID,ColName,Text,ColNum)

    AS

    (

    --= Get the data into some kind of order with the ID from the '[DW].[DimClass]' row for each set of data as the key

    --= And a column number to ascertain the correct sequence for the columns later

    SELECT s1.ID, Details.ColName, Details.Text , ROW_NUMBER() OVER(PARTITION BY s1.ID ORDER BY Details.ID) AS ColNum

    FROM #school s1

    OUTER APPLY (

    --= Get the associated rows for the current group/ID

    SELECT ID,ColumnName,[Text]

    FROM #school s2

    WHERE s2.ID>s1.ID

    AND s2.ID < ISNULL(

    (

    --= Get the ID of the next group of data

    SELECT TOP 1 ID

    FROM #school s3

    WHERE s3.ID>s1.ID AND s3.ColumnName = 'DestinationTableName'

    ORDER BY s3.ID

    ),

    (

    --= Get the last row of data when there are no more groups

    SELECT MAX(ID) FROM #school

    )

    )

    ) AS Details(ID,ColName,TEXT)

    WHERE s1.ColumnName = 'DestinationTableName'

    ), cols(Name,POSITION)

    AS

    (

    --= Get the distinct ColumnName values and positions (assuming the data will always place them in order in the first place)

    SELECT ColName ,MAX(ColNum)

    FROM base

    GROUP BY ColName

    ), colList(list)

    AS

    (

    --= Use FOR XML PATH('') trick to get a list of columns

    SELECT STUFF((

    SELECT ', '+QUOTENAME(Name,'[')

    FROM cols

    ORDER BY POSITION

    FOR XML PATH('')),1,1,'')

    )

    SELECT @sql=list

    FROM colList

    --= Now build a dynamic SQL to PIVOT the data

    SET @sql = N'

    ;WITH base(BASEID,ColName,TEXT)

    AS

    (

    SELECT s1.ID, Details.ColName, Details.Text

    FROM #school s1

    OUTER APPLY (

    SELECT ID,ColumnName,[Text]

    FROM #school s2

    WHERE s2.ID>s1.ID

    AND s2.ID < ISNULL(

    (

    SELECT TOP 1 ID

    FROM #school s3

    WHERE s3.ID>s1.ID AND s3.ColumnName = ''DestinationTableName''

    ORDER BY s3.ID

    ),

    (

    SELECT MAX(ID) FROM #school

    )

    )

    ) AS Details(ID,ColName,TEXT)

    WHERE s1.ColumnName = ''DestinationTableName''

    )

    SELECT BASEID,' + @sql + '

    FROM base

    PIVOT (MAX([Text]) FOR ColName IN ('+@sql +')) AS PVT

    ORDER BY BASEID'

    EXEC sp_executesql @sql

    IF NOT (OBJECT_ID('tempdb..#school') IS NULL)

    DROP TABLE #school

    I make no claims for performance as I have not tried to optimise it, but it demonstrates a technique that could be used...

    MM

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

    Viewing 15 posts - 16 through 30 (of 50 total)

    You must be logged in to reply to this topic. Login to reply