Group by on 120 pivotted columns

  • Hi,

    I have the next case with a challenge on grouping records on 120 pivotted columns. Objectnames are in Dutch but hopefully that is no problem. The records in our source table have questionaire outputs for patients in our hospital. Every record holds the answer of one question in a questionaire. The questionaire that we would like to pivot consists of 120 questions so we have 120 records per patient. These records are already isolated from other questionaires so our source table only hols the appropriate questionaire. We would like to dynamically pivot the source table having the patientnumber on the rows and the individual questions as columns. This pivotting is no problem, but we get 120 records per patient. What we would like to achieve is to group the records per patient so we end up with one row per patient with the corresponding answer on a question in the corresponding column. What we like to avoid is to hard code all the 120 columns individually in the group by statement. What we like to now is if there is a way to dynamically group by all 120 columns.

    All the steps are below. We list them all to give a complete overview what is being executed, but the problem lies within the last steps.

    Sorry that the code in not indented like it is in our real queries to make it more readable ...

    1. First temp tables are dropped when exist:

    if object_id('tempdb..##vragen_antwoorden_dialyse_pivot') is not null
    drop table ##vragen_antwoorden_dialyse_pivot

    if object_id('tempdb..#vragen_antwoorden_dialyse_distinct') is not null
    drop table #vragen_antwoorden_dialyse_distinct

    if object_id('tempdb..#vragen_antwoorden_dialyse') is not null
    drop table #vragen_antwoorden_dialyse

    2. Variables are declared


    declare @sqlquery nvarchar(max)
    declare @pivotkolommen nvarchar(max)

    3. We take the unique questions from our source table and sort these questions based on the question number. The question holds the question number as the first x characters of that question.


    select z.vraagcode,
    z.subvraagcode,
    z.vraagstelling
    into #vragen_antwoorden_dialyse_distinct
    from (
    select top 1000 y.*
    from (
    select distinct x.vraagstelling,
    substring(vraagcode, patindex('%[^0-9]%', vraagcode), 1) as subvraagcode,
    case when patindex('%[^0-9]%', vraagcode) = 0 then vraagcode
    else substring(vraagcode, 1, patindex('%[^0-9]%', vraagcode) - 1) end as vraagcode
    from (
    select substring(vraagstelling, 1, charindex('.', vraagstelling, 1) - 1) as vraagcode,
    vraagstelling
    from [ElkerliekBA].kwt.vragen_antwoorden_dialyse
    ) x
    ) y
    order by cast(y.vraagcode as float)
    ) z

    4. Source query for pivot


    select patientnr,
    substring(vraagstelling, 1, charindex('.', vraagstelling, 1) - 1) as vraagcode,
    vraagstelling,
    Antwoord
    into #vragen_antwoorden_dialyse
    from [ElkerliekBA].kwt.vragen_antwoorden_dialyse

    5. Fill variable @PivotKolommen with all distinct questions


    select @PivotKolommen = coalesce(@PivotKolommen + ',','') + quotename(vraagstelling)
    from #vragen_antwoorden_dialyse_distinct

    6. Making the dynamic query

    First we used this code ...


    N'select [patientnr],' + @PivotKolommen + '
    into ##vragen_antwoorden_dialyse_pivot
    from #vragen_antwoorden_dialyse

    pivot(max([Antwoord])
    for [vraagstelling] in (' + @PivotKolommen + ')) as Qry'

    ... but tried to change it in this one to have the grouping per patient (one row per patient) a step earlier than grouping the finale result (see step 8 below)



    N'
    with pivot as
    (
    select [patientnr],' + @PivotKolommen + '
    from #vragen_antwoorden_dialyse

    pivot (
    max([Antwoord])
    for [vraagstelling] in (' + @PivotKolommen + ')
    ) as Qry
    )
    select patientnr,
    max(' + @PivotKolommen + ')
    from pivot
    group by patientnr'

    ... but to no avail ... the max function obviously can not handle the dynamic colunms in the variable PivotKolommen.

    7. Executing the query


    execute(@SQLQuery)

    8. Selecting the final result ... where we are stuck because per patient 120 records are shown (with the first query in step 6)


    select * from ##vragen_antwoorden_dialyse_pivot
    order by patientnr

    ... So, is there a way to accomplish a dynamic group by in order to avoid writing out all the 120 columns individually in the group by statement.

    • This topic was modified 5 years, 3 months ago by  michielbijnen.
    • This topic was modified 5 years, 3 months ago by  michielbijnen.
    • This topic was modified 5 years, 3 months ago by  michielbijnen.
  • I'll be honest, I'm not even going to try to propose a solution to the query. Sorry.

    Instead, let me ask a question. Have you looked at using a columnstore index? It's effectively a pivot on the data, from rows to columns. Further, aggregations are inherent, so you won't need to do this kind of group by processing. Just an idea. Hopefully someone with actual skill, and a ton of time, comes by shortly to propose a better answer. Good luck.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Without wading into the specifics of your situation you could use the stuff() function in combination with for xml path.

    In this code the cte is called num_expand and it creates a list of numbers from 1 to 10.  The select statement consolidates the list of numbers into a single comma delimited string.  If you replace the cte definition with the unique list of question Id's, the resulting comma  string could then be appended into your dynamic sql.  I don't think you need pivot just group by.

    with
    num_expand(num) as (
    select 1 num
    union all
    select
    ne.num+1
    from
    num_expand ne
    where
    num<10)
    select
    stuff((select ',' + cast(num as varchar(3)) from num_expand for xml path ('')), 1, 1, '') string;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • There's actually a very easy, almost trivial method to solve this problem.  Here's the article on how you could pull this off.  And, please, tell us why they want to denormalize the data like this, especially for so many columns.  It's not like it'll make a useful report and it'll make doing a computational analysis a whole lot more difficult.

    Here's the article.  You probably won't need "Pre-Aggregation" to speed it up because you don't have multiple rows to aggregate to use in a single "Cell" of the output.

    https://www.sqlservercentral.com/articles/cross-tabs-and-pivots-part-2-dynamic-cross-tabs

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply