Group by on 120 pivotted columns

Question

Group by on 120 pivotted columns

michielbijnen

Ten Centuries

Points: 1308
More actions
August 8, 2019 at 10:40 am

#3669806
Hi,
I have the next case with a challenge on grouping records on 120 pivotted columns. Objectnames are in Dutch but hopefully that is no problem. The records in our source table have questionaire outputs for patients in our hospital. Every record holds the answer of one question in a questionaire. The questionaire that we would like to pivot consists of 120 questions so we have 120 records per patient. These records are already isolated from other questionaires so our source table only hols the appropriate questionaire. We would like to dynamically pivot the source table having the patientnumber on the rows and the individual questions as columns. This pivotting is no problem, but we get 120 records per patient. What we would like to achieve is to group the records per patient so we end up with one row per patient with the corresponding answer on a question in the corresponding column. What we like to avoid is to hard code all the 120 columns individually in the group by statement. What we like to now is if there is a way to dynamically group by all 120 columns.
All the steps are below. We list them all to give a complete overview what is being executed, but the problem lies within the last steps.
Sorry that the code in not indented like it is in our real queries to make it more readable ...
1. First temp tables are dropped when exist:
```
if object_id('tempdb..##vragen_antwoorden_dialyse_pivot') is not null
drop table ##vragen_antwoorden_dialyse_pivot

if object_id('tempdb..#vragen_antwoorden_dialyse_distinct') is not null
drop table #vragen_antwoorden_dialyse_distinct

if object_id('tempdb..#vragen_antwoorden_dialyse') is not null
drop table #vragen_antwoorden_dialyse
```
2. Variables are declared
```
declare @sqlquery nvarchar(max)
declare @pivotkolommen nvarchar(max)
```
3. We take the unique questions from our source table and sort these questions based on the question number. The question holds the question number as the first x characters of that question.
```
select z.vraagcode,
z.subvraagcode,
z.vraagstelling
into #vragen_antwoorden_dialyse_distinct
from (
select top 1000 y.*
from (
select distinct x.vraagstelling,
substring(vraagcode, patindex('%[^0-9]%', vraagcode), 1) as subvraagcode,
case when patindex('%[^0-9]%', vraagcode) = 0 then vraagcode
else substring(vraagcode, 1, patindex('%[^0-9]%', vraagcode) - 1) end as vraagcode
from (
select substring(vraagstelling, 1, charindex('.', vraagstelling, 1) - 1) as vraagcode,
vraagstelling
from [ElkerliekBA].kwt.vragen_antwoorden_dialyse
) x
) y
order by cast(y.vraagcode as float)
) z
```
4. Source query for pivot
```
select patientnr,
substring(vraagstelling, 1, charindex('.', vraagstelling, 1) - 1) as vraagcode,
vraagstelling,
Antwoord
into #vragen_antwoorden_dialyse
from [ElkerliekBA].kwt.vragen_antwoorden_dialyse
```
5. Fill variable @PivotKolommen with all distinct questions
```
select @PivotKolommen = coalesce(@PivotKolommen + ',','') + quotename(vraagstelling)
from #vragen_antwoorden_dialyse_distinct
```
6. Making the dynamic query
First we used this code ...
```
N'select [patientnr],' + @PivotKolommen + '
into ##vragen_antwoorden_dialyse_pivot
from #vragen_antwoorden_dialyse

pivot(max([Antwoord])
for [vraagstelling] in (' + @PivotKolommen + ')) as Qry'
```
... but tried to change it in this one to have the grouping per patient (one row per patient) a step earlier than grouping the finale result (see step 8 below)
```
N'
with pivot as
(
select [patientnr],' + @PivotKolommen + '
from #vragen_antwoorden_dialyse

pivot (
max([Antwoord])
for [vraagstelling] in (' + @PivotKolommen + ')
) as Qry
)
select patientnr,
max(' + @PivotKolommen + ')
from pivot
group by patientnr'
```
... but to no avail ... the max function obviously can not handle the dynamic colunms in the variable PivotKolommen.
7. Executing the query
```
execute(@SQLQuery)
```
8. Selecting the final result ... where we are stuck because per patient 120 records are shown (with the first query in step 6)
```
select * from ##vragen_antwoorden_dialyse_pivot
order by patientnr
```
... So, is there a way to accomplish a dynamic group by in order to avoid writing out all the 120 columns individually in the group by statement.
- This topic was modified 5 years, 3 months ago by michielbijnen.
- This topic was modified 5 years, 3 months ago by michielbijnen.
- This topic was modified 5 years, 3 months ago by michielbijnen.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 1

I'll be honest, I'm not even going to try to propose a solution to the query. Sorry.

Instead, let me ask a question. Have you looked at using a columnstore index? It's effectively a pivot on the data, from rows to columns. Further, aggregations are inherent, so you won't need to do this kind of group by processing. Just an idea. Hopefully someone with actual skill, and a ton of time, comes by shortly to propose a better answer. Good luck.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Steve Collins SSCrazy Points: 2157 More actions · Answer 2

Without wading into the specifics of your situation you could use the stuff() function in combination with for xml path.

In this code the cte is called num_expand and it creates a list of numbers from 1 to 10. The select statement consolidates the list of numbers into a single comma delimited string. If you replace the cte definition with the unique list of question Id's, the resulting comma string could then be appended into your dynamic sql. I don't think you need pivot just group by.

with
num_expand(num) as (
select 1 num
union all
select
  ne.num+1
from
  num_expand ne
where
  num<10)
select
  stuff((select ',' + cast(num as varchar(3)) from num_expand for xml path ('')), 1, 1, '') string;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004340 More actions · Answer 3

There's actually a very easy, almost trivial method to solve this problem. Here's the article on how you could pull this off. And, please, tell us why they want to denormalize the data like this, especially for so many columns. It's not like it'll make a useful report and it'll make doing a computational analysis a whole lot more difficult.

Here's the article. You probably won't need "Pre-Aggregation" to speed it up because you don't have multiple rows to aggregate to use in a single "Cell" of the output.

https://www.sqlservercentral.com/articles/cross-tabs-and-pivots-part-2-dynamic-cross-tabs

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)